[jira] [Updated] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3420:
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

Inefficiency in hbase handler when process query including rowkey range scan
----------------------------------------------------------------------------
    Key: HIVE-3420
    URL: https://issues.apache.org/jira/browse/HIVE-3420
    Project: Hive
    Issue Type: Improvement
    Components: HBase Handler
    Environment: Hive-0.9.0 + HBase-0.94.1
    Reporter: Gang Deng
    Assignee: Navis
    Priority: Critical
    Fix For: 0.13.0
    Attachments: HIVE-3420.D7311.1.patch
    Original Estimate: 2h
    Remaining Estimate: 2h

When querying Hive with an HBase rowkey range, the map tasks do not use the startRow/endRow information in the TableSplit. For example, if the row keys fit into 5 HBase files, there will be 5 map tasks. Ideally each task would process 1 file, but in the current implementation each task processes all 5 files. This behavior not only wastes network bandwidth but also worsens lock contention in the HBase block cache, since every task has to access the same blocks.

The problem code is in HiveHBaseTableInputFormat.convertFilter:

    ......
    if (tableSplit != null) {
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          startRow,
          stopRow,
          tableSplit.getRegionLocation());
    }
    scan.setStartRow(startRow);
    scan.setStopRow(stopRow);
    ......

Since the TableSplit already includes the startRow/endRow information of its region, a better implementation is:

    ......
    byte[] splitStart = startRow;
    byte[] splitStop = stopRow;
    if (tableSplit != null) {
      if (tableSplit.getStartRow() != null) {
        splitStart = startRow.length == 0 ||
            Bytes.compareTo(tableSplit.getStartRow(), startRow) >= 0 ?
                tableSplit.getStartRow() : startRow;
      }
      if (tableSplit.getEndRow() != null) {
        splitStop = (stopRow.length == 0 ||
            Bytes.compareTo(tableSplit.getEndRow(), stopRow) <= 0) &&
            tableSplit.getEndRow().length > 0 ?
                tableSplit.getEndRow() : stopRow;
      }
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          splitStart,
          splitStop,
          tableSplit.getRegionLocation());
    }
    scan.setStartRow(splitStart);
    scan.setStopRow(splitStop);
    ......

In my test, the changed code improved performance by more than 30%.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
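The fix above is a range intersection: each mapper should scan the overlap of its split's row range and the filter's row range. A minimal, self-contained Java sketch of that logic, where `compareTo` is a local stand-in for HBase's `Bytes.compareTo` (unsigned lexicographic comparison) and an empty key means "unbounded", following HBase's convention:

```java
// Sketch of the range-intersection logic behind the HIVE-3420 patch.
// Self-contained: the compareTo below mimics org.apache.hadoop.hbase.util.Bytes.
public class SplitRangeIntersect {

    // Unsigned lexicographic byte comparison, as HBase's Bytes.compareTo does.
    static int compareTo(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    // Effective start: the later of the two start keys (empty = unbounded).
    static byte[] effectiveStart(byte[] splitStart, byte[] filterStart) {
        if (filterStart.length == 0) return splitStart;
        if (splitStart.length == 0) return filterStart;
        return compareTo(splitStart, filterStart) >= 0 ? splitStart : filterStart;
    }

    // Effective stop: the earlier of the two stop keys (empty = unbounded).
    static byte[] effectiveStop(byte[] splitStop, byte[] filterStop) {
        if (filterStop.length == 0) return splitStop;
        if (splitStop.length == 0) return filterStop;
        return compareTo(splitStop, filterStop) <= 0 ? splitStop : filterStop;
    }

    public static void main(String[] args) {
        byte[] splitStart = "row100".getBytes();
        byte[] splitStop  = "row200".getBytes();
        byte[] qStart     = "row150".getBytes();  // filter: key >= 'row150'
        byte[] qStop      = "row500".getBytes();  // filter: key <  'row500'
        // This split should scan only [row150, row200), not the whole filter range.
        System.out.println(new String(effectiveStart(splitStart, qStart))); // row150
        System.out.println(new String(effectiveStop(splitStop, qStop)));    // row200
    }
}
```

With the buggy code, every split scanned the full [row150, row500) filter range, which is exactly the duplicated work described above.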
[jira] [Resolved] (HIVE-4247) Filtering on a hbase row key duplicates results across multiple mappers
[ https://issues.apache.org/jira/browse/HIVE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-4247.
------------------------------------
    Resolution: Duplicate
    Fix Version/s: 0.13.0

Fixed via HIVE-3420

Filtering on a hbase row key duplicates results across multiple mappers
-----------------------------------------------------------------------
    Key: HIVE-4247
    URL: https://issues.apache.org/jira/browse/HIVE-4247
    Project: Hive
    Issue Type: Bug
    Components: HBase Handler
    Affects Versions: 0.9.0
    Environment: All Platforms
    Reporter: Karthik Kumara
    Labels: patch
    Fix For: 0.13.0
    Attachments: HiveHBaseTableInputFormat.patch

Steps to reproduce:
1. Create a Hive external table with the HBase storage handler, with enough data in the HBase table to spawn multiple mappers for the Hive query.
2. Write a query with a filter (in the WHERE clause) based on the HBase row key.
3. Run the map-reduce job: each mapper queries the entire data set, duplicating the data. Each mapper processes the entire filtered range, so the results are multiplied by the number of mappers that run.

Expected behavior: each mapper should process a different part of the data, with no duplication.

Cause: the convertFilter method in HiveHBaseTableInputFormat rewrites the start and stop row for each split, which leads each mapper to process the entire range:

    if (tableSplit != null) {
      tableSplit = new TableSplit(
          tableSplit.getTableName(),
          startRow,
          stopRow,
          tableSplit.getRegionLocation());
    }

The scan already has the start and stop row set when the splits are created, so this piece of code is probably redundant.
[jira] [Commented] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774003#comment-13774003 ]

Hudson commented on HIVE-3420:
------------------------------

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #179 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/179/])
HIVE-3420 : Inefficiency in hbase handler when process query including rowkey range scan (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525329)
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-5154:
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

Remove unnecessary array creation in ReduceSinkOperator
-------------------------------------------------------
    Key: HIVE-5154
    URL: https://issues.apache.org/jira/browse/HIVE-5154
    Project: Hive
    Issue Type: Task
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Trivial
    Fix For: 0.13.0
    Attachments: HIVE-5154.D12549.1.patch

A key array is created for each row, which does not seem necessary.
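The pattern behind this kind of patch can be illustrated in isolation: hoist a per-row temporary array into a field that is allocated once per operator, then refill it for each row. The class below is an illustrative sketch only, not Hive's actual ReduceSinkOperator code:

```java
// Illustrative sketch of hoisting a per-row allocation out of the hot path.
// Instead of `new Object[n]` for every row, one buffer is reused.
public class ReusedKeyBuffer {
    private final Object[] keyBuffer;   // allocated once, not once per row

    public ReusedKeyBuffer(int numKeyColumns) {
        keyBuffer = new Object[numKeyColumns];
    }

    // Fills the reused buffer with the key columns of the given row.
    // Safe only because the caller consumes (serializes/copies) the result
    // before the next row is processed -- the invariant such patches rely on.
    public Object[] extractKeys(Object[] row, int[] keyColumnIndexes) {
        for (int i = 0; i < keyColumnIndexes.length; i++) {
            keyBuffer[i] = row[keyColumnIndexes[i]];
        }
        return keyBuffer;
    }

    public static void main(String[] args) {
        ReusedKeyBuffer op = new ReusedKeyBuffer(2);
        Object[] k1 = op.extractKeys(new Object[]{"a", "b", "c"}, new int[]{0, 2});
        Object[] k2 = op.extractKeys(new Object[]{"x", "y", "z"}, new int[]{0, 2});
        System.out.println(k1 == k2);   // true: same buffer, no per-row garbage
    }
}
```

The trade-off is the aliasing caveat in the comment: reuse is only correct when each row's key is fully consumed before the next row overwrites the buffer.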
[jira] [Updated] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-5253:
----------------------------------
    Attachment: HIVE-5253.3.patch.txt

Create component to compile and jar dynamic code
------------------------------------------------
    Key: HIVE-5253
    URL: https://issues.apache.org/jira/browse/HIVE-5253
    Project: Hive
    Issue Type: Sub-task
    Reporter: Edward Capriolo
    Assignee: Edward Capriolo
    Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.patch.txt
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774088#comment-13774088 ]

Hudson commented on HIVE-4113:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #450 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/450/])
HIVE-4113 : Optimize select count(1) with RCFile and Orc (Brock Noland and Yin Huai via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525322)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes2.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes3.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes5.q.out
* /hive/trunk/contrib/src/test/results/clientpositive/udf_row_sequence.q.out
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_queries.q.out
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_single_sourced_multi_insert.q.out
* /hive/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
* /hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
* /hive/trunk/hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* /hive/trunk/ql/src/test/queries/clientpositive/binary_table_colserde.q
* /hive/trunk/ql/src/test/results/clientpositive/auto_join0.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join15.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join18.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join30.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join31.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join_reordering_values.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_smb_mapjoin_14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/binary_output_format.q.out
* /hive/trunk/ql/src/test/results/clientpositive/binary_table_colserde.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out
*
[jira] [Commented] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774095#comment-13774095 ]

Hudson commented on HIVE-5154:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #451 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/451/])
HIVE-5154 : Remove unnecessary array creation in ReduceSinkOperator (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525381)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
[jira] [Commented] (HIVE-3420) Inefficiency in hbase handler when process query including rowkey range scan
[ https://issues.apache.org/jira/browse/HIVE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774096#comment-13774096 ]

Hudson commented on HIVE-3420:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2 #451 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/451/])
HIVE-3420 : Inefficiency in hbase handler when process query including rowkey range scan (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525329)
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
[jira] [Commented] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774126#comment-13774126 ]

Ashutosh Chauhan commented on HIVE-664:
---------------------------------------

+1

optimize UDF split
------------------
    Key: HIVE-664
    URL: https://issues.apache.org/jira/browse/HIVE-664
    Project: Hive
    Issue Type: Bug
    Components: UDF
    Reporter: Namit Jain
    Assignee: Teddy Choi
    Labels: optimization
    Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt

Min Zhou added a comment - 21/Jul/09 07:34 AM
It's very useful for us. Some comments:
1. Can you implement it directly with Text? Avoiding string decoding and encoding would be faster. Of course that trick may lead to another problem, as String.split uses a regular expression for splitting.
2. getDisplayString() always returns a string in lowercase.

Namit Jain added a comment - 21/Jul/09 09:22 AM
Committed. Thanks Emil

Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
There are some easy (compromise) ways to optimize split:
1. Check if the regex argument actually contains some regex-specific characters, and if it doesn't, do a straightforward split without converting to strings.
2. Assume some default value for the second argument (for example, split(str) would be equivalent to split(str, ' ')) and optimize for this value.
3. Have two separate split functions - one that does regex and one that splits around plain text.
I think that 1 is a good choice and can be done rather quickly.
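Option 1 above can be sketched as follows: test whether the separator contains any regex metacharacters, and if not, split with plain substring scanning instead of java.util.regex. The class and method names are illustrative, not the actual Hive UDF implementation:

```java
// Sketch of "option 1": regex split only when the separator really is a regex.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FastSplit {
    // Characters that have special meaning in java.util.regex patterns.
    private static final String REGEX_META = ".$|()[]{}^?*+\\";

    static boolean isPlainText(String sep) {
        for (int i = 0; i < sep.length(); i++) {
            if (REGEX_META.indexOf(sep.charAt(i)) >= 0) return false;
        }
        return true;
    }

    static List<String> split(String s, String sep) {
        // Fall back to the regex path for real patterns (and the empty
        // separator, whose semantics are regex-specific).
        if (sep.isEmpty() || !isPlainText(sep)) {
            return new ArrayList<>(Arrays.asList(s.split(sep, -1)));
        }
        // Fast path: no Pattern compilation, no backtracking.
        List<String> out = new ArrayList<>();
        int from = 0, at;
        while ((at = s.indexOf(sep, from)) >= 0) {
            out.add(s.substring(from, at));
            from = at + sep.length();
        }
        out.add(s.substring(from));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,c", ","));    // [a, b, c]  (fast path)
        System.out.println(split("a1b2c", "\\d"));  // [a, b, c]  (regex path)
    }
}
```

Both paths use limit -1 semantics (trailing empty fields are kept), so the fast path stays behaviorally consistent with the regex path.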
[jira] [Commented] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774211#comment-13774211 ]

Vaibhav Gumashta commented on HIVE-4763:
----------------------------------------

[~cwsteinbach] [~thejas] Uploaded the changes on phabricator: https://reviews.facebook.net/D12951

add support for thrift over http transport in HS2
-------------------------------------------------
    Key: HIVE-4763
    URL: https://issues.apache.org/jira/browse/HIVE-4763
    Project: Hive
    Issue Type: Sub-task
    Components: HiveServer2
    Reporter: Thejas M Nair
    Assignee: Vaibhav Gumashta
    Fix For: 0.12.0
    Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch

Subtask for adding support for http transport mode for the thrift API in HiveServer2. Support for the different authentication modes will be part of another subtask.
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774212#comment-13774212 ]

Sushanth Sowmyan commented on HIVE-5274:
----------------------------------------

Okay, having generated a simple patch that changes the imports, tests are failing because of issues casting HBaseHCatStorageHandler to an HCatStorageHandler (which the old HCIF/HCOF do), which it no longer is - it is a HiveStorageHandler. So, what we can do is as follows:

a) The old code can be rewritten to use HiveStorageHandler, as in a post-HIVE-5261 mode - i.e., the only difference between org.apache.hcatalog.* and org.apache.hive.hcatalog.* would be the package names. But if we did that, it might break external storage handlers that people have written against HCatStorageHandler and expect to keep working in an org.apache.hcatalog.* world.

b) Viraj's changes to the hbase hcat storage handler can be changed so that it continues to follow the old model, using HCatStorageHandler and not HiveStorageHandler.

The latter is the less disruptive change, but it effectively relegates its use to the org.apache.hcatalog.* world. I still see that as ideal here. However, if this approach is taken, then HCatHBaseStorageHandler is no longer usable by the org.apache.hive.hcatalog.* world. That said, I do not think that is a problem, since we're trying to deprecate this too, and we'd be encouraging use of the Hive HBaseStorageHandler for any new code that might use the new packages.

[~alangates], [~ekoifman]: Any thoughts/opinions?

HCatalog package renaming backward compatibility follow-up
----------------------------------------------------------
    Key: HIVE-5274
    URL: https://issues.apache.org/jira/browse/HIVE-5274
    Project: Hive
    Issue Type: Bug
    Components: HCatalog
    Affects Versions: 0.12.0
    Reporter: Sushanth Sowmyan
    Assignee: Sushanth Sowmyan
    Fix For: 0.12.0

As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes.

==

Note: the above is a complete description of this issue in and of itself; the following is more detail on the backward-compatibility goal I have (not saying that each of these things is violated):

a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes.

b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages internally (say HCatUtil, for example), as long as any interfaces only expose org.apache.hcatalog.*. For tests that test org.apache.hcatalog.*, we must be capable of testing it from a pure org.apache.hcatalog.* world. It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, even in tests.
[jira] [Updated] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-664:
----------------------------------
    Status: Open  (was: Patch Available)

Test {{global_limit.q}} failed.

optimize UDF split
------------------
    Key: HIVE-664
    URL: https://issues.apache.org/jira/browse/HIVE-664
    Project: Hive
    Issue Type: Bug
    Components: UDF
    Reporter: Namit Jain
    Assignee: Teddy Choi
    Labels: optimization
    Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5279:
------------------------
    Status: Patch Available  (was: Open)

Missed those tests and sorry for the delay.

Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
-----------------------------------------------------------
    Key: HIVE-5279
    URL: https://issues.apache.org/jira/browse/HIVE-5279
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Critical
    Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch

We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not serializable and fails the query. The log below is an example:

{noformat}
java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
Serialization trace:
inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
	at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
	at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
	at org.apache.h
{noformat}

If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0.
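The "missing no-arg constructor" complaint in the trace above comes from Kryo's default instantiation strategy, which reflectively invokes a class's no-arg constructor when deserializing. The self-contained sketch below reproduces that constraint with plain JDK reflection; the nested classes are stand-ins, not the Hive or Kryo ones:

```java
// Illustration of why a serializer that relies on the default constructor
// fails on classes like StandardListObjectInspector in the trace above.
public class NoArgCtorDemo {
    static class WithNoArg {
        WithNoArg() {}
    }

    static class WithoutNoArg {
        final int field;
        WithoutNoArg(int field) { this.field = field; }
    }

    // Roughly what a default-constructor instantiation strategy does.
    static boolean defaultInstantiable(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Analogous to "Class cannot be created (missing no-arg constructor)"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultInstantiable(WithNoArg.class));    // true
        System.out.println(defaultInstantiable(WithoutNoArg.class)); // false
    }
}
```

This is why the usual remedies are either adding a no-arg constructor to the offending classes or configuring the serializer with a different instantiation strategy.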
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5279: -- Attachment: D12963.4.patch navis updated the revision HIVE-5279 [jira] Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc. Fixed test fails Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12963 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12963?vs=40065id=40299#toc BRANCH HIVE-5279 ARCANIST PROJECT hive AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSumList.java ql/src/test/queries/clientpositive/udaf_sum_list.q ql/src/test/results/clientpositive/udaf_sum_list.q.out ql/src/test/results/compiler/plan/groupby1.q.xml ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml ql/src/test/results/compiler/plan/groupby5.q.xml To: JIRA, ashutoshc, navis Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch, D12963.4.patch We didn't forced GenericUDAFEvaluator to be Serializable. I don't know how previous serialization mechanism solved this but, kryo complaints that it's not Serializable and fails the query. 
[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
[ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774244#comment-13774244 ] Navis commented on HIVE-5320: - It seems to be a bug in json-serde (which was in Hive before, too), not in Hive. Why should we add a workaround for this? Querying a table with nested struct type over JSON data results in errors - Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-5320.patch Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors including java.lang.IndexOutOfBoundsException or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class receives the same reference in its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket with JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31) 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned from list = soi.getStructFieldsDataAsList(obj) when soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. 
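The shared-list bug described above can be reproduced with a toy inspector that, like the reported JsonStructObjectInspector, hands every caller the same mutable list. This is an illustrative sketch, not the actual SerDe code; it also shows the defensive copy from solution 2:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedListDemo {
    // Toy inspector state: one list reused across calls, like the reported bug.
    static final List<Object> SHARED = new ArrayList<>();

    static List<Object> getStructFieldsDataAsList(Object... fields) {
        SHARED.clear();
        for (Object f : fields) SHARED.add(f);
        return SHARED;  // every caller sees the same list object
    }

    public static void main(String[] args) {
        List<Object> outer = getStructFieldsDataAsList("a1", "a2");
        // A nested (recursive) call overwrites the outer call's data:
        getStructFieldsDataAsList("b1", "b2");
        System.out.println(outer.get(0));  // "b1", not "a1" -- corrupted

        // Defensive fix on the caller side (solution 2): copy before recursing.
        List<Object> safeOuter = new ArrayList<>(getStructFieldsDataAsList("a1", "a2"));
        getStructFieldsDataAsList("b1", "b2");
        System.out.println(safeOuter.get(0));  // still "a1"
    }
}
```

The copy costs one allocation per struct, which is why fixing the inspector itself (solution 1, instance-scoped `values`) is arguably the cleaner fix.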
[jira] [Commented] (HIVE-1577) Add configuration property hive.exec.local.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774255#comment-13774255 ] Lefty Leverenz commented on HIVE-1577: -- Added hive.exec.local.scratchdir to Hive Admin Configuration wikidoc -- [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration]. Also updated the default value of hive.exec.scratchdir. Add configuration property hive.exec.local.scratchdir - Key: HIVE-1577 URL: https://issues.apache.org/jira/browse/HIVE-1577 Project: Hive Issue Type: New Feature Components: Configuration Reporter: Carl Steinbach When Hive is run in local mode it uses the hardcoded local directory {{/${java.io.tmpdir}/${user.name}}} for temporary files. This path should be configurable via the property {{hive.exec.local.scratchdir}}. 
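With the property in place, it would presumably be set like any other Hive property in hive-site.xml. The property name comes from the issue; the value below is only an example path:

```xml
<!-- Example only: pick a local directory writable by the user running Hive. -->
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive-local-scratch</value>
  <description>Local scratch space for Hive jobs when running in local mode.</description>
</property>
```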
[jira] [Commented] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774261#comment-13774261 ] Ashutosh Chauhan commented on HIVE-5172: Though the fix seems harmless, I think it doesn't eliminate the root cause. Under intense GC pressure, the same problem may occur even with this fix. I am fine with putting this in, but just be aware that it has not solved the root cause. TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server - Key: HIVE-5172 URL: https://issues.apache.org/jira/browse/HIVE-5172 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: agate Attachments: HIVE-5172.1.patch.txt We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGIBasedProcessor on the server. 
{code}
hive client logs:
=
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
	... 31 more
{code}
{code}
hive metastore server logs:
===
2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at
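The NPE path above is consistent with a transport cache handing back null once an entry has been lost (e.g. a weakly-referenced entry collected under GC pressure). A hedged Java sketch of the defensive pattern discussed in the comment (the `Transport` class and method names are illustrative, not Thrift's API): the caller falls back to a fresh object instead of dereferencing null, which avoids the NPE but leaves the underlying eviction behavior, i.e. the root cause, untouched.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class NullTransportDemo {
    // Stand-in for a transport object; the name is illustrative.
    static class Transport {
        final String peer;
        Transport(String peer) { this.peer = peer; }
    }

    // Weakly-keyed cache: entries can vanish after GC, so get() may return null.
    static final Map<Object, Transport> CACHE = new WeakHashMap<>();

    // Defensive lookup: never hand null to the caller. This is the shape of
    // the "harmless fix" -- the entry can still be lost again under GC
    // pressure, so the root cause remains.
    static Transport getTransport(Object key) {
        Transport t = CACHE.get(key);
        if (t == null) {
            t = new Transport("recreated");
            CACHE.put(key, t);
        }
        return t;
    }

    public static void main(String[] args) {
        // Simulate a lost entry: the key was never cached (or was evicted).
        Transport t = getTransport(new Object());
        System.out.println(t != null);  // caller never observes null
    }
}
```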
[jira] [Resolved] (HIVE-5340) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-5340. Resolution: Invalid {{ant very-clean}} fixed the problem in my setup. TestJdbcDriver2 is failing on trunk. Key: HIVE-5340 URL: https://issues.apache.org/jira/browse/HIVE-5340 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Updated] (HIVE-5338) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5338: --- Component/s: Tests TestJdbcDriver2 is failing on trunk. Key: HIVE-5338 URL: https://issues.apache.org/jira/browse/HIVE-5338 Project: Hive Issue Type: Bug Components: Tests Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Updated] (HIVE-5338) TestJdbcDriver2 is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5338: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-5340 TestJdbcDriver2 is failing on trunk. Key: HIVE-5338 URL: https://issues.apache.org/jira/browse/HIVE-5338 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Ashutosh Chauhan Seems to be related to yesterday's HIVE-5209 commit 
[jira] [Commented] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774270#comment-13774270 ] Ashutosh Chauhan commented on HIVE-5221: yup.. No default decoding. Just remove the conditional decoding. This will break backward compat, but users can use a UDF to get the desired behavior. We currently have a bug; I don't think there is any advantage in being backward-bug-compatible. That is just backwards : ) Issue in column type with data type as BINARY Key: HIVE-5221 URL: https://issues.apache.org/jira/browse/HIVE-5221 Project: Hive Issue Type: Bug Reporter: Arun Vasu Assignee: Mohammad Kamrul Islam Priority: Critical Attachments: HIVE-5221.1.patch Hi, I am using Hive 0.10. When I create an external table with a column of type BINARY, the query result on the table shows junk values for the binary column. Please find below the query I have used to create the table: CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/hivetables/testbinary'; The query I have used is: select * from bool1 The sample data in the hdfs file is: 0^a...@abc.com^001 1^a...@abc.com^010 ^a...@abc.com^011 ^a...@abc.com^100 t^a...@abc.com^101 f^a...@abc.com^110 true^a...@abc.com^111 false^a...@abc.com^001 123^^01100010 12344^^0111 Please share your inputs if it is possible. Thanks, Arun 
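The direction suggested above (no implicit conditional decoding of BINARY columns; decoding only when the user explicitly asks for it) can be illustrated with plain Java. The helper methods below are a sketch of that contract, not Hive's UDF API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryDecodeDemo {
    // Default behavior: hand back the stored bytes untouched, no guessing.
    static byte[] rawColumn(byte[] stored) {
        return stored;
    }

    // Explicit, user-invoked decoding (the role a decode UDF would play).
    static String decodeBase64(byte[] stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = "MDEx".getBytes(StandardCharsets.UTF_8);  // base64 of "011"
        // No implicit decoding: raw bytes are returned as-is...
        System.out.println(new String(rawColumn(stored), StandardCharsets.UTF_8));  // MDEx
        // ...and decoding happens only when explicitly requested.
        System.out.println(decodeBase64(stored));  // 011
    }
}
```

Keeping the default path dumb makes behavior predictable; a user who stored base64 opts into decoding, instead of the engine guessing from the data's shape.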
[jira] [Created] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description
Rakesh Chouhan created HIVE-5341: Summary: Link doesn't work. Needs to be updated as mentioned in the Description Key: HIVE-5341 URL: https://issues.apache.org/jira/browse/HIVE-5341 Project: Hive Issue Type: Bug Components: Documentation Reporter: Rakesh Chouhan Priority: Blocker Go to the Apache Hive Getting Started documentation https://cwiki.apache.org/confluence/display/Hive/GettingStarted Under the section Simple Example Use Cases, MovieLens User Ratings: wget http://www.grouplens.org/system/files/ml-data.tar+0.gz The link mentioned in the document does not work. It needs to be updated to the URL below: http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz I am setting this defect's priority as a Blocker because users will not be able to continue their hands-on exercises unless they find the correct URL to download the mentioned file. Referenced from: http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E. 