[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.2.patch

There are two major changes from the last patch:
 - added a parameter hive.metastore.batch.retrieve.max to control the maximum 
number of partitions that can be retrieved from the metastore in one batch 
(default 300). In Hive.getPartitionsByNames(), the input partition name list 
is split into sublists, and the metastore API is called once per sublist.
 - one of the most time-consuming DB operations is retrieving the sub-classes 
of MPartition. In particular, the list of FieldSchema is retrieved for each 
partition and never used (the table's field schema is used for all 
partitions). So another change here is to omit the retrieval of FieldSchema 
and use the table's FieldSchema for the partitions. If we later need the 
partition's FieldSchema for schema evolution, we should add another 
function/flag for that. 

These changes reduce memory usage by 50% and CPU by 20%. 

The review board is also updated with the Java-only patch. 
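The batching described above can be sketched roughly as follows. This is an illustrative standalone helper, not the actual Hive.getPartitionsByNames() implementation; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching idea: the full partition name list
// is split into sublists of at most batchMax names, and the metastore API
// would then be called once per sublist.
public class PartitionBatcher {
    public static List<List<String>> splitIntoBatches(List<String> partNames, int batchMax) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partNames.size(); i += batchMax) {
            // copy the subList view so each batch is an independent list
            batches.add(new ArrayList<>(
                partNames.subList(i, Math.min(i + batchMax, partNames.size()))));
        }
        return batches;
    }
}
```

With batchMax = 300 (the default above), a table with 1000 partitions would produce four metastore calls instead of one oversized request.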

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the correct 
 partitions. The partition pruner should then hand Hive a list of partition 
 names and get back a list of Partition objects (this should be added to the Hive 
 API). 
 A possible optimization is that the partition pruner should give Hive a 
 set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
 the JDO query should be formulated as range queries. Range queries are 
 possible because the first step lists all partition names in sorted order. 
 It's easy to come up with a range, and it is guaranteed that the JDO range 
 query results are equivalent to the query with a list of partition 
 names. 
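The range optimization described above could be sketched as follows (a hypothetical helper, not part of any attached patch): walk the full sorted name list and merge runs of consecutive selected names into [first, last] ranges that a JDO range query could use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the range idea: because partition names come
// back sorted, every selected subset can be expressed as a small number of
// contiguous [start, end] ranges over that sorted order.
public class PartitionRanges {
    public static List<String[]> toRanges(List<String> sortedNames, Set<String> selected) {
        List<String[]> ranges = new ArrayList<>();
        String start = null, end = null;
        for (String name : sortedNames) {
            if (selected.contains(name)) {
                if (start == null) start = name; // open a new range
                end = name;                      // extend the current range
            } else if (start != null) {
                ranges.add(new String[]{start, end}); // close the range
                start = null;
            }
        }
        if (start != null) ranges.add(new String[]{start, end});
        return ranges;
    }
}
```

For example, selecting {ts=01 .. ts=11, ts=20 .. ts=24} out of a sorted list yields just two ranges, regardless of how many individual partitions fall inside them.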

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.patch




[jira] [Updated] (HIVE-2065) RCFile issues

2011-03-28 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2065:


Attachment: proposal.png

The layout as implemented in the attached patch

 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: Slide1.png, proposal.png


 Some potential issues with RCFile:
 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
 yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
 as well get rid of the confusing and performance-impacting lock acquisitions.
 2. Record length is overstated for compressed files. IIUC, the key compression 
 happens after we have written the record length.
 {code}
 int keyLength = key.getSize();
 if (keyLength < 0) {
   throw new IOException("negative length keys not allowed: " + key);
 }
 out.writeInt(keyLength + valueLength); // total record length
 out.writeInt(keyLength); // key portion length
 if (!isCompressed()) {
   out.writeInt(keyLength);
   key.write(out); // key
 } else {
   keyCompressionBuffer.reset();
   keyDeflateFilter.resetState();
   key.write(keyDeflateOut);
   keyDeflateOut.flush();
   keyDeflateFilter.finish();
   int compressedKeyLen = keyCompressionBuffer.getLength();
   out.writeInt(compressedKeyLen);
   out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
 }
 {code}
 3. For sequence file compatibility, the compressed key length should be the 
 field immediately following the record length, not the uncompressed key length.
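A minimal sketch of the fix implied by point 2, assuming the key can be compressed before any lengths are written (all names here are hypothetical and simplified; this is not the actual RCFile patch):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch: compress the key FIRST, so the record length
// written to the stream reflects the compressed key size rather than
// the uncompressed one.
public class RecordWriterSketch {
    public static byte[] writeRecord(byte[] key, byte[] value, boolean compressed) {
        try {
            byte[] keyBytes = key;
            if (compressed) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (DeflaterOutputStream def = new DeflaterOutputStream(buf)) {
                    def.write(key); // compress before computing any lengths
                }
                keyBytes = buf.toByteArray();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(out);
            dos.writeInt(keyBytes.length + value.length); // total record length, now correct
            dos.writeInt(keyBytes.length);                // key length = compressed size, per point 3
            dos.write(keyBytes);
            dos.write(value);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }
}
```

This also addresses point 3: the length written right after the record length is the (possibly compressed) on-disk key length, matching the SequenceFile layout.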



[jira] [Updated] (HIVE-2065) RCFile issues

2011-03-28 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2065:


Attachment: HIVE.2065.patch.0.txt

Patch. Since it contains binary files, it was generated with --binary and 
needs to be applied with: git apply HIVE.2065.patch.0.txt

 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png




[jira] [Commented] (HIVE-2065) RCFile issues

2011-03-28 Thread Krishna Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012125#comment-13012125
 ] 

Krishna Kumar commented on HIVE-2065:
-

Review Board Entry at https://reviews.apache.org/r/529/

 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png




Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant

2011-03-28 Thread Krishna

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/529/
---

Review request for hive and Yongqiang He.


Summary
---

Patch for HIVE-2065


This addresses bug HIVE-2065.
https://issues.apache.org/jira/browse/HIVE-2065


Diffs
-

  build-common.xml 9f21a69 
  data/files/test_v6_compressed.rc PRE-CREATION 
  data/files/test_v6_uncompressed.rc PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 20d1f4e 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java
 f7eacdc 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 
bb1e3c9 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a 
  ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 
  ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc 
  ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 
  ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 
  ql/src/test/results/clientpositive/sample10.q.out 50406c3 

Diff: https://reviews.apache.org/r/529/diff


Testing
---

Tests added, existing tests updated


Thanks,

Krishna



Re: Review Request: HIVE-2050. batch processing partition pruning process

2011-03-28 Thread namit jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/#review355
---


mostly minor issues - can you update the patch, and I will try to get it in 
today


trunk/conf/hive-default.xml
https://reviews.apache.org/r/522/#comment705

spelling: alsore



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/522/#comment706

remove commented code



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/522/#comment708

Are these parameters used ?



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/522/#comment707

This check should be inside the loop where
we are iterating over all the partitions.

It may not matter, but we are marking all 
partitions as unknown even if one partition is
unknown.


- namit


On 2011-03-27 22:59:19, Ning Zhang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/522/
 ---
 
 (Updated 2011-03-27 22:59:19)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Introducing a new metastore API to retrieve a list of partitions in batch. 
 
 
 Diffs
 -
 
   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 108 
   trunk/conf/hive-default.xml 108 
   trunk/metastore/if/hive_metastore.thrift 108 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 108 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
  108 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  108 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 108 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
 108 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
 108 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 108 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 108 
   
 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java
  108 
   
 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  108 
 
 Diff: https://reviews.apache.org/r/522/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ning
 




Re: Review Request: HIVE-2050. batch processing partition pruning process

2011-03-28 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
---

(Updated 2011-03-28 11:02:07.935934)


Review request for hive.


Changes
---

Incorporated Namit's comments. 


Summary
---

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 108 
  trunk/conf/hive-default.xml 108 
  trunk/metastore/if/hive_metastore.thrift 108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
108 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 108 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
108 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 108 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 108 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java
 108 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
108 

Diff: https://reviews.apache.org/r/522/diff


Testing
---


Thanks,

Ning



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.3.patch

Incorporated Namit's comments. The review board is also updated. 

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch




[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012167#comment-13012167
 ] 

Namit Jain commented on HIVE-2050:
--

+1

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch




Build failed in Jenkins: Hive-0.7.0-h0.20 #56

2011-03-28 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/56/

--
[...truncated 26924 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out
 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out
[junit] Done query: wrong_distinct1.q
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103281226_1550723762.txt
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103281226_2010052758.txt
[junit] Copying file: 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-08, 
hr=11)
[junit] POSTHOOK: query: LOAD 

Build failed in Jenkins: Hive-trunk-h0.20 #645

2011-03-28 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/645/

--
[...truncated 29923 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-28_12-28-58_288_1474429126020538153/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-28 12:29:01,379 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-28_12-28-58_288_1474429126020538153/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103281229_871865812.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-28_12-29-02_889_3598898194217223253/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-28_12-29-02_889_3598898194217223253/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103281229_285404463.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
 

[jira] [Assigned] (HIVE-1406) for query against non-native table such as HBase, choose number of mapper tasks automatically

2011-03-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1406:


Assignee: Ted Yu  (was: John Sichi)

 for query against non-native table such as HBase, choose number of mapper 
 tasks automatically
 -

 Key: HIVE-1406
 URL: https://issues.apache.org/jira/browse/HIVE-1406
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Ted Yu

 For native tables, we do this automatically, but for HBase tables, we get 
 only 2 mappers by default.



Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 8)

2011-03-28 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/530/
---

Review request for hive.


Summary
---

New review request for HIVE-1803.8


This addresses bug HIVE-1803.
https://issues.apache.org/jira/browse/HIVE-1803


Diffs
-

  lib/README 1c2f0b1 
  lib/javaewah-0.2.jar PRE-CREATION 
  lib/javaewah.LICENSE PRE-CREATION 
  ql/build.xml 1c7570d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
1f01446 
  
ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java
 6c320c5 
  
ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java 
0c9ccea 
  
ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java
 eac168f 
  
ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java
 26beb4e 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 
391e5de 
  ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapBop.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 
  ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 
  ql/src/test/queries/clientpositive/index_compact_3.q ee8abda 
  ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitmap_and.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitmap_or.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/530/diff


Testing
---


Thanks,

John



[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive

2011-03-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1803:
-

Status: Open  (was: Patch Available)

Latest review comments are in:

https://reviews.apache.org/r/530/


 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
 HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, 
 HIVE-1803.8.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, 
 bitmap_index_2.png, javaewah.jar, javaewah.jar


 Implement bitmap index handler to complement compact indexing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-03-28 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012279#comment-13012279
 ] 

Devaraj Das commented on HIVE-1988:
---

I updated the review board https://reviews.apache.org/r/528/ with some changes. 
The main change: the methods getDelegationToken/renewDelegationToken now check 
that the authentication method is KERBEROS. If not, they refuse to issue a 
delegation token. This closes a security hole where a compromised delegation 
token could be used to authenticate to the metastore and obtain a new 
delegation token, and so on indefinitely, so the malicious user could access 
the metastore without ever going through a Kerberos authentication. Handing 
out delegation tokens only after a prior Kerberos authentication limits this.

Also, the patch on the review board doesn't include generated code. I will 
upload it once someone takes a look at the patch and gives feedback.
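The Kerberos-only check described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch; the class, enum, and method names here are hypothetical stand-ins for the real metastore code on the review board.

```java
// Sketch of the idea: refuse to issue a delegation token unless the caller
// authenticated via Kerberos, so a stolen token (DIGEST auth) cannot be used
// to mint fresh tokens indefinitely. All names are hypothetical.
public class TokenIssuer {
  public enum AuthMethod { KERBEROS, DIGEST, SIMPLE }

  public String getDelegationToken(String owner, AuthMethod authMethod) {
    if (authMethod != AuthMethod.KERBEROS) {
      throw new SecurityException(
          "Delegation tokens can only be issued over Kerberos authentication");
    }
    // Placeholder for real token generation.
    return "token-for-" + owner;
  }
}
```

The same check would apply to renewal, so every token chain is anchored in at least one real Kerberos login.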

 Make the delegation token issued by the MetaStore owned by the right user
 -

 Key: HIVE-1988
 URL: https://issues.apache.org/jira/browse/HIVE-1988
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security, Server Infrastructure
Affects Versions: 0.7.0
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.8.0

 Attachments: hive-1988-3.patch, hive-1988.patch


 The 'owner' of any delegation token issued by the MetaStore is set to the 
 requesting user. When the token is requested by the user himself during job 
 submission, this is fine. However, when the token is requested by a service 
 (e.g., Oozie) on behalf of the user, the token's owner is set to the user 
 the service runs as. Later, when the token is used by a MapReduce task, the 
 MetaStore treats the incoming request as coming from Oozie and performs 
 operations as Oozie. This means any new directory created on HDFS by the 
 MetaStore (e.g., via create_table) will end up owned by Oozie. 
 Also, the MetaStore doesn't check whether a user asking for a token on 
 behalf of some other user is actually authorized to act on behalf of that 
 user. We should start using the ProxyUser authorization in the MetaStore 
 (HADOOP-6510's APIs).



Questions about Hive Database/Schema support

2011-03-28 Thread Vijay
Hive 0.6 has support for multiple databases/schemas. Is this feature
mature enough to be used in production? Are there any particular
features known not to work with databases (I know you cannot run
queries that span multiple databases at the same time)? Currently there
doesn't seem to be an easy way to move existing tables into a new
database from the CLI, but it should be possible to do this directly by
modifying the metastore, right? Is there anything to watch out for?

Thanks,
Vijay


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
 HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach: first list all 
 partition names and use Hive's expression evaluation engine to select the 
 correct partitions. The partition pruner should then hand Hive a list of 
 partition names and get back a list of Partition objects (this should be 
 added to the Hive API). 
 A possible optimization is that the partition pruner should give Hive a set 
 of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]) and the 
 JDO query should be formulated as range queries. Range queries are possible 
 because the first step lists all partition names in sorted order, so it is 
 easy to come up with ranges, and the JDO range query results are guaranteed 
 to be equivalent to those of the query with an explicit list of partition 
 names. 
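The range optimization described in the issue can be illustrated with a small sketch. This is a hypothetical helper, not part of any attached patch: given the full sorted listing of partition names and the subset selected by predicate evaluation, it collapses runs of names that are adjacent in the listing into [first, last] ranges, which a JDO filter could then express as `partitionName >= first && partitionName <= last`.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionRanges {
  // Collapse selected partition names into [first, last] ranges, where a range
  // covers names that are consecutive in the full sorted listing. Assumes
  // `selected` is a subset of `allSorted`, in the same sorted order.
  public static List<String[]> toRanges(List<String> allSorted, List<String> selected) {
    List<String[]> ranges = new ArrayList<>();
    int start = -1, prev = -2;
    for (String name : selected) {
      int idx = allSorted.indexOf(name);
      if (idx != prev + 1) {  // gap in the listing: close the current range
        if (start >= 0) {
          ranges.add(new String[] { allSorted.get(start), allSorted.get(prev) });
        }
        start = idx;
      }
      prev = idx;
    }
    if (start >= 0) {
      ranges.add(new String[] { allSorted.get(start), allSorted.get(prev) });
    }
    return ranges;
  }
}
```

Because the ranges are built from the same sorted listing the pruner evaluated, querying the metastore by range returns exactly the selected partitions, while keeping the query string short even when thousands of partitions match.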



[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012339#comment-13012339
 ] 

Namit Jain commented on HIVE-2050:
--

Can you update the review board as well?

