[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2050: - Attachment: HIVE-2050.2.patch There are 2 major changes from the last patch: - added a parameter hive.metastore.batch.retrieve.max to control the maximum number of partitions that can be retrieved from the metastore in one batch (default 300). In Hive.getPartitionsByNames(), the input partition name list is split into sublists and the metastore API is called once per sublist. - one of the most time-consuming DB operations is retrieving the sub-classes of MPartition. In particular, the list of FieldSchema is retrieved for each partition and never used (the table's field schema is used for all partitions). So one of the changes here is to omit the retrieval of FieldSchema and use the table's FieldSchema for the partitions. If we later need the partition's FieldSchema for schema evolution, we should add another function/flag for that. These changes reduce memory by 50% and CPU by 20%. The review board is also updated with the Java-only patch. batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.patch For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API). A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), so that the JDO query can be formulated as range queries.
Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and the JDO range query results are guaranteed to be equivalent to the query with a list of partition names. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
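The batching described in the patch notes above can be sketched as follows. This is an illustrative, hypothetical helper — not the actual Hive.getPartitionsByNames() code — assuming only what the description states: the partition name list is split into sublists of at most hive.metastore.batch.retrieve.max entries (default 300), and the metastore API is invoked once per sublist.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionBatcher {
    // Split a list into consecutive sublists of at most batchSize elements,
    // so each metastore round trip touches a bounded number of partitions.
    static <T> List<List<T>> split(List<T> names, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(names.subList(i, Math.min(i + batchSize, names.size()))));
        }
        return batches;
    }
}
```

With 700 partition names and the default batch size of 300, this would yield three metastore calls (300, 300, 100 names).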
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2050: - Status: Patch Available (was: Open) batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.patch
[jira] [Updated] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: proposal.png The layout as implemented in the attached patch RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: Slide1.png, proposal.png Some potential issues with RCFile 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions. 2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength);               // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length, not the uncompressed key length, should be the field immediately following the record length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
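A minimal sketch of the corrected write order implied by issue 2: compress the key *before* writing the record length, so the length field reflects the compressed size actually on disk. This is hypothetical illustration code, not the actual RCFile writer — the class and method names here are invented, and java.util.zip stands in for Hadoop's codec machinery.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

class RecordWriterSketch {
    // Write one record as: [total length][key length][key bytes][value bytes],
    // where lengths are computed AFTER key compression, not before.
    static byte[] writeRecord(byte[] key, byte[] value, boolean compressed) {
        try {
            byte[] keyBytes = key;
            if (compressed) {
                // Compress the key first so its final size is known up front.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                DeflaterOutputStream deflate = new DeflaterOutputStream(buf);
                deflate.write(key);
                deflate.finish();
                keyBytes = buf.toByteArray();
            }
            ByteArrayOutputStream record = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(record);
            out.writeInt(keyBytes.length + value.length); // total record length (not overstated)
            out.writeInt(keyBytes.length);                // key portion length (compressed size if compressed)
            out.write(keyBytes);
            out.write(value);
            return record.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Writing the compressed key length directly after the record length also matches the SequenceFile-style layout issue 3 asks for.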
[jira] [Updated] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: HIVE.2065.patch.0.txt Patch. Since it contains binary files, it was generated with --binary, and needs to be applied as git apply HIVE.2065.patch.0.txt RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012125#comment-13012125 ] Krishna Kumar commented on HIVE-2065: - Review Board Entry at https://reviews.apache.org/r/529/ RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png
Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ --- Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2065 This addresses bug HIVE-2065. https://issues.apache.org/jira/browse/HIVE-2065 Diffs - build-common.xml 9f21a69 data/files/test_v6_compressed.rc PRE-CREATION data/files/test_v6_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing --- Tests added, existing tests updated Thanks, Krishna
Re: Review Request: HIVE-2050. batch processing partition pruning process
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/522/#review355 --- mostly minor issues - can you update the patch, and I will try to get it in today trunk/conf/hive-default.xml https://reviews.apache.org/r/522/#comment705 spelling: alsore trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/522/#comment706 remove commented code trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java https://reviews.apache.org/r/522/#comment708 Are these parameters used ? trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java https://reviews.apache.org/r/522/#comment707 This check should be inside the loop where we are iterating over all the partitions. It may not matter, but we are marking all partitions as unknown even if one partition is unknown. - namit On 2011-03-27 22:59:19, Ning Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/522/ --- (Updated 2011-03-27 22:59:19) Review request for hive. Summary --- Introducing a new metastore API to retrieve a list of partitions in batch. 
Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 108 trunk/conf/hive-default.xml 108 trunk/metastore/if/hive_metastore.thrift 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 108 Diff: https://reviews.apache.org/r/522/diff Testing --- Thanks, Ning
Re: Review Request: HIVE-2050. batch processing partition pruning process
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/522/ --- (Updated 2011-03-28 11:02:07.935934) Review request for hive. Changes --- Addressed Namit's comments. Summary --- Introducing a new metastore API to retrieve a list of partitions in batch. Diffs (updated) - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 108 trunk/conf/hive-default.xml 108 trunk/metastore/if/hive_metastore.thrift 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 108 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 108 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 108 Diff: https://reviews.apache.org/r/522/diff Testing --- Thanks, Ning
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2050: - Attachment: HIVE-2050.3.patch Addressed Namit's comments. Review board is also updated. batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch
[jira] [Commented] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012167#comment-13012167 ] Namit Jain commented on HIVE-2050: -- +1 batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch
Build failed in Jenkins: Hive-0.7.0-h0.20 #56
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/56/ -- [...truncated 26924 lines...]
Build failed in Jenkins: Hive-trunk-h0.20 #645
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/645/ -- [...truncated 29923 lines...]
[jira] [Assigned] (HIVE-1406) for query against non-native table such as HBase, choose number of mapper tasks automatically
[ https://issues.apache.org/jira/browse/HIVE-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1406: Assignee: Ted Yu (was: John Sichi) for query against non-native table such as HBase, choose number of mapper tasks automatically - Key: HIVE-1406 URL: https://issues.apache.org/jira/browse/HIVE-1406 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Ted Yu For native tables, we do this automatically, but for HBase tables, we get only 2 mappers by default. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 8)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/530/ --- Review request for hive. Summary --- New review request for HIVE-1803.8 This addresses bug HIVE-1803. https://issues.apache.org/jira/browse/HIVE-1803 Diffs - lib/README 1c2f0b1 lib/javaewah-0.2.jar PRE-CREATION lib/javaewah.LICENSE PRE-CREATION ql/build.xml 1c7570d ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java 6c320c5 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java 0c9ccea ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java eac168f ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java 26beb4e ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 391e5de ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapBop.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 ql/src/test/queries/clientpositive/index_compact_3.q ee8abda ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_bitmap_and.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_bitmap_or.q.out PRE-CREATION Diff: https://reviews.apache.org/r/530/diff Testing --- Thanks, John
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1803: - Status: Open (was: Patch Available) Latest review comments are in: https://reviews.apache.org/r/530/ Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user
[ https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012279#comment-13012279 ] Devaraj Das commented on HIVE-1988: --- I updated the review board https://reviews.apache.org/r/528/ with some changes. The main change is: the methods getDelegationToken/renewDelegationToken now check that the authentication method is KERBEROS. If it is not, they refuse to issue a delegation token. This closes a security hole where, if a delegation token were compromised, the malicious user in possession of the token could use it to authenticate with the metastore and obtain a new delegation token. This process could go on forever, so the malicious user could access the metastore without ever going through Kerberos authentication. Requiring a prior Kerberos authentication before handing out delegation tokens prevents this. Also, the patch on the review board doesn't have generated code. I will upload it once someone takes a look at the patch and gives feedback. Make the delegation token issued by the MetaStore owned by the right user - Key: HIVE-1988 URL: https://issues.apache.org/jira/browse/HIVE-1988 Project: Hive Issue Type: Bug Components: Metastore, Security, Server Infrastructure Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.8.0 Attachments: hive-1988-3.patch, hive-1988.patch The 'owner' of any delegation token issued by the MetaStore is set to the requesting user. When a delegation token is requested by the user himself during job submission, this is fine. However, when the token is requested by a service (e.g., Oozie) on behalf of the user, the token's owner is set to the user the service runs as. Later on, when the token is used by a MapReduce task, the MetaStore treats the incoming request as coming from Oozie and performs operations as Oozie.
This means any new directory creations (e.g., create_table) on HDFS by the MetaStore will end up with Oozie as the owner. Also, the MetaStore doesn't check whether a user asking for a token on behalf of some other user is actually authorized to act on behalf of that other user. We should start using the ProxyUser authorization in the MetaStore (HADOOP-6510's APIs).
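The check described in the comment above (refusing delegation tokens unless the connection was authenticated via Kerberos) can be sketched as a simple guard. The names below are hypothetical and only illustrate the control flow, not Hive's or Hadoop's actual classes:

```java
// Hypothetical guard sketch: a metastore handler would call
// checkCanIssueToken(...) before creating a delegation token.
public class TokenGuardSketch {
    // Illustrative stand-in for the connection's authentication method.
    enum AuthMethod { KERBEROS, TOKEN, SIMPLE }

    // Refuse token issuance unless the caller authenticated via Kerberos,
    // so a stolen token cannot be used to mint fresh tokens indefinitely.
    public static void checkCanIssueToken(AuthMethod method) {
        if (method != AuthMethod.KERBEROS) {
            throw new SecurityException(
                "Delegation tokens are only issued over a Kerberos-authenticated connection");
        }
    }
}
```

The point of the design is to break the renewal loop: a compromised token can still authenticate, but it can never be exchanged for a new token without a fresh Kerberos handshake.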
Questions about Hive Database/Schema support
Hive 0.6 has support for multiple databases/schemas. Is this feature mature enough to be used in production? Are there any particular features known not to work with databases (I know you cannot run queries using multiple databases at the same time)? Currently there doesn't seem to be an easy way to move existing tables into a new database from the CLI, but it should be possible to do this directly by modifying the metastore, right? Is there anything to watch out for? Thanks, Vijay
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2050: - Status: Patch Available (was: Open) batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, HIVE-2050.patch For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API). A possible optimization is that the partition pruner should give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and the JDO query should be formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and it is guaranteed that the JDO range query results are equivalent to the query with a list of partition names.
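The batching described in the HIVE-2050.2.patch notes earlier, where hive.metastore.batch.retrieve.max (default 300) caps how many partitions one metastore call may retrieve and Hive.getPartitionsByNames splits the input name list into sublists, can be sketched as follows; PartitionBatcher is an illustrative name, not Hive's:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {
    // Split partition names into batches of at most batchSize names each,
    // mirroring the hive.metastore.batch.retrieve.max behaviour: each
    // sublist would be passed to the metastore API in a separate call.
    public static List<List<String>> batches(List<String> names, int batchSize) {
        List<List<String>> out = new ArrayList<List<String>>();
        for (int i = 0; i < names.size(); i += batchSize) {
            out.add(names.subList(i, Math.min(i + batchSize, names.size())));
        }
        return out;
    }
}
```

With 1000 partition names and the default cap of 300, this yields four metastore calls (300 + 300 + 300 + 100) instead of one unbounded request, trading round trips for bounded memory per call.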
[jira] [Commented] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012339#comment-13012339 ] Namit Jain commented on HIVE-2050: -- Can you also update the review board? batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, HIVE-2050.patch