[jira] [Updated] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-04-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1988:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this.
Thanks Devaraj!

> Make the delegation token issued by the MetaStore owned by the right user
> -
>
> Key: HIVE-1988
> URL: https://issues.apache.org/jira/browse/HIVE-1988
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security, Server Infrastructure
>Affects Versions: 0.7.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.8.0
>
> Attachments: hive-1988-3.patch, hive-1988-5.1.patch, hive-1988.patch
>
>
> The 'owner' of any delegation token issued by the MetaStore is set to the 
> requesting user. When a delegation token is requested by the user himself 
> during a job submission, this is fine. However, when the token is requested 
> by a service (e.g., Oozie) on behalf of the user, the token's owner is set to 
> the user the service is running as. Later on, when the token is used by a 
> MapReduce task, the MetaStore treats the incoming request as coming from 
> Oozie and performs operations as Oozie. This means any new directory 
> creations (e.g., create_table) on HDFS by the MetaStore will end up with 
> Oozie as the owner.
> Also, the MetaStore doesn't check whether a user asking for a token on behalf 
> of some other user is actually authorized to act on behalf of that user. We 
> should start using the ProxyUser authorization in the MetaStore 
> (HADOOP-6510's APIs).
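The proxy-user check described above can be sketched roughly as follows. This is a minimal illustration of the idea only, not the HADOOP-6510 API; the class and method names (`ProxyUserSketch`, `mayImpersonate`) are hypothetical:

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch: a service may request a token owned by another user
// only if it is explicitly configured as a proxy for that user.
public class ProxyUserSketch {
    // Maps a service principal (e.g. "oozie") to the users it may impersonate.
    private final Map<String, Set<String>> allowedProxies;

    public ProxyUserSketch(Map<String, Set<String>> allowedProxies) {
        this.allowedProxies = allowedProxies;
    }

    /** Returns true if 'requester' may obtain a delegation token owned by 'owner'. */
    public boolean mayImpersonate(String requester, String owner) {
        if (requester.equals(owner)) {
            return true; // a user may always request a token for himself
        }
        Set<String> users = allowedProxies.get(requester);
        return users != null && users.contains(owner);
    }
}
```

With this check, the MetaStore would set the token's owner to the end user rather than the requesting service, and reject requests from services not configured as proxies.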

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: HIVE-2082. Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/556/
---

Review request for hive.


Summary
---

The major change is to construct PartitionDesc from TableDesc and reuse the 
column info from the TableDesc. 


Diffs
-

  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
1087411 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1087411 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1087411 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1087411 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
1087411 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 1087411 
  trunk/ql/src/test/results/clientpositive/combine2.q.out 1087411 
  trunk/ql/src/test/results/clientpositive/merge3.q.out 1087411 
  trunk/ql/src/test/results/clientpositive/pcr.q.out 1087411 
  trunk/ql/src/test/results/clientpositive/sample10.q.out 1087411 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java 1087411 

Diff: https://reviews.apache.org/r/556/diff


Testing
---

Passed all unit tests. 


Thanks,

Ning



[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2082:
-

Attachment: HIVE-2082.patch

Uploading a patch for review. The review board request is here: 
https://reviews.apache.org/r/556/

It also passed all unit tests. 

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2082:
-

Attachment: HIVE-2082.patch

Attaching a patch for review. The review board request is at: 
https://reviews.apache.org/r/556/

It also passed all unit tests. 

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2082:
-

Attachment: HIVE-2082.patch

Attaching a patch for review. The review board is at 
https://reviews.apache.org/r/556/

This patch also passed all unit tests. 

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2082:
-

Status: Patch Available  (was: Open)

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



Jenkins build is back to normal : Hive-trunk-h0.20 #657

2011-04-06 Thread Apache Hudson Server
See 




[jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016435#comment-13016435
 ] 

Edward Capriolo commented on HIVE-2082:
---

I am curious as to how this is compatible with 
https://issues.apache.org/jira/browse/HIVE-1913. 


> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



[jira] [Updated] (HIVE-2065) RCFile issues

2011-04-06 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2065:


Attachment: HIVE.2065.patch.1.txt

Updated patch in which sequence-file compliance is not addressed but the other 
two issues are.

> RCFile issues
> -
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
>  Issue Type: Bug
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, 
> Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
> Yongqiang He, the class is not meant to be thread-safe (and it is not). We 
> might as well get rid of the confusing and performance-impacting lock 
> acquisitions.
> 2. The record length is overstated for compressed files. IIUC, the key 
> compression happens after the record length has been written:
> {code}
>   int keyLength = key.getSize();
>   if (keyLength < 0) {
> throw new IOException("negative length keys not allowed: " + key);
>   }
>   out.writeInt(keyLength + valueLength); // total record length
>   out.writeInt(keyLength); // key portion length
>   if (!isCompressed()) {
> out.writeInt(keyLength);
> key.write(out); // key
>   } else {
> keyCompressionBuffer.reset();
> keyDeflateFilter.resetState();
> key.write(keyDeflateOut);
> keyDeflateOut.flush();
> keyDeflateFilter.finish();
> int compressedKeyLen = keyCompressionBuffer.getLength();
> out.writeInt(compressedKeyLen);
> out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>   }
> {code}
> 3. For sequence file compatibility, the compressed key length, not the 
> uncompressed key length, should be the field immediately following the record 
> length.
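A hedged sketch of how issue 2 could be fixed: compress the key before computing the lengths, so the recorded record length reflects the compressed size. The surrounding class is illustrative, not the actual RCFile code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Illustrative fix: determine the (possibly compressed) key bytes first,
// then write lengths computed from what is actually written.
public class RecordLengthSketch {
    /** Writes one record; returns the record length actually recorded. */
    public static int writeRecord(DataOutputStream out, byte[] key, byte[] value,
                                  boolean compressed) throws IOException {
        byte[] keyBytes = key;
        if (compressed) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DeflaterOutputStream deflate = new DeflaterOutputStream(buf);
            deflate.write(key);
            deflate.finish();
            keyBytes = buf.toByteArray(); // compress BEFORE computing lengths
        }
        int recordLength = keyBytes.length + value.length;
        out.writeInt(recordLength);    // total record length, now consistent
        out.writeInt(keyBytes.length); // key portion length (compressed size)
        out.write(keyBytes);
        out.write(value);
        return recordLength;
    }
}
```

The key point is the ordering: in the quoted code the record length is derived from the uncompressed key size even when a compressed key is written, so the two disagree.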



[jira] [Commented] (HIVE-2065) RCFile issues

2011-04-06 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016443#comment-13016443
 ] 

jirapos...@reviews.apache.org commented on HIVE-2065:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/529/
---

(Updated 2011-04-06 17:13:30.910168)


Review request for hive and Yongqiang He.


Changes
---

Updated patch where sequence file compliance is not addressed but the other two 
issues are. 


Summary
---

Patch for HIVE-2065


This addresses bug HIVE-2065.
https://issues.apache.org/jira/browse/HIVE-2065


Diffs (updated)
-

  build-common.xml 9f21a69 
  data/files/test_v6dot0_compressed.rc PRE-CREATION 
  data/files/test_v6dot0_uncompressed.rc PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 20d1f4e 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java
 f7eacdc 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 
bb1e3c9 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a 
  ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 
  ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc 
  ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 
  ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 
  ql/src/test/results/clientpositive/sample10.q.out 50406c3 

Diff: https://reviews.apache.org/r/529/diff


Testing
---

Tests added, existing tests updated


Thanks,

Krishna



> RCFile issues
> -
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
>  Issue Type: Bug
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, 
> Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
> Yongqiang He, the class is not meant to be thread-safe (and it is not). We 
> might as well get rid of the confusing and performance-impacting lock 
> acquisitions.
> 2. The record length is overstated for compressed files. IIUC, the key 
> compression happens after the record length has been written:
> {code}
>   int keyLength = key.getSize();
>   if (keyLength < 0) {
> throw new IOException("negative length keys not allowed: " + key);
>   }
>   out.writeInt(keyLength + valueLength); // total record length
>   out.writeInt(keyLength); // key portion length
>   if (!isCompressed()) {
> out.writeInt(keyLength);
> key.write(out); // key
>   } else {
> keyCompressionBuffer.reset();
> keyDeflateFilter.resetState();
> key.write(keyDeflateOut);
> keyDeflateOut.flush();
> keyDeflateFilter.finish();
> int compressedKeyLen = keyCompressionBuffer.getLength();
> out.writeInt(compressedKeyLen);
> out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>   }
> {code}
> 3. For sequence file compatibility, the compressed key length, not the 
> uncompressed key length, should be the field immediately following the record 
> length.



[jira] [Commented] (HIVE-2065) RCFile issues

2011-04-06 Thread Krishna Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016444#comment-13016444
 ] 

Krishna Kumar commented on HIVE-2065:
-

Updated patch in which sequence-file compliance is not addressed but the other 
two issues are. The review board has also been updated: https://reviews.apache.org/r/529/

> RCFile issues
> -
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
>  Issue Type: Bug
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, 
> Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
> Yongqiang He, the class is not meant to be thread-safe (and it is not). We 
> might as well get rid of the confusing and performance-impacting lock 
> acquisitions.
> 2. The record length is overstated for compressed files. IIUC, the key 
> compression happens after the record length has been written:
> {code}
>   int keyLength = key.getSize();
>   if (keyLength < 0) {
> throw new IOException("negative length keys not allowed: " + key);
>   }
>   out.writeInt(keyLength + valueLength); // total record length
>   out.writeInt(keyLength); // key portion length
>   if (!isCompressed()) {
> out.writeInt(keyLength);
> key.write(out); // key
>   } else {
> keyCompressionBuffer.reset();
> keyDeflateFilter.resetState();
> key.write(keyDeflateOut);
> keyDeflateOut.flush();
> keyDeflateFilter.finish();
> int compressedKeyLen = keyCompressionBuffer.getLength();
> out.writeInt(compressedKeyLen);
> out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>   }
> {code}
> 3. For sequence file compatibility, the compressed key length, not the 
> uncompressed key length, should be the field immediately following the record 
> length.



Re: Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compli

2011-04-06 Thread Krishna

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/529/
---

(Updated 2011-04-06 17:13:30.910168)


Review request for hive and Yongqiang He.


Changes
---

Updated patch where sequence file compliance is not addressed but the other two 
issues are. 


Summary
---

Patch for HIVE-2065


This addresses bug HIVE-2065.
https://issues.apache.org/jira/browse/HIVE-2065


Diffs (updated)
-

  build-common.xml 9f21a69 
  data/files/test_v6dot0_compressed.rc PRE-CREATION 
  data/files/test_v6dot0_uncompressed.rc PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 20d1f4e 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java
 f7eacdc 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 
bb1e3c9 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a 
  ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 
  ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc 
  ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 
  ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 
  ql/src/test/results/clientpositive/sample10.q.out 50406c3 

Diff: https://reviews.apache.org/r/529/diff


Testing
---

Tests added, existing tests updated


Thanks,

Krishna



[jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016450#comment-13016450
 ] 

Namit Jain commented on HIVE-2082:
--

Edward, I haven't reviewed the patch in detail, but the general idea is as 
follows:

A Partition inherits some properties from the Table (e.g., columns), while
others can be overwritten (e.g., the serde).

Today, we treat all the properties the same way; this patch should optimize
for the inherited properties by maintaining just one copy.

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).



[jira] [Created] (HIVE-2097) Explore mechanisms for better compression with RC Files

2011-04-06 Thread Krishna Kumar (JIRA)
Explore mechanisms for better compression with RC Files
---

 Key: HIVE-2097
 URL: https://issues.apache.org/jira/browse/HIVE-2097
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Serializers/Deserializers
Reporter: Krishna Kumar
Priority: Minor


Optimization of the compression mechanisms used by RCFile is to be explored.

Some initial ideas:

1. More efficient serialization/deserialization based on type-specific and 
storage-specific knowledge.

   For instance, storing sorted numeric values efficiently using delta 
coding techniques.

2. More efficient compression based on type-specific and storage-specific 
knowledge.

   Enable compression codecs to be specified per type or per individual 
column.

3. Reordering the on-disk storage for better compression efficiency.
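Idea 1 (delta coding for sorted numeric values) can be sketched as follows; the class name is illustrative and this is not RCFile code. Storing the gaps instead of the absolute values keeps the numbers small, which a varint encoding or a generic codec then compresses much better:

```java
// Illustrative delta coding for a sorted numeric column.
public class DeltaCodingSketch {
    /** Encodes sorted values as gaps from the previous value. */
    public static long[] encode(long[] sorted) {
        long[] deltas = new long[sorted.length];
        long prev = 0;
        for (int i = 0; i < sorted.length; i++) {
            deltas[i] = sorted[i] - prev; // gap from the previous value
            prev = sorted[i];
        }
        return deltas;
    }

    /** Decodes by taking a running prefix sum of the gaps. */
    public static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i]; // prefix sum restores the original value
            values[i] = running;
        }
        return values;
    }
}
```

For example, the sorted column {100, 105, 107, 200} encodes to the much smaller gaps {100, 5, 2, 93}.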




[jira] [Assigned] (HIVE-2097) Explore mechanisms for better compression with RC Files

2011-04-06 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar reassigned HIVE-2097:
---

Assignee: Krishna Kumar

> Explore mechanisms for better compression with RC Files
> ---
>
> Key: HIVE-2097
> URL: https://issues.apache.org/jira/browse/HIVE-2097
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
>
> Optimization of the compression mechanisms used by RCFile is to be explored.
> Some initial ideas:
> 1. More efficient serialization/deserialization based on type-specific and 
> storage-specific knowledge. For instance, storing sorted numeric values 
> efficiently using delta coding techniques.
> 2. More efficient compression based on type-specific and storage-specific 
> knowledge. Enable compression codecs to be specified per type or per 
> individual column.
> 3. Reordering the on-disk storage for better compression efficiency.



[jira] [Commented] (HIVE-2097) Explore mechanisms for better compression with RC Files

2011-04-06 Thread Krishna Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016451#comment-13016451
 ] 

Krishna Kumar commented on HIVE-2097:
-

Comment hijacked from HIVE-2065:

He Yongqiang added a comment - 31/Mar/11 23:13

We examined column groups, and sorted the data internally based on one column in 
each column group. (But we did not try different compressions across column 
groups.) We tried this with 3-4 tables, and saw ~20% storage savings on one 
table compared to the previous RCFile. The main problem with this approach is 
that it is hard to find out the correct/most efficient column group definitions.
For example, for a table tbl_1 with 20 columns, a user can define:

col_1,col_2,col_11,col_13:0;col_3,col_4,col_15,col_16:1;

This will put col_1, col_2, col_11, and col_13 into one column group, and 
reorder that group by sorting on col_1 (0 is the first column in this column 
group); it will put col_3, col_4, col_15, and col_16 into another column group, 
and reorder that group by sorting on col_4. Finally, all other columns go into 
the default column group in their original order.
It should also be easy to allow a different compression codec for each column 
group.

The main blocking issue for this approach is having a full set of utils to find 
the best column group definition.
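The column-group definition string quoted above can be parsed roughly as follows. This is an illustrative sketch under the stated format (';'-separated groups, each a comma-separated column list followed by ':' and the in-group index of the sort column); it is not actual Hive code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative parser for specs like "col_1,col_2:0;col_3,col_4:1;".
public class ColumnGroupSketch {
    public final List<String> columns;
    public final String sortColumn; // the column the group is reordered by

    ColumnGroupSketch(List<String> columns, int sortIdx) {
        this.columns = columns;
        this.sortColumn = columns.get(sortIdx);
    }

    public static List<ColumnGroupSketch> parse(String spec) {
        List<ColumnGroupSketch> groups = new ArrayList<>();
        for (String part : spec.split(";")) {
            if (part.isEmpty()) continue; // tolerate a trailing ';'
            int colon = part.lastIndexOf(':');
            List<String> cols = Arrays.asList(part.substring(0, colon).split(","));
            groups.add(new ColumnGroupSketch(cols, Integer.parseInt(part.substring(colon + 1))));
        }
        return groups;
    }
}
```

On the quoted example this yields two groups, sorted on col_1 (index 0 in its group) and col_4 (index 1 in its group) respectively; any columns not mentioned would fall into the default group.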



> Explore mechanisms for better compression with RC Files
> ---
>
> Key: HIVE-2097
> URL: https://issues.apache.org/jira/browse/HIVE-2097
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
>
> Optimization of the compression mechanisms used by RCFile is to be explored.
> Some initial ideas:
> 1. More efficient serialization/deserialization based on type-specific and 
> storage-specific knowledge. For instance, storing sorted numeric values 
> efficiently using delta coding techniques.
> 2. More efficient compression based on type-specific and storage-specific 
> knowledge. Enable compression codecs to be specified per type or per 
> individual column.
> 3. Reordering the on-disk storage for better compression efficiency.



[jira] [Updated] (HIVE-2071) enforcereadonlytables hook should not check a configuration variable

2011-04-06 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2071:


Attachment: HIVE.2071.patch.0.txt

Tentative patch showing the implementation of the SessionState-based solution 
described in #1 above.

> enforcereadonlytables hook should not check a configuration variable
> 
>
> Key: HIVE-2071
> URL: https://issues.apache.org/jira/browse/HIVE-2071
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Krishna Kumar
> Attachments: HIVE.2071.patch.0.txt
>
>
> Instead of adding a new configuration parameter which is being checked in
> EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables
> from the hive.exec.pre.hooks at creation time.



[jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016461#comment-13016461
 ] 

Ning Zhang commented on HIVE-2082:
--

@Edward, HIVE-1913 fixed a bug in PartitionDesc where previously table 
properties were returned even if partition properties were present. This patch 
doesn't change that. 

What this patch changes is how PartitionDesc.properties is constructed. 
Previously, the properties were constructed using part.getSchema(), which 
constructs a new Properties object for each partition. The most 
memory-consuming parts are the colNames, colTypes and partStrings (see 
MetaStoreUtils.getSchema()). Since they are constructed using the table-level 
StorageDescriptor, all partitions have the same colNames, colTypes and 
partStrings, so we can use the same objects for all partitions. 

This patch introduces a new PartitionDesc constructor with an additional 
TableDesc argument. The properties are constructed using 
part.getSchemaFromTableSchema(tblDesc.getProperties()), which builds the 
partition-level properties by first cloning the table-level properties and then 
overwriting them with partition-specific values. Basically, everything except 
the colNames, colTypes and partStrings will be overwritten with the 
partition-level Properties. 
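One way to realize the sharing described here is to let the partition-level `Properties` fall back to the table-level `Properties` rather than copying the large shared entries per partition. This is a hedged sketch of the idea only, not the actual `getSchemaFromTableSchema` implementation (which works by cloning); the property keys are illustrative:

```java
import java.util.Properties;

// Illustrative sketch: share table-level schema entries across partitions
// by using java.util.Properties' defaults table instead of copying them.
public class SharedSchemaSketch {
    public static Properties partitionProps(Properties tableProps, Properties partOverrides) {
        // Lookups that miss in 'p' fall through to tableProps, so the large
        // shared entries (column names/types) exist only once in memory.
        Properties p = new Properties(tableProps);
        for (String key : partOverrides.stringPropertyNames()) {
            p.setProperty(key, partOverrides.getProperty(key)); // partition wins
        }
        return p;
    }
}
```

Partition-specific entries (e.g. the location) override the defaults, while shared entries such as the column list are served from the single table-level object.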

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client side consumes a lot of memory when the number of input 
> partitions is large. One reason is that each partition maintains a list of 
> FieldSchema intended to deal with schema evolution. However, these are not 
> currently used, and Hive uses the table-level schema for all partitions. This 
> will be fixed in HIVE-2050. The memory consumption by this part will be 
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties 
> object maintained in PartitionDesc contains a full list of columns and types. 
> For the same reason, these should be the same as the table-level schema. The 
> deserializer initialization also takes a large amount of memory, which should 
> be avoided. My initial testing of these optimizations cut the memory 
> consumption in half (700MB to 300MB for 20k partitions).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-04-06 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016462#comment-13016462
 ] 

Giridharan Kesavan commented on HIVE-1095:
--

Since everyone is getting a 401 error (I tried with the Hudson CI user as well 
and got the same 401), we need to check with the Nexus admin to see if the Hive 
repo is available and ready for artifact publishing.

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
> HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, HIVE-1095.v5.PATCH, 
> hiveReleasedToMaven.tar.gz, make-maven.log
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.7.0-h0.20 #68

2011-04-06 Thread Apache Hudson Server
See 

--
[...truncated 27007 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying file: 

[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: wrong_distinct1.q
[junit] Hive history 
file=
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.srcpart partition (ds=20

Build failed in Jenkins: Hive-trunk-h0.20 #658

2011-04-06 Thread Apache Hudson Server
See 

--
[...truncated 29985 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-06_12-10-17_251_5995522468155093844/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-04-06 12:10:20,296 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-06_12-10-17_251_5995522468155093844/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-04-06_12-10-21_822_2002639595513382877/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-04-06_12-10-21_822_2002639595513382877/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[ju

[jira] [Updated] (HIVE-2091) Test scripts rcfile_columnar.q and join_filters.q need to be made deterministic in their output

2011-04-06 Thread Roman Shaposhnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HIVE-2091:
---

Attachment: HIVE-2091-trunk.patch

Patch against trunk (the previous one was against the 0.7.0 branch).

> Test scripts rcfile_columnar.q and join_filters.q   need to be made 
> deterministic in their output
> -
>
> Key: HIVE-2091
> URL: https://issues.apache.org/jira/browse/HIVE-2091
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: Roman Shaposhnik
>Priority: Minor
> Attachments: HIVE-2091-trunk.patch, HIVE-2091.patch
>
>
> Currently these 2 query scripts generate non-deterministic output: 
>   * ql/src/test/queries/clientpositive/rcfile_columnar.q
>   * ql/src/test/queries/clientpositive/join_filters.q  
> The suggestion is to use an ORDER BY statement.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: HIVE-2082. Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread namit jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/556/#review396
---


Also, why should the test outputs be different?



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java





- namit


On 2011-04-06 15:05:25, Ning Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/556/
> ---
> 
> (Updated 2011-04-06 15:05:25)
> 
> 
> Review request for hive.
> 
> 
> Summary
> ---
> 
> The major change is to construct PartitionDesc from TableDesc and reuse the 
> column info from the TableDesc. 
> 
> 
> Diffs
> -
> 
>   
> trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 1087411 
>   trunk/ql/src/test/results/clientpositive/combine2.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/merge3.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/pcr.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/sample10.q.out 1087411 
>   trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java 1087411 
> 
> Diff: https://reviews.apache.org/r/556/diff
> 
> 
> Testing
> ---
> 
> passed all unit tests. 
> 
> 
> Thanks,
> 
> Ning
> 
>



[jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016515#comment-13016515
 ] 

Namit Jain commented on HIVE-2082:
--

minor comments in review board

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> Hive client side consumes a lot of memory when the number of input partitions 
> is large. One reason is that each partition maintains a list of FieldSchema 
> objects intended to deal with schema evolution. However, they are not used 
> currently, and Hive uses the table-level schema for all partitions. This will 
> be fixed in HIVE-2050. The memory consumption by this part will be reduced by 
> almost half (1.2GB to 700MB for 20k partitions). 
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties object 
> is maintained in PartitionDesc that contains a full list of columns and 
> types. For the same reason, these should be the same as in the table-level 
> schema. Also, the deserializer initialization takes a large amount of memory, 
> which should be avoided. My initial testing of these optimizations cut the 
> memory consumption in half (700MB to 300MB for 20k partitions). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2082:
-

Status: Open  (was: Patch Available)

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> Hive client side consumes a lot of memory when the number of input partitions 
> is large. One reason is that each partition maintains a list of FieldSchema 
> objects intended to deal with schema evolution. However, they are not used 
> currently, and Hive uses the table-level schema for all partitions. This will 
> be fixed in HIVE-2050. The memory consumption by this part will be reduced by 
> almost half (1.2GB to 700MB for 20k partitions). 
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties object 
> is maintained in PartitionDesc that contains a full list of columns and 
> types. For the same reason, these should be the same as in the table-level 
> schema. Also, the deserializer initialization takes a large amount of memory, 
> which should be avoided. My initial testing of these optimizations cut the 
> memory consumption in half (700MB to 300MB for 20k partitions). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016518#comment-13016518
 ] 

Namit Jain commented on HIVE-2068:
--

Can you update the Review Board entry?

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.
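The idea in the description above can be sketched as follows. This is a hypothetical illustration only, not the actual HIVE-2068 patch: filesNeeded() and the average-row-size estimate are invented for this sketch.

```java
// Hypothetical sketch of pruning input files for "SELECT ... LIMIT n"
// when there is no filtering or aggregation. Not the actual HIVE-2068
// patch: filesNeeded() and the row-size estimate are invented here.
public class Main {

    // Walk the input files in order, estimating rows per file from its
    // byte size, and stop once the estimate covers the requested limit.
    static int filesNeeded(long[] fileSizesBytes, long avgRowSizeBytes, long limit) {
        long estimatedRows = 0;
        int files = 0;
        for (long size : fileSizesBytes) {
            estimatedRows += size / avgRowSizeBytes;
            files++;
            if (estimatedRows >= limit) {
                break; // enough rows (by estimate) to satisfy the LIMIT
            }
        }
        // If the estimate was too optimistic, a real implementation must
        // fall back to reading more files until the limit is met.
        return files;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 1000, 1000};               // three 1 KB files
        System.out.println(filesNeeded(sizes, 10, 150)); // prints 2
    }
}
```

Two files give roughly 200 estimated rows, which covers LIMIT 150, so the third file would never be scanned.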

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2068:
-

Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016535#comment-13016535
 ] 

Siying Dong commented on HIVE-2068:
---

Review Board updated.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: HIVE-2082. Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Ning Zhang


> On 2011-04-06 20:36:23, namit jain wrote:
> > Also, why should the test outputs be different?
> >

The previous test outputs were wrong due to a bug: previously, partition 
properties were set from the table-level properties, whereas now we use the 
partition-level properties. If you look at one of the plan diffs (e.g., 
combine2.q.out), the partition-level stats were previously the same as the 
table-level stats; now they differ from partition to partition. 


- Ning


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/556/#review396
---


On 2011-04-06 15:05:25, Ning Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/556/
> ---
> 
> (Updated 2011-04-06 15:05:25)
> 
> 
> Review request for hive.
> 
> 
> Summary
> ---
> 
> The major change is to construct PartitionDesc from TableDesc and reuse the 
> column info from the TableDesc. 
> 
> 
> Diffs
> -
> 
>   
> trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 1087411 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 1087411 
>   trunk/ql/src/test/results/clientpositive/combine2.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/merge3.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/pcr.q.out 1087411 
>   trunk/ql/src/test/results/clientpositive/sample10.q.out 1087411 
>   trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java 1087411 
> 
> Diff: https://reviews.apache.org/r/556/diff
> 
> 
> Testing
> ---
> 
> passed all unit tests. 
> 
> 
> Thanks,
> 
> Ning
> 
>



[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2068:
-

Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-04-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016590#comment-13016590
 ] 

Namit Jain commented on HIVE-2082:
--

OK

+1

> Reduce memory consumption in preparing MapReduce job
> 
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> Hive client side consumes a lot of memory when the number of input partitions 
> is large. One reason is that each partition maintains a list of FieldSchema 
> objects intended to deal with schema evolution. However, they are not used 
> currently, and Hive uses the table-level schema for all partitions. This will 
> be fixed in HIVE-2050. The memory consumption by this part will be reduced by 
> almost half (1.2GB to 700MB for 20k partitions). 
> Another large chunk of memory is consumed in the MapReduce job setup phase, 
> when a PartitionDesc is created from each Partition object. A Properties object 
> is maintained in PartitionDesc that contains a full list of columns and 
> types. For the same reason, these should be the same as in the table-level 
> schema. Also, the deserializer initialization takes a large amount of memory, 
> which should be avoided. My initial testing of these optimizations cut the 
> memory consumption in half (700MB to 300MB for 20k partitions). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016597#comment-13016597
 ] 

Namit Jain commented on HIVE-2068:
--

Siying, I don't see the new changes.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

2011-04-06 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016604#comment-13016604
 ] 

Siying Dong commented on HIVE-2068:
---

Namit, can you not see that trunk/conf/hive-default.xml is already included in 
the Review Board diff?

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or 
> aggregation
> ---
>
> Key: HIVE-2068
> URL: https://issues.apache.org/jira/browse/HIVE-2068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, 
> HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT 
> xxx" will start a MapReduce job with the whole table or partition as input. 
> The latency can be huge if the table or partition is big. We could reduce the 
> number of input files to speed up these queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-04-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016667#comment-13016667
 ] 

Amareshwari Sriramadasu commented on HIVE-1095:
---

I created an INFRA ticket for nexus access: 
https://issues.apache.org/jira/browse/INFRA-3567.

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
> HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, HIVE-1095.v5.PATCH, 
> hiveReleasedToMaven.tar.gz, make-maven.log
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2093) inputs and outputs should be populated for create/drop database

2011-04-06 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2093:
--

Attachment: HIVE.2093.1.patch

1. ReadEntity and WriteEntity now support a database as an entity.
2. Acquire locks for databases in inputs and outputs.
3. If a database is in outputs, the user must have the All privilege to execute the query.
4. Fix a small bug in the security code where querying user-level privileges 
always returned null.

> inputs and outputs should be populated for create/drop database
> ---
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE.2093.1.patch
>
>
> This is needed for many other things to work: concurrency, authorization, etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2093) inputs and outputs should be populated for create/drop database

2011-04-06 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2093:
--

Status: Patch Available  (was: Open)

> inputs and outputs should be populated for create/drop database
> ---
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE.2093.1.patch
>
>
> This is needed for many other things to work: concurrency, authorization, etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


How To Use Hive

2011-04-06 Thread komara nagarjuna
 Sir,

 I am new to Hadoop and Hive. I am now developing an application on Hadoop
with Hive in a multi-node cluster. I installed and ran Hadoop successfully on
the master and slave machines. Hive is installed on the master machine, and it
also connects to the database through MySQL.

 In Hive, I created a table successfully. My problem is how to insert data
into Hive tables, how Hadoop communicates with Hive, how to use the Hive data
warehouse, and what the purpose of Hive is.

 Please explain how to use Hive in real time.


*Thanks & Regards*,
*Nagarjuna komara.*


[jira] [Commented] (HIVE-1612) Cannot build hive for hadoop 0.21.0

2011-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016682#comment-13016682
 ] 

Juho Mäkinen commented on HIVE-1612:


Is this still valid now that 0.7.0 has been released?

> Cannot build hive for hadoop 0.21.0
> ---
>
> Key: HIVE-1612
> URL: https://issues.apache.org/jira/browse/HIVE-1612
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: AJ Pahl
> Attachments: HIVE-1612-for-r1043843.patch, 
> HIVE-1612-for-r1043843.patch, HIVE-1612.patch
>
>
> Current trunk for 0.7.0 does not support building HIVE against the Hadoop 
> 0.21.0 release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1612) Cannot build hive for hadoop 0.21.0

2011-04-06 Thread Bochun Bai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016690#comment-13016690
 ] 

Bochun Bai commented on HIVE-1612:
--

Juho, trunk (v1089737) still does not compile against hadoop-core-0.21.0, and 
there are no notes about 0.21 support in the RELEASE-NOTE.

> Cannot build hive for hadoop 0.21.0
> ---
>
> Key: HIVE-1612
> URL: https://issues.apache.org/jira/browse/HIVE-1612
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: AJ Pahl
> Attachments: HIVE-1612-for-r1043843.patch, 
> HIVE-1612-for-r1043843.patch, HIVE-1612.patch
>
>
> Current trunk for 0.7.0 does not support building HIVE against the Hadoop 
> 0.21.0 release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes

2011-04-06 Thread Russell Melick

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

Review request for hive.


Summary
---

Review request for HIVE-1644.12.patch


This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644


Diffs
-

  ql/src/test/results/clientpositive/index_opt_where_simple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/index_opt_where.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 
  ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_opt_where_partitioned.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_opt_where_simple.q PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca84 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b 
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
1f01446 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 
0ae9fa2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589 
  conf/hive-default.xml c42197f 

Diff: https://reviews.apache.org/r/558/diff


Testing
---


Thanks,

Russell



[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-06 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Attachment: HIVE-1644.12.patch

Includes a check to make sure that all partitions referenced in the query exist in the index 
table.

Review board available at https://reviews.apache.org/r/558/

> use filter pushdown for automatically accessing indexes
> ---
>
> Key: HIVE-1644
> URL: https://issues.apache.org/jira/browse/HIVE-1644
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
> HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
> HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
> HIVE-1644.8.patch, HIVE-1644.9.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed 
> down to a table scan.  The next step is to use these for selecting available 
> indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-06 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016702#comment-13016702
 ] 

jirapos...@reviews.apache.org commented on HIVE-1644:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

Review request for hive.


Summary
---

Review request for HIVE-1644.12.patch


This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644


Diffs
-

  ql/src/test/results/clientpositive/index_opt_where_simple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/index_opt_where.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 
  ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_opt_where_partitioned.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_opt_where_simple.q PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca84 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b 
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
1f01446 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 
0ae9fa2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589 
  conf/hive-default.xml c42197f 

Diff: https://reviews.apache.org/r/558/diff


Testing
---


Thanks,

Russell



> use filter pushdown for automatically accessing indexes
> ---
>
> Key: HIVE-1644
> URL: https://issues.apache.org/jira/browse/HIVE-1644
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.8.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
> HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
> HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
> HIVE-1644.8.patch, HIVE-1644.9.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed 
> down to a table scan.  The next step is to use these for selecting available 
> indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-06 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Affects Version/s: (was: 0.7.0)
   0.8.0
 Release Note: Enables automatic use of compact indexes when 
hive.optimize.autoindex=true is set
   Status: Patch Available  (was: In Progress)
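
The release note above can be exercised with a short HiveQL session. This is an illustrative sketch only: the table name (sales), column (sale_date), and index name (sales_idx) are hypothetical and not taken from the patch; only the handler class and the hive.optimize.autoindex property come from the patch itself.

```sql
-- Build a compact index (table/index names here are hypothetical examples).
CREATE INDEX sales_idx ON TABLE sales (sale_date)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;
ALTER INDEX sales_idx ON sales REBUILD;

-- Enable the automatic index optimization introduced by this patch.
SET hive.optimize.autoindex=true;

-- With the flag set, the optimizer may rewrite this filtered scan
-- to consult the compact index instead of scanning the full table.
SELECT * FROM sales WHERE sale_date = '2011-04-06';
```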

> use filter pushdown for automatically accessing indexes
> ---
>
> Key: HIVE-1644
> URL: https://issues.apache.org/jira/browse/HIVE-1644
> Project: Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.8.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
> HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
> HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
> HIVE-1644.8.patch, HIVE-1644.9.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed 
> down to a table scan.  The next step is to use these for selecting available 
> indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira