[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550379#comment-13550379
 ] 

He Yongqiang commented on HIVE-3874:


I want to list a few reasons why I think the ORC solution is the much more 
appealing one.

1. For a BIG data warehouse that stores more than 90% of its existing data in 
RCFile (like FB's >100PB warehouse), converting data from one format to another 
is something that definitely should be avoided. It is possible to convert some 
tables if there is a big space-saving advantage. But managing two distinct 
formats that have no compatibility or interoperability, and that even live in 
two different code repositories, is another big headache that we would want to 
avoid in the first place.
2. Developing the new ORC format in the hive/hcatalog codebase will make Hive 
development/operations much easier.
3. Giving the new ORC format some backward compatibility with RCFile will 
save a lot of trouble.




> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file
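
To make the row-count and light-weight-index limitations quoted above concrete, here is a minimal sketch of what per-row-group statistics would allow. It is written in Python purely for illustration and is not the ORC design: `RowGroupStats`, `build_index`, `groups_to_read`, and `seek_row` are hypothetical names, and a single integer column with fixed-size row groups is assumed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RowGroupStats:
    """Hypothetical index entry for one row group of one integer column."""
    first_row: int   # row number of the first row in the group
    num_rows: int    # number of rows stored in the group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def build_index(values: Sequence[int], rows_per_group: int) -> List[RowGroupStats]:
    """Split a column into fixed-size row groups and record stats for each."""
    index = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        index.append(RowGroupStats(start, len(chunk), min(chunk), max(chunk)))
    return index


def groups_to_read(index: List[RowGroupStats], lo: int, hi: int) -> List[RowGroupStats]:
    """Push-down filter lo <= x <= hi: keep only groups whose range can match."""
    return [g for g in index if g.max_value >= lo and g.min_value <= hi]


def seek_row(index: List[RowGroupStats], row_number: int) -> RowGroupStats:
    """Locate the row group holding row_number using the stored row counts."""
    for g in index:
        if g.first_row <= row_number < g.first_row + g.num_rows:
            return g
    raise IndexError(row_number)
```

With such an index a reader can skip every row group returned as non-matching without decompressing it, and an external index can address rows by number. Without stored row counts and statistics, neither is possible short of scanning the file.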

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550347#comment-13550347
 ] 

He Yongqiang commented on HIVE-3585:


bq. This patch is going to share 90% of its small code with the existing 
AvroSerde that was never shunted into contrib. 

Then why is it so hard to make it part of the existing AvroSerde?

bq. I'm not seeing any technical reasons to block progress. 
Technically, there is no issue. I am pretty sure this can be done well 
from a technical standpoint.

bq. Is anyone planning on exercising a -1?

I have listed two options that I insist on: one is to develop it as part of the 
existing AvroSerde; the other is to put it in contrib or a 3rd-party lib (maybe 
GitHub?).



> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems to be a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549870#comment-13549870
 ] 

He Yongqiang commented on HIVE-3874:


bq. It would be possible to extend the RCFile reader to recognize an ORC file 
and to have it delegate to the ORC File reader.
It would be great to have this support. In that case, what is the file format 
recorded for the partition/table: rcfile or orcfile?

When we converted old data from sequencefile to rcfile a long time ago, it was 
a big headache to handle errors like "unrecognized fileformat or corruption", 
because there is no interoperability between these two formats. Most of the 
errors we saw occurred because the table/partition format did not match the 
actual data format.

Two examples:
1. An old partition's data is in rcfile format, and a new partition's data is in orc format. 
2. Within one partition, some files are rcfile and some files are in orc format.
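
A rough sketch of the kind of check that would catch both examples: look at the leading bytes of each data file and compare the detected format with the one declared for the table/partition. This is illustrative Python, not Hive code. The three-byte prefixes in `MAGIC` are assumptions that should be verified against the actual format definitions (older RCFile data, for instance, may carry a SequenceFile-style header), and `sniff_format` and `find_mismatches` are hypothetical helpers.

```python
from typing import Dict, List

# Assumed magic prefixes, for illustration only; verify against the real specs.
MAGIC = {
    b"ORC": "orc",
    b"RCF": "rcfile",
    b"SEQ": "sequencefile",
}


def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first three bytes."""
    return MAGIC.get(header[:3], "unknown")


def find_mismatches(declared: str, headers: Dict[str, bytes]) -> List[str]:
    """Return names of files whose sniffed format differs from the declared one.

    `headers` maps a file name to the first bytes read from that file.
    """
    return sorted(name for name, head in headers.items()
                  if sniff_format(head) != declared)
```

A reader that delegates on the sniffed format, as proposed in the quoted comment, would use the same detection step. The check above only reports files that would otherwise fail with an "unrecognized fileformat" error.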





[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549431#comment-13549431
 ] 

He Yongqiang commented on HIVE-3874:


That should work. I just want to make sure they have a similar API, so that 
other tools/utilities will work automatically or need only small changes. One 
example is the block merger.  



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-09 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549410#comment-13549410
 ] 

He Yongqiang commented on HIVE-3874:


Will this optimized format support backward compatibility? If it is backward 
compatible, it will be easier to deploy. A new format without backward 
compatibility is really a headache, especially when you need to convert 
old data. 



[jira] [Comment Edited] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang edited comment on HIVE-3585 at 1/7/13 10:40 PM:
-

contrib is a good place for any project that is not mature. There are so many 
custom data formats out there that it does not make sense to support all of 
them in the core Hive code base. contrib is a good place for them to grow. 

From http://incubator.apache.org/hcatalog/docs/r0.4.0/, another good place I 
can think of is the hcatalog project. But I don't know whether hcatalog itself 
includes custom data format support.

  was (Author: he yongqiang):
contrib is a good place for any projects that is not mature. There are so 
many custom data formats out there, it does not make sense to support all of 
them in core hive code base. contrib is a good place for them to grow. 

Another good place i can think of is the hcatalog project.  

  


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang commented on HIVE-3585:


contrib is a good place for any project that is not mature. There are so many 
custom data formats out there that it does not make sense to support all of 
them in the core Hive code base. contrib is a good place for them to grow. 

Another good place I can think of is the hcatalog project.  




[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546152#comment-13546152
 ] 

He Yongqiang commented on HIVE-3585:


HBaseSerde was first added to contrib and then moved to core later.
  
bq. Pig is adding TrevniStorage as a builtin, and interoperability is desired.
I think interoperability is not a problem no matter where the code resides.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544382#comment-13544382
 ] 

He Yongqiang commented on HIVE-3585:


So far I am still not convinced that it should be another built-in serde in 
Hive's core codebase. We initially put some new serdes in contrib or 3rd-party 
libs; examples include HBaseSerde and the Zebra serde.

If you can make it work with the existing Avro serde, that would also be great.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543413#comment-13543413
 ] 

He Yongqiang commented on HIVE-3585:


I do not understand why it does not work with partition schema updates. 



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543391#comment-13543391
 ] 

He Yongqiang commented on HIVE-3585:


Thanks for reminding me that there is already an Avro serde. Have you tried 
making the required changes part of the existing Avro serde instead of 
creating a new one?



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543360#comment-13543360
 ] 

He Yongqiang commented on HIVE-3585:


@jakob, awesome to hear you are planning to own its maintenance. I have no 
intention of complicating your use case here, but I think a 3rd-party lib or 
the contrib folder would be a good start and won't affect your usage. If I 
remember correctly, we used to do similar things for Pig's Zebra.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543294#comment-13543294
 ] 

He Yongqiang commented on HIVE-3585:


@Carl, adding code that is not much used never does harm, except for a lot of 
maintenance and documentation pain. You can first go with a contrib folder or a 
3rd-party lib and merge it into core Hive later if it proves successful. 



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543253#comment-13543253
 ] 

He Yongqiang commented on HIVE-3585:


@jakob, you can always implement a reader for a custom data format in a 
3rd-party lib and let Hive load it from there.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496697#comment-13496697
 ] 

He Yongqiang commented on HIVE-2206:


Okay, I will aim to commit it this weekend or early next week.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence by sentence fashion, for every 
> operation which may need to shuffle the data (e.g. join and aggregation 
> operations), Hive will generate a MapReduce job for that operation. However, 
> for those operations which may need to shuffle the data, they may involve 
> correlations explained below and thus can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
> its child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized 
> if it satisfies the following conditions:
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> Correlation optimizer is implemented as a logical optimizer. The main reasons 
> are that it only needs to manipulate the query plan tree and it can leverage 
> the existing component on generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations. I think that it is better than adding individual optimizers. 
> Several things can be done in the future to improve this optimizer. 
> Here are three examples:
> # Support queries that involve only TC;
> # Support queries in which the input tables of correlated MR jobs include 
> intermediate tables; and 
> # Optimize queries involving self joins. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
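
The three correlation definitions quoted above can be sketched as simple predicates over a toy job model. This is illustrative Python under stated assumptions, not the optimizer's implementation (which operates on Hive's operator tree): `MRJob` and the three helper functions are hypothetical, each job is reduced to its set of input relations, its partition key, and the child jobs it consumes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class MRJob:
    """Hypothetical, simplified model of one MapReduce job in a query plan."""
    name: str
    inputs: FrozenSet[str]               # input relations read by the job
    partition_key: Tuple[str, ...]       # key used to shuffle the data
    children: Tuple["MRJob", ...] = ()   # jobs whose output this job consumes


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    """IC: the input relation sets of the two jobs are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def job_flow_correlated_children(job: MRJob) -> List[MRJob]:
    """JFC: child jobs that share the job's partition key."""
    return [c for c in job.children if c.partition_key == job.partition_key]
```

In this model, a job whose children are all job-flow correlated with it is a candidate for being merged with them into a single MR job, since the data is already partitioned the way the job needs it.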



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496546#comment-13496546
 ] 

He Yongqiang commented on HIVE-3585:


Although it is very similar to RCFile, I did not see any reference to RCFile in 
its docs. I assume that helps avoid confusion for its own users. But within 
Hive, if we have two formats that are so similar to each other, the confusion 
will be passed on to all Hive users.

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Jakob Homan
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems to be a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496537#comment-13496537
 ] 

He Yongqiang commented on HIVE-3585:


Yeah, I read some of its docs, but I really did not see a big difference. Some 
features can be added to RCFile easily. Please point out if you think there is 
a dramatic difference in some part of the design. 



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496529#comment-13496529
 ] 

He Yongqiang commented on HIVE-2206:


@Carl, keep in mind that you have already had months to comment. So maybe 
addressing your comments in new JIRAs will make more sense.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, it generates 
> a MapReduce job for every operation that may need to shuffle the data (e.g. 
> join and aggregation operations). However, such operations may involve the 
> correlations explained below, in which case they can be executed in a single 
> MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
> its child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized 
> if it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> The correlation optimizer is implemented as a logical optimizer. The main 
> reasons are that it only needs to manipulate the query plan tree and it can 
> leverage the existing component for generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations. I think that it is better than adding individual optimizers. 
> Several pieces of work can be done in the future to improve this optimizer. 
> Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which the input tables of correlated MR jobs involve 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
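
The three correlation definitions quoted above can be sketched roughly as follows. This is only an illustration in Python, not the code in the attached patches: the `MRJob` class and its fields (`inputs`, `partition_key`, `children`) are hypothetical names made up for this example, and the table and column names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Hypothetical stand-in for one MapReduce job in a query plan."""
    name: str
    inputs: frozenset       # names of the input relations the job reads
    partition_key: tuple    # columns used to shuffle the job's data
    children: list = field(default_factory=list)  # child nodes in the plan tree


def has_input_correlation(a, b):
    # IC: the input relation sets are not disjoint.
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a, b):
    # TC: input correlation plus the same partition key.
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job, child):
    # JFC: the job has the same partition key as one of its child nodes.
    return child in job.children and job.partition_key == child.partition_key


agg = MRJob("agg", frozenset({"lineitem"}), ("l_orderkey",))
join = MRJob("join", frozenset({"lineitem", "orders"}), ("l_orderkey",), children=[agg])
other = MRJob("other", frozenset({"part"}), ("p_partkey",))

print(has_input_correlation(agg, join))    # True: both read "lineitem"
print(has_transit_correlation(agg, join))  # True: shared input and same key
print(has_job_flow_correlation(join, agg)) # True: agg is a child with the same key
print(has_input_correlation(agg, other))   # False: disjoint inputs
```

Under these definitions, the `agg` and `join` jobs in the example would be candidates for merging into one MR job, while `other` would not.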

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496528#comment-13496528
 ] 

He Yongqiang commented on HIVE-2206:


@Carl, you can go ahead and comment; Huai will address the comments in a separate diff. 

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496399#comment-13496399
 ] 

He Yongqiang commented on HIVE-2206:


+1, I will commit after tests pass.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2012-11-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496389#comment-13496389
 ] 

He Yongqiang commented on HIVE-3585:


I vote -1.

I do not see any benefit in adding a format that is just a copycat of RCFile.

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Jakob Homan
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe, and fastutil seems to be a good choice. The 
> Shark project uses the fastutil library as its columnar SerDe library, but it 
> seems too large (almost 15 MB) for just a few primitive array collections.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-10-01 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466937#comment-13466937
 ] 

He Yongqiang commented on HIVE-2206:


I will be on vacation this whole week. Given that this is a very big diff, I 
will keep this open for another week or two for more comments. 


> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466625#comment-13466625
 ] 

He Yongqiang commented on HIVE-2206:


@Carl, I just reverted. I will commit again tomorrow.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466584#comment-13466584
 ] 

He Yongqiang commented on HIVE-2206:


I did not see a 24-hour waiting period on the bylaws page?

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466581#comment-13466581
 ] 

He Yongqiang commented on HIVE-2206:


@Carl, btw, I did mention a few times in the comments that I was planning to 
commit this one.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466580#comment-13466580
 ] 

He Yongqiang commented on HIVE-2206:


I commented that all tests passed.

ok, +1.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2206:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed. Thanks for the hard work, Yin Huai!

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466552#comment-13466552
 ] 

He Yongqiang commented on HIVE-2206:


All tests passed for me.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-09-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457484#comment-13457484
 ] 

He Yongqiang commented on HIVE-2206:


The current patch looks ok. 
@Carl, please give more specific comments. 

We should agree that big new features should not be enabled by default. 
That's too risky. 

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-07-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424315#comment-13424315
 ] 

He Yongqiang commented on HIVE-2206:


For the last few months (almost one year), Yin has been actively maintaining 
this patch, and I think it is in a very good state to check into trunk. 

So I will do a final review, and I hope to commit it sometime next month. 
Please feel free to jump in, review the patch, and put any comments here 
before the commit.

In the final review, I will make sure this patch does not make big changes to 
the existing execution path, so that it can simply be disabled like other 
optimizations in Hive. Yin will continue to actively maintain this patch (help 
fix bugs, etc.) after the commit. 

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, 
> HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
> HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> YSmartPatchForHive.patch, testQueries.2.q
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf





[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

2012-07-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420466#comment-13420466
 ] 

He Yongqiang commented on HIVE-2845:


With HIVE-1644, this should already be done. Have you looked at the query plan, 
or at the patch for HIVE-1644? Maybe HIVE-1644 does not process join cases 
(but the code is already there). The filter needs to be pushed down to the 
mapper to trigger the auto index.


> Add support for index joins in Hive
> ---
>
> Key: HIVE-2845
> URL: https://issues.apache.org/jira/browse/HIVE-2845
> Project: Hive
>  Issue Type: New Feature
>  Components: Indexing, Query Processor
>Reporter: Namit Jain
>  Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of 
> A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the 
> corresponding row in A can be performed.
> This can be very useful for some usecases.
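
The index-based join described in the issue can be sketched as an index nested-loop join. The Python below is a toy illustration only, not Hive code: `build_index` and `index_join` are hypothetical helpers, and an in-memory dictionary stands in for a real Hive index on the join key of A.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each join-key value to the rows of A that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_join(index_on_a, table_b, key):
    """Scan B once; for each row, look up the matching rows of A in the index."""
    for b_row in table_b:
        for a_row in index_on_a.get(b_row[key], []):
            yield {**a_row, **b_row}


table_a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
table_b = [{"id": 2, "qty": 5}, {"id": 3, "qty": 7}]
joined = list(index_join(build_index(table_a, "id"), table_b, "id"))
print(joined)  # [{'id': 2, 'name': 'y', 'qty': 5}]
```

Only B is scanned in full; A is touched through lookups, which is where the benefit would come from when B is much smaller than A.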





[jira] [Commented] (HIVE-3086) Skewed Join Optimization

2012-06-26 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401960#comment-13401960
 ] 

He Yongqiang commented on HIVE-3086:


Hints supplied by the user have not proven very useful. Automatically detecting 
skewed keys, as the current skew join processor does, will make this more 
powerful and useful.

@Nadeem, can you add more details to the wiki about the differences between the 
existing one and the one you are working on? The current one cannot process 
the case where the same join key is skewed in more than one table. Are you 
targeting those cases? Also, there are some problems with the existing skew join 
optimization; can you try to fix those as part of your project?

> Skewed Join Optimization
> 
>
> Key: HIVE-3086
> URL: https://issues.apache.org/jira/browse/HIVE-3086
> Project: Hive
>  Issue Type: New Feature
>Reporter: Nadeem Moidu
>Assignee: Nadeem Moidu
>
> During a join operation, if one of the columns has a skewed key, it can cause 
> that particular reducer to become the bottleneck. The following feature will 
> address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
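
As a rough illustration of automatic skew detection (as opposed to user hints), one simple approach is to count rows per join key and flag the keys that exceed a threshold. The Python below is a sketch under that assumption and does not describe how Hive's skew join processor is implemented; `find_skewed_keys`, the threshold value, and the sample tables are hypothetical.

```python
from collections import Counter


def find_skewed_keys(rows, key, threshold):
    """Return the join-key values that occur more than `threshold` times."""
    counts = Counter(row[key] for row in rows)
    return {value for value, n in counts.items() if n > threshold}


clicks = [{"user": "u1"}] * 5 + [{"user": "u2"}]
views = [{"user": "u1"}] * 4 + [{"user": "u3"}] * 2

# Keys skewed in both inputs are the case raised in the comment above.
skewed_in_both = find_skewed_keys(clicks, "user", 3) & find_skewed_keys(views, "user", 3)
print(skewed_in_both)  # {'u1'}
```

Rows whose key is in the skewed set could then be handled separately from the rest, so that no single reducer receives all of them.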





[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager

2011-09-26 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2450:
---

Attachment: HIVE-2450.3.patch

Addressed John's comments. I agree with refactoring the retry logic out, but I 
think we can do it later, when we add the second lock manager (even if we do it 
now, it will need to change later).

> move lock retry logic into ZooKeeperHiveLockManager
> ---
>
> Key: HIVE-2450
> URL: https://issues.apache.org/jira/browse/HIVE-2450
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2450.1.patch, HIVE-2450.2.patch, HIVE-2450.3.patch
>
>
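
For readers unfamiliar with the idea, lock retry logic of the kind discussed here usually amounts to a bounded loop with a pause between attempts. The Python below is a generic sketch, not the code in HIVE-2450.3.patch: `acquire_with_retry`, `try_lock`, `num_retries`, and `sleep_seconds` are hypothetical names chosen for this example.

```python
import time


def acquire_with_retry(try_lock, num_retries, sleep_seconds, sleep=time.sleep):
    """Call `try_lock` until it returns a lock or the retries run out.

    `try_lock` is a hypothetical callable that returns a lock object on
    success and None when the lock is currently held by someone else.
    """
    for attempt in range(num_retries + 1):
        lock = try_lock()
        if lock is not None:
            return lock
        if attempt < num_retries:
            sleep(sleep_seconds)  # back off before the next attempt
    return None


attempts = iter([None, None, "lock-handle"])
result = acquire_with_retry(lambda: next(attempts), num_retries=3, sleep_seconds=0)
print(result)  # lock-handle
```

Keeping the loop inside the lock manager means callers only see "got the lock" or "gave up", which is the kind of encapsulation the issue title suggests.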






[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager

2011-09-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2450:
---

Attachment: HIVE-2450.2.patch

> move lock retry logic into ZooKeeperHiveLockManager
> ---
>
> Key: HIVE-2450
> URL: https://issues.apache.org/jira/browse/HIVE-2450
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2450.1.patch, HIVE-2450.2.patch
>
>






[jira] [Resolved] (HIVE-2464) report progress in MapOperator

2011-09-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2464.


Resolution: Won't Fix

> report progress in MapOperator
> --
>
> Key: HIVE-2464
> URL: https://issues.apache.org/jira/browse/HIVE-2464
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>






[jira] [Resolved] (HIVE-2461) Add method to PerfLogger to perform cleanup/final steps.

2011-09-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2461.


Resolution: Fixed

committed, thanks Kevin Wilfong!

> Add method to PerfLogger to perform cleanup/final steps.
> 
>
> Key: HIVE-2461
> URL: https://issues.apache.org/jira/browse/HIVE-2461
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2461.1.patch.txt, HIVE-2461.2.patch.txt
>
>
> I think a method added to PerfLogger to perform cleanup/final steps would be 
> very useful.  For example, it could be used to close any database connections 
> created as part of a PerfLogger subclass, or to perform logging that requires 
> all perf values to first be calculated.
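
The cleanup hook proposed in the issue can be pictured as follows. This is a Python sketch of the idea only, not Hive's Java `PerfLogger`: the class names, the `record` and `close` methods, and the use of an in-memory SQLite database are all assumptions made for this example.

```python
import sqlite3


class PerfLogger:
    """Hypothetical base logger: records durations and exposes a cleanup hook."""

    def __init__(self):
        self.timings = {}

    def record(self, name, millis):
        self.timings[name] = millis

    def close(self):
        """Cleanup/final-step hook; the base class has nothing to release."""


class DbPerfLogger(PerfLogger):
    """Hypothetical subclass that owns a database connection."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.rows_written = 0

    def close(self):
        # Final step: all perf values are known by now, so log them together.
        with self.conn:  # commits the transaction on success
            self.conn.execute("CREATE TABLE perf (name TEXT, millis INTEGER)")
            self.conn.executemany(
                "INSERT INTO perf VALUES (?, ?)", list(self.timings.items())
            )
        self.rows_written = self.conn.execute(
            "SELECT COUNT(*) FROM perf"
        ).fetchone()[0]
        # Cleanup: release the connection created for this logger.
        self.conn.close()


logger = DbPerfLogger(sqlite3.connect(":memory:"))
logger.record("compile", 120)
logger.record("execute", 900)
logger.close()
print(logger.rows_written)  # 2
```

The point of the hook is that the caller invokes `close()` once, at the end, without needing to know whether a given subclass holds resources.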





[jira] [Updated] (HIVE-2462) make INNER a non-reserved keyword

2011-09-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2462:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed, thanks John!

> make INNER a non-reserved keyword
> -
>
> Key: HIVE-2462
> URL: https://issues.apache.org/jira/browse/HIVE-2462
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.9.0
>
> Attachments: HIVE-2462.1.patch
>
>
> HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards 
> compatibility for queries which were using it as an identifier.  This patch 
> addresses that.





[jira] [Created] (HIVE-2464) report progress in MapOperator

2011-09-22 Thread He Yongqiang (JIRA)
report progress in MapOperator
--

 Key: HIVE-2464
 URL: https://issues.apache.org/jira/browse/HIVE-2464
 Project: Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang








[jira] [Commented] (HIVE-2461) Add method to PerfLogger to perform cleanup/final steps.

2011-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113068#comment-13113068
 ] 

He Yongqiang commented on HIVE-2461:


+1, will commit after tests pass

> Add method to PerfLogger to perform cleanup/final steps.
> 
>
> Key: HIVE-2461
> URL: https://issues.apache.org/jira/browse/HIVE-2461
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2461.1.patch.txt, HIVE-2461.2.patch.txt
>
>
> I think a method added to PerfLogger to perform cleanup/final steps would be 
> very useful.  For example, it could be used to close any database connections 
> created as part of a PerfLogger subclass, or to perform logging that requires 
> all perf values to first be calculated.





[jira] [Commented] (HIVE-2462) make INNER a non-reserved keyword

2011-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113027#comment-13113027
 ] 

He Yongqiang commented on HIVE-2462:


The patch looks good, we should have it. Running tests.

> make INNER a non-reserved keyword
> -
>
> Key: HIVE-2462
> URL: https://issues.apache.org/jira/browse/HIVE-2462
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.9.0
>
> Attachments: HIVE-2462.1.patch
>
>
> HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards 
> compatibility for queries which were using it as an identifier.  This patch 
> addresses that.





[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2451:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed, thanks Siying!

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator only appears as 
> the second filter after the table scan, an assumption that HIVE-1538 breaks 
> even when that feature is not turned on.
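
For readers unfamiliar with input pruning, here is a rough sketch of what it buys. It follows the behaviour described in the Hive sampling documentation (for a table with 32 buckets, BUCKET 3 OUT OF 16 reads the 3rd and 19th buckets) and assumes the sample is taken on the table's clustering column. The function name `buckets_to_read` is hypothetical, and this is not Hive's actual pruner code.

```python
def buckets_to_read(x, y, num_buckets):
    """Return the 0-based bucket files needed for TABLESAMPLE(BUCKET x OUT OF y).

    Returns None when the bucket counts do not divide evenly, in which case
    no pruning is possible and every file has to be scanned.
    """
    if not 1 <= x <= y:
        raise ValueError("need 1 <= x <= y")
    if y >= num_buckets and y % num_buckets == 0:
        # The sample is a slice of one bucket file; the rows still need filtering.
        return [(x - 1) % num_buckets]
    if y < num_buckets and num_buckets % y == 0:
        # The sample is made up of several whole bucket files.
        return [i for i in range(num_buckets) if i % y == x - 1]
    return None
```

When pruning is not triggered, the query still returns the right rows, but only after reading all bucket files instead of the few listed here.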





[jira] [Resolved] (HIVE-2456) JDBCStatsAggregator DELETE STATEMENT should escape _ and %

2011-09-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2456.


Resolution: Fixed

committed, thanks Ning!

> JDBCStatsAggregator DELETE STATEMENT should escape _ and %
> --
>
> Key: HIVE-2456
> URL: https://issues.apache.org/jira/browse/HIVE-2456
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2456.patch
>
>
> JDBCStatsAggregator first aggregates stats from all publishers, and then 
> deletes these intermediate results. The delete uses the LIKE operator, so it 
> needs to escape '_' and '%'.
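
To illustrate why the escaping matters, here is a small sketch using Python's sqlite3 module as a stand-in database. The table name `stats`, the column `id`, the backslash escape character, and both function names are assumptions made for the example; this is not the JDBCStatsAggregator code.

```python
import sqlite3

ESCAPE_CHAR = "\\"


def escape_like(value):
    """Escape LIKE wildcards (and the escape character itself) in value."""
    return (value.replace(ESCAPE_CHAR, ESCAPE_CHAR * 2)
                 .replace("%", ESCAPE_CHAR + "%")
                 .replace("_", ESCAPE_CHAR + "_"))


def delete_by_prefix(conn, prefix):
    """Delete rows whose id starts with prefix, matching the prefix literally."""
    pattern = escape_like(prefix) + "%"
    cur = conn.execute(
        "DELETE FROM stats WHERE id LIKE ? ESCAPE '\\'", (pattern,))
    return cur.rowcount
```

Without the escaping, a prefix such as `job_1/` would also match `jobX1/...`, because `_` matches any single character, and the delete would remove rows that belong to another job.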





[jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging

2011-09-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108156#comment-13108156
 ] 

He Yongqiang commented on HIVE-2453:


What I mean is: should we tag the Hadoop job, the query, or both? The above 
example has 2 jobs: the first one is a join, and the second a group by.
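
One possible reading of "both", sketched in Python with hypothetical names (`Job`, `Query`, and the tag strings are not from the patch): tag each job individually and derive the query-level tags from its jobs.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One Hadoop job of a query, with the tags that describe it."""
    name: str
    tags: set = field(default_factory=set)


@dataclass
class Query:
    """A query and the jobs it was compiled into."""
    text: str
    jobs: list = field(default_factory=list)

    @property
    def tags(self):
        """Query-level tags are the union of the tags of its jobs."""
        result = set()
        for job in self.jobs:
            result |= job.tags
        return result
```

With this shape, a hook can log per-job tags for job-level metrics and the union for query-level metrics.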

> Need a way to categorize queries in hooks for improved logging
> --
>
> Key: HIVE-2453
> URL: https://issues.apache.org/jira/browse/HIVE-2453
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt
>
>
> We need a way to categorize queries, such as whether or not they include a 
> join clause, a group by clause, etc., in the hooks.  This will allow for 
> better performance logging.
> Currently the only way I can find is to go through the operators in the 
> tasks, but which operators are used for the different types of queries may 
> change over time.





[jira] [Commented] (HIVE-2456) JDBCStatsAggregator DELETE STATEMENT should escape _ and %

2011-09-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108025#comment-13108025
 ] 

He Yongqiang commented on HIVE-2456:


+1, will commit after tests pass

> JDBCStatsAggregator DELETE STATEMENT should escape _ and %
> --
>
> Key: HIVE-2456
> URL: https://issues.apache.org/jira/browse/HIVE-2456
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2456.patch
>
>
> JDBCStatsAggregator first aggregates stats from all publishers, and then 
> deletes these intermediate results. The delete uses the LIKE operator, so it 
> needs to escape '_' and '%'.





[jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging

2011-09-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108013#comment-13108013
 ] 

He Yongqiang commented on HIVE-2453:


I haven't looked at the change, just have a small question: for a query like 
"select key, count(1) from (select a.key as key, b.value as value from src a 
join src b on a.key=b.key) group by key", what tag will this query get?

> Need a way to categorize queries in hooks for improved logging
> --
>
> Key: HIVE-2453
> URL: https://issues.apache.org/jira/browse/HIVE-2453
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt
>
>
> We need a way to categorize queries, such as whether or not they include a 
> join clause, a group by clause, etc., in the hooks.  This will allow for 
> better performance logging.
> Currently the only way I can find is to go through the operators in the 
> tasks, but which operators are used for the different types of queries may 
> change over time.





[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager

2011-09-18 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2450:
---

Attachment: HIVE-2450.1.patch

> move lock retry logic into ZooKeeperHiveLockManager
> ---
>
> Key: HIVE-2450
> URL: https://issues.apache.org/jira/browse/HIVE-2450
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2450.1.patch
>
>






[jira] [Commented] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538

2011-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105782#comment-13105782
 ] 

He Yongqiang commented on HIVE-2451:


+1, will commit after tests pass

> TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression 
> of HIVE-1538
> --
>
> Key: HIVE-2451
> URL: https://issues.apache.org/jira/browse/HIVE-2451
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-2451.1.patch
>
>
> Example:
> select count(1) from  TABLESAMPLE(BUCKET xxx out of yyy) where 
>  = 'xxx'
> will not trigger input pruning.
> The reason is that we assume the sample filtering operator only appears as 
> the second filter after the table scan, an assumption that HIVE-1538 breaks 
> even when that feature is not turned on.





[jira] [Created] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager

2011-09-15 Thread He Yongqiang (JIRA)
move lock retry logic into ZooKeeperHiveLockManager
---

 Key: HIVE-2450
 URL: https://issues.apache.org/jira/browse/HIVE-2450
 Project: Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang








[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105619#comment-13105619
 ] 

He Yongqiang commented on HIVE-2206:


OK. How about just "correlation"? 
Also, can you take a look at whether it is possible to do the optimization as 
part of the physical optimizer? We need a lot of code cleanup in the current 
patch.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: Queries, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf





[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2011-09-14 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105098#comment-13105098
 ] 

He Yongqiang commented on HIVE-2206:


Cool! Yin, please let us know when you are mostly done. One small thing: in 
the Hive code, let's call the new optimizer "cooperative scan" instead of 
YSmart. But we can add the paper ref in the comment.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: Queries, YSmartPatchForHive.patch
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-09-14 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104835#comment-13104835
 ] 

He Yongqiang commented on HIVE-2420:


awesome, will first try the config setting. 

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2420.reproduce.diff
>
>






[jira] [Updated] (HIVE-2217) add Query text for debugging in lock data

2011-09-13 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2217:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Jiayan!

> add Query text for debugging in lock data
> -
>
> Key: HIVE-2217
> URL: https://issues.apache.org/jira/browse/HIVE-2217
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.1
>Reporter: Namit Jain
>Assignee: Jiayan Jiang
> Attachments: hive_diff2
>
>
> Currently, the queryId is stored in the lock data - 
> Query text would improve the debuggability





[jira] [Commented] (HIVE-2217) add Query text for debugging in lock data

2011-09-12 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103161#comment-13103161
 ] 

He Yongqiang commented on HIVE-2217:


+1, will commit after tests pass. 

> add Query text for debugging in lock data
> -
>
> Key: HIVE-2217
> URL: https://issues.apache.org/jira/browse/HIVE-2217
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.1
>Reporter: Namit Jain
>Assignee: Jiayan Jiang
> Attachments: hive_diff2
>
>
> Currently, the queryId is stored in the lock data - 
> Query text would improve the debuggability





[jira] [Commented] (HIVE-1975) "insert overwrite directory" Not able to insert data with multi level directory path

2011-09-12 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103160#comment-13103160
 ] 

He Yongqiang commented on HIVE-1975:


What's the use case here? The user can always create the parent dir first. But 
if users misspell the dir name, they may not want the dirs created. Or worse, 
the data gets loaded to some other place without them noticing.
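
The trade-off can be sketched as an explicit opt-in. The example below uses the local filesystem through Python's os module rather than HDFS, and the function name `prepare_output_dir` and the `create_parents` flag are hypothetical.

```python
import os


def prepare_output_dir(path, create_parents=False):
    """Create the output directory, refusing to invent missing parents
    unless the caller explicitly asks for that."""
    parent = os.path.dirname(os.path.normpath(path))
    if parent and not os.path.isdir(parent) and not create_parents:
        # A misspelled parent fails loudly instead of silently creating a
        # new directory tree in an unexpected place.
        raise FileNotFoundError(f"parent directory does not exist: {parent}")
    os.makedirs(path, exist_ok=True)
    return path
```

With the default, a typo in the parent path surfaces as an error; passing `create_parents=True` gives the multi-level behaviour requested in the issue.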

> "insert overwrite directory" Not able to insert data with multi level 
> directory path
> 
>
> Key: HIVE-1975
> URL: https://issues.apache.org/jira/browse/HIVE-1975
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
> Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-1975.patch
>
>
> The query below fails to execute.
> Ex:
> {noformat}
>insert overwrite directory '/HIVEFT25686/chinna/' select * from dept_j;
> {noformat}





[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

2011-09-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1996:
---

Status: Open  (was: Patch Available)

> "LOAD DATA INPATH" fails when the table already contains a file of the same 
> name
> 
>
> Key: HIVE-1996
> URL: https://issues.apache.org/jira/browse/HIVE-1996
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Kirk True
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>
>
> Steps:
> 1. From the command line copy the kv2.txt data file into the current user's 
> HDFS directory:
> {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt 
> kv2.txt}}
> 2. In Hive, create the table:
> {{create table tst_src1 (key_ int, value_ string);}}
> 3. Load the data into the table from HDFS:
> {{load data inpath './kv2.txt' into table tst_src1;}}
> 4. Repeat step 1
> 5. Repeat step 3
> Expected:
> To have kv2.txt renamed in HDFS and then copied to the destination as per 
> HIVE-307.
> Actual:
> File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} 
> as it continues to use the same array elements (with the un-renamed, old file 
> names). It crashes with this error:
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
> at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {noformat}





[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name

2011-09-12 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103155#comment-13103155
 ] 

He Yongqiang commented on HIVE-1996:


For this, we need to make the rename optional and disabled by default. If 
rename is disabled, we should throw an error to the user.
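
A rough sketch of that proposal, modelling file names only (no HDFS calls). The function names, the `allow_rename` flag, and the `_copy_N` suffix are hypothetical. The sketch also writes each new name back into the source list, which is the step the issue description says `Hive.copyFiles` misses.

```python
def _next_free_name(name, taken):
    """Return a variant of name with a _copy_N suffix that is not in taken."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    n = 1
    while f"{stem}_copy_{n}{dot}{ext}" in taken:
        n += 1
    return f"{stem}_copy_{n}{dot}{ext}"


def resolve_sources(srcs, existing, allow_rename=False):
    """Resolve name clashes between srcs and the files already in the table.

    With allow_rename=False (the default) a clash raises an error. Otherwise
    the clashing entry is renamed and srcs is updated in place, so later
    steps see the new name rather than the stale one.
    """
    taken = set(existing)
    for i, name in enumerate(srcs):
        if name in taken:
            if not allow_rename:
                raise FileExistsError(
                    f"{name} already exists in the table directory")
            name = _next_free_name(name, taken)
            srcs[i] = name
        taken.add(name)
    return srcs
```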

> "LOAD DATA INPATH" fails when the table already contains a file of the same 
> name
> 
>
> Key: HIVE-1996
> URL: https://issues.apache.org/jira/browse/HIVE-1996
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Kirk True
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>
>
> Steps:
> 1. From the command line copy the kv2.txt data file into the current user's 
> HDFS directory:
> {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt 
> kv2.txt}}
> 2. In Hive, create the table:
> {{create table tst_src1 (key_ int, value_ string);}}
> 3. Load the data into the table from HDFS:
> {{load data inpath './kv2.txt' into table tst_src1;}}
> 4. Repeat step 1
> 5. Repeat step 3
> Expected:
> To have kv2.txt renamed in HDFS and then copied to the destination as per 
> HIVE-307.
> Actual:
> File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} 
> as it continues to use the same array elements (with the un-renamed, old file 
> names). It crashes with this error:
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
> at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {noformat}





[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files

2011-09-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2440:
---

Attachment: HIVE-2440.3.patch

removed childrenPaths from MapOp

> make hive mapper initialize faster when having tons of input files
> --
>
> Key: HIVE-2440
> URL: https://issues.apache.org/jira/browse/HIVE-2440
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch, HIVE-2440.3.patch
>
>
> when one hive job has tons of input files, a lot of mappers may fail because 
> of slow initialization.





[jira] [Commented] (HIVE-2440) make hive mapper initialize faster when having tons of input files

2011-09-12 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102934#comment-13102934
 ] 

He Yongqiang commented on HIVE-2440:


https://reviews.apache.org/r/1813/

> make hive mapper initialize faster when having tons of input files
> --
>
> Key: HIVE-2440
> URL: https://issues.apache.org/jira/browse/HIVE-2440
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch
>
>
> when one hive job has tons of input files, a lot of mappers may fail because 
> of slow initialization.





[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files

2011-09-09 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2440:
---

Attachment: HIVE-2440.2.patch

This fixes test failure on combine3

> make hive mapper initialize faster when having tons of input files
> --
>
> Key: HIVE-2440
> URL: https://issues.apache.org/jira/browse/HIVE-2440
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch
>
>
> when one hive job has tons of input files, a lot of mappers may fail because 
> of slow initialization.





[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files

2011-09-09 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2440:
---

Attachment: HIVE-2440.1.patch

> make hive mapper initialize faster when having tons of input files
> --
>
> Key: HIVE-2440
> URL: https://issues.apache.org/jira/browse/HIVE-2440
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2440.1.patch
>
>
> when one hive job has tons of input files, a lot of mappers may fail because 
> of slow initialization.





[jira] [Created] (HIVE-2440) make hive mapper initialize faster when having tons of input files

2011-09-09 Thread He Yongqiang (JIRA)
make hive mapper initialize faster when having tons of input files
--

 Key: HIVE-2440
 URL: https://issues.apache.org/jira/browse/HIVE-2440
 Project: Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang


when one hive job has tons of input files, a lot of mappers may fail because of 
slow initialization.





[jira] [Resolved] (HIVE-2429) skip corruption bug that cause data not decompressed

2011-09-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2429.


Resolution: Fixed

committed, thanks Ramkumar Vadali!

> skip corruption bug that cause data not decompressed
> 
>
> Key: HIVE-2429
> URL: https://issues.apache.org/jira/browse/HIVE-2429
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Ramkumar Vadali
> Attachments: HIVE-2429.patch
>
>
> This is a regression of https://issues.apache.org/jira/browse/HIVE-2404





[jira] [Commented] (HIVE-2429) skip corruption bug that cause data not decompressed

2011-09-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099268#comment-13099268
 ] 

He Yongqiang commented on HIVE-2429:


+1, will commit after tests pass

> skip corruption bug that cause data not decompressed
> 
>
> Key: HIVE-2429
> URL: https://issues.apache.org/jira/browse/HIVE-2429
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Ramkumar Vadali
> Attachments: HIVE-2429.patch
>
>
> This is a regression of https://issues.apache.org/jira/browse/HIVE-2404





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-09-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099132#comment-13099132
 ] 

He Yongqiang commented on HIVE-2420:


I think a quick fix may be to just revert the diff of dedup filters. What do 
you think?

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2420.reproduce.diff
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-2429) skip corruption bug that cause data not decompressed

2011-09-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-2429:
--

Assignee: Ramkumar Vadali  (was: He Yongqiang)

> skip corruption bug that cause data not decompressed
> 
>
> Key: HIVE-2429
> URL: https://issues.apache.org/jira/browse/HIVE-2429
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Ramkumar Vadali
>
> This is a regression of https://issues.apache.org/jira/browse/HIVE-2404





[jira] [Created] (HIVE-2429) skip corruption bug that cause data not decompressed

2011-09-07 Thread He Yongqiang (JIRA)
skip corruption bug that cause data not decompressed


 Key: HIVE-2429
 URL: https://issues.apache.org/jira/browse/HIVE-2429
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


This is a regression of https://issues.apache.org/jira/browse/HIVE-2404





[jira] [Commented] (HIVE-2415) disallow partition column names when doing replace columns

2011-09-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098414#comment-13098414
 ] 

He Yongqiang commented on HIVE-2415:


@Ashutosh, yeah, I understand your point about moving the validation from the 
client to the metastore server. Another concern is that we want the Hive 
metastore to have much more flexibility than the client side, so that if 
something goes wrong for any reason, we can use the Thrift metastore interface 
to fix it. For example, if a table somehow has a normal column whose name 
conflicts with a partition column, we won't be able to fix it if we do the 
validation on the metastore side.

> disallow partition column names when doing replace columns
> --
>
> Key: HIVE-2415
> URL: https://issues.apache.org/jira/browse/HIVE-2415
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2415.1.patch
>
>
> alter table replace columns allows adding a column with the same name as a 
> partition column, which introduces inconsistency. 
> We should disallow this. 
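
A minimal sketch of such a check, in Python with a hypothetical function name. The case-insensitive comparison is an assumption of the example. The sketch does not settle where the check should run (client or metastore), which is the question discussed in the comment above.

```python
def check_replace_columns(new_columns, partition_columns):
    """Reject a replacement column list that reuses a partition column name."""
    partition_names = {name.lower() for name in partition_columns}
    conflicts = [name for name in new_columns
                 if name.lower() in partition_names]
    if conflicts:
        raise ValueError(
            "column name(s) conflict with partition column(s): "
            + ", ".join(conflicts))
    return list(new_columns)
```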





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-09-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098251#comment-13098251
 ] 

He Yongqiang commented on HIVE-2420:


This is pretty important. It will block us from testing and deploying the open 
source trunk.

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-2420.reproduce.diff
>
>






[jira] [Resolved] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

2011-09-06 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2404.


Resolution: Fixed

committed, thanks Ramkumar!

> Allow RCFile Reader to tolerate corruptions
> ---
>
> Key: HIVE-2404
> URL: https://issues.apache.org/jira/browse/HIVE-2404
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Minor
> Attachments: toleratecorruptions.2.patch, 
> toleratecorruptions.3.patch, toleratecorruptions.patch
>
>
> Sometimes it is useful to tolerate corruptions during a query and return 
> results based on the files that can be processed. A single corrupt block of 
> data should not prevent reading the rest of the data.
> We need a way to gracefully ignore errors while reading a RC File
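
As a rough illustration of the behavior described above (not the committed 
patch), the sketch below reads a sequence of blocks and, when a tolerance 
flag is set, skips any block that fails to read instead of aborting. 
BlockReader, ReadResult, and the flag name are hypothetical and do not 
correspond to RCFile classes or Hive configuration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: BlockReader and ReadResult are illustrative names only.
final class TolerantReadSketch {
    private TolerantReadSketch() {}

    /** Reads one block of rows; throws IOException if the block is corrupt. */
    interface BlockReader {
        List<String> read() throws IOException;
    }

    /** Rows that were read successfully plus a count of blocks that were skipped. */
    static final class ReadResult {
        final List<String> rows = new ArrayList<>();
        int skippedBlocks = 0;
    }

    static ReadResult readAll(List<BlockReader> blocks, boolean tolerateCorruptions)
            throws IOException {
        ReadResult result = new ReadResult();
        for (BlockReader block : blocks) {
            try {
                result.rows.addAll(block.read());
            } catch (IOException e) {
                if (!tolerateCorruptions) {
                    throw e; // strict mode: the first corrupt block fails the whole read
                }
                result.skippedBlocks++; // tolerant mode: skip this block and keep going
            }
        }
        return result;
    }
}
```

Counting the skipped blocks, rather than dropping them silently, lets the 
caller report how much data was left out of the result.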





[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

2011-09-02 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096476#comment-13096476
 ] 

He Yongqiang commented on HIVE-2404:


+1, will commit after tests pass

> Allow RCFile Reader to tolerate corruptions
> ---
>
> Key: HIVE-2404
> URL: https://issues.apache.org/jira/browse/HIVE-2404
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Minor
> Attachments: toleratecorruptions.2.patch, 
> toleratecorruptions.3.patch, toleratecorruptions.patch
>
>
> Sometimes it is useful to tolerate corruptions during a query and return 
> results based on the files that can be processed. A single corrupt block of 
> data should not prevent reading the rest of the data.
> We need a way to gracefully ignore errors while reading a RC File





[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

2011-09-02 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096295#comment-13096295
 ] 

He Yongqiang commented on HIVE-2404:


Awesome feature! I left some nitpick comments on Review Board. Thanks!

> Allow RCFile Reader to tolerate corruptions
> ---
>
> Key: HIVE-2404
> URL: https://issues.apache.org/jira/browse/HIVE-2404
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.1
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Minor
> Attachments: toleratecorruptions.2.patch, toleratecorruptions.patch
>
>
> Sometimes it is useful to tolerate corruptions during a query and return 
> results based on the files that can be processed. A single corrupt block of 
> data should not prevent reading the rest of the data.
> We need a way to gracefully ignore errors while reading a RC File





[jira] [Updated] (HIVE-2413) BlockMergeTask ignores client-specified jars

2011-09-02 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2413:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed, thanks Krishna Kumar!

> BlockMergeTask ignores client-specified jars
> 
>
> Key: HIVE-2413
> URL: https://issues.apache.org/jira/browse/HIVE-2413
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2413.v0.patch, HIVE-2413.v1.patch
>
>
> User-specified jars are not added to the hadoop tasks while executing a 
> BlockMergeTask resulting in a ClassNotFoundException.





[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-09-01 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095608#comment-13095608
 ] 

He Yongqiang commented on HIVE-2417:


Committed, thanks Krishna Kumar!

> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-09-01 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2417:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-08-31 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094963#comment-13094963
 ] 

He Yongqiang commented on HIVE-2417:


+1, will commit after tests pass

> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Commented] (HIVE-2413) BlockMergeTask ignores client-specified jars

2011-08-31 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094962#comment-13094962
 ] 

He Yongqiang commented on HIVE-2413:


[junit] java.lang.IllegalArgumentException: Can not create a Path from an empty string
[junit] at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
[junit] at org.apache.hadoop.fs.Path.<init>(Path.java:90)
[junit] at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:602)
[junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
[junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
[junit] at org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execute(BlockMergeTask.java:203)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.mergeFiles(DDLTask.java:410)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:366)
[junit] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
[junit] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
[junit] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
[junit] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
[junit] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
[junit] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:638)
[junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table(TestCliDriver.java:1190)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

I got these errors with a bunch of test cases. Here are some of them: 
rcfile_merge3.q, load_fs.q, alter_merge.q, etc.

Can you take a look?
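
The trace shows a Path being built from an empty string while JobClient 
configures the job, but it does not say where the empty string came from. If 
it came from an empty or partly empty comma-separated jar list (an 
assumption, not something confirmed in this thread), one defensive pattern is 
to drop blank entries and only set the property when something is left. The 
helper below is a hypothetical sketch, not the fix in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name and its use are assumptions for illustration.
final class JarListSketch {
    private JarListSketch() {}

    /**
     * Joins jar locations with commas, dropping null, empty, and blank entries.
     * Returns an empty string when nothing usable remains, in which case the
     * caller should leave the job property unset rather than set it to "".
     */
    static String joinNonEmpty(List<String> jars) {
        List<String> kept = new ArrayList<>();
        for (String jar : jars) {
            if (jar != null && !jar.trim().isEmpty()) {
                kept.add(jar.trim());
            }
        }
        return String.join(",", kept);
    }
}
```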


> BlockMergeTask ignores client-specified jars
> 
>
> Key: HIVE-2413
> URL: https://issues.apache.org/jira/browse/HIVE-2413
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2413.v0.patch
>
>
> User-specified jars are not added to the hadoop tasks while executing a 
> BlockMergeTask resulting in a ClassNotFoundException.





[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094298#comment-13094298
 ] 

He Yongqiang commented on HIVE-2417:


By "2 inserts", I mean remove the "load" command and use 2 inserts to populate 
the data. 

> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094297#comment-13094297
 ] 

He Yongqiang commented on HIVE-2417:


bq.The 'create' adds one file, and the insert adds another file.
Sorry, I thought you were doing an "insert overwrite". Can you do 2 inserts? 

bq.This is needed so that the rcfiles in the target table are compressed with 
Bzip2. Do you mean that we should be using Default compression codec instead? 
Fine with me but why is that important?

Yes. I mean that if you remove this line and keep the line 
"set hive.exec.compress.output = true;", the output will be compressed using 
DefaultCodec. The reason is that BZip2 may not be installed for all Hive 
users/devs.

> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094287#comment-13094287
 ] 

He Yongqiang commented on HIVE-2417:


Good catch, this is a regression introduced in HIVE-2396.
Can you make the test case reproduce the problem more reliably? I mean that 
without the change in this diff, running that test case should produce an 
error or incorrect results. 

1. Remove this line: "+set 
mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;".
2. tgt_rc_merge_test only contains one file, so the 'alter table 
tgt_rc_merge_test concatenate;' will basically do nothing. Can you make sure 
this table contains at least 2 files? You can upload 2 gzip-compressed rcfiles 
if it does not.




> Merging of compressed rcfiles fails to write the valuebuffer part correctly
> ---
>
> Key: HIVE-2417
> URL: https://issues.apache.org/jira/browse/HIVE-2417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2417.v0.patch
>
>
> The blockmerge task does not create proper rc files when merging compressed 
> rc files as the valuebuffer writing is incorrect.





[jira] [Commented] (HIVE-2413) BlockMergeTask ignores client-specified jars

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094278#comment-13094278
 ] 

He Yongqiang commented on HIVE-2413:


+1, will commit after tests pass.

> BlockMergeTask ignores client-specified jars
> 
>
> Key: HIVE-2413
> URL: https://issues.apache.org/jira/browse/HIVE-2413
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2413.v0.patch
>
>
> User-specified jars are not added to the hadoop tasks while executing a 
> BlockMergeTask resulting in a ClassNotFoundException.





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094276#comment-13094276
 ] 

He Yongqiang commented on HIVE-2420:


Amareshwari, please feel free to reassign this to me if you do not have time 
for it. Thanks!

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-2420.reproduce.diff
>
>






[jira] [Updated] (HIVE-2422) remove the intermediate dir when the hive query finish

2011-08-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2422:
---

Summary: remove the intermediate dir when the hive query finish   (was: 
remove the intermediate dir of one hive query when it finish )

> remove the intermediate dir when the hive query finish 
> ---
>
> Key: HIVE-2422
> URL: https://issues.apache.org/jira/browse/HIVE-2422
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> right now if one hive query got compiled to 2 mr jobs, and the first job's 
> output feed the second job. When the query finish, the first job's output 
> should be removed.





[jira] [Created] (HIVE-2422) remove the intermediate dir of one hive query when it finish

2011-08-30 Thread He Yongqiang (JIRA)
remove the intermediate dir of one hive query when it finish 
-

 Key: HIVE-2422
 URL: https://issues.apache.org/jira/browse/HIVE-2422
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang


Right now, if one Hive query gets compiled to 2 MR jobs, the first job's 
output feeds the second job. When the query finishes, the first job's output 
should be removed.
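
A minimal sketch of the intended behavior, using the local file system in 
place of HDFS and plain methods in place of MR jobs (both assumptions made 
only to keep the example self-contained). The class and method names are 
hypothetical. The point is that cleanup sits in a finally block, so the 
intermediate directory is removed whether or not the second stage succeeds.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local files stand in for HDFS, methods stand in for MR jobs.
final class IntermediateDirCleanup {
    private IntermediateDirCleanup() {}

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts deeper paths first, so each directory is empty when deleted.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    /** Stage 1 writes into intermediateDir, stage 2 reads it; the directory is always removed. */
    static long runTwoStages(Path intermediateDir) throws IOException {
        try {
            // Stage 1: write intermediate output.
            Path part = intermediateDir.resolve("part-00000");
            Files.write(part, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
            // Stage 2: consume the intermediate output (here, count its rows).
            return Files.readAllLines(part, StandardCharsets.UTF_8).size();
        } finally {
            // The query is finished (or has failed): drop stage 1's output.
            deleteRecursively(intermediateDir);
        }
    }
}
```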





[jira] [Assigned] (HIVE-2422) remove the intermediate dir of one hive query when it finish

2011-08-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-2422:
--

Assignee: He Yongqiang

> remove the intermediate dir of one hive query when it finish 
> -
>
> Key: HIVE-2422
> URL: https://issues.apache.org/jira/browse/HIVE-2422
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> right now if one hive query got compiled to 2 mr jobs, and the first job's 
> output feed the second job. When the query finish, the first job's output 
> should be removed.





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094115#comment-13094115
 ] 

He Yongqiang commented on HIVE-2420:


Amareshwari, can you help take a look?

There is a .q file in the diff, and the query in that .q file should be 
converted to a sort merge join. But it is not; I think this is because after 
PPD, the partition pruner expr is not correctly populated. 

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-2420.reproduce.diff
>
>






[jira] [Updated] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-08-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2420:
---

Attachment: HIVE-2420.reproduce.diff

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-2420.reproduce.diff
>
>






[jira] [Created] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-08-30 Thread He Yongqiang (JIRA)
partition pruner expr is not populated due to some bug in ppd
-

 Key: HIVE-2420
 URL: https://issues.apache.org/jira/browse/HIVE-2420
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-2420.reproduce.diff







[jira] [Commented] (HIVE-2415) disallow partition column names when doing replace columns

2011-08-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094094#comment-13094094
 ] 

He Yongqiang commented on HIVE-2415:


@Ashutosh Chauhan, today it makes 2 metastore calls: one in 
DDLSemanticAnalyzer and the other in DDLTask. Merging these 2 (check and 
change) into the metastore server will save one metastore call but add more 
load to the metastore. Since this is only for a DDL command, it should be fine.


> disallow partition column names when doing replace columns
> --
>
> Key: HIVE-2415
> URL: https://issues.apache.org/jira/browse/HIVE-2415
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2415.1.patch
>
>
> alter table replace columns allows to add a column with the same name as 
> partition column, which introduced inconsistency. 
> We should disallow this. 





[jira] [Updated] (HIVE-2415) disallow partition column names when doing replace columns

2011-08-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2415:
---

Status: Patch Available  (was: Open)

> disallow partition column names when doing replace columns
> --
>
> Key: HIVE-2415
> URL: https://issues.apache.org/jira/browse/HIVE-2415
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2415.1.patch
>
>
> alter table replace columns allows to add a column with the same name as 
> partition column, which introduced inconsistency. 
> We should disallow this. 





[jira] [Updated] (HIVE-2415) disallow partition column names when doing replace columns

2011-08-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2415:
---

Attachment: HIVE-2415.1.patch

> disallow partition column names when doing replace columns
> --
>
> Key: HIVE-2415
> URL: https://issues.apache.org/jira/browse/HIVE-2415
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2415.1.patch
>
>
> alter table replace columns allows to add a column with the same name as 
> partition column, which introduced inconsistency. 
> We should disallow this. 





[jira] [Created] (HIVE-2415) disallow partition column names when doing replace columns

2011-08-26 Thread He Yongqiang (JIRA)
disallow partition column names when doing replace columns
--

 Key: HIVE-2415
 URL: https://issues.apache.org/jira/browse/HIVE-2415
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


"alter table replace columns" allows adding a column with the same name as a 
partition column, which introduces an inconsistency. 

We should disallow this. 






[jira] [Resolved] (HIVE-2406) return empty list instead of null for get_privileges

2011-08-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2406.


Resolution: Duplicate

merge with HIVE-2405

> return empty list instead of null for get_privileges
> 
>
> Key: HIVE-2406
> URL: https://issues.apache.org/jira/browse/HIVE-2406
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> This is to remove the thrift exception when running hive, which enables 
> authorization and uses a thrift remote metastore.
> this is an example of stack:
>> show grant user heyongqiang;
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.thrift.TApplicationException: list_privileges failed: 
> unknown result
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)
>   ... 16 more
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
>   ... 15 more
> Caused by: org.apache.thrift.TApplicationException: list_privileges failed: 
> unknown result
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_p

[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege

2011-08-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2405:
---

Attachment: HIVE-2405.2.patch

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive>  grant all to user heyongqiang;  
> hive> show grant user heyongqiang; 
> principalName heyongqiang 
> principalType USER
> privilege All 
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor   heyongqiang 
> Time taken: 0.032 seconds
> hive>  CREATE TABLE src (foo INT, bar STRING); 
> Authorization failed:No privilege 'Create' found for outputs { 
> database:default}. Use show grant to get more details.





[jira] [Commented] (HIVE-2405) get_privilege does not get user level privilege

2011-08-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090560#comment-13090560
 ] 

He Yongqiang commented on HIVE-2405:


a new patch merged with HIVE-2406

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive>  grant all to user heyongqiang;  
> hive> show grant user heyongqiang; 
> principalName heyongqiang 
> principalType USER
> privilege All 
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor   heyongqiang 
> Time taken: 0.032 seconds
> hive>  CREATE TABLE src (foo INT, bar STRING); 
> Authorization failed:No privilege 'Create' found for outputs { 
> database:default}. Use show grant to get more details.





[jira] [Commented] (HIVE-2406) return empty list instead of null for get_privileges

2011-08-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090559#comment-13090559
 ] 

He Yongqiang commented on HIVE-2406:


With the fix, the output is cleaner:


hive> show grant user heyongqiang;
OK
Time taken: 0.121 seconds


I will merge this small change with HIVE-2405.

> return empty list instead of null for get_privileges
> 
>
> Key: HIVE-2406
> URL: https://issues.apache.org/jira/browse/HIVE-2406
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
>
> This removes the Thrift exception thrown when running Hive with
> authorization enabled against a remote Thrift metastore.
> This is an example stack trace:
>> show grant user heyongqiang;
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.thrift.TApplicationException: list_privileges failed: 
> unknown result
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)
>   ... 16 more
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: list_privileges failed: unknown 
> result
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
>   ... 15 more
> Caused by: org.apache.thrift.TApplicationException: list_privileges failed: 
> unknown result
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
>   at 
> org.apache.hadoop.hive.metastore.

[jira] [Updated] (HIVE-2406) return empty list instead of null for get_privileges

2011-08-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2406:
---

Description: 
This removes the Thrift exception thrown when running Hive with
authorization enabled against a remote Thrift metastore.

This is an example stack trace:

   > show grant user heyongqiang;
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)
    ... 16 more
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
    ... 15 more
Caused by: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)




  was:This is to remove the thrift exception when running hive, which enables 
authorization and uses a thrift remote metastore.


> return empty list instead of null for get_privileges
> 
>
> Key: HIVE-2406
> URL: https://issues.apa

[jira] [Created] (HIVE-2406) return empty list instead of null for get_privileges

2011-08-24 Thread He Yongqiang (JIRA)
return empty list instead of null for get_privileges


Key: HIVE-2406
URL: https://issues.apache.org/jira/browse/HIVE-2406
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


This removes the Thrift exception thrown when running Hive with
authorization enabled against a remote Thrift metastore.
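
A Thrift-generated client reports "unknown result" when a non-void call comes back without a success value, which is what happens when the server-side handler returns null. A minimal sketch of the idea in the issue title follows, assuming the handler can substitute an empty list for null; `ListPrivilegesSketch`, `lookupGrants`, and `listPrivileges` are hypothetical names used for illustration, not the actual metastore code or the attached patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
public class ListPrivilegesSketch {
    // Stand-in for a store lookup that yields null when the user has no grants.
    public static List<String> lookupGrants(String user) {
        if ("heyongqiang".equals(user)) {
            List<String> grants = new ArrayList<>();
            grants.add("All");
            return grants;
        }
        return null;
    }

    // Normalize null to an empty list so the RPC layer always has a
    // result value to serialize back to the client.
    public static List<String> listPrivileges(String user) {
        List<String> grants = lookupGrants(user);
        return grants == null ? Collections.<String>emptyList() : grants;
    }

    public static void main(String[] args) {
        System.out.println(listPrivileges("heyongqiang"));
        System.out.println(listPrivileges("nobody"));
    }
}
```

Callers can then iterate over the result without a null check, and a user with no grants yields an empty listing rather than an exception.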





[jira] [Commented] (HIVE-2405) get_privilege does not get user level privilege

2011-08-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090446#comment-13090446
 ] 

He Yongqiang commented on HIVE-2405:


This patch can also be applied to 0.7.

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive>  grant all to user heyongqiang;  
> hive> show grant user heyongqiang; 
> principalName heyongqiang 
> principalType USER
> privilege All 
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor   heyongqiang 
> Time taken: 0.032 seconds
> hive>  CREATE TABLE src (foo INT, bar STRING); 
> Authorization failed:No privilege 'Create' found for outputs { 
> database:default}. Use show grant to get more details.





[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege

2011-08-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2405:
---

Status: Patch Available  (was: Open)

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive>  grant all to user heyongqiang;  
> hive> show grant user heyongqiang; 
> principalName heyongqiang 
> principalType USER
> privilege All 
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor   heyongqiang 
> Time taken: 0.032 seconds
> hive>  CREATE TABLE src (foo INT, bar STRING); 
> Authorization failed:No privilege 'Create' found for outputs { 
> database:default}. Use show grant to get more details.





[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege

2011-08-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2405:
---

Attachment: HIVE-2405.1.patch

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive>  grant all to user heyongqiang;  
> hive> show grant user heyongqiang; 
> principalName heyongqiang 
> principalType USER
> privilege All 
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor   heyongqiang 
> Time taken: 0.032 seconds
> hive>  CREATE TABLE src (foo INT, bar STRING); 
> Authorization failed:No privilege 'Create' found for outputs { 
> database:default}. Use show grant to get more details.




