[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550379#comment-13550379 ]

He Yongqiang commented on HIVE-3874:

I want to list a few reasons why I think the ORC solution is the much more appealing one.

1. For a BIG data warehouse that stores more than 90% of its existing data in RCFile (like FB's >100PB warehouse), converting data from one format to another is something that definitely should be avoided. It is possible to convert some tables if there is a big space-saving advantage. But managing two distinct formats that have no compatibility or interoperability with each other, and that may even live in two different code repositories, is another big headache that we would want to avoid in the first place.
2. Developing the new ORC format in the Hive/HCatalog codebase will make Hive development and operations much easier.
3. Letting the new ORC format have some backward compatibility with RCFile will save a lot of trouble.

> Create a new Optimized Row Columnar file format for Hive
>
>                 Key: HIVE-3874
>                 URL: https://issues.apache.org/jira/browse/HIVE-3874
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: OrcFileIntro.pptx
>
> There are several limitations of the current RC File format that I'd like to
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
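To make the limitation about light weight indexes and push-down filters quoted above more concrete, here is a minimal sketch of the idea. It assumes an in-memory list of non-empty row groups and a hypothetical per-group min/max index (`RowGroupStats`, `build_index` and `scan_greater_than` are illustrative names); it is not the ORC implementation or its on-disk layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical index entry: min/max of one column within one row group."""
    min_value: int
    max_value: int


def build_index(row_groups: List[List[int]]) -> List[RowGroupStats]:
    # One stats entry per row group, computed once when the data is written.
    # Assumes every row group is non-empty.
    return [RowGroupStats(min(g), max(g)) for g in row_groups]


def scan_greater_than(
    row_groups: List[List[int]], index: List[RowGroupStats], threshold: int
) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`, skipping groups the index rules out."""
    matches: List[int] = []
    groups_read = 0
    for group, stats in zip(row_groups, index):
        if stats.max_value <= threshold:
            continue  # no row in this group can match, so it is never read
        groups_read += 1
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read


row_groups = [[1, 5, 9], [12, 15, 18], [40, 41, 47]]
index = build_index(row_groups)
matches, groups_read = scan_greater_than(row_groups, index, 16)
```

With the filter `value > 16`, the first row group is skipped on the strength of its index entry alone; only the remaining two groups are read.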
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550347#comment-13550347 ]

He Yongqiang commented on HIVE-3585:

bq. This patch is going to share 90% of its small code with the existing AvroSerde that was never shunted into contrib.

Then why is it so hard to make it part of the existing AvroSerde?

bq. I'm not seeing any technical reasons to block progress.

Technically, there is no issue. I am pretty sure this can be done well technically.

bq. Is anyone planning on exercising a -1?

I have listed two options that I insist on: one is to develop it as part of the existing AvroSerde, the other is to put it in contrib or a 3rd-party lib (maybe GitHub?).

> Integrate Trevni as another columnar oriented file format
>
>                 Key: HIVE-3585
>                 URL: https://issues.apache.org/jira/browse/HIVE-3585
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.10.0
>            Reporter: alex gemini
>            Assignee: Mark Wagner
>            Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar
> format needs a columnar SerDe; fastutil seems to be a good choice. The Shark
> project uses the fastutil library as its columnar SerDe library, but it seems
> too large (almost 15m) for just a few primitive array collections.
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549870#comment-13549870 ]

He Yongqiang commented on HIVE-3874:

bq. It would be possible to extend the RCFile reader to recognize an ORC file and to have it delegate to the ORC File reader.

It would be great to have this support. In this case, what is the file format of the partition/table: RCFile or ORCFile?

When we converted old data from SequenceFile to RCFile a long time ago, handling errors like "unrecognized fileformat or corruption" was a big headache, because there was no interoperability between the two formats. Most of the errors we saw occurred because the table/partition format did not match the actual data format. Two examples:
1. An old partition's data is in RCFile format and a new partition's data is in ORC format.
2. Within one partition, some files are RCFile and some files are ORC.
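The mismatch described in the comment above (the table or partition declares one format while the files on disk are in another) can be detected by looking at each file's leading bytes. The following is a minimal sketch, assuming the commonly documented magic values: ORC files begin with `ORC`, SequenceFiles with `SEQ`, and RCFiles with `RCF`. Older RCFiles were written with a SequenceFile-style `SEQ` header, so this check cannot tell those apart from plain SequenceFiles. The function names and the declared-format strings are hypothetical and are not Hive APIs.

```python
from typing import Dict

# Assumed leading magic bytes for each container format.
MAGIC_TO_FORMAT: Dict[bytes, str] = {
    b"ORC": "orc",
    b"RCF": "rcfile",
    b"SEQ": "sequencefile",  # also matches old-style RCFiles
}


def sniff_format(header: bytes) -> str:
    """Guess the file format from the first three bytes of a file."""
    return MAGIC_TO_FORMAT.get(header[:3], "unknown")


def find_mismatches(declared: str, headers: Dict[str, bytes]) -> Dict[str, str]:
    """Return {file name: detected format} for every file whose detected
    format differs from the format declared for the partition."""
    mismatches: Dict[str, str] = {}
    for name, header in headers.items():
        detected = sniff_format(header)
        if detected != declared:
            mismatches[name] = detected
    return mismatches


# Second example from the comment: one partition holding a mix of
# RCFile and ORC files while the partition is declared as RCFile.
partition_files = {
    "000000_0": b"RCF\x01",
    "000001_0": b"ORC",
}
bad = find_mismatches("rcfile", partition_files)
```

Here `bad` contains only the ORC file, which is the kind of report that would have made the "unrecognized fileformat or corruption" errors easier to diagnose.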
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549431#comment-13549431 ]

He Yongqiang commented on HIVE-3874:

That should work. I just want to make sure they have a similar API, so that other tools and utilities will work automatically or need only small changes. One example is the block merger.
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549410#comment-13549410 ]

He Yongqiang commented on HIVE-3874:

Will this optimized format be backward compatible? If it is, it will be easier to deploy. A new format without backward compatibility is really a headache, especially when you need to convert old data.
[jira] [Comment Edited] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348 ]

He Yongqiang edited comment on HIVE-3585 at 1/7/13 10:40 PM:

contrib is a good place for any project that is not mature. There are so many custom data formats out there that it does not make sense to support all of them in the core Hive code base. contrib is a good place for them to grow.

From http://incubator.apache.org/hcatalog/docs/r0.4.0/, another good place I can think of is the HCatalog project. But I don't know whether HCatalog itself includes custom data format support.

was (Author: he yongqiang):
contrib is a good place for any project that is not mature. There are so many custom data formats out there that it does not make sense to support all of them in the core Hive code base. contrib is a good place for them to grow.

Another good place I can think of is the HCatalog project.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348 ]

He Yongqiang commented on HIVE-3585:

contrib is a good place for any project that is not mature. There are so many custom data formats out there that it does not make sense to support all of them in the core Hive code base. contrib is a good place for them to grow.

Another good place I can think of is the HCatalog project.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546152#comment-13546152 ]

He Yongqiang commented on HIVE-3585:

HBaseSerde was first added to contrib and moved to core later.

bq. Pig is adding TrevniStorage as a builtin, and interoperability is desired.

I think interoperability is not a problem no matter where the code resides.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544382#comment-13544382 ]

He Yongqiang commented on HIVE-3585:

So far I am still not convinced that it should be another built-in SerDe in Hive's core codebase. We initially put some new SerDes in contrib or 3rd-party libs; examples include HBaseSerde and the Zebra SerDe. If you can make it work with the existing Avro SerDe, that would also be great.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543413#comment-13543413 ]

He Yongqiang commented on HIVE-3585:

I did not get why it does not work with a partition schema update.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543391#comment-13543391 ]

He Yongqiang commented on HIVE-3585:

Thanks for reminding me that there is already an Avro SerDe. Have you tried making the required changes part of the existing Avro SerDe instead of creating a new one?
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543360#comment-13543360 ]

He Yongqiang commented on HIVE-3585:

@jakob, awesome to hear you are planning to own its maintenance. I have no intention of complicating your use case here, but I think a 3rd-party lib or the contrib folder would be a good start and won't affect your usage. If I remember correctly, we used to do something similar for Pig's Zebra.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543294#comment-13543294 ]

He Yongqiang commented on HIVE-3585:

@Carl, adding code that is not much used does no harm, except that it brings a lot of maintenance and documentation pain. You can start with a contrib folder or a 3rd-party lib and merge it into core Hive later if it proves successful.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543253#comment-13543253 ]

He Yongqiang commented on HIVE-3585:

@jakob, you can always implement a reader for a custom data format in a 3rd-party lib and let Hive load it from there.
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496697#comment-13496697 ]

He Yongqiang commented on HIVE-2206:

Okay, I will aim to commit it this weekend or early next week.

> add a new optimizer for query correlation discovery and optimization
>
>                 Key: HIVE-2206
>                 URL: https://issues.apache.org/jira/browse/HIVE-2206
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: He Yongqiang
>            Assignee: Yin Huai
>         Attachments: HIVE-2206.10-r1384442.patch.txt,
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt,
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt,
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
> This issue proposes a new logical optimizer called Correlation Optimizer,
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, for every
> operation which may need to shuffle the data (e.g. join and aggregation
> operations), Hive will generate a MapReduce job for that operation. However,
> those operations which may need to shuffle the data may involve the
> correlations explained below and thus can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of
> its child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects
> correlations among MR jobs for reduce-side join operators and reduce-side
> aggregation operators (not map-only aggregation). A query will be optimized
> if it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side
> aggregation operator which has JFC with all of its parent MR jobs (TCs will
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not
> intermediate tables generated by sub-queries); and
> # No self join is involved in those correlated MR jobs.
> Correlation optimizer is implemented as a logical optimizer. The main reasons
> are that it only needs to manipulate the query plan tree and it can leverage
> the existing component for generating MR jobs.
> The current implementation can serve as a framework for correlation-related
> optimizations. I think that it is better than adding individual optimizers.
> There are several pieces of work that can be done in the future to improve
> this optimizer. Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which input tables of correlated MR jobs involve
> intermediate tables; and
> # Optimize queries involving self join.
> References:
> Paper and presentation of YSmart.
> Paper:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
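The three correlation definitions in the issue description above can be written down as simple predicates. This is a minimal sketch, assuming each MR job is summarised only by its set of input relations and its partition key; the `MRJob` class, the function names, and the table and column names are hypothetical and are not taken from the HIVE-2206 patch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MRJob:
    """Hypothetical summary of one MapReduce job in a query plan."""
    name: str
    inputs: FrozenSet[str]          # input relations read by the job
    partition_key: Tuple[str, ...]  # columns used to shuffle data to reducers


def input_correlation(a: MRJob, b: MRJob) -> bool:
    # IC: the input relation sets of the two jobs are not disjoint.
    return not a.inputs.isdisjoint(b.inputs)


def transit_correlation(a: MRJob, b: MRJob) -> bool:
    # TC: input correlation and, in addition, the same partition key.
    return input_correlation(a, b) and a.partition_key == b.partition_key


def job_flow_correlation(job: MRJob, child: MRJob) -> bool:
    # JFC: a job has the same partition key as one of its child nodes.
    return job.partition_key == child.partition_key


agg_by_order = MRJob("agg_by_order", frozenset({"lineitem"}), ("orderkey",))
join_orders = MRJob("join_orders", frozenset({"lineitem", "orders"}), ("orderkey",))
agg_by_part = MRJob("agg_by_part", frozenset({"lineitem"}), ("partkey",))
```

In this example `agg_by_order` and `join_orders` share an input table and a partition key, so they have both IC and TC, while `agg_by_order` and `agg_by_part` have IC only. Comparing partition keys by column name is a simplification made for the sketch.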
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496546#comment-13496546 ]

He Yongqiang commented on HIVE-3585:

Although it is so similar to RCFile, I did not see any reference to RCFile in its docs. I assume that helps avoid confusion for its users. But if Hive ends up with two formats that are so similar to each other, the confusion will be passed on to all Hive users.

> Integrate Trevni as another columnar oriented file format
>
>                 Key: HIVE-3585
>                 URL: https://issues.apache.org/jira/browse/HIVE-3585
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.10.0
>            Reporter: alex gemini
>            Assignee: Jakob Homan
>            Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar
> format needs a columnar SerDe; fastutil seems to be a good choice. The Shark
> project uses the fastutil library as its columnar SerDe library, but it seems
> too large (almost 15m) for just a few primitive array collections.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496537#comment-13496537 ]

He Yongqiang commented on HIVE-3585:

Yeah, I read some of its docs, but I really did not see a big difference. Some of its features can be added to RCFile easily. Please point out if you think there is a dramatic difference in some part of the design.
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496529#comment-13496529 ] He Yongqiang commented on HIVE-2206: @Carl, keep in mind that you already months of time to comment. So maybe addressing your comments in new jiras will make more sense. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, > HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > This issue proposes a new logical optimizer called Correlation Optimizer, > which is used to merge correlated MapReduce jobs (MR jobs) into a single MR > job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The > paper and slides of YSmart are linked at the bottom. > Since Hive translates queries in a sentence by sentence fashion, for every > operation which may need to shuffle the data (e.g. join and aggregation > operations), Hive will generate a MapReduce job for that operation. However, > for those operations which may need to shuffle the data, they may involve > correlations explained below and thus can be executed in a single MR job. 
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their > input relation sets are not disjoint; > # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they > have not only input correlation, but also the same partition key; > # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its > child nodes if it has the same partition key as that child node. > The current implementation of correlation optimizer only detect correlations > among MR jobs for reduce-side join operators and reduce-side aggregation > operators (not map only aggregation). A query will be optimized if it > satisfies following conditions. > # There exists a MR job for reduce-side join operator or reduce side > aggregation operator which have JFC with all of its parents MR jobs (TCs will > be also exploited if JFC exists); > # All input tables of those correlated MR job are original input tables (not > intermediate tables generated by sub-queries); and > # No self join is involved in those correlated MR jobs. > Correlation optimizer is implemented as a logical optimizer. The main reasons > are that it only needs to manipulate the query plan tree and it can leverage > the existing component on generating MR jobs. > Current implementation can serve as a framework for correlation related > optimizations. I think that it is better than adding individual optimizers. > There are several work that can be done in future to improve this optimizer. > Here are three examples. > # Support queries only involve TC; > # Support queries in which input tables of correlated MR jobs involves > intermediate tables; and > # Optimize queries involving self join. > References: > Paper and presentation of YSmart. > Paper: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf > Slides: http://sdrv.ms/UpwJJc -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
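The three correlation types defined in the issue description can be illustrated with a small sketch. This is not the optimizer's code: the `MRJob` class, its fields, and the example job and table names are all hypothetical, and a real Hive plan is a tree of operators rather than a set of job records.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MRJob:
    """Hypothetical stand-in for one MapReduce job in a query plan."""
    name: str
    inputs: FrozenSet[str]              # names of the input relations
    partition_key: str                  # key used to shuffle data to reducers
    children: Tuple["MRJob", ...] = ()  # jobs whose output this job consumes


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    # IC: the input relation sets are not disjoint.
    return bool(a.inputs & b.inputs)


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    # TC: input correlation plus the same partition key.
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job: MRJob, child: MRJob) -> bool:
    # JFC: the job shares its partition key with one of its child nodes.
    return child in job.children and job.partition_key == child.partition_key


# Made-up example: two aggregations over overlapping inputs feed one join.
agg_a = MRJob("agg_a", frozenset({"lineitem"}), "orderkey")
agg_b = MRJob("agg_b", frozenset({"lineitem", "orders"}), "orderkey")
agg_c = MRJob("agg_c", frozenset({"orders"}), "custkey")
join = MRJob("join", frozenset({"tmp_a", "tmp_b"}), "orderkey",
             children=(agg_a, agg_b))
```

In this toy plan `agg_a` and `agg_b` have transit correlation, `agg_b` and `agg_c` have only input correlation, and `join` has job flow correlation with both of its children, which is the situation the description says can be merged into a single MR job.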
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496528#comment-13496528 ] He Yongqiang commented on HIVE-2206: @Carl, you can go ahead and comment; Huai will address the comments in a separate diff. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, > HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > This issue proposes a new logical optimizer called Correlation Optimizer, > which is used to merge correlated MapReduce jobs (MR jobs) into a single MR > job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The > paper and slides of YSmart are linked at the bottom. > Since Hive translates queries in a sentence-by-sentence fashion, for every > operation which may need to shuffle the data (e.g. join and aggregation > operations), Hive will generate a MapReduce job for that operation. However, > the operations which may need to shuffle the data may involve the > correlations explained below and thus can be executed in a single MR job.
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496399#comment-13496399 ] He Yongqiang commented on HIVE-2206: +1, i will commit after tests pass. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, > HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > This issue proposes a new logical optimizer called Correlation Optimizer, > which is used to merge correlated MapReduce jobs (MR jobs) into a single MR > job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The > paper and slides of YSmart are linked at the bottom. > Since Hive translates queries in a sentence by sentence fashion, for every > operation which may need to shuffle the data (e.g. join and aggregation > operations), Hive will generate a MapReduce job for that operation. However, > for those operations which may need to shuffle the data, they may involve > correlations explained below and thus can be executed in a single MR job. 
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496389#comment-13496389 ] He Yongqiang commented on HIVE-3585: Vote for -1. I did not see any benefit in adding one that is just a copycat of rcfile. > Integrate Trevni as another columnar oriented file format > - > > Key: HIVE-3585 > URL: https://issues.apache.org/jira/browse/HIVE-3585 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.10.0 >Reporter: alex gemini >Assignee: Jakob Homan >Priority: Minor > > Add the new Avro module Trevni as another columnar format. A new columnar format > needs a columnar SerDe; fastutil seems a good choice. The Shark project uses > the fastutil library as its columnar SerDe library, but it seems too large (almost > 15m) for just a few primitive array collections.
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466937#comment-13466937 ] He Yongqiang commented on HIVE-2206: I will be on vacation this whole week. Given this is a very big diff, I will keep this open for another week or two for more comments. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466625#comment-13466625 ] He Yongqiang commented on HIVE-2206: @Carl, i just reverted. I will commit again tomorrow. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466584#comment-13466584 ] He Yongqiang commented on HIVE-2206: I did not see a 24-hour waiting period on the bylaws page? > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466581#comment-13466581 ] He Yongqiang commented on HIVE-2206: @Carl, btw, I did mention a few times in the comments that I am planning to commit this one. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466580#comment-13466580 ] He Yongqiang commented on HIVE-2206: I commented that all tests passed. ok, +1. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2206: --- Resolution: Fixed Status: Resolved (was: Patch Available) I just committed. Thanks for the hard work, Yin Huai! > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466552#comment-13466552 ] He Yongqiang commented on HIVE-2206: All tests passed for me. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 0.10.0 >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, > HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, > HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, > HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, > HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457484#comment-13457484 ] He Yongqiang commented on HIVE-2206: The current patch looks ok. @Carl, please give more specific comments. We should agree that big new features should not be enabled by default. That's too risky. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.10-r1384442.patch.txt, > HIVE-2206.11-r1385084.patch.txt, HIVE-2206.1.patch.txt, > HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, > HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, > HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, > HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424315#comment-13424315 ] He Yongqiang commented on HIVE-2206: For the last few months (almost one year), Yin has been actively maintaining this patch, and I think it is in a very good state to check into trunk. So I will do some final review, and hope to commit it sometime next month. Please feel free to jump in to review the patch and put any comments here before the commit. In the final review, I will make sure this patch does not make big changes to the existing execution path, so that it can simply be disabled like other optimizations in Hive. Yin will still be actively maintaining this patch (helping fix bugs etc.) after the commit. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, > HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, > HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, > HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, > YSmartPatchForHive.patch, testQueries.2.q > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2845) Add support for index joins in Hive
[ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420466#comment-13420466 ] He Yongqiang commented on HIVE-2845: With HIVE-1644, this should be done. Have you looked at the query plan, or looked at the patch of HIVE-1644? Maybe HIVE-1644 does not process join cases (but the code is already there). The filter needs to be pushed down to the mapper to trigger the auto index. > Add support for index joins in Hive > --- > > Key: HIVE-2845 > URL: https://issues.apache.org/jira/browse/HIVE-2845 > Project: Hive > Issue Type: New Feature > Components: Indexing, Query Processor >Reporter: Namit Jain > Labels: indexing, joins, performance > > Hive supports indexes, which are currently used for filters. > It would be very useful to add support for index-based joins in Hive. > If 2 tables A and B are being joined, and an index exists on the join key of > A, > B can be scanned (by the mappers), and for each row in B, a lookup for the > corresponding row in A can be performed. > This can be very useful for some use cases.
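The index-based join described in the issue (scan B, and for each row look up the matching rows of A through an index on A's join key) can be sketched in a few lines. This is only an in-memory illustration under assumed names: `build_index`, `index_join`, and the sample tables are hypothetical, and a real Hive index would point at locations in the table's files rather than hold the rows themselves.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def build_index(rows_a: Iterable[dict], key: str) -> Dict[object, List[dict]]:
    """Map each join-key value of A to the rows of A that carry it."""
    index = defaultdict(list)
    for row in rows_a:
        index[row[key]].append(row)
    return index


def index_join(index_on_a: Dict[object, List[dict]],
               rows_b: Iterable[dict], key_b: str) -> Iterator[dict]:
    """Scan B once; for each row, look up matching rows of A in the index."""
    for b in rows_b:
        for a in index_on_a.get(b[key_b], []):
            yield {**a, **b}


table_a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
table_b = [{"a_id": 2, "v": 10}, {"a_id": 3, "v": 20}, {"a_id": 2, "v": 30}]

joined = list(index_join(build_index(table_a, "id"), table_b, "a_id"))
```

Only B is scanned in full here; A is touched solely through lookups, which is the property the issue is after when A is large and indexed.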
[jira] [Commented] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401960#comment-13401960 ] He Yongqiang commented on HIVE-3086: Hints supplied by the user have proven not very useful. Automatically detecting skewed keys, as the current skew join processor does, will make it more powerful and useful. @Nadeem, can you add more details to the wiki about the differences between the existing one and the one you are working on? The current one cannot process the case where the same join key is skewed in more than one table. Are you targeting those cases? Also, there are some problems with the existing skew join optimization; can you also try to fix those as part of your project? > Skewed Join Optimization > > > Key: HIVE-3086 > URL: https://issues.apache.org/jira/browse/HIVE-3086 > Project: Hive > Issue Type: New Feature >Reporter: Nadeem Moidu >Assignee: Nadeem Moidu > > During a join operation, if one of the columns has a skewed key, it can cause > that particular reducer to become the bottleneck. The following feature will > address it: > https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
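As a rough illustration of what detecting skewed keys automatically could mean, the sketch below counts how often each join key occurs and sets aside the rows whose key reaches a threshold, so they could be handled separately from the regular reducer path. It does not reflect how Hive's skew join processor is implemented; the function names, the threshold, and the sample rows are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

Row = Tuple[Hashable, object]  # (join key, payload)


def find_skewed_keys(keys: Iterable[Hashable], min_count: int) -> Set[Hashable]:
    """Return the keys that occur at least min_count times."""
    counts = Counter(keys)
    return {key for key, n in counts.items() if n >= min_count}


def split_by_skew(rows: List[Row], skewed: Set[Hashable]) -> Tuple[List[Row], List[Row]]:
    """Separate rows with a skewed key from the remaining rows."""
    skewed_rows = [row for row in rows if row[0] in skewed]
    regular_rows = [row for row in rows if row[0] not in skewed]
    return skewed_rows, regular_rows


rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("a", 6)]
skewed = find_skewed_keys((key for key, _ in rows), min_count=3)
skewed_rows, regular_rows = split_by_skew(rows, skewed)
```

With these sample rows only key "a" crosses the threshold, so its four rows are separated from the two regular ones.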
[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager
[ https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2450: --- Attachment: HIVE-2450.3.patch Addressed John's comments. I agree with refactoring the retry logic out, but I think we can do it later when we do the second lock manager (even if we do it now, it will need to change later). > move lock retry logic into ZooKeeperHiveLockManager > --- > > Key: HIVE-2450 > URL: https://issues.apache.org/jira/browse/HIVE-2450 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2450.1.patch, HIVE-2450.2.patch, HIVE-2450.3.patch > >
[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager
[ https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2450: --- Attachment: HIVE-2450.2.patch > move lock retry logic into ZooKeeperHiveLockManager > --- > > Key: HIVE-2450 > URL: https://issues.apache.org/jira/browse/HIVE-2450 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2450.1.patch, HIVE-2450.2.patch > >
[jira] [Resolved] (HIVE-2464) report progress in MapOperator
[ https://issues.apache.org/jira/browse/HIVE-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2464. Resolution: Won't Fix > report progress in MapOperator > -- > > Key: HIVE-2464 > URL: https://issues.apache.org/jira/browse/HIVE-2464 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang >
[jira] [Resolved] (HIVE-2461) Add method to PerfLogger to perform cleanup/final steps.
[ https://issues.apache.org/jira/browse/HIVE-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2461. Resolution: Fixed committed, thanks Kevin Wilfong! > Add method to PerfLogger to perform cleanup/final steps. > > > Key: HIVE-2461 > URL: https://issues.apache.org/jira/browse/HIVE-2461 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2461.1.patch.txt, HIVE-2461.2.patch.txt > > > I think a method added to PerfLogger to perform cleanup/final steps would be > very useful. For example, it could be used to close any database connections > created as part of a PerfLogger subclass, or to perform logging that requires > all perf values to first be calculated.
[jira] [Updated] (HIVE-2462) make INNER a non-reserved keyword
[ https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2462: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed, thanks John! > make INNER a non-reserved keyword > - > > Key: HIVE-2462 > URL: https://issues.apache.org/jira/browse/HIVE-2462 > Project: Hive > Issue Type: Improvement >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.9.0 > > Attachments: HIVE-2462.1.patch > > > HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards > compatibility for queries which were using it as an identifier. This patch > addresses that.
[jira] [Created] (HIVE-2464) report progress in MapOperator
report progress in MapOperator -- Key: HIVE-2464 URL: https://issues.apache.org/jira/browse/HIVE-2464 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang
[jira] [Commented] (HIVE-2461) Add method to PerfLogger to perform cleanup/final steps.
[ https://issues.apache.org/jira/browse/HIVE-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113068#comment-13113068 ] He Yongqiang commented on HIVE-2461: +1, will commit after tests pass > Add method to PerfLogger to perform cleanup/final steps. > > > Key: HIVE-2461 > URL: https://issues.apache.org/jira/browse/HIVE-2461 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2461.1.patch.txt, HIVE-2461.2.patch.txt > > > I think a method added to PerfLogger to perform cleanup/final steps would be > very useful. For example, it could be used to close any database connections > created as part of a PerfLogger subclass, or to perform logging that requires > all perf values to first be calculated.
[jira] [Commented] (HIVE-2462) make INNER a non-reserved keyword
[ https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113027#comment-13113027 ] He Yongqiang commented on HIVE-2462: The patch looks good; we should have it. Running tests. > make INNER a non-reserved keyword > - > > Key: HIVE-2462 > URL: https://issues.apache.org/jira/browse/HIVE-2462 > Project: Hive > Issue Type: Improvement >Reporter: John Sichi >Assignee: John Sichi > Fix For: 0.9.0 > > Attachments: HIVE-2462.1.patch > > > HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards > compatibility for queries which were using it as an identifier. This patch > addresses that.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2451: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed, thanks Siying! > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression > of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where > = 'xxx' > will not trigger input pruning. > The reason is that we assume sample filtering operator only happens as the > second filter after table scan, which is broken by HIVE-1538, even if the > feature doesn't turn on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-2456) JDBCStatsAggregator DELETE STATEMENT should escape _ and %
[ https://issues.apache.org/jira/browse/HIVE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2456. Resolution: Fixed committed, thanks Ning! > JDBCStatsAggregator DELETE STATEMENT should escape _ and % > -- > > Key: HIVE-2456 > URL: https://issues.apache.org/jira/browse/HIVE-2456 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-2456.patch > > > JDBCStatsAggregator first aggregates stats from all publishers, and then > deletes these intermediate results. The delete uses the LIKE operator, so it > needs to escape '_' and '%'.
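To illustrate why the escaping matters: in a LIKE pattern, '_' matches any single character and '%' matches any run of characters, so an unescaped key can delete rows belonging to other keys. The sketch below is a minimal illustration, not the code from HIVE-2456.patch; `LikeEscaper`, `escape`, and `prefixPattern` are hypothetical names, and the backslash escape character is an assumption.

```java
// Hypothetical helper; not the code from HIVE-2456.patch.
class LikeEscaper {
    // Assumed escape character; it must match the ESCAPE clause of the statement.
    static final char ESCAPE = '\\';

    // Prefix the escape character itself and the LIKE wildcards '_' and '%'.
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() + 8);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == ESCAPE || c == '_' || c == '%') {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a prefix pattern: the escaped literal followed by a real '%' wildcard.
    static String prefixPattern(String prefix) {
        return escape(prefix) + "%";
    }
}
```

The resulting pattern would typically be bound as a JDBC parameter to a statement that names the escape character in an `ESCAPE` clause. How that character must be quoted inside the SQL text varies by database, so that part should be checked against the target backend.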
[jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging
[ https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108156#comment-13108156 ] He Yongqiang commented on HIVE-2453: What I mean is: should we tag the Hadoop job, the query, or both? The above example has 2 jobs: the first is a join and the second is a group by. > Need a way to categorize queries in hooks for improved logging > -- > > Key: HIVE-2453 > URL: https://issues.apache.org/jira/browse/HIVE-2453 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt > > > We need a way to categorize queries, such as whether or not they include a > join clause, a group by clause, etc., in the hooks. This will allow for > better performance logging. > Currently the only way I can find is to go through the operators in the > tasks, but which operators are used for the different types of queries may > change over time.
[jira] [Commented] (HIVE-2456) JDBCStatsAggregator DELETE STATEMENT should escape _ and %
[ https://issues.apache.org/jira/browse/HIVE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108025#comment-13108025 ] He Yongqiang commented on HIVE-2456: +1, will commit after tests pass > JDBCStatsAggregator DELETE STATEMENT should escape _ and % > -- > > Key: HIVE-2456 > URL: https://issues.apache.org/jira/browse/HIVE-2456 > Project: Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-2456.patch > > > JDBCStatsAggregator first aggregates stats from all publishers, and then > deletes these intermediate results. The delete uses the LIKE operator, so it > needs to escape '_' and '%'.
[jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging
[ https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108013#comment-13108013 ] He Yongqiang commented on HIVE-2453: I haven't looked at the change yet. Just a small question: for a query like "select key, count(1) from (select a.key as key, b.value as value from src a join src b on a.key=b.key) group by key", what tag will the query get? > Need a way to categorize queries in hooks for improved logging > -- > > Key: HIVE-2453 > URL: https://issues.apache.org/jira/browse/HIVE-2453 > Project: Hive > Issue Type: Improvement >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt > > > We need a way to categorize queries, such as whether or not they include a > join clause, a group by clause, etc., in the hooks. This will allow for > better performance logging. > Currently the only way I can find is to go through the operators in the > tasks, but which operators are used for the different types of queries may > change over time.
[jira] [Updated] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager
[ https://issues.apache.org/jira/browse/HIVE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2450: --- Attachment: HIVE-2450.1.patch > move lock retry logic into ZooKeeperHiveLockManager > --- > > Key: HIVE-2450 > URL: https://issues.apache.org/jira/browse/HIVE-2450 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2450.1.patch > >
[jira] [Commented] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105782#comment-13105782 ] He Yongqiang commented on HIVE-2451: +1, will commit after tests pass > TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression > of HIVE-1538 > -- > > Key: HIVE-2451 > URL: https://issues.apache.org/jira/browse/HIVE-2451 > Project: Hive > Issue Type: Bug >Reporter: Siying Dong >Assignee: Siying Dong > Attachments: HIVE-2451.1.patch > > > Example: > select count(1) from TABLESAMPLE(BUCKET xxx out of yyy) where > = 'xxx' > will not trigger input pruning. > The reason is that we assume sample filtering operator only happens as the > second filter after table scan, which is broken by HIVE-1538, even if the > feature doesn't turn on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2450) move lock retry logic into ZooKeeperHiveLockManager
move lock retry logic into ZooKeeperHiveLockManager --- Key: HIVE-2450 URL: https://issues.apache.org/jira/browse/HIVE-2450 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105619#comment-13105619 ] He Yongqiang commented on HIVE-2206: OK. How about just "correlation"? Also, can you check whether it is possible to do the optimization as part of the physical optimizer? We need a lot of code cleanup in the current patch. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: Queries, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105098#comment-13105098 ] He Yongqiang commented on HIVE-2206: Cool! Yin, please let us know when you are mostly done. One small thing: in the Hive code, let's call the new optimizer "cooperative scan" instead of YSmart. We can still add the paper reference in a comment. > add a new optimizer for query correlation discovery and optimization > > > Key: HIVE-2206 > URL: https://issues.apache.org/jira/browse/HIVE-2206 > Project: Hive > Issue Type: New Feature >Reporter: He Yongqiang >Assignee: Yin Huai > Attachments: Queries, YSmartPatchForHive.patch > > > reference: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104835#comment-13104835 ] He Yongqiang commented on HIVE-2420: awesome, will first try the config setting. > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2420.reproduce.diff > >
[jira] [Updated] (HIVE-2217) add Query text for debugging in lock data
[ https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2217: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed, thanks Jiayan! > add Query text for debugging in lock data > - > > Key: HIVE-2217 > URL: https://issues.apache.org/jira/browse/HIVE-2217 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.7.1 >Reporter: Namit Jain >Assignee: Jiayan Jiang > Attachments: hive_diff2 > > > Currently, the queryId is stored in the lock data - > Query text would improve the debuggability -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2217) add Query text for debugging in lock data
[ https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103161#comment-13103161 ] He Yongqiang commented on HIVE-2217: +1, will commit after tests pass. > add Query text for debugging in lock data > - > > Key: HIVE-2217 > URL: https://issues.apache.org/jira/browse/HIVE-2217 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.7.1 >Reporter: Namit Jain >Assignee: Jiayan Jiang > Attachments: hive_diff2 > > > Currently, the queryId is stored in the lock data - > Query text would improve the debuggability -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1975) "insert overwrite directory" Not able to insert data with multi level directory path
[ https://issues.apache.org/jira/browse/HIVE-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103160#comment-13103160 ] He Yongqiang commented on HIVE-1975: What's the use case here? The user can always create the parent dir first. If users misspell the dir name, they may not want the dirs created. Or worse, the data gets loaded to some other place without them noticing. > "insert overwrite directory" Not able to insert data with multi level > directory path > > > Key: HIVE-1975 > URL: https://issues.apache.org/jira/browse/HIVE-1975 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 > Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise > Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam > Attachments: HIVE-1975.patch > > > Execution of the query below fails > Ex: > {noformat} >insert overwrite directory '/HIVEFT25686/chinna/' select * from dept_j; > {noformat}
[jira] [Updated] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1996: --- Status: Open (was: Patch Available) > "LOAD DATA INPATH" fails when the table already contains a file of the same > name > > > Key: HIVE-1996 > URL: https://issues.apache.org/jira/browse/HIVE-1996 > Project: Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Kirk True >Assignee: Chinna Rao Lalam > Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch > > > Steps: > 1. From the command line copy the kv2.txt data file into the current user's > HDFS directory: > {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt > kv2.txt}} > 2. In Hive, create the table: > {{create table tst_src1 (key_ int, value_ string);}} > 3. Load the data into the table from HDFS: > {{load data inpath './kv2.txt' into table tst_src1;}} > 4. Repeat step 1 > 5. Repeat step 3 > Expected: > To have kv2.txt renamed in HDFS and then copied to the destination as per > HIVE-307. > Actual: > File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} > as it continues to use the same array elements (with the un-renamed, old file > names). 
It crashes with this error: > {noformat} > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725) > at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
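The failure mode described above (files are renamed, but the copy step keeps using the pre-rename array) can be modelled with a small sketch. This is not the code of `Hive.copyFiles` or of the attached patches: `RenameThenCopySketch`, `uniqueName`, `resolveSources`, and the `_copy_N` naming scheme are hypothetical, and plain strings stand in for HDFS paths.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the failure mode; plain strings stand in for HDFS paths.
class RenameThenCopySketch {
    // Pick a name that is not already taken, e.g. kv2_copy_1.txt (assumed scheme).
    static String uniqueName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name;
        }
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        int n = 1;
        String candidate;
        do {
            candidate = base + "_copy_" + n++ + ext;
        } while (existing.contains(candidate));
        return candidate;
    }

    // Return the post-rename names so the caller copies what now exists,
    // rather than reusing the original, now stale, list of sources.
    static List<String> resolveSources(List<String> srcs, Set<String> destination) {
        Set<String> taken = new HashSet<>(destination);
        List<String> renamed = new ArrayList<>();
        for (String src : srcs) {
            String name = uniqueName(src, taken);
            taken.add(name);
            renamed.add(name);
        }
        return renamed;
    }
}
```

The point of the sketch is only that the copy step consumes the list returned by the rename step, not the list it was given.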
[jira] [Commented] (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103155#comment-13103155 ] He Yongqiang commented on HIVE-1996: For this, we need to make the rename optional and disabled by default. If rename is disabled, we should throw an error to the user. > "LOAD DATA INPATH" fails when the table already contains a file of the same > name > > > Key: HIVE-1996 > URL: https://issues.apache.org/jira/browse/HIVE-1996 > Project: Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Kirk True >Assignee: Chinna Rao Lalam > Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch > > > Steps: > 1. From the command line copy the kv2.txt data file into the current user's > HDFS directory: > {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt > kv2.txt}} > 2. In Hive, create the table: > {{create table tst_src1 (key_ int, value_ string);}} > 3. Load the data into the table from HDFS: > {{load data inpath './kv2.txt' into table tst_src1;}} > 4. Repeat step 1 > 5. Repeat step 3 > Expected: > To have kv2.txt renamed in HDFS and then copied to the destination as per > HIVE-307. > Actual: > File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} > as it continues to use the same array elements (with the un-renamed, old file > names).
It crashes with this error: > {noformat} > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725) > at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541) > at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files
[ https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2440: --- Attachment: HIVE-2440.3.patch removed childrenPaths from MapOp > make hive mapper initialize faster when having tons of input files > -- > > Key: HIVE-2440 > URL: https://issues.apache.org/jira/browse/HIVE-2440 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch, HIVE-2440.3.patch > > > when one hive job has tons of input files, a lot of mappers may fail because > of slow initialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2440) make hive mapper initialize faster when having tons of input files
[ https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102934#comment-13102934 ] He Yongqiang commented on HIVE-2440: https://reviews.apache.org/r/1813/ > make hive mapper initialize faster when having tons of input files > -- > > Key: HIVE-2440 > URL: https://issues.apache.org/jira/browse/HIVE-2440 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch > > > when one hive job has tons of input files, a lot of mappers may fail because > of slow initialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files
[ https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2440: --- Attachment: HIVE-2440.2.patch This fixes test failure on combine3 > make hive mapper initialize faster when having tons of input files > -- > > Key: HIVE-2440 > URL: https://issues.apache.org/jira/browse/HIVE-2440 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2440.1.patch, HIVE-2440.2.patch > > > when one hive job has tons of input files, a lot of mappers may fail because > of slow initialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2440) make hive mapper initialize faster when having tons of input files
[ https://issues.apache.org/jira/browse/HIVE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2440: --- Attachment: HIVE-2440.1.patch > make hive mapper initialize faster when having tons of input files > -- > > Key: HIVE-2440 > URL: https://issues.apache.org/jira/browse/HIVE-2440 > Project: Hive > Issue Type: Improvement >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2440.1.patch > > > when one hive job has tons of input files, a lot of mappers may fail because > of slow initialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2440) make hive mapper initialize faster when having tons of input files
make hive mapper initialize faster when having tons of input files -- Key: HIVE-2440 URL: https://issues.apache.org/jira/browse/HIVE-2440 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang when one hive job has tons of input files, a lot of mappers may fail because of slow initialization.
[jira] [Resolved] (HIVE-2429) skip corruption bug that cause data not decompressed
[ https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2429. Resolution: Fixed committed, thanks Ramkumar Vadali! > skip corruption bug that cause data not decompressed > > > Key: HIVE-2429 > URL: https://issues.apache.org/jira/browse/HIVE-2429 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Ramkumar Vadali > Attachments: HIVE-2429.patch > > > This is a regression of https://issues.apache.org/jira/browse/HIVE-2404 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2429) skip corruption bug that cause data not decompressed
[ https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099268#comment-13099268 ] He Yongqiang commented on HIVE-2429: +1, will commit after tests pass > skip corruption bug that cause data not decompressed > > > Key: HIVE-2429 > URL: https://issues.apache.org/jira/browse/HIVE-2429 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Ramkumar Vadali > Attachments: HIVE-2429.patch > > > This is a regression of https://issues.apache.org/jira/browse/HIVE-2404 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099132#comment-13099132 ] He Yongqiang commented on HIVE-2420: I think a quick fix may be to just revert the dedup filters diff. What do you think? > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2420.reproduce.diff > >
[jira] [Assigned] (HIVE-2429) skip corruption bug that cause data not decompressed
[ https://issues.apache.org/jira/browse/HIVE-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang reassigned HIVE-2429: -- Assignee: Ramkumar Vadali (was: He Yongqiang) > skip corruption bug that cause data not decompressed > > > Key: HIVE-2429 > URL: https://issues.apache.org/jira/browse/HIVE-2429 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Ramkumar Vadali > > This is a regression of https://issues.apache.org/jira/browse/HIVE-2404 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2429) skip corruption bug that cause data not decompressed
skip corruption bug that cause data not decompressed Key: HIVE-2429 URL: https://issues.apache.org/jira/browse/HIVE-2429 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang This is a regression of https://issues.apache.org/jira/browse/HIVE-2404
[jira] [Commented] (HIVE-2415) disallow partition column names when doing replace columns
[ https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098414#comment-13098414 ] He Yongqiang commented on HIVE-2415: @Ashutosh, yeah, I understand your point about moving the validation from the client to the metastore server. Another concern is that we want the Hive metastore to have much more flexibility than the client side, so that if something goes wrong for any reason, we can use the Thrift metastore interface to fix it. For example, if a table somehow has a normal column whose name conflicts with a partition column, we won't be able to fix it if we do the validation on the metastore side. > disallow partition column names when doing replace columns > -- > > Key: HIVE-2415 > URL: https://issues.apache.org/jira/browse/HIVE-2415 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2415.1.patch > > > alter table replace columns allows adding a column with the same name as a > partition column, which introduces inconsistency. > We should disallow this.
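A minimal sketch of a client-side check of the kind discussed in this thread, assuming column names are compared case-insensitively. `ReplaceColumnsCheck` and `findConflict` are hypothetical names; this is not the code in HIVE-2415.1.patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side validation; not the code in HIVE-2415.1.patch.
class ReplaceColumnsCheck {
    // Returns the first new column name that collides with a partition
    // column, or null when there is no conflict.
    static String findConflict(List<String> newColumns, List<String> partitionColumns) {
        Set<String> parts = new HashSet<>();
        for (String p : partitionColumns) {
            parts.add(p.toLowerCase(Locale.ROOT));   // assumed: names are case-insensitive
        }
        for (String c : newColumns) {
            if (parts.contains(c.toLowerCase(Locale.ROOT))) {
                return c;
            }
        }
        return null;
    }
}
```

Keeping the check on the client side, as the comment argues, leaves the Thrift metastore interface free to repair a table that is already in an inconsistent state.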
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098251#comment-13098251 ] He Yongqiang commented on HIVE-2420: This is pretty important. It will block us from testing and deploying the open source trunk. > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Amareshwari Sriramadasu > Attachments: HIVE-2420.reproduce.diff > >
[jira] [Resolved] (HIVE-2404) Allow RCFile Reader to tolerate corruptions
[ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2404. Resolution: Fixed committed, thanks Ramkumar! > Allow RCFile Reader to tolerate corruptions > --- > > Key: HIVE-2404 > URL: https://issues.apache.org/jira/browse/HIVE-2404 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Minor > Attachments: toleratecorruptions.2.patch, > toleratecorruptions.3.patch, toleratecorruptions.patch > > > Sometimes it is useful to tolerate corruptions during a query and return > results based on the files that can be processed. A single corrupt block of > data should not prevent reading the rest of the data. > We need a way to gracefully ignore errors while reading a RC File -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
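The behaviour requested here (skip a corrupt block, keep reading the rest) can be sketched generically. The code below is not the RCFile reader and not the attached patches: `TolerantReadSketch`, `Block`, `readAll`, and the `tolerateCorruptions` flag are hypothetical names used only to show the control flow.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader loop; not the RCFile reader.
class TolerantReadSketch {
    // Stand-in for one block of data that may fail to read.
    interface Block {
        String read() throws IOException;
    }

    int skipped = 0;   // number of corrupt blocks that were ignored

    List<String> readAll(List<Block> blocks, boolean tolerateCorruptions) {
        List<String> rows = new ArrayList<>();
        for (Block b : blocks) {
            try {
                rows.add(b.read());
            } catch (IOException e) {
                if (!tolerateCorruptions) {
                    // Default behaviour: one bad block fails the whole read.
                    throw new UncheckedIOException(e);
                }
                skipped++;   // tolerate: count the block and move on
            }
        }
        return rows;
    }
}
```

Counting the skipped blocks lets the caller report how much data was ignored, which matters because results computed this way are partial.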
[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions
[ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096476#comment-13096476 ] He Yongqiang commented on HIVE-2404: +1, will commit after tests pass > Allow RCFile Reader to tolerate corruptions > --- > > Key: HIVE-2404 > URL: https://issues.apache.org/jira/browse/HIVE-2404 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Minor > Attachments: toleratecorruptions.2.patch, > toleratecorruptions.3.patch, toleratecorruptions.patch > > > Sometimes it is useful to tolerate corruptions during a query and return > results based on the files that can be processed. A single corrupt block of > data should not prevent reading the rest of the data. > We need a way to gracefully ignore errors while reading a RC File -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions
[ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096295#comment-13096295 ] He Yongqiang commented on HIVE-2404: Awesome feature! I left some nitpick comments on Review Board. Thanks! > Allow RCFile Reader to tolerate corruptions > --- > > Key: HIVE-2404 > URL: https://issues.apache.org/jira/browse/HIVE-2404 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.1 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Minor > Attachments: toleratecorruptions.2.patch, toleratecorruptions.patch > > > Sometimes it is useful to tolerate corruptions during a query and return > results based on the files that can be processed. A single corrupt block of > data should not prevent reading the rest of the data. > We need a way to gracefully ignore errors while reading a RC File
[jira] [Updated] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2413: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed, thanks Krishna Kumar! > BlockMergeTask ignores client-specified jars > > > Key: HIVE-2413 > URL: https://issues.apache.org/jira/browse/HIVE-2413 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2413.v0.patch, HIVE-2413.v1.patch > > > User-specified jars are not added to the hadoop tasks while executing a > BlockMergeTask resulting in a ClassNotFoundException. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095608#comment-13095608 ] He Yongqiang commented on HIVE-2417: Committed, thanks Krishna Kumar! > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2417: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094963#comment-13094963 ] He Yongqiang commented on HIVE-2417: +1, will commit after tests pass > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094962#comment-13094962 ] He Yongqiang commented on HIVE-2413:
[junit] java.lang.IllegalArgumentException: Can not create a Path from an empty string
[junit] at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
[junit] at org.apache.hadoop.fs.Path.<init>(Path.java:90)
[junit] at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:602)
[junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
[junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
[junit] at org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execute(BlockMergeTask.java:203)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.mergeFiles(DDLTask.java:410)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:366)
[junit] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
[junit] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
[junit] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
[junit] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
[junit] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
[junit] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:638)
[junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table(TestCliDriver.java:1190)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
I got this error with a number of test cases, including rcfile_merge3.q, load_fs.q, and alter_merge.q. Can you take a look?
> BlockMergeTask ignores client-specified jars > > > Key: HIVE-2413 > URL: https://issues.apache.org/jira/browse/HIVE-2413 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2413.v0.patch > > > User-specified jars are not added to the hadoop tasks while executing a > BlockMergeTask resulting in a ClassNotFoundException.
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094298#comment-13094298 ] He Yongqiang commented on HIVE-2417: By "2 inserts", I mean remove the "load" command and use 2 inserts to populate the data. > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect.
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094297#comment-13094297 ] He Yongqiang commented on HIVE-2417:
bq. The 'create' adds one file, and the insert adds another file.
Sorry, I thought you were doing an "insert overwrite". Can you do 2 inserts?
bq. This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using Default compression codec instead? Fine with me but why is that important?
Yes. I mean that if you remove this line and keep the line "set hive.exec.compress.output = true;", the output will be compressed using DefaultCodec. The reason is that BZip2 may not be installed for all Hive users and developers.
> Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect.
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094287#comment-13094287 ] He Yongqiang commented on HIVE-2417:
Good catch, this is a regression introduced in HIVE-2396. Can you make it easier to reproduce the problem with the testcase? I mean that, without the change in this diff, running that testcase should produce an error or incorrect results.
1. Remove this line: "+set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;".
2. tgt_rc_merge_test only contains one file, so the 'alter table tgt_rc_merge_test concatenate;' will basically do nothing. Can you make sure this table contains at least 2 files? You can upload 2 gzip compressed rcfiles if it does not.
> Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect.
[jira] [Commented] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094278#comment-13094278 ] He Yongqiang commented on HIVE-2413: +1, will commit after tests pass. > BlockMergeTask ignores client-specified jars > > > Key: HIVE-2413 > URL: https://issues.apache.org/jira/browse/HIVE-2413 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2413.v0.patch > > > User-specified jars are not added to the hadoop tasks while executing a > BlockMergeTask resulting in a ClassNotFoundException. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094276#comment-13094276 ] He Yongqiang commented on HIVE-2420: Amareshwari, please feel free to reassign this to me if you do not have time for it. Thanks! > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Amareshwari Sriramadasu > Attachments: HIVE-2420.reproduce.diff > >
[jira] [Updated] (HIVE-2422) remove the intermediate dir when the hive query finish
[ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2422: --- Summary: remove the intermediate dir when the hive query finish (was: remove the intermediate dir of one hive query when it finish ) > remove the intermediate dir when the hive query finish > --- > > Key: HIVE-2422 > URL: https://issues.apache.org/jira/browse/HIVE-2422 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > > Right now, if one Hive query is compiled into 2 MR jobs and the first job's > output feeds the second job, the first job's output should be removed > when the query finishes.
[jira] [Created] (HIVE-2422) remove the intermediate dir of one hive query when it finish
remove the intermediate dir of one hive query when it finish - Key: HIVE-2422 URL: https://issues.apache.org/jira/browse/HIVE-2422 Project: Hive Issue Type: Bug Reporter: He Yongqiang Right now, if one Hive query is compiled into 2 MR jobs and the first job's output feeds the second job, the first job's output should be removed when the query finishes.
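The cleanup requested in this issue can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the Hive change itself: it uses the local filesystem through `java.nio.file` rather than HDFS, and `IntermediateDirCleanup`, `runTwoStages`, and the `stage1-` directory prefix are hypothetical names introduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical two-stage pipeline on the local filesystem; Hive itself works against HDFS.
final class IntermediateDirCleanup {
    // Deletes dir and everything beneath it, children before parents.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> deepestFirst =
                walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stage 1 writes rows into an intermediate dir and stage 2 counts them.
    // The finally block removes the intermediate dir whether or not stage 2 succeeds.
    static long runTwoStages(Path scratchRoot) {
        Path intermediate = null;
        try {
            intermediate = Files.createTempDirectory(scratchRoot, "stage1-");
            Path part = intermediate.resolve("part-00000");
            // Stage 1 output (three rows).
            Files.write(part, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
            // Stage 2 reads the output of stage 1.
            try (Stream<String> rows = Files.lines(part)) {
                return rows.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (intermediate != null) {
                deleteRecursively(intermediate);
            }
        }
    }
}
```

Putting the delete in a `finally` block is one possible design choice: the intermediate output goes away on both the success and the failure path, at the cost of losing it for debugging after a failed query.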
[jira] [Assigned] (HIVE-2422) remove the intermediate dir of one hive query when it finish
[ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang reassigned HIVE-2422: -- Assignee: He Yongqiang > remove the intermediate dir of one hive query when it finish > - > > Key: HIVE-2422 > URL: https://issues.apache.org/jira/browse/HIVE-2422 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > > Right now, if one Hive query is compiled into 2 MR jobs and the first job's > output feeds the second job, the first job's output should be removed > when the query finishes.
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094115#comment-13094115 ] He Yongqiang commented on HIVE-2420: Amareshwari, can you help take a look? There is a .q file in the diff, and the query in that .q file should be converted to a sort merge join, but it is not. I think this is because, after ppd, the partition pruner expr is not correctly populated. > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Amareshwari Sriramadasu > Attachments: HIVE-2420.reproduce.diff > >
[jira] [Updated] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2420: --- Attachment: HIVE-2420.reproduce.diff > partition pruner expr is not populated due to some bug in ppd > - > > Key: HIVE-2420 > URL: https://issues.apache.org/jira/browse/HIVE-2420 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: Amareshwari Sriramadasu > Attachments: HIVE-2420.reproduce.diff > > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
partition pruner expr is not populated due to some bug in ppd - Key: HIVE-2420 URL: https://issues.apache.org/jira/browse/HIVE-2420 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: Amareshwari Sriramadasu Attachments: HIVE-2420.reproduce.diff -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2415) disallow partition column names when doing replace columns
[ https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094094#comment-13094094 ] He Yongqiang commented on HIVE-2415: @Ashutosh Chauhan, today it makes 2 metastore calls: one in DDLSemanticAnalyzer and the other in DDLTask. Merging these 2 (check and change) into the metastore server would save one metastore call but add more load to the metastore. Since this is only for a DDL command, it should be fine. > disallow partition column names when doing replace columns > -- > > Key: HIVE-2415 > URL: https://issues.apache.org/jira/browse/HIVE-2415 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2415.1.patch > > > alter table replace columns allows adding a column with the same name as a > partition column, which introduces inconsistency. > We should disallow this.
[jira] [Updated] (HIVE-2415) disallow partition column names when doing replace columns
[ https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2415: --- Status: Patch Available (was: Open) > disallow partition column names when doing replace columns > -- > > Key: HIVE-2415 > URL: https://issues.apache.org/jira/browse/HIVE-2415 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2415.1.patch > > > alter table replace columns allows adding a column with the same name as a > partition column, which introduces inconsistency. > We should disallow this.
[jira] [Updated] (HIVE-2415) disallow partition column names when doing replace columns
[ https://issues.apache.org/jira/browse/HIVE-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2415: --- Attachment: HIVE-2415.1.patch > disallow partition column names when doing replace columns > -- > > Key: HIVE-2415 > URL: https://issues.apache.org/jira/browse/HIVE-2415 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2415.1.patch > > > alter table replace columns allows adding a column with the same name as a > partition column, which introduces inconsistency. > We should disallow this.
[jira] [Created] (HIVE-2415) disallow partition column names when doing replace columns
disallow partition column names when doing replace columns -- Key: HIVE-2415 URL: https://issues.apache.org/jira/browse/HIVE-2415 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang alter table replace columns allows adding a column with the same name as a partition column, which introduces inconsistency. We should disallow this.
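The check this issue asks for could look something like the Java sketch below. It is a minimal illustration rather than the attached patch: `ReplaceColumnsCheck`, `conflicts`, and `validate` are hypothetical names, and the case-insensitive comparison is an assumption about how column names should be matched.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical validation helper; names are illustrative, not Hive's real API.
final class ReplaceColumnsCheck {
    // Returns the replacement column names that collide with a partition column.
    static List<String> conflicts(List<String> newColumns, List<String> partitionColumns) {
        Set<String> partNames = new HashSet<>();
        for (String p : partitionColumns) {
            // Assumption: column names are compared case-insensitively.
            partNames.add(p.toLowerCase(Locale.ROOT));
        }
        List<String> clashes = new ArrayList<>();
        for (String c : newColumns) {
            if (partNames.contains(c.toLowerCase(Locale.ROOT))) {
                clashes.add(c);
            }
        }
        return clashes;
    }

    // Rejects the operation when any replacement column reuses a partition column name.
    static void validate(List<String> newColumns, List<String> partitionColumns) {
        List<String> clashes = conflicts(newColumns, partitionColumns);
        if (!clashes.isEmpty()) {
            throw new IllegalArgumentException(
                "Column name conflicts with partition column: " + clashes);
        }
    }
}
```

For a table partitioned by a column named `ds`, calling `validate` with a replacement list that contains `ds` would throw, while a list without it would pass through.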
[jira] [Resolved] (HIVE-2406) return empty list instead of null for get_privileges
[ https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2406. Resolution: Duplicate merge with HIVE-2405 > return empty list instead of null for get_privileges > > > Key: HIVE-2406 > URL: https://issues.apache.org/jira/browse/HIVE-2406 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > > This is to remove the thrift exception when running hive, which enables > authorization and uses a thrift remote metastore. > this is an example of stack: >> show grant user heyongqiang; > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at > org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784) > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: 
org.apache.thrift.TApplicationException: list_privileges failed: > unknown result > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086) > at > org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782) > ... 16 more > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at > 
org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784) > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450) > ... 15 more > Caused by: org.apache.thrift.TApplicationException: list_privileges failed: > unknown result > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_p
[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-2405: --- Attachment: HIVE-2405.2.patch > get_privilege does not get user level privilege > --- > > Key: HIVE-2405 > URL: https://issues.apache.org/jira/browse/HIVE-2405 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch > > > hive> set hive.security.authorization.enabled=true; > hive> grant all to user heyongqiang; > hive> show grant user heyongqiang; > principalName heyongqiang > principalType USER > privilege All > grantTime Wed Aug 24 11:51:54 PDT 2011 > grantor heyongqiang > Time taken: 0.032 seconds > hive> CREATE TABLE src (foo INT, bar STRING); > Authorization failed:No privilege 'Create' found for outputs { > database:default}. Use show grant to get more details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090560#comment-13090560 ] He Yongqiang commented on HIVE-2405: a new patch merged with HIVE-2406 > get_privilege does not get user level privilege > --- > > Key: HIVE-2405 > URL: https://issues.apache.org/jira/browse/HIVE-2405 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch > > > hive> set hive.security.authorization.enabled=true; > hive> grant all to user heyongqiang; > hive> show grant user heyongqiang; > principalName heyongqiang > principalType USER > privilege All > grantTime Wed Aug 24 11:51:54 PDT 2011 > grantor heyongqiang > Time taken: 0.032 seconds > hive> CREATE TABLE src (foo INT, bar STRING); > Authorization failed:No privilege 'Create' found for outputs { > database:default}. Use show grant to get more details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2406) return empty list instead of null for get_privileges
[ https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090559#comment-13090559 ] He Yongqiang commented on HIVE-2406: with the fix, it is more clean: hive> show grant user heyongqiang; OK Time taken: 0.121 seconds will merge this small change with HIVE-2405 > return empty list instead of null for get_privileges > > > Key: HIVE-2406 > URL: https://issues.apache.org/jira/browse/HIVE-2406 > Project: Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > > This is to remove the thrift exception when running hive, which enables > authorization and uses a thrift remote metastore. > this is an example of stack: >> show grant user heyongqiang; > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at > org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784) > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at 
java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: org.apache.thrift.TApplicationException: list_privileges failed: > unknown result > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086) > at > org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782) > ... 16 more > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: list_privileges failed: unknown > result > at > org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784) > at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450) > ... 15 more > Caused by: org.apache.thrift.TApplicationException: list_privileges failed: > unknown result > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769) > at > org.apache.hadoop.hive.metastore.
[jira] [Updated] (HIVE-2406) return empty list instead of null for get_privileges
[ https://issues.apache.org/jira/browse/HIVE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2406:
---

Description:
This is to remove the Thrift exception thrown when running Hive with authorization enabled against a remote Thrift metastore.

This is an example stack trace:

> show grant user heyongqiang;
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)
    ... 16 more
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:597)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:351)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:660)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1784)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:450)
    ... 15 more
Caused by: org.apache.thrift.TApplicationException: list_privileges failed: unknown result
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_list_privileges(ThriftHiveMetastore.java:2769)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.list_privileges(ThriftHiveMetastore.java:2734)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.list_privileges(HiveMetaStoreClient.java:1086)
    at org.apache.hadoop.hive.ql.metadata.Hive.showPrivilegeGrant(Hive.java:1782)

was:
This is to remove the Thrift exception thrown when running Hive with authorization enabled against a remote Thrift metastore.

> return empty list instead of null for get_privileges
>
>
> Key: HIVE-2406
> URL: https://issues.apa
[jira] [Created] (HIVE-2406) return empty list instead of null for get_privileges
return empty list instead of null for get_privileges

Key: HIVE-2406
URL: https://issues.apache.org/jira/browse/HIVE-2406
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang

This is to remove the Thrift exception thrown when running Hive with authorization enabled against a remote Thrift metastore.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
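The fix named in the issue title can be illustrated with a minimal sketch. A Thrift-generated Java client typically raises "unknown result" when the reply carries neither a success value nor a declared exception, which is what happens when the server-side handler returns null. Normalizing null to an empty list before returning avoids that. The class and method names below (PrivilegeLists, orEmpty) are hypothetical and are not taken from the Hive codebase or from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hive.
final class PrivilegeLists {
    private PrivilegeLists() {}

    // Returns the given list unchanged, or a new empty list when it is null,
    // so that a Thrift handler never hands back a null result.
    static <T> List<T> orEmpty(List<T> privileges) {
        return privileges == null ? new ArrayList<T>() : privileges;
    }
}
```

A metastore handler method could wrap its return value in such a call, so that a user with no grants yields an empty result rather than an exception on the client.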
[jira] [Commented] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090446#comment-13090446 ]

He Yongqiang commented on HIVE-2405:

this patch can also be applied to 0.7

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive> grant all to user heyongqiang;
> hive> show grant user heyongqiang;
> principalName heyongqiang
> principalType USER
> privilege All
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor heyongqiang
> Time taken: 0.032 seconds
> hive> CREATE TABLE src (foo INT, bar STRING);
> Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.
[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2405:
---

Status: Patch Available (was: Open)

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive> grant all to user heyongqiang;
> hive> show grant user heyongqiang;
> principalName heyongqiang
> principalType USER
> privilege All
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor heyongqiang
> Time taken: 0.032 seconds
> hive> CREATE TABLE src (foo INT, bar STRING);
> Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.
[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2405:
---

Attachment: HIVE-2405.1.patch

> get_privilege does not get user level privilege
> ---
>
> Key: HIVE-2405
> URL: https://issues.apache.org/jira/browse/HIVE-2405
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2405.1.patch
>
>
> hive> set hive.security.authorization.enabled=true;
> hive> grant all to user heyongqiang;
> hive> show grant user heyongqiang;
> principalName heyongqiang
> principalType USER
> privilege All
> grantTime Wed Aug 24 11:51:54 PDT 2011
> grantor heyongqiang
> Time taken: 0.032 seconds
> hive> CREATE TABLE src (foo INT, bar STRING);
> Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.