[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571130#comment-13571130 ]

Gunther Hagleitner commented on HIVE-2340:
--

Ah, that's good info. Makes sense now. The patch is useful as is, but is the only way to actually optimize the groupby/orderby case to do the ratio thing as a conditional task? And if so, would that be this or a follow-up jira?

optimize orderby followed by a groupby
--
Key: HIVE-2340
URL: https://issues.apache.org/jira/browse/HIVE-2340
Project: Hive
Issue Type: Sub-task
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: perfomance
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt

Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3988) lateral view followed by mapjoin should not be allowed
Namit Jain created HIVE-3988:

Summary: lateral view followed by mapjoin should not be allowed
Key: HIVE-3988
URL: https://issues.apache.org/jira/browse/HIVE-3988
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain

Consider the following queries:

drop table lazy_array_map;
create table lazy_array_map (map_col map<int,string>, array_col array<string>);
INSERT OVERWRITE TABLE lazy_array_map select map(1,'one',2,'two',3,'three'), array('100','200','300') FROM src LIMIT 1;

select /*+ MAPJOIN(a) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

select /*+ MAPJOIN(subq1) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

The last 2 queries should throw an error, but they work fine right now. The same effect can be achieved without a mapjoin hint.
[jira] [Updated] (HIVE-3988) lateral view followed by mapjoin should not be allowed
[ https://issues.apache.org/jira/browse/HIVE-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3988:
-
Description:

Consider the following queries:

drop table lazy_array_map;
create table lazy_array_map (map_col map<int,string>, array_col array<string> );
INSERT OVERWRITE TABLE lazy_array_map select map(1,'one',2,'two',3,'three'), array('100','200','300') FROM src LIMIT 1;

select /*+ MAPJOIN(a) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

select /*+ MAPJOIN(subq1) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

The last 2 queries should throw an error, but they work fine right now. The same effect can be achieved without a mapjoin hint.

was:

Consider the following queries:

drop table lazy_array_map;
create table lazy_array_map (map_col map<int,string>, array_col array<string>);
INSERT OVERWRITE TABLE lazy_array_map select map(1,'one',2,'two',3,'three'), array('100','200','300') FROM src LIMIT 1;

select /*+ MAPJOIN(a) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

select /*+ MAPJOIN(subq1) */ * from (SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X AS myCol) subq1 join src a on subq1.myCol = a.key;

The last 2 queries should throw an error, but they work fine right now. The same effect can be achieved without a mapjoin hint.
[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp
[ https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jithin John updated HIVE-3790:
--
Fix Version/s: 0.9.1
Status: Patch Available (was: Open)

UDF to introduce an OFFSET(day,month or year) for a given date or timestamp

Key: HIVE-3790
URL: https://issues.apache.org/jira/browse/HIVE-3790
Project: Hive
Issue Type: New Feature
Components: UDF
Affects Versions: 0.9.0
Reporter: Jithin John
Fix For: 0.9.1

Current releases of Hive lack a generic function that computes a date offset from a date / timestamp. Current releases have date_add(date) and date_sub(date), which allow the user to add or subtract days only; year or month cannot be used as the unit. The function DATE_OFFSET(date,offset,unit) returns the date shifted from the given date by the offset, according to the unit. Here the unit can be year, month or day. The function could be used for date range queries and is more flexible than the existing functions.

Functionality :-
Function Name: DATE_OFFSET(date,offset,unit)
Adds an offset value to the unit part of the date/timestamp. Returns the date in the format yyyy-MM-dd.

Example:
hive> select date_offset('2009-07-29', -1, 'MONTH') FROM src LIMIT 1
-> 2009-06-29

Usage :-
Case : To calculate the expiry date of an item from its manufacturing date

Table :- ITEM_TAB

Manufacturing_date|item id|store id|value|unit|price
2012-12-01|110001|00003|0.99|1.00|0.99
2012-12-02|110001|00008|0.99|0.00|0.00
2012-12-03|110001|00009|0.99|0.00|0.00
2012-12-04|110001|001112002|0.99|0.00|0.00
2012-12-05|110001|001112003|0.99|0.00|0.00
2012-12-06|110001|001112006|0.99|1.00|0.99
2012-12-07|110001|001112007|0.99|0.00|0.00
2012-12-08|110001|001112008|0.99|0.00|0.00
2012-12-09|110001|001112009|0.99|0.00|0.00
2012-12-10|110001|001112010|0.99|0.00|0.00
2012-12-11|110001|001113003|0.99|0.00|0.00
2012-12-12|110001|001113006|0.99|0.00|0.00
2012-12-13|110001|001113008|0.99|0.00|0.00
2012-12-14|110001|001113010|0.99|0.00|0.00
2012-12-15|110001|001114002|0.99|0.00|0.00
2012-12-16|110001|001114004|0.99|1.00|0.99
2012-12-17|110001|001114005|0.99|0.00|0.00
2012-12-18|110001|001121004|0.99|0.00|0.00

QUERY:
select man_date, date_offset(man_date, 5, 'year') as expiry_date from item_tab;

RESULT:
2012-12-01 2017-12-01
2012-12-02 2017-12-02
2012-12-03 2017-12-03
2012-12-04 2017-12-04
2012-12-05 2017-12-05
2012-12-06 2017-12-06
2012-12-07 2017-12-07
2012-12-08 2017-12-08
2012-12-09 2017-12-09
2012-12-10 2017-12-10
2012-12-11 2017-12-11
2012-12-12 2017-12-12
2012-12-13 2017-12-13
2012-12-14 2017-12-14
2012-12-15 2017-12-15
2012-12-16 2017-12-16
2012-12-17 2017-12-17
2012-12-18 2017-12-18
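The semantics described for DATE_OFFSET can be sketched outside Hive. The following Python function is only an illustration of the behaviour stated in the issue, not the code in the attached patch. It handles plain yyyy-MM-dd dates only (not timestamps), and the handling of month ends (clamping to the last valid day of the target month) is an assumption, since the description does not say what happens when the shifted day does not exist.

```python
import calendar
from datetime import date, timedelta


def date_offset(start: str, offset: int, unit: str) -> str:
    """Shift a yyyy-MM-dd date by `offset` days, months or years."""
    d = date.fromisoformat(start)
    unit = unit.lower()
    if unit == "day":
        return (d + timedelta(days=offset)).isoformat()
    if unit == "month":
        # Work on a running month count so year boundaries are handled.
        months = d.year * 12 + (d.month - 1) + offset
    elif unit == "year":
        months = (d.year + offset) * 12 + (d.month - 1)
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    year, month = divmod(months, 12)
    month += 1
    # Assumption: clamp to the last valid day of the target month.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day).isoformat()
```

With this sketch, date_offset('2009-07-29', -1, 'MONTH') yields 2009-06-29 and date_offset('2012-12-01', 5, 'year') yields 2017-12-01, matching the two examples in the issue description.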
[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp
[ https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jithin John updated HIVE-3790:
--
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3972:
-
Status: Open (was: Patch Available)

comments

Support using multiple reducer for fetching order by results

Key: HIVE-3972
URL: https://issues.apache.org/jira/browse/HIVE-3972
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch

Queries that end with an order by clause run the final MR job with a single reducer, which can be too much work for one reducer. For example,
{code}
select value, sum(key) as sum from src group by value order by sum;
{code}
If the number of reducers is reasonable, the multiple result files could be merged into a single sorted stream at the fetcher level.
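The idea of merging several sorted result files into one sorted stream is a k-way merge. The Python sketch below only illustrates that technique; it is not the RowFetcher code from the patch. The row shape (value, sum) mirrors the example query, and the function name is hypothetical. It assumes each reducer's output is already sorted by the order by key.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (value, sum), mirroring the example query


def merge_sorted_outputs(outputs: List[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge per-reducer outputs, each already sorted by `sum`."""
    # heapq.merge keeps one pending row per input, so memory use grows
    # with the number of reducers rather than with the number of rows.
    return heapq.merge(*outputs, key=lambda row: row[1])
```

Because the merge is lazy, the fetching side can start returning rows before any result file has been read completely.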
[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp
[ https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jithin John updated HIVE-3790:
--
Attachment: HIVE-3790.patch

Attaching patch for the same. Please review it.
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571160#comment-13571160 ]

Phabricator commented on HIVE-3972:
---

njain has commented on the revision "HIVE-3972 [jira] Support using multiple reducer for fetching order by results".

INLINE COMMENTS
  conf/hive-default.xml.template:1621 nit: reducers for the last MapReduce task for order by
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 apache header
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 can you perform explain extended ? I think, it also shows the number of reducers.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 Might be easier to create a tmp table with 10 rows initially to reduce the number of results.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:8 Add some comments - it would be good to have a lot of examples.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 What happens if it is -1 ? Shouldn't useBucketingForOrderBy be false ?

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain
Re: [VOTE] Graduate HCatalog from the incubator and become part of Hive
And my axe! Erm... I mean, my +1.

On Mon, Feb 4, 2013 at 10:18 PM, Alan Gates ga...@hortonworks.com wrote:
> FYI.
>
> Alan.
>
> Begin forwarded message:
>
> From: Alan Gates ga...@hortonworks.com
> Date: February 4, 2013 10:18:09 PM PST
> To: hcatalog-...@incubator.apache.org
> Subject: [VOTE] Graduate HCatalog from the incubator and become part of Hive
>
> The Hive PMC has voted to accept HCatalog as a submodule of Hive. You can see the vote thread at http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3cCACf6RrzktBYD0suZxn3Pfv8XkR=vgwszrzyb_2qvesuj2vh...@mail.gmail.com%3e . We now need to vote to graduate from the incubator and become a submodule of Hive. This entails the following:
>
> 1) the establishment of an HCatalog submodule in the Apache Hive Project;
> 2) the adoption of the Apache HCatalog codebase into the Hive HCatalog submodule; and
> 3) adding all currently active HCatalog committers as submodule committers on the Hive HCatalog submodule.
>
> Definitions for all these can be found in the (now adopted) Hive bylaws at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committer.
>
> This vote will stay open for at least 72 hours (thus 23:00 PST on 2/7/13). PPMC members' votes are binding in this vote, though input from all is welcome.
>
> If this vote passes, the next step will be to submit the graduation motion to the Incubator PMC.
>
> Here's my +1.
>
> Alan.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571168#comment-13571168 ]

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

A general question: how does it work with hive.optimize.reducededuplication ?

INLINE COMMENTS
  conf/hive-default.xml.template:1034 Sorry for joining late: Can you explain this more clearly ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain
[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2340:
-
Status: Open (was: Patch Available)

comments
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571177#comment-13571177 ]

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

Do you think it might be a good idea to get HIVE-3972 first ?

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 Isn't it true that R1 and R2 will have the same cost for RS - GBY -- anything -- RS ? If yes, how do you know which rule will be fired ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain
Re: [VOTE] Graduate HCatalog from the incubator and become part of Hive
+1, non-binding

- Alex

On Feb 5, 2013, at 10:06 AM, Sushanth Sowmyan khorg...@gmail.com wrote:
> And my axe! Erm... I mean, my +1.

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
[jira] [Updated] (HIVE-1662) Add file pruning into Hive.
[ https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-1662:
--
Attachment: HIVE-1662.D8391.1.patch

navis requested code review of "HIVE-1662 [jira] Add file pruning into Hive.".

Reviewers: JIRA

DPAL-1979 Add file pruning based on INPUT__FILE__NAME

Hive now supports a filename virtual column. If a file name filter is present in a query, Hive should be able to add only the files that pass the filter to the input paths.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8391

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/NativeTablePredicateHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/file_pruning.q
  ql/src/test/results/clientpositive/file_pruning.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/20493/

To: JIRA, navis

Add file pruning into Hive.
---
Key: HIVE-1662
URL: https://issues.apache.org/jira/browse/HIVE-1662
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
Attachments: HIVE-1662.D8391.1.patch

Hive now supports a filename virtual column. If a file name filter is present in a query, Hive should be able to add only the files that pass the filter to the input paths.
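File pruning as described here amounts to applying the file name predicate to the candidate input files before they are handed to the job. The Python sketch below illustrates only that idea and is not the patch's implementation. The function name, the paths and the glob pattern are hypothetical; the lambda stands in for whatever predicate on INPUT__FILE__NAME the query contains.

```python
import fnmatch
from typing import Callable, Iterable, List


def prune_input_paths(paths: Iterable[str],
                      file_filter: Callable[[str], bool]) -> List[str]:
    """Keep only the input files whose full path passes the filter."""
    return [p for p in paths if file_filter(p)]


# Hypothetical candidate files under a table directory.
candidates = [
    "/warehouse/logs/part-00000",
    "/warehouse/logs/part-00001",
    "/warehouse/logs/part-00002",
]

# Stand-in for a predicate such as:
#   WHERE INPUT__FILE__NAME LIKE '%part-00001'
selected = prune_input_paths(
    candidates, lambda p: fnmatch.fnmatch(p, "*part-00001"))
```

The point of pruning at this stage is that files failing the filter are never opened, instead of being read in full and having every row discarded by the filter afterwards.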
[jira] [Updated] (HIVE-1662) Add file pruning into Hive.
[ https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1662:

Assignee: Navis
Status: Patch Available (was: Open)
Re: Review Requests
Thanks Mark. Appreciate that. I'll take a look.

On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover grover.markgro...@gmail.com wrote:
> Swarnim,
> I left some comments on reviewboard.
>
> On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote:
>> Hello,
>>
>> I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2]. If you guys get a chance to review and provide feedback on them, I would really appreciate it.
>>
>> Thanks,
>>
>> [1] https://reviews.apache.org/r/9275/
>> [2] https://reviews.apache.org/r/9276/
>>
>> --
>> Swarnim

--
Swarnim
hive-trunk-hadoop1 - Build # 69 - Failure
Changes for Build #14
[hashutosh] HIVE-3004 : RegexSerDe should support other column types in addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan)

Changes for Build #15
[hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via Ashutosh Chauhan)

Changes for Build #16
[namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns from an RC File (Kevin Wilfong via namit)

Changes for Build #17
[namit] HIVE-3899 Partition pruning fails on constant = constant expression (Kevin Wilfong via namit)

Changes for Build #18
[hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via Ashutosh Chauhan)
[namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error (Navis via namit)

Changes for Build #19
[cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark Grover and Gunther Hagleitner via cws)

Changes for Build #20
[namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is used twice or more (Navis via namit)

Changes for Build #21
[namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than the max number of reducers (Kevin Wilfong via namit)

Changes for Build #22

Changes for Build #23
[namit] HIVE-3893 something wrong with the hive-default.xml (jet cheng via namit)

Changes for Build #24
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on the other produces wrong results (Kevin Wilfong via namit)

Changes for Build #25
[namit] HIVE-3909 Wrong data due to HIVE-2820 (Navis via namit)

Changes for Build #26
[namit] HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit)

Changes for Build #27
[hashutosh] HIVE-3537 : release locks at the end of move tasks (Namit via Ashutosh Chauhan)

Changes for Build #28
[namit] HIVE-3884 Better align columns in DESCRIBE table_name output to make more human-readable (Dilip Joseph via namit)

Changes for Build #29

Changes for Build #30
[namit] HIVE-3916 For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress (Kevin Wilfong via namit)

Changes for Build #31
[hashutosh] HIVE-2332 : If all of the parameters of distinct functions are exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan)

Changes for Build #32

Changes for Build #33
[namit] HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit)

Changes for Build #34
[hashutosh] NPE in union processing followed by lateral view followed by 2 group bys (Navis via Ashutosh Chauhan)

Changes for Build #35

Changes for Build #36

Changes for Build #37
[namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options (Kevin Wilfong via namit)

Changes for Build #38

Changes for Build #39
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0 (Prasad Mujumdar via cws)

Changes for Build #40

Changes for Build #41
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan)
[hashutosh] HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan)

Changes for Build #42

Changes for Build #43
[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types that require access to a Schema (Sean Busbey via Ashutosh Chauhan)
[namit] HIVE-3943 Skewed query fails if hdfs path has special characters (Gang Tim Liu via namit)

Changes for Build #44
[namit] HIVE-3825 Add Operator level Hooks (Pamela Vagata via namit)

Changes for Build #45
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES (Kevin Wilfong via namit)
[namit] HIVE-3944 Make accept qfile argument for miniMR tests (Navis via namit)

Changes for Build #46

Changes for Build #47
[hashutosh] Adding csv.txt file, left out from commit of 3528

Changes for Build #48
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23 (Sushanth Sownyan via namit)
[namit] HIVE-3921 recursive_dir.q fails on 0.23 (Sushanth Sowmyan via namit)
[namit] HIVE-3923 join_filters_overlap.q fails on 0.23 (Sushanth Sowmyan via namit)
[namit] HIVE-3924 join_nullsafe.q fails on 0.23 (Sushanth Sownyan via namit)

Changes for Build #49

Changes for Build #50
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext cannot be loaded/instantiated (Navis via Ashutosh Chauhan)
[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion (Navis via Ashutosh Chauhan)

Changes for Build #51

Changes for Build #52
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a partition through the CLI. (Samuel Yuan via kevinwilfong)

Changes for Build #53
[namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit)

Changes for Build #54
[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)

Changes for Build #55

Changes for Build
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #56
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/ -- [...truncated 7482 lines...] [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] Creating empty https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/exec/package-info.class [javac] Creating empty https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/udf/generic/package-info.class [javac] Creating empty https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/exec/errors/package-info.class [javac] Creating empty https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/lockmgr/package-info.class [copy] Copying 1 file to https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes jar: [echo] Project: ql [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/libthrift-0.9.0.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/thrift/classes [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/commons-lang-2.4.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/commons-lang/classes [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/json-20090211.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/json/classes [unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/JavaEWAH-0.3.2.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/javaewah/classes [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/avro-1.7.1.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/avro/classes [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/avro-mapred-1.7.1.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/avro-mapred/classes [unzip] Expanding: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/javolution-5.5.1.jar into https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/javolution/classes [jar] Building jar: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/hive-exec-0.10.0-SNAPSHOT.jar :: delivering :: org.apache.hive#hive-exec;0.10.0-SNAPSHOT :: 0.10.0-SNAPSHOT :: integration :: Tue Feb 05 16:12:36 UTC 2013 delivering ivy file to https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/ivy-0.10.0-SNAPSHOT.xml :: publishing :: org.apache.hive#hive-exec published hive-exec to /home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/jars/hive-exec.jar published ivy to /home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/ivys/ivy.xml create-dirs: [echo] Project: contrib [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/contrib/src/test/resources does not exist. 
init: [echo] Project: contrib setup: [echo] Project: contrib ivy-init-settings: [echo] Project: contrib ivy-resolve: [echo] Project: contrib [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml [ivy:resolve] downloading /home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/jars/hive-exec.jar ... [ivy:resolve] (4741kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hive#hive-exec;0.10.0-SNAPSHOT!hive-exec.jar (56ms) [ivy:resolve] [ivy:resolve] :: problems summary :: [ivy:resolve] ERRORS [ivy:resolve] SERVER ERROR: Service Unavailable url=http://www.sourceforge.net/projects/jdo2-api/files/jdo2-api//jdo2-api-2.3-ec.jar [ivy:resolve] [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS [ivy:report] Processing https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-contrib-default.xml to
Hive-trunk-hadoop2 - Build # 108 - Still Failing
Changes for Build #66 [hashutosh] HIVE-3004 : RegexSerDe should support other column types in addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan) Changes for Build #67 [namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns from an RC File (Kevin Wilfong via namit) [hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via Ashutosh Chauhan) Changes for Build #68 [namit] HIVE-3899 Partition pruning fails on constant = constant expression (Kevin Wilfong via namit) Changes for Build #69 [hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via Ashutosh Chauhan) [namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error (Navis via namit) Changes for Build #70 [namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is used twice or more (Navis via namit) [cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark Grover and Gunther Hagleitner via cws) Changes for Build #71 [namit] HIVE-3893 something wrong with the hive-default.xml (jet cheng via namit) [namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than the max number of reducers (Kevin Wilfong via namit) Changes for Build #72 [namit] HIVE-3915 Union with map-only query on one side and two MR job query on the other produces wrong results (Kevin Wilfong via namit) Changes for Build #73 [namit] HIVE-3909 Wrong data due to HIVE-2820 (Navis via namit) Changes for Build #74 [namit] HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit) Changes for Build #75 [namit] HIVE-3884 Better align columns in DESCRIBE table_name output to make more human-readable (Dilip Joseph via namit) [hashutosh] HIVE-3537 : release locks at the end of move tasks (Namit via Ashutosh Chauhan) Changes for Build #76 [namit] HIVE-3916 For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress (Kevin 
Wilfong via namit) Changes for Build #77 [hashutosh] HIVE-2332 : If all of the parameters of distinct functions are exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan) Changes for Build #78 Changes for Build #79 [hashutosh] NPE in union processing followed by lateral view followed by 2 group bys (Navis via Ashutosh Chauhan) [namit] HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) Changes for Build #80 Changes for Build #81 Changes for Build #82 [namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options (Kevin Wilfong via namit) Changes for Build #83 Changes for Build #84 [cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0 (Prasad Mujumdar via cws) Changes for Build #85 Changes for Build #86 [hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) [hashutosh] HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan) Changes for Build #87 Changes for Build #88 [namit] HIVE-3825 Add Operator level Hooks (Pamela Vagata via namit) [hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types that require access to a Schema (Sean Busbey via Ashutosh Chauhan) [namit] HIVE-3943 Skewed query fails if hdfs path has special characters (Gang Tim Liu via namit) Changes for Build #89 [namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES (Kevin Wilfong via namit) [namit] HIVE-3944 Make accept qfile argument for miniMR tests (Navis via namit) Changes for Build #90 [namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23 (Sushanth Sownyan via namit) [namit] HIVE-3921 recursive_dir.q fails on 0.23 (Sushanth Sowmyan via namit) [namit] HIVE-3923 join_filters_overlap.q fails on 0.23 (Sushanth Sowmyan via namit) [namit] HIVE-3924 join_nullsafe.q fails on 0.23 (Sushanth Sownyan via namit) [hashutosh] Adding csv.txt file, left out from commit of 3528 Changes for 
Build #91 Changes for Build #92 [hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext cannot be loaded/instantiated (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3947 : MiniMR test remains pending after test completion (Navis via Ashutosh Chauhan) Changes for Build #93 Changes for Build #94 [kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a partition through the CLI. (Samuel Yuan via kevinwilfong) Changes for Build #95 [namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit) Changes for Build #96 [hashutosh] Missed deleting empty file GenMRRedSink4.java while commiting 3784 [hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan) Changes for Build #97 [namit] HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) [hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan,
Hive-trunk-h0.21 - Build # 1957 - Failure
Changes for Build #1955 [namit] HIVE-3937 Hive Profiler (Pamela Vagata via namit) [hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port (Navis via Ashutosh Chauhan) Changes for Build #1956 Changes for Build #1957 No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1957) Status: Failure Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1957/ to view the results.
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571510#comment-13571510 ]

Owen O'Malley commented on HIVE-3874:

[~kevinwilfong] Thanks for the bug fixes, Kevin. I pushed the DynamicByteArray and double serialization fixes to [github|https://github.com/hortonworks/orc]. I have the null column problem fixed, but it is tied into my other changes on my row-seek dev branch. I hope to finish up the row-seek today and I'll merge it into master and make the patch putting it into Hive.

Create a new Optimized Row Columnar file format for Hive
Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per a file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows aren't stored in the file

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
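The last two limitations in the HIVE-3874 description (lightweight indexes that let push-down filters skip entire row groups) can be illustrated with a small sketch. This is not the ORC implementation: the function names, the fixed row-group size, and the choice of a min/max index are all assumptions made for illustration only.

```python
def build_row_group_index(values, group_size):
    """Record (start offset, min, max) for each fixed-size row group."""
    index = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        index.append((start, min(group), max(group)))
    return index


def scan_greater_than(values, index, group_size, threshold):
    """Return rows > threshold, reading only row groups that can match."""
    matches = []
    groups_read = 0
    for start, _low, high in index:
        if high <= threshold:
            continue  # no row in this group can satisfy the filter
        groups_read += 1
        group = values[start:start + group_size]
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read


# Hypothetical column of 100 values split into 10 row groups.
column = list(range(100))
index = build_row_group_index(column, group_size=10)
rows, groups_read = scan_greater_than(column, index, 10, threshold=84)
print(len(rows), "rows matched;", groups_read, "of", len(index), "row groups read")
```

The point of the sketch is only that a per-row-group summary lets a reader decide whether to decompress a group at all; a format that stores each column value as an opaque blob cannot make that decision.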
[jira] [Updated] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-701:
Attachment: HIVE-701.D8397.1.patch

sxyuan requested code review of HIVE-701 [jira] Make keywords non-reserved.
Reviewers: kevinwilfong, JIRA

Almost all keywords in Hive are reserved. This change makes all but the following keywords non-reserved:
IF, HAVING, WHERE, SELECT, UNIQUEJOIN, JOIN, ON, TRANSFORM, MAP, REDUCE, TABLESAMPLE, CAST, FUNCTION, EXTENDED, FORMATTED, PRETTY, CASE, WHEN, THEN, ELSE, END, DATABASE, CROSS

Because the grammar grew too large, it was split into multiple files to accommodate Java's code size limit. As a result, the custom error handling needed to be moved as well.

TEST PLAN
Use keywords as identifiers in test queries. Existing unit tests should ensure that keywords will not be mistakenly identified as identifiers.

REVISION DETAIL
https://reviews.facebook.net/D8397

AFFECTED FILES
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
ql/src/test/results/clientnegative/show_tables_bad1.q.out
ql/src/test/results/clientnegative/archive_partspec3.q.out
ql/src/test/results/clientnegative/invalid_create_tbl2.q.out
ql/src/test/results/clientnegative/select_udtf_alias.q.out
ql/src/test/results/clientnegative/show_tables_bad2.q.out
ql/src/test/results/clientnegative/invalid_tbl_name.q.out
ql/src/test/results/clientnegative/lateral_view_join.q.out
ql/src/test/results/clientpositive/nonreserved_keywords_input37.q.out
ql/src/test/results/clientpositive/nonreserved_keywords_insert_into1.q.out
ql/src/test/results/compiler/errors/wrong_distinct2.q.out
ql/src/test/results/compiler/errors/missing_overwrite.q.out
ql/src/test/queries/clientnegative/show_tables_bad1.q
ql/src/test/queries/clientnegative/show_tables_bad2.q
ql/src/test/queries/clientpositive/nonreserved_keywords_insert_into1.q
ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q
ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
ql/src/java/org/apache/hadoop/hive/ql/parse/SelectClauseParser.g
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
ql/build.xml

To: JIRA

lots of reserved keywords in hive
Key: HIVE-701
URL: https://issues.apache.org/jira/browse/HIVE-701
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
Attachments: HIVE-701.D8397.1.patch

There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break
[jira] [Updated] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-701: - Attachment: HIVE-701.1.patch.txt https://reviews.facebook.net/D8397 lots of reserved keywords in hive - Key: HIVE-701 URL: https://issues.apache.org/jira/browse/HIVE-701 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Samuel Yuan Attachments: HIVE-701.1.patch.txt, HIVE-701.D8397.1.patch There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break
[jira] [Updated] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-701: - Status: Patch Available (was: Open) lots of reserved keywords in hive - Key: HIVE-701 URL: https://issues.apache.org/jira/browse/HIVE-701 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Samuel Yuan Attachments: HIVE-701.1.patch.txt, HIVE-701.D8397.1.patch There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break
[jira] [Created] (HIVE-3989) TestCase TestMTQueries fails with IBM Java 6
Renata Ghisloti Duarte de Souza created HIVE-3989:

Summary: TestCase TestMTQueries fails with IBM Java 6
Key: HIVE-3989
URL: https://issues.apache.org/jira/browse/HIVE-3989
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0, 0.9.0
Environment: IBM Java 6 x86 64
Reporter: Renata Ghisloti Duarte de Souza
Priority: Minor
Fix For: 0.10.0

The test case fails with IBM Java 6 due to a HashMap problem. The error follows:

[junit] diff -a /home/renata/stg-hadoop/hive-0.10/release-0.10.0/build/ql/test/logs/clientpositive/join2.q.out /home/renata/stg-hadoop/hive-0.10/release-0.10.0/ql/src/test/results/clientpositive/join2.q.out
[junit] 109c109
[junit] < 0 {VALUE._col0}
[junit] ---
[junit] > 0 {VALUE._col4}
[junit] 112c112
[junit] < outputColumnNames: _col0, _col9
[junit] ---
[junit] > outputColumnNames: _col4, _col9
[junit] 115c115
[junit] < expr: _col0
[junit] ---
[junit] > expr: _col4
[junit] Test join2.q results check failed with error code 1
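The report only says "a HashMap problem". One plausible reading, stated here as an assumption rather than a diagnosis, is that the generated column aliases (_col0 versus _col4) depend on hash-table iteration order, which differs between JVM implementations. The toy sketch below models that effect with a hand-rolled chained hash table and two made-up hash functions; none of these names come from Hive.

```python
def iteration_order(keys, hash_fn, buckets=8):
    """Return keys in the order a simple chained hash table would iterate them."""
    table = [[] for _ in range(buckets)]
    for key in keys:
        table[hash_fn(key) % buckets].append(key)
    return [key for bucket in table for key in bucket]


def hash_sum(text):
    """Hypothetical hash used by 'runtime A': sum of code points."""
    return sum(ord(ch) for ch in text)


def hash_poly(text):
    """Hypothetical hash used by 'runtime B': base-31 polynomial."""
    h = 0
    for ch in text:
        h = 31 * h + ord(ch)
    return h


columns = ["key", "value", "ds", "hr"]
order_a = iteration_order(columns, hash_sum)
order_b = iteration_order(columns, hash_poly)

# Aliases assigned in iteration order differ between the two runtimes...
aliases_a = {name: "_col%d" % i for i, name in enumerate(order_a)}
aliases_b = {name: "_col%d" % i for i, name in enumerate(order_b)}

# ...but assigning them in a canonical (sorted) order is stable.
stable_a = {name: "_col%d" % i for i, name in enumerate(sorted(order_a))}
stable_b = {name: "_col%d" % i for i, name in enumerate(sorted(order_b))}
print(aliases_a == aliases_b, stable_a == stable_b)
```

If the assumption holds, the usual remedies are an insertion-ordered map or an explicit sort before output is generated, so that golden-file comparisons do not depend on the runtime's hash function.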
[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingwei Lu updated HIVE-3672:
Attachment: HIVE-3672.6.patch.txt

Support altering partition column type in Hive
Key: HIVE-3672
URL: https://issues.apache.org/jira/browse/HIVE-3672
Project: Hive
Issue Type: Improvement
Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
Labels: features
Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt
Original Estimate: 72h
Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. Because we have discouraged users from using non-string partition column types, this is a problem for users who want to change their partition columns to strings: they have to rename their table, create a new table, and copy all the data over. To support this via the CLI, add a command like ALTER TABLE table_name PARTITION COLUMN (column_name new type);
[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingwei Lu updated HIVE-3672:
Status: Patch Available (was: Open)

Support altering partition column type in Hive
Key: HIVE-3672
URL: https://issues.apache.org/jira/browse/HIVE-3672
Project: Hive
Issue Type: Improvement
Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
Labels: features
Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt
Original Estimate: 72h
Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. Because we have discouraged users from using non-string partition column types, this is a problem for users who want to change their partition columns to strings: they have to rename their table, create a new table, and copy all the data over. To support this via the CLI, add a command like ALTER TABLE table_name PARTITION COLUMN (column_name new type);
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #283
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/ -- [...truncated 36454 lines...] [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2013-02-05_14-53-56_527_6993925461901486378/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_396289522.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] Copying file: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: 
DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2013-02-05_14-54-00_889_4235432531689754431/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2013-02-05_14-54-00_889_4235432531689754431/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_573949786.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: 
default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_531624891.txt [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_1845300102.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (key int, value
[jira] [Updated] (HIVE-1662) Add file pruning into Hive.
[ https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1662:
Status: Open (was: Patch Available)

Add file pruning into Hive.
Key: HIVE-1662
URL: https://issues.apache.org/jira/browse/HIVE-1662
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
Attachments: HIVE-1662.D8391.1.patch

Hive now supports a filename virtual column. If a file name filter is present in a query, Hive should be able to add only the files that pass the filter to the input paths.
[jira] [Updated] (HIVE-1662) Add file pruning into Hive.
[ https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-1662:
Attachment: HIVE-1662.D8391.2.patch

navis updated the revision HIVE-1662 [jira] Add file pruning into Hive.
Fixed NPEs
Reviewers: JIRA

REVISION DETAIL
https://reviews.facebook.net/D8391

CHANGE SINCE LAST DIFF
https://reviews.facebook.net/D8391?vs=27249&id=27273#toc

AFFECTED FILES
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/metadata/NativeTablePredicateHandler.java
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
ql/src/test/queries/clientpositive/file_pruning.q
ql/src/test/results/clientpositive/file_pruning.q.out

To: JIRA, navis

Add file pruning into Hive.
Key: HIVE-1662
URL: https://issues.apache.org/jira/browse/HIVE-1662
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch

Hive now supports a filename virtual column. If a file name filter is present in a query, Hive should be able to add only the files that pass the filter to the input paths.
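The idea in the HIVE-1662 description, applying the file name filter before the job's input paths are chosen rather than after the rows are read, can be sketched as follows. This is a hedged illustration only: `prune_input_paths`, the glob-style pattern, and the sample paths are hypothetical and do not reflect the patch's actual code in HiveInputFormat.

```python
import fnmatch
import posixpath


def prune_input_paths(paths, name_pattern):
    """Keep only the paths whose file name matches the filter pattern."""
    return [
        path
        for path in paths
        if fnmatch.fnmatchcase(posixpath.basename(path), name_pattern)
    ]


# Hypothetical table directory listing.
candidate_paths = [
    "/warehouse/src/part-00000",
    "/warehouse/src/part-00001",
    "/warehouse/src/part-00002",
]
pruned = prune_input_paths(candidate_paths, "*-00001")
print(pruned)
```

Files rejected at this stage are never opened, which is the saving the issue is after; a filter evaluated per row would still read every file.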
[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly
[ https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571996#comment-13571996 ]

Phabricator commented on HIVE-2839:

navis has commented on the revision HIVE-2839 [jira] Filters on outer join with mapjoin hint is not applied correctly.

INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:169 It's just a list of operators which can have multiple parents. The conditions you've mentioned should be checked before calling this. I did this intentionally because this is a utility class.
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:181 hm.. ok.
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:130 The return value might be used for setting OperatorDescs, which have ArrayList instead of List. Would it be better to wrap it again with ArrayList before setting?

REVISION DETAIL
https://reviews.facebook.net/D2079

To: JIRA, navis
Cc: njain

Filters on outer join with mapjoin hint is not applied correctly
Key: HIVE-2839
URL: https://issues.apache.org/jira/browse/HIVE-2839
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch, HIVE-2839.D2079.3.patch, HIVE-2839.D2079.4.patch, HIVE-2839.D2079.5.patch, HIVE-2839.D2079.6.patch

While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions.
{code} SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND true limit 10; FAILED: Hive Internal Error: java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {code} and {code} SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND b.key * 10 '1000' limit 10; 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212) at
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572006#comment-13572006 ] Phabricator commented on HIVE-3972: --- navis has commented on the revision HIVE-3972 [jira] Support using multiple reducer for fetching order by results. INLINE COMMENTS conf/hive-default.xml.template:1621 OK. It's harder than writing code. ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 Ah, OK. ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 OK. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 It is calculated from the input size, so it may or may not be 1. It is safer to assume that it is not 1. REVISION DETAIL https://reviews.facebook.net/D8349 To: JIRA, navis Cc: njain Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch Result-fetching queries that end with an ORDER BY clause run the final MR job with a single reducer, which can be too much work for one reducer. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If the number of reducers is reasonable, the multiple result files could be merged into a single sorted stream at the fetcher level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
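The idea in the HIVE-3972 description, merging several individually sorted result files into one sorted stream on the fetcher side, is a standard k-way merge. The following is only a minimal sketch of that technique and is not taken from the patch: the class `MergeSortedStreamsSketch`, the `Head` helper, and the use of in-memory `List<Long>` inputs in place of reducer output files are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not the MergeSortingFetcher from the patch.
class MergeSortedStreamsSketch {

    // One heap entry per input: the current smallest unread value and the rest of that input.
    private static final class Head {
        final long value;
        final Iterator<Long> rest;

        Head(long value, Iterator<Long> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges inputs that are each already sorted ascending into one ascending list. */
    static List<Long> merge(List<List<Long>> sortedInputs) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));

        // Seed the heap with the first value of every non-empty input.
        for (List<Long> input : sortedInputs) {
            Iterator<Long> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        // Repeatedly emit the smallest head and refill from the input it came from.
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

With k inputs the heap never holds more than k entries, so each output row costs O(log k) comparisons. That is why the approach only pays off when the number of reducers stays reasonable, as the description notes.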
[jira] [Updated] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-701: - Attachment: HIVE-701.HIVE-701.D8397.2.patch sxyuan updated the revision HIVE-701 [jira] Make keywords non-reserved. Forgot a step. FORMATTED and PRETTY are also non-reserved. Reviewers: kevinwilfong, JIRA REVISION DETAIL https://reviews.facebook.net/D8397 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8397?vs=27255id=27285#toc AFFECTED FILES cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java ql/src/test/results/clientnegative/show_tables_bad1.q.out ql/src/test/results/clientnegative/archive_partspec3.q.out ql/src/test/results/clientnegative/invalid_create_tbl2.q.out ql/src/test/results/clientnegative/select_udtf_alias.q.out ql/src/test/results/clientnegative/show_tables_bad2.q.out ql/src/test/results/clientnegative/invalid_tbl_name.q.out ql/src/test/results/clientnegative/lateral_view_join.q.out ql/src/test/results/clientpositive/nonreserved_keywords_input37.q.out ql/src/test/results/clientpositive/nonreserved_keywords_insert_into1.q.out ql/src/test/results/compiler/errors/wrong_distinct2.q.out ql/src/test/results/compiler/errors/missing_overwrite.q.out ql/src/test/queries/clientnegative/show_tables_bad1.q ql/src/test/queries/clientnegative/show_tables_bad2.q ql/src/test/queries/clientpositive/nonreserved_keywords_insert_into1.q ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/SelectClauseParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ql/build.xml To: kevinwilfong, JIRA, sxyuan lots of reserved keywords in hive - Key: HIVE-701 URL: 
https://issues.apache.org/jira/browse/HIVE-701 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Samuel Yuan Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-701: - Attachment: HIVE-701.2.patch.txt Updated, see Phabricator. lots of reserved keywords in hive - Key: HIVE-701 URL: https://issues.apache.org/jira/browse/HIVE-701 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Samuel Yuan Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)
Navis created HIVE-3990: --- Summary: Provide input threshold for direct-fetcher (HIVE-2925) Key: HIVE-3990 URL: https://issues.apache.org/jira/browse/HIVE-3990 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial As a followup of HIVE-2925, add input threshold for fetch task conversion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)
[ https://issues.apache.org/jira/browse/HIVE-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3990: Status: Patch Available (was: Open) Provide input threshold for direct-fetcher (HIVE-2925) -- Key: HIVE-3990 URL: https://issues.apache.org/jira/browse/HIVE-3990 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial As a followup of HIVE-2925, add input threshold for fetch task conversion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1662) Add file pruning into Hive.
[ https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-1662: Status: Patch Available (was: Open) Add file pruning into Hive. --- Key: HIVE-1662 URL: https://issues.apache.org/jira/browse/HIVE-1662 Project: Hive Issue Type: New Feature Reporter: He Yongqiang Assignee: Navis Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch Hive now supports a filename virtual column. If a query contains a filter on the file name, Hive should add only the files that pass the filter to the input paths.
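The pruning requested in HIVE-1662 amounts to applying the file-name filter to the candidate files before they are registered as job input, rather than reading every file and discarding rows afterwards. The sketch below only illustrates that idea: `FilePruningSketch`, `pruneInputPaths`, the plain-string paths, and the choice to match on the last path component are assumptions, not code or behavior from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; not the HIVE-1662 implementation.
class FilePruningSketch {

    /**
     * Returns only the candidate paths whose file name (the part after the
     * last '/') satisfies the filter, preserving the original order.
     */
    static List<String> pruneInputPaths(List<String> candidatePaths,
                                        Predicate<String> fileNameFilter) {
        List<String> inputPaths = new ArrayList<>();
        for (String path : candidatePaths) {
            // lastIndexOf returns -1 when there is no '/', so the whole string is the name.
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (fileNameFilter.test(fileName)) {
                inputPaths.add(path);
            }
        }
        return inputPaths;
    }
}
```

For example, calling `pruneInputPaths` with the paths `/warehouse/t/part-00000` and `/warehouse/t/part-00001` and the filter `name -> name.endsWith("00001")` keeps only the second path, so the first file is never scanned.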
[jira] [Updated] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)
[ https://issues.apache.org/jira/browse/HIVE-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3990: -- Attachment: HIVE-3990.D8415.1.patch navis requested code review of HIVE-3990 [jira] Provide input threshold for direct-fetcher (HIVE-2925). Reviewers: JIRA DPAL-1371 Provide input threshold for direct-fetcher As a followup of HIVE-2925, add input threshold for fetch task conversion. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8415 AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/metadata/InputEstimator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java ql/src/test/queries/clientpositive/nonmr_fetch_threshold.q ql/src/test/results/clientpositive/nonmr_fetch_threshold.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/20535/ To: JIRA, navis Provide input threshold for direct-fetcher (HIVE-2925) -- Key: HIVE-3990 URL: https://issues.apache.org/jira/browse/HIVE-3990 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3990.D8415.1.patch As a followup of HIVE-2925, add input threshold for fetch task conversion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly
[ https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572118#comment-13572118 ] Phabricator commented on HIVE-2839: --- njain has commented on the revision HIVE-2839 [jira] Filters on outer join with mapjoin hint is not applied correctly. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:169 You should assert numberParents == 1. We need to check that before coming to this function. ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:130 OK, that's fine. Ideally it should be cleaned up, but that can be a follow-up. REVISION DETAIL https://reviews.facebook.net/D2079 To: JIRA, navis Cc: njain Filters on outer join with mapjoin hint is not applied correctly Key: HIVE-2839 URL: https://issues.apache.org/jira/browse/HIVE-2839 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch, HIVE-2839.D2079.3.patch, HIVE-2839.D2079.4.patch, HIVE-2839.D2079.5.patch, HIVE-2839.D2079.6.patch While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions. 
{code} SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND true limit 10; FAILED: Hive Internal Error: java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {code} and {code} SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND b.key * 10 '1000' limit 10; 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1321) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325) at
[jira] [Commented] (HIVE-701) lots of reserved keywords in hive
[ https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572122#comment-13572122 ] Phabricator commented on HIVE-701: -- njain has commented on the revision HIVE-701 [jira] Make keywords non-reserved. INLINE COMMENTS ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q:9 I haven't looked at the patch, but don't use MAP/REDUCE for tests. We are trying to deprecate this syntax, if possible. REVISION DETAIL https://reviews.facebook.net/D8397 To: kevinwilfong, JIRA, sxyuan Cc: njain lots of reserved keywords in hive - Key: HIVE-701 URL: https://issues.apache.org/jira/browse/HIVE-701 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Samuel Yuan Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch There is a problem if we want to use some reserved keywords: for example, creating a function of name left/right ? left/right is already a reserved keyword. The other way around should also be possible - if we want to add a 'show tables status' and some applications already use status as a column name, they should not break -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572126#comment-13572126 ] Phabricator commented on HIVE-2340: --- hagleitn has commented on the revision HIVE-2340 [jira] optimize orderby followed by a groupby. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138 HashSet? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251 I think the number of reducers story deserves more comments (similar to what you've explained on the jira) ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 I think if you just run this optimization *after* CommonJoinResolver everything should be fine. It will either already have converted joins to mapjoins and this optimization won't apply or you still have a regular join and you can merge it without worrying about missing out on a mapjoin conversion. You could still have the sorted flag to express intent, but there isn't any optimization that will pull the rug out under you at the moment. Am I missing something? 
REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572148#comment-13572148 ] Navis commented on HIVE-3972: - I missed some commits (HIVE-3633, etc.); they should be merged correctly. Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch Result-fetching queries that end with an ORDER BY clause run the final MR job with a single reducer, which can be too much work for one reducer. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If the number of reducers is reasonable, the multiple result files could be merged into a single sorted stream at the fetcher level.
[jira] [Updated] (HIVE-2238) Support for Median and Mode UDAFs
[ https://issues.apache.org/jira/browse/HIVE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRETTY SITHARA updated HIVE-2238: - Attachment: HIVE-2238.1.patch.txt Patch for HIVE-2238 Support for Median and Mode UDAFs - Key: HIVE-2238 URL: https://issues.apache.org/jira/browse/HIVE-2238 Project: Hive Issue Type: New Feature Components: UDF Reporter: Travis Powell Attachments: HIVE-2238.1.patch.txt Median and Mode are essential functions for reducing/refining the data set, and would allow for greater control over the selection of data. More involved analytics are probably best handled by relational databases or OLAP cubes, but Median and Mode are very practical for Hive solely in terms of delivering a smaller data set, where items selected only have a certain mode. (Rows that describe an object to which the table is joined where that object has a column value frequency threshold.) Comments are more than welcome. Would be happy to support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2238) Support for Median and Mode UDAFs
[ https://issues.apache.org/jira/browse/HIVE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRETTY SITHARA updated HIVE-2238: - Labels: patch (was: ) Hadoop Flags: Incompatible change Status: Patch Available (was: Open) Support for Median and Mode UDAFs - Key: HIVE-2238 URL: https://issues.apache.org/jira/browse/HIVE-2238 Project: Hive Issue Type: New Feature Components: UDF Reporter: Travis Powell Labels: patch Attachments: HIVE-2238.1.patch.txt Median and Mode are essential functions for reducing/refining the data set, and would allow for greater control over the selection of data. More involved analytics are probably best handled by relational databases or OLAP cubes, but Median and Mode are very practical for Hive solely in terms of delivering a smaller data set, where items selected only have a certain mode. (Rows that describe an object to which the table is joined where that object has a column value frequency threshold.) Comments are more than welcome. Would be happy to support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive Operator Counters
Hi all, Has anyone noticed that the operator counters are not properly maintained? They are useful for understanding the query plan and its execution, e.g., how many rows each operator processes and produces, and how much time each operator spends. The counters are NUM_INPUT_ROWS, NUM_OUTPUT_ROWS, and TIME_TAKEN. They can be found in org.apache.hadoop.hive.ql.exec.Operator, but since counterNameToEnum is never initialized, these counters are not being calculated. If this used to work and was broken somehow, I'll be glad to contribute a fix :) Jie
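The failure mode described in this message, a lookup table that is declared but never populated so that every counter update is silently dropped, can be shown in isolation. This is a hypothetical sketch and not Hive's Operator code: the class `OperatorCounterSketch`, its methods, and in particular the null guard in `incrCounter` are assumptions about how such a bug typically stays invisible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not org.apache.hadoop.hive.ql.exec.Operator.
class OperatorCounterSketch {

    // Stays null unless initCounters() is called, mirroring a map that is never initialized.
    private Map<String, Long> counters;

    /** Registers the three counter names with a starting value of zero. */
    void initCounters() {
        counters = new HashMap<>();
        counters.put("NUM_INPUT_ROWS", 0L);
        counters.put("NUM_OUTPUT_ROWS", 0L);
        counters.put("TIME_TAKEN", 0L);
    }

    /** Adds amount to the named counter; does nothing if the map was never set up. */
    void incrCounter(String name, long amount) {
        if (counters == null) {
            return; // no exception, no log line: the update is simply lost
        }
        counters.merge(name, amount, Long::sum);
    }

    /** Returns the recorded value, or 0 if nothing was recorded. */
    long get(String name) {
        return counters == null ? 0L : counters.getOrDefault(name, 0L);
    }
}
```

Without a call to `initCounters()`, `incrCounter("NUM_INPUT_ROWS", 5)` leaves `get("NUM_INPUT_ROWS")` at 0, which matches the symptom of counters that exist by name but are never calculated.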
[jira] [Updated] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3972: -- Attachment: HIVE-3972.D8349.3.patch navis updated the revision HIVE-3972 [jira] Support using multiple reducer for fetching order by results. Addressed comments merged missing commits Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D8349 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8349?vs=27135id=27303#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java ql/src/java/org/apache/hadoop/hive/ql/exec/MergeSortingFetcher.java ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java ql/src/test/queries/clientpositive/orderby_query_bucketing.q ql/src/test/results/clientpositive/orderby_query_bucketing.q.out To: JIRA, navis Cc: njain Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch Queries for fetching results which have lastly order by clause make final MR run with single reducer, which can be too much. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If number of reducer is reasonable, multiple result files could be merged into single sorted stream in the fetcher level. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572244#comment-13572244 ] Jie Li commented on HIVE-2227: -- Even though this ticket has not been committed, the current counterNameToEnum is not initialized and remains null. This prevents the operator counters (NUM_INPUT_ROWS, NUM_OUTPUT_ROWS, TIME_TAKEN) from being recorded. Any thoughts on what was broken? Remove ProgressCounter enum in Operator --- Key: HIVE-2227 URL: https://issues.apache.org/jira/browse/HIVE-2227 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.8.0 Reporter: Zhuoluo (Clark) Yang Priority: Minor Attachments: HIVE-2227-1.patch After HIVE-1701, it is of no use to keep a heavy counterNameToEnum hashmap. We can use strings directly, since the enum is only a hack for Hadoop 0.17. The strings will be human-readable in jobdetails.jsp instead of C1, C2, ... C1000.
[jira] [Commented] (HIVE-2238) Support for Median and Mode UDAFs
[ https://issues.apache.org/jira/browse/HIVE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572246#comment-13572246 ] Arun A K commented on HIVE-2238: [~554268] Please attach the .q and .q.out files as well. Need to submit for review once that has been attached. Support for Median and Mode UDAFs - Key: HIVE-2238 URL: https://issues.apache.org/jira/browse/HIVE-2238 Project: Hive Issue Type: New Feature Components: UDF Reporter: Travis Powell Labels: patch Attachments: HIVE-2238.1.patch.txt Median and Mode are essential functions for reducing/refining the data set, and would allow for greater control over the selection of data. More involved analytics are probably best handled by relational databases or OLAP cubes, but Median and Mode are very practical for Hive solely in terms of delivering a smaller data set, where items selected only have a certain mode. (Rows that describe an object to which the table is joined where that object has a column value frequency threshold.) Comments are more than welcome. Would be happy to support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira