date:20130205

Namit Jain created HIVE-3988:


 Summary: lateral view followed by mapjoin should not be allowed
 Key: HIVE-3988
 URL: https://issues.apache.org/jira/browse/HIVE-3988
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


Consider the following queries:

drop table lazy_array_map;
create table lazy_array_map (map_col map<int,string>, array_col array<string>);
INSERT OVERWRITE TABLE lazy_array_map select map(1,'one',2,'two',3,'three'), 
array('100','200','300') FROM src LIMIT 1;

select /*+ MAPJOIN(a) */ * from
(SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X 
AS myCol) subq1
join
src a
on subq1.myCol = a.key;

select /*+ MAPJOIN(subq1) */ * from
(SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X 
AS myCol) subq1
join
src a
on subq1.myCol = a.key;


The last two queries should fail with an error, but they currently succeed.
The same effect can be achieved without a mapjoin hint.
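To make the two queries concrete, here is a hedged, in-memory Java sketch of the rows they compute. LATERAL VIEW explode(array_col) turns each input row into one row per array element, and a map join loads one side into an in-memory hash table and streams the other side past it. The names LateralViewJoinSketch, explode and hashJoin are hypothetical. This is not Hive operator code, and it only illustrates the row-level semantics, not why the combination should be rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the HIVE-3988 queries; not Hive operator code.
class LateralViewJoinSketch {

    // LATERAL VIEW explode(array_col) X AS myCol: one output row per element,
    // each carrying array_col (shown via List.toString) and the element as myCol.
    static List<String[]> explode(List<String> arrayCol) {
        List<String[]> rows = new ArrayList<>();
        for (String element : arrayCol) {
            rows.add(new String[] {arrayCol.toString(), element});
        }
        return rows;
    }

    // Inner join on subq1.myCol = a.key. Here src (rows of {key, value}) is the
    // in-memory build side, as with MAPJOIN(a); an inner join gives the same
    // rows if subq1 were hashed instead, as with MAPJOIN(subq1).
    static List<String> hashJoin(List<String[]> subq1, List<String[]> src) {
        Map<String, List<String>> build = new HashMap<>();
        for (String[] row : src) {
            build.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> joined = new ArrayList<>();
        for (String[] row : subq1) {
            List<String> matches = build.getOrDefault(row[1], Collections.<String>emptyList());
            for (String value : matches) {
                // Output columns: array_col, myCol, a.key, a.value
                joined.add(row[0] + "\t" + row[1] + "\t" + row[1] + "\t" + value);
            }
        }
        return joined;
    }
}
```

With a single input row holding array('100','200','300'), explode yields three rows, and each of them is matched against every src row with an equal key (duplicate keys produce duplicate output rows).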

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3988) lateral view followed by mapjoin should not be allowed


 [ 
https://issues.apache.org/jira/browse/HIVE-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3988:
-

Description: 
Consider the following queries:

drop table lazy_array_map;
create table lazy_array_map (map_col map<int,string>, array_col array<string>);
INSERT OVERWRITE TABLE lazy_array_map select map(1,'one',2,'two',3,'three'), 
array('100','200','300') FROM src LIMIT 1;

select /*+ MAPJOIN(a) */ * from
(SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X 
AS myCol) subq1
join
src a
on subq1.myCol = a.key;

select /*+ MAPJOIN(subq1) */ * from
(SELECT array_col, myCol from lazy_array_map lateral view explode(array_col) X 
AS myCol) subq1
join
src a
on subq1.myCol = a.key;


The last two queries should fail with an error, but they currently succeed.
The same effect can be achieved without a mapjoin hint.





[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp

2013-02-05 Thread Jithin John (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jithin John updated HIVE-3790:
--

Fix Version/s: 0.9.1
   Status: Patch Available  (was: Open)

 UDF to introduce an OFFSET(day,month or year) for a given date or timestamp 
 

 Key: HIVE-3790
 URL: https://issues.apache.org/jira/browse/HIVE-3790
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0
Reporter: Jithin John
 Fix For: 0.9.1


 Current releases of Hive lack a generic function that computes a date 
 offset from a date or timestamp. They provide date_add and date_sub, which 
 let users add or subtract days only; year or month cannot be used as a unit.
 
 The function DATE_OFFSET(date, offset, unit) returns the date obtained by 
 adding offset units to date, where unit can be year, month, or day.
 The function could be used for date-range queries and is more flexible than 
 the existing functions.
 Functionality:
 Function name: DATE_OFFSET(date, offset, unit)

 Adds offset to the unit part of the date/timestamp.
 Returns the date in the format yyyy-MM-dd.
 Example: hive> select date_offset('2009-07-29', -1, 'MONTH') FROM src LIMIT 1;
 2009-06-29
 Usage:
 Case: To calculate the expiry date of an item from its manufacturing date
 Table: ITEM_TAB
  Manufacturing_date  |item id|store id|value|unit|price
   2012-12-01|110001|00003|0.99|1.00|0.99
   2012-12-02|110001|00008|0.99|0.00|0.00
   2012-12-03|110001|00009|0.99|0.00|0.00
   2012-12-04|110001|001112002|0.99|0.00|0.00
   2012-12-05|110001|001112003|0.99|0.00|0.00
   2012-12-06|110001|001112006|0.99|1.00|0.99
   2012-12-07|110001|001112007|0.99|0.00|0.00
   2012-12-08|110001|001112008|0.99|0.00|0.00
   2012-12-09|110001|001112009|0.99|0.00|0.00
   2012-12-10|110001|001112010|0.99|0.00|0.00
   2012-12-11|110001|001113003|0.99|0.00|0.00
   2012-12-12|110001|001113006|0.99|0.00|0.00
   2012-12-13|110001|001113008|0.99|0.00|0.00
   2012-12-14|110001|001113010|0.99|0.00|0.00
   2012-12-15|110001|001114002|0.99|0.00|0.00
   2012-12-16|110001|001114004|0.99|1.00|0.99
   2012-12-17|110001|001114005|0.99|0.00|0.00
   2012-12-18|110001|001121004|0.99|0.00|0.00 
 QUERY:
 select man_date , date_offset(man_date ,5 ,'year') as expiry_date from 
 item_tab;
 RESULT:
 2012-12-01  2017-12-01
 2012-12-02  2017-12-02
 2012-12-03  2017-12-03
 2012-12-04  2017-12-04
 2012-12-05  2017-12-05
 2012-12-06  2017-12-06
 2012-12-07  2017-12-07
 2012-12-08  2017-12-08
 2012-12-09  2017-12-09
 2012-12-10  2017-12-10
 2012-12-11  2017-12-11
 2012-12-12  2017-12-12
 2012-12-13  2017-12-13
 2012-12-14  2017-12-14
 2012-12-15  2017-12-15
 2012-12-16  2017-12-16
 2012-12-17  2017-12-17
 2012-12-18  2017-12-18
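A minimal Java sketch of the DATE_OFFSET semantics described above, assuming the input is a yyyy-MM-dd date or a timestamp string whose first ten characters are that date, and using java.time for the calendar arithmetic. The class name DateOffsetSketch is hypothetical; this is not the code in HIVE-3790.patch.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical sketch of DATE_OFFSET(date, offset, unit); not HIVE-3790.patch.
class DateOffsetSketch {

    // date: "yyyy-MM-dd" or a timestamp such as "yyyy-MM-dd HH:mm:ss";
    // only the leading date part is used. Returns "yyyy-MM-dd".
    static String dateOffset(String date, int offset, String unit) {
        LocalDate start = LocalDate.parse(date.substring(0, 10)); // ISO yyyy-MM-dd
        LocalDate result;
        switch (unit.toUpperCase(Locale.ROOT)) {
            case "DAY":
                result = start.plusDays(offset);
                break;
            case "MONTH":
                result = start.plusMonths(offset); // clamps to the last valid day
                break;
            case "YEAR":
                result = start.plusYears(offset);  // e.g. Feb 29 -> Feb 28
                break;
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
        return result.toString(); // LocalDate.toString() prints ISO yyyy-MM-dd
    }
}
```

One choice the description leaves open is month-end handling: java.time clamps to the last valid day of the month, so under this sketch date_offset('2012-03-31', -1, 'MONTH') gives 2012-02-29. A real UDF would need to document whichever rule it adopts.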


[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp

2013-02-05 Thread Jithin John (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jithin John updated HIVE-3790:
--

Status: Open  (was: Patch Available)

 UDF to introduce an OFFSET(day,month or year) for a given date or timestamp 
 

 Key: HIVE-3790
 URL: https://issues.apache.org/jira/browse/HIVE-3790
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0
Reporter: Jithin John
 Fix For: 0.9.1




[jira] [Updated] (HIVE-3972) Support using multiple reducer for fetching order by results


 [ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3972:
-

Status: Open  (was: Patch Available)

comments

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch


 Queries whose last clause is ORDER BY run their final MR job with a single 
 reducer, which can become a bottleneck. For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, the multiple result files could be 
 merged into a single sorted stream at the fetcher level.
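The fetcher-level merge proposed above is a k-way merge of sorted runs. A minimal Java sketch, assuming each reducer's output has already been read into a list sorted by the ORDER BY key; the class and method names are hypothetical and this is not the RowFetcher in the attached patches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge individually sorted reducer outputs into one
// globally sorted stream. Not Hive's RowFetcher.
class SortedStreamMerger {

    // One heap entry per reducer output: its current head row and the rest.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    static <T extends Comparable<? super T>> List<T> merge(List<List<T>> sortedRuns) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<T> run : sortedRuns) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it)); // seed with each run's first row
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();          // global minimum among run heads
            out.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

The heap holds one row per reducer, so memory stays proportional to the number of reducers R and each emitted row costs O(log R) comparisons. The result is totally ordered regardless of how rows were partitioned across reducers, as long as each reducer's output is itself sorted.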


[jira] [Updated] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp

2013-02-05 Thread Jithin John (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jithin John updated HIVE-3790:
--

Attachment: HIVE-3790.patch

Attaching a patch for this. Please review it.

 UDF to introduce an OFFSET(day,month or year) for a given date or timestamp 
 

 Key: HIVE-3790
 URL: https://issues.apache.org/jira/browse/HIVE-3790
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0
Reporter: Jithin John
 Fix For: 0.9.1

 Attachments: HIVE-3790.patch




[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results


[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571160#comment-13571160
 ] 

Phabricator commented on HIVE-3972:
---

njain has commented on the revision HIVE-3972 [jira] Support using multiple 
reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 nit: reducers


  for the last MapReduce task for order by
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 apache header
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 can you 
perform explain extended ?
  I think, it also shows the number of reducers.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 Might be 
easier to create a tmp table with 10 rows initially to reduce the number of 
results.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:8 Add some 
comments - it would be good to have a lot of examples.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 What 
happens if it is -1 ?

  Shouldn't useBucketingForOrderBy be false ?

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain




Re: [VOTE] Graduate HCatalog from the incubator and become part of Hive

2013-02-05 Thread Sushanth Sowmyan

And my axe! Erm... I mean, my +1.


On Mon, Feb 4, 2013 at 10:18 PM, Alan Gates <ga...@hortonworks.com> wrote:
 FYI.

 Alan.

 Begin forwarded message:

 From: Alan Gates <ga...@hortonworks.com>
 Date: February 4, 2013 10:18:09 PM PST
 To: hcatalog-...@incubator.apache.org
 Subject: [VOTE] Graduate HCatalog from the incubator and become part of Hive

 The Hive PMC has voted to accept HCatalog as a submodule of Hive.  You can 
 see the vote thread at 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3cCACf6RrzktBYD0suZxn3Pfv8XkR=vgwszrzyb_2qvesuj2vh...@mail.gmail.com%3e
  .  We now need to vote to graduate from the incubator and become a 
 submodule of Hive.  This entails the following:

 1) the establishment of an HCatalog submodule in the Apache Hive Project;
 2) the adoption of the Apache HCatalog codebase into the Hive HCatalog 
 submodule; and
 3) adding all currently active HCatalog committers as submodule committers 
 on the Hive HCatalog submodule.

 Definitions for all these can be found in the (now adopted) Hive bylaws at 
 https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committer.

 This vote will stay open for at least 72 hours (thus 23:00 PST on 2/7/13).  
 PPMC members' votes are binding in this vote, though input from all is 
 welcome.

 If this vote passes, the next step will be to submit the graduation motion to 
 the Incubator PMC.

 Here's my +1.

 Alan.

[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby


[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571168#comment-13571168
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision HIVE-2340 [jira] optimize orderby followed 
by a groupby.

  A general question ???

  How does it work with hive.optimize.reducededuplication ?

INLINE COMMENTS
  conf/hive-default.xml.template:1034 Sorry for joining late: Can you explain 
this more clearly ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
 HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY 
 optimizer (cluster-by following group-by).
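One way to see when a cluster-by after a group-by can share the group-by's shuffle: if every cluster key is also a group-by key, partitioning by the cluster keys still sends all rows of a group to the same reducer, and sorting by the cluster keys followed by the remaining group-by keys keeps each group contiguous while leaving each reducer's output clustered. The Java sketch below encodes only that simplified rule; its names are hypothetical and it is not the logic in ReduceSinkDeDuplication.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified rule for folding a cluster-by shuffle into the
// preceding group-by shuffle. Not Hive's ReduceSinkDeDuplication.
class ReduceSinkMergeSketch {

    // Returns the sort keys of a single merged shuffle (partitioned by
    // clusterByKeys), or null if this simplified rule does not allow a merge.
    static List<String> mergedSortKeys(List<String> groupByKeys, List<String> clusterByKeys) {
        // Rows with equal group-by keys must share a partition: guaranteed
        // only when every cluster key is also a group-by key.
        if (!groupByKeys.containsAll(clusterByKeys)) {
            return null;
        }
        // Cluster keys first, then the remaining group-by keys (insertion order kept).
        Set<String> keys = new LinkedHashSet<>(clusterByKeys);
        keys.addAll(groupByKeys);
        return new ArrayList<>(keys);
    }
}
```

For group-by keys (key, value) followed by cluster-by value, this rule partitions by value and sorts by (value, key); for a cluster-by on a column outside the group-by keys it declines to merge.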


[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby


 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2340:
-

Status: Open  (was: Patch Available)

comments



[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby


[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571177#comment-13571177
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision HIVE-2340 [jira] optimize orderby followed 
by a groupby.

  Do you think it might be a good idea to get HIVE-3972 first ?

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 
Isn't it true that R1 and R2 will have the same cost for

  RS - GBY -- anything -- RS ?

  If yes, how do you know which rule will be fired ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain




Re: [VOTE] Graduate HCatalog from the incubator and become part of Hive

2013-02-05 Thread Alexander Alten-Lorenz

+1, non-binding

- Alex


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF

[jira] [Updated] (HIVE-1662) Add file pruning into Hive.


 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-1662:
--

Attachment: HIVE-1662.D8391.1.patch

navis requested code review of HIVE-1662 [jira] Add file pruning into Hive..

Reviewers: JIRA

DPAL-1979 Add file pruning based on INPUT__FILE__NAME

Hive now supports the file name virtual column (INPUT__FILE__NAME).
If a query contains a filter on the file name, Hive should add only the 
files that pass the filter to the input paths.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8391

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/NativeTablePredicateHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/file_pruning.q
  ql/src/test/results/clientpositive/file_pruning.q.out


To: JIRA, navis


 Add file pruning into Hive.
 ---

 Key: HIVE-1662
 URL: https://issues.apache.org/jira/browse/HIVE-1662
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
 Attachments: HIVE-1662.D8391.1.patch


 Hive now supports the file name virtual column (INPUT__FILE__NAME). 
 If a query contains a filter on the file name, Hive should add only the 
 files that pass the filter to the input paths.
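A minimal Java sketch of the pruning step: given the candidate files of a table or partition and a predicate that references only INPUT__FILE__NAME, keep just the files the predicate accepts, so the others are never read. It assumes the virtual column holds each file's full path and that the predicate is a top-level AND conjunct of the WHERE clause (a file name test inside an OR with other columns cannot be used to drop files). The names are hypothetical; this is not the code in HIVE-1662.D8391.1.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of file pruning on INPUT__FILE__NAME; not the HIVE-1662 patch.
class FilePruningSketch {

    // files: full paths of the candidate input files.
    // fileNameFilter: the file name conjunct, evaluated at planning time.
    static List<String> pruneInputPaths(List<String> files, Predicate<String> fileNameFilter) {
        List<String> kept = new ArrayList<>();
        for (String file : files) {
            if (fileNameFilter.test(file)) {
                kept.add(file); // only these become input paths for the job
            }
        }
        return kept;
    }
}
```

For example, a filter equivalent to INPUT__FILE__NAME ending in 000001_0 would keep only that file, and the job would not open the other files at all.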


[jira] [Updated] (HIVE-1662) Add file pruning into Hive.


 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-1662:


Assignee: Navis
  Status: Patch Available  (was: Open)



Re: Review Requests

2013-02-05 Thread kulkarni.swar...@gmail.com

Thanks Mark. Appreciate that. I'll take a look.


On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover <grover.markgro...@gmail.com> wrote:

 Swarnim,
 I left some comments on  reviewboard.

 On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:

  Hello,
 
  I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2]. If
  you guys get a chance to review them and provide feedback, I would really
  appreciate it.
 
  Thanks,
 
  [1] https://reviews.apache.org/r/9275/
  [2] https://reviews.apache.org/r/9276/
 
  --
  Swarnim
 




-- 
Swarnim

hive-trunk-hadoop1 - Build # 69 - Failure

Changes for Build #14
[hashutosh] HIVE-3004 : RegexSerDe should support other column types in 
addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan)


Changes for Build #15
[hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via 
Ashutosh Chauhan)


Changes for Build #16
[namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns
from an RC File (Kevin Wilfong via namit)


Changes for Build #17
[namit] HIVE-3899 Partition pruning fails on constant = constant expression
(Kevin Wilfong via namit)


Changes for Build #18
[hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via 
Ashutosh Chauhan)

[namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error
(Navis via namit)


Changes for Build #19
[cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark 
Grover and Gunther Hagleitner via cws)


Changes for Build #20
[namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit)


Changes for Build #21
[namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than 
the
max number of reducers (Kevin Wilfong via namit)


Changes for Build #22

Changes for Build #23
[namit] HIVE-3893 something wrong with the hive-default.xml
(jet cheng via namit)


Changes for Build #24
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on 
the other
produces wrong results (Kevin Wilfong via namit)


Changes for Build #25
[namit] HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit)


Changes for Build #26
[namit] HIVE-3699 Multiple insert overwrite into multiple tables query stores 
same results
in all tables (Navis via namit)


Changes for Build #27
[hashutosh] HIVE-3537 : release locks at the end of move tasks (Namit via 
Ashutosh Chauhan)


Changes for Build #28
[namit] HIVE-3884 Better align columns in DESCRIBE table_name output to make 
more
human-readable (Dilip Joseph via namit)


Changes for Build #29

Changes for Build #30
[namit] HIVE-3916 For outer joins, when looping over the rows looking for 
filtered tags,
it doesn't report progress (Kevin Wilfong via namit)


Changes for Build #31
[hashutosh] HIVE-2332 : If all of the parameters of distinct functions are 
exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan)


Changes for Build #32

Changes for Build #33
[namit] HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit)


Changes for Build #34
[hashutosh] NPE in union processing followed by lateral view followed by 2 
group bys (Navis via Ashutosh Chauhan)


Changes for Build #35

Changes for Build #36

Changes for Build #37
[namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options
(Kevin Wilfong via namit)


Changes for Build #38

Changes for Build #39
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0
 (Prasad Mujumdar via cws)


Changes for Build #40

Changes for Build #41
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin 
via Ashutosh Chauhan)

[hashutosh] HIVE-3833 : object inspectors should be initialized based on 
partition metadata (Namit Jain via Ashutosh Chauhan)


Changes for Build #42

Changes for Build #43
[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types 
that require access to a Schema (Sean Busbey via Ashutosh Chauhan)

[namit] HIVE-3943 Skewed query fails if hdfs path has special characters
(Gang Tim Liu via namit)


Changes for Build #44
[namit] HIVE-3825 Add Operator level Hooks
(Pamela Vagata via namit)


Changes for Build #45
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
(Kevin Wilfong via namit)

[namit] HIVE-3944 Make accept qfile argument for miniMR tests
(Navis via namit)


Changes for Build #46

Changes for Build #47
[hashutosh] Adding csv.txt file, left out from commit of 3528


Changes for Build #48
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23
(Sushanth Sownyan via namit)

[namit] HIVE-3921 recursive_dir.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3923 join_filters_overlap.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3924 join_nullsafe.q fails on 0.23
(Sushanth Sownyan via namit)


Changes for Build #49

Changes for Build #50
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext 
cannot be loaded/instantiated (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion 
(Navis via Ashutosh Chauhan)


Changes for Build #51

Changes for Build #52
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a 
partition through the CLI. (Samuel Yuan via kevinwilfong)


Changes for Build #53
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #54
[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #55

Changes for Build

Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #56

See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/

--
[...truncated 7482 lines...]
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] Creating empty 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/exec/package-info.class
[javac] Creating empty 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/udf/generic/package-info.class
[javac] Creating empty 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/exec/errors/package-info.class
[javac] Creating empty 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes/org/apache/hadoop/hive/ql/lockmgr/package-info.class
 [copy] Copying 1 file to 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/classes

jar:
 [echo] Project: ql
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/libthrift-0.9.0.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/thrift/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/commons-lang-2.4.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/commons-lang/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/json-20090211.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/json/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/JavaEWAH-0.3.2.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/javaewah/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/avro-1.7.1.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/avro/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/avro-mapred-1.7.1.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/avro-mapred/classes
[unzip] Expanding: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/lib/default/javolution-5.5.1.jar
 into 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/javolution/classes
  [jar] Building jar: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/hive-exec-0.10.0-SNAPSHOT.jar
:: delivering :: org.apache.hive#hive-exec;0.10.0-SNAPSHOT :: 0.10.0-SNAPSHOT 
:: integration :: Tue Feb 05 16:12:36 UTC 2013
delivering ivy file to 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ql/ivy-0.10.0-SNAPSHOT.xml
:: publishing :: org.apache.hive#hive-exec
published hive-exec to 
/home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/jars/hive-exec.jar
published ivy to 
/home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/ivys/ivy.xml

create-dirs:
 [echo] Project: contrib
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/contrib/src/test/resources
 does not exist.

init:
 [echo] Project: contrib

setup:
 [echo] Project: contrib

ivy-init-settings:
 [echo] Project: contrib

ivy-resolve:
 [echo] Project: contrib
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml
[ivy:resolve] downloading 
/home/hudson/.ivy2/local/org.apache.hive/hive-exec/0.10.0-SNAPSHOT/jars/hive-exec.jar
 ...
[ivy:resolve] (4741kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hive#hive-exec;0.10.0-SNAPSHOT!hive-exec.jar (56ms)
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve]  ERRORS
[ivy:resolve]   SERVER ERROR: Service Unavailable 
url=http://www.sourceforge.net/projects/jdo2-api/files/jdo2-api//jdo2-api-2.3-ec.jar
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/56/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-contrib-default.xml
 to

Hive-trunk-hadoop2 - Build # 108 - Still Failing

Changes for Build #66
[hashutosh] HIVE-3004 : RegexSerDe should support other column types in 
addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan)


Changes for Build #67
[namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns
from an RC File (Kevin Wilfong via namit)

[hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via 
Ashutosh Chauhan)


Changes for Build #68
[namit] HIVE-3899 Partition pruning fails on constant = constant expression
(Kevin Wilfong via namit)


Changes for Build #69
[hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via 
Ashutosh Chauhan)

[namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error
(Navis via namit)


Changes for Build #70
[namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit)

[cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark 
Grover and Gunther Hagleitner via cws)


Changes for Build #71
[namit] HIVE-3893 something wrong with the hive-default.xml
(jet cheng via namit)

[namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than
the max number of reducers (Kevin Wilfong via namit)


Changes for Build #72
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on
the other produces wrong results (Kevin Wilfong via namit)


Changes for Build #73
[namit] HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit)


Changes for Build #74
[namit] HIVE-3699 Multiple insert overwrite into multiple tables query stores
same results in all tables (Navis via namit)


Changes for Build #75
[namit] HIVE-3884 Better align columns in DESCRIBE table_name output to make
more human-readable (Dilip Joseph via namit)

[hashutosh] HIVE-3537 : release locks at the end of move tasks (Namit via 
Ashutosh Chauhan)


Changes for Build #76
[namit] HIVE-3916 For outer joins, when looping over the rows looking for
filtered tags, it doesn't report progress (Kevin Wilfong via namit)


Changes for Build #77
[hashutosh] HIVE-2332 : If all of the parameters of distinct functions are 
exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan)


Changes for Build #78

Changes for Build #79
[hashutosh] NPE in union processing followed by lateral view followed by 2 
group bys (Navis via Ashutosh Chauhan)

[namit] HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit)


Changes for Build #80

Changes for Build #81

Changes for Build #82
[namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options
(Kevin Wilfong via namit)


Changes for Build #83

Changes for Build #84
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0
 (Prasad Mujumdar via cws)


Changes for Build #85

Changes for Build #86
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin 
via Ashutosh Chauhan)

[hashutosh] HIVE-3833 : object inspectors should be initialized based on 
partition metadata (Namit Jain via Ashutosh Chauhan)


Changes for Build #87

Changes for Build #88
[namit] HIVE-3825 Add Operator level Hooks
(Pamela Vagata via namit)

[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types 
that require access to a Schema (Sean Busbey via Ashutosh Chauhan)

[namit] HIVE-3943 Skewed query fails if hdfs path has special characters
(Gang Tim Liu via namit)


Changes for Build #89
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
(Kevin Wilfong via namit)

[namit] HIVE-3944 Make accept qfile argument for miniMR tests
(Navis via namit)


Changes for Build #90
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23
(Sushanth Sownyan via namit)

[namit] HIVE-3921 recursive_dir.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3923 join_filters_overlap.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3924 join_nullsafe.q fails on 0.23
(Sushanth Sownyan via namit)

[hashutosh] Adding csv.txt file, left out from commit of 3528


Changes for Build #91

Changes for Build #92
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext 
cannot be loaded/instantiated (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion 
(Navis via Ashutosh Chauhan)


Changes for Build #93

Changes for Build #94
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a 
partition through the CLI. (Samuel Yuan via kevinwilfong)


Changes for Build #95
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #96
[hashutosh] Missed deleting empty file GenMRRedSink4.java while commiting 3784

[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #97
[namit] HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit)

[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan,

Hive-trunk-h0.21 - Build # 1957 - Failure

Changes for Build #1955
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #1956

Changes for Build #1957



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1957)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1957/ to 
view the results.

[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-05 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571510#comment-13571510
 ] 

Owen O'Malley commented on HIVE-3874:
-

[~kevinwilfong] Thanks for the bug fixes, Kevin. I pushed the DynamicByteArray 
and double serialization fixes to [github|https://github.com/hortonworks/orc]. 
I have the null column problem fixed, but it is tied into my other changes on 
my row-seek dev branch. I hope to finish the row-seek work today, then merge
it into master and make the patch that puts it into Hive.

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push-down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing lightweight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the types of the rows aren't stored in the file
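The last few bullets (row counts, lightweight indexes, skipping row groups) can be illustrated with a small sketch. It assumes each row group records the min and max of one integer column plus its row count; this layout and the RowGroupIndex/RowGroupStats names are hypothetical and chosen for illustration, not ORC's actual index format. A reader can then skip any row group whose range cannot satisfy an equality filter, and can report the row count without scanning data:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ORC code: a "lightweight index" that keeps the
// min/max of one long column and the row count for every row group.
final class RowGroupIndex {

    /** Statistics for one row group. */
    static final class RowGroupStats {
        final long min;
        final long max;
        final long rowCount;

        RowGroupStats(long min, long max, long rowCount) {
            if (min > max) {
                throw new IllegalArgumentException("min must not exceed max");
            }
            this.min = min;
            this.max = max;
            this.rowCount = rowCount;
        }

        /** Returns false only when no row in this group can equal value. */
        boolean mightContain(long value) {
            return value >= min && value <= max;
        }
    }

    private final List<RowGroupStats> groups = new ArrayList<>();

    void add(RowGroupStats stats) {
        groups.add(stats);
    }

    /** Indices of the row groups a reader still has to scan for "col = value". */
    List<Integer> groupsToRead(long value) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (groups.get(i).mightContain(value)) {
                result.add(i);
            }
        }
        return result;
    }

    /** Total row count, answered from the index alone. */
    long totalRows() {
        long total = 0;
        for (RowGroupStats g : groups) {
            total += g.rowCount;
        }
        return total;
    }
}
```

With groups covering [0, 99], [100, 199] and [50, 150], a filter "col = 120" only needs groups 1 and 2; group 0 is skipped without being read or decompressed.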

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-701) lots of reserved keywords in hive

2013-02-05 Thread Renata Ghisloti Duarte de Souza (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-701:
-

Attachment: HIVE-701.D8397.1.patch

sxyuan requested code review of HIVE-701 [jira] Make keywords non-reserved.

Reviewers: kevinwilfong, JIRA

Almost all keywords in Hive are reserved. This change makes all but the 
following keywords non-reserved:

IF, HAVING, WHERE, SELECT, UNIQUEJOIN, JOIN, ON, TRANSFORM, MAP, REDUCE, 
TABLESAMPLE, CAST, FUNCTION, EXTENDED, FORMATTED, PRETTY, CASE, WHEN, THEN, 
ELSE, END, DATABASE, CROSS
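As a rough illustration of the effect of this change, the hypothetical helper below checks whether a word would still be rejected as an unquoted identifier, using only the list above. It is not Hive code: the real patch expresses this in the ANTLR grammar (IdentifiersParser.g and related files), and the HiveKeywords class and usableAsIdentifier method are invented names. It also assumes case-insensitive keyword matching, as in HiveQL:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Hive code: the patch encodes this in the grammar,
// not in a lookup table.
final class HiveKeywords {

    // Keywords that stay reserved according to the HIVE-701 review request.
    static final Set<String> STILL_RESERVED = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "IF", "HAVING", "WHERE", "SELECT", "UNIQUEJOIN", "JOIN", "ON",
            "TRANSFORM", "MAP", "REDUCE", "TABLESAMPLE", "CAST", "FUNCTION",
            "EXTENDED", "FORMATTED", "PRETTY", "CASE", "WHEN", "THEN", "ELSE",
            "END", "DATABASE", "CROSS")));

    /** True unless the word is one of the keywords that remain reserved. */
    static boolean usableAsIdentifier(String word) {
        // Keywords are matched case-insensitively, so normalize before lookup.
        return !STILL_RESERVED.contains(word.toUpperCase(Locale.ROOT));
    }
}
```

Under this list, names such as status or left (both mentioned in the issue) would become usable as identifiers, while select or Join would still be rejected.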

Because the grammar grew too large, it was split into multiple files to 
accommodate Java's code size limit. As a result, the custom error handling 
needed to be moved as well.

TEST PLAN
  Use keywords as identifiers in test queries. Existing unit tests should 
ensure that keywords will not be mistakenly identified as identifiers.

REVISION DETAIL
  https://reviews.facebook.net/D8397

AFFECTED FILES
  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
  ql/src/test/results/clientnegative/show_tables_bad1.q.out
  ql/src/test/results/clientnegative/archive_partspec3.q.out
  ql/src/test/results/clientnegative/invalid_create_tbl2.q.out
  ql/src/test/results/clientnegative/select_udtf_alias.q.out
  ql/src/test/results/clientnegative/show_tables_bad2.q.out
  ql/src/test/results/clientnegative/invalid_tbl_name.q.out
  ql/src/test/results/clientnegative/lateral_view_join.q.out
  ql/src/test/results/clientpositive/nonreserved_keywords_input37.q.out
  ql/src/test/results/clientpositive/nonreserved_keywords_insert_into1.q.out
  ql/src/test/results/compiler/errors/wrong_distinct2.q.out
  ql/src/test/results/compiler/errors/missing_overwrite.q.out
  ql/src/test/queries/clientnegative/show_tables_bad1.q
  ql/src/test/queries/clientnegative/show_tables_bad2.q
  ql/src/test/queries/clientpositive/nonreserved_keywords_insert_into1.q
  ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q
  ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/SelectClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
  ql/build.xml

To: JIRA


 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.D8397.1.patch


 There is a problem if we want to use some reserved keywords:
 for example, creating a function named left/right is not possible because
 left/right is already a reserved keyword.
 The other way around should also work: if we want to add a 'show tables
 status' command and some applications already use status as a column name,
 they should not break.


[jira] [Updated] (HIVE-701) lots of reserved keywords in hive

2013-02-05 Thread Samuel Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-701:
-

Attachment: HIVE-701.1.patch.txt

https://reviews.facebook.net/D8397

 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.1.patch.txt, HIVE-701.D8397.1.patch




[jira] [Updated] (HIVE-701) lots of reserved keywords in hive

2013-02-05 Thread Samuel Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-701:
-

Status: Patch Available  (was: Open)

 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.1.patch.txt, HIVE-701.D8397.1.patch




[jira] [Created] (HIVE-3989) TestCase TestMTQueries fails with IBM Java 6

Renata Ghisloti Duarte de Souza created HIVE-3989:
-

 Summary: TestCase TestMTQueries fails with IBM Java 6
 Key: HIVE-3989
 URL: https://issues.apache.org/jira/browse/HIVE-3989
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.0
 Environment: IBM Java 6 x86 64
Reporter: Renata Ghisloti Duarte de Souza
Priority: Minor
 Fix For: 0.10.0


The test case fails with IBM Java 6 due to a HashMap problem.

Here is the error:

[junit] diff -a /home/renata/stg-hadoop/hive-0.10/release-0.10.0/build/ql/test/logs/clientpositive/join2.q.out /home/renata/stg-hadoop/hive-0.10/release-0.10.0/ql/src/test/results/clientpositive/join2.q.out
[junit] 109c109
[junit]  0 {VALUE._col0}
[junit] ---
[junit]  0 {VALUE._col4}
[junit] 112c112
[junit] outputColumnNames: _col0, _col9
[junit] ---
[junit] outputColumnNames: _col4, _col9
[junit] 115c115
[junit] expr: _col0
[junit] ---
[junit] expr: _col4
[junit] Test join2.q results check failed with error code 1
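The report only says "a Hashmap problem". A common cause of golden-file diffs that show up on one JVM vendor but not another is code whose output depends on HashMap iteration order, which the Java API leaves unspecified. Assuming that is the cause here (unconfirmed), the usual remedy is a map with a defined order. The hypothetical StableColumns class below is not Hive code; it only shows LinkedHashMap keeping output columns in insertion order regardless of JVM (Java 8 or later):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Hive code. HashMap iteration order is
// unspecified and may differ between JVM implementations; LinkedHashMap
// always iterates in insertion order.
final class StableColumns {

    /** Maps output column names to expressions, remembering insertion order. */
    static Map<String, String> columnMap(List<String> names, List<String> exprs) {
        if (names.size() != exprs.size()) {
            throw new IllegalArgumentException("names and exprs differ in length");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            map.put(names.get(i), exprs.get(i));
        }
        return map;
    }

    /** Renders a plan-style line in the map's iteration order. */
    static String render(Map<String, String> map) {
        return "outputColumnNames: " + String.join(", ", map.keySet());
    }
}
```

Swapping LinkedHashMap for HashMap in columnMap would make the rendered order depend on hash codes and table layout, which is exactly the kind of detail that can differ between IBM and Oracle class libraries.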



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-05 Thread Jingwei Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Attachment: HIVE-3672.6.patch.txt

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types. Because we've
 discouraged users from using non-string partition column types, this is a
 problem for users who want to change their partition columns to strings:
 they have to rename their table, create a new table, and copy all the data
 over.
 To support this via the CLI, we propose adding a command like ALTER TABLE
 table_name PARTITION COLUMN (column_name new type);


[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-05 Thread Jingwei Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Status: Patch Available  (was: Open)

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h



Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #283

See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/

--
[...truncated 36454 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-02-05_14-53-56_527_6993925461901486378/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_396289522.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] Copying file: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2013-02-05_14-54-00_889_4235432531689754431/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-02-05_14-54-00_889_4235432531689754431/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_573949786.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_531624891.txt
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/283/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302051454_1845300102.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (key int, value

[jira] [Updated] (HIVE-1662) Add file pruning into Hive.


 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-1662:


Status: Open  (was: Patch Available)

 Add file pruning into Hive.
 ---

 Key: HIVE-1662
 URL: https://issues.apache.org/jira/browse/HIVE-1662
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
 Attachments: HIVE-1662.D8391.1.patch


 Hive now supports a filename virtual column.
 If a file name filter is present in a query, Hive should add only the files
 that pass the filter to the input paths.
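The idea can be sketched as filtering candidate files by a predicate on their names before they are registered as job inputs. This is a minimal sketch under stated assumptions, not the patch: plain strings stand in for Hadoop Path objects, and FilePruner/pruneInputs are hypothetical names. In Hive the filename virtual column is INPUT__FILE__NAME, which holds the full path, so the predicate here receives full paths as well:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the HIVE-1662 idea, not the patch itself: apply a
// file-name predicate while building the input list, so files that cannot
// match never reach the job. Strings stand in for Hadoop Path objects.
final class FilePruner {

    static List<String> pruneInputs(List<String> candidateFiles,
                                    Predicate<String> fileNameFilter) {
        List<String> inputs = new ArrayList<>();
        for (String file : candidateFiles) {
            if (fileNameFilter.test(file)) {
                inputs.add(file);  // only files passing the filter become input paths
            }
        }
        return inputs;
    }
}
```

For a query filtering on a file name ending in 000001_0, only that file would be added; the other files in the table directory would never be opened or split.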


[jira] [Updated] (HIVE-1662) Add file pruning into Hive.


 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-1662:
--

Attachment: HIVE-1662.D8391.2.patch

navis updated the revision HIVE-1662 [jira] Add file pruning into Hive..

  Fixed NPEs

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8391

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8391?vs=27249&id=27273#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/NativeTablePredicateHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/file_pruning.q
  ql/src/test/results/clientpositive/file_pruning.q.out

To: JIRA, navis


 Add file pruning into Hive.
 ---

 Key: HIVE-1662
 URL: https://issues.apache.org/jira/browse/HIVE-1662
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
 Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch




[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly


[ 
https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571996#comment-13571996
 ] 

Phabricator commented on HIVE-2839:
---

navis has commented on the revision HIVE-2839 [jira] Filters on outer join 
with mapjoin hint is not applied correctly.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:169 It's
just a list of operators that can have multiple parents. The conditions you've
mentioned should be checked before calling this. I did this intentionally
because this is a utility class.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:181 hm.. ok.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:130 The
return value might be used to set fields of OperatorDescs, which use ArrayList
instead of List. Would it be better to wrap it in an ArrayList again before
setting it?
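The wrapping asked about in the last comment can be sketched with the ArrayList copy constructor. The classes below are hypothetical stand-ins, not Hive's OperatorDesc types; SomeDesc only mimics a descriptor whose setter is declared to take an ArrayList:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Hive classes.
final class ArrayListWrapping {

    /** A descriptor whose setter demands an ArrayList rather than a List. */
    static final class SomeDesc {
        private ArrayList<String> exprs = new ArrayList<>();

        void setExprs(ArrayList<String> exprs) {
            this.exprs = exprs;
        }

        ArrayList<String> getExprs() {
            return exprs;
        }
    }

    /**
     * Copies any List into a fresh ArrayList. This satisfies the ArrayList-typed
     * setter and keeps the descriptor independent of the caller's list.
     */
    static <T> ArrayList<T> toArrayList(List<T> list) {
        return new ArrayList<>(list);
    }
}
```

A utility that returns, for example, the result of Arrays.asList cannot be passed straight to setExprs (its static type is List, so it does not compile), and that list is fixed-size anyway; copying it makes the descriptor's list growable and unaffected by later changes to the source list.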

REVISION DETAIL
  https://reviews.facebook.net/D2079

To: JIRA, navis
Cc: njain


 Filters on outer join with mapjoin hint is not applied correctly
 

 Key: HIVE-2839
 URL: https://issues.apache.org/jira/browse/HIVE-2839
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch, HIVE-2839.D2079.3.patch, 
 HIVE-2839.D2079.4.patch, HIVE-2839.D2079.5.patch, HIVE-2839.D2079.6.patch


 While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions.
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND true limit 10;
 FAILED: Hive Internal Error: 
 java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
  cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to 
 org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363)
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483)
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689)
   at 
 org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {code}
 and 
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND b.key * 10  '1000' limit 10;
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
   at

[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results


[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572006#comment-13572006
 ] 

Phabricator commented on HIVE-3972:
---

navis has commented on the revision HIVE-3972 [jira] Support using multiple 
reducer for fetching order by results.

INLINE COMMENTS
  conf/hive-default.xml.template:1621 ok. It's harder than writing some code.
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowFetcher.java:1 ah, ok.
  ql/src/test/queries/clientpositive/orderby_query_bucketing.q:3 ok.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5604 It 
will be calculated from the input size, which may or may not be 1, so it is 
safer to assume that it's not 1.

REVISION DETAIL
  https://reviews.facebook.net/D8349

To: JIRA, navis
Cc: njain


 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch


 Queries that end with an ORDER BY clause force the final MR job to run with a
 single reducer, which can become a bottleneck. For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, the multiple result files could be
 merged into a single sorted stream at the fetcher level.
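The fetcher-side merge can be sketched as a standard k-way merge with a priority queue. The sketch assumes that each reducer's output file is already sorted and can be read through an iterator. The in-memory lists and the class name `SortedStreamMerger` are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: merge several individually sorted reducer outputs into one sorted stream.
class SortedStreamMerger {

    // Current head value of one input, plus the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    static <T extends Comparable<T>> List<T> merge(List<List<T>> sortedRuns) {
        // Min-heap ordered by each input's current head value.
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<Head<T>>((a, b) -> a.value.compareTo(b.value));
        for (List<T> run : sortedRuns) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it)); // seed with each run's first row
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();          // globally smallest remaining row
            out.add(smallest.value);
            if (smallest.source.hasNext()) {         // refill from the same run
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }
}
```

The heap holds at most one row per reducer output, so memory grows with the number of reducers rather than with the result size. Each output row costs O(log k) for k inputs.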

[jira] [Updated] (HIVE-701) lots of reserved keywords in hive


 [ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-701:
-

Attachment: HIVE-701.HIVE-701.D8397.2.patch

sxyuan updated the revision HIVE-701 [jira] Make keywords non-reserved.

  Forgot a step. FORMATTED and PRETTY are also non-reserved.

Reviewers: kevinwilfong, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8397

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8397?vs=27255&id=27285#toc

AFFECTED FILES
  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
  ql/src/test/results/clientnegative/show_tables_bad1.q.out
  ql/src/test/results/clientnegative/archive_partspec3.q.out
  ql/src/test/results/clientnegative/invalid_create_tbl2.q.out
  ql/src/test/results/clientnegative/select_udtf_alias.q.out
  ql/src/test/results/clientnegative/show_tables_bad2.q.out
  ql/src/test/results/clientnegative/invalid_tbl_name.q.out
  ql/src/test/results/clientnegative/lateral_view_join.q.out
  ql/src/test/results/clientpositive/nonreserved_keywords_input37.q.out
  ql/src/test/results/clientpositive/nonreserved_keywords_insert_into1.q.out
  ql/src/test/results/compiler/errors/wrong_distinct2.q.out
  ql/src/test/results/compiler/errors/missing_overwrite.q.out
  ql/src/test/queries/clientnegative/show_tables_bad1.q
  ql/src/test/queries/clientnegative/show_tables_bad2.q
  ql/src/test/queries/clientpositive/nonreserved_keywords_insert_into1.q
  ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q
  ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/SelectClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
  ql/build.xml

To: kevinwilfong, JIRA, sxyuan


 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, 
 HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch


 Reserved keywords cause problems when we want to use them as names: for
 example, a function cannot be named left or right, because left/right are
 already reserved keywords.
 The other way around should also be possible: if we want to add a 'show
 tables status' command and some applications already use status as a column
 name, they should not break.
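Below is a simplified model of the reserved/non-reserved distinction. A non-reserved keyword is still recognized as a keyword but is also accepted as an identifier. The patch's affected files include the ANTLR grammar files listed above (`HiveParser.g`, `IdentifiersParser.g`), which suggests the real change is made in the grammar. The Java here is only a hypothetical illustration, and its word lists are examples, not Hive's actual keyword sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of reserved vs. non-reserved keywords (example word lists only).
class KeywordPolicy {
    // Keywords that may never be used as identifiers.
    static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("SELECT", "FROM", "WHERE"));
    // Keywords the grammar recognizes but still accepts as identifiers.
    static final Set<String> NON_RESERVED =
            new HashSet<>(Arrays.asList("FORMATTED", "PRETTY"));

    // Keywords are matched case-insensitively in this model.
    private static String normalize(String word) {
        return word.toUpperCase(Locale.ROOT);
    }

    static boolean isKeyword(String word) {
        String w = normalize(word);
        return RESERVED.contains(w) || NON_RESERVED.contains(w);
    }

    // Any word except a reserved keyword may name a table, column, or function.
    static boolean canBeIdentifier(String word) {
        return !RESERVED.contains(normalize(word));
    }
}
```

Under this model, `formatted` works both as a keyword and as a column name, while `select` remains unusable as a name.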

[jira] [Updated] (HIVE-701) lots of reserved keywords in hive

2013-02-05 Thread Samuel Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-701:
-

Attachment: HIVE-701.2.patch.txt

Updated, see Phabricator.

 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, 
 HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch


 Reserved keywords cause problems when we want to use them as names: for
 example, a function cannot be named left or right, because left/right are
 already reserved keywords.
 The other way around should also be possible: if we want to add a 'show
 tables status' command and some applications already use status as a column
 name, they should not break.

[jira] [Created] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)

Navis created HIVE-3990:
---

 Summary: Provide input threshold for direct-fetcher (HIVE-2925)
 Key: HIVE-3990
 URL: https://issues.apache.org/jira/browse/HIVE-3990
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


As a follow-up to HIVE-2925, add an input threshold for fetch task conversion.
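The threshold check could take roughly the following shape. This sketch assumes the input sizes are known in bytes before planning, and that a negative threshold means "no limit". The class and method names are hypothetical, and the configuration property actually added by the patch is not reproduced here.

```java
// Hypothetical decision rule: convert to a direct fetch only when the total
// input is small enough.
class FetchConversionCheck {

    // thresholdBytes < 0 is treated as "no limit" (an assumption of this sketch).
    static boolean shouldConvertToFetch(long[] inputSizesBytes, long thresholdBytes) {
        if (thresholdBytes < 0) {
            return true;
        }
        long total = 0;
        for (long size : inputSizesBytes) {
            total += size;
            if (total > thresholdBytes) {
                return false; // stop early once the limit is exceeded
            }
        }
        return true;
    }
}
```

Stopping at the first input that pushes the total over the limit avoids summing sizes that cannot change the outcome.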

[jira] [Updated] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)


 [ 
https://issues.apache.org/jira/browse/HIVE-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3990:


Status: Patch Available  (was: Open)

 Provide input threshold for direct-fetcher (HIVE-2925)
 --

 Key: HIVE-3990
 URL: https://issues.apache.org/jira/browse/HIVE-3990
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial

 As a follow-up to HIVE-2925, add an input threshold for fetch task conversion.

[jira] [Updated] (HIVE-1662) Add file pruning into Hive.


 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-1662:


Status: Patch Available  (was: Open)

 Add file pruning into Hive.
 ---

 Key: HIVE-1662
 URL: https://issues.apache.org/jira/browse/HIVE-1662
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Navis
 Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch


 Hive now supports a file-name virtual column. If a query contains a filter
 on the file name, Hive should add only the files that pass the filter to the
 input paths.

[jira] [Updated] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)


 [ 
https://issues.apache.org/jira/browse/HIVE-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3990:
--

Attachment: HIVE-3990.D8415.1.patch

navis requested code review of HIVE-3990 [jira] Provide input threshold for 
direct-fetcher (HIVE-2925).

Reviewers: JIRA

DPAL-1371 Provide input threshold for direct-fetcher

As a follow-up to HIVE-2925, add an input threshold for fetch task conversion.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8415

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/InputEstimator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
  ql/src/test/queries/clientpositive/nonmr_fetch_threshold.q
  ql/src/test/results/clientpositive/nonmr_fetch_threshold.q.out

To: JIRA, navis


 Provide input threshold for direct-fetcher (HIVE-2925)
 --

 Key: HIVE-3990
 URL: https://issues.apache.org/jira/browse/HIVE-3990
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3990.D8415.1.patch


 As a follow-up to HIVE-2925, add an input threshold for fetch task conversion.

[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly


[ 
https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572118#comment-13572118
 ] 

Phabricator commented on HIVE-2839:
---

njain has commented on the revision HIVE-2839 [jira] Filters on outer join 
with mapjoin hint is not applied correctly.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:169 You 
should assert numberParents == 1.

  We need to check that before calling this function.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java:130 OK, 
that's fine.
  Ideally, it should be cleaned up, but that can be a follow-up.
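The caller-side check being requested can be sketched with a hypothetical helper name. The sketch throws explicitly instead of using Java's `assert` statement, because `assert` statements are disabled unless the JVM runs with `-ea`.

```java
import java.util.List;

// Hypothetical caller-side precondition: verify there is exactly one parent
// before invoking a utility that deliberately does not check this itself.
class SingleParentCheck {

    static <T> T requireSingleParent(List<T> parents) {
        int n = (parents == null) ? 0 : parents.size();
        if (n != 1) {
            throw new IllegalStateException("expected exactly one parent, found " + n);
        }
        return parents.get(0);
    }
}
```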

REVISION DETAIL
  https://reviews.facebook.net/D2079

To: JIRA, navis
Cc: njain


 Filters on outer join with mapjoin hint is not applied correctly
 

 Key: HIVE-2839
 URL: https://issues.apache.org/jira/browse/HIVE-2839
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch, HIVE-2839.D2079.3.patch, 
 HIVE-2839.D2079.4.patch, HIVE-2839.D2079.5.patch, HIVE-2839.D2079.6.patch


 While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions.
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND true limit 10;
 FAILED: Hive Internal Error: 
 java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
  cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to 
 org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363)
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483)
   at 
 org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689)
   at 
 org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {code}
 and 
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND b.key * 10  '1000' limit 10;
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1321)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325)
   at

[jira] [Commented] (HIVE-701) lots of reserved keywords in hive


[ 
https://issues.apache.org/jira/browse/HIVE-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572122#comment-13572122
 ] 

Phabricator commented on HIVE-701:
--

njain has commented on the revision HIVE-701 [jira] Make keywords 
non-reserved.

INLINE COMMENTS
  ql/src/test/queries/clientpositive/nonreserved_keywords_input37.q:9 I haven't 
looked at the patch, but don't use MAP/REDUCE
  for tests.
  We are trying to deprecate this syntax, if possible.

REVISION DETAIL
  https://reviews.facebook.net/D8397

To: kevinwilfong, JIRA, sxyuan
Cc: njain


 lots of reserved keywords in hive
 -

 Key: HIVE-701
 URL: https://issues.apache.org/jira/browse/HIVE-701
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Samuel Yuan
 Attachments: HIVE-701.1.patch.txt, HIVE-701.2.patch.txt, 
 HIVE-701.D8397.1.patch, HIVE-701.HIVE-701.D8397.2.patch


 Reserved keywords cause problems when we want to use them as names: for
 example, a function cannot be named left or right, because left/right are
 already reserved keywords.
 The other way around should also be possible: if we want to add a 'show
 tables status' command and some applications already use status as a column
 name, they should not break.

[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby


[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572126#comment-13572126
 ] 

Phabricator commented on HIVE-2340:
---

hagleitn has commented on the revision HIVE-2340 [jira] optimize orderby 
followed by a groupby.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138
 HashSet?
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251
 I think the reasoning about the number of reducers deserves more comments 
(similar to what you've explained on the JIRA).
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
 I think if you just run this optimization *after* CommonJoinResolver, 
everything should be fine. Either CommonJoinResolver will already have converted 
the joins to mapjoins, so this optimization won't apply, or you still have a 
regular join that you can merge without worrying about missing out on a mapjoin 
conversion. You could still keep the sorted flag to express intent, but at the 
moment there isn't any optimization that will pull the rug out from under you. 
Am I missing something?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
 HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
 HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY 
 optimizer (cluster-by following group-by).

[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results


[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572148#comment-13572148
 ] 

Navis commented on HIVE-3972:
-

I missed some commits (HIVE-3633, etc.). They should be merged in correctly.

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch


 Queries that end with an ORDER BY clause force the final MR job to run with a
 single reducer, which can become a bottleneck. For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, the multiple result files could be
 merged into a single sorted stream at the fetcher level.

[jira] [Updated] (HIVE-2238) Support for Median and Mode UDAFs

2013-02-05 Thread PRETTY SITHARA (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRETTY SITHARA updated HIVE-2238:
-

Attachment: HIVE-2238.1.patch.txt

Patch for HIVE-2238

 Support for Median and Mode UDAFs
 -

 Key: HIVE-2238
 URL: https://issues.apache.org/jira/browse/HIVE-2238
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Travis Powell
 Attachments: HIVE-2238.1.patch.txt


 Median and Mode are essential functions for reducing/refining the data set,
 and would allow for greater control over the selection of data. More involved
 analytics are probably best handled by relational databases or OLAP cubes,
 but Median and Mode are very practical in Hive simply for delivering a smaller
 data set in which the selected items have only a certain mode (for example,
 rows describing an object joined to the table, where that object's column
 values meet a frequency threshold).
 Comments are more than welcome. I would be happy to support this.
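For reference, here is a plain in-memory sketch of the two statistics. A real Hive UDAF would also have to maintain partial aggregates and merge them across map and reduce tasks. This sketch ignores that and simply holds every value. The choice to break ties in `mode` toward the smallest value is arbitrary and made only here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory median and mode; not a Hive UDAF (no partial aggregation/merge).
class MedianMode {

    // Median: middle element of the sorted values, or the mean of the two middle ones.
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of empty input");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Mode: most frequent value; ties go to the smallest value (a choice of this sketch).
    static long mode(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("mode of empty input");
        }
        Map<Long, Integer> counts = new HashMap<>();
        for (long v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        long best = 0;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            long v = e.getKey();
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && v < best)) {
                best = v;
                bestCount = c;
            }
        }
        return best;
    }
}
```

Median needs all values (or an approximation), whereas mode only needs a value-to-count map. That difference is what makes mode easier to express as a mergeable aggregate.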

[jira] [Updated] (HIVE-2238) Support for Median and Mode UDAFs

2013-02-05 Thread PRETTY SITHARA (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRETTY SITHARA updated HIVE-2238:
-

  Labels: patch  (was: )
Hadoop Flags: Incompatible change
  Status: Patch Available  (was: Open)

 Support for Median and Mode UDAFs
 -

 Key: HIVE-2238
 URL: https://issues.apache.org/jira/browse/HIVE-2238
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Travis Powell
  Labels: patch
 Attachments: HIVE-2238.1.patch.txt


 Median and Mode are essential functions for reducing/refining the data set,
 and would allow for greater control over the selection of data. More involved
 analytics are probably best handled by relational databases or OLAP cubes,
 but Median and Mode are very practical in Hive simply for delivering a smaller
 data set in which the selected items have only a certain mode (for example,
 rows describing an object joined to the table, where that object's column
 values meet a frequency threshold).
 Comments are more than welcome. I would be happy to support this.

Hive Operator Counters

2013-02-05 Thread Jie Li

Hi all,

Has anyone noticed that the operator counters are not properly
maintained? They are useful for understanding the query plan and
execution, e.g. how many rows each operator processes and
produces, and how much time each operator spends.

NUM_INPUT_ROWS
NUM_OUTPUT_ROWS
TIME_TAKEN

They can be found in org.apache.hadoop.hive.ql.exec.Operator, but
since counterNameToEnum is never initialized, these counters are not
being calculated.
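The failure described above can be reproduced in a hypothetical, simplified model: if the name-to-enum lookup is never populated, every counter update is silently dropped. This is not Hive's `Operator` code. Only the field name `counterNameToEnum` and the three counter names come from the message.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of per-operator counters.
class OperatorCounterSketch {
    enum Counter { NUM_INPUT_ROWS, NUM_OUTPUT_ROWS, TIME_TAKEN }

    // Name-to-enum lookup; while it stays null, increments are ignored.
    private Map<String, Counter> counterNameToEnum;
    private final Map<Counter, Long> values = new EnumMap<>(Counter.class);

    // Populates the lookup; skipping this call reproduces the reported symptom.
    void initializeCounters() {
        counterNameToEnum = new HashMap<>();
        for (Counter c : Counter.values()) {
            counterNameToEnum.put(c.name(), c);
        }
    }

    void increment(String name, long delta) {
        if (counterNameToEnum == null) {
            return; // never initialized: the update is silently lost
        }
        Counter c = counterNameToEnum.get(name);
        if (c != null) {
            values.merge(c, delta, Long::sum);
        }
    }

    long get(Counter c) {
        return values.getOrDefault(c, 0L);
    }
}
```

Because the uninitialized case fails quietly instead of throwing, the counters simply read zero, which matches the behavior reported here.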

If this used to work and broke at some point, I'd be glad to contribute. :)

Jie

[jira] [Updated] (HIVE-3972) Support using multiple reducer for fetching order by results