Re: Is anybody working on the globally order by of hive ?

2010-06-12 Thread Jeff Hammerbacher
See https://issues.apache.org/jira/browse/HIVE-1402.

On Fri, Jun 11, 2010 at 1:22 PM, John Sichi jsi...@facebook.com wrote:

 If someone is interested in adding parallel ORDER BY to Hive (using
 TotalOrderPartitioner), here's a good starting point:

 http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad




Re: Is anybody working on the globally order by of hive ?

2010-06-12 Thread Jeff Zhang
Great, I can work on this issue.




On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
 See https://issues.apache.org/jira/browse/HIVE-1402.


-- 
Best Regards

Jeff Zhang


Is anybody working on the globally order by of hive ?

2010-06-11 Thread Jeff Zhang
Hi all,

According to the Hive wiki, Hive does not have a global ORDER BY feature;
Hive's SORT BY only sorts within each reducer. Our team thinks a global
ORDER BY is an important feature for users, so I am wondering whether
anybody is working on it. I am very interested in getting involved.


-- 
Best Regards

Jeff Zhang


Re: Is anybody working on the globally order by of hive ?

2010-06-11 Thread Edward Capriolo
On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang zjf...@gmail.com wrote:

 According to the Hive wiki, Hive does not have a global ORDER BY feature;
 Hive's SORT BY only sorts within each reducer.


Jeff,

I was wondering whether TotalOrderPartitioner in Hadoop 0.20 could play a role
in this. As of now, ORDER BY sets the number of reduce tasks to 1 :)

Edward
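To make the two behaviors described above concrete, here is a small in-memory Python simulation. It is not Hive code and the function names are hypothetical. SORT BY is modeled as hash partitioning followed by a per-reducer sort, and ORDER BY as a single reducer sorting every row. The `key % num_reducers` routing is an assumption that roughly mirrors Hadoop's default HashPartitioner for non-negative integer keys.

```python
def sort_by(rows, num_reducers):
    """SORT BY model: hash-partition the rows, then each reducer sorts only its own share."""
    buckets = [[] for _ in range(num_reducers)]
    for key in rows:
        buckets[key % num_reducers].append(key)  # assumed stand-in for HashPartitioner
    return [sorted(bucket) for bucket in buckets]  # one sorted output per reducer


def order_by(rows):
    """ORDER BY model: a single reducer receives and sorts every row."""
    return [sorted(rows)]


def concatenate(outputs):
    """Read the reducer outputs back to back, in reducer order."""
    return [key for output in outputs for key in output]


rows = [42, 7, 19, 3, 88, 56, 21, 64, 10, 35]
per_reducer = sort_by(rows, 3)
print(per_reducer)                                  # [[3, 21, 42], [7, 10, 19, 64, 88], [35, 56]]
print(concatenate(per_reducer) == sorted(rows))     # False: each output is sorted, the whole is not
print(concatenate(order_by(rows)) == sorted(rows))  # True, but one reducer did all the sorting
```

Each SORT BY output is sorted on its own, but because hashing scatters neighboring keys across reducers, reading the outputs in sequence does not give a total order. That is why a correct ORDER BY without range partitioning falls back to one reducer.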


Re: Is anybody working on the globally order by of hive ?

2010-06-11 Thread Ning Zhang
Good idea, Edward. It would definitely be better if it does what it sounds like.

Btw Jeff, ORDER BY is supported in trunk, with certain limitations in strict
mode (the query has to have a LIMIT). I will be able to update the wiki when I come back.

Thanks,
Ning
--
Sent from my blackberry


From: Edward Capriolo edlinuxg...@gmail.com
To: hive-u...@hadoop.apache.org hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org hive-dev@hadoop.apache.org
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally order by of hive ?


Jeff,

I was wondering whether TotalOrderPartitioner in Hadoop 0.20 could play a role
in this. As of now, ORDER BY sets the number of reduce tasks to 1 :)

Edward


RE: Is anybody working on the globally order by of hive ?

2010-06-11 Thread John Sichi
If someone is interested in adding parallel ORDER BY to Hive (using 
TotalOrderPartitioner), here's a good starting point:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

The goal would be to take that manual two-step sample-then-sort process and 
turn it into an automatic plan within Hive.  I have a better example for the 
sampling query which I haven't published yet.
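The manual two-step process can be sketched in memory as below. This is an illustration under stated assumptions, not Hive's or Hadoop's implementation: the sample is drawn uniformly at random, the `num_reducers - 1` split points are taken at evenly spaced positions in the sorted sample, and each key is routed by binary search over those split points. That routing is the range-partitioning idea behind TotalOrderPartitioner; in Hadoop the split points are read from a partition file, typically written by InputSampler. All function names here are hypothetical.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Step 1 (sample): pick num_reducers - 1 cut points at evenly spaced positions in the sorted sample."""
    ordered = sorted(sample)
    if not ordered:
        return []  # no sample, so every key lands in partition 0
    return [ordered[(i * len(ordered)) // num_reducers] for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Index of the key range containing key; keys equal to a cut point go to the upper range."""
    return bisect.bisect_right(split_points, key)


def sample_then_sort(rows, num_reducers, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    split_points = choose_split_points(sample, num_reducers)
    # Step 2 (sort): route each row to its key range, then each "reducer" sorts only its range.
    partitions = [[] for _ in range(num_reducers)]
    for key in rows:
        partitions[range_partition(key, split_points)].append(key)
    return split_points, [sorted(p) for p in partitions]


rows = list(range(1000))
random.Random(1).shuffle(rows)
splits, parts = sample_then_sort(rows, num_reducers=4, sample_size=100)
merged = [key for part in parts for key in part]
print(merged == sorted(rows))  # True
```

Because every key in partition i is smaller than every key in partition i + 1, concatenating the per-reducer outputs in partition order yields a total order. The quality of the sample only affects how evenly the work is spread across reducers, not correctness.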

We would also need to name the final output files in such a way that the total 
order could be iterated via the filenames.

JVS
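For the file-naming requirement, one simple approach (an assumption, not a decided design) is to zero-pad the reducer index in each output filename, as Hadoop's `part-00000` style names already do, so that a plain lexicographic listing visits the key ranges in order. The sketch below uses hypothetical helper names and assumes each reducer writes exactly one file covering one contiguous key range.

```python
import os
import tempfile


def write_partitions(directory, partitions, width=5):
    """Write one file per range partition; zero-padding keeps name order equal to partition order."""
    for index, keys in enumerate(partitions):
        path = os.path.join(directory, f"part-{index:0{width}d}")
        with open(path, "w") as f:
            f.writelines(f"{k}\n" for k in keys)


def read_in_total_order(directory):
    """Visit files in lexicographic filename order and yield their keys."""
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            for line in f:
                yield int(line)


# Twelve individually sorted, range-partitioned reducer outputs (made-up sample data).
partitions = [[10 * i, 10 * i + 5] for i in range(12)]

with tempfile.TemporaryDirectory() as d:
    write_partitions(d, partitions)
    padded_ok = list(read_in_total_order(d)) == sorted(k for p in partitions for k in p)

unpadded = sorted(f"part-{i}" for i in range(12))
print(padded_ok)     # True
print(unpadded[:4])  # ['part-0', 'part-1', 'part-10', 'part-11']
```

Without padding, `part-10` sorts before `part-2`, so iterating by filename would break the total order once there are more than ten reducers.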


From: Ning Zhang [nzh...@facebook.com]
Sent: Friday, June 11, 2010 12:40 PM
To: 'hive-u...@hadoop.apache.org'
Cc: 'hive-dev@hadoop.apache.org'
Subject: Re: Is anybody working on the globally order by of hive ?

Good idea, Edward. It would definitely be better if it does what it sounds like.