If someone is interested in adding parallel ORDER BY to Hive (using 
TotalOrderPartitioner), here's a good starting point:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

The goal would be to take that manual two-step sample-then-sort process and 
turn it into an automatic plan within Hive.  I have a better example for the 
sampling query which I haven't published yet.
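
For anyone unfamiliar with the sample-then-sort idea, here is a minimal sketch
in plain Python rather than Hive or Hadoop code. It samples the keys, derives
split points, routes each key to a range partition, and sorts each partition
independently. The function names, the sample size, and the fixed seed are all
hypothetical choices for illustration; none of them come from the wiki page or
from TotalOrderPartitioner itself.

```python
import bisect
import random


def pick_split_points(sample, num_reducers):
    """Choose up to num_reducers - 1 cut points from a sample of the keys."""
    ordered = sorted(sample)
    if not ordered:
        return []
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def partition_for(key, split_points):
    """Return the index of the range partition that should receive this key."""
    return bisect.bisect_right(split_points, key)


def parallel_order_by(keys, num_reducers, sample_size=100, seed=0):
    # Step 1 (sample): estimate the key distribution from a random sample.
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    split_points = pick_split_points(sample, num_reducers)

    # Step 2 (sort): route keys to range partitions, then sort each partition.
    # In a real job each partition would be sorted by a separate reducer.
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition_for(key, split_points)].append(key)
    return [sorted(bucket) for bucket in buckets]
```

Because every key in partition i is less than or equal to every key in
partition i + 1, concatenating the partitions in index order gives the total
order without a single-reducer bottleneck.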

We would also need to name the final output files in such a way that the total 
order could be iterated via the filenames.
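
As a rough illustration of the naming idea, the sketch below (plain Python,
with hypothetical helper names) zero-pads the partition index so that sorting
the file names lexicographically visits the partitions in key-range order. The
part-NNNNN pattern is only modeled on Hadoop's usual output naming; what Hive
would actually emit is an assumption here.

```python
def output_file_name(partition_index, width=5):
    """Zero-pad the index so that name order matches partition order."""
    return f"part-{partition_index:0{width}d}"


def iterate_in_total_order(files):
    """Yield rows from a dict of {file name: rows sorted within that file}."""
    for name in sorted(files):
        yield from files[name]
```

Without the padding, a name such as part-10 would sort before part-2 and the
iteration would no longer follow the total order.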

JVS

________________________________________
From: Ning Zhang [nzh...@facebook.com]
Sent: Friday, June 11, 2010 12:40 PM
To: 'hive-u...@hadoop.apache.org'
Cc: 'hive-dev@hadoop.apache.org'
Subject: Re: Is anybody working on the globally "order by" of hive ?

Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, order by is supported in trunk, with certain limitations in strict 
mode (the query has to have a limit). I will be able to update the wiki when I 
come back.

Thanks,
Ning
------
Sent from my blackberry

________________________________
From: Edward Capriolo <edlinuxg...@gmail.com>
To: hive-u...@hadoop.apache.org <hive-u...@hadoop.apache.org>
Cc: hive-dev@hadoop.apache.org <hive-dev@hadoop.apache.org>
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?


On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang <zjf...@gmail.com> wrote:
Hi all,

According to the Hive wiki, Hive does not have a global "order by"
feature; Hive's "sort by" only sorts within each reducer. Our team thinks
a global "order by" is an important feature for users, so I am wondering
whether anybody is working on it. I am very interested in getting involved.


--
Best Regards

Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in 
this. As of now, order by sets the number of reduce tasks to 1 :)

Edward
