Re: Is anybody working on the globally order by of hive ?
See https://issues.apache.org/jira/browse/HIVE-1402.

On Fri, Jun 11, 2010 at 1:22 PM, John Sichi jsi...@facebook.com wrote:

> If someone is interested in adding parallel ORDER BY to Hive (using TotalOrderPartitioner), here's a good starting point:
>
> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>
> The goal would be to take that manual two-step sample-then-sort process and turn it into an automatic plan within Hive. I have a better example for the sampling query which I haven't published yet. We would also need to name the final output files in such a way that the total order could be iterated via the filenames.
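The two-step sample-then-sort process mentioned above can be sketched in a few lines of Python. This is only a toy model of the idea, not Hive's or Hadoop's implementation: the function names, the sample size, and the `part-NNNNN` file names are all assumptions made for illustration. It samples the keys, picks split points from the sample, routes each row to the partition whose key range contains it, sorts every partition independently, and names the outputs so that reading them in filename order yields the total order.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Pick num_reducers - 1 boundary keys from the sorted sample."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def range_partition(rows, split_points):
    """Route each row to the partition whose key range contains it."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for row in rows:
        parts[bisect.bisect_right(split_points, row)].append(row)
    return parts


def total_order_sort(rows, num_reducers, sample_size, seed=0):
    # Step 1: sample the input to estimate the key distribution.
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    split_points = choose_split_points(sample, num_reducers)
    # Step 2: range-partition, then sort each partition on its own.
    parts = range_partition(rows, split_points)
    # Zero-padded names (hypothetical) keep filename order equal to key order.
    return {f"part-{i:05d}": sorted(part) for i, part in enumerate(parts)}


rng = random.Random(42)
rows = [rng.randrange(1000) for _ in range(200)]
files = total_order_sort(rows, num_reducers=4, sample_size=40)

# Reading the files in filename order gives the globally sorted result.
merged = [row for name in sorted(files) for row in files[name]]
print(merged == sorted(rows))
```

Because every key in one partition is smaller than or equal to every key in the next, no merge step is needed at the end; the quality of the sample only affects how evenly the work is spread across partitions.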
Re: Is anybody working on the globally order by of hive ?
Great, I can work on this issue.

On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher ham...@cloudera.com wrote:

> See https://issues.apache.org/jira/browse/HIVE-1402.

--
Best Regards
Jeff Zhang
Is anybody working on the globally order by of hive ?
Hi all,

According to the Hive wiki, Hive does not support a global ORDER BY; SORT BY only sorts within each reducer. Our team thinks a global ORDER BY is an important feature for users, so I am wondering whether anybody is working on it. I would be very interested in getting involved.

--
Best Regards
Jeff Zhang
Re: Is anybody working on the globally order by of hive ?
On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang zjf...@gmail.com wrote:

> Hi all,
>
> According to the Hive wiki, Hive does not support a global ORDER BY; SORT BY only sorts within each reducer. Our team thinks a global ORDER BY is an important feature for users, so I am wondering whether anybody is working on it. I would be very interested in getting involved.
>
> --
> Best Regards
> Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in this. As of now, ORDER BY sets the number of reduce tasks to 1 :)

Edward
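To make the difference concrete, here is a small Python toy model of the behaviour described in this exchange. It is an illustration only, not Hive code: the `sort_by` helper is hypothetical, and the modulo routing merely stands in for whatever partitioner the job uses. With several reducers, each output is sorted but the outputs read back to back are not; with a single reducer, which is what ORDER BY falls back to, the result is globally sorted at the cost of all parallelism.

```python
def sort_by(rows, num_reducers):
    """Return one sorted list per simulated reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        # Modulo stands in for the job's partitioner (an assumption).
        buckets[row % num_reducers].append(row)
    return [sorted(bucket) for bucket in buckets]


rows = [7, 2, 9, 4, 1, 8, 3, 6]

# Two reducers: each output is sorted, the concatenation is not.
outputs = sort_by(rows, num_reducers=2)
concatenated = [row for part in outputs for row in part]
print(outputs)
print(concatenated == sorted(rows))

# One reducer: a single globally sorted output, but no parallelism.
single = sort_by(rows, num_reducers=1)
print(single[0] == sorted(rows))
```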
Re: Is anybody working on the globally order by of hive ?
Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, ORDER BY is supported in trunk, with certain limitations in strict mode (the query has to have a LIMIT). I will be able to update the wiki when I come back.

Thanks,
Ning

--
Sent from my blackberry

From: Edward Capriolo edlinuxg...@gmail.com
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally order by of hive ?

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in this. As of now, ORDER BY sets the number of reduce tasks to 1 :)

Edward
RE: Is anybody working on the globally order by of hive ?
If someone is interested in adding parallel ORDER BY to Hive (using TotalOrderPartitioner), here's a good starting point:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

The goal would be to take that manual two-step sample-then-sort process and turn it into an automatic plan within Hive. I have a better example for the sampling query which I haven't published yet. We would also need to name the final output files in such a way that the total order could be iterated via the filenames.

JVS

From: Ning Zhang [nzh...@facebook.com]
Sent: Friday, June 11, 2010 12:40 PM
To: 'hive-u...@hadoop.apache.org'
Cc: 'hive-dev@hadoop.apache.org'
Subject: Re: Is anybody working on the globally order by of hive ?

Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, ORDER BY is supported in trunk, with certain limitations in strict mode (the query has to have a LIMIT). I will be able to update the wiki when I come back.

Thanks,
Ning