Re: Tips on sorting using Hadoop
Hi,

Is there a way to do this with streaming? I've noticed there is a -partitioner option for streaming. Does that mean I have to write a Java partitioner class to perform total order sorting?

Thanks,
Joseph

On Sun, Sep 21, 2008 at 2:12 AM, lohit [EMAIL PROTECTED] wrote:
> Since this is sorting, does it help if you run map/reduce twice? The number of output bytes should be the same as the number of input bytes.
>
> To do total order sorting, you have to make your partition function split the keyspace equally in order among the number of reducers. For an example, look at how TeraSort does this:
> http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java
>
> Thanks,
> Lohit
>
> ----- Original Message -----
> From: Edward J. Yoon [EMAIL PROTECTED]
> To: core-user@hadoop.apache.org
> Sent: Saturday, September 20, 2008 10:53:40 AM
> Subject: Re: Tips on sorting using Hadoop
>
> I would recommend running map/reduce twice.
>
> /Edward
>
> On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> I want to sort my records (consisting of string, int, float) using Hadoop. One way I have found is to set the number of reducers to 1, but this would mean all the records go to one reducer and it won't be optimized. Can anyone point me to a better way to do sorting using Hadoop?
>>
>> Thanks,
>> Tenaali
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org

--
Screenshots, http://flickr.com/photos/bizkit
Blog, http://bz.d22.cc
張至(bizkit)
Re: Tips on sorting using Hadoop
Since this is sorting, does it help if you run map/reduce twice? The number of output bytes should be the same as the number of input bytes.

To do total order sorting, you have to make your partition function split the keyspace equally in order among the number of reducers. For an example, look at how TeraSort does this:
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java

Thanks,
Lohit

----- Original Message -----
From: Edward J. Yoon [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Saturday, September 20, 2008 10:53:40 AM
Subject: Re: Tips on sorting using Hadoop

I would recommend running map/reduce twice.

/Edward

On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram [EMAIL PROTECTED] wrote:
> Hi,
>
> I want to sort my records (consisting of string, int, float) using Hadoop. One way I have found is to set the number of reducers to 1, but this would mean all the records go to one reducer and it won't be optimized. Can anyone point me to a better way to do sorting using Hadoop?
>
> Thanks,
> Tenaali

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
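
To make the partitioning idea above concrete: if every key sent to reducer i sorts before every key sent to reducer i+1, then concatenating the sorted reducer outputs in reducer order yields a totally ordered result. The following is only a minimal sketch of that mapping in plain Java. It does not use Hadoop's partitioner API and is not taken from TeraSort; the class name, the method name, and the split points are hypothetical and chosen for illustration. It assumes the split points are sorted and distinct.

```java
import java.util.Arrays;

/** Illustrative range partitioner (hypothetical; not Hadoop's API). */
final class RangePartitionSketch {

    /**
     * Maps a key to a reducer index using sorted, distinct split points.
     * With R reducers there are R - 1 split points: keys below splits[0]
     * go to reducer 0, and keys in [splits[i - 1], splits[i]) go to reducer i.
     */
    static int partition(String key, String[] sortedSplits) {
        int pos = Arrays.binarySearch(sortedSplits, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        // A key equal to a split point starts the next range, hence pos + 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Assumed split points for 4 reducers; real ones depend on the data.
        String[] splits = {"g", "n", "t"};
        for (String key : new String[] {"apple", "grape", "mango", "zebra"}) {
            System.out.println(key + " -> reducer " + partition(key, splits));
        }
    }
}
```

How evenly the work is spread depends entirely on how well the split points match the actual key distribution; badly chosen split points leave some reducers with most of the data.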
Re: Tips on sorting using Hadoop
On Sat, Sep 20, 2008 at 11:12 AM, lohit [EMAIL PROTECTED] wrote:
> To do total order sorting, you have to make your partition function split the keyspace equally in order among the number of reducers.

A library to do this was checked in yesterday. See HADOOP-3019.

-- Owen
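
The thread does not say how the split points for such a partition function are chosen, and nothing here describes what HADOOP-3019 actually contains. One common approach, shown below purely as an assumption, is to sort a sample of the keys and take evenly spaced elements as boundaries. The class and method names are hypothetical, and the sketch uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative split-point selection from a key sample (hypothetical). */
final class SplitPointSketch {

    /**
     * Returns numReducers - 1 boundaries taken at evenly spaced positions
     * of the sorted sample, so each range holds roughly the same number
     * of sampled keys.
     */
    static List<String> splitPoints(List<String> sample, int numReducers) {
        if (numReducers < 1 || sample.size() < numReducers) {
            throw new IllegalArgumentException(
                "need at least one reducer and one sampled key per reducer");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            // Position of the i-th boundary; always below sorted.size().
            int idx = (int) ((long) i * sorted.size() / numReducers);
            splits.add(sorted.get(idx));
        }
        return splits;
    }
}
```

If the sample contains many repeated keys, neighbouring boundaries can coincide, which leaves some ranges empty; a real implementation would need to handle that case.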
Tips on sorting using Hadoop
Hi,

I want to sort my records (consisting of string, int, float) using Hadoop. One way I have found is to set the number of reducers to 1, but this would mean all the records go to one reducer and it won't be optimized. Can anyone point me to a better way to do sorting using Hadoop?

Thanks,
Tenaali