Re: Tips on sorting using Hadoop

2008-09-24 Thread bz
Hi,

Is there a way to do this with streaming?

I've noticed there is a -partitioner option for streaming. Does that mean
I have to write a Java partitioner class to do a total order sort?

Thanks,
Joseph



On Sun, Sep 21, 2008 at 2:12 AM, lohit [EMAIL PROTECTED] wrote:





-- 
Screenshots, http://flickr.com/photos/bizkit
Blog, http://bz.d22.cc
張至(bizkit)


Re: Tips on sorting using Hadoop

2008-09-20 Thread lohit
Since this is sorting, does running map/reduce twice actually help? The number
of output bytes should be the same as the number of input bytes.
To do a total order sort, your partition function has to split the keyspace
into contiguous, roughly equal ranges, one per reducer, assigned in key order.
For an example of how this is done, look at TeraSort:
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java
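A minimal in-memory sketch of that idea, written in plain Python rather than against the Hadoop API: each key is sent to the reducer whose range contains it, found by binary search over R-1 sorted split points. The split points below are hypothetical hand-picked values (TeraSort computes its own by sampling the input), and `range_partition` / `simulate_sort` are illustrative names only. Because each reducer sorts its own partition and the ranges are ordered, concatenating the reducer outputs in reducer order gives a globally sorted result, which hash partitioning does not guarantee.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def range_partition(key: str, split_points: Sequence[str]) -> int:
    """Return the reducer index for key, given R-1 sorted split points.

    Reducer i receives keys k with split_points[i-1] <= k < split_points[i],
    so reducer 0 gets the smallest keys and reducer R-1 the largest.
    """
    return bisect_right(split_points, key)


def simulate_sort(keys: Sequence[str],
                  partition: Callable[[str], int],
                  num_reducers: int) -> List[str]:
    """Simulate the shuffle: route each key, sort within each reducer,
    then concatenate the reducer outputs in reducer order."""
    buckets: List[List[str]] = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key)].append(key)
    output: List[str] = []
    for bucket in buckets:
        output.extend(sorted(bucket))  # stands in for the per-reducer sort
    return output


keys = ["pear", "apple", "fig", "kiwi", "banana",
        "cherry", "grape", "lemon", "mango", "date"]
splits = ["date", "kiwi"]  # hypothetical split points for 3 reducers
result = simulate_sort(keys, lambda k: range_partition(k, splits), 3)
print(result)
# ['apple', 'banana', 'cherry', 'date', 'fig', 'grape', 'kiwi', 'lemon', 'mango', 'pear']
```

Binary search keeps the per-record cost at O(log R), which matters when every map output record passes through the partitioner.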

Thanks,
Lohit



----- Original Message -----
From: Edward J. Yoon [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Saturday, September 20, 2008 10:53:40 AM
Subject: Re: Tips on sorting using Hadoop

I would recommend running map/reduce twice.
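Edward doesn't spell out what the two passes would be. One common reading, assumed here, is that a first pass samples the keys to choose split points and a second pass does the actual sort with a range partitioner, similar to how TeraSort samples its input before sorting. A minimal Python sketch of that first pass; `sample_keys` and `choose_split_points` are hypothetical names, not Hadoop APIs.

```python
import random
from typing import List, Sequence


def sample_keys(keys: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Pass 1 (hypothetical): keep each key independently with probability `fraction`."""
    rng = random.Random(seed)
    return [k for k in keys if rng.random() < fraction]


def choose_split_points(sample: Sequence[str], num_reducers: int) -> List[str]:
    """Pick num_reducers - 1 split points at evenly spaced positions of the sorted sample."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    ordered = sorted(sample)
    if not ordered:
        return []
    n = len(ordered)
    # (i * n) // num_reducers is always < n because i < num_reducers.
    return [ordered[(i * n) // num_reducers] for i in range(1, num_reducers)]


keys = [f"{i:04d}" for i in range(1000)]   # toy input keys
sample = sample_keys(keys, fraction=0.1)     # pass 1: a small sample
splits = choose_split_points(sample, num_reducers=4)
print(splits)  # 3 sorted split points taken from the sample
```

If the sample contains many repeated keys, some split points can coincide and leave reducers empty; this sketch does not try to handle that.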

/Edward

On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram [EMAIL PROTECTED] wrote:
 Hi,
 I want to sort my records (consisting of a string, an int, and a float) using Hadoop.

 One way I have found is to set the number of reducers to 1, but that would send
 all the records to a single reducer, so the sort wouldn't run in parallel. Can
 anyone point me to a better way to sort using Hadoop?

 Thanks,
 Tenaali




-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org



Re: Tips on sorting using Hadoop

2008-09-20 Thread Owen O'Malley
On Sat, Sep 20, 2008 at 11:12 AM, lohit [EMAIL PROTECTED] wrote:

 To do a total order sort, your partition function has to split the keyspace
 into contiguous, roughly equal ranges, one per reducer, assigned in key order.


A library to do this was checked in yesterday. See HADOOP-3019.

-- Owen


Tips on sorting using Hadoop

2008-09-12 Thread Tenaali Ram
Hi,
I want to sort my records (consisting of a string, an int, and a float) using Hadoop.

One way I have found is to set the number of reducers to 1, but that would send
all the records to a single reducer, so the sort wouldn't run in parallel. Can
anyone point me to a better way to sort using Hadoop?

Thanks,
Tenaali
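Records like these mix a string, an int and a float. If the keys end up compared as raw bytes (as plain Text keys are by default, for example in streaming), numbers written in decimal sort wrongly ("10" sorts before "9"). A minimal sketch of one workaround, assuming the desired order is by string, then int, then float: encode each field so that byte order matches the intended order. The `\x01` separator, the signed 64-bit int range and the helper names are all assumptions for illustration.

```python
import struct

SEP = "\x01"  # assumed never to occur (nor \x00) inside the string field


def encode_int(i: int) -> str:
    """Map a signed 64-bit int to 16 hex digits whose text order matches numeric order."""
    if not -(2**63) <= i < 2**63:
        raise ValueError("int out of 64-bit range")
    return format(i + 2**63, "016x")  # shift so negatives come first


def encode_float(f: float) -> str:
    """Map a finite double to 16 hex digits whose text order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", f))
    if bits >> 63:   # negative: flip all bits so larger magnitudes sort first
        bits ^= 0xFFFFFFFFFFFFFFFF
    else:            # non-negative: set the sign bit so it sorts after negatives
        bits |= 1 << 63
    return format(bits, "016x")


def encode_record(s: str, i: int, f: float) -> str:
    """Build a key whose byte order matches (s, i, f) tuple order."""
    return SEP.join([s, encode_int(i), encode_float(f)])


records = [("b", 10, 1.5), ("a", 9, 2.0), ("a", 10, -3.25),
           ("a", 10, 0.5), ("ab", -1, 0.0), ("a", -5, 100.0)]
for r in sorted(records, key=lambda r: encode_record(*r).encode("utf-8")):
    print(r)
```

The separator is `\x01` rather than a tab because streaming, by default, treats everything up to the first tab as the key. Fixed-width lowercase hex works because the digits `0-9` sort before `a-f` in ASCII.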