Re: Sorting data numerically
Simplest possible solution: zero-pad your keys to ten places?

- Aaron

On Sat, Mar 21, 2009 at 11:40 PM, Akira Kitada akit...@gmail.com wrote:
> Hi,
>
> By default, Hadoop sorts the mapper's output in ASCII order, not
> numeric order. However, I often want the framework to sort records
> in numeric order. Can I make the framework do a numeric sort?
> (I use Hadoop Streaming.)
>
> Thanks,
> Akira
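A minimal sketch of the zero-padding idea for a Hadoop Streaming mapper written in Python. It assumes tab-separated `key<TAB>value` records and non-negative integer keys; padding does not preserve order for negative numbers. The names `pad_key` and `map_lines` are hypothetical helpers, not part of Hadoop.

```python
def pad_key(key, width=10):
    """Left-pad a non-negative integer key with zeros to a fixed width."""
    value = int(key)
    if value < 0:
        # "-5" and "-10" would still compare as text, so reject them here.
        raise ValueError("zero-padding only preserves order for non-negative keys")
    padded = str(value).zfill(width)
    if len(padded) > width:
        raise ValueError("key %d does not fit in %d digits" % (value, width))
    return padded


def map_lines(lines, width=10):
    """Yield 'padded_key<TAB>value' records from 'key<TAB>value' lines."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield "%s\t%s" % (pad_key(key, width), value)


# Small demonstration: the same text sort, before and after padding.
sample = ["10\ta", "9\tb", "100\tc"]
print(sorted(sample))             # text order of the raw records
print(sorted(map_lines(sample)))  # text order of the padded records
```

In an actual streaming mapper the records would come from standard input, for example `for record in map_lines(sys.stdin): print(record)`. The reducer can strip the padding again with `int(key)` if the original form is needed.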
Re: Sorting data numerically
If Akira was to write his/her own Mappers, using types like IntWritable would result in it being numerically sorted, right?

Cheers,
Tim

On Mon, Mar 23, 2009 at 5:04 PM, Aaron Kimball aa...@cloudera.com wrote:
> Simplest possible solution: zero-pad your keys to ten places?
>
> - Aaron
>
> On Sat, Mar 21, 2009 at 11:40 PM, Akira Kitada akit...@gmail.com wrote:
>> Hi,
>>
>> By default, Hadoop sorts the mapper's output in ASCII order, not
>> numeric order. However, I often want the framework to sort records
>> in numeric order. Can I make the framework do a numeric sort?
>> (I use Hadoop Streaming.)
>>
>> Thanks,
>> Akira
Re: Sorting data numerically
You can always write your own key classes that implement the WritableComparable interface and sort your keys any way you want. In fact, the Hadoop MapReduce code already provides some frequently used key classes, such as BytesWritable, IntWritable, and LongWritable. Studying that code will teach you more.

On Tue, Mar 24, 2009 at 12:15 AM, tim robertson timrobertson...@gmail.com wrote:
> If Akira was to write his/her own Mappers, using types like IntWritable
> would result in it being numerically sorted, right?
>
> Cheers,
> Tim
>
> On Mon, Mar 23, 2009 at 5:04 PM, Aaron Kimball aa...@cloudera.com wrote:
>> Simplest possible solution: zero-pad your keys to ten places?
>>
>> - Aaron
>>
>> On Sat, Mar 21, 2009 at 11:40 PM, Akira Kitada akit...@gmail.com wrote:
>>> Hi,
>>>
>>> By default, Hadoop sorts the mapper's output in ASCII order, not
>>> numeric order. However, I often want the framework to sort records
>>> in numeric order. Can I make the framework do a numeric sort?
>>> (I use Hadoop Streaming.)
>>>
>>> Thanks,
>>> Akira
Re: Sorting data numerically
On Mar 23, 2009, at 9:15 AM, tim robertson wrote:
> If Akira was to write his/her own Mappers, using types like IntWritable
> would result in it being numerically sorted, right?

Yes. Or they can use the KeyFieldBasedComparator. I think if you put the following in your job conf, you'll get the right behavior.

mapred.output.key.comparator.class = org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
mapred.text.key.comparator.options = -n

-- Owen
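One way the two properties above might be passed to a streaming job is as `-D` generic options on the command line, placed before the streaming-specific options. The sketch below only assembles such a command as an argument list; the helper name `streaming_command`, the jar path, the input/output paths, and the mapper/reducer scripts are all hypothetical placeholders. Whether `-D` is accepted depends on the Hadoop release in use (older releases used `-jobconf` instead), so treat this as an assumption to check against your version.

```python
COMPARATOR_CLASS = "org.apache.hadoop.mapred.lib.KeyFieldBasedComparator"


def streaming_command(streaming_jar, input_path, output_path, mapper, reducer):
    """Build an argument list for a streaming job with numeric key sorting."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options first: the two properties from the message above.
        "-D", "mapred.output.key.comparator.class=" + COMPARATOR_CLASS,
        "-D", "mapred.text.key.comparator.options=-n",
        # Streaming options follow the generic options.
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


# Hypothetical paths and scripts, for illustration only.
cmd = streaming_command(
    "hadoop-streaming.jar", "in_dir", "out_dir", "mapper.py", "reducer.py"
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.call(cmd)` or copied into a shell script.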
Sorting data numerically
Hi,

By default, Hadoop sorts the mapper's output in ASCII order, not numeric order. However, I often want the framework to sort records in numeric order. Can I make the framework do a numeric sort? (I use Hadoop Streaming.)

Thanks,
Akira