Hi,

I have a small question about custom partitioners.


I couldn't really find a method in the Java API which lets me partition the 
dataset in a total order.

Is that something that I just overlooked, or is that something not really 
supported?


The use case is writing out data so that it can be consumed by another tool 
and also inspected quickly by a human.

I can easily sort within each hash partition, but then I would lose the total 
order, i.e. that all keys in partition one come before (or after) all keys in 
partition two, and so on.


Pig supports this by mapping the ORDER BY statement to two jobs. The first one 
samples the data to build a sketch that approximates the key distribution, and 
then feeds the result into the total order partitioner.
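
To make clear what I mean, here is a rough sketch of that sampling idea in 
plain Java. It only illustrates the technique; it is not Pig's actual 
implementation and not part of any framework API. The names 
RangePartitionSketch, splitPoints and partitionFor are made up for this 
example, and it assumes the sample fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of any framework.
class RangePartitionSketch {

    // Step 1: derive numPartitions - 1 split points from a sample of the keys,
    // taken at evenly spaced positions of the sorted sample.
    static <K extends Comparable<K>> List<K> splitPoints(List<K> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        List<K> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<K> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    // Step 2: route a key to a partition. The partition index is the number of
    // split points that are <= key, found by binary search. Every key in
    // partition p is therefore smaller than every key in partition p + 1.
    static <K extends Comparable<K>> int partitionFor(K key, List<K> splits) {
        int lo = 0;
        int hi = splits.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (splits.get(mid).compareTo(key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(42, 7, 19, 88, 3, 61, 25, 70, 14);
        List<Integer> splits = splitPoints(sample, 3);
        for (int key : new int[] {5, 19, 60, 100}) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

Sorting each partition locally afterwards and writing the partitions out in 
index order would then give the total order I am after.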


Thanks


Johannes
