I have a large number of key/value pairs, and I don't actually care whether the data goes in the key or the value. To be more exact: the (k,v) pair count after the combiner is about 1 million, with approximately 1 KB of data per pair. I can put that data in the keys or in the values, and I have experimented with both options: (heavy key, light value) vs. (light key, heavy value). It turns out the (hk,lv) option is much, much faster than (lk,hv). Has anyone else noticed this? Is there a way to make things faster in the (light key, heavy value) case, since some applications will need that layout too? Remember that in both cases we are talking about at least a dozen million pairs. The time difference shows up in the shuffle phase, which is odd, because the amount of data transferred is the same.
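To make the setup concrete, here is a toy sketch (not my actual job; the names and sizes are made up to match the description above) that models the two layouts for ~1 KB records and checks that the total bytes shuffled are the same either way, even though the key the sort comparator sees is very different in size:

```python
# Hypothetical model of the two layouts: each record is ~1 KB, split
# between key and value either as (heavy key, light value) or
# (light key, heavy value). Plain bytes stand in for Hadoop Writables.

def make_pairs(n, record_size=1024, heavy_key=True, small_part=16):
    """Split each record's bytes between key and value."""
    pairs = []
    for _ in range(n):
        record = bytes(record_size)  # stand-in for 1 KB of real data
        if heavy_key:
            k, v = record[:record_size - small_part], record[record_size - small_part:]
        else:
            k, v = record[:small_part], record[small_part:]
        pairs.append((k, v))
    return pairs

def shuffled_bytes(pairs):
    """Total payload that crosses the network in the shuffle."""
    return sum(len(k) + len(v) for k, v in pairs)

hk_lv = make_pairs(1000, heavy_key=True)   # heavy key, light value
lk_hv = make_pairs(1000, heavy_key=False)  # light key, heavy value

# Same number of bytes moves in both layouts...
assert shuffled_bytes(hk_lv) == shuffled_bytes(lk_hv)

# ...but the shuffle's sort/merge compares only keys, so the comparator
# touches 1008 bytes per key in one layout and 16 in the other.
print(len(hk_lv[0][0]), len(lk_hv[0][0]))  # prints: 1008 16
```

So the bytes on the wire match in both cases; the only structural difference the framework sees is how much of each record sits in the key that gets sorted and compared.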
-gyanit

--
View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22447877.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.