Yes, but not so badly. As long as the highest-probability item is a small fraction of what any one reducer must do, the compute load imbalance will be very, very small.
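To make that concrete, here is a rough sketch in Python of how one might check the condition on a key distribution. Everything in it is an assumption for illustration: the function names are made up, crc32 stands in for whatever hash the real partitioner uses, and the 1/rank weights are a toy stand-in for unigram counts, not measured data.

```python
import zlib


def reducer_loads(key_weights, num_reducers):
    """Total work per reducer when keys are hash-partitioned.

    key_weights maps a key (str) to the amount of work it carries.
    crc32 is only a stable stand-in for the framework's partitioner hash.
    """
    loads = [0.0] * num_reducers
    for key, weight in key_weights.items():
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += weight
    return loads


def imbalance(loads):
    """Busiest reducer's load divided by the mean load (1.0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


def heaviest_key_share(key_weights, num_reducers):
    """Heaviest single key as a fraction of the mean per-reducer load.

    If this is small, no single key can dominate a reducer.
    """
    mean = sum(key_weights.values()) / num_reducers
    return max(key_weights.values()) / mean


if __name__ == "__main__":
    # Toy skewed distribution: weight of the k-th key is 1/k (hypothetical).
    weights = {"w%d" % k: 1.0 / k for k in range(1, 10001)}
    for n in (4, 64):
        loads = reducer_loads(weights, n)
        print(n, heaviest_key_share(weights, n), imbalance(loads))
```

The heaviest key's share grows with the number of reducers, so the same distribution that balances well on a few reducers can become lopsided on many. That is the point at which splitting the hot key is worth the trouble.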

On Fri, Feb 5, 2010 at 2:09 PM, Mandar Rahurkar <[email protected]> wrote:

> 1. I have an implementation with some optimizations that you
> mentioned. Even when keying on the first two words on a ngram, we
> would still have skewed sharding for unigrams. Isn't it?
>

--
Ted Dunning, CTO
DeepDyve
