On Thu, Dec 26, 2013 at 10:19 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> I heard people outside of dev@ and user@ who have tried running Streaming
> KMeans (from 0.8) on their Production clusters on large datasets and had
> seen the job crash in the Reduce phase due to OOM errors (this is with
> -Xmx2GB).

Excessive memory usage in reduce was a known bug that was addressed (supposedly) by using a combiner. This really smells like bug resurrection happened somehow. Clearly that also means that our unit tests are insufficient.
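
For readers unfamiliar with why a combiner helps here, below is a minimal sketch in plain Python, not Mahout's actual code. It assumes each mapper emits (key, (weight, vector)) records with positive weights, and that merging records for a key means taking their weighted mean. The names combine and run are hypothetical, chosen only for this illustration. The sketch shows that combining on the map side shrinks the number of records the reducer has to hold while leaving the final result unchanged.

```python
def combine(records):
    """Merge (key, (weight, vector)) records sharing a key into one weighted mean.

    Assumes every weight is positive, so the division below is safe.
    """
    sums = {}
    for key, (weight, vector) in records:
        if key not in sums:
            sums[key] = [0.0, [0.0] * len(vector)]
        acc = sums[key]
        acc[0] += weight
        for i, x in enumerate(vector):
            acc[1][i] += weight * x
    return [(key, (w, [s / w for s in vec])) for key, (w, vec) in sums.items()]


def run(mapper_outputs, use_combiner):
    """Simulate the shuffle: return (records reaching the reducer, reducer output)."""
    shuffled = []
    for out in mapper_outputs:
        # With a combiner, each mapper's output is pre-merged before the shuffle.
        shuffled.extend(combine(out) if use_combiner else out)
    # The reducer applies the same merge to whatever arrives.
    return len(shuffled), combine(shuffled)


if __name__ == "__main__":
    # Two hypothetical mappers, all records under one key (a single reduce group).
    outputs = [
        [("c0", (1.0, [0.0, 0.0])), ("c0", (1.0, [2.0, 2.0]))],
        [("c0", (2.0, [4.0, 4.0]))],
    ]
    print(run(outputs, use_combiner=False))
    print(run(outputs, use_combiner=True))
```

A unit test guarding against this kind of regression could compare the record count reaching the reducer with and without the combiner, in addition to checking that the final output is the same.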