Hi Suneel,

Is it possible to upload the debug or log messages from the OOM exceptions you have seen, so we can take a look at them?
Regards,
Sotiris

On Thu, Dec 26, 2013 at 8:19 PM, Suneel Marthi <[email protected]> wrote:

> I would push the code freeze until this is resolved (which is the reason I had
> been holding off). This is something that should have been raised for the 0.8
> release, and I don't think we should defer it to the next one.
>
> I have heard from people outside of dev@ and user@ who have tried running
> Streaming KMeans (from 0.8) on their production clusters on large datasets and
> have seen the job crash in the Reduce phase due to OOM errors (this is with
> -Xmx2G).
>
> On Thursday, December 26, 2013 12:53 PM, Isabel Drost-Fromm <
> [email protected]> wrote:
>
> On Thu, Dec 26, 2013 at 12:28:18AM -0800, Suneel Marthi wrote:
>
> > It's when you increase the number of documents and the size of each
> > document (add more dimensions) that you start seeing performance issues,
> > which are:
> > a) The Mappers take a long time to complete, and either the
> > searcher.remove() or the searcher.searchFirst() call (will check again in
> > my next attempt) seems to be the bottleneck.
> > b) Once the Mappers complete (after several hours), the Reducer dies with
> > an OOM exception (despite having set -Xmx2G).
>
> Given that there seem to be a couple of people experiencing issues, I think
> it makes sense to create a JIRA issue here to track progress: either code
> improvements or better documentation on how to run this implementation.
>
> @Suneel: Does it make sense to push the code freeze until after fixing this,
> or should this be communicated as a known defect in the release notes?
>
> Isabel
