Re: Streaming KMeans clustering

Sotiris Salloumis Fri, 27 Dec 2013 00:00:26 -0800

Hi Suneel,

Could you upload the debug or log messages from the OOM exceptions you
have seen, so that we can take a look at them?


Regards
Sotiris


On Thu, Dec 26, 2013 at 8:19 PM, Suneel Marthi <[email protected]> wrote:

> I would push the code freeze until this is resolved (which is the reason I
> had been holding off). This is something that should have been raised for
> the 0.8 release, and I don't think we should defer it to the next one.
>
> I have heard from people outside of dev@ and user@ who tried running
> Streaming KMeans (from 0.8) on their production clusters on large datasets
> and saw the job crash in the Reduce phase due to OOM errors (this is with
> -Xmx2G).
>
> On Thursday, December 26, 2013 12:53 PM, Isabel Drost-Fromm <
> [email protected]> wrote:
>
> On Thu, Dec 26, 2013 at 12:28:18AM -0800, Suneel Marthi wrote:
>
> > It's when you increase the no. of documents and the size of each
> > document (add more dimensions) that you start seeing performance issues,
> > which are:
> > a) The Mappers take long to complete, and it's either the
> > searcher.remove() or searcher.searchFirst() calls (will check again in
> > my next attempt) that seem to be the bottleneck.
> > b) Once the Mappers complete (after several hours), the Reducer dies
> > with an OOM exception (despite having set -Xmx2G).
>
> Given that there seem to be a couple of people experiencing issues, I think
> it makes sense to create a JIRA issue here to track progress - either code
> improvements or better documentation on how to run this implementation.
>
> @Suneel: Does it make sense to push the code freeze until after this is
> fixed, or should this be communicated as a known defect in the release notes?
>
>
> Isabel
