Is the threshold valid only for tall skinny matrices? Mine is 6M x 1.5M, and I made the sparsity pattern 100:1.5M. We would like to increase the sparsity pattern to 1000:1.5M.
I am running 1.1 stable and I get random shuffle failures... maybe the 1.2 sort shuffle will help. I read in Reza's paper that oversampling works only if the columns are skinny, so I am not very keen to oversample.

On Feb 17, 2015 2:01 PM, "Xiangrui Meng" <men...@gmail.com> wrote:
> The complexity of DIMSUM is independent of the number of rows but
> still has a quadratic dependency on the number of columns. 1.5M columns
> may be too large to use DIMSUM. Try to increase the threshold and see
> whether it helps. -Xiangrui
>
> On Tue, Feb 17, 2015 at 6:28 AM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Hi,
> >
> > I am running brute force similarity from RowMatrix on a job with a
> > 5M x 1.5M sparse matrix with 800M entries. With 200M entries the job
> > runs fine, but with 800M I get exceptions like "too many open files"
> > and "no space left on device"...
> >
> > It seems like I need more nodes, or should I use DIMSUM sampling?
> >
> > I am running on 10 nodes where the ulimit on each node is set at
> > 65K... Memory is not an issue, since I can cache the dataset before
> > the similarity computation starts.
> >
> > I tested the same job on YARN with Spark 1.1 and Spark 1.2 stable.
> > Both jobs failed with FetchFailed messages.
> >
> > Thanks.
> > Deb
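For reference, a minimal sketch of the two RowMatrix entry points discussed in this thread: the brute-force columnSimilarities() and the DIMSUM-sampled columnSimilarities(threshold). The toy rows and the 0.5 threshold are placeholders for illustration only, not a recommendation for the 5M x 1.5M job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object SimilaritySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dimsum-sketch"))

    // Toy sparse rows standing in for the large sparse matrix in the thread.
    val rows = sc.parallelize(Seq(
      Vectors.sparse(5, Seq((0, 1.0), (3, 2.0))),
      Vectors.sparse(5, Seq((1, 1.0), (3, 1.0))),
      Vectors.sparse(5, Seq((0, 3.0), (4, 1.0)))
    ))
    val mat = new RowMatrix(rows)

    // Brute force: computes every column pair exactly; the shuffle grows
    // quadratically with the number of columns.
    val exact = mat.columnSimilarities()

    // DIMSUM sampling: pairs with cosine similarity above the threshold are
    // estimated reliably; a higher threshold samples more aggressively and
    // shuffles less data. The 0.5 here is illustrative only.
    val approx = mat.columnSimilarities(threshold = 0.5)

    println(s"exact pairs: ${exact.entries.count()}, sampled pairs: ${approx.entries.count()}")
    sc.stop()
  }
}

Both calls return a CoordinateMatrix of upper-triangular similarity entries, so the outputs can be compared directly.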