Is the threshold valid only for tall skinny matrices? Mine is 6M x 1.5M, and I made the sparsity pattern 100:1.5M. We would like to increase the sparsity pattern to 1000:1.5M.
I am running 1.1 stable and I get random shuffle failures... maybe the 1.2 sort shuffle will help. I read in Reza's paper that oversampling works only if the columns are skinny, so I am not very keen to oversample.

On Feb 17, 2015 2:01 PM, "Xiangrui Meng" <men...@gmail.com> wrote:
> The complexity of DIMSUM is independent of the number of rows but
> still has a quadratic dependency on the number of columns. 1.5M columns
> may be too large to use DIMSUM. Try to increase the threshold and see
> whether it helps. -Xiangrui
>
> On Tue, Feb 17, 2015 at 6:28 AM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Hi,
> >
> > I am running brute force similarity from RowMatrix on a job with a
> > 5M x 1.5M sparse matrix with 800M entries. With 200M entries the job
> > runs fine, but with 800M I get exceptions like "too many open files"
> > and "no space left on device"...
> >
> > It seems like I need more nodes, or should I use DIMSUM sampling?
> >
> > I am running on 10 nodes where the ulimit on each node is set at
> > 65K... Memory is not an issue, since I can cache the dataset before
> > the similarity computation starts.
> >
> > I tested the same job on YARN with Spark 1.1 and Spark 1.2 stable.
> > Both jobs failed with FetchFailed messages.
> >
> > Thanks.
> > Deb
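For reference, a minimal sketch of the two RowMatrix entry points discussed in this thread: the brute-force columnSimilarities() and the DIMSUM-sampled columnSimilarities(threshold). The toy rows and the 0.5 threshold are placeholders for illustration only, not a recommendation for the 5M x 1.5M job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object SimilaritySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dimsum-sketch"))

    // Toy sparse rows standing in for the large sparse matrix in the thread.
    val rows = sc.parallelize(Seq(
      Vectors.sparse(5, Seq((0, 1.0), (3, 2.0))),
      Vectors.sparse(5, Seq((1, 1.0), (3, 1.0))),
      Vectors.sparse(5, Seq((0, 3.0), (4, 1.0)))
    ))
    val mat = new RowMatrix(rows)

    // Brute force: computes every column pair exactly; the shuffle grows
    // quadratically with the number of columns.
    val exact = mat.columnSimilarities()

    // DIMSUM sampling: pairs with cosine similarity above the threshold are
    // estimated reliably; a higher threshold samples more aggressively and
    // shuffles less data. The 0.5 here is illustrative only.
    val approx = mat.columnSimilarities(threshold = 0.5)

    println(s"exact pairs: ${exact.entries.count()}, sampled pairs: ${approx.entries.count()}")
    sc.stop()
  }
}

Both calls return a CoordinateMatrix of upper-triangular similarity entries, so the outputs can be compared directly.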