Is the threshold valid only for tall skinny matrices ? Mine is 6 m x 1.5 m
and I made sparsity pattern 100:1.5M..we would like to increase the
sparsity pattern to 1000:1.5M
I am running 1.1 stable and I get random shuffle failures...may be 1.2 sort
shuffle will help..
I read in Reza paper that
Hi Reza,
With 40 nodes and shuffle space managed by YARN over HDFS usercache we
could run the similarity job without doing any thresholding...We used hash
based shuffle and sort hopefully will further improve it...Note that this
job was almost 6M x 1.5M
We will go towards 50 M x ~ 3M columns and
The complexity of DIMSUM is independent of the number of rows but
still have quadratic dependency on the number of columns. 1.5M columns
may be too large to use DIMSUM. Try to increase the threshold and see
whether it helps. -Xiangrui
On Tue, Feb 17, 2015 at 6:28 AM, Debasish Das