The complexity of DIMSUM is independent of the number of rows but
still has a quadratic dependency on the number of columns. 1.5M columns
may be too large for DIMSUM. Try increasing the threshold and see
whether it helps. -Xiangrui
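
For reference, the threshold is the argument to RowMatrix.columnSimilarities
in MLlib. A minimal sketch (assumes a live SparkContext and an existing
RDD[Vector] called `rows`; the 0.5 threshold is illustrative, not a
recommendation):

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Build the distributed matrix from an RDD of (sparse) row vectors.
val mat = new RowMatrix(rows)

// Brute force: exact all-pairs cosine similarities.
// Shuffle cost grows quadratically with the number of columns.
val exact = mat.columnSimilarities()

// DIMSUM sampling: approximate similarities, only pairs whose
// similarity is likely above the threshold are kept. A higher
// threshold samples more aggressively and shuffles less data.
val approx = mat.columnSimilarities(threshold = 0.5)
```

Raising the threshold trades recall of low-similarity pairs for a smaller
shuffle, which is often the right trade at this column count.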

On Tue, Feb 17, 2015 at 6:28 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi,
>
> I am running brute force similarity from RowMatrix on a job with a 5M x 1.5M
> sparse matrix with 800M entries. With 200M entries the job runs fine, but with
> 800M I am getting exceptions like "too many open files" and "no space left on
> device"...
>
> Seems like I need more nodes, or should I use DIMSUM sampling?
>
> I am running on 10 nodes where the ulimit on each node is set to 65K... Memory
> is not an issue, since I can cache the dataset before the similarity
> computation starts.
>
> I tested the same job on YARN with Spark 1.1 and Spark 1.2 stable. Both
> jobs failed with FetchFailed messages.
>
> Thanks.
> Deb
