Hi, I am running brute-force similarity from RowMatrix on a 5M x 1.5M sparse matrix with 800M entries. With 200M entries the job runs fine, but with 800M I am getting exceptions like "too many open files" and "no space left on device"...
It seems like I need more nodes, or should I use DIMSUM sampling? I am running on 10 nodes, with ulimit set to 65K on each node. Memory is not an issue, since I can cache the dataset before the similarity computation starts. I tested the same job on YARN with both Spark 1.1 and Spark 1.2 stable; both jobs failed with FetchFailed messages. Thanks. Deb
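For reference, a minimal sketch of the two variants being compared: in Spark 1.2, RowMatrix.columnSimilarities() with no argument is the brute-force all-pairs computation, while the overload taking a threshold enables DIMSUM sampling, which drops low-similarity pairs probabilistically and shrinks the shuffle. The matrix name and the threshold value below are illustrative assumptions, not from the job above.

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// mat: a RowMatrix backed by the cached sparse dataset (construction assumed elsewhere)
val mat: RowMatrix = ???

// Brute force: compares every column pair, shuffling O(n^2) intermediate data,
// which is what drives up the open shuffle files and local-disk usage.
val exact = mat.columnSimilarities()

// DIMSUM sampling: column pairs with cosine similarity below the threshold are
// sampled away, sharply reducing shuffle size at the cost of approximate results.
// 0.1 here is just an example threshold.
val approx = mat.columnSimilarities(threshold = 0.1)
```

A higher threshold trades more accuracy loss for a smaller shuffle, so it may be worth sweeping the threshold on the 200M-entry dataset (where the exact answer is computable) before trusting it at 800M.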