Hi, I am running brute-force similarity from RowMatrix on a 5M x 1.5M sparse matrix with 800M entries. With 200M entries the job runs fine, but with 800M I am getting exceptions like "too many open files" and "no space left on device"...
It seems like I need more nodes, or should I use DIMSUM sampling? I am running on 10 nodes, with ulimit set to 65K on each node. Memory is not an issue, since I can cache the dataset before the similarity computation starts. I tested the same job on YARN with both Spark 1.1 and Spark 1.2 stable; both jobs failed with FetchFailed messages. Thanks. Deb
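For reference, a minimal sketch of the two variants being compared: in Spark 1.2, RowMatrix.columnSimilarities() with no argument is the brute-force all-pairs computation, while the overload taking a threshold enables DIMSUM sampling, which drops low-similarity pairs probabilistically and shrinks the shuffle. The matrix name and the threshold value below are illustrative assumptions, not from the job above.

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// mat: a RowMatrix backed by the cached sparse dataset (construction assumed elsewhere)
val mat: RowMatrix = ???

// Brute force: compares every column pair, shuffling O(n^2) intermediate data,
// which is what drives up the open shuffle files and local-disk usage.
val exact = mat.columnSimilarities()

// DIMSUM sampling: column pairs with cosine similarity below the threshold are
// sampled away, sharply reducing shuffle size at the cost of approximate results.
// 0.1 here is just an example threshold.
val approx = mat.columnSimilarities(threshold = 0.1)
```

A higher threshold trades more accuracy loss for a smaller shuffle, so it may be worth sweeping the threshold on the 200M-entry dataset (where the exact answer is computable) before trusting it at 800M.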