how to process a file in spark standalone cluster without distributed storage (i.e. HDFS/EC2)?

2015-02-06 Thread Henry Hung

Hi All, sc.textFile will not work because the file is not distributed to the other workers, so I tried to read the file first using FileUtils.readLines and then use sc.parallelize, but readLines failed with an OOM error (the file is large). Is there a way to split local files and upload those partitions to

RE: how to process a file in spark standalone cluster without distributed storage (i.e. HDFS/EC2)?

2015-02-06 Thread Henry Hung

Hi All, I have already found a solution to this problem. Please ignore my question. Thanks. Best regards, Henry From: MA33 YTHung1 Sent: Friday, February 6, 2015 4:34 PM To: user@spark.apache.org Subject: how to process a file in spark standalone cluster without distributed storage (i.e.
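
The thread does not record which solution was found. For the original question (splitting a large local file without reading it all into memory), one possible approach is sketched below. It is only an illustration under assumptions: the helper name `split_file`, the part-file naming, and the `lines_per_part` parameter are hypothetical and do not come from the thread, and the sketch uses plain Python rather than any Spark API.

```python
import itertools
from pathlib import Path


def split_file(src, out_dir, lines_per_part=100_000):
    """Stream `src` line by line and write it out as numbered part files.

    Hypothetical helper for illustration. Only one chunk of at most
    `lines_per_part` lines is held in memory at a time, which avoids the
    out-of-memory failure of reading the whole file at once.
    Returns the list of part-file paths in order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8", newline="") as f:
        for index in itertools.count():
            # islice pulls the next chunk lazily from the file iterator.
            chunk = list(itertools.islice(f, lines_per_part))
            if not chunk:
                break
            part = out / f"part-{index:05d}.txt"
            with open(part, "w", encoding="utf-8", newline="") as w:
                w.writelines(chunk)
            parts.append(part)
    return parts
```

The resulting part files could then be copied to the same path on every worker so that sc.textFile can read them, or each chunk could be passed to sc.parallelize on the driver and the resulting RDDs combined. Which of these, if either, matches the poster's own fix is not known.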