where storagelevel DISK_ONLY persists RDD to

2015-01-25 Thread Larry Liu
I would like to persist an RDD to an HDFS or NFS mount. How can I change the location?

Re: where storagelevel DISK_ONLY persists RDD to

2015-01-25 Thread Larry Liu
Save an RDD to someplace like HDFS or NFS if you are attempting to interoperate with another system, such as Hadoop. `.persist` is for keeping the contents of an RDD around so future uses of that particular RDD don't need to recalculate its composite parts. On Sun Jan 25 2015 at 3:36:31 AM Larry Liu
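A minimal Scala sketch of the distinction the reply draws, using the Spark 1.x API. The HDFS URI and output path are placeholders, not real cluster addresses:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-vs-save"))
    val rdd = sc.parallelize(1 to 1000)

    // persist(DISK_ONLY): cache within THIS application only; blocks are
    // spilled to spark.local.dir on each executor, not to HDFS.
    rdd.persist(StorageLevel.DISK_ONLY)

    // saveAsTextFile: write the data out so other systems (e.g. Hadoop
    // jobs) can read it. URI below is a placeholder.
    rdd.saveAsTextFile("hdfs://namenode:8020/user/larry/rdd-out")

    sc.stop()
  }
}
```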

Re: Shuffle to HDFS

2015-01-25 Thread Larry Liu
I don’t think the current Spark shuffle can support HDFS as a shuffle output. Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? This will severely increase the shuffle time. Thanks Jerry *From:* Larry Liu [mailto:larryli...@gmail.com] *Sent:* Sunday, January

Re: How to use more executors

2015-01-21 Thread Larry Liu
Will SPARK-1706 be included in next release? On Wed, Jan 21, 2015 at 2:50 PM, Ted Yu yuzhih...@gmail.com wrote: Please see SPARK-1706 On Wed, Jan 21, 2015 at 2:43 PM, Larry Liu larryli...@gmail.com wrote: I tried to submit a job with --conf spark.cores.max=6 or --total-executor-cores 6

How to use more executors

2015-01-21 Thread Larry Liu
I tried to submit a job with --conf spark.cores.max=6 or --total-executor-cores 6 on a standalone cluster. But I don't see more than 1 executor on each worker. I am wondering how to use multiple executors when submitting jobs. Thanks larry
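For context on the SPARK-1706 reference above: in standalone mode at the time, a worker launched at most one executor per application, so `spark.cores.max=6` simply gave that single executor more cores. SPARK-1706 (shipped later, in Spark 1.4) added `spark.executor.cores`, which caps cores per executor and thereby lets one worker host several executors. A hedged Scala sketch, assuming Spark 1.4+:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// With spark.cores.max = 6 and spark.executor.cores = 2, the standalone
// scheduler can launch up to 3 executors for this app (possibly several
// on one worker, if that worker has enough free cores).
val conf = new SparkConf()
  .setAppName("multi-executor-demo")
  .set("spark.cores.max", "6")      // total cores across the app
  .set("spark.executor.cores", "2") // cores per executor
val sc = new SparkContext(conf)
```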

wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
A word-counting job on an ~1 GB text file takes 1 hour when the input comes from an NFS mount. The same job took 30 seconds with input from the local file system. Is there any tuning required for NFS-mounted input? Thanks Larry
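The job in question is presumably the standard word count; a minimal Scala version for reference (the NFS mount path is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("nfs-wordcount"))

// Read from the NFS-mounted path (placeholder), split on whitespace,
// and count occurrences of each word.
val counts = sc.textFile("/mnt/nfs/input.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.saveAsTextFile("/tmp/wordcount-out") // local output path, placeholder
```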

Re: wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
Hi, Matei Thanks for your response. I tried copying the file (1 GB) from NFS and it took 10 seconds. The NFS mount is in a LAN environment, and the NFS server is running on the same server that Spark runs on, so basically I mount the NFS on the same bare-metal machine. Larry On Wed, Dec 17, 2014

Re: wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
Run jstack to see where the process is spending time. Also make sure Spark's local work directories (spark.local.dir) are not on NFS. They shouldn't be, though; that should be /tmp. Matei On Dec 17, 2014, at 11:56 AM, Larry Liu larryli...@gmail.com wrote: Hi, Matei Thanks for your
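The spark.local.dir check Matei suggests can be done in code; a short sketch (the scratch path is a placeholder, and it must point at fast local disk, never an NFS mount):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.local.dir holds shuffle files and DISK_ONLY spill blocks;
// putting it on NFS would make every shuffle pay network-filesystem cost.
val conf = new SparkConf()
  .setAppName("local-dir-check")
  .set("spark.local.dir", "/tmp/spark-scratch") // local disk, placeholder

val sc = new SparkContext(conf)
println(sc.getConf.get("spark.local.dir")) // confirm the effective setting
```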

How to disable input split

2014-10-17 Thread Larry Liu
Is it possible to disable input splitting if the input is already small?
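There is no single "disable splits" switch, but two common workarounds exist; a hedged Scala sketch (paths are placeholders, and the 1 GB minimum split size is an arbitrary example value):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("no-split-demo"))

// Option 1: textFile's minPartitions is only a lower bound, so to force
// a single partition for a small input, coalesce after reading.
val oneSplit = sc.textFile("hdfs://namenode:8020/small.txt").coalesce(1)

// Option 2: raise Hadoop's minimum split size so each file stays whole
// (value in bytes; 1 GB here).
sc.hadoopConfiguration.setLong(
  "mapreduce.input.fileinputformat.split.minsize", 1L << 30)
```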

Re: input split size

2014-10-17 Thread Larry Liu
Thanks, Andrew. What about reading from a local file system? On Fri, Oct 17, 2014 at 5:38 PM, Andrew Ash and...@andrewash.com wrote: When reading out of HDFS it's the HDFS block size. On Fri, Oct 17, 2014 at 5:27 PM, Larry Liu larryli...@gmail.com wrote: What is the default input split size? How
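To answer the local-filesystem half of the question: when no minPartitions argument is given, textFile uses SparkContext.defaultMinPartitions (min(defaultParallelism, 2)), and the underlying Hadoop FileInputFormat then computes the actual splits. A small sketch to inspect the result (the input path is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("split-size-demo").setMaster("local[4]"))

// Without an explicit minPartitions, textFile passes
// defaultMinPartitions = min(defaultParallelism, 2) to FileInputFormat,
// which decides the split boundaries (on HDFS: the block size).
val rdd = sc.textFile("/tmp/sample.txt") // placeholder local path
println(s"partitions = ${rdd.partitions.length}")
```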