I would like to persist an RDD to HDFS or an NFS mount. How do I change the
location?
You would save an RDD to
someplace like HDFS or NFS if you are attempting to interoperate with
another system, such as Hadoop. `.persist` is for keeping the contents of
an RDD around so future uses of that particular RDD don't need to
recalculate its component parts.
On Sun, Jan 25, 2015 at 3:36:31 AM, Larry Liu wrote:
I don't think Spark's current shuffle can support HDFS as a shuffle
output. Anyway, is there any specific reason to spill shuffle data to HDFS
or NFS? That will severely increase the shuffle time.
Thanks
Jerry
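For reference, the shuffle spill location is controlled by `spark.local.dir`; a hedged example keeping it on fast local disks rather than NFS or HDFS (the paths are illustrative):

```
# spark-defaults.conf -- illustrative local paths, not NFS/HDFS mounts
spark.local.dir    /mnt/disk1/spark-local,/mnt/disk2/spark-local
```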
*From:* Larry Liu [mailto:larryli...@gmail.com]
*Sent:* Sunday, January
Will SPARK-1706 be included in the next release?
On Wed, Jan 21, 2015 at 2:50 PM, Ted Yu yuzhih...@gmail.com wrote:
Please see SPARK-1706
On Wed, Jan 21, 2015 at 2:43 PM, Larry Liu larryli...@gmail.com wrote:
I tried to submit a job with --conf spark.cores.max=6
or --total-executor-cores 6 on a standalone cluster. But I don't see more
than 1 executor on each worker. I am wondering how to use multiple
executors when submitting jobs.
Thanks
larry
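In standalone mode, Spark historically launched at most one executor per worker per application; SPARK-1706 (which landed in Spark 1.4) lets you split a worker's cores across several executors via `--executor-cores`. A sketch, where the master URL, class, and jar names are hypothetical:

```shell
# 6 cores total, 2 per executor -> up to 3 executors,
# possibly more than one on a single worker (Spark 1.4+)
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 6 \
  --executor-cores 2 \
  --class com.example.WordCount \
  wordcount.jar
```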
Hi,
A word-counting job on an ~1G text file takes 1 hour when the input is read
from an NFS mount. The same job took 30 seconds with input from the local
file system. Is there any tuning required for NFS-mounted input?
Thanks
Larry
Hi, Matei
Thanks for your response.
I tried copying the file (1G) from NFS and it took 10 seconds. The NFS mount
is in a LAN environment, and the NFS server is running on the same server that
Spark is running on. So basically I mount the NFS share on the same bare-metal
machine.
Larry
On Wed, Dec 17, 2014, Matei Zaharia wrote:
Run jstack to see where the process is
spending time. Also make sure Spark's local work directories
(spark.local.dir) are not on NFS. They shouldn't be by default, since
that setting defaults to /tmp.
Matei
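A sketch of that diagnosis using standard JDK tools; the grep filter and `<pid>` placeholder are assumptions, not exact commands from the thread:

```shell
# List JVMs, then dump thread stacks from the Spark process a couple of times
jps -l | grep -i spark            # find the driver/executor PID
jstack <pid> > stack1.txt         # <pid> taken from the previous step
sleep 5 && jstack <pid> > stack2.txt
# Threads stuck in the same frame across both dumps show where time is spent
```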
On Dec 17, 2014, at 11:56 AM, Larry Liu larryli...@gmail.com wrote:
Hi, Matei
Thanks for your response.
Is it possible to disable input split if input is already small?
Thanks, Andrew. What about reading from the local file system?
On Fri, Oct 17, 2014 at 5:38 PM, Andrew Ash and...@andrewash.com wrote:
When reading out of HDFS it's the HDFS block size.
On Fri, Oct 17, 2014 at 5:27 PM, Larry Liu larryli...@gmail.com wrote:
What is the default input split size?
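Andrew's point can be sketched with the FileInputFormat-style split formula, splitSize = max(minSize, min(maxSize, blockSize)); the 64 MB default block size matches HDFS of that era, and the function name and defaults here are illustrative:

```python
import math

def num_splits(file_len, block_size=64 * 1024 * 1024, min_size=1, max_size=None):
    """Approximate how many input splits a file yields (FileInputFormat-style)."""
    if max_size is None:
        max_size = block_size
    split_size = max(min_size, min(max_size, block_size))
    return max(1, math.ceil(file_len / split_size))

# A 1 GB file with 64 MB blocks yields 16 splits (hence 16 Spark partitions)
print(num_splits(1 * 1024**3))
```

Each split becomes one partition of the RDD, which is why a small file still read in one split cannot be "un-split" below one partition.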