Just getting started with Spark, so hopefully this is all there and I just 
haven't found it yet.

I have a driver program on my client machine, and I can use addFile() to distribute
files to the remote worker nodes of the cluster. The files are there to be found by
my code running in the executors, so all is good. But ...
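
For context, here's roughly what that looks like on my end (a minimal sketch; the
file name, path, and app name are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    object AddFileDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("AddFileDemo"))

        // Ship a data file from the client machine to every worker node.
        sc.addFile("/local/path/lookup.dat")

        // On the executors, resolve the shipped file's local path and use it.
        val counts = sc.parallelize(1 to 4).map { _ =>
          val path = SparkFiles.get("lookup.dat")
          scala.io.Source.fromFile(path).getLines().size
        }.collect()

        println(counts.mkString(","))
        sc.stop()
      }
    }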


1) it also makes a copy on the local machine

- is there a way to indicate this isn't needed? I only need it on the cluster.

- if I send a .tar file, it extracts it for me, which is nice, but again, that's
extra work on the client machine when I'm not using the file there.

2) it copies the files to spark_installdir/work/

- that's fine, I suppose, though is there any way to designate a different location?

3) they don't get cleaned up

- I don't see anything ever getting removed from the work/ directory; it just
keeps accumulating.

- there was a clearFiles() call, but I don't know that it actually cleaned anything
up rather than just stopping further copying (which is how it was documented). It's
deprecated now anyway, so that's moot.

- is there a removeFiles() call to clean up? If so, what's the expected use case?
And how would my code clean up manually; would I hit permission issues if I tried?
(A sketch of what I mean follows this list.)
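
To make that last question concrete, this is the kind of manual cleanup I had in
mind. SparkFiles.getRootDirectory() is a real call, but whether deleting underneath
it is safe, supported, or even permitted is exactly what I'm unsure about:

    import java.io.File
    import org.apache.spark.SparkFiles

    // Meant to run on an executor once a shipped file is no longer needed.
    // Assumptions: the executor's user has write permission on this
    // directory, and Spark won't re-fetch or trip over the missing file.
    def tryCleanup(fileName: String): Boolean = {
      val f = new File(SparkFiles.getRootDirectory(), fileName)
      f.exists() && f.delete()
    }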
Again, I searched the archives but didn't see any of this covered; I'm just getting
started, though, so I may very well be missing it somewhere.

Thanks!
Tom
