Spark throws rsync: change_dir errors on startup

2015-04-01 Thread Horsmann, Tobias
Hi,

I am trying to set up a minimal 2-node Spark cluster for testing purposes. When I 
start the cluster with start-all.sh I get an rsync error message:

rsync: change_dir "/usr/local/spark130/sbin//right" failed: No such file or 
directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 
23) at main.c(1183) [sender=3.1.0]

(For clarification, my 2 nodes are called 'right' and 'left', referring to the 
physical machines standing in front of me.)
It seems that a directory named after my master node 'right' is expected to exist, 
and the synchronisation fails because it does not.
I don't understand what Spark is trying to do here. Why does it expect this 
directory to exist, and what content should it have?
I assume I did something wrong in my configuration setup. Can someone 
interpret this error message and point out where this error is coming from?

Regards,
Tobias


Re: Spark throws rsync: change_dir errors on startup

2015-04-02 Thread Horsmann, Tobias
Hi,
Verbose output showed no additional information about the origin of the error:

rsync from right
sending incremental file list

sent 20 bytes  received 12 bytes  64.00 bytes/sec
total size is 0  speedup is 0.00
starting org.apache.spark.deploy.master.Master, logging to 
/usr/local/spark130/sbin/../logs/spark-huser-org.apache.spark.deploy.master.Master-1-cl-pc6.out
left: rsync from right
left: rsync: change_dir "/usr/local/spark130//right" failed: No such file or 
directory (2)
left: rsync error: some files/attrs were not transferred (see previous errors) 
(code 23) at main.c(1183) [sender=3.1.0]
left: starting org.apache.spark.deploy.worker.Worker, logging to 
/usr/local/spark130/sbin/../logs/spark-huser-org.apache.spark.deploy.worker.Worker-1-cl-pc5.out
right: rsync from right
right: sending incremental file list
right: rsync: change_dir "/usr/local/spark130//right" failed: No such file or 
directory (2)
right:
right: sent 20 bytes  received 12 bytes  64.00 bytes/sec
right: total size is 0  speedup is 0.00
right: rsync error: some files/attrs were not transferred (see previous errors) 
(code 23) at main.c(1183) [sender=3.1.0]
right: starting org.apache.spark.deploy.worker.Worker, logging to 
/usr/local/spark130/sbin/../logs/spark-huser-org.apache.spark.deploy.worker.Worker-1-cl-pc6.out

I also edited the script to remove the additional slash, but this did not help 
either. The workers are started by the script regardless; it is just this error 
message that is thrown.

Now, the funny thing: I was brave enough to create the folder //right that Spark is 
desperately looking for. Guess what, this caused a complete wipe of my local 
Spark installation. /usr/local/spark130 was cleaned out completely, except for the 
logs folder…

Any suggestions what is happening here?

From: Akhil Das <ak...@sigmoidanalytics.com>
Date: Thursday, 2 April 2015 07:51
To: Tobias Horsmann <tobias.horsm...@uni-due.de>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark throws rsync: change_dir errors on startup

Error 23 is defined as a "partial transfer" and might be caused by filesystem 
incompatibilities, such as different character sets or access control lists. In 
this case it could be caused by the double slashes (// after sbin). You could 
try editing your sbin/spark-daemon.sh file: look for the rsync command inside the 
file and add -v to it to see what exactly is going wrong.
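
For illustration, the sync stanza in Spark 1.x's sbin/spark-daemon.sh looks roughly 
like the sketch below. This is a paraphrase, not the verbatim script, so check your 
own copy before editing; -v has already been added here:

if [ "$SPARK_MASTER" != "" ]; then
  echo rsync from "$SPARK_MASTER"
  rsync -av -e ssh --delete --exclude='logs/*' "$SPARK_MASTER/" "$SPARK_HOME"
fi

With -v, rsync lists each file it transfers. Note the --delete flag: rsync removes 
destination files that are missing from the source, so a sync from a wrong or empty 
source directory can erase the destination.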

Thanks
Best Regards




Which OS for Spark cluster nodes?

2015-04-03 Thread Horsmann, Tobias
Hi,
Are there any recommendations for operating systems that one should use for 
setting up Spark/Hadoop nodes in general?
I am not familiar with the differences between the various Linux distributions 
or how well they are (not) suited for cluster set-ups, so I wondered whether 
there are any preferred choices?

Regards,



Spark: Using "node-local" files within functions?

2015-04-14 Thread Horsmann, Tobias
Hi,

I am trying to use Spark in combination with YARN with 3rd-party code which is 
unaware of distributed file systems. Providing HDFS file references thus does 
not work.

My idea to resolve this issue was the following:

Within a function I take the HDFS file reference I get as a parameter, copy it 
to the local file system, and hand the 3rd-party components what they expect:
textFolder.map(new Function<String, List<...>>()
{
    public List<...> call(String inputFile) throws Exception
    {
        // resolve the HDFS reference and copy the file to the local file system

        // get a local file pointer
        // (this function should be executed on a worker node, so there should be
        // a local file system)

        // call the 3rd-party library with the 'local file' reference

        // do other stuff
    }
});
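
Concretely, the copy-to-local step could look like the following rough sketch, using 
Hadoop's FileSystem API (assuming the Hadoop client classes are on the worker 
classpath, as they are under YARN; the class name CopyToLocal and the temp-file 
naming are just placeholders, not part of the actual code):

import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.function.Function;

// Copies one HDFS file to the worker's local disk and returns the local path,
// which can then be handed to an HDFS-unaware 3rd-party library.
public class CopyToLocal implements Function<String, String>
{
    public String call(String inputFile) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(inputFile), conf);

        // stage the file in a temporary location on the node's local file system
        File localFile = File.createTempFile("hdfs-staged-", ".dat");
        fs.copyToLocalFile(new Path(inputFile), new Path(localFile.getAbsolutePath()));

        // the 3rd-party library can now be called with this plain local path
        return localFile.getAbsolutePath();
    }
}

applied as, e.g., JavaRDD<String> localPaths = textFolder.map(new CopyToLocal());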

This seems to work, but I am not really sure whether it might cause other problems 
at production file sizes, e.g. when the files I copy to the local file 
system are large. Would this affect YARN somehow? Are there more advisable 
ways to befriend HDFS-unaware libraries with HDFS file pointers?

Regards,