Re: [R] NetWorkSpace from REvolution; Distributed Computing setup questions

2010-11-10 Thread yeoldefortran

This is very late, but in case you are still looking for the solution to
this: Everything you did was right on, except the argument to sleigh is
'rprog' instead of 'RProg'.  That should fix the problem.
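
A minimal sketch of the corrected call, reusing the values from the
original post. The value given to rprog is an assumption: it relies on R
being on the worker's PATH, as the original post says it is. The full
path to R on the worker should work as well. The settings are collected
in a list first so they can be checked before any connection is made.

```r
# Settings from the original post, plus the lowercase 'rprog' argument.
sleigh_args <- list(
  nwsHost    = "172.30.xx.xx",
  nwsPort    = 8765,
  nodeList   = c("10.85.xxx.xxx"),
  scriptDir  = "/usr/local/lib/R/site-library/nws/bin",
  scriptName = "RNWSSleighWorker.sh",
  workingDir = "~/tmp/",
  logDir     = "~/tmp/",
  outfile    = "outfileTest",
  user       = "tj",
  rprog      = "R"  # assumption: 'R' resolves via PATH on the worker
)

# With the nws package loaded and the NWS server reachable (not run here):
# library(nws)
# s <- do.call(sleigh, c(sleigh_args, list(launch = sshcmd, scriptExec = envcmd)))
```

If this works, the 'RProg=...' entry in the echoed ssh command should no
longer show the master's /Library/Frameworks/... path.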



[R] NetWorkSpace from REvolution; Distributed Computing setup questions

2010-10-29 Thread Timothy Murphy
***Summary:***

I'm setting up a cluster using netWorkSpace, and I'm having issues
with the sleigh initialization. My R function to initialize the sleigh
succeeds and the sleigh appears to be ready, but I get apparently
conflicting information from status(s), rankCount(s), and s; and
basic sleigh functions cause the sleigh to hang indefinitely.

Also, the log file contains an error that indicates that the script is
trying to find a file in a nonexistent directory:
/usr/local/lib/R/site-library/nws/bin/RNWSSleighWorker.sh: 37:
/Library/Frameworks/R.framework/Resources/bin/R: not found (see
section 4).

I've spent quite a bit of time trying to debug this, and I've gathered
here all the information that I think may be pertinent to solving the
problem. The following is therefore a bit lengthy, but I think
complete (as far as I'm able to tell from the existing documentation).
It's organized into sections roughly by the topic tested.

So if you're familiar with the workings of netWorkSpaces, I would be
very grateful if you would take a look at my diagnostics below and
tell me if you can identify the problem.

***Details:***

***Section 1***
Currently my setup is:

MASTER:
MacBook Pro running OS X 10.6.4, R-2.11.1, Python 2.6.1, and NWSserver-2.0.0.

WORKER:
Optiplex GX620 running Ubuntu 10.10 (64bit), R-2.11.1, Python 2.6.6,
NWSserver-2.0.0, and NWS-2.0.0.3 (client)
R and Python are in the PATH on both machines; I can start them from
the worker's command line by typing R or python.
The client is able to find the RNWSSleighWorker.sh file.
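
One way to confirm the PATH claim under conditions closer to what the
sleigh launch uses is to look up R through a non-interactive ssh command
rather than an interactive login. The sketch below only builds the ssh
arguments; r_lookup_args is a hypothetical helper name, and the host is
the masked address used elsewhere in this post.

```r
# Hypothetical helper: arguments for a non-interactive 'command -v R' over ssh.
r_lookup_args <- function(user, host) {
  c("-x", "-l", user, host, "command -v R")
}

args <- r_lookup_args("tj", "10.85.xxx.xxx")

# Not run here, since it needs the real worker address:
# system2("ssh", args, stdout = TRUE)
```

If the lookup prints nothing, R is probably not on the PATH that
non-interactive ssh sessions see, even though it works at a login shell.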

(Note 1: I put the server software on the client because I was getting
a message saying "No nws server found" each time I tried to install
the client software. I don't know if this is needed.)
(Note 2: I plan to set up many more machines if I can get this working)
(Note 3: Originally I was trying this on Windows machines with Cygwin,
but I encountered the same error and figured I could at least rule out
a possible cause by setting it up on a linux machine. Ultimately I
would like to get this working in Windows/Cygwin.)

***Section 2***
The function I used to start the sleigh is:

s=sleigh(
+   nwsHost="172.30.xx.xx",
+   nwsPort=8765,
+   launch=sshcmd,
+   nodeList=c("10.85.xxx.xxx"),
+   scriptExec=envcmd,
+   scriptDir="/usr/local/lib/R/site-library/nws/bin",
+   scriptName="RNWSSleighWorker.sh",
+   workingDir='~/tmp/',
+   logDir='~/tmp/',
+   outfile="outfileTest",
+   user="tj")

This call prints the message below and then returns a clear command prompt:

Executing command:
'/Library/Frameworks/R.framework/Resources/library/nws/bin/SleighWorkerWrapper.sh'
'ssh' '-f' '-x' '-l' 'tj' '10.85.101.109' 'env'
'RSleighName=10.85.101.109'
'RSleighNwsName=sleigh_ride_0450__nwssNGG4LF'
'RSleighUserNwsName=sleigh_user_0452__nwssNGG4LF' 'RSleighID=1'
'RSleighWorkerCount=1'
'RSleighScriptDir=/usr/local/lib/R/site-library/nws/bin'
'RSleighNwsHost=172.30.34.71' 'RSleighNwsPort=8765'
'RSleighWorkingDir=~/tmp/'
'RProg=/Library/Frameworks/R.framework/Resources/bin/R'
'RSleighWorkerOut=sleigh_ride_0450__nwssNGG4LF_0001.txt'
'RSleighLogDir=~/tmp/'
'/usr/local/lib/R/site-library/nws/bin/RNWSSleighWorker.sh'

If I type the name of the sleigh s as below, I get information that
makes it look like the sleigh is ready to receive commands:

> s
NWS Sleigh Object
NWS Host:   172.30.xx.xx:8765
Workspace Name: sleigh_ride_0446__nwssNGG4LF
1 Worker Nodes: 10.85.xxx.xxx

Likewise, if I send a simple ssh command to the worker I get a response:
> system('ssh t...@10.85.101.109 date')
Fri Oct 29 15:15:10 EDT 2010

I can also communicate values between the two machines using the NWS
web server and the nwsStore() and nwsFetch() functions.

However, if I check the status of the sleigh using status(s) or
rankCount(s), I get less encouraging information:

> status(s)
$numWorkers
[1] 0
$closed
[1] 0

> rankCount(s6)
[1] 0
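
A small guard along these lines would avoid submitting work to a sleigh
that no worker has joined. This is only a sketch: has_workers is a
hypothetical helper, and it assumes status() returns a list shaped like
the output printed above.

```r
# Hypothetical helper: TRUE only when at least one worker has registered.
# 'st' is assumed to have the shape printed by status(s) above.
has_workers <- function(st) {
  is.list(st) && !is.null(st$numWorkers) && st$numWorkers > 0
}

# The values reported in this post:
st <- list(numWorkers = 0, closed = 0)
has_workers(st)  # FALSE

# Intended use with a live sleigh (not run here):
# if (has_workers(status(s))) eachWorker(s, Sys.info)
```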

***Section 3***
I can access the NWS server through localhost:8766 and see that
sleighs are being created. There are two entries: a sleigh_ride and a
sleigh_user; but the worker count in the sleigh_ride is also zero.

If I execute either of the following test sleigh functions, the sleigh
will hang indefinitely (though the R terminal will not hang, since I
used blocking=FALSE):
eachWorker(s5, Sys.info, eo=list(blocking=FALSE))
eachWorker(s, function() library(nws), eo=list(blocking=FALSE))

***Section 4***
***CRUX OF THE ISSUE (probably):***
Finally, three files get created in the ~/tmp directory that I
specified as the logDir and workingDir, named outfileTest,
RSleighSentinelLog_1000_1, and
sleigh_ride_0450__nwssNGG4LF_0001.txt. All three contain exactly the
same information:

/usr/local/lib/R/site-library/nws/bin/RNWSSleighWorker.sh: 37:
/Library/Frameworks/R.framework/Resources/bin/R: not found

What's puzzling is the /Library/Frameworks/R.framework/Resources/bin/R
part. That looks like an OS X-style path rather than an Ubuntu-style one. I
didn't specify that path