Hi,
I see you are running Falkon release 174 (probably downloaded from http://dev.globus.org/images/1/19/Falkon.r174.tgz). If you have SVN, try checking out the latest version (revision 403):
svn co https://svn.globus.org/repos/falkon
If you don't have SVN, let me know and I can tar up the latest release and post it online. Release 174 is likely at least 6 months old, so getting the latest one is advisable.

In the meantime, the problem you are facing is most likely connectivity between the service and the worker nodes, and it will happen with the new release as well. When the service starts up, it opens a server socket on some port (50001 in your case). This port needs to be open in the firewall for incoming connections. In your case this part is fine: the worker is able to connect to this port, since it registers and starts up.

The worker also sets up a notification endpoint (a server socket on saturn.pku.edu.cn:50100 in your case) and waits for notifications about work. When the service receives tasks from the client, it sends a notification to the worker. In your case the worker never received the notification, and it's likely that the TCP timeout has not yet been triggered to signal an error. I bet you have a firewall rule on the worker that prohibits incoming connections on port 50100. This all worked when everything ran on the same node because there was no firewall to deal with.
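
One way to check this diagnosis is to try opening a TCP connection to the worker's notification port from the machine that runs the service. The following is only a minimal sketch in Python and is not part of Falkon; the function name can_connect and the 5-second default timeout are illustrative assumptions.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (for example a firewall
        # silently dropping packets), and host name resolution failures.
        return False
```

For example, calling can_connect("saturn.pku.edu.cn", 50100) on the service machine while the worker is running should return True once the port is reachable. A False result that only appears after the full timeout suggests packets are being dropped rather than refused.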

So, you have two options.

1) If you can configure your firewall, simply open up a range of ports (e.g. 50000-60000), which is what the default Falkon config expects. Do this on both the client side and the worker side. On the service side, you only need one port open (50001 in your case) on the machine where the service runs.

2) If you can't configure the firewall on the worker, switch the worker to the C implementation (instead of the default Java implementation). The C worker does not need any open ports, since it does not open any server sockets on the worker nodes. If you want to try this option, let me know and I can guide you through the specific config options, as I don't think it's covered in the startup guide.

For now, option #1 is the easiest, but option #2 can work with a few modifications to the config files, provided the C worker is compiled.

One last thing to watch out for: if the machine where you run the service has multiple network adaptors, Java will sometimes choose the wrong one, and the web service calls won't be routed properly. Likewise, on the client or the workers, notifications can be sent to the wrong place if the wrong network adaptor is used. If you have multiple network adaptors, we have ways to override the default network interface that Java picks.
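
To get a rough idea of whether the multiple-adaptor case might apply to a machine, you can compare the address its own host name resolves to with the local address the operating system would use to reach the service. This is only a sketch in Python, independent of Falkon; the function names are illustrative, and treating a mismatch as a sign of the wrong adaptor being advertised is an assumption, not a confirmed description of how Falkon or Java chooses an interface.

```python
import socket


def hostname_address():
    """Address that this machine's own host name resolves to, or None."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return None


def outgoing_address(remote_host, remote_port=50001):
    """Local address the OS would use to reach remote_host, or None.

    Connecting a UDP socket sends no packets; it only asks the kernel to
    pick a route, and therefore a local interface address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]
    except OSError:
        return None
    finally:
        sock.close()
```

On a worker, for instance, comparing hostname_address() with outgoing_address("162.105.203.136") shows whether the two views of the machine's address agree. If they differ, it may be worth looking at the interface override.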

Cheers,
Ioan

Yingnan Li wrote:
Hello everyone.

Let me describe my situation.

Recently I installed Falkon on 3 SMP Servers. I followed the User Guide to see 
if it works.

When I ran the client, the container and the executor on the same server, 
everything seemed OK. However, when I ran the container on one server and the 
executor on another, jobs could not be dispatched to the executor. The 
following are messages after running the container, the executor, and the 
client, respectively:

Container:

[glo...@moon falkon.r174]$ falkon-service-stdout.sh 50001 
${FALKON_CONFIG}/Falkon.config
starting GT4.0.4 container based on config file 
/home/globus/lyn/falkon.r174/config/Falkon.config and with Falkon service on 
port 50001...
Starting SOAP server at: http://162.105.203.136:50001/wsrf/services/
With the following services:
[1]: http://162.105.203.136:50001/wsrf/services/AdminService
[2]: http://162.105.203.136:50001/wsrf/services/AuthzCalloutTestService
...
[25]: http://162.105.203.136:50001/wsrf/services/WidgetService
[26]: http://162.105.203.136:50001/wsrf/services/gsi/AuthenticationService
Falkon Service Started...
java.lang.ArrayIndexOutOfBoundsException: 16
        at org.globus.GenericPortal.common.GetSample.run(Unknown Source)
Worker saturn.pku.edu.cn:50100 is registered and ready to receive work!

Executor:

l...@saturn:~/falkon.r174/logs> falkon-worker-stdout.sh 162.105.203.136 50001
creating Falkon Java Executor resource...
Setting appropriate security from file 
'/home/lyn/falkon.r174/config/worker-security-config.xml'!
Endpoint reference written to file /home/lyn/falkon.r174/worker/WorkerEPR.txt 
(10885ms)
Starting Falkon Java Executor in interactive mode...
Number of workers: 1
Started worker 0!
Worker saturn.pku.edu.cn:50100 started succesful!
Waiting for shutdownHook to be triggered...
WORKER: lifeListen Thread not started... will live forever until terminated 
explicitly!!!

Client:

[...@biogrid falkon.r174]$ falkon-client.sh 162.105.203.136 50001 
workloads/sleep/sleep_1x10
Starting Falkon Command Line Client v1.0
Starting non-interactive mode....
Reading file: workloads/sleep/sleep_1x10... Finished reading 10 tasks in memory....
null time 0.0 tasks_success 0 tasks_failed 0 tasks_sent 0 completed 0.0 tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?
Notification Endpoint (automatic): biogrid.pku.edu.cn:50100
null time 0.0 tasks_success 0 tasks_failed 0 tasks_sent 0 completed 0.0 
tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?
null time 1.002 tasks_success 0 tasks_failed 0 tasks_sent 10 completed 0.0 
tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?
null time 2.003 tasks_success 0 tasks_failed 0 tasks_sent 10 completed 0.0 
tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?
...
null time 302.591 tasks_success 0 tasks_failed 0 tasks_sent 10 completed 0.0 
tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?
null time 303.593 tasks_success 0 tasks_failed 0 tasks_sent 10 completed 0.0 
tasks_tp 0.0 aver_tp 0.0 stdev_tp 0.0 ETA ?

What is the cause of this problem?

Related log files are presented as attachments.

Thanks in advance,
Yingnan Li


--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384 Evanston, IL 60208-3118
=================================================================
Cel:   1-847-722-0876
Tel:   1-847-491-8163
Email: ira...@eecs.northwestern.edu
Web:   http://www.eecs.northwestern.edu/~iraicu/
      https://wiki.cucis.eecs.northwestern.edu/
=================================================================