A buddy of mine has a cluster with over 1000 (2000) nodes.
 
I've compiled a simple helloworld app to test it out.
 
I am using Intel MPI 2.0 and running over Ethernet, so I'm trying both the 
ssm (since the nodes are SMP machines) and sock devices.
 
 
I'm doing the following: mpdboot -n 1500 --rsh=ssh
I do an mpdtrace and all of the nodes in my mpd.hosts file are there.
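
For anyone who wants to double-check the ring the same way, here is a rough sketch of comparing mpd.hosts against the mpdtrace output. It assumes plain mpdtrace prints one hostname per line and that mpd.hosts has one host per line, optionally with a ":ncpus" suffix or "#" comments; the function names are my own, and hosts are compared by short name only.

```python
import subprocess


def short_name(host):
    # Drop any domain suffix so "node1.example.org" and "node1" compare equal.
    return host.split(".")[0]


def parse_hosts(text):
    """Collect short hostnames from mpd.hosts-style or mpdtrace-style text."""
    hosts = set()
    for line in text.splitlines():
        # Strip comments and surrounding whitespace (assumed file format).
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep the first field and drop an optional ":ncpus" suffix.
        name = line.split()[0].split(":")[0]
        hosts.add(short_name(name))
    return hosts


def missing_from_ring(hosts_text, trace_text):
    """Return hosts listed in mpd.hosts that do not appear in the trace."""
    return sorted(parse_hosts(hosts_text) - parse_hosts(trace_text))


def ring_report(hosts_path="mpd.hosts"):
    """Run mpdtrace and compare its output with the hosts file."""
    with open(hosts_path) as f:
        hosts_text = f.read()
    trace = subprocess.run(["mpdtrace"], capture_output=True,
                           text=True, check=True)
    return missing_from_ring(hosts_text, trace.stdout)
```

Calling ring_report() on the node where mpdboot was run should return an empty list if every host made it into the ring.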
 
I do an mpiexec -np 1500 ./helloworld and I get a newline.
 
15-20 minutes go by and nothing happens. It looks like something is timing 
out.
 
If I run the program on fewer than 128 processors, it works.
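
To narrow down where it stops working, something like the following bisection sketch could be used. It assumes the failure is monotonic (once a process count hangs, every larger count hangs too), treats a run that exceeds a timeout as a failure, and uses helper names and a 300-second timeout that are my own choices. Note that killing mpiexec on timeout may leave stray processes on the nodes, so some cleanup between runs may be needed.

```python
import subprocess


def run_hello(nprocs, timeout_s=300):
    """Return True if the job exits cleanly within timeout_s seconds."""
    try:
        result = subprocess.run(
            ["mpiexec", "-np", str(nprocs), "./helloworld"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # arbitrary cutoff, adjust as needed
        )
    except subprocess.TimeoutExpired:
        # The hang described above shows up here as a timeout.
        return False
    return result.returncode == 0


def first_failing_count(good, bad, works):
    """Bisect between a known-good and a known-bad process count.

    `works` is a predicate such as run_hello. Returns the smallest
    failing count, assuming failures are monotonic in the count.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if works(mid):
            good = mid
        else:
            bad = mid
    return bad
```

With the numbers from this post, first_failing_count(127, 1500, run_hello) would need about 11 runs to find the threshold.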
 
Does anyone have experience running Intel MPI on over 1000 nodes, and do you 
have any tips to speed up task execution or to solve this issue?
 
Thanks,
BC
 
 
 

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
