On 04/08/2010 03:13 PM, Kevin D. Clark wrote:
> Tom Buskey writes:
>
>> On Thu, Apr 8, 2010 at 10:42 AM, Kevin D. Clark wrote:
>>
>>> The problem that I had was that I frequently had to deal with the
>>> situation of "this particular problem only really efficiently runs on
>>> 1, 4, or 16 nodes in the cluster" or "this problem only really
>>> efficiently runs on 1, 2, 4, 8, or 16 nodes in the cluster"....now,
>>> what nodes were these again, and how do I relate all of the logfiles
>>> that I obtained from the last program run?
>>
>> You might have proven my point.
>
> Just to be clear, I was trying to illustrate your point, because you
> and I appear to be in complete agreement on this issue.

Maybe I'm not understanding the issue, but isn't the above exactly why queuing systems were made? We're using a dirt-old version of Platform LSF, and it already solves the "running on heterogeneous systems distributed across an arbitrary number of nodes" problem, returning the output via LSF itself or a shared filesystem.
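As a rough sketch of what I mean (the program name `./solver` and the node counts are made up; `bsub -n` asks LSF for that many slots, `-o` names the output file, and `%J` expands to the job ID, so each run's logs stay matched to the job that produced them):

```shell
#!/bin/sh
# Submit the same job at each node count the problem runs efficiently on.
# The scheduler, not the user, picks which hosts actually run it.
# BSUB_CMD defaults to an echo stub so this sketch runs without LSF;
# on a real cluster, set BSUB_CMD=bsub.
BSUB_CMD=${BSUB_CMD:-echo bsub}

for n in 1 4 16; do
    $BSUB_CMD -n "$n" -o "run.%J.out" ./solver
done
```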
The original problem ($DWARVES) had to do with what really looks like sysadmin-type work, which dsh can already do. It has the notion of groups, so you can send Solaris-specific commands to the group of Solaris systems, Red Hat-specific commands to the Red Hat group, etc., or define a group that includes all hosts for commands that work across everything. You can also have dsh dispatch commands concurrently rather than serially, as the for loop does. We can get ~200 nodes updated via systemimager in only a few minutes using this method.

-Mark
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
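(A postscript sketch of the concurrent-vs-serial point above. The hostnames are invented, and `run_on` is a stand-in for `ssh "$host" "$@"` so the sketch runs anywhere; the `&`/`wait` pair is the whole difference between dsh-style concurrent dispatch and the one-host-at-a-time for loop.)

```shell
#!/bin/sh
# run_on: hypothetical stand-in for ssh'ing a command to one host.
run_on() {
    host=$1; shift
    echo "$host: $*"        # a real version would do: ssh "$host" "$@"
}

# dispatch_group: run a command on every host in a group concurrently,
# roughly what dsh does for a group of machines.
dispatch_group() {
    hosts=$1; shift
    for h in $hosts; do
        run_on "$h" "$@" &  # '&' makes this concurrent, not serial
    done
    wait                    # collect all the background jobs
}

# e.g. send a Red Hat-specific command only to the Red Hat group:
dispatch_group "rh01 rh02 rh03" rpm -q coreutils
```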