On Wed, 1 Oct 2008, Donald Becker wrote:

> That's correct. Our model is that a "cluster" is a single system -- and a single install.

That's the idea that I also started with, almost 10 years ago ;-) Not using Beo*/bproc, but NFS-root, which allowed a single install in the node "image" to be used on all nodes - although you'd probably call this 2 system installs (the master node itself and the node "image"). But over the years I have changed my mind...

> If you are running different distributions on nodes, you discard many of the opportunities of running a cluster. More importantly, it's much more knowledge- and labor-intensive to maintain the cluster while guaranteeing consistency.

It does require more work, but in some cases it cannot be avoided. From my own experience: some 5 years ago, a quantum chemistry program was distributed as a binary statically compiled on RH9 or RHEL3 (kernel 2.4 based), with MPICH included. When I wanted to switch to a 2.6 kernel, this program could no longer run, so some of the nodes had to be kept on an older distribution until a newer program version could be obtained (that took about a year). It also meant that whenever there were discussions about using higher-performance interconnects than GigE, this software's users insisted on buying more nodes rather than a faster interconnect. This situation caused both technical and administrative issues, and the possibility of running different distributions solved all of them easily.

Having the possibility to run several distributions side-by-side requires spending some effort in organizing the other installed software, normally shared through NFS or a parallel FS to the nodes. But once you make the jump from 1 to 2, you might as well make it from 1 to many.
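As a rough illustration of the kind of organization meant here (a minimal sketch, not a description of any actual setup): each node could derive the location of its software tree on the shared filesystem from its own distribution and architecture. The /shared/sw root, the "distro-version/arch" layout and the function names are all hypothetical, and reading os-release style text is a convention of current distributions, assumed here only for the example.

```python
from pathlib import PurePosixPath


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def software_prefix(os_release_text, machine, root="/shared/sw"):
    """Return the per-distribution software directory for a node.

    The root and the layout below it are assumptions made for this sketch.
    """
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "unknown")
    version = fields.get("VERSION_ID", "unknown")
    return str(PurePosixPath(root) / f"{distro}-{version}" / machine)
```

On a node, the text would come from the os-release file and the machine string from platform.machine(); the resulting directory can then be put on PATH or handed to whatever environment setup the site already uses, so that adding another distribution only means adding another directory.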

This leads me to observe that we have different points of view: you are a maker of a cluster-oriented distribution, trying to promote it and its underlying ideas (which are fine ideas, no question about that :-)), and you are sure that it works because it has been bought and used successfully. I, on the other hand, have to find solutions to keep the scientists productive (whatever productive means ;-)) and to keep them as far as possible from the system details so that they can concentrate on their work. So it's not surprising that we come to different conclusions - at least they sustain an interesting discussion :-)

I would be interested to hear Mark Hahn's opinion on this, as from how he presented himself to this list it seemed to me that he is in a very similar position to mine: supporting a variety of users with a variety of needs. But others should not feel left out; please write your opinions as well ;-)

> Most distributions (all the commercially interesting ones) are workstation-oriented

I don't really agree with this statement (looking at RHEL and SLES), but anyone who installs a workstation-oriented distribution on a cluster node gets what (s)he pays for :-) I have seen very recently (identity hidden to protect the guilty ;-)) such a node "image" which contained OpenOffice - to be fair, it was used via NFS-root so it wasn't wasting node memory, only master disk space...

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: [EMAIL PROTECTED]