This isn't very good, but it gives an overview of what I'm doing -
it's not exact and needs updating. If people are interested in more
detail I'll gladly update the page. It also doesn't explain how to use
cfengine to configure the ramdisk.
.. my apologies in advance:
http://www.psc.edu/~pauln/Diskless_Boot_Howto.html
Basically we build a node with kickstart (or whatever) and then tar up
its entire filesystem back to the server. Then we make a 64MB ramdisk
and stick the contents of "/" into that. Some files under /etc need to
be modified too - rc.sysinit and fstab being the critical ones. Then we
use pxelinux and PXE booting to load the kernel and ramdisk onto the
client.
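A rough sketch of that build step, in case it helps - file names, the
mount point, and the tarball name here are illustrative, not our exact
script, and the loop mount obviously requires root:

```shell
# Build a 64MB ext2 ramdisk image from a tarball of a kickstarted node's root fs.
dd if=/dev/zero of=initrd-64m.img bs=1M count=64
mke2fs -F -m0 initrd-64m.img          # -F: target is a plain file, not a block device
mkdir -p /mnt/rd
mount -o loop initrd-64m.img /mnt/rd  # requires root
tar -xzpf rootfs.tar.gz -C /mnt/rd    # unpack the node's "/" into the image
# Edit the critical files before unmounting:
#   /mnt/rd/etc/fstab       - root on /dev/ram0, /usr via NFS
#   /mnt/rd/etc/rc.sysinit  - skip fsck/remount steps that assume a real disk
umount /mnt/rd
gzip -9 initrd-64m.img                # pxelinux loads the compressed image
```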
The reason for using cfengine is so the same ramdisk can be used for
machines of different classes or types. Editing the ramdisk is a pain,
but making changes to the cfagent text file is really easy. As you
stated, mounting root over NFS is inefficient and requires server
directories for each client, though we still use NFS for /usr.
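For the flavor of it, a cfengine-2-style cfagent fragment along these
lines - the group name, hostnames, and file contents are hypothetical,
not from our actual config:

```
# cfagent.conf sketch: per-class edits applied at boot from one generic ramdisk.
groups:
    storage_nodes = ( node01 node02 )

editfiles:
    storage_nodes::
        { /etc/sysconfig/myservice
          AppendIfNoSuchLine "ENABLED=yes"
        }
```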
One of our production storage clusters uses this technique for booting.
Since we require that /usr is mounted via NFS, and this represents a
single point of failure, we cluster 2 NFS servers with the heartbeat
package (they share /var/lib/nfs).
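The heartbeat side of that is essentially one haresources line - the
hostname and virtual IP below are examples, not our real ones:

```
# /etc/ha.d/haresources: node nfs1 normally owns the virtual IP and the
# nfs service; on failure its peer takes both over. Clients keep mounting
# the virtual IP, and the shared /var/lib/nfs preserves lock/state info.
nfs1 192.168.1.100 nfs
```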
paul
Vincent Diepeveen wrote:
Hi Paul,
Do you have a FAQ that describes how to boot nodes diskless?
I'm about to build a 16-node cluster in the coming months (as soon as
I've got budget for more nodes), and my only achievement so far in the
Beowulf area was getting a 2-node cluster to work *with* disks.
That's too expensive, however, and it eats unnecessary power.
14 disks cost $$$, but more importantly they each effectively draw
nearly 30 watts (Maxtors are rated around 22 watts, and that's *after*
the PSU has already lost a lot of power!).
Let's save on power, therefore!
Furthermore, my experience with NFS is that it slows down the entire network.
My basic idea is to just buy some wood and mount 7-14 mainboards, each
with a dual Woodcrest 1.6 GHz dual-core and 512MB RAM (or less if I can
get that RAM a lot cheaper), or, if the budget is smaller, simply an
A64 with a single chip, or mainboards with a single Woodcrest chip (in
both cases dual-core, of course).
On top of that, the 'master node' would be a dual Opteron dual-core (or
a 15th node with somewhat faster Woodcrest chips and quite a bit more
RAM), which I plan to connect to 2 switches that take 8 nodes each, or,
if I can afford it and it exists, a 16-port switch.
I'd put the wood and mainboards in the garage, then take a big fan that
can blow the air out through a pipe, and make a filter that lets air
into the garage while catching dust.
The big advantage of putting the cluster in the garage is that a big
white Canadian/American shepherd will guard it nearly 20 hours a day
there.
(Anyway, I don't need to care about any environmental norm with respect
to radiation when constructing that supercomputer; just walking the dog
in this country means you break the law, and in the meantime my
government has 2 x 450 megawatt (MVA) power lines 12 meters away from
that garage, as they don't need to fix any existing bad situations. In
fact, they're busy building a new swimming pool for hundreds of kids
underneath those 2 x 450 megawatt cables right now.)
So the only questions right now are:
a) which cables do I need to connect QM400 cards?
b) which switches work with QM400 cards?
c) do QM400 cards work node <==> node without a switch in between (just
like the QM500s worked here)?
d) will they work with those Woodcrest mainboards?
Of course I'm also looking for second-hand cables and second-hand
switches; [EMAIL PROTECTED] is the address to email for those who have
some left.
I've got 17 of those cards, so on paper I could move to 16 nodes
(keeping 1 card in reserve).
SOFTWARE for the cluster:
My plan is to take another look at OpenMosix and OpenSSI and to modify
them, even if I lose a factor of 2 in latency to the modification; if
that would give the possibility of a shared-memory supercomputer, then
that's quite awesome and preferred. If latency losses under a factor of
2 are not possible, then I'll again have to work with the SHMEM library.
Programming in MPI/SHMEM is, by the way, a pretty clumsy way to program.
Shared-memory programming is just *so superior* to that in terms of ease
for the programmer, and it also means you can just recompile your code
to work on a quad or 8-processor Opteron without any need to strip out
MPI commands.
Patching OpenSSI or OpenMosix is probably more interesting to do than
continuing an old MPI/SHMEM version.
Migration of shared memory, by the way, is not needed and not preferred
for the approach I use in Diep, so on paper both OpenMosix and OpenSSI
qualify.
If it works, that'll be at least a factor 2.8^7 = 1349 times more
powerful than Deep Blue, and a factor of 2 more powerful than Hydra,
and it's very secure and foolproof - just not sausage-proof.
Hydra managed to get on CNN regularly when playing Adams. Amazingly.
We'll probably hit national TV indirectly by September when selling one
of our products, and I hope to reach some more when this cluster works.
Thanks for any suggestions,
Vincent
----- Original Message ----- From: "pauln" <[EMAIL PROTECTED]>
To: "Vincent Diepeveen" <[EMAIL PROTECTED]>
Cc: "Eray Ozkural" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Wednesday, June 28, 2006 8:39 PM
Subject: Re: [Beowulf] Ultimate cluster distro
This isn't really a distribution-related comment, but in light of
Vincent's points I think it's appropriate. We're running diskless nodes
from a single generic root-fs ramdisk which is dynamically configured
at boot by a cfengine script. Other filesystems (e.g. /usr) are mounted
over NFS. I've found that this combination of pxelinux and cfengine is
extremely powerful for managing clusters - especially ones that tend to
change frequently. paul
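For reference, the PXE side is just a short pxelinux config along these
lines - the kernel and initrd file names below are illustrative:

```
# tftpboot/pxelinux.cfg/default -- names are examples, not our exact setup
DEFAULT diskless
LABEL diskless
    KERNEL vmlinuz-node
    APPEND initrd=initrd-64m.img.gz ramdisk_size=65536 root=/dev/ram0 rw
```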
Vincent Diepeveen wrote:
Let me kick off with a few points; most likely many will enhance this
with more:
a) having the driver for the network card in question compiled into the
kernel - this is by far the hardest part
b) pdsh installed on all machines, and naming of machines in a logical
manner
c) diskless operation on nodes other than the master node, using local
disk only as 'scratch'
d) because a) usually goes wrong, the capability to easily compile a
vanilla kernel inside the distribution
d) is by far the most important.
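For point d), the routine is the usual vanilla-kernel build - the
version number below is just an example, and the install step needs
root:

```shell
# Fetch, configure, and build a vanilla kernel inside the distribution.
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.17.tar.bz2
tar -xjf linux-2.6.17.tar.bz2
cd linux-2.6.17
make menuconfig        # build the NIC driver in (=y), not as a module
make                   # compile kernel and modules
make modules_install install   # requires root
```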
Vincent
----- Original Message ----- From: "Eray Ozkural"
<[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, June 24, 2006 3:06 PM
Subject: [Beowulf] Ultimate cluster distro
I would like to make a small survey here to get a rough idea of every
essential detail in a cluster distro, because I am thinking of writing
an add-on for our Linux distribution to this end.
Best,
--
Eray Ozkural (exa), PhD candidate. Comp. Sci. Dept., Bilkent
University, Ankara
http://www.cs.bilkent.edu.tr/~erayo Malfunct: http://www.malfunct.com
ai-philosophy: http://groups.yahoo.com/group/ai-philosophy
Pardus: www.uludag.org.tr KDE Project: http://www.kde.org
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf