----- Original Message ----- From: "Brian Dobbins" <[EMAIL PROTECTED]>
To: "Vincent Diepeveen" <[EMAIL PROTECTED]>
Cc: "pauln" <[EMAIL PROTECTED]>; "Eray Ozkural" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Saturday, June 03, 2006 11:04 AM
Subject: Re: [Beowulf] Building my own highend cluster


Hi Vincent (and others),

>  I just wanted to add my own two cents after having fairly recently
> [snip]

Thanks, I'll have a look at it!

Of course I'd prefer to just put in a CD-ROM, hit enter, and then connect the
cables.

But really, when you guys talk about cfengine, I have no clue what universe you're talking about.

If I boot a machine without a hard drive, basically the machine says: "F you, error! Press enter to reboot"

OK, please let's start there. What do I do after getting that message?

Which key do I hit?

> recalled the relative complexity of creating diskless nodes 'by hand' a
> few years back and subsequently finding the wonderful simplicity of
> tools such as Warewulf (or Rocks).  So, in the interest of providing
> more information to the discussion at hand, here's a bit more detail and
> other assorted thoughts:

So I put a Warewulf CD-ROM in the master node, press enter, set all the
'diskless nodes' in the BIOS to "boot over network", and it all works fine?

By the way, does that 'boot over network' mean I need a 16-port hub for 100 Mbit and
have to connect all the machines to 100 Mbit Ethernet in addition to the Quadrics network?

About Warewulf, one small problem: how do I get it to boot together with OpenSSI and the elan3 drivers?

Now don't tell me it's based upon open-BS; learning Linux when Linus started releasing
it at the start of the '90s was already hard enough for me :)

[From pauln]
> .. my apologies in advance:
> http://www.psc.edu/~pauln/Diskless_Boot_Howto.html

>  While I think cfengine and custom scripts give a ton of flexibility,
> I've found it much easier on our diskless clusters to use the Warewulf
> software ( http://www.warewulf-cluster.org/ ).  It handles a lot of the
> behind-the-scenes dirty work for you (i.e., making the RAM disks/tmpfs,
> configuring PXE & DHCP, etc.), and the people on the mailing list tend to
> be quick to respond to troubles with effective solutions.  Also, it's
> actively supported by other people and it just makes life a lot easier,
> in my opinion.  It isn't hard at all to tweak, either, and I'd happily
> go into more detail if you wish, but I'd really recommend a quick look
> through the website as well, just to get a rough idea of the process.

>  Secondly, though I haven't used it myself, I recently spoke with a
> friend who was very knowledgeable about Rocks, which I'm told also
> has a diskless mode.  Here's the link for that:
> ( http://www.rocksclusters.org/ )

Programming in MPI/SHMEM is, by the way, a pretty clumsy way to program.

>  If ease-of-use and a shared-memory style are more important to you than
> performance, you might be interested in checking out the "Cluster
> OpenMP" developments in the Intel compilers.

OpenMP doesn't enter into it here, of course.

No, no: plain shared-memory programming is way easier.

Just allocate some shared memory in Linux with shmget, attach it with shmat, and you have your shared memory.

That's basically how Diep works.

If I go add all kinds of fancy MPI calls to that, it of course first slows down by a factor of 2 or so
on a single processor.

It's much easier to just keep using what I've got: start n processes on n cores, and use shared memory
to divide the memory into segments.

The assumption in Diep is that the process that first allocates a shared memory segment, and also clears it (or initializes it, whatever you want to call it), is the processor at which that memory physically gets allocated.

If that principle is followed, then Diep parallelizes fine, even with pretty bad processor-to-processor latencies.

The luck I've got in Diep is that it has the most chess knowledge in its evaluation function of all chess programs in the world. That's a result of me having been dogfood for world-class players over the years, some of them in the world top 10 even (and I actually managed to draw a world top-6 player once myself in an official major-league game). You learn the game quickly that way :)

So needing those 64 bytes from a remote node doesn't happen too frequently in Diep, and with 4 cores on a dual Opteron
the odds of the entry being on a remote memory node are of course well below 50%.

An example of access to remote memory is the hash table lookup:

   unsigned int procnr, hindex;

   procnr = ((((unsigned int)(hashpos.lo & 0x000000000000ffff)) * nprocesses) >> 16);
   hindex = (unsigned int)((((hashpos.lo >> 16) & 0x00000000ffffffff) * abmod) >> 32);
   hentry = &(globaltrans[procnr][hindex]);

So basically there exists:
  HashEntry *globaltrans[MAXPROCESSORS];

From the remote processors I simply attach the shared memory into that array with shmget/shmat.
Then the lookup happens.

This is of course a lot simpler than OpenMP, not to mention MPI.

Simplistically, this is also how you program a shared-memory machine such as a quad Opteron or a quad Xeon.

This is how the commercial version of the software looks too, of course.

As you see, I also avoid a slow modulo instruction or two in the code.
Average coders would write something like:

   procnr = ((unsigned int)hashpos.lo) % ((unsigned int)nprocesses);
   hindex = (unsigned int)((hashpos.lo >> 16) % abmod);

Modulo and division are BAD on the processor. Very, very slow.

Though not nearly as slow as an MPI call.

>  This is mostly an aside, but why would you need to strip MPI commands
> to run on a 4- or 8-processor system?

The basic point is: most scientists first slow down their program by a factor of 20 to get MPI in,
and then simply throw a factor of 1000 in hardware at it.

I can't afford that loss on a single-mainboard machine. This software is written, quite optimized, to run optimally on a single-mainboard
machine. No slowdowns.

So if I add MPI calls, that slows me down.

If I move from my dual-core dual Opteron to a 16-node cluster using MPI calls, my first priority is to be faster than something very well
optimized for a single-mainboard machine.

THAT IS NOT EASY.

> [...] matter.  I agree shared memory methods are easier to program, but I [...]

It's not about stripping.

We're talking about 2.2 MB of optimized C code that I would ADD MPI calls to, with all the bugs you get from that and that then need to be fixed. Bugfixing that takes years.

Vincent

>  Finally, going back to the beginning of the discussion, I'd just caution
> you about putting motherboards on a slab of wood in a garage.  The
> filter might keep dust out of the garage, but other things always seem
> to manage to get into garages, and lots of creepy-crawly things love
> warmth and light - two things your system is bound to give off.  :)

Bugs :)

Thanks,
Vincent

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
