Well, a cluster wouldn't really work. This is a distributed computing
environment more akin to something like seti@home or folding@home. Each
computer boots, connects to the internet, grabs a workload, and starts
chewing on it. When it's done, it connects again, uploads the results,
downloads another workload, and continues. And yes, there is a proxy
that I will insert in between. But that is beside the point. It's not
that I'm taking one huge load and breaking it up; it's already broken
into small chunks when I get it. I just want a simple setup where the
machine boots and loads the program. The program then reads the config
file it gets pointed at, reads its data directory, and picks up where it
left off. If it doesn't have any data, or the data is done, it connects
to the internet (through the proxy) for a new workload.
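Roughly, the lifecycle per machine looks like this (just a sketch; the
program name, config path, flags, and proxy URL below are made-up
placeholders, not the real client's interface):

  #!/bin/sh
  # Sketch of the worker lifecycle described above (placeholder names).
  CONF=/etc/worker.conf
  DATADIR=/home/worker/.data

  while true; do
      if [ -z "$(ls -A "$DATADIR" 2>/dev/null)" ]; then
          # nothing on hand (or the last chunk is finished):
          # fetch a new workload through the proxy
          worker --config "$CONF" --fetch http://proxy.local/workunit
      fi
      worker --config "$CONF" --process "$DATADIR"   # chew on it
      worker --config "$CONF" --upload               # send results back
  done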
I know in the olden days, say about 20 years ago, this wouldn't be an
issue. Since every machine only reads from and writes to its own data
set and nothing else, a simple NFS boot would be fine. There's not even
a syslog to worry about, or if there is, then the syslog isn't
/var/log/messages (or /var/log/syslog on some distros) but
/var/log/syslog/<hostname>.
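(For what it's worth, routing per-host logs like that is easy enough on
a central log server; something along these lines in rsyslog's legacy
syntax, with a made-up template name and path:)

  # receive remote syslog over UDP and file it per host
  $ModLoad imudp
  $UDPServerRun 514
  $template PerHostLog,"/var/log/syslog/%HOSTNAME%/messages"
  *.* ?PerHostLog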
Here's the basics of what I had in mind for this:
1) PXE boot the kernel, mount root as NFS root
2) init starts TestFolders (to make sure each worker has its own folder)
3) Start the worker program (rough systemd sketch of 2 and 3 below)
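Something like this pair of systemd units is what I'm picturing for
steps 2 and 3 (unit names, script path, and the worker's command line
are placeholders):

  # testfolders.service (placeholder name)
  [Unit]
  Description=Set up per-host worker data directory
  After=network-online.target remote-fs.target
  Wants=network-online.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/local/sbin/testfolders.sh

  [Install]
  WantedBy=multi-user.target

  # worker.service (placeholder name)
  [Unit]
  Description=Distributed computing worker
  Requires=testfolders.service
  After=testfolders.service

  [Service]
  ExecStart=/usr/local/bin/worker --config /etc/worker.conf
  Restart=on-failure

  [Install]
  WantedBy=multi-user.target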
The problem, of course, is that for programs like this, each instance
will be looking for the same folder. I can't have worker1 store its
data in /home/worker/.data, because worker2 wants ITS data in
/home/worker/.data. And a ramfs wouldn't work, because we need to keep
track of work between reboots (rare, but they do occasionally happen).
So I had the TestFolders script in my head. Basically,
/home/worker/.data would be a symlink to /tmp/workerdata.
/tmp/workerdata would in turn be a symlink to
/home/worker/data.<hostname>. That way each machine THINKS it's got its
own /home/worker/.data but no one is stepping on anyone else's toes.
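The script itself would only need to be a few lines, something like
this (untested sketch, assuming /tmp is a tmpfs on each client):

  #!/bin/sh
  # TestFolders sketch: give each host its own data directory behind
  # the shared /home/worker/.data symlink.
  HOST="$(hostname -s)"
  DATADIR="/home/worker/data.${HOST}"

  mkdir -p "$DATADIR"

  # per-host link lives in tmpfs, so nothing host-specific is written
  # into the shared NFS root at boot time
  ln -sfn "$DATADIR" /tmp/workerdata

  # the shared link in the NFS root only has to be created once, by hand:
  #   ln -s /tmp/workerdata /home/worker/.data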
That's why I had thought of NFS/PXE booting. If there's just no way to
do this outside of Puppet or something like that, then so be it. But I'd
think it would not be that hard to set up with just DHCP and iPXE.
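On the DHCP side, something along these lines in ISC dhcpd is what I
had in mind (server IP and filenames are made up):

  # chainload iPXE over TFTP, then hand iPXE its script over HTTP
  next-server 192.168.1.10;
  if exists user-class and option user-class = "iPXE" {
      filename "http://192.168.1.10/boot.ipxe";
  } else {
      filename "undionly.kpxe";
  }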
Speaking of iPXE, anyone got a good example boot script for it?
Something that will display a menu, then attempt to load the kernel
(and initrd if necessary) from the TFTP server and execute the kernel
with the options provided, which change based on which menu option is
chosen? I've looked at some example scripts, but everything seems to
revolve around iSCSI booting. I don't want an iSCSI boot, just a simple
boot from an NFS folder.
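Roughly the shape of thing I'm after, if it helps (server addresses,
paths, and kernel options are made up, and I haven't tested this):

  #!ipxe
  # menu that boots an NFS-root kernel fetched from the TFTP server
  set base tftp://192.168.1.10
  set nfsserver 192.168.1.10

  :start
  menu Diskless worker boot
  item worker   Boot worker (NFS root)
  item local    Boot from local disk
  item shell    Drop to iPXE shell
  choose --default worker --timeout 5000 target && goto ${target} || goto start

  :worker
  kernel ${base}/vmlinuz root=/dev/nfs nfsroot=${nfsserver}:/srv/nfsroot ip=dhcp rw
  initrd ${base}/initrd.img
  boot || goto start

  :local
  sanboot --no-describe --drive 0x80 || goto start

  :shell
  shell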
On 12/14/2020 4:04 PM, Daniel Fussell wrote:
On 12/11/20 6:46 PM, Dan Egli wrote:
On 12/11/2020 3:01 PM, Daniel Fussell wrote:
PXE booting is usually done to install a base image to a machine. If
you want to PXE boot and run diskless clients, you may have to shuffle
some things into the initrd and change systemd service dependencies to
start the network before mounting the root filesystem and chrooting
to it.
As said before, to have a common filesystem/image with machine-specific
customization (whether made by systemd, admin, or user), you are going
to have to get clever with overlayfs or some combination of small NFS or
local volumes mounted into the tree. You will have to be careful that
processes needing files in the customized mount areas are not run before
the overlay/mounts are ready, else the processes may continue to use the
open file descriptors from the original volume.
Who said anything about machine specific? I'm looking at the idea of
having 25-50 machines, identical hardware, identical software,
identical uses. The only difference between them would be 1) hostname,
2) IP, and 3) the specific SET of data being processed.
This sounds more like a compute cluster or a render farm than your
run-of-the-mill home network; clusters are easier to PXE boot and run
from a common image.
I'd further say that if you are splitting up data sets for machines to
chew on, then you should probably look at cluster managers like Moab,
Maui, PBS, Torque, etc. If you are CPU bound, store all of your data
sets on an NFS export on a central server and tell the processing nodes
which files to work with in the job scheduler; there's no reason to
create separate home directories for each processing node. The results
files can be written to the NFS, either in the same directory, or in
separate directories as you desire. Those things are determined by the
job scheduling config options, not the boot-time config.
If you are IO bound (especially network IO bound), I understand various
cluster managers will distribute the necessary files in each batch job
to each node, then start the batch, and write the results out to the
destination of your choosing. I've heard of some cluster managers that
can split the dataset to be stored on each node's local storage, and if
a process needs data stored on another node it will migrate the entire
process to that needed node because it's sometimes cheaper to move the
process' memory and use local storage than to move entire data sets
across a network.
;-Daniel
--
Dan Egli
From my Test Server
/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/