Hi,

Our group is trying to set up a prototype for what will eventually become a cluster of ~50 nodes.

Does anyone have experience with a stateless Hadoop cluster setup on CentOS using this method? Are there any caveats to a read-only root filesystem approach? This would save us from having to keep a root volume on every system (whether installed on a USB thumb drive or on a RAID 1 pair of bootable / partitions).

http://citethisbook.net/Red_Hat_Introduction_to_Stateless_Linux.html
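For what it's worth, the read-only root on CentOS/RHEL is toggled through /etc/sysconfig/readonly-root plus the rwtab files, which bind-mount a whitelist of paths onto tmpfs at boot. A minimal sketch, assuming CentOS 6 defaults (please verify against the guide above):

  # /etc/sysconfig/readonly-root
  READONLY=yes

  # /etc/rwtab.d/local  (hypothetical drop-in; paths that must stay writable)
  dirs  /var/run
  empty /tmp

Since /var and /tmp would be separate read-write partitions in the layout below, the drop-in would only matter for stray writable paths directly under /. The usual gotcha with a read-only / is any daemon that expects to write somewhere outside /var and /tmp.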

We would like to keep the OS root filesystem separate from the Hadoop filesystem(s) for maintenance reasons (so we can hot-swap data disks while the system is running).

We were also considering installing the root filesystem on USB flash drives, making it persistent yet separate. However, given the limited write cycles of USB flash drives, we would identify and turn off anything that causes excess writes to the root filesystem, keeping I/O to it to a minimum. In particular, we would store the Hadoop logs on the same filesystem/drive as the directories we specify in dfs.data.dir/dfs.name.dir.
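For the log relocation, the relevant knob is HADOOP_LOG_DIR in hadoop-env.sh; the PID directory is worth moving off the flash drive too. A sketch, with the /mnt/d0 path assumed from the layout below:

  # conf/hadoop-env.sh
  export HADOOP_LOG_DIR=/mnt/d0/hadoop/logs
  export HADOOP_PID_DIR=/mnt/d0/hadoop/pids

Mounting the USB root with noatime should also cut down incidental writes.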

In the end we would have something like this:

USB flash drive (MS-DOS partition table + three ext2/3/4 partitions)
/dev/sda
/dev/sda1    mounted as /       (possibly read-only)
/dev/sda2    mounted as /var    (read-write)
/dev/sda3    mounted as /tmp    (read-write)

Hadoop disks (either no partition table at all, or GPT, since MBR cannot address these 3TB disks; see the formatting sketch after this list)
/dev/sdb    /mnt/d0
/dev/sdc    /mnt/d1
/dev/sdd    /mnt/d2
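Formatting the 3TB drives would look roughly like this for either option (device names and ext4 assumed from the list above; -m 0 just drops the 5% root-reserved blocks, which is wasted space on pure data disks):

  # Option A: filesystem directly on the raw device, no partition table
  mkfs.ext4 -m 0 -L d0 /dev/sdb

  # Option B: GPT label with a single whole-disk partition
  parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 100%
  mkfs.ext4 -m 0 -L d0 /dev/sdb1

Labels (or UUIDs) also make the fstab below independent of sdX ordering, which can shuffle across reboots.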

/mnt/d0 would contain all Hadoop logs.

Hadoop configuration files would still reside on /.
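Put together, the fstab would be something like this (a sketch only; the device names, ext4, and the whole-disk option are assumptions, and LABEL=/UUID= entries would be safer than raw /dev/sdX names):

  /dev/sda1   /         ext4   ro,noatime         1 1
  /dev/sda2   /var      ext4   defaults,noatime   1 2
  /dev/sda3   /tmp      ext4   defaults,noatime   1 2
  /dev/sdb    /mnt/d0   ext4   defaults,noatime   0 0
  /dev/sdc    /mnt/d1   ext4   defaults,noatime   0 0
  /dev/sdd    /mnt/d2   ext4   defaults,noatime   0 0

dfs.data.dir in hdfs-site.xml would then point at something like /mnt/d0/dfs/data,/mnt/d1/dfs/data,/mnt/d2/dfs/data, with the Hadoop logs under /mnt/d0 as above.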


Any issues with such a setup?  Are there better ways of achieving this?
