Hi Ferdy,

I'm not aware of anyone running this way in production, but for test purposes it is often useful to run two DataNodes on a single physical server. It works fine; you just need to give the two services different HADOOP_CONF_DIR values, with modified port numbers and storage directories. I've previously posted recipes for those configurations, but the spam filter is bouncing my messages containing the link, so go to the Apache list archive at http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/ and browse to my email of Thu, 16 Sep 2010 00:45:23 GMT for a discussion of the details.
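As a rough sketch of what the second DataNode's config looks like (these are the 0.20-era property names; the ports, hostnames, and paths here are only placeholders I picked for illustration, so substitute your own), you override the three listening addresses and point dfs.data.dir at disks the first DataNode does not use, in the second conf dir's hdfs-site.xml:

    <!-- conf.dn2/hdfs-site.xml: overrides for the second DataNode -->
    <property>
      <name>dfs.data.dir</name>
      <!-- disks NOT in the first DataNode's dfs.data.dir -->
      <value>/disk3/dfs/data,/disk4/dfs/data</value>
    </property>
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50011</value>   <!-- default is 50010 -->
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50076</value>   <!-- default is 50075 -->
    </property>
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>0.0.0.0:50021</value>   <!-- default is 50020 -->
    </property>

Then start the second instance against its own conf dir:

    $ bin/hadoop-daemon.sh --config /path/to/conf.dn2 start datanode

You will probably also want distinct HADOOP_LOG_DIR and HADOOP_PID_DIR settings in that conf dir's hadoop-env.sh, otherwise the two daemons' pid and log files will collide.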
If you give the two DNs control of separate subsets of disks, it will support your scenario below.

--Matt

On May 9, 2011, at 4:24 AM, Ferdy Galema wrote:

> Is it possible to enforce a replication of 2 for a single node, so that replicas are spread out over disks? Currently, with more replicas than nodes, this results in "under-replicated" blocks. I understand that normally the best way to replicate is to span multiple machines for availability purposes. However, is there a way around this?