Hi Sungho,
Here is a recipe for how to run multiple nodes on a single server, posted to
this list on Sept. 15:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3c8a898c33-dc4e-418c-adc0-5689d434b...@yahoo-inc.com%3E
For v0.22 and later, the project has been split into three parts: where there was
formerly HADOOP_HOME, there is now HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and
HADOOP_MAPRED_HOME, and in the default configuration each of them has its own
conf/ subdirectory. However, it is acceptable to pile all the contents of
the three conf directories into a single conf directory somewhere else (the
only name conflict is configuration.xsl, which can be shared), set the
environment variable $HADOOP_CONF_DIR to point to it, and pass that value in
with the --config option whenever you launch processes with bin/hadoop or
bin/hdfs.
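As an illustration of that merge, here is a minimal shell sketch. It assumes the three *_HOME variables are set and that each project keeps its files directly in its conf/ directory; the function name merge_conf_dirs is hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch (not an official Hadoop tool): copy the contents of the three
# per-project conf/ directories of a v0.22-style layout into one directory.
# Expects HADOOP_COMMON_HOME, HADOOP_HDFS_HOME and HADOOP_MAPRED_HOME to be set.
merge_conf_dirs() {
  local target="$1" home
  mkdir -p "$target" || return 1
  for home in "$HADOOP_COMMON_HOME" "$HADOOP_HDFS_HOME" "$HADOOP_MAPRED_HOME"; do
    # configuration.xsl exists in more than one conf/ dir; the later copy
    # overwrites the earlier one, which is harmless because it can be shared.
    cp "$home"/conf/* "$target"/ || return 1
  done
}
```

After merging into, say, /srv/hadoop/conf-shared (a hypothetical path), you would export HADOOP_CONF_DIR=/srv/hadoop/conf-shared and pass --config $HADOOP_CONF_DIR as the first argument to bin/hadoop or bin/hdfs.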
Now, the above recipe assumes you want multiple nodes from ONE cluster running
on a single server. I suggest you start with that and get it working, so you
understand the hdfs-site.xml file and how it is used.
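To make the one-cluster case concrete, the sketch below generates one config directory per datanode. Each datanode needs its own storage directory and its own data-transfer, HTTP and IPC ports. As far as I know it also needs its own HADOOP_PID_DIR (and, to keep logs apart, HADOOP_LOG_DIR), because hadoop-daemon.sh names the pid file after the user and the command and refuses to start a second datanode while that pid file points at a live process. The function name, port bases and paths are hypothetical, and the property names (dfs.datanode.data.dir and the three dfs.datanode.*address keys) are my assumption for the 0.21/0.22 line; check them against the hdfs-default.xml of your release.

```shell
#!/usr/bin/env bash
# Sketch: create one config directory per datanode (dn1 .. dnN) so that several
# datanodes of the same cluster can share one server. Each copy gets its own
# storage directory, its own three listening ports, and its own pid/log dirs.
# Property names are assumed 0.21/0.22-era names; port bases and paths are
# hypothetical examples chosen only so the ranges do not overlap.
make_datanode_confs() {
  local base="$1" root="$2" count="$3" i dir
  for ((i = 1; i <= count; i++)); do
    dir="$root/dn$i"
    mkdir -p "$dir" || return 1
    cp "$base"/* "$dir"/ || return 1
    # Overwrites hdfs-site.xml in the copy; a datanode only needs these
    # settings plus the namenode address from core-site.xml.
    cat > "$dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.datanode.data.dir</name><value>$root/data/dn$i</value></property>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:$((51000 + i))</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:$((52000 + i))</value></property>
  <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:$((53000 + i))</value></property>
</configuration>
EOF
    # hadoop-daemon.sh sources hadoop-env.sh from the conf dir; the last
    # assignment wins, so appending overrides any value copied from base.
    {
      echo "export HADOOP_PID_DIR=$root/pids/dn$i"
      echo "export HADOOP_LOG_DIR=$root/logs/dn$i"
    } >> "$dir/hadoop-env.sh"
  done
}
```

Generating ten such directories and then starting only dn1 through dn5 gives the five-datanode setup without touching any config.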
You seem to be asking to run multiple CLUSTERS on a single server. I believe
the same mechanism will work (pointing different node invocations at
different config directories), but you will need to make several more changes
in the $HADOOP_CONF_DIR/hdfs-site.xml files (and core-site.xml, which holds the
namenode address) to create different namenode configurations as well as the
different datanode configurations addressed in the recipe. Please check the
documentation for which parameters to change; at a minimum, each namenode needs
its own ports and its own metadata directory.
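For the multiple-cluster case, a sketch of deriving a second namenode's config might look like this. The function name make_namenode_conf, the ports and the paths are hypothetical, and the property names fs.defaultFS, dfs.namenode.name.dir and dfs.namenode.http-address are my assumption for the 0.21/0.22 line; verify them against core-default.xml and hdfs-default.xml for your release. Note that it overwrites core-site.xml and hdfs-site.xml in the copy, so any other settings in those two base files would have to be merged back by hand.

```shell
#!/usr/bin/env bash
# Sketch: derive a config directory for a second cluster's namenode from a
# base conf dir, giving it its own RPC port, HTTP port and metadata directory.
# Property names are assumed 0.21/0.22-era names; ports and paths are
# hypothetical examples.
make_namenode_conf() {
  local base="$1" target="$2" rpc_port="$3" http_port="$4" name_dir="$5"
  mkdir -p "$target" || return 1
  cp "$base"/* "$target"/ || return 1
  # core-site.xml: where clients and this cluster's datanodes find the namenode.
  cat > "$target/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:${rpc_port}</value></property>
</configuration>
EOF
  # hdfs-site.xml: separate metadata storage and web UI port for this cluster.
  cat > "$target/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>${name_dir}</value></property>
  <property><name>dfs.namenode.http-address</name><value>localhost:${http_port}</value></property>
</configuration>
EOF
}
```

The datanodes belonging to the second cluster would then need this same core-site.xml so that they register with the second namenode rather than the first.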
A couple of comments:
- You probably can't run two namenodes simultaneously on the same server,
unless it has a huge amount of memory and you don't care about performance.
But you can store two different configurations and run them at different
times.
- If the ONLY difference between the two clusters is the number of datanodes, you
actually don't need different namenode configurations. You can just
configure 10 datanodes, and then sometimes run only 5 of them (clearing storage
in between test runs, of course, so it doesn't look like you lost half your
stored blocks!). This is because namenodes have no configuration for how many
datanodes to expect, and unless you restrict which ones are allowed (via
dfs.hosts), a namenode simply accepts registration from any datanode that
initiates communication with it.
- Your statement that you can control the number of datanodes by changing the
conf and restarting is therefore not entirely correct. Each datanode launched
has to be pointed at its own config, but there is no place in the config to
define how many datanodes to launch. (This is partly because running multiple
nodes on a single server is not considered normal for a production environment,
even though it is useful for a test environment.) You may be thinking of the
slaves file, which is used by some launch scripts, but that is a tool to assist
users in launching clusters, not part of the namenode configuration, and, as
you'll see if you read the scripts, it is not really oriented toward launching
multiple nodes on a single server.
If you want launch scripts to help you locally launch different numbers of
nodes with different configs, you'll have to write them yourself, but they're
really easy. They just consist of multiple lines that look like
    $HADOOP_COMMON_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script $HADOOP_HDFS_HOME/bin/hdfs start datanode|namenode [args]
with different values of $HADOOP_CONF_DIR for each line.
The same lines with stop instead of start will give you a well-behaved kill
script.
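A start/stop script along those lines might look like the following sketch. It wraps the hadoop-daemon.sh command above in a loop over per-datanode config directories $NODES_ROOT/dn1 through $NODES_ROOT/dnN; that layout, the function name datanodes and the DRY_RUN switch are hypothetical additions of mine, the last one so you can inspect the commands before running them.

```shell
#!/usr/bin/env bash
# Sketch of a hand-written launch/kill script (not shipped with Hadoop).
# Runs the hadoop-daemon.sh command pattern for datanodes dn1..dnCOUNT, each
# with its own config directory under $NODES_ROOT (a hypothetical layout).
# Requires HADOOP_COMMON_HOME, HADOOP_HDFS_HOME and NODES_ROOT to be set.
# With DRY_RUN=1 the commands are printed instead of executed.
datanodes() {
  local action="$1" count="$2" i
  local -a cmd
  case "$action" in
    start|stop) ;;
    *) echo "usage: datanodes start|stop COUNT" >&2; return 2 ;;
  esac
  for ((i = 1; i <= count; i++)); do
    cmd=("$HADOOP_COMMON_HOME/bin/hadoop-daemon.sh"
         --config "$NODES_ROOT/dn$i"
         --script "$HADOOP_HDFS_HOME/bin/hdfs"
         "$action" datanode)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      # Stop at the first failure so a broken config is noticed immediately.
      "${cmd[@]}" || return 1
    fi
  done
}
```

With this, datanodes start 5 would bring up five datanodes and datanodes stop 5 would stop them. A namenode line is the same command with namenode and the namenode's own config directory, typically started before the datanodes.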
As always, you have to start and stop each node under the appropriate user ID so
that it has the read/write and I/O access permissions it needs.
Hope this helps,
--Matt
On Mar 1, 2011, at 4:19 AM, Sungho Jeon wrote:
> Hi, I'm a graduate student majoring in computer science, specializing in data mining.
> Is it possible to install multiple Hadoop instances on one node?
> I mean, I want to install several Hadoop instances that have different confs.
> Specifically, one Hadoop has 5 datanodes and the other has 10 datanodes.
> Of course, I can control the number of datanodes by changing the conf and restarting.
> But is it possible to install multiple Hadoop instances on one node without changing the conf?
> Thanks