The only security is the one provided by the slave/master whitelists (more dumb-proof than attack-proof, but still useful to keep clusters from talking to each other accidentally).
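For reference, a minimal sketch of what such a slave whitelist looks like on the HDFS side (the property names are stock Hadoop; the file paths are assumptions, and the MapReduce side has an analogous mapred.hosts property for tasktrackers):

```xml
<!-- hdfs-site.xml: only datanodes listed in the include file may join -->
<property>
  <name>dfs.hosts</name>
  <!-- path is an assumption; one IP or hostname per line -->
  <value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.hosts.exclude</value>
</property>
```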
I want to automate the deployment of Hadoop clusters through Glu (from LinkedIn), since we already use it to do single-click deployments: https://github.com/pongasoft/glu

So far, what I want to deploy, configure, and start automatically, without host names or ssh, is:
- hdfs (done, except for that UI glitch)
- mapreduce (done, looks fine)
- hbase (almost done)
- hive (not started)
- sqoop (not started)
- oozie (not started)

hdfs took me a while to figure out since I'd never deployed Hadoop clusters before; mapreduce was easier, and hbase is coming along quickly.

We use a single config file per cluster, which mostly maps IP lists to roles and includes some configuration variables. From there, a script tells Glu which binaries go on which machines, then Glu deploys everything that needs to be deployed in parallel. If a new version of a binary is released, only the machines that do not have the new binaries get redeployed. Adding/removing hdfs/mapreduce slaves is done in a few clicks in the Glu WebUI and takes just a few seconds (12s to deploy 3 machines the last time I measured).

--
View this message in context: http://hadoop.6.n7.nabble.com/IP-based-hadoop-cluster-tp70191p70241.html
Sent from the common-user mailing list archive at Nabble.com.
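The "config file maps IP lists to roles, then a script derives what goes on which machine" step could be sketched roughly like this. Everything here (names, the inline cluster map, the inversion function) is a hypothetical illustration, not the poster's actual script or the real Glu model format:

```python
# Hypothetical sketch of the per-cluster config: role -> list of IPs,
# as described in the post. Real deployments would load this from a file.
CLUSTER = {
    "namenode": ["10.0.0.1"],
    "jobtracker": ["10.0.0.1"],
    "datanode": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "tasktracker": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
}


def plan_for_hosts(cluster):
    """Invert role -> IPs into IP -> roles, i.e. which binaries each
    machine needs. A real script would then feed this plan to Glu."""
    plan = {}
    for role, ips in cluster.items():
        for ip in ips:
            plan.setdefault(ip, []).append(role)
    return plan


if __name__ == "__main__":
    for ip, roles in sorted(plan_for_hosts(CLUSTER).items()):
        print(ip, sorted(roles))
```

Glu itself consumes a model describing what should run where, so a mapping like this is the natural intermediate form before generating that model.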