Thanks Tom! Your explanation makes things a lot clearer. I think that renaming 'fs.default.name' to something like 'dfs.namenode.address' would certainly be less confusing, since it would clarify the purpose of each value.
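Just to make that concrete, here's roughly how I'd picture hadoop-site.xml looking with the split. To be clear, 'dfs.namenode.address' is only the proposed name (it doesn't exist today), and the hosts, ports, and bucket below are placeholders:

  <property>
    <name>dfs.namenode.address</name>
    <value>namenode.example.com:9000</value>
    <description>Proposed: namenode host and port, analogous to mapred.job.tracker.</description>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
    <description>Jobtracker host and port.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>s3://my-bucket</value>
    <description>Default filesystem for clients, free to point at S3 once the HDFS daemons no longer depend on it.</description>
  </property>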
-lincoln
--
lincolnritter.com

On Fri, Jul 11, 2008 at 4:21 AM, Tom White <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
> <[EMAIL PROTECTED]> wrote:
>> Thank you, Tom.
>>
>> Forgive me for being dense, but I don't understand your reply:
>>
>
> Sorry! I'll try to explain it better (see below).
>
>> Do you mean that it is possible to use the Hadoop daemons with S3 but
>> the default filesystem must be HDFS?
>
> The HDFS daemons use the value of "fs.default.name" to set the
> namenode host and port, so if you set it to an s3 URI, you can't run
> the HDFS daemons. So in this case you would use the start-mapred.sh
> script instead of start-all.sh.
>
>> If that is the case, can I specify the output filesystem on a
>> per-job basis, and can that be an S3 FS?
>
> Yes, that's exactly how you do it.
>
>> Also, is there a particular reason to not allow S3 as the default FS?
>
> You can allow S3 as the default FS; it's just that then you can't run
> HDFS at all. You would only do this if you don't want to use HDFS at
> all, for example if you were running a MapReduce job which read from
> S3 and wrote to S3.
>
> It might be less confusing if the HDFS daemons didn't use
> fs.default.name to define the namenode host and port. Just like
> mapred.job.tracker defines the host and port for the jobtracker,
> dfs.namenode.address (or similar) could define the namenode. Would
> this be a good change to make?
>
> Tom
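P.S. For the archives, my reading of the "all S3, no HDFS" setup Tom describes is roughly the following hadoop-site.xml (bucket name and credentials are placeholders), after which you'd start only the MapReduce daemons with bin/start-mapred.sh rather than bin/start-all.sh:

  <property>
    <name>fs.default.name</name>
    <value>s3://my-bucket</value>
    <description>Use the S3 filesystem as the default FS; no HDFS daemons are run.</description>
  </property>

  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>

  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>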