Thanks Tom!

Your explanation makes things a lot clearer. I think having the HDFS
daemons use something like 'dfs.namenode.address' instead of
'fs.default.name' would certainly be less confusing, since it would
make the purpose of each setting clear.

-lincoln

--
lincolnritter.com



On Fri, Jul 11, 2008 at 4:21 AM, Tom White <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
> <[EMAIL PROTECTED]> wrote:
>> Thank you, Tom.
>>
>> Forgive me for being dense, but I don't understand your reply:
>>
>
> Sorry! I'll try to explain it better (see below).
>
>>
>> Do you mean that it is possible to use the Hadoop daemons with S3 but
>> the default filesystem must be HDFS?
>
> The HDFS daemons use the value of "fs.default.name" to set the
> namenode host and port, so if you set it to an S3 URI you can't run
> the HDFS daemons. In that case you would use the start-mapred.sh
> script instead of start-all.sh, so that only the MapReduce daemons
> are started.
>
>> If that is the case, can I
>> specify the output filesystem on a per-job basis and can that be an S3
>> FS?
>
> Yes, that's exactly how you do it.
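>
> Untested, but the per-job version looks roughly like this with an
> old-style driver: input comes from the default filesystem (HDFS) as
> usual, and the output path is just a fully-qualified s3 URI (bucket
> and path names are placeholders, and it assumes your AWS credentials,
> fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, are set in your
> config):
>
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.mapred.FileInputFormat;
>   import org.apache.hadoop.mapred.FileOutputFormat;
>   import org.apache.hadoop.mapred.JobClient;
>   import org.apache.hadoop.mapred.JobConf;
>   import org.apache.hadoop.mapred.lib.IdentityMapper;
>   import org.apache.hadoop.mapred.lib.IdentityReducer;
>
>   public class S3OutputJob {
>     public static void main(String[] args) throws Exception {
>       JobConf conf = new JobConf(S3OutputJob.class);
>       conf.setJobName("s3-output-example");
>
>       // Input is read from the default filesystem (HDFS here).
>       FileInputFormat.setInputPaths(conf, new Path("/user/lincoln/input"));
>
>       // Output goes to S3 because the path is a fully-qualified s3 URI.
>       FileOutputFormat.setOutputPath(conf, new Path("s3://my-bucket/output"));
>
>       // Pass-through mapper and reducer, just to keep the example small.
>       conf.setMapperClass(IdentityMapper.class);
>       conf.setReducerClass(IdentityReducer.class);
>
>       JobClient.runJob(conf);
>     }
>   }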
>
>>
>> Also, is there a particular reason to not allow S3 as the default FS?
>
> You can make S3 the default FS, it's just that you then can't run
> HDFS. You would only do this if you don't want to use HDFS at all,
> for example if you were running a MapReduce job that reads from S3
> and writes to S3.
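>
> Untested, but here is a quick way to see what "default FS" means in
> practice. With fs.default.name set to an s3 URI in hadoop-site.xml
> (the bucket name below is made up, and your AWS credentials need to
> be configured), relative paths resolve against S3 rather than HDFS:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class DefaultFsCheck {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = new Configuration();
>       // FileSystem.get() returns whatever fs.default.name points at,
>       // e.g. the S3 filesystem for s3://my-bucket.
>       FileSystem fs = FileSystem.get(conf);
>       System.out.println("Default FS: " + fs.getUri());
>       // A relative path like "output" is resolved against that
>       // default filesystem, so no HDFS is involved.
>       System.out.println("Resolved:   " + new Path("output").makeQualified(fs));
>     }
>   }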
>
> It might be less confusing if the HDFS daemons didn't use
> fs.default.name to define the namenode host and port. Just like
> mapred.job.tracker defines the host and port for the jobtracker,
> dfs.namenode.address (or similar) could define the namenode. Would
> this be a good change to make?
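>
> Just to illustrate the parallel (host names and ports below are made
> up, and dfs.namenode.address is only a suggestion, not an existing
> property):
>
>   import org.apache.hadoop.conf.Configuration;
>
>   public class ConfigSketch {
>     public static void main(String[] args) {
>       Configuration conf = new Configuration();
>       // Today fs.default.name doubles as "default filesystem" and
>       // "namenode address", which is where the confusion comes from.
>       conf.set("fs.default.name", "hdfs://namenode-host:8020");
>       // The jobtracker already has a dedicated property.
>       conf.set("mapred.job.tracker", "jobtracker-host:8021");
>       // Proposed: a dedicated property for the namenode, leaving
>       // fs.default.name free to point at S3 (or anything else).
>       conf.set("dfs.namenode.address", "namenode-host:8020");
>       System.out.println(conf.get("fs.default.name"));
>     }
>   }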
>
> Tom
>
