Tom White wrote:
From the client's point of view, fs.default.name sets the default
filesystem and is used to resolve paths that don't specify a scheme.
You can always use a fully qualified URI to specify the path, e.g.
s3://bucket/a/b or hdfs://nn/a/b. This lets you, for example, take map
inputs from HDFS and write reduce outputs to S3.
For HDFS the setting of fs.default.name in hadoop-site.xml determines
the host and port for the namenode.
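For concreteness, here is a rough sketch of that kind of job setup using
the old org.apache.hadoop.mapred API -- the namenode host, bucket name and
paths are made up, and it assumes S3 credentials (fs.s3.awsAccessKeyId /
fs.s3.awsSecretAccessKey) are already configured:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class HdfsToS3Job {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(HdfsToS3Job.class);
    conf.setJobName("hdfs-in-s3-out");

    // An unqualified path would resolve against fs.default.name;
    // fully qualified URIs override it on a per-path basis.
    FileInputFormat.setInputPaths(conf, new Path("hdfs://namenode:9000/logs/2008"));
    FileOutputFormat.setOutputPath(conf, new Path("s3://my-bucket/analysis/output"));

    // TextInputFormat plus the identity map/reduce gives <LongWritable, Text>
    // output; a real job would set its own mapper, reducer and output types.
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    JobClient.runJob(conf);
  }
}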
Does this help? How are you trying to use S3 by the way?
Yup - I got that far. It looks like with S3 there's no real namenode or
datanode cluster -- S3's own storage is used instead (more or less directly).
That's what my question was about. I like that, if that's the case. Does
that make sense?
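For reference, a pure-S3 setup (no namenode at all) can look roughly like
this -- the bucket and credentials are placeholders, and normally these
properties would live in hadoop-site.xml rather than be set in code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3DefaultFs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // With an s3:// default filesystem there is no namenode/datanode cluster;
    // the S3 block filesystem stores file blocks directly in the bucket.
    conf.set("fs.default.name", "s3://my-bucket");
    conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY");      // placeholder
    conf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY");  // placeholder

    FileSystem fs = FileSystem.get(conf);  // resolves to the S3 filesystem
    System.out.println(fs.exists(new Path("/a/b")));
  }
}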
We will probably be running at least one version of a log writer/map-reducer
on EC2/S3. Basically, we have large volumes of data related to a specific
type of problem that we map-reduce for analysis. We've been playing with Pig
on top of map-reduce as well. Good stuff.
The only gotcha I see: we wrote (extended, really) a SWIG wrapper on top
of the C libhdfs library so we could interface to Python. It looks like the
libhdfs connect logic isn't using URI schemes 100% correctly -- I doubt S3
will work through there. But that looks like an easy fix if that's the case
(I think).
Testing that next ...
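For what it's worth, the scheme-aware resolution the connect logic would
need to do on the Java side looks roughly like this -- a sketch only, not
the wrapper or libhdfs code, with made-up hosts, buckets and credentials:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SchemeResolution {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY");      // placeholder
    conf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY");  // placeholder

    // The URI scheme selects the FileSystem implementation (hdfs://, s3://,
    // file://, ...); with no scheme, fs.default.name is used instead.
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
    FileSystem s3   = FileSystem.get(URI.create("s3://my-bucket/"), conf);

    System.out.println(hdfs.getUri());  // hdfs://namenode:9000
    System.out.println(s3.getUri());    // s3://my-bucket
  }
}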
--
Steve Sapovits
Invite Media - http://www.invitemedia.com