Tom White wrote:

From the client's point of view fs.default.name sets the default
filesystem, and is used to resolve paths that don't specify a
protocol. You can always use a fully qualified URI to specify the path
e.g. s3://bucket/a/b or hdfs://nn/a/b. This allows you to, for example,
take map inputs from HDFS and write reduce outputs to S3.
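
For example, a job driver along these lines (the host, port, bucket and
paths are made up, and the mapper/reducer are left as the identity
defaults):

    // Sketch only: host, port, bucket and paths are placeholders.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HdfsInS3Out {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(HdfsInS3Out.class);
        conf.setJobName("hdfs-in-s3-out");

        // Fully qualified URIs override fs.default.name, so the input
        // and output of the same job can live on different filesystems.
        FileInputFormat.setInputPaths(conf,
            new Path("hdfs://nn:9000/logs/in"));
        FileOutputFormat.setOutputPath(conf,
            new Path("s3://mybucket/logs/out"));

        JobClient.runJob(conf);
      }
    }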

For HDFS the setting of fs.default.name in hadoop-site.xml determines
the host and port for the namenode.
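
A minimal hadoop-site.xml fragment would look something like this (the
host name and port are just placeholders):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000</value>
      </property>
    </configuration>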

Does this help? How are you trying to use S3 by the way?

Yup - I got that far.  It looks like with S3 there is no real namenode or
datanode cluster -- S3's own distribution is used instead (sort of
directly).  That's where my question was.  I like it if that's the case.
Does that make sense?
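
If that's right, you'd point the default filesystem straight at a bucket,
something like the fragment below -- the bucket name and keys are
placeholders, and I'm going from memory on the exact credential property
names:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>s3://mybucket</value>
      </property>
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>YOUR_AWS_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
      </property>
    </configuration>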

We will probably be running at least one version of a log writer/map-reducer
on EC2/S3: basically, large volumes of data related to a specific type of
problem that we map-reduce for analysis.  We've been playing with Pig on top
of map-reduce as well.  Good stuff.

The only gotcha I see: we wrote (extended, really) a SWIG wrapper on top of
the C libhdfs library so we could interface to Python.  It looks like the
libhdfs connect logic isn't using the URI schemes 100% correctly -- I doubt
S3 will work through there.  But that looks like an easy fix if that's the
case (I think).  Testing that next ...
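
For reference, the connect entry point in libhdfs takes a host and port
rather than a URI, which is roughly where the scheme handling comes in.
A minimal sketch (going from memory on hdfs.h; "default" with port 0 is
supposed to mean "use whatever fs.default.name says"):

    #include <stdio.h>
    #include "hdfs.h"   /* libhdfs header */

    int main(void) {
        /* "default" + port 0 asks for the filesystem named by
           fs.default.name; there is no slot here for a URI scheme
           like s3://, so everything hinges on how the connect logic
           interprets that setting. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (fs == NULL) {
            fprintf(stderr, "hdfsConnect failed\n");
            return 1;
        }
        /* ... hdfsOpenFile / hdfsRead / etc. ... */
        hdfsDisconnect(fs);
        return 0;
    }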

--
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]
