Re: Hadoop Pipes Error

Steve Loughran Thu, 31 Mar 2011 02:55:02 -0700

On 31/03/11 07:53, Adarsh Sharma wrote:

Thanks Amareshwari,


Here is the posting:
The nopipe example needs more documentation. It assumes that it is run
with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/
WordCountInputFormat.java, which has a very specific input split
format. By running with a TextInputFormat, it will send binary bytes as
the input split and won't work right. The nopipe example should
probably be recoded to use libhdfs too, but that is more complicated
to get running as a unit test. Also note that since the C++ example is
using local file reads, it will only work on a cluster if you have nfs
or something working across the cluster.

Please correct me if I'm wrong.

I need to run it with TextInputFormat.

If possible, please explain the above post more clearly.



Here goes.

1.
> The nopipe example needs more documentation. It assumes that it is run
> with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/
> WordCountInputFormat.java, which has a very specific input split
> format. By running with a TextInputFormat, it will send binary bytes as
> the input split and won't work right.

The input for the pipe is the content generated by
src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java

This is covered here.
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v1.0

I would recommend following the tutorial here, or either of the books "Hadoop: The Definitive Guide" or "Hadoop in Action". Both authors earn their money by explaining how to use Hadoop, which is why both books are good explanations of it.


2.
> The nopipe example should
> probably be recoded to use libhdfs too, but that is more complicated
> to get running as a unit test.

Ignore that - it's irrelevant for your problem, as Owen is discussing automated testing.


3.

> Also note that since the C++ example is
> using local file reads, it will only work on a cluster if you have nfs
> or something working across the cluster.

Unless your cluster has a shared filesystem at the OS level, it won't work. Either have a shared filesystem like NFS, or run it on a single machine.
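
To make the local-read point concrete, here is a rough sketch of what "local file reads" means in practice. It is not the actual nopipe source and uses no Hadoop headers; countWordsInLocalFile is a hypothetical helper invented for illustration. It assumes the reader has been handed a plain path and opens it with the C library, so it only sees files on the machine where the task happens to run.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Hadoop Pipes): counts whitespace-separated
// words in a file on the LOCAL filesystem of whichever machine runs this code.
// Returns -1 if the path cannot be opened on this machine.
long countWordsInLocalFile(const std::string& localPath) {
    std::FILE* file = std::fopen(localPath.c_str(), "r");
    if (file == nullptr) {
        // On a cluster node without the file (no NFS or similar), we end up here.
        return -1;
    }
    long words = 0;
    bool inWord = false;
    int c;
    while ((c = std::fgetc(file)) != EOF) {
        bool isSpace = (c == ' ' || c == '\n' || c == '\t' || c == '\r');
        if (!isSpace && !inWord) {
            ++words;  // first character of a new word
        }
        inWord = !isSpace;
    }
    std::fclose(file);
    return words;
}
```

A task scheduled on a node where that path does not exist gets -1 here, which is why the example needs either a shared mount visible at the same path on every node or a single-machine setup.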


-Steve
