clustering problem
Hi guys,

I am having problems creating a cluster on 2 machines.

Machine configuration:

Master:
OS: Fedora Core 7
hadoop-0.15.2

hadoop-site.xml listing:
  fs.default.name               anaconda:50001
  mapred.job.tracker            anaconda:50002
  dfs.replication               2
  dfs.secondary.info.port       50003
  dfs.info.port                 50004
  mapred.job.tracker.info.port  50005
  tasktracker.http.port         50006

conf/masters:
  localhost

conf/slaves:
  anaconda
  v-desktop

The datanode, namenode, and secondarynamenode seem to be working fine on the master, but on the slave this is not the case.

Slave:
OS: Ubuntu
hadoop-site.xml listing: same as master

In the logs on the slave machine I see this:

2008-03-05 12:15:25,705 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2008-03-05 12:15:25,920 FATAL org.apache.hadoop.dfs.DataNode: Incompatible build versions: namenode BV = Unknown; datanode BV = 607333
2008-03-05 12:15:25,926 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible build versions: namenode BV = Unknown; datanode BV = 607333
    at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:316)
    at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:238)
    at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:206)
    at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1575)
    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1519)
    at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1540)
    at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1711)

Can someone help me with this, please?

Thanks,
Ved
Using Sorted Files For Filtering Input (File Index)
Let's say I have a simple data file with key/value pairs, and the entire file is in ascending sorted order by 'value'. What I want to be able to do is filter the data so that the map function is only invoked with pairs where 'value' is greater than some input value. Does such a feature already exist, or would I need to implement my own RecordReader to do this filtering? Is this the right place to do it in Hadoop's input pipeline?

What I essentially want is a cheap index: by sorting the values ahead of time, you could just do a binary search on the InputSplit until you found the starting value that satisfies the predicate. The RecordReader would then start at this point in the file, read the lines in, and pass the records to map().

Any thoughts?

--
Andy Pavlo
[EMAIL PROTECTED]
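[Nothing like this appears to be built in, but the binary-search seek described above is easy to sketch. A minimal sketch against a plain local file, assuming text lines of the form key<TAB>value with a numeric value field; a custom RecordReader would do the equivalent against its split's stream, and should still check each record it emits against the predicate:]

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: find the byte offset of the first line whose value field
// satisfies "value >= threshold" in a file of "key<TAB>value" lines
// sorted ascending by value.
public class SortedSeek {

    static long valueOf(String line) {
        // Assumes a numeric value in the field after the first tab.
        return Long.parseLong(line.substring(line.indexOf('\t') + 1).trim());
    }

    public static long findStart(RandomAccessFile f, long threshold) throws IOException {
        long lo = 0, hi = f.length();
        while (lo < hi) {
            long mid = (lo + hi) >>> 1;
            f.seek(mid);
            if (mid > 0) f.readLine();   // skip the (possibly partial) current line
            String line = f.readLine();
            if (line == null || valueOf(line) >= threshold) {
                hi = mid;                // first qualifying line starts at or before mid
            } else {
                lo = mid + 1;            // qualifying lines start strictly after mid
            }
        }
        f.seek(lo);
        if (lo > 0) f.readLine();        // advance to the start of the next full line
        return f.getFilePointer();       // offset to begin reading records from
    }
}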
Re: Processing multiple files - need to identify in map
Hi Tarandeep,

The JobConf you get in your configure() method has the info: it is available via the map.input.file parameter (more info here: http://wiki.apache.org/hadoop/TaskExecutionEnvironment).

Yes, you can have multiple input directories. You can use JobConf::addInputPath() to add more input paths before submitting your job; more info here:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#addInputPath(org.apache.hadoop.fs.Path)

Thanks,
Lohit

----- Original Message -----
From: Tarandeep Singh <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, March 4, 2008 5:38:41 PM
Subject: Processing multiple files - need to identify in map

Hi,

I need to identify which file a key came from, in the map phase. Is it possible?

What I have is multiple types of log files in one directory that I need to process for my application. Right now, I am relying on the structure of the log files (e.g. if a line starts with "weblog", the line came from Log File A, or if the number of tab-separated fields in the line is N, then it is Log File B).

Is there a better way to do this?

Is there a way that the Hadoop framework passes me, as a key, the path of the file (right now it is the offset in the file, I guess)?

One more related question - can I set 2 directories as input to my map/reduce program? This is just to avoid copying files from one log directory to another.

thanks,
Taran
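[A minimal mapper sketch of the map.input.file approach, against the 0.15-era mapred API (signatures changed in later releases); the "weblog" branch mirrors Taran's example, everything else is illustrative:]

import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogTypeMapper extends MapReduceBase implements Mapper {

    private String inputFile;

    public void configure(JobConf conf) {
        // Set by the framework for each map task.
        inputFile = conf.get("map.input.file");
    }

    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter) throws IOException {
        // Branch on the source file instead of sniffing the line layout.
        if (inputFile != null && inputFile.indexOf("weblog") != -1) {
            // ... treat value as a Log File A line ...
        } else {
            // ... treat value as a Log File B line ...
        }
    }
}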
Re: Processing multiple files - need to identify in map
More specifically, call jobConf.get("map.input.file") in the configure(JobConf conf) method of your mapper. There are some cases where this won't work, but in general it works fine.

And yes, you can add many input directories: jobConf.addInputPath(...)

On Mar 4, 2008, at 5:54 PM, Ted Dunning wrote:

> Yes. Use the configure method, which is called each time a new file is used in the map. Save the file name in a field of the mapper.
>
> The other alternative is to derive a new InputFormat that remembers the input file name.
>
> On 3/4/08 5:38 PM, "Tarandeep Singh" <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I need to identify which file a key came from, in the map phase. Is it possible?
>>
>> What I have is multiple types of log files in one directory that I need to process for my application. Right now, I am relying on the structure of the log files (e.g. if a line starts with "weblog", the line came from Log File A, or if the number of tab-separated fields in the line is N, then it is Log File B).
>>
>> Is there a better way to do this?
>>
>> Is there a way that the Hadoop framework passes me, as a key, the path of the file (right now it is the offset in the file, I guess)?
>>
>> One more related question - can I set 2 directories as input to my map/reduce program? This is just to avoid copying files from one log directory to another.
>>
>> thanks,
>> Taran

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
Re: Processing multiple files - need to identify in map
Yes. Use the configure method, which is called each time a new file is used in the map. Save the file name in a field of the mapper.

The other alternative is to derive a new InputFormat that remembers the input file name.

On 3/4/08 5:38 PM, "Tarandeep Singh" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I need to identify which file a key came from, in the map phase.
> Is it possible?
>
> What I have is multiple types of log files in one directory that I
> need to process for my application. Right now, I am relying on the
> structure of the log files (e.g. if a line starts with "weblog", the
> line came from Log File A, or if the number of tab-separated fields in
> the line is N, then it is Log File B).
>
> Is there a better way to do this?
>
> Is there a way that the Hadoop framework passes me, as a key, the path
> of the file (right now it is the offset in the file, I guess)?
>
> One more related question - can I set 2 directories as input to my map/
> reduce program? This is just to avoid copying files from one log
> directory to another.
>
> thanks,
> Taran
Re: Processing multiple files - need to identify in map
The Reporter object given to the map() method can get you the InputSplit that is being mapped over. If this is a FileSplit, you can grab the path name from there.

- Aaron

Tarandeep Singh wrote:
> Hi,
>
> I need to identify which file a key came from, in the map phase. Is it possible?
>
> What I have is multiple types of log files in one directory that I need to process for my application. Right now, I am relying on the structure of the log files (e.g. if a line starts with "weblog", the line came from Log File A, or if the number of tab-separated fields in the line is N, then it is Log File B).
>
> Is there a better way to do this?
>
> Is there a way that the Hadoop framework passes me, as a key, the path of the file (right now it is the offset in the file, I guess)?
>
> One more related question - can I set 2 directories as input to my map/reduce program? This is just to avoid copying files from one log directory to another.
>
> thanks,
> Taran
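[A sketch of that approach; it assumes the Reporter in this release exposes getInputSplit() as Aaron describes (it may throw UnsupportedOperationException in contexts without a split), and the helper name is illustrative:]

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.Reporter;

// Sketch: call from within map() to recover the current input file.
public class SplitPath {
    static Path currentFile(Reporter reporter) {
        InputSplit split = reporter.getInputSplit();   // the split being mapped over
        if (split instanceof FileSplit) {
            return ((FileSplit) split).getPath();      // branch on getName() from here
        }
        return null;                                   // not a file-based input format
    }
}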
Processing multiple files - need to identify in map
Hi,

I need to identify which file a key came from, in the map phase. Is it possible?

What I have is multiple types of log files in one directory that I need to process for my application. Right now, I am relying on the structure of the log files (e.g. if a line starts with "weblog", the line came from Log File A, or if the number of tab-separated fields in the line is N, then it is Log File B).

Is there a better way to do this?

Is there a way that the Hadoop framework passes me, as a key, the path of the file (right now it is the offset in the file, I guess)?

One more related question - can I set 2 directories as input to my map/reduce program? This is just to avoid copying files from one log directory to another.

thanks,
Taran
Re: What's the best way to get to a single key?
And this, btw, provides a rationale for having a key in the reducer output.

On 3/4/08 12:53 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> So you should be able to just switch from specifying SequenceFileOutputFormat to MapFileOutputFormat in your jobs and everything should work the same, except you'll have index files that permit random access.
Re: What's the best way to get to a single key?
Xavier Stevens wrote:
> Is there a way to do this when your input data is using SequenceFile compression?

Yes. A MapFile is simply a directory containing two SequenceFiles named "data" and "index". MapFileOutputFormat uses the same compression parameters as SequenceFileOutputFormat, and SequenceFileInputFormat recognizes MapFiles and reads the "data" file. So you should be able to just switch from specifying SequenceFileOutputFormat to MapFileOutputFormat in your jobs and everything should work the same, except you'll have index files that permit random access.

Doug
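[As a sketch of the write-side change Doug describes; the compression call is just an illustration of the shared knobs:]

import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapFileOutputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class MapFileSwitch {
    // Sketch: the only job change is the output format class.
    public static void useMapFileOutput(JobConf conf) {
        // before: conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setOutputFormat(MapFileOutputFormat.class);
        // the same compression parameters apply to both formats:
        SequenceFileOutputFormat.setOutputCompressionType(conf,
            SequenceFile.CompressionType.BLOCK);
    }
}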
Re: configuration access
Arun C Murthy wrote:
> Can you re-check if the right paths (for your config files) are on the CLASSPATH?

That was it. Thanks.

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]
Re: Hadoop / HDFS over WAN
I don't think there are any known deployments of Hadoop over a WAN, and there aren't any WAN-specific tweaks or configuration settings present that I know of. Hadoop apps tend to be data intensive. Any more details on what the configuration is likely to be? Will HDFS itself be across the WAN?

Some example tweaks: if you have high latency and high bandwidth across the WAN, each socket connection might need to use large recv/send buffers for its TCP sockets to mask the latency.

Raghu.

Tom Deckers (tdeckers) wrote:
> How well does HDFS perform over WAN links? Any best practices to take into account?
>
> Thanks!
> Tom.
RE: What's the best way to get to a single key?
Is there a way to do this when your input data is using SequenceFile compression?

Thanks,
-Xavier

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 03, 2008 2:52 PM
To: core-user@hadoop.apache.org
Subject: Re: What's the best way to get to a single key?

Use MapFileOutputFormat to write your data, then call:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

The documentation is pretty sparse, but the intent is that you open a MapFile.Reader for each mapreduce output, and pass the partitioner used, the key, and the value to be read into. A MapFile maintains an index of keys, so the entire file need not be scanned. If you really only need the value of a single key, then you might avoid opening all of the output files. In that case you might use the Partitioner and the MapFile API directly.

Doug

Xavier Stevens wrote:
> I am curious how others might be solving this problem. I want to
> retrieve a record from HDFS based on its key. Are there any methods
> that can shortcut this type of search to avoid parsing all data until
> you find it? Obviously HBase would do this as well, but I wanted to
> know if there is a way to do it using just Map/Reduce and HDFS.
>
> Thanks,
>
> -Xavier
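[To make the javadoc above concrete, a read-side sketch; the output path and Text key/value types are assumptions, and the partitioner must match the one the job used:]

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapFileOutputFormat;
import org.apache.hadoop.mapred.lib.HashPartitioner;

// Sketch: look up one key across a job's MapFile outputs.
public class SingleKeyLookup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        FileSystem fs = FileSystem.get(conf);
        // One MapFile.Reader per reduce output under the job's output dir.
        MapFile.Reader[] readers =
            MapFileOutputFormat.getReaders(fs, new Path("/my/job/output"), conf);
        // Must be the same partitioner the job used, so the right part file is probed.
        HashPartitioner partitioner = new HashPartitioner();
        Text key = new Text("the-key");
        Text value = new Text();
        // Returns null if the key is absent; otherwise fills in 'value'.
        Writable found = MapFileOutputFormat.getEntry(readers, partitioner, key, value);
        System.out.println(found == null ? "not found" : key + "\t" + value);
        for (int i = 0; i < readers.length; i++) {
            readers[i].close();
        }
    }
}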
Re: map/reduce function on xml string
Here's the code. If folks are interested, I can submit it as a patch as well.

Prasan Ary wrote:
> Colin,
> Is it possible that you share some of the code with us?
>
> thx,
> Prasan
>
> Colin Evans <[EMAIL PROTECTED]> wrote:
>> We ended up subclassing TextInputFormat and adding a custom RecordReader that starts and ends record reads on tags. The StreamXmlRecordReader class is a good reference for this.
>>
>> Prasan Ary wrote:
>>> Hi All,
>>> I am writing a Java implementation of my map/reduce function on Hadoop. Input to this is an XML file, and the map function has to process well-formed XML records. So far I have been unable to split the XML file at XML record boundaries to feed into my map function.
>>> Can anybody point me to resources where forcing a file split at a desired boundary is explained?
>>>
>>> thx,
>>> Pra.

package com.metaweb.hadoop.util;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.*;

import java.io.IOException;

/**
 * Reads records that are delimited by a specific begin/end tag.
 */
public class XmlInputFormat extends TextInputFormat {

    public static final String START_TAG_KEY = "xmlinput.start";
    public static final String END_TAG_KEY = "xmlinput.end";

    public void configure(JobConf jobConf) {
        super.configure(jobConf);
    }

    public RecordReader getRecordReader(InputSplit inputSplit, JobConf jobConf,
                                        Reporter reporter) throws IOException {
        return new XmlRecordReader((FileSplit) inputSplit, jobConf);
    }

    public static class XmlRecordReader implements RecordReader {
        private byte[] startTag;
        private byte[] endTag;
        private long start;
        private long end;
        private FSDataInputStream fsin;
        private DataOutputBuffer buffer = new DataOutputBuffer();

        public XmlRecordReader(FileSplit split, JobConf jobConf) throws IOException {
            startTag = jobConf.get("xmlinput.start").getBytes("utf-8");
            endTag = jobConf.get("xmlinput.end").getBytes("utf-8");

            // open the file and seek to the start of the split
            start = split.getStart();
            end = start + split.getLength();
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(jobConf);
            fsin = fs.open(split.getPath());
            fsin.seek(start);
        }

        public boolean next(WritableComparable key, Writable value) throws IOException {
            if (fsin.getPos() < end) {
                if (readUntilMatch(startTag, false)) {
                    try {
                        buffer.write(startTag);
                        if (readUntilMatch(endTag, true)) {
                            ((Text) key).set(Long.toString(fsin.getPos()));
                            ((Text) value).set(buffer.getData(), 0, buffer.getLength());
                            return true;
                        }
                    } finally {
                        buffer.reset();
                    }
                }
            }
            return false;
        }

        public WritableComparable createKey() {
            return new Text();
        }

        public Writable createValue() {
            return new Text();
        }

        public long getPos() throws IOException {
            return fsin.getPos();
        }

        public void close() throws IOException {
            fsin.close();
        }

        public float getProgress() throws IOException {
            return ((float) (fsin.getPos() - start)) / ((float) (end - start));
        }

        private boolean readUntilMatch(byte[] match, boolean withinBlock) throws IOException {
            int i = 0;
            while (true) {
                int b = fsin.read();
                // end of file:
                if (b == -1) return false;
                // save to buffer:
                if (withinBlock) buffer.write(b);
                // check if we're matching:
                if (b == match[i]) {
                    i++;
                    if (i >= match.length) return true;
                } else i = 0;
                // see if we've passed the stop point:
                if (!withinBlock && i == 0 && fsin.getPos() >= end) return false;
            }
        }
    }
}
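[For anyone wiring the posted class into a job, the setup might look roughly like this — a sketch: the <record> tags and input path are placeholders:]

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class XmlJobSetup {
    // Sketch: point a job at the posted XmlInputFormat.
    public static JobConf newConf() {
        JobConf conf = new JobConf(XmlInputFormat.class);
        conf.setInputFormat(XmlInputFormat.class);
        conf.set(XmlInputFormat.START_TAG_KEY, "<record>");   // placeholder start tag
        conf.set(XmlInputFormat.END_TAG_KEY, "</record>");    // placeholder end tag
        conf.addInputPath(new Path("/data/xml"));             // placeholder input dir
        // map() then sees Text/Text pairs: key = byte position after the record,
        // value = one complete <record>...</record> block.
        return conf;
    }
}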
Re: configuration access
On Mar 4, 2008, at 8:34 AM, Steve Sapovits wrote:
> Can someone point me to working examples of config access? I'm trying to use the FileSystem and Configuration classes to get the 'fs.default.name' value. I see that the Configuration object thinks it's loaded the hadoop-default.xml and hadoop-site.xml files, but no matter what I ask it for, I get a default back and not what's configured.

Can you re-check if the right paths (for your config files) are on the CLASSPATH? You should be able to get the configured value via:

String fsName = conf.get("fs.default.name");

Arun

> A working example of this would probably set me straight. I tried explicitly adding resources, hard-coding the full path names of the config files ... same thing.
>
> --
> Steve Sapovits
> Invite Media - http://www.invitemedia.com
> [EMAIL PROTECTED]
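[A minimal standalone check along these lines — a sketch, assuming the directory containing your hadoop-site.xml is on the classpath; the class name is illustrative:]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfCheck {
    public static void main(String[] args) throws Exception {
        // Loads hadoop-default.xml and hadoop-site.xml from the classpath.
        Configuration conf = new Configuration();
        // If this prints the default rather than your configured value,
        // hadoop-site.xml was not found on the classpath.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        FileSystem fs = FileSystem.get(conf);   // uses the value above
        System.out.println("got filesystem: " + fs);
    }
}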
Re: map/reduce function on xml string
Colin,

Is it possible that you share some of the code with us?

thx,
Prasan

Colin Evans <[EMAIL PROTECTED]> wrote:
> We ended up subclassing TextInputFormat and adding a custom RecordReader that starts and ends record reads on tags. The StreamXmlRecordReader class is a good reference for this.
>
> Prasan Ary wrote:
>> Hi All,
>> I am writing a Java implementation of my map/reduce function on Hadoop. Input to this is an XML file, and the map function has to process well-formed XML records. So far I have been unable to split the XML file at XML record boundaries to feed into my map function.
>> Can anybody point me to resources where forcing a file split at a desired boundary is explained?
>>
>> thx,
>> Pra.
configuration access
Can someone point me to working examples of config access? I'm trying to use the FileSystem and Configuration classes to get the 'fs.default.name' value. I see that the Configuration object thinks it's loaded the hadoop-default.xml and hadoop-site.xml files, but no matter what I ask it for, I get a default back and not what's configured.

A working example of this would probably set me straight. I tried explicitly adding resources, hard-coding the full path names of the config files ... same thing.

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]
Re: Could we call hadoop a distributed OS?
It could be called middleware in a broad sense, but I don't think that is a good definition of Hadoop either. It is a collection of distributed tools for large-scale data processing.

On Tue, Mar 4, 2008 at 12:34 PM, wang daming <[EMAIL PROTECTED]> wrote:
> how about middleware?
>
> 2008/3/3, Amar Kamat <[EMAIL PROTECTED]>:
>> Hadoop is not a distributed OS. It requires some OS on which it can be run. It is also not an application: it is a platform for running applications on the grid. There are certain classes of applications (like the ones to do with the web) that can make use of this platform (service) to run data-intensive applications that have inherent parallelism, on the grid. In my view, the appropriate classification would be distributed computing software.
>> Amar
>>
>> On Mon, 3 Mar 2008, Steve Han wrote:
>>> Or is it just a distributed application? How can we learn more about the design ideas of Hadoop? Thanks
Re: Could we call hadoop a distributed OS?
how about middleware?

2008/3/3, Amar Kamat <[EMAIL PROTECTED]>:
> Hadoop is not a distributed OS. It requires some OS on which it can be run. It is also not an application: it is a platform for running applications on the grid. There are certain classes of applications (like the ones to do with the web) that can make use of this platform (service) to run data-intensive applications that have inherent parallelism, on the grid. In my view, the appropriate classification would be distributed computing software.
> Amar
>
> On Mon, 3 Mar 2008, Steve Han wrote:
>> Or is it just a distributed application? How can we learn more about the design ideas of Hadoop? Thanks
Hadoop / HDFS over WAN
How well does HDFS perform over WAN links? Any best practices to take into account? Thanks! Tom.
Re: org.apache.hadoop.dfs.NameNode: java.lang.NullPointerException
Hi Raghu, thx for filing :-) Btw. I sent Dhruba the requested namenode log yesterday. Hopefully it helps.

Cu on the 'net,
Bye - bye,
< André èrbnA >

Raghu Angadi wrote:
> filed https://issues.apache.org/jira/browse/HADOOP-2934
>
> Raghu.

André Martin wrote:
> Hi everyone, the namenode doesn't re-start properly:
>
> 2008-03-02 01:25:25,120 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = se09/141.76.xxx.xxx
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 2008-02-28_11-01-44
> STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/trunk -r 631915; compiled by 'hudson' on Thu Feb 28 11:11:52 UTC 2008
> ************************************************************/
> 2008-03-02 01:25:25,247 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing RPC Metrics with serverName=NameNode, port=8000
> 2008-03-02 01:25:25,254 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: se09.inf.tu-dresden.de/141.76.44.xxx:xxx
> 2008-03-02 01:25:25,257 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-03-02 01:25:25,260 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2008-03-02 01:25:25,358 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=amartin,students
> 2008-03-02 01:25:25,359 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
> 2008-03-02 01:25:25,359 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
> 2008-03-02 01:25:29,887 ERROR org.apache.hadoop.dfs.NameNode: java.lang.NullPointerException
>     at org.apache.hadoop.dfs.FSImage.readINodeUnderConstruction(FSImage.java:950)
>     at org.apache.hadoop.dfs.FSImage.loadFilesUnderConstruction(FSImage.java:919)
>     at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:749)
>     at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:634)
>     at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
>     at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
>     at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:261)
>     at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:242)
>     at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:176)
>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:162)
>     at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:851)
>     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:860)
> 2008-03-02 01:25:29,888 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at se09/141.76.xxx.xxx
> ************************************************************/
>
> Any ideas? Looks like a bug...
>
> Cu on the 'net,
> Bye - bye,
> < André èrbnA >
Re: Namenode fails to re-start after cluster shutdown
OK, that makes sense - thx!

Cu on the 'net,
Bye - bye,
< André èrbnA >

Konstantin Shvachko wrote:
>> Also, the namenode still says: "Upgrade for version -13 has been completed. Upgrade is not finalized." even 15 hours after launching it :-/
>
> You can -finalizeUpgrade if you don't need the previous version anymore.
> http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Upgrade+and+Rollback
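[For reference, finalizing is a one-liner from the command line, per the HDFS user guide linked above; run it only once you are sure you won't need to roll back:]

bin/hadoop dfsadmin -finalizeUpgrade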
quietmode in configuration.java
Hi, I am trying to set up Hadoop for some quick testing, and I've run into a small problem. My current configuration doesn't seem to get fully read, but no errors are thrown. A quick read of the source code turned up the quietmode variable, which seems to be the culprit. I'd like to set quietmode to false, but short of changing the source I can't find a way. Can anyone help? Till
Re: Pipes example wordcount-nopipe.cc failed when reading from input splits
Hi,

Here is some discussion on how to run wordcount-nopipe:
http://www.nabble.com/pipe-application-error-td13840804.html
It probably makes sense for your question.

Thanks,
Amareshwari

11 Nov. wrote:
> I traced into the C++ recordreader code:
>
>     WordCountReader(HadoopPipes::MapContext& context) {
>       std::string filename;
>       HadoopUtils::StringInStream stream(context.getInputSplit());
>       HadoopUtils::deserializeString(filename, stream);
>       struct stat statResult;
>       stat(filename.c_str(), &statResult);
>       bytesTotal = statResult.st_size;
>       bytesRead = 0;
>       cout << filename
>       ...
>
>> hi colleagues,
>> I have set up the single-node cluster to test the pipes examples. wordcount-simple and wordcount-part work just fine, but wordcount-nopipe can't run.
>> Here is my command line:
>>
>> bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input input/ -output out-dir-nopipe1
>>
>> and here is the error message printed on my console:
>>
>> 08/03/03 23:23:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 08/03/03 23:23:06 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 08/03/03 23:23:07 INFO mapred.JobClient: Running job: job_200803032218_0004
>> 08/03/03 23:23:08 INFO mapred.JobClient: map 0% reduce 0%
>> 08/03/03 23:23:11 INFO mapred.JobClient: Task Id : task_200803032218_0004_m_00_0, Status : FAILED
>> java.io.IOException: pipe child exception
>>     at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:138)
>>     at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:83)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1787)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:313)
>>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:335)
>>     at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:112)
>> task_200803032218_0004_m_00_0: Hadoop Pipes Exception: failed to open at /home/hadoop/hadoop-0.15.2-single-cluster/src/examples/pipes/impl/wordcount-nopipe.cc:67 in WordCountReader::WordCountReader(HadoopPipes::MapContext&)
>>
>> Could anybody tell me how to fix this? That will be appreciated. Thanks a lot!