By "not exactly small", do you mean each line is long, or that there
are many records?

Well, not small in the sense that even if I could get my boss to allow me to
give you the data, transferring it might be painful. (E.g. the job that
aborted had about 12M lines with ~2.6GB of data => the lines are not really
long, but longer than 80 chars.)

Ah, I see. Would it be possible to run the Java sort example over your data? It would be helpful to verify that this is not specific to streaming.

${hadoop} jar hadoop-0.17-examples.jar sort -m <num maps> \
  -r 88 \
  -inFormat org.apache.hadoop.mapred.TextInputFormat \
  -outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
  -outKey org.apache.hadoop.io.LongWritable \
  -outValue org.apache.hadoop.io.Text \
  <input dir> <output dir (ignored)>

This should be close to streaming with cat as the mapper.
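For comparison, the streaming job with cat as the mapper would look roughly like the following. (The streaming jar path and the exact option names below are a guess for 0.17; adjust them to your install.)

```shell
# Hypothetical sketch of the equivalent streaming invocation.
# The jar location under contrib/ is an assumption for 0.17.
${hadoop} jar contrib/streaming/hadoop-0.17.0-streaming.jar \
  -input <input dir> \
  -output <output dir> \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -numReduceTasks 88
```

Both jobs exercise the same map-side sort path, so if the Java sort also fails, streaming is likely not the culprit.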

util.QuickSort is only used on the map side, so this shouldn't have
anything to do with the reduce. Is it always and only the *last* map
task that fails? If I sent you a patch that would print a trace with
the partitions, would you mind running it? Do you have any other
settings that differ from the defaults? -C

Nope, although sometimes it happens earlier.

Is it always the same splits when you re-run your job? Though distributing the full dataset may not be feasible, if there are splits that fail consistently then we might be able to work from that.

If you tell me how to apply it, I'm happy to. (I'm not the biggest Java
hotshot on this planet; I'm just using the provided 0.17.0 jars. I guess I
would have to patch the source and run ant. On all nodes or just the
control node?)

Unfortunately, it would need to be deployed to all the TaskTrackers, and it would be pretty invasive (i.e. I was planning on logging all the offsets from the sort as the stack unwinds from the exception). I'll test something and send it to you, and if it's not too much trouble you can try it.
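For applying such a patch, the steps would be roughly as follows. (This is a sketch; the patch file name is a placeholder, and the exact jar name produced by the build may differ on your version.)

```shell
# Run from the top of an unpacked hadoop-0.17.0 source tree.
# "sort-trace.patch" is a placeholder for whatever patch file you receive.
patch -p0 < sort-trace.patch

# Rebuild the core jar with ant.
ant jar

# The rebuilt jar lands under build/; it must replace the stock
# hadoop core jar on every TaskTracker node, followed by a restart
# of the daemons, since the patched code runs in the map tasks.
```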

My hadoop-site.xml:
[snip]

Nothing suspect, there. -C
