By "not exactly small", do you mean each line is long, or that there
are many records?

Well, not small in the sense that even if I could get my boss to allow me to
give you the data, transferring it might be painful. (E.g. the job that
aborted had about 12M lines with ~2.6GB of data => the lines are not really
long, but longer than 80 chars.)

Ah, I see. Would it be possible to run the Java sort example over your data? It would be helpful to verify that this is not specific to streaming.

${hadoop} jar hadoop-0.17-examples.jar sort -m <num maps> \
  -r 88 \
  -inFormat org.apache.hadoop.mapred.TextInputFormat \
  -outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
  -outKey org.apache.hadoop.io.LongWritable \
  -outValue org.apache.hadoop.io.Text \
  <input dir> <output dir (ignored)>

This should be close to streaming with cat as the mapper.
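For comparison, the streaming job with cat as the mapper would look roughly like the following. (The streaming jar path and the exact option names below are a guess for 0.17; adjust them to your install.)

```shell
# Hypothetical sketch of the equivalent streaming invocation.
# The jar location under contrib/ is an assumption for 0.17.
${hadoop} jar contrib/streaming/hadoop-0.17.0-streaming.jar \
  -input <input dir> \
  -output <output dir> \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -numReduceTasks 88
```

Both jobs exercise the same map-side sort path, so if the Java sort also fails, streaming is likely not the culprit.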

util.QuickSort is only used on the map side, so this shouldn't have
anything to do with the reduce. Is it always and only the *last* map
task that fails? If I sent you a patch that would print a trace with
the partitions, would you mind running it? Do you have any other
settings that differ from the defaults? -C

Nope, although sometimes it happens earlier.

Is it always the same splits when you re-run your job? Though distributing the full dataset may not be feasible, if there are splits that fail consistently then we might be able to work from that.

If you tell me how to apply it, I'm happy to. (I'm not the biggest Java
hotshot on this planet; I'm just using the provided 0.17.0 jars. I guess I
would have to patch the source and run ant. On all nodes or just the
control node?)

Unfortunately, it would need to be deployed to all the TaskTrackers, and it would be pretty invasive (i.e. I was planning on logging all the offsets from the sort as the stack unwinds from the exception). I'll test something and send it to you, and if it's not too much trouble you can try it.
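For applying such a patch, the steps would be roughly as follows. (This is a sketch; the patch file name is a placeholder, and the exact jar name produced by the build may differ on your version.)

```shell
# Run from the top of an unpacked hadoop-0.17.0 source tree.
# "sort-trace.patch" is a placeholder for whatever patch file you receive.
patch -p0 < sort-trace.patch

# Rebuild the core jar with ant.
ant jar

# The rebuilt jar lands under build/; it must replace the stock
# hadoop core jar on every TaskTracker node, followed by a restart
# of the daemons, since the patched code runs in the map tasks.
```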

My hadoop-site.xml:
[snip]

Nothing suspect, there. -C
