After some debugging I see that the "string" returned by getInputSplit() has several non-text characters in it. When dumped as hex, it looks like this:
00 23 68 64 66 73 3A 2F 2F 6E 79 63 2D 71 77 73 2D 30 32 39 2F 69 6E 2D 64 69 72 2F 77 6F 72 64 73 2E 74 78 74 00 00 00 00 00 00 00 00 00 00 00 00 00 02 C4 AC

In text, this is roughly ")hdfs://nyc-qws-029/in-dir/words912415.txt�������������Р".

Now I was expecting a human-readable string, something like "hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185", i.e. a description of the split that I can parse out. After a couple of quick glances at the Pipes code, it looks like the Java InputSplit object is passed to the C++ wrapper as is, without any explicit conversion to a string.

Since I am new to Hadoop, I am not sure if this is a bug or something I am doing wrong. Please advise,

Roshan

On Fri, Jun 12, 2009 at 7:02 PM, Roshan James <roshan.james.subscript...@gmail.com> wrote:

> I am working with the wordcount example of Hadoop Pipes (0.20.0). I have a
> 7-machine cluster.
>
> When I look at MapContext.getInputSplit() in my map function, I see that it
> returns the empty string. I was expecting to see a filename and some sort of
> range specification or so. I am using the default Java record reader right
> now. Is this a known bug or am I missing something?
>
> best,
> Roshan
>