After squinting at this a bit, I believe this is the format:

Length of string: 00 23

String:
68 64 66 73 3A
2F 2F 6E 79 63
2D 71 77 73 2D
30 32 39 2F 69
6E 2D 64 69 72
2F 77 6F 72 64
73 2E 74 78 74

Start Offset: 00 00 00 00 00 00 00 00
Size: 00 00 00 00 00 02 C4 AC

So this should be the split for the file "hdfs://nyc-qws-029/in-dir/words.txt"
starting at offset 0 with a size of 181420 (0x2C4AC) bytes.
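Based on that layout, a deserialiser seems straightforward. Here is a minimal C++ sketch, assuming the format is exactly what the hex dump suggests: a 2-byte big-endian string length, the UTF-8 path bytes, then two 8-byte big-endian integers (offset, then size), matching what Java's DataOutput would write. The struct and function names are my own invention, not anything from Hadoop:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical split parser -- a sketch of the layout observed above,
// not a reference implementation from Hadoop.
struct FileInputSplit {
    std::string filename;
    uint64_t offset;
    uint64_t length;
};

// Read n big-endian bytes into an integer (Java DataOutput byte order).
static uint64_t readBE(const uint8_t* p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

FileInputSplit parseSplit(const std::vector<uint8_t>& buf) {
    if (buf.size() < 2)
        throw std::runtime_error("buffer too short for length prefix");
    // 2-byte big-endian length of the path string.
    const size_t len = static_cast<size_t>(readBE(buf.data(), 2));
    if (buf.size() < 2 + len + 16)
        throw std::runtime_error("buffer too short for path + offset + size");
    FileInputSplit s;
    // UTF-8 path bytes follow the length prefix.
    s.filename.assign(reinterpret_cast<const char*>(buf.data() + 2), len);
    // Two 8-byte big-endian integers: start offset, then size.
    s.offset = readBE(buf.data() + 2 + len, 8);
    s.length = readBE(buf.data() + 2 + len + 8, 8);
    return s;
}
```

Feeding it the bytes from the dump above should yield the path, offset 0, and size 181420.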

That said, is there some reason why this is the format? I don't want the
deserialiser I write to break from one version of Hadoop to the next.

Roshan


On Tue, Jun 16, 2009 at 9:41 AM, Roshan James <
roshan.james.subscript...@gmail.com> wrote:

> Why don't we convert the input split information into the same string format
> that is displayed in the web UI? Something like this -
> "hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185". It's a simple format,
> and we can always parse such a string in C++.
>
> Is there some reason for the current binary format? If there is good reason
> for it, I am game to write such a deserialiser class. Is there some
> reference for this binary format that I can use to write the deserialiser?
>
> Roshan
>
>
> On Mon, Jun 15, 2009 at 5:40 PM, Owen O'Malley <omal...@apache.org> wrote:
>
>> *Sigh* We need Avro for input splits.
>>
>> That is the expected behavior. It would be great if someone wrote a C++
>> FileInputSplit class that took a binary string and converted it back to a
>> filename, offset, and length.
>>
>> -- Owen
>>
>
>
