Why don't we convert the input split information into the same string format that is displayed in the web UI? Something like this: "hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185". It's a simple format, and we can always parse such a string in C++.
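
To show what I mean, here is a rough sketch of such a parser. It assumes the string is always of the form <path>:<start>+<length>, which I am only inferring from the example above. FileSplit and parseSplit are names I made up for illustration; they are not part of any existing Pipes API.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical holder for the three fields of a file split.
struct FileSplit {
    std::string path;
    unsigned long long start;
    unsigned long long length;
};

// True only if s is non-empty and consists solely of decimal digits.
static bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    }
    return true;
}

// Parses "<path>:<start>+<length>" (assumed format) into out.
// Returns false and leaves out untouched if the string does not match.
bool parseSplit(const std::string& s, FileSplit& out) {
    // Use the LAST ':' because the path itself contains one ("hdfs://").
    const std::size_t colon = s.rfind(':');
    if (colon == std::string::npos || colon == 0) return false;

    // The '+' separating start and length must come after that colon.
    const std::size_t plus = s.find('+', colon + 1);
    if (plus == std::string::npos) return false;

    const std::string startStr = s.substr(colon + 1, plus - colon - 1);
    const std::string lengthStr = s.substr(plus + 1);
    if (!allDigits(startStr) || !allDigits(lengthStr)) return false;

    try {
        FileSplit parsed;
        parsed.path = s.substr(0, colon);
        parsed.start = std::stoull(startStr);
        parsed.length = std::stoull(lengthStr);
        out = parsed;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // number does not fit in unsigned long long
    }
}
```

Calling parseSplit on the example string should yield the path up to the last colon, a start of 0, and a length of 184185. Splitting on the last colon is a deliberate choice so that the colon in the URI scheme does not confuse the parser.
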

Is there some reason for the current binary format? If there is a good reason for it, I am game to write such a deserialiser class. Is there some reference for this binary format that I can use to write the deserialiser?

Roshan

On Mon, Jun 15, 2009 at 5:40 PM, Owen O'Malley <omal...@apache.org> wrote:
> *Sigh* We need Avro for input splits.
>
> That is the expected behavior. It would be great if someone wrote a C++
> FileInputSplit class that took a binary string and converted it back to a
> filename, offset, and length.
>
> -- Owen