Thanks, it looks like I can write a line reader in C++ that roughly does
what the Java version does. This also means that I can deserialise my own
custom formats as well. Thanks!

Roshan

On Tue, Jun 16, 2009 at 12:22 PM, Owen O'Malley <omal...@apache.org> wrote:

> Sorry, I forget how much isn't clear to people who are just starting.
>
> FileInputFormat creates FileSplits. The serialization is very stable and
> can't be changed without breaking things. The reason that pipes can't
> stringify it is that the string form of input splits is ambiguous (and
> since it is user code, we really can't make assumptions about it). The
> format of FileSplit is:
>
> <16 bit filename byte length>
> <filename in bytes>
> <64 bit offset>
> <64 bit length>
>
> Technically the filename uses a funky UTF-8 encoding, but in practice, as
> long as the filename contains only ASCII characters, the bytes are plain
> ASCII. Look at org.apache.hadoop.io.UTF8.writeString for the precise
> definition.
>
> -- Owen
>
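
A minimal C++ sketch of decoding the FileSplit layout described above might look like the following. It assumes the serialized split is available as a std::string of raw bytes, that the integers are big-endian (as Java's DataOutput writes them), and that the filename can be treated as opaque bytes, which holds for ASCII names. The names FileSplitInfo, readBigEndian and parseFileSplit are hypothetical and are not part of the Pipes API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical holder for the decoded fields; not part of Hadoop Pipes.
struct FileSplitInfo {
    std::string filename;   // raw filename bytes (plain ASCII in the common case)
    std::uint64_t offset;   // byte offset of the split within the file
    std::uint64_t length;   // number of bytes in the split
};

// Read `count` bytes at `pos` as a big-endian unsigned integer and advance `pos`.
// Assumption: the writer used Java's DataOutput, which is big-endian.
static std::uint64_t readBigEndian(const std::string& buf, std::size_t& pos,
                                   std::size_t count) {
    if (buf.size() - pos < count) {
        throw std::runtime_error("truncated FileSplit");
    }
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value = (value << 8) | static_cast<unsigned char>(buf[pos + i]);
    }
    pos += count;
    return value;
}

// Decode <16 bit length><filename bytes><64 bit offset><64 bit length>.
FileSplitInfo parseFileSplit(const std::string& raw) {
    std::size_t pos = 0;
    FileSplitInfo info;

    const std::size_t nameLen =
        static_cast<std::size_t>(readBigEndian(raw, pos, 2));
    if (raw.size() - pos < nameLen) {
        throw std::runtime_error("truncated FileSplit filename");
    }
    info.filename = raw.substr(pos, nameLen);
    pos += nameLen;

    info.offset = readBigEndian(raw, pos, 8);
    info.length = readBigEndian(raw, pos, 8);
    return info;
}
```

In a Pipes mapper, the raw bytes would presumably come from the context's input split accessor and be passed straight to parseFileSplit. Filenames with non-ASCII characters would need the decoding that matches writeString, which this sketch does not attempt.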
