I need to process a dataset that contains text records of fixed length
in bytes.  For example, each record may be 100 bytes long, with the
first field occupying the first 10 bytes, the second field the next
10 bytes, and so on.  There are no newlines in the file.  Field
values have been either whitespace-padded or truncated to fit their
fixed positions within these fixed-width records.
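To make the layout concrete, here is a minimal plain-Java sketch (not Hadoop code) that slices one record into consecutive fixed-width fields and strips the padding.  The class and method names are hypothetical, the 100-byte/10-byte sizes are just the example values above, and US-ASCII text is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class FixedWidthFields {
    // Slices one record into consecutive fields of fieldWidth bytes and
    // trims the whitespace padding. A short final slice is kept as-is.
    static List<String> split(byte[] record, int fieldWidth) {
        List<String> fields = new ArrayList<>();
        for (int start = 0; start < record.length; start += fieldWidth) {
            int len = Math.min(fieldWidth, record.length - start);
            // US-ASCII is an assumption; use the file's real encoding.
            fields.add(new String(record, start, len, StandardCharsets.US_ASCII).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build one 100-byte record: ten fields, each space-padded to 10 bytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(String.format("%-10s", "f" + i));
        }
        byte[] record = sb.toString().getBytes(StandardCharsets.US_ASCII);
        // prints [f0, f1, f2, f3, f4, f5, f6, f7, f8, f9]
        System.out.println(split(record, 10));
    }
}
```

Note that trim() also drops leading whitespace, which suits either left- or right-padded values.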

Does Hadoop have an InputFormat to support processing of such files?
I looked but couldn't find one.

Of course, I could pre-process the file (outside of Hadoop) to put
newlines at the end of each record, but I'd prefer not to require such
a prep step.
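The reason newlines shouldn't be needed is that record boundaries follow from byte offsets alone: record i starts at i * recordLength.  A minimal plain-Java sketch of that boundary logic, assuming a fixed record length and using hypothetical names (this is not a Hadoop API), might look like this:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of fixed-length record boundaries; not a Hadoop class.
public class FixedLengthRecords {
    // Offset of the first record that begins at or after splitStart.
    static long firstRecordStart(long splitStart, int recordLength) {
        long rem = splitStart % recordLength;
        return rem == 0 ? splitStart : splitStart + (recordLength - rem);
    }

    // Reads exactly one record, or returns null on a clean end of stream.
    // A partial trailing record is treated as an error.
    static byte[] readRecord(InputStream in, int recordLength) throws IOException {
        byte[] buf = new byte[recordLength];
        int off = 0;
        while (off < recordLength) {
            int n = in.read(buf, off, recordLength - off);
            if (n < 0) {
                if (off == 0) {
                    return null; // no bytes left: normal end of input
                }
                throw new EOFException("partial record of " + off + " bytes");
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // A split starting mid-record skips ahead to the next boundary.
        System.out.println(firstRecordStart(150, 100)); // prints 200

        // 300 bytes with no newlines yield exactly three 100-byte records.
        InputStream in = new ByteArrayInputStream(new byte[300]);
        int records = 0;
        while (readRecord(in, 100) != null) {
            records++;
        }
        System.out.println(records); // prints 3
    }
}
```

Under this scheme, a split covering bytes [start, end) would own exactly the records whose start offset falls inside that range, so no sync markers are required.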

Thanks.
