BytesWritable / SequenceFile yields dummy linefeed at end as soon as content 
has one or more linefeeds.
-------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7760
                 URL: https://issues.apache.org/jira/browse/HADOOP-7760
             Project: Hadoop Common
          Issue Type: Bug
          Components: record
    Affects Versions: 0.20.2
         Environment: Easily reproducible on a Debian Linux cluster but also 
on my Arch Linux desktop.

I am aware there are some newer releases in the 0.20 series, but all changelog 
and release note links for those at 
http://hadoop.apache.org/common/releases.html are broken, so I can't check 
whether this has been fixed or whether it's safe to upgrade.
            Reporter: Dieter Plaetinck
            Priority: Minor


I create SequenceFiles which have BytesWritable as values.
I notice that if I store content which contains no linefeeds ("\n") or exactly 
one linefeed in the value, the value can be read back out of the SequenceFile 
properly.
However, as soon as I store input which contains two or more linefeeds (which 
is pretty much always the case), one *extra* linefeed appears at the end of the 
value after writing to the SequenceFile and reading the data back, a linefeed 
which did not exist in the input.
So this effectively corrupts my data, although I could write a hacky workaround 
for it.
I have written a program that demonstrates the behavior by showing what 
happens when writing two SequenceFiles:
one that has a record whose value contains one linefeed;
another that has a record whose value contains two linefeeds.
Upon reading, the latter value will contain three linefeeds.
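
The comparison described above (count the linefeeds in what was written, then 
in what was read back) can be sketched in plain Java as follows. This is only 
an illustrative helper, not the linked test program: the class name 
LinefeedCheck and both method names are hypothetical, and it deliberately uses 
no Hadoop classes. The explicit length argument is an assumption on my part 
that it may matter how many bytes of the returned array are actually inspected.

```java
// Hypothetical helper for comparing linefeed counts; not part of Hadoop.
final class LinefeedCheck {

    // Counts '\n' bytes among the first `length` bytes of `data`.
    static int countLinefeeds(byte[] data, int length) {
        if (length < 0 || length > data.length) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (data[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // True if the written bytes and the first `readLength` bytes read back
    // contain the same number of linefeeds.
    static boolean sameLinefeedCount(byte[] written, byte[] readBack, int readLength) {
        return countLinefeeds(written, written.length)
                == countLinefeeds(readBack, readLength);
    }
}
```

When the value comes from a BytesWritable, it may be worth running the check 
twice: once over the full array returned by getBytes(), and once over only the 
first getLength() bytes. Whether the two counts differ here, and whether that 
explains the extra linefeed, is not established by this report.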

Test file: http://pastie.org/2728797


        
