[jira] [Resolved] (HADOOP-13064) LineReader reports incorrect number of bytes read resulting in correctness issues using LineRecordReader

Joe Ellis (JIRA) Thu, 05 May 2016 11:55:38 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joe Ellis resolved HADOOP-13064.
--------------------------------
    Resolution: Duplicate

> LineReader reports incorrect number of bytes read resulting in correctness 
> issues using LineRecordReader
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13064
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Joe Ellis
>            Priority: Critical
>         Attachments: LineReaderTest.java
>
>
> When '\r\n' is passed to LineReader as the line delimiter, the number of 
> bytes it reports having read can be smaller than the number it actually 
> consumed. We narrowed this down to the case where the delimiter is split 
> across the internal buffer boundary: if one fillBuffer call fills the buffer 
> with "row\r" and the next call fills it with "\n", the reported byte count 
> is 4 rather than 5.
> This causes correctness issues in LineRecordReader. If the off-by-one error 
> occurs enough times while reading a split, the reader keeps reading records 
> past its split boundary, so records appear to come from multiple splits.
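
The scenario described above can be sketched with a toy model. This is not
Hadoop's LineReader or LineRecordReader code, and it is not the attached
LineReaderTest.java. The class SplitDelimiterDemo, its methods, and the
countCarriedCR flag are hypothetical names used only for illustration. The
first method counts the bytes consumed for one '\r\n'-terminated line while
pulling input in fixed-size fills, and can optionally drop the '\r' that ended
the previous fill to mimic the reported symptom. The second method assumes a
simplified reader that keeps reading while its tracked position has not passed
the split end, and shows how an under-reported line length lets it read more
records than it should.

```java
import java.nio.charset.StandardCharsets;

/** Toy model only; hypothetical names, not Hadoop's implementation. */
final class SplitDelimiterDemo {

    /**
     * Counts the bytes consumed for the first "\r\n"-terminated line in data,
     * reading it in fills of bufferSize bytes. When countCarriedCR is false,
     * a '\r' that ended the previous fill is left out of the total, which
     * mimics the under-count described in the report.
     */
    static int bytesConsumed(byte[] data, int bufferSize, boolean countCarriedCR) {
        int total = 0;
        boolean sawCR = false; // true if the previous byte was '\r'
        for (int fillStart = 0; fillStart < data.length; fillStart += bufferSize) {
            int fillEnd = Math.min(fillStart + bufferSize, data.length);
            for (int i = fillStart; i < fillEnd; i++) {
                byte b = data[i];
                if (sawCR && b == '\n') {
                    total++; // count the '\n'
                    boolean crWasInEarlierFill = (i == fillStart);
                    if (crWasInEarlierFill && !countCarriedCR) {
                        total--; // simulated bug: the carried '\r' is lost
                    }
                    return total;
                }
                sawCR = (b == '\r');
                total++;
            }
        }
        return total; // no delimiter found: the whole input was consumed
    }

    /**
     * Simplified split loop (an assumption, not LineRecordReader's logic):
     * keep reading records while the tracked position is at or before
     * splitEnd, advancing by the length the line reader reported.
     */
    static int recordsRead(int splitEnd, int reportedLineLength, int totalLines) {
        int pos = 0;
        int records = 0;
        while (pos <= splitEnd && records < totalLines) {
            pos += reportedLineLength;
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "row\r\n".getBytes(StandardCharsets.US_ASCII);
        // A 4-byte fill splits the delimiter: "row\r" then "\n".
        System.out.println(bytesConsumed(data, 4, true));  // 5
        System.out.println(bytesConsumed(data, 4, false)); // 4
        // 100 lines of 5 bytes each, split ends at byte 250.
        System.out.println(recordsRead(250, 5, 100)); // accurate count
        System.out.println(recordsRead(250, 4, 100)); // under-reported count
    }
}
```

In this model the under-count only appears when the fill size places the
boundary between '\r' and '\n'; with a fill large enough to hold the whole
line, both variants return 5. The second pair of calls is a worst case in
which every line is under-reported by one byte, so the tracked position lags
further behind the true file offset with each record.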



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

