[ 
https://issues.apache.org/jira/browse/HADOOP-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264777#comment-15264777
 ] 

Andrew Ash commented on HADOOP-13064:
-------------------------------------

[~jellis] those two do look pretty related -- were you testing with version 
2.7.1 by chance?  Can you check whether your test passes in 2.7.2, which 
contains fixes for both of those tickets?

> LineReader reports incorrect number of bytes read resulting in correctness 
> issues using LineRecordReader
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13064
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Joe Ellis
>            Priority: Critical
>         Attachments: LineReaderTest.java
>
>
> The specific issue we were seeing with LineReader is that when we pass in 
> '\r\n' as the line delimiter, the number of bytes it reports having read 
> is less than the number it actually read. We narrowed this down to the case 
> where the delimiter is split across the internal buffer boundary: if 
> fillBuffer fills with "row\r" and the next call fills with "\n", then the 
> number of bytes reported is 4 rather than 5.
> This results in correctness issues in LineRecordReader: if this off-by-one 
> undercount occurs enough times while reading a split, the reader continues to 
> read records past its split boundary, so records appear to come 
> from multiple splits.
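
The byte accounting described above can be sketched with a small, self-contained model. This is not Hadoop's LineReader and does not reproduce its bug; TinyLineReader, firstLineBytes and totalBytes are hypothetical names invented for illustration, and the delimiter is hard-coded to "\r\n" with ASCII data assumed. It only shows the invariant a regression check would want: the reported byte count for a line includes the whole delimiter, even when "\r" and "\n" arrive in different buffer fills.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for a buffered line reader (hypothetical, not Hadoop code). */
class TinyLineReader {
    private final InputStream in;
    private final byte[] buf;
    private int len = 0; // number of valid bytes currently in buf
    private int pos = 0; // index of the next unread byte in buf

    TinyLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    /**
     * Reads one "\r\n"-terminated line into 'line' (delimiter excluded) and
     * returns the number of bytes consumed, delimiter included; 0 at end of stream.
     */
    int readLine(StringBuilder line) throws IOException {
        line.setLength(0);
        int consumed = 0;
        boolean pendingCr = false; // saw '\r'; this state must survive a buffer refill
        while (true) {
            if (pos >= len) {
                len = in.read(buf); // refill, analogous to the fill step in the report
                pos = 0;
                if (len <= 0) {
                    len = 0;
                    if (pendingCr) line.append('\r'); // lone trailing '\r' is data
                    return consumed;
                }
            }
            byte b = buf[pos++];
            consumed++; // every byte taken from the stream is counted exactly once
            if (pendingCr) {
                pendingCr = false;
                if (b == '\n') return consumed; // delimiter complete, possibly across fills
                line.append('\r');              // the earlier '\r' was ordinary data
            }
            if (b == '\r') pendingCr = true;
            else line.append((char) (b & 0xff));
        }
    }

    /** Bytes reported for the first line of 'data' with the given buffer size. */
    static int firstLineBytes(String data, int bufferSize) {
        try {
            byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
            return new TinyLineReader(new ByteArrayInputStream(bytes), bufferSize)
                    .readLine(new StringBuilder());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sum of the bytes reported over all lines of 'data'. */
    static int totalBytes(String data, int bufferSize) {
        try {
            byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
            TinyLineReader r = new TinyLineReader(new ByteArrayInputStream(bytes), bufferSize);
            StringBuilder sb = new StringBuilder();
            int total = 0;
            int n;
            while ((n = r.readLine(sb)) > 0) total += n;
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Buffer size 4 makes the first fill "row\r" and the next one start with "\n".
        System.out.println(firstLineBytes("row\r\nab\r\n", 4)); // prints 5
    }
}
```

A check in this style would run the same input through a range of buffer sizes and require the summed counts to equal the input length. Any shortfall means the caller's position lags behind the real stream offset, which is how a record reader that compares its position against the split end could keep reading past the boundary.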



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

