[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

Hadoop QA (JIRA) Thu, 17 Sep 2015 00:36:07 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791717#comment-14791717 ]


Hadoop QA commented on MAPREDUCE-6481:
--------------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 24s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 54s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 58s | Tests passed in hadoop-common. |
| {color:green}+1{color} | mapreduce tests |   1m 45s | Tests passed in hadoop-mapreduce-client-core. |
| | |  68m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12756420/MAPREDUCE-6481.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0832b38 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5998/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5998/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5998/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5998/console |


This message was automatically generated.

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6481
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give an incomplete record: some characters are cut 
> off at the end of the record.
> # LineRecordReader may give wrong position/key information.
> The first issue happens only with a custom delimiter and is caused by the 
> following code in {{LineReader#readCustomLine}}:
> {code}
>     if (appendLength > 0) {
>       if (ambiguousByteCount > 0) {
>         str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>         //appending the ambiguous characters (refer case 2.2)
>         bytesConsumed += ambiguousByteCount;
>         ambiguousByteCount = 0;
>       }
>       str.append(buffer, startPosn, appendLength);
>       txtLength += appendLength;
>     }
> {code}
> The bug is triggered when {{appendLength}} is 0 and {{ambiguousByteCount}} 
> is not 0. For example, if the input is "123456789aab", the custom delimiter 
> is "ab", bufferSize is 10 and splitLength is 12, the correct record is 
> "123456789a" with length 10, but the current code returns the incomplete 
> record "123456789" with length 9.
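
The dropped byte in this example can be reproduced with a small standalone model of the append step. This is only a sketch under stated assumptions: the class `AmbiguousAppendSketch` and both methods are hypothetical, a `StringBuilder` stands in for Hadoop's `Text`, and the caller is assumed to have already established that the ambiguous bytes were not the start of a real delimiter. The second method is one possible way to avoid the loss and is not necessarily what the attached patch does.

```java
import java.nio.charset.StandardCharsets;

class AmbiguousAppendSketch {
    // Custom delimiter from the example in the issue description.
    static final byte[] DELIMITER = "ab".getBytes(StandardCharsets.UTF_8);

    // Mirrors the quoted guard: ambiguous bytes are flushed only when appendLength > 0.
    static void guardedAppend(StringBuilder str, int ambiguousByteCount,
                              byte[] buffer, int startPosn, int appendLength) {
        if (appendLength > 0) {
            if (ambiguousByteCount > 0) {
                str.append(new String(DELIMITER, 0, ambiguousByteCount, StandardCharsets.UTF_8));
            }
            str.append(new String(buffer, startPosn, appendLength, StandardCharsets.UTF_8));
        }
    }

    // Hypothetical variant: flush the ambiguous bytes even when appendLength == 0.
    static void independentAppend(StringBuilder str, int ambiguousByteCount,
                                  byte[] buffer, int startPosn, int appendLength) {
        if (ambiguousByteCount > 0) {
            str.append(new String(DELIMITER, 0, ambiguousByteCount, StandardCharsets.UTF_8));
        }
        if (appendLength > 0) {
            str.append(new String(buffer, startPosn, appendLength, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        // State after the first 10-byte fill "123456789a": "123456789" was appended
        // and the trailing 'a' is ambiguous (ambiguousByteCount == 1).
        // The second fill is "ab"; the delimiter matches at once, so appendLength == 0.
        byte[] secondFill = "ab".getBytes(StandardCharsets.UTF_8);

        StringBuilder guarded = new StringBuilder("123456789");
        guardedAppend(guarded, 1, secondFill, 0, 0);

        StringBuilder independent = new StringBuilder("123456789");
        independentAppend(independent, 1, secondFill, 0, 0);

        System.out.println(guarded + " vs " + independent); // 123456789 vs 123456789a
    }
}
```
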
> The second issue can happen with both a custom delimiter and the default 
> delimiter. It is caused by the code in 
> {{UncompressedSplitLineReader#readLine}}, which may report wrong size 
> information in some corner cases. The reason is {{unusedBytes}} in the 
> following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the number of bytes last read ({{bufferLength}}) is less than bufferSize, 
> the previous {{unusedBytes}} is wrong: it should be {{bufferLength}} - 
> {{bufferPosn}} instead of bufferSize - {{bufferPosn}}. As a result, a larger 
> value is returned.
> For example, suppose the input is "1234567890ab12ab345", the custom 
> delimiter is "ab", bufferSize is 10, and there are two splits: the first 
> splitLength is 15 and the second splitLength is 4.
> The current code gives the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The key for the third record is wrong: it should be 16 instead of 21. This 
> is due to the wrong {{unusedBytes}}. {{fillBuffer}} reads 10 bytes the first 
> time but only 5 bytes the second time, which is 5 bytes less than 
> bufferSize. That is why the key is 5 bytes larger than the correct one.
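
The 5-byte offset can be checked with plain arithmetic on the two formulas for {{unusedBytes}}. The sketch below is an illustration only: the class `UnusedBytesSketch` and its method names are hypothetical, and it assumes a buffer position of 2 after the second fill (just past the delimiter "ab" at the start of that fill). It does not reproduce the full bookkeeping in `UncompressedSplitLineReader#readLine`.

```java
class UnusedBytesSketch {
    // Formula quoted in the issue: treats the whole buffer as filled.
    static int unusedFromBufferSize(int bufferSize, int bufferPosn) {
        return bufferSize - bufferPosn;
    }

    // Formula the issue says should apply: based on the bytes actually read.
    static int unusedFromBufferLength(int bufferLength, int bufferPosn) {
        return bufferLength - bufferPosn;
    }

    public static void main(String[] args) {
        int bufferSize = 10;  // configured buffer size from the example
        int bufferLength = 5; // second fill returned only 5 bytes
        int bufferPosn = 2;   // assumed position after consuming the delimiter "ab"

        int reported = unusedFromBufferSize(bufferSize, bufferPosn);   // 8
        int actual = unusedFromBufferLength(bufferLength, bufferPosn); // 3

        // The overestimate equals bufferSize - bufferLength, matching 21 - 16.
        System.out.println("overestimate = " + (reported - actual)); // overestimate = 5
    }
}
```

The difference between the two formulas does not depend on the buffer position; it is always bufferSize minus the number of bytes actually read.
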



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
