[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597824#comment-14597824 ]

Hudson commented on MAPREDUCE-5948:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2183 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2183/])
MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java

org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
--
Key: MAPREDUCE-5948
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5948
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2, 0.23.9, 2.2.0
Environment: CDH3U2 Redhat linux 5.7
Reporter: Kris Geusebroek
Assignee: Akira AJISAKA
Priority: Critical
Fix For: 2.8.0
Attachments: HADOOP-9867.patch, HADOOP-9867.patch, HADOOP-9867.patch, HADOOP-9867.patch, MAPREDUCE-5948.002.patch, MAPREDUCE-5948.003.patch

Having defined a record delimiter of multiple bytes in a new InputFileFormat sometimes has the effect of skipping records from the input. This happens when the input splits are split off just after a record separator. The starting point for the next split would then be non-zero and skipFirstLine would be true. A seek into the file is done to start - 1 and the text until the first record delimiter is ignored (on the presumption that this record was already handled by the previous map task). Since the record delimiter is multibyte, the seek brings only the last byte of the delimiter into scope, so it is not recognized as a full delimiter and the text is skipped until the next delimiter, silently dropping a full record.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
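The failure mode described above can be sketched with a small self-contained simulation. Note this is a hypothetical simplification of the pre-fix split-start logic, not the actual LineRecordReader code; the class and method names are invented for illustration, and "|#" stands in for any multibyte delimiter:

```java
import java.nio.charset.StandardCharsets;

public class SplitSkipDemo {
    // Naive split-start logic resembling the pre-fix behavior: back up one
    // byte from the split start, then discard everything up to and including
    // the next *complete* delimiter.
    static int naiveRecordStart(byte[] data, int splitStart, byte[] delim) {
        int pos = Math.max(0, splitStart - 1); // one-byte backup only
        for (int i = pos; i + delim.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < delim.length; j++) {
                if (data[i + j] != delim[j]) { match = false; break; }
            }
            if (match) return i + delim.length;
        }
        return data.length;
    }

    public static void main(String[] args) {
        byte[] delim = "|#".getBytes(StandardCharsets.UTF_8); // 2-byte delimiter
        byte[] data = "rec1|#rec2|#rec3".getBytes(StandardCharsets.UTF_8);
        // Split boundary lands exactly after the first delimiter (offset 6).
        // Backing up one byte positions us inside the delimiter (on "#"), so
        // the scan never sees a full "|#" there and runs on to the NEXT
        // delimiter, silently dropping rec2.
        int recordStart = naiveRecordStart(data, 6, delim);
        System.out.println(recordStart); // 12: past the SECOND delimiter
        System.out.println(new String(data, recordStart,
                data.length - recordStart, StandardCharsets.UTF_8)); // rec3
    }
}
```

With a single-byte delimiter the one-byte backup is enough to bring the whole delimiter back into scope, which is why the bug only surfaces for multibyte delimiters; the committed fix addresses this by handling delimiters that straddle split boundaries.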
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597782#comment-14597782 ] Hudson commented on MAPREDUCE-5948: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #235 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/235/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597721#comment-14597721 ] Hudson commented on MAPREDUCE-5948: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #226 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/226/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597707#comment-14597707 ] Hudson commented on MAPREDUCE-5948: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2165 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2165/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597533#comment-14597533 ] Hudson commented on MAPREDUCE-5948: --- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #237 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/237/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597511#comment-14597511 ] Hudson commented on MAPREDUCE-5948: --- FAILURE: Integrated in Hadoop-Yarn-trunk #967 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/967/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596736#comment-14596736 ] Hudson commented on MAPREDUCE-5948: --- FAILURE: Integrated in Hadoop-trunk-Commit #8048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8048/]) MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java * hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593600#comment-14593600 ] Jason Lowe commented on MAPREDUCE-5948: --- Thanks for taking this up, Akira. Will try to review the patch shortly. [~Markovich] could you provide more details on the duplicate records with bz2? A similar problem was reported in MAPREDUCE-6299 but there are no details to work with. Note that the latest patch will not address any issues with bz2, as it only fixes the handling of duplicate records with uncompressed input.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593729#comment-14593729 ] Jason Lowe commented on MAPREDUCE-5948: --- +1 for the latest patch. This should resolve the dropped/duplicate problems with uncompressed input. We can tackle the reported duplicate records for bz2 in MAPREDUCE-6299. Will commit this early next week if there are no objections.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580875#comment-14580875 ] Hadoop QA commented on MAPREDUCE-5948: -- \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 18s | Tests passed in hadoop-common. | | {color:green}+1{color} | mapreduce tests | 1m 43s | Tests passed in hadoop-mapreduce-client-core. 
| | | | 67m 24s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12738857/MAPREDUCE-5948.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c7729ef | | hadoop-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/console | This message was automatically generated.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579825#comment-14579825 ] Hadoop QA commented on MAPREDUCE-5948: -- \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 24s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 24m 27s | Tests passed in hadoop-common. | | {color:green}+1{color} | mapreduce tests | 1m 44s | Tests passed in hadoop-mapreduce-client-core. 
| | | | 70m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12738701/MAPREDUCE-5948.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3c2397c | | hadoop-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/console | This message was automatically generated.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570468#comment-14570468 ] Akira AJISAKA commented on MAPREDUCE-5948: --- Hi [~vinayrpet], are you willing to take over this issue? If you are not interested in the work, I'd like to rebase the patch and address the review comments.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570486#comment-14570486 ] Vinayakumar B commented on MAPREDUCE-5948: -- Sure, Go ahead :)
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569101#comment-14569101 ] Rushabh S Shah commented on MAPREDUCE-5948: --- Sorry, this fell off my radar too. I don't have enough cycles to work on this right now. We can move this to the next release, or if someone is interested in working on this, I am more than happy to let him/her take it over.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510783#comment-14510783 ] Rivkin Andrey commented on MAPREDUCE-5948: -- Ran into this problem today. With bz2 files a multibyte record delimiter leads to duplicate records; with uncompressed files it leads to dropped records. Hadoop version 2.5.0. Also, we can't set hex values, for example \x01, in the Hadoop conf (textinputformat.record.delimiter).
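On the configuration point above: textinputformat.record.delimiter takes a literal string, so an escape like \x01 written in a config file is not interpreted. One possible workaround (a sketch, not part of this patch) is to build the delimiter string from the raw byte in the driver program:

```java
import java.nio.charset.StandardCharsets;

public class DelimiterConfigDemo {
    public static void main(String[] args) {
        // Build a one-character string holding the raw byte 0x01.
        // ISO-8859-1 maps each byte to the code point of the same value,
        // so the byte survives the string round trip.
        String delim = new String(new byte[] {0x01}, StandardCharsets.ISO_8859_1);

        // In a real job this would be set on the Hadoop Configuration, e.g.:
        // conf.set("textinputformat.record.delimiter", delim);
        System.out.println((int) delim.charAt(0));  // 1
    }
}
```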
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510791#comment-14510791 ] Hadoop QA commented on MAPREDUCE-5948: --
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12652422/HADOOP-9867.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c8d7290 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5443/console |
This message was automatically generated.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046456#comment-14046456 ] Hadoop QA commented on MAPREDUCE-5948: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652422/HADOOP-9867.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4695//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4695//console
This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046477#comment-14046477 ] Jason Lowe commented on MAPREDUCE-5948: --- Moved this to MAPREDUCE since that's where the change should go. Thanks for the patch, Rushabh. Some comments on the patch:
* LineReader: I'm not sure changing LineReader's byteConsumed semantics will be OK. LineReader is used by things besides MapReduce, so changing the return result for some corner cases may be problematic for other downstream projects. Rather than rely on the bytes-consumed result from LineReader, it seems we could track this using the existing totalBytesRead value. We simply need to track how many bytes from the input stream we've read before we try to read the next line. If we've read past the end of the split (i.e.: bytes read >= split length) then we can set the finished flag.
* UncompressedSplitLineReader: Given how complicated split processing can be and all the corner cases, it'd be nice to have some comments explaining what's going on, similar to what's in CompressedSplitLineReader. Otherwise many will wonder why we're going out of our way to make sure we stop right at the split when we fill the buffer, etc.
* bufferPosition is not really a position in the buffer but a position within the split, and that's similar to totalBytesRead. I'm not sure we can completely combine them, but it'd be nice if their names were more specific to what they're tracking (one is the total bytes read from the input stream and the other is the total bytes consumed by the line reader).
* {{(inDelimiter && actualBytesRead > 0)}} can just be {{(inDelimiter)}} because we already know actualBytesRead > 0 at that point due to the previous condition check.
* Test doesn't clean up the temporary files it creates when it's done.
(Actually, just noticed the tests are creating files in the wrong directory; it should be something like TestLineRecordReader.class.getName() and not TestTextInputFormat.)
* Nit: whitespace between {{if}} and ( and lines formatted to 80 columns to follow the coding standard
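The reviewer's first point above — counting bytes pulled from the stream and marking the reader finished once it has read past the split — might be sketched as follows. This is a hypothetical standalone class, not the actual patch (which lands in UncompressedSplitLineReader):

```java
// Hypothetical sketch of the review suggestion: rather than changing
// LineReader's bytes-consumed semantics, count the bytes pulled from the
// input stream and flip a finished flag once the split is exhausted.
public class SplitReadTracker {
    private final long splitLength;
    private long totalBytesRead = 0;
    private boolean finished = false;

    public SplitReadTracker(long splitLength) {
        this.splitLength = splitLength;
    }

    // Record the size of each buffer fill from the underlying stream.
    public void onBufferFill(int bytesRead) {
        totalBytesRead += bytesRead;
        // Reading past the end of the split means any further record
        // belongs to the reader of the following split.
        if (totalBytesRead >= splitLength) {
            finished = true;
        }
    }

    public boolean isFinished() {
        return finished;
    }
}
```

For example, with a 10-byte split, a 6-byte fill leaves the tracker unfinished, and a further 5-byte fill (reading past the split boundary) marks it finished.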