[
https://issues.apache.org/jira/browse/PHOENIX-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nihal Jain updated PHOENIX-7267:
--------------------------------
Description:
We are trying to load data where there are a few bad records in some files, due
to which mappers fail and hence the entire job fails with the following error:
{code:java}
Error: java.lang.RuntimeException: java.lang.RuntimeException:
java.io.IOException: (startline 1) EOF reached before encapsulated token
finished
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:206)
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:77)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF
reached before encapsulated token finished
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
at
org.apache.phoenix.thirdparty.com.google.common.collect.Iterators.getNext(Iterators.java:895)
at
org.apache.phoenix.thirdparty.com.google.common.collect.Iterables.getFirst(Iterables.java:827)
at
org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
at
org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:164)
... 9 more
Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated
token finished
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
... 15 more {code}
I have figured out that there is code in commons-csv which throws a
RuntimeException when it fails to parse a record, which in turn is not handled
by Phoenix, as we only catch IOException.
See
[https://github.com/apache/commons-csv/blob/rel/commons-csv-1.0/src/main/java/org/apache/commons/csv/CSVParser.java#L398]
Also see
[https://github.com/apache/phoenix/blob/master/phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/FormatToBytesWritableMapper.java#L167]
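To illustrate the mismatch, here is a minimal, self-contained sketch (the class
and the sample line are mine, not Phoenix code): Iterator.hasNext()/next()
cannot throw a checked exception, so commons-csv wraps the underlying
IOException in a RuntimeException, which sails past a handler that only catches
IOException.
{code:java}
import java.io.IOException;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CsvEofDemo {

    // Mimics CsvLineParser.parse: declares IOException, but the record
    // iterator actually surfaces parse failures as RuntimeException.
    static CSVRecord parse(String line) throws IOException {
        CSVParser parser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT);
        return parser.iterator().next();
    }

    public static void main(String[] args) {
        try {
            // An unterminated quoted field: EOF is reached before the
            // encapsulated token finishes, as in the bad records above.
            parse("a,\"unterminated");
        } catch (IOException e) {
            // Never reached for this failure: the IOException arrives wrapped
            // in a RuntimeException, escapes this handler and kills the mapper.
            System.out.println("skipped bad record: " + e.getMessage());
        }
    }
}
{code}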
This is undesirable; in the worst case, the job should skip the failed record
rather than fail as a whole. Note that we are passing --ignore-errors.
This bug is to fix this behavior and figure out a way to handle the failed
records so that the job can continue. We will also bump commons-csv to 1.10.0;
it has been quite a while since we last bumped it, so it is better to upgrade
here as well.
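As a rough sketch of the direction (the helper name and the null-on-failure
contract are illustrative assumptions, not the final patch): catch the
RuntimeException around the parse call, unwrap the IOException cause, and skip
the record when --ignore-errors is set.
{code:java}
import java.io.IOException;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class SkipBadRecordSketch {

    // Hypothetical helper: returns null for an unparseable line instead of
    // letting commons-csv's RuntimeException kill the mapper; the caller
    // would bump a "bad lines" counter and move on.
    static CSVRecord parseOrSkip(String line, boolean ignoreErrors) throws IOException {
        try {
            CSVParser parser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT);
            return parser.iterator().next();
        } catch (RuntimeException e) {
            // Unwrap the IOException hidden inside the RuntimeException.
            if (ignoreErrors && e.getCause() instanceof IOException) {
                return null; // skip this record, let the job continue
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseOrSkip("a,b,c", true));            // prints the record
        System.out.println(parseOrSkip("a,\"unterminated", true)); // prints null: skipped
    }
}
{code}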
was:
We are trying to load data where there are a few bad records in some files, due
to which mappers fail and hence the entire job fails with the following error:
{code:java}
Error: java.lang.RuntimeException: java.lang.RuntimeException:
java.io.IOException: (startline 1) EOF reached before encapsulated token
finished
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:206)
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:77)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF
reached before encapsulated token finished
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
at
org.apache.phoenix.thirdparty.com.google.common.collect.Iterators.getNext(Iterators.java:895)
at
org.apache.phoenix.thirdparty.com.google.common.collect.Iterables.getFirst(Iterables.java:827)
at
org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
at
org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
at
org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:164)
... 9 more
Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated
token finished
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
... 15 more {code}
I have figured out that there is code in commons-csv which throws a
RuntimeException when it fails to parse a record, which is not handled by
Phoenix, as we only catch IOException.
See
[https://github.com/apache/commons-csv/blob/rel/commons-csv-1.0/src/main/java/org/apache/commons/csv/CSVParser.java#L398]
Also see
[https://github.com/apache/phoenix/blob/master/phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/FormatToBytesWritableMapper.java#L167]
This is undesirable; in the worst case, the job should skip the failed record
rather than fail as a whole. Note that we are passing --ignore-errors.
This bug is to fix this behavior.
> CsvBulkLoadTool fails for a bad record with "(startline 1) EOF reached before
> encapsulated token finished"
> ----------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7267
> URL: https://issues.apache.org/jira/browse/PHOENIX-7267
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.2.0, 5.1.3, 5.3.0
> Reporter: Nihal Jain
> Assignee: Nihal Jain
> Priority: Major
>
> We are trying to load data where there are a few bad records in some files,
> due to which mappers fail and hence the entire job fails with the following error:
> {code:java}
> Error: java.lang.RuntimeException: java.lang.RuntimeException:
> java.io.IOException: (startline 1) EOF reached before encapsulated token
> finished
> at
> org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:206)
> at
> org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:77)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF
> reached before encapsulated token finished
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
> at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> at
> org.apache.phoenix.thirdparty.com.google.common.collect.Iterators.getNext(Iterators.java:895)
> at
> org.apache.phoenix.thirdparty.com.google.common.collect.Iterables.getFirst(Iterables.java:827)
> at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
> at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
> at
> org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:164)
> ... 9 more
> Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated
> token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
> ... 15 more {code}
> I have figured out that there is code in commons-csv which throws a
> RuntimeException when it fails to parse a record, which in turn is not
> handled by Phoenix, as we only catch IOException.
> See
> [https://github.com/apache/commons-csv/blob/rel/commons-csv-1.0/src/main/java/org/apache/commons/csv/CSVParser.java#L398]
>
> Also see
> [https://github.com/apache/phoenix/blob/master/phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/FormatToBytesWritableMapper.java#L167]
>
> This is undesirable; in the worst case, the job should skip the failed record
> rather than fail as a whole. Note that we are passing --ignore-errors.
> This bug is to fix this behavior and figure out a way to handle the failed
> records so that the job can continue. We will also bump commons-csv to 1.10.0;
> it has been quite a while since we last bumped it, so it is better to upgrade here as well.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)