Nico Kruber created FLINK-21562:
-----------------------------------

             Summary: Add more informative message on CSV parsing errors
                 Key: FLINK-21562
                 URL: https://issues.apache.org/jira/browse/FLINK-21562
             Project: Flink
          Issue Type: Improvement
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / API
    Affects Versions: 1.11.3
            Reporter: Nico Kruber


I was parsing a CSV file that contains comments and set
{{'csv.allow-comments' = 'true'}} in the table DDL, deliberately without
{{'csv.ignore-parse-errors' = 'true'}}, so that actual format errors would not
be hidden.
Since my table does not consist of string columns only, parsing failed on the
commented-out line with the following error:

{code}
2021-02-16 17:45:53,055 WARN  org.apache.flink.runtime.taskmanager.Task [] - Source: TableSourceScan(table=[[default_catalog, default_database, airports]], fields=[IATA_CODE, AIRPORT, CITY, STATE, COUNTRY, LATITUDE, LONGITUDE]) -> SinkConversionToTuple2 -> Sink: SQL Client Stream Collect Sink (1/1)#0 (9f3a3965f18ed99ee42580bdb559ba66) switched from RUNNING to FAILED.
java.io.IOException: Failed to deserialize CSV row.
        at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:257) ~[flink-csv-1.12.1.jar:1.12.1]
        at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:162) ~[flink-csv-1.12.1.jar:1.12.1]
        at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
Caused by: java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) ~[?:1.8.0_275]
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:1.8.0_275]
        at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_275]
        at org.apache.flink.formats.csv.CsvToRowDataConverters.convertToDouble(CsvToRowDataConverters.java:203) ~[flink-csv-1.12.1.jar:1.12.1]
        at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createNullableConverter$ac6e531e$1(CsvToRowDataConverters.java:113) ~[flink-csv-1.12.1.jar:1.12.1]
        at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createRowConverter$18bb1dd$1(CsvToRowDataConverters.java:98) ~[flink-csv-1.12.1.jar:1.12.1]
        at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:251) ~[flink-csv-1.12.1.jar:1.12.1]
        ... 5 more
{code}
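
For reference, the "Caused by" part of the trace can be reproduced in isolation:
as the frames show, the message comes straight from {{Double.parseDouble}} being
called on an empty string. The following is only a minimal sketch; the class and
method names are made up for illustration and are not part of Flink.

```java
// Minimal reproduction of the root cause shown in the trace.
// EmptyStringParseDemo and parseFailureMessage are hypothetical names.
class EmptyStringParseDemo {

    /**
     * Returns the exception message produced when parsing the given text
     * as a double, or null if parsing succeeds.
     */
    static String parseFailureMessage(String text) {
        try {
            Double.parseDouble(text);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints the same message as the "Caused by" line in the trace.
        System.out.println(parseFailureMessage(""));
    }
}
```

The message names neither the field nor the input line, which is why it is of
little help when looking for the offending record.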

Two things should be improved here:
# commented-out lines should be ignored by default (potentially, FLINK-17133 
addresses this or at least gives the user the power to do so)
# the error message itself is not very informative: "empty String".

This ticket is about the latter. I suggest adding at least a few more pointers
that help the user locate the source of the problem in the CSV file/item/...:
here, the data type could simply be wrong, or the CSV file itself may be
wrong/corrupted, and the user would need to investigate.
What exactly helps here probably depends on the input connector the format is
currently working with. For example, the line number in a CSV file would be
best; where that is not available, we could show the whole line or at least
a few surrounding fields...
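
The suggestion above could look roughly like the following sketch, which wraps
the conversion failure and adds the line number, field name, raw value and raw
line to the message. This is not the actual Flink code: the class, the method
signatures and the sample values (line 3, the comment text) are assumptions made
purely for illustration, and whether a line number is available depends on the
connector.

```java
import java.io.IOException;

// Sketch only: none of these names exist in Flink; they are hypothetical.
class CsvErrorContextSketch {

    /** Builds an error message that points at the offending field and line. */
    static String describeFailure(
            long lineNumber, String fieldName, String rawValue, String rawLine, Throwable cause) {
        return String.format(
                "Failed to deserialize CSV row at line %d: cannot convert value '%s' "
                        + "of field '%s' (%s). Raw line: '%s'",
                lineNumber, rawValue, fieldName, cause.getMessage(), rawLine);
    }

    /** Converts one field to a double, rethrowing parse failures with context. */
    static double convertToDouble(
            long lineNumber, String fieldName, String rawValue, String rawLine)
            throws IOException {
        try {
            return Double.parseDouble(rawValue);
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause so no detail is lost.
            throw new IOException(
                    describeFailure(lineNumber, fieldName, rawValue, rawLine, e), e);
        }
    }

    public static void main(String[] args) {
        try {
            // Made-up input: an empty LATITUDE value on a commented-out line.
            convertToDouble(3, "LATITUDE", "", "# a commented-out line");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Even without a line number, printing the raw line and the field name would
probably be enough to find the record in most files.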



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
