[ 
https://issues.apache.org/jira/browse/FLINK-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949378#comment-16949378
 ] 

Jingsong Lee commented on FLINK-14266:
--------------------------------------

Thanks [~fhueske], I think there are two choices:
 # Extend DelimitedInputFormat and use CsvRowDeserializationSchema to 
deserialize the bytes given by offset and numBytes; selectedFields needs to be 
handled too. DelimitedInputFormat already has the split logic to deal with 
half lines. But as Fabian said, we do not know whether the next newline 
character is a record delimiter or part of a string field.
 # Use Jackson's ObjectReader.readValues(InputStream). The difficulties are:
 ## ObjectReader does not know the current read offset, because it buffers 
ahead and caches more bytes. One solution is to use a BoundedInputStream. But 
we still need to read the unfinished line at the end of the split, so we first 
need to adjust splitLength to the correct end position, based on the line 
delimiter and escapeChar.
 ## We also need to correctly identify the line separator when we start 
reading. If the first char is a line separator, the character before it may be 
an escape character. We need to handle these cases carefully.
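
To illustrate the ambiguity in the first choice, here is a minimal sketch (not 
Flink code) of a quote-aware scan for the end of a record. The class and method 
names are hypothetical, and it assumes RFC 4180 style quoting, where a quote 
inside a quoted field is written as a doubled quote and there is no separate 
escapeChar. The scan only gives the right answer when it starts at a true 
record start. Started in the middle of a quoted field, it mistakes the embedded 
newline for a record delimiter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Flink or Jackson.
public class CsvRecordBoundary {

    /**
     * Returns the index just past the delimiter that ends the record starting
     * at recordStart, or data.length if no delimiter is found.
     *
     * Assumption: recordStart is a true record start (outside any quoted
     * field), and quotes inside quoted fields are doubled as in RFC 4180.
     */
    public static int findRecordEnd(byte[] data, int recordStart, byte quote, byte delimiter) {
        boolean inQuotes = false;
        for (int i = recordStart; i < data.length; i++) {
            byte b = data[i];
            if (b == quote) {
                // A doubled quote toggles twice, so the state is unchanged after it.
                inQuotes = !inQuotes;
            } else if (b == delimiter && !inQuotes) {
                return i + 1;
            }
        }
        return data.length;
    }

    public static void main(String[] args) {
        // Two records; the first has a newline inside a quoted field.
        byte[] data = "1,\"a\nb\"\n2,c\n".getBytes(StandardCharsets.US_ASCII);
        byte quote = (byte) '"';
        byte newline = (byte) '\n';

        // Scanning from a real record start skips the quoted newline.
        System.out.println(findRecordEnd(data, 0, quote, newline));
        // Scanning from inside the quoted field stops at the embedded newline.
        System.out.println(findRecordEnd(data, 3, quote, newline));
    }
}
```

With this input the first call returns the start of the second record, while 
the second call stops right after the newline inside the quoted field. That is 
why a split that begins at an arbitrary byte offset cannot decide locally where 
the next record starts.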

> Introduce RowCsvInputFormat to new CSV module
> ---------------------------------------------
>
>                 Key: FLINK-14266
>                 URL: https://issues.apache.org/jira/browse/FLINK-14266
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / FileSystem
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently we have an old CSV format, but it does not support standard CSV. 
> We should support the RFC-compliant CSV format for Table/SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
