[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336848#comment-17336848
 ] 

Flink Jira Bot commented on FLINK-4785:
---

This issue was labeled "stale-major" 7 days ago and has not received any 
updates, so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Flink string parser doesn't handle string fields containing two consecutive 
> double quotes
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-4785
>                 URL: https://issues.apache.org/jira/browse/FLINK-4785
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataSet
>    Affects Versions: 1.1.2
>            Reporter: Flavio Pompermaier
>            Priority: Major
>              Labels: csv, stale-major
>
> To reproduce the error run 
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/Csv2RowExample.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17328892#comment-17328892
 ] 

Flink Jira Bot commented on FLINK-4785:
---

This major issue is unassigned, and neither it nor any of its Sub-Tasks has 
been updated for 30 days, so it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.



[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2019-04-08 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812955#comment-16812955
 ] 

Liya Fan commented on FLINK-4785:
-

[~f.pompermaier] Thanks for the information. I can access it now, too.

I think this is a known issue. Flink does not fully support the standard CSV 
file format as specified by the RFC, so it does not handle some special cases 
very well, such as commas inside quotes, doubled double quotes, etc.

For more details, please see https://issues.apache.org/jira/browse/FLINK-10684

> Flink string parser doesn't handle string fields containing two consecutive 
> double quotes
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-4785
>                 URL: https://issues.apache.org/jira/browse/FLINK-4785
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataSet
>    Affects Versions: 1.1.2
>            Reporter: Flavio Pompermaier
>            Priority: Major
>              Labels: csv
>
> To reproduce the error run 
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/Csv2RowExample.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2019-04-08 Thread Flavio Pompermaier (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1681#comment-1681
 ] 

Flavio Pompermaier commented on FLINK-4785:
---

Which URL is no longer valid? This one?
[https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/Csv2RowExample.java]

I can still access it. I don't know whether this problem has been solved, but 
I don't think it has (you should try running that main class).



[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2019-04-07 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812098#comment-16812098
 ] 

Liya Fan commented on FLINK-4785:
-

Hi [~f.pompermaier], it seems the URL is no longer valid.

Does this problem still exist?


[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2017-03-10 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905787#comment-15905787
 ] 

Fabian Hueske commented on FLINK-4785:
--

Oh, yes. Sorry for the confusion. I'll delete my comment.

> Flink string parser doesn't handle string fields containing two consecutive 
> double quotes
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-4785
>                 URL: https://issues.apache.org/jira/browse/FLINK-4785
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.2
>            Reporter: Flavio Pompermaier
>              Labels: csv
>
> To reproduce the error run 
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2017-03-10 Thread Luke Hutchison (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905779#comment-15905779
 ] 

Luke Hutchison commented on FLINK-4785:
---

I'm pretty sure I have seen backslash escaping in CSV before, but the 
old-school way of quoting quote characters (double double quotes) is the one 
that made it into the RFC, presumably for backwards compatibility with 
spreadsheets.

Fabian -- you copied the text from the wrong bug report, 
https://issues.apache.org/jira/browse/FLINK-6016 , rather than 
https://issues.apache.org/jira/browse/FLINK-6107 , which is:

--

The RFC for the CSV format specifies that double quotes are valid in quoted 
strings in CSV, by doubling the quote character:

https://tools.ietf.org/html/rfc4180

However, when parsing a CSV file with Flink containing quoted quotes, such as:

bob,"The name is ""Bob"""

you get this exception:

org.apache.flink.api.common.io.ParseException: Line could not be parsed: 
'bob,"The name is ""Bob"""'
ParserError UNQUOTED_CHARS_AFTER_QUOTED_STRING 
Expect field types: class java.lang.String, class java.lang.String
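
For illustration, the failure mode above can be reproduced with a toy scanner 
that treats every quote character inside a quoted field as the closing quote. 
This is a simplified stand-in written for this thread, not Flink's 
StringParser; the class name, method name and enum are hypothetical, and only 
the two error names are taken from the messages quoted in this issue.

```java
/** Toy model only. This is NOT Flink's StringParser; all names are hypothetical. */
final class NaiveQuotedFieldScanner {

    enum Result { OK, UNTERMINATED_QUOTED_STRING, UNQUOTED_CHARS_AFTER_QUOTED_STRING }

    /** Scans one line without any support for doubled (escaped) quote characters. */
    static Result scan(String line, char delimiter, char quote) {
        int i = 0;
        while (i < line.length()) {
            if (line.charAt(i) == quote) {
                // Quoted field: the next quote character is assumed to close it.
                int close = line.indexOf(quote, i + 1);
                if (close < 0) {
                    return Result.UNTERMINATED_QUOTED_STRING;
                }
                i = close + 1;
                // Anything other than a delimiter (or end of line) is an error.
                if (i < line.length() && line.charAt(i) != delimiter) {
                    return Result.UNQUOTED_CHARS_AFTER_QUOTED_STRING;
                }
            } else {
                // Unquoted field: skip ahead to the next delimiter or end of line.
                int next = line.indexOf(delimiter, i);
                i = (next < 0) ? line.length() : next;
            }
            if (i < line.length()) {
                i++; // step over the delimiter
            }
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        System.out.println(scan("bob,\"The name is \"\"Bob\"\"\"", ',', '"'));
        System.out.println(scan("bob,\"The name is Bob\"", ',', '"'));
    }
}
```

In this model the first of the two doubled quotes before Bob is taken as the 
end of the field, and the quote that follows it is neither a delimiter nor the 
end of the line, hence the error.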



[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2017-03-10 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905696#comment-15905696
 ] 

Fabian Hueske commented on FLINK-4785:
--

Copying the issue description of FLINK-6017

{quote}
The RFC for the CSV format specifies that newlines are valid in quoted strings 
in CSV:

https://tools.ietf.org/html/rfc4180

However, when parsing a CSV file with Flink containing a newline, such as:

"3
4",5

you get this exception:

Line could not be parsed: '"3'
ParserError UNTERMINATED_QUOTED_STRING 
Expect field types: class java.lang.String, class java.lang.String 
{quote}
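
As a rough sketch of what handling this case involves, a reader can join 
physical lines into one logical record until the quote characters are 
balanced. The class and method names below are hypothetical and this is not 
Flink code. It assumes quote characters occur only as field enclosures or as 
doubled escapes, as in RFC 4180, so an odd running count means the record 
continues on the next line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only, not Flink code; names are hypothetical. */
final class QuotedNewlineRecords {

    /** Joins physical lines into logical CSV records, keeping newlines inside quotes. */
    static List<String> joinRecords(List<String> physicalLines, char quote) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (String line : physicalLines) {
            if (insideQuotes) {
                // The previous line ended inside a quoted field: restore the newline.
                current.append('\n');
            }
            current.append(line);
            // Each quote character toggles the state; doubled quotes toggle twice.
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == quote) {
                    insideQuotes = !insideQuotes;
                }
            }
            if (!insideQuotes) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (insideQuotes) {
            throw new IllegalArgumentException("unterminated quoted string");
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = joinRecords(Arrays.asList("\"3", "4\",5", "a,b"), '"');
        System.out.println(records.size() + " records");
    }
}
```

With the example above, the two physical lines become a single record whose 
first field contains the newline, and splitting into fields can then proceed 
per record.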



[jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes

2016-10-14 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575710#comment-15575710
 ] 

Fabian Hueske commented on FLINK-4785:
--

The {{StringParser}} can parse strings enclosed by a quote character (usually a 
double quote). This is required to parse strings that contain a field delimiter 
character. Otherwise we could not parse a line like this:

{quote}
12,2.45,"I am a string with a field delimiter, right?",12
{quote}

The problem is that the {{StringParser}} does not support escaping the quote 
character. In CSV files where a double quote is used as the quote character, 
this is usually done by doubling the double quote, like this:

{quote}
12,2.45,"Bill said to Bob: ""Hi!"".",12
{quote}

When we rewrote the {{StringParser}} a while back, we decided not to support 
double double quotes for three reasons: no users were requesting support for 
it, it simplified the parser logic, and it kept the configuration options of 
the CsvInputFormat concise (there are already quite a few).
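
For reference, the extra logic needed for double double quotes is a one 
character lookahead inside a quoted field. The sketch below is a standalone, 
hypothetical line parser, not the {{StringParser}} and not a proposed patch. 
It is lenient about characters that follow a closing quote, which a real 
implementation would probably want to reject.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch; hypothetical names, not part of Flink. */
final class Rfc4180LineParser {

    /** Splits one line into fields, honoring quoted fields and doubled quotes. */
    static List<String> parseLine(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    // Lookahead: a second quote right after means an escaped quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote);
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // a single quote closes the field
                } else {
                    field.append(c); // delimiters are plain characters in here
                }
            } else if (c == quote && field.length() == 0) {
                inQuotes = true; // opening quote at the start of a field
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
            i++;
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted string");
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "12,2.45,\"Bill said to Bob: \"\"Hi!\"\".\",12";
        for (String f : parseLine(line, ',', '"')) {
            System.out.println(f);
        }
    }
}
```

Both example lines from this comment yield four fields with this sketch, and 
the third field of the second line comes back with single double quotes 
around Hi!.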




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)