[ https://issues.apache.org/jira/browse/SYSTEMML-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867204#comment-15867204 ]

Matthias Boehm commented on SYSTEMML-1244:
------------------------------------------

Just to clarify, there were two issues: (1) tokens that concatenate quoted 
tokens (in the sense of RFC 4180) with non-quoted tokens were split after the 
last quote, and (2) frame metadata was parsed incorrectly. 

We have now made the related split and count functionality more robust with 
regard to these special cases, without sacrificing performance for the common 
case without quotes. 
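
For illustration only, here is a minimal sketch of a quote-aware split with a 
fast path for lines without quotes. It is not the actual SystemML code: the 
class name QuoteAwareSplit and the method split are hypothetical, and the 
sketch assumes that the delimiter does not itself contain a double quote. 
Quotes are kept in the returned tokens, and an escaped quote ("") inside a 
quoted token simply toggles the quote state twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the SystemML implementation.
final class QuoteAwareSplit {

    // Splits line on delim, ignoring delimiters that appear inside double quotes.
    // Tokens are returned verbatim, including their quotes.
    static List<String> split(String line, String delim) {
        List<String> tokens = new ArrayList<>();
        if (delim.isEmpty()) {
            tokens.add(line);
            return tokens;
        }

        // Fast path for the common case: no quotes at all, plain indexOf scan.
        if (line.indexOf('"') < 0) {
            int from = 0;
            int pos;
            while ((pos = line.indexOf(delim, from)) >= 0) {
                tokens.add(line.substring(from, pos));
                from = pos + delim.length();
            }
            tokens.add(line.substring(from));
            return tokens;
        }

        // Slow path: track whether the scan position is inside a quoted region.
        boolean inQuotes = false;
        int start = 0;
        int i = 0;
        while (i < line.length()) {
            if (line.charAt(i) == '"') {
                inQuotes = !inQuotes; // "" toggles twice, so the state is unchanged
                i++;
            } else if (!inQuotes && line.startsWith(delim, i)) {
                tokens.add(line.substring(start, i));
                i += delim.length();
                start = i;
            } else {
                i++;
            }
        }
        tokens.add(line.substring(start));
        return tokens;
    }
}
```

With this sketch, a token such as "a,b"c (a quoted part followed by a 
non-quoted part) stays in one piece when splitting on a comma, instead of being 
cut after the last quote.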

[~acs_s] would you mind closing your related PR?

> FrameReader with CSV format have issues due to double quotes in some cases
> --------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1244
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1244
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Arvind Surve
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.13
>
>
> This is an example of input data.
> It has three columns with TAB as the field separator.
> "20news-bydate-train/alt.atheism/49960" """"    88.0
> "20news-bydate-train/alt.atheism/49960" "#"     1.0
> A couple of observations so far:
>   1. A double quote is treated as part of the input.
>   2. The next double quote is treated as the end of the input field.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
