[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436150#comment-15436150 ]

Hyukjin Kwon commented on SPARK-17227:
--------------------------------------

Ah, SPARK-17222 is about multiple lines, but IMHO it might have been nicer to summarize those in a single JIRA, because I guess a single PR would fix all the listed JIRAs.

> Allow configuring record delimiter in csv
> -----------------------------------------
>
>                 Key: SPARK-17227
>                 URL: https://issues.apache.org/jira/browse/SPARK-17227
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Robert Kruszewski
>            Priority: Minor
>
> Instead of hard coded "\n"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436147#comment-15436147 ]

Hyukjin Kwon commented on SPARK-17227:
--------------------------------------

We may have to open a JIRA to deal with multi-line records first. The root cause is the use of {{LineRecordReader}}; for the same reason, the JSON datasource also requires each document to be on a single line. I got rid of the weird inconsistent behavior in {{CSVParser}} in current master anyway. So, if my understanding is correct, Spark's CSV datasource does not support multi-line records at all.
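The point above about line-based reading can be illustrated outside Spark. The following is a minimal sketch in plain Python, not Spark's implementation: the function names and the sample data are hypothetical, and Python's standard csv module stands in for the CSV parser. It compares splitting the input on "\n" before parsing with handing the whole input to the parser.

```python
import csv
import io

# Hypothetical sample: a header record plus one data record whose quoted
# field contains an embedded newline.
data = 'id,comment\n1,"first line\nsecond line"\n'


def parse_line_by_line(text):
    """Split on "\\n" first, then parse each line independently.

    This only mimics the effect of a line-based record reader feeding a
    CSV parser; it is not how Spark is implemented.
    """
    rows = []
    for line in text.split("\n"):
        if line:
            rows.extend(csv.reader([line]))
    return rows


def parse_as_stream(text):
    """Let the CSV parser see the whole input, so a quoted field can
    continue across the embedded newline."""
    return list(csv.reader(io.StringIO(text, newline="")))


line_rows = parse_line_by_line(data)
stream_rows = parse_as_stream(data)
print(len(line_rows), len(stream_rows))
```

In this sketch the line-by-line variant breaks the quoted field into two separate rows, while the stream variant keeps it as one field of one record. Once the input has been cut on "\n", the parser no longer has the context it needs to rejoin the field.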
[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436140#comment-15436140 ]

Hyukjin Kwon commented on SPARK-17227:
--------------------------------------

Also, it would be great if the JIRA had an example and a problem description so that this can be tested and reproduced.
[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436139#comment-15436139 ]

Hyukjin Kwon commented on SPARK-17227:
--------------------------------------

If I remember correctly, we are not using {{rowSeparator}} anymore (although it is still set on the parser), and I think it should be removed, since the CSV datasource uses {{LineRecordReader}} to read each line.
[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435551#comment-15435551 ]

Sean Owen commented on SPARK-17227:
-----------------------------------

Making some things simply configurable seems uncontroversial. They're just parameters that can be left at the default, and plenty of other libraries do that. Some may be harder to fix, and IMHO the question will just be whether the configurability costs something in complexity or speed.
[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435539#comment-15435539 ]

Andrew Ash commented on SPARK-17227:
------------------------------------

Rob and I work together, and we've seen datasets in mostly-CSV format that have non-standard record delimiters (the '\0' character, for instance).

For some broader context, we've created our own CSV text parser and use it in all our internal products that use Spark, but we would like to contribute this additional flexibility back to the Spark community at large and, in the process, eliminate the need for our internal CSV datasource.

Here are the tickets Rob just opened that we would need resolved in order to eliminate our internal CSV datasource:

SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227

The basic question, then, is: would the Spark community accept patches that extend Spark's CSV parser to cover these features? We're willing to write the code and get the patches through code review, but we would rather know up front if these changes would never be accepted into mainline Spark due to philosophical disagreements about what Spark's CSV datasource should be.
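For readers who want a concrete picture of the request, here is a rough sketch in plain Python of what a configurable record delimiter could look like. It does not use Spark's API or the internal parser mentioned above. The function name and both parameters are hypothetical, and the naive split assumes the record delimiter never appears inside a quoted field.

```python
import csv


def read_delimited_records(text, record_delimiter="\n", field_delimiter=","):
    """Split text into records on record_delimiter, then parse each
    record as CSV.

    Assumption: record_delimiter does not occur inside quoted fields, so
    a plain string split is enough. Empty trailing records are dropped.
    """
    records = [r for r in text.split(record_delimiter) if r]
    return list(csv.reader(records, delimiter=field_delimiter))


# Hypothetical input using '\0' between records instead of '\n'.
nul_rows = read_delimited_records("a,b\0c,d\0", record_delimiter="\0")

# Leaving the parameter at its default keeps the usual newline behavior.
default_rows = read_delimited_records("a,b\nc,d\n")
print(nul_rows == default_rows)
```

Leaving record_delimiter at its default preserves the newline-separated behavior, so the extra parameter changes nothing for callers that do not set it.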