[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435539#comment-15435539 ]
Andrew Ash commented on SPARK-17227:
------------------------------------

Rob and I work together, and we've seen datasets in mostly-CSV format that have non-standard record delimiters (the '\0' character, for instance). For some broader context: we've written our own CSV text parser and use it in all of our internal products that use Spark, but we'd like to contribute this additional flexibility back to the Spark community at large and, in the process, eliminate the need for our internal CSV datasource.

Here are the tickets Rob just opened that we would need in order to retire our internal CSV datasource:

SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227

The basic question, then, is: would the Spark community accept patches that extend Spark's CSV parser to cover these features? We're willing to write the code and get the patches through code review, but we'd rather know up front whether these changes would never be accepted into mainline Spark due to philosophical disagreements about what Spark's CSV datasource should be.

> Allow configuring record delimiter in csv
> -----------------------------------------
>
>                 Key: SPARK-17227
>                 URL: https://issues.apache.org/jira/browse/SPARK-17227
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Robert Kruszewski
>            Priority: Minor
>
> Instead of the hard-coded "\n"
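For readers outside this thread, here is a minimal sketch of what a configurable record delimiter means in practice. The function and parameter names are illustrative only, not Spark's API; it just shows splitting CSV text on '\0' instead of the hard-coded "\n":

```python
def parse_records(raw: str, record_delim: str = "\n") -> list[list[str]]:
    """Split raw CSV text into rows using a configurable record delimiter.

    Toy illustration of the flexibility requested in SPARK-17227;
    a real parser would also handle quoting, escapes, and field delimiters.
    """
    return [row.split(",") for row in raw.split(record_delim) if row]

# Records separated by the NUL character instead of newlines:
raw = "a,b,c\0d,e,f\0g,h,i"
print(parse_records(raw, record_delim="\0"))
# → [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
```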