[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435539#comment-15435539 ]

Andrew Ash commented on SPARK-17227:
------------------------------------

Rob and I work together, and we've seen datasets in mostly-CSV format that use 
non-standard record delimiters (the '\0' character, for instance).
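To illustrate the idea (a standalone sketch, not Spark code): records separated by a configurable delimiter such as '\0' can still be field-parsed normally once the record boundary is configurable.

```python
import csv
import io

def parse_records(data, record_delim="\n"):
    # Split raw text on the configurable record delimiter (the piece
    # Spark currently hard-codes as "\n"), then parse each record's
    # fields with the stdlib csv reader.
    records = [r for r in data.split(record_delim) if r]
    return [next(csv.reader(io.StringIO(r))) for r in records]

# Records separated by NUL instead of newline.
data = "a,b,c\0d,e,f\0"
print(parse_records(data, "\0"))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```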

For some broader context: we've built our own CSV text parser, used across all 
our internal products that run on Spark, but we'd like to contribute this 
additional flexibility back to the Spark community at large and, in the 
process, eliminate the need for our internal CSV datasource.

Here are the tickets Rob just opened that we would require to eliminate our 
internal CSV datasource:

SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227

The basic question, then, is whether the Spark community would accept patches 
that extend Spark's CSV parser to cover these features. We're willing to write 
the code and shepherd the patches through code review, but we'd rather know up 
front if these changes would never be accepted into mainline Spark due to 
philosophical disagreements about what Spark's CSV datasource should be.

> Allow configuring record delimiter in csv
> -----------------------------------------
>
>                 Key: SPARK-17227
>                 URL: https://issues.apache.org/jira/browse/SPARK-17227
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Robert Kruszewski
>            Priority: Minor
>
> Instead of the hard-coded "\n"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
