[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436150#comment-15436150 ]

Hyukjin Kwon commented on SPARK-17227:
--

Ah, SPARK-17222 is about multi-line records, but IMHO it might have been nicer to
collect these in a single JIRA, because I suspect a single PR would fix all
the listed JIRAs.

> Allow configuring record delimiter in csv
> -
>
> Key: SPARK-17227
> URL: https://issues.apache.org/jira/browse/SPARK-17227
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Robert Kruszewski
>Priority: Minor
>
> Instead of hard coded "\n"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436147#comment-15436147 ]

Hyukjin Kwon commented on SPARK-17227:
--

We may have to open a JIRA to deal with multi-line records first. The root cause is
the use of {{LineRecordReader}}; for the same reason, the JSON datasource also requires
one document per line. I have already removed the odd, inconsistent behavior in
{{CSVParser}} in the current master. So, if my understanding is correct, Spark's CSV
datasource does not support multi-line records at all.
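To make the multi-line problem concrete, here is a small Python sketch that uses the standard csv module, not Spark. The helper parse_line_by_line is hypothetical and only a rough analogy to a line-oriented reader such as {{LineRecordReader}}: it cuts the input at every newline before any CSV parsing happens, so a quoted field that contains a newline gets torn into two bogus rows. Handing the whole stream to a quote-aware parser (parse_whole_stream, also hypothetical) keeps the record intact.

```python
import csv
import io
from typing import List

# One header and two records; the first record's quoted field spans two lines.
DATA = 'id,comment\n1,"hello\nworld"\n2,bye\n'


def parse_line_by_line(text: str) -> List[List[str]]:
    """Split on newlines first, then parse every physical line on its own.

    Hypothetical helper that only imitates a line-oriented reader;
    it is not Spark code.
    """
    rows: List[List[str]] = []
    for line in text.split("\n"):
        if line:  # skip the empty string left after the trailing newline
            rows.extend(csv.reader([line]))
    return rows


def parse_whole_stream(text: str) -> List[List[str]]:
    """Give a quote-aware parser the whole stream, so it decides where
    each record ends (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(parse_line_by_line(DATA))  # the quoted record is torn in two
    print(parse_whole_stream(DATA))  # [['id', 'comment'], ['1', 'hello\nworld'], ['2', 'bye']]
```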


[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436140#comment-15436140 ]

Hyukjin Kwon commented on SPARK-17227:
--

Also, it would be great if the JIRA included an example of the problem so that it
can be reproduced and tested.


[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436139#comment-15436139 ]

Hyukjin Kwon commented on SPARK-17227:
--

If I remember correctly, we no longer use that {{rowSeparator}} (although it is
still passed to the parser), and I think it should be removed, since the CSV
datasource uses {{LineRecordReader}} to read each line.


[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435551#comment-15435551 ]

Sean Owen commented on SPARK-17227:
---

IMHO, making some of these things configurable seems uncontroversial. They're just
parameters that can be left at their defaults, and plenty of other libraries expose
them. Some may be harder to implement, and then the question is simply whether the
configurability costs something in complexity or speed.


[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435539#comment-15435539 ]

Andrew Ash commented on SPARK-17227:


Rob and I work together, and we've seen datasets in mostly-CSV format that use
non-standard record delimiters (the '\0' character, for instance).
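The request boils down to not treating "\n" as the only record separator. As an illustration outside Spark: Python's own csv reader is documented to recognize only '\r' or '\n' as end-of-line and to ignore its lineterminator setting, so this minimal sketch splits records itself before handing each one to csv.reader. split_records and parse_records are hypothetical helpers, not a Spark or Python API, and they assume well-formed CSV where quotes wrap whole fields.

```python
import csv
from typing import Iterator, List


def split_records(text: str, record_delimiter: str = "\n",
                  quotechar: str = '"') -> Iterator[str]:
    """Yield raw records separated by `record_delimiter` (hypothetical helper).

    Delimiters inside a quoted field are kept as data. A doubled quote ("")
    flips the in-quotes flag twice, so escaped quotes need no special case.
    Assumes well-formed CSV (quotes only wrap whole fields).
    """
    if not record_delimiter:
        raise ValueError("record_delimiter must be non-empty")
    in_quotes = False
    start = 0
    i = 0
    while i < len(text):
        if text[i] == quotechar:
            in_quotes = not in_quotes
            i += 1
        elif not in_quotes and text.startswith(record_delimiter, i):
            yield text[start:i]
            i += len(record_delimiter)
            start = i
        else:
            i += 1
    if start < len(text):
        # Last record had no trailing delimiter.
        yield text[start:]


def parse_records(text: str, record_delimiter: str = "\n",
                  delimiter: str = ",") -> List[List[str]]:
    """Split `text` into records, then let csv.reader split each record
    into fields and strip the surrounding quotes (hypothetical helper)."""
    rows: List[List[str]] = []
    for record in split_records(text, record_delimiter):
        rows.extend(csv.reader([record], delimiter=delimiter))
    return rows


if __name__ == "__main__":
    # NUL-delimited records; the last one has a quoted field containing a newline.
    data = 'id,name\x001,"Smith, J."\x002,"multi\nline"\x00'
    for row in parse_records(data, record_delimiter="\x00"):
        print(row)
```

The key design point is that the record splitter tracks quote state; a plain text.split(record_delimiter) would end a record early whenever the delimiter appears inside a quoted field. The sketch still inherits a csv.reader limitation: an unquoted newline inside a NUL-delimited record raises an error.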

For broader context, we've built our own CSV text parser and use it in all of our
internal products that use Spark, but we would like to contribute this additional
flexibility back to the wider Spark community and, in the process, eliminate the
need for our internal CSV datasource.

Here are the tickets Rob just opened; we would need them resolved before we could
retire our internal CSV datasource:

SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227

The basic question, then, is whether the Spark community would accept patches that
extend Spark's CSV parser to cover these features. We're willing to write the
code and get the patches through code review, but would rather know up front if
these changes would never be accepted into mainline Spark due to philosophical
disagreements about what Spark's CSV datasource should be.
