[jira] [Commented] (SPARK-13184) Support minPartitions parameter for JSON and CSV datasources as options

Reynold Xin (JIRA) Wed, 25 May 2016 16:26:41 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301057#comment-15301057
 ]


Reynold Xin commented on SPARK-13184:
-------------------------------------

Rather than doing this specifically for JSON or CSV, it might make more sense 
to have data-source-specific options override the default options. That is, 
we can build the data source options by capturing the current SQLConf 
settings and then applying the user-specified options on top. Then, in the 
data source execution code, rather than reading 
sparkSession.sessionState.conf.xyz, we get the value from the combined options.
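
A minimal sketch of the layering described above, in plain Python rather than 
Spark code. The function name combine_options and the keys and values used 
here are hypothetical, chosen only for illustration; the point is just that 
session-level defaults are captured first and user-specified options are 
applied last, so the user's value wins when both define the same key.

```python
from typing import Dict, Mapping


def combine_options(session_conf: Mapping[str, str],
                    user_options: Mapping[str, str]) -> Dict[str, str]:
    """Overlay per-read user options on a snapshot of session defaults.

    Hypothetical helper for illustration; this is not a Spark API.
    """
    combined = dict(session_conf)   # start from the captured session settings
    combined.update(user_options)   # user-specified options take precedence
    return combined


# Hypothetical keys and values, for illustration only.
session_conf = {"minPartitions": "8", "compression": "none"}
user_options = {"minPartitions": "32"}

options = combine_options(session_conf, user_options)

# Execution code would read from the combined map, not from the session conf.
min_partitions = int(options["minPartitions"])
```

With this shape, a data source never needs to know whether a value came from 
the session or from the caller; it only consults the combined map.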


> Support minPartitions parameter for JSON and CSV datasources as options
> -----------------------------------------------------------------------
>
>                 Key: SPARK-13184
>                 URL: https://issues.apache.org/jira/browse/SPARK-13184
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> After looking through the pull requests and issue below for the Spark CSV datasource,
> https://github.com/databricks/spark-csv/pull/256
> https://github.com/databricks/spark-csv/issues/141
> https://github.com/databricks/spark-csv/pull/186
> It looks like Spark might need to be able to set {{minPartitions}}.
> {{repartition()}} or {{coalesce()}} can be alternatives, but it looks like 
> they need to shuffle the data in most cases.
> Although I am still not sure whether this is needed, I am opening this 
> ticket just for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
