Hyukjin Kwon created SPARK-13184:
------------------------------------

             Summary: Support minPartitions parameter for JSON and CSV datasources as options
                 Key: SPARK-13184
                 URL: https://issues.apache.org/jira/browse/SPARK-13184
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Hyukjin Kwon
            Priority: Minor


After looking through the following pull requests and issues for the Spark CSV datasource,

https://github.com/databricks/spark-csv/pull/256
https://github.com/databricks/spark-csv/issues/141
https://github.com/databricks/spark-csv/pull/186

It looks like Spark might need a way to set {{minPartitions}} for these datasources.

{{repartition()}} or {{coalesce()}} could serve as alternatives, but it seems they would need to shuffle the data in most cases.

Although I am still not sure whether this is needed, I am opening this ticket for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)