[ https://issues.apache.org/jira/browse/SPARK-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520007#comment-16520007 ]
Hyukjin Kwon commented on SPARK-13184:
--------------------------------------

I think this is close to obsolete, because Spark introduced merging small files into fewer partitions ... I actually think it's not an issue anymore within Spark. Let me know if I am mistaken and I will reopen it.

> Support minPartitions parameter for JSON and CSV datasources as options
> -----------------------------------------------------------------------
>
>                 Key: SPARK-13184
>                 URL: https://issues.apache.org/jira/browse/SPARK-13184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> After looking through the pull requests below at Spark CSV datasources,
> https://github.com/databricks/spark-csv/pull/256
> https://github.com/databricks/spark-csv/issues/141
> https://github.com/databricks/spark-csv/pull/186
> It looks like Spark might need to be able to set {{minPartitions}}.
> {{repartition()}} or {{coalesce()}} can be alternatives, but it looks like they need
> to shuffle the data in most cases.
> Although I am still not sure whether this is needed, I am opening this ticket just
> for discussion.
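
For readers unfamiliar with the small-file merging the comment refers to, here is a rough sketch of the idea in plain Python. It is a simplified model, not Spark's code: the function name `pack_files` and its parameters are hypothetical, and the two size limits are assumed to play the role of the `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` settings, which the comment itself does not name.

```python
# Simplified model of packing input files into read partitions.
# Assumption: this only approximates the behaviour the comment alludes to;
# it is not taken from Spark's source.

MB = 1024 * 1024


def pack_files(file_sizes, max_split_bytes, open_cost_bytes):
    """Return a list of partitions, each a list of chunk sizes in bytes."""
    # Cut any file larger than max_split_bytes into chunks of at most that size.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            length = min(max_split_bytes, size - offset)
            chunks.append(length)
            offset += length

    # Place the largest chunks first, then fill partitions greedily.
    chunks.sort(reverse=True)

    partitions = []
    current = []
    current_size = 0
    for length in chunks:
        # Close the current partition if this chunk would overflow it.
        if current and current_size + length > max_split_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(length)
        # Each chunk is charged a fixed "open cost" on top of its length,
        # so a partition cannot absorb an unbounded number of tiny files.
        current_size += length + open_cost_bytes
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # 100 files of 1 MB each, a 128 MB partition limit, and a 4 MB open cost.
    parts = pack_files([1 * MB] * 100, 128 * MB, 4 * MB)
    print(len(parts), [len(p) for p in parts])
```

Under these assumed limits, many small files end up sharing a partition at read time without a shuffle, which is presumably why a separate {{minPartitions}} option looked unnecessary.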