[ https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621935#comment-14621935 ]
Fei Wang commented on SPARK-8968:
---------------------------------

changed, how about this?

> dynamic partitioning in Spark SQL performance issue due to the high GC overhead
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-8968
>                 URL: https://issues.apache.org/jira/browse/SPARK-8968
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Fei Wang
>
> Dynamic partitioning currently shows poor performance on large data sets because of GC/memory overhead. Each task opens one writer per output partition it encounters, which produces many small files and heavy GC pressure. We can instead shuffle the data by the partition columns so that each partition is written by a single task; each partition then gets only one output file, which also reduces the GC overhead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
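A minimal sketch of the idea proposed in the issue, expressed with the public DataFrame API rather than the internal writer code the patch would actually touch. The input path, output path, DataFrame name `df`, and the `year`/`month` partition columns are illustrative assumptions, and the example uses the later `SparkSession` entry point for readability (the issue targets 1.4.0, which used `SQLContext`/`HiveContext`):

```scala
// Sketch only: illustrates the "shuffle by partition columns before
// writing" idea from the comment. All paths and column names are
// hypothetical.
import org.apache.spark.sql.SparkSession

object DynamicPartitionWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-demo")
      .getOrCreate()

    val df = spark.read.parquet("/data/events") // hypothetical input

    // Without a preceding shuffle, every task may hold one open writer
    // per distinct partition value it sees, producing many small files
    // and high GC pressure. Repartitioning by the partition columns
    // first routes all rows of a given (year, month) pair to the same
    // task, so each dynamic partition is written by one task and ends
    // up with far fewer files.
    df.repartition(df("year"), df("month"))
      .write
      .partitionBy("year", "month")
      .parquet("/data/events_partitioned") // hypothetical output

    spark.stop()
  }
}
```

The trade-off, under these assumptions, is an extra shuffle before the write in exchange for fewer open writers per task and fewer small output files.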