[jira] [Updated] (SPARK-7825) Poor performance in Cross Product due to no combine operations for small files.
[ https://issues.apache.org/jira/browse/SPARK-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tang Yan updated SPARK-7825:
---
    Affects Version/s:     (was: 1.3.1)
                           (was: 1.2.2)
                           (was: 1.2.1)
                           (was: 1.3.0)
                           (was: 1.2.0)

> Poor performance in Cross Product due to no combine operations for small files.
> ---
>
>                 Key: SPARK-7825
>                 URL: https://issues.apache.org/jira/browse/SPARK-7825
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Tang Yan
>
> When computing a cross product, if one table consists of many small files,
> Spark SQL has to run a very large number of tasks, which leads to poor
> performance. Hive, by contrast, has a CombineHiveInputFormat that can combine
> small files to reduce the number of tasks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7825) Poor performance in Cross Product due to no combine operations for small files.
Tang Yan created SPARK-7825:
---
             Summary: Poor performance in Cross Product due to no combine operations for small files.
                 Key: SPARK-7825
                 URL: https://issues.apache.org/jira/browse/SPARK-7825
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0
            Reporter: Tang Yan

When computing a cross product, if one table consists of many small files,
Spark SQL has to run a very large number of tasks, which leads to poor
performance. Hive, by contrast, has a CombineHiveInputFormat that can combine
small files to reduce the number of tasks.
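
To make the scale of the problem concrete, here is a minimal sketch in plain Python. It is not Spark or Hive code. It assumes that each small file becomes one input partition and that a cross product runs one task per pair of input partitions, as an RDD-style cartesian does. The helper names (`combine_splits`, `cross_product_tasks`), the file sizes, and the 128 MiB split limit are all hypothetical, and the greedy packing only illustrates the general idea of a combining input format rather than reproducing CombineHiveInputFormat.

```python
from typing import List


def combine_splits(file_sizes: List[int], max_split_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive files into splits of at most max_split_bytes.

    Hypothetical helper: illustrates the idea of combining small files,
    not the actual algorithm used by Hive's CombineHiveInputFormat.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Start a new split when adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


def cross_product_tasks(left_partitions: int, right_partitions: int) -> int:
    """Assumed task count: one task per pair of input partitions."""
    return left_partitions * right_partitions


MIB = 1024 * 1024
small_files = [1 * MIB] * 1000      # assumed: 1000 files of 1 MiB each
other_table_partitions = 10         # assumed partition count of the other table
max_split = 128 * MIB               # assumed combined-split size limit

# Without combining: one partition per small file.
uncombined = cross_product_tasks(len(small_files), other_table_partitions)

# With combining: small files are packed into far fewer splits first.
combined_splits = combine_splits(small_files, max_split)
combined = cross_product_tasks(len(combined_splits), other_table_partitions)

print(f"tasks without combining: {uncombined}")
print(f"tasks with combining:    {combined}")
```

Under these assumptions the number of tasks falls by the same factor as the number of input partitions on the small-file side, which is the effect the report attributes to Hive's combining input format.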