Tang Yan created SPARK-7825:
-------------------------------

             Summary: Poor performance in Cross Product due to no combine operations for small files.
                 Key: SPARK-7825
                 URL: https://issues.apache.org/jira/browse/SPARK-7825
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0
            Reporter: Tang Yan

When computing a cross product, if one table consists of many small files, Spark SQL has to run a correspondingly large number of tasks, which leads to poor performance. Hive, by contrast, has CombineHiveInputFormat, which combines small files into fewer splits and so reduces the number of tasks.
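
To illustrate the scale of the problem, here is a rough sketch in plain Python (not Spark or Hive code). It assumes that each small file becomes one input partition and that a cross product runs one task per pair of partitions, as RDD.cartesian does. The function names, the file sizes, the partition count of the other table, and the 128 MB split limit are all hypothetical values chosen for illustration; the greedy packing is a simplification and not CombineHiveInputFormat's actual algorithm.

```python
from typing import List


def combine_small_files(file_sizes: List[int], max_split_bytes: int) -> List[List[int]]:
    """Greedily pack files, in order, into splits of at most max_split_bytes.

    A file larger than max_split_bytes ends up alone in its own split.
    Simplified model only; real combine input formats also consider locality.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


def cross_product_tasks(left_partitions: int, right_partitions: int) -> int:
    """One task per (left partition, right partition) pair."""
    return left_partitions * right_partitions


MB = 1024 * 1024
small_files = [1 * MB] * 1000      # hypothetical: 1000 files of 1 MB each
other_table_partitions = 10        # hypothetical partition count of the other table

# Without combining: one partition per small file.
uncombined = cross_product_tasks(len(small_files), other_table_partitions)

# With combining: pack the small files into splits of up to 128 MB first.
combined_splits = combine_small_files(small_files, max_split_bytes=128 * MB)
combined = cross_product_tasks(len(combined_splits), other_table_partitions)

print(f"tasks without combining: {uncombined}")
print(f"tasks with combining:    {combined}")
```

Under these assumptions the task count falls by roughly the number of files packed into each split. In Spark itself, a possible workaround is to call coalesce on the small-file side before the cross product, although that is not equivalent to combining files at the input-format level.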