spencerlee created SPARK-13140:
----------------------------------

             Summary: spark sql aggregate performance decrease
                 Key: SPARK-13140
                 URL: https://issues.apache.org/jira/browse/SPARK-13140
             Project: Spark
          Issue Type: Question
    Affects Versions: 1.6.0
            Reporter: spencerlee
In our scenario, there are 30+ key columns and 60+ metric columns. Our typical query is:

select key1, key2, key3, key4, key5, sum(metric1), sum(metric2), sum(metric3), ..., sum(metric30) from table_name group by key1, key2, key3, key4, key5

I imported a single parquet file (60 MB, about 2.5 million records) into Spark SQL and ran the typical query in local mode. I found that when I aggregate only 24 metrics, the response time is about 4.81s, but when I aggregate 25 or more metrics, the response time jumps to 45.9s, almost 10 times slower. That is clearly unreasonable. Is this a bug, or do I need to modify some configuration to tune the query?
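For reference, a minimal spark-shell sketch of the repro under the Spark 1.6 API in local mode. The parquet path is a placeholder, and the key1..key5 / metric1..metric30 column names follow the schema described above:

{code:scala}
// Minimal repro sketch for spark-shell (Spark 1.6.0, local mode).
// The parquet path is a placeholder; key1..key5 and metric1..metric30
// match the schema described in the issue.
val df = sqlContext.read.parquet("/path/to/table.parquet")
df.registerTempTable("table_name")

// Build the aggregate list for n metrics so the query can be rerun
// with n = 24 vs. n = 25 to compare response times.
def query(n: Int) = {
  val sums = (1 to n).map(i => s"sum(metric$i)").mkString(", ")
  sqlContext.sql(
    s"select key1, key2, key3, key4, key5, $sums " +
     "from table_name group by key1, key2, key3, key4, key5")
}

// Wall-clock time for the full aggregation at a given metric count.
def time(n: Int): Double = {
  val start = System.nanoTime()
  query(n).collect()
  (System.nanoTime() - start) / 1e9
}

println(s"24 metrics: ${time(24)}s")  // ~4.81s observed
println(s"25 metrics: ${time(25)}s")  // ~45.9s observed
{code}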