GitHub user jeanlyn opened a pull request: https://github.com/apache/spark/pull/6426
[SPARK-7885][SQL] Add config to control map aggregation in Spark SQL

[SPARK-7885](https://issues.apache.org/jira/browse/SPARK-7885): this patch adds the config `spark.sql.partialAggregation.enable`. It is true by default; setting it to false disables map aggregation, which avoids GC problems. For example, we ran the following SQL:

```sql
insert overwrite table groupbytest
select
  sale_ord_id as order_id,
  coalesce(sum(sku_offer_amount),0.0) as sku_offer_amount,
  coalesce(sum(suit_offer_amount),0.0) as suit_offer_amount,
  coalesce(sum(flash_gp_offer_amount),0.0) + coalesce(sum(gp_offer_amount),0.0) as gp_offer_amount,
  coalesce(sum(flash_gp_offer_amount),0.0) as flash_gp_offer_amount,
  coalesce(sum(full_minus_offer_amount),0.0) as full_rebate_offer_amount,
  0.0 as telecom_point_offer_amount,
  coalesce(sum(coupon_pay_amount),0.0) as dq_and_jq_pay_amount,
  coalesce(sum(jq_pay_amount),0.0) + coalesce(sum(pop_shop_jq_pay_amount),0.0) + coalesce(sum(lim_cate_jq_pay_amount),0.0) as jq_pay_amount,
  coalesce(sum(dq_pay_amount),0.0) + coalesce(sum(pop_shop_dq_pay_amount),0.0) + coalesce(sum(lim_cate_dq_pay_amount),0.0) as dq_pay_amount,
  coalesce(sum(gift_cps_pay_amount),0.0) as gift_cps_pay_amount,
  coalesce(sum(mobile_red_packet_pay_amount),0.0) as mobile_red_packet_pay_amount,
  coalesce(sum(acct_bal_pay_amount),0.0) as acct_bal_pay_amount,
  coalesce(sum(jbean_pay_amount),0.0) as jbean_pay_amount,
  coalesce(sum(sku_rebate_amount),0.0) as sku_rebate_amount,
  coalesce(sum(yixun_point_pay_amount),0.0) as yixun_point_pay_amount,
  coalesce(sum(sku_freight_coupon_amount),0.0) as freight_coupon_amount
from ord_at_det_di
where ds = '2015-05-20'
group by sale_ord_id
```

We used 6 executors, each with 8 GB of memory and 2 CPUs. We hit GC problems during the map aggregation, and the executor eventually crashed.

![5869030a-d924-4249-9e1d-c637caa9363a](https://cloud.githubusercontent.com/assets/3426093/7828153/4afdaf88-0462-11e5-8af0-3bff04edab92.png)

When we set `spark.sql.partialAggregation.enable` to false, the SQL ran in 2 minutes.

You can merge this pull
request into a Git repository by running:

    $ git pull https://github.com/jeanlyn/spark partialAggregation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6426.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6426

----
commit b17c676bb0d33019bbdd124048221595f278b9d0
Author: jeanlyn <jeanly...@gmail.com>
Date:   2015-05-27T03:03:47Z

    add config to control map aggregation in spark sql

----
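
As a rough illustration of what the proposed `spark.sql.partialAggregation.enable` flag toggles, the sketch below models a grouped sum with and without a map-side (partial) aggregation step. It is plain Python, not Spark code: the function names, the `partial_enabled` parameter, and the sample rows are all hypothetical, and the real operator in Spark SQL is considerably more involved. One plausible reading of the report above, offered as an assumption rather than something the PR states, is that when nearly every grouping key is distinct (as with an order ID), the map-side hash map grows large while removing few rows before the shuffle.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map side: sum amounts per key within one partition, using an in-memory hash map."""
    sums = defaultdict(float)
    for key, amount in partition:
        sums[key] += amount
    return list(sums.items())


def final_aggregate(rows):
    """Reduce side: merge raw or partially aggregated rows into one total per key."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def group_sum(partitions, partial_enabled=True):
    """Return (totals per key, number of rows sent to the reduce side)."""
    shuffled = []
    for partition in partitions:
        # With the flag off, rows skip the map-side hash map entirely.
        rows = partial_aggregate(partition) if partial_enabled else list(partition)
        shuffled.extend(rows)
    return final_aggregate(shuffled), len(shuffled)


# Hypothetical data: few distinct keys versus all-distinct keys.
repeated_keys = [[("a", 1.0), ("a", 2.0)], [("a", 3.0), ("b", 4.0)]]
unique_keys = [[("o1", 1.0), ("o2", 2.0)], [("o3", 3.0), ("o4", 4.0)]]

for name, data in [("repeated", repeated_keys), ("unique", unique_keys)]:
    with_partial = group_sum(data, partial_enabled=True)
    without_partial = group_sum(data, partial_enabled=False)
    print(name, with_partial, without_partial)
```

Both settings produce the same totals; only the number of rows shuffled and the memory held on the map side differ. In this toy data the repeated-key input shrinks before the shuffle, while the unique-key input does not shrink at all.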