[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581119#comment-15581119 ]
Guoqiang Li commented on SPARK-17930:
-------------------------------------

TPC-DS 2T data (Parquet) and the SQL (query 2) =>
{noformat}
select i_item_id,
       avg(ss_quantity) agg1,
       avg(ss_list_price) agg2,
       avg(ss_coupon_amt) agg3,
       avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, item, promotion
where ss_sold_date_sk = d_date_sk
  and ss_item_sk = i_item_sk
  and ss_cdemo_sk = cd_demo_sk
  and ss_promo_sk = p_promo_sk
  and cd_gender = 'M'
  and cd_marital_status = 'M'
  and cd_education_status = '4 yr Degree'
  and (p_channel_email = 'N' or p_channel_event = 'N')
  and d_year = 2001
group by i_item_id
order by i_item_id
limit 100;
{noformat}

spark-defaults.conf =>
{noformat}
spark.master                                      yarn-client
spark.executor.instances                          20
spark.driver.memory                               16g
spark.executor.memory                             30g
spark.executor.cores                              5
spark.default.parallelism                         100
spark.sql.shuffle.partitions                      100000
spark.serializer                                  org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize                        0
spark.rpc.netty.dispatcher.numThreads             8
spark.executor.extraJavaOptions                   -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M
spark.cleaner.referenceTracking.blocking          true
spark.cleaner.referenceTracking.blocking.shuffle  true
{noformat}

Performance test results are as follows =>
||[SPARK-17930|https://github.com/witgo/spark/tree/SPARK-17930]||[ed14633|https://github.com/witgo/spark/commit/ed1463341455830b8867b721a1b34f291139baf3]||
|54.5 s|231.7 s|

> The SerializerInstance instance used when deserializing a TaskResult is not
> reused
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17930
>                 URL: https://issues.apache.org/jira/browse/SPARK-17930
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.0.1
>            Reporter: Guoqiang Li
>
> The following code is called when the DirectTaskResult instance is
> deserialized:
> {noformat}
> def value(): T = {
>   if (valueObjectDeserialized) {
>     valueObject
>   } else {
>     // Each deserialization creates a new instance of SerializerInstance,
>     // which is very time-consuming
>     val resultSer = SparkEnv.get.serializer.newInstance()
>     valueObject = resultSer.deserialize(valueBytes)
>     valueObjectDeserialized = true
>     valueObject
>   }
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
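[Editor's note] The improvement the ticket describes is reusing one cached SerializerInstance rather than calling newInstance() for every TaskResult deserialized. A minimal, self-contained Scala sketch of that pattern follows; it is not Spark's actual code, and the names (FakeSerializerInstance, valuePerCall, valueReused) are hypothetical stand-ins. Construction cost is simulated with a counter so the effect of reuse is observable:

{noformat}
// Hedged sketch: illustrates per-call construction vs. thread-local reuse.
object SerializerReuseSketch {

  // Stand-in for an expensive-to-construct SerializerInstance.
  class FakeSerializerInstance {
    FakeSerializerInstance.constructed += 1
    def deserialize(bytes: Array[Byte]): String = new String(bytes, "UTF-8")
  }
  object FakeSerializerInstance {
    var constructed = 0  // counts how many instances were built
  }

  // Before: a fresh instance per deserialization (as in DirectTaskResult.value()).
  def valuePerCall(bytes: Array[Byte]): String =
    new FakeSerializerInstance().deserialize(bytes)

  // After: one cached instance per thread. A ThreadLocal is used because a
  // SerializerInstance is generally not guaranteed to be thread-safe.
  private val cached = new ThreadLocal[FakeSerializerInstance] {
    override def initialValue(): FakeSerializerInstance = new FakeSerializerInstance
  }
  def valueReused(bytes: Array[Byte]): String =
    cached.get().deserialize(bytes)

  def main(args: Array[String]): Unit = {
    val payload = "task result".getBytes("UTF-8")

    (1 to 1000).foreach(_ => valuePerCall(payload))
    println(s"per-call construction: ${FakeSerializerInstance.constructed} instances")

    FakeSerializerInstance.constructed = 0
    (1 to 1000).foreach(_ => valueReused(payload))
    println(s"thread-local reuse:    ${FakeSerializerInstance.constructed} instance(s)")
  }
}
{noformat}

With 100000 shuffle partitions (as in the spark-defaults.conf above), the driver deserializes a very large number of task results, which is why amortizing the SerializerInstance construction cost shows up so clearly in the 54.5 s vs. 231.7 s comparison.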