[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

I ran a performance test comparing the skew join algorithm against the normal join. I generated two tables, with 1/5 data skew in table S and 1/1 data skew in table R. Both tables are skewed on the same key.

spark.sql.adaptive.skewjoin.threshold 600
spark.sql.adaptive.shuffle.targetPostShuffleInputSize 500

record: S 1000 rows; R 1 rows
sql: select count(*) from R,S where rid=sid and sname>'wang9' and rname > 'zhang9';
skew algorithm : 167.695s
normal algorithm: 303.922s

R2_txt has 1 rows and no data skew.
sql: select count(*) from R2_txt,S where rid=sid and sname>'wang' and rname > 'zhang9';
skew algorithm : 38.717s
normal algorithm: 114.21s
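To put the timings above in relative terms, the small sketch below computes the speedup as normal time divided by skew-join time. The object and method names are hypothetical and only serve this illustration; the four numbers are copied from the comment. The ratios come out at about 1.8x for the R join S query and just under 3x for the R2_txt join S query.

```scala
object SkewJoinSpeedup {
  // Ratio of normal-join time to skew-join time.
  // Values above 1 mean the skew join algorithm was faster.
  def speedup(skewSeconds: Double, normalSeconds: Double): Double =
    normalSeconds / skewSeconds

  def main(args: Array[String]): Unit = {
    // Timings copied from the reported test runs.
    val bothSkewed = speedup(167.695, 303.922) // R and S skewed on the same key
    val oneSkewed = speedup(38.717, 114.21)    // R2_txt without data skew
    println(f"R join S:      $bothSkewed%.2fx")
    println(f"R2_txt join S: $oneSkewed%.2fx")
  }
}
```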
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

The skewed join implementation works for both DataFrame and SQL statements. You will get 210 output files.
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

@tgravescs: In the join case, take something like: select count(*) from A join B. If spark.sql.shuffle.partitions=200, we get 200 task outputs, each holding a partial count. The output is not written to HDFS but cached in Spark. Summing the 200 partial counts gives the correct value. If the data is skewed, we get 210 task outputs holding partial counts, and the next processing step handles them in the same way.
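The point about partial counts can be illustrated with a minimal sketch in plain Scala, without Spark. All names are hypothetical, and the choice of one skewed partition split into 11 sub-tasks is an assumption made only so that 200 tasks become 210; the comment does not say how the 210 tasks arise. The sketch shows that the final sum does not depend on how many tasks produce the partial counts.

```scala
object PartialCounts {
  // The final count(*) is the sum of the per-task partial counts.
  def total(partialCounts: Seq[Long]): Long = partialCounts.sum

  // Split one task's partial count into `pieces` sub-task counts
  // that add up to the original value.
  def split(count: Long, pieces: Int): Seq[Long] = {
    val base = count / pieces
    val extra = (count % pieces).toInt
    Seq.tabulate(pieces)(i => if (i < extra) base + 1 else base)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: 199 ordinary partitions and one skewed partition.
    val normal = Seq.fill(199)(10L) :+ 1000L
    // Replace the skewed partition with 11 smaller sub-tasks (assumption).
    val skewHandled = normal.init ++ split(normal.last, 11)
    println(s"${normal.size} tasks, total ${total(normal)}")
    println(s"${skewHandled.size} tasks, total ${total(skewHandled)}")
  }
}
```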
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

@tgravescs: Thank you for your response. When a single reduce task handles a huge amount of data, it is slow and unstable, so we split one reduce task into multiple reduce tasks. Where a single reduce task computes A join B, we split it into multiple tasks: task 1 computes A1 join B, task 2 computes A2 join B, and so on. A1 is the part of A that is read from a range of map outputs. Spark SQL treats A1 as a separate partition during processing, so multiple executors can run the tasks and spread the processing load.
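The idea of replacing A join B with A1 join B, A2 join B, and so on can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not the PR's implementation: the names are hypothetical, the join is a simple in-memory inner equi-join, and A is split into equal-sized chunks rather than by map output ranges. It shows that concatenating the sub-join results gives the same rows as the single join.

```scala
object SplitJoin {
  type Row = (Int, String)

  // Inner equi-join of two in-memory partitions on the integer key.
  def join(a: Seq[Row], b: Seq[Row]): Seq[(Int, String, String)] =
    for {
      (ka, va) <- a
      (kb, vb) <- b
      if ka == kb
    } yield (ka, va, vb)

  // Split A into at most `pieces` chunks A1, A2, ... and join each chunk
  // with the whole of B. Each chunk could run as its own task.
  def splitJoin(a: Seq[Row], b: Seq[Row], pieces: Int): Seq[(Int, String, String)] = {
    val chunkSize = math.max(1, math.ceil(a.size.toDouble / pieces).toInt)
    a.grouped(chunkSize).toSeq.flatMap(ai => join(ai, b))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical skewed data: key 1 dominates table A.
    val a = Seq.tabulate(9)(i => (1, s"a$i")) :+ ((2, "a9"))
    val b = Seq((1, "b0"), (2, "b1"))
    println(splitJoin(a, b, 3) == join(a, b))
  }
}
```

Note that B is read in full by every sub-task, so the split trades extra reads of B for smaller, more evenly sized tasks on the A side.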
[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

Following the review comment, I rewrote the code to read a range of maps, like this:

    class BlockStoreShuffleReader[K, C](
        handle: BaseShuffleHandle[K, _, C],
        startPartition: Int,
        endPartition: Int,
        context: TaskContext,
        serializerManager: SerializerManager = SparkEnv.get.serializerManager,
        blockManager: BlockManager = SparkEnv.get.blockManager,
        mapOutputTracker: MapOutputTracker = SparkEnv.get.mapOutputTracker,
        startMapId: Option[Int] = None,
        endMapId: Option[Int] = None)

To decide how many ranges to read from the maps, we use the spark.sql.adaptive.skewjoin.threshold value. If the output size is less than the skew threshold, it can be handled in one task. Otherwise we split it into several tasks, each handling a data size slightly less than the skew threshold.
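One way to picture the range decision described above is a greedy pass over the per-map output sizes. The sketch below is an assumption about how such a split could work, not the code in the PR: the object and method names are hypothetical, ranges are half-open (startMapId inclusive, endMapId exclusive), and sizes are in the same unit as the threshold. A range is closed as soon as adding the next map output would exceed the threshold, so each range stays at or below it unless a single map output is larger on its own.

```scala
import scala.collection.mutable.ArrayBuffer

object MapRangeSplitter {
  // Returns half-open (startMapId, endMapId) ranges covering all map outputs.
  def splitRanges(mapOutputSizes: Seq[Long], threshold: Long): Seq[(Int, Int)] = {
    val ranges = ArrayBuffer.empty[(Int, Int)]
    var start = 0 // first map id of the range being built
    var acc = 0L  // accumulated size of the range being built
    for ((size, mapId) <- mapOutputSizes.zipWithIndex) {
      // Close the current range before it would exceed the threshold,
      // but never emit an empty range.
      if (mapId > start && acc + size > threshold) {
        ranges += ((start, mapId))
        start = mapId
        acc = 0L
      }
      acc += size
    }
    if (mapOutputSizes.nonEmpty) ranges += ((start, mapOutputSizes.length))
    ranges.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical map output sizes for one skewed reduce partition.
    val sizes = Seq(300L, 200L, 200L, 100L, 500L, 50L)
    println(splitRanges(sizes, threshold = 600L))
  }
}
```

With these sample sizes and a threshold of 600, the pass yields three ranges, each of which could be passed to a reader as a startMapId and endMapId pair. When the total size is already below the threshold, a single range covering all maps is returned, which matches the one-task case.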
[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297

I will rewrite the ShuffleReader interface so that it reads a range of maps rather than only a single map's data. It will be finished soon.
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15297#discussion_r82779299

    --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
    @@ -138,13 +138,16 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
        * and the second item is a sequence of (shuffle block id, shuffle block size) tuples
        * describing the shuffle blocks that are stored at that block manager.
        */
    -  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
    +  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int,
    +      mapid: Int = -1)
    --- End diff --

    That's a good idea. A Seq[Int] parameter would let a single call fetch the data of several maps, which reduces the number of tasks.
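To make the Seq[Int] suggestion concrete, here is a minimal sketch of selecting shuffle blocks for a chosen set of maps. It uses simplified stand-in types rather than Spark's real `MapStatus` and `BlockId`; `MapSubsetSketch`, `BlockRef`, `blocksFor`, `sizes` and `mapIdxs` are hypothetical names, and treating an empty `mapIdxs` as "all maps" and the partition range as half-open are assumptions of this sketch, not behaviour confirmed by the PR.

```scala
// Hypothetical stand-ins; Spark's real MapStatus and BlockId types are richer.
object MapSubsetSketch {
  final case class BlockRef(shuffleId: Int, mapIdx: Int, reduceId: Int)

  /**
   * Lists (block, size) pairs for reduce partitions [startPartition, endPartition).
   * sizes(mapIdx)(reduceId) is the byte size of that map's output for that reducer.
   * An empty mapIdxs means "all maps"; otherwise only the listed maps are included.
   */
  def blocksFor(
      shuffleId: Int,
      startPartition: Int,
      endPartition: Int,
      sizes: IndexedSeq[IndexedSeq[Long]],
      mapIdxs: Seq[Int] = Seq.empty): Seq[(BlockRef, Long)] = {
    val wanted: Seq[Int] = if (mapIdxs.isEmpty) sizes.indices else mapIdxs
    for {
      mapIdx <- wanted
      reduceId <- startPartition until endPartition
    } yield (BlockRef(shuffleId, mapIdx, reduceId), sizes(mapIdx)(reduceId))
  }
}
```

With a sequence instead of a single `Int`, one task can cover several maps in a single call, which is the task-count reduction the comment refers to.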
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15297#discussion_r82779060

    --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
    @@ -687,18 +691,21 @@ private[spark] object MapOutputTracker extends Logging {
           shuffleId: Int,
           startPartition: Int,
           endPartition: Int,
    -      statuses: Array[MapStatus]): Seq[(BlockManagerId, Seq[(BlockId, Long)])] = {
    +      statuses: Array[MapStatus],
    +      mapIdx: Int = -1): Seq[(BlockManagerId, Seq[(BlockId, Long)])] = {
    --- End diff --

    It conflicts with mapId.
[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297 retest this please
[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297 test this please
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
GitHub user YuhuWang2002 opened a pull request:

    https://github.com/apache/spark/pull/15297

    [WIP][SPARK-9862]Handling data skew

    ## What changes were proposed in this pull request?

    As described in https://issues.apache.org/jira/browse/SPARK-9862, this change handles data skew in joins.

    ## How was this patch tested?

    Unit tests in ExchangeCoordinatorSuite, which can also generate skewed data, plus manual testing.

    Author: wangyuhu<wangyuhu2...@126.com>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YuhuWang2002/spark-1 skewjoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15297.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15297

commit ef1baae1768dc8fd676f617bf7c1e85d72179ba8
Author: wangyuhu <wangy...@huawei.com>
Date: 2016-09-28T02:46:49Z

    [SPARK-9862] handling data skew , add skew join feature

commit c561ea718fd65adc0f1187097b9da88fc0054192
Author: wangyuhu <wangy...@huawei.com>
Date: 2016-09-28T08:41:46Z

    code style fix

commit 9025e24b6552b39bd3ab20632b702b60edc2ad10
Author: wangyuhu <wangy...@huawei.com>
Date: 2016-09-29T10:56:10Z

    add comment

commit 0ba86a2284e684a733f989ed6e595575f511c8bd
Author: wangyuhu <wangy...@huawei.com>
Date: 2016-09-29T11:38:26Z

    modify UT code