[ https://issues.apache.org/jira/browse/SPARK-29954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-29954:
--------------------------------
    Parent Issue: SPARK-31412  (was: SPARK-29544)

> collect the runtime statistics of row count in map stage
> --------------------------------------------------------
>
>                 Key: SPARK-29954
>                 URL: https://issues.apache.org/jira/browse/SPARK-29954
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle
>    Affects Versions: 3.0.0
>            Reporter: Ke Jia
>            Priority: Major
>
> We need row count information to estimate data skew more accurately
> when the data contains many duplicates. This PR collects the row count
> information in the map stage.
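
To illustrate why a per-partition row count might expose skew that a byte size misses, here is a minimal sketch in plain Python. It does not use Spark's API. The function name `skewed_partitions`, the "more than `factor` times the median" rule, and the sample numbers are all assumptions made for illustration. The sketch also assumes that heavily duplicated rows compress well, so a partition's reported byte size can look ordinary while its row count does not.

```python
from statistics import median


def skewed_partitions(values, factor=5.0):
    """Return the indices of partitions whose statistic exceeds factor * median.

    Hypothetical rule for illustration only; not Spark's actual skew check.
    """
    med = median(values)
    return [i for i, v in enumerate(values) if v > factor * med]


# Hypothetical per-partition statistics reported by a map stage.
# Partition 2 is assumed to hold many duplicated rows that compress well,
# so its byte size is unremarkable while its row count is far larger.
bytes_per_partition = [100, 110, 120, 105]
rows_per_partition = [1000, 1100, 50000, 1050]

print(skewed_partitions(bytes_per_partition))  # []
print(skewed_partitions(rows_per_partition))   # [2]
```

With these made-up numbers the byte sizes flag nothing, while the row counts flag partition 2, which is the kind of case the issue description refers to.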