Venki Korukanti created SPARK-38679:
---------------------------------------

Summary: Expose the number partitions in a stage to TaskContext
Key: SPARK-38679
URL: https://issues.apache.org/jira/browse/SPARK-38679
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.2.1
Reporter: Venki Korukanti

Add a new API to expose the total partition count of the stage to a task, so that the task knows what fraction of the computation it is doing.

With this extra information, users can also generate 32-bit unique int IDs as shown below, rather than using `monotonically_increasing_id`, which generates 64-bit long IDs. Here `numPartitions()` is the proposed new method on TaskContext.

{{ import org.apache.spark.TaskContext}}
{{ rdd.mapPartitions { rowsIter =>}}
{{   val partitionId = TaskContext.get().partitionId()}}
{{   val numPartitions = TaskContext.get().numPartitions()}}
{{   var i = 0}}
{{   rowsIter.map { row =>}}
{{     val rowId = partitionId + i * numPartitions}}
{{     i += 1}}
{{     (rowId, row)}}
{{   }}}
{{ }}}
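
To illustrate why the strided scheme `partitionId + i * numPartitions` yields unique IDs, here is a minimal plain-Scala sketch that simulates partitions with nested sequences and does not depend on Spark. The object name `StridedRowIds` and the helper `assignRowIds` are hypothetical, introduced only for this illustration. The overflow checks are an added assumption and are not part of the proposal.

```scala
// Plain-Scala simulation of the ID scheme; no Spark dependency.
// `StridedRowIds` and `assignRowIds` are hypothetical names for illustration.
object StridedRowIds {
  /** Assigns rowId = partitionId + i * numPartitions to the i-th row of each partition. */
  def assignRowIds[T](partitions: Seq[Seq[T]]): Seq[(Int, T)] = {
    val numPartitions = partitions.length
    partitions.zipWithIndex.flatMap { case (rows, partitionId) =>
      rows.zipWithIndex.map { case (row, i) =>
        // multiplyExact/addExact throw ArithmeticException on Int overflow
        // instead of silently wrapping, which would break uniqueness.
        val rowId = Math.addExact(partitionId, Math.multiplyExact(i, numPartitions))
        (rowId, row)
      }
    }
  }
}

// Three simulated partitions of unequal size.
val ids = StridedRowIds.assignRowIds(Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f")))
// ids: Seq((0,"a"), (3,"b"), (6,"c"), (1,"d"), (2,"e"), (5,"f"))
```

Each partition only produces IDs congruent to its own `partitionId` modulo `numPartitions`, so two partitions can never collide. The IDs are unique but not necessarily consecutive: in the example above, 4 is never assigned because the second partition has only one row.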