[ https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515640#comment-17515640 ]
Apache Spark commented on SPARK-38679:
--------------------------------------

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/36029

> Expose the number partitions in a stage to TaskContext
> ------------------------------------------------------
>
>                 Key: SPARK-38679
>                 URL: https://issues.apache.org/jira/browse/SPARK-38679
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Add a new API to TaskContext that exposes the total partition count of the
> stage the task belongs to, so that the task knows what fraction of the
> computation it is doing.
> With this extra information, users can also generate unique 32-bit int ids as
> shown below, rather than using `monotonically_increasing_id`, which generates
> 64-bit long ids.
>
> {code:java}
> rdd.mapPartitions { rowsIter =>
>   val partitionId = TaskContext.get().partitionId()
>   val numPartitions = TaskContext.get().numPartitions()
>   var i = 0
>   rowsIter.map { row =>
>     // Interleave ids across partitions: partition p yields p, p + n, p + 2n, ...
>     val rowId = partitionId + i * numPartitions
>     i += 1
>     (rowId, row)
>   }
> }{code}
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
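
To see why the id scheme in the description yields unique values, here is a minimal plain-Scala sketch that simulates it without Spark. The object name InterleavedIds, the helper idsForPartition, and the sample partition sizes are hypothetical and are not part of the issue or the pull request. The only thing taken from the description is the formula partitionId + i * numPartitions.

```scala
// Plain-Scala simulation of the id scheme from the issue description.
// No Spark dependency; all names and sample sizes here are hypothetical.
object InterleavedIds {

  /** Ids for the rows of one partition: partitionId + i * numPartitions. */
  def idsForPartition(partitionId: Int, numPartitions: Int, rowCount: Int): Seq[Int] =
    (0 until rowCount).map(i => partitionId + i * numPartitions)

  def main(args: Array[String]): Unit = {
    // Rows per partition, deliberately uneven.
    val rowCounts = Seq(3, 1, 4)
    val numPartitions = rowCounts.length

    val allIds = rowCounts.zipWithIndex.flatMap { case (rows, partitionId) =>
      idsForPartition(partitionId, numPartitions, rows)
    }

    // Every id is congruent to its partition id modulo numPartitions,
    // so ids produced by different partitions can never collide.
    assert(allIds.distinct.size == allIds.size)

    println(allIds.sorted.mkString(", ")) // prints: 0, 1, 2, 3, 5, 6, 8, 11
  }
}
```

Two properties are visible in this sketch. The ids are unique but not contiguous when partitions hold different numbers of rows (4, 7, 9 and 10 are skipped above). Also, because the arithmetic is done on Int, the scheme only stays collision-free while the largest row index times numPartitions fits in 32 bits; beyond that the value overflows.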