[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext
[ https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-38679:
-----------------------------------

    Assignee: Venki Korukanti

> Expose the number partitions in a stage to TaskContext
> ------------------------------------------------------
>
>                 Key: SPARK-38679
>                 URL: https://issues.apache.org/jira/browse/SPARK-38679
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>            Priority: Major
>
> Add a new API to TaskContext that exposes the total partition count of the
> stage the task belongs to, so that the task knows what fraction of the
> computation it is doing.
> With this extra information, users can also generate unique 32-bit int ids as
> below, rather than using `monotonically_increasing_id`, which generates 64-bit
> long ids.
>
> {code:java}
> import org.apache.spark.TaskContext
>
> rdd.mapPartitions { rowsIter =>
>   val partitionId = TaskContext.get().partitionId()
>   // numPartitions() is the API proposed in this issue
>   val numPartitions = TaskContext.get().numPartitions()
>   var i = 0
>   rowsIter.map { row =>
>     // ids interleave across partitions: rowId % numPartitions == partitionId
>     val rowId = partitionId + i * numPartitions
>     i += 1
>     (rowId, row)
>   }
> }{code}
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
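The id scheme quoted in the issue description can be checked without a cluster. What follows is a minimal plain-Scala sketch, not code from the issue: `InterleavedIds`, `assignIds`, and the sample data are hypothetical names introduced for illustration, and the partitions are simulated with in-memory collections instead of an RDD. It applies the same formula, `partitionId + i * numPartitions`, and confirms that ids from different partitions do not collide.

```scala
// Plain-Scala simulation of the interleaved id scheme; no Spark dependency.
// All names here are hypothetical and exist only for this illustration.
object InterleavedIds {

  /** Gives the i-th row of a partition the id partitionId + i * numPartitions. */
  def assignIds[T](partitionId: Int, numPartitions: Int, rows: Iterator[T]): Iterator[(Int, T)] =
    rows.zipWithIndex.map { case (row, i) =>
      (partitionId + i * numPartitions, row)
    }

  def main(args: Array[String]): Unit = {
    // Assumed sample data: three simulated partitions of unequal size.
    val partitions = Vector(Vector("a", "b", "c"), Vector("d"), Vector("e", "f"))
    val numPartitions = partitions.size

    val tagged = partitions.zipWithIndex.flatMap { case (rows, partitionId) =>
      assignIds(partitionId, numPartitions, rows.iterator)
    }
    tagged.foreach(println)

    // Each id is congruent to its partition id modulo numPartitions,
    // so two different partitions can never produce the same id.
    val ids = tagged.map(_._1)
    assert(ids.distinct.size == ids.size)
  }
}
```

Two properties of this sketch are worth noting. The ids are unique but not contiguous when partitions differ in size (here partition 1 yields only id 1, so id 4 is never used). Also, because the arithmetic is done on 32-bit `Int` values, `i * numPartitions` wraps around once it exceeds `Int.MaxValue`, so the scheme presumably only suits datasets where the largest partition's row count times `numPartitions` stays within that range.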
[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext
[ https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38679:
------------------------------------

    Assignee: Apache Spark

> Expose the number partitions in a stage to TaskContext
> ------------------------------------------------------
>
>                 Key: SPARK-38679
>                 URL: https://issues.apache.org/jira/browse/SPARK-38679
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Venki Korukanti
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext
[ https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38679:
------------------------------------

    Assignee:     (was: Apache Spark)

> Expose the number partitions in a stage to TaskContext
> ------------------------------------------------------
>
>                 Key: SPARK-38679
>                 URL: https://issues.apache.org/jira/browse/SPARK-38679
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Venki Korukanti
>            Priority: Major