[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext

2022-03-31 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38679:
---

Assignee: Venki Korukanti

> Expose the number partitions in a stage to TaskContext
> --
>
> Key: SPARK-38679
> URL: https://issues.apache.org/jira/browse/SPARK-38679
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Venki Korukanti
> Assignee: Venki Korukanti
> Priority: Major
>
> Add a new API to TaskContext that exposes the total number of partitions in 
> the stage the task belongs to, so that the task knows what fraction of the 
> computation it is doing.
> With this extra information, users can also generate unique 32-bit int IDs as 
> shown below, rather than using `monotonically_increasing_id`, which generates 
> 64-bit long IDs.
>  
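> {code:scala}
> import org.apache.spark.TaskContext
>
> rdd.mapPartitions { rowsIter =>
>   val partitionId = TaskContext.get().partitionId()
>   // numPartitions() is the API proposed in this issue
>   val numPartitions = TaskContext.get().numPartitions()
>   var i = 0
>   rowsIter.map { row =>
>     // IDs in a partition: partitionId, partitionId + numPartitions, ...
>     val rowId = partitionId + i * numPartitions
>     i += 1
>     (rowId, row)
>   }
> }
> {code}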
>  
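
The ID scheme quoted above can be checked without a cluster. The following is a minimal sketch, not part of the issue or of Spark: it assumes plain Scala collections stand in for partitions, and the names `StridedIds` and `assignIds` are hypothetical. It also swaps the plain `Int` arithmetic for `Math.multiplyExact` and `Math.addExact`, so that an ID that does not fit in 32 bits raises an exception instead of silently wrapping around.

```scala
// Sketch only: simulates the strided ID scheme with ordinary collections.
// StridedIds and assignIds are illustrative names, not Spark APIs.
object StridedIds {
  // Row i of partition p gets the ID p + i * numPartitions.
  def assignIds[T](partitionId: Int, numPartitions: Int, rows: Iterator[T]): Iterator[(Int, T)] =
    rows.zipWithIndex.map { case (row, i) =>
      // The exact variants throw ArithmeticException if the result overflows an Int.
      val rowId = Math.addExact(partitionId, Math.multiplyExact(i, numPartitions))
      (rowId, row)
    }

  def main(args: Array[String]): Unit = {
    // Three "partitions" of uneven size.
    val partitions = Vector(Vector("a", "b", "c"), Vector("d"), Vector("e", "f"))
    val ids = partitions.zipWithIndex.flatMap { case (rows, pid) =>
      assignIds(pid, partitions.size, rows.iterator).map(_._1)
    }
    // Every ID is distinct: id % numPartitions recovers the partition,
    // id / numPartitions recovers the row index within it.
    assert(ids.distinct.size == ids.size)
    println(ids)
  }
}
```

The IDs are unique but not contiguous when partitions hold different numbers of rows, and the largest ID grows with the row count of the largest partition multiplied by the number of partitions, so a skewed dataset can exhaust the 32-bit range sooner than its total row count suggests.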



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext

2022-03-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38679:


Assignee: Apache Spark

> Expose the number partitions in a stage to TaskContext
> --
>
> Key: SPARK-38679
> URL: https://issues.apache.org/jira/browse/SPARK-38679
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Venki Korukanti
> Assignee: Apache Spark
> Priority: Major



[jira] [Assigned] (SPARK-38679) Expose the number partitions in a stage to TaskContext

2022-03-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38679:


Assignee: (was: Apache Spark)

> Expose the number partitions in a stage to TaskContext
> --
>
> Key: SPARK-38679
> URL: https://issues.apache.org/jira/browse/SPARK-38679
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Venki Korukanti
> Priority: Major