Venki Korukanti created SPARK-38679:
---------------------------------------

             Summary: Expose the number of partitions in a stage to TaskContext
                 Key: SPARK-38679
                 URL: https://issues.apache.org/jira/browse/SPARK-38679
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.1
            Reporter: Venki Korukanti


Add a new API that exposes the total partition count of the stage to a task, so 
that the task knows what fraction of the computation it is doing.

With this extra information, users can also generate unique 32-bit int IDs as 
shown below, rather than using `monotonically_increasing_id`, which generates 
64-bit long IDs.

 

{{   rdd.mapPartitions { rowsIter =>}}
{{        val partitionId = TaskContext.get().partitionId()}}
{{        val numPartitions = TaskContext.get().numPartitions()}}
{{        var i = 0}}
{{        rowsIter.map { row =>}}
{{          val rowId = partitionId + i * numPartitions}}
{{          i += 1}}
{{          (rowId, row)}}
{{        }}}
{{    }}}
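
The ID scheme above can be checked without a Spark cluster. The following is a minimal sketch, not the proposed implementation: each inner Seq stands in for one partition, its index stands in for TaskContext.get().partitionId(), and the object name StridedRowIds and the helper assignIds are hypothetical names introduced only for this illustration. It also assumes that failing on Int overflow is preferable to silently wrapping, which the snippet above does not address.

```scala
// Stand-alone sketch with no Spark dependency. Each inner Seq plays the role of
// one partition; its index stands in for TaskContext.get().partitionId().
object StridedRowIds {

  // Hypothetical helper (not part of Spark): pairs every row with
  // rowId = partitionId + i * numPartitions, where i is the row's position
  // within its partition.
  def assignIds[A](partitions: Seq[Seq[A]]): Seq[(Int, A)] = {
    val numPartitions = partitions.length
    partitions.zipWithIndex.flatMap { case (rows, partitionId) =>
      rows.zipWithIndex.map { case (row, i) =>
        // The *Exact methods throw ArithmeticException instead of wrapping
        // around once the result no longer fits in a 32-bit Int.
        val rowId = Math.addExact(partitionId, Math.multiplyExact(i, numPartitions))
        (rowId, row)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f"))
    val withIds = assignIds(partitions)
    withIds.foreach(println)
    // Every ID is distinct across all partitions.
    assert(withIds.map(_._1).distinct.size == withIds.size)
  }
}
```

With three partitions, rows "a", "b", "c" in partition 0 get IDs 0, 3, 6, row "d" in partition 1 gets ID 1, and rows "e", "f" in partition 2 get IDs 2 and 5. The IDs are unique because rowId % numPartitions recovers the partition ID and rowId / numPartitions recovers the row's position within that partition. They are not contiguous when partitions differ in size.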



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
