Yep, I had submitted a PR that included it way back in the original
direct stream for Kafka, but it got nixed in favor of
TaskContext.partitionId ;) The concern then was about too many
xWithBlah APIs on RDD.
If we do want to deprecate TaskContext.partitionId and add
foreachPartitionWithIndex, I
Seems like a good new API to add?
On Thu, Oct 20, 2016 at 11:14 AM, Cody Koeninger wrote:
Access to the partition ID is necessary for basically every single one
of my jobs, and there isn't a foreachPartitionWithIndex equivalent.
You can kind of work around it with an empty foreach after the map, but
it's really awkward to explain to people.
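
For anyone following along, here is a rough sketch of the two approaches
being discussed. It assumes Spark 2.0+ (for sc.longAccumulator) and a local
master; the object name, method names, and the accumulator are made up for
illustration and are not from any PR. Whether the ID from TaskContext always
matches the RDD partition index is exactly the open question in this thread,
so treat the second method as an approximation.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.util.LongAccumulator

// Hypothetical names throughout; only the Spark calls themselves are real.
object PartitionIndexSketch {

  // The workaround: do the side effect inside mapPartitionsWithIndex,
  // return no records, then force evaluation with an empty foreach.
  def viaMapThenEmptyForeach(rdd: RDD[Int], seen: LongAccumulator): Unit =
    rdd.mapPartitionsWithIndex[Unit] { (index, records) =>
      println(s"partition $index: ${records.size} records")
      seen.add(1) // counts partitions visited (updated inside a transformation)
      Iterator.empty
    }.foreach(_ => ())

  // The alternative: read the partition ID from TaskContext
  // inside a plain foreachPartition.
  def viaTaskContext(rdd: RDD[Int], seen: LongAccumulator): Unit =
    rdd.foreachPartition { records =>
      val id = TaskContext.getPartitionId()
      println(s"partition $id: ${records.size} records")
      seen.add(1) // counts partitions visited
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("partition-index-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 6, numSlices = 3)
      val first = sc.longAccumulator("mapThenEmptyForeach")
      val second = sc.longAccumulator("taskContext")
      viaMapThenEmptyForeach(rdd, first)
      viaTaskContext(rdd, second)
      println(s"partitions visited: ${first.value} and ${second.value}")
    } finally {
      sc.stop()
    }
  }
}
```

The first version is the one that is awkward to explain: the map exists only
for its side effect, and the trailing foreach exists only to trigger the job.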
On Thu, Oct 20, 2016 at 12:52 PM, Reynold Xin wrote:
FYI - Xiangrui submitted an amazing pull request to fix a long-standing
issue with a lot of the nondeterministic expressions (rand, randn,
monotonically_increasing_id): https://github.com/apache/spark/pull/15567
Prior to this PR, we were using TaskContext.partitionId as the partition
index in