SplittableDoFn for zipWithIndex for a large file

Chak-Pong Chung Thu, 13 Dec 2018 16:22:17 -0800

Hello everyone!

I asked the following question on Stack Overflow and would appreciate
suggestions on whether what I want is doable or not.


https://stackoverflow.com/questions/53746046/how-can-i-implement-zipwithindex-like-spark-in-apache-beam/53747612#53747612

If I can get an id for each `PCollection` and the number of (contiguous)
lines in each `PCollection`, then I can first calculate the row order within
each partition/`PCollection` and then do a prefix sum over the line counts to
compute the offset for each partition. This is doable in MPI or OpenMP, since
there I can get the id/rank of each processor/thread.
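
To make the idea concrete, here is a minimal sketch in plain Python (not the
Beam API). It assumes the partitions are already available as in-memory lists
ordered by partition id, which is exactly the part I do not know how to get
in Beam. The names `partition_offsets` and `zip_with_index` are hypothetical
helpers for illustration only.

```python
from itertools import accumulate


def partition_offsets(counts):
    """Exclusive prefix sum over per-partition line counts.

    offsets[i] is the total number of lines in partitions 0..i-1,
    i.e. the global index of the first line of partition i.
    """
    # accumulate(..., initial=0) requires Python 3.8+; dropping the last
    # element turns the inclusive running total into an exclusive one.
    return list(accumulate(counts, initial=0))[:-1]


def zip_with_index(partitions):
    """Pair every line with its global index.

    Assumption: `partitions` is a list of lists, ordered by partition id,
    and each inner list holds contiguous lines in file order.
    """
    counts = [len(part) for part in partitions]
    offsets = partition_offsets(counts)
    return [
        (line, offset + local_index)
        for part, offset in zip(partitions, offsets)
        for local_index, line in enumerate(part)
    ]
```

Only the per-partition counts have to be gathered in one place for the prefix
sum; the final indexing step is independent for each partition.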

Best,
Chak-Pong
