Re: SplittableDoFn for zipWithIndex for a large file

Scott Wegner Thu, 13 Dec 2018 16:33:21 -0800

I previously responded to your post on user@:
https://lists.apache.org/thread.html/5c10b7edf982ef63d1d1d70545e3fe2716d00628ff5c2a7854383413@%3Cuser.beam.apache.org%3E


I've also mirrored my response on StackOverflow:
https://stackoverflow.com/a/53771980/33791

On Thu, Dec 13, 2018 at 4:21 PM Chak-Pong Chung <cchun...@gatech.edu> wrote:

> Hello everyone!
>
> I asked the following question and would appreciate suggestions on
> whether what I want is doable.
>
>
> https://stackoverflow.com/questions/53746046/how-can-i-implement-zipwithindex-like-spark-in-apache-beam/53747612#53747612
>
> If I can get the `PCollection` id and the number of (contiguous) lines in
> each `PCollection`, then I can calculate the row order within each
> partition/`PCollection` first and then do a prefix sum to compute the
> offset for each partition. This is doable in MPI or OpenMP since I can get
> the id/rank of each processor/thread.
>
> Best,
> Chak-Pong
>
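
For reference, the two-pass idea in the quoted message (local row order per
partition, then a prefix sum over the per-partition line counts) can be
sketched in plain Python. This is only an illustration under assumptions, not
Beam code: the partitions are modeled as an ordered list of lists, and the
function name `zip_with_index` is hypothetical. Whether a runner exposes an
ordered partition id like this is exactly the open question in the thread.

```python
from itertools import accumulate
from typing import List, Tuple


def zip_with_index(partitions: List[List[str]]) -> List[List[Tuple[str, int]]]:
    """Assign a global 0-based index to every line across ordered partitions.

    Assumption: `partitions` is an in-memory, ordered stand-in for the
    contiguous chunks of a file; it is not a Beam PCollection.
    """
    # Pass 1: count the lines in each partition.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: offsets[i] = number of lines before partition i.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: global index = partition offset + local row order.
    return [
        [(line, offset + local) for local, line in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]
```

Only the per-partition counts have to be gathered in one place to compute the
offsets; the second pass can then run independently on each partition.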


-- 




Got feedback? tinyurl.com/swegner-feedback
