Hmmm... Thanks. This could very well be my bottleneck, since I see tons of threads go into WAIT state after some time and stay like that more or less forever. I have about 100 GB worth of elements to process. Is there a way to bypass this "startBundle" overhead and get reasonably optimized behavior? Anyone?

Thanks + regards
Amir
From: Demin Alexey <diomi...@gmail.com>
To: dev@beam.incubator.apache.org; amir bahmanyari <amirto...@yahoo.com>
Sent: Friday, November 18, 2016 10:40 AM
Subject: Re: Flink runner. Wrapper for DoFn

A simple example: in startBundle my DoFn loads filters from a remote DB and builds an optimized index; in processElement it applies those filters to each element to decide whether to push it to the next operation or drop it.

In the current implementation it is like matching a regexp against a string. You have two options:
1) compile the regexp every time, for every element
2) compile the regexp once and apply it to all elements

Right now the Flink runner works the first way, and that way is not optimal.

2016-11-18 22:26 GMT+04:00 amir bahmanyari <amirto...@yahoo.com.invalid>:
> Hi Alexey, "startBundle can be expensive"... Could you elaborate on
> "expensive" as per each element, please?
> Thanks
>
> From: Demin Alexey <diomi...@gmail.com>
> To: dev@beam.incubator.apache.org
> Sent: Friday, November 18, 2016 7:40 AM
> Subject: Flink runner. Wrapper for DoFn
>
> Hi,
>
> The Flink runner has this code:
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L262
>
> but in most cases the startBundle method is expensive to call for
> every element (for example, opening a DB connection, building a cache, etc.).
>
> Why is it so important to invoke startBundle/finishBundle on every
> incoming StreamRecord?
>
> Thanks,
> Alexey Diomin
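
To make the cost concrete, here is a minimal sketch of the pattern Alexey describes, written against the Beam Java SDK's annotation-based DoFn (method signatures vary slightly across Beam versions, and the class name FilteringFn and the regex filter are placeholders, his real code loads filters from a remote DB and builds an index). The expensive work lives in startBundle and the per-element work stays cheap, so a runner that effectively calls startBundle once per element repeats the expensive step for every record.

import java.util.regex.Pattern;
import org.apache.beam.sdk.transforms.DoFn;

// Illustrative sketch: expensive, bundle-scoped setup in startBundle,
// cheap per-element filtering in processElement.
public class FilteringFn extends DoFn<String, String> {

  // Built in startBundle, reused for every element of the bundle.
  private transient Pattern compiledFilter;

  @StartBundle
  public void startBundle() {
    // Expensive setup (stand-in for loading filters from a remote DB
    // and building an index). If the runner invokes startBundle once
    // per element, this cost is paid for every record instead of once
    // per bundle.
    compiledFilter = Pattern.compile("error|warn");
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Cheap per-element work: apply the precompiled filter and either
    // push the element downstream or drop it.
    if (compiledFilter.matcher(c.element()).find()) {
      c.output(c.element());
    }
  }
}

As an aside, for setup that only needs to happen once per DoFn instance rather than once per bundle (e.g., opening a DB connection), the DoFn API also provides @Setup/@Teardown, which are not tied to bundle boundaries; whether that helps here depends on how the Flink runner manages DoFn instances.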