Very simple example: my DoFn loads filters from a remote DB in startBundle and builds an optimized index. processElement then applies those filters to every element to decide whether to pass the element to the next operation or drop it.
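
To make the cost concrete, here is a minimal sketch in plain Java, with no Beam or Flink dependency. All names in it (BundleCostSketch, FilterFn, runPerElement, runPerBundle) are hypothetical, and a hard-coded set stands in for the filters loaded from the remote DB. It only counts how often the expensive setup runs when the bundle is a single element versus a whole batch; it is not the runner's actual code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BundleCostSketch {

    /** Hypothetical stand-in for the DoFn; counts how often the index is built. */
    static final class FilterFn {
        int indexBuilds = 0;
        private Set<String> blocked;

        // Stands in for "load filters from a remote DB and build an index".
        void startBundle() {
            blocked = new HashSet<>(List.of("spam", "junk"));
            indexBuilds++;
        }

        // Cheap per-element check against the prebuilt index.
        boolean processElement(String element) {
            return !blocked.contains(element);
        }

        void finishBundle() {
            blocked = null;
        }
    }

    // Bundle of size 1: setup and teardown wrap every single element.
    static List<String> runPerElement(FilterFn fn, List<String> input) {
        List<String> out = new ArrayList<>();
        for (String e : input) {
            fn.startBundle();
            if (fn.processElement(e)) {
                out.add(e);
            }
            fn.finishBundle();
        }
        return out;
    }

    // One bundle for the whole batch: setup runs once.
    static List<String> runPerBundle(FilterFn fn, List<String> input) {
        List<String> out = new ArrayList<>();
        fn.startBundle();
        for (String e : input) {
            if (fn.processElement(e)) {
                out.add(e);
            }
        }
        fn.finishBundle();
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "spam", "b", "junk", "c");
        FilterFn perElement = new FilterFn();
        FilterFn perBundle = new FilterFn();
        System.out.println(runPerElement(perElement, input) + " builds=" + perElement.indexBuilds);
        System.out.println(runPerBundle(perBundle, input) + " builds=" + perBundle.indexBuilds);
    }
}
```

Both variants keep the same elements, but for five inputs the per-element variant builds the index five times and the per-bundle variant builds it once.
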
The current implementation is like matching a regexp against a string. You have two options:
1) compile the regexp again for every element
2) compile the regexp once and apply it to all elements

Right now Flink works the first way, and that is not optimal.

2016-11-18 22:26 GMT+04:00 amir bahmanyari <amirto...@yahoo.com.invalid>:

> Hi Alexey,
> "startBundle can be expensive"... Could you elaborate on
> "expensive" as per each element pls?
> Thanks
>
> From: Demin Alexey <diomi...@gmail.com>
> To: dev@beam.incubator.apache.org
> Sent: Friday, November 18, 2016 7:40 AM
> Subject: Flink runner. Wrapper for DoFn
>
> Hi
>
> In flink runner we have this code:
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L262
>
> but in mostly cases method startBundle can be expensive for making for
> every element (for example connection for db/build cache/ etc)
>
> Why so important invoke startBundle/finishBundle on every
> incoming streamRecord ?
>
> Thanks
> Alexey Diomin