Thank you, Kenn! Shen
On Thu, Mar 8, 2018 at 9:58 PM, Kenneth Knowles <k...@google.com> wrote: > > > On Thu, Mar 8, 2018 at 6:50 PM Shen Li <cs.she...@gmail.com> wrote: > >> Hi Kenn, >> >> I just want to confirm that I understand it correctly. >> >> > - You know that W is expired only when you can be sure that no main >> input element could reference it. >> >> This is determined by the *main input* watermark, allowedLateness, and >> maximumLookback, right? >> >> https://github.com/apache/beam/blob/master/sdks/java/ >> core/src/main/java/org/apache/beam/sdk/transforms/windowing/ >> WindowMappingFn.java#L68 >> > > Yes, I think you can use this formula: https://github.com/ > apache/beam/blob/master/sdks/java/core/src/main/java/org/ > apache/beam/sdk/transforms/windowing/WindowMappingFn.java#L61 > > > >> > when W expires on the side input you make it ready, you process the >> elements with empty contents on the side input, then you GC the side input. >> >> Even if W is unready according to *side input* watermark, the >> runner/engine should still make it ready when it violates maximumLookback >> and *main input* watermark. Is that correct? >> > > This is true if there are no buffered elements. Then you can be sure that > no main input element will show up that accesses W. > > Kenn > > >> >> Thanks, >> Shen >> >> >> >> >> >> On Thu, Mar 8, 2018 at 9:31 PM, Shen Li <cs.she...@gmail.com> wrote: >> >>> I see. Thank you Kenn and Lukasz. >>> >>> Best, >>> Shen >>> >>> >>> On Thu, Mar 8, 2018 at 7:46 PM, Kenneth Knowles <k...@google.com> wrote: >>> >>>> I think the description of when a side input is ready vs expired is the >>>> trouble here. >>>> >>>> - You know that W is expired only when you can be sure that no main >>>> input element could reference it. >>>> - You know that W is ready *even if it got no data* if the input that >>>> would end up in W would be dropped (aka when W expires according to the >>>> *side input* watermark) >>>> >>>> So for your scenario, you push back the elements, that holds W from >>>> being collected, when W expires on the side input you make it ready, you >>>> process the elements with empty contents on the side input, then you GC the >>>> side input. >>>> >>>> Kenn >>>> >>>> On Thu, Mar 8, 2018 at 4:32 PM Shen Li <cs.she...@gmail.com> wrote: >>>> >>>>> Hi Lukasz, >>>>> >>>>> Let's explain this problem using a specific example. >>>>> >>>>> Say I have a main input element X, which accesses side input window W. >>>>> When X arrives at a ParDo operator, W is not ready and not expired either. >>>>> So, in this case, the ParDo should push back X and wait for W to become >>>>> ready. Say, after two minutes, W is still unready but is expired due to >>>>> advanced main input watermark. In this situation, how does Beam expect >>>>> runners/engines to handle the pushed back value X? Discard X or throw an >>>>> error? >>>>> >>>>> Thanks, >>>>> Shen >>>>> >>>>> On Thu, Mar 8, 2018 at 6:35 PM, Lukasz Cwik <lc...@google.com> wrote: >>>>> >>>>>> I believe your missing over this point: "and also to not expire the >>>>>> side input till the main input watermark advances beyond the garbage >>>>>> collection hold of the side input." >>>>>> >>>>>> On Thu, Mar 8, 2018 at 3:33 PM, Shen Li <cs.she...@gmail.com> wrote: >>>>>> >>>>>>> Hi Lukasz, >>>>>>> >>>>>>> Thanks again. >>>>>>> >>>>>>> > the runner is required to hold back the main input till the side >>>>>>> input is ready >>>>>>> >>>>>>> Yes, I understand these requirements. But what if the side input >>>>>>> expires before it becomes ready? >>>>>>> >>>>>>> Shen >>>>>>> >>>>>>> >>>>>> >>>>> >>> >>