soooounds good. Thanks guys <3 On Tue, Apr 9, 2019 at 3:19 PM Lukasz Cwik <[email protected]> wrote:
> UnboundedSources and SplittableDoFns report watermarks which the runner > uses to compute how much the watermark could advance if it processed some > outstanding work. But it is always upto the runner to choose when the > watermark advances. The runner could process each work item in watermark > priority order and advance the watermark in small increments or could > process many work items and then advance the watermark a lot. (Note that > the BoundedSources API doesn't allow for reporting the watermark and it > starts at Beam's concept of START OF TIME and advances in one step to > Beam's concept of END OF TIME). > > You might be able to write what you want with an event based timer. Kenn > wrote (2?) blog posts on state and timers that have some pretty good > explanations and examples. > > On Tue, Apr 9, 2019 at 2:27 PM Pablo Estrada <[email protected]> wrote: > >> hi Luke, >> thanks for the prompt reply: ) >> >> That makes sense. I think I'll go back to my cave to read a bunch about >> streaming. : ) >> >> I was looking for this to try to write a sequence generator for Python in >> streaming, and I was trying to debug what was going on. I was trying to >> allow the DoFn to receive a watermark reported by the upstream source. (... >> does that answer "which watermark?"... I am not sure that it does... but >> maybe..). >> >> Do you think it's a reasonable use case for DoFns to know what the >> upstream watermark is? >> I hope that makes at least a some sense... : ) >> >> If it doesn't make sense, feel free to ignore, and I'll go do my readings. >> Thanks! >> -P. >> >> On Tue, Apr 9, 2019 at 1:44 PM Lukasz Cwik <[email protected]> wrote: >> >>> WatermarkReporterParam is about reporting the watermark. The main >>> usecase is for SplittableDoFns to be able to report the data watermark. >>> >>> The watermark is per input and output of a DoFn. Also each bundle being >>> processed has its local watermarks while the runner computes the global >>> watermark. The runners watermark could be per key, or key range or global >>> across all keys. >>> >>> There is no runner agnostic way to read the watermark today. Is there a >>> usecase you are targeting that would help from having access to the >>> watermark (also, which watermark?)? >>> >>> >>> On Tue, Apr 9, 2019 at 1:28 PM Pablo Estrada <[email protected]> wrote: >>> >>>> I am experimenting with state / timers in Python. As I look at the >>>> DoFnProcessParams[1], I see that it's possible for a DoFn to receive >>>> several arguments (e.g. Timers, Side Inputs, etc). Also the Watermark via >>>> WatermarkReporterParam. >>>> >>>> I see that this parameter is not handled by runners when filling up the >>>> arguments for a DoFn[2][3]. So, as far as I can tell, DoFns are not >>>> currently able to get the watermark. >>>> >>>> Is this a bug, or is it intentional? Perhaps there's another way to >>>> find out the watermark for a DoFn? >>>> >>>> Best >>>> -P. >>>> >>>> [1] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py#L381-L390 >>>> >>>> [2] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L477-L488 >>>> [3] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L605-L620 >>>> >>>
