Hi,
I just found out that DirectRunner apparently does not use
LateDataDroppingDoFnRunner, which means that it doesn't drop late data
in cases where no GBK operation is involved (dropping inside a GBK seems
to be correct). There seems to be no @Category(ValidatesRunner) test
for this behavior (because DirectRunner would fail it), so the question
is: should late data dropping be considered part of the model (of which
DirectRunner should be a canonical implementation), and therefore be
fixed in DirectRunner, or is late data dropping an optional feature
of a runner?
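To make the behavior under discussion concrete, here is a minimal, dependency-free Java sketch of the dropping rule, assuming it matches the check Beam's runners-core applies: an element is droppable once its window's max timestamp plus the allowed lateness lies strictly before the current input watermark. The class and method names are hypothetical and are not Beam API; an actual ValidatesRunner test would instead drive a pipeline with TestStream.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of the late-data check (not Beam's actual classes).
 * Assumption: a window expires once (max timestamp + allowed lateness)
 * is strictly before the input watermark; elements of expired windows are dropped.
 */
public final class LateDataCheck {

  private LateDataCheck() {}

  /** Time after which the window (and any state for it) may be garbage collected. */
  static Instant garbageCollectionTime(Instant windowMaxTimestamp, Duration allowedLateness) {
    return windowMaxTimestamp.plus(allowedLateness);
  }

  /** True if an element assigned to this window should be dropped as late. */
  static boolean isDroppable(
      Instant windowMaxTimestamp, Duration allowedLateness, Instant inputWatermark) {
    return garbageCollectionTime(windowMaxTimestamp, allowedLateness).isBefore(inputWatermark);
  }

  public static void main(String[] args) {
    // Max timestamp of the fixed window [0s, 60s).
    Instant windowMax = Instant.ofEpochMilli(59_999);

    // Watermark still inside the window: keep the element.
    System.out.println(isDroppable(windowMax, Duration.ZERO, Instant.ofEpochMilli(30_000)));
    // Watermark past the window with zero allowed lateness: drop it.
    System.out.println(isDroppable(windowMax, Duration.ZERO, Instant.ofEpochMilli(120_000)));
    // Same watermark, but 5 minutes of allowed lateness: keep it.
    System.out.println(isDroppable(windowMax, Duration.ofMinutes(5), Instant.ofEpochMilli(120_000)));
  }
}
```

The point of the question is where this check runs: if it only happens inside the GBK implementation, a pipeline with no GBK never applies it.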
I'm strongly in favor of the first option, and I think it is likely that
all real-world runners already adhere to it (I didn't check
that, though).
Opinions?
Jan
Thread: Dropping late data in DirectRunner (Jan Lukavský)