Hi,
I came across a case where using
PCollection#applyWindowingStrategyInternal seems legit in user core. The
case is roughly as follows:
a) compute some streaming statistics
b) apply the same transform (say ComputeWindowedAggregation) with
different parameters on these statistics yielding two windowed
PCollections - first is global with early trigger, the other is sliding
window, the specific parameters of the windowFns are encapsulated in the
ComputeWindowedAggregation transform
c) apply the same transform on both of the above PCollections,
yielding two PCollections with the same types, but different windowFns
d) flatten these PCollections into single one (e.g. for downstream
processing - joining - or flushing to sink)
Now, the flatten will not work, because these PCollections have
different windowFns. It would be possible to restore the windowing for
either of them, but it requires to somewhat break the encapsulation of
the transforms that produce the windowed outputs. A more natural
solution is to take the WindowingStrategy from the global aggregation
and set it via setWindowingStrategyInternal() to the other PCollection.
This works, but it uses API that is marked as @Internal (and obviously,
the name as well suggests it is not intended for client-code usage).
The question is, should we make a legitimate version of this call? Or
should we introduce a way for Flatten.pCollections() to re-window the
input PCollections appropriately? In the case of conflicting WindowFns,
where one of them is GlobalWindowing strategy, it seems to me that the
user's intention is quite well-defined (this might extend to some
'flatten windowFn resolution strategy', maybe).
WDYT?
Jan