Hi,

I came across a case where using PCollection#setWindowingStrategyInternal seems legitimate in user code. The case is roughly as follows:

 a) compute some streaming statistics

 b) apply the same transform (say ComputeWindowedAggregation) with different parameters to these statistics, yielding two windowed PCollections: the first is globally windowed with an early trigger, the other uses sliding windows. The specific parameters of the windowFns are encapsulated in the ComputeWindowedAggregation transform

 c) apply the same transform on both of the above PCollections, yielding two PCollections with the same types, but different windowFns

 d) flatten these PCollections into a single one (e.g. for downstream processing - joining - or flushing to a sink)

Now, the flatten will not work, because these PCollections have different windowFns. It would be possible to restore the windowing for either of them, but that requires somewhat breaking the encapsulation of the transforms that produce the windowed outputs. A more natural solution is to take the WindowingStrategy from the global aggregation and set it on the other PCollection via setWindowingStrategyInternal(). This works, but it uses an API that is marked as @Internal (and the name itself obviously suggests it is not intended for client-code usage).

The question is: should we make a legitimate version of this call? Or should we introduce a way for Flatten.pCollections() to re-window the input PCollections appropriately? In the case of conflicting WindowFns where one of them is the global windowing strategy, it seems to me that the user's intention is quite well-defined (this might extend to some 'flatten windowFn resolution strategy', maybe).
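
To make the 'resolution strategy' idea a bit more concrete, here is a rough sketch of the rule I have in mind. It is plain Java, not Beam API: the class FlattenWindowResolution, the method resolve() and the string labels for windowFns are all hypothetical, and triggers are ignored entirely. The assumption is simply "if any input is globally windowed, the flattened output is global; otherwise all windowFns have to match".

```java
import java.util.List;

// Hypothetical model of a "flatten windowFn resolution strategy".
// Plain Java, not Beam API: windowFns are represented by a name only,
// and triggers / allowed lateness are not modelled at all.
final class FlattenWindowResolution {

  // Assumed label for the global windowFn in this sketch.
  static final String GLOBAL = "GlobalWindows";

  // Returns the windowFn the flattened output would use, or throws when the
  // inputs conflict in a way that has no obvious resolution.
  static String resolve(List<String> inputWindowFns) {
    if (inputWindowFns.isEmpty()) {
      throw new IllegalArgumentException("nothing to flatten");
    }
    // If any input is globally windowed, re-window everything into global.
    if (inputWindowFns.contains(GLOBAL)) {
      return GLOBAL;
    }
    // Otherwise require all windowFns to match, as in the failing case above.
    String first = inputWindowFns.get(0);
    for (String fn : inputWindowFns) {
      if (!fn.equals(first)) {
        throw new IllegalStateException(
            "incompatible windowFns: " + first + " vs " + fn);
      }
    }
    return first;
  }

  public static void main(String[] args) {
    // The case from this mail: one global input, one sliding-window input.
    System.out.println(resolve(List.of(GLOBAL, "SlidingWindows")));
  }
}
```

With this rule the case described above resolves to the global windowing, while two different non-global windowFns would still be rejected. How the trigger of the global input should carry over is left open here.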

WDYT?

 Jan
