On Thu, Nov 7, 2019 at 7:36 PM Kenneth Knowles <k...@apache.org> wrote:

>
>
> On Thu, Nov 7, 2019 at 9:19 AM Luke Cwik <lc...@google.com> wrote:
>
>> I did suggest one other alternative on Jincheng's PR[1] which was to
>> allow windowless values to be sent across the gRPC port. The SDK would then
>> be responsible for ensuring that the execution didn't access any properties
>> that required knowledge of the timestamp, pane or window. This is different
>> then adding the ValueOnlyWindowedValueCoder as a model coder because it
>> allows SDKs to pass around raw values to functions without any windowing
>> overhead which could be useful for things like the side input window
>> mapping or window merging functions we have.
>>
>
> When you say "pass around" what does it mean? If it is over the wire,
> there is already no overhead to ValueOnlyWindowedValueCoder. So do you mean
> the overhead of having the layer of boxing of WindowedValue? I would assume
> all non-value components of the WindowedValue from
> ValueOnlyWindowedValueCoder are pointers to a single shared immutable
> instance carried by the coder instance.
>

I was referring to the layer of boxing of WindowedValue. My concern wasn't
the performance overhead of passing around a wrapper object but the
cognitive overhead of understanding why everything needs to be wrapped in a
windowed value. Since you have been working on SQL for some time, this
would be analogous to executing a UDF and making all the machinery around
it take WindowedValue<T> instead of T.


> I think raw values can already be passed to functions, no? The main thing
> is that elements in a PCollection always have a window, timestamp, and
> paneinfo. Not all values are elements. Is there a specific case you have in
> mind? I would not expect WindowMappingFn or window merging fn to be passing
> "elements" but just values of the appropriate type for the function.
>

This is about the machinery around WindowMappingFn/WindowMergingFn. For
example the implementation around WindowMappingFn takes a
WindowedValue<Window> and unwraps it forwarding it to the WindowMappingFn
and then takes the result and wraps it in a WindowedValue<Window> and
returns that to the runner.


>
>
>> On Thu, Nov 7, 2019 at 8:48 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> I think there is some misunderstanding about what is meant by option
>>> 2. What Kenn (I think) and I are proposing is not a WindowedValueCoder
>>> whose window/timestamp/paneinfo coders are parameterized to be
>>> constant coders, but a WindowedValueCoder whose
>>> window/timestamp/paneinfo values are specified as constants in the
>>> coder.
>>>
>>> Let's call this NewValueOnlyWindowedValueCoder, and is parameterized
>>> by a window, timestamp, and pane info instance
>>>
>>> The existing ValueOnlyWindowedValueCoder is literally
>>> NewValueOnlyWindowedValueCoder(GlobalWindow, MIN_TIMESTAMP,
>>> PaneInfo.NO_FIRING). Note in particular that using the existing
>>> ValueOnlyWindowedValueCoder would give the wrong timestamp and pane
>>> info if it is use for the result of a GBK, which I think is the loss
>>> of consistency referred to here.
>>>
>>
> Yes, this is exactly what I am proposing and sounds like what "approach 2"
> is. I think this approach "just works". It is simpler and more efficient
> than "WindowedValueCoder.of(ValueCoder, WindowCoder, TimestampCoder,
> PaneInfoCoder)" which would require some overhead even if WindowCoder,
> TimestampCoder and PaneInfoCoder were constant coders consuming zero bytes.
>
> Kenn
>
>
>
>> On Thu, Nov 7, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com>
>>> wrote:
>>> >
>>> > Thanks for your feedback and the valuable comments, Kenn & Robert!
>>> >
>>> > I think your comments are more comprehensive and enlighten me a lot.
>>> The two proposals which I mentioned above are to reuse the existing coder
>>> (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your
>>> comments, I think we can further abstract 'FullWindowedValueCoder' and
>>> 'ValueOnlyWindowedValueCoder', that is, we can rename
>>> 'FullWindowedValueCoder' to 'WindowedValueCoder', and make the coders of
>>> window/timestamp/pane configurable. Then we can remove
>>> 'ValueOnlyWindowedValueCoder' and need to add a mount of constant coders
>>> for window/timestamp/pane.
>>> >
>>> > I have replied your comments on the doc, and quick feedback as
>>> following:
>>> >
>>> > Regarding to "Approach 2: probably no SDK harness work / compatible
>>> with existing Beam model so no risk of introducing inconsistency",if we
>>> "just puts default window/timestamp/pane info on elements" and don't change
>>> the original coder, then the performance is not optimized. If we want to
>>> get the best performance, then the default coder of Window/timestamp/pane
>>> should be constant coder. In this case the SDK harnesses need to be aware
>>> of the constant coder and there will be some development work in the SDK
>>> harness. Besides, the SDK harness also needs to make the coders for
>>> window/timestamp/pane configurable and this will introduce some related
>>> changes, such as updating WindowedValueCoder._get_component_coders, etc.
>>> >
>>> > Regarding to "Approach 1: option a: if the SDK harness has to
>>> understand 'values without windows' then very large changes and high risk
>>> of introducing inconsistency (I eliminated many of these inconsistencies)",
>>> we only need to add ValueOnlyWindowedValueCoder to the StandardCoders and
>>> all the SDK harness should be aware of this coder. There is no much changes
>>> actually.
>>> >
>>> > Please feel free to correct me if there is anyting incorrect. :)
>>> >
>>> > Besides, I'm not quite clear about the consistency issues you meant
>>> here. Could you please give me some hints about this?
>>> >
>>> > Best,
>>> > Jincheng
>>> >
>>> > Robert Bradshaw <rober...@google.com> 于2019年11月7日周四 上午3:38写道:
>>> >>
>>> >> Yes, the portability framework is designed to support this, and
>>> >> possibly even more efficient transfers of data than element-by-element
>>> >> as per the wire coder specified in the IO port operators. I left some
>>> >> comments on the doc as well, and would also prefer approach 2.
>>> >>
>>> >> On Wed, Nov 6, 2019 at 11:03 AM Kenneth Knowles <k...@apache.org>
>>> wrote:
>>> >> >
>>> >> > I think the portability framework is designed for this. The runner
>>> controls the coder on the grpc ports and the runner controls the process
>>> bundle descriptor.
>>> >> >
>>> >> > I commented on the doc. I think what is missing is analysis of
>>> scope of SDK harness changes and risk to model consistency
>>> >> >
>>> >> >     Approach 2: probably no SDK harness work / compatible with
>>> existing Beam model so no risk of introducing inconsistency
>>> >> >
>>> >> >     Approach 1: what are all the details?
>>> >> >         option a: if the SDK harness has to understand "values
>>> without windows" then very large changes and high risk of introducing
>>> inconsistency (I eliminated many of these inconsistencies)
>>> >> >         option b: if the coder just puts default
>>> window/timestamp/pane info on elements, then it is the same as approach 2,
>>> no work / no risk
>>> >> >
>>> >> > Kenn
>>> >> >
>>> >> > On Wed, Nov 6, 2019 at 1:09 AM jincheng sun <
>>> sunjincheng...@gmail.com> wrote:
>>> >> >>
>>> >> >> Hi all,
>>> >> >>
>>> >> >> I am trying to make some improvements of portability framework to
>>> make it usable in other projects. However, we find that the coder between
>>> runner and harness can only be FullWindowedValueCoder. This means each time
>>> when sending a WindowedValue, we have to encode/decode timestamp, windows
>>> and pan infos. In some circumstances(such as using the portability
>>> framework in Flink), only values are needed between runner and harness. So,
>>> it would be nice if we can configure the coder and avoid redundant encoding
>>> and decoding between runner and harness to improve the performance.
>>> >> >>
>>> >> >> There are two approaches to solve this issue:
>>> >> >>
>>> >> >>     Approach 1:  Support ValueOnlyWindowedValueCoder between
>>> runner and harness.
>>> >> >>     Approach 2:  Add a "constant" window coder that embeds all the
>>> windowing information as part of the coder that should be used to wrap the
>>> value during decoding.
>>> >> >>
>>> >> >> More details can be found here [1].
>>> >> >>
>>> >> >> As of the shortcomings of “Approach 2” which still need to
>>> encode/decode timestamp and pane infos, we tend to choose “Approach 1”
>>> which brings better performance and is more thorough.
>>> >> >>
>>> >> >> Welcome any feedback :)
>>> >> >>
>>> >> >> Best,
>>> >> >> Jincheng
>>> >> >>
>>> >> >> [1]
>>> https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing
>>> >> >>
>>>
>>

Reply via email to