Hi all,
I am trying to improve the portability framework so that it can be used in
other projects. However, we found that the coder between the runner and the
harness can only be FullWindowedValueCoder. This means that every time a
WindowedValue is sent, we have to encode and decode the timestamp, windows,
and pane info. In some circumstances (such as using the portability
framework in Flink), only the values are needed between the runner and the
harness. So it would be nice if we could configure the coder and avoid the
redundant encoding and decoding between the runner and the harness, to
improve performance.
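To give a rough feel for the overhead, here is a minimal sketch in Python.
The byte layout is a simplified toy format assumed only for illustration (it
is not Beam's actual wire format), and the helper names encode_value and
encode_full are hypothetical.

```python
import struct


def encode_value(value: str) -> bytes:
    # Value only: a 4-byte length prefix followed by the UTF-8 payload.
    data = value.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def encode_full(value: str, timestamp_ms: int, windows, pane: int) -> bytes:
    # Toy "full" encoding: timestamp, window count, each window as a
    # (start, end) pair, one pane byte, and finally the value itself.
    out = struct.pack(">q", timestamp_ms)
    out += struct.pack(">I", len(windows))
    for start, end in windows:
        out += struct.pack(">qq", start, end)
    out += struct.pack(">B", pane)
    return out + encode_value(value)


full = encode_full("a", 1_000, [(0, 60_000)], 0)
value_only = encode_value("a")
overhead = len(full) - len(value_only)  # bytes spent on metadata per element
```

In this toy layout the metadata is several times larger than a small value,
and it is paid on every element in both directions.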
There are two approaches to solving this issue:
Approach 1: Support ValueOnlyWindowedValueCoder between the runner and the
harness.
Approach 2: Add a "constant" window coder that embeds all the windowing
information as part of the coder, to be used to wrap the value during
decoding.
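The difference between the two approaches can be sketched roughly as follows.
This is an illustration in Python under stated assumptions, not the proposed
implementation: the class names, the toy byte layout, and the placeholder
constants are all hypothetical, and the sketch assumes that in Approach 2
only the windows live in the coder while the timestamp and pane info still
travel with each element.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Hypothetical placeholders for the defaults a value-only coder fills in.
GLOBAL_WINDOW = "global"
MIN_TIMESTAMP = -(2 ** 63)
NO_FIRING = 0


@dataclass(frozen=True)
class WindowedValue:
    value: str
    timestamp: int
    windows: Tuple[str, ...]
    pane: int


class ValueOnlyCoder:
    """Approach 1 (sketch): only the value is put on the wire."""

    def encode(self, wv: WindowedValue) -> bytes:
        return wv.value.encode("utf-8")

    def decode(self, data: bytes) -> WindowedValue:
        # Metadata is not transmitted, so defaults are filled in on decode.
        return WindowedValue(
            data.decode("utf-8"), MIN_TIMESTAMP, (GLOBAL_WINDOW,), NO_FIRING
        )


class ConstantWindowCoder:
    """Approach 2 (sketch): windows are a constant held by the coder."""

    def __init__(self, windows):
        self.windows = tuple(windows)

    def encode(self, wv: WindowedValue) -> bytes:
        # Assumption: timestamp (8 bytes) and pane (1 byte) are still encoded.
        return struct.pack(">qB", wv.timestamp, wv.pane) + wv.value.encode("utf-8")

    def decode(self, data: bytes) -> WindowedValue:
        timestamp, pane = struct.unpack(">qB", data[:9])
        # The windows come from the coder itself, not from the bytes.
        return WindowedValue(data[9:].decode("utf-8"), timestamp, self.windows, pane)
```

For example, encoding WindowedValue("hello", 42, ("w1",), 1) yields 5 bytes
with ValueOnlyCoder and 14 bytes with ConstantWindowCoder(["w1"]) in this
toy layout.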
More details can be found here [1].
Because "Approach 2" still needs to encode/decode the timestamp and pane
info, we tend to prefer "Approach 1", which gives better performance and is
the more thorough solution.
Any feedback is welcome :)
Best,
Jincheng
[1]
https://docs.google.com/document/d/1TTKZC6ppVozG5zV5RiRKXse6qnJl-EsHGb_LkUfoLxY/edit?usp=sharing