Hi Jing,

Thanks for the comments! Please find below my comments, which are based on
the offline discussion with Xuannan.

On Wed, Jul 5, 2023 at 1:36 AM Jing Ge <j...@ververica.com> wrote:

> Hi Xuannan, Hi Dong
>
> Thanks for the Proposal! After reading the FLIP, I'd like to ask some
> questions:
>
> 1. Naming convention for boolean variables. It is recommended to follow
> JavaBean [1], i.e. objectReuseCompliant as the variable name with
> isObjectReuseCompliant() and setObjectReuseCompliant() as the methods' name.
>
>
Good point. We have updated the FLIP as suggested.


>
> 2.
>
>    -
>
>    *If pipeline.object-reuse is set to true, records emitted by this
>    operator will be re-used.*
>    -
>
>    *Otherwise, if getIsObjectReuseCompliant() returns true, records
>    emitted by this operator will be re-used.*
>    -
>
>    *Otherwise, records emitted by this operator will be deep-copied
>    before being given to the next operator in the chain.*
>
>
> If I understand you correctly,  the hard coding objectReusedCompliant
> should have higher priority over the configuration, the checking logic
> should be:
>
>    -
>
>    *If getIsObjectReuseCompliant() returns true, records emitted by this
>    operator will be re-used.*
>    -
>
>    *Otherwise, if pipeline.object-reuse is set to true, records emitted
>    by this operator will be re-used.*
>    -
>
>    *Otherwise, records emitted by this operator will be deep-copied
>    before being given to the next operator in the chain.*
>
>
> The results are the same but the checking logics are different, but there
> are some additional thoughts, which lead us to the next question.
>

>

> 3. Current design lets specific operators enable object reuse and ignore
> the global config. There could be another thought, on the contrary: if an
> operator has hard coded the objectReuseCompliant as false, i.e. disable
> object reuse on purpose, records should not be reused even if the global
> config pipeline.object-reused is set to true, which turns out that the
> objectReuseCompliant could be a triple value logic: ture: force object
> reusing; false: force deep-copying; unknown: depends on
> pipeline.object-reuse config.
>

With the current proposal, if pipeline.object-reused == true and
operatorA's objectReuseCompliant == false, Flink will not deep-copy
operatorA's output. I think you are suggesting changing the behavior such
that Flink should deep-copy the operatorA's output.

Could you explain what is the advantage of this approach compared to the
approach described in the FLIP?

My concern with this approach is that it can cause performance regression.
This is an operator's objectReuseCompliant will be false by default unless
it is explicitly overridden. For those jobs which are currently configured
with pipeline.object-reused = true, these jobs will likely start to have
lower performance (due to object deep-copy) after upgrading to the newer
Flink version.

Best,
Dong


>
> Best regards,
> Jing
>
>
> [1] https://en.wikipedia.org/wiki/JavaBeans
>
> On Mon, Jul 3, 2023 at 4:25 AM Xuannan Su <suxuanna...@gmail.com> wrote:
>
>> Hi all,
>>
>> Dong(cc'ed) and I are opening this thread to discuss our proposal to
>> add operator attribute to allow operator to specify support for
>> object-reuse [1].
>>
>> Currently, the default configuration for pipeline.object-reuse is set
>> to false to avoid data corruption, which can result in suboptimal
>> performance. We propose adding APIs that operators can utilize to
>> inform the Flink runtime whether it is safe to reuse the emitted
>> records. This enhancement would enable Flink to maximize its
>> performance using the default configuration.
>>
>> Please refer to the FLIP document for more details about the proposed
>> design and implementation. We welcome any feedback and opinions on
>> this proposal.
>>
>> Best regards,
>>
>> Dong and Xuannan
>>
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>>
>

Reply via email to