Hi Xuannan, Hi Dong,

Thanks for your clarification.

@Xuannan

A Jira ticket has been created for the doc update:
https://issues.apache.org/jira/browse/FLINK-32546

@Dong

I don't have a concrete example. I just thought about it from a conceptual
or pattern's perspective. Since we have 1. coarse-grained global switch(CGS
as abbreviation), i.e. the pipeline.object-reuse and 2. fine-grained local
switch(FGS as abbreviation), i.e. the objectReuseCompliant variable for
specific operators/functions, there will be the following patterns with
appropriate combinations:

pattern 1: coarse-grained switch only. Local object reuse will be
controlled by the coarse-grained switch:
1.1 cgs == true -> local object reused enabled
1.2 cgs == true  -> local object reused enabled
1.3 cgs == false -> local object reused disabled, i.e. deep copy enabled
1.4 cgs == false -> local object reused disabled, i.e. deep copy enabled

afaiu, this is the starting point. I wrote 4 on purpose to make the
regression check easier. We can consider it as the combinations with
cgs(true/false) and fgs(true/false) while fgs is ignored.

Now we introduce fine-grained switch. There will be two patterns:

pattern 2: fine-grained switch over coarse-grained switch. Coarse-grained
switch will be ignored when the local fine-grained switch has different
value:
2.1 cgs == true and fgs == true -> local object reused enabled
2.2 cgs == true and fgs == false -> local object reused disabled, i.e. deep
copy enabled
2.3 cgs == false and fgs == true -> local object reused enabled
2.4 cgs == false and fgs == false -> local object reused disabled, i.e.
deep copy enabled

cgs is actually ignored.

Current FLIP is using a slightly different pattern:

pattern 3: fine-grained switch over coarse-grained switch only when
coarse-grained switch is off, i..e cgs OR fgs:
3.1 cgs == true and fgs == true -> local object reused enabled
3.2 cgs == true and fgs == false -> local object reused enabled
3.3 cgs == false and fgs == true -> local object reused enabled
3.4 cgs == false and fgs == false -> local object reused disabled, i.e.
deep copy enabled

All of those patterns are rational and each has different focus. It depends
on the real requirement to choose one of them.

As we can see, if fgs is using 2VL, there is a regression between pattern 1
and pattern 2. You are absolutely right in this case. That's why I
suggested 3VL, i.e. fgs will have triple values: true, false, unknown(e.g.
null)

pattern 4: 3VL fgs with the null as init value (again, there are just two
combination, I made it 4 on purpose):
4.1 cgs == true and fgs == null -> local object reused enabled
4.2 cgs == true and fgs == null -> local object reused enabled
4.3 cgs == false and fgs == null -> local object reused disabled, i.e. deep
copy enabled
4.4 cgs == false and fgs == null -> local object reused disabled, i.e. deep
copy enabled

Since the default value of fgs is null, pattern 4 is backward compatible
with pattern 1, which means no regression.

Now we will set value to fgs and follow the pattern 2:
4.5 cgs == true and fgs == true -> local object reused enabled
4.6 cgs == true and fgs == false -> local object reused disabled, i.e. deep
copy enabled
4.7 cgs == false and fgs == true -> local object reused enabled
4.8 cgs == false and fgs == false -> local object reused disabled, i.e.
deep copy enabled

Pattern 4 contains pattern 3 with the following combinations(force enabling
local object reuse):
4.5 cgs == true and fgs == true -> local object reused enabled
4.2 cgs == true and fgs == null -> local object reused enabled
4.7 cgs == false and fgs == true -> local object reused enabled
4.4 cgs == false and fgs == null -> local object reused disabled, i.e. deep
copy enabled

Comparing pattern 4 to pattern 3, user will have one additional flexibility
to control(force disabling) the local object reuse capability because of
3VL, i.e. 4.2+4.6 vs. 3.2.

It is commonly used in the hierarchical RBAC to enable more fine-grained
access control of sub role.

I hope I have been able to explain myself clearly. Looking forward to your
feedback.

Best regards,
Jing



On Wed, Jul 5, 2023 at 12:47 PM Dong Lin <lindon...@gmail.com> wrote:

> Hi Jing,
>
> Thanks for the comments! Please find below my comments, which are based on
> the offline discussion with Xuannan.
>
> On Wed, Jul 5, 2023 at 1:36 AM Jing Ge <j...@ververica.com> wrote:
>
>> Hi Xuannan, Hi Dong
>>
>> Thanks for the Proposal! After reading the FLIP, I'd like to ask some
>> questions:
>>
>> 1. Naming convention for boolean variables. It is recommended to follow
>> JavaBean [1], i.e. objectReuseCompliant as the variable name with
>> isObjectReuseCompliant() and setObjectReuseCompliant() as the methods' name.
>>
>>
> Good point. We have updated the FLIP as suggested.
>
>
>>
>> 2.
>>
>>    -
>>
>>    *If pipeline.object-reuse is set to true, records emitted by this
>>    operator will be re-used.*
>>    -
>>
>>    *Otherwise, if getIsObjectReuseCompliant() returns true, records
>>    emitted by this operator will be re-used.*
>>    -
>>
>>    *Otherwise, records emitted by this operator will be deep-copied
>>    before being given to the next operator in the chain.*
>>
>>
>> If I understand you correctly,  the hard coding objectReusedCompliant
>> should have higher priority over the configuration, the checking logic
>> should be:
>>
>>    -
>>
>>    *If getIsObjectReuseCompliant() returns true, records emitted by this
>>    operator will be re-used.*
>>    -
>>
>>    *Otherwise, if pipeline.object-reuse is set to true, records emitted
>>    by this operator will be re-used.*
>>    -
>>
>>    *Otherwise, records emitted by this operator will be deep-copied
>>    before being given to the next operator in the chain.*
>>
>>
>> The results are the same but the checking logics are different, but there
>> are some additional thoughts, which lead us to the next question.
>>
>
>>
>
>> 3. Current design lets specific operators enable object reuse and ignore
>> the global config. There could be another thought, on the contrary: if an
>> operator has hard coded the objectReuseCompliant as false, i.e. disable
>> object reuse on purpose, records should not be reused even if the global
>> config pipeline.object-reused is set to true, which turns out that the
>> objectReuseCompliant could be a triple value logic: ture: force object
>> reusing; false: force deep-copying; unknown: depends on
>> pipeline.object-reuse config.
>>
>
> With the current proposal, if pipeline.object-reused == true and
> operatorA's objectReuseCompliant == false, Flink will not deep-copy
> operatorA's output. I think you are suggesting changing the behavior such
> that Flink should deep-copy the operatorA's output.
>
> Could you explain what is the advantage of this approach compared to the
> approach described in the FLIP?
>
> My concern with this approach is that it can cause performance regression.
> This is an operator's objectReuseCompliant will be false by default unless
> it is explicitly overridden. For those jobs which are currently configured
> with pipeline.object-reused = true, these jobs will likely start to have
> lower performance (due to object deep-copy) after upgrading to the newer
> Flink version.
>
> Best,
> Dong
>
>
>>
>> Best regards,
>> Jing
>>
>>
>> [1] https://en.wikipedia.org/wiki/JavaBeans
>>
>> On Mon, Jul 3, 2023 at 4:25 AM Xuannan Su <suxuanna...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Dong(cc'ed) and I are opening this thread to discuss our proposal to
>>> add operator attribute to allow operator to specify support for
>>> object-reuse [1].
>>>
>>> Currently, the default configuration for pipeline.object-reuse is set
>>> to false to avoid data corruption, which can result in suboptimal
>>> performance. We propose adding APIs that operators can utilize to
>>> inform the Flink runtime whether it is safe to reuse the emitted
>>> records. This enhancement would enable Flink to maximize its
>>> performance using the default configuration.
>>>
>>> Please refer to the FLIP document for more details about the proposed
>>> design and implementation. We welcome any feedback and opinions on
>>> this proposal.
>>>
>>> Best regards,
>>>
>>> Dong and Xuannan
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>>>
>>

Reply via email to