Re: Beam 2.28.0 objectReuse and fasterCopy for FlinkPipelineOption

2021-04-12 Thread Eleanore Jin
Thanks a lot Jan,

I'm clear now.

Best,
Eleanore

On Mon, Apr 12, 2021 at 11:22 AM Jan Lukavský wrote:

> Hi Eleanore,
>
> that's a good question. :)
>
> There has been discussion about this in the PR of the mentioned Jira [1].
> Generally, objectReuse enables Flink to hypothetically reuse instances of
> deserialized objects instead of creating new ones. But due to how Beam
> defines Coders it creates new instances nevertheless. See the discussion
> for details.
>
>  Jan
>
> [1] https://github.com/apache/beam/pull/13240#issuecomment-721635620
> On 4/12/21 7:29 PM, Eleanore Jin wrote:
>
> Hi Jan,
>
> Thanks a lot for the reply! This helps. I wonder if you have any idea
> what the difference is between the fasterCopy and objectReuse options?
>
> Eleanore
>
> On Fri, Apr 9, 2021 at 11:53 AM Jan Lukavský wrote:
>
>> Hi Eleanore,
>>
>> the --fasterCopy option disables cloning between operators (see [1]). It
>> should be safe to use, unless your pipeline outputs an object and
>> later modifies the same instance. This is generally not supported by the
>> Beam model and is considered to be a user error. FlinkRunner
>> historically took a "better-safe-than-sorry" approach and
>> explicitly cloned every received object between (non-shuffle) operators.
>> Enabling this option should increase performance. You can verify that your
>> pipeline is not doing any disallowed mutations using DirectRunner, which
>> checks this by default (unless you pass --enforceImmutability=false).
>>
>>   Jan
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-11146
>>
>> On 4/9/21 7:57 AM, Eleanore Jin wrote:
>> > Hi community,
>> >
>> > I am upgrading from Beam 2.23.0 -> 2.28.0, and a new
>> > FlinkPipelineOption is introduced: fasterCopy.
>> >
>> > Can you please help me understand what is the difference between the
>> > option objectReuse vs fasterCopy?
>> >
>> > Thanks a lot!
>> > Eleanore
>>
>


Re: Beam 2.28.0 objectReuse and fasterCopy for FlinkPipelineOption

2021-04-12 Thread Jan Lukavský

Hi Eleanore,

that's a good question. :)

There has been discussion about this in the PR of the mentioned Jira 
[1]. Generally, objectReuse enables Flink to hypothetically reuse 
instances of deserialized objects instead of creating new ones. But due 
to how Beam defines Coders it creates new instances nevertheless. See 
the discussion for details.
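
As a rough illustration of the point above, here is a minimal sketch in plain Java. It is not Beam or Flink code: the two interfaces are hypothetical stand-ins, loosely modelled on a deserializer that is handed an instance to reuse and on a coder whose decode method only receives the input stream. It is only one reading of the remark above; the linked discussion remains the authoritative source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Toy model only; every name here is hypothetical, not a Beam or Flink API. */
public class ReuseVsDecode {

    /** Hypothetical mutable element type. */
    static final class Record {
        String value;
    }

    /** Deserializer shape that is handed an existing instance it may fill in. */
    interface ReusingDeserializer {
        Record deserialize(Record reuse, InputStream in) throws IOException;
    }

    /** Coder shape whose decode only sees the stream, so it has nothing to reuse. */
    interface DecodeOnlyCoder {
        Record decode(InputStream in) throws IOException;
    }

    /** Fills the supplied instance and returns it. */
    static final ReusingDeserializer NATIVE = (reuse, in) -> {
        reuse.value = new DataInputStream(in).readUTF();
        return reuse;
    };

    /** Always allocates a fresh instance. */
    static final DecodeOnlyCoder CODER = in -> {
        Record fresh = new Record();
        fresh.value = new DataInputStream(in).readUTF();
        return fresh;
    };

    /** Adapter: the 'reuse' argument cannot be passed on to decode, so it is ignored. */
    static final ReusingDeserializer ADAPTED = (reuse, in) -> CODER.decode(in);

    static byte[] encode(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(value);
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns true if the deserializer handed back the instance it was given. */
    static boolean returnsReusedInstance(ReusingDeserializer deserializer) {
        try {
            Record reuse = new Record();
            Record result = deserializer.deserialize(reuse, new ByteArrayInputStream(encode("a")));
            return result == reuse;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("native reuses instance:  " + returnsReusedInstance(NATIVE));
        System.out.println("adapted reuses instance: " + returnsReusedInstance(ADAPTED));
    }
}
```

In this toy model the adapter around the decode-only coder never returns the instance it was given, which is the sense in which an object-reuse setting would have no effect.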


 Jan

[1] https://github.com/apache/beam/pull/13240#issuecomment-721635620

On 4/12/21 7:29 PM, Eleanore Jin wrote:

Hi Jan,

Thanks a lot for the reply! This helps. I wonder if you have any idea
what the difference is between the fasterCopy and objectReuse options?


Eleanore

On Fri, Apr 9, 2021 at 11:53 AM Jan Lukavský wrote:


Hi Eleanore,

the --fasterCopy option disables cloning between operators (see [1]). It
should be safe to use, unless your pipeline outputs an object and
later modifies the same instance. This is generally not supported by the
Beam model and is considered to be a user error. FlinkRunner
historically took a "better-safe-than-sorry" approach and
explicitly cloned every received object between (non-shuffle) operators.
Enabling this option should increase performance. You can verify that your
pipeline is not doing any disallowed mutations using DirectRunner, which
checks this by default (unless you pass --enforceImmutability=false).

  Jan

[1] https://issues.apache.org/jira/browse/BEAM-11146


On 4/9/21 7:57 AM, Eleanore Jin wrote:
> Hi community,
>
> I am upgrading from Beam 2.23.0 -> 2.28.0, and a new
> FlinkPipelineOption is introduced: fasterCopy.
>
> Can you please help me understand what is the difference between the
> option objectReuse vs fasterCopy?
>
> Thanks a lot!
> Eleanore



Re: Beam 2.28.0 objectReuse and fasterCopy for FlinkPipelineOption

2021-04-12 Thread Eleanore Jin
Hi Jan,

Thanks a lot for the reply! This helps. I wonder if you have any idea
what the difference is between the fasterCopy and objectReuse options?

Eleanore

On Fri, Apr 9, 2021 at 11:53 AM Jan Lukavský wrote:

> Hi Eleanore,
>
> the --fasterCopy option disables cloning between operators (see [1]). It
> should be safe to use, unless your pipeline outputs an object and
> later modifies the same instance. This is generally not supported by the
> Beam model and is considered to be a user error. FlinkRunner
> historically took a "better-safe-than-sorry" approach and
> explicitly cloned every received object between (non-shuffle) operators.
> Enabling this option should increase performance. You can verify that your
> pipeline is not doing any disallowed mutations using DirectRunner, which
> checks this by default (unless you pass --enforceImmutability=false).
>
>   Jan
>
> [1] https://issues.apache.org/jira/browse/BEAM-11146
>
> On 4/9/21 7:57 AM, Eleanore Jin wrote:
> > Hi community,
> >
> > I am upgrading from Beam 2.23.0 -> 2.28.0, and a new
> > FlinkPipelineOption is introduced: fasterCopy.
> >
> > Can you please help me understand what is the difference between the
> > option objectReuse vs fasterCopy?
> >
> > Thanks a lot!
> > Eleanore
>


Re: Beam 2.28.0 objectReuse and fasterCopy for FlinkPipelineOption

2021-04-09 Thread Jan Lukavský

Hi Eleanore,

the --fasterCopy option disables cloning between operators (see [1]). It
should be safe to use, unless your pipeline outputs an object and
later modifies the same instance. This is generally not supported by the
Beam model and is considered to be a user error. FlinkRunner
historically took a "better-safe-than-sorry" approach and
explicitly cloned every received object between (non-shuffle) operators.
Enabling this option should increase performance. You can verify that your
pipeline is not doing any disallowed mutations using DirectRunner, which
checks this by default (unless you pass --enforceImmutability=false).
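
To make the disallowed pattern concrete, here is a minimal sketch in plain Java. It is a toy model, not Beam or Flink code: the Event class, the run method and the buffer standing in for a downstream operator are all made up for illustration. It shows an upstream step that outputs an object and then modifies the same instance, with and without a defensive clone at the hand-off.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a hand-off between two operators; not Beam or Flink code. */
public class FasterCopyHazard {

    /** Hypothetical mutable element type. */
    static final class Event {
        String label;

        Event(String label) {
            this.label = label;
        }

        Event copy() {
            return new Event(label);
        }
    }

    /**
     * The upstream step outputs an element and then mutates the same instance.
     * Returns the labels that the downstream step sees in the elements it kept.
     */
    static List<String> run(boolean cloneBetweenOperators) {
        // Stand-in for a downstream operator that keeps a reference to its input.
        List<Event> downstreamBuffer = new ArrayList<>();

        Event event = new Event("original");
        // Hand-off: either a defensive clone or the very same instance.
        downstreamBuffer.add(cloneBetweenOperators ? event.copy() : event);

        // The disallowed pattern: modifying an element after it was output.
        event.label = "mutated";

        List<String> seen = new ArrayList<>();
        for (Event received : downstreamBuffer) {
            seen.add(received.label);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("with clone:    " + run(true));   // prints [original]
        System.out.println("without clone: " + run(false));  // prints [mutated]
    }
}
```

With the clone, the late mutation is invisible downstream; without it, the downstream copy changes too. A pipeline that never modifies an element after outputting it behaves the same in both cases.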


 Jan

[1] https://issues.apache.org/jira/browse/BEAM-11146

On 4/9/21 7:57 AM, Eleanore Jin wrote:

Hi community,

I am upgrading from Beam 2.23.0 -> 2.28.0, and a new 
FlinkPipelineOption is introduced: fasterCopy.


Can you please help me understand what is the difference between the 
option objectReuse vs fasterCopy?


Thanks a lot!
Eleanore


Beam 2.28.0 objectReuse and fasterCopy for FlinkPipelineOption

2021-04-08 Thread Eleanore Jin
Hi community,

I am upgrading from Beam 2.23.0 -> 2.28.0, and a new FlinkPipelineOption is
introduced: fasterCopy.

Can you please help me understand what is the difference between the option
objectReuse vs fasterCopy?

Thanks a lot!
Eleanore