Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-07 Thread Yong Fang
Hi Ken,

I think the main reason is that Kryo is currently the only generic
serializer in Flink. I'm looking forward to your FLIP on Fury, and we can
continue to discuss this issue there.

If there are no other questions, I will close the voting for this FLIP.
Thank you again.

Best,
Fang Yong

On Sat, Jan 6, 2024 at 2:27 AM Ken Krugler 
wrote:

> Hi Fang Yong,
>
> Thanks for the response, and I understand the desire to limit the impact
> of this FLIP.
>
> I guess I should spend the time to start a new FLIP on switching to Fury,
> which could include cleaning up method names.
>
> In the context of “facilitate user understanding”, one aspect of this
> cleanup is the current ExecutionConfig.enable/disable/hasGenericTypes()
> methods.
>
> These are inconsistent with the current xxxKryo() methods, and cause
> confusion whenever I’m teaching a Flink course :)
>
> Regards,
>
> — Ken
>
>
>
>
> On Jan 4, 2024, at 6:40 PM, Yong Fang  wrote:
>
> Hi Ken,
>
> Sorry for the late reply. After discussing with @Xintong, we think it is
> better to keep the method names in the FLIP mainly for the following
> reasons:
>
> 1. This FLIP is mainly to support the configurable serializer while
> keeping consistent with Flink at the semantic layer. Keeping the existing
> naming rules can facilitate user understanding.
>
> 2. In the future, if Flink can choose Fury as the generic serializer, we
> can update the corresponding methods in that FLIP after the discussion of
> Fury is completed. This will be a minor modification, and we can avoid
> over-design in the current FLIP.
>
> Thanks for your feedback!
>
> Best,
> Fang Yong
>
> On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler 
> wrote:
>
>> Hi Xintong,
>>
>> I agree that decoupling from Kryo is a bigger topic, well beyond the
>> scope of this FLIP.
>>
>> The reason I’d brought up Fury is that this increases my confidence that
>> Flink will want to decouple from Kryo sooner rather than later.
>>
>> So I feel it would be worth investing in a (minor) name change now, to
>> improve that migration path in the future. Thus my suggestion for avoiding
>> the explicit use of Kryo in method names.
>>
>> Regards,
>>
>> — Ken
>>
>>
>>
>>
>> > On Dec 17, 2023, at 7:16 PM, Xintong Song wrote:
>> >
>> > Hi Ken,
>> >
>> > I think the main purpose of this FLIP is to change how users interact with
>> > the knobs for customizing the serialization behaviors, from requiring code
>> > changes to working with pure configurations. Redesigning the knobs (i.e.,
>> > names, semantics, etc.), on the other hand, is not the purpose of this
>> > FLIP. Preserving the existing names and semantics should also help minimize
>> > the migration cost for existing users. Therefore, I'm in favor of not
>> > changing them.
>> >
>> > Concerning decoupling from Kryo, and introducing other serialization
>> > frameworks like Fury, I think that's a bigger topic that is worth further
>> > discussion. At the moment, I'm not aware of any community consensus on
>> > doing so. And even if in the future we decide to do so, the changes needed
>> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
>> > FLIP on these issues.
>> >
>> > WDYT?
>> >
>> > Best,
>> >
>> > Xintong
>> >
>> >
>> >
>> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler <kkrugler_li...@transpac.com>
>> > wrote:
>> >
>> >> Hi Yong,
>> >>
>> >> Looks good, thanks for creating this.
>> >>
>> >> One comment - related to my recent email about Fury, I would love to see
>> >> the v2 serialization decoupled from Kryo.
>> >>
>> >> As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
>> >>
>> >> A more extreme change would be to totally rely on Fury (so no more POJO
>> >> serializer). Fury is faster than the POJO serializer in my tests, but this
>> >> would be a much bigger change.
>> >>
>> >> Though it could dramatically simplify the Flink serialization support.
>> >>
>> >> — Ken
>> >>
>> >> PS - a separate issue is how to migrate state from Kryo to something like
>> >> Fury, which supports schema evolution. I think this might be possible, by
>> >> having a smarter deserializer that identifies state as being created by
>> >> Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
>> >>
>> >>> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
>> >>>
>> >>> Hi devs,
>> >>>
>> >>> I'd like to start a discussion about FLIP-398: Improve Serialization
>> >>> Configuration And Usage In Flink [1].
>> >>>
>> >>> Currently, users can register custom data types and serializers in Flink
>> >>> jobs through various methods, including registration in code,
>> >>> configuration, and annotations. These lead to difficulties in upgrading
>> >>> Flink jobs and priority issues.
>> >>>
>> >>> In flink-2.0 we would like to manage job data types and serializers through
>> >>> configurations. This FLIP will introduce a unified option for data type and
>> >>> serializer and users 

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-05 Thread Ken Krugler
Hi Fang Yong,

Thanks for the response, and I understand the desire to limit the impact of 
this FLIP.

I guess I should spend the time to start a new FLIP on switching to Fury, which 
could include cleaning up method names.

In the context of “facilitate user understanding”, one aspect of this cleanup 
is the current ExecutionConfig.enable/disable/hasGenericTypes() methods.

These are inconsistent with the current xxxKryo() methods, and cause confusion 
whenever I’m teaching a Flink course :)

Regards,

— Ken




> On Jan 4, 2024, at 6:40 PM, Yong Fang  wrote:
> 
> Hi Ken,
> 
> Sorry for the late reply. After discussing with @Xintong, we think it is 
> better to keep the method names in the FLIP mainly for the following reasons:
> 
> 1. This FLIP is mainly to support the configurable serializer while keeping 
> consistent with Flink at the semantic layer. Keeping the existing naming 
> rules can facilitate user understanding. 
> 
> 2. In the future, if Flink can choose Fury as the generic serializer, we can 
> update the corresponding methods in that FLIP after the discussion of Fury is 
> completed. This will be a minor modification, and we can avoid over-design in 
> the current FLIP.
> 
> Thanks for your feedback!
> 
> Best,
> Fang Yong
> 
> On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler wrote:
>> Hi Xintong,
>> 
>> I agree that decoupling from Kryo is a bigger topic, well beyond the scope 
>> of this FLIP.
>> 
>> The reason I’d brought up Fury is that this increases my confidence that 
>> Flink will want to decouple from Kryo sooner rather than later.
>> 
>> So I feel it would be worth investing in a (minor) name change now, to 
>> improve that migration path in the future. Thus my suggestion for avoiding 
>> the explicit use of Kryo in method names.
>> 
>> Regards,
>> 
>> — Ken
>> 
>> 
>> 
>> 
>> > On Dec 17, 2023, at 7:16 PM, Xintong Song wrote:
>> > 
>> > Hi Ken,
>> > 
>> > I think the main purpose of this FLIP is to change how users interact with
>> > the knobs for customizing the serialization behaviors, from requiring code
>> > changes to working with pure configurations. Redesigning the knobs (i.e.,
>> > names, semantics, etc.), on the other hand, is not the purpose of this
>> > FLIP. Preserving the existing names and semantics should also help minimize
>> > the migration cost for existing users. Therefore, I'm in favor of not
>> > changing them.
>> > 
>> > Concerning decoupling from Kryo, and introducing other serialization
>> > frameworks like Fury, I think that's a bigger topic that is worth further
>> > discussion. At the moment, I'm not aware of any community consensus on
>> > doing so. And even if in the future we decide to do so, the changes needed
>> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
>> > FLIP on these issues.
>> > 
>> > WDYT?
>> > 
>> > Best,
>> > 
>> > Xintong
>> > 
>> > 
>> > 
>> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler wrote:
>> > 
>> >> Hi Yong,
>> >> 
>> >> Looks good, thanks for creating this.
>> >> 
>> >> One comment - related to my recent email about Fury, I would love to see
>> >> the v2 serialization decoupled from Kryo.
>> >> 
>> >> As part of that, instead of using xxxKryo in methods, call them 
>> >> xxxGeneric.
>> >> 
>> >> A more extreme change would be to totally rely on Fury (so no more POJO
>> >> serializer). Fury is faster than the POJO serializer in my tests, but this
>> >> would be a much bigger change.
>> >> 
>> >> Though it could dramatically simplify the Flink serialization support.
>> >> 
>> >> — Ken
>> >> 
>> >> PS - a separate issue is how to migrate state from Kryo to something like
>> >> Fury, which supports schema evolution. I think this might be possible, by
>> >> having a smarter deserializer that identifies state as being created by
>> >> Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
>> >> 
>> >>> On Dec 6, 2023, at 6:35 PM, Yong Fang wrote:
>> >>> 
>> >>> Hi devs,
>> >>> 
>> >>> I'd like to start a discussion about FLIP-398: Improve Serialization
>> >>> Configuration And Usage In Flink [1].
>> >>> 
>> >>> Currently, users can register custom data types and serializers in Flink
>> >>> jobs through various methods, including registration in code,
>> >>> configuration, and annotations. These lead to difficulties in upgrading
>> >>> Flink jobs and priority issues.
>> >>> 
>> >>> In flink-2.0 we would like to manage job data types and serializers through
>> >>> configurations. This FLIP will introduce a unified option for data type and
>> >>> serializer and users can configure all custom data types and
>> >>> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
>> >>> serializers for complex data types such as List and Map, and optimize the
>> >>> management of Avro Serializers.
>> >>> 
>> 

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-04 Thread Yong Fang
Hi Ken,

Sorry for the late reply. After discussing with @Xintong, we think it is
better to keep the method names in the FLIP mainly for the following
reasons:

1. This FLIP mainly aims to support configurable serializers while staying
consistent with Flink's existing semantics. Keeping the existing naming
rules makes it easier for users to understand.

2. In the future, if Flink can choose Fury as the generic serializer, we
can update the corresponding methods in that FLIP after the discussion of
Fury is completed. This will be a minor modification, and we can avoid
over-design in the current FLIP.

Thanks for your feedback!

Best,
Fang Yong

On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler 
wrote:

> Hi Xintong,
>
> I agree that decoupling from Kryo is a bigger topic, well beyond the scope
> of this FLIP.
>
> The reason I’d brought up Fury is that this increases my confidence that
> Flink will want to decouple from Kryo sooner rather than later.
>
> So I feel it would be worth investing in a (minor) name change now, to
> improve that migration path in the future. Thus my suggestion for avoiding
> the explicit use of Kryo in method names.
>
> Regards,
>
> — Ken
>
>
>
>
> > On Dec 17, 2023, at 7:16 PM, Xintong Song  wrote:
> >
> > Hi Ken,
> >
> > I think the main purpose of this FLIP is to change how users interact with
> > the knobs for customizing the serialization behaviors, from requiring code
> > changes to working with pure configurations. Redesigning the knobs (i.e.,
> > names, semantics, etc.), on the other hand, is not the purpose of this
> > FLIP. Preserving the existing names and semantics should also help minimize
> > the migration cost for existing users. Therefore, I'm in favor of not
> > changing them.
> >
> > Concerning decoupling from Kryo, and introducing other serialization
> > frameworks like Fury, I think that's a bigger topic that is worth further
> > discussion. At the moment, I'm not aware of any community consensus on
> > doing so. And even if in the future we decide to do so, the changes needed
> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> > FLIP on these issues.
> >
> > WDYT?
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler wrote:
> >
> >> Hi Yong,
> >>
> >> Looks good, thanks for creating this.
> >>
> >> One comment - related to my recent email about Fury, I would love to see
> >> the v2 serialization decoupled from Kryo.
> >>
> >> As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
> >>
> >> A more extreme change would be to totally rely on Fury (so no more POJO
> >> serializer). Fury is faster than the POJO serializer in my tests, but this
> >> would be a much bigger change.
> >>
> >> Though it could dramatically simplify the Flink serialization support.
> >>
> >> — Ken
> >>
> >> PS - a separate issue is how to migrate state from Kryo to something like
> >> Fury, which supports schema evolution. I think this might be possible, by
> >> having a smarter deserializer that identifies state as being created by
> >> Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
> >>
> >>> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> >>>
> >>> Hi devs,
> >>>
> >>> I'd like to start a discussion about FLIP-398: Improve Serialization
> >>> Configuration And Usage In Flink [1].
> >>>
> >>> Currently, users can register custom data types and serializers in Flink
> >>> jobs through various methods, including registration in code,
> >>> configuration, and annotations. These lead to difficulties in upgrading
> >>> Flink jobs and priority issues.
> >>>
> >>> In flink-2.0 we would like to manage job data types and serializers through
> >>> configurations. This FLIP will introduce a unified option for data type and
> >>> serializer and users can configure all custom data types and
> >>> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> >>> serializers for complex data types such as List and Map, and optimize the
> >>> management of Avro Serializers.
> >>>
> >>> Looking forward to hearing from you, thanks!
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> >>>
> >>> Best,
> >>> Fang Yong
> >>
> >> --
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> Custom big data solutions
> >> Flink & Pinot
> >>
> >>
> >>
> >>
>
>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-28 Thread Ken Krugler
Hi Xintong,

I agree that decoupling from Kryo is a bigger topic, well beyond the scope of 
this FLIP.

The reason I’d brought up Fury is that this increases my confidence that Flink 
will want to decouple from Kryo sooner rather than later.

So I feel it would be worth investing in a (minor) name change now, to improve 
that migration path in the future. Thus my suggestion for avoiding the explicit 
use of Kryo in method names.

Regards,

— Ken




> On Dec 17, 2023, at 7:16 PM, Xintong Song  wrote:
> 
> Hi Ken,
> 
> I think the main purpose of this FLIP is to change how users interact with
> the knobs for customizing the serialization behaviors, from requiring code
> changes to working with pure configurations. Redesigning the knobs (i.e.,
> names, semantics, etc.), on the other hand, is not the purpose of this
> FLIP. Preserving the existing names and semantics should also help minimize
> the migration cost for existing users. Therefore, I'm in favor of not
> changing them.
> 
> Concerning decoupling from Kryo, and introducing other serialization
> frameworks like Fury, I think that's a bigger topic that is worth further
> discussion. At the moment, I'm not aware of any community consensus on
> doing so. And even if in the future we decide to do so, the changes needed
> should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> FLIP on these issues.
> 
> WDYT?
> 
> Best,
> 
> Xintong
> 
> 
> 
> On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
> wrote:
> 
>> Hi Yong,
>> 
>> Looks good, thanks for creating this.
>> 
>> One comment - related to my recent email about Fury, I would love to see
>> the v2 serialization decoupled from Kryo.
>> 
>> As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
>> 
>> A more extreme change would be to totally rely on Fury (so no more POJO
>> serializer). Fury is faster than the POJO serializer in my tests, but this
>> would be a much bigger change.
>> 
>> Though it could dramatically simplify the Flink serialization support.
>> 
>> — Ken
>> 
>> PS - a separate issue is how to migrate state from Kryo to something like
>> Fury, which supports schema evolution. I think this might be possible, by
>> having a smarter deserializer that identifies state as being created by
>> Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
>> 
>>> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
>>> 
>>> Hi devs,
>>> 
>>> I'd like to start a discussion about FLIP-398: Improve Serialization
>>> Configuration And Usage In Flink [1].
>>> 
>>> Currently, users can register custom data types and serializers in Flink
>>> jobs through various methods, including registration in code,
>>> configuration, and annotations. These lead to difficulties in upgrading
>>> Flink jobs and priority issues.
>>> 
>>> In flink-2.0 we would like to manage job data types and serializers through
>>> configurations. This FLIP will introduce a unified option for data type and
>>> serializer and users can configure all custom data types and
>>> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
>>> serializers for complex data types such as List and Map, and optimize the
>>> management of Avro Serializers.
>>> 
>>> Looking forward to hearing from you, thanks!
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
>>> 
>>> Best,
>>> Fang Yong
>> 
>> --
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink & Pinot
>> 
>> 
>> 
>> 



--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink & Pinot





Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-24 Thread Yong Fang
Hi devs,

Thanks for all the feedback. If there are no more comments, I would like to
start a vote for this FLIP, thanks again!

Best,
Fang Yong

On Wed, Dec 20, 2023 at 9:12 PM Yong Fang  wrote:

> Hi Ken,
>
> Thanks for your feedback. The purpose of this FLIP is to improve the use
> of serialization, including configurable serializer for users, providing
> serializer for composite data types, and resolving the default enabling of
> Kryo, etc. Introducing a better serialization framework would be a great
> help for Flink's performance, and it's great to see your tests on Fury.
> However, as @Xintong mentioned, this could be a huge work and beyond the
> scope of this FLIP. If you're interested, I think we could create a new
> FLIP for it and discuss it further. What do you think? Thanks.
>
> Best,
> Fang Yong
>
> On Mon, Dec 18, 2023 at 11:16 AM Xintong Song 
> wrote:
>
>> Hi Ken,
>>
>> I think the main purpose of this FLIP is to change how users interact with
>> the knobs for customizing the serialization behaviors, from requiring code
>> changes to working with pure configurations. Redesigning the knobs (i.e.,
>> names, semantics, etc.), on the other hand, is not the purpose of this
>> FLIP. Preserving the existing names and semantics should also help minimize
>> the migration cost for existing users. Therefore, I'm in favor of not
>> changing them.
>>
>> Concerning decoupling from Kryo, and introducing other serialization
>> frameworks like Fury, I think that's a bigger topic that is worth further
>> discussion. At the moment, I'm not aware of any community consensus on
>> doing so. And even if in the future we decide to do so, the changes needed
>> should be the same w/ or w/o this FLIP. So I'd suggest not to block this
>> FLIP on these issues.
>>
>> WDYT?
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
>> wrote:
>>
>> > Hi Yong,
>> >
>> > Looks good, thanks for creating this.
>> >
>> > One comment - related to my recent email about Fury, I would love to see
>> > the v2 serialization decoupled from Kryo.
>> >
>> > As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
>> >
>> > A more extreme change would be to totally rely on Fury (so no more POJO
>> > serializer). Fury is faster than the POJO serializer in my tests, but this
>> > would be a much bigger change.
>> >
>> > Though it could dramatically simplify the Flink serialization support.
>> >
>> > — Ken
>> >
>> > PS - a separate issue is how to migrate state from Kryo to something like
>> > Fury, which supports schema evolution. I think this might be possible, by
>> > having a smarter deserializer that identifies state as being created by
>> > Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
>> >
>> > > On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
>> > >
>> > > Hi devs,
>> > >
>> > > I'd like to start a discussion about FLIP-398: Improve Serialization
>> > > Configuration And Usage In Flink [1].
>> > >
>> > > Currently, users can register custom data types and serializers in Flink
>> > > jobs through various methods, including registration in code,
>> > > configuration, and annotations. These lead to difficulties in upgrading
>> > > Flink jobs and priority issues.
>> > >
>> > > In flink-2.0 we would like to manage job data types and serializers through
>> > > configurations. This FLIP will introduce a unified option for data type and
>> > > serializer and users can configure all custom data types and
>> > > pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
>> > > serializers for complex data types such as List and Map, and optimize the
>> > > management of Avro Serializers.
>> > >
>> > > Looking forward to hearing from you, thanks!
>> > >
>> > > [1]
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
>> > >
>> > > Best,
>> > > Fang Yong
>> >
>> > --
>> > Ken Krugler
>> > http://www.scaleunlimited.com
>> > Custom big data solutions
>> > Flink & Pinot
>> >
>> >
>> >
>> >
>>
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-20 Thread Yong Fang
Hi Ken,

Thanks for your feedback. The purpose of this FLIP is to improve the use of
serialization, including making serializers configurable for users, providing
serializers for composite data types, and addressing Kryo being enabled by
default. Introducing a better serialization framework would be a great
help for Flink's performance, and it's great to see your tests on Fury.
However, as @Xintong mentioned, this could be a huge amount of work and is
beyond the scope of this FLIP. If you're interested, I think we could create
a new FLIP for it and discuss it further. What do you think? Thanks.

Best,
Fang Yong

On Mon, Dec 18, 2023 at 11:16 AM Xintong Song  wrote:

> Hi Ken,
>
> I think the main purpose of this FLIP is to change how users interact with
> the knobs for customizing the serialization behaviors, from requiring code
> changes to working with pure configurations. Redesigning the knobs (i.e.,
> names, semantics, etc.), on the other hand, is not the purpose of this
> FLIP. Preserving the existing names and semantics should also help minimize
> the migration cost for existing users. Therefore, I'm in favor of not
> changing them.
>
> Concerning decoupling from Kryo, and introducing other serialization
> frameworks like Fury, I think that's a bigger topic that is worth further
> discussion. At the moment, I'm not aware of any community consensus on
> doing so. And even if in the future we decide to do so, the changes needed
> should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> FLIP on these issues.
>
> WDYT?
>
> Best,
>
> Xintong
>
>
>
> On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
> wrote:
>
> > Hi Yong,
> >
> > Looks good, thanks for creating this.
> >
> > One comment - related to my recent email about Fury, I would love to see
> > the v2 serialization decoupled from Kryo.
> >
> > As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
> >
> > A more extreme change would be to totally rely on Fury (so no more POJO
> > serializer). Fury is faster than the POJO serializer in my tests, but this
> > would be a much bigger change.
> >
> > Though it could dramatically simplify the Flink serialization support.
> >
> > — Ken
> >
> > PS - a separate issue is how to migrate state from Kryo to something like
> > Fury, which supports schema evolution. I think this might be possible, by
> > having a smarter deserializer that identifies state as being created by
> > Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
> >
> > > On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> > >
> > > Hi devs,
> > >
> > > I'd like to start a discussion about FLIP-398: Improve Serialization
> > > Configuration And Usage In Flink [1].
> > >
> > > Currently, users can register custom data types and serializers in Flink
> > > jobs through various methods, including registration in code,
> > > configuration, and annotations. These lead to difficulties in upgrading
> > > Flink jobs and priority issues.
> > >
> > > In flink-2.0 we would like to manage job data types and serializers through
> > > configurations. This FLIP will introduce a unified option for data type and
> > > serializer and users can configure all custom data types and
> > > pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> > > serializers for complex data types such as List and Map, and optimize the
> > > management of Avro Serializers.
> > >
> > > Looking forward to hearing from you, thanks!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> > >
> > > Best,
> > > Fang Yong
> >
> > --
> > Ken Krugler
> > http://www.scaleunlimited.com
> > Custom big data solutions
> > Flink & Pinot
> >
> >
> >
> >
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-17 Thread Xintong Song
Hi Ken,

I think the main purpose of this FLIP is to change how users interact with
the knobs for customizing the serialization behaviors, from requiring code
changes to working with pure configurations. Redesigning the knobs (i.e.,
names, semantics, etc.), on the other hand, is not the purpose of this
FLIP. Preserving the existing names and semantics should also help minimize
the migration cost for existing users. Therefore, I'm in favor of not
changing them.

Concerning decoupling from Kryo, and introducing other serialization
frameworks like Fury, I think that's a bigger topic that is worth further
discussion. At the moment, I'm not aware of any community consensus on
doing so. And even if in the future we decide to do so, the changes needed
should be the same w/ or w/o this FLIP. So I'd suggest not to block this
FLIP on these issues.

WDYT?

Best,

Xintong



On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
wrote:

> Hi Yong,
>
> Looks good, thanks for creating this.
>
> One comment - related to my recent email about Fury, I would love to see
> the v2 serialization decoupled from Kryo.
>
> As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
>
> A more extreme change would be to totally rely on Fury (so no more POJO
> serializer). Fury is faster than the POJO serializer in my tests, but this
> would be a much bigger change.
>
> Though it could dramatically simplify the Flink serialization support.
>
> — Ken
>
> PS - a separate issue is how to migrate state from Kryo to something like
> Fury, which supports schema evolution. I think this might be possible, by
> having a smarter deserializer that identifies state as being created by
> Kryo, and using (shaded) Kryo to deserialize, while still writing as Fury.
>
> > On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> >
> > Hi devs,
> >
> > I'd like to start a discussion about FLIP-398: Improve Serialization
> > Configuration And Usage In Flink [1].
> >
> > Currently, users can register custom data types and serializers in Flink
> > jobs through various methods, including registration in code,
> > configuration, and annotations. These lead to difficulties in upgrading
> > Flink jobs and priority issues.
> >
> > In flink-2.0 we would like to manage job data types and serializers through
> > configurations. This FLIP will introduce a unified option for data type and
> > serializer and users can configure all custom data types and
> > pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> > serializers for complex data types such as List and Map, and optimize the
> > management of Avro Serializers.
> >
> > Looking forward to hearing from you, thanks!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> >
> > Best,
> > Fang Yong
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-14 Thread Ken Krugler
Hi Yong,

Looks good, thanks for creating this.

One comment - related to my recent email about Fury, I would love to see the v2 
serialization decoupled from Kryo.

As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
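
To make the naming suggestion concrete, here is a rough sketch of what framework-neutral names could look like. Flink's ExecutionConfig currently has Kryo-named methods such as registerKryoType() and enableForceKryo(); everything below (GenericSerializerSettings, InMemorySettings, the xxxGeneric method names) is hypothetical and is not part of Flink or of this FLIP.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class GenericSerializerNaming {

    /** Hypothetical framework-neutral settings; these names are NOT Flink API. */
    public interface GenericSerializerSettings {
        void registerGenericType(Class<?> type);      // instead of a registerKryoType-style name
        void enableForceGeneric();                    // instead of an enableForceKryo-style name
        boolean isForceGenericEnabled();
        Set<Class<?>> getRegisteredGenericTypes();
    }

    /** Toy implementation: the backend is recorded but never appears in a method name. */
    public static final class InMemorySettings implements GenericSerializerSettings {
        private final String backend;
        private final Set<Class<?>> types = new LinkedHashSet<>();
        private boolean forceGeneric;

        public InMemorySettings(String backend) {
            this.backend = backend;
        }

        public String backend() {
            return backend;
        }

        @Override
        public void registerGenericType(Class<?> type) {
            types.add(type);
        }

        @Override
        public void enableForceGeneric() {
            forceGeneric = true;
        }

        @Override
        public boolean isForceGenericEnabled() {
            return forceGeneric;
        }

        @Override
        public Set<Class<?>> getRegisteredGenericTypes() {
            return Collections.unmodifiableSet(types);
        }
    }

    public static void main(String[] args) {
        // Swapping "kryo" for another backend would not change any caller code.
        GenericSerializerSettings settings = new InMemorySettings("kryo");
        settings.registerGenericType(java.math.BigInteger.class);
        settings.enableForceGeneric();
        System.out.println(settings.getRegisteredGenericTypes()
                + " force=" + settings.isForceGenericEnabled());
    }
}
```

The point is only that user code written against the neutral names would not need to change if the generic serializer were replaced later.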

A more extreme change would be to totally rely on Fury (so no more POJO 
serializer). Fury is faster than the POJO serializer in my tests, but this 
would be a much bigger change.

Though it could dramatically simplify the Flink serialization support.

— Ken

PS - a separate issue is how to migrate state from Kryo to something like Fury, 
which supports schema evolution. I think this might be possible, by having a 
smarter deserializer that identifies state as being created by Kryo, and using 
(shaded) Kryo to deserialize, while still writing as Fury.
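
A minimal sketch of that "smarter deserializer" idea, under loud assumptions: it assumes each state blob carries a one-byte format tag, which is not how existing Kryo-written state looks (how such state would actually be identified is an open question). The Codec interface, the tags, and the charset-based stand-ins for the Kryo and Fury codecs are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MigratingStateSerializer {

    /** Hypothetical codec contract; stands in for a Kryo- or Fury-backed serializer. */
    public interface Codec {
        byte[] encode(String value);
        String decode(byte[] payload);
    }

    /** Stand-in codec so the sketch runs without Kryo or Fury on the classpath. */
    public static final class CharsetCodec implements Codec {
        private final Charset charset;

        public CharsetCodec(Charset charset) {
            this.charset = charset;
        }

        @Override
        public byte[] encode(String value) {
            return value.getBytes(charset);
        }

        @Override
        public String decode(byte[] payload) {
            return new String(payload, charset);
        }
    }

    // Hypothetical format tags written in front of every payload.
    public static final byte LEGACY_TAG = 0x01;   // written by the old (Kryo-like) codec
    public static final byte CURRENT_TAG = 0x02;  // written by the new (Fury-like) codec

    public static final Codec LEGACY_CODEC = new CharsetCodec(StandardCharsets.UTF_16BE);
    public static final Codec CURRENT_CODEC = new CharsetCodec(StandardCharsets.UTF_8);

    private final Codec legacy;
    private final Codec current;

    public MigratingStateSerializer(Codec legacy, Codec current) {
        this.legacy = legacy;
        this.current = current;
    }

    /** Prefixes a payload with its format tag. */
    public static byte[] tag(byte formatTag, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = formatTag;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Always writes in the new format. */
    public byte[] serialize(String value) {
        return tag(CURRENT_TAG, current.encode(value));
    }

    /** Reads either format, choosing the codec from the leading tag. */
    public String deserialize(byte[] bytes) {
        if (bytes.length == 0) {
            throw new IllegalArgumentException("Empty state");
        }
        byte[] body = Arrays.copyOfRange(bytes, 1, bytes.length);
        switch (bytes[0]) {
            case LEGACY_TAG:
                return legacy.decode(body);
            case CURRENT_TAG:
                return current.decode(body);
            default:
                throw new IllegalArgumentException("Unknown format tag: " + bytes[0]);
        }
    }

    public static void main(String[] args) {
        MigratingStateSerializer serializer =
                new MigratingStateSerializer(LEGACY_CODEC, CURRENT_CODEC);
        byte[] oldState = tag(LEGACY_TAG, LEGACY_CODEC.encode("hello"));
        String value = serializer.deserialize(oldState);   // read via the legacy codec
        byte[] newState = serializer.serialize(value);     // rewritten in the new format
        System.out.println(value + " -> tag " + newState[0]);
    }
}
```

With this shape, old state is upgraded lazily: it is read with the legacy codec once and written back in the new format the next time it is serialized.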

> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> 
> Hi devs,
> 
> I'd like to start a discussion about FLIP-398: Improve Serialization
> Configuration And Usage In Flink [1].
> 
> Currently, users can register custom data types and serializers in Flink
> jobs through various methods, including registration in code,
> configuration, and annotations. These lead to difficulties in upgrading
> Flink jobs and priority issues.
> 
> In flink-2.0 we would like to manage job data types and serializers through
> configurations. This FLIP will introduce a unified option for data type and
> serializer and users can configure all custom data types and
> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> serializers for complex data types such as List and Map, and optimize the
> management of Avro Serializers.
> 
> Looking forward to hearing from you, thanks!
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> 
> Best,
> Fang Yong

--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink & Pinot





Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-11 Thread Zhu Zhu
Thanks Yong for creating the FLIP and starting the discussion.

Having a unified config option for all kinds of serializers can make it
much easier for users to set, track and maintain the serializers for jobs.

+1

Thanks,
Zhu

On Tue, Dec 12, 2023 at 2:01 PM weijie guo wrote:

> Thanks for driving this, Yong.
>
> This FLIP looks good to me overall.
>
> I also used a similar approach to decouple the serializer from
> ExecutionConfig in the PoC of DataStream API V2.
>
> I'm +1 for this.
>
> Best regards,
>
> Weijie
>
>
> On Tue, Dec 12, 2023 at 12:10 PM Xintong Song wrote:
>
> > Thanks for working on this, Yong.
> >
> > The usability of the serialization mechanism has indeed been a pain for a
> > long time.
> >
> > The proposed changes look good to me. +1 from my side.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Dec 7, 2023 at 10:36 AM Yong Fang  wrote:
> >
> > > Hi devs,
> > >
> > > I'd like to start a discussion about FLIP-398: Improve Serialization
> > > Configuration And Usage In Flink [1].
> > >
> > > Currently, users can register custom data types and serializers in Flink
> > > jobs through various methods, including registration in code,
> > > configuration, and annotations. These lead to difficulties in upgrading
> > > Flink jobs and priority issues.
> > >
> > > In flink-2.0 we would like to manage job data types and serializers through
> > > configurations. This FLIP will introduce a unified option for data type and
> > > serializer and users can configure all custom data types and
> > > pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> > > serializers for complex data types such as List and Map, and optimize the
> > > management of Avro Serializers.
> > >
> > > Looking forward to hearing from you, thanks!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> > >
> > > Best,
> > > Fang Yong
> > >
> >
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-11 Thread weijie guo
Thanks for driving this, Yong.

This FLIP looks good to me overall.

I also used a similar approach to decouple the serializer from
ExecutionConfig in the PoC of DataStream API V2.

I'm +1 for this.

Best regards,

Weijie


On Tue, Dec 12, 2023 at 12:10 PM Xintong Song wrote:

> Thanks for working on this, Yong.
>
> The usability of the serialization mechanism has indeed been a pain for a
> long time.
>
> The proposed changes look good to me. +1 from my side.
>
> Best,
>
> Xintong
>
>
>
> On Thu, Dec 7, 2023 at 10:36 AM Yong Fang  wrote:
>
> > Hi devs,
> >
> > I'd like to start a discussion about FLIP-398: Improve Serialization
> > Configuration And Usage In Flink [1].
> >
> > Currently, users can register custom data types and serializers in Flink
> > jobs through various methods, including registration in code,
> > configuration, and annotations. These lead to difficulties in upgrading
> > Flink jobs and priority issues.
> >
> > In flink-2.0 we would like to manage job data types and serializers through
> > configurations. This FLIP will introduce a unified option for data type and
> > serializer and users can configure all custom data types and
> > pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> > serializers for complex data types such as List and Map, and optimize the
> > management of Avro Serializers.
> >
> > Looking forward to hearing from you, thanks!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> >
> > Best,
> > Fang Yong
> >
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-11 Thread Xintong Song
Thanks for working on this, Yong.

The usability of the serialization mechanism has indeed been a pain for a
long time.

The proposed changes look good to me. +1 from my side.

Best,

Xintong



On Thu, Dec 7, 2023 at 10:36 AM Yong Fang  wrote:

> Hi devs,
>
> I'd like to start a discussion about FLIP-398: Improve Serialization
> Configuration And Usage In Flink [1].
>
> Currently, users can register custom data types and serializers in Flink
> jobs through various methods, including registration in code,
> configuration, and annotations. These lead to difficulties in upgrading
> Flink jobs and priority issues.
>
> In flink-2.0 we would like to manage job data types and serializers through
> configurations. This FLIP will introduce a unified option for data type and
> serializer and users can configure all custom data types and
> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> serializers for complex data types such as List and Map, and optimize the
> management of Avro Serializers.
>
> Looking forward to hearing from you, thanks!
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
>
> Best,
> Fang Yong
>