Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-09 Thread Ahmed Hamdy
Hi Danny,
+1 (non-binding)
Best Regards
Ahmed Hamdy


On Tue, 9 Jan 2024 at 11:59, Xingbo Huang wrote:



Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-09 Thread Xingbo Huang
Hi Danny,

+1

Thanks a lot for investigating this. Let me share the current code
management and release situation of pyflink here. I hope it will be helpful
to you.

Since Flink 1.13, release managers need to release two Python packages to
PyPI, apache-flink [1] and apache-flink-libraries [2]. apache-flink contains
all the PyFlink Python code, and apache-flink-libraries contains the jar
packages corresponding to the Flink binary distribution. The reason the
content of apache-flink-libraries is not put into apache-flink is that,
starting from Flink 1.11, PyFlink provides wheel packages for different
Python versions and platforms. If every wheel package contained these jar
packages, the size of apache-flink would grow quickly, but PyPI's
per-project space is limited (I applied to expand apache-flink's quota
twice), so in 1.13 I moved the jar packages out of apache-flink into an
independent apache-flink-libraries package.
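The quota pressure Xingbo describes can be illustrated with back-of-the-envelope arithmetic; all sizes and counts below are made-up numbers for illustration, not the real apache-flink figures:

```python
# Illustrative only: every number here is an assumption.
jar_mb = 300        # assumed size of the bundled Flink jars
code_mb = 5         # assumed size of the pure-Python part of one wheel
n_wheels = 12       # e.g. 4 Python versions x 3 platforms

bundled = n_wheels * (code_mb + jar_mb)   # jars copied into every wheel
split = n_wheels * code_mb + jar_mb       # jars shipped once, in -libraries

print(bundled, split)  # → 3660 360
```

With these assumed numbers, bundling the jars into every wheel costs roughly ten times the project space of shipping them once in a separate package.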

Since each connector wheel package today is shared across Python versions
and platforms (it is a universal, pure-Python wheel), we currently do not
need to publish a corresponding -libraries package per connector, but we
still need to bundle the connector's jar into the corresponding
apache-<connector> PyPI package.
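The "shared across platforms" property shows up directly in a wheel's filename tags (PEP 427). A small sketch — the connector filename and version numbers below are hypothetical:

```python
def wheel_tags(filename: str) -> tuple:
    """Return the (python, abi, platform) tag triple from a wheel filename."""
    stem = filename[: -len(".whl")]
    # name-version[-build]-python-abi-platform: the tags are the last 3 fields
    return tuple(stem.split("-")[-3:])

# A pure-Python connector wheel (hypothetical filename) installs anywhere:
print(wheel_tags("apache_flink_connector_aws-4.2.0-py3-none-any.whl"))
# → ('py3', 'none', 'any')

# A pyflink wheel is built per Python version and platform:
print(wheel_tags("apache_flink-1.18.0-cp310-cp310-manylinux1_x86_64.whl"))
# → ('cp310', 'cp310', 'manylinux1_x86_64')
```

A `py3-none-any` wheel is the reason one artifact per connector suffices, whereas pyflink needs a matrix of builds.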

[1] https://pypi.org/project/apache-flink/
[2] https://pypi.org/project/apache-flink-libraries/

Best,
Xingbo

Leonard Xu wrote on Tue, 9 Jan 2024 at 13:46:



Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Leonard Xu
+1

Thanks Danny for driving this.

Best,
Leonard


> On 9 Jan 2024 at 02:01, Márton Balassi wrote:



Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Márton Balassi
+1

Thanks, Danny - I really appreciate you taking the time for the in-depth
investigation. Please proceed, looking forward to your experience.

On Mon, Jan 8, 2024 at 6:04 PM Martijn Visser wrote:



Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Martijn Visser
Thanks for investigating Danny. It looks like the best direction to go to :)

On Mon, Jan 8, 2024 at 5:56 PM Péter Váry wrote:


Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Péter Váry
Thanks Danny for working on this!

It would be good to do this in a way that the different connectors can
reuse as much code as possible, so, if possible, put most of the code into
the flink-connector-shared-utils repo [1].

+1 from me for the general direction (non-binding)

Thanks,
Peter

[1] https://github.com/apache/flink-connector-shared-utils


Danny Cranmer wrote (on Mon, 8 Jan 2024 at 17:31):



[DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Danny Cranmer
Hello all,

I have been working with Péter and Marton on externalizing python
connectors [1] from the main repo to the connector repositories. We have
the code moved and the CI running tests for Kafka and AWS Connectors. I am
now looking into the release process.

When we undertake a Flink release we perform the following steps [2] for
Python: 1/ run the Python build on CI, 2/ download the wheel artifacts,
3/ upload the artifacts to the Apache dist area, and 4/ deploy to PyPI.
The plan is to follow the same steps for connectors, using GitHub Actions
instead of Azure Pipelines.
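Steps 1/ and 2/ map naturally onto a GitHub Actions workflow with build artifacts. The fragment below is only a hypothetical sketch: the workflow name, job names, action versions, and paths are all assumptions, not the actual connector CI.

```yaml
# Hypothetical workflow fragment; not the real connector CI.
name: build-python-wheels
on: [push]
jobs:
  build-wheels:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Build wheel
        run: |
          python -m pip install build
          python -m build --wheel --outdir dist/
      # 2/ the release manager downloads this artifact, then 3/ stages it
      # on dist and 4/ publishes it to PyPI.
      - uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: dist/*.whl
```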

Today we have a single PyPI project for PyFlink that contains all the Flink
libs, apache-flink [3]. I propose we create a new PyPI project per
connector, using the existing connector version and following the naming
convention apache-<connector-name>, for example:
apache-flink-connector-aws, apache-flink-connector-kafka. To use a
DataStream API connector in Python, users would therefore first need to
install the lib, for example "python -m pip install apache-flink-connector-aws".
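The proposed naming convention, inferred from the examples in this thread, amounts to prefixing the connector repository name with "apache-"; a trivial sketch:

```python
# Sketch of the proposed convention (an inference from the thread's
# examples, not a documented rule): PyPI name = "apache-" + repo name.
def pypi_project(repo_name: str) -> str:
    return "apache-" + repo_name

print(pypi_project("flink-connector-aws"))    # → apache-flink-connector-aws
print(pypi_project("flink-connector-kafka"))  # → apache-flink-connector-kafka
```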

Once we have consensus I will update the release process and perform a
release of the flink-connector-aws project to test it end-to-end. I look
forward to any feedback.

Thanks,
Danny

[1] https://issues.apache.org/jira/browse/FLINK-33528
[2]
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
[3] https://pypi.org/project/apache-flink/