Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
Hi Danny,

+1 (non-binding)

Best Regards,
Ahmed Hamdy
Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
Hi Danny,

+1

Thanks a lot for investigating this. Let me share the current code management and release situation of pyflink here; I hope it is helpful to you.

Since Flink 1.13, release managers need to release two Python packages to PyPI: apache-flink [1] and apache-flink-libraries [2]. apache-flink contains all the pyflink Python code, and apache-flink-libraries contains the jar packages corresponding to the Flink binary. The reason the content of apache-flink-libraries is not put into apache-flink is that, starting from Flink 1.11, pyflink provides wheel packages for different Python versions and platforms. If all of these wheels contained the jars, the size of the apache-flink project would grow quickly, but PyPI's project space is limited (I applied twice to expand apache-flink's quota). So in 1.13 I moved the jars out of apache-flink into the independent apache-flink-libraries package.

Since each connector wheel package today is shared across platforms, we do not currently need to publish a corresponding -libraries package per connector, but we still need to package the connector's jar into the corresponding apache-flink-connector PyPI package.

[1] https://pypi.org/project/apache-flink/
[2] https://pypi.org/project/apache-flink-libraries/

Best,
Xingbo
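Xingbo's point about wheel sizes can be sketched numerically. The function and all sizes below are made-up illustrations (not the real apache-flink figures); they only show why bundling the jars into every per-version/per-platform wheel multiplies the upload size, while a separate libraries package uploads the jars once:

```python
def release_size_mb(n_wheel_variants: int, python_code_mb: int,
                    jars_mb: int, split_libraries: bool) -> int:
    """Total upload size for one release, in MB (illustrative only).

    With split_libraries=True the jars are published once in a separate
    package (the apache-flink-libraries approach); otherwise every wheel
    variant carries its own copy of the jars.
    """
    if split_libraries:
        return n_wheel_variants * python_code_mb + jars_mb
    return n_wheel_variants * (python_code_mb + jars_mb)

# Hypothetical numbers: 8 wheel variants (4 Python versions x 2 platforms),
# 5 MB of Python code, 300 MB of jars.
bundled = release_size_mb(8, 5, 300, split_libraries=False)  # 2440 MB
split = release_size_mb(8, 5, 300, split_libraries=True)     # 340 MB
```

Under these assumed numbers the bundled layout uploads roughly seven times more data per release, which is why the split mattered against PyPI's project quota.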
Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
+1

Thanks Danny for driving this.

Best,
Leonard
Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
+1

Thanks, Danny - I really appreciate you taking the time for the in-depth investigation. Please proceed, looking forward to your experience.
Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
Thanks for investigating Danny. It looks like the best direction to go to :)
Re: [DISCUSS] Externalized Python Connector Release/Dependency Process
Thanks Danny for working on this!

It would be good to do this in a way that the different connectors can reuse as much code as possible, so if possible put most of the code into the flink connector shared utils repo [1].

+1 from me for the general direction (non-binding)

Thanks,
Peter

[1] https://github.com/apache/flink-connector-shared-utils
[DISCUSS] Externalized Python Connector Release/Dependency Process
Hello all,

I have been working with Péter and Marton on externalizing Python connectors [1] from the main repo to the connector repositories. We have the code moved and the CI running tests for the Kafka and AWS connectors. I am now looking into the release process.

When we undertake a Flink release we perform the following steps [2] regarding Python: 1/ run the Python build on CI, 2/ download the wheel artifacts, 3/ upload the artifacts to dist, and 4/ deploy to PyPI. The plan is to follow the same steps for connectors, using GitHub Actions instead of the Azure pipeline.

Today we have a single PyPI project for pyflink that contains all the Flink libs: apache-flink [3]. I propose we create a new PyPI project per connector, using the existing connector version and following the naming convention apache-flink-connector-<name>, for example: apache-flink-connector-aws, apache-flink-connector-kafka. Therefore, to use a DataStream API connector in Python, users would first need to install the lib, for example "python -m pip install apache-flink-connector-aws".

Once we have consensus I will update the release process and perform a release of the flink-connector-aws project to test it end-to-end. I look forward to any feedback.

Thanks,
Danny

[1] https://issues.apache.org/jira/browse/FLINK-33528
[2] https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
[3] https://pypi.org/project/apache-flink/
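The proposed naming convention can be sketched as a pair of helpers. These functions are hypothetical illustrations of the convention described above (they are not part of any Flink tooling): each externalized connector maps to its own PyPI project, and users install it with pip.

```python
def pypi_project_name(connector: str) -> str:
    """Map a connector short name to its proposed PyPI project name."""
    return f"apache-flink-connector-{connector}"


def pip_install_command(connector: str) -> str:
    """Build the install command a user would run for a connector."""
    return f"python -m pip install {pypi_project_name(connector)}"


print(pypi_project_name("aws"))      # apache-flink-connector-aws
print(pip_install_command("kafka"))  # python -m pip install apache-flink-connector-kafka
```

Because the PyPI project reuses the existing connector version, a given connector release would appear under the same version number in both Maven and PyPI.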