[jira] [Closed] (FLINK-36307) Remove deprecated PyFlink config options
[ https://issues.apache.org/jira/browse/FLINK-36307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-36307. --- Resolution: Fixed Merged to master via fc3ace9bd652a8035c3a3d5a1b2cd580f4566a57 > Remove deprecated PyFlink config options > > > Key: FLINK-36307 > URL: https://issues.apache.org/jira/browse/FLINK-36307 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Xuannan Su >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 2.0-preview > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-36311) Remove deprecated flink-formats config options
[ https://issues.apache.org/jira/browse/FLINK-36311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-36311. --- > Remove deprecated flink-formats config options > -- > > Key: FLINK-36311 > URL: https://issues.apache.org/jira/browse/FLINK-36311 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Reporter: Xuannan Su >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 2.0-preview > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-36308) Remove deprecated CEP config options
[ https://issues.apache.org/jira/browse/FLINK-36308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-36308. --- > Remove deprecated CEP config options > > > Key: FLINK-36308 > URL: https://issues.apache.org/jira/browse/FLINK-36308 > Project: Flink > Issue Type: Sub-task > Components: Library / CEP >Reporter: Xuannan Su >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 2.0-preview > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-36311) Remove deprecated flink-formats config options
[ https://issues.apache.org/jira/browse/FLINK-36311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu resolved FLINK-36311. - Resolution: Fixed Merged to master via de40ebe53797a4abe023a437e63d9ba5aa095d40 to 483a94a15aed6661a333255139280affb935d52c > Remove deprecated flink-formats config options > -- > > Key: FLINK-36311 > URL: https://issues.apache.org/jira/browse/FLINK-36311 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Reporter: Xuannan Su >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 2.0-preview > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-36308) Remove deprecated CEP config options
[ https://issues.apache.org/jira/browse/FLINK-36308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu resolved FLINK-36308. - Resolution: Fixed Merged to master via 42d918dec1658c4dc1f0586148704c46bc1dd8d0 > Remove deprecated CEP config options > > > Key: FLINK-36308 > URL: https://issues.apache.org/jira/browse/FLINK-36308 > Project: Flink > Issue Type: Sub-task > Components: Library / CEP >Reporter: Xuannan Su >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 2.0-preview > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody
[ https://issues.apache.org/jira/browse/FLINK-35302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-35302: Fix Version/s: 1.20.0 (was: 1.19.1) > Flink REST server throws exception on unknown fields in RequestBody > --- > > Key: FLINK-35302 > URL: https://issues.apache.org/jira/browse/FLINK-35302 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.19.0 >Reporter: Juntao Hu >Assignee: Juntao Hu >Priority: Major > Labels: pull-request-available > Fix For: 1.20.0 > > > As > [FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance] > and FLINK-33268 mentioned, when an old version REST client receives a response > from a new version REST server, with a strict JSON mapper, the client will > throw exceptions on newly added fields, which is not convenient for > situations where a centralized client deals with REST servers of different > versions (e.g. k8s operator). > But this incompatibility can also happen at the server side, when a new version > REST client sends requests to an old version REST server with additional > fields. Making the server flexible with unknown fields can save clients from > backward-compatibility code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
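The tolerance the issue asks for is ordinary lenient JSON deserialization: drop unknown fields instead of failing. Flink's actual change lives in its Java Jackson ObjectMapper configuration; the sketch below only illustrates the idea, and the request-body class and field names are hypothetical:

```python
import json
from dataclasses import dataclass, fields


@dataclass
class StopRequestBody:
    # Hypothetical request body for illustration; real Flink request
    # bodies are Java classes deserialized by Jackson.
    target_directory: str
    drain: bool = False


def parse_lenient(raw: str, cls):
    """Deserialize JSON, silently dropping fields the server doesn't know."""
    known = {f.name for f in fields(cls)}
    data = json.loads(raw)
    return cls(**{k: v for k, v in data.items() if k in known})


# A newer client sends an extra field the old server has never heard of;
# a strict mapper would raise here, a lenient one just ignores it.
body = parse_lenient(
    '{"target_directory": "/tmp/sp", "drain": true, "new_flag": 1}',
    StopRequestBody,
)
print(body.target_directory)  # /tmp/sp
```

This mirrors, on the server side, the client-side unknown-field tolerance that FLIP-401 introduced.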
[jira] [Comment Edited] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845526#comment-17845526 ] Dian Fu edited comment on FLINK-35112 at 5/11/24 2:56 AM: -- Fix in - master via b4d71144de8e3772257804b6ed8ad688076430d6 - release-1.19 via d16b20e4fb4fb906927188ca11af599edd0953c1 was (Author: dianfu): Fix in master via b4d71144de8e3772257804b6ed8ad688076430d6 > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Assignee: Wouter Zorgdrager >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-35112: Fix Version/s: 1.19.1 > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Assignee: Wouter Zorgdrager >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0, 1.19.1 > > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-35112. --- Fix Version/s: 1.20.0 Resolution: Fixed Fix in master via b4d71144de8e3772257804b6ed8ad688076430d6 > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Assignee: Wouter Zorgdrager >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
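The proposed `__contains__` fix can be exercised with a minimal stand-in class (a sketch mimicking just enough of `pyflink.common.Row` to show the behavior change, not the real implementation):

```python
class Row:
    """Minimal stand-in for pyflink.common.Row, enough to show the fix."""

    def __init__(self, *values, **named):
        if named:
            # Keyword construction records field names alongside values.
            self._fields = list(named)
            self._values = list(named.values())
        else:
            self._values = list(values)

    def __contains__(self, item):
        # The fix: check field names when they exist, else fall back
        # to the values, as before.
        if hasattr(self, "_fields"):
            return item in self._fields
        return item in self._values


row = Row(name="Alice", age=11)
print("name" in row)  # True with the fix (was False before)

person = Row("name", "age")
print("name" in person)  # True, as before: positional values are checked
```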
[jira] [Assigned] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-35112: --- Assignee: Wouter Zorgdrager > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Assignee: Wouter Zorgdrager >Priority: Minor > Labels: pull-request-available > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody
[ https://issues.apache.org/jira/browse/FLINK-35302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-35302: --- Assignee: Juntao Hu > Flink REST server throws exception on unknown fields in RequestBody > --- > > Key: FLINK-35302 > URL: https://issues.apache.org/jira/browse/FLINK-35302 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.19.0 >Reporter: Juntao Hu >Assignee: Juntao Hu >Priority: Major > Labels: pull-request-available > Fix For: 1.19.1 > > > As > [FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance] > and FLINK-33268 mentioned, when an old version REST client receives a response > from a new version REST server, with a strict JSON mapper, the client will > throw exceptions on newly added fields, which is not convenient for > situations where a centralized client deals with REST servers of different > versions (e.g. k8s operator). > But this incompatibility can also happen at the server side, when a new version > REST client sends requests to an old version REST server with additional > fields. Making the server flexible with unknown fields can save clients from > backward-compatibility code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies
[ https://issues.apache.org/jira/browse/FLINK-35208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-35208. --- Fix Version/s: 1.20.0 Resolution: Fixed Merged to master via 0f01a0e9a6856781d5a0e33b26172bb913ec1928 > Respect pipeline.cached-files during processing Python dependencies > --- > > Key: FLINK-35208 > URL: https://issues.apache.org/jira/browse/FLINK-35208 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.20.0 > > > Currently, PyFlink will make use of distributed cache > (StreamExecutionEnvironment#cachedFiles) during handling the Python > dependencies(See > [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339] > for more details). > However, if pipeline.cached-files is configured, it will clear > StreamExecutionEnvironment#cachedFiles(see > [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132] > for more details) which may break the above functionalities. -- This message was sent by Atlassian Jira (v8.20.10#820010)
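The clash described above can be modeled abstractly: registering Python dependencies adds entries to the environment's cached files, and a later `pipeline.cached-files` configure call replaces the whole list instead of merging. The names and data structures below are purely illustrative, not Flink's actual internals:

```python
# Illustrative model of the bug, not Flink's actual classes.
cached_files = []


def register_python_deps():
    # PyFlink registers its Python dependencies via distributed cache.
    cached_files.append(("python-deps", "/tmp/deps.zip"))


def configure_pipeline_cached_files(entries):
    # Current behavior: clear-and-replace, which drops the Python deps
    # registered earlier.
    cached_files.clear()
    cached_files.extend(entries)


register_python_deps()
configure_pipeline_cached_files([("model", "hdfs:///models/m1")])
print(("python-deps", "/tmp/deps.zip") in cached_files)  # False: deps lost
```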
[jira] [Updated] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies
[ https://issues.apache.org/jira/browse/FLINK-35208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-35208: Description: Currently, PyFlink will make use of distributed cache (StreamExecutionEnvironment#cachedFiles) during handling the Python dependencies(See [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339] for more details). However, if pipeline.cached-files is configured, it will clear StreamExecutionEnvironment#cachedFiles(see [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132] for more details) which may break the above functionalities. was: Currently, PyFlink will make use of distributed cache (update StreamExecutionEnvironment#cachedFiles) during handling the Python dependencies(See [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339] for more details). However, if pipeline.cached-files is configured, it will clear StreamExecutionEnvironment#cachedFiles(see [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132] for more details) which may break the above functionalities. > Respect pipeline.cached-files during processing Python dependencies > --- > > Key: FLINK-35208 > URL: https://issues.apache.org/jira/browse/FLINK-35208 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > > Currently, PyFlink will make use of distributed cache > (StreamExecutionEnvironment#cachedFiles) during handling the Python > dependencies(See > [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339] > for more details). 
> However, if pipeline.cached-files is configured, it will clear > StreamExecutionEnvironment#cachedFiles(see > [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132] > for more details) which may break the above functionalities. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35208) Respect pipeline.cached-files during processing Python dependencies
Dian Fu created FLINK-35208: --- Summary: Respect pipeline.cached-files during processing Python dependencies Key: FLINK-35208 URL: https://issues.apache.org/jira/browse/FLINK-35208 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu Assignee: Dian Fu Currently, PyFlink will make use of distributed cache (update StreamExecutionEnvironment#cachedFiles) during handling the Python dependencies(See [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/python/util/PythonDependencyUtils.java#L339] for more details). However, if pipeline.cached-files is configured, it will clear StreamExecutionEnvironment#cachedFiles(see [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1132] for more details) which may break the above functionalities. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837965#comment-17837965 ] Dian Fu commented on FLINK-35112: - Great (y). Thanks [~wzorgdrager] > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Priority: Minor > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35112) Membership for Row class does not include field names
[ https://issues.apache.org/jira/browse/FLINK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837512#comment-17837512 ] Dian Fu commented on FLINK-35112: - Sounds reasonable to me. [~wzorgdrager] Do you want to submit a PR for this issue? > Membership for Row class does not include field names > - > > Key: FLINK-35112 > URL: https://issues.apache.org/jira/browse/FLINK-35112 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.1 >Reporter: Wouter Zorgdrager >Priority: Minor > > In the Row class in PyFlink I cannot do a membership check for field names. > This minimal example will show the unexpected behavior: > ``` > from pyflink.common import Row > row = Row(name="Alice", age=11) > # Expected to be True, but is False > print("name" in row) > person = Row("name", "age") > # This is True, as expected > print('name' in person) > ``` > The related code in the Row class is: > ``` > def __contains__(self, item): > return item in self._values > ``` > It should be relatively easy to fix with the following code: > ``` > def __contains__(self, item): > if hasattr(self, "_fields"): > return item in self._fields > else: > return item in self._values > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34985) It doesn't support to access fields by name for map function in thread mode
Dian Fu created FLINK-34985: --- Summary: It doesn't support to access fields by name for map function in thread mode Key: FLINK-34985 URL: https://issues.apache.org/jira/browse/FLINK-34985 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu

Reported in slack channel: [https://apache-flink.slack.com/archives/C065944F9M2/p1711640068929589]

```
hi all, I seem to be running into an issue when switching to thread mode in PyFlink. In an UDF the {{Row}} seems to get converted into a tuple and you cannot access fields by their name anymore. In process mode it works fine. This bug can easily be reproduced using this minimal example, which is close to the PyFlink docs:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.common import Row
from pyflink.table import StreamTableEnvironment, DataTypes
from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)
t_env.get_config().set("parallelism.default", "1")
# This does work:
t_env.get_config().set("python.execution-mode", "process")
# This doesn't work:
#t_env.get_config().set("python.execution-mode", "thread")

def map_function(a: Row) -> Row:
    return Row(a.a + 1, a.b * a.b)

# map operation with a python general scalar function
func = udf(
    map_function,
    result_type=DataTypes.ROW(
        [
            DataTypes.FIELD("a", DataTypes.BIGINT()),
            DataTypes.FIELD("b", DataTypes.BIGINT()),
        ]
    ),
)

table = (
    t_env.from_elements(
        [(2, 4), (0, 0)],
        schema=DataTypes.ROW(
            [
                DataTypes.FIELD("a", DataTypes.BIGINT()),
                DataTypes.FIELD("b", DataTypes.BIGINT()),
            ]
        ),
    )
    .map(func)
    .alias("a", "b")
    .execute()
    .print()
)
```

The exception I get in this execution mode is:

2024-03-28 16:32:10 Caused by: pemja.core.PythonException: : 'tuple' object has no attribute 'a'
2024-03-28 16:32:10 at /usr/local/lib/python3.10/dist-packages/pyflink/fn_execution/embedded/operation_utils.process_element(operation_utils.py:72)
2024-03-28 16:32:10 at /usr/local/lib/python3.10/dist-packages/pyflink/fn_execution/table/operations.process_element(operations.py:102)
2024-03-28 16:32:10 at .(:1)
2024-03-28 16:32:10 at /opt/flink/wouter/minimal_example.map_function(minimal_example.py:19)

-- This message was sent by Atlassian Jira (v8.20.10#820010)
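Until this is fixed, one possible workaround (an unofficial sketch, not guidance from the issue) is to avoid attribute access in the UDF and index positionally, which works whether the input arrives as a Row or as the plain tuple that thread mode currently passes:

```python
def map_function(a):
    # Positional access works for both Row (process mode) and the plain
    # tuple that thread mode currently hands to the UDF; a.a would raise
    # AttributeError on a tuple, a[0] works on either.
    return (a[0] + 1, a[1] * a[1])


# Thread mode: the input arrives as a bare tuple.
print(map_function((2, 4)))  # (3, 16)
```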
[jira] [Commented] (FLINK-33220) PyFlink support for Datagen connector
[ https://issues.apache.org/jira/browse/FLINK-33220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829346#comment-17829346 ] Dian Fu commented on FLINK-33220: - [~liu.chong] I missed this ticket. Feel free to submit the PR and ping me~ > PyFlink support for Datagen connector > - > > Key: FLINK-33220 > URL: https://issues.apache.org/jira/browse/FLINK-33220 > Project: Flink > Issue Type: New Feature > Components: API / Python >Reporter: Liu Chong >Priority: Minor > > This is a simple Jira to propose the support of Datagen in PyFlink datastream > API as a built-in source connector -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33220) PyFlink support for Datagen connector
[ https://issues.apache.org/jira/browse/FLINK-33220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829346#comment-17829346 ] Dian Fu edited comment on FLINK-33220 at 3/21/24 1:24 AM: -- [~liu.chong] Sorry, I missed this ticket. Feel free to submit the PR and ping me~ was (Author: dianfu): [~liu.chong] I missed this ticket. Feel free to submit the PR and ping me~ > PyFlink support for Datagen connector > - > > Key: FLINK-33220 > URL: https://issues.apache.org/jira/browse/FLINK-33220 > Project: Flink > Issue Type: New Feature > Components: API / Python >Reporter: Liu Chong >Priority: Minor > > This is a simple Jira to propose the support of Datagen in PyFlink datastream > API as a built-in source connector -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases
[ https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818326#comment-17818326 ] Dian Fu commented on FLINK-34202: - [~lincoln.86xy] It randomly selects one Python version to test, and all the Python versions will eventually be tested, though maybe not in one nightly CI. Personally I think this is acceptable, as it rarely happens that a test fails only on one specific Python version. Even if it does, it will still be found, just with a delay of a day or two, which should be acceptable. > python tests take suspiciously long in some of the cases > > > Key: FLINK-34202 > URL: https://issues.apache.org/jira/browse/FLINK-34202 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.2, 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Assignee: Xingbo Huang >Priority: Critical > Labels: pull-request-available, test-stability > > [This release-1.18 > build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=3e4dd1a2-fe2f-5e5d-a581-48087e718d53&t=b4612f28-e3b5-5853-8a8b-610ae894217a] > has the python stage running into a timeout without any obvious reason. The > [python stage run for > JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06] > was also getting close to the 4h timeout. > I'm creating this issue for documentation purposes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34202) python tests take suspiciously long in some of the cases
[ https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818242#comment-17818242 ] Dian Fu commented on FLINK-34202: - [~hxbks2ks] Could you help to take a look at this issue? > python tests take suspiciously long in some of the cases > > > Key: FLINK-34202 > URL: https://issues.apache.org/jira/browse/FLINK-34202 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.2, 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > [This release-1.18 > build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=3e4dd1a2-fe2f-5e5d-a581-48087e718d53&t=b4612f28-e3b5-5853-8a8b-610ae894217a] > has the python stage running into a timeout without any obvious reason. The > [python stage run for > JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06] > was also getting close to the 4h timeout. > I'm creating this issue for documentation purposes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33613) Python UDF Runner process leak in Process Mode
[ https://issues.apache.org/jira/browse/FLINK-33613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-33613. --- Fix Version/s: 1.19.0 1.18.1 1.17.3 Resolution: Fixed Fixed in: - master via 977463cce3ea0f88e2f184c30720bf4e8e97fd4a - release-1.18 via f6d005681e3f5f83ab1074660c4d9878dabb9176 - release-1.17 via c1818f530617af8996f4a74bb064e203186fd98e > Python UDF Runner process leak in Process Mode > -- > > Key: FLINK-33613 > URL: https://issues.apache.org/jira/browse/FLINK-33613 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yu Chen >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0, 1.18.1, 1.17.3 > > Attachments: ps-ef.txt, streaming_word_count-1.py > > > While working with PyFlink, we found that in Process Mode, the Python UDF > process may leak after a failover of the job. It leads to a rising number of > processes with their threads on the host machine, which eventually results in > failure to create new threads. > > You can try to reproduce it with the attached test task > `streaming_word_count.py`. > (Note that the job will continuously fail over, and you can watch the process > leak via `ps -ef` on the TaskManager.) > > Our test environment: > * K8S Application Mode > * 4 Taskmanagers with 12 slots/TM > * Job's parallelism was set to 48 > The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be consistent > with the number of TM slots (12), but we found that there are 180 processes on one > Taskmanager after several failovers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-33613) Python UDF Runner process leak in Process Mode
[ https://issues.apache.org/jira/browse/FLINK-33613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-33613: --- Assignee: Dian Fu > Python UDF Runner process leak in Process Mode > -- > > Key: FLINK-33613 > URL: https://issues.apache.org/jira/browse/FLINK-33613 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yu Chen >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Attachments: ps-ef.txt, streaming_word_count-1.py > > > While working with PyFlink, we found that in Process Mode, the Python UDF > process may leak after a failover of the job. It leads to a rising number of > processes with their threads in the host machine, which eventually results in > failure to create new threads. > > You can try to reproduce it with the attached test task > `streamin_word_count.py`. > (Note that the job will continue failover, and you can watch the process > leaks by `ps -ef` on Taskmanager. > > Our test environment: > * K8S Application Mode > * 4 Taskmanagers with 12 slots/TM > * Job's parallelism was set to 48 > The udf process `pyflink.fn_execution.beam.beam_boot` should be consistence > with slots of TM (12), but we found that there are 180 processes on one > Taskmanager after several failovers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
[ https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-33225. --- Fix Version/s: 1.19.0 1.18.1 Resolution: Fixed Merged to - master via b1bfd70ad8a9e4e1a710dc5775837ba7102d4b70 - release-1.18 via 73273ae1669c9f01b61667dedb85a7e745d6bbe2 > Python API incorrectly passes `JVM_ARGS` as single argument > --- > > Key: FLINK-33225 > URL: https://issues.apache.org/jira/browse/FLINK-33225 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0, 1.17.1, 1.18.1 >Reporter: Deepyaman Datta >Assignee: Deepyaman Datta >Priority: Major > Labels: github-pullrequest, pull-request-available > Fix For: 1.19.0, 1.18.1 > > > In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, > `JVM_ARGS` need to be passed as an array. For example, the current behavior > of export `JVM_ARGS='-XX:CompressedClassSpaceSize=100M > -XX:MaxMetaspaceSize=200M'` is: > {{> raise RuntimeError(}} > {{ "Java gateway process exited before sending its port > number.\nStderr:\n"}} > {{ + stderr_info}} > {{ )}} > {{E RuntimeError: Java gateway process exited before sending > its port number.}} > {{E Stderr:}} > {{E Improperly specified VM option > 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}} > {{E Error: Could not create the Java Virtual Machine.}} > {{E Error: A fatal exception has occurred. Program will exit.}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
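The failure mode in the report is classic quoting: the whole `JVM_ARGS` string reaches the JVM as a single malformed option. The difference is easy to see with Python's `shlex` (the command lists below are illustrative; the actual fix is array handling in PyFlink's Java-gateway launch path):

```python
import shlex

jvm_args = "-XX:CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M"

# Broken: the entire string becomes one argument, so the JVM sees a
# single garbled VM option and refuses to start.
broken_cmd = ["java", jvm_args, "-version"]

# Fixed: shell-style word splitting yields one list element per option.
fixed_cmd = ["java", *shlex.split(jvm_args), "-version"]

print(fixed_cmd[1])  # -XX:CompressedClassSpaceSize=100M
```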
[jira] [Assigned] (FLINK-33225) Python API incorrectly passes `JVM_ARGS` as single argument
[ https://issues.apache.org/jira/browse/FLINK-33225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-33225: --- Assignee: Deepyaman Datta > Python API incorrectly passes `JVM_ARGS` as single argument > --- > > Key: FLINK-33225 > URL: https://issues.apache.org/jira/browse/FLINK-33225 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0, 1.17.1, 1.18.1 >Reporter: Deepyaman Datta >Assignee: Deepyaman Datta >Priority: Major > Labels: github-pullrequest, pull-request-available > > In the same vein as https://issues.apache.org/jira/browse/FLINK-31915, > `JVM_ARGS` need to be passed as an array. For example, the current behavior > of export `JVM_ARGS='-XX:CompressedClassSpaceSize=100M > -XX:MaxMetaspaceSize=200M'` is: > {{> raise RuntimeError(}} > {{ "Java gateway process exited before sending its port > number.\nStderr:\n"}} > {{ + stderr_info}} > {{ )}} > {{E RuntimeError: Java gateway process exited before sending > its port number.}} > {{E Stderr:}} > {{E Improperly specified VM option > 'CompressedClassSpaceSize=100M -XX:MaxMetaspaceSize=200M'}} > {{E Error: Could not create the Java Virtual Machine.}} > {{E Error: A fatal exception has occurred. Program will exit.}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33528) Externalize Python connector code
[ https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786595#comment-17786595 ] Dian Fu commented on FLINK-33528: - Thanks very much for the efforts (y) > Externalize Python connector code > - > > Key: FLINK-33528 > URL: https://issues.apache.org/jira/browse/FLINK-33528 > Project: Flink > Issue Type: Technical Debt > Components: API / Python, Connectors / Common >Affects Versions: 1.18.0 >Reporter: Márton Balassi >Assignee: Peter Vary >Priority: Major > Fix For: 1.19.0 > > > During the connector externalization effort end to end tests for the python > connectors were left in the main repository under: > [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors] > These include both python connector implementation and tests. Currently they > depend on a previously released version of the underlying connectors, > otherwise they would introduce a circular dependency given that they are in > the flink repo at the moment. > This setup prevents us from propagating any breaking change to PublicEvolving > and Internal APIs used by the connectors as they lead to breaking the python > e2e tests. We run into this while implementing FLINK-25857. > Note that we made the decision to turn off the Python test when merging > FLINK-25857, so now we are forced to fix this until 1.19 such that we can > reenable the test runs - now in the externalized connector repos. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"
[ https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-33529: Affects Version/s: 1.17.2 > PyFlink fails with "No module named 'cloudpickle" > - > > Key: FLINK-33529 > URL: https://issues.apache.org/jira/browse/FLINK-33529 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.0, 1.17.2 > Environment: Python 3.7.16 or Python 3.9 > YARN >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0, 1.18.1, 1.17.3 > > Attachments: batch_wc.py, flink1.17-get_site_packages.py, > flink1.18-get_site_packages.py > > > PyFlink fails with "No module named 'cloudpickle" on Flink 1.18. The same > program works fine on Flink 1.17. This is after the change > (https://issues.apache.org/jira/browse/FLINK-32034). > *Repro:* > {code} > [hadoop@ip-1-2-3-4 ~]$ python --version > Python 3.7.16 > [hadoop@ip-1-2-3-4 ~]$ rpm -qa | grep flink > flink-1.18.0-1.amzn2.x86_64 > [hadoop@ip-1-2-3-4 ~]$ flink-yarn-session -d > [hadoop@ip-1-2-3-4 ~]$ flink run -py /tmp/batch_wc.py --output > s3://prabhuflinks3/OUT2/ > {code} > *Error* > {code} > ModuleNotFoundError: No module named 'cloudpickle' > at > org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:656) > at > org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:281) > at > org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57) > at > org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92) > at > org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114) > at > 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693) > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953) > at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) > {code} > *Analysis* > 1. On Flink 1.17 and Python-3.7.16, > PythonEnvironmentManagerUtils#getSitePackagesPath used to return following > two paths > {code} > [root@ip-172-31-45-97 tmp]# python flink1.17-get_site_packages.py /tmp > /tmp/lib/python3.7/site-packages > /tmp/lib64/python3.7/site-packages > {code} > whereas Flink 1.18 (FLINK-32034) has changed the > PythonEnvironmentManagerUtils#getSitePackagesPath and only one path is > returned > {code} > [root@ip-172-31-45-97 tmp]# python flink1.18-get_site_packages.py /tmp > /tmp/lib64/python3.7/site-packages > [root@ip-172-31-45-97 tmp]# > {code} > The pyflink dependencies are installed in "/tmp/lib/python3.7/site-packages" > which is not returned by the getSitePackagesPath in Flink1.18 causing the > pyflink job failure. > *Attached batch_wc.py, flink1.17-get_site_packages.py and > flink1.18-get_site_packages.py.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
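The analysis above boils down to which site-packages directories are probed under an installation prefix. A Python stand-in for the Java `PythonEnvironmentManagerUtils#getSitePackagesPath` behavior described for Flink 1.17 (hypothetical helper, not Flink's actual implementation), which checks both the `lib` and `lib64` layouts because distributions differ in which one pip installs into:

```python
def candidate_site_packages(prefix, py_version="3.7"):
    """Return every plausible site-packages directory under *prefix*.

    Mirrors the pre-FLINK-32034 behavior: both lib/ and lib64/ are
    returned, so dependencies are found regardless of the distro layout.
    """
    return [
        f"{prefix}/{lib}/python{py_version}/site-packages"
        for lib in ("lib", "lib64")
    ]

print(candidate_site_packages("/tmp"))
# ['/tmp/lib/python3.7/site-packages', '/tmp/lib64/python3.7/site-packages']
```

The 1.18 regression corresponds to returning only the `lib64` entry, so packages installed under `lib/` (such as cloudpickle here) become invisible to the worker.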
[jira] [Closed] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"
[ https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu closed FLINK-33529.
---------------------------
    Fix Version/s: 1.19.0, 1.18.1, 1.17.3
       Resolution: Fixed
[jira] [Commented] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"
[ https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785727#comment-17785727 ]

Dian Fu commented on FLINK-33529:
---------------------------------

Fixed in:
* master via 34d9594ab1e5412371b77912f120b7949c92dcdd
* release-1.18 via 2844ea232accece60158b455079920fb2d78f448
* release-1.17 via 01b1ce4f349315e1942b3290c0fa81ab0a6b183e
[jira] [Assigned] (FLINK-33529) PyFlink fails with "No module named 'cloudpickle"
[ https://issues.apache.org/jira/browse/FLINK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu reassigned FLINK-33529:
-------------------------------
    Assignee: Prabhu Joseph
[jira] [Commented] (FLINK-33528) Externalize Python connector code
[ https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785708#comment-17785708 ]

Dian Fu commented on FLINK-33528:
---------------------------------

Yes, I think so.
[jira] [Commented] (FLINK-33528) Externalize Python connector code
[ https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785456#comment-17785456 ]

Dian Fu commented on FLINK-33528:
---------------------------------

I think it's fair enough to externalize the Python connectors. We have also previously found that incompatible changes to the Java APIs of connectors living in external repositories break the Python connector APIs, which remain in the Flink repository.
[jira] [Commented] (FLINK-32962) Failure to install python dependencies from requirements file
[ https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762927#comment-17762927 ] Dian Fu commented on FLINK-32962: - +1 to backport to Flink 1.17 and 1.18, especially Flink 1.18. > Failure to install python dependencies from requirements file > - > > Key: FLINK-32962 > URL: https://issues.apache.org/jira/browse/FLINK-32962 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.15.4, 1.16.2, 1.18.0, 1.17.1 >Reporter: Aleksandr Pilipenko >Priority: Major > Labels: pull-request-available > > We have encountered an issue when Flink fails to install python dependencies > from requirements file if python environment contains setuptools dependency > version 67.5.0 or above. > Flink job fails with following error: > {code:java} > py4j.protocol.Py4JJavaError: An error occurred while calling o118.await. > : java.util.concurrent.ExecutionException: > org.apache.flink.table.api.TableException: Failed to wait job finish > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118) > at > org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81) > ... > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.client.program.ProgramInvocationException: Job failed > (JobID: 2ca4026944022ac4537c503464d4c47f) > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83) > ... 
6 more > Caused by: org.apache.flink.client.program.ProgramInvocationException: Job > failed (JobID: 2ca4026944022ac4537c503464d4c47f) > at > org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) > ... > Caused by: java.io.IOException: java.io.IOException: Failed to execute the > command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install > --ignore-installed -r > /var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5 > --install-option > --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vwgn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements > output: > Usage: > /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] > [package-index-options] ... > /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r > [package-index-options] ... > /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] > [-e] ... > /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] > [-e] ... > /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] > ... 
> no such option: --install-option > at > org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144) > at > org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215) > at > org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126) > at > org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435) > at > org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402) > at > org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114) > at > org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238) > at > org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57) > at > org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.
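The "no such option: --install-option" failure above stems from newer pip releases removing that flag, so the installation prefix has to be passed with pip's own `--prefix` option instead. A hedged sketch of the command construction (illustrative helper and paths, not Flink's actual `pipInstallRequirements` code):

```python
def pip_install_requirements_cmd(python_exec, requirements_file,
                                 target_dir, modern_pip=True):
    """Build the pip argv for installing a requirements file into
    *target_dir*. (Hypothetical helper for illustration.)"""
    cmd = [python_exec, "-m", "pip", "install", "--ignore-installed",
           "-r", requirements_file]
    if modern_pip:
        # pip's native option; accepted by current pip releases.
        cmd += ["--prefix", target_dir]
    else:
        # Old style: forwarded to setup.py via --install-option, which
        # newer pip rejects with "no such option: --install-option".
        cmd += ["--install-option", f"--prefix={target_dir}"]
    return cmd

print(pip_install_requirements_cmd("python3", "reqs.txt", "/tmp/pyreq"))
```

The fix for this issue amounts to switching from the second form to the first.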
[jira] [Assigned] (FLINK-32962) Failure to install python dependencies from requirements file
[ https://issues.apache.org/jira/browse/FLINK-32962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu reassigned FLINK-32962:
-------------------------------
    Assignee: Aleksandr Pilipenko
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760240#comment-17760240 ] Dian Fu commented on FLINK-32758: - Merged to: - release-1.18 via 1f7796ee50cfbea4fb633692e6be01070ed45c6f and 8551a39ee46054d3ec05f3d31758f7ad39b69a39 - release-1.17 via 30eeb91c3d2048b88e0a9903d9c973085df2c2ea and 870ac98dcdb92774fed783254a3bf4d8ddc317aa > PyFlink bounds are overly restrictive and outdated > -- > > Key: FLINK-32758 > URL: https://issues.apache.org/jira/browse/FLINK-32758 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.17.1, 1.19.0 >Reporter: Deepyaman Datta >Assignee: Deepyaman Datta >Priority: Blocker > Labels: pull-request-available, test-stability > Attachments: image-2023-08-29-10-19-37-977.png > > > Hi! I am part of a team building the Flink backend for Ibis > ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink > under the hood for execution; however, PyFlink's requirements are > incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's > outdated and restrictive requirements prevent it from being used alongside > most recent releases of Python data libraries. > Some of the major libraries we (and likely others in the Python community > interested in using PyFlink alongside other libraries) need compatibility > with: > * PyArrow (at least >=10.0.0, but there's no reason not to be also be > compatible with latest) > * pandas (should be compatible with 2.x series, but also probably with > 1.4.x, released January 2022, and 1.5.x) > * numpy (1.22 was released in December 2022) > * Newer releases of Apache Beam > * Newer releases of cython > Furthermore, uncapped dependencies could be more generally preferable, as > they avoid the need for frequent PyFlink releases as newer versions of > libraries are released. 
A common (and great) argument for not upper-bounding > dependencies, especially for libraries: > [https://iscinumpy.dev/post/bound-version-constraints/] > I am currently testing removing upper bounds in > [https://github.com/apache/flink/pull/23141]; so far, builds pass without > issue in > [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], > and I'm currently waiting on > [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] > to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed > dependencies results in: > {{#}} > {{# This file is autogenerated by pip-compile with Python 3.8}} > {{# by the following command:}} > {{#}} > {{# pip-compile --config=pyproject.toml > --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}} > {{#}} > {{apache-beam==2.49.0}} > {{ # via -r dev/dev-requirements.txt}} > {{avro-python3==1.10.2}} > {{ # via -r dev/dev-requirements.txt}} > {{certifi==2023.7.22}} > {{ # via requests}} > {{charset-normalizer==3.2.0}} > {{ # via requests}} > {{cloudpickle==2.2.1}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{crcmod==1.7}} > {{ # via apache-beam}} > {{cython==3.0.0}} > {{ # via -r dev/dev-requirements.txt}} > {{dill==0.3.1.1}} > {{ # via apache-beam}} > {{dnspython==2.4.1}} > {{ # via pymongo}} > {{docopt==0.6.2}} > {{ # via hdfs}} > {{exceptiongroup==1.1.2}} > {{ # via pytest}} > {{fastavro==1.8.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{fasteners==0.18}} > {{ # via apache-beam}} > {{find-libpython==0.3.0}} > {{ # via pemja}} > {{grpcio==1.56.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{grpcio-tools==1.56.2}} > {{ # via -r dev/dev-requirements.txt}} > {{hdfs==2.7.0}} > {{ # via apache-beam}} > {{httplib2==0.22.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > 
{{idna==3.4}} > {{ # via requests}} > {{iniconfig==2.0.0}} > {{ # via pytest}} > {{numpy==1.24.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # pandas}} > {{ # pyarrow}} > {{objsize==0.6.1}} > {{ # via apache-beam}} > {{orjson==3.9.2}} > {{ # via apache-beam}} > {{packaging==23.1}} > {{ # via pytest}} > {{pandas==2.0.3}} > {{ # via -r dev/dev-requirements.txt}} > {{pemja==0.3.0 ; platform_system != "Windows"}} > {{ # via -r dev/dev-requirements.txt}} > {{pluggy==1.2.0}} > {{ # via pytest}} > {{proto-plus==1.22.3}} > {{ # via apache-beam}} > {{protobuf==4.23.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools
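The uncapped-dependency argument quoted above can be made concrete with a toy resolver check. This is a hypothetical sketch: the version numbers and bounds are illustrative only, not PyFlink's actual constraints.

```python
# Toy illustration of why upper caps force lockstep releases: with a cap in
# place, a newly published library version cannot be resolved until the
# capping package itself re-releases with a relaxed bound. Versions are
# modeled as tuples; bounds are illustrative, not PyFlink's real ones.
def resolvable(available, spec_min, spec_max=None):
    """True if `available` satisfies a >=spec_min (and optional <spec_max) range."""
    return available >= spec_min and (spec_max is None or available < spec_max)


# Capped, e.g. "pyarrow>=5,<9": the pyarrow 10.0 release is unusable
# until a new PyFlink release lifts the cap.
assert not resolvable((10, 0), (5, 0), (9, 0))

# Uncapped, e.g. "pyarrow>=5": new releases resolve immediately.
assert resolvable((10, 0), (5, 0))
```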
[jira] [Closed] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32758. --- Fix Version/s: 1.18.0, 1.17.2 Resolution: Fixed
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760238#comment-17760238 ] Dian Fu commented on FLINK-32758: - Fixed in master via 5b5a0af15d57ed4424cf8dd744808433e397ebc4
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759969#comment-17759969 ] Dian Fu commented on FLINK-32758: - Have verified that upgrading *cibuildwheel* doesn't work, while excluding fastavro 1.8.0 works: fastavro>=1.1.0,!=1.8.0. [~deepyaman] What are your thoughts?
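The `fastavro>=1.1.0,!=1.8.0` specifier discussed in the comment above combines a lower bound with a PEP 440 exclusion. A toy check (hand-rolled here purely for illustration, standing in for a real specifier parser such as the `packaging` library) shows which versions it admits:

```python
# Toy illustration of the "fastavro>=1.1.0,!=1.8.0" specifier: a lower
# bound plus one excluded release. Hand-rolled for illustration only;
# real tools parse PEP 440 specifiers properly.
def admitted(version, minimum=(1, 1, 0), excluded=((1, 8, 0),)):
    parts = tuple(int(p) for p in version.split("."))
    return parts >= minimum and parts not in excluded


assert admitted("1.8.2")      # newer patch releases stay allowed
assert not admitted("1.8.0")  # the broken release is excluded
assert not admitted("1.0.9")  # below the lower bound
```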
[jira] [Commented] (FLINK-32981) Add python dynamic Flink home detection
[ https://issues.apache.org/jira/browse/FLINK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759855#comment-17759855 ] Dian Fu commented on FLINK-32981: - [~gaborgsomogyi] [~mbalassi] It seems that this PR breaks the PyFlink wheel package build (https://issues.apache.org/jira/browse/FLINK-32989). Could you take a look? > Add python dynamic Flink home detection > --- > > Key: FLINK-32981 > URL: https://issues.apache.org/jira/browse/FLINK-32981 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.19.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > During `pyflink` library compilation, the Flink home is calculated from the > provided `pyflink` version, which is normally something like `1.19.dev0`. > In such cases `.dev0` is replaced with `-SNAPSHOT`, which ends up as a hardcoded home > directory: > `../../flink-dist/target/flink-1.18-SNAPSHOT-bin/flink-1.18-SNAPSHOT`. This > is fine as long as one uses the basic version types described > [here](https://peps.python.org/pep-0440/#developmental-releases). To > support any kind of `pyflink` version, one can dynamically determine the Flink > home directory through globbing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
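The globbing approach described in the issue above can be sketched as follows. This is a hypothetical sketch under the directory layout mentioned in the description, not PyFlink's actual implementation; the function name and error handling are assumptions.

```python
# Hypothetical sketch of dynamic Flink home detection via globbing:
# instead of hardcoding flink-<version>-SNAPSHOT, match whatever single
# flink-*-bin/flink-* layout exists under the dist target directory.
import glob
import os


def find_flink_home(dist_target):
    """Locate the single flink-<version> directory under `dist_target`."""
    pattern = os.path.join(dist_target, "flink-*-bin", "flink-*")
    candidates = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    if len(candidates) != 1:
        # Mirrors the failure mode reported in FLINK-32989 when nothing matches.
        raise RuntimeError(
            "Exactly one Flink home directory must exist, but found: %s" % candidates
        )
    return candidates[0]
```

With this scheme, any PEP 440 version string works, because the build no longer needs to translate the `pyflink` version into a directory name.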
[jira] [Created] (FLINK-32989) PyFlink wheel package build failed
Dian Fu created FLINK-32989: --- Summary: PyFlink wheel package build failed Key: FLINK-32989 URL: https://issues.apache.org/jira/browse/FLINK-32989 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.19.0 Reporter: Dian Fu {code} Compiling pyflink/fn_execution/coder_impl_fast.pyx because it changed. Compiling pyflink/fn_execution/table/aggregate_fast.pyx because it changed. Compiling pyflink/fn_execution/table/window_aggregate_fast.pyx because it changed. Compiling pyflink/fn_execution/stream_fast.pyx because it changed. Compiling pyflink/fn_execution/beam/beam_stream_fast.pyx because it changed. Compiling pyflink/fn_execution/beam/beam_coder_impl_fast.pyx because it changed. Compiling pyflink/fn_execution/beam/beam_operations_fast.pyx because it changed. [1/7] Cythonizing pyflink/fn_execution/beam/beam_coder_impl_fast.pyx [2/7] Cythonizing pyflink/fn_execution/beam/beam_operations_fast.pyx [3/7] Cythonizing pyflink/fn_execution/beam/beam_stream_fast.pyx [4/7] Cythonizing pyflink/fn_execution/coder_impl_fast.pyx [5/7] Cythonizing pyflink/fn_execution/stream_fast.pyx [6/7] Cythonizing pyflink/fn_execution/table/aggregate_fast.pyx [7/7] Cythonizing pyflink/fn_execution/table/window_aggregate_fast.pyx /home/vsts/work/1/s/flink-python/dev/.conda/envs/3.7/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/vsts/work/1/s/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd tree = Parsing.p_module(s, pxd, full_module_name) Exactly one Flink home directory must exist, but found: [] {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740&view=logs&j=d15e2b2e-10cd-5f59-7734-42d57dc5564d&t=4a86776f-e6e1-598a-f75a-c43d8b819662 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759777#comment-17759777 ] Dian Fu commented on FLINK-32758: - [~deepyaman] Actually the tests have passed on the CI: !image-2023-08-29-10-19-37-977.png! It's failing when building the wheel package for MacOS. It uses the third-party tool *cibuildwheel* to build wheel packages, see [https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-python-wheels.yml#L41] for more details. Currently it's using version 2.8.0. I will check whether the latest version (2.15.0) works on my CI.
[jira] [Updated] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-32758: Attachment: image-2023-08-29-10-19-37-977.png
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759774#comment-17759774 ] Dian Fu commented on FLINK-32758: - [~Sergey Nuyanzin] [~deepyaman] I have submitted a hotfix temporarily limiting fastavro to < 1.8 to make the CI green (verified it on my CI): [https://github.com/apache/flink/commit/345dece9a8fd58d6ea1c829052fb2f3c68516b48]
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758790#comment-17758790 ] Dian Fu commented on FLINK-32758: - Will backport to release-1.17 and release-1.18 branches after one official nightly test of the master branch. > PyFlink bounds are overly restrictive and outdated > -- > > Key: FLINK-32758 > URL: https://issues.apache.org/jira/browse/FLINK-32758 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.17.1 >Reporter: Deepyaman Datta >Assignee: Deepyaman Datta >Priority: Major > Labels: pull-request-available > > Hi! I am part of a team building the Flink backend for Ibis > ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink > under the hood for execution; however, PyFlink's requirements are > incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's > outdated and restrictive requirements prevent it from being used alongside > most recent releases of Python data libraries. > Some of the major libraries we (and likely others in the Python community > interested in using PyFlink alongside other libraries) need compatibility > with: > * PyArrow (at least >=10.0.0, but there's no reason not to also be > compatible with latest) > * pandas (should be compatible with 2.x series, but also probably with > 1.4.x, released January 2022, and 1.5.x) > * numpy (1.22 was released in December 2022) > * Newer releases of Apache Beam > * Newer releases of cython > Furthermore, uncapped dependencies could be more generally preferable, as > they avoid the need for frequent PyFlink releases as newer versions of > libraries are released. 
A common (and great) argument for not upper-bounding > dependencies, especially for libraries: > [https://iscinumpy.dev/post/bound-version-constraints/] > I am currently testing removing upper bounds in > [https://github.com/apache/flink/pull/23141]; so far, builds pass without > issue in > [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], > and I'm currently waiting on > [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] > to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed > dependencies results in: > {{#}} > {{# This file is autogenerated by pip-compile with Python 3.8}} > {{# by the following command:}} > {{#}} > {{# pip-compile --config=pyproject.toml > --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}} > {{#}} > {{apache-beam==2.49.0}} > {{ # via -r dev/dev-requirements.txt}} > {{avro-python3==1.10.2}} > {{ # via -r dev/dev-requirements.txt}} > {{certifi==2023.7.22}} > {{ # via requests}} > {{charset-normalizer==3.2.0}} > {{ # via requests}} > {{cloudpickle==2.2.1}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{crcmod==1.7}} > {{ # via apache-beam}} > {{cython==3.0.0}} > {{ # via -r dev/dev-requirements.txt}} > {{dill==0.3.1.1}} > {{ # via apache-beam}} > {{dnspython==2.4.1}} > {{ # via pymongo}} > {{docopt==0.6.2}} > {{ # via hdfs}} > {{exceptiongroup==1.1.2}} > {{ # via pytest}} > {{fastavro==1.8.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{fasteners==0.18}} > {{ # via apache-beam}} > {{find-libpython==0.3.0}} > {{ # via pemja}} > {{grpcio==1.56.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{grpcio-tools==1.56.2}} > {{ # via -r dev/dev-requirements.txt}} > {{hdfs==2.7.0}} > {{ # via apache-beam}} > {{httplib2==0.22.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > 
{{idna==3.4}} > {{ # via requests}} > {{iniconfig==2.0.0}} > {{ # via pytest}} > {{numpy==1.24.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # pandas}} > {{ # pyarrow}} > {{objsize==0.6.1}} > {{ # via apache-beam}} > {{orjson==3.9.2}} > {{ # via apache-beam}} > {{packaging==23.1}} > {{ # via pytest}} > {{pandas==2.0.3}} > {{ # via -r dev/dev-requirements.txt}} > {{pemja==0.3.0 ; platform_system != "Windows"}} > {{ # via -r dev/dev-requirements.txt}} > {{pluggy==1.2.0}} > {{ # via pytest}} > {{proto-plus==1.22.3}} > {{ # via apache-beam}} > {{protobuf==4.23.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{ # proto-plus}} > {{py4j==0.10.9.7}} > {{ # via -r dev/dev-requirements.txt}} > {{pyarrow==11.0.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} >
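The uncapped-bounds argument in the comment above can be made concrete with the `packaging` library (a sketch for illustration only; the capped specifier is invented here, while `>=10.0.0` and PyArrow 11.0.0 come from the thread): a version cap silently excludes the newer releases the resolver above was able to pick.

```python
from packaging.specifiers import SpecifierSet

# Hypothetical capped constraint, in the spirit of the old restrictive pins;
# the uncapped ">=10.0.0" is the minimum the ticket asks for.
capped = SpecifierSet(">=5.0.0,<9.0.0")
uncapped = SpecifierSet(">=10.0.0")

# PyArrow 11.0.0 is what the pip-compile run above resolved to.
print("11.0.0" in capped)    # False: the upper bound blocks the newer release
print("11.0.0" in uncapped)  # True: no cap, so the resolver is free to pick it
```

With uncapped dependencies, a broken new release can still be excluded after the fact with a targeted `!=` pin, rather than capping everything preemptively.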
[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758788#comment-17758788 ] Dian Fu commented on FLINK-32758: - Merged to master via 0dd6f9745b8df005e3aef286ae73092696ca2799 > PyFlink bounds are overly restrictive and outdated > -- > > Key: FLINK-32758 > URL: https://issues.apache.org/jira/browse/FLINK-32758 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.17.1 >Reporter: Deepyaman Datta >Priority: Major > Labels: pull-request-available > > Hi! I am part of a team building the Flink backend for Ibis > ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink > under the hood for execution; however, PyFlink's requirements are > incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's > outdated and restrictive requirements prevent it from being used alongside > most recent releases of Python data libraries. > Some of the major libraries we (and likely others in the Python community > interested in using PyFlink alongside other libraries) need compatibility > with: > * PyArrow (at least >=10.0.0, but there's no reason not to be also be > compatible with latest) > * pandas (should be compatible with 2.x series, but also probably with > 1.4.x, released January 2022, and 1.5.x) > * numpy (1.22 was released in December 2022) > * Newer releases of Apache Beam > * Newer releases of cython > Furthermore, uncapped dependencies could be more generally preferable, as > they avoid the need for frequent PyFlink releases as newer versions of > libraries are released. 
A common (and great) argument for not upper-bounding > dependencies, especially for libraries: > [https://iscinumpy.dev/post/bound-version-constraints/] > I am currently testing removing upper bounds in > [https://github.com/apache/flink/pull/23141]; so far, builds pass without > issue in > [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], > and I'm currently waiting on > [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] > to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed > dependencies results in: > {{#}} > {{# This file is autogenerated by pip-compile with Python 3.8}} > {{# by the following command:}} > {{#}} > {{# pip-compile --config=pyproject.toml > --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}} > {{#}} > {{apache-beam==2.49.0}} > {{ # via -r dev/dev-requirements.txt}} > {{avro-python3==1.10.2}} > {{ # via -r dev/dev-requirements.txt}} > {{certifi==2023.7.22}} > {{ # via requests}} > {{charset-normalizer==3.2.0}} > {{ # via requests}} > {{cloudpickle==2.2.1}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{crcmod==1.7}} > {{ # via apache-beam}} > {{cython==3.0.0}} > {{ # via -r dev/dev-requirements.txt}} > {{dill==0.3.1.1}} > {{ # via apache-beam}} > {{dnspython==2.4.1}} > {{ # via pymongo}} > {{docopt==0.6.2}} > {{ # via hdfs}} > {{exceptiongroup==1.1.2}} > {{ # via pytest}} > {{fastavro==1.8.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{fasteners==0.18}} > {{ # via apache-beam}} > {{find-libpython==0.3.0}} > {{ # via pemja}} > {{grpcio==1.56.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{grpcio-tools==1.56.2}} > {{ # via -r dev/dev-requirements.txt}} > {{hdfs==2.7.0}} > {{ # via apache-beam}} > {{httplib2==0.22.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > 
{{idna==3.4}} > {{ # via requests}} > {{iniconfig==2.0.0}} > {{ # via pytest}} > {{numpy==1.24.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # pandas}} > {{ # pyarrow}} > {{objsize==0.6.1}} > {{ # via apache-beam}} > {{orjson==3.9.2}} > {{ # via apache-beam}} > {{packaging==23.1}} > {{ # via pytest}} > {{pandas==2.0.3}} > {{ # via -r dev/dev-requirements.txt}} > {{pemja==0.3.0 ; platform_system != "Windows"}} > {{ # via -r dev/dev-requirements.txt}} > {{pluggy==1.2.0}} > {{ # via pytest}} > {{proto-plus==1.22.3}} > {{ # via apache-beam}} > {{protobuf==4.23.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{ # proto-plus}} > {{py4j==0.10.9.7}} > {{ # via -r dev/dev-requirements.txt}} > {{pyarrow==11.0.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{pydot==1.4.2}} > {{ # via apache-beam}} > {{pymongo==4.4.1}} > {{ # via apac
[jira] [Assigned] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
[ https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-32758: --- Assignee: Deepyaman Datta > PyFlink bounds are overly restrictive and outdated > -- > > Key: FLINK-32758 > URL: https://issues.apache.org/jira/browse/FLINK-32758 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.17.1 >Reporter: Deepyaman Datta >Assignee: Deepyaman Datta >Priority: Major > Labels: pull-request-available > > Hi! I am part of a team building the Flink backend for Ibis > ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink > under the hood for execution; however, PyFlink's requirements are > incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's > outdated and restrictive requirements prevent it from being used alongside > most recent releases of Python data libraries. > Some of the major libraries we (and likely others in the Python community > interested in using PyFlink alongside other libraries) need compatibility > with: > * PyArrow (at least >=10.0.0, but there's no reason not to be also be > compatible with latest) > * pandas (should be compatible with 2.x series, but also probably with > 1.4.x, released January 2022, and 1.5.x) > * numpy (1.22 was released in December 2022) > * Newer releases of Apache Beam > * Newer releases of cython > Furthermore, uncapped dependencies could be more generally preferable, as > they avoid the need for frequent PyFlink releases as newer versions of > libraries are released. 
A common (and great) argument for not upper-bounding > dependencies, especially for libraries: > [https://iscinumpy.dev/post/bound-version-constraints/] > I am currently testing removing upper bounds in > [https://github.com/apache/flink/pull/23141]; so far, builds pass without > issue in > [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], > and I'm currently waiting on > [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] > to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed > dependencies results in: > {{#}} > {{# This file is autogenerated by pip-compile with Python 3.8}} > {{# by the following command:}} > {{#}} > {{# pip-compile --config=pyproject.toml > --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}} > {{#}} > {{apache-beam==2.49.0}} > {{ # via -r dev/dev-requirements.txt}} > {{avro-python3==1.10.2}} > {{ # via -r dev/dev-requirements.txt}} > {{certifi==2023.7.22}} > {{ # via requests}} > {{charset-normalizer==3.2.0}} > {{ # via requests}} > {{cloudpickle==2.2.1}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{crcmod==1.7}} > {{ # via apache-beam}} > {{cython==3.0.0}} > {{ # via -r dev/dev-requirements.txt}} > {{dill==0.3.1.1}} > {{ # via apache-beam}} > {{dnspython==2.4.1}} > {{ # via pymongo}} > {{docopt==0.6.2}} > {{ # via hdfs}} > {{exceptiongroup==1.1.2}} > {{ # via pytest}} > {{fastavro==1.8.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{fasteners==0.18}} > {{ # via apache-beam}} > {{find-libpython==0.3.0}} > {{ # via pemja}} > {{grpcio==1.56.2}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{grpcio-tools==1.56.2}} > {{ # via -r dev/dev-requirements.txt}} > {{hdfs==2.7.0}} > {{ # via apache-beam}} > {{httplib2==0.22.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > 
{{idna==3.4}} > {{ # via requests}} > {{iniconfig==2.0.0}} > {{ # via pytest}} > {{numpy==1.24.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # pandas}} > {{ # pyarrow}} > {{objsize==0.6.1}} > {{ # via apache-beam}} > {{orjson==3.9.2}} > {{ # via apache-beam}} > {{packaging==23.1}} > {{ # via pytest}} > {{pandas==2.0.3}} > {{ # via -r dev/dev-requirements.txt}} > {{pemja==0.3.0 ; platform_system != "Windows"}} > {{ # via -r dev/dev-requirements.txt}} > {{pluggy==1.2.0}} > {{ # via pytest}} > {{proto-plus==1.22.3}} > {{ # via apache-beam}} > {{protobuf==4.23.4}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{ # grpcio-tools}} > {{ # proto-plus}} > {{py4j==0.10.9.7}} > {{ # via -r dev/dev-requirements.txt}} > {{pyarrow==11.0.0}} > {{ # via}} > {{ # -r dev/dev-requirements.txt}} > {{ # apache-beam}} > {{pydot==1.4.2}} > {{ # via apache-beam}} > {{pymongo==4.4.1}} > {{ # via apache-beam}} > {{pyparsing==3.1.1}} > {{ # v
[jira] [Created] (FLINK-32724) Mark CEP API classes as Public / PublicEvolving
Dian Fu created FLINK-32724: --- Summary: Mark CEP API classes as Public / PublicEvolving Key: FLINK-32724 URL: https://issues.apache.org/jira/browse/FLINK-32724 Project: Flink Issue Type: Improvement Components: Library / CEP Reporter: Dian Fu Currently most CEP API classes, e.g. Pattern, PatternSelectFunction, etc., are not annotated as Public / PublicEvolving. We should improve this to make it clear which classes are public. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32544) PythonFunctionFactoryTest fails on Java 17
[ https://issues.apache.org/jira/browse/FLINK-32544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-32544: --- Assignee: Dian Fu > PythonFunctionFactoryTest fails on Java 17 > -- > > Key: FLINK-32544 > URL: https://issues.apache.org/jira/browse/FLINK-32544 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Legacy Components / Flink on Tez >Affects Versions: 1.18.0 >Reporter: Chesnay Schepler >Assignee: Dian Fu >Priority: Major > > https://dev.azure.com/chesnay/flink/_build/results?buildId=3676&view=logs&j=fba17979-6d2e-591d-72f1-97cf42797c11&t=727942b6-6137-54f7-1ef9-e66e706ea068 > {code} > Jul 05 10:17:23 Exception in thread "main" > java.lang.reflect.InaccessibleObjectException: Unable to make field private > static java.util.IdentityHashMap java.lang.ApplicationShutdownHooks.hooks > accessible: module java.base does not "opens java.lang" to unnamed module > @1880a322 > Jul 05 10:17:23 at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) > Jul 05 10:17:23 at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) > Jul 05 10:17:23 at > java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178) > Jul 05 10:17:23 at > java.base/java.lang.reflect.Field.setAccessible(Field.java:172) > Jul 05 10:17:23 at > org.apache.flink.client.python.PythonFunctionFactoryTest.closeStartedPythonProcess(PythonFunctionFactoryTest.java:115) > Jul 05 10:17:23 at > org.apache.flink.client.python.PythonFunctionFactoryTest.cleanEnvironment(PythonFunctionFactoryTest.java:79) > Jul 05 10:17:23 at > org.apache.flink.client.python.PythonFunctionFactoryTest.main(PythonFunctionFactoryTest.java:52) > {code} > Side-notes: > * maybe re-evaluate if the test could be run through maven now > * The shutdown hooks business is quite sketchy, and AFAICT would be > unnecessary if the test were an ITCase -- This message was sent by Atlassian Jira 
(v8.20.10#820010)
[jira] [Updated] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception
[ https://issues.apache.org/jira/browse/FLINK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-32536: Fix Version/s: 1.18.0 > Python tests fail with Arrow DirectBuffer exception > --- > > Key: FLINK-32536 > URL: https://issues.apache.org/jira/browse/FLINK-32536 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Tests >Affects Versions: 1.18.0 >Reporter: Chesnay Schepler >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > https://dev.azure.com/chesnay/flink/_build/results?buildId=3674&view=logs&j=fba17979-6d2e-591d-72f1-97cf42797c11&t=727942b6-6137-54f7-1ef9-e66e706ea068 > {code} > 2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame. > 2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E : > java.lang.RuntimeException: Arrow depends on DirectByteBuffer.<init>(long, > int) which is not available. Please set the system property > 'io.netty.tryReflectionSetAccessible' to 'true'. 
> 2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 E at > org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184) > 2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 E at > org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546) > 2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 E at > jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source) > 2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 E at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 E at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > 2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > 2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) > 2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > 2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) > 2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) > 2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 E at > java.base/java.lang.Thread.run(Thread.java:833) > {code} > {code} > 2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas > 2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas > 
2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas > 2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table > 2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas > 2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas > 2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas > 2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table > 2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time > 2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function > 2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_udaf.py::Batch
[jira] [Closed] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception
[ https://issues.apache.org/jira/browse/FLINK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32536. --- Resolution: Fixed Fixed in master via 7f34df40c0de8e7bffd149196851f89012faf842 > Python tests fail with Arrow DirectBuffer exception > --- > > Key: FLINK-32536 > URL: https://issues.apache.org/jira/browse/FLINK-32536 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Tests >Affects Versions: 1.18.0 >Reporter: Chesnay Schepler >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > > https://dev.azure.com/chesnay/flink/_build/results?buildId=3674&view=logs&j=fba17979-6d2e-591d-72f1-97cf42797c11&t=727942b6-6137-54f7-1ef9-e66e706ea068 > {code} > 2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame. > 2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E : > java.lang.RuntimeException: Arrow depends on DirectByteBuffer.<init>(long, > int) which is not available. Please set the system property > 'io.netty.tryReflectionSetAccessible' to 'true'. 
> 2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 E at > org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184) > 2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 E at > org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546) > 2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 E at > jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source) > 2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 E at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 E at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > 2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > 2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) > 2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > 2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) > 2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 E at > org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) > 2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 E at > java.base/java.lang.Thread.run(Thread.java:833) > {code} > {code} > 2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas > 2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas > 
2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas > 2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table > 2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas > 2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas > 2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas > 2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table > 2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time > 2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED > pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function > 2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED > pyflink/table/tests/te
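The exception above tells the user to set the JVM system property `io.netty.tryReflectionSetAccessible=true`. A minimal sketch of how one might splice that flag into an existing JVM options string (for example, a value destined for Flink's `env.java.opts` setting); the helper name is hypothetical and not part of Flink:

```python
# Hypothetical helper: append the Netty reflection flag the error message
# asks for to a JVM options string, without duplicating it if present.
def with_netty_reflection_flag(java_opts: str) -> str:
    flag = "-Dio.netty.tryReflectionSetAccessible=true"
    if flag in java_opts.split():
        return java_opts  # already present, leave the options untouched
    return f"{java_opts} {flag}".strip()

print(with_netty_reflection_flag("--add-opens=java.base/java.lang=ALL-UNNAMED"))
```

The `split()` check avoids a naive substring match, and calling the helper twice is a no-op.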
[jira] [Closed] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32327. --- Fix Version/s: 1.18.0 Resolution: Fixed Fixed in master via 940cb746b91d5022f52c49d4e8e09e15ffea4709 > Python Kafka connector runs into strange NullPointerException > - > > Key: FLINK-32327 > URL: https://issues.apache.org/jira/browse/FLINK-32327 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Chesnay Schepler >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > The following error occurs when running the python kafka tests: > (this uses a slightly modified version of the code, but the error also > happens without it) > {code:python} > def set_record_serializer(self, record_serializer: > 'KafkaRecordSerializationSchema') \ > -> 'KafkaSinkBuilder': > """ > Sets the :class:`KafkaRecordSerializationSchema` that transforms > incoming records to kafka > producer records. > > :param record_serializer: The > :class:`KafkaRecordSerializationSchema`. > """ > # NOTE: If topic selector is a generated first-column selector, do > extra preprocessing > j_topic_selector = > get_field_value(record_serializer._j_serialization_schema, > 'topicSelector') > > caching_name_suffix = > 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector' > if > j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix): > class_name = get_field_value(j_topic_selector, 'topicSelector')\ > .getClass().getCanonicalName() > > if class_name.startswith('com.sun.proxy') or > class_name.startswith('jdk.proxy'): > E AttributeError: 'NoneType' object has no attribute 'startswith' > {code} > My assumption is that {{getCanonicalName}} returns {{null}} for some objects, > and this set of objects may have increased in Java 17. I tried adding a null > check, but that caused other tests to fail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
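The failure quoted above happens because Java's `getCanonicalName()` returns null (surfacing as `None` through py4j) for proxy and anonymous classes, so the chained `.startswith(...)` call blows up. A defensive sketch of the check, with an illustrative function name (this is not claimed to be the actual Flink fix):

```python
# Guard against None before string methods: getCanonicalName() can be null
# for JDK proxy classes, which py4j converts to Python's None.
def is_jdk_proxy(class_name):
    if class_name is None:
        return False
    return class_name.startswith("com.sun.proxy") or class_name.startswith("jdk.proxy")

print(is_jdk_proxy("jdk.proxy2.$Proxy42"))  # True
print(is_jdk_proxy(None))                   # False, instead of AttributeError
```

Whether `None` should count as a proxy (as here) or be handled some other way is exactly the kind of decision the comment notes caused other tests to fail, so the policy belongs at the call site, not buried in the check.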
[jira] [Assigned] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-32327: --- Assignee: Dian Fu > Python Kafka connector runs into strange NullPointerException > - > > Key: FLINK-32327 > URL: https://issues.apache.org/jira/browse/FLINK-32327 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Chesnay Schepler >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > > The following error occurs when running the python kafka tests: > (this uses a slightly modified version of the code, but the error also > happens without it) > {code:python} > def set_record_serializer(self, record_serializer: > 'KafkaRecordSerializationSchema') \ > -> 'KafkaSinkBuilder': > """ > Sets the :class:`KafkaRecordSerializationSchema` that transforms > incoming records to kafka > producer records. > > :param record_serializer: The > :class:`KafkaRecordSerializationSchema`. > """ > # NOTE: If topic selector is a generated first-column selector, do > extra preprocessing > j_topic_selector = > get_field_value(record_serializer._j_serialization_schema, > 'topicSelector') > > caching_name_suffix = > 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector' > if > j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix): > class_name = get_field_value(j_topic_selector, 'topicSelector')\ > .getClass().getCanonicalName() > > if class_name.startswith('com.sun.proxy') or > class_name.startswith('jdk.proxy'): > E AttributeError: 'NoneType' object has no attribute 'startswith' > {code} > My assumption is that {{getCanonicalName}} returns {{null}} for some objects, > and this set of objects may have increased in Java 17. I tried adding a null > check, but that caused other tests to fail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738519#comment-17738519 ] Dian Fu commented on FLINK-32327: - Thanks (y). I will find some time to investigate this issue next week. > Python Kafka connector runs into strange NullPointerException > - > > Key: FLINK-32327 > URL: https://issues.apache.org/jira/browse/FLINK-32327 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Chesnay Schepler >Priority: Major > > The following error occurs when running the python kafka tests: > (this uses a slightly modified version of the code, but the error also > happens without it) > {code:python} > def set_record_serializer(self, record_serializer: > 'KafkaRecordSerializationSchema') \ > -> 'KafkaSinkBuilder': > """ > Sets the :class:`KafkaRecordSerializationSchema` that transforms > incoming records to kafka > producer records. > > :param record_serializer: The > :class:`KafkaRecordSerializationSchema`. > """ > # NOTE: If topic selector is a generated first-column selector, do > extra preprocessing > j_topic_selector = > get_field_value(record_serializer._j_serialization_schema, > 'topicSelector') > > caching_name_suffix = > 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector' > if > j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix): > class_name = get_field_value(j_topic_selector, 'topicSelector')\ > .getClass().getCanonicalName() > > if class_name.startswith('com.sun.proxy') or > class_name.startswith('jdk.proxy'): > E AttributeError: 'NoneType' object has no attribute 'startswith' > {code} > My assumption is that {{getCanonicalName}} returns {{null}} for some objects, > and this set of objects may have increased in Java 17. I tried adding a null > check, but that caused other tests to fail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733729#comment-17733729 ] Dian Fu commented on FLINK-32327: - PS: I can help to dig into this problem if we know which ones are failing. > Python Kafka connector runs into strange NullPointerException > - > > Key: FLINK-32327 > URL: https://issues.apache.org/jira/browse/FLINK-32327 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Chesnay Schepler >Priority: Major > > The following error occurs when running the python kafka tests: > (this uses a slightly modified version of the code, but the error also > happens without it) > {code:python} > def set_record_serializer(self, record_serializer: > 'KafkaRecordSerializationSchema') \ > -> 'KafkaSinkBuilder': > """ > Sets the :class:`KafkaRecordSerializationSchema` that transforms > incoming records to kafka > producer records. > > :param record_serializer: The > :class:`KafkaRecordSerializationSchema`. > """ > # NOTE: If topic selector is a generated first-column selector, do > extra preprocessing > j_topic_selector = > get_field_value(record_serializer._j_serialization_schema, > 'topicSelector') > > caching_name_suffix = > 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector' > if > j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix): > class_name = get_field_value(j_topic_selector, 'topicSelector')\ > .getClass().getCanonicalName() > > if class_name.startswith('com.sun.proxy') or > class_name.startswith('jdk.proxy'): > E AttributeError: 'NoneType' object has no attribute 'startswith' > {code} > My assumption is that {{getCanonicalName}} returns {{null}} for some objects, > and this set of objects may have increased in Java 17. I tried adding a null > check, but that caused other tests to fail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-32327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733728#comment-17733728 ] Dian Fu commented on FLINK-32327: - [~chesnay] It seems that you have disabled the entire Python tests for Java 17. Have you seen other failing tests on Java 17 for Python? > Python Kafka connector runs into strange NullPointerException > - -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib
[ https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32136. --- Fix Version/s: 1.18.0 1.16.3 1.17.2 Assignee: Dian Fu Resolution: Fixed Fixed in: - master via f64563bc1a7ff698acd708b61e9e80ae9c3e848f - release-1.17 via 1b3f25432a29005370b0f51aaa7d4ee79a5edd58 - release-1.16 via f9394025fb756c844ec5f2615971f227d40b9244 > Pyflink gateway server launch fails when purelib != platlib > --- > > Key: FLINK-32136 > URL: https://issues.apache.org/jira/browse/FLINK-32136 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.13.3 >Reporter: William Ashley >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.16.3, 1.17.2 > > > On distros where python's {{purelib}} is different than {{platlib}} (e.g. > Amazon Linux 2, but from my research it's all of the Redhat-based ones), you > wind up with components of packages being installed across two different > locations (e.g. {{/usr/local/lib/python3.7/site-packages/pyflink}} and > {{{}/usr/local/lib64/python3.7/site-packages/pyflink{}}}). > {{_find_flink_home}} > [handles|https://github.com/apache/flink/blob/06688f345f6793a8964ec2175f44cda13c33/flink-python/pyflink/find_flink_home.py#L58C63-L60] > this, and in flink releases <= 1.13.2 its setting of the {{FLINK_LIB_DIR}} > environment variable was the one being used. However, from 1.13.3, a > refactoring of {{launch_gateway_server_process}} > ([1.13.2,|https://github.com/apache/flink/blob/release-1.13.2/flink-python/pyflink/pyflink_gateway_server.py#L200] > > [1.13.3|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L280]) > re-ordered some method calls. 
{{{}prepare_environment_variable{}}}'s > [non-awareness|https://github.com/apache/flink/blob/release-1.13.3/flink-python/pyflink/pyflink_gateway_server.py#L94C67-L95] > of multiple homes and setting of {{FLINK_LIB_DIR}} now is the one that > matters, and it is the incorrect location. > I've confirmed this problem on Amazon Linux 2 and 2023. The problem does not > exist on, for example, Ubuntu 20 and 22 (for which {{platlib}} == > {{{}purelib{}}}). > Repro steps on Amazon Linux 2 > {quote}{{yum -y install python3 java-11}} > {{pip3 install apache-flink==1.13.3}} > {{python3 -c 'from pyflink.table import EnvironmentSettings ; > EnvironmentSettings.new_instance()'}} > {quote} > The resulting error is > {quote}{{The flink-python jar is not found in the opt folder of the > FLINK_HOME: /usr/local/lib64/python3.7/site-packages/pyflink}} > {{Error: Could not find or load main class > org.apache.flink.client.python.PythonGatewayServer}} > {{Caused by: java.lang.ClassNotFoundException: > org.apache.flink.client.python.PythonGatewayServer}} > {{Traceback (most recent call last):}} > {{ File "", line 1, in }} > {{ File > "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py", > line 214, in new_instance}} > {{ return EnvironmentSettings.Builder()}} > {{ File > "/usr/local/lib64/python3.7/site-packages/pyflink/table/environment_settings.py", > line 48, in {_}{{_}}init{{_}}{_}}} > {{ gateway = get_gateway()}} > {{ File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", > line 62, in get_gateway}} > {{ _gateway = launch_gateway()}} > {{ File "/usr/local/lib64/python3.7/site-packages/pyflink/java_gateway.py", > line 112, in launch_gateway}} > {{ raise Exception("Java gateway process exited before sending its port > number")}} > {{Exception: Java gateway process exited before sending its port number}} > {quote} > The flink home under /lib64/ does not contain the jar, but it is in the /lib/ > location > {quote}{{bash-4.2# find 
/usr/local/lib64/python3.7/site-packages/pyflink > -name "flink-python*.jar"}} > {{bash-4.2# find /usr/local/lib/python3.7/site-packages/pyflink -name > "flink-python*.jar"}} > {{/usr/local/lib/python3.7/site-packages/pyflink/opt/flink-python_2.11-1.13.3.jar}} > {quote} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
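The purelib/platlib split at the heart of this bug can be inspected directly with the standard-library sysconfig module: on Red Hat-style distros the two paths differ (lib vs lib64), while on Debian-style distros they are identical. A quick check, standard library only:

```python
import sysconfig

# purelib: where pure-Python packages are installed; platlib: where
# packages with compiled extensions go. A package like pyflink that ships
# both pure-Python modules and jars/extensions can end up split across
# the two directories when they differ (e.g. /usr/local/lib vs lib64).
paths = sysconfig.get_paths()
purelib, platlib = paths['purelib'], paths['platlib']
print('purelib:', purelib)
print('platlib:', platlib)
print('split install possible:', purelib != platlib)
```

A find_flink_home-style resolver that only honors one of these two locations will pick the wrong home on exactly the distros where they diverge, which matches the Amazon Linux vs Ubuntu observations above.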
[jira] [Commented] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib
[ https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733327#comment-17733327 ] Dian Fu commented on FLINK-32136: - [~wash] I have submitted a PR. Could you help to review? It would be great if you could also help to verify it~ > Pyflink gateway server launch fails when purelib != platlib > --- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib
[ https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733327#comment-17733327 ] Dian Fu edited comment on FLINK-32136 at 6/16/23 5:25 AM: -- [~wash] I have submitted a PR (https://github.com/apache/flink/pull/22802). Could you help to review? It would be great if you could also help to verify it~ was (Author: dianfu): [~wash] I have submitted a PR. Could you help to review? It would be great if you could also help to verify it~ > Pyflink gateway server launch fails when purelib != platlib > --- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32034) Python's DistUtils is deprecated as of 3.10
[ https://issues.apache.org/jira/browse/FLINK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32034. --- Fix Version/s: 1.18.0 1.17.2 Assignee: Colten Pilgreen Resolution: Fixed Fixed in: - master via 6ee1912e949cf290e5e1e620b1f4bae6552b428b - release-1.17 via dd1dde377f775c43fcd95eed1bae91bf5fcfee2e > Python's DistUtils is deprecated as of 3.10 > --- > > Key: FLINK-32034 > URL: https://issues.apache.org/jira/browse/FLINK-32034 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 > Environment: Kubernetes > Java 11 > Python 3.10.9 >Reporter: Colten Pilgreen >Assignee: Colten Pilgreen >Priority: Minor > Labels: pull-request-available > Fix For: 1.18.0, 1.17.2 > > Attachments: get_site_packages_path_script.py, > get_site_packages_path_script_shortened.py > > > I recently went through an upgrade from 1.13 to 1.17, and along with it I > upgraded the Python version on our Flink session server. Almost everything > in our workflow works, except Python dependency management. > After some digging, I found the cause is the DeprecationWarning > that is printed when fetching the site-packages path. The script is > GET_SITE_PACKAGES_PATH_SCRIPT and it is executed in the getSitePackagesPath > method of the PythonEnvironmentManagerUtils class. The issue is that the > DeprecationWarning ends up in the PYTHONPATH environment variable which > is passed to the Beam runner. The warning breaks Python's ability > to find the site packages because it contains characters that are not allowed in > filesystem paths. > > Example of the PYTHONPATH environment variable: > PYTHONPATH == :1: DeprecationWarning: The distutils package is > deprecated and slated for removal in Python 3.12. 
Use setuptools or check PEP > 632 for potential > alternatives:/tmp/python-dist-c63e1464-925c-4289-bb71-c6f50e83186f/python-requirements/lib/python3.10/site-packages > HADOOP_CONF_DIR == /opt/flink/conf -- This message was sent by Atlassian Jira (v8.20.10#820010)
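The breakage mechanism is that the helper script's entire output, warning text included, gets spliced into PYTHONPATH. One way to obtain a clean single-line result is to query sysconfig (which replaced distutils.sysconfig and emits no DeprecationWarning on Python 3.10+) in a fresh interpreter; this sketch illustrates the direction of a fix, not Flink's actual patch:

```python
import subprocess
import sys

# Ask a fresh interpreter for its site-packages path. sysconfig avoids
# the distutils DeprecationWarning on Python 3.10+, so stdout contains
# exactly one line -- safe to splice into PYTHONPATH.
out = subprocess.check_output(
    [sys.executable, '-c',
     'import sysconfig; print(sysconfig.get_paths()["purelib"])'],
    stderr=subprocess.DEVNULL,  # additionally drop any warning text on stderr
    text=True,
).strip()
print(out)
```

The key property is that the captured value is a bare path with no embedded newlines or warning prose, so the Beam runner can resolve it as a filesystem location.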
[jira] [Commented] (FLINK-32136) Pyflink gateway server launch fails when purelib != platlib
[ https://issues.apache.org/jira/browse/FLINK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733017#comment-17733017 ] Dian Fu commented on FLINK-32136: - [~wash] Good catch! This seems like a critical problem. Would you like to open a PR to fix this issue? > Pyflink gateway server launch fails when purelib != platlib > --- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error
[ https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-32040: Affects Version/s: 1.17.0 1.16.0 > The WatermarkStrategy defined with the Function(with_idleness) report an error > -- > > Key: FLINK-32040 > URL: https://issues.apache.org/jira/browse/FLINK-32040 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.16.0, 1.17.0 >Reporter: Joekwal >Priority: Major > > *Version:* upgraded PyFlink 1.15.2 to PyFlink 1.16.1 > > *Error reported:* > Record has Java Long.MIN_VALUE timestamp (= no timestamp marker). Is the time > characteristic set to 'ProcessingTime', or did you forget to call > 'data_stream.assign_timestamps_and_watermarks(...)'? > The same application never reported this error under 1.15.2. > > *Example:* > {code:python} > class MyTimestampAssigner(TimestampAssigner): >    def extract_timestamp(self, value, record_timestamp: int) -> int: >        return value['version'] > sql=""" > select columns,version(milliseconds) from kafka_source > """ > table = st_env.sql_query(sql) > stream = st_env.to_changelog_stream(table) > stream = stream.assign_timestamps_and_watermarks( > WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1)) > > .with_timestamp_assigner(MyTimestampAssigner()).with_idleness(Duration.of_seconds(10))) > stream = stream.key_by(CommonKeySelector()) \ > .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8))) \ > .process(WindowFunction(), typeInfo){code} > > Debugging into > ??pyflink.datastream.data_stream.DataStream.assign_timestamps_and_watermarks?? > shows that ??watermark_strategy._timestamp_assigner?? is None. > *Workaround:* > Remove the ??with_idleness(Duration.of_seconds(10))?? call: > {code:python} > stream = stream.assign_timestamps_and_watermarks( > WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_minutes(1)) > .with_timestamp_assigner(MyTimestampAssigner())) {code} > Is this a bug? -- This message was sent by Atlassian Jira (v8.20.10#820010)
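The debugging observation above (_timestamp_assigner becomes None after with_idleness) is consistent with a common fluent-API bug: a chaining method wraps the new Java-side strategy in a fresh Python object but forgets to copy Python-side state. A stripped-down illustration follows; the class and method names mimic PyFlink's API, but the bodies are illustrative toys, not the real implementation:

```python
class WatermarkStrategy:
    """Toy stand-in for PyFlink's WatermarkStrategy wrapper."""

    def __init__(self, j_strategy):
        self._j_watermark_strategy = j_strategy
        self._timestamp_assigner = None  # Python-side state, invisible to Java

    def with_timestamp_assigner(self, assigner):
        self._timestamp_assigner = assigner
        return self

    def with_idleness_buggy(self, seconds):
        # BUG pattern: returns a brand-new wrapper whose _timestamp_assigner
        # defaults to None -- the assigner set earlier in the chain is lost.
        return WatermarkStrategy(('idle', self._j_watermark_strategy, seconds))

    def with_idleness_fixed(self, seconds):
        new = WatermarkStrategy(('idle', self._j_watermark_strategy, seconds))
        new._timestamp_assigner = self._timestamp_assigner  # carry state over
        return new


strategy = WatermarkStrategy('bounded').with_timestamp_assigner('my_assigner')
print(strategy.with_idleness_buggy(10)._timestamp_assigner)   # lost
print(strategy.with_idleness_fixed(10)._timestamp_assigner)   # preserved
```

With the assigner silently dropped, records keep Java's Long.MIN_VALUE timestamp, which is exactly the "no timestamp marker" error the reporter sees.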
[jira] [Commented] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error
[ https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732524#comment-17732524 ] Dian Fu commented on FLINK-32040: - Good catch! I think you are right that this is a bug. cc [~Juntao Hu] > The WatermarkStrategy defined with the Function(with_idleness) report an error > -- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32040) The WatermarkStrategy defined with the Function(with_idleness) report an error
[ https://issues.apache.org/jira/browse/FLINK-32040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-32040: Priority: Major (was: Blocker) > The WatermarkStrategy defined with the Function(with_idleness) report an error > -- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32034) Python's DistUtils is deprecated as of 3.10
[ https://issues.apache.org/jira/browse/FLINK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732518#comment-17732518 ] Dian Fu commented on FLINK-32034: - [~coltenp] Thanks for reporting this issue and the finding. Would you like to create a PR for this issue? > Python's DistUtils is deprecated as of 3.10 > --- -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32207) Error import Pyflink.table.descriptors due to python3.10 version mismatch
[ https://issues.apache.org/jira/browse/FLINK-32207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32207. --- Resolution: Duplicate > Error import Pyflink.table.descriptors due to python3.10 version mismatch > - > > Key: FLINK-32207 > URL: https://issues.apache.org/jira/browse/FLINK-32207 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 > Environment: Colab > Python3.10 >Reporter: Alireza Omidvar >Priority: Major > Fix For: 1.17.1 > > Attachments: image (1).png, image (2).png > > > Gentlemen, > > I have a problem with some apache-flink modules. I am running apache-flink > 1.17.0 and writing test code in Colab, where I hit a problem importing modules: > > from pyflink.table import DataTypes > > from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime > > from pyflink.table.catalog import FileSystem > > These do not work for me (Python 3.10). > > Any help is highly appreciated; the strange part is that other modules import > fine. I checked your GitHub but didn't find these classes there either, which > means the modules are not in descriptors.py. I thought it needed > installation of connectors, but that failed too. > > Please see the link below: > > [https://github.com/aomidvar/scrapper-price-comparison/blob/d8a10f74101bf96974e769813c33b83d7a71f02b/kafkaconsumer1.ipynb] > > I am running a test after producing the stream > ([https://github.com/aomidvar/scrapper-price-comparison/blob/main/kafkaproducer1.ipynb]) > to a Confluent server, and I would like to run a Flink job, but the above-mentioned > modules are not found in Colab. > > This is probably not a bug. The only version of apache-flink currently working on Colab > is 1.17.0. I prefer 3.10 but installed a virtual Python 3.8 env and, comparing > versions, found that the Kafka and Json modules are not in > descriptors.py of the Apache-flink 1.17 version. 
But modules exist in > Apache-flink 1.13 version. > [https://colab.research.google.com/drive/1aHKv8WA6RA10zTdwdzUubB5K0anEmOws?usp=sharing] > [https://colab.research.google.com/drive/1eCHJlsb8AjdmJtPc95X3H4btmFVSoCL4?usp=sharing] > > I've got this error for Json, Kafka ... > --- > > ImportError Traceback (most recent call last) > in () 1 from pyflink.table import DataTypes > 2 from > pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 3 from > pyflink.table.catalog import FileSystem ImportError: cannot import name > 'Kafka' from 'pyflink.table.descriptors' > (/usr/local/lib/python3.10/dist-packages/pyflink/table/descriptors.py) > > --- > > NOTE: If your import is failing due to a missing package, you can manually > install dependencies using either !pip or !apt. To view examples of > installing some common dependencies, click the "Open Examples" button below. > > --- > > I have doubt that if current error is related to a version and dependencies > then > > I have to ask the developer if I do this python 3.8 env is that possible to > get solved? > > > Thanks for your time , > -- This message was sent by Atlassian Jira (v8.20.10#820010)
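A portable way to confirm whether a given class is still shipped in an installed PyFlink release, without a hard ImportError at the top of a notebook, is a small runtime probe. This is a generic sketch; the module and attribute names in the comment are just examples drawn from the report:

```python
import importlib


def has_attr(module_name, attr_name):
    """Return True only if module_name imports cleanly and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr_name)


# e.g. has_attr('pyflink.table.descriptors', 'Kafka') -- per the report,
# the Kafka/Json descriptor classes present in 1.13 are gone in 1.17.
```

Probing like this makes version mismatches show up as a clear boolean check instead of an ImportError buried mid-notebook.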
[jira] [Commented] (FLINK-32207) Error import Pyflink.table.descriptors due to python3.10 version mismatch
[ https://issues.apache.org/jira/browse/FLINK-32207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732515#comment-17732515 ] Dian Fu commented on FLINK-32207: - This seems to be a duplicate of FLINK-32206. Closing this ticket~. Feel free to reopen it if I misunderstood the problem. > Error import Pyflink.table.descriptors due to python3.10 version mismatch > - > > Key: FLINK-32207 > URL: https://issues.apache.org/jira/browse/FLINK-32207 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 > Environment: Colab > Python3.10 >Reporter: Alireza Omidvar >Priority: Major > Fix For: 1.17.1 > > Attachments: image (1).png, image (2).png > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32206) ModuleNotFoundError for Pyflink.table.descriptors wheel version mismatch
[ https://issues.apache.org/jira/browse/FLINK-32206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732513#comment-17732513 ] Dian Fu commented on FLINK-32206: - 'Kafka' has already been removed from 'pyflink.table.descriptors'. I guess you are referring to an outdated example. > ModuleNotFoundError for Pyflink.table.descriptors wheel version mismatch > > > Key: FLINK-32206 > URL: https://issues.apache.org/jira/browse/FLINK-32206 > Project: Flink > Issue Type: Bug > Components: API / Python, Connectors / Kafka, Connectors / MongoDB >Affects Versions: 1.17.0 >Reporter: Alireza Omidvar >Priority: Major > Attachments: image (1).png, image (2).png > > > Gentlemen, > I have problem with some apache-flink modules. I am running a 1.17.0 apache- > flink and I write test codes in Colab I faced a problem on importing Kafka, > Json and FileSystem modules > > from pyflink.table.descriptors import Schema, Kafka, Json, Rowtime > from pyflink.table.catalog import FileSystem > > not working for me (python version 3.10) > > Any help is highly appreciated the strange is that other modules importing > fine. I checked with your Github but didn't find these on official version > too which means modules are not inside the descriptor.py in newer version. > > Please see the link below: > > [https://github.com/aomidvar/scrapper-price-comparison/blob/d8a10f74101bf96974e769813c33b83d7a71f02b/kafkaconsumer1.ipynb] > > > I am running a test after producing the stream > ([https://github.com/aomidvar/scrapper-price-comparison/blob/main/kafkaproducer1.ipynb]) > to Confluent server and I like to do a flink job but the above mentioned > modules are not found with the following links in collab: > > That is probably an easy fix bug. Only version of apache-flink now working on > colab is 1.17.0. 
I prefer 3.10 but installed a virtual python 3.8 env and > between different modules found out that Kafka and Json modules are not in > descriptors.py of version 1.17 Apache-flink default. But modules exist in > Apache-flink 1.13 version. > [https://colab.research.google.com/drive/1aHKv8WA6RA10zTdwdzUubB5K0anEmOws?usp=sharing] > [https://colab.research.google.com/drive/1eCHJlsb8AjdmJtPc95X3H4btmFVSoCL4?usp=sharing] > > I've got this error for Json, Kafka ... > --- > > ImportError Traceback (most recent call last) > in () 1 from pyflink.table import DataTypes > 2 from > pyflink.table.descriptors import Schema, Kafka, Json, Rowtime 3 from > pyflink.table.catalog import FileSystem ImportError: cannot import name > 'Kafka' from 'pyflink.table.descriptors' > (/usr/local/lib/python3.10/dist-packages/pyflink/table/descriptors.py) > > --- > > NOTE: If your import is failing due to a missing package, you can manually > install dependencies using either !pip or !apt. To view examples of > installing some common dependencies, click the "Open Examples" button below. > > --- > > I have doubt that if current error is related to a version and dependencies > then > > I have to ask the developer if I do this python 3.8 env is that possible to > get solved? > > > Thanks for your time , > -- This message was sent by Atlassian Jira (v8.20.10#820010)
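For context on the comment above: the descriptor classes (`Schema`, `Kafka`, `Json`, `Rowtime`) were removed from `pyflink.table.descriptors`, and the current way to declare such a source is SQL DDL. A minimal sketch follows; the topic, server address, and column names are placeholders, not anything from the reported notebooks.

```python
# Sketch only: the descriptor-based API (Schema, Kafka, Json, Rowtime) is gone,
# so a Kafka-backed table is declared with SQL DDL instead. All names below are
# illustrative placeholders.
ddl = """
CREATE TABLE orders (
    user_name STRING,
    order_time TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json'
)
"""

# With PyFlink (and the Kafka connector jar) installed, the table would be
# registered like this:
# from pyflink.table import TableEnvironment, EnvironmentSettings
# t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# t_env.execute_sql(ddl)
```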
[jira] [Resolved] (FLINK-31916) Python API only respects deprecated env.java.opts key
[ https://issues.apache.org/jira/browse/FLINK-31916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu resolved FLINK-31916. - Resolution: Fixed Merged to master via 626d70d86d3bee944edf548606b38042f31e09b6 > Python API only respects deprecated env.java.opts key > - > > Key: FLINK-31916 > URL: https://issues.apache.org/jira/browse/FLINK-31916 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Runtime / Configuration >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > pyflink_gateway_server.py is only reading the deprecated env.java.opts from > the configuration. > This key should only be used as a fallback, with env.java.opts.tm/jm/client > being the actual keys to support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32056) Update the used Pulsar connector in flink-python to 4.0.0
[ https://issues.apache.org/jira/browse/FLINK-32056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32056. --- Fix Version/s: 1.18.0 1.17.2 Resolution: Fixed Fixed in: - master via fbf7b91424ec626ae56dd2477347a7759db6d5fe - release-1.17 via d3a3755a7eef5708871580671169fd6bd2babf28 > Update the used Pulsar connector in flink-python to 4.0.0 > - > > Key: FLINK-32056 > URL: https://issues.apache.org/jira/browse/FLINK-32056 > Project: Flink > Issue Type: Bug > Components: API / Python, Connectors / Pulsar >Affects Versions: 1.18.0, 1.17.1 >Reporter: Martijn Visser >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.17.2 > > > flink-python still references and tests flink-connector-pulsar:3.0.0, while > it should be using flink-connector-pulsar:4.0.0. That's because the newer > version is the only version compatible with Flink 1.17 and it doesn't rely on > flink-shaded. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32131) Support specifying hadoop config dir for Python HiveCatalog
[ https://issues.apache.org/jira/browse/FLINK-32131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-32131. --- Resolution: Fixed Merged to master via 5a89d22c146f451f12fd0c6de64804d315e1f4b6 > Support specifying hadoop config dir for Python HiveCatalog > --- > > Key: FLINK-32131 > URL: https://issues.apache.org/jira/browse/FLINK-32131 > Project: Flink > Issue Type: Improvement > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > Hadoop config directory could be specified for HiveCatalog in Java, however, > this is still not supported in Python HiveCatalog. This JIRA is to align them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32131) Support specifying hadoop config dir for Python HiveCatalog
Dian Fu created FLINK-32131: --- Summary: Support specifying hadoop config dir for Python HiveCatalog Key: FLINK-32131 URL: https://issues.apache.org/jira/browse/FLINK-32131 Project: Flink Issue Type: Improvement Components: API / Python Reporter: Dian Fu Assignee: Dian Fu Fix For: 1.18.0 Hadoop config directory could be specified for HiveCatalog in Java, however, this is still not supported in Python HiveCatalog. This JIRA is to align them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
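To illustrate the alignment this ticket asks for: the Java `HiveCatalog` constructor accepts a Hadoop config directory, and the change would expose the same from Python. The keyword name `hadoop_conf_dir` below is an assumption modeled on the Java side, and the helper is a sketch of argument assembly, not the actual PyFlink wrapper.

```python
# Hypothetical sketch for FLINK-32131. The hadoop_conf_dir keyword is an
# assumption; only the hive_conf_dir parameter predates this change.
def hive_catalog_kwargs(name, default_database="default",
                        hive_conf_dir=None, hadoop_conf_dir=None):
    # Collect only the arguments actually supplied, as they would be forwarded
    # to pyflink.table.catalog.HiveCatalog(...) once PyFlink is installed.
    kwargs = {"catalog_name": name, "default_database": default_database}
    if hive_conf_dir is not None:
        kwargs["hive_conf_dir"] = hive_conf_dir
    if hadoop_conf_dir is not None:
        kwargs["hadoop_conf_dir"] = hadoop_conf_dir
    return kwargs
```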
[jira] [Closed] (FLINK-20794) Support to select distinct columns in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-20794. --- Fix Version/s: (was: 1.18.0) Resolution: Not A Problem > Support to select distinct columns in the Table API > --- > > Key: FLINK-20794 > URL: https://issues.apache.org/jira/browse/FLINK-20794 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Assignee: Vishaal Selvaraj >Priority: Major > Attachments: screenshot-1.png > > > Currently, there is no corresponding functionality in the Table API for the > following SQL: > {code:java} > SELECT DISTINCT users FROM Orders > {code} > For example, for the following job: > {code:java} > table.select("distinct a") > {code} > It will throw the following exception: > {code:java} > org.apache.flink.table.api.ExpressionParserException: Could not parse > expression at column 10: ',' expected but 'a' > foundorg.apache.flink.table.api.ExpressionParserException: Could not parse > expression at column 10: ',' expected but 'a' founddistinct a ^ > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726) > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710) > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47) > at > org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40) > at > org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-20794) Support to select distinct columns in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722623#comment-17722623 ] Dian Fu commented on FLINK-20794: - Thanks for the analysis. Since string based interfaces have been removed, I guess we could close this ticket~ > Support to select distinct columns in the Table API > --- > > Key: FLINK-20794 > URL: https://issues.apache.org/jira/browse/FLINK-20794 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Assignee: Vishaal Selvaraj >Priority: Major > Fix For: 1.18.0 > > Attachments: screenshot-1.png > -- This message was sent by Atlassian Jira (v8.20.10#820010)
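The string-based expressions referenced above (e.g. `table.select("distinct a")`) were removed; with the current expression DSL, `SELECT DISTINCT users FROM Orders` would be written as `orders.select(col("users")).distinct()` (requires PyFlink). A sketch of the same relational semantics on plain Python rows, for illustration:

```python
# Illustrative only: what SELECT DISTINCT computes, modeled on plain dict rows.
# The PyFlink equivalent (assuming an `orders` table) would be:
#   from pyflink.table.expressions import col
#   distinct_users = orders.select(col("users")).distinct()
def select_distinct(rows, column):
    seen, result = set(), []
    for row in rows:
        value = row[column]
        if value not in seen:  # keep the first occurrence of each value
            seen.add(value)
            result.append(value)
    return result
```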
[jira] [Comment Edited] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor
[ https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719599#comment-17719599 ] Dian Fu edited comment on FLINK-31968 at 5/5/23 2:27 AM: - [~gradysnik] Thanks for the confirmation. I'm closing this ticket since it should have been addressed in FLINK-28786. Feel free to reopen it if this issue still happens after 1.17.1. was (Author: dianfu): [~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has been addressed in FLINK-28786. Feel free to reopen it if this issue still happens after 1.17.1. > [PyFlink] 1.17.0 version on M1 processor > > > Key: FLINK-31968 > URL: https://issues.apache.org/jira/browse/FLINK-31968 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yury Smirnov >Priority: Major > Attachments: 1170_output.log > > > PyFlink version 1.17.0 > NumPy version 1.21.4 and 1.21.6 > Python versions 3.8, 3.9, 3.9 > Slack thread: > [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1682508650487979] > While running any of [PyFlink > Examples|https://github.com/apache/flink/blob/release-1.17/flink-python/pyflink/examples] > , getting error: > {code:java} > /Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/bin/python > /Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py > Traceback (most recent call last): > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", > line 76, in > basic_operations() > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", > line 53, in basic_operations > show(ds.map(update_tel), env) > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/word_count/word_count.py", > line 28, in show > env.execute() > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/datastream/stream_execution_environment.py", > line 764, in execute > return > 
JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph)) > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/java_gateway.py", > line 1322, in __call__ > return_value = get_return_value( > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/pyflink/util/exceptions.py", > line 146, in deco > return f(*a, **kw) > File > "/Users/ysmirnov/Documents/Repos/flink-python-jobs/venv/lib/python3.9/site-packages/py4j/protocol.py", > line 326, in get_return_value > raise Py4JJavaError( > py4j.protocol.Py4JJavaError: An error occurred while calling o10.execute. > : org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144) > at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > at > org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277) > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93) > at > 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > at > org.apache.flink.runtime.concurrent.
[jira] [Commented] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor
[ https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719599#comment-17719599 ] Dian Fu commented on FLINK-31968: - [~gradysnik] Thanks for the confirmation. I'm closing this ticket since it has been addressed in FLINK-28786. Feel free to reopen it if this issue still happens after 1.17.1. > [PyFlink] 1.17.0 version on M1 processor > > > Key: FLINK-31968 > URL: https://issues.apache.org/jira/browse/FLINK-31968 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yury Smirnov >Priority: Major > Attachments: 1170_output.log >
[jira] [Closed] (FLINK-31968) [PyFlink] 1.17.0 version on M1 processor
[ https://issues.apache.org/jira/browse/FLINK-31968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-31968. --- Resolution: Duplicate > [PyFlink] 1.17.0 version on M1 processor > > > Key: FLINK-31968 > URL: https://issues.apache.org/jira/browse/FLINK-31968 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yury Smirnov >Priority: Major > Attachments: 1170_output.log >
[jira] [Updated] (FLINK-31988) Implement Python wrapper for new KDS source
[ https://issues.apache.org/jira/browse/FLINK-31988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-31988: Component/s: API / Python Connectors / Kinesis > Implement Python wrapper for new KDS source > --- > > Key: FLINK-31988 > URL: https://issues.apache.org/jira/browse/FLINK-31988 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Connectors / Kinesis >Reporter: Hong Liang Teoh >Priority: Major > > *What?* > - Implement Python wrapper for KDS source > - Write tests for this KDS source -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-20794) Support to select distinct columns in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719245#comment-17719245 ] Dian Fu commented on FLINK-20794: - [~supercmmetry] Thanks for taking this ticket. Have assigned it to you~ > Support to select distinct columns in the Table API > --- > > Key: FLINK-20794 > URL: https://issues.apache.org/jira/browse/FLINK-20794 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Assignee: Vishaal Selvaraj >Priority: Major > Fix For: 1.18.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-20794) Support to select distinct columns in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-20794: --- Assignee: Vishaal Selvaraj > Support to select distinct columns in the Table API > --- > > Key: FLINK-20794 > URL: https://issues.apache.org/jira/browse/FLINK-20794 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Assignee: Vishaal Selvaraj >Priority: Major > Fix For: 1.18.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip
[ https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716967#comment-17716967 ] Dian Fu edited comment on FLINK-28786 at 4/27/23 10:04 AM: --- Fixed in: - master via a3368635e3d06f764d144f8c8e2e06e499e79665 - release-1.17 via 52c9742eed7128284278b07c40785bf1c4e30139 - release-1.16 via ae998dda4ee60e2268a8c4f8bfdbbd46cc1a0746 was (Author: dianfu): Fixed in: - master via a3368635e3d06f764d144f8c8e2e06e499e79665 - release-1.7 via 52c9742eed7128284278b07c40785bf1c4e30139 > Cannot run PyFlink 1.16 on MacOS with M1 chip > - > > Key: FLINK-28786 > URL: https://issues.apache.org/jira/browse/FLINK-28786 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.16.0 >Reporter: Ran Tao >Assignee: Huang Xingbo >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0, 1.16.2 > > > I have tested it with 2 m1 machines. i will reproduce the bug 100%. > 1.m1 machine > macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > 1.m1 machine > macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > reproduce step: > 1.python -m pip install -r flink-python/dev/dev-requirements.txt > 2.cd flink-python; python setup.py sdist bdist_wheel; cd > apache-flink-libraries; python setup.py sdist; cd ..; > 3.python -m pip install apache-flink-libraries/dist/*.tar.gz > 4.python -m pip install dist/*.whl > when run > [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/] > it will cause > {code:java} > :219: RuntimeWarning: > apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate > binary incompatibility. 
Expected 24 from C header, got 32 from PyObject > Traceback (most recent call last): > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 129, in > word_count(known_args.input, known_args.output) > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 49, in word_count > t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 121, in create > return TableEnvironment(j_tenv) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 100, in __init__ > self._open() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1637, in _open > startup_loopback_server() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1628, in startup_loopback_server > from pyflink.fn_execution.beam.beam_worker_pool_service import \ > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py", > line 44, in > from pyflink.fn_execution.beam import beam_sdk_worker_main # noqa # > pylint: disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py", > line 21, in > import pyflink.fn_execution.beam.beam_operations # noqa # pylint: > disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py", > line 27, in > from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, > RemoteOperatorStateBackend > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py", > line 33, in > from pyflink.fn_execution.beam.beam_coders import FlinkCoder > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py", > line 27, in > 
from pyflink.fn_execution.beam import beam_coder_impl_fast as > beam_coder_impl > File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init > pyflink.fn_execution.beam.beam_coder_impl_fast > KeyError: '__pyx_vtable__' > {code}
[jira] [Closed] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip
[ https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-28786. --- Fix Version/s: 1.17.1 (was: 1.17.0) Resolution: Fixed > Cannot run PyFlink 1.16 on MacOS with M1 chip > - > > Key: FLINK-28786 > URL: https://issues.apache.org/jira/browse/FLINK-28786 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.16.0 >Reporter: Ran Tao >Assignee: Huang Xingbo >Priority: Major > Labels: pull-request-available > Fix For: 1.16.2, 1.17.1 > > > I have tested it with 2 m1 machines. i will reproduce the bug 100%. > 1.m1 machine > macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > 1.m1 machine > macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > reproduce step: > 1.python -m pip install -r flink-python/dev/dev-requirements.txt > 2.cd flink-python; python setup.py sdist bdist_wheel; cd > apache-flink-libraries; python setup.py sdist; cd ..; > 3.python -m pip install apache-flink-libraries/dist/*.tar.gz > 4.python -m pip install dist/*.whl > when run > [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/] > it will cause > {code:java} > :219: RuntimeWarning: > apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate > binary incompatibility. 
Expected 24 from C header, got 32 from PyObject > Traceback (most recent call last): > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 129, in > word_count(known_args.input, known_args.output) > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 49, in word_count > t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 121, in create > return TableEnvironment(j_tenv) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 100, in __init__ > self._open() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1637, in _open > startup_loopback_server() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1628, in startup_loopback_server > from pyflink.fn_execution.beam.beam_worker_pool_service import \ > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py", > line 44, in > from pyflink.fn_execution.beam import beam_sdk_worker_main # noqa # > pylint: disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py", > line 21, in > import pyflink.fn_execution.beam.beam_operations # noqa # pylint: > disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py", > line 27, in > from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, > RemoteOperatorStateBackend > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py", > line 33, in > from pyflink.fn_execution.beam.beam_coders import FlinkCoder > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py", > line 27, in > 
from pyflink.fn_execution.beam import beam_coder_impl_fast as > beam_coder_impl > File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init > pyflink.fn_execution.beam.beam_coder_impl_fast > KeyError: '__pyx_vtable__' > {code}
[jira] [Commented] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip
[ https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716967#comment-17716967 ] Dian Fu commented on FLINK-28786: - Fixed in: - master via a3368635e3d06f764d144f8c8e2e06e499e79665 - release-1.17 via 52c9742eed7128284278b07c40785bf1c4e30139 > Cannot run PyFlink 1.16 on MacOS with M1 chip > - > > Key: FLINK-28786 > URL: https://issues.apache.org/jira/browse/FLINK-28786 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.16.0 >Reporter: Ran Tao >Assignee: Huang Xingbo >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0, 1.16.2 > > > I have tested it with 2 m1 machines. i will reproduce the bug 100%. > 1.m1 machine > macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > 1.m1 machine > macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > reproduce step: > 1.python -m pip install -r flink-python/dev/dev-requirements.txt > 2.cd flink-python; python setup.py sdist bdist_wheel; cd > apache-flink-libraries; python setup.py sdist; cd ..; > 3.python -m pip install apache-flink-libraries/dist/*.tar.gz > 4.python -m pip install dist/*.whl > when run > [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/] > it will cause > {code:java} > :219: RuntimeWarning: > apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate > binary incompatibility. 
Expected 24 from C header, got 32 from PyObject > Traceback (most recent call last): > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 129, in > word_count(known_args.input, known_args.output) > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 49, in word_count > t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 121, in create > return TableEnvironment(j_tenv) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 100, in __init__ > self._open() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1637, in _open > startup_loopback_server() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1628, in startup_loopback_server > from pyflink.fn_execution.beam.beam_worker_pool_service import \ > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py", > line 44, in > from pyflink.fn_execution.beam import beam_sdk_worker_main # noqa # > pylint: disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_sdk_worker_main.py", > line 21, in > import pyflink.fn_execution.beam.beam_operations # noqa # pylint: > disable=unused-import > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_operations.py", > line 27, in > from pyflink.fn_execution.state_impl import RemoteKeyedStateBackend, > RemoteOperatorStateBackend > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/state_impl.py", > line 33, in > from pyflink.fn_execution.beam.beam_coders import FlinkCoder > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_coders.py", > line 27, in > 
from pyflink.fn_execution.beam import beam_coder_impl_fast as > beam_coder_impl > File "pyflink/fn_execution/beam/beam_coder_impl_fast.pyx", line 1, in init > pyflink.fn_execution.beam.beam_coder_impl_fast > KeyError: '__pyx_vtable__' > {code}
[jira] [Created] (FLINK-31949) The state of CountTumblingWindowAssigner of Python DataStream API was never purged
Dian Fu created FLINK-31949: --- Summary: The state of CountTumblingWindowAssigner of Python DataStream API was never purged Key: FLINK-31949 URL: https://issues.apache.org/jira/browse/FLINK-31949 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu Posted by Urs on the user mailing list: {code:java} "In FLINK-26444, a couple of convenience window assigners were added to the Python Datastream API, including CountTumblingWindowAssigner. This assigner uses a CountTrigger by default, which produces TriggerResult.FIRE. As such, using this window assigner on a data stream will always produce a "state leak" since older count windows will always be retained without any chance to work on the elements again." {code} See [https://lists.apache.org/thread/ql8x283xzgd98z0vsqr9npl5j74hscsm] for more details.
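The leak described above follows directly from the trigger semantics: `TriggerResult.FIRE` emits the window contents but keeps the window state, while `FIRE_AND_PURGE` also clears it. A standalone pure-Python sketch of the difference (the `ToyCountWindow` class is a hypothetical stand-in, not Flink's implementation):

```python
# Sketch of why a count trigger that only FIREs leaks state: the window
# buffer is emitted but never cleared, so it grows forever and each firing
# re-emits earlier elements. FIRE_AND_PURGE releases the buffer.
FIRE = "FIRE"
FIRE_AND_PURGE = "FIRE_AND_PURGE"

class ToyCountWindow:
    def __init__(self, size, trigger_result):
        self.size = size
        self.trigger_result = trigger_result
        self.state = []   # per-window element buffer (keyed state in Flink)
        self.fired = []   # what each firing emitted

    def add(self, element):
        self.state.append(element)
        if len(self.state) >= self.size:
            self.fired.append(list(self.state))
            if self.trigger_result == FIRE_AND_PURGE:
                self.state.clear()  # purge: state is released

leaky = ToyCountWindow(size=2, trigger_result=FIRE)
clean = ToyCountWindow(size=2, trigger_result=FIRE_AND_PURGE)
for e in (1, 2, 3, 4):
    leaky.add(e)
    clean.add(e)
print(len(leaky.state), len(clean.state))  # 4 0
```

With FIRE the buffer survives every firing, which is exactly the retained-state behaviour reported for `CountTumblingWindowAssigner`.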
[jira] [Comment Edited] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip
[ https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716757#comment-17716757 ] Dian Fu edited comment on FLINK-28786 at 4/26/23 2:41 PM: -- Reopen it as one user reported an issue related to Mac M1 on 1.17.0 (https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129): {code} Traceback (most recent call last): File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute response = task() File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in lambda: self.create_worker().do_instruction(request), request) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction getattr(request, request_type), request.instruction_id) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 634, in process_bundle bundle_processor.process_bundle(instruction_id)) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1004, in process_bundle element.data) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 226, in process_encoded input_stream, True) File "apache_beam/coders/coder_impl.py", line 1519, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 1520, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 135, in apache_beam.coders.coder_impl.CoderImpl.decode_from_stream File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 34, in decode_from_stream return 
self._value_coder.decode_from_stream(in_stream, nested) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 58, in decode_from_stream return self._value_coder.decode_from_stream(data_input_stream) TypeError: Argument 'input_stream' has incorrect type (expected pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream) {code} was (Author: dianfu): Reopen it as one user reported an issue related to Mac M1 (https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129): {code} Traceback (most recent call last): File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute response = task() File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in lambda: self.create_worker().do_instruction(request), request) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction getattr(request, request_type), request.instruction_id) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 634, in process_bundle bundle_processor.process_bundle(instruction_id)) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1004, in process_bundle element.data) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 226, in process_encoded input_stream, True) File "apache_beam/coders/coder_impl.py", line 1519, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 1520, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 135, in 
apache_beam.coders.coder_impl.CoderImpl.decode_from_stream File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 34, in decode_from_stream return self._value_coder.decode_from_stream(in_stream, nested) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 58, in decode_from_stream return self._value_coder.decode_from_stream(data_input_stream) TypeError: Argument 'input_stream' has incorrect type (expected pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream) {code} > Cannot run PyFlink 1.16 on MacOS with M1 chip > - > > Key: FLINK-28786 > URL: https://issues.apache.org/jira/browse/FLINK-28786 >
[jira] [Reopened] (FLINK-28786) Cannot run PyFlink 1.16 on MacOS with M1 chip
[ https://issues.apache.org/jira/browse/FLINK-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reopened FLINK-28786: - Reopen it as one user reported an issue related to Mac M1 (https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679904702297129): {code} Traceback (most recent call last): File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute response = task() File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in lambda: self.create_worker().do_instruction(request), request) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction getattr(request, request_type), request.instruction_id) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 634, in process_bundle bundle_processor.process_bundle(instruction_id)) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1004, in process_bundle element.data) File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 226, in process_encoded input_stream, True) File "apache_beam/coders/coder_impl.py", line 1519, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 1520, in apache_beam.coders.coder_impl.ParamWindowedValueCoderImpl.decode_from_stream File "apache_beam/coders/coder_impl.py", line 135, in apache_beam.coders.coder_impl.CoderImpl.decode_from_stream File "/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 34, in decode_from_stream return self._value_coder.decode_from_stream(in_stream, nested) File 
"/Users/dianfu/venv/examples-37/lib/python3.7/site-packages/pyflink/fn_execution/beam/beam_coder_impl_slow.py", line 58, in decode_from_stream return self._value_coder.decode_from_stream(data_input_stream) TypeError: Argument 'input_stream' has incorrect type (expected pyflink.fn_execution.stream_fast.LengthPrefixInputStream, got BeamInputStream) {code} > Cannot run PyFlink 1.16 on MacOS with M1 chip > - > > Key: FLINK-28786 > URL: https://issues.apache.org/jira/browse/FLINK-28786 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.16.0 >Reporter: Ran Tao >Assignee: Huang Xingbo >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0, 1.16.2 > > > I have tested it with 2 m1 machines. i will reproduce the bug 100%. > 1.m1 machine > macos bigsur 11.5.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > 1.m1 machine > macos monterey 12.1 & jdk8 * & jdk11 & python 3.8 & python 3.9 > reproduce step: > 1.python -m pip install -r flink-python/dev/dev-requirements.txt > 2.cd flink-python; python setup.py sdist bdist_wheel; cd > apache-flink-libraries; python setup.py sdist; cd ..; > 3.python -m pip install apache-flink-libraries/dist/*.tar.gz > 4.python -m pip install dist/*.whl > when run > [word_count.py|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/table_api_tutorial/] > it will cause > {code:java} > :219: RuntimeWarning: > apache_beam.coders.coder_impl.StreamCoderImpl size changed, may indicate > binary incompatibility. 
Expected 24 from C header, got 32 from PyObject > Traceback (most recent call last): > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 129, in > word_count(known_args.input, known_args.output) > File "/Users/chucheng/GitLab/pyflink-demo/table/streaming/word_count.py", > line 49, in word_count > t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 121, in create > return TableEnvironment(j_tenv) > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 100, in __init__ > self._open() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1637, in _open > startup_loopback_server() > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/table/table_environment.py", > line 1628, in startup_loopback_server > from pyflink.fn_execution.beam.beam_worker_pool_service import \ > File > "/Users/chucheng/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_worker_pool_service.py", > line 44, in > from pyflink.fn_execution.beam imp
[jira] [Updated] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex result type
[ https://issues.apache.org/jira/browse/FLINK-31905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-31905: Description: For the following job: {code} import logging, sys from pyflink.common import Row from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import Schema, DataTypes, TableDescriptor, StreamTableEnvironment from pyflink.table.expressions import col, row from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf logging.basicConfig(stream=sys.stdout, level=logging.ERROR, format="%(message)s") class EmitLastState(AggregateFunction): """ Aggregator that emits the latest state for the purpose of enabling parallelism on CDC tables. """ def create_accumulator(self) -> ACC: return Row(None, None) def accumulate(self, accumulator: ACC, *args): key, obj = args if (accumulator[0] is None) or (key > accumulator[0]): accumulator[0] = key accumulator[1] = obj def retract(self, accumulator: ACC, *args): pass def get_value(self, accumulator: ACC) -> T: return accumulator[1] some_complex_inner_type = DataTypes.ROW( [ DataTypes.FIELD("f0", DataTypes.STRING()), DataTypes.FIELD("f1", DataTypes.STRING()) ] ) some_complex_type = DataTypes.ROW( [ DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type)) for k in ("f0", "f1", "f2") ] + [ DataTypes.FIELD("f3", DataTypes.DATE()), DataTypes.FIELD("f4", DataTypes.VARCHAR(32)), DataTypes.FIELD("f5", DataTypes.VARCHAR(2)), ] ) @udf(input_types=DataTypes.STRING(), result_type=some_complex_type) def complex_udf(s): return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None) if __name__ == "__main__": env = StreamExecutionEnvironment.get_execution_environment() table_env = StreamTableEnvironment.create(env) table_env.get_config().set('pipeline.classpaths', 'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar') # Create schema _schema = { "p_key": DataTypes.INT(False), "modified_id": DataTypes.INT(False), 
"content": DataTypes.STRING() } schema = Schema.new_builder().from_fields( *zip(*[(k, v) for k, v in _schema.items()]) ).\ primary_key("p_key").\ build() # Create table descriptor descriptor = TableDescriptor.for_connector("postgres-cdc").\ option("hostname", "host.docker.internal").\ option("port", "5432").\ option("database-name", "flink_issue").\ option("username", "root").\ option("password", "root").\ option("debezium.plugin.name", "pgoutput").\ option("schema-name", "flink_schema").\ option("table-name", "flink_table").\ option("slot.name", "flink_slot").\ schema(schema).\ build() table_env.create_temporary_table("flink_table", descriptor) # Create changelog stream stream = table_env.from_path("flink_table")\ # Define UDAF accumulator_type = DataTypes.ROW( [ DataTypes.FIELD("f0", DataTypes.INT(False)), DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])), ] ) result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()]) emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, result_type=result_type) # Emit last state based on modified_id to enable parallel processing stream = stream.\ group_by(col("p_key")).\ select( col("p_key"), emit_last(col("modified_id"),row(*(col(k) for k in _schema.keys())).cast(result_type)).alias("tmp_obj") ) # Select the elements of the objects stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in _schema.keys())) # We apply a UDF which parses the xml and returns a complex nested structure stream = stream.select(col("p_key"), complex_udf(col("content")).alias("nested_obj")) # We select an element from the nested structure in order to flatten it # The next line is the line causing issues, commenting the next line will make the pipeline work stream = stream.select(col("p_key"), col("nested_obj").get("f0")) # Interestingly, the below part does work... 
# stream = stream.select(col("nested_obj").get("f0")) table_env.to_changelog_stream(stream).print() # Execute env.execute_async() {code} {code} py4j.protocol.Py4JJavaError: An error occurred while calling o8.toChangelogStream. : java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source) at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source) at java.base/jdk.internal.u
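The `EmitLastState` aggregator quoted in the report reduces each key group to the row carrying the highest `modified_id`, so the latest CDC state wins regardless of arrival order. Its accumulate/get_value logic, extracted as standalone Python (free functions instead of the `AggregateFunction` class, for illustration only):

```python
# Standalone sketch of the EmitLastState logic from the job above:
# keep the row whose key (modified_id) is largest.
def create_accumulator():
    return [None, None]  # [max_key_seen, row_for_that_key]

def accumulate(acc, key, obj):
    # Later (higher) keys replace earlier state; order of arrival is irrelevant.
    if acc[0] is None or key > acc[0]:
        acc[0] = key
        acc[1] = obj

def get_value(acc):
    return acc[1]

acc = create_accumulator()
for key, row in [(3, "v3"), (1, "v1"), (2, "v2")]:  # out-of-order arrival
    accumulate(acc, key, row)
print(get_value(acc))  # v3
```

Note the quoted `retract` is a no-op, which is why the aggregator only works as an emit-last convenience and not under retractions.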
[jira] [Created] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex type
Dian Fu created FLINK-31905: --- Summary: Exception thrown when accessing nested field of the result of Python UDF with complex type Key: FLINK-31905 URL: https://issues.apache.org/jira/browse/FLINK-31905 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu For the following job: {code} import logging, sys from pyflink.common import Row from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import Schema, DataTypes, TableDescriptor, StreamTableEnvironment from pyflink.table.expressions import col, row from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf logging.basicConfig(stream=sys.stdout, level=logging.ERROR, format="%(message)s") class EmitLastState(AggregateFunction): """ Aggregator that emits the latest state for the purpose of enabling parallelism on CDC tables. """ def create_accumulator(self) -> ACC: return Row(None, None) def accumulate(self, accumulator: ACC, *args): key, obj = args if (accumulator[0] is None) or (key > accumulator[0]): accumulator[0] = key accumulator[1] = obj def retract(self, accumulator: ACC, *args): pass def get_value(self, accumulator: ACC) -> T: return accumulator[1] some_complex_inner_type = DataTypes.ROW( [ DataTypes.FIELD("f0", DataTypes.STRING()), DataTypes.FIELD("f1", DataTypes.STRING()) ] ) some_complex_type = DataTypes.ROW( [ DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type)) for k in ("f0", "f1", "f2") ] + [ DataTypes.FIELD("f3", DataTypes.DATE()), DataTypes.FIELD("f4", DataTypes.VARCHAR(32)), DataTypes.FIELD("f5", DataTypes.VARCHAR(2)), ] ) @udf(input_types=DataTypes.STRING(), result_type=some_complex_type) def complex_udf(s): return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None) if __name__ == "__main__": env = StreamExecutionEnvironment.get_execution_environment() table_env = StreamTableEnvironment.create(env) table_env.get_config().set('pipeline.classpaths', 
'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar') # Create schema _schema = { "p_key": DataTypes.INT(False), "modified_id": DataTypes.INT(False), "content": DataTypes.STRING() } schema = Schema.new_builder().from_fields( *zip(*[(k, v) for k, v in _schema.items()]) ).\ primary_key("p_key").\ build() # Create table descriptor descriptor = TableDescriptor.for_connector("postgres-cdc").\ option("hostname", "host.docker.internal").\ option("port", "5432").\ option("database-name", "flink_issue").\ option("username", "root").\ option("password", "root").\ option("debezium.plugin.name", "pgoutput").\ option("schema-name", "flink_schema").\ option("table-name", "flink_table").\ option("slot.name", "flink_slot").\ schema(schema).\ build() table_env.create_temporary_table("flink_table", descriptor) # Create changelog stream stream = table_env.from_path("flink_table")\ # Define UDAF accumulator_type = DataTypes.ROW( [ DataTypes.FIELD("f0", DataTypes.INT(False)), DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()])), ] ) result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in _schema.items()]) emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, result_type=result_type) # Emit last state based on modified_id to enable parallel processing stream = stream.\ group_by(col("p_key")).\ select( col("p_key"), emit_last(col("modified_id"),row(*(col(k) for k in _schema.keys())).cast(result_type)).alias("tmp_obj") ) # Select the elements of the objects stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in _schema.keys())) # We apply a UDF which parses the xml and returns a complex nested structure stream = stream.select(col("p_key"), complex_udf(col("content")).alias("nested_obj")) # We select an element from the nested structure in order to flatten it # The next line is the line causing issues, commenting the next line will make the pipeline work stream = 
stream.select(col("p_key"), col("nested_obj").get("f0")) # Interestingly, the below part does work... # stream = stream.select(col("nested_obj").get("f0")) table_env.to_changelog_stream(stream).print() # Execute env.execute_async() {code} {code} py4j.protocol.Py4JJavaError: An error occurred while calling o8.toChangelogStream. : java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 at
[jira] [Updated] (FLINK-31905) Exception thrown when accessing nested field of the result of Python UDF with complex result type
[ https://issues.apache.org/jira/browse/FLINK-31905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-31905: Summary: Exception thrown when accessing nested field of the result of Python UDF with complex result type (was: Exception thrown when accessing nested field of the result of Python UDF with complex type) > Exception thrown when accessing nested field of the result of Python UDF with > complex result type > - > > Key: FLINK-31905 > URL: https://issues.apache.org/jira/browse/FLINK-31905 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Priority: Major > > For the following job: > {code} > import logging, sys > from pyflink.common import Row > from pyflink.datastream import StreamExecutionEnvironment > from pyflink.table import Schema, DataTypes, TableDescriptor, > StreamTableEnvironment > from pyflink.table.expressions import col, row > from pyflink.table.udf import ACC, T, udaf, AggregateFunction, udf > logging.basicConfig(stream=sys.stdout, level=logging.ERROR, > format="%(message)s") > class EmitLastState(AggregateFunction): > """ > Aggregator that emits the latest state for the purpose of > enabling parallelism on CDC tables. 
> """ > def create_accumulator(self) -> ACC: > return Row(None, None) > def accumulate(self, accumulator: ACC, *args): > key, obj = args > if (accumulator[0] is None) or (key > accumulator[0]): > accumulator[0] = key > accumulator[1] = obj > def retract(self, accumulator: ACC, *args): > pass > def get_value(self, accumulator: ACC) -> T: > return accumulator[1] > some_complex_inner_type = DataTypes.ROW( > [ > DataTypes.FIELD("f0", DataTypes.STRING()), > DataTypes.FIELD("f1", DataTypes.STRING()) > ] > ) > some_complex_type = DataTypes.ROW( > [ > DataTypes.FIELD(k, DataTypes.ARRAY(some_complex_inner_type)) > for k in ("f0", "f1", "f2") > ] > + [ > DataTypes.FIELD("f3", DataTypes.DATE()), > DataTypes.FIELD("f4", DataTypes.VARCHAR(32)), > DataTypes.FIELD("f5", DataTypes.VARCHAR(2)), > ] > ) > @udf(input_types=DataTypes.STRING(), result_type=some_complex_type) > def complex_udf(s): > return Row(f0=None, f1=None, f2=None, f3=None, f4=None, f5=None) > if __name__ == "__main__": > env = StreamExecutionEnvironment.get_execution_environment() > table_env = StreamTableEnvironment.create(env) > table_env.get_config().set('pipeline.classpaths', > 'file:///Users/dianfu/code/src/workspace/pyflink-examples/flink-sql-connector-postgres-cdc-2.1.1.jar') > # Create schema > _schema = { > "p_key": DataTypes.INT(False), > "modified_id": DataTypes.INT(False), > "content": DataTypes.STRING() > } > schema = Schema.new_builder().from_fields( > *zip(*[(k, v) for k, v in _schema.items()]) > ).\ > primary_key("p_key").\ > build() > # Create table descriptor > descriptor = TableDescriptor.for_connector("postgres-cdc").\ > option("hostname", "host.docker.internal").\ > option("port", "5432").\ > option("database-name", "flink_issue").\ > option("username", "root").\ > option("password", "root").\ > option("debezium.plugin.name", "pgoutput").\ > option("schema-name", "flink_schema").\ > option("table-name", "flink_table").\ > option("slot.name", "flink_slot").\ > schema(schema).\ > build() > 
table_env.create_temporary_table("flink_table", descriptor) > # Create changelog stream > stream = table_env.from_path("flink_table")\ > # Define UDAF > accumulator_type = DataTypes.ROW( > [ > DataTypes.FIELD("f0", DataTypes.INT(False)), > DataTypes.FIELD("f1", DataTypes.ROW([DataTypes.FIELD(k, v) for k, > v in _schema.items()])), > ] > ) > result_type = DataTypes.ROW([DataTypes.FIELD(k, v) for k, v in > _schema.items()]) > emit_last = udaf(EmitLastState(), accumulator_type=accumulator_type, > result_type=result_type) > # Emit last state based on modified_id to enable parallel processing > stream = stream.\ > group_by(col("p_key")).\ > select( > col("p_key"), > emit_last(col("modified_id"),row(*(col(k) for k in > _schema.keys())).cast(result_type)).alias("tmp_obj") > ) > # Select the elements of the objects > stream = stream.select(*(col("tmp_obj").get(k).alias(k) for k in > _schema.keys())) > # We apply a UDF which parses the xml and returns a complex nested > structure > stream = stream.select(col("p_key"), > complex_udf(col("content
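Stripped of the PyFlink types, the EmitLastState aggregator quoted in the report above is plain keep-the-latest accumulator logic. A framework-free sketch of that logic (the function names and the driver loop below are illustrative, not part of the PyFlink API):

```python
from typing import Any, List, Optional

def create_accumulator() -> List[Optional[Any]]:
    # Mirrors Row(None, None) in the UDAF: [highest key seen, its row].
    return [None, None]

def accumulate(acc: List[Optional[Any]], key: int, obj: Any) -> None:
    # Same condition as EmitLastState.accumulate: keep the row whose
    # modification key is the largest seen so far.
    if acc[0] is None or key > acc[0]:
        acc[0] = key
        acc[1] = obj

def get_value(acc: List[Optional[Any]]) -> Any:
    # Emit the row that accompanied the highest key, as in the UDAF.
    return acc[1]

acc = create_accumulator()
for key, row in [(1, "row-v1"), (3, "row-v3"), (2, "row-v2")]:
    accumulate(acc, key, row)
print(get_value(acc))  # → row-v3
```

Per key, the aggregate therefore converges to the row with the largest `modified_id`, which is what makes parallel processing of the CDC table safe in the reporter's pipeline.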
[jira] [Created] (FLINK-31861) Introduce ByteArraySchema which serialize/deserialize data of type byte array
Dian Fu created FLINK-31861: --- Summary: Introduce ByteArraySchema which serialize/deserialize data of type byte array Key: FLINK-31861 URL: https://issues.apache.org/jira/browse/FLINK-31861 Project: Flink Issue Type: Improvement Components: API / Core Reporter: Dian Fu The aim of this ticket is to introduce a ByteArraySchema which serializes/deserializes data of type byte array, so that users can get the raw bytes from a data source. See [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1681928862762699] for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
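Conceptually, the proposed schema is an identity mapping on bytes: both directions pass the payload through untouched. A minimal illustrative sketch of that contract (the class below is hypothetical, not the API this ticket adds):

```python
class IdentityByteSchema:
    """Illustrative stand-in for the proposed ByteArraySchema:
    serialization and deserialization are both the identity on bytes,
    so consumers see exactly the raw bytes from the source."""

    def serialize(self, value: bytes) -> bytes:
        # No encoding step: the value already is the wire representation.
        return value

    def deserialize(self, message: bytes) -> bytes:
        # No decoding step: hand the raw bytes straight to the user.
        return message

schema = IdentityByteSchema()
raw = b"\x00\x01binary payload"
assert schema.deserialize(schema.serialize(raw)) == raw
```

The round trip is lossless by construction, which is the whole point: no character set or format assumption is imposed between the source and the user code.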
[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark
[ https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709597#comment-17709597 ] Dian Fu commented on FLINK-28528: - The exception happens when converting TableSchema to ResolvedSchema [1]. To resolve this issue, I guess we may need to support ResolvedSchema in PyFlink instead of TableSchema which is already deprecated in Java. [1] https://github.com/apache/flink/blob/b30b12d97a00ee5a338b2bd4233df1e2b38ce87f/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java#L430 > Table.getSchema fails on table with watermark > - > > Key: FLINK-28528 > URL: https://issues.apache.org/jira/browse/FLINK-28528 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.15.1 >Reporter: Xuannan Su >Assignee: Xingbo Huang >Priority: Major > > The bug can be reproduced with the following test. The test can pass if we > use the commented way to define the watermark. > {code:python} > def test_flink_2(self): > env = StreamExecutionEnvironment.get_execution_environment() > t_env = StreamTableEnvironment.create(env) > table = t_env.from_descriptor( > TableDescriptor.for_connector("filesystem") > .schema( > Schema.new_builder() > .column("name", DataTypes.STRING()) > .column("cost", DataTypes.INT()) > .column("distance", DataTypes.INT()) > .column("time", DataTypes.TIMESTAMP(3)) > .watermark("time", expr.col("time") - expr.lit(60).seconds) > # .watermark("time", "`time` - INTERVAL '60' SECOND") > .build() > ) > .format("csv") > .option("path", "./input.csv") > .build() > ) > print(table.get_schema()) > {code} > It causes the following exception > {code:none} > E pyflink.util.exceptions.TableException: > org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is > not string serializable. Currently, only expressions that originated from a > SQL expression have a well-defined string representation. 
> E at > org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51) > E at > org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455) > E at > java.util.Collections$SingletonList.forEach(Collections.java:4824) > E at > org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451) > E at org.apache.flink.table.api.Table.getSchema(Table.java:101) > E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > E at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > E at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > E at java.lang.reflect.Method.invoke(Method.java:498) > E at > org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > E at > org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > E at > org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) > E at > org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > E at > org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) > E at > org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) > E at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark
[ https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709595#comment-17709595 ] Dian Fu commented on FLINK-28528: - The root cause is that currently table.get_schema() only supports expressions defined via SQL string. For example, the following statements should work: * .watermark("time", "`time` - INTERVAL '60' SECOND") * .column_by_expression('TM', 'parse_bq_datetime_udf(window_start)') > Table.getSchema fails on table with watermark > - > > Key: FLINK-28528 > URL: https://issues.apache.org/jira/browse/FLINK-28528 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.15.1 >Reporter: Xuannan Su >Assignee: Xingbo Huang >Priority: Major > > The bug can be reproduced with the following test. The test can pass if we > use the commented way to define the watermark. > {code:python} > def test_flink_2(self): > env = StreamExecutionEnvironment.get_execution_environment() > t_env = StreamTableEnvironment.create(env) > table = t_env.from_descriptor( > TableDescriptor.for_connector("filesystem") > .schema( > Schema.new_builder() > .column("name", DataTypes.STRING()) > .column("cost", DataTypes.INT()) > .column("distance", DataTypes.INT()) > .column("time", DataTypes.TIMESTAMP(3)) > .watermark("time", expr.col("time") - expr.lit(60).seconds) > # .watermark("time", "`time` - INTERVAL '60' SECOND") > .build() > ) > .format("csv") > .option("path", "./input.csv") > .build() > ) > print(table.get_schema()) > {code} > It causes the following exception > {code:none} > E pyflink.util.exceptions.TableException: > org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is > not string serializable. Currently, only expressions that originated from a > SQL expression have a well-defined string representation. 
> E at > org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51) > E at > org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455) > E at > java.util.Collections$SingletonList.forEach(Collections.java:4824) > E at > org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451) > E at org.apache.flink.table.api.Table.getSchema(Table.java:101) > E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > E at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > E at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > E at java.lang.reflect.Method.invoke(Method.java:498) > E at > org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > E at > org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > E at > org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) > E at > org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > E at > org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) > E at > org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) > E at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28528) Table.getSchema fails on table with watermark
[ https://issues.apache.org/jira/browse/FLINK-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709591#comment-17709591 ] Dian Fu commented on FLINK-28528: - A similar issue reported in slack channel: https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680795285070859 > Table.getSchema fails on table with watermark > - > > Key: FLINK-28528 > URL: https://issues.apache.org/jira/browse/FLINK-28528 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.15.1 >Reporter: Xuannan Su >Assignee: Xingbo Huang >Priority: Major > > The bug can be reproduced with the following test. The test can pass if we > use the commented way to define the watermark. > {code:python} > def test_flink_2(self): > env = StreamExecutionEnvironment.get_execution_environment() > t_env = StreamTableEnvironment.create(env) > table = t_env.from_descriptor( > TableDescriptor.for_connector("filesystem") > .schema( > Schema.new_builder() > .column("name", DataTypes.STRING()) > .column("cost", DataTypes.INT()) > .column("distance", DataTypes.INT()) > .column("time", DataTypes.TIMESTAMP(3)) > .watermark("time", expr.col("time") - expr.lit(60).seconds) > # .watermark("time", "`time` - INTERVAL '60' SECOND") > .build() > ) > .format("csv") > .option("path", "./input.csv") > .build() > ) > print(table.get_schema()) > {code} > It causes the following exception > {code:none} > E pyflink.util.exceptions.TableException: > org.apache.flink.table.api.TableException: Expression 'minus(time, 6)' is > not string serializable. Currently, only expressions that originated from a > SQL expression have a well-defined string representation. 
> E at > org.apache.flink.table.expressions.ResolvedExpression.asSerializableString(ResolvedExpression.java:51) > E at > org.apache.flink.table.api.TableSchema.lambda$fromResolvedSchema$13(TableSchema.java:455) > E at > java.util.Collections$SingletonList.forEach(Collections.java:4824) > E at > org.apache.flink.table.api.TableSchema.fromResolvedSchema(TableSchema.java:451) > E at org.apache.flink.table.api.Table.getSchema(Table.java:101) > E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > E at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > E at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > E at java.lang.reflect.Method.invoke(Method.java:498) > E at > org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > E at > org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > E at > org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) > E at > org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > E at > org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) > E at > org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) > E at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF
[ https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-31707: Fix Version/s: 1.16.2 > Constant string cannot be used as input arguments of Pandas UDAF > > > Key: FLINK-31707 > URL: https://issues.apache.org/jira/browse/FLINK-31707 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.16.2, 1.18.0, 1.17.1 > > > It will throw exceptions as following when using constant strings in Pandas > UDAF: > {code} > E raise ValueError("field_type %s is not supported." % > field_type) > E ValueError: field_type type_name: CHAR > E char_info { > E length: 3 > E } > Eis not supported. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF
[ https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708188#comment-17708188 ] Dian Fu edited comment on FLINK-31707 at 4/4/23 5:26 AM: - Fixed in: - master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb - release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d - release-1.16 via 3291e4d6f9afff40e1e9718e23388610577de741 was (Author: dianfu): Fixed in: - master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb - release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d > Constant string cannot be used as input arguments of Pandas UDAF > > > Key: FLINK-31707 > URL: https://issues.apache.org/jira/browse/FLINK-31707 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.17.1 > > > It will throw exceptions as following when using constant strings in Pandas > UDAF: > {code} > E raise ValueError("field_type %s is not supported." % > field_type) > E ValueError: field_type type_name: CHAR > E char_info { > E length: 3 > E } > Eis not supported. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF
[ https://issues.apache.org/jira/browse/FLINK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-31707. --- Fix Version/s: 1.18.0 1.17.1 Resolution: Fixed Fixed in: - master via 7c6d8b0134cbcdc60d56b87d39ff2f28c310b1eb - release-1.17 via 9c5ca0590806932e4e8f9d3f942f0a2a5442fe2d > Constant string cannot be used as input arguments of Pandas UDAF > > > Key: FLINK-31707 > URL: https://issues.apache.org/jira/browse/FLINK-31707 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.17.1 > > > It will throw exceptions as following when using constant strings in Pandas > UDAF: > {code} > E raise ValueError("field_type %s is not supported." % > field_type) > E ValueError: field_type type_name: CHAR > E char_info { > E length: 3 > E } > Eis not supported. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31707) Constant string cannot be used as input arguments of Pandas UDAF
Dian Fu created FLINK-31707: --- Summary: Constant string cannot be used as input arguments of Pandas UDAF Key: FLINK-31707 URL: https://issues.apache.org/jira/browse/FLINK-31707 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu Assignee: Dian Fu It will throw an exception like the following when a constant string is used in a Pandas UDAF:
{code}
E       raise ValueError("field_type %s is not supported." % field_type)
E   ValueError: field_type type_name: CHAR
E   char_info {
E     length: 3
E   }
E    is not supported.
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-31690) The current key is not set for KeyedCoProcessOperator
[ https://issues.apache.org/jira/browse/FLINK-31690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-31690. --- Fix Version/s: 1.16.2 1.18.0 1.17.1 Resolution: Fixed Fixed in: - master via 6c1ffe544e31bb67df94175a559f2f40362795a4 - release-1.17 via e3d612e7e98bedde42c365df3f2ed2a2ca76aefa - release-1.16 via 01cdaee25cdc41773a5c42638f4c5209373b5aa4 > The current key is not set for KeyedCoProcessOperator > - > > Key: FLINK-31690 > URL: https://issues.apache.org/jira/browse/FLINK-31690 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.16.2, 1.18.0, 1.17.1 > > > See https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680294701254239 for > more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31690) The current key is not set for KeyedCoProcessOperator
Dian Fu created FLINK-31690: --- Summary: The current key is not set for KeyedCoProcessOperator Key: FLINK-31690 URL: https://issues.apache.org/jira/browse/FLINK-31690 Project: Flink Issue Type: Bug Components: API / Python Reporter: Dian Fu Assignee: Dian Fu See https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680294701254239 for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31532) Support StreamExecutionEnvironment.socketTextStream in Python DataStream API
Dian Fu created FLINK-31532: --- Summary: Support StreamExecutionEnvironment.socketTextStream in Python DataStream API Key: FLINK-31532 URL: https://issues.apache.org/jira/browse/FLINK-31532 Project: Flink Issue Type: Improvement Components: API / Python Reporter: Dian Fu Currently, StreamExecutionEnvironment.socketTextStream is still missing in the Python DataStream API. It would be great to support it, as it may be helpful in special cases, e.g. testing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
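The semantics being requested — consume a socket as a stream of newline-delimited text records — can be sketched with the Python standard library alone. In this sketch a `socketpair` stands in for the remote host; none of it is PyFlink API:

```python
import socket

# Emulate what socketTextStream provides: connect to a host/port and
# turn the byte stream into newline-delimited text records.
server, client = socket.socketpair()
server.sendall(b"hello\nworld\n")
server.close()  # close the writing end so iteration terminates at EOF

# makefile() wraps the socket in a file-like object, so each iteration
# yields one text line, i.e. one record of the stream.
with client.makefile("r", encoding="utf-8", newline="\n") as reader:
    records = [line.rstrip("\n") for line in reader]

print(records)  # → ['hello', 'world']
```

In the real feature the records would feed a DataStream source rather than a list, but the delimiter handling above is the essential behavior.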
[jira] [Closed] (FLINK-31503) "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Python
[ https://issues.apache.org/jira/browse/FLINK-31503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-31503. --- Fix Version/s: 1.16.2 1.18.0 1.17.1 Resolution: Fixed Fixed in: - master via de258f3ce01e720d23bec67c20892133f89293d3 - release-1.17 via 3163b8f9caa53e9487ce062eba2c3d399dfe08a2 - release-1.16 via d438b3bdc48a0456088594700d438725d0fb1480 > "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider > org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is > thrown when executing Python UDFs in SQL Client > -- > > Key: FLINK-31503 > URL: https://issues.apache.org/jira/browse/FLINK-31503 > Project: Flink > Issue Type: Bug > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Fix For: 1.16.2, 1.18.0, 1.17.1 > > > The following exception will be thrown when executing SQL statements > containing Python UDFs in SQL Client: > {code} > Caused by: java.util.ServiceConfigurationError: > org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider > org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype > at java.util.ServiceLoader.fail(ServiceLoader.java:239) > at java.util.ServiceLoader.access$300(ServiceLoader.java:185) > at > java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376) > at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) > at java.util.ServiceLoader$1.next(ServiceLoader.java:480) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:415) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:507) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSortedSet$Builder.addAll(ImmutableSortedSet.java:528) > at > org.apache.beam.sdk.util.common.ReflectHelpers.loadServicesOrdered(ReflectHelpers.java:199) > at 
> org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.initializeRegistry(PipelineOptionsFactory.java:2089) > at > org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2083) > at > org.apache.beam.sdk.options.PipelineOptionsFactory$Cache.(PipelineOptionsFactory.java:2047) > at > org.apache.beam.sdk.options.PipelineOptionsFactory.resetCache(PipelineOptionsFactory.java:581) > at > org.apache.beam.sdk.options.PipelineOptionsFactory.(PipelineOptionsFactory.java:547) > at > org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:241) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)