[jira] [Commented] (FLINK-16666) Support new Python dependency configuration options in flink-java, flink-streaming-java and flink-table
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074469#comment-17074469 ] Dian Fu commented on FLINK-1: - Thanks [~aljoscha] and [~sunjincheng121] for your suggestions; they make sense to me. Also thanks to [~zhongwei] for preparing the new PR. > Support new Python dependency configuration options in flink-java, > flink-streaming-java and flink-table > --- > > Key: FLINK-1 > URL: https://issues.apache.org/jira/browse/FLINK-1 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Wei Zhong >Assignee: Wei Zhong >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074451#comment-17074451 ] Wei Zhong commented on FLINK-1: --- OK, I'll update the PR and move the code back to the `flink-python` module.
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074446#comment-17074446 ] sunjincheng commented on FLINK-1: - I agree with [~aljoscha] that we should be careful when adding code to the core modules. The code added to the core modules seems to exist only to eliminate code duplication that may be introduced in the future, which I think is unnecessary, at least for now. Maybe we can come up with a cleaner approach later, when we actually encounter this issue. What do you think?
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074068#comment-17074068 ] Aljoscha Krettek commented on FLINK-1: -- I know it can sometimes be annoying to come up with a cleaner solution, but in Flink we have learned some lessons the hard way, and we're still dealing with some mistakes we made in the past. For example, we let our Scala dependency "bleed through" into so many modules. Another example is that {{flink-streaming-java}} depends on {{flink-runtime}}, which made it very easy for a lot of runtime classes to "bleed through" into the API. It might seem like a minor thing to introduce Python support code in the core modules now, but past experience has shown us that we need to be very careful about properly abstracting and separating concerns.
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074063#comment-17074063 ] Aljoscha Krettek commented on FLINK-1: -- I still don't think it's the right approach to mix Python concerns into all the other modules. (I also think it's not optimal that {{flink-clients}} now has Python-related code; this makes some things more complicated than they need to be.) Ideally, neither the Table API nor the SQL client would use the environments from the other APIs, so in the long run there needs to be a proper solution for this anyway. Regarding the proposed solution: I think it's also problematic that the code mutates the Configuration, i.e. puts in new fields derived from existing fields. This makes the code more fragile and harder to debug. I think either the generic {{TableEnvironment(Impl)}} or the planners are the right place for the Python support code. For code sharing, it might be good to add a module such as {{flink-python-core}} or {{flink-python-common}} to isolate those common concerns.
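To illustrate the concern about mutating the Configuration, here is a minimal Java sketch. It assumes a plain `Map<String, String>` as a stand-in for Flink's `Configuration`, and the `python.files` key, the `python.internal.file.<n>` keys and both class names are hypothetical, chosen only for illustration. The first variant derives an immutable value from the existing entries and leaves the map untouched. The second writes derived fields back into the shared map, so later readers can no longer tell user-supplied settings from derived state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a Map<String, String> stands in for Flink's Configuration. */
final class PythonDependencyInfo {
    private final List<String> files;

    private PythonDependencyInfo(List<String> files) {
        this.files = Collections.unmodifiableList(files);
    }

    /** Derives the dependency info from the config without writing anything back into it. */
    static PythonDependencyInfo fromConfig(Map<String, String> config) {
        List<String> files = new ArrayList<>();
        // "python.files" is an assumed key holding a comma-separated list of paths.
        String raw = config.getOrDefault("python.files", "");
        for (String path : raw.split(",")) {
            if (!path.trim().isEmpty()) {
                files.add(path.trim());
            }
        }
        return new PythonDependencyInfo(files);
    }

    List<String> getFiles() {
        return files;
    }
}

/** The fragile variant: derived fields are written back into the shared config. */
final class MutatingStyle {
    static void registerInPlace(Map<String, String> config) {
        String raw = config.getOrDefault("python.files", "");
        int i = 0;
        for (String path : raw.split(",")) {
            if (!path.trim().isEmpty()) {
                // Hypothetical derived keys that now sit next to the user's own settings.
                config.put("python.internal.file." + i++, path.trim());
            }
        }
    }
}
```

With `python.files` set to `a.py, b.zip`, `fromConfig` yields `[a.py, b.zip]` and the map keeps its single entry. `registerInPlace` grows the same map to three entries.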
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073593#comment-17073593 ] Dian Fu commented on FLINK-1: - Hi [~aljoscha], as explained by [~zhongwei], the requirement is that we need to process the config options related to Python dependency management and register the dependencies in the distributed cache, so that the dependencies can be accessed during execution. The `configure` method of ExecutionEnvironment / StreamExecutionEnvironment already performs some pre-processing based on the given configuration object, e.g. setting up the execution config and checkpoint config. So conceptually, this seems a good place for this requirement, i.e. registering the Python dependencies in the distributed cache based on the given configuration object. It would also avoid code duplication, as this requirement applies not only to the PyFlink Table API but also to any other API, such as the PyFlink DataStream API, which may be introduced in the near future. What do you think?
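The pre-processing described above can be sketched roughly as follows. This makes several assumptions. `CacheRegistry` is a hypothetical stand-in for the environment's distributed-cache registration; in Flink, `registerCachedFile(filePath, name)` on the execution environments plays this role. `SketchEnvironment.configure` is a simplified hypothetical hook, not the real method. The `python.files` key and the `python_file_<n>` naming scheme are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an environment's distributed-cache registration. */
final class CacheRegistry {
    private final Map<String, String> entries = new LinkedHashMap<>();

    void registerCachedFile(String filePath, String name) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("cache entry already registered: " + name);
        }
        entries.put(name, filePath);
    }

    Map<String, String> entries() {
        return entries;
    }
}

/** Simplified, hypothetical environment whose configure() hook pre-processes a configuration. */
final class SketchEnvironment {
    final CacheRegistry cache = new CacheRegistry();

    void configure(Map<String, String> config) {
        // Assumed key: a comma-separated list of Python files the job depends on.
        String files = config.get("python.files");
        if (files == null) {
            return; // nothing Python-related to register
        }
        int index = 0;
        for (String path : files.split(",")) {
            String trimmed = path.trim();
            if (!trimmed.isEmpty()) {
                // Register each file under a generated name so it can be looked up at execution time.
                cache.registerCachedFile(trimmed, "python_file_" + index++);
            }
        }
    }
}
```

This is the placement Dian Fu argues for: every API built on the environment would get the registration for free. The trade-off discussed in this thread is that the logic then lives in the core environment classes rather than in `flink-python`.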
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071640#comment-17071640 ] Wei Zhong commented on FLINK-1: --- Hi [~aljoscha], yes, we don't really support executing Python UDFs in `flink-core`, `flink-java` and `flink-streaming-java`. The code we added there is only used to define and process the Python configuration.
First, many modules in our code base support Python, e.g. `SQL DDL`, `flink-table-planner`, `flink-table-planner-blink`, `flink-clients`, `flink-sql-client`, etc. For simplicity, let's call the modules that support Python "Python-related modules". The number of Python-related modules will grow in the future, e.g. `flink-streaming-java` (for the PyFlink DataStream API) and `flink-container` (for k8s support). Every Python-related module needs Python dependency management.
To unify the interface of Python dependency management and decouple the Python-related modules from it, we intend to store the Python dependency information as configuration. These settings will be stored in the `Configuration` object of the `ExecutionEnvironment`/`StreamExecutionEnvironment`. Once execution reaches the code in the `flink-python` module, these settings will be used to build the Python environment.
Because every Python-related module needs to read the definitions of the Python ConfigOptions, we put those definitions (i.e. the `PythonOptions` class) in `flink-core`, just like other config options. The Python-related modules also need to process these settings (i.e. register files in the distributed cache). For code reuse, we process them in the `configure()` method of `ExecutionEnvironment`/`StreamExecutionEnvironment`.
We could also do this by repeating the logic in each Python-related module, or by putting the logic in `flink-python` and calling it via reflection when needed, but neither option seems very clean.
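The "define the options once, read them from every module" idea can be sketched with a minimal typed option. This is a hypothetical analogue, not Flink's API: Flink's real `ConfigOption`/`ConfigOptions` builder is considerably richer, and `OptionSketch`, `PythonOptionsSketch`, the `python.sketch.max-archives` key and both default values are assumptions for illustration. Whichever module ends up holding such definitions (`flink-core`, as proposed here, or a shared `flink-python-common` module, as suggested in this thread), every Python-related module would reference the same constants.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Minimal, hypothetical analogue of a typed config option: a key, a default and a parser. */
final class OptionSketch<T> {
    final String key;
    final T defaultValue;
    private final Function<String, T> parser;

    OptionSketch(String key, T defaultValue, Function<String, T> parser) {
        this.key = Objects.requireNonNull(key);
        this.defaultValue = defaultValue;
        this.parser = Objects.requireNonNull(parser);
    }

    /** Reads the option from a plain string map, falling back to the default. */
    T readFrom(Map<String, String> config) {
        String raw = config.get(key);
        return raw == null ? defaultValue : parser.apply(raw);
    }
}

/** Hypothetical shared holder of option definitions. */
final class PythonOptionsSketch {
    private PythonOptionsSketch() {}

    // Key modelled on a Python option; the default "python" is an assumption.
    static final OptionSketch<String> PYTHON_EXECUTABLE =
            new OptionSketch<String>("python.executable", "python", s -> s);

    // Purely hypothetical key and default, included only to show a non-string type.
    static final OptionSketch<Integer> MAX_ARCHIVES =
            new OptionSketch<Integer>("python.sketch.max-archives", 16, Integer::parseInt);
}
```

For example, `PythonOptionsSketch.PYTHON_EXECUTABLE.readFrom(config)` returns `"python"` when the key is absent and the configured path otherwise, and every module parses the value the same way.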
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071523#comment-17071523 ] Aljoscha Krettek commented on FLINK-1: -- Sorry, I updated my comment.
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070855#comment-17070855 ] Wei Zhong commented on FLINK-1: --- Hi [~aljoscha], it looks like some of your messages have been lost. Could you resend them?
[ https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070813#comment-17070813 ] Aljoscha Krettek commented on FLINK-1: -- I think we need to take a step back: do we need to add functionality to {{flink-core}}, {{flink-java}}, etc.? We don't really support executing Python UDFs there, so we could