[jira] [Commented] (FLINK-16666) Support new Python dependency configuration options in flink-java, flink-streaming-java and flink-table

2020-04-03 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074469#comment-17074469
 ] 

Dian Fu commented on FLINK-1:
-

Thanks [~aljoscha] and [~sunjincheng121] for your suggestions; they make sense 
to me. Also thanks [~zhongwei] for preparing the new PR.

> Support new Python dependency configuration options in flink-java, 
> flink-streaming-java and flink-table
> ---
>
> Key: FLINK-1
> URL: https://issues.apache.org/jira/browse/FLINK-1
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python
>Reporter: Wei Zhong
>Assignee: Wei Zhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)



2020-04-03 Thread Wei Zhong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074451#comment-17074451
 ] 

Wei Zhong commented on FLINK-1:
---

OK, I'll update the PR and move the code back to the `flink-python` module.


2020-04-03 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074446#comment-17074446
 ] 

sunjincheng commented on FLINK-1:
-

I agree with [~aljoscha] that we should be careful when adding code to the 
core modules. It seems that the code added to the core modules only serves to 
eliminate code duplication that may be introduced in the future. I think it's 
unnecessary, at least for now. Maybe we can come up with a cleaner approach in 
the future when we actually encounter this issue. What do you think?


2020-04-02 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074068#comment-17074068
 ] 

Aljoscha Krettek commented on FLINK-1:
--

I know that it can sometimes be annoying to come up with a cleaner solution, 
but in Flink we have learned some lessons the hard way and we're still dealing 
with some mistakes that we made in the past. For example, we let our Scala 
dependency "bleed through" into so many modules, and {{flink-streaming-java}} 
depends on {{flink-runtime}}, which made it very easy for a lot of runtime 
classes to "bleed through" into the API.

It might seem like a minor thing to introduce Python support code in the core 
modules now, but past experience has shown us that we need to be very careful 
about properly abstracting and separating concerns.


2020-04-02 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074063#comment-17074063
 ] 

Aljoscha Krettek commented on FLINK-1:
--

I still think it's not the right approach to mix Python concerns into all the 
other modules. (I also think it's not optimal that {{flink-clients}} now also 
has Python-related code; this makes some things more complicated than they 
need to be.)

Ideally, neither the Table API nor the SQL client would use the environments of 
the other APIs, so in the long run there needs to be a proper solution for this 
anyway. Regarding the current solution: I think it's also problematic that the 
code mutates the Configuration, i.e. it puts in new fields derived from 
existing fields. This makes the code more fragile and/or harder to debug.

I think either the generic {{TableEnvironment(Impl)}} or the planners are the 
place to put the Python support code. For code sharing, maybe it's good to add 
a module {{flink-python-core}} or {{flink-python-common}} to isolate those 
common concerns.
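To make the point about not mutating the Configuration concrete, here is a minimal sketch of one possible reading. It assumes a plain `Map<String, String>` stands in for Flink's `Configuration` and that Python files are listed as comma-separated paths under a `python.files` key; the class and method names are hypothetical (e.g. something that could live in a module like the proposed `flink-python-common`), not existing Flink API. The helper only reads the configuration and returns the derived dependency information as a separate immutable value, instead of writing derived fields back into the configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper; a plain Map stands in for Flink's Configuration.
 * Derived information is returned as a new value, never written back.
 */
final class PythonDependencyInfo {
    // Assumed option key holding comma-separated file paths (illustrative only).
    static final String PYTHON_FILES_KEY = "python.files";

    private final List<String> pythonFiles;

    private PythonDependencyInfo(List<String> pythonFiles) {
        this.pythonFiles = Collections.unmodifiableList(pythonFiles);
    }

    /** Reads the configuration without modifying it and returns a new value object. */
    static PythonDependencyInfo fromConfig(Map<String, String> config) {
        List<String> files = new ArrayList<>();
        String raw = config.get(PYTHON_FILES_KEY);
        if (raw != null) {
            for (String part : raw.split(",")) {
                String path = part.trim();
                if (!path.isEmpty()) {
                    files.add(path);
                }
            }
        }
        return new PythonDependencyInfo(files);
    }

    List<String> getPythonFiles() {
        return pythonFiles;
    }
}
```

Because the helper accepts a read-only view of the configuration just as well, any accidental write would fail immediately, and whatever consumes the dependency list depends only on the returned value rather than on extra keys injected into the configuration.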


2020-04-02 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073593#comment-17073593
 ] 

Dian Fu commented on FLINK-1:
-

Hi [~aljoscha], as explained by [~zhongwei], the requirement is that we need to 
process the config options related to Python dependency management and register 
the dependencies in the distributed cache, so that the dependencies can be 
accessed during execution. The `configure` method of ExecutionEnvironment / 
StreamExecutionEnvironment already performs some pre-processing based on the 
configuration object, e.g. it sets up the execution config and checkpoint config 
according to the given configuration object. So conceptually, it seems a good 
place for this requirement, i.e. registering the Python dependencies in the 
distributed cache according to the given configuration object. Besides, it also 
avoids code duplication, as this requirement applies not only to the PyFlink 
Table API but also to other kinds of API, such as the PyFlink DataStream API, 
which may be introduced in the near future.

What's your thought?
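The shared registration step described in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: a plain `Map<String, String>` stands in for Flink's `Configuration`, dependencies are assumed to be comma-separated paths under a `python.files` key, `CachedFileRegistry` is a hypothetical interface modeled on the `registerCachedFile(filePath, name)` method that `ExecutionEnvironment` and `StreamExecutionEnvironment` provide, and the `python_file_<n>` cache names are purely illustrative.

```java
import java.util.Map;

/**
 * Hypothetical stand-in for the registerCachedFile(filePath, name) method
 * of ExecutionEnvironment / StreamExecutionEnvironment.
 */
interface CachedFileRegistry {
    void registerCachedFile(String filePath, String name);
}

/**
 * Hypothetical shared helper that the configure() step of each environment
 * could call, instead of every Python-related module repeating this logic.
 */
final class PythonDependencyRegistrar {
    // Assumed option key holding comma-separated file paths.
    static final String PYTHON_FILES_KEY = "python.files";

    private PythonDependencyRegistrar() {}

    static void registerPythonFiles(Map<String, String> config, CachedFileRegistry registry) {
        String raw = config.get(PYTHON_FILES_KEY);
        if (raw == null) {
            return; // no Python dependencies configured
        }
        int index = 0;
        for (String part : raw.split(",")) {
            String path = part.trim();
            if (path.isEmpty()) {
                continue;
            }
            // Illustrative naming scheme: each file gets a distinct cache entry name.
            registry.registerCachedFile(path, "python_file_" + index);
            index++;
        }
    }
}
```

A `configure()` implementation could then pass something like `env::registerCachedFile` as the registry, so every API entry point reuses the same helper; where that helper should live is exactly what the thread is debating.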


2020-03-31 Thread Wei Zhong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071640#comment-17071640
 ] 

Wei Zhong commented on FLINK-1:
---

Hi [~aljoscha],

Yes, we don't really support executing Python UDFs in `flink-core`, `flink-java` 
and `flink-streaming-java`. The code we added there is only used to define and 
process the Python configurations.

First, there are many modules in our code base that support Python, e.g. `SQL 
DDL`, `flink-table-planner`, `flink-table-planner-blink`, `flink-client`, 
`flink-sql-client`, etc. For simplicity, let's call the modules that support 
Python "python-related modules". The number of python-related modules will 
increase in the future, e.g. `flink-streaming-java` (for the PyFlink DataStream 
API) and `flink-container` (for k8s support).

The need for Python dependency management is shared by all python-related 
modules. To unify the interface of Python dependency management and decouple 
the python-related modules from it, we intend to use configurations to store 
the Python dependency information. This information will be stored in the 
`Configuration` object of the `ExecutionEnvironment/StreamExecutionEnvironment`. 
Once execution enters the code of the flink-python module, these configurations 
will be used to build the Python environment.

Because every python-related module needs to read the definition of the Python 
ConfigOptions, we put the definition of the Python ConfigOptions (i.e. the 
`PythonOptions` class) in `flink-core`, just like other config options. The 
python-related modules also need to process these configurations (i.e. register 
files in the distributed cache). For code reuse, we process them in the 
`configure()` method of `ExecutionEnvironment/StreamExecutionEnvironment`. We 
could also do this by repeating the logic in each python-related module, or by 
putting the logic in `flink-python` and calling it via reflection when needed, 
but neither of these seems very clean.
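For comparison, the reflection-based alternative mentioned above might look roughly like this minimal sketch. It assumes a plain `Map<String, String>` in place of Flink's `Configuration`; the `ReflectivePythonHook` class, the `process(Map)` entry point and `DemoPythonProcessor` are all hypothetical and do not correspond to Flink's actual API.

```java
import java.lang.reflect.Method;
import java.util.Map;

/**
 * Hypothetical hook a core module could use to call Python-specific logic
 * in flink-python without a compile-time dependency on it.
 */
final class ReflectivePythonHook {
    private ReflectivePythonHook() {}

    /**
     * Invokes a public static process(Map) method on the named class.
     * Returns false if the class is not on the classpath.
     */
    static boolean invokeIfPresent(String className, Map<String, String> config) {
        final Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return false; // Python support module not available
        }
        try {
            Method process = clazz.getMethod("process", Map.class);
            process.invoke(null, config);
            return true;
        } catch (ReflectiveOperationException e) {
            // Class found but entry point missing or failing: surface the error.
            throw new IllegalStateException("Cannot invoke process(Map) on " + className, e);
        }
    }
}

/** Hypothetical processor standing in for a class inside flink-python. */
final class DemoPythonProcessor {
    static int calls = 0;

    public static void process(Map<String, String> config) {
        calls++; // a real processor would register the configured files here
    }
}
```

Looking the class up by name keeps the core modules free of a compile-time dependency on `flink-python`, at the cost of turning a missing or renamed entry point into a runtime error instead of a compile error.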


2020-03-31 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071523#comment-17071523
 ] 

Aljoscha Krettek commented on FLINK-1:
--

Sorry, I updated my comment.


2020-03-30 Thread Wei Zhong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070855#comment-17070855
 ] 

Wei Zhong commented on FLINK-1:
---

Hi [~aljoscha], it looks like some of your messages have been lost. Could you 
resend them?


2020-03-30 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070813#comment-17070813
 ] 

Aljoscha Krettek commented on FLINK-1:
--

I think we need to take a step back: do we need to add functionality to 
{{flink-core}}, {{flink-java}}, etc.? We don't really support executing Python 
UDFs there, so we could 
