Mongo Connector

2019-10-12 Thread Vijay Srinivasaraghavan
Hello,
Do we know how much support we have for Mongo? The documentation page 
points to a connector repo that is very old (last updated 5 years ago) and 
looks like it was just sample code to showcase the integration.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/connectors.html#access-mongodb
I am planning to build a pipeline that involves heavy use of Mongo (reads as 
well as bulk upserts). Has anyone used Mongo in a pipeline and would be 
willing to share their experience?

Are there any known limitations or gotchas?
I would appreciate any input.
Regards,
Vijay

Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread Hequn Cheng
Hi Stephan,

Big +1 for adding this to Apache Flink!

As for whether this should be added to the Flink main repository, I would
prefer to put it there. Not only is Stateful Functions closely related to
the current Flink, but other libraries or modules in Flink could also make
use of it in the future. At that point the Flink API stack would also change
a bit, which would be cool.

Best, Hequn

On Sat, Oct 12, 2019 at 9:16 PM Biao Liu  wrote:

> Hi Stephan,
>
> +1 for having Stateful Functions in Flink.
>
> Before discussing which repository it should belong to, I was wondering
> whether we have reached an agreement on "splitting the Flink repository" as
> Piotr mentioned. It seems there has been no further discussion.
> It's OK for me to add it to the core repository. After all, almost everything
> is in the core repository now. But if we decide to split the core repository
> someday, I would lean toward creating a separate repository for Stateful
> Functions. It might be a good time to take the first step of splitting.
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Sat, 12 Oct 2019 at 19:31, Yu Li  wrote:
>
>> Hi Stephan,
>>
>> Big +1 for adding stateful functions to Flink. I believe a lot of users
>> would be interested in trying this out, and I can imagine how this could
>> help reduce the TCO for businesses requiring both stream processing and
>> stateful functions.
>>
>> And my 2 cents is to put it into the Flink core repository, since I see
>> a tight connection between this library and Flink state.
>>
>> Best Regards,
>> Yu
>>
>>
>> On Sat, 12 Oct 2019 at 17:31, jincheng sun 
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> Big +1 for adding this great feature to Apache Flink.
>>>
>>> Regarding where we should place it, in the Flink core repository or in
>>> a separate repository: I prefer putting it into the main repository and am
>>> looking forward to a more detailed discussion of this decision.
>>>
>>> Best,
>>> Jincheng
>>>
>>>
>>> On Sat, Oct 12, 2019 at 11:32 AM Jingsong Li wrote:
>>>
 Hi Stephan,

 big +1 for this contribution. It provides another user interface that
 is easy to use and popular at this time. these functions, It's hard for
 users to write in SQL/TableApi, while using DataStream is too complex.
 (We've done some stateFun kind jobs using DataStream before). With
 statefun, it is very easy.

 I think it's also a good opportunity to exercise Flink's core
 capabilities. I looked at stateful-functions-flink briefly, it is very
 interesting. I think there are many other things Flink can improve. So I
 think it's a better thing to put it into Flink, and the improvement for it
 will be more natural in the future.

 Best,
 Jingsong Lee

 On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz <
 dwysakow...@apache.org> wrote:

> Hi Stephan,
>
> I think this is a nice library, but what I like more about it is that
> it suggests exploring different use-cases. I think it definitely makes
> sense for the Flink community to explore more lightweight applications 
> that
> reuses resources. Therefore I definitely think it is a good idea for Flink
> community to accept this contribution and help maintaining it.
>
> Personally I'd prefer to have it in a separate repository. There were
> a few discussions before where different people were suggesting to extract
> connectors and other libraries to separate repositories. Moreover I think
> it could serve as an example for the Flink ecosystem website[1]. This 
> could
> be the first project in there and give a good impression that the 
> community
> sees potential in the ecosystem website.
>
> Lastly, I'm wondering if this should go through PMC vote according to
> our bylaws[2]. In the end the suggestion is to adopt an existing code base
> as is. It also proposes a new programs concept that could result in a 
> shift
> of priorities for the community in a long run.
>
> Best,
>
> Dawid
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html
>
> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> On 11/10/2019 13:12, Till Rohrmann wrote:
>
> Hi Stephan,
>
> +1 for adding stateful functions to Flink. I believe the new set of
> applications this feature will unlock will be super interesting for new 
> and
> existing Flink users alike.
>
> One reason for not including it in the main repository would to not
> being bound to Flink's release cadence. This would allow to release faster
> and more often. However, I believe that having it eventually in Flink's
> main repository would be beneficial in the long run.
>
> Cheers,
> Till
>
> On Fri, Oct 11, 2019 at 12:56 PM Trevor Grant <

Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread Biao Liu
Hi Stephan,

+1 for having Stateful Functions in Flink.

Before discussing which repository it should belong to, I was wondering
whether we have reached an agreement on "splitting the Flink repository" as
Piotr mentioned. It seems there has been no further discussion.
It's OK for me to add it to the core repository. After all, almost everything
is in the core repository now. But if we decide to split the core repository
someday, I would lean toward creating a separate repository for Stateful
Functions. It might be a good time to take the first step of splitting.

Thanks,
Biao /'bɪ.aʊ/



On Sat, 12 Oct 2019 at 19:31, Yu Li  wrote:

> Hi Stephan,
>
> Big +1 for adding stateful functions to Flink. I believe a lot of users
> would be interested in trying this out, and I can imagine how this could
> help reduce the TCO for businesses requiring both stream processing and
> stateful functions.
>
> And my 2 cents is to put it into the Flink core repository, since I see a
> tight connection between this library and Flink state.
>
> Best Regards,
> Yu
>
>
> On Sat, 12 Oct 2019 at 17:31, jincheng sun 
> wrote:
>
>> Hi Stephan,
>>
>> Big +1 for adding this great feature to Apache Flink.
>>
>> Regarding where we should place it, in the Flink core repository or in
>> a separate repository: I prefer putting it into the main repository and am
>> looking forward to a more detailed discussion of this decision.
>>
>> Best,
>> Jincheng
>>
>>
>> On Sat, Oct 12, 2019 at 11:32 AM Jingsong Li wrote:
>>
>>> Hi Stephan,
>>>
>>> Big +1 for this contribution. It provides another user interface that is
>>> easy to use and popular at this time. Such functions are hard for users
>>> to write in SQL/Table API, while using DataStream is too complex. (We've
>>> done some StateFun-like jobs using DataStream before.) With statefun, it is
>>> very easy.
>>>
>>> I think it's also a good opportunity to exercise Flink's core
>>> capabilities. I looked at stateful-functions-flink briefly, it is very
>>> interesting. I think there are many other things Flink can improve. So I
>>> think it's a better thing to put it into Flink, and the improvement for it
>>> will be more natural in the future.
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz 
>>> wrote:
>>>
 Hi Stephan,

 I think this is a nice library, but what I like more about it is that
 it suggests exploring different use-cases. I think it definitely makes
 sense for the Flink community to explore more lightweight applications that
 reuses resources. Therefore I definitely think it is a good idea for Flink
 community to accept this contribution and help maintaining it.

 Personally I'd prefer to have it in a separate repository. There were a
 few discussions before where different people were suggesting to extract
 connectors and other libraries to separate repositories. Moreover I think
 it could serve as an example for the Flink ecosystem website[1]. This could
 be the first project in there and give a good impression that the community
 sees potential in the ecosystem website.

 Lastly, I'm wondering if this should go through PMC vote according to
 our bylaws[2]. In the end the suggestion is to adopt an existing code base
 as is. It also proposes a new programs concept that could result in a shift
 of priorities for the community in a long run.

 Best,

 Dawid

 [1]
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html

 [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
 On 11/10/2019 13:12, Till Rohrmann wrote:

 Hi Stephan,

 +1 for adding stateful functions to Flink. I believe the new set of
 applications this feature will unlock will be super interesting for new and
 existing Flink users alike.

 One reason for not including it in the main repository would to not
 being bound to Flink's release cadence. This would allow to release faster
 and more often. However, I believe that having it eventually in Flink's
 main repository would be beneficial in the long run.

 Cheers,
 Till

 On Fri, Oct 11, 2019 at 12:56 PM Trevor Grant 
 wrote:

> +1 non-binding on contribution.
>
> Separate repo, or feature branch to start maybe? I just feel like in
> the beginning this thing is going to have lots of breaking changes that
> maybe aren't going to fit well with tests / other "v1+" release code. Just
> my .02.
>
>
>
> On Fri, Oct 11, 2019 at 4:38 AM Stephan Ewen  wrote:
>
>> Dear Flink Community!
>>
>> Some of you probably heard it already: On Tuesday, at Flink Forward
>> Berlin, we announced **Stateful Functions**.
>>
>> Stateful Functions is a library on Flink to implement general purpose
>> applications. It is built around stateful functions (who 

[VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-12 Thread Wei Zhong
Hi all,

I would like to start the vote for FLIP-78[1], which has been discussed and 
reached consensus in the discussion thread[2].

The vote will be open for at least 72 hours. I'll try to close it by 2019-10-16 
18:00 UTC, unless there is an objection or not enough votes.

Thanks,
Wei

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management

[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html





[jira] [Created] (FLINK-14383) Support python UDFs with constant value of time interval types

2019-10-12 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-14383:
---

 Summary: Support python UDFs with constant value of time interval types
 Key: FLINK-14383
 URL: https://issues.apache.org/jira/browse/FLINK-14383
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Hequn Cheng
 Fix For: 1.10.0


As discussed 
[here|https://github.com/apache/flink/pull/9858#issuecomment-541312088], this 
issue is dedicated to adding support for Python UDFs with constant values of 
time interval types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14382) Wrong local FLINK_PLUGINS_DIR is set to flink-conf.yaml of jobmanager and taskmanager on Yarn

2019-10-12 Thread Yang Wang (Jira)
Yang Wang created FLINK-14382:
-

 Summary: Wrong local FLINK_PLUGINS_DIR is set to flink-conf.yaml of jobmanager and taskmanager on Yarn
 Key: FLINK-14382
 URL: https://issues.apache.org/jira/browse/FLINK-14382
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Reporter: Yang Wang


If we do not set FLINK_PLUGINS_DIR in flink-conf.yaml, it will be set to [flink 
configuration|https://github.com/apache/flink/blob/9e6ff81e22d6f5f04abb50ca1aea84fd2542bf9d/flink-core/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java#L158]
 according to the environment.

In YARN mode, the local path is written to flink-conf.yaml and used by the 
jobmanager and taskmanager, so a warning like the one below appears in the log. 
{code:java}
2019-10-12 19:24:58,165 WARN  org.apache.flink.core.plugin.PluginConfig - Environment variable [FLINK_PLUGINS_DIR] is set to [/Users/wangy/IdeaProjects/apache-flink/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/plugins] but the directory doesn't exist
{code}

It was introduced by FLINK-12143.
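
To make the failure mode concrete, here is a minimal Python sketch of the kind of check that produces the warning above. It is an illustration only, not Flink's actual PluginConfig code: the function name `resolve_plugins_dir` and the example path are hypothetical, and only the environment variable name and the warning text are taken from the log.

```python
import os
from typing import Mapping, Optional, Tuple

ENV_PLUGINS_DIR = "FLINK_PLUGINS_DIR"


def resolve_plugins_dir(env: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (plugins_dir, warning) for the given environment mapping.

    Hypothetical helper: it mimics the check behind the warning in the log,
    it is not the Flink implementation.
    """
    configured = env.get(ENV_PLUGINS_DIR)
    if configured is None:
        # Nothing configured, nothing to warn about.
        return None, None
    if not os.path.isdir(configured):
        # A path that was valid on the client machine does not exist here.
        warning = (
            f"Environment variable [{ENV_PLUGINS_DIR}] is set to "
            f"[{configured}] but the directory doesn't exist"
        )
        return None, warning
    return configured, None


# A client-side path shipped to a cluster node (made-up example path).
client_path = "/path/that/exists/only/on/the/client/plugins"
plugins_dir, warning = resolve_plugins_dir({ENV_PLUGINS_DIR: client_path})
print(warning)
```

The point of the sketch is that the path is resolved on the submitting machine but checked on the YARN container, where the same directory is not expected to exist.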



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread Yu Li
Hi Stephan,

Big +1 for adding stateful functions to Flink. I believe a lot of users
would be interested in trying this out, and I can imagine how this could
help reduce the TCO for businesses requiring both stream processing and
stateful functions.

And my 2 cents is to put it into the Flink core repository, since I see a
tight connection between this library and Flink state.

Best Regards,
Yu


On Sat, 12 Oct 2019 at 17:31, jincheng sun  wrote:

> Hi Stephan,
>
> Big +1 for adding this great feature to Apache Flink.
>
> Regarding where we should place it, in the Flink core repository or in
> a separate repository: I prefer putting it into the main repository and am
> looking forward to a more detailed discussion of this decision.
>
> Best,
> Jincheng
>
>
> On Sat, Oct 12, 2019 at 11:32 AM Jingsong Li wrote:
>
>> Hi Stephan,
>>
>> Big +1 for this contribution. It provides another user interface that is
>> easy to use and popular at this time. Such functions are hard for users
>> to write in SQL/Table API, while using DataStream is too complex. (We've
>> done some StateFun-like jobs using DataStream before.) With statefun, it is
>> very easy.
>>
>> I think it's also a good opportunity to exercise Flink's core
>> capabilities. I looked at stateful-functions-flink briefly, it is very
>> interesting. I think there are many other things Flink can improve. So I
>> think it's a better thing to put it into Flink, and the improvement for it
>> will be more natural in the future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz 
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> I think this is a nice library, but what I like more about it is that it
>>> suggests exploring different use-cases. I think it definitely makes sense
>>> for the Flink community to explore more lightweight applications that
>>> reuses resources. Therefore I definitely think it is a good idea for Flink
>>> community to accept this contribution and help maintaining it.
>>>
>>> Personally I'd prefer to have it in a separate repository. There were a
>>> few discussions before where different people were suggesting to extract
>>> connectors and other libraries to separate repositories. Moreover I think
>>> it could serve as an example for the Flink ecosystem website[1]. This could
>>> be the first project in there and give a good impression that the community
>>> sees potential in the ecosystem website.
>>>
>>> Lastly, I'm wondering if this should go through PMC vote according to
>>> our bylaws[2]. In the end the suggestion is to adopt an existing code base
>>> as is. It also proposes a new programs concept that could result in a shift
>>> of priorities for the community in a long run.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html
>>>
>>> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
>>> On 11/10/2019 13:12, Till Rohrmann wrote:
>>>
>>> Hi Stephan,
>>>
>>> +1 for adding stateful functions to Flink. I believe the new set of
>>> applications this feature will unlock will be super interesting for new and
>>> existing Flink users alike.
>>>
>>> One reason for not including it in the main repository would to not
>>> being bound to Flink's release cadence. This would allow to release faster
>>> and more often. However, I believe that having it eventually in Flink's
>>> main repository would be beneficial in the long run.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Oct 11, 2019 at 12:56 PM Trevor Grant 
>>> wrote:
>>>
 +1 non-binding on contribution.

 Separate repo, or feature branch to start maybe? I just feel like in
 the beginning this thing is going to have lots of breaking changes that
 maybe aren't going to fit well with tests / other "v1+" release code. Just
 my .02.



 On Fri, Oct 11, 2019 at 4:38 AM Stephan Ewen  wrote:

> Dear Flink Community!
>
> Some of you probably heard it already: On Tuesday, at Flink Forward
> Berlin, we announced **Stateful Functions**.
>
> Stateful Functions is a library on Flink to implement general purpose
> applications. It is built around stateful functions (who would have thunk)
> that can communicate arbitrarily through messages, have consistent
> state, and a small resource footprint. They are a bit like keyed
> ProcessFunctions
> that can send each other messages.
> As simple as this sounds, this means you can now communicate in
> non-DAG patterns, so it allows users to build programs they cannot build
> with Flink.
> It also has other neat properties, like multiplexing of functions,
> modular composition, tooling both container-based deployments and
> as-a-Flink-job deployments.
>
> You can find out more about it here
>   - Website: https://statefun.io/
>   - Code: https://github.com/ververica/stateful-functions
>   - Talk with 

Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-10-12 Thread jincheng sun
Hi Wei Zhong,

I have given you edit permission. Could you please log in again and check? :)

Best,
Jincheng


On Sat, Oct 12, 2019 at 6:06 PM Wei Zhong wrote:

> Hi Jincheng,
>
> As communicated in this email thread, I’m proposing to convert the design
> doc to a FLIP and bring up the VOTE. It would be great if you can grant me
> the write access to Confluence :). My Confluence ID is zhongwei.
>
> Thanks,
> Wei Zhong
>
>
> On Oct 12, 2019, at 17:41, jincheng sun wrote:
>
> Hi,
>
> + 1 to bring up the VOTE and create the FLIP.
>
> Best,
> Jincheng
>
> On Sat, Oct 12, 2019 at 10:12 AM Dian Fu wrote:
>
>> Hi Wei,
>>
>> Thanks for the great work! It seems that we have reached an agreement on
>> the design. Should we start a VOTE on it? I'm also wondering whether a
>> FLIP is warranted, as it introduces a user-facing API. If so, we should
>> create a FLIP before the VOTE.
>>
>> Thanks,
>> Dian
>>
>> > On Oct 9, 2019, at 11:23 AM, Wei Zhong wrote:
>> >
>> > Hi Jincheng, Dian and Jeff,
>> >
>> > Thank you for your replies and comments in the Google doc! I think we have
>> > come to an agreement on the design doc, with only minor changes as follows:
>> > - Using the API "set_python_executable" instead of
>> > "set_environment_variable" to set the python executable file path.
>> > - Making the argument "requirements_cached_dir" of the API
>> > "set_python_requirements" optional, to support uploading only a
>> > requirement.txt file.
>> >
>> > I'm also glad to hear any other opinions!
>> >
>> > Thanks,
>> > Wei
>> >
>> >
>> >> On Sep 26, 2019, at 15:23, Dian Fu wrote:
>> >>
>> >> Hi Wei,
>> >>
>> >> Thanks a lot for bringing up this discussion. Python dependency
>> >> management is very important for Python users. I have left a few comments
>> >> on the design doc.
>> >>
>> >> Thanks,
>> >> Dian
>> >>
>> >>> On Sep 26, 2019, at 12:23 PM, jincheng sun wrote:
>> >>>
>> >>> Thanks for bringing up the discussion, Wei.
>> >>> Overall the design doc looks good. I have left a few comments.
>> >>>
>> >>> BTW: Dependency management is very important for Python UDFs, and
>> >>> everyone is welcome to leave suggestions!
>> >>>
>> >>> Best,
>> >>> Jincheng
>> >>>
>> >>> On Thu, Sep 26, 2019 at 11:59 AM Wei Zhong wrote:
>> >>>
>>  Hi everyone,
>> 
>>  In FLIP-58 [1] we have a plan to support Python UDF. As a critical
>> part of
>>  python UDF, the environment and dependency management of users'
>> python code
>>  has not been fully discussed.
>> 
>>  I'd like to start a discussion on "Flink Python UDF Environment and
>>  Dependency Management". Here is the design doc I drafted:
>> 
>> 
>> 
>> https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
>> 
>>  Please take a look, and feedbacks are welcome.
>> 
>>  Thanks,
>>  Wei
>> 
>>  [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>> 
>> >>
>> >
>>
>>
>
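
The two API changes listed in Wei's Oct 9 message quoted above can be pictured with a small Python sketch. This is a stand-in written for illustration, not the PyFlink implementation: the class `PythonDependencyConfig` is hypothetical, and only the method names "set_python_executable" and "set_python_requirements" and the argument name "requirements_cached_dir" come from the thread. The file names and paths are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PythonDependencyConfig:
    """Hypothetical holder for the settings discussed in the thread."""

    python_executable: Optional[str] = None
    requirements_file: Optional[str] = None
    requirements_cached_dir: Optional[str] = None

    def set_python_executable(self, path: str) -> None:
        # Dedicated setter, instead of a generic set_environment_variable call.
        self.python_executable = path

    def set_python_requirements(
        self,
        requirements_file: str,
        requirements_cached_dir: Optional[str] = None,
    ) -> None:
        # The cached directory is optional, so uploading only the
        # requirements file is a valid call.
        self.requirements_file = requirements_file
        self.requirements_cached_dir = requirements_cached_dir


config = PythonDependencyConfig()
config.set_python_executable("/usr/bin/python3")
config.set_python_requirements("requirements.txt")  # no cached dir given
```

Making the second argument optional keeps the simple case (only a requirements file) to a single call, while still allowing a pre-downloaded package directory to be passed when one exists.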


Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-10-12 Thread Wei Zhong
Thank you Jincheng. I have got the permission.

> On Oct 12, 2019, at 18:09, jincheng sun wrote:
> 
> Hi Wei Zhong,
> 
> I have given you edit permission. Could you please log in again and check? :)
> 
> Best,
> Jincheng
> 
> 
> On Sat, Oct 12, 2019 at 6:06 PM Wei Zhong <weizhong0...@gmail.com> wrote:
> Hi Jincheng,
> 
> As communicated in this email thread, I’m proposing to convert the design doc 
> to a FLIP and bring up the VOTE. It would be great if you can grant me the 
> write access to Confluence :). My Confluence ID is zhongwei.
> 
> Thanks,
> Wei Zhong
> 
> 
>> On Oct 12, 2019, at 17:41, jincheng sun wrote:
>> 
>> Hi,
>> 
>> + 1 to bring up the VOTE and create the FLIP.
>> 
>> Best, 
>> Jincheng
>> 
>> On Sat, Oct 12, 2019 at 10:12 AM Dian Fu <dian0511...@gmail.com> wrote:
>> Hi Wei,
>> 
>> Thanks for the great work! It seems that we have reached an agreement on the 
>> design. Should we start a VOTE on it? I'm also wondering whether a FLIP is 
>> warranted, as it introduces a user-facing API. If so, we should create a FLIP 
>> before the VOTE.
>> 
>> Thanks,
>> Dian
>> 
>> > On Oct 9, 2019, at 11:23 AM, Wei Zhong wrote:
>> > 
>> > Hi Jincheng, Dian and Jeff,
>> > 
>> > Thank you for your replies and comments in the Google doc! I think we have 
>> > come to an agreement on the design doc, with only minor changes as follows:
>> > - Using the API "set_python_executable" instead of 
>> > "set_environment_variable" to set the python executable file path.
>> > - Making the argument "requirements_cached_dir" of the API 
>> > "set_python_requirements" optional, to support uploading only a 
>> > requirement.txt file.
>> > 
>> > I'm also glad to hear any other opinions!
>> > 
>> > Thanks,
>> > Wei
>> > 
>> > 
>> >> On Sep 26, 2019, at 15:23, Dian Fu wrote:
>> >> 
>> >> Hi Wei,
>> >> 
>> >> Thanks a lot for bringing up this discussion. Python dependency 
>> >> management is very important for Python users. I have left a few comments 
>> >> on the design doc.
>> >> 
>> >> Thanks,
>> >> Dian
>> >> 
>> >>> On Sep 26, 2019, at 12:23 PM, jincheng sun wrote:
>> >>> 
>> >>> Thanks for bringing up the discussion, Wei.
>> >>> Overall the design doc looks good. I have left a few comments.
>> >>> 
>> >>> BTW: Dependency management is very important for Python UDFs, and
>> >>> everyone is welcome to leave suggestions!
>> >>> 
>> >>> Best,
>> >>> Jincheng
>> >>> 
>> >>> On Thu, Sep 26, 2019 at 11:59 AM Wei Zhong <weizhong0...@gmail.com> wrote:
>> >>> 
>>  Hi everyone,
>>  
>>  In FLIP-58 [1] we have a plan to support Python UDF. As a critical part 
>>  of
>>  python UDF, the environment and dependency management of users' python 
>>  code
>>  has not been fully discussed.
>>  
>>  I'd like to start a discussion on "Flink Python UDF Environment and
>>  Dependency Management". Here is the design doc I drafted:
>>  
>>  
>>  https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
>>   
>>  
>>  
>>  Please take a look, and feedbacks are welcome.
>>  
>>  Thanks,
>>  Wei
>>  
>>  [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>  
>>  
>> >> 
>> > 
>> 
> 



Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-10-12 Thread Wei Zhong
Hi Jincheng,

As communicated in this email thread, I’m proposing to convert the design doc 
to a FLIP and bring up the VOTE. It would be great if you can grant me the 
write access to Confluence :). My Confluence ID is zhongwei.

Thanks,
Wei Zhong


> On Oct 12, 2019, at 17:41, jincheng sun wrote:
> 
> Hi,
> 
> + 1 to bring up the VOTE and create the FLIP.
> 
> Best, 
> Jincheng
> 
> On Sat, Oct 12, 2019 at 10:12 AM Dian Fu <dian0511...@gmail.com> wrote:
> Hi Wei,
> 
> Thanks for the great work! It seems that we have reached an agreement on the 
> design. Should we start a VOTE on it? I'm also wondering whether a FLIP is 
> warranted, as it introduces a user-facing API. If so, we should create a FLIP 
> before the VOTE.
> 
> Thanks,
> Dian
> 
> > On Oct 9, 2019, at 11:23 AM, Wei Zhong wrote:
> > 
> > Hi Jincheng, Dian and Jeff,
> > 
> > Thank you for your replies and comments in the Google doc! I think we have 
> > come to an agreement on the design doc, with only minor changes as follows:
> > - Using the API "set_python_executable" instead of 
> > "set_environment_variable" to set the python executable file path.
> > - Making the argument "requirements_cached_dir" of the API 
> > "set_python_requirements" optional, to support uploading only a 
> > requirement.txt file.
> > 
> > I'm also glad to hear any other opinions!
> > 
> > Thanks,
> > Wei
> > 
> > 
> >> On Sep 26, 2019, at 15:23, Dian Fu wrote:
> >> 
> >> Hi Wei,
> >> 
> >> Thanks a lot for bringing up this discussion. Python dependency management 
> >> is very important for Python users. I have left a few comments on the 
> >> design doc.
> >> 
> >> Thanks,
> >> Dian
> >> 
> >>> On Sep 26, 2019, at 12:23 PM, jincheng sun wrote:
> >>> 
> >>> Thanks for bringing up the discussion, Wei.
> >>> Overall the design doc looks good. I have left a few comments.
> >>> 
> >>> BTW: Dependency management is very important for Python UDFs, and
> >>> everyone is welcome to leave suggestions!
> >>> 
> >>> Best,
> >>> Jincheng
> >>> 
> >>> On Thu, Sep 26, 2019 at 11:59 AM Wei Zhong <weizhong0...@gmail.com> wrote:
> >>> 
>  Hi everyone,
>  
>  In FLIP-58 [1] we have a plan to support Python UDF. As a critical part 
>  of
>  python UDF, the environment and dependency management of users' python 
>  code
>  has not been fully discussed.
>  
>  I'd like to start a discussion on "Flink Python UDF Environment and
>  Dependency Management". Here is the design doc I drafted:
>  
>  
>  https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
>   
>  
>  
>  Please take a look, and feedbacks are welcome.
>  
>  Thanks,
>  Wei
>  
>  [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>  
> >> 
> > 
> 



Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-12 Thread Jark Wu
Hi Jingsong, 

Thanks for verifying. I updated the fixVersion to 1.9.2 for these issues. 

Best,
Jark

> On Oct 12, 2019, at 16:45, Jingsong Li wrote:
> 
> +1 (non-binding)
> 
> - Checked that the checksum files match the corresponding release files
> - Checked that the GPG files match the corresponding release files
> - Verified that the source archives do not contain any binaries
> - Built the source with Maven to ensure all source files have Apache headers
> - Checked that all POM files point to the same version (1.9.1)
> - Started a local cluster for both Scala 2.11 and 2.12 and shut it down;
> verified the out and log files, verified the web UI, and ran the examples.
> All succeeded.
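
The checksum check in the list above can be sketched in a few lines of Python. This is a generic illustration rather than the tooling any of the voters used. It assumes a SHA-512 digest stored in `sha512sum` style (the hex digest first, optionally followed by a file name), and the function names are hypothetical.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path: str, checksum_path: str) -> bool:
    """Compare an artifact against a checksum file.

    Assumes the checksum file starts with the hex digest (sha512sum style).
    """
    with open(checksum_path, "r", encoding="utf-8") as f:
        fields = f.read().split()
    if not fields:
        return False  # an empty checksum file can never match
    return sha512_of(artifact_path) == fields[0].lower()
```

Reading in chunks keeps memory use flat for large release archives. The signature check from the second bullet is a separate step that needs GPG and the release manager's public key, and is not covered by this sketch.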
> 
> Hi Jark, some JIRA issues still use fix version 1.9.0. Do you need to
> modify the fix version?
> https://issues.apache.org/jira/browse/FLINK-14328
> https://issues.apache.org/jira/browse/FLINK-14327
> https://issues.apache.org/jira/browse/FLINK-14215
> https://issues.apache.org/jira/browse/FLINK-14072
> https://issues.apache.org/jira/browse/FLINK-12576
> 
> Best,
> Jingsong Lee
> 
> 
> On Wed, Oct 9, 2019 at 3:32 PM Jark Wu  wrote:
> 
>> +1 from my side.
>> 
>> - checked signatures and hashes
>> - checked that all POM files point to the same version
>> - verified that the source archives do not contain any binaries
>> - built the source release with Scala 2.12 and Scala 2.11 successfully
>> - manually verified the diff of the pom files between 1.9.0 and 1.9.1 to check
>> dependencies, looks good
>> - started cluster for both Scala 2.11 and 2.12, ran examples, verified web
>> ui and log output, nothing unexpected
>> 
>> Best,
>> Jark
>> 
>> On Wed, 9 Oct 2019 at 11:18, Jark Wu  wrote:
>> 
>>> Thanks Jincheng and Till, then let's keep on verifying the RC1.
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Wed, 9 Oct 2019 at 11:00, jincheng sun 
>>> wrote:
>>> 
 I think we should create a new RC only when we find blocker issues.
 We can wait for the other check results, and add the fix for
 FLINK-14315 to 1.9.1 only if we find blockers.
 
 Best,
 Jincheng
 
 On Tue, Oct 8, 2019 at 8:20 PM Till Rohrmann wrote:
 
> FLINK-14315 has been merged into the release-1.9 branch. I've marked
>> the
> fix version of this ticket as 1.9.2. If we should create a new RC, then
> we
> could include this fix. If this happens, then we need to update the fix
> version to 1.9.1.
> 
> Cheers,
> Till
> 
> On Tue, Oct 8, 2019 at 1:51 PM Till Rohrmann 
> wrote:
> 
>> If people already spent time on verifying the current RC I would also
> be
>> fine to release the fix for FLINK-14315 with Flink 1.9.2.
>> 
>> I will try to merge the PR as soon as possible. When I close the
> ticket, I
>> will update the fix version field to 1.9.2.
>> 
>> Cheers,
>> Till
>> 
>> On Tue, Oct 8, 2019 at 4:43 AM Jark Wu  wrote:
>> 
>>> Hi Zili,
>>> 
>>> Thanks for reminding me this, because of the Chinese National Day
>> and
>>> Flink Forward Europe,
>>> we didn't receive any verification on the 1.9.1 RC1. And I guess we
> have
>>> to extend the voting time after Flink Forward.
>>> So I'm fine to have FLINK-14315 and rebuild another RC. What do you
> think
>>> @Till @Jincheng?
>>> 
>>> I guess FLINK-14315 will be merged soon as it is approved 4 days
>> ago?
>>> Could you help to merge it once it is passed ? @Zili Chen
>>> 
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Tue, 8 Oct 2019 at 09:14, Zili Chen 
>> wrote:
>>> 
 Hi Jark,
 
 I notice a critical bug[1] is marked resolved in 1.9.1 but given
> 1.9.1
 has been cut I'd like to throw the issue here so that we're sure
 whether or not it is included in 1.9.1.
 
 Best,
 tison.
 
 [1] https://issues.apache.org/jira/browse/FLINK-14315
 
 
 On Mon, Sep 30, 2019 at 3:25 PM Jark Wu wrote:
 
> Hi everyone,
> 
> Please review and vote on the release candidate #1 for the version
> 1.9.1,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific
> comments)
> 
> 
> The complete staging area is available for your review, which
> includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience
> releases to
> be
> deployed to dist.apache.org [2], which are signed with the key
>> with
> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> * all artifacts to be deployed to the Maven Central Repository
>> [4],
> * source code tag "release-1.9.1-rc1" [5],
> * website pull request listing the new release and adding
> announcement
> blog
> post [6].
> 
> The vote will be open for at least 72 hours.
> Please cast your votes 

Re: [DISCUSS] Drop Python 2 support for 1.10

2019-10-12 Thread jincheng sun
Hi Dian,

I think it's better to bring up the VOTE for this proposal and then push this
forward. :)

Thanks,
Jincheng

On Thu, Oct 10, 2019 at 8:07 PM Timo Walther wrote:

> I also heard from other companies that upgrading to Python 3 is in
> progress for data teams.
>
> +1 for simplifying the code base with option 1).
>
> Thanks,
> Timo
>
> On 08.10.19 16:34, Dian Fu wrote:
> > Hi everyone,
> >
> > I would like to propose dropping Python 2 support (currently Python 2.7,
> 3.5, 3.6, and 3.7 are all supported in Flink), as Python 2 reaches end of
> life on Jan 1, 2020 [1]. A lot of projects [2][3][4] have already announced
> or are planning to drop Python 2 support.
> >
> > The benefits of dropping Python 2 support are:
> > 1. Maintaining Python 2/3 compatibility is a burden and it makes the
> code complicated, as Python 2 and Python 3 are not compatible.
> > 2. There are many features which are only available in Python 3.x, such
> as Type Hints[5]. We can only make use of such features after
> dropping Python 2 support.
> > 3. Flink-python depends on third-party projects, such as Apache Beam (and may
> add more dependencies such as pandas in the near future); it's not
> possible to upgrade them to the latest version once they drop Python 2
> support.
> >
> > Here are the options we have:
> > 1. Drop Python 2 support in 1.10:
> > The flink-python module is new, added in 1.9.0, so dropping
> Python 2 support at this early stage seems a good choice for us.
> > 2. Deprecate Python 2 in 1.10 and drop its support in 1.11:
> > 1.10 is planned to be released around the beginning of 2020, so this is
> also aligned with the end of official Python 2 support.
> >
> > Personally I prefer option 1, as flink-python is a new module and there is
> not much historical baggage to consider.
> >
> > Looking forward to your feedback!
> >
> > Regards,
> > Dian
> >
> > [1] https://pythonclock.org/ 
> > [2] https://python3statement.org/ 
> > [3]
> https://spark.apache.org/news/plan-for-dropping-python-2-support.html <
> https://spark.apache.org/news/plan-for-dropping-python-2-support.html>
> > [4]
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> <
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> >
> > [5]
> https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5
> <
> https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5
> >
>
>
>
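
For readers unfamiliar with benefit 2 in the quoted proposal, type hints look
roughly like the following. This is only a minimal sketch: the function name
`word_counts` and its behavior are made up for illustration and do not come
from flink-python. Python 2 rejects the annotation syntax with a SyntaxError,
which is why such code cannot live in a code base that must still run on 2.7.

```python
from typing import Dict, List


def word_counts(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word occurs in `lines`.

    The parameter and return annotations are the "type hints"; they are
    not enforced at runtime but can be checked by external tools.
    """
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The annotations use only function-signature syntax, so the sketch should also
run on Python 3.5, the oldest Python 3 version mentioned in the proposal.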


Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-10-12 Thread jincheng sun
Hi,

+1 to bringing up the VOTE and creating the FLIP.

Best,
Jincheng

On Sat, Oct 12, 2019 at 10:12 AM Dian Fu wrote:

> Hi Wei,
>
> Thanks for the great work! It seems that we have reached an agreement on
> the design. Should we start a VOTE on this design? I'm also wondering if a
> FLIP is warranted, as it introduces a user-facing API. If so, we should create
> a FLIP before the VOTE.
>
> Thanks,
> Dian
>
> > On Oct 9, 2019, at 11:23 AM, Wei Zhong wrote:
> >
> > Hi Jincheng, Dian and Jeff,
> >
> > Thank you for your replies and comments in the google doc! I think we have
> come to an agreement on the design doc with only minor changes, as follows:
> > - Using the API "set_python_executable" instead of
> "set_environment_variable" to set the python executable file path.
> > - Making the argument "requirements_cached_dir" of API
> "set_python_requirements" optional to support uploading only a requirement.txt
> file.
> >
> > I'm also glad to hear any other opinions!
> >
> > Thanks,
> > Wei
> >
> >
> >> On Sep 26, 2019, at 15:23, Dian Fu wrote:
> >>
> >> Hi Wei,
> >>
> >> Thanks a lot for bringing up this discussion. Python dependency
> management is very important for Python users. I have left a few comments
> on the design doc.
> >>
> >> Thanks,
> >> Dian
> >>
> >>> On Sep 26, 2019, at 12:23 PM, jincheng sun wrote:
> >>>
> >>> Thanks for bringing up the discussion, Wei.
> >>> Overall the design doc looks good. I have left a few comments.
> >>>
> >>> BTW: Dependency Management is very important for Python UDFs; everyone
> >>> is welcome to leave suggestions!
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> On Thu, Sep 26, 2019 at 11:59 AM Wei Zhong wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> In FLIP-58 [1] we have a plan to support Python UDF. As a critical
> part of
> >>>> python UDF, the environment and dependency management of users'
> python code
> >>>> has not been fully discussed.
> >>>>
> >>>> I'd like to start a discussion on "Flink Python UDF Environment and
> >>>> Dependency Management". Here is the design doc I drafted:
> >>>>
> >>>>
> >>>>
> https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
> >>>>
> >>>> Please take a look; feedback is welcome.
> >>>>
> >>>> Thanks,
> >>>> Wei
> >>>>
> >>>> [1]:
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >>>> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Stateless+Function+for+Table
> >
> >>>>
> >>>>
> >>
> >
>
>


Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread jincheng sun
Hi Stephan,

Big +1 for adding this great feature to Apache Flink.

Regarding where we should place it, in the Flink core repository or in
a separate repository: I prefer putting it into the main repository, and I am
looking forward to a more detailed discussion of this decision.

Best,
Jincheng


On Sat, Oct 12, 2019 at 11:32 AM Jingsong Li wrote:

> Hi Stephan,
>
> big +1 for this contribution. It provides another user interface that is
> easy to use and popular at this time. Such functions are hard for users
> to write in SQL/Table API, while using DataStream is too complex. (We've
> done some StateFun-style jobs using DataStream before.) With statefun, it is
> very easy.
>
> I think it's also a good opportunity to exercise Flink's core
> capabilities. I looked at stateful-functions-flink briefly, it is very
> interesting. I think there are many other things Flink can improve. So I
> think it's better to put it into Flink, where improvements to it
> will come more naturally in the future.
>
> Best,
> Jingsong Lee
>
> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Stephan,
>>
>> I think this is a nice library, but what I like more about it is that it
>> suggests exploring different use-cases. I think it definitely makes sense
>> for the Flink community to explore more lightweight applications that
>> reuse resources. Therefore I definitely think it is a good idea for the Flink
>> community to accept this contribution and help maintain it.
>>
>> Personally I'd prefer to have it in a separate repository. There were a
>> few discussions before where different people were suggesting to extract
>> connectors and other libraries to separate repositories. Moreover I think
>> it could serve as an example for the Flink ecosystem website[1]. This could
>> be the first project in there and give a good impression that the community
>> sees potential in the ecosystem website.
>>
>> Lastly, I'm wondering if this should go through a PMC vote according to our
>> bylaws[2]. In the end the suggestion is to adopt an existing code base as
>> is. It also proposes a new programming concept that could result in a shift of
>> priorities for the community in the long run.
>>
>> Best,
>>
>> Dawid
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html
>>
>> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
>> On 11/10/2019 13:12, Till Rohrmann wrote:
>>
>> Hi Stephan,
>>
>> +1 for adding stateful functions to Flink. I believe the new set of
>> applications this feature will unlock will be super interesting for new and
>> existing Flink users alike.
>>
>> One reason for not including it in the main repository would be to avoid being
>> bound to Flink's release cadence. This would allow releasing faster and
>> more often. However, I believe that having it eventually in Flink's main
>> repository would be beneficial in the long run.
>>
>> Cheers,
>> Till
>>
>> On Fri, Oct 11, 2019 at 12:56 PM Trevor Grant 
>> wrote:
>>
>>> +1 non-binding on contribution.
>>>
>>> Separate repo, or feature branch to start maybe? I just feel like in the
>>> beginning this thing is going to have lots of breaking changes that maybe
>>> aren't going to fit well with tests / other "v1+" release code. Just my
>>> .02.
>>>
>>>
>>>
>>> On Fri, Oct 11, 2019 at 4:38 AM Stephan Ewen  wrote:
>>>
>>>> Dear Flink Community!
>>>>
>>>> Some of you probably heard it already: On Tuesday, at Flink Forward
>>>> Berlin, we announced **Stateful Functions**.
>>>>
>>>> Stateful Functions is a library on Flink to implement general purpose
>>>> applications. It is built around stateful functions (who would have thunk)
>>>> that can communicate arbitrarily through messages, have consistent
>>>> state, and a small resource footprint. They are a bit like keyed
>>>> ProcessFunctions
>>>> that can send each other messages.
>>>> As simple as this sounds, this means you can now communicate in non-DAG
>>>> patterns, so it allows users to build programs they cannot build with
>>>> Flink.
>>>> It also has other neat properties, like multiplexing of functions,
>>>> modular composition, tooling for both container-based deployments and
>>>> as-a-Flink-job deployments.
>>>>
>>>> You can find out more about it here
>>>>   - Website: https://statefun.io/
>>>>   - Code: https://github.com/ververica/stateful-functions
>>>>   - Talk with motivation:
>>>> https://speakerdeck.com/stephanewen/stateful-functions-building-general-purpose-applications-and-services-on-apache-flink?slide=12
>>>>
>>>>
>>>> Now for the main issue: **We would like to contribute this project to
>>>> Apache Flink**
>>>>
>>>> I believe that this is a great fit for both sides.
>>>> For the Flink community, it would be a way to extend the capabilities
>>>> and use cases of Flink into a completely different type of application and
>>>> thus grow the community into this new field.
>>>> Many discussions recently about evolving the Flink runtime

Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-12 Thread Jingsong Li
+1 (non-binding)

- Check if checksum files match the corresponding release files
- Check if GPG files match the corresponding release files
- Verify that the source archives do not contain any binaries
- Build the source with Maven to ensure all source files have Apache headers
- Check that all POM files point to the same version (1.9.1)
- Start a local cluster with both Scala 2.11 and 2.12, and shut it down. Verified
out and log files, verified the web UI, and ran examples.
All succeeded.
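
As a rough illustration of the first check in the list above (not the script
actually used for this verification), a checksum comparison can be sketched in
Python. It assumes SHA-512 checksum files in the `sha512sum` layout, i.e. the
hex digest first, optionally followed by the file name; the function names
`sha512_of` and `checksum_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact against its checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace and the artifact's file name.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

This only covers integrity; the GPG signature check in the second item needs
the release manager's public key and is a separate step.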

Hi Jark, some JIRA issues still use fix version 1.9.0; do you need to
modify the fix version?
https://issues.apache.org/jira/browse/FLINK-14328
https://issues.apache.org/jira/browse/FLINK-14327
https://issues.apache.org/jira/browse/FLINK-14215
https://issues.apache.org/jira/browse/FLINK-14072
https://issues.apache.org/jira/browse/FLINK-12576

Best,
Jingsong Lee


On Wed, Oct 9, 2019 at 3:32 PM Jark Wu  wrote:

> +1 from my side.
>
> - checked signatures and hashes
> - checked that all POM files point to the same version
> - verified that the source archives do not contain any binaries
> - built the source release with Scala 2.12 and Scala 2.11 successfully
> - manually verified the diff pom files between 1.9.0 and 1.9.1 to check
> dependencies, looks good
> - started cluster for both Scala 2.11 and 2.12, ran examples, verified web
> ui and log output, nothing unexpected
>
> Best,
> Jark
>
> On Wed, 9 Oct 2019 at 11:18, Jark Wu  wrote:
>
> > Thanks Jincheng and Till, then let's keep on verifying the RC1.
> >
> > Best,
> > Jark
> >
> > On Wed, 9 Oct 2019 at 11:00, jincheng sun 
> > wrote:
> >
> >> I think we should create a new RC only when we find blocker issues.
> >> Let's wait for the other verification results; we can add the fix for
> >> FLINK-14315 to 1.9.1 only if we find blockers.
> >>
> >> Best,
> >> Jincheng
> >>
> >> On Tue, Oct 8, 2019 at 8:20 PM Till Rohrmann wrote:
> >>
> >>> FLINK-14315 has been merged into the release-1.9 branch. I've marked
> the
> >>> fix version of this ticket as 1.9.2. If we should create a new RC, then
> >>> we
> >>> could include this fix. If this happens, then we need to update the fix
> >>> version to 1.9.1.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Tue, Oct 8, 2019 at 1:51 PM Till Rohrmann 
> >>> wrote:
> >>>
> >>> > If people already spent time on verifying the current RC I would also
> >>> be
> >>> > fine to release the fix for FLINK-14315 with Flink 1.9.2.
> >>> >
> >>> > I will try to merge the PR as soon as possible. When I close the
> >>> ticket, I
> >>> > will update the fix version field to 1.9.2.
> >>> >
> >>> > Cheers,
> >>> > Till
> >>> >
> >>> > On Tue, Oct 8, 2019 at 4:43 AM Jark Wu  wrote:
> >>> >
> >>> >> Hi Zili,
> >>> >>
> >>> >> Thanks for reminding me of this. Because of the Chinese National Day
> and
> >>> >> Flink Forward Europe,
> >>> >> we didn't receive any verification of the 1.9.1 RC1, and I guess we
> >>> have
> >>> >> to extend the voting time until after Flink Forward.
> >>> >> So I'm fine to have FLINK-14315 and rebuild another RC. What do you
> >>> think
> >>> >> @Till @Jincheng?
> >>> >>
> >>> >> I guess FLINK-14315 will be merged soon as it was approved 4 days
> ago?
> >>> >> Could you help merge it once it has passed? @Zili Chen
> >>> >> 
> >>> >>
> >>> >> Best,
> >>> >> Jark
> >>> >>
> >>> >> On Tue, 8 Oct 2019 at 09:14, Zili Chen 
> wrote:
> >>> >>
> >>> >>> Hi Jark,
> >>> >>>
> >>> >>> I notice a critical bug[1] is marked resolved in 1.9.1 but given
> >>> 1.9.1
> >>> >>> has been cut I'd like to throw the issue here so that we're sure
> >>> >>> whether or not it is included in 1.9.1.
> >>> >>>
> >>> >>> Best,
> >>> >>> tison.
> >>> >>>
> >>> >>> [1] https://issues.apache.org/jira/browse/FLINK-14315
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Sep 30, 2019 at 3:25 PM Jark Wu wrote:
> >>> >>>
> >>>   Hi everyone,
> >>> 
> >>>  Please review and vote on the release candidate #1 for the version
> >>>  1.9.1,
> >>>  as follows:
> >>>  [ ] +1, Approve the release
> >>>  [ ] -1, Do not approve the release (please provide specific
> >>> comments)
> >>> 
> >>> 
> >>>  The complete staging area is available for your review, which
> >>> includes:
> >>>  * JIRA release notes [1],
> >>>  * the official Apache source release and binary convenience
> >>> releases to
> >>>  be
> >>>  deployed to dist.apache.org [2], which are signed with the key
> with
> >>>  fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> >>>  * all artifacts to be deployed to the Maven Central Repository
> [4],
> >>>  * source code tag "release-1.9.1-rc1" [5],
> >>>  * website pull request listing the new release and adding
> >>> announcement
> >>>  blog
> >>>  post [6].
> >>> 
> >>>  The vote will be open for at least 72 hours.
> >>>  Please cast your votes before *Oct. 3rd 2019, 08:00 UTC*.
> >>> 
> >>>  It is adopted by majority approval, with at least 3 PMC
> affirmative
> >>>  votes.
> >>> 
> >>>  Thanks,
> >>>