Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

2020-03-25 Thread Dian Fu
Hi Robert,

Thanks a lot for your great work!

Overall I'm +1 to switching to Azure as the primary CI tool if it's stable 
enough, as I see no need to run both Travis and Azure for a single PR.

However, there are still some improvements to make, and it would be great if 
these issues could be addressed before fully switching to Azure:
- The build report of Azure is still not viewable[1] (I noticed that Hequn has 
also reported this issue in another thread). This is very useful information.
- For the PR builds of the Azure pipeline, it seems that the branch is not 
rebased onto the latest master before the tests are run.

Thanks,
Dian

[1] 
https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
 

> On Mar 25, 2020, at 3:33 PM, Chesnay Schepler wrote:
> 
> Some thoughts:
> - by virtue of maintaining the past 2 releases we will have to maintain any 
> Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> - the Azure setup doesn't appear to be equivalent yet since the Java e2e 
> profile isn't setting the Hadoop switch (-Pe2e-hadoop), as a result of which 
> SQLClientKafkaITCase isn't run
> - the nightly scripts still seem to be using a Maven version other than 
> 3.2.5; from today on master:
> 
> 2020-03-25T05:31:52.7412964Z [INFO] --< org.apache.flink:flink-end-to-end-tests-common-kafka >--
> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT   [39/46]
> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]--------------------------------
> 2020-03-25T05:31:52.7518360Z [INFO]
> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> 
> - there is no real benefit in retiring the Travis support in CiBot; the 
> important part is whether Travis is run or not for pull requests.
> 
> From what I can tell though, Azure seems to be working fine for pull requests, 
> so +1 to at least disabling the Travis PR runs.
> 
> On 23/03/2020 14:48, Robert Metzger wrote:
>> Hey devs,
>> 
>> I would like to discuss whether it makes sense to fully switch to Azure
>> Pipelines and phase out our Travis integration.
>> More information on our Azure integration can be found here:
>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>> 
>> Travis will stay for the release-1.10 and older branches, as I have set up
>> Azure only for the master branch.
>> 
>> Proposal:
>> - We keep the flinkbot infrastructure supporting both Travis and Azure
>> around, while we still receive pull requests and pushes for the
>> "master" and "release-1.10" branches.
>> - We remove the travis-specific files from "master", so that builds are not
>> triggered anymore
>> - once we receive no more builds at Travis (because 1.11 has been
>> released), we remove the remaining travis-related infrastructure
>> 
>> What do you think?
>> 
>> 
>> Best,
>> Robert
>> 
> 



[jira] [Created] (FLINK-16759) HiveModuleTest failed to compile on release-1.10

2020-03-24 Thread Dian Fu (Jira)
Dian Fu created FLINK-16759:
---

 Summary: HiveModuleTest failed to compile on release-1.10
 Key: FLINK-16759
 URL: https://issues.apache.org/jira/browse/FLINK-16759
 Project: Flink
  Issue Type: Bug
  Components: Build System, Connectors / Hive
Reporter: Dian Fu
 Fix For: 1.10.1


The cron task of release-1.10 failed to compile with the following exception:
{code}
23:36:45.190 [ERROR] 
/home/travis/build/apache/flink/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/table/module/hive/HiveModuleTest.java:[158,45]
 constructor HiveModule in class org.apache.flink.table.module.hive.HiveModule 
cannot be applied to given types;
 required: java.lang.String
 found: no arguments
 reason: actual and formal argument lists differ in length
{code}

instance: [https://api.travis-ci.org/v3/job/666450476/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16747) Performance improvements for Python UDF

2020-03-24 Thread Dian Fu (Jira)
Dian Fu created FLINK-16747:
---

 Summary: Performance improvements for Python UDF
 Key: FLINK-16747
 URL: https://issues.apache.org/jira/browse/FLINK-16747
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Huang Xingbo
 Fix For: 1.11.0


This is an umbrella JIRA that tracks all the efforts around Python UDF 
performance improvements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-22 Thread Dian Fu
Thanks Jingsong for bringing up this discussion. +1 to this proposal; Bowen's 
proposal makes a lot of sense to me.

This is also a painful problem for PyFlink users. Currently there are no 
built-in, easy-to-use table sources/sinks, and users have to write a lot of 
code just to try out PyFlink. This is especially painful for new users who are 
not familiar with PyFlink/Flink. I have also gone through the tedious process 
Bowen described, e.g. writing a random source connector, a print sink and a 
blackhole sink, as there are no built-in ones to use.

Regards,
Dian
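
As a rough illustration of what such out-of-the-box connectors could enable for 
PyFlink users, a smoke-test job might look like the hedged sketch below (the 
connector names "datagen"/"print" and their options are assumptions for the 
sake of the example, not a settled design):

{code}
# Hedged sketch: "datagen" and "print" connector names/options are assumptions.
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# A random source that needs no external system to try out PyFlink.
t_env.sql_update("""
    CREATE TABLE random_source (
        user_id BIGINT,
        amount DOUBLE
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10'
    )
""")

# A print sink for eyeballing results; a blackhole sink would look the same.
t_env.sql_update("""
    CREATE TABLE print_sink (
        user_id BIGINT,
        amount DOUBLE
    ) WITH (
        'connector' = 'print'
    )
""")

t_env.sql_update("INSERT INTO print_sink SELECT * FROM random_source")
t_env.execute("pyflink_smoke_test")
{code}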

> On Mar 22, 2020, at 11:24 AM, Jark Wu wrote:
> 
> +1 to Bowen's proposal. I also saw many requirements on such built-in
> connectors.
> 
> I will leave some of my thoughts here:
> 
>> 1. datagen source (random source)
> I think we can merge the functionality of the sequence source into the random
> source to allow users to customize their data values.
> Flink can generate random data according to the field types, users
> can customize their values to be more domain specific, e.g.
> 'field.user'='User_[1-9]{0,1}'
> This will be similar to kafka-datagen-connect[1].
> 
>> 2. console sink (print sink)
> This will be very useful in production debugging, to easily output an
> intermediate view or result view to a `.out` file.
> So that we can look into the data representation, or check dirty data.
> This should be out-of-the-box, without manual DDL registration.
> 
>> 3. blackhole sink (no output sink)
> This is very useful for high-performance testing of Flink, to measure the
> throughput of the whole pipeline without a sink.
> Presto also provides this as a built-in connector [2].
> 
> Best,
> Jark
> 
> [1]:
> https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification
> [2]: https://prestodb.io/docs/current/connector/blackhole.html
> 
> 
> On Sat, 21 Mar 2020 at 12:31, Bowen Li  wrote:
> 
>> +1.
>> 
>> I would suggest taking a step even further and seeing what users really need
>> to test/try/play with the Table API and Flink SQL. Besides this one, here are
>> some more sources and sinks that I have developed or used previously to
>> facilitate building Flink table/SQL pipelines.
>> 
>> 
>>   1. random input data source
>>  - should generate random data at a specified rate according to schema
>>  - purposes
>> - test Flink pipeline and data can end up in external storage
>> correctly
>> - stress test Flink sink as well as tuning up external storage
>>  2. print data sink
>>  - should print data in row format in console
>>  - purposes
>> - make it easier to test Flink SQL job e2e in IDE
>> - test Flink pipeline and ensure output data format/value is
>> correct
>>  3. no output data sink
>>  - just swallow output data without doing anything
>>  - purpose
>> - evaluate and tune performance of Flink source and the whole
>> pipeline. Users don't need to worry about sink back pressure
>> 
>> These may be taken into consideration all together as an effort to lower
>> the threshold of running Flink SQL/table API, and facilitate users' daily
>> work.
>> 
>> Cheers,
>> Bowen
>> 
>> 
>> On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> I heard some users complain that table is difficult to test. Now with SQL
>>> client, users are more and more inclined to use it to test rather than
>>> program.
>>> The most common example is Kafka source. If users need to test their SQL
>>> output and checkpoint, they need to:
>>> 
>>> - 1.Launch a Kafka standalone, create a Kafka topic .
>>> - 2.Write a program, mock input records, and produce records to Kafka
>>> topic.
>>> - 3.Then test in Flink.
>>> 
>>> Steps 1 and 2 are annoying, even though this test is E2E.
>>> 
>>> Then I found StatefulSequenceSource. It is very good because it already
>>> deals with checkpointing, so it fits the checkpointing mechanism well.
>>> Usually, users have checkpointing turned on in production.
>>> 
>>> With computed columns, users can easily create a sequence source DDL
>>> similar to the Kafka DDL. Then they can test inside Flink, without needing
>>> to launch anything else.
>>> 
>>> Have you considered this? What do you think?
>>> 
>>> CC: @Aljoscha Krettek  the author
>>> of StatefulSequenceSource.
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>> 



[jira] [Created] (FLINK-16650) Support LocalZonedTimestampType for Python UDF of blink planner

2020-03-18 Thread Dian Fu (Jira)
Dian Fu created FLINK-16650:
---

 Summary: Support LocalZonedTimestampType for Python UDF of blink 
planner
 Key: FLINK-16650
 URL: https://issues.apache.org/jira/browse/FLINK-16650
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-112: Support User-Defined Metrics for Python UDF

2020-03-17 Thread Dian Fu
+1 (binding)

On Tue, Mar 17, 2020 at 10:35 AM jincheng sun 
wrote:

> +1
>
> Best,
> Jincheng
>
>
>
> Hequn Cheng wrote on Mon, Mar 16, 2020 at 10:01 AM:
>
> > Hi everyone,
> >
> > I'd like to start the vote of FLIP-112[1] which is discussed and reached
> > consensus in the discussion thread[2].
> > The vote will be open for at least 72 hours. Unless there is an
> objection,
> > I will try to close it by March 19, 2020 03:00 UTC if we have received
> > sufficient votes.
> >
> > Thanks,
> > Hequn
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-112%3A+Support+User-Defined+Metrics+in++Python+UDF
> > [2]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-112-Support-User-Defined-Metrics-for-Python-UDF-td38609.html
> >
>


Re: [VOTE] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-17 Thread Dian Fu
+1 (binding)

On Tue, Mar 17, 2020 at 6:56 PM Hequn Cheng  wrote:

> +1 (binding)
>
> Best,
> Hequn
>
> > On Mar 17, 2020, at 5:03 PM, Benchao Li  wrote:
> >
> > +1 (non-binding)
> >
> > BTW it's in the same thread in my gmail too.
> >
> >
> >
> >> Kurt Young wrote on Tue, Mar 17, 2020 at 11:47 AM:
> >
> >> Looks like I hit Gmail's bug again...
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Tue, Mar 17, 2020 at 11:11 AM Wei Zhong 
> wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> This vote thread is independent on my side[1]. If this thread is
> >>> combined with another thread on your side, you can try changing your
> >>> mail client.
> >>>
> >>> Best,
> >>> Wei
> >>>
> >>> [1]
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-td38895.html
> >>> <
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-td38895.html
> 
> >>>
>  On Mar 17, 2020, at 10:57, Kurt Young wrote:
> 
>  Hi, please use a dedicated vote thread.
> 
>  Best,
>  Kurt
> 
> 
>  On Tue, Mar 17, 2020 at 10:36 AM jincheng sun <
> >> sunjincheng...@gmail.com>
>  wrote:
> 
> > +1
> >
> > Best,
> > Jincheng
> >
> >
> >
> > Wei Zhong wrote on Fri, Mar 13, 2020 at 9:04 PM:
> >
> >> Hi all,
> >>
> >> I would like to start the vote for FLIP-106[1] which is discussed
> and
> >> reached consensus in the discussion thread[2].
> >>
> >> The vote will be open for at least 72 hours. I'll try to close it by
> >> 2020-03-18 14:00 UTC, unless there is an objection or not enough
> >> votes.
> >>
> >> Best,
> >> Wei
> >>
> >> [1]
> >>
> >
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
> >> [2]
> >>
> >
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-td38107.html
> >>
> >>
> >
> >>>
> >>>
> >>
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


[jira] [Created] (FLINK-16608) Support primitive data types for vectorized Python UDF

2020-03-15 Thread Dian Fu (Jira)
Dian Fu created FLINK-16608:
---

 Summary: Support primitive data types for vectorized Python UDF
 Key: FLINK-16608
 URL: https://issues.apache.org/jira/browse/FLINK-16608
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


As the title describes, the aim of this JIRA is to support the primitive data 
types for vectorized Python UDF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16607) Update flink-fn-execution.proto adding more schema information

2020-03-15 Thread Dian Fu (Jira)
Dian Fu created FLINK-16607:
---

 Summary: Update flink-fn-execution.proto adding more schema 
information
 Key: FLINK-16607
 URL: https://issues.apache.org/jira/browse/FLINK-16607
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Currently the following information is missing in flink-fn-execution.proto:
- the length for CharType/VarCharType/BinaryType/VarBinaryType
- the precision for TimeType
- support for LocalZonedTimestampType



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16606) Throw exceptions for the data types which are not currently supported

2020-03-15 Thread Dian Fu (Jira)
Dian Fu created FLINK-16606:
---

 Summary: Throw exceptions for the data types which are not 
currently supported
 Key: FLINK-16606
 URL: https://issues.apache.org/jira/browse/FLINK-16606
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.10.1, 1.11.0


Currently, there are still cases where a data type isn't supported, as the 
Python data type is first converted to TypeInformation, which loses some 
information, e.g.:
 - the precision for TimeType could only be 0
 - the length for VarBinaryType/VarCharType could only be 0x7fff
 - the precision/scale for DecimalType could only be 38/18
 - the precision for TimestampType/LocalZonedTimestampType could only be 3
 - the resolution for DayTimeIntervalType could only be `SECOND` and the 
fractionalPrecision could only be 3
 - the resolution for YearMonthIntervalType could only be `MONTH` and the 
`yearPrecision` could only be 2
 - CharType/BinaryType/ZonedTimestampType are not supported

We should throw exceptions for the cases that are not supported.
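
For illustration, a hedged sketch of the kind of declaration this ticket 
affects; the fail-fast behavior is the proposal here, and the exact exception 
type/message is left unspecified:

{code}
# Hedged illustration: per this ticket, declarations whose attributes cannot
# survive the conversion to TypeInformation should fail fast with an exception
# instead of being silently coerced.
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# Fine: DECIMAL(38, 18) is the only precision/scale the conversion preserves.
ok = udf(lambda d: d, [DataTypes.DECIMAL(38, 18)], DataTypes.DECIMAL(38, 18))

# Should raise, instead of silently degrading the precision to 0:
not_ok = udf(lambda t: t, [DataTypes.TIME(3)], DataTypes.TIME(3))
{code}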



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-112: Support User-Defined Metrics for Python UDF

2020-03-12 Thread Dian Fu
Hi Hequn,

Thanks for driving this. +1 to this feature.

Just one minor comment: it seems that we will add an API get_metric_group to 
the Python class FunctionContext; could you update the FLIP to reflect this?

Thanks,
Dian
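
For reference, a hedged sketch of how the metrics API discussed above might 
look from a user's perspective (names follow the FLIP-112 discussion; details 
could still change):

{code}
# Hedged sketch based on the FLIP-112 discussion; the final API may differ.
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class CountingUpper(ScalarFunction):
    def open(self, function_context):
        # get_metric_group() is the FunctionContext API mentioned above.
        self.counter = function_context.get_metric_group().counter("invocations")

    def eval(self, s):
        self.counter.inc()  # user-defined counter metric
        return s.upper()

counting_upper = udf(CountingUpper(), [DataTypes.STRING()], DataTypes.STRING())
{code}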

> On Mar 10, 2020, at 3:38 PM, Wei Zhong wrote:
> 
> Hi Hequn,
> 
> Thanks for driving this. +1 for the metrics support for Python UDF, which 
> makes it much easier for users to monitor the execution of Python UDFs.
> 
> Best,
> Wei
> 
> 
>> On Mar 10, 2020, at 15:32, Xingbo Huang wrote:
>> 
>> Hi Hequn,
>> thanks for drafting the FLIP and kicking off the discussion.
>> 
>> +1 for this feature.
>> I think this feature will be extremely convenient for PyFlink users.
>> 
>> Best,
>> Xingbo
>> 
>> Hequn Cheng wrote on Mon, Mar 9, 2020 at 11:32 AM:
>> 
>>> Hi everyone,
>>> 
>>> FLIP-58 adds the support for Python UDFs, but user-defined metrics
>>> have not been supported yet. With metrics, users can report and monitor
>>> the UDF status to get a deeper understanding of the execution,
>>> so in this FLIP, we want to support metrics for Python UDFs.
>>> 
>>> Previously, Jincheng and I discussed offline about the support of
>>> metrics for Python UDFs. We'd like to achieve three goals for
>>> supporting metrics for Python UDFs:
>>> - Support user-defined metrics including Counters, Gauges, Meters,
>>> Distributions in Python UDFs.
>>> - Support defining user scopes.
>>> - Support defining user variables.
>>> 
>>> More details can be found in the FLIP wiki page[1] and we are looking
>>> forward
>>> to your feedback.
>>> 
>>> Best,
>>> Hequn
>>> 
>>> [1]
>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-112%3A+Support+User-Defined+Metrics+in++Python+UDF
>>> 
> 



[jira] [Created] (FLINK-16538) Restructure Python Table API documentation

2020-03-11 Thread Dian Fu (Jira)
Dian Fu created FLINK-16538:
---

 Summary: Restructure Python Table API documentation
 Key: FLINK-16538
 URL: https://issues.apache.org/jira/browse/FLINK-16538
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


The Python Table API documentation is currently spread across a number of pages 
and it's difficult for a user to find all the documentation available. Besides, 
there are a few topics which deserve a dedicated page describing the 
functionality, such as the environment setup, Python dependency management, 
vectorized Python UDFs, etc.

We want to improve the documentation by adding an item under the Table API named 
"Python Table API"; the structure of "Python Table API" is as follows:
 * Python Table API
 ** Installation
 ** User-defined Functions
 ** Vectorized Python UDF
 ** Dependency Management
 ** Metrics & Logging
 ** Configuration

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16486) Add documentation for vectorized Python UDF

2020-03-07 Thread Dian Fu (Jira)
Dian Fu created FLINK-16486:
---

 Summary: Add documentation for vectorized Python UDF
 Key: FLINK-16486
 URL: https://issues.apache.org/jira/browse/FLINK-16486
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


As the title describes, the aim of this JIRA is to add documentation for 
vectorized Python UDF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16485) Support vectorized Python UDF in the batch mode of old planner

2020-03-07 Thread Dian Fu (Jira)
Dian Fu created FLINK-16485:
---

 Summary: Support vectorized Python UDF in the batch mode of old 
planner
 Key: FLINK-16485
 URL: https://issues.apache.org/jira/browse/FLINK-16485
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Currently, vectorized Python UDF is only supported in the batch/stream mode for 
the blink planner and stream mode for the old planner. The aim of this Jira is 
to add support in the batch mode for the old planner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16484) Support all the data types in vectorized Python UDF

2020-03-07 Thread Dian Fu (Jira)
Dian Fu created FLINK-16484:
---

 Summary: Support all the data types in vectorized Python UDF
 Key: FLINK-16484
 URL: https://issues.apache.org/jira/browse/FLINK-16484
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Currently, only TinyInt/Smallint/Int/BigInt data types are supported in 
vectorized Python UDF. The aim of this Jira is to add support for all the other 
data types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16483) Add Python building blocks to make sure the basic functionality of vectorized Python UDF could work

2020-03-07 Thread Dian Fu (Jira)
Dian Fu created FLINK-16483:
---

 Summary: Add Python building blocks to make sure the basic 
functionality of vectorized Python UDF could work
 Key: FLINK-16483
 URL: https://issues.apache.org/jira/browse/FLINK-16483
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


The aim of this Jira is to add Python building blocks, such as the coders, etc., 
to make sure the basic functionality of vectorized Python UDF works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Dian Fu
Hi Jark,

Thanks for starting this discussion. Personally I also love the "squash and 
merge" button. It's very convenient.

Regarding the "noreply" email address, it seems that there are two cases:
- The email address in the original commit is already "noreply". In this case, 
the issue will still exist even if the PR is merged via the command line, e.g. [1].
- The email address in the original commit is correct and it becomes "noreply" 
when merged via the web page button because the author has not correctly set the 
commit email address[2] in their personal GitHub settings, e.g. [3]. In this 
case, it's indeed a problem. However, I have checked that there are only 75 such 
commits out of the 5375 commits since Jan 1, 2019. So maybe it's acceptable 
compared to the benefits we could gain.

Regards,
Dian

[1] 
https://github.com/apache/flink/commit/c4db7052c78d6b8204170e17a80a2416fa760523 

[2] 
https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/adding-an-email-address-to-your-github-account
 

[3] 
https://github.com/apache/flink/commit/9b5232d79a945607a83b02b0025b3206b06c27bd 

> On Mar 6, 2020, at 12:18 PM, Jark Wu wrote:
> 
> Hi Stephan,
> 
>> noreply email address.
> I investigated this and found some x...@users.noreply.github.com addresses. I
> think that's because they enabled "keep email addresses private" on GitHub
> [1].
> 
>> Don't most PRs consist anyways of multiple commits where we want to
> preserve "refactor" and "feature" differentiation in the history, rather
> than squash everything?
> For multiple commits, GitHub provides another button called "rebase and
> merge", which was mentioned by Piotr. But I usually operate locally if I want
> to preserve multiple commits.
> 
> It seems that GitHub is fixing it in 24 hours:
> https://twitter.com/yadong_xie/status/1235554461256302593
> 
> Best,
> Jark
> 
> [1]:
> https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address
> 
> On Fri, 6 Mar 2020 at 10:05, Jingsong Li  wrote:
> 
>> Hi,
>> 
>> I agree with Jark. The tool is useful. If there are problems, I think
>> we can reach an agreement and form certain conventions.
>> 
>> GitHub provides:
>> - "rebase and merge": keeps all commits.
>> - "squash and merge": squashes all commits into one commit. Pull requests
>> often consist of multiple commits, like "address comments", "fix
>> comments", "fix checkstyle". I think we can help authors squash these
>> noise commits.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Fri, Mar 6, 2020 at 4:46 AM Matthias J. Sax  wrote:
>> 
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA512
>>> 
>>> Seems, this will be fixed today:
>>> 
>>> https://twitter.com/natfriedman/status/1235613840659767298?s=19
>>> 
>>> 
>>> - -Matthias
>>> 
>>> On 3/5/20 8:37 AM, Stephan Ewen wrote:
 It looks like this feature still messes up email addresses, for
 example if you do a "git log | grep noreply" in the repo.
 
 Don't most PRs consist anyways of multiple commits where we want
 to preserve "refactor" and "feature" differentiation in the
 history, rather than squash everything?
 
 On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
 wrote:
 
> Hi,
> 
> If it’s really not preserving ownership (I didn’t notice the
> problem before), +1 for removing “squash and merge”.
> 
> However -1 for removing “rebase and merge”. I didn’t see any
> issues with it and I’m using it constantly.
> 
> Piotrek
> 
>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
>> 
>> Hi all,
>> 
>> Thanks for the feedback. But I want to clarify that the motivation
>> to disable "Squash and merge" is just because of the
>> regression/bug of the missing author information. If GitHub
>> fixes this later, I think it makes sense to bring this button
>> back.
>> 
>> Hi Stephan & Zhijiang,
>> 
>> To be honest, I love the "Squash and merge" button and often
>> use it. It saves me a lot of time when merging PRs, because pulling
>> and pushing commits in China is very unstable.
>> 
>> I don't think the potential problems you mentioned are a
>> "problem". For "Squash and merge":
>> - "Merge commits": there are no "merge" commits, because GitHub will
>> squash the commits, rebase them and then add them to the master branch.
>> - "This closes #" line to track back: when you click "Squash and merge",
>> it allows you to edit the title and description, so you can add the
>> "This closes #" message to the description, the same as in the local
>> git. Besides, GitHub automatically appends "(#)" after the
> 

Re: Flink dev blog

2020-03-05 Thread Dian Fu
pular topic in ML
>>>>> discussions), such as how to enable and monitor RocksDB metrics and
>> do
>>>>> debugging/perf-tuning with the metrics/logs, and introduce
>>>>> internals/details around the RocksDB memory management mechanism.
>>>>> 
>>>>> Best Regards,
>>>>> Yu
>>>>> 
>>>>> 
>>>>> On Wed, 4 Mar 2020 at 11:07, Xintong Song 
>>> wrote:
>>>>> 
>>>>>> I also like Ufuk's idea.
>>>>>> 
>>>>>> The wiki allows people to post about their work in a quick and easy way.
>>>>>> For me and probably many other Chinese folks, writing and polishing a
>>>>>> formal article in English usually takes a long time, of which a
>>>>>> significant portion is spent on polishing the language. If the blog does
>>>>>> not require such formal and high-quality language, I believe it will
>>>>>> make things a lot easier and encourage more people to share their ideas.
>>>>>> Besides, it also avoids putting more review workload on committers.
>>>>>> 
>>>>>> Regarding promoting wiki posts to the main blog, I think the wiki
>>>>>> feedback (comments, likes, etc.) could be a great input. We can also
>>>>>> contact the original author before promoting posts to the main blog to
>>>>>> refine the article (responding to the wiki comments, polishing the
>>>>>> language, adding the latest updates, etc.).
>>>>>> 
>>>>>> Thank you~
>>>>>> 
>>>>>> Xintong Song
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Mar 4, 2020 at 10:25 AM Jark Wu  wrote:
>>>>>> 
>>>>>>> +1 for this.
>>>>>>> 
>>>>>>> Regarding the place to host the blogs: personally, I prefer to use the
>>>>>>> existing blog and separate posts by tags/categories and title names,
>>>>>>> because the dev blogs are very good learning materials and I believe
>>>>>>> many users will be interested in these posts. It's just like the
>>>>>>> "Technology Deep Dive" talks at Flink Forward, which attract large
>>>>>>> audiences. Putting them together with the main blog can help give the
>>>>>>> dev blogs more exposure.
>>>>>>> 
>>>>>>> But I also share Robert's concern. So I'm in favor of Ufuk's idea:
>>>>>>> starting with the wiki, and moving good posts to the main blog
>>>>>>> gradually. We should also improve our current blog site to support
>>>>>>> tags/categories. Maybe @vthink...@gmail.com Yadong can help on this.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jark
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, 4 Mar 2020 at 05:03, Ufuk Celebi  wrote:
>>>>>>> 
>>>>>>>> +1 on starting with the Wiki. I really like the name "Engine room".
>>>>>>>> Can we name the section in the Wiki like that? In general, if we think
>>>>>>>> that a post or a series of posts would be a good fit for the main
>>>>>>>> blog, it would be pretty straightforward to promote a post from the
>>>>>>>> Engine room to the main blog (including further edits, focus on
>>>>>>>> language, etc.)
>>>>>>>> 
>>>>>>>> – Ufuk
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Mar 3, 2020 at 5:58 PM Rong Rong 
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Big +1 on this. Some of these topics are not only for contributors,
>>>>>>>>> but would also be super useful for advanced users.
>>>>>>>>> 

Re: Flink dev blog

2020-03-03 Thread Dian Fu
Big +1 on this idea. It will benefit both the developers and users a lot.

Regarding the place to host these blogs, my preference is 3), as I notice that 
there are already a few high-quality posts on the Flink website[1] and I guess 
that may be a good place to start. We just need to figure out a way to let 
contributors clearly mark the audience of their articles and also help users 
easily determine whether the content is what they want.

Regards,
Dian

[1] https://flink.apache.org/blog/ 
> 在 2020年3月3日,下午11:14,Yadong Xie  写道:
> 
> Hi all
> 
> maybe we can use markdown & GitHub to make the submission easy to review
> I have set up a similar blog for Flink-china before (now deprecated); glad
> to offer help if needed
> 
> here is the link: https://github.com/flink-china/doc
> 
> Seth Wiesman wrote on Tue, Mar 3, 2020 at 10:51 PM:
> 
>> For lack of a better way to put this, I think the location depends on the
>> level of effort you want to put into writing these articles.
>> 
>> If they are informal design documents then I think the wiki is the way to
>> go.
>> 
>> If you want to have them be more polished then the existing blog. This
>> means going through a PR on the flink website, thinking about language,
>> etc. If we go this route we can distinguish them with a series title like
>> "Flink Engine Room" and a disclaimer at the top.
>> 
>> "Flink Engine Room: Plugins"
>> 
>> "Flink Engine Room is a series of blog posts covering ongoing development
>> on Apache Flink internals, why decisions were made, and how they will
>> impact future development. The information described in this post is not
>> required to successfully write and deploy Flink applications in
>> production."
>> 
>> Seth
>> 
>> 
>> On Tue, Mar 3, 2020 at 8:29 AM Arvid Heise  wrote:
>> 
>>> I think there is enough positive feedback to start setting it up. That begs
>>> the question: in which format?
>>> 
>>> Following possibilities exist:
>>> 1) Use wiki as Robert pointed out.
>>> 2) Add new blog.
>>> 3) Use existing blog and separate by tags #user, #expert, #dev (can be
>>> mixed). Start page could filter on #user by default.
>>> 4) ???
>>> 
>>> I'm assuming only a few have a strong opinion, so I'd be happy if you'd
>>> just drop your numbers in order of highest to lowest preference.
>>> 
>>> On Tue, Mar 3, 2020 at 2:48 PM Piotr Nowojski 
>> wrote:
>>> 
 +1 for the idea :) And fully agree to clearly separate them.
 
 I think the original idea was writing about some recent changes in
 Flink's code base that could affect other Flink developers
 (contributors/committers), like for example some new ideas/future
 directions that we want to follow, especially if they are work in
 progress and there is lots of old code not adhering to those new ideas.
 In some later responses, it seemed like people were more thinking about
 presenting some more advanced features, like a deep tech dive for power
 users.
 
 I'm not opposing the deep tech dives, but I just wanted to note that it
 is a different target audience. I think the dev blogs could cover both of
 them, at least initially. Later on we can decide to put more emphasis on
 power users or Flink devs, or split them, or whatever.
 
 Piotrek
 
> On 3 Mar 2020, at 12:37, Jingsong Li  wrote:
> 
> +1 for this proposal. I have a lot of desired topics in table and batch.
> 
> I also second Seth and Stephan's comment to separate these in a clear way.
> I have concerns that it may be easy to confuse new users:
> if I am a beginner and find a bunch of deep-dive documents, I need to
> further distinguish which ones are relevant to me and which are not,
> which may cause me a lot of trouble.
> 
> Best,
> Jingsong Lee
> 
> On Tue, Mar 3, 2020 at 6:36 PM Flavio Pompermaier <
>>> pomperma...@okkam.it>
> wrote:
> 
>> Big +1 from my side. I'd be very interested in what Jeff proposed, in
>> particular everything related to the client part (job submission, workflow
>> management, callbacks on submission/success/failure, etc.).
>> Something I can't find anywhere is also how to query Flink states... would
>> it be possible to have something like the Presto UI [1]? Does Flink
>> implement some sort of query queuing? I heard about a query proxy server
>> but I don't know if there's a will to push in that direction.
>> For Stateful Functions it would be nice to deeply compare the taxi driver
>> solution with a more common implementation (i.e. using a database to
>> persist the legal data... is it safe to keep it as Flink state?).
>> [1] https://www.tutorialspoint.com/apache_presto/images/web_interface.jpg
>> 
>> Best,
>> Flavio
>> 
>> On Tue, Mar 3, 2020 at 10:47 AM Jeff Zhang 
>> wrote:
>> 
>>> +1 for this proposal.  I am preparing some articles for how to 

[jira] [Created] (FLINK-16337) Add RelNodes and Rules for vectorized Python UDF execution

2020-02-28 Thread Dian Fu (Jira)
Dian Fu created FLINK-16337:
---

 Summary: Add RelNodes and Rules for vectorized Python UDF execution
 Key: FLINK-16337
 URL: https://issues.apache.org/jira/browse/FLINK-16337
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


As the title describes, the aim of this JIRA is to add RelNodes and Rules for 
vectorized Python UDF execution. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16271) Introduce ArrowScalarFunctionOperator for vectorized Python UDF execution

2020-02-24 Thread Dian Fu (Jira)
Dian Fu created FLINK-16271:
---

 Summary: Introduce ArrowScalarFunctionOperator for vectorized 
Python UDF execution
 Key: FLINK-16271
 URL: https://issues.apache.org/jira/browse/FLINK-16271
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


The aim of this Jira is to introduce ArrowScalarFunctionOperator for vectorized 
Python UDF execution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Dian Fu
Congrats Jingsong!

> On Feb 21, 2020, at 11:39 AM, Jark Wu wrote:
> 
> Congratulations Jingsong! Well deserved.
> 
> Best,
> Jark
> 
> On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> 
>> Congratulations! Jingsong
>> 
>> 
>> Best,
>> Dan Zou
>> 



Re: [ANNOUNCE] Apache Flink-shaded 10.0 released

2020-02-19 Thread Dian Fu
Thanks Chesnay for the great work and everyone involved!

Regards,
Dian

> On Feb 20, 2020, at 12:21 AM, Zhijiang wrote:
> 
> Thanks Chesnay for making the release efficiently and also thanks to all the 
> other participants!
> 
> Best,
> Zhijiang 
> 
> 
> --
> From: Till Rohrmann <trohrm...@apache.org>
> Send Time: 2020 Feb. 19 (Wed.) 22:21
> To: dev <dev@flink.apache.org>
> Subject:Re: [ANNOUNCE] Apache Flink-shaded 10.0 released
> 
> Thanks for making the release possible Chesnay and everyone who was
> involved!
> 
> Cheers,
> Till
> 
> On Wed, Feb 19, 2020 at 7:47 AM jincheng sun 
> wrote:
> 
>> Thanks a lot for the release Chesnay!
>> And thanks to everyone who make this release possible!
>> 
>> Best,
>> Jincheng
>> 
>> 
>> Chesnay Schepler wrote on Wed, Feb 19, 2020 at 12:45 AM:
>> 
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink-shaded 10.0.
>>> 
>>> The flink-shaded project contains a number of shaded dependencies for
>>> Apache Flink.
>>> 
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming applications.
>>> 
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>> 
>>> The full release notes are available in Jira:
>>> 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346746
>>> 
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> 
>>> Regards,
>>> Chesnay



Re: [DISCUSS] Kicking off the 1.11 release cycle

2020-02-19 Thread Dian Fu
Thanks Stephan for kicking off this discussion and Zhijiang for volunteering as 
one of the release managers.

+1 for the "feature freeze date" around end of April. There are still more than 
2 months left, so I think it's a reasonable time.

Thanks,
Dian

> On Feb 19, 2020, at 10:52 PM, Aljoscha Krettek wrote:
> 
> +1
> 
> Although I would hope that it can be more than just "anticipated".
> 
> Best,
> Aljoscha
> 
> On 19.02.20 15:40, Till Rohrmann wrote:
>> Thanks for volunteering as one of our release managers Zhijiang.
>> +1 for the *anticipated feature freeze date* end of April. As we go along
>> and collect more data points we might be able to strengthen our
>> initial anticipation.
>> Cheers,
>> Till
>> On Wed, Feb 19, 2020 at 4:44 AM Zhijiang 
>> wrote:
>>> Thanks for kicking off the next release and the introduction, @Stephan!
>>> 
>>> It's my pleasure to become the release manager and involve in other
>>> community works. I am working together with @Piotr for a long time,  so
>>> very happy to cooperate for the release manager work again. The previous
>>> release work was always well done, and I can learn a lot from these rich
>>> experiences.
>>> 
>>> +1 for the "feature freeze date" around end of April.
>>>  Although we have the FF SF in the meantime, fortunately there are no long
>>> public holidays during this period in China.
>>> 
>>> Best,
>>> Zhijiang
>>> 
>>> 
>>> --
>>> From:Stephan Ewen 
>>> Send Time:2020 Feb. 19 (Wed.) 01:15
>>> To:dev 
>>> Cc:zhijiang ; pnowojski 
>>> Subject:[DISCUSS] Kicking off the 1.11 release cycle
>>> 
>>> Hi all!
>>> 
>>> Now that the 1.10 release is out (congrats again, everyone!), I wanted to
>>> bring up some questions about organizing the next release cycle.
>>> 
>>> The most important questions at the beginning would be
>>>   - Who would volunteer as Release Managers
>>>   - When would be the release date.
>>> 
>>> For the release managers, Piotrek and Zhijiang have mentioned previously
>>> that they would be interested, so I am copying them here to chime in.
>>> @Piotr and @Zhijiang could you confirm if you are interested in helping
>>> out with this?
>>> 
>>> About the release date: By our original "3 months release cycle"
>>> assumption, we should aim for a release **Mid May**, meaning a feature
>>> freeze no later than end of April.
>>> That would be indeed a shorter release cycle than 1.10, and the assumption
>>> of a shorter testing period. But aiming for a shorter release cycle than
>>> 1.10 would actually be nice, in my opinion. 1.10 grew very big in the end,
>>> which caused also a very long testing period (plus Christmas and Chinese
>>> New Year are also partially to blame).
>>> 
>>> The exact feature freeze date is anyways a community discussion later, but
>>> what do you think about starting with an "anticipated feature freeze date"
>>> around end of April, so that committers, contributors, and users know
>>> roughly what to expect?
>>> 
>>> Best,
>>> Stephan
>>> 
>>> 
>>> 



[jira] [Created] (FLINK-16121) Introduce ArrowReader and ArrowWriter for Arrow format data read and write

2020-02-17 Thread Dian Fu (Jira)
Dian Fu created FLINK-16121:
---

 Summary: Introduce ArrowReader and ArrowWriter for Arrow format 
data read and write
 Key: FLINK-16121
 URL: https://issues.apache.org/jira/browse/FLINK-16121
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


As the title describes, the aim of this JIRA is to introduce classes such as 
ArrowReader, which is used to read the execution results of vectorized Python 
UDFs, and ArrowWriter, which is used to convert Flink rows to the Arrow format 
before sending them to the Python worker for vectorized Python UDF execution.
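
To illustrate the columnar handoff conceptually, here is a hedged pyarrow 
sketch of the data format only; it is not the actual ArrowReader/ArrowWriter, 
which live on the Java side:

{code}
# Conceptual sketch using pyarrow: one side batches rows into a columnar
# RecordBatch, the other side consumes it column-wise.
import pyarrow as pa

# "Writer" side: turn a batch of rows into the Arrow columnar format.
ids = pa.array([1, 2, 3])
names = pa.array(["a", "b", "c"])
batch = pa.RecordBatch.from_arrays([ids, names], names=["id", "name"])

# "Reader" side: read the batch back, e.g. as pandas columns for a UDF.
df = batch.to_pandas()
print(df["id"] * 2)  # a vectorized operation over the whole batch
{code}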



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16114) Support Scalar Vectorized Python UDF in PyFlink

2020-02-17 Thread Dian Fu (Jira)
Dian Fu created FLINK-16114:
---

 Summary: Support Scalar Vectorized Python UDF in PyFlink
 Key: FLINK-16114
 URL: https://issues.apache.org/jira/browse/FLINK-16114
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Scalar Python UDF has already been supported in Flink 1.10 
([FLIP-58|https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table])
 and it operates one row at a time. It works in the way that the Java operator 
serializes one input row to bytes and sends them to the Python worker; the 
Python worker deserializes the input row and evaluates the Python UDF with it; 
the result row is serialized and sent back to the Java operator.

It suffers from the following problems:
 # High serialization/deserialization overhead
 # It's difficult to leverage the popular Python libraries used by data 
scientists, such as Pandas, NumPy, etc., which provide high-performance data 
structures and functions.

We want to introduce vectorized Python UDFs to address these problems. For 
vectorized Python UDFs, a batch of rows is transferred between the JVM and the 
Python VM in a columnar format. The batch of rows is converted to a collection 
of pandas.Series and given to the vectorized Python UDF, which can then leverage 
popular Python libraries such as Pandas, NumPy, etc. for the Python UDF 
implementation.

More details could be found in 
[FLIP-97.|https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink]
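
A hedged sketch of what a scalar vectorized UDF could look like from the user's 
side (the udf_type flag and the exact decorator API are assumptions pending the 
FLIP-97 implementation):

{code}
# Hedged sketch: the "udf_type" flag is an assumption based on FLIP-97;
# the final API may differ.
from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(), udf_type="pandas")
def add(i, j):
    # i and j arrive as pandas.Series covering a whole batch of rows,
    # so the addition is vectorized rather than evaluated row by row.
    return i + j
{code}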



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[RESULT][VOTE] FLIP-97: Support scalar vectorized Python UDF in PyFlink

2020-02-17 Thread Dian Fu
Hi all,

Thank you all for the discussion and votes.
So far, we have
  - 3 binding +1 votes (Jincheng, Hequn, Dian)
  - 1 non-binding +1 votes (Jingsong)
  - No -1 votes

The voting time has passed and there are enough +1 votes. Therefore, I'm happy 
to announce that FLIP-97[1] has been accepted.

Thanks,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink

Re: [VOTE] Support scalar vectorized Python UDF in PyFlink

2020-02-16 Thread Dian Fu
+1 (binding)

Regards,
Dian

> On Feb 13, 2020, at 6:15 PM, Hequn Cheng wrote:
> 
> +1 (binding)
> 
> Best, Hequn
> 
> On Thu, Feb 13, 2020 at 11:48 AM Jingsong Li  wrote:
> 
>> +1 (non-binding)
>> Thanks Dian for driving.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Thu, Feb 13, 2020 at 11:45 AM jincheng sun 
>> wrote:
>> 
>>> +1 (binding)
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Dian Fu wrote on Wed, Feb 12, 2020 at 1:31 PM:
>>> 
>>>> Hi all,
>>>> 
>>>> I'd like to start the vote of FLIP-97[1] which is discussed and reached
>>>> consensus in the discussion thread[2].
>>>> 
>>>> The vote will be open for at least 72 hours. Unless there is an
>>> objection,
>>>> I will try to close it by Feb 17, 2020 08:00 UTC if we have received
>>>> sufficient votes.
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>> [1]
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
>>>> [2]
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-scalar-vectorized-Python-UDF-in-PyFlink-tt37264.html
>>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 



[jira] [Created] (FLINK-16109) Move the Python scalar operators and table operators to separate package

2020-02-16 Thread Dian Fu (Jira)
Dian Fu created FLINK-16109:
---

 Summary: Move the Python scalar operators and table operators to 
separate package
 Key: FLINK-16109
 URL: https://issues.apache.org/jira/browse/FLINK-16109
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Huang Xingbo
 Fix For: 1.11.0


Currently both the Python scalar operators and table operators are under the 
same package org.apache.flink.table.runtime.operators.python. There are already 
many operators under this package. After introducing the aggregate function 
support and Vectorized Python function support in the future, there will be 
more and more operators under the same package. 

We could improve it with the following package structure:
org.apache.flink.table.runtime.operators.python.scalar
org.apache.flink.table.runtime.operators.python.table
org.apache.flink.table.runtime.operators.python.aggregate (in the future)
org.apache.flink.table.runtime.operators.python.scalar.arrow (in the future)

As these classes are internal, it's safe to do so and there are no backwards 
compatibility issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Documentation Style Guide

2020-02-15 Thread Dian Fu
Thanks for the great work! This is very helpful for keeping the documentation 
style consistent across the whole project. It's also very helpful for non-native 
English contributors like me.

> On Feb 15, 2020, at 3:42 PM, Jark Wu wrote:
> 
> Great summary! Thanks for adding the translation specification in it.
> I learned a lot from the guide.
> 
> Best,
> Jark
> 
> On Fri, 14 Feb 2020 at 23:39, Aljoscha Krettek  wrote:
> 
>> Hi Everyone,
>> 
>> we just merged a new style guide for documentation writing:
>> https://flink.apache.org/contributing/docs-style.html.
>> 
>> Anyone who is writing documentation or is planning to do so should check
>> this out. Please open a Jira Issue or respond here if you have any
>> comments or questions.
>> 
>> Some of the most important points in the style guide are:
>> 
>>  - We should use direct language and address the reader as you instead
>> of passive constructions. Please read the guide if you want to
>> understand what this means.
>> 
>>  - We should use "alert blocks" instead of simple inline alert tags.
>> Again, please refer to the guide to see what this means exactly if
>> you're not sure.
>> 
>> There's plenty more and some interesting links about
>> technical/documentation writing as well.
>> 
>> Best,
>> Aljoscha
>> 



[jira] [Created] (FLINK-16053) Remove redundant metrics in PyFlink

2020-02-13 Thread Dian Fu (Jira)
Dian Fu created FLINK-16053:
---

 Summary: Remove redundant metrics in PyFlink
 Key: FLINK-16053
 URL: https://issues.apache.org/jira/browse/FLINK-16053
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


We record metrics about how many elements have been processed in the Python 
UDF. This kind of information is not necessary, as the same information is 
already available in the Java operator. I have performed a simple test and 
found that removing it could improve the performance by about 5% - 10%.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Support Python ML Pipeline API

2020-02-13 Thread Dian Fu
+1 (binding)

Regards,
Dian

> On Feb 14, 2020, at 2:49 PM, Yuan Mei wrote:
> 
> +1 vote
> 
> This is one of the most important things for the Flink ML framework to be
> widely adopted, since most data scientists use Python.
> 
> Best
> 
> Yuan
> 
> On Fri, Feb 14, 2020 at 9:45 AM Hequn Cheng  wrote:
> 
>> Hi everyone,
>> 
>> I'd like to start the vote of FLIP-96[1] which is discussed and reached
>> consensus in the discussion thread[2].
>> The vote will be open for at least 72 hours. Unless there is an objection,
>> I will try to close it by Feb 19, 2020 02:00 UTC if we have received
>> sufficient votes.
>> 
>> Thanks,
>> Hequn
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-96%3A+Support+Python+ML+Pipeline+API
>> [2]
>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html
>> 



Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Dian Fu
+1 (non-binding)

- Verified the signature and checksum
- Checked the release notes to confirm that all the tickets included in this 
release are there
- Checked the website PR and it LGTM
- Checked the NOTICE file of the newly added module flink-shaded-zookeeper-3 and 
it LGTM

Regards,
Dian

> On Feb 14, 2020, at 2:58 PM, Hequn Cheng wrote:
> 
> Thank you Chesnay for the release!
> 
> +1 (non-binding)
> 
> - The problem that exists in RC1 has been resolved.
> - Release notes looks good.
> - Built from source archive successfully.
> - Check commit history manually. Nothing looks weird.
> - Signatures and hash are correct.
> - All artifacts have been deployed to the maven central repository.
> - The website pull request looks good
> 
> Best, Hequn
> 
> On Fri, Feb 14, 2020 at 1:14 AM Zhu Zhu  wrote:
> 
>> +1 (non-binding)
>> 
>> - checked release notes, JIRA tickets and commit history
>> - verified the signature and checksum
>> - checked the maven central artifacts
>>  * examined the zookeeper shaded jars (both 3.4.10 and 3.5.6), curator and
>> zookeeper classes are there and shaded
>> - built from the source archive as well as the git tag
>> - checked the website pull request
>> 
>> Thanks,
>> Zhu Zhu
>> 
>> Ufuk Celebi wrote on Fri, Feb 14, 2020 at 12:32 AM:
>> 
>>> PS: Also verified the NOTICE changes since the last RC.
>>> 
>>> On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:
>>> 
 Hey Chesnay,
 
 +1 (binding).
 
 - Verified checksum ✅
 - Verified signature ✅
 - Jira changelog looks good to me ✅
 - Website PR looks good to me ✅
 - Verified no unshaded dependencies (except the Hadoop modules which I
 think is expected) ✅
 - Verified dependency management fix FLINK-15540
 (commons-collections:3.2.2 as expected) ✅
 - Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
 flink-shaded-force-shading and the Hadoop modules which I think is
 expected) ✅
 
 – Ufuk
 
 On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
> 
> +1 (non-binding)
> 
> Checked issues listed in release notes: ok
> Checked sums and signatures: ok
> Checked the maven central artifices: ok
> Built from source: ok (8u101, 11.0.4)
> Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> Checked contents of zookeeper shaded jars: ok
> - no unshaded classes
> - shading pattern is correct
> Checked website pull request listing the new release: ok
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
 wrote:
> 
>> Hi everyone,
>> Please review and vote on the release candidate #3 for the version
 10.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific
>> comments)
>> 
>> 
>> The complete staging area is available for your review, which
>>> includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to
>>> dist.apache.org
>> [2], which are signed with the key with fingerprint 11D464BA [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-10.0-rc3" [5],
>> * website pull request listing the new release [6].
>> 
>> The vote will be open for at least 72 hours. It is adopted by
>>> majority
>> approval, with at least 3 PMC affirmative votes.
>> 
>> Thanks,
>> Chesnay
>> 
>> [1]
>> 
>> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346746
>> [2]
 https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> [4]
 https://repository.apache.org/content/repositories/orgapacheflink-1337
>> [5]
>> 
>> 
 
>>> 
>> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
>> [6] https://github.com/apache/flink-web/pull/304
>> 
>> 
>> 
>> 
 
>>> 
>> 



Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-12 Thread Dian Fu
Thanks for the great work, Jincheng.

Regards,
Dian

> On Feb 13, 2020, at 1:32 PM, jincheng sun wrote:
> 
> Hi everyone,
> 
> The Apache Flink community is very happy to announce the release of Apache 
> Flink Python API(PyFlink) 1.9.2, which is the first release to PyPI for the 
> Apache Flink Python API 1.9 series.
>  
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
>  
> The release is available for download at:
> 
> https://pypi.org/project/apache-flink/1.9.2/#files 
> 
> 
> Or install it using the pip command: 
> 
> pip install apache-flink==1.9.2
>  
> We would like to thank all contributors of the Apache Flink community who 
> helped to verify this release and made this release possible!
> 
> Best,
> Jincheng



Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Dian Fu
Thanks Gary & Yu and everyone involved, Great work!

Regards,
Dian

> On Feb 12, 2020, at 11:57 PM, Haibo Sun wrote:
> 
> Thanks Gary & Yu. Great work!
> 
> Best,
> Haibo
> 
> At 2020-02-12 21:31:00, "Yu Li"  wrote:
> >The Apache Flink community is very happy to announce the release of Apache
> >Flink 1.10.0, which is the latest major release.
> >
> >Apache Flink® is an open-source stream processing framework for
> >distributed, high-performing, always-available, and accurate data streaming
> >applications.
> >
> >The release is available for download at:
> >https://flink.apache.org/downloads.html
> >
> >Please check out the release blog post for an overview of the improvements
> >for this new major release:
> >https://flink.apache.org/news/2020/02/11/release-1.10.0.html
> >
> >The full release notes are available in Jira:
> >https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12345845
> >
> >We would like to thank all contributors of the Apache Flink community who
> >made this release possible!
> >
> >Cheers,
> >Gary & Yu



[VOTE] Support scalar vectorized Python UDF in PyFlink

2020-02-11 Thread Dian Fu
Hi all,

I'd like to start the vote on FLIP-97[1], which has been discussed and reached 
consensus in the discussion thread[2].

The vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Feb 17, 2020 08:00 UTC if we have received sufficient 
votes.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-scalar-vectorized-Python-UDF-in-PyFlink-tt37264.html

Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dian Fu
Hi Dawid,

Thanks for your reply. I'm also in favor of "col" as the column expression in 
the Python Table API. Regarding the use of "$" in the Java/Scala Table API, I'm 
fine with it. So +1 from my side.

Thanks,
Dian
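
For illustration, a hedged sketch of how a col-based expression DSL could look 
in the Python Table API (the name col and its import location are assumptions 
at this point; FLIP-55 itself only covers the Java/Scala DSL):

{code}
# Hedged sketch: "col"/"lit" and their module path are assumptions here.
from pyflink.table.expressions import col, lit

def filter_large(table):
    # Python equivalent of the Java DSL $("f0").plus(1).isGreater(lit(10)).
    return table.where((col("f0") + 1) > lit(10))
{code}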

> On Feb 11, 2020, at 9:48 PM, Aljoscha Krettek wrote:
> 
> +1
> 
> Best,
> Aljoscha
> 
> On 11.02.20 11:17, Jingsong Li wrote:
>> Thanks Dawid for your explanation,
>> +1 for the vote.
>> So I am a big +1 to accepting java.lang.Object in the Java DSL; without
>> Scala implicit conversions, a lot of "lit" calls look unfriendly to users.
>> Best,
>> Jingsong Lee
>> On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
>> wrote:
>>> Hi,
>>> 
>>> To answer some of the questions:
>>> 
>>> @Jingsong We use Objects in the Java API to make it possible to use raw
>>> Objects without the need to wrap them in literals. If an expression is
>>> passed, it is used as is. If anything else is used, it is assumed to be
>>> a literal and is wrapped into a literal. This way we can e.g. write
>>> $("f0").plus(1).
>>> 
>>> @Jark I think it makes sense to shorten them. I will do it; I hope people
>>> who already voted don't mind.
>>> 
>>> @Dian That's a valid concern. I would not discard the '$' as a column
>>> expression for Java and Scala. I think once we introduce the expression
>>> DSL for Python we can add another alias to Java/Scala. Personally I'd be
>>> in favor of col.
>>> 
>>> On 11/02/2020 10:41, Dian Fu wrote:
>>>> Hi Dawid,
>>>> 
>>>> Thanks for driving this feature. The design looks very well for me
>>> overall.
>>>> 
>>>> I have only one concern: $ is not allowed in Python identifiers, so we
>>>> have to come up with another symbol when aligning this feature in the
>>>> Python Table API. I noticed that there are also other options proposed
>>>> in the discussion thread, e.g. ref, col, etc. I think it would be great
>>>> if the proposed symbol could be supported in both the Java/Scala and
>>>> Python Table API. What are your thoughts?
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> 在 2020年2月11日,上午11:13,Jark Wu  写道:
>>>>> 
>>>>> +1 for this.
>>>>> 
>>>>> I have some minor comments:
>>>>> - I'm +1 to use $ in both Java and Scala API.
>>>>> - I'm +1 to use lit(), Spark also provides lit() function to create a
>>>>> literal value.
>>>>> - Is it possible to have `isGreater` instead of `isGreaterThan` and
>>>>> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in
>>> BaseExpressions?
>>>>> 
>>>>> Best,
>>>>> Jark
>>>>> 
>>>>> On Tue, 11 Feb 2020 at 10:21, Jingsong Li 
>>> wrote:
>>>>> 
>>>>>> Hi Dawid,
>>>>>> 
>>>>>> Thanks for driving.
>>>>>> 
>>>>>> - adding $ in scala api looks good to me.
>>>>>> - Just a question: what is expected for java.lang.Object, a literal
>>>>>> object or an expression? So the Object is syntactic sugar for a
>>> literal?
>>>>>> 
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>> 
>>>>>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 
>>> wrote:
>>>>>> 
>>>>>>> +1 for this.
>>>>>>> 
>>>>>>> It will also help in making a TableEnvironment.fromElements() possible
>>>>>>> and reduces technical debt. One entry point of TypeInformation less in
>>>>>>> the API.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>> 
>>>>>>> 
>>>>>>> On 10.02.20 08:31, Dawid Wysakowicz wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I wanted to resurrect the thread about introducing a Java Expression
>>>>>>>> DSL. Please see the updated flip page[1]. Most of the flip was
>>>>>> concluded
>>>>>>>> in previous discussion thread. The major changes since then are:
>>>>>>>> 
>>>>>>>> * accepting java.lang.Object in the Java DSL
>>>>>>>> 
>>>>>>>> * adding $ interpolation for a column in the Scala DSL
>>>>>>>> 
>>>>>>>> I think it's important to move those changes forward as it makes it
>>>>>>>> easier to transition to the new type system (Java parser supports
>>> only
>>>>>>>> the old type system stack for now) that we have been working on for the
>>> past
>>>>>>>> releases.
>>>>>>>> 
>>>>>>>> Because the previous discussion thread was rather conclusive I want
>>> to
>>>>>>>> start already with a vote. If you think we need another round of
>>>>>>>> discussion, feel free to say so.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The vote will last for at least 72 hours, following the consensus
>>>>>> voting
>>>>>>>> process.
>>>>>>>> 
>>>>>>>> FLIP wiki:
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
>>>>>>>> 
>>>>>>>> Discussion thread:
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>> 
>>> 
>>> 



Re: [DISCUSS] Support scalar vectorized Python UDF in PyFlink

2020-02-11 Thread Dian Fu
Hi all,

Thank you all for participating in this discussion and sharing your thoughts. It
seems that we have reached consensus on the design now. I will start a VOTE
thread if there is no further feedback.

Thanks,
Dian

On Tue, Feb 11, 2020 at 10:23 AM Dian Fu  wrote:

> Hi Jingsong,
>
> You're right. I have updated the FLIP to reflect this.
>
> Thanks,
> Dian
>
> > On Feb 11, 2020, at 10:03 AM, Jingsong Li wrote:
> >
> > Hi Dian and Jincheng,
> >
> > Thanks for your explanation. Thinking about it again, maybe most users don't want
> to
> > modify these parameters.
> > We all realize that "batch.size" should be a larger value, so
> "bundle.size"
> > must also be increased. Now the default value of "bundle.size" is only
> 1000.
> > I think you can update the design to provide meaningful default values for
> > "batch.size" and "bundle.size".
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Feb 10, 2020 at 4:36 PM Dian Fu  wrote:
> >
> >> Hi Jincheng, Hequn & Jingsong,
> >>
> >> Thanks a lot for your suggestions. I have created FLIP-97[1] for this
> >> feature.
> >>
> >>> One little suggestion: maybe it would be nice if we can add some
> >> performance explanation in the document? (I just very curious:))
> >> Thanks for the suggestion. I have updated the design doc in the
> >> "BackGround" section about where the performance gains could be got
> from.
> >>
> >>> It seems that a batch should always in a bundle. Bundle size should
> >> always
> >> bigger than batch size. (if a batch can not cross bundle).
> >> Can you explain this relationship to the document?
> >> I have updated the design doc explaining more about these two
> >> configurations.
> >>
> >>> In the batch world, vectorization batch size is about 1024+. What do
> you
> >> think about the default value of "batch"?
> >> Is there any link about where this value comes from? I have performed a
> >> simple test for Pandas UDF which performs the simple +1 operation. The
> >> performance is best when the batch size is set to 5000. I think it
> depends
> >> on the data type of each column, the functionality the Pandas UDF does,
> >> etc. However I agree with you that we could give a meaningful default
> value
> >> for the "batch" size which works in most scenarios.
> >>
> >>> Can we only configure one parameter and calculate another
> automatically?
> >> For example, if we just want to "pipeline", "bundle.size" is twice as
> much
> >> as "batch.size", would this work?
> >> I agree with Jincheng that this is not feasible. I think that giving a
> >> meaningful default value for "batch.size" which works in most
> scenarios
> >> is enough. What are your thoughts?
> >>
> >> Thanks,
> >> Dian
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> >>
> >>
> >> On Mon, Feb 10, 2020 at 4:25 PM jincheng sun 
> >> wrote:
> >>
> >>> Hi Jingsong,
> >>>
> >>> Thanks for your feedback! I would like to share my thoughts regarding
> the
> >>> following question:
> >>>
> >>>>> - Can we only configure one parameter and calculate another
> >>> automatically? For example, if we just want to "pipeline",
> "bundle.size"
> >> is
>>> twice as much as "batch.size", would this work?
> >>>
> >>> I don't think this works. These two configurations are used for
> different
> >>> purposes and there is no direct relationship between them and so I
> guess
> >> we
> >>> cannot infer one configuration from the other.
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>>
> >>> Jingsong Li wrote on Mon, Feb 10, 2020 at 1:53 PM:
> >>>
> >>>> Thanks Dian for your reply.
> >>>>
> >>>> +1 to create a FLIP too.
> >>>>
> >>>> About "python.fn-execution.bundle.size" and
> >>>> "python.fn-execution.arrow.batch.size", I got what are you mean about
> >>>> "pipeline". I agree.
> >>>> It seems that a batch should always in a bundle. Bundle size s

Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dian Fu
Hi Dawid,

Thanks for driving this feature. The design looks very well for me overall.

I have only one concern: $ is not allowed in identifiers in Python,
so we have to come up with another symbol when aligning this
feature in the Python Table API. I noticed that there are also other options
proposed in the discussion thread, e.g. ref, col, etc. I think it would be
great if the proposed symbol could be supported in both the Java/Scala and
Python Table API. What are your thoughts?

Regards,
Dian

> On Feb 11, 2020, at 11:13 AM, Jark Wu wrote:
> 
> +1 for this.
> 
> I have some minor comments:
> - I'm +1 to use $ in both Java and Scala API.
> - I'm +1 to use lit(), Spark also provides lit() function to create a
> literal value.
> - Is it possible to have `isGreater` instead of `isGreaterThan` and
> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in BaseExpressions?
> 
> Best,
> Jark
> 
> On Tue, 11 Feb 2020 at 10:21, Jingsong Li  wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for driving.
>> 
>> - adding $ in scala api looks good to me.
>> - Just a question: what is expected for java.lang.Object, a literal
>> object or an expression? So the Object is syntactic sugar for a literal?
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther  wrote:
>> 
>>> +1 for this.
>>> 
>>> It will also help in making a TableEnvironment.fromElements() possible
>>> and reduces technical debt. One entry point of TypeInformation less in
>>> the API.
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 10.02.20 08:31, Dawid Wysakowicz wrote:
 Hi all,
 
 I wanted to resurrect the thread about introducing a Java Expression
 DSL. Please see the updated flip page[1]. Most of the flip was
>> concluded
 in previous discussion thread. The major changes since then are:
 
 * accepting java.lang.Object in the Java DSL
 
 * adding $ interpolation for a column in the Scala DSL
 
 I think it's important to move those changes forward as it makes it
 easier to transition to the new type system (Java parser supports only
 the old type system stack for now) that we have been working on for the past
 releases.
 
 Because the previous discussion thread was rather conclusive I want to
 start already with a vote. If you think we need another round of
 discussion, feel free to say so.
 
 
 The vote will last for at least 72 hours, following the consensus
>> voting
 process.
 
 FLIP wiki:
 
 [1]
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
 
 
 Discussion thread:
 
 
>>> 
>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
 
 
 
 
>>> 
>>> 
>> 
>> --
>> Best, Jingsong Lee
>> 



Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread Dian Fu
+1 (non-binding)

- Verified the signature and checksum
- Pip installed the package successfully: pip install apache-flink-1.9.2.tar.gz
- Run word count example successfully.

Regards,
Dian

> On Feb 11, 2020, at 11:44 AM, jincheng sun wrote:
> 
> 
> +1 (binding) 
> 
> - Install the PyFlink by `pip install` [SUCCESS]
> - Run word_count in both command line and IDE [SUCCESS]
> 
> Best,
> Jincheng
> 
> 
> 
> Wei Zhong <weizhong0...@gmail.com> wrote on Tue, Feb 11, 2020 at 11:17 AM:
> Hi,
> 
> Thanks for driving this, Jincheng.
> 
> +1 (non-binding) 
> 
> - Verified signatures and checksums.
> - Verified README.md and setup.py.
> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and Python 
> 3.7.5 successfully.
> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via 
> `pyflink-shell.sh local` and try the examples in the help message, run well 
> and no exception.
> - Try a word count example in IDE with Python 2.7.15 and Python 3.7.5, run 
> well and no exception.
> 
> Best,
> Wei
> 
> 
>> On Feb 10, 2020, at 19:12, jincheng sun wrote:
>> 
>> Hi everyone,
>> 
>> Please review and vote on the release candidate #1 for the PyFlink version 
>> 1.9.2, as follows:
>> 
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> The complete staging area is available for your review, which includes:
>> 
>> * the official Apache binary convenience releases to be deployed to 
>> dist.apache.org  [1], which are signed with the key 
>> with fingerprint 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from 
>> source code [3].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> Thanks,
>> Jincheng
>> 
>> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/ 
>> 
>> [2] https://dist.apache.org/repos/dist/release/flink/KEYS 
>> 
>> [3] https://github.com/apache/flink/tree/release-1.9.2 
>> 


Re: [DISCUSS] Support scalar vectorized Python UDF in PyFlink

2020-02-10 Thread Dian Fu
Hi Jingsong,

You're right. I have updated the FLIP to reflect this.

Thanks,
Dian

> On Feb 11, 2020, at 10:03 AM, Jingsong Li wrote:
> 
> Hi Dian and Jincheng,
> 
> Thanks for your explanation. Thinking about it again, maybe most users don't want to
> modify these parameters.
> We all realize that "batch.size" should be a larger value, so "bundle.size"
> must also be increased. Now the default value of "bundle.size" is only 1000.
> I think you can update the design to provide meaningful default values for
> "batch.size" and "bundle.size".
> 
> Best,
> Jingsong Lee
> 
> On Mon, Feb 10, 2020 at 4:36 PM Dian Fu  wrote:
> 
>> Hi Jincheng, Hequn & Jingsong,
>> 
>> Thanks a lot for your suggestions. I have created FLIP-97[1] for this
>> feature.
>> 
>>> One little suggestion: maybe it would be nice if we can add some
>> performance explanation in the document? (I just very curious:))
>> Thanks for the suggestion. I have updated the design doc in the
>> "BackGround" section about where the performance gains could be got from.
>> 
>>> It seems that a batch should always be contained in a bundle. Bundle size should
>> always be
>> bigger than batch size (if a batch cannot cross bundles).
>> Can you explain this relationship in the document?
>> I have updated the design doc explaining more about these two
>> configurations.
>> 
>>> In the batch world, vectorization batch size is about 1024+. What do you
>> think about the default value of "batch"?
>> Is there any link about where this value comes from? I have performed a
>> simple test for Pandas UDF which performs the simple +1 operation. The
>> performance is best when the batch size is set to 5000. I think it depends
>> on the data type of each column, the functionality the Pandas UDF does,
>> etc. However I agree with you that we could give a meaningful default value
>> for the "batch" size which works in most scenarios.
>> 
>>> Can we only configure one parameter and calculate another automatically?
>> For example, if we just want to "pipeline", "bundle.size" is twice as much
>> as "batch.size", would this work?
>> I agree with Jincheng that this is not feasible. I think that giving a
>> meaningful default value for "batch.size" which works in most scenarios
>> is enough. What are your thoughts?
>> 
>> Thanks,
>> Dian
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
>> 
>> 
>> On Mon, Feb 10, 2020 at 4:25 PM jincheng sun 
>> wrote:
>> 
>>> Hi Jingsong,
>>> 
>>> Thanks for your feedback! I would like to share my thoughts regarding the
>>> following question:
>>> 
>>>>> - Can we only configure one parameter and calculate another
>>> automatically? For example, if we just want to "pipeline", "bundle.size"
>> is
>>> twice as much as "batch.size", would this work?
>>> 
>>> I don't think this works. These two configurations are used for different
>>> purposes and there is no direct relationship between them and so I guess
>> we
>>> cannot infer one configuration from the other.
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Jingsong Li wrote on Mon, Feb 10, 2020 at 1:53 PM:
>>> 
>>>> Thanks Dian for your reply.
>>>> 
>>>> +1 to create a FLIP too.
>>>> 
>>>> About "python.fn-execution.bundle.size" and
>>>> "python.fn-execution.arrow.batch.size", I got what are you mean about
>>>> "pipeline". I agree.
>>>> It seems that a batch should always in a bundle. Bundle size should
>>> always
>>>> bigger than batch size. (if a batch can not cross bundle).
>>>> Can you explain this relationship to the document?
>>>> 
>>>> I think the default value is a very important thing; we can discuss:
>>>> - In the batch world, vectorization batch size is about 1024+. What do
>>> you
>>>> think about the default value of "batch"?
>>>> - Can we only configure one parameter and calculate another
>>> automatically?
>>>> For example, if we just want to "pipeline", "bundle.size" is twice as
>>> much
>>>> as "batch.size", is this work?
>>>> 
>>>> Best,
>>>> Jingsong Lee
>

[jira] [Created] (FLINK-15974) Support to use the Python UDF directly in the Python Table API

2020-02-10 Thread Dian Fu (Jira)
Dian Fu created FLINK-15974:
---

 Summary: Support to use the Python UDF directly in the Python 
Table API
 Key: FLINK-15974
 URL: https://issues.apache.org/jira/browse/FLINK-15974
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


Currently, a Python UDF has to be registered before it can be used in the Python Table API, 
e.g.
{code}
t_env.register_function("inc", inc)
table.select("inc(id)") \
 .insert_into("sink")
{code}

It would be great if we could support using Python UDFs directly in the Python 
Table API, e.g.
{code}
table.select(inc("id")) \
 .insert_into("sink")
{code}
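
For illustration, a hypothetical definition of the "inc" function that the snippets 
above assume (the issue itself elides it; the decorator follows the PyFlink 1.10 UDF API):
{code}
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# Hypothetical definition of the "inc" UDF referenced above.
@udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
def inc(x):
    return x + 1
{code}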



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15973) Optimize the execution plan where it references the Python UDF result field in the where clause

2020-02-10 Thread Dian Fu (Jira)
Dian Fu created FLINK-15973:
---

 Summary: Optimize the execution plan where it references the Python 
UDF result field in the where clause
 Key: FLINK-15973
 URL: https://issues.apache.org/jira/browse/FLINK-15973
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


For the following job:
{code}
t_env.register_function("inc", inc)

table.select("inc(id) as inc_id") \
 .where("inc_id > 0") \
 .insert_into("sink")
{code}

The execution plan is as follows:
{code}
StreamExecPythonCalc(select=inc(f0) AS inc_id))
+- StreamExecCalc(select=id AS f0, where=>(f0, 0))
+--- StreamExecPythonCalc(select=id, inc(f0) AS f0))
+-StreamExecCalc(select=id, id AS f0))
+---StreamExecTableSourceScan(fields=id)
{code}

The plan is not optimal. It should be as follows:
{code}
StreamExecPythonCalc(select=f0)
+- StreamExecCalc(select=f0, where=>(f0, 0))
+--- StreamExecPythonCalc(select=inc(f0) AS f0))
+-StreamExecCalc(select=id, id AS f0))
+---StreamExecTableSourceScan(fields=id)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15971) Adjust the default value of bundle size and bundle time

2020-02-10 Thread Dian Fu (Jira)
Dian Fu created FLINK-15971:
---

 Summary: Adjust the default value of bundle size and bundle time
 Key: FLINK-15971
 URL: https://issues.apache.org/jira/browse/FLINK-15971
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


Currently the default value for "python.fn-execution.bundle.size" is 1000 and 
the default value for "python.fn-execution.bundle.time" is 1000ms. We should 
try to find meaningful default values which work well in most scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15970) Optimize the Python UDF execution to only serialize the value

2020-02-10 Thread Dian Fu (Jira)
Dian Fu created FLINK-15970:
---

 Summary: Optimize the Python UDF execution to only serialize the 
value
 Key: FLINK-15970
 URL: https://issues.apache.org/jira/browse/FLINK-15970
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


Currently, the window/timestamp/pane info is also serialized and sent between 
the Java operator and the Python worker. This information is useless, and 
after bumping Beam to 2.19.0 (BEAM-7951), it becomes possible to skip 
serializing these fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Support scalar vectorized Python UDF in PyFlink

2020-02-10 Thread Dian Fu
Hi Jincheng, Hequn & Jingsong,

Thanks a lot for your suggestions. I have created FLIP-97[1] for this
feature.

> One little suggestion: maybe it would be nice if we can add some
performance explanation in the document? (I just very curious:))
Thanks for the suggestion. I have updated the design doc in the
"BackGround" section about where the performance gains could be got from.

> It seems that a batch should always be contained in a bundle. Bundle size should always be
bigger than batch size (if a batch cannot cross bundles).
Can you explain this relationship in the document?
I have updated the design doc explaining more about these two
configurations.

> In the batch world, vectorization batch size is about 1024+. What do you
think about the default value of "batch"?
Is there any link about where this value comes from? I have performed a
simple test for Pandas UDF which performs the simple +1 operation. The
performance is best when the batch size is set to 5000. I think it depends
on the data type of each column, the functionality the Pandas UDF does,
etc. However I agree with you that we could give a meaningful default value
for the "batch" size which works in most scenarios.

> Can we only configure one parameter and calculate another automatically?
For example, if we just want to "pipeline", "bundle.size" is twice as much
as "batch.size", would this work?
I agree with Jincheng that this is not feasible. I think that giving a
meaningful default value for "batch.size" which works in most scenarios
is enough. What are your thoughts?

Thanks,
Dian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
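
For readers following along, a minimal sketch of how the two options discussed
above could be tuned from the Python Table API. The key names are taken from the
thread; the TableConfig.get_configuration() accessor is assumed from the 1.10
API, and the values are illustrative, not recommendations.

    # t_env is assumed to be an existing TableEnvironment.
    conf = t_env.get_config().get_configuration()
    # Max number of elements the Java operator hands to the Python worker
    # before forcing a flush and waiting for the results.
    conf.set_string("python.fn-execution.bundle.size", "100000")
    # Number of rows per Arrow record batch passed to the Pandas UDF;
    # it should not exceed the bundle size.
    conf.set_string("python.fn-execution.arrow.batch.size", "10000")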


On Mon, Feb 10, 2020 at 4:25 PM jincheng sun 
wrote:

> Hi Jingsong,
>
> Thanks for your feedback! I would like to share my thoughts regarding the
> following question:
>
> >> - Can we only configure one parameter and calculate another
> automatically? For example, if we just want to "pipeline", "bundle.size" is
> twice as much as "batch.size", would this work?
>
> I don't think this works. These two configurations are used for different
> purposes and there is no direct relationship between them and so I guess we
> cannot infer one configuration from the other.
>
> Best,
> Jincheng
>
>
> Jingsong Li wrote on Mon, Feb 10, 2020 at 1:53 PM:
>
> > Thanks Dian for your reply.
> >
> > +1 to create a FLIP too.
> >
> > About "python.fn-execution.bundle.size" and
> > "python.fn-execution.arrow.batch.size", I got what are you mean about
> > "pipeline". I agree.
> > It seems that a batch should always in a bundle. Bundle size should
> always
> > bigger than batch size. (if a batch can not cross bundle).
> > Can you explain this relationship to the document?
> >
> > I think the default value is a very important thing; we can discuss:
> > - In the batch world, vectorization batch size is about 1024+. What do
> you
> > think about the default value of "batch"?
> > - Can we only configure one parameter and calculate another
> automatically?
> > For example, if we just want to "pipeline", "bundle.size" is twice as
> much
> > as "batch.size", is this work?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Feb 10, 2020 at 11:55 AM Hequn Cheng  wrote:
> >
> > > Hi Dian,
> > >
> > > Thanks a lot for bringing up the discussion!
> > >
> > > It is great to see the Pandas UDFs feature is going to be introduced. I
> > > think this would improve the performance and also the usability of
> > > user-defined functions (UDFs) in Python.
> > > One little suggestion: maybe it would be nice if we can add some
> > > performance explanation in the document? (I just very curious:))
> > >
> > > +1 to create a FLIP for this big enhancement.
> > >
> > > Best,
> > > Hequn
> > >
> > > On Mon, Feb 10, 2020 at 11:15 AM jincheng sun <
> sunjincheng...@gmail.com>
> > > wrote:
> > >
> > > > Hi Dian,
> > > >
> > > > Thanks for bringing up this discussion. This is very important for the
> > > > PyFlink ecosystem. Adding Pandas support greatly enriches the
> > available
> > > > UDF library of PyFlink and greatly improves the usability of PyFlink!
> > > >
> > > > +1 for Support scalar vectorized Python UDF.
> > > >
> > > > I think we should create a FLIP for this big enhancement. :)
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > >
> > > >
> > > > > dianfu wrote on Wed, Feb 5, 2020 at 6:01 PM:
> > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > Thanks a lot for the valuable feedback.
> > > > >
> > > > > 1. The configurations "python.fn-execution.bundle.size" and
> > > > > "python.fn-execution.arrow.batch.size" are used for separate
> purposes
> > > > and I
> > > > > think they are both needed. If they are unified, the Python
> operator
> > > has
> > > > to
> > > > > wait for the execution results of the previous batch of elements before
> > > > > processing the next batch. This means that the Python UDF execution
> > can
> > > > not
> > > > > be pipelined between batches. With separate configuration, 

Re: [DISCUSS] Support Python ML Pipeline API

2020-02-09 Thread Dian Fu
Hi Hequn,

Thanks for bringing up the discussion. +1 to this feature. The design LGTM.
It's great that the Python ML users could use both the Java Pipeline
Transformer/Estimator/Model classes and the Python
Pipeline Transformer/Estimator/Model in the same job.

Regards,
Dian

On Mon, Feb 10, 2020 at 11:08 AM jincheng sun 
wrote:

> Hi Hequn,
>
> Thanks for bringing up this discussion.
>
> +1 for add Python ML Pipeline API, even though the Java pipeline API may
> change.
>
> I would like to suggest creating a FLIP for these API changes. :)
>
> Best,
> Jincheng
>
>
> Hequn Cheng wrote on Wed, Feb 5, 2020 at 5:24 PM:
>
> > Hi everyone,
> >
> > FLIP-39[1] rebuilds the Flink ML pipeline on top of TableAPI and
> introduces
> > a new set of Java APIs. As Python is widely used in ML areas, providing
> > Python ML Pipeline APIs for Flink can not only make it easier to write ML
> > jobs for Python users but also broaden the adoption of Flink ML.
> >
> > Given this, Jincheng and I discussed offline about the support of Python
> ML
> > Pipeline API and drafted a design doc[2]. We'd like to achieve three
> goals
> > for supporting Python Pipeline API:
> > - Add Python pipeline API according to Java pipeline API(we will adapt
> the
> > Python pipeline API if Java pipeline API changes).
> > - Support native Python Transformer/Estimator/Model, i.e., users can
> write
> > not only Python Transformer/Estimator/Model wrappers for calling Java
> ones
> > but also can write native Python Transformer/Estimator/Models.
> > - Ease of use. Support keyword arguments when defining parameters.
> >
> > More details can be found in the design doc and we are looking forward to
> > your feedback.
> >
> > Best,
> > Hequn
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
> > [2]
> >
> >
> https://docs.google.com/document/d/1fwSO5sRNWMoYuvNgfQJUV6N2n2q5UEVA4sezCljKcVQ/edit?usp=sharing
> >
>
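
To make the "native Python Transformer" goal above concrete, a purely
illustrative sketch; the Transformer base class and the transform signature are
assumptions modeled on the FLIP-39 Java Pipeline API, not the final Python
interface.

    # Hypothetical base class from the proposed Python Pipeline API.
    class Transformer(object):
        def transform(self, t_env, table):
            raise NotImplementedError

    # A native Python Transformer maps one Table to another via the Table API.
    class ScaleFeature(Transformer):
        def transform(self, t_env, table):
            return table.select("feature, feature * 2 as scaled")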


[jira] [Created] (FLINK-15929) test_dependency failed on travis

2020-02-05 Thread Dian Fu (Jira)
Dian Fu created FLINK-15929:
---

 Summary: test_dependency failed on travis
 Key: FLINK-15929
 URL: https://issues.apache.org/jira/browse/FLINK-15929
 Project: Flink
  Issue Type: Test
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.10.1, 1.11.0


The Python tests "test_dependency" is instable. It failed on travis with the 
following exception:
{code}
"Source: PythonInputFormatTableSource(a, b) -> 
SourceConversion(table=[default_catalog.default_database.Unregistered_TableSource_862767019,
 source: [PythonInputFormatTableSource(a, b)]], fields=[a, b]) -> 
StreamExecPythonCalc -> Calc(select=[f0 AS _c0, a]) -> SinkConversionToRow -> 
Map -> Sink: Unnamed (1/2)" #581 prio=5 os_prio=0 cpu=32.04ms elapsed=302.56s 
tid=0x01f26000 nid=0x4662 waiting on condition  [0x7f0acb7f5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.2/Native Method)
- parking to wait for  <0x8aa3bfc0> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.2/LockSupport.java:234)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.2/CompletableFuture.java:1798)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.2/ForkJoinPool.java:3128)
at 
java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.2/CompletableFuture.java:1868)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.2/CompletableFuture.java:2021)
at 
org.apache.beam.runners.fnexecution.control.MapControlClientPool.getClient(MapControlClientPool.java:69)
at 
org.apache.beam.runners.fnexecution.control.MapControlClientPool$$Lambda$1090/0x000100d70040.take(Unknown
 Source)
at 
org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:126)
at 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178)
at 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
- locked <0x8aa02788> (a 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$StrongEntry)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
at 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:211)
at 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:202)
at 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
at 
org.apache.flink.python.AbstractPythonFunctionRunner.open(AbstractPythonFunctionRunner.java:179)
at 
org.apache.flink.table.runtime.operators.python.AbstractPythonScalarFunctionOperator$ProjectUdfInputPythonScalarFunctionRunner.open(AbstractPythonScalarFunctionOperator.java:193)
at 
org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:139)
at 
org.apache.flink.table.runtime.operators.python.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:143)
at 
org.apache.flink.table.runtime.operators.python.BaseRowPythonScalarFunctionOperator.open(BaseRowPythonScalarFunctionOperator.java:86)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1007)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$

[jira] [Created] (FLINK-15902) Improve the Python API doc of the version they are added

2020-02-04 Thread Dian Fu (Jira)
Dian Fu created FLINK-15902:
---

 Summary: Improve the Python API doc of the version they are added
 Key: FLINK-15902
 URL: https://issues.apache.org/jira/browse/FLINK-15902
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0, 1.10.1


Currently it's not possible to know in which version a Python API was added. 
This information is very useful for users and should be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15897) Deserialize the Python UDF execution results when sending them out

2020-02-04 Thread Dian Fu (Jira)
Dian Fu created FLINK-15897:
---

 Summary: Deserialize the Python UDF execution results when sending 
them out
 Key: FLINK-15897
 URL: https://issues.apache.org/jira/browse/FLINK-15897
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Currently, the Python UDF execution results are deserialized and then buffered 
in a collection when received from the Python worker. The deserialization could 
be deferred until the execution results are sent to the downstream operator. 
That is to say, the operator would buffer the serialized bytes instead of the 
deserialized Java objects. This could reduce the memory footprint of the Java 
operator.
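
A conceptual sketch of the idea in plain Python (not the actual operator code; 
names are illustrative only):
{code}
# Buffer the compact serialized form; deserialize only on emission.
buffered = []

def on_result_received(raw_bytes):
    # Cheap: keep the serialized bytes as-is instead of Java objects.
    buffered.append(raw_bytes)

def emit_all(deserialize, collect):
    # Pay the deserialization cost lazily, when sending downstream.
    while buffered:
        collect(deserialize(buffered.pop(0)))
{code}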



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15875) Bump Beam to 2.19.0

2020-02-03 Thread Dian Fu (Jira)
Dian Fu created FLINK-15875:
---

 Summary: Bump Beam to 2.19.0
 Key: FLINK-15875
 URL: https://issues.apache.org/jira/browse/FLINK-15875
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


Currently PyFlink depends on Beam's portability framework for Python UDF 
execution. The currently used version is 2.15.0. We should bump it to 2.19.0 
as it includes several critical features/fixes we need, e.g.
1) BEAM-7951: It allows skipping serialization of the window/timestamp/pane info between 
the Java operator and the Python worker, which could definitely improve the 
performance a lot.
2) BEAM-8935: It allows failing fast if the Python worker fails to start up. 
Currently it takes 2 minutes to detect whether the Python worker has started 
successfully. 
3) BEAM-7948: It supports periodically flushing the data between the Java operator 
and the Python worker. This is especially useful for streaming jobs and could 
improve the latency.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-01-23 Thread Dian Fu
Congrats Yu!

> On Jan 23, 2020, at 6:47 PM, Hequn Cheng wrote:
> 
> Congratulations Yu!
> Thanks a lot for being the release manager of the big 1.10 release. You are
> doing a very good job!
> 
> Best, Hequn
> 
> 
> On Thu, Jan 23, 2020 at 6:29 PM Jingsong Li  wrote:
> 
>> Congratulations Yu, well deserved!
>> 
>> And thanks for your great contribution to the 1.10 release.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Thu, Jan 23, 2020 at 6:14 PM Fabian Hueske  wrote:
>> 
>>> Congrats Yu!
>>> Good to have you on board!
>>> 
>>> Cheers, Fabian
>>> 
>>> On Thu, Jan 23, 2020 at 11:13, Piotr Nowojski <pi...@ververica.com> wrote:
>>> 
 Congratulations! :)
 
> On 23 Jan 2020, at 10:48, Tzu-Li (Gordon) Tai 
 wrote:
> 
> Congratulations :)
> 
> On Thu, Jan 23, 2020, 5:07 PM Yadong Xie 
>> wrote:
> 
>> Well deserved!
>> 
>> Yangze Guo wrote on Thu, Jan 23, 2020 at 5:06 PM:
>> 
>>> Congratulations!
>>> 
>>> Best,
>>> Yangze Guo
>>> 
>>> 
>>> On Thu, Jan 23, 2020 at 4:59 PM Stephan Ewen 
>>> wrote:
 
 Hi all!
 
 We are announcing that Yu Li has joined the rank of Flink
>>> committers.
 
 Yu joined already in late December, but the announcement got lost
>> because
 of the Christmas and New Years season, so here is a belated proper
 announcement.
 
 Yu is one of the main contributors to the state backend components
>>> in
>> the
 recent year, working on various improvements, for example the
>>> RocksDB
 memory management for 1.10.
 He has also been one of the release managers for the big 1.10
>>> release.
 
 Congrats for joining us, Yu!
 
 Best,
 Stephan
>>> 
>> 
 
 
>>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 



Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-17 Thread Dian Fu
Thanks everyone for your warm welcome.
It's my pleasure to be part of the community and looking forward to
contribute more in the future.

Regards,
Dian

On Fri, Jan 17, 2020 at 4:03 PM Yang Wang  wrote:

> Congratulations!
>
>
> Best,
> Yang
>
> Terry Wang wrote on Fri, Jan 17, 2020 at 3:28 PM:
>
>> Congratulations!
>>
>> Best,
>> Terry Wang
>>
>>
>>
>> On Jan 17, 2020, at 14:09, Biao Liu wrote:
>>
>> Congrats!
>>
>> Thanks,
>> Biao /'bɪ.aʊ/
>>
>>
>>
>> On Fri, 17 Jan 2020 at 13:43, Rui Li  wrote:
>>
>>> Congratulations Dian, well deserved!
>>>
>>> On Thu, Jan 16, 2020 at 5:58 PM jincheng sun 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'm very happy to announce that Dian accepted the offer of the Flink
>>>> PMC to become a committer of the Flink project.
>>>>
>>>> Dian Fu has been contributing to Flink for many years. Dian Fu played
>>>> an essential role in the PyFlink/CEP/SQL/Table API modules. Dian Fu has
>>>> contributed several major features, reported and fixed many bugs, spent a
>>>> lot of time reviewing pull requests, frequently helped out on the
>>>> user mailing lists and checked/voted on releases.
>>>>
>>>> Please join me in congratulating Dian on becoming a Flink committer!
>>>>
>>>> Best,
>>>> Jincheng(on behalf of the Flink PMC)
>>>>
>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>
>>


[jira] [Created] (FLINK-15493) FlinkKafkaInternalProducerITCase.testProducerWhenCommitEmptyPartitionsToOutdatedTxnCoordinator failed on travis

2020-01-06 Thread Dian Fu (Jira)
Dian Fu created FLINK-15493:
---

 Summary: 
FlinkKafkaInternalProducerITCase.testProducerWhenCommitEmptyPartitionsToOutdatedTxnCoordinator
 failed on travis
 Key: FLINK-15493
 URL: https://issues.apache.org/jira/browse/FLINK-15493
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


FlinkKafkaInternalProducerITCase.testProducerWhenCommitEmptyPartitionsToOutdatedTxnCoordinator
 failed on travis with the following exception:
{code:java}
Test 
testProducerWhenCommitEmptyPartitionsToOutdatedTxnCoordinator(org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase)
 failed with: org.junit.runners.model.TestTimedOutException: test timed out 
after 3 milliseconds at java.lang.Object.wait(Native Method) at 
java.lang.Object.wait(Object.java:502) at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92)
 at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) 
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.createTestTopic(KafkaTestEnvironmentImpl.java:177)
 at 
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.createTestTopic(KafkaTestEnvironment.java:115)
 at 
org.apache.flink.streaming.connectors.kafka.KafkaTestBase.createTestTopic(KafkaTestBase.java:197)
 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaInternalProducerITCase.testProducerWhenCommitEmptyPartitionsToOutdatedTxnCoordinator(FlinkKafkaInternalProducerITCase.java:176)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.lang.Thread.run(Thread.java:748)
{code}
instance: [https://api.travis-ci.org/v3/job/633307060/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-05 Thread Dian Fu
+1 to set blink planner as the default planner for SQL client considering that 
so many features added since 1.10 are only available in the blink planner.

> On Jan 6, 2020, at 11:04 AM, Rui Li wrote:
> 
> +1. I think it improves user experience.
> 
> On Mon, Jan 6, 2020 at 10:18 AM Zhenghua Gao wrote:
> +1 for making the blink planner the default planner for the SQL Client since we 
> have made huge improvements in 1.10.
> 
> Best Regards,
> Zhenghua Gao
> 
> 
> On Sun, Jan 5, 2020 at 2:42 PM Benchao Li wrote:
> +1 
> 
> We have used the blink planner since the 1.9.0 release in our production environment, 
> and it has been really impressive.
> 
> Hequn Cheng <chenghe...@gmail.com> wrote on Sun, Jan 5, 2020 at 1:58 PM:
> +1 to making the blink planner the default planner for the SQL Client, so that we can 
> give the blink planner a bit more exposure. 
> 
> Best, Hequn
> 
> On Fri, Jan 3, 2020 at 6:32 PM Jark Wu wrote:
> Hi Benoît,
> 
> Thanks for the reminder. I will look into the issue and hopefully we can 
> target it for 1.9.2 and 1.10. 
> 
> Cheers,
> Jark
> 
> On Fri, 3 Jan 2020 at 18:21, Benoît Paris wrote:
> >  If anyone finds that blink planner has any significant defects and has a 
> > larger regression than the old planner, please let us know.
> 
> Overall, the Blink-exclusive features are a must (TopN, deduplicate, 
> LAST_VALUE, plan reuse, etc.)! But not all production use cases of the Legacy 
> planner are covered:
> An edge case of Temporal Table Functions does not allow computed Tables (as 
> opposed to TableSources) to be used on the query side in Blink 
> (https://issues.apache.org/jira/browse/FLINK-14200)
> 
> Cheers
> Ben
> 
> 
> On Fri, Jan 3, 2020 at 10:00 AM Jeff Zhang wrote:
> +1, I have already made blink the default planner of the flink interpreter in 
> Zeppelin
> 
> 
> Jingsong Li <jingsongl...@gmail.com> wrote on Fri, Jan 3, 2020 at 4:37 PM:
> Hi Jark,
> 
> +1 for default blink planner in SQL-CLI.
> I believe this new planner can be put into practice in production.
> We've worked hard for nearly a year, while the old planner hasn't moved forward.
> 
> And I'd like to cc u...@flink.apache.org.
> If anyone finds that blink planner has any significant defects and has a 
> larger regression than the old planner, please let us know. We will be very 
> grateful.
> 
> Best,
> Jingsong Lee
> 
> On Fri, Jan 3, 2020 at 4:14 PM Leonard Xu wrote:
> +1 for this. 
> We brought many SQL/API features and enhanced stability in the 1.10 release, and 
> almost all of that happened in the Blink planner.
> The SQL CLI is the most convenient entrypoint for me; I believe many users will 
> have a better experience if we set the Blink planner as the default planner.
> 
> Best,
> Leonard
> 
> > On Jan 3, 2020, at 15:16, Terry Wang wrote:
> > 
> > Since what the blink planner can do is a superset of the flink planner, a big +1 for 
> > changing the default planner to Blink planner from my side.
> > 
> > Best,
> > Terry Wang
> > 
> > 
> > 
> >> On Jan 3, 2020, at 15:00, Jark Wu <imj...@gmail.com> wrote:
> >> 
> >> Hi everyone,
> >> 
> >> In 1.10 release, Flink SQL supports many awesome features and improvements,
> >> including:
> >> - support watermark statement and computed column in DDL
> >> - fully support all data types in Hive
> >> - Batch SQL performance improvements (TPC-DS 7x than Hive MR)
> >> - support INSERT OVERWRITE and INSERT PARTITION
> >> 
> >> However, all these features and improvements are only available in the Blink
> >> planner, not in the Old planner.
> >> There are also some other features that are limited to the Blink planner, e.g.
> >> Dimension Table Join [1],
> >> TopN [2], Deduplicate [3], streaming aggregate optimizations [4], and so 
> >> on.
> >> 
> >> But the Old planner is still the default planner in Table API & SQL. It is
> >> frustrating for users to have to switch
> >> to the blink planner manually every time they start the SQL CLI. And it's
> >> surprising to see an unsupported-operation
> >> exception if they try out the new features but haven't switched the planner.
> >> 
> >> The SQL CLI is a very important entrypoint for trying out new features and
> >> prototyping for users.
> >> In order to give the new planner more exposure, I would like to suggest setting the
> >> default planner
> >> for the SQL Client to the Blink planner before the 1.10 release.
> >> 
> >> The approach is just changing the default SQL CLI yaml configuration[5]. In
> >> this way, the existing
> >> environment is still compatible and unaffected.
> >> 
> >> Changing the default planner for the whole Table API & SQL is another topic
> >> and is out of scope of this discussion.
> >> 
> >> What do you think?
> >> 
> >> Best,
> >> Jark
> >> 
> >> [1]:
> >> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table
> 
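
For reference, the change described above would roughly amount to flipping one
value in conf/sql-client-defaults.yaml; a sketch, with the key names assumed
from the 1.10 configuration layout:

    execution:
      planner: blink   # previously: old
      type: streaming

Environment files that already set the planner explicitly would be unaffected.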

Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

2019-12-29 Thread Dian Fu
Hi all,

Sorry to jump into this discussion. Thanks everyone for the discussion. I'm 
very interested in this topic although I'm not an expert in this part. So I'm 
glad to share my thoughts as follows:

1) It's better to have a whole design for this feature
As we know, there are two deployment modes: per-job mode and session mode. I'm 
wondering which mode really needs this feature. As the design doc mentioned, 
per-job mode is more used for streaming jobs and session mode is usually used 
for batch jobs(Of course, the job types and the deployment modes are 
orthogonal). Usually a streaming job only needs to be submitted once and it 
will run for days or weeks, while batch jobs will be submitted more frequently 
compared with streaming jobs. This means that maybe session mode also needs 
this feature. However, if we support this feature in session mode, the 
application master will become the new centralized service (which should be 
addressed). So in this case, it's better to have a complete design for both 
per-job mode and session mode. Furthermore, even if we can do it phase by 
phase, we need to have a whole picture of how it works in both per-job mode and 
session mode.

2) It's better to consider the convenience for users, such as debugging
After we finish this feature, the job graph will be compiled in the application 
master, which means that users cannot easily get the exception message 
synchronously in the job client if there are problems during the job graph 
compiling (especially for platform users), such as the resource path is 
incorrect, the user program itself has some problems, etc. What I'm thinking is 
that maybe we should throw the exceptions as early as possible (during job 
submission stage).

3) It's better to consider the impact to the stability of the cluster
If we perform the compiling in the application master, we should consider the 
impact of the compilation errors. Although YARN could resume the application 
master in case of failures, in some cases the compilation failure may be a 
waste of cluster resources and may impact the stability of the cluster and the 
other jobs in the cluster, e.g. the resource path is incorrect, or the user 
program itself has some problems (in this case, job failover cannot solve this 
kind of problem), etc. In the current implementation, the compilation errors are 
handled on the client side and there is no impact on the cluster at all.

Regarding 1), it's clearly pointed out in the design doc that only per-job mode 
will be supported. However, I think it's better to also consider the session 
mode in the design doc.
Regarding 2) and 3), I have not seen related sections in the design doc. It 
would be good if we could cover them in the design doc.

Feel free to correct me If there is anything I misunderstand.

Regards,
Dian


> On Dec 27, 2019, at 3:13 AM, Peter Huang wrote:
> 
> Hi Yang,
> 
> I can't agree more. The effort definitely needs to align with the final
> goal of FLIP-73.
> I am thinking about whether we can achieve the goal with two phases.
> 
> 1) Phase I
> As the CLiFrontend will not be depreciated soon. We can still use the
> deployMode flag there,
> pass the program info through Flink configuration,  use the
> ClassPathJobGraphRetriever
> to generate the job graph in ClusterEntrypoints of yarn and Kubernetes.
> 
> 2) Phase II
> In  AbstractJobClusterExecutor, the job graph is generated in the execute
> function. We can still
> use the deployMode in it. With deployMode = cluster, the execute function
> only starts the cluster.
> 
> When {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start the
> dispatcher first, then we can use
> a ClusterEnvironment similar to ContextEnvironment to submit the job by
> jobName to the local
> dispatcher. For the details, we need more investigation. Let's wait
> for @Aljoscha
> Krettek  @Till Rohrmann 's
> feedback after the holiday season.
> 
> Thank you in advance. Merry Christmas and Happy New Year!!!
> 
> 
> Best Regards
> Peter Huang
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Dec 25, 2019 at 1:08 AM Yang Wang  wrote:
> 
>> Hi Peter,
>> 
>> I think we need to reconsider tison's suggestion seriously. After FLIP-73,
>> the deployJobCluster has
>> been moved into `JobClusterExecutor#execute`. It should not be visible
>> to `CliFrontend`. That
>> means the user program will *ALWAYS* be executed on the client side. This is
>> the by-design behavior.
>> So, we could not just add `if(client mode) .. else if(cluster mode) ...`
>> code in `CliFrontend` to bypass
>> the executor. We need to find a clean way to decouple executing the user
>> program from deploying the per-job
>> cluster. Based on this, we could support executing the user program on the client
>> or master side.
>> 
>> Maybe Aljoscha and Jeff could give some good suggestions.
>> 
>> 
>> 
>> Best,
>> Yang
>> 
>> Peter Huang wrote on Wed, Dec 25, 2019 at 4:03 AM:
>> 
>>> Hi Jingjing,
>>> 
>>> The improvement proposed is a deployment option for CLI. For SQL based
>>> Flink application, It is 

[jira] [Created] (FLINK-15414) KafkaITCase#prepare failed in travis

2019-12-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-15414:
---

 Summary: KafkaITCase#prepare failed in travis
 Key: FLINK-15414
 URL: https://issues.apache.org/jira/browse/FLINK-15414
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Dian Fu


The nightly travis for release-1.9 failed with the following error:
{code:java}
15:43:24.454 [ERROR] Errors: 
15:43:24.455 [ERROR]   
KafkaITCase.prepare:58->KafkaTestBase.prepare:92->KafkaTestBase.prepare:100->KafkaTestBase.startClusters:134->KafkaTestBase.startClusters:145
 » Kafka
{code}
instance: [https://api.travis-ci.org/v3/job/629636116/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15413) ScalarOperatorsTest failed in travis

2019-12-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-15413:
---

 Summary: ScalarOperatorsTest failed in travis
 Key: FLINK-15413
 URL: https://issues.apache.org/jira/browse/FLINK-15413
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Dian Fu


The travis of release-1.9 nightly failed with the following error:
{code:java}
14:50:19.796 [ERROR] ScalarOperatorsTest>ExpressionTestBase.evaluateExprs:161 
Wrong result for: [CASE WHEN (CASE WHEN f2 = 1 THEN CAST('' as INT) ELSE 0 END) 
is null THEN 'null' ELSE 'not null' END] optimized to: [_UTF-16LE'not 
null':VARCHAR(8) CHARACTER SET "UTF-16LE"] expected: but was:
{code}
instance: [https://api.travis-ci.org/v3/job/629636107/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15412) LocalExecutorITCase#testParameterizedTypes failed in travis

2019-12-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-15412:
---

 Summary: LocalExecutorITCase#testParameterizedTypes failed in 
travis
 Key: FLINK-15412
 URL: https://issues.apache.org/jira/browse/FLINK-15412
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Reporter: Dian Fu


The travis of release-1.9 nightly failed with the following error:
{code:java}
14:43:17.916 [INFO] Running 
org.apache.flink.table.client.gateway.local.LocalExecutorITCase
14:44:47.388 [ERROR] Tests run: 34, Failures: 0, Errors: 1, Skipped: 1, Time 
elapsed: 89.468 s <<< FAILURE! - in 
org.apache.flink.table.client.gateway.local.LocalExecutorITCase
14:44:47.388 [ERROR] testParameterizedTypes[Planner: 
blink](org.apache.flink.table.client.gateway.local.LocalExecutorITCase) Time 
elapsed: 7.88 s <<< ERROR!
org.apache.flink.table.client.gateway.SqlExecutionException: Invalid SQL 
statement at 
org.apache.flink.table.client.gateway.local.LocalExecutorITCase.testParameterizedTypes(LocalExecutorITCase.java:557)
Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
failed. findAndCreateTableSource failed
 at 
org.apache.flink.table.client.gateway.local.LocalExecutorITCase.testParameterizedTypes(LocalExecutorITCase.java:557)
Caused by: org.apache.flink.table.api.TableException: findAndCreateTableSource 
failed
 at 
org.apache.flink.table.client.gateway.local.LocalExecutorITCase.testParameterizedTypes(LocalExecutorITCase.java:557)
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: 
Could not find a suitable table factory for 
'org.apache.flink.table.factories.TableSourceFactory' in
the classpath.
Reason: No context matches.
{code}
instance: [https://api.travis-ci.org/v3/job/629636106/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15271) Add documentation about the Python environment requirements

2019-12-15 Thread Dian Fu (Jira)
Dian Fu created FLINK-15271:
---

 Summary: Add documentation about the Python environment 
requirements
 Key: FLINK-15271
 URL: https://issues.apache.org/jira/browse/FLINK-15271
 Project: Flink
  Issue Type: Task
  Components: API / Python, Documentation
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


Python UDFs have specific requirements on the Python environment, such as 
Python 3.5+, Beam 2.15.0, etc. We should add clear documentation about these 
requirements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15270) Add documentation about how to specify third-party dependencies via API for Python UDFs

2019-12-15 Thread Dian Fu (Jira)
Dian Fu created FLINK-15270:
---

 Summary: Add documentation about how to specify third-party 
dependencies via API for Python UDFs
 Key: FLINK-15270
 URL: https://issues.apache.org/jira/browse/FLINK-15270
 Project: Flink
  Issue Type: Task
  Components: API / Python, Documentation
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


Currently we have already provided APIs and command line options to allow users 
to specify third-party dependencies which may be used in Python UDFs. There is 
already documentation about how to specify third-party dependencies via the 
command line options. We should also add documentation about how to specify 
third-party dependencies via the API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-15 Thread Dian Fu
Congrats Zhu Zhu!

> On Dec 15, 2019, at 6:23 PM, Zhu Zhu wrote:
> 
> Thanks everyone for the warm welcome!
> It's my honor and pleasure to improve Flink with all of you in the
> community!
> 
> Thanks,
> Zhu Zhu
> 
> Benchao Li wrote on Sun, Dec 15, 2019 at 3:54 PM:
> 
>> Congratulations!:)
>> 
>> Hequn Cheng wrote on Sun, Dec 15, 2019 at 11:47 AM:
>> 
>>> Congrats, Zhu Zhu!
>>> 
>>> Best, Hequn
>>> 
>>> On Sun, Dec 15, 2019 at 6:11 AM Shuyi Chen  wrote:
>>> 
 Congratulations!
 
 On Sat, Dec 14, 2019 at 7:59 AM Rong Rong  wrote:
 
> Congrats Zhu Zhu :-)
> 
> --
> Rong
> 
> On Sat, Dec 14, 2019 at 4:47 AM tison  wrote:
> 
>> Congratulations!:)
>> 
>> Best,
>> tison.
>> 
>> 
>> OpenInx wrote on Sat, Dec 14, 2019 at 7:34 PM:
>> 
>>> Congrats Zhu Zhu!
>>> 
>>> On Sat, Dec 14, 2019 at 2:38 PM Jeff Zhang 
>>> wrote:
>>> 
 Congrats, Zhu Zhu!
 
 Paul Lam wrote on Sat, Dec 14, 2019 at 10:29 AM:
 
> Congrats Zhu Zhu!
> 
> Best,
> Paul Lam
> 
> Kurt Young wrote on Sat, Dec 14, 2019 at 10:22 AM:
> 
>> Congratulations Zhu Zhu!
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Sat, Dec 14, 2019 at 10:04 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>> 
>>> Congrats ZhuZhu and welcome on board!
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Jark Wu wrote on Sat, Dec 14, 2019 at 9:55 AM:
>>> 
 Congratulations, Zhu Zhu!
 
 Best,
 Jark
 
 On Sat, 14 Dec 2019 at 08:20, Yangze Guo <karma...@gmail.com> wrote:
 
> Congrats, ZhuZhu!
> 
> Bowen Li wrote on Sat, Dec 14, 2019 at 5:37 AM:
> 
>> Congrats!
>> 
>> On Fri, Dec 13, 2019 at 10:42 AM Xuefu Z <usxu...@gmail.com> wrote:
>> 
>>> Congratulations, Zhu Zhu!
>>> 
>>> On Fri, Dec 13, 2019 at 10:37 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>> 
 Congratulations!:)
 
 On Fri, Dec 13, 2019 at 9:45 AM Piotr Nowojski <pi...@ververica.com> wrote:
 
> Congratulations! :)
> 
>> On 13 Dec 2019, at 18:05, Fabian Hueske <fhue...@gmail.com> wrote:
>> 
>> Congrats Zhu Zhu and welcome on board!
>> 
>> Best, Fabian
>> 
>> On Fri, Dec 13, 2019 at 17:51, Till Rohrmann <trohrm...@apache.org> wrote:
>> 
>>> Hi everyone,
>>> 
>>> I'm very happy to announce that Zhu Zhu accepted the offer of the Flink PMC
>>> to become a committer of the Flink project.
>>> 
>>> Zhu Zhu has been an active community member for more than a year now. Zhu
>>> Zhu played an essential role in the scheduler refactoring, helped
>>> implementing fine grained recovery, drives FLIP-53 and fixed various bugs
>>> in the scheduler and runtime. Zhu Zhu also helped the community by
>>> reporting issues, answering user mails and being active on the dev mailing
>>> list.
>>> 
>>> Congratulations Zhu Zhu!
>>> 
>>> Best, Till
>>> (on behalf of the Flink PMC)
>>> 
> 
> 
 
>>> 
>>> 
>>> --
>>> Xuefu Zhang
>>> 
>>> "In Honey We Trust!"
>>> 
>> 
> 
 
>>> 
>> 
> 
 
 
 --
 Best Regards
 
 Jeff Zhang
 
>>> 
>> 
> 
 
>>> 
>> 
>> 
>> --
>> 
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>> 



Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Dian Fu
Thanks Hequn for being the release manager and everyone who contributed to this 
release.

Regards,
Dian

> On Dec 12, 2019, at 2:24 PM, Hequn Cheng wrote:
> 
> Hi,
>  
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.8.3, which is the third bugfix release for the Apache Flink 1.8 
> series.
>  
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
>  
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
>  
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release:
> https://flink.apache.org/news/2019/12/11/release-1.8.3.html 
> 
>  
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12346112 
> 
>  
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> Great thanks to @Jincheng as a mentor during this release.
>  
> Regards,
> Hequn 



Re: [ANNOUNCE] Feature freeze for Apache Flink 1.10.0 release

2019-12-10 Thread Dian Fu
Hi Yu & Gary,

Thanks for your great work. Looking forward to the 1.10 release.

Regards,
Dian

> On Dec 11, 2019, at 10:29 AM, Danny Chan wrote:
> 
> Hi Yu & Gary,
> 
> Thanks for your nice job ~
> 
> Best,
> Danny Chan
> On Nov 19, 2019 at 11:44 PM +0800, dev@flink.apache.org wrote:
>> 
>> Hi Yu & Gary,
>> 
>> Thanks a lot for your work and looking forward to the 1.10 release. :)



Re: [VOTE] Release 1.8.3, release candidate #3

2019-12-10 Thread Dian Fu
+1 (non-binding)

- verified signatures and checksums
- build from source without tests with Scala 2.11 and Scala 2.12
- start standalone cluster and web ui is accessible
- submit the word count examples of both batch and streaming; no suspicious
log output
- run a couple of tests in IDE
- verified that all POM files point to 1.8.3
- the release note and website PR looks good

Regards,
Dian

> On Dec 10, 2019, at 10:58 PM, Till Rohrmann wrote:
> 
> Hi everyone,
> 
> +1 (binding)
> 
> - verified that e2e tests pass on Travis
> - verified checksums and signatures
> - built Flink from sources with Scala 2.12
> - ran examples on standalone cluster
> 
> Cheers,
> Till
> 
> On Tue, Dec 10, 2019 at 12:23 PM Yang Wang  wrote:
> 
>> Hi Hequn,
>> 
>> +1 (non-binding)
>> 
>> - verified checksums and hashes
>> - built from sources (Scala 2.11 and Scala 2.12)
>> - running flink per-job/session cluster on Yarn with more than 1000
>> containers, good
>> 
>> On Tue, Dec 10, 2019 at 9:39 AM, Danny Chan wrote:
>> 
>>> Hi Hequn,
>>> 
>>> +1 (non-binding)
>>> 
>>> - verified checksums and hashes
>>> - built from local with MacOS 10.14 and JDK8
>>> - do some check in the SQL-CLI
>>> - run some tests in IDE
>>> 
>>> Best,
>>> Danny Chan
>>> On Dec 5, 2019 at 9:39 PM +0800, Hequn Cheng wrote:
 Hi everyone,
 
 Please review and vote on the release candidate #3 for the version
>> 1.8.3,
 as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)
 
 
 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to
>>> be
 deployed to dist.apache.org [2], which are signed with the key with
 fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "release-1.8.3-rc3" [5],
 * website pull request listing the new release and adding announcement
>>> blog
 post [6].
 
 The vote will be open for at least 72 hours.
 Please cast your votes before *Dec. 10th 2019, 16:00 UTC*.
 
 It is adopted by majority approval, with at least 3 PMC affirmative
>>> votes.
 
 Thanks,
 Hequn
 
 [1]
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346112
 [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc3/
 [3] https://dist.apache.org/repos/dist/release/flink/KEYS
 [4]
>>> https://repository.apache.org/content/repositories/orgapacheflink-1314/
 [5]
 
>>> 
>> https://github.com/apache/flink/commit/d54807ba10d0392a60663f030f9fe0bfa1c66754
 [6] https://github.com/apache/flink-web/pull/285
>>> 
>> 



[RESULT] [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi everyone,

Thanks for the discussion and votes.

So far we have received 4 approving votes, 3 of which are binding, and there
are no -1 votes:
* Jincheng (binding)
* Hequn (binding)
* Jark (binding)
* Jingsong (non-binding)

Therefore, I'm happy to announce that FLIP-88 has been accepted.

Thanks everyone!

Regards,
Dian

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi Jingsong,

Thanks for your sharing. It's very helpful, as the Python operator will
take a similar approach.

Thanks,
Dian

> On Dec 6, 2019, at 11:12 AM, Jingsong Li wrote:
> 
> Hi Dian,
> 
> After [1] and [2], in the batch sql world, we will:
> - [2] In client/compile side: we use memory weight request memory for
> Transformation.
> - [1] In runtime side: we use memory fraction to compute memory size and
> allocate in StreamOperator.
> For your information.
> 
> [1] https://jira.apache.org/jira/browse/FLINK-14063
> [2] https://jira.apache.org/jira/browse/FLINK-15035
> 
> Best,
> Jingsong Lee
> 
> On Tue, Dec 3, 2019 at 6:07 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for your valuable feedback. I have updated the "Example" section
>> describing how to use these options in a Python Table API program.
>> 
>> Thanks,
>> Dian
>> 
>>> On Dec 2, 2019, at 6:12 PM, Jingsong Lee wrote:
>>> 
>>> Hi Dian:
>>> 
>>> Thanks for your explanation.
>>> If you can update the document to add explanation for the changes to the
>>> table layer,
>>> it might be better. (it's just a suggestion, it depends on you)
>>> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
>>> Will this queue take up a lot of memory?
>>> Can it also occupy memory as large as buffer.memory?
>>> If so, what we're dealing with now is the silent use of heap memory?
>>> I feel a little strange, because the memory on the python side will
>> reserve,
>>> but the memory on the JVM side is used silently.
>>> 
>>> After carefully seeing your comments on Google doc:
>>>> The memory used by the Java operator is currently accounted as the task
>>> on-heap memory. We can revisit this if we find it's a problem in the
>> future.
>>> I agree that we can ignore it now, But we can add some content to the
>>> document to remind the user, What do you think?
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks a lot for your comments. Please see my reply inlined below.
>>>> 
>>>>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee wrote:
>>>>> 
>>>>> Hi Dian:
>>>>> 
>>>>> 
>>>>> Thanks for your driving. I have some questions:
>>>>> 
>>>>> 
>>>>> - Where should these configurations belong? You have mentioned
>>>> tableApi/SQL,
>>>>> so should in TableConfig?
>>>> 
>>>> All Python related configurations are defined in PythonOptions. User
>> could
>>>> configure these configurations via TableConfig.getConfiguration.setXXX
>> for
>>>> Python Table API programs.
>>>> 
>>>>> 
>>>>> - If just in table/sql, whether it should be called: table.python.,
>>>>> because in table, all config options are called table.***.
>>>> 
>>>> These configurations are not table specific. They will be used for both
>>>> Python Table API programs and Python DataStream API programs (which is
>>>> planned to be supported in the future). So python.xxx seems more
>>>> appropriate, what do you think?
>>>> 
>>>>> - What should table module do? So in CommonPythonCalc, we should read
>>>>> options from table config, and set resources to OneInputTransformation?
>>>> 
>>>> As described in the design doc, in compilation phase, for batch jobs,
>> the
>>>> required memory of the Python worker will be calculated according to the
>>>> configuration and set as the managed memory for the operator. For stream
>>>> jobs, the resource spec will be unknown(The reason is that currently the
>>>> resources for all the operators in stream jobs are unknown and it
>> doesn’t
>>>> support to configure both known and unknown resources in a single job).
>>>> 
>>>>> - Are all buffer.memory off-heap memory? I took a look
>>>>> to AbstractPythonScalarFunctionOperator, there is a
>> forwardedInputQueue,
>>>> is
>>>>> this one a heap queue? So we need heap memory too?
>>>> 
>>>> Yes, they are all off-heap memory which is supposed to be used by the
>>>> Python process. The forwardedInputQueue is a buffer used in the Java
>>>> operator and its memory is accounted as the on-heap memory.

Re: [VOTE] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Dian Fu
+1 (non-binding)

Regards,
Dian

> On Dec 5, 2019, at 11:11 AM, Jark Wu wrote:
> 
> +1 (binding)
> 
> Best,
> Jark
> 
> On Thu, 5 Dec 2019 at 10:45, Wei Zhong  wrote:
> 
>> Hi all,
>> 
>> According to our previous discussion in [1], I'd like to bring up a vote
>> to apply the adjustment [2] to the command-line option design of FLIP-78
>> [3].
>> 
>> The vote will be open for at least 72 hours unless there is an objection
>> or not enough votes.
>> 
>> Best,
>> Wei
>> 
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-the-Pyflink-command-line-options-Adjustment-to-FLIP-78-td35440.html
>> [2]
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> [3]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> 
>> 



Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-04 Thread Dian Fu
+1 for dropping them. 

Just FYI: there was a similar discussion a few months ago [1].

[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html#a29997
 


> On Dec 5, 2019, at 10:29 AM, vino yang wrote:
> 
> +1
> 
> On Thu, Dec 5, 2019 at 10:26 AM, jincheng sun  wrote:
> +1 for dropping it, and thanks for bringing up this discussion Chesnay!
> 
> Best,
> Jincheng
> 
> On Thu, Dec 5, 2019 at 10:19 AM, Jark Wu  wrote:
> +1 for dropping, also cc'ed user mailing list.
> 
> 
> Best,
> Jark
> 
> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf  >
> wrote:
> 
> > Hi Chesnay,
> >
> > +1 for dropping. I have not heard from any user using 0.8 or 0.9 for a long
> > while.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Wed, Dec 4, 2019 at 1:57 PM Chesnay Schepler  > >
> > wrote:
> >
> > > Hello,
> > >
> > > What's everyone's take on dropping the Kafka 0.8/0.9 connectors from the
> > > Flink codebase?
> > >
> > > We haven't touched either of them for the 1.10 release, and it seems
> > > quite unlikely that we will do so in the future.
> > >
> > > We could finally close a number of test stability tickets that have been
> > > lingering for quite a while.
> > >
> > >
> > > Regards,
> > >
> > > Chesnay
> > >
> > >
> >
> > --
> >
> > Konstantin Knauf | Solutions Architect
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica  > >
> >
> >
> > --
> >
> > Join Flink Forward  > > - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> > (Tony) Cheng
> >



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Dian Fu
Thanks for bringing up this discussion Wei. +1 for this proposal!

As these options are proposed in 1.10, it will be great if we can improve them
in 1.10. Then it will not cause compatibility issues.
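
As a quick side note on the "#" separator discussed in the quoted thread
below: it maps to the standard URI fragment, which can be checked with the
Python standard library alone (a minimal sketch; the hdfs URI is a made-up
example):

    # '#' introduces the standard URI fragment, which is why it stays
    # unambiguous even when the file path itself is a URI containing ':'.
    from urllib.parse import urlparse

    parsed = urlparse("hdfs:///tmp/requirements.txt#cached_dir")
    print(parsed.scheme)    # hdfs
    print(parsed.path)      # /tmp/requirements.txt
    print(parsed.fragment)  # cached_dir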

Thanks,
Dian

> On Dec 5, 2019, at 10:01 AM, jincheng sun wrote:
> 
> Hi all,
> 
> Thanks for the quick response Aljoscha & Wei !
> 
> It seems unifying the options is necessary, and 1.10 will be code frozen. I
> would like to bring up the VOTE thread for this change ASAP; more
> details can be discussed in the PR.
> 
> What do you think?
> 
> Best,
> Jincheng
> 
> On Wed, Dec 4, 2019 at 5:11 PM, Aljoscha Krettek wrote:
> 
>> Perfect, thanks for the background info! I also found this section now,
>> which mentions that it comes from Hadoop:
>> https://spark.apache.org/docs/latest/running-on-yarn.html#important-notes.
>> 
>> I think the proposed changes are good!
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Dec 2019, at 04:34, Wei Zhong  wrote:
>>> 
>>> Hi Aljoscha,
>>> 
>>> Thanks for your reply! Before bringing up this discussion I did some
>> research on commonly used separators for options that take multiple values.
>> I have considered ",", ":" and "#". Finally I chose "#" as the separator of
>> "--pyRequirements".
>>> 
>>> For ",", it is the most widely used separator. Many projects use it as
>> the separator of the values in same level. e.g. "-Dexcludes" in Maven,
>> "--files" in Spark and "-pyFiles" in Flink. But the second parameter of
>> "--pyRequirements", the requirement cached directory, is not at the same
>> level as its first parameter (the requirements file). It is secondary and
>> is only needed when the packages in the requirements file can not be
>> downloaded from the package index server.
>>> 
>>> For ":", it is used as a path separator in most cases. e.g. main
>> arguments of scp (secure copy), "--volume" in Docker and "-cp" in Java. But
>> as we support accept a URI as the file path, which contains ":" in most
>> cases, ":" can not be used as the separator of "--pyRequirements".
>>> 
>>> For "#", it is really rarely used as a separator for multiple values. I
>> only find Spark using "#" as the separator for option "--files" and
>> "--archives" between file path and target file/directory name. After some
>> research I find that this usage comes from the URI fragment. We can append
>> a secondary resource as the fragment of the URI after a number sign ("#")
>> character. As we treat user file paths as URIs when parsing command line,
>> using "#" as the separator of "--pyRequirements" makes sense to me, which
>> means the second parameter is the fragment of the first parameter. The
>> definition of URI fragment can be found here [1].
>>> 
>>> The reason of using "#" in "--pyArchives" as the separator of file path
>> and target directory name is the same as above.
>>> 
>>> Best,
>>> Wei
>>> 
>>> [1] https://tools.ietf.org/html/rfc3986#section-3.5
>>> 
 On Dec 3, 2019, at 22:02, Aljoscha Krettek wrote:
 
 Hi,
 
 Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as
>> a separator for options that take two values seems a bit strange to me, did
>> you research if any other CLI tools have this convention?
 
 Side note: I don’t like that our options use camel-case, I think that’s
>> very non-standard. But that’s how it is now…
 
 Best,
 Aljoscha
 
> On 3. Dec 2019, at 10:14, jincheng sun 
>> wrote:
> 
> Thanks for bringing up this discussion Wei!
> I think this is very important for Flink users; we should include these
> changes in Flink 1.10.
> +1  for the optimization from the perspective of user convenience and
>> the
> unified use of Flink command line parameters.
> 
> Best,
> Jincheng
> 
> On Mon, Dec 2, 2019 at 3:26 PM, Wei Zhong wrote:
> 
>> Hi everyone,
>> 
>> I wanted to bring up the discussion of improving the Pyflink command
>> line
>> options.
>> 
>> A few command line options have been introduced in the FLIP-78 [1],
>> i.e.
>> "python-executable-path", "python-requirements","python-archive", etc.
>> There are a few problems with these options, i.e. the naming style,
>> variable argument options, etc.
>> 
>> We want to make some adjustment of FLIP-78 to improve the newly
>> introduced
>> command line options, here is the design doc:
>> 
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> <
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>>> 
>> Looking forward to your feedback!
>> 
>> Best,
>> Wei
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> <
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
>>> 
>> 
>> 
 
>>> 
>> 
>> 



Re: [DISCUSS] Voting from apache.org addresses

2019-12-04 Thread Dian Fu
Hi Dawid,

Thanks for the reply. Counting all the votes from non-apache addresses as
non-binding makes sense. Just as Jark mentioned, we can always remind the
committer/PMC to vote again using the apache address if necessary (i.e. when
the number of binding votes is not enough).

Thanks,
Dian

> On Dec 4, 2019, at 7:27 PM, Kurt Young wrote:
> 
> +1 (from my apache email ;-))
> 
> Best,
> Kurt
> 
> 
> On Wed, Dec 4, 2019 at 7:22 PM Jark Wu  wrote:
> 
>> I'm +1 on this proposal.
>> 
>> Regarding the case that Dian mentioned, we can remind the
>> committer/PMC to vote again using the apache email,
>> and of course the non-apache vote is counted as non-binding.
>> 
>> Best,
>> Jark
>> 
>> On Wed, 4 Dec 2019 at 17:33, Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi Dian,
>>> 
>>> I don't want to be very strict, but I think it should be counted as
>>> non-binding, if it comes from non apache address, yes.
>>> 
>>> Anybody should be able to verify a vote. Moreover I think this the only
>>> way to "encourage" all committers to use their apache addresses ;)
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 04/12/2019 10:26, Dian Fu wrote:
>>>> Thanks for your explanation Dawid! It makes sense to me now. +1.
>>>> 
>>>> Just one minor question: Does this mean that if a committer/PMC
>>> accidentally votes using the non apache email, even if the person who
>>> summarizes the votes clearly KNOWS who he/she is, that vote will still be
>>> counted as non-binding?
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> On Dec 4, 2019, at 5:13 PM, Aljoscha Krettek wrote:
>>>>> 
>>>>> Very sensible! +1
>>>>> 
>>>>>> On 4. Dec 2019, at 10:02, Chesnay Schepler 
>> wrote:
>>>>>> 
>>>>>> I believe this to be a sensible approach by Dawid; +1.
>>>>>> 
>>>>>> On 04/12/2019 09:04, Dawid Wysakowicz wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Sorry I think I was not clear enough on my initial e-mail. Let me
>>> first clarify two things and later on try to rephrase my initial
>> suggestion.
>>>>>>> 
>>>>>>> 1. I do not want to count all votes from @apache.org addresses as
>>> binding
>>>>>>> 2. I do not want to discourage people that do not have @apache.org
>>>>>>>  address from voting
>>>>>>> 3. What I said does not change anything for non-committers/non-PMCs
>>>>>>> 
>>>>>>> What I meant is that if you are a committer/PMC please use an
>>> apache.org address because then the person that summarizes the votes can
>>> check in the apache directory if a person with that address is a
>>> committer/PMC in flink project. Otherwise if a committer uses a different
>>> address there is no way to check if that person is a committer/PMC or
>> not.
>>> It does not mean though that if you vote from apache.org this vote is
>>> automatically binding. It just allows us to check if it is.
>>>>>>> 
>>>>>>> To elaborate on Xuefu's example. It's absolutely fine for you to use
>>> an apache address for voting. I will still check if you are a committer
>> or
>>> not. But take me (or any other committer) for example. If I use my
>>> non-apache address for a vote and the person verifying the vote does not
>>> know me and my address, it is not easy for that person to verify if I am
>> a
>>> committer or not.
>>>>>>> 
>>>>>>> Also it does not mean that other people are not allowed to vote. You
>>> can vote from other addresses, but those votes will be counted as
>>> non-binding. This does not change anything for non-committers/non-PMC.
>>> However if you are a committer and vote from non apache address your vote
>>> will be non-binding, because we cannot verify you are indeed a committer
>>> (we might don't know your other address).
>>>>>>> 
>>>>>>> I agree the additional information (binding, non-binding) in a vote
>>> helps, but it still should be verified. People make mistakes.
>>>>>>> 
>>>>>>> I hope this clears it up a bit.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> D

Re: [DISCUSS] Voting from apache.org addresses

2019-12-04 Thread Dian Fu
Thanks for your explanation Dawid! It makes sense to me now. +1.

Just one minor question: Does this mean that if a committer/PMC accidentally
votes using a non-apache email, even if the person who summarizes the votes
clearly KNOWS who he/she is, that vote will still be counted as non-binding?

Regards,
Dian

> On Dec 4, 2019, at 5:13 PM, Aljoscha Krettek wrote:
> 
> Very sensible! +1
> 
>> On 4. Dec 2019, at 10:02, Chesnay Schepler  wrote:
>> 
>> I believe this to be a sensible approach by Dawid; +1.
>> 
>> On 04/12/2019 09:04, Dawid Wysakowicz wrote:
>>> 
>>> Hi all,
>>> 
>>> Sorry I think I was not clear enough on my initial e-mail. Let me first 
>>> clarify two things and later on try to rephrase my initial suggestion.
>>> 
>>> 1. I do not want to count all votes from @apache.org addresses as binding
>>> 2. I do not want to discourage people that do not have @apache.org
>>>   address from voting
>>> 3. What I said does not change anything for non-committers/non-PMCs
>>> 
>>> What I meant is that if you are a committer/PMC please use an apache.org 
>>> address because then the person that summarizes the votes can check in the 
>>> apache directory if a person with that address is a committer/PMC in flink 
>>> project. Otherwise if a committer uses a different address there is no way 
>>> to check if that person is a committer/PMC or not. It does not mean though 
>>> that if you vote from apache.org this vote is automatically binding. It 
>>> just allows us to check if it is.
>>> 
>>> To elaborate on Xuefu's example. It's absolutely fine for you to use an 
>>> apache address for voting. I will still check if you are a committer or 
>>> not. But take me (or any other committer) for example. If I use my 
>>> non-apache address for a vote and the person verifying the vote does not 
>>> know me and my address, it is not easy for that person to verify if I am a 
>>> committer or not.
>>> 
>>> Also it does not mean that other people are not allowed to vote. You can 
>>> vote from other addresses, but those votes will be counted as non-binding. 
>>> This does not change anything for non-committers/non-PMC. However if you 
>>> are a committer and vote from non apache address your vote will be 
>>> non-binding, because we cannot verify you are indeed a committer (we might 
>>> don't know your other address).
>>> 
>>> I agree the additional information (binding, non-binding) in a vote helps, 
>>> but it still should be verified. People make mistakes.
>>> 
>>> I hope this clears it up a bit.
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 04/12/2019 04:58, Dian Fu wrote:
>>>> Thanks Dawid for start this discussion.
>>>> 
>>>> I have the same feeling with Xuefu and Jingsong. Besides that, according 
>>>> to the bylaws, for some kinds of votes, only the votes from active PMC 
>>>> members are binding, such as product release. So an email address doesn't 
>>>> help here. Even if a vote is from a Flink committer, it is still 
>>>> non-binding.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> On Dec 4, 2019, at 10:37 AM, Jingsong Lee wrote:
>>>>> 
>>>>> Thanks Dawid for driving this discussion.
>>>>> 
>>>>> +1 to Xuefu's viewpoint.
>>>>> I am not a Flink committer, but sometimes I use apache email address to
>>>>> send email.
>>>>> 
>>>>> Another way is that we require the binding ticket to must contain 
>>>>> "binding".
>>>>> Otherwise it must be a "non-binding" ticket.
>>>>> In this way, we can let lazy people continue voting without any suffix 
>>>>> too.
>>>>> 
>>>>> Best,
>>>>> Jingsong Lee
>>>>> 
>>>>> On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:
>>>>> 
>>>>>> Hi Dawid,
>>>>>> 
>>>>>> Thanks for initiating this discussion. I understand the problem you
>>>>>> described, but the solution might not work as having an apache.org email
>>>>>> address doesn't necessary mean it's from a Flink committer. This 
>>>>>> certainly
>>>>>> applies to me.
>>>>>> 
>>>>>> It probably helps for the voters to identify themselves by specifying
>>>>>> either "binding" or "non-binding", though I understand this cannot be
>>>>>> enforced but serves a general guideline.
>>>>>> 
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>> 
>>>>>> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I wanted to reach out primarily to the Flink's committers. I think
>>>>>>> whenever we cast a vote on a proposal, is it a FLIP, release candidate
>>>>>>> or any other proposal, we should use our apache.org email address.
>>>>>>> 
>>>>>>> It is not an easy task to check if a person voting is a committer/PMC if
>>>>>>> we do not work with him/her on a daily basis. This is important for
>>>>>>> verifying if a vote is binding or not.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dawid
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> --
>>>>>> Xuefu Zhang
>>>>>> 
>>>>>> "In Honey We Trust!"
>>>>>> 
>>>>> -- 
>>>>> Best, Jingsong Lee
>> 
>> 
> 



Re: [DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Dian Fu
Thanks Dawid for start this discussion.

I have the same feeling as Xuefu and Jingsong. Besides that, according to the
bylaws, for some kinds of votes, only the votes from active PMC members are
binding, such as a product release. So an email address doesn't help here. Even
if a vote is from a Flink committer, it is still non-binding.

Thanks,
Dian

> On Dec 4, 2019, at 10:37 AM, Jingsong Lee wrote:
> 
> Thanks Dawid for driving this discussion.
> 
> +1 to Xuefu's viewpoint.
> I am not a Flink committer, but sometimes I use an apache email address to
> send email.
> 
> Another way is that we require a binding ticket to contain "binding".
> Otherwise it must be a "non-binding" ticket.
> In this way, we can let lazy people continue voting without any suffix too.
> 
> Best,
> Jingsong Lee
> 
> On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for initiating this discussion. I understand the problem you
>> described, but the solution might not work as having an apache.org email
>> address doesn't necessarily mean it's from a Flink committer. This certainly
>> applies to me.
>> 
>> It probably helps for the voters to identify themselves by specifying
>> either "binding" or "non-binding", though I understand this cannot be
>> enforced but serves a general guideline.
>> 
>> Thanks,
>> Xuefu
>> 
>> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I wanted to reach out primarily to the Flink's committers. I think
>>> whenever we cast a vote on a proposal, is it a FLIP, release candidate
>>> or any other proposal, we should use our apache.org email address.
>>> 
>>> It is not an easy task to check if a person voting is a committer/PMC if
>>> we do not work with him/her on a daily basis. This is important for
>>> verifying if a vote is binding or not.
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> 
>>> 
>> 
>> --
>> Xuefu Zhang
>> 
>> "In Honey We Trust!"
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi Becket,

Thanks for the kind reminder. Definitely agree with you. I have updated the
progress of this vote on the discussion thread[1] and submitted a PR which 
updates the flink website on how to report security issues.

Thanks,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
 
> On Dec 4, 2019, at 7:29 AM, Becket Qin wrote:
> 
> Hi Dian,
> 
> Thanks for driving the effort regardless.
> 
> Even if we don't set up a security@f.a.o ML for Flink, we probably should
> have a clear pointer to the ASF guideline and secur...@apache.org in the
> project website. I think many people are not aware of the
> secur...@apache.org address. If they fail to find information on the
> Flink site, they will simply assume there is no special procedure for
> security problems.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Tue, Dec 3, 2019 at 4:54 PM Dian Fu  wrote:
> 
>> Hi all,
>> 
>> Thanks everyone for participating this vote. As we have received only two
>> +1 and there is also one -1 for this vote, according to the bylaws, I'm
>> sorry to announce that this proposal was rejected.
>> 
>> Nevertheless, I think we can always restart the discussion in the future if
>> we see more evidence that such a mailing list is necessary.
>> 
>> Thanks,
>> Dian
>> 
>> 
>>> On Dec 3, 2019, at 4:53 PM, Dian Fu wrote:
>>> 
>>> Actually I have tried to find out why so many apache projects choose to
>> set up a project specific security mailing list given that the general
>> secur...@apache.org mailing list seems to work well. Unfortunately, there
>> are no open discussions in these projects, and there is also no clear
>> guideline/standard on the ASF site about whether a project should set up
>> such a mailing list (a project specific security mailing list seems to be
>> optional, as we noticed at the beginning of the discussion). This is also
>> one of the main reasons we started this discussion: to see if somebody has
>> more thoughts about it.
>>> 
>>>> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler wrote:
>>>> 
>>>> Would security@f.a.o work as any other private ML?
>>>> 
>>>> Contrary to what Becket said in the discussion thread,
>> secur...@apache.org is not just "another hop"; it provides guiding
>> material, the security team checks for activity and can be pinged easily as
>> they are cc'd in the initial report.
>>>> 
>>>> I vastly prefer this over a separate mailing list; if these benefits
>> don't apply to security@f.a.o I'm -1 on this.
>>>> 
>>>> On 02/12/2019 02:28, Becket Qin wrote:
>>>>> Thanks for driving this, Dian.
>>>>> 
>>>>> +1 from me, for the reasons I mentioned in the discussion thread.
>>>>> 
>>>>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu 
>> wrote:
>>>>> 
>>>>>> NOTE: Only PMC votes are binding.
>>>>>> 
>>>>>> Thanks for sharing your thoughts. I also think that this doesn't fall
>> into
>>>>>> any of the existing categories listed in the bylaws. Maybe we could
>> do some
>>>>>> improvements for the bylaws.
>>>>>> 
>>>>>> This is not codebase change as Robert mentioned and it's related to
>> how to
>>>>>> manage Flink's development in a good way. So, I agree with Robert and
>>>>>> Jincheng that this VOTE should only count PMC votes for now.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
>>>>>>> 
>>>>>>> I also think that we should only count PMC votes.
>>>>>>> 
>>>>>>> This ML is to improve the security mechanism for Flink. Of course we
>>>>>> don't
>>>>>>> expect to use this
>>>>>>> ML often. I hope that it's perfect if this ML is never used.
>> However, the
>>>>>>> Flink community is growing rapidly, it's better to
>>>>>>> make our security mechanism as convenient as possible. But I agree
>> that
>>>>> this ML is not a must to have, it's nice to have.

Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-12-03 Thread Dian Fu
Hi all,

Just to sync the result of the vote for setting up a security@f.a.o mailing
list: it has been rejected [1].

Another very important thing is that everyone agrees there should be a
guideline on how to report security issues on the Flink website. Do you
think we should bring up a separate discussion/vote thread? If so, I will
do that. Personally I think that discussing on the PR is enough. What do
you think?

I have created a PR [2]. I would appreciate it if you could take a look.

Regards,
Dian

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Setup-a-security-flink-apache-org-mailing-list-tt35205.html
[2] https://github.com/apache/flink-web/pull/287

On Thu, Nov 21, 2019 at 3:58 PM Dian Fu  wrote:

> Hi all,
>
> There are no new feedbacks and it seems that we have received enough
> feedback about setup a secur...@flink.apache.org mailing list[1] for
> security report and discussion. It shows that it's optional as we can use
> either secur...@flink.apache.org or secur...@apache.org. So I'd like to
> start the vote for setting up a secur...@flink.apache.org mailing list to make
> the final decision.
>
> Thanks,
> Dian
>
> On Nov 19, 2019, at 6:06 PM, Dian Fu wrote:
>
> Hi all,
>
> Thanks for sharing your thoughts. Appreciated! Let me try to summarize the
> information and thoughts received so far. Please feel free to let me know
> if there is anything wrong or missing.
>
> 1. Setup project specific security mailing list
> Pros:
> - The security reports received by secur...@apache.org will be forwarded
> to the project private(PMC) mailing list. Having a project specific
> security mailing list is helpful in cases when the best person to address
> the security issue is not a PMC member, but a committer. It makes things
> simple as everyone (both PMCs and committers) is at the same table.
> - Even though the security issues are usually rare, they could be
> devastating and thus need to be treated seriously.
> - Most notable apache projects such as apache common, hadoop, spark,
> kafka, hive, etc have a security specific mailing list.
>
> Cons:
> - The ASF security mailing list secur...@apache.org could be used if
> there is no project specific security mailing list.
> - The number of security reports is very low.
>
> Additional information:
> - Security mailing list could only be subscribed by PMCs and committers.
> However everyone could report security issues to the security mailing list.
>
>
> 2. Guide users to report the security issues
> Why:
> - Security vulnerabilities should not be publicly disclosed (e.g. via dev
> ML or JIRA) until the project has responded. We should guide users on how
> to report security issues in Flink website.
>
> How:
> - Option 1: Set up secur...@flink.apache.org and ask users to report
> security issues there
> - Option 2: Ask users to send security report to secur...@apache.org
> - Option 3: Ask users to send security report directly to
> priv...@flink.apache.org
>
>
> 3. Dedicated page to show the security vulnerabilities
> - We may need a dedicated security page to describe the CVE list on the
> Flink website.
>
> I think it makes sense to open separate discussion thread on 2) and 3).
> I'll create separate discussion thread for them. Let's focus on 1) in this
> thread.
>
> If there is no other feedback on 1), I'll bring up a VOTE for this
> discussion.
>
> What do you think?
>
> Thanks,
> Dian
>
> On Fri, Nov 15, 2019 at 10:18 AM Becket Qin  wrote:
>
>> Thanks for bringing this up, Dian.
>>
>> +1 on creating a project specific security mailing list. My two cents, I
>> think it is worth doing in practice.
>>
>> Although the ASF security ML is always available, usually all the emails
>> are simply routed to the individual project PMC. This is an additional
>> hop.
>> And in some cases, the best person to address the reported issue may not
>> be
>> a PMC member, but a committer, so the PMC has to again involve them in
>> the loop. This makes things unnecessarily complicated. Having a project
>> specific security ML would make it much easier to have everyone at the
>> same
>> table.
>>
>> Also, one thing to note is that even though the security issues are
>> usually
>> rare, they could be devastating, thus need to be treated seriously. So I
>> think it is a good idea to establish the handling mechanism regardless of
>> the frequency of the reported security vulnerabilities.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Nov 15, 2019 at 1:14 AM Yu Li  wrote:
>>
>> > Thanks for bringing up this discussion Dian! How to report security
>> bugs to
>>

[jira] [Created] (FLINK-15046) Add guideline on how to report security issues

2019-12-03 Thread Dian Fu (Jira)
Dian Fu created FLINK-15046:
---

 Summary: Add guideline on how to report security issues
 Key: FLINK-15046
 URL: https://issues.apache.org/jira/browse/FLINK-15046
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
 Environment: As discussed in the 
[ML|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951]
 , there should be a guideline on how to report security issues in Flink 
website.
Reporter: Dian Fu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread Dian Fu
+1 to remove them. It seems that we should also drop the class Option as it's 
currently only used in RequiredParameters.

> On Dec 3, 2019, at 8:34 PM, Robert Metzger wrote:
> 
> +1 on removing it.
> 
> On Tue, Dec 3, 2019 at 12:31 PM Stephan Ewen  > wrote:
> I just stumbled across these classes recently and was looking for sample uses.
> No examples and other tests in the code base seem to use RequiredParameters 
> and OptionType.
> 
> They also seem quite redundant with how ParameterTool itself works 
> (tool.getRequired()).
> 
> Should we drop them, in an attempt to reduce unnecessary code and confusion 
> for users (multiple ways to do the same thing)? There are also many better 
> command line parsing libraries out there, this seems like something we don't 
> need to solve in Flink.
> 
> Best,
> Stephan



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-03 Thread Dian Fu
Hi Jingsong,

Thanks for your valuable feedback. I have updated the "Example" section 
describing how to use these options in a Python Table API program.

Thanks,
Dian

> On Dec 2, 2019, at 6:12 PM, Jingsong Lee wrote:
> 
> Hi Dian:
> 
> Thanks for your explanation.
> If you can update the document to add explanation for the changes to the
> table layer,
> it might be better. (it's just a suggestion, it depends on you)
> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
> Will this queue take up a lot of memory?
> Can it also occupy memory as large as buffer.memory?
> If so, what we're dealing with now is the silent use of heap memory?
> I feel a little strange, because the memory on the python side will reserve,
> but the memory on the JVM side is used silently.
> 
> After carefully seeing your comments on Google doc:
>> The memory used by the Java operator is currently accounted as the task
> on-heap memory. We can revisit this if we find it's a problem in the future.
> I agree that we can ignore it now, But we can add some content to the
> document to remind the user, What do you think?
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks a lot for your comments. Please see my reply inlined below.
>> 
>>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee wrote:
>>> 
>>> Hi Dian:
>>> 
>>> 
>>> Thanks for your driving. I have some questions:
>>> 
>>> 
>>> - Where should these configurations belong? You have mentioned
>> tableApi/SQL,
>>> so should in TableConfig?
>> 
>> All Python related configurations are defined in PythonOptions. User could
>> configure these configurations via TableConfig.getConfiguration.setXXX for
>> Python Table API programs.
>> 
>>> 
>>> - If just in table/sql, whether it should be called: table.python.,
>>> because in table, all config options are called table.***.
>> 
>> These configurations are not table specific. They will be used for both
>> Python Table API programs and Python DataStream API programs (which is
>> planned to be supported in the future). So python.xxx seems more
>> appropriate, what do you think?
>> 
>>> - What should table module do? So in CommonPythonCalc, we should read
>>> options from table config, and set resources to OneInputTransformation?
>> 
>> As described in the design doc, in compilation phase, for batch jobs, the
>> required memory of the Python worker will be calculated according to the
>> configuration and set as the managed memory for the operator. For stream
>> jobs, the resource spec will be unknown(The reason is that currently the
>> resources for all the operators in stream jobs are unknown and it doesn’t
>> support to configure both known and unknown resources in a single job).
>> 
>>> - Are all buffer.memory off-heap memory? I took a look
>>> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue,
>> is
>>> this one a heap queue? So we need heap memory too?
>> 
>> Yes, they are all off-heap memory which is supposed to be used by the
>> Python process. The forwardedInputQueue is a buffer used in the Java
>> operator and its memory is accounted as the on-heap memory.
>> 
>> Regards,
>> Dian
>> 
>>> 
>>> Hope to get your reply.
>>> 
>>> 
>>> Best,
>>> 
>>> Jingsong Lee
>>> 
>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu  wrote:
>>> 
>>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>>>> offline and also on the design doc.
>>>> 
>>>> It seems that we have reached consensus on the design. I would bring up
>>>> the VOTE if there is no other feedbacks.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng wrote:
>>>>> 
>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>> It is great to make sure that the resources used by the Python process
>>>> are
>>>>> managed properly by Flink’s resource management framework.
>>>>> 
>>>>> Also, thanks to the guys that working on the unified memory management
>>>>> framework.
>>>>> 
>>>>> Best, Hequn
>>>>> 
>>>>> 
>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
>>>>> 
>>>> Thanks for driving this discussion, Dian!

Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi all,

Thanks everyone for participating in this vote. As we have received only two +1
and there is also one -1 for this vote, according to the bylaws, I'm sorry to 
announce that this proposal was rejected. 

Nevertheless, I think we can always restart the discussion in the future if we
see more evidence that such a mailing list is necessary.

Thanks,
Dian


> On Dec 3, 2019, at 4:53 PM, Dian Fu wrote:
> 
> Actually I have tried to find out why so many apache projects choose to set
> up a project specific security mailing list given that the general
> secur...@apache.org mailing list seems to work well. Unfortunately, there are
> no open discussions in these projects, and there is also no clear
> guideline/standard on the ASF site about whether a project should set up such
> a mailing list (a project specific security mailing list seems to be
> optional, as we noticed at the beginning of the discussion). This is also one
> of the main reasons we started this discussion: to see if somebody has more
> thoughts about it.
> 
>> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler wrote:
>> 
>> Would security@f.a.o work as any other private ML?
>> 
>> Contrary to what Becket said in the discussion thread, secur...@apache.org 
>> is not just "another hop"; it provides guiding material, the security team 
>> checks for activity and can be pinged easily as they are cc'd in the initial 
>> report.
>> 
>> I vastly prefer this over a separate mailing list; if these benefits don't 
>> apply to security@f.a.o I'm -1 on this.
>> 
>> On 02/12/2019 02:28, Becket Qin wrote:
>>> Thanks for driving this, Dian.
>>> 
>>> +1 from me, for the reasons I mentioned in the discussion thread.
>>> 
>>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>>> 
>>>> NOTE: Only PMC votes are binding.
>>>> 
>>>> Thanks for sharing your thoughts. I also think that this doesn't fall into
>>>> any of the existing categories listed in the bylaws. Maybe we could do some
>>>> improvements for the bylaws.
>>>> 
>>>> This is not codebase change as Robert mentioned and it's related to how to
>>>> manage Flink's development in a good way. So, I agree with Robert and
>>>> Jincheng that this VOTE should only count PMC votes for now.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
>>>>> 
>>>>> I also think that we should only count PMC votes.
>>>>> 
>>>>> This ML is to improve the security mechanism for Flink. Of course we
>>>> don't
>>>>> expect to use this
>>>>> ML often. I hope that it's perfect if this ML is never used. However, the
>>>>> Flink community is growing rapidly, it's better to
>>>>> make our security mechanism as convenient as possible. But I agree that
>>>>> this ML is not a must to have, it's nice to have.
>>>>> 
>>>>> So, I give the vote as +1(binding).
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> On Mon, Nov 25, 2019 at 9:45 PM, Robert Metzger wrote:
>>>>> 
>>>>>> I agree that we are only counting PMC votes (because this decision goes
>>>>>> beyond the codebase)
>>>>>> 
>>>>>> I'm undecided what to vote :) I'm not against setting up a new mailing
>>>>>> list, but I also don't think the benefit (having a private list with
>>>> PMC +
>>>>>> committers) is enough to justify the work involved. As far as I
>>>> remember,
>>>>>> we have received 2 security issue notices, both basically about the same
>>>>>> issue.  I'll leave it to other PMC members to support this if they want
>>>> to
>>>>>> ...
>>>>>> 
>>>>>> 
>>>>>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>>>> dwysakow...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> What is the voting scheme for it? I am not sure if it falls into any of
>>>>>>> the categories we have listed in our bylaws. Are committers votes
>>>>>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
>>>> a
>>>>>>> binding vote or just an informational vote?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dawid
>>>>>>> 
>>>>>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>>>>>> +1
>>>>>>>> 
>>>>>>> On Thu, Nov 21, 2019 at 4:11 PM, Dian Fu wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> According to our previous discussion in [1], I'd like to bring up a
>>>>>> vote
>>>>>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>>>>>> 
>>>>>>>>> The vote will be open for at least 72 hours (excluding weekend). I'll
>>>>>>> try
>>>>>>>>> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>>>>>> not
>>>>>>>>> enough votes.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Dian
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>>>>>> 
>>>> 
>> 
> 



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Actually I have tried to find out why so many apache projects choose to set up
a project specific security mailing list given that the general
secur...@apache.org mailing list seems to work well. Unfortunately, there are
no open discussions in these projects, and there is also no clear
guideline/standard on the ASF site about whether a project should set up such a
mailing list (a project specific security mailing list seems to be optional, as
we noticed at the beginning of the discussion). This is also one of the main
reasons we started this discussion: to see if somebody has more thoughts about
it.

> On Dec 2, 2019, at 6:03 PM, Chesnay Schepler wrote:
> 
> Would security@f.a.o work as any other private ML?
> 
> Contrary to what Becket said in the discussion thread, secur...@apache.org is 
> not just "another hop"; it provides guiding material, the security team 
> checks for activity and can be pinged easily as they are cc'd in the initial 
> report.
> 
> I vastly prefer this over a separate mailing list; if these benefits don't 
> apply to security@f.a.o I'm -1 on this.
> 
> On 02/12/2019 02:28, Becket Qin wrote:
>> Thanks for driving this, Dian.
>> 
>> +1 from me, for the reasons I mentioned in the discussion thread.
>> 
>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>> 
>>> NOTE: Only PMC votes are binding.
>>> 
>>> Thanks for sharing your thoughts. I also think that this doesn't fall into
>>> any of the existing categories listed in the bylaws. Maybe we could do some
>>> improvements for the bylaws.
>>> 
>>> This is not codebase change as Robert mentioned and it's related to how to
>>> manage Flink's development in a good way. So, I agree with Robert and
>>> Jincheng that this VOTE should only count PMC votes for now.
>>> 
>>> Thanks,
>>> Dian
>>> 
>>>> On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
>>>> 
>>>> I also think that we should only count PMC votes.
>>>> 
>>>> This ML is to improve the security mechanism for Flink. Of course we
>>> don't
>>>> expect to use this
>>>> ML often. I hope that it's perfect if this ML is never used. However, the
>>>> Flink community is growing rapidly, it's better to
>>>> make our security mechanism as convenient as possible. But I agree that
>>>> this ML is not a must to have, it's nice to have.
>>>> 
>>>> So, I give the vote as +1(binding).
>>>> 
>>>> Best,
>>>> Jincheng
>>>> 
>>>> On Mon, Nov 25, 2019 at 9:45 PM, Robert Metzger wrote:
>>>> 
>>>>> I agree that we are only counting PMC votes (because this decision goes
>>>>> beyond the codebase)
>>>>> 
>>>>> I'm undecided what to vote :) I'm not against setting up a new mailing
>>>>> list, but I also don't think the benefit (having a private list with
>>> PMC +
>>>>> committers) is enough to justify the work involved. As far as I
>>> remember,
>>>>> we have received 2 security issue notices, both basically about the same
>>>>> issue.  I'll leave it to other PMC members to support this if they want
>>> to
>>>>> ...
>>>>> 
>>>>> 
>>>>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>>> dwysakow...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> What is the voting scheme for it? I am not sure if it falls into any of
>>>>>> the categories we have listed in our bylaws. Are committers votes
>>>>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
>>> a
>>>>>> binding vote or just an informational vote?
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Dawid
>>>>>> 
>>>>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>>>>> +1
>>>>>>> 
>>>>>>> On Thu, Nov 21, 2019 at 4:11 PM, Dian Fu wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> According to our previous discussion in [1], I'd like to bring up a
>>>>> vote
>>>>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>>>>> 
>>>>>>>> The vote will be open for at least 72 hours (excluding weekend). I'll
>>>>>> try
>>>>>>>> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>>>>> not
>>>>>>>> enough votes.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Dian
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>>>>> 
>>> 
> 



Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-02 Thread Dian Fu
Hi Jingsong,

It's fine. :) Thanks for the comments!

I have replied to you in the discussion thread, as I also think it's better to
discuss these points there.

Thanks,
Dian

> On Dec 2, 2019, at 3:47 PM, Jingsong Li wrote:
> 
> Sorry for bothering your voting.
> Let's discuss in discussion thread.
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee  wrote:
> 
>> Hi Dian:
>> 
>> Thanks for your driving. I have some questions:
>> 
>> - Where should these configurations belong? You have mentioned
>> tableApi/SQL, so should in TableConfig?
>> - If just in table/sql, whether it should be called: table.python.,
>> because in table, all config options are called table.***.
>> - What should table module do? So in CommonPythonCalc, we should read
>> options from table config, and set resources to OneInputTransformation?
>> - Are all buffer.memory off-heap memory? I took a look
>> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
>> this one a heap queue? So we need heap memory too?
>> 
>> Hope to get your reply.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu  wrote:
>> 
>>> Hi all,
>>> 
>>> I'd like to start the vote of FLIP-88 [1] since that we have reached an
>>> agreement on the design in the discussion thread [2].
>>> 
>>> This vote will be open for at least 72 hours. Unless there is an
>>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we have
>>> received sufficient votes.
>>> 
>>> Regards,
>>> Dian
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-02 Thread Dian Fu
Hi Jingsong,

Thanks a lot for your comments. Please see my reply inlined below.

> On Dec 2, 2019, at 3:47 PM, Jingsong Lee wrote:
> 
> Hi Dian:
> 
> 
> Thanks for your driving. I have some questions:
> 
> 
> - Where should these configurations belong? You have mentioned tableApi/SQL,
> so should in TableConfig?

All Python related configurations are defined in PythonOptions. Users can
configure them via TableConfig.getConfiguration().setXXX() in Python Table API
programs.
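
For example, a minimal sketch (the option key and the configuration accessor
below follow the FLIP-88 design doc and should be treated as assumptions until
the implementation is merged):

    # Sketch: setting a Python UDF option through TableConfig.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    # The option key is an assumption based on the FLIP-88 proposal.
    t_env.get_config().get_configuration().set_string(
        "python.fn-execution.buffer.memory.size", "64mb")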

> 
> - If just in table/sql, whether it should be called: table.python.,
> because in table, all config options are called table.***.

These configurations are not table specific. They will be used for both Python 
Table API programs and Python DataStream API programs (which is planned to be 
supported in the future). So python.xxx seems more appropriate, what do you 
think?

> - What should table module do? So in CommonPythonCalc, we should read
> options from table config, and set resources to OneInputTransformation?

As described in the design doc, in the compilation phase, for batch jobs, the
required memory of the Python worker will be calculated according to the
configuration and set as the managed memory for the operator. For stream jobs,
the resource spec will be unknown (the reason is that currently the resources
for all the operators in stream jobs are unknown, and it is not supported to
configure both known and unknown resources in a single job).

> - Are all buffer.memory off-heap memory? I took a look
> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
> this one a heap queue? So we need heap memory too?

Yes, they are all off-heap memory which is supposed to be used by the Python 
process. The forwardedInputQueue is a buffer used in the Java operator and its 
memory is accounted as the on-heap memory.

Regards,
Dian

> 
> Hope to get your reply.
> 
> 
> Best,
> 
> Jingsong Lee
> 
> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu  wrote:
> 
>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>> offline and also on the design doc.
>> 
>> It seems that we have reached consensus on the design. I would bring up
>> the VOTE if there is no other feedbacks.
>> 
>> Thanks,
>> Dian
>> 
>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng wrote:
>>> 
>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>> It is great to make sure that the resources used by the Python process
>> are
>>> managed properly by Flink’s resource management framework.
>>> 
>>> Also, thanks to the guys that working on the unified memory management
>>> framework.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
>>> 
>>>> Thanks for driving this discussion, Dian!
>>>> 
>>>> +1 for this proposal. It will help to reduce container failure due to
>>>> the memory overuse.
>>>> Some comments left in the design doc.
>>>> 
>>>> Best,
>>>> Yangze Guo
>>>> 
>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song 
>>>> wrote:
>>>>> 
>>>>> Sorry for the late reply.
>>>>> 
>>>>> +1 for the general proposal.
>>>>> 
>>>>> And one reminder: to use the UNKNOWN resource requirement, we need to make
>>>>> sure the optimizer knows which operators use off-heap managed memory, and
>>>>> compute and set a fraction for the operators. See FLIP-53[1] for more
>>>>> details, and I would suggest you double-check with @Zhu Zhu who works
>>>>> on this part.
>>>>> 
>>>>> Thank you~
>>>>> 
>>>>> Xintong Song
>>>>> 
>>>>> 
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>> 
>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu wrote:
>>>>> 
>>>>>> Hi Jincheng,
>>>>>> 
>>>>>> Thanks for the reply and also looking forward to the feedback from the
>>>>>> community.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> +1, Thanks for bringing up this discussion, Dian!

[VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-01 Thread Dian Fu
Hi all,

I'd like to start the vote on FLIP-88 [1] since we have reached 
agreement on the design in the discussion thread [2].

This vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Dec 5, 2019 08:00 UTC if we have received sufficient 
votes.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html

Re: [VOTE] Release 1.8.3, release candidate #1

2019-11-29 Thread Dian Fu
Hi Hequn,

Thanks a lot for the quick update.

+1 (non-binding)

- Verified signatures and checksums
- Built from source (without tests) with Scala 2.11 and Scala 2.12
- Started a local cluster; the WebUI is accessible
- Submitted the wordcount example for both batch and streaming; there is no 
suspicious log output
- Ran a couple of tests in the IDE
- Verified that all POM files point to the right version
- The release note and announcement PR look good
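
(As a side note, the checksum half of such a verification is easy to script.
The sketch below assumes the common layout where the .sha512 file starts with
the hex digest; the file name is illustrative, and GPG signature checking is a
separate step.)

{code:python}
# Minimal sketch: verify a release tarball against its .sha512 file.
# File name is illustrative; signature verification is done with GPG.
import hashlib

def sha512_of(path, chunk_size=8192):
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

release = "flink-1.8.3-src.tgz"  # hypothetical file name
expected = open(release + ".sha512").read().split()[0]
assert sha512_of(release) == expected, "checksum mismatch"
print("checksum OK")
{code}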

Regards,
Dian

> On Nov 29, 2019, at 4:20 PM, Hequn Cheng wrote:
> 
> Hi Dian,
> 
> Thanks a lot for the review and valuable feedback! I have addressed the
> JIRA problems and updated the website PR.
> It would be great if you can take another look.
> 
> Best,
> Hequn
> 
> On Fri, Nov 29, 2019 at 2:21 PM Dian Fu  wrote:
> 
>> Hi Hequn,
>> 
>> Thanks a lot for the great work!
>> 
>> I found the following minor issues:
>> 1. The announcement PR for the website has one minor issue. (I have left
>> comments on the PR.)
>> 2. The following JIRAs are included in 1.8.3, but their fix versions were not
>> updated and so are not reflected in the release note:
>>   https://issues.apache.org/jira/browse/FLINK-14235 (
>> https://github.com/apache/flink/tree/e0387a8007707ab29795e3aa3794ad279eaaeaf9
>> )
>>   https://issues.apache.org/jira/browse/FLINK-14370 (
>> https://github.com/apache/flink/tree/cf7509b91888e4c6b64eb514fbb62af49533e0f0
>> )
>> 3. The following JIRA is not included in 1.8.3, but the fix version is
>> marked as fixed in 1.8.3 and so was included in the release note:
>>   https://issues.apache.org/jira/browse/FLINK-10377
>> 
>> Regards,
>> Dian
>> 
>>> On Nov 28, 2019, at 10:22 PM, Hequn Cheng wrote:
>>> 
>>> Hi everyone,
>>> 
>>> Please review and vote on the release candidate #1 for the version 1.8.3,
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release and binary convenience releases to
>> be
>>> deployed to dist.apache.org [2], which are signed with the key with
>>> fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "release-1.8.3-rc1" [5],
>>> * website pull request listing the new release and adding announcement
>> blog
>>> post [6].
>>> 
>>> The vote will be open for at least 72 hours.
>>> Please cast your votes before *Dec. 3rd 2019, 16:00 UTC*.
>>> 
>>> It is adopted by majority approval, with at least 3 PMC affirmative
>> votes.
>>> 
>>> Thanks,
>>> Hequn
>>> 
>>> [1]
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346112
>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc1/
>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> [4]
>> https://repository.apache.org/content/repositories/orgapacheflink-1310
>>> [5]
>>> 
>> https://github.com/apache/flink/commit/cf92a34e3465d2898eecd2aa7c06d6dc17f60ba3
>>> [6] https://github.com/apache/flink-web/pull/285
>> 
>> 



Re: [VOTE] Release 1.8.3, release candidate #1

2019-11-28 Thread Dian Fu
Hi Hequn,

Thanks a lot for the great work!

I found the following minor issues:
1. The announcement PR for the website has one minor issue. (I have left 
comments on the PR.)
2. The following JIRAs are included in 1.8.3, but their fix versions were not 
updated and so are not reflected in the release note:
   https://issues.apache.org/jira/browse/FLINK-14235 
(https://github.com/apache/flink/tree/e0387a8007707ab29795e3aa3794ad279eaaeaf9)
   https://issues.apache.org/jira/browse/FLINK-14370 
(https://github.com/apache/flink/tree/cf7509b91888e4c6b64eb514fbb62af49533e0f0)
3. The following JIRA is not included in 1.8.3, but the fix version is marked 
as fixed in 1.8.3 and so was included in the release note:
   https://issues.apache.org/jira/browse/FLINK-10377

Regards,
Dian

> On Nov 28, 2019, at 10:22 PM, Hequn Cheng wrote:
> 
> Hi everyone,
> 
> Please review and vote on the release candidate #1 for the version 1.8.3,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.3-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
> 
> The vote will be open for at least 72 hours.
> Please cast your votes before *Dec. 3rd 2019, 16:00 UTC*.
> 
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Hequn
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346112
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1310
> [5]
> https://github.com/apache/flink/commit/cf92a34e3465d2898eecd2aa7c06d6dc17f60ba3
> [6] https://github.com/apache/flink-web/pull/285



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-11-25 Thread Dian Fu
Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu offline and 
also on the design doc.

It seems that we have reached consensus on the design. I would bring up the 
VOTE if there is no other feedbacks.

Thanks,
Dian

> On Nov 22, 2019, at 2:51 PM, Hequn Cheng wrote:
> 
> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> It is great to make sure that the resources used by the Python process are
> managed properly by Flink’s resource management framework.
> 
> Also, thanks to the guys who are working on the unified memory management
> framework.
> 
> Best, Hequn
> 
> 
> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
> 
>> Thanks for driving this discussion, Dian!
>> 
>> +1 for this proposal. It will help to reduce container failure due to
>> the memory overuse.
>> Some comments left in the design doc.
>> 
>> Best,
>> Yangze Guo
>> 
>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song 
>> wrote:
>>> 
>>> Sorry for the late reply.
>>> 
>>> +1 for the general proposal.
>>> 
>>> And one reminder: to use the UNKNOWN resource requirement, we need to make
>>> sure the optimizer knows which operators use off-heap managed memory, and
>>> compute and set a fraction for the operators. See FLIP-53[1] for more
>>> details, and I would suggest you double-check with @Zhu Zhu who works
>>> on this part.
>>> 
>>> Thank you~
>>> 
>>> Xintong Song
>>> 
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>> 
>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu  wrote:
>>> 
>>>> Hi Jincheng,
>>>> 
>>>> Thanks for the reply and also looking forward to the feedback from the
>>>> community.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> +1, Thanks for bringing up this discussion, Dian!
>>>>> 
>>>>> Resource management is very important for PyFlink UDFs. So it's great
>>>>> if anyone can add more comments or input in the design doc or feedback
>>>>> on the ML. :)
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> On Tue, Nov 5, 2019 at 11:32 AM Dian Fu wrote:
>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> FLIP-58 [1] will add support for Python user-defined stateless
>>>>>> functions in the Python Table API. It will launch a separate Python
>>>>>> process for Python user-defined function execution. The resources used
>>>>>> by the Python process should be managed properly by Flink’s resource
>>>>>> management framework. FLIP-49 [2] has proposed a unified memory
>>>>>> management framework, and PyFlink user-defined function resource
>>>>>> management should be based on it. Jincheng, Hequn, Xintong, GuoWei and
>>>>>> I discussed this offline. I drafted a design doc [3] and want to start
>>>>>> a discussion about PyFlink user-defined function resource management.
>>>>>> 
>>>>>> Any comments on the design doc or direct feedback on the ML are
>>>>>> welcome.
>>>>>> 
>>>>>> Regards,
>>>>>> Dian
>>>>>> 
>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>> [2]
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>> [3]
>>>>>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>> 
>>>> 
>> 



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-11-25 Thread Dian Fu
NOTE: Only PMC votes are binding.

Thanks for sharing your thoughts. I also think that this doesn't fall into any 
of the existing categories listed in the bylaws. Maybe we could improve the 
bylaws in this regard.

This is not a codebase change, as Robert mentioned, and it relates to how to 
manage Flink's development well. So I agree with Robert and Jincheng that this 
VOTE should only count PMC votes for now.

Thanks,
Dian

> On Nov 26, 2019, at 11:43 AM, jincheng sun wrote:
> 
> I also think that we should only count PMC votes.
> 
> This ML is to improve the security mechanism for Flink. Of course we don't
> expect to use this ML often; ideally it would never be used. However, the
> Flink community is growing rapidly, so it's better to make our security
> mechanism as convenient as possible. But I agree that this ML is not a
> must-have; it's nice to have.
> 
> So, I give the vote as +1(binding).
> 
> Best,
> Jincheng
> 
> On Mon, Nov 25, 2019 at 9:45 PM Robert Metzger wrote:
> 
>> I agree that we are only counting PMC votes (because this decision goes
>> beyond the codebase)
>> 
>> I'm undecided what to vote :) I'm not against setting up a new mailing
>> list, but I also don't think the benefit (having a private list with PMC +
>> committers) is enough to justify the work involved. As far as I remember,
>> we have received 2 security issue notices, both basically about the same
>> issue.  I'll leave it to other PMC members to support this if they want to
>> ...
>> 
>> 
>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> What is the voting scheme for it? I am not sure if it falls into any of
>>> the categories we have listed in our bylaws. Are committers votes
>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this a
>>> binding vote or just an informational vote?
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>> +1
>>>> 
>>>> On Thu, Nov 21, 2019 at 4:11 PM Dian Fu wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> According to our previous discussion in [1], I'd like to bring up a vote
>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>> 
>>>>> The vote will be open for at least 72 hours (excluding the weekend). I'll
>>>>> try to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>>>>> not enough votes.
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>> [1]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>> 
>>> 
>> 



[jira] [Created] (FLINK-14944) Unstable test FlinkFnExecutionSyncTests.test_flink_fn_execution_pb2_synced

2019-11-25 Thread Dian Fu (Jira)
Dian Fu created FLINK-14944:
---

 Summary: Unstable test 
FlinkFnExecutionSyncTests.test_flink_fn_execution_pb2_synced
 Key: FLINK-14944
 URL: https://issues.apache.org/jira/browse/FLINK-14944
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


This test fails occasionally:
{code:java}
force = 'True', output_dir = '/tmp/tmpkh6rmeig'

def generate_proto_files(force=True, output_dir=DEFAULT_PYTHON_OUTPUT_PATH):
try:
import grpc_tools  # noqa  # pylint: disable=unused-import
except ImportError:
warnings.warn('Installing grpcio-tools is recommended for 
development.')

proto_dirs = [os.path.join(PYFLINK_ROOT_PATH, path) for path in 
PROTO_PATHS]
proto_files = sum(
[glob.glob(os.path.join(d, '*.proto')) for d in proto_dirs], [])
out_dir = os.path.join(PYFLINK_ROOT_PATH, output_dir)
out_files = [path for path in glob.glob(os.path.join(out_dir, 
'*_pb2.py'))]

if out_files and not proto_files and not force:
# We have out_files but no protos; assume they're up to date.
# This is actually the common case (e.g. installation from an 
sdist).
logging.info('No proto files; using existing generated files.')
return

elif not out_files and not proto_files:
raise RuntimeError(
'No proto files found in %s.' % proto_dirs)

# Regenerate iff the proto files or this file are newer.
elif force or not out_files or len(out_files) < len(proto_files) or (
min(os.path.getmtime(path) for path in out_files)
<= max(os.path.getmtime(path)
   for path in proto_files + [os.path.realpath(__file__)])):
try:
>   from grpc_tools import protoc
E   ModuleNotFoundError: No module named 'grpc_tools'

pyflink/gen_protos.py:70: ModuleNotFoundError

During handling of the above exception, another exception occurred:

self = 


def test_flink_fn_execution_pb2_synced(self):
>   generate_proto_files('True', self.tempdir)

pyflink/fn_execution/tests/test_flink_fn_execution_pb2_synced.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyflink/gen_protos.py:83: in generate_proto_files
target=_install_grpcio_tools_and_generate_proto_files(force, output_dir))
pyflink/gen_protos.py:131: in _install_grpcio_tools_and_generate_proto_files
'--upgrade', GRPC_TOOLS, "-I"])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

popenargs = 
(['/home/travis/build/apache/flink/flink-python/.tox/py36/bin/python', '-m', 
'pip', 'install', '--prefix', 
'/home/travis/build/apache/flink/flink-python/pyflink/../.eggs/grpcio-wheels', 
...],)
kwargs = {}, retcode = 2
cmd = ['/home/travis/build/apache/flink/flink-python/.tox/py36/bin/python', 
'-m', 'pip', 'install', '--prefix', 
'/home/travis/build/apache/flink/flink-python/pyflink/../.eggs/grpcio-wheels', 
...]

def check_call(*popenargs, **kwargs):
"""Run command with arguments.  Wait for command to complete.  If
the exit code was zero then return, otherwise raise
CalledProcessError.  The CalledProcessError object will have the
return code in the returncode attribute.

The arguments are the same as for the call function.  Example:

check_call(["ls", "-l"])
"""
retcode = call(*popenargs, **kwargs)
if retcode:
cmd = kwargs.get("args")
if cmd is None:
cmd = popenargs[0]
>   raise CalledProcessError(retcode, cmd)
E   subprocess.CalledProcessError: Command 
'['/home/travis/build/apache/flink/flink-python/.tox/py36/bin/python', '-m', 
'pip', 'install', '--prefix', 
'/home/travis/build/apache/flink/flink-python/pyflink/../.eggs/grpcio-wheels', 
'--build', 
'/home/travis/build/apache/flink/flink-python/pyflink/../.eggs/grpcio-wheels-build',
 '--upgrade', 'grpcio-tools>=1.3.5,<=1.14.2', '-I']' returned non-zero exit 
status 2.

dev/.conda/envs/3.6/lib/python3.6/subprocess.py:311: CalledProcessError
{code}

instance: [https://api.travis-ci.org/v3/job/616685590/log.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink-shaded 9.0 released

2019-11-24 Thread Dian Fu
Thanks Chesnay for the great work and thanks to everyone who has contributed to 
this release.

Regards,
Dian

> On Nov 25, 2019, at 10:22 AM, Zhu Zhu wrote:
> 
> Thanks a lot to Chesnay for the great work to release Flink-shaded 9.0!
> And thanks to all the contributors for their efforts to make this release
> possible!
> 
> Thanks,
> Zhu Zhu
> 
> On Mon, Nov 25, 2019 at 9:51 AM Hequn Cheng wrote:
> 
>> Thank you Chesnay for the great work!
>> Also thanks a lot to the people who made this release possible!
>> 
>> Best, Hequn
>> 
>> On Mon, Nov 25, 2019 at 12:53 AM Chesnay Schepler 
>> wrote:
>> 
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink-shaded 9.0.
>>> 
>>> The flink-shaded project contains a number of shaded dependencies for
>>> Apache Flink.
>>> 
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming applications.
>>> 
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>> 
>>> The full release notes are available in Jira:
>>> 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346089
>>> 
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> 
>>> Regards,
>>> Chesnay
>>> 
>>> 
>> 



Re: [DISCUSS] Remove old WebUI

2019-11-24 Thread Dian Fu
+1 to drop the old UI.

> On Nov 25, 2019, at 10:59 AM, Zhenghua Gao wrote:
> 
> +1 to drop the old one.
> 
> *Best Regards,*
> *Zhenghua Gao*
> 
> 
> On Thu, Nov 21, 2019 at 8:05 PM Chesnay Schepler  wrote:
> 
>> Hello everyone,
>> 
>> Flink 1.9 shipped with a new UI, with the old one being kept around as a
>> backup in case something wasn't working as expected.
>> 
>> Currently there are no issues indicating any significant problems
>> (exclusive to the new UI), so I wanted to check what people think about
>> dropping the old UI for 1.10.
>> 
>> 



[jira] [Created] (FLINK-14891) PythonScalarFunctionOperator should be chained with upstream operators by default

2019-11-21 Thread Dian Fu (Jira)
Dian Fu created FLINK-14891:
---

 Summary: PythonScalarFunctionOperator should be chained with 
upstream operators by default
 Key: FLINK-14891
 URL: https://issues.apache.org/jira/browse/FLINK-14891
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


Currently the chaining strategy for PythonScalarFunctionOperator is not set 
explicitly, so it falls back to HEAD. We should set the default value to ALWAYS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14888) Move the Python SqlDialect to module table

2019-11-21 Thread Dian Fu (Jira)
Dian Fu created FLINK-14888:
---

 Summary: Move the Python SqlDialect to module table
 Key: FLINK-14888
 URL: https://issues.apache.org/jira/browse/FLINK-14888
 Project: Flink
  Issue Type: Task
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


Currently SqlDialect is located in pyflink.common, which is not correct, as it 
actually belongs to the table module, so we should move it to 
pyflink.table.
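
For illustration, the import after the move would look as follows (a sketch; 
the enum member name mirrors the Java SqlDialect and is an assumption here):

{code:python}
# Sketch of the intended import location after the move.
# Before (incorrect location):
#     from pyflink.common import SqlDialect
# After the move:
from pyflink.table import SqlDialect

# Member name mirrors the Java SqlDialect enum (assumption).
print(SqlDialect.DEFAULT)
{code}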



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release flink-shaded 9.0, release candidate #1

2019-11-21 Thread Dian Fu
+1 (non-binding)

- verified the signature and checksum
- checked the Maven Central artifacts

Regards,
Dian



On Wed, Nov 20, 2019 at 8:47 PM tison  wrote:

> +1 (non-binding)
>
> Best,
> tison.
>
>
> On Wed, Nov 20, 2019 at 6:58 PM Aljoscha Krettek wrote:
>
> > +1 (binding)
> >
> > Best,
> > Aljoscha
> >
> > > On 19. Nov 2019, at 23:13, Chesnay Schepler wrote:
> > >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the version 9.0,
> > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with fingerprint 11D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-9.0-rc1" [5],
> > > * website pull request listing the new release [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1] https://issues.apache.org/jira/projects/FLINK/versions/12346089
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-9.0-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1275
> > > [5] https://github.com/apache/flink-shaded/tree/release-9.0-rc1
> > > [6] https://github.com/apache/flink-web/pull/283
> > >
> >
> >
>


[VOTE] Setup a secur...@flink.apache.org mailing list

2019-11-21 Thread Dian Fu
Hi all,

According to our previous discussion in [1], I'd like to bring up a vote to set 
up a secur...@flink.apache.org mailing list.

The vote will be open for at least 72 hours (excluding weekend). I'll try to 
close it by 2019-11-26 18:00 UTC, unless there is an objection or not enough 
votes.

Regards,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951

Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-11-20 Thread Dian Fu
Hi all,

There is no new feedback and it seems that we have received enough input about 
setting up a secur...@flink.apache.org mailing list [1] for security reports 
and discussion. It shows that such a list is optional, as we could use either 
secur...@flink.apache.org or secur...@apache.org. So I'd like to start the vote 
on setting up a secur...@flink.apache.org mailing list to make the final 
decision.

Thanks,
Dian

> On Nov 19, 2019, at 6:06 PM, Dian Fu wrote:
> 
> Hi all,
> 
> Thanks for sharing your thoughts. Appreciated! Let me try to summarize the 
> information and thoughts received so far. Please feel free to let me know if 
> there is anything wrong or missing.
> 
> 1. Setup project specific security mailing list
> Pros:
> - The security reports received by secur...@apache.org will be forwarded to 
> the project private (PMC) mailing list. Having a project-specific security 
> mailing list is helpful in cases where the best person to address the 
> security issue is not a PMC member but a committer. It makes things simple, 
> as everyone (both PMC members and committers) is at the same table.
> - Even though the security issues are usually rare, they could be devastating 
> and thus need to be treated seriously.
> - Most notable apache projects such as apache common, hadoop, spark, kafka, 
> hive, etc have a security specific mailing list.
> 
> Cons:
> - The ASF security mailing list secur...@apache.org could be used if there 
> is no project-specific security mailing list.
> - The number of security reports is very low.
> 
> Additional information:
> - A security mailing list can only be subscribed to by PMC members and 
> committers. However, everyone can report security issues to it.
> 
> 
> 2. Guide users to report the security issues
> Why:
> - Security vulnerabilities should not be publicly disclosed (e.g. via the 
> dev ML or JIRA) until the project has responded. We should guide users on 
> how to report security issues on the Flink website.
> 
> How:
> - Option 1: Set up secur...@flink.apache.org and ask users to report 
> security issues there
> - Option 2: Ask users to send security reports to secur...@apache.org
> - Option 3: Ask users to send security reports directly to 
> priv...@flink.apache.org
> 
> 
> 3. Dedicated page to show the security vulnerabilities
> - We may need a dedicated security page to describe the CVE list on the Flink 
> website.
> 
> I think it makes sense to open separate discussion threads for 2) and 3); 
> I'll create them. Let's focus on 1) in this thread.
> 
> If there is no other feedback on 1), I'll bring up a VOTE for this discussion.
> 
> What do you think?
> 
> Thanks,
> Dian
> 
On Fri, Nov 15, 2019 at 10:18 AM Becket Qin wrote:
> Thanks for bringing this up, Dian.
> 
> +1 on creating a project specific security mailing list. My two cents, I
> think it is worth doing in practice.
> 
> Although the ASF security ML is always available, usually all the emails
> are simply routed to the individual project PMC. This is an additional hop.
> And in some cases, the best person to address the reported issue may not be
> a PMC member, but a committer, so the PMC have to again involve them into
> the loop. This make things unnecessarily complicated. Having a project
> specific security ML would make it much easier to have everyone at the same
> table.
> 
> Also, one thing to note is that even though the security issues are usually
> rare, they could be devastating, thus need to be treated seriously. So I
> think it is a good idea to establish the handling mechanism regardless of
> the frequency of the reported security vulnerabilities.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Fri, Nov 15, 2019 at 1:14 AM Yu Li  <mailto:car...@gmail.com>> wrote:
> 
> > Thanks for bringing up this discussion Dian! How to report security bugs to
> > our project is a very important topic!
> >
> > Big +1 on adding some explicit instructions in our document about how to
> > report security issues, and I suggest to open another thread to vote the
> > reporting way in Flink.
> >
> > FWIW, known options to report security issues include:
> > 1. Set up secur...@flink.apache.org and ask users to report security
> > issues there
> > 2. Ask users to send security report to secur...@apache.org

[jira] [Created] (FLINK-14866) A few documentation links are broken

2019-11-19 Thread Dian Fu (Jira)
Dian Fu created FLINK-14866:
---

 Summary: A few documentation links are broken
 Key: FLINK-14866
 URL: https://issues.apache.org/jira/browse/FLINK-14866
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


The links to udfs.html and functions.html don't work anymore.

udfs.html and functions.html are referenced in a few places, e.g.:
[https://raw.githubusercontent.com/apache/flink/master/docs/dev/table/sql.md]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14865) PyFlinkBlinkStreamUserDefinedFunctionTests#test_udf_in_join_condition_2 failed in travis

2019-11-19 Thread Dian Fu (Jira)
Dian Fu created FLINK-14865:
---

 Summary: 
PyFlinkBlinkStreamUserDefinedFunctionTests#test_udf_in_join_condition_2 failed 
in travis
 Key: FLINK-14865
 URL: https://issues.apache.org/jira/browse/FLINK-14865
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


{code:java}
Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "/tmp/32b29e73-3348-4326-bc06-69f6adda04ea_pyflink-udf-runner.sh": error=26, Text file busy
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
	at org.apache.flink.python.AbstractPythonFunctionRunner.open(AbstractPythonFunctionRunner.java:201)
	at org.apache.flink.table.runtime.operators.python.AbstractPythonScalarFunctionOperator$ProjectUdfInputPythonScalarFunctionRunner.open(AbstractPythonScalarFunctionOperator.java:177)
	at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:114)
	at org.apache.flink.table.runtime.operators.python.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:137)
	at org.apache.flink.table.runtime.operators.python.BaseRowPythonScalarFunctionOperator.open(BaseRowPythonScalarFunctionOperator.java:83)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:585)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:436)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:702)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:527)
	... 1 more
Caused by: java.io.IOException: Cannot run program "/tmp/32b29e73-3348-4326-bc06-69f6adda04ea_pyflink-udf-runner.sh": error=26, Text file busy
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
	at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:133)
	at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:120)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
	... 13 more
	Suppressed: java.lang.NullPointerException: Process for id does not exist: 1
		at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:895)
		at org.apache.beam.runners.fnexecution.environment.ProcessManager.stopProcess(ProcessManager.java:147)
		at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:139)
		... 23 more
Caused by: java.io.IOException: error=26, Text file busy
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
	... 26 more
{code}
[https://travis-ci.org/apache/

Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework

2019-11-19 Thread Dian Fu
+1 (non-binding)

Regards,
Dian

> On Nov 19, 2019, at 6:31 PM, Piotr Nowojski wrote:
> 
> +1 (non-binding)
> 
> Piotrek
> 
>> On 19 Nov 2019, at 04:20, Yang Wang  wrote:
>> 
>> +1 (non-binding)
>> 
>> It is great to have a new end-to-end test framework, even if it is only for
>> performance tests now.
>> 
>> Best,
>> Yang
>> 
>> On Tue, Nov 19, 2019 at 9:54 AM Jingsong Li wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Nov 18, 2019 at 7:59 PM Becket Qin  wrote:
>>> 
 +1 (binding) on having the test suite.
 
 BTW, it would be good to have a few more details about the performance
 tests. For example:
 1. What do the testing records look like? The size and key distributions.
 2. The resources for each task.
 3. The intended configuration for the jobs.
 4. What exact source and sink it would use.
 
 Thanks,
 
 Jiangjie (Becket) Qin
 
 On Mon, Nov 18, 2019 at 7:25 PM Zhijiang wrote:
 
> +1 (binding)!
> 
> It is a good thing to enhance our testing work.
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Hequn Cheng 
> Send Time:2019 Nov. 18 (Mon.) 18:22
> To:dev 
> Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing
 Framework
> 
> +1 (binding)!
> I think this would be very helpful to detect regression problems.
> 
> Best, Hequn
> 
> On Mon, Nov 18, 2019 at 4:28 PM vino yang wrote:
> 
>> +1 (non-binding)
>> 
>> Best,
>> Vino
>> 
>> On Mon, Nov 18, 2019 at 2:31 PM jincheng sun wrote:
>> 
>>> +1  (binding)
>>> 
>>> On Mon, Nov 18, 2019 at 12:09 PM OpenInx wrote:
>>> 
 +1  (non-binding)
 
 On Mon, Nov 18, 2019 at 11:54 AM aihua li wrote:
 
> +1  (non-binding)
> 
> Thanks Yu Li for driving this.
> 
>> On Nov 15, 2019, at 8:10 PM, Yu Li wrote:
>> 
>> Hi All,
>> 
>> I would like to start the vote for FLIP-83 [1], which has been discussed
>> and reached consensus in the discussion thread [2].
>> 
>> The vote will be open for at least 72 hours (excluding the weekend). I'll
>> try to close it by 2019-11-20 21:00 CST, unless there is an objection or
>> not enough votes.
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework
>> [2] https://s.apache.org/7fqrz
>> 
>> Best Regards,
>> Yu
> 
> 
 
>>> 
>> 
> 
> 
 
>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>>> 
> 



Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-11-19 Thread Dian Fu
ning in its
> > online ref-guide book here <http://hbase.apache.org/book.html#_preface
> >).
> >
> > Hope these information helps. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 14 Nov 2019 at 18:11, Chesnay Schepler 
> wrote:
> >
> > > Source: https://www.apache.org/security/
> > >
> > > Now, we can of course set up such a mailing list (as outlined here
> > > https://www.apache.org/security/committers.html), but I'm not sure if it
> > > is necessary since the number of reports is _really_ low.
> > >
> > > On 14/11/2019 11:03, Chesnay Schepler wrote:
> > > > AFAIK, the official way to report vulnerabilities in any apache
> > > > project is to write to secur...@apache.org and/or notify the
> > > > respective PMC. So far, we had several reports that went this route,
> > > > hence I'm not convinced that an additional ML is required.
> > > >
> > > > I would be fine with an additional paragraph somewhere outlining this
> > > > though.
> > > >
> > > > On 14/11/2019 06:57, Jark Wu wrote:
> > > >> Hi Dian,
> > > >>
> > > >> Good idea and +1 to setup security mailing list.
> > > >> Security vulnerabilities should not be publicly disclosed (e.g. via
> > > >> dev ML
> > > >> or JIRA) until the project has responded.
> > > >> However, AFAIK, Flink doesn't have an official process to
> > > >> report vulnerabilities.
> > > >> It would be nice to have one to protect Flink users and respond to
> > > >> security problems quickly.
> > > >>
> > > >> Btw, we may also need a dedicated page to describe the security
> > > >> vulnerabilities report process and CVE list on the website.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >>
> > > >>
> > > >> On Thu, 14 Nov 2019 at 13:36, Hequn Cheng 
> > wrote:
> > > >>
> > > >>> Hi Dian,
> > > >>>
> > > >>> Good idea! +1 to have a security mailing list.
> > > >>> It is nice for Flink to have an official procedure to handle
> security
> > > >>> problems, e.g., reporting, addressing and publishing.
> > > >>>
> > > >>> Best, Hequn
> > > >>>
> > > >>> On Thu, Nov 14, 2019 at 1:20 PM Jeff Zhang 
> wrote:
> > > >>>
> > > >>>> Thanks Dian Fu for this proposal. +1 for creating a security mailing
> > > >>>> list. Note that a security mailing list is a private list and cannot
> > > >>>> be subscribed to publicly.
> > > >>>> FYI, apache member can create mail list using this self service
> tool
> > > >>>> https://selfserve.apache.org/
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Nov 14, 2019 at 12:25 PM jincheng sun wrote:
> > > >>>>
> > > >>>>> Hi Dian,
> > > >>>>>
> > > >>>>> Thanks a lot for bringing up this discussion. This is very
> > important
> > > >>> for
> > > >>>>> Flink community!
> > > >>>>>
> > > >>>>> I think setting up a security mailing list for Flink is pretty nice
> > > >>> although `
> > > >>>>> secur...@apache.org` can be used and the report will be
> forwarded
> > to
> > > >>>> Flink
> > > >>>>> private mailing list if there is no project specific security
> > mailing
> > > >>>>> list. One thing that is pretty sure is that we should guide users
> > on
> > > >>> how
> > > >>>> to
> > > >>>>> report security issues in Flink website as security
> vulnerabilities
> > > >>>> should
> > > >>>>> not be entered into a project's public bug tracker directly
> > according
> > > >>> to
> > > >>>>> the guidance for how to handling the security vulnerabilities in
> > ASF
> > > >>>>> site[1].
> > > >>>>>
> > > >

Re: [DISCUSS] Release flink-shaded 9.0

2019-11-19 Thread Dian Fu
I see, thanks for the reminder @Chesnay!

I'll help with the release checks once the RC is out.

Regards,
Dian

On Tue, Nov 19, 2019 at 4:41 PM Chesnay Schepler  wrote:

> Thanks for the offer, but without being a committer I don't think
> there's a lot to do :/
>
> @Uce If no one else steps up I'll kick it off later today myself; this
> would mean a release on Friday.
>
> On 19/11/2019 09:17, Dian Fu wrote:
> > Hi Chesnay,
> >
> > Thanks a lot for kicking off this release. +1 to release flink-shaded
> 9.0.
> >
> > I'm willing to help with the release. Please feel free to let me know
> > if there is anything I can help with.
> >
> > Regards,
> > Dian
> >
> > On Mon, Nov 18, 2019 at 8:43 PM Ufuk Celebi  wrote:
> >
> >> @Chesnay: I know you said that you are pretty busy these days. If we
> can't
> >> find anybody else to work on this, when would you be available to create
> >> the first RC?
> >>
> >> On Sun, Nov 17, 2019 at 6:48 AM Hequn Cheng 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Big +1 to release 9.0.
> >>> It would be good if we can solve these security vulnerabilities.
> >>>
> >>> Thanks a lot for your nice work and kick off the release so quickly.
> >>>
> >>>
> >>> On Fri, Nov 15, 2019 at 11:50 PM Ufuk Celebi  wrote:
> >>>
> >>>> From what I can see, the Jackson version bump fixes quite a few
> >>>> vulnerabilities. Therefore, I'd be +1 to release flink-shaded 9.0.
> >>>>
> >>>> Thanks for all the work to verify this on master already.
> >>>>
> >>>> – Ufuk
> >>>>
> >>>>
> >>>> On Fri, Nov 15, 2019 at 2:26 PM Chesnay Schepler 
> >>>> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I'd like to kick off the next release for flink-shaded. Background is
> >>>>> that we recently bumped jackson to 2.10.1 to fix a variety of
> >> security
> >>>>> vulnerabilities, and it would be good to include them in the upcoming
> >>>>> 1.8.3/1.9.2 releases.
> >>>>>
> >>>>> The release would contain few changes beyond the jackson changes;
> >>>>> flink-shaded can now be compiled on Java 11 and an encoding issue for
> >>>>> the NOTICE files was fixed.
> >>>>>
> >>>>> So overall this should be very little overhead.
> >>>>>
> >>>>> I have already verified that the master would work with this version
> >>>>> (this being a reasonable indicator for it also working in previous
> >>>>> version).
> >>>>>
> >>>>> I'd also appreciate it if someone would volunteer to handle the
> >>> release;
> >>>>> I'm quite bogged down at the moment :(
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Chesnay
> >>>>>
> >>>>>
>
>

