Re: Cwiki edit access

2020-02-13 Thread jincheng sun
Hi Lining,

Sounds good! I have granted edit permission to the `jinglining` account.
Please log in again and verify that it works. Have a nice day at work!

Best,
Jincheng


lining jing wrote on Fri, Feb 14, 2020 at 11:49 AM:

> I am currently drafting FLIP-98 ~ FLIP-104 with Yadong, so I need edit access.
>
> jincheng sun wrote on Fri, Feb 14, 2020 at 11:43 AM:
>
> > Hi Lining,
> >
> > Could you please add more information about why you need page access
> > authority, and which page you want to edit?
> >
> > Best,
> > Jincheng
> >
> >
> > lining jing wrote on Fri, Feb 14, 2020 at 11:09 AM:
> >
> > > Hi, I would like to be able to edit pages in the Confluence Flink
> space.
> > > Can
> > > someone give me access, please?
> > >
> > > Thanks
> > >
> >
>


[jira] [Created] (FLINK-16053) Remove redundant metrics in PyFlink

2020-02-13 Thread Dian Fu (Jira)
Dian Fu created FLINK-16053:
---

 Summary: Remove redundant metrics in PyFlink
 Key: FLINK-16053
 URL: https://issues.apache.org/jira/browse/FLINK-16053
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


We record metrics about how many elements a Python UDF has processed. This 
information is unnecessary, as the same information is already recorded by the 
Java operator. A simple test shows that removing these metrics could improve 
performance by about 5% - 10%.
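
For illustration, the overhead comes from a per-element counter update of 
roughly the following shape (a generic sketch of the pattern in question, not 
the actual PyFlink code; the metric name is made up):

{code}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter

// Generic sketch: a counter bumped once per element. The Java operator
// already maintains equivalent numRecordsIn/numRecordsOut metrics, which
// is what makes a second counter like this redundant.
class CountingMap extends RichMapFunction[String, String] {
  @transient private var elements: Counter = _

  override def open(parameters: Configuration): Unit = {
    // "numElementsProcessed" is a hypothetical metric name
    elements = getRuntimeContext.getMetricGroup.counter("numElementsProcessed")
  }

  override def map(value: String): String = {
    elements.inc() // executed once per element: the overhead being removed
    value
  }
}
{code}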



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16052) Homebrew test failed with 1.10.0 dist package

2020-02-13 Thread Yu Li (Jira)
Yu Li created FLINK-16052:
-

 Summary: Homebrew test failed with 1.10.0 dist package
 Key: FLINK-16052
 URL: https://issues.apache.org/jira/browse/FLINK-16052
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.10.0
Reporter: Yu Li
 Fix For: 1.10.1, 1.11.0


After updating the Homebrew formula to 1.10.0 with the patch supplied in this 
[PR|https://github.com/Homebrew/homebrew-core/pull/50110], executing `brew 
install --build-from-source Formula/apache-flink.rb` and then `brew test 
Formula/apache-flink.rb` produces the error below:
{code}
[ERROR] Unexpected result: Error: Could not find or load main class 
org.apache.flink.runtime.util.BashJavaUtils
[ERROR] The last line of the BashJavaUtils outputs is expected to be the 
execution result, following the prefix 'BASH_JAVA_UTILS_EXEC_RESULT:'
Picked up _JAVA_OPTIONS: 
-Djava.io.tmpdir=/private/tmp/apache-flink-test-20200214-33361-1jotper 
-Duser.home=/Users/jueding/Library/Caches/Homebrew/java_cache
Error: Could not find or load main class 
org.apache.flink.runtime.util.BashJavaUtils
[ERROR] Could not get JVM parameters properly.
Error: apache-flink: failed
Failed executing:
{code}

After bisecting the {{flink-dist/src/main/flink-bin/bin}} changes, I confirmed 
that the above issue is related to FLINK-15488. However, new errors like the 
ones below appear after reverting FLINK-15488 (and FLINK-15519):
{code}
==> /usr/local/Cellar/apache-flink/1.10.0/libexec/bin/start-cluster.sh
==> /usr/local/Cellar/apache-flink/1.10.0/bin/flink run -p 1 
/usr/local/Cellar/apache-flink/1.10.0/libexec/examples/streaming/WordCount.jar 
--input input --output result
Last 15 lines from 
/Users/jueding/Library/Logs/Homebrew/apache-flink/test.02.flink:
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
... 4 more
Caused by: 
org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException:
 Could not fulfill slot request b7f17c0928112209ae873d089123b1c6. Requested 
resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$fulfillPendingSlotRequestWithPendingTaskManagerSlot$9(SlotManagerImpl.java:772)
at 
org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.fulfillPendingSlotRequestWithPendingTaskManagerSlot(SlotManagerImpl.java:768)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$internalRequestSlot$7(SlotManagerImpl.java:755)
at 
org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.internalRequestSlot(SlotManagerImpl.java:755)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.registerSlotRequest(SlotManagerImpl.java:314)
... 27 more
Error: apache-flink: failed
Failed executing:
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Support Python ML Pipeline API

2020-02-13 Thread Dian Fu
+1 (binding)

Regards,
Dian

> On Feb 14, 2020, at 2:49 PM, Yuan Mei wrote:
> 
> +1 vote
> 
> This is one of the most important things for the Flink ML framework to be
> widely adopted, since most data scientists use Python.
> 
> Best
> 
> Yuan
> 
> On Fri, Feb 14, 2020 at 9:45 AM Hequn Cheng  wrote:
> 
>> Hi everyone,
>> 
>> I'd like to start the vote for FLIP-96[1], which has been discussed and has
>> reached consensus in the discussion thread[2].
>> The vote will be open for at least 72 hours. Unless there is an objection,
>> I will try to close it by Feb 19, 2020 02:00 UTC if we have received
>> sufficient votes.
>> 
>> Thanks,
>> Hequn
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-96%3A+Support+Python+ML+Pipeline+API
>> [2]
>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html
>> 



Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Dian Fu
+1 (non-binding)

- Verified the signature and checksum
- Checked the release notes to confirm that all tickets included in this 
release are listed
- Checked the website PR and it LGTM
- Checked the NOTICE file of the newly added module flink-shaded-zookeeper-3 
and it LGTM

Regards,
Dian

> On Feb 14, 2020, at 2:58 PM, Hequn Cheng wrote:
> 
> Thank you Chesnay for the release!
> 
> +1 (non-binding)
> 
> - The problem that existed in RC1 has been resolved.
> - Release notes look good.
> - Built from the source archive successfully.
> - Checked commit history manually; nothing looks weird.
> - Signatures and hash are correct.
> - All artifacts have been deployed to the Maven Central repository.
> - The website pull request looks good.
> 
> Best, Hequn
> 
> On Fri, Feb 14, 2020 at 1:14 AM Zhu Zhu  wrote:
> 
>> +1 (non-binding)
>> 
>> - checked release notes, JIRA tickets and commit history
>> - verified the signature and checksum
>> - checked the maven central artifacts
>>  * examined the zookeeper shaded jars (both 3.4.10 and 3.5.6), curator and
>> zookeeper classes are there and shaded
>> - built from the source archive as well as the git tag
>> - checked the website pull request
>> 
>> Thanks,
>> Zhu Zhu
>> 
>> Ufuk Celebi wrote on Fri, Feb 14, 2020 at 12:32 AM:
>> 
>>> PS: Also verified the NOTICE changes since the last RC.
>>> 
>>> On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:
>>> 
 Hey Chesnay,
 
 +1 (binding).
 
 - Verified checksum ✅
 - Verified signature ✅
 - Jira changelog looks good to me ✅
 - Website PR looks good to me ✅
 - Verified no unshaded dependencies (except the Hadoop modules which I
 think is expected) ✅
 - Verified dependency management fix FLINK-15540
 (commons-collections:3.2.2 as expected) ✅
 - Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
 flink-shaded-force-shading and the Hadoop modules which I think is
 expected) ✅
 
 – Ufuk
 
 On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
> 
> +1 (non-binding)
> 
> Checked issues listed in release notes: ok
> Checked sums and signatures: ok
> Checked the maven central artifacts: ok
> Built from source: ok (8u101, 11.0.4)
> Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> Checked contents of zookeeper shaded jars: ok
> - no unshaded classes
> - shading pattern is correct
> Checked website pull request listing the new release: ok
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
 wrote:
> 
>> Hi everyone,
>> Please review and vote on the release candidate #3 for the version
 10.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific
>> comments)
>> 
>> 
>> The complete staging area is available for your review, which
>>> includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to
>>> dist.apache.org
>> [2], which are signed with the key with fingerprint 11D464BA [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-10.0-rc3" [5],
>> * website pull request listing the new release [6].
>> 
>> The vote will be open for at least 72 hours. It is adopted by
>>> majority
>> approval, with at least 3 PMC affirmative votes.
>> 
>> Thanks,
>> Chesnay
>> 
>> [1]
>> 
>> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
>> [2]
 https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> [4]
 https://repository.apache.org/content/repositories/orgapacheflink-1337
>> [5]
>> 
>> 
 
>>> 
>> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
>> [6] https://github.com/apache/flink-web/pull/304
>> 
>> 
>> 
>> 
 
>>> 
>> 



Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Hequn Cheng
Thank you Chesnay for the release!

+1 (non-binding)

- The problem that existed in RC1 has been resolved.
- Release notes look good.
- Built from the source archive successfully.
- Checked commit history manually; nothing looks weird.
- Signatures and hash are correct.
- All artifacts have been deployed to the Maven Central repository.
- The website pull request looks good.

Best, Hequn

On Fri, Feb 14, 2020 at 1:14 AM Zhu Zhu  wrote:

> +1 (non-binding)
>
> - checked release notes, JIRA tickets and commit history
> - verified the signature and checksum
> - checked the maven central artifacts
>   * examined the zookeeper shaded jars (both 3.4.10 and 3.5.6), curator and
> zookeeper classes are there and shaded
> - built from the source archive as well as the git tag
> - checked the website pull request
>
> Thanks,
> Zhu Zhu
>
> Ufuk Celebi wrote on Fri, Feb 14, 2020 at 12:32 AM:
>
> > PS: Also verified the NOTICE changes since the last RC.
> >
> > On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:
> >
> > > Hey Chesnay,
> > >
> > > +1 (binding).
> > >
> > > - Verified checksum ✅
> > > - Verified signature ✅
> > > - Jira changelog looks good to me ✅
> > > - Website PR looks good to me ✅
> > > - Verified no unshaded dependencies (except the Hadoop modules which I
> > > think is expected) ✅
> > > - Verified dependency management fix FLINK-15540
> > > (commons-collections:3.2.2 as expected) ✅
> > > - Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
> > > flink-shaded-force-shading and the Hadoop modules which I think is
> > > expected) ✅
> > >
> > > – Ufuk
> > >
> > > On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Checked issues listed in release notes: ok
> > > > Checked sums and signatures: ok
> > > > Checked the maven central artifacts: ok
> > > > Built from source: ok (8u101, 11.0.4)
> > > > Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> > > > Checked contents of zookeeper shaded jars: ok
> > > > - no unshaded classes
> > > > - shading pattern is correct
> > > > Checked website pull request listing the new release: ok
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
> > > wrote:
> > > >
> > > > > Hi everyone,
> > > > > Please review and vote on the release candidate #3 for the version
> > > 10.0,
> > > > > as follows:
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > >
> > > > >
> > > > > The complete staging area is available for your review, which
> > includes:
> > > > > * JIRA release notes [1],
> > > > > * the official Apache source release to be deployed to
> > dist.apache.org
> > > > > [2], which are signed with the key with fingerprint 11D464BA [3],
> > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > > * source code tag "release-10.0-rc3" [5],
> > > > > * website pull request listing the new release [6].
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > > > approval, with at least 3 PMC affirmative votes.
> > > > >
> > > > > Thanks,
> > > > > Chesnay
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> > > > > [2]
> > > https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
> > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > [4]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1337
> > > > > [5]
> > > > >
> > > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
> > > > > [6] https://github.com/apache/flink-web/pull/304
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> >
>


Re: [VOTE] Support Python ML Pipeline API

2020-02-13 Thread Yuan Mei
+1 vote

This is one of the most important things for the Flink ML framework to be
widely adopted, since most data scientists use Python.

Best

Yuan

On Fri, Feb 14, 2020 at 9:45 AM Hequn Cheng  wrote:

> Hi everyone,
>
> I'd like to start the vote for FLIP-96[1], which has been discussed and has
> reached consensus in the discussion thread[2].
> The vote will be open for at least 72 hours. Unless there is an objection,
> I will try to close it by Feb 19, 2020 02:00 UTC if we have received
> sufficient votes.
>
> Thanks,
> Hequn
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-96%3A+Support+Python+ML+Pipeline+API
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html
>


Re: [DISCUSS] Support Python ML Pipeline API

2020-02-13 Thread Hequn Cheng
Hi Becket,

Thanks a lot for your advice. Definitely agree with you that we should
follow the FLIP process.
Will pay more attention to this next time.

Best, Hequn


On Fri, Feb 14, 2020 at 2:19 PM Becket Qin  wrote:

> I just had an offline chat with Hequn and realized that FLIP-96 has already
> been opened for this discussion. I missed that because the FLIP was not
> mentioned in the thread.
>
> I am fine with proceeding to a vote.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Feb 14, 2020 at 12:52 PM Becket Qin  wrote:
>
> > Hi Hequn,
> >
> > Given this is an addition to the public API, we probably should follow
> the
> > FLIP process. It would be a quick one though, I think.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Feb 14, 2020 at 10:03 AM Hequn Cheng  wrote:
> >
> >> Hi all,
> >>
> >> Thanks a lot for your valuable feedback!
>> As it seems we have reached a consensus in the discussion, I have
> >> started a VOTE thread[1]. Looking forward to your vote.
> >>
> >> Best,
> >> Hequn
> >>
> >> [1]
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Support-Python-ML-Pipeline-API-td37637.html
> >>
> >> On Thu, Feb 13, 2020 at 10:40 AM Becket Qin 
> wrote:
> >>
> >>> +1. I'd say this is almost a must-have for machine learning.
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On Thu, Feb 13, 2020 at 10:03 AM Rong Rong 
> wrote:
> >>>
>  Thanks for driving this initiative @Hequn Cheng .
> 
>  Moving towards Python-based ML is definitely a huge win considering how
>  large the Python ML community is. A big +1 on my side!
>  Regarding the doc, I only left a few comments on the specific APIs;
>  overall the architecture looks very good!
> 
>  Looking forward to it!
>  --
>  Rong
> 
>  On Sun, Feb 9, 2020 at 10:28 PM Hequn Cheng  wrote:
> 
>  > Hi everyone,
>  >
>  > Thanks a lot for your feedback. I have created the FLIP[1].
>  >
>  > Best,
>  > Hequn
>  >
>  > [1]
>  >
>  >
> 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+96%3A+Support+Python+ML+Pipeline+API
>  >
>  > On Mon, Feb 10, 2020 at 12:29 PM Dian Fu 
>  wrote:
>  >
>  > > Hi Hequn,
>  > >
>  > > Thanks for bringing up the discussion. +1 to this feature. The
>  design
>  > LGTM.
>  > > It's great that the Python ML users could use both the Java
> Pipeline
>  > > Transformer/Estimator/Model classes and the Python
>  > > Pipeline Transformer/Estimator/Model in the same job.
>  > >
>  > > Regards,
>  > > Dian
>  > >
>  > > On Mon, Feb 10, 2020 at 11:08 AM jincheng sun <
>  sunjincheng...@gmail.com>
>  > > wrote:
>  > >
>  > > > Hi Hequn,
>  > > >
>  > > > Thanks for bringing up this discussion.
>  > > >
>  > > > +1 for adding the Python ML Pipeline API, even though the Java
>  > > > pipeline API may change.
>  > > >
>  > > > I would like to suggest creating a FLIP for these API changes. :)
>  > > >
>  > > > Best,
>  > > > Jincheng
>  > > >
>  > > >
>  > > > Hequn Cheng wrote on Wed, Feb 5, 2020 at 5:24 PM:
>  > > >
>  > > > > Hi everyone,
>  > > > >
>  > > > > FLIP-39[1] rebuilds the Flink ML pipeline on top of TableAPI
> and
>  > > > introduces
>  > > > > a new set of Java APIs. As Python is widely used in ML areas,
>  > providing
>  > > > > Python ML Pipeline APIs for Flink can not only make it easier
> to
>  > write
>  > > ML
>  > > > > jobs for Python users but also broaden the adoption of Flink
> ML.
>  > > > >
>  > > > > Given this, Jincheng and I discussed offline about the support
>  of
>  > > Python
>  > > > ML
>  > > > > Pipeline API and drafted a design doc[2]. We'd like to achieve
>  three
>  > > > goals
>  > > > > for supporting Python Pipeline API:
>  > > > > - Add Python pipeline API according to Java pipeline API (we
> will
>  > adapt
>  > > > the
>  > > > > Python pipeline API if Java pipeline API changes).
>  > > > > - Support native Python Transformer/Estimator/Model, i.e.,
>  users can
>  > > > write
>  > > > > not only Python Transformer/Estimator/Model wrappers for
>  calling Java
>  > > > ones
>  > > > > but also can write native Python Transformer/Estimator/Models.
>  > > > > - Ease of use. Support keyword arguments when defining
>  parameters.
>  > > > >
>  > > > > More details can be found in the design doc and we are looking
>  > forward
>  > > to
>  > > > > your feedback.
>  > > > >
>  > > > > Best,
>  > > > > Hequn
>  > > > >
>  > > > > [1]
>  > > > >
>  > > > >
>  > > >
>  > >
>  >
> 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>  > > > > [2]
>  > > > >
>  > > > >
>  

Re: [DISCUSS] Support Python ML Pipeline API

2020-02-13 Thread Becket Qin
I just had an offline chat with Hequn and realized that FLIP-96 has already
been opened for this discussion. I missed that because the FLIP was not
mentioned in the thread.

I am fine with proceeding to a vote.

Thanks,

Jiangjie (Becket) Qin

On Fri, Feb 14, 2020 at 12:52 PM Becket Qin  wrote:

> Hi Hequn,
>
> Given this is an addition to the public API, we probably should follow the
> FLIP process. It would be a quick one though, I think.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Feb 14, 2020 at 10:03 AM Hequn Cheng  wrote:
>
>> Hi all,
>>
>> Thanks a lot for your valuable feedback!
>> As it seems we have reached a consensus in the discussion, I have
>> started a VOTE thread[1]. Looking forward to your vote.
>>
>> Best,
>> Hequn
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Support-Python-ML-Pipeline-API-td37637.html
>>
>> On Thu, Feb 13, 2020 at 10:40 AM Becket Qin  wrote:
>>
>>> +1. I'd say this is almost a must-have for machine learning.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Thu, Feb 13, 2020 at 10:03 AM Rong Rong  wrote:
>>>
 Thanks for driving this initiative @Hequn Cheng .

 Moving towards Python-based ML is definitely a huge win considering how
 large the Python ML community is. A big +1 on my side!
 Regarding the doc, I only left a few comments on the specific APIs;
 overall the architecture looks very good!

 Looking forward to it!
 --
 Rong

 On Sun, Feb 9, 2020 at 10:28 PM Hequn Cheng  wrote:

 > Hi everyone,
 >
 > Thanks a lot for your feedback. I have created the FLIP[1].
 >
 > Best,
 > Hequn
 >
 > [1]
 >
 >
 https://cwiki.apache.org/confluence/display/FLINK/FLIP+96%3A+Support+Python+ML+Pipeline+API
 >
 > On Mon, Feb 10, 2020 at 12:29 PM Dian Fu 
 wrote:
 >
 > > Hi Hequn,
 > >
 > > Thanks for bringing up the discussion. +1 to this feature. The
 design
 > LGTM.
 > > It's great that the Python ML users could use both the Java Pipeline
 > > Transformer/Estimator/Model classes and the Python
 > > Pipeline Transformer/Estimator/Model in the same job.
 > >
 > > Regards,
 > > Dian
 > >
 > > On Mon, Feb 10, 2020 at 11:08 AM jincheng sun <
 sunjincheng...@gmail.com>
 > > wrote:
 > >
 > > > Hi Hequn,
 > > >
 > > > Thanks for bringing up this discussion.
 > > >
 > > > +1 for adding the Python ML Pipeline API, even though the Java
 > > > pipeline API may change.
 > > >
 > > > I would like to suggest creating a FLIP for these API changes. :)
 > > >
 > > > Best,
 > > > Jincheng
 > > >
 > > >
 > > > Hequn Cheng wrote on Wed, Feb 5, 2020 at 5:24 PM:
 > > >
 > > > > Hi everyone,
 > > > >
 > > > > FLIP-39[1] rebuilds the Flink ML pipeline on top of TableAPI and
 > > > introduces
 > > > > a new set of Java APIs. As Python is widely used in ML areas,
 > providing
 > > > > Python ML Pipeline APIs for Flink can not only make it easier to
 > write
 > > ML
 > > > > jobs for Python users but also broaden the adoption of Flink ML.
 > > > >
 > > > > Given this, Jincheng and I discussed offline about the support
 of
 > > Python
 > > > ML
 > > > > Pipeline API and drafted a design doc[2]. We'd like to achieve
 three
 > > > goals
 > > > > for supporting Python Pipeline API:
 > > > > - Add Python pipeline API according to Java pipeline API (we will
 > adapt
 > > > the
 > > > > Python pipeline API if Java pipeline API changes).
 > > > > - Support native Python Transformer/Estimator/Model, i.e.,
 users can
 > > > write
 > > > > not only Python Transformer/Estimator/Model wrappers for
 calling Java
 > > > ones
 > > > > but also can write native Python Transformer/Estimator/Models.
 > > > > - Ease of use. Support keyword arguments when defining
 parameters.
 > > > >
 > > > > More details can be found in the design doc and we are looking
 > forward
 > > to
 > > > > your feedback.
 > > > >
 > > > > Best,
 > > > > Hequn
 > > > >
 > > > > [1]
 > > > >
 > > > >
 > > >
 > >
 >
 https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
 > > > > [2]
 > > > >
 > > > >
 > > >
 > >
 >
 https://docs.google.com/document/d/1fwSO5sRNWMoYuvNgfQJUV6N2n2q5UEVA4sezCljKcVQ/edit?usp=sharing
 > > > >
 > > >
 > >
 >

>>>


Re: [VOTE] Support Python ML Pipeline API

2020-02-13 Thread jincheng sun
+1 (binding)

Best,
Jincheng


Hequn Cheng wrote on Fri, Feb 14, 2020 at 9:45 AM:

> Hi everyone,
>
> I'd like to start the vote for FLIP-96[1], which has been discussed and has
> reached consensus in the discussion thread[2].
> The vote will be open for at least 72 hours. Unless there is an objection,
> I will try to close it by Feb 19, 2020 02:00 UTC if we have received
> sufficient votes.
>
> Thanks,
> Hequn
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-96%3A+Support+Python+ML+Pipeline+API
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html
>


Re: [DISCUSS] Support Python ML Pipeline API

2020-02-13 Thread Becket Qin
Hi Hequn,

Given this is an addition to the public API, we probably should follow the
FLIP process. It would be a quick one though, I think.

Thanks,

Jiangjie (Becket) Qin

On Fri, Feb 14, 2020 at 10:03 AM Hequn Cheng  wrote:

> Hi all,
>
> Thanks a lot for your valuable feedback!
> As it seems we have reached a consensus in the discussion, I have
> started a VOTE thread[1]. Looking forward to your vote.
>
> Best,
> Hequn
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Support-Python-ML-Pipeline-API-td37637.html
>
> On Thu, Feb 13, 2020 at 10:40 AM Becket Qin  wrote:
>
>> +1. I'd say this is almost a must-have for machine learning.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Thu, Feb 13, 2020 at 10:03 AM Rong Rong  wrote:
>>
>>> Thanks for driving this initiative @Hequn Cheng .
>>>
>>> Moving towards Python-based ML is definitely a huge win considering how
>>> large the Python ML community is. A big +1 on my side!
>>> Regarding the doc, I only left a few comments on the specific APIs;
>>> overall the architecture looks very good!
>>>
>>> Looking forward to it!
>>> --
>>> Rong
>>>
>>> On Sun, Feb 9, 2020 at 10:28 PM Hequn Cheng  wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > Thanks a lot for your feedback. I have created the FLIP[1].
>>> >
>>> > Best,
>>> > Hequn
>>> >
>>> > [1]
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+96%3A+Support+Python+ML+Pipeline+API
>>> >
>>> > On Mon, Feb 10, 2020 at 12:29 PM Dian Fu 
>>> wrote:
>>> >
>>> > > Hi Hequn,
>>> > >
>>> > > Thanks for bringing up the discussion. +1 to this feature. The design
>>> > LGTM.
>>> > > It's great that the Python ML users could use both the Java Pipeline
>>> > > Transformer/Estimator/Model classes and the Python
>>> > > Pipeline Transformer/Estimator/Model in the same job.
>>> > >
>>> > > Regards,
>>> > > Dian
>>> > >
>>> > > On Mon, Feb 10, 2020 at 11:08 AM jincheng sun <
>>> sunjincheng...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi Hequn,
>>> > > >
>>> > > > Thanks for bringing up this discussion.
>>> > > >
>>> > > > +1 for adding the Python ML Pipeline API, even though the Java
>>> > > > pipeline API may change.
>>> > > >
>>> > > > I would like to suggest creating a FLIP for these API changes. :)
>>> > > >
>>> > > > Best,
>>> > > > Jincheng
>>> > > >
>>> > > >
>>> > > > Hequn Cheng wrote on Wed, Feb 5, 2020 at 5:24 PM:
>>> > > >
>>> > > > > Hi everyone,
>>> > > > >
>>> > > > > FLIP-39[1] rebuilds the Flink ML pipeline on top of TableAPI and
>>> > > > introduces
>>> > > > > a new set of Java APIs. As Python is widely used in ML areas,
>>> > providing
>>> > > > > Python ML Pipeline APIs for Flink can not only make it easier to
>>> > write
>>> > > ML
>>> > > > > jobs for Python users but also broaden the adoption of Flink ML.
>>> > > > >
>>> > > > > Given this, Jincheng and I discussed offline about the support of
>>> > > Python
>>> > > > ML
>>> > > > > Pipeline API and drafted a design doc[2]. We'd like to achieve
>>> three
>>> > > > goals
>>> > > > > for supporting Python Pipeline API:
>>> > > > > - Add Python pipeline API according to Java pipeline API (we will
>>> > adapt
>>> > > > the
>>> > > > > Python pipeline API if Java pipeline API changes).
>>> > > > > - Support native Python Transformer/Estimator/Model, i.e., users
>>> can
>>> > > > write
>>> > > > > not only Python Transformer/Estimator/Model wrappers for calling
>>> Java
>>> > > > ones
>>> > > > > but also can write native Python Transformer/Estimator/Models.
>>> > > > > - Ease of use. Support keyword arguments when defining
>>> parameters.
>>> > > > >
>>> > > > > More details can be found in the design doc and we are looking
>>> > forward
>>> > > to
>>> > > > > your feedback.
>>> > > > >
>>> > > > > Best,
>>> > > > > Hequn
>>> > > > >
>>> > > > > [1]
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>>> > > > > [2]
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://docs.google.com/document/d/1fwSO5sRNWMoYuvNgfQJUV6N2n2q5UEVA4sezCljKcVQ/edit?usp=sharing
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>


[jira] [Created] (FLINK-16051) subtask id in Checkpoint UI not consistent with Task UI

2020-02-13 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-16051:
--

 Summary: subtask id in Checkpoint UI not consistent with Task UI
 Key: FLINK-16051
 URL: https://issues.apache.org/jira/browse/FLINK-16051
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.10.0
Reporter: Jiayi Liao
 Attachments: checkpointui.png, taskui.png

The subtask id in the Subtask UI starts from 0, but in the Checkpoint UI it 
starts from 1.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Cwiki edit access

2020-02-13 Thread lining jing
I am currently drafting FLIP-98 ~ FLIP-104 with Yadong, so I need edit access.

jincheng sun wrote on Fri, Feb 14, 2020 at 11:43 AM:

> Hi Lining,
>
> Could you please add more information about why you need page access
> authority, and which page you want to edit?
>
> Best,
> Jincheng
>
>
> lining jing wrote on Fri, Feb 14, 2020 at 11:09 AM:
>
> > Hi, I would like to be able to edit pages in the Confluence Flink space.
> > Can
> > someone give me access, please?
> >
> > Thanks
> >
>


Re: Cwiki edit access

2020-02-13 Thread jincheng sun
Hi Lining,

Could you please add more information about why you need page access
authority, and which page you want to edit?

Best,
Jincheng


lining jing wrote on Fri, Feb 14, 2020 at 11:09 AM:

> Hi, I would like to be able to edit pages in the Confluence Flink space.
> Can
> someone give me access, please?
>
> Thanks
>


Re: Cwiki edit access

2020-02-13 Thread jincheng sun
Hi Lining,

Could you please provide information about why you need page access, and
which page you want to edit?

Best,
Jincheng

lining jing wrote on Fri, Feb 14, 2020 at 11:09 AM:

> Hi, I would like to be able to edit pages in the Confluence Flink space.
> Can
> someone give me access, please?
>
> Thanks
>
-- 

Best,
Jincheng
-
Twitter: https://twitter.com/sunjincheng121
-


Cwiki edit access

2020-02-13 Thread lining jing
Hi, I would like to be able to edit pages in the Confluence Flink space. Can
someone give me access, please?

Thanks


[jira] [Created] (FLINK-16050) Add Attempt Information

2020-02-13 Thread lining (Jira)
lining created FLINK-16050:
--

 Summary: Add Attempt Information
 Key: FLINK-16050
 URL: https://issues.apache.org/jira/browse/FLINK-16050
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST, Runtime / Web Frontend
Reporter: lining


According to the 
[docs|https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jobs-jobid-vertices-vertexid-subtasks-subtaskindex],
a subtask may have more than one attempt, but there is no way to get the 
attempt history via the REST API, so users cannot tell whether a subtask has 
failed before.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] FLIP-75: Flink Web UI Improvement Proposal

2020-02-13 Thread Yadong Xie
Hi all,

I would like to start the vote for FLIP-75 which proposes to provide a
better experience for Flink users in web UI.

In order to make the vote easier, FLIP-75 was split
into several top-level sub-FLIPs:

   - FLIP-98: Better Back Pressure Detection
   - FLIP-99: Make Max Exception Configurable
   - FLIP-100: Add Attempt Information
   - FLIP-101: Add Pending Slots Detail
   - FLIP-102: Add More Metrics to TaskManager
   - FLIP-103: Better TM/JM Log Display
   - FLIP-104: Add More Metrics to JobManager

And in order to help everyone better understand the proposal, we spent some
effort on making an online POC:

previous web: http://101.132.122.69:8081/
POC web: http://101.132.122.69:8081/web/index.html

Please fill in the following format when voting:

FLIP-98: +1 (or -1)
FLIP-99: +1 (or -1)
FLIP-100: +1 (or -1)
FLIP-101: +1 (or -1)
FLIP-102: +1 (or -1)
FLIP-103: +1 (or -1)
FLIP-104: +1 (or -1)

The vote will last for at least 72 hours, following the consensus voting
process.

FLIP wiki:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal

Discussion thread:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html

Thanks,

Yadong


Re: [DISCUSS] Support Python ML Pipeline API

2020-02-13 Thread Hequn Cheng
Hi all,

Thanks a lot for your valuable feedback!
As it seems we have reached a consensus in the discussion, I have
started a VOTE thread[1]. Looking forward to your vote.

Best,
Hequn

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Support-Python-ML-Pipeline-API-td37637.html

On Thu, Feb 13, 2020 at 10:40 AM Becket Qin  wrote:

> +1. I'd say this is almost a must-have for machine learning.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Feb 13, 2020 at 10:03 AM Rong Rong  wrote:
>
>> Thanks for driving this initiative @Hequn Cheng .
>>
>> Moving towards Python-based ML is definitely a huge win considering how large
>> the Python ML community is. A big +1 on my side!
>> Regarding the doc, I only left a few comments on the specific APIs;
>> overall the architecture looks very good!
>>
>> Looking forward to it!
>> --
>> Rong
>>
>> On Sun, Feb 9, 2020 at 10:28 PM Hequn Cheng  wrote:
>>
>> > Hi everyone,
>> >
>> > Thanks a lot for your feedback. I have created the FLIP[1].
>> >
>> > Best,
>> > Hequn
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+96%3A+Support+Python+ML+Pipeline+API
>> >
>> > On Mon, Feb 10, 2020 at 12:29 PM Dian Fu  wrote:
>> >
>> > > Hi Hequn,
>> > >
>> > > Thanks for bringing up the discussion. +1 to this feature. The design
>> > LGTM.
>> > > It's great that the Python ML users could use both the Java Pipeline
>> > > Transformer/Estimator/Model classes and the Python
>> > > Pipeline Transformer/Estimator/Model in the same job.
>> > >
>> > > Regards,
>> > > Dian
>> > >
>> > > On Mon, Feb 10, 2020 at 11:08 AM jincheng sun <
>> sunjincheng...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Hequn,
>> > > >
>> > > > Thanks for bringing up this discussion.
>> > > >
>> > > > +1 for adding the Python ML Pipeline API, even though the Java
>> > > > pipeline API may change.
>> > > >
>> > > > I would like to suggest creating a FLIP for these API changes. :)
>> > > >
>> > > > Best,
>> > > > Jincheng
>> > > >
>> > > >
>> > > > Hequn Cheng wrote on Wed, Feb 5, 2020 at 5:24 PM:
>> > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > FLIP-39[1] rebuilds the Flink ML pipeline on top of TableAPI and
>> > > > introduces
>> > > > > a new set of Java APIs. As Python is widely used in ML areas,
>> > providing
>> > > > > Python ML Pipeline APIs for Flink can not only make it easier to
>> > write
>> > > ML
>> > > > > jobs for Python users but also broaden the adoption of Flink ML.
>> > > > >
>> > > > > Given this, Jincheng and I discussed offline about the support of
>> > > Python
>> > > > ML
>> > > > > Pipeline API and drafted a design doc[2]. We'd like to achieve
>> three
>> > > > goals
>> > > > > for supporting Python Pipeline API:
>> > > > > - Add Python pipeline API according to Java pipeline API (we will
>> > adapt
>> > > > the
>> > > > > Python pipeline API if Java pipeline API changes).
>> > > > > - Support native Python Transformer/Estimator/Model, i.e., users
>> can
>> > > > write
>> > > > > not only Python Transformer/Estimator/Model wrappers for calling
>> Java
>> > > > ones
>> > > > > but also can write native Python Transformer/Estimator/Models.
>> > > > > - Ease of use. Support keyword arguments when defining parameters.
>> > > > >
>> > > > > More details can be found in the design doc and we are looking
>> > forward
>> > > to
>> > > > > your feedback.
>> > > > >
>> > > > > Best,
>> > > > > Hequn
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>> > > > > [2]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1fwSO5sRNWMoYuvNgfQJUV6N2n2q5UEVA4sezCljKcVQ/edit?usp=sharing
>> > > > >
>> > > >
>> > >
>> >
>>
>


[VOTE] Support Python ML Pipeline API

2020-02-13 Thread Hequn Cheng
Hi everyone,

I'd like to start the vote for FLIP-96[1], which has been discussed and has
reached consensus in the discussion thread[2].
The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Feb 19, 2020 02:00 UTC if we have received
sufficient votes.

Thanks,
Hequn

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-96%3A+Support+Python+ML+Pipeline+API
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html


Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Zhu Zhu
+1 (non-binding)

- checked release notes, JIRA tickets and commit history
- verified the signature and checksum
- checked the maven central artifacts
  * examined the zookeeper shaded jars (both 3.4.10 and 3.5.6), curator and
zookeeper classes are there and shaded
- built from the source archive as well as the git tag
- checked the website pull request

Thanks,
Zhu Zhu

Ufuk Celebi wrote on Fri, Feb 14, 2020 at 12:32 AM:

> PS: Also verified the NOTICE changes since the last RC.
>
> On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:
>
> > Hey Chesnay,
> >
> > +1 (binding).
> >
> > - Verified checksum ✅
> > - Verified signature ✅
> > - Jira changelog looks good to me ✅
> > - Website PR looks good to me ✅
> > - Verified no unshaded dependencies (except the Hadoop modules which I
> > think is expected) ✅
> > - Verified dependency management fix FLINK-15540
> > (commons-collections:3.2.2 as expected) ✅
> > - Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
> > flink-shaded-force-shading and the Hadoop modules which I think is
> > expected) ✅
> >
> > – Ufuk
> >
> > On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Checked issues listed in release notes: ok
> > > Checked sums and signatures: ok
> > > Checked the maven central artifacts: ok
> > > Built from source: ok (8u101, 11.0.4)
> > > Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> > > Checked contents of zookeeper shaded jars: ok
> > > - no unshaded classes
> > > - shading pattern is correct
> > > Checked website pull request listing the new release: ok
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
> > wrote:
> > >
> > > > Hi everyone,
> > > > Please review and vote on the release candidate #3 for the version
> > 10.0,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2], which are signed with the key with fingerprint 11D464BA [3],
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > * source code tag "release-10.0-rc3" [5],
> > > > * website pull request listing the new release [6].
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Chesnay
> > > >
> > > > [1]
> > > >
> > > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> > > > [2]
> > https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1337
> > > > [5]
> > > >
> > > >
> >
> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
> > > > [6] https://github.com/apache/flink-web/pull/304
> > > >
> > > >
> > > >
> > > >
> >
>


Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Ufuk Celebi
PS: Also verified the NOTICE changes since the last RC.

On Thu, Feb 13, 2020 at 5:25 PM Ufuk Celebi  wrote:

> Hey Chesnay,
>
> +1 (binding).
>
> - Verified checksum ✅
> - Verified signature ✅
> - Jira changelog looks good to me ✅
> - Website PR looks good to me ✅
> - Verified no unshaded dependencies (except the Hadoop modules which I
> think is expected) ✅
> - Verified dependency management fix FLINK-15540
> (commons-collections:3.2.2 as expected) ✅
> - Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
> flink-shaded-force-shading and the Hadoop modules which I think is
> expected) ✅
>
> – Ufuk
>
> On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
> >
> > +1 (non-binding)
> >
> > Checked issues listed in release notes: ok
> > Checked sums and signatures: ok
> > Checked the maven central artifacts: ok
> > Built from source: ok (8u101, 11.0.4)
> > Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> > Checked contents of zookeeper shaded jars: ok
> > - no unshaded classes
> > - shading pattern is correct
> > Checked website pull request listing the new release: ok
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler 
> wrote:
> >
> > > Hi everyone,
> > > Please review and vote on the release candidate #3 for the version
> 10.0,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2], which are signed with the key with fingerprint 11D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-10.0-rc3" [5],
> > > * website pull request listing the new release [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1]
> > >
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1337
> > > [5]
> > >
> > >
> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
> > > [6] https://github.com/apache/flink-web/pull/304
> > >
> > >
> > >
> > >
>


Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Ufuk Celebi
Hey Chesnay,

+1 (binding).

- Verified checksum ✅
- Verified signature ✅
- Jira changelog looks good to me ✅
- Website PR looks good to me ✅
- Verified no unshaded dependencies (except the Hadoop modules which I
think is expected) ✅
- Verified dependency management fix FLINK-15540 (commons-collections:3.2.2
as expected) ✅
- Verified pom exclusion fix FLINK-15815 (no META-INF/maven except for
flink-shaded-force-shading and the Hadoop modules which I think is
expected) ✅

– Ufuk

On Thu, Feb 13, 2020 at 3:08 PM Yu Li  wrote:
>
> +1 (non-binding)
>
> Checked issues listed in release notes: ok
> Checked sums and signatures: ok
> Checked the maven central artifacts: ok
> Built from source: ok (8u101, 11.0.4)
> Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
> Checked contents of zookeeper shaded jars: ok
> - no unshaded classes
> - shading pattern is correct
> Checked website pull request listing the new release: ok
>
> Best Regards,
> Yu
>
>
> On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler  wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #3 for the version 10.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with fingerprint 11D464BA [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-10.0-rc3" [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Chesnay
> >
> > [1]
> >
> >
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
https://repository.apache.org/content/repositories/orgapacheflink-1337
> > [5]
> >
> >
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
> > [6] https://github.com/apache/flink-web/pull/304
> >
> >
> >
> >


Re: [DISCUSS][TABLE] Issue with package structure in the Table API

2020-02-13 Thread Timo Walther

Hi everyone,

thanks for bringing our offline discussion to the mailing list, Dawid. 
This is a very bad mistake that has been made in the past. In general, 
we should discourage putting the terms "java" and "scala" in package 
names as this has side effects on Scala imports.


I really don't like forcing users to put a "_root_" in their imports. It 
also happened to me a couple of times while developing Flink code that 
I was sitting in front of my IDE wondering why the code didn't compile.


I'm also in favor of performing this big change as early as possible. I'm 
sure Table API users are already quite annoyed by all the 
changes/refactorings happening. Changing the imports twice or three 
times is even more cumbersome.


Having to import just "org.apache.flink.table.api._" is a big usability 
plus for new users and especially interactive shell/notebook users.


Regards,
Timo


On 13.02.20 14:39, Dawid Wysakowicz wrote:

Hi devs,

I wanted to bring up a problem that we have in our package structure.

As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
started advertising importing two packages in the scala API:
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

The intention was that the first package (org.apache.flink.table.api)
would contain all API classes required to work with the unified
TableEnvironment, such as TableEnvironment, Table, Session, Slide and
expressionDsl. The second package (org.apache.flink.table.api.scala._)
would've been an optional package that contains bridging conversions
between the Table and DataStream/DataSet APIs, including the more specific
StreamTableEnvironment and BatchTableEnvironment.

The part missing from the original plan was to move all expression
implicit conversions to the org.apache.flink.table.api package. Without
that step, users of pure Table programs (that do not use the
table-api-scala-bridge module) cannot use the Expression DSL. Therefore
we should try to move those expressions as soon as possible.

The problem with this approach is that it clashes with common imports of
classes from java.* and scala.* packages. Users are forced to write:

import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import _root_.scala.collection.mutable.ArrayBuffer
import _root_.java.lang.Integer

Besides being cumbersome, it also messes up the macro-based type
extraction (org.apache.flink.api.scala#createTypeInformation) for all
classes from scala.* packages. I don't fully understand the reasons for
it, but the createTypeInformation somehow drops the _root_ for
WeakTypeTags. So e.g. for a call:
createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
actually tries to construct a TypeInformation for
org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
obviously fails.
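
To make the resolution problem concrete, here is a minimal Scala sketch
(assuming only the two advertised wildcard imports):

import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

// After the wildcard imports above, the bare name "scala" resolves to
// org.apache.flink.table.api.scala and shadows the root scala package,
// so this import no longer compiles:
//   import scala.collection.mutable.ArrayBuffer
// Users have to disambiguate with the _root_ prefix instead:
import _root_.scala.collection.mutable.ArrayBuffer

object ImportClashExample {
  val buffer: ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
}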



What I would suggest for a target solution is to have:

1. for users of unified Table API with Scala ExpressionDSL

import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
and expressions)

2. for users of Table API with scala's bridging conversions

import org.apache.flink.table.api._ (for Tumble etc. and expressions)
import org.apache.flink.table.api.bridge.scala._ (for bridging
conversions and StreamTableEnvironment)

3. for users of unified Table API with Java ExpressionDSL

import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)

4. for users of Table API with java's bridging conversions

import org.apache.flink.table.api.* (for Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)
import org.apache.flink.table.api.bridge.java.*

To have that working we need to:
* move the Scala expression DSL to the org.apache.flink.table.api package in
the table-api-scala module
* move all classes from org.apache.flink.table.api.scala and
org.apache.flink.table.api.java packages to
org.apache.flink.table.api.bridge.scala and
org.apache.flink.table.api.bridge.java accordingly and drop the former
packages

The biggest question I have is how we want to perform that
transition. If we do it in one go, we will break existing user programs
that use classes from org.apache.flink.table.api.java/scala. Those
packages have been present since day one of the Table API. Nevertheless,
this would be my preferred way to move forward, as we annoy users only
once, even if one big time :(

A different option would be to make that transition gradually over 3 releases.
  1. In the first, we introduce
org.apache.flink.table.api.bridge.java/scala, which contain
StreamTableEnvironment etc. as well as the expression DSL. We ask
users to migrate to the new packages.
  2. We drop org.apache.flink.table.api.java/scala and ask users to
additionally import org.apache.flink.table.api.* for expressions (this
is similar to what we did in 1.9.0, though it was extremely hard
to do then).
  3. We finally move the expression DSL from

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-13 Thread Gyula Fóra
Thanks for the feedback Aljoscha!

I have a POC ready with the Flink changes + the Atlas hook implementation.
I will try to push this to a public repo tomorrow and we can discuss
further based on that!
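
In the meantime, a minimal sketch of what the listener-based hook could look
like (assuming the proposed getPipeline() method on JobClient, which is not
part of the current API; registerInAtlas is a placeholder):

import org.apache.flink.api.common.JobExecutionResult
import org.apache.flink.core.execution.{JobClient, JobListener}
import org.apache.flink.streaming.api.graph.StreamGraph

class AtlasJobListener extends JobListener {

  override def onJobSubmitted(jobClient: JobClient, throwable: Throwable): Unit = {
    if (throwable == null) {
      jobClient.getPipeline match { // proposed method, not yet in the API
        case graph: StreamGraph =>
          registerInAtlas(graph) // placeholder for the actual Atlas client calls
        case _ => // batch Plans / other Pipelines are ignored in this sketch
      }
    }
  }

  override def onJobExecuted(result: JobExecutionResult, throwable: Throwable): Unit = ()

  private def registerInAtlas(graph: StreamGraph): Unit = {
    // walk the StreamGraph and publish source/sink metadata to Atlas
  }
}

Such a listener would be registered via
StreamExecutionEnvironment#registerJobListener.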

Gyula

On Thu, Feb 13, 2020, 15:26 Aljoscha Krettek  wrote:

> I think exposing the Pipeline should be ok. Using the internal
> StreamGraph might be problematic because this might change/break but
> that's a problem of the external code.
>
> Aljoscha
>
> On 11.02.20 16:26, Gyula Fóra wrote:
> > Hi All!
> >
> > I have made a prototype that simply adds a getPipeline() method to the
> > JobClient interface. Then I could easily implement the Atlas hook using
> the
> > JobListener interface. I simply check if Pipeline is instanceof
> StreamGraph
> > and do the logic there.
> >
> > I think this is so far the cleanest approach and I much prefer this
> > compared to working on the JobGraph directly which would expose even more
> > messy internals.
> >
> > Unfortunately this change alone is not enough for the integration, as we
> > need to make sure that all Sources/Sinks that we want to integrate with
> > Atlas publicly expose some of their properties:
> >
> > - Kafka source/sink:
> >- Kafka props
> >- Topic(s) - this is tricky for sinks
> > - FS source /sink:
> >- Hadoop props
> >- Base path for StreamingFileSink
> >- Path for ContinuousMonitoringSource
> >
> > Most of these are straightforward changes; the only question is what we
> > want to register in Atlas from the available connectors. Ideally users
> > could also somehow register their own Atlas metadata for custom sources
> and
> > sinks, we could probably introduce an interface for that in Atlas.
> >
> > Cheers,
> > Gyula
> >
> > On Fri, Feb 7, 2020 at 10:37 AM Gyula Fóra  wrote:
> >
> >> Maybe we could improve the Pipeline interface in the long run, but as a
> >> temporary solution the JobClient could expose a getPipeline() method.
> >>
> >> That way the implementation of the JobListener could check if its a
> >> StreamGraph or a Plan.
> >>
> >> How bad does that sound?
> >>
> >> Gyula
> >>
> >> On Fri, Feb 7, 2020 at 10:19 AM Gyula Fóra 
> wrote:
> >>
> >>> Hi Aljoscha!
> >>>
> >>> That's a valid concern, but we should try to figure something out; many
> >>> users need this before they can use Flink.
> >>>
> >>> I think the closest thing we have right now is the StreamGraph. In
> >>> contrast with the JobGraph, the StreamGraph is pretty nice from a
> metadata
> >>> perspective :D
> >>> The big downside of exposing the StreamGraph is that we don't have it
> in
> >>> batch. On the other hand we could expose the JobGraph but then the
> >>> integration component would still have to do the heavy lifting for
> batch
> >>> and stream specific operators and UDFs.
> >>>
> >>> Instead of exposing either StreamGraph/JobGraph, we could come up with
> a
> >>> metadata like representation for the users but that would be like
> >>> implementing Atlas integration itself without Atlas dependencies :D
> >>>
> >>> As a comparison point, this is how it works in Storm:
> >>> Every operator (spout/bolt), stores a config map (string->string) with
> >>> all the metadata such as operator class, and the operator specific
> configs.
> >>> The Atlas hook works on this map.
> >>> This is very fragile and depends on a lot of internals. Kind of like
> >>> exposing the JobGraph but much worse. I think we can do better.
> >>>
> >>> Gyula
> >>>
> >>> On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek 
> >>> wrote:
> >>>
>  If we need it, we can probably beef up the JobListener to allow
>  accessing some information about the whole graph or sources and sinks.
>  My only concern right now is that we don't have a stable interface for
>  our job graphs/pipelines right now.
> 
>  Best,
>  Aljoscha
> 
>  On 06.02.20 23:00, Gyula Fóra wrote:
> > Hi Jeff & Till!
> >
> > Thanks for the feedback, this is exactly the discussion I was looking
>  for.
> > The JobListener looks very promising if we can expose the JobGraph
>  somehow
> > (correct me if I am wrong but it is not accessible at the moment).
> >
> > I did not know about this feature that's why I added my JobSubmission
>  hook
> > which was pretty similar but only exposing the JobGraph. In general I
>  like
> > the listener better and I would not like to add anything extra if we
>  can
> > avoid it.
> >
> > Actually the bigger part of the integration work that will need more
> > changes in Flink will be regarding the accessibility of sources/sinks
>  from
> > the JobGraph and their specific properties. For instance at the
> moment
>  the
> > Kafka sources and sinks do not expose anything publicly such as
> topics,
> > kafka configs, etc. Same goes for other data connectors that we need
> to
> > integrate in the long run. I guess there will be a separate thread on
> 

[jira] [Created] (FLINK-16049) Remove outdated "Best Practices" section from Application Development Section

2020-02-13 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-16049:


 Summary: Remove outdated "Best Practices" section from 
Application Development Section
 Key: FLINK-16049
 URL: https://issues.apache.org/jira/browse/FLINK-16049
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Improve history server with log support

2020-02-13 Thread Aljoscha Krettek

Hi,

what's the difference in approach to the mentioned related Jira Issue 
([1])? I commented there because I'm skeptical about adding 
Hadoop-specific code to the generic cluster components.


Best,
Aljoscha

[1] https://issues.apache.org/jira/browse/FLINK-14317

On 13.02.20 03:47, SHI Xiaogang wrote:

Hi Rong Rong,

Thanks for the proposal. We are also suffering from some pains brought by
the history server. To address them, we propose a trace system, which is very
similar to the metric system, for historical information.

A trace is semi-structured information about events in Flink. Useful traces
include:
* job traces: which contain the job graph of submitted jobs.
* schedule traces: A schedule trace is typically composed of task slot
information. They are generated when a job finishes, fails, or is canceled.
As a job may restart multiple times, a job typically has multiple schedule
traces.
* checkpoint traces: which are generated when a checkpoint completes or
fails.
* task manager traces: which are generated when a task manager terminates.
Users can access the link to aggregated logs in task manager traces.

Users can use TraceReport to collect traces in Flink and export them to
external storage (e.g., ElasticSearch). By retrieving traces when
exceptions happen, we can improve the user experience in alerting.
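
To make this concrete, a minimal sketch of what such a trace abstraction
could look like (Trace and TraceReporter are illustrative names, not
existing Flink interfaces):

import java.util.Map;
import java.util.Properties;

/** A semi-structured record describing an event in Flink (illustrative only). */
public interface Trace {
    String type();                    // "job", "schedule", "checkpoint", "taskmanager"
    long timestamp();                 // when the event happened
    Map<String, String> attributes(); // e.g. job graph JSON, slot info, log links
}

/** A pluggable exporter, analogous to a metric reporter (illustrative only). */
public interface TraceReporter {
    void open(Properties config);
    void report(Trace trace);         // e.g. push to Elasticsearch
    void close();
}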

Regards,
Xiaogang

Rong Rong  于2020年2月13日周四 上午9:41写道:


Hi All,

Recently we have been experimenting with using Flink’s history server as a
centralized debugging service for completed streaming jobs.

Specifically, we dynamically generate links to access log files on the YARN
host; in the meantime, we use the Flink history server to show job graphs,
exceptions and other info of the completed jobs[2].

This causes some pain for our users, namely: it is inconvenient to go to the
YARN host to access logs and then to the Flink history server for the other
information.

Thus we would like to propose an improvement to the current Flink history
server:

- To support dynamic links to residual log files from the host machine
  within the retention period [3];
- To support dynamic links to aggregated log files provided by the
  cluster, if supported: such as Hadoop HistoryServer[1], or Kubernetes
  cluster-level logging[4]?
  - Similar integration with Hadoop HistoryServer was already proposed
    before[5] with a slightly different approach.


Any feedback and suggestions are highly appreciated!

--

Rong

[1]

https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html

[2]

https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/historyserver.html

[3]

https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.log.retain-seconds

[4]

https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures
[5] https://issues.apache.org/jira/browse/FLINK-14317





[jira] [Created] (FLINK-16048) Support reading Avro data serialized by KafkaAvroSerializer from Kafka in Table

2020-02-13 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-16048:
--

 Summary: Support reading Avro data serialized by 
KafkaAvroSerializer from Kafka in Table
 Key: FLINK-16048
 URL: https://issues.apache.org/jira/browse/FLINK-16048
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.11.0
Reporter: Leonard Xu
 Fix For: 1.11.0


I found the SQL Kafka connector cannot consume Avro data that was serialized by 
`KafkaAvroSerializer`; it can only consume Row data with an Avro schema, because 
we use `AvroRowDeserializationSchema`/`AvroRowSerializationSchema` to serialize 
and deserialize data in `AvroRowFormatFactory`. 
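
For context: records written by `KafkaAvroSerializer` use the Confluent wire 
format, i.e. a magic byte plus a 4-byte schema-registry id in front of the Avro 
payload, which is why the plain Avro deserializer fails on them. A minimal 
sketch of unwrapping that header (an illustrative helper, not an existing Flink 
API):

{code}
import java.nio.ByteBuffer;

/** Illustrative only: unwrap the Confluent wire format (magic byte + schema id). */
public final class ConfluentWireFormat {
    public static byte[] avroPayload(byte[] record) {
        ByteBuffer buffer = ByteBuffer.wrap(record);
        if (buffer.get() != 0) {        // first byte is always 0x0
            throw new IllegalArgumentException("Not in Confluent wire format");
        }
        int schemaId = buffer.getInt(); // 4-byte schema registry id; could be
                                        // used to look up the writer schema
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);            // raw Avro body follows the header
        return payload;
    }
}
{code}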

I think we should support this because `KafkaAvroSerializer` is very common in 
Kafka setups.

Someone also hit the same problem on Stack Overflow [1].


[1] https://stackoverflow.com/questions/56452571/caused-by-org-apache-avro-avroruntimeexception-malformed-data-length-is-negat/56478259



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-13 Thread Dawid Wysakowicz
Sorry for late reply,

@all I think there is a general consensus that we want to drop ES 2.x
support. I created https://issues.apache.org/jira/browse/FLINK-16046 to
track it.


@Stephan @Chesnay @Itamar In our connectors we use the Java High Level REST
Client. ES promises to maintain its compatibility with any newer minor
version of ES, so if we have a 6.1 client we can use it with 6.2, 6.3, etc.

ES also provides a low-level REST client which does not include any
direct ES dependencies and can work with any version of ES. It does not
provide any marshalling/unmarshalling or higher-level features, as
Chesnay said.

Correct me if I am wrong @Itamar, but your HTTP client is a simplified
version of ES's High Level REST Client with a subset of its
features. I think it will still have the same problems as ES's High
Level REST Client, because ES does not guarantee that newer message
formats will be compatible with older versions of ES, or that message
formats are compatible across major versions at all.


@Stephan @Danny As for the 5.x connector: any ideas how we can get
users' feedback about it? I cross-posted on the user mailing list with
no luck so far. Personally I would be in favor of dropping the
connector. In the worst case, users still have the possibility of
building the connector themselves from source by just bumping the
Flink version. As far as I can tell there have been no changes to the code
base for quite some time.

Best,

Dawid

On 11/02/2020 10:46, Chesnay Schepler wrote:
> I suppose the downside in an HTTP ES sink is that you don't get _any_
> form of high-level API from ES, and we'd have to manually build an
> HTTP request that matches the ES format. Of course you also lose any
> client-side verification that the clients did, if there is any (but I
> guess the API itself prevented certain errors).
>
> On 11/02/2020 09:32, Stephan Ewen wrote:
>> +1 to drop ES 2.x - unsure about 5.x (makes sense to get more user input
>> for that one).
>>
>> @Itamar - if you would be interested in contributing a "universal" or
>> "cross version" ES connector, that could be very interesting. Do you
>> know
>> if there are known performance issues or feature restrictions with that
>> approach?
>> @dawid what do you think about that?
>>
>>
>> On Tue, Feb 11, 2020 at 6:28 AM Danny Chan  wrote:
>>
>>> 5.x seems to have a lot of users, is the 6.x completely compatible with
>>> 5.x ~
>>>
>>> Best,
>>> Danny Chan
>>> 在 2020年2月10日 +0800 PM9:45,Dawid Wysakowicz
>>> ,写道:
 Hi all,

 As described in this https://issues.apache.org/jira/browse/FLINK-11720
 ticket our elasticsearch 5.x connector does not work out of the box on
 some systems and requires a version bump. This also happens for our
 e2e.
 We cannot bump the version in es 5.x connector, because 5.x connector
 shares a common class with 2.x that uses an API that was replaced
 in 5.2.

 Both versions are already long eol: https://www.elastic.co/support/eol

 I suggest to drop both connectors 5.x and 2.x. If it is too much to
 drop
 both of them, I would strongly suggest dropping at least 2.x connector
 and update the 5.x line to a working es client module.

 What do you think? Should we drop both versions? Drop only the 2.x
 connector? Or keep them both?

 Best,

 Dawid


>





Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-13 Thread Aljoscha Krettek
I think exposing the Pipeline should be OK. Using the internal 
StreamGraph might be problematic because it might change/break, but 
that's a problem for the external code.


Aljoscha

On 11.02.20 16:26, Gyula Fóra wrote:

Hi All!

I have made a prototype that simply adds a getPipeline() method to the
JobClient interface. Then I could easily implement the Atlas hook using the
JobListener interface. I simply check if Pipeline is instanceof StreamGraph
and do the logic there.
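
To make this concrete, a minimal sketch of such a hook (class and method
names here are illustrative, and getPipeline() on JobClient is the accessor
from the prototype, not part of the released API):

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class AtlasJobListener implements JobListener {

    @Override
    public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
        if (jobClient == null) {
            return; // submission failed, nothing to register
        }
        // getPipeline() is the prototype's proposed accessor:
        Pipeline pipeline = jobClient.getPipeline();
        if (pipeline instanceof StreamGraph) {
            registerInAtlas((StreamGraph) pipeline);
        }
    }

    @Override
    public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
        // nothing to do here for metadata registration
    }

    private void registerInAtlas(StreamGraph graph) {
        // walk the graph's sources/sinks and publish their metadata to Atlas
    }
}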

I think this is so far the cleanest approach and I much prefer this
compared to working on the JobGraph directly which would expose even more
messy internals.

Unfortunately this change alone is not enough for the integration, as we
need to make sure that all sources/sinks that we want to integrate with Atlas
publicly expose some of their properties:

- Kafka source/sink:
   - Kafka props
   - Topic(s) - this is tricky for sinks
- FS source /sink:
   - Hadoop props
   - Base path for StreamingFileSink
   - Path for ContinuousMonitoringSource

Most of these are straightforward changes, the only question is what we
want to register in Atlas from the available connectors. Ideally users
could also somehow register their own Atlas metadata for custom sources and
sinks, we could probably introduce an interface for that in Atlas.

Cheers,
Gyula

On Fri, Feb 7, 2020 at 10:37 AM Gyula Fóra  wrote:


Maybe we could improve the Pipeline interface in the long run, but as a
temporary solution the JobClient could expose a getPipeline() method.

That way the implementation of the JobListener could check if it's a
StreamGraph or a Plan.

How bad does that sound?

Gyula

On Fri, Feb 7, 2020 at 10:19 AM Gyula Fóra  wrote:


Hi Aljoscha!

That's a valid concern, but we should try to figure something out; many
users need this before they can use Flink.

I think the closest thing we have right now is the StreamGraph. In
contrast with the JobGraph  the StreamGraph is pretty nice from a metadata
perspective :D
The big downside of exposing the StreamGraph is that we don't have it in
batch. On the other hand we could expose the JobGraph but then the
integration component would still have to do the heavy lifting for batch
and stream specific operators and UDFs.

Instead of exposing either StreamGraph/JobGraph, we could come up with a
metadata-like representation for the users, but that would be like
implementing Atlas integration itself without Atlas dependencies :D

As a comparison point, this is how it works in Storm:
Every operator (spout/bolt), stores a config map (string->string) with
all the metadata such as operator class, and the operator specific configs.
The Atlas hook works on this map.
This is very fragile and depends on a lot of internals. Kind of like
exposing the JobGraph but much worse. I think we can do better.

Gyula

On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek 
wrote:


If we need it, we can probably beef up the JobListener to allow
accessing some information about the whole graph or sources and sinks.
My only concern right now is that we don't have a stable interface for
our job graphs/pipelines right now.

Best,
Aljoscha

On 06.02.20 23:00, Gyula Fóra wrote:

Hi Jeff & Till!

Thanks for the feedback, this is exactly the discussion I was looking for.
The JobListener looks very promising if we can expose the JobGraph somehow
(correct me if I am wrong but it is not accessible at the moment).

I did not know about this feature, that's why I added my JobSubmission hook
which was pretty similar but only exposing the JobGraph. In general I like
the listener better and I would not like to add anything extra if we can
avoid it.

Actually the bigger part of the integration work that will need more
changes in Flink will be regarding the accessibility of sources/sinks from
the JobGraph and their specific properties. For instance at the moment the
Kafka sources and sinks do not expose anything publicly such as topics,
kafka configs, etc. Same goes for other data connectors that we need to
integrate in the long run. I guess there will be a separate thread on this
once we iron out the initial integration points :)

I will try to play around with the JobListener interface tomorrow and see
if I can extend it to meet our needs.

Cheers,
Gyula

On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:


Hi Gyula,

Flink 1.10 introduced JobListener, which is invoked after job submission
and completion. Maybe we can add an API on JobClient to get the info you
need for Atlas integration.




https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46



Gyula Fóra  于2020年2月5日周三 下午7:48写道:


Hi all!

We have started some preliminary work on the Flink - Atlas integration at
Cloudera. It seems that the integration will require some new hook
interfaces at the jobgraph generation and submission phases, so I figured I
will open a discussion thread with my initial ideas 

[jira] [Created] (FLINK-16047) Blink planner produces wrong aggregate results with state clean up

2020-02-13 Thread Timo Walther (Jira)
Timo Walther created FLINK-16047:


 Summary: Blink planner produces wrong aggregate results with state 
clean up
 Key: FLINK-16047
 URL: https://issues.apache.org/jira/browse/FLINK-16047
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Timo Walther


It seems that FLINK-10674 has not been ported to the Blink planner.

Because state clean up happens in processing time, it might be the case that 
retractions are arriving after the state has been cleaned up. Before these 
changes, a new accumulator was created and invalid retraction messages were 
emitted. This change drops retraction messages for which no accumulator exists.

These lines are missing:
{code}
if (null == accumulators) {
  // Don't create a new accumulator for a retraction message. This
  // might happen if the retraction message is the first message for the
  // key or after a state clean up.
  if (!inputC.change) {
return
  }
  // first accumulate message
  firstRow = true
  accumulators = function.createAccumulators()
} else {
  firstRow = false
}
{code}

The bug has not been verified. I spotted it only by looking at the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release flink-shaded 10.0, release candidate #3

2020-02-13 Thread Yu Li
+1 (non-binding)

Checked issues listed in release notes: ok
Checked sums and signatures: ok
Checked the maven central artifacts: ok
Built from source: ok (8u101, 11.0.4)
Built from source (with -Dshade-sources): ok (8u101, 11.0.4)
Checked contents of zookeeper shaded jars: ok
- no unshaded classes
- shading pattern is correct
Checked website pull request listing the new release: ok

Best Regards,
Yu


On Wed, 12 Feb 2020 at 22:09, Chesnay Schepler  wrote:

> Hi everyone,
> Please review and vote on the release candidate #3 for the version 10.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint 11D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-10.0-rc3" [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc3/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1337
> [5]
>
> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc3
> [6] https://github.com/apache/flink-web/pull/304
>
>
>
>


[jira] [Created] (FLINK-16046) Drop Elasticsearch 2.x connector

2020-02-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16046:


 Summary: Drop Elasticsearch 2.x connector
 Key: FLINK-16046
 URL: https://issues.apache.org/jira/browse/FLINK-16046
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ElasticSearch
Reporter: Dawid Wysakowicz
 Fix For: 1.11.0


We should drop the ES 2.x connector as discussed here: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-connectors-for-Elasticsearch-2-x-and-5-x-td37471.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS][TABLE] Issue with package structure in the Table API

2020-02-13 Thread Dawid Wysakowicz
Hi devs,

I wanted to bring up a problem that we have in our package structure.

As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
started advertising importing two packages in the scala API:
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

The intention was that the first package (org.apache.flink.table.api)
would contain all API classes that are required to work with the unified
TableEnvironment, such as TableEnvironment, Table, Session, Slide and
expressionDsl. The second package (org.apache.flink.table.api.scala._)
would've been an optional package that contains bridging conversions
between Table and DataStream/DataSet APIs including the more specific
StreamTableEnvironment and BatchTableEnvironment.

The part missing in the original plan was to move all expression
implicit conversions to the org.apache.flink.table.api package. Without
that step, users of a pure table program (that do not use the
table-api-scala-bridge module) cannot use the Expression DSL. Therefore
we should try to move those expressions as soon as possible.

The problem with this approach is that it clashes with common imports of
classes from java.* and scala.* packages. Users are forced to write:

import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import _root_.scala.collection.mutable.ArrayBuffer
import _root_.java.lang.Integer

Besides being cumbersome, it also messes up the macro based type
extraction (org.apache.flink.api.scala#createTypeInformation) for all
classes from scala.* packages. I don't fully understand the reasons for
it, but the createTypeInformation somehow drops the _root_ for
WeakTypeTags. So e.g. for a call:
createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
actually tries to construct a TypeInformation for
org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
obviously fails.



What I would suggest for a target solution is to have:

1. for users of unified Table API with Scala ExpressionDSL

import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
and expressions)

2. for users of Table API with scala's bridging conversions

import org.apache.flink.table.api._ (for Tumble etc. and expressions)
import org.apache.flink.table.api.bridge.scala._ (for bridging
conversions and StreamTableEnvironment)

3. for users of unified Table API with Java ExpressionDSL

import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)

4. for users of Table API with java's bridging conversions

import org.apache.flink.table.api.* (for Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)
import org.apache.flink.table.api.bridge.java.*

To have that working we need to:
* move the scala expression DSL to org.apache.flink.table.api package in
table-api-scala module
* move all classes from org.apache.flink.table.api.scala and
org.apache.flink.table.api.java packages to
org.apache.flink.table.api.bridge.scala and
org.apache.flink.table.api.bridge.java accordingly and drop the former
packages

The biggest question I have is how do we want to perform that
transition. If we do it in one go we will break existing user programs
that use classes from org.apache.flink.table.api.java/scala. Those
packages have been present from day one of the Table API. Nevertheless this
would be my preferred way to move forward as we annoy users only once, even if
one big time :(

A different option would be to make that transition gradually over 3 releases.
 1. In the first release we introduce the
org.apache.flink.table.api.bridge.java/scala, and we have
StreamTableEnvironment etc. as well as expression DSL in both. We ask
users to migrate to the new package.
 2. We drop the org.apache.flink.table.api.java/scala and ask users to
additionally import org.apache.flink.table.api.* for expressions (this
is the same as we did in 1.9.0, though it was extremely hard to do then)
 3. We finally move the expression DSL from
org.apache.flink.table.api.bridge.scala to org.apache.flink.table.api
 
What do you think about it?

Best,

Dawid






[jira] [Created] (FLINK-16045) Extract connectors documentation to a top-level section

2020-02-13 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-16045:


 Summary: Extract connectors documentation to a top-level section
 Key: FLINK-16045
 URL: https://issues.apache.org/jira/browse/FLINK-16045
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16044) Extract libraries documentation to a top-level section

2020-02-13 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-16044:


 Summary: Extract libraries documentation to a top-level section
 Key: FLINK-16044
 URL: https://issues.apache.org/jira/browse/FLINK-16044
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16043) Support non-BMP Unicode for JsonRowSerializationSchema

2020-02-13 Thread Benchao Li (Jira)
Benchao Li created FLINK-16043:
--

 Summary: Support non-BMP Unicode for JsonRowSerializationSchema
 Key: FLINK-16043
 URL: https://issues.apache.org/jira/browse/FLINK-16043
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.10.0
Reporter: Benchao Li


This is a known issue for jackson: 
[https://github.com/FasterXML/jackson-core/issues/223]

You can see more details: 
[https://github.com/FasterXML/jackson-core/blob/master/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java#L2105]

 

I also encountered this issue in my production environment and figured out a 
workaround. Java's String.getBytes() handles UTF-8 encoding well, so we can do 
it like this:

{{mapper.writeValueAsString(node).getBytes()}} instead of 

{{mapper.writeValueAsBytes(node)}}
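
A minimal sketch of the workaround (assuming an existing ObjectMapper 
{{mapper}} and JsonNode {{node}}; passing the charset explicitly avoids 
relying on the platform default encoding):

{code}
import java.nio.charset.StandardCharsets;

// Serialize via a String so that jackson's UTF8JsonGenerator is bypassed for
// non-BMP characters, then encode explicitly as UTF-8:
byte[] json = mapper.writeValueAsString(node).getBytes(StandardCharsets.UTF_8);
{code}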

cc [~jark] [~twalthr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-13 Thread godfrey he
Hi Kurt, Jark, Jingsong,

Regarding "fromQuery", I agree with Kurt. In addition, I think `Table
from(String tableName)` should be renamed to `Table fromCatalog(String
tableName)`.

Regarding "DmlBatch": DML contains "INSERT", "UPDATE", and "DELETE", and
they may all be executable in the same batch in the future, so we can add
"addUpdate" and "addDelete" methods to support them.

Regarding "Inserts addInsert", maybe we can add a "DmlBatchBuilder"; a
hypothetical usage sketch follows below.

Open to more discussion.

Best,
godfrey



Kurt Young  于2020年2月13日周四 下午4:56写道:

> Regarding to "fromQuery" is confusing users with "Table from(String
> tableName)", I have
> a just opposite opinion. I think this "fromXXX" pattern can make users
> quite clear when they
> want to get a Table from TableEnvironment. Similar interfaces will also
> include like "fromElements".
>
> Regarding to the name of DmlBatch, I think it's mainly for
> future flexibility, in case we can support
> other statement in a single batch. If that happens, the name "Inserts" will
> be weird.
>
> Best,
> Kurt
>
>
> On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:
>
> > I agree with Jingsong.
> >
> > +1 to keep `sqlQuery`; it's clear from the method name and return type
> > that it accepts a SELECT query and returns a logical representation,
> > `Table`. The `fromQuery` name would be a little confusing to users given
> > the `Table from(String tableName)` method.
> >
> > Regarding `DmlBatch`, I agree with Jingsong: AFAIK, the purpose of
> > `DmlBatch` is to batch insert statements.
> > Besides, DML terminology is not commonly known among users. So what about
> > `InsertsBatching startBatchingInserts()`?
> >
> > Best,
> > Jark
> >
> > On Thu, 13 Feb 2020 at 15:50, Jingsong Li 
> wrote:
> >
> > > Hi Godfrey,
> > >
> > > Thanks for updating. +1 overall.
> > >
> > > I see no need to change "sqlQuery" to "fromQuery"; I think "sqlQuery"
> > > is OK, it's not that confusing given the return value.
> > >
> > > Can we change the "DmlBatch" to "Inserts"?  I don't see any other
> needs.
> > > "Dml" seems a little weird.
> > > It is better to support "Inserts addInsert" too. Users can
> > > "inserts.addInsert().addInsert()"
> > >
> > > I try to match the new interfaces with the old interfaces simply.
> > > - "startInserts -> addInsert" replace old "sqlUpdate(insert)" and
> > > "insertInto".
> > > - "executeStatement" new one, execute all kinds of sqls immediately.
> > > Including old "sqlUpdate(DDLs)".
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Wed, Feb 12, 2020 at 11:10 AM godfreyhe 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to resume the discussion for FLIP-84 [0]. I have updated the
> > > > document; the main changes are:
> > > >
> > > > 1. about "`void sqlUpdate(String sql)`" section
> > > >   a) change "Optional executeSql(String sql) throws
> > > Exception"
> > > > to "ResultTable executeStatement(String statement, String jobName)
> > throws
> > > > Exception". The reason is: "statement" is a more general concept than
> > > > "sql",
> > > > e.g. "show xx" is not a sql command (refer to [1]), but is a
> statement
> > > > (just
> > > > like JDBC). "insert" statement also has return value which is the
> > > affected
> > > > row count, we can unify the return type to "ResultTable" instead of
> > > > "Optional".
> > > >   b) add two sub-interfaces for "ResultTable": "RowResultTable" is
> used
> > > for
> > > > non-streaming select statement and will not contain change flag;
> > > > "RowWithChangeFlagResultTable" is used for streaming select statement
> > and
> > > > will contain change flag.
> > > >
> > > > 2) about "Support batch sql execute and explain" section
> > > > introduce "DmlBatch" to support both sql and Table API (which is
> > borrowed
> > > > from the ideas Dawid mentioned in the slack)
> > > >
> > > > interface TableEnvironment {
> > > > DmlBatch startDmlBatch();
> > > > }
> > > >
> > > > interface DmlBatch {
> > > >   /**
> > > >   * add insert statement to the batch
> > > >   */
> > > > void addInsert(String insert);
> > > >
> > > >  /**
> > > >   * add Table with given sink name to the batch
> > > >   */
> > > > void addInsert(String sinkName, Table table);
> > > >
> > > >  /**
> > > >   * execute the dml statements as a batch
> > > >   */
> > > >   ResultTable execute(String jobName) throws Exception
> > > >
> > > >   /**
> > > >  * Returns the AST and the execution plan to compute the result of
> the
> > > > batch
> > > > dml statement.
> > > >   */
> > > >   String explain(boolean extended);
> > > > }
> > > >
> > > > 3) about "Discuss a parse method for multiple statements execute in
> SQL
> > > > CLI"
> > > > section
> > > > add the pros and cons for each solution
> > > >
> > > > 4) update the "Examples" section and "Summary" section based on the
> > above
> > > > changes
> > > >
> > > > Please refer to the design doc[1] for more details; any feedback is
> > > > welcome.
> > > >
> > > > Bests,
> > > > godfreyhe
> > > >
> > > >
> > > > [0]
> 

[jira] [Created] (FLINK-16042) Add state benchmark for append operation in AppendingState

2020-02-13 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-16042:
--

 Summary: Add state benchmark for append operation in AppendingState
 Key: FLINK-16042
 URL: https://issues.apache.org/jira/browse/FLINK-16042
 Project: Flink
  Issue Type: Improvement
  Components: Benchmarks
Affects Versions: 1.10.0
Reporter: Jiayi Liao


As discussed [here|https://github.com/dataArtisans/flink-benchmarks/issues/47], 
currently we don't have a benchmark for the append operation in {{AppendingState}}, 
which is frequently used in stateful operators like {{WindowOperator}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16041) Expand "popular" documentation sections by default

2020-02-13 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-16041:


 Summary: Expand "popular" documentation sections by default
 Key: FLINK-16041
 URL: https://issues.apache.org/jira/browse/FLINK-16041
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Attachments: current-state.png, proposed-state.png

Currently, when the documentation page is loaded all sections are collapsed, 
which means that some prominent subsections are not easily discoverable. I think 
we should expand the "Getting Started", "Concepts", and "API" sections by 
default. (See also 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation,
 because "API" doesn't exist yet in the current documentation.)

I attached screenshots to show what I mean.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16040) Change local import to global import

2020-02-13 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-16040:


 Summary: Change local import to global import
 Key: FLINK-16040
 URL: https://issues.apache.org/jira/browse/FLINK-16040
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.11.0


There are two reasons for doing this:
 # Executing an import costs time.
 # PEP 8 states that "Imports are always put at the top of the file" 
[https://www.python.org/dev/peps/pep-0008/#imports]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16039) Add API method to get last element in session window

2020-02-13 Thread Manas Kale (Jira)
Manas Kale created FLINK-16039:
--

 Summary: Add API method to get last element in session window
 Key: FLINK-16039
 URL: https://issues.apache.org/jira/browse/FLINK-16039
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.10.0
Reporter: Manas Kale


Consider the events:

[1, event], [2, event]

where the first element is the event timestamp in seconds and the second 
element is the event code/name.

Also consider that an event-time session window with inactivityGap = 2 seconds 
is acting on the above stream.

When the first event arrives, a session window should be created that is [1,1].

When the second event arrives, a new session window should be created that is 
[2,2]. Since this falls within firstWindowTimestamp+inactivityGap, it should be 
merged into the session window [1,2], and [2,2] should be deleted.

This is my understanding of how session windows are created. *Please correct me 
if wrong.*

However, Flink does not follow such a definition of windows semantically. If I 
call the getEnd() method of the TimeWindow class, I get back _timestamp + 
inactivityGap_.

For the above example, after processing the first element, I would get 1 + 2 = 
3 seconds as the window "end".

The actual window end should be the timestamp 1, which is the last event in the 
session window. 
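
For illustration, a minimal sketch of the scenario (the {{Event}} POJO with a 
String {{id}} and a long timestamp is hypothetical):

{code}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SessionWindowEndExample {

    /** Hypothetical event type used only for this sketch. */
    public static class Event {
        public String id;
        public long ts;
    }

    public static DataStream<String> sessionEnds(DataStream<Event> stream) {
        return stream
            .keyBy(event -> event.id)
            .window(EventTimeSessionWindows.withGap(Time.seconds(2)))
            .process(new ProcessWindowFunction<Event, String, String, TimeWindow>() {
                @Override
                public void process(String key, Context ctx, Iterable<Event> events,
                        Collector<String> out) {
                    // getEnd() currently reports lastElementTimestamp + gap,
                    // not the timestamp of the last element itself.
                    out.collect(key + " window end: " + ctx.window().getEnd());
                }
            });
    }
}
{code}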

A solution would be to change the "end" definition of all windows, but I 
suppose this would be breaking and would need some debate.

Therefore, I propose an intermediate solution : add a new API method that keeps 
track of the last element added in the session window. 

If there is agreement on this, I would like to start drafting a change document 
and implement this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-13 Thread Hequn Cheng
Thanks a lot for the release, Jincheng!
Also thanks to everyone that make this release possible!

Best,
Hequn

On Thu, Feb 13, 2020 at 2:18 PM Dian Fu  wrote:

> Thanks for the great work, Jincheng.
>
> Regards,
> Dian
>
> 在 2020年2月13日,下午1:32,jincheng sun  写道:
>
> Hi everyone,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink Python API(PyFlink) 1.9.2, which is the first release to PyPI for the
> Apache Flink Python API 1.9 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
>
> https://pypi.org/project/apache-flink/1.9.2/#files
>
> Or installed using pip command:
>
> pip install apache-flink==1.9.2
>
> We would like to thank all contributors of the Apache Flink community who
> helped to verify this release and made this release possible!
>
> Best,
> Jincheng
>
>
>


[jira] [Created] (FLINK-16038) Expose hitRate metric for JDBCLookupFunction

2020-02-13 Thread Benchao Li (Jira)
Benchao Li created FLINK-16038:
--

 Summary: Expose hitRate metric for JDBCLookupFunction
 Key: FLINK-16038
 URL: https://issues.apache.org/jira/browse/FLINK-16038
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.10.0
Reporter: Benchao Li


{{hitRate}} is a key metric for a cache. Exposing {{hitRate}} for 
{{JDBCLookupFunction}} can give users an intuitive understanding of the 
cache's effectiveness.

cc [~lzljs3620320] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16037) maven-dependency-plugin not fully compatible with Java 11

2020-02-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16037:


 Summary: maven-dependency-plugin not fully compatible with Java 11
 Key: FLINK-16037
 URL: https://issues.apache.org/jira/browse/FLINK-16037
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


The maven-dependency-plugin 3.1.1 is not fully compatible with Java 11; 
dependency analysis and listing of dependencies are currently failing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16036) Deprecate String based Expression DSL

2020-02-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16036:


 Summary: Deprecate String based Expression DSL
 Key: FLINK-16036
 URL: https://issues.apache.org/jira/browse/FLINK-16036
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Support scalar vectorized Python UDF in PyFlink

2020-02-13 Thread Hequn Cheng
+1 (binding)

Best, Hequn

On Thu, Feb 13, 2020 at 11:48 AM Jingsong Li  wrote:

> +1 (non-binding)
> Thanks Dian for driving.
>
> Best,
> Jingsong Lee
>
> On Thu, Feb 13, 2020 at 11:45 AM jincheng sun 
> wrote:
>
> > +1 (binding)
> >
> > Best,
> > Jincheng
> >
> >
> > Dian Fu  于2020年2月12日周三 下午1:31写道:
> >
> > > Hi all,
> > >
> > > I'd like to start the vote of FLIP-97[1] which is discussed and reached
> > > consensus in the discussion thread[2].
> > >
> > > The vote will be open for at least 72 hours. Unless there is an
> > objection,
> > > I will try to close it by Feb 17, 2020 08:00 UTC if we have received
> > > sufficient votes.
> > >
> > > Regards,
> > > Dian
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> > > [2]
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-scalar-vectorized-Python-UDF-in-PyFlink-tt37264.html
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-13 Thread Kurt Young
Regarding to "fromQuery" is confusing users with "Table from(String
tableName)", I have
a just opposite opinion. I think this "fromXXX" pattern can make users
quite clear when they
want to get a Table from TableEnvironment. Similar interfaces will also
include like "fromElements".

Regarding to the name of DmlBatch, I think it's mainly for
future flexibility, in case we can support
other statement in a single batch. If that happens, the name "Inserts" will
be weird.

Best,
Kurt


On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:

> I agree with Jingsong.
>
> +1 to keep `sqlQuery`; it's clear from the method name and return type that
> it accepts a SELECT query and returns a logical representation, `Table`.
> The `fromQuery` name would be a little confusing to users given the `Table
> from(String tableName)` method.
>
> Regarding `DmlBatch`, I agree with Jingsong: AFAIK, the purpose of
> `DmlBatch` is to batch insert statements.
> Besides, DML terminology is not commonly known among users. So what about
> `InsertsBatching startBatchingInserts()`?
>
> Best,
> Jark
>
> On Thu, 13 Feb 2020 at 15:50, Jingsong Li  wrote:
>
> > Hi Godfrey,
> >
> > Thanks for updating. +1 overall.
> >
> > I see no need to change "sqlQuery" to "fromQuery"; I think "sqlQuery" is
> > OK, it's not that confusing given the return value.
> >
> > Can we change the "DmlBatch" to "Inserts"?  I don't see any other needs.
> > "Dml" seems a little weird.
> > It is better to support "Inserts addInsert" too. Users can
> > "inserts.addInsert().addInsert()"
> >
> > I try to match the new interfaces with the old interfaces simply.
> > - "startInserts -> addInsert" replace old "sqlUpdate(insert)" and
> > "insertInto".
> > - "executeStatement" new one, execute all kinds of sqls immediately.
> > Including old "sqlUpdate(DDLs)".
> >
> > Best,
> > Jingsong Lee
> >
> > On Wed, Feb 12, 2020 at 11:10 AM godfreyhe  wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to resume the discussion for FLIP-84 [0]. I have updated the
> > > document; the main changes are:
> > >
> > > 1. about "`void sqlUpdate(String sql)`" section
> > >   a) change "Optional executeSql(String sql) throws
> > Exception"
> > > to "ResultTable executeStatement(String statement, String jobName)
> throws
> > > Exception". The reason is: "statement" is a more general concept than
> > > "sql",
> > > e.g. "show xx" is not a sql command (refer to [1]), but is a statement
> > > (just
> > > like JDBC). "insert" statement also has return value which is the
> > affected
> > > row count, we can unify the return type to "ResultTable" instead of
> > > "Optional".
> > >   b) add two sub-interfaces for "ResultTable": "RowResultTable" is used
> > for
> > > non-streaming select statement and will not contain change flag;
> > > "RowWithChangeFlagResultTable" is used for streaming select statement
> and
> > > will contain change flag.
> > >
> > > 2) about "Support batch sql execute and explain" section
> > > introduce "DmlBatch" to support both sql and Table API (which is
> borrowed
> > > from the ideas Dawid mentioned in the slack)
> > >
> > > interface TableEnvironment {
> > > DmlBatch startDmlBatch();
> > > }
> > >
> > > interface DmlBatch {
> > >   /**
> > >   * add insert statement to the batch
> > >   */
> > > void addInsert(String insert);
> > >
> > >  /**
> > >   * add Table with given sink name to the batch
> > >   */
> > > void addInsert(String sinkName, Table table);
> > >
> > >  /**
> > >   * execute the dml statements as a batch
> > >   */
> > >   ResultTable execute(String jobName) throws Exception
> > >
> > >   /**
> > >  * Returns the AST and the execution plan to compute the result of the
> > > batch
> > > dml statement.
> > >   */
> > >   String explain(boolean extended);
> > > }
> > >
> > > 3) about "Discuss a parse method for multiple statements execute in SQL
> > > CLI"
> > > section
> > > add the pros and cons for each solution
> > >
> > > 4) update the "Examples" section and "Summary" section based on the
> above
> > > changes
> > >
> > > Please refer to the design doc[1] for more details; any feedback is
> > > welcome.
> > >
> > > Bests,
> > > godfreyhe
> > >
> > >
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/19-mdYJjKirh5aXCwq1fDajSaI09BJMMT95wy_YhtuZk/edit
> > > [1] https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/
> > >
> > >
> > >
> > > --
> > > Sent from:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


[jira] [Created] (FLINK-16035) Update Java's TableEnvironment/Table to accept Java Expression DSL

2020-02-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16035:


 Summary: Update Java's TableEnvironment/Table to accept Java 
Expression DSL
 Key: FLINK-16035
 URL: https://issues.apache.org/jira/browse/FLINK-16035
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16034) Update string based expressions to java dsl in documentation

2020-02-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16034:


 Summary: Update string based expressions to java dsl in 
documentation
 Key: FLINK-16034
 URL: https://issues.apache.org/jira/browse/FLINK-16034
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16033) Introduce Java Expression DSL

2020-02-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16033:


 Summary: Introduce Java Expression DSL
 Key: FLINK-16033
 URL: https://issues.apache.org/jira/browse/FLINK-16033
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16032) Depends on core classifier hive-exec in hive connector

2020-02-13 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16032:


 Summary: Depends on core classifier hive-exec in hive connector
 Key: FLINK-16032
 URL: https://issues.apache.org/jira/browse/FLINK-16032
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Jingsong Lee
 Fix For: 1.11.0


Now we depend on the non-core classifier of hive-exec; it is an uber jar and 
contains a lot of classes.

This makes parquet vectorization support very hard, because we use many deep 
APIs of parquet, and it is hard to stay compatible with multiple parquet versions.
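
A sketch of how the dependency could then be declared with the core 
classifier (the version property name is illustrative):

{code}
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <classifier>core</classifier>
</dependency>
{code}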



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE][RESULT] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-13 Thread Dawid Wysakowicz
Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Timo, Jark, Aljoscha)
   - 2 non-binding +1 votes (Jingsong, Dian)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider FLIP-55 
approved.
Thank you all.
Best,
Dawid

On 12/02/2020 06:21, Dian Fu wrote:
> Hi Dawid,
>
> Thanks for your reply. I'm also in favor of "col" as a column expression in 
> the Python Table API. Regarding using "$" in the Java/Scala Table API, I'm 
> fine with it. So +1 from my side.
>
> Thanks,
> Dian
>
>> 在 2020年2月11日,下午9:48,Aljoscha Krettek  写道:
>>
>> +1
>>
>> Best,
>> Aljoscha
>>
>> On 11.02.20 11:17, Jingsong Li wrote:
>>> Thanks Dawid for your explanation,
>>> +1 for vote.
>>> So I am a big +1 to accepting java.lang.Object in the Java DSL; without
>>> Scala implicit conversions, a lot of "lit" calls would look unfriendly to users.
>>> Best,
>>> Jingsong Lee
>>> On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
>>> wrote:
 Hi,

 To answer some of the questions:

 @Jingsong We use Objects in the java API to make it possible to use raw
 Objects without the need to wrap them in literals. If an expression is
 passed it is used as is. If anything else is used, it is assumed to be
a literal and is wrapped into a literal. This way we can e.g. write
 $("f0").plus(1).

@Jark I think it makes sense to shorten them; I will do it. I hope people
who already voted don't mind.

 @Dian That's a valid concern. I would not discard the '$' as a column
 expression for java and scala. I think once we introduce the expression
 DSL for python we can add another alias to java/scala. Personally I'd be
 in favor of col.

 On 11/02/2020 10:41, Dian Fu wrote:
> Hi Dawid,
>
> Thanks for driving this feature. The design looks very good to me
 overall.
> I have only one concern: $ is not allowed to be used in the identifier
 of Python and so we have to come out with another symbol when aligning this
 feature in the Python Table API. I noticed that there are also other
 options proposed in the discussion thread, e.g. ref, col, etc. I think it
 would be great if the proposed symbol could be supported in both the
Java/Scala and Python Table API. What are your thoughts?
> Regards,
> Dian
>
>> 在 2020年2月11日,上午11:13,Jark Wu  写道:
>>
>> +1 for this.
>>
>> I have some minor comments:
>> - I'm +1 to use $ in both Java and Scala API.
>> - I'm +1 to use lit(), Spark also provides lit() function to create a
>> literal value.
>> - Is it possible to have `isGreater` instead of `isGreaterThan` and
>> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in BaseExpressions?
>> Best,
>> Jark
>>
>> On Tue, 11 Feb 2020 at 10:21, Jingsong Li 
 wrote:
>>> Hi Dawid,
>>>
>>> Thanks for driving.
>>>
>>> - adding $ in scala api looks good to me.
>>> - Just a question: what should be expected for java.lang.Object? A literal
>>> object or an expression? So the Object is syntactic sugar for a literal?
>>> Best,
>>> Jingsong Lee
>>>
>>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 
 wrote:
 +1 for this.

 It will also help in making a TableEnvironment.fromElements() possible
and reduces technical debt: one less entry point for TypeInformation in
the API.

 Regards,
 Timo


 On 10.02.20 08:31, Dawid Wysakowicz wrote:
> Hi all,
>
> I wanted to resurrect the thread about introducing a Java Expression
> DSL. Please see the updated flip page[1]. Most of the flip was concluded
> in the previous discussion thread. The major changes since then are:
>
> * accepting java.lang.Object in the Java DSL
>
> * adding $ interpolation for a column in the Scala DSL
>
> I think it's important to move those changes forward as it makes it
> easier to transition to the new type system (the Java parser supports only
> the old type system stack for now) that we have been working on for the
> past releases.
>
> Because the previous discussion thread was rather conclusive I want to
> start already with a vote. If you think we need another round of
> discussion, feel free to say so.
>
>
> The vote will last for at least 72 hours, following the consensus voting
> process.
>
> FLIP wiki:
>
> [1]
>
 https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
> Discussion thread:
>
>
 https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
>
>
>>> --
>>> 

Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-13 Thread Jark Wu
I agree with Jingsong.

+1 to keep `sqlQuery`; it's clear from the method name and return type that
it accepts a SELECT query and returns a logical representation, `Table`.
The `fromQuery` name would be a little confusing to users given the `Table
from(String tableName)` method.

Regarding `DmlBatch`, I agree with Jingsong: AFAIK, the purpose of
`DmlBatch` is to batch insert statements.
Besides, DML terminology is not commonly known among users. So what about
`InsertsBatching startBatchingInserts()`?

Best,
Jark

On Thu, 13 Feb 2020 at 15:50, Jingsong Li  wrote:

> Hi Godfrey,
>
> Thanks for updating. +1 overall.
>
> I see no need to change "sqlQuery" to "fromQuery"; I think "sqlQuery" is
> OK, it's not that confusing given the return value.
>
> Can we change the "DmlBatch" to "Inserts"?  I don't see any other needs.
> "Dml" seems a little weird.
> It is better to support "Inserts addInsert" too. Users can
> "inserts.addInsert().addInsert()"
>
> I try to match the new interfaces with the old interfaces simply.
> - "startInserts -> addInsert" replace old "sqlUpdate(insert)" and
> "insertInto".
> - "executeStatement" new one, execute all kinds of sqls immediately.
> Including old "sqlUpdate(DDLs)".
>
> Best,
> Jingsong Lee
>
> On Wed, Feb 12, 2020 at 11:10 AM godfreyhe  wrote:
>
> > Hi everyone,
> >
> > I'd like to resume the discussion for FLIP-84 [0]. I have updated the
> > document; the main changes are:
> >
> > 1. about "`void sqlUpdate(String sql)`" section
> >   a) change "Optional executeSql(String sql) throws
> Exception"
> > to "ResultTable executeStatement(String statement, String jobName) throws
> > Exception". The reason is: "statement" is a more general concept than
> > "sql",
> > e.g. "show xx" is not a sql command (refer to [1]), but is a statement
> > (just
> > like JDBC). "insert" statement also has return value which is the
> affected
> > row count, we can unify the return type to "ResultTable" instead of
> > "Optional".
> >   b) add two sub-interfaces for "ResultTable": "RowResultTable" is used
> for
> > non-streaming select statement and will not contain change flag;
> > "RowWithChangeFlagResultTable" is used for streaming select statement and
> > will contain change flag.
> >
> > 2) about "Support batch sql execute and explain" section
> > introduce "DmlBatch" to support both sql and Table API (which is borrowed
> > from the ideas Dawid mentioned in the slack)
> >
> > interface TableEnvironment {
> > DmlBatch startDmlBatch();
> > }
> >
> > interface DmlBatch {
> >   /**
> >   * add insert statement to the batch
> >   */
> > void addInsert(String insert);
> >
> >  /**
> >   * add Table with given sink name to the batch
> >   */
> > void addInsert(String sinkName, Table table);
> >
> >  /**
> >   * execute the dml statements as a batch
> >   */
> >   ResultTable execute(String jobName) throws Exception
> >
> >   /**
> >  * Returns the AST and the execution plan to compute the result of the
> > batch
> > dml statement.
> >   */
> >   String explain(boolean extended);
> > }
> >
> > 3) about "Discuss a parse method for multiple statements execute in SQL
> > CLI"
> > section
> > add the pros and cons for each solution
> >
> > 4) update the "Examples" section and "Summary" section based on the above
> > changes
> >
> > Please refer to the design doc[1] for more details; any feedback is welcome.
> >
> > Bests,
> > godfreyhe
> >
> >
> > [0]
> >
> >
> https://docs.google.com/document/d/19-mdYJjKirh5aXCwq1fDajSaI09BJMMT95wy_YhtuZk/edit
> > [1] https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/
> >
> >
> >
> > --
> > Sent from:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> >
>
>
> --
> Best, Jingsong Lee
>