Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi everyone,

Thanks for the great feedback so far.

I updated the FLIP documentation according to the discussion. Changes
include:
- remove "version" key, and merge it into "connector"
- add "scan", "lookup", "sink" prefix to some property keys if they are
only used in that case.
- add a "New Property Key" section to list all the previous property keys
and new property keys.

We use "scan" and "lookup" instead of "source" prefix because we should
distinguish them and they aligns to FLIP-95 ScanTableSource and
LookupTableSource.
I also colored red for some major change of property keys in the FLIP. I
will list some of them here too:

kafka:
connector.startup-mode => scan.startup.mode
connector.specific-offsets => scan.startup.specific-offsets
connector.startup-timestamp-millis => scan.startup.timestamp-millis
connector.sink-partitioner & connector.sink-partitioner-class =>
sink.partitioner

elasticsearch:
connector.key-delimiter => document-id.key-delimiter  # makes it
explicit that it is used for the document id
connector.key-null-literal => document-id.key-null-literal  # it can
also be used for ES sources in the future
connector.bulk-flush.back-off.type => sink.bulk-flush.back-off.strategy

jdbc:
connector.table => table-name
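
To make the mapping concrete, below is a minimal sketch of a DDL that uses the
proposed keys, issued through the Table API. Only 'connector',
'scan.startup.mode' and 'sink.partitioner' are taken from the mapping above;
the schema, the 'topic' key and the option values are illustrative
placeholders, not part of this proposal.

// Sketch only: the property keys follow the FLIP-122 proposal and may still change.
TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
tEnv.sqlUpdate(
    "CREATE TABLE orders (order_id BIGINT, price DOUBLE) WITH (" +
    "  'connector' = 'kafka', " +                   // was 'connector.type' = 'kafka'
    "  'topic' = 'orders', " +                      // illustrative placeholder
    "  'scan.startup.mode' = 'earliest-offset', " + // was 'connector.startup-mode'
    "  'sink.partitioner' = 'round-robin'" +        // was 'connector.sink-partitioner'
    ")");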

Further feedback is welcome!

Best,
Jark


On Tue, 31 Mar 2020 at 14:45, Jark Wu  wrote:

> Hi Kurt,
>
> I also prefer "-" as the version delimiter now. I didn't remove the "_"
> proposal, which was a mistake; that's why I sent another email last night :)
> Regarding "property-version", I also think we shouldn't require users to
> learn about this. And ConfigOption already provides good support for
> deprecated keys and can auto-generate documentation for them.
>
> Hi Danny,
>
> Regarding “connector.properties.*”:
> In FLIP-95, the Factory#requiredOptions() and Factory#optionalOptions()
> interfaces are only used for generating documentation.
> They do not influence the discovery and validation of a factory; the
> validation logic is defined by connectors
> in createDynamicTableSource/Sink().
> So you don't have to provide an option for "connector.properties.*". But I
> think we should make ConfigOption support wildcards in the long term for
> the full story.
>
> I don't think we should inline all the "connector.properties.*" keys;
> otherwise, it will be very tricky for users to configure the properties.
> Regarding FLIP-113, I suggest providing ConfigOptions for
> commonly used Kafka properties and putting them in supportedHintOptions(),
> e.g. "connector.properties.group.id",
> "connector.properties.fetch.min.bytes".
>
> Best,
> Jark
>
>
>
>
>
> On Tue, 31 Mar 2020 at 12:04, Danny Chan  wrote:
>
>> Thanks Jark for bringing up this discussion, +1 for this idea. I believe
>> users have suffered from the verbose property keys for a long time.
>>
>> Just one question: how do we handle keys with a wildcard, such as
>> “connector.properties.*” in the Kafka connector, which are handed over to
>> the Kafka client directly? As suggested in FLIP-95, we use a ConfigOption
>> to describe the “supported properties”; then I have two concerns:
>>
>> • For the new keys, do we still need to list each such key on its own
>> line, such as “connector.properties.abc”, “connector.properties.def”, or
>> should we inline them, such as “some-key-prefix” = “k1=v1, k2=v2 ..."
>> • Should the ConfigOption support the wildcard? (If we plan to support
>> the current multi-line style)
>>
>>
>> Best,
>> Danny Chan
>> 在 2020年3月31日 +0800 AM12:37,Jark Wu ,写道:
>> > Hi all,
>> >
>> > Thanks for the feedbacks.
>> >
>> > It seems that we have a conclusion to put the version into the factory
>> > identifier. I'm also fine with this.
>> > If we have this outcome, the interface of Factory#factoryVersion is not
>> > needed anymore, this can simplify the learning cost of new factory.
>> > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
>> > 
>> >
>> > kafka => kafka for 0.11+ versions; we don't suffix "-universal", because
>> > the meaning of "universal" is not easy to understand.
>> > kafka-0.11 => kafka for 0.11 version
>> > kafka-0.10 => kafka for 0.10 version
>> > elasticsearch-6 => elasticsearch for 6.x versions
>> > elasticsearch-7 => elasticsearch for 7.x versions
>> > hbase-1.4 => hbase for 1.4.x versions
>> > jdbc
>> > filesystem
>> >
>> > We use "-" as the version delimiter to make them to be more consistent.
>> > This is not forces, users can also use other delimiters or without
>> > delimiter.
>> > But this can be a guilde in the Javadoc of Factory, to make the
>> connector
>> > ecosystem to be more consistent.
>> >
>> > What do you think?
>> >
>> > 
>> >
>> > Regarding "connector.property-version":
>> >
>> > Hi @Dawid Wysakowicz  , the new factories are
>> > designed not to support reading the current properties.
>> > All the current properties are routed to the old factories if they are
>> > using "connector.type". Otherwise, properties are routed to new

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Tzu-Li (Gordon) Tai
Sounds good, I'll post a new link to this vote thread, which will have the
problem fixed in a new maven staging repository.

On Tue, Mar 31, 2020 at 2:51 PM Robert Metzger  wrote:

> Thank you for looking into this.
>
> I'm fine with keeping this RC open, but re-vote on a new maven staging
> repository.
>
> On Tue, Mar 31, 2020 at 8:42 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Found the culprit:
> >
> > The Stateful Functions project uses the Apache POM as the parent POM, and
> > uses the `apache-release` build profile to build the staging jars.
> >
> > The problem arises because the `apache-release` build profile itself
> > bundles a source release distribution to be released to Maven.
> > This should be disabled specifically for us, because we use our own
> tooling
> > (tools/releasing/create_source_release.sh) to create the source tarballs
> > which does correctly exclude all those unexpected files Robert found.
> >
> > Will rebuild the RC. I think in this case, it's completely fine to keep
> > with the original voting end time, since nothing is really touched, only
> > excluding some files from the staging Maven repository.
> >
> > On Tue, Mar 31, 2020 at 2:29 PM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > Hi Robert,
> > >
> > > I think you're right. There should be no tarballs / jars packaged for
> > > statefun-parent actually, only the pom file since that's the parent
> > module
> > > which only has pom packaging.
> > > I'm looking into it.
> > >
> > > On Tue, Mar 31, 2020 at 2:23 PM Robert Metzger 
> > > wrote:
> > >
> > >> While checking the release, I found a 77
> > >> MB statefun-parent-2.0.0-source-release.zip file in the maven staging
> > >> repo:
> > >>
> > >>
> >
> https://repository.apache.org/content/repositories/orgapacheflink-1343/org/apache/flink/statefun-parent/2.0.0/
> > >>
> > >> It seems that the file contains all ruby dependencies in docs/ from
> > jekyll
> > >> for the docs (in "statefun-parent-2.0.0/docs/.rubydeps/ruby/2.5.0"). I
> > >> don't think we want to publish these files as part of the release to
> > maven
> > >> central?
> > >> (It also contains python venv files in "statefun-python-sdk/venv")
> > >>
> > >> I guess this is a reason to cancel the RC?
> > >>
> > >>
> > >> On Tue, Mar 31, 2020 at 6:10 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > >> wrote:
> > >>
> > >> > +1 (binding)
> > >> >
> > >> > ** Legal **
> > >> > - checksums and GPG files match corresponding release files
> > >> > - Source distribution does not contain binaries, contents are sane
> (no
> > >> > .git* / .travis* / generated html content files)
> > >> > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > >> > font-awesome dependency in docs and copied sources from fastutil (
> > >> > http://fastutil.di.unimi.it/)
> > >> > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > >> > Artifacts that do bundle dependencies are:
> > statefun-flink-distribution,
> > >> > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > >> > sources).
> > >> > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE
> > and
> > >> > NOTICE files (no bundled dependencies)
> > >> > - All POMs / README / Python SDK setup.py / Dockerfiles / doc
> configs
> > >> point
> > >> > to same version “2.0.0”
> > >> > - README looks good
> > >> >
> > >> > ** Functional **
> > >> > - Building from source dist with end-to-end tests enabled (mvn clean
> > >> verify
> > >> > -Prun-e2e-tests) passes (JDK 8)
> > >> > - Generated quickstart from archetype looks good (correct POM /
> > >> Dockerfile
> > >> > / service file)
> > >> > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> > >> Python
> > >> > SDK Walkthrough
> > >> > - Flink Harness works in IDE
> > >> > - Test remote functions deployment mode with AWS ecosystem: remote
> > >> Python
> > >> > functions running in AWS Lambda behind AWS API Gateway, Java
> embedded
> > >> > functions running in AWS ECS
> > >> >
> > >> > On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> > >> > wrote:
> > >> >
> > >> > > FYI - I've also updated the website Downloads page to include this
> > >> > release.
> > >> > > Please also consider that for your reviews:
> > >> > > https://github.com/apache/flink-web/pull/318
> > >> > >
> > >> > > On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf <
> > >> > konstan...@ververica.com>
> > >> > > wrote:
> > >> > >
> > >> > >> Hi Gordon,
> > >> > >>
> > >> > >> +1 (non-binding)
> > >> > >>
> > >> > >> * Maven build from source...check
> > >> > >> * Python build from source...check
> > >> > >> * Went through Walkthrough based on local builds...check
> > >> > >>
> > >> > >> Cheers,
> > >> > >>
> > >> > >> Konstantin
> > >> > >>
> > >> > >> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai <
> > >> > tzuli...@apache.org>
> > >> > >> wrote:
> > >> > >>
> > >> > >> > Hi everyone,
> > >> > >> >
> > >> > >> > Please review and vote on the *release candid

[jira] [Created] (FLINK-16878) flink-table-planner contains unwanted dependency org.apiguardian.api

2020-03-30 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-16878:
--

 Summary: flink-table-planner contains unwanted dependency 
org.apiguardian.api
 Key: FLINK-16878
 URL: https://issues.apache.org/jira/browse/FLINK-16878
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: Robert Metzger


CI run: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6856&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5

{code}
==
Running 'Dependency shading of table modules test'
==
TEST_DATA_DIR: 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-57663957727
Flink dist directory: 
/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT
Success: There are no unwanted dependencies in the 
/home/vsts/work/1/s/flink-end-to-end-tests/../flink-table/flink-table-api-java/target/flink-table-api-java-1.11-SNAPSHOT.jar
 jar.
Success: There are no unwanted dependencies in the 
/home/vsts/work/1/s/flink-end-to-end-tests/../flink-table/flink-table-api-scala/target/flink-table-api-scala_2.11-1.11-SNAPSHOT.jar
 jar.
Success: There are no unwanted dependencies in the 
/home/vsts/work/1/s/flink-end-to-end-tests/../flink-table/flink-table-api-java-bridge/target/flink-table-api-java-bridge_2.11-1.11-SNAPSHOT.jar
 jar.
Success: There are no unwanted dependencies in the 
/home/vsts/work/1/s/flink-end-to-end-tests/../flink-table/flink-table-api-scala-bridge/target/flink-table-api-scala-bridge_2.11-1.11-SNAPSHOT.jar
 jar.
Failure: There are unwanted dependencies in the 
/home/vsts/work/1/s/flink-end-to-end-tests/../flink-table/flink-table-planner/target/flink-table-planner_2.11-1.11-SNAPSHOT.jar
 jar:
  -> org.apiguardian.api not found
  -> org.apiguardian.api not found
  -> org.apiguardian.api not found
  -> org.apiguardian.api not found
  -> org.apache.commons.io.input not found
[FAIL] Test script contains errors.
Checking for errors...
No errors in log files.
Checking for exceptions...
No exceptions in log files.
Checking for non-empty .out files...
grep: 
/home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*.out:
 No such file or directory
No non-empty .out files.

[FAIL] 'Dependency shading of table modules test' failed after 0 minutes and 14 
seconds! Test exited with exit code 1

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Robert Metzger
Thank you for looking into this.

I'm fine with keeping this RC open, but re-vote on a new maven staging
repository.

On Tue, Mar 31, 2020 at 8:42 AM Tzu-Li (Gordon) Tai 
wrote:

> Found the culprit:
>
> The Stateful Functions project uses the Apache POM as the parent POM, and
> uses the `apache-release` build profile to build the staging jars.
>
> The problem arises because the `apache-release` build profile itself
> bundles a source release distribution to be released to Maven.
> This should be disabled specifically for us, because we use our own tooling
> (tools/releasing/create_source_release.sh) to create the source tarballs
> which does correctly exclude all those unexpected files Robert found.
>
> Will rebuild the RC. I think in this case, it's completely fine to keep
> with the original voting end time, since nothing is really touched, only
> excluding some files from the staging Maven repository.
>
> On Tue, Mar 31, 2020 at 2:29 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi Robert,
> >
> > I think you're right. There should be no tarballs / jars packaged for
> > statefun-parent actually, only the pom file since that's the parent
> module
> > which only has pom packaging.
> > I'm looking into it.
> >
> > On Tue, Mar 31, 2020 at 2:23 PM Robert Metzger 
> > wrote:
> >
> >> While checking the release, I found a 77
> >> MB statefun-parent-2.0.0-source-release.zip file in the maven staging
> >> repo:
> >>
> >>
> https://repository.apache.org/content/repositories/orgapacheflink-1343/org/apache/flink/statefun-parent/2.0.0/
> >>
> >> It seems that the file contains all ruby dependencies in docs/ from
> jekyll
> >> for the docs (in "statefun-parent-2.0.0/docs/.rubydeps/ruby/2.5.0"). I
> >> don't think we want to publish these files as part of the release to
> maven
> >> central?
> >> (It also contains python venv files in "statefun-python-sdk/venv")
> >>
> >> I guess this is a reason to cancel the RC?
> >>
> >>
> >> On Tue, Mar 31, 2020 at 6:10 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > ** Legal **
> >> > - checksums and GPG files match corresponding release files
> >> > - Source distribution does not contain binaries, contents are sane (no
> >> > .git* / .travis* / generated html content files)
> >> > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> >> > font-awesome dependency in docs and copied sources from fastutil (
> >> > http://fastutil.di.unimi.it/)
> >> > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> >> > Artifacts that do bundle dependencies are:
> statefun-flink-distribution,
> >> > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> >> > sources).
> >> > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE
> and
> >> > NOTICE files (no bundled dependencies)
> >> > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
> >> point
> >> > to same version “2.0.0”
> >> > - README looks good
> >> >
> >> > ** Functional **
> >> > - Building from source dist with end-to-end tests enabled (mvn clean
> >> verify
> >> > -Prun-e2e-tests) passes (JDK 8)
> >> > - Generated quickstart from archetype looks good (correct POM /
> >> Dockerfile
> >> > / service file)
> >> > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> >> Python
> >> > SDK Walkthrough
> >> > - Flink Harness works in IDE
> >> > - Test remote functions deployment mode with AWS ecosystem: remote
> >> Python
> >> > functions running in AWS Lambda behind AWS API Gateway, Java embedded
> >> > functions running in AWS ECS
> >> >
> >> > On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
> >> > wrote:
> >> >
> >> > > FYI - I've also updated the website Downloads page to include this
> >> > release.
> >> > > Please also consider that for your reviews:
> >> > > https://github.com/apache/flink-web/pull/318
> >> > >
> >> > > On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf <
> >> > konstan...@ververica.com>
> >> > > wrote:
> >> > >
> >> > >> Hi Gordon,
> >> > >>
> >> > >> +1 (non-binding)
> >> > >>
> >> > >> * Maven build from source...check
> >> > >> * Python build from source...check
> >> > >> * Went through Walkthrough based on local builds...check
> >> > >>
> >> > >> Cheers,
> >> > >>
> >> > >> Konstantin
> >> > >>
> >> > >> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai <
> >> > tzuli...@apache.org>
> >> > >> wrote:
> >> > >>
> >> > >> > Hi everyone,
> >> > >> >
> >> > >> > Please review and vote on the *release candidate #4* for the
> >> version
> >> > >> 2.0.0
> >> > >> > of Apache Flink Stateful Functions,
> >> > >> > as follows:
> >> > >> > [ ] +1, Approve the release
> >> > >> > [ ] -1, Do not approve the release (please provide specific
> >> comments)
> >> > >> >
> >> > >> > **Testing Guideline**
> >> > >> >
> >> > >> > You can find here [1] a doc that we can use for collaborating
> >> testing
> >> > >> > efforts.
> >> > >> > The listed testing tasks i

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi Kurt,

I also prefer "-" as version delimiter now. I didn't remove the "_"
proposal by mistake, that's why I sent another email last night :)
Regarding to "property-version", I also think we shouldn't let users to
learn about this. And ConfigOption provides a good ability
to support deprecated keys and auto-generate documentation for deprecated
keys.

Hi Danny,

Regarding “connector.properties.*”:
In FLIP-95, the Factory#requiredOptions() and Factory#optionalOptions()
interfaces are only used for generating documentation.
They do not influence the discovery and validation of a factory; the
validation logic is defined by connectors
in createDynamicTableSource/Sink().
So you don't have to provide an option for "connector.properties.*". But I
think we should make ConfigOption support wildcards in the long term for
the full story.

I don't think we should inline all the "connector.properties.*" keys; otherwise,
it will be very tricky for users to configure the properties.
Regarding FLIP-113, I suggest providing ConfigOptions for commonly
used Kafka properties and putting them in supportedHintOptions(),
e.g. "connector.properties.group.id",
"connector.properties.fetch.min.bytes".

Best,
Jark





On Tue, 31 Mar 2020 at 12:04, Danny Chan  wrote:

> Thanks Jark for bringing up this discussion, +1 for this idea. I believe
> users have suffered from the verbose property keys for a long time.
>
> Just one question: how do we handle keys with a wildcard, such as
> “connector.properties.*” in the Kafka connector, which are handed over to
> the Kafka client directly? As suggested in FLIP-95, we use a ConfigOption
> to describe the “supported properties”; then I have two concerns:
>
> • For the new keys, do we still need to list each such key on its own
> line, such as “connector.properties.abc”, “connector.properties.def”, or
> should we inline them, such as “some-key-prefix” = “k1=v1, k2=v2 ..."
> • Should the ConfigOption support the wildcard? (If we plan to support
> the current multi-line style)
>
>
> Best,
> Danny Chan
> 在 2020年3月31日 +0800 AM12:37,Jark Wu ,写道:
> > Hi all,
> >
> > Thanks for the feedbacks.
> >
> > It seems that we have a conclusion to put the version into the factory
> > identifier. I'm also fine with this.
> > If we have this outcome, the interface of Factory#factoryVersion is not
> > needed anymore, this can simplify the learning cost of new factory.
> > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> > 
> >
> > kafka => kafka for 0.11+ versions; we don't suffix "-universal", because
> > the meaning of "universal" is not easy to understand.
> > kafka-0.11 => kafka for 0.11 version
> > kafka-0.10 => kafka for 0.10 version
> > elasticsearch-6 => elasticsearch for 6.x versions
> > elasticsearch-7 => elasticsearch for 7.x versions
> > hbase-1.4 => hbase for 1.4.x versions
> > jdbc
> > filesystem
> >
> > We use "-" as the version delimiter to make them to be more consistent.
> > This is not forces, users can also use other delimiters or without
> > delimiter.
> > But this can be a guilde in the Javadoc of Factory, to make the connector
> > ecosystem to be more consistent.
> >
> > What do you think?
> >
> > 
> >
> > Regarding "connector.property-version":
> >
> > Hi @Dawid Wysakowicz  , the new factories are
> > designed not to support reading the current properties.
> > All the current properties are routed to the old factories if they are
> > using "connector.type". Otherwise, properties are routed to new
> factories.
> >
> > If I understand correctly, the "connector.property-version" is attached
> > implicitly by the system, not manually set by users.
> > For example, the framework should add "connector.property-version=1" to
> > the properties when processing a DDL statement.
> > I'm fine with adding "connector.property-version=1" when processing a DDL
> > statement, but I think it's also fine if we don't,
> > because this can be done in the future if needed and the default version
> can
> > be 1.
> >
> > Best,
> > Jark
> >
> > On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the feedbacks.
> > >
> > > It seems that we have a conclusion to put the version into the factory
> > > identifier. I'm also fine with this.
> > > If we have this outcome, the interface of Factory#factoryVersion is not
> > > needed anymore, this can simplify the learning cost of new factory.
> > > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> > > 
> > >
> > > Btw, I would like to use "_" instead of "-" as the version delimiter,
> > > because "-" looks like a minus sign and may confuse users, e.g.
> "elasticsearch-6".
> > > This is not forced, but should be a guide in the Javadoc of Factory.
> > > I propose to use the following identifiers for existing connectors,
> > >
> > > kafka => kafka for 0.11+ versions; we don't suffix "-universal",
> because
> > > the meaning of "universal" is not easy to understand.
> > > kafka-0.11 => kafka for 0.11 versio

[jira] [Created] (FLINK-16877) SingleDirectoryWriter should not produce file when no input record

2020-03-30 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16877:


 Summary: SingleDirectoryWriter should not produce file when no 
input record
 Key: FLINK-16877
 URL: https://issues.apache.org/jira/browse/FLINK-16877
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.11.0


SingleDirectoryWriter should not produce a file when there are no input records.

Currently it produces an empty file; we should avoid this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Tzu-Li (Gordon) Tai
Found the culprit:

The Stateful Functions project uses the Apache POM as the parent POM, and
uses the `apache-release` build profile to build the staging jars.

The problem arises because the `apache-release` build profile itself
bundles a source release distribution to be released to Maven.
This should be disabled specifically for us, because we use our own tooling
(tools/releasing/create_source_release.sh) to create the source tarballs
which does correctly exclude all those unexpected files Robert found.

Will rebuild the RC. I think in this case, it's completely fine to keep
with the original voting end time, since nothing is really touched, only
excluding some files from the staging Maven repository.

On Tue, Mar 31, 2020 at 2:29 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Robert,
>
> I think you're right. There should be no tarballs / jars packaged for
> statefun-parent actually, only the pom file since that's the parent module
> which only has pom packaging.
> I'm looking into it.
>
> On Tue, Mar 31, 2020 at 2:23 PM Robert Metzger 
> wrote:
>
>> While checking the release, I found a 77
>> MB statefun-parent-2.0.0-source-release.zip file in the maven staging
>> repo:
>>
>> https://repository.apache.org/content/repositories/orgapacheflink-1343/org/apache/flink/statefun-parent/2.0.0/
>>
>> It seems that the file contains all ruby dependencies in docs/ from jekyll
>> for the docs (in "statefun-parent-2.0.0/docs/.rubydeps/ruby/2.5.0"). I
>> don't think we want to publish these files as part of the release to maven
>> central?
>> (It also contains python venv files in "statefun-python-sdk/venv")
>>
>> I guess this is a reason to cancel the RC?
>>
>>
>> On Tue, Mar 31, 2020 at 6:10 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> > +1 (binding)
>> >
>> > ** Legal **
>> > - checksums and GPG files match corresponding release files
>> > - Source distribution does not contain binaries, contents are sane (no
>> > .git* / .travis* / generated html content files)
>> > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
>> > font-awesome dependency in docs and copied sources from fastutil (
>> > http://fastutil.di.unimi.it/)
>> > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
>> > Artifacts that do bundle dependencies are: statefun-flink-distribution,
>> > statefun-ridesharing-example-simulator, statefun-flink-core (copied
>> > sources).
>> > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
>> > NOTICE files (no bundled dependencies)
>> > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
>> point
>> > to same version “2.0.0”
>> > - README looks good
>> >
>> > ** Functional **
>> > - Building from source dist with end-to-end tests enabled (mvn clean
>> verify
>> > -Prun-e2e-tests) passes (JDK 8)
>> > - Generated quickstart from archetype looks good (correct POM /
>> Dockerfile
>> > / service file)
>> > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
>> Python
>> > SDK Walkthrough
>> > - Flink Harness works in IDE
>> > - Test remote functions deployment mode with AWS ecosystem: remote
>> Python
>> > functions running in AWS Lambda behind AWS API Gateway, Java embedded
>> > functions running in AWS ECS
>> >
>> > On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
>> > wrote:
>> >
>> > > FYI - I've also updated the website Downloads page to include this
>> > release.
>> > > Please also consider that for your reviews:
>> > > https://github.com/apache/flink-web/pull/318
>> > >
>> > > On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf <
>> > konstan...@ververica.com>
>> > > wrote:
>> > >
>> > >> Hi Gordon,
>> > >>
>> > >> +1 (non-binding)
>> > >>
>> > >> * Maven build from source...check
>> > >> * Python build from source...check
>> > >> * Went through Walkthrough based on local builds...check
>> > >>
>> > >> Cheers,
>> > >>
>> > >> Konstantin
>> > >>
>> > >> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai <
>> > tzuli...@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Hi everyone,
>> > >> >
>> > >> > Please review and vote on the *release candidate #4* for the
>> version
>> > >> 2.0.0
>> > >> > of Apache Flink Stateful Functions,
>> > >> > as follows:
>> > >> > [ ] +1, Approve the release
>> > >> > [ ] -1, Do not approve the release (please provide specific
>> comments)
>> > >> >
>> > >> > **Testing Guideline**
>> > >> >
>> > >> > You can find here [1] a doc that we can use for collaborating
>> testing
>> > >> > efforts.
>> > >> > The listed testing tasks in the doc also serve as a guideline in
>> what
>> > to
>> > >> > test for this release.
>> > >> > If you wish to take ownership of a testing task, simply put your
>> name
>> > >> down
>> > >> > in the "Checked by" field of the task.
>> > >> >
>> > >> > **Release Overview**
>> > >> >
>> > >> > As an overview, the release consists of the following:
>> > >> > a) Stateful Functions canonical source distribution, to be
>> deployed to
>> > >> the
>> > >> > release rep

[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-03-30 Thread godfrey he
Hi everyone,

I'd like to start the vote on FLIP-84 [1] again, because we have received some
feedback. The feedback is all about the newly introduced methods; here is the
discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 3, 2020 06:30 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Tzu-Li (Gordon) Tai
Hi Robert,

I think you're right. There should be no tarballs / jars packaged for
statefun-parent actually, only the pom file since that's the parent module
which only has pom packaging.
I'm looking into it.

On Tue, Mar 31, 2020 at 2:23 PM Robert Metzger  wrote:

> While checking the release, I found a 77
> MB statefun-parent-2.0.0-source-release.zip file in the maven staging repo:
>
> https://repository.apache.org/content/repositories/orgapacheflink-1343/org/apache/flink/statefun-parent/2.0.0/
>
> It seems that the file contains all ruby dependencies in docs/ from jekyll
> for the docs (in "statefun-parent-2.0.0/docs/.rubydeps/ruby/2.5.0"). I
> don't think we want to publish these files as part of the release to maven
> central?
> (It also contains python venv files in "statefun-python-sdk/venv")
>
> I guess this is a reason to cancel the RC?
>
>
> On Tue, Mar 31, 2020 at 6:10 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > +1 (binding)
> >
> > ** Legal **
> > - checksums and GPG files match corresponding release files
> > - Source distribution does not contain binaries, contents are sane (no
> > .git* / .travis* / generated html content files)
> > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > font-awesome dependency in docs and copied sources from fastutil (
> > http://fastutil.di.unimi.it/)
> > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > Artifacts that do bundle dependencies are: statefun-flink-distribution,
> > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > sources).
> > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
> > NOTICE files (no bundled dependencies)
> > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
> point
> > to same version “2.0.0”
> > - README looks good
> >
> > ** Functional **
> > - Building from source dist with end-to-end tests enabled (mvn clean
> verify
> > -Prun-e2e-tests) passes (JDK 8)
> > - Generated quickstart from archetype looks good (correct POM /
> Dockerfile
> > / service file)
> > - Examples run: Java Greeter / Java Ridesharing / Python Greeter / Python
> > SDK Walkthrough
> > - Flink Harness works in IDE
> > - Test remote functions deployment mode with AWS ecosystem: remote Python
> > functions running in AWS Lambda behind AWS API Gateway, Java embedded
> > functions running in AWS ECS
> >
> > On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> > wrote:
> >
> > > FYI - I've also updated the website Downloads page to include this
> > release.
> > > Please also consider that for your reviews:
> > > https://github.com/apache/flink-web/pull/318
> > >
> > > On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf <
> > konstan...@ververica.com>
> > > wrote:
> > >
> > >> Hi Gordon,
> > >>
> > >> +1 (non-binding)
> > >>
> > >> * Maven build from source...check
> > >> * Python build from source...check
> > >> * Went through Walkthrough based on local builds...check
> > >>
> > >> Cheers,
> > >>
> > >> Konstantin
> > >>
> > >> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > >> wrote:
> > >>
> > >> > Hi everyone,
> > >> >
> > >> > Please review and vote on the *release candidate #4* for the version
> > >> 2.0.0
> > >> > of Apache Flink Stateful Functions,
> > >> > as follows:
> > >> > [ ] +1, Approve the release
> > >> > [ ] -1, Do not approve the release (please provide specific
> comments)
> > >> >
> > >> > **Testing Guideline**
> > >> >
> > >> > You can find here [1] a doc that we can use for collaborating
> testing
> > >> > efforts.
> > >> > The listed testing tasks in the doc also serve as a guideline in
> what
> > to
> > >> > test for this release.
> > >> > If you wish to take ownership of a testing task, simply put your
> name
> > >> down
> > >> > in the "Checked by" field of the task.
> > >> >
> > >> > **Release Overview**
> > >> >
> > >> > As an overview, the release consists of the following:
> > >> > a) Stateful Functions canonical source distribution, to be deployed
> to
> > >> the
> > >> > release repository at dist.apache.org
> > >> > b) Stateful Functions Python SDK distributions to be deployed to
> PyPI
> > >> > c) Maven artifacts to be deployed to the Maven Central Repository
> > >> >
> > >> > **Staging Areas to Review**
> > >> >
> > >> > The staging areas containing the above mentioned artifacts are as
> > >> follows,
> > >> > for your review:
> > >> > * All artifacts for a) and b) can be found in the corresponding dev
> > >> > repository at dist.apache.org [2]
> > >> > * All artifacts for c) can be found at the Apache Nexus Repository
> [3]
> > >> >
> > >> > All artifacts are signed with the
> > >> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> > >> >
> > >> > Other links for your review:
> > >> > * JIRA release notes [5]
> > >> > * source code tag "release-2.0.0-rc4" [6] [7]
> > >> >
> > >> > **Extra Remarks**
> > >> >
> > >> > * Part of the release is also official Docker images for Sta

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Robert Metzger
While checking the release, I found a 77
MB statefun-parent-2.0.0-source-release.zip file in the maven staging repo:
https://repository.apache.org/content/repositories/orgapacheflink-1343/org/apache/flink/statefun-parent/2.0.0/

It seems that the file contains all ruby dependencies in docs/ from jekyll
for the docs (in "statefun-parent-2.0.0/docs/.rubydeps/ruby/2.5.0"). I
don't think we want to publish these files as part of the release to maven
central?
(It also contains python venv files in "statefun-python-sdk/venv")

I guess this is a reason to cancel the RC?


On Tue, Mar 31, 2020 at 6:10 AM Tzu-Li (Gordon) Tai 
wrote:

> +1 (binding)
>
> ** Legal **
> - checksums and GPG files match corresponding release files
> - Source distribution does not contain binaries, contents are sane (no
> .git* / .travis* / generated html content files)
> - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> font-awesome dependency in docs and copied sources from fastutil (
> http://fastutil.di.unimi.it/)
> - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> Artifacts that do bundle dependencies are: statefun-flink-distribution,
> statefun-ridesharing-example-simulator, statefun-flink-core (copied
> sources).
> - Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
> NOTICE files (no bundled dependencies)
> - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs point
> to same version “2.0.0”
> - README looks good
>
> ** Functional **
> - Building from source dist with end-to-end tests enabled (mvn clean verify
> -Prun-e2e-tests) passes (JDK 8)
> - Generated quickstart from archetype looks good (correct POM / Dockerfile
> / service file)
> - Examples run: Java Greeter / Java Ridesharing / Python Greeter / Python
> SDK Walkthrough
> - Flink Harness works in IDE
> - Test remote functions deployment mode with AWS ecosystem: remote Python
> functions running in AWS Lambda behind AWS API Gateway, Java embedded
> functions running in AWS ECS
>
> On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > FYI - I've also updated the website Downloads page to include this
> release.
> > Please also consider that for your reviews:
> > https://github.com/apache/flink-web/pull/318
> >
> > On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf <
> konstan...@ververica.com>
> > wrote:
> >
> >> Hi Gordon,
> >>
> >> +1 (non-binding)
> >>
> >> * Maven build from source...check
> >> * Python build from source...check
> >> * Went through Walkthrough based on local builds...check
> >>
> >> Cheers,
> >>
> >> Konstantin
> >>
> >> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > Please review and vote on the *release candidate #4* for the version
> >> 2.0.0
> >> > of Apache Flink Stateful Functions,
> >> > as follows:
> >> > [ ] +1, Approve the release
> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >
> >> > **Testing Guideline**
> >> >
> >> > You can find here [1] a doc that we can use for collaborating testing
> >> > efforts.
> >> > The listed testing tasks in the doc also serve as a guideline in what
> to
> >> > test for this release.
> >> > If you wish to take ownership of a testing task, simply put your name
> >> down
> >> > in the "Checked by" field of the task.
> >> >
> >> > **Release Overview**
> >> >
> >> > As an overview, the release consists of the following:
> >> > a) Stateful Functions canonical source distribution, to be deployed to
> >> the
> >> > release repository at dist.apache.org
> >> > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> >> > c) Maven artifacts to be deployed to the Maven Central Repository
> >> >
> >> > **Staging Areas to Review**
> >> >
> >> > The staging areas containing the above mentioned artifacts are as
> >> follows,
> >> > for your review:
> >> > * All artifacts for a) and b) can be found in the corresponding dev
> >> > repository at dist.apache.org [2]
> >> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> >> >
> >> > All artifacts are signed with the
> >> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> >> >
> >> > Other links for your review:
> >> > * JIRA release notes [5]
> >> > * source code tag "release-2.0.0-rc4" [6] [7]
> >> >
> >> > **Extra Remarks**
> >> >
> >> > * Part of the release is also official Docker images for Stateful
> >> > Functions. This can be a separate process, since the creation of those
> >> > relies on the fact that we have distribution jars already deployed to
> >> > Maven. I will follow-up with this after these artifacts are officially
> >> > released.
> >> > In the meantime, there is this discussion [8] ongoing about where to
> >> host
> >> > the StateFun Dockerfiles.
> >> > * The Flink Website and blog post is also being worked on (by Marta)
> as
> >> > part of the release, to incorporate the new Stateful Functions
> project.
> >> We
> >> > 

Re: [DISCUSS] FLIP-118: Improve Flink’s ID system

2020-03-30 Thread Yangze Guo
Thank you all for the feedback! Sorry for the belated reply.

@Till
I'm +1 for your two ideas, and I'd like to move these two out of the
scope of this FLIP since the pipelined region scheduling is ongoing
work now.
I also agree that we should not make the InstanceID in
TaskExecutorConnection a composition of the ResourceID plus a
monotonically increasing value. Thanks a lot for your explanation.

@Konstantin @Yang
Regarding the pod name of the TaskExecutor on K8s, I second Yang's
suggestion. It makes sense to me to let users export RESOURCE_ID and
make the TM respect it (a rough sketch follows below). Users need to
guarantee there are no collisions between different TMs.
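
As a rough illustration of that idea (not actual TaskManagerRunner code; the
environment variable name follows this discussion):

// Sketch: derive the TaskManager's ResourceID from a user-provided environment
// variable, falling back to a generated value when it is not set.
String configured = System.getenv("RESOURCE_ID");
ResourceID resourceId = configured != null
        ? new ResourceID(configured)
        : ResourceID.generate();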

Best,
Yangze Guo


On Tue, Mar 31, 2020 at 12:25 AM Steven Wu  wrote:
>
> +1 on allowing user defined resourceId for taskmanager
>
> On Sun, Mar 29, 2020 at 7:24 PM Yang Wang  wrote:
>
> > Hi Konstantin,
> >
> > I think it is a good idea. Currently, our users also report a similar issue
> > with
> > resourceId of standalone cluster. When we start a standalone cluster now,
> > the `TaskManagerRunner` always generates a uuid for the resourceId. It will
> > be used to register to the jobmanager and not convenient to match with the
> > real
> > taskmanager, especially in container environment.
> >
> > I think a probably solution is we could support the user defined
> > resourceId.
> > We could get it from the environment. For standalone on K8s, we could set
> > the "RESOURCE_ID" env to the pod name so that it is easier to match the
> > taskmanager with K8s pod.
> >
> > Moreover, I am afraid we could not set the pod name to the resourceId. I
> > think
> > you could set the "deployment.meta.name", since the pod name is generated
> > by
> > K8s in the pattern {deployment.meta.name}-{rc.uuid}-{uuid}. On the
> > contrary, we
> > will set the resourceId to the pod name.
> >
> >
> > Best,
> > Yang
> >
> > Konstantin Knauf  于2020年3月29日周日 下午8:06写道:
> >
> > > Hi Yangze, Hi Till,
> > >
> > > thank you for working on this topic. I believe it will make debugging
> > > large Apache Flink deployments much more feasible.
> > >
> > > I was wondering whether it would make sense to allow the user to specify
> > > the Resource ID in standalone setups?  For example, many users still
> > > implicitly use standalone clusters on Kubernetes (the native support is
> > > still experimental) and in these cases it would be interesting to also
> > set
> > > the PodName as the ResourceID. What do you think?
> > >
> > > Cheers,
> > >
> > > Kosntantin
> > >
> > > On Thu, Mar 26, 2020 at 6:49 PM Till Rohrmann 
> > > wrote:
> > >
> > > > Hi Yangze,
> > > >
> > > > thanks for creating this FLIP. I think it is a very good improvement
> > > > helping our users and ourselves understand better what's going on in
> > > > Flink.
> > > >
> > > > Creating the ResourceIDs with host information/pod name is a good idea.
> > > >
> > > > Also deriving ExecutionGraph IDs from their superset ID is a good idea.
> > > >
> > > > The InstanceID is used for fencing purposes. I would not make it a
> > > > composition of the ResourceID + a monotonically increasing number. The
> > > > problem is that in case of a RM failure the InstanceIDs would start
> > from
> > > 0
> > > > again and this could lead to collisions.
> > > >
> > > > Logging more information on how the different runtime IDs are
> > correlated
> > > is
> > > > also a good idea.
> > > >
> > > > Two other ideas for simplifying the ids are the following:
> > > >
> > > > * The SlotRequestID was introduced because the SlotPool was a separate
> > > > RpcEndpoint a while ago. With this no longer being the case I think we
> > > > could remove the SlotRequestID and replace it with the AllocationID.
> > > > * Instead of creating new SlotRequestIDs for multi task slots one could
> > > > derive them from the SlotRequestID used for requesting the underlying
> > > > AllocatedSlot.
> > > >
> > > > Given that the slot sharing logic will most likely be reworked with the
> > > > pipelined region scheduling, we might be able to resolve these two
> > points
> > > > as part of the pipelined region scheduling effort.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Mar 26, 2020 at 10:51 AM Yangze Guo 
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > We would like to start a discussion thread on "FLIP-118: Improve
> > > > > Flink’s ID system"[1].
> > > > >
> > > > > This FLIP mainly discusses the following issues, target to enhance
> > the
> > > > > readability of IDs in log and help user to debug in case of failures:
> > > > >
> > > > > - Enhance the readability of the string literals of IDs. Most of them
> > > > > are hashcodes, e.g. ExecutionAttemptID, which do not provide much
> > > > > meaningful information and are hard to recognize and compare for
> > > > > users.
> > > > > - Log the ID’s lineage information to make debugging more convenient.
> > > > > Currently, the log fails to always show the lineage information
> > > > > between IDs. Finding out relationships betw

Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-30 Thread Zhu Zhu
Thanks for the nice suggestion Till.
The section 'Bulk Slot Allocation' is updated.

Thanks,
Zhu Zhu

Gary Yao  于2020年3月30日周一 下午3:38写道:

> >
> > The links work for me now. Someone might have fixed them. Never mind.
> >
>
> Actually, I fixed the links after seeing your email. Thanks for reporting.
>
> Best,
> Gary
>
> On Mon, Mar 30, 2020 at 3:48 AM Xintong Song 
> wrote:
>
> > @ZhuZhu
> >
> > The links work for me now. Someone might have fixed them. Never mind.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Mar 30, 2020 at 1:31 AM Zhu Zhu  wrote:
> >
> > > Thanks for the comments!
> > >
> > > To Xintong,
> > > It's a bit strange since the in page links work as expected. Would you
> > take
> > > another try?
> > >
> > > To Till,
> > > - Regarding the idea to improve to SlotProvider interface
> > > I think it is a good idea and thanks a lot! In the current design we
> make
> > > slot requests for batch jobs to wait for resources without timeout as
> > long
> > > as the JM see enough slots overall. This implicitly add assumption that
> > > tasks can finish and slots are be returned. This, however, would not
> work
> > > in the mixed bounded/unbounded workloads as you mentioned.
> > > Your idea looks more clear that it always allow slot allocations to
> wait
> > > and not time out as long as it see enough slots. And the 'enough' check
> > is
> > > with regard to slots that can be returned (for bounded tasks) and slots
> > > that will be occupied forever (for unbounded tasks), so that streaming
> > jobs
> > > can naturally throw slot allocation timeout errors if the cluster does
> > not
> > > have enough resources for all the tasks to run at the same time.
> > > I will take a deeper thought to see how we can implement it this way.
> > >
> > > - Regarding the idea to solve "Resource deadlocks when slot allocation
> > > competition happens between multiple jobs in a session cluster"
> > > Agreed it's also possible to let the RM to revoke the slots to unblock
> > the
> > > oldest bulk of requests first. That would require some extra work to
> > change
> > > the RM to holds the requests before it is sure the slots are
> successfully
> > > assigned to the JM (currently the RM removes pending requests right
> after
> > > the requests are sent to TM without confirming wether the slot offers
> > > succeed). We can look deeper into it later when we are about to support
> > > variant sizes slots.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > >
> > > Till Rohrmann  于2020年3月27日周五 下午10:59写道:
> > >
> > > > Thanks for creating this FLIP Zhu Zhu and Gary!
> > > >
> > > > +1 for adding pipelined region scheduling.
> > > >
> > > > Concerning the extended SlotProvider interface I have an idea how we
> > > could
> > > > further improve it. If I am not mistaken, then you have proposed to
> > > > introduce the two timeouts in order to distinguish between batch and
> > > > streaming jobs and to encode that batch job requests can wait if
> there
> > > are
> > > > enough resources in the SlotPool (not necessarily being available
> right
> > > > now). I think what we actually need to tell the SlotProvider is
> > whether a
> > > > request will use the slot only for a limited time or not. This is
> > exactly
> > > > the difference between processing bounded and unbounded streams. If
> the
> > > > SlotProvider knows this difference, then it can tell which slots will
> > > > eventually be reusable and which not. Based on this it can tell
> > whether a
> > > > slot request can be fulfilled eventually or whether we fail after the
> > > > specified timeout. Another benefit of this approach would be that we
> > can
> > > > easily support mixed bounded/unbounded workloads. What we would need
> to
> > > > know for this approach is whether a pipelined region is processing a
> > > > bounded or unbounded stream.
> > > >
> > > > To give an example let's assume we request the following sets of
> slots
> > > > where each pipelined region requires the same slots:
> > > >
> > > > slotProvider.allocateSlots(pr1_bounded, timeout);
> > > > slotProvider.allocateSlots(pr2_unbounded, timeout);
> > > > slotProvider.allocateSlots(pr3_bounded, timeout);
> > > >
> > > > Let's assume we receive slots for pr1_bounded in < timeout and can
> then
> > > > fulfill the request. Then we request pr2_unbounded. Since we know
> that
> > > > pr1_bounded will complete eventually, we don't fail this request
> after
> > > > timeout. Next we request pr3_bounded after pr2_unbounded has been
> > > > completed. In this case, we see that we need to request new resources
> > > > because pr2_unbounded won't release its slots. Hence, if we cannot
> > > allocate
> > > > new resources within timeout, we fail this request.
> > > >
> > > > A small comment concerning "Resource deadlocks when slot allocation
> > > > competition happens between multiple jobs in a session cluster":
> > Another
> > > > idea to solve this situation would be to give the ResourceManager the
> > > rig

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi, Timo & Jark

Thanks for your explanation.
Agree with you that streaming queries should always be executed asynchronously,
and the sync execution scenario can be covered by async execution.
It helps provide a unified entry point for batch and streaming.
I think we can also use sync execution for some testing.
So, I agree that we provide the `executeSql` method and make it an async
method.
If we want a sync method in the future, we can add a method named
`executeSqlSync`. A rough usage sketch follows for reference.
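
A minimal sketch of how the async `executeSql` could be used (the method names
follow the proposal in this thread and FLIP-84, not a released API; the table
names are placeholders):

// Sketch only, based on the proposed API.
TableResult result = tEnv.executeSql("INSERT INTO sink_table SELECT * FROM source_table");
// For DML/DQL the result carries a JobClient, so the caller is not blocked
// and can track the submitted job.
result.getJobClient().ifPresent(client ->
        client.getJobStatus().thenAccept(status ->
                System.out.println("Job status: " + status)));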

I think we've reached an agreement. I will update the document and start the
voting process.

Best,
Godfrey


Jark Wu  于2020年3月31日周二 上午12:46写道:

> Hi,
>
> I didn't follow the full discussion.
> But I share the same concern with Timo that streaming queries should always
> be async.
> Otherwise, I can imagine it will cause a lot of confusion and problems if
> users don't deeply keep the "sync" in mind (e.g. client hangs).
> Besides, the streaming mode is still the majority use cases of Flink and
> Flink SQL. We should put the usability at a high priority.
>
> Best,
> Jark
>
>
> On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:
>
> > Hi Godfrey,
> >
> > maybe I wasn't expressing my biggest concern enough in my last mail.
> > Even in a singleline and sync execution, I think that streaming queries
> > should not block the execution. Otherwise it is not possible to call
> > collect() or print() on them afterwards.
> >
> > "there are too many things need to discuss for multiline":
> >
> > True, I don't want to solve all of them right now. But what I know is
> > that our newly introduced methods should fit into a multiline execution.
> > There is no big difference of calling `executeSql(A), executeSql(B)` and
> > processing a multiline file `A;\nB;`.
> >
> > I think the example that you mentioned can simply be undefined for now.
> > Currently, no catalog is modifying data but just metadata. This is a
> > separate discussion.
> >
> > "result of the second statement is indeterministic":
> >
> > Sure this is indeterministic. But this is the implementers fault and we
> > cannot forbid such pipelines.
> >
> > How about we always execute streaming queries async? It would unblock
> > executeSql() and multiline statements.
> >
> > Having a `executeSqlAsync()` is useful for batch. However, I don't want
> > `sync/async` be the new batch/stream flag. The execution behavior should
> > come from the query itself.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.03.20 11:12, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Agree with you that streaming queries are our top priority,
> > > but I think there are too many things that need to be discussed for multiline
> > > statements:
> > > e.g.
> > > 1. what's the behavior of DDL and DML mixing for async execution:
> > > create table t1 xxx;
> > > create table t2 xxx;
> > > insert into t2 select * from t1 where xxx;
> > > drop table t1; // t1 may be a MySQL table, the data will also be
> deleted.
> > >
> > > t1 is dropped when "insert" job is running.
> > >
> > > 2. what's the behavior of the unified scenario for async execution: (as you
> > > mentioned)
> > > INSERT INTO t1 SELECT * FROM s;
> > > INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> > >
> > > The result of the second statement is indeterministic, because the
> first
> > > statement may still be running.
> > > I think we need to put a lot of effort to define the behavior of
> > logically
> > > related queries.
> > >
> > > In this FLIP, I suggest we only handle single statement, and we also
> > > introduce an async execute method
> > > which is more important and more often used for users.
> > >
> > > For the sync methods (like `TableEnvironment.executeSql` and
> > > `StatementSet.execute`),
> > > the result will be returned until the job is finished. The following
> > > methods will be introduced in this FLIP:
> > >
> > >   /**
> > >* Asynchronously execute the given single statement
> > >*/
> > > TableEnvironment.executeSqlAsync(String statement): TableResult
> > >
> > > /**
> > >   * Asynchronously execute the dml statements as a batch
> > >   */
> > > StatementSet.executeAsync(): TableResult
> > >
> > > public interface TableResult {
> > > /**
> > >  * return JobClient for DQL and DML in async mode, else return
> > > Optional.empty
> > >  */
> > > Optional<JobClient> getJobClient();
> > > }
> > >
> > > what do you think?
> > >
> > > Best,
> > > Godfrey
> > >
> > > Timo Walther  于2020年3月26日周四 下午9:15写道:
> > >
> > >> Hi Godfrey,
> > >>
> > >> executing streaming queries must be our top priority because this is
> > >> what distinguishes Flink from competitors. If we change the execution
> > >> behavior, we should think about the other cases as well to not break
> the
> > >> API a third time.
> > >>
> > >> I fear that just having an async execute method will not be enough
> > >> because users should be able to mix streaming and batch queries in a
> > >> unified scenario.
> > >>
> > >> If I remember it correctly, we had some discussions in the past about
> > >> what decides about the execut

[jira] [Created] (FLINK-16876) Make TtlTimeProvider configurable when creating keyed state backend

2020-03-30 Thread Yun Tang (Jira)
Yun Tang created FLINK-16876:


 Summary: Make TtlTimeProvider configurable when creating keyed 
state backend
 Key: FLINK-16876
 URL: https://issues.apache.org/jira/browse/FLINK-16876
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.10.0
Reporter: Yun Tang
 Fix For: 1.11.0


Currently, we always use TtlTimeProvider.DEFAULT to create the keyed state 
backend. This is somewhat acceptable since we only support processing time for 
TTL now. However, it makes unit tests that verify TTL logic, such as 
FLINK-16581, less convenient.

I propose to make TtlTimeProvider configurable when creating the keyed state 
backend so that it does not block other feature development.
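
For example, a test could then drive TTL expiry deterministically with a manual
time provider (a rough sketch; TtlTimeProvider only exposes currentTimestamp(),
and the surrounding backend/test setup is omitted):

// Sketch: a manually advanced TtlTimeProvider for tests, used in place of
// TtlTimeProvider.DEFAULT when building the keyed state backend.
long[] currentTime = {0L};
TtlTimeProvider manualTimeProvider = () -> currentTime[0];
// ... create the keyed state backend with `manualTimeProvider`, write state
// with a TTL of 10 minutes, then advance time and assert the value expired:
currentTime[0] += Duration.ofMinutes(10).toMillis() + 1;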



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16875) Tracking replace QueryConfig and TableConfig implementation with ConfigOption

2020-03-30 Thread jinhai (Jira)
jinhai created FLINK-16875:
--

 Summary: Tracking replace QueryConfig and TableConfig 
implementation with ConfigOption
 Key: FLINK-16875
 URL: https://issues.apache.org/jira/browse/FLINK-16875
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: jinhai


Tracking issue for replacing the QueryConfig and TableConfig implementations 
with ConfigOption.

First, remove QueryConfig, which is integrated into TableConfig. Issue: 
https://issues.apache.org/jira/browse/FLINK-13691

Second, consider removing {{TableConfig}} and fully relying on a 
Configuration-based object with {{ConfigOption}}s. Issue: 
https://issues.apache.org/jira/browse/FLINK-16835



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Tzu-Li (Gordon) Tai
+1 (binding)

** Legal **
- checksums and GPG files match corresponding release files
- Source distribution does not contain binaries, contents are sane (no
.git* / .travis* / generated html content files)
- Bundled source LICENSEs and NOTICE looks good. Mentions bundled
font-awesome dependency in docs and copied sources from fastutil (
http://fastutil.di.unimi.it/)
- Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
Artifacts that do bundle dependencies are: statefun-flink-distribution,
statefun-ridesharing-example-simulator, statefun-flink-core (copied
sources).
- Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
NOTICE files (no bundled dependencies)
- All POMs / README / Python SDK setup.py / Dockerfiles / doc configs point
to same version “2.0.0”
- README looks good

** Functional **
- Building from source dist with end-to-end tests enabled (mvn clean verify
-Prun-e2e-tests) passes (JDK 8)
- Generated quickstart from archetype looks good (correct POM / Dockerfile
/ service file)
- Examples run: Java Greeter / Java Ridesharing / Python Greeter / Python
SDK Walkthrough
- Flink Harness works in IDE
- Test remote functions deployment mode with AWS ecosystem: remote Python
functions running in AWS Lambda behind AWS API Gateway, Java embedded
functions running in AWS ECS

On Tue, Mar 31, 2020 at 12:09 PM Tzu-Li (Gordon) Tai 
wrote:

> FYI - I've also updated the website Downloads page to include this release.
> Please also consider that for your reviews:
> https://github.com/apache/flink-web/pull/318
>
> On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf 
> wrote:
>
>> Hi Gordon,
>>
>> +1 (non-binding)
>>
>> * Maven build from source...check
>> * Python build from source...check
>> * Went through Walkthrough based on local builds...check
>>
>> Cheers,
>>
>> Konstantin
>>
>> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > Please review and vote on the *release candidate #4* for the version
>> 2.0.0
>> > of Apache Flink Stateful Functions,
>> > as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> > **Testing Guideline**
>> >
>> > You can find here [1] a doc that we can use for collaborating testing
>> > efforts.
>> > The listed testing tasks in the doc also serve as a guideline in what to
>> > test for this release.
>> > If you wish to take ownership of a testing task, simply put your name
>> down
>> > in the "Checked by" field of the task.
>> >
>> > **Release Overview**
>> >
>> > As an overview, the release consists of the following:
>> > a) Stateful Functions canonical source distribution, to be deployed to
>> the
>> > release repository at dist.apache.org
>> > b) Stateful Functions Python SDK distributions to be deployed to PyPI
>> > c) Maven artifacts to be deployed to the Maven Central Repository
>> >
>> > **Staging Areas to Review**
>> >
>> > The staging areas containing the above mentioned artifacts are as
>> follows,
>> > for your review:
>> > * All artifacts for a) and b) can be found in the corresponding dev
>> > repository at dist.apache.org [2]
>> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
>> >
> > All artifacts are signed with the
>> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
>> >
>> > Other links for your review:
>> > * JIRA release notes [5]
>> > * source code tag "release-2.0.0-rc4" [6] [7]
>> >
>> > **Extra Remarks**
>> >
>> > * Part of the release is also official Docker images for Stateful
>> > Functions. This can be a separate process, since the creation of those
>> > relies on the fact that we have distribution jars already deployed to
>> > Maven. I will follow-up with this after these artifacts are officially
>> > released.
>> > In the meantime, there is this discussion [8] ongoing about where to
>> host
>> > the StateFun Dockerfiles.
>> > * The Flink Website and blog post is also being worked on (by Marta) as
>> > part of the release, to incorporate the new Stateful Functions project.
>> We
>> > can follow up with a link to those changes afterwards in this vote
>> thread,
>> > but that would not block you to test and cast your votes already.
>> > * Since the Flink website changes are still being worked on, you will
>> not
>> > yet be able to find the Stateful Functions docs from there. Here are the
>> > links [9] [10].
>> >
>> > **Vote Duration**
>> >
>> > Since this RC only fixes licensing issues from previous RCs,
>> > and the code itself has not been touched,
>> > I'd like to stick with the original vote ending time.
>> >
>> > The vote will be open for at least 72 hours starting Monday
>> > *(target end date is Wednesday, April 1st).*
>> > It is adopted by majority approval, with at least 3 PMC affirmative
>> votes.
>> >
>> > Thanks,
>> > Gordon
>> >
>> > [1]
>> >
>> >
>> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
>> > [2]
>> https://dist.a

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Tzu-Li (Gordon) Tai
FYI - I've also updated the website Downloads page to include this release.
Please also consider that for your reviews:
https://github.com/apache/flink-web/pull/318

On Tue, Mar 31, 2020 at 3:42 AM Konstantin Knauf 
wrote:

> Hi Gordon,
>
> +1 (non-binding)
>
> * Maven build from source...check
> * Python build from source...check
> * Went through Walkthrough based on local builds...check
>
> Cheers,
>
> Konstantin
>
> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the *release candidate #4* for the version
> 2.0.0
> > of Apache Flink Stateful Functions,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Testing Guideline**
> >
> > You can find here [1] a doc that we can use for collaborating testing
> > efforts.
> > The listed testing tasks in the doc also serve as a guideline in what to
> > test for this release.
> > If you wish to take ownership of a testing task, simply put your name
> down
> > in the "Checked by" field of the task.
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Stateful Functions canonical source distribution, to be deployed to
> the
> > release repository at dist.apache.org
> > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > c) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above mentioned artifacts are as
> follows,
> > for your review:
> > * All artifacts for a) and b) can be found in the corresponding dev
> > repository at dist.apache.org [2]
> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> >
> > All artifacts are signed with the
> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> >
> > Other links for your review:
> > * JIRA release notes [5]
> > * source code tag "release-2.0.0-rc4" [6] [7]
> >
> > **Extra Remarks**
> >
> > * Part of the release is also official Docker images for Stateful
> > Functions. This can be a separate process, since the creation of those
> > relies on the fact that we have distribution jars already deployed to
> > Maven. I will follow-up with this after these artifacts are officially
> > released.
> > In the meantime, there is this discussion [8] ongoing about where to host
> > the StateFun Dockerfiles.
> > * The Flink Website and blog post is also being worked on (by Marta) as
> > part of the release, to incorporate the new Stateful Functions project.
> We
> > can follow up with a link to those changes afterwards in this vote
> thread,
> > but that would not block you to test and cast your votes already.
> > * Since the Flink website changes are still being worked on, you will not
> > yet be able to find the Stateful Functions docs from there. Here are the
> > links [9] [10].
> >
> > **Vote Duration**
> >
> > Since this RC only fixes licensing issues from previous RCs,
> > and the code itself has not been touched,
> > I'd like to stick with the original vote ending time.
> >
> > The vote will be open for at least 72 hours starting Monday
> > *(target end date is Wednesday, April 1st).*
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Thanks,
> > Gordon
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
> > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc4/
> > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1343/
> > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [5]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
> > [6]
> >
> >
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=5d5d62fca2dbe3c75e8157b7ce67d4d4ce12ffd9
> > [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc4
> > [8]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
> > [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> > [10]
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
> >
> > TIP: You can create a `settings.xml` file with these contents:
> >
> > """
> > <settings>
> >   <activeProfiles>
> >     <activeProfile>flink-statefun-2.0.0</activeProfile>
> >   </activeProfiles>
> >   <profiles>
> >     <profile>
> >       <id>flink-statefun-2.0.0</id>
> >       <repositories>
> >         <repository>
> >           <id>flink-statefun-2.0.0</id>
> >           <url>
> > https://repository.apache.org/content/repositories/orgapacheflink-1343/
> >           </url>
> >         </repository>
> >         <repository>
> >           <id>archetype</id>
> >           <url>
> > https://repository.apache.org/content/repositories/orgapacheflink-1343/
> >           </url>
> >         </repository>
> >       </repositories>
> >     </profile>
> >   </profiles>
> > </settings>
> > """
> >
> > And reference that in your maven commands via `--settings
> > path/to/settings.xml`.
> > This is useful for creating a quickstart based on th

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Danny Chan
Thanks Jark for bring up this discussion, +1 for this idea, I believe the user 
has suffered from the verbose property key for long time.

Just one question, how do we handle the keys with wildcard, such as the 
“connector.properties.*” in Kafka connector which would then hand-over to Kafka 
client directly. As what suggested in FLIP-95, we use a ConfigOption to 
describe the “supported properties”, then I have to concerns:

• For the new keys, do we still need to put multi-lines there the such key, 
such as “connector.properties.abc” “connector.properties.def”, or should we 
inline them, such as “some-key-prefix” = “k1=v1, k2=v2 ..."
• Should the ConfigOption support the wildcard ? (If we plan to support the 
current multi-line style)


Best,
Danny Chan
在 2020年3月31日 +0800 AM12:37,Jark Wu ,写道:
> Hi all,
>
> Thanks for the feedbacks.
>
> It seems that we have a conclusion to put the version into the factory
> identifier. I'm also fine with this.
> If we have this outcome, the interface of Factory#factoryVersion is not
> needed anymore, this can simplify the learning cost of new factory.
> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> 
>
> kafka => kafka for 0.11+ versions, we don't suffix "-universal", because
> the meaning of "universal" is not easy to understand.
> kafka-0.11 => kafka for 0.11 version
> kafka-0.10 => kafka for 0.10 version
> elasticsearch-6 => elasticsearch for 6.x versions
> elasticsearch-7 => elasticsearch for 7.x versions
> hbase-1.4 => hbase for 1.4.x versions
> jdbc
> filesystem
>
> We use "-" as the version delimiter to make them more consistent.
> This is not forced; users can also use other delimiters or no
> delimiter.
> But this can be a guide in the Javadoc of Factory, to make the connector
> ecosystem more consistent.
>
> What do you think?
>
> 
>
> Regarding "connector.property-version":
>
> Hi @Dawid Wysakowicz  , the new factories are
> designed not to support reading the current properties.
> All the current properties are routed to the old factories if they are
> using "connector.type". Otherwise, properties are routed to the new factories.
>
> If I understand correctly, the "connector.property-version" is attached
> implicitly by system, not manually set by users.
> For example, the framework should add "connector.property-version=1" to
> properties when processing DDL statement.
> I'm fine to add a "connector.property-version=1" when processing DDL
> statement, but I think it's also fine if we don't,
> because this can be done in the future if needed, and the default version can
> be 1.
>
> Best,
> Jark
>
> On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:
>
> > Hi all,
> >
> > Thanks for the feedbacks.
> >
> > It seems that we have a conclusion to put the version into the factory
> > identifier. I'm also fine with this.
> > If we have this outcome, the interface of Factory#factoryVersion is not
> > needed anymore, this can simplify the learning cost of new factory.
> > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> > 
> >
> > Btw, I would like to use "_" instead of "-" as the version delimiter,
> > because "-" looks like minus and may confuse users, e.g. "elasticsearch-6".
> > This is not forced, but should be a guide in the Javadoc of Factory.
> > I propose to use the following identifiers for existing connectors,
> >
> > kafka => kafka for 0.11+ versions, we don't suffix "-universal", because
> > the meaning of "universal" is not easy to understand.
> > kafka-0.11 => kafka for 0.11 version
> > kafka-0.10 => kafka for 0.10 version
> > elasticsearch-6 => elasticsearch for 6.x versions
> > elasticsearch-7 => elasticsearch for 7.x versions
> > hbase-1.4 => hbase for 1.4.x versions
> > jdbc
> > filesystem
> >
> > We use "-" as the version delimiter to make them more consistent.
> > This is not forced; users can also use other delimiters or no
> > delimiter.
> > But this can be a guide in the Javadoc of Factory, to make the connector
> > ecosystem more consistent.
> >
> > What do you think?
> >
> > 
> >
> > Regarding "connector.property-version":
> >
> > Hi @Dawid Wysakowicz  , the new factories are
> > designed not to support reading the current properties.
> > All the current properties are routed to the old factories if they are
> > using "connector.type". Otherwise, properties are routed to the new factories.
> >
> > If I understand correctly, the "connector.property-version" is attached
> > implicitly by system, not manually set by users.
> > For example, the framework should add "connector.property-version=1" to
> > properties when processing DDL statement.
> > I'm fine to add a "connector.property-version=1" when processing DDL
> > statement, but I think it's also fine if we don't,
> > because this can be done in the future if needed, and the default version can
> > be 1.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> >
> > On Mon, 30 Mar 2020 at 23:21, Dawid Wysakowicz 
> > wrote:
> >

Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Jingsong Li
Thanks Jeff very much, that is very impressive.

Zeppelin is very convenient development platform.

Best,
Jingsong Lee

On Tue, Mar 31, 2020 at 11:58 AM Zhijiang 
wrote:

>
> Thanks for the continuous efforts for engaging in Flink ecosystem Jeff!
> Glad to see the progressive achievement. Wish more users try it out in
> practice.
>
> Best,
> Zhijiang
>
>
> --
> From:Dian Fu 
> Send Time:2020 Mar. 31 (Tue.) 10:15
> To:Jeff Zhang 
> Cc:user ; dev 
> Subject:Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)
>
> Hi Jeff,
>
> Thanks for the great work and sharing it with the community! Very
> impressive and will try it out.
>
> Regards,
> Dian
>
> 在 2020年3月30日,下午9:16,Till Rohrmann  写道:
>
> This is great news Jeff! Thanks a lot for sharing it with the community.
> Looking forward trying Flink on Zeppelin out :-)
>
> Cheers,
> Till
>
> On Mon, Mar 30, 2020 at 2:47 PM Jeff Zhang  wrote:
> Hi Folks,
>
> I am very excited to announce that the integration work of Flink on Apache
> Zeppelin notebook is completed. You can now run Flink jobs via DataStream
> API, Table API, SQL, and PyFlink in an Apache Zeppelin notebook. Download
> it here: http://zeppelin.apache.org/download.html
>
> Here's some highlights of this work
>
> 1. Support for 3 kinds of execution modes: local, remote, yarn
> 2. Support for multiple languages in one Flink session: scala, python, sql
> 3. Support hive connector (reading from hive and writing to hive)
> 4. Dependency management
> 5. UDF support (scala, pyflink)
> 6. Support both batch sql and streaming sql
>
> For more details and usage instructions, you can refer following 4 blogs
>
> 1) Get started https://link.medium.com/oppqD6dIg5
> 2) Batch https://link.medium.com/3qumbwRIg5
> 3) Streaming https://link.medium.com/RBHa2lTIg5
> 4) Advanced usage https://link.medium.com/CAekyoXIg5
> 
>
> Welcome to use flink on zeppelin and give feedback and comments.
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>

-- 
Best, Jingsong Lee


Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Zhijiang

Thanks for the continuous efforts in engaging with the Flink ecosystem, Jeff!
Glad to see the progressive achievements. Hope more users will try it out in practice.

Best,
Zhijiang



--
From:Dian Fu 
Send Time:2020 Mar. 31 (Tue.) 10:15
To:Jeff Zhang 
Cc:user ; dev 
Subject:Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

Hi Jeff,

Thanks for the great work and sharing it with the community! Very impressive 
and will try it out.

Regards,
Dian

在 2020年3月30日,下午9:16,Till Rohrmann  写道:
This is great news Jeff! Thanks a lot for sharing it with the community. 
Looking forward trying Flink on Zeppelin out :-)

Cheers,
Till
On Mon, Mar 30, 2020 at 2:47 PM Jeff Zhang  wrote:
Hi Folks,

I am very excited to announce that the integration work of Flink on Apache Zeppelin 
notebook is completed. You can now run Flink jobs via DataStream API, Table 
API, SQL, and PyFlink in an Apache Zeppelin notebook. Download it here: 
http://zeppelin.apache.org/download.html

Here's some highlights of this work

1. Support for 3 kinds of execution modes: local, remote, yarn
2. Support for multiple languages in one Flink session: scala, python, sql
3. Support hive connector (reading from hive and writing to hive) 
4. Dependency management
5. UDF support (scala, pyflink)
6. Support both batch sql and streaming sql

For more details and usage instructions, you can refer following 4 blogs

1) Get started https://link.medium.com/oppqD6dIg5 2) Batch 
https://link.medium.com/3qumbwRIg5 3) Streaming 
https://link.medium.com/RBHa2lTIg5 4) Advanced usage 
https://link.medium.com/CAekyoXIg5

Welcome to use flink on zeppelin and give feedback and comments. 

-- 
Best Regards

Jeff Zhang



[VOTE] FLIP-113: Supports Dynamic Table Options for Flink SQL

2020-03-30 Thread Danny Chan
Hi all,

I would like to start the vote for FLIP-113 [1], which is discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Danny Chan

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
[2]
https://lists.apache.org/thread.html/r94af5d3d97e76e7dd9df68cb0becc7ba74d15591a8fab84c72fa%40%3Cdev.flink.apache.org%3E


[jira] [Created] (FLINK-16874) Respect the dynamic options when calculating memory options in taskmanager.sh

2020-03-30 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-16874:
--

 Summary: Respect the dynamic options when calculating memory 
options in taskmanager.sh
 Key: FLINK-16874
 URL: https://issues.apache.org/jira/browse/FLINK-16874
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Yangze Guo
 Fix For: 1.10.1, 1.11.0


Since FLINK-9821, taskmanager.sh passes user-defined dynamic options to 
the TaskManagerRunner. However, in FLINK-13983, we calculate the memory-related 
configuration only according to the FLINK_CONF_DIR. Since we then append the 
calculation result as dynamic options for the TM, the user-defined dynamic 
options would be overridden and ignored.
BashJavaUtils already supports loading dynamic options; we just need to 
pass them in the bash script.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Yu Li
+1 (non-binding)

Checked sums and signatures: OK
Checked no binaries in source distribution: OK
Checked RAT and end-to-end tests (8u101, 11.0.4): OK
Checked version in pom/README/setup.py files: OK
Checked quick start: OK
Checked Greeter local docker-compose examples: OK
Checked Ridesharing local docker-compose examples: OK

Minor: encountered a problem when running the ride-sharing example with a 2GB
docker machine and confirmed the issue disappeared after tuning memory up to
4GB. Opened FLINK-16873 [1] to improve the README, but this is not a blocker.

Best Regards,
Yu

[1] https://issues.apache.org/jira/browse/FLINK-16873


On Tue, 31 Mar 2020 at 03:42, Konstantin Knauf 
wrote:

> Hi Gordon,
>
> +1 (non-binding)
>
> * Maven build from source...check
> * Python build from source...check
> * Went through Walkthrough based on local builds...check
>
> Cheers,
>
> Konstantin
>
> On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the *release candidate #4* for the version
> 2.0.0
> > of Apache Flink Stateful Functions,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Testing Guideline**
> >
> > You can find here [1] a doc that we can use for collaborating testing
> > efforts.
> > The listed testing tasks in the doc also serve as a guideline in what to
> > test for this release.
> > If you wish to take ownership of a testing task, simply put your name
> down
> > in the "Checked by" field of the task.
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Stateful Functions canonical source distribution, to be deployed to
> the
> > release repository at dist.apache.org
> > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > c) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above mentioned artifacts are as
> follows,
> > for your review:
> > * All artifacts for a) and b) can be found in the corresponding dev
> > repository at dist.apache.org [2]
> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> >
> > All artifacts are signed with the
> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> >
> > Other links for your review:
> > * JIRA release notes [5]
> > * source code tag "release-2.0.0-rc4" [6] [7]
> >
> > **Extra Remarks**
> >
> > * Part of the release is also official Docker images for Stateful
> > Functions. This can be a separate process, since the creation of those
> > relies on the fact that we have distribution jars already deployed to
> > Maven. I will follow-up with this after these artifacts are officially
> > released.
> > In the meantime, there is this discussion [8] ongoing about where to host
> > the StateFun Dockerfiles.
> > * The Flink Website and blog post is also being worked on (by Marta) as
> > part of the release, to incorporate the new Stateful Functions project.
> We
> > can follow up with a link to those changes afterwards in this vote
> thread,
> > but that would not block you to test and cast your votes already.
> > * Since the Flink website changes are still being worked on, you will not
> > yet be able to find the Stateful Functions docs from there. Here are the
> > links [9] [10].
> >
> > **Vote Duration**
> >
> > Since this RC only fixes licensing issues from previous RCs,
> > and the code itself has not been touched,
> > I'd like to stick with the original vote ending time.
> >
> > The vote will be open for at least 72 hours starting Monday
> > *(target end date is Wednesday, April 1st).*
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Thanks,
> > Gordon
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
> > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc4/
> > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1343/
> > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [5]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
> > [6]
> >
> >
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=5d5d62fca2dbe3c75e8157b7ce67d4d4ce12ffd9
> > [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc4
> > [8]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
> > [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> > [10]
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
> >
> > TIP: You can create a `settings.xml` file with these contents:
> >
> > """
> > 
> >   
> > flink-statefun-2.0.0
> >   
> >   
> > 
> >   flink-statefun-2.0.0
> >   
> > 
> >   flink-stat

[jira] [Created] (FLINK-16873) Document the suggested memory size for statefun docker machine

2020-03-30 Thread Yu Li (Jira)
Yu Li created FLINK-16873:
-

 Summary: Document the suggested memory size for statefun docker 
machine
 Key: FLINK-16873
 URL: https://issues.apache.org/jira/browse/FLINK-16873
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Stateful Functions
Affects Versions: statefun-2.0
Reporter: Yu Li
 Fix For: statefun-2.1


When testing the ride-sharing example for the Stateful Functions 2.0.0 release 
candidate, I found it consistently fails on my local Mac with 2GB as the default 
memory setting in Docker Desktop. After tuning the memory up to 4GB, the issue 
disappeared.

Based on this experience, I suggest we explicitly document the recommended 
memory size for the Stateful Functions docker machine in the README.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] FLIP-114: Support Python UDF in SQL Client

2020-03-30 Thread Wei Zhong
Hi all,

I would like to start the vote for FLIP-114[1] which is discussed and reached 
consensus in the discussion thread[2].

The vote will be open for at least 72 hours. I'll try to close it by 2020-04-03 
03:00 UTC, unless there is an objection or not enough votes.

Best,
Wei

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-114-Support-Python-UDF-in-SQL-Client-td38655.html



Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-30 Thread Yangze Guo
Hi everyone,
I've updated the FLIP accordingly. The key change is replacing two
resource allocation interfaces to config options.

If there are no further comments, I would like to start a voting
thread by tomorrow.

Best,
Yangze Guo

On Mon, Mar 30, 2020 at 9:15 PM Till Rohrmann  wrote:
>
> If there is no need for the ExternalResourceDriver on the RM side, then it
> is always a good idea to keep it simple and don't introduce it. One can
> always change things once one realizes that there is a need for it.
>
> Cheers,
> Till
>
> On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo  wrote:
>
> > Hi @Till, @Xintong
> >
> > I think even without the credential concerns, replacing the interfaces
> > with configuration options is a good idea from my side.
> > - Currently, I don't see any external resource that is incompatible
> > with this mechanism.
> > - It reduces the burden on users of having to implement a plugin themselves.
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Mar 30, 2020 at 5:44 PM Xintong Song 
> > wrote:
> > >
> > > I also agree that the pluggable ExternalResourceDriver should be loaded
> > by
> > > the cluster class loader. Despite the plugin might be implemented by
> > users,
> > > external resources (as part of task executor resources) should be cluster
> > > configurations, unlike job-level user codes such as UDFs, because the
> > task
> > > executors belongs to the cluster rather than jobs.
> > >
> > >
> > > IIUC, the concern Stephan raised is about the potential credential
> > problem
> > > when executing user codes on RM with cluster class loader. The concern
> > > makes sense to me, and I think what Yangze suggested should be a good
> > > approach trying to prevent such credential problems. The only purpose we
> > > tried to execute user codes (i.e. getKubernetes/YarnExternalResource) on
> > RM
> > > was that, we need to set these key-value pairs to pod/container requests.
> > > Replacing the interfaces getKubernetes/YarnExternalResource with
> > > configuration options
> > > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > > we can still fulfill that purpose, without the credential risks.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann 
> > wrote:
> > >
> > > > At the moment the RM does not have a user code class loader and I agree
> > > > with Stephan that it should stay like this. This, however, does not
> > mean
> > > > that we cannot support pluggable components in the RM. As long as the
> > > > plugins are on the system's class path, it should be fine for the RM to
> > > > load them. For example, we could add external resources via Flink's
> > plugin
> > > > mechanism or something similar.
> > > >
> > > > A very simple implementation of such an ExternalResourceDriver could
> > be a
> > > > class which simply returns what is written in the flink-conf.yaml
> > under a
> > > > given key.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo  wrote:
> > > >
> > > > > Hi, Stephan,
> > > > >
> > > > > I see your concern and I totally agree with you.
> > > > >
> > > > > The interface on RM side is now `Map
> > > > > getYarn/KubernetesExternalResource()`. The only valid information RM
> > > > > get from it is the configuration key of that external resource in
> > > > > Yarn/K8s. The "String/Long value" would be the same as the
> > > > > external-resource.{resourceName}.amount.
> > > > > So, I think it makes sense to replace these two interfaces with two
> > > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key.
> > We
> > > > > may lose some extensibility, but AFAIK it could work with common
> > > > > external resources like GPU, FPGA. WDYT?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen 
> > wrote:
> > > > > >
> > > > > > Maybe one final comment: It is probably not an issue, but let's
> > try and
> > > > > > keep user code (via user code classloader) out of the
> > ResourceManager,
> > > > if
> > > > > > possible.
> > > > > >
> > > > > > As background:
> > > > > >
> > > > > > There were thoughts in the past to support setups where the RM
> > must run
> > > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > > credentials, as the user code might access them otherwise.
> > > > > > This is actually possible today, you can run the RM in a different
> > JVM
> > > > or
> > > > > > in a different container, and give it more credentials than JMs /
> > TMs.
> > > > > But
> > > > > > for this to be feasible, we cannot allow any user-defined code to
> > be in
> > > > > the
> > > > > > JVM, because that instantaneously breaks the isolation of
> > credentials.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo 
> > wrote:
> > > > > >
> > > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > > >
> > > > > > > Regardin
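
As an illustration of the config-option-based approach discussed here, the 
external resources of a task executor could then be declared entirely in 
flink-conf.yaml, roughly along these lines. The key names only follow the 
pattern mentioned in this thread and are not final; the GPU values are just 
an example.

# sketch only; key names are not final
external-resource.gpu.amount: 2
# key under which the resource is requested from YARN / Kubernetes
external-resource.gpu.yarn.key: yarn.io/gpu
external-resource.gpu.kubernetes.key: nvidia.com/gpu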

Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Dian Fu
Hi Jeff,

Thanks for the great work and sharing it with the community! Very impressive 
and will try it out.

Regards,
Dian

> 在 2020年3月30日,下午9:16,Till Rohrmann  写道:
> 
> This is great news Jeff! Thanks a lot for sharing it with the community. 
> Looking forward trying Flink on Zeppelin out :-)
> 
> Cheers,
> Till
> 
> On Mon, Mar 30, 2020 at 2:47 PM Jeff Zhang  > wrote:
> Hi Folks,
> 
> I am very excited to announce that the integration work of Flink on Apache 
> Zeppelin notebook is completed. You can now run Flink jobs via DataStream 
> API, Table API, SQL, and PyFlink in an Apache Zeppelin notebook. Download it 
> here: http://zeppelin.apache.org/download.html
> 
> Here's some highlights of this work
> 
> 1. Support for 3 kinds of execution modes: local, remote, yarn
> 2. Support for multiple languages in one Flink session: scala, python, sql
> 3. Support hive connector (reading from hive and writing to hive) 
> 4. Dependency management
> 5. UDF support (scala, pyflink)
> 6. Support both batch sql and streaming sql
> 
> For more details and usage instructions, you can refer following 4 blogs
> 
> 1) Get started https://link.medium.com/oppqD6dIg5 
> 
> 2) Batch https://link.medium.com/3qumbwRIg5 
> 3) Streaming  https://link.medium.com/RBHa2lTIg5 
> 
> 4) Advanced usage  https://link.medium.com/CAekyoXIg5 
> 
> 
> Welcome to use flink on zeppelin and give feedback and comments. 
> 
> -- 
> Best Regards
> 
> Jeff Zhang



Re:Re: codestyle issue

2020-03-30 Thread flinker
Concise and comprehensive. Thank you.
At 2020-03-31 09:45:35, "Xintong Song"  wrote:
>Because it is an enum. For Java enums, you need to first define the constants,
>followed by a semicolon, and then the methods. In this case, there are no
>constants because we only use the enum to keep ClientUtils a singleton, but you
>still need the semicolon before defining methods. You can verify that removing
>that semicolon causes a compile error.
>
>Thank you~
>
>Xintong Song
>
>
>
>On Tue, Mar 31, 2020 at 12:50 AM flinker  wrote:
>
>> Why are there seemingly unnecessary ';' lines in the code of some files? For
>> example, line 41 of the file
>> flink-runtime/src/main/java/org/apache/flink/runtime/client/ClientUtils.java.
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


Re: codestyle issue

2020-03-30 Thread Xintong Song
Because it is an enum. For Java enums, you need to first define the constants,
followed by a semicolon, and then the methods. In this case, there are no
constants because we only use the enum to keep ClientUtils a singleton, but you
still need the semicolon before defining methods. You can verify that removing
that semicolon causes a compile error.
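
A stripped-down illustration of the pattern (not the actual ClientUtils code):

// Simplified illustration: an enum used only to get singleton/utility semantics.
public enum ClientUtils {
    ; // no enum constants, but the semicolon is still required before any members

    public static void doSomething() {
        // utility logic goes here
    }
}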

Thank you~

Xintong Song



On Tue, Mar 31, 2020 at 12:50 AM flinker  wrote:

> Why are there seemingly unnecessary ';' lines in the code of some files? For
> example, line 41 of the file
> flink-runtime/src/main/java/org/apache/flink/runtime/client/ClientUtils.java.
> Thanks.
>
>
>
>
>
>
>
>
>
>
>


Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Kurt Young
Regarding "property-version": even though this was brought up by myself, I'm
actually not a fan of it.
As a user, I think this property is too much for me to learn, and I
probably won't use it anyway.
If the framework follows a good procedure to upgrade properties, shows some
error messages (while still working) when a user uses deprecated properties,
and then finally drops them 2-3 versions later,
I think that is good enough.

Best,
Kurt


On Tue, Mar 31, 2020 at 9:07 AM Kurt Young  wrote:

> Regarding the version delimiter, I'm in favor of using "-", not "_". Not
> only because we are using
> "-" for config keys in other modules, like CoreOptions and DeploymentOptions,
> but also because "-" is easier
> to type than "_", since it doesn't require pressing "shift" ;-)
>
> Best,
> Kurt
>
>
> On Tue, Mar 31, 2020 at 12:37 AM Jark Wu  wrote:
>
>> Hi all,
>>
>> Thanks for the feedbacks.
>>
>> It seems that we have a conclusion to put the version into the factory
>> identifier. I'm also fine with this.
>> If we have this outcome, the interface of Factory#factoryVersion is not
>> needed anymore, this can simplify the learning cost of new factory.
>> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
>> 
>>
>> kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
>> the meaning of "universal" not easy to understand.
>> kafka-0.11 => kafka for 0.11 version
>> kafka-0.10 => kafka for 0.10 version
>> elasticsearch-6 => elasticsearch for 6.x versions
>> elasticsearch-7 => elasticsearch for 7.x versions
>> hbase-1.4 => hbase for 1.4.x versions
>> jdbc
>> filesystem
>>
>> We use "-" as the version delimiter to make them to be more consistent.
>> This is not forces, users can also use other delimiters or without
>> delimiter.
>> But this can be a guilde in the Javadoc of Factory, to make the connector
>> ecosystem to be more consistent.
>>
>> What do you think?
>>
>> 
>>
>> Regarding "connector.property-version":
>>
>> Hi @Dawid Wysakowicz  , the new fatories are
>> designed not support to read current properties.
>> All the current properties are routed to the old factories if they are
>> using "connector.type". Otherwise, properties are routed to new factories.
>>
>> If I understand correctly, the "connector.property-version" is attched
>> implicitly by system, not manually set by users.
>> For example, the framework should add "connector.property-version=1" to
>> properties when processing DDL statement.
>> I'm fine to add a "connector.property-version=1" when processing DDL
>> statement, but I think it's also fine if we don't,
>> because this can be done in the future if need and the default version can
>> be 1.
>>
>> Best,
>> Jark
>>
>> On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:
>>
>> > Hi all,
>> >
>> > Thanks for the feedbacks.
>> >
>> > It seems that we have a conclusion to put the version into the factory
>> > identifier. I'm also fine with this.
>> > If we have this outcome, the interface of Factory#factoryVersion is not
>> > needed anymore, this can simplify the learning cost of new factory.
>> > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
>> > 
>> >
>> > Btw, I would like to use "_" instead of "-" as the version delimiter,
>> > because "-" looks like minus and may confuse users, e.g.
>> "elasticsearch-6".
>> > This is not forced, but should be a guilde in the Javadoc of Factory.
>> > I propose to use the following identifiers for existing connectors,
>> >
>> > kafka  => kafka for 0.11+ versions, we don't suffix "-universal",
>> because
>> > the meaning of "universal" not easy to understand.
>> > kafka-0.11 => kafka for 0.11 version
>> > kafka-0.10 => kafka for 0.10 version
>> > elasticsearch-6 => elasticsearch for 6.x versions
>> > elasticsearch-7 => elasticsearch for 7.x versions
>> > hbase-1.4 => hbase for 1.4.x versions
>> > jdbc
>> > filesystem
>> >
>> > We use "-" as the version delimiter to make them to be more consistent.
>> > This is not forces, users can also use other delimiters or without
>> > delimiter.
>> > But this can be a guilde in the Javadoc of Factory, to make the
>> connector
>> > ecosystem to be more consistent.
>> >
>> > What do you think?
>> >
>> > 
>> >
>> > Regarding "connector.property-version":
>> >
>> > Hi @Dawid Wysakowicz  , the new fatories are
>> > designed not support to read current properties.
>> > All the current properties are routed to the old factories if they are
>> > using "connector.type". Otherwise, properties are routed to new
>> factories.
>> >
>> > If I understand correctly, the "connector.property-version" is attched
>> > implicitly by system, not manually set by users.
>> > For example, the framework should add "connector.property-version=1" to
>> > properties when processing DDL statement.
>> > I'm fine to add a "connector.property-version=1" when processing DDL
>> > statement, but I think it's also fine if we don't,
>> > because this can be done in the future if need a
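
For reference, the kind of smooth upgrade procedure described above can be 
expressed with deprecated keys on a ConfigOption. The option and key names 
below are made up purely for illustration:

// Hypothetical option, purely to illustrate deprecated-key handling.
ConfigOption<String> someOption =
    ConfigOptions.key("some-option")
        .stringType()
        .noDefaultValue()
        .withDeprecatedKeys("connector.some-option")
        .withDescription("New key; the deprecated key is still resolved for a few releases.");

With such a definition the old key can keep working for a couple of releases 
before being dropped, which matches the procedure sketched above.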

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Kurt Young
Regarding the version delimiter, I'm in favor of using "-", not "_". Not
only because we are using
"-" for config keys in other modules, like CoreOptions and DeploymentOptions,
but also because "-" is easier
to type than "_", since it doesn't require pressing "shift" ;-)

Best,
Kurt


On Tue, Mar 31, 2020 at 12:37 AM Jark Wu  wrote:

> Hi all,
>
> Thanks for the feedbacks.
>
> It seems that we have a conclusion to put the version into the factory
> identifier. I'm also fine with this.
> If we have this outcome, the interface of Factory#factoryVersion is not
> needed anymore, this can simplify the learning cost of new factory.
> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> 
>
> kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
> the meaning of "universal" not easy to understand.
> kafka-0.11 => kafka for 0.11 version
> kafka-0.10 => kafka for 0.10 version
> elasticsearch-6 => elasticsearch for 6.x versions
> elasticsearch-7 => elasticsearch for 7.x versions
> hbase-1.4 => hbase for 1.4.x versions
> jdbc
> filesystem
>
> We use "-" as the version delimiter to make them to be more consistent.
> This is not forces, users can also use other delimiters or without
> delimiter.
> But this can be a guilde in the Javadoc of Factory, to make the connector
> ecosystem to be more consistent.
>
> What do you think?
>
> 
>
> Regarding "connector.property-version":
>
> Hi @Dawid Wysakowicz  , the new fatories are
> designed not support to read current properties.
> All the current properties are routed to the old factories if they are
> using "connector.type". Otherwise, properties are routed to new factories.
>
> If I understand correctly, the "connector.property-version" is attched
> implicitly by system, not manually set by users.
> For example, the framework should add "connector.property-version=1" to
> properties when processing DDL statement.
> I'm fine to add a "connector.property-version=1" when processing DDL
> statement, but I think it's also fine if we don't,
> because this can be done in the future if need and the default version can
> be 1.
>
> Best,
> Jark
>
> On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:
>
> > Hi all,
> >
> > Thanks for the feedbacks.
> >
> > It seems that we have a conclusion to put the version into the factory
> > identifier. I'm also fine with this.
> > If we have this outcome, the interface of Factory#factoryVersion is not
> > needed anymore, this can simplify the learning cost of new factory.
> > We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> > 
> >
> > Btw, I would like to use "_" instead of "-" as the version delimiter,
> > because "-" looks like minus and may confuse users, e.g.
> "elasticsearch-6".
> > This is not forced, but should be a guilde in the Javadoc of Factory.
> > I propose to use the following identifiers for existing connectors,
> >
> > kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
> > the meaning of "universal" not easy to understand.
> > kafka-0.11 => kafka for 0.11 version
> > kafka-0.10 => kafka for 0.10 version
> > elasticsearch-6 => elasticsearch for 6.x versions
> > elasticsearch-7 => elasticsearch for 7.x versions
> > hbase-1.4 => hbase for 1.4.x versions
> > jdbc
> > filesystem
> >
> > We use "-" as the version delimiter to make them to be more consistent.
> > This is not forces, users can also use other delimiters or without
> > delimiter.
> > But this can be a guilde in the Javadoc of Factory, to make the connector
> > ecosystem to be more consistent.
> >
> > What do you think?
> >
> > 
> >
> > Regarding "connector.property-version":
> >
> > Hi @Dawid Wysakowicz  , the new fatories are
> > designed not support to read current properties.
> > All the current properties are routed to the old factories if they are
> > using "connector.type". Otherwise, properties are routed to new
> factories.
> >
> > If I understand correctly, the "connector.property-version" is attched
> > implicitly by system, not manually set by users.
> > For example, the framework should add "connector.property-version=1" to
> > properties when processing DDL statement.
> > I'm fine to add a "connector.property-version=1" when processing DDL
> > statement, but I think it's also fine if we don't,
> > because this can be done in the future if need and the default version
> can
> > be 1.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> >
> > On Mon, 30 Mar 2020 at 23:21, Dawid Wysakowicz 
> > wrote:
> >
> >> Hi all,
> >>
> >> I like the overall design of the FLIP.
> >>
> >> As for the outstanding concerns, I kind of like the approach to put the
> >> version into the factory identifier. I think it's the cleanest way to
> >> say that this version actually applies to the connector itself and not
> >> to the system it connects to. BTW, I think the outcome of this
> >> discussion will affect interfaces described in FLIP-95. If we put the
> >> version into the functionIdentifier,

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #4

2020-03-30 Thread Konstantin Knauf
Hi Gordon,

+1 (non-binding)

* Maven build from source...check
* Python build from source...check
* Went through Walkthrough based on local builds...check

Cheers,

Konstantin

On Mon, Mar 30, 2020 at 5:52 AM Tzu-Li (Gordon) Tai 
wrote:

> Hi everyone,
>
> Please review and vote on the *release candidate #4* for the version 2.0.0
> of Apache Flink Stateful Functions,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Testing Guideline**
>
> You can find here [1] a doc that we can use for collaborating testing
> efforts.
> The listed testing tasks in the doc also serve as a guideline in what to
> test for this release.
> If you wish to take ownership of a testing task, simply put your name down
> in the "Checked by" field of the task.
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Stateful Functions canonical source distribution, to be deployed to the
> release repository at dist.apache.org
> b) Stateful Functions Python SDK distributions to be deployed to PyPI
> c) Maven artifacts to be deployed to the Maven Central Repository
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev
> repository at dist.apache.org [2]
> * All artifacts for c) can be found at the Apache Nexus Repository [3]
>
> All artifacts are signed with the
> key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
>
> Other links for your review:
> * JIRA release notes [5]
> * source code tag "release-2.0.0-rc4" [6] [7]
>
> **Extra Remarks**
>
> * Part of the release is also official Docker images for Stateful
> Functions. This can be a separate process, since the creation of those
> relies on the fact that we have distribution jars already deployed to
> Maven. I will follow-up with this after these artifacts are officially
> released.
> In the meantime, there is this discussion [8] ongoing about where to host
> the StateFun Dockerfiles.
> * The Flink Website and blog post is also being worked on (by Marta) as
> part of the release, to incorporate the new Stateful Functions project. We
> can follow up with a link to those changes afterwards in this vote thread,
> but that would not block you to test and cast your votes already.
> * Since the Flink website changes are still being worked on, you will not
> yet be able to find the Stateful Functions docs from there. Here are the
> links [9] [10].
>
> **Vote Duration**
>
> Since this RC only fixes licensing issues from previous RCs,
> and the code itself has not been touched,
> I'd like to stick with the original vote ending time.
>
> The vote will be open for at least 72 hours starting Monday
> *(target end date is Wednesday, April 1st).*
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Gordon
>
> [1]
>
> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc4/
> [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1343/
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
> [6]
>
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=5d5d62fca2dbe3c75e8157b7ce67d4d4ce12ffd9
> [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc4
> [8]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
> [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> [10] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
>
> TIP: You can create a `settings.xml` file with these contents:
>
> """
> <settings>
>   <activeProfiles>
>     <activeProfile>flink-statefun-2.0.0</activeProfile>
>   </activeProfiles>
>   <profiles>
>     <profile>
>       <id>flink-statefun-2.0.0</id>
>       <repositories>
>         <repository>
>           <id>flink-statefun-2.0.0</id>
>           <url>
> https://repository.apache.org/content/repositories/orgapacheflink-1343/
>           </url>
>         </repository>
>         <repository>
>           <id>archetype</id>
>           <url>
> https://repository.apache.org/content/repositories/orgapacheflink-1343/
>           </url>
>         </repository>
>       </repositories>
>     </profile>
>   </profiles>
> </settings>
> 
> """
>
> And reference that in your maven commands via `--settings
> path/to/settings.xml`.
> This is useful for creating a quickstart based on the staged release and
> for building against the staged jars.
>


-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica 


--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tu

codestyle issue

2020-03-30 Thread flinker
Why are there seemingly unnecessary ';' lines in the code of some files? For 
example, line 41 of the file 
flink-runtime/src/main/java/org/apache/flink/runtime/client/ClientUtils.java.
Thanks.

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread Jark Wu
Hi,

I didn't follow the full discussion.
But I share the same concern as Timo that streaming queries should always
be async.
Otherwise, I can imagine it will cause a lot of confusion and problems if
users don't deeply keep the "sync" behavior in mind (e.g. the client hangs).
Besides, streaming is still the majority use case of Flink and
Flink SQL. We should give usability a high priority.

Best,
Jark


On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:

> Hi Godfrey,
>
> maybe I wasn't expressing my biggest concern enough in my last mail.
> Even in a singleline and sync execution, I think that streaming queries
> should not block the execution. Otherwise it is not possible to call
> collect() or print() on them afterwards.
>
> "there are too many things need to discuss for multiline":
>
> True, I don't want to solve all of them right now. But what I know is
> that our newly introduced methods should fit into a multiline execution.
> There is no big difference of calling `executeSql(A), executeSql(B)` and
> processing a multiline file `A;\nB;`.
>
> I think the example that you mentioned can simply be undefined for now.
> Currently, no catalog is modifying data but just metadata. This is a
> separate discussion.
>
> "result of the second statement is indeterministic":
>
> Sure this is indeterministic. But this is the implementer's fault and we
> cannot forbid such pipelines.
>
> How about we always execute streaming queries async? It would unblock
> executeSql() and multiline statements.
>
> Having an `executeSqlAsync()` is useful for batch. However, I don't want
> `sync/async` to be the new batch/stream flag. The execution behavior should
> come from the query itself.
>
> Regards,
> Timo
>
>
> On 30.03.20 11:12, godfrey he wrote:
> > Hi Timo,
> >
> > Agree with you that streaming queries are our top priority,
> > but I think there are too many things that need to be discussed for multiline
> > statements:
> > e.g.
> > 1. what's the behavior of DDL and DML mixing for async execution:
> > create table t1 xxx;
> > create table t2 xxx;
> > insert into t2 select * from t1 where xxx;
> > drop table t1; // t1 may be a MySQL table, the data will also be deleted.
> >
> > t1 is dropped when "insert" job is running.
> >
> > 2. what's the behavior of the unified scenario for async execution: (as you
> > mentioned)
> > INSERT INTO t1 SELECT * FROM s;
> > INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >
> > The result of the second statement is indeterministic, because the first
> > statement may still be running.
> > I think we need to put a lot of effort to define the behavior of
> logically
> > related queries.
> >
> > In this FLIP, I suggest we only handle single statements, and we also
> > introduce an async execute method,
> > which is more important and more often used by users.
> >
> > For the sync methods (like `TableEnvironment.executeSql` and
> > `StatementSet.execute`),
> > the result will be returned until the job is finished. The following
> > methods will be introduced in this FLIP:
> >
> >   /**
> >* Asynchronously execute the given single statement
> >*/
> > TableEnvironment.executeSqlAsync(String statement): TableResult
> >
> > /**
> >   * Asynchronously execute the dml statements as a batch
> >   */
> > StatementSet.executeAsync(): TableResult
> >
> > public interface TableResult {
> > /**
> >  * return JobClient for DQL and DML in async mode, else return
> > Optional.empty
> >  */
> > Optional<JobClient> getJobClient();
> > }
> >
> > what do you think?
> >
> > Best,
> > Godfrey
> >
> > Timo Walther  于2020年3月26日周四 下午9:15写道:
> >
> >> Hi Godfrey,
> >>
> >> executing streaming queries must be our top priority because this is
> >> what distinguishes Flink from competitors. If we change the execution
> >> behavior, we should think about the other cases as well to not break the
> >> API a third time.
> >>
> >> I fear that just having an async execute method will not be enough
> >> because users should be able to mix streaming and batch queries in a
> >> unified scenario.
> >>
> >> If I remember it correctly, we had some discussions in the past about
> >> what decides about the execution mode of a query. Currently, we would
> >> like to let the query decide, not derive it from the sources.
> >>
> >> So I could image a multiline pipeline as:
> >>
> >> USE CATALOG 'mycat';
> >> INSERT INTO t1 SELECT * FROM s;
> >> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >>
> >> For executeMultilineSql():
> >>
> >> sync because regular SQL
> >> sync because regular Batch SQL
> >> async because Streaming SQL
> >>
> >> For executeAsyncMultilineSql():
> >>
> >> async because everything should be async
> >> async because everything should be async
> >> async because everything should be async
> >>
> >> What we should not start for executeAsyncMultilineSql():
> >>
> >> sync because DDL
> >> async because everything should be async
> >> async because everything should be async
> >>
> >> What are you thoughts here?
> >>
> 

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi all,

Thanks for the feedbacks.

It seems that we have a conclusion to put the version into the factory
identifier. I'm also fine with this.
If we have this outcome, the interface of Factory#factoryVersion is not
needed anymore, this can simplify the learning cost of new factory.
We may need to update FLIP-95 and re-vote for it? cc @Timo Walther


kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
the meaning of "universal" is not easy to understand.
kafka-0.11 => kafka for 0.11 version
kafka-0.10 => kafka for 0.10 version
elasticsearch-6 => elasticsearch for 6.x versions
elasticsearch-7 => elasticsearch for 7.x versions
hbase-1.4 => hbase for 1.4.x versions
jdbc
filesystem

We use "-" as the version delimiter to make them more consistent.
This is not forced; users can also use other delimiters or no
delimiter.
But this can be a guide in the Javadoc of Factory, to make the connector
ecosystem more consistent.

What do you think?



Regarding "connector.property-version":

Hi @Dawid Wysakowicz  , the new factories are
designed not to support reading the current properties.
All the current properties are routed to the old factories if they are
using "connector.type". Otherwise, properties are routed to the new factories.

If I understand correctly, the "connector.property-version" is attached
implicitly by system, not manually set by users.
For example, the framework should add "connector.property-version=1" to
properties when processing DDL statement.
I'm fine with adding "connector.property-version=1" when processing the DDL
statement, but I think it's also fine if we don't,
because this can be done in the future if needed and the default version can
be 1.

Best,
Jark

On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:

> Hi all,
>
> Thanks for the feedbacks.
>
> It seems that we have a conclusion to put the version into the factory
> identifier. I'm also fine with this.
> If we have this outcome, the interface of Factory#factoryVersion is not
> needed anymore, this can simplify the learning cost of new factory.
> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
> 
>
> Btw, I would like to use "_" instead of "-" as the version delimiter,
> because "-" looks like minus and may confuse users, e.g. "elasticsearch-6".
> This is not forced, but should be a guideline in the Javadoc of Factory.
> I propose to use the following identifiers for existing connectors,
>
> kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
> the meaning of "universal" is not easy to understand.
> kafka-0.11 => kafka for 0.11 version
> kafka-0.10 => kafka for 0.10 version
> elasticsearch-6 => elasticsearch for 6.x versions
> elasticsearch-7 => elasticsearch for 7.x versions
> hbase-1.4 => hbase for 1.4.x versions
> jdbc
> filesystem
>
> We use "-" as the version delimiter to make the identifiers more consistent.
> This is not forced; users can also use other delimiters or no delimiter at all.
> But this can be a guideline in the Javadoc of Factory, to make the connector
> ecosystem more consistent.
>
> What do you think?
>
> 
>
> Regarding "connector.property-version":
>
> Hi @Dawid Wysakowicz, the new factories are
> designed not to support reading the current properties.
> All the current properties are routed to the old factories if they are
> using "connector.type". Otherwise, properties are routed to new factories.
>
> If I understand correctly, the "connector.property-version" is attached
> implicitly by the system, not manually set by users.
> For example, the framework should add "connector.property-version=1" to
> properties when processing DDL statement.
> I'm fine with adding "connector.property-version=1" when processing the DDL
> statement, but I think it's also fine if we don't,
> because this can be done in the future if needed and the default version can
> be 1.
>
> Best,
> Jark
>
>
>
>
>
> On Mon, 30 Mar 2020 at 23:21, Dawid Wysakowicz 
> wrote:
>
>> Hi all,
>>
>> I like the overall design of the FLIP.
>>
>> As for the outstanding concerns, I kind of like the approach to put the
>> version into the factory identifier. I think it's the cleanest way to
>> say that this version actually applies to the connector itself and not
>> to the system it connects to. BTW, I think the outcome of this
>> discussion will affect interfaces described in FLIP-95. If we put the
>> version into the functionIdentifier, do we need Factory#factoryVersion?
>>
>> I also think it does make sense to have a versioning for the properties
>> as well. Are we able to read all the current properties with the new
>> factories? I think we could use the "connector.property-version" to
>> alternate between different Factory interfaces to support the old set of
>> properties. Otherwise the new factories need to understand both sets of
>> properties, don't they?
>>
>> Best,
>>
>> Dawid
>>
>> On 30/03/2020 17:07, Timo Walther wrote:
>> > Hi Jark,
>> >
>> > thanks for the FLIP. I don't 

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi all,

Thanks for the feedbacks.

It seems that we have a conclusion to put the version into the factory
identifier. I'm also fine with this.
If we have this outcome, the interface of Factory#factoryVersion is not
needed anymore, which can reduce the learning cost of the new factory.
We may need to update FLIP-95 and re-vote for it? cc @Timo Walther


Btw, I would like to use "_" instead of "-" as the version delimiter,
because "-" looks like minus and may confuse users, e.g. "elasticsearch-6".
This is not forced, but should be a guideline in the Javadoc of Factory.
I propose to use the following identifiers for existing connectors,

kafka  => kafka for 0.11+ versions, we don't suffix "-universal", because
the meaning of "universal" is not easy to understand.
kafka-0.11 => kafka for 0.11 version
kafka-0.10 => kafka for 0.10 version
elasticsearch-6 => elasticsearch for 6.x versions
elasticsearch-7 => elasticsearch for 7.x versions
hbase-1.4 => hbase for 1.4.x versions
jdbc
filesystem

We use "-" as the version delimiter to make the identifiers more consistent.
This is not forced; users can also use other delimiters or no delimiter at all.
But this can be a guideline in the Javadoc of Factory, to make the connector
ecosystem more consistent.

What do you think?



Regarding "connector.property-version":

Hi @Dawid Wysakowicz, the new factories are
designed not to support reading the current properties.
All the current properties are routed to the old factories if they are
using "connector.type". Otherwise, properties are routed to new factories.

If I understand correctly, the "connector.property-version" is attached
implicitly by the system, not manually set by users.
For example, the framework should add "connector.property-version=1" to
properties when processing DDL statement.
I'm fine with adding "connector.property-version=1" when processing the DDL
statement, but I think it's also fine if we don't,
because this can be done in the future if needed and the default version can
be 1.

Best,
Jark





On Mon, 30 Mar 2020 at 23:21, Dawid Wysakowicz 
wrote:

> Hi all,
>
> I like the overall design of the FLIP.
>
> As for the outstanding concerns, I kind of like the approach to put the
> version into the factory identifier. I think it's the cleanest way to
> say that this version actually applies to the connector itself and not
> to the system it connects to. BTW, I think the outcome of this
> discussion will affect interfaces described in FLIP-95. If we put the
> version into the functionIdentifier, do we need Factory#factoryVersion?
>
> I also think it does make sense to have a versioning for the properties
> as well. Are we able to read all the current properties with the new
> factories? I think we could use the "connector.property-version" to
> alternate between different Factory interfaces to support the old set of
> properties. Otherwise the new factories need to understand both sets of
> properties, don't they?
>
> Best,
>
> Dawid
>
> On 30/03/2020 17:07, Timo Walther wrote:
> > Hi Jark,
> >
> > thanks for the FLIP. I don't have a strong opinion on
> > DynamicTableFactory#factoryVersion() for distinguishing connector
> > versions. We can also include it in the factory identifier itself. For
> > Kafka, I would then just use `kafka` because the naming "universal"
> > was just a helper solution at that time.
> >
> > What we need is a good guide for how to design the options. We should
> > include that in the JavaDoc of factory interface itself, once it is
> > in-place. I like the difference between source and sink in properties.
> >
> > Regarding "connector.property-version":
> >
> > It was meant for backwards compatibility along the dimension of
> > "property design" compared to "connector version". However, since the
> > connector is now responsible for parsing all properties, it is not as
> > necessary as it was before.
> >
> > However, a change of the property design as it is done in FLIP-107
> > becomes more difficult without a property version and versioning is
> > always a good idea in API world.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.03.20 16:38, Benchao Li wrote:
> >> Hi Jark,
> >>
> >> Thanks for starting the discussion. The FLIP LGTM generally.
> >>
> >> Regarding connector version VS put version into connector's name,
> >> I favor the latter personally, using one property to locate a
> >> connector can make the error log more precise.
> >>  From the users' side, using one property to match a connector will
> >> be easier. Especially we have many connectors,
> >> and some of them require the version property, and some of them do not.
> >>
> >> Regarding Jingsong's suggestion,
> >> IMO, it's a very good complement to the FLIP. Distinguishing
> >> properties for source and sink can be very useful, and
> >> also this will make the connector property more precise.
> >> We are also sick of this for now, we cannot know whether a DDL is a
> >> source or sink unless we look through all queries where
> >> t

Re: [DISCUSS] FLIP-118: Improve Flink’s ID system

2020-03-30 Thread Steven Wu
+1 on allowing user defined resourceId for taskmanager

On Sun, Mar 29, 2020 at 7:24 PM Yang Wang  wrote:

> Hi Konstantin,
>
> I think it is a good idea. Currently, our users also report a similar issue
> with
> resourceId of standalone cluster. When we start a standalone cluster now,
> the `TaskManagerRunner` always generates a uuid for the resourceId. It will
> be used to register to the jobmanager, but it is not convenient to match it
> with the real taskmanager, especially in a container environment.
>
> I think a possible solution is that we could support a user-defined
> resourceId.
> We could get it from the environment. For standalone on K8s, we could set
> the "RESOURCE_ID" env to the pod name so that it is easier to match the
> taskmanager with K8s pod.
>
> Moreover, I am afraid we could not set the pod name to the resourceId. I think
> you could set the "deployment.meta.name", since the pod name is generated by
> K8s in the pattern {deployment.meta.name}-{rc.uuid}-{uuid}. On the contrary, we
> will set the resourceId to the pod name.
>
>
> Best,
> Yang
>
> Konstantin Knauf  于2020年3月29日周日 下午8:06写道:
>
> > Hi Yangze, Hi Till,
> >
> > thank you for working on this topic. I believe it will make debugging
> > large Apache Flink deployments much more feasible.
> >
> > I was wondering whether it would make sense to allow the user to specify
> > the Resource ID in standalone setups?  For example, many users still
> > implicitly use standalone clusters on Kubernetes (the native support is
> > still experimental) and in these cases it would be interesting to also
> set
> > the PodName as the ResourceID. What do you think?
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Thu, Mar 26, 2020 at 6:49 PM Till Rohrmann 
> > wrote:
> >
> > > Hi Yangze,
> > >
> > > thanks for creating this FLIP. I think it is a very good improvement
> > > helping our users and ourselves understanding better what's going on in
> > > Flink.
> > >
> > > Creating the ResourceIDs with host information/pod name is a good idea.
> > >
> > > Also deriving ExecutionGraph IDs from their superset ID is a good idea.
> > >
> > > The InstanceID is used for fencing purposes. I would not make it a
> > > composition of the ResourceID + a monotonically increasing number. The
> > > problem is that in case of a RM failure the InstanceIDs would start
> from
> > 0
> > > again and this could lead to collisions.
> > >
> > > Logging more information on how the different runtime IDs are
> correlated
> > is
> > > also a good idea.
> > >
> > > Two other ideas for simplifying the ids are the following:
> > >
> > > * The SlotRequestID was introduced because the SlotPool was a separate
> > > RpcEndpoint a while ago. With this no longer being the case I think we
> > > could remove the SlotRequestID and replace it with the AllocationID.
> > > * Instead of creating new SlotRequestIDs for multi task slots one could
> > > derive them from the SlotRequestID used for requesting the underlying
> > > AllocatedSlot.
> > >
> > > Given that the slot sharing logic will most likely be reworked with the
> > > pipelined region scheduling, we might be able to resolve these two
> points
> > > as part of the pipelined region scheduling effort.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Mar 26, 2020 at 10:51 AM Yangze Guo 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We would like to start a discussion thread on "FLIP-118: Improve
> > > > Flink’s ID system"[1].
> > > >
> > > > This FLIP mainly discusses the following issues, targeting to enhance the
> > > > readability of IDs in logs and help users to debug in case of failures:
> > > >
> > > > - Enhance the readability of the string literals of IDs. Most of them
> > > > are hashcodes, e.g. ExecutionAttemptID, which do not provide much
> > > > meaningful information and are hard to recognize and compare for
> > > > users.
> > > > - Log the ID’s lineage information to make debugging more convenient.
> > > > Currently, the log fails to always show the lineage information
> > > > between IDs. Finding out relationships between entities identified by
> > > > given IDs is a common demand, e.g., which slot (AllocationID) is
> > > > assigned to satisfy which slot request (SlotRequestID). In the absence of
> > > > such lineage information, it's impossible to track the end-to-end
> > > > lifecycle of an Execution or a Task now, which makes debugging
> > > > difficult.
> > > >
> > > > Key changes proposed in the FLIP are as follows:
> > > >
> > > > - Add location information to distributed components
> > > > - Add topology information to graph components
> > > > - Log the ID’s lineage information
> > > > - Expose the identifiers of distributed components to users
> > > >
> > > > Please find more details in the FLIP wiki document [1]. Looking
> forward
> > > to
> > > > your feedbacks.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148643521
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> >

[jira] [Created] (FLINK-16872) Flink UI shows server error on selecting back pressure on a batch job

2020-03-30 Thread Nikola (Jira)
Nikola created FLINK-16872:
--

 Summary: Flink UI shows server error on selecting back pressure on 
a batch job 
 Key: FLINK-16872
 URL: https://issues.apache.org/jira/browse/FLINK-16872
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.10.0
 Environment: flink 1.10
Reporter: Nikola
 Attachments: Screenshot 2020-03-30 at 19.58.04.png

I have a batch job for which, when I click on "Back pressure", the UI shows a bubble
saying "Server error". The reason is that the backend returns a 500 error.

!Screenshot 2020-03-30 at 19.58.04.png!

 

This is the body returned:



 
{code:java}
{ "errors": [ "Internal server error.", "" ] }{code}
 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16871) Make more build information available during runtime.

2020-03-30 Thread Niels Basjes (Jira)
Niels Basjes created FLINK-16871:


 Summary: Make more build information available during runtime.
 Key: FLINK-16871
 URL: https://issues.apache.org/jira/browse/FLINK-16871
 Project: Flink
  Issue Type: Improvement
Reporter: Niels Basjes


This is a split from FLINK-15794 where (as discussed 
[here|https://github.com/apache/flink/pull/11245]) a file is generated during 
the build with the properties that were valid at build time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16870) OrcTableSource throws NullPointerException

2020-03-30 Thread Nikola (Jira)
Nikola created FLINK-16870:
--

 Summary: OrcTableSource throws NullPointerException
 Key: FLINK-16870
 URL: https://issues.apache.org/jira/browse/FLINK-16870
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ORC
Affects Versions: 1.10.0
 Environment: flink 1.10
Reporter: Nikola
 Attachments: flink-1.10-minimal-example.txt, 
flink-1.10-orc-exception.log

I am trying to read some ORC data from HDFS as given the example here: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.10/api/java/org/apache/flink/orc/OrcTableSource.html]

When I try to do this, the job crashes with NullPointerException:


 
{code:java}
Caused by: java.lang.NullPointerException at 
org.apache.flink.orc.shim.OrcShimV200.computeProjectionMask(OrcShimV200.java:188)
 at 
org.apache.flink.orc.shim.OrcShimV200.createRecordReader(OrcShimV200.java:120) 
{code}
 

I have attached a minimal version of the code which can reproduce the issue. The
same piece of code (and a more complex one) runs fine on Flink 1.8.2.

I have tried to look into what is causing it, and it seems that the
NullPointerException happens on this line:
[https://github.com/apache/flink/blob/release-1.10/flink-formats/flink-orc/src/main/java/org/apache/flink/orc/shim/OrcShimV200.java#L188]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread Timo Walther

Hi Godfrey,

maybe I didn't express my biggest concern clearly enough in my last mail.
Even in a single-line and sync execution, I think that streaming queries
should not block the execution. Otherwise it is not possible to call
collect() or print() on them afterwards.
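
To illustrate with a minimal sketch (assuming a TableEnvironment `tableEnv`; the
method names follow the FLIP-84 proposal and are not final):

TableResult result = tableEnv.executeSql(
    "SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id");
// If executeSql() blocked until the (unbounded) streaming job finished,
// the following call could never be reached:
result.print();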


"there are too many things need to discuss for multiline":

True, I don't want to solve all of them right now. But what I know is 
that our newly introduced methods should fit into a multiline execution.
There is no big difference between calling `executeSql(A), executeSql(B)` and
processing a multiline file `A;\nB;`.


I think the example that you mentioned can simply be undefined for now. 
Currently, no catalog is modifying data but just metadata. This is a 
separate discussion.


"result of the second statement is indeterministic":

Sure, this is indeterministic. But this is the implementer's fault and we
cannot forbid such pipelines.


How about we always execute streaming queries async? It would unblock 
executeSql() and multiline statements.


Having an `executeSqlAsync()` is useful for batch. However, I don't want
`sync/async` to be the new batch/stream flag. The execution behavior should
come from the query itself.


Regards,
Timo


On 30.03.20 11:12, godfrey he wrote:

Hi Timo,

Agree with you that streaming queries are our top priority,
but I think there are too many things that need to be discussed for multiline
statements:
e.g.
1. what's the behavior of DDL and DML mixing for async execution:
create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; // t1 may be a MySQL table, the data will also be deleted.

t1 is dropped when "insert" job is running.

2. what's the behavior of the unified scenario for async execution: (as you
mentioned)
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is indeterministic, because the first
statement may still be running.
I think we need to put a lot of effort into defining the behavior of logically
related queries.

In this FLIP, I suggest we only handle single statements, and we also
introduce an async execute method,
which is more important and more often used by users.

For the sync methods (like `TableEnvironment.executeSql` and
`StatementSet.execute`),
the result will be returned only after the job is finished. The following
methods will be introduced in this FLIP:

  /**
   * Asynchronously execute the given single statement
   */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
  * Asynchronously execute the dml statements as a batch
  */
StatementSet.executeAsync(): TableResult

public interface TableResult {
/**
 * return JobClient for DQL and DML in async mode, else return
Optional.empty
 */
Optional<JobClient> getJobClient();
}
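
For illustration, a minimal sketch of how a caller could use the async method
(assuming a TableEnvironment `tableEnv` and that the methods land as described
above):

TableResult result = tableEnv.executeSqlAsync("INSERT INTO t2 SELECT * FROM t1");
result.getJobClient().ifPresent(jobClient ->
    // e.g. poll the job status without blocking the submitting thread
    jobClient.getJobStatus().thenAccept(status ->
        System.out.println("Job status: " + status)));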

what do you think?

Best,
Godfrey

Timo Walther  于2020年3月26日周四 下午9:15写道:


Hi Godfrey,

executing streaming queries must be our top priority because this is
what distinguishes Flink from competitors. If we change the execution
behavior, we should think about the other cases as well to not break the
API a third time.

I fear that just having an async execute method will not be enough
because users should be able to mix streaming and batch queries in a
unified scenario.

If I remember it correctly, we had some discussions in the past about
what decides about the execution mode of a query. Currently, we would
like to let the query decide, not derive it from the sources.

So I could imagine a multiline pipeline as:

USE CATALOG 'mycat';
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

For executeMultilineSql():

sync because regular SQL
sync because regular Batch SQL
async because Streaming SQL

For executeAsyncMultilineSql():

async because everything should be async
async because everything should be async
async because everything should be async

What we should not start for executeAsyncMultilineSql():

sync because DDL
async because everything should be async
async because everything should be async

What are you thoughts here?

Regards,
Timo


On 26.03.20 12:50, godfrey he wrote:

Hi Timo,

I agree with you that streaming queries mostly need async execution.
In fact, our original plan is only introducing sync methods in this FLIP,
and async methods (like "executeSqlAsync") will be introduced in the future,
which is mentioned in the appendix.

Maybe the async methods also need to be considered in this FLIP.

I think sync methods are also useful for streaming, which can be used to
run bounded sources.
Maybe we should check whether all sources are bounded in sync execution
mode.


Also, if we block for streaming queries, we could never support
multiline files. Because the first INSERT INTO would block the further
execution.

Agree with you, we need an async method to submit multiline files,
and the files should be limited such that DQL and DML always come at the
end for streaming.

Best,
Godfrey

Timo Walther  于2020年3月26日周四 下午4:29写道

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Dawid Wysakowicz
Hi all,

I like the overall design of the FLIP.

As for the outstanding concerns, I kind of like the approach to put the
version into the factory identifier. I think it's the cleanest way to
say that this version actually applies to the connector itself and not
to the system it connects to. BTW, I think the outcome of this
discussion will affect interfaces described in FLIP-95. If we put the
version into the functionIdentifier, do we need Factory#factoryVersion?

I also think it does make sense to have a versioning for the properties
as well. Are we able to read all the current properties with the new
factories? I think we could use the "connector.property-version" to
alternate between different Factory interfaces to support the old set of
properties. Otherwise the new factories need to understand both sets of
properties, don't they?

Best,

Dawid

On 30/03/2020 17:07, Timo Walther wrote:
> Hi Jark,
>
> thanks for the FLIP. I don't have a strong opinion on
> DynamicTableFactory#factoryVersion() for distinguishing connector
> versions. We can also include it in the factory identifier itself. For
> Kafka, I would then just use `kafka` because the naming "universal"
> was just a helper solution at that time.
>
> What we need is a good guide for how to design the options. We should
> include that in the JavaDoc of factory interface itself, once it is
> in-place. I like the difference between source and sink in properties.
>
> Regarding "connector.property-version":
>
> It was meant for backwards compatibility along the dimension of
> "property design" compared to "connector version". However, since the
> connector is now responsible for parsing all properties, it is not as
> necessary as it was before.
>
> However, a change of the property design as it is done in FLIP-107
> becomes more difficult without a property version and versioning is
> always a good idea in API world.
>
> Regards,
> Timo
>
>
> On 30.03.20 16:38, Benchao Li wrote:
>> Hi Jark,
>>
>> Thanks for starting the discussion. The FLIP LGTM generally.
>>
>> Regarding connector version VS put version into connector's name,
>> I favor the latter personally, using one property to locate a
>> connector can make the error log more precise.
>>  From the users' side, using one property to match a connector will
>> be easier. Especially we have many connectors,
>> and some of them require the version property, and some of them do not.
>>
>> Regarding Jingsong's suggestion,
>> IMO, it's a very good complement to the FLIP. Distinguishing
>> properties for source and sink can be very useful, and
>> also this will make the connector property more precise.
>> We are also sick of this for now, we cannot know whether a DDL is a
>> source or sink unless we look through all queries where
>> the table is used.
>> Even more, some of the required properties are only required for
>> source, but we cannot leave it blank for sink, and vice versa.
>> I think we can also add a type for dimension tables besides source and
>> sink.
>>
>> Kurt Young mailto:ykt...@gmail.com>> 于2020年3月30日
>> 周一 下午8:16写道:
>>
>>  > It's not possible for the framework to throw such exception.
>>     Because the
>>     framework doesn't know what versions do the connector support.
>>
>>     Not really, we can list all valid connectors framework could
>> found. E.g.
>>     user mistyped 'kafka-0.x', the error message will look like:
>>
>>     we don't have any connector named "kafka-0.x", but we have:
>>     FileSystem
>>     Kafka-0.10
>>     Kafka-0.11
>>     ElasticSearch6
>>     ElasticSearch7
>>
>>     Best,
>>     Kurt
>>
>>
>>     On Mon, Mar 30, 2020 at 5:11 PM Jark Wu >     > wrote:
>>
>>  > Hi Kurt,
>>  >
>>  > > 2) Lists all available connectors seems also quite
>>     straightforward, e.g
>>  > user provided a wrong "kafka-0.8", we tell user we have
>> candidates of
>>  > "kafka-0.11", "kafka-universal"
>>  > It's not possible for the framework to throw such exception.
>>     Because the
>>  > framework doesn't know what versions do the connector support.
>>     All the
>>  > version information is a blackbox in the identifier. But with
>>  > `Factory#factoryVersion()` interface, we can know all the
>> supported
>>  > versions.
>>  >
>>  > > 3) I don't think so. We can still treat it as the same
>>     connector but with
>>  > different versions.
>>  > That's true but that's weird. Because from the plain DDL
>>     definition, they
>>  > look like different connectors with different "connector"
>> value, e.g.
>>  > 'connector=kafka-0.8', 'connector=kafka-0.10'.
>>  >
>>  > > If users don't set any version, we will use "kafka-universal"
>>     instead.
>>  > The behavior is inconsistent IMO.
>>  > That is a long term vision when there is no kafka clusters
>> with <0.11
>>  > version.
>>  > At that point, "universal" is the only supported version in Flink
>>     and the
>>    

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Timo Walther

Hi Jark,

thanks for the FLIP. I don't have a strong opinion on 
DynamicTableFactory#factoryVersion() for distinguishing connector
versions. We can also include it in the factory identifier itself. For 
Kafka, I would then just use `kafka` because the naming "universal" was 
just a helper solution at that time.


What we need is a good guide for how to design the options. We should 
include that in the JavaDoc of factory interface itself, once it is 
in-place. I like the difference between source and sink in properties.


Regarding "connector.property-version":

It was meant for backwards compatibility along the dimension of 
"property design" compared to "connector version". However, since the 
connector is now responsible for parsing all properties, it is not as 
necessary as it was before.


However, a change of the property design as it is done in FLIP-107 
becomes more difficult without a property version and versioning is 
always a good idea in API world.


Regards,
Timo


On 30.03.20 16:38, Benchao Li wrote:

Hi Jark,

Thanks for starting the discussion. The FLIP LGTM generally.

Regarding connector version VS put version into connector's name,
I favor the latter personally, using one property to locate a connector 
can make the error log more precise.
 From the users' side, using one property to match a connector will be 
easier. Especially we have many connectors,

and some of them require the version property, and some of them do not.

Regarding Jingsong's suggestion,
IMO, it's a very good complement to the FLIP. Distinguishing properties 
for source and sink can be very useful, and

also this will make the connector property more precise.
We are also sick of this for now, we cannot know whether a DDL is a 
source or sink unless we look through all queries where

the table is used.
Even more, some of the required properties are only required for source, 
but we cannot leave it blank for sink, and vice versa.

I think we can also add a type for dimension tables besides source and sink.

Kurt Young mailto:ykt...@gmail.com>> 于2020年3月30日 
周一 下午8:16写道:


 > It's not possible for the framework to throw such exception.
Because the
framework doesn't know what versions do the connector support.

Not really, we can list all valid connectors framework could found. E.g.
user mistyped 'kafka-0.x', the error message will look like:

we don't have any connector named "kafka-0.x", but we have:
FileSystem
Kafka-0.10
Kafka-0.11
ElasticSearch6
ElasticSearch7

Best,
Kurt


On Mon, Mar 30, 2020 at 5:11 PM Jark Wu mailto:imj...@gmail.com>> wrote:

 > Hi Kurt,
 >
 > > 2) Lists all available connectors seems also quite
straightforward, e.g
 > user provided a wrong "kafka-0.8", we tell user we have candidates of
 > "kafka-0.11", "kafka-universal"
 > It's not possible for the framework to throw such exception.
Because the
 > framework doesn't know what versions do the connector support.
All the
 > version information is a blackbox in the identifier. But with
 > `Factory#factoryVersion()` interface, we can know all the supported
 > versions.
 >
 > > 3) I don't think so. We can still treat it as the same
connector but with
 > different versions.
 > That's true but that's weird. Because from the plain DDL
definition, they
 > look like different connectors with different "connector" value, e.g.
 > 'connector=kafka-0.8', 'connector=kafka-0.10'.
 >
 > > If users don't set any version, we will use "kafka-universal"
instead.
 > The behavior is inconsistent IMO.
 > That is a long term vision when there is no kafka clusters with <0.11
 > version.
 > At that point, "universal" is the only supported version in Flink
and the
 > "version" key can be optional.
 >
 > -
 >
 > Hi Jingsong,
 >
 > > "version" vs "kafka.version"
 > > I thought about it. But if we prefix "kafka" to version, we should
prefix
 > "kafka" for all other property keys, because they are all kafka
specific
 > options.
 > However, that will make the property set verbose, see rejected
option#2 in
 > the FLIP.
 >
 > > explicitly separate options for source and sink
 > That's a good topic. It's good to have a guideline for the new
property
 > keys.
 > I'm fine to prefix with a "source"/"sink" for some connector keys.
 > Actually, we already do this in some connectors, e.g. jdbc and hbase.
 >
 > Best,
 > Jark
 >
 > On Mon, 30 Mar 2020 at 16:36, Jingsong Li mailto:jingsongl...@gmail.com>> wrote:
 >
 > > Thanks Jark for the proposal.
 > >
 > > +1 to the general idea.
 > >
 > > For "version", what about "kafka.version"? It is obvious to
know its
 > > meaning.
 > >
 > > And I'd like to start a new topic:
 > > Should we 

[jira] [Created] (FLINK-16869) Flink UI does not show version

2020-03-30 Thread Nikola (Jira)
Nikola created FLINK-16869:
--

 Summary: Flink UI does not show version
 Key: FLINK-16869
 URL: https://issues.apache.org/jira/browse/FLINK-16869
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.10.0
 Environment: flink 1.10
Reporter: Nikola
 Attachments: Screenshot 2020-03-30 at 18.17.55.png

When I open the flink cluster UI I cannot see the version.
I am using flink 1.10.0 with scala 2.12 installed from docker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

2020-03-30 Thread Andrey Zagrebin
Hi All,

Thanks for this FLIP, Yadong. This is a very good improvement to the
Flink's UI.
It looks like there are still a couple of things to resolve before the final
vote.

- I also find the non-heap title in the configuration confusing because there
are also other non-heap types of memory. The "off-heap" concept is quite broad.
What about "JVM specific", meaning that it is not coming directly from Flink?
Or we could remove the "Non-heap" box altogether and show JVM
Metaspace and Overhead directly as separate boxes;
this would also fit if we decide to keep the Metaspace metric.

- Total Process Memory Used: I agree with Xintong, it is hard to say what
is used there.
Then the size of "Total Process Memory" basically becomes part of
configuration.

- Non-Heap Used/Max/...: Not sure what "committed" means here. I also think we
should either exclude it or display what is known for sure.
In general, the metaspace usage would be nice to have, but it should then be
exactly the metaspace usage without anything else.

- I do not know how the mapped memory works. Is it meant for the new
spilled partitions? If the mapped memory also pulls from the direct memory limit,
then this is something we do not account for in our network buffers, as I
understand it. In this case, this metric may be useful for tuning: to understand
how much of the direct memory limit the mapped memory uses, in order to set e.g.
the framework off-heap limit correctly and avoid a direct OOM (see the small
JVM-level sketch after this list for where these figures come from).
It could be something to discuss with Zhijiang, e.g. is the direct
memory used there to buffer fetched regions of partition files, or what for?

- Not sure we need an extra wrapping box "other" for the managed memory
atm. It could be just "Managed" or "Managed by Flink".
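
As a reference for the direct/mapped figures above, here is a minimal,
Flink-independent sketch of the JVM API those metrics come from (as Xintong
mentions below, they are read from the buffer pool MXBeans):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class BufferPoolProbe {
    public static void main(String[] args) {
        // typically two pools are reported: "direct" and "mapped"
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": used=" + pool.getMemoryUsed()
                    + ", capacity=" + pool.getTotalCapacity() + ", count=" + pool.getCount());
        }
    }
}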

Best,
Andrey

On Fri, Mar 27, 2020 at 6:13 AM Xintong Song  wrote:

> Sorry for the late response.
>
> I have shared my suggestions with Yadong & Lining offline. I think it would
> be better to also post them here, for the public record.
>
>- I'm not sure about displaying Total Process Memory Used. Currently, we
>do not have a good way to monitor all memory footprints of the process.
>Metrics for some native memory usages (e.g., thread stack) are absent.
>Displaying a partial used memory size could be confusing for users.
>- I would suggest merge the current Mapped Memory metrics into Direct
>Memory. Actually, the metrics are retrieved from MXBeans for direct
> buffer
>pool and mapped buffer pool. Both of the two pools are accounted for in
>-XX:MaxDirectMemorySize. There's no Flink configuration that can modify
> the
>individual pool sizes. Therefore, I think displaying the total Direct
>Memory would be good enough. Moreover, in most use cases the size of
> mapped
>buffer pool is zero and users do not need to understand what is Mapped
>Memory. For expert users who do need the separated metrics for
> individual
>pools, they can subscribe the metrics on their own.
>- I would suggest to not display Non-Heap Memory. Despite the name, the
>metrics (also retrieved from MXBeans) actually accounts for metaspace,
> code
>cache, and compressed class space. It does not account for all JVM
> native
>memory overheads, e.g., thread stack. That means the metrics of Non-Heap
>Memory do not well correspond to any of the FLIP-49 memory components.
> They
>account for Flink's JVM Metaspace and part of JVM Overhead. I think this
>brings more confusion then help to users, especially primary users.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Mar 26, 2020 at 6:34 PM Till Rohrmann 
> wrote:
>
> > Thanks for updating the FLIP Yadong.
> >
> > What is the difference between managedMemory and managedMemoryTotal
> > and networkMemory and networkMemoryTotal in the REST response? If they
> are
> > duplicates, then we might be able to remove one.
> >
> > Apart from that, the proposal looks good to me.
> >
> > Pulling also Andrey in to hear his opinion about the representation of
> the
> > memory components.
> >
> > Cheers,
> > Till
> >
> > On Thu, Mar 19, 2020 at 11:37 AM Yadong Xie  wrote:
> >
> >> Hi all
> >>
> >> I have updated the design of the metric page and FLIP doc, please let me
> >> know what you think about it
> >>
> >> FLIP-102:
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> >> POC web:
> >>
> >>
> http://101.132.122.69:8081/web/#/task-manager/8e1f1beada3859ee8e46d0960bb1da18/metrics
> >>
> >> Till Rohrmann  于2020年2月27日周四 下午10:27写道:
> >>
> >> > Thinking a bit more about the problem whether to report the aggregated
> >> > memory statistics or the individual slot statistics, I think reporting
> >> it
> >> > on a per slot basis won't work nicely together with FLIP-56 (dynamic
> >> slot
> >> > allocation). The problem is that with FLIP-56, we will no longer have
> >> > dedicated slots. The number of slots might change over the lifetime
> of a
> >> > TaskExecutor. Hence, it won't be easy to generate a metric path f

Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-30 Thread Timo Walther

Hi Dawid,

thanks for updating the FLIP. One minor comment from my side, should we 
move the LIKE clause to the very end?


CREATE TABLE X () WITH () LIKE ...

Otherwise, the LIKE clause looks a bit lost if there are options
afterwards. Apart from that, +1 for starting a vote from my side.
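
For what it's worth, a purely illustrative sketch of a statement with the clause
at the end (the schema, options and source table are made up, assuming a
TableEnvironment `tableEnv` and whatever final syntax the FLIP settles on):

tableEnv.executeSql(
    "CREATE TEMPORARY TABLE MyDerivedTable (\n"
        + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND\n"
        + ") WITH (\n"
        + "  'topic' = 'other_topic'\n"
        + ") LIKE cat.db.KafkaTopic");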


Regards,
Timo


On 25.03.20 15:30, Dawid Wysakowicz wrote:

Thank you for your opinions. I updated the FLIP with results of the
discussion. Let me know if you have further concerns.

Best,

Dawid

On 05/03/2020 07:46, Jark Wu wrote:

Hi Dawid,


INHERITS creates a new table with a "link" to the original table.

Yes, INHERITS is a "link" to the original table in PostgreSQL.
But INHERITS is not SQL standard; I think it's fine for vendors to define
their semantics.


Standard also allows declaring the clause after the schema part. We can

also do it.
Is that true? I didn't find it in SQL standard. If this is true, I prefer
to put LIKE after the schema part.



Hi Jingsong,

The concern you mentioned in (2) is exactly my concern too. That's why I
suggested INHERITS, or put LIKE after schema part.

Best,
Jark

On Thu, 5 Mar 2020 at 12:05, Jingsong Li  wrote:


Thanks Dawid for starting this discussion.

I like the "LIKE".

1. For "INHERITS", I think this is a good feature too; yes, ALTER TABLE will
propagate any changes in column data definitions and check constraints down
the inheritance hierarchy. If A inherits B, A and B share everything; they
have the same Kafka topic. If the schema of B is modified, this means the underlying
Kafka topic schema changed, so I think it is good to modify A too. If this is
for the "ConfluentSchemaRegistryCatalog" mentioned by Jark, I think sometimes
this is just what we want.
But "LIKE" also very useful for many cases.

2. For the LIKE statement in the schema, I know two kinds of LIKE syntax: one is
MySQL/Hive/SQL Server, the other is PostgreSQL. I prefer the former:
- In the FLIP, there is "OVERWRITING OPTIONS"; will this overwrite
properties in "with"? This looks weird, because "LIKE" is in the schema, but it
can affect outside properties.

Best,
Jingsong Lee

On Wed, Mar 4, 2020 at 2:05 PM Dawid Wysakowicz 
wrote:


Hi Jark,
I did investigate the INHERITS clause, but it has a semantic that in my
opinion we definitely don't want to support. INHERITS creates a new table
with a "link" to the original table. Therefore if you e.g. change the schema
of the original table it's also reflected in the child table. It's also
possible, for tables like A inherits B, to query them like SELECT * FROM ONLY A;
by default it returns results from both tables. I am pretty sure it's not
what we're looking for.

PostgreSQL implements both the LIKE clause and INHERITS. I am open for
discussion if we should support multiple LIKE statements or not. Standard
also allows declaring the clause after the schema part. We can also do it.

Nevertheless I think including multiple tables might be useful, e.g. when
you want to union two tables and output to the same Kafka cluster and just
change the target topic. I know it's not a very common use case but it's
not a big effort to support it.

Let me know what you think.

Best,
Dawid

On Wed, 4 Mar 2020, 04:55 Jark Wu,  wrote:


Hi Dawid,

Thanks for starting this discussion. I like the idea.
Once we support more integrated catalogs,
e.g. ConfluentSchemaRegistryCatalog, this problem will be more urgent.
Because it's very common to adjust existing tables in catalog slightly.

My initial thought was introducing INHERITS keyword, which is also
supported in PostgreSQL [1].
This is also similar to the functionality of Hive CREATE TABLE LIKE

[2].

CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
cat.db.KafkoTopic
CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
cat.db.KafkoTopic WITH ('k' = 'v')

The INHERITS can inherit an existing table with all columns, watermark, and
properties, but the properties and watermark can be overwritten explicitly.

The reason I prefer INHERITS rather than LIKE is the keyword position. We
are copying an existing table definition including the properties.
However, LIKE appears in the schema part, so it sounds like copying properties
into the schema part of the DDL.

Besides that, I'm not sure whether the use case of "merging two
tables into a single one with a different connector" stands.
From my understanding, most use cases are just slightly adjusting an
existing catalog table with new properties or watermarks.
Do we really need to merge two table definitions into a single one? For
example, is it possible to merge a Kafka table definition and
a Filesystem table definition into a new Kafka table, and the new Kafka
table exactly matches the underlying physical data format?

Best,
Jark

[1]: https://www.postgresql.org/docs/9.5/sql-createtable.html
[2]:



https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableLike


On Tue, 3 Mar 2020 at 21:12, Dawid Wysakowicz 
wrote:


Hi devs,

I wanted to bring anothe

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Benchao Li
Hi Jark,

Thanks for starting the discussion. The FLIP LGTM generally.

Regarding connector version VS put version into connector's name,
I favor the latter personally, using one property to locate a connector can
make the error log more precise.
From the users' side, using one property to match a connector will be
easier. Especially we have many connectors,
and some of them require the version property, and some of them do not.

Regarding Jingsong's suggestion,
IMO, it's a very good complement to the FLIP. Distinguishing properties for
source and sink can be very useful, and
also this will make the connector property more precise.
We are also sick of this for now, we cannot know whether a DDL is a source
or sink unless we look through all queries where
the table is used.
Even more, some of the required properties are only required for source,
but we cannot leave it blank for sink, and vice versa.
I think we can also add a type for dimension tables besides source and sink.

Kurt Young  于2020年3月30日周一 下午8:16写道:

> > It's not possible for the framework to throw such exception. Because the
> framework doesn't know what versions do the connector support.
>
> Not really, we can list all valid connectors framework could found. E.g.
> user mistyped 'kafka-0.x', the error message will look like:
>
> we don't have any connector named "kafka-0.x", but we have:
> FileSystem
> Kafka-0.10
> Kafka-0.11
> ElasticSearch6
> ElasticSearch7
>
> Best,
> Kurt
>
>
> On Mon, Mar 30, 2020 at 5:11 PM Jark Wu  wrote:
>
> > Hi Kurt,
> >
> > > 2) Lists all available connectors seems also quite straightforward, e.g
> > user provided a wrong "kafka-0.8", we tell user we have candidates of
> > "kafka-0.11", "kafka-universal"
> > It's not possible for the framework to throw such exception. Because the
> > framework doesn't know what versions do the connector support. All the
> > version information is a blackbox in the identifier. But with
> > `Factory#factoryVersion()` interface, we can know all the supported
> > versions.
> >
> > > 3) I don't think so. We can still treat it as the same connector but
> with
> > different versions.
> > That's true but that's weird. Because from the plain DDL definition, they
> > look like different connectors with different "connector" value, e.g.
> > 'connector=kafka-0.8', 'connector=kafka-0.10'.
> >
> > > If users don't set any version, we will use "kafka-universal" instead.
> > The behavior is inconsistent IMO.
> > That is a long term vision when there is no kafka clusters with <0.11
> > version.
> > At that point, "universal" is the only supported version in Flink and the
> > "version" key can be optional.
> >
> > -
> >
> > Hi Jingsong,
> >
> > > "version" vs "kafka.version"
> > I thought about it. But if we prefix "kafka" to version, we should prefix
> > "kafka" for all other property keys, because they are all kafka specific
> > options.
> > However, that will make the property set verbose, see rejected option#2
> in
> > the FLIP.
> >
> > > explicitly separate options for source and sink
> > That's a good topic. It's good to have a guideline for the new property
> > keys.
> > I'm fine to prefix with a "source"/"sink" for some connector keys.
> > Actually, we already do this in some connectors, e.g. jdbc and hbase.
> >
> > Best,
> > Jark
> >
> > On Mon, 30 Mar 2020 at 16:36, Jingsong Li 
> wrote:
> >
> > > Thanks Jark for the proposal.
> > >
> > > +1 to the general idea.
> > >
> > > For "version", what about "kafka.version"? It is obvious to know its
> > > meaning.
> > >
> > > And I'd like to start a new topic:
> > > Should we need to explicitly separate source from sink?
> > > With the development of batch and streaming, more and more connectors
> > have
> > > both source and sink.
> > >
> > > So should we set a rule for table properties:
> > > - properties for both source and sink: without prefix, like "topic"
> > > - properties for source only: with "source." prefix, like
> > > "source.startup-mode"
> > > - properties for sink only: with "sink." prefix, like
> "sink.partitioner"
> > >
> > > What do you think?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Mar 30, 2020 at 3:56 PM Jark Wu  wrote:
> > >
> > > > Hi Kurt,
> > > >
> > > > That's good questions.
> > > >
> > > > > the meaning of "version"
> > > > There are two versions in the old design. One is property version
> > > > "connector.property-version" which can be used for backward
> > > compatibility.
> > > > The other one is "connector.version" which defines the version of
> > > external
> > > > system, e.g. 0.11" for kafka, "6" or "7" for ES.
> > > > In this proposal, the "version" is the previous "connector.version".
> > The
> > > > ""connector.property-version" is not introduced in new design.
> > > >
> > > > > how to keep the old capability which can evolve connector
> properties
> > > > The "connector.property-version" is an optional key in the old design
> > and
> > > > is never bump

[jira] [Created] (FLINK-16866) Make job submission non-blocking

2020-03-30 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-16866:
-

 Summary: Make job submission non-blocking
 Key: FLINK-16866
 URL: https://issues.apache.org/jira/browse/FLINK-16866
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.0, 1.9.2, 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Currently, Flink waits to acknowledge a job submission until the corresponding 
{{JobManager}} has been created. Since its creation also involves the creation 
of the {{ExecutionGraph}} and potential FS operations, it can take a bit of 
time. If the user has configured a too low {{web.timeout}}, the submission can 
time out only reporting a {{TimeoutException}} to the user.

I propose to change the notion of job submission slightly. Instead of waiting 
until the {{JobManager}} has been created, a job submission is complete once 
all job relevant files have been uploaded to the {{Dispatcher}} and the 
{{Dispatcher}} has been told about it. Creating the {{JobManager}} will then 
belong to the actual job execution. Consequently, if problems occur while 
creating the {{JobManager}}, it will result in a job failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16867) Simplify default timeout configuration

2020-03-30 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-16867:
-

 Summary: Simplify default timeout configuration
 Key: FLINK-16867
 URL: https://issues.apache.org/jira/browse/FLINK-16867
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration, Runtime / Coordination
Affects Versions: 1.10.0, 1.9.2, 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


At the moment, Flink has several timeout options:

* {{akka.ask.timeout}}: Timeout for intra cluster RPCs (JM <-> RM <-> TE)
* {{web.timeout}}: Timeout for RPCs between REST handlers and RM, JM, TE

At the moment, these values are separately configured. This requires the user 
to know about both configuration options and that Flink has multiple timeout 
values. 

In order to simplify setups, I would suggest that {{web.timeout}} defaults to 
{{akka.ask.timeout}} if {{web.timeout}} has not been explicitly configured. This 
has the benefit that the user only needs to know about a single timeout value 
which is applied cluster-wide.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16868) Table/SQL doesn't support custom trigger

2020-03-30 Thread Jimmy Wong (Jira)
Jimmy Wong created FLINK-16868:
--

 Summary: Table/SQL doesn't support custom trigger
 Key: FLINK-16868
 URL: https://issues.apache.org/jira/browse/FLINK-16868
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API, Table SQL / Runtime
Reporter: Jimmy Wong
 Fix For: 1.9.2, 1.9.1, 1.9.0


Table/SQL doesn't support custom trigger, such as CountTrigger, 
ContinuousEventTimeTrigger/ContinuousProcessingTimeTrigger. Do we have plans to 
support it?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Till Rohrmann
This is great news Jeff! Thanks a lot for sharing it with the community.
Looking forward trying Flink on Zeppelin out :-)

Cheers,
Till

On Mon, Mar 30, 2020 at 2:47 PM Jeff Zhang  wrote:

> Hi Folks,
>
> I am very excited to announce the integration work of flink on apache
> zeppelin notebook is completed. You can now run flink jobs via datastream
> api, table api, sql, pyflink in an apache zeppelin notebook. Download
> it here: http://zeppelin.apache.org/download.html
>
> Here's some highlights of this work
>
> 1. Support 3 kind of execution mode: local, remote, yarn
> 2. Support multiple languages  in one flink session: scala, python, sql
> 3. Support hive connector (reading from hive and writing to hive)
> 4. Dependency management
> 5. UDF support (scala, pyflink)
> 6. Support both batch sql and streaming sql
>
> For more details and usage instructions, you can refer to the following 4 blogs:
>
> 1) Get started: https://link.medium.com/oppqD6dIg5
> 2) Batch: https://link.medium.com/3qumbwRIg5
> 3) Streaming: https://link.medium.com/RBHa2lTIg5
> 4) Advanced usage: https://link.medium.com/CAekyoXIg5
>
> Welcome to use flink on zeppelin and give feedback and comments.
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-30 Thread Till Rohrmann
If there is no need for the ExternalResourceDriver on the RM side, then it
is always a good idea to keep it simple and not introduce it. One can
always change things once one realizes that there is a need for it.

Cheers,
Till

On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo  wrote:

> Hi @Till, @Xintong
>
> I think even without the credential concerns, replacing the interfaces
> with configuration options is a good idea from my side.
> - Currently, I don't see any external resource that is not compatible
> with this mechanism
> - It reduces the burden of users to implement a plugin themselves.
> WDYT?
>
> Best,
> Yangze Guo
>
> On Mon, Mar 30, 2020 at 5:44 PM Xintong Song 
> wrote:
> >
> > I also agree that the pluggable ExternalResourceDriver should be loaded
> by
> > the cluster class loader. Despite the plugin might be implemented by
> users,
> > external resources (as part of task executor resources) should be cluster
> > configurations, unlike job-level user codes such as UDFs, because the
> task
> > executors belongs to the cluster rather than jobs.
> >
> >
> > IIUC, the concern Stephan raised is about the potential credential
> problem
> > when executing user codes on RM with cluster class loader. The concern
> > makes sense to me, and I think what Yangze suggested should be a good
> > approach trying to prevent such credential problems. The only purpose we
> > tried to execute user codes (i.e. getKubernetes/YarnExternalResource) on
> RM
> > was that, we need to set these key-value pairs to pod/container requests.
> > Replacing the interfaces getKubernetes/YarnExternalResource with
> > configuration options
> > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > we can still fulfill that purpose, without the credential risks.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann 
> wrote:
> >
> > > At the moment the RM does not have a user code class loader and I agree
> > > with Stephan that it should stay like this. This, however, does not
> mean
> > > that we cannot support pluggable components in the RM. As long as the
> > > plugins are on the system's class path, it should be fine for the RM to
> > > load them. For example, we could add external resources via Flink's
> plugin
> > > mechanism or something similar.
> > >
> > > A very simple implementation of such an ExternalResourceDriver could
> be a
> > > class which simply returns what is written in the flink-conf.yaml
> under a
> > > given key.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo  wrote:
> > >
> > > > Hi, Stephan,
> > > >
> > > > I see your concern and I totally agree with you.
> > > >
> > > > The interface on RM side is now `Map
> > > > getYarn/KubernetesExternalResource()`. The only valid information RM
> > > > get from it is the configuration key of that external resource in
> > > > Yarn/K8s. The "String/Long value" would be the same as the
> > > > external-resource.{resourceName}.amount.
> > > > So, I think it makes sense to replace these two interfaces with two
> > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key.
> We
> > > > may lose some extensibility, but AFAIK it could work with common
> > > > external resources like GPU, FPGA. WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen 
> wrote:
> > > > >
> > > > > Maybe one final comment: It is probably not an issue, but let's
> try and
> > > > > keep user code (via user code classloader) out of the
> ResourceManager,
> > > if
> > > > > possible.
> > > > >
> > > > > As background:
> > > > >
> > > > > There were thoughts in the past to support setups where the RM
> must run
> > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > credentials, as the user code might access them otherwise.
> > > > > This is actually possible today, you can run the RM in a different
> JVM
> > > or
> > > > > in a different container, and give it more credentials than JMs /
> TMs.
> > > > But
> > > > > for this to be feasible, we cannot allow any user-defined code to
> be in
> > > > the
> > > > > JVM, because that instantaneously breaks the isolation of
> credentials.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo 
> wrote:
> > > > >
> > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > >
> > > > > > Regarding separating the interface, I'm also +1 with it.
> > > > > >
> > > > > > Regarding the resource allocation interface, true, it's
> dangerous to
> > > > > > give much access to user codes. Changing the return type to
> > > > > > Map<String key, String/Long value> makes sense to me. AFAIK, it is
> compatible
> > > > > > with all the first-party supported resources for
> Yarn/Kubernetes. It
> > > > > > could also free us from the potential dependency issue as well.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Fri, M

[ANNOUNCE] Flink on Zeppelin (Zeppelin 0.9 is released)

2020-03-30 Thread Jeff Zhang
Hi Folks,

I am very excited to announce that the integration work of Flink on Apache
Zeppelin notebook is completed. You can now run Flink jobs via the DataStream
API, Table API, SQL, and PyFlink in an Apache Zeppelin notebook. Download
it here: http://zeppelin.apache.org/download.html

Here are some highlights of this work:

1. Support for 3 kinds of execution modes: local, remote, yarn
2. Support for multiple languages in one Flink session: scala, python, sql
3. Support for the Hive connector (reading from Hive and writing to Hive)
4. Dependency management
5. UDF support (scala, pyflink)
6. Support for both batch SQL and streaming SQL

For more details and usage instructions, you can refer to the following 4 blogs:

1) Get started: https://link.medium.com/oppqD6dIg5
2) Batch: https://link.medium.com/3qumbwRIg5
3) Streaming: https://link.medium.com/RBHa2lTIg5
4) Advanced usage: https://link.medium.com/CAekyoXIg5

You are welcome to use Flink on Zeppelin and to give feedback and comments.

-- 
Best Regards

Jeff Zhang


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Leonard Xu
+1(non-binding)

Best,
Leonard Xu

> On Mar 30, 2020, at 16:43, Jingsong Li wrote:
> 
> +1
> 
> Best,
> Jingsong Lee
> 
> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:
> 
>> +1
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> On Mon, Mar 30, 2020 at 3:57 PM, Jark Wu wrote:
>>> 
 +1 from my side.
 
 Thanks Timo for driving this.
 
 Best,
 Jark
 
 On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
 
> Hi all,
> 
> I would like to start the vote for FLIP-95 [1], which is discussed
>> and
> reached a consensus in the discussion thread [2].
> 
> The vote will be open until April 2nd (72h), unless there is an
> objection or not enough votes.
> 
> Thanks,
> Timo
> 
> [1]
> 
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
> 
> 
 
>>> 
>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> 
 
>>> 
>>> 
>>> --
>>> 
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>> 
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Kurt Young
> It's not possible for the framework to throw such an exception, because the
framework doesn't know what versions the connector supports.

Not really, we can list all valid connectors the framework could find. E.g. if the
user mistyped 'kafka-0.x', the error message would look like:

we don't have any connector named "kafka-0.x", but we have:
FileSystem
Kafka-0.10
Kafka-0.11
ElasticSearch6
ElasticSearch7

Best,
Kurt


On Mon, Mar 30, 2020 at 5:11 PM Jark Wu  wrote:

> Hi Kurt,
>
> > 2) Lists all available connectors seems also quite straightforward, e.g
> user provided a wrong "kafka-0.8", we tell user we have candidates of
> "kakfa-0.11", "kafka-universal"
> It's not possible for the framework to throw such an exception, because the
> framework doesn't know what versions the connector supports. All the
> version information is a black box in the identifier. But with
> `Factory#factoryVersion()` interface, we can know all the supported
> versions.
>
> > 3) I don't think so. We can still treat it as the same connector but with
> different versions.
> That's true but that's weird. Because from the plain DDL definition, they
> look like different connectors with different "connector" value, e.g.
> 'connector=kafka-0.8', 'connector=kafka-0.10'.
>
> > If users don't set any version, we will use "kafka-universal" instead.
> The behavior is inconsistent IMO.
> That is a long-term vision for when there are no Kafka clusters with a version
> below 0.11.
> At that point, "universal" is the only supported version in Flink and the
> "version" key can be optional.
>
> -
>
> Hi Jingsong,
>
> > "version" vs "kafka.version"
> I thought about it. But if we prefix "kafka" to version, we should prefix
> "kafka" for all other property keys, because they are all kafka specific
> options.
> However, that will make the property set verbose, see rejected option#2 in
> the FLIP.
>
> > explicitly separate options for source and sink
> That's a good topic. It's good to have a guideline for the new property
> keys.
> I'm fine to prefix with a "source"/"sink" for some connector keys.
> Actually, we already do this in some connectors, e.g. jdbc and hbase.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 16:36, Jingsong Li  wrote:
>
> > Thanks Jark for the proposal.
> >
> > +1 to the general idea.
> >
> > For "version", what about "kafka.version"? It is obvious to know its
> > meaning.
> >
> > And I'd like to start a new topic:
> > Should we need to explicitly separate source from sink?
> > With the development of batch and streaming, more and more connectors
> have
> > both source and sink.
> >
> > So should we set a rule for table properties:
> > - properties for both source and sink: without prefix, like "topic"
> > - properties for source only: with "source." prefix, like
> > "source.startup-mode"
> > - properties for sink only: with "sink." prefix, like "sink.partitioner"
> >
> > What do you think?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Mar 30, 2020 at 3:56 PM Jark Wu  wrote:
> >
> > > Hi Kurt,
> > >
> > > That's good questions.
> > >
> > > > the meaning of "version"
> > > There are two versions in the old design. One is property version
> > > "connector.property-version" which can be used for backward
> > compatibility.
> > > The other one is "connector.version" which defines the version of
> > external
> > > system, e.g. 0.11" for kafka, "6" or "7" for ES.
> > > In this proposal, the "version" is the previous "connector.version".
> The
> > > ""connector.property-version" is not introduced in new design.
> > >
> > > > how to keep the old capability which can evolve connector properties
> > > The "connector.property-version" is an optional key in the old design
> and
> > > is never bumped up.
> > > I'm not sure how "connector.property-version" should work in the
> initial
> > > design. Maybe @Timo Walther  has more knowledge on
> > > this.
> > > But for the new properties, every options should be expressed as
> > > `ConfigOption` which provides `withDeprecatedKeys(...)` method to
> easily
> > > support evolving keys.
> > >
> > > > a single keys instead of two, e.g. "kafka-0.11", "kafka-universal"?
> > > There are several benefits to using a separate "version" key that I can see:
> > > 1) it's more readable to separate them into different keys, because
> they
> > > are orthogonal concepts.
> > > 2) the planner can give all the available versions in the exception
> > message,
> > > if user uses a wrong version (this is often reported in user ML).
> > > 3) If we use "kafka-0.11" as connector identifier, we may have to
> write a
> > > full documentation for each version, because they are different
> > > "connector"?
> > > IMO, for 0.10, 0.11, etc... kafka, they are actually the same
> > connector
> > > but with different "client jar" version,
> > > they share all the same supported property keys and should reside
> > > together.
> > > 4) IMO, the future vision is version-free. At some point in the future,
> > we
> > 

[jira] [Created] (FLINK-16865) 【Flink Kafka Connector】Restore from savepoint: if a new Kafka topic is added, Flink will consume the new topic from the earliest offset, which may cause duplicate data in the sink

2020-03-30 Thread zhisheng (Jira)
zhisheng created FLINK-16865:


 Summary: 【Flink Kafka Connector】Restore from savepoint: if a new 
Kafka topic is added, Flink will consume the new topic from the earliest offset, which 
may cause duplicate data in the sink
 Key: FLINK-16865
 URL: https://issues.apache.org/jira/browse/FLINK-16865
 Project: Flink
  Issue Type: Bug
Reporter: zhisheng
 Attachments: image-2020-03-30-19-57-42-451.png

h3. 【Flink Kafka Connector】

If the job adds another Kafka topic when it restores from a savepoint, it will 
start to consume that topic from the earliest offset, which may cause duplicate 
data in the sink.

I found that this behavior is hard-coded in 
FlinkKafkaConsumerBase#open(); maybe it could be made configurable.

 

!image-2020-03-30-19-57-42-451.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16864) Add idle metrics for Task

2020-03-30 Thread Wenlong Lyu (Jira)
Wenlong Lyu created FLINK-16864:
---

 Summary: Add idle metrics for Task
 Key: FLINK-16864
 URL: https://issues.apache.org/jira/browse/FLINK-16864
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Reporter: Wenlong Lyu


Currently there is no metric that lets a user measure how busy a task actually is, 
which is important for the user to decide how to tune a job.

We would like to propose adding an IdleTime metric which measures the idle time of a 
task, including the time the mailbox processor spends waiting for new mail and the 
time the record writer spends waiting for a new buffer. 

With the idle time:
1. when a job cannot keep up with the rate at which data is generated, the vertex 
whose idle time is near zero is the bottleneck of the job.
2. when a job is not busy, the idle time can be used to guide how much the user can 
scale down the job.

In addition, measuring idle time has little impact on the performance of the job, 
because when a task is busy, we don't touch the code that measures wait time in 
the mailbox.
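
To make this concrete, here is a minimal sketch (not the implementation proposed in
this ticket; the class name, metric name, and wrapping approach are assumptions for
illustration) of how blocking waits could be timed and exposed through Flink's
metrics API:

import java.util.concurrent.Callable;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

// Hypothetical helper: accumulates the time a task spends blocked (waiting for
// mail or for an output buffer) and exposes it as a gauge.
class IdleTimeTracker {

    private volatile long accumulatedIdleNanos;

    void register(MetricGroup taskMetricGroup) {
        // The metric name is an assumption, not the one finally chosen in Flink.
        taskMetricGroup.gauge("idleTimeNanos", (Gauge<Long>) () -> accumulatedIdleNanos);
    }

    // Only blocking waits are wrapped; when the task is busy this wrapper is not
    // entered, which keeps the measurement overhead low.
    <T> T measureBlockingWait(Callable<T> blockingCall) throws Exception {
        final long start = System.nanoTime();
        try {
            return blockingCall.call();
        } finally {
            accumulatedIdleNanos += System.nanoTime() - start;
        }
    }
}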



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: PackagedProgram and ProgramDescription

2020-03-30 Thread Flavio Pompermaier
I would personally like to see a way of describing a Flink job/pipeline
(including its parameters and types) in order to enable better UIs; the
important thing is to make things consistent and aligned with the new
client developments, and to exploit this new dev sprint to fix such issues.

On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek 
wrote:

> On 18.03.20 14:45, Flavio Pompermaier wrote:
> > what do you think if we exploit this job-submission sprint to address
> also
> > the problem discussed in
> https://issues.apache.org/jira/browse/FLINK-10862?
>
> That's a good idea! What should we do? It seems that most committers on
> the issue were in favour of deprecating/removing ProgramDescription.
>


[jira] [Created] (FLINK-16863) add lastModified as a field of LogInfo

2020-03-30 Thread lining (Jira)
lining created FLINK-16863:
--

 Summary: add lastModified as a field of LogInfo
 Key: FLINK-16863
 URL: https://issues.apache.org/jira/browse/FLINK-16863
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Reporter: lining


Sorting in descending order on the last-modified date would let a user see the 
most recent files first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-30 Thread Yangze Guo
Hi @Till, @Xintong

I think even with the credential concerns aside, replacing the interfaces
with configuration options is a good idea from my side.
- Currently, I don't see any external resource that is not compatible
with this mechanism.
- It reduces the burden on users of implementing a plugin themselves.
WDYT?

Best,
Yangze Guo
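
For illustration, a rough sketch (not from the FLIP) of the configuration-only
approach; the key names follow the ones mentioned in this thread, and the concrete
GPU values are assumptions:

import org.apache.flink.configuration.Configuration;

public class ExternalResourceConfigSketch {
    public static void main(String[] args) {
        // In flink-conf.yaml this could look like:
        //   external-resource.gpu.amount: 2
        //   external-resource.gpu.yarn.key: yarn.io/gpu
        //   external-resource.gpu.kubernetes.key: nvidia.com/gpu
        Configuration conf = new Configuration();
        conf.setString("external-resource.gpu.amount", "2");
        conf.setString("external-resource.gpu.yarn.key", "yarn.io/gpu");
        conf.setString("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");

        // The ResourceManager only needs to read these values and forward them to
        // the Yarn container request / Kubernetes pod spec, with no user code involved.
        String yarnKey = conf.getString("external-resource.gpu.yarn.key", null);
        long amount = Long.parseLong(conf.getString("external-resource.gpu.amount", "0"));
        System.out.println(yarnKey + " -> " + amount);
    }
}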

On Mon, Mar 30, 2020 at 5:44 PM Xintong Song  wrote:
>
> I also agree that the pluggable ExternalResourceDriver should be loaded by
> the cluster class loader. Although the plugin might be implemented by users,
> external resources (as part of task executor resources) should be cluster
> configurations, unlike job-level user code such as UDFs, because the task
> executors belong to the cluster rather than to jobs.
>
>
> IIUC, the concern Stephan raised is about the potential credential problem
> when executing user codes on RM with cluster class loader. The concern
> makes sense to me, and I think what Yangze suggested should be a good
> approach trying to prevent such credential problems. The only purpose we
> tried to execute user codes (i.e. getKubernetes/YarnExternalResource) on RM
> was that, we need to set these key-value pairs to pod/container requests.
> Replacing the interfaces getKubernetes/YarnExternalResource with
> configuration options
> 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> we can still fulfill that purpose, without the credential risks.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann  wrote:
>
> > At the moment the RM does not have a user code class loader and I agree
> > with Stephan that it should stay like this. This, however, does not mean
> > that we cannot support pluggable components in the RM. As long as the
> > plugins are on the system's class path, it should be fine for the RM to
> > load them. For example, we could add external resources via Flink's plugin
> > mechanism or something similar.
> >
> > A very simple implementation of such an ExternalResourceDriver could be a
> > class which simply returns what is written in the flink-conf.yaml under a
> > given key.
> >
> > Cheers,
> > Till
> >
> > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo  wrote:
> >
> > > Hi, Stephan,
> > >
> > > I see your concern and I totally agree with you.
> > >
> > > The interface on RM side is now `Map
> > > getYarn/KubernetesExternalResource()`. The only valid information RM
> > > get from it is the configuration key of that external resource in
> > > Yarn/K8s. The "String/Long value" would be the same as the
> > > external-resource.{resourceName}.amount.
> > > So, I think it makes sense to replace these two interfaces with two
> > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > may lose some extensibility, but AFAIK it could work with common
> > > external resources like GPU, FPGA. WDYT?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen  wrote:
> > > >
> > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > keep user code (via user code classloader) out of the ResourceManager,
> > if
> > > > possible.
> > > >
> > > > As background:
> > > >
> > > > There were thoughts in the past to support setups where the RM must run
> > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > credentials, as the user code might access them otherwise.
> > > > This is actually possible today, you can run the RM in a different JVM
> > or
> > > > in a different container, and give it more credentials than JMs / TMs.
> > > But
> > > > for this to be feasible, we cannot allow any user-defined code to be in
> > > the
> > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > >
> > > >
> > > >
> > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo  wrote:
> > > >
> > > > > Thanks for the feedback, @Till and @Xintong.
> > > > >
> > > > > Regarding separating the interface, I'm also +1 with it.
> > > > >
> > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > give much access to user codes. Changing the return type to
> > > > > Map<String key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > could also free us from the potential dependency issue as well.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song  > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks for updating the FLIP, Yangze.
> > > > > >
> > > > > > I agree with Till that we probably want to separate the K8s/Yarn
> > > > > decorator
> > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > `instanceof` to check whether the driver implemented K8s/Yarn
> > > specific
> > > > > > interfaces.
> > > > > >
> > > > > > Moreover, I'm not sure about exposing entire `ContainerRequest` /
> > > `Pod`
> > > > > > (`AbstractKubernete

Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'

2020-03-30 Thread Chesnay Schepler
I would recommend waiting until a committer has signed up to review 
your changes before preparing any PR.
Otherwise the chances are high that you invest a lot of time but the 
changes never get in.


On 30/03/2020 11:42, Sivaprasanna wrote:

Hello Till,

I agree with having the scope limited and more concentrated. I can file a
Jira and get started with the code changes, as and when someone has some
bandwidth, the review can also be done. What do you think?

Cheers,
Sivaprasanna

On Mon, Mar 30, 2020 at 3:00 PM Till Rohrmann  wrote:


Hi Sivaprasanna,

thanks for starting this discussion. In general I like the idea to remove
duplications and move common code to a shared module. As a recommendation,
I would exclude the whole part about Flink's Hadoop compatibility modules
because they are legacy code and hardly used anymore. This would also have
the benefit of making the scope of the proposal a bit smaller.

What we now need is a committer who wants to help with this effort. It
might be that this takes a bit of time as many of the committers are quite
busy.

Cheers,
Till

On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna 
wrote:


Hi,

Continuing on an earlier discussion[1] regarding having a separate module
for Hadoop related utility components, I have gone through our project
briefly and found the following components which I feel could be moved

to a

separate module for reusability, and better module structure.

Module Name Class Name Used at / Remarks

flink-hadoop-fs
flink.runtime.util.HadoopUtils
flink-runtime => HadoopModule & HadoopModuleFactory
flink-swift-fs-hadoop => SwiftFileSystemFactory
flink-yarn => Utils, YarnClusterDescriptor

flink-hadoop-compatibility
api.java.hadoop.mapred.utils.HadoopUtils
Both belong to the same module but with different packages
(api.java.hadoop.mapred and api.java.hadoop.mapreduce)
api.java.hadoop.mapreduce.utils.HadoopUtils
flink-sequence-file
formats.sequencefile.SerializableHadoopConfiguration Currently,
it is used at formats.sequencefile.SequenceFileWriterFactory but can also
be used at HadoopCompressionBulkWriter, a potential OrcBulkWriter and
pretty much everywhere to avoid NotSerializableException.

*Proposal*
To summarise, I believe we can create a new module (flink-hadoop-utils ?)
and move these reusable components to this new module which will have an
optional/provided dependency on flink-shaded-hadoop-2.

*Structure*
In the present form, I think we will have two classes with the packaging
structure being *org.apache.flink.hadoop.[utils/serialization]*
1. HadoopUtils with all static methods ( after combining and eliminating
the duplicate code fragments from the three HadoopUtils classes mentioned
above)
2. Move the existing SerializableHadoopConfiguration from the
flink-sequence-file to this new module .
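
For reference, a simplified sketch of the kind of wrapper discussed here
(illustrative only, not necessarily the exact class to be moved):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import org.apache.hadoop.conf.Configuration;

// Wraps a non-serializable Hadoop Configuration so it can be shipped inside
// serializable Flink functions/factories without a NotSerializableException.
public class SerializableHadoopConfiguration implements Serializable {

    private static final long serialVersionUID = 1L;

    private transient Configuration configuration;

    public SerializableHadoopConfiguration(Configuration configuration) {
        this.configuration = configuration;
    }

    public Configuration get() {
        return configuration;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        configuration.write(out);            // Hadoop's own Writable serialization
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        configuration = new Configuration(false);
        configuration.readFields(in);        // restore from Hadoop's wire format
    }
}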

*Justification*
* With this change, we would be stripping away the dependency on
flink-hadoop-fs from flink-runtime as I don't see any other classes from
flink-hadoop-fs is being used anywhere in flink-runtime module.
* We will have a common place where all the utilities related to Hadoop

can

go which can be reused easily without leading to jar hell.

In addition to this, if you are aware of any other classes that fit in

this

approach, please share the details here.

*Note*
I don't have a complete understanding here but I did see two
implementations of the following classes under two different packages
*.mapred and *.mapreduce.
* HadoopInputFormat
* HadoopInputFormatBase
* HadoopOutputFormat
* HadoopOutputFormatBase

Can we somehow figure and have them in this new module?

Thanks,
Sivaprasanna

[1]



https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E





Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-30 Thread Xintong Song
I also agree that the pluggable ExternalResourceDriver should be loaded by
the cluster class loader. Although the plugin might be implemented by users,
external resources (as part of task executor resources) should be cluster
configurations, unlike job-level user code such as UDFs, because the task
executors belong to the cluster rather than to jobs.


IIUC, the concern Stephan raised is about the potential credential problem
when executing user codes on RM with cluster class loader. The concern
makes sense to me, and I think what Yangze suggested should be a good
approach trying to prevent such credential problems. The only purpose we
tried to execute user codes (i.e. getKubernetes/YarnExternalResource) on RM
was that, we need to set these key-value pairs to pod/container requests.
Replacing the interfaces getKubernetes/YarnExternalResource with
configuration options
'external-resource.{resourceName}.yarn/kubernetes.key/amount',
we can still fulfill that purpose, without the credential risks.


Thank you~

Xintong Song



On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann  wrote:

> At the moment the RM does not have a user code class loader and I agree
> with Stephan that it should stay like this. This, however, does not mean
> that we cannot support pluggable components in the RM. As long as the
> plugins are on the system's class path, it should be fine for the RM to
> load them. For example, we could add external resources via Flink's plugin
> mechanism or something similar.
>
> A very simple implementation of such an ExternalResourceDriver could be a
> class which simply returns what is written in the flink-conf.yaml under a
> given key.
>
> Cheers,
> Till
>
> On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo  wrote:
>
> > Hi, Stephan,
> >
> > I see your concern and I totally agree with you.
> >
> > The interface on RM side is now `Map
> > getYarn/KubernetesExternalResource()`. The only valid information RM
> > get from it is the configuration key of that external resource in
> > Yarn/K8s. The "String/Long value" would be the same as the
> > external-resource.{resourceName}.amount.
> > So, I think it makes sense to replace these two interfaces with two
> > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > may lose some extensibility, but AFAIK it could work with common
> > external resources like GPU, FPGA. WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen  wrote:
> > >
> > > Maybe one final comment: It is probably not an issue, but let's try and
> > > keep user code (via user code classloader) out of the ResourceManager,
> if
> > > possible.
> > >
> > > As background:
> > >
> > > There were thoughts in the past to support setups where the RM must run
> > > with "superuser" credentials, but we cannot run JM/TM with these
> > > credentials, as the user code might access them otherwise.
> > > This is actually possible today, you can run the RM in a different JVM
> or
> > > in a different container, and give it more credentials than JMs / TMs.
> > But
> > > for this to be feasible, we cannot allow any user-defined code to be in
> > the
> > > JVM, because that instantaneously breaks the isolation of credentials.
> > >
> > >
> > >
> > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo  wrote:
> > >
> > > > Thanks for the feedback, @Till and @Xintong.
> > > >
> > > > Regarding separating the interface, I'm also +1 with it.
> > > >
> > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > give much access to user codes. Changing the return type to
> > > > Map<String key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > could also free us from the potential dependency issue as well.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song  >
> > > > wrote:
> > > > >
> > > > > Thanks for updating the FLIP, Yangze.
> > > > >
> > > > > I agree with Till that we probably want to separate the K8s/Yarn
> > > > decorator
> > > > > calls. Users can still configure one driver class, and we can use
> > > > > `instanceof` to check whether the driver implemented K8s/Yarn
> > specific
> > > > > interfaces.
> > > > >
> > > > > Moreover, I'm not sure about exposing entire `ContainerRequest` /
> > `Pod`
> > > > > (`AbstractKubernetesStepDecorator` directly manipulates on `Pod`)
> to
> > user
> > > > > codes. It gives more access to user codes than needed for defining
> > > > external
> > > > > resource, which might cause problems. Instead, I would suggest to
> > have
> > > > > interface like `Map
> > > > > getYarn/KubernetesExternalResource()` and assemble them into
> > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <
> trohrm...@apache.org>
> > > > wro

Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'

2020-03-30 Thread Sivaprasanna
Hello Till,

I agree with having the scope limited and more concentrated. I can file a
Jira and get started with the code changes, as and when someone has some
bandwidth, the review can also be done. What do you think?

Cheers,
Sivaprasanna

On Mon, Mar 30, 2020 at 3:00 PM Till Rohrmann  wrote:

> Hi Sivaprasanna,
>
> thanks for starting this discussion. In general I like the idea to remove
> duplications and move common code to a shared module. As a recommendation,
> I would exclude the whole part about Flink's Hadoop compatibility modules
> because they are legacy code and hardly used anymore. This would also have
> the benefit of making the scope of the proposal a bit smaller.
>
> What we now need is a committer who wants to help with this effort. It
> might be that this takes a bit of time as many of the committers are quite
> busy.
>
> Cheers,
> Till
>
> On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna 
> wrote:
>
> > Hi,
> >
> > Continuing on an earlier discussion[1] regarding having a separate module
> > for Hadoop related utility components, I have gone through our project
> > briefly and found the following components which I feel could be moved
> to a
> > separate module for reusability, and better module structure.
> >
> > Module Name Class Name Used at / Remarks
> >
> > flink-hadoop-fs
> > flink.runtime.util.HadoopUtils
> > flink-runtime => HadoopModule & HadoopModuleFactory
> > flink-swift-fs-hadoop => SwiftFileSystemFactory
> > flink-yarn => Utils, YarnClusterDescriptor
> >
> > flink-hadoop-compatibility
> > api.java.hadoop.mapred.utils.HadoopUtils
> > Both belong to the same module but with different packages
> > (api.java.hadoop.mapred and api.java.hadoop.mapreduce)
> > api.java.hadoop.mapreduce.utils.HadoopUtils
> > flink-sequence-file
> > formats.sequencefile.SerializableHadoopConfiguration Currently,
> > it is used at formats.sequencefile.SequenceFileWriterFactory but can also
> > be used at HadoopCompressionBulkWriter, a potential OrcBulkWriter and
> > pretty much everywhere to avoid NotSerializableException.
> >
> > *Proposal*
> > To summarise, I believe we can create a new module (flink-hadoop-utils ?)
> > and move these reusable components to this new module which will have an
> > optional/provided dependency on flink-shaded-hadoop-2.
> >
> > *Structure*
> > In the present form, I think we will have two classes with the packaging
> > structure being *org.apache.flink.hadoop.[utils/serialization]*
> > 1. HadoopUtils with all static methods ( after combining and eliminating
> > the duplicate code fragments from the three HadoopUtils classes mentioned
> > above)
> > 2. Move the existing SerializableHadoopConfiguration from the
> > flink-sequence-file to this new module .
> >
> > *Justification*
> > * With this change, we would be stripping away the dependency on
> > flink-hadoop-fs from flink-runtime as I don't see any other classes from
> > flink-hadoop-fs is being used anywhere in flink-runtime module.
> > * We will have a common place where all the utilities related to Hadoop
> can
> > go which can be reused easily without leading to jar hell.
> >
> > In addition to this, if you are aware of any other classes that fit in
> this
> > approach, please share the details here.
> >
> > *Note*
> > I don't have a complete understanding here but I did see two
> > implementations of the following classes under two different packages
> > *.mapred and *.mapreduce.
> > * HadoopInputFormat
> > * HadoopInputFormatBase
> > * HadoopOutputFormat
> > * HadoopOutputFormatBase
> >
> > Can we somehow figure and have them in this new module?
> >
> > Thanks,
> > Sivaprasanna
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E
> >
>


Re: PackagedProgram and ProgramDescription

2020-03-30 Thread Aljoscha Krettek

On 18.03.20 14:45, Flavio Pompermaier wrote:

what do you think if we exploit this job-submission sprint to address also
the problem discussed in https://issues.apache.org/jira/browse/FLINK-10862?


That's a good idea! What should we do? It seems that most committers on 
the issue were in favour of deprecating/removing ProgramDescription.


Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'

2020-03-30 Thread Till Rohrmann
Hi Sivaprasanna,

thanks for starting this discussion. In general I like the idea to remove
duplications and move common code to a shared module. As a recommendation,
I would exclude the whole part about Flink's Hadoop compatibility modules
because they are legacy code and hardly used anymore. This would also have
the benefit of making the scope of the proposal a bit smaller.

What we now need is a committer who wants to help with this effort. It
might be that this takes a bit of time as many of the committers are quite
busy.

Cheers,
Till

On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna 
wrote:

> Hi,
>
> Continuing on an earlier discussion[1] regarding having a separate module
> for Hadoop related utility components, I have gone through our project
> briefly and found the following components which I feel could be moved to a
> separate module for reusability, and better module structure.
>
> Module Name Class Name Used at / Remarks
>
> flink-hadoop-fs
> flink.runtime.util.HadoopUtils
> flink-runtime => HadoopModule & HadoopModuleFactory
> flink-swift-fs-hadoop => SwiftFileSystemFactory
> flink-yarn => Utils, YarnClusterDescriptor
>
> flink-hadoop-compatibility
> api.java.hadoop.mapred.utils.HadoopUtils
> Both belong to the same module but with different packages
> (api.java.hadoop.mapred and api.java.hadoop.mapreduce)
> api.java.hadoop.mapreduce.utils.HadoopUtils
> flink-sequence-file
> formats.sequencefile.SerializableHadoopConfiguration Currently,
> it is used at formats.sequencefile.SequenceFileWriterFactory but can also
> be used at HadoopCompressionBulkWriter, a potential OrcBulkWriter and
> pretty much everywhere to avoid NotSerializableException.
>
> *Proposal*
> To summarise, I believe we can create a new module (flink-hadoop-utils ?)
> and move these reusable components to this new module which will have an
> optional/provided dependency on flink-shaded-hadoop-2.
>
> *Structure*
> In the present form, I think we will have two classes with the packaging
> structure being *org.apache.flink.hadoop.[utils/serialization]*
> 1. HadoopUtils with all static methods ( after combining and eliminating
> the duplicate code fragments from the three HadoopUtils classes mentioned
> above)
> 2. Move the existing SerializableHadoopConfiguration from the
> flink-sequence-file to this new module .
>
> *Justification*
> * With this change, we would be stripping away the dependency on
> flink-hadoop-fs from flink-runtime as I don't see any other classes from
> flink-hadoop-fs is being used anywhere in flink-runtime module.
> * We will have a common place where all the utilities related to Hadoop can
> go which can be reused easily without leading to jar hell.
>
> In addition to this, if you are aware of any other classes that fit in this
> approach, please share the details here.
>
> *Note*
> I don't have a complete understanding here but I did see two
> implementations of the following classes under two different packages
> *.mapred and *.mapreduce.
> * HadoopInputFormat
> * HadoopInputFormatBase
> * HadoopOutputFormat
> * HadoopOutputFormatBase
>
> Can we somehow figure and have them in this new module?
>
> Thanks,
> Sivaprasanna
>
> [1]
>
> https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E
>


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-30 Thread Till Rohrmann
At the moment the RM does not have a user code class loader and I agree
with Stephan that it should stay like this. This, however, does not mean
that we cannot support pluggable components in the RM. As long as the
plugins are on the system's class path, it should be fine for the RM to
load them. For example, we could add external resources via Flink's plugin
mechanism or something similar.

A very simple implementation of such an ExternalResourceDriver could be a
class which simply returns what is written in the flink-conf.yaml under a
given key.

Cheers,
Till
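
For illustration, a minimal sketch of that idea; the interface shape below is an
assumption made for this example, not a signature defined in FLIP-108:

import java.util.Collections;
import java.util.Map;
import org.apache.flink.configuration.Configuration;

// Assumed plugin interface, loaded from the system class path / plugin mechanism.
interface ExternalResourceDriver {
    Map<String, Long> retrieveResourceInfo(Configuration clusterConfig);
}

// "Very simple implementation": just report what is configured in flink-conf.yaml
// under a given key, e.g. external-resource.gpu.amount.
class ConfigBackedExternalResourceDriver implements ExternalResourceDriver {

    private final String resourceName;
    private final String amountConfigKey;

    ConfigBackedExternalResourceDriver(String resourceName, String amountConfigKey) {
        this.resourceName = resourceName;
        this.amountConfigKey = amountConfigKey;
    }

    @Override
    public Map<String, Long> retrieveResourceInfo(Configuration clusterConfig) {
        long amount = clusterConfig.getLong(amountConfigKey, 0L);
        return Collections.singletonMap(resourceName, amount);
    }
}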

On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo  wrote:

> Hi, Stephan,
>
> I see your concern and I totally agree with you.
>
> The interface on RM side is now `Map
> getYarn/KubernetesExternalResource()`. The only valid information RM
> get from it is the configuration key of that external resource in
> Yarn/K8s. The "String/Long value" would be the same as the
> external-resource.{resourceName}.amount.
> So, I think it makes sense to replace these two interfaces with two
> configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> may lose some extensibility, but AFAIK it could work with common
> external resources like GPU, FPGA. WDYT?
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen  wrote:
> >
> > Maybe one final comment: It is probably not an issue, but let's try and
> > keep user code (via user code classloader) out of the ResourceManager, if
> > possible.
> >
> > As background:
> >
> > There were thoughts in the past to support setups where the RM must run
> > with "superuser" credentials, but we cannot run JM/TM with these
> > credentials, as the user code might access them otherwise.
> > This is actually possible today, you can run the RM in a different JVM or
> > in a different container, and give it more credentials than JMs / TMs.
> But
> > for this to be feasible, we cannot allow any user-defined code to be in
> the
> > JVM, because that instantaneously breaks the isolation of credentials.
> >
> >
> >
> > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo  wrote:
> >
> > > Thanks for the feedback, @Till and @Xintong.
> > >
> > > Regarding separating the interface, I'm also +1 with it.
> > >
> > > Regarding the resource allocation interface, true, it's dangerous to
> > > give much access to user codes. Changing the return type to Map<String key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > could also free us from the potential dependency issue as well.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song 
> > > wrote:
> > > >
> > > > Thanks for updating the FLIP, Yangze.
> > > >
> > > > I agree with Till that we probably want to separate the K8s/Yarn
> > > decorator
> > > > calls. Users can still configure one driver class, and we can use
> > > > `instanceof` to check whether the driver implemented K8s/Yarn
> specific
> > > > interfaces.
> > > >
> > > > Moreover, I'm not sure about exposing entire `ContainerRequest` /
> `Pod`
> > > > (`AbstractKubernetesStepDecorator` directly manipulates on `Pod`) to
> user
> > > > codes. It gives more access to user codes than needed for defining
> > > external
> > > > resource, which might cause problems. Instead, I would suggest to
> have
> > > > interface like `Map
> > > > getYarn/KubernetesExternalResource()` and assemble them into
> > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann 
> > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'm a bit late to the party. I think the current proposal looks
> good.
> > > > >
> > > > > Concerning the ExternalResourceDriver interface defined in the FLIP
> > > [1], I
> > > > > would suggest to not include the decorator calls for Kubernetes and
> > > Yarn in
> > > > > the base interface. Instead I would suggest to segregate the
> deployment
> > > > > specific decorator calls into separate interfaces. That way an
> > > > > ExternalResourceDriver does not have to support all deployments
> from
> > > the
> > > > > very beginning. Moreover, some resources might not be supported by
> a
> > > > > specific deployment target and the natural way to express this
> would
> > > be to
> > > > > not implement the respective deployment specific interface.
> > > > >
> > > > > Moreover, having void
> > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest
> > > containerRequest)
> > > > > in the ExternalResourceDriver interface would require Hadoop on
> Flink's
> > > > > classpath whenever the external resource driver is being used.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Mar 26, 202

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi Timo,

I agree with you that streaming queries are our top priority,
but I think there are too many things that need to be discussed for multiline
statements:
e.g.
1. what's the behavior of mixing DDL and DML for async execution:
create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; // t1 may be a MySQL table, the data will also be deleted.

t1 is dropped when "insert" job is running.

2. what's the behavior of the unified scenario for async execution (as you
mentioned):
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is nondeterministic, because the first
statement may still be running.
I think we need to put a lot of effort into defining the behavior of logically
related queries.

In this FLIP, I suggest we only handle single statements, and we also
introduce an async execute method,
which is more important and more often used by users.

For the sync methods (like `TableEnvironment.executeSql` and
`StatementSet.execute`),
the result will only be returned once the job is finished. The following
methods will be introduced in this FLIP:

/**
 * Asynchronously execute the given single statement.
 */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
 * Asynchronously execute the DML statements as a batch.
 */
StatementSet.executeAsync(): TableResult

public interface TableResult {
    /**
     * Return the JobClient for DQL and DML in async mode, else return
     * Optional.empty().
     */
    Optional<JobClient> getJobClient();
}

what do you think?

Best,
Godfrey
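
To make the proposal above more tangible, a hypothetical usage sketch (it only
compiles against the proposed methods listed above, which did not exist at the
time of writing; the statement text and names are made up):

import java.util.Optional;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;   // proposed type, see interface above

public class AsyncExecutionSketch {
    public static void run(TableEnvironment tEnv) {
        // Submit a single DML statement without waiting for the job to finish.
        TableResult result =
                tEnv.executeSqlAsync("INSERT INTO t2 SELECT * FROM t1 WHERE amount > 0");

        // A JobClient is present for DQL/DML in async mode, otherwise Optional.empty().
        Optional<JobClient> jobClient = result.getJobClient();
        jobClient.ifPresent(client ->
                client.getJobStatus().thenAccept(status ->
                        System.out.println("Job status: " + status)));
    }
}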

On Thu, Mar 26, 2020 at 9:15 PM, Timo Walther wrote:

> Hi Godfrey,
>
> executing streaming queries must be our top priority because this is
> what distinguishes Flink from competitors. If we change the execution
> behavior, we should think about the other cases as well to not break the
> API a third time.
>
> I fear that just having an async execute method will not be enough
> because users should be able to mix streaming and batch queries in a
> unified scenario.
>
> If I remember it correctly, we had some discussions in the past about
> what decides about the execution mode of a query. Currently, we would
> like to let the query decide, not derive it from the sources.
>
> So I could imagine a multiline pipeline as:
>
> USE CATALOG 'mycat';
> INSERT INTO t1 SELECT * FROM s;
> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>
> For executeMultilineSql():
>
> sync because regular SQL
> sync because regular Batch SQL
> async because Streaming SQL
>
> For executeAsyncMultilineSql():
>
> async because everything should be async
> async because everything should be async
> async because everything should be async
>
> What we should not start for executeAsyncMultilineSql():
>
> sync because DDL
> async because everything should be async
> async because everything should be async
>
> What are your thoughts here?
>
> Regards,
> Timo
>
>
> On 26.03.20 12:50, godfrey he wrote:
> > Hi Timo,
> >
> > I agree with you that streaming queries mostly need async execution.
> > In fact, our original plan is only introducing sync methods in this FLIP,
> > and async methods (like "executeSqlAsync") will be introduced in the
> future
> > which is mentioned in the appendix.
> >
> > Maybe the async methods also need to be considered in this FLIP.
> >
> > I think sync methods is also useful for streaming which can be used to
> run
> > bounded source.
> > Maybe we should check whether all sources are bounded in sync execution
> > mode.
> >
> >> Also, if we block for streaming queries, we could never support
> >> multiline files. Because the first INSERT INTO would block the further
> >> execution.
> > agree with you, we need async method to submit multiline files,
> > and files should be limited that the DQL and DML should be always in the
> > end for streaming.
> >
> > Best,
> > Godfrey
> >
> > On Thu, Mar 26, 2020 at 4:29 PM, Timo Walther wrote:
> >
> >> Hi Godfrey,
> >>
> >> having control over the job after submission is a requirement that was
> >> requested frequently (some examples are [1], [2]). Users would like to
> >> get insights about the running or completed job. Including the jobId,
> >> jobGraph etc., the JobClient summarizes these properties.
> >>
> >> It is good to have a discussion about synchronous/asynchronous
> >> submission now to have a complete execution picture.
> >>
> >> I thought we submit streaming queries mostly async and just wait for the
> >> successful submission. If we block for streaming queries, how can we
> >> collect() or print() results?
> >>
> >> Also, if we block for streaming queries, we could never support
> >> multiline files. Because the first INSERT INTO would block the further
> >> execution.
> >>
> >> If we decide to block entirely on streaming queries, we need the async
> >> execution methods in the design already. However, I would rather go for
> >> non-blocking streaming queries. Also with the `EMIT STREAM` key word in
> >> mind that we might add to SQL statements soon.
>

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi Kurt,

> 2) Lists all available connectors seems also quite straightforward, e.g
the user provided a wrong "kafka-0.8", we tell the user we have candidates of
"kafka-0.11", "kafka-universal"
It's not possible for the framework to throw such an exception, because the
framework doesn't know what versions the connector supports. All the
version information is a black box in the identifier. But with the
`Factory#factoryVersion()` interface, we can know all the supported
versions.

> 3) I don't think so. We can still treat it as the same connector but with
different versions.
That's true but that's weird. Because from the plain DDL definition, they
look like different connectors with different "connector" value, e.g.
'connector=kafka-0.8', 'connector=kafka-0.10'.

> If users don't set any version, we will use "kafka-universal" instead.
The behavior is inconsistent IMO.
That is a long-term vision for when there are no Kafka clusters with a version
below 0.11.
At that point, "universal" is the only supported version in Flink and the
"version" key can be optional.

-

Hi Jingsong,

> "version" vs "kafka.version"
I thought about it. But if we prefix "kafka" to version, we should prefix
"kafka" for all other property keys, because they are all kafka specific
options.
However, that will make the property set verbose, see rejected option#2 in
the FLIP.

> explicitly separate options for source and sink
That's a good topic. It's good to have a guideline for the new property
keys.
I'm fine to prefix with a "source"/"sink" for some connector keys.
Actually, we already do this in some connectors, e.g. jdbc and hbase.

Best,
Jark
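
As a purely illustrative sketch (the table name, schema, and option values are made
up; only the key shapes, i.e. separate "connector"/"version" keys and the
"source."/"sink." prefixes discussed in this thread, are taken from the proposal and
were not final), a DDL issued through the Java Table API could look like:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ConnectorOptionsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // sqlUpdate is used here because it was the DDL entry point at the time.
        tEnv.sqlUpdate(
                "CREATE TABLE user_behavior (\n"
                        + "  user_id BIGINT,\n"
                        + "  behavior STRING\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"                     // identifier without version
                        + "  'version' = '0.11',\n"                        // external system version as a separate key
                        + "  'topic' = 'user_behavior',\n"                 // shared by source and sink
                        + "  'source.startup-mode' = 'earliest-offset',\n" // source-only option
                        + "  'sink.partitioner' = 'round-robin'\n"         // sink-only option
                        + ")");
    }
}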

On Mon, 30 Mar 2020 at 16:36, Jingsong Li  wrote:

> Thanks Jark for the proposal.
>
> +1 to the general idea.
>
> For "version", what about "kafka.version"? It is obvious to know its
> meaning.
>
> And I'd like to start a new topic:
> Should we need to explicitly separate source from sink?
> With the development of batch and streaming, more and more connectors have
> both source and sink.
>
> So should we set a rule for table properties:
> - properties for both source and sink: without prefix, like "topic"
> - properties for source only: with "source." prefix, like
> "source.startup-mode"
> - properties for sink only: with "sink." prefix, like "sink.partitioner"
>
> What do you think?
>
> Best,
> Jingsong Lee
>
> On Mon, Mar 30, 2020 at 3:56 PM Jark Wu  wrote:
>
> > Hi Kurt,
> >
> > That's good questions.
> >
> > > the meaning of "version"
> > There are two versions in the old design. One is property version
> > "connector.property-version" which can be used for backward
> compatibility.
> > The other one is "connector.version" which defines the version of
> external
> > system, e.g. 0.11" for kafka, "6" or "7" for ES.
> > In this proposal, the "version" is the previous "connector.version". The
> > ""connector.property-version" is not introduced in new design.
> >
> > > how to keep the old capability which can evolve connector properties
> > The "connector.property-version" is an optional key in the old design and
> > is never bumped up.
> > I'm not sure how "connector.property-version" should work in the initial
> > design. Maybe @Timo Walther  has more knowledge on
> > this.
> > But for the new properties, every options should be expressed as
> > `ConfigOption` which provides `withDeprecatedKeys(...)` method to easily
> > support evolving keys.
> >
> > > a single keys instead of two, e.g. "kafka-0.11", "kafka-universal"?
> > There are several benefits to using a separate "version" key that I can see:
> > 1) it's more readable to separate them into different keys, because they
> > are orthogonal concepts.
> > 2) the planner can give all the available versions in the exception
> message,
> > if user uses a wrong version (this is often reported in user ML).
> > 3) If we use "kafka-0.11" as connector identifier, we may have to write a
> > full documentation for each version, because they are different
> > "connector"?
> > IMO, for 0.10, 0.11, etc... kafka, they are actually the same
> connector
> > but with different "client jar" version,
> > they share all the same supported property keys and should reside
> > together.
> > 4) IMO, the future vision is version-free. At some point in the future,
> we
> > may not need users to specify the kafka version anymore, and make
> > "version=universal" optional or remove it in the future. This can be
> > done easily if they are separate keys.
> >
> > Best,
> > Jark
> >
> >
> > On Mon, 30 Mar 2020 at 14:45, Kurt Young  wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for the proposal, I'm +1 to the general idea. However I have a
> > > question about "version",
> > > in the old design, the version seems to be aimed for tracking property
> > > version, with different
> > > version, we could evolve these step by step without breaking backward
> > > compatibility. But in this
> > > design, version is representing external system's version, like "0

Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles

2020-03-30 Thread Tzu-Li (Gordon) Tai
Repo has been created:
https://github.com/apache/flink-statefun-docker
https://gitbox.apache.org/repos/asf?p=flink-statefun-docker.git

On Mon, Mar 30, 2020 at 3:57 PM Tzu-Li (Gordon) Tai 
wrote:

> Thanks for the feedbacks everyone!
> Overall, everyone who replied is positive about creating a separate repo
> for the Stateful Functions Dockerfiles.
> Since such an action does not necessarily need a vote (as defined by the
> project bylaws), and we'd like the images to be available as soon as
> possible once the release is out,
> I'll proceed to create the repo and begin with the work on preparing the
> submission to the Docker Hub Official Images team.
>
> Cheers,
> Gordon
>
> On Fri, Mar 27, 2020 at 5:28 PM Zhijiang 
> wrote:
>
>> +1 for this proposal. Very reasonable analysis!
>>
>> Best,
>> Zhijiang
>>
>> --
>> From:Hequn Cheng 
>> Send Time:2020 Mar. 27 (Fri.) 09:46
>> To:dev 
>> Cc:private 
>> Subject:Re: [DISCUSS] Creating a new repo to host Stateful Functions
>> Dockerfiles
>>
>> +1 for a separate repository.
>> The dedicated `flink-docker` repo works fine now. We can do it similarly.
>>
>> Best,
>> Hequn
>>
>> On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann > > wrote:
>>
>> > +1 for a separate repository.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi  wrote:
>> >
>> > > +1.
>> > >
>>
>> > > The repo creation process is a light-weight, automated process on the ASF
>> > > side. When Patrick Lucas contributed docker-flink back to the Flink
>>
>> > > community (as flink-docker), there was virtually no overhead in creating
>> > > the repository. Reusing build scripts should still be possible at the
>> > cost
>> > > of some duplication which is fine imo.
>> > >
>> > > – Ufuk
>> > >
>> > > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen > > wrote:
>> > > >
>> > > > +1 to a separate repository.
>> > > >
>> > > > It seems to be best practice in the docker community.
>>
>> > > > And since it does not add overhead, why not go with the best practice?
>> > > >
>> > > > Best,
>> > > > Stephan
>> > > >
>> > > >
>> > > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai <
>> > tzuli...@apache.org
>> > > >
>> > > wrote:
>> > > >>
>> > > >> Hi Flink devs,
>> > > >>
>> > > >> As part of a Stateful Functions release, we would like to publish
>> > > Stateful
>> > > >> Functions Docker images to Dockerhub as an official image.
>> > > >>
>>
>> > > >> Some background context on Stateful Function images, for those who are
>> > > not
>> > > >> familiar with the project yet:
>> > > >>
>>
>> > > >>- Stateful Function images are built on top of the Flink official
>> > > >>images, with additional StateFun dependencies being added.
>> > > >>You can take a look at the scripts we currently use to build the
>> > > images
>> > > >>locally for development purposes [1].
>> > > >>- They are quite important for user experience, since building a
>> > > Docker
>> > > >>image is the recommended go-to deployment mode for StateFun user
>> > > >>applications [2].
>> > > >>
>> > > >>
>> > > >> A prerequisite for all of this is to first decide where we host the
>> > > >> Stateful Functions Dockerfiles,
>> > > >> before we can proceed with the process of requesting a new official
>> > > image
>> > > >> repository at Dockerhub.
>> > > >>
>> > > >> We’re proposing to create a new dedicated repo for this purpose,
>> > > >> with the name `apache/flink-statefun-docker`.
>> > > >>
>>
>> > > >> While we did initially consider integrating the StateFun Dockerfiles
>> > to
>> > > be
>> > > >> hosted together with the Flink ones in the existing
>> > > `apache/flink-docker`
>> > > >> repo, we had the following concerns:
>> > > >>
>>
>> > > >>- In general, it is a convention that each official Dockerhub image
>> > > is
>> > > >>backed by a dedicated source repo hosting the Dockerfiles.
>>
>> > > >>- The `apache/flink-docker` repo already has quite a few dedicated
>> > > >>tooling and CI smoke tests specific for the Flink images.
>> > > >>- Flink and StateFun have separate versioning schemes and
>> > independent
>>
>> > > >>release cycles. A new Flink release does not necessarily require a
>> > > >>“lock-step” to release new StateFun images as well.
>>
>> > > >>- Considering the above all-together, and the fact that creating a
>> > > new
>> > > >>repo is rather low-effort, having a separate repo would probably
>> > make
>> > > more
>> > > >>sense here.
>> > > >>
>> > > >>
>> > > >> What do you think?
>> > > >>
>> > > >> Cheers,
>> > > >> Gordon
>> > > >>
>> > > >> [1]
>> > > >>
>> > >
>> > >
>> >
>> https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh
>> > > >> [2]
>> > > >>
>> > >
>> > >
>> >
>> https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html
>> > >
>> >
>>
>>
>>


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Jingsong Li
+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:

> +1
>
> Best,
> Kurt
>
>
> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>
> > +1 (non-binding)
> >
> > > On Mon, Mar 30, 2020 at 3:57 PM, Jark Wu wrote:
> >
> > > +1 from my side.
> > >
> > > Thanks Timo for driving this.
> > >
> > > Best,
> > > Jark
> > >
> > > On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for FLIP-95 [1], which is discussed
> and
> > > > reached a consensus in the discussion thread [2].
> > > >
> > > > The vote will be open until April 2nd (72h), unless there is an
> > > > objection or not enough votes.
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > > [2]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 
Best, Jingsong Lee


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Kurt Young
+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:

> +1 (non-binding)
>
> On Mon, Mar 30, 2020 at 3:57 PM, Jark Wu wrote:
>
> > +1 from my side.
> >
> > Thanks Timo for driving this.
> >
> > Best,
> > Jark
> >
> > On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-95 [1], which is discussed and
> > > reached a consensus in the discussion thread [2].
> > >
> > > The vote will be open until April 2nd (72h), unless there is an
> > > objection or not enough votes.
> > >
> > > Thanks,
> > > Timo
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jingsong Li
Thanks Jark for the proposal.

+1 to the general idea.

For "version", what about "kafka.version"? It is obvious to know its
meaning.

And I'd like to start a new topic:
Should we need to explicitly separate source from sink?
With the development of batch and streaming, more and more connectors have
both source and sink.

So should we set a rule for table properties:
- properties for both source and sink: without prefix, like "topic"
- properties for source only: with "source." prefix, like
"source.startup-mode"
- properties for sink only: with "sink." prefix, like "sink.partitioner"

What do you think?

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 3:56 PM Jark Wu  wrote:

> Hi Kurt,
>
> That's good questions.
>
> > the meaning of "version"
> There are two versions in the old design. One is property version
> "connector.property-version" which can be used for backward compatibility.
> The other one is "connector.version" which defines the version of external
> system, e.g. 0.11" for kafka, "6" or "7" for ES.
> In this proposal, the "version" is the previous "connector.version". The
> ""connector.property-version" is not introduced in new design.
>
> > how to keep the old capability which can evolve connector properties
> The "connector.property-version" is an optional key in the old design and
> is never bumped up.
> I'm not sure how "connector.property-version" should work in the initial
> design. Maybe @Timo Walther  has more knowledge on
> this.
> But for the new properties, every options should be expressed as
> `ConfigOption` which provides `withDeprecatedKeys(...)` method to easily
> support evolving keys.
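
(Not part of the quoted mail.) A small sketch of what such an evolvable option could
look like; the key and deprecated key below are made-up examples, not keys from the FLIP:

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class EvolvingOptionSketch {
    // Example only: a new short key that still resolves the old, verbose key.
    public static final ConfigOption<String> STARTUP_MODE =
            ConfigOptions.key("startup-mode")
                    .stringType()
                    .noDefaultValue()
                    .withDeprecatedKeys("connector.startup-mode")
                    .withDescription("Startup mode of the source (example option).");
}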
>
> > a single keys instead of two, e.g. "kafka-0.11", "kafka-universal"?
> There are several benefits to using a separate "version" key that I can see:
> 1) it's more readable to separate them into different keys, because they
> are orthogonal concepts.
> 2) the planner can give all the available versions in the exception message,
> if user uses a wrong version (this is often reported in user ML).
> 3) If we use "kafka-0.11" as connector identifier, we may have to write a
> full documentation for each version, because they are different
> "connector"?
> IMO, for 0.11, 0.11, etc... kafka, they are actually the same connector
> but with different "client jar" version,
> they share all the same supported property keys and should reside
> together.
> 4) IMO, the future vision is version-free. At some point in the future, we
> may don't need users to specify kafka version anymore, and make
> "version=universal" as optional or removed in the future. This is can be
> done easily if they are separate keys.
>
> Best,
> Jark
>
>
> On Mon, 30 Mar 2020 at 14:45, Kurt Young  wrote:
>
> > Hi Jark,
> >
> > Thanks for the proposal, I'm +1 to the general idea. However I have a
> > question about "version",
> > in the old design, the version seems to be aimed for tracking property
> > version, with different
> > version, we could evolve these step by step without breaking backward
> > compatibility. But in this
> > design, version is representing external system's version, like "0.11"
> for
> > kafka, "6" or "7" for ES.
> > I'm not sure if this is necessary, what's the benefit of using two keys
> > instead of one, like "kafka-0.11"
> > or "ES6" directly? And how about the old capability which could let us
> > evolving connector properties?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Mar 30, 2020 at 2:36 PM LakeShen 
> > wrote:
> >
> > > Hi Jark,
> > >
> > > I am really looking forward to this feature. I think this feature
> > > could simplify flink sql code,and at the same time ,
> > > it could make the developer more easlier to config the flink sql WITH
> > > options.
> > >
> > > Now when I am using flink sql to write flink task , sometimes I think
> the
> > > WITH options is too long for user.
> > > For example,I config the kafka source connector parameter,for consumer
> > > group and brokers parameter:
> > >
> > >   'connector.properties.0.key' = 'group.id'
> > > >  , 'connector.properties.0.value' = 'xxx'
> > > >  , 'connector.properties.1.key' = 'bootstrap.servers'
> > > >  , 'connector.properties.1.value' = 'x'
> > > >
> > >
> > > I can understand this config , but for the flink fresh man,maybe it
> > > is confused for him.
> > > In my thought, I am really looking forward to this feature,thank you to
> > > propose this feature.
> > >
> > > Best wishes,
> > > LakeShen
> > >
> > >
> > > Jark Wu  于2020年3月30日周一 下午2:02写道:
> > >
> > > > Hi everyone,
> > > >
> > > > I want to start a discussion about further improve and simplify our
> > > current
> > > > connector porperty keys, aka WITH options. Currently, we have a
> > > > 'connector.' prefix for many properties, but they are verbose, and we
> > > see a
> > > > big inconsistency between the properties when designing FLIP-107.
> > > >
> > > > So we propose to remove all the 'connector.' prefix and rename
> > > > 'connector.type' to 'c

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Kurt Young
> 1) it's more readable to separete them into different keys, because they
are orthogonal concepts.
I'm not sure whether this is clearer, but using one key is definitely one
line less.

> 2) the planner can give all the availble versions in the exception
message, if user uses a wrong version (this is often reported in user ML).
Listing all available connectors also seems quite straightforward, e.g. if the
user provides a wrong "kafka-0.8", we tell them the candidates are
"kafka-0.11" and "kafka-universal".

> 3) we may have to write a full documentation for each version, because
they are different "connector"?
I don't think so. We can still treat it as the same connector but with
different versions.

> 4)  At some point in the future, we may don't need users to specify kafka
version anymore.
I think this could confuse users. For example, let's say the current
"universal" kafka connector is the unified connector, which can support
all versions. Then users will have different experiences depending on the
version:
if you provide "version=0.11", we will use "kafka-0.11" as the connector, but
if users don't set any version, we will use "kafka-universal" instead.
The behavior is inconsistent IMO.

Best,
Kurt


On Mon, Mar 30, 2020 at 3:56 PM Jark Wu  wrote:

> Hi Kurt,
>
> That's good questions.
>
> > the meaning of "version"
> There are two versions in the old design. One is property version
> "connector.property-version" which can be used for backward compatibility.
> The other one is "connector.version" which defines the version of external
> system, e.g. 0.11" for kafka, "6" or "7" for ES.
> In this proposal, the "version" is the previous "connector.version". The
> ""connector.property-version" is not introduced in new design.
>
> > how to keep the old capability which can evolve connector properties
> The "connector.property-version" is an optional key in the old design and
> is never bumped up.
> I'm not sure how "connector.property-version" should work in the initial
> design. Maybe @Timo Walther  has more knowledge on
> this.
> But for the new properties, every options should be expressed as
> `ConfigOption` which provides `withDeprecatedKeys(...)` method to easily
> support evolving keys.
>
> > a single keys instead of two, e.g. "kafka-0.11", "kafka-universal"?
> There are several benefit to use separate "version" key I can see:
> 1) it's more readable to separete them into different keys, because they
> are orthogonal concepts.
> 2) the planner can give all the availble versions in the exception message,
> if user uses a wrong version (this is often reported in user ML).
> 3) If we use "kafka-0.11" as connector identifier, we may have to write a
> full documentation for each version, because they are different
> "connector"?
> IMO, for 0.11, 0.11, etc... kafka, they are actually the same connector
> but with different "client jar" version,
> they share all the same supported property keys and should reside
> together.
> 4) IMO, the future vision is version-free. At some point in the future, we
> may don't need users to specify kafka version anymore, and make
> "version=universal" as optional or removed in the future. This is can be
> done easily if they are separate keys.
>
> Best,
> Jark
>
>
> On Mon, 30 Mar 2020 at 14:45, Kurt Young  wrote:
>
> > Hi Jark,
> >
> > Thanks for the proposal, I'm +1 to the general idea. However I have a
> > question about "version",
> > in the old design, the version seems to be aimed for tracking property
> > version, with different
> > version, we could evolve these step by step without breaking backward
> > compatibility. But in this
> > design, version is representing external system's version, like "0.11"
> for
> > kafka, "6" or "7" for ES.
> > I'm not sure if this is necessary, what's the benefit of using two keys
> > instead of one, like "kafka-0.11"
> > or "ES6" directly? And how about the old capability which could let us
> > evolving connector properties?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Mar 30, 2020 at 2:36 PM LakeShen 
> > wrote:
> >
> > > Hi Jark,
> > >
> > > I am really looking forward to this feature. I think this feature
> > > could simplify flink sql code,and at the same time ,
> > > it could make the developer more easlier to config the flink sql WITH
> > > options.
> > >
> > > Now when I am using flink sql to write flink task , sometimes I think
> the
> > > WITH options is too long for user.
> > > For example,I config the kafka source connector parameter,for consumer
> > > group and brokers parameter:
> > >
> > >   'connector.properties.0.key' = 'group.id'
> > > >  , 'connector.properties.0.value' = 'xxx'
> > > >  , 'connector.properties.1.key' = 'bootstrap.servers'
> > > >  , 'connector.properties.1.value' = 'x'
> > > >
> > >
> > > I can understand this config , but for the flink fresh man,maybe it
> > > is confused for him.
> > > In my thought, I am really looking forward to this feature,thank you to

Re: [DISCUSS] FLIP-114: Support Python UDF in SQL Client

2020-03-30 Thread Hequn Cheng
Hi Wei,

Thanks a lot for the proposal! +1 for the VOTE.

Best,
Hequn



On Mon, Mar 30, 2020 at 3:31 PM Dian Fu  wrote:

> Thanks Wei for this work! +1 to bring up the VOTE thread.
>
> > 在 2020年3月30日,下午2:43,jincheng sun  写道:
> >
> > Hi Wei,
> >
> > +1, Thanks for this discussion which is crucial for SQL users to use
> > PyFlink. Would be great to bring up the VOTE thread.
> >
> > Best,
> > Jincheng
> >
> >
> > Wei Zhong  于2020年3月30日周一 下午2:38写道:
> >
> >> Hi everyone,
> >>
> >> Are there more comments about this FLIP? If not, I would like to bring
> up
> >> the VOTE.
> >>
> >> Best,
> >> Wei
> >>
> >>> 在 2020年3月9日,23:18,Xingbo Huang  写道:
> >>>
> >>> Hi Godfrey,
> >>> thanks for your suggestion.
> >>> I have added two examples how to use python UDF
> >>> in SQL and how to start sql-client.sh with full python dependencies In
> >> FLIP.
> >>>
> >>> Best,
> >>> Xingo
> >>>
> >>> godfrey he  于2020年3月9日周一 下午10:24写道:
> >>>
>  Hi Wei, thanks for the proposal.
> 
>  I think it's better to give two more examples, one is how to use
> python
> >> UDF
>  in SQL, another is how to start sql-client.sh with full python
>  dependencies.
> 
>  Best,
>  Godfrey
> 
>  Wei Zhong  于2020年3月9日周一 下午10:09写道:
> 
> > Hi everyone,
> >
> > I would like to start discussion about how to support Python UDF in
> SQL
> > Client.
> >
> > Flink Python UDF(FLIP-58[1]) has already been introduced in the
> release
>  of
> > 1.10.0 and the support for SQL DDL is introduced in FLIP-106[2].
> >
> > SQL Client defines UDF via the environment file and has its own CLI
> > implementation to manage dependencies, but neither of which supports
>  Python
> > UDF. We want to introduce the support of Python UDF for SQL Client,
> > including the registration and the dependency management of Python
> UDF.
> >
> > Here is the design doc:
> >
> >
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client
> >
> > Looking forward to your feedback!
> >
> > Best,
> > Wei
> >
> > [1]
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> > [2]
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
> >
> >
> 
> >>
> >>
>
>


[jira] [Created] (FLINK-16862) Use proper example url in quickstarts

2020-03-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16862:


 Summary: Use proper example url in quickstarts
 Key: FLINK-16862
 URL: https://issues.apache.org/jira/browse/FLINK-16862
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 1.8.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.3, 1.10.1, 1.11.0


Our quickstarts define an example url {{http:/myorganization.org}}, which, as it
turns out, is a real website.

We should just remove the tag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16861) Adding delegation token to the AM container Failed

2020-03-30 Thread Xianxun Ye (Jira)
Xianxun Ye created FLINK-16861:
--

 Summary: Adding delegation token to the AM container Failed
 Key: FLINK-16861
 URL: https://issues.apache.org/jira/browse/FLINK-16861
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.0, 1.9.1
Reporter: Xianxun Ye


This exception happened when I used the hive connector with Kerberos auth.

Although I can fix this by modifying the yarn.resourcemanager.principal value in
'yarn-site.xml' to the special principal, I have to change this value by editing
the yarn-site.xml file every time I submit a hive connector job. This is
not very convenient for development.

 
{code:java}
// code placeholder
2020-03-06 22:58:01,778 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Adding 
delegation token to the AM container..
2020-03-06 22:58:01,781 ERROR org.apache.flink.client.cli.CliFrontend   
- Error while running the command.
org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy 
Yarn job cluster.
at 
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
at 
org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: java.io.IOException: Can't get Master Kerberos principal for use as 
renewer
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:269)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:929)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
at 
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:75)
... 9 more

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Benchao Li
+1 (non-binding)

Jark Wu  于2020年3月30日周一 下午3:57写道:

> +1 from my side.
>
> Thanks Timo for driving this.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
>
> > Hi all,
> >
> > I would like to start the vote for FLIP-95 [1], which is discussed and
> > reached a consensus in the discussion thread [2].
> >
> > The vote will be open until April 2nd (72h), unless there is an
> > objection or not enough votes.
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > [2]
> >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles

2020-03-30 Thread Tzu-Li (Gordon) Tai
Thanks for the feedback, everyone!
Overall, everyone who replied is positive about creating a separate repo
for the Stateful Functions Dockerfiles.
Since such an action does not necessarily need a vote (as defined by the
project bylaws), and we'd like the images to be available as soon as
possible once the release is out, I'll proceed to create the repo and
begin preparing the submission to the Docker Hub Official Images team.

Cheers,
Gordon

On Fri, Mar 27, 2020 at 5:28 PM Zhijiang  wrote:

> +1 for this proposal. Very reasonable analysis!
>
> Best,
> Zhijiang
>
> --
> From:Hequn Cheng 
> Send Time:2020 Mar. 27 (Fri.) 09:46
> To:dev 
> Cc:private 
> Subject:Re: [DISCUSS] Creating a new repo to host Stateful Functions
> Dockerfiles
>
> +1 for a separate repository.
> The dedicated `flink-docker` repo works fine now. We can do it similarly.
>
> Best,
> Hequn
>
> On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann  > wrote:
>
> > +1 for a separate repository.
> >
> > Cheers,
> > Till
> >
> > On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi  wrote:
> >
> > > +1.
> > >
>
> > > The repo creation process is a light-weight, automated process on the ASF
> > > side. When Patrick Lucas contributed docker-flink back to the Flink
>
> > > community (as flink-docker), there was virtually no overhead in creating
> > > the repository. Reusing build scripts should still be possible at the
> > cost
> > > of some duplication which is fine imo.
> > >
> > > – Ufuk
> > >
> > > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen  wrote:
> > > >
> > > > +1 to a separate repository.
> > > >
> > > > It seems to be best practice in the docker community.
>
> > > > And since it does not add overhead, why not go with the best practice?
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > wrote:
> > > >>
> > > >> Hi Flink devs,
> > > >>
> > > >> As part of a Stateful Functions release, we would like to publish
> > > Stateful
> > > >> Functions Docker images to Dockerhub as an official image.
> > > >>
>
> > > >> Some background context on Stateful Function images, for those who are
> > > not
> > > >> familiar with the project yet:
> > > >>
> > > >>- Stateful Function images are built on top of the Flink official
> > > >>images, with additional StateFun dependencies being added.
> > > >>You can take a look at the scripts we currently use to build the
> > > images
> > > >>locally for development purposes [1].
> > > >>- They are quite important for user experience, since building a
> > > Docker
> > > >>image is the recommended go-to deployment mode for StateFun user
> > > >>applications [2].
> > > >>
> > > >>
> > > >> A prerequisite for all of this is to first decide where we host the
> > > >> Stateful Functions Dockerfiles,
> > > >> before we can proceed with the process of requesting a new official
> > > image
> > > >> repository at Dockerhub.
> > > >>
> > > >> We’re proposing to create a new dedicated repo for this purpose,
> > > >> with the name `apache/flink-statefun-docker`.
> > > >>
> > > >> While we did initially consider integrating the StateFun Dockerfiles
> > to
> > > be
> > > >> hosted together with the Flink ones in the existing
> > > `apache/flink-docker`
> > > >> repo, we had the following concerns:
> > > >>
>
> > > >>- In general, it is a convention that each official Dockerhub image
> > > is
> > > >>backed by a dedicated source repo hosting the Dockerfiles.
>
> > > >>- The `apache/flink-docker` repo already has quite a few dedicated
> > > >>tooling and CI smoke tests specific for the Flink images.
> > > >>- Flink and StateFun have separate versioning schemes and
> > independent
>
> > > >>release cycles. A new Flink release does not necessarily require a
> > > >>“lock-step” to release new StateFun images as well.
>
> > > >>- Considering the above all-together, and the fact that creating a
> > > new
> > > >>repo is rather low-effort, having a separate repo would probably
> > make
> > > more
> > > >>sense here.
> > > >>
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Cheers,
> > > >> Gordon
> > > >>
> > > >> [1]
> > > >>
> > >
> > >
> >
> https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh
> > > >> [2]
> > > >>
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html
> > >
> >
>
>
>


Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Jark Wu
+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:

> Hi all,
>
> I would like to start the vote for FLIP-95 [1], which is discussed and
> reached a consensus in the discussion thread [2].
>
> The vote will be open until April 2nd (72h), unless there is an
> objection or not enough votes.
>
> Thanks,
> Timo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>


Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Jark Wu
Hi Kurt,

That's good questions.

> the meaning of "version"
There are two versions in the old design. One is the property version
"connector.property-version", which can be used for backward compatibility.
The other one is "connector.version", which defines the version of the external
system, e.g. "0.11" for kafka, "6" or "7" for ES.
In this proposal, the "version" is the previous "connector.version". The
"connector.property-version" is not introduced in the new design.

> how to keep the old capability which can evolve connector properties
The "connector.property-version" is an optional key in the old design and
is never bumped up.
I'm not sure how "connector.property-version" should work in the initial
design. Maybe @Timo Walther  has more knowledge on
this.
But for the new properties, every option should be expressed as a
`ConfigOption`, which provides the `withDeprecatedKeys(...)` method to easily
support evolving keys.
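
To make this concrete, here is a minimal sketch (the key names are only
illustrative, not the final proposal) of how a new option could keep its old
verbose key as a deprecated alias:

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class KafkaConnectorOptions {
    // Hypothetical option: the new short key, with the old verbose key kept
    // as a deprecated alias so that existing DDLs continue to resolve.
    public static final ConfigOption<String> STARTUP_MODE =
            ConfigOptions.key("startup-mode")
                    .stringType()
                    .defaultValue("group-offsets")
                    .withDeprecatedKeys("connector.startup-mode");
}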

> a single keys instead of two, e.g. "kafka-0.11", "kafka-universal"?
There are several benefits I can see to using a separate "version" key:
1) it's more readable to separate them into different keys, because they
are orthogonal concepts.
2) the planner can list all the available versions in the exception message
if the user uses a wrong version (this is often reported on the user ML).
3) If we use "kafka-0.11" as the connector identifier, we may have to write
full documentation for each version, because they would be different "connectors".
IMO, for 0.10, 0.11, etc., the kafka connectors are actually the same connector
but with different "client jar" versions;
they share all the same supported property keys and should reside
together.
4) IMO, the future vision is version-free. At some point in the future, we
may not need users to specify the kafka version anymore, and can make
"version=universal" optional or remove it entirely. This can be
done easily if they are separate keys.

Best,
Jark


On Mon, 30 Mar 2020 at 14:45, Kurt Young  wrote:

> Hi Jark,
>
> Thanks for the proposal, I'm +1 to the general idea. However I have a
> question about "version",
> in the old design, the version seems to be aimed for tracking property
> version, with different
> version, we could evolve these step by step without breaking backward
> compatibility. But in this
> design, version is representing external system's version, like "0.11" for
> kafka, "6" or "7" for ES.
> I'm not sure if this is necessary, what's the benefit of using two keys
> instead of one, like "kafka-0.11"
> or "ES6" directly? And how about the old capability which could let us
> evolving connector properties?
>
> Best,
> Kurt
>
>
> On Mon, Mar 30, 2020 at 2:36 PM LakeShen 
> wrote:
>
> > Hi Jark,
> >
> > I am really looking forward to this feature. I think this feature
> > could simplify flink sql code,and at the same time ,
> > it could make the developer more easlier to config the flink sql WITH
> > options.
> >
> > Now when I am using flink sql to write flink task , sometimes I think the
> > WITH options is too long for user.
> > For example,I config the kafka source connector parameter,for consumer
> > group and brokers parameter:
> >
> >   'connector.properties.0.key' = 'group.id'
> > >  , 'connector.properties.0.value' = 'xxx'
> > >  , 'connector.properties.1.key' = 'bootstrap.servers'
> > >  , 'connector.properties.1.value' = 'x'
> > >
> >
> > I can understand this config , but for the flink fresh man,maybe it
> > is confused for him.
> > In my thought, I am really looking forward to this feature,thank you to
> > propose this feature.
> >
> > Best wishes,
> > LakeShen
> >
> >
> > Jark Wu  于2020年3月30日周一 下午2:02写道:
> >
> > > Hi everyone,
> > >
> > > I want to start a discussion about further improve and simplify our
> > current
> > > connector porperty keys, aka WITH options. Currently, we have a
> > > 'connector.' prefix for many properties, but they are verbose, and we
> > see a
> > > big inconsistency between the properties when designing FLIP-107.
> > >
> > > So we propose to remove all the 'connector.' prefix and rename
> > > 'connector.type' to 'connector', 'format.type' to 'format'. So a new
> > Kafka
> > > DDL may look like this:
> > >
> > > CREATE TABLE kafka_table (
> > >  ...
> > > ) WITH (
> > >  'connector' = 'kafka',
> > >  'version' = '0.10',
> > >  'topic' = 'test-topic',
> > >  'startup-mode' = 'earliest-offset',
> > >  'properties.bootstrap.servers' = 'localhost:9092',
> > >  'properties.group.id' = 'testGroup',
> > >  'format' = 'json',
> > >  'format.fail-on-missing-field' = 'false'
> > > );
> > >
> > > The new connector property key set will come together with new Factory
> > > inferface which is proposed in FLIP-95. Old properties are still
> > compatible
> > > with their existing implementation. New properties are only available
> in
> > > new DynamicTableFactory implementations.
> > >
> > > You can access the detailed FLIP here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%

[jira] [Created] (FLINK-16860) TableException: Failed to push filter into table source! when upgrading flink to 1.9.2

2020-03-30 Thread Nikola (Jira)
Nikola created FLINK-16860:
--

 Summary: TableException: Failed to push filter into table source! 
when upgrading flink to 1.9.2
 Key: FLINK-16860
 URL: https://issues.apache.org/jira/browse/FLINK-16860
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ORC
Affects Versions: 1.10.0, 1.9.2
 Environment: flink 1.8.2

flink 1.9.2
Reporter: Nikola
 Attachments: flink-1.8.2.txt, flink-1.9.2.txt

We have a batch job which currently runs on a flink cluster running 1.8.2.
The job runs fine. We wanted to upgrade to flink 1.10, but that yielded errors,
so we started downgrading until we found that the issue was introduced in flink 1.9.2.

The job on 1.9.2 fails with:
{code:java}
Caused by: org.apache.flink.table.api.TableException: Failed to push filter 
into table source! table source with pushdown capability must override and 
change explainSource() API to explain the pushdown applied!{code}
 

This does not happen on flink 1.8.2.

I tried to narrow it down and it seems that this exception was added in
FLINK-12399; there was a small discussion regarding the exception:
[https://github.com/apache/flink/pull/8468#discussion_r329876088]

Our code looks something like this:

{code:java}
String tempTableName = "tempTable";
String sql = SqlBuilder.buildSql(tempTableName);
BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
OrcTableSource orcTableSource = OrcTableSource.builder()
 .path(hdfsFolder, true)
 .forOrcSchema(ORC.getSchema())
 .withConfiguration(config)
 .build();
tableEnv.registerTableSource(tempTableName, orcTableSource);
Table tempTable = tableEnv.sqlQuery(sql);
return tableEnv.toDataSet(tempTable, Row.class); 
{code}

Where the built SQL is nothing more than
{code:java}
SELECT * FROM table WHERE id IN (1,2,3) AND mid IN(4,5,6){code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-30 Thread Gary Yao
>
> The links work for me now. Someone might have fixed them. Never mind.
>

Actually, I fixed the links after seeing your email. Thanks for reporting.

Best,
Gary

On Mon, Mar 30, 2020 at 3:48 AM Xintong Song  wrote:

> @ZhuZhu
>
> The links work for me now. Someone might have fixed them. Never mind.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Mar 30, 2020 at 1:31 AM Zhu Zhu  wrote:
>
> > Thanks for the comments!
> >
> > To Xintong,
> > It's a bit strange since the in page links work as expected. Would you
> take
> > another try?
> >
> > To Till,
> > - Regarding the idea to improve to SlotProvider interface
> > I think it is a good idea and thanks a lot! In the current design we make
> > slot requests for batch jobs to wait for resources without timeout as
> long
> > as the JM see enough slots overall. This implicitly add assumption that
> > tasks can finish and slots are be returned. This, however, would not work
> > in the mixed bounded/unbounded workloads as you mentioned.
> > Your idea looks more clear that it always allow slot allocations to wait
> > and not time out as long as it see enough slots. And the 'enough' check
> is
> > with regard to slots that can be returned (for bounded tasks) and slots
> > that will be occupied forever (for unbounded tasks), so that streaming
> jobs
> > can naturally throw slot allocation timeout errors if the cluster does
> not
> > have enough resources for all the tasks to run at the same time.
> > I will take a deeper thought to see how we can implement it this way.
> >
> > - Regarding the idea to solve "Resource deadlocks when slot allocation
> > competition happens between multiple jobs in a session cluster"
> > Agreed it's also possible to let the RM to revoke the slots to unblock
> the
> > oldest bulk of requests first. That would require some extra work to
> change
> > the RM to holds the requests before it is sure the slots are successfully
> > assigned to the JM (currently the RM removes pending requests right after
> > the requests are sent to TM without confirming wether the slot offers
> > succeed). We can look deeper into it later when we are about to support
> > variant sizes slots.
> >
> > Thanks,
> > Zhu Zhu
> >
> >
> > Till Rohrmann  于2020年3月27日周五 下午10:59写道:
> >
> > > Thanks for creating this FLIP Zhu Zhu and Gary!
> > >
> > > +1 for adding pipelined region scheduling.
> > >
> > > Concerning the extended SlotProvider interface I have an idea how we
> > could
> > > further improve it. If I am not mistaken, then you have proposed to
> > > introduce the two timeouts in order to distinguish between batch and
> > > streaming jobs and to encode that batch job requests can wait if there
> > are
> > > enough resources in the SlotPool (not necessarily being available right
> > > now). I think what we actually need to tell the SlotProvider is
> whether a
> > > request will use the slot only for a limited time or not. This is
> exactly
> > > the difference between processing bounded and unbounded streams. If the
> > > SlotProvider knows this difference, then it can tell which slots will
> > > eventually be reusable and which not. Based on this it can tell
> whether a
> > > slot request can be fulfilled eventually or whether we fail after the
> > > specified timeout. Another benefit of this approach would be that we
> can
> > > easily support mixed bounded/unbounded workloads. What we would need to
> > > know for this approach is whether a pipelined region is processing a
> > > bounded or unbounded stream.
> > >
> > > To give an example let's assume we request the following sets of slots
> > > where each pipelined region requires the same slots:
> > >
> > > slotProvider.allocateSlots(pr1_bounded, timeout);
> > > slotProvider.allocateSlots(pr2_unbounded, timeout);
> > > slotProvider.allocateSlots(pr3_bounded, timeout);
> > >
> > > Let's assume we receive slots for pr1_bounded in < timeout and can then
> > > fulfill the request. Then we request pr2_unbounded. Since we know that
> > > pr1_bounded will complete eventually, we don't fail this request after
> > > timeout. Next we request pr3_bounded after pr2_unbounded has been
> > > completed. In this case, we see that we need to request new resources
> > > because pr2_unbounded won't release its slots. Hence, if we cannot
> > allocate
> > > new resources within timeout, we fail this request.
> > >
> > > A small comment concerning "Resource deadlocks when slot allocation
> > > competition happens between multiple jobs in a session cluster":
> Another
> > > idea to solve this situation would be to give the ResourceManager the
> > right
> > > to revoke slot assignments in order to change the mapping between
> > requests
> > > and available slots.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Mar 27, 2020 at 12:44 PM Xintong Song 
> > > wrote:
> > >
> > > > Gary & Zhu Zhu,
> > > >
> > > > Thanks for preparing this FLIP, and a BIG +1 from my side. The
> > trade-off
> > > > between resource utilization and potenti

[VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-03-30 Thread Timo Walther

Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an 
objection or not enough votes.


Thanks,
Timo

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2] 
https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E


Re: [DISCUSS] FLIP-114: Support Python UDF in SQL Client

2020-03-30 Thread Dian Fu
Thanks Wei for this work! +1 to bring up the VOTE thread.

> 在 2020年3月30日,下午2:43,jincheng sun  写道:
> 
> Hi Wei,
> 
> +1, Thanks for this discussion which is crucial for SQL users to use
> PyFlink. Would be great to bring up the VOTE thread.
> 
> Best,
> Jincheng
> 
> 
> Wei Zhong  于2020年3月30日周一 下午2:38写道:
> 
>> Hi everyone,
>> 
>> Are there more comments about this FLIP? If not, I would like to bring up
>> the VOTE.
>> 
>> Best,
>> Wei
>> 
>>> 在 2020年3月9日,23:18,Xingbo Huang  写道:
>>> 
>>> Hi Godfrey,
>>> thanks for your suggestion.
>>> I have added two examples how to use python UDF
>>> in SQL and how to start sql-client.sh with full python dependencies In
>> FLIP.
>>> 
>>> Best,
>>> Xingo
>>> 
>>> godfrey he  于2020年3月9日周一 下午10:24写道:
>>> 
 Hi Wei, thanks for the proposal.
 
 I think it's better to give two more examples, one is how to use python
>> UDF
 in SQL, another is how to start sql-client.sh with full python
 dependencies.
 
 Best,
 Godfrey
 
 Wei Zhong  于2020年3月9日周一 下午10:09写道:
 
> Hi everyone,
> 
> I would like to start discussion about how to support Python UDF in SQL
> Client.
> 
> Flink Python UDF(FLIP-58[1]) has already been introduced in the release
 of
> 1.10.0 and the support for SQL DDL is introduced in FLIP-106[2].
> 
> SQL Client defines UDF via the environment file and has its own CLI
> implementation to manage dependencies, but neither of which supports
 Python
> UDF. We want to introduce the support of Python UDF for SQL Client,
> including the registration and the dependency management of Python UDF.
> 
> Here is the design doc:
> 
> 
> 
 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client
> 
> Looking forward to your feedback!
> 
> Best,
> Wei
> 
> [1]
> 
 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> [2]
> 
 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
> 
> 
 
>> 
>>