[jira] [Created] (FLINK-16987) Add new table source and sink interfaces

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16987:


 Summary: Add new table source and sink interfaces
 Key: FLINK-16987
 URL: https://issues.apache.org/jira/browse/FLINK-16987
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther


Proper support for handling changelogs, more efficient processing of data 
through the new Blink planner, and unified interfaces that are DataStream API 
agnostic make it necessary to rework the table source and sink interfaces.

The goals of this FLIP are:
 * *Simplify the current interface architecture*:
 ** Merge upsert, retract, and append sinks.
 ** Unify batch and streaming sources.
 ** Unify batch and streaming sinks.


 * *Allow sources to produce a changelog*:
 ** UpsertTableSources have been requested a lot by users. Now is the time to 
open up the planner's internal capabilities via the new interfaces.
 ** According to FLIP-105, we would like to support processing changelog 
formats such as [Debezium|https://debezium.io/].


 * *Don't rely on the DataStream API for sources and sinks*:
 ** According to FLIP-32, the Table API and SQL should be independent of the 
DataStream API, which is why the `table-common` module has no dependencies on 
`flink-streaming-java`.
 ** After FLIP-27, source and sink implementations should only depend on the 
`table-common` module.
 ** Until FLIP-27 is ready, we still put most of the interfaces in 
`table-common` and strictly separate the interfaces that communicate with the 
planner from the actual runtime readers/writers.


 * *Implement efficient sources and sinks without planner dependencies*:
 ** Make Blink's internal data structures available to connectors.
 ** Introduce stable interfaces for data structures that can be marked as 
`@PublicEvolving`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16988) Add core table source/sink interfaces

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16988:


 Summary: Add core table source/sink interfaces
 Key: FLINK-16988
 URL: https://issues.apache.org/jira/browse/FLINK-16988
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


This will add the most important interfaces of the new source/sink design:
 * DynamicTableSource
 * ScanTableSource extends DynamicTableSource
 * LookupTableSource extends DynamicTableSource
 * DynamicTableSink

And some initial ability interfaces:
 * SupportsComputedColumnPushDown
 * SupportsFilterPushDown

All interfaces will have extended JavaDocs.
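
For orientation, a minimal sketch of how this hierarchy could fit together 
(all method names and the {{ChangelogMode}} placeholder are illustrative 
assumptions, not the final API):

{code:java}
// Hypothetical sketch of the interface hierarchy listed above.
public class Flip95Sketch {

    /** Placeholder describing which change kinds (insert/update/delete) flow through. */
    enum ChangelogMode { INSERT_ONLY, UPSERT, ALL }

    /** Common base interface for sources created from a catalog table. */
    interface DynamicTableSource {
        DynamicTableSource copy();
        String asSummaryString();
    }

    /** A source that scans all rows of an external table. */
    interface ScanTableSource extends DynamicTableSource {
        ChangelogMode getChangelogMode();
    }

    /** A source that is queried by key at runtime, e.g. for lookup joins. */
    interface LookupTableSource extends DynamicTableSource { }

    /** Sink counterpart that accepts a changelog. */
    interface DynamicTableSink {
        ChangelogMode getChangelogMode(ChangelogMode requestedMode);
    }
}
{code}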



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Tzu-Li (Gordon) Tai
FYI -
There are these open PRs to add blog posts and update the Flink website for
the Stateful Functions 2.0 release:
* https://github.com/apache/flink-web/pull/322
* https://github.com/apache/flink-web/pull/321

On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf 
wrote:

> +1 (non-binding)
>
> ** Functional **
> - Building from source dist with end-to-end tests enabled (mvn clean verify
> -Prun-e2e-tests) passes (JDK 8)
> - Flink Harness works in IDE
> - Building Python SDK dist from source
>
> On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > +1 (binding)
> >
> > ** Legal **
> > - checksums and GPG files match corresponding release files
> > - Source distribution does not contain binaries, contents are sane (no
> > .git* / .travis* / generated html content files)
> > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > font-awesome, jquery dependency in docs and copied sources from fastutil
> (
> > http://fastutil.di.unimi.it/)
> > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > Artifacts that do bundle dependencies are: statefun-flink-distribution,
> > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > sources). All non-ASLv2 deps have license files explicitly bundled.
> > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
> > NOTICE files (no bundled dependencies)
> > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
> point
> > to same version “2.0.0”
> > - README looks good
> >
> > ** Functional **
> > - Building from source dist with end-to-end tests enabled (mvn clean
> verify
> > -Prun-e2e-tests) passes (JDK 8)
> > - Generated quickstart from archetype looks good (correct POM /
> Dockerfile
> > / service file)
> > - Examples run: Java Greeter / Java Ridesharing / Python Greeter / Python
> > SDK Walkthrough
> > - Flink Harness works in IDE
> > - Test remote functions deployment mode with AWS ecosystem: remote Python
> > functions running in AWS Lambda behind AWS API Gateway, Java embedded
> > functions running in AWS ECS. Checkpointing enabled, randomly restarted
> > StateFun workers.
> >
> > On Fri, Apr 3, 2020 at 11:48 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the *release candidate #6* for the
> > > version 2.0.0 of Apache Flink Stateful Functions,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Testing Guideline**
> > >
> > > You can find here [1] a doc that we can use for collaborating testing
> > > efforts.
> > > The listed testing tasks in the doc also serve as a guideline in what
> to
> > > test for this release.
> > > If you wish to take ownership of a testing task, simply put your name
> > down
> > > in the "Checked by" field of the task.
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Stateful Functions canonical source distribution, to be deployed to
> > the
> > > release repository at dist.apache.org
> > > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a) and b) can be found in the corresponding dev
> > > repository at dist.apache.org [2]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> > >
> > > All artifacts are signed with the
> > > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> > >
> > > Other links for your review:
> > > * JIRA release notes [5]
> > > * source code tag "release-2.0.0-rc6" [6] [7]
> > > * PR to update the website Downloads page to include Stateful Functions
> > > links [8]
> > >
> > > **Extra Remarks**
> > >
> > > * Part of the release is also official Docker images for Stateful
> > > Functions. This can be a separate process, since the creation of those
> > > relies on the fact that we have distribution jars already deployed to
> > > Maven. I will follow-up with this after these artifacts are officially
> > > released.
> > > * The Flink Website and blog post is also being worked on (by Marta) as
> > > part of the release, to incorporate the new Stateful Functions project.
> > We
> > > can follow up with a link to those changes afterwards in this vote
> > thread,
> > > but that would not block you to test and cast your votes already.
> > > * Since the Flink website changes are still being worked on, you will
> not
> > > yet be able to find the Stateful Functions docs from there. Here are
> the
> > > links [9] [10].
> > >
> > > **Vote Duration**
> > >
> > > I propose to have the voting time for this RC to be 96 hours (including
> > > weekend) / 48 hours (excluding weekend).
> > >
> > > The voting time will therefore run until at least next *

[jira] [Created] (FLINK-16989) Support ScanTableSource in planner

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16989:


 Summary: Support ScanTableSource in planner
 Key: FLINK-16989
 URL: https://issues.apache.org/jira/browse/FLINK-16989
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Support the {{ScanTableSource}} interface in planner. Utility methods for 
creating type information and the data structure converters might not be 
implemented yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16990) Support LookupTableSource in planner

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16990:


 Summary: Support LookupTableSource in planner
 Key: FLINK-16990
 URL: https://issues.apache.org/jira/browse/FLINK-16990
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Support the {{LookupTableSource}} interface in planner. Utility methods for the 
data structure converters might not be implemented yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16991) Support DynamicTableSink in planner

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16991:


 Summary: Support DynamicTableSink in planner
 Key: FLINK-16991
 URL: https://issues.apache.org/jira/browse/FLINK-16991
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Support the {{DynamicTableSink}} interface in planner. Utility methods for the 
data structure converters might not be implemented yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16992) Add ability interfaces for table source/sink

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16992:


 Summary: Add ability interfaces for table source/sink
 Key: FLINK-16992
 URL: https://issues.apache.org/jira/browse/FLINK-16992
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


This will add the ability interfaces mentioned in FLIP-95:
 * SupportsWatermarkPushDown
 * SupportsProjectionPushDown
 * + already existing interfaces
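
For illustration, ability interfaces are mix-ins that a source implements to 
signal an optional optimization to the planner. A minimal sketch (the method 
name and parameter shape are assumptions, not the final API):

{code:java}
// Hypothetical sketch of an "ability" mix-in as described above.
public class AbilitySketch {

    /** Mix-in: a source implementing this lets the planner push a projection down. */
    interface SupportsProjectionPushDown {
        // The planner passes the indexes of the fields that the query actually
        // reads; the source can then fetch only those fields from the external
        // system instead of full rows.
        void applyProjection(int[][] projectedFields);
    }
}
{code}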



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16993) Support SupportsComputedColumnPushDown in planner

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16993:


 Summary: Support SupportsComputedColumnPushDown in planner
 Key: FLINK-16993
 URL: https://issues.apache.org/jira/browse/FLINK-16993
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Support the {{SupportsComputedColumnPushDown}} interface for 
{{ScanTableSource}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16994) Support SupportsWatermarkPushDown in planner

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16994:


 Summary: Support SupportsWatermarkPushDown in planner
 Key: FLINK-16994
 URL: https://issues.apache.org/jira/browse/FLINK-16994
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Support the {{SupportsWatermarkPushDown}} interface for {{ScanTableSource}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16995) Add new data structure interfaces in table-common

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16995:


 Summary: Add new data structure interfaces in table-common
 Key: FLINK-16995
 URL: https://issues.apache.org/jira/browse/FLINK-16995
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther


This adds the new data structure interfaces to {{table-common}}.

The planner and connector refactoring happens in a separate issue.
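
For orientation, such data structures replace external classes like 
{{java.lang.String}} with internal, binary-friendly representations. A hedged 
sketch of what an interface pair might look like (names and methods are 
assumptions based on the FLIP, not the final API):

{code:java}
// Hypothetical sketch of internal data structure interfaces as discussed above.
public class DataStructureSketch {

    /** Internal representation of SQL STRING, decoupled from java.lang.String. */
    interface StringData {
        byte[] toBytes();      // binary form that avoids eager decoding
        String toJavaString(); // conversion to the external class on demand
    }

    /** Internal representation of a row; accessed by position to avoid boxing. */
    interface RowData {
        boolean isNullAt(int pos);
        int getInt(int pos);
        StringData getString(int pos);
    }
}
{code}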



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16996) Refactor planner and connectors to use new data structures

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16996:


 Summary: Refactor planner and connectors to use new data structures
 Key: FLINK-16996
 URL: https://issues.apache.org/jira/browse/FLINK-16996
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Ecosystem, Table SQL / Planner
Reporter: Timo Walther


Refactors existing code to use the new data structures interfaces.

This issue might be split into smaller subtasks if necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-06 Thread Aljoscha Krettek
The reason I'm saying it should be disabled by default is that this uses 
hint syntax, and hints should really not change query semantics.


I'm quite strongly against hints that change query semantics, but if we 
disable this by default I would be (reluctantly) OK with the feature. 
Companies that create deployments or set up the SQL environment for 
users can enable the feature if they want.


But yes, I also agree that we don't need whitelisting/blacklisting, 
which makes this a lot easier to do.


Best,
Aljoscha

On 06.04.20 04:27, Danny Chan wrote:

Hi, everyone ~

@Aljoscha @Timo


I think we're designing ourselves into ever more complicated corners here

I kindly agree; personally I didn't see strong reasons why we should limit 
the properties per connector:

• We can define any table options for CREATE TABLE, so why should we treat the 
dynamic options differently? We never considered any security problems when 
creating tables, and we should not for dynamic table options either.
• If we do not have whitelisted or blacklisted properties, the table source 
creation work becomes much easier: we just use the merged options. There is no 
need to modify each connector to decide which options could be overridden and 
how to merge them (the merge work is redundant).
• @Timo, how about we support another interface, 
`TableSourceFactory#Context.getExecutionOptions`, and always use it to get the 
options to create our table source? There is no need to copy the catalog table 
itself; we just need to generate our Context correctly.
• @Aljoscha, I agree with having a global config option, but I disagree with 
disabling it by default; a disabled default would hurt the user experience too 
much, especially when users want to modify the options in an ad-hoc way.



I suggest removing `TableSourceFactory#supportedHintOptions` or 
`TableSourceFactory#forbiddenHintOptions`, based on the fact that we do not 
have a black/white list for CREATE TABLE at all, at least in the current codebase.


@Timo (I have replied offline already, but let me repeat it here)

Compared to `TableSourceFactory#forbiddenHintOptions`, the 
`TableSourceFactory#supportedHintOptions` doesn't work well for 3 reasons 
(a toy sketch of the wildcard point in reason 1 follows below):

1. For keys with a wildcard, like connector.property.*, a blacklist gives us 
the ability to disable individual keys underneath it, e.g. 
connector.property.key1; a whitelist can only match by prefix.

2. We want connectors to be able to disable switching the format type 
(format.type) while allowing all the other format properties, i.e. format.* 
without format.type (let's call it SET_B). With a whitelist, we would have to 
enumerate all the specific format keys in SET_B, but with the old connector 
factories we have no idea which specific format keys a connector supports 
(there is either a format.* or nothing).

3. Apart from cases 1 and 2, for normal keys (no wildcard) a blacklist and a 
whitelist have the same expressiveness, but a blacklist keeps the code less 
verbose because we do not have to enumerate all the keys that duplicate 
#supportedKeys. (Not a very strong reason, but as a connector developer I 
think it makes sense.)
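
To make the wildcard point concrete, a small self-contained toy sketch 
(hypothetical helper names, not Flink code): a whitelist entry with a wildcard 
admits every key under the prefix, while a blacklist can still veto a single 
key below that prefix.

import java.util.Set;

public class HintOptionMatching {

    // A pattern either ends with ".*" (prefix match) or must match exactly.
    static boolean matches(String pattern, String key) {
        return pattern.endsWith(".*")
                ? key.startsWith(pattern.substring(0, pattern.length() - 1))
                : pattern.equals(key);
    }

    static boolean allowed(Set<String> whitelist, Set<String> blacklist, String key) {
        boolean whitelisted = whitelist.stream().anyMatch(p -> matches(p, key));
        boolean blacklisted = blacklist.stream().anyMatch(p -> matches(p, key));
        return whitelisted && !blacklisted;
    }

    public static void main(String[] args) {
        Set<String> whitelist = Set.of("connector.property.*");
        Set<String> blacklist = Set.of("connector.property.key1");

        // The whitelist alone cannot exclude key1 while keeping key2:
        System.out.println(allowed(whitelist, blacklist, "connector.property.key1")); // false
        System.out.println(allowed(whitelist, blacklist, "connector.property.key2")); // true
    }
}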

Best,
Danny Chan
On Apr 3, 2020, 11:28 PM +0800, Timo Walther wrote:

Hi everyone,

@Aljoscha: I disagree with your approach because a `Catalog` can return
a custom factory that is not using any properties. The hinting must be
transparent to a factory. We should NOT modify the metadata
`CatalogTable` at any point in time after the catalog.

@Danny, @Jingsong: How about we stick to the original design that we
wanted to vote on but use:

Set<String> supportedHintProperties()

This fits better to the old factory design. And for the new FLIP-95
factories we will use `ConfigOption` and provide good utilities for
merging with hints etc.

We can allow `"format.*"` in `supportedHintProperties()` to allow
hinting in formats.

What do you think?

Regards,
Timo


On 02.04.20 16:24, Aljoscha Krettek wrote:

I think we're designing ourselves into ever more complicated corners
here. Maybe we need to take a step back and reconsider. What would you
think about this (somewhat) simpler proposal:

- introduce a hint called CONNECTOR_OPTIONS(k=v, ...) or
CONNECTOR_PROPERTIES, depending on what naming we want to have for this
in the future. This will simply overwrite all connector properties; the
table factories don't know about hints but simply work with the
properties that they are given

- this special hint is disabled by default and can be activated with a
global option "foo.bazzle.connector-hints" (or something like this)
which has a warning that describes that this can change query semantics
etc.

That's it. This makes connector implementations a lot easier while still
allowing inline configuration.

I still don't like using hint syntax at all for this, because I strongly
maintain that hints should not change query semantics. In general using
hints should be kept to a minimum because they usually point to
shortcomings in the system.

Best,
Aljoscha

On 02.04.20

[jira] [Created] (FLINK-16997) Add new factory interfaces and utilities

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16997:


 Summary: Add new factory interfaces and utilities
 Key: FLINK-16997
 URL: https://issues.apache.org/jira/browse/FLINK-16997
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


Adds the factory interfaces and necessary utilities for discovering and 
configuring connectors.

This issue will provide a reference implementation of how factories, connectors, 
and formats play together.
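
For orientation, connector factories are registered and discovered via Java's 
ServiceLoader (SPI). A hedged sketch of the discovery step (the Factory shape 
is a placeholder assumption, not the final API):

{code:java}
import java.util.ServiceLoader;

// Sketch: discover a connector factory by identifier using the JDK ServiceLoader,
// assuming factories are registered via META-INF/services entries.
public class FactoryDiscoverySketch {

    interface Factory {
        String factoryIdentifier(); // e.g. "kafka", "filesystem"
    }

    static Factory discover(String identifier) {
        for (Factory factory : ServiceLoader.load(Factory.class)) {
            if (factory.factoryIdentifier().equals(identifier)) {
                return factory;
            }
        }
        throw new IllegalArgumentException("No factory found for: " + identifier);
    }
}
{code}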



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16998) Add a changeflag to Row type

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16998:


 Summary: Add a changeflag to Row type
 Key: FLINK-16998
 URL: https://issues.apache.org/jira/browse/FLINK-16998
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Timo Walther
Assignee: Timo Walther


In the Blink planner, the change flag of records travelling through the pipeline 
is part of the record itself but not part of the logical schema. This 
simplifies the architecture and API in many cases.

This is why we aim to adopt the same mechanism for {{org.apache.flink.types.Row}}.

Take {{tableEnv.toRetractStream()}} as an example that returns either a Scala or 
Java {{Tuple2<Boolean, Row>}}. For FLIP-95 we need to support more update kinds 
than just a binary boolean.

This means:
- Add a change flag {{RowKind}} to {{Row}}
- Update the {{Row.toString()}} method
- Update serializers in a backwards-compatible way
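
A minimal sketch of what such a change flag could look like (the enum 
constants and accessors are assumptions based on the FLIP discussion, not the 
final API):

{code:java}
// Hypothetical sketch of a change flag on Row as described above.
public class RowKindSketch {

    /** More update kinds than the previous binary retract boolean. */
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    static final class Row {
        private RowKind kind = RowKind.INSERT; // default stays insert-only
        private final Object[] fields;

        Row(Object... fields) { this.fields = fields; }

        RowKind getKind() { return kind; }
        void setKind(RowKind kind) { this.kind = kind; }

        @Override
        public String toString() {
            // toString() must now expose the flag as well
            return kind.name() + java.util.Arrays.toString(fields);
        }
    }
}
{code}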



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Robert Metzger
Thanks a lot for preparing another RC!

+1 (binding)

- source archive looks fine (no binaries, copied sources are properly
reported)
- staging repository looks fine (bundled binaries seem documented, versions
are correct)
- *mvn clean install *(mvn clean verify fails, "install" is required) w/
e2e passes locally from source dir




On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai 
wrote:

> FYI -
> There are these open PRs to add blog posts and update the Flink website for
> the Stateful Functions 2.0 release:
> * https://github.com/apache/flink-web/pull/322
> * https://github.com/apache/flink-web/pull/321
>
> On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf 
> wrote:
>
> > +1 (non-binding)
> >
> > ** Functional **
> > - Building from source dist with end-to-end tests enabled (mvn clean
> verify
> > -Prun-e2e-tests) passes (JDK 8)
> > - Flink Harness works in IDE
> > - Building Python SDK dist from source
> >
> > On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > ** Legal **
> > > - checksums and GPG files match corresponding release files
> > > - Source distribution does not contain binaries, contents are sane (no
> > > .git* / .travis* / generated html content files)
> > > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > > font-awesome, jquery dependency in docs and copied sources from
> fastutil
> > (
> > > http://fastutil.di.unimi.it/)
> > > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > > Artifacts that do bundle dependencies are: statefun-flink-distribution,
> > > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > > sources). All non-ASLv2 deps have license files explicitly bundled.
> > > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE and
> > > NOTICE files (no bundled dependencies)
> > > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
> > point
> > > to same version “2.0.0”
> > > - README looks good
> > >
> > > ** Functional **
> > > - Building from source dist with end-to-end tests enabled (mvn clean
> > verify
> > > -Prun-e2e-tests) passes (JDK 8)
> > > - Generated quickstart from archetype looks good (correct POM /
> > Dockerfile
> > > / service file)
> > > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> Python
> > > SDK Walkthrough
> > > - Flink Harness works in IDE
> > > - Test remote functions deployment mode with AWS ecosystem: remote
> Python
> > > functions running in AWS Lambda behind AWS API Gateway, Java embedded
> > > functions running in AWS ECS. Checkpointing enabled, randomly restarted
> > > StateFun workers.
> > >
> > > On Fri, Apr 3, 2020 at 11:48 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org
> > >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the *release candidate #6* for the
> > > > version 2.0.0 of Apache Flink Stateful Functions,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > **Testing Guideline**
> > > >
> > > > You can find here [1] a doc that we can use for collaborating testing
> > > > efforts.
> > > > The listed testing tasks in the doc also serve as a guideline in what
> > to
> > > > test for this release.
> > > > If you wish to take ownership of a testing task, simply put your name
> > > down
> > > > in the "Checked by" field of the task.
> > > >
> > > > **Release Overview**
> > > >
> > > > As an overview, the release consists of the following:
> > > > a) Stateful Functions canonical source distribution, to be deployed
> to
> > > the
> > > > release repository at dist.apache.org
> > > > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > >
> > > > **Staging Areas to Review**
> > > >
> > > > The staging areas containing the above mentioned artifacts are as
> > > follows,
> > > > for your review:
> > > > * All artifacts for a) and b) can be found in the corresponding dev
> > > > repository at dist.apache.org [2]
> > > > * All artifacts for c) can be found at the Apache Nexus Repository
> [3]
> > > >
> > > > All artifacts are signed with the
> > > > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> > > >
> > > > Other links for your review:
> > > > * JIRA release notes [5]
> > > > * source code tag "release-2.0.0-rc6" [6] [7]
> > > > * PR to update the website Downloads page to include Stateful
> Functions
> > > > links [8]
> > > >
> > > > **Extra Remarks**
> > > >
> > > > * Part of the release is also official Docker images for Stateful
> > > > Functions. This can be a separate process, since the creation of
> those
> > > > relies on the fact that we have distribution jars already deployed to
> > > > Maven. I will follow-up with this after these artifacts are
> officially
> > > > released.
> > > > * The Flink Website and blog post is also being worked on (by Ma

[jira] [Created] (FLINK-16999) Data structure should cover all conversions declared in logical types

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-16999:


 Summary: Data structure should cover all conversions declared in 
logical types
 Key: FLINK-16999
 URL: https://issues.apache.org/jira/browse/FLINK-16999
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


In order to ensure that we don't lose any type precision or conversion class 
information in sources and sinks, this issue will add a type integrity test for 
data structure converters. UDFs will also benefit from this test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17000) Type information in sources should cover all data structures

2020-04-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-17000:


 Summary: Type information in sources should cover all data 
structures
 Key: FLINK-17000
 URL: https://issues.apache.org/jira/browse/FLINK-17000
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


In order to ensure that we don't lose any type information in sources, this 
issue will add a type integrity test for type information converters. 

See {{ScanTableSource#Context#createTypeInformation(DataType)}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-111: Docker image unification

2020-04-06 Thread Niels Basjes
Hi all,

Sorry for jumping in at this late point of the discussion.
I see a lot of things I really like and I would like to put my "needs" and
observations here too so you take them into account (where possible).
I suspect that there will be overlap with things you already have taken
into account.

   1. No more 'flink:latest' docker image tag.
   Related to https://issues.apache.org/jira/browse/FLINK-15794
   What I have learned is that the 'latest' version of a docker image only
   makes sense IFF this is an almost standalone thing.
   So if I have a servlet that does something in isolation (like my hobby
   project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest' makes
   sense.
   With Flink you have the application code and all nodes in the cluster
   that are depending on each other and as such must run the exact same
   versions of the base software.
   So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...) where
   the application and the nodes intercommunicate and closely depend on each
   other, then 'latest' is a bad idea.
  1. Assume I have an application built against the Flink N api and the
  cluster downloads the latest which is also Flink N.
  Then a week later Flink N+1 is released and the API I use changes
  (Deprecated)
  and a while later Flink N+2 is released and the deprecated API is
  removed: Then my application no longer works even though I have
not changed
  anything.
  So I want my application to be 'pinned' to the exact version I built
  it with.
  2. I have a running cluster with my application and cluster running
  Flink N.
  I add some additional nodes and the new nodes pick up the Flink N+1
  image ... now I have a cluster with mixed versions.
  3. The version of flink is really the "Flink+Scala" version pair.
  If you have the right flink but the wrong scala you get really nasty
  errors: https://issues.apache.org/jira/browse/FLINK-16289

  2. Deploy SNAPSHOT docker images (i.e. something like
   *flink:1.11-SNAPSHOT_2.12*) .
   More and more use cases will be running on the code delivered via Docker
   images instead of bare jar files.
   So if a "SNAPSHOT" is released and deployed into a 'staging' maven repo
   (which may be locally on the developers workstation) then it is my opinion
   that at the same moment a "SNAPSHOT" docker image should be
   created/deployed.
   Each time a "SNAPSHOT" docker image is released this will overwrite the
   previous "SNAPSHOT".
   If the final version is released the SNAPSHOTs of that version
   can/should be removed.
   This will make testing in clusters a lot easier.
   Also building a local fix and then running it locally will work without
   additional modifications to the code.

   3. Support for a 'single application cluster'
   I've been playing around with the S3 plugin and what I have found is
   that this essentially requires all nodes to have full access to the
   credentials needed to connect to S3.
   This essentially means that a multi-tenant setup is not possible in
   these cases.
   So I think the single application cluster should be a feature available
   in all cases.

   4. I would like a native-kubernetes-single-application base image.
   I can then create a derived image where I only add the jar of my
   application.
   My desire is that I can then create a k8s yaml file for kubectl
   that adds the needed configs/secrets/arguments/environment variables and
   starts the cluster and application.
   Because the native kubernetes support makes it automatically scale based
   on the application this should 'just work'.

Additional note:

   1. Job/Task attempt logging instead of task manager logging.
   *I realize this has nothing to do with the docker images*
   I found something "hard to work with" while running some tests last week.
   The logging is done to a single log for the task manager.
   So if I have multiple things running in the single task manager then the
   logs are mixed together.
   Also several attempts of the same task are mixed which makes it very
   hard to find out 'what went wrong'.



On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi  wrote:

> Thanks for the summary, Andrey. Good idea to link Patrick's document from
> the FLIP as a future direction so it doesn't get lost. Could you make sure
> to revive that discussion when FLIP-111 nears an end?
>
> This is good to go on my part. +1 to start the VOTE.
>
>
> @Till, @Yang: Thanks for the clarification with the output redirection. I
> didn't see that. The concern with the `tee` approach is that the file would
> grow indefinitely. I think we can solve this with regular logging by
> redirecting stderr to ERROR log level, but I'm not sure. We can look at a
> potential solution when we get to that point. :-)
>
>
>
> On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin 
> wrote:
>
> > Hi everyone,
> >
> > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
> >
> 

Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

2020-04-06 Thread Andrey Zagrebin
Hi guys,

Thanks for more details Zhijiang.
It also looks to me that mapped memory size is mostly driven by OS limits
and bit-ness of JVM (32/64).

Thinking more about the 'Metrics' tab layout, a couple more things have
come to my mind.

# 'Metrics' tab -> 'Memory': 'Metrics' and 'Configuration' tabs

It contains only memory-specific things, and the design suggests not only
metrics but configuration as well.
Moreover, there are other metrics on top which are not in the metrics tab.
Therefore, I would name it 'Memory' and then add sub-tabs: e.g. 'Metrics'
and 'Configuration' tab.
Alternatively, one could consider splitting 'Metrics' into 'Metrics' and
'Configuration' tabs.

# Metrics (a bit different structure)

I would put memory metrics into 4 groups:
- JVM Memory
- Managed
- Network
- Garbage collection

Alternatively, one could consider:
- Managed by JVM (same as JVM Memory)
- Managed by Flink (Managed Segments and Network buffers)
- Garbage collection

## Total memory (remove from metrics)

As mentioned in the discussions before, it is hard to measure the total
memory usage.
Therefore, I would put it into the configuration tab; see below.

## JVM Memory

Here we can have Heap, Non-Heap, Direct and mapped because they are all
managed by JVM.
Heap and direct can stay as they are.

### Non-Heap (could stay for now)

I think it is ok to keep Non-Heap for now because we had it also before.
This metric does not correlate explicitly with FLIP-49 but it is exposed by
JVM.
Once, we find better things to show (related only to JVM, e.g. Metaspace
etc), we can reconsider this as a follow-up.

### Mapped (looks still valuable)

As I understand it, at the moment this can be valuable for users to monitor
the spilling of batch partitions. A small probe is sketched below.
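
If this metric is sourced from the standard JVM buffer pool MXBeans (an 
assumption on my side; see Zhijiang's quoted remarks below on how to interpret 
the numbers), it can be probed like this:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Reads the "mapped" buffer pool; note that it tracks mapped regions,
// not actual physical memory use.
public class MappedPoolProbe {
    public static void main(String[] args) {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("mapped".equals(pool.getName())) {
                System.out.printf("mapped buffers: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }
}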

### Metaspace (new, sub-component of Non-Heap, follow-up)

We have never had anything for the Metaspace. Recent experience shows
that it can be useful.
I would put it on the road map as a follow-up though, because it also needs
some research and preparation on the server side [1].

# Configuration (see Flink user docs picture)

We already have a picture in the docs representing memory components in
Flink [2].
The layout in this picture can be also used in this FLIP to depict the
actual configuration.
This would be more clear for users to see the same as we have in docs.

The configuration can also depict the sizes of the total process and total Flink
memory according to the docs.

As mentioned above, I also suggest to put it into a separate tab.

Best,
Andrey

[1]
https://kb.novaordis.com/index.php/Memory_Monitoring_and_Management_Platform_MBeans#Metaspace
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#overview


On Wed, Apr 1, 2020 at 8:03 PM Zhijiang 
wrote:

> Thanks for the FLIP, Yadong. In general I think this work is valuable for
> users to better understand the Flink's memory usages in different
> dimensions.
>
> Sorry for not going through every detailed discussions below, and I try to
> do that later if possible. Firstly I try to answer some Andrey's concerns
> with mmap.
>
> > - I do not know how the mapped memory works. Is it meant for the new
> spilled partitions? If the mapped memory also pulls from the direct
> > memory limit then this is something we do not account in our network
> buffers as I understand. In this case, this metric may be useful for tuning
> to understand
> > how much the mapped memory uses from the direct memory limit to set e.g.
> framework off-heap limit correctly and avoid direct OOM.
> > It could be something to discuss with Zhijiang. e.g. is the direct
> memory used there to buffer fetched regions of partition files or what for?
>
> Yes, the mapped memory is used in bounded blocking partition for batch
> jobs now, but not the default mode.
>
>  AFAIK it is not related to or limited by the setting of `MaxDirectMemory`, so
> we do not need to worry about the current direct memory setting and the
> potential OOM issue.
> The mapped file size is only limited by the address space, and on a 64-bit
> system it can be regarded as limitless in theory.
>
> Regarding the size of the mapped buffer pool from the MXBean, it only indicates
> how much of the file was already mapped before; even when it stays unchanged, it
> does not reflect the real physical memory use. E.g. when a 100GB region of the
> file was mapped at the beginning, the mapped buffer pool from the MXBean would
> be 100GB. But how much physical memory is really consumed is up to the specific
> read or write operations in practice, and it is also controlled by the operating
> system. E.g. some unused regions might be swapped out to virtual memory when
> physical memory is limited.
>
> From this point of view, I guess it is not meaningful to show the size of the
> mapped buffer pool to users, who may be more concerned with how much physical
> memory is really used.
>
> Best,
> Zhijiang
>
>
> --
> From:Andrey Zagrebin 
> Send Time:2020 Mar. 

[DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Dawid Wysakowicz
Hi devs,

When working on improving the Table API/SQL connectors we faced a few
shortcomings of the DeserializationSchema and SerializationSchema
interfaces. Similar features were also mentioned by other users in the
past. The shortcomings I would like to address with the FLIP include:

  * Emitting 0 to m records from the deserialization schema with per
partition watermarks
  o https://github.com/apache/flink/pull/3314#issuecomment-376237266
  o differentiate null value from no value
  o support for Debezium CDC format

(https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL)

  * A way to initialize the schema
  o establish external connections
  o generate code on startup
  o no need for lazy initialization

  * Access to metrics

[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329]

One important aspect I would like to hear your opinion on is how to
support the Collector interface in the Kafka source (of course, only if we
agree to add the Collector to the DeserializationSchema at all).
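
To make the direction concrete, here is a hedged sketch of the extended
interface (the context and collector shapes are assumptions for illustration;
the FLIP linked below contains the actual proposal):

import java.io.IOException;
import java.io.Serializable;

// Sketch only: an open() hook for initialization and metrics, plus a Collector
// so that a single message can produce zero or more records.
interface Collector<T> {
    void collect(T record);
}

interface InitializationContext {
    // assumed to expose e.g. a metric group and the user code class loader
}

interface DeserializationSchemaSketch<T> extends Serializable {

    // Default no-op keeps existing implementations source compatible.
    default void open(InitializationContext context) throws Exception {}

    // Emits 0..m records instead of returning exactly one (or null).
    void deserialize(byte[] message, Collector<T> out) throws IOException;
}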

The FLIP can be found here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988&src=contextnavpagetreemode

Looking forward to your feedback.

Best,

Dawid





[RESULT][VOTE] FLIP-110: Support LIKE clause in CREATE TABLE

2020-04-06 Thread Dawid Wysakowicz
Hi all,

The voting time for FLIP-110 has passed. I'm closing the vote now.

There were 5 +1 votes, 4 of which are binding:

- Timo (binding)

- Jingsong (binding)

- Danny (non-binding)

- Jark (binding)

- Aljoscha (binding)


There were no disapproving votes.

Thus, FLIP-110 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

Best,
Dawid






[jira] [Created] (FLINK-17001) Support LIKE clause in CREATE TABLE

2020-04-06 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-17001:


 Summary: Support LIKE clause in CREATE TABLE
 Key: FLINK-17001
 URL: https://issues.apache.org/jira/browse/FLINK-17001
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
 Fix For: 1.11.0


Umbrella issue for all tasks related to supporting the CREATE TABLE ... LIKE 
syntax as described in 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
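
For illustration, the kind of statement this enables (a sketch based on 
FLIP-110; table names and option keys are examples only):

{code:sql}
-- Derive a table from Orders: reuse its schema, add a watermark, and
-- override one connector option (illustrative keys, not a spec).
CREATE TABLE Orders_with_watermark (
    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
    'scan.startup.mode' = 'latest-offset'
) LIKE Orders (
    INCLUDING ALL
    OVERWRITING OPTIONS
);
{code}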



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17002) Support creating tables using other tables definition

2020-04-06 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-17002:


 Summary: Support creating tables using other tables definition
 Key: FLINK-17002
 URL: https://issues.apache.org/jira/browse/FLINK-17002
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


We should be able to create a Table based on properties of other tables. This 
includes merging the properties and creating a new Table based on that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17003) Support parsing LIKE clause in CREATE TABLE statement

2020-04-06 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-17003:


 Summary: Support parsing LIKE clause in CREATE TABLE statement
 Key: FLINK-17003
 URL: https://issues.apache.org/jira/browse/FLINK-17003
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


We should support the CREATE TABLE ... LIKE syntax in SqlParser



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17004) Document the CREATE TABLE ... LIKE syntax in english

2020-04-06 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-17004:


 Summary: Document the CREATE TABLE ... LIKE syntax in english
 Key: FLINK-17004
 URL: https://issues.apache.org/jira/browse/FLINK-17004
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


Document the CREATE TABLE ... LIKE syntax in the Flink's documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17005) Translate the CREATE TABLE ... LIKE syntax documentation to chinese

2020-04-06 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-17005:


 Summary: Translate the CREATE TABLE ... LIKE syntax documentation 
to chinese
 Key: FLINK-17005
 URL: https://issues.apache.org/jira/browse/FLINK-17005
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Dawid Wysakowicz


Translate the page created in FLINK-17004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17006) AggregateITCase.testDistinctGroupBy fails with FileNotFoundException (in Rocksdb)

2020-04-06 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17006:
--

 Summary: AggregateITCase.testDistinctGroupBy fails with 
FileNotFoundException (in Rocksdb)
 Key: FLINK-17006
 URL: https://issues.apache.org/jira/browse/FLINK-17006
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends, Table SQL / Runtime, Tests
Affects Versions: 1.11.0
Reporter: Robert Metzger


CI run: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7045&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=294c2388-20e6-57a2-5721-91db544b1e69
Log output:
{code}
2020-04-03T17:17:44.4036304Z [ERROR] Tests run: 234, Failures: 0, Errors: 1, Skipped: 6, Time elapsed: 155.577 s <<< FAILURE! - in org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase
2020-04-03T17:17:44.4038781Z [ERROR] testDistinctGroupBy[LocalGlobal=OFF, MiniBatch=ON, StateBackend=ROCKSDB](org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase)  Time elapsed: 0.456 s  <<< ERROR!
2020-04-03T17:17:44.4040384Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2020-04-03T17:17:44.4041520Z    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
2020-04-03T17:17:44.4042712Z    at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:659)
2020-04-03T17:17:44.4043972Z    at org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:77)
2020-04-03T17:17:44.4045540Z    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1644)
2020-04-03T17:17:44.4047015Z    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1626)
2020-04-03T17:17:44.4048576Z    at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:673)
2020-04-03T17:17:44.4050073Z    at org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testDistinctGroupBy(AggregateITCase.scala:172)
2020-04-03T17:17:44.4051200Z    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-04-03T17:17:44.4052171Z    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-04-03T17:17:44.4053308Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-04-03T17:17:44.4054322Z    at java.lang.reflect.Method.invoke(Method.java:498)
2020-04-03T17:17:44.4055410Z    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-04-03T17:17:44.4056570Z    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-04-03T17:17:44.4057800Z    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-04-03T17:17:44.4059019Z    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-04-03T17:17:44.4060178Z    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2020-04-03T17:17:44.4061261Z    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-04-03T17:17:44.4062617Z    at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
2020-04-03T17:17:44.4063782Z    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2020-04-03T17:17:44.4064838Z    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-04-03T17:17:44.4065742Z    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-04-03T17:17:44.4066636Z    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-04-03T17:17:44.4067762Z    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-04-03T17:17:44.4068895Z    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-04-03T17:17:44.4069978Z    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-04-03T17:17:44.4070920Z    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-04-03T17:17:44.4071901Z    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-04-03T17:17:44.4072875Z    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-04-03T17:17:44.4073850Z    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-04-03T17:17:44.4074854Z    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-04-03T17:17:44.4075729Z    at org.junit.runners.Suite.runChild(Suite.java:128)
2020-04-03T17:17:44.4076541Z    at org.junit.runners.Suite.runChild(Suite.java:27)
2020-04-03T17:17:44.4077479Z    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-04-03T17:17:44.4078422Z    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)

Re: [DISCUSS] FLIP-111: Docker image unification

2020-04-06 Thread Till Rohrmann
Thanks for the feedback Niels. This is very helpful.

1. I agree `flink:latest` is nice to get started but in the long run people
will want to pin their dependencies to a specific Flink version. I think
the fix will happen as part of FLINK-15794.

2. SNAPSHOT docker images will be really helpful for developers as well as
users who want to use the latest features. I believe that this will be a
follow-up of this FLIP.

3. The goal of FLIP-111 is to create an image which allows starting a
session cluster as well as a job cluster. Hence, I believe that we will solve
this problem soon.

4. Same as 3. The new image will also contain the native K8s integration so
that there is no need to create a special image modulo the artifacts you
want to add.

Additional notes:

1. I agree that one log makes it harder to separate different execution
attempts or different tasks. However, on the other hand, it gives you an
overall picture of what's happening in a Flink process. If things were
split apart, then it might become super hard to detect problems in the
runtime which affect the user code to fail or vice versa, for example. In
general cross correlation will be harder. I guess a solution could be to
make this configurable. In any case, we should move the discussion about
this topic into a separate thread.

Cheers,
Till

On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes  wrote:

> Hi all,
>
> Sorry for jumping in at this late point of the discussion.
> I see a lot of things I really like and I would like to put my "needs" and
> observations here too so you take them into account (where possible).
> I suspect that there will be overlap with things you already have taken
> into account.
>
>1. No more 'flink:latest' docker image tag.
>Related to https://issues.apache.org/jira/browse/FLINK-15794
>What I have learned is that the 'latest' version of a docker image only
>makes sense IFF this is an almost standalone thing.
>So if I have a servlet that does something in isolation (like my hobby
>project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest'
> makes
>sense.
>With Flink you have the application code and all nodes in the cluster
>that are depending on each other and as such must run the exact same
>versions of the base software.
>So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...) where
>the application and the nodes inter communicate and closely depend on
> each
>other then 'latest' is a bad idea.
>   1. Assume I have an application built against the Flink N api and the
>   cluster downloads the latest which is also Flink N.
>   Then a week later Flink N+1 is released and the API I use changes
>   (Deprecated)
>   and a while later Flink N+2 is released and the deprecated API is
>   removed: Then my application no longer works even though I have
> not changed
>   anything.
>   So I want my application to be 'pinned' to the exact version I built
>   it with.
>   2. I have a running cluster with my application and cluster running
>   Flink N.
>   I add some additional nodes and the new nodes pick up the Flink N+1
>   image ... now I have a cluster with mixed versions.
>   3. The version of flink is really the "Flink+Scala" version pair.
>   If you have the right flink but the wrong scala you get really nasty
>   errors: https://issues.apache.org/jira/browse/FLINK-16289
>
>   2. Deploy SNAPSHOT docker images (i.e. something like
>*flink:1.11-SNAPSHOT_2.12*) .
>More and more use cases will be running on the code delivered via Docker
>images instead of bare jar files.
>So if a "SNAPSHOT" is released and deployed into a 'staging' maven repo
>(which may be locally on the developers workstation) then it is my
> opinion
>that at the same moment a "SNAPSHOT" docker image should be
>created/deployed.
>Each time a "SNAPSHOT" docker image is released this will overwrite the
>previous "SNAPSHOT".
>If the final version is released the SNAPSHOTs of that version
>can/should be removed.
>This will make testing in clusters a lot easier.
>Also building a local fix and then running it locally will work without
>additional modifications to the code.
>
>3. Support for a 'single application cluster'
>I've been playing around with the S3 plugin and what I have found is
>that this essentially requires all nodes to have full access to the
>credentials needed to connect to S3.
>This essentially means that a multi-tenant setup is not possible in
>these cases.
>So I think the single application cluster should be a feature available
>in all cases.
>
>4. I would like a native-kubernetes-single-application base image.
>I can then create a derived image where I only add the jar of my
>application.
>My desire is that I can then create a k8s yaml file for kubectl
>that adds the needed configs/secrets/arguments/environment v

Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Timo Walther

Hi Dawid,

thanks for this FLIP. This solves a lot of issues with the current 
design for both the Flink contributors and users. +1 for this.


Some minor suggestions from my side:
- How about finding something shorter for `InitializationContext`? Maybe 
just `OpenContext`?
- While introducing default methods for existing interfaces, shall we 
also create contexts for those methods? I see the following method in 
your FLIP and wonder if we can reduce the number of parameters while 
introducing a new method:


deserialize(
    byte[] recordValue,
    String partitionKey,
    String seqNum,
    long approxArrivalTimestamp,
    String stream,
    String shardId,
    Collector<T> out)

to:

deserialize(
    byte[] recordValue,
    Context c,
    Collector<T> out)

What do you think?

Regards,
Timo



On 06.04.20 11:08, Dawid Wysakowicz wrote:

Hi devs,

When working on improving the Table API/SQL connectors we faced a few 
shortcomings of the DeserializationSchema and SerializationSchema 
interfaces. Similar features were also mentioned by other users in the 
past. The shortcomings I would like to address with the FLIP include:


  * Emitting 0 to m records from the deserialization schema with per
partition watermarks
  o https://github.com/apache/flink/pull/3314#issuecomment-376237266
  o differentiate null value from no value
  o support for Debezium CDC format

(https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL)

  * A way to initialize the schema
  o establish external connections
  o generate code on startup
  o no need for lazy initialization

  * Access to metrics

[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329]

One important aspect I would like to hear your opinion on is how to 
support the Collector interface in Kafka source. Of course if we agree 
to add the Collector to the DeserializationSchema.


The FLIP can be found here: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988&src=contextnavpagetreemode


Looking forward to your feedback.

Best,

Dawid





[jira] [Created] (FLINK-17007) Add section "How to handle application parameters" in DataStream documentation

2020-04-06 Thread Congxian Qiu(klion26) (Jira)
Congxian Qiu(klion26) created FLINK-17007:
-

 Summary: Add section "How to handle application parameters" in 
DataStream documentation
 Key: FLINK-17007
 URL: https://issues.apache.org/jira/browse/FLINK-17007
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Congxian Qiu(klion26)
 Fix For: 1.11.0


This issue proposes to add a section “How to handle application parameters” to 
the DataStream documentation page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-119: Pipelined Region Scheduling

2020-04-06 Thread Gary Yao
Hi all,

The voting time for FLIP-119 has passed. I am closing the vote now.

There were 6 +1 votes, 4 of which are binding:
- Till (binding)
- Zhijiang (binding)
- Xintong (non-binding)
- Yangze (non-binding)
- Kurt (binding)
- Zhu Zhu (binding)

There were no disapproving votes.

Thus, FLIP-119 has been accepted.

Best,
Gary

On Fri, Apr 3, 2020 at 12:02 PM Zhu Zhu  wrote:

> +1 (binding)
>
> Thanks,
> Zhu Zhu
>
> > On Fri, Apr 3, 2020 at 5:25 PM, Kurt Young wrote:
>
> > +1 (binding)
> >
> > Best,
> > Kurt
> >
> >
> > On Fri, Apr 3, 2020 at 4:02 PM Yangze Guo  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Apr 3, 2020 at 2:11 PM Xintong Song 
> > wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 1:18 PM Zhijiang  > > .invalid>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Best,
> > > > > Zhijiang
> > > > >
> > > > >
> > > > > --
> > > > > From:Till Rohrmann 
> > > > > Send Time:2020 Apr. 2 (Thu.) 23:09
> > > > > To:dev 
> > > > > Cc:zhuzh 
> > > > > Subject:Re: [VOTE] FLIP-119: Pipelined Region Scheduling
> > > > >
> > > > > +1
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Mar 31, 2020 at 5:52 PM Gary Yao  wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to start the vote for FLIP-119 [1], which is
> discussed
> > > and
> > > > > > reached a consensus in the discussion thread [2].
> > > > > >
> > > > > > The vote will be open until April 3 (72h) unless there is an
> > > objection
> > > > > > or not enough votes.
> > > > > >
> > > > > > Best,
> > > > > > Gary
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-119-Pipelined-Region-Scheduling-tp39350.html
> > > > > >
> > > > >
> > > > >
> > >
> >
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread Timo Walther

Hi Godfrey,

did you see my remaining feedback in the discussion thread? We could 
finish this FLIP if this gets resolved.


Thanks,
Timo

On 03.04.20 15:12, Terry Wang wrote:

+1 (non-binding)
Looks great to me, Thanks for driving on this.

Best,
Terry Wang




On Apr 3, 2020, at 21:07, godfrey he wrote:

Hi everyone,

I'd like to start the vote on FLIP-84 [1] again, which has been discussed and
has reached consensus in the discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 6, 2020 13:10 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey

On Tue, Mar 31, 2020 at 8:42 PM, godfrey he wrote:


Hi, Timo

So sorry about that, I'm in a little hurry. Let's wait for 24h.

Best,
Godfrey

On Tue, Mar 31, 2020 at 5:26 PM, Timo Walther wrote:


-1

The current discussion has not completed. The last comments were sent
less than 24h ago.

Let's wait a bit longer to collect feedback from all stakeholders.

Thanks,
Timo

On 31.03.20 08:31, godfrey he wrote:

Hi everyone,

I'd like to start the vote of FLIP-84[1] again, because we have some
feedbacks. The feedbacks are all about new introduced methods, here is

the

discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an

objection,

I will try to close it by Apr 3, 2020 06:30 UTC if we have received
sufficient votes.


[1]


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878


[2]


http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html



Bests,
Godfrey








[DISCUSS] Consolidated log4j2-properties file

2020-04-06 Thread Chesnay Schepler

Hello,

I discovered a handy trick that would allow us to share a single 
log4j2-test.properties across all modules.


https://github.com/apache/flink/pull/11634

The file would exist in flink-test-utils-junit/src/main/resources, and 
be used for all modules except the kafka connectors and yarn-tests 
(because they have some custom requirements).


This would mean the files can no longer go out of sync, utilities can be 
shared more easily, and you wouldn't need to add a new properties file 
to new modules (or older ones lacking one) during debugging.


Overall I personally quite like this, but I have heard some concerns about 
changing dev routines, so I wanted to double-check what people think in 
general.




Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Igal Shilman
+1 (non binding)

legal / source:
- downloaded and verified the signature
- verified that pom and versions in the docs match
- no binary files in the distribution
- built and ran e2e tests with Java 8 and Java 11
- created a project from a Maven archetype.

functional:
- ran all the examples
- deployed the Python greeter example to k8s
- enabled checkpointing, created an application with two Python functions
that send both local and remote messages, restarted TMs randomly, and
verified the sequential output in the output Kafka topic (exactly-once test)
- ran the harness tests
- ran the ridesharing example with parallelism 10 overnight
- created a savepoint with the state bootstrapping tool and
successfully started a job from it.

Kind regards,
Igal

On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger  wrote:

> Thanks a lot for preparing another RC!
>
> +1 (binding)
>
> - source archive looks fine (no binaries, copied sources are properly
> reported)
> - staging repository looks fine (bundled binaries seem documented, versions
> are correct)
> - *mvn clean install *(mvn clean verify fails, "install" is required) w/
> e2e passes locally from source dir
>
>
>
>
> On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > FYI -
> > There are these open PRs to add blog posts and update the Flink website
> for
> > the Stateful Functions 2.0 release:
> > * https://github.com/apache/flink-web/pull/322
> > * https://github.com/apache/flink-web/pull/321
> >
> > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
> konstan...@ververica.com>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > ** Functional **
> > > - Building from source dist with end-to-end tests enabled (mvn clean
> > verify
> > > -Prun-e2e-tests) passes (JDK 8)
> > > - Flink Harness works in IDE
> > > - Building Python SDK dist from source
> > >
> > > On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > ** Legal **
> > > > - checksums and GPG files match corresponding release files
> > > > - Source distribution does not contain binaries, contents are sane
> (no
> > > > .git* / .travis* / generated html content files)
> > > > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > > > font-awesome, jquery dependency in docs and copied sources from
> > fastutil
> > > (
> > > > http://fastutil.di.unimi.it/)
> > > > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > > > Artifacts that do bundle dependencies are:
> statefun-flink-distribution,
> > > > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > > > sources). All non-ASLv2 deps have license files explicitly bundled.
> > > > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE
> and
> > > > NOTICE files (no bundled dependencies)
> > > > - All POMs / README / Python SDK setup.py / Dockerfiles / doc configs
> > > point
> > > > to same version “2.0.0”
> > > > - README looks good
> > > >
> > > > ** Functional **
> > > > - Building from source dist with end-to-end tests enabled (mvn clean
> > > verify
> > > > -Prun-e2e-tests) passes (JDK 8)
> > > > - Generated quickstart from archetype looks good (correct POM /
> > > Dockerfile
> > > > / service file)
> > > > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> > Python
> > > > SDK Walkthrough
> > > > - Flink Harness works in IDE
> > > > - Test remote functions deployment mode with AWS ecosystem: remote
> > Python
> > > > functions running in AWS Lambda behind AWS API Gateway, Java embedded
> > > > functions running in AWS ECS. Checkpointing enabled, randomly
> restarted
> > > > StateFun workers.
> > > >
> > > > On Fri, Apr 3, 2020 at 11:48 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Please review and vote on the *release candidate #6* for the
> > > > > version 2.0.0 of Apache Flink Stateful Functions,
> > > > > as follows:
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > >
> > > > > **Testing Guideline**
> > > > >
> > > > > You can find here [1] a doc that we can use for collaborating
> testing
> > > > > efforts.
> > > > > The listed testing tasks in the doc also serve as a guideline in
> what
> > > to
> > > > > test for this release.
> > > > > If you wish to take ownership of a testing task, simply put your
> name
> > > > down
> > > > > in the "Checked by" field of the task.
> > > > >
> > > > > **Release Overview**
> > > > >
> > > > > As an overview, the release consists of the following:
> > > > > a) Stateful Functions canonical source distribution, to be deployed
> > to
> > > > the
> > > > > release repository at dist.apache.org
> > > > > b) Stateful Functions Python SDK distributions to be deployed to
> PyPI
> > > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > > >
> > > > > **Staging Areas to Review**

Re: [DISCUSS] Consolidated log4j2-properties file

2020-04-06 Thread Till Rohrmann
Hi Chesnay,

thanks for kicking this discussion off. I agree that deduplicating code is
in general a good idea.

The main benefit seems to be that all modules inherit a
log4j2-test.properties file and that this file allows to control the
logging output for several modules.

The main drawback I see is that it complicates the debugging process for
our devs. If you want to debug a problem and need logging for this, then
you have to know that there is a log4j2-test.properties file in
flink-test-utils-junit which you can tweak. At the moment, it is quite
straightforward, as you simply go to the module which contains the test and
check the resources folder.

If we are ok with this drawback and document the change properly, then I'm
fine with this change.

Cheers,
Till

On Mon, Apr 6, 2020 at 12:24 PM Chesnay Schepler  wrote:

> Hello,
>
> I discovered a handy trick that would allow us to share a single
> log4j2-test.properties across all modules.
>
> https://github.com/apache/flink/pull/11634
>
> The file would exist in flink-test-utils-junit/src/main/resources, and
> be used for all modules except the kafka connectors and yarn-tests
> (because they have some custom requirements).
>
> This would mean the files can no longer go out of sync, utilities can be
> shared more easily, and you wouldn't need to add a new properties file
> to new modules (or older ones lacking one) during debugging.
>
> Overall I personally quite like this, but I have heard some concerns about
> changing dev routines, so I wanted to double-check what people think in
> general.
>
>


[jira] [Created] (FLINK-17008) Handle null value for HADOOP_CONF_DIR/HADOOP_HOME env in AbstractKubernetesParameters#getLocalHadoopConfigurationDirectory

2020-04-06 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-17008:


 Summary: Handle null value for HADOOP_CONF_DIR/HADOOP_HOME env in 
AbstractKubernetesParameters#getLocalHadoopConfigurationDirectory
 Key: FLINK-17008
 URL: https://issues.apache.org/jira/browse/FLINK-17008
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Reporter: Canbin Zheng
 Fix For: 1.11.0


{{System.getenv(Constants.ENV_HADOOP_CONF_DIR)}} or 
{{System.getenv(Constants.HADOOP_HOME)}} could return a null value if the 
corresponding variables are not set in the system environment; it is a minor 
improvement to take these situations into consideration.
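
For illustration, a minimal null-safe lookup could look like the following 
sketch (the constant values are inlined so that the snippet stands alone; the 
exact shape of the final fix may differ):

import java.util.Optional;

final class HadoopConfDirResolver {

    // mirrors Constants.ENV_HADOOP_CONF_DIR / Constants.HADOOP_HOME from this ticket
    private static final String ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR";
    private static final String ENV_HADOOP_HOME = "HADOOP_HOME";

    static Optional<String> getLocalHadoopConfigurationDirectory() {
        final String hadoopConfDir = System.getenv(ENV_HADOOP_CONF_DIR);
        if (hadoopConfDir != null) {
            return Optional.of(hadoopConfDir);
        }
        final String hadoopHome = System.getenv(ENV_HADOOP_HOME);
        if (hadoopHome != null) {
            // Hadoop 2.x layout; a 1.x conf/ layout would be checked the same way
            return Optional.of(hadoopHome + "/etc/hadoop");
        }
        return Optional.empty();
    }
}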



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Stephan Ewen
+1 (binding)

  - contents of staging directory looks correct
  - checked license / notice files of source distribution
  - checked license of "statefun-flink-distribution"
  - built from source, ran all tests successfully (WSL)
  - built and checked the docs (WSL / Docker setup)


On Mon, Apr 6, 2020 at 12:49 PM Igal Shilman  wrote:

> +1 (non binding)
>
> legal / source:
> - downloaded and verified the signature
> - verified that pom and versions in the docs match
> - no binary files in the distribution
> - built and ran e2e tests with Java 8 and Java 11
> - created a project from a Maven archetype.
>
> functional:
> - ran all the examples
> - deployed the Python greeter example to k8s
> - enabled checkpointing, created an application with two Python functions
> that send both local and remote messages, restarted TMs randomly, and
> verified the sequential output in the output Kafka topic (exactly-once test)
> - ran the harness tests
> - ran the ridesharing example with parallelism 10 overnight
> - created a savepoint with the state bootstrapping tool and
> successfully started a job from it.
>
> Kind regards,
> Igal
>
> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> wrote:
>
> > Thanks a lot for preparing another RC!
> >
> > +1 (binding)
> >
> > - source archive looks fine (no binaries, copied sources are properly
> > reported)
> > - staging repository looks fine (bundled binaries seem documented,
> versions
> > are correct)
> > - *mvn clean install *(mvn clean verify fails, "install" is required) w/
> > e2e passes locally from source dir
> >
> >
> >
> >
> > On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai 
> > wrote:
> >
> > > FYI -
> > > There are these open PRs to add blog posts and update the Flink website
> > for
> > > the Stateful Functions 2.0 release:
> > > * https://github.com/apache/flink-web/pull/322
> > > * https://github.com/apache/flink-web/pull/321
> > >
> > > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
> > konstan...@ververica.com>
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > ** Functional **
> > > > - Building from source dist with end-to-end tests enabled (mvn clean
> > > verify
> > > > -Prun-e2e-tests) passes (JDK 8)
> > > > - Flink Harness works in IDE
> > > > - Building Python SDK dist from source
> > > >
> > > > On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > ** Legal **
> > > > > - checksums and GPG files match corresponding release files
> > > > > - Source distribution does not contain binaries, contents are sane
> > (no
> > > > > .git* / .travis* / generated html content files)
> > > > > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > > > > font-awesome, jquery dependency in docs and copied sources from
> > > fastutil
> > > > (
> > > > > http://fastutil.di.unimi.it/)
> > > > > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > > > > Artifacts that do bundle dependencies are:
> > statefun-flink-distribution,
> > > > > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > > > > sources). All non-ASLv2 deps have license files explicitly bundled.
> > > > > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE
> > and
> > > > > NOTICE files (no bundled dependencies)
> > > > > - All POMs / README / Python SDK setup.py / Dockerfiles / doc
> configs
> > > > point
> > > > > to same version “2.0.0”
> > > > > - README looks good
> > > > >
> > > > > ** Functional **
> > > > > - Building from source dist with end-to-end tests enabled (mvn
> clean
> > > > verify
> > > > > -Prun-e2e-tests) passes (JDK 8)
> > > > > - Generated quickstart from archetype looks good (correct POM /
> > > > Dockerfile
> > > > > / service file)
> > > > > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> > > Python
> > > > > SDK Walkthrough
> > > > > - Flink Harness works in IDE
> > > > > - Test remote functions deployment mode with AWS ecosystem: remote
> > > Python
> > > > > functions running in AWS Lambda behind AWS API Gateway, Java
> embedded
> > > > > functions running in AWS ECS. Checkpointing enabled, randomly
> > restarted
> > > > > StateFun workers.
> > > > >
> > > > > On Fri, Apr 3, 2020 at 11:48 AM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the *release candidate #6* for the
> > > > > > version 2.0.0 of Apache Flink Stateful Functions,
> > > > > > as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > > **Testing Guideline**
> > > > > >
> > > > > > You can find here [1] a doc that we can use for collaborating
> > testing
> > > > > > efforts.
> > > > > > The listed testing tasks in the doc also serve as a guideline in
> > what
> > > > to
> > > > > > test for this release.
> >

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Seth Wiesman
+1 (non-binding)

legal / source
- checked sources for binary files
- checked license headers

functional
- built from source (mvn clean verify -Prun-e2e-tests)
- built python sdk and ran tests
- ran examples
- deployed a mixed Python/Java application on k8s with checkpointing.
Failed TMs and watched it recover.
- deployed application on Flink session cluster
- created a savepoint using the bootstrap api and successfully used it to
start an application.

Seth

On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman  wrote:

> +1 (non binding)
>
> legal / source:
> - downloaded and verified the signature
> - verified that pom and versions in the docs match
> - no binary files in the distribution
> - built and ran e2e tests with Java 8 and Java 11
> - created a project from a Maven archetype.
>
> functional:
> - ran all the examples
> - deployed the Python greeter example to k8s
> - enabled checkpointing, created an application with two Python functions,
> that send both local and remote messages, restarted TMs randomly, and
> verified the sequential output in the output Kafka topic (exactly-once test)
> - ran the harness tests
> - ran the ridesharing example with parallelism 10 overnight
> - created a savepoint with the state bootstrapping tool and
> successfully started a job from it.
>
> Kind regards,
> Igal
>
> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> wrote:
>
> > Thanks a lot for preparing another RC!
> >
> > +1 (binding)
> >
> > - source archive looks fine (no binaries, copied sources are properly
> > reported)
> > - staging repository looks fine (bundled binaries seem documented,
> versions
> > are correct)
> > - *mvn clean install *(mvn clean verify fails, "install" is required) w/
> > e2e passes locally from source dir
> >
> >
> >
> >
> > On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai 
> > wrote:
> >
> > > FYI -
> > > There are these open PRs to add blog posts and update the Flink website
> > for
> > > the Stateful Functions 2.0 release:
> > > * https://github.com/apache/flink-web/pull/322
> > > * https://github.com/apache/flink-web/pull/321
> > >
> > > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
> > konstan...@ververica.com>
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > ** Functional **
> > > > - Building from source dist with end-to-end tests enabled (mvn clean
> > > verify
> > > > -Prun-e2e-tests) passes (JDK 8)
> > > > - Flink Harness works in IDE
> > > > - Building Python SDK dist from source
> > > >
> > > > On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > ** Legal **
> > > > > - checksums and GPG files match corresponding release files
> > > > > - Source distribution does not contain binaries, contents are sane
> > (no
> > > > > .git* / .travis* / generated html content files)
> > > > > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > > > > font-awesome, jquery dependency in docs and copied sources from
> > > fastutil
> > > > (
> > > > > http://fastutil.di.unimi.it/)
> > > > > - Bundled LICENSEs and NOTICE files for Maven artifacts looks good.
> > > > > Artifacts that do bundle dependencies are:
> > statefun-flink-distribution,
> > > > > statefun-ridesharing-example-simulator, statefun-flink-core (copied
> > > > > sources). All non-ASLv2 deps have license files explicitly bundled.
> > > > > - Python SDK distributions (source and wheel) contain ASLv2 LICENSE
> > and
> > > > > NOTICE files (no bundled dependencies)
> > > > > - All POMs / README / Python SDK setup.py / Dockerfiles / doc
> configs
> > > > point
> > > > > to same version “2.0.0”
> > > > > - README looks good
> > > > >
> > > > > ** Functional **
> > > > > - Building from source dist with end-to-end tests enabled (mvn
> clean
> > > > verify
> > > > > -Prun-e2e-tests) passes (JDK 8)
> > > > > - Generated quickstart from archetype looks good (correct POM /
> > > > Dockerfile
> > > > > / service file)
> > > > > - Examples run: Java Greeter / Java Ridesharing / Python Greeter /
> > > Python
> > > > > SDK Walkthrough
> > > > > - Flink Harness works in IDE
> > > > > - Test remote functions deployment mode with AWS ecosystem: remote
> > > Python
> > > > > functions running in AWS Lambda behind AWS API Gateway, Java
> embedded
> > > > > functions running in AWS ECS. Checkpointing enabled, randomly
> > restarted
> > > > > StateFun workers.
> > > > >
> > > > > On Fri, Apr 3, 2020 at 11:48 AM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the *release candidate #6* for the
> > > > > > version 2.0.0 of Apache Flink Stateful Functions,
> > > > > > as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > > **Testing Guideline**
> > > > > >
> > > > > > You can find here [1] a doc that we can use for 

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Hequn Cheng
Thanks a lot for the new RC!

+1 (non-binding)

- Signatures and hash are correct.
- The source distribution contains no binaries.
- The source distribution is building properly with `-Prun-e2e-tests`
(JDK8).
- All POM files / README / Python SDK setup.py point to the same version.
- Verified license and notice.
  - Source distribution. Everything looks good and the jQuery dependency has
been added.
  - Jar artifacts. No missing dependencies, no version errors.
  - Python distributions (source and wheel). They contain the license
and notice files.
- Flink Harness works in IDE.

Best,
Hequn

On Mon, Apr 6, 2020 at 10:05 PM Seth Wiesman  wrote:

> +1 (non-binding)
>
> legal / source
> - checked sources for binary files
> - checked license headers
>
> functional
> - built from source (mvn clean verify -Prun-e2e-tests)
> - built python sdk and ran tests
> - ran examples
> - deployed a mixed Python/Java application on k8s with checkpointing.
> Failed TMs and watched it recover.
> - deployed application on Flink session cluster
> - created a savepoint using the bootstrap api and successfully used it to
> start an application.
>
> Seth
>
> On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman  wrote:
>
> > +1 (non binding)
> >
> > legal / source:
> > - downloaded and verified the signature
> > - verified that pom and versions in the docs match
> > - no binary files in the distribution
> > - built and ran e2e tests with Java 8 and Java 11
> > - created a project from a Maven archetype.
> >
> > functional:
> > - ran all the examples
> > - deployed the Python greeter example to k8s
> > - enabled checkpointing, created an application with two Python
> functions,
> > that send both local and remote messages, restarted TMs randomly, and
> > verified the sequential output in the output Kafka topic (exactly-once test)
> > - ran the harness tests
> > - ran the ridesharing example with parallelism 10 overnight
> > - created a savepoint with the state bootstrapping tool and
> > successfully started a job from it.
> >
> > Kind regards,
> > Igal
> >
> > On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> > wrote:
> >
> > > Thanks a lot for preparing another RC!
> > >
> > > +1 (binding)
> > >
> > > - source archive looks fine (no binaries, copied sources are properly
> > > reported)
> > > - staging repository looks fine (bundled binaries seem documented,
> > versions
> > > are correct)
> > > - *mvn clean install *(mvn clean verify fails, "install" is required)
> w/
> > > e2e passes locally from source dir
> > >
> > >
> > >
> > >
> > > On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> > > wrote:
> > >
> > > > FYI -
> > > > There are these open PRs to add blog posts and update the Flink
> website
> > > for
> > > > the Stateful Functions 2.0 release:
> > > > * https://github.com/apache/flink-web/pull/322
> > > > * https://github.com/apache/flink-web/pull/321
> > > >
> > > > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
> > > konstan...@ververica.com>
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > ** Functional **
> > > > > - Building from source dist with end-to-end tests enabled (mvn
> clean
> > > > verify
> > > > > -Prun-e2e-tests) passes (JDK 8)
> > > > > - Flink Harness works in IDE
> > > > > - Building Python SDK dist from source
> > > > >
> > > > > On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > ** Legal **
> > > > > > - checksums and GPG files match corresponding release files
> > > > > > - Source distribution does not contain binaries, contents are
> sane
> > > (no
> > > > > > .git* / .travis* / generated html content files)
> > > > > > - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> > > > > > font-awesome, jquery dependency in docs and copied sources from
> > > > fastutil
> > > > > (
> > > > > > http://fastutil.di.unimi.it/)
> > > > > > - Bundled LICENSEs and NOTICE files for Maven artifacts looks
> good.
> > > > > > Artifacts that do bundle dependencies are:
> > > statefun-flink-distribution,
> > > > > > statefun-ridesharing-example-simulator, statefun-flink-core
> (copied
> > > > > > sources). All non-ASLv2 deps have license files explicitly
> bundled.
> > > > > > - Python SDK distributions (source and wheel) contain ASLv2
> LICENSE
> > > and
> > > > > > NOTICE files (no bundled dependencies)
> > > > > > - All POMs / README / Python SDK setup.py / Dockerfiles / doc
> > configs
> > > > > point
> > > > > > to same version “2.0.0”
> > > > > > - README looks good
> > > > > >
> > > > > > ** Functional **
> > > > > > - Building from source dist with end-to-end tests enabled (mvn
> > clean
> > > > > verify
> > > > > > -Prun-e2e-tests) passes (JDK 8)
> > > > > > - Generated quickstart from archetype looks good (correct POM /
> > > > > Dockerfile
> > > > > > / service file)
> > > > > > - Examples run: Java Greeter / Java Ridesharing / Python Greeter
> /
> > > > Pyt

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction.
I missed DQL in the job submission scenario.
I'll fix the document right away.
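
For reference, the intended caller-side contract then reads roughly as in the 
sketch below (the names executeSql, TableResult, getJobClient, and collect 
follow the current FLIP draft and may still change; tEnv is a TableEnvironment):

// DQL: the call returns once the job has been submitted, not when it finishes
TableResult result = tEnv.executeSql("SELECT user, COUNT(*) FROM Orders GROUP BY user");

// a long-lasting query can be cancelled through the JobClient
result.getJobClient().ifPresent(client -> {
    // client.cancel();
});

// callers that want to block can iterate the rows instead
try (CloseableIterator<Row> rows = result.collect()) {
    rows.forEachRemaining(System.out::println);
}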

Best,
Godfrey

Timo Walther  于2020年4月3日周五 下午9:53写道:

> Hi Godfrey,
>
> I'm sorry to jump in again but I still need to clarify some things
> around TableResult.
>
> The FLIP says:
> "For DML, this method returns TableResult until the job is submitted.
> For other statements, TableResult is returned until the execution is
> finished."
>
> I thought we agreed on making every execution async? This also means
> returning a TableResult for DQLs even though the execution is not done
> yet. People need access to the JobClient also for batch jobs in order to
> cancel long lasting queries. If people want to wait for the completion
> they can hook into JobClient or collect().
>
> Can we rephrase this part to:
>
> The FLIP would then say:
> "For DML and DQL, this method returns TableResult once the job has been
> submitted. For DDL and DCL statements, TableResult is returned once the
> operation has finished."
>
> Regards,
> Timo
>
>
> On 02.04.20 05:27, godfrey he wrote:
> > Hi Aljoscha, Dawid, Timo,
> >
> > Thanks so much for the detailed explanation.
> > Agree with you that the multiline story is not completed now, and we can
> > keep discussion.
> > I will add current discussions and conclusions to the FLIP.
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Timo Walther  于2020年4月1日周三 下午11:27写道:
> >
> >> Hi Godfrey,
> >>
> >> first of all, I agree with Dawid. The multiline story is not completed
> >> by this FLIP. It just verifies the big picture.
> >>
> >> 1. "control the execution logic through the proposed method if they know
> >> what the statements are"
> >>
> >> This is a good point that also Fabian raised in the linked google doc. I
> >> could also imagine to return a more complicated POJO when calling
> >> `executeMultiSql()`.
> >>
> >> The POJO would include some `getSqlProperties()` such that a platform
> >> gets insights into the query before executing. We could also trigger the
> >> execution more explicitly instead of hiding it behind an iterator.
> >>
> >> 2. "there are some special commands introduced in SQL client"
> >>
> >> For platforms and SQL Client specific commands, we could offer a hook to
> >> the parser or a fallback parser in case the regular table environment
> >> parser cannot deal with the statement.
> >>
> >> However, all of that is future work and can be discussed in a separate
> >> FLIP.
> >>
> >> 3. +1 for the `Iterator` instead of `Iterable`.
> >>
> >> 4. "we should convert the checked exception to unchecked exception"
> >>
> >> Yes, I meant using a runtime exception instead of a checked exception.
> >> There was no consensus on putting the exception into the `TableResult`.
> >>
> >> Regards,
> >> Timo
> >>
> >> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> >>> When considering the multi-line support I think it is helpful to start
> >>> with a use case in mind. In my opinion consumers of this method will
> be:
> >>>
> >>>   1. sql-client
> >>>   2. third-part sql based platforms
> >>>
> >>> @Godfrey As for the quit/source/... commands. I think those belong to
> >>> the responsibility of aforementioned. I think they should not be
> >>> understandable by the TableEnvironment. What would quit on a
> >>> TableEnvironment do? Moreover I think such commands should be prefixed
> >>> appropriately. I think it's a common practice to e.g. prefix those with
> >>> ! or : to say they are meta commands of the tool rather than a query.
> >>>
> >>> I also don't necessarily understand why platform users need to know the
> >>> kind of the query to use the proposed method. They should get the type
> >>> from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> >>> DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> >>> we can enrich the TableResult with more explicit kind of query, but so
> >>> far I don't see such a need.
> >>>
> >>> @Kurt In those cases I would assume the developers want to present
> >>> results of the queries anyway. Moreover I think it is safe to assume
> >>> they can adhere to such a contract that the results must be iterated.
> >>>
> >>> For direct users of TableEnvironment/Table API this method does not
> make
> >>> much sense anyway, in my opinion. I think we can rather safely assume
> in
> >>> this scenario they do not want to submit multiple queries at a single
> >> time.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>>
> >>> On 01/04/2020 15:07, Kurt Young wrote:
>  One comment to `executeMultilineSql`, I'm afraid sometimes user might
>  forget to
>  iterate the returned iterators, e.g. user submits a bunch of DDLs and
>  expect the
>  framework will execute them one by one. But it didn't.
> 
>  Best,
>  Kurt
> 
> 
>  On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek
> >> wrote:
> 
> > Agreed to what Dawid and Timo said.
> >
> > To answer your question abo

[jira] [Created] (FLINK-17009) Fold API-agnostic documentation into DataStream documentation

2020-04-06 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-17009:


 Summary: Fold API-agnostic documentation into DataStream 
documentation
 Key: FLINK-17009
 URL: https://issues.apache.org/jira/browse/FLINK-17009
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


As per 
[FLIP-42|https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation],
 we want to move most cross-API documentation to the DataStream section and 
deprecate the DataSet API in the future.

We want to go from

 - Project Build Setup
 - Basic API Concepts
 - Streaming (DataStream API)
 - Batch (DataSet API)
 - Table API & SQL 
 - Data Types & Serialization
 - Managing Execution
 - Libraries
 - Best Practices
 - API Migration Guides

To

 - DataStream API
 - Table API / SQL 
 - DataSet API




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction. I have fixed the typo
and updated the document.

Best,
Godfrey

Timo Walther  于2020年4月6日周一 下午6:05写道:

> Hi Godfrey,
>
> did you see my remaining feedback in the discussion thread? We could
> finish this FLIP if this gets resolved.
>
> Thanks,
> Timo
>
> On 03.04.20 15:12, Terry Wang wrote:
> > +1 (non-binding)
> > Looks great to me, Thanks for driving on this.
> >
> > Best,
> > Terry Wang
> >
> >
> >
> >> 2020年4月3日 21:07,godfrey he  写道:
> >>
> >> Hi everyone,
> >>
> >> I'd like to start the vote of FLIP-84[1] again, which is discussed and
> >> reached consensus in the discussion thread[2].
> >>
> >> The vote will be open for at least 72 hours. Unless there is an
> objection,
> >> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >> sufficient votes.
> >>
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>
> >> [2]
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>
> >>
> >> Bests,
> >> Godfrey
> >>
> >> godfrey he  于2020年3月31日周二 下午8:42写道:
> >>
> >>> Hi, Timo
> >>>
> >>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >>>
> >>> Best,
> >>> Godfrey
> >>>
> >>> Timo Walther  于2020年3月31日周二 下午5:26写道:
> >>>
>  -1
> 
>  The current discussion has not completed. The last comments were sent
>  less than 24h ago.
> 
>  Let's wait a bit longer to collect feedback from all stakeholders.
> 
>  Thanks,
>  Timo
> 
>  On 31.03.20 08:31, godfrey he wrote:
> > Hi everyone,
> >
> > I'd like to start the vote of FLIP-84[1] again, because we have some
> > feedbacks. The feedbacks are all about new introduced methods, here
> is
>  the
> > discussion thread [2].
> >
> > The vote will be open for at least 72 hours. Unless there is an
>  objection,
> > I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> > sufficient votes.
> >
> >
> > [1]
> >
> 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >
> > [2]
> >
> 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >
> >
> > Bests,
> > Godfrey
> >
> 
> 
>
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread Timo Walther

Thanks for the update.

+1 (binding) for this FLIP

Regards,
Timo


On 06.04.20 16:47, godfrey he wrote:

Hi Timo,

Sorry for the late reply, and thanks for your correction. I have fixed the typo
and updated the document.

Best,
Godfrey

Timo Walther  于2020年4月6日周一 下午6:05写道:


Hi Godfrey,

did you see my remaining feedback in the discussion thread? We could
finish this FLIP if this gets resolved.

Thanks,
Timo

On 03.04.20 15:12, Terry Wang wrote:

+1 (non-binding)
Looks great to me, Thanks for driving on this.

Best,
Terry Wang




2020年4月3日 21:07,godfrey he  写道:

Hi everyone,

I'd like to start the vote of FLIP-84[1] again, which is discussed and
reached consensus in the discussion thread[2].

The vote will be open for at least 72 hours. Unless there is an

objection,

I will try to close it by Apr 6, 2020 13:10 UTC if we have received
sufficient votes.


[1]


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878


[2]


http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html



Bests,
Godfrey

godfrey he  于2020年3月31日周二 下午8:42写道:


Hi, Timo

So sorry about that, I'm in a little hurry. Let's wait for 24h.

Best,
Godfrey

Timo Walther  于2020年3月31日周二 下午5:26写道:


-1

The current discussion has not completed. The last comments were sent
less than 24h ago.

Let's wait a bit longer to collect feedback from all stakeholders.

Thanks,
Timo

On 31.03.20 08:31, godfrey he wrote:

Hi everyone,

I'd like to start the vote of FLIP-84[1] again, because we have some
feedbacks. The feedbacks are all about new introduced methods, here

is

the

discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an

objection,

I will try to close it by Apr 3, 2020 06:30 UTC if we have received
sufficient votes.


[1]




https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878


[2]




http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html



Bests,
Godfrey













Re: [DISCUSS] FLIP-111: Docker image unification

2020-04-06 Thread Canbin Zheng
Hi, all

Thanks a lot for this FLIP and all the fruitful discussion. I am not sure
whether the following questions are in the scope of this FLIP, but I would
still appreciate your reply:

   1. Which Docker base image do we plan to use for Java? As far as I can see,
   openjdk:8-jre-alpine[1] is not officially supported by the OpenJDK project
   anymore; openjdk:8-jre is larger than openjdk:8-jre-slim, so we use the
   latter in our internal branch and it has worked fine so far.
   2. Is it possible that we execute the container CMD under *TINI*[2]
   instead of the shell for better hygiene? As far as I can see, the container
   of the JM or TMs runs in shell form and therefore cannot receive the
   *TERM* signal when the pod is deleted[3]. Some of the resulting problems
   are as follows (a toy sketch after this list illustrates the signal issue):
  - The JM and the TMs get no chance to clean up; I created
  FLINK-15843[4] to track this problem.
  - The pod can take a long time (up to 40 seconds) to be deleted
  after the K8s API server receives the deletion request.

   At the moment, we use *TINI* in our internal branch for the
native K8s setup and it solves the problems mentioned above.
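
To make the signal problem concrete, consider the toy program below (not 
Flink code, just an illustration): its shutdown hook only runs if the JVM 
actually receives SIGTERM. When the entrypoint starts the JVM behind a shell 
that does not forward signals, Docker/K8s eventually escalates to SIGKILL and 
the hook never executes:

public final class SignalDemo {
    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                // the JM/TM perform their cleanup in hooks like this one
                System.out.println("cleanup running")));
        Thread.sleep(Long.MAX_VALUE); // idle until the container is stopped
    }
}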

[1]
https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links
https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099
[2]
https://github.com/krallin/tini
[3]
https://docs.docker.com/engine/reference/commandline/kill/
[4]
https://issues.apache.org/jira/browse/FLINK-15843

Regards,
Canbin Zheng

Till Rohrmann  于2020年4月6日周一 下午5:34写道:

> Thanks for the feedback Niels. This is very helpful.
>
> 1. I agree `flink:latest` is nice to get started but in the long run people
> will want to pin their dependencies to a specific Flink version. I think
> the fix will happen as part of FLINK-15794.
>
> 2. SNAPSHOT docker images will be really helpful for developers as well as
> users who want to use the latest features. I believe that this will be a
> follow-up of this FLIP.
>
> 3. The goal of FLIP-111 is to create an image which allows to start a
> session as well as job cluster. Hence, I believe that we will solve this
> problem soon.
>
> 4. Same as 3. The new image will also contain the native K8s integration so
> that there is no need to create a special image modulo the artifacts you
> want to add.
>
> Additional notes:
>
> 1. I agree that one log makes it harder to separate different execution
> attempts or different tasks. However, on the other hand, it gives you an
> overall picture of what's happening in a Flink process. If things were
> split apart, then it might become super hard to detect problems in the
> runtime which affect the user code to fail or vice versa, for example. In
> general cross correlation will be harder. I guess a solution could be to
> make this configurable. In any case, we should move the discussion about
> this topic into a separate thread.
>
> Cheers,
> Till
>
> On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes  wrote:
>
> > Hi all,
> >
> > Sorry for jumping in at this late point of the discussion.
> > I see a lot of things I really like and I would like to put my "needs"
> and
> > observations here too so you take them into account (where possible).
> > I suspect that there will be overlap with things you already have taken
> > into account.
> >
> >1. No more 'flink:latest' docker image tag.
> >Related to https://issues.apache.org/jira/browse/FLINK-15794
> >What I have learned is that the 'latest' version of a docker image
> only
> >makes sense IFF this is an almost standalone thing.
> >So if I have a servlet that does something in isolation (like my hobby
> >project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest'
> > makes
> >sense.
> >With Flink you have the application code and all nodes in the cluster
> >that are depending on each other and as such must run the exact same
> >versions of the base software.
> >So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...)
> where
> >the application and the nodes inter communicate and closely depend on
> > each
> >other then 'latest' is a bad idea.
> >   1. Assume I have an application built against the Flink N api and
> the
> >   cluster downloads the latest which is also Flink N.
> >   Then a week later Flink N+1 is released and the API I use changes
> >   (Deprecated)
> >   and a while later Flink N+2 is released and the deprecated API is
> >   removed: Then my application no longer works even though I have
> > not changed
> >   anything.
> >   So I want my application to be 'pinned' to the exact version I
> built
> >   it with.
> >   2. I have a running cluster with my application and cluster running
> >   Flink N.
> >   I add some additional nodes and the new nodes pick up the Flink N+1
> >   image ... now I have a cluster with 

[jira] [Created] (FLINK-17010) Streaming File Sink s3 end-to-end test fails with "Output hash mismatch"

2020-04-06 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17010:
--

 Summary: Streaming File Sink s3 end-to-end test fails with "Output 
hash mismatch"
 Key: FLINK-17010
 URL: https://issues.apache.org/jira/browse/FLINK-17010
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.11.0
Reporter: Robert Metzger
 Fix For: 1.11.0


CI: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7099&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5

{code}
2020-04-06T13:17:38.2460013Z Digest: 
sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09
2020-04-06T13:17:38.2475230Z Status: Downloaded newer image for 
stedolan/jq:latest
2020-04-06T13:18:00.4459693Z Number of produced values 13124/6
2020-04-06T13:18:25.3214772Z Number of produced values 18300/6
2020-04-06T13:19:06.9767370Z Number of produced values 45366/6
2020-04-06T13:20:01.2846102Z Number of produced values 6/6
2020-04-06T13:20:02.5940091Z Cancelling job ff95cd4fd52d10b6540c03cf72b33111.
2020-04-06T13:20:03.7862792Z Cancelled job ff95cd4fd52d10b6540c03cf72b33111.
2020-04-06T13:20:03.8343709Z Waiting for job (ff95cd4fd52d10b6540c03cf72b33111) 
to reach terminal state CANCELED ...
2020-04-06T13:20:05.8474817Z Job (ff95cd4fd52d10b6540c03cf72b33111) reached 
terminal state CANCELED
2020-04-06T13:20:08.6987955Z FAIL File Streaming Sink: Output hash mismatch.  
Got 61bb5f161b859759a9829516d96e2bbc, expected 6727342fdd3aae2129e61fc8f433fb6f.
2020-04-06T13:20:08.6989364Z head hexdump of actual:
2020-04-06T13:20:08.7288989Z 000   C   o   m   p   l   e   t   e   d   
2   .   0   K   i
2020-04-06T13:20:08.7289917Z 010   B   /   3   4   0   .   7   K   i   
B   (   5   .   3
2020-04-06T13:20:08.7293001Z 020   K   i   B   /   s   )   w   i   
t   h   1   1   0
2020-04-06T13:20:08.7298661Z 030   f   i   l   e   (   s   )   r   
e   m   a   i   n   i
2020-04-06T13:20:08.7299371Z 040   n   g  \r   d   o   w   n   l   o   a   
d   :   s   3   :
2020-04-06T13:20:08.7301377Z 050   /   /   f   l   i   n   k   -   i   n   
t   e   g   r   a   t
2020-04-06T13:20:08.7302336Z 060   i   o   n   -   t   e   s   t   s   /   
t   e   m   p   /   t
2020-04-06T13:20:08.7303021Z 070   e   s   t   _   s   t   r   e   a   m   
i   n   g   _   f   i
2020-04-06T13:20:08.7303968Z 080   l   e   _   s   i   n   k   -   7   b   
0   7   7   2   1   2
2020-04-06T13:20:08.7304790Z 090   -   d   9   f   8   -   4   0   d   8   
-   9   d   0   a   -
2020-04-06T13:20:08.7305297Z 0a0   f   f   9   f   7   e   9   6   d   7   
b   d   /   0   /   p
2020-04-06T13:20:08.7306285Z 0b0   a   r   t   -   2   -   1   t   o
   h   o   s   t   d
2020-04-06T13:20:08.7307138Z 0c0   i   r   /   t   e   m   p   -   t   e   
s   t   -   d   i   r
2020-04-06T13:20:08.7307891Z 0d0   e   c   t   o   r   y   -   3   5   0   
6   5   0   6   7   2
2020-04-06T13:20:08.7308402Z 0e0   8   9   /   t   e   m   p   /   t   e   
s   t   _   s   t   r
2020-04-06T13:20:08.7308870Z 0f0   e   a   m   i   n   g   _   f   i   l   
e   _   s   i   n   k
2020-04-06T13:20:08.7309579Z 100   -   7   b   0   7   7   2   1   2   -   
d   9   f   8   -   4
2020-04-06T13:20:08.7310295Z 110   0   d   8   -   9   d   0   a   -   f   
f   9   f   7   e   9
2020-04-06T13:20:08.7311022Z 120   6   d   7   b   d   /   0   /   p   a   
r   t   -   2   -   1
2020-04-06T13:20:08.7311537Z 130  \n   C   o   m   p   l   e   t   e   d
   2   .   0   K
2020-04-06T13:20:08.7312010Z 140   i   B   /   3   4   0   .   7   K   
i   B   (   5   .
2020-04-06T13:20:08.7312461Z 150   3   K   i   B   /   s   )   w   
i   t   h   1   0
2020-04-06T13:20:08.7312930Z 160   9   f   i   l   e   (   s   )   
r   e   m   a   i   n
2020-04-06T13:20:08.7313393Z 170   i   n   g  \r   C   o   m   p   l   e   
t   e   d   4   .
2020-04-06T13:20:08.7313844Z 180   4   K   i   B   /   3   4   0   .   
7   K   i   B
2020-04-06T13:20:08.7314332Z 190   (   9   .   8   K   i   B   /   s   
)   w   i   t   h
2020-04-06T13:20:08.7314785Z 1a0   1   0   9   f   i   l   e   (   
s   )   r   e   m
2020-04-06T13:20:08.7315236Z 1b0   a   i   n   i   n   g  \r   d   o   w   
n   l   o   a   d   :
2020-04-06T13:20:08.7315957Z 1c0   s   3   :   /   /   f   l   i   n   
k   -   i   n   t   e
2020-04-06T13:20:08.7316672Z 1d0   g   r   a   t   i   o   n   -   t   e   
s   t   s   /   t   e
2020-04-06T13:20:08.7317163Z 1e0   m   p   /   t   e   s   t   _   s   t   
r   e   a   m   i   n
2020-04-06T13:20:08.7317869Z 1f0   g   _   f   i   l   e   _   s   i   n   
k   -   7   b   0   7
2020-04-06T13:20:08.7318579Z 200   7   2 

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-06 Thread Timo Walther
I agree with Aljoscha. The length of this thread shows that this is 
highly controversial. I think nobody really likes this feature 100%, but 
we could not find a better solution. I would consider it a 
nice-to-have improvement during a notebook/debugging session.


I would accept avoiding whitelisting/blacklisting if the feature is 
disabled by default. And we make the merged properties available in a 
separate TableSourceFactory#Context#getExecutionOptions as Danny proposed.
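
As a sketch of what that could look like for a connector developer 
(getExecutionOptions is only the name proposed in this thread, and 
MyTableSource is a placeholder class):

public TableSource<Row> createTableSource(TableSourceFactory.Context context) {
    // catalog properties already merged with the dynamic hint options;
    // the CatalogTable metadata itself stays untouched
    Map<String, String> options = context.getExecutionOptions();
    return new MyTableSource(options);
}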


What do you think?

Thanks,
Timo


On 06.04.20 09:59, Aljoscha Krettek wrote:
The reason I'm saying it should be disabled by default is that this uses 
hint syntax, and hints should really not change query semantics.


I'm quite strongly against hints that change query semantics, but if we 
disable this by default I would be (reluctantly) OK with the feature. 
Companies that create deployments or set up the SQL environment for 
users can enable the feature if they want.


But yes, I also agree that we don't need whitelisting/blacklisting, 
which makes this a lot easier to do.


Best,
Aljoscha

On 06.04.20 04:27, Danny Chan wrote:

Hi, everyone ~

@Aljoscha @Timo


I think we're designing ourselves into ever more complicated corners

here

I kindly agree; personally I didn't see strong reasons why we 
should put limits on each connector's properties:

• We can define any table options for CREATE TABLE, so why treat the 
dynamic options differently? We never consider any security problems 
when creating a table, and we should not for dynamic table options either.
• If we do not have whitelisted or blacklisted properties, the 
table source creation work becomes much easier: just use the merged 
options. There is no need to modify each connector to decide which 
options may be overridden and how to merge them (the merge work is 
redundant).
• @Timo, how about we support another interface 
`TableSourceFactory#Context.getExecutionOptions`? We always use this 
interface to get the options to create our table source. There is no 
need to copy the catalog table itself; we just need to generate our 
Context correctly.
• @Aljoscha I agree with having a global config option, but I disagree 
with disabling it by default; a disabled-by-default config would break 
the user experience too much, especially when users want to modify the 
options in an ad-hoc way.




I suggest removing `TableSourceFactory#supportedHintOptions` or 
`TableSourceFactory#forbiddenHintOptions` based on the fact that we 
do not have a black/white list for CREATE TABLE at all, at least in 
the current codebase.



@Timo (i have replied offline but allows to represent it here again)

`TableSourceFactory#supportedHintOptions` doesn't work well compared to 
`TableSourceFactory#forbiddenHintOptions` for 3 reasons:
1. For keys with a wildcard, like connector.property.*, a blacklist 
gives us the ability to disable some of the keys under it, e.g. 
connector.property.key1; a whitelist can only match by prefix.

2. We want the connectors to have the ability to disable the format type 
switch format.type but allow all the other properties, e.g. format.* 
without format.type (let's call it SET_B). If we use the whitelist, we 
have to enumerate all the specific format keys starting with format 
(SET_B), but with the old connector factories, we have no idea which 
specific format keys a format supports (there is either a format.* or 
nothing).

3. Apart from cases 1 and 2, for normal keys (no wildcard), the 
blacklist and whitelist have the same expressiveness; using a blacklist 
avoids verbosely enumerating all the duplicate keys 
with #supportedKeys. (Not a very strong reason, but I think as a 
connector developer it makes sense.)


Best,
Danny Chan
在 2020年4月3日 +0800 PM11:28,Timo Walther ,写道:

Hi everyone,

@Aljoscha: I disagree with your approach because a `Catalog` can return
a custom factory that is not using any properties. The hinting must be
transparent to a factory. We should NOT modify the metadata
`CatalogTable` at any point in time after the catalog.

@Danny, @Jingsong: How about we stick to the original design that we
wanted to vote on but use:

Set<String> supportedHintProperties()

This fits better to the old factory design. And for the new FLIP-95
factories we will use `ConfigOption` and provide good utilities for
merging with hints etc.

We can allow `"format.*"` in `supportedHintProperties()` to allow
hinting in formats.

What do you think?

Regards,
Timo


On 02.04.20 16:24, Aljoscha Krettek wrote:

I think we're designing ourselves into ever more complicated corners
here. Maybe we need to take a step back and reconsider. What would you
think about this (somewhat) simpler proposal:

- introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
CONNECTOR_PROPERTIES, depending on what naming we want to have for this
in the future. This will simply overwrite all connector properties, the
table factories don't know about hints but simply work with the
properties that they are given

- this spe

Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Seth Wiesman
I would be in favor of buffering data outside of the checkpoint lock. In my
experience, serialization is always the biggest performance killer in user
code and I have a hard time believing in practice that anyone is going to
buffer so many records that it causes real memory concerns.

To add to Timo's point,

Statefun actually did that on its Kinesis ser/de interfaces[1,2].

Seth

[1]
https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/ingress/KinesisIngressDeserializer.java
[2]
https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/egress/KinesisEgressSerializer.java
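
For what it's worth, a user schema against the proposed Flink interface could 
look like the sketch below (method names follow the FLIP draft, not a released 
API; Event stands in for any user POJO):

import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.metrics.Counter;
import org.apache.flink.util.Collector;

import com.fasterxml.jackson.databind.ObjectMapper;

public class EventSchema implements DeserializationSchema<Event> {

    private transient ObjectMapper mapper; // created in open(), no lazy initialization
    private transient Counter skipped;     // metrics are reachable from the schema

    @Override
    public void open(InitializationContext context) throws Exception {
        mapper = new ObjectMapper();
        skipped = context.getMetricGroup().counter("skippedRecords");
    }

    @Override
    public Event deserialize(byte[] message) throws IOException {
        return mapper.readValue(message, Event.class);
    }

    // 0..m records per input message: a missing value emits nothing instead of null
    @Override
    public void deserialize(byte[] message, Collector<Event> out) throws IOException {
        final Event event = deserialize(message);
        if (event == null) {
            skipped.inc();
            return;
        }
        out.collect(event);
    }

    @Override
    public boolean isEndOfStream(Event nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Event> getProducedType() {
        return TypeInformation.of(Event.class);
    }
}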


On Mon, Apr 6, 2020 at 4:49 AM Timo Walther  wrote:

> Hi Dawid,
>
> thanks for this FLIP. This solves a lot of issues with the current
> design for both the Flink contributors and users. +1 for this.
>
> Some minor suggestions from my side:
> - How about finding something shorter for `InitializationContext`? Maybe
> just `OpenContext`?
> - While introducing default methods for existing interfaces, shall we
> also create contexts for those methods? I see the following method in
> your FLIP and wonder if we can reduce the number of parameters while
> introducing a new method:
>
> deserialize(
>  byte[] recordValue,
>  String partitionKey,
>  String seqNum,
>  long approxArrivalTimestamp,
>  String stream,
>  String shardId,
>  Collector<T> out)
>
> to:
>
> deserialize(
>  byte[] recordValue,
>  Context c,
>  Collector<T> out)
>
> What do you think?
>
> Regards,
> Timo
>
>
>
> On 06.04.20 11:08, Dawid Wysakowicz wrote:
> > Hi devs,
> >
> > When working on improving the Table API/SQL connectors we faced a few
> > shortcomings of the DeserializationSchema and SerializationSchema
> > interfaces. Similar features were also mentioned by other users in the
> > past. The shortcomings I would like to address with the FLIP include:
> >
> >   * Emitting 0 to m records from the deserialization schema with per
> > partition watermarks
> >   o https://github.com/apache/flink/pull/3314#issuecomment-376237266
> >   o differentiate null value from no value
> >   o support for Debezium CDC format
> > (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL
> )
> >
> >   * A way to initialize the schema
> >   o establish external connections
> >   o generate code on startup
> >   o no need for lazy initialization
> >
> >   * Access to metrics
> > [
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329
> ]
> >
> > One important aspect I would like to hear your opinion on is how to
> > support the Collector interface in Kafka source. Of course if we agree
> > to add the Collector to the DeserializationSchema.
> >
> > The FLIP can be found here:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988&src=contextnavpagetreemode
> >
> > Looking forward to your feedback.
> >
> > Best,
> >
> > Dawid
> >
>
>


[jira] [Created] (FLINK-17011) Introduce builder to create AbstractStreamOperatorTestHarness for testing

2020-04-06 Thread Yun Tang (Jira)
Yun Tang created FLINK-17011:


 Summary: Introduce builder to create 
AbstractStreamOperatorTestHarness for testing
 Key: FLINK-17011
 URL: https://issues.apache.org/jira/browse/FLINK-17011
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.10.0
Reporter: Yun Tang
 Fix For: 1.11.0


Currently, {{AbstractStreamOperatorTestHarness}} lacks a builder, which forces us 
to create ever more constructors. Moreover, to set customized components, we might 
have to call {{AbstractStreamOperatorTestHarness#setup}}, which should be 
treated as a deprecated interface, before use.
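
For illustration, usage of such a builder could look like the sketch below 
(every name is hypothetical and only shows the shape such an API could take):

AbstractStreamOperatorTestHarness<Long> harness =
        AbstractStreamOperatorTestHarness.<Long>builder()
                .setOperator(new MyCountingOperator())
                .setMaxParallelism(128)
                .setParallelism(4)
                .setSubtaskIndex(0)
                // inject customized components without calling the deprecated setup()
                .setStateBackend(new MemoryStateBackend())
                .build();
harness.open();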



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Consolidated log4j2-properties file

2020-04-06 Thread Chesnay Schepler
We can also move the file to the root of the project, which should make 
it easier to discover.


flink-test-utils-junit would then just be a distribution vehicle that 
few would have to know about.


On 06/04/2020 13:31, Till Rohrmann wrote:

Hi Chesnay,

thanks for kicking this discussion off. I agree that deduplicating code is
in general a good idea.

The main benefit seems to be that all modules inherit a
log4j2-test.properties file and that this file allows to control the
logging output for several modules.

The main drawback I see is that it complicates the debugging process for
our devs. If you want to debug a problem and need logging for this, then
you have to know that there is a log4j2-test.properties file in
flink-test-utils-junit which you can tweak. At the moment, it is quite
straight forward as you simply go to the module which contains the test and
check the resources folder.

If we are ok with this drawback and document the change properly, then I'm
fine with this change.

Cheers,
Till

On Mon, Apr 6, 2020 at 12:24 PM Chesnay Schepler  wrote:


Hello,

I discovered a handy trick that would allow us to share a single
log4j2-test.properties across all modules.

https://github.com/apache/flink/pull/11634

The file would exist in flink-test-utils-junit/src/main/resources, and
be used for all modules except the kafka connectors and yarn-tests
(because they have some custom requirements).

This would mean the files can no longer go out of sync, utilities can be
shared more easily, and you wouldn't need to add a new properties file
to new modules (or older ones lacking one) during debugging.

Overall I personally quite like this, but I have heard some concerns about
changing dev routines, so I wanted to double-check what people think in
general.






Re: [DISCUSS] Consolidated log4j2-properties file

2020-04-06 Thread Chesnay Schepler
Actually, I would first have to double-check whether this would work 
within IntelliJ...


On 06/04/2020 20:40, Chesnay Schepler wrote:
We can also move the file to the root of the project, which should 
make it easier to discover.


flink-test-utils-junit would then just be a distribution vehicle that 
few would have to know about.


On 06/04/2020 13:31, Till Rohrmann wrote:

Hi Chesnay,

thanks for kicking this discussion off. I agree that deduplicating 
code is

in general a good idea.

The main benefit seems to be that all modules inherit a
log4j2-test.properties file and that this file allows to control the
logging output for several modules.

The main drawback I see is that it complicates the debugging process for
our devs. If you want to debug a problem and need logging for this, then
you have to know that there is a log4j2-test.properties file in
flink-test-utils-junit which you can tweak. At the moment, it is quite
straight forward as you simply go to the module which contains the 
test and

check the resources folder.

If we are ok with this drawback and document the change properly, 
then I'm

fine with this change.

Cheers,
Till

On Mon, Apr 6, 2020 at 12:24 PM Chesnay Schepler  
wrote:



Hello,

I discovered a handy trick that would allow us to share a single
log4j2-test.properties across all modules.

https://github.com/apache/flink/pull/11634

The file would exist in flink-test-utils-junit/src/main/resources, and
be used for all modules except the kafka connectors and yarn-tests
(because they have some custom requirements).

This would mean the files can no longer go out of sync, utilities 
can be

shared more easily, and you wouldn't need to add a new properties file
to new modules (or older ones lacking one) during debugging.

Overall I personally quite like this, but I have heard some concerns about
changing dev routines so I wanted to double-check what people think in
general.









Re: [DISCUSS] Consolidated log4j2-properties file

2020-04-06 Thread Chesnay Schepler
Nope, it wouldn't work to have it in the root of the project; in a 
practical sense IntelliJ can only really handle them in the standard 
location.


Ironically, the PR currently only works in IntelliJ, because when you 
build the jar with maven we exclude all log4j files via the 
shade-plugin. We could of course fix that, but we'd need another module 
in which to override the shade-plugin, to avoid side-effects on 
flink-test-utils-junit.
I suppose one positive thing would be that we could name the module 
"flink-log4j2-test-configuration" which would at least be a more obvious 
location...


On 06/04/2020 20:53, Chesnay Schepler wrote:
Actually, I would first have to double-check whether this would work 
within IntelliJ...


On 06/04/2020 20:40, Chesnay Schepler wrote:
We can also move the file to the root of the project, which should 
make it easier to discover.


flink-test-utils-junit would then just be a distribution vehicle that 
few would have to know about.


On 06/04/2020 13:31, Till Rohrmann wrote:

Hi Chesnay,

thanks for kicking this discussion off. I agree that deduplicating 
code is

in general a good idea.

The main benefit seems to be that all modules inherit a
log4j2-test.properties file and that this file allows to control the
logging output for several modules.

The main drawback I see is that it complicates the debugging process 
for
our devs. If you want to debug a problem and need logging for this, 
then

you have to know that there is a log4j2-test.properties file in
flink-test-utils-junit which you can tweak. At the moment, it is quite
straight forward as you simply go to the module which contains the 
test and

check the resources folder.

If we are ok with this drawback and document the change properly, 
then I'm

fine with this change.

Cheers,
Till

On Mon, Apr 6, 2020 at 12:24 PM Chesnay Schepler 
 wrote:



Hello,

I discovered a handy trick that would allow us to share a single
log4j2-test.properties across all modules.

https://github.com/apache/flink/pull/11634

The file would exist in flink-test-utils-junit/src/main/resources, and
be used for all modules except the kafka connectors and yarn-tests
(because they have some custom requirements).

This would mean the files can no longer go out of sync, utilities 
can be

shared more easily, and you wouldn't need to add a new properties file
to new modules (or older ones lacking one) during debugging.

Overall I personally quite like this, but I have heard some concerns about
changing dev routines so I wanted to double-check what people think in
general.












[jira] [Created] (FLINK-17012) Expose cost of task initialization

2020-04-06 Thread Wenlong Lyu (Jira)
Wenlong Lyu created FLINK-17012:
---

 Summary: Expose cost of task initialization
 Key: FLINK-17012
 URL: https://issues.apache.org/jira/browse/FLINK-17012
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Wenlong Lyu


Currently a task switches to RUNNING before it is fully initialized; this does 
not take state initialization and operator initialization (#open) into account, 
which may take a long time to finish. As a result, there can be a weird 
phenomenon where all tasks are running but the throughput is 0.

I think it would be good if we could expose the initialization stage of tasks. 
What do you think?
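One possible shape of such a metric (illustrative sketch; the metric name, the 
{{metricGroup}} variable, and where it is registered are assumptions, not an 
agreed design):

{code:java}
// Illustrative sketch: measure and expose the task initialization time.
long initializationStart = System.currentTimeMillis();
// ... restore state and call StreamOperator#open() on the operator chain ...
final long initializationDuration = System.currentTimeMillis() - initializationStart;

// Register a gauge on the task's metric group; the metric name is an assumption.
metricGroup.gauge("taskInitializationDuration", (Gauge<Long>) () -> initializationDuration);
{code}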



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-06 Thread Kurt Young
Sounds like a reasonable compromise, disabling this feature by default is a
way to protect against
the vulnerability, and we can simplify the design quite a lot. We can
gather some users'
feedback to see whether further protections are necessary in the future.
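For readers following the thread: the feature in question is the dynamic table
options hint. With the compromise above, something like the following would
only work after explicitly flipping a config switch (the option key, property
key, and SET syntax below are assumptions, sketched for illustration):

-- hypothetical global switch, disabled by default
SET 'table.dynamic-table-options.enabled' = 'true';

-- per-query override of a table option via the proposed hint syntax
SELECT id, name
FROM kafka_table /*+ OPTIONS('connector.startup-mode'='earliest-offset') */;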

Best,
Kurt


On Mon, Apr 6, 2020 at 11:49 PM Timo Walther  wrote:

> I agree with Aljoscha. The length of this thread shows that this is
> highly controversial. I think nobody really likes this feature 100% but
> we could not find a better solution. I would consider it as a
> nice-to-have improvement during a notebook/debugging session.
>
> I would accept avoiding whitelisting/blacklisting if the feature is
> disabled by default. And we make the merged properties available in a
> separate TableSourceFactory#Context#getExecutionOptions as Danny proposed.
>
> What do you think?
>
> Thanks,
> Timo
>
>
> On 06.04.20 09:59, Aljoscha Krettek wrote:
> > The reason I'm saying it should be disabled by default is that this uses
> > hint syntax, and hints should really not change query semantics.
> >
> > I'm quite strongly against hints that change query semantics, but if we
> > disable this by default I would be (reluctantly) OK with the feature.
> > Companies that create deployments or set up the SQL environment for
> > users can enable the feature if they want.
> >
> > But yes, I also agree that we don't need whitelisting/blacklisting,
> > which makes this a lot easier to do.
> >
> > Best,
> > Aljoscha
> >
> > On 06.04.20 04:27, Danny Chan wrote:
> >> Hi, everyone ~
> >>
> >> @Aljoscha @Timo
> >>
> >>> I think we're designing ourselves into ever more complicated corners
> >> here
> >>
> >> I kindly agree with that; personally I didn't see strong reasons why we
> >> should limit on each connector properties:
> >>
> >> • we can define any table options for CREATE TABLE, why we treat the
> >> dynamic options differently; we never consider any security problems
> >> when creating tables, so we should not for dynamic table options either
> >> • If we do not have whitelist properties or blacklist properties, the
> >> table source creation work would be much easier, just use the merged
> >> options. There is no need to modify each connector to decide which
> >> options could be overridden and how we merge them (the merge work is
> >> redundant).
> >> • @Timo, how about we support another interface
> >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> >> interface to get the options to create our table source. There is no
> >> need to copy the catalog table itself, we just need to generate our
> >> Context correctly.
> >> • @Aljoscha I agree to have a global config option, but I disagree with
> >> disabling it by default; a global default config would break the user
> >> experience too much, especially when users want to modify the options
> >> in an ad-hoc way.
> >>
> >>
> >>
> >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> >> do not have a black/white list for CREATE TABLE at all, at least in the
> >> current codebase.
> >>
> >>
> >> @Timo (I replied offline, but allow me to repeat it here)
> >>
> >> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3
> >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> >> 1. For a key with a wildcard, like connector.property.*, a blacklist
> >> gives us the ability to disable some of the keys under it, i.e.
> >> connector.property.key1; a whitelist can only match by prefix
> >>
> >> 2. We want the connectors to have the ability to disable switching the
> >> format type (format.type) but allow all the other properties, e.g. format.*
> >> without format.type (let's call it SET_B); if we use the whitelist, we
> >> have to enumerate all the specific format keys starting with format
> >> (SET_B), but with the old connector factories, we have no idea what
> >> specific format keys are supported (there is either a format.* or nothing).
> >>
> >> 3. Except for the cases in 1 and 2, for normal keys (no wildcard), the
> >> blacklist and whitelist have the same expressiveness; using a blacklist
> >> keeps the code from having to enumerate all the duplicate keys
> >> with #supportedKeys. (Not a very strong reason, but I think as a
> >> connector developer, it makes sense)
> >>
> >> Best,
> >> Danny Chan
> >> 在 2020年4月3日 +0800 PM11:28,Timo Walther ,写道:
> >>> Hi everyone,
> >>>
> >>> @Aljoscha: I disagree with your approach because a `Catalog` can return
> >>> a custom factory that is not using any properties. The hinting must be
> >>> transparent to a factory. We should NOT modify the metadata
> >>> `CatalogTable` at any point in time after the catalog.
> >>>
> >>> @Danny, @Jingsong: How about we stick to the original design that we
> >>> wanted to vote on but use:
> >>>
> >>> Set supportedHintProperties()
> >>>
> >>> This fits better to the old factory design. And for the new FLIP-95
> >>> factories we will use `ConfigOption`

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Dian Fu
+1 (non-binding)

- built from source with tests (mvn clean install)
- verified the checksum and signature
- checked the bundled licenses and notices
- verified that the source distribution doesn't contain unnecessary binaries
- checked that the versions all point to the same release version
- flink-web PR looks good
- built and checked the docs, looks good

Regards,
Dian

> 在 2020年4月6日,下午10:06,Hequn Cheng  写道:
> 
> Thanks a lot for the new RC!
> 
> +1 (non-binding)
> 
> - Signatures and hash are correct.
> - The source distribution contains no binaries.
> - The source distribution is building properly with `-Prun-e2e-tests`
> (JDK8).
> - All POM files / README / Python SDK setup.py point to the same version.
> - Verify license and notice.
>  - Source distribution. Everything looks good and the jquery has been
> added.
>  - Jar artifacts. No missing dependencies, no version errors.
>  - Python source distribution (source and wheel). It contains the license
> and notice file.
> - Flink Harness works in IDE.
> 
> Best,
> Hequn
> 
> On Mon, Apr 6, 2020 at 10:05 PM Seth Wiesman  wrote:
> 
>> +1 (non-binding)
>> 
>> legal / source
>> - checked sources for binary files
>> - checked license headers
>> 
>> functional
>> - built from source (mvn clean verify -Prun-e2e-tests)
>> - built python sdk and ran tests
>> - ran examples
>> - deployed mixed python / java application on k8s with checkpointing.
>> Failed TM's and watched it recover.
>> - deployed application on Flink session cluster
>> - created a savepoint using the bootstrap api and successfully used it to
>> start an application.
>> 
>> Seth
>> 
>> On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman  wrote:
>> 
>>> +1 (non binding)
>>> 
>>> legal / source:
>>> - downloaded and verified the signature
>>> - verified that pom and versions in the docs match
>>> - no binary files in the distribution
>>> - built and run e2e test with Java 8 and Java 11
>>> - created a project from a maven archetype.
>>> 
>>> functional:
>>> - run all the examples
>>> - deployed the Python greeter example to k8s
>>> - enabled checkpointing, created an application with two Python
>> functions,
>>> that send both local and remote messages, restarted TMs randomly and
>>> verified
>>> the sequential output in the output kafka topic (exactly once test)
>>> -  run the harness tests
>>> -  run the ridesharing example with parallelism 10 overnight
>>> -  created a savepoint with the state bootstrapping tool and
>>> successfully started a job from that.
>>> 
>>> Kind regards,
>>> Igal
>>> 
>>> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
>>> wrote:
>>> 
 Thanks a lot for preparing another RC!
 
 +1 (binding)
 
 - source archive looks fine (no binaries, copied sources are properly
 reported)
 - staging repository looks fine (bundled binaries seem documented,
>>> versions
 are correct)
 - *mvn clean install *(mvn clean verify fails, "install" is required)
>> w/
 e2e passes locally from source dir
 
 
 
 
 On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
 wrote:
 
> FYI -
> There are these open PRs to add blog posts and update the Flink
>> website
 for
> the Stateful Functions 2.0 release:
> * https://github.com/apache/flink-web/pull/322
> * https://github.com/apache/flink-web/pull/321
> 
> On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
 konstan...@ververica.com>
> wrote:
> 
>> +1 (non-binding)
>> 
>> ** Functional **
>> - Building from source dist with end-to-end tests enabled (mvn
>> clean
> verify
>> -Prun-e2e-tests) passes (JDK 8)
>> - Flink Harness works in IDE
>> - Building Python SDK dist from source
>> 
>> On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
 tzuli...@apache.org>
>> wrote:
>> 
>>> +1 (binding)
>>> 
>>> ** Legal **
>>> - checksums and GPG files match corresponding release files
>>> - Source distribution does not contain binaries, contents are
>> sane
 (no
>>> .git* / .travis* / generated html content files)
>>> - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
>>> font-awesome, jquery dependency in docs and copied sources from
> fastutil
>> (
>>> http://fastutil.di.unimi.it/)
>>> - Bundled LICENSEs and NOTICE files for Maven artifacts looks
>> good.
>>> Artifacts that do bundle dependencies are:
 statefun-flink-distribution,
>>> statefun-ridesharing-example-simulator, statefun-flink-core
>> (copied
>>> sources). All non-ASLv2 deps have license files explicitly
>> bundled.
>>> - Python SDK distributions (source and wheel) contain ASLv2
>> LICENSE
 and
>>> NOTICE files (no bundled dependencies)
>>> - All POMs / README / Python SDK setup.py / Dockerfiles / doc
>>> configs
>> point
>>> to same version “2.0.0”
>>> - README looks good
>>> 
>>> ** Functional **
>>> - Building from

Re: [DISCUSS] FLIP-120: Support conversion between PyFlink Table and Pandas DataFrame

2020-04-06 Thread Dian Fu
Thank you all for the discussion. It seems that we have reached consensus on 
the design. I will start a VOTE thread if there is no further feedback.
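As a quick illustration, usage would look roughly like this (method names
follow the FLIP draft and exact signatures may still change; `t_env` is assumed
to be an existing TableEnvironment):

import pandas as pd

pdf = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# Pandas DataFrame -> PyFlink Table
table = t_env.from_pandas(pdf)

# PyFlink Table -> Pandas DataFrame (collects the result to the client)
result_pdf = table.to_pandas()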

Regards,
Dian

> 在 2020年4月3日,下午2:58,Wei Zhong  写道:
> 
> Hi Dian,
> 
> Thanks for driving this. Big +1 for supporting from/to pandas in PyFlink!
> 
> Best,
> Wei
> 
>> 在 2020年4月3日,13:46,jincheng sun  写道:
>> 
>> +1, Thanks for bring up this discussion @Dian Fu 
>> 
>> Best,
>> Jincheng
>> 
>> 
>> Jeff Zhang  于2020年4月1日周三 下午1:27写道:
>> 
>>> Thanks for the reply, Dian, that make sense to me.
>>> 
>>> Dian Fu  于2020年4月1日周三 上午11:53写道:
>>> 
 Hi Jeff,
 
 Thanks for your feedback.
 
 ArrowTableSink is a Flink sink which is responsible for collecting the
 data of the table. It will serialize the data of the table to Arrow
>>> format
 to make sure that it could be deserialized to pandas dataframe
>>> efficiently.
 You are right that pandas dataframe is constructed at the client side and
 so there needs to be a way to transfer the table data from the ArrowTableSink
>>> to
 the client. It shares the same design as Table.collect on how to transfer
 the data to the client. This is still under lively discussion in
 FLINK-14807. I think we can discuss it there on this aspect and so it's
>>> not
 touched in this design(already mentioned in the design doc). Then we can
 focus on table/dataframe conversion in this design. Does that make sense
>>> to
 you?
 
 Thanks,
 Dian
 
 [1] https://issues.apache.org/jira/browse/FLINK-14807 <
 https://issues.apache.org/jira/browse/FLINK-14807>
> 在 2020年4月1日,上午11:14,Jeff Zhang  写道:
> 
> Thanks Dian for driving this, definitely +1
> 
> Here's my 2 cents:
> 
> 1. I would pay more attention on to_pandas than from_pandas.  Because
> to_pandas will be used more frequently I believe
> 2. I think ArrowTableSink may not be enough for to_pandas, because
>>> pandas
> dataframe is on client side, it is not a table sink. We still need to
> convert ArrowTableSink to pandas dataframe if I understand correctly.
> 
> 
> 
> 
> Dian Fu  于2020年4月1日周三 上午10:49写道:
> 
>> Hi everyone,
>> 
>> I'd like to start a discussion about supporting conversion between
 PyFlink
>> Table and Pandas DataFrame.
>> 
>> Pandas dataframe is the de-facto standard to work with tabular data in
>> Python community. PyFlink table is Flink’s representation of the
>>> tabular
>> data in Python language. It would be nice to provide the functionality
 to
>> convert between the PyFlink table and Pandas dataframe in PyFlink
>>> Table
>> API. It provides users the ability to switch between PyFlink and
>>> Pandas
>> seamlessly when processing data in Python language without extra
>> intermediate connectors.
>> 
>> Jincheng Sun and I have discussed offline and have drafted the
>> FLIP-120[1]. Looking forward to your feedback!
>> 
>> Regards,
>> Dian
>> 
>> [1]
>> 
 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> 
> 
> 
> --
> Best Regards
> 
> Jeff Zhang
 
 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>>> 
> 



Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Yu Li
+1 (non-binding)

Checked sums and signatures: OK
Checked no binaries in source distribution: OK
Checked RAT and end-to-end tests (8u101, 11.0.4): OK
Checked version in pom/README/setup.py files: OK
Checked quick start: OK
Checked Greeter local docker-compose examples: OK
Checked Ridesharing local docker-compose examples: OK
Checked Flink website PR: OK

Best Regards,
Yu


On Tue, 7 Apr 2020 at 10:40, Dian Fu  wrote:

> +1 (non-binding)
>
> - built from source with tests (mvn clean install)
> - verified the checksum and signature
> - checked the bundled licenses and notices
> - verified that the source distribution doesn't contain unnecessary
> binaries
> - checked that the versions all point to the same release version
> - flink-web PR looks good
> - built and checked the docs, looks good
>
> Regards,
> Dian
>
> > 在 2020年4月6日,下午10:06,Hequn Cheng  写道:
> >
> > Thanks a lot for the new RC!
> >
> > +1 (non-binding)
> >
> > - Signatures and hash are correct.
> > - The source distribution contains no binaries.
> > - The source distribution is building properly with `-Prun-e2e-tests`
> > (JDK8).
> > - All POM files / README / Python SDK setup.py point to the same version.
> > - Verify license and notice.
> >  - Source distribution. Everything looks good and the jquery has been
> > added.
> >  - Jar artifacts. No missing dependencies, no version errors.
> >  - Python source distribution (source and wheel). It contains the license
> > and notice file.
> > - Flink Harness works in IDE.
> >
> > Best,
> > Hequn
> >
> > On Mon, Apr 6, 2020 at 10:05 PM Seth Wiesman 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> legal / source
> >> - checked sources for binary files
> >> - checked license headers
> >>
> >> functional
> >> - built from source (mvn clean verify -Prun-e2e-tests)
> >> - built python sdk and ran tests
> >> - ran examples
> >> - deployed mixed python / java application on k8s with checkpointing.
> >> Failed TM's and watched it recover.
> >> - deployed application on Flink session cluster
> >> - created a savepoint using the bootstrap api and successfully used it
> to
> >> start an application.
> >>
> >> Seth
> >>
> >> On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman  wrote:
> >>
> >>> +1 (non binding)
> >>>
> >>> legal / source:
> >>> - downloaded and verified the signature
> >>> - verified that pom and versions in the docs match
> >>> - no binary files in the distribution
> >>> - built and run e2e test with Java 8 and Java 11
> >>> - created a project from a maven archetype.
> >>>
> >>> functional:
> >>> - run all the examples
> >>> - deployed the Python greeter example to k8s
> >>> - enabled checkpointing, created an application with two Python
> >> functions,
> >>> that send both local and remote messages, restarted TMs randomly and
> >>> verified
> >>> the sequential output in the output kafka topic (exactly once test)
> >>> -  run the harness tests
> >>> -  run the ridesharing example with parallelism 10 overnight
> >>> -  created a savepoint with the state bootstrapping tool and
> >>> successfully started a job from that.
> >>>
> >>> Kind regards,
> >>> Igal
> >>>
> >>> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> >>> wrote:
> >>>
>  Thanks a lot for preparing another RC!
> 
>  +1 (binding)
> 
>  - source archive looks fine (no binaries, copied sources are properly
>  reported)
>  - staging repository looks fine (bundled binaries seem documented,
> >>> versions
>  are correct)
>  - *mvn clean install *(mvn clean verify fails, "install" is required)
> >> w/
>  e2e passes locally from source dir
> 
> 
> 
> 
>  On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
>  wrote:
> 
> > FYI -
> > There are these open PRs to add blog posts and update the Flink
> >> website
>  for
> > the Stateful Functions 2.0 release:
> > * https://github.com/apache/flink-web/pull/322
> > * https://github.com/apache/flink-web/pull/321
> >
> > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
>  konstan...@ververica.com>
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >> ** Functional **
> >> - Building from source dist with end-to-end tests enabled (mvn
> >> clean
> > verify
> >> -Prun-e2e-tests) passes (JDK 8)
> >> - Flink Harness works in IDE
> >> - Building Python SDK dist from source
> >>
> >> On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
>  tzuli...@apache.org>
> >> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> ** Legal **
> >>> - checksums and GPG files match corresponding release files
> >>> - Source distribution does not contain binaries, contents are
> >> sane
>  (no
> >>> .git* / .travis* / generated html content files)
> >>> - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> >>> font-awesome, jquery dependency in docs and copied sources from
> > fastutil
> >> (
> >>> http://fastutil

Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Congxian Qiu
+1 (non-binding)

- checked sums and signature: OK
- checked no binaries in source distribution: OK
- checked all POM files/README/Python SDK setup.py point to the same
version 2.0.0 OK
- execute `mvn clean install -Prun-e2e-tests`: OK
- checked quick start: ok
- run greeter example locally: ok
- run Ridesharing example locally: ok

Best,
Congxian


Dian Fu  于2020年4月7日周二 上午10:40写道:

> +1 (non-binding)
>
> - built from source with tests (mvn clean install)
> - verified the checksum and signature
> - checked the bundled licenses and notices
> - verified that the source distribution doesn't contain unnecessary
> binaries
> - checked that the versions all point to the same release version
> - flink-web PR looks good
> - built and checked the docs, looks good
>
> Regards,
> Dian
>
> > 在 2020年4月6日,下午10:06,Hequn Cheng  写道:
> >
> > Thanks a lot for the new RC!
> >
> > +1 (non-binding)
> >
> > - Signatures and hash are correct.
> > - The source distribution contains no binaries.
> > - The source distribution is building properly with `-Prun-e2e-tests`
> > (JDK8).
> > - All POM files / README / Python SDK setup.py point to the same version.
> > - Verify license and notice.
> >  - Source distribution. Everything looks good and the jquery has been
> > added.
> >  - Jar artifacts. No missing dependencies, no version errors.
> >  - Python source distribution (source and wheel). It contains the license
> > and notice file.
> > - Flink Harness works in IDE.
> >
> > Best,
> > Hequn
> >
> > On Mon, Apr 6, 2020 at 10:05 PM Seth Wiesman 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> legal / source
> >> - checked sources for binary files
> >> - checked license headers
> >>
> >> functional
> >> - built from source (mvn clean verify -Prun-e2e-tests)
> >> - built python sdk and ran tests
> >> - ran examples
> >> - deployed mixed python / java application on k8s with checkpointing.
> >> Failed TM's and watched it recover.
> >> - deployed application on Flink session cluster
> >> - created a savepoint using the bootstrap api and successfully used it
> to
> >> start an application.
> >>
> >> Seth
> >>
> >> On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman  wrote:
> >>
> >>> +1 (non binding)
> >>>
> >>> legal / source:
> >>> - downloaded and verified the signature
> >>> - verified that pom and versions in the docs match
> >>> - no binary files in the distribution
> >>> - built and run e2e test with Java 8 and Java 11
> >>> - created a project from a maven archetype.
> >>>
> >>> functional:
> >>> - run all the examples
> >>> - deployed the Python greeter example to k8s
> >>> - enabled checkpointing, created an application with two Python
> >> functions,
> >>> that send both local and remote messages, restarted TMs randomly and
> >>> verified
> >>> the sequential output in the output kafka topic (exactly once test)
> >>> -  run the harness tests
> >>> -  run the ridesharing example with parallelism 10 overnight
> >>> -  created a savepoint with the state bootstrapping tool and
> >>> successfully started a job from that.
> >>>
> >>> Kind regards,
> >>> Igal
> >>>
> >>> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> >>> wrote:
> >>>
>  Thanks a lot for preparing another RC!
> 
>  +1 (binding)
> 
>  - source archive looks fine (no binaries, copied sources are properly
>  reported)
>  - staging repository looks fine (bundled binaries seem documented,
> >>> versions
>  are correct)
>  - *mvn clean install *(mvn clean verify fails, "install" is required)
> >> w/
>  e2e passes locally from source dir
> 
> 
> 
> 
>  On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
>  wrote:
> 
> > FYI -
> > There are these open PRs to add blog posts and update the Flink
> >> website
>  for
> > the Stateful Functions 2.0 release:
> > * https://github.com/apache/flink-web/pull/322
> > * https://github.com/apache/flink-web/pull/321
> >
> > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
>  konstan...@ververica.com>
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >> ** Functional **
> >> - Building from source dist with end-to-end tests enabled (mvn
> >> clean
> > verify
> >> -Prun-e2e-tests) passes (JDK 8)
> >> - Flink Harness works in IDE
> >> - Building Python SDK dist from source
> >>
> >> On Mon, Apr 6, 2020 at 5:12 AM Tzu-Li (Gordon) Tai <
>  tzuli...@apache.org>
> >> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> ** Legal **
> >>> - checksums and GPG files match corresponding release files
> >>> - Source distribution does not contain binaries, contents are
> >> sane
>  (no
> >>> .git* / .travis* / generated html content files)
> >>> - Bundled source LICENSEs and NOTICE looks good. Mentions bundled
> >>> font-awesome, jquery dependency in docs and copied sources from
> > fastutil
> >> (
> >>> http://fastutil.di.unimi.it/)
> >>> - Bundled L

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-06 Thread Jark Wu
I'm fine with disabling this feature by default and avoiding
whitelisting/blacklisting. This simplifies a lot of things.

Regarding TableSourceFactory#Context#getExecutionOptions: do we really
need this interface?
Should the connector factory be aware of whether the properties are merged with
hints or not?
What's the problem if we always get properties from
`CatalogTable#getProperties`?
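For context, the shape under discussion is roughly the following (sketch only,
not a finalized interface; the method names follow the proposal upthread):

interface Context {
    // existing: the table as defined in the catalog, without hints
    CatalogTable getTable();
    // proposed: the catalog table's options merged with the hint options
    Map<String, String> getExecutionOptions();
}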

Best,
Jark

On Tue, 7 Apr 2020 at 10:39, Kurt Young  wrote:

> Sounds like a reasonable compromise, disabling this feature by default is a
> way to protect against
> the vulnerability, and we can simplify the design quite a lot. We can
> gather some users'
> feedback to see whether further protections are necessary in the future.
>
> Best,
> Kurt
>
>
> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther  wrote:
>
> > I agree with Aljoscha. The length of this thread shows that this is
> > highly controversial. I think nobody really likes this feature 100% but
> > we could not find a better solution. I would consider it as a
> > nice-to-have improvement during a notebook/debugging session.
> >
> > I would accept avoiding whitelisting/blacklisting if the feature is
> > disabled by default. And we make the merged properties available in a
> > separate TableSourceFactory#Context#getExecutionOptions as Danny
> proposed.
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> > On 06.04.20 09:59, Aljoscha Krettek wrote:
> > > The reason I'm saying it should be disabled by default is that this
> uses
> > > hint syntax, and hints should really not change query semantics.
> > >
> > > I'm quite strongly against hints that change query semantics, but if we
> > > disable this by default I would be (reluctantly) OK with the feature.
> > > Companies that create deployments or set up the SQL environment for
> > > users can enable the feature if they want.
> > >
> > > But yes, I also agree that we don't need whitelisting/blacklisting,
> > > which makes this a lot easier to do.
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On 06.04.20 04:27, Danny Chan wrote:
> > >> Hi, everyone ~
> > >>
> > >> @Aljoscha @Timo
> > >>
> > >>> I think we're designing ourselves into ever more complicated corners
> > >> here
> > >>
> > >> I kindly agree with that; personally I didn't see strong reasons why we
> > >> should limit on each connector properties:
> > >>
> > >> • we can define any table options for CREATE TABLE, why we treat the
> > >> dynamic options differently; we never consider any security problems
> > >> when creating tables, so we should not for dynamic table options either
> > >> • If we do not have whitelist properties or blacklist properties, the
> > >> table source creation work would be much easier, just use the merged
> > >> options. There is no need to modify each connector to decide which
> > >> options could be overridden and how we merge them (the merge work is
> > >> redundant).
> > >> • @Timo, how about we support another interface
> > >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> > >> interface to get the options to create our table source. There is no
> > >> need to copy the catalog table itself, we just need to generate our
> > >> Context correctly.
> > >> • @Aljoscha I agree to have a global config option, but I disagree with
> > >> disabling it by default; a global default config would break the user
> > >> experience too much, especially when users want to modify the options
> > >> in an ad-hoc way.
> > >>
> > >>
> > >>
> > >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> > >> do not have a black/white list for CREATE TABLE at all, at least in the
> > >> current codebase.
> > >>
> > >>
> > >> @Timo (I replied offline, but allow me to repeat it here)
> > >>
> > >> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3
> > >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > >> 1. For a key with a wildcard, like connector.property.*, a blacklist
> > >> gives us the ability to disable some of the keys under it, i.e.
> > >> connector.property.key1; a whitelist can only match by prefix
> > >>
> > >> 2. We want the connectors to have the ability to disable switching the
> > >> format type (format.type) but allow all the other properties, e.g. format.*
> > >> without format.type (let's call it SET_B); if we use the whitelist, we
> > >> have to enumerate all the specific format keys starting with format
> > >> (SET_B), but with the old connector factories, we have no idea what
> > >> specific format keys are supported (there is either a format.* or
> > >> nothing).
> > >>
> > >> 3. Except for the cases in 1 and 2, for normal keys (no wildcard), the
> > >> blacklist and whitelist have the same expressiveness; using a blacklist
> > >> keeps the code from having to enumerate all the duplicate keys
> > >> with #supportedKeys. (Not a very strong reason, but I think as a
> > >> connector developer, it makes sense)
> > >>
> > >> Best,
> > >> 

Re: [DISCUSS] FLIP-105: Support to Interpret and Emit Changelog in Flink SQL

2020-04-06 Thread Jark Wu
Hi everyone,

Since this FLIP was proposed, the community has discussed a lot about the
first approach: introducing new TableSource and TableSink interfaces to
support changelog.
And yes, that is FLIP-95 which has been accepted last week. So most of the
work has been merged into FLIP-95.

In order to support the goal of FLIP-105, there is still one small thing to
discuss: how to connect to external CDC formats.
We propose to introduce 2 new formats: Debezium format and Canal format.
They are the most popular CDC tools according to the survey in user [1] and
user-zh [2] mailing lists.
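As a rough illustration of the goal (the DDL property keys below are
illustrative assumptions and may differ in the final design):

CREATE TABLE users (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_changes',
  'properties.bootstrap.servers' = 'localhost:9092',
  -- interpret Debezium change events in the topic as a changelog
  'format' = 'debezium-json'
);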

I have updated the FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL

Feedback is welcome!

Best,
Jark

[1]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SURVEY-What-Change-Data-Capture-tools-are-you-using-td33569.html
[2]: http://apache-flink.147419.n8.nabble.com/SURVEY-CDC-td1910.html


On Fri, 14 Feb 2020 at 22:08, Jark Wu  wrote:

> Hi everyone,
>
> I would like to start discussion about how to support interpreting
> external changelog into Flink SQL, and how to emit changelog from Flink SQL.
>
> This topic has already been mentioned several times in the past. CDC
> (Change Data Capture) data has become a very important kind of streaming
> data. Connecting to CDC is a significant feature for Flink; it fills a
> missing piece in Flink's streaming processing.
>
> In FLIP-105, we propose 2 approaches to achieve this.
> One is introducing new TableSource interface (higher priority),
> the other is introducing new SQL syntax to interpret and emit changelog.
>
> FLIP-105:
> https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit#
>
> Thanks for any feedback!
>
> Best,
> Jark
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread Kurt Young
+1 (binding)

The latest doc looks good to me. One minor comment: with the latest
changes, it seems also very easy
to support running a SELECT query via the TableEnvironment#executeSql method.
Will this also be supported?

Best,
Kurt


On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:

> Thanks, for the update.
>
> +1 (binding) for this FLIP
>
> Regards,
> Timo
>
>
> On 06.04.20 16:47, godfrey he wrote:
> > Hi Timo,
> >
> > Sorry for late reply, and thanks for your correction. I have fixed the
> typo
> > and updated the document.
> >
> > Best,
> > Godfrey
> >
> > Timo Walther  于2020年4月6日周一 下午6:05写道:
> >
> >> Hi Godfrey,
> >>
> >> did you see my remaining feedback in the discussion thread? We could
> >> finish this FLIP if this gets resolved.
> >>
> >> Thanks,
> >> Timo
> >>
> >> On 03.04.20 15:12, Terry Wang wrote:
> >>> +1 (non-binding)
> >>> Looks great to me, Thanks for driving on this.
> >>>
> >>> Best,
> >>> Terry Wang
> >>>
> >>>
> >>>
>  2020年4月3日 21:07,godfrey he  写道:
> 
>  Hi everyone,
> 
>  I'd like to start the vote of FLIP-84[1] again, which is discussed and
>  reached consensus in the discussion thread[2].
> 
>  The vote will be open for at least 72 hours. Unless there is an
> >> objection,
>  I will try to close it by Apr 6, 2020 13:10 UTC if we have received
>  sufficient votes.
> 
> 
>  [1]
> 
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> 
>  [2]
> 
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> 
> 
>  Bests,
>  Godfrey
> 
>  godfrey he  于2020年3月31日周二 下午8:42写道:
> 
> > Hi, Timo
> >
> > So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >
> > Best,
> > Godfrey
> >
> > Timo Walther  于2020年3月31日周二 下午5:26写道:
> >
> >> -1
> >>
> >> The current discussion has not completed. The last comments were
> sent
> >> less than 24h ago.
> >>
> >> Let's wait a bit longer to collect feedback from all stakeholders.
> >>
> >> Thanks,
> >> Timo
> >>
> >> On 31.03.20 08:31, godfrey he wrote:
> >>> Hi everyone,
> >>>
> >>> I'd like to start the vote of FLIP-84[1] again, because we have
> some
> >>> feedbacks. The feedbacks are all about new introduced methods, here
> >> is
> >> the
> >>> discussion thread [2].
> >>>
> >>> The vote will be open for at least 72 hours. Unless there is an
> >> objection,
> >>> I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> >>> sufficient votes.
> >>>
> >>>
> >>> [1]
> >>>
> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>
> >>> [2]
> >>>
> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>
> >>>
> >>> Bests,
> >>> Godfrey
> >>>
> >>
> >>
> >>
> >>
> >
>
>


Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Jark Wu
Hi Dawid,

Thanks for driving this. This is a blocker for supporting the Debezium CDC format
(FLIP-105). So big +1 from my side.

Regarding emitting multiple records and checkpointing, I'm also in favor
of option #1: buffer all the records outside of the checkpoint lock.
I think most use cases will not buffer more data than
the deserialized byte[].

I have a minor suggestion on DeserializationSchema: could we have a default
implementation (maybe throwing an exception) for `T deserialize(byte[] message)`?
I think this will not break compatibility, and users won't have to
implement this deprecated method if they want to use the new
collector interface.
I think SinkFunction also did it this way: introduce a new invoke
method with a Context parameter, and give the old invoke method an
empty implementation.
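Concretely, I mean something along these lines (sketch only):

public interface DeserializationSchema<T> extends Serializable {

    // old method: a default that throws, so users implementing the
    // collector-based variant no longer need to override it
    default T deserialize(byte[] message) throws IOException {
        throw new UnsupportedOperationException(
            "Implement deserialize(byte[], Collector<T>) instead.");
    }

    // new method from the FLIP: emits 0..m records per input message;
    // this default preserves the old single-record behavior
    default void deserialize(byte[] message, Collector<T> out) throws IOException {
        T record = deserialize(message);
        if (record != null) {
            out.collect(record);
        }
    }
}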

Best,
Jark

On Mon, 6 Apr 2020 at 23:51, Seth Wiesman  wrote:

> I would be in favor of buffering data outside of the checkpoint lock. In my
> experience, serialization is always the biggest performance killer in user
> code and I have a hard time believing in practice that anyone is going to
> buffer so many records that it causes real memory concerns.
>
> To add to Timo's point,
>
> Statefun actually did that on its Kinesis ser/de interfaces[1,2].
>
> Seth
>
> [1]
>
> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/ingress/KinesisIngressDeserializer.java
> [2]
>
> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/egress/KinesisEgressSerializer.java
>
>
> On Mon, Apr 6, 2020 at 4:49 AM Timo Walther  wrote:
>
> > Hi Dawid,
> >
> > thanks for this FLIP. This solves a lot of issues with the current
> > design for both the Flink contributors and users. +1 for this.
> >
> > Some minor suggestions from my side:
> > - How about finding something shorter for `InitializationContext`? Maybe
> > just `OpenContext`?
> > - While introducing default methods for existing interfaces, shall we
> > also create contexts for those methods? I see the following method in
> > your FLIP and wonder if we can reduce the number of parameters while
> > introducing a new method:
> >
> > deserialize(
> >  byte[] recordValue,
> >  String partitionKey,
> >  String seqNum,
> >  long approxArrivalTimestamp,
> >  String stream,
> >  String shardId,
> >  Collector<T> out)
> >
> > to:
> >
> > deserialize(
> >  byte[] recordValue,
> >  Context c,
> >  Collector<T> out)
> >
> > What do you think?
> >
> > Regards,
> > Timo
> >
> >
> >
> > On 06.04.20 11:08, Dawid Wysakowicz wrote:
> > > Hi devs,
> > >
> > > When working on improving the Table API/SQL connectors we faced a few
> > > shortcomings of the DeserializationSchema and SerializationSchema
> > > interfaces. Similar features were also mentioned by other users in the
> > > past. The shortcomings I would like to address with the FLIP include:
> > >
> > >   * Emitting 0 to m records from the deserialization schema with per
> > > partition watermarks
> > >   o
> https://github.com/apache/flink/pull/3314#issuecomment-376237266
> > >   o differentiate null value from no value
> > >   o support for Debezium CDC format
> > > (
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL
> > )
> > >
> > >   * A way to initialize the schema
> > >   o establish external connections
> > >   o generate code on startup
> > >   o no need for lazy initialization
> > >
> > >   * Access to metrics
> > > [
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329
> > ]
> > >
> > > One important aspect I would like to hear your opinion on is how to
> > > support the Collector interface in Kafka source. Of course if we agree
> > > to add the Collector to the DeserializationSchema.
> > >
> > > The FLIP can be found here:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988&src=contextnavpagetreemode
> > >
> > > Looking forward to your feedback.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> >
> >
>


Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

2020-04-06 Thread Robert Metzger
Thank you for posting the FLIP.

The proposed integration with Azure Pipelines looks good to me.

On Tue, Mar 31, 2020 at 1:23 PM Xingbo Huang  wrote:

> Hi everyone,
>
> I would like to start a discussion thread on "Support Cython Optimizing
> Python User Defined Function"
>
> Scalar Python UDF FLIP-58[1] has already been supported in release 1.10 and
> Python UDTF will be supported in the coming release of 1.11. In release
> 1.10, we focused on supporting UDF features and did not make many
> optimizations in terms of performance. Although we have made a lot of
> optimizations in master[2], Cython can further greatly improve the
> performance of Python UDF.
>
> Robert Metzger, Jincheng Sun and I have discussed offline and have drafted
> the FLIP-121[3]. It includes the following items:
>
> - Introduces Cython implementation of coder and operations
>
> - Doc changes for building sdist and wheel packages from source code
>
> - Solutions for packages building
>
>
> Looking forward to your feedback!
>
> Best,
>
> Xingbo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>
> [2] https://issues.apache.org/jira/browse/FLINK-16747
>
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function
>


[jira] [Created] (FLINK-17013) Support Python UDTF in old planner under batch mode

2020-04-06 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-17013:


 Summary: Support Python UDTF in old planner under batch mode
 Key: FLINK-17013
 URL: https://issues.apache.org/jira/browse/FLINK-17013
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.11.0


Currently, Python UDTF is supported in the flink planner (stream mode only) and 
the blink planner. This jira is dedicated to adding Python UDTF support for the 
flink planner in batch mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[RESULT] [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #6

2020-04-06 Thread Tzu-Li (Gordon) Tai
The voting time has passed. Thank you for testing and voting everyone!

I'm happy to announce that we have unanimously approved this candidate as
the 2.0.0 release for Apache Flink Stateful Functions.

There are 10 approving votes, 3 of which are binding:
* Tzu-Li (Gordon) Tai (binding)
* Konstantin Knauf
* Robert Metzger (binding)
* Igal Shilman
* Stephan Ewen (binding)
* Seth Wiesman
* Hequn Cheng
* Dian Fu
* Yu Li
* Congxian Qiu

There are no disapproving votes.

The announcements for the release will happen in a separate thread once all
released artifacts are available.

Cheers,
Gordon

On Tue, Apr 7, 2020 at 11:03 AM Congxian Qiu  wrote:

> +1 (non-binding)
>
> - checked sums and signature: OK
> - checked no binaries in source distribution: OK
> - checked all POM files/README/Python SDK setup.py point to the same
> version 2.0.0 OK
> - execute `mvn clean install -Prun-e2e-tests`: OK
> - checked quick start: ok
> - run greeter example locally: ok
> - run Ridesharing example locally: ok
>
> Best,
> Congxian
>
>
> Dian Fu  于2020年4月7日周二 上午10:40写道:
>
> > +1 (non-binding)
> >
> > - built from source with tests (mvn clean install)
> > - verified the checksum and signature
> > - checked the bundled licenses and notices
> > - verified that the source distribution doesn't contain unnecessary
> > binaries
> > - checked that the versions all point to the same release version
> > - flink-web PR looks good
> > - built and checked the docs, looks good
> >
> > Regards,
> > Dian
> >
> > > 在 2020年4月6日,下午10:06,Hequn Cheng  写道:
> > >
> > > Thanks a lot for the new RC!
> > >
> > > +1 (non-binding)
> > >
> > > - Signatures and hash are correct.
> > > - The source distribution contains no binaries.
> > > - The source distribution is building properly with `-Prun-e2e-tests`
> > > (JDK8).
> > > - All POM files / README / Python SDK setup.py point to the same
> version.
> > > - Verify license and notice.
> > >  - Source distribution. Everything looks good and the jquery has been
> > > added.
> > >  - Jar artifacts. No missing dependencies, no version errors.
> > >  - Python source distribution (source and wheel). It contains the
> license
> > > and notice file.
> > > - Flink Harness works in IDE.
> > >
> > > Best,
> > > Hequn
> > >
> > > On Mon, Apr 6, 2020 at 10:05 PM Seth Wiesman 
> > wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> legal / source
> > >> - checked sources for binary files
> > >> - checked license headers
> > >>
> > >> functional
> > >> - built from source (mvn clean verify -Prun-e2e-tests)
> > >> - built python sdk and ran tests
> > >> - ran examples
> > >> - deployed mixed python / java application on k8s with checkpointing.
> > >> Failed TM's and watched it recover.
> > >> - deployed application on Flink session cluster
> > >> - created a savepoint using the bootstrap api and successfully used it
> > to
> > >> start an application.
> > >>
> > >> Seth
> > >>
> > >> On Mon, Apr 6, 2020 at 5:49 AM Igal Shilman 
> wrote:
> > >>
> > >>> +1 (non binding)
> > >>>
> > >>> legal / source:
> > >>> - downloaded and verified the signature
> > >>> - verified that pom and versions in the docs match
> > >>> - no binary files in the distribution
> > >>> - built and run e2e test with Java 8 and Java 11
> > >>> - created a project from a maven archetype.
> > >>>
> > >>> functional:
> > >>> - run all the examples
> > >>> - deployed the Python greeter example to k8s
> > >>> - enabled checkpointing, created an application with two Python
> > >> functions,
> > >>> that send both local and remote messages, restarted TMs randomly and
> > >>> verified
> > >>> the sequential output in the output kafka topic (exactly once test)
> > >>> -  run the harness tests
> > >>> -  run the ridesharing example with parallelism 10 overnight
> > >>> -  created a savepoint with the state bootstrapping tool and
> > >>> successfully started a job from that.
> > >>>
> > >>> Kind regards,
> > >>> Igal
> > >>>
> > >>> On Mon, Apr 6, 2020 at 10:23 AM Robert Metzger 
> > >>> wrote:
> > >>>
> >  Thanks a lot for preparing another RC!
> > 
> >  +1 (binding)
> > 
> >  - source archive looks fine (no binaries, copied sources are
> properly
> >  reported)
> >  - staging repository looks fine (bundled binaries seem documented,
> > >>> versions
> >  are correct)
> >  - *mvn clean install *(mvn clean verify fails, "install" is
> required)
> > >> w/
> >  e2e passes locally from source dir
> > 
> > 
> > 
> > 
> >  On Mon, Apr 6, 2020 at 9:22 AM Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> >  wrote:
> > 
> > > FYI -
> > > There are these open PRs to add blog posts and update the Flink
> > >> website
> >  for
> > > the Stateful Functions 2.0 release:
> > > * https://github.com/apache/flink-web/pull/322
> > > * https://github.com/apache/flink-web/pull/321
> > >
> > > On Mon, Apr 6, 2020 at 2:53 PM Konstantin Knauf <
> >  konstan...@ververica.com>
> > > wrote:
> > >

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi, Kurt

Yes. `TableEnvironment#executeSql` can also execute a `SELECT` statement,
which is similar to `Table#execute`.
I have added this to the document.
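For example (a sketch of the proposed API usage):

TableResult result = tEnv.executeSql("SELECT id, name FROM my_table");
// iterate over the SELECT result on the client side
try (CloseableIterator<Row> it = result.collect()) {
    while (it.hasNext()) {
        Row row = it.next();
        // ...
    }
}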

Best,
Godfrey

Kurt Young  于2020年4月7日周二 上午11:52写道:

> +1 (binding)
>
> The latest doc looks good to me. One minor comment: with the latest
> changes, it seems also very easy
> to support running a SELECT query via the TableEnvironment#executeSql method.
> Will this also be supported?
>
> Best,
> Kurt
>
>
> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:
>
> > Thanks, for the update.
> >
> > +1 (binding) for this FLIP
> >
> > Regards,
> > Timo
> >
> >
> > On 06.04.20 16:47, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Sorry for late reply, and thanks for your correction. I have fixed the
> > typo
> > > and updated the document.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Timo Walther  于2020年4月6日周一 下午6:05写道:
> > >
> > >> Hi Godfrey,
> > >>
> > >> did you see my remaining feedback in the discussion thread? We could
> > >> finish this FLIP if this gets resolved.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 03.04.20 15:12, Terry Wang wrote:
> > >>> +1 (non-binding)
> > >>> Looks great to me, Thanks for driving on this.
> > >>>
> > >>> Best,
> > >>> Terry Wang
> > >>>
> > >>>
> > >>>
> >  2020年4月3日 21:07,godfrey he  写道:
> > 
> >  Hi everyone,
> > 
> >  I'd like to start the vote of FLIP-84[1] again, which is discussed
> and
> >  reached consensus in the discussion thread[2].
> > 
> >  The vote will be open for at least 72 hours. Unless there is an
> > >> objection,
> >  I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >  sufficient votes.
> > 
> > 
> >  [1]
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > 
> >  [2]
> > 
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > 
> > 
> >  Bests,
> >  Godfrey
> > 
> >  godfrey he  于2020年3月31日周二 下午8:42写道:
> > 
> > > Hi, Timo
> > >
> > > So sorry about that, I'm in a little hurry. Let's wait for 24h.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Timo Walther  于2020年3月31日周二 下午5:26写道:
> > >
> > >> -1
> > >>
> > >> The current discussion has not completed. The last comments were
> > sent
> > >> less than 24h ago.
> > >>
> > >> Let's wait a bit longer to collect feedback from all stakeholders.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 31.03.20 08:31, godfrey he wrote:
> > >>> Hi everyone,
> > >>>
> > >>> I'd like to start the vote of FLIP-84[1] again, because we have
> > some
> > >>> feedbacks. The feedbacks are all about new introduced methods,
> here
> > >> is
> > >> the
> > >>> discussion thread [2].
> > >>>
> > >>> The vote will be open for at least 72 hours. Unless there is an
> > >> objection,
> > >>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
> received
> > >>> sufficient votes.
> > >>>
> > >>>
> > >>> [1]
> > >>>
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>
> > >>> [2]
> > >>>
> > >>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > >>>
> > >>>
> > >>> Bests,
> > >>> Godfrey
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
>


[jira] [Created] (FLINK-17014) Implement PipelinedRegionSchedulingStrategy

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17014:
---

 Summary: Implement PipelinedRegionSchedulingStrategy
 Key: FLINK-17014
 URL: https://issues.apache.org/jira/browse/FLINK-17014
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The PipelinedRegionSchedulingStrategy submits one pipelined region to the 
DefaultScheduler at a time. The PipelinedRegionSchedulingStrategy must be aware 
of the inputs of each pipelined region. It should schedule a region if and only 
if all the inputs of that region have become consumable.

PipelinedRegionSchedulingStrategy can be implemented as below (see the sketch 
after this list):
 * startScheduling() : schedule all source regions one by one.
 * onPartitionConsumable(partition) : check all the consumer regions of the 
notified partition; if all the inputs of a region have become consumable, 
schedule the region.
 * restartTasks(tasksToRestart) : find out all regions which contain the tasks 
to restart, and reschedule those whose inputs are all consumable.
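A skeleton of the strategy could look as follows (illustrative only; the 
region/input bookkeeping helpers are assumptions and the callback signatures 
are approximate):

{code:java}
// Illustrative skeleton; helper methods and exact signatures are assumptions.
public class PipelinedRegionSchedulingStrategy implements SchedulingStrategy {

    @Override
    public void startScheduling() {
        // schedule all source regions (regions without upstream inputs)
        getSourceRegions().forEach(this::scheduleRegion);
    }

    @Override
    public void onPartitionConsumable(IntermediateResultPartitionID partitionId) {
        // schedule each consumer region whose inputs are now all consumable
        getConsumerRegions(partitionId).stream()
            .filter(this::areAllInputsConsumable)
            .forEach(this::scheduleRegion);
    }

    @Override
    public void restartTasks(Set<ExecutionVertexID> verticesToRestart) {
        // reschedule the containing regions whose inputs are all consumable
        getRegionsContaining(verticesToRestart).stream()
            .filter(this::areAllInputsConsumable)
            .forEach(this::scheduleRegion);
    }

    // other SchedulingStrategy callbacks omitted for brevity
}
{code}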



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17015) Fix NPE from NullAwareMapIterator

2020-04-06 Thread Jark Wu (Jira)
Jark Wu created FLINK-17015:
---

 Summary: Fix NPE from NullAwareMapIterator
 Key: FLINK-17015
 URL: https://issues.apache.org/jira/browse/FLINK-17015
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jark Wu
 Attachments: 92164295_3052056384855585_3776552648744894464_o.jpg

When using the heap state backend, the underlying 
{{org.apache.flink.runtime.state.heap.HeapMapState#iterator}} may return a null 
iterator. This results in the {{NullAwareMapIterator}} holding a null iterator 
and throwing an NPE in the subsequent {{NullAwareMapIterator#hasNext}} 
invocation.
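A straightforward fix is to guard against the null iterator when constructing 
the {{NullAwareMapIterator}} (sketch only; variable names are illustrative):

{code:java}
// Sketch of a fix: fall back to an empty iterator when the map state is empty.
Iterator<Map.Entry<K, V>> iterator = mapState.iterator();
this.mapIterator = iterator == null ? Collections.emptyIterator() : iterator;
{code}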



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

2020-04-06 Thread Dian Fu
Hi Xingbo,

Thanks a lot for the great work. Big +1 to this feature. The performance 
improvement is impressive.

Regards,
Dian

> 在 2020年4月7日,下午12:38,Robert Metzger  写道:
> 
> Thank you for posting the FLIP.
> 
> The proposed integration with Azure Pipelines looks good to me.
> 
> On Tue, Mar 31, 2020 at 1:23 PM Xingbo Huang  wrote:
> 
>> Hi everyone,
>> 
>> I would like to start a discussion thread on "Support Cython Optimizing
>> Python User Defined Function"
>> 
>> Scalar Python UDF FLIP-58[1] has already been supported in release 1.10 and
>> Python UDTF will be supported in the coming release of 1.11. In release
>> 1.10, we focused on supporting UDF features and did not make many
>> optimizations in terms of performance. Although we have made a lot of
>> optimizations in master[2], Cython can further greatly improve the
>> performance of Python UDF.
>> 
>> Robert Metzger, Jincheng Sun and I have discussed offline and have drafted
>> the FLIP-121[3]. It includes the following items:
>> 
>> - Introduces Cython implementation of coder and operations
>> 
>> - Doc changes for building sdist and wheel packages from source code
>> 
>> - Solutions for packages building
>> 
>> 
>> Looking forward to your feedback!
>> 
>> Best,
>> 
>> Xingbo
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>> 
>> [2] https://issues.apache.org/jira/browse/FLINK-16747
>> 
>> [3]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function
>> 



[jira] [Created] (FLINK-17016) Use PipelinedRegionSchedulingStrategy in DefaultScheduler (for Blink Planner)

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17016:
---

 Summary: Use PipelinedRegionSchedulingStrategy in DefaultScheduler 
(for Blink Planner)
 Key: FLINK-17016
 URL: https://issues.apache.org/jira/browse/FLINK-17016
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The PipelinedRegionSchedulingStrategy should be used to schedule Blink planner 
batch jobs.
Blink planner batch jobs currently correspond to 
{{ScheduleMode.LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST}}.
The SchedulingStrategy loading must be reworked for this mode.

E2E tests are needed to verify this change.
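The strategy selection could be reworked roughly as follows (illustrative 
sketch; the factory wiring and the non-batch cases are assumptions):

{code:java}
// Illustrative sketch of per-mode strategy selection; names are assumptions.
private SchedulingStrategyFactory createSchedulingStrategyFactory(ScheduleMode mode) {
    switch (mode) {
        case LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST:
            // Blink planner batch jobs: schedule pipelined region by region
            return new PipelinedRegionSchedulingStrategy.Factory();
        case EAGER:
            return new EagerSchedulingStrategy.Factory();
        default:
            return new LazyFromSourcesSchedulingStrategy.Factory();
    }
}
{code}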



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17017) Implement Bulk Slot Allocation in SchedulerImpl

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17017:
---

 Summary: Implement Bulk Slot Allocation in SchedulerImpl
 Key: FLINK-17017
 URL: https://issues.apache.org/jira/browse/FLINK-17017
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The SlotProvider interface should be extended with a bulk slot allocation 
method which accepts a collection of slot requests as one of its parameters.

{code:java}
CompletableFuture<Collection<LogicalSlotRequestResult>> allocateSlots(
  Collection<LogicalSlotRequest> slotRequests,
  Time allocationTimeout);
 
class LogicalSlotRequest {
  SlotRequestId slotRequestId;
  ScheduledUnit scheduledUnit;
  SlotProfile slotProfile;
  boolean slotWillBeOccupiedIndefinitely;
}
 
class LogicalSlotRequestResult {
  SlotRequestId slotRequestId;
  LogicalSlot slot;
}
{code}

For more details, see [FLIP-119#Bulk Slot 
Allocation|https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling#FLIP-119PipelinedRegionScheduling-BulkSlotAllocation]
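
For illustration, a hedged sketch of a possible call site; {{cancelSlotRequests}} 
and {{deployTask}} below are placeholders, not existing Flink APIs:

{code:java}
// Illustrative call site only; error handling is simplified.
CompletableFuture<Collection<LogicalSlotRequestResult>> resultsFuture =
    slotProvider.allocateSlots(slotRequests, Time.minutes(5));

resultsFuture.whenComplete((results, failure) -> {
    if (failure != null) {
        // If any request of the bulk cannot be fulfilled within the timeout,
        // the whole bulk is expected to fail together.
        cancelSlotRequests(slotRequests);
    } else {
        results.forEach(result -> deployTask(result.slotRequestId, result.slot));
    }
});
{code}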




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread Dawid Wysakowicz
+1

Best,

Dawid

On 07/04/2020 07:44, godfrey he wrote:
> Hi, Kurt
>
> Yes. `TableEnvironment#executeSql` could also execute a `SELECT` statement,
> which is similar to `Table#execute`.
> I have added this to the document.
>
> Best,
> Godfrey
>
> On Tue, Apr 7, 2020 at 11:52 AM Kurt Young  wrote:
>
>> +1 (binding)
>>
>> The latest doc looks good to me. One minor comment: with the latest
>> changes, it seems also very easy to support running a SELECT query in the
>> TableEnvironment#executeSql method.
>> Will this also be supported?
>>
>> Best,
>> Kurt
>>
>>
>> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:
>>
>>> Thanks for the update.
>>>
>>> +1 (binding) for this FLIP
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 06.04.20 16:47, godfrey he wrote:
 Hi Timo,

 Sorry for the late reply, and thanks for your correction. I have fixed the
>>> typo
 and updated the document.

 Best,
 Godfrey

 On Mon, Apr 6, 2020 at 6:05 PM Timo Walther  wrote:

> Hi Godfrey,
>
> did you see my remaining feedback in the discussion thread? We could
> finish this FLIP if this gets resolved.
>
> Thanks,
> Timo
>
> On 03.04.20 15:12, Terry Wang wrote:
>> +1 (non-binding)
>> Looks great to me, Thanks for driving on this.
>>
>> Best,
>> Terry Wang
>>
>>
>>
>>> On Apr 3, 2020, at 9:07 PM, godfrey he  wrote:
>>>
>>> Hi everyone,
>>>
>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
>> and
>>> reached consensus in the discussion thread[2].
>>>
>>> The vote will be open for at least 72 hours. Unless there is an
> objection,
>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
>>> sufficient votes.
>>>
>>>
>>> [1]
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>> [2]
>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>>>
>>> Bests,
>>> Godfrey
>>>
>>> On Tue, Mar 31, 2020 at 8:42 PM godfrey he  wrote:
>>>
 Hi, Timo

 So sorry about that, I'm in a little hurry. Let's wait for 24h.

 Best,
 Godfrey

 On Tue, Mar 31, 2020 at 5:26 PM Timo Walther  wrote:

> -1
>
> The current discussion has not concluded. The last comments were
>>> sent
> less than 24h ago.
>
> Let's wait a bit longer to collect feedback from all stakeholders.
>
> Thanks,
> Timo
>
> On 31.03.20 08:31, godfrey he wrote:
>> Hi everyone,
>>
>> I'd like to start the vote of FLIP-84[1] again, because we have
>>> some
>> feedback. The feedback is all about newly introduced methods,
>> here
> is
> the
>> discussion thread [2].
>>
>> The vote will be open for at least 72 hours. Unless there is an
> objection,
>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
>> received
>> sufficient votes.
>>
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> [2]
>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>>
>> Bests,
>> Godfrey
>>
>
>
>>>





[jira] [Created] (FLINK-17018) Use Bulk Slot Allocation in DefaultExecutionSlotAllocator

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17018:
---

 Summary: Use Bulk Slot Allocation in DefaultExecutionSlotAllocator
 Key: FLINK-17018
 URL: https://issues.apache.org/jira/browse/FLINK-17018
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The DefaultExecutionSlotAllocator should invoke bulk slot allocation methods to 
allocate slots.
The SlotProviderStrategy should also be reworked to forward such requests.
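
A hedged sketch of the forwarding described above, reusing the interface 
shapes from FLINK-17017 (class and method names are illustrative only):

{code:java}
// Illustrative only: a strategy that forwards a whole bulk of logical slot
// requests to the SlotProvider in a single call.
class BulkForwardingSlotProviderStrategy {

    private final SlotProvider slotProvider;
    private final Time allocationTimeout;

    BulkForwardingSlotProviderStrategy(SlotProvider slotProvider, Time allocationTimeout) {
        this.slotProvider = slotProvider;
        this.allocationTimeout = allocationTimeout;
    }

    CompletableFuture<Collection<LogicalSlotRequestResult>> allocateSlots(
            Collection<LogicalSlotRequest> slotRequests) {
        // Forward the bulk unchanged; timeout handling stays in one place.
        return slotProvider.allocateSlots(slotRequests, allocationTimeout);
    }
}
{code}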



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Tzu-Li (Gordon) Tai
Hi everyone!

On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
Flink committer.

Seth started contributing to the project in March 2017. You may know him
from several contributions in the past.
He has helped a lot with the Flink documentation and contributed the State
Processor API.
Over the past few months, he has also helped tremendously in writing the
majority of the
Stateful Functions documentation.

Please join me in congratulating Seth for becoming a Flink committer!

Thanks,
Gordon


Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Konstantin Knauf
Congratulations, Seth! Well deserved :)

On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai 
wrote:

> Hi everyone!
>
> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
> Flink committer.
>
> Seth started contributing to the project in March 2017. You may know him
> from several contributions in the past.
> He had helped a lot with Flink documentation, and had contributed the State
> Processor API.
> Over the past few months, he has also helped tremendously in writing the
> majority of the
> Stateful Functions documentation.
>
> Please join me in congratulating Seth for becoming a Flink committer!
>
> Thanks,
> Gordon
>


-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica 


--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[jira] [Created] (FLINK-17019) Implement FIFO Physical Slot Assignment in SlotPoolImpl

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17019:
---

 Summary: Implement FIFO Physical Slot Assignment in SlotPoolImpl
 Key: FLINK-17019
 URL: https://issues.apache.org/jira/browse/FLINK-17019
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The SlotPool should try to fulfill the oldest pending slot request once it 
receives an available slot, regardless of whether the slot is returned by a 
terminated task or newly offered by a task manager. This naturally ensures 
that slot requests of an earlier scheduled region will be fulfilled earlier 
than requests of a later scheduled region.

We only need to change the slot assignment logic on slot offers. This is 
because the fields {{pendingRequests}} and {{waitingForResourceManager}} store 
the pending requests in LinkedHashMaps. Therefore, 
{{tryFulfillSlotRequestOrMakeAvailable(...)}} will naturally fulfill the 
pending requests in insertion order.

When a new slot is offered via {{SlotPoolImpl#offerSlot(...)}}, we should use 
it to fulfill the oldest fulfillable slot request directly by invoking 
{{tryFulfillSlotRequestOrMakeAvailable(...)}}.

If a pending request (say R1) exists with the allocationId of the offered slot, 
and it is different from the request to fulfill (say R2), we should update the 
pending request to replace the AllocationID of R1 with the AllocationID of R2. 
This ensures that failAllocation(...) can still fail slot allocation requests 
to trigger restarting tasks and re-allocating slots.
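
A minimal sketch of the FIFO behavior on slot offers, using simplified 
stand-in types (not the actual SlotPoolImpl API):

{code:java}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class FifoSlotAssignmentSketch {

    interface Slot {}

    interface PendingRequest {
        boolean isFulfillableBy(Slot slot);
        void fulfill(Slot slot);
    }

    // A LinkedHashMap preserves insertion order, so iteration always visits
    // the oldest pending request first.
    private final Map<String, PendingRequest> pendingRequests = new LinkedHashMap<>();

    void offerSlot(Slot slot) {
        Iterator<Map.Entry<String, PendingRequest>> it =
                pendingRequests.entrySet().iterator();
        while (it.hasNext()) {
            PendingRequest request = it.next().getValue();
            if (request.isFulfillableBy(slot)) {
                it.remove();            // oldest fulfillable request wins
                request.fulfill(slot);  // assign the offered slot directly
                return;
            }
        }
        // No pending request matches: make the slot available for later requests.
    }
}
{code}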



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17020) Introduce GlobalDataExchangeMode for JobGraph Generation

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17020:
---

 Summary: Introduce GlobalDataExchangeMode for JobGraph Generation
 Key: FLINK-17020
 URL: https://issues.apache.org/jira/browse/FLINK-17020
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


Introduce GlobalDataExchangeMode with 4 modes:
 * ALL_EDGES_BLOCKING
 * FORWARD_EDGES_PIPELINED
 * POINTWISE_EDGES_PIPELINED
 * ALL_EDGES_PIPELINED

StreamGraph will be extended with a new field to host the 
GlobalDataExchangeMode. In the JobGraph generation stage, this mode will be 
used to determine the data exchange type of each job edge.

For more details, see [FLIP-119#Global Data Exchange 
Mode|https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling#FLIP-119PipelinedRegionScheduling-GlobalDataExchangeMode]
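
A minimal sketch of the four modes and of how a job edge's exchange type could 
be derived during JobGraph generation; the {{isForward}}/{{isPointwise}} flags 
stand in for checks on the edge's partitioner and are illustrative only:

{code:java}
public class DataExchangeModeSketch {

    enum GlobalDataExchangeMode {
        ALL_EDGES_BLOCKING,
        FORWARD_EDGES_PIPELINED,
        POINTWISE_EDGES_PIPELINED,
        ALL_EDGES_PIPELINED
    }

    // A forward edge is also pointwise, hence the OR in the pointwise case.
    static boolean isPipelined(GlobalDataExchangeMode mode, boolean isForward, boolean isPointwise) {
        switch (mode) {
            case ALL_EDGES_BLOCKING:        return false;
            case FORWARD_EDGES_PIPELINED:   return isForward;
            case POINTWISE_EDGES_PIPELINED: return isForward || isPointwise;
            case ALL_EDGES_PIPELINED:       return true;
            default: throw new IllegalStateException("Unknown mode: " + mode);
        }
    }
}
{code}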



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17021) Blink Planner set GlobalDataExchangeMode

2020-04-06 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17021:
---

 Summary: Blink Planner set GlobalDataExchangeMode
 Key: FLINK-17021
 URL: https://issues.apache.org/jira/browse/FLINK-17021
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Table SQL / Planner
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


The Blink planner config option "table.exec.shuffle-mode" should be extended to 
set the GlobalDataExchangeMode for a job. The supported values are:
 * all-blocking/batch --> ALL_EDGES_BLOCKING
 * forward-pipelined-only --> FORWARD_EDGES_PIPELINED
 * pointwise-pipelined-only --> POINTWISE_EDGES_PIPELINED
 * all-pipelined/pipelined --> ALL_EDGES_PIPELINED

Note that the values 'pipelined' and 'batch' are still supported for backward 
compatibility:
 * ‘pipelined’ will be treated the same as ‘all-pipelined’
 * ‘batch’ will be treated the same as ‘all-blocking’

The Blink planner needs to set the GlobalDataExchangeMode on the StreamGraph 
according to the configured value.
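
A minimal sketch of the value-to-mode mapping described above (the method name 
is illustrative; the actual option parsing may differ):

{code:java}
static GlobalDataExchangeMode fromShuffleMode(String value) {
    switch (value.trim().toLowerCase()) {
        case "all-blocking":
        case "batch":                     // legacy alias
            return GlobalDataExchangeMode.ALL_EDGES_BLOCKING;
        case "forward-pipelined-only":
            return GlobalDataExchangeMode.FORWARD_EDGES_PIPELINED;
        case "pointwise-pipelined-only":
            return GlobalDataExchangeMode.POINTWISE_EDGES_PIPELINED;
        case "all-pipelined":
        case "pipelined":                 // legacy alias
            return GlobalDataExchangeMode.ALL_EDGES_PIPELINED;
        default:
            throw new IllegalArgumentException(
                "Unsupported value for table.exec.shuffle-mode: " + value);
    }
}
{code}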



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Dian Fu
Congratulations!

> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
> 
> Congratulations, Seth! Well deserved :)
> 
> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai 
> wrote:
> 
>> Hi everyone!
>> 
>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
>> Flink committer.
>> 
>> Seth started contributing to the project in March 2017. You may know him
>> from several contributions in the past.
>> He had helped a lot with Flink documentation, and had contributed the State
>> Processor API.
>> Over the past few months, he has also helped tremendously in writing the
>> majority of the
>> Stateful Functions documentation.
>> 
>> Please join me in congratulating Seth for becoming a Flink committer!
>> 
>> Thanks,
>> Gordon
>> 
> 
> 
> -- 
> 
> Konstantin Knauf | Head of Product
> 
> +49 160 91394525
> 
> 
> Follow us @VervericaData Ververica 
> 
> 
> --
> 
> Join Flink Forward  - The Apache Flink
> Conference
> 
> Stream Processing | Event Driven | Real Time
> 
> --
> 
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> 
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Tony) Cheng



Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Dawid Wysakowicz
Hi all,

@Timo I'm fine with OpenContext.

@Timo @Seth Sure we can combine all the parameters in a single object.
Will update the FLIP

@Jark I was aware of the implementation of SinkFunction, but it was a
conscious choice to not do it that way.

Personally I am against giving a default implementation to both the new
and old methods. This results in an interface that by default does
nothing or notifies the user only at runtime that he/she has not
implemented a method of the interface, which does not sound like a good
practice to me. Moreover I believe the method without a Collector will
still be the preferred method by many users. Plus it communicates
explicitly what is the minimal functionality required by the interface.
Nevertheless I am happy to hear other opinions.

@all I also prefer the buffering approach. Let's wait a day or two more
to see if others think differently.
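
For readers following along, a rough, self-contained sketch of the buffering
approach (option #1); all names below are illustrative stand-ins, not the
proposed Flink API:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BufferingSketch<T> {

    /** Stand-in for the proposed collector-based deserialization method. */
    interface Deserializer<R> {
        void deserialize(byte[] bytes, Consumer<R> out);
    }

    private final Object checkpointLock = new Object();

    void processRecord(byte[] rawBytes, Deserializer<T> schema, Consumer<T> emit) {
        // 1) Run user deserialization code WITHOUT holding the checkpoint lock;
        //    all records produced for this raw message are buffered locally.
        List<T> buffer = new ArrayList<>();
        schema.deserialize(rawBytes, buffer::add);

        // 2) Emit the buffered records and advance offsets atomically, so a
        //    checkpoint can never split the records of one raw message.
        synchronized (checkpointLock) {
            buffer.forEach(emit);
            // the source's offset/partition state would be updated here
        }
    }
}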

Best,

Dawid

On 07/04/2020 06:11, Jark Wu wrote:
> Hi Dawid,
>
> Thanks for driving this. This is a blocker to support Debezium CDC format
> (FLIP-105). So big +1 from my side.
>
> Regarding emitting multiple records and checkpointing, I'm also in favor
> of option#1: buffer all the records outside of the checkpoint lock.
> I think most of the use cases will not buffer larger data than
> its deserialized byte[].
>
> I have a minor suggestion on DeserializationSchema: could we have a default
> implementation (maybe throw exception) for `T deserialize(byte[] message)`?
> I think this will not break compatibility, and users don't have to
> implement this deprecated interface if he/she wants to use the new
> collector interface.
> I think SinkFunction also did this in the same way: introduce a new invoke
> method with Context parameter, and give the old invoke method an
> empty implementation.
>
> Best,
> Jark
>
> On Mon, 6 Apr 2020 at 23:51, Seth Wiesman  wrote:
>
>> I would be in favor of buffering data outside of the checkpoint lock. In my
>> experience, serialization is always the biggest performance killer in user
>> code and I have a hard time believing in practice that anyone is going to
>> buffer so many records that it causes real memory concerns.
>>
>> To add to Timo's point,
>>
>> Statefun actually did that on its Kinesis ser/de interfaces[1,2].
>>
>> Seth
>>
>> [1]
>>
>> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/ingress/KinesisIngressDeserializer.java
>> [2]
>>
>> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/egress/KinesisEgressSerializer.java
>>
>>
>> On Mon, Apr 6, 2020 at 4:49 AM Timo Walther  wrote:
>>
>>> Hi Dawid,
>>>
>>> thanks for this FLIP. This solves a lot of issues with the current
>>> design for both the Flink contributors and users. +1 for this.
>>>
>>> Some minor suggestions from my side:
>>> - How about finding something shorter for `InitializationContext`? Maybe
>>> just `OpenContext`?
>>> - While introducing default methods for existing interfaces, shall we
>>> also create contexts for those methods? I see the following method in
>>> your FLIP and wonder if we can reduce the number of parameters while
>>> introducing a new method:
>>>
>>> deserialize(
>>>  byte[] recordValue,
>>>  String partitionKey,
>>>  String seqNum,
>>>  long approxArrivalTimestamp,
>>>  String stream,
>>>  String shardId,
>>>  Collector<T> out)
>>>
>>> to:
>>>
>>> deserialize(
>>>  byte[] recordValue,
>>>  Context c,
>>>  Collector<T> out)
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>> On 06.04.20 11:08, Dawid Wysakowicz wrote:
 Hi devs,

 When working on improving the Table API/SQL connectors we faced a few
 shortcomings of the DeserializationSchema and SerializationSchema
 interfaces. Similar features were also mentioned by other users in the
 past. The shortcomings I would like to address with the FLIP include:

   * Emitting 0 to m records from the deserialization schema with per
 partition watermarks
   o
>> https://github.com/apache/flink/pull/3314#issuecomment-376237266
   o differentiate null value from no value
   o support for Debezium CDC format
 (
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL
>>> )
   * A way to initialize the schema
   o establish external connections
   o generate code on startup
   o no need for lazy initialization

   * Access to metrics
 [
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329
>>> ]
 One important aspect I would like to hear your opinion on is how to
 support the Collector interface in Kafka source. Of course if we a

Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Zhu Zhu
Congratulations, Seth!

Thanks,
Zhu Zhu

On Tue, Apr 7, 2020 at 2:43 PM Dian Fu  wrote:

> Congratulations!
>
> > On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
> >
> > Congratulations, Seth! Well deserved :)
> >
> > On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai 
> > wrote:
> >
> >> Hi everyone!
> >>
> >> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
> >> Flink committer.
> >>
> >> Seth started contributing to the project in March 2017. You may know him
> >> from several contributions in the past.
> >> He had helped a lot with Flink documentation, and had contributed the
> State
> >> Processor API.
> >> Over the past few months, he has also helped tremendously in writing the
> >> majority of the
> >> Stateful Functions documentation.
> >>
> >> Please join me in congratulating Seth for becoming a Flink committer!
> >>
> >> Thanks,
> >> Gordon
> >>
> >
> >
> > --
> >
> > Konstantin Knauf | Head of Product
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica 
> >
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> > (Tony) Cheng
>
>


Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Dawid Wysakowicz
Congratulations Seth. Happy to have you in the community!

Best,

Dawid

On 07/04/2020 08:43, Dian Fu wrote:
> Congratulations!
>
>> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
>>
>> Congratulations, Seth! Well deserved :)
>>
>> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi everyone!
>>>
>>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
>>> Flink committer.
>>>
>>> Seth started contributing to the project in March 2017. You may know him
>>> from several contributions in the past.
>>> He had helped a lot with Flink documentation, and had contributed the State
>>> Processor API.
>>> Over the past few months, he has also helped tremendously in writing the
>>> majority of the
>>> Stateful Functions documentation.
>>>
>>> Please join me in congratulating Seth for becoming a Flink committer!
>>>
>>> Thanks,
>>> Gordon
>>>
>>
>> -- 
>>
>> Konstantin Knauf | Head of Product
>>
>> +49 160 91394525
>>
>>
>> Follow us @VervericaData Ververica 
>>
>>
>> --
>>
>> Join Flink Forward  - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>> --
>>
>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>
>> --
>> Ververica GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>> (Tony) Cheng





[jira] [Created] (FLINK-17022) Flink SQL ROW_NUMBER() Exception: TableException: This calc has no useful projection and no filter. It should be removed by CalcRemoveRule.

2020-04-06 Thread xingoo (Jira)
xingoo created FLINK-17022:
--

 Summary: Flink SQL ROW_NUMBER() Exception: TableException: This 
calc has no useful projection and no filter. It should be removed by 
CalcRemoveRule.
 Key: FLINK-17022
 URL: https://issues.apache.org/jira/browse/FLINK-17022
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.10.0
 Environment: flink 1.10 with PR: 

https://issues.apache.org/jira/browse/FLINK-16068

https://issues.apache.org/jira/browse/FLINK-16345
Reporter: xingoo


exception:
{code:java}
// code placeholder
Caused by: org.apache.flink.table.api.TableException: This calc has no useful 
projection and no filter. It should be removed by CalcRemoveRule.
at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateProcessCode(CalcCodeGenerator.scala:176)
at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateCalcOperator(CalcCodeGenerator.scala:49)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:77)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:84)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:44)
at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlan(StreamExecExchange.scala:44)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecRank.translateToPlanInternal(StreamExecRank.scala:209)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecRank.translateToPlanInternal(StreamExecRank.scala:53)
at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecRank.translateToPlan(StreamExecRank.scala:53)
at 
org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:60)
at 
org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:59)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:59)
at 
org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:81)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.explain(TableEnvironmentImpl.java:447)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.explain(TableEnvironmentImpl.java:442)
at 
com.ververica.flink.table.gateway.operation.ExplainOperation.lambda$execute$0(ExplainOperation.java:53)
at 
com.ververica.flink.table.gateway.context.ExecutionContext.wrapClassLoader(ExecutionContext.java:230)
at 
com.ververica.flink.table.gateway.operation.ExplainOperation.execute(ExplainOperation.java:53)
... 45 more
{code}
sql:
{code:java}
// code placeholder
create view v1 as
select a,  b, count(1) as c 
from test_kafka_t 
group by a,b,HOP(ts, INTERVAL '10' SECOND, INTERVAL '1' MINUTE);

explain
select * from (
SELECT *,  row_number() over (PARTITION BY a ORDER BY c) AS rn
FROM v1
-- where 1=1  -- uncommenting this line works around the issue
)
where rn <= 5
{code}
kafka topic:
{code:java}
// code placeholder
CREATE TABLE test_kafka_t (
  a varchar,
  b int,
  ts as PROCTIME()
) WITH (
  'connector.type' = 'kafka',   
  'connector.version' = '0.11',
  'connector.topic' = 'xx',
  'connector.properties.zookeeper.connect' = 'xx',
  'connector.properties.bootstrap.servers' = 'xx',
  'connector.properties.group.id' = 'testGroup',
  'connector.startup-mode' = 'latest-offset',
  'format.type' = 'json'
)
{code}

Re: [ANNOUNCE] New Committers and PMC member

2020-04-06 Thread Dawid Wysakowicz
Thank you all for the support!

Best,

Dawid

On 02/04/2020 04:33, godfrey he wrote:
> Congratulations to all of you~
>
> Best,
> Godfrey
>
> On Thu, Apr 2, 2020 at 6:42 AM Ismaël Mejía  wrote:
>
>> Congrats everyone!
>>
>> On Thu, Apr 2, 2020 at 12:16 AM Rong Rong  wrote:
>>> Congratulations to all!!!
>>>
>>> --
>>> Rong
>>>
>>> On Wed, Apr 1, 2020 at 2:27 PM Thomas Weise  wrote:
>>>
 Congratulations!


 On Wed, Apr 1, 2020 at 9:31 AM Fabian Hueske 
>> wrote:
> Congrats everyone!
>
> Cheers, Fabian
>
> On Wed, Apr 1, 2020 at 6:26 PM Yun Tang  wrote:
>> Congratulations to all of you!
>>
>> Best
>> Yun Tang
>> 
>> From: Yang Wang 
>> Sent: Wednesday, April 1, 2020 22:28
>> To: dev 
>> Subject: Re: [ANNOUNCE] New Committers and PMC member
>>
>> Congratulations all.
>>
>> Best,
>> Yang
>>
>> On Wed, Apr 1, 2020 at 10:15 PM Leonard Xu  wrote:
>>
>>> Congratulations Konstantin, Dawid and Zhijiang!  Well deserved!
>>>
>>> Best,
>>> Leonard Xu
 On Apr 1, 2020, at 9:22 PM, Jark Wu  wrote:

 Congratulations to you all!

 Best,
 Jark

 On Wed, 1 Apr 2020 at 20:33, Kurt Young 
>> wrote:
> Congratulations to you all!
>
> Best,
> Kurt
>
>
> On Wed, Apr 1, 2020 at 7:41 PM Danny Chan <
>> yuzhao@gmail.com>
>> wrote:
>> Congratulations!
>>
>> Best,
>> Danny Chan
>> On Apr 1, 2020 at 7:36 PM +0800, dev@flink.apache.org wrote:
>>> Congratulations!
>>>





Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Jingsong Li
Congratulations Seth!

Best,
Jingsong Lee

On Tue, Apr 7, 2020 at 2:46 PM Dawid Wysakowicz 
wrote:

> Congratulations Seth. Happy to have you in the community!
>
> Best,
>
> Dawid
>
> On 07/04/2020 08:43, Dian Fu wrote:
> > Congratulations!
> >
> >> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
> >>
> >> Congratulations, Seth! Well deserved :)
> >>
> >> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai  >
> >> wrote:
> >>
> >>> Hi everyone!
> >>>
> >>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
> >>> Flink committer.
> >>>
> >>> Seth started contributing to the project in March 2017. You may know
> him
> >>> from several contributions in the past.
> >>> He had helped a lot with Flink documentation, and had contributed the
> State
> >>> Processor API.
> >>> Over the past few months, he has also helped tremendously in writing
> the
> >>> majority of the
> >>> Stateful Functions documentation.
> >>>
> >>> Please join me in congratulating Seth for becoming a Flink committer!
> >>>
> >>> Thanks,
> >>> Gordon
> >>>
> >>
> >> --
> >>
> >> Konstantin Knauf | Head of Product
> >>
> >> +49 160 91394525
> >>
> >>
> >> Follow us @VervericaData Ververica 
> >>
> >>
> >> --
> >>
> >> Join Flink Forward  - The Apache Flink
> >> Conference
> >>
> >> Stream Processing | Event Driven | Real Time
> >>
> >> --
> >>
> >> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >>
> >> --
> >> Ververica GmbH
> >> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> >> (Tony) Cheng
>
>

-- 
Best, Jingsong Lee


Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Kurt Young
Congratulations, Seth!

Best,
Kurt


On Tue, Apr 7, 2020 at 2:51 PM Jingsong Li  wrote:

> Congratulations Seth!
>
> Best,
> Jingsong Lee
>
> On Tue, Apr 7, 2020 at 2:46 PM Dawid Wysakowicz 
> wrote:
>
> > Congratulations Seth. Happy to have you in the community!
> >
> > Best,
> >
> > Dawid
> >
> > On 07/04/2020 08:43, Dian Fu wrote:
> > > Congratulations!
> > >
> > >> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
> > >>
> > >> Congratulations, Seth! Well deserved :)
> > >>
> > >> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org
> > >
> > >> wrote:
> > >>
> > >>> Hi everyone!
> > >>>
> > >>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a
> new
> > >>> Flink committer.
> > >>>
> > >>> Seth started contributing to the project in March 2017. You may know
> > him
> > >>> from several contributions in the past.
> > >>> He had helped a lot with Flink documentation, and had contributed the
> > State
> > >>> Processor API.
> > >>> Over the past few months, he has also helped tremendously in writing
> > the
> > >>> majority of the
> > >>> Stateful Functions documentation.
> > >>>
> > >>> Please join me in congratulating Seth for becoming a Flink committer!
> > >>>
> > >>> Thanks,
> > >>> Gordon
> > >>>
> > >>
> > >> --
> > >>
> > >> Konstantin Knauf | Head of Product
> > >>
> > >> +49 160 91394525
> > >>
> > >>
> > >> Follow us @VervericaData Ververica 
> > >>
> > >>
> > >> --
> > >>
> > >> Join Flink Forward  - The Apache Flink
> > >> Conference
> > >>
> > >> Stream Processing | Event Driven | Real Time
> > >>
> > >> --
> > >>
> > >> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > >>
> > >> --
> > >> Ververica GmbH
> > >> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > >> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
> Ji
> > >> (Tony) Cheng
> >
> >
>
> --
> Best, Jingsong Lee
>


Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Congxian Qiu
Congratulations, Seth!

Best,
Congxian


On Tue, Apr 7, 2020 at 2:53 PM Kurt Young  wrote:

> Congratulations, Seth!
>
> Best,
> Kurt
>
>
> On Tue, Apr 7, 2020 at 2:51 PM Jingsong Li  wrote:
>
> > Congratulations Seth!
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Apr 7, 2020 at 2:46 PM Dawid Wysakowicz 
> > wrote:
> >
> > > Congratulations Seth. Happy to have you in the community!
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 07/04/2020 08:43, Dian Fu wrote:
> > > > Congratulations!
> > > >
> > > >> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf  wrote:
> > > >>
> > > >> Congratulations, Seth! Well deserved :)
> > > >>
> > > >> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > >> wrote:
> > > >>
> > > >>> Hi everyone!
> > > >>>
> > > >>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a
> > new
> > > >>> Flink committer.
> > > >>>
> > > >>> Seth started contributing to the project in March 2017. You may
> know
> > > him
> > > >>> from several contributions in the past.
> > > >>> He had helped a lot with Flink documentation, and had contributed
> the
> > > State
> > > >>> Processor API.
> > > >>> Over the past few months, he has also helped tremendously in
> writing
> > > the
> > > >>> majority of the
> > > >>> Stateful Functions documentation.
> > > >>>
> > > >>> Please join me in congratulating Seth for becoming a Flink
> committer!
> > > >>>
> > > >>> Thanks,
> > > >>> Gordon
> > > >>>
> > > >>
> > > >> --
> > > >>
> > > >> Konstantin Knauf | Head of Product
> > > >>
> > > >> +49 160 91394525
> > > >>
> > > >>
> > > >> Follow us @VervericaData Ververica 
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Join Flink Forward  - The Apache Flink
> > > >> Conference
> > > >>
> > > >> Stream Processing | Event Driven | Real Time
> > > >>
> > > >> --
> > > >>
> > > >> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > >>
> > > >> --
> > > >> Ververica GmbH
> > > >> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > >> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
> > Ji
> > > >> (Tony) Cheng
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>


Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Jark Wu
Thanks for the explanation. Sounds good to me.

Best,
Jark

On Tue, 7 Apr 2020 at 14:45, Dawid Wysakowicz 
wrote:

> Hi all,
>
> @Timo I'm fine with OpenContext.
>
> @Timo @Seth Sure we can combine all the parameters in a single object.
> Will update the FLIP
>
> @Jark I was aware of the implementation of SinkFunction, but it was a
> conscious choice to not do it that way.
>
> Personally I am against giving a default implementation to both the new
> and old methods. This results in an interface that by default does
> nothing or notifies the user only at runtime that he/she has not
> implemented a method of the interface, which does not sound like a good
> practice to me. Moreover I believe the method without a Collector will
> still be the preferred method by many users. Plus it communicates
> explicitly what is the minimal functionality required by the interface.
> Nevertheless I am happy to hear other opinions.
>
> @all I also prefer the buffering approach. Let's wait a day or two more
> to see if others think differently.
>
> Best,
>
> Dawid
>
> On 07/04/2020 06:11, Jark Wu wrote:
> > Hi Dawid,
> >
> > Thanks for driving this. This is a blocker to support Debezium CDC format
> > (FLIP-105). So big +1 from my side.
> >
> > Regarding emitting multiple records and checkpointing, I'm also in
> favor
> > of option#1: buffer all the records outside of the checkpoint lock.
> > I think most of the use cases will not buffer larger data than
> > its deserialized byte[].
> >
> > I have a minor suggestion on DeserializationSchema: could we have a
> default
> > implementation (maybe throw exception) for `T deserialize(byte[]
> message)`?
> > I think this will not break compatibility, and users don't have to
> > implement this deprecated interface if he/she wants to use the new
> > collector interface.
> > I think SinkFunction also did this in the same way: introduce a new
> invoke
> > method with Context parameter, and give the old invoke method an
> > empty implementation.
> >
> > Best,
> > Jark
> >
> > On Mon, 6 Apr 2020 at 23:51, Seth Wiesman  wrote:
> >
> >> I would be in favor of buffering data outside of the checkpoint lock.
> In my
> >> experience, serialization is always the biggest performance killer in
> user
> >> code and I have a hard time believing in practice that anyone is going
> to
> >> buffer so many records that it causes real memory concerns.
> >>
> >> To add to Timo's point,
> >>
> >> Statefun actually did that on its Kinesis ser/de interfaces[1,2].
> >>
> >> Seth
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/ingress/KinesisIngressDeserializer.java
> >> [2]
> >>
> >>
> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/egress/KinesisEgressSerializer.java
> >>
> >>
> >> On Mon, Apr 6, 2020 at 4:49 AM Timo Walther  wrote:
> >>
> >>> Hi Dawid,
> >>>
> >>> thanks for this FLIP. This solves a lot of issues with the current
> >>> design for both the Flink contributors and users. +1 for this.
> >>>
> >>> Some minor suggestions from my side:
> >>> - How about finding something shorter for `InitializationContext`?
> Maybe
> >>> just `OpenContext`?
> >>> - While introducing default methods for existing interfaces, shall we
> >>> also create contexts for those methods? I see the following method in
> >>> your FLIP and wonder if we can reduce the number of parameters while
> >>> introducing a new method:
> >>>
> >>> deserialize(
> >>>  byte[] recordValue,
> >>>  String partitionKey,
> >>>  String seqNum,
> >>>  long approxArrivalTimestamp,
> >>>  String stream,
> >>>  String shardId,
> >>>  Collector<T> out)
> >>>
> >>> to:
> >>>
> >>> deserialize(
> >>>  byte[] recordValue,
> >>>  Context c,
> >>>  Collector<T> out)
> >>>
> >>> What do you think?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>>
> >>> On 06.04.20 11:08, Dawid Wysakowicz wrote:
>  Hi devs,
> 
>  When working on improving the Table API/SQL connectors we faced a few
>  shortcomings of the DeserializationSchema and SerializationSchema
>  interfaces. Similar features were also mentioned by other users in the
>  past. The shortcomings I would like to address with the FLIP include:
> 
>    * Emitting 0 to m records from the deserialization schema with per
>  partition watermarks
>    o
> >> https://github.com/apache/flink/pull/3314#issuecomment-376237266
>    o differentiate null value from no value
>    o support for Debezium CDC format
>  (
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL
> >>> )
>    * A way to initialize the schema
>    o establish external connections
>    o genera

[jira] [Created] (FLINK-17023) The format checking of extractExecutionParams in config.sh is incorrect

2020-04-06 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-17023:
---

 Summary: The format checking of extractExecutionParams in 
config.sh is incorrect
 Key: FLINK-17023
 URL: https://issues.apache.org/jira/browse/FLINK-17023
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.11.0
Reporter: Caizhi Weng


In [FLINK-15727|https://issues.apache.org/jira/browse/FLINK-15727] 
extractExecutionParams now returns multiple lines of output to avoid invoking 
BashJavaUtils twice. However, the format check on the last line is now 
incorrect. Instead of 

{code:bash}
if ! [[ $execution_config =~ ^${EXECUTION_PREFIX}.* ]]; then
echo "[ERROR] Unexpected result: $execution_config" 1>&2
echo "[ERROR] The last line of the BashJavaUtils outputs is expected to 
be the execution result, following the prefix '${EXECUTION_PREFIX}'" 1>&2
echo "$output" 1>&2
exit 1
fi
{code}

It should be


{code:bash}
last_line=`echo "$execution_config" | tail -n 1`
if ! [[ "$last_line" =~ ^${EXECUTION_PREFIX}.* ]]; then
# ...
{code}
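
For completeness, a hedged sketch of the full corrected check, simply 
combining the two snippets above (same messages as the original block, with 
the unexpected value reported from the checked line):

{code:bash}
last_line=`echo "$execution_config" | tail -n 1`
if ! [[ "$last_line" =~ ^${EXECUTION_PREFIX}.* ]]; then
    echo "[ERROR] Unexpected result: $last_line" 1>&2
    echo "[ERROR] The last line of the BashJavaUtils outputs is expected to be the execution result, following the prefix '${EXECUTION_PREFIX}'" 1>&2
    echo "$output" 1>&2
    exit 1
fi
{code}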




--
This message was sent by Atlassian Jira
(v8.3.4#803005)