from:"Timo Walther"

Timo Walther created FLINK-11068:


 Summary: Port Table class to Java
 Key: FLINK-11068
 URL: https://issues.apache.org/jira/browse/FLINK-11068
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Dawid Wysakowicz


This task includes porting {{Table}} and subclasses to Java. API-breaking 
changes need to be avoided and discussed. Some refactoring and clean up might 
be necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-11067) Port TableEnvironments to Java

Timo Walther created FLINK-11067:


 Summary: Port TableEnvironments to Java
 Key: FLINK-11067
 URL: https://issues.apache.org/jira/browse/FLINK-11067
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Dawid Wysakowicz


This task includes porting {{TableEnvironment}}, {{StreamTableEnvironment}}, 
{{BatchTableEnvironment}} to Java. API-breaking changes need to be avoided and 
discussed. Some refactoring and clean up might be necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-11066) Migrate main Table API classes to flink-table-api-base

Timo Walther created FLINK-11066:


 Summary: Migrate main Table API classes to flink-table-api-base
 Key: FLINK-11066
 URL: https://issues.apache.org/jira/browse/FLINK-11066
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


This issue covers the forth step of the migration plan mentioned in 
[FLIP-28|https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free].

The most important API classes such as TableEnvironments and Table are exposing 
a lot of protected methods in Java. Migrating those classes makes the API clean 
and the implementation ready for a major refactoring for the new catalog 
support. We can also think about a separation of interface and implementation; 
e.g. {{Table}} & {{TableImpl}}. However, the current API design makes this 
difficult as we are using constructors of interfaces {{new Table(...)}}.

This issue tracks efforts of porting API classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-11065) Migrate flink-table runtime classes

Timo Walther created FLINK-11065:


 Summary: Migrate flink-table runtime classes
 Key: FLINK-11065
 URL: https://issues.apache.org/jira/browse/FLINK-11065
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


This issue covers the third step of the migration plan mentioned in 
[FLIP-28|https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free].

All runtime classes have little dependencies to other classes. This issue 
tracks efforts of porting runtime classes.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-11064) Setup a new flink-table module structure

Timo Walther created FLINK-11064:


 Summary: Setup a new flink-table module structure
 Key: FLINK-11064
 URL: https://issues.apache.org/jira/browse/FLINK-11064
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


This issue covers the first step of the migration plan mentioned in 
[FLIP-28|https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free].

Move all files to their corresponding modules as they are. No migration happens 
at this stage. Modules might contain both Scala and Java classes. Classes that 
should be placed in `flink-table-spi` but are in Scala so far remain in 
`flink-table-api-base` for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-11063) Make flink-table Scala-free

Timo Walther created FLINK-11063:

Summary: Make flink-table Scala-free
Key: FLINK-11063
URL: https://issues.apache.org/jira/browse/FLINK-11063
Project: Flink
Issue Type: New Feature
Components: Table API SQL
Reporter: Timo Walther

Currently, the Table & SQL API is implemented in Scala. This decision was made
a long-time ago when the initial code base was created as part of a master's
thesis. The community kept Scala because of the nice language features that
enable a fluent Table API like {{table.select('field.trim())}} and because
Scala allows for quick prototyping (e.g. multi-line comments for code
generation). The committers enforced not splitting the code-base into two
programming languages.

However, nowadays the {{flink-table}} module more and more becomes an important
part in the Flink ecosystem. Connectors, formats, and SQL client are actually
implemented in Java but need to interoperate with {{flink-table}} which makes
these modules dependent on Scala. As mentioned in an earlier mail thread, using
Scala for API classes also exposes member variables and methods in Java that
should not be exposed to users. Java is still the most important API language
and right now we treat it as a second-class citizen.

In order to not introduce more technical debt, the community aims to make the
{{flink-table}} module Scala-free as a long-term goal. This will be a
continuous effort that can not be finished within one release. We aim for
avoiding API-breaking changes.

A full description can be found in the corresponding
[FLIP-28|https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free].

FLIP-28 also contains a rough roadmap and serves as migration guidelines.

This Jira issue is an umbrella issue for tracking the efforts and possible
migration blockers.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [ANNOUNCE] Apache Flink 1.7.0 released

2018-11-30 Thread Timo Walther

Thanks for being the release manager Till and thanks for the great work 
Flink community!


Regards,
Timo


Am 30.11.18 um 10:39 schrieb Till Rohrmann:

The Apache Flink community is very happy to announce the release of Apache
Flink 1.7.0, which is the next major release.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the new features
and improvements for this release:
https://flink.apache.org/news/2018/11/30/release-1.7.0.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343585

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Cheers,
Till

[jira] [Created] (FLINK-11036) Streaming classloader end-to-end test does not work on source releases

2018-11-30 Thread Timo Walther (JIRA)

Timo Walther created FLINK-11036:


 Summary: Streaming classloader end-to-end test does not work on 
source releases
 Key: FLINK-11036
 URL: https://issues.apache.org/jira/browse/FLINK-11036
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests
Reporter: Timo Walther


A Flink distribution that has been built from a source release has no git 
commit id associated. The web UI shows {{unkown}} as the version commit. 
Therefore, the {{test_streaming_classloader.sh}} can not be executed on a 
source release. Either we change the test setup or we skip the test if the 
commit id is not present.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-29 Thread Timo Walther

@Kurt: Yes, I don't think that that forks of Flink will have a hard time 
to keep up with the porting. That is also why I called this `long-term 
goal` because I don't see big resources for the porting to happen 
quicker. But at least new features, API, and runtime profit from Java to 
Scala conversion.


@Jark: I updated the document:

1. flink-table-common has been renamed to flink-table-spi by request.

2. Yes, good point. flink-sql-client can be moved there as well.

3. I added a paragraph to the document. Porting the code generation to 
Java makes only sense if acceptable tooling for it is in place.



Thanks for the feedback,

Timo


Am 29.11.18 um 08:28 schrieb Jark Wu:

Hi Timo,

Thanks for the great work!

Moving flink-table to Java is a long-awaited things but will involve much
effort. Agree with that we should make it as a long-term goal.

I have read the google doc and +1 for the proposal. Here I have some
questions:

1. Where should the flink-table-common module place ?  Will we move the
flink-table-common classes to the new modules?
2. Should flink-sql-client also as a sub-module under flink-table ?
3. The flink-table-planner contains code generation and will be converted
to Java. Actually, I prefer using Scala to code generate because of the
Multiline-String and String-Interpolation (i.e. s"hello $user") features in
Scala. It makes code of code-generation more readable. Do we really
want to migrate
code generation to Java?

Best,
Jark


On Wed, 28 Nov 2018 at 09:14, Kurt Young  wrote:


Hi Timo and Vino,

I agree that table is very active and there is no guarantee for not
producing any conflicts if you decide
to develop based on community version. I think this part is the risk what
we can imagine in the first place. But massively
language replacing is something you can not imagine and be ready for, there
is no feature added, no refactor is done, simply changing
from scala to java will cause lots of conflicts.

But I also agree that this is a "technical debt" that we should eventually
pay, as you said, we can do this slowly, even one file each time,
let other people have more time to resolve the conflicts.

Best,
Kurt


On Tue, Nov 27, 2018 at 8:37 PM Timo Walther  wrote:


Hi Kurt,

I understand your concerns. However, there is no concrete roadmap for
Flink 2.0 and (as Vino said) the flink-table is developed very actively.
Major refactorings happened in the past and will also happen with or
without Scala migration. A good example, is the proper catalog support
which will refactor big parts of the TableEnvironment class. Or the
introduction of "retractions" which needed a big refactoring of the
planning phase. Stability is only guaranteed for the API and the general
behavior, however, currently flink-table is not using @Public or
@PublicEvolving annotations for a reason.

I think the migration will still happen slowly because it needs people
that allocate time for that. Therefore, even Flink forks can slowly
adapt to the evolving Scala-to-Java code base.

Regards,
Timo


Am 27.11.18 um 13:16 schrieb vino yang:

Hi Kurt,

Currently, there is still a long time to go from flink 2.0. Considering
that the flink-table
is one of the most active modules in the current flink project, each
version has
a number of changes and features added. I think that refactoring faster
will reduce subsequent
complexity and workload. And this may be a gradual and long process. We
should be able to
   regard it as a "technical debt", and if it does not change it, it

will

also affect the decision-making of other issues.

Thanks, vino.

Kurt Young  于2018年11月27日周二 下午7:34写道：


Hi Timo,

Thanks for writing up the document. I'm +1 for reorganizing the module
structure and make table scala free. But I have
a little concern abount the timing. Is it more appropriate to get this

done

when Flink decide to bump to next big version, like 2.x.
It's true you can keep all the class's package path as it is, and will

not

introduce API change. But if some company are developing their own
Flink, and sync with community version by rebasing, may face a lot of
conflicts. Although you can avoid conflicts by always moving source

codes

between packages, but I assume you still need to delete the original

scala

file and add a new java file when you want to change program language.

Best,
Kurt


On Tue, Nov 27, 2018 at 5:57 PM Timo Walther 

wrote:

Hi Hequn,

thanks for your feedback. Yes, migrating the test cases is another

issue

that is not represented in the document but should naturally go along
with the migration.

I agree that we should migrate the main API classes quickly within

this

1.8 release after the module split has been performed. Help here is
highly appreciated!

I forgot that Java supports static methods in interfaces now, but
actually I don't like the design of calling

`TableEnvironment.get(env)`.

Because people often use `TableEnvironment tEnd =
TableE

[jira] [Created] (FLINK-11020) Reorder joins only to eliminate cross joins

2018-11-28 Thread Timo Walther (JIRA)

Timo Walther created FLINK-11020:


 Summary: Reorder joins only to eliminate cross joins 
 Key: FLINK-11020
 URL: https://issues.apache.org/jira/browse/FLINK-11020
 Project: Flink
  Issue Type: Improvement
  Components: Table API  SQL
Reporter: Timo Walther


Currently, we don't reorder join and rely on the order provided by the user. 
This is fine for most of the cases, however, it limits the set of supported SQL 
queries.

Example:
{code}
val streamUtil: StreamTableTestUtil = streamTestUtil()
streamUtil.addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c.rowtime, 
'proctime.proctime)
streamUtil.addTable[(Int, String, Long)]("MyTable2", 'a, 'b, 'c.rowtime, 
'proctime.proctime)
streamUtil.addTable[(Int, String, Long)]("MyTable3", 'a, 'b, 'c.rowtime, 
'proctime.proctime)
val sqlQuery =
  """
|SELECT t1.a, t3.b
|FROM MyTable3 t3, MyTable2 t2, MyTable t1
|WHERE t1.a = t3.a AND t1.a = t2.a
|""".stripMargin

streamUtil.printSql(sqlQuery)
{code}

Given the current rule sets, this query produces a cross join which is not 
supported and thus leads to:

{code}
org.apache.flink.table.api.TableException: Cannot generate a valid execution 
plan for the given query: 

LogicalProject(a=[$8], b=[$1])
  LogicalFilter(condition=[AND(=($8, $0), =($8, $4))])
LogicalJoin(condition=[true], joinType=[inner])
  LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[_DataStreamTable_2]])
LogicalTableScan(table=[[_DataStreamTable_1]])
  LogicalTableScan(table=[[_DataStreamTable_0]])

This exception indicates that the query uses an unsupported SQL feature.
Please check the documentation for the set of currently supported SQL features.
{code}

Introducing {{JoinPushThroughJoinRule}} would help but should only be applied 
if a cross join is the only alternative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-27 Thread Timo Walther


Hi Kurt,

I understand your concerns. However, there is no concrete roadmap for 
Flink 2.0 and (as Vino said) the flink-table is developed very actively. 
Major refactorings happened in the past and will also happen with or 
without Scala migration. A good example, is the proper catalog support 
which will refactor big parts of the TableEnvironment class. Or the 
introduction of "retractions" which needed a big refactoring of the 
planning phase. Stability is only guaranteed for the API and the general 
behavior, however, currently flink-table is not using @Public or 
@PublicEvolving annotations for a reason.


I think the migration will still happen slowly because it needs people 
that allocate time for that. Therefore, even Flink forks can slowly 
adapt to the evolving Scala-to-Java code base.


Regards,
Timo


Am 27.11.18 um 13:16 schrieb vino yang:

Hi Kurt,

Currently, there is still a long time to go from flink 2.0. Considering
that the flink-table
is one of the most active modules in the current flink project, each
version has
a number of changes and features added. I think that refactoring faster
will reduce subsequent
complexity and workload. And this may be a gradual and long process. We
should be able to
  regard it as a "technical debt", and if it does not change it, it will
also affect the decision-making of other issues.

Thanks, vino.

Kurt Young  于2018年11月27日周二 下午7:34写道：


Hi Timo,

Thanks for writing up the document. I'm +1 for reorganizing the module
structure and make table scala free. But I have
a little concern abount the timing. Is it more appropriate to get this done
when Flink decide to bump to next big version, like 2.x.
It's true you can keep all the class's package path as it is, and will not
introduce API change. But if some company are developing their own
Flink, and sync with community version by rebasing, may face a lot of
conflicts. Although you can avoid conflicts by always moving source codes
between packages, but I assume you still need to delete the original scala
file and add a new java file when you want to change program language.

Best,
Kurt


On Tue, Nov 27, 2018 at 5:57 PM Timo Walther  wrote:


Hi Hequn,

thanks for your feedback. Yes, migrating the test cases is another issue
that is not represented in the document but should naturally go along
with the migration.

I agree that we should migrate the main API classes quickly within this
1.8 release after the module split has been performed. Help here is
highly appreciated!

I forgot that Java supports static methods in interfaces now, but
actually I don't like the design of calling `TableEnvironment.get(env)`.
Because people often use `TableEnvironment tEnd =
TableEnvironment.get(env)` and then wonder why there is no
`toAppendStream` or `toDataSet` because they are using the base class.
However, things like that can be discussed in the corresponding issue
when it comes to implementation.

@Vino: I think your work fits nicely to these efforts.

@everyone: I will wait for more feedback until end of this week. Then I
will convert the design document into a FLIP and open subtasks in Jira,
if there are no objections?

Regards,
Timo

Am 24.11.18 um 13:45 schrieb vino yang:

Hi hequn,

I am very glad to hear that you are interested in this work.
As we all know, this process involves a lot.
Currently, the migration work has begun. I started with the
Kafka connector's dependency on flink-table and moved the
related dependencies to flink-table-common.
This work is tracked by FLINK-9461.  [1]
I don't know if it will conflict with what you expect to do, but from

the

impact I have observed,
it will involve many classes that are currently in flink-table.

*Just a statement to prevent unnecessary conflicts.*

Thanks, vino.

[1]: https://issues.apache.org/jira/browse/FLINK-9461

Hequn Cheng  于2018年11月24日周六 下午7:20写道：


Hi Timo,

Thanks for the effort and writing up this document. I like the idea to

make

flink-table scala free, so +1 for the proposal!

It's good to make Java the first-class citizen. For a long time, we

have

neglected java so that many features in Table are missed in Java Test
cases, such as this one[1] I found recently. And I think we may also

need

to migrate our test cases, i.e, add java tests.

This definitely is a big change and will break API compatible. In

order

to

bring a smaller impact on users, I think we should go fast when we

migrate

APIs targeted to users. It's better to introduce the user sensitive

changes

within a release. However, it may be not that easy. I can help to
contribute.

Separation of interface and implementation is a good idea. This may
introduce a minimum of dependencies or even no dependencies. I saw

your

reply in the google doc. Java8 has already supported static method for
interfaces, I think we can make use of it?

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001


On Fri, Nov 23, 2018 at 5:36 PM Timo Walther 

wrote:

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-27 Thread Timo Walther


Hi Hequn,

thanks for your feedback. Yes, migrating the test cases is another issue 
that is not represented in the document but should naturally go along 
with the migration.


I agree that we should migrate the main API classes quickly within this 
1.8 release after the module split has been performed. Help here is 
highly appreciated!


I forgot that Java supports static methods in interfaces now, but 
actually I don't like the design of calling `TableEnvironment.get(env)`. 
Because people often use `TableEnvironment tEnd = 
TableEnvironment.get(env)` and then wonder why there is no 
`toAppendStream` or `toDataSet` because they are using the base class. 
However, things like that can be discussed in the corresponding issue 
when it comes to implementation.


@Vino: I think your work fits nicely to these efforts.

@everyone: I will wait for more feedback until end of this week. Then I 
will convert the design document into a FLIP and open subtasks in Jira, 
if there are no objections?


Regards,
Timo

Am 24.11.18 um 13:45 schrieb vino yang:

Hi hequn,

I am very glad to hear that you are interested in this work.
As we all know, this process involves a lot.
Currently, the migration work has begun. I started with the
Kafka connector's dependency on flink-table and moved the
related dependencies to flink-table-common.
This work is tracked by FLINK-9461.  [1]
I don't know if it will conflict with what you expect to do, but from the
impact I have observed,
it will involve many classes that are currently in flink-table.

*Just a statement to prevent unnecessary conflicts.*

Thanks, vino.

[1]: https://issues.apache.org/jira/browse/FLINK-9461

Hequn Cheng  于2018年11月24日周六 下午7:20写道：


Hi Timo,

Thanks for the effort and writing up this document. I like the idea to make
flink-table scala free, so +1 for the proposal!

It's good to make Java the first-class citizen. For a long time, we have
neglected java so that many features in Table are missed in Java Test
cases, such as this one[1] I found recently. And I think we may also need
to migrate our test cases, i.e, add java tests.

This definitely is a big change and will break API compatible. In order to
bring a smaller impact on users, I think we should go fast when we migrate
APIs targeted to users. It's better to introduce the user sensitive changes
within a release. However, it may be not that easy. I can help to
contribute.

Separation of interface and implementation is a good idea. This may
introduce a minimum of dependencies or even no dependencies. I saw your
reply in the google doc. Java8 has already supported static method for
interfaces, I think we can make use of it?

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001


On Fri, Nov 23, 2018 at 5:36 PM Timo Walther  wrote:


Hi everyone,

thanks for the great feedback so far. I updated the document with the
input I got so far

@Fabian: I moved the porting of flink-table-runtime classes up in the

list.

@Xiaowei: Could you elaborate what "interface only" means to you? Do you
mean a module containing pure Java `interface`s? Or is the validation
logic also part of the API module? Are 50+ expression classes part of
the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we
should develop in Scala and when in Java. As Piotr said, every new Scala
line is instant technical debt.

Thanks,
Timo


Am 23.11.18 um 10:29 schrieb Piotr Nowojski:

Hi Timo,

Thanks for writing this down +1 from my side :)


I'm wondering that whether we can have rule in the interim when Java

and Scala coexist that dependency can only be one-way. I found that in

the

current code base there are cases where a Scala class extends Java and

vise

versa. This is quite painful. I'm thinking if we could say that extension
can only be from Java to Scala, which will help the situation. However,

I'm

not sure if this is practical.

Xuefu: I’m also not sure what’s the best approach here, probably we

will

have to work it out as we go. One thing to consider is that from now on,
every single new code line written in Scala anywhere in Flink-table

(except

of Flink-table-api-scala) is an instant technological debt. From this
perspective I would be in favour of tolerating quite big inchonvieneces
just to avoid any new Scala code.

Piotrek


On 23 Nov 2018, at 03:25, Zhang, Xuefu 

wrote:

Hi Timo,

Thanks for the effort and the Google writeup. During our external

catalog rework, we found much confusion between Java and Scala, and this
Scala-free roadmap should greatly mitigate that.

I'm wondering that whether we can have rule in the interim when Java

and Scala coexist that dependency can only be one-way. I found that in

the

current code base there are cases where a Scala class extends Java and

vise

versa. This is quite painful. I'm thinking if we could say that extension
can only be from Java to Scala, which will help the situation. However,

I'm

[jira] [Created] (FLINK-11011) Elasticsearch 6 sink end-to-end test unstable

2018-11-27 Thread Timo Walther (JIRA)

Timo Walther created FLINK-11011:


 Summary: Elasticsearch 6 sink end-to-end test unstable
 Key: FLINK-11011
 URL: https://issues.apache.org/jira/browse/FLINK-11011
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests
Reporter: Timo Walther
Assignee: Timo Walther


The log contains errors:

{code}

2018-11-26 12:55:02,363 ERROR org.apache.flink.runtime.jobmaster.JobMaster - 
Received DeclineCheckpoint message for job 1a7516a04fb0cc85bdb3aa21548bd9bb 
with no CheckpointCoordinator

{code}

 

See also: https://api.travis-ci.org/v3/job/459693461/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Flink SQL DDL Design

2018-11-27 Thread Timo Walther

Thanks for offering your help here, Xuefu. It would be great to move
these efforts forward. I agree that the DDL is somehow releated to the
unified connector API design but we can also start with the basic
functionality now and evolve the DDL during this release and next releases.

For example, we could identify the MVP DDL syntax that skips defining
key constraints and maybe even time attributes. This DDL could be used
for batch usecases, ETL, and materializing SQL queries (no time
operations like windows).

The unified connector API is high on our priority list for the 1.8
release. I will try to update the document until mid of next week.

Regards,

Timo

Am 27.11.18 um 08:08 schrieb Shuyi Chen:

Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 weeks,
but we are definitely interested in moving this forward. I think once the
unified connector API design [1] is done, we can finalize the DDL design as
well and start creating concrete subtasks to collaborate on the
implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu
wrote:

Hi Shuyi,

I'm wondering if you folks still have the bandwidth working on this.

We have some dedicated resource and like to move this forward. We can
collaborate.

Thanks,

Xuefu

--
发件人：wenlong.lwl
日 期：2018年11月05日 11:15:35
收件人：
主 题：Re: [DISCUSS] Flink SQL DDL Design

Hi, Shuyi, thanks for the proposal.

I have two concerns about the table ddl:

1. how about remove the source/sink mark from the ddl, because it is not
necessary, the framework determine the table referred is a source or a sink
according to the context of the query using the table. it will be more
convenient for use defining a table which can be both a source and sink,
and more convenient for catalog to persistent and manage the meta infos.

2. how about just keeping one pure string map as parameters for table, like
create tabe Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = ’kafka’,
connector.property-version = ’1’,
connector.version = ’0.10’,
connector.properties.topic = ‘test-kafka-topic’,
connector.properties.startup-mode = ‘latest-offset’,
connector.properties.specific-offset = ‘offset’,
format.type = 'json'
format.prperties.version=’1’,
format.derive-schema = 'true'
);
Because:
1. in TableFactory, what user use is a string map properties, defining
parameters by string-map can be the closest way to mapping how user use the
parameters.
2. The table descriptor can be extended by user, like what is done in Kafka
and Json, it means that the parameter keys in connector or format can be
different in different implementation, we can not restrict the key in a
specified set, so we need a map in connector scope and a map in
connector.properties scope. why not just give user a single map, let them
put parameters in a format they like, which is also the simplest way to
implement DDL parser.
3. whether we can define a format clause or not, depends on the
implementation of the connector, using different clause in DDL may make a
misunderstanding that we can combine the connectors with arbitrary formats,
which may not work actually.

On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński wrote:

+1, Thanks for the proposal.

I guess this is a long-awaited change. This can vastly increase the
functionalities of the SQL Client as it will be possible to use complex
extensions like for example those provided by Apache Bahir[1].

Best Regards,
Dom.

[1]
https://github.com/apache/bahir-flink

sob., 3 lis 2018 o 17:17 Rong Rong napisał(a):

+1. Thanks for putting the proposal together Shuyi.

DDL has been brought up in a couple of times previously [1,2].

Utilizing

DDL will definitely be a great extension to the current Flink SQL to
systematically support some of the previously brought up features such

[3]. And it will also be beneficial to see the document closely aligned
with the previous discussion for unified SQL connector API [4].

I also left a few comments on the doc. Looking forward to the alignment
with the other couple of efforts and contributing to them!

Best,
Rong

[1]

http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E

[2]

http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E

[3] https://issues.apache.org/jira/browse/FLINK-8003
[4]

http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E

On Fri, Nov 2, 2018 at 10:22 AM Bowen Li wrote:

Thanks Shuyi!

I left some comments there. I think the design of SQL DDL and

Flink-Hive

integration/External catalog enhancements will work closely with

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-23 Thread Timo Walther

Hi everyone,

thanks for the great feedback so far. I updated the document with the
input I got so far

@Fabian: I moved the porting of flink-table-runtime classes up in the list.

@Xiaowei: Could you elaborate what "interface only" means to you? Do you
mean a module containing pure Java `interface`s? Or is the validation
logic also part of the API module? Are 50+ expression classes part of
the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we
should develop in Scala and when in Java. As Piotr said, every new Scala
line is instant technical debt.

Thanks,
Timo

Am 23.11.18 um 10:29 schrieb Piotr Nowojski:

Hi Timo,

Thanks for writing this down +1 from my side :)

I'm wondering that whether we can have rule in the interim when Java and Scala
coexist that dependency can only be one-way. I found that in the current code
base there are cases where a Scala class extends Java and vise versa. This is
quite painful. I'm thinking if we could say that extension can only be from
Java to Scala, which will help the situation. However, I'm not sure if this is
practical.

Xuefu: I’m also not sure what’s the best approach here, probably we will have
to work it out as we go. One thing to consider is that from now on, every
single new code line written in Scala anywhere in Flink-table (except of
Flink-table-api-scala) is an instant technological debt. From this perspective
I would be in favour of tolerating quite big inchonvieneces just to avoid any
new Scala code.

Piotrek

On 23 Nov 2018, at 03:25, Zhang, Xuefu wrote:

Hi Timo,

Thanks for the effort and the Google writeup. During our external catalog
rework, we found much confusion between Java and Scala, and this Scala-free
roadmap should greatly mitigate that.

Thanks,
Xuefu

--
Sender:jincheng sun
Sent at:2018 Nov 23 (Fri) 09:49
Recipient:dev
Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,
Thanks for initiating this great discussion.

Currently when using SQL/TableAPI should include many dependence. In
particular, it is not necessary to introduce the specific implementation
dependencies which users do not care about. So I am glad to see your
proposal, and hope when we consider splitting the API interface into a
separate module, so that the user can introduce minimum of dependencies.

So, +1 to [separation of interface and implementation; e.g. `Table` &
`TableImpl`] which you mentioned in the google doc.
Best,
Jincheng

Xiaowei Jiang 于2018年11月22日周四 下午10:50写道：

Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we want to
eventually have a TableAPI interface only module which users can take
dependency on, but without including any implementation details?

Xiaowei

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske wrote:

Hi Timo,

Thanks for writing up this document.
I like the new structure and agree to prioritize the porting of the
flink-table-common classes.
Since flink-table-runtime is (or should be) independent of the API and
planner modules, we could start porting these classes once the code is
split into the new module structure.
The benefits of a Scala-free flink-table-runtime would be a Scala-free
execution Jar.

Best, Fabian

Am Do., 22. Nov. 2018 um 10:54 Uhr schrieb Timo Walther <
twal...@apache.org

:
Hi everyone,

I would like to continue this discussion thread and convert the outcome
into a FLIP such that users and contributors know what to expect in the
upcoming releases.

I created a design document [1] that clarifies our motivation why we
want to do this, how a Maven module structure could look like, and a
suggestion for a migration plan.

It would be great to start with the efforts for the 1.8 release such
that new features can be developed in Java and major refactorings such
as improvements to the connectors and external catalog support are not
blocked.

Please let me know what you think.

Regards,
Timo

[1]

https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing

Am 02.07.18 um 17:08 schrieb Fabian Hueske:

Hi Piotr,

thanks for bumping this thread and thanks for Xingcan for the

comments.

I think the first step would be to separate the flink-table module

into

multiple sub modules. These could be:

- flink-table-api: All API facing classes. Can be later divided

further

into Java/Scala Table API/SQL
- fli

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-22 Thread Timo Walther

hink we have to plan this well.
I don't like the idea of having the whole code base fragmented into Java
and Scala code for too long.

I think we should do this one step at a time and focus on migrating one
module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into an own module, porting them to Java, and
removing the Scala dependency won't be possible without breaking the API
since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann :


I think that is a noble and honorable goal and we should strive for it.
This, however, must be an iterative process given the sheer size of the
code base. I like the approach to define common Java modules which are

used

by more specific Scala modules and slowly moving classes from Scala to
Java. Thus +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <

pi...@data-artisans.com>

wrote:


Hi,

I do not have an experience with how scala and java interacts with

each

other, so I can not fully validate your proposal, but generally

speaking

+1

from me.

Does it also mean, that we should slowly migrate `flink-table-core` to
Java? How would you envision it? It would be nice to be able to add

new

classes/features written in Java and so that they can coexist with old
Scala code until we gradually switch from Scala to Java.

Piotrek


On 13 Jun 2018, at 11:32, Timo Walther  wrote:

Hi everyone,

as you all know, currently the Table & SQL API is implemented in

Scala.

This decision was made a long-time ago when the initital code base was
created as part of a master's thesis. The community kept Scala

because of

the nice language features that enable a fluent Table API like
table.select('field.trim()) and because Scala allows for quick

prototyping

(e.g. multi-line comments for code generation). The committers

enforced

not

splitting the code-base into two programming languages.

However, nowadays the flink-table module more and more becomes an

important part in the Flink ecosystem. Connectors, formats, and SQL

client

are actually implemented in Java but need to interoperate with

flink-table

which makes these modules dependent on Scala. As mentioned in an

earlier

mail thread, using Scala for API classes also exposes member variables

and

methods in Java that should not be exposed to users [1]. Java is still

the

most important API language and right now we treat it as a

second-class

citizen. I just noticed that you even need to add Scala if you just

want

to

implement a ScalarFunction because of method clashes between `public

String

toString()` and `public scala.Predef.String toString()`.

Given the size of the current code base, reimplementing the entire

flink-table code in Java is a goal that we might never reach.

However, we

should at least treat the symptoms and have this as a long-term goal

in

mind. My suggestion would be to convert user-facing and runtime

classes

and

split the code base into multiple modules:

flink-table-java {depends on flink-table-core}

Implemented in Java. Java users can use this. This would require to

convert classes like TableEnvironment, Table.

flink-table-scala {depends on flink-table-core}

Implemented in Scala. Scala users can use this.


flink-table-common

Implemented in Java. Connectors, formats, and UDFs can use this. It

contains interface classes such as descriptors, table sink, table

source.

flink-table-core {depends on flink-table-common and

flink-table-runtime}

Implemented in Scala. Contains the current main code base.


flink-table-runtime

Implemented in Java. This would require to convert classes in

o.a.f.table.runtime but would improve the runtime potentially.


What do you think?


Regards,

Timo

[1]

http://apache-flink-mailing-list-archive.1008284.n3.

nabble.com/DISCUSS-Convert-main-Table-API-classes-into-

traits-tp21335.html

[jira] [Created] (FLINK-10927) Resuming Externalized Checkpoint (r, inc, up) test instable

2018-11-19 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10927:


 Summary: Resuming Externalized Checkpoint (r, inc, up) test 
instable
 Key: FLINK-10927
 URL: https://issues.apache.org/jira/browse/FLINK-10927
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests
Reporter: Timo Walther


The test failed on Travis.

{code}

2018-11-18 12:34:44,476 INFO 
org.apache.flink.streaming.api.operators.AbstractStreamOperator - Could not 
complete snapshot 12 for operator 
ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (2/2). java.io.IOException: 
Cannot register Closeable, registry is already closed. Closing argument. at 
org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
 at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.(AsyncSnapshotCallable.java:123)
 at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.(AsyncSnapshotCallable.java:111)
 at 
org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
 at 
org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
 at 
org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
 at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
 at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
 at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
 at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
 at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
 at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) 
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) at 
java.lang.Thread.run(Thread.java:748)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Apply for Contributor Permission

2018-11-13 Thread Timo Walther


Welcome to the Flink community!

I gave you contributor permissions.

Regards,

Timo



Am 13.11.18 um 14:09 schrieb Shuiqiang Chen:

Hi guys:

Could somebody give me contributor permissions? my jira username is :
csq.

Thanks.

[jira] [Created] (FLINK-10859) Queryable state (rocksdb) end-to-end test fails on Travis

2018-11-13 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10859:


 Summary: Queryable state (rocksdb) end-to-end test fails on Travis
 Key: FLINK-10859
 URL: https://issues.apache.org/jira/browse/FLINK-10859
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests
Reporter: Timo Walther


Queryable state (rocksdb) end-to-end test fails on Travis.

I guess the root exception is:

{code}

org.apache.flink.runtime.checkpoint.decline.CheckpointDeclineTaskNotReadyException:
 Task Source: Custom Source (1/1) was not running at 
org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1166) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10847) Add support for IS NOT DISTINCT FROM in code generator

Timo Walther created FLINK-10847:


 Summary: Add support for IS NOT DISTINCT FROM in code generator
 Key: FLINK-10847
 URL: https://issues.apache.org/jira/browse/FLINK-10847
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


It seems that sometimes {{IS NOT DISTINCT FROM}} is not rewritten by Calcite, 
thus we should have built-in support for it and add more tests. It is already 
officially documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10846) Add support for IS DISTINCT FROM in code generator

Timo Walther created FLINK-10846:


 Summary: Add support for IS DISTINCT FROM in code generator
 Key: FLINK-10846
 URL: https://issues.apache.org/jira/browse/FLINK-10846
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


It seems that sometimes \{{IS DISTINCT FROM}} is not rewritten by Calcite, thus 
we should have built-in support for it and add more tests. It is already 
officially documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10845) Support DISTINCT aggregates for batch

Timo Walther created FLINK-10845:


 Summary: Support DISTINCT aggregates for batch
 Key: FLINK-10845
 URL: https://issues.apache.org/jira/browse/FLINK-10845
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


Currently, we support distinct aggregates for streaming. However, executing the 
same query on batch like the following test:

{code}
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

val sqlQuery =
  "SELECT b, " +
  "  SUM(DISTINCT (a / 3)), " +
  "  COUNT(DISTINCT SUBSTRING(c FROM 1 FOR 2))," +
  "  COUNT(DISTINCT c) " +
  "FROM MyTable " +
  "GROUP BY b"

val data = new mutable.MutableList[(Int, Long, String)]
data.+=((1, 1L, "Hi"))
data.+=((2, 2L, "Hello"))
data.+=((3, 2L, "Hello world"))
data.+=((4, 3L, "Hello world, how are you?"))
data.+=((5, 3L, "I am fine."))
data.+=((6, 3L, "Luke Skywalker"))
data.+=((7, 4L, "Comment#1"))
data.+=((8, 4L, "Comment#2"))
data.+=((9, 4L, "Comment#3"))
data.+=((10, 4L, "Comment#4"))
data.+=((11, 5L, "Comment#5"))
data.+=((12, 5L, "Comment#6"))
data.+=((13, 5L, "Comment#7"))
data.+=((14, 5L, "Comment#8"))
data.+=((15, 5L, "Comment#9"))
data.+=((16, 6L, "Comment#10"))
data.+=((17, 6L, "Comment#11"))
data.+=((18, 6L, "Comment#12"))
data.+=((19, 6L, "Comment#13"))
data.+=((20, 6L, "Comment#14"))
data.+=((21, 6L, "Comment#15"))


val t = env.fromCollection(data).toTable(tEnv).as('a, 'b, 'c)
tEnv.registerTable("MyTable", t)

tEnv.sqlQuery(sqlQuery).toDataSet[Row].print()
{code}

Fails with:

{code}
org.apache.flink.table.codegen.CodeGenException: Unsupported call: IS NOT 
DISTINCT FROM 
If you think this function should be supported, you can create an issue and 
start a discussion for it.

at 
org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
at 
org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:1027)
at 
org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:66)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107)
at 
org.apache.flink.table.codegen.CodeGenerator.generateExpression(CodeGenerator.scala:247)
at 
org.apache.flink.table.plan.nodes.dataset.DataSetJoin.addInnerJoin(DataSetJoin.scala:221)
at 
org.apache.flink.table.plan.nodes.dataset.DataSetJoin.translateToPlan(DataSetJoin.scala:170)
at 
org.apache.flink.table.plan.nodes.dataset.DataSetCalc.translateToPlan(DataSetCalc.scala:91)
at 
org.apache.flink.table.plan.nodes.dataset.DataSetJoin.translateToPlan(DataSetJoin.scala:165)
at 
org.apache.flink.table.plan.nodes.dataset.DataSetCalc.translateToPlan(DataSetCalc.scala:91)
at 
org.apache.flink.table.api.BatchTableEnvironment.translate(BatchTableEnvironment.scala:498)
at 
org.apache.flink.table.api.BatchTableEnvironment.translate(BatchTableEnvironment.scala:476)
at 
org.apache.flink.table.api.scala.BatchTableEnvironment.toDataSet(BatchTableEnvironment.scala:141)
at 
org.apache.flink.table.api.scala.TableConversions.toDataSet(TableConversions.scala:50)
at 
org.apache.flink.table.runtime.stream.sql.SqlITCase.testDistinctGroupBy(SqlITCase.scala:2
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10843) Make Kafka version definition more flexible for new Kafka table factory

Timo Walther created FLINK-10843:


 Summary: Make Kafka version definition more flexible for new Kafka 
table factory
 Key: FLINK-10843
 URL: https://issues.apache.org/jira/browse/FLINK-10843
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector, Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Currently, a user has to specify a specific version for a Kafka connector like:

{code}
connector:
  type: kafka
  version: "0.11" # required: valid connector versions are "0.8", "0.9", 
"0.10", and "0.11"
  topic: ...  # required: topic name from which the table is read
{code}

However, the new Kafka connector aims to be universal, thus, at least for 1.x 
and 2.x versions which we should support those as parameters as well. 
Currently, {{2.0}} is the only accepted string for the factory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10825) ConnectedComponents test instable on Travis

2018-11-08 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10825:


 Summary: ConnectedComponents test instable on Travis
 Key: FLINK-10825
 URL: https://issues.apache.org/jira/browse/FLINK-10825
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests
Reporter: Timo Walther


The "ConnectedComponents iterations with high parallelism end-to-end test" 
succeeds on Travis but the log contains with the following exception:

{code}
2018-11-08 10:15:13,698 ERROR 
org.apache.flink.runtime.taskexecutor.rpc.RpcResultPartitionConsumableNotifier  
- Could not schedule or update consumers at the JobManager.
org.apache.flink.runtime.executiongraph.ExecutionGraphException: Cannot find 
execution for execution Id 5b02c2f51e51f68b66bfab07afc1bf17.
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleOrUpdateConsumers(ExecutionGraph.java:1635)
at 
org.apache.flink.runtime.jobmaster.JobMaster.scheduleOrUpdateConsumers(JobMaster.java:637)
at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at 
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor.aroundReceive(Actor.scala:502)
at akka.actor.Actor.aroundReceive$(Actor.scala:500)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-01 Thread Timo Walther


Hi Jincheng,

I was also thinking about introducing a process function for the Table 
API several times. This would allow to define more complex logic (custom 
windows, timers, etc.) embedded into a relational API with schema 
awareness and optimization around the black box. Of course this would 
mean that we diverge with Table API from SQL API, however, it would open 
the Table API also for more event-driven applications.


Maybe it would be possible to define timers and firing logic using Table 
API expressions and UDFs. Within planning this would be treated as a 
special Calc node.


Just some ideas that might be interesting for new use cases.

Regards,
Timo


Am 01.11.18 um 13:12 schrieb Aljoscha Krettek:

Hi Jincheng,

these points sound very good! Are there any concrete proposals for changes? For 
example a FLIP/design document?

See here for FLIPs: 
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best,
Aljoscha


On 1. Nov 2018, at 12:51, jincheng sun  wrote:

*I am sorry for the formatting of the email content. I reformat
the **content** as follows---*

*Hi ALL,*

With the continuous efforts from the community, the Flink system has been
continuously improved, which has attracted more and more users. Flink SQL
is a canonical, widely used relational query language. However, there are
still some scenarios where Flink SQL failed to meet user needs in terms of
functionality and ease of use, such as:

*1. In terms of functionality*
Iteration, user-defined window, user-defined join, user-defined
GroupReduce, etc. Users cannot express them with SQL;

*2. In terms of ease of use*

   - Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
   udf2(), udf3())” can be used to accomplish the same function., with a
   map() function returning 100 columns, one has to define or call 100 UDFs
   when using SQL, which is quite involved.
   - FlatMap -  e.g. “dataStrem.flatmap(flatMapFun)”. Similarly, it can be
   implemented with “table.join(udtf).select()”. However, it is obvious that
   dataStream is easier to use than SQL.

Due to the above two reasons, some users have to use the DataStream API or
the DataSet API. But when they do that, they lose the unification of batch
and streaming. They will also lose the sophisticated optimizations such as
codegen, aggregate join transpose and multi-stage agg from Flink SQL.

We believe that enhancing the functionality and productivity is vital for
the successful adoption of Table API. To this end,  Table API still
requires more efforts from every contributor in the community. We see great
opportunity in improving our user’s experience from this work. Any feedback
is welcome.

Regards,

Jincheng

jincheng sun  于2018年11月1日周四 下午5:07写道：


Hi all,

With the continuous efforts from the community, the Flink system has been
continuously improved, which has attracted more and more users. Flink SQL
is a canonical, widely used relational query language. However, there are
still some scenarios where Flink SQL failed to meet user needs in terms of
functionality and ease of use, such as:


   -

   In terms of functionality

Iteration, user-defined window, user-defined join, user-defined
GroupReduce, etc. Users cannot express them with SQL;

   -

   In terms of ease of use
   -

  Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
  udf2(), udf3())” can be used to accomplish the same function., with a
  map() function returning 100 columns, one has to define or call 100 UDFs
  when using SQL, which is quite involved.
  -

  FlatMap -  e.g. “dataStrem.flatmap(flatMapFun)”. Similarly, it can
  be implemented with “table.join(udtf).select()”. However, it is obvious
  that datastream is easier to use than SQL.


Due to the above two reasons, some users have to use the DataStream API or
the DataSet API. But when they do that, they lose the unification of batch
and streaming. They will also lose the sophisticated optimizations such as
codegen, aggregate join transpose  and multi-stage agg from Flink SQL.

We believe that enhancing the functionality and productivity is vital for
the successful adoption of Table API. To this end,  Table API still
requires more efforts from every contributor in the community. We see great
opportunity in improving our user’s experience from this work. Any feedback
is welcome.

Regards,

Jincheng

[jira] [Created] (FLINK-10726) Streaming SQL end-to-end test fails due to recent flink-table-common changes

2018-10-30 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10726:


 Summary: Streaming SQL end-to-end test fails due to recent 
flink-table-common changes
 Key: FLINK-10726
 URL: https://issues.apache.org/jira/browse/FLINK-10726
 Project: Flink
  Issue Type: Bug
  Components: E2E Tests, Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


The streaming SQL end-to-end test fails with:

{code}
java.lang.NoClassDefFoundError: org/apache/flink/table/api/TableException
at 
org.apache.flink.sql.tests.StreamSQLTestProgram.main(StreamSQLTestProgram.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
at 
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
at 
org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.table.api.TableException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
{code}

This has to do with the recent dependency changes for introducing a 
{{flink-table-common}} module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10689) Port Table API extension points to flink-table-common

Timo Walther created FLINK-10689:


 Summary: Port Table API extension points to flink-table-common
 Key: FLINK-10689
 URL: https://issues.apache.org/jira/browse/FLINK-10689
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


After FLINK-10687 and FLINK-10688 have been resolved, we should also port the 
remaining extension points of the Table API to flink-table-common. This 
includes interfaces for UDFs and the external catalog interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10688) Remove flink-table dependencies from flink-connectors

Timo Walther created FLINK-10688:


 Summary: Remove flink-table dependencies from flink-connectors 
 Key: FLINK-10688
 URL: https://issues.apache.org/jira/browse/FLINK-10688
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Replace {{flink-table}} dependencies in {{flink-connectors}} with more 
lightweight and Scala-free {{flink-table-common}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10687) Make flink-formats Scala-free

Timo Walther created FLINK-10687:


 Summary: Make flink-formats Scala-free
 Key: FLINK-10687
 URL: https://issues.apache.org/jira/browse/FLINK-10687
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


{{flink-table}} is the only dependency that pulls in Scala for {{flink-json}}, 
{{flink-avro}}. We should aim to make {{flink-formats}} Scala-free using only a 
dependency to {{flink-table-common}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10686) Introduce a flink-table-common module

Timo Walther created FLINK-10686:


 Summary: Introduce a flink-table-common module
 Key: FLINK-10686
 URL: https://issues.apache.org/jira/browse/FLINK-10686
 Project: Flink
  Issue Type: Improvement
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Because more and more table factories for connectors and formats are added and 
external catalog support is also on the horizon, {{flink-table}} becomes a 
dependency for many Flink modules. Since {{flink-table}} is implemented in 
Scala it requires other modules to be suffixes with Scala prefixes. However, as 
we have learned in the past, Scala code is hard to maintain which is why our 
long-term goal is to avoid Scala/Scala dependencies.

Therefore we propose a new module {{flink-table-common}} that contains 
interfaces between {{flink-table}} and other modules. This module is 
implemented in Java and should contain minimal (or better no) external 
dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10638) Invalid table scan resolution for temporal join queries

2018-10-22 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10638:


 Summary: Invalid table scan resolution for temporal join queries
 Key: FLINK-10638
 URL: https://issues.apache.org/jira/browse/FLINK-10638
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Registered tables that contain a temporal join are not properly resolved when 
performing a table scan.

{code}
LogicalProject(amount=[*($0, $4)])
  LogicalFilter(condition=[=($3, $1)])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
requiredColumns=[{2}])
  LogicalTableScan(table=[[_DataStreamTable_0]])
  
LogicalTableFunctionScan(invocation=[Rates(CAST($cor0.rowtime):TIMESTAMP(3) NOT 
NULL)], rowType=[RecordType(VARCHAR(65536) currency, BIGINT rate, TIME 
ATTRIBUTE(ROWTIME) rowtime)], elementType=[class [Ljava.lang.Object;])
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10568) Show correct time attributes when calling Table.getSchema()

2018-10-16 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10568:


 Summary: Show correct time attributes when calling 
Table.getSchema()
 Key: FLINK-10568
 URL: https://issues.apache.org/jira/browse/FLINK-10568
 Project: Flink
  Issue Type: Improvement
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


A call to `Table.getSchema` can result in false positives with respect to time 
attributes. Time attributes are converted during optimization phase. It would 
be helpful if we could execute parts of the optimization phase when calling 
this method in order to get useful/correct information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [VOTE] Release flink-shaded 5.0, release candidate #1

2018-10-11 Thread Timo Walther


+1

- I built locally and checked the JAR files for suspicious things.
- I went throught the change diff between 4 and 5 as well.

Could not find anything blocking this release.

Thanks,
Timo


Am 10.10.18 um 17:22 schrieb Aljoscha Krettek:

+1

I did
  - verify all changes between 4.0 and 5.0
  - check signature and hash of the source release
  - build a work-in-progress branch for Scala 2.12 support using the new shaded 
asm6 package


On 10. Oct 2018, at 15:11, Chesnay Schepler  wrote:

Hi everyone,
Please review and vote on the release candidate #1 for the version 5.0, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

This release
* adds jackson-dataformat-csv to the the shaded-jackson module (used for CSV 
table factory)
* enables dependency convergence
* adds a new flink-shaded-asm-6 module (required for scala 2.12 and java 9)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2], 
which are signed with the key with fingerprint 11D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-5.0-rc1" [5].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343750
[2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-5.0-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1182/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=shortlog;h=refs/tags/5.0-rc1

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Timo Walther

Hi Xuefu,

thanks for your proposal, it is a nice summary. Here are my thoughts to
your list:

1. I think this is also on our current mid-term roadmap. Flink lacks a
poper catalog support for a very long time. Before we can connect
catalogs we need to define how to map all the information from a catalog
to Flink's representation. This is why the work on the unified connector
API [1] is going on for quite some time as it is the first approach to
discuss and represent the pure characteristics of connectors.
2. It would be helpful to figure out what is missing in [1] to to ensure
this point. I guess we will need a new design document just for a proper
Hive catalog integration.
3. This is already work in progress. ORC has been merged, Parquet is on
its way [1].
4. This should be easy. There was a PR in past that I reviewed but was
not maintained anymore.
5. The type system of Flink SQL is very flexible. Only UNION type is
missing.
6. A Flink SQL DDL is on the roadmap soon once we are done with [1].
Support for Hive syntax also needs cooperation with Apache Calcite.

7-11. Long-term goals.

I would also propose to start with a smaller scope where also current
Flink SQL users can profit: 1, 2, 5, 3. This would allow to grow the
Flink SQL ecosystem. After that we can aim to be fully compatible
including syntax and UDFs (4, 6 etc.). Once the core is ready, we can
work on the tooling (7, 8, 9) and performance (10, 11).

@Jörn: Yes, we should not have a tight dependency on Hive. It should be
treated as one "connector" system out of many.

Thanks,
Timo

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?ts=5bb62df4#

[2] https://github.com/apache/flink/pull/6483

Am 11.10.18 um 07:54 schrieb Jörn Franke:

Would it maybe make sense to provide Flink as an engine on Hive
(„flink-on-Hive“)? Eg to address 4,5,6,8,9,10. this could be more loosely
coupled than integrating hive in all possible flink core modules and thus
introducing a very tight dependency to Hive in the core.
1,2,3 could be achieved via a connector based on the Flink Table API.
Just as a proposal to start this Endeavour as independent projects (hive
engine, connector) to avoid too tight coupling with Flink. Maybe in a more
distant future if the Hive integration is heavily demanded one could then
integrate it more tightly if needed.

What is meant by 11?

Am 11.10.2018 um 05:01 schrieb Zhang, Xuefu :

Hi Fabian/Vno,

Thank you very much for your encouragement inquiry. Sorry that I didn't see
Fabian's email until I read Vino's response just now. (Somehow Fabian's went to
the spam folder.)

My proposal contains long-term and short-terms goals. Nevertheless, the effort
will focus on the following areas, including Fabian's list:

1. Hive metastore connectivity - This covers both read/write access, which
means Flink can make full use of Hive's metastore as its catalog (at least for
the batch but can extend for streaming as well).
2. Metadata compatibility - Objects (databases, tables, partitions, etc)
created by Hive can be understood by Flink and the reverse direction is true
also.
3. Data compatibility - Similar to #2, data produced by Hive can be consumed by
Flink and vise versa.
4. Support Hive UDFs - For all Hive's native udfs, Flink either provides its
own implementation or make Hive's implementation work in Flink. Further, for
user created UDFs in Hive, Flink SQL should provide a mechanism allowing user
to import them into Flink without any code change required.
5. Data types - Flink SQL should support all data types that are available in
Hive.
6. SQL Language - Flink SQL should support SQL standard (such as SQL2003) with
extension to support Hive's syntax and language features, around DDL, DML, and
SELECT queries.
7. SQL CLI - this is currently developing in Flink but more effort is needed.
8. Server - provide a server that's compatible with Hive's HiverServer2 in
thrift APIs, such that HiveServer2 users can reuse their existing client (such
as beeline) but connect to Flink's thrift server instead.
9. JDBC/ODBC drivers - Flink may provide its own JDBC/ODBC drivers for other
application to use to connect to its thrift server
10. Support other user's customizations in Hive, such as Hive Serdes, storage
handlers, etc.
11. Better task failure tolerance and task scheduling at Flink runtime.

As you can see, achieving all those requires significant effort and across all
layers in Flink. However, a short-term goal could include only core areas
(such as 1, 2, 4, 5, 6, 7) or start at a smaller scope (such as #3, #6).

Please share your further thoughts. If we generally agree that this is the
right direction, I could come up with a formal proposal quickly and then we can
follow up with broader discussions.

Thanks,
Xuefu

--
Sender:vino yang
Sent at:2018 Oct 11 (Thu) 09:45
Recipient:Fabian Hueske
Cc:dev ;

Re: [DISCUSS] Improvements to the Unified SQL Connector API

2018-10-10 Thread Timo Walther

Hi everyone,

thanks for the feedback that we got so far. I will update the document
in the next couple of hours such that we can continue with the discussion.

Regarding the table type: Actually I just didn't mention it in the
document, because the table type is a SQL Client/External catalog
interface specific property that is evaluated before the unified
connector API (depending on the table type a source and/or sink is
discovered). I agree with Shuyi's comments that it should be possible to
restrict read/write access. The general goal should be that properties
defined in the design document apply to both sources and sinks, i.e., no
special source-only or sink-only properties.

@Rong: Currently, a user can not change the way how a table is used in
the interactive shell. Tables defined in an environment file are
immutable. This will be possible using a SQL DDL in the future.

Regards,
Timo

Am 05.10.18 um 07:30 schrieb Shuyi Chen:

In the case of normal Flink job, I agree we can infer the table type from
the queries. However, for SQL client, the query is adhoc and not known
beforehand. In such case, we might want to enforce the table open mode at
startup time, so users won't accidentally write to a Kafka topic that is
supposed to be written only by some producer. What do you guys think?

Shuyi

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng wrote:

Hi,

Thanks a lot for the proposal. I like the idea to unify table definitions.
I think we can drop the table type since the type can be derived from the
sql, i.e, a table be inserted can only be a sink table.

I left some minor suggestions in the document, mainly include:
- Maybe we also need to allow define properties for tables.
- Support specify Computed Columns in a table
- Support define keys for sources.

Best, Hequn

On Thu, Oct 4, 2018 at 4:09 PM Shuyi Chen wrote:

Thanks a lot for the proposal, Timo. I left a few comments. Also, it

seems

the example in the doc does not have the table type (source, sink and

both)

property anymore. Are you suggesting drop it? I think the table type
properties is still useful as it can restrict a certain connector to be
only source/sink, for example, we usually want a Kafka topic to be either
read-only or write-only, but not both.

Shuyi

On Mon, Oct 1, 2018 at 1:53 AM Timo Walther wrote:

Hi everyone,

as some of you might have noticed, in the last two releases we aimed to
unify SQL connectors and make them more modular. The first connectors
and formats have been implemented and are usable via the SQL Client and
Java/Scala/SQL APIs.

However, after writing more connectors/example programs and talking to
users, there are still a couple of improvements that should be applied
to unified SQL connector API.

I wrote a design document [1] that discusses limitations that I have
observed and consideres feedback that I have collected over the last
months. I don't know whether we will implement all of these
improvements, but it would be great to get feedback for a satisfactory
API and for future priorization.

The general goal should be to connect to external systems as convenient
and type-safe as possible. Any feedback is highly appreciated.

Thanks,

Timo

[1]

https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

--
"So you have to trust that the dots will somehow connect in your future."

Re: [DISCUSS] Improvements to the Unified SQL Connector API

2018-10-02 Thread Timo Walther

Thanks for the feedback Fabian. I updated the document and addressed
your comments.

I agree that tables which are stored in different systems need more
discussion. I would suggest to deprecate the field mapping interfaces in
this release and remove it in the next release.

Regards,
Timo

Am 02.10.18 um 11:06 schrieb Fabian Hueske:

Thanks for the proposal Timo!

I've done a pass and added some comments (mostly asking for clarification,
details).
Overall, this is going into a very good direction.
I think the tables which are stored in different systems and using a format
definition to define other formats require some more discussions.
However, these are also not the features that we would start with.

>From a compatibility point of view, an important question to answer would
be whether we can drop the support for field mapping, i.e., do we have
users who take advantage of mapping format fields to fields with a
different name in the schema.
Besides that, all existing functionality is preserved although the syntax
changes a bit.

Best,
Fabian

Am Mo., 1. Okt. 2018 um 10:53 Uhr schrieb Timo Walther :

Hi everyone,

However, after writing more connectors/example programs and talking to
users, there are still a couple of improvements that should be applied
to unified SQL connector API.

The general goal should be to connect to external systems as convenient
and type-safe as possible. Any feedback is highly appreciated.

Thanks,

Timo

[1]

https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

[DISCUSS] Improvements to the Unified SQL Connector API

2018-10-01 Thread Timo Walther


Hi everyone,

as some of you might have noticed, in the last two releases we aimed to 
unify SQL connectors and make them more modular. The first connectors 
and formats have been implemented and are usable via the SQL Client and 
Java/Scala/SQL APIs.


However, after writing more connectors/example programs and talking to 
users, there are still a couple of improvements that should be applied 
to unified SQL connector API.


I wrote a design document [1] that discusses limitations that I have 
observed and consideres feedback that I have collected over the last 
months. I don't know whether we will implement all of these 
improvements, but it would be great to get feedback for a satisfactory 
API and for future priorization.


The general goal should be to connect to external systems as convenient 
and type-safe as possible. Any feedback is highly appreciated.


Thanks,

Timo

[1] 
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

[jira] [Created] (FLINK-10448) VALUES clause is translated into a separate operator per value

2018-09-27 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10448:


 Summary: VALUES clause is translated into a separate operator per 
value
 Key: FLINK-10448
 URL: https://issues.apache.org/jira/browse/FLINK-10448
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther


It seems that a SQL VALUES clause uses one operator per value under certain 
conditions which leads to a complicated job graph. Given that we need to 
compile code for every operator in the open method, this looks inefficient to 
me.

For example, the following query creates and unions 6 operators together:
{code}
SELECT *
  FROM (
VALUES
  (1, 'Bob', CAST(0 AS BIGINT)),
  (22, 'Alice', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (1, 'Bob', CAST(0 AS BIGINT)))
AS UserCountTable(user_id, user_name, user_count)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [VOTE] Release 1.6.1, release candidate #1

2018-09-20 Thread Timo Walther


+1 (binding)

- Checked all issues that went into the release (I found two JIRA issue 
that have been incorrectly marked)

- Built the source and verify locally successfully
- Run a couple of end-to-end tests successfully

Regards,
Timo


Am 20.09.18 um 11:13 schrieb Tzu-Li (Gordon) Tai:

+1 (binding)

- Verified checksums / signatures
- Checked announcement PR in flink-web
- Built Flink from sources, test + build passes (Hadoop-free, Scala 2.11)
- Ran Elasticsearch 6 sink, using quickstart POM.
- Ran end-to-end tests locally, passes

On Thu, Sep 20, 2018 at 4:31 PM Till Rohrmann  wrote:


+1 (binding)

- Verified checksums and signatures
- Ran all end-to-end tests
- Executed Jepsen test suite (including test for standby JobManager)
- Build Flink from sources
- Verified that no new dependencies were added

Cheers,
Till

On Thu, Sep 20, 2018 at 9:23 AM Till Rohrmann 
wrote:


I would not block the release on FLINK-10243 since you can always
deactivate the latency metrics. Instead we should discuss whether to back
port this scalability improvement and include it in a next bug fix

release.

For that, I suggest to write on the JIRA thread.

Cheers,
Till

On Thu, Sep 20, 2018 at 8:52 AM shimin yang  wrote:


-1

Could you merge the FLINK-10243 into release 1.6.1. I think the
configurable latency metrics will be quite useful and not much work to
merge.

Best,
Shimin

vino yang  于2018年9月20日周四 下午2:08写道：


+1

I checked the new Flink version in the  root pom file.
I checked the announcement blog post and make sure the version number

is

right.
I checked out the source code and ran mvn package (without test)

Thanks, vino.

Fabian Hueske  于2018年9月20日周四 上午4:54写道：


+1 binding

* I checked the diffs and did not find any added dependencies or

updated

dependency versions.
* I checked the sha hash and signatures of all release artifacts.

Best, Fabian

2018-09-19 11:43 GMT+02:00 Gary Yao :


+1 (non-binding)

Ran test suite from the flink-jepsen project on AWS EC2 without

issues.

Best,
Gary

On Sat, Sep 15, 2018 at 11:32 PM, Till Rohrmann <

trohrm...@apache.org>

wrote:


Hi everyone,
Please review and vote on the release candidate #1 for the

version

1.6.1,

as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific

comments)


The complete staging area is available for your review, which

includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience

releases

to

be

deployed to dist.apache.org [2], which are signed with the key
with fingerprint 1F302569A96CFFD5 [3],
* all artifacts to be deployed to the Maven Central Repository

[4],

* source code tag "release-1.6.1-rc1" [5],
* website pull request listing the new release and adding

announcement

blog post [6].

The vote will be open for at least 72 hours. It is adopted by

majority

approval, with at least 3 PMC affirmative votes.

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?
projectId=12315522=12343752
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.6.1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/

orgapacheflink-1180

[5] https://github.com/apache/flink/tree/release-1.6.1-rc1
[6] https://github.com/apache/flink-web/pull/124

Cheers,
Till

Pro-tip: you can create a settings.xml file with these contents:



   flink-1.6.0


   
 flink-1.6.0
 
   
 flink-1.6.0
 



https://repository.apache.org/content/repositories/orgapacheflink-1180/

<

https://repository.apache.org/content/repositories/orgapacheflink-1178/

 
   
   
 archetype
 



https://repository.apache.org/content/repositories/orgapacheflink-1180/

<

https://repository.apache.org/content/repositories/orgapacheflink-1178/

 
   
 
   



And reference that in your maven commands via --settings
path/to/settings.xml. This is useful for creating a quickstart

based

on

the

staged release and for building against the staged jars.

Re: [VOTE] Release 1.5.4, release candidate #1

2018-09-20 Thread Timo Walther


+1 (binding)

- Checked all issues that went into the release (I found one JIRA issue 
that has been incorrectly marked)
- Built from source (On my machine SelfJoinDeadlockITCase is failing due 
to a timeout, in the IDE it works correctly. I guess my machine was just 
too busy.)

- Run some end-to-end tests locally with success

Regards,
Timo


Am 20.09.18 um 11:24 schrieb Tzu-Li (Gordon) Tai:

+1 (binding)

- Verified checksums / signatures
- Checked announcement PR in flink-web
- No new / changed dependencies
- Built from source (Hadoop-free, Scala 2.11)
- Run end-to-end tests locally, passes

On Thu, Sep 20, 2018 at 5:03 AM Fabian Hueske  wrote:


+1 binding

* I checked the diffs and did not find any added dependencies or updated
dependency versions.
* I checked the sha hash and signatures of all release artifacts.

Best, Fabian

2018-09-15 23:26 GMT+02:00 Till Rohrmann :


Hi everyone,
Please review and vote on the release candidate #1 for the version 1.5.4,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to

be

deployed to dist.apache.org [2], which are signed with the key
with fingerprint 1F302569A96CFFD5 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.4-rc1" [5],
* website pull request listing the new release and adding announcement
blog post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?
projectId=12315522=12343899
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.4/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]

https://repository.apache.org/content/repositories/orgapacheflink-1181

[5] https://github.com/apache/flink/tree/release-1.5.4-rc1
[6] https://github.com/apache/flink-web/pull/123

Cheers,
Till

Pro-tip: you can create a settings.xml file with these contents:



   flink-1.6.0


   
 flink-1.6.0
 
   
 flink-1.6.0
 

https://repository.apache.org/content/repositories/orgapacheflink-1181/

Re: [ANNOUNCE] New committer Gary Yao

2018-09-07 Thread Timo Walther


Congratulations, Gary!

Timo


Am 07.09.18 um 16:46 schrieb Ufuk Celebi:

Great addition to the committers. Congrats, Gary!

– Ufuk


On Fri, Sep 7, 2018 at 4:45 PM, Kostas Kloudas
 wrote:

Congratulations Gary! Well deserved!

Cheers,
Kostas


On Sep 7, 2018, at 4:43 PM, Fabian Hueske  wrote:

Congratulations Gary!

2018-09-07 16:29 GMT+02:00 Thomas Weise :


Congrats, Gary!

On Fri, Sep 7, 2018 at 4:17 PM Dawid Wysakowicz 
wrote:


Congratulations Gary! Well deserved!

On 07/09/18 16:00, zhangmingleihe wrote:

Congrats Gary!

Cheers
Minglei


在 2018年9月7日，下午9:59，Andrey Zagrebin  写道：

Congratulations Gary!


On 7 Sep 2018, at 15:45, Stefan Richter 
wrote:

Congrats Gary!


Am 07.09.2018 um 15:14 schrieb Till Rohrmann 
:

Hi everybody,

On behalf of the PMC I am delighted to announce Gary Yao as a new

Flink

committer!

Gary started contributing to the project in June 2017. He helped

with

the

Flip-6 implementation, implemented many of the new REST handlers,

fixed

Mesos issues and initiated the Jepsen-based distributed test suite

which

uncovered several serious issues. Moreover, he actively helps

community

members on the mailing list and with PR reviews.

Please join me in congratulating Gary for becoming a Flink

committer!

Cheers,
Till

[jira] [Created] (FLINK-10273) Access composite type fields after a function

2018-08-31 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10273:


 Summary: Access composite type fields after a function
 Key: FLINK-10273
 URL: https://issues.apache.org/jira/browse/FLINK-10273
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther


If a function returns a composite type, for example, {{Row(lon: Float, lat: 
Float)}}. There is currently no way of accessing fields.

Both queries fail with exceptions:
{code}
select t.c.lat, t.c.lon FROM (select toCoords(12) as c) AS t
{code}

{code}
select toCoords(12).lat
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10269) Elasticsearch UpdateRequest fail because of binary incompatibility

2018-08-31 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10269:


 Summary: Elasticsearch UpdateRequest fail because of binary 
incompatibility
 Key: FLINK-10269
 URL: https://issues.apache.org/jira/browse/FLINK-10269
 Project: Flink
  Issue Type: Bug
  Components: ElasticSearch Connector
Reporter: Timo Walther
Assignee: Timo Walther


When trying to send UpdateRequest(s) to ElasticSearch6, and one gets the 
following
error:

Caused by: java.lang.NoSuchMethodError:
org.elasticsearch.action.bulk.BulkProcessor.add(Lorg/elasticsearch/action/ActionRequest;)Lorg/elasticsearch/action/bulk/BulkProcessor;
at
org.apache.flink.streaming.connectors.elasticsearch.BulkProcessorIndexer.add(BulkProcessorIndexer.java:76)

ElasticsearchSinkFunction:
{code}
import org.elasticsearch.action.update.UpdateRequest
def upsertRequest(element: T): UpdateRequest = {
new UpdateRequest(
"myIndex",
"record",
s"${element.id}")
.doc(element.toMap())
}
override def process(element: T, runtimeContext: RuntimeContext,
requestIndexer: RequestIndexer): Unit = {
requestIndexer.add(upsertRequest(element))
}
{code}

This is due to a binary compatibility issue between the base module (which is 
compiled against a very old ES version and the current Elasticsearch version).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10265) Configure checkpointing behavior for SQL Client

2018-08-30 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10265:


 Summary: Configure checkpointing behavior for SQL Client
 Key: FLINK-10265
 URL: https://issues.apache.org/jira/browse/FLINK-10265
 Project: Flink
  Issue Type: New Feature
  Components: SQL Client
Reporter: Timo Walther


The SQL Client environment file expose checkpointing related properties:

- enable checkpointing
- checkpointing interval
- mode
- timeout
- etc. see {{org.apache.flink.streaming.api.environment.CheckpointConfig}}

Per-job selection of state backends and their configuration.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10261) INSERT INTO does not work with ORDER BY clause

2018-08-30 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10261:


 Summary: INSERT INTO does not work with ORDER BY clause
 Key: FLINK-10261
 URL: https://issues.apache.org/jira/browse/FLINK-10261
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther


It seems that INSERT INTO and ORDER BY do not work well together.

An AssertionError is thrown and the ORDER BY clause is duplicated.

Example:
{code}
@Test
  def testInsertIntoMemoryTable(): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val tEnv = TableEnvironment.getTableEnvironment(env)
MemoryTableSourceSinkUtil.clear()

val t = StreamTestData.getSmall3TupleDataStream(env)
.assignAscendingTimestamps(x => x._2)
  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
tEnv.registerTable("sourceTable", t)

val fieldNames = Array("d", "e", "f", "t")
val fieldTypes = Array(Types.INT, Types.LONG, Types.STRING, 
Types.SQL_TIMESTAMP)
  .asInstanceOf[Array[TypeInformation[_]]]
val sink = new MemoryTableSourceSinkUtil.UnsafeMemoryAppendTableSink
tEnv.registerTableSink("targetTable", fieldNames, fieldTypes, sink)

val sql = "INSERT INTO targetTable SELECT a, b, c, rowtime FROM sourceTable 
ORDER BY a"
tEnv.sqlUpdate(sql)
env.execute()
{code}

Error:
{code}
java.lang.AssertionError: not a query: SELECT `sourceTable`.`a`, 
`sourceTable`.`b`, `sourceTable`.`c`, `sourceTable`.`rowtime`
FROM `sourceTable` AS `sourceTable`
ORDER BY `a`
ORDER BY `a`

at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3069)
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:557)
at 
org.apache.flink.table.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:104)
at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:717)
at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:683)
at 
org.apache.flink.table.runtime.stream.sql.SqlITCase.testInsertIntoMemoryTable(SqlITCase.scala:735)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10258) Allow streaming sources to be present for batch executions

2018-08-30 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10258:


 Summary: Allow streaming sources to be present for batch 
executions 
 Key: FLINK-10258
 URL: https://issues.apache.org/jira/browse/FLINK-10258
 Project: Flink
  Issue Type: Bug
  Components: SQL Client
Reporter: Timo Walther


For example, if a filesystem connector with CSV format is defined and an update 
mode has been set. When switching to {{SET execution.type=batch}} in CLI the 
connector is not valid anymore and an exception blocks the execution of new SQL 
statements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10232) Add a SQL DDL

Timo Walther created FLINK-10232:


 Summary: Add a SQL DDL
 Key: FLINK-10232
 URL: https://issues.apache.org/jira/browse/FLINK-10232
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


This is an umbrella issue for all efforts related to supporting a SQL Data 
Definition Language (DDL) in Flink's Table & SQL API.

Such a DDL includes creating, deleting, replacing:
- tables
- views
- functions
- types
- libraries
- catalogs

If possible, the parsing/validating/logical part should be done using Calcite. 
Related issues are CALCITE-707, CALCITE-2045, CALCITE-2046, CALCITE-2214, and 
others.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10231) Support views in Table API

Timo Walther created FLINK-10231:


 Summary: Support views in Table API
 Key: FLINK-10231
 URL: https://issues.apache.org/jira/browse/FLINK-10231
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther


FLINK-10163 added initial view support for the SQL Client. However, for 
supporting the [full definition of 
views|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView]
 (with schema, comments, etc.) we need to support native support for views in 
the Table API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10230) Support printing the query of a view

Timo Walther created FLINK-10230:


 Summary: Support printing the query of a view
 Key: FLINK-10230
 URL: https://issues.apache.org/jira/browse/FLINK-10230
 Project: Flink
  Issue Type: New Feature
  Components: SQL Client
Reporter: Timo Walther


FLINK-10163 added initial support for views in SQL Client. We should add a 
command that allows for printing the query of a view for debugging. MySQL 
offers {{SHOW CREATE VIEW}} for this. Hive generalizes this to {{SHOW CREATE 
TABLE}}. The latter one could be extended to also show information about the 
used table factories and properties.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10229) Support listing of views

Timo Walther created FLINK-10229:


 Summary: Support listing of views
 Key: FLINK-10229
 URL: https://issues.apache.org/jira/browse/FLINK-10229
 Project: Flink
  Issue Type: New Feature
  Components: SQL Client
Reporter: Timo Walther


FLINK-10163 added initial support of views for the SQL Client. According to 
other database vendors, views are listed in the \{{SHOW TABLES}}. However, 
there should be a way of listing only the views. We can support the \{{SHOW 
VIEWS}} command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: SQL Client Limitations

2018-08-26 Thread Timo Walther

Hi Eron,

yes, FLINK-9172 covers how the SQL Client will discover ExternalCatalog 
similar to how it discovers connectors and formats today. The exact 
design has to be fleshed out but the SQL Client's environment file will 
declare catalogs and their properties. The SQL Client's gateway will 
then perform the communication to external systems.

Regards,
Timo

Am 26.08.18 um 04:54 schrieb Eron Wright:
I'd like to better understand how catalogs will work in SQL Client.  
 I assume we'll be able to reference catalog classes from the 
environment file (e.g. FLINK-9172).

Thanks

On Tue, Aug 21, 2018 at 4:56 AM Fabian Hueske > wrote:

Hi Dominik,

The SQL Client supports the same subset of SQL that you get with
Java /
Scala embedded queries.
The documentation [1] covers all supported operations.

There are some limitations because certain operators require
special time
attributes (row time or processing time attributes) which are
monotonically
increasing.
Some operators such as a regular join (in contrast to a
time-windowed join)
remove the monotonicity property of time attributes such that
time-based
operations cannot be applied anymore.

Best,
Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-
release-1.6/dev/table/sql.html

2018-08-21 13:27 GMT+02:00 Till Rohrmann mailto:trohrm...@apache.org>>:

> Hi Dominik,
>
> I think such a list would be really helpful. I've pulled Timo
and Fabian
> into this conversation because they probably know more.
>
> Cheers,
> Till
>
> On Mon, Aug 20, 2018 at 12:43 PM Dominik Wosiński
mailto:wos...@gmail.com>>
> wrote:
>
>> Hey,
>>
>> Do we have any list of current limitations of SQL Client available
>> somewhere or the only way is to go through JIRA issues?
>>
>> For example:
>> I tried to make Group By Tumble Window and Inner Join in one
query and it
>> seems that it is not possible currently and I was wondering
whether it's
>> and issue with my query or known limitation.
>>
>> Thanks,
>> Best Regards,
>> Dominik.
>>
>

Re: Side effect of DataStreamRel#translateToPlan

2018-08-21 Thread Timo Walther


Hi Wangsan,

the bahavior of DataStreamRel#translateToPlan is more or less intended. 
That's why you call `toAppendStream` on the table environment. Because 
you add your pipeline to the environment (from source to current operator).


However, the explain() method should not cause those side-effects.

Regards,
Timo

Am 21.08.18 um 17:29 schrieb wangsan:

Hi Timo,

I think this may not only affect  explain() method. Method 
DataStreamRel#translateToPlan is called when we need translate a FlinkRelNode 
into DataStream or DataSet, we add desired operators in execution environment. 
By side effect, I mean that if we call DataStreamRel#translateToPlan on same 
RelNode  several times, the same operators are added in execution environment 
more than once, but actually we need that for only one time. Correct me if I 
misunderstood that.

I will open an issue late this day, if this is indeed a problem.

Best,
wangsan




On Aug 21, 2018, at 10:16 PM, Timo Walther  wrote:

Hi,

this sounds like a bug to me. Maybe the explain() method is not implemented 
correctly. Can you open an issue for it in Jira?

Thanks,
Timo


Am 21.08.18 um 15:04 schrieb wangsan:

Hi all,

I noticed that the DataStreamRel#translateToPlan is non-idempotent, and that 
may cause the execution plan not as what we expected. Every time we call 
DataStreamRel#translateToPlan (in TableEnvirnment#explain, 
TableEnvirnment#writeToSink, etc), we add same operators in execution 
environment repeatedly.

Should we eliminate the side effect of DataStreamRel#translateToPlan ?

Best,  Wangsan

appendix

 tenv.registerTableSource("test_source", sourceTable)

 val t = tenv.sqlQuery("SELECT * from test_source")
 println(tenv.explain(t))
 println(tenv.explain(t))

 implicit val typeInfo = TypeInformation.of(classOf[Row])
 tenv.toAppendStream(t)
 println(tenv.explain(t))
We call explain three times, and the Physical Execution Plan are all diffrent.

== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD


== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 4 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 5 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 6 : Operator
 content : Map
 ship_strategy : FORWARD


== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 4 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 5 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 6 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 7 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 8 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 9 : Operator
 content : Map
 ship_strategy : FORWARD

 Stage 10 : Operator
 content : to: Row
 ship_strategy : FORWARD

Stage 11 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 12 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 13 : Operator
 content : Map
 ship_strategy : FORWARD

Re: Side effect of DataStreamRel#translateToPlan

2018-08-21 Thread Timo Walther


Hi,

this sounds like a bug to me. Maybe the explain() method is not 
implemented correctly. Can you open an issue for it in Jira?


Thanks,
Timo


Am 21.08.18 um 15:04 schrieb wangsan:

Hi all,

I noticed that the DataStreamRel#translateToPlan is non-idempotent, and that 
may cause the execution plan not as what we expected. Every time we call 
DataStreamRel#translateToPlan (in TableEnvirnment#explain, 
TableEnvirnment#writeToSink, etc), we add same operators in execution 
environment repeatedly.

Should we eliminate the side effect of DataStreamRel#translateToPlan ?

Best,  Wangsan

appendix

 tenv.registerTableSource("test_source", sourceTable)

 val t = tenv.sqlQuery("SELECT * from test_source")
 println(tenv.explain(t))
 println(tenv.explain(t))

 implicit val typeInfo = TypeInformation.of(classOf[Row])
 tenv.toAppendStream(t)
 println(tenv.explain(t))
We call explain three times, and the Physical Execution Plan are all diffrent.

== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD


== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 4 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 5 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 6 : Operator
 content : Map
 ship_strategy : FORWARD


== Abstract Syntax Tree ==
LogicalProject(f1=[$0], f2=[$1])
   LogicalTableScan(table=[[test_source]])

== Optimized Logical Plan ==
StreamTableSourceScan(table=[[test_source]], fields=[f1, f2], 
source=[CsvTableSource(read fields: f1, f2)])

== Physical Execution Plan ==
Stage 1 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 2 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 3 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 4 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 5 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 6 : Operator
 content : Map
 ship_strategy : FORWARD

Stage 7 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 8 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 9 : Operator
 content : Map
 ship_strategy : FORWARD

 Stage 10 : Operator
 content : to: Row
 ship_strategy : FORWARD

Stage 11 : Data Source
 content : collect elements with CollectionInputFormat

 Stage 12 : Operator
 content : CsvTableSource(read fields: f1, f2)
 ship_strategy : FORWARD

 Stage 13 : Operator
 content : Map
 ship_strategy : FORWARD

[jira] [Created] (FLINK-10173) Elasticsearch 6 connector does not package Elasticsearch

2018-08-20 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10173:


 Summary: Elasticsearch 6 connector does not package Elasticsearch
 Key: FLINK-10173
 URL: https://issues.apache.org/jira/browse/FLINK-10173
 Project: Flink
  Issue Type: Bug
  Components: ElasticSearch Connector
Reporter: Timo Walther


The Elasticsearch 6 Flink connector does not include Elasticsearch itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10169) RowtimeValidator fails with custom TimestampExtractor

2018-08-19 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10169:


 Summary: RowtimeValidator fails with custom TimestampExtractor
 Key: FLINK-10169
 URL: https://issues.apache.org/jira/browse/FLINK-10169
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther


Wrong property conversion for custom TimestampExtractor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10163) Support CREATE VIEW in SQL Client

2018-08-16 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10163:


 Summary: Support CREATE VIEW in SQL Client
 Key: FLINK-10163
 URL: https://issues.apache.org/jira/browse/FLINK-10163
 Project: Flink
  Issue Type: New Feature
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


The possibility to define a name for a subquery would improve the usability of 
the SQL Client. The SQL standard defines \{{CREATE VIEW}} for defining a 
virtual table.

 

Example:

{code}
CREATE VIEW view_name AS SELECT 
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Release Flink 1.5.3

2018-08-16 Thread Timo Walther


Thank you for starting this discussion.

+1 for this

Regards,
Timo

Am 16.08.18 um 09:27 schrieb vino yang:

Agree! This sounds very good.

Till Rohrmann  于2018年8月16日周四 下午3:14写道：


+1 for starting the release process 1.5.3 immediately. We can always
create another bug fix release afterwards. I think the more often we
release the better it is for our users, because they receive incremental
improvements faster.

Cheers,
Till

On Thu, Aug 16, 2018 at 8:52 AM Chesnay Schepler 
wrote:


I would actually start with the release process today unless anyone
objects.

On 16.08.2018 08:46, vino yang wrote:

Hi Chesnay,

+1
   I want to know when you plan to cut the branch.

Thanks, vino.

Chesnay Schepler  于2018年8月16日周四 下午2:29写道：


Hello everyone,

it has been a little over 2 weeks since 1.5.2 was released, and since
then a number of fixes were added to the release-1.5 that I think we
should push to users.

Notable fixes, (among others) are FLINK-9969 which fixes a memory leak
when using batch, FLINK-10066 that reduces the memory print on the JM
for long-running jobs or FLINK-10070 that reverses a regression
introduced in 1.5.2 due to which Flink could not be compiled with
certain maven versions.

I would of course volunteer as Release Manager.

Re: Request contribution permission

2018-08-10 Thread Timo Walther


Hi,

I gave you contributor permissions.

Regards,
Timo

Am 10.08.18 um 13:05 schrieb 邓林:

  Hi Flink community,

I want to contribute code to Flink, can anyone give me
contribution permission? My JIRA account name is  lyndldeng.

Thanks.

[jira] [Created] (FLINK-10107) SQL Client end-to-end test fails for releases

2018-08-09 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10107:


 Summary: SQL Client end-to-end test fails for releases
 Key: FLINK-10107
 URL: https://issues.apache.org/jira/browse/FLINK-10107
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


It seems that SQL JARs for Kafka 0.10 and Kafka 0.9 have conflicts that only 
occur for releases and not SNAPSHOT builds. This might be due to their file 
name. Depending on the file name either 0.9 is loaded before 0.10 and vice 
versa.

One of the following errors occured:
{code}
2018-08-08 18:28:51,636 ERROR 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.ClientUtils  - Failed 
to close coordinator
java.lang.NoClassDefFoundError: 
org/apache/flink/kafka09/shaded/org/apache/kafka/common/requests/OffsetCommitResponse
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:473)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:357)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.maybeAutoCommitOffsetsSync(ConsumerCoordinator.java:439)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.close(ConsumerCoordinator.java:319)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.ClientUtils.closeQuietly(ClientUtils.java:63)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1277)
at 
org.apache.flink.kafka09.shaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:1258)
at 
org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:286)
Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.kafka09.shaded.org.apache.kafka.common.requests.OffsetCommitResponse
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at 
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:120)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 8 more
{code}
{code}
java.lang.NoSuchFieldError: producer
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010.invoke(FlinkKafkaProducer010.java:369)
at 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [VOTE] Release 1.6.0, release candidate #4

2018-08-08 Thread Timo Walther


+1

- successfully run `mvn clean verify` locally
- successfully run end-to-end tests locally (except for SQL Client 
end-to-end test)


Found a bug in the class loading of SQL JAR files. This is not a blocker 
but a bug that we should fix soon. As an easy workaround user should not 
use different Kafka versions as SQL Client dependencies.


Regards,
Timo

Am 08.08.18 um 18:10 schrieb Dawid Wysakowicz:

+1

- verified compilation, tests
- verified checksum and gpg files
- verified sbt templates (g8, quickstart) - run assemblies on local cluster

- I could not execute the nightly-tests.sh though. The tests that were
failing most often are:
     - test_streaming_file_sink.sh
     - test_streaming_elasticsearch.sh

Those are connectors though and it might be only tests flakiness so I
think it should not block the release.

On 08/08/18 16:36, Chesnay Schepler wrote:

I did not use the tools/list_deps.py script as I wasn't aware that it
existed.

Even if I were I wouldn't have used it and in fact would advocate for
removing it.
It manually parses and constructs dependency information which is
utterly unnecessary as maven already provides this functionality, with
the added bonus of also accounting for dependencyManagement and
transitive dependencies which we obviously have to take into account.

I used this one-liner instead:
|mvn dependency:list | ||grep| |":.*:.*:.*"| || ||grep| |-||v| |-e
||"Finished at"| |-e ||"Some problems"| || ||cut| |-d] -f2- | ||sed|
|'s/:[a-z]*$//g'| || ||sort| |-u

|which I have documented here:
https://cwiki.apache.org/confluence/display/FLINK/Dependencies

On 08.08.2018 15:06, Aljoscha Krettek wrote:

+1

- verified checksum and gpg files
- verified LICENSE and NOTICE: NOTICE didn't change from 1.5, LICENSE
had one unnecessary part removed

Side comment: I'm not sure whether the "Verify that the LICENSE and
NOTICE file is correct for the binary and source releases" part is
valid anymore because we only have one LICENSE and NOTICE file. also
"The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer
to the binary distribution and mention all of Flink's Maven
dependencies as well" can be dropped because we don't have them anymore.

I came to the same conclusion on dependencies. I used
tools/list_deps.py and diff'ed the output for 1.5 and 1.6, that's
probably what Chesnay also did ... :-)


On 8. Aug 2018, at 14:43, Chesnay Schepler  wrote:

+1

- verified source release contains no binaries
- verified correct versions in source release
- verified compilation, tests and E2E-tests pass (on travis)
- verified checksum and gpg files

New dependencies (excluding dependencies where we simply depend on a
different version now):
     Apache licensed:
     io.confluent:common-utils:jar:3.3.1
     io.confluent:kafka-schema-registry-client:jar:3.3.1
     io.prometheus:simpleclient_pushgateway:jar:0.3.0
     various Apache Nifi dependencies
     various Apache Parquet dependencies
     various ElasticSearch dependencies
     CDDL:
     javax.ws.rs:javax.ws.rs-api:jar:2.1
     Bouncycastle (MIT-like):
     org.bouncycastle:bcpkix-jdk15on:jar:1.59
     org.bouncycastle:bcprov-jdk15on:jar:1.59
     MIT:
     org.projectlombok:lombok:jar:1.16.20

On 08.08.2018 13:28, Till Rohrmann wrote:

Thanks for reporting these problems Chesnay. The usage string in
`standalone-job.sh` is out dated and should be updated. The same
applies to
the typo.

When calling `standalone-job.sh start --job-classname foobar.Job`
please
make sure that the user code jar is contained in the classpath (e.g.
putting the jar in the lib directory). Documenting this behaviour
is part
of the pending issue FLINK-10001.

We should fix all of these issues. They are, however, no release
blockers.

Cheers,
Till

On Wed, Aug 8, 2018 at 11:31 AM Chesnay Schepler
 wrote:


I found some issues with the standalone-job.sh script.

I ran "./bin/standalone-job.sh start" as described by the usage
string.

  2018-08-08 09:22:34,385 ERROR
  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
  Could not parse command line arguments [--configDir,
  /home/zento/svn/flink-1.6.0/flink-1.6.0/conf].
  org.apache.flink.runtime.entrypoint.FlinkParseException:
Failed to
  parse the command line arguments.
   at

org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:52)

   at

org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:143)

  Caused by: org.apache.commons.cli.MissingOptionException:
Missing
  required option: j
   at

org.apache.commons.cli.DefaultParser.checkRequiredOptions(DefaultParser.java:199)

   at
 
org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:130)

   at
 
org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:81)

   at

[jira] [Created] (FLINK-10073) Allow setting a restart strategy in SQL Client

2018-08-06 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10073:


 Summary: Allow setting a restart strategy in SQL Client
 Key: FLINK-10073
 URL: https://issues.apache.org/jira/browse/FLINK-10073
 Project: Flink
  Issue Type: Improvement
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Currently, it is not possible to set a restart strategy per job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10071) Document usage of INSERT INTO in SQL Client

2018-08-06 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10071:


 Summary: Document usage of INSERT INTO in SQL Client
 Key: FLINK-10071
 URL: https://issues.apache.org/jira/browse/FLINK-10071
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Document the usage of {{INSERT INTO}} statements in SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10062) Docs are not built due to git-commit-id-plugin

2018-08-05 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10062:


 Summary: Docs are not built due to git-commit-id-plugin
 Key: FLINK-10062
 URL: https://issues.apache.org/jira/browse/FLINK-10062
 Project: Flink
  Issue Type: Bug
  Components: Build System, Documentation
Reporter: Timo Walther


The recent changes in FLINK-9986 causes errors during the building of the docs.

{code}
[ERROR] Failed to execute goal 
pl.project13.maven:git-commit-id-plugin:2.1.14:revision (default) on project 
flink-runtime_2.11: The plugin pl.project13.maven:git-commit-id-plugin:2.1.14 
requires Maven version [3.1.1,) -> [Help 1]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Permission for contributor status

2018-08-03 Thread Timo Walther


Hi Sampath,

I added you as a contributor. You can now assign issues to yourself.

Regards,
Timo

Am 03.08.18 um 11:42 schrieb Sampath Bhat:

Hello Till

Could you please add sampathBhat to the list of contributors in JIRA.

Thank you
Sampath

On Fri, Aug 3, 2018 at 1:18 PM, Till Rohrmann  wrote:


Hi Sampath,

everyone can be a contributor without special permissions required. The
only thing for which you need permissions is to assign JIRA issues to
yourself. If you want to do this, then let us know your JIRA account name
and we will add you to the list of contributors in JIRA.

Cheers,
Till

On Fri, Aug 3, 2018 at 9:04 AM Sampath Bhat 
wrote:


Hello flink dev team

I would like to know what is the procedure for obtaining contributor
permission for flink.

Thanks
Sampath

[jira] [Created] (FLINK-10031) Support handling late events in Table API & SQL

2018-08-02 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10031:


 Summary: Support handling late events in Table API & SQL
 Key: FLINK-10031
 URL: https://issues.apache.org/jira/browse/FLINK-10031
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Reporter: Timo Walther


The Table API & SQL drop late events right now. We should offer something like 
a side channel that allows to capture late events for separate processing. For 
example, this could either be simply a table sink or a special table (in Table 
API) derived from the original table that allows further processing. The exact 
design needs to be discussed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-10017) Support keyed formats for Kafka table source sink factory

2018-08-01 Thread Timo Walther (JIRA)

Timo Walther created FLINK-10017:


 Summary: Support keyed formats for Kafka table source sink factory
 Key: FLINK-10017
 URL: https://issues.apache.org/jira/browse/FLINK-10017
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Currently, the Kafka table source sink factory does not support keyed formats 
that would result in {{KeyedSerializationSchema}} and 
{{KeyedDeserializationSchema}}. It should be possible to define:

- a key format
- a value format
- a target topic



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Error while building 1.5.1 from source

2018-08-01 Thread Timo Walther


Hi,

the Maven enforcer plugin produced invalid errors under certain 
conditions. See also the discussion here [1]. If you start the build 
again, it should succeed. This issue is fixed in 1.5.2.


Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-9091


Am 01.08.18 um 11:26 schrieb vino yang:

Hi shashank,

Can you provide more details to assist with locating problems?

Thanks, vino.

2018-08-01 15:46 GMT+08:00 shashank734 :


Facing issue with Apache flink 1.5.1 while building from source.

MVN : 3.2.5

Command : mvn clean install -DskipTests -Dscala.version=2.11.7
-Pvendor-repos -Dhadoop.version=2.7.3.2.6.2.0-205

Error : [ERROR] Failed to execute goal
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce
(dependency-convergence) on project flink-bucketing-sink-test: Some
Enforcer
rules have failed. Look above for specific messages explaining why the rule
failed. -> [Help 1]



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: [RESULT][VOTE] Release 1.5.2, release candidate #2

2018-07-31 Thread Timo Walther


Thanks for managing the release process Chesnay!

Timo

Am 31.07.18 um 10:05 schrieb Chesnay Schepler:

I'm happy to announce that we have approved this release.

There are 5 approving votes, 3 of which are binding:

* Yaz (non-binding)
* Thomas (non-binding)
* Till (binding)
* Timo (binding)
* Chesnay (binding)

There is 1 disapproving vote:

* Piotr (non-binding)

Thanks everyone!


On 26.07.2018 10:39, Chesnay Schepler wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 
1.5.2, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases 
to be deployed to dist.apache.org [2], which are signed with the key 
with fingerprint 11D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.1-rc2" [5],
* website pull request listing the new release and adding 
announcement blog post [6].


The vote will be open for at least 72 hours. It is adopted by 
majority approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343588

[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1174
[5] 
https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.5.2-rc2

[6] https://github.com/apache/flink-web/pull/114

Re: [VOTE] Release 1.5.2, release candidate #2

2018-07-30 Thread Timo Walther


+1

- run a couple of programs from the Web UI
- run SQL Client table programs (failing and non-failing)
- run a couple end-to-end tests on my local machine

Caveat: The test_streaming_classloader.sh does not work on releases. But 
this is a bug in the test, not in the release (see FLINK-9987).


Regards,
Timo

Am 27.07.18 um 08:35 schrieb Chesnay Schepler:
I've opened a PR to fix this from happening again: 
https://github.com/apache/flink/pull/6436


In the mean-time I will regenerate the sha for the source release.

On 27.07.2018 08:27, Chesnay Schepler wrote:
That is indeed weird. It only seems to affect the source release, 
will look into it. This also affects 1.6.0 RC1, so it's at least not 
an issue of my environment.


On 27.07.2018 02:16, Thomas Weise wrote:

+1 but see below

Following shasum verification failed:

shasum:
/mnt/c/Users/Zento/Documents/GitHub/flink/tools/releasing/release/flink-1.5.2-src.tgz: 

/mnt/c/Users/Zento/Documents/GitHub/flink/tools/releasing/release/flink-1.5.2-src.tgz: 


FAILED open or read

The problem can be corrected by fixing the file path in
*flink-1.5.2-src.tgz.sha512* - I would recommend doing that before
promoting the release.

Successfully run Beam pipeline against locally running Flink cluster.

Thanks




On Thu, Jul 26, 2018 at 1:39 AM Chesnay Schepler 
 wrote:



Hi everyone,
Please review and vote on the release candidate #2 for the version
1.5.2, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which 
includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience 
releases to

be deployed to dist.apache.org [2], which are signed with the key with
fingerprint 11D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.1-rc2" [5],
* website pull request listing the new release and adding announcement
blog post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay

[1]

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343588 


[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1174

[5]

https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.5.2-rc2 


[6] https://github.com/apache/flink-web/pull/114

[jira] [Created] (FLINK-9984) Add a byte array table format factory

2018-07-27 Thread Timo Walther (JIRA)

Timo Walther created FLINK-9984:
---

 Summary: Add a byte array table format factory
 Key: FLINK-9984
 URL: https://issues.apache.org/jira/browse/FLINK-9984
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


Sometimes it might be useful to just read or write a plain byte array into 
Kafka or other connectors. We should add a simple byte array 
SerializationSchemaFactory and DeserializationSchemaFactory.

See {{org.apache.flink.formats.json.JsonRowFormatFactory}} for a reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Access generic pojo fields

2018-07-27 Thread Timo Walther

I tried to reproduce your error but everything worked fine. Which Flink 
version are you using?


Inner joins are a Flink 1.5 feature.


Am 27.07.18 um 13:28 schrieb Amol S - iProgrammer:

Table master = table1.filter("ns === 'Master'").select("o as master,
'accessBasicDBObject(applicationId,o)' as primaryKey");
Table child1 = table1.filter("ns === 'Child1'").select("o  as child1,
'accessBasicDBObject(applicationId,o)' as foreignKey");
Table child2 = table1.filter("ns === 'Child2'").select("o  as child2,
'accessBasicDBObject(applicationId,o)' as foreignKey2");

Table result = 
master.join(child1).where("primaryKey==foreignKey").join(child2).where("primaryKey==foreignKey2");

Re: Access generic pojo fields

2018-07-27 Thread Timo Walther


Hi,

I think the exception is self-explaining. BasicDBObject is not 
recognized as a POJO by Flink. A POJO is required such that the Table 
API knows the types of fields for following operations.


The easiest way is to implement your own scalar function. E.g. a 
`accessBasicDBObject(obj, key)`.


Regards,
Timo


Am 27.07.18 um 11:25 schrieb Amol S - iProgrammer:

Hello Timo,

Thanks for quick reply. By using your suggestion Previous exception gone
but it is giving me following exception

Expression 'o.get(_id) failed on input check: Cannot access field of
non-composite type 'GenericType'.

---
*Amol Suryawanshi*
Java Developer
am...@iprogrammer.com


*iProgrammer Solutions Pvt. Ltd.*



*Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune - 411016,
MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
www.iprogrammer.com 


On Fri, Jul 27, 2018 at 1:08 PM, Timo Walther  wrote:


Hi Amol,

the dot operation is reserved for calling functions on fields. If you want
to get a nested field in the Table API, use the `.get("applicationId")`
operation. See also [1] under "Value access functions".

Regards,
Timo

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/
dev/table/tableApi.html#built-in-functions


Am 27.07.18 um 09:10 schrieb Amol S - iProgrammer:


Hello Fabian,

I am streaming my mongodb oplog using flink and want to use flink table
API
to join multiple tables.  My code looks like

DataStream streamSource = env
  .addSource(kafkaConsumer)
  .setParallelism(4);

StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvir
onment(env);
// Convert the DataStream into a Table with default fields "f0", "f1"
Table table1 = tableEnv.fromDataStream(streamSource);

Table master = table1.filter("ns === 'Master'").select("o as master,
o.applicationId as primaryKey");
Table child1 = table1.filter("ns === 'Child1'").select("o  as child1,
o.applicationId as foreignKey");
Table child2 = table1.filter("ns === 'Child2'").select("o  as child2,
o.applicationId as foreignKey2");

Table result = master.join(child1).where("pri
maryKey==foreignKey").join(child2).where("primaryKey==foreignKey2");

it is throwing error "Method threw
'org.apache.flink.table.api.ValidationException' exception. Undefined
function: APPLICATIONID"

public class Oplog implements Serializable{
  private BasicDBObject o;
}

Where o is generic java type for fetching mongodb oplog and I can not
replace this generic type with static pojo's. please tell me any work
around on this.

BasicDBObject suffice following two rules


 -

 The class must be public.
 -

 It must have a public constructor without arguments (default
constructor)

and we can access class members through basicDBObject.getString("abc")




---
*Amol Suryawanshi*
Java Developer
am...@iprogrammer.com


*iProgrammer Solutions Pvt. Ltd.*



*Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune -
411016,
MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
www.iprogrammer.com

[jira] [Created] (FLINK-9979) Support a custom FlinkKafkaPartitioner for a Kafka table sink factory

2018-07-27 Thread Timo Walther (JIRA)

Timo Walther created FLINK-9979:
---

 Summary: Support a custom FlinkKafkaPartitioner for a Kafka table 
sink factory
 Key: FLINK-9979
 URL: https://issues.apache.org/jira/browse/FLINK-9979
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


Currently, the Kafka table sink factory does not support a custom 
FlinkKafkaPartitioner. However, this is needed for many use cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Access generic pojo fields

2018-07-27 Thread Timo Walther


Hi Amol,

the dot operation is reserved for calling functions on fields. If you 
want to get a nested field in the Table API, use the 
`.get("applicationId")` operation. See also [1] under "Value access 
functions".


Regards,
Timo

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/table/tableApi.html#built-in-functions



Am 27.07.18 um 09:10 schrieb Amol S - iProgrammer:

Hello Fabian,

I am streaming my mongodb oplog using flink and want to use flink table API
to join multiple tables.  My code looks like

DataStream streamSource = env
 .addSource(kafkaConsumer)
 .setParallelism(4);

StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
// Convert the DataStream into a Table with default fields "f0", "f1"
Table table1 = tableEnv.fromDataStream(streamSource);

Table master = table1.filter("ns === 'Master'").select("o as master,
o.applicationId as primaryKey");
Table child1 = table1.filter("ns === 'Child1'").select("o  as child1,
o.applicationId as foreignKey");
Table child2 = table1.filter("ns === 'Child2'").select("o  as child2,
o.applicationId as foreignKey2");

Table result = 
master.join(child1).where("primaryKey==foreignKey").join(child2).where("primaryKey==foreignKey2");

it is throwing error "Method threw
'org.apache.flink.table.api.ValidationException' exception. Undefined
function: APPLICATIONID"

public class Oplog implements Serializable{
 private BasicDBObject o;
}

Where o is generic java type for fetching mongodb oplog and I can not
replace this generic type with static pojo's. please tell me any work
around on this.

BasicDBObject suffice following two rules


-

The class must be public.
-

It must have a public constructor without arguments (default constructor)

and we can access class members through basicDBObject.getString("abc")




---
*Amol Suryawanshi*
Java Developer
am...@iprogrammer.com


*iProgrammer Solutions Pvt. Ltd.*



*Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune - 411016,
MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
www.iprogrammer.com

Re: [VOTE] Release 1.5.2, release candidate #2

2018-07-26 Thread Timo Walther

I agree with Chesnay. Only regressions in the release candidate should 
cancel minor releases.


Timo


Am 26.07.18 um 15:02 schrieb Chesnay Schepler:

Since the regression already existed in 1.5.0 I will not cancel the vote,
as there's no benefit to canceling the bugfix release.

If a fix is found within the voting period we should consider it so we
don't have to create a separate release shortly afterwards for this 
issue.


On 26.07.2018 14:29, Piotr Nowojski wrote:

Voting -1

Flip6 in 1.5.0 seems like introduced regression in resource 
management in batch: https://issues.apache.org/jira/browse/FLINK-9969 



It’s not a super new bug, yet still I would block the release until 
we either fix it or confirm that’s a some misconfiguration.


Piotrek


On 26 Jul 2018, at 10:39, Chesnay Schepler  wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 
1.5.2, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases 
to be deployed to dist.apache.org [2], which are signed with the key 
with fingerprint 11D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.1-rc2" [5],
* website pull request listing the new release and adding 
announcement blog post [6].


The vote will be open for at least 72 hours. It is adopted by 
majority approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12343588

[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1174
[5] 
https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.5.2-rc2

[6] https://github.com/apache/flink-web/pull/114

[jira] [Created] (FLINK-9966) Add a ORC table factory

Timo Walther created FLINK-9966:
---

 Summary: Add a ORC table factory
 Key: FLINK-9966
 URL: https://issues.apache.org/jira/browse/FLINK-9966
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


We should allow to define an `OrcTableSource` using a table factory. How we 
split connector and format is up for discussion. An ORC format might also be 
necessary for the new streaming file sink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-9965) Support schema derivation in Avro table format factory

Timo Walther created FLINK-9965:
---

 Summary: Support schema derivation in Avro table format factory
 Key: FLINK-9965
 URL: https://issues.apache.org/jira/browse/FLINK-9965
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


Currently, only the {{org.apache.flink.formats.json.JsonRowFormatFactory}} is 
able to use the information provided by the table schema to derive the JSON 
format schema. Avro should support this as well in order to avoid specifying a 
schema twice. This requires the inverse operation that 
{{org.apache.flink.formats.avro.typeutils.AvroSchemaConverter}} does. Instead 
of avroschema-to-typeinfo we need a typeinfo-to-avroschema converter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-9964) Add a CSV table format factory

Timo Walther created FLINK-9964:
---

 Summary: Add a CSV table format factory
 Key: FLINK-9964
 URL: https://issues.apache.org/jira/browse/FLINK-9964
 Project: Flink
  Issue Type: Sub-task
  Components: Table API  SQL
Reporter: Timo Walther


We should add a RFC 4180 compliant CSV table format factory to read and write 
data into Kafka and other connectors. This requires a 
{{SerializationSchemaFactory}} and {{DeserializationSchemaFactory}}. How we 
want to represent all data types and nested types is still up for discussion. 
For example, we could flatten and deflatten nested types as it is done 
[here|http://support.gnip.com/articles/json2csv.html]. We can also have a look 
how tools such as the Avro to CSV tool perform the conversion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-9963) Create a string table format factory