Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Timo Walther Thu, 29 Nov 2018 00:42:51 -0800

@Kurt: Yes, I don't think that forks of Flink will have a hard time keeping up with the porting. That is also why I called this a `long-term goal`: I don't see big resources that would make the porting happen quicker. But at least new features, the API, and the runtime profit from the Scala-to-Java conversion.


@Jark: I updated the document:


1. flink-table-common has been renamed to flink-table-spi by request.

2. Yes, good point. flink-sql-client can be moved there as well.

3. I added a paragraph to the document. Porting the code generation to Java only makes sense if acceptable tooling for it is in place.



Thanks for the feedback,

Timo


On 29.11.18 at 08:28, Jark Wu wrote:

Hi Timo,

Thanks for the great work!

Moving flink-table to Java is a long-awaited thing but will involve much
effort. I agree that we should make it a long-term goal.

I have read the google doc and +1 for the proposal. Here I have some
questions:

1. Where should the flink-table-common module be placed? Will we move the
flink-table-common classes to the new modules?
2. Should flink-sql-client also be a sub-module under flink-table?
3. The flink-table-planner contains code generation and will be converted
to Java. Actually, I prefer using Scala for code generation because of the
multiline string and string interpolation (e.g. s"hello $user") features in
Scala. They make the code-generation code more readable. Do we really
want to migrate code generation to Java?
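
For comparison, here is a minimal sketch of what string-based code generation could look like in plain Java, which has no s"..." interpolation. The names CodeGenSketch and generateMapper and the generated map method are hypothetical and not taken from flink-table; String.join and String.format are standard Java 8 APIs.

```java
public final class CodeGenSketch {

    // Hypothetical helper: renders the source of a tiny class as a string.
    // In Scala the same template could be a single multiline s"""...""" literal.
    public static String generateMapper(String className, String expression) {
        // Each template line is a separate Java string; %s marks a placeholder.
        String template = String.join("\n",
            "public class %s {",
            "    public long map(long value) {",
            "        return %s;",
            "    }",
            "}");
        return String.format(template, className, expression);
    }

    public static void main(String[] args) {
        // Prints the generated source for a class that adds one to its input.
        System.out.println(generateMapper("PlusOne", "value + 1"));
    }
}
```

The template stays readable, but placeholders are positional rather than named, which is part of the readability concern raised in point 3.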

Best,
Jark


On Wed, 28 Nov 2018 at 09:14, Kurt Young <[email protected]> wrote:

Hi Timo and Vino,

I agree that table is very active and there is no guarantee against
conflicts if you decide to develop based on the community version. I think
this is a risk we can anticipate from the start. But a massive language
replacement is something you cannot anticipate and prepare for: no feature
is added, no refactoring is done, and simply changing from Scala to Java
will cause lots of conflicts.

But I also agree that this is "technical debt" that we should eventually
pay. As you said, we can do this slowly, even one file at a time, to give
other people more time to resolve the conflicts.

Best,
Kurt


On Tue, Nov 27, 2018 at 8:37 PM Timo Walther <[email protected]> wrote:

Hi Kurt,

I understand your concerns. However, there is no concrete roadmap for
Flink 2.0 and (as Vino said) the flink-table is developed very actively.
Major refactorings happened in the past and will also happen with or
without Scala migration. A good example is the proper catalog support,
which will refactor big parts of the TableEnvironment class. Another is the
introduction of "retractions", which needed a big refactoring of the
planning phase. Stability is only guaranteed for the API and the general
behavior; however, flink-table is currently not using @Public or
@PublicEvolving annotations for a reason.

I think the migration will still happen slowly because it needs people
that allocate time for that. Therefore, even Flink forks can slowly
adapt to the evolving Scala-to-Java code base.

Regards,
Timo


On 27.11.18 at 13:16, vino yang wrote:

Hi Kurt,

Currently, Flink 2.0 is still a long way off. Considering that flink-table
is one of the most active modules in the current Flink project, each
version has a number of changes and features added. I think that
refactoring sooner will reduce subsequent complexity and workload. This may
be a gradual and long process. We should regard it as "technical debt"; if
it is not addressed, it will also affect decision-making on other issues.

Thanks, vino.

On Tue, Nov 27, 2018 at 7:34 PM, Kurt Young <[email protected]> wrote:

Hi Timo,

Thanks for writing up the document. I'm +1 for reorganizing the module
structure and making table Scala-free. But I have a little concern about
the timing. Would it be more appropriate to get this done when Flink
decides to bump to the next major version, like 2.x?
It's true that you can keep every class's package path as it is and not
introduce any API change. But companies that develop their own Flink and
sync with the community version by rebasing may face a lot of conflicts.
Although you can avoid conflicts when source code is only moved between
packages, I assume you still need to delete the original Scala file and add
a new Java file when you change the programming language.

Best,
Kurt


On Tue, Nov 27, 2018 at 5:57 PM Timo Walther <[email protected]> wrote:

Hi Hequn,

thanks for your feedback. Yes, migrating the test cases is another issue
that is not represented in the document but should naturally go along
with the migration.

I agree that we should migrate the main API classes quickly within this
1.8 release after the module split has been performed. Help here is
highly appreciated!

I forgot that Java supports static methods in interfaces now, but
actually I don't like the design of calling `TableEnvironment.get(env)`,
because people often write `TableEnvironment tEnv =
TableEnvironment.get(env)` and then wonder why there is no
`toAppendStream` or `toDataSet`: they are using the base class.
However, things like that can be discussed in the corresponding issue
when it comes to implementation.
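
The concern about a static factory on the base type can be sketched as follows. This is only an illustration under stated assumptions: the nested TableEnvironment and StreamTableEnvironment types here are simplified, hypothetical stand-ins that use strings instead of real execution environments, and they are not the actual flink-table classes.

```java
public final class StaticFactorySketch {

    // Hypothetical stand-in for the base API type.
    public interface TableEnvironment {
        String sqlQuery(String query);

        // Java 8+ allows static methods on interfaces.
        // The declared return type is the base interface.
        static TableEnvironment get(String executionEnvironment) {
            return new StreamTableEnvironment(executionEnvironment);
        }
    }

    // Hypothetical stream-specific subtype with an extra method.
    public static final class StreamTableEnvironment implements TableEnvironment {
        private final String env;

        StreamTableEnvironment(String env) {
            this.env = env;
        }

        @Override
        public String sqlQuery(String query) {
            return env + ": " + query;
        }

        // Only available on the stream-specific subtype.
        public String toAppendStream(String table) {
            return "stream of " + table;
        }
    }

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.get("stream-env");
        // tEnv.toAppendStream("t"); // would not compile: the declared type is the base interface
        StreamTableEnvironment streamEnv = (StreamTableEnvironment) tEnv; // explicit cast needed
        System.out.println(streamEnv.toAppendStream("t"));
    }
}
```

The factory itself works; the usability problem is that the natural declaration hides the subtype-specific methods unless the user knows to cast or to declare the more specific type.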

@Vino: I think your work fits nicely with these efforts.

@everyone: I will wait for more feedback until the end of this week. Then I
will convert the design document into a FLIP and open subtasks in Jira,
if there are no objections.

Regards,
Timo

On 24.11.18 at 13:45, vino yang wrote:

Hi hequn,

I am very glad to hear that you are interested in this work.
As we all know, this process involves a lot.
The migration work has already begun. I started with the
Kafka connector's dependency on flink-table and moved the
related dependencies to flink-table-common.
This work is tracked by FLINK-9461 [1].
I don't know if it will conflict with what you plan to do, but from the
impact I have observed, it will involve many classes that are currently in
flink-table.

*Just a statement to prevent unnecessary conflicts.*

Thanks, vino.

[1]: https://issues.apache.org/jira/browse/FLINK-9461

On Sat, Nov 24, 2018 at 7:20 PM, Hequn Cheng <[email protected]> wrote:

Hi Timo,

Thanks for the effort and for writing up this document. I like the idea of
making flink-table Scala-free, so +1 for the proposal!

It's good to make Java a first-class citizen. For a long time, we have
neglected Java, so many Table features are missing from the Java test
cases, such as this one [1] that I found recently. I think we may also need
to migrate our test cases, i.e., add Java tests.

This is definitely a big change and will break API compatibility. To reduce
the impact on users, I think we should move fast when we migrate the
user-facing APIs. It would be better to introduce the user-sensitive
changes within a single release. However, that may not be easy. I can help
contribute.

Separation of interface and implementation is a good idea. This may
result in minimal dependencies or even no dependencies. I saw your reply in
the Google doc. Java 8 already supports static methods in interfaces; I
think we can make use of that?

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001


On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <[email protected]> wrote:

Hi everyone,

thanks for the great feedback so far. I updated the document with the
input I got so far.

@Fabian: I moved the porting of flink-table-runtime classes up in the
list.

@Xiaowei: Could you elaborate what "interface only" means to you? Do you
mean a module containing pure Java `interface`s? Or is the validation
logic also part of the API module? Are 50+ expression classes part of
the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we
should develop in Scala and when in Java. As Piotr said, every new Scala
line is instant technical debt.

Thanks,
Timo


On 23.11.18 at 10:29, Piotr Nowojski wrote:

Hi Timo,

Thanks for writing this down +1 from my side :)

> I'm wondering whether we can have a rule in the interim, while Java and
> Scala coexist, that dependency can only be one-way. I found that in the
> current code base there are cases where a Scala class extends Java and
> vice versa. This is quite painful. I'm thinking we could say that
> extension can only be from Java to Scala, which would help the situation.
> However, I'm not sure if this is practical.

Xuefu: I'm also not sure what the best approach is here; we will probably
have to work it out as we go. One thing to consider is that from now on,
every single new line of code written in Scala anywhere in flink-table
(except flink-table-api-scala) is instant technological debt. From this
perspective, I would be in favour of tolerating quite big inconveniences
just to avoid any new Scala code.

Piotrek

On 23 Nov 2018, at 03:25, Zhang, Xuefu <[email protected]> wrote:

Hi Timo,

Thanks for the effort and the Google writeup. During our external
catalog rework, we found much confusion between Java and Scala, and this
Scala-free roadmap should greatly mitigate that.

I'm wondering whether we can have a rule in the interim, while Java and
Scala coexist, that dependency can only be one-way. I found that in the
current code base there are cases where a Scala class extends Java and
vice versa. This is quite painful. I'm thinking we could say that
extension can only be from Java to Scala, which would help the situation.
However, I'm not sure if this is practical.

Thanks,
Xuefu

------------------------------------------------------------------

Sender:jincheng sun <[email protected]>
Sent at:2018 Nov 23 (Fri) 09:49
Recipient:dev <[email protected]>
Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,
Thanks for initiating this great discussion.

Currently, using SQL/Table API requires including many dependencies. In
particular, it should not be necessary to pull in specific implementation
dependencies that users do not care about. So I am glad to see your
proposal, and I hope we consider splitting the API interface into a
separate module so that users can pull in a minimum of dependencies.

So, +1 to [separation of interface and implementation; e.g. `Table` &
`TableImpl`], which you mentioned in the Google doc.
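
A minimal sketch of such a separation, assuming a heavily simplified and hypothetical Table interface (the real API has far more methods) and an equally hypothetical TableImpl. In a real build the interface would live in an API-only module and the implementation in a module that depends on it; here both are nested in one file only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class InterfaceSeparationSketch {

    // Would live in an API-only module: no planner or runtime dependencies.
    public interface Table {
        Table select(String... fields);

        List<String> fieldNames();
    }

    // Would live in an implementation module that depends on the API module.
    public static final class TableImpl implements Table {
        private final List<String> fields;

        public TableImpl(String... fields) {
            // Defensive copy so later changes to the caller's array have no effect.
            this.fields = Collections.unmodifiableList(Arrays.asList(fields.clone()));
        }

        @Override
        public Table select(String... selected) {
            for (String field : selected) {
                if (!fields.contains(field)) {
                    throw new IllegalArgumentException("Unknown field: " + field);
                }
            }
            return new TableImpl(selected);
        }

        @Override
        public List<String> fieldNames() {
            return fields;
        }
    }

    public static void main(String[] args) {
        // User code is written against the Table interface only.
        Table projected = new TableImpl("a", "b", "c").select("a", "c");
        System.out.println(projected.fieldNames());
    }
}
```

User code that only references Table would then compile against the API module alone, which is the dependency reduction described above.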
Best,
Jincheng

On Thu, Nov 22, 2018 at 10:50 PM, Xiaowei Jiang <[email protected]> wrote:

Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we eventually want
a Table API interface-only module that users can depend on without
including any implementation details?

Xiaowei

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <[email protected]> wrote:

Hi Timo,

Thanks for writing up this document.
I like the new structure and agree to prioritize the porting of the
flink-table-common classes.
Since flink-table-runtime is (or should be) independent of the API and
planner modules, we could start porting these classes once the code is
split into the new module structure.
The benefit of a Scala-free flink-table-runtime would be a Scala-free
execution Jar.

Best, Fabian


On Thu, Nov 22, 2018 at 10:54, Timo Walther <[email protected]> wrote:
Hi everyone,

I would like to continue this discussion thread and convert the outcome
into a FLIP such that users and contributors know what to expect in the
upcoming releases.

I created a design document [1] that clarifies our motivation for doing
this, what a Maven module structure could look like, and a suggested
migration plan.

It would be great to start with the efforts for the 1.8 release such
that new features can be developed in Java and major refactorings such
as improvements to the connectors and external catalog support are not
blocked.

Please let me know what you think.

Regards,
Timo

[1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing

On 02.07.18 at 17:08, Fabian Hueske wrote:

Hi Piotr,

thanks for bumping this thread and thanks to Xingcan for the comments.

I think the first step would be to separate the flink-table module into
multiple sub modules. These could be:

- flink-table-api: All API facing classes. Can be later divided further
into Java/Scala Table API/SQL
- flink-table-planning: involves all planning (basically everything we do
with Calcite)
- flink-table-runtime: the runtime code

IMO, a realistic mid-term goal is to have the runtime module and certain
parts of the planning module ported to Java.
The api module will be much harder to port because of several dependencies
on Scala core classes (the parser framework, tree iterations, etc.). I'm
not saying we should not port this to Java, but it is not clear to me (yet)
how to do it.

I think flink-table-runtime should not be too hard to port. The code does
not make use of many Scala features, i.e., it is written in a very
Java-like style. Also, there are not many dependencies and operators can be
individually ported step-by-step.
For flink-table-planning, we can have certain packages that we port to
Java, like planning rules or plan nodes. The related classes mostly extend
Calcite's Java interfaces/classes and would be natural choices for being
ported. The code generation classes will require more effort to port. There
are also some dependencies in planning on the api module that we would need
to resolve somehow.

For SQL, most work when adding new features is done in the planning and
runtime modules. So, this separation should already reduce "technological
debt" quite a lot.
The Table API depends much more on Scala than SQL.

Cheers, Fabian



2018-07-02 16:26 GMT+02:00 Xingcan Cui <[email protected]>:

Hi all,

I have also been thinking about this problem these days and here are my
thoughts.

1) We must admit that it’s really a tough task to interoperate between Java
and Scala. E.g., they have different collection types (Scala collections
vs. java.util.*) and in Java, it's hard to implement a method which takes
Scala functions as parameters. Considering that the major part of the code
base is implemented in Java, +1 for this goal from a long-term view.

2) The ideal solution would be to just expose a Scala API and make all the
other parts Scala-free. But I am not sure if it could be achieved even in
the long term. Thus, as Timo suggested, keeping the Scala code in
"flink-table-core" would be a compromise solution.

3) If the community makes the final decision, maybe any new features
should be added in Java (regardless of the modules), in order to prevent
the Scala code from growing.

Best,
Xingcan

On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <[email protected]> wrote:

Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have
to rewrite. I have some objections/counter proposals to Fabian's proposal
of doing it module wise and one module at a time.

First, I do not see a problem with having Java/Scala code even within one
module, especially not if there are clean boundaries. For example, we could
have the API in Scala and optimizer rules/logical nodes written in Java in
the same module. However, I haven’t previously maintained mixed Scala/Java
code bases, so I might be missing something here.

Secondly, this whole migration might, and most likely will, take longer
than expected, which creates a problem for the new code that we will be
creating. After making a decision to migrate to Java, almost any new Scala
line of code will immediately be technological debt and we will have to
rewrite it in Java later.

Thus I would propose first to state our end goal: the module structure and
which parts of the modules we eventually want to have Scala-free. Secondly,
we should take all steps necessary to allow us to write new code compliant
with our end goal. Only after that should/could we focus on incrementally
rewriting the old code. Otherwise we could be stuck/blocked for years
writing new code in Scala (and increasing technological debt), because
nobody has found the time to rewrite some non-important and not actively
developed part of some module.

Piotrek

On 14 Jun 2018, at 15:34, Fabian Hueske <[email protected]> wrote:

Hi,

In general, I think this is a good effort. However, it won't be easy and I
think we have to plan this well.
I don't like the idea of having the whole code base fragmented into Java
and Scala code for too long.

I think we should do this one step at a time and focus on migrating one
module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into their own module, porting them to Java, and
removing the Scala dependency won't be possible without breaking the API
since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann <[email protected]>:

I think that is a noble and honorable goal and we should strive for it.
This, however, must be an iterative process given the sheer size of the
code base. I like the approach of defining common Java modules which are
used by more specific Scala modules and slowly moving classes from Scala to
Java. Thus +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <[email protected]> wrote:

Hi,

I do not have experience with how Scala and Java interact with each
other, so I cannot fully validate your proposal, but generally speaking +1
from me.

Does it also mean that we should slowly migrate `flink-table-core` to
Java? How would you envision it? It would be nice to be able to add new
classes/features written in Java so that they can coexist with the old
Scala code until we gradually switch from Scala to Java.

Piotrek

On 13 Jun 2018, at 11:32, Timo Walther <[email protected]> wrote:

Hi everyone,

as you all know, currently the Table & SQL API is implemented in Scala.
This decision was made a long time ago when the initial code base was
created as part of a master's thesis. The community kept Scala because of
the nice language features that enable a fluent Table API like
table.select('field.trim()) and because Scala allows for quick prototyping
(e.g. multi-line comments for code generation). The committers enforced not
splitting the code base into two programming languages.

However, nowadays the flink-table module is becoming a more and more
important part of the Flink ecosystem. Connectors, formats, and the SQL
client are actually implemented in Java but need to interoperate with
flink-table, which makes these modules dependent on Scala. As mentioned in
an earlier mail thread, using Scala for API classes also exposes member
variables and methods in Java that should not be exposed to users [1]. Java
is still the most important API language, and right now we treat it as a
second-class citizen. I just noticed that you even need to add Scala if you
just want to implement a ScalarFunction because of method clashes between
`public String toString()` and `public scala.Predef.String toString()`.

Given the size of the current code base, reimplementing the entire
flink-table code in Java is a goal that we might never reach. However, we
should at least treat the symptoms and keep this as a long-term goal in
mind. My suggestion would be to convert user-facing and runtime classes and
split the code base into multiple modules:

flink-table-java {depends on flink-table-core}

Implemented in Java. Java users can use this. This would require converting
classes like TableEnvironment and Table.

flink-table-scala {depends on flink-table-core}

Implemented in Scala. Scala users can use this.

flink-table-common

Implemented in Java. Connectors, formats, and UDFs can use this. It
contains interface classes such as descriptors, table sinks, and table
sources.

flink-table-core {depends on flink-table-common and flink-table-runtime}

Implemented in Scala. Contains the current main code base.

flink-table-runtime

Implemented in Java. This would require converting the classes in
o.a.f.table.runtime but would potentially improve the runtime.

What do you think?


Regards,

Timo

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
