Hi Hequn,

thanks for your feedback. Yes, migrating the test cases is another issue that is not represented in the document but should naturally go along with the migration.

I agree that we should migrate the main API classes quickly within this 1.8 release after the module split has been performed. Help here is highly appreciated!

I forgot that Java supports static methods in interfaces now, but actually I don't like the design of calling `TableEnvironment.get(env)`. People often write `TableEnvironment tEnv = TableEnvironment.get(env)` and then wonder why there is no `toAppendStream` or `toDataSet`, because they are working with the base class. However, details like that can be discussed in the corresponding issue when it comes to implementation.
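
To make the pitfall concrete, here is a minimal, self-contained Java sketch; all names (TableEnvironment, StreamTableEnvironment, toAppendStream) are illustrative stand-ins for the discussion, not the actual Flink API:

    // Minimal sketch: the static factory on the base interface returns the
    // base type, so stream-specific methods are invisible without a cast.
    public class BaseClassPitfall {

        interface TableEnvironment {
            // Java 8 static interface method acting as the factory.
            static TableEnvironment get(Object env) {
                return new StreamTableEnvironment();
            }
        }

        // Conversion methods live only on the stream-specific subclass.
        static class StreamTableEnvironment implements TableEnvironment {
            void toAppendStream() {
                System.out.println("converting to an append stream");
            }
        }

        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.get(new Object());
            // tEnv.toAppendStream();  // does not compile: declared type is the base interface
            ((StreamTableEnvironment) tEnv).toAppendStream();  // cast required
        }
    }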

@Vino: I think your work fits nicely to these efforts.

@everyone: I will wait for more feedback until the end of this week. Then, if there are no objections, I will convert the design document into a FLIP and open subtasks in Jira.

Regards,
Timo

On 24.11.18 at 13:45, vino yang wrote:
Hi Hequn,

I am very glad to hear that you are interested in this work. As we all know, this process involves a lot of changes. The migration work has already begun: I started with the Kafka connector's dependency on flink-table and moved the related dependencies to flink-table-common. This work is tracked by FLINK-9461 [1]. I don't know if it will conflict with what you expect to do, but from the impact I have observed, it will involve many classes that are currently in flink-table.

*Just a statement to prevent unnecessary conflicts.*

Thanks, vino.

[1]: https://issues.apache.org/jira/browse/FLINK-9461

Hequn Cheng <chenghe...@gmail.com> wrote on Sat, Nov 24, 2018 at 7:20 PM:

Hi Timo,

Thanks for the effort of writing up this document. I like the idea of making flink-table Scala-free, so +1 for the proposal!

It's good to make Java the first-class citizen. For a long time, we have neglected Java, so many Table features are missing from the Java test cases, such as this one [1] I found recently. I think we also need to migrate our test cases, i.e., add Java tests.

This definitely is a big change and will break API compatibility. In order to minimize the impact on users, I think we should move fast when we migrate the user-facing APIs. It would be better to introduce the user-sensitive changes within a single release. However, that may not be easy. I can help to contribute.

Separation of interface and implementation is a good idea. This would allow users to pull in a minimum of dependencies, or even none at all. I saw your reply in the Google doc. Java 8 already supports static methods in interfaces; I think we can make use of that?

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001


On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <twal...@apache.org> wrote:

Hi everyone,

thanks for the great feedback so far. I updated the document with the input I received.

@Fabian: I moved the porting of the flink-table-runtime classes up in the list.
@Xiaowei: Could you elaborate on what "interface only" means to you? Do you mean a module containing pure Java `interface`s? Or is the validation logic also part of the API module? Are the 50+ expression classes part of the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we should develop in Scala and when in Java. As Piotr said, every new Scala line is instant technical debt.

Thanks,
Timo


On 23.11.18 at 10:29, Piotr Nowojski wrote:
Hi Timo,

Thanks for writing this down +1 from my side :)

Xuefu: I'm also not sure what's the best approach here; probably we will have to work it out as we go. One thing to consider is that, from now on, every single new code line written in Scala anywhere in flink-table (except flink-table-api-scala) is instant technological debt. From this perspective, I would be in favour of tolerating quite big inconveniences just to avoid any new Scala code.
Piotrek

On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
Hi Timo,

Thanks for the effort and the Google writeup. During our external
catalog rework, we found much confusion between Java and Scala, and this
Scala-free roadmap should greatly mitigate that.
I'm wondering whether, in the interim while Java and Scala coexist, we can have a rule that dependencies can only go one way. I found that in the current code base there are cases where a Scala class extends a Java one and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.
Thanks,
Xuefu


------------------------------------------------------------------
Sender: jincheng sun <sunjincheng...@gmail.com>
Sent at: 2018 Nov 23 (Fri) 09:49
Recipient: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,
Thanks for initiating this great discussion.

Currently, using SQL / the Table API requires including many dependencies. In particular, it should not be necessary to introduce the implementation-specific dependencies that users do not care about. So I am glad to see your proposal, and I hope we can split the API interface into a separate module so that users can pull in a minimum of dependencies. So, +1 to [separation of interface and implementation; e.g. `Table` & `TableImpl`] which you mentioned in the Google doc.
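
As a rough sketch of what such a separation could look like (names follow the `Table`/`TableImpl` example from the doc; the bodies are placeholders, not the actual Flink classes):

    // The API module would expose only the interface ...
    public interface Table {
        Table select(String fields);
        Table filter(String predicate);
    }

    // ... while the implementation stays in a separate module that users
    // never depend on directly (hypothetical skeleton).
    class TableImpl implements Table {
        @Override
        public Table select(String fields) {
            return this;  // planning logic would go here
        }

        @Override
        public Table filter(String predicate) {
            return this;  // planning logic would go here
        }
    }
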
Best,
Jincheng

Xiaowei Jiang <xiaow...@gmail.com> wrote on Thu, Nov 22, 2018 at 10:50 PM:

Hi Timo, thanks for driving this! I think that this is a nice thing to do. While we are doing this, can we also keep in mind that we eventually want a Table API interface-only module which users can take a dependency on, but without including any implementation details?

Xiaowei

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhue...@gmail.com> wrote:
Hi Timo,

Thanks for writing up this document. I like the new structure and agree to prioritize the porting of the flink-table-common classes. Since flink-table-runtime is (or should be) independent of the API and planner modules, we could start porting these classes once the code is split into the new module structure. The benefit of a Scala-free flink-table-runtime would be a Scala-free execution Jar.

Best, Fabian


On Thu, 22 Nov 2018 at 10:54, Timo Walther <twal...@apache.org> wrote:
Hi everyone,

I would like to continue this discussion thread and convert the outcome into a FLIP, such that users and contributors know what to expect in the upcoming releases.

I created a design document [1] that clarifies our motivation for doing this, what a Maven module structure could look like, and a suggestion for a migration plan.

It would be great to start with these efforts for the 1.8 release, such that new features can be developed in Java and major refactorings, such as improvements to the connectors and external catalog support, are not blocked.

Please let me know what you think.

Regards,
Timo

[1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
On 02.07.18 at 17:08, Fabian Hueske wrote:
Hi Piotr,

thanks for bumping this thread, and thanks to Xingcan for the comments. I think the first step would be to separate the flink-table module into multiple sub-modules. These could be:

- flink-table-api: All API-facing classes. Can later be divided further into Java/Scala Table API/SQL.
- flink-table-planning: involves all planning (basically everything we do with Calcite).
- flink-table-runtime: the runtime code.

IMO, a realistic mid-term goal is to have the runtime module and certain parts of the planning module ported to Java. The API module will be much harder to port because of several dependencies on Scala core classes (the parser framework, tree iterations, etc.). I'm not saying we should not port this to Java, but it is not clear to me (yet) how to do it.

I think flink-table-runtime should not be too hard to port. The code does not make use of many Scala features, i.e., it is written in a very Java-like way. Also, there are not many dependencies, and operators can be ported individually, step by step. For flink-table-planning, we can port certain packages to Java, like planning rules or plan nodes. The related classes mostly extend Calcite's Java interfaces/classes and would be natural choices for being ported. The code generation classes will require more effort to port. There are also some dependencies of planning on the API module that we would need to resolve somehow.

For SQL, most of the work when adding new features is done in the planning and runtime modules. So this separation alone should already reduce the technological debt quite a lot. The Table API depends much more on Scala than SQL does.
Cheers, Fabian



2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingc...@gmail.com>:

Hi all,

I have also been thinking about this problem these days, and here are my thoughts.

1) We must admit that it's really a tough task to interoperate between Java and Scala. E.g., they have different collection types (Scala collections vs. java.util.*), and in Java it is hard to implement a method which takes Scala functions as parameters. Considering that the major part of the code base is implemented in Java, +1 for this goal from a long-term view.
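
To make that friction concrete, here is a small Java sketch of bridging a Scala collection at the API boundary (assuming Scala 2.x's `JavaConverters`; this is illustrative, not code from flink-table):

    import java.util.List;
    import scala.collection.JavaConverters;
    import scala.collection.Seq;

    public class CollectionInterop {
        // A Java caller of a Scala API typically converts at the boundary
        // instead of working with the Scala Seq directly.
        static List<String> toJavaList(Seq<String> scalaSeq) {
            return JavaConverters.seqAsJavaListConverter(scalaSeq).asJava();
        }
    }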

2) The ideal solution would be to expose just a Scala API and make all the other parts Scala-free. But I am not sure if that could be achieved even in the long term. Thus, as Timo suggested, keeping the Scala code in "flink-table-core" would be a compromise solution.

3) If the community makes the final decision, maybe any new features should be added in Java (regardless of the module), in order to prevent the Scala code from growing.

Best,
Xingcan


On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have to rewrite. I have some objections/counter-proposals to Fabian's proposal of doing it module-wise and one module at a time.

First, I do not see a problem with having Java/Scala code even within one module, especially not if there are clean boundaries. For example, we could have the API in Scala and the optimizer rules/logical nodes written in Java in the same module. However, I haven't maintained mixed Scala/Java code bases before, so I might be missing something here.
Secondly, this whole migration might, and most likely will, take longer than expected, and that creates a problem for the new code that we will be writing in the meantime. After making the decision to migrate to Java, almost any new Scala line of code is immediately technological debt that we will have to rewrite in Java later.
Thus, I would propose to first state our end goal: the module structure and which parts of the modules we eventually want to be Scala-free. Secondly, we should take all the steps necessary to allow us to write new code compliant with that end goal. Only after that should/could we focus on incrementally rewriting the old code. Otherwise, we could be stuck/blocked for years writing new code in Scala (and increasing the technological debt), because nobody has found the time to rewrite some unimportant and not actively developed part of some module.
Piotrek

On 14 Jun 2018, at 15:34, Fabian Hueske <fhue...@gmail.com> wrote:
Hi,

In general, I think this is a good effort. However, it won't be easy and I think we have to plan this well. I don't like the idea of having the whole code base fragmented into Java and Scala code for too long.

I think we should do this one step at a time and focus on migrating one module at a time.
IMO, the easiest start would be to port the runtime to Java. Extracting the API classes into their own module, porting them to Java, and removing the Scala dependency won't be possible without breaking the API, since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:
I think that is a noble and honorable goal and we should strive for it. This, however, must be an iterative process given the sheer size of the code base. I like the approach of defining common Java modules which are used by more specific Scala modules and slowly moving classes from Scala to Java. Thus, +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <pi...@data-artisans.com> wrote:

Hi,

I do not have experience with how Scala and Java interact with each other, so I cannot fully validate your proposal, but generally speaking +1 from me.

Does it also mean that we should slowly migrate `flink-table-core` to Java? How would you envision that? It would be nice to be able to add new classes/features written in Java so that they can coexist with the old Scala code until we gradually switch from Scala to Java.

Piotrek

On 13 Jun 2018, at 11:32, Timo Walther <twal...@apache.org> wrote:
Hi everyone,

as you all know, the Table & SQL API is currently implemented in Scala. This decision was made a long time ago when the initial code base was created as part of a master's thesis. The community kept Scala because of the nice language features that enable a fluent Table API like table.select('field.trim()) and because Scala allows for quick prototyping (e.g. multi-line strings for code generation). The committers enforced not splitting the code base into two programming languages.
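
For comparison, a minimal sketch of how the Java Table API expresses the same call with string-based expressions, since Scala's symbol syntax has no Java counterpart (illustrative helper, not actual Flink code):

    import org.apache.flink.table.api.Table;

    public class FluentApiContrast {
        // Scala's symbol syntax ('field.trim()) has no Java counterpart,
        // so the Java Table API falls back to parsing expression strings.
        static Table selectTrimmed(Table table) {
            return table.select("field.trim()");
        }
    }
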
However, nowadays the flink-table module is becoming an increasingly important part of the Flink ecosystem. Connectors, formats, and the SQL client are actually implemented in Java but need to interoperate with flink-table, which makes these modules dependent on Scala. As mentioned in an earlier mail thread, using Scala for API classes also exposes member variables and methods in Java that should not be exposed to users [1]. Java is still the most important API language, and right now we treat it as a second-class citizen. I just noticed that you even need to add Scala if you just want to implement a ScalarFunction, because of method clashes between `public String toString()` and `public scala.Predef.String toString()`.
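
For reference, a sketch of the minimal Java UDF affected, following the usual `eval` convention (illustrative; the class name is made up):

    import org.apache.flink.table.functions.ScalarFunction;

    // A minimal Java scalar UDF. Because the ScalarFunction base class is
    // currently implemented in Scala, even compiling this small class pulls
    // the Scala library onto the user's classpath.
    public class TrimFunction extends ScalarFunction {
        public String eval(String value) {
            return value == null ? null : value.trim();
        }
    }
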
Given the size of the current code base, reimplementing the entire flink-table code in Java is a goal that we might never reach. However, we should at least treat the symptoms and keep this as a long-term goal in mind. My suggestion would be to convert user-facing and runtime classes and split the code base into multiple modules:
flink-table-java {depends on flink-table-core}
Implemented in Java. Java users can use this. This would require converting classes like TableEnvironment, Table.

flink-table-scala {depends on flink-table-core}
Implemented in Scala. Scala users can use this.

flink-table-common
Implemented in Java. Connectors, formats, and UDFs can use this. It contains interface classes such as descriptors, table sink, table source.

flink-table-core {depends on flink-table-common and flink-table-runtime}
Implemented in Scala. Contains the current main code base.

flink-table-runtime
Implemented in Java. This would require converting classes in o.a.f.table.runtime but would potentially improve the runtime.
What do you think?


Regards,

Timo

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html

