Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Timo Walther Fri, 23 Nov 2018 01:36:41 -0800

Hi everyone,

thanks for the great feedback so far. I updated the document with the input I got so far.


@Fabian: I moved the porting of flink-table-runtime classes up in the list.

@Xiaowei: Could you elaborate what "interface only" means to you? Do you mean a module containing pure Java `interface`s? Or is the validation logic also part of the API module? Are 50+ expression classes part of the API interface or already too implementation-specific?

@Xuefu: I extended the document by almost a page to clarify when we should develop in Scala and when in Java. As Piotr said, every new Scala line is instant technical debt.


Thanks,
Timo


On 23.11.18 at 10:29, Piotr Nowojski wrote:

Hi Timo,

Thanks for writing this down, +1 from my side :)

> I'm wondering whether we can have a rule in the interim, while Java and Scala
> coexist, that dependencies can only be one-way. I found that in the current code
> base there are cases where a Scala class extends Java and vice versa. This is
> quite painful. I'm thinking we could say that extension can only be from
> Java to Scala, which would help the situation. However, I'm not sure if this is
> practical.

Xuefu: I'm also not sure what the best approach is here; we will probably have
to work it out as we go. One thing to consider is that from now on, every
single new line of code written in Scala anywhere in flink-table (except for
flink-table-api-scala) is instant technical debt. From this perspective
I would be in favour of tolerating quite big inconveniences just to avoid any
new Scala code.

Piotrek

On 23 Nov 2018, at 03:25, Zhang, Xuefu <[email protected]> wrote:

Hi Timo,

Thanks for the effort and the Google writeup. During our external catalog
rework, we found much confusion between Java and Scala, and this Scala-free
roadmap should greatly mitigate that.

I'm wondering whether we can have a rule in the interim, while Java and Scala
coexist, that dependencies can only be one-way. I found that in the current code
base there are cases where a Scala class extends Java and vice versa. This is
quite painful. I'm thinking we could say that extension can only be from
Java to Scala, which would help the situation. However, I'm not sure if this is
practical.

Thanks,
Xuefu


------------------------------------------------------------------
Sender:jincheng sun <[email protected]>
Sent at:2018 Nov 23 (Fri) 09:49
Recipient:dev <[email protected]>
Subject:Re: [DISCUSS] Long-term goal of making flink-table Scala-free

Hi Timo,
Thanks for initiating this great discussion.

Currently, using SQL or the Table API requires pulling in many dependencies.
In particular, users should not have to pull in implementation-specific
dependencies that they do not care about. So I am glad to see your
proposal, and I hope we consider splitting the API interface into a
separate module so that users can get by with a minimum of dependencies.

So, +1 to [separation of interface and implementation; e.g. `Table` &
`TableImpl`], which you mentioned in the Google doc.
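
The interface/implementation split mentioned above could look roughly like this in Java. This is a minimal sketch and not code from the design document: the method names `select` and `fieldNames`, the constructor, and the placement of the validation check inside the implementation are all assumptions made for illustration (the thread leaves open where validation belongs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified stand-in for an API-only module:
// users would compile against this interface and nothing else.
interface Table {
    Table select(String... fields);

    List<String> fieldNames();
}

// Hypothetical implementation that would live in a separate module,
// so its dependencies never leak into user projects.
final class TableImpl implements Table {
    private final List<String> fields;

    TableImpl(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    @Override
    public Table select(String... selected) {
        // Assumption: validation happens in the implementation, not in the API module.
        for (String name : selected) {
            if (!fields.contains(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
        }
        return new TableImpl(Arrays.asList(selected));
    }

    @Override
    public List<String> fieldNames() {
        return fields;
    }
}

final class TableSketch {
    public static void main(String[] args) {
        // User code only ever refers to the Table interface after construction.
        Table orders = new TableImpl(Arrays.asList("user", "product", "amount"));
        Table projected = orders.select("user", "amount");
        System.out.println(projected.fieldNames());
    }
}
```

In a real module layout the user would not call the `TableImpl` constructor directly either; some factory in the API module would hide it. That part is left out here to keep the sketch short.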
Best,
Jincheng

On Thu, Nov 22, 2018 at 10:50 PM, Xiaowei Jiang <[email protected]> wrote:

Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we want to
eventually have a TableAPI interface only module which users can take
dependency on, but without including any implementation details?

Xiaowei

On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <[email protected]> wrote:

Hi Timo,

Thanks for writing up this document.
I like the new structure and agree to prioritize the porting of the
flink-table-common classes.
Since flink-table-runtime is (or should be) independent of the API and
planner modules, we could start porting these classes once the code is
split into the new module structure.
The benefits of a Scala-free flink-table-runtime would be a Scala-free
execution Jar.

Best, Fabian


On Thu, Nov 22, 2018 at 10:54, Timo Walther <[email protected]> wrote:

Hi everyone,

I would like to continue this discussion thread and convert the outcome
into a FLIP such that users and contributors know what to expect in the
upcoming releases.

I created a design document [1] that clarifies our motivation for why we
want to do this, what a Maven module structure could look like, and a
suggestion for a migration plan.

It would be great to start with the efforts for the 1.8 release such
that new features can be developed in Java and major refactorings such
as improvements to the connectors and external catalog support are not
blocked.

Please let me know what you think.

Regards,
Timo

[1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing


On 02.07.18 at 17:08, Fabian Hueske wrote:

Hi Piotr,

thanks for bumping this thread and thanks to Xingcan for the comments.

I think the first step would be to separate the flink-table module into
multiple sub modules. These could be:

- flink-table-api: All API facing classes. Can be later divided further
  into Java/Scala Table API/SQL
- flink-table-planning: involves all planning (basically everything we do
  with Calcite)
- flink-table-runtime: the runtime code

IMO, a realistic mid-term goal is to have the runtime module and certain
parts of the planning module ported to Java.
The api module will be much harder to port because of several dependencies
on Scala core classes (the parser framework, tree iterations, etc.). I'm
not saying we should not port this to Java, but it is not clear to me (yet)
how to do it.

I think flink-table-runtime should not be too hard to port. The code does
not make use of many Scala features, i.e., it is written in a very Java-like
style. Also, there are not many dependencies and operators can be
individually ported step-by-step.
For flink-table-planning, we can have certain packages that we port to Java
like planning rules or plan nodes. The related classes mostly extend
Calcite's Java interfaces/classes and would be natural choices for being
ported. The code generation classes will require more effort to port. There
are also some dependencies in planning on the api module that we would need
to resolve somehow.

For SQL most work when adding new features is done in the planning and
runtime modules. So, this separation should already reduce "technological
debt" quite a lot.
The Table API depends much more on Scala than SQL.

Cheers, Fabian



2018-07-02 16:26 GMT+02:00 Xingcan Cui <[email protected]>:

Hi all,

I have also been thinking about this problem these days, and here are my
thoughts.

1) We must admit that it's really a tough task to interoperate between Java
and Scala. E.g., they have different collection types (Scala collections
vs. java.util.*) and in Java, it's hard to implement a method which takes
Scala functions as parameters. Considering that the major part of the code
base is implemented in Java, +1 for this goal from a long-term view.

2) The ideal solution would be to just expose a Scala API and make all the
other parts Scala-free. But I am not sure if it could be achieved even in
the long term. Thus, as Timo suggested, keeping the Scala code in
"flink-table-core" would be a compromise solution.

3) If the community makes the final decision, maybe any new features
should be added in Java (regardless of the modules), in order to prevent
the Scala code from growing.
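
To make point 1) concrete: a shared method whose signature uses only `java.util` types avoids forcing Java callers to handle Scala collections or Scala function types. The following is a minimal sketch under that assumption; `FieldMapper` and `mapFields` are hypothetical names for illustration and not existing Flink classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not an existing Flink class.
final class FieldMapper {
    // Only java.util types appear in the signature, so a Java caller
    // needs no Scala library on the classpath to call this method.
    static <A, B> List<B> mapFields(List<A> input, Function<? super A, ? extends B> fn) {
        List<B> out = new ArrayList<>(input.size());
        for (A value : input) {
            out.add(fn.apply(value));
        }
        return out;
    }

    public static void main(String[] args) {
        // A Java caller passes a plain method reference or lambda.
        List<String> trimmed = mapFields(Arrays.asList(" a ", "b "), String::trim);
        System.out.println(trimmed);
    }
}
```

Scala callers would have to convert their collections at the boundary, for example with the standard `scala.collection.JavaConverters` utilities, which is the cost of keeping the shared signature Java-only.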

Best,
Xingcan

On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <[email protected]> wrote:

Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have
to rewrite. I have some objections/counter proposals to Fabian's proposal
of doing it module-wise and one module at a time.

First, I do not see a problem with having Java/Scala code even within one
module, especially not if there are clean boundaries. For example, we could
have the API in Scala and optimizer rules/logical nodes written in Java in
the same module. However, I haven't maintained mixed Scala/Java code bases
before, so I might be missing something here.

Secondly, this whole migration might, and most likely will, take longer than
expected, so that creates a problem for new code that we will be
creating. After making a decision to migrate to Java, almost any new Scala
line of code will immediately be technological debt and we will have to
rewrite it in Java later.

Thus I would propose first to state our end goal: the module structure and
which parts of the modules we eventually want to have Scala-free. Secondly,
taking all steps necessary to allow us to write new code compliant
with our end goal. Only after that should/could we focus on incrementally
rewriting the old code. Otherwise we could be stuck/blocked for years
writing new code in Scala (and increasing technological debt), because
nobody has found the time to rewrite some unimportant and not actively
developed part of some module.

Piotrek

On 14 Jun 2018, at 15:34, Fabian Hueske <[email protected]> wrote:

Hi,

In general, I think this is a good effort. However, it won't be easy and I
think we have to plan this well.
I don't like the idea of having the whole code base fragmented into Java
and Scala code for too long.

I think we should do this one step at a time and focus on migrating one
module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into their own module, porting them to Java, and
removing the Scala dependency won't be possible without breaking the API
since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann <[email protected]>:

I think that is a noble and honorable goal and we should strive for it.
This, however, must be an iterative process given the sheer size of the
code base. I like the approach to define common Java modules which are used
by more specific Scala modules and slowly moving classes from Scala to
Java. Thus +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <[email protected]> wrote:

Hi,

I do not have experience with how Scala and Java interact with each
other, so I cannot fully validate your proposal, but generally speaking
+1 from me.

Does it also mean that we should slowly migrate `flink-table-core` to
Java? How would you envision it? It would be nice to be able to add new
classes/features written in Java so that they can coexist with old
Scala code until we gradually switch from Scala to Java.

Piotrek

On 13 Jun 2018, at 11:32, Timo Walther <[email protected]> wrote:

Hi everyone,

as you all know, currently the Table & SQL API is implemented in Scala.
This decision was made a long time ago when the initial code base was
created as part of a master's thesis. The community kept Scala because of
the nice language features that enable a fluent Table API like
table.select('field.trim()) and because Scala allows for quick prototyping
(e.g. multi-line comments for code generation). The committers enforced not
splitting the code-base into two programming languages.

However, nowadays the flink-table module is becoming a more and more
important part of the Flink ecosystem. Connectors, formats, and SQL client
are actually implemented in Java but need to interoperate with flink-table,
which makes these modules dependent on Scala. As mentioned in an earlier
mail thread, using Scala for API classes also exposes member variables and
methods in Java that should not be exposed to users [1]. Java is still the
most important API language and right now we treat it as a second-class
citizen. I just noticed that you even need to add Scala if you just want to
implement a ScalarFunction because of method clashes between `public String
toString()` and `public scala.Predef.String toString()`.

Given the size of the current code base, reimplementing the entire
flink-table code in Java is a goal that we might never reach. However, we
should at least treat the symptoms and have this as a long-term goal in
mind. My suggestion would be to convert user-facing and runtime classes and
split the code base into multiple modules:

flink-table-java {depends on flink-table-core}
    Implemented in Java. Java users can use this. This would require
    converting classes like TableEnvironment, Table.

flink-table-scala {depends on flink-table-core}
    Implemented in Scala. Scala users can use this.

flink-table-common
    Implemented in Java. Connectors, formats, and UDFs can use this. It
    contains interface classes such as descriptors, table sink, table
    source.

flink-table-core {depends on flink-table-common and flink-table-runtime}
    Implemented in Scala. Contains the current main code base.

flink-table-runtime
    Implemented in Java. This would require converting classes in
    o.a.f.table.runtime but would improve the runtime potentially.
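
The layout above can be read as a small dependency graph. The sketch below models it and checks which modules would be free of Scala code from flink-table, including transitively. The module names, edges, and implementation languages are taken from the list above; the class `ModuleGraph` is purely illustrative, and modules listed without dependencies are assumed to have none within flink-table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the proposed module layout (illustrative only).
final class ModuleGraph {
    private static final Map<String, List<String>> DEPENDS_ON = new HashMap<>();
    private static final Set<String> SCALA_MODULES = new HashSet<>();

    static {
        DEPENDS_ON.put("flink-table-java", Collections.singletonList("flink-table-core"));
        DEPENDS_ON.put("flink-table-scala", Collections.singletonList("flink-table-core"));
        DEPENDS_ON.put("flink-table-core",
                Arrays.asList("flink-table-common", "flink-table-runtime"));
        // Assumption: no dependencies on other flink-table modules.
        DEPENDS_ON.put("flink-table-common", Collections.emptyList());
        DEPENDS_ON.put("flink-table-runtime", Collections.emptyList());

        // Modules described as "Implemented in Scala" in the proposal.
        SCALA_MODULES.add("flink-table-scala");
        SCALA_MODULES.add("flink-table-core");
    }

    // True if neither the module nor any of its transitive dependencies
    // is implemented in Scala. The graph is acyclic, so recursion terminates.
    static boolean isScalaFree(String module) {
        List<String> deps = DEPENDS_ON.get(module);
        if (deps == null) {
            throw new IllegalArgumentException("Unknown module: " + module);
        }
        if (SCALA_MODULES.contains(module)) {
            return false;
        }
        for (String dep : deps) {
            if (!isScalaFree(dep)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String module : Arrays.asList("flink-table-common", "flink-table-runtime",
                "flink-table-java", "flink-table-core")) {
            System.out.println(module + " Scala-free: " + isScalaFree(module));
        }
    }
}
```

Under this model, connectors, formats, and UDFs built against flink-table-common would no longer pull in Scala code from flink-table, while flink-table-java would still do so through flink-table-core.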

What do you think?


Regards,
Timo

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
