Hi everyone,

I would like to continue this discussion thread and convert its outcome into a FLIP so that users and contributors know what to expect in the upcoming releases.

I created a design document [1] that clarifies our motivation for doing this, outlines how a Maven module structure could look, and suggests a migration plan.

It would be great to start these efforts with the 1.8 release so that new features can be developed in Java and major refactorings, such as improvements to the connectors and external catalog support, are not blocked.

Please let me know what you think.

Regards,
Timo

[1] https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing


On 02.07.18 at 17:08, Fabian Hueske wrote:
Hi Piotr,

thanks for bumping this thread, and thanks to Xingcan for the comments.

I think the first step would be to separate the flink-table module into
multiple submodules. These could be:

- flink-table-api: All API-facing classes. Could later be divided further
into Java/Scala Table API/SQL.
- flink-table-planning: All planning logic (basically everything we do
with Calcite).
- flink-table-runtime: The runtime code.

IMO, a realistic mid-term goal is to have the runtime module and certain
parts of the planning module ported to Java.
The api module will be much harder to port because of several dependencies
on Scala core classes (the parser framework, tree iterations, etc.). I'm
not saying we should not port it to Java, but it is not yet clear to me
how to do it.

I think flink-table-runtime should not be too hard to port. The code does
not make use of many Scala features, i.e., it is written in a very
Java-like style. Also, there are not many dependencies, and operators can
be ported individually, step by step.
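
To sketch what such a port could look like, here is a hypothetical runtime
operator in Java (the class and field names are made up; the real
flink-table runtime operators are more involved):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.types.Row;
    import org.apache.flink.util.Collector;

    // Hypothetical example of a runtime class after porting to Java.
    public class NullFilterRunner extends RichFlatMapFunction<Row, Row> {

        private final int fieldIndex;

        public NullFilterRunner(int fieldIndex) {
            this.fieldIndex = fieldIndex;
        }

        @Override
        public void flatMap(Row value, Collector<Row> out) {
            // Emit only rows whose checked field is non-null.
            if (value.getField(fieldIndex) != null) {
                out.collect(value);
            }
        }
    }
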
For flink-table-planning, we can port certain packages to Java, such as
the planning rules or plan nodes. The related classes mostly extend
Calcite's Java interfaces/classes and would be natural candidates for
porting. The code generation classes will require more effort to port. The
planning module also has some dependencies on the api module that we would
need to resolve somehow.
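
Since the planning rules already extend Calcite's Java classes, a ported
rule would follow the usual Calcite skeleton. A minimal sketch (the rule
name and the matched nodes are made up for illustration):

    import org.apache.calcite.plan.RelOptRule;
    import org.apache.calcite.plan.RelOptRuleCall;
    import org.apache.calcite.rel.logical.LogicalFilter;
    import org.apache.calcite.rel.logical.LogicalProject;

    // Hypothetical skeleton of a planning rule written in Java.
    public class MyFilterProjectRule extends RelOptRule {

        public static final MyFilterProjectRule INSTANCE =
            new MyFilterProjectRule();

        private MyFilterProjectRule() {
            // Match a LogicalFilter on top of a LogicalProject.
            super(operand(LogicalFilter.class,
                      operand(LogicalProject.class, any())),
                  "MyFilterProjectRule");
        }

        @Override
        public void onMatch(RelOptRuleCall call) {
            LogicalFilter filter = call.rel(0);
            LogicalProject project = call.rel(1);
            // ... build the rewritten plan and call.transformTo(...) here.
        }
    }
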

For SQL, most of the work when adding new features happens in the planning
and runtime modules, so this separation alone should already reduce the
"technological debt" quite a lot.
The Table API depends much more on Scala than SQL does.

Cheers, Fabian



2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingc...@gmail.com>:

Hi all,

I have also been thinking about this problem these days, and here are my thoughts.

1) We must admit that it's really a tough task to make Java and Scala
interoperate. E.g., they have different collection types (Scala collections
vs. java.util.*), and in Java it's hard to implement a method that takes
Scala functions as parameters. Considering that the major part of the code
base is implemented in Java, +1 for this goal from a long-term view.
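
To illustrate point 1: calling a hypothetical Scala method with the
signature `def apply(rows: Seq[String], f: String => String)` from Java
requires something like the following (sketched for Scala 2.11; Scala 2.12
relaxes the function part):

    import java.util.Arrays;
    import java.util.List;

    import scala.collection.JavaConverters;
    import scala.collection.Seq;
    import scala.runtime.AbstractFunction1;

    public class InteropExample {
        public static void main(String[] args) {
            List<String> javaList = Arrays.asList(" a ", " b ");

            // Java collections must be converted explicitly before they
            // can be passed to a Scala API expecting scala.collection.Seq.
            Seq<String> scalaSeq =
                JavaConverters.asScalaBufferConverter(javaList).asScala();

            // A Scala function parameter cannot be satisfied by a plain
            // Java lambda in Scala 2.11; one subclasses AbstractFunction1.
            scala.Function1<String, String> trim =
                new AbstractFunction1<String, String>() {
                    @Override
                    public String apply(String s) {
                        return s.trim();
                    }
                };

            // hypotheticalScalaApi.apply(scalaSeq, trim);
        }
    }
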

2) The ideal solution would be to expose just a Scala API and make all the
other parts Scala-free. But I am not sure whether that could be achieved
even in the long term. Thus, as Timo suggested, keeping the Scala code in
"flink-table-core" would be a compromise solution.

3) If the community makes this the final decision, maybe all new features
should be added in Java (regardless of the module), in order to prevent
the Scala code from growing.

Best,
Xingcan


On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:

Bumping the topic.

If we want to do this, the sooner we decide, the less code we will have
to rewrite. I have some objections/counter-proposals to Fabian's proposal
of doing it module-wise, one module at a time.
First, I do not see a problem with having Java/Scala code even within one
module, especially not if there are clean boundaries. For example, we could
have the API in Scala and the optimizer rules/logical nodes written in Java
in the same module. However, I have not maintained mixed Scala/Java code
bases before, so I might be missing something here.
Secondly, this whole migration might, and most likely will, take longer
than expected, which creates a problem for the new code we will be writing.
After making the decision to migrate to Java, almost every new line of
Scala code will immediately become technological debt that we will have to
rewrite in Java later.
Thus I would propose to first state our end goal - the module structure and
which parts of the modules we eventually want to be Scala-free. Secondly,
we should take all the steps necessary to allow us to write new code
compliant with that end goal. Only after that should we focus on
incrementally rewriting the old code. Otherwise we could be stuck/blocked
for years writing new code in Scala (and increasing the technological
debt), because nobody has found the time to rewrite some unimportant and
not actively developed part of some module.
Piotrek

On 14 Jun 2018, at 15:34, Fabian Hueske <fhue...@gmail.com> wrote:

Hi,

In general, I think this is a good effort. However, it won't be easy, and
I think we have to plan it well.
I don't like the idea of having the whole code base fragmented into Java
and Scala code for too long.

I think we should do this one step at a time and focus on migrating one
module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into their own module, porting them to Java, and
removing the Scala dependency won't be possible without breaking the API,
since a few classes depend on the Scala Table API.

Best, Fabian


2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:

I think that is a noble and honorable goal and we should strive for it.
This, however, must be an iterative process given the sheer size of the
code base. I like the approach of defining common Java modules which are
used by more specific Scala modules, and of slowly moving classes from
Scala to Java. Thus +1 for the proposal.

Cheers,
Till

On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <pi...@data-artisans.com> wrote:

Hi,

I do not have experience with how Scala and Java interact with each other,
so I cannot fully validate your proposal, but generally speaking +1 from
me.

Does it also mean that we should slowly migrate `flink-table-core` to
Java? How would you envision that? It would be nice to be able to add new
classes/features written in Java so that they can coexist with the old
Scala code until we gradually switch from Scala to Java.

Piotrek

On 13 Jun 2018, at 11:32, Timo Walther <twal...@apache.org> wrote:

Hi everyone,

as you all know, the Table & SQL API is currently implemented in Scala.
This decision was made a long time ago when the initial code base was
created as part of a master's thesis. The community kept Scala because of
the nice language features that enable a fluent Table API like
table.select('field.trim()) and because Scala allows for quick prototyping
(e.g., multi-line strings for code generation). The committers enforced
not splitting the code base into two programming languages.
However, nowadays the flink-table module is becoming an increasingly
important part of the Flink ecosystem. Connectors, formats, and the SQL
client are actually implemented in Java but need to interoperate with
flink-table, which makes these modules dependent on Scala. As mentioned in
an earlier mail thread, using Scala for API classes also exposes member
variables and methods in Java that should not be exposed to users [1].
Java is still the most important API language, and right now we treat it
as a second-class citizen. I just noticed that you even need to add Scala
if you just want to implement a ScalarFunction, because of method clashes
between `public String toString()` and `public scala.Predef.String
toString()`.
Given the size of the current code base, reimplementing the entire
flink-table code in Java is a goal that we might never reach. However, we
should at least treat the symptoms and keep this in mind as a long-term
goal. My suggestion would be to convert the user-facing and runtime
classes and split the code base into multiple modules:

flink-table-java {depends on flink-table-core}
Implemented in Java. Java users can use this. This would require
converting classes like TableEnvironment and Table.

flink-table-scala {depends on flink-table-core}
Implemented in Scala. Scala users can use this.

flink-table-common
Implemented in Java. Connectors, formats, and UDFs can use this. It
contains interface classes such as descriptors, table sinks, and table
sources.

flink-table-core {depends on flink-table-common and flink-table-runtime}
Implemented in Scala. Contains the current main code base.

flink-table-runtime
Implemented in Java. This would require converting the classes in
o.a.f.table.runtime but would potentially improve the runtime.
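
With such a split, a pure Java program would only need to depend on
flink-table-java. As a rough illustration, based on the current Java API
(and assuming a table "clicks" has been registered):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class JavaTableExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv =
                TableEnvironment.getTableEnvironment(env);

            // Ideally, none of this would pull in a Scala dependency.
            Table result = tEnv.sqlQuery(
                "SELECT name, COUNT(*) FROM clicks GROUP BY name");
            // ... emit `result` to a TableSink and call env.execute()
        }
    }
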

What do you think?


Regards,

Timo

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html


