Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

Jing Ge Sun, 26 Feb 2023 13:52:21 -0800

Hi Ran,

NIT: ILIKE is supported by Clickhouse. Since Flink is focusing on theunified batch and stream data processing, it might make sense to supportit. It is more convenient for users.

Thanks for clarifying the definition of "useLike" and "notLike". Itseems that "notLike" depends on "useLike", i.e. some combinations arenot allowed, e.g while userLike is false, it will make no sense to usenotLike with any value. Users might feel confused. Given that there isanother "likePattern" parameter contains information whether "Like" withbe used or not, could we reduce the method signature to using twoparameters? WDYT?


||
|||// enable like by default
|
|||private||boolean||useLike = true;|
||
|||// true for not like, false is like|
|||private||final||boolean||notLike;|

|||// like or not like pattern (sql-like-pattern, consistent withcurrent flink behavior)|

|||private||final||String likePattern;|
    // Like will be enabled/used implicitly when this method is called.
|||public||ShowCatalogsOperation(||boolean||notLike, String likePattern) {|
|||this||.useLike = useLike;|
|||this||.notLike = notLike;|

| // -> additonal logic to assert likePattern before setting thevalue.

|
|||this||.likePattern = likePattern;|
|||}|
|
|
|||public||ShowCatalogsOperation(||String likePattern) {|
|||ShowCatalogsOperation(false, likePattern);|
|||}|
||
|
|
|     // Like is disabled.
|
|||public||ShowCatalogsOperation(||) {|
|||this||.useLike = false;|
|||}|

Best regards,

Jing


On 2/26/23 08:01, Ran Tao wrote:

Hi, Jark, thanks for explantations.
if there are no other questions, i will try to start a VOTE to address
current things in this FLIP.
We may open another dedicated FLIP to discuss INFORMATION_SCHEMA.
hi, guys, WDYT?

Best Regards,
Ran Tao


Jark Wu<[email protected]>  于2023年2月25日周六 22:57写道：

Hi Sergey,

I think INFORMATION_SCHEMA is a very interesting idea, and I hope we can
support it. However, it doesn't conflict with the idea of auxiliary
statements. I can see different benefits of them. The information schema
provides powerful and flexible capabilities but needs to learn the complex
entity relationship[1]. The auxiliary SQL statements are super handy and
can resolve most problems, but they offer limited features.

I can see almost all the mature systems support both of them. I think it
also makes sense to support both of them in Flink. Considering they
are orthogonal and information schema requires more complex design and
discussion, it deserves a separate FLIP. Sergey, are you willing to
contribute this FLIP?

Best,
Jark

[1]:

https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html


On Fri, 24 Feb 2023 at 22:43, Ran Tao<[email protected]>  wrote:

Thanks John.

It seems that most people prefer the information_schema implementation.
information_schema does have more benefits (however, the show operation

is

also an option and supplement).
Otherwise, the sql syntax and keywords may be changed frequently.
Of course, it will be more complicated than the extension of the show
operation.
  It is necessary to design various tables in information_schema, which

may

take a period of effort.

I will try to design the information_schema and integrate it with flink.
This may be a relatively big feature for me. I hope you guys can give
comments and opinions later.
Thank you all.

Best Regards,
Ran Tao


John Roesler<[email protected]>  于2023年2月24日周五 21:53写道：

Hello Ran,

Thanks for the FLIP!

Do you mind if we revisit the topic of doing this by adding an

Information

schema? The SHOW approach requires modifying the parser/language for

every

gap we identify. On the flip side, an Information schema is much easier

to

discover and remember how to use, and the ability to run queries on it

is

quite valuable for admins. It’s also better for Flink maintainers,

because

the information tables’ schemas can be evolved over time just like

regular

tables, whereas every change to a SHOW statement would be a breaking
change.

I understand that it may seem like a big effort, but we’re proposing

quite

a big extension to the space of SHOW statement, so it seems appropriate

to

take the opportunity and migrate to a better framework rather than
incrementally building on (and tying us even more firmly to) the

existing

approach.

Thanks for your consideration,
John

On Fri, Feb 24, 2023, at 05:58, Sergey Nuyanzin wrote:

thanks for explanation

But it's not clear to me what exactly
you want to display? Is it the name of the plugin?

I was thinking about name, type (source/sink) and may be version (not

sure

if it's possible right now)

On Fri, Feb 24, 2023 at 12:46 PM Ran Tao<[email protected]>

wrote:

Hi, Sergey. thanks. first step we can support filtering for show

operations

in this FLIP try to align other engines.
If we want to support describe other objects, of course we need to

design

how to get these metadatas or informations and printAsStyle. (So it

maybe

need another FLIP for more details).

Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?

I think we can support it, currently flink do support some

operations

only

for flink itself such as showJobs. But it's not clear to me what

exactly

you want to display? Is it the name of the plugin?
Just Like:
Kafka
Hudi
Files

Best Regards,
Ran Tao


Sergey Nuyanzin<[email protected]>  于2023年2月24日周五 19:11写道：

Thanks for driving the FLIP

I have a couple of questions
Am I right that INFORMATION_SCHEMA mentioned by Timo[1]  is out of

scope

of

this FLIP?
I noticed there are some support of it in

Spark[2]/Hive[3]/Snowflake[4]

and

others

Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?

[1]

https://lists.apache.org/thread/2g108qlfwbhb56wnoc5qj0yq29dvq1vv

[2]https://issues.apache.org/jira/browse/SPARK-16452
[3]https://issues.apache.org/jira/browse/HIVE-1010
[4]https://docs.snowflake.com/en/sql-reference/info-schema


On Fri, Feb 24, 2023 at 4:19 AM Jark Wu<[email protected]>  wrote:

Hi Jing,

we'd better reduce the dependency chain and follow the Law of

Demeter(LoD, clean code).

Adding a new method in TableEnvironment is therefore better

than

calling

an API chain

I think I don't fully agree that LoD is a good practice.

Actually, I

would

prefer to keep the API clean and concise.
This is also the Java Collection Framework design principle [1]:

"High

power-to-weight ratio". Otherwise,
it will explode the API interfaces with different combinations

of

methods.

Currently, TableEnvironment
already provides 60+ methods.

IMO, with the increasing methods of accessing and manipulating

metadata,

they can be extracted to
a separate interface, where we can add richer methods. This work

can be

aligned with the
CatalogManager interface (FLIP-295) [2].

Best,
Jark

[1]:

https://stackoverflow.com/questions/7568819/why-no-tail-or-head-method-in-list-to-get-last-or-first-element

[2]:

https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70


On Fri, 24 Feb 2023 at 10:38, Aitozi<[email protected]>

wrote:

Hi,
     Thanks for the nice proposal, Ran.
     Regarding this api usage, I have some discussion with

@twalthr

before

as here <

https://github.com/apache/flink/pull/15137#issuecomment-1356124138>

Personally, I think leaking the Catalog to the user side is

not

suitable,

since there are some read/write operations in the Catalog, the
TableEnvironment#getCatalog will expose all of them to the

user

side.

So

learned to add a new api in TableEnvironment to reduce

reliance

on

the

current TableEnvironment#getCatalog.

Thanks,
Aitozi


Ran Tao<[email protected]>  于2023年2月23日周四 23:44写道：

Hi, JingSong, Jing.

thank for sharing your opinions.

What you say makes sense, both approaches have pros and

cons.

If it is a modification of `TableEnvrionment`, such as
listDatabases(catalog). It is actually consistent with the

other

overloaded

methods before,
and defining this method means that TableEnvrionment does

provide

this

capability (rather than relying on the functionality of

another

class).

The disadvantage is that api changes may be required, and

may

continue

to

be modified in the future.
But from the TableEnvrionment itself, it really doesn't pay

attention

to

how the underlying layer is implemented.
(Although it is actually taken from the catalogManager at

present,

this

is

another question)

Judging from the current dependencies, flink-table-api-java

strongly

relies

on flink-table-common to use various common classes and

interfaces,

especially the catalog interface.
So we can extract various metadata information in the

catalog

through

`tEnv.getCatalog`.
The advantage is that it will not cause api modification,

but

this

method

of use breaks LoD.
In fact, the current flink-table-api-java design is

completely

bound

to

the

catalog interface.

If the mandatory modification of PublicApi is constrained

(may

be

modified

again and later), I tend to use `tEnv.getCatalog` directly,

otherwise

It would actually be more standard to add overloaded methods

to

`TableEnvrionment`.

Another question, can the later capabilities of

TableEnvrionment be

implemented through SupportXXX?
In order to solve the problem that the method needs to be

added

in

the

future. This kind of usage occurs frequently in flink.

Looking forward to your other considerations,
and also try to wait to see if there are other relevant API

designers

or

committers to provide comments.


Best Regards,
Ran Tao

Jing Ge<[email protected]>  于2023年2月23日周四 18:58写道：

Hi Jingson,

Thanks for sharing your thoughts. Please see my reply

below.

On Thu, Feb 23, 2023 at 10:16 AM Jingsong Li <

[email protected]

wrote:

Hi Jing Ge,

First, flink-table-common contains all common classes of

Flink

Table,

I think it is hard to bypass its dependence.

If any time when we use flink-table-api-java, we have to

cross

through

flink-table-api-java and use flink-table-common, we should

reconsider

the

design of these two modules and how interfaces/classes are

classified

into

those modules.

Secondly, almost all methods in Catalog looks useful to

me,

so

if

we

are following LoD, we should add all methods again to
TableEnvironment. I think it is redundant.

That is the enlarged issue I mentioned previously. A

simple

solution

is

to

move Catalog to the top level API. The fact is that a

catalog

package

already exists in flink-table-api-java but the Catalog

interface

is

in

flink-table-common. I don't know the historical context of

this

design.

Maybe you could share some insight with us? Thanks in

advance.

Beyond

that,

there should be other AOP options but need more time to

figure it

out.

And, this API chain does not look deep.
-

"tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases()"

looks a little complicated. The complex part is ahead.
- If we have a method to get Catalog directly, can be

simplify

to

"tEnv.catalog().listDatabase()", this is simple.

Commonly, it will need more effort to always follow LoD,

but

for

the

top

level facade API like TableEnvironment, both the API

developer,

API

consumer and the project itself from a long-term

perspective

will

benefit

from sticking to LoD. Since we already have the

getCatalog(String

catalog)

method in TableEnvironment, it also makes sense to follow

your

suggestion,

if we only want to limit/avoid public API changes. But

please

be

aware

that

we will have all known long-term drawbacks because of LoD
violation, especially the cross modules violation. I just

checked

all

usages of getCatalog(String catalog) in the master branch.

Currently

there

are very limited calls. It is better to pay attention to

it

before

it

goes

worse. Just my 2 cents. :)

Best,
Jingsong

On Thu, Feb 23, 2023 at 4:47 PM Jing Ge

<[email protected]

wrote:

Hi Jingson,

Thanks for the knowledge sharing. IMHO, it looks more

like a

design

guideline question than just avoiding public API

change.

Please

correct

me

if I'm wrong.

Catalog is in flink-table-common module and

TableEnvironment

is

in

flink-table-api-java. Depending on how and where those

features

proposed

in

this FLIP will be used, we'd better reduce the

dependency

chain

and

follow

the Law of Demeter(LoD, clean code) [1]. Adding a new

method

in

TableEnvironment is therefore better than calling an

API

chain.

It

is

also

more user friendly for the caller, because there is no

need

to

understand

the internal structure of the called API. The downside

of

doing

this

is

that we might have another issue with the current

TableEnvironment

design -

the TableEnvironment interface got enlarged with more

wrapper

methods.

This

is a different issue that could be solved with

improved

abstraction

design

in the future. After considering pros and cons, if we

want to

add

those

features now, I would prefer following LoD than API

chain

calls.

WDYT?

Best regards,
Jing

[1]

https://hackernoon.com/object-oriented-tricks-2-law-of-demeter-4ecc9becad85

On Thu, Feb 23, 2023 at 6:26 AM Ran Tao <

[email protected]

wrote:

Hi Jingsong. thanks. i got it.
In this way, there is no need to introduce new API

changes.

Best Regards,
Ran Tao


Jingsong Li<[email protected]>  于2023年2月23日周四

12:26写道：

Hi Ran,

I mean we can just use

TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases().

We don't need to provide new apis just for utils.

Best,
Jingsong

On Thu, Feb 23, 2023 at 12:11 PM Ran Tao <

[email protected]>

wrote:

Hi Jingsong, thanks.

The implementation of these statements in

TableEnvironmentImpl

is

called

through the catalog api.

but it does support some new override methods on

the

catalog

api

side,

and

I will update it later. Thank you.

e.g.
TableEnvironmentImpl
     @Override
     public String[] listDatabases() {
         return catalogManager

  .getCatalog(catalogManager.getCurrentCatalog())

                 .get()
                 .listDatabases()
                 .toArray(new String[0]);
     }

Best Regards,
Ran Tao


Jingsong Li<[email protected]>

于2023年2月23日周四

11:47写道：

Thanks for the proposal.

+1 for the proposal.

I am confused about "Proposed TableEnvironment

SQL

API

Changes",

can

we just use catalog api for this requirement?

Best,
Jingsong

On Thu, Feb 23, 2023 at 10:48 AM Jacky Lau <

[email protected]

wrote:

Hi Ran:
Thanks for driving the FLIP. the google doc

looks

really

good.

it

is

important to improve user interactive

experience.

+1

to

support

this

feature.

Jing Ge<[email protected]>

于2023年2月23日周四

00:51写道：

Hi Ran,

Thanks for driving the FLIP.  It looks

overall

good.

Would

you

like to

add

a description of useLike and notLike? I

guess

useLike

true

is for

"LIKE"

and notLike true is for "NOT LIKE" but I

am

not

sure

if I

understood it

correctly. Furthermore, does it make sense

to

support

"ILIKE"

too?

Best regards,
Jing

On Wed, Feb 22, 2023 at 1:17 PM Ran Tao <

[email protected]>

wrote:

Currently flink sql auxiliary statements

has

supported

some

good

features

such as catalog/databases/table support.

But these features are not very complete

compared

with

other

popular

engines such as spark, presto, hive and

commercial

engines

such

as

snowflake.

For example, many engines support show

operation

with

filtering

except

flink, and support describe other

object(flink

only

support

describe

table).

I wonder can we add these useful

features

for

flink?

You can find details in this doc.[1] or

FLIP.[2]

Also, please let me know if there is a

mistake.

Looking

forward

to

your

reply.

[1]

https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/

[2]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements

Best Regards,
Ran Tao


--
Best regards,
Sergey


--
Best regards,
Sergey

Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

Reply via email to