Hi Ran,

NIT: ILIKE is supported by Clickhouse. Since Flink is focusing on the unified batch and stream data processing, it might make sense to support it. It is more convenient for users.

Thanks for clarifying the definition of "useLike" and "notLike". It seems that "notLike" depends on "useLike", i.e. some combinations are not allowed, e.g while userLike is false, it will make no sense to use notLike with any value. Users might feel confused. Given that there is another "likePattern" parameter contains information whether "Like" with be used or not, could we reduce the method signature to using two parameters? WDYT?

||
|||// enable like by default
|
|||private||boolean||useLike = true;|
||
|||// true for not like, false is like|
|||private||final||boolean||notLike;|
|||// like or not like pattern (sql-like-pattern, consistent with current flink behavior)|
|||private||final||String likePattern;|
    // Like will be enabled/used implicitly when this method is called.
|||public||ShowCatalogsOperation(||boolean||notLike, String likePattern) {|
|||this||.useLike = useLike;|
|||this||.notLike = notLike;|
|        // -> additonal logic to assert likePattern before setting the value.
|
|||this||.likePattern = likePattern;|
|||}|
|
|
|||public||ShowCatalogsOperation(||String likePattern) {|
|||ShowCatalogsOperation(false, likePattern);|
|||}|
||
|
|
|     // Like is disabled.
|
|||public||ShowCatalogsOperation(||) {|
|||this||.useLike = false;|
|||}|

Best regards,

Jing


On 2/26/23 08:01, Ran Tao wrote:
Hi, Jark, thanks for explantations.
if there are no other questions, i will try to start a VOTE to address
current things in this FLIP.
We may open another dedicated FLIP to discuss INFORMATION_SCHEMA.
hi, guys, WDYT?

Best Regards,
Ran Tao


Jark Wu<imj...@gmail.com>  于2023年2月25日周六 22:57写道:

Hi Sergey,

I think INFORMATION_SCHEMA is a very interesting idea, and I hope we can
support it. However, it doesn't conflict with the idea of auxiliary
statements. I can see different benefits of them. The information schema
provides powerful and flexible capabilities but needs to learn the complex
entity relationship[1]. The auxiliary SQL statements are super handy and
can resolve most problems, but they offer limited features.

I can see almost all the mature systems support both of them. I think it
also makes sense to support both of them in Flink. Considering they
are orthogonal and information schema requires more complex design and
discussion, it deserves a separate FLIP. Sergey, are you willing to
contribute this FLIP?

Best,
Jark

[1]:

https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html


On Fri, 24 Feb 2023 at 22:43, Ran Tao<chucheng...@gmail.com>  wrote:

Thanks John.

It seems that most people prefer the information_schema implementation.
information_schema does have more benefits (however, the show operation
is
also an option and supplement).
Otherwise, the sql syntax and keywords may be changed frequently.
Of course, it will be more complicated than the extension of the show
operation.
  It is necessary to design various tables in information_schema, which
may
take a period of effort.

I will try to design the information_schema and integrate it with flink.
This may be a relatively big feature for me. I hope you guys can give
comments and opinions later.
Thank you all.

Best Regards,
Ran Tao


John Roesler<vvcep...@apache.org>  于2023年2月24日周五 21:53写道:

Hello Ran,

Thanks for the FLIP!

Do you mind if we revisit the topic of doing this by adding an
Information
schema? The SHOW approach requires modifying the parser/language for
every
gap we identify. On the flip side, an Information schema is much easier
to
discover and remember how to use, and the ability to run queries on it
is
quite valuable for admins. It’s also better for Flink maintainers,
because
the information tables’ schemas can be evolved over time just like
regular
tables, whereas every change to a SHOW statement would be a breaking
change.

I understand that it may seem like a big effort, but we’re proposing
quite
a big extension to the space of SHOW statement, so it seems appropriate
to
take the opportunity and migrate to a better framework rather than
incrementally building on (and tying us even more firmly to) the
existing
approach.

Thanks for your consideration,
John

On Fri, Feb 24, 2023, at 05:58, Sergey Nuyanzin wrote:
thanks for explanation

But it's not clear to me what exactly
you want to display? Is it the name of the plugin?
I was thinking about name, type (source/sink) and may be version (not
sure
if it's possible right now)

On Fri, Feb 24, 2023 at 12:46 PM Ran Tao<chucheng...@gmail.com>
wrote:
Hi, Sergey. thanks. first step we can support filtering for show
operations
in this FLIP try to align other engines.
If we want to support describe other objects, of course we need to
design
how to get these metadatas or informations and printAsStyle. (So it
maybe
need another FLIP for more details).

Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?
I think we can support it, currently flink do support some
operations
only
for flink itself such as showJobs. But it's not clear to me what
exactly
you want to display? Is it the name of the plugin?
Just Like:
Kafka
Hudi
Files

Best Regards,
Ran Tao


Sergey Nuyanzin<snuyan...@gmail.com>  于2023年2月24日周五 19:11写道:

Thanks for driving the FLIP

I have a couple of questions
Am I right that INFORMATION_SCHEMA mentioned by Timo[1]  is out of
scope
of
this FLIP?
I noticed there are some support of it in
Spark[2]/Hive[3]/Snowflake[4]
and
others

Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?

[1]
https://lists.apache.org/thread/2g108qlfwbhb56wnoc5qj0yq29dvq1vv
[2]https://issues.apache.org/jira/browse/SPARK-16452
[3]https://issues.apache.org/jira/browse/HIVE-1010
[4]https://docs.snowflake.com/en/sql-reference/info-schema


On Fri, Feb 24, 2023 at 4:19 AM Jark Wu<imj...@gmail.com>  wrote:

Hi Jing,

we'd better reduce the dependency chain and follow the Law of
Demeter(LoD, clean code).
Adding a new method in TableEnvironment is therefore better
than
calling
an API chain

I think I don't fully agree that LoD is a good practice.
Actually, I
would
prefer to keep the API clean and concise.
This is also the Java Collection Framework design principle [1]:
"High
power-to-weight ratio". Otherwise,
it will explode the API interfaces with different combinations
of
methods.
Currently, TableEnvironment
already provides 60+ methods.

IMO, with the increasing methods of accessing and manipulating
metadata,
they can be extracted to
a separate interface, where we can add richer methods. This work
can be
aligned with the
CatalogManager interface (FLIP-295) [2].

Best,
Jark

[1]:


https://stackoverflow.com/questions/7568819/why-no-tail-or-head-method-in-list-to-get-last-or-first-element
[2]:
https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70

On Fri, 24 Feb 2023 at 10:38, Aitozi<gjying1...@gmail.com>
wrote:
Hi,
     Thanks for the nice proposal, Ran.
     Regarding this api usage, I have some discussion with
@twalthr
before
as here <

https://github.com/apache/flink/pull/15137#issuecomment-1356124138>
Personally, I think leaking the Catalog to the user side is
not
suitable,
since there are some read/write operations in the Catalog, the
TableEnvironment#getCatalog will expose all of them to the
user
side.
So
I
learned to add a new api in TableEnvironment to reduce
reliance
on
the
current TableEnvironment#getCatalog.

Thanks,
Aitozi


Ran Tao<chucheng...@gmail.com>  于2023年2月23日周四 23:44写道:

Hi, JingSong, Jing.

thank for sharing your opinions.

What you say makes sense, both approaches have pros and
cons.
If it is a modification of `TableEnvrionment`, such as
listDatabases(catalog). It is actually consistent with the
other
overloaded
methods before,
and defining this method means that TableEnvrionment does
provide
this
capability (rather than relying on the functionality of
another
class).
The disadvantage is that api changes may be required, and
may
continue
to
be modified in the future.
But from the TableEnvrionment itself, it really doesn't pay
attention
to
how the underlying layer is implemented.
(Although it is actually taken from the catalogManager at
present,
this
is
another question)

Judging from the current dependencies, flink-table-api-java
strongly
relies
on flink-table-common to use various common classes and
interfaces,
especially the catalog interface.
So we can extract various metadata information in the
catalog
through
`tEnv.getCatalog`.
The advantage is that it will not cause api modification,
but
this
method
of use breaks LoD.
In fact, the current flink-table-api-java design is
completely
bound
to
the
catalog interface.

If the mandatory modification of PublicApi is constrained
(may
be
modified
again and later), I tend to use `tEnv.getCatalog` directly,
otherwise
It would actually be more standard to add overloaded methods
to
`TableEnvrionment`.

Another question, can the later capabilities of
TableEnvrionment be
implemented through SupportXXX?
In order to solve the problem that the method needs to be
added
in
the
future. This kind of usage occurs frequently in flink.

Looking forward to your other considerations,
and also try to wait to see if there are other relevant API
designers
or
committers to provide comments.


Best Regards,
Ran Tao

Jing Ge<j...@ververica.com.invalid>  于2023年2月23日周四 18:58写道:

Hi Jingson,

Thanks for sharing your thoughts. Please see my reply
below.
On Thu, Feb 23, 2023 at 10:16 AM Jingsong Li <
jingsongl...@gmail.com
wrote:

Hi Jing Ge,

First, flink-table-common contains all common classes of
Flink
Table,
I think it is hard to bypass its dependence.

If any time when we use flink-table-api-java, we have to
cross
through
flink-table-api-java and use flink-table-common, we should
reconsider
the
design of these two modules and how interfaces/classes are
classified
into
those modules.


Secondly, almost all methods in Catalog looks useful to
me,
so
if
we
are following LoD, we should add all methods again to
TableEnvironment. I think it is redundant.

That is the enlarged issue I mentioned previously. A
simple
solution
is
to
move Catalog to the top level API. The fact is that a
catalog
package
already exists in flink-table-api-java but the Catalog
interface
is
in
flink-table-common. I don't know the historical context of
this
design.
Maybe you could share some insight with us? Thanks in
advance.
Beyond
that,
there should be other AOP options but need more time to
figure it
out.

And, this API chain does not look deep.
-
"tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases()"
looks a little complicated. The complex part is ahead.
- If we have a method to get Catalog directly, can be
simplify
to
"tEnv.catalog().listDatabase()", this is simple.

Commonly, it will need more effort to always follow LoD,
but
for
the
top
level facade API like TableEnvironment, both the API
developer,
API
consumer and the project itself from a long-term
perspective
will
benefit
from sticking to LoD. Since we already have the
getCatalog(String
catalog)
method in TableEnvironment, it also makes sense to follow
your
suggestion,
if we only want to limit/avoid public API changes. But
please
be
aware
that
we will have all known long-term drawbacks because of LoD
violation, especially the cross modules violation. I just
checked
all
usages of getCatalog(String catalog) in the master branch.
Currently
there
are very limited calls. It is better to pay attention to
it
before
it
goes
worse. Just my 2 cents. :)


Best,
Jingsong

On Thu, Feb 23, 2023 at 4:47 PM Jing Ge
<j...@ververica.com.invalid
wrote:
Hi Jingson,

Thanks for the knowledge sharing. IMHO, it looks more
like a
design
guideline question than just avoiding public API
change.
Please
correct
me
if I'm wrong.

Catalog is in flink-table-common module and
TableEnvironment
is
in
flink-table-api-java. Depending on how and where those
features
proposed
in
this FLIP will be used, we'd better reduce the
dependency
chain
and
follow
the Law of Demeter(LoD, clean code) [1]. Adding a new
method
in
TableEnvironment is therefore better than calling an
API
chain.
It
is
also
more user friendly for the caller, because there is no
need
to
understand
the internal structure of the called API. The downside
of
doing
this
is
that we might have another issue with the current
TableEnvironment
design -
the TableEnvironment interface got enlarged with more
wrapper
methods.
This
is a different issue that could be solved with
improved
abstraction
design
in the future. After considering pros and cons, if we
want to
add
those
features now, I would prefer following LoD than API
chain
calls.
WDYT?
Best regards,
Jing

[1]

https://hackernoon.com/object-oriented-tricks-2-law-of-demeter-4ecc9becad85
On Thu, Feb 23, 2023 at 6:26 AM Ran Tao <
chucheng...@gmail.com
wrote:
Hi Jingsong. thanks. i got it.
In this way, there is no need to introduce new API
changes.
Best Regards,
Ran Tao


Jingsong Li<jingsongl...@gmail.com>  于2023年2月23日周四
12:26写道:
Hi Ran,

I mean we can just use

TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases().
We don't need to provide new apis just for utils.

Best,
Jingsong

On Thu, Feb 23, 2023 at 12:11 PM Ran Tao <
chucheng...@gmail.com>
wrote:
Hi Jingsong, thanks.

The implementation of these statements in
TableEnvironmentImpl
is
called
through the catalog api.

but it does support some new override methods on
the
catalog
api
side,
and
I will update it later. Thank you.

e.g.
TableEnvironmentImpl
     @Override
     public String[] listDatabases() {
         return catalogManager

  .getCatalog(catalogManager.getCurrentCatalog())
                 .get()
                 .listDatabases()
                 .toArray(new String[0]);
     }

Best Regards,
Ran Tao


Jingsong Li<jingsongl...@gmail.com>
于2023年2月23日周四
11:47写道:
Thanks for the proposal.

+1 for the proposal.

I am confused about "Proposed TableEnvironment
SQL
API
Changes",
can
we just use catalog api for this requirement?

Best,
Jingsong

On Thu, Feb 23, 2023 at 10:48 AM Jacky Lau <
liuyong...@gmail.com
wrote:
Hi Ran:
Thanks for driving the FLIP. the google doc
looks
really
good.
it
is
important to improve user interactive
experience.
+1
to
support
this
feature.

Jing Ge<j...@ververica.com.invalid>
于2023年2月23日周四
00:51写道:
Hi Ran,

Thanks for driving the FLIP.  It looks
overall
good.
Would
you
like to
add
a description of useLike and notLike? I
guess
useLike
true
is for
"LIKE"
and notLike true is for "NOT LIKE" but I
am
not
sure
if I
understood it
correctly. Furthermore, does it make sense
to
support
"ILIKE"
too?
Best regards,
Jing

On Wed, Feb 22, 2023 at 1:17 PM Ran Tao <
chucheng...@gmail.com>
wrote:
Currently flink sql auxiliary statements
has
supported
some
good
features
such as catalog/databases/table support.

But these features are not very complete
compared
with
other
popular
engines such as spark, presto, hive and
commercial
engines
such
as
snowflake.

For example, many engines support show
operation
with
filtering
except
flink, and support describe other
object(flink
only
support
describe
table).

I wonder can we add these useful
features
for
flink?
You can find details in this doc.[1] or
FLIP.[2]
Also, please let me know if there is a
mistake.
Looking
forward
to
your
reply.

[1]


https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/
[2]


https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
Best Regards,
Ran Tao


--
Best regards,
Sergey


--
Best regards,
Sergey

Reply via email to