Hi Danny,
thanks for updating the FLIP. I think your current design is sufficient
to separate hints from result-related properties.
One remark to the naming itself: I would vote for calling the hints
around table scan `OPTIONS('k'='v')`. We used the term "properties" in
the past but since we want to unify the Flink configuration experience,
we should use consistent naming and classes around `ConfigOptions`.
It would be nice to use `Set<ConfigOption> supportedHintOptions();` to
start using config options instead of pure string properties. This will
also allow us to generate documentation in the future around supported
data types, ranges, etc. for options. At some point we would also like
to drop `DescriptorProperties` class. "Options" is also used in the
documentation [1] and in the SQL/MED standard [2].
Furthermore, I would still vote for separating CatalogTable and hint
options. Otherwise the planner would need to create a new CatalogTable
instance which might not always be easy. We should offer them via:
org.apache.flink.table.factories.TableSourceFactory.Context#getHints:
ReadableConfig
What do you think?
Regards,
Timo
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
[2] https://wiki.postgresql.org/wiki/SQL/MED
On 12.03.20 15:06, Stephan Ewen wrote:
@Danny sounds good.
Maybe it is worth listing all the classes of problems that you want to
address and then look at each class and see if hints are a good default
solution or a good optional way of simplifying things?
The discussion has grown a lot and it is starting to be hard to distinguish
the parts where everyone agrees from the parts were there are concerns.
On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
Thanks Stephan ~
We can remove the support for properties that may change the semantics of
query if you think that is a trouble.
How about we support the /*+ properties() */ hint only for those optimize
parameters, such as the fetch size of source or something like that, does
that make sense?
Stephan Ewen <se...@apache.org>于2020年3月12日 周四下午7:45写道:
I think Bowen has actually put it very well.
(1) Hints that change semantics looks like trouble waiting to happen. For
example Kafka offset handling should be in filters. The Kafka source
should
support predicate pushdown.
(2) Hints should not be a workaround for current shortcomings. A lot of
the
suggested above sounds exactly like that. Working around catalog/DDL
shortcomings, missing exposure of metadata (offsets), missing predicate
pushdown in Kafka. Abusing a feature like hints now as a quick fix for
these issues, rather than fixing the root causes, will much likely bite
us
back badly in the future.
Best,
Stephan
On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
It seems this FLIP's name is somewhat misleading. From my
understanding,
this FLIP is trying to
address the dynamic parameter issue, and table hints is the way we wan
to
choose. I think we should
be focus on "what's the right way to solve dynamic property" instead of
discussing "whether table
hints can affect query semantics".
For now, there are two proposed ways to achieve dynamic property:
1. FLIP-110: create temporary table xx like xx with (xxx)
2. use custom "from t with (xxx)" syntax
3. "Borrow" the table hints to have a special PROPERTIES hint.
The first one didn't break anything, but the only problem i see is a
little
more verbose than the table hint
approach. I can imagine when someone using SQL CLI to have a sql
experience, it's quite often that
he will modify the table property, some use cases i can think of:
1. the source contains some corrupted data, i want to turn on the
"ignore-error" flag for certain formats.
2. I have a kafka table and want to see some sample data from the
beginning, so i change the offset
to "earliest", and then I want to observe the latest data which keeps
coming in. I would write another query
to select from the latest table.
3. I want to my jdbc sink flush data more eagerly then i can observe
the
data from database side.
Most of such use cases are quite ad-hoc. If every time I want to have a
different experience, i need to create
a temporary table and then also modify my query, it doesn't feel
smooth.
Embed such dynamic property into
query would have better user experience.
Both 2 & 3 can make this happen. The cons of #2 is breaking SQL
compliant,
and for #3, it only breaks some
unwritten rules, but we can have an explanation on that. And I really
doubt
whether user would complain about
this when they actually have flexible and good experience using this.
My tendency would be #3 > #1 > #2, what do you think?
Best,
Kurt
On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com>
wrote:
Thanks Aljoscha ~
I agree for most of the query hints, they are optional as an
optimizer
instruction, especially for the traditional RDBMS.
But, just like BenChao said, Flink as a computation engine has many
different kind of data sources, thus, dynamic parameters like
start_offest
can only bind to each table scope, we can not set a session config
like
KSQL because they are all about Kafka:
SET ‘auto.offset.reset’=‘earliest’;
Thus the most flexible way to set up these dynamic params is to bind
to
the table scope in the query when we want to override something, so
we
have
these solutions above (with pros and cons from my side):
• 1. Select * from t(offset=123) (from Timo)
Pros:
- Easy to add
- Parameters are part of the main query
Cons:
- Not SQL compliant
• 2. Select * from t /*+ PROPERTIES(offset=123) */ (from me)
Pros:
- Easy to add
- SQL compliant because it is nested in the comments
Cons:
- Parameters are not part of the main query
- Cryptic syntax for new users
The biggest problem for hints way may be the “if hints must be
optional”,
actually we have though about 1 for a while but aborted because it
breaks
the SQL standard too much. And we replace it with 2, because the
hints
syntax do not break SQL standard(nested in comments).
What if we have the special /*+ PROPERTIES */ hint that allows
override
some properties of table dynamically, it does not break anything, at
lease
for current Flink use cases.
Planner hints are optional just because they are naturally enforcers
of
the planner, most of them aim to instruct the optimizer, but, the
table
hints is a little different, table hints can specify the table meta
like
index column, and it is very convenient to specify table properties.
Or shall we not call /*+ PROPERTIES(offset=123) */ table hint, we
can
call it table dynamic parameters.
Best,
Danny Chan
在 2020年3月11日 +0800 PM9:20,Aljoscha Krettek <aljos...@apache.org>,写道:
Hi,
I don't understand this discussion. Hints, as I understand them,
should
work like this:
- hints are *optional* advice for the optimizer to try and help it
to
find a good execution strategy
- hints should not change query semantics, i.e. they should not
change
connector properties executing a query with taking into account the
hints *must* produce the same result as executing the query without
taking into account the hints
From these simple requirements you can derive a solution that makes
sense. I don't have a strong preference for the syntax but we
should
strive to be in line with prior work.
Best,
Aljoscha
On 11.03.20 11:53, Danny Chan wrote:
Thanks Timo for summarize the 3 options ~
I agree with Kurt that option2 is too complicated to use because:
• As a Kafka topic consumer, the user must define both the
virtual
column for start offset and he must apply a special filter predicate
after
each query
• And for the internal implementation, the metadata column push
down
is another hard topic, each kind of message queue may have its offset
attribute, we need to consider the expression type for different
kind;
the
source also need to recognize the constant column as a config
option(which
is weird because usually what we pushed down is a table column)
For option 1 and option3, I think there is no difference, option1
is
also a hint syntax which is introduced in Sybase and referenced then
deprecated by MS-SQL in 199X years because of the ambitiousness.
Personally
I prefer /*+ */ style table hint than WITH keyword for these reasons:
• We do not break the standard SQL, the hints are nested in SQL
comments
• We do not need to introduce additional WITH keyword which may
appear
in a query if we use that because a table can be referenced in all
kinds
of
SQL contexts: INSERT/DELETE/FROM/JOIN …. That would make our sql
query
break too much of the SQL from standard
• We would have uniform syntax for hints as query hint, one
syntax
fits all and more easy to use
And here is the reason why we choose a uniform Oracle style query
hint syntax which is addressed by Julian Hyde when we design the
syntax
from the Calcite community:
I don’t much like the MSSQL-style syntax for table hints. It
adds a
new use of the WITH keyword that is unrelated to the use of WITH for
common-table expressions.
A historical note. Microsoft SQL Server inherited its hint syntax
from
Sybase a very long time ago. (See “Transact SQL Programming”[1], page
632,
“Optimizer hints”. The book was written in 1999, and covers Microsoft
SQL
Server 6.5 / 7.0 and Sybase Adaptive Server 11.5, but the syntax very
likely predates Sybase 4.3, from which Microsoft SQL Server was
forked
in
1993.)
Microsoft later added the WITH keyword to make it less ambiguous,
and
has now deprecated the syntax that does not use WITH.
They are forced to keep the syntax for backwards compatibility
but
that doesn’t mean that we should shoulder their burden.
I think formatted comments are the right container for hints
because
it allows us to change the hint syntax without changing the SQL
parser,
and
makes clear that we are at liberty to ignore hints entirely.
Julian
[1] https://www.amazon.com/s?k=9781565924017 <
https://www.amazon.com/s?k=9781565924017>
Best,
Danny Chan
在 2020年3月11日 +0800 PM4:03,Timo Walther <twal...@apache.org>,写道:
Hi Danny,
it is true that our DDL is not standard compliant by using the
WITH
clause. Nevertheless, we aim for not diverging too much and the
LIKE
clause is an example of that. It will solve things like
overwriting
WATERMARKs, add additional/modifying properties and inherit
schema.
Bowen is right that Flink's DDL is mixing 3 types definition
together.
We are not the first ones that try to solve this. There is also
the
SQL
MED standard [1] that tried to tackle this problem. I think it
was
not
considered when designing the current DDL.
Currently, I see 3 options for handling Kafka offsets. I will
give
some
examples and look forward to feedback here:
*Option 1* Runtime and semantic parms as part of the query
`SELECT * FROM MyTable('offset'=123)`
Pros:
- Easy to add
- Parameters are part of the main query
- No complicated hinting syntax
Cons:
- Not SQL compliant
*Option 2* Use metadata in query
`CREATE TABLE MyTable (id INT, offset AS
SYSTEM_METADATA('offset'))`
`SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12
12:34:22'`
Pros:
- SQL compliant in the query
- Access of metadata in the DDL which is required anyway
- Regular pushdown rules apply
Cons:
- Users need to add an additional comlumn in the DDL
*Option 3*: Use hints for properties
`
SELECT *
FROM MyTable /*+ PROPERTIES('offset'=123) */
`
Pros:
- Easy to add
Cons:
- Parameters are not part of the main query
- Cryptic syntax for new users
- Not standard compliant.
If we go with this option, I would suggest to make it available
in
a
separate map and don't mix it with statically defined
properties.
Such
that the factory can decide which properties have the right to
be
overwritten by the hints:
TableSourceFactory.Context.getQueryHints(): ReadableConfig
Regards,
Timo
[1] https://en.wikipedia.org/wiki/SQL/MED
Currently I see 3 options as a
On 11.03.20 07:21, Danny Chan wrote:
Thanks Bowen ~
I agree we should somehow categorize our connector
parameters.
For type1, I’m already preparing a solution like the
Confluent
schema registry + Avro schema inference thing, so this may not be a
problem
in the near future.
For type3, I have some questions:
"SELECT * FROM mykafka WHERE offset > 12pm yesterday”
Where does the offset column come from, a virtual column from
the
table schema, you said that
They change
almost every time a query starts and have nothing to do with
metadata, thus
should not be part of table definition/DDL
But why you can reference it in the query, I’m confused for
that,
can you elaborate a little ?
Best,
Danny Chan
在 2020年3月11日 +0800 PM12:52,Bowen Li <bowenl...@gmail.com
,写道:
Thanks Danny for kicking off the effort
The root cause of too much manual work is Flink DDL has
mixed 3
types of
params together and doesn't handle each of them very well.
Below
are how I
categorize them and corresponding solutions in my mind:
- type 1: Metadata of external data, like external
endpoint/url,
username/pwd, schemas, formats.
Such metadata are mostly already accessible in external
system
as long as
endpoints and credentials are provided. Flink can get it
thru
catalogs, but
we haven't had many catalogs yet and thus Flink just hasn't
been
able to
leverage that. So the solution should be building more
catalogs.
Such
params should be part of a Flink table DDL/definition, and
not
overridable
in any means.
- type 2: Runtime params, like jdbc connector's fetch size,
elasticsearch
connector's bulk flush size.
Such params don't affect query results, but affect how
results
are produced
(eg. fast or slow, aka performance) - they are essentially
execution and
implementation details. They change often in exploration or
development
stages, but not quite frequently in well-defined
long-running
pipelines.
They should always have default values and can be missing
in
query. They
can be part of a table DDL/definition, but should also be
replaceable in a
query - *this is what table "hints" in FLIP-113 should
cover*.
- type 3: Semantic params, like kafka connector's start
offset.
Such params affect query results - the semantics. They'd
better
be as
filter conditions in WHERE clause that can be pushed down.
They
change
almost every time a query starts and have nothing to do
with
metadata, thus
should not be part of table definition/DDL, nor be
persisted
in
catalogs.
If they will, users should create views to keep such params
around (note
this is different from variable substitution).
Take Flink-Kafka as an example. Once we get these params
right,
here're the
steps users need to do to develop and run a Flink job:
- configure a Flink ConfluentSchemaRegistry with url,
username,
and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday"
(simplified
timestamp) in SQL CLI, Flink automatically retrieves all
metadata of
schema, file format, etc and start the job
- users want to make the job read Kafka topic faster, so it
goes
as "SELECT
* FROM mykafka /* faster_read_key=value*/ WHERE offset >
12pm
yesterday"
- done and satisfied, users submit it to production
Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think
it's
a
nice-to-have feature, but not a strategically critical,
long-term solution,
because
1) It may seem promising at the current stage to solve the
too-much-manual-work problem, but that's only because Flink
hasn't
leveraged catalogs well and handled the 3 types of params
above
properly.
Once we get the params types right, the LIKE syntax won't
be
that
important, and will be just an easier way to create tables
without retyping
long fields like username and pwd.
2) Note that only some rare type of catalog can store k-v
property pair, so
table created this way often cannot be persisted. In the
foreseeable
future, such catalog will only be HiveCatalog, and not
everyone
has a Hive
metastore. To be honest, without persistence, recreating
tables
every time
this way is still a lot of keyboard typing.
Cheers,
Bowen
On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <
ykt...@gmail.com
wrote:
If a specific connector want to have such parameter and
read
if out of
configuration, then that's fine.
If we are talking about a configuration for all kinds of
sources, I would
be super careful about that.
It's true it can solve maybe 80% cases, but it will also
make
the left 20%
feels weird.
Best,
Kurt
On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <
imj...@gmail.com
wrote:
Hi Kurt,
#3 Regarding to global offset:
I'm not saying to use the global configuration to
override
connector
properties by the planner.
But the connector should take this configuration and
translate into their
client API.
AFAIK, almost all the message queues support eariliest
and
latest and a
timestamp value as start point.
So we can support 3 options for this configuration:
"eariliest", "latest"
and a timestamp string value.
Of course, this can't solve 100% cases, but I guess can
sovle 80% or 90%
cases.
And the remaining cases can be resolved by LIKE syntax
which
I guess is
not
very common cases.
Best,
Jark
On Wed, 11 Mar 2020 at 10:33, Kurt Young <
ykt...@gmail.com
wrote:
Good to have such lovely discussions. I also want to
share
some of my
opinions.
#1 Regarding to error handling: I also think ignore
invalid hints would
be
dangerous, maybe
the simplest solution is just throw an exception.
#2 Regarding to property replacement: I don't think
we
should
constraint
ourself to
the meaning of the word "hint", and forbidden it
modifying
any
properties
which can effect
query results. IMO `PROPERTIES` is one of the table
hints,
and a
powerful
one. It can
modify properties located in DDL's WITH block. But I
also
see the harm
that
if we make it
too flexible like change the kafka topic name with a
hint.
Such use
case
is
not common and
sounds very dangerous to me. I would propose we have
a
map
of hintable
properties for each
connector, and should validate all passed in
properties
are actually
hintable. And combining with
#1 error handling, we can throw an exception once
received
invalid
property.
#3 Regarding to global offset: I'm not sure it's
feasible.
Different
connectors will have totally
different properties to represent offset, some might
be
timestamps,
some
might be string literals
like "earliest", and others might be just integers.
Best,
Kurt
On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <
imj...@gmail.com>
wrote:
Hi everyone,
I want to jump in the discussion about the "dynamic
start offset"
problem.
First of all, I share the same concern with Timo
and
Fabian, that the
"start offset" affects the query semantics, i.e.
the
query result.
But "hints" is just used for optimization which
should
affect the
result?
I think the "dynamic start offset" is an very
important
usability
problem
which will be faced by many streaming platforms.
I also agree "CREATE TEMPORARY TABLE Temp (LIKE t)
WITH
('connector.startup-timestamp-millis' =
'1578538374471')" is verbose,
what
if we have 10 tables to join?
However, what I want to propose (should be another
thread) is a
global
configuration to reset start offsets of all the
source
connectors
in the query session, e.g.
"table.sources.start-offset".
This is
possible
now because `TableSourceFactory.Context` has
`getConfiguration`
method to get the session configuration, and use it
to
create an
adapted
TableSource.
Then we can also expose to SQL CLI via SET command,
e.g.
`SET
'table.sources.start-offset'='earliest';`, which is
pretty simple and
straightforward.
This is very similar to KSQL's `SET
'auto.offset.reset'='earliest'`
which
is very helpful IMO.
Best,
Jark
On Tue, 10 Mar 2020 at 22:29, Timo Walther <
twal...@apache.org>
wrote:
Hi Danny,
compared to the hints, FLIP-110 is fully
compliant
to
the SQL
standard.
I don't think that `CREATE TEMPORARY TABLE Temp
(LIKE
t) WITH
(k=v)`
is
too verbose or awkward for the power of basically
changing the
entire
connector. Usually, this statement would just
precede
the query in
a
multiline file. So it can be change "in-place"
like
the hints you
proposed.
Many companies have a well-defined set of tables
that
should be
used.
It
would be dangerous if users can change the path
or
topic in a hint.
The
catalog/catalog manager should be the entity that
controls which
tables
exist and how they can be accessed.
what’s the problem there if we user the table
hints
to support
“start
offset”?
IMHO it violates the meaning of a hint. According
to
the
dictionary,
a
hint is "a statement that expresses indirectly
what
one prefers not
to
say explicitly". But offsets are a property that
are
very explicit.
If we go with the hint approach, it should be
expressible in the
TableSourceFactory which properties are supported
for
hinting. Or
do
you
plan to offer those hints in a separate
Map<String,
String> that
cannot
overwrite existing properties? I think this would
be
a
different
story...
Regards,
Timo
On 10.03.20 10:34, Danny Chan wrote:
Thanks Timo ~
Personally I would say that offset > 0 and
start
offset = 10 does
not
have the same semantic, so from the SQL aspect,
we
can
not
implement
a
“starting offset” hint for query with such a
syntax.
And the CREATE TABLE LIKE syntax is a DDL which
is
just verbose
for
defining such dynamic parameters even if it could
do
that, shall we
force
users to define a temporal table for each query
with
dynamic
params,
I
would say it’s an awkward solution.
"Hints should give "hints" but not affect the
actual
produced
result.”
You mentioned that multiple times and could we
give a
reason,
what’s
the
problem there if we user the table hints to
support
“start offset”
?
From
my side I saw some benefits for that:
• It’s very convent to set up these parameters,
the
syntax is
very
much
like the DDL definition
• It’s scope is very clear, right on the table
it
attathed
• It does not affect the table schema, which
means
in order to
specify
the offset, there is no need to define an offset
column which is
weird
actually, offset should never be a column, it’s
more
like a
metadata
or a
start option.
So in total, FLIP-110 uses the offset more
like a
Hive partition
prune,
we can do that if we have an offset column, but
most
of the case we
do
not
define that, so there is actually no conflict or
overlap.
Best,
Danny Chan
在 2020年3月10日 +0800 PM4:28,Timo Walther <
twal...@apache.org>,写道:
Hi Danny,
shouldn't FLIP-110[1] solve most of the
problems
we have around
defining
table properties more dynamically without
manual
schema work?
Also
offset definition is easier with such a
syntax.
They must not be
defined
in catalog but could be temporary tables that
extend from the
original
table.
In general, we should aim to keep the syntax
concise and don't
provide
too many ways of doing the same thing. Hints
should give "hints"
but
not
affect the actual produced result.
Some connector properties might also change
the
plan or schema
in
the
future. E.g. they might also define whether a
table source
supports
certain push-downs (e.g. predicate
push-down).
Dawid is currently working a draft that might
makes it possible
to
expose a Kafka offset via the schema such
that
`SELECT * FROM
Topic
WHERE offset > 10` would become possible and
could
be pushed
down.
But
this is of course, not planned initially.
Regards,
Timo
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
On 10.03.20 08:34, Danny Chan wrote:
Thanks Wenlong ~
For PROPERTIES Hint Error handling
Actually we have no way to figure out
whether a
error prone
hint
is a
PROPERTIES hint, for example, if use writes a
hint
like
‘PROPERTIAS’,
we
do
not know if this hint is a PROPERTIES hint, what
we
know is that
the
hint
name was not registered in our Flink.
If the user writes the hint name correctly
(i.e.
PROPERTIES),
we
did
can enforce the validation of the hint options
though
the pluggable
HintOptionChecker.
For PROPERTIES Hint Option Format
For a key value style hint option, the key
can
be either a
simple
identifier or a string literal, which means that
it’s
compatible
with
our
DDL syntax. We support simple identifier because
many
other hints
do
not
have the component complex keys like the table
properties, and we
want
to
unify the parse block.
Best,
Danny Chan
在 2020年3月10日 +0800 PM3:19,wenlong.lwl <
wenlong88....@gmail.com
,写道:
Hi Danny, thanks for the proposal. +1 for
adding table hints,
it
is
really
a necessary feature for flink sql to
integrate
with a catalog.
For error handling, I think it would be
more
natural to throw
an
exception when error table hint provided,
because the
properties
in
hint
will be merged and used to find the table
factory which would
cause
an
exception when error properties provided,
right? On the other
hand,
unlike
other hints which just affect the way to
execute the query,
the
property
table hint actually affects the result of
the
query, we should
never
ignore
the given property hints.
For the format of property hints,
currently,
in sql client, we
accept
properties in format of string only in
DDL:
'connector.type'='kafka',
I
think the format of properties in hint
should
be the same as
the
format we
defined in ddl. What do you think?
Bests,
Wenlong Lyu
On Tue, 10 Mar 2020 at 14:22, Danny Chan
<
yuzhao....@gmail.com>
wrote:
To Weike: About the Error Handing
To be consistent with other SQL
vendors,
the
default is to
log
warnings
and if there is any error (invalid hint
name
or options), the
hint
is just
ignored. I have already addressed in
the
wiki.
To Timo: About the PROPERTIES Table
Hint
• The properties hints is also
optional,
user can pass in an
option
to
override the table properties but this
does
not mean it is
required.
• They should not include semantics:
does
the properties
belong
to
semantic ? I don't think so, the plan
does
not change right ?
The
result
set may be affected, but there are
already
some hints do so,
for
example,
MS-SQL MAXRECURSION and SNAPSHOT hint
[1]
• `SELECT * FROM t(k=v, k=v)`: this
grammar
breaks the SQL
standard
compared to the hints way(which is
included
in comments)
• I actually didn't found any vendors
to
support such
grammar,
and
there
is no way to override table level
properties
dynamically. For
normal
RDBMS,
I think there are no requests for such
dynamic parameters
because
all the
table have the same storage and
computation
and they are
almost
all
batch
tables.
• While Flink as a computation engine
has
many connectors,
especially for
some message queue like Kafka, we would
have
a start_offset
which
is
different each time we start the query,
such
parameters can
not
be
persisted to catalog, because it’s not
static, this is
actually
the
background we propose the table hints
to
indicate such
properties
dynamically.
To Jark and Jinsong: I have removed the
query hints part and
change
the
title.
[1]
https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
Best,
Danny Chan
在 2020年3月9日 +0800 PM5:46,Timo Walther <
twal...@apache.org
,写道:
Hi Danny,
thanks for the proposal. I agree with
Jark
and Jingsong.
Planner
hints
and table hints are orthogonal topics
that
should be
discussed
separately.
I share Jingsong's opinion that we
should
not use planner
hints
for
passing connector properties. Planner
hints should be
optional
at
any
time. They should not include
semantics
but only affect
execution
time.
Connector properties are an important
part
of the query
itself.
Have you thought about options such
as
`SELECT * FROM t(k=v,
k=v)`?
How
are other vendors deal with this
problem?
Regards,
Timo
On 09.03.20 10:37, Jingsong Li wrote:
Hi Danny, +1 for table hints,
thanks
for
driving.
I took a look to FLIP, most of
content
are talking about
query
hints.
It is
hard to discussion and voting. So
+1
to
split it as Jark
said.
Another thing is configuration that
suitable to config with
table
hints:
"connector.path" and
"connector.topic",
Are they really
suitable
for
table
hints? Looks weird to me. Because I
think these properties
are
the
core of
table.
Best,
Jingsong Lee
On Mon, Mar 9, 2020 at 5:30 PM Jark
Wu
<
imj...@gmail.com>
wrote:
Thanks Danny for starting the
discussion.
+1 for this feature.
If we just focus on the table
hints
not the query hints in
this
release,
could you split the FLIP into two
FLIPs?
Because it's hard to vote on
partial
part of a FLIP. You
can
keep
the table
hints proposal in FLIP-113 and
move
query hints into
another
FLIP.
So that we can focuse on the
table
hints in the FLIP.
Thanks,
Jark
On Mon, 9 Mar 2020 at 17:14,
DONG,
Weike <
kyled...@connect.hku.hk
wrote:
Hi Danny,
This is a nice feature, +1.
One thing I am interested in
but
not
mentioned in the
proposal
is
the
error
handling, as it is quite common
for
users to write
inappropriate
hints in
SQL code, if illegal or "bad"
hints
are given, would the
system
simply
ignore them or throw
exceptions?
Thanks : )
Best,
Weike
On Mon, Mar 9, 2020 at 5:02 PM
Danny
Chan <
yuzhao....@gmail.com>
wrote:
Note:
we only plan to support table
hints in Flink release
1.11,
so
please
focus
mainly on the table hints
part
and
just ignore the
planner
hints, sorry
for
that mistake ~
Best,
Danny Chan
在 2020年3月9日 +0800
PM4:36,Danny
Chan <
yuzhao....@gmail.com
,写道:
Hi, fellows ~
I would like to propose the
supports for SQL hints for
our
Flink SQL.
We would support hints
syntax
as
following:
select /*+ NO_HASH_JOIN,
RESOURCE(mem='128mb',
parallelism='24') */
from
emp /*+ INDEX(idx1, idx2)
*/
join
dept /*+
PROPERTIES(k1='v1',
k2='v2') */
on
emp.deptno = dept.deptno
Basically we would support
both
query hints(after the
SELECT
keyword)
and table hints(after the
referenced table name), for
1.11,
we
plan to
only
support table hints with a
hint
probably named
PROPERTIES:
table_name /*+
PROPERTIES(k1='v1', k2='v2') *+/
I am looking forward to
your
comments.
You can access the FLIP
here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
Best,
Danny Chan