Hi, everyone ~

@Aljoscha @Timo

> I think we're designing ourselves into ever more complicated corners
here

I kindly agree that, personally didn't see strong reasons why we should limit 
on each connector properties:

• we can define any table options for CREATE TABLE, why we treat the dynamic 
options differently, we never consider any security problems when create table, 
we should not either for dynamic table options
• If we do not have whitelist properties or blacklist properties, the table 
source creation work would be much easier, just used the merged options. There 
is no need to modify each connector to decide which options could be overridden 
and how we merge them(the merge work is redundant).
• @Timo, how about we support another interface 
`TableSourceFactory#Context.getExecutionOptions`, we always use this interface 
to get the options to create our table source. There is no need to copy the 
catalog table itselt, we just need to generate our Context correctly.
• @Aljoscha I agree to have a global config option, but I disagree to default 
disable it, a global default config would break the user experience too much, 
especially when user want to modify the options in a ad-hoc way.



I suggest to remove `TableSourceFactory#supportedHintOptions` or 
`TableSourceFactory#forbiddenHintOptions` based on the fact that we does not 
have black/white list for CREATE TABLE at all at lease for current codebase.


@Timo (i have replied offline but allows to represent it here again)

The `TableSourceFactory#supportedHintOptions` doesn't work well for 3 reasons 
compared to `TableSourceFactory#forbiddenHintOptions`:
1. For key with wildcard, like connector.property.* , use a blacklist make us 
have the ability to disable some of the keys under that, i.e. 
connector.property.key1 , a whitelist can only match with prefix

2. We want the connectors to have the ability to disable format type switch 
format.type but allows all the other properties, e.g. format.* without 
format.type(let's call it SET_B), if we use the whitelist, we have to enumerate 
all the specific format keys start with format (SET_B), but with the old 
connector factories, we have no idea what specific format keys it 
supports(there is either a format.* or nothing).

3. Except the cases for 1 and 2, for normal keys(no wildcard), the blacklist 
and whitelist has the same expressiveness, use blacklist makes the code not too 
verbose to enumerate all the duplicate keys with #supportedKeys .(Not very 
strong reason, but i think as a connector developer, it makes sense)

Best,
Danny Chan
在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道:
> Hi everyone,
>
> @Aljoscha: I disagree with your approach because a `Catalog` can return
> a custom factory that is not using any properties. The hinting must be
> transparent to a factory. We should NOT modify the metadata
> `CatalogTable` at any point in time after the catalog.
>
> @Danny, @Jingsong: How about we stick to the original design that we
> wanted to vote on but use:
>
> Set<String> supportedHintProperties()
>
> This fits better to the old factory design. And for the new FLIP-95
> factories we will use `ConfigOption` and provide good utilities for
> merging with hints etc.
>
> We can allow `"format.*"` in `supportedHintProperties()` to allow
> hinting in formats.
>
> What do you think?
>
> Regards,
> Timo
>
>
> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > I think we're designing ourselves into ever more complicated corners
> > here. Maybe we need to take a step back and reconsider. What would you
> > think about this (somewhat) simpler proposal:
> >
> > - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > CONNECTOR_PROPERTIES, depending on what naming we want to have for this
> > in the future. This will simply overwrite all connector properties, the
> > table factories don't know about hints but simply work with the
> > properties that they are given
> >
> > - this special hint is disabled by default and can be activated with a
> > global option "foo.bazzle.connector-hints" (or something like this)
> > which has a warning that describes that this can change query semantics
> > etc.
> >
> > That's it. This makes connector implementations a lot easier while still
> > allowing inline configuration.
> >
> > I still don't like using hint syntax at all for this, because I strongly
> > maintain that hints should not change query syntax. In general using
> > hints should be kept to a minimum because they usually point to
> > shortcomings in the system.
> >
> > Best,
> > Aljoscha
> >
> > On 02.04.20 06:06, Jingsong Li wrote:
> > > Hi Dawid,
> > >
> > > > When a factory is instantiated it has access to the CatalogTable,
> > > therefore it has access to all the original properties. In turn it knows
> > > the original format and can call FormatFactory#supportedHintOptions().
> > >
> > > Factory can only get CatalogTable when creating source or sink,
> > > right? IIUC, TableFactory may be stateless too.
> > > When invoking SourceFactory#supportedHintOptions(), it can not
> > > get CatalogTable, so it is impossible to create FormatFactory?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao....@gmail.com> wrote:
> > >
> > > > Hi, Dawid, thanks so much for the detail suggestions ~
> > > >
> > > > 1. Regarding the motivation:
> > > >
> > > > I agree it's not a good suggested way based on the fact that we have
> > > > better solution, but i think we can support override that as long as it
> > > > exists as one of the the table options. I would remove if from the
> > > > motication part.
> > > >
> > > > 2. The options passes around during sql-to-rel conversion, right
> > > > after we
> > > > generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
> > > > indeed a push way method at least in the RelOptTable layer, then we hand
> > > > over the options to TableSourceFactory with our own context, which is
> > > > fine
> > > > becuase TableSourceFactory#Context is the contact to pass around these
> > > > table-about variables.
> > > >
> > > > 3. "We should not end up with an extreme example where we can change the
> > > > connector type", i totally agree that, and i have listed the
> > > > "connector.type" as forbidden attribute in the WIKI. As for the
> > > > format, i
> > > > think the connector itself can/should control whether to override the
> > > > "format.type", that is one of the reason i change the
> > > > TableSourceFactory#supportedHintOpitons to
> > > > TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> > > > limit the
> > > > format keys we want conveniently.
> > > >
> > > > 4. SQL Hints syntax.
> > > >
> > > > > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> > > > if it is equivalent to string_literal).
> > > >
> > > > I disagree, we at least should keep sync with our DDL: use the string
> > > > literal as the key. We did also support the simple identifier because
> > > > this
> > > > is the common hint syntax from Calcite, it does not hurt anything for
> > > > the
> > > > OPTIONS hint, the unsupported keys would validate fails.(If you think
> > > > that
> > > > may cause some confuse, i can make the syntax pluggable for each hint in
> > > > CALCITE 1.23)
> > > >
> > > > We only supports OPTIONS hint in the FLIP, and i have changed the
> > > > title to
> > > > "Supports dynamic table options", would make it more clear in the WIKI.
> > > >
> > > > 5. Yes, we also have this concerns from our offline discussion, that is
> > > > one of the reason, why i change the
> > > > TableSourceFactory#supportedHintOpitons
> > > > to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
> > > > supportedHintOptionKeys() with wildcard for 2 reasons:
> > > >
> > > >    - The wildcard is still not descriptive, we can still not
> > > > forbidden one
> > > > of the properties among the wildcard properties, we can not enable or
> > > > disable them totally
> > > >    - ConfigOption is our new structure for keys, and it does not support
> > > > wildcard yet.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > > > <dwysakow...@apache.org>,写道:
> > > > > Hi,
> > > > > Few comments from my side:
> > > > > 1. Regarding the motivation:
> > > > > I think the example with changing the update-mode is not a good one. 
> > > > > In
> > > > the long term this should be done with EMIT CHANGELOG (discussed in
> > > > FLIP-105).
> > > > > Nitpicking: I would mention that it is rather for debugging/ad-hoc
> > > > solution. I think this should not be a recommended way for production
> > > > use
> > > > cases as it bypasses the Catalog, which should be the source of truth.
> > > > > 2. I could not understand how the additional options will be passed to
> > > > the TableSourceFactory. Could you elaborate a bit more on that? I see
> > > > there
> > > > is a Context interface that gives the options. But cannot find a way
> > > > to get
> > > > the context itself in the factory. Moreover I think it would make more
> > > > sense to have rather a push based approach here. Something like
> > > > applyOptions(ReadableConfig) method.
> > > > > 3. As for the concerns Jingsong raised in the voting thread. I think 
> > > > > it
> > > > is not a big problem, but I agree this should be also described. I
> > > > disagree
> > > > with "Connector don't know format information in TableFactory before
> > > > obtains real properties, so it can not list any format
> > > > `supportedHintOptions`".
> > > > > When a factory is instantiated it has access to the CatalogTable,
> > > > therefore it has access to all the original properties. In turn it knows
> > > > the original format and can call FormatFactory#supportedHintOptions().
> > > > > The only case when this would not work would be if we allow changing
> > > > > the
> > > > format of the Table (e.g. from avro to parquet), which does not sound
> > > > like
> > > > a good idea to me. I think this feature should not end up as a way to
> > > > declare a whole table inline in a SQL query, but should rather be a
> > > > simple
> > > > way for debugging queries. We should not end up with an extreme example
> > > > where we do:
> > > > > SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> > > > 'format.type' = 'json', ....) */
> > > > > 4. SQL Hints syntax.
> > > > > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> > > > if it is equivalent to string_literal). I think we should not use
> > > > simple_identifier because this implies that we cannot use e.g. any SQL
> > > > keywords. Anyway it has nothing to do with identifiers. If I am not
> > > > mistaken it is also how the options in the CREATE statement are
> > > > implemented.
> > > > > What is the purpose of the remaining hint_item: hint_name(hint_opt [
> > > > ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> > > > feeling it
> > > > does also suggests to support the whole Apache Calcite hint system
> > > > without
> > > > specifying that explicitly. Is the intention of the FLIP to support
> > > > choosing e.g. JOIN strategies through hints already? If it is so it
> > > > should
> > > > be mentioned in the FLIP, imo.
> > > > > 5. I think something does not work around the supportedHintOptions and
> > > > wildcards. How do you want to represent a wildcard key as a
> > > > ConfigOption? I
> > > > am not sure about that, just a though, maybe it make sense to have
> > > > rather
> > > > Set<String> supportedHintOptionKeys()?
> > > > > Best,
> > > > > Dawid
> > > >
> > >
> > >
>

Reply via email to