Hi all,

thanks for all the feedback we have received so far. Also from the offline feedback that I have received so far, it seems this is a very sensitive topic. I think we have reached consensus that we want to improve the configuration experience in Flink. We just need to figure out the "how". Maybe FLIP-54 contains too many features at the same time which made the discussion around this topic quite difficult to follow.

In order to unblock the efforts, I would like to split FLIP-54 into 3 independent FLIPs:

- One FLIP for adding a new way of declaring, reading and writing config options with a basic set of data types. - One FLIP for adding the concept of a "tuple/struct/object" (terminology is up for discussion) data type. - One FLIP that discusses validation for config options and the concept of config option sets.

This allows us to have progress and unblock features that depend on parts of FLIP-54.

What do you think?

Thanks,
Timo


On 30.09.19 12:03, Kostas Kloudas wrote:
Hi all,

I would also like to give a +1 for supporting lists as config options
with the delimeter being a parameter (if we cannot find a consensus).
To some extent the current codebase has already solved the issue by
already having lists as options, but the problem is that so far there
was no principled way of doing it so everyone has his/her own way of
encoding such lists. For example, we have:

CoreOptions.TMP_DIRS: separated by ",", "|", or the system's path separator
Whatever uses the NetUtils.getPortRangeFromString(), such as
     QueryableStateOptions.PROXY_PORT_RANGE and
     QueryableStateOptions.SERVER_PORT_RANGE : it can be a port:
"9123", a range of ports: "50100-50200", or a list of ranges and
ports: "50100-50200,50300-50400,51234"
dynamicproperties in the FlinkYarnSessionCLI: which uses @@ as
delimeter. This is not strictly a config option, but it would be nice
to aim for unified user experience in CLI and config files.

Cheers,
Kostas

On Thu, Sep 5, 2019 at 8:42 AM Becket Qin <becket....@gmail.com> wrote:
Hi Dawid,

Thanks a lot for the clarification. Got it now. A few more thoughts:

1. Naming.
I agree that if the name of "Configurable" is a little misleading if we
just want to use it to save POJOs. It would probably help to just name it
something like "ConfigPojo".

2. Flat config map v.s. structured config map.
 From user's perspective, I personally find a flat config map is usually
easier to understand than a structured config format. But it is just my
personal opinion and up for debate.

Taking the Host and CachedFile as examples, personally I think the
following format is more concise and user friendly:

Host: 192.168.0.1:1234 (single config)
Hosts: 192.168.0.1:1234, 192.168.0.2:5678 (list of configs)

CachedFile: path:file:flag (single config)
CachedFile: path1:file1:flag1, path2:file2:flag2 (list config)

Maybe for complicate POJOs the full K-V pair would be necessary, but it
looks we are trying to avoid such complicated POJOs to begin with. Even if
a full K-V is needed, a List<Map<String, String>> format would also be
almost equivalent to the current design.

3. The necessity of the POJO class in Configuration / ConfigOption system.
I can see the convenience of have a POJO (or ConfigObject) type supported
in the Configuration / ConfigOption. However, one thing to notice is that
API wise, the ConfigurableFactory can return arbitrary type of class
instead of just POJO. This can easily be misused or abused in cases such as
plugins. And the current API design will force such plugins to implement
methods like toConfiguration() which is a little awkward.

Given that 1) there will not be many such Pojo classes and 2) these POJO
classes are defined by Flink, I am thinking that one alternative approach
is to just have the constructors to take the configuration String (or list
of string) and parse that. This will avoid a few complications in this FLIP.
   a) No need to have the ConfigurableFactory class
   b) No need to have the toConfiguration() implementation. So there is just
one way to set values in the Configuration instance.
   c) The Configuration / ConfigOption does not have to also deal with the
Object creation. Instead, they will simply focus on configuration itself.

Thanks for the patient discussion. I don't want to block this FLIP further,
so I am fine to go with the current design with changing the name of
Configurable to something like ConfigPojo in order to avoid misuse as much
as possible.

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 4, 2019 at 5:50 PM Dawid Wysakowicz <dwysakow...@apache.org>
wrote:

Hi Becket,

You are right, that what we had in mind for
ExecutionConfig/CheckpointConfig etc. is the option b) from your email.
In the context of the FLIP-54, those objects are not Configurable. What
we understood as a Configurable by the FLIP-54 are a simple pojos, that
are stored under a single key. Such as the examples either from the ML
thread (Host) or from the design doc (CacheFile). So when configuring
the host user can provide a host like this:

connector.host: address:localhost, port:1234

rather than

connector.host.address: localhost

connector.host.port: 1234

This is important especially if one wants to configure lists of such
objects:

connector.hosts: address:localhost,port:1234;address:localhost,port:4567

The intention was definitely not to store whole complex objects, such as
ExecutionConfig, CheckpointConfig etc. that contain multiple different
options Maybe it makes sense to call it ConfigObject as Aljosha
suggested? What do you think? Would that make it more understandable?

For the initialization/configuration of objects such as ExecutionConfig,
CheckpointConfig you may have a look at FLIP-59[1] where we suggest to
add a configure method to those classes and we pretty much describe the
process you outline in the last message.

Best,

Dawid

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object

On 04/09/2019 03:37, Becket Qin wrote:
Hi Timo, Dawid and Aljoscha,

Thanks for clarifying the goals. It is very helpful to understand the
motivation here. It would be great to add them to the FLIP wiki.

I agree that the current FLIP design achieves the two goals it wants to
achieve. But I am trying to see is if the current approach is the most
reasonable approach.

Please let me check if I understand this correctly. From end users'
perspective, they will do the following when they want to configure their
Flink Jobs.
1. Create a Configuration instance, and call setters of Configuration
with
the ConfigOptions defined in different components.
2. The Configuration created in step 1 will be passed around, and each
component will just exact their own options from it.
3. ExecutionConfig, CheckpointConfig (and other Config classes) will
become
a Configurable, which is responsible for extracting the configuration
values from the Configuration set by users in step 1.

The confusion I had was that in step 1, how users are going to set the
configs for the ExecutionConfig / CheckpointConfig? There may be two
ways:
a) Users will call setConfigurable(ExectionConfigConfigurableOption,
"config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is
exposed as a Configurable to the users.
b) Users will call setInteger(MAX_PARALLELISM, 1),
setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will
set individual ConfigOptions for the ExecutionConfig. And they do not see
ExecutionConfig as a Configurable.

I assume we are following b), then do we need to expose Configurable to
the
users in this FLIP? My concern is that the Configurable may be related to
other mechanism such as plugin which we have not really thought through
in
this FLIP.

I know Becket at least has some thoughts about immutability and loading
objects via the configuration but maybe they could be put into a
follow-up
FLIP if they are needed.
I am perfectly fine to leave something out of the scope of this FLIP to
later FLIPs. But I think it is important to avoid introducing something
in
this FLIP that will be shortly changed by the follow-up FLIPs.

Thanks,

Jiangjie (Becket) Qin

On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <aljos...@apache.org>
wrote:
Hi,

I think it’s important to keep in mind the original goals of this FLIP
and
not let the scope grow indefinitely. As I recall it, the goals are:

  - Extend the ConfigOption system enough to allow the Table API to
configure options that are right now only available on
CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment.
We
also want to do this without manually having to “forward” all the
available
configuration options by introducing equivalent setters in the Table API

  - Do the above while keeping in mind that eventually we want to allow
users to configure everything from either the flink-conf.yaml, vie
command
line parameters, or via a Configuration.

I think the FLIP achieves this, with the added side goals of making
validation a part of ConfigOptions, making them type safe, and making
the
validation constraints documentable (via automatic doc generation.) All
this without breaking backwards compatibility, if I’m not mistaken.

I think we should first agree what the basic goals are so that we can
quickly converge to consensus on this FLIP because it blocks other
people/work. Among other things FLIP-59 depends on this. What are other
opinions that people have? I know Becket at least has some thoughts
about
immutability and loading objects via the configuration but maybe they
could
be put into a follow-up FLIP if they are needed.

Also, I had one thought on the interaction of this FLIP-54 and FLIP-59
when it comes to naming. I think eventually it makes sense to have a
common
interface for things that are configurable from a Configuration (FLIP-59
introduces the first batch of this). It seems natural to call this
interface Configurable. That’s a problem for this FLIP-54 because we
also
introduce a Configurable. Maybe the thing that we introduce here should
be
called ConfigObject or ConfigStruct to highlight that it has a more
narrow
focus and is really only a POJO for holding a bunch of config options
that
have to go together. What do you think?

Best,
Aljoscha

On 3. Sep 2019, at 14:08, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

yes, this FLIP covers all the building blocks we need also for
unification of the DDL properties.
Regards,
Timo


On 03.09.19 13:45, Danny Chan wrote:
with the new SQL DDL
based on properties as well as more connectors and formats coming up,
unified configuration becomes more important

I Cann’t agree more, do you think we can unify the config options key
format here for all the DDL properties ?
Best,
Danny Chan
在 2019年8月16日 +0800 PM10:12,dev@flink.apache.org,写道:
with the new SQL DDL
based on properties as well as more connectors and formats coming up,
unified configuration becomes more important


Reply via email to