+1,
We could refer the Flink configuration to add some other properties for 
ConfigOption like description, fallbackKeys (collect the deprecated keys)
 
Thanks,
Harvey Yue


On 01/10/2020 06:54,lamberken<lamber...@163.com> wrote:


hi @Vinoth Chandar,


Got it, thanks.


best,
lamber-ken







At 2020-01-09 23:52:52, "Vinoth Chandar" <vin...@apache.org> wrote:
Hi lamber-ken,

A ConfigOption class would be good indeed. +1 on starting incrementally
with DataSource first and then iterating..

Thanks
Vinoth

On Tue, Jan 7, 2020 at 6:58 PM lamberken <lamber...@163.com> wrote:



Hi @Vinoth,


It's time to pick up this topic. Based on the content we talked about,
here are my thoughts


1, Initial proposal aims to rework configuration framework
includes(DataSource and WriteClient level),
for compatibility, we can introduce a ConfigOption class and rework it on
DataSource level.


2, It's very right that the scoped down version does not need a RFC[1], so
change state from 'Under Discussion' to 'Close' ?


[1]
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+11+%3A+Refactor+of+the+configuration+framework+of+hudi+project


Best,
Lamber-Ken









At 2019-12-19 11:05:16, "Vinoth Chandar" <vin...@apache.org> wrote:
Sounds good.. This scoped down version per se, does not need a RFC.

On Wed, Dec 18, 2019 at 3:09 PM lamberken <lamber...@163.com> wrote:



Hi @Vinoth


I understand what you mean, I will continue to work on this when I
finish
reworking the new UI. :)


best,
lamber-ken




At 2019-12-18 11:39:30, "Vinoth Chandar" <vin...@apache.org> wrote:
Expect most users to use inputDF.write() approach...  Uber uses the
lower
level RDD apis, like the DeltaStreamer tool does..
If we don't rename configs and still support a builder, it should be
fine.

I think we can scope this down to introducing a ConfigOption class that
ties, the key,value, default together.. That definitely seems like a
better
abstraction.

On Fri, Dec 13, 2019 at 5:18 PM lamberken <lamber...@163.com> wrote:



Hi, @vinoth


Okay, I see. If we don't want existing users to do any upgrading or
reconfigurations, then this refactor work will not make much sense.
This issue can be closed, because ConfigOptions and these builders do
the
same things.
From another side, if we finish this work before a stable release, we
will
benefit a lot from it. We need to make a choice.


btw, I have a question that users will use HoodieWriteConfig /
HoodieWriteClient in their program?



/----------------------------------------------------------------------------------------
HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
.withPath(basePath)
.forTable(tableName)
.withSchema(schemaStr)
.withProps(props) // pass raw k,v pairs from a property file.



.withCompactionConfig(HoodieCompactionConfig.newBuilder().withXXX(...).build())

.withIndexConfig(HoodieIndexConfig.newBuilder().withXXX(...).build())
...
.build();



/----------------------------------------------------------------------------------------
OR



/----------------------------------------------------------------------------------------
inputDF.write()
.format("org.apache.hudi")
.options(clientOpts) // any of the Hudi client opts can be
passed in
as well
.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),
"_row_key")
.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(),
"partition")
.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(),
"timestamp")
.option(HoodieWriteConfig.TABLE_NAME, tableName)
.mode(SaveMode.Append)
.save(basePath);



/----------------------------------------------------------------------------------------




best,
lamber-ken

At 2019-12-14 08:43:06, "Vinoth Chandar" <vin...@apache.org> wrote:
Hi,

Are you saying these classes needs to change? If so, understood. But
are
you planning on renaming configs or relocating them? We dont want
existing
users to do any upgrading or reconfigurations

On Fri, Dec 13, 2019 at 10:28 AM lamberken <lamber...@163.com>
wrote:



Hi,


They need to change due to this, because only HoodieWriteConfig
and
*Options will be kept.


best,
lamber-ken


At 2019-12-14 01:23:35, "Vinoth Chandar" <vin...@apache.org>
wrote:
Hi,

We are trying to understand if existing jobs (datasource,
deltastreamer,
anything else) needs to change due to this.

On Wed, Dec 11, 2019 at 7:18 PM lamberken <lamber...@163.com>
wrote:



Hi, @vinoth


1, Hoodie*Config classes are only used to set default value
when
call
their build method currently.
They will be replaced by HoodieMemoryOptions,
HoodieIndexOptions,
HoodieHBaseIndexOptions, etc...
2, I don't understand the question "It is not clear to me
whether
there
is
any external facing changes which changes this model.".


Best,
lamber-ken


At 2019-12-12 11:01:36, "Vinoth Chandar" <vin...@apache.org>
wrote:
I actually prefer the builder pattern for making the configs,
because I
can
do `builder.` in the IDE and actually see all the options...
That
said,
most developers program against the Spark datasource and so
this
may
not
be
useful, unless we expose a builder for that.. I will concede
that
since
its
also subjective anyway.

But, to clarify Siva's question, you do intend to keep the
different
component level config classes right - HoodieIndexConfig,
HoodieCompactionConfig?

Once again, can you please explicitly address the following
question,
so
we
can get on the same page?
It is not clear to me whether there is any external facing
changes
which
changes this model.
This is still the most critical question from both me and
balaji.

On Wed, Dec 11, 2019 at 11:35 AM lamberken <lamber...@163.com

wrote:

hi, @Sivabalan

Yes, thanks very much for help me explain my initial
proposal.


Answer your question, we can call HoodieWriteConfig as a
SystemConfig,
we
need to pass it everywhere.
Actually, it may just contains a few custom configurations(
does
not
include default configurations)
Because each component has its own ConfigOptions.


The old version HoodieWriteConfig includes all keys(custom
configurations,
default configurations), it is a fat.


Best,
lamber-ken








At 2019-12-12 03:14:11, "Sivabalan" <n.siv...@gmail.com>
wrote:
Let me summarize your initial proposal and then will get
into
details.
- Introduce ConfigOptions for ease of handling of default
values.
- Remove all Hoodie*Config classes and just have
HoodieWriteConfig.
What
this means is that, every other config file will be
replaced
by
ConfigOptions. eg, HoodieIndexConfigOption,
HoodieCompactionConfigOption,
etc.
- Config option will take care of returning defaults for
any
property,
even
if an entire Config(eg IndexConfig) is not explicitly set.

Here are the positives I see.
- By way of having component level ConfigOptions, we
bucketize
the
configs
and have defaults set(same as before)
- User doesn't need to set each component's config(eg
IndexConfig)
explicitly with HoodieWriteConfig.

But have one question:
- I see Bucketizing only in write path. How does one get
hold
of
IndexConfigOptions as a consumer?  For eg, If some class is
using
just
IndexConfig alone, how will it consume? From your eg, I see
only
HoodieWriteConfig. Do we pass in HoodieWriteConfig
everywhere
then?
Wouldn't that contradicts your initial proposal to not
have a
fat
config
class? May be can you expand your example below to show
how a
consumer
of
IndexConfig look like.

Your eg:
/**
* New version
*/
// set value overrite the default value
HoodieWriteConfig config = new HoodieWriteConfig();
config.set(HoodieIndexConfigOptions.INDEX_TYPE,
HoodieIndex.IndexType.HBASE.name <
http://hoodieindex.indextype.hbase.name/>
())




On Wed, Dec 11, 2019 at 8:33 AM lamberken <
lamber...@163.com>
wrote:



Hi,




On 1,2. Yes, you are right, moving the getter to the
component
level
Config class itself.


On 3, HoodieWriteConfig can also set value through
ConfigOption,
small
code snippets.
From the bellow snippets, we can see that clients need to
know
each
component's builders
and also call their "with" methods to override the
default
value
in
old
version.


But, in new version, clients just need to know each
component's
public
config options, just like constants.
So, these builders are redundant.








/---------------------------------------------------------------------------/


public class HoodieIndexConfigOptions {
public static final ConfigOption<String> INDEX_TYPE =
ConfigOption
.key("hoodie.index.type")
.defaultValue(HoodieIndex.IndexType.BLOOM.name
());
}


public class HoodieWriteConfig {
public void setString(ConfigOption<String> option,
String
value) {
this.props.put(option.key(), value);
}
}




/**
* New version
*/
// set value overrite the default value
HoodieWriteConfig config = new HoodieWriteConfig();
config.set(HoodieIndexConfigOptions.INDEX_TYPE,
HoodieIndex.IndexType.HBASE.name())




/**
* Old version
*/
HoodieWriteConfig.Builder builder =
HoodieWriteConfig.newBuilder()







builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())









/---------------------------------------------------------------------------/


Another, users use hudi like bellow, here're all keys.







/---------------------------------------------------------------------------/


df.write.format("hudi").
option("hoodie.insert.shuffle.parallelism", "10").
option("hoodie.upsert.shuffle.parallelism", "10").
option("hoodie.delete.shuffle.parallelism", "10").
option("hoodie.bulkinsert.shuffle.parallelism",
"10").
option("hoodie.datasource.write.recordkey.field",
"name").
option("hoodie.datasource.write.partitionpath.field",
"location").
option("hoodie.datasource.write.precombine.field",
"ts").
option("hoodie.table.name", tableName).
mode(Overwrite).
save(basePath);









/---------------------------------------------------------------------------/




Last, as I responsed to @vino, it's reasonable to handle
fallbackkeys. I
think we need to do this step by step,
it's easy to integrate FallbackKey in the future, it is
not
what
we
need
right now in my opinion.


If some places are still not very clear, feel free to
feedback.




Best,
lamber-ken












At 2019-12-11 23:41:31, "Vinoth Chandar" <
vin...@apache.org

wrote:
Hi Lamber-ken,

I looked at the sample PR you put up as well.

On 1,2 => Seems your intent is to replace these with
moving
the
getter
to
the component level Config class itself? I am fine with
that
(although
I
think its not that big of a hurdle really to use atm).
But,
once
we
do
that
we could pass just the specific component config into
parts of
code
versus
passing in the entire HoodieWriteConfig object. I am
fine
with
moving
the
classes to a ConfigOption class as you suggested as
well.

On 3, I still we feel we will need the builder pattern
going
forward.
to
build the HoodieWriteConfig object. Like below? Cannot
understand
why
we
would want to change this. Could you please clarify?

HoodieWriteConfig.Builder builder =







HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
true)







.withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName)
// Inline compaction is disabled for
continuous
mode.
otherwise enabled for MOR


.withInlineCompaction(cfg.isInlineCompactionEnabled()).build())
.forTable(cfg.targetTableName)







.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
.withAutoCommit(false).withProps(props);


Typically, we write RFCs for large changes that breaks
existing
behavior
or
introduces significantly complex new features.. If you
are
just
planning
to
do the refactoring into ConfigOption class, per se you
don't
need a
RFC.
But , if you plan to address the fallback keys (or) your
changes
are
going
to break/change existing jobs, we would need a RFC.

It is not clear to me whether there is any external
facing
changes
which
changes this model.
I am still unclear on this as well. can you please
explicitly
clarify?

thanks
vinoth


On Tue, Dec 10, 2019 at 12:35 PM lamberken <
lamber...@163.com

wrote:


Hi, @Balaji @Vinoth


I'm sorry, some places are not very clear,


1, We can see that HoodieMetricsConfig,
HoodieStorageConfig,
etc..
already
defined in project.
But we get property value by methods which defined
in
HoodieWriteConfig, like
HoodieWriteConfig#getParquetMaxFileSize,
HoodieWriteConfig#getParquetBlockSize, etc. It's
means
that
Hoodie*Config are redundant.


2, These Hoodie*Config classes are used to set default
value
when
call
their build method, nothing else.


3, For current plan is keep the Builder pattern when
configuring,
when
we
are familiar with the config framework,
We will find that Hoodie*Config class are redundant
and
methods
prefixed with "get" in HoodieWriteConfig are also
redundant.


In addition, I create a pr[1] for initializing with a
demo.
At
this
demo,
I create
MetricsGraphiteReporterOptions which contains HOST,
PORT,
PREFIX,
and
remove getGraphiteServerHost,
getGraphiteServerPort, getGraphiteMetricPrefix in
HoodieMetricsConfig.


https://github.com/apache/incubator-hudi/pull/1094


Best,
lamber-ken







At 2019-12-11 02:35:30, "Balaji Varadarajan"
<v.bal...@ymail.com.INVALID

wrote:
Hi Lamber-Ken,
Thanks for the time writing the proposal and thinking
about
improving
Hudi usability.
My preference would be to keep the Builder pattern
when
configuring. It
is something I find it natural when configuring. It is
not
clear
to
me
whether there is any external facing changes which
changes
this
model.
Would you mind adding some more details on the RFC. It
would
save
time
to
read it in one place as opposed to checking out github
repo
:)
Thanks,Balaji.V
On Tuesday, December 10, 2019, 07:55:01 AM PST,
Vinoth
Chandar <
vin...@apache.org> wrote:

Hi ,

Thanks for the proposal. Some parts I agree, some
parts
I
don't
and
some
parts are unclear

Agree :
- On introducing a class that binds key, default
value,
provided
value,
and
also may be a doc along with it (?).
- Designing the framework to have fallback keys is
good
IMO.
It
helps
us
do
things like
https://issues.apache.org/jira/browse/HUDI-89

Disagree :
- Not all configuration values are in
HoodieWriteConfig,
its
not
accurate.
Configs are already split by components into
HoodieIndexConfig,
HoodieCompactionConfig etc..
- There are helpers for all these conveniently
located
in
HoodieWriteConfig. I think some of the claims of
usability
seem
subjective
to me, speaking from hands-on experience writing
jobs.
So,
if
you
proposing
a large shake up (e.g not have a single properties
file
load
all
components), I would love to understand this at more
depth.
From
my
experience, well namespaced configs in a single
properties
file
keeps
it
simple and understandable.

Unclear :
- What is impact on existing jobs - using
RDD/WriteClient
API,
DataSource,
DeltaStreamer level? Do you intend to change
namespacing of
configs?


Thanks
Vinoth

On Tue, Dec 10, 2019 at 6:44 AM lamberken <
lamber...@163.com>
wrote:



Hi, vino


Reasonable,  we can refactor this step by step. The
first
step
now
is to
introduce the config framework.
When our community is familiar with the config
framework
mechanism,
it's
easy to integrate FallbackKey in the future.


Best,
lamber-ken



At 2019-12-10 11:51:22, "vino yang" <
yanghua1...@gmail.com>
wrote:
Hi Lamber,

Thanks for the proposal. +1 from my side.

When it comes to configuration, it will involve
how
we
handle
deprecated
configuration items in the future. In my opinion,
we
need
to
take
this
into
consideration when designing. There are already
some
successful
practices
for our reference. For example, Flink defines some
deprecated
configurations as FallbackKey[1]. Maybe we can
learn
from
these
designs.

WDYT?

[1]:









https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java

Best,
Vino

lamberken <lamber...@163.com> 于2019年12月9日周一
下午11:19写道:



Hi, all


Currently, many configuration items and their
default
values
are
dispersed
in the config file like HoodieWriteConfig. It’s
very
confused
for
developers, and it's easy for developers to use
them
in a
wrong
place
especially when there are more and more
configuration
items.
If we
can
solve this, developers will benefit from it and
the
code
structure
will
be
more concise.


I had create a JIRA[1] and a under discuss
RFC[2]
to
explain
how
to
solve
the problem, if you are interested in this, you
can
visit
jira
and
RFC
for
detail. Any comments and feedback are welcome,
WDYT?


Best,
lamber-ken


[1]

https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375
[2]









https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project





--
Regards,
-Sivabalan






Reply via email to