I actually prefer the builder pattern for making the configs, because I can
do `builder.` in the IDE and actually see all the options... That said,
most developers program against the Spark datasource and so this may not be
useful, unless we expose a builder for that.. I will concede that since its
also subjective anyway.

But, to clarify Siva's question, you do intend to keep the different
component level config classes right - HoodieIndexConfig,
HoodieCompactionConfig?

Once again, can you please explicitly address the following question, so we
can get on the same page?
>> It is not clear to me whether there is any external facing changes which
changes this model.
This is still the most critical question from both me and balaji.

On Wed, Dec 11, 2019 at 11:35 AM lamberken <lamber...@163.com> wrote:

>  hi, @Sivabalan
>
> Yes, thanks very much for help me explain my initial proposal.
>
>
> Answer your question, we can call HoodieWriteConfig as a SystemConfig, we
> need to pass it everywhere.
> Actually, it may just contains a few custom configurations( does not
> include default configurations)
> Because each component has its own ConfigOptions.
>
>
> The old version HoodieWriteConfig includes all keys(custom configurations,
> default configurations), it is a fat.
>
>
> Best,
> lamber-ken
>
>
>
>
>
>
>
>
> At 2019-12-12 03:14:11, "Sivabalan" <n.siv...@gmail.com> wrote:
> >Let me summarize your initial proposal and then will get into details.
> >- Introduce ConfigOptions for ease of handling of default values.
> >- Remove all Hoodie*Config classes and just have HoodieWriteConfig. What
> >this means is that, every other config file will be replaced by
> >ConfigOptions. eg, HoodieIndexConfigOption, HoodieCompactionConfigOption,
> >etc.
> >- Config option will take care of returning defaults for any property,
> even
> >if an entire Config(eg IndexConfig) is not explicitly set.
> >
> >Here are the positives I see.
> >- By way of having component level ConfigOptions, we bucketize the configs
> >and have defaults set(same as before)
> >- User doesn't need to set each component's config(eg IndexConfig)
> >explicitly with HoodieWriteConfig.
> >
> >But have one question:
> >- I see Bucketizing only in write path. How does one get hold of
> >IndexConfigOptions as a consumer?  For eg, If some class is using just
> >IndexConfig alone, how will it consume? From your eg, I see only
> >HoodieWriteConfig. Do we pass in HoodieWriteConfig everywhere then?
> >Wouldn't that contradicts your initial proposal to not have a fat config
> >class? May be can you expand your example below to show how a consumer of
> >IndexConfig look like.
> >
> >Your eg:
> >/**
> > * New version
> > */
> >// set value overrite the default value
> >HoodieWriteConfig config = new HoodieWriteConfig();
> >config.set(HoodieIndexConfigOptions.INDEX_TYPE,
> >HoodieIndex.IndexType.HBASE.name <
> http://hoodieindex.indextype.hbase.name/>
> >())
> >
> >
> >
> >
> >On Wed, Dec 11, 2019 at 8:33 AM lamberken <lamber...@163.com> wrote:
> >
> >>
> >>
> >> Hi,
> >>
> >>
> >>
> >>
> >> On 1,2. Yes, you are right, moving the getter to the component level
> >> Config class itself.
> >>
> >>
> >> On 3, HoodieWriteConfig can also set value through ConfigOption, small
> >> code snippets.
> >> From the bellow snippets, we can see that clients need to know each
> >> component's builders
> >> and also call their "with" methods to override the default value in old
> >> version.
> >>
> >>
> >> But, in new version, clients just need to know each component's public
> >> config options, just like constants.
> >> So, these builders are redundant.
> >>
> >>
> >>
> /---------------------------------------------------------------------------/
> >>
> >>
> >> public class HoodieIndexConfigOptions {
> >>   public static final ConfigOption<String> INDEX_TYPE = ConfigOption
> >>           .key("hoodie.index.type")
> >>           .defaultValue(HoodieIndex.IndexType.BLOOM.name());
> >> }
> >>
> >>
> >> public class HoodieWriteConfig {
> >>   public void setString(ConfigOption<String> option, String value) {
> >>     this.props.put(option.key(), value);
> >>   }
> >> }
> >>
> >>
> >>
> >>
> >> /**
> >>  * New version
> >>  */
> >> // set value overrite the default value
> >> HoodieWriteConfig config = new HoodieWriteConfig();
> >> config.set(HoodieIndexConfigOptions.INDEX_TYPE,
> >> HoodieIndex.IndexType.HBASE.name())
> >>
> >>
> >>
> >>
> >> /**
> >>  * Old version
> >>  */
> >> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
> >>
> >>
> builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
> >>
> >>
> >>
> >>
> /---------------------------------------------------------------------------/
> >>
> >>
> >> Another, users use hudi like bellow, here're all keys.
> >>
> >>
> /---------------------------------------------------------------------------/
> >>
> >>
> >> df.write.format("hudi").
> >>     option("hoodie.insert.shuffle.parallelism", "10").
> >>     option("hoodie.upsert.shuffle.parallelism", "10").
> >>     option("hoodie.delete.shuffle.parallelism", "10").
> >>     option("hoodie.bulkinsert.shuffle.parallelism", "10").
> >>     option("hoodie.datasource.write.recordkey.field", "name").
> >>     option("hoodie.datasource.write.partitionpath.field", "location").
> >>     option("hoodie.datasource.write.precombine.field", "ts").
> >>     option("hoodie.table.name", tableName).
> >>     mode(Overwrite).
> >>     save(basePath);
> >>
> >>
> >>
> >>
> /---------------------------------------------------------------------------/
> >>
> >>
> >>
> >>
> >> Last, as I responsed to @vino, it's reasonable to handle fallbackkeys. I
> >> think we need to do this step by step,
> >> it's easy to integrate FallbackKey in the future, it is not what we need
> >> right now in my opinion.
> >>
> >>
> >> If some places are still not very clear, feel free to feedback.
> >>
> >>
> >>
> >>
> >> Best,
> >> lamber-ken
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> At 2019-12-11 23:41:31, "Vinoth Chandar" <vin...@apache.org> wrote:
> >> >Hi Lamber-ken,
> >> >
> >> >I looked at the sample PR you put up as well.
> >> >
> >> >On 1,2 => Seems your intent is to replace these with moving the getter
> to
> >> >the component level Config class itself? I am fine with that (although
> I
> >> >think its not that big of a hurdle really to use atm). But, once we do
> >> that
> >> >we could pass just the specific component config into parts of code
> versus
> >> >passing in the entire HoodieWriteConfig object. I am fine with moving
> the
> >> >classes to a ConfigOption class as you suggested as well.
> >> >
> >> >On 3, I still we feel we will need the builder pattern going forward.
> to
> >> >build the HoodieWriteConfig object. Like below? Cannot understand why
> we
> >> >would want to change this. Could you please clarify?
> >> >
> >> >HoodieWriteConfig.Builder builder =
> >> >
> >>
> HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
> >> >true)
> >> >
> >>
> .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName)
> >> >            // Inline compaction is disabled for continuous mode.
> >> >otherwise enabled for MOR
> >> >
> >> .withInlineCompaction(cfg.isInlineCompactionEnabled()).build())
> >> >        .forTable(cfg.targetTableName)
> >> >
> >>
> .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
> >> >        .withAutoCommit(false).withProps(props);
> >> >
> >> >
> >> >Typically, we write RFCs for large changes that breaks existing
> behavior
> >> or
> >> >introduces significantly complex new features.. If you are just
> planning
> >> to
> >> >do the refactoring into ConfigOption class, per se you don't need a
> RFC.
> >> >But , if you plan to address the fallback keys (or) your changes are
> going
> >> >to break/change existing jobs, we would need a RFC.
> >> >
> >> >>> It is not clear to me whether there is any external facing changes
> >> which
> >> >changes this model.
> >> >I am still unclear on this as well. can you please explicitly clarify?
> >> >
> >> >thanks
> >> >vinoth
> >> >
> >> >
> >> >On Tue, Dec 10, 2019 at 12:35 PM lamberken <lamber...@163.com> wrote:
> >> >
> >> >>
> >> >> Hi, @Balaji @Vinoth
> >> >>
> >> >>
> >> >> I'm sorry, some places are not very clear,
> >> >>
> >> >>
> >> >> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, etc..
> >> already
> >> >> defined in project.
> >> >>    But we get property value by methods which defined in
> >> >> HoodieWriteConfig, like HoodieWriteConfig#getParquetMaxFileSize,
> >> >>    HoodieWriteConfig#getParquetBlockSize, etc. It's means that
> >> >> Hoodie*Config are redundant.
> >> >>
> >> >>
> >> >> 2, These Hoodie*Config classes are used to set default value when
> call
> >> >> their build method, nothing else.
> >> >>
> >> >>
> >> >> 3, For current plan is keep the Builder pattern when configuring,
> when
> >> we
> >> >> are familiar with the config framework,
> >> >>    We will find that Hoodie*Config class are redundant and methods
> >> >> prefixed with "get" in HoodieWriteConfig are also redundant.
> >> >>
> >> >>
> >> >> In addition, I create a pr[1] for initializing with a demo. At this
> >> demo,
> >> >> I create
> >> >> MetricsGraphiteReporterOptions which contains HOST, PORT, PREFIX, and
> >> >> remove getGraphiteServerHost,
> >> >> getGraphiteServerPort, getGraphiteMetricPrefix in
> HoodieMetricsConfig.
> >> >>
> >> >>
> >> >> https://github.com/apache/incubator-hudi/pull/1094
> >> >>
> >> >>
> >> >> Best,
> >> >> lamber-ken
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2019-12-11 02:35:30, "Balaji Varadarajan"
> <v.bal...@ymail.com.INVALID
> >> >
> >> >> wrote:
> >> >> > Hi Lamber-Ken,
> >> >> >Thanks for the time writing the proposal and thinking about
> improving
> >> >> Hudi usability.
> >> >> >My preference would be to keep the Builder pattern when
> configuring. It
> >> >> is something I find it natural when configuring. It is not clear to
> me
> >> >> whether there is any external facing changes which changes this
> model.
> >> >> Would you mind adding some more details on the RFC. It would save
> time
> >> to
> >> >> read it in one place as opposed to checking out github repo :)
> >> >> >Thanks,Balaji.V
> >> >> >    On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar <
> >> >> vin...@apache.org> wrote:
> >> >> >
> >> >> > Hi ,
> >> >> >
> >> >> >Thanks for the proposal. Some parts I agree, some parts I don't and
> >> some
> >> >> >parts are unclear
> >> >> >
> >> >> >Agree :
> >> >> >- On introducing a class that binds key, default value, provided
> value,
> >> >> and
> >> >> >also may be a doc along with it (?).
> >> >> >- Designing the framework to have fallback keys is good IMO. It
> helps
> >> us
> >> >> do
> >> >> >things like https://issues.apache.org/jira/browse/HUDI-89
> >> >> >
> >> >> >Disagree :
> >> >> >- Not all configuration values are in HoodieWriteConfig, its not
> >> accurate.
> >> >> >Configs are already split by components into HoodieIndexConfig,
> >> >> >HoodieCompactionConfig etc..
> >> >> >- There are helpers for all these conveniently located in
> >> >> >HoodieWriteConfig. I think some of the claims of usability seem
> >> subjective
> >> >> >to me, speaking from hands-on experience writing jobs. So, if you
> >> >> proposing
> >> >> >a large shake up (e.g not have a single properties file load all
> >> >> >components), I would love to understand this at more depth. From my
> >> >> >experience, well namespaced configs in a single properties file
> keeps
> >> it
> >> >> >simple and understandable.
> >> >> >
> >> >> >Unclear :
> >> >> >- What is impact on existing jobs - using  RDD/WriteClient API,
> >> >> DataSource,
> >> >> >DeltaStreamer level? Do you intend to change namespacing of configs?
> >> >> >
> >> >> >
> >> >> >Thanks
> >> >> >Vinoth
> >> >> >
> >> >> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <lamber...@163.com>
> wrote:
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> Hi, vino
> >> >> >>
> >> >> >>
> >> >> >> Reasonable,  we can refactor this step by step. The first step now
> >> is to
> >> >> >> introduce the config framework.
> >> >> >> When our community is familiar with the config framework
> mechanism,
> >> it's
> >> >> >> easy to integrate FallbackKey in the future.
> >> >> >>
> >> >> >>
> >> >> >> Best,
> >> >> >> lamber-ken
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> At 2019-12-10 11:51:22, "vino yang" <yanghua1...@gmail.com>
> wrote:
> >> >> >> >Hi Lamber,
> >> >> >> >
> >> >> >> >Thanks for the proposal. +1 from my side.
> >> >> >> >
> >> >> >> >When it comes to configuration, it will involve how we handle
> >> >> deprecated
> >> >> >> >configuration items in the future. In my opinion, we need to take
> >> this
> >> >> >> into
> >> >> >> >consideration when designing. There are already some successful
> >> >> practices
> >> >> >> >for our reference. For example, Flink defines some deprecated
> >> >> >> >configurations as FallbackKey[1]. Maybe we can learn from these
> >> >> designs.
> >> >> >> >
> >> >> >> >WDYT?
> >> >> >> >
> >> >> >> >[1]:
> >> >> >> >
> >> >> >>
> >> >>
> >>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java
> >> >> >> >
> >> >> >> >Best,
> >> >> >> >Vino
> >> >> >> >
> >> >> >> >lamberken <lamber...@163.com> 于2019年12月9日周一 下午11:19写道:
> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Hi, all
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Currently, many configuration items and their default values
> are
> >> >> >> dispersed
> >> >> >> >> in the config file like HoodieWriteConfig. It’s very confused
> for
> >> >> >> >> developers, and it's easy for developers to use them in a wrong
> >> place
> >> >> >> >> especially when there are more and more configuration items.
> If we
> >> >> can
> >> >> >> >> solve this, developers will benefit from it and the code
> structure
> >> >> will
> >> >> >> be
> >> >> >> >> more concise.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I had create a JIRA[1] and a under discuss RFC[2] to explain
> how
> >> to
> >> >> >> solve
> >> >> >> >> the problem, if you are interested in this, you can visit jira
> and
> >> >> RFC
> >> >> >> for
> >> >> >> >> detail. Any comments and feedback are welcome, WDYT?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Best,
> >> >> >> >> lamber-ken
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> [1]
> https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375
> >> >> >> >> [2]
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project
> >> >> >>
> >> >>
> >>
> >
> >
> >--
> >Regards,
> >-Sivabalan
>

Reply via email to