hi, @Sivabalan Yes, thanks very much for help me explain my initial proposal.
Answer your question, we can call HoodieWriteConfig as a SystemConfig, we need to pass it everywhere. Actually, it may just contains a few custom configurations( does not include default configurations) Because each component has its own ConfigOptions. The old version HoodieWriteConfig includes all keys(custom configurations, default configurations), it is a fat. Best, lamber-ken At 2019-12-12 03:14:11, "Sivabalan" <[email protected]> wrote: >Let me summarize your initial proposal and then will get into details. >- Introduce ConfigOptions for ease of handling of default values. >- Remove all Hoodie*Config classes and just have HoodieWriteConfig. What >this means is that, every other config file will be replaced by >ConfigOptions. eg, HoodieIndexConfigOption, HoodieCompactionConfigOption, >etc. >- Config option will take care of returning defaults for any property, even >if an entire Config(eg IndexConfig) is not explicitly set. > >Here are the positives I see. >- By way of having component level ConfigOptions, we bucketize the configs >and have defaults set(same as before) >- User doesn't need to set each component's config(eg IndexConfig) >explicitly with HoodieWriteConfig. > >But have one question: >- I see Bucketizing only in write path. How does one get hold of >IndexConfigOptions as a consumer? For eg, If some class is using just >IndexConfig alone, how will it consume? From your eg, I see only >HoodieWriteConfig. Do we pass in HoodieWriteConfig everywhere then? >Wouldn't that contradicts your initial proposal to not have a fat config >class? May be can you expand your example below to show how a consumer of >IndexConfig look like. > >Your eg: >/** > * New version > */ >// set value overrite the default value >HoodieWriteConfig config = new HoodieWriteConfig(); >config.set(HoodieIndexConfigOptions.INDEX_TYPE, >HoodieIndex.IndexType.HBASE.name <http://hoodieindex.indextype.hbase.name/> >()) > > > > >On Wed, Dec 11, 2019 at 8:33 AM lamberken <[email protected]> wrote: > >> >> >> Hi, >> >> >> >> >> On 1,2. Yes, you are right, moving the getter to the component level >> Config class itself. >> >> >> On 3, HoodieWriteConfig can also set value through ConfigOption, small >> code snippets. >> From the bellow snippets, we can see that clients need to know each >> component's builders >> and also call their "with" methods to override the default value in old >> version. >> >> >> But, in new version, clients just need to know each component's public >> config options, just like constants. >> So, these builders are redundant. >> >> >> /---------------------------------------------------------------------------/ >> >> >> public class HoodieIndexConfigOptions { >> public static final ConfigOption<String> INDEX_TYPE = ConfigOption >> .key("hoodie.index.type") >> .defaultValue(HoodieIndex.IndexType.BLOOM.name()); >> } >> >> >> public class HoodieWriteConfig { >> public void setString(ConfigOption<String> option, String value) { >> this.props.put(option.key(), value); >> } >> } >> >> >> >> >> /** >> * New version >> */ >> // set value overrite the default value >> HoodieWriteConfig config = new HoodieWriteConfig(); >> config.set(HoodieIndexConfigOptions.INDEX_TYPE, >> HoodieIndex.IndexType.HBASE.name()) >> >> >> >> >> /** >> * Old version >> */ >> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() >> >> builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) >> >> >> >> /---------------------------------------------------------------------------/ >> >> >> Another, users use hudi like bellow, here're all keys. >> >> /---------------------------------------------------------------------------/ >> >> >> df.write.format("hudi"). >> option("hoodie.insert.shuffle.parallelism", "10"). >> option("hoodie.upsert.shuffle.parallelism", "10"). >> option("hoodie.delete.shuffle.parallelism", "10"). >> option("hoodie.bulkinsert.shuffle.parallelism", "10"). >> option("hoodie.datasource.write.recordkey.field", "name"). >> option("hoodie.datasource.write.partitionpath.field", "location"). >> option("hoodie.datasource.write.precombine.field", "ts"). >> option("hoodie.table.name", tableName). >> mode(Overwrite). >> save(basePath); >> >> >> >> /---------------------------------------------------------------------------/ >> >> >> >> >> Last, as I responsed to @vino, it's reasonable to handle fallbackkeys. I >> think we need to do this step by step, >> it's easy to integrate FallbackKey in the future, it is not what we need >> right now in my opinion. >> >> >> If some places are still not very clear, feel free to feedback. >> >> >> >> >> Best, >> lamber-ken >> >> >> >> >> >> >> >> >> >> >> >> >> At 2019-12-11 23:41:31, "Vinoth Chandar" <[email protected]> wrote: >> >Hi Lamber-ken, >> > >> >I looked at the sample PR you put up as well. >> > >> >On 1,2 => Seems your intent is to replace these with moving the getter to >> >the component level Config class itself? I am fine with that (although I >> >think its not that big of a hurdle really to use atm). But, once we do >> that >> >we could pass just the specific component config into parts of code versus >> >passing in the entire HoodieWriteConfig object. I am fine with moving the >> >classes to a ConfigOption class as you suggested as well. >> > >> >On 3, I still we feel we will need the builder pattern going forward. to >> >build the HoodieWriteConfig object. Like below? Cannot understand why we >> >would want to change this. Could you please clarify? >> > >> >HoodieWriteConfig.Builder builder = >> > >> HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes, >> >true) >> > >> .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName) >> > // Inline compaction is disabled for continuous mode. >> >otherwise enabled for MOR >> > >> .withInlineCompaction(cfg.isInlineCompactionEnabled()).build()) >> > .forTable(cfg.targetTableName) >> > >> .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) >> > .withAutoCommit(false).withProps(props); >> > >> > >> >Typically, we write RFCs for large changes that breaks existing behavior >> or >> >introduces significantly complex new features.. If you are just planning >> to >> >do the refactoring into ConfigOption class, per se you don't need a RFC. >> >But , if you plan to address the fallback keys (or) your changes are going >> >to break/change existing jobs, we would need a RFC. >> > >> >>> It is not clear to me whether there is any external facing changes >> which >> >changes this model. >> >I am still unclear on this as well. can you please explicitly clarify? >> > >> >thanks >> >vinoth >> > >> > >> >On Tue, Dec 10, 2019 at 12:35 PM lamberken <[email protected]> wrote: >> > >> >> >> >> Hi, @Balaji @Vinoth >> >> >> >> >> >> I'm sorry, some places are not very clear, >> >> >> >> >> >> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, etc.. >> already >> >> defined in project. >> >> But we get property value by methods which defined in >> >> HoodieWriteConfig, like HoodieWriteConfig#getParquetMaxFileSize, >> >> HoodieWriteConfig#getParquetBlockSize, etc. It's means that >> >> Hoodie*Config are redundant. >> >> >> >> >> >> 2, These Hoodie*Config classes are used to set default value when call >> >> their build method, nothing else. >> >> >> >> >> >> 3, For current plan is keep the Builder pattern when configuring, when >> we >> >> are familiar with the config framework, >> >> We will find that Hoodie*Config class are redundant and methods >> >> prefixed with "get" in HoodieWriteConfig are also redundant. >> >> >> >> >> >> In addition, I create a pr[1] for initializing with a demo. At this >> demo, >> >> I create >> >> MetricsGraphiteReporterOptions which contains HOST, PORT, PREFIX, and >> >> remove getGraphiteServerHost, >> >> getGraphiteServerPort, getGraphiteMetricPrefix in HoodieMetricsConfig. >> >> >> >> >> >> https://github.com/apache/incubator-hudi/pull/1094 >> >> >> >> >> >> Best, >> >> lamber-ken >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> At 2019-12-11 02:35:30, "Balaji Varadarajan" <[email protected] >> > >> >> wrote: >> >> > Hi Lamber-Ken, >> >> >Thanks for the time writing the proposal and thinking about improving >> >> Hudi usability. >> >> >My preference would be to keep the Builder pattern when configuring. It >> >> is something I find it natural when configuring. It is not clear to me >> >> whether there is any external facing changes which changes this model. >> >> Would you mind adding some more details on the RFC. It would save time >> to >> >> read it in one place as opposed to checking out github repo :) >> >> >Thanks,Balaji.V >> >> > On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar < >> >> [email protected]> wrote: >> >> > >> >> > Hi , >> >> > >> >> >Thanks for the proposal. Some parts I agree, some parts I don't and >> some >> >> >parts are unclear >> >> > >> >> >Agree : >> >> >- On introducing a class that binds key, default value, provided value, >> >> and >> >> >also may be a doc along with it (?). >> >> >- Designing the framework to have fallback keys is good IMO. It helps >> us >> >> do >> >> >things like https://issues.apache.org/jira/browse/HUDI-89 >> >> > >> >> >Disagree : >> >> >- Not all configuration values are in HoodieWriteConfig, its not >> accurate. >> >> >Configs are already split by components into HoodieIndexConfig, >> >> >HoodieCompactionConfig etc.. >> >> >- There are helpers for all these conveniently located in >> >> >HoodieWriteConfig. I think some of the claims of usability seem >> subjective >> >> >to me, speaking from hands-on experience writing jobs. So, if you >> >> proposing >> >> >a large shake up (e.g not have a single properties file load all >> >> >components), I would love to understand this at more depth. From my >> >> >experience, well namespaced configs in a single properties file keeps >> it >> >> >simple and understandable. >> >> > >> >> >Unclear : >> >> >- What is impact on existing jobs - using RDD/WriteClient API, >> >> DataSource, >> >> >DeltaStreamer level? Do you intend to change namespacing of configs? >> >> > >> >> > >> >> >Thanks >> >> >Vinoth >> >> > >> >> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <[email protected]> wrote: >> >> > >> >> >> >> >> >> >> >> >> Hi, vino >> >> >> >> >> >> >> >> >> Reasonable, we can refactor this step by step. The first step now >> is to >> >> >> introduce the config framework. >> >> >> When our community is familiar with the config framework mechanism, >> it's >> >> >> easy to integrate FallbackKey in the future. >> >> >> >> >> >> >> >> >> Best, >> >> >> lamber-ken >> >> >> >> >> >> >> >> >> >> >> >> At 2019-12-10 11:51:22, "vino yang" <[email protected]> wrote: >> >> >> >Hi Lamber, >> >> >> > >> >> >> >Thanks for the proposal. +1 from my side. >> >> >> > >> >> >> >When it comes to configuration, it will involve how we handle >> >> deprecated >> >> >> >configuration items in the future. In my opinion, we need to take >> this >> >> >> into >> >> >> >consideration when designing. There are already some successful >> >> practices >> >> >> >for our reference. For example, Flink defines some deprecated >> >> >> >configurations as FallbackKey[1]. Maybe we can learn from these >> >> designs. >> >> >> > >> >> >> >WDYT? >> >> >> > >> >> >> >[1]: >> >> >> > >> >> >> >> >> >> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java >> >> >> > >> >> >> >Best, >> >> >> >Vino >> >> >> > >> >> >> >lamberken <[email protected]> 于2019年12月9日周一 下午11:19写道: >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> Hi, all >> >> >> >> >> >> >> >> >> >> >> >> Currently, many configuration items and their default values are >> >> >> dispersed >> >> >> >> in the config file like HoodieWriteConfig. It’s very confused for >> >> >> >> developers, and it's easy for developers to use them in a wrong >> place >> >> >> >> especially when there are more and more configuration items. If we >> >> can >> >> >> >> solve this, developers will benefit from it and the code structure >> >> will >> >> >> be >> >> >> >> more concise. >> >> >> >> >> >> >> >> >> >> >> >> I had create a JIRA[1] and a under discuss RFC[2] to explain how >> to >> >> >> solve >> >> >> >> the problem, if you are interested in this, you can visit jira and >> >> RFC >> >> >> for >> >> >> >> detail. Any comments and feedback are welcome, WDYT? >> >> >> >> >> >> >> >> >> >> >> >> Best, >> >> >> >> lamber-ken >> >> >> >> >> >> >> >> >> >> >> >> [1] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375 >> >> >> >> [2] >> >> >> >> >> >> >> >> >> >> https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project >> >> >> >> >> >> > > >-- >Regards, >-Sivabalan
