Let me summarize your initial proposal and then will get into details. - Introduce ConfigOptions for ease of handling of default values. - Remove all Hoodie*Config classes and just have HoodieWriteConfig. What this means is that, every other config file will be replaced by ConfigOptions. eg, HoodieIndexConfigOption, HoodieCompactionConfigOption, etc. - Config option will take care of returning defaults for any property, even if an entire Config(eg IndexConfig) is not explicitly set.
Here are the positives I see. - By way of having component level ConfigOptions, we bucketize the configs and have defaults set(same as before) - User doesn't need to set each component's config(eg IndexConfig) explicitly with HoodieWriteConfig. But have one question: - I see Bucketizing only in write path. How does one get hold of IndexConfigOptions as a consumer? For eg, If some class is using just IndexConfig alone, how will it consume? From your eg, I see only HoodieWriteConfig. Do we pass in HoodieWriteConfig everywhere then? Wouldn't that contradicts your initial proposal to not have a fat config class? May be can you expand your example below to show how a consumer of IndexConfig look like. Your eg: /** * New version */ // set value overrite the default value HoodieWriteConfig config = new HoodieWriteConfig(); config.set(HoodieIndexConfigOptions.INDEX_TYPE, HoodieIndex.IndexType.HBASE.name <http://hoodieindex.indextype.hbase.name/> ()) On Wed, Dec 11, 2019 at 8:33 AM lamberken <[email protected]> wrote: > > > Hi, > > > > > On 1,2. Yes, you are right, moving the getter to the component level > Config class itself. > > > On 3, HoodieWriteConfig can also set value through ConfigOption, small > code snippets. > From the bellow snippets, we can see that clients need to know each > component's builders > and also call their "with" methods to override the default value in old > version. > > > But, in new version, clients just need to know each component's public > config options, just like constants. > So, these builders are redundant. > > > /---------------------------------------------------------------------------/ > > > public class HoodieIndexConfigOptions { > public static final ConfigOption<String> INDEX_TYPE = ConfigOption > .key("hoodie.index.type") > .defaultValue(HoodieIndex.IndexType.BLOOM.name()); > } > > > public class HoodieWriteConfig { > public void setString(ConfigOption<String> option, String value) { > this.props.put(option.key(), value); > } > } > > > > > /** > * New version > */ > // set value overrite the default value > HoodieWriteConfig config = new HoodieWriteConfig(); > config.set(HoodieIndexConfigOptions.INDEX_TYPE, > HoodieIndex.IndexType.HBASE.name()) > > > > > /** > * Old version > */ > HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() > > builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) > > > > /---------------------------------------------------------------------------/ > > > Another, users use hudi like bellow, here're all keys. > > /---------------------------------------------------------------------------/ > > > df.write.format("hudi"). > option("hoodie.insert.shuffle.parallelism", "10"). > option("hoodie.upsert.shuffle.parallelism", "10"). > option("hoodie.delete.shuffle.parallelism", "10"). > option("hoodie.bulkinsert.shuffle.parallelism", "10"). > option("hoodie.datasource.write.recordkey.field", "name"). > option("hoodie.datasource.write.partitionpath.field", "location"). > option("hoodie.datasource.write.precombine.field", "ts"). > option("hoodie.table.name", tableName). > mode(Overwrite). > save(basePath); > > > > /---------------------------------------------------------------------------/ > > > > > Last, as I responsed to @vino, it's reasonable to handle fallbackkeys. I > think we need to do this step by step, > it's easy to integrate FallbackKey in the future, it is not what we need > right now in my opinion. > > > If some places are still not very clear, feel free to feedback. > > > > > Best, > lamber-ken > > > > > > > > > > > > > At 2019-12-11 23:41:31, "Vinoth Chandar" <[email protected]> wrote: > >Hi Lamber-ken, > > > >I looked at the sample PR you put up as well. > > > >On 1,2 => Seems your intent is to replace these with moving the getter to > >the component level Config class itself? I am fine with that (although I > >think its not that big of a hurdle really to use atm). But, once we do > that > >we could pass just the specific component config into parts of code versus > >passing in the entire HoodieWriteConfig object. I am fine with moving the > >classes to a ConfigOption class as you suggested as well. > > > >On 3, I still we feel we will need the builder pattern going forward. to > >build the HoodieWriteConfig object. Like below? Cannot understand why we > >would want to change this. Could you please clarify? > > > >HoodieWriteConfig.Builder builder = > > > HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes, > >true) > > > .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName) > > // Inline compaction is disabled for continuous mode. > >otherwise enabled for MOR > > > .withInlineCompaction(cfg.isInlineCompactionEnabled()).build()) > > .forTable(cfg.targetTableName) > > > .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) > > .withAutoCommit(false).withProps(props); > > > > > >Typically, we write RFCs for large changes that breaks existing behavior > or > >introduces significantly complex new features.. If you are just planning > to > >do the refactoring into ConfigOption class, per se you don't need a RFC. > >But , if you plan to address the fallback keys (or) your changes are going > >to break/change existing jobs, we would need a RFC. > > > >>> It is not clear to me whether there is any external facing changes > which > >changes this model. > >I am still unclear on this as well. can you please explicitly clarify? > > > >thanks > >vinoth > > > > > >On Tue, Dec 10, 2019 at 12:35 PM lamberken <[email protected]> wrote: > > > >> > >> Hi, @Balaji @Vinoth > >> > >> > >> I'm sorry, some places are not very clear, > >> > >> > >> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, etc.. > already > >> defined in project. > >> But we get property value by methods which defined in > >> HoodieWriteConfig, like HoodieWriteConfig#getParquetMaxFileSize, > >> HoodieWriteConfig#getParquetBlockSize, etc. It's means that > >> Hoodie*Config are redundant. > >> > >> > >> 2, These Hoodie*Config classes are used to set default value when call > >> their build method, nothing else. > >> > >> > >> 3, For current plan is keep the Builder pattern when configuring, when > we > >> are familiar with the config framework, > >> We will find that Hoodie*Config class are redundant and methods > >> prefixed with "get" in HoodieWriteConfig are also redundant. > >> > >> > >> In addition, I create a pr[1] for initializing with a demo. At this > demo, > >> I create > >> MetricsGraphiteReporterOptions which contains HOST, PORT, PREFIX, and > >> remove getGraphiteServerHost, > >> getGraphiteServerPort, getGraphiteMetricPrefix in HoodieMetricsConfig. > >> > >> > >> https://github.com/apache/incubator-hudi/pull/1094 > >> > >> > >> Best, > >> lamber-ken > >> > >> > >> > >> > >> > >> > >> > >> At 2019-12-11 02:35:30, "Balaji Varadarajan" <[email protected] > > > >> wrote: > >> > Hi Lamber-Ken, > >> >Thanks for the time writing the proposal and thinking about improving > >> Hudi usability. > >> >My preference would be to keep the Builder pattern when configuring. It > >> is something I find it natural when configuring. It is not clear to me > >> whether there is any external facing changes which changes this model. > >> Would you mind adding some more details on the RFC. It would save time > to > >> read it in one place as opposed to checking out github repo :) > >> >Thanks,Balaji.V > >> > On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar < > >> [email protected]> wrote: > >> > > >> > Hi , > >> > > >> >Thanks for the proposal. Some parts I agree, some parts I don't and > some > >> >parts are unclear > >> > > >> >Agree : > >> >- On introducing a class that binds key, default value, provided value, > >> and > >> >also may be a doc along with it (?). > >> >- Designing the framework to have fallback keys is good IMO. It helps > us > >> do > >> >things like https://issues.apache.org/jira/browse/HUDI-89 > >> > > >> >Disagree : > >> >- Not all configuration values are in HoodieWriteConfig, its not > accurate. > >> >Configs are already split by components into HoodieIndexConfig, > >> >HoodieCompactionConfig etc.. > >> >- There are helpers for all these conveniently located in > >> >HoodieWriteConfig. I think some of the claims of usability seem > subjective > >> >to me, speaking from hands-on experience writing jobs. So, if you > >> proposing > >> >a large shake up (e.g not have a single properties file load all > >> >components), I would love to understand this at more depth. From my > >> >experience, well namespaced configs in a single properties file keeps > it > >> >simple and understandable. > >> > > >> >Unclear : > >> >- What is impact on existing jobs - using RDD/WriteClient API, > >> DataSource, > >> >DeltaStreamer level? Do you intend to change namespacing of configs? > >> > > >> > > >> >Thanks > >> >Vinoth > >> > > >> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <[email protected]> wrote: > >> > > >> >> > >> >> > >> >> Hi, vino > >> >> > >> >> > >> >> Reasonable, we can refactor this step by step. The first step now > is to > >> >> introduce the config framework. > >> >> When our community is familiar with the config framework mechanism, > it's > >> >> easy to integrate FallbackKey in the future. > >> >> > >> >> > >> >> Best, > >> >> lamber-ken > >> >> > >> >> > >> >> > >> >> At 2019-12-10 11:51:22, "vino yang" <[email protected]> wrote: > >> >> >Hi Lamber, > >> >> > > >> >> >Thanks for the proposal. +1 from my side. > >> >> > > >> >> >When it comes to configuration, it will involve how we handle > >> deprecated > >> >> >configuration items in the future. In my opinion, we need to take > this > >> >> into > >> >> >consideration when designing. There are already some successful > >> practices > >> >> >for our reference. For example, Flink defines some deprecated > >> >> >configurations as FallbackKey[1]. Maybe we can learn from these > >> designs. > >> >> > > >> >> >WDYT? > >> >> > > >> >> >[1]: > >> >> > > >> >> > >> > https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java > >> >> > > >> >> >Best, > >> >> >Vino > >> >> > > >> >> >lamberken <[email protected]> 于2019年12月9日周一 下午11:19写道: > >> >> > > >> >> >> > >> >> >> > >> >> >> Hi, all > >> >> >> > >> >> >> > >> >> >> Currently, many configuration items and their default values are > >> >> dispersed > >> >> >> in the config file like HoodieWriteConfig. It’s very confused for > >> >> >> developers, and it's easy for developers to use them in a wrong > place > >> >> >> especially when there are more and more configuration items. If we > >> can > >> >> >> solve this, developers will benefit from it and the code structure > >> will > >> >> be > >> >> >> more concise. > >> >> >> > >> >> >> > >> >> >> I had create a JIRA[1] and a under discuss RFC[2] to explain how > to > >> >> solve > >> >> >> the problem, if you are interested in this, you can visit jira and > >> RFC > >> >> for > >> >> >> detail. Any comments and feedback are welcome, WDYT? > >> >> >> > >> >> >> > >> >> >> Best, > >> >> >> lamber-ken > >> >> >> > >> >> >> > >> >> >> [1] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375 > >> >> >> [2] > >> >> >> > >> >> > >> > https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project > >> >> > >> > -- Regards, -Sivabalan
