hi, @Sivabalan
 
Yes, thanks very much for help me explain my initial proposal.


Answer your question, we can call HoodieWriteConfig as a SystemConfig, we need 
to pass it everywhere. 
Actually, it may just contains a few custom configurations( does not include 
default configurations)
Because each component has its own ConfigOptions.


The old version HoodieWriteConfig includes all keys(custom configurations, 
default configurations), it is a fat.


Best,
lamber-ken








At 2019-12-12 03:14:11, "Sivabalan" <[email protected]> wrote:
>Let me summarize your initial proposal and then will get into details.
>- Introduce ConfigOptions for ease of handling of default values.
>- Remove all Hoodie*Config classes and just have HoodieWriteConfig. What
>this means is that, every other config file will be replaced by
>ConfigOptions. eg, HoodieIndexConfigOption, HoodieCompactionConfigOption,
>etc.
>- Config option will take care of returning defaults for any property, even
>if an entire Config(eg IndexConfig) is not explicitly set.
>
>Here are the positives I see.
>- By way of having component level ConfigOptions, we bucketize the configs
>and have defaults set(same as before)
>- User doesn't need to set each component's config(eg IndexConfig)
>explicitly with HoodieWriteConfig.
>
>But have one question:
>- I see Bucketizing only in write path. How does one get hold of
>IndexConfigOptions as a consumer?  For eg, If some class is using just
>IndexConfig alone, how will it consume? From your eg, I see only
>HoodieWriteConfig. Do we pass in HoodieWriteConfig everywhere then?
>Wouldn't that contradicts your initial proposal to not have a fat config
>class? May be can you expand your example below to show how a consumer of
>IndexConfig look like.
>
>Your eg:
>/**
> * New version
> */
>// set value overrite the default value
>HoodieWriteConfig config = new HoodieWriteConfig();
>config.set(HoodieIndexConfigOptions.INDEX_TYPE,
>HoodieIndex.IndexType.HBASE.name <http://hoodieindex.indextype.hbase.name/>
>())
>
>
>
>
>On Wed, Dec 11, 2019 at 8:33 AM lamberken <[email protected]> wrote:
>
>>
>>
>> Hi,
>>
>>
>>
>>
>> On 1,2. Yes, you are right, moving the getter to the component level
>> Config class itself.
>>
>>
>> On 3, HoodieWriteConfig can also set value through ConfigOption, small
>> code snippets.
>> From the bellow snippets, we can see that clients need to know each
>> component's builders
>> and also call their "with" methods to override the default value in old
>> version.
>>
>>
>> But, in new version, clients just need to know each component's public
>> config options, just like constants.
>> So, these builders are redundant.
>>
>>
>> /---------------------------------------------------------------------------/
>>
>>
>> public class HoodieIndexConfigOptions {
>>   public static final ConfigOption<String> INDEX_TYPE = ConfigOption
>>           .key("hoodie.index.type")
>>           .defaultValue(HoodieIndex.IndexType.BLOOM.name());
>> }
>>
>>
>> public class HoodieWriteConfig {
>>   public void setString(ConfigOption<String> option, String value) {
>>     this.props.put(option.key(), value);
>>   }
>> }
>>
>>
>>
>>
>> /**
>>  * New version
>>  */
>> // set value overrite the default value
>> HoodieWriteConfig config = new HoodieWriteConfig();
>> config.set(HoodieIndexConfigOptions.INDEX_TYPE,
>> HoodieIndex.IndexType.HBASE.name())
>>
>>
>>
>>
>> /**
>>  * Old version
>>  */
>> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
>>
>> builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
>>
>>
>>
>> /---------------------------------------------------------------------------/
>>
>>
>> Another, users use hudi like bellow, here're all keys.
>>
>> /---------------------------------------------------------------------------/
>>
>>
>> df.write.format("hudi").
>>     option("hoodie.insert.shuffle.parallelism", "10").
>>     option("hoodie.upsert.shuffle.parallelism", "10").
>>     option("hoodie.delete.shuffle.parallelism", "10").
>>     option("hoodie.bulkinsert.shuffle.parallelism", "10").
>>     option("hoodie.datasource.write.recordkey.field", "name").
>>     option("hoodie.datasource.write.partitionpath.field", "location").
>>     option("hoodie.datasource.write.precombine.field", "ts").
>>     option("hoodie.table.name", tableName).
>>     mode(Overwrite).
>>     save(basePath);
>>
>>
>>
>> /---------------------------------------------------------------------------/
>>
>>
>>
>>
>> Last, as I responsed to @vino, it's reasonable to handle fallbackkeys. I
>> think we need to do this step by step,
>> it's easy to integrate FallbackKey in the future, it is not what we need
>> right now in my opinion.
>>
>>
>> If some places are still not very clear, feel free to feedback.
>>
>>
>>
>>
>> Best,
>> lamber-ken
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2019-12-11 23:41:31, "Vinoth Chandar" <[email protected]> wrote:
>> >Hi Lamber-ken,
>> >
>> >I looked at the sample PR you put up as well.
>> >
>> >On 1,2 => Seems your intent is to replace these with moving the getter to
>> >the component level Config class itself? I am fine with that (although I
>> >think its not that big of a hurdle really to use atm). But, once we do
>> that
>> >we could pass just the specific component config into parts of code versus
>> >passing in the entire HoodieWriteConfig object. I am fine with moving the
>> >classes to a ConfigOption class as you suggested as well.
>> >
>> >On 3, I still we feel we will need the builder pattern going forward. to
>> >build the HoodieWriteConfig object. Like below? Cannot understand why we
>> >would want to change this. Could you please clarify?
>> >
>> >HoodieWriteConfig.Builder builder =
>> >
>> HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
>> >true)
>> >
>> .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName)
>> >            // Inline compaction is disabled for continuous mode.
>> >otherwise enabled for MOR
>> >
>> .withInlineCompaction(cfg.isInlineCompactionEnabled()).build())
>> >        .forTable(cfg.targetTableName)
>> >
>> .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
>> >        .withAutoCommit(false).withProps(props);
>> >
>> >
>> >Typically, we write RFCs for large changes that breaks existing behavior
>> or
>> >introduces significantly complex new features.. If you are just planning
>> to
>> >do the refactoring into ConfigOption class, per se you don't need a RFC.
>> >But , if you plan to address the fallback keys (or) your changes are going
>> >to break/change existing jobs, we would need a RFC.
>> >
>> >>> It is not clear to me whether there is any external facing changes
>> which
>> >changes this model.
>> >I am still unclear on this as well. can you please explicitly clarify?
>> >
>> >thanks
>> >vinoth
>> >
>> >
>> >On Tue, Dec 10, 2019 at 12:35 PM lamberken <[email protected]> wrote:
>> >
>> >>
>> >> Hi, @Balaji @Vinoth
>> >>
>> >>
>> >> I'm sorry, some places are not very clear,
>> >>
>> >>
>> >> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, etc..
>> already
>> >> defined in project.
>> >>    But we get property value by methods which defined in
>> >> HoodieWriteConfig, like HoodieWriteConfig#getParquetMaxFileSize,
>> >>    HoodieWriteConfig#getParquetBlockSize, etc. It's means that
>> >> Hoodie*Config are redundant.
>> >>
>> >>
>> >> 2, These Hoodie*Config classes are used to set default value when call
>> >> their build method, nothing else.
>> >>
>> >>
>> >> 3, For current plan is keep the Builder pattern when configuring, when
>> we
>> >> are familiar with the config framework,
>> >>    We will find that Hoodie*Config class are redundant and methods
>> >> prefixed with "get" in HoodieWriteConfig are also redundant.
>> >>
>> >>
>> >> In addition, I create a pr[1] for initializing with a demo. At this
>> demo,
>> >> I create
>> >> MetricsGraphiteReporterOptions which contains HOST, PORT, PREFIX, and
>> >> remove getGraphiteServerHost,
>> >> getGraphiteServerPort, getGraphiteMetricPrefix in HoodieMetricsConfig.
>> >>
>> >>
>> >> https://github.com/apache/incubator-hudi/pull/1094
>> >>
>> >>
>> >> Best,
>> >> lamber-ken
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2019-12-11 02:35:30, "Balaji Varadarajan" <[email protected]
>> >
>> >> wrote:
>> >> > Hi Lamber-Ken,
>> >> >Thanks for the time writing the proposal and thinking about improving
>> >> Hudi usability.
>> >> >My preference would be to keep the Builder pattern when configuring. It
>> >> is something I find it natural when configuring. It is not clear to me
>> >> whether there is any external facing changes which changes this model.
>> >> Would you mind adding some more details on the RFC. It would save time
>> to
>> >> read it in one place as opposed to checking out github repo :)
>> >> >Thanks,Balaji.V
>> >> >    On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar <
>> >> [email protected]> wrote:
>> >> >
>> >> > Hi ,
>> >> >
>> >> >Thanks for the proposal. Some parts I agree, some parts I don't and
>> some
>> >> >parts are unclear
>> >> >
>> >> >Agree :
>> >> >- On introducing a class that binds key, default value, provided value,
>> >> and
>> >> >also may be a doc along with it (?).
>> >> >- Designing the framework to have fallback keys is good IMO. It helps
>> us
>> >> do
>> >> >things like https://issues.apache.org/jira/browse/HUDI-89
>> >> >
>> >> >Disagree :
>> >> >- Not all configuration values are in HoodieWriteConfig, its not
>> accurate.
>> >> >Configs are already split by components into HoodieIndexConfig,
>> >> >HoodieCompactionConfig etc..
>> >> >- There are helpers for all these conveniently located in
>> >> >HoodieWriteConfig. I think some of the claims of usability seem
>> subjective
>> >> >to me, speaking from hands-on experience writing jobs. So, if you
>> >> proposing
>> >> >a large shake up (e.g not have a single properties file load all
>> >> >components), I would love to understand this at more depth. From my
>> >> >experience, well namespaced configs in a single properties file keeps
>> it
>> >> >simple and understandable.
>> >> >
>> >> >Unclear :
>> >> >- What is impact on existing jobs - using  RDD/WriteClient API,
>> >> DataSource,
>> >> >DeltaStreamer level? Do you intend to change namespacing of configs?
>> >> >
>> >> >
>> >> >Thanks
>> >> >Vinoth
>> >> >
>> >> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <[email protected]> wrote:
>> >> >
>> >> >>
>> >> >>
>> >> >> Hi, vino
>> >> >>
>> >> >>
>> >> >> Reasonable,  we can refactor this step by step. The first step now
>> is to
>> >> >> introduce the config framework.
>> >> >> When our community is familiar with the config framework mechanism,
>> it's
>> >> >> easy to integrate FallbackKey in the future.
>> >> >>
>> >> >>
>> >> >> Best,
>> >> >> lamber-ken
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2019-12-10 11:51:22, "vino yang" <[email protected]> wrote:
>> >> >> >Hi Lamber,
>> >> >> >
>> >> >> >Thanks for the proposal. +1 from my side.
>> >> >> >
>> >> >> >When it comes to configuration, it will involve how we handle
>> >> deprecated
>> >> >> >configuration items in the future. In my opinion, we need to take
>> this
>> >> >> into
>> >> >> >consideration when designing. There are already some successful
>> >> practices
>> >> >> >for our reference. For example, Flink defines some deprecated
>> >> >> >configurations as FallbackKey[1]. Maybe we can learn from these
>> >> designs.
>> >> >> >
>> >> >> >WDYT?
>> >> >> >
>> >> >> >[1]:
>> >> >> >
>> >> >>
>> >>
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java
>> >> >> >
>> >> >> >Best,
>> >> >> >Vino
>> >> >> >
>> >> >> >lamberken <[email protected]> 于2019年12月9日周一 下午11:19写道:
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> Hi, all
>> >> >> >>
>> >> >> >>
>> >> >> >> Currently, many configuration items and their default values are
>> >> >> dispersed
>> >> >> >> in the config file like HoodieWriteConfig. It’s very confused for
>> >> >> >> developers, and it's easy for developers to use them in a wrong
>> place
>> >> >> >> especially when there are more and more configuration items. If we
>> >> can
>> >> >> >> solve this, developers will benefit from it and the code structure
>> >> will
>> >> >> be
>> >> >> >> more concise.
>> >> >> >>
>> >> >> >>
>> >> >> >> I had create a JIRA[1] and a under discuss RFC[2] to explain how
>> to
>> >> >> solve
>> >> >> >> the problem, if you are interested in this, you can visit jira and
>> >> RFC
>> >> >> for
>> >> >> >> detail. Any comments and feedback are welcome, WDYT?
>> >> >> >>
>> >> >> >>
>> >> >> >> Best,
>> >> >> >> lamber-ken
>> >> >> >>
>> >> >> >>
>> >> >> >> [1] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375
>> >> >> >> [2]
>> >> >> >>
>> >> >>
>> >>
>> https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project
>> >> >>
>> >>
>>
>
>
>-- 
>Regards,
>-Sivabalan

Reply via email to