Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs
Thanks for the reply. Ticket filed: https://issues.apache.org/jira/browse/HUDI-1928

> On May 24, 2021, at 6:41 PM, vino yang wrote:
>
> also +1,
>
> IMO, simplifying configuration and lowering the barrier to entry for new users are very important for improving the user experience.
>
> Introducing some built-in enumerations is a good way to reduce the configuration complexity.
>
> At the same time, we still need to allow the fully qualified name of the configuration class, for advanced users who need their own extensions.
>
> Best,
> Vino
Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs
also +1,

IMO, simplifying configuration and lowering the barrier to entry for new users are very important for improving the user experience.

Introducing some built-in enumerations is a good way to reduce the configuration complexity.

At the same time, we still need to allow the fully qualified name of the configuration class, for advanced users who need their own extensions.

Best,
Vino

On Sat, May 22, 2021 at 8:24 PM, Pratyaksh Sharma wrote:

> +1 from my side.
>
> Introducing new configs based on types definitely improves the user experience compared to supplying full class names. We just need to define the enums properly.
Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs
+1 from my side.

Introducing new configs based on types definitely improves the user experience compared to supplying full class names. We just need to define the enums properly.

On Sat, May 22, 2021 at 9:13 AM wangxianghu wrote:

> Hi community:
>
> I want to start a discussion about improving the Hudi user experience.
>
> Hudi now has more and more users all over the world, but most of them don't know Hudi as well as Uber engineers or we do.
>
> When they start Hudi jobs, they need to set a lot of configuration, much of which is not user-friendly, such as:
>
> ```
> hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator
>
> hoodie.datasource.write.payload.class -> org.apache.hudi.OverwriteWithLatestAvroPayload
>
> --schemaprovider-class -> subclass of org.apache.hudi.utilities.schema
>
> --transformer-class -> full class names of the transformers
>
> --sync-tool-classes -> full class names of sync tool
>
> --source-class -> subclass of org.apache.hudi.utilities.sources
> ...
> ```
>
> I think asking users to provide the full class name is not very friendly, especially for new users.
>
> So maybe we can provide more ways to configure these parameters, as is already done for `HoodieIndex`.
>
> In the `HoodieIndex` case, users can configure either the index type or the index class name to tell Hudi which index to use:
>
> ```
> hoodie.index.type -> HBASE
> ```
>
> or
>
> ```
> hoodie.index.class -> org.apache.hudi.index.hbase.SparkHoodieHBaseIndex
> ```
>
> I believe more users prefer the `hoodie.index.type` way.
>
> So I think we can let some of the configurations above be set through a type, while keeping class-name configuration as well, in case some users need to plug in their own custom implementations.
>
> I'm looking forward to your feedback. Any suggestions are appreciated.
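To make "define the enums properly" a bit more concrete, here is a minimal sketch of one possible shape for such an enum. It is written in Python only for brevity. The name `KeyGeneratorType`, its `parse` helper, and the choice of members are hypothetical and are not taken from the Hudi codebase; the two class names are existing Hudi key generators. The points it illustrates are that short names should be parsed case-insensitively and that a bad value should fail with a message listing the valid choices.

```python
from enum import Enum


class KeyGeneratorType(Enum):
    """Hypothetical short names; each value is the class the name stands for."""

    SIMPLE = "org.apache.hudi.keygen.SimpleKeyGenerator"
    COMPLEX = "org.apache.hudi.keygen.ComplexKeyGenerator"

    @classmethod
    def parse(cls, raw: str) -> "KeyGeneratorType":
        """Look up a member by name, ignoring case and surrounding whitespace."""
        name = raw.strip().upper()
        try:
            return cls[name]
        except KeyError:
            # List the accepted names so the user can fix the config quickly.
            valid = ", ".join(member.name for member in cls)
            raise ValueError(
                f"Unknown key generator type {raw!r}; expected one of: {valid}"
            ) from None


# A user writes the short name; the job resolves it to the class name.
selected = KeyGeneratorType.parse("simple")
print(selected.value)
```

Keeping the class name as the enum value means the short name and the implementation it stands for are declared in one place.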
[DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs
Hi community:

I want to start a discussion about improving the Hudi user experience.

Hudi now has more and more users all over the world, but most of them don't know Hudi as well as Uber engineers or we do.

When they start Hudi jobs, they need to set a lot of configuration, much of which is not user-friendly, such as:

```
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator

hoodie.datasource.write.payload.class -> org.apache.hudi.OverwriteWithLatestAvroPayload

--schemaprovider-class -> subclass of org.apache.hudi.utilities.schema

--transformer-class -> full class names of the transformers

--sync-tool-classes -> full class names of sync tool

--source-class -> subclass of org.apache.hudi.utilities.sources
...
```

I think asking users to provide the full class name is not very friendly, especially for new users.

So maybe we can provide more ways to configure these parameters, as is already done for `HoodieIndex`.

In the `HoodieIndex` case, users can configure either the index type or the index class name to tell Hudi which index to use:

```
hoodie.index.type -> HBASE
```

or

```
hoodie.index.class -> org.apache.hudi.index.hbase.SparkHoodieHBaseIndex
```

I believe more users prefer the `hoodie.index.type` way.

So I think we can let some of the configurations above be set through a type, while keeping class-name configuration as well, in case some users need to plug in their own custom implementations.

I'm looking forward to your feedback. Any suggestions are appreciated.
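As a rough illustration of the type-or-class idea, the sketch below resolves a class name from a properties map that may carry either key. It is written in Python only for brevity. The function `resolve_class`, the lookup table, the placeholder class `com.example.MyCustomIndex`, and the rule that an explicit class name wins over a type are all assumptions of this sketch, not a description of how Hudi implements it. Only the `hoodie.index.*` keys and the HBASE mapping come from the message above.

```python
from typing import Mapping, Optional

# Short type names mapped to implementation classes. Only the HBASE entry
# comes from the thread; a real table would list every built-in index.
INDEX_TYPE_TO_CLASS = {
    "HBASE": "org.apache.hudi.index.hbase.SparkHoodieHBaseIndex",
}


def resolve_class(
    props: Mapping[str, str],
    class_key: str,
    type_key: str,
    type_to_class: Mapping[str, str],
) -> Optional[str]:
    """Return the class name selected by props, or None if neither key is set.

    Assumption of this sketch: an explicit class name wins over a type,
    so custom implementations stay possible.
    """
    explicit = props.get(class_key, "").strip()
    if explicit:
        return explicit
    type_name = props.get(type_key, "").strip().upper()
    if not type_name:
        return None
    if type_name not in type_to_class:
        raise ValueError(f"{type_key}={type_name!r} is not a known type")
    return type_to_class[type_name]


# New users set only the short type name.
by_type = resolve_class(
    {"hoodie.index.type": "HBASE"},
    "hoodie.index.class", "hoodie.index.type", INDEX_TYPE_TO_CLASS,
)

# Advanced users can still point at their own class (placeholder name).
by_class = resolve_class(
    {"hoodie.index.class": "com.example.MyCustomIndex"},
    "hoodie.index.class", "hoodie.index.type", INDEX_TYPE_TO_CLASS,
)
print(by_type)
print(by_class)
```

The same helper could in principle back the other options listed above by passing a different pair of keys and a different table, for example a type key alongside `hoodie.datasource.write.keygenerator.class`; the name of any such type key would have to be decided in the ticket.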