Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs

2021-05-24 Thread wangxianghu
Thanks for the reply.
Ticket filed: https://issues.apache.org/jira/browse/HUDI-1928




Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs

2021-05-24 Thread vino yang
Also +1.

IMO, simplifying the complexity of configuration and reducing the cost of
entry for new users are very important for improving user experience.

It is a good proposal to simplify the configuration complexity by
introducing some built-in enumerations.

But at the same time, we still need to allow the fully qualified name of
the class to be configured, for advanced users who plug in their own
extensions.

Best,
Vino



Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs

2021-05-22 Thread Pratyaksh Sharma
+1 from my side.

Introducing new configs based on types definitely improves user experience
as compared to supplying full class names. We just need to define the enums
properly.



[DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs

2021-05-21 Thread wangxianghu
Hi community,

I want to start a discussion about improving the Hudi user experience.

Hudi now has more and more users all over the world, but most of them do not 
know Hudi as well as Uber engineers or we do.

When they start Hudi jobs, they need to set a lot of configuration options, 
many of which are not user-friendly, such as:
```

hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator

hoodie.datasource.write.payload.class -> org.apache.hudi.OverwriteWithLatestAvroPayload

--schemaprovider-class -> subclass of org.apache.hudi.utilities.schema

--transformer-class -> full class names of the transformers

--sync-tool-classes -> full class names of the sync tools

--source-class -> subclass of org.apache.hudi.utilities.sources
...
```

I think asking users to provide the fully qualified class name is not very 
friendly, especially for new users.

So maybe we can provide more ways to configure these parameters, as is already 
done for `HoodieIndex`.

In the `HoodieIndex` case, users can configure either the index type or the 
index class name to tell Hudi which index to use.

```

hoodie.index.type -> HBASE

```

or

```

hoodie.index.class -> org.apache.hudi.index.hbase.SparkHoodieHBaseIndex

```

I believe more users prefer the `hoodie.index.type` way.

So I think we can let some of the configurations above be set through a type 
as well, while keeping class-name configuration available for users who need 
their own custom implementations.
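
To make the idea concrete, here is a rough sketch of how a type-or-class 
lookup could work for the key generator. Everything in it is an assumption 
for discussion only: the `hoodie.datasource.write.keygenerator.type` key, the 
`KeyGeneratorResolver` class, the `KeyGeneratorType` enum and its members, the 
SIMPLE default, and the rule that an explicit class name wins over the type. 
None of it is existing Hudi code.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: resolve a key generator class from a short type or a class name. */
final class KeyGeneratorResolver {

    // Existing config key, as listed earlier in this mail.
    static final String CLASS_KEY = "hoodie.datasource.write.keygenerator.class";
    // Hypothetical key, invented for this sketch.
    static final String TYPE_KEY = "hoodie.datasource.write.keygenerator.type";

    /** Hypothetical built-in types; the mapping is illustrative, not a complete list. */
    enum KeyGeneratorType {
        SIMPLE("org.apache.hudi.keygen.SimpleKeyGenerator"),
        COMPLEX("org.apache.hudi.keygen.ComplexKeyGenerator");

        final String className;

        KeyGeneratorType(String className) {
            this.className = className;
        }
    }

    /** Returns the class name to load; an explicit class name takes precedence (assumption). */
    static String resolve(Properties props) {
        String className = props.getProperty(CLASS_KEY);
        if (className != null && !className.trim().isEmpty()) {
            return className.trim();
        }
        // Fall back to the type, defaulting to SIMPLE when nothing is set (assumption).
        String type = props.getProperty(TYPE_KEY, KeyGeneratorType.SIMPLE.name());
        try {
            // Accept any letter case so that "simple" and "SIMPLE" both work.
            return KeyGeneratorType.valueOf(type.trim().toUpperCase(Locale.ROOT)).className;
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown key generator type '" + type + "'; use one of "
                    + Arrays.toString(KeyGeneratorType.values())
                    + " or set " + CLASS_KEY, e);
        }
    }
}
```

With this shape, new users only set the short type, custom implementations 
keep working through the class key, and a typo in the type fails fast with a 
message that lists the valid values.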




I'm looking forward to your feedback. Any suggestions are appreciated.