Hi team,

I am wondering whether it would be worthwhile to add a configuration 
hot-update feature to deltastreamer.

In our company, Hudi is used as a platform. We run deltastreamer (in 
continuous mode) as a long-running service to ingest data from a large number 
of sources (including MySQL & TiDB). We often need to update the Hudi 
configuration, but we don’t want to restart deltastreamer to do so, because 
restarts may be too heavy for our job scheduler server, which is based on 
Livy/YARN. Therefore, we have implemented a configuration hot-update function 
for deltastreamer. It can update some common parameters without a restart, 
and the new values take effect at the next sync of deltastreamer. These 
parameters include:

hoodie.bulkinsert.shuffle.parallelism (used only in bulkinsert)
hoodie.upsert.shuffle.parallelism
hoodie.deltastreamer.kafka.source.maxEvents
hoodie.memory.merge.max.size
hoodie.memory.compaction.max.size
hoodie.datasource.hive_sync.*
hoodie.compact.inline.max.delta.commits
hoodie.compaction.strategy
hoodie.compaction.target.io
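
To make the idea concrete, here is a minimal sketch of how the "take effect at 
the next sync" behaviour could work. It is written in Python only for brevity 
and is not our actual implementation or any existing Hudi API. The function 
names (parse_properties, is_hot_updatable, apply_hot_updates) are hypothetical. 
The assumption is that the service re-reads its properties before each sync 
and accepts changes only for the keys listed above.

```python
import fnmatch

# Keys from the list above; a trailing "*" is treated as a wildcard.
HOT_UPDATABLE = [
    "hoodie.bulkinsert.shuffle.parallelism",
    "hoodie.upsert.shuffle.parallelism",
    "hoodie.deltastreamer.kafka.source.maxEvents",
    "hoodie.memory.merge.max.size",
    "hoodie.memory.compaction.max.size",
    "hoodie.datasource.hive_sync.*",
    "hoodie.compact.inline.max.delta.commits",
    "hoodie.compaction.strategy",
    "hoodie.compaction.target.io",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def is_hot_updatable(key):
    """Return True if the key matches one of the allow-listed patterns."""
    return any(fnmatch.fnmatchcase(key, pattern) for pattern in HOT_UPDATABLE)


def apply_hot_updates(current, latest):
    """Merge allow-listed changes from `latest` into a copy of `current`.

    Returns (merged, changed). Keys outside the allow-list keep their
    current value, since changing them would still need a restart.
    """
    merged = dict(current)
    changed = {}
    for key, value in latest.items():
        if is_hot_updatable(key) and current.get(key) != value:
            merged[key] = value
            changed[key] = value
    return merged, changed
```

In a continuous-mode loop, the service would call apply_hot_updates once 
before each sync and use the merged result for that sync only, so a sync that 
is already running is never affected by an edit.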

The whole flow chart is as follows:
