Hi team, I would like to discuss whether it is worth adding a configuration hot update feature to deltastreamer.
In our company, Hudi is used as a platform. We run deltastreamer in continuous mode as a long-running service to ingest data from a large number of sources (including MySQL and TiDB). We often need to update the Hudi configuration, but we don't want to restart deltastreamer to do so, because restarts are too heavy for our job scheduler server, which is based on Livy/YARN.

Therefore, we implemented a configuration hot update function for deltastreamer. Some common parameters can be updated on the fly, and the new values take effect at the next sync of deltastreamer. These parameters include:

- hoodie.bulkinsert.shuffle.parallelism (used only in bulkinsert)
- hoodie.upsert.shuffle.parallelism
- hoodie.deltastreamer.kafka.source.maxEvents
- hoodie.memory.merge.max.size
- hoodie.memory.compaction.max.size
- hoodie.datasource.hive_sync.*
- hoodie.compact.inline.max.delta.commits
- hoodie.compaction.strategy
- hoodie.compaction.target.io

The whole flow chart is as follows:
