I could not see the image you attached, but I do get your ask here. This would definitely benefit continuous deltastreamer use-cases. One possible option is to keep track of the last modification time of the properties file that we feed in as the top-level deltastreamer config and, before every batch, check whether it has changed. If it has, we can re-instantiate the write config and other components (write client etc.) as applicable. If not, proceed as usual. It should not be hard to add this support.
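To make the idea concrete, here is a rough sketch of the mod-time check, not actual deltastreamer code. The class name PropsFileWatcher and its methods are hypothetical, and it assumes the properties file sits on the local filesystem; for a file on DFS the same check would have to go through the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

// Hypothetical helper (not part of Hudi): reloads a properties file
// only when its last modification time has changed.
public class PropsFileWatcher {
    private final Path propsFile;
    private FileTime lastSeen;
    private Properties current;

    public PropsFileWatcher(Path propsFile) throws IOException {
        this.propsFile = propsFile;
        this.lastSeen = Files.getLastModifiedTime(propsFile);
        this.current = load(propsFile);
    }

    // Call before every batch. Returns true if the file changed and the
    // properties were reloaded, so the caller knows to rebuild the write
    // config, write client, etc.
    public boolean refreshIfChanged() throws IOException {
        FileTime now = Files.getLastModifiedTime(propsFile);
        if (now.equals(lastSeen)) {
            return false; // unchanged: proceed as usual
        }
        current = load(propsFile);
        lastSeen = now;
        return true;
    }

    public Properties current() {
        return current;
    }

    private static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
```

In the continuous sync loop, the caller would invoke refreshIfChanged() before each batch and rebuild the write config and write client only when it returns true. Comparing mod times keeps the per-batch cost to a single metadata lookup.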
On Mon, 22 May 2023 at 00:05, 孔维 <18701146...@163.com> wrote:

> Hi team,
>
> I am thinking about whether it is necessary to add the feature of
> configuration hot update to deltastreamer.
>
> In our company, hudi is used as a platform. We provide deltastreamer (run
> in continuous mode) to write to a large number of sources (including mysql
> & tidb) as a long time service. We often need to update the hudi
> configuration, but we don’t want to restart deltastreamer to achieve it,
> which may be too heavy for our job scheduler server based on livy/yarn.
> Therefore, we provide deltastreamer configuration hot update function. It
> is possible to update some common parameters instantly, and these
> parameters will take effect at the next sync of deltastreamer. These
> parameters include:
>
> - hoodie.bulkinsert.shuffle.parallelism (used only in bulkinsert)
> - hoodie.upsert.shuffle.parallelism
> - hoodie.deltastreamer.kafka.source.maxEvents
> - hoodie.memory.merge.max.size
> - hoodie.memory.compaction.max.size
> - hoodie.datasource.hive_sync.*
> - hoodie.compact.inline.max.delta.commits
> - hoodie.compaction.strategy
> - hoodie.compaction.target.io
>
> The whole flow chart is as follows:
>

--
Regards,
-Sivabalan