Thanks, everyone, for your attention. Yes, we do need to account for high availability in the design.
Also, in my opinion, this lake manager would not turn Hudi into a database on the cloud. It is just an official option, something like HoodieDeltaStreamer, that helps users reduce maintenance and Hudi data governance effort.

As for the resource and performance concerns, the lake manager should be designed as a planner/master. For example, it would call the cleaner APIs to launch a (Spark/Flink) execution that deletes files under certain conditions based on table metadata, rather than doing the work itself. That way its own workload and resource requirements are much smaller.

But in general, I agree that we have to consider failure recovery, high availability, etc.

On 2022/04/19 04:30:22 Simon Su wrote:
> I agree with Danny said. IMO, there are two points that should be
> considered
> 1. If Lake Manager is designed as a service, so we should consider its High
> Availability, Dynamic Expanding/Shrinking, and state consistency.
> 2. How many resources will Lake Manager used to execute those actions of
> HUDI such as compaction, clustering, etc..
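
P.S. To make the planner/master idea in my reply a bit more concrete, here is a rough sketch in Python. It is only an illustration of the split between planning and execution, not a proposed implementation. Every name in it (TableMetadata, JobRequest, LakeManagerPlanner, the submit callback, clean_threshold) is hypothetical and is not an existing Hudi API. The planner only looks at table metadata and hands a job description to an engine launcher; the actual file deletion would happen inside the launched Spark/Flink job.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TableMetadata:
    """Hypothetical summary of a table's state; not a Hudi class."""
    table_path: str
    commits_since_last_clean: int


@dataclass
class JobRequest:
    """Hypothetical description of work handed to an execution engine."""
    engine: str       # e.g. "spark" or "flink"
    action: str       # e.g. "clean"
    table_path: str


class LakeManagerPlanner:
    """Decides which table services to run, but never runs them itself."""

    def __init__(self, submit: Callable[[JobRequest], None], clean_threshold: int = 10):
        # `submit` stands in for whatever would launch the Spark/Flink job.
        self._submit = submit
        self._clean_threshold = clean_threshold

    def plan(self, tables: List[TableMetadata], engine: str = "spark") -> List[JobRequest]:
        """Inspect metadata only and delegate cleaning to the engine."""
        requests = []
        for table in tables:
            if table.commits_since_last_clean >= self._clean_threshold:
                request = JobRequest(engine, "clean", table.table_path)
                self._submit(request)  # the heavy work happens elsewhere
                requests.append(request)
        return requests


if __name__ == "__main__":
    submitted: List[JobRequest] = []
    planner = LakeManagerPlanner(submit=submitted.append, clean_threshold=10)
    planner.plan([TableMetadata("/lake/orders", 12), TableMetadata("/lake/users", 3)])
    for job in submitted:
        print(job)
```

Because the planner only reads metadata and emits small job requests, the service itself stays lightweight, and the state it would need to recover after a failure is limited to which requests were planned and submitted.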