Thanks for all your attention.
Sure, we do need to account for high availability in the design.

Also, in my opinion, this lake manager wouldn't turn Hudi into a cloud
database. It is just an official option, something like HoodieDeltaStreamer,
that helps users reduce the effort of maintaining and governing Hudi data.

As for resource and performance concerns, this lake manager should be designed
as a planner/master. For example, the lake manager would call the cleaner APIs
to launch a Spark/Flink job that deletes files meeting certain conditions,
based on table metadata, rather than doing the work itself. That way its own
workload and resource requirements stay much lower. But in general, I agree
that we have to consider failure recovery, high availability, etc.
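To make the planner/master idea a bit more concrete, here is a minimal sketch,
assuming the lake manager only reads a small metadata summary per table and
hands each decision off to an external engine. All names here (TableMetadata,
ActionPlan, plan_actions, LakeManager, submit_job) and the thresholds are
hypothetical, not Hudi APIs; submit_job stands in for launching a Spark/Flink
job through the real cleaner/compaction APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TableMetadata:
    # Hypothetical snapshot of the table metadata the lake manager reads.
    table: str
    commits_since_last_clean: int
    pending_log_files: int


@dataclass(frozen=True)
class ActionPlan:
    # A unit of work the lake manager hands off; it never executes it itself.
    table: str
    action: str  # "clean" or "compaction" in this sketch


def plan_actions(meta: TableMetadata,
                 clean_after_commits: int = 10,
                 compact_after_logs: int = 5) -> List[ActionPlan]:
    """Decide what to schedule purely from metadata (thresholds are hypothetical)."""
    plans: List[ActionPlan] = []
    if meta.commits_since_last_clean >= clean_after_commits:
        plans.append(ActionPlan(meta.table, "clean"))
    if meta.pending_log_files >= compact_after_logs:
        plans.append(ActionPlan(meta.table, "compaction"))
    return plans


class LakeManager:
    """Planner/master: plans actions and delegates execution to an external engine."""

    def __init__(self, submit_job: Callable[[ActionPlan], str]):
        # submit_job stands in for launching a Spark/Flink job; it returns a job id.
        self.submit_job = submit_job

    def run_once(self, tables: List[TableMetadata]) -> List[Tuple[ActionPlan, str]]:
        # One planning round: only lightweight decisions happen here.
        submitted: List[Tuple[ActionPlan, str]] = []
        for meta in tables:
            for plan in plan_actions(meta):
                submitted.append((plan, self.submit_job(plan)))
        return submitted


if __name__ == "__main__":
    launched: List[ActionPlan] = []

    def fake_submit(plan: ActionPlan) -> str:
        # Pretend to launch a job and hand back a job id.
        launched.append(plan)
        return f"job-{len(launched)}"

    manager = LakeManager(fake_submit)
    for plan, job_id in manager.run_once([
        TableMetadata("orders", commits_since_last_clean=12, pending_log_files=2),
        TableMetadata("users", commits_since_last_clean=3, pending_log_files=7),
    ]):
        print(job_id, plan.table, plan.action)
```

In this sketch the manager holds only metadata and job ids, so its own
footprint stays small and it can re-plan from metadata after a restart. Real
high availability would still need more than this, e.g. avoiding duplicate
submissions of jobs that are already in flight.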
 
On 2022/04/19 04:30:22 Simon Su wrote:
> >
> > I agree with what Danny said. IMO, there are two points that should be
> > considered:
> 
> 1. If Lake Manager is designed as a service, we should consider its High
> Availability, Dynamic Expanding/Shrinking, and state consistency.
> 2. How many resources will Lake Manager use to execute those HUDI actions
> such as compaction, clustering, etc.?
> 
