Hey, folks! I feel there's quite a bit of confusion in this thread, so let's try to clear it up: my understanding (please correct me if I'm wrong) is that Lake Manager was referred to as a service in the same sense in which we call compaction, clustering and cleaning *table services*.
So, I'd suggest we be extra careful with familiar terms to avoid adding to the confusion: for all things related to *RPC services* (like Metastore Server) we can call them "servers", and for compaction, clustering and the rest we stick w/ "table services".

If my understanding of the proposal is correct, then I think the proposal is to consolidate knobs and levers for Data Governance, Data Management, etc. w/in a layer called *Lake Manager*, which will be orchestrating already existing table services through a nicely abstracted high-level API.

Regarding adding any new *server* components: given Hudi's *stateless* architecture, where we rely on standalone execution engines (like Spark or Flink) to operate, I don't really see us introducing a server component directly into Hudi's core. Metastore Server, on the other hand, will be a *standalone* component that Hudi (as well as other processes) could rely on to access the metadata.

On Mon, Apr 18, 2022 at 10:07 PM Yue Zhang <zhangyue19921...@apache.org> wrote:

> Thanks for all your attention.
> Sure, we do need to take care of high availability in the design.
>
> Also, in my opinion this lake manager wouldn't turn Hudi into a database
> on the cloud. It is just an official option, something like
> HoodieDeltaStreamer, that helps users reduce maintenance and Hudi data
> governance efforts.
>
> As for resource and performance concerns, this lake manager should be
> designed as a planner/master: for example, the lake manager will call the
> cleaner APIs to launch a (Spark/Flink) execution that deletes files under
> certain conditions based on table metadata, rather than doing the
> work itself, so the workload and resource requirements are much lower.
> But in general, I agree that we have to consider failure recovery, high
> availability, etc.
>
> On 2022/04/19 04:30:22 Simon Su wrote:
> > I agree with what Danny said. IMO, there are two points that should be
> > considered:
> >
> > 1. If Lake Manager is designed as a service, we should consider its
> > High Availability, Dynamic Expanding/Shrinking, and state consistency.
> > 2. How many resources will Lake Manager use to execute Hudi actions
> > such as compaction, clustering, etc.?