Hello Steven,

HadoopCatalog does have many problems, but because the community added it to the QuickStart chapter in the first place, many users have stayed with HadoopCatalog, and switching catalogs carries a huge cost. In addition, Hive even uses HadoopCatalog as an implementation of iceberg-external-table. In other words, HadoopCatalog is heavily used in production environments without the user's knowledge.

Against this background, there are two things we can do:

1. Guide users to replace the catalog implementation.
2. Fix HadoopCatalog.

We chose the second option and received good feedback from our users. I'm proud of the results of our work, as we have solved a large number of user problems.

In addition, based on our latest research, we are confident that we can manage catalogs reliably without relying on distributed locks, regardless of whether the file system supports atomic operations. We already have an initial internal implementation of this in the object store catalog, with good results.

Beyond serving these customers and solving their problems, if a message queuing system like Kafka wants to connect its tiered storage to Iceberg, I think a file-system-based catalog would be its favourite option, because such systems already use files to manage metadata.

I think the idea that a filesystem catalog must have a distributed lock is completely wrong. But in any case, if the community wishes to stop supporting FileSystemCatalog, I will respect the community's choice.

I'm glad to hear from you.

Regards
lisoda

On 2024-07-16 23:18:42, "Steven Wu" <stevenz...@gmail.com> wrote:

Lisoda,

HadoopCatalog has many issues for production usage, as Dan said. It has never been recommended for production. It was widely used in unit test code, which is also slowly moving toward InMemoryCatalog. As the community is aligned behind the REST catalog, it is preferable to limit the work related to HadoopCatalog.
On Sun, Jul 14, 2024 at 11:44 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Again, it's my "vision": if the community wants to maintain and move forward on HadoopCatalog, that's fine (not sure it would be a good idea given the "limitations" of a filesystem-based catalog :)). Let's see what the others are thinking.

Regards
JB

On Mon, Jul 15, 2024 at 8:29 AM lisoda <lis...@yeah.net> wrote:
>
> Okay. I see...
> I'm so sad. :(
> But anyway, thanks for answering all my questions.
>
> On 2024-07-15 14:25:16, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
> >Hi
> >
> >HadoopCatalog is not a "recommended" catalog for production (at least
> >up to now). So, we should consider either moving it to a separate
> >repo (if we have a guarantee that it's going to be maintained, else it
> >doesn't make sense) or removing it to avoid confusion. My take here has
> >been the same (for several months :)): we should favor the REST Catalog
> >API, and users should use a REST Catalog server implementation.
> >
> >Regards
> >JB
> >
> >On Mon, Jul 15, 2024 at 8:13 AM lisoda <lis...@yeah.net> wrote:
> >>
> >> Sir, even if the entire HadoopCatalog can be used without a LockManager,
> >> should we delete it?
> >>
> >> On 2024-07-15 14:08:40, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
> >> >Hi
> >> >
> >> >My understanding is that the lock manager is mostly used by
> >> >HadoopCatalog. The other catalogs rely on a third-party lock
> >> >mechanism: for instance, the JDBC Catalog uses the RDBMS table/row
> >> >locking, and the REST Catalog uses the implementation's lock.
> >> >I would rather remove HadoopCatalog and the lock manager in favor of
> >> >the REST catalog and the implementation's lock mechanism.
> >> >
> >> >Just my $0.01 :)
> >> >
> >> >Regards
> >> >JB
> >> >
> >> >On Fri, Jul 12, 2024 at 7:41 AM lisoda <lis...@yeah.net> wrote:
> >> >>
> >> >> Currently, the only LockManager implementation in iceberg-core is
> >> >> InMemoryLockManager. This PR adds two LockManager implementations,
> >> >> one based on Redis and another based on a REST API.
> >> >> In general, redisLockManager is sufficient for most users and most
> >> >> scenarios. Where Redis cannot meet the user's requirements, the
> >> >> user can provide a REST API service to achieve this function. I
> >> >> believe that, for a long time, these two lock managers will
> >> >> satisfy most customers' needs.
> >> >>
> >> >> If someone could review this PR, that would be great.
> >> >>
> >> >> PR: https://github.com/apache/iceberg/pull/10688
> >> >> SLACK:
> >> >> https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720761992982729