Hello Steven.

HadoopCatalog does have many problems, but because the community included it in 
the QuickStart chapter from the start, many users have stayed with 
HadoopCatalog. Switching catalogs is very costly. In addition, Hive 
even uses HadoopCatalog as an implementation of iceberg-external-table. In 
other words, HadoopCatalog is heavily used in production environments, 
often without the users' knowledge.


Against this background, there are two things we can do:
1. Guide users to replace the catalog implementation.
2. Fix HadoopCatalog.


We chose the second option and received good feedback from our users. I'm proud 
of the results of our work, as we have actually solved a large number of user 
problems.


In addition, based on our latest research, we are confident that we can 
manage catalogs reliably without relying on distributed locks, 
regardless of whether the file system supports atomic operations. We 
have an initial internal implementation in the object store 
catalog, with good results.


Beyond serving these customers and solving their problems, if a message 
queuing system like Kafka wants to integrate its tiered storage with Iceberg, I 
think a file-system-based catalog would be its preferred choice, because such 
systems already use files to manage metadata. I think the idea that a file system 
catalog must have a distributed lock is completely wrong.


But in any case, if the community wishes to stop supporting FileSystemCatalog, 
I will respect the community's choice.


I'm glad to hear from you.


Regards
lisoda











On 2024-07-16 23:18:42, "Steven Wu" <stevenz...@gmail.com> wrote:

Lisoda, HadoopCatalog has many issues for production usage, as Dan said. It 
has never been recommended for production. It was widely used in unit test code, 
which is also slowly moving toward InMemoryCatalog. As the community is aligned 
behind the REST catalog, it is preferable to limit work related to 
HadoopCatalog.


On Sun, Jul 14, 2024 at 11:44 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Again, it's my "vision": if the community wants to maintain and move
forward on HadoopCatalog, that's fine (not sure it would be a good
idea regarding the "limitations" of filesystem based catalog :)).

Let's see what the others are thinking.

Regards
JB

On Mon, Jul 15, 2024 at 8:29 AM lisoda <lis...@yeah.net> wrote:
>
> Okay. I see......
> I'm so sad. :(
> But anyway, thanks for answering all my questions.
>
>
>
>
>
>
> On 2024-07-15 14:25:16, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
> >Hi
> >
> >HadoopCatalog is not a "recommended" catalog for production (at least
> >up to now). So, we should consider either moving it to a separate
> >repo (if we have the guarantee that it's gonna be maintained, else it
> >doesn't make sense) or removing it to avoid confusion. My take here is
> >the same (for several months :)): we should favor the REST Catalog
> >API and users should use a REST Catalog server implementation.
> >
> >Regards
> >JB
> >
> >On Mon, Jul 15, 2024 at 8:13 AM lisoda <lis...@yeah.net> wrote:
> >>
> >> Sir, even if HadoopCatalog as a whole can be used without a LockManager, 
> >> should we still delete it?
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 2024-07-15 14:08:40, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
> >> >Hi
> >> >
> >> >My understanding is that the lock manager is mostly used by the
> >> >HadoopCatalog. The other catalogs rely on a third-party lock
> >> >mechanism: for instance, JDBC Catalog uses the RDBMS table/row
> >> >locking, REST Catalog uses the implementation's own lock.
> >> >I would rather remove HadoopCatalog and the lock manager in favor of
> >> >the REST catalog and the implementation's lock mechanism.
> >> >
> >> >Just my $0.01 :)
> >> >
> >> >Regards
> >> >JB
> >> >
> >> >On Fri, Jul 12, 2024 at 7:41 AM lisoda <lis...@yeah.net> wrote:
> >> >>
> >> >> Currently, the only LockManager implementation in iceberg-core is 
> >> >> InMemoryLockManager. This PR adds two LockManager implementations, 
> >> >> one based on Redis and another based on a REST API.
> >> >> In general, the Redis-based LockManager is sufficient for most 
> >> >> scenarios. Where Redis cannot meet a user's requirements, 
> >> >> the user can provide a REST API service to achieve this 
> >> >> function. I believe these two lock managers 
> >> >> will satisfy most customers' needs for a long time.
> >> >>
> >> >> If someone could review this PR, that would be great.
> >> >>
> >> >> PR: https://github.com/apache/iceberg/pull/10688
> >> >> SLACK: 
> >> >> https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720761992982729
