lisoda, I don't think there is a good way to fix the HadoopCatalog
implementation. That's why we recommend not using it.

In the quickstart, the assumption is that you're using a Hive catalog. The
HadoopCatalog example shows how to add additional catalogs (in this case, a
local one for testing). I don't think it is too misleading, but perhaps we
should change that so people are not confused.

I'll start a thread about deprecating HadoopCatalog and
HadoopTableOperations since the operations are unsafe.

On Wed, Jul 17, 2024 at 2:45 AM lisoda <lis...@yeah.net> wrote:

> Hello steven.
>
> HadoopCatalog does have many problems, but because the community added it
> to the QuickStart chapter in the first place, many users have actually
> stayed with hadoopCatalog. There is a huge cost to switching catalogs. In
> addition, HIVE even uses HadoopCatalog as an implementation of
> iceberg-external-table. In other words, HadoopCatalog is actually heavily
> used in production environments without the user's knowledge.
>
> Against this background, there are two things we can do:
> 1. guide the user to replace the catalog implementation.
> 2. Fix hadoopCatalog.
>
> We chose the second option and received good feedback from our users. I'm
> proud of the results of our work, as we have actually solved a large number
> of user problems.
>
> In addition, based on our latest research, we are confident that we can
> actually manage catalogues reliably without relying on distributed locks,
> regardless of whether the file system supports atomic operations or not. We
> have initially implemented our internal implementation in the object store
> catalog with good results.
>
> In addition to serving these customers and solving their problems, if a
> message queuing system like kafka wants to interface its tiered storage to
> iceberg, I think a file system based catalog would be their favourite
> thing. Because they already use files to manage metadata. I think the idea
> that the filesystem catalog must need a distributed lock is completely
> wrong.
>
> But in any case, if the community wishes to stop supporting
> FileSystemCatalog, I will respect the community's choice.
>
> I'm glad to hear from you.
>
> Regards
> lisoda
>
>
>
>
>
> At 2024-07-16 23:18:42, "Steven Wu" <stevenz...@gmail.com> wrote:
>
> Lisoda, HadoopCatalog has many issues for production usage like Dan said.
> It has never been recommended in production. It was widely used in unit
> test code, which is also slowly moving toward InMemoryCatalog. As the
> community is aligned behind the REST catalog, it is preferable to limit the
> work related to the Hadoop catalog.
>
> On Sun, Jul 14, 2024 at 11:44 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Again, it's my "vision": if the community wants to maintain and move
>> forward on HadoopCatalog, that's fine (not sure it would be a good
>> idea regarding the "limitations" of filesystem based catalog :)).
>>
>> Let's see what the others are thinking.
>>
>> Regards
>> JB
>>
>> On Mon, Jul 15, 2024 at 8:29 AM lisoda <lis...@yeah.net> wrote:
>> >
>> > Okay. I see......
>> > I'm so sad. :(
>> > But anyway, thanks for answering all my questions.
>> >
>> >
>> >
>> >
>> >
>> >
>> > At 2024-07-15 14:25:16, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> > >Hi
>> > >
>> > >HadoopCatalog is not a "recommended" catalog for production (at least
>> > >up to now). So, we should consider either moving it to a separate
>> > >repo (if we have the guarantee that it's gonna be maintained, else it
>> > >doesn't make sense) or removing it to avoid confusion. My take here is
>> > >the same (for several months :)): we should prioritize the REST Catalog
>> > >API and users should use a REST Catalog server implementation.
>> > >
>> > >Regards
>> > >JB
>> > >
>> > >On Mon, Jul 15, 2024 at 8:13 AM lisoda <lis...@yeah.net> wrote:
>> > >>
>> > >> Sir. Even if the entire hadoopCatalog can be used without
>> > >> lockManager, should we delete it?
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> At 2024-07-15 14:08:40, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> > >> >Hi
>> > >> >
>> > >> >My understanding is that lock manager is mostly used on the
>> > >> >HadoopCatalog. The other catalogs rely on a third-party lock
>> > >> >mechanism: for instance, JDBC Catalog uses the RDBMS table/row
>> > >> >locking, REST Catalog uses implementation lock.
>> > >> >I would rather remove HadoopCatalog and the lock manager in favor of
>> > >> >the REST catalog and implementation lock mechanism.
>> > >> >
>> > >> >Just my $0.01 :)
>> > >> >
>> > >> >Regards
>> > >> >JB
>> > >> >
>> > >> >On Fri, Jul 12, 2024 at 7:41 AM lisoda <lis...@yeah.net> wrote:
>> > >> >>
>> > >> >> Currently, the only LockManager implementation in iceberg-core is
>> > >> >> InMemoryLockManager. This PR adds two LockManager implementations, one
>> > >> >> based on Redis and another based on a REST API.
>> > >> >> In general, redisLockManager is sufficient for most scenarios. Where
>> > >> >> Redis cannot meet a user's requirements, the user can provide a REST API
>> > >> >> service to implement the locking. I believe these two lock managers will
>> > >> >> satisfy most users' needs for a long time.
>> > >> >>
>> > >> >> If someone could review this PR, that would be great.
>> > >> >>
>> > >> >> PR: https://github.com/apache/iceberg/pull/10688
>> > >> >> SLACK: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720761992982729
>>
>

-- 
Ryan Blue
Databricks
