I think one of the main questions is whether we want to support locking
strategies moving forward. These were needed by early catalogs that didn't
support atomic operations (HadoopCatalog and GlueCatalog). Now Glue
supports atomic commits, and we have long been discouraging the use of
HadoopCatalog, which is a purely filesystem-based implementation.

One thing to consider is that external locking does solve a few of the
challenges of the filesystem-based approach, but it doesn't help with many
of the other shortcomings of HadoopCatalog, like the inability to
atomically delete or rename a table. (Operations that are very commonly
used in data engineering!)
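For context, the external-locking pattern being discussed roughly amounts to guarding a non-atomic metadata swap with an acquire/release pair. The sketch below is illustrative only: the interface is modeled loosely on Iceberg's `LockManager` (acquire and release keyed by an entity id and an owner id), and the in-memory implementation and commit flow are simplified stand-ins, not the actual HadoopTableOperations code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for an external lock service (Redis, REST, etc.).
interface LockManager {
  boolean acquire(String entityId, String ownerId);
  boolean release(String entityId, String ownerId);
}

class InMemoryLockManager implements LockManager {
  private final Map<String, String> locks = new ConcurrentHashMap<>();

  @Override
  public boolean acquire(String entityId, String ownerId) {
    // putIfAbsent returns null only for the winner of the race
    return locks.putIfAbsent(entityId, ownerId) == null;
  }

  @Override
  public boolean release(String entityId, String ownerId) {
    // remove(key, value) releases only if this owner still holds the lock
    return locks.remove(entityId, ownerId);
  }
}

public class LockedCommit {
  // Guard a non-atomic metadata swap with the external lock:
  // acquire, perform the swap, release in finally.
  static boolean commit(LockManager lm, String tableId, String writerId,
                        Runnable swap) {
    if (!lm.acquire(tableId, writerId)) {
      return false; // another writer holds the lock; caller should retry
    }
    try {
      swap.run(); // e.g. write new metadata, then rename the version pointer
      return true;
    } finally {
      lm.release(tableId, writerId);
    }
  }

  public static void main(String[] args) {
    LockManager lm = new InMemoryLockManager();
    boolean ok = commit(lm, "db.events", "writer-1", () -> {});
    System.out.println(ok); // lock was free, swap ran, lock released
  }
}
```

The lock makes concurrent commits safe against each other, but a delete or rename still touches many files one at a time, which is why those operations remain non-atomic regardless of the lock.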

Maybe we should consider moving the Hadoop* classes into a separate
iceberg-hadoop module, along with the LockManager, to make them work
somewhat better. Personally, I'd prefer deprecating HadoopCatalog and
HadoopTableOperations because of their serious limitations, but moving
them into a separate module seems like a good compromise. It would also
avoid adding dependencies to core, like Redis for lock implementations.

Ryan

On Thu, Jul 11, 2024 at 10:42 PM lisoda <lis...@yeah.net> wrote:

> Currently, the only LockManager implementation in iceberg-core is
> InMemoryLockManager. This PR adds two LockManager implementations, one
> based on Redis and another based on a REST API.
> For most users, the Redis-based lock manager is sufficient to cover most
> scenarios; where Redis cannot meet a user's requirements, the user can
> provide a REST API service to implement this function instead. I believe
> that, for a long time to come, these two lock managers will satisfy most
> customers' needs.
>
> If someone could review this PR, that would be great.
>
> PR: https://github.com/apache/iceberg/pull/10688
> SLACK:
> https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720761992982729
>


-- 
Ryan Blue
Databricks
