Nice to hear, so we can work on it in parallel.

On Mon, Mar 2, 2026 at 8:33 PM Anoop Johnson <[email protected]> wrote:
> > A major challenge with UniForm right now is its limitation regarding
> > Deletion Vectors (DVs). Support for this is critical for many users
> > migrating their workloads.
>
> The reason UniForm v1/v2 blocked DVs is that Iceberg v1/v2 had a
> different positional delete representation than Delta Lake. That
> changed in Iceberg v3, so the upcoming version of UniForm (IcebergCompatV3
> <https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-compat-v3.md>)
> will lift this restriction.
>
> On Mon, Mar 2, 2026 at 10:48 AM Vladislav Sidorovich via dev <
> [email protected]> wrote:
>
>> Hi Anoop,
>>
>> Thanks for the feedback and for raising these important points.
>>
>> Regarding the technical feedback on minimizing the use of internal Delta
>> Kernel classes: I completely agree. Relying on internal APIs like AddFile
>> introduces an unnecessary maintenance burden. My plan is to refactor the
>> code (e.g., transitioning to the Row API) once we have alignment on the
>> core features this PR will support. I will also put together a list of the
>> gaps I've encountered in the Kernel API (such as change detection) so we
>> can file those upstream, as you suggested.
>>
>> As a quick update on the PR's progress: I’ve recently added support for
>> UPDATE and DELETE operations, along with expanded test coverage. At this
>> stage, the PR is roughly at feature parity with the existing tool
>> (excluding VACUUM) but supports newer Delta versions. As outlined in the
>> PR description, the next features on the roadmap are:
>>
>> 1. VACUUM support
>> 2. Deletion Vectors (DVs) support
>> 3. Incremental conversion
>>
>> *Bigger question*. To address your broader question about whether we
>> should consider sunsetting the Delta Lake module in favor of Delta UniForm:
>> based on my experience and observations, there are still compelling reasons
>> to maintain a native Iceberg-driven conversion tool.
>>
>> - *Feature Limitations:* A major challenge with UniForm right now is
>>   its limitation regarding Deletion Vectors (DVs). Support for this is
>>   critical for many users migrating their workloads.
>> - *User Preference:* I've observed that teams looking to migrate to
>>   Iceberg strongly prefer "native" tooling maintained by the technology
>>   they are migrating *to*, rather than relying on the ecosystem they are
>>   trying to move *from*. Having an in-house Iceberg tool gives the
>>   community more control over the migration experience.
>>
>> Let me know your thoughts on the above, particularly regarding the
>> long-term need for a native migration path.
>>
>> Best,
>> Vladislav
>>
>> On Thu, Feb 26, 2026 at 8:07 PM Anoop Johnson <[email protected]> wrote:
>>
>>> Vladislav,
>>>
>>> We should minimize the use of internal Delta Kernel classes as much
>>> as possible. There are no guarantees about the stability of the internal
>>> APIs, and relying on them would be a maintenance burden on the Iceberg
>>> project. For instance, instead of using the internal `AddFile` class, use
>>> the `Row` API with ordinals defined by the scan file schema. I do
>>> recognize that there are some gaps in the Kernel API (you mentioned change
>>> detection): do you have a list? It would be worth filing an issue against
>>> Delta Kernel; some of these, such as providing file changes, might
>>> already be on their roadmap.
>>>
>>> *I have a higher level question to the community:* should we consider
>>> sunsetting the Delta Lake module? Delta Lake's UniForm
>>> <https://docs.delta.io/delta-uniform/> can already generate Iceberg
>>> metadata: it is incremental, and already handles several features such as
>>> column mapping. Do we need to duplicate all of that work? Obviously it is
>>> better to have less code and fewer components to maintain.
>>>
>>> Best,
>>> Anoop
>>>
>>> Disclosure: I also work on Delta as part of my day job.
>>>
>>> On Wed, Feb 25, 2026 at 1:44 PM Vladislav Sidorovich <
>>> [email protected]> wrote:
>>>
>>>> Hi Anoop,
>>>>
>>>> Thanks a lot for the initial review.
>>>>
>>>> Data correctness guards:
>>>> 1. I will add support for the `Remove` action soon; work on the PR is
>>>> in progress.
>>>> 2. Sure, let's reject tables that use the `column mapping` feature for
>>>> now, for safety. Later I will try to add support for this feature as well.
>>>>
>>>> Yes, the PR depends on the `*internal*` API of the delta-kernel. I do
>>>> not see a simple way to replace it with the public API. As an option, I
>>>> can replace these classes with our own `in-house` classes that rely on
>>>> the Delta protocol spec. That would be safe at runtime, but it would be
>>>> additional code that we would need to maintain.
>>>>
>>>> What do you think about me continuing to work with the `*internal*`
>>>> Delta API for now and refactoring this logic before merging the PR, once
>>>> we agree on a solution?
>>>>
>>>> On Tue, Feb 24, 2026 at 5:29 AM Anoop Johnson <[email protected]> wrote:
>>>>
>>>>> Hi, Vladislav -
>>>>>
>>>>> I've done an initial review of the PR
>>>>> <https://github.com/apache/iceberg/pull/15407>. Moving to the Delta
>>>>> kernel is the right direction, so thank you for doing this. Here's a
>>>>> summary of my initial feedback (full details are in the PR):
>>>>>
>>>>> Data correctness guards:
>>>>> 1. If we encounter `Remove` actions, the conversion should fail fast
>>>>> rather than silently skip them. Otherwise tables with DML will produce
>>>>> duplicate rows in the Iceberg table.
>>>>> 2. Tables with column mapping enabled will produce silent data
>>>>> corruption because the Parquet files will have physical column names that
>>>>> don't match the logical schema. We should validate this and reject until
>>>>> column mapping support is added (which can be done as a separate PR).
>>>>>
>>>>> The PR relies heavily on io.delta.kernel.internal.* classes, which can
>>>>> be fragile.
>>>>> We should consider replacing them with the public kernel APIs.
>>>>>
>>>>> Best,
>>>>> Anoop
>>>>>
>>>>> On Mon, Feb 23, 2026 at 12:29 AM Vladislav Sidorovich via dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Iceberg Community,
>>>>>>
>>>>>> I recently opened a PR to update the existing Delta Lake to Iceberg
>>>>>> migration functionality to support recent Delta Lake table versions
>>>>>> (read: 3, write: 7). I would appreciate it if anyone could take a look
>>>>>> and share thoughts on the architecture and initial implementation.
>>>>>>
>>>>>> *PR Link:* https://github.com/apache/iceberg/pull/15407
>>>>>>
>>>>>> The main motivation for sharing this now is to get some early
>>>>>> feedback from the community on the approach and the initial
>>>>>> implementation.
>>>>>>
>>>>>> To make reviewing easier, this PR doesn't remove or overwrite the old
>>>>>> logic. Instead, I’ve added a new interface implementation utilizing the
>>>>>> *Delta Lake Kernel library* (replacing the deprecated Delta Lake
>>>>>> standalone library). This side-by-side approach allows for easier
>>>>>> comparison and shouldn't introduce any issues with current usage
>>>>>> scenarios.
>>>>>>
>>>>>> *Current PR Scope:*
>>>>>>
>>>>>> - Maintains support for the existing migration interface.
>>>>>> - Migrates the underlying engine to the Delta Lake Kernel library.
>>>>>> - Contains the basic migration flow.
>>>>>> - Successfully converts all data types, table schemas, and
>>>>>>   partition specs.
>>>>>> - Currently supports INSERT operations only (Delta Lake Add
>>>>>>   action).
>>>>>> - *Testing:* Includes unit tests for all supported data types
>>>>>>   (including complex arrays and structures) and integration tests for
>>>>>>   insert-only scenarios using Spark 3.5.
>>>>>>
>>>>>> *Future Steps (Next PRs):*
>>>>>>
>>>>>> Once we align on this foundation, I plan to follow up with:
>>>>>>
>>>>>> - Adding support for UPDATE and DELETE (Delta Lake Remove action).
>>>>>> - Supporting all remaining Delta Lake actions.
>>>>>> - Handling edge cases for partitions and generated columns.
>>>>>> - Adding Schema Evolution support.
>>>>>> - Adding Deletion Vector (DV) support.
>>>>>> - Enabling Incremental Conversion (from/to specific Delta
>>>>>>   versions).
>>>>>> - Adding all tables from the Delta golden tables for robust
>>>>>>   testing. *(Note: The current integration test will be updated for
>>>>>>   newer Delta Lake versions once the old standalone solution is fully
>>>>>>   deprecated/deleted).*
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Vladislav Sidorovich
>>>>>>
>>>>>> Feedback: *go/feedback-for-vladislav
>>>>>> <https://goto.google.com/feedback-for-vladislav>*
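
For readers following the thread, the two data correctness guards from the initial review (fail fast on `Remove` actions, reject tables with column mapping enabled) can be illustrated with a minimal sketch. It is not the PR's code and does not use the Delta Kernel API: it reads raw Delta log action lines as JSON, relying only on the action names (`add`, `remove`, `metaData`) and the `delta.columnMapping.mode` table property from the Delta protocol. The names `check_commit` and `UnsupportedDeltaFeature` are hypothetical.

```python
import json


class UnsupportedDeltaFeature(Exception):
    """Hypothetical error: the log uses something the converter cannot handle."""


def check_commit(lines):
    """Validate one Delta commit, given as an iterable of JSON action lines.

    Returns the paths of the added data files. Raises UnsupportedDeltaFeature
    instead of silently skipping an action the converter does not support.
    """
    added = []
    for line in lines:
        action = json.loads(line)

        # Guard 1: skipping a remove would keep the removed file's rows
        # in the converted table, so stop instead.
        if "remove" in action:
            path = action["remove"].get("path")
            raise UnsupportedDeltaFeature(f"remove action for {path}")

        # Guard 2: with column mapping, physical Parquet column names differ
        # from the logical schema, so reject any mode other than "none".
        if "metaData" in action:
            config = action["metaData"].get("configuration") or {}
            mode = config.get("delta.columnMapping.mode", "none")
            if mode != "none":
                raise UnsupportedDeltaFeature(f"column mapping mode '{mode}'")

        if "add" in action:
            added.append(action["add"]["path"])
    return added


if __name__ == "__main__":
    commit = [
        '{"metaData": {"id": "t1", "configuration": {}}}',
        '{"add": {"path": "part-0000.parquet", "dataChange": true}}',
    ]
    print(check_commit(commit))
```

In a Kernel-based implementation the same checks would presumably run against whatever action or row representation the public API exposes; the point of the sketch is only that both conditions raise an error rather than being skipped.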
