> A major challenge with UniForm right now is its limitation regarding
> Deletion Vectors (DVs). Support for this is critical for many users
> migrating their workloads.

UniForm v1/v2 blocked DVs because Iceberg v1/v2 represented positional
deletes differently from Delta Lake. That changed in Iceberg v3, so the
upcoming version of UniForm (IcebergCompatV3
<https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-compat-v3.md>)
will lift this restriction.
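
As a rough illustration of the representation gap (a sketch only: the
function names are hypothetical, and a plain Python set of row positions
stands in for the compressed bitmap a real deletion vector uses), Iceberg v2
position deletes can be thought of as (file_path, pos) rows, while a deletion
vector is a per-data-file set of deleted row positions:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for an Iceberg v2 position-delete row: (file_path, pos).
PositionDelete = Tuple[str, int]


def to_deletion_vectors(deletes: Iterable[PositionDelete]) -> Dict[str, Set[int]]:
    """Group (file_path, pos) rows into one set of deleted positions per data file."""
    vectors: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in deletes:
        vectors[file_path].add(pos)
    return dict(vectors)


def to_position_deletes(vectors: Dict[str, Set[int]]) -> List[PositionDelete]:
    """Flatten per-file deletion vectors back into sorted (file_path, pos) rows."""
    return [(path, pos) for path in sorted(vectors) for pos in sorted(vectors[path])]


# Assumed sample data, for illustration only.
deletes = [("a.parquet", 3), ("b.parquet", 0), ("a.parquet", 7)]
vectors = to_deletion_vectors(deletes)
```

The two shapes carry the same information, but they are stored and attached
to data files differently, which is the kind of mismatch a converter has to
bridge when the target format only offers the row-based form.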

On Mon, Mar 2, 2026 at 10:48 AM Vladislav Sidorovich via dev <
[email protected]> wrote:

> Hi Anoop,
>
> Thanks for the feedback and for raising these important points.
>
> Regarding the technical feedback on minimizing the use of internal Delta
> Kernel classes: I completely agree. Relying on internal APIs like AddFile
> introduces an unnecessary maintenance burden. My plan is to refactor the
> code (e.g., transitioning to the Row API) once we have alignment on the
> core features this PR will support. I will also put together a list of the
> gaps I've encountered in the Kernel API (such as change detection) so we
> can file those upstream, as you suggested.
>
> As a quick update on the PR's progress: I’ve recently added support for
> UPDATE and DELETE operations, along with expanded test coverage. At this
> stage, the PR is roughly at feature parity with the existing tool
> (excluding VACUUM) but supports newer Delta versions. As outlined in the
> PR description, the next features on the roadmap are:
>
>    1. VACUUM support
>    2. Deletion Vectors (DVs) support
>    3. Incremental conversion
>
>
> *Bigger question*. To address your broader question about whether we
> should consider sunsetting the Delta Lake module in favor of Delta UniForm:
> based on my experience and observations, there are still compelling reasons
> to maintain a native Iceberg-driven conversion tool.
>
>    - *Feature Limitations:* A major challenge with UniForm right now is its
>    limitation regarding Deletion Vectors (DVs). Support for this is critical
>    for many users migrating their workloads.
>    - *User Preference:* I've observed that teams looking to migrate to
>    Iceberg strongly prefer "native" tooling maintained by the technology they
>    are migrating *to*, rather than relying on the ecosystem they are
>    trying to move *from*. Having an in-house Iceberg tool gives the
>    community more control over the migration experience.
>
> Let me know your thoughts on the above, particularly regarding the
> long-term need for a native migration path.
>
> Best, Vladislav
>
> On Thu, Feb 26, 2026 at 8:07 PM Anoop Johnson <[email protected]> wrote:
>
>> Vladislav,
>>
>> We should minimize the usage of internal Delta kernel classes as much
>> as possible. There are no guarantees about the stability of the internal
>> APIs, and it will be a maintenance burden on the Iceberg project. For
>> instance, instead of using the internal `AddFile` class, use the `Row` API
>> with ordinals defined by the scan file schema. I do recognize that there
>> are some gaps in the kernel API (you mentioned change detection): do you
>> have a list? It would be worth filing an issue against Delta kernel; it is
>> possible that some of these, like providing file changes, are on their
>> roadmap.
>>
>> *I have a higher-level question for the community:* should we consider
>> sunsetting the Delta Lake module? Delta Lake's UniForm
>> <https://docs.delta.io/delta-uniform/> can already generate Iceberg
>> metadata: it is incremental, and already handles several features such as
>> column mapping. Do we need to duplicate all of that work? Obviously it is
>> better to have less code and fewer components to maintain.
>>
>> Best,
>> Anoop
>>
>> Disclosure: I work on Delta also as part of my day job.
>>
>>
>> On Wed, Feb 25, 2026 at 1:44 PM Vladislav Sidorovich <
>> [email protected]> wrote:
>>
>>> Hi Anoop,
>>>
>>> Thanks a lot for the initial review.
>>>
>>> Data correctness guards:
>>> 1. I will add support for the Remove action soon; work on the PR is in
>>> progress.
>>> 2. Sure, let's reject the `column mapping` feature for now, for safety.
>>> Later I will try to add support for this feature as well.
>>>
>>>
>>> Yes, the PR depends on the `*internal*` API of delta-kernel. I do not
>>> see a simple way to replace it with the public API. As an option, I can
>>> replace these classes with our own `in-house` classes that rely on the
>>> Delta protocol spec. That would be safe at runtime, but it would be
>>> additional code that we would need to support.
>>>
>>> What do you think about me continuing to work with the `*internal*` Delta
>>> API for now and refactoring this logic before merging the PR, once we
>>> agree on a solution?
>>>
>>>
>>> On Tue, Feb 24, 2026 at 5:29 AM Anoop Johnson <[email protected]> wrote:
>>>
>>>> Hi, Vladislav -
>>>>
>>>> I've done an initial review of the PR
>>>> <https://github.com/apache/iceberg/pull/15407>. Moving to the Delta
>>>> kernel is the right direction, so thank you for doing this. Here's a
>>>> summary of my initial feedback (full details are in the PR):
>>>>
>>>> Data correctness guards:
>>>> 1. If we encounter `Remove` actions, we should fail fast rather than
>>>> silently skip them. Otherwise tables with DML will produce duplicate rows
>>>> in the Iceberg table.
>>>> 2. Tables with column mapping enabled will produce silent data
>>>> corruption because the Parquet files will have physical column names that
>>>> don't match the logical schema. We should validate this and reject until
>>>> column mapping support is added (which can be done as a separate PR).
>>>>
>>>> The PR relies heavily on io.delta.kernel.internal.* classes, which can
>>>> be fragile. We should consider replacing them with the public kernel APIs.
>>>>
>>>> Best,
>>>> Anoop
>>>>
>>>>
>>>> On Mon, Feb 23, 2026 at 12:29 AM Vladislav Sidorovich via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Iceberg Community,
>>>>>
>>>>> I recently opened a PR to update the existing Delta Lake to Iceberg
>>>>> migration functionality to support recent Delta Lake table versions (read:
>>>>> 3, write: 7). I would appreciate it if anyone could take a look and share
>>>>> thoughts on the architecture and initial implementation.
>>>>>
>>>>> *PR Link:* https://github.com/apache/iceberg/pull/15407
>>>>>
>>>>> The main motivation for sharing this now is to get some early feedback
>>>>> from the community on the approach and the initial implementation.
>>>>>
>>>>> To make reviewing easier, this PR doesn't remove or overwrite the old
>>>>> logic. Instead, I’ve added a new interface implementation utilizing the
>>>>> *Delta Lake Kernel library* (replacing the deprecated Delta Lake standalone
>>>>> library). This side-by-side approach allows for easier comparison and
>>>>> shouldn't introduce any issues with current usage scenarios.
>>>>>
>>>>>
>>>>> *Current PR Scope:*
>>>>>
>>>>>    - Maintains support for the existing migration interface.
>>>>>    - Migrates the underlying engine to the Delta Lake Kernel library.
>>>>>    - Contains the basic migration flow.
>>>>>    - Successfully converts all data types, table schemas, and
>>>>>    partition specs.
>>>>>    - Currently supports INSERT operations only (Delta Lake Add
>>>>>    action).
>>>>>    - *Testing:* Includes unit tests for all supported data types
>>>>>    (including complex arrays and structures) and integration tests for
>>>>>    insert-only scenarios using Spark 3.5.
>>>>>
>>>>> *Future Steps (Next PRs):*
>>>>>
>>>>> Once we align on this foundation, I plan to follow up with:
>>>>>
>>>>>    - Adding support for UPDATE and DELETE (Delta Lake Remove action).
>>>>>    - Supporting all remaining Delta Lake actions.
>>>>>    - Handling edge cases for partitions and generated columns.
>>>>>    - Adding Schema Evolution support.
>>>>>    - Adding Deletion Vector (DV) support.
>>>>>    - Enabling Incremental Conversion (from/to specific Delta
>>>>>    versions).
>>>>>    - Adding all tables from the Delta golden tables for robust
>>>>>    testing. *(Note: The current integration test will be updated for
>>>>>    newer Delta Lake versions once the old standalone solution is fully
>>>>>    deprecated/deleted).*
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Vladislav Sidorovich
>>>>>
>>>>> Feedback: *go/feedback-for-vladislav
>>>>> <https://goto.google.com/feedback-for-vladislav> *
>>>
>
