Nice to hear, so we can work on it in parallel.

On Mon, Mar 2, 2026 at 8:33 PM Anoop Johnson <[email protected]> wrote:

> > A major challenge with UniForm right now is its limitation regarding
> Deletion Vectors (DVs). Support for this is critical for many users
> migrating their workloads.
>
> UniForm v1/v2 blocked DVs because Iceberg v1/v2 had a different
> positional delete representation than Delta Lake. That changed in
> Iceberg v3, so the upcoming version of UniForm (IcebergCompatV3
> <https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-compat-v3.md>)
> will lift this restriction.
>
> On Mon, Mar 2, 2026 at 10:48 AM Vladislav Sidorovich via dev <
> [email protected]> wrote:
>
>> Hi Anoop,
>>
>> Thanks for the feedback and for raising these important points.
>>
>> Regarding the technical feedback on minimizing the use of internal Delta
>> Kernel classes: I completely agree. Relying on internal APIs like AddFile
>> introduces an unnecessary maintenance burden. My plan is to refactor the
>> code (e.g., transitioning to the Row API) once we have alignment on the
>> core features this PR will support. I will also put together a list of the
>> gaps I've encountered in the Kernel API (such as change detection) so we
>> can file those upstream, as you suggested.
>>
>> As a quick update on the PR's progress: I’ve recently added support for
>> UPDATE and DELETE operations, along with expanded test coverage. At this
>> stage, the PR is roughly at feature parity with the existing tool
>> (excluding VACUUM) but supports newer Delta versions. As outlined in the
>> PR description, the next features on the roadmap are:
>>
>>    1. VACUUM support
>>    2. Deletion Vectors (DVs) support
>>    3. Incremental conversion
>>
>>
>> *Bigger question*. To address your broader question about whether we
>> should consider sunsetting the Delta Lake module in favor of Delta UniForm:
>> based on my experience and observations, there are still compelling reasons
>> to maintain a native Iceberg-driven conversion tool.
>>
>>    - *Feature Limitations:* A major challenge with UniForm right now is
>>    its limitation regarding Deletion Vectors (DVs). Support for this is
>>    critical for many users migrating their workloads.
>>    - *User Preference:* I've observed that teams looking to migrate to
>>    Iceberg strongly prefer "native" tooling maintained by the technology they
>>    are migrating *to*, rather than relying on the ecosystem they are
>>    trying to move *from*. Having an in-house Iceberg tool gives the
>>    community more control over the migration experience.
>>
>> Let me know your thoughts on the above, particularly regarding the
>> long-term need for a native migration path.
>>
>> Best, Vladislav
>>
>> On Thu, Feb 26, 2026 at 8:07 PM Anoop Johnson <[email protected]> wrote:
>>
>>> Vladislav,
>>>
>>> We should minimize the usage of internal Delta Kernel classes as much
>>> as possible. There are no guarantees about the stability of the internal
>>> APIs, and relying on them will be a maintenance burden on the Iceberg
>>> project. For instance, instead of using the internal `AddFile` class, use
>>> the `Row` API with ordinals defined by the scan file schema. I do
>>> recognize that there are some gaps in the Kernel API (you mentioned change
>>> detection): do you have a list? It would be worth filing an issue against
>>> Delta Kernel; it is possible that some of these, like providing file
>>> changes, are on their roadmap.
>>>
>>> *I have a higher-level question for the community:* should we consider
>>> sunsetting the Delta Lake module? Delta Lake's UniForm
>>> <https://docs.delta.io/delta-uniform/> can already generate Iceberg
>>> metadata: it is incremental, and already handles several features such as
>>> column mapping. Do we need to duplicate all of that work? Obviously it is
>>> better to have less code and fewer components to maintain.
>>>
>>> Best,
>>> Anoop
>>>
>>> Disclosure: I work on Delta also as part of my day job.
>>>
>>>
>>> On Wed, Feb 25, 2026 at 1:44 PM Vladislav Sidorovich <
>>> [email protected]> wrote:
>>>
>>>> Hi Anoop,
>>>>
>>>> Thanks a lot for the initial review.
>>>>
>>>> Data correctness guards:
>>>> 1. I will add support for the Remove action soon; work on the PR is in
>>>> progress.
>>>> 2. Sure, let's reject tables with the `column mapping` feature for now,
>>>> for safety. Later I will try to add support for this feature as well.
>>>>
>>>>
>>>> Yes, the PR depends on the `*internal*` API of delta-kernel. I do not
>>>> see a simple way to replace it with the public API. As an option, I can
>>>> replace these classes with our own `in-house` classes that rely on the
>>>> Delta protocol spec. That would be safe at runtime, but it would be
>>>> additional code that we would need to maintain.
>>>>
>>>> What do you think about me continuing to work with the `*internal*` Delta
>>>> API for now and refactoring this logic before merging the PR, once we
>>>> agree on a solution?
>>>>
>>>>
>>>> On Tue, Feb 24, 2026 at 5:29 AM Anoop Johnson <[email protected]> wrote:
>>>>
>>>>> Hi, Vladislav -
>>>>>
>>>>> I've done an initial review of the PR
>>>>> <https://github.com/apache/iceberg/pull/15407>. Moving to the Delta
>>>>> kernel is the right direction, so thank you for doing this. Here's a
>>>>> summary of my initial feedback (full details are in the PR):
>>>>>
>>>>> Data correctness guards:
>>>>> 1. If we encounter `Remove` actions, we should fail fast rather than
>>>>> silently skip them. Otherwise tables with DML will produce duplicate rows in
>>>>> the Iceberg table.
>>>>> 2. Tables with column mapping enabled will produce silent data
>>>>> corruption because the Parquet files will have physical column names that
>>>>> don't match the logical schema. We should validate this and reject such
>>>>> tables until column mapping support is added (which can be done as a
>>>>> separate PR).
>>>>>
>>>>> The PR relies heavily on io.delta.kernel.internal.* classes, which can
>>>>> be fragile. We should consider replacing them with the public kernel APIs.
>>>>>
>>>>> Best,
>>>>> Anoop
>>>>>
>>>>>
>>>>> On Mon, Feb 23, 2026 at 12:29 AM Vladislav Sidorovich via dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Iceberg Community,
>>>>>>
>>>>>> I recently opened a PR to update the existing Delta Lake to Iceberg
>>>>>> migration functionality to support recent Delta Lake table versions
>>>>>> (read: 3, write: 7). I would appreciate it if anyone could take a look
>>>>>> and share thoughts on the architecture and initial implementation.
>>>>>>
>>>>>> *PR Link:* https://github.com/apache/iceberg/pull/15407
>>>>>>
>>>>>> The main motivation for sharing this now is to get some early
>>>>>> feedback from the community on the approach and the initial
>>>>>> implementation.
>>>>>>
>>>>>> To make reviewing easier, this PR doesn't remove or overwrite the old
>>>>>> logic. Instead, I’ve added a new interface implementation utilizing the
>>>>>> *Delta Lake Kernel library* (replacing the deprecated Delta Lake standalone
>>>>>> library). This side-by-side approach allows for easier comparison and
>>>>>> shouldn't introduce any issues with current usage scenarios.
>>>>>>
>>>>>>
>>>>>> *Current PR Scope:*
>>>>>>
>>>>>>    - Maintains support for the existing migration interface.
>>>>>>    - Migrates the underlying engine to the Delta Lake Kernel library.
>>>>>>    - Contains the basic migration flow.
>>>>>>    - Successfully converts all data types, table schemas, and
>>>>>>    partition specs.
>>>>>>    - Currently supports INSERT operations only (Delta Lake Add
>>>>>>    action).
>>>>>>    - *Testing:* Includes unit tests for all supported data types
>>>>>>    (including complex arrays and structures) and integration tests for
>>>>>>    insert-only scenarios using Spark 3.5.
>>>>>>
>>>>>> *Future Steps (Next PRs):*
>>>>>>
>>>>>> Once we align on this foundation, I plan to follow up with:
>>>>>>
>>>>>>    - Adding support for UPDATE and DELETE (Delta Lake Remove action).
>>>>>>    - Supporting all remaining Delta Lake actions.
>>>>>>    - Handling edge cases for partitions and generated columns.
>>>>>>    - Adding Schema Evolution support.
>>>>>>    - Adding Deletion Vector (DV) support.
>>>>>>    - Enabling Incremental Conversion (from/to specific Delta
>>>>>>    versions).
>>>>>>    - Adding all tables from the Delta golden tables for robust
>>>>>>    testing. *(Note: The current integration test will be updated for
>>>>>>    newer Delta Lake versions once the old standalone solution is fully
>>>>>>    deprecated/deleted).*
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Vladislav Sidorovich
>>>>>>
>>>>>> Feedback: *go/feedback-for-vladislav
>>>>>> <https://goto.google.com/feedback-for-vladislav> *
>>>>>>
>>>>>>
>>>>>>
>>>>
>>

-- 
Best regards,
Vladislav Sidorovich

Feedback: *go/feedback-for-vladislav *