This is an automated email from the ASF dual-hosted git repository. ashvin pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-xtable.git
The following commit(s) were added to refs/heads/main by this push: new dc209962 Add change proposal for conversion of deletion vectors dc209962 is described below commit dc209962c8f255be3b2afa5e213e5ca7735a3a14 Author: Ashvin Agrawal <ash...@apache.org> AuthorDate: Tue Jan 28 09:15:12 2025 -0800 Add change proposal for conversion of deletion vectors --- rfc/rfc-2/2 - Deletion Info Conversion.md | 227 ++++++++++++++++++++++++++++++ 1 file changed, 227 insertions(+) diff --git a/rfc/rfc-2/2 - Deletion Info Conversion.md b/rfc/rfc-2/2 - Deletion Info Conversion.md new file mode 100644 index 00000000..c18724b7 --- /dev/null +++ b/rfc/rfc-2/2 - Deletion Info Conversion.md @@ -0,0 +1,227 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +# RFC-2: Support for conversion of Deletion Vectors from Delta Lake to Iceberg format + +## Proposers + +- @ashvin + +## Status + +GH Feature Request: https://github.com/apache/incubator-xtable/issues/339 + +## Abstract +Deletion vectors are one of the most popular features in table formats, allowing logical deletion of records, by marking +them as deleted, without requiring rewriting of the data files to remove them physically. This document provides a +high-level proposal of deletion vector conversion in XTable, focusing on the current implementation of deletion vectors +in Delta Lake and Iceberg. + +## Background +Deletion vectors feature in table formats like **_Delta Lake_** and **_Apache Iceberg_** are designed to enhance data +management efficiency. These vectors allow for the marking of deleted rows without physically removing them from the +data files, thereby optimizing operations such as `DELETE`, `UPDATE`, and `MERGE`. By maintaining a separate files of +deletions, deletion vectors enable faster query performance and reduce the need for costly data file rewrites. This +approach aligns with the merge-on-read paradigm, where changes are merged during read operations rather than write +operations, ensuring minimal disruption to ongoing data ingestion processes. + +The main types of delete vectors are _**equality deletes**_ and _**position deletes**_. + +#### Equality deletes +Equality deletes allow for row-level deletions based on specific column values rather than their positions within a +file. This means that records in data files matching certain criteria are considered deleted without altering the +original data files. This deletion vector type is useful when identifying and storing the exact position of the row(s) +is expensive, but the content that identifies the row is clear. This feature is particularly useful in streaming use +cases where updates are frequent. Currently, Iceberg supports equality deletes, however, there is a proposal to replace +it in future versions. + +#### Position deletes +Position deletes, on the other hand, specify the ordinals of records within a data file that should be considered +deleted. This method relies on identifying the exact position within the data file, thereby providing a more precise and +direct approach to managing row-level deletions. Position deletes can be further categorized into two representations: +simple table representations and compressed bitmap representations. + +**_Simple table representations_** involve a straightforward structure typically stored in Parquet format, containing +columns like `file_path` and `pos`. The `file_path` indicates the path of the data file, while `pos` specifies the +ordinal of the record to be deleted. These files can optionally include the row data for equality deletes, aiding in +identifying the exact change which is particularly useful in CDC applications. + +A **_compressed bitmap representations_**, leverage a compressed bitmap to represent deletions. This method allows for a +more compact and optimized storage of delete vectors, which in turn saves memory and network bandwidth. The +_RoaringBitmap_ is a popular choice for this representation. While the bitmaps are typically stored in separate files, +in cases where the bitmaps are small because the number of deleted records is low, they can be embedded within the +commit logs to optimize the number of network round trips required to load all the files. + +#### Row level deletes in Delta Lake +Currently, Delta Lake supports RoaringBitmaps for deletion vectors, while Iceberg supports simple table representations. +The java libraries of Delta Lake and Iceberg provide APIs to read and write between these representations. Delta Lake +provides support for deletion vectors for `DELETE` operations since versions 2.4, while `UPDATE` and `MERGE` commands +default to the copy-on-write mode. + +In Delta Lake, the implementation of deletion vectors involves `actions` in the commit logs, namely `AddFile` and +`RemoveFile`. When a DML command is executed, it identifies the rows to be modified and generates a deletion vector, +typically in a compressed bitmap format like RoaringBitmap. The generated deletion vector is stored directly +in the commit logs `json` files (when the number of deletions is minimal). When stored in separate files, the files +paths are encoded and referenced in the json files. With each DML operation, the commit log records the removal of the +original data file and its addition with updated details including the newly added deletion vector. + +Example: +``` +Type Path DV Path +RemoveFile file_a.parquet NULL +AddFile file_a.parquet file_a_dv_1.bin +``` + +If a file with an existing deletion vector is updated again, a new deletion vector is created, merging the previous and +new deletions. The commit log records the removal of the old deletion vector information and the addition of the new +one. + +Example: +``` +Type Path DV Path +RemoveFile file_a.parquet file_a_dv_1.bin +AddFile file_a.parquet file_a_dv_2.bin +``` + +When a compaction operation rewrites records into new files, the deletion vectors associated with data files that are +removed are also marked as deleted. The commit log records the removal of the old file and deletion vector and +the addition of the new compacted file. + +Example: +``` +Type Path DV Path +RemoveFile file_a.parquet file_a_dv_2.bin +AddFile file_d.parquet NULL +``` + +> Note: XTable would generate deletion vectors in the target table format corresponding to the source table format. +> While compaction in the source table will result in the removal of deletion vectors, XTable currently does not remove +> the generated deletion files in the target table. Over time, if the deletion vectors are not removed in the target +> table, they could significantly increase the number of files and storage size of target table. This is a known issue +> and a mitigation strategy is currently being discussed and tracked by this +> [task](https://github.com/apache/incubator-xtable/issues/655). + +#### Row level deletes in Iceberg +Iceberg (v2), on the other hand, supports equality deletes and simple table representations for position based row level +deletes. This proposal focuses on positional deletes. Similar to Delta Lake, Iceberg also maintains a separate file for +recording ordinals of deleted records. The delete entries are typically stored in parquet files with `file_path` and +`pos` as mandatory columns. + +Example: +``` +file_path, pos +/test_table/part-00000-3c1805f0c478-c000.parquet 36 +/test_table/part-00000-3c1805f0c478-c000.parquet 35 +/test_table/part-00000-446d734f04e8-c000.parquet 7 +/test_table/part-00000-446d734f04e8-c000.parquet 9 +``` + +However, unlike Delta Lake, the Iceberg commit log does not embed the metadata of deletion +files withing the metadata of data files. Instead, it records addition of a deletion file as an independent event in +a separate manifest file. The resulting manifest list file uses the same structure as that for data files, but with a +few more properties. + +Example: +``` + "added-position-delete-files": "1", + "added-delete-files": "1", + "added-position-deletes": "12", + "total-delete-files": "3", + "total-position-deletes": "27", + "total-equality-deletes": "0" +``` + + +## Implementation of the conversion of Deletion Vectors in XTable +On a high level, the implementation of deletion vector conversion in XTable involves the following steps. XTable must +first locate the deletion vector, which can be stored either inline within the commit logs or can be stored in a +separate deletion vector file whose path is recorded in the commit logs. Once located, XTable can utilize the utility +methods provided by table format libraries to stream the ordinals of the records affected by the deletion operations. + +Unlike currently supported conversion of metadata in XTable, deletion vectors require special handling. While all +table metadata is present in commit logs and manifest files, deletion vectors can be stored in separate files. The size +of the deletion vectors is proportional to the number of records deleted, which can be significant in large tables. +The proposed implementation aims to handle this efficiently by streaming the records from the sources to the targets. + +**New Class: `InternalDeletionVector`** + +To improve the functionality, a new class to manage the internal in-memory representation of deletion vectors is added. +This is similar to creating internal representation for data files, schema, and partitioning spec. For XTable, adding +a new deletion vector file is same as adding a new data file. Moreover, the deletion vector representation is an +extension of the data file representation, with additional fields to capture the deletion vector metadata. + +The following metadata is relevant for deletion vectors: +- `size`: Size of the deletion vector. +- `countRecordsDeleted`: Number of records deleted. +- `sourceDeletionVectorFilePath`: Path of the deletion vector file (optional as deletion vectors can be stored inline). +> The fields above are inherited from InternalDataFile. The fields below are new fields specific to deletion vectors. +- `dataFilePath`: Path of the data file associated with the deletion vector. +- `offset`: Offset of the deletion vector within the file (optional as inline deletion vectors do not have an offset). +- `ordinalStream`: Stream of ordinals of the records deleted. + +**Modifications in `DataFilesDiff`** + +Currently, XTable only converts the addition and removal of data files. To support deletion vectors, XTable would need +to detect new deletion vectors in the source table and add it to the conversion pipeline similar to data files. +Handling new deletion vectors does not require any changes in the `DataFilesDiff` class. The class encapsulates the +addition and removal of data files, and the addition of deletion vectors is treated as a new data file addition. +Which means, if a source produces deletion vectors, going forward, the `filesAdded` field in `DataFilesDiff` will +not only contain data files but also deletion vectors. The consumers which cannot handle instance of +`InternalDeletionVector` in `filesAdded` can either ignore them or raise an exception. + +**Updates in `DeltaActionsConverter`** +The converter class is responsible for converting Delta Lake `actions` into internal representations, resulting in +`TableChange` objects. The converter is updated to extract deletion vectors from the `AddFile` actions. Currently, +the converters read content from source metadata files and then close the file handles. However, this will be different +for deletion vectors. To avoid the potential memory explosion, the converters will create a provider of deletion vectors +ordinals, which will be used to stream the ordinals of the records deleted to the targets. + +**Changes in `DeltaConversionSource`** +As described above, the delta commit logs adds data file entries in commit log each time delete vectors are updated. +This needs special handling in the delta source, to avoid duplication of data file actions. + +**New class `IcebergDeleteVectorConverter`** + +The purpose of this class is to convert internal deletion vectors to Iceberg positional delete files. The class will +contain methods matching other Iceberg converters, for instance `toIceberg`. However, similar to unique handling of +deletion vectors in `DeltaActionsConverter`, this class will have a unique implementation. This class will generate +data files for deletion vectors, which will be added to the target table. The cost of conversion of deletion vectors +is proportional to the number of records deleted, and not the size of metadata or number of files. + +Iceberg is quite +flexible when it comes to location of data and metadata files. Since deletion vectors are data files, colocating them +with metadata files could be confusing. Typically, deletion files are stored in partition specific directories, in +the data directory of a table. However, it seems cleaner to keep all XTable generated data files in a separate +directory, which will be configurable. The default location for deletion vectors will be `deletion-vectors` directory +in the base directory of the table. + +**Modifications in `IcebergDataFileUpdatesSync` and `IcebergConversionTarget`** +Finally, existing classes that handle Iceberg data files and sync operations will be updated to handle deletion vectors. +Most changes are straightforward and inline with existing logic. However, as described earlier, new deletion vectors +are added to a separate manifest file in Iceberg. This requires invoking a separate transaction operation specific +to row level operations. The row level operation is initiated only if there are new deletion vectors in the +`FilesAdded`. + +## Rollout/Adoption Plan + +This change does not cause any breaking changes. Before this change, delete vectors were ignored and would cause +incorrect results in the target table. With this change, XTable will be able to convert the deletion vectors from Delta +Lake to Iceberg format, ensuring that the target table is consistent with the source table. + +## Test Plan + +New tests will be added to verify the conversion of deletion vectors from Delta Lake to Iceberg format. The integration +tests will cover generation of position deletes using spark sql, and the conversion of these deletes to Iceberg format. \ No newline at end of file