This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 9b379a092c [docs] add docs for deletion-vectors.bitmap64 (#5572)
9b379a092c is described below
commit 9b379a092c11207c20dab12f73cb44f3ed02993a
Author: LsomeYeah <[email protected]>
AuthorDate: Wed May 7 12:51:10 2025 +0800
[docs] add docs for deletion-vectors.bitmap64 (#5572)
---
docs/content/concepts/spec/tableindex.md | 15 +++++++++++++--
docs/static/img/deletion-file.png | Bin 1160387 -> 167404 bytes
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/docs/content/concepts/spec/tableindex.md b/docs/content/concepts/spec/tableindex.md
index dadd8be1dd..7011d214e4 100644
--- a/docs/content/concepts/spec/tableindex.md
+++ b/docs/content/concepts/spec/tableindex.md
@@ -51,7 +51,18 @@ The deletion file is a binary file, and the format is as follows:
- Then, record <size of serialized bin, serialized bin, checksum of serialized
bin> in sequence.
- Size and checksum are BIG_ENDIAN Integer.
-For each serialized bin:
+The serialization format of each serialized bin is determined by `deletion-vectors.bitmap64`.
+By default, Paimon stores deleted records in a 32-bit bitmap; if `deletion-vectors.bitmap64` is set to true, a 64-bit bitmap is used instead.
+The two bitmaps are serialized differently. Note that only the 64-bit bitmap implementation is compatible with Iceberg.
+Serialized bin for 32-bit bitmap (default):
- First, record a const magic number by an int (BIG_ENDIAN). Current the magic number is 1581511376.
-- Then, record serialized bitmap. Which is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
+- Then, record the serialized 32-bit bitmap, which is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
+
+Serialized bin for 64-bit bitmap:
+- First, record a const magic number by an int (LITTLE_ENDIAN). Currently, the magic number is 1681511377.
+- Then, record the serialized 64-bit bitmap. It supports positive 64-bit positions (the most significant bit must be 0),
+ but is optimized for cases where most positions fit in 32 bits by using an array of 32-bit Roaring bitmaps. The internal bitmap array is grown as needed to accommodate the largest position.
+ The serialization of the 64-bit bitmap is as follows:
+ - First, record the size of the bitmap array as a long (LITTLE_ENDIAN).
+ - Then, for each bitmap in the array, record its index as an int (LITTLE_ENDIAN) followed by its serialized bytes, in sequence.
diff --git a/docs/static/img/deletion-file.png b/docs/static/img/deletion-file.png
index e66aa43618..375a0c08bd 100644
Binary files a/docs/static/img/deletion-file.png and b/docs/static/img/deletion-file.png differ
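
The byte layout described in the tableindex.md hunk above can be sketched in Python as follows. This is a minimal illustration, not Paimon's implementation: the function names are hypothetical, the size field is assumed to count the whole serialized bin (magic number included), and the checksum is only read back, not verified, because the text does not name the algorithm. The Roaring bitmap payloads themselves are not decoded.

```python
import struct

# Magic numbers quoted from the spec text in the diff.
MAGIC_BITMAP32 = 1581511376  # stored as a BIG_ENDIAN int
MAGIC_BITMAP64 = 1681511377  # stored as a LITTLE_ENDIAN int


def split_record(buf: bytes, offset: int = 0):
    """Split one size / serialized bin / checksum record starting at offset.

    Returns (serialized_bin, checksum, next_offset). Size and checksum are
    read as BIG_ENDIAN ints. Assumption: size covers the whole bin.
    """
    (size,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    end = start + size
    if size < 0 or end + 4 > len(buf):
        raise ValueError("truncated or corrupt record")
    (checksum,) = struct.unpack_from(">i", buf, end)
    return buf[start:end], checksum, end + 4


def detect_bitmap_kind(serialized_bin: bytes) -> str:
    """Classify a serialized bin by its leading 4-byte magic number."""
    if len(serialized_bin) < 4:
        raise ValueError("serialized bin is shorter than the magic number")
    head = serialized_bin[:4]
    if struct.unpack(">i", head)[0] == MAGIC_BITMAP32:
        return "bitmap32"
    if struct.unpack("<i", head)[0] == MAGIC_BITMAP64:
        return "bitmap64"
    raise ValueError("unknown magic number")


def bitmap64_array_size(serialized_bin: bytes) -> int:
    """Read the bitmap array size (LITTLE_ENDIAN long) that follows the magic."""
    if detect_bitmap_kind(serialized_bin) != "bitmap64":
        raise ValueError("not a 64-bit bitmap bin")
    (count,) = struct.unpack_from("<q", serialized_bin, 4)
    return count


# Usage: an empty 64-bit bin wrapped in a record with a placeholder checksum.
empty_bin64 = struct.pack("<i", MAGIC_BITMAP64) + struct.pack("<q", 0)
record = struct.pack(">i", len(empty_bin64)) + empty_bin64 + struct.pack(">i", 0)
bin_bytes, stored_checksum, next_offset = split_record(record)
kind = detect_bitmap_kind(bin_bytes)
count = bitmap64_array_size(bin_bytes)
```

The two magic numbers use different byte orders, so a reader can tell the formats apart from the first four bytes alone, provided it tries each magic number in its own byte order as `detect_bitmap_kind` does.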