This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git


The following commit(s) were added to refs/heads/master by this push:
     new 9b379a092c [docs] add docs for deletion-vectors.bitmap64 (#5572)
9b379a092c is described below

commit 9b379a092c11207c20dab12f73cb44f3ed02993a
Author: LsomeYeah <[email protected]>
AuthorDate: Wed May 7 12:51:10 2025 +0800

    [docs] add docs for deletion-vectors.bitmap64 (#5572)
---
 docs/content/concepts/spec/tableindex.md |  15 +++++++++++++--
 docs/static/img/deletion-file.png        | Bin 1160387 -> 167404 bytes
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/docs/content/concepts/spec/tableindex.md b/docs/content/concepts/spec/tableindex.md
index dadd8be1dd..7011d214e4 100644
--- a/docs/content/concepts/spec/tableindex.md
+++ b/docs/content/concepts/spec/tableindex.md
@@ -51,7 +51,18 @@ The deletion file is a binary file, and the format is as follows:
 - Then, record <size of serialized bin, serialized bin, checksum of serialized bin> in sequence.
 - Size and checksum are BIG_ENDIAN Integer.
 
-For each serialized bin:
+For each serialized bin, the serialization format is determined by `deletion-vectors.bitmap64`.
+By default, Paimon uses a 32-bit bitmap to store deleted records; if `deletion-vectors.bitmap64` is set to true, a 64-bit bitmap is used instead.
+The two bitmaps are serialized differently. Note that only the 64-bit bitmap implementation is compatible with Iceberg.
 
+Serialized bin for 32-bit bitmap (default):
 - First, record a const magic number by an int (BIG_ENDIAN). Current the magic number is 1581511376.
-- Then, record serialized bitmap. Which is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
+- Then, record a 32-bit serialized bitmap, which is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
+
+Serialized bin for 64-bit bitmap:
+- First, record a const magic number by an int (LITTLE_ENDIAN). Currently, the magic number is 1681511377.
+- Then, record a 64-bit serialized bitmap, which supports positive 64-bit positions (the most significant bit must be 0),
+  but is optimized for cases where most positions fit in 32 bits by using an array of 32-bit Roaring bitmaps. The internal bitmap array is grown as needed to accommodate the largest position.
+  The serialization of the 64-bit bitmap is as follows:
+  - First, record the size of the bitmaps array by a long (LITTLE_ENDIAN).
+  - Then, for each bitmap in the array, record its index by an int (LITTLE_ENDIAN) followed by its serialized bytes, in sequence.
diff --git a/docs/static/img/deletion-file.png b/docs/static/img/deletion-file.png
index e66aa43618..375a0c08bd 100644
Binary files a/docs/static/img/deletion-file.png and b/docs/static/img/deletion-file.png differ
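
For readers of the spec change above, the byte layout it describes can be sketched roughly as follows. This is an illustrative sketch only, not Paimon's implementation: the function names `bin_32`, `bin_64`, and `frame` are hypothetical, the Roaring bitmap payloads are treated as opaque bytes, and the use of CRC32 as the checksum (computed over the whole serialized bin, magic number included) is an assumption, since the text only says "checksum of serialized bin".

```python
import struct
import zlib

# Magic numbers as stated in the spec text.
MAGIC_32 = 1581511376  # 32-bit bitmap bin, written BIG_ENDIAN
MAGIC_64 = 1681511377  # 64-bit bitmap bin, written LITTLE_ENDIAN


def bin_32(bitmap_bytes):
    """Serialized bin for the 32-bit bitmap: magic (BE int) + bitmap bytes.

    `bitmap_bytes` stands in for an already serialized RoaringBitmap.
    """
    return struct.pack(">i", MAGIC_32) + bitmap_bytes


def bin_64(bitmaps):
    """Serialized bin for the 64-bit bitmap.

    `bitmaps` is a sequence of (index, serialized_bytes) pairs, one per
    32-bit Roaring bitmap in the internal array.
    """
    out = struct.pack("<i", MAGIC_64)        # magic, LE int
    out += struct.pack("<q", len(bitmaps))   # size of bitmaps array, LE long
    for index, data in bitmaps:
        out += struct.pack("<i", index)      # index, LE int
        out += data                          # serialized bitmap bytes
    return out


def frame(bin_bytes):
    """Wrap a bin as <size, bin, checksum>; size and checksum are BE ints.

    Assumption: the checksum is CRC32 over the serialized bin.
    """
    checksum = zlib.crc32(bin_bytes)
    return (
        struct.pack(">i", len(bin_bytes))
        + bin_bytes
        + struct.pack(">I", checksum)
    )
```

For example, `frame(bin_64([(0, payload)]))` would produce one framed entry whose bin starts with the little-endian magic bytes `d1 d3 39 64`, where `payload` is whatever bytes the Roaring library emits for that bitmap.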
