Re: [PR] GH-533: Add ALP (Adaptive Lossless floating-Point) encoding specification [parquet-format]

via GitHub Wed, 29 Apr 2026 14:13:07 -0700


prtkgaur commented on code in PR #557:
URL: https://github.com/apache/parquet-format/pull/557#discussion_r3163623109



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+

Review Comment:
   Done — added an encoding applicability table at the top of the file with all 
encodings and their supported types.



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+
+<a name="ALP"></a>
+### Adaptive Lossless floating-Point: (ALP = 10)
+
+Supported Types: FLOAT, DOUBLE
+
+This encoding is adapted from the paper
+["ALP: Adaptive Lossless floating-Point Compression"](https://dl.acm.org/doi/10.1145/3626717)
+by Afroozeh and Boncz (SIGMOD 2024).
+
+ALP works by converting floating-point values to integers using decimal scaling,
+then applying Frame of Reference (FOR) encoding and bit-packing. Values that
+cannot be losslessly converted are stored as exceptions. The encoding achieves
+high compression for decimal-like floating-point data (e.g., monetary values,
+sensor readings) while remaining fully lossless.
+
+#### Overview
+
+ALP encoding consists of a page-level header followed by an offset array and one
+or more encoded vectors (batches of values). Each vector contains up to
+`vector_size` elements (default 1024).
+
+```
++-------------+-----------------------------+--------------------------------------+
+|   Header    |        Offset Array         |            Vector Data               |
+|  (7 bytes)  |   (num_vectors * 4 bytes)   |            (variable)                |
++-------------+------+------+-----+---------+----------+----------+-----+----------+
+| Page Header | off0 | off1 | ... | off N-1 | Vector 0 | Vector 1 | ... | Vec N-1  |
+|  (7 bytes)  | (4B) | (4B) |     |  (4B)   |(variable)|(variable)|     |(variable)|
++-------------+------+------+-----+---------+----------+----------+-----+----------+
+```
+
+The compression pipeline is shown below. Step 1 runs once per column chunk, and steps 2-4 run for each vector:
+
+```
+                    Input: float/double array
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  1. SAMPLING & PRESET GENERATION                         |
+    |     Sample vectors from column chunk                     |
+    |     Try all (exponent, factor) combinations              |
+    |     Select best k combinations for preset                |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  2. DECIMAL ENCODING                                     |
+    |     encoded[i] = round(value[i] * 10^e * 10^(-f))        |
+    |     Detect exceptions where decode(encode(v)) != v       |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  3. FRAME OF REFERENCE (FOR)                             |
+    |     min_val = min(encoded[])                             |
+    |     delta[i] = encoded[i] - min_val                      |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  4. BIT PACKING                                          |
+    |     bit_width = ceil(log2(max_delta + 1))                |
+    |     Pack each delta into bit_width bits                  |
+    +----------------------------------------------------------+
+                              |
+                              v
+                   Output: Serialized vector bytes
+```
+
+#### Page Layout
+
+##### Header (7 bytes)
+
+All multi-byte values are little-endian.
+
+```
+ Byte:    0              1               2              3    4    5    6
+       +----------------+---------------+--------------+----+----+----+----+
+       | compression    | integer       | log_vector   |     num_elements  |
+       | _mode          | _encoding     | _size        |     (int32 LE)    |
+       +----------------+---------------+--------------+----+----+----+----+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | compression_mode | 1 byte | uint8 | Compression mode (must be 0 = ALP) |
+| 1 | integer_encoding | 1 byte | uint8 | Integer encoding (must be 0 = FOR + bit-packing) |
+| 2 | log_vector_size | 1 byte | uint8 | log2(vector\_size). Must be in \[3, 15\]. Default: 10 (vector size 1024) |

Review Comment:
   done
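Steps 2-4 of the pipeline above can be sketched in Python for one DOUBLE vector, assuming step 1 has already picked `(e, f)`. The names `try_encode` and `encode_vector` are hypothetical. Two details are this sketch's own assumptions and are not specified in the quoted text. First, a value counts as losslessly encoded only if its bit pattern round-trips, so `-0.0` and NaN become exceptions. Second, exception slots are filled with the first non-exception encoded value so that they do not widen the FOR range.

```python
import math
import struct

# Power-of-10 constants written as literals (DOUBLE exponents 0..18),
# rather than computed at runtime.
POW10 = [1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
         1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18]
NEG_POW10 = [1e0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9,
             1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15, 1e-16, 1e-17, 1e-18]
MAGIC = 6755399441055744.0  # 2^51 + 2^52, for branchless round-to-nearest


def try_encode(v, e, f):
    """Step 2 for one value: the encoded integer, or None for an exception."""
    scaled = v * POW10[e] * NEG_POW10[f]  # two multiplications, in this order
    if not math.isfinite(scaled) or abs(scaled) >= 2.0 ** 51:
        return None  # outside the range where magic-number rounding is valid
    enc = int((scaled + MAGIC) - MAGIC)
    decoded = float(enc) * POW10[f] * NEG_POW10[e]
    # Assumption: "lossless" means identical bits, so -0.0 and NaN are exceptions.
    if struct.pack("<d", decoded) != struct.pack("<d", v):
        return None
    return enc


def encode_vector(values, e, f):
    """Steps 2-4: returns (frame, bit_width, deltas, positions, exceptions)."""
    encoded = [try_encode(v, e, f) for v in values]
    positions = [i for i, x in enumerate(encoded) if x is None]
    exceptions = [values[i] for i in positions]
    # Assumption: exception slots reuse the first successfully encoded value.
    fill = next((x for x in encoded if x is not None), 0)
    encoded = [fill if x is None else x for x in encoded]
    frame = min(encoded)                      # step 3: frame of reference
    deltas = [x - frame for x in encoded]
    bit_width = max(deltas).bit_length()      # == ceil(log2(max_delta + 1))
    return frame, bit_width, deltas, positions, exceptions
```

For example, `encode_vector([1.5, 2.25, 3.0], 2, 0)` scales the values to `150, 225, 300` and uses `150` as the frame of reference. It then packs the deltas `0, 75, 150` at a bit width of 8.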



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+All multi-byte values are little-endian.

Review Comment:
   Done
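Every multi-byte field is little-endian and there is no padding, so the 7-byte page header maps directly onto a `struct` format. The following is a minimal Python sketch; the names `AlpPageHeader` and `parse_page_header` are hypothetical. It also checks the constraints listed in the header table. Rejecting a negative `num_elements` is an extra assumption of this sketch.

```python
import struct
from typing import NamedTuple


class AlpPageHeader(NamedTuple):
    compression_mode: int
    integer_encoding: int
    log_vector_size: int
    num_elements: int

    @property
    def vector_size(self) -> int:
        return 1 << self.log_vector_size


# '<' = little-endian with no padding: three uint8 fields, then one int32.
HEADER = struct.Struct("<BBBi")  # HEADER.size == 7


def parse_page_header(page: bytes) -> AlpPageHeader:
    if len(page) < HEADER.size:
        raise ValueError("truncated ALP page header")
    hdr = AlpPageHeader(*HEADER.unpack_from(page, 0))
    if hdr.compression_mode != 0:
        raise ValueError(f"unsupported compression_mode {hdr.compression_mode}")
    if hdr.integer_encoding != 0:
        raise ValueError(f"unsupported integer_encoding {hdr.integer_encoding}")
    if not 3 <= hdr.log_vector_size <= 15:
        raise ValueError(f"log_vector_size {hdr.log_vector_size} outside [3, 15]")
    if hdr.num_elements < 0:  # assumption: negative counts are invalid
        raise ValueError("negative num_elements")
    return hdr
```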



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+| 2 | log_vector_size | 1 byte | uint8 | log2(vector\_size). Must be in \[3, 15\]. Default: 10 (vector size 1024) |

Review Comment:
   done



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+| 3 | num_elements | 4 bytes | int32 | Total number of floating-point values in the page |
+
+The number of vectors is `ceil(num_elements / vector_size)`. The last vector may
+contain fewer than `vector_size` elements.
+
+**Note:** The number of elements per vector and the packed data size are NOT stored
+in the page header or in the vector headers. They are derived:
+* Elements per vector: `vector_size` for all vectors except the last, which holds
+  `num_elements - (num_vectors - 1) * vector_size` elements.
+* Packed data size: `ceil(num_elements_in_vector * bit_width / 8)`.
+
+##### Offset Array
+
+Immediately following the header is an array of `num_vectors` little-endian uint32
+values. Each offset gives the byte position of the corresponding vector's data,
+measured from the start of the offset array itself.
+
+The first offset equals `num_vectors * 4` (pointing just past the offset array).
+Each subsequent offset equals the previous offset plus the stored size of the
+previous vector.
+
+##### Vector Format
+
+Each vector is self-describing and contains the encoding parameters, FOR metadata,
+bit-packed encoded values, and exception data.
+
+```
++-------------------+-----------------+-------------------+---------------------+-------------------+
+|      AlpInfo      |     ForInfo     |   PackedValues    | ExceptionPositions  | ExceptionValues   |
+|     (4 bytes)     | (5B or 9B)      |    (variable)     |     (variable)      |    (variable)     |
++-------------------+-----------------+-------------------+---------------------+-------------------+
+```
+
+Vector header sizes:
+| Type   | AlpInfo | ForInfo | Total Header |
+|--------|---------|---------|--------------|
+| FLOAT  | 4 bytes | 5 bytes | 9 bytes      |
+| DOUBLE | 4 bytes | 9 bytes | 13 bytes     |
+
+Data section sizes (N is the number of values in the vector):
+| Section             | Size Formula                | Description                  |
+|---------------------|-----------------------------|------------------------------|
+| PackedValues        | ceil(N * bit\_width / 8)    | Bit-packed delta values      |

Review Comment:
   done
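The derived quantities can be computed without any extra metadata. The helper names below are hypothetical. `vector_stored_size` follows the Vector Format section: a 9-byte (FLOAT) or 13-byte (DOUBLE) vector header, 2 bytes per exception position, and `sizeof(T)` bytes per exception value. It also assumes no padding between sections.

```python
def num_vectors(num_elements: int, vector_size: int) -> int:
    return (num_elements + vector_size - 1) // vector_size  # ceil division


def elements_in_vector(i: int, num_elements: int, vector_size: int) -> int:
    n = num_vectors(num_elements, vector_size)
    if not 0 <= i < n:
        raise IndexError(i)
    # Every vector is full except possibly the last one.
    return vector_size if i < n - 1 else num_elements - (n - 1) * vector_size


def packed_size(count: int, bit_width: int) -> int:
    return (count * bit_width + 7) // 8  # ceil(count * bit_width / 8)


def vector_stored_size(count, bit_width, num_exceptions, is_double):
    header = 13 if is_double else 9      # AlpInfo (4) + ForInfo (9 or 5)
    value_size = 8 if is_double else 4
    return (header + packed_size(count, bit_width)
            + num_exceptions * 2           # uint16 exception positions
            + num_exceptions * value_size) # raw exception values
```

With `num_elements = 2500` and the default `vector_size = 1024`, the page holds three vectors of 1024, 1024 and 452 values.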

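Offsets are relative to the start of the offset array. A writer can therefore emit them from the stored vector sizes, and a reader can turn them back into byte ranges. The Python sketch below uses hypothetical helper names. No end offset is stored, so it assumes the last vector extends to the end of the page buffer.

```python
import struct

PAGE_HEADER_SIZE = 7


def build_offset_array(vector_sizes):
    """Encode offsets relative to the start of the offset array itself."""
    n = len(vector_sizes)
    offsets, pos = [], n * 4           # first vector starts just past the array
    for size in vector_sizes:
        offsets.append(pos)
        pos += size                    # next offset = previous + stored size
    return struct.pack(f"<{n}I", *offsets)


def vector_ranges(page: bytes, num_vectors: int):
    """Absolute (start, end) byte ranges of each vector within the page."""
    base = PAGE_HEADER_SIZE            # offsets are measured from here
    offsets = struct.unpack_from(f"<{num_vectors}I", page, base)
    # Assumption: the last vector runs to the end of the page buffer.
    ends = list(offsets[1:]) + [len(page) - base]
    return [(base + s, base + e) for s, e in zip(offsets, ends)]
```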


##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+| ExceptionPositions  | num\_exceptions * 2 bytes   | uint16 indices of exceptions |
+| ExceptionValues     | num\_exceptions * sizeof(T) | Original float/double values |
+
+###### AlpInfo (4 bytes, both types)
+
+```
+ Byte:    0           1          2       3
+       +----------+----------+---------+---------+
+       | exponent |  factor  |  num_exceptions   |
+       |  (uint8) | (uint8)  |   (uint16 LE)     |
+       +----------+----------+---------+---------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | exponent | 1 byte | uint8 | Power-of-10 exponent *e*. Range: \[0, 10\] for FLOAT, \[0, 18\] for DOUBLE. |
+| 1 | factor | 1 byte | uint8 | Power-of-10 factor *f*. Range: \[0, *e*\]. |
+| 2 | num_exceptions | 2 bytes | uint16 | Number of exception values in this vector. |
+
+###### ForInfo for FLOAT (5 bytes)
+
+```
+ Byte:    0    1    2    3       4
+       +----+----+----+----+-----------+
+       | frame_of_reference | bit_width |
+       |    (int32 LE)      |  (uint8)  |
+       +----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 4 bytes | int32 | Minimum encoded integer in the vector |
+| 4 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 32\]. |
+
+###### ForInfo for DOUBLE (9 bytes)
+
+```
+ Byte:    0    1    2    3    4    5    6    7       8
+       +----+----+----+----+----+----+----+----+-----------+
+       |          frame_of_reference           | bit_width |
+       |              (int64 LE)               |  (uint8)  |
+       +----+----+----+----+----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 8 bytes | int64 | Minimum encoded integer in the vector |
+| 8 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 64\]. |
+
+###### PackedValues
+
+The FOR-encoded deltas, bit-packed into `ceil(num_elements_in_vector * bit_width / 8)` bytes.
+Values are packed from the least significant bit of each byte to the most significant bit,
+in groups of 8 values, using the same bit-packing order as the
+[RLE/Bit-Packing Hybrid](#RLE) encoding.
+
+If `bit_width` is 0, no bytes are stored (all deltas are zero, meaning all encoded
+integers are equal to `frame_of_reference`).
+
+###### ExceptionPositions
+
+An array of `num_exceptions` little-endian uint16 values, each giving
+the 0-based index within the vector of an exception value.
+
+###### ExceptionValues
+
+An array of `num_exceptions` values in the original floating-point type
+(4 bytes little-endian IEEE 754 for FLOAT, 8 bytes for DOUBLE), stored in
+the same order as the corresponding positions.
+
+#### Encoding
+
+##### Encoding Formula
+
+```
++-------------------------------------------------------------------+
+|                                                                   |
+|   encoded = round( value  *  10^e  *  10^(-f) )                   |
+|                                                                   |
+|   decoded = encoded  *  10^f  *  10^(-e)                          |
+|                                                                   |
++-------------------------------------------------------------------+
+```
+
+The encoding uses two separate multiplications (not a single multiplication by
+`10^(e-f)`, and not division) to ensure that implementations produce identical
+floating-point rounding across languages. The powers of 10 MUST be stored as
+precomputed floating-point constants (e.g., literal values like `1e-3f`), not
+computed at runtime.
+
+##### Fast Rounding
+
+The rounding function uses a "magic number" technique for branchless rounding:
+
+| Type   | Magic Number                      | Formula                          |
+|--------|-----------------------------------|----------------------------------|
+| FLOAT  | 2^22 + 2^23 = 12,582,912         | `(int)((value + magic) - magic)` |
+| DOUBLE | 2^51 + 2^52 = 6,755,399,441,055,744 | `(long)((value + magic) - magic)` |

Review Comment:
   done
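The two per-vector header structs can be read as fixed little-endian layouts. In this sketch, which uses hypothetical names, the returned position is where PackedValues begins. The range checks mirror the AlpInfo and ForInfo tables.

```python
import struct
from typing import NamedTuple

ALP_INFO = struct.Struct("<BBH")        # exponent, factor, num_exceptions (4 bytes)
FOR_INFO_FLOAT = struct.Struct("<iB")   # int32 frame_of_reference, uint8 bit_width (5 bytes)
FOR_INFO_DOUBLE = struct.Struct("<qB")  # int64 frame_of_reference, uint8 bit_width (9 bytes)


class VectorHeader(NamedTuple):
    exponent: int
    factor: int
    num_exceptions: int
    frame_of_reference: int
    bit_width: int


def parse_vector_header(buf: bytes, pos: int, is_double: bool):
    """Return (header, position of PackedValues)."""
    exponent, factor, num_exceptions = ALP_INFO.unpack_from(buf, pos)
    for_info = FOR_INFO_DOUBLE if is_double else FOR_INFO_FLOAT
    frame, bit_width = for_info.unpack_from(buf, pos + ALP_INFO.size)
    max_exponent, max_width = (18, 64) if is_double else (10, 32)
    if exponent > max_exponent or factor > exponent:
        raise ValueError(f"invalid (exponent, factor) = ({exponent}, {factor})")
    if bit_width > max_width:
        raise ValueError(f"bit_width {bit_width} exceeds {max_width}")
    header = VectorHeader(exponent, factor, num_exceptions, frame, bit_width)
    return header, pos + ALP_INFO.size + for_info.size
```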

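The PackedValues layout can be sketched by treating the packed bytes as one little-endian integer. Value `i` then occupies bits `[i * bit_width, (i + 1) * bit_width)`, counted from the least significant bit of byte 0. Packing 0..7 at width 3 yields `10001000 11000110 11111010`, the same bytes as the bit-packing example in the RLE/Bit-Packing Hybrid section. The function names are hypothetical, and the big-integer approach favors clarity over speed.

```python
def pack_lsb_first(values, bit_width):
    """Pack non-negative deltas LSB-first into ceil(len * bit_width / 8) bytes."""
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >> bit_width:
            raise ValueError(f"value {v} does not fit in {bit_width} bits")
        acc |= v << (i * bit_width)   # value i occupies bits [i*w, (i+1)*w)
    return acc.to_bytes((len(values) * bit_width + 7) // 8, "little")


def unpack_lsb_first(data, count, bit_width):
    """Inverse of pack_lsb_first; bit_width 0 yields all-zero deltas."""
    nbytes = (count * bit_width + 7) // 8
    if len(data) < nbytes:
        raise ValueError("packed data too short")
    # Little-endian integer: bit k of the stream is bit (k % 8) of byte k // 8.
    acc = int.from_bytes(data[:nbytes], "little")
    mask = (1 << bit_width) - 1
    return [(acc >> (i * bit_width)) & mask for i in range(count)]
```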

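Decoding reverses the pipeline in three steps. First add `frame_of_reference` back, then apply the decoding formula, and finally overwrite the exception positions with the stored originals. The Python sketch below uses hypothetical helper names. `read_exceptions` handles both types, but `decode_vector` covers DOUBLE only. Python floats are binary64, so FLOAT decoding would need binary32 arithmetic.

```python
import struct

# Literal power-of-10 tables (DOUBLE exponents 0..18).
POW10 = [1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
         1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18]
NEG_POW10 = [1e0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9,
             1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15, 1e-16, 1e-17, 1e-18]


def read_exceptions(buf, pos, num_exceptions, is_double):
    """Read ExceptionPositions then ExceptionValues; return (positions, values, next_pos)."""
    positions = list(struct.unpack_from(f"<{num_exceptions}H", buf, pos))
    pos += 2 * num_exceptions
    code, width = ("d", 8) if is_double else ("f", 4)
    values = list(struct.unpack_from(f"<{num_exceptions}{code}", buf, pos))
    return positions, values, pos + width * num_exceptions


def decode_vector(deltas, frame, e, f, positions, exception_values):
    """Undo FOR, apply decoded = encoded * 10^f * 10^(-e), then patch exceptions."""
    out = [float(d + frame) * POW10[f] * NEG_POW10[e] for d in deltas]
    for p, v in zip(positions, exception_values):
        out[p] = v                    # exceptions overwrite the placeholder slot
    return out
```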

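The magic-number trick depends on round-to-nearest-even addition. It only works while the value's magnitude is below 2^22 (FLOAT) or 2^51 (DOUBLE), so larger values cannot be rounded this way and are best treated as exceptions. The sketch below, which uses hypothetical names, shows both variants in Python. Python floats are binary64, so binary32 arithmetic is emulated with `struct`. Ties round to even, which is also what Python's `round()` does.

```python
import struct

MAGIC_DOUBLE = 6755399441055744.0  # 2^51 + 2^52
MAGIC_FLOAT = 12582912.0           # 2^22 + 2^23


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fast_round_double(x: float) -> int:
    # Valid for |x| < 2^51: after adding the magic number no fractional bits
    # remain, so the FPU's round-to-nearest-even performs the rounding.
    return int((x + MAGIC_DOUBLE) - MAGIC_DOUBLE)


def fast_round_float(x: float) -> int:
    # float32 emulation, valid for |x| < 2^22 with x already a float32 value.
    # Each operation runs in binary64 and is then rounded to binary32; binary64
    # has enough extra precision (53 >= 2*24 + 2 bits) that this matches
    # native float32 addition and subtraction.
    return int(to_f32(to_f32(x + MAGIC_FLOAT) - MAGIC_FLOAT))
```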
##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
 ```
 Bytes  AA 00 A3 BB 11 B4 CC 22 C5 DD 33 D6
 ```
+
+<a name="ALP"></a>
+### Adaptive Lossless floating-Point: (ALP = 10)
+
+Supported Types: FLOAT, DOUBLE
+
+This encoding is adapted from the paper
+["ALP: Adaptive Lossless floating-Point 
Compression"](https://dl.acm.org/doi/10.1145/3626717)
+by Afroozeh and Boncz (SIGMOD 2024).
+
+ALP works by converting floating-point values to integers using decimal 
scaling,
+then applying Frame of Reference (FOR) encoding and bit-packing. Values that
+cannot be losslessly converted are stored as exceptions. The encoding achieves
+high compression for decimal-like floating-point data (e.g., monetary values,
+sensor readings) while remaining fully lossless.
+
+#### Overview
+
+ALP encoding consists of a page-level header followed by an offset array and 
one
+or more encoded vectors (batches of values). Each vector contains up to
+`vector_size` elements (default 1024).
+
+```
++-------------+-----------------------------+--------------------------------------+
+|   Header    |        Offset Array         |            Vector Data           
    |
+|  (7 bytes)  |   (num_vectors * 4 bytes)   |            (variable)            
    |
++-------------+------+------+-----+---------+----------+----------+-----+----------+
+| Page Header | off0 | off1 | ... | off N-1 | Vector 0 | Vector 1 | ... | Vec 
N-1  |
+|  (7 bytes)  | (4B) | (4B) |     |  (4B)   |(variable)|(variable)|     
|(variable)|
++-------------+------+------+-----+---------+----------+----------+-----+----------+
+```
+
+The compression pipeline for each vector is:
+
+```
+                    Input: float/double array
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  1. SAMPLING & PRESET GENERATION                         |
+    |     Sample vectors from column chunk                     |
+    |     Try all (exponent, factor) combinations              |
+    |     Select best k combinations for preset                |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  2. DECIMAL ENCODING                                     |
+    |     encoded[i] = round(value[i] * 10^e * 10^(-f))        |
+    |     Detect exceptions where decode(encode(v)) != v       |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  3. FRAME OF REFERENCE (FOR)                             |
+    |     min_val = min(encoded[])                             |
+    |     delta[i] = encoded[i] - min_val                      |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  4. BIT PACKING                                          |
+    |     bit_width = ceil(log2(max_delta + 1))                |
+    |     Pack each delta into bit_width bits                  |
+    +----------------------------------------------------------+
+                              |
+                              v
+                   Output: Serialized vector bytes
+```
+
+#### Page Layout
+
+##### Header (7 bytes)
+
+All multi-byte values are little-endian.
+
+```
+ Byte:    0              1               2              3    4    5    6
+       +----------------+---------------+--------------+----+----+----+----+
+       | compression    | integer       | log_vector   |     num_elements  |
+       | _mode          | _encoding     | _size        |     (int32 LE)    |
+       +----------------+---------------+--------------+----+----+----+----+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | compression_mode | 1 byte | uint8 | Compression mode (must be 0 = ALP) |
+| 1 | integer_encoding | 1 byte | uint8 | Integer encoding (must be 0 = FOR + bit-packing) |
+| 2 | log_vector_size | 1 byte | uint8 | log2(vector\_size). Must be in \[3, 15\]. Default: 10 (vector size 1024) |
+| 3 | num_elements | 4 bytes | int32 | Total number of floating-point values in the page |
+
+The number of vectors is `ceil(num_elements / vector_size)`. The last vector may
+contain fewer than `vector_size` elements.
+
+**Note:** The number of elements per vector and the packed data size are NOT stored
+in the header. They are derived:
+* Elements per vector: `vector_size` for all vectors except the last, which may be smaller.
+* Packed data size: `ceil(num_elements_in_vector * bit_width / 8)`.
+
+##### Offset Array
+
+Immediately following the header is an array of `num_vectors` little-endian uint32
+values. Each offset gives the byte position of the corresponding vector's data,
+measured from the start of the offset array itself.
+
+The first offset equals `num_vectors * 4` (pointing just past the offset array).
+Each subsequent offset equals the previous offset plus the stored size of the
+previous vector.
+
+##### Vector Format
+
+Each vector is self-describing and contains the encoding parameters, FOR metadata,
+bit-packed encoded values, and exception data.
+
+```
++-------------------+-----------------+-------------------+---------------------+-------------------+
+|      AlpInfo      |     ForInfo     |   PackedValues    | ExceptionPositions  | ExceptionValues   |
+|     (4 bytes)     | (5B or 9B)      |    (variable)     |     (variable)      |    (variable)     |
++-------------------+-----------------+-------------------+---------------------+-------------------+
+```
+
+Vector header sizes:
+| Type   | AlpInfo | ForInfo | Total Header |
+|--------|---------|---------|--------------|
+| FLOAT  | 4 bytes | 5 bytes | 9 bytes      |
+| DOUBLE | 4 bytes | 9 bytes | 13 bytes     |
+
+Data section sizes:
+| Section             | Size Formula                | Description                  |
+|---------------------|-----------------------------|------------------------------|
+| PackedValues        | ceil(N * bit\_width / 8)    | Bit-packed delta values      |
+| ExceptionPositions  | num\_exceptions * 2 bytes   | uint16 indices of exceptions |
+| ExceptionValues     | num\_exceptions * sizeof(T) | Original float/double values |
+
+###### AlpInfo (4 bytes, both types)
+
+```
+ Byte:    0           1          2       3
+       +----------+----------+---------+---------+
+       | exponent |  factor  |  num_exceptions   |
+       |  (uint8) | (uint8)  |   (uint16 LE)     |
+       +----------+----------+---------+---------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | exponent | 1 byte | uint8 | Power-of-10 exponent *e*. Range: \[0, 10\] for FLOAT, \[0, 18\] for DOUBLE. |
+| 1 | factor | 1 byte | uint8 | Power-of-10 factor *f*. Range: \[0, *e*\]. |
+| 2 | num_exceptions | 2 bytes | uint16 | Number of exception values in this vector. |
+
+###### ForInfo for FLOAT (5 bytes)
+
+```
+ Byte:    0    1    2    3       4
+       +----+----+----+----+-----------+
+       | frame_of_reference | bit_width |
+       |    (int32 LE)      |  (uint8)  |
+       +----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 4 bytes | int32 | Minimum encoded integer in the vector |
+| 4 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 32\]. |
+
+###### ForInfo for DOUBLE (9 bytes)
+
+```
+ Byte:    0    1    2    3    4    5    6    7       8
+       +----+----+----+----+----+----+----+----+-----------+
+       |          frame_of_reference           | bit_width |
+       |              (int64 LE)               |  (uint8)  |
+       +----+----+----+----+----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 8 bytes | int64 | Minimum encoded long in the vector |
+| 8 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 64\]. |
+
+###### PackedValues
+
+The FOR-encoded deltas, bit-packed into `ceil(num_elements_in_vector * bit_width / 8)` bytes.
+Values are packed from the least significant bit of each byte to the most significant bit,
+in groups of 8 values, using the same bit-packing order as the
+[RLE/Bit-Packing Hybrid](#RLE) encoding.
+
+If `bit_width` is 0, no bytes are stored (all deltas are zero, meaning all encoded
+integers are equal to `frame_of_reference`).
+
+###### ExceptionPositions
+
+An array of `num_exceptions` little-endian uint16 values, each giving
+the 0-based index within the vector of an exception value.
+
+###### ExceptionValues
+
+An array of `num_exceptions` values in the original floating-point type
+(4 bytes little-endian IEEE 754 for FLOAT, 8 bytes for DOUBLE), stored in
+the same order as the corresponding positions.
+
+#### Encoding
+
+##### Encoding Formula
+
+```
++-------------------------------------------------------------------+
+|                                                                   |
+|   encoded = round( value  *  10^e  *  10^(-f) )                  |

Review Comment:
   done
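
The encoding formula can be combined with the pipeline's `decode(encode(v)) != v` exception rule. Below is a minimal Python sketch of both for DOUBLE; it is not a reference implementation. The names `alp_encode`/`alp_decode` and the power-of-ten tables are hypothetical. The decode order (multiply by `10^f`, then by `10^-e`) is an assumption modelled on the ALP paper, because the Decoding section is cut off in this excerpt. The lossless check compares bit patterns, so `-0.0`, NaN and infinities all end up as exceptions.

```python
import math
import struct

# Hypothetical power-of-ten tables covering the DOUBLE exponent range [0, 18].
F10 = [float(10**i) for i in range(19)]        # exact: 10^i is representable for i <= 22
IF10 = [float(f"1e-{i}") for i in range(19)]   # correctly rounded 10^-i
MAGIC = float(2**51 + 2**52)                   # fast-rounding constant for DOUBLE


def same_bits(a: float, b: float) -> bool:
    """Bit-exact comparison of two doubles (distinguishes -0.0 from 0.0)."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def alp_decode(encoded: int, e: int, f: int) -> float:
    # Assumed decode order: multiply by 10^f, then by 10^-e.
    return encoded * F10[f] * IF10[e]


def alp_encode(value: float, e: int, f: int):
    """Return round(value * 10^e * 10^-f), or None if `value` must be an exception."""
    if not math.isfinite(value):
        return None                            # NaN and +/-inf cannot be scaled
    scaled = value * F10[e] * IF10[f]
    if abs(scaled) >= 2**51:
        return None                            # outside the magic-number rounding range
    encoded = int((scaled + MAGIC) - MAGIC)    # round half to even
    if not same_bits(alp_decode(encoded, e, f), value):
        return None                            # decode(encode(v)) != v
    return encoded


print(alp_encode(1.23, 2, 0))     # e=2, f=0
print(alp_encode(1 / 3, 2, 0))    # a value with no short decimal form
```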



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+The compression pipeline for each vector is:
+
+```
+                    Input: float/double array
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  1. SAMPLING & PRESET GENERATION                         |
+    |     Sample vectors from column chunk                     |
+    |     Try all (exponent, factor) combinations              |
+    |     Select best k combinations for preset                |
+    +----------------------------------------------------------+

Review Comment:
   Agreed: simplified the pipeline to 'Choose parameters'. The sampling/preset strategy remains in the Parameter Selection section as a SHOULD recommendation.
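
The reply moves sampling and preset generation out of the per-vector pipeline. For orientation only, a brute-force version of "try all (exponent, factor) combinations" could look like the Python sketch below. The cost model (packed bits plus 2 + 8 bytes per DOUBLE exception), the tie-breaking rule and all names are assumptions. It also scans a single sample exhaustively instead of building the top-k preset the original pipeline described.

```python
import math
import struct

F10 = [float(10**i) for i in range(19)]        # 10^0 .. 10^18, exact in a double
IF10 = [float(f"1e-{i}") for i in range(19)]   # correctly rounded 10^-0 .. 10^-18
MAGIC = float(2**51 + 2**52)                   # fast-rounding constant for DOUBLE


def try_encode(v, e, f):
    """Encoded integer for v under (e, f), or None if v would be an exception."""
    if not math.isfinite(v):
        return None
    scaled = v * F10[e] * IF10[f]
    if abs(scaled) >= 2**51:
        return None
    n = int((scaled + MAGIC) - MAGIC)
    # Bit-exact round-trip check; the decode order (* 10^f, then * 10^-e) is assumed.
    if struct.pack("<d", n * F10[f] * IF10[e]) != struct.pack("<d", v):
        return None
    return n


def estimate_size(sample, e, f):
    """Rough DOUBLE cost in bytes: packed deltas plus 2 + 8 bytes per exception."""
    encoded = [try_encode(v, e, f) for v in sample]
    ok = [n for n in encoded if n is not None]
    num_exceptions = len(encoded) - len(ok)
    bit_width = (max(ok) - min(ok)).bit_length() if ok else 0  # ceil(log2(max_delta + 1))
    return (len(sample) * bit_width + 7) // 8 + num_exceptions * (2 + 8)


def choose_parameters(sample, max_exponent=18):
    """Try every (e, f) with 0 <= f <= e <= max_exponent; keep the first cheapest."""
    best = None
    for e in range(max_exponent + 1):
        for f in range(e + 1):
            cost = estimate_size(sample, e, f)
            if best is None or cost < best[0]:
                best = (cost, e, f)
    return best[1], best[2]


print(choose_parameters([1.5, 2.25, 3.75]))
```

Keeping the first minimum favours small exponents when costs tie. A real writer would more likely sample several vectors and keep a shortlist of candidates.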



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | compression_mode | 1 byte | uint8 | Compression mode (must be 0 = ALP) |

Review Comment:
   Changing the code to read ALP's newly laid-out bits would be difficult to apply across all implementations, but AlpRD is an entirely new implementation.
   
   Added a note to the field description.
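
Reading the 7-byte page header described above takes a single little-endian unpack. The Python sketch below is illustrative only. The function name is hypothetical, and rejecting a negative `num_elements` is an extra sanity check, not a rule stated in this excerpt.

```python
import struct

# compression_mode, integer_encoding, log_vector_size (uint8 each), num_elements (int32 LE).
# The "<" prefix disables padding, so the struct is exactly 7 bytes.
ALP_PAGE_HEADER = struct.Struct("<BBBi")


def parse_alp_page_header(page: bytes):
    """Return (vector_size, num_elements, num_vectors) from the start of an ALP page."""
    if len(page) < ALP_PAGE_HEADER.size:
        raise ValueError("page is shorter than the 7-byte ALP header")
    mode, int_encoding, log_vector_size, num_elements = ALP_PAGE_HEADER.unpack_from(page, 0)
    if mode != 0:
        raise ValueError(f"unsupported compression_mode {mode} (only 0 = ALP)")
    if int_encoding != 0:
        raise ValueError(f"unsupported integer_encoding {int_encoding} (only 0 = FOR + bit-packing)")
    if not 3 <= log_vector_size <= 15:
        raise ValueError(f"log_vector_size {log_vector_size} outside [3, 15]")
    if num_elements < 0:
        raise ValueError("num_elements must not be negative")  # extra sanity check
    vector_size = 1 << log_vector_size
    num_vectors = -(-num_elements // vector_size)  # integer ceil(num_elements / vector_size)
    return vector_size, num_elements, num_vectors


header = bytes([0, 0, 10]) + struct.pack("<i", 2500)
print(parse_alp_page_header(header))
```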



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+##### Offset Array
+
+Immediately following the header is an array of `num_vectors` little-endian uint32
+values. Each offset gives the byte position of the corresponding vector's data,
+measured from the start of the offset array itself.
+
+The first offset equals `num_vectors * 4` (pointing just past the offset array).

Review Comment:
   done
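
The offset rule quoted above can be sketched in Python: the first offset is `num_vectors * 4`, and each later offset is the previous one plus the previous vector's stored size. The helper names are hypothetical, and the vector sizes in the example are made up for illustration.

```python
import struct


def build_offset_array(vector_sizes):
    """Serialize the offset array for vectors with the given stored sizes in bytes.

    Offsets are relative to the start of the offset array itself, so the first
    one points just past the array: num_vectors * 4.
    """
    offsets = []
    position = len(vector_sizes) * 4
    for size in vector_sizes:
        offsets.append(position)
        position += size
    return struct.pack(f"<{len(offsets)}I", *offsets)


def read_offsets(page: bytes, num_vectors: int, header_size: int = 7):
    """Read the little-endian uint32 offsets that follow the 7-byte page header."""
    return list(struct.unpack_from(f"<{num_vectors}I", page, header_size))


# Three vectors whose serialized sizes are 10, 20 and 5 bytes (illustrative numbers).
array = build_offset_array([10, 20, 5])
print(read_offsets(bytes(7) + array, 3))
```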



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+**Note:** The number of elements per vector and the packed data size are NOT stored
+in the header. They are derived:
+* Elements per vector: `vector_size` for all vectors except the last, which may be smaller.

Review Comment:
   Removed the packed data size bullet, kept the elements-per-vector clarification.
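
Neither the per-vector element count nor the packed size is stored, so a reader derives both. Below is a small Python sketch of that arithmetic; the helper names are hypothetical. `packed_size` restates the `ceil(N * bit_width / 8)` formula from the data-section table in integer arithmetic.

```python
def elements_in_vector(index: int, num_elements: int, vector_size: int) -> int:
    """Number of values in vector `index`; only the last vector may be shorter."""
    num_vectors = -(-num_elements // vector_size)  # ceil(num_elements / vector_size)
    if not 0 <= index < num_vectors:
        raise IndexError(f"vector {index} out of range [0, {num_vectors})")
    return min(vector_size, num_elements - index * vector_size)


def packed_size(num_values: int, bit_width: int) -> int:
    """Bytes taken by PackedValues: ceil(num_values * bit_width / 8)."""
    return (num_values * bit_width + 7) // 8


# A page of 2500 values with the default vector size of 1024 holds 1024 + 1024 + 452 values.
print([elements_in_vector(i, 2500, 1024) for i in range(3)])
print(packed_size(452, 13))
```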



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+Data section sizes:
+| Section             | Size Formula                | Description                  |
+|---------------------|-----------------------------|------------------------------|
+| PackedValues        | ceil(N * bit\_width / 8)    | Bit-packed delta values      |
+| ExceptionPositions  | num\_exceptions * 2 bytes   | uint16 indices of exceptions |
+| ExceptionValues     | num\_exceptions * sizeof(T) | Original float/double values |

Review Comment:
   done
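
The Python sketch below shows how the section sizes fit together by laying out one DOUBLE vector from already-encoded integers. It applies FOR against the minimum and LSB-first bit-packing (the same order as the RLE/Bit-Packing Hybrid). It then writes AlpInfo, ForInfo, PackedValues, ExceptionPositions and ExceptionValues. Names are hypothetical. The sketch assumes every exception slot in `encoded` already holds a placeholder integer; this excerpt does not say which placeholder the spec requires. One option is to reuse a non-exception encoded value so the placeholder does not widen `bit_width`.

```python
import struct


def bit_pack(values, bit_width):
    """Pack non-negative ints LSB-first into ceil(len(values) * bit_width / 8) bytes."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * bit_width)          # value i starts at bit i * bit_width
    return acc.to_bytes((len(values) * bit_width + 7) // 8, "little")


def serialize_double_vector(encoded, exceptions, e, f):
    """Lay out AlpInfo | ForInfo | PackedValues | ExceptionPositions | ExceptionValues.

    `encoded` has one integer per slot (exception slots hold a placeholder);
    `exceptions` is a list of (index, original_double) pairs in the order to write them.
    """
    frame_of_reference = min(encoded)
    deltas = [n - frame_of_reference for n in encoded]
    bit_width = max(deltas).bit_length()     # == ceil(log2(max_delta + 1))

    out = bytearray()
    out += struct.pack("<BBH", e, f, len(exceptions))         # AlpInfo, 4 bytes
    out += struct.pack("<qB", frame_of_reference, bit_width)  # ForInfo for DOUBLE, 9 bytes
    out += bit_pack(deltas, bit_width)                        # empty when bit_width == 0
    out += b"".join(struct.pack("<H", i) for i, _ in exceptions)
    out += b"".join(struct.pack("<d", v) for _, v in exceptions)
    return bytes(out)


# Slot 2 is an exception (1/3 under e=2, f=0), with 150 reused as its placeholder.
blob = serialize_double_vector([150, 225, 150, 375], [(2, 1 / 3)], e=2, f=0)
print(len(blob))  # 13-byte header + 4 packed bytes + 2 + 8
```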



##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+The compression pipeline for each vector is:
+
+```
+                    Input: float/double array
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  1. SAMPLING & PRESET GENERATION                         |
+    |     Sample vectors from column chunk                     |
+    |     Try all (exponent, factor) combinations              |
+    |     Select best k combinations for preset                |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  2. DECIMAL ENCODING                                     |
+    |     encoded[i] = round(value[i] * 10^e * 10^(-f))       |
+    |     Detect exceptions where decode(encode(v)) != v       |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  3. FRAME OF REFERENCE (FOR)                             |
+    |     min_val = min(encoded[])                             |
+    |     delta[i] = encoded[i] - min_val                      |
+    +----------------------------------------------------------+
+                              |
+                              v
+    +----------------------------------------------------------+
+    |  4. BIT PACKING                                          |
+    |     bit_width = ceil(log2(max_delta + 1))                |
+    |     Pack each delta into bit_width bits                  |
+    +----------------------------------------------------------+
+                              |
+                              v
+                   Output: Serialized vector bytes
+```
+
+#### Page Layout
+
+##### Header (7 bytes)
+
+All multi-byte values are little-endian.
+
+```
+ Byte:    0              1               2              3    4    5    6
+       +----------------+---------------+--------------+----+----+----+----+
+       | compression    | integer       | log_vector   |     num_elements  |
+       | _mode          | _encoding     | _size        |     (int32 LE)    |
+       +----------------+---------------+--------------+----+----+----+----+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | compression_mode | 1 byte | uint8 | Compression mode (must be 0 = ALP) |
+| 1 | integer_encoding | 1 byte | uint8 | Integer encoding (must be 0 = FOR + 
bit-packing) |
+| 2 | log_vector_size | 1 byte | uint8 | log2(vector\_size). Must be in \[3, 
15\]. Default: 10 (vector size 1024) |
+| 3 | num_elements | 4 bytes | int32 | Total number of floating-point values 
in the page |
+
+The number of vectors is `ceil(num_elements / vector_size)`. The last vector 
may
+contain fewer than `vector_size` elements.
+
+**Note:** The number of elements per vector and the packed data size are NOT 
stored
+in the header. They are derived:
+* Elements per vector: `vector_size` for all vectors except the last, which 
may be smaller.
+* Packed data size: `ceil(num_elements_in_vector * bit_width / 8)`.
+
+##### Offset Array
+
+Immediately following the header is an array of `num_vectors` little-endian 
uint32
+values. Each offset gives the byte position of the corresponding vector's data,
+measured from the start of the offset array itself.
+
+The first offset equals `num_vectors * 4` (pointing just past the offset 
array).
+Each subsequent offset equals the previous offset plus the stored size of the
+previous vector.
+
+##### Vector Format
+
+Each vector is self-describing and contains the encoding parameters, FOR 
metadata,
+bit-packed encoded values, and exception data.
+
+```
++-------------------+-----------------+-------------------+---------------------+-------------------+
+|      AlpInfo      |     ForInfo     |   PackedValues    | ExceptionPositions 
 | ExceptionValues   |
+|     (4 bytes)     | (5B or 9B)      |    (variable)     |     (variable)     
 |    (variable)     |
++-------------------+-----------------+-------------------+---------------------+-------------------+
+```
+
+Vector header sizes:
+| Type   | AlpInfo | ForInfo | Total Header |
+|--------|---------|---------|--------------|
+| FLOAT  | 4 bytes | 5 bytes | 9 bytes      |
+| DOUBLE | 4 bytes | 9 bytes | 13 bytes     |
+
+Data section sizes:
+| Section             | Size Formula                | Description              
    |
+|---------------------|-----------------------------|------------------------------|
+| PackedValues        | ceil(N * bit\_width / 8)    | Bit-packed delta values  
    |
+| ExceptionPositions  | num\_exceptions * 2 bytes   | uint16 indices of 
exceptions |
+| ExceptionValues     | num\_exceptions * sizeof(T) | Original float/double 
values |
+
+###### AlpInfo (4 bytes, both types)
+
+```
+ Byte:    0           1          2       3
+       +----------+----------+---------+---------+
+       | exponent |  factor  |  num_exceptions   |
+       |  (uint8) | (uint8)  |   (uint16 LE)     |
+       +----------+----------+---------+---------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | exponent | 1 byte | uint8 | Power-of-10 exponent *e*. Range: \[0, 10\] for FLOAT, \[0, 18\] for DOUBLE. |
+| 1 | factor | 1 byte | uint8 | Power-of-10 factor *f*. Range: \[0, *e*\]. |
+| 2 | num_exceptions | 2 bytes | uint16 | Number of exception values in this vector. |
+
+###### ForInfo for FLOAT (5 bytes)
+
+```
+ Byte:    0    1    2    3       4
+       +----+----+----+----+-----------+
+       | frame_of_reference | bit_width |
+       |    (int32 LE)      |  (uint8)  |
+       +----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 4 bytes | int32 | Minimum encoded integer in the vector |
+| 4 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 32\]. |
+
+###### ForInfo for DOUBLE (9 bytes)
+
+```
+ Byte:    0    1    2    3    4    5    6    7       8
+       +----+----+----+----+----+----+----+----+-----------+
+       |          frame_of_reference           | bit_width |
+       |              (int64 LE)               |  (uint8)  |
+       +----+----+----+----+----+----+----+----+-----------+
+```
+
+| Offset | Field | Size | Type | Description |
+|--------|-------|------|------|-------------|
+| 0 | frame_of_reference | 8 bytes | int64 | Minimum encoded long in the vector |
+| 8 | bit_width | 1 byte | uint8 | Bits per packed value. Range: \[0, 64\]. |
+
+###### PackedValues
+
+The FOR-encoded deltas, bit-packed into `ceil(num_elements_in_vector * bit_width / 8)` bytes.
+Values are packed from the least significant bit of each byte to the most significant bit,
+in groups of 8 values, using the same bit-packing order as the

Review Comment:
   Good catch: removed 'groups of 8' phrasing. Now simply references the same LSB-first packing order as RLE/Bit-Packing Hybrid.
   
   (I think RleBitPackHybrid was in my mind at that point :) )
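Here is a sketch of how a reader might parse the AlpInfo/ForInfo tables quoted above. It assumes the fields appear back to back with no padding, as the byte diagrams show. `parse_vector_header` is a hypothetical name, and the range checks simply mirror the quoted tables.

```python
import struct

# Layouts from the quoted AlpInfo / ForInfo tables ('<' = little-endian, no padding).
ALP_INFO = struct.Struct("<BBH")                  # exponent, factor, num_exceptions
FOR_INFO = {"FLOAT": struct.Struct("<iB"),        # int32 frame_of_reference, bit_width
            "DOUBLE": struct.Struct("<qB")}       # int64 frame_of_reference, bit_width
MAX_EXPONENT = {"FLOAT": 10, "DOUBLE": 18}
MAX_BIT_WIDTH = {"FLOAT": 32, "DOUBLE": 64}


def parse_vector_header(buf, offset, ptype):
    """Return (header fields, offset of PackedValues). Hypothetical helper."""
    exponent, factor, num_exceptions = ALP_INFO.unpack_from(buf, offset)
    offset += ALP_INFO.size
    frame_of_reference, bit_width = FOR_INFO[ptype].unpack_from(buf, offset)
    offset += FOR_INFO[ptype].size
    # Range checks mirroring the quoted tables.
    if exponent > MAX_EXPONENT[ptype]:
        raise ValueError(f"exponent {exponent} out of range for {ptype}")
    if factor > exponent:
        raise ValueError("factor must be in [0, exponent]")
    if bit_width > MAX_BIT_WIDTH[ptype]:
        raise ValueError(f"bit_width {bit_width} out of range for {ptype}")
    fields = {"exponent": exponent, "factor": factor,
              "num_exceptions": num_exceptions,
              "frame_of_reference": frame_of_reference,
              "bit_width": bit_width}
    return fields, offset
```

The returned offset is 9 bytes past the start for FLOAT and 13 bytes for DOUBLE, which matches the "Total Header" column of the vector header size table.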

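Pipeline steps 3 and 4 in the spec are FOR followed by `bit_width = ceil(log2(max_delta + 1))`. Neither needs a floating-point logarithm: for a non-negative integer `m`, `ceil(log2(m + 1))` equals the number of bits needed to represent `m`. Below is a hypothetical helper. It assumes exception slots have already been filled with some in-range placeholder, so exceptions do not widen the frame.

```python
def apply_frame_of_reference(encoded, max_bit_width):
    """Subtract the vector minimum and size the deltas (hypothetical helper).

    encoded: per-vector integers from decimal encoding.
    max_bit_width: 32 for FLOAT, 64 for DOUBLE, per the ForInfo tables.
    """
    base = min(encoded)                      # stored as frame_of_reference
    deltas = [x - base for x in encoded]     # all deltas are >= 0
    # ceil(log2(max_delta + 1)) == max_delta.bit_length() for ints >= 0,
    # and this avoids float rounding issues for large 64-bit deltas.
    bit_width = max(deltas).bit_length()
    if bit_width > max_bit_width:
        raise ValueError("deltas do not fit in the allowed bit_width range")
    return base, bit_width, deltas
```

When every encoded integer is equal, this yields `bit_width == 0`, which is the case where no PackedValues bytes are stored.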

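This thread discusses the packing order. LSB-first packing, as in the RLE/Bit-Packing Hybrid encoding, is a single continuous bit stream, so no explicit grouping is needed to describe it. The sketch below works one bit at a time for clarity rather than speed, and the helper names are hypothetical. Packing the values 0 through 7 at bit width 3 gives the bits `10001000 11000110 11111010`, matching the bit-width-3 example given for the RLE/Bit-Packing Hybrid encoding.

```python
def pack_lsb_first(values, bit_width):
    """Pack non-negative ints into ceil(len * bit_width / 8) bytes, LSB-first."""
    out = bytearray((len(values) * bit_width + 7) // 8)
    bit = 0
    for v in values:
        for b in range(bit_width):
            if (v >> b) & 1:
                out[bit >> 3] |= 1 << (bit & 7)  # fill each byte from its LSB upward
            bit += 1
    return bytes(out)


def unpack_lsb_first(data, count, bit_width):
    """Inverse of pack_lsb_first; bit_width 0 yields all zeros from empty data."""
    values = []
    bit = 0
    for _ in range(count):
        v = 0
        for b in range(bit_width):
            if (data[bit >> 3] >> (bit & 7)) & 1:
                v |= 1 << b
            bit += 1
        values.append(v)
    return values


packed = pack_lsb_first(list(range(8)), 3)
```

Equivalently, the packed bytes are the little-endian encoding of `sum(v_i << (i * bit_width))`. This is a convenient way to cross-check a faster word-at-a-time implementation.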

##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+###### PackedValues
+
+The FOR-encoded deltas, bit-packed into `ceil(num_elements_in_vector * bit_width / 8)` bytes.
+Values are packed from the least significant bit of each byte to the most significant bit,
+in groups of 8 values, using the same bit-packing order as the
+[RLE/Bit-Packing Hybrid](#RLE) encoding.
+
+If `bit_width` is 0, no bytes are stored (all deltas are zero, meaning all encoded
+integers are equal to `frame_of_reference`).
+
+###### ExceptionPositions
+
+An array of `num_exceptions` little-endian uint16 values, each giving
+the 0-based index within the vector of an exception value.
+
+###### ExceptionValues
+
+An array of `num_exceptions` values in the original floating-point type
+(4 bytes little-endian IEEE 754 for FLOAT, 8 bytes for DOUBLE), stored in
+the same order as the corresponding positions.
+
+#### Encoding
+
+##### Encoding Formula
+
+```
++-------------------------------------------------------------------+
+|                                                                   |
+|   encoded = round( value  *  10^e  *  10^(-f) )                  |
+|                                                                   |
+|   decoded = encoded  *  10^f  *  10^(-e)                          |
+|                                                                   |
++-------------------------------------------------------------------+
+```
+
+The encoding uses two separate multiplications (not a single multiplication by
+`10^(e-f)`, and not division) to ensure that implementations produce identical
+floating-point rounding across languages. The powers of 10 MUST be stored as
+precomputed floating-point constants (i.e., literal values like `1e-3f`), not

Review Comment:
   You're right: this shouldn't mandate literals vs runtime computation. Reworded to require that encoder and decoder use identical power-of-10 values, without prescribing how they're obtained.

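The encoding formula and exception rule quoted above can be illustrated for DOUBLE with the sketch below. The following are assumptions of this sketch, not statements of the spec:

- The power-of-10 tables are built by parsing decimal literals, so they equal constants like `1e-3`.
- Rounding uses the magic-number trick from the Fast Rounding section.
- Exceptions are detected by comparing bit patterns, so `-0.0` and NaN become exceptions.
- Exception slots are filled with the first non-exception integer so they do not widen the FOR range.

FLOAT would need float32 arithmetic and is not shown. `alp_encode_double` is a hypothetical name.

```python
import math
import struct

# Parsing "1eN" / "1e-N" gives the correctly rounded literal values (assumption:
# these match whatever constants the decoder uses).
POW10 = [float(f"1e{i}") for i in range(19)]
NEG_POW10 = [float(f"1e-{i}") for i in range(19)]
MAGIC = 2.0 ** 52 + 2.0 ** 51  # assumed DOUBLE rounding constant


def _bits(x):
    return struct.pack("<d", x)


def alp_encode_double(values, e, f):
    """Return (encoded ints, exception positions, exception values)."""
    encoded = [None] * len(values)
    exc_positions, exc_values = [], []
    for i, v in enumerate(values):
        scaled = v * POW10[e] * NEG_POW10[f]          # two separate multiplications
        ok = math.isfinite(scaled) and abs(scaled) < 2.0 ** 51
        if ok:
            n = int((scaled + MAGIC) - MAGIC)         # round half to even
            # Lossless only if decoding reproduces the exact bit pattern.
            ok = _bits(n * POW10[f] * NEG_POW10[e]) == _bits(v)
        if ok:
            encoded[i] = n
        else:
            exc_positions.append(i)
            exc_values.append(v)
    # Placeholder for exception slots (assumption, see lead-in).
    fill = next((x for x in encoded if x is not None), 0)
    return [fill if x is None else x for x in encoded], exc_positions, exc_values
```

Choosing `e` and `f` is the sampling step described in the pipeline and is left out here.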

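The inverse direction is sketched below, again only for DOUBLE and with the same assumed literal-parsed power-of-10 tables. It undoes FOR, applies `decoded = encoded * 10^f * 10^(-e)` with the two multiplications in that order, and then overwrites exception slots with the stored originals. `alp_decode_double` is a hypothetical name.

```python
# Same assumed tables as the encoder: identical values on both sides.
POW10 = [float(f"1e{i}") for i in range(19)]
NEG_POW10 = [float(f"1e-{i}") for i in range(19)]


def alp_decode_double(deltas, frame_of_reference, e, f, exc_positions, exc_values):
    """Rebuild one DOUBLE vector from unpacked deltas plus exceptions."""
    out = []
    for d in deltas:
        encoded = frame_of_reference + d                   # undo FOR
        out.append(encoded * POW10[f] * NEG_POW10[e])      # (encoded * 10^f) * 10^-e
    for pos, val in zip(exc_positions, exc_values):
        out[pos] = val                                     # patch exceptions
    return out
```

Patching runs after the bulk decode, so the hot loop stays branch-free. Exceptions only cost work proportional to `num_exceptions`.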

##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+#### Overview
+
+ALP encoding consists of a page-level header followed by an offset array and one
+or more encoded vectors (batches of values). Each vector contains up to
+`vector_size` elements (default 1024).

Review Comment:
   Good call: introduced exponent, factor, and exceptions in the opening paragraph itself so the reader has the mental model before hitting the layout details.

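The Overview mentions `vector_size`, and the spec also defines a 7-byte page header with the fields `compression_mode`, `integer_encoding`, `log_vector_size` and `num_elements`. A reader could connect the two by deriving the vector count and per-vector element counts as below. This is a sketch with hypothetical names, and it assumes the header fields are packed without padding.

```python
import struct

PAGE_HEADER = struct.Struct("<BBBi")  # 7 bytes, little-endian, no padding


def parse_page_header(page):
    """Return (vector_size, per-vector element counts, offset of offset array)."""
    compression_mode, integer_encoding, log_vector_size, num_elements = (
        PAGE_HEADER.unpack_from(page, 0))
    if compression_mode != 0 or integer_encoding != 0:
        raise ValueError("only ALP with FOR + bit-packing (0, 0) is defined")
    if not 3 <= log_vector_size <= 15:
        raise ValueError("log_vector_size must be in [3, 15]")
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    vector_size = 1 << log_vector_size
    num_vectors = -(-num_elements // vector_size)  # ceil(num_elements / vector_size)
    # Every vector is full except possibly the last one.
    counts = [min(vector_size, num_elements - i * vector_size)
              for i in range(num_vectors)]
    return vector_size, counts, PAGE_HEADER.size
```

The per-vector counts are what make `ceil(N * bit_width / 8)` computable for each vector, since `N` itself is not stored.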


##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+ALP works by converting floating-point values to integers using decimal scaling,
+then applying Frame of Reference (FOR) encoding and bit-packing. Values that
+cannot be losslessly converted are stored as exceptions. The encoding achieves
+high compression for decimal-like floating-point data (e.g., monetary values,
+sensor readings) while remaining fully lossless.

Review Comment:
   Agreed: added a sentence about value independence enabling random access and parallel encode/decode.

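The value-independence point in this comment can be made concrete. Because delta `i` starts at bit `i * bit_width`, a reader can decode one element of a vector without unpacking the rest. Below is a hypothetical sketch for a single DOUBLE vector. It assumes the vector's bytes have already been located via the offset array and that `n`, its element count, is known from the page header. It also reuses the assumed literal-parsed power-of-10 tables.

```python
import struct

POW10 = [float(f"1e{i}") for i in range(19)]
NEG_POW10 = [float(f"1e-{i}") for i in range(19)]


def read_double_at(vec, n, i):
    """Decode element i of one serialized DOUBLE vector holding n values."""
    e, f, num_exc = struct.unpack_from("<BBH", vec, 0)       # AlpInfo
    base, bit_width = struct.unpack_from("<qB", vec, 4)       # ForInfo
    packed_start = 13                                         # 4 + 9 header bytes
    pos_start = packed_start + (n * bit_width + 7) // 8
    val_start = pos_start + 2 * num_exc
    # An exception overrides whatever placeholder sits in the packed slot.
    positions = struct.unpack_from(f"<{num_exc}H", vec, pos_start)
    if i in positions:
        k = positions.index(i)
        return struct.unpack_from("<d", vec, val_start + 8 * k)[0]
    # Read bit_width bits starting at bit i * bit_width, LSB-first.
    delta = 0
    for b in range(bit_width):
        bit = i * bit_width + b
        if (vec[packed_start + (bit >> 3)] >> (bit & 7)) & 1:
            delta |= 1 << b
    return (base + delta) * POW10[f] * NEG_POW10[e]
```

The linear scan over exception positions is fine for a sketch. A real reader might binary-search them if it can rely on them being sorted, which the quoted text does not state.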


##########
Encodings.md:
##########
@@ -391,3 +363,522 @@ After applying the transformation, the data has the following representation:
+The encoding uses two separate multiplications (not a single multiplication by
+`10^(e-f)`, and not division) to ensure that implementations produce identical
+floating-point rounding across languages. The powers of 10 MUST be stored as
+precomputed floating-point constants (i.e., literal values like `1e-3f`), not
+computed at runtime.
+
+##### Fast Rounding
+
+The rounding function uses a "magic number" technique for branchless rounding:
+
+| Type   | Magic Number                      | Formula                          |
+|--------|-----------------------------------|----------------------------------|
+| FLOAT  | 2^22 + 2^23 = 12,582,912          | `(int)((value + magic) - magic)` |

Review Comment:
   done
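The magic-number rounding in the table above relies on IEEE 754 round-to-nearest-even during the addition. Once `value + magic` lands in a range where representable numbers are exactly 1 apart, the fractional part is rounded away, and subtracting `magic` recovers the rounded integer. Ties go to even, so 2.5 rounds to 2.

This only holds for magnitudes below about 2^22 for FLOAT and 2^51 for DOUBLE. The DOUBLE constant is not visible in the quoted hunk; `2^52 + 2^51` is the conventional value for this trick and is an assumption here. Python has no float32 type, so the sketch emulates float32 rounding with `struct` round-trips after each operation.

```python
import struct

FLOAT_MAGIC = 2.0 ** 22 + 2.0 ** 23   # 12,582,912, exactly representable as float32
DOUBLE_MAGIC = 2.0 ** 52 + 2.0 ** 51  # assumed DOUBLE constant (see lead-in)


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fast_round_float(value):
    # Emulates (int)((value + magic) - magic) evaluated in float32.
    # Valid for roughly |value| <= 2^22.
    v = to_f32(value)
    return int(to_f32(to_f32(v + FLOAT_MAGIC) - FLOAT_MAGIC))


def fast_round_double(value):
    # Valid for |value| < 2^51; ties round to even.
    return int((value + DOUBLE_MAGIC) - DOUBLE_MAGIC)
```

Computing a float32 sum in binary64 and then rounding it to binary32 gives the same result as a native float32 addition, because binary64 has more than twice the precision of binary32. That is why the emulation is faithful here.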



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
