klion26 commented on PR #8031: URL: https://github.com/apache/arrow-rs/pull/8031#issuecomment-3695333270
As this is included in #8480, which wants to finalize Variant type support in Parquet, I want to push this forward to a final state.

# Conclusion

After playing with some benchmarks, I have these conclusions:

- We can add a `HeaderWriter` for the object/list builders (the header writer is generic over const parameters). `ListBuilder` and `ObjectBuilder` use the same header writer, and it improves the new benchmark (described below) by 3%~4%. A minimal dispatch sketch follows the benchmark results below.
- We need benchmarks that better isolate the object/list builders' own work (currently, more than 50% of the time goes to JSON-related work).

CC @alamb @scovich

# Details

- I re-ran the existing benchmarks for multiple solutions (including the const-generic one) and found that there wasn't much difference in the results.
- I used `sample` and the [`flamegraph` tools](https://github.com/brendangregg/FlameGraph) to generate a flamegraph, which shows that `ObjectBuilder::finish` only consumes ~2% of the time, JSON-related work consumes ~54%, `ObjectBuilder::insert` ~14%, and `ObjectBuilder::append_value` ~15% (flamegraph attached below).
- I added a new benchmark (described below) that excludes the JSON deserialization work for the `ObjectBuilder`; it shows that `ObjectBuilder::finish` consumes ~27% and `ObjectBuilder::append_value` ~62%.
- I ran a benchmark for a single-depth object builder with 100/200 fields, which shows that the new solution wins with a 3%~4% improvement.
- I did not add a benchmark for `ObjectBuilder::finish` alone because `append_value` and `finish` are a single unit for the caller.

## Benchmark

The benchmark results:

- `optimize-with-header-writer-const-generic` is the optimized header writer with const generics (the branch is [here](https://github.com/klion26/arrow-rs/blob/b7c3078937f6e511a9e76c4b45e423da60585322/parquet-variant/src/builder/object.rs#L27-L68); sample code is attached below).
- `main-with-object-bench` is the main branch plus the new object builder benchmark (the generator code is [here](https://github.com/klion26/arrow-rs/blob/7d176cc8dbb89f558df760d64bfe20f45aff16b4/parquet-variant-compute/benches/variant_kernels.rs#L449-L503), the benchmark code is [here](https://github.com/klion26/arrow-rs/blob/7d176cc8dbb89f558df760d64bfe20f45aff16b4/parquet-variant-compute/benches/variant_kernels.rs#L75-L110); sample code below).
- `optimize-with-packedu32iterator` is based on `main-with-object-bench` and uses the `PackedU32Iterator` ([PackedU32Iterator](https://github.com/klion26/arrow-rs/blob/5ddfba050d475026c5545ba66d675c0335986887/parquet-variant/src/builder.rs#L62-L100) and [the caller logic](https://github.com/klion26/arrow-rs/blob/5ddfba050d475026c5545ba66d675c0335986887/parquet-variant/src/builder/object.rs#L253-L398); sample code below).
- The results show that both `optimize-with-header-writer-const-generic` and `optimize-with-packedu32iterator` are better than `main` (`optimize-with-packedu32iterator` is slightly better), but since `optimize-with-header-writer-const-generic` is easier to understand (and maintain), I lean toward `optimize-with-header-writer-const-generic`.

```
group                                                                                          optimize-with-header-writer-const-generic    main-with-object-bench        optimize-with-packedu32iterator
-----                                                                                          ------------------------------------------   -----------------------       --------------------------------
batch_json_string_to_variant json_list 8k string                                               1.00    27.5±0.89ms ? ?/sec                  1.09    29.9±0.81ms ? ?/sec    1.03    28.3±0.76ms ? ?/sec
batch_json_string_to_variant object - 1 depth(8000-100) random_json(2436 bytes per document)   1.01   204.1±4.21ms ? ?/sec                  1.03   208.6±4.17ms ? ?/sec    1.00   201.8±4.59ms ? ?/sec
batch_json_string_to_variant object - 1 depth(8000-200) random_json(4871 bytes per document)   1.02   389.8±7.24ms ? ?/sec                  1.04   399.6±7.10ms ? ?/sec    1.00   383.8±7.35ms ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)                              1.01   328.1±5.06ms ? ?/sec                  1.04   337.8±6.06ms ? ?/sec    1.00   325.6±5.64ms ? ?/sec
batch_json_string_to_variant repeated_struct 8k string                                         1.06    11.0±0.35ms ? ?/sec                  1.05    11.0±0.33ms ? ?/sec    1.00    10.4±0.28ms ? ?/sec
variant_get_primitive                                                                          1.01  1730.8±49.77ns ? ?/sec                 1.03  1759.5±53.97ns ? ?/sec   1.00  1712.1±57.32ns ? ?/sec
variant_get_shredded_utf8                                                                      1.02  1248.2±43.11ns ? ?/sec                 1.04  1276.2±26.61ns ? ?/sec   1.00  1225.9±38.09ns ? ?/sec
```
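To illustrate the `HeaderWriter` idea from the conclusion, here is a minimal, self-contained sketch (not the branch code; `append_packed_u32`, `write_offsets`, and `write_offset_section` are hypothetical helpers written for this comment). It shows how a runtime byte width can be mapped once onto a monomorphized const-generic function, so the per-element packing loop carries no width branching:

```
/// Append only the low `SIZE` little-endian bytes of `value`.
fn append_packed_u32<const SIZE: usize>(dest: &mut Vec<u8>, value: u32) {
    dest.extend_from_slice(&value.to_le_bytes()[..SIZE]);
}

/// Monomorphized body: the byte width is a compile-time constant here.
fn write_offsets<const SIZE: usize>(dst: &mut Vec<u8>, offsets: &[u32]) {
    for &off in offsets {
        append_packed_u32::<SIZE>(dst, off);
    }
}

/// Runtime dispatch: pick the monomorphized variant once per header.
fn write_offset_section(dst: &mut Vec<u8>, offset_size: usize, offsets: &[u32]) {
    match offset_size {
        1 => write_offsets::<1>(dst, offsets),
        2 => write_offsets::<2>(dst, offsets),
        3 => write_offsets::<3>(dst, offsets),
        4 => write_offsets::<4>(dst, offsets),
        _ => unreachable!("variant offsets are 1..=4 bytes wide"),
    }
}

fn main() {
    let mut buf = Vec::new();
    // A real builder would derive `offset_size` from the largest offset; 2 bytes here.
    write_offset_section(&mut buf, 2, &[1, 300, 65_535]);
    assert_eq!(buf, vec![1, 0, 44, 1, 255, 255]);
}
```

The `ObjectHeaderWriter` snippet in the Ref section below applies the same const-generic packing to the field ids and offsets of an object header.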
# Ref

<details>
<summary>new benchmark logic</summary>

```
write!(output_buffer, "{{").unwrap();
for i in 0..*max_fields {
    let key_length = rng.random_range(1..=20);
    let key: String = (0..key_length)
        .map(|_| rng.sample(Alphanumeric) as char)
        .collect();
    write!(output_buffer, "\"{key}\":").unwrap();

    let total_weight = *null_weight + *string_weight + *number_weight + *boolean_weight;
    // Generate a random number to determine the type
    let mut random_value: usize = rng.random_range(0..total_weight);
    if random_value <= *null_weight {
        write!(output_buffer, "null").unwrap();
    } else {
        random_value -= *null_weight;
        if random_value <= *string_weight {
            // Generate a random string between 1 and 20 characters
            let length = rng.random_range(1..=20);
            let random_string: String = (0..length)
                .map(|_| rng.sample(Alphanumeric) as char)
                .collect();
            write!(output_buffer, "\"{random_string}\"").unwrap();
        } else {
            random_value -= *string_weight;
            if random_value <= *number_weight {
                // 50% chance of generating an integer or a float
                if rng.random_bool(0.5) {
                    // Generate a random integer
                    let random_integer: i64 = rng.random_range(-1000..1000);
                    write!(output_buffer, "{random_integer}").unwrap();
                } else {
                    // Generate a random float
                    let random_float: f64 = rng.random_range(-1000.0..1000.0);
                    write!(output_buffer, "{random_float}").unwrap();
                }
            } else {
                random_value -= *number_weight;
                if random_value <= *boolean_weight {
                    // Generate a random boolean
                    let random_boolean: bool = rng.random();
                    write!(output_buffer, "{random_boolean}").unwrap();
                }
            }
        }
    }
    if i < *max_fields - 1 {
        write!(output_buffer, ",").unwrap();
    }
}
write!(output_buffer, "}}").unwrap();
```
</details>

<details>
<summary>PackedU32Iterator</summary>

```
struct PackedU32Iterator<const PACKED_BYTES: usize, T: Iterator<Item = [u8; 4]>> {
    iterator: T,
    current_item: [u8; 4],
    current_byte: usize, // 0..PACKED_BYTES
}

impl<const PACKED_BYTES: usize, T: Iterator<Item = [u8; 4]>> PackedU32Iterator<PACKED_BYTES, T> {
    fn new(packed_bytes: usize, iterator: T) -> Self {
        // eliminate corner cases in `next` by initializing
        // with a fake already-consumed "first" item
        Self {
            iterator,
            current_item: [0; 4],
            current_byte: packed_bytes,
        }
    }
}

impl<const PACKED_BYTES: usize, T: Iterator<Item = [u8; 4]>> Iterator
    for PackedU32Iterator<PACKED_BYTES, T>
{
    type Item = u8;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current_byte >= PACKED_BYTES {
            self.current_item = self.iterator.next()?;
            self.current_byte = 0;
        }
        let rval = self.current_item[self.current_byte];
        self.current_byte += 1;
        Some(rval)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let lower =
            (PACKED_BYTES - self.current_byte) + PACKED_BYTES * self.iterator.size_hint().0;
        (lower, Some(lower))
    }
}
```
</details>
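As a usage illustration only (not taken from the branch), the sketch below assumes the `PackedU32Iterator` definition above is in scope and shows how 3-byte-packed offsets can be streamed into an output buffer with a single `Vec::extend`:

```
fn main() {
    let offsets: Vec<u32> = vec![0, 300, 70_000];

    // Expose each u32 as its full little-endian bytes; the iterator keeps
    // only the low PACKED_BYTES (here 3) bytes of each item.
    let packed = PackedU32Iterator::<3, _>::new(3, offsets.iter().map(|o| o.to_le_bytes()));

    let mut dst: Vec<u8> = Vec::new();
    dst.extend(packed);

    // Each u32 contributes exactly its 3 low-order little-endian bytes.
    assert_eq!(dst, vec![0, 0, 0, 44, 1, 0, 112, 17, 1]);
}
```

The intent is that the builder can extend the buffer from one iterator of already-packed bytes instead of pushing each truncated `u32` separately.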
<details>
<summary>header writer with const generic</summary>

```
fn object_header<const LARGE_BIT: u8, const ID_SIZE: u8, const OFFSET_SIZE: u8>() -> u8 {
    (LARGE_BIT << (BASIC_TYPE_BITS + 4))
        | ((ID_SIZE - 1) << (BASIC_TYPE_BITS + 2))
        | ((OFFSET_SIZE - 1) << BASIC_TYPE_BITS)
        | VariantBasicType::Object as u8
}

struct ObjectHeaderWriter<const OFFSET_SIZE: u8, const ID_SIZE: u8>();

impl<const OFFSET_SIZE: u8, const ID_SIZE: u8> ObjectHeaderWriter<OFFSET_SIZE, ID_SIZE> {
    fn write(
        dst: &mut Vec<u8>,
        num_fields: usize,
        field_ids: impl Iterator<Item = u32>,
        offsets: impl Iterator<Item = usize>,
        data_size: usize,
    ) {
        let header = match num_fields > u8::MAX as usize {
            true => object_header::<1, { ID_SIZE }, { OFFSET_SIZE }>(),
            false => object_header::<0, { ID_SIZE }, { OFFSET_SIZE }>(),
        };
        dst.push(header);
        append_packed_u32::<ID_SIZE>(dst, num_fields as u32);
        for id in field_ids {
            append_packed_u32::<ID_SIZE>(dst, id);
        }
        for off in offsets {
            append_packed_u32::<OFFSET_SIZE>(dst, off as u32);
        }
        append_packed_u32::<OFFSET_SIZE>(dst, data_size as u32);
    }
}

// Append only the low SIZE bytes of the little-endian encoding of `value`.
fn append_packed_u32<const SIZE: u8>(dest: &mut Vec<u8>, value: u32) {
    let len = dest.len() + SIZE as usize;
    dest.extend(value.to_le_bytes());
    dest.truncate(len);
}
```
</details>

> The flamegraph was obtained with these steps:
> 1. run `cargo bench --no-run` to get the benchmark executable
> 2. run the executable with the appropriate benchmark arguments
> 3. run `sudo sample $pid 60 -file /tmp/variant_kernels.sample` to generate the sample file
> 4. use the flamegraph tools to turn the sample file into a flamegraph
>
> I used these steps because I can't install Xcode for now :(

The flamegraph with JSON deserialization (image attached in the GitHub comment)

The flamegraph without JSON deserialization (image attached in the GitHub comment)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
