andygrove opened a new issue, #3225:
URL: https://github.com/apache/datafusion-comet/issues/3225

   ## Summary
   
   PR #3224 implements field-major processing for struct fields, which reduces type dispatch from O(rows × fields) to O(fields). However, fields with complex nested types (Struct, List, or Map inside a struct) still fall back to row-major processing via `append_field`.
   
   This issue tracks extending the field-major optimization to nested Struct 
fields specifically.
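
The row-major/field-major distinction can be made concrete with a small, self-contained toy sketch. It does not use Comet or Arrow. `DataType`, `Column`, `slot`, `row_major`, and `field_major` are hypothetical names, and each row is assumed to be a byte buffer with one 8-byte little-endian slot per field, loosely modeled on the fixed-width region of Spark's UnsafeRow (null bits are omitted). The two functions differ only in where the `match` on the column type sits: inside the per-row loop, or outside it.

```rust
use std::convert::TryInto;

/// Hypothetical primitive types for the toy example.
#[derive(Clone, Copy, Debug)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Toy stand-in for a typed column builder.
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

impl Column {
    fn new(dt: DataType) -> Self {
        match dt {
            DataType::Int32 => Column::Int32(Vec::new()),
            DataType::Int64 => Column::Int64(Vec::new()),
            DataType::Float64 => Column::Float64(Vec::new()),
        }
    }
}

/// Reads the 8-byte slot of field `idx` from a toy fixed-width row.
fn slot(row: &[u8], idx: usize) -> [u8; 8] {
    row[idx * 8..idx * 8 + 8].try_into().unwrap()
}

/// Decodes an i32 stored in the low 4 bytes of a slot.
fn lo_i32(s: [u8; 8]) -> i32 {
    i32::from_le_bytes(s[..4].try_into().unwrap())
}

/// Row-major: one type dispatch per (row, field) pair, i.e. O(rows × fields).
fn row_major(schema: &[DataType], rows: &[Vec<u8>]) -> Vec<Column> {
    let mut cols: Vec<Column> = schema.iter().map(|&dt| Column::new(dt)).collect();
    for row in rows {
        for (idx, col) in cols.iter_mut().enumerate() {
            let s = slot(row, idx);
            match col {
                Column::Int32(v) => v.push(lo_i32(s)),
                Column::Int64(v) => v.push(i64::from_le_bytes(s)),
                Column::Float64(v) => v.push(f64::from_le_bytes(s)),
            }
        }
    }
    cols
}

/// Field-major: one type dispatch per field, i.e. O(fields); each inner
/// loop then handles a single concrete type for all rows.
fn field_major(schema: &[DataType], rows: &[Vec<u8>]) -> Vec<Column> {
    let mut cols: Vec<Column> = schema.iter().map(|&dt| Column::new(dt)).collect();
    for (idx, col) in cols.iter_mut().enumerate() {
        match col {
            Column::Int32(v) => v.extend(rows.iter().map(|r| lo_i32(slot(r, idx)))),
            Column::Int64(v) => v.extend(rows.iter().map(|r| i64::from_le_bytes(slot(r, idx)))),
            Column::Float64(v) => v.extend(rows.iter().map(|r| f64::from_le_bytes(slot(r, idx)))),
        }
    }
    cols
}
```

Both functions return the same columns for the same input; field-major only hoists the dispatch out of the row loop.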
   
   ## Current Behavior
   
   In `append_struct_fields_field_major()` (row.rs), complex types fall back to 
per-row processing:
   
   ```rust
   // For complex types (struct, list, map), fall back to append_field
   // since they have their own nested processing logic
   dt @ (DataType::Struct(_) | DataType::List(_) | DataType::Map(_, _)) => {
       for (row_idx, i) in (row_start..row_end).enumerate() {
           let nested_row = if struct_is_null[row_idx] {
               SparkUnsafeRow::default()
           } else {
               // ... extract nested row
           };
           append_field(dt, struct_builder, &nested_row, field_idx)?;
       }
   }
   ```
   
   This means that for deeply nested structs, the benefit of field-major processing is lost at every nesting level below the top.
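
One rough way to see what is lost is to count type dispatches under a simplified cost model. The sketch below is hypothetical: `Field`, `row_major_dispatches`, `fallback_dispatches`, and `recursive_dispatches` are illustrative names, a "dispatch" means one `match` on a field's type, and the fallback is assumed to re-dispatch on the nested struct and every field beneath it for each row, since `append_field` is called once per row in the snippet above.

```rust
/// Hypothetical schema tree, used only for counting.
enum Field {
    Leaf,
    Struct(Vec<Field>),
}

/// Row-major: every field in the subtree is dispatched once per row.
fn row_major_dispatches(fields: &[Field], rows: u64) -> u64 {
    fields
        .iter()
        .map(|f| match f {
            Field::Leaf => rows,
            Field::Struct(children) => rows + row_major_dispatches(children, rows),
        })
        .sum()
}

/// Assumed current behavior: each top-level field is dispatched once, but a
/// nested struct takes the fallback arm (1 dispatch) and then `append_field`
/// dispatches on the struct and its whole subtree for every row.
fn fallback_dispatches(fields: &[Field], rows: u64) -> u64 {
    fields
        .iter()
        .map(|f| match f {
            Field::Leaf => 1,
            Field::Struct(children) => 1 + rows + row_major_dispatches(children, rows),
        })
        .sum()
}

/// Proposed: field-major at every level, so the count does not depend on rows.
fn recursive_dispatches(fields: &[Field]) -> u64 {
    fields
        .iter()
        .map(|f| match f {
            Field::Leaf => 1,
            Field::Struct(children) => 1 + recursive_dispatches(children),
        })
        .sum()
}
```

For two top-level primitives plus a struct holding two primitives and a one-field struct, with 1,000 rows, this model gives 7,000 dispatches if every level is row-major, 5,003 with the fallback, and 7 when every level is field-major. The counts cover only type dispatch; copying values is still O(rows × leaf fields) in every case.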
   
   ## Proposed Optimization
   
   For nested Struct fields:
   1. Get the nested `StructBuilder` once per field
   2. Build nested struct validity in one pass
   3. Recursively apply field-major processing to nested struct fields
   
   This would require refactoring to separate validity handling from field 
value processing.
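
The three proposed steps can be sketched on toy types. This is a minimal, hypothetical model rather than Comet's code: `DataType`, `Value`, `Builder`, and `append_struct_fields` stand in for Arrow data types, `SparkUnsafeRow` field access, and Arrow's `StructBuilder` and primitive builders. Each level receives one `Option` per row (`None` when the enclosing struct is null), so a nested struct's validity is computed in one pass, separately from the recursive processing of its fields.

```rust
/// Hypothetical schema type: only 64-bit integers and structs.
enum DataType {
    Int64,
    Struct(Vec<DataType>),
}

/// Toy stand-in for reading a field out of a row.
enum Value {
    Null,
    Int64(i64),
    Struct(Vec<Value>),
}

/// Toy stand-in for Arrow's primitive and struct builders.
#[derive(Debug, PartialEq)]
enum Builder {
    Int64 { values: Vec<i64>, validity: Vec<bool> },
    Struct { children: Vec<Builder>, validity: Vec<bool> },
}

impl Builder {
    fn new(dt: &DataType) -> Self {
        match dt {
            DataType::Int64 => Builder::Int64 { values: Vec::new(), validity: Vec::new() },
            DataType::Struct(fields) => Builder::Struct {
                children: fields.iter().map(Builder::new).collect(),
                validity: Vec::new(),
            },
        }
    }
}

/// Field-major append of a struct's fields. `rows[i]` is `None` when the
/// i-th struct value is null, otherwise the values of its fields.
fn append_struct_fields(fields: &[DataType], builders: &mut [Builder], rows: &[Option<&[Value]>]) {
    for (idx, (dt, builder)) in fields.iter().zip(builders.iter_mut()).enumerate() {
        // One type dispatch per field, independent of the number of rows.
        match (dt, builder) {
            (DataType::Int64, Builder::Int64 { values, validity }) => {
                for &row in rows {
                    match row.map(|r| &r[idx]) {
                        Some(Value::Int64(x)) => {
                            values.push(*x);
                            validity.push(true);
                        }
                        // Null field, or the enclosing struct is null.
                        Some(Value::Null) | None => {
                            values.push(0);
                            validity.push(false);
                        }
                        Some(_) => panic!("value does not match schema"),
                    }
                }
            }
            // Step 1: the nested builder is obtained once per field, by this arm.
            (DataType::Struct(child_fields), Builder::Struct { children, validity }) => {
                let nested: Vec<Option<&[Value]>> = rows
                    .iter()
                    .map(|&row| match row.map(|r| &r[idx]) {
                        Some(Value::Struct(v)) => Some(v.as_slice()),
                        Some(Value::Null) | None => None,
                        Some(_) => panic!("value does not match schema"),
                    })
                    .collect();
                // Step 2: nested struct validity in one pass.
                validity.extend(nested.iter().map(|r| r.is_some()));
                // Step 3: recurse field-major into the nested struct's fields.
                append_struct_fields(child_fields, children, &nested);
            }
            _ => panic!("builder does not match schema"),
        }
    }
}
```

Because a null parent is passed down as `None`, its children are appended as nulls, which keeps each child builder the same length as its parent, as an Arrow struct array expects.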
   
   ## Expected Impact
   
   - 1.2-1.5x speedup for workloads with deeply nested struct types
   - Benefit multiplies with nesting depth
   
   ## Notes
   
   - List and Map fields are harder to optimize due to variable-length elements 
per row
   - This is a follow-up to PR #3224, which implemented the initial field-major optimization

