andygrove opened a new issue, #3225:
URL: https://github.com/apache/datafusion-comet/issues/3225
## Summary
PR #3224 implements field-major processing for struct fields, which moves
type dispatch from O(rows × fields) to O(fields). However, for complex nested
types (Struct, List, Map inside a struct), it falls back to row-major
processing via `append_field`.
This issue tracks extending the field-major optimization to nested Struct
fields specifically.
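The O(rows × fields) versus O(fields) distinction can be made concrete with a minimal, self-contained sketch. It uses a toy model, not Comet's actual types. The hypothetical `DataType` and `Column` enums stand in for Arrow types and builders, and each input row is a vector of 8-byte slots, loosely mirroring the fixed-width slots of a Spark unsafe row. The hypothetical functions `row_major` and `field_major` produce identical columns, and each counts how often it branches on the field type.

```rust
/// Toy field types (hypothetical stand-ins for Arrow data types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Float64,
}

/// Toy output columns (hypothetical stand-ins for Arrow builders).
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
}

fn empty_column(dt: &DataType) -> Column {
    match dt {
        DataType::Int32 => Column::Int32(Vec::new()),
        DataType::Float64 => Column::Float64(Vec::new()),
    }
}

/// Row-major: branches on the field type once per (row, field) pair.
fn row_major(schema: &[DataType], rows: &[Vec<u64>], dispatches: &mut usize) -> Vec<Column> {
    let mut cols: Vec<Column> = schema.iter().map(empty_column).collect();
    for row in rows {
        for (f, dt) in schema.iter().enumerate() {
            *dispatches += 1;
            match (dt, &mut cols[f]) {
                (DataType::Int32, Column::Int32(v)) => v.push(row[f] as u32 as i32),
                (DataType::Float64, Column::Float64(v)) => v.push(f64::from_bits(row[f])),
                _ => unreachable!("columns are created from the schema"),
            }
        }
    }
    cols
}

/// Field-major: branches on the field type once per field, then loops over all rows.
fn field_major(schema: &[DataType], rows: &[Vec<u64>], dispatches: &mut usize) -> Vec<Column> {
    schema
        .iter()
        .enumerate()
        .map(|(f, dt)| {
            *dispatches += 1;
            match dt {
                DataType::Int32 => Column::Int32(rows.iter().map(|r| r[f] as u32 as i32).collect()),
                DataType::Float64 => Column::Float64(rows.iter().map(|r| f64::from_bits(r[f])).collect()),
            }
        })
        .collect()
}

fn main() {
    let schema = [DataType::Int32, DataType::Float64];
    // Each row holds one 8-byte slot per field.
    let rows: Vec<Vec<u64>> = (0..1000)
        .map(|i: i32| vec![i as u32 as u64, (f64::from(i) * 0.5).to_bits()])
        .collect();
    let (mut rm, mut fm) = (0, 0);
    let a = row_major(&schema, &rows, &mut rm);
    let b = field_major(&schema, &rows, &mut fm);
    assert_eq!(a, b);
    println!("row-major dispatches: {rm}, field-major dispatches: {fm}");
}
```

With 1,000 rows and 2 fields, the row-major path branches 2,000 times and the field-major path branches twice. The per-value work inside the loops is the same in both paths.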
## Current Behavior
In `append_struct_fields_field_major()` (row.rs), complex types fall back to
per-row processing:
```rust
// For complex types (struct, list, map), fall back to append_field
// since they have their own nested processing logic
dt @ (DataType::Struct(_) | DataType::List(_) | DataType::Map(_, _)) => {
    for (row_idx, i) in (row_start..row_end).enumerate() {
        let nested_row = if struct_is_null[row_idx] {
            SparkUnsafeRow::default()
        } else {
            // ... extract nested row
        };
        append_field(dt, struct_builder, &nested_row, field_idx)?;
    }
}
```
This means that for deeply nested structs, nested fields are processed row by row, so we lose the benefit of field-major processing at each nesting level.
## Proposed Optimization
For nested Struct fields:
1. Get the nested `StructBuilder` once per field
2. Build nested struct validity in one pass
3. Recursively apply field-major processing to nested struct fields
This would require refactoring to separate validity handling from field
value processing.
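The three steps above could look roughly like the following sketch. It again uses a hypothetical toy model: `DataType`, `Field`, `Row`, `Column` and `build_struct` are illustrative names, not Comet or arrow-rs APIs. Each struct level computes its validity in one pass over the batch and branches on each field's type once. It then recurses into nested structs with the extracted nested rows, so no level falls back to per-row type dispatch.

A null parent is propagated as `None`, which marks every descendant value null. This is an assumption of the sketch; the existing code substitutes `SparkUnsafeRow::default()` instead.

```rust
/// Toy schema (hypothetical): only Int64 leaves and nested structs.
#[derive(Debug)]
enum DataType {
    Int64,
    Struct(Vec<DataType>),
}

/// Toy row model (hypothetical): a tagged value per field.
#[derive(Debug, PartialEq)]
enum Field {
    Null,
    Int64(i64),
    Struct(Row),
}

#[derive(Debug, PartialEq)]
struct Row {
    fields: Vec<Field>,
}

/// Toy columnar output with an explicit validity vector per column.
#[derive(Debug, PartialEq)]
enum Column {
    Int64 { values: Vec<i64>, validity: Vec<bool> },
    Struct { children: Vec<Column>, validity: Vec<bool> },
}

/// `rows[i]` is `None` when this struct is null in row i (or any ancestor is).
fn build_struct(fields: &[DataType], rows: &[Option<&Row>]) -> Column {
    // Step 2: this level's validity in a single pass over the batch.
    let validity: Vec<bool> = rows.iter().map(|r| r.is_some()).collect();
    let children = fields
        .iter()
        .enumerate()
        .map(|(f, dt)| match dt {
            // Step 1: one type dispatch per field, not per row.
            DataType::Int64 => {
                let mut values = Vec::with_capacity(rows.len());
                let mut valid = Vec::with_capacity(rows.len());
                for &r in rows {
                    // The per-row match on `Field` is an artifact of the tagged toy row;
                    // an unsafe row would be read with a typed getter instead.
                    match r.map(|row| &row.fields[f]) {
                        Some(Field::Int64(x)) => {
                            values.push(*x);
                            valid.push(true);
                        }
                        // Null field, null parent, or type mismatch: store a null.
                        _ => {
                            values.push(0);
                            valid.push(false);
                        }
                    }
                }
                Column::Int64 { values, validity: valid }
            }
            // Step 3: extract nested rows for the whole batch, then recurse.
            DataType::Struct(child_fields) => {
                let nested: Vec<Option<&Row>> = rows
                    .iter()
                    .map(|&r| match r.map(|row| &row.fields[f]) {
                        Some(Field::Struct(inner)) => Some(inner),
                        _ => None,
                    })
                    .collect();
                build_struct(child_fields, &nested)
            }
        })
        .collect();
    Column::Struct { children, validity }
}

fn main() {
    // struct<a: int64, s: struct<b: int64>>
    let schema = vec![DataType::Int64, DataType::Struct(vec![DataType::Int64])];
    let rows = vec![
        Row { fields: vec![Field::Int64(1), Field::Struct(Row { fields: vec![Field::Int64(10)] })] },
        Row { fields: vec![Field::Int64(2), Field::Null] },
    ];
    let top: Vec<Option<&Row>> = rows.iter().map(Some).collect();
    println!("{:?}", build_struct(&schema, &top));
}
```

Passing `Option<&Row>` keeps validity (the `Option`) separate from value extraction, in the spirit of the refactoring described above. Real code would also have to work through the `StructBuilder` API and its null handling.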
## Expected Impact
- 1.2-1.5x speedup for workloads with deeply nested struct types
- Benefit multiplies with nesting depth
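Counting type dispatches is one way to see why the benefit grows with depth. The sketch below is a rough cost model under stated assumptions, not a benchmark. It assumes the current fallback dispatches once per row on the nested struct and once per row on every field beneath it. It also ignores the cost of copying values, so it says nothing about the 1.2-1.5x figure. `nested(depth, width)` is a hypothetical helper that builds a schema with `width` primitive fields per level plus one nested struct.

```rust
#[derive(Clone, Debug)]
enum DataType {
    Int64,
    Struct(Vec<DataType>),
}

/// Number of field nodes in a struct's subtree (nested structs count as nodes).
fn nodes(fields: &[DataType]) -> usize {
    fields
        .iter()
        .map(|dt| 1 + match dt {
            DataType::Struct(children) => nodes(children),
            DataType::Int64 => 0,
        })
        .sum()
}

/// Fully recursive field-major: one dispatch per field node, independent of row count.
fn field_major(fields: &[DataType]) -> usize {
    nodes(fields)
}

/// Assumed model of the current behavior: top-level primitives are dispatched once,
/// but each nested struct field goes through a per-row fallback that dispatches on
/// the struct itself and on every field beneath it.
fn hybrid(fields: &[DataType], rows: usize) -> usize {
    fields
        .iter()
        .map(|dt| match dt {
            DataType::Struct(children) => rows * (1 + nodes(children)),
            DataType::Int64 => 1,
        })
        .sum()
}

/// Hypothetical schema: `width` Int64 fields per level plus one nested struct, `depth` levels deep.
fn nested(depth: usize, width: usize) -> Vec<DataType> {
    let mut fields = vec![DataType::Int64; width];
    if depth > 0 {
        fields.push(DataType::Struct(nested(depth - 1, width)));
    }
    fields
}

fn main() {
    for depth in 0..=3 {
        let schema = nested(depth, 3);
        println!(
            "depth {depth}: hybrid = {}, field-major = {}",
            hybrid(&schema, 1000),
            field_major(&schema)
        );
    }
}
```

For 1,000 rows and three primitive fields per level, the hybrid count grows from 3 to 4003, 8003 and 12003 as depth goes from 0 to 3. Over the same range, the field-major count grows only from 3 to 7, 11 and 15.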
## Notes
- List and Map fields are harder to optimize due to variable-length elements
per row
- This is a follow-up to PR #3224 which implemented the initial field-major
optimization
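Arrow's list layout shows why List and Map are harder. Child elements live in one flat buffer, indexed by an offsets buffer of length `rows + 1`. The minimal sketch below uses a hypothetical `flatten` function with `i64` elements and `None` for a null list. It shows that the child's element range is only known after a pass over the per-row lengths. A list's child therefore cannot simply reuse the parent's `row_start..row_end` range the way a nested struct field can.

```rust
/// Flatten per-row lists (None = null list) into Arrow-style offsets, values, and validity.
fn flatten(rows: &[Option<Vec<i64>>]) -> (Vec<i32>, Vec<i64>, Vec<bool>) {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut values = Vec::new();
    let mut validity = Vec::with_capacity(rows.len());
    offsets.push(0i32);
    for row in rows {
        // Each row contributes a different number of child elements.
        if let Some(items) = row {
            values.extend_from_slice(items);
        }
        validity.push(row.is_some());
        // Row i's elements are values[offsets[i]..offsets[i + 1]].
        offsets.push(values.len() as i32);
    }
    (offsets, values, validity)
}

fn main() {
    let rows = vec![Some(vec![1, 2]), None, Some(vec![]), Some(vec![3, 4, 5])];
    let (offsets, values, validity) = flatten(&rows);
    println!("offsets = {offsets:?}, values = {values:?}, validity = {validity:?}");
}
```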