adriangb opened a new pull request, #18720:
URL: https://github.com/apache/datafusion/pull/18720

   ## Summary
   
   This PR consolidates the separate `ArrowFileSource` and 
`ArrowStreamFileSource` implementations into a unified `ArrowSource` with an 
`ArrowFormat` enum.
   
   This is part of the larger projection refactoring effort tracked in 
https://github.com/apache/datafusion/pull/18627.
   
   ## Key Changes
   
   - **Removed separate structs**: Eliminated duplicate `ArrowFileSource` and 
`ArrowStreamFileSource` implementations
   - **Added `ArrowFormat` enum**: Simple enum with `File` and `Stream` 
variants to distinguish between Arrow IPC formats
   - **Unified `ArrowSource` struct**: Single struct that uses `ArrowFormat` to 
dispatch to the appropriate opener
   - **Kept separate openers**: `ArrowFileOpener` and `ArrowStreamFileOpener` 
remain distinct as their implementations differ significantly
   - **Format-specific behavior**: `repartitioned()` returns `None` for the 
Stream format, which does not support parallel reading, and delegates to the 
default logic for the File format
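
The dispatch described in the list above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: the `Opener` trait, its `describe()` method, `create_opener()`, and the `Option<usize>` return type of `repartitioned()` are hypothetical stand-ins, and the real DataFusion traits and signatures differ.

```rust
/// The two Arrow IPC encodings, mirroring the `File` and `Stream`
/// variants named in this PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArrowFormat {
    File,
    Stream,
}

/// Hypothetical stand-in for DataFusion's opener abstraction.
pub trait Opener {
    /// Returns a short label identifying the opener (illustration only).
    fn describe(&self) -> &'static str;
}

/// Kept as separate types because the two implementations differ.
pub struct ArrowFileOpener;
pub struct ArrowStreamFileOpener;

impl Opener for ArrowFileOpener {
    fn describe(&self) -> &'static str {
        "file"
    }
}

impl Opener for ArrowStreamFileOpener {
    fn describe(&self) -> &'static str {
        "stream"
    }
}

/// Single source struct; the format field selects the behavior.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArrowSource {
    format: ArrowFormat,
}

impl ArrowSource {
    pub fn new(format: ArrowFormat) -> Self {
        Self { format }
    }

    /// Dispatch to the opener that matches the configured format.
    pub fn create_opener(&self) -> Box<dyn Opener> {
        match self.format {
            ArrowFormat::File => Box::new(ArrowFileOpener),
            ArrowFormat::Stream => Box::new(ArrowStreamFileOpener),
        }
    }

    /// Assumption: the returned partition count stands in for the
    /// "default logic" used for the File format. Stream returns `None`
    /// because it does not support parallel reading.
    pub fn repartitioned(&self, target_partitions: usize) -> Option<usize> {
        match self.format {
            ArrowFormat::Stream => None,
            ArrowFormat::File => Some(target_partitions),
        }
    }
}
```

In this sketch, `ArrowSource::new(ArrowFormat::Stream).repartitioned(4)` yields `None`, while the same call on a `File` source yields `Some(4)`. Keeping the openers as separate types means the `match` is the only place where the format is inspected.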
   
   ## Benefits
   
   - **Reduced code duplication**: ~144 net lines removed
   - **Clearer architecture**: Single source of truth for Arrow file handling
   - **Maintained separation**: Format-specific logic remains in separate 
openers
   - **No behavior changes**: All existing tests pass without modification
   
   ## Testing
   
   - All existing tests pass
   - No changes to test files needed
   - Both file and stream formats work correctly
   
   ## Related Work
   
   This PR is independent and can be merged before or after:
   - PR 1: Move Statistics Handling (if created)
   - PR 3: Enhance Physical-Expr Projection Handling (if created)
   
   Part of #18627
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

