crm26 opened a new pull request, #21367:
URL: https://github.com/apache/datafusion/pull/21367

   ## Summary
   
   Adds four inline SQL table functions for ad-hoc file querying:
   
   ```sql
   SELECT * FROM read_parquet('/path/to/*.parquet');
   SELECT * FROM read_csv('/data/file.csv');
   SELECT * FROM read_json('/data/file.json');
   SELECT * FROM read_avro('/data/file.avro');
   ```
   
   Closes #3773
   
   ## Design
   
   Each function is a thin `TableFunctionImpl` wrapper (~60 lines) over 
`ListingTable`:
   
   1. Extract the path string from the `Expr::Literal` argument
   2. Construct `ListingOptions` with the format's `FileFormat`
   3. Infer the schema through a blocking bridge (see "Async bridge")
   4. Return the `ListingTable` as a `TableProvider`
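   
   Step 1 and the error paths it implies (no argument, wrong type) can be sketched as below. This is a minimal, std-only illustration, not the PR's code: `Arg`, `Scalar`, and `extract_path` are hypothetical stand-ins for DataFusion's `Expr`, `ScalarValue`, and the wrapper's own argument handling, and the real types have many more variants.
   
   ```rust
   // Hypothetical stand-ins for DataFusion's `ScalarValue` and `Expr`.
   // Only the variants needed for the illustration are modelled.
   #[derive(Debug, Clone, PartialEq)]
   enum Scalar {
       Utf8(Option<String>),
       Int64(Option<i64>),
   }
   
   #[derive(Debug, Clone, PartialEq)]
   enum Arg {
       Literal(Scalar),
       Column(String),
   }
   
   /// Pull the path out of the first positional argument, rejecting
   /// missing, NULL, non-string, and non-literal arguments.
   fn extract_path(fn_name: &str, args: &[Arg]) -> Result<String, String> {
       match args.first() {
           None => Err(format!("{fn_name} requires a path argument")),
           Some(Arg::Literal(Scalar::Utf8(Some(path)))) => Ok(path.clone()),
           Some(other) => Err(format!(
               "{fn_name} expects a string literal path, got {other:?}"
           )),
       }
   }
   ```
   
   Returning an error instead of panicking keeps a malformed call such as `read_csv(42)` a planning-time failure rather than a crash.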
   
   Because the SQL planner wraps UDTF output as a `LogicalPlan::TableScan`, the scan-level optimizations apply without any extra code in these functions:
   - **Filter pushdown** — verified via EXPLAIN test
   - **Projection pushdown** — verified via EXPLAIN test
   - **Partition pruning** — inherited from `ListingTable`
   
   ## Async bridge
   
   `call_with_args` is synchronous, but `infer_schema` is async. The bridge uses `std::thread::scope` plus `Handle::block_on` rather than `block_in_place`, which panics on a current-thread runtime, so it works on both multi-thread and current-thread Tokio runtimes. Tested with a single-threaded runtime.
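   
   The shape of this bridge can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: the small `block_on` below is a std-only stand-in for Tokio's `Handle::block_on`, `infer_schema` is a hypothetical async function that does no I/O, and the column names it returns are made up.
   
   ```rust
   use std::future::Future;
   use std::pin::pin;
   use std::sync::Arc;
   use std::task::{Context, Poll, Wake, Waker};
   use std::thread::{self, Thread};
   
   // Wakes the polling thread by unparking it.
   struct ThreadWaker(Thread);
   
   impl Wake for ThreadWaker {
       fn wake(self: Arc<Self>) {
           self.0.unpark();
       }
   }
   
   // Stand-in for `Handle::block_on`: poll the future on the current
   // thread and park between polls until it completes.
   fn block_on<F: Future>(fut: F) -> F::Output {
       let mut fut = pin!(fut);
       let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
       let mut cx = Context::from_waker(&waker);
       loop {
           match fut.as_mut().poll(&mut cx) {
               Poll::Ready(out) => return out,
               Poll::Pending => thread::park(),
           }
       }
   }
   
   // Hypothetical async schema inference; returns made-up column names.
   async fn infer_schema(path: &str) -> Result<Vec<String>, String> {
       if path.is_empty() {
           return Err("empty path".to_string());
       }
       Ok(vec!["id".to_string(), "name".to_string()])
   }
   
   // Synchronous entry point: drive the async call on a scoped helper
   // thread and join it, so the calling thread never polls the future.
   fn infer_schema_blocking(path: &str) -> Result<Vec<String>, String> {
       thread::scope(|s| {
           s.spawn(|| block_on(infer_schema(path)))
               .join()
               .unwrap_or_else(|_| Err("schema inference thread panicked".to_string()))
       })
   }
   ```
   
   Because the scope joins the helper thread before returning, the closure can borrow `path` directly, with no `'static` bound or clone required.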
   
   ## Feature gating
   
   - `read_parquet` — requires `parquet` feature (default on)
   - `read_avro` — requires `avro` feature (default off)
   - `read_csv` / `read_json` — always available (no heavy optional 
dependencies)
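   
   As a rough illustration of the gating above, the sketch below lists which functions a given build would expose. It is an assumption about the mechanism, not the PR's code: `available_table_functions` is a hypothetical helper, and the real registration may well use `#[cfg(feature = "...")]` attributes instead of the `cfg!` macro.
   
   ```rust
   // Names of the table functions a build would expose for its Cargo
   // features. `cfg!` is evaluated at compile time; the feature names
   // `parquet` and `avro` are the ones listed above.
   #[allow(unexpected_cfgs)]
   fn available_table_functions() -> Vec<&'static str> {
       // Always available: no optional dependencies.
       let mut names = vec!["read_csv", "read_json"];
       if cfg!(feature = "parquet") {
           names.push("read_parquet");
       }
       if cfg!(feature = "avro") {
           names.push("read_avro");
       }
       names
   }
   ```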
   
   ## Limitations (v1)
   
   - Positional arguments only — no named args like `has_header => true`
   - No user-supplied schema override
   - No explicit Hive partition column specification
   - S3 paths require a registered object store
   
   These can be addressed in follow-on PRs.
   
   ## Tests
   
   16 tests covering: basic read, filtered read, projection, aggregation, glob 
multi-file, error paths (no args, wrong type), filter/projection pushdown 
verification, and single-threaded runtime safety.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

