[I] Spark DataSource backed by a DataFusion TableProvider over ADBC [datafusion-java]

via GitHub Tue, 30 Jun 2026 13:23:35 -0700


timsaucer opened a new issue, #112:
URL: https://github.com/apache/datafusion-java/issues/112


   **Is your feature request related to a problem or challenge?**
   
   Spark users want to read data from a DataFusion `TableProvider` as a native 
Spark `DataSourceV2`. Today there is no first-class path; options are either a 
bespoke per-operation JNI surface (more native surface to maintain) or copying 
data out of process.
   
   **Describe the solution you'd like**
   
   A Spark `DataSourceV2` connector that places the native boundary at a 
**standard ADBC driver**. Spark talks to the upstream arrow-adbc Java driver 
manager (`adbc-core` + `adbc-driver-jni`), which loads a native DataFusion ADBC 
cdylib and returns arrow-java `ArrowReader`s. Spark consumes their batches 
zero-copy as `ArrowColumnVector`s, using the Arrow version the cluster already 
provides. This reuses the upstream ADBC bindings rather than reimplementing 
them.
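   
   From the user's side, a read through such a connector might look roughly like the sketch below. The format name `adbc-datafusion` and the `target_partitions` option come from the Scope list in this issue; the `driver` and `table` option names and the helper function itself are hypothetical, chosen only for illustration, and the connector in #111 may expose different options.

```python
def read_datafusion_table(spark, driver_path, table, target_partitions=4):
    """Build a DataFrame read against the proposed connector.

    `spark` is expected to be a SparkSession. The "driver" and "table"
    option names are assumptions, not confirmed by the issue.
    """
    return (
        spark.read.format("adbc-datafusion")             # format name from the issue
        .option("driver", driver_path)                   # hypothetical: path to the ADBC cdylib
        .option("table", table)                          # hypothetical: table to scan
        .option("target_partitions", target_partitions)  # option named in the issue
        .load()
    )
```

   With pushdown in place, a call such as `read_datafusion_table(spark, path, "events").select("id").limit(10)` would ideally hand the projection and limit to DataFusion rather than evaluating them in Spark.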
   
   Scope:
   - `adbc-datafusion` format registered as a `DataSourceV2`; schema probed on 
the driver.
   - Projection / filter / limit pushdown via Substrait, with a SQL fallback.
   - Multi-partition reads (`executePartitioned` / `readPartition`) and a 
`target_partitions` option.
   - Per-executor connection pool to amortize driver/database setup across task 
slots.
   - An example DataFusion ADBC driver cdylib plus end-to-end (PySpark) 
coverage.
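   
   The per-executor pool item can be sketched as follows. This is a minimal illustration of the idea only (reuse an already-opened connection across tasks in one process), written in Python for brevity; the class name, the `open_connection` factory, and `max_idle` are hypothetical, and the JVM-side implementation in #111 may be structured differently.

```python
import threading
from contextlib import contextmanager
from queue import Empty, Full, LifoQueue


class ConnectionPool:
    """Keeps idle connections so later tasks in the same process can reuse them."""

    def __init__(self, open_connection, max_idle=4):
        self._open_connection = open_connection  # hypothetical factory callable
        self._idle = LifoQueue(maxsize=max_idle)
        self._lock = threading.Lock()
        self.opened = 0  # how many times the expensive setup actually ran

    @contextmanager
    def borrow(self):
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection if one exists
        except Empty:
            conn = self._open_connection()  # otherwise pay the setup cost
            with self._lock:
                self.opened += 1
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)  # hand it back for the next task
            except Full:
                close = getattr(conn, "close", None)
                if close is not None:
                    close()  # pool is full: drop the surplus connection


# Usage: three sequential "tasks" end up sharing a single connection.
pool = ConnectionPool(open_connection=lambda: object())
for _ in range(3):
    with pool.borrow() as conn:
        pass  # a real task would run its partition read here
```

   Only tasks that overlap in time force additional connections to be opened; sequential tasks on the same executor reuse the one already set up.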
   
   **Describe alternatives you've considered**
   
   A plain-C scan ABI + hand-written JNI shim (discussed on #103 / #104). The 
ADBC approach reuses standard, separately-reviewed bindings and a stable driver 
contract instead.
   
   **Additional context**
   
   Implemented in #111.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


