[PR] WIP: Proof-of-concept ADBC driver that lives outside of arrow-adbc [arrow]

via GitHub Thu, 15 Aug 2024 19:45:21 -0700


paleolimbot opened a new pull request, #43722:
URL: https://github.com/apache/arrow/pull/43722


   ### Rationale for this change
   
   Not intended to be merged! This PR explores the components and challenges 
of building an ADBC driver outside the arrow-adbc repository. We want people to 
build ADBC drivers, and for ADBC to be the norm for providing drivers, so we 
want to provide tools that make it easier for ourselves and others to build and 
maintain those drivers. The framework part lives (or will live) in arrow-adbc, 
but putting this proof-of-concept there would make it hard to assess the 
challenges of a driver that lives elsewhere.
   
   ### What changes are included in this PR?
   
   This PR includes a simple driver wrapping Substrait execution. It only works 
for Substrait plans without named tables (although in theory the bulk insert 
feature could be used to register named tables, with some work). I am not sure 
we actually want this ADBC driver, but it is quite nice to have something 
beyond a toy to work with.
   
   ### Are these changes tested?
   
   ```python
   import tempfile
   import pyarrow as pa
   from pyarrow import parquet
   import pyarrow._substrait as substrait_internal
   from adbc_driver_manager import AdbcDatabase, AdbcConnection, AdbcStatement
   
   # Load the driver through the init function exposed by this PR's pyarrow build
   db = AdbcDatabase(init_func=substrait_internal.get_adbc_driver_init_func())
   con = AdbcConnection(db)
   stmt = AdbcStatement(con)
   
   table = pa.table({"i": [1, 2, 3], "b": [True, False, True]})
   with tempfile.TemporaryDirectory() as td:
       parquet_file = f"{td}/tmp.parquet"
       parquet.write_table(table, parquet_file)
       plan_json = """{
           "relations": [
           {"rel": {
               "read": {
               "base_schema": {
                   "struct": {
                   "types": [ {"i64": {}}, {"bool": {}} ]
                   },
                   "names": ["i", "b"]
               },
               "local_files": {
                   "items": [
                   {
                       "uri_file": "file://FILENAME",
                       "parquet": {}
                   }
                   ]
               }
               }
           }}
           ]
       }""".replace("FILENAME", parquet_file)
   
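       # The Substrait plan (as JSON) is passed through the SQL query entry point
       stmt.set_sql_query(plan_json)
       stream, rows_affected = stmt.execute_query()
       # Materialize the result while the temporary Parquet file still exists
       stream_result = pa.table(stream)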
   
   stream_result, rows_affected
   #> (pyarrow.Table
   #>  i: int64
   #>  b: bool
   #>  ----
   #>  i: [[1,2,3]]
   #>  b: [[true,false,true]],
   #>  3)
   ```
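
The plan in the example above is assembled by string replacement on a JSON template. As a rough sketch of an alternative, the same plan layout could be built as a Python dict and serialized with `json.dumps`, which avoids quoting problems if the file path contains characters that are special in JSON. The helper name `make_read_plan` and its parameters are hypothetical and not part of this PR; the plan structure is copied from the example above. Note that `Path.as_uri()` percent-encodes special characters, so the resulting URI can differ from the raw `file://` + path string used above.

```python
import json
from pathlib import Path


def make_read_plan(parquet_path, names, types):
    """Build a Substrait read-relation plan (as JSON) for one local Parquet file.

    Hypothetical helper, not part of the PR: it only mirrors the plan layout
    used in the example above.
    """
    plan = {
        "relations": [
            {
                "rel": {
                    "read": {
                        "base_schema": {
                            "struct": {"types": types},
                            "names": names,
                        },
                        "local_files": {
                            "items": [
                                {
                                    # as_uri() requires an absolute path
                                    "uri_file": Path(parquet_path).absolute().as_uri(),
                                    "parquet": {},
                                }
                            ]
                        },
                    }
                }
            }
        ]
    }
    return json.dumps(plan)


# Same schema as the example table: an int64 column "i" and a bool column "b"
plan_json = make_read_plan(
    "/tmp/example/tmp.parquet", ["i", "b"], [{"i64": {}}, {"bool": {}}]
)
```

The resulting `plan_json` string could then be passed to the statement in the same way as in the example above.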
   
   ### Are there any user-facing changes?
   
   No. This PR is not intended to be merged!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
