henryharbeck commented on issue #2828:
URL: https://github.com/apache/arrow-adbc/issues/2828#issuecomment-3121459342

   Hi @lidavidm, I had a crack at implementing this, but I think I am blocked by the lack of union type support in Polars. Could you confirm? Here is a simple reproducer:
   
   ```py
   import adbc_driver_sqlite.dbapi
   
   import polars  # noqa: F401
   
   # Ensure no PyArrow
   try:
       import pyarrow
   except ImportError:
       pass
   else:
       raise RuntimeError("Uninstall PyArrow")
   
   conn = adbc_driver_sqlite.dbapi.connect()
   
   # print(conn._backend)  # <adbc_driver_manager._dbapi_backend._PolarsBackend object...>
   
   handle = conn._conn.get_info()
   
   # print(type(handle))  # <class 'adbc_driver_manager._lib.ArrowArrayStreamHandle'>
   
   conn._backend.import_array_stream(handle)  # Panic
   
   # Try direct constructors as well
   # polars.from_arrow(handle)  # Panic
   # polars.DataFrame(handle)  # Panic (also supports PyCapsule interface)
   ```
   
   All three calls panic with the same error:
   ```
   thread '<unnamed>' panicked at crates/polars-core/src/datatypes/field.rs:256:19:
   Arrow datatype Union(UnionType { fields: [Field { name: "string_value", dtype: Utf8, is_nullable: true, metadata: None }, Field { name: "bool_value", dtype: Boolean, is_nullable: true, metadata: None }, Field { name: "int64_value", dtype: Int64, is_nullable: true, metadata: None }, Field { name: "int32_bitmask", dtype: Int32, is_nullable: true, metadata: None }, Field { name: "string_list", dtype: List(Field { name: "item", dtype: Utf8, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }, Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries", dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata: None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]), is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None }], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You probably need to activate that data-type feature.
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   Traceback (most recent call last):
     File "/home/henry/development/temp/repro.py", line 22, in <module>
       conn._backend.import_array_stream(handle)  # Panic
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/adbc_driver_manager/_dbapi_backend.py", line 147, in import_array_stream
       return polars.from_arrow(handle)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/convert/general.py", line 536, in from_arrow
       return pycapsule_to_frame(
              ^^^^^^^^^^^^^^^^^^^
     File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/_utils/pycapsule.py", line 41, in pycapsule_to_frame
       s = wrap_s(PySeries.from_arrow_c_stream(obj))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   pyo3_runtime.PanicException: Arrow datatype Union(UnionType { fields: [Field { name: "string_value", dtype: Utf8, is_nullable: true, metadata: None }, Field { name: "bool_value", dtype: Boolean, is_nullable: true, metadata: None }, Field { name: "int64_value", dtype: Int64, is_nullable: true, metadata: None }, Field { name: "int32_bitmask", dtype: Int32, is_nullable: true, metadata: None }, Field { name: "string_list", dtype: List(Field { name: "item", dtype: Utf8, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }, Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries", dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata: None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]), is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None }], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You probably need to activate that data-type feature.
   ```
   
   Using the PyArrow backend instead gives a bit more context:
   ```py
   # conn as above, but this time with the PyArrow backend
   handle = conn._conn.get_info()
   
   reader = conn._backend.import_array_stream(handle)
   tbl = reader.read_all()
   print(tbl.schema)
   # info_name: uint32 not null
   # info_value: dense_union<string_value: string=0, bool_value: bool=1, int64_value: int64=2, int32_bitmask: int32=3 (... 94 chars omitted)
   #   child 0, string_value: string
   #   child 1, bool_value: bool
   #   child 2, int64_value: int64
   #   child 3, int32_bitmask: int32
   #   child 4, string_list: list<item: string>
   #       child 0, item: string
   #   child 5, int32_to_int32_list_map: map<int32, list<item: int32>>
   #       child 0, entries: struct<key: int32 not null, value: list<item: int32>> not null
   #           child 0, key: int32 not null
   #           child 1, value: list<item: int32>
   #               child 0, item: int32
   
   print(tbl.to_pylist())
   # [
   #     {'info_name': 0, 'info_value': 'SQLite'},
   #     {'info_name': 1, 'info_value': '3.45.3'},
   #     {'info_name': 100, 'info_value': 'ADBC SQLite Driver'},
   #     {'info_name': 101, 'info_value': '(unknown)'},
   #     {'info_name': 102, 'info_value': '0.6.0'}
   # ]
   ```
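For context on why Polars rejects this column: in an Arrow dense union, each slot stores a type id plus an offset into that type's child array, so a single column can mix strings, ints, lists, and maps. Below is a minimal pure-Python sketch of how `info_value` resolves, assuming (as the `to_pylist()` output suggests) that all five SQLite values are strings and therefore sit in child 0 (`string_value`). The buffer contents and `decode_dense_union` are illustrative, not read from the real stream.

```python
from typing import Any

# Child arrays of the dense union, keyed by type id (the schema's ids are 0..5).
# Illustrative contents: only string_value is populated for the SQLite rows above.
children: dict[int, list[Any]] = {
    0: ["SQLite", "3.45.3", "ADBC SQLite Driver", "(unknown)", "0.6.0"],  # string_value
    1: [],  # bool_value
    2: [],  # int64_value
    3: [],  # int32_bitmask
    4: [],  # string_list
    5: [],  # int32_to_int32_list_map
}

# Per-slot buffers: which child each row uses, and where its value sits in that child.
type_ids = [0, 0, 0, 0, 0]
offsets = [0, 1, 2, 3, 4]


def decode_dense_union(
    type_ids: list[int], offsets: list[int], children: dict[int, list[Any]]
) -> list[Any]:
    """Resolve each slot i to children[type_ids[i]][offsets[i]]."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


info_names = [0, 1, 100, 101, 102]
values = decode_dense_union(type_ids, offsets, children)
rows = [{"info_name": n, "info_value": v} for n, v in zip(info_names, values)]
```

Because the schema's `ids` are `[0, 1, 2, 3, 4, 5]`, type ids map directly to child indices here; a row holding an int would instead point at the `int64_value` child, which is the per-row type mixing a single Polars dtype cannot express.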
   
   For reference, these are the known info codes:
   ```py
   _KNOWN_INFO_VALUES = {
       0: "vendor_name",
       1: "vendor_version",
       2: "vendor_arrow_version",
       100: "driver_name",
       101: "driver_version",
       102: "driver_arrow_version",
       103: "driver_adbc_version",
   }
   ```
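Combining that mapping with the rows printed above gives a plain name-to-value dict. A minimal sketch, assuming the union has already been resolved to Python values (for example via `to_pylist()` under the PyArrow backend); `info_rows_to_dict` is a hypothetical helper, not an existing ADBC function.

```python
from typing import Any

# Code-to-name mapping, copied from the _KNOWN_INFO_VALUES reference above.
_KNOWN_INFO_VALUES = {
    0: "vendor_name",
    1: "vendor_version",
    2: "vendor_arrow_version",
    100: "driver_name",
    101: "driver_version",
    102: "driver_arrow_version",
    103: "driver_adbc_version",
}


def info_rows_to_dict(rows: list[dict[str, Any]]) -> dict[Any, Any]:
    """Hypothetical helper: turn get_info() rows into {name: value}.

    Unknown codes are kept under their integer key rather than dropped.
    """
    return {
        _KNOWN_INFO_VALUES.get(row["info_name"], row["info_name"]): row["info_value"]
        for row in rows
    }


# Rows as printed by tbl.to_pylist() above (SQLite driver).
rows = [
    {"info_name": 0, "info_value": "SQLite"},
    {"info_name": 1, "info_value": "3.45.3"},
    {"info_name": 100, "info_value": "ADBC SQLite Driver"},
    {"info_name": 101, "info_value": "(unknown)"},
    {"info_name": 102, "info_value": "0.6.0"},
]
info = info_rows_to_dict(rows)
```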
   
   Is there a way around the union type, perhaps by exposing the "info_value" field to Python as a string instead? I do note that, per the postgres test, "driver_adbc_version" holds an int value (inconsistent with the other version info values, which are strings), so it would need to be cast afterwards.
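A minimal sketch of the cast-afterwards step, assuming a hypothetical variant of `get_info()` that returns every `info_value` as a string: rows pass through unchanged except code 103 (`driver_adbc_version`), which is cast back to `int`. `restore_info_types`, `_INT_INFO_CODES`, and the `"1001000"` input are illustrative assumptions.

```python
from typing import Any

# Hypothetical: codes whose values were ints before being flattened to strings.
# Per the comment above, driver_adbc_version (103) is the int among the version fields.
_INT_INFO_CODES = {103}


def restore_info_types(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: cast stringified info values back to their original type."""
    restored = []
    for row in rows:
        value = row["info_value"]
        if row["info_name"] in _INT_INFO_CODES and value is not None:
            value = int(value)
        restored.append({"info_name": row["info_name"], "info_value": value})
    return restored


# Illustrative input: every info_value already flattened to a string.
string_rows = [
    {"info_name": 0, "info_value": "PostgreSQL"},
    {"info_name": 103, "info_value": "1001000"},
]
typed_rows = restore_info_types(string_rows)
```

This only covers the scalar members; the `string_list` and `int32_to_int32_list_map` children would still need some serialization format, which the sketch does not attempt.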
   
   Apologies if these questions are a bit naïve given I am only looking at 
Python here.
   
   I am keen to do this (or have it picked up by yourself or another dev), as it is the final upstream piece needed for the related Polars issue mentioned in the description. FWIW, union types may come to Polars eventually, but not yet (https://github.com/pola-rs/polars/issues/9112#issuecomment-3102111334).
   
   Keen to hear your thoughts. Thanks
   
   

