henryharbeck commented on issue #2828:
URL: https://github.com/apache/arrow-adbc/issues/2828#issuecomment-3121459342

Hi @lidavidm, I had a crack at implementing this, but I think I am blocked by the lack of union types in Polars. It would be good if you could confirm.

Here is a simple reproducer.

```py
import adbc_driver_sqlite.dbapi
import polars  # noqa: F401

# Ensure no PyArrow
try:
    import pyarrow
except ImportError:
    pass
else:
    raise RuntimeError("Uninstall PyArrow")

conn = adbc_driver_sqlite.dbapi.connect()
# print(conn._backend)
# <adbc_driver_manager._dbapi_backend._PolarsBackend object...>

handle = conn._conn.get_info()
# print(type(handle))
# <class 'adbc_driver_manager._lib.ArrowArrayStreamHandle'>

conn._backend.import_array_stream(handle)  # Panic

# Try direct constructors as well
# polars.from_arrow(handle)  # Panic
# polars.DataFrame(handle)  # Panic (also supports PyCapsule interface)
```

All panics are

```
thread '<unnamed>' panicked at crates/polars-core/src/datatypes/field.rs:256:19:
Arrow datatype Union(UnionType { fields: [Field { name: "string_value", dtype: Utf8, is_nullable: true, metadata: None }, Field { name: "bool_value", dtype: Boolean, is_nullable: true, metadata: None }, Field { name: "int64_value", dtype: Int64, is_nullable: true, metadata: None }, Field { name: "int32_bitmask", dtype: Int32, is_nullable: true, metadata: None }, Field { name: "string_list", dtype: List(Field { name: "item", dtype: Utf8, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }, Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries", dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata: None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]), is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None }], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You probably need to activate that data-type feature.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/henry/development/temp/repro.py", line 22, in <module>
    conn._backend.import_array_stream(handle)  # Panic
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/adbc_driver_manager/_dbapi_backend.py", line 147, in import_array_stream
    return polars.from_arrow(handle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/convert/general.py", line 536, in from_arrow
    return pycapsule_to_frame(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/_utils/pycapsule.py", line 41, in pycapsule_to_frame
    s = wrap_s(PySeries.from_arrow_c_stream(obj))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: Arrow datatype Union(UnionType { fields: [Field { name: "string_value", dtype: Utf8, is_nullable: true, metadata: None }, Field { name: "bool_value", dtype: Boolean, is_nullable: true, metadata: None }, Field { name: "int64_value", dtype: Int64, is_nullable: true, metadata: None }, Field { name: "int32_bitmask", dtype: Int32, is_nullable: true, metadata: None }, Field { name: "string_list", dtype: List(Field { name: "item", dtype: Utf8, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }, Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries", dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata: None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32, is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]), is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None }], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You probably need to activate that data-type feature.
```

Looking with the PyArrow backend gives a bit more context

```py
# conn as above, but this time with PyArrow backend
handle = conn._conn.get_info()
reader = conn._backend.import_array_stream(handle)
tbl = reader.read_all()

print(tbl.schema)
# info_name: uint32 not null
# info_value: dense_union<string_value: string=0, bool_value: bool=1, int64_value: int64=2, int32_bitmask: int32=3 (... 94 chars omitted)
#   child 0, string_value: string
#   child 1, bool_value: bool
#   child 2, int64_value: int64
#   child 3, int32_bitmask: int32
#   child 4, string_list: list<item: string>
#       child 0, item: string
#   child 5, int32_to_int32_list_map: map<int32, list<item: int32>>
#       child 0, entries: struct<key: int32 not null, value: list<item: int32>> not null
#           child 0, key: int32 not null
#           child 1, value: list<item: int32>
#               child 0, item: int32

print(tbl.to_pylist())
# [
#     {'info_name': 0, 'info_value': 'SQLite'},
#     {'info_name': 1, 'info_value': '3.45.3'},
#     {'info_name': 100, 'info_value': 'ADBC SQLite Driver'},
#     {'info_name': 101, 'info_value': '(unknown)'},
#     {'info_name': 102, 'info_value': '0.6.0'}
# ]
```

and for reference

```py
_KNOWN_INFO_VALUES = {
    0: "vendor_name",
    1: "vendor_version",
    2: "vendor_arrow_version",
    100: "driver_name",
    101: "driver_version",
    102: "driver_arrow_version",
    103: "driver_adbc_version",
}
```

Is there a way around the union type? Perhaps exposing the "info_value" field to Python as a string instead? I do note that "driver_adbc_version" has an (albeit inconsistent with other version info values) int value (per the postgres test), which would need to be cast afterwards.

Apologies if these questions are a bit naïve given I am only looking at Python here. I am keen to do this (or have it picked up by yourself or another dev), as it is the final upstream piece of the related Polars issue mentioned in the description.
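
To make the string idea above concrete, here is a minimal sketch of the shape I have in mind. It is plain Python over already-decoded rows (like the `tbl.to_pylist()` output), so it only illustrates the target result, not where the conversion would actually need to happen for Polars to avoid seeing the union. `info_rows_to_strings` is a hypothetical helper, not something in `adbc_driver_manager`, and the integer used for `driver_adbc_version` is an illustrative value only.

```python
# Hypothetical helper for illustration; not part of adbc_driver_manager.
_KNOWN_INFO_VALUES = {
    0: "vendor_name",
    1: "vendor_version",
    2: "vendor_arrow_version",
    100: "driver_name",
    101: "driver_version",
    102: "driver_arrow_version",
    103: "driver_adbc_version",
}


def info_rows_to_strings(rows):
    """Map decoded get_info rows to {info name: value rendered as str}."""
    result = {}
    for row in rows:
        name = _KNOWN_INFO_VALUES.get(row["info_name"])
        if name is None:
            continue  # skip info codes that are not in the known mapping
        value = row["info_value"]
        # Keep nulls as None; render everything else (str, int, ...) as str.
        result[name] = None if value is None else str(value)
    return result


rows = [
    {"info_name": 0, "info_value": "SQLite"},
    {"info_name": 1, "info_value": "3.45.3"},
    # Illustrative int value standing in for an int-valued driver_adbc_version.
    {"info_name": 103, "info_value": 1001000},
    # Unknown code, dropped by the helper.
    {"info_name": 9999, "info_value": "ignored"},
]
info = info_rows_to_strings(rows)
print(info)
```

With every value rendered as a string, an int-valued entry such as "driver_adbc_version" would have to be cast back by the caller, which is the trade-off mentioned above.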

FWIW, union types may come in Polars, but not yet (https://github.com/pola-rs/polars/issues/9112#issuecomment-3102111334)

Keen to hear your thoughts.
Thanks

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org