Yunbo Deng created ARROW-17077:
----------------------------------

             Summary: Unicode character issue with pyarrow
                 Key: ARROW-17077
                 URL: https://issues.apache.org/jira/browse/ARROW-17077
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Yunbo Deng

When running code using the Databricks SQL Connector for Python, the customer hit a Unicode character issue in the pyarrow library. The customer has to put a workaround in the client code, something like "SELECT decode(string(unbase64(value)), 'utf8')".

Exception in the main script
No data fetched using SQL-statement: SELECT * FROM parquet.`abfss://x...@xxxx.xxx.net/structXXXXXXX`. Exception: Unknown error: Wrapping TP H�kan Sweater failed
Traceback (most recent call last):
  File "/home/xxxx/yy/allo/yy/db/sql_reader.py", line 53, in query
    rows = cursor.fetchmany(self.MAX_ROWS)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 401, in fetchmany
    return self.active_result_set.fetchmany(size)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 630, in fetchmany
    return self._convert_arrow_table(self.fetchmany_arrow(size))
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 563, in _convert_arrow_table
    df = table_renamed.to_pandas(
  File "pyarrow/array.pxi", line 822, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 3889, in pyarrow.lib.Table._to_pandas
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 803, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 763, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pandas/core/arrays/string_.py", line 217, in __from_arrow__
    str_arr = StringArray._from_sequence(np.array(arr))
  File "pyarrow/array.pxi", line 1395, in pyarrow.lib.Array.__array__
  File "pyarrow/array.pxi", line 1441, in pyarrow.lib.Array.to_numpy
  File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping TP H�kan Sweater failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

--
This message was sent by Atlassian Jira (v8.20.10#820010)
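
One plausible reading of the error message, not confirmed anywhere in the report, is that the stored value contains bytes in a single-byte encoding such as Latin-1 while the column is typed as a UTF-8 string. The sketch below uses only the Python standard library to show why such bytes cannot be decoded as UTF-8 and why a lenient decode prints a replacement character in the same position as in the message above. The name "Håkan" and the Latin-1 assumption are guesses for illustration only, and `is_valid_utf8` is a hypothetical helper, not part of pyarrow or the Databricks connector.

```python
# Illustration only: ASSUMES the original text was "Håkan" stored as Latin-1
# bytes. The bug report does not state the actual source encoding.
name = "Håkan"
latin1_bytes = name.encode("latin-1")  # the byte 0xE5 stands for "å" in Latin-1


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE5 starts a three-byte UTF-8 sequence, but the next byte is ASCII "k",
# so a strict decode is rejected.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(name.encode("utf-8")))

# A lenient decode substitutes U+FFFD (the replacement character) for the bad byte.
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced)
```

If this reading is right, the workaround quoted in the report would sidestep the problem by decoding the value on the server side before the result reaches the pyarrow-to-pandas conversion.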