Yunbo Deng created ARROW-17077:
----------------------------------

             Summary: Unicode character issue with pyarrow
                 Key: ARROW-17077
                 URL: https://issues.apache.org/jira/browse/ARROW-17077
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Yunbo Deng


When running code that uses the Databricks SQL Connector for Python, the customer
hit a Unicode character issue in the pyarrow library. The customer had to add a
workaround to the client code, something like:
"SELECT decode(string(unbase64(value)), 'utf8')"
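The SQL workaround decodes the column inside the query before it reaches pyarrow. As a rough client-side analogue, the sketch below base64-decodes a value and then decodes the resulting bytes as UTF-8, substituting U+FFFD for malformed sequences instead of raising. It assumes that `value` holds base64-encoded text and that replacement is acceptable. Both are assumptions that the report does not confirm. `decode_base64_utf8` is a hypothetical helper, not part of the Databricks connector or pyarrow.

```python
import base64


def decode_base64_utf8(value: str) -> str:
    """Hypothetical helper: rough Python analogue of
    decode(string(unbase64(value)), 'utf8').

    Assumes `value` is base64 text. Malformed UTF-8 bytes become U+FFFD
    instead of raising, which is an assumption about the desired behavior.
    """
    raw = base64.b64decode(value)                 # unbase64: base64 text -> bytes
    return raw.decode("utf-8", errors="replace")  # bytes -> str, replacing bad bytes


# Valid UTF-8 round-trips unchanged.
good = base64.b64encode("Håkan Sweater".encode("utf-8")).decode("ascii")
print(decode_base64_utf8(good))  # Håkan Sweater

# Latin-1 bytes (0xE5 for "å") are not valid UTF-8 and come back as U+FFFD.
bad = base64.b64encode("Håkan Sweater".encode("latin-1")).decode("ascii")
print(decode_base64_utf8(bad))   # H�kan Sweater
```

Replacing malformed bytes, rather than failing, trades fidelity for robustness. The original character is lost, but every row stays valid UTF-8.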
 
Exception in the main script
No data fetched using SQL-statement: SELECT * FROM parquet.`abfss://x...@xxxx.xxx.net/structXXXXXXX`.
Exception: Unknown error: Wrapping TP H�kan  Sweater failed
Traceback (most recent call last):
  File "/home/xxxx/yy/allo/yy/db/sql_reader.py", line 53, in query
    rows = cursor.fetchmany(self.MAX_ROWS)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 401, in fetchmany
    return self.active_result_set.fetchmany(size)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 630, in fetchmany
    return self._convert_arrow_table(self.fetchmany_arrow(size))
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 563, in _convert_arrow_table
    df = table_renamed.to_pandas(
  File "pyarrow/array.pxi", line 822, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 3889, in pyarrow.lib.Table._to_pandas
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 803, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 763, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pandas/core/arrays/string_.py", line 217, in __from_arrow__
    str_arr = StringArray._from_sequence(np.array(arr))
  File "pyarrow/array.pxi", line 1395, in pyarrow.lib.Array.__array__
  File "pyarrow/array.pxi", line 1441, in pyarrow.lib.Array.to_numpy
  File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping TP H�kan  Sweater failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
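In the traceback, the failure surfaces in `pyarrow.lib.Array.to_numpy`. It is reached when pandas' `StringArray.__from_arrow__` calls `np.array(arr)`, at the point where each Arrow string value is turned into a Python `str`. One plausible cause is a value whose bytes are not valid UTF-8, such as "Håkan" stored as Latin-1. This is an assumption that the report does not confirm, although the `�` in the error message is consistent with it. The stdlib-only sketch below shows why such bytes fail a strict UTF-8 decode and how a scan could locate them. `try_utf8` and `find_invalid_utf8` are hypothetical helpers, not pyarrow APIs.

```python
from typing import Iterable, List, Optional


def try_utf8(data: bytes) -> Optional[str]:
    """Return `data` decoded as UTF-8, or None if it is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def find_invalid_utf8(values: Iterable[bytes]) -> List[int]:
    """Hypothetical helper: positions of values that are not valid UTF-8."""
    return [i for i, v in enumerate(values) if try_utf8(v) is None]


# Hypothetical cell value: "Håkan" stored as Latin-1, so "å" is the single byte 0xE5.
raw = "Håkan Sweater".encode("latin-1")  # b'H\xe5kan Sweater'

# None: 0xE5 opens a 3-byte UTF-8 sequence, but "k" is not a continuation byte.
print(try_utf8(raw))
# H�kan Sweater: lenient decoding yields the same rendering seen in the log.
print(raw.decode("utf-8", errors="replace"))
# [1]: only the Latin-1 value is flagged; the UTF-8 "Håkan" decodes fine.
print(find_invalid_utf8([b"ok", raw, "Håkan".encode("utf-8")]))
```

If a scan like this flags rows, re-encoding them at the source or decoding them leniently before the Arrow-to-pandas conversion would be the natural next step. Both are suggestions, not fixes confirmed in this issue.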



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
