Ravi Gummadi created ARROW-15645:
------------------------------------

             Summary: Data read through Flight is having endianness issue on s390x
                 Key: ARROW-15645
                 URL: https://issues.apache.org/jira/browse/ARROW-15645
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, FlightRPC, Python
    Affects Versions: 5.0.0
         Environment: Linux s390x (big endian)
            Reporter: Ravi Gummadi

I am facing an endianness issue on s390x (big endian) when converting data read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:

```
Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
```

(2) table.to_pandas() causes a segmentation fault.

____________

Here is the sample code that I am using:

```
from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)


class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as parameter to the constructor,
    TokenClientAuthHandler(yourtoken).
    """

    def __init__(self, token: str = None):
        super().__init__()
        # Fall back to the environment when no token is passed explicitly.
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token


readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})  # command payload elided in this report
descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)

table.validate()
table.to_pandas()
```
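
For illustration only: the sketch below shows one way a "Negative offsets in binary array" error could arise. It assumes the offsets buffer was written on a little-endian machine and read on a big-endian one without byte swapping. This report does not confirm that this is the cause. The helper `reinterpret_offsets` and the sample offsets are hypothetical, and the sketch uses only the standard library, not pyarrow.

```python
import struct


def reinterpret_offsets(offsets, written_as="<", read_as=">"):
    """Pack int32 offsets in one byte order and unpack them in another.

    Hypothetical helper: "<" is little endian, ">" is big endian.
    """
    raw = struct.pack(f"{written_as}{len(offsets)}i", *offsets)
    return list(struct.unpack(f"{read_as}{len(offsets)}i", raw))


# Assumed example offsets of a binary/string column with two values.
little_endian_offsets = [0, 5, 200]

# Reading the same bytes as big endian scrambles the values; any offset
# whose low byte is >= 0x80 ends up with the sign bit set.
misread = reinterpret_offsets(little_endian_offsets)
print(misread)  # [0, 83886080, -939524096]
```

Offsets that are misread this way no longer describe valid positions in the data buffer, which would be consistent with both the validation error and a crash during conversion.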