[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490682#comment-17490682 ]
Ravi Gummadi commented on ARROW-15645: -------------------------------------- Flight server side arrow version is 6.x Any clues on why only pyarrow 5.0.0 has the issue and the issue is not seen with pyarrow 3.0.0 ? Where in the arrow source code the fix may have to go in ? Thanks > Data read through Flight is having endianness issue on s390x > ------------------------------------------------------------ > > Key: ARROW-15645 > URL: https://issues.apache.org/jira/browse/ARROW-15645 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC, Python > Affects Versions: 5.0.0 > Environment: Linux s390x (big endian) > Reporter: Ravi Gummadi > Priority: Major > > Am facing an endianness issue on s390x(big endian) when converting the data > read through flight to pandas data frame. > (1) table.validate() fails with error > Traceback (most recent call last): > File "/tmp/2.py", line 51, in <module> > table.validate() > File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate > File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in > binary array > (2) table.to_pandas() gives a segmentation fault > ____________ > Here is a sample code that I am using: > from pyarrow import flight > import os > import json > flight_endpoint = os.environ.get("flight_server_url", > "grpc+tls://...local:443") > print(flight_endpoint) > # > class TokenClientAuthHandler(flight.ClientAuthHandler): > """An example implementation of authentication via handshake. > With the default constructor, the user token is read from the > environment: TokenClientAuthHandler(). > You can also pass a user token as parameter to the constructor, > TokenClientAuthHandler(yourtoken). > """ > def \_\_init\_\_(self, token: str = None): > super().\_\_init\__() > if( token != None): > strToken = strToken = 'Bearer {}'.format(token) > else: > strToken = 'Bearer {}'.format(os.environ.get("some_auth_token")) > self.token = strToken.encode('utf-8') > #print(self.token) > def authenticate(self, outgoing, incoming): > outgoing.write(self.token) > self.token = incoming.read() > def get_token(self): > return self.token > > readClient = flight.FlightClient(flight_endpoint) > readClient.authenticate(TokenClientAuthHandler()) > cmd = json.dumps(\{...}) > descriptor = flight.FlightDescriptor.for_command(cmd) > flightInfo = readClient.get_flight_info(descriptor) > reader = readClient.do_get(flightInfo.endpoints[0].ticket) > table = reader.read_all() > print(table) > print(table.num_columns) > print(table.num_rows) > table.validate() > table.to_pandas() -- This message was sent by Atlassian Jira (v8.20.1#820001)