Ravi Gummadi created ARROW-15645:
------------------------------------

             Summary: Data read through Flight is having endianness issue on s390x
                 Key: ARROW-15645
                 URL: https://issues.apache.org/jira/browse/ARROW-15645
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, FlightRPC, Python
    Affects Versions: 5.0.0
         Environment: Linux s390x (big endian)
            Reporter: Ravi Gummadi


I am facing an endianness issue on s390x (big endian) when converting data 
read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:
```
Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
```
(2) table.to_pandas() gives a segmentation fault
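
To illustrate why a byte-order mismatch could produce "Negative offsets", here is a minimal sketch using only the standard library. It assumes (this is not confirmed by anything above) that the 32-bit offsets were written in little-endian order and then interpreted as big-endian on s390x. The offset values and the `decode_offsets` helper are hypothetical and are not part of pyarrow.

```python
import struct
import sys


def decode_offsets(raw, byte_order):
    """Decode a buffer of 32-bit signed offsets.

    byte_order is '<' for little endian or '>' for big endian.
    """
    count = len(raw) // 4
    return list(struct.unpack("{}{}i".format(byte_order, count), raw))


# Hypothetical offsets of a 3-element binary array, as a little-endian
# producer would write them (assumption for illustration only).
wire_bytes = struct.pack("<4i", 0, 5, 128, 300)

as_little = decode_offsets(wire_bytes, "<")  # the intended values
as_big = decode_offsets(wire_bytes, ">")     # same bytes, wrong byte order

print("host byte order:", sys.byteorder)
print("read as little endian:", as_little)
print("read as big endian:   ", as_big)
```

Read with the wrong byte order, the offset 128 becomes a large negative number, and the other offsets become huge positive values that point far outside the data buffer. The latter might also explain the crash in to_pandas(), but that is only a guess.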
____________
Here is the sample code that I am using:
```

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as a parameter to the constructor:
    TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        if token is None:
            # Fall back to the token stored in the environment.
            token = os.environ.get("some_auth_token")
        self.token = 'Bearer {}'.format(token).encode('utf-8')

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
    
readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})

descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()
```



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
