[jira] [Commented] (ARROW-15778) [Java] Endianness field not emitted in IPC stream
[ https://issues.apache.org/jira/browse/ARROW-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502198#comment-17502198 ]

Ravi Gummadi commented on ARROW-15778:
--------------------------------------

Thanks [~kiszk]

> [Java] Endianness field not emitted in IPC stream
> -------------------------------------------------
>
> Key: ARROW-15778
> URL: https://issues.apache.org/jira/browse/ARROW-15778
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 8.0.0
>
> It seems the Java IPC writer implementation does not emit the Endianness
> information at all (making it Little by default). This complicates
> interoperability with the C++ IPC reader, which does read this information
> and acts on it to decide whether it needs to byteswap the incoming data.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
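The byteswap decision described in the issue above can be illustrated with a small, self-contained sketch. It is only a model of the behaviour, not Arrow's actual reader code: the helper names `needs_byteswap` and `decode_offsets` and the sample offsets are hypothetical, and the only rule taken from the issue is that a missing endianness field is treated as little-endian.

```python
import struct
import sys

# Hypothetical helpers that model the decision; this is not Arrow's code.
_PREFIX = {"little": "<", "big": ">"}


def needs_byteswap(declared, native=sys.byteorder):
    """Return True if a reader on a `native` host would swap the data.

    `declared` is None when the writer emitted no endianness field,
    which the issue says is treated as little-endian.
    """
    return (declared or "little") != native


def decode_offsets(raw, declared, native=sys.byteorder):
    """Decode int32 offsets as a reader that trusts `declared` would see them."""
    count = len(raw) // 4
    if needs_byteswap(declared, native):
        # Reverse each 4-byte value before reading it in native order.
        raw = b"".join(raw[i:i + 4][::-1] for i in range(0, count * 4, 4))
    return struct.unpack(f"{_PREFIX[native]}{count}i", raw[:count * 4])


# Offsets written in big-endian order by a writer on a big-endian host.
raw = struct.pack(">3i", 0, 26, 128)

correct = decode_offsets(raw, declared="big", native="big")
mislabelled = decode_offsets(raw, declared=None, native="big")
print(correct)      # (0, 26, 128)
print(mislabelled)  # (0, 436207616, -2147483648)
```

Under these assumptions, big-endian data that carries no endianness field is swapped by a big-endian reader even though it was already in native order, and an offset such as 128 turns negative.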
[jira] [Comment Edited] (ARROW-15645) [Flight][Java][C++] Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498004#comment-17498004 ]

Ravi Gummadi edited comment on ARROW-15645 at 2/25/22, 9:42 AM:
----------------------------------------------------------------

Yes. Both server and client are on s390x.

Thanks for the details [~apitrou]. I will watch [https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test on my environment once a fix for 15778 is available.

was (Author: ravidotg):
Thanks for the details [~apitrou]. I will watch [https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test on my environment once a fix for 15778 is available.

> [Flight][Java][C++] Data read through Flight is having endianness issue on s390x
> --------------------------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Java
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
>
> I am facing an endianness issue on s390x (big endian) when converting the data
> read through Flight to a pandas data frame.
> (1) table.validate() fails with the error
> {code}
> Traceback (most recent call last):
>   File "/tmp/2.py", line 51, in <module>
>     table.validate()
>   File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
> {code}
> (2) table.to_pandas() gives a segmentation fault
>
> Here is the sample code that I am using:
> {code:python}
> from pyarrow import flight
> import os
> import json
>
> flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
> print(flight_endpoint)
> #
> class TokenClientAuthHandler(flight.ClientAuthHandler):
>     """An example implementation of authentication via handshake.
>
>     With the default constructor, the user token is read from the
>     environment: TokenClientAuthHandler().
>     You can also pass a user token as parameter to the constructor,
>     TokenClientAuthHandler(yourtoken).
>     """
>     def __init__(self, token: str = None):
>         super().__init__()
>         # Prefer an explicit token; otherwise fall back to the environment.
>         if token is not None:
>             strToken = 'Bearer {}'.format(token)
>         else:
>             strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
>         self.token = strToken.encode('utf-8')
>         #print(self.token)
>
>     def authenticate(self, outgoing, incoming):
>         outgoing.write(self.token)
>         self.token = incoming.read()
>
>     def get_token(self):
>         return self.token
>
> readClient = flight.FlightClient(flight_endpoint)
> readClient.authenticate(TokenClientAuthHandler())
> cmd = json.dumps({...})
> descriptor = flight.FlightDescriptor.for_command(cmd)
> flightInfo = readClient.get_flight_info(descriptor)
> reader = readClient.do_get(flightInfo.endpoints[0].ticket)
> table = reader.read_all()
> print(table)
> print(table.num_columns)
> print(table.num_rows)
> table.validate()
> table.to_pandas()
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (ARROW-15645) [Flight][Java][C++] Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498004#comment-17498004 ]

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

Thanks for the details [~apitrou]. I will watch [https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test on my environment once a fix for 15778 is available.

> [Flight][Java][C++] Data read through Flight is having endianness issue on s390x
> --------------------------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Java
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497337#comment-17497337 ]

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

The Flight server side is using the Java-based Arrow 6.0.1.
On the client side, pyarrow 5.0.0, 6.0.0 and 7.0.0 all show the issue reported above.

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492382#comment-17492382 ]

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

[~kiszk], I tried using pyarrow 6.0 on the client side and the issue is still seen.

So
(1) the issue is NOT seen with pyarrow 3.0.0 on the client side and Arrow 6.0.x on the Flight server side
(2) the issue is seen with pyarrow 5.0.0 on the client side and Arrow 6.0.x on the Flight server side
(3) the issue is seen with pyarrow 6.0.0 on the client side and Arrow 6.0.x on the Flight server side

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490682#comment-17490682 ]

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

The Arrow version on the Flight server side is 6.x.

Any clues as to why only pyarrow 5.0.0 has the issue, while it is not seen with pyarrow 3.0.0? Where in the Arrow source code would the fix have to go?

Thanks

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Updated] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated ARROW-15645:
---------------------------------
    Description:
I am facing an endianness issue on s390x (big endian) when converting the data read through Flight to a pandas data frame.

(1) table.validate() fails with the error

Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array

(2) table.to_pandas() gives a segmentation fault

Here is the sample code that I am using:

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)
#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as parameter to the constructor,
    TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        # Prefer an explicit token; otherwise fall back to the environment.
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token

readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())
cmd = json.dumps({...})
descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)
reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()
print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Comment Edited] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490156#comment-17490156 ]

Ravi Gummadi edited comment on ARROW-15645 at 2/10/22, 11:55 AM:
-----------------------------------------------------------------

The issue is seen only with pyarrow 5.0.0 and is not seen with pyarrow 3.0.0.

Some investigation details from my side while debugging validate():

The offsets have the opposite byte order with pyarrow 5.0.0 (from validate.cc):

(gdb) p data.buffers[1]->data()
$8 = (const uint8_t *) 0x3fff9680040 ""
(gdb) p data.buffers[1]->data()[4]
$9 = 26 '\032'

The same Flight server is used for reading data. When I run the above sample code with pyarrow 3.0.0, I see the following correct offset values with the right byte order.
From data.h GetValues(), called from validate.cc:

(gdb) p buffers[1]->data()
$11 = (const uint8_t *) 0x2aa00a544b2 ""
(gdb) p buffers[1]->data()[4]
$12 = 0 '\000'
(gdb) p buffers[1]->data()[5]
$13 = 0 '\000'
(gdb) p buffers[1]->data()[6]
$14 = 0 '\000'
(gdb) p buffers[1]->data()[7]
$15 = 26 '\032'

All offsets appear in little-endian order when using pyarrow 5.0.0 on a big-endian machine. With pyarrow 3.0.0, the offsets are in the expected byte order on a big-endian machine.

was (Author: ravidotg):
The issue is seen only with pyarrow 5.0.0 and is not seen with pyarrow 3.0.0.

Some investigation details from my side while debugging validate():

The offsets have the opposite byte order with pyarrow 5.0.0 (from validate.cc):

(gdb) p data.buffers[1]->data()
$8 = (const uint8_t *) 0x3fff9680040 ""
(gdb) p data.buffers[1]->data()[4]
$9 = 26 '\032'

The same Flight server is used for reading data. When I run the above sample code with pyarrow 3.0.0, I see the following correct offset values with the right byte order.
From data.h GetValues(), called from validate.cc:

(gdb) p buffers[1]->data()
$11 = (const uint8_t *) 0x2aa00a544b2 ""
(gdb) p buffers[1]->data()[4]
$12 = 0 '\000'
(gdb) p buffers[1]->data()[5]
$13 = 0 '\000'
(gdb) p buffers[1]->data()[6]
$14 = 0 '\000'
(gdb) p buffers[1]->data()[7]
$15 = 26 '\032'

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490156#comment-17490156 ]

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

The issue is seen only with pyarrow 5.0.0 and is not seen with pyarrow 3.0.0.

Some investigation details from my side while debugging validate():

The offsets have the opposite byte order with pyarrow 5.0.0 (from validate.cc):

(gdb) p data.buffers[1]->data()
$8 = (const uint8_t *) 0x3fff9680040 ""
(gdb) p data.buffers[1]->data()[4]
$9 = 26 '\032'

The same Flight server is used for reading data. When I run the above sample code with pyarrow 3.0.0, I see the following correct offset values with the right byte order.
From data.h GetValues(), called from validate.cc:

(gdb) p buffers[1]->data()
$11 = (const uint8_t *) 0x2aa00a544b2 ""
(gdb) p buffers[1]->data()[4]
$12 = 0 '\000'
(gdb) p buffers[1]->data()[5]
$13 = 0 '\000'
(gdb) p buffers[1]->data()[6]
$14 = 0 '\000'
(gdb) p buffers[1]->data()[7]
$15 = 26 '\032'

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
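The two gdb dumps in the comment above can be read as the same offset value, 26, stored in opposite byte orders. The sketch below is a hedged illustration in plain Python, not pyarrow code. It assumes bytes 5 to 7 are zero in the pyarrow 5.0.0 dump (only byte 4 was printed), and the helper `valid_offset_orders`, the sample offsets and the buffer length of 300 are hypothetical. The helper checks under which byte order a raw int32 offsets buffer is plausible: non-negative, non-decreasing, and ending within the value buffer.

```python
import struct


def valid_offset_orders(raw, data_len):
    """Return the byte orders under which `raw` decodes to plausible offsets.

    Plausible means: first offset non-negative, offsets non-decreasing,
    and the last offset no larger than the value buffer (`data_len` bytes).
    """
    count = len(raw) // 4
    orders = []
    for name, prefix in (("little", "<"), ("big", ">")):
        offsets = struct.unpack(f"{prefix}{count}i", raw[:count * 4])
        monotonic = all(a <= b for a, b in zip(offsets, offsets[1:]))
        if offsets and offsets[0] >= 0 and monotonic and offsets[-1] <= data_len:
            orders.append(name)
    return orders


# Bytes 4..7 as printed by gdb under pyarrow 3.0.0.
good = bytes([0, 0, 0, 26])
# Under pyarrow 5.0.0 only byte 4 (26) was printed; bytes 5..7 are assumed 0.
bad = bytes([26, 0, 0, 0])

# What a big-endian host reads natively from each layout.
print(int.from_bytes(good, "big"))  # 26
print(int.from_bytes(bad, "big"))   # 436207616

# Hypothetical offsets buffer for a 300-byte value buffer, stored big-endian.
raw = struct.pack(">3i", 0, 26, 300)
print(valid_offset_orders(raw, data_len=300))  # ['big']
```

A check like this cannot prove which side swapped the data, but it gives a quick way to tell whether a dumped offsets buffer matches the host's byte order.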
[jira] [Updated] (ARROW-15645) Data read through Flight is having endianness issue on s390x
[ https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated ARROW-15645:
--------------------------------
    Description:
I am facing an endianness issue on s390x (big endian) when converting the data read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:

Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array

(2) table.to_pandas() gives a segmentation fault.

Here is the sample code that I am using:

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as a parameter to the constructor,
    TokenClientAuthHandler(yourtoken).
    """

    def __init__(self, token: str = None):
        super().__init__()
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token

readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())
cmd = json.dumps({...})
descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)
reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()
print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()

  was:
I am facing an endianness issue on s390x (big endian) when converting the data read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:

```
Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
```

(2) table.to_pandas() gives a segmentation fault.

Here is the sample code that I am using:

```
from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as a parameter to the constructor,
    TokenClientAuthHandler(yourtoken).
    """

    def __init__(self, token: str = None):
        super().__init__()
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token

readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())
cmd = json.dumps({...})
descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)
reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()
print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()
```

> Data read through Flight is having endianness issue on s390x
>
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
>
> I am facing an endianness issue on s390x (big endian) when converting the data
> read through Flight to a pandas DataFrame.
> (1) table.validate() fails with the following error:
> Traceback (most recent call last):
>   File "/tmp/2.py", line 51, in <module>
>     table.validate()
>   File "p
[jira] [Created] (ARROW-15645) Data read through Flight is having endianness issue on s390x
Ravi Gummadi created ARROW-15645:

Summary: Data read through Flight is having endianness issue on s390x
Key: ARROW-15645
URL: https://issues.apache.org/jira/browse/ARROW-15645
Project: Apache Arrow
Issue Type: Bug
Components: C++, FlightRPC, Python
Affects Versions: 5.0.0
Environment: Linux s390x (big endian)
Reporter: Ravi Gummadi

I am facing an endianness issue on s390x (big endian) when converting the data read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:

```
Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
```

(2) table.to_pandas() gives a segmentation fault.

Here is the sample code that I am using:

```
from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.

    With the default constructor, the user token is read from the
    environment: TokenClientAuthHandler().
    You can also pass a user token as a parameter to the constructor,
    TokenClientAuthHandler(yourtoken).
    """

    def __init__(self, token: str = None):
        super().__init__()
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token

readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())
cmd = json.dumps({...})
descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)
reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()
print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()
```
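
The "Negative offsets in binary array" error reported above is consistent with an offsets buffer being decoded in the wrong byte order, which is the kind of mismatch ARROW-15778 describes (a stream whose declared endianness does not match the bytes actually written). The following is only a minimal sketch of that effect using Python's standard `struct` module; it does not use Arrow, and the offset values are made up for illustration.

```python
import struct

# Hypothetical int32 offsets for a binary array holding three values
# of 128, 72 and 56 bytes (illustrative numbers, not from the report).
offsets = [0, 128, 200, 256]

# Assumption: the writer runs on a big-endian host (such as s390x)
# and serializes the offsets in its native big-endian order.
wire = struct.pack(">4i", *offsets)

# A reader that believes the data is little-endian decodes the same
# bytes with the opposite byte order and gets nonsense offsets.
misread = list(struct.unpack("<4i", wire))

# A reader that knows the true byte order recovers the original values.
correct = list(struct.unpack(">4i", wire))

print("misread:", misread)
print("correct:", correct)
```

Small positive offsets such as 128 and 200 end up with their low byte in the sign-carrying position, so they decode as large negative numbers, which a validity check on monotonically increasing, non-negative offsets would reject.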