[jira] [Commented] (ARROW-15778) [Java] Endianness field not emitted in IPC stream

2022-03-07 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502198#comment-17502198
 ] 

Ravi Gummadi commented on ARROW-15778:
--

Thanks [~kiszk] 

> [Java] Endianness field not emitted in IPC stream
> -
>
> Key: ARROW-15778
> URL: https://issues.apache.org/jira/browse/ARROW-15778
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 8.0.0
>
>
> It seems the Java IPC writer implementation does not emit the Endianness 
> information at all (making it Little by default). This complicates 
> interoperability with the C++ IPC reader, which does read this information 
> and acts on it to decide whether it needs to byteswap the incoming data.
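
The effect described in the issue above can be illustrated with a minimal Python sketch that uses only the standard library. It is a simplified stand-in, not the Arrow Java writer or the C++ reader: the helper names `needs_byteswap` and `read_int32_values` and the sample values 0, 26, 53 are assumptions made for illustration. It shows what happens when big-endian data is decoded under a "little" declaration.

```python
import struct
import sys


def needs_byteswap(declared_endianness: str, native: str = sys.byteorder) -> bool:
    """Return True when the declared byte order differs from the reader's."""
    return declared_endianness.lower() != native


def read_int32_values(raw: bytes, declared_endianness: str) -> list:
    """Decode int32 values using the byte order the stream declared."""
    fmt = "<" if declared_endianness.lower() == "little" else ">"
    count = len(raw) // 4
    return list(struct.unpack(f"{fmt}{count}i", raw[: count * 4]))


# Hypothetical payload written in big-endian order (as on s390x).
big_endian_payload = struct.pack(">3i", 0, 26, 53)

# Decoding under the default "little" declaration misreads the values.
print(read_int32_values(big_endian_payload, "little"))
# Decoding with the correct declaration recovers them.
print(read_int32_values(big_endian_payload, "big"))
```

The point of the sketch is only that a reader cannot pick the right decoding unless the writer states the byte order it actually used.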



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (ARROW-15645) [Flight][Java][C++] Data read through Flight is having endianness issue on s390x

2022-02-25 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498004#comment-17498004
 ] 

Ravi Gummadi edited comment on ARROW-15645 at 2/25/22, 9:42 AM:


Yes. Both server and client are on s390x.

Thanks for the details, [~apitrou]. I will watch 
[https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test in 
my environment once a fix for ARROW-15778 is available.


was (Author: ravidotg):
Thanks for the details [~apitrou] . I will watch 
[https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test on 
my environment once a fix for 15778 is available.

> [Flight][Java][C++] Data read through Flight is having endianness issue on 
> s390x
> 
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Java
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major
>
> I am facing an endianness issue on s390x (big endian) when converting data 
> read through Flight to a pandas DataFrame.
> (1) table.validate() fails with error
> {code}
> Traceback (most recent call last):
>   File "/tmp/2.py", line 51, in <module>
>     table.validate()
>   File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
> {code}
> (2) table.to_pandas() gives a segmentation fault
> 
> Here is a sample code that I am using:
> {code:python}
> from pyarrow import flight
> import os
> import json
>
> flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
> print(flight_endpoint)
>
>
> class TokenClientAuthHandler(flight.ClientAuthHandler):
>     """An example implementation of authentication via handshake.
>
>     With the default constructor, the user token is read from the
>     environment: TokenClientAuthHandler().
>     You can also pass a user token as a parameter to the constructor:
>     TokenClientAuthHandler(yourtoken).
>     """
>
>     def __init__(self, token: str = None):
>         super().__init__()
>         if token is not None:
>             strToken = 'Bearer {}'.format(token)
>         else:
>             strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
>         self.token = strToken.encode('utf-8')
>
>     def authenticate(self, outgoing, incoming):
>         # Send the bearer token, then keep whatever the server sends back.
>         outgoing.write(self.token)
>         self.token = incoming.read()
>
>     def get_token(self):
>         return self.token
>
>
> readClient = flight.FlightClient(flight_endpoint)
> readClient.authenticate(TokenClientAuthHandler())
> cmd = json.dumps({...})  # request body elided in the original report
> descriptor = flight.FlightDescriptor.for_command(cmd)
> flightInfo = readClient.get_flight_info(descriptor)
> reader = readClient.do_get(flightInfo.endpoints[0].ticket)
> table = reader.read_all()
> print(table)
> print(table.num_columns)
> print(table.num_rows)
> table.validate()   # (1) fails with ArrowInvalid
> table.to_pandas()  # (2) segmentation fault
> {code}





[jira] [Commented] (ARROW-15645) [Flight][Java][C++] Data read through Flight is having endianness issue on s390x

2022-02-25 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498004#comment-17498004
 ] 

Ravi Gummadi commented on ARROW-15645:
--

Thanks for the details, [~apitrou]. I will watch 
[https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15778] and test in 
my environment once a fix for ARROW-15778 is available.



[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-24 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497337#comment-17497337
 ] 

Ravi Gummadi commented on ARROW-15645:
--

The Flight server side uses the Java-based Arrow 6.0.1.
On the client side, pyarrow 5.0.0, 6.0.0, and 7.0.0 all show the issue 
reported above.

> Data read through Flight is having endianness issue on s390x
> 
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, FlightRPC, Python
> Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
> Reporter: Ravi Gummadi
> Priority: Major


[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-14 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492382#comment-17492382
 ] 

Ravi Gummadi commented on ARROW-15645:
--

[~kiszk],

I tried pyarrow 6.0 on the client side and the issue is still seen.
To summarize:
(1) The issue is NOT seen with pyarrow 3.0.0 on the client side and Arrow 
6.0.x on the Flight server side.
(2) The issue is seen with pyarrow 5.0.0 on the client side and Arrow 6.0.x 
on the Flight server side.
(3) The issue is seen with pyarrow 6.0.0 on the client side and Arrow 6.0.x 
on the Flight server side.



[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490682#comment-17490682
 ] 

Ravi Gummadi commented on ARROW-15645:
--

The Arrow version on the Flight server side is 6.x.

Any clues as to why pyarrow 5.0.0 has the issue while pyarrow 3.0.0 does not? 
Where in the Arrow source code would the fix need to go? 
Thanks.



[jira] [Updated] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated ARROW-15645:
-
Description: 
I am facing an endianness issue on s390x (big endian) when converting data 
read through Flight to a pandas DataFrame.

(1) table.validate() fails with error

Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array

(2) table.to_pandas() gives a segmentation fault

Here is a sample code that I am using:

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.
       With the default constructor, the user token is read from the 
environment: TokenClientAuthHandler().
       You can also pass a user token as parameter to the constructor, 
TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
    
readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})

descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()


[jira] [Comment Edited] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490156#comment-17490156
 ] 

Ravi Gummadi edited comment on ARROW-15645 at 2/10/22, 11:55 AM:
-

The issue is seen only with pyarrow 5.0.0 and is not seen with pyarrow 3.0.0.

Some investigation details from my side while debugging validate():

The offsets have the opposite byte order with pyarrow 5.0.0 (from validate.cc):

(gdb) p data.buffers[1]->data()
$8 = (const uint8_t *) 0x3fff9680040 ""
(gdb) p data.buffers[1]->data()[4]
$9 = 26 '\032'

The same Flight server is used for reading the data. When I run the sample 
code above with pyarrow 3.0.0, I see the following correct offset values in the 
right byte order.
From data.h GetValues(), called from validate.cc:

(gdb) p buffers[1]->data()
$11 = (const uint8_t *) 0x2aa00a544b2 ""
(gdb) p buffers[1]->data()[4]
$12 = 0 '\000'
(gdb) p buffers[1]->data()[5]
$13 = 0 '\000'
(gdb) p buffers[1]->data()[6]
$14 = 0 '\000'
(gdb) p buffers[1]->data()[7]
$15 = 26 '\032'


With pyarrow 5.0.0 on the big-endian machine, all offsets appear in 
little-endian order. With pyarrow 3.0.0, the offsets are in the expected 
(big-endian) byte order.




[jira] [Commented] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490156#comment-17490156
 ] 

Ravi Gummadi commented on ARROW-15645:
--

The issue is seen only with pyarrow 5.0.0 and is not seen with pyarrow 3.0.0.

Some investigation details from my side while debugging validate():

The offsets have the opposite byte order with pyarrow 5.0.0 (from validate.cc):

(gdb) p data.buffers[1]->data()
$8 = (const uint8_t *) 0x3fff9680040 ""
(gdb) p data.buffers[1]->data()[4]
$9 = 26 '\032'

The same Flight server is used for reading the data. When I run the sample 
code above with pyarrow 3.0.0, I see the following correct offset values in the 
right byte order.
From data.h GetValues(), called from validate.cc:

(gdb) p buffers[1]->data()
$11 = (const uint8_t *) 0x2aa00a544b2 ""
(gdb) p buffers[1]->data()[4]
$12 = 0 '\000'
(gdb) p buffers[1]->data()[5]
$13 = 0 '\000'
(gdb) p buffers[1]->data()[6]
$14 = 0 '\000'
(gdb) p buffers[1]->data()[7]
$15 = 26 '\032'



[jira] [Updated] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated ARROW-15645:
-
Description: 
I am facing an endianness issue on s390x (big endian) when converting data 
read through Flight to a pandas DataFrame.

(1) table.validate() fails with error

Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array

(2) table.to_pandas() gives a segmentation fault

Here is a sample code that I am using:

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.
       With the default constructor, the user token is read from the 
environment: TokenClientAuthHandler().
       You can also pass a user token as parameter to the constructor, 
TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
    
readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})

descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()


> Data read through Flight is having endianness issue on s390x
> 
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC, Python
>Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
>Reporter: Ravi Gummadi
>Priority: Major

[jira] [Updated] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated ARROW-15645:
-
Description: 
I am facing an endianness issue on s390x (big endian) when converting data
read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:

Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array

(2) table.to_pandas() gives a segmentation fault

Here is the sample code that I am using:



from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.
       With the default constructor, the user token is read from the
       environment: TokenClientAuthHandler().
       You can also pass a user token as parameter to the constructor,
       TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        # Use the explicit token if given; otherwise fall back to the environment.
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
    
readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})

descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()


> Data read through Flight is having endianness issue on s390x
> 
>
> Key: ARROW-15645
> URL: https://issues.apache.org/jira/browse/ARROW-15645
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC, Python
>Affects Versions: 5.0.0
> Environment: Linux s390x (big endian)
>Reporter: Ravi Gummadi
>Priority: Major

[jira] [Created] (ARROW-15645) Data read through Flight is having endianness issue on s390x

2022-02-10 Thread Ravi Gummadi (Jira)
Ravi Gummadi created ARROW-15645:


 Summary: Data read through Flight is having endianness issue on s390x
 Key: ARROW-15645
 URL: https://issues.apache.org/jira/browse/ARROW-15645
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC, Python
Affects Versions: 5.0.0
 Environment: Linux s390x (big endian)
Reporter: Ravi Gummadi


I am facing an endianness issue on s390x (big endian) when converting data
read through Flight to a pandas DataFrame.

(1) table.validate() fails with the following error:
```
Traceback (most recent call last):
  File "/tmp/2.py", line 51, in <module>
    table.validate()
  File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
```
(2) table.to_pandas() gives a segmentation fault

Here is the sample code that I am using:
```

from pyarrow import flight
import os
import json

flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
print(flight_endpoint)

#
class TokenClientAuthHandler(flight.ClientAuthHandler):
    """An example implementation of authentication via handshake.
       With the default constructor, the user token is read from the
       environment: TokenClientAuthHandler().
       You can also pass a user token as parameter to the constructor,
       TokenClientAuthHandler(yourtoken).
    """
    def __init__(self, token: str = None):
        super().__init__()
        # Use the explicit token if given; otherwise fall back to the environment.
        if token is not None:
            strToken = 'Bearer {}'.format(token)
        else:
            strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
        self.token = strToken.encode('utf-8')
        #print(self.token)

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
    
readClient = flight.FlightClient(flight_endpoint)
readClient.authenticate(TokenClientAuthHandler())

cmd = json.dumps({...})

descriptor = flight.FlightDescriptor.for_command(cmd)
flightInfo = readClient.get_flight_info(descriptor)

reader = readClient.do_get(flightInfo.endpoints[0].ticket)
table = reader.read_all()

print(table)
print(table.num_columns)
print(table.num_rows)
table.validate()
table.to_pandas()
```
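
As an illustration of how byte order alone could produce a "Negative offsets"
error, here is a minimal sketch that uses only Python's struct module, not
pyarrow. It assumes, hypothetically, that the int32 offsets of a binary column
were written in little-endian order and then read on a big-endian host without
any byte swapping. The offset values are made up for the example and are not
taken from the failing data.

```python
import struct
import sys

# Hypothetical offsets buffer for a three-element binary column:
# the values start at bytes 0, 128 and 200, and the data ends at byte 300.
offsets = [0, 128, 200, 300]

# Serialize as little-endian int32, the layout a little-endian writer produces.
wire_bytes = struct.pack("<4i", *offsets)

# Reinterpret the same bytes as big-endian int32 without swapping, which is
# what a big-endian host would do if it assumed the data was in native order.
misread = list(struct.unpack(">4i", wire_bytes))

print("host byte order:", sys.byteorder)
print("written:", offsets)
print("misread:", misread)
print("negative offsets:", [v for v in misread if v < 0])
```

Unpacking the same bytes with "<" instead of ">" recovers the original
offsets, so a reader can only interpret the buffer correctly if it knows the
byte order of the incoming stream.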


