[jira] [Updated] (ARROW-18360) [Python] Incorrectly passing schema=None to do_put crashes

2022-11-17 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-18360:
-
Priority: Minor  (was: Major)

> [Python] Incorrectly passing schema=None to do_put crashes
> --
>
> Key: ARROW-18360
> URL: https://issues.apache.org/jira/browse/ARROW-18360
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 9.0.0
>Reporter: Bryan Cutler
>Priority: Minor
>
> In pyarrow.flight, passing an incorrect value of None for schema in do_put 
> will lead to a core dump.
> In pyarrow 9.0.0, entering the command leads to a segmentation fault:
> {code}
> In [3]: writer, reader = 
> client.do_put(flight.FlightDescriptor.for_command(cmd), schema=None)
> Segmentation fault (core dumped)
> {code}
> In pyarrow 7.0.0, the kernel crashes after attempting to access the writer 
> and I got the following:
> {code}
> In [38]: client = flight.FlightClient('grpc+tls://localhost:9643', 
> disable_server_verification=True)
> In [39]: writer, reader = 
> client.do_put(flight.FlightDescriptor.for_command(cmd), None)
> In [40]: 
> writer./home/conda/feedstock_root/build_artifacts/arrow-cpp-ext_1644752264449/work/cpp/src/arrow/flight/client.cc:736:
>   Check failed: (batch_writer_) != (nullptr) 
> miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(+0x66288c)[0x7f0feeae088c]
> miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x101)[0x7f0feeae0c91]
> miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow_flight.so.700(+0x7c1e1)[0x7f0fa9e331e1]
> miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so(+0x17cf1a)[0x7f0fefe7ff1a]
> miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
> miniconda3/envs/dev/bin/python(+0x144814)[0x559a7cb8f814]
> miniconda3/envs/dev/bin/python(+0x1445bf)[0x559a7cb8f5bf]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc]
> miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac]
> miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
> miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
> miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44]
> miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397]
> miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
> miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
> miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
> miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac]
> miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
> miniconda3/envs/dev/bin/python(+0x151ef3)[0x559a7cb9cef3]
> miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44]
> miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x1311)[0x559a7cb7fbd1]
> miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc]
> miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
> miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x66f)[0x559a7cb7ef2f]
> miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
> miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
> miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
> miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
> miniconda3/envs/dev/bin/python(+0x1416f5)[0x559a7cb8c6f5]
> miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x52)[0x559a7cb8c4a2]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
> miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
> miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
> miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494]
> miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
> miniconda3/envs/dev/bin/python(+0x15a178)[0x559a7cba5178]
> 

[jira] [Created] (ARROW-18360) [Python] Incorrectly passing schema=None to do_put crashes

2022-11-17 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-18360:


 Summary: [Python] Incorrectly passing schema=None to do_put crashes
 Key: ARROW-18360
 URL: https://issues.apache.org/jira/browse/ARROW-18360
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 9.0.0
Reporter: Bryan Cutler


In pyarrow.flight, passing an incorrect value of None for schema in do_put will 
lead to a core dump.

In pyarrow 9.0.0, entering the command leads to a segmentation fault:

{code}
In [3]: writer, reader = 
client.do_put(flight.FlightDescriptor.for_command(cmd), schema=None)
Segmentation fault (core dumped)
{code}
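The crash happens because the binding hands the schema straight to native code without validating it. Until that check exists, a caller-side guard avoids the segfault; the sketch below is illustrative (`checked_do_put` is not part of pyarrow):

```python
def checked_do_put(client, descriptor, schema):
    """Hypothetical wrapper: validate arguments before they cross the
    Python/C++ boundary, where a None schema currently segfaults."""
    if schema is None:
        raise TypeError("schema must be a pyarrow.Schema, not None")
    return client.do_put(descriptor, schema)
```

A plain Python TypeError here is recoverable, unlike the core dump from the native side.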

In pyarrow 7.0.0, the kernel crashes after attempting to access the writer and 
I got the following:
{code}
In [38]: client = flight.FlightClient('grpc+tls://localhost:9643', 
disable_server_verification=True)

In [39]: writer, reader = 
client.do_put(flight.FlightDescriptor.for_command(cmd), None)

In [40]: 
writer./home/conda/feedstock_root/build_artifacts/arrow-cpp-ext_1644752264449/work/cpp/src/arrow/flight/client.cc:736:
  Check failed: (batch_writer_) != (nullptr) 
miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(+0x66288c)[0x7f0feeae088c]
miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x101)[0x7f0feeae0c91]
miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow_flight.so.700(+0x7c1e1)[0x7f0fa9e331e1]
miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so(+0x17cf1a)[0x7f0fefe7ff1a]
miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
miniconda3/envs/dev/bin/python(+0x144814)[0x559a7cb8f814]
miniconda3/envs/dev/bin/python(+0x1445bf)[0x559a7cb8f5bf]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc]
miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac]
miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44]
miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397]
miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac]
miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
miniconda3/envs/dev/bin/python(+0x151ef3)[0x559a7cb9cef3]
miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44]
miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x1311)[0x559a7cb7fbd1]
miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc]
miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5]
miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x66f)[0x559a7cb7ef2f]
miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
miniconda3/envs/dev/bin/python(+0x1416f5)[0x559a7cb8c6f5]
miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x52)[0x559a7cb8c4a2]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d]
miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03]
miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f]
miniconda3/envs/dev/bin/python(+0x15a178)[0x559a7cba5178]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x9ca)[0x559a7cb7f28a]
miniconda3/envs/dev/bin/python(+0x15a178)[0x559a7cba5178]
miniconda3/envs/dev/bin/python(+0x1602d9)[0x559a7cbab2d9]
miniconda3/envs/dev/bin/python(+0x19d5f5)[0x559a7cbe85f5]
miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc]
miniconda3/envs/dev/bin/python(+0x15a178)[0x559a7cba5178]

[jira] [Assigned] (ARROW-15778) [Java] Endianness field not emitted in IPC stream

2022-04-05 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-15778:


Assignee: Kazuaki Ishizaki

> [Java] Endianness field not emitted in IPC stream
> -
>
> Key: ARROW-15778
> URL: https://issues.apache.org/jira/browse/ARROW-15778
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Antoine Pitrou
>Assignee: Kazuaki Ishizaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It seems the Java IPC writer implementation does not emit the Endianness 
> information at all (making it Little by default). This complicates 
> interoperability with the C++ IPC reader, which does read this information 
> and acts on it to decide whether it needs to byteswap the incoming data.
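The byteswap decision described above can be sketched in Python (illustrative, not Arrow code): the reader decodes buffers according to the endianness declared in the stream metadata, so a writer that never emits that field silently corrupts cross-endian data.

```python
import struct

def decode_int32s(buf, stream_endianness):
    # Decode a raw buffer of int32s using the endianness declared in the
    # stream; 'little'/'big' stand in for the IPC Endianness enum value.
    fmt = ('<' if stream_endianness == 'little' else '>') + 'i'
    return [struct.unpack_from(fmt, buf, off)[0]
            for off in range(0, len(buf), 4)]
```

A big-endian buffer holding the value 1 is `b'\x00\x00\x00\x01'`; decoded as little endian it reads back 16777216, which is exactly the interoperability hazard when the field is missing and a default is assumed.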



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ARROW-15778) [Java] Endianness field not emitted in IPC stream

2022-04-05 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-15778.
--
Resolution: Fixed

Issue resolved by pull request 12777
[https://github.com/apache/arrow/pull/12777]

> [Java] Endianness field not emitted in IPC stream
> -
>
> Key: ARROW-15778
> URL: https://issues.apache.org/jira/browse/ARROW-15778
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It seems the Java IPC writer implementation does not emit the Endianness 
> information at all (making it Little by default). This complicates 
> interoperability with the C++ IPC reader, which does read this information 
> and acts on it to decide whether it needs to byteswap the incoming data.





[jira] [Commented] (ARROW-15778) [Java] Endianness field not emitted in IPC stream

2022-03-24 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512169#comment-17512169
 ] 

Bryan Cutler commented on ARROW-15778:
--

[~kiszk] since your patch looks to be pretty simple and low risk, what do you 
think about making a PR for it as is? If it passes the current tests, we know it 
doesn't break anything for little endian. Then we could have a separate task to 
add a proper test on a big-endian machine, if that requires more effort.

> [Java] Endianness field not emitted in IPC stream
> -
>
> Key: ARROW-15778
> URL: https://issues.apache.org/jira/browse/ARROW-15778
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 8.0.0
>
>
> It seems the Java IPC writer implementation does not emit the Endianness 
> information at all (making it Little by default). This complicates 
> interoperability with the C++ IPC reader, which does read this information 
> and acts on it to decide whether it needs to byteswap the incoming data.





[jira] [Commented] (ARROW-15954) [Java][Packaging] Nightly build is broken due to dependency problems for flight-core module

2022-03-16 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507812#comment-17507812
 ] 

Bryan Cutler commented on ARROW-15954:
--

I think after the grpc and netty upgrade, some native libraries were no longer 
required and caused a warning. I took out the epoll dependency for Linux, but 
left the kqueue one for Mac since I couldn't verify it. I'll make a PR to do 
this now.

> [Java][Packaging] Nightly build is broken due to dependency problems for 
> flight-core module
> --
>
> Key: ARROW-15954
> URL: https://issues.apache.org/jira/browse/ARROW-15954
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Packaging
>Reporter: Anthony Louis Gotlib Ferreira
>Priority: Major
>
> The nightly builds that generate the Java jars for the Arrow project are 
> broken due to dependency problems in the *flight-core* module: 
> [https://github.com/ursacomputing/crossbow/runs/5566185348?check_suite_focus=true]
>  
> The error started to appear after this PR with dependency upgrades: 
> [https://github.com/apache/arrow/pull/12550]





[jira] [Assigned] (ARROW-15954) [Java][Packaging] Nightly build is broken due to dependency problems for flight-core module

2022-03-16 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-15954:


Assignee: Bryan Cutler

> [Java][Packaging] Nightly build is broken due to dependency problems for 
> flight-core module
> --
>
> Key: ARROW-15954
> URL: https://issues.apache.org/jira/browse/ARROW-15954
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Packaging
>Reporter: Anthony Louis Gotlib Ferreira
>Assignee: Bryan Cutler
>Priority: Major
>
> The nightly builds that generate the Java jars for the Arrow project are 
> broken due to dependency problems in the *flight-core* module: 
> [https://github.com/ursacomputing/crossbow/runs/5566185348?check_suite_focus=true]
>  
> The error started to appear after this PR with dependency upgrades: 
> [https://github.com/apache/arrow/pull/12550]





[jira] [Updated] (ARROW-15795) Add a getter for the timeZone in TimeStampMicroTZVector

2022-03-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-15795:
-
Fix Version/s: 7.0.1

> Add a getter for the timeZone in TimeStampMicroTZVector
> ---
>
> Key: ARROW-15795
> URL: https://issues.apache.org/jira/browse/ARROW-15795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 7.0.0
>Reporter: Fabien
>Assignee: Fabien
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 7.0.1, 8.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TimeStampMicroTZVector is a vector containing a timezone.
> However, when reading values from this vector, there is no clean way to get 
> this time zone.
> The current way to get it is:
> {code:java}
> ((ArrowType.Timestamp) 
> vector.getField().getFieldType().getType()).getTimezone()
> {code}
> But the vector has the timezone as a private field: 
> https://github.com/apache/arrow/blob/fa78edc8b08fa022e34db8b8fdeef4df41de703f/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroTZVector.java#L41
> so adding a getter is easy and would really simplify reading the values in 
> this vector. 
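The value of the proposed getter can be modeled in a few lines of Python (an illustrative analogue, not the Java classes themselves): today's access path is a three-hop chain plus a cast, while the getter collapses it to a single call.

```python
class Timestamp:
    def __init__(self, timezone):
        self._timezone = timezone
    def get_timezone(self):
        return self._timezone

class FieldType:
    def __init__(self, arrow_type):
        self._type = arrow_type
    def get_type(self):
        return self._type

class Field:
    def __init__(self, field_type):
        self._field_type = field_type
    def get_field_type(self):
        return self._field_type

class TimeStampMicroTZVector:
    def __init__(self, timezone):
        self._field = Field(FieldType(Timestamp(timezone)))
        self._time_zone = timezone  # today a private field with no accessor

    def get_field(self):
        return self._field

    def get_time_zone(self):
        # The proposed getter: one call instead of the chained cast.
        return self._time_zone
```

Both paths return the same value; the getter just removes the need to know the vector's internal type layout.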





[jira] [Assigned] (ARROW-15795) Add a getter for the timeZone in TimeStampMicroTZVector

2022-03-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-15795:


Assignee: Fabien

> Add a getter for the timeZone in TimeStampMicroTZVector
> ---
>
> Key: ARROW-15795
> URL: https://issues.apache.org/jira/browse/ARROW-15795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 7.0.0
>Reporter: Fabien
>Assignee: Fabien
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TimeStampMicroTZVector is a vector containing a timezone.
> However, when reading values from this vector, there is no clean way to get 
> this time zone.
> The current way to get it is:
> {code:java}
> ((ArrowType.Timestamp) 
> vector.getField().getFieldType().getType()).getTimezone()
> {code}
> But the vector has the timezone as a private field: 
> https://github.com/apache/arrow/blob/fa78edc8b08fa022e34db8b8fdeef4df41de703f/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroTZVector.java#L41
> so adding a getter is easy and would really simplify reading the values in 
> this vector. 





[jira] [Resolved] (ARROW-15795) Add a getter for the timeZone in TimeStampMicroTZVector

2022-03-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-15795.
--
Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12521
[https://github.com/apache/arrow/pull/12521]

> Add a getter for the timeZone in TimeStampMicroTZVector
> ---
>
> Key: ARROW-15795
> URL: https://issues.apache.org/jira/browse/ARROW-15795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 7.0.0
>Reporter: Fabien
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TimeStampMicroTZVector is a vector containing a timezone.
> However, when reading values from this vector, there is no clean way to get 
> this time zone.
> The current way to get it is:
> {code:java}
> ((ArrowType.Timestamp) 
> vector.getField().getFieldType().getType()).getTimezone()
> {code}
> But the vector has the timezone as a private field: 
> https://github.com/apache/arrow/blob/fa78edc8b08fa022e34db8b8fdeef4df41de703f/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroTZVector.java#L41
> so adding a getter is easy and would really simplify reading the values in 
> this vector. 





[jira] [Created] (ARROW-15831) [Java] Upgrade Flight dependencies

2022-03-02 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-15831:


 Summary: [Java] Upgrade Flight dependencies
 Key: ARROW-15831
 URL: https://issues.apache.org/jira/browse/ARROW-15831
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Upgrade grpc, netty and protobuf dependencies for Flight





[jira] [Resolved] (ARROW-11396) [JAVA] Update the access permission from private to protected

2022-02-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-11396.
--
Resolution: Won't Fix

> [JAVA] Update the access permission from private to protected
> -
>
> Key: ARROW-11396
> URL: https://issues.apache.org/jira/browse/ARROW-11396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ke Jia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-14665) [Java] JdbcToArrowUtils ResultSet iteration bug

2022-02-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-14665:


Assignee: Zac

> [Java] JdbcToArrowUtils ResultSet iteration bug
> ---
>
> Key: ARROW-14665
> URL: https://issues.apache.org/jira/browse/ARROW-14665
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.0
>Reporter: Zac
>Assignee: Zac
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When specifying a target batch size, the [iteration 
> logic|https://github.com/apache/arrow/blob/ea42b9e0aa000238fff22fd48f06f3aa516b9f3f/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L266]
>  is currently broken:
> {code:java}
> while (rs.next() && readRowCount < config.getTargetBatchSize()) {
>   compositeConsumer.consume(rs);
>   readRowCount++;
> }
> {code}
> Calling next() on the result set will move the cursor forward to the next 
> row, even when we've reached the target batch size.
> For example, consider setting target batch size to 1, and query a table that 
> has three rows.
> On the first iteration, we'll successfully consume the first row. On the next 
> iteration, we'll move the cursor to row 2, but detect the read row count is 
> no longer < target batch size and return.
> Upon calling into the method again with the same result set, rs.next will be 
> called again, which will result in successfully consuming row 3.
> *Problem:* row 2 is skipped! 
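The skipped-row behavior described above is easy to reproduce with a stand-in cursor (illustrative Python, not the Arrow Java code): the buggy loop calls next() before checking the batch size, so the cursor advances past a row that is never consumed; swapping the two conditions short-circuits before next() and is one possible fix.

```python
class FakeResultSet:
    """Minimal stand-in for java.sql.ResultSet cursor semantics."""
    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def next(self):
        self._pos += 1
        return self._pos < len(self._rows)

    def current(self):
        return self._rows[self._pos]

def read_batch_buggy(rs, batch_size):
    # Mirrors the broken loop: rs.next() runs first, so the cursor moves
    # even when the batch is already full, and that row is lost.
    batch = []
    while rs.next() and len(batch) < batch_size:
        batch.append(rs.current())
    return batch

def read_batch_fixed(rs, batch_size):
    # Checking the count first short-circuits before rs.next(), leaving
    # the cursor in place for the following call.
    batch = []
    while len(batch) < batch_size and rs.next():
        batch.append(rs.current())
    return batch
```

With a target batch size of 1 over three rows, the buggy loop yields rows 1 and 3 only, while the reordered loop yields all three.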





[jira] [Assigned] (ARROW-15272) [Java] ArrowVectorIterator eats initialization exceptions when close fails

2022-02-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-15272:


Assignee: Andrew Higgins

> [Java] ArrowVectorIterator eats initialization exceptions when close fails
> --
>
> Key: ARROW-15272
> URL: https://issues.apache.org/jira/browse/ARROW-15272
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.1
>Reporter: Andrew Higgins
>Assignee: Andrew Higgins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In ArrowVectorIterator's create method, exceptions thrown during initialize() 
> are swallowed if further exceptions occur while closing the iterator.





[jira] [Resolved] (ARROW-15272) [Java] ArrowVectorIterator eats initialization exceptions when close fails

2022-02-25 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-15272.
--
Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12094
[https://github.com/apache/arrow/pull/12094]

> [Java] ArrowVectorIterator eats initialization exceptions when close fails
> --
>
> Key: ARROW-15272
> URL: https://issues.apache.org/jira/browse/ARROW-15272
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.1
>Reporter: Andrew Higgins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In ArrowVectorIterator's create method, exceptions thrown during initialize() 
> are swallowed if further exceptions occur while closing the iterator.





[jira] [Comment Edited] (ARROW-14549) VectorSchemaRoot is not refreshed when value is null

2022-02-25 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498249#comment-17498249
 ] 

Bryan Cutler edited comment on ARROW-14549 at 2/25/22, 6:22 PM:


[~hu6360567] Calling `allocateNew()` will create new buffers, which is one way 
to clear previous results. If you don't want to allocate any new memory, you 
would need to zero out all the vectors by calling `zeroVector()` and 
`setValueCount(0)`. If you don't do either of these, the incorrect data you see 
is expected.
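The stale-value behavior the comment explains can be modeled in plain Python (an illustrative analogue of Arrow's validity semantics, not the Java implementation): a writer only touches slots whose validity bit it sets, so reusing a buffer without zeroing it leaves the previous batch's bytes in the null positions.

```python
def write_batch(values_buf, validity_buf, batch):
    # Only valid (non-None) slots are written, mirroring how writers
    # skip null positions; stale contents remain in the values buffer.
    for i, v in enumerate(batch):
        validity_buf[i] = v is not None
        if v is not None:
            values_buf[i] = v

def read_batch(values_buf, validity_buf, n):
    # A correct reader consults the validity buffer; reading the raw
    # values buffer without zeroing it first exposes the leftovers.
    return [values_buf[i] if validity_buf[i] else None for i in range(n)]
```

After writing ['abc', 3.14] and then [None, None] into the same buffers, the raw values buffer still holds the first batch, which is the "2,abc,3.14" row the reporter saw; zeroing the buffer between batches (the `zeroVector()` analogue) avoids it.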


was (Author: bryanc):
[~hu6360567] Calling `allocateNew()` will create new buffers, which is one way 
to clear previous results. If you don't want to allocate any new memory, you 
would need to zero out all the vectors by calling `zeroVector()` and 
`setValueCount(0)`

> VectorSchemaRoot is not refreshed when value is null
> 
>
> Key: ARROW-14549
> URL: https://issues.apache.org/jira/browse/ARROW-14549
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.0
>Reporter: Wenbo Hu
>Priority: Major
>
> I'm using `arrow-jdbc` to convert query results from JDBC to Arrow.
>  But with the following code, unexpected behavior happens.
> Assuming a sqlite db, the 2nd row of col_2 and col_3 are null.
> |col_1|col_2|col_3|
> |1|abc|3.14|
> |2|NULL|NULL|
> As the documentation suggests,
> {quote}populated data over and over into the same VectorSchemaRoot in a 
> stream of batches rather than creating a new VectorSchemaRoot instance each 
> time.
> {quote}
> *JdbcToArrowConfig* is set to reuse root.
> {code:java}
> public void querySql(String query, QueryOption option) throws Exception {
>  try (final java.sql.Connection conn = connectContainer.getConnection();
>  final Statement stmt = conn.createStatement();
>  final ResultSet rs = stmt.executeQuery(query)
>  ) {
>  // create config with reuse schema root and custom batch size from option
>  final JdbcToArrowConfig config = new 
> JdbcToArrowConfigBuilder().setAllocator(new 
> RootAllocator()).setCalendar(JdbcToArrowUtils.getUtcCalendar())
>  
> .setTargetBatchSize(option.getBatchSize()).setReuseVectorSchemaRoot(true).build();
>   final ArrowVectorIterator iterator = 
> JdbcToArrow.sqlToArrowVectorIterator(rs, config);
>while (iterator.hasNext()){ // retrieve result from iterator 
>  final VectorSchemaRoot root = iterator.next(); 
> option.getCallback().handleBatchResult(root); 
>  root.allocateNew(); // it has to be allocate new 
>    }
>   } catch (java.lang.Exception e){ throw new Exception(e.getMessage()); }
>  }
>  
>  ..
>  // batch_size is set to 1, then callback is called twice.
>  QueryOptions options = new QueryOption(1, 
>  root -> {
>  // if printer is not set, get schema, write header
>  if (printer == null) { 
>   final String[] headers = 
> root.getSchema().getFields().stream().map(Field::getName).toArray(String[]::new);
>  
>   printer = new CSVPrinter(writer, 
> CSVFormat.Builder.create(CSVFormat.DEFAULT).setHeader(headers).build()); 
>   }
>  
>  final int rows = root.getRowCount();
>  final List fieldVectors = root.getFieldVectors();
>  
>  // iterate over rows
>  for (int i = 0; i < rows; i++) { 
>   final int rowId = i; 
>   final List row = fieldVectors.stream().map(v -> 
> v.getObject(rowId)).map(String::valueOf).collect(Collectors.toList()); 
> printer.printRecord(row); 
>   }
>  });
>  
>  connection.querySql("SELECT * FROM test_db", options);
>  ..
> {code}
> If `root.allocateNew()` is called, the csv file is as expected:
> {code}
> column_1,column_2,column_3
> 1,abc,3.14
> 2,null,null
> {code}
> Otherwise, the null values in the 2nd row keep the values from the 1st row:
> {code}
> column_1,column_2,column_3
> 1,abc,3.14
> 2,abc,3.14
> {code}
> **Question: Is it expected that `allocateNew` must be called every time the 
> schema root is reused?**
> Without reusing the schema root, the following code works as expected.
> {code:java}
>  public void querySql(String query, QueryOption option) throws Exception {
>  try (final java.sql.Connection conn = connectContainer.getConnection();
>  final Statement stmt = conn.createStatement();
>  final ResultSet rs = stmt.executeQuery(query)) {
>  // create config without reuse schema root and custom batch size from 
> option
>  final JdbcToArrowConfig config = new 
> JdbcToArrowConfigBuilder().setAllocator(new 
> RootAllocator()).setCalendar(JdbcToArrowUtils.getUtcCalendar())
>  
> .setTargetBatchSize(option.getBatchSize()).setReuseVectorSchemaRoot(false).build();
>  
>  final ArrowVectorIterator iterator = 
> JdbcToArrow.sqlToArrowVectorIterator(rs, config);
>  while (iterator.hasNext()) {
>  // retrieve result from iterator
>  try (VectorSchemaRoot root = 

[jira] [Resolved] (ARROW-14549) VectorSchemaRoot is not refreshed when value is null

2022-02-25 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-14549.
--
Resolution: Not A Problem

> VectorSchemaRoot is not refreshed when value is null
> 
>
> Key: ARROW-14549
> URL: https://issues.apache.org/jira/browse/ARROW-14549
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.0
>Reporter: Wenbo Hu
>Priority: Major
>
> I'm using `arrow-jdbc` to convert query results from JDBC to Arrow.
>  But with the following code, unexpected behavior happens.
> Assuming a sqlite db, the 2nd row of col_2 and col_3 are null.
> |col_1|col_2|col_3|
> |1|abc|3.14|
> |2|NULL|NULL|
> As the documentation suggests,
> {quote}populated data over and over into the same VectorSchemaRoot in a 
> stream of batches rather than creating a new VectorSchemaRoot instance each 
> time.
> {quote}
> *JdbcToArrowConfig* is set to reuse root.
> {code:java}
> public void querySql(String query, QueryOption option) throws Exception {
>   try (final java.sql.Connection conn = connectContainer.getConnection();
>        final Statement stmt = conn.createStatement();
>        final ResultSet rs = stmt.executeQuery(query)) {
>     // create config that reuses the schema root, with a custom batch size from option
>     final JdbcToArrowConfig config = new JdbcToArrowConfigBuilder()
>         .setAllocator(new RootAllocator())
>         .setCalendar(JdbcToArrowUtils.getUtcCalendar())
>         .setTargetBatchSize(option.getBatchSize())
>         .setReuseVectorSchemaRoot(true)
>         .build();
>     final ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(rs, config);
>     while (iterator.hasNext()) {
>       // retrieve result from iterator
>       final VectorSchemaRoot root = iterator.next();
>       option.getCallback().handleBatchResult(root);
>       root.allocateNew(); // it has to allocate new buffers
>     }
>   } catch (java.lang.Exception e) {
>     throw new Exception(e.getMessage());
>   }
> }
>
> ..
> // batch_size is set to 1, so the callback is called twice.
> QueryOption options = new QueryOption(1,
>     root -> {
>       // if printer is not set, get the schema and write the header
>       if (printer == null) {
>         final String[] headers = root.getSchema().getFields().stream()
>             .map(Field::getName).toArray(String[]::new);
>         printer = new CSVPrinter(writer,
>             CSVFormat.Builder.create(CSVFormat.DEFAULT).setHeader(headers).build());
>       }
>
>       final int rows = root.getRowCount();
>       final List<FieldVector> fieldVectors = root.getFieldVectors();
>
>       // iterate over rows
>       for (int i = 0; i < rows; i++) {
>         final int rowId = i;
>         final List<String> row = fieldVectors.stream()
>             .map(v -> v.getObject(rowId))
>             .map(String::valueOf)
>             .collect(Collectors.toList());
>         printer.printRecord(row);
>       }
>     });
>
> connection.querySql("SELECT * FROM test_db", options);
> ..
> {code}
> If `root.allocateNew()` is called, the CSV file is as expected:
> {code}
> column_1,column_2,column_3
> 1,abc,3.14
> 2,null,null
> {code}
> Otherwise, the null values in the 2nd row retain the values from the 1st row:
> {code}
> column_1,column_2,column_3
> 1,abc,3.14
> 2,abc,3.14
> {code}
> *Question: Is it expected that `allocateNew` must be called every time the 
> schema root is reused?*
> Without reusing the schema root, the following code works as expected.
> {code:java}
> public void querySql(String query, QueryOption option) throws Exception {
>   try (final java.sql.Connection conn = connectContainer.getConnection();
>        final Statement stmt = conn.createStatement();
>        final ResultSet rs = stmt.executeQuery(query)) {
>     // create config without reusing the schema root, with a custom batch size from option
>     final JdbcToArrowConfig config = new JdbcToArrowConfigBuilder()
>         .setAllocator(new RootAllocator())
>         .setCalendar(JdbcToArrowUtils.getUtcCalendar())
>         .setTargetBatchSize(option.getBatchSize())
>         .setReuseVectorSchemaRoot(false)
>         .build();
>
>     final ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(rs, config);
>     while (iterator.hasNext()) {
>       // retrieve result from iterator
>       try (VectorSchemaRoot root = iterator.next()) {
>         option.getCallback().handleBatchResult(root);
>         root.allocateNew();
>       }
>     }
>   } catch (java.lang.Exception e) {
>     throw new Exception(e.getMessage());
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-14549) VectorSchemaRoot is not refreshed when value is null

2022-02-25 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498249#comment-17498249
 ] 

Bryan Cutler commented on ARROW-14549:
--

[~hu6360567] Calling `allocateNew()` will create new buffers, which is one way 
to clear previous results. If you don't want to allocate any new memory, you 
would need to zero out all the vectors by calling `zeroVector()` and 
`setValueCount(0)`.
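For illustration, here is a minimal sketch of the second approach (the class and method names are invented, and it uses a single IntVector rather than a full VectorSchemaRoot). The point is that the old batch's data and validity bits linger in the buffers until they are zeroed, which is exactly why nulls in a reused root "inherit" the previous row's values:

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

// Hypothetical demo (not from the issue): clearing a reused vector between
// batches without allocating new buffers.
public class ClearReusedVectorDemo {
    // Returns true when slot 0 reads as null again after clearing.
    static boolean clearWithoutReallocating() {
        try (BufferAllocator allocator = new RootAllocator();
             IntVector v = new IntVector("col", allocator)) {
            v.allocateNew(2);
            v.set(0, 1);
            v.set(1, 2);
            v.setValueCount(2);
            // Without clearing, the old values and validity bits stay in the
            // buffers, so a slot the next batch leaves null reads back stale data.
            v.zeroVector();      // zero the data and validity buffers in place
            v.setValueCount(0);  // reset the logical row count
            return v.isNull(0);  // the stale value is gone
        }
    }

    public static void main(String[] args) {
        System.out.println(clearWithoutReallocating());
    }
}
```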

> VectorSchemaRoot is not refreshed when value is null
> 
>
> Key: ARROW-14549
> URL: https://issues.apache.org/jira/browse/ARROW-14549
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.0
>Reporter: Wenbo Hu
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ARROW-14665) [Java] JdbcToArrowUtils ResultSet iteration bug

2022-02-25 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-14665.
--
Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 11667
[https://github.com/apache/arrow/pull/11667]

> [Java] JdbcToArrowUtils ResultSet iteration bug
> ---
>
> Key: ARROW-14665
> URL: https://issues.apache.org/jira/browse/ARROW-14665
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 6.0.0
>Reporter: Zac
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When specifying a target batch size, the [iteration 
> logic|https://github.com/apache/arrow/blob/ea42b9e0aa000238fff22fd48f06f3aa516b9f3f/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L266]
>  is currently broken:
> {code:java}
> while (rs.next() && readRowCount < config.getTargetBatchSize()) {
>   compositeConsumer.consume(rs);
>   readRowCount++;
> }
> {code}
> Calling next() on the result set moves the cursor forward to the next 
> row, even when we've reached the target batch size.
> For example, consider setting the target batch size to 1 and querying a table 
> that has three rows.
> On the first iteration, we'll successfully consume the first row. On the next 
> iteration, we'll move the cursor to row 2, but detect that the read row count is 
> no longer < target batch size and return.
> Upon calling into the method again with the same result set, rs.next() will be 
> called again, which will result in successfully consuming row 3.
> *Problem:* row 2 is skipped! 
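The skip can be reproduced without JDBC at all. Below is a hypothetical, dependency-free sketch (class and method names invented; FakeResultSet stands in for a forward-only ResultSet) contrasting the broken loop shape above with one that only advances the cursor after consuming a row:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReadDemo {
    // Minimal stand-in for a forward-only JDBC ResultSet over ints:
    // next() irreversibly advances the cursor, just like rs.next().
    static class FakeResultSet {
        private final int[] rows;
        private int pos = -1;
        FakeResultSet(int... rows) { this.rows = rows; }
        boolean next() { pos++; return pos < rows.length; }
        int get() { return rows[pos]; }
    }

    // Broken shape: the next() call that fails the count check has
    // already consumed a row, which the following batch never sees.
    static List<List<Integer>> readBroken(FakeResultSet rs, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        while (true) {
            List<Integer> batch = new ArrayList<>();
            int readRowCount = 0;
            while (rs.next() && readRowCount < batchSize) {
                batch.add(rs.get());
                readRowCount++;
            }
            if (batch.isEmpty()) break;
            batches.add(batch);
        }
        return batches;
    }

    // Fixed shape: remember the result of next() and only advance
    // the cursor after the current row has been consumed.
    static List<List<Integer>> readFixed(FakeResultSet rs, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        boolean hasRow = rs.next();
        while (hasRow) {
            List<Integer> batch = new ArrayList<>();
            int readRowCount = 0;
            while (hasRow && readRowCount < batchSize) {
                batch.add(rs.get());
                readRowCount++;
                hasRow = rs.next();
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(readBroken(new FakeResultSet(1, 2, 3), 1)); // row 2 lost
        System.out.println(readFixed(new FakeResultSet(1, 2, 3), 1));  // all rows kept
    }
}
```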



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ARROW-15722) [Java] Improve error message for ListVector with wrong number of children

2022-02-22 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-15722.
--
Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12456
[https://github.com/apache/arrow/pull/12456]

> [Java] Improve error message for ListVector with wrong number of children
> -
>
> Key: ARROW-15722
> URL: https://issues.apache.org/jira/browse/ARROW-15722
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a ListVector is made without any children, the error message will say 
> "Lists have only one child. Found: []".
> The wording could be improved a little to let the user know what went wrong.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15746) [Java] Add arrow-flight pom to list of artifacts to deploy

2022-02-21 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-15746:


 Summary: [Java] Add arrow-flight pom to list of artifacts to deploy
 Key: ARROW-15746
 URL: https://issues.apache.org/jira/browse/ARROW-15746
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The arrow-flight pom is currently not being deployed, see 
https://lists.apache.org/thread/fbrgvf30os5h4ox7fk4txrlgdp1g5g4g



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ARROW-15486) [Release][Java] Verify staged maven artifacts

2022-02-17 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-15486:
-
Summary: [Release][Java] Verify staged maven artifacts  (was: 
[Relase][Java] Verify staged maven artifacts)

> [Release][Java] Verify staged maven artifacts
> -
>
> Key: ARROW-15486
> URL: https://issues.apache.org/jira/browse/ARROW-15486
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> We have two tests right now:
> 1. Execute {{mvn test}} from the source tarball's java directory testing the 
> source 
> https://github.com/apache/arrow/blob/master/dev/release/verify-release-candidate.sh#L278
> 2. Verify the checksums and signatures of the uploaded maven artifacts 
> https://github.com/apache/arrow/blob/master/dev/release/verify-release-candidate.sh#L766
> But we don't actually *test* the packages. We should add that to the 
> verification scripts, since 7.0 is going to be the first release shipping the 
> jars with bundled JNI libraries.
> cc [~kou] [~anthonylouis]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15722) [Java] Improve error message for ListVector with wrong number of children

2022-02-17 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-15722:


 Summary: [Java] Improve error message for ListVector with wrong 
number of children
 Key: ARROW-15722
 URL: https://issues.apache.org/jira/browse/ARROW-15722
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


If a ListVector is made without any children, the error message will say "Lists 
have only one child. Found: []".

The wording could be improved a little to let the user know what went wrong.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ARROW-13814) [CI] Nightly integration build with spark master failing to compile spark

2021-10-05 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-13814.
--
Resolution: Fixed

Issue resolved by pull request 11282
[https://github.com/apache/arrow/pull/11282]

> [CI] Nightly integration build with spark master failing to compile spark
> -
>
> Key: ARROW-13814
> URL: https://issues.apache.org/jira/browse/ARROW-13814
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration
>Reporter: Joris Van den Bossche
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The {{test-conda-python-3.8-spark-master}} nightly builds are failing at the 
> moment, see eg https://github.com/ursacomputing/crossbow/runs/3469630203
> There is a long output, which includes (not fully sure if this is the 
> relevant part):
> {code}
> ...
> Error:  ## Exception when compiling 494 sources to 
> /spark/sql/catalyst/target/scala-2.12/classes
> ...
> {code}
> cc [~bryanc]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-14014) FlightClient.ClientStreamListener not notified on error when parsing invalid trailers

2021-10-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-14014.
--
Fix Version/s: 6.0.0
   Resolution: Fixed

Issue resolved by pull request 11311
[https://github.com/apache/arrow/pull/11311]

> FlightClient.ClientStreamListener not notified on error when parsing invalid 
> trailers
> -
>
> Key: ARROW-14014
> URL: https://issues.apache.org/jira/browse/ARROW-14014
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 5.0.0
>Reporter: manudebouc
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When using FlightClient.startPut combined with an AsyncPutListener, we are 
> sometimes blocked forever on FlightClient.ClientStreamListener.getResult() 
> because we do not receive an error notification.
> Due to an intermediate proxy, we sometimes receive 502 or 504 errors and an 
> invalid {{':status'}} header in trailers that cannot be parsed by 
> {{StatusUtils.parseTrailers}} in {{SetStreamObserver.onError(Throwable t)}}, 
> generating an {{IllegalArgumentException}} that prevents our listener from 
> being notified, blocking forever.
> {{SEVERE: Exception while executing runnable 
> io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@de593f34
>  java.lang.IllegalArgumentException: Invalid character ':' in key name 
> ':status'
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:275)
>  at io.grpc.Metadata$Key.validateName(Metadata.java:742)
>  at io.grpc.Metadata$Key.(Metadata.java:750)
>  at io.grpc.Metadata$Key.(Metadata.java:668)
>  at io.grpc.Metadata$AsciiKey.(Metadata.java:959)
>  at io.grpc.Metadata$AsciiKey.(Metadata.java:954)
>  at io.grpc.Metadata$Key.of(Metadata.java:705)
>  at io.grpc.Metadata$Key.of(Metadata.java:701)
>  at 
> org.apache.arrow.flight.grpc.StatusUtils.parseTrailers(StatusUtils.java:164)
>  at 
> org.apache.arrow.flight.grpc.StatusUtils.fromGrpcStatusAndTrailers(StatusUtils.java:128)
>  at 
> org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:152)
>  at 
> org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:176)
>  at 
> org.apache.arrow.flight.FlightClient$SetStreamObserver.onError(FlightClient.java:440)
>  at 
> io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
>  at 
> io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
>  at 
> io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
>  at 
> io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
>  at 
> org.apache.arrow.flight.grpc.ClientInterceptorAdapter$FlightClientCallListener.onClose(ClientInterceptorAdapter.java:117)
>  at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
>  at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
>  at 
> io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
>  at 
> io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
>  at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:823) }}
> It also seems that the same problem exists with FlightClient.getStream() and 
> ClientResponseObserver.onError(Throwable t)
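The failure mode suggests a defensive parse. Below is a hypothetical, dependency-free sketch (class and method names invented; this is not the actual Arrow fix) of tolerating trailer keys that violate gRPC metadata key rules, such as the pseudo-header {{':status'}} a proxy can leak into trailers, instead of letting the exception escape and starve the listener of its onError call:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TolerantTrailerParser {
    // gRPC metadata key names are restricted to lowercase ASCII letters,
    // digits, '_', '-', and '.'; pseudo-headers starting with ':' are invalid.
    static boolean isValidKey(String name) {
        return !name.isEmpty() && name.chars().allMatch(
            c -> (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                 || c == '_' || c == '-' || c == '.');
    }

    // Keep valid keys and silently drop invalid ones (e.g. ":status") so the
    // error path can still deliver the real status to the waiting listener.
    static Map<String, String> parseTrailers(Map<String, String> raw) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (isValidKey(e.getKey())) {
                parsed.put(e.getKey(), e.getValue());
            }
        }
        return parsed;
    }
}
```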



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-14014) FlightClient.ClientStreamListener not notified on error when parsing invalid trailers

2021-10-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-14014:


Assignee: Bryan Cutler

> FlightClient.ClientStreamListener not notified on error when parsing invalid 
> trailers
> -
>
> Key: ARROW-14014
> URL: https://issues.apache.org/jira/browse/ARROW-14014
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 5.0.0
>Reporter: manudebouc
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-14198) [Java] Upgrade Netty and gRPC dependencies

2021-10-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-14198.
--
Fix Version/s: 6.0.0
   Resolution: Fixed

Issue resolved by pull request 11293
[https://github.com/apache/arrow/pull/11293]

> [Java] Upgrade Netty and gRPC dependencies
> --
>
> Key: ARROW-14198
> URL: https://issues.apache.org/jira/browse/ARROW-14198
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current versions in use are quite old and subject to vulnerabilities.
> See https://www.cvedetails.com/cve/CVE-2021-21409/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-14014) FlightClient.ClientStreamListener not notified on error when parsing invalid trailers

2021-10-01 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423442#comment-17423442
 ] 

Bryan Cutler commented on ARROW-14014:
--

I've run into this a couple times [~manudebouc], but can't seem to reproduce it 
right now. Do you have some code you could share that would reproduce this?

> FlightClient.ClientStreamListener not notified on error when parsing invalid 
> trailers
> -
>
> Key: ARROW-14014
> URL: https://issues.apache.org/jira/browse/ARROW-14014
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 5.0.0
>Reporter: manudebouc
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-14198) [Java] Upgrade Netty and gRPC dependencies

2021-10-01 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-14198:


 Summary: [Java] Upgrade Netty and gRPC dependencies
 Key: ARROW-14198
 URL: https://issues.apache.org/jira/browse/ARROW-14198
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Current versions in use are quite old and subject to vulnerabilities.

See https://www.cvedetails.com/cve/CVE-2021-21409/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-9647) [Java] Cannot install arrow-memory 1.0.0 from maven central

2021-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9647.
-
Resolution: Not A Problem

> [Java] Cannot install arrow-memory 1.0.0 from maven central
> ---
>
> Key: ARROW-9647
> URL: https://issues.apache.org/jira/browse/ARROW-9647
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 1.0.0
>Reporter: Marius van Niekerk
>Priority: Major
>
> Seems that the jar is missing from 
> [https://mvnrepository.com/artifact/org.apache.arrow/arrow-memory/1.0.0]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-13814) [CI] Nightly integration build with spark master failing to compile spark

2021-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-13814:


Assignee: Bryan Cutler

> [CI] Nightly integration build with spark master failing to compile spark
> -
>
> Key: ARROW-13814
> URL: https://issues.apache.org/jira/browse/ARROW-13814
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration
>Reporter: Joris Van den Bossche
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{test-conda-python-3.8-spark-master}} nightly builds are failing at the 
> moment, see eg https://github.com/ursacomputing/crossbow/runs/3469630203
> There is a long output, which includes (not fully sure if this is the 
> relevant part):
> {code}
> ...
> Error:  ## Exception when compiling 494 sources to 
> /spark/sql/catalyst/target/scala-2.12/classes
> ...
> {code}
> cc [~bryanc]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13872) [Java] ExtensionTypeVector does not work with RangeEqualsVisitor

2021-09-02 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-13872:


 Summary: [Java] ExtensionTypeVector does not work with 
RangeEqualsVisitor
 Key: ARROW-13872
 URL: https://issues.apache.org/jira/browse/ARROW-13872
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 5.0.0
Reporter: Bryan Cutler
Assignee: Bryan Cutler


When using an ExtensionTypeVector with a RangeEqualsVisitor to compare with 
another extension type vector, it fails because in vector.accept() the 
extension type defers to the underlyingVector, but this is not done for the 
vector initially set in the RangeEqualsVisitor. It ends up either failing 
due to different types or attempting to cast the extension vector to the 
underlying vector type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-13814) [CI] Nightly integration build with spark master failing to compile spark

2021-08-31 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407801#comment-17407801
 ] 

Bryan Cutler commented on ARROW-13814:
--

Not sure why compilation would fail like this, but I'll look into it.

> [CI] Nightly integration build with spark master failing to compile spark
> -
>
> Key: ARROW-13814
> URL: https://issues.apache.org/jira/browse/ARROW-13814
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 6.0.0
>
>
> The {{test-conda-python-3.8-spark-master}} nightly builds are failing at the 
> moment, see eg https://github.com/ursacomputing/crossbow/runs/3469630203
> There is a long output, which includes (not fully sure if this is the 
> relevant part):
> {code}
> ...
> Error:  ## Exception when compiling 494 sources to 
> /spark/sql/catalyst/target/scala-2.12/classes
> ...
> {code}
> cc [~bryanc]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-12679) [Java] JDBC adapter does not preserve SQL-nullability

2021-07-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-12679:


Assignee: Joris Peeters  (was: Bryan Cutler)

> [Java] JDBC adapter does not preserve SQL-nullability
> -
>
> Key: ARROW-12679
> URL: https://issues.apache.org/jira/browse/ARROW-12679
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Joris Peeters
>Assignee: Joris Peeters
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When using the JDBC adapter, the schemas of the VectorSchemaRoots in the 
> ArrowVectorIterator mark all columns as nullable, regardless of whether 
> they are nullable in SQL.
> This should be fixed so that the schema marks as nullable only those 
> columns that are also nullable in SQL.
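The intended mapping can be sketched with the JDK's own java.sql constants; the method name isColumnNullable here is illustrative, not the adapter's actual API:

```java
import java.sql.ResultSetMetaData;

// Hypothetical helper: map JDBC column nullability onto the boolean an Arrow
// Field would carry. columnNullableUnknown is treated as nullable, the safe
// default when the driver cannot say.
public class JdbcNullability {
    static boolean isColumnNullable(int jdbcNullability) {
        return jdbcNullability != ResultSetMetaData.columnNoNulls;
    }

    public static void main(String[] args) {
        System.out.println(isColumnNullable(ResultSetMetaData.columnNoNulls));         // false
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullable));        // true
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullableUnknown)); // true
    }
}
```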





[jira] [Assigned] (ARROW-12679) [Java] JDBC adapter does not preserve SQL-nullability

2021-07-28 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-12679:


Assignee: Bryan Cutler  (was: Joris Peeters)

> [Java] JDBC adapter does not preserve SQL-nullability
> -
>
> Key: ARROW-12679
> URL: https://issues.apache.org/jira/browse/ARROW-12679
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Joris Peeters
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When using the JDBC adapter, the schemas of the VectorSchemaRoots in the 
> ArrowVectorIterator mark all columns as nullable, regardless of whether 
> they are nullable in SQL.
> This should be fixed so that the schema marks as nullable only those 
> columns that are also nullable in SQL.





[jira] [Resolved] (ARROW-13044) [Java] Union vectors should extend ValueVector

2021-06-14 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-13044.
--
Fix Version/s: 5.0.0
   Resolution: Fixed

Issue resolved by pull request 10513
[https://github.com/apache/arrow/pull/10513]

> [Java] Union vectors should extend ValueVector
> --
>
> Key: ARROW-13044
> URL: https://issues.apache.org/jira/browse/ARROW-13044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 4.0.1
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I was going to try using a DenseUnionVector as the underlying vector of an 
> extension type but it's not currently possible because ExtensionTypeVector 
> has a type constraint for the underlying storage to extend BaseValueVector 
> and the union vectors do not extend this class.
> It should be possible for UnionVector and DenseUnionVector to extend 
> AbstractContainerVector, which is a subclass of ValueVector, then relax the 
> type constraint for an ExtensionTypeVector to use the ValueVector interface.
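The proposed relaxation can be sketched with stand-in types (none of these are the real Arrow classes): with a bound of the ValueVector interface instead of the BaseValueVector abstract class, a union-style vector that implements the interface but does not extend the base class can serve as extension-type storage.

```java
// Stand-in for the org.apache.arrow.vector.ValueVector interface.
interface ValueVector { int getValueCount(); }

// Stand-in for the abstract base class some vectors extend.
abstract class BaseValueVector implements ValueVector { }

// Mirrors UnionVector/DenseUnionVector: implements the interface but does
// NOT extend BaseValueVector.
class DenseUnionVectorLike implements ValueVector {
    public int getValueCount() { return 0; }
}

// Before: <T extends BaseValueVector> would reject DenseUnionVectorLike.
// After relaxing the bound to the interface, it is accepted.
class ExtensionTypeVector<T extends ValueVector> {
    private final T storage;
    ExtensionTypeVector(T storage) { this.storage = storage; }
    T getUnderlyingVector() { return storage; }
}

class Demo {
    public static void main(String[] args) {
        ExtensionTypeVector<DenseUnionVectorLike> v =
            new ExtensionTypeVector<>(new DenseUnionVectorLike());
        System.out.println(v.getUnderlyingVector().getValueCount()); // prints 0
    }
}
```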





[jira] [Created] (ARROW-13076) [Java] Enable ExtensionType to use StructVector and UnionVector for underlying storage

2021-06-14 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-13076:


 Summary: [Java] Enable ExtensionType to use StructVector and 
UnionVector for underlying storage
 Key: ARROW-13076
 URL: https://issues.apache.org/jira/browse/ARROW-13076
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Currently, an ExtensionTypeVector has a type constraint for the underlying 
storage to extend BaseValueVector. StructVector, UnionVector and 
DenseUnionVector do not extend this base class.

After ARROW-13044, the union vectors will implement the ValueVector interface, and 
the extension vector's type constraint can be relaxed to this interface to allow the 
above vector types to be used.





[jira] [Updated] (ARROW-13044) [Java] Union vectors should extend ValueVector

2021-06-14 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-13044:
-
Description: 
I was going to try using a DenseUnionVector as the underlying vector of an 
extension type but it's not currently possible because ExtensionTypeVector has 
a type constraint for the underlying storage to extend BaseValueVector and the 
union vectors do not extend this class.

It should be possible for UnionVector and DenseUnionVector to extend 
AbstractContainerVector, which is a subclass of ValueVector, then relax the 
type constraint for an ExtensionTypeVector to use the ValueVector interface.

  was:
I was going to try using a DenseUnionVector as the underlying vector of an 
extension type but it's not currently possible because ExtensionTypeVector has 
a type constraint for the underlying storage to extend BaseValueVector and the 
union vectors do not extend this class.

It should be possible for UnionVector and DenseUnionVector to extend 
AbstractContainerVector, which is a subclass of BaseValueVector.


> [Java] Union vectors should extend ValueVector
> --
>
> Key: ARROW-13044
> URL: https://issues.apache.org/jira/browse/ARROW-13044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 4.0.1
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I was going to try using a DenseUnionVector as the underlying vector of an 
> extension type but it's not currently possible because ExtensionTypeVector 
> has a type constraint for the underlying storage to extend BaseValueVector 
> and the union vectors do not extend this class.
> It should be possible for UnionVector and DenseUnionVector to extend 
> AbstractContainerVector, which is a subclass of ValueVector, then relax the 
> type constraint for an ExtensionTypeVector to use the ValueVector interface.





[jira] [Updated] (ARROW-13044) [Java] Union vectors should extend ValueVector

2021-06-14 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-13044:
-
Summary: [Java] Union vectors should extend ValueVector  (was: [Java] Union 
vectors should extend BaseValueVector)

> [Java] Union vectors should extend ValueVector
> --
>
> Key: ARROW-13044
> URL: https://issues.apache.org/jira/browse/ARROW-13044
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 4.0.1
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I was going to try using a DenseUnionVector as the underlying vector of an 
> extension type but it's not currently possible because ExtensionTypeVector 
> has a type constraint for the underlying storage to extend BaseValueVector 
> and the union vectors do not extend this class.
> It should be possible for UnionVector and DenseUnionVector to extend 
> AbstractContainerVector, which is a subclass of BaseValueVector.





[jira] [Created] (ARROW-13044) [Java] Union vectors should extend BaseValueVector

2021-06-10 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-13044:


 Summary: [Java] Union vectors should extend BaseValueVector
 Key: ARROW-13044
 URL: https://issues.apache.org/jira/browse/ARROW-13044
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 4.0.1
Reporter: Bryan Cutler
Assignee: Bryan Cutler


I was going to try using a DenseUnionVector as the underlying vector of an 
extension type but it's not currently possible because ExtensionTypeVector has 
a type constraint for the underlying storage to extend BaseValueVector and the 
union vectors do not extend this class.

It should be possible for UnionVector and DenseUnionVector to extend 
AbstractContainerVector, which is a subclass of BaseValueVector.





[jira] [Commented] (ARROW-8666) [Java] DenseUnionVector has no way to set offset/validity directly

2021-06-10 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361229#comment-17361229
 ] 

Bryan Cutler commented on ARROW-8666:
-

[~lidavidm] can this be closed then, or are there further API improvements 
needed?

> [Java] DenseUnionVector has no way to set offset/validity directly
> --
>
> Key: ARROW-8666
> URL: https://issues.apache.org/jira/browse/ARROW-8666
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.17.0
>Reporter: David Li
>Priority: Major
>
> You can set the type ID manually, but you cannot set the offset or validity 
> directly. Ideally, we'd have an API like Python that lets us build it 
> directly from constituent vectors and the offsets/type IDs.





[jira] [Resolved] (ARROW-11223) [Java] BaseVariableWidthVector/BaseLargeVariableWidthVector setNull and getBufferSizeFor is buggy

2021-02-22 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-11223.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9187
[https://github.com/apache/arrow/pull/9187]

> [Java] BaseVariableWidthVector/BaseLargeVariableWidthVector setNull and 
> getBufferSizeFor is buggy
> -
>
> Key: ARROW-11223
> URL: https://issues.apache.org/jira/browse/ARROW-11223
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> We may get the error java.lang.IndexOutOfBoundsException: index: 15880, length: 
> 4 (expected: range(0, 15880)).
> Tested on Arrow 2.0.0.
> Reproduce code in scala:
> {code}
> import org.apache.arrow.vector.VarCharVector
> import org.apache.arrow.memory.RootAllocator
> val rootAllocator = new RootAllocator(Long.MaxValue)
> val v1 = new VarCharVector("var1", rootAllocator)
> v1.allocateNew()
> val valueCount = 3970 // any number >= 3970 triggers a similar error
> for (idx <- 0 until valueCount) {
>   v1.setNull(idx)
> }
> v1.getBufferSizeFor(valueCount) // fails with java.lang.IndexOutOfBoundsException: 
> // index: 15880, length: 4 (expected: range(0, 15880))
> {code}
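The arithmetic behind the reported index is worth spelling out. This is an illustrative model of the offset-buffer sizing implied by the error message, not Arrow code: reading the end offset for valueCount elements touches byte index valueCount * 4, which requires (valueCount + 1) * 4 bytes of offset buffer.

```java
// Illustrative model of the out-of-bounds read reported above (not Arrow code).
public class OffsetMath {
    // Byte index of the 4-byte offset that getBufferSizeFor(valueCount) reads.
    static int offsetReadIndex(int valueCount) { return valueCount * 4; }

    // Bytes the offset buffer must hold: one 4-byte offset per value, plus one.
    static int requiredOffsetBytes(int valueCount) { return (valueCount + 1) * 4; }

    public static void main(String[] args) {
        int valueCount = 3970;
        // The read lands at index 15880 in a buffer of only 15880 bytes,
        // matching the reported "index: 15880 ... range(0, 15880)".
        System.out.println(offsetReadIndex(valueCount));     // 15880
        System.out.println(requiredOffsetBytes(valueCount)); // 15884
    }
}
```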





[jira] [Updated] (ARROW-11739) [Java] Add API for getBufferSizeFor() with density to BaseVariableWidthVector

2021-02-22 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-11739:
-
Description: 
Following the discussion on https://github.com/apache/arrow/pull/9187.

Proposed API in BaseVariableWidthVector.java:

{code:java}
/**
   * Get the potential buffer size for a particular number of records and 
density.
   * @param valueCount desired number of elements in the vector
   * @param density average number of bytes per variable width element
   * @return estimated size of underlying buffers if the vector holds
   * a given number of elements
   */
public int getBufferSizeFor(final int valueCount, double density)
{code}

The current `getBufferSizeFor(int valueCount)` for BaseVariableWidthVector 
requires that the validity and offset vectors have already been allocated for at 
least the given `valueCount`. If the aim of this method is to estimate memory 
usage for a value count, that is not very useful, because it can only report 
sizes for value counts up to the current allocation.

A better approach for approximating memory usage is to include a density 
argument along with the value count. The buffer estimate then does not require 
the validity and offset vectors to have any allocation. This is also in line with 
`setInitialCapacity(int valueCount, double density)`

NOTE: this API should also be added to BaseLargeVariableWidthVector and 
possibly BaseRepeatedValueVector(Large) as well.

  was:
Following the discussion on https://github.com/apache/arrow/pull/9187.

The current `getBufferSize(int valueCount)` for BaseVariableWidthVector 
requires that validity and offset vectors have already been allocated for at 
least the given `valueCount`. If the aim of this method is to estimate the 


> [Java] Add API for getBufferSizeFor() with density to BaseVariableWidthVector
> -
>
> Key: ARROW-11739
> URL: https://issues.apache.org/jira/browse/ARROW-11739
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
>
> Following the discussion on https://github.com/apache/arrow/pull/9187.
> Proposed API in BaseVariableWidthVector.java:
> {code:java}
> /**
>* Get the potential buffer size for a particular number of records and 
> density.
>* @param valueCount desired number of elements in the vector
>* @param density average number of bytes per variable width element
>* @return estimated size of underlying buffers if the vector holds
>* a given number of elements
>*/
> public int getBufferSizeFor(final int valueCount, double density)
> {code}
> The current `getBufferSizeFor(int valueCount)` for BaseVariableWidthVector 
> requires that the validity and offset vectors have already been allocated for at 
> least the given `valueCount`. If the aim of this method is to estimate memory 
> usage for a value count, that is not very useful, because it can only report 
> sizes for value counts up to the current allocation.
> A better approach for approximating memory usage is to include a density 
> argument along with the value count. The buffer estimate then does not require 
> the validity and offset vectors to have any allocation. This is also in line 
> with `setInitialCapacity(int valueCount, double density)`
> NOTE: this API should also be added to BaseLargeVariableWidthVector and 
> possibly BaseRepeatedValueVector(Large) as well.
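For illustration, a density-based estimate could combine the three buffers like this. The formula (validity bitmap + 4-byte offsets + density-sized data) is an assumption based on the variable-width vector layout, not the eventual implementation:

```java
// Hypothetical estimate for the proposed getBufferSizeFor(valueCount, density).
public class DensityEstimate {
    static long estimatedBufferSize(int valueCount, double density) {
        long validityBytes = (valueCount + 7) / 8;       // 1 validity bit per value
        long offsetBytes = (long) (valueCount + 1) * 4;  // 4-byte offsets, plus one
        long dataBytes = (long) Math.ceil(valueCount * density); // avg bytes/value
        return validityBytes + offsetBytes + dataBytes;
    }

    public static void main(String[] args) {
        // 1000 elements averaging 8 bytes each:
        // 125 (validity) + 4004 (offsets) + 8000 (data) = 12129 bytes.
        System.out.println(estimatedBufferSize(1000, 8.0)); // 12129
    }
}
```

The key property is that nothing here touches an allocated buffer, so the estimate works for any value count.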





[jira] [Updated] (ARROW-11739) [Java] Add API for getBufferSize() with density to BaseVariableWidthVector

2021-02-22 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-11739:
-
Description: 
Following the discussion on https://github.com/apache/arrow/pull/9187.

The current `getBufferSize(int valueCount)` for BaseVariableWidthVector 
requires that validity and offset vectors have already been allocated for at 
least the given `valueCount`. If the aim of this method is to estimate the 

> [Java] Add API for getBufferSize() with density to BaseVariableWidthVector
> --
>
> Key: ARROW-11739
> URL: https://issues.apache.org/jira/browse/ARROW-11739
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
>
> Following the discussion on https://github.com/apache/arrow/pull/9187.
> The current `getBufferSize(int valueCount)` for BaseVariableWidthVector 
> requires that validity and offset vectors have already been allocated for at 
> least the given `valueCount`. If the aim of this method is to estimate the 





[jira] [Updated] (ARROW-11739) [Java] Add API for getBufferSizeFor() with density to BaseVariableWidthVector

2021-02-22 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-11739:
-
Summary: [Java] Add API for getBufferSizeFor() with density to 
BaseVariableWidthVector  (was: [Java] Add API for getBufferSize() with density 
to BaseVariableWidthVector)

> [Java] Add API for getBufferSizeFor() with density to BaseVariableWidthVector
> -
>
> Key: ARROW-11739
> URL: https://issues.apache.org/jira/browse/ARROW-11739
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
>
> Following the discussion on https://github.com/apache/arrow/pull/9187.
> The current `getBufferSize(int valueCount)` for BaseVariableWidthVector 
> requires that validity and offset vectors have already been allocated for at 
> least the given `valueCount`. If the aim of this method is to estimate the 





[jira] [Created] (ARROW-11739) [Java] Add API for getBufferSize() with density to BaseVariableWidthVector

2021-02-22 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-11739:


 Summary: [Java] Add API for getBufferSize() with density to 
BaseVariableWidthVector
 Key: ARROW-11739
 URL: https://issues.apache.org/jira/browse/ARROW-11739
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler








[jira] [Created] (ARROW-11382) [Java] NullVector field name can't be set

2021-01-25 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-11382:


 Summary: [Java] NullVector field name can't be set
 Key: ARROW-11382
 URL: https://issues.apache.org/jira/browse/ARROW-11382
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


Currently, the Java NullVector has a default field name fixed to 
DATA_VECTOR_NAME, which is "$data$". The user should be able to change this, 
probably via an alternate constructor that accepts a name.
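A sketch of the proposed constructor pattern, using a stand-in class rather than the real org.apache.arrow.vector.NullVector:

```java
// Stand-in illustrating the proposed overloaded constructor.
public class NullVectorSketch {
    private static final String DATA_VECTOR_NAME = "$data$";
    private final String name;

    // Current behavior: name is fixed to "$data$".
    public NullVectorSketch() { this(DATA_VECTOR_NAME); }

    // Proposed: let the caller choose the field name.
    public NullVectorSketch(String name) { this.name = name; }

    public String getName() { return name; }

    public static void main(String[] args) {
        System.out.println(new NullVectorSketch().getName());         // $data$
        System.out.println(new NullVectorSketch("my_col").getName()); // my_col
    }
}
```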





[jira] [Comment Edited] (ARROW-3850) [Python] Support MapType and StructType for enhanced PySpark integration

2020-11-20 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236400#comment-17236400
 ] 

Bryan Cutler edited comment on ARROW-3850 at 11/20/20, 7:47 PM:


MapType has been added to PySpark when using PyArrow 2.0.0+, I think we can 
close this now.


was (Author: bryanc):
MapType has been added to PySpark, I think we can close this now.

> [Python] Support MapType and StructType for enhanced PySpark integration
> 
>
> Key: ARROW-3850
> URL: https://issues.apache.org/jira/browse/ARROW-3850
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Florian Wilhelm
>Priority: Major
> Fix For: 3.0.0
>
>
> It would be great to support MapType and (nested) StructType in Arrow so that 
> PySpark can make use of it.
>  
> Quite often, as in my use case, complex types are also stored in Hive table 
> cells. Currently it's not possible to use the new 
> {{[pandas_udf|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode#pyspark.sql.functions.pandas_udf]}}
>  decorator which internally uses Arrow to generate a UDF for columns with 
> complex types.





[jira] [Resolved] (ARROW-3850) [Python] Support MapType and StructType for enhanced PySpark integration

2020-11-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-3850.
-
Resolution: Fixed

MapType has been added to PySpark, I think we can close this now.

> [Python] Support MapType and StructType for enhanced PySpark integration
> 
>
> Key: ARROW-3850
> URL: https://issues.apache.org/jira/browse/ARROW-3850
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Florian Wilhelm
>Priority: Major
> Fix For: 3.0.0
>
>
> It would be great to support MapType and (nested) StructType in Arrow so that 
> PySpark can make use of it.
>  
> Quite often, as in my use case, complex types are also stored in Hive table 
> cells. Currently it's not possible to use the new 
> {{[pandas_udf|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode#pyspark.sql.functions.pandas_udf]}}
>  decorator which internally uses Arrow to generate a UDF for columns with 
> complex types.





[jira] [Resolved] (ARROW-9862) Throw an exception in UnsafeDirectLittleEndian on Big-Endian platform

2020-11-13 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9862.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 8057
[https://github.com/apache/arrow/pull/8057]

> Throw an exception in UnsafeDirectLittleEndian on Big-Endian platform
> -
>
> Key: ARROW-9862
> URL: https://issues.apache.org/jira/browse/ARROW-9862
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The current code intentionally throws an exception on a big-endian platform, 
> even though this class supports native endianness for primitive data types (up 
> to 64-bit).
> {code:java}
> throw new IllegalStateException("Arrow only runs on LittleEndian systems.");
> {code}





[jira] [Created] (ARROW-10512) [Python] Arrow to Pandas conversion promotes child array to float for NULL values

2020-11-06 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-10512:


 Summary: [Python] Arrow to Pandas conversion promotes child array 
to float for NULL values
 Key: ARROW-10512
 URL: https://issues.apache.org/jira/browse/ARROW-10512
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler


When converting a nested Arrow array to Pandas, if a child array is an integer 
type with NULL values, it gets promoted to floating point and the NULL values are 
replaced with NaNs. Since the Pandas conversion for these types results in 
Python objects, it is not necessary to use NaN; `None` values could be 
inserted instead. This is the case for ListType, MapType, StructType, etc.

{code}
In [4]: s = pd.Series([[1, 2, 3], [4, 5, None]])

In [5]: arr = pa.Array.from_pandas(s)

In [6]: arr.type
Out[6]: ListType(list<item: int64>)

In [7]: arr.to_pandas()
Out[7]: 
0[1.0, 2.0, 3.0]
1[4.0, 5.0, nan]
dtype: object {code}





[jira] [Resolved] (ARROW-9861) [Java] Failed Arrow Vector on big-endian platform

2020-11-04 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9861.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 8056
[https://github.com/apache/arrow/pull/8056]

> [Java] Failed Arrow Vector on big-endian platform
> -
>
> Key: ARROW-9861
> URL: https://issues.apache.org/jira/browse/ARROW-9861
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The following test failure occurs on a big-endian platform
> {code:java}
> mvn -B -Drat.skip=true 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn
>  -Dflatc.download.skip=true -rf :arrow-vector test
> ...
> [INFO] Running org.apache.arrow.vector.TestDecimalVector
> [ERROR] Tests run: 9, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 0.008 
> s <<< FAILURE! - in org.apache.arrow.vector.TestDecimalVector
> [ERROR] setUsingArrowBufOfLEInts  Time elapsed: 0.001 s  <<< FAILURE!
> java.lang.AssertionError: expected:<705.32> but was:<-20791293.44>
>   at 
> org.apache.arrow.vector.TestDecimalVector.setUsingArrowBufOfLEInts(TestDecimalVector.java:295)
> [ERROR] setUsingArrowLongLEBytes  Time elapsed: 0.001 s  <<< FAILURE!
> java.lang.AssertionError: expected:<9223372036854775807> but was:<-129>
>   at 
> org.apache.arrow.vector.TestDecimalVector.setUsingArrowLongLEBytes(TestDecimalVector.java:322)
> [ERROR] testLongReadWrite  Time elapsed: 0.001 s  <<< FAILURE!
> java.lang.AssertionError: expected:<-2> but was:<-72057594037927937>
>   at 
> org.apache.arrow.vector.TestDecimalVector.testLongReadWrite(TestDecimalVector.java:176)
> ...
> [ERROR] Failures: 
> [ERROR]   TestDecimalVector.setUsingArrowBufOfLEInts:295 expected:<705.32> 
> but was:<-20791293.44>
> [ERROR]   TestDecimalVector.setUsingArrowLongLEBytes:322 
> expected:<9223372036854775807> but was:<-129>
> [ERROR]   TestDecimalVector.testLongReadWrite:176 expected:<-2> but 
> was:<-72057594037927937>
> ...
> {code}





[jira] [Resolved] (ARROW-10478) [Dev][Release] Correct Java versions to 3.0.0-SNAPSHOT

2020-11-03 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-10478.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 8577
[https://github.com/apache/arrow/pull/8577]

> [Dev][Release] Correct Java versions to 3.0.0-SNAPSHOT
> --
>
> Key: ARROW-10478
> URL: https://issues.apache.org/jira/browse/ARROW-10478
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: James Duong
>Assignee: James Duong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After the 2.0.0 release, other languages were bumped to 3.0.0-SNAPSHOT in 
> master but Java pom files weren't. Update them to 3.0.0-SNAPSHOT.





[jira] [Resolved] (ARROW-9709) [Java] Test cases in arrow-vector assume little-endian platform

2020-11-02 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9709.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 7944
[https://github.com/apache/arrow/pull/7944]

> [Java] Test cases in arrow-vector assume little-endian platform
> ---
>
> Key: ARROW-9709
> URL: https://issues.apache.org/jira/browse/ARROW-9709
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{MessageSerializerTest.testWriteMessageBufferAligned}}, 
> {{TestArrowReaderWriter.testChannelReadFully}} and 
> {{TestArrowReaderWriter.testChannelReadFullyEos}} assume only a little-endian 
> platform.
> Two tests in {{TestArrowReaderWriter}} fail on a big-endian platform.





[jira] [Updated] (ARROW-9709) [Java] Test cases in arrow-vector assume little-endian platform

2020-11-02 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9709:

Component/s: Java

> [Java] Test cases in arrow-vector assume little-endian platform
> ---
>
> Key: ARROW-9709
> URL: https://issues.apache.org/jira/browse/ARROW-9709
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{MessageSerializerTest.testWriteMessageBufferAligned}}, 
> {{TestArrowReaderWriter.testChannelReadFully}} and 
> {{TestArrowReaderWriter.testChannelReadFullyEos}} assume only a little-endian 
> platform.
> Two tests in {{TestArrowReaderWriter}} fail on a big-endian platform.





[jira] [Created] (ARROW-10457) [CI] Fix Spark branch-3.0 integration tests

2020-11-01 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-10457:


 Summary: [CI] Fix Spark branch-3.0 integration tests
 Key: ARROW-10457
 URL: https://issues.apache.org/jira/browse/ARROW-10457
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Bryan Cutler


The Spark branch-3.0 build is currently failing because the branch has not been 
updated or patched to use the latest Arrow Java, see 
https://github.com/ursa-labs/crossbow/actions?query=branch:actions-681-github-test-conda-python-3.7-spark-branch-3.0.
Spark branch-3.0 has already been released and only receives bug fixes. Rather 
than patching the Spark build, we should stop rebuilding Spark with the latest 
Arrow Java and instead only test against the latest pyarrow. This should work, 
but might also need a minor Python patch.





[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant shape, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223839#comment-17223839
 ] 

Bryan Cutler commented on ARROW-1614:
-

Great, [~rokm] , looking forward to it. I also updated the title and 
description to reflect this is for tensors with the same fixed shape.

> [C++] Add a Tensor logical value type with constant shape, implemented using 
> ExtensionType
> --
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> shape/dimensions. These would be stored as a binary value, plus some metadata 
> to store type and shape/strides.





[jira] [Updated] (ARROW-1614) [C++] Add a Tensor logical value type with constant shape, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-1614:

Description: In an Arrow table, we would like to add support for a column 
that has values cells each containing a tensor value, with all tensors having 
the same shape/dimensions. These would be stored as a binary value, plus some 
metadata to store type and shape/strides.  (was: In an Arrow table, we would 
like to add support for a column that has values cells each containing a tensor 
value, with all tensors having the same dimensions. These would be stored as a 
binary value, plus some metadata to store type and shape/strides.)

> [C++] Add a Tensor logical value type with constant shape, implemented using 
> ExtensionType
> --
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> shape/dimensions. These would be stored as a binary value, plus some metadata 
> to store type and shape/strides.





[jira] [Updated] (ARROW-1614) [C++] Add a Tensor logical value type with constant shape, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-1614:

Summary: [C++] Add a Tensor logical value type with constant shape, 
implemented using ExtensionType  (was: [C++] Add a Tensor logical value type 
with constant dimensions, implemented using ExtensionType)

> [C++] Add a Tensor logical value type with constant shape, implemented using 
> ExtensionType
> --
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has value 
> cells each containing a tensor value, with all tensors having the same 
> dimensions. These would be stored as a binary value, plus some metadata to 
> store type and shape/strides.





[jira] [Commented] (ARROW-8714) [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223832#comment-17223832
 ] 

Bryan Cutler commented on ARROW-8714:
-

Sorry, I mistyped dimension when I meant shape above (fixed now). I was trying 
to think of a way to use a single extension type for constant _and_ variable 
shapes, with a fixed number of dimensions. There is a problem with my 
suggestion, though: the arrays couldn't be sliced without recomputing the 
shapes, and I don't see a way around that. So it seems better to stay with 2 
different extension types, this one for variable shapes and ARROW-1614 for 
constant shapes.

> [C++] Add a Tensor logical value type with varying dimensions, implemented 
> using ExtensionType
> --
>
> Key: ARROW-8714
> URL: https://issues.apache.org/jira/browse/ARROW-8714
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Christian Hudon
>Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of 
> a different shape (e.g. images of different sizes), but of the same underlying 
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the 
> format. 
> I don't see needing each row being a tensor with a different number of 
> dimensions, so if the implementation for that falls out easily of the use 
> case with each row in the table having a tensor with the same number of 
> dimensions, great. If it adds a lot of complexity, that case would be 
> postponed.





[jira] [Comment Edited] (ARROW-8714) [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222490#comment-17222490
 ] 

Bryan Cutler edited comment on ARROW-8714 at 10/30/20, 6:19 PM:


+1 on the proposal of having a list array for the data (of the same type as the 
tensor) and a second array for the shape. For the shape, a list array of ints 
would work, but it could also be possible to modify Tensor.fbs slightly to have 
a TensorShape message. That might have some benefit in keeping the size down for 
lots of small tensors, but I'm not sure if it's worth the added complexity.

I also had another thought: if the shape for each tensor added an additional 
outer dimension to represent how many records are in each tensor, that would 
allow us to use a single tensor extension type for both variable and constant 
shapes. For example, say you have 10 tensors of shape (2, 3) stacked in a 
single ndarray of (10, 2, 3); then the shape array would have a single entry 
{{[(10, 2, 3)]}}. If you have 10 tensors of varying shapes, then each one will 
have a 1 added to the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 
3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this 
case, but it would also allow combining smaller batches; as an example, 10 
tensors where 5 have the same shape would give you {{[(5, 2, 3), (5, 4, 6)]}}. 
What do you think of this [~chrish42] and [~jorisvandenbossche] ?


was (Author: bryanc):
+1 on the proposal of having a list array for the data (of same type as the 
tensor) and second array for the shape. For the shape, a list array of ints 
would work but it could also be possible to modify Tensor.fbs slightly to have 
a TensorShape message. That might have some benefit to keep the size down for 
lots of small tensors, but not sure if it's worth the added complexity.

I also had another thought, if the shape for each tensor added an additional 
outer dimension to represent how many records are in each tensor, that would 
allow us to use a single tensor extension type for both variable and constant 
shapes. For example, say you have 10 tensors of shape (2, 3) stacked in a 
single ndarray of (10, 2, 3), then the shape array would have a single entry 
{{[(10, 2, 3)]}}, if you have 10 tensors of varying shapes, then each one will 
have a 1 added to the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 
3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this 
case, but this would also allow to combine smaller batches, say 10 tensors 
where 5 are same dims would give you {{[(5, 2, 3), (5, 4, 6)]}}. What do you 
think of this [~chrish42] and [~jorisvandenbossche] ?

> [C++] Add a Tensor logical value type with varying dimensions, implemented 
> using ExtensionType
> --
>
> Key: ARROW-8714
> URL: https://issues.apache.org/jira/browse/ARROW-8714
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Christian Hudon
>Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of 
> a different shape (e.g. images of different sizes), but of the same underlying 
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the 
> format. 
> I don't see needing each row being a tensor with a different number of 
> dimensions, so if the implementation for that falls out easily of the use 
> case with each row in the table having a tensor with the same number of 
> dimensions, great. If it adds a lot of complexity, that case would be 
> postponed.





[jira] [Comment Edited] (ARROW-8714) [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

2020-10-30 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222490#comment-17222490
 ] 

Bryan Cutler edited comment on ARROW-8714 at 10/30/20, 6:13 PM:


+1 on the proposal of having a list array for the data (of same type as the 
tensor) and second array for the shape. For the shape, a list array of ints 
would work but it could also be possible to modify Tensor.fbs slightly to have 
a TensorShape message. That might have some benefit to keep the size down for 
lots of small tensors, but not sure if it's worth the added complexity.

I also had another thought, if the shape for each tensor added an additional 
outer dimension to represent how many records are in each tensor, that would 
allow us to use a single tensor extension type for both variable and constant 
shapes. For example, say you have 10 tensors of shape (2, 3) stacked in a 
single ndarray of (10, 2, 3), then the shape array would have a single entry 
{{[(10, 2, 3)]}}, if you have 10 tensors of varying shapes, then each one will 
have a 1 added to the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 
3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this 
case, but this would also allow to combine smaller batches, say 10 tensors 
where 5 are same dims would give you {{[(5, 2, 3), (5, 4, 6)]}}. What do you 
think of this [~chrish42] and [~jorisvandenbossche] ?


was (Author: bryanc):
+1 on the proposal of having a list array for the data (of same type as the 
tensor) and second array for the shape. For the shape, a list array of ints 
would work but it could also be possible to modify Tensor.fbs slightly to have 
a TensorShape message. That might have some benefit to keep the size down for 
lots of small tensors, but not sure if it's worth the added complexity.

I also had another thought, if the shape for each tensor added an additional 
outer dimension to represent how many records are in each tensor, that would 
allow us to use a single tensor extension type for both variable and constant 
dimensions. For example, say you have 10 tensors of shape (2, 3) stacked in a 
single ndarray of (10, 2, 3), then the shape array would have a single entry 
{{[(10, 2, 3)]}}, if you have 10 tensors of varying shapes, then each one will 
have a 1 added to the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 
3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this 
case, but this would also allow to combine smaller batches, say 10 tensors 
where 5 are same dims would give you {{[(5, 2, 3), (5, 4, 6)]}}. What do you 
think of this [~chrish42] and [~jorisvandenbossche] ?

> [C++] Add a Tensor logical value type with varying dimensions, implemented 
> using ExtensionType
> --
>
> Key: ARROW-8714
> URL: https://issues.apache.org/jira/browse/ARROW-8714
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Christian Hudon
>Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of 
> a different shape (e.g. images of different sizes), but of the same underlying 
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the 
> format. 
> I don't see needing each row being a tensor with a different number of 
> dimensions, so if the implementation for that falls out easily of the use 
> case with each row in the table having a tensor with the same number of 
> dimensions, great. If it adds a lot of complexity, that case would be 
> postponed.





[jira] [Comment Edited] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-28 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222496#comment-17222496
 ] 

Bryan Cutler edited comment on ARROW-1614 at 10/28/20, 9:29 PM:


[~rokm] I had an idea for making a single extension type that would handle 
constant and varying dimensions, see comment 
[here|https://issues.apache.org/jira/browse/ARROW-8714?focusedCommentId=17222490=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17222490]
 and let me know what you think.


was (Author: bryanc):
[~rokm] I had an idea for making a single extension type that would handle 
constant and varying dimensions, see comment here and let me know what you 
think.

> [C++] Add a Tensor logical value type with constant dimensions, implemented 
> using ExtensionType
> ---
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> dimensions. These would be stored as a binary value, plus some metadata to 
> store type and shape/strides.





[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-28 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222496#comment-17222496
 ] 

Bryan Cutler commented on ARROW-1614:
-

[~rokm] I had an idea for making a single extension type that would handle 
constant and varying dimensions, see comment here and let me know what you 
think.

> [C++] Add a Tensor logical value type with constant dimensions, implemented 
> using ExtensionType
> ---
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> dimensions. These would be stored as a binary value, plus some metadata to 
> store type and shape/strides.





[jira] [Commented] (ARROW-8714) [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

2020-10-28 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222490#comment-17222490
 ] 

Bryan Cutler commented on ARROW-8714:
-

+1 on the proposal of having a list array for the data (of same type as the 
tensor) and second array for the shape. For the shape, a list array of ints 
would work but it could also be possible to modify Tensor.fbs slightly to have 
a TensorShape message. That might have some benefit to keep the size down for 
lots of small tensors, but not sure if it's worth the added complexity.

I also had another thought, if the shape for each tensor added an additional 
outer dimension to represent how many records are in each tensor, that would 
allow us to use a single tensor extension type for both variable and constant 
dimensions. For example, say you have 10 tensors of shape (2, 3) stacked in a 
single ndarray of (10, 2, 3), then the shape array would have a single entry 
{{[(10, 2, 3)]}}, if you have 10 tensors of varying shapes, then each one will 
have a 1 added to the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 
3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this 
case, but this would also allow to combine smaller batches, say 10 tensors 
where 5 are same dims would give you {{[(5, 2, 3), (5, 4, 6)]}}. What do you 
think of this [~chrish42] and [~jorisvandenbossche] ?

> [C++] Add a Tensor logical value type with varying dimensions, implemented 
> using ExtensionType
> --
>
> Key: ARROW-8714
> URL: https://issues.apache.org/jira/browse/ARROW-8714
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Christian Hudon
>Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of 
> a different shape (e.g. images of different sizes), but of the same underlying 
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the 
> format. 
> I don't see needing each row being a tensor with a different number of 
> dimensions, so if the implementation for that falls out easily of the use 
> case with each row in the table having a tensor with the same number of 
> dimensions, great. If it adds a lot of complexity, that case would be 
> postponed.





[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-22 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219342#comment-17219342
 ] 

Bryan Cutler commented on ARROW-1614:
-

bq. As this is for the case where all tensors in the array are of the same 
shape I propose we store the data in a single Tensor. Is there a good reason 
not to do that?

I believe if there is a single pyarrow.Tensor serialized in the backing binary 
array, that array will have a length of 1.  Then, if placed in a Table or 
RecordBatch, it would restrict the other columns to 1 row as well.
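The length concern can be illustrated without pyarrow: a column has one cell per value, so serializing the whole stack into a single blob gives a binary column of length 1, while one blob per row keeps the column length equal to the number of tensors. All names below are purely illustrative:

```python
from array import array

tensors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # three shape-(3,) int32 tensors

# One serialized blob for the whole stack -> a binary column of length 1,
# which would force any sibling columns in a RecordBatch down to 1 row.
single_blob_column = [b"".join(array("i", t).tobytes() for t in tensors)]

# One blob per row -> the column length matches the number of tensors.
per_row_column = [array("i", t).tobytes() for t in tensors]
```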

> [C++] Add a Tensor logical value type with constant dimensions, implemented 
> using ExtensionType
> ---
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> dimensions. These would be stored as a binary value, plus some metadata to 
> store type and shape/strides.





[jira] [Assigned] (ARROW-10260) [Python] Missing MapType to Pandas dtype

2020-10-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-10260:


Assignee: (was: Bryan Cutler)

> [Python] Missing MapType to Pandas dtype
> 
>
> Key: ARROW-10260
> URL: https://issues.apache.org/jira/browse/ARROW-10260
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype 
> mapping for {{to_pandas_dtype()}}
>  
> {code:java}
> In [2]: d = pa.map_(pa.int64(), pa.float64())
> In [3]: d.to_pandas_dtype()
> ---------------------------------------------------------------------------
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 d.to_pandas_dtype()
> ~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()
> NotImplementedError: map<int64, double>
> {code}





[jira] [Assigned] (ARROW-10260) [Python] Missing MapType to Pandas dtype

2020-10-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-10260:


Assignee: Bryan Cutler

> [Python] Missing MapType to Pandas dtype
> 
>
> Key: ARROW-10260
> URL: https://issues.apache.org/jira/browse/ARROW-10260
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype 
> mapping for {{to_pandas_dtype()}}
>  
> {code:java}
> In [2]: d = pa.map_(pa.int64(), pa.float64())
> In [3]: d.to_pandas_dtype()
> ---------------------------------------------------------------------------
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 d.to_pandas_dtype()
> ~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()
> NotImplementedError: map<int64, double>
> {code}





[jira] [Commented] (ARROW-10260) [Python] Missing MapType to Pandas dtype

2020-10-09 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211521#comment-17211521
 ] 

Bryan Cutler commented on ARROW-10260:
--

Should be a quick fix, so marking this for 2.0.0.

> [Python] Missing MapType to Pandas dtype
> 
>
> Key: ARROW-10260
> URL: https://issues.apache.org/jira/browse/ARROW-10260
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 2.0.0
>
>
> The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype 
> mapping for {{to_pandas_dtype()}}
>  
> {code:java}
> In [2]: d = pa.map_(pa.int64(), pa.float64())
> In [3]: d.to_pandas_dtype()
> ---------------------------------------------------------------------------
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 d.to_pandas_dtype()
> ~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()
> NotImplementedError: map<int64, double>
> {code}





[jira] [Updated] (ARROW-10260) [Python] Missing MapType to Pandas dtype

2020-10-09 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-10260:
-
Fix Version/s: 2.0.0

> [Python] Missing MapType to Pandas dtype
> 
>
> Key: ARROW-10260
> URL: https://issues.apache.org/jira/browse/ARROW-10260
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 2.0.0
>
>
> The Map type conversion to Pandas done in ARROW-10151 forgot to add dtype 
> mapping for {{to_pandas_dtype()}}
>  
> {code:java}
> In [2]: d = pa.map_(pa.int64(), pa.float64())
> In [3]: d.to_pandas_dtype()
> ---------------------------------------------------------------------------
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input> in <module>
> ----> 1 d.to_pandas_dtype()
> ~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()
> NotImplementedError: map<int64, double>
> {code}





[jira] [Updated] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Parquet

2020-10-09 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9812:

Description: 
Hi,

I'm having problems using 'map' data type in Arrow/parquet/pandas.

I'm able to convert a pandas data frame to Arrow with a map data type.

When I write Arrow to Parquet, it seems to work, but I'm not sure if the data 
type is written correctly.

When I read back Parquet to Arrow, it fails saying "reading list of structs" is 
not supported. It seems that map is stored as list of structs.

There are two problems here:
 # -Map data type doesn't work from Arrow -> Pandas-. Fixed in ARROW-10151
 # Map data type doesn't get written to or read from Arrow -> Parquet.

Questions:

1. Am I doing something wrong? Is there a way to get these to work? 

2. If these are unsupported features, will this be fixed in a future version? 
Do you have plans or an ETA?

The following code example (followed by output) should demonstrate the issues:

I'm using Arrow 1.0.0 and Pandas 1.0.5.

Thanks!

Mayur
{code:java}
$ cat arrowtest.py

import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq
import traceback as tb
import io

print(f'PyArrow Version = {pa.__version__}')
print(f'Pandas Version = {pd.__version__}')

df1 = pd.DataFrame({'a': [[('b', '2')]]})
print('df1')
print(f'{df1}')

print('Pandas -> Arrow')
try:
    t1 = pa.Table.from_pandas(df1, schema=pa.schema([pa.field('a', pa.map_(pa.string(), pa.string()))]))
    print('PASSED')
    print(t1)
except:
    print('FAILED')
    tb.print_exc()

print('Arrow -> Pandas')
try:
    t1.to_pandas()
    print('PASSED')
except:
    print('FAILED')
    tb.print_exc()

print('Arrow -> Parquet')
fh = io.BytesIO()
try:
    pq.write_table(t1, fh)
    print('PASSED')
except:
    print('FAILED')
    tb.print_exc()

print('Parquet -> Arrow')
try:
    t2 = pq.read_table(source=fh)
    print('PASSED')
    print(t2)
except:
    print('FAILED')
    tb.print_exc()
{code}
{code:java}
$ python3.6 arrowtest.py
PyArrow Version = 1.0.0
Pandas Version = 1.0.5
df1
          a
0  [(b, 2)]

Pandas -> Arrow
PASSED
pyarrow.Table
a: map<string, string>
  child 0, entries: struct<key: string not null, value: string> not null
      child 0, key: string not null
      child 1, value: string

Arrow -> Pandas
FAILED
Traceback (most recent call last):
  File "arrowtest.py", line 26, in <module>
    t1.to_pandas()
  File "pyarrow/array.pxi", line 715, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 779, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 1115, in _table_to_blocks
    list(extension_columns.keys()))
  File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type map<string, string> is known.

Arrow -> Parquet
PASSED

Parquet -> Arrow
FAILED
Traceback (most recent call last):
  File "arrowtest.py", line 43, in <module>
    t2 = pq.read_table(source=fh)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1586, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1474, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 1994, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet files not yet supported: key_value: list<struct<key: string not null, value: string> not null> not null
{code}
Updated to indicate that the Pandas conversion is done, but not yet for Parquet.

  was:
Hi,

I'm having problems using 'map' data type in Arrow/parquet/pandas.

I'm able to convert a pandas data frame to Arrow with a map data type.

But, -Arrow to Pandas doesn't work.-  Fixed in ARROW-10151

When I write Arrow to Parquet, it seems to work, but I'm not sure if the data 
type is written correctly.

When I read back Parquet to Arrow, it fails saying "reading list of structs" is 
not supported. It seems that map is stored as list of structs.

There are two problems here:
 # -Map data type doesn't work from Arrow -> Pandas-. Fixed in ARROW-10151
 # Map data type doesn't get written to or read from Arrow -> Parquet.

Questions:

1. Am I doing something wrong? Is there a way to get these to work? 

2. If these are unsupported features, will this be fixed in a future version? 
Do you have plans or an ETA?

The following code example (followed by output) should demonstrate the issues:

I'm using Arrow 1.0.0 and Pandas 1.0.5.

Thanks!

Mayur

[jira] [Updated] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Parquet

2020-10-09 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9812:

Description: 
Hi,

I'm having problems using 'map' data type in Arrow/parquet/pandas.

I'm able to convert a pandas data frame to Arrow with a map data type.

But, -Arrow to Pandas doesn't work.-  Fixed in ARROW-10151

When I write Arrow to Parquet, it seems to work, but I'm not sure if the data 
type is written correctly.

When I read back Parquet to Arrow, it fails saying "reading list of structs" is 
not supported. It seems that map is stored as list of structs.

There are two problems here:
 # -Map data type doesn't work from Arrow -> Pandas-. Fixed in ARROW-10151
 # Map data type doesn't get written to or read from Arrow -> Parquet.

Questions:

1. Am I doing something wrong? Is there a way to get these to work? 

2. If these are unsupported features, will this be fixed in a future version? 
Do you have plans or an ETA?

The following code example (followed by output) should demonstrate the issues:

I'm using Arrow 1.0.0 and Pandas 1.0.5.

Thanks!

Mayur
{code:java}
$ cat arrowtest.py

import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq
import traceback as tb
import io

print(f'PyArrow Version = {pa.__version__}')
print(f'Pandas Version = {pd.__version__}')

df1 = pd.DataFrame({'a': [[('b', '2')]]})
print('df1')
print(f'{df1}')

print('Pandas -> Arrow')
try:
    t1 = pa.Table.from_pandas(df1, schema=pa.schema([pa.field('a', pa.map_(pa.string(), pa.string()))]))
    print('PASSED')
    print(t1)
except:
    print('FAILED')
    tb.print_exc()

print('Arrow -> Pandas')
try:
    t1.to_pandas()
    print('PASSED')
except:
    print('FAILED')
    tb.print_exc()

print('Arrow -> Parquet')
fh = io.BytesIO()
try:
    pq.write_table(t1, fh)
    print('PASSED')
except:
    print('FAILED')
    tb.print_exc()

print('Parquet -> Arrow')
try:
    t2 = pq.read_table(source=fh)
    print('PASSED')
    print(t2)
except:
    print('FAILED')
    tb.print_exc()
{code}
{code:java}
$ python3.6 arrowtest.py
PyArrow Version = 1.0.0
Pandas Version = 1.0.5
df1
          a
0  [(b, 2)]

Pandas -> Arrow
PASSED
pyarrow.Table
a: map<string, string>
  child 0, entries: struct<key: string not null, value: string> not null
      child 0, key: string not null
      child 1, value: string

Arrow -> Pandas
FAILED
Traceback (most recent call last):
  File "arrowtest.py", line 26, in <module>
    t1.to_pandas()
  File "pyarrow/array.pxi", line 715, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 779, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 1115, in _table_to_blocks
    list(extension_columns.keys()))
  File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type map<string, string> is known.

Arrow -> Parquet
PASSED

Parquet -> Arrow
FAILED
Traceback (most recent call last):
  File "arrowtest.py", line 43, in <module>
    t2 = pq.read_table(source=fh)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1586, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1474, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 1994, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet files not yet supported: key_value: list<struct<key: string not null, value: string> not null> not null
{code}
Updated to indicate that the Pandas conversion is done, but not yet for Parquet.

  was:
Hi,

I'm having problems using 'map' data type in Arrow/parquet/pandas.

I'm able to convert a pandas data frame to Arrow with a map data type.

But, Arrow to Pandas doesn't work.

When I write Arrow to Parquet, it seems to work, but I'm not sure if the data 
type is written correctly.

When I read back Parquet to Arrow, it fails saying "reading list of structs" is 
not supported. It seems that map is stored as list of structs.

There are two problems here:
 # Map data type doesn't work from Arrow -> Pandas.
 # Map data type doesn't get written to or read from Arrow -> Parquet.

Questions:

1. Am I doing something wrong? Is there a way to get these to work? 

2. If these are unsupported features, will this be fixed in a future version? 
Do you have plans or an ETA?

The following code example (followed by output) should demonstrate the issues:

I'm using Arrow 1.0.0 and Pandas 1.0.5.

Thanks!


[jira] [Updated] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Parquet

2020-10-09 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9812:

Summary: [Python] Map data types doesn't work from Arrow to Parquet  (was: 
[Python] Map data types doesn't work from Arrow to Pandas and Parquet)

> [Python] Map data types doesn't work from Arrow to Parquet
> --
>
> Key: ARROW-9812
> URL: https://issues.apache.org/jira/browse/ARROW-9812
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mayur Srivastava
>Priority: Major
>
> Hi,
> I'm having problems using 'map' data type in Arrow/parquet/pandas.
> I'm able to convert a pandas data frame to Arrow with a map data type.
> But, Arrow to Pandas doesn't work.
> When I write Arrow to Parquet, it seems to work, but I'm not sure if the data 
> type is written correctly.
> When I read back Parquet to Arrow, it fails saying "reading list of structs" 
> is not supported. It seems that map is stored as list of structs.
> There are two problems here:
>  # Map data type doesn't work from Arrow -> Pandas.
>  # Map data type doesn't get written to or read from Arrow -> Parquet.
> Questions:
> 1. Am I doing something wrong? Is there a way to get these to work? 
> 2. If these are unsupported features, will this be fixed in a future version? 
> Do you have plans or an ETA?
> The following code example (followed by output) should demonstrate the issues:
> I'm using Arrow 1.0.0 and Pandas 1.0.5.
> Thanks!
> Mayur
> {code:java}
> $ cat arrowtest.py
> import pyarrow as pa
> import pandas as pd
> import pyarrow.parquet as pq
> import traceback as tb
> import io
> print(f'PyArrow Version = {pa.__version__}')
> print(f'Pandas Version = {pd.__version__}')
> df1 = pd.DataFrame({'a': [[('b', '2')]]})
> print(f'df1')
> print(f'{df1}')
> print(f'Pandas -> Arrow')
> try:
> t1 = pa.Table.from_pandas(df1, schema=pa.schema([pa.field('a', 
> pa.map_(pa.string(), pa.string()))]))
> print('PASSED')
> print(t1)
> except:
> print(f'FAILED')
> tb.print_exc()
> print(f'Arrow -> Pandas')
> try:
> t1.to_pandas()
> print('PASSED')
> except:
> print(f'FAILED')
> tb.print_exc()print(f'Arrow -> Parquet')
> fh = io.BytesIO()
> try:
> pq.write_table(t1, fh)
> print('PASSED')
> except:
> print('FAILED')
> tb.print_exc()
> 
> print(f'Parquet -> Arrow')
> try:
> t2 = pq.read_table(source=fh)
> print('PASSED')
> print(t2)
> except:
> print('FAILED')
> tb.print_exc()
> {code}
> {code:java}
> $ python3.6 arrowtest.py
> PyArrow Version = 1.0.0 
> Pandas Version = 1.0.5 
> df1 
> a 0 [(b, 2)] 
>  
> Pandas -> Arrow 
> PASSED 
> pyarrow.Table 
> a: map<string, string>
>  child 0, entries: struct<key: string not null, value: string> not null
>  child 0, key: string not null
>  child 1, value: string 
>  
> Arrow -> Pandas 
> FAILED 
> Traceback (most recent call last):
> File "arrowtest.py", line 26, in <module> t1.to_pandas() 
> File "pyarrow/array.pxi", line 715, in 
> pyarrow.lib._PandasConvertible.to_pandas 
> File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas File 
> "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 779, in 
> table_to_blockmanager blocks = _table_to_blocks(options, table, categories, 
> ext_columns_dtypes) 
> File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/pandas_compat.py", line 
> 1115, in _table_to_blocks list(extension_columns.keys())) 
> File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks File 
> "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status 
> pyarrow.lib.ArrowNotImplementedError: No known equivalent Pandas block for 
> Arrow data of type map<string, string> is known. 
>  
> Arrow -> Parquet 
> PASSED 
>  
> Parquet -> Arrow 
> FAILED 
> Traceback (most recent call last): File "arrowtest.py", line 43, in <module> 
> t2 = pq.read_table(source=fh) 
> File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1586, in 
> read_table use_pandas_metadata=use_pandas_metadata) 
> File "XXX/pyarrow/1/0/x/dist/lib/python3.6/pyarrow/parquet.py", line 1474, in 
> read use_threads=use_threads 
> File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.to_table 
> File "pyarrow/_dataset.pyx", line 1994, in pyarrow._dataset.Scanner.to_table 
> File "pyarrow/error.pxi", line 122, in 
> pyarrow.lib.pyarrow_internal_check_status 
> File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status 
> pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet 
> files not yet supported: key_value: list<element: struct<key: string not 
> null, value: string> not null> not null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Pandas and Parquet

2020-10-09 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211472#comment-17211472
 ] 

Bryan Cutler edited comment on ARROW-9812 at 10/9/20, 11:50 PM:


Hi [~admrsh], I implemented Map type to Pandas conversion recently in 
ARROW-10151, but it looks like I forgot the line you pointed out in 
{{types.pxi}}. That should be in for the upcoming release; if you are able to 
do a PR before it's cut - likely today or tomorrow - that would be great. 
Otherwise, I can go ahead and add it. I will update this Jira to reflect that 
Pandas conversion is complete. I made ARROW-10260 to add {{to_pandas_dtype}}. Thanks!


was (Author: bryanc):
Hi [~admrsh] , I implemented Map types to Pandas conversion recently in 
ARROW-10151, but looks like I forgot that line you pointed out in 
{{types.pxi}}. That should be in for the upcoming release, if you are able to 
do a PR for it's cut - likely today or tomorrow - that would be great. 
Otherwise, I can go ahead and add it. I will update this Jira to reflect Pandas 
conversion is complete. Thanks!

> [Python] Map data types doesn't work from Arrow to Pandas and Parquet
> -
>
> Key: ARROW-9812
> URL: https://issues.apache.org/jira/browse/ARROW-9812
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mayur Srivastava
>Priority: Major
>

[jira] [Created] (ARROW-10260) [Python] Missing MapType to Pandas dtype

2020-10-09 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-10260:


 Summary: [Python] Missing MapType to Pandas dtype
 Key: ARROW-10260
 URL: https://issues.apache.org/jira/browse/ARROW-10260
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler


The Map type to Pandas conversion added in ARROW-10151 is missing the dtype 
mapping for {{to_pandas_dtype()}}.

 
{code:java}
In [2]: d = pa.map_(pa.int64(), pa.float64())

In [3]: d.to_pandas_dtype()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3> in <module>
----> 1 d.to_pandas_dtype()

~/miniconda2/envs/pyarrow-test/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()

NotImplementedError: map<int64, double>
{code}
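A plausible shape for the missing mapping: nested Arrow types (map, list, struct) have no fixed-width NumPy equivalent, so `to_pandas_dtype()` for a map type would reasonably resolve to the generic object dtype. The sketch below is illustrative only, not pyarrow's actual implementation; the dictionary and function names are assumptions:

```python
import numpy as np

# Hypothetical lookup from Arrow type names to NumPy dtypes. Nested types
# such as map<k, v> are absent, so they fall back to the object dtype,
# mirroring what a fix for the missing mapping could do.
_PANDAS_DTYPE = {
    'int64': np.int64,
    'double': np.float64,
}

def to_pandas_dtype(type_name):
    # Nested values (map/list/struct) become Python objects in a pandas column
    return _PANDAS_DTYPE.get(type_name, np.object_)

assert to_pandas_dtype('int64') is np.int64
assert to_pandas_dtype('map<int64, double>') is np.object_
```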





[jira] [Commented] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Pandas and Parquet

2020-10-09 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211472#comment-17211472
 ] 

Bryan Cutler commented on ARROW-9812:
-

Hi [~admrsh], I implemented Map type to Pandas conversion recently in 
ARROW-10151, but it looks like I forgot the line you pointed out in 
{{types.pxi}}. That should be in for the upcoming release; if you are able to 
do a PR before it's cut - likely today or tomorrow - that would be great. 
Otherwise, I can go ahead and add it. I will update this Jira to reflect that 
Pandas conversion is complete. Thanks!

> [Python] Map data types doesn't work from Arrow to Pandas and Parquet
> -
>
> Key: ARROW-9812
> URL: https://issues.apache.org/jira/browse/ARROW-9812
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mayur Srivastava
>Priority: Major
>

[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-09 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210634#comment-17210634
 ] 

Bryan Cutler commented on ARROW-1614:
-

[~rokm] for our purposes, it wasn't necessary to use pyarrow.Tensor, but it 
currently has some limitations, so there may be trade-offs. Please go ahead 
and start if you like, and I'd be happy to help review and discuss further.

> [C++] Add a Tensor logical value type with constant dimensions, implemented 
> using ExtensionType
> ---
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>
> In an Arrow table, we would like to add support for a column that has values 
> cells each containing a tensor value, with all tensors having the same 
> dimensions. These would be stored as a binary value, plus some metadata to 
> store type and shape/strides.
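The proposed layout can be sketched in plain NumPy terms: each cell holds the tensor's raw bytes, while the dtype and shape live once in column-level metadata, since all tensors in the column share the same dimensions. This is a hedged illustration of the idea, not Arrow's implementation; `pack`/`unpack` and the `metadata` dict are invented for the example:

```python
import numpy as np

# Column-level metadata: one dtype and shape for every cell in the column.
metadata = {'dtype': 'float64', 'shape': (2, 3)}

def pack(tensor):
    # Store one tensor cell as its raw bytes (the binary value).
    assert tensor.shape == metadata['shape']
    return tensor.tobytes()

def unpack(raw):
    # Rebuild the tensor from the bytes plus the shared metadata.
    return np.frombuffer(raw, dtype=metadata['dtype']).reshape(metadata['shape'])

t = np.arange(6, dtype=np.float64).reshape(2, 3)
assert np.array_equal(unpack(pack(t)), t)
```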





[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-08 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210489#comment-17210489
 ] 

Bryan Cutler commented on ARROW-1614:
-

I just wanted to let you all know I have been working on a similar Tensor 
extension type. I currently have a Pandas extension type for a tensor with 
conversion to/from an Arrow extension type, just for Python/PyArrow right now, 
and zero-copy conversion with numpy.ndarrays. It's part of the project [Text 
Extensions for Pandas|https://github.com/CODAIT/text-extensions-for-pandas] 
where we use it for NLP feature vectors, but it's really general purpose. You 
can check it out at

[https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/array/tensor.py]
 
[https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/array/arrow_conversion.py]
 Or install the package if you like via {{pip install 
text-extensions-for-pandas}} (it's currently in alpha)

We would love to help out with this effort and contribute what we have to 
Arrow, if it fits the bill!

> [C++] Add a Tensor logical value type with constant dimensions, implemented 
> using ExtensionType
> ---
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major





[jira] [Commented] (ARROW-10178) [CI] Fix spark master integration test build setup

2020-10-05 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208367#comment-17208367
 ] 

Bryan Cutler commented on ARROW-10178:
--

I'll check it out

> [CI] Fix spark master integration test build setup
> --
>
> Key: ARROW-10178
> URL: https://issues.apache.org/jira/browse/ARROW-10178
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> https://github.com/ursa-labs/crossbow/runs/1204690363





[jira] [Commented] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Pandas and Parquet

2020-10-01 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205731#comment-17205731
 ] 

Bryan Cutler commented on ARROW-9812:
-

I started work on https://issues.apache.org/jira/browse/ARROW-10151 for the 
Pandas conversion. Let's keep this open for Parquet conversion after ARROW-1644.

> [Python] Map data types doesn't work from Arrow to Pandas and Parquet
> -
>
> Key: ARROW-9812
> URL: https://issues.apache.org/jira/browse/ARROW-9812
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mayur Srivastava
>Priority: Major
>





[jira] [Commented] (ARROW-10151) [Python] Add support MapArray to_pandas conversion

2020-10-01 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205730#comment-17205730
 ] 

Bryan Cutler commented on ARROW-10151:
--

Thanks David, I must have missed that one. I'll keep this open for Pandas 
conversion.

> [Python] Add support MapArray to_pandas conversion
> --
>
> Key: ARROW-10151
> URL: https://issues.apache.org/jira/browse/ARROW-10151
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 2.0.0
>
>
> MapArray does not currently support to_pandas conversion and raises a 
> {{Status::NotImplemented("No known equivalent Pandas block for Arrow data of 
> type ")}}
> Conversion from Pandas seems to work, but should verify there are tests in 
> place.





[jira] [Comment Edited] (ARROW-9812) [Python] Map data types doesn't work from Arrow to Pandas and Parquet

2020-10-01 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205731#comment-17205731
 ] 

Bryan Cutler edited comment on ARROW-9812 at 10/1/20, 6:10 PM:
---

I started work on ARROW-10151 for the Pandas conversion. Let's keep this open 
for Parquet conversion after ARROW-1644.


was (Author: bryanc):
I started work on https://issues.apache.org/jira/browse/ARROW-10151 for the 
Pandas conversion. Let's keep this open for Parquet conversion after ARROW-1644.

> [Python] Map data types doesn't work from Arrow to Pandas and Parquet
> -
>
> Key: ARROW-9812
> URL: https://issues.apache.org/jira/browse/ARROW-9812
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mayur Srivastava
>Priority: Major
>


[jira] [Commented] (ARROW-10151) [Python] Add support MapArray to_pandas conversion

2020-10-01 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205715#comment-17205715
 ] 

Bryan Cutler commented on ARROW-10151:
--

I started working on this; I think I can have it ready for the 2.0.0 release.

> [Python] Add support MapArray to_pandas conversion
> --
>
> Key: ARROW-10151
> URL: https://issues.apache.org/jira/browse/ARROW-10151
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 2.0.0
>
>





[jira] [Created] (ARROW-10151) [Python] Add support MapArray to_pandas conversion

2020-10-01 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-10151:


 Summary: [Python] Add support MapArray to_pandas conversion
 Key: ARROW-10151
 URL: https://issues.apache.org/jira/browse/ARROW-10151
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
 Fix For: 2.0.0


MapArray does not currently support to_pandas conversion and raises a 
{{Status::NotImplemented("No known equivalent Pandas block for Arrow data of 
type ")}}

Conversion from Pandas seems to work, but should verify there are tests in 
place.
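For reference, the resulting conversion represents each map cell in pandas as a list of (key, value) tuples in an object-dtype column, which preserves duplicate keys and ordering that a dict could not. A pyarrow-free sketch of that row-wise shape (assumed for illustration; the helper name is not pyarrow's API):

```python
def map_column_to_pandas(column):
    # Each row of an Arrow MapArray becomes a list of (key, value) tuples
    # in an object-dtype pandas column; null rows stay None.
    return [None if row is None else [(k, v) for k, v in row]
            for row in column]

col = map_column_to_pandas([[('b', '2')], None])
assert col == [[('b', '2')], None]
```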





[jira] [Resolved] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-4526.
-
Resolution: Fixed

I changed the remaining child issue to only be related; since the main 
objective of this Jira has been done, I think the parent should be closed.

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations, with Netty as one possible option. We should remove the hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a {{<T> T unwrap(Class<T> clazz)}} 
> method instead, to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191.
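The proposed unwrap pattern can be illustrated with a small Python sketch. All names here are hypothetical stand-ins (this is not Arrow's actual API): the point is that the buffer hands out its backing implementation only to callers that name the concrete type, so the public surface carries no hard Netty reference.

```python
# Illustrative sketch of the proposed unwrap(Class) pattern; all class
# names are hypothetical, not Arrow's actual API.
class ArrowBufSketch:
    def __init__(self, backing):
        self._backing = backing

    def unwrap(self, cls):
        # Hand out the backing object only if it is of the requested type,
        # mirroring the Java <T> T unwrap(Class<T> clazz) proposal.
        if isinstance(self._backing, cls):
            return self._backing
        raise TypeError(f"buffer is not backed by {cls.__name__}")

class NettyBacking:
    """Hypothetical stand-in for a Netty ByteBuf."""

buf = ArrowBufSketch(NettyBacking())
backing = buf.unwrap(NettyBacking)   # caller opts in to the concrete type
print(type(backing).__name__)        # NettyBacking
```

Callers that never mention the Netty-backed type never pull in a Netty dependency, which is the decoupling the issue asks for.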





[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-4526:

Fix Version/s: (was: 3.0.0)
   1.0.0

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations, with Netty as one possible option. We should remove the hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a {{<T> T unwrap(Class<T> clazz)}} 
> method instead, to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191.





[jira] [Updated] (ARROW-9356) [Java] Remove Netty dependency from arrow-vector

2020-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9356:

Fix Version/s: (was: 2.0.0)
   3.0.0

> [Java] Remove Netty dependency from arrow-vector 
> -
>
> Key: ARROW-9356
> URL: https://issues.apache.org/jira/browse/ARROW-9356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 3.0.0
>
>
> Cleanup remaining usage of Netty from arrow-vector and remove as a dependency 
> after ARROW-9300.





[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-10-01 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-4526:

Fix Version/s: (was: 2.0.0)
   3.0.0

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 3.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations, with Netty as one possible option. We should remove the hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a {{<T> T unwrap(Class<T> clazz)}} 
> method instead, to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191.





[jira] [Created] (ARROW-9750) [Doc][Python] Add usage of py.Array scalar operations behavior

2020-08-14 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-9750:
---

 Summary: [Doc][Python] Add usage of py.Array scalar operations 
behavior
 Key: ARROW-9750
 URL: https://issues.apache.org/jira/browse/ARROW-9750
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Python
Affects Versions: 1.0.0
Reporter: Bryan Cutler


Recent changes in 1.0.0 affected the way pyarrow.Array scalars handle 
operations such as equality. For example, an equality check will compare object 
equivalence and return False no matter what the value is. Since this could be 
confusing to the user, there should be some documentation on this behavior.
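The behavior described above can be modeled with a minimal stand-in class. This sketch is illustrative, not pyarrow's implementation: a wrapper with no value-based `__eq__` falls back to object identity, so `==` returns False even when the wrapped value matches; converting to a plain Python value first restores value equality.

```python
# Minimal stand-in for a 1.0.0-era pyarrow scalar; illustrative only.
# With no __eq__ defined, == falls back to object identity.
class ScalarSketch:
    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Converting to a plain Python value restores value equality.
        return self._value

s = ScalarSketch(1)
print(s == 1)          # False: identity comparison, not value comparison
print(s.as_py() == 1)  # True: compare the converted Python value
```

Documenting the `.as_py()`-style conversion path is the kind of guidance the issue asks for.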





[jira] [Created] (ARROW-9576) [Doc] Fix error in code example for extension types

2020-07-27 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-9576:
---

 Summary: [Doc] Fix error in code example for extension types
 Key: ARROW-9576
 URL: https://issues.apache.org/jira/browse/ARROW-9576
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Bryan Cutler
Assignee: Bryan Cutler


There is an error in the example code: it uses an undefined variable `arr` 
instead of `self` here: 
https://arrow.apache.org/docs/python/extending_types.html#conversion-to-pandas





[jira] [Resolved] (ARROW-9371) [Java] Run vector tests for both allocators

2020-07-24 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9371.
-
Fix Version/s: 1.1.0
   Resolution: Fixed

Issue resolved by pull request 7676
[https://github.com/apache/arrow/pull/7676]

> [Java] Run vector tests for both allocators
> ---
>
> Key: ARROW-9371
> URL: https://issues.apache.org/jira/browse/ARROW-9371
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As per https://github.com/apache/arrow/pull/7619#discussion_r451140735, the 
> vector tests should be run for both the Netty and unsafe allocators.





[jira] [Updated] (ARROW-9545) [Java] Add forward compatibility checks for unrecognized future MetadataVersion

2020-07-23 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9545:

Fix Version/s: (was: 1.0.0)
   2.0.0

> [Java] Add forward compatibility checks for unrecognized future 
> MetadataVersion
> ---
>
> Key: ARROW-9545
> URL: https://issues.apache.org/jira/browse/ARROW-9545
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Critical
> Fix For: 2.0.0
>
>
> In theory we should have no need for these checks, but they provide a 
> safeguard should it become necessary, some years in the future, to increment 
> the MetadataVersion.
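A forward-compatibility check of this kind can be sketched briefly. The version numbers and names below are illustrative, not Arrow's actual MetadataVersion values: the idea is that a reader keeps a set of recognized versions and rejects anything newer explicitly instead of misreading it.

```python
# Sketch of a forward-compatibility check; version values are
# illustrative, not Arrow's actual MetadataVersion enum.
RECOGNIZED_METADATA_VERSIONS = {1, 2, 3, 4}

def check_metadata_version(version):
    # Fail loudly on versions this reader does not know about, rather
    # than silently misinterpreting data from a newer writer.
    if version not in RECOGNIZED_METADATA_VERSIONS:
        raise ValueError(
            f"unrecognized MetadataVersion {version}; "
            "the file may come from a newer Arrow release")
    return version

print(check_metadata_version(4))  # accepted
```

An explicit error like this gives users an actionable message (upgrade the reader) instead of a crash or corrupted read.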





[jira] [Updated] (ARROW-9545) [Java] Add forward compatibility checks for unrecognized future MetadataVersion

2020-07-23 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9545:

Component/s: (was: C++)
 Java

> [Java] Add forward compatibility checks for unrecognized future 
> MetadataVersion
> ---
>
> Key: ARROW-9545
> URL: https://issues.apache.org/jira/browse/ARROW-9545
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> In theory we should have no need for these checks, but they provide a 
> safeguard should it become necessary, some years in the future, to increment 
> the MetadataVersion.





[jira] [Assigned] (ARROW-9545) [Java] Add forward compatibility checks for unrecognized future MetadataVersion

2020-07-23 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-9545:
---

Assignee: (was: Wes McKinney)

> [Java] Add forward compatibility checks for unrecognized future 
> MetadataVersion
> ---
>
> Key: ARROW-9545
> URL: https://issues.apache.org/jira/browse/ARROW-9545
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Bryan Cutler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> In theory we should have no need for these checks, but they provide a 
> safeguard should it become necessary, some years in the future, to increment 
> the MetadataVersion.





[jira] [Updated] (ARROW-9545) [Java] Add forward compatibility checks for unrecognized future MetadataVersion

2020-07-23 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9545:

Labels:   (was: pull-request-available)

> [Java] Add forward compatibility checks for unrecognized future 
> MetadataVersion
> ---
>
> Key: ARROW-9545
> URL: https://issues.apache.org/jira/browse/ARROW-9545
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Critical
> Fix For: 1.0.0
>
>
> In theory we should have no need for these checks, but they provide a 
> safeguard should it become necessary, some years in the future, to increment 
> the MetadataVersion.





[jira] [Created] (ARROW-9545) [Java] Add forward compatibility checks for unrecognized future MetadataVersion

2020-07-23 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-9545:
---

 Summary: [Java] Add forward compatibility checks for unrecognized 
future MetadataVersion
 Key: ARROW-9545
 URL: https://issues.apache.org/jira/browse/ARROW-9545
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Bryan Cutler
Assignee: Wes McKinney
 Fix For: 1.0.0


In theory we should have no need for these checks, but they provide a safeguard 
should it become necessary, some years in the future, to increment the 
MetadataVersion.





[jira] [Commented] (ARROW-9438) [CI] Spark integration tests are failing

2020-07-13 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156916#comment-17156916
 ] 

Bryan Cutler commented on ARROW-9438:
-

The errors are coming from the recent change to Arrow Java in ARROW-9300. I 
have a patch ready for that as part of the Arrow 1.0.0 upgrade; I'll make a PR 
for it here soon. There will still be one PySpark test failure due to 
ARROW-9223; that will take a little more effort to resolve and will require 
another patch.

> [CI] Spark integration tests are failing
> 
>
> Key: ARROW-9438
> URL: https://issues.apache.org/jira/browse/ARROW-9438
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: spark
> Fix For: 1.0.0
>
>
> It has been failing for 3 days; see the build history here: 
> https://github.com/ursa-labs/crossbow/branches/all?query=spark
> It might be a Spark regression: 
> https://github.com/ursa-labs/crossbow/runs/864989434
> cc [~bryanc]





[jira] [Commented] (ARROW-9356) [Java] Remove Netty dependency from arrow-vector

2020-07-12 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156404#comment-17156404
 ] 

Bryan Cutler commented on ARROW-9356:
-

I took a quick look and don't think I could do this in the next couple of days. 
It's fine to push this off until the next release.

> [Java] Remove Netty dependency from arrow-vector 
> -
>
> Key: ARROW-9356
> URL: https://issues.apache.org/jira/browse/ARROW-9356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 1.0.0
>
>
> Cleanup remaining usage of Netty from arrow-vector and remove as a dependency 
> after ARROW-9300.





[jira] [Updated] (ARROW-9356) [Java] Remove Netty dependency from arrow-vector

2020-07-12 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-9356:

Fix Version/s: (was: 1.0.0)
   2.0.0

> [Java] Remove Netty dependency from arrow-vector 
> -
>
> Key: ARROW-9356
> URL: https://issues.apache.org/jira/browse/ARROW-9356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 2.0.0
>
>
> Cleanup remaining usage of Netty from arrow-vector and remove as a dependency 
> after ARROW-9300.





[jira] [Resolved] (ARROW-9300) [Java] Separate Netty Memory to its own module

2020-07-09 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-9300.
-
Resolution: Fixed

Issue resolved by pull request 7619
[https://github.com/apache/arrow/pull/7619]

> [Java] Separate Netty Memory to its own module
> --
>
> Key: ARROW-9300
> URL: https://issues.apache.org/jira/browse/ARROW-9300
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Finish the work started in ARROW-8230




