[jira] [Created] (ARROW-11868) [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL

2021-03-04 Thread Vivek Shankar (Jira)
Vivek Shankar created ARROW-11868:
-

 Summary: [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ 
Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
 Key: ARROW-11868
 URL: https://issues.apache.org/jira/browse/ARROW-11868
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Vivek Shankar
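
The summary implies that a non-finite float has no decimal representation, so the cast should produce NULL rather than 0.0. A minimal sketch of that expected behavior in plain Python follows. It is not Gandiva code: the function name `float_to_decimal_or_null` and the `scale` parameter are hypothetical and used only for illustration.

```python
import math
from decimal import Decimal


def float_to_decimal_or_null(value, scale=2):
    """Return `value` as a Decimal with `scale` fractional digits.

    Hypothetical helper: returns None (standing in for NULL) when the
    input is +inf, -inf or NaN, instead of silently producing 0.0.
    """
    if not math.isfinite(value):
        return None
    # Decimal(10) ** -scale is the quantum, e.g. Decimal('0.01') for scale=2.
    return Decimal(repr(value)).quantize(Decimal(10) ** -scale)


if __name__ == "__main__":
    for v in (1.5, float("inf"), float("-inf"), float("nan")):
        print(v, "->", float_to_decimal_or_null(v))
```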






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11867) [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL

2021-03-04 Thread Vivek Shankar (Jira)
Vivek Shankar created ARROW-11867:
-

 Summary: [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ 
Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
 Key: ARROW-11867
 URL: https://issues.apache.org/jira/browse/ARROW-11867
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Vivek Shankar








[jira] [Created] (ARROW-11866) [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC

2021-03-04 Thread Lynch Wu (Jira)
Lynch Wu created ARROW-11866:


 Summary: [C++] Arrow Flight SetShutdownOnSignals cause potential 
mutex deadlock in gRPC 
 Key: ARROW-11866
 URL: https://issues.apache.org/jira/browse/ARROW-11866
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC
Affects Versions: 3.0.0
 Environment: * Arrow 3.0
* gcc (Debian 10.2.0-16) 10.2.0
* grpc 1.33.1/C++
Reporter: Lynch Wu
 Fix For: 4.0.0


1. Starting a Flight server with a shutdown signal (SIGTERM) registered and 
then using `Ctrl + C` to terminate the server can cause a mutex deadlock in gRPC.
 * gcc (Debian 10.2.0-16) 10.2.0
 * grpc 1.33.1/C++

2. 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/test_server.cc]
 This test case can be used as minimal code to reproduce the issue with the above 
gRPC version and gcc version on Debian 10.2.0.

3. Related gRPC issue is: [https://github.com/grpc/grpc/issues/24884]

4. I am wondering whether the way the signal is handled here is proper with gRPC: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/server.cc#L881]

 

Feel free to share your thoughts for this.
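
One common way to avoid doing real work inside a signal handler is to have the handler only record the request and let another thread perform the shutdown. The sketch below shows that pattern in plain Python. It is not the Arrow Flight or gRPC implementation, and `FakeServer` and `install_shutdown_on_signals` are hypothetical names used only for illustration.

```python
import signal
import threading


class FakeServer:
    """Hypothetical stand-in for a server exposing a shutdown() call."""

    def __init__(self):
        self.shutdown_calls = 0

    def shutdown(self):
        self.shutdown_calls += 1


def install_shutdown_on_signals(server, signums=(signal.SIGINT, signal.SIGTERM)):
    """Register handlers that defer shutdown to a worker thread.

    Must be called from the main thread, because Python only allows
    signal handlers to be installed there.
    """
    stop = threading.Event()

    def handler(signum, frame):
        # Only flag the request; the actual shutdown runs on another thread.
        stop.set()

    for signum in signums:
        signal.signal(signum, handler)

    def wait_then_shutdown():
        stop.wait()
        server.shutdown()

    worker = threading.Thread(target=wait_then_shutdown, daemon=True)
    worker.start()
    return worker
```

The caller would typically join the returned worker thread after the server's blocking serve call returns.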





[jira] [Created] (ARROW-11865) [Rust] Out of bound reads in chunk iterator

2021-03-04 Thread Jira
Jorge Leitão created ARROW-11865:


 Summary: [Rust] Out of bound reads in chunk iterator
 Key: ARROW-11865
 URL: https://issues.apache.org/jira/browse/ARROW-11865
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Jorge Leitão


There is a `read_unaligned` of a pointer offset by `index + 1` whose 
corresponding slice is only valid up to `index - X`, where X depends on the 
iterator's offset.





[jira] [Created] (ARROW-11864) [R] Document arrow.int64_downcast option

2021-03-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11864:
---

 Summary: [R] Document arrow.int64_downcast option
 Key: ARROW-11864
 URL: https://issues.apache.org/jira/browse/ARROW-11864
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Matthew Summersgill
 Fix For: 4.0.0


See ARROW-9083 and discussion.





[jira] [Created] (ARROW-11863) [Rust][DataFusion] No way to get to the examples from docs.rs

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11863:
---

 Summary: [Rust][DataFusion] No way to get to the examples from 
docs.rs
 Key: ARROW-11863
 URL: https://issues.apache.org/jira/browse/ARROW-11863
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
 Attachments: Screen Shot 2021-03-04 at 2.51.54 PM.png

https://docs.rs/datafusion/3.0.0/datafusion/ has a tantalizing piece of text 
about the examples, but no link or explanation of how to find them

 !Screen Shot 2021-03-04 at 2.51.54 PM.png! 

The examples are at 
https://github.com/apache/arrow/tree/master/rust/datafusion/examples

The ideal outcome would be to point people at the examples directory 
for the version of the docs they are looking at in docs.rs. An acceptable 
outcome would be to always point the docs on docs.rs at master.





[jira] [Created] (ARROW-11862) [Rust] String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11862:
---

 Summary: [Rust] String and BinaryArray created from iterators that 
don't accurately report size can lead to undefined behavior
 Key: ARROW-11862
 URL: https://issues.apache.org/jira/browse/ARROW-11862
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


As [~jorgecarleitao] says on  
https://github.com/apache/arrow/pull/9588#discussion_r584290701

The (Rust) Iterator spec recommends, but does not require, that the iterator 
reports a correct length. Consumers that trigger undefined behavior because of 
an incorrect size_hint are themselves the cause of that undefined behavior.

The only case where consumers can trust the iterator's length is when the 
iterator implements the unsafe trait TrustedLen. Unfortunately, TrustedLen is 
still unstable. For that reason, we have been exposing the unsafe 
Buffer::from_trusted_len_iter and the like for those cases.

So the code should be updated to handle the case where the reported `size_hint` 
turns out to be incorrect.
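
As an illustration of the idea only (this is a Python analogy, not the Rust code in question), the sketch below uses Python's `operator.length_hint`, which, like `size_hint`, is advisory and may be wrong. The consumer preallocates from the hint but stays correct when the hint is too small or too large. The names `LyingIter` and `collect_offsets` are hypothetical.

```python
from operator import length_hint


class LyingIter:
    """Hypothetical iterator whose reported length hint may be wrong."""

    def __init__(self, items, hint):
        self._it = iter(items)
        self._hint = hint

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __length_hint__(self):
        return self._hint


def collect_offsets(values):
    """Build (offsets, data) from an iterable of bytes values.

    The hint is used only to preallocate; correctness never depends on it.
    """
    hint = length_hint(values, 0)
    offsets = [0] * (hint + 1)  # preallocate from the (untrusted) hint
    data = bytearray()
    count = 0
    for value in values:
        data.extend(value)
        count += 1
        if count < len(offsets):
            offsets[count] = len(data)
        else:
            # Hint was too small: grow instead of writing out of bounds.
            offsets.append(len(data))
    # Hint was too large: drop the unused preallocated slots.
    del offsets[count + 1:]
    return offsets, bytes(data)
```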





[jira] [Created] (ARROW-11861) [R][Packaging] Apply changes in r/tools/autobrew upstream

2021-03-04 Thread Ian Cook (Jira)
Ian Cook created ARROW-11861:


 Summary: [R][Packaging] Apply changes in r/tools/autobrew upstream
 Key: ARROW-11861
 URL: https://issues.apache.org/jira/browse/ARROW-11861
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, R
Reporter: Ian Cook
Assignee: Neal Richardson
 Fix For: 4.0.0


The changes to {{r/tools/autobrew}} in ARROW-11735 must be applied upstream 
when we release 4.0.0.

See https://github.com/apache/arrow/pull/9610#discussion_r586819763





[jira] [Created] (ARROW-11860) [Rust] [DataFusion] Add DataFusion logos

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11860:
---

 Summary: [Rust] [DataFusion] Add DataFusion logos
 Key: ARROW-11860
 URL: https://issues.apache.org/jira/browse/ARROW-11860
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9630|https://github.com/apache/arrow/pull/9630]
I don't think this needs a JIRA?

These are the DataFusion logos that I had created before the project was 
donated to Apache Arrow. They weren't part of the source code repo so didn't 
get donated at the time.

https://user-images.githubusercontent.com/934084/109990656-d55ddf80-7cc6-11eb-8bbc-f21946fd1dfc.png

https://user-images.githubusercontent.com/934084/109990665-d68f0c80-7cc6-11eb-891c-bf367cb5f447.png






[jira] [Created] (ARROW-11859) [combine_chunks and concat_arrays

2021-03-04 Thread Dominic Sisneros (Jira)
Dominic Sisneros created ARROW-11859:


 Summary: [combine_chunks and concat_arrays
 Key: ARROW-11859
 URL: https://issues.apache.org/jira/browse/ARROW-11859
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, GLib
Affects Versions: 3.0.0
Reporter: Dominic Sisneros


Please add concat_arrays to the Arrow::Array classes and combine_chunks to the 
Arrow::ChunkedArray classes. This has been added separately to Python but 
would be good to have for all languages.





[jira] [Created] (ARROW-11858) [Glib] Gandiva Filter in Glib

2021-03-04 Thread Dominic Sisneros (Jira)
Dominic Sisneros created ARROW-11858:


 Summary: [Glib] Gandiva Filter in Glib
 Key: ARROW-11858
 URL: https://issues.apache.org/jira/browse/ARROW-11858
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Affects Versions: 3.0.0
Reporter: Dominic Sisneros


I was trying to use Gandiva from Ruby. I was able to get a Projection 
working because that is annotated. I was trying to use a Gandiva filter, but 
this doesn't seem to be available with Gandiva. It is not listed in the GLib 
documentation. Thanks





[GitHub] [arrow-testing] jmgpeeters opened a new pull request #59: ARROW-11838: files for testing IPC reads with shared dictionaries.

2021-03-04 Thread GitBox


jmgpeeters opened a new pull request #59:
URL: https://github.com/apache/arrow-testing/pull/59


   ARROW-11838 aims to add C++ read capability for IPC data with shared 
dictionaries. 
   
   Write support isn't available yet either, and out of scope here, so these 
files - allowing testing of the read functionality - were generated from Java.
   
   Ideally this PR is reviewed alongside the upcoming one in apache/arrow that 
contains the src and test changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (ARROW-11857) Resource temporarily unavailable when using the new Dataset API with Pandas

2021-03-04 Thread Anton Friberg (Jira)
Anton Friberg created ARROW-11857:
-

 Summary: Resource temporarily unavailable when using the new 
Dataset API with Pandas
 Key: ARROW-11857
 URL: https://issues.apache.org/jira/browse/ARROW-11857
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 3.0.0
 Environment: OS: Debian GNU/Linux 10 (buster) x86_64 
Kernel: 4.19.0-14-amd64 
CPU: Intel i7-6700K (8) @ 4.200GHz 
Memory: 32122MiB
Python: v3.7.3
Reporter: Anton Friberg


When using the new Dataset API under v3.0.0 it instantly crashes with
{code:java}
 terminate called after throwing an instance of 'std::system_error'
 what(): Resource temporarily unavailable{code}
This does not happen in an earlier version. The error message leads me to 
believe that the issue is not on the Python side but might be in the C++ 
libraries.

As background, I am using the new Dataset API by calling the following
{code:java}
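from pyarrow import fs
import pyarrow.parquet as pq

# bucket, base_path, filters and columns are defined elsewhere
s3_fs = fs.S3FileSystem()
dataset = pq.ParquetDataset(
    f"{bucket}/{base_path}",
    filesystem=s3_fs,
    partitioning="hive",
    use_legacy_dataset=False,
    filters=filters,
)
dataframe = dataset.read_pandas(columns=columns).to_pandas(){code}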
The dataset itself contains tens of thousands of files of around 100 MB each 
and is created using incremental bulk processing from pandas and pyarrow v1.0.1.

I suspect the issue is a limit on the total number of threads being spawned, 
but I have been unable to resolve it by calling
{code:java}
pyarrow.set_cpu_count(1) {code}
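
"Resource temporarily unavailable" is the usual text for the EAGAIN error, which thread creation can fail with when a per-user process/thread limit is reached. Assuming that is the cause here (it is not confirmed above), a small standard-library sketch can show the limits that apply to the current process. The function name `thread_limit_report` is hypothetical.

```python
import errno
import os
import resource
import threading


def thread_limit_report():
    """Collect the limits relevant to thread creation for this process."""
    # On Linux, RLIMIT_NPROC counts threads as well as processes per user.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "eagain_text": os.strerror(errno.EAGAIN),
        "nproc_soft": soft,
        "nproc_hard": hard,
        # Counts Python-level threads only, not native library threads.
        "python_threads": threading.active_count(),
    }


if __name__ == "__main__":
    for key, value in thread_limit_report().items():
        print(f"{key}: {value}")
```

Comparing `nproc_soft` with the number of threads the process actually holds (for example via `ps -o nlwp <pid>` on Linux) may help confirm or rule out this suspicion.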


