[jira] [Created] (ARROW-11868) [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
Vivek Shankar created ARROW-11868:
Summary: [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
Key: ARROW-11868
URL: https://issues.apache.org/jira/browse/ARROW-11868
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Gandiva
Reporter: Vivek Shankar

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11867) [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
Vivek Shankar created ARROW-11867:
Summary: [C++][Gandiva] Casting Float.POSITIVE_INFINITY/ Float.NEGATIVE_INFINITY/ Float.NaN to a decimal results in 0.0 instead of NULL
Key: ARROW-11867
URL: https://issues.apache.org/jira/browse/ARROW-11867
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Gandiva
Reporter: Vivek Shankar
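The behaviour requested in the two reports above can be sketched outside Gandiva. This is a minimal Python illustration, not Gandiva code. The function name, the `scale` parameter and the half-up rounding mode are assumptions made for the example; only the rule "non-finite input yields NULL" comes from the report.

```python
import math
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def cast_float_to_decimal(value: float, scale: int = 2) -> Optional[Decimal]:
    """Cast a float to a decimal with `scale` fractional digits.

    Hypothetical helper: returns None (standing in for SQL NULL) for
    +inf, -inf and NaN instead of silently producing 0.0.
    """
    if not math.isfinite(value):
        return None
    # Decimal(1).scaleb(-2) is Decimal("0.01"), the quantum for scale 2.
    quantum = Decimal(1).scaleb(-scale)
    # repr() gives the shortest string that round-trips the float.
    return Decimal(repr(value)).quantize(quantum, rounding=ROUND_HALF_UP)
```

A caller would then treat `None` as a null slot in the output column rather than as a numeric value.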
[jira] [Created] (ARROW-11866) [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
Lynch Wu created ARROW-11866:
Summary: [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
Key: ARROW-11866
URL: https://issues.apache.org/jira/browse/ARROW-11866
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC
Affects Versions: 3.0.0
Environment:
* Arrow 3.0
* gcc (Debian 10.2.0-16) 10.2.0
* grpc 1.33.1/C++
Reporter: Lynch Wu
Fix For: 4.0.0

1. When a Flight server is started with a shutdown signal (SIGTERM) and then terminated with `Ctrl + C`, a mutex deadlock can occur in gRPC.
* gcc (Debian 10.2.0-16) 10.2.0
* grpc 1.33.1/C++
2. [https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/test_server.cc] This test case can be used as minimal code to reproduce the issue with the above gRPC and gcc versions on Debian 10.2.0.
3. The related gRPC issue is: [https://github.com/grpc/grpc/issues/24884]
4. I am wondering whether the way the signal is handled here is proper with gRPC: [https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/server.cc#L881] Feel free to share your thoughts on this.
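Item 4 asks whether shutdown work belongs inside the signal handler at all. One commonly used alternative is the self-pipe pattern: the handler only writes a byte to a pipe, and a normal thread performs the shutdown. The sketch below shows that pattern in Python as a language-neutral illustration. It is not Arrow's or gRPC's implementation, and `ShutdownNotifier` and its methods are hypothetical names. It assumes a Unix-like system, since it uses `select` on a pipe.

```python
import os
import select
import signal


class ShutdownNotifier:
    """Self-pipe sketch: the handler records the signal, nothing more."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        # A non-blocking write end guarantees the handler never stalls.
        os.set_blocking(self._write_fd, False)

    def handle(self, signum, frame):
        # Keep the handler minimal: no locks, no calls into the server.
        try:
            os.write(self._write_fd, b"\x00")
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    def install(self, signums=(signal.SIGTERM, signal.SIGINT)):
        # In CPython this must be called from the main thread.
        for signum in signums:
            signal.signal(signum, self.handle)

    def wait(self, timeout=None):
        """Block until a signal was recorded; return True if one arrived."""
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        if readable:
            os.read(self._read_fd, 1)
            return True
        return False
```

A server loop would call `install()` once, block in `wait()`, and only then call its own shutdown routine from ordinary (non-handler) context, where taking mutexes is safe.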
[jira] [Created] (ARROW-11865) [Rust] Out of bound reads in chunk iterator
Jorge Leitão created ARROW-11865:
Summary: [Rust] Out of bound reads in chunk iterator
Key: ARROW-11865
URL: https://issues.apache.org/jira/browse/ARROW-11865
Project: Apache Arrow
Issue Type: Bug
Components: Rust
Reporter: Jorge Leitão

There is a `read_unaligned` of a pointer offset by `index + 1` whose corresponding slice is only valid up to `index - X`, where X depends on the iterator's offset.
[jira] [Created] (ARROW-11864) [R] Document arrow.int64_downcast option
Neal Richardson created ARROW-11864:
Summary: [R] Document arrow.int64_downcast option
Key: ARROW-11864
URL: https://issues.apache.org/jira/browse/ARROW-11864
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Assignee: Matthew Summersgill
Fix For: 4.0.0

See ARROW-9083 and discussion.
[jira] [Created] (ARROW-11863) [Rust][DataFusion] No way to get to the examples from docs.rs
Andrew Lamb created ARROW-11863:
Summary: [Rust][DataFusion] No way to get to the examples from docs.rs
Key: ARROW-11863
URL: https://issues.apache.org/jira/browse/ARROW-11863
Project: Apache Arrow
Issue Type: Bug
Reporter: Andrew Lamb
Attachments: Screen Shot 2021-03-04 at 2.51.54 PM.png

https://docs.rs/datafusion/3.0.0/datafusion/ has a tantalizing piece of text about the examples, but no link or explanation of how to find them.

!Screen Shot 2021-03-04 at 2.51.54 PM.png!

The examples are at https://github.com/apache/arrow/tree/master/rust/datafusion/examples

The ideal outcome would be to point people at the examples directory for the version of the docs they are looking at on docs.rs. An acceptable outcome would be to always point the docs on docs.rs at master.
[jira] [Created] (ARROW-11862) [Rust] String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior
Andrew Lamb created ARROW-11862:
Summary: [Rust] String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior
Key: ARROW-11862
URL: https://issues.apache.org/jira/browse/ARROW-11862
Project: Apache Arrow
Issue Type: Bug
Reporter: Andrew Lamb

As [~jorgecarleitao] says on https://github.com/apache/arrow/pull/9588#discussion_r584290701

The (Rust) Iterator spec recommends, but does not require, that the iterator report a correct length. Consumers that trigger undefined behavior from an incorrect size_hint are themselves the cause of that undefined behavior. The only case where consumers can trust the iterator's length is when the iterator implements the unsafe trait TrustedLen. Unfortunately, TrustedLen is still unstable. For that reason, we have been exposing unsafe Buffer::from_trusted_len_iter and the like for those cases.

So the code should be updated to handle the case where the reported `size_hint` turns out to be incorrect.
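The same contract can be illustrated in Python, where `operator.length_hint` plays a role comparable to Rust's `size_hint`: it is advisory and may be wrong in either direction. The sketch below is an analogy only, not the Arrow Rust code. `LyingIter` and `collect_defensively` are hypothetical names. Since nothing in Python is `unsafe`, it shows the defensive logic rather than the memory-safety consequence.

```python
from operator import length_hint


class LyingIter:
    """Yields `actual` items but reports `claimed` as its length hint."""

    def __init__(self, actual, claimed):
        self._it = iter(range(actual))
        self._claimed = claimed

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __length_hint__(self):
        return self._claimed


def collect_defensively(iterable):
    """Preallocate from the hint, but never trust it for correctness."""
    it = iter(iterable)
    out = [None] * length_hint(it, 0)
    written = 0
    for item in it:
        if written < len(out):
            out[written] = item  # within the preallocated region
        else:
            out.append(item)     # hint was too small: grow instead of overrunning
        written += 1
    del out[written:]            # hint was too large: drop unused slots
    return out
```

The hint only affects how much is allocated up front. The result is identical whether the iterator under-reports or over-reports its length.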
[jira] [Created] (ARROW-11861) [R][Packaging] Apply changes in r/tools/autobrew upstream
Ian Cook created ARROW-11861:
Summary: [R][Packaging] Apply changes in r/tools/autobrew upstream
Key: ARROW-11861
URL: https://issues.apache.org/jira/browse/ARROW-11861
Project: Apache Arrow
Issue Type: Improvement
Components: Packaging, R
Reporter: Ian Cook
Assignee: Neal Richardson
Fix For: 4.0.0

The changes to {{r/tools/autobrew}} in ARROW-11735 must be applied upstream when we release 4.0.0. See https://github.com/apache/arrow/pull/9610#discussion_r586819763
[jira] [Created] (ARROW-11860) [Rust] [DataFusion] Add DataFusion logos
Andrew Lamb created ARROW-11860:
Summary: [Rust] [DataFusion] Add DataFusion logos
Key: ARROW-11860
URL: https://issues.apache.org/jira/browse/ARROW-11860
Project: Apache Arrow
Issue Type: Bug
Components: Rust
Reporter: Andrew Lamb

Issue automatically created from Pull Request [9630|https://github.com/apache/arrow/pull/9630]

I don't think this needs a JIRA? These are the DataFusion logos that I had created before the project was donated to Apache Arrow. They weren't part of the source code repo so didn't get donated at the time.

https://user-images.githubusercontent.com/934084/109990656-d55ddf80-7cc6-11eb-8bbc-f21946fd1dfc.png
https://user-images.githubusercontent.com/934084/109990665-d68f0c80-7cc6-11eb-891c-bf367cb5f447.png
[jira] [Created] (ARROW-11859) [combine_chunks and concat_arrays
Dominic Sisneros created ARROW-11859:
Summary: [combine_chunks and concat_arrays
Key: ARROW-11859
URL: https://issues.apache.org/jira/browse/ARROW-11859
Project: Apache Arrow
Issue Type: Improvement
Components: C++, GLib
Affects Versions: 3.0.0
Reporter: Dominic Sisneros

Please add concat_arrays to the Arrow::Array classes and combine_chunks to the Arrow::ChunkedArray classes. These have been added separately to Python, but it would be good to have them for all languages.
[jira] [Created] (ARROW-11858) [Glib] Gandiva Filter in Glib
Dominic Sisneros created ARROW-11858:
Summary: [Glib] Gandiva Filter in Glib
Key: ARROW-11858
URL: https://issues.apache.org/jira/browse/ARROW-11858
Project: Apache Arrow
Issue Type: Improvement
Components: GLib
Affects Versions: 3.0.0
Reporter: Dominic Sisneros

I was trying to use Gandiva under Ruby. I was able to get a Projection working because that is annotated. I was then trying to do a Gandiva filter, but this doesn't seem to be available with Gandiva. It is not listed in the GLib documentation. Thanks
[GitHub] [arrow-testing] jmgpeeters opened a new pull request #59: ARROW-11838: files for testing IPC reads with shared dictionaries.
jmgpeeters opened a new pull request #59:
URL: https://github.com/apache/arrow-testing/pull/59

ARROW-11838 aims to add C++ read capability for IPC data with shared dictionaries. Write support isn't available yet either, and is out of scope here, so these files, which allow testing of the read functionality, were generated from Java.

Ideally this PR is reviewed alongside the upcoming one in apache/arrow that contains the src and test changes.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ARROW-11857) Resource temporarily unavailable when using the new Dataset API with Pandas
Anton Friberg created ARROW-11857:
Summary: Resource temporarily unavailable when using the new Dataset API with Pandas
Key: ARROW-11857
URL: https://issues.apache.org/jira/browse/ARROW-11857
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 3.0.0
Environment:
OS: Debian GNU/Linux 10 (buster) x86_64
Kernel: 4.19.0-14-amd64
CPU: Intel i7-6700K (8) @ 4.200GHz
Memory: 32122MiB
Python: v3.7.3
Reporter: Anton Friberg

When using the new Dataset API under v3.0.0, it instantly crashes with
{code:java}
terminate called after throwing an instance of 'std::system_error'
  what(): Resource temporarily unavailable
{code}
This does not happen in an earlier version. The error message leads me to believe that the issue is not on the Python side but might be in the C++ libraries.

As background, I am using the new Dataset API by calling the following
{code:java}
# Imports implied by the snippet; bucket, base_path, filters and columns are defined elsewhere.
import pyarrow.parquet as pq
from pyarrow import fs

s3_fs = fs.S3FileSystem()
dataset = pq.ParquetDataset(
    f"{bucket}/{base_path}",
    filesystem=s3_fs,
    partitioning="hive",
    use_legacy_dataset=False,
    filters=filters
)
dataframe = dataset.read_pandas(columns=columns).to_pandas()
{code}
The dataset itself contains 10,000s of files around 100 MB in size and is created using incremental bulk processing from pandas and pyarrow v1.0.1.

I suspect a limit on the total number of threads being spawned, but I have been unable to resolve it by calling
{code:java}
pyarrow.set_cpu_count(1)
{code}
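"Resource temporarily unavailable" is the message for `EAGAIN`, which is what a failed thread creation reports, so the thread-limit suspicion can be checked from inside the process. The following is a minimal diagnostic sketch using only the Python standard library. `thread_budget` is a hypothetical helper, the `/proc` lookup is Linux-specific, and whether a thread limit is actually the cause here remains the reporter's hypothesis.

```python
import resource
import threading


def thread_budget():
    """Collect limits and counts relevant to EAGAIN on thread creation."""
    # On Linux, RLIMIT_NPROC caps processes and threads per user.
    # A value of -1 (resource.RLIM_INFINITY) means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    native_threads = None
    try:
        # Counts all OS threads in this process, including C++ ones.
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    native_threads = int(line.split()[1])
                    break
    except OSError:
        pass  # no /proc on this platform

    return {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "python_threads": threading.active_count(),
        "native_threads": native_threads,
    }
```

Calling `thread_budget()` just before the failing `read_pandas` call, and comparing `native_threads` with the soft limit, would show how close the process is to the cap.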