[jira] [Created] (ARROW-18433) Optimize aggregate functions to work with batches.
A. Coady created ARROW-18433:

Summary: Optimize aggregate functions to work with batches.
Key: ARROW-18433
URL: https://issues.apache.org/jira/browse/ARROW-18433
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Affects Versions: 10.0.1
Reporter: A. Coady

Most compute functions work with the dataset api and don't load columns. But aggregate functions which are associative could also work: `min`, `max`, `any`, `all`, `sum`, `product`. Even `unique` and `value_counts`.

A couple of implementation ideas:
* expand the dataset api to support expressions which return scalars
* add a `BatchedArray` type which is like a `ChunkedArray` but with lazy loading

--
This message was sent by Atlassian Jira (v8.20.10#820010)
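The associativity argument in ARROW-18433 can be illustrated without Arrow at all. The sketch below is plain Python, not pyarrow code: `batched_aggregate` and `batches` are hypothetical names, and the lists stand in for lazily loaded record batches. Each batch is reduced on its own and only the small partial results are combined, so no full column is ever held in memory.

```python
from collections import Counter
from functools import reduce
from operator import add


def batched_aggregate(batches, per_batch, combine):
    """Reduce each batch independently, then merge the partial results.

    This is only valid when `combine` is associative, which holds for
    min, max, any, all, sum and product, and for merging sets or counts.
    """
    partials = (per_batch(batch) for batch in batches if batch)  # skip empty batches
    return reduce(combine, partials)


def batches():
    # Stand-in for batches streamed from a dataset, one at a time.
    yield [3, 1, 4]
    yield [1, 5]
    yield [9, 2, 6]


total = batched_aggregate(batches(), sum, add)
smallest = batched_aggregate(batches(), min, min)
distinct = batched_aggregate(batches(), set, set.union)  # analogue of `unique`
counts = batched_aggregate(batches(), Counter, add)      # analogue of `value_counts`
```

The same shape would apply to either implementation idea in the report: only the source of the batches and the per-batch kernel change. If every batch is empty, `reduce` raises a `TypeError` here; a real implementation would need an identity value or a null result for that case.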
[jira] [Created] (ARROW-18432) [Python] Array constructor doesn't support arrow scalars.
A. Coady created ARROW-18432:

Summary: [Python] Array constructor doesn't support arrow scalars.
Key: ARROW-18432
URL: https://issues.apache.org/jira/browse/ARROW-18432
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 10.0.1
Reporter: A. Coady

{code:python}
pa.array([pa.scalar(0)])
ArrowInvalid: Could not convert with type pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an Arrow data type

pa.array([pa.scalar(0)], 'int64')
ArrowInvalid: Could not convert with type pyarrow.lib.Int64Scalar: tried to convert to int64
{code}

It seems odd that the array constructors don't recognize their own scalars. In practice, a list of scalars has to be converted with `.as_py()` just to be converted back, and that also loses the type information.
[GitHub] [arrow] abduazizR opened a new issue, #14907: [R] right_join() function does not produce the expected outcome
abduazizR opened a new issue, #14907:
URL: https://github.com/apache/arrow/issues/14907

### Describe the bug, including details regarding any error messages, version, and platform.

Hi,

I noticed something strange today when I was using arrow datasets. I cannot give a reproducible example but you can get the idea from the code below. I have `ccaei` as an arrow dataset. When I try to use `right_join()` with an R tibble before using `collect()`, it gives me wrong numbers (the number of distinct `ENROLID` is less than that present in `outpatients`). I get the correct number when I use `right_join()` after using `collect()`, although this is computationally inefficient. Could you help me with this?

This gives a really weird number

```
ccaei |>
  filter(ADMDATE >= as_date("2016-10-01")) |>
  filter(!is.na(ENROLID)) |>
  select(ENROLID, ADMDATE) |>
  right_join(outpatients) |>
  collect() |>
  count(ENROLID)
```

This makes sense

```
ccaei |>
  filter(ADMDATE >= as_date("2016-10-01")) |>
  filter(!is.na(ENROLID)) |>
  select(ENROLID, ADMDATE) |>
  collect() |>
  right_join(outpatients) |>
  count(ENROLID)
```

Not sure where the mistake came from.

### Component(s)

R

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
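The expectation behind #14907 is that a right join keeps every key of the right-hand table, so the distinct `ENROLID` count cannot drop below that of `outpatients`. This can be sketched in plain Python. It is not the arrow or dplyr implementation: the `right_join` helper and the sample rows are hypothetical, and the join key is assumed to be `ENROLID` because the report does not name it.

```python
def right_join(left, right, key):
    """Right join of two lists of dicts: every row of `right` is kept."""
    by_key = {}
    for row in left:
        by_key.setdefault(row[key], []).append(row)

    joined = []
    for r in right:
        matches = by_key.get(r[key])
        if matches:
            # One output row per matching left row.
            joined.extend({**l, **r} for l in matches)
        else:
            # Unmatched right row is still emitted; left columns are absent.
            joined.append(dict(r))
    return joined


# Made-up rows standing in for the filtered `ccaei` data and `outpatients`.
admissions = [
    {"ENROLID": 1, "ADMDATE": "2016-10-05"},
    {"ENROLID": 1, "ADMDATE": "2016-11-02"},
    {"ENROLID": 2, "ADMDATE": "2016-12-24"},
]
outpatients = [{"ENROLID": 1}, {"ENROLID": 2}, {"ENROLID": 3}]

result = right_join(admissions, outpatients, "ENROLID")
joined_ids = {row["ENROLID"] for row in result}
right_ids = {row["ENROLID"] for row in outpatients}
```

Checking that `right_ids` is a subset of `joined_ids` on a small extract of the real data could help turn the report into a reproducible example.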
[GitHub] [arrow] lidavidm closed issue #14890: [Java] DictionaryEncoder may leak memory when exception thrown
lidavidm closed issue #14890: [Java] DictionaryEncoder may leak memory when exception thrown
URL: https://github.com/apache/arrow/issues/14890
[GitHub] [arrow] lidavidm closed issue #14901: [Java] ListSubfieldEncoder and StructSubfieldEncoder can decode without DictionaryHashTable
lidavidm closed issue #14901: [Java] ListSubfieldEncoder and StructSubfieldEncoder can decode without DictionaryHashTable
URL: https://github.com/apache/arrow/issues/14901
[GitHub] [arrow-julia] ericphanson closed issue #348: Install Registrator.jl github app
ericphanson closed issue #348: Install Registrator.jl github app
URL: https://github.com/apache/arrow-julia/issues/348