[jira] [Commented] (ARROW-14363) [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods without explicit element type
[ https://issues.apache.org/jira/browse/ARROW-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608161#comment-17608161 ] Krisztian Szucs commented on ARROW-14363: - Updated it, thanks [~jinshang]! > [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods > without explicit element type > > > Key: ARROW-14363 > URL: https://issues.apache.org/jira/browse/ARROW-14363 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Gandiva >Reporter: Krisztian Szucs >Assignee: Jin Shang >Priority: Major > Fix For: 10.0.0 > > > Added a workaround for the 6.0.0 release in > https://github.com/apache/arrow/pull/11448 > The LLVM commit > https://reviews.llvm.org/rGf164bc52b61a34f8f95032e1e4fe68bd4eff995f doesn't > provide much context about the reason for the deprecation. > The Gandiva code should be updated to use the CreateGEP and CreateLoad > methods with element types passed explicitly. > cc [~pravindra] -- This message was sent by Atlassian Jira (v8.20.10#820010)
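For context, the migration the issue asks for looks roughly like the following (a sketch, not compilable on its own; `builder`, `ptr`, `offset`, and `elem_ty` are placeholder names, not identifiers from the Gandiva code):

```cpp
// Deprecated since LLVM 13: the element type is inferred from the
// pointee type of `ptr`, which opaque pointers no longer carry.
llvm::Value* slot_old = builder->CreateGEP(ptr, offset);
llvm::Value* load_old = builder->CreateLoad(slot_old);

// Preferred form: pass the element type explicitly, in line with
// LLVM's move to opaque pointers.
llvm::Value* slot = builder->CreateGEP(elem_ty, ptr, offset);
llvm::Value* load = builder->CreateLoad(elem_ty, slot);
```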
[jira] [Resolved] (ARROW-14363) [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods without explicit element type
[ https://issues.apache.org/jira/browse/ARROW-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-14363. - Fix Version/s: 10.0.0 Resolution: Fixed > [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods > without explicit element type > > > Key: ARROW-14363 > URL: https://issues.apache.org/jira/browse/ARROW-14363 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Gandiva >Reporter: Krisztian Szucs >Assignee: Jin Shang >Priority: Major > Fix For: 10.0.0 > > > Added a workaround for the 6.0.0 release in > https://github.com/apache/arrow/pull/11448 > The LLVM commit > https://reviews.llvm.org/rGf164bc52b61a34f8f95032e1e4fe68bd4eff995f doesn't > provide much context about the reason for the deprecation. > The Gandiva code should be updated to use the CreateGEP and CreateLoad > methods with element types passed explicitly. > cc [~pravindra]
[jira] [Created] (ARROW-17294) [Release] Update remove old artifacts release script
Krisztian Szucs created ARROW-17294: --- Summary: [Release] Update remove old artifacts release script Key: ARROW-17294 URL: https://issues.apache.org/jira/browse/ARROW-17294 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs Fix For: 10.0.0 I just executed the remove-old-artifacts release script, which also removed the three previously created patch releases: 6.0.2, 7.0.1 and 8.0.1. That's not desirable, since those have only just been released, so I had to revert to an earlier revision. cc [~kou] [~assignUser] [~raulcd]
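The failure suggests the pruning rule should spare the newest patch release of every major line, not only the current release. A hypothetical sketch of such a rule (none of these names exist in the actual release script; the real one operates on dist.apache.org artifacts):

```python
from collections import defaultdict

def removable(versions, current):
    """Return versions that are safe to delete: everything except
    `current` and the newest patch release of each major line.
    Hypothetical policy sketch, not the script's real logic."""
    newest = defaultdict(tuple)
    for v in versions:
        t = tuple(int(p) for p in v.split("."))
        # Track the highest version seen per major line.
        newest[t[0]] = max(newest[t[0]], t)
    keep = {".".join(map(str, t)) for t in newest.values()} | {current}
    return sorted(v for v in versions if v not in keep)

# 6.0.2, 7.0.1 and 8.0.1 are the newest patches of their lines, so they stay.
print(removable(["6.0.1", "6.0.2", "7.0.1", "8.0.1", "9.0.0"], "9.0.0"))
# prints ['6.0.1']
```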
[jira] [Updated] (ARROW-17253) [Python] pyarrow.array() crashes the interpreter when given a generator that raises while iterating
[ https://issues.apache.org/jira/browse/ARROW-17253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17253: Fix Version/s: 9.0.1 > [Python] pyarrow.array() crashes the interpreter when given a generator that > raises while iterating > --- > > Key: ARROW-17253 > URL: https://issues.apache.org/jira/browse/ARROW-17253 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 8.0.0 >Reporter: Li Jin >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 9.0.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > {code:java} > pa.array((1 // 0 for x in range(10)), size=10){code} > This crashes the Python interpreter
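The expected behavior after the fix is ordinary Python exception propagation rather than an interpreter crash. A pure-Python sketch of the failure mode (no pyarrow required; the generator raises on its first `next()` call, mimicking the expression in the report):

```python
def values():
    # Raises ZeroDivisionError as soon as the consumer pulls a value,
    # like the generator passed to pa.array() in the report.
    for x in range(10):
        yield 1 // 0

def consume(gen):
    # A well-behaved consumer (as pyarrow.array() should be) lets the
    # iterator's exception propagate instead of crashing the process.
    try:
        return list(gen)
    except ZeroDivisionError:
        return "error propagated"

print(consume(values()))
# prints "error propagated"
```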
[jira] [Commented] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded
[ https://issues.apache.org/jira/browse/ARROW-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573183#comment-17573183 ] Krisztian Szucs commented on ARROW-17260: - This is the second submission after I uploaded and closed the Java release on the Apache Sonatype repo: https://github.com/apache/arrow/pull/13749#issuecomment-129881 > [Release] Java jars verification pass despite that nothing has been uploaded > > > Key: ARROW-17260 > URL: https://issues.apache.org/jira/browse/ARROW-17260 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Priority: Major > > Builds do pass, despite the fact that I forgot to upload the Java binaries: > https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true > > cc [~assignUser] [~raulcd]
[jira] [Created] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded
Krisztian Szucs created ARROW-17260: --- Summary: [Release] Java jars verification pass despite that nothing has been uploaded Key: ARROW-17260 URL: https://issues.apache.org/jira/browse/ARROW-17260 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs Builds do pass, despite the fact that I forgot to upload the Java binaries: https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true cc [~assignUser] [~raulcd]
[jira] [Updated] (ARROW-17067) Implement Substring_Index
[ https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17067: Fix Version/s: (was: 9.0.0) > Implement Substring_Index > - > > Key: ARROW-17067 > URL: https://issues.apache.org/jira/browse/ARROW-17067 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Adding Substring_index Function.
[jira] [Updated] (ARROW-17067) Implement Substring_Index
[ https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17067: Fix Version/s: 9.0.0 > Implement Substring_Index > - > > Key: ARROW-17067 > URL: https://issues.apache.org/jira/browse/ARROW-17067 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Adding Substring_index Function.
[jira] [Resolved] (ARROW-17246) [Packaging][deb][RPM] Don't use system jemalloc
[ https://issues.apache.org/jira/browse/ARROW-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17246. - Resolution: Fixed Issue resolved by pull request 13739 [https://github.com/apache/arrow/pull/13739] > [Packaging][deb][RPM] Don't use system jemalloc > --- > > Key: ARROW-17246 > URL: https://issues.apache.org/jira/browse/ARROW-17246 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Because system jemalloc can't be used with {{dlopen()}}. If system jemalloc > can't be used with {{dlopen()}}, our shared libraries can't be loaded as > bindings of script languages such as Ruby: > {noformat} > + ruby -r gi -e 'p GI.load('\''Arrow'\'')' > (null)-WARNING **: Failed to load shared library 'libarrow-glib.so.900' > referenced by the typelib: /lib64/libjemalloc.so.2: cannot allocate memory in > static TLS block > {noformat} > This happens because system jemalloc isn't built with > {{--disable-initial-exec-tls}}. See also: > * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951704 > * https://github.com/jemalloc/jemalloc/issues/1237
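The fix switches back to the bundled jemalloc, which is built with the TLS workaround. For reference, a sketch of the relevant build-time switch for a standalone jemalloc build (paths illustrative; see the linked jemalloc issue for background):

```shell
# Build jemalloc so its TLS model stays compatible with dlopen():
# without --disable-initial-exec-tls, dlopen()'ed consumers can fail
# with "cannot allocate memory in static TLS block".
./configure --disable-initial-exec-tls --prefix="$HOME/jemalloc"
make -j"$(nproc)" && make install
```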
[jira] [Resolved] (ARROW-17238) [Release] Turn off GCS testing during wheel verification
[ https://issues.apache.org/jira/browse/ARROW-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17238. - Resolution: Fixed Issue resolved by pull request 13736 [https://github.com/apache/arrow/pull/13736] > [Release] Turn off GCS testing during wheel verification > > > Key: ARROW-17238 > URL: https://issues.apache.org/jira/browse/ARROW-17238 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Resolved] (ARROW-17234) [Release][R] Add r-binary-packages to packaging group
[ https://issues.apache.org/jira/browse/ARROW-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17234. - Resolution: Fixed Issue resolved by pull request 13734 [https://github.com/apache/arrow/pull/13734] > [Release][R] Add r-binary-packages to packaging group > - > > Key: ARROW-17234 > URL: https://issues.apache.org/jira/browse/ARROW-17234 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools, R >Reporter: Jacob Wujciak-Jens >Assignee: Jacob Wujciak-Jens >Priority: Critical > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > r-binary-packages is only in nightly-packaging and missing from the > release-relevant packaging group.
[jira] [Created] (ARROW-17238) [Release] Turn off GCS testing during wheel verification
Krisztian Szucs created ARROW-17238: --- Summary: [Release] Turn off GCS testing during wheel verification Key: ARROW-17238 URL: https://issues.apache.org/jira/browse/ARROW-17238 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs Fix For: 9.0.0
[jira] [Assigned] (ARROW-17238) [Release] Turn off GCS testing during wheel verification
[ https://issues.apache.org/jira/browse/ARROW-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-17238: --- Assignee: Krisztian Szucs > [Release] Turn off GCS testing during wheel verification > > > Key: ARROW-17238 > URL: https://issues.apache.org/jira/browse/ARROW-17238 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Fix For: 9.0.0 > >
[jira] [Resolved] (ARROW-17237) [Dev][Release] Install wheel test requirements if testing wheels on release verification
[ https://issues.apache.org/jira/browse/ARROW-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17237. - Fix Version/s: 10.0.0 (was: 9.0.0) Resolution: Fixed Issue resolved by pull request 13735 [https://github.com/apache/arrow/pull/13735] > [Dev][Release] Install wheel test requirements if testing wheels on release > verification > > > Key: ARROW-17237 > URL: https://issues.apache.org/jira/browse/ARROW-17237 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > If we are running the verify release wheel tasks, we should install the wheel > test requirements, or we get the following import errors > ([https://github.com/ursacomputing/crossbow/runs/7558071074?check_suite_focus=true]) > : > {code:java} > + python -m pytest -r s --pyargs pyarrow > /tmp/arrow-9.0.0.frvqL/venv-wheel-3.8-manylinux_2_17_x86_64.manylinux2014_x86_64/bin/python: > No module named pytest > Failed to verify release candidate. See /tmp/arrow-9.0.0.frvqL for details. > 1 {code} > This has been added to the release: > [https://github.com/apache/arrow/pull/13729/commits/2a91ba91016634478c84f9081702f8e7cada7529] > but we should backport it to master
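The backported fix amounts to installing the test requirements into the verification virtualenv before invoking pytest; roughly like this (illustrative commands and requirements path, see the linked commit for the actual change):

```shell
# Inside the wheel-verification virtualenv: pytest must be installed
# before `--pyargs pyarrow` can discover and run the bundled tests.
pip install -r python/requirements-wheel-test.txt
python -m pytest -r s --pyargs pyarrow
```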
[jira] [Updated] (ARROW-17237) [Dev][Release] Install wheel test requirements if testing wheels on release verification
[ https://issues.apache.org/jira/browse/ARROW-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17237: Fix Version/s: 9.0.0 (was: 10.0.0) > [Dev][Release] Install wheel test requirements if testing wheels on release > verification > > > Key: ARROW-17237 > URL: https://issues.apache.org/jira/browse/ARROW-17237 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > If we are running the verify release wheel tasks, we should install the wheel > test requirements, or we get the following import errors > ([https://github.com/ursacomputing/crossbow/runs/7558071074?check_suite_focus=true]) > : > {code:java} > + python -m pytest -r s --pyargs pyarrow > /tmp/arrow-9.0.0.frvqL/venv-wheel-3.8-manylinux_2_17_x86_64.manylinux2014_x86_64/bin/python: > No module named pytest > Failed to verify release candidate. See /tmp/arrow-9.0.0.frvqL for details. > 1 {code} > This has been added to the release: > [https://github.com/apache/arrow/pull/13729/commits/2a91ba91016634478c84f9081702f8e7cada7529] > but we should backport it to master
[jira] [Created] (ARROW-17233) [Crossbow] Outdated artifact patterns for certain linux jobs
Krisztian Szucs created ARROW-17233: --- Summary: [Crossbow] Outdated artifact patterns for certain linux jobs Key: ARROW-17233 URL: https://issues.apache.org/jira/browse/ARROW-17233 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs almalinux-8-arm64 and almalinux-9-arm64: {code} arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-glib-devel-9.0.0-1.el8.aarch64.rpm [ OK] arrow-glib-doc-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-glib-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-glib-libs-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-libs-9.0.0-1.el8.aarch64.rpm [ OK] arrow-python-devel-9.0.0-1.el8.aarch64.rpm [ OK] arrow-python-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-python-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-python-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] {code} centos-7-amd64 {code} arrow-python-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] 
arrow[0-9]+-python-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] {code} centos-8-arm64 and centos-9-arm64: {code} arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-flight-sql-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-flight-sql-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow-glib-devel-9.0.0-1.el8.aarch64.rpm [ OK] arrow-glib-doc-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-glib-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-glib-libs-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK] arrow9-libs-9.0.0-1.el8.aarch64.rpm [ OK] arrow-python-devel-9.0.0-1.el8.aarch64.rpm [ OK] arrow-python-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-python-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] arrow[0-9]+-python-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING] {code} ubuntu-bionic-amd64 / ubuntu-bionic-arm64: {code} libarrow-python-dev_9.0.0-1_[a-z0-9]+.deb [MISSING] libarrow-python-flight-dev_9.0.0-1_[a-z0-9]+.deb [MISSING] libarrow-python-flight900-dbgsym_9.0.0-1_[a-z0-9]+.d?deb [MISSING] libarrow-python-flight900_9.0.0-1_[a-z0-9]+.deb [MISSING] 
libarrow-python900-dbgsym_9.0.0-1_
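The bracketed entries above are regular expressions matched against artifact file names; the `[MISSING]` rows are patterns that no uploaded file satisfied. A quick way to sanity-check a pattern against a candidate name (hypothetical helper using Python's `re`; one pattern copied verbatim from the listing):

```python
import re

def matches(pattern, filename):
    # Crossbow-style artifact patterns are regexes matched against
    # the full file name, so use fullmatch rather than search.
    return re.fullmatch(pattern, filename) is not None

# An el8/aarch64 build product satisfies the generic rpm pattern:
print(matches(r"arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm",
              "arrow-flight-devel-9.0.0-1.el8.aarch64.rpm"))
# prints True
```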
[jira] [Created] (ARROW-17232) [Release] Missing R binary packages
Krisztian Szucs created ARROW-17232: --- Summary: [Release] Missing R binary packages Key: ARROW-17232 URL: https://issues.apache.org/jira/browse/ARROW-17232 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs It seems the binary upload script now expects some R binaries to upload, but the {{packaging}} crossbow task group doesn't contain any relevant tasks. I assume {{r-binary-packages}} should be added to the {{packaging}} group. cc [~kou] [~raulcd] [~assignUser]
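The proposed fix would look something like the following hypothetical excerpt of the crossbow task configuration (group names taken from the issue; the surrounding structure is illustrative, not a copy of dev/tasks/tasks.yml):

```yaml
# Illustrative sketch: the task must appear in the release-relevant
# "packaging" group, not only in "nightly-packaging".
groups:
  packaging:
    - r-binary-packages
  nightly-packaging:
    - r-binary-packages
```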
[jira] [Resolved] (ARROW-17188) [R] Update news for 9.0.0
[ https://issues.apache.org/jira/browse/ARROW-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17188. - Resolution: Fixed Issue resolved by pull request 13726 [https://github.com/apache/arrow/pull/13726] > [R] Update news for 9.0.0 > - > > Key: ARROW-17188 > URL: https://issues.apache.org/jira/browse/ARROW-17188 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Affects Versions: 9.0.0 >Reporter: Will Jones >Assignee: Will Jones >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h > Remaining Estimate: 0h >
[jira] [Resolved] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches
[ https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17227. - Resolution: Fixed Issue resolved by pull request 13725 [https://github.com/apache/arrow/pull/13725] > [C++] Extend hash-join unit tests to cover both empty and length=0 batches > -- > > Key: ARROW-17227 > URL: https://issues.apache.org/jira/browse/ARROW-17227 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Krisztian Szucs >Assignee: Weston Pace >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Resolved] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition
[ https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-15938. - Resolution: Fixed Issue resolved by pull request 13686 [https://github.com/apache/arrow/pull/13686] > [R][C++] Segfault in left join with empty right table when filtered on > partition > > > Key: ARROW-15938 > URL: https://issues.apache.org/jira/browse/ARROW-15938 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 7.0.2 > Environment: ubuntu linux, R4.1.2 >Reporter: Vitalie Spinu >Assignee: Weston Pace >Priority: Critical > Labels: pull-request-available, query-engine > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When the right table in a join is empty as a result of a filtering on a > partition group the join segfaults: > {code:java} > library(arrow) > library(glue) > df <- mutate(iris, id = runif(n())) > dir <- "./tmp/iris" > dir.create(glue("{dir}/group=a/"), recursive = T, showWarnings = F) > dir.create(glue("{dir}/group=b/"), recursive = T, showWarnings = F) > write_parquet(df, glue("{dir}/group=a/part1.parquet")) > write_parquet(df, glue("{dir}/group=b/part2.parquet")) > db1 <- open_dataset(dir) %>% > filter(group == "blabla") > open_dataset(dir) %>% > filter(group == "b") %>% > select(id) %>% > left_join(db1, by = "id") %>% > collect() > {code} > {code:java} > ==24063== Thread 7: > ==24063== Invalid read of size 1 > ==24063== at 0x1FFE606D: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE68CC: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, > int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE84D5: > arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, > 
arrow::compute::ExecBatch const&) (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE8CB4: > arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x200011CF: > arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB580E: > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> > >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FE2B2A0: > std::thread::_State_impl > > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x92844BF: ??? (in > /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29) > ==24063== by 0x6DD46DA: start_thread (pthread_create.c:463) > ==24063== by 0x710D71E: clone (clone.S:95) > ==24063== Address 0x10 is not stack'd, malloc'd or (recently) free'd > ==24063== *** caught segfault *** > address 0x10, cause 'memory not mapped'Traceback: > 1: Table__from_RecordBatchReader(self) > 2: tab$read_table() > 3: do_exec_plan(x) > 4: doTryCatch(return(expr), name, parentenv, handler) > 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) > 6: tryCatchList(expr, classes, parentenv, handlers) > 7: tryCatch(tab <- do_exec_plan(x), error = function(e) { > handle_csv_read_error(e, x$.data$schema)}) > 8: collect.arrow_dplyr_query(.) > 9: collect(.) 
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>% > left_join(db1, by = "id") %>% collect()Possible actions: > 1: abort (with core dump, if enabled) > 2: normal R exit > 3: exit R without saving workspace > 4: exit R saving workspace {code} > This is arrow from the current master ece0e23f1. > It's worth noting that if the right table is filtered on a non-partitioned > variable the problem does not occur.
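The class of bug here, probing a hash table built from an empty right side, is easy to state with a toy join: a correct left join must still emit every unmatched left row, including when the build side is empty. A minimal pure-Python sketch (illustrative only, not Arrow's HashJoinBasicImpl):

```python
def left_join(left, right, key):
    """Toy left join of lists of dicts on `key`. The build side
    (`right`) may be empty; unmatched left rows still come through."""
    # Build phase: hash the (possibly empty) right side.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Probe phase: emit matches, or the bare left row if none.
    out = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            out.extend({**m, **row} for m in matches)
        else:
            out.append(dict(row))
    return out

# Empty right table: every left row survives, no crash.
print(left_join([{"id": 1}, {"id": 2}], [], "id"))
# prints [{'id': 1}, {'id': 2}]
```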
[jira] [Assigned] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches
[ https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-17227: --- Assignee: Krisztian Szucs (was: Weston Pace) > [C++] Extend hash-join unit tests to cover both empty and length=0 batches > -- > > Key: ARROW-17227 > URL: https://issues.apache.org/jira/browse/ARROW-17227 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Fix For: 9.0.0 > >
[jira] [Assigned] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches
[ https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-17227: --- Assignee: Weston Pace (was: Krisztian Szucs) > [C++] Extend hash-join unit tests to cover both empty and length=0 batches > -- > > Key: ARROW-17227 > URL: https://issues.apache.org/jira/browse/ARROW-17227 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Krisztian Szucs >Assignee: Weston Pace >Priority: Major > Fix For: 9.0.0 > >
[jira] [Updated] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition
[ https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-15938: Fix Version/s: 9.0.0 (was: 10.0.0) > [R][C++] Segfault in left join with empty right table when filtered on > partition > > > Key: ARROW-15938 > URL: https://issues.apache.org/jira/browse/ARROW-15938 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 7.0.2 > Environment: ubuntu linux, R4.1.2 >Reporter: Vitalie Spinu >Assignee: Weston Pace >Priority: Critical > Labels: pull-request-available, query-engine > Fix For: 9.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When the right table in a join is empty as a result of a filtering on a > partition group the join segfaults: > {code:java} > library(arrow) > library(glue) > df <- mutate(iris, id = runif(n())) > dir <- "./tmp/iris" > dir.create(glue("{dir}/group=a/"), recursive = T, showWarnings = F) > dir.create(glue("{dir}/group=b/"), recursive = T, showWarnings = F) > write_parquet(df, glue("{dir}/group=a/part1.parquet")) > write_parquet(df, glue("{dir}/group=b/part2.parquet")) > db1 <- open_dataset(dir) %>% > filter(group == "blabla") > open_dataset(dir) %>% > filter(group == "b") %>% > select(id) %>% > left_join(db1, by = "id") %>% > collect() > {code} > {code:java} > ==24063== Thread 7: > ==24063== Invalid read of size 1 > ==24063== at 0x1FFE606D: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE68CC: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, > int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE84D5: > arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, > arrow::compute::ExecBatch const&) (in > 
/home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE8CB4: > arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x200011CF: > arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB580E: > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> > >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FE2B2A0: > std::thread::_State_impl > > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x92844BF: ??? (in > /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29) > ==24063== by 0x6DD46DA: start_thread (pthread_create.c:463) > ==24063== by 0x710D71E: clone (clone.S:95) > ==24063== Address 0x10 is not stack'd, malloc'd or (recently) free'd > ==24063== *** caught segfault *** > address 0x10, cause 'memory not mapped'Traceback: > 1: Table__from_RecordBatchReader(self) > 2: tab$read_table() > 3: do_exec_plan(x) > 4: doTryCatch(return(expr), name, parentenv, handler) > 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) > 6: tryCatchList(expr, classes, parentenv, handlers) > 7: tryCatch(tab <- do_exec_plan(x), error = function(e) { > handle_csv_read_error(e, x$.data$schema)}) > 8: collect.arrow_dplyr_query(.) > 9: collect(.) 
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>% > left_join(db1, by = "id") %>% collect()Possible actions: > 1: abort (with core dump, if enabled) > 2: normal R exit > 3: exit R without saving workspace > 4: exit R saving workspace {code} > This is arrow from the current master ece0e23f1. > It's worth noting that if the right table is filtered on a non-partitioned > variable the problem does not occur.
[jira] [Updated] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches
[ https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17227: Issue Type: Test (was: Improvement) > [C++] Extend hash-join unit tests to cover both empty and length=0 batches > -- > > Key: ARROW-17227 > URL: https://issues.apache.org/jira/browse/ARROW-17227 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Krisztian Szucs >Assignee: Weston Pace >Priority: Major > Fix For: 9.0.0 > >
[jira] [Created] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches
Krisztian Szucs created ARROW-17227: --- Summary: [C++] Extend hash-join unit tests to cover both empty and length=0 batches Key: ARROW-17227 URL: https://issues.apache.org/jira/browse/ARROW-17227 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Krisztian Szucs Assignee: Weston Pace Fix For: 9.0.0
[jira] [Updated] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition
[ https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-15938: Fix Version/s: 10.0.0 (was: 9.0.0) > [R][C++] Segfault in left join with empty right table when filtered on > partition > > > Key: ARROW-15938 > URL: https://issues.apache.org/jira/browse/ARROW-15938 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 7.0.2 > Environment: ubuntu linux, R4.1.2 >Reporter: Vitalie Spinu >Assignee: Weston Pace >Priority: Critical > Labels: pull-request-available, query-engine > Fix For: 10.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When the right table in a join is empty as a result of a filtering on a > partition group the join segfaults: > {code:java} > library(arrow) > library(glue) > df <- mutate(iris, id = runif(n())) > dir <- "./tmp/iris" > dir.create(glue("{dir}/group=a/"), recursive = T, showWarnings = F) > dir.create(glue("{dir}/group=b/"), recursive = T, showWarnings = F) > write_parquet(df, glue("{dir}/group=a/part1.parquet")) > write_parquet(df, glue("{dir}/group=b/part2.parquet")) > db1 <- open_dataset(dir) %>% > filter(group == "blabla") > open_dataset(dir) %>% > filter(group == "b") %>% > select(id) %>% > left_join(db1, by = "id") %>% > collect() > {code} > {code:java} > ==24063== Thread 7: > ==24063== Invalid read of size 1 > ==24063== at 0x1FFE606D: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE68CC: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, > int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE84D5: > arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, > arrow::compute::ExecBatch const&) (in > 
/home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE8CB4: > arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x200011CF: > arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB580E: > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> > >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FE2B2A0: > std::thread::_State_impl > > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x92844BF: ??? (in > /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29) > ==24063== by 0x6DD46DA: start_thread (pthread_create.c:463) > ==24063== by 0x710D71E: clone (clone.S:95) > ==24063== Address 0x10 is not stack'd, malloc'd or (recently) free'd > ==24063== *** caught segfault *** > address 0x10, cause 'memory not mapped'Traceback: > 1: Table__from_RecordBatchReader(self) > 2: tab$read_table() > 3: do_exec_plan(x) > 4: doTryCatch(return(expr), name, parentenv, handler) > 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) > 6: tryCatchList(expr, classes, parentenv, handlers) > 7: tryCatch(tab <- do_exec_plan(x), error = function(e) { > handle_csv_read_error(e, x$.data$schema)}) > 8: collect.arrow_dplyr_query(.) > 9: collect(.) 
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>% > left_join(db1, by = "id") %>% collect()Possible actions: > 1: abort (with core dump, if enabled) > 2: normal R exit > 3: exit R without saving workspace > 4: exit R saving workspace {code} > This is arrow from the current master ece0e23f1. > It's worth noting that if the right table is filtered on a non-partitioned > variable the problem does not occur. -- This message was sent by Atlassian Jira (v8.20.10#820010)
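[Editor's note] For readers unfamiliar with the semantics the report above expects, here is a plain-Python toy (not Arrow internals, all names illustrative): a left join must keep every left row, and when the right table filters down to zero rows the right-side columns should simply come back null rather than the probe step crashing.

```python
def left_join(left, right, key):
    """Toy left join over lists of dicts; illustrates expected semantics only."""
    # Columns that exist on the right side (empty when the right table is empty).
    right_cols = sorted({c for row in right for c in row} - {key})
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:
        # Unmatched left rows (including the empty-right-table case) are kept,
        # with None for every right-side column.
        for match in index.get(lrow[key], [None]):
            merged = dict(lrow)
            for c in right_cols:
                merged[c] = match[c] if match is not None else None
            out.append(merged)
    return out

left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
print(left_join(left, [], "id"))  # right table empty: left rows pass through
```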
[jira] [Commented] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition
[ https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571970#comment-17571970 ] Krisztian Szucs commented on ARROW-15938: - Postponing to 10.0. > [R][C++] Segfault in left join with empty right table when filtered on > partition > > > Key: ARROW-15938 > URL: https://issues.apache.org/jira/browse/ARROW-15938 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 7.0.2 > Environment: ubuntu linux, R4.1.2 >Reporter: Vitalie Spinu >Assignee: Weston Pace >Priority: Critical > Labels: pull-request-available, query-engine > Fix For: 9.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When the right table in a join is empty as a result of a filtering on a > partition group the join segfaults: > {code:java} > library(arrow) > library(glue) > df <- mutate(iris, id = runif(n())) > dir <- "./tmp/iris" > dir.create(glue("{dir}/group=a/"), recursive = T, showWarnings = F) > dir.create(glue("{dir}/group=b/"), recursive = T, showWarnings = F) > write_parquet(df, glue("{dir}/group=a/part1.parquet")) > write_parquet(df, glue("{dir}/group=b/part2.parquet")) > db1 <- open_dataset(dir) %>% > filter(group == "blabla") > open_dataset(dir) %>% > filter(group == "b") %>% > select(id) %>% > left_join(db1, by = "id") %>% > collect() > {code} > {code:java} > ==24063== Thread 7: > ==24063== Invalid read of size 1 > ==24063== at 0x1FFE606D: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, > arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE68CC: > arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, > int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE84D5: > arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, > arrow::compute::ExecBatch const&) (in > 
/home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFE8CB4: > arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x200011CF: > arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, > arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB580E: > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in > /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, > arrow::compute::MapNode::SubmitTask(std::function > (arrow::compute::ExecBatch)>, > arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> > >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x1FE2B2A0: > std::thread::_State_impl > > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0) > ==24063== by 0x92844BF: ??? (in > /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29) > ==24063== by 0x6DD46DA: start_thread (pthread_create.c:463) > ==24063== by 0x710D71E: clone (clone.S:95) > ==24063== Address 0x10 is not stack'd, malloc'd or (recently) free'd > ==24063== *** caught segfault *** > address 0x10, cause 'memory not mapped'Traceback: > 1: Table__from_RecordBatchReader(self) > 2: tab$read_table() > 3: do_exec_plan(x) > 4: doTryCatch(return(expr), name, parentenv, handler) > 5: tryCatchOne(expr, names, parentenv, handlers[[1L]]) > 6: tryCatchList(expr, classes, parentenv, handlers) > 7: tryCatch(tab <- do_exec_plan(x), error = function(e) { > handle_csv_read_error(e, x$.data$schema)}) > 8: collect.arrow_dplyr_query(.) > 9: collect(.) 
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>% > left_join(db1, by = "id") %>% collect()Possible actions: > 1: abort (with core dump, if enabled) > 2: normal R exit > 3: exit R without saving workspace > 4: exit R saving workspace {code} > This is arrow from the current master ece0e23f1. > It's worth noting that if the right table is filtered on a non-partitioned > variable the problem does not occur. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16276) [R] Release News
[ https://issues.apache.org/jira/browse/ARROW-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16276: Fix Version/s: 9.0.0 (was: 8.0.0) > [R] Release News > > > Key: ARROW-16276 > URL: https://issues.apache.org/jira/browse/ARROW-16276 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Jonathan Keane >Assignee: Will Jones >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > I typically use a command like: > {code} > git log fcab481 --grep=".*\[R\].*" --format="%s" > {code} > Which will find all the commits with {{[R]}}, since commit fcab481. I found > commit fcab481 by going to the 7.0.0 release branch and then finding the last > commit that is in the master branch as well as the 7.0.0 release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16035) [Java] Arrow to JDBC ArrowVectorIterator does not terminate with empty result set
[ https://issues.apache.org/jira/browse/ARROW-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16035: Fix Version/s: 9.0.0 (was: 8.0.0) > [Java] Arrow to JDBC ArrowVectorIterator with does not terminate with empty > result set > -- > > Key: ARROW-16035 > URL: https://issues.apache.org/jira/browse/ARROW-16035 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 7.0.0 >Reporter: Jonathan Swenson >Assignee: Todd Farmer >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Using an ArrowVectorIterator built from a JDBC Result Set that is empty > causes the iterator to never terminate. > {code:java} > ArrowVectorIterator iterator = > JdbcToArrow.sqlToArrowVectorIterator(conn.createStatement() > .executeQuery("select 1 from table1 where false"), config); {code} > > It appears as though this is due to the implementation of the > [hasNext()|https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L158] > method. > The expectation is that the `isAfterLast()` method on a JDBC result set > return true when the result set is empty. However, according to the [JDBC > documentation|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()] > it will always return false when the result set is empty. > {quote}Returns:{{{}true{}}} if the cursor is after the last row; {{false}} if > the cursor is at any other position or the result set contains no rows > {quote} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
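[Editor's note] The quoted javadoc behaviour is easy to model. The sketch below is a toy cursor in Python, not the real JDBC API: it shows why a `hasNext()` built on `isAfterLast()` never reports exhaustion for an empty result set, which is how the iterator failed to terminate.

```python
class ToyResultSet:
    """Minimal stand-in for a JDBC ResultSet cursor (illustrative only)."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0  # 0 = cursor before the first row

    def next(self):
        # Advance the cursor; True while it lands on a real row.
        if self._pos <= len(self._rows):
            self._pos += 1
        return self._pos <= len(self._rows)

    def is_after_last(self):
        # Per the JDBC javadoc: False when the result set contains no rows,
        # even though the cursor will never sit on a row either.
        if not self._rows:
            return False
        return self._pos > len(self._rows)

rs = ToyResultSet([])
# A hasNext() implemented as "not is_after_last()" stays True forever here:
assert not rs.is_after_last()  # False on the empty set ...
assert not rs.next()           # ... even though there is no next row
```

Iterating by checking the return value of `next()` sidesteps the empty-set corner case entirely.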
[jira] [Resolved] (ARROW-15568) [C++][Gandiva] Implement Translate Function
[ https://issues.apache.org/jira/browse/ARROW-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-15568. - Fix Version/s: 9.0.0 Resolution: Fixed > [C++][Gandiva] Implement Translate Function > --- > > Key: ARROW-15568 > URL: https://issues.apache.org/jira/browse/ARROW-15568 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Vinicius Souza Roque >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Translates the input string by replacing the characters present in the > {{from}} string with the corresponding characters in the {{to}} string. This > is similar to the {{translate}} function in > [PostgreSQL|http://www.postgresql.org/docs/9.1/interactive/functions-string.html]. > If any of the parameters to this UDF are NULL, the result is NULL as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (ARROW-15568) [C++][Gandiva] Implement Translate Function
[ https://issues.apache.org/jira/browse/ARROW-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-15568: --- Assignee: Vinicius Souza Roque > [C++][Gandiva] Implement Translate Function > --- > > Key: ARROW-15568 > URL: https://issues.apache.org/jira/browse/ARROW-15568 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Vinicius Souza Roque >Assignee: Vinicius Souza Roque >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Translates the input string by replacing the characters present in the > {{from}} string with the corresponding characters in the {{to}} string. This > is similar to the {{translate}} function in > [PostgreSQL|http://www.postgresql.org/docs/9.1/interactive/functions-string.html]. > If any of the parameters to this UDF are NULL, the result is NULL as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
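[Editor's note] The described semantics mirror PostgreSQL's `translate()`, which Python's `str.translate` can sketch directly. This is an illustration of the intended behaviour, not the Gandiva implementation; it assumes the PostgreSQL rule that extra characters in `from` with no counterpart in `to` are deleted.

```python
def translate(inp, frm, to):
    """Sketch of SQL translate(): map frm[i] -> to[i], delete unmatched frm chars."""
    if inp is None or frm is None or to is None:
        return None  # NULL in -> NULL out, as the issue specifies
    table = {}
    for i, ch in enumerate(frm):
        # First occurrence of a character in `frm` wins; characters past the
        # end of `to` map to None, which str.translate deletes.
        table.setdefault(ord(ch), ord(to[i]) if i < len(to) else None)
    return inp.translate(table)

print(translate("12345", "143", "ax"))  # PostgreSQL's doc example yields "a2x5"
```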
[jira] [Reopened] (ARROW-17035) [C++][Gandiva] Add Ceil Function
[ https://issues.apache.org/jira/browse/ARROW-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-17035: - > [C++][Gandiva] Add Ceil Function > > > Key: ARROW-17035 > URL: https://issues.apache.org/jira/browse/ARROW-17035 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Implementing Ceil Function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17035) [C++][Gandiva] Add Ceil Function
[ https://issues.apache.org/jira/browse/ARROW-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17035. - Fix Version/s: 9.0.0 Resolution: Fixed > [C++][Gandiva] Add Ceil Function > > > Key: ARROW-17035 > URL: https://issues.apache.org/jira/browse/ARROW-17035 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Implementing Ceil Function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-16784) [C++][Gandiva] Add alias to Upper and Lower
[ https://issues.apache.org/jira/browse/ARROW-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16784. - Resolution: Fixed > [C++][Gandiva] Add alias to Upper and Lower > --- > > Key: ARROW-16784 > URL: https://issues.apache.org/jira/browse/ARROW-16784 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Vinicius Souza Roque >Assignee: Vinicius Souza Roque >Priority: Trivial > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding alias to functions Upper and Lower -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16784) [C++][Gandiva] Add alias to Upper and Lower
[ https://issues.apache.org/jira/browse/ARROW-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16784: Fix Version/s: 9.0.0 > [C++][Gandiva] Add alias to Upper and Lower > --- > > Key: ARROW-16784 > URL: https://issues.apache.org/jira/browse/ARROW-16784 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Vinicius Souza Roque >Assignee: Vinicius Souza Roque >Priority: Trivial > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Adding alias to functions Upper and Lower -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16455) [CI] [Packaging] Anaconda storage size exceeded for linux-ppc64le
[ https://issues.apache.org/jira/browse/ARROW-16455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16455: Fix Version/s: 9.0.0 (was: 8.0.0) > [CI] [Packaging] Anaconda storage size exceeded for linux-ppc64le > -- > > Key: ARROW-16455 > URL: https://issues.apache.org/jira/browse/ARROW-16455 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Packaging >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Critical > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Our Anaconda storage size for nightlies is exceeded: > {code:java} > "[ERROR] ('Storage requirements exceeded (3221225472 bytes). Payment is > required to add a file. Please go to > https://anaconda.org/binstar.settings/billing to update your plan', 402)" > {code} > It seems we forgot to add *linux-ppc64le* to the architectures list on this > fix: [https://github.com/apache/arrow/pull/12604] > See original issue: https://issues.apache.org/jira/browse/ARROW-15898 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17140) Adding Floor Function
[ https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17140: Fix Version/s: 9.0.0 > Adding Floor Function > - > > Key: ARROW-17140 > URL: https://issues.apache.org/jira/browse/ARROW-17140 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Adding Floor Function -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17140) Adding Floor Function
[ https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17140. - Resolution: Fixed > Adding Floor Function > - > > Key: ARROW-17140 > URL: https://issues.apache.org/jira/browse/ARROW-17140 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding Floor Function -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-17140) Adding Floor Function
[ https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-17140: - > Adding Floor Function > - > > Key: ARROW-17140 > URL: https://issues.apache.org/jira/browse/ARROW-17140 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding Floor Function -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (ARROW-17140) Adding Floor Function
[ https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-17140: --- Assignee: Sahaj Gupta > Adding Floor Function > - > > Key: ARROW-17140 > URL: https://issues.apache.org/jira/browse/ARROW-17140 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding Floor Function -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16413) [Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem
[ https://issues.apache.org/jira/browse/ARROW-16413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16413: Fix Version/s: 9.0.0 (was: 8.0.0) > [Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem > --- > > Key: ARROW-16413 > URL: https://issues.apache.org/jira/browse/ARROW-16413 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > See https://github.com/dask/dask/pull/8993 for details. > When using an fsspec filesystem (or maybe more generally a PyFileSystem), > inspecting a file through the FileFormat.inspect is hanging (this eg happens > in ParquetDatasetFactory) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation
[ https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16881: Fix Version/s: 9.0.0 > [Gandiva][C++] Fix castINTERVALYEAR implementation > -- > > Key: ARROW-16881 > URL: https://issues.apache.org/jira/browse/ARROW-16881 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Fix an error where LLVM did not find this function. > Fix the regex to allow negative digits for Interval Year. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation
[ https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-16881: - > [Gandiva][C++] Fix castINTERVALYEAR implementation > -- > > Key: ARROW-16881 > URL: https://issues.apache.org/jira/browse/ARROW-16881 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Fix an error where LLVM did not find this function. > Fix the regex to allow negative digits for Interval Year. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation
[ https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16881. - Resolution: Fixed > [Gandiva][C++] Fix castINTERVALYEAR implementation > -- > > Key: ARROW-16881 > URL: https://issues.apache.org/jira/browse/ARROW-16881 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Fix an error where LLVM did not find this function. > Fix the regex to allow negative digits for Interval Year. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16442) [Python] The fragments for ORC dataset return base Fragment instead of FileFragment
[ https://issues.apache.org/jira/browse/ARROW-16442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16442: Fix Version/s: 9.0.0 (was: 8.0.0) > [Python] The fragments for ORC dataset return base Fragment instead of > FileFragment > --- > > Key: ARROW-16442 > URL: https://issues.apache.org/jira/browse/ARROW-16442 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: dataset, dataset-dask-integration, pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > From https://github.com/dask/dask/pull/8944#issuecomment-1112620037 > For the ORC file format, we return base {{Fragment}} objects instead of the > {{FileFragment}} subclass (which has more functionality): > {code:python} > import pyarrow as pa > import pyarrow.dataset as ds > from pyarrow import orc > table = pa.table({'a': [1, 2, 3]}) > orc.write_table(table, "test.orc") > dataset = ds.dataset("test.orc", format="orc") > fragment = list(dataset.get_fragments())[0] > {code} > {code} > In [9]: fragment > Out[9]: > In [10]: fragment.path > --- > AttributeErrorTraceback (most recent call last) > in > > 1 fragment.path > AttributeError: 'pyarrow._dataset.Fragment' object has no attribute 'path' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-17036) [C++][Gandiva] Add sign Function
[ https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-17036: - > [C++][Gandiva] Add sign Function > > > Key: ARROW-17036 > URL: https://issues.apache.org/jira/browse/ARROW-17036 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Implementing Sign Function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function
[ https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-15661: - > [Gandiva][C++] Add Mask_Hash function > - > > Key: ARROW-15661 > URL: https://issues.apache.org/jira/browse/ARROW-15661 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Trivial > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Returns a hashed value based on str. The hash is consistent and can be used > to join masked values together across tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function
[ https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-15661. - Resolution: Fixed > [Gandiva][C++] Add Mask_Hash function > - > > Key: ARROW-15661 > URL: https://issues.apache.org/jira/browse/ARROW-15661 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Trivial > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Returns a hashed value based on str. The hash is consistent and can be used > to join masked values together across tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17036) [C++][Gandiva] Add sign Function
[ https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17036: Fix Version/s: 9.0.0 > [C++][Gandiva] Add sign Function > > > Key: ARROW-17036 > URL: https://issues.apache.org/jira/browse/ARROW-17036 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Implementing Sign Function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function
[ https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-15661: Fix Version/s: 9.0.0 > [Gandiva][C++] Add Mask_Hash function > - > > Key: ARROW-15661 > URL: https://issues.apache.org/jira/browse/ARROW-15661 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Johnnathan Rodrigo Pego de Almeida >Assignee: Johnnathan Rodrigo Pego de Almeida >Priority: Trivial > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Returns a hashed value based on str. The hash is consistent and can be used > to join masked values together across tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17036) [C++][Gandiva] Add sign Function
[ https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17036. - Resolution: Fixed > [C++][Gandiva] Add sign Function > > > Key: ARROW-17036 > URL: https://issues.apache.org/jira/browse/ARROW-17036 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Sahaj Gupta >Assignee: Sahaj Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Implementing Sign Function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
[ https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17070. - Resolution: Fixed > [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions > -- > > Key: ARROW-17070 > URL: https://issues.apache.org/jira/browse/ARROW-17070 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding functions to Gandiva: > mask_show_first_n(string, int) > mask_show_last_n(string, int) > 'Masking' according to Hive specification > (a-z : x, A-Z : X, 0-9 : n) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
[ https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17070: Fix Version/s: 9.0.0 > [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions > -- > > Key: ARROW-17070 > URL: https://issues.apache.org/jira/browse/ARROW-17070 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Adding functions to Gandiva: > mask_show_first_n(string, int) > mask_show_last_n(string, int) > 'Masking' according to Hive specification > (a-z : x, A-Z : X, 0-9 : n) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
[ https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-17070: - > [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions > -- > > Key: ARROW-17070 > URL: https://issues.apache.org/jira/browse/ARROW-17070 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Adding functions to Gandiva: > mask_show_first_n(string, int) > mask_show_last_n(string, int) > 'Masking' according to Hive specification > (a-z : x, A-Z : X, 0-9 : n) -- This message was sent by Atlassian Jira (v8.20.10#820010)
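[Editor's note] The Hive-style masking rule quoted above (a-z : x, A-Z : X, 0-9 : n) can be sketched in a few lines of Python. This is an illustration of the semantics only, not the Gandiva code; treatment of non-alphanumeric characters (passed through unchanged here) is an assumption.

```python
def _mask_char(ch):
    # Hive masking convention: lower -> 'x', upper -> 'X', digit -> 'n'.
    if ch.islower():
        return "x"
    if ch.isupper():
        return "X"
    if ch.isdigit():
        return "n"
    return ch  # other characters pass through unchanged (assumption)

def mask_show_first_n(s, n):
    # First n characters shown in the clear, the rest masked.
    return s[:n] + "".join(_mask_char(c) for c in s[n:])

def mask_show_last_n(s, n):
    # Last n characters shown in the clear, the rest masked.
    cut = max(len(s) - n, 0)
    return "".join(_mask_char(c) for c in s[:cut]) + s[cut:]

print(mask_show_first_n("Abc123", 2))  # "Abxnnn"
print(mask_show_last_n("Abc123", 2))   # "Xxxn23"
```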
[jira] [Resolved] (ARROW-17121) [Gandiva][C++] Adding mask function
[ https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17121. - Resolution: Fixed > [Gandiva][C++] Adding mask function > --- > > Key: ARROW-17121 > URL: https://issues.apache.org/jira/browse/ARROW-17121 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to > Gandiva. > With default masking upper case letters as 'X', lower case letters as 'x' and > numbers as 'n'. > Custom masking as optionally specified in parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (ARROW-17121) [Gandiva][C++] Adding mask function
[ https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reopened ARROW-17121: - > [Gandiva][C++] Adding mask function > --- > > Key: ARROW-17121 > URL: https://issues.apache.org/jira/browse/ARROW-17121 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to > Gandiva. > With default masking upper case letters as 'X', lower case letters as 'x' and > numbers as 'n'. > Custom masking as optionally specified in parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17121) [Gandiva][C++] Adding mask function
[ https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17121: Fix Version/s: 9.0.0 > [Gandiva][C++] Adding mask function > --- > > Key: ARROW-17121 > URL: https://issues.apache.org/jira/browse/ARROW-17121 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva >Reporter: Palak Pariawala >Assignee: Palak Pariawala >Priority: Minor > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to > Gandiva. > With default masking upper case letters as 'X', lower case letters as 'x' and > numbers as 'n'. > Custom masking as optionally specified in parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010)
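The default and custom masking behaviour described in the issue (upper case as 'X', lower case as 'x', numbers as 'n', optionally overridden by parameters) can be sketched in Python; this is an illustration of the specification, not the Gandiva implementation:

```python
def mask(inp, uc_mask="X", lc_mask="x", num_mask="n"):
    # Replace each character class with its mask; defaults follow the
    # issue description: upper -> 'X', lower -> 'x', digits -> 'n'.
    out = []
    for c in inp:
        if c.isupper():
            out.append(uc_mask)
        elif c.islower():
            out.append(lc_mask)
        elif c.isdigit():
            out.append(num_mask)
        else:
            out.append(c)  # punctuation etc. is left untouched
    return "".join(out)

print(mask("Card-1234"))                 # Xxxx-nnnn
print(mask("Card-1234", "U", "l", "#"))  # Ulll-####
```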
[jira] [Updated] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect
[ https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16897: Affects Version/s: 9.0.0 > [R][C++] Full join on Arrow objects is incorrect > > > Key: ARROW-16897 > URL: https://issues.apache.org/jira/browse/ARROW-16897 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 8.0.0, 9.0.0 > Environment: Linux >Reporter: Oliver Reiter >Assignee: Weston Pace >Priority: Critical > Labels: joins, query-engine > Fix For: 10.0.0 > > > Hello, > I am trying to do a full join on a dataset. It produces the correct number of > observations, but not the correct result (the resulting data.frame is just > filled up with NA-rows). > My use case: I want to include the 'full' year range for every factor value: > {code:java} > library(data.table) > library(arrow) > library(dplyr) > year_range <- 2000:2019 > group_n <- 100 > N <- 1000 ## the resulting data should have 100 groups * 20 years > dt <- data.table(value = rnorm(N), > group = rep(paste0("g", 1:group_n), length.out = N)) > ## there are only observations for some years in every group > dt[, year := sample(year_range, size = N / group_n), by = .(group)] > dt[group == "g1", ] > ## this would be the 'full' data.table > group_years <- data.table(group = rep(unique(dt$group), each = 20), > year = rep(year_range, times = 10)) > group_years[group == "g1", ] > write_dataset(dt, path = "parquet_db") > db <- open_dataset(sources = "parquet_db") > ## full_join using data.table -> expected result > db_full <- merge(dt, group_years, > by = c("group", "year"), > all = TRUE) > setorder(db_full, group, year) > db_full[group == "g1", ] > ## try to do the full_join with arrow -> incorrect result > db_full_arrow <- db |> > full_join(group_years, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > db_full_arrow[group == "g1", ] > ## or: convert data.table to arrow_table beforehand -> 
incorrect result > group_years_arrow <- group_years |> > as_arrow_table() > db_full_arrow <- db |> > full_join(group_years_arrow, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > db_full_arrow[group == "g1", ]{code} > The [documentation|https://arrow.apache.org/docs/r/] says equality joins are > supported, which should hold also for `full_join` I guess? > Thanks for your time and work! > > Oliver -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect
[ https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16897: Fix Version/s: 10.0.0 (was: 9.0.0) > [R][C++] Full join on Arrow objects is incorrect > > > Key: ARROW-16897 > URL: https://issues.apache.org/jira/browse/ARROW-16897 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 8.0.0 > Environment: Linux >Reporter: Oliver Reiter >Assignee: Weston Pace >Priority: Critical > Labels: joins, query-engine > Fix For: 10.0.0 > > > Hello, > I am trying to do a full join on a dataset. It produces the correct number of > observations, but not the correct result (the resulting data.frame is just > filled up with NA-rows). > My use case: I want to include the 'full' year range for every factor value: > {code:java} > library(data.table) > library(arrow) > library(dplyr) > year_range <- 2000:2019 > group_n <- 100 > N <- 1000 ## the resulting data should have 100 groups * 20 years > dt <- data.table(value = rnorm(N), > group = rep(paste0("g", 1:group_n), length.out = N)) > ## there are only observations for some years in every group > dt[, year := sample(year_range, size = N / group_n), by = .(group)] > dt[group == "g1", ] > ## this would be the 'full' data.table > group_years <- data.table(group = rep(unique(dt$group), each = 20), > year = rep(year_range, times = 10)) > group_years[group == "g1", ] > write_dataset(dt, path = "parquet_db") > db <- open_dataset(sources = "parquet_db") > ## full_join using data.table -> expected result > db_full <- merge(dt, group_years, > by = c("group", "year"), > all = TRUE) > setorder(db_full, group, year) > db_full[group == "g1", ] > ## try to do the full_join with arrow -> incorrect result > db_full_arrow <- db |> > full_join(group_years, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > db_full_arrow[group == "g1", ] > ## or: convert data.table to arrow_table beforehand -> 
incorrect result > group_years_arrow <- group_years |> > as_arrow_table() > db_full_arrow <- db |> > full_join(group_years_arrow, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > db_full_arrow[group == "g1", ]{code} > The [documentation|https://arrow.apache.org/docs/r/] says equality joins are > supported, which should hold also for `full_join` I guess? > Thanks for your time and work! > > Oliver -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect
[ https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571887#comment-17571887 ] Krisztian Szucs commented on ARROW-16897: - Postponing to 10.0 since it depends on several other unresolved issues. > [R][C++] Full join on Arrow objects is incorrect > > > Key: ARROW-16897 > URL: https://issues.apache.org/jira/browse/ARROW-16897 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 8.0.0 > Environment: Linux >Reporter: Oliver Reiter >Assignee: Weston Pace >Priority: Critical > Labels: joins, query-engine > Fix For: 9.0.0 > > > Hello, > I am trying to do a full join on a dataset. It produces the correct number of > observations, but not the correct result (the resulting data.frame is just > filled up with NA-rows). > My use case: I want to include the 'full' year range for every factor value: > {code:java} > library(data.table) > library(arrow) > library(dplyr) > year_range <- 2000:2019 > group_n <- 100 > N <- 1000 ## the resulting data should have 100 groups * 20 years > dt <- data.table(value = rnorm(N), > group = rep(paste0("g", 1:group_n), length.out = N)) > ## there are only observations for some years in every group > dt[, year := sample(year_range, size = N / group_n), by = .(group)] > dt[group == "g1", ] > ## this would be the 'full' data.table > group_years <- data.table(group = rep(unique(dt$group), each = 20), > year = rep(year_range, times = 10)) > group_years[group == "g1", ] > write_dataset(dt, path = "parquet_db") > db <- open_dataset(sources = "parquet_db") > ## full_join using data.table -> expected result > db_full <- merge(dt, group_years, > by = c("group", "year"), > all = TRUE) > setorder(db_full, group, year) > db_full[group == "g1", ] > ## try to do the full_join with arrow -> incorrect result > db_full_arrow <- db |> > full_join(group_years, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > 
db_full_arrow[group == "g1", ] > ## or: convert data.table to arrow_table beforehand -> incorrect result > group_years_arrow <- group_years |> > as_arrow_table() > db_full_arrow <- db |> > full_join(group_years_arrow, by = c("group", "year")) |> > collect() |> > setDT() > setorder(db_full_arrow, group, year) > db_full_arrow[group == "g1", ]{code} > The [documentation|https://arrow.apache.org/docs/r/] says equality joins are > supported, which should hold also for `full_join` I guess? > Thanks for your time and work! > > Oliver -- This message was sent by Atlassian Jira (v8.20.10#820010)
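For reference, the full-join semantics the reporter expects (every key combination from both sides kept, unmatched columns filled with missing values rather than whole NA rows) can be sketched in plain Python; `full_join` here is a toy helper over lists of dicts, not an Arrow API:

```python
def full_join(left, right, keys):
    # left/right: lists of dicts sharing the join key columns.
    # A full (outer) join keeps every key from both sides; columns
    # with no match on the other side are filled with None.
    cols = set().union(*left, *right)
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    out, seen = [], set()
    for row in left:
        k = tuple(row[c] for c in keys)
        seen.add(k)
        for match in index.get(k, [{}]):
            merged = {c: None for c in cols}
            merged.update(match)
            merged.update(row)
            out.append(merged)
    for k, rows in index.items():
        if k not in seen:  # right-only keys still appear in the result
            for match in rows:
                merged = {c: None for c in cols}
                merged.update(match)
                out.append(merged)
    return out

left = [{"group": "g1", "year": 2000, "value": 0.5}]
right = [{"group": "g1", "year": 2000}, {"group": "g1", "year": 2001}]
for row in full_join(left, right, ("group", "year")):
    print(row)
```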
[jira] [Resolved] (ARROW-17206) [R] Skip test to fix snappy sanitizer issue
[ https://issues.apache.org/jira/browse/ARROW-17206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17206. - Resolution: Fixed Issue resolved by pull request 13704 [https://github.com/apache/arrow/pull/13704] > [R] Skip test to fix snappy sanitizer issue > --- > > Key: ARROW-17206 > URL: https://issues.apache.org/jira/browse/ARROW-17206 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Jacob Wujciak-Jens >Assignee: Jacob Wujciak-Jens >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Known bug with snappy in a new test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-17211) [Java] Fix java-jar nightly on gh & self-hosted runners
[ https://issues.apache.org/jira/browse/ARROW-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-17211. - Resolution: Fixed Issue resolved by pull request 13712 [https://github.com/apache/arrow/pull/13712] > [Java] Fix java-jar nightly on gh & self-hosted runners > --- > > Key: ARROW-17211 > URL: https://issues.apache.org/jira/browse/ARROW-17211 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Java >Reporter: Jacob Wujciak-Jens >Assignee: Jacob Wujciak-Jens >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > [ARROW-16943] added clean up to {{java_full_build.sh}} to fix issues with > multiple jars when the job was run on a self-hosted (aka non-ephemeral) > runner. This fails when {{~/.m2}} does not exist. > I marked this as a blocker because this prevents us from building the release > Jars. > cc: [~kszucs] [~raulcd] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17051) [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN
[ https://issues.apache.org/jira/browse/ARROW-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17051: Fix Version/s: 9.0.0 (was: 10.0.0) > [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN > - > > Key: ARROW-17051 > URL: https://issues.apache.org/jira/browse/ARROW-17051 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Raúl Cumplido >Assignee: David Li >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The CI job for ASAN UBSAN is based on Ubuntu 20.04: *C++ / AMD64 Ubuntu 20.04 > C++ ASAN UBSAN* > Trying to build Flight and Flight SQL on Ubuntu 20.04 the job for ASAN UBSAN > will also build with Flight and Flight SQL. This triggers some > arrow-flight-sql-test failures like: > {code:java} > [ RUN ] TestFlightSqlClient.TestGetDbSchemas > unknown file: Failure > Unexpected mock function call - taking default action specified at: > /arrow/cpp/src/arrow/flight/sql/client_test.cc:151: > Function call: GetFlightInfo(@0x6157d948 184-byte object <00-00 00-00 > 00-00 F0-BF 40-00 00-00 00-00 00-00 80-4C 06-49 CF-7F 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 01-01 00-00 00-00 00-00 > 00-20 00-00 00-00 00-00 ... 01-00 00-04 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, > @0x7fff35794e80 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 > 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>) > Returns: (nullptr) > Google Mock tried the following 1 expectation, but it didn't match: > /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: EXPECT_CALL(sql_client_, > GetFlightInfo(Ref(call_options_), descriptor))... 
> Expected arg #1: is equal to 64-byte object <02-00 00-00 BE-BE BE-BE C0-6B > 05-00 C0-60 00-00 73-00 00-00 00-00 00-00 73-00 00-00 00-00 00-00 BE-BE BE-BE > BE-BE BE-BE 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00> > Actual: 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 > 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00> > Expected: to be called once > Actual: never called - unsatisfied and active > /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: Failure > Actual function call count doesn't match EXPECT_CALL(sql_client_, > GetFlightInfo(Ref(call_options_), descriptor))... > Expected: to be called once > Actual: never called - unsatisfied and active > [ FAILED ] TestFlightSqlClient.TestGetDbSchemas (1 ms){code} > The error can be seen here: > [https://github.com/apache/arrow/runs/7297442828?check_suite_focus=true] > This is the initial PR that triggered it: > [https://github.com/apache/arrow/pull/13548] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled
[ https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-15678: Fix Version/s: 10.0.0 (was: 9.0.0) > [C++][CI] a crossbow job with MinRelSize enabled > > > Key: ARROW-15678 > URL: https://issues.apache.org/jira/browse/ARROW-15678 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Jonathan Keane >Priority: Blocker > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 13h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled
[ https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571097#comment-17571097 ] Krisztian Szucs commented on ARROW-15678: - Postponing to 10.0 for now. > [C++][CI] a crossbow job with MinRelSize enabled > > > Key: ARROW-15678 > URL: https://issues.apache.org/jira/browse/ARROW-15678 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Jonathan Keane >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 13h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-16887) [Doc][R] Document GCSFileSystem for R package
[ https://issues.apache.org/jira/browse/ARROW-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16887. - Resolution: Fixed Issue resolved by pull request 13601 [https://github.com/apache/arrow/pull/13601] > [Doc][R] Document GCSFileSystem for R package > - > > Key: ARROW-16887 > URL: https://issues.apache.org/jira/browse/ARROW-16887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, R >Reporter: Will Jones >Assignee: Will Jones >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > We should update the [cloud storage > vignette|https://arrow.apache.org/docs/r/articles/fs.html] and the filesystem > RD to show configuration and usage of GCSFileSystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64
[ https://issues.apache.org/jira/browse/ARROW-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570779#comment-17570779 ] Krisztian Szucs commented on ARROW-16919: - That looks like quite a journey :) Thanks [~lidavidm] for figuring it out! > [C++] Flight integration tests fail on verify rc nightly on linux amd64 > --- > > Key: ARROW-16919 > URL: https://issues.apache.org/jira/browse/ARROW-16919 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration, FlightRPC >Reporter: Raúl Cumplido >Priority: Critical > Labels: Nightly, pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Some of our nightly builds to verify the release are failing: > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-almalinux-8-amd64|https://github.com/ursacomputing/crossbow/runs/7073206980?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-18.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073217433?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-20.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073210299?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-22.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073273051?check_suite_focus=true] > with the following: > {code:java} > # FAILURES # > FAILED TEST: middleware C++ producing, C++ consuming > 1 failures > File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd > output = subprocess.check_output(cmd, stderr=subprocess.STDOUT) > File "/usr/lib/python3.8/subprocess.py", line 411, in check_output > return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, > File "/usr/lib/python3.8/subprocess.py", line 512, in run > raise CalledProcessError(retcode, process.args, > subprocess.CalledProcessError: Command > 
'['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', > '-host', 'localhost', '-port=36719', '-scenario', 'middleware']' died with > . > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/arrow/dev/archery/archery/integration/runner.py", line 379, in > _run_flight_test_case > consumer.flight_request(port, **client_args) > File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 134, in > flight_request > run_cmd(cmd) > File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd > raise RuntimeError(sio.getvalue()) > RuntimeError: Command failed: > /tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client -host > localhost -port=36719 -scenario middleware > With output: > -- > Headers received successfully on failing call. > Headers received successfully on passing call. > free(): double free detected in tcache 2 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17051) [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN
[ https://issues.apache.org/jira/browse/ARROW-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-17051: Fix Version/s: 9.0.0 > [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN > - > > Key: ARROW-17051 > URL: https://issues.apache.org/jira/browse/ARROW-17051 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > The CI job for ASAN UBSAN is based on Ubuntu 20.04: *C++ / AMD64 Ubuntu 20.04 > C++ ASAN UBSAN* > Trying to build Flight and Flight SQL on Ubuntu 20.04 the job for ASAN UBSAN > will also build with Flight and Flight SQL. This triggers some > arrow-flight-sql-test failures like: > {code:java} > [ RUN ] TestFlightSqlClient.TestGetDbSchemas > unknown file: Failure > Unexpected mock function call - taking default action specified at: > /arrow/cpp/src/arrow/flight/sql/client_test.cc:151: > Function call: GetFlightInfo(@0x6157d948 184-byte object <00-00 00-00 > 00-00 F0-BF 40-00 00-00 00-00 00-00 80-4C 06-49 CF-7F 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 01-01 00-00 00-00 00-00 > 00-20 00-00 00-00 00-00 ... 01-00 00-04 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, > @0x7fff35794e80 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 > 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>) > Returns: (nullptr) > Google Mock tried the following 1 expectation, but it didn't match: > /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: EXPECT_CALL(sql_client_, > GetFlightInfo(Ref(call_options_), descriptor))... 
> Expected arg #1: is equal to 64-byte object <02-00 00-00 BE-BE BE-BE C0-6B > 05-00 C0-60 00-00 73-00 00-00 00-00 00-00 73-00 00-00 00-00 00-00 BE-BE BE-BE > BE-BE BE-BE 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00> > Actual: 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 > 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00> > Expected: to be called once > Actual: never called - unsatisfied and active > /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: Failure > Actual function call count doesn't match EXPECT_CALL(sql_client_, > GetFlightInfo(Ref(call_options_), descriptor))... > Expected: to be called once > Actual: never called - unsatisfied and active > [ FAILED ] TestFlightSqlClient.TestGetDbSchemas (1 ms){code} > The error can be seen here: > [https://github.com/apache/arrow/runs/7297442828?check_suite_focus=true] > This is the initial PR that triggered it: > [https://github.com/apache/arrow/pull/13548] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64
[ https://issues.apache.org/jira/browse/ARROW-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16919: Fix Version/s: 9.0.0 > [C++] Flight integration tests fail on verify rc nightly on linux amd64 > --- > > Key: ARROW-16919 > URL: https://issues.apache.org/jira/browse/ARROW-16919 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration, FlightRPC >Reporter: Raúl Cumplido >Priority: Critical > Labels: Nightly, pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Some of our nightly builds to verify the release are failing: > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-almalinux-8-amd64|https://github.com/ursacomputing/crossbow/runs/7073206980?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-18.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073217433?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-20.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073210299?check_suite_focus=true] > {color:#1d1c1d}- > {color}[verify-rc-source-integration-linux-ubuntu-22.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073273051?check_suite_focus=true] > with the following: > {code:java} > # FAILURES # > FAILED TEST: middleware C++ producing, C++ consuming > 1 failures > File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd > output = subprocess.check_output(cmd, stderr=subprocess.STDOUT) > File "/usr/lib/python3.8/subprocess.py", line 411, in check_output > return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, > File "/usr/lib/python3.8/subprocess.py", line 512, in run > raise CalledProcessError(retcode, process.args, > subprocess.CalledProcessError: Command > '['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', > '-host', 'localhost', '-port=36719', '-scenario', 
'middleware']' died with > . > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/arrow/dev/archery/archery/integration/runner.py", line 379, in > _run_flight_test_case > consumer.flight_request(port, **client_args) > File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 134, in > flight_request > run_cmd(cmd) > File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd > raise RuntimeError(sio.getvalue()) > RuntimeError: Command failed: > /tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client -host > localhost -port=36719 -scenario middleware > With output: > -- > Headers received successfully on failing call. > Headers received successfully on passing call. > free(): double free detected in tcache 2 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled
[ https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570777#comment-17570777 ] Krisztian Szucs commented on ARROW-15678: - [~jonkeane] can you give an update on this issue? > [C++][CI] a crossbow job with MinRelSize enabled > > > Key: ARROW-15678 > URL: https://issues.apache.org/jira/browse/ARROW-15678 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Jonathan Keane >Priority: Blocker > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 13h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-14314) [C++] Sorting dictionary array not implemented
[ https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-14314: Fix Version/s: 10.0.0 (was: 9.0.0) > [C++] Sorting dictionary array not implemented > -- > > Key: ARROW-14314 > URL: https://issues.apache.org/jira/browse/ARROW-14314 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Assignee: Ariana Villegas >Priority: Major > Labels: kernel, pull-request-available > Fix For: 10.0.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > From R, taking the stock {{mtcars}} dataset and giving it a dictionary type > column: > {code} > mtcars %>% > mutate(cyl = as.factor(cyl)) %>% > Table$create() %>% > arrange(cyl) %>% > collect() > Error: Type error: Sorting not supported for type dictionary<values=string, > indices=int8, ordered=0> > ../src/arrow/compute/kernels/vector_array_sort.cc:427 VisitTypeInline(type, > this) > ../src/arrow/compute/kernels/vector_sort.cc:148 > GetArraySorter(*physical_type_) > ../src/arrow/compute/kernels/vector_sort.cc:1206 sorter.Sort() > ../src/arrow/compute/api_vector.cc:259 CallFunction("sort_indices", {datum}, > &options, ctx) > ../src/arrow/compute/exec/order_by_impl.cc:53 SortIndices(table, options_, > ctx_) > ../src/arrow/compute/exec/sink_node.cc:292 impl_->DoFinish() > ../src/arrow/compute/exec/exec_plan.cc:297 iterator_.Next() > ../src/arrow/record_batch.cc:318 ReadNext(&batch) > ../src/arrow/record_batch.cc:329 ReadAll(&batches) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
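One straightforward way to sort a dictionary-encoded column, decode each row through the dictionary and then argsort, can be sketched in Python; this illustrates the semantics only, and the C++ kernel added for this issue may well work differently:

```python
def sort_indices_dictionary(indices, values):
    # indices: dictionary-encoded column (positions into `values`).
    # Sort by decoded value; Python's sort is stable, so ties keep
    # their original row order.
    return sorted(range(len(indices)), key=lambda i: values[indices[i]])

values = ["8", "4", "6"]   # the dictionary (e.g. factor levels)
indices = [2, 0, 1, 1]     # encoded column: "6", "8", "4", "4"
order = sort_indices_dictionary(indices, values)
print(order)  # [2, 3, 0, 1] -> decoded order "4", "4", "6", "8"
```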
[jira] [Updated] (ARROW-16817) [C++][Python] Segfaults for unsupported datatypes in the ORC writer
[ https://issues.apache.org/jira/browse/ARROW-16817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16817: Fix Version/s: 10.0.0 (was: 9.0.0) > [C++][Python] Segfaults for unsupported datatypes in the ORC writer > --- > > Key: ARROW-16817 > URL: https://issues.apache.org/jira/browse/ARROW-16817 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Ian Alexander Joiner >Assignee: Ian Alexander Joiner >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In the ORC writer if a table has at least a column with unsupported datatype > segfaults occur when we try to write them in ORC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-16817) [C++][Python] Segfaults for unsupported datatypes in the ORC writer
[ https://issues.apache.org/jira/browse/ARROW-16817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570765#comment-17570765 ] Krisztian Szucs commented on ARROW-16817: - Postponing to 10.0. > [C++][Python] Segfaults for unsupported datatypes in the ORC writer > --- > > Key: ARROW-16817 > URL: https://issues.apache.org/jira/browse/ARROW-16817 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Ian Alexander Joiner >Assignee: Ian Alexander Joiner >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In the ORC writer if a table has at least a column with unsupported datatype > segfaults occur when we try to write them in ORC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-16616) [Python] Allow lazy evaluation of filters in Dataset and add Datset.filter method
[ https://issues.apache.org/jira/browse/ARROW-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570764#comment-17570764 ] Krisztian Szucs commented on ARROW-16616: - Postponing to 10.0, feel free to include it when the PR is ready. > [Python] Allow lazy evaluation of filters in Dataset and add Datset.filter > method > - > > Key: ARROW-16616 > URL: https://issues.apache.org/jira/browse/ARROW-16616 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Alessandro Molina >Assignee: Alessandro Molina >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > To keep the {{Dataset}} api compatible with the {{Table}} one in terms of > analytics capabilities, we should add a {{Dataset.filter}} method. The > initial POC was based on {{_table_filter}} but that required materialising > all the {{Dataset}} content after filtering as it returned an > {{{}InMemoryDataset{}}}. > Given that {{Scanner}} can filter a dataset without actually materialising > the data until a final step happens, it would be good to have > {{Dataset.filter}} return some form of lazy dataset when the filter is only > stored aside and the Scanner is created when data is actually retrieved. > PS: Also update {{test_dataset_filter}} test to use the {{Dataset.filter}} > method -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16616) [Python] Allow lazy evaluation of filters in Dataset and add Datset.filter method
[ https://issues.apache.org/jira/browse/ARROW-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16616: Fix Version/s: 10.0.0 (was: 9.0.0) > [Python] Allow lazy evaluation of filters in Dataset and add Datset.filter > method > - > > Key: ARROW-16616 > URL: https://issues.apache.org/jira/browse/ARROW-16616 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Alessandro Molina >Assignee: Alessandro Molina >Priority: Major > Labels: pull-request-available > Fix For: 10.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > To keep the {{Dataset}} api compatible with the {{Table}} one in terms of > analytics capabilities, we should add a {{Dataset.filter}} method. The > initial POC was based on {{_table_filter}} but that required materialising > all the {{Dataset}} content after filtering as it returned an > {{{}InMemoryDataset{}}}. > Given that {{Scanner}} can filter a dataset without actually materialising > the data until a final step happens, it would be good to have > {{Dataset.filter}} return some form of lazy dataset when the filter is only > stored aside and the Scanner is created when data is actually retrieved. > PS: Also update {{test_dataset_filter}} test to use the {{Dataset.filter}} > method -- This message was sent by Atlassian Jira (v8.20.10#820010)
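The lazy-filter idea described in the issue (store the predicate aside, evaluate it only when data is materialized) can be sketched with a toy class; the names are hypothetical and this is not the actual pyarrow `Dataset` API:

```python
class LazyFilteredDataset:
    # Toy model: holds rows plus an optional deferred predicate.
    def __init__(self, rows, predicate=None):
        self._rows = rows
        self._predicate = predicate

    def filter(self, predicate):
        # Cheap: no data is scanned here, the predicate is just stored,
        # mirroring how a Scanner-backed filter could defer evaluation.
        return LazyFilteredDataset(self._rows, predicate)

    def to_list(self):
        # Evaluation happens only at materialization time.
        if self._predicate is None:
            return list(self._rows)
        return [r for r in self._rows if self._predicate(r)]

ds = LazyFilteredDataset([1, 2, 3, 4])
lazy = ds.filter(lambda x: x % 2 == 0)  # nothing scanned yet
print(lazy.to_list())  # [2, 4]
```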
[jira] [Updated] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers
[ https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-10739: Fix Version/s: 10.0.0 > [Python] Pickling a sliced array serializes all the buffers > --- > > Key: ARROW-10739 > URL: https://issues.apache.org/jira/browse/ARROW-10739 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Maarten Breddels >Assignee: Alessandro Molina >Priority: Critical > Fix For: 10.0.0 > > > If a large array is sliced, and pickled, it seems the full buffer is > serialized, this leads to excessive memory usage and data transfer when using > multiprocessing or dask. > {code:java} > >>> import pyarrow as pa > >>> ar = pa.array(['foo'] * 100_000) > >>> ar.nbytes > 74 > >>> import pickle > >>> len(pickle.dumps(ar.slice(10, 1))) > 700165 > NumPy for instance > >>> import numpy as np > >>> ar_np = np.array(ar) > >>> ar_np > array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object) > >>> import pickle > >>> len(pickle.dumps(ar_np[10:11])) > 165{code} > I think this makes sense if you know arrow, but kind of unexpected as a user. > Is there a workaround for this? For instance copy an arrow array to get rid > of the offset, and trim the buffers? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers
[ https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570763#comment-17570763 ] Krisztian Szucs commented on ARROW-10739: - Postponing to 10.0 since there is no PR available at the moment. > [Python] Pickling a sliced array serializes all the buffers > --- > > Key: ARROW-10739 > URL: https://issues.apache.org/jira/browse/ARROW-10739 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Maarten Breddels >Assignee: Alessandro Molina >Priority: Critical > > If a large array is sliced, and pickled, it seems the full buffer is > serialized, this leads to excessive memory usage and data transfer when using > multiprocessing or dask. > {code:java} > >>> import pyarrow as pa > >>> ar = pa.array(['foo'] * 100_000) > >>> ar.nbytes > 74 > >>> import pickle > >>> len(pickle.dumps(ar.slice(10, 1))) > 700165 > NumPy for instance > >>> import numpy as np > >>> ar_np = np.array(ar) > >>> ar_np > array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object) > >>> import pickle > >>> len(pickle.dumps(ar_np[10:11])) > 165{code} > I think this makes sense if you know arrow, but kind of unexpected as a user. > Is there a workaround for this? For instance copy an arrow array to get rid > of the offset, and trim the buffers? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers
[ https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-10739: Fix Version/s: (was: 9.0.0) > [Python] Pickling a sliced array serializes all the buffers > --- > > Key: ARROW-10739 > URL: https://issues.apache.org/jira/browse/ARROW-10739 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Maarten Breddels >Assignee: Alessandro Molina >Priority: Critical > > If a large array is sliced, and pickled, it seems the full buffer is > serialized, this leads to excessive memory usage and data transfer when using > multiprocessing or dask. > {code:java} > >>> import pyarrow as pa > >>> ar = pa.array(['foo'] * 100_000) > >>> ar.nbytes > 74 > >>> import pickle > >>> len(pickle.dumps(ar.slice(10, 1))) > 700165 > NumPy for instance > >>> import numpy as np > >>> ar_np = np.array(ar) > >>> ar_np > array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object) > >>> import pickle > >>> len(pickle.dumps(ar_np[10:11])) > 165{code} > I think this makes sense if you know arrow, but kind of unexpected as a user. > Is there a workaround for this? For instance copy an arrow array to get rid > of the offset, and trim the buffers? -- This message was sent by Atlassian Jira (v8.20.10#820010)
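The workaround the reporter asks about — copy the array to get rid of the offset and trim the buffers before pickling — can be illustrated with a toy slice type in plain Python. The `BufferSlice` class below is hypothetical and only mirrors the idea; it is not pyarrow code.

```python
import pickle

class BufferSlice:
    """Toy view over a large bytes buffer, like a sliced Arrow array:
    it keeps a reference to the whole parent buffer plus an offset."""

    def __init__(self, buf, offset, length):
        self.buf = buf
        self.offset = offset
        self.length = length

    def values(self):
        return self.buf[self.offset:self.offset + self.length]

    def compact(self):
        # The workaround: copy only the referenced bytes, dropping the
        # offset, so pickling no longer drags the parent buffer along.
        return BufferSlice(self.values(), 0, self.length)


big = bytes(1_000_000)
view = BufferSlice(big, 10, 3)
print(len(pickle.dumps(view)))            # large: whole buffer serialized
print(len(pickle.dumps(view.compact())))  # tiny: just the 3 sliced bytes
```

Pickling the raw view serializes the entire parent buffer (the behaviour reported for sliced Arrow arrays), while compacting first shrinks the payload to roughly the slice size.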
[jira] [Resolved] (ARROW-16655) [Release] Release improvements
[ https://issues.apache.org/jira/browse/ARROW-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16655. - Resolution: Fixed Nice! Thanks [~raulcd]! > [Release] Release improvements > -- > > Key: ARROW-16655 > URL: https://issues.apache.org/jira/browse/ARROW-16655 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Fix For: 9.0.0 > > > This is an umbrella ticket collecting various improvements to our existing > Release Process. > The improvements are focused on: > * Improvements and fixes for the current release scripts and steps > * Documentation improvements -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-16665) [Release] Update 03-binary-submit.sh to comment on PR and track binary submission with badges
[ https://issues.apache.org/jira/browse/ARROW-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16665. - Resolution: Fixed Issue resolved by pull request 13612 [https://github.com/apache/arrow/pull/13612] > [Release] Update 03-binary-submit.sh to comment on PR and track binary > submission with badges > - > > Key: ARROW-16665 > URL: https://issues.apache.org/jira/browse/ARROW-16665 > Project: Apache Arrow > Issue Type: Sub-task > Components: Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-15961) [C++] Check nullability when validating fields on batches or struct arrays
[ https://issues.apache.org/jira/browse/ARROW-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-15961: Fix Version/s: 10.0.0 (was: 9.0.0) > [C++] Check nullability when validating fields on batches or struct arrays > -- > > Key: ARROW-15961 > URL: https://issues.apache.org/jira/browse/ARROW-15961 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Kaouther Abrougui >Priority: Major > Labels: good-first-issue, good-second-issue, > pull-request-available > Fix For: 10.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > > According to ARROW-15899, it is possible to declare a field non-nullable, > associate with data that has nulls, and still pass validation. > Validation should instead fail in such a situation (at least full validation, > since computing the null count can be O\(n\)). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-15961) [C++] Check nullability when validating fields on batches or struct arrays
[ https://issues.apache.org/jira/browse/ARROW-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570759#comment-17570759 ] Krisztian Szucs commented on ARROW-15961: - Moving to 10.0 since the PR is in progress. > [C++] Check nullability when validating fields on batches or struct arrays > -- > > Key: ARROW-15961 > URL: https://issues.apache.org/jira/browse/ARROW-15961 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Kaouther Abrougui >Priority: Major > Labels: good-first-issue, good-second-issue, > pull-request-available > Fix For: 9.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > > According to ARROW-15899, it is possible to declare a field non-nullable, > associate with data that has nulls, and still pass validation. > Validation should instead fail in such a situation (at least full validation, > since computing the null count can be O\(n\)). -- This message was sent by Atlassian Jira (v8.20.10#820010)
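The proposed behaviour — cheap validation passes, but full validation scans for nulls in non-nullable fields — can be sketched in plain Python. The `validate_field` helper below is hypothetical, not the Arrow C++ API; it only illustrates why the check belongs in full validation (computing the null count is O(n)).

```python
def validate_field(values, nullable, full=False):
    """Toy validation: cheap checks always succeed here; the O(n) null
    scan runs only when full validation is requested, mirroring the
    ticket's proposal. (Hypothetical helper, not the Arrow C++ API.)"""
    if not full:
        return True  # cheap validation skips the null count
    if not nullable and any(v is None for v in values):
        raise ValueError("non-nullable field contains nulls")
    return True


validate_field([1, None, 3], nullable=False)  # cheap path: still passes
try:
    validate_field([1, None, 3], nullable=False, full=True)
except ValueError as exc:
    print(exc)  # full validation rejects the nulls
```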
[jira] [Commented] (ARROW-16797) [Python][Packaging] Update conda-recipes from conda-forge feedstock
[ https://issues.apache.org/jira/browse/ARROW-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552384#comment-17552384 ] Krisztian Szucs commented on ARROW-16797: - Conda-forge feedstocks seem more self-contained nowadays. One possible solution could be to copy the top-level azure template https://github.com/conda-forge/arrow-cpp-feedstock/blob/main/azure-pipelines.yml and clone the upstream feedstock as the first step and copy our meta.yml to the freshly cloned feedstock directory. We should be able to choose the template at runtime https://docs.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops#parameters-to-select-a-template-at-runtime Of course this would mean that we need to remove the task parametrization from tasks.yml and let the upstream feedstock configuration handle the build matrix. Mixing it with the arrow-r feedstock is less trivial though (perhaps we should just drop that). > [Python][Packaging] Update conda-recipes from conda-forge feedstock > --- > > Key: ARROW-16797 > URL: https://issues.apache.org/jira/browse/ARROW-16797 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Raúl Cumplido >Priority: Major > > Our conda-recipes have not been updated for the last 4 months > ([https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes/.ci_support)] > and they are not up-to-date with the upstream feedstocks: > [arrow-cpp-feedstock]: [https://github.com/conda-forge/arrow-cpp-feedstock] > [parquet-cpp-feedstock]: > [https://github.com/conda-forge/parquet-cpp-feedstock] > We should keep them up-to-date. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (ARROW-16797) [Python][Packaging] Update conda-recipes from conda-forge feedstock
[ https://issues.apache.org/jira/browse/ARROW-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552377#comment-17552377 ] Krisztian Szucs commented on ARROW-16797: - Could we perhaps automate this somehow? > [Python][Packaging] Update conda-recipes from conda-forge feedstock > --- > > Key: ARROW-16797 > URL: https://issues.apache.org/jira/browse/ARROW-16797 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Raúl Cumplido >Priority: Major > > Our conda-recipes have not been updated for the last 4 months > ([https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes/.ci_support)] > and they are not up-to-date with the upstream feedstocks: > [arrow-cpp-feedstock]: [https://github.com/conda-forge/arrow-cpp-feedstock] > [parquet-cpp-feedstock]: > [https://github.com/conda-forge/parquet-cpp-feedstock] > We should keep them up-to-date. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16553) [Java][CI] Use GitHub repository Jar assets as a repository that could be consumed by dependencies management (Maven/Gradle)
[ https://issues.apache.org/jira/browse/ARROW-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16553. - Fix Version/s: 9.0.0 Resolution: Fixed Issue resolved by pull request 13328 [https://github.com/apache/arrow/pull/13328] > [Java][CI] Use GitHub repository Jar assets as a repository that could be > consumed by dependencies management (Maven/Gradle) > > > Key: ARROW-16553 > URL: https://issues.apache.org/jira/browse/ARROW-16553 > Project: Apache Arrow > Issue Type: Sub-task > Components: Developer Tools, Java >Affects Versions: 9.0.0 >Reporter: David Dali Susanibar Arce >Assignee: David Dali Susanibar Arce >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > On the Java side we currently offer nightly-build Jar artifacts uploaded to > the GitHub repository as assets. > Then, if users decide to use them in their local projects, they need to > download those Jar assets from the GitHub nightly packages and [install them > manually one by one as needed, as mentioned in the > documentation|https://arrow.apache.org/docs/java/install.html#installing-nightly-packages]. > Trying to figure out whether there is an option to use the GitHub > nightly-build Jar artifacts as a real repository, so that one only needs to > configure the nightly repository in pom.xml, for example, and Maven can > download the needed dependencies automatically. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16785) [Packaging][Linux] Add FindThrift.cmake
[ https://issues.apache.org/jira/browse/ARROW-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16785. - Fix Version/s: 9.0.0 Resolution: Fixed Issue resolved by pull request 13337 [https://github.com/apache/arrow/pull/13337] > [Packaging][Linux] Add FindThrift.cmake > > > Key: ARROW-16785 > URL: https://issues.apache.org/jira/browse/ARROW-16785 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is a follow-up of ARROW-1672. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16767) [Archery] Refactor archery.release submodule to its own subpackage
Krisztian Szucs created ARROW-16767: --- Summary: [Archery] Refactor archery.release submodule to its own subpackage Key: ARROW-16767 URL: https://issues.apache.org/jira/browse/ARROW-16767 Project: Apache Arrow Issue Type: Improvement Components: Archery Reporter: Krisztian Szucs Fix For: 9.0.0 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (ARROW-16767) [Archery] Refactor archery.release submodule to its own subpackage
[ https://issues.apache.org/jira/browse/ARROW-16767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-16767: --- Assignee: Krisztian Szucs > [Archery] Refactor archery.release submodule to its own subpackage > -- > > Key: ARROW-16767 > URL: https://issues.apache.org/jira/browse/ARROW-16767 > Project: Apache Arrow > Issue Type: Improvement > Components: Archery >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Fix For: 9.0.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16663) [Release][Dev] Add flag to archery release curate to only show minimal information
[ https://issues.apache.org/jira/browse/ARROW-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16663. - Fix Version/s: 9.0.0 Resolution: Fixed > [Release][Dev] Add flag to archery release curate to only show minimal > information > -- > > Key: ARROW-16663 > URL: https://issues.apache.org/jira/browse/ARROW-16663 > Project: Apache Arrow > Issue Type: Sub-task > Components: Developer Tools >Reporter: Raúl Cumplido >Assignee: Jacob Wujciak-Jens >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently archery release curate shows a lot of information that is not > relevant, like the tickets that are correctly assigned. Have a new flag to > show only the information that requires manual fixing. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16684) [CI][Archery] Add retry mechanism to git fetch
[ https://issues.apache.org/jira/browse/ARROW-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16684. - Resolution: Fixed Issue resolved by pull request 13258 [https://github.com/apache/arrow/pull/13258] > [CI][Archery] Add retry mechanism to git fetch > -- > > Key: ARROW-16684 > URL: https://issues.apache.org/jira/browse/ARROW-16684 > Project: Apache Arrow > Issue Type: Improvement > Components: Archery, Continuous Integration, Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Archery seems to fail sometimes to fetch branches for some repositories. Some > of the report packaging jobs > ([https://github.com/ursacomputing/crossbow/runs/6643769198?check_suite_focus=true)] > have been failing due to git errors when fetching: > {code:java} > File > "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/crossbow/cli.py", > line 238, in latest_prefix > queue.fetch() > File > "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/crossbow/core.py", > line 271, in fetch > self.origin.fetch([refspec]) > File > "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/remote.py", > line 146, in fetch > payload.check_error(err) > File > "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/callbacks.py", > line 93, in check_error > check_error(error_code) > File > "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/errors.py", > line 65, in check_error > raise GitError(message) > _pygit2.GitError: SSL error: received early EOF > Error: Process completed with exit code 1.{code} > I have seen that retrying the job can make it pass. > We should add a retry mechanism to archery to allow retry on GitErrors when > fetching branches. -- This message was sent by Atlassian Jira (v8.20.7#820007)
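A retry mechanism of the kind this ticket adds can be sketched as follows. This is a hypothetical helper, not archery's actual implementation; `GitError` here is a stand-in for `pygit2.GitError` (the "SSL error: received early EOF" in the traceback above), so the sketch stays runnable without pygit2.

```python
import time

class GitError(Exception):
    """Stand-in for pygit2.GitError (e.g. 'SSL error: received early EOF')."""

def fetch_with_retry(fetch, attempts=3, base_delay=1.0):
    """Call fetch(), retrying on GitError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except GitError:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the transient error
            time.sleep(base_delay * 2 ** attempt)


# Usage: a fetch that fails transiently twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise GitError("SSL error: received early EOF")
    return "fetched"

print(fetch_with_retry(flaky_fetch, base_delay=0.01))
```

In the real code the callable would wrap `self.origin.fetch([refspec])`, and the backoff keeps transient SSL/EOF failures from killing an entire packaging report run.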
[jira] [Resolved] (ARROW-16560) [Website][Release] Version JSON files not updated in release
[ https://issues.apache.org/jira/browse/ARROW-16560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16560. - Fix Version/s: 9.0.0 Resolution: Fixed Issue resolved by pull request 13257 [https://github.com/apache/arrow/pull/13257] > [Website][Release] Version JSON files not updated in release > > > Key: ARROW-16560 > URL: https://issues.apache.org/jira/browse/ARROW-16560 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Nicola Crane >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > ARROW-15366 added a script to automatically increment the version switchers > for the docs, which was updated as part of the changes in ARROW-1. > However, the latest release did not increment the version numbers (and > ARROW-1 changes the script to update on snapshots instead of releases - > could be the reason for it not happening?) -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases
[ https://issues.apache.org/jira/browse/ARROW-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-16654: --- Assignee: Krisztian Szucs > [Dev][Archery] Support cherry-picking for major releases > - > > Key: ARROW-16654 > URL: https://issues.apache.org/jira/browse/ARROW-16654 > Project: Apache Arrow > Issue Type: New Feature > Components: Archery, Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases
[ https://issues.apache.org/jira/browse/ARROW-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16654. - Resolution: Fixed Issue resolved by pull request 13230 [https://github.com/apache/arrow/pull/13230] > [Dev][Archery] Support cherry-picking for major releases > - > > Key: ARROW-16654 > URL: https://issues.apache.org/jira/browse/ARROW-16654 > Project: Apache Arrow > Issue Type: New Feature > Components: Archery, Developer Tools >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases
Krisztian Szucs created ARROW-16654: --- Summary: [Dev][Archery] Support cherry-picking for major releases Key: ARROW-16654 URL: https://issues.apache.org/jira/browse/ARROW-16654 Project: Apache Arrow Issue Type: New Feature Components: Archery, Developer Tools Reporter: Krisztian Szucs Fix For: 9.0.0 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16445) [R] [Doc] Add a short summary for the Installing the Arrow package on Linux article
[ https://issues.apache.org/jira/browse/ARROW-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16445: Fix Version/s: 9.0.0 (was: 8.0.0) > [R] [Doc] Add a short summary for the Installing the Arrow package on Linux > article > --- > > Key: ARROW-16445 > URL: https://issues.apache.org/jira/browse/ARROW-16445 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, R >Reporter: Dragoș Moldovan-Grünfeld >Assignee: Dragoș Moldovan-Grünfeld >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > From [~npr]: "I think [https://arrow.apache.org/docs/r/articles/install.html] > would benefit from a very simple summary at the top: > {{install.packages("arrow")}} just works; there are things you can do to make > it install faster (see below); if for some reason it doesn't work, set the > env var {{{}ARROW_R_DEV=true{}}}, retry, and share the logs with us." -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16327) [Java][CI]: Add support for Java 17 CI process
[ https://issues.apache.org/jira/browse/ARROW-16327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16327: Fix Version/s: 9.0.0 (was: 8.0.0) > [Java][CI]: Add support for Java 17 CI process > -- > > Key: ARROW-16327 > URL: https://issues.apache.org/jira/browse/ARROW-16327 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Affects Versions: 9.0.0 >Reporter: David Dali Susanibar Arce >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently Arrow Java code is tested with Java SE 11. > This ticket is for planning/mapping the activities involved in also > supporting Java SE 17. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16317) [Archery][CI] Fix possible race condition when submitting crossbow builds
[ https://issues.apache.org/jira/browse/ARROW-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16317. - Resolution: Fixed Issue resolved by pull request 13188 [https://github.com/apache/arrow/pull/13188] > [Archery][CI] Fix possible race condition when submitting crossbow builds > - > > Key: ARROW-16317 > URL: https://issues.apache.org/jira/browse/ARROW-16317 > Project: Apache Arrow > Issue Type: Bug > Components: Archery, Continuous Integration >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Sometimes when trying to use github-actions to submit crossbow jobs an error > is raised like: > {code:java} > Failed to push updated references, potentially because of credential issues: > ['refs/heads/actions-1883-github-wheel-windows-cp310-amd64', > 'refs/tags/actions-1883-github-wheel-windows-cp310-amd64', > 'refs/heads/actions-1883-github-wheel-windows-cp39-amd64', > 'refs/tags/actions-1883-github-wheel-windows-cp39-amd64', > 'refs/heads/actions-1883-github-wheel-windows-cp37-amd64', > 'refs/tags/actions-1883-github-wheel-windows-cp37-amd64', > 'refs/heads/actions-1883-github-wheel-windows-cp38-amd64', > 'refs/tags/actions-1883-github-wheel-windows-cp38-amd64', > 'refs/heads/actions-1883'] > The Archery job run can be found at: > https://github.com/apache/arrow/actions/runs/2195038965{code} > As discussed on this github comment > ([https://github.com/apache/arrow/pull/12930#issuecomment-1103772507)] > We should remove the auto incremented IDs entirely and use unique hashes > instead, e.g.: actions--github-wheel-windows-cp310-amd64 instead > of actions-1883-github-wheel-windows-cp310-amd64. Then we wouldn't need to > fetch the new references either, making remote crossbow builds and local > submission much quicker. 
> The error can also be seen here: > https://github.com/apache/arrow/pull/12987#issuecomment-1108516668 -- This message was sent by Atlassian Jira (v8.20.7#820007)
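The hash-based naming proposed in the ticket could look roughly like this. The `job_branch` helper is hypothetical, not crossbow's actual code; it only shows how a unique digest replaces the racy auto-incremented id in the branch/tag name.

```python
import hashlib
import time

def job_branch(prefix, task, seed=None):
    """Derive a collision-resistant branch name like
    'actions-<8-char-digest>-github-wheel-windows-cp310-amd64' instead of
    an auto-incremented id like 'actions-1883-...'.
    (Hypothetical helper, not crossbow's actual implementation.)"""
    seed = seed if seed is not None else str(time.time_ns())
    digest = hashlib.sha256(seed.encode()).hexdigest()[:8]
    return f"{prefix}-{digest}-{task}"


print(job_branch("actions", "github-wheel-windows-cp310-amd64", seed="demo"))
```

Because two concurrent submissions derive different digests, they can no longer race for the same reference, and there is no need to fetch the latest numeric id before pushing.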
[jira] [Created] (ARROW-16589) [CI][Dev] Make tasks.yml easier to maintain
Krisztian Szucs created ARROW-16589: --- Summary: [CI][Dev] Make tasks.yml easier to maintain Key: ARROW-16589 URL: https://issues.apache.org/jira/browse/ARROW-16589 Project: Apache Arrow Issue Type: New Feature Components: Continuous Integration, Developer Tools Reporter: Krisztian Szucs I think {{dev/tasks/tasks.yml}} has reached its limits as Jinja2-templated YAML. We should think about a better way to define crossbow jobs while: - keeping it readable - using a dialect which is natively supported by editors - supporting task parametrization One idea is to use Python files containing Python objects, e.g.: {code} Task( name="wheel-macos-big-sur-cp38-arm64", ci="github", template="python-wheels/github.osx.arm64.yml", params=dict( arch="arm64", arrow_simd_level="DEFAULT", python_version="3.8", macos_deployment_target="11.0" ), artifacts=[ "pyarrow-{no_rc_version}-cp38-cp38-macosx_11_0_arm64.whl" ] ) {code} where {{Task}} would be the crossbow task class (which could be refactored to use pydantic or another alternative for less boilerplate). Of course, porting the task definitions to plain Python could make the situation even worse by giving access to too many scripting utilities. We could try a dynamic config language which sits between YAML and Python, like HCL. [~kou] what syntax would you be comfortable to work with? Do you have any alternatives we could use? cc [~amol-] [~raulcd] [~assignUser] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning
[ https://issues.apache.org/jira/browse/ARROW-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-16420: Fix Version/s: (was: 8.0.0) > [Python] pq.write_to_dataset always ignores partitioning > > > Key: ARROW-16420 > URL: https://issues.apache.org/jira/browse/ARROW-16420 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 8.0.0 >Reporter: David Li >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0, 8.0.1 > > Time Spent: 50m > Remaining Estimate: 0h > > The code unconditionally sets {{partitioning}} to None, so the user-supplied > partitioning is ignored. > https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (ARROW-16488) [Archery][DevTools] Allow extra message to be sent on chat report
[ https://issues.apache.org/jira/browse/ARROW-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-16488. - Resolution: Fixed Issue resolved by pull request 13081 [https://github.com/apache/arrow/pull/13081] > [Archery][DevTools] Allow extra message to be sent on chat report > - > > Key: ARROW-16488 > URL: https://issues.apache.org/jira/browse/ARROW-16488 > Project: Apache Arrow > Issue Type: Sub-task > Components: Archery, Developer Tools >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 9.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Allow some extra content to be configurable via CLI when sending a > chat-report. > This will allow to slightly customize the message that is sent. -- This message was sent by Atlassian Jira (v8.20.7#820007)