[jira] [Commented] (ARROW-14363) [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods without explicit element type

2022-09-22 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608161#comment-17608161
 ] 

Krisztian Szucs commented on ARROW-14363:
-

Updated it, thanks [~jinshang]!

> [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods 
> without explicit element type
> 
>
> Key: ARROW-14363
> URL: https://issues.apache.org/jira/browse/ARROW-14363
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva
>Reporter: Krisztian Szucs
>Assignee: Jin Shang
>Priority: Major
> Fix For: 10.0.0
>
>
> Added a workaround for the 6.0.0 release in 
> https://github.com/apache/arrow/pull/11448
> The LLVM commit 
> https://reviews.llvm.org/rGf164bc52b61a34f8f95032e1e4fe68bd4eff995f doesn't 
> provide much context about the reason for the deprecation.
> The Gandiva code should be updated to use the CreateGEP and CreateLoad 
> methods with element types passed explicitly.
> cc [~pravindra] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-14363) [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods without explicit element type

2022-09-22 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-14363.
-
Fix Version/s: 10.0.0
   Resolution: Fixed

> [C++][Gandiva] LLVM 13 has deprecated CreateGEP and CreateLoad methods 
> without explicit element type
> 
>
> Key: ARROW-14363
> URL: https://issues.apache.org/jira/browse/ARROW-14363
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva
>Reporter: Krisztian Szucs
>Assignee: Jin Shang
>Priority: Major
> Fix For: 10.0.0
>
>
> Added a workaround for the 6.0.0 release in 
> https://github.com/apache/arrow/pull/11448
> The LLVM commit 
> https://reviews.llvm.org/rGf164bc52b61a34f8f95032e1e4fe68bd4eff995f doesn't 
> provide much context about the reason for the deprecation.
> The Gandiva code should be updated to use the CreateGEP and CreateLoad 
> methods with element types passed explicitly.
> cc [~pravindra] 





[jira] [Created] (ARROW-17294) [Release] Update remove old artifacts release script

2022-08-03 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17294:
---

 Summary: [Release] Update remove old artifacts release script
 Key: ARROW-17294
 URL: https://issues.apache.org/jira/browse/ARROW-17294
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs
 Fix For: 10.0.0


I just executed the remove old artifacts release script which also removed the 
previously created three patch releases for 6.0.2, 7.0.1, 8.0.1. 

That's not desirable since those have just been released so I had to revert to 
an earlier revision.
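
A minimal sketch of one possible retention rule, assuming the newest release of each major version should be kept. The policy, the helper name `versions_to_remove`, and the sample version list are assumptions for illustration; this message does not describe how the actual script selects artifacts.

```python
def versions_to_remove(versions):
    """Return the versions that are not the newest release of their major line."""
    parsed = [tuple(int(part) for part in v.split(".")) for v in versions]
    newest = {}  # major version -> newest (major, minor, patch) seen so far
    for version in parsed:
        major = version[0]
        if major not in newest or version > newest[major]:
            newest[major] = version
    keep = set(newest.values())
    return [v for v, p in zip(versions, parsed) if p not in keep]


# Illustrative input: the three patch releases named above plus older releases.
uploaded = ["6.0.1", "6.0.2", "7.0.0", "7.0.1", "8.0.0", "8.0.1"]
print(versions_to_remove(uploaded))  # ['6.0.1', '7.0.0', '8.0.0']
```

Under this assumed rule, 6.0.2, 7.0.1 and 8.0.1 would be left in place.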

cc [~kou] [~assignUser] [~raulcd]





[jira] [Updated] (ARROW-17253) [Python] pyarrow.array() crashes the interpreter when given a generator that raises while iterating

2022-08-01 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17253:

Fix Version/s: 9.0.1

> [Python] pyarrow.array() crashes the interpreter when given a generator that 
> raises while iterating
> ---
>
> Key: ARROW-17253
> URL: https://issues.apache.org/jira/browse/ARROW-17253
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 8.0.0
>Reporter: Li Jin
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> import pyarrow as pa
> pa.array((1 // 0 for x in range(10)), size=10){code}
> This crashes the Python interpreter.
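
The argument in the snippet above is a generator expression that raises on its very first element. A minimal pure-Python sketch of that behaviour, without calling pyarrow (the helper name `first_error` is hypothetical):

```python
def first_error(iterable):
    """Return the exception raised while iterating, or None if iteration completes."""
    try:
        for _ in iterable:
            pass
    except Exception as exc:  # the generator body raised mid-iteration
        return exc
    return None


# Same generator expression as in the report: producing the first element divides by zero.
gen = (1 // 0 for x in range(10))
err = first_error(gen)
print(type(err).__name__)  # ZeroDivisionError
```

A consumer of such a generator would presumably be expected to propagate this exception to the caller rather than crash.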





[jira] [Commented] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded

2022-07-29 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573183#comment-17573183
 ] 

Krisztian Szucs commented on ARROW-17260:
-

This is the second submission after I uploaded and closed the Java release on 
the Apache Sonatype repo:

https://github.com/apache/arrow/pull/13749#issuecomment-129881

> [Release] Java jars verification pass despite that nothing has been uploaded
> 
>
> Key: ARROW-17260
> URL: https://issues.apache.org/jira/browse/ARROW-17260
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Priority: Major
>
> The build passes even though I forgot to upload the Java binaries: 
> https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true
>  
> cc [~assignUser] [~raulcd]





[jira] [Created] (ARROW-17260) [Release] Java jars verification pass despite that nothing has been uploaded

2022-07-29 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17260:
---

 Summary: [Release] Java jars verification pass despite that 
nothing has been uploaded
 Key: ARROW-17260
 URL: https://issues.apache.org/jira/browse/ARROW-17260
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs


The build passes even though I forgot to upload the Java binaries: 
https://github.com/ursacomputing/crossbow/runs/7587084181?check_suite_focus=true
 

cc [~assignUser] [~raulcd]





[jira] [Updated] (ARROW-17067) Implement Substring_Index

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17067:

Fix Version/s: (was: 9.0.0)

> Implement Substring_Index
> -
>
> Key: ARROW-17067
> URL: https://issues.apache.org/jira/browse/ARROW-17067
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Adding Substring_index Function.





[jira] [Updated] (ARROW-17067) Implement Substring_Index

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17067:

Fix Version/s: 9.0.0

> Implement Substring_Index
> -
>
> Key: ARROW-17067
> URL: https://issues.apache.org/jira/browse/ARROW-17067
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Adding Substring_index Function.





[jira] [Resolved] (ARROW-17246) [Packaging][deb][RPM] Don't use system jemalloc

2022-07-29 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17246.
-
Resolution: Fixed

Issue resolved by pull request 13739
[https://github.com/apache/arrow/pull/13739]

> [Packaging][deb][RPM] Don't use system jemalloc
> ---
>
> Key: ARROW-17246
> URL: https://issues.apache.org/jira/browse/ARROW-17246
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> System jemalloc can't be used with {{dlopen()}}. As a result, our shared 
> libraries can't be loaded as bindings for scripting languages such as Ruby:
> {noformat}
> + ruby -r gi -e 'p GI.load('\''Arrow'\'')'
> (null)-WARNING **: Failed to load shared library 'libarrow-glib.so.900' 
> referenced by the typelib: /lib64/libjemalloc.so.2: cannot allocate memory in 
> static TLS block
> {noformat}
> This happens because system jemalloc isn't built with 
> {{--disable-initial-exec-tls}}. See also:
> * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951704
> * https://github.com/jemalloc/jemalloc/issues/1237





[jira] [Resolved] (ARROW-17238) [Release] Turn off GCS testing during wheel verification

2022-07-28 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17238.
-
Resolution: Fixed

Issue resolved by pull request 13736
[https://github.com/apache/arrow/pull/13736]

> [Release] Turn off GCS testing during wheel verification
> 
>
> Key: ARROW-17238
> URL: https://issues.apache.org/jira/browse/ARROW-17238
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-17234) [Release][R] Add r-binary-packages to packaging group

2022-07-28 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17234.
-
Resolution: Fixed

Issue resolved by pull request 13734
[https://github.com/apache/arrow/pull/13734]

> [Release][R] Add r-binary-packages to packaging group
> -
>
> Key: ARROW-17234
> URL: https://issues.apache.org/jira/browse/ARROW-17234
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> r-binary-packages is only in the nightly-packaging group and is missing from 
> the release-relevant packaging group.





[jira] [Created] (ARROW-17238) [Release] Turn off GCS testing during wheel verification

2022-07-28 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17238:
---

 Summary: [Release] Turn off GCS testing during wheel verification
 Key: ARROW-17238
 URL: https://issues.apache.org/jira/browse/ARROW-17238
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs
 Fix For: 9.0.0








[jira] [Assigned] (ARROW-17238) [Release] Turn off GCS testing during wheel verification

2022-07-28 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-17238:
---

Assignee: Krisztian Szucs

> [Release] Turn off GCS testing during wheel verification
> 
>
> Key: ARROW-17238
> URL: https://issues.apache.org/jira/browse/ARROW-17238
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Resolved] (ARROW-17237) [Dev][Release] Install wheel test requirements if testing wheels on release verification

2022-07-28 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17237.
-
Fix Version/s: 10.0.0
   (was: 9.0.0)
   Resolution: Fixed

Issue resolved by pull request 13735
[https://github.com/apache/arrow/pull/13735]

> [Dev][Release] Install wheel test requirements if testing wheels on release 
> verification
> 
>
> Key: ARROW-17237
> URL: https://issues.apache.org/jira/browse/ARROW-17237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When running the release verification wheel tasks, we should install the wheel 
> test requirements; otherwise we get the following import error 
> ([https://github.com/ursacomputing/crossbow/runs/7558071074?check_suite_focus=true]):
> {code:java}
> + python -m pytest -r s --pyargs pyarrow
> /tmp/arrow-9.0.0.frvqL/venv-wheel-3.8-manylinux_2_17_x86_64.manylinux2014_x86_64/bin/python:
>  No module named pytest
> Failed to verify release candidate. See /tmp/arrow-9.0.0.frvqL for details.
> 1 {code}
> This has been added to the release:
> [https://github.com/apache/arrow/pull/13729/commits/2a91ba91016634478c84f9081702f8e7cada7529]
> but we should backport to master





[jira] [Updated] (ARROW-17237) [Dev][Release] Install wheel test requirements if testing wheels on release verification

2022-07-28 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17237:

Fix Version/s: 9.0.0
   (was: 10.0.0)

> [Dev][Release] Install wheel test requirements if testing wheels on release 
> verification
> 
>
> Key: ARROW-17237
> URL: https://issues.apache.org/jira/browse/ARROW-17237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When running the release verification wheel tasks, we should install the wheel 
> test requirements; otherwise we get the following import error 
> ([https://github.com/ursacomputing/crossbow/runs/7558071074?check_suite_focus=true]):
> {code:java}
> + python -m pytest -r s --pyargs pyarrow
> /tmp/arrow-9.0.0.frvqL/venv-wheel-3.8-manylinux_2_17_x86_64.manylinux2014_x86_64/bin/python:
>  No module named pytest
> Failed to verify release candidate. See /tmp/arrow-9.0.0.frvqL for details.
> 1 {code}
> This has been added to the release:
> [https://github.com/apache/arrow/pull/13729/commits/2a91ba91016634478c84f9081702f8e7cada7529]
> but we should backport to master





[jira] [Created] (ARROW-17233) [Crossbow] Outdated artifact patterns for certain linux jobs

2022-07-28 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17233:
---

 Summary: [Crossbow] Outdated artifact patterns for certain linux 
jobs
 Key: ARROW-17233
 URL: https://issues.apache.org/jira/browse/ARROW-17233
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs


almalinux-8-arm64 and almalinux-9-arm64:
{code}
  arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-flight-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow-flight-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
  arrow-flight-sql-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-flight-sql-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow-flight-sql-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow[0-9]+-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow[0-9]+-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow[0-9]+-flight-sql-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-glib-devel-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-glib-doc-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow9-glib-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK]
 arrow9-glib-libs-9.0.0-1.el8.aarch64.rpm [ OK]
arrow9-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK]
  arrow9-libs-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-python-devel-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-python-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-python-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
  arrow[0-9]+-python-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
{code}


centos-7-amd64
{code}
  arrow-python-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow[0-9]+-python-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
{code}

centos-8-arm64 and centos-9-arm64:
{code}
 arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-flight-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow-flight-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
  arrow-flight-sql-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-flight-sql-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow-flight-sql-glib-doc-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
   arrow[0-9]+-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow[0-9]+-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-glib-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-glib-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-flight-sql-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow[0-9]+-flight-sql-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
 arrow-glib-devel-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-glib-doc-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow9-glib-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK]
 arrow9-glib-libs-9.0.0-1.el8.aarch64.rpm [ OK]
arrow9-libs-debuginfo-9.0.0-1.el8.aarch64.rpm [ OK]
  arrow9-libs-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-python-devel-9.0.0-1.el8.aarch64.rpm [ OK]
   arrow-python-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
arrow[0-9]+-python-flight-libs-debuginfo-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
  arrow[0-9]+-python-flight-libs-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm [MISSING]
{code}
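
The [MISSING] entries above are regular-expression patterns for which no built file was found. A minimal sketch of such a check, assuming each pattern has to match a complete file name. The helper `find_missing` and the sample file list are hypothetical, the first pattern is assumed to have the same shape as the missing ones, and this is not Crossbow's actual implementation.

```python
import re


def find_missing(patterns, filenames):
    """Return the patterns that match none of the given file names."""
    missing = []
    for pattern in patterns:
        regex = re.compile(pattern)
        # fullmatch requires the whole file name to match the pattern.
        if not any(regex.fullmatch(name) for name in filenames):
            missing.append(pattern)
    return missing


# Patterns in the style of the listing; the built file name is illustrative.
patterns = [
    r"arrow-glib-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm",
    r"arrow-flight-devel-9.0.0-1.[a-z0-9]+.[a-z0-9_]+.rpm",
]
built = ["arrow-glib-devel-9.0.0-1.el8.aarch64.rpm"]

print(find_missing(patterns, built))
```

Only the arrow-flight-devel pattern is reported here. Note that the unescaped dots in these patterns match any character, not only a literal dot.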

ubuntu-bionic-amd64 / ubuntu-bionic-arm64:
{code}
libarrow-python-dev_9.0.0-1_[a-z0-9]+.deb [MISSING]
 libarrow-python-flight-dev_9.0.0-1_[a-z0-9]+.deb [MISSING]
 libarrow-python-flight900-dbgsym_9.0.0-1_[a-z0-9]+.d?deb [MISSING]
  libarrow-python-flight900_9.0.0-1_[a-z0-9]+.deb [MISSING]
libarrow-python900-dbgsym_9.0.0-1_

[jira] [Created] (ARROW-17232) [Release] Missing R binary packages

2022-07-28 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17232:
---

 Summary: [Release] Missing R binary packages
 Key: ARROW-17232
 URL: https://issues.apache.org/jira/browse/ARROW-17232
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs


Seems like the binary upload script now expects some R binaries to upload, but 
the {{packaging}} crossbow task group doesn't contain any relevant tasks. 

I assume the {{r-binary-packages}} should be added to the {{packaging}} group. 

cc [~kou][~raulcd][~assignUser]





[jira] [Resolved] (ARROW-17188) [R] Update news for 9.0.0

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17188.
-
Resolution: Fixed

Issue resolved by pull request 13726
[https://github.com/apache/arrow/pull/13726]

> [R] Update news for 9.0.0
> -
>
> Key: ARROW-17188
> URL: https://issues.apache.org/jira/browse/ARROW-17188
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Affects Versions: 9.0.0
>Reporter: Will Jones
>Assignee: Will Jones
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17227.
-
Resolution: Fixed

Issue resolved by pull request 13725
[https://github.com/apache/arrow/pull/13725]

> [C++] Extend hash-join unit tests to cover both empty and length=0 batches
> --
>
> Key: ARROW-17227
> URL: https://issues.apache.org/jira/browse/ARROW-17227
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Weston Pace
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-15938.
-
Resolution: Fixed

Issue resolved by pull request 13686
[https://github.com/apache/arrow/pull/13686]

> [R][C++] Segfault in left join with empty right table when filtered on 
> partition
> 
>
> Key: ARROW-15938
> URL: https://issues.apache.org/jira/browse/ARROW-15938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 7.0.2
> Environment: ubuntu linux, R4.1.2
>Reporter: Vitalie Spinu
>Assignee: Weston Pace
>Priority: Critical
>  Labels: pull-request-available, query-engine
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the right table in a join is empty as a result of filtering on a 
> partition group, the join segfaults:
> {code:java}
>   library(arrow)
>   library(dplyr)
>   library(glue)
>   df <- mutate(iris, id = runif(n()))
>   dir <- "./tmp/iris"
>   dir.create(glue("{dir}/group=a/"), recursive = TRUE, showWarnings = FALSE)
>   dir.create(glue("{dir}/group=b/"), recursive = TRUE, showWarnings = FALSE)
>   write_parquet(df, glue("{dir}/group=a/part1.parquet"))
>   write_parquet(df, glue("{dir}/group=b/part2.parquet"))
>   db1 <- open_dataset(dir) %>%
>     filter(group == "blabla")
>   open_dataset(dir) %>%
>     filter(group == "b") %>%
>     select(id) %>%
>     left_join(db1, by = "id") %>%
>     collect()
>   {code}
> {code:java}
> ==24063== Thread 7:
> ==24063== Invalid read of size 1
> ==24063==    at 0x1FFE606D: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE68CC: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, 
> int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE84D5: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, 
> arrow::compute::ExecBatch const&) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE8CB4: 
> arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x200011CF: 
> arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB580E: 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> 
> >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FE2B2A0: 
> std::thread::_State_impl
>  > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x92844BF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29)
> ==24063==    by 0x6DD46DA: start_thread (pthread_create.c:463)
> ==24063==    by 0x710D71E: clone (clone.S:95)
> ==24063==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==24063==  *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> Traceback:
>  1: Table__from_RecordBatchReader(self)
>  2: tab$read_table()
>  3: do_exec_plan(x)
>  4: doTryCatch(return(expr), name, parentenv, handler)
>  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  6: tryCatchList(expr, classes, parentenv, handlers)
>  7: tryCatch(tab <- do_exec_plan(x), error = function(e) {    
> handle_csv_read_error(e, x$.data$schema)})
>  8: collect.arrow_dplyr_query(.)
>  9: collect(.)
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>%     
> left_join(db1, by = "id") %>% collect()
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace {code}
> This is arrow from the current master ece0e23f1. 
> It's worth noting that if the right table is filtered on a non-partitioned 
> variable the problem does not occur.





[jira] [Assigned] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-17227:
---

Assignee: Krisztian Szucs  (was: Weston Pace)

> [C++] Extend hash-join unit tests to cover both empty and length=0 batches
> --
>
> Key: ARROW-17227
> URL: https://issues.apache.org/jira/browse/ARROW-17227
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Assigned] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-17227:
---

Assignee: Weston Pace  (was: Krisztian Szucs)

> [C++] Extend hash-join unit tests to cover both empty and length=0 batches
> --
>
> Key: ARROW-17227
> URL: https://issues.apache.org/jira/browse/ARROW-17227
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Weston Pace
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Updated] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-15938:

Fix Version/s: 9.0.0
   (was: 10.0.0)

> [R][C++] Segfault in left join with empty right table when filtered on 
> partition
> 
>
> Key: ARROW-15938
> URL: https://issues.apache.org/jira/browse/ARROW-15938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 7.0.2
> Environment: ubuntu linux, R4.1.2
>Reporter: Vitalie Spinu
>Assignee: Weston Pace
>Priority: Critical
>  Labels: pull-request-available, query-engine
> Fix For: 9.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the right table in a join is empty as a result of filtering on a 
> partition group, the join segfaults:
> {code:java}
>   library(arrow)
>   library(dplyr)
>   library(glue)
>   df <- mutate(iris, id = runif(n()))
>   dir <- "./tmp/iris"
>   dir.create(glue("{dir}/group=a/"), recursive = TRUE, showWarnings = FALSE)
>   dir.create(glue("{dir}/group=b/"), recursive = TRUE, showWarnings = FALSE)
>   write_parquet(df, glue("{dir}/group=a/part1.parquet"))
>   write_parquet(df, glue("{dir}/group=b/part2.parquet"))
>   db1 <- open_dataset(dir) %>%
>     filter(group == "blabla")
>   open_dataset(dir) %>%
>     filter(group == "b") %>%
>     select(id) %>%
>     left_join(db1, by = "id") %>%
>     collect()
>   {code}
> {code:java}
> ==24063== Thread 7:
> ==24063== Invalid read of size 1
> ==24063==    at 0x1FFE606D: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE68CC: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, 
> int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE84D5: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, 
> arrow::compute::ExecBatch const&) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE8CB4: 
> arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x200011CF: 
> arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB580E: 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> 
> >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FE2B2A0: 
> std::thread::_State_impl
>  > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x92844BF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29)
> ==24063==    by 0x6DD46DA: start_thread (pthread_create.c:463)
> ==24063==    by 0x710D71E: clone (clone.S:95)
> ==24063==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==24063==  *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> Traceback:
>  1: Table__from_RecordBatchReader(self)
>  2: tab$read_table()
>  3: do_exec_plan(x)
>  4: doTryCatch(return(expr), name, parentenv, handler)
>  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  6: tryCatchList(expr, classes, parentenv, handlers)
>  7: tryCatch(tab <- do_exec_plan(x), error = function(e) {    
> handle_csv_read_error(e, x$.data$schema)})
>  8: collect.arrow_dplyr_query(.)
>  9: collect(.)
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>%     
> left_join(db1, by = "id") %>% collect()
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace {code}
> This is arrow from the current master ece0e23f1. 
> It's worth noting that if the right table is filtered on a non-partitioned 
> variable the problem does not occur.





[jira] [Updated] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17227:

Issue Type: Test  (was: Improvement)

> [C++] Extend hash-join unit tests to cover both empty and length=0 batches
> --
>
> Key: ARROW-17227
> URL: https://issues.apache.org/jira/browse/ARROW-17227
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Weston Pace
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Created] (ARROW-17227) [C++] Extend hash-join unit tests to cover both empty and length=0 batches

2022-07-27 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17227:
---

 Summary: [C++] Extend hash-join unit tests to cover both empty and 
length=0 batches
 Key: ARROW-17227
 URL: https://issues.apache.org/jira/browse/ARROW-17227
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs
Assignee: Weston Pace
 Fix For: 9.0.0








[jira] [Updated] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-15938:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [R][C++] Segfault in left join with empty right table when filtered on 
> partition
> 
>
> Key: ARROW-15938
> URL: https://issues.apache.org/jira/browse/ARROW-15938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 7.0.2
> Environment: ubuntu linux, R4.1.2
>Reporter: Vitalie Spinu
>Assignee: Weston Pace
>Priority: Critical
>  Labels: pull-request-available, query-engine
> Fix For: 10.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the right table in a join is empty as a result of filtering on a 
> partition group, the join segfaults:
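> {code:java}
>   library(arrow)
>   library(dplyr)  # provides mutate(), filter(), select(), left_join(), %>%
>   library(glue)
>
>   # Write the same data into two hive-style partitions: group=a and group=b
>   df <- mutate(iris, id = runif(n()))
>   dir <- "./tmp/iris"
>   dir.create(glue("{dir}/group=a/"), recursive = TRUE, showWarnings = FALSE)
>   dir.create(glue("{dir}/group=b/"), recursive = TRUE, showWarnings = FALSE)
>   write_parquet(df, glue("{dir}/group=a/part1.parquet"))
>   write_parquet(df, glue("{dir}/group=b/part2.parquet"))
>
>   # Right table: filtering on a partition value that does not exist yields no rows
>   db1 <- open_dataset(dir) %>%
>     filter(group == "blabla")
>
>   # Left join against the empty right table triggers the segfault
>   open_dataset(dir) %>%
>     filter(group == "b") %>%
>     select(id) %>%
>     left_join(db1, by = "id") %>%
>     collect()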
>   {code}
> {code:java}
> ==24063== Thread 7:
> ==24063== Invalid read of size 1
> ==24063==    at 0x1FFE606D: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE68CC: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, 
> int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE84D5: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, 
> arrow::compute::ExecBatch const&) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE8CB4: 
> arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x200011CF: 
> arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB580E: 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> 
> >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FE2B2A0: 
> std::thread::_State_impl
>  > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x92844BF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29)
> ==24063==    by 0x6DD46DA: start_thread (pthread_create.c:463)
> ==24063==    by 0x710D71E: clone (clone.S:95)
> ==24063==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==24063==  *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> 
> Traceback:
>  1: Table__from_RecordBatchReader(self)
>  2: tab$read_table()
>  3: do_exec_plan(x)
>  4: doTryCatch(return(expr), name, parentenv, handler)
>  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  6: tryCatchList(expr, classes, parentenv, handlers)
>  7: tryCatch(tab <- do_exec_plan(x), error = function(e) {    
> handle_csv_read_error(e, x$.data$schema)})
>  8: collect.arrow_dplyr_query(.)
>  9: collect(.)
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>%     
> left_join(db1, by = "id") %>% collect()
> 
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace {code}
> This is arrow from the current master ece0e23f1. 
> It's worth noting that if the right table is filtered on a non-partitioned 
> variable the problem does not occur.





[jira] [Commented] (ARROW-15938) [R][C++] Segfault in left join with empty right table when filtered on partition

2022-07-27 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571970#comment-17571970
 ] 

Krisztian Szucs commented on ARROW-15938:
-

Postponing to 10.0.

> [R][C++] Segfault in left join with empty right table when filtered on 
> partition
> 
>
> Key: ARROW-15938
> URL: https://issues.apache.org/jira/browse/ARROW-15938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 7.0.2
> Environment: ubuntu linux, R4.1.2
>Reporter: Vitalie Spinu
>Assignee: Weston Pace
>Priority: Critical
>  Labels: pull-request-available, query-engine
> Fix For: 9.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the right table in a join is empty as a result of filtering on a 
> partition group, the join segfaults:
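> {code:java}
>   library(arrow)
>   library(dplyr)  # provides mutate(), filter(), select(), left_join(), %>%
>   library(glue)
>
>   # Write the same data into two hive-style partitions: group=a and group=b
>   df <- mutate(iris, id = runif(n()))
>   dir <- "./tmp/iris"
>   dir.create(glue("{dir}/group=a/"), recursive = TRUE, showWarnings = FALSE)
>   dir.create(glue("{dir}/group=b/"), recursive = TRUE, showWarnings = FALSE)
>   write_parquet(df, glue("{dir}/group=a/part1.parquet"))
>   write_parquet(df, glue("{dir}/group=b/part2.parquet"))
>
>   # Right table: filtering on a partition value that does not exist yields no rows
>   db1 <- open_dataset(dir) %>%
>     filter(group == "blabla")
>
>   # Left join against the empty right table triggers the segfault
>   open_dataset(dir) %>%
>     filter(group == "b") %>%
>     select(id) %>%
>     left_join(db1, by = "id") %>%
>     collect()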
>   {code}
> {code:java}
> ==24063== Thread 7:
> ==24063== Invalid read of size 1
> ==24063==    at 0x1FFE606D: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(long, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*, 
> arrow::compute::ExecBatch*, arrow::compute::ExecBatch*) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE68CC: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch_OutputOne(unsigned long, long, 
> int const*, int const*) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE84D5: 
> arrow::compute::HashJoinBasicImpl::ProbeBatch(unsigned long, 
> arrow::compute::ExecBatch const&) (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFE8CB4: 
> arrow::compute::HashJoinBasicImpl::InputReceived(unsigned long, int, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x200011CF: 
> arrow::compute::HashJoinNode::InputReceived(arrow::compute::ExecNode*, 
> arrow::compute::ExecBatch) (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB580E: 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#1}::operator()() const (in 
> /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FFB6444: arrow::internal::FnOnce ()>::FnImpl (arrow::Future, 
> arrow::compute::MapNode::SubmitTask(std::function
>  (arrow::compute::ExecBatch)>, 
> arrow::compute::ExecBatch)::{lambda()#2}::operator()() const::{lambda()#1})> 
> >::invoke() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x1FE2B2A0: 
> std::thread::_State_impl
>  > >::_M_run() (in /home/vspinu/bin/arrow/lib/libarrow.so.800.0.0)
> ==24063==    by 0x92844BF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29)
> ==24063==    by 0x6DD46DA: start_thread (pthread_create.c:463)
> ==24063==    by 0x710D71E: clone (clone.S:95)
> ==24063==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==24063==  *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> 
> Traceback:
>  1: Table__from_RecordBatchReader(self)
>  2: tab$read_table()
>  3: do_exec_plan(x)
>  4: doTryCatch(return(expr), name, parentenv, handler)
>  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  6: tryCatchList(expr, classes, parentenv, handlers)
>  7: tryCatch(tab <- do_exec_plan(x), error = function(e) {    
> handle_csv_read_error(e, x$.data$schema)})
>  8: collect.arrow_dplyr_query(.)
>  9: collect(.)
> 10: open_dataset(dir) %>% filter(group == "b") %>% select(id) %>%     
> left_join(db1, by = "id") %>% collect()
> 
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace {code}
> This is arrow from the current master ece0e23f1. 
> It's worth noting that if the right table is filtered on a non-partitioned 
> variable the problem does not occur.





[jira] [Updated] (ARROW-16276) [R] Release News

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16276:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [R] Release News
> 
>
> Key: ARROW-16276
> URL: https://issues.apache.org/jira/browse/ARROW-16276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Jonathan Keane
>Assignee: Will Jones
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I typically use a command like:
> {code}
> git log fcab481 --grep=".*\[R\].*" --format="%s"
> {code}
> Which will find all the commits with {{[R]}}, since commit fcab481. I found 
> commit fcab481 by going to the 7.0.0 release branch and then finding the last 
> commit that is in the master branch as well as the 7.0.0 release. 





[jira] [Updated] (ARROW-16035) [Java] Arrow to JDBC ArrowVectorIterator with does not terminate with empty result set

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16035:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [Java] Arrow to JDBC ArrowVectorIterator with does not terminate with empty 
> result set
> --
>
> Key: ARROW-16035
> URL: https://issues.apache.org/jira/browse/ARROW-16035
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 7.0.0
>Reporter: Jonathan Swenson
>Assignee: Todd Farmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Using an ArrowVectorIterator built from a JDBC Result Set that is empty 
> causes the iterator to never terminate. 
> {code:java}
> ArrowVectorIterator iterator =
> JdbcToArrow.sqlToArrowVectorIterator(conn.createStatement()
> .executeQuery("select 1 from table1 where false"), config); {code}
>  
> It appears as though this is due to the implementation of the 
> [hasNext()|https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L158]
>  method.
> The implementation expects the `isAfterLast()` method on a JDBC result set 
> to return true when the result set is empty. However, according to the [JDBC 
> documentation|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()]
>  it always returns false when the result set is empty. 
> {quote}Returns: {{true}} if the cursor is after the last row; {{false}} if 
> the cursor is at any other position or the result set contains no rows
> {quote}
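
The cursor behaviour quoted above can be illustrated with a small Python model. This is only a sketch: `FakeResultSet`, `naive_has_next` and `poll` are hypothetical names invented for this illustration, not Arrow or JDBC APIs, and the model covers only the `next()` / `isAfterLast()` semantics described in the JDBC documentation. It shows why a `hasNext()` check built solely on `isAfterLast()` never turns false for an empty result set.

```python
class FakeResultSet:
    """Toy stand-in for the cursor semantics of java.sql.ResultSet (hypothetical)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0  # 0 = before the first row, len(rows) + 1 = after the last row

    def next(self):
        # Advance the cursor; return True only while it lands on a real row.
        if self._pos <= len(self._rows):
            self._pos += 1
        return 1 <= self._pos <= len(self._rows)

    def is_after_last(self):
        # Per the JDBC documentation: always False when the result set has no rows.
        return bool(self._rows) and self._pos > len(self._rows)


def naive_has_next(rs):
    # Assumed shape of the problematic check: rely on is_after_last() alone.
    return not rs.is_after_last()


def poll(rs, has_next, limit):
    """Call next() while has_next(rs) is true, giving up after `limit` calls."""
    calls = 0
    while calls < limit and has_next(rs):
        rs.next()
        calls += 1
    return calls
```

With a non-empty result set the loop stops on its own once the cursor moves past the last row. With an empty one it only stops because of the `limit` guard, which mirrors the non-terminating iterator described in the report.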
>  





[jira] [Resolved] (ARROW-15568) [C++][Gandiva] Implement Translate Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-15568.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

> [C++][Gandiva] Implement Translate Function
> ---
>
> Key: ARROW-15568
> URL: https://issues.apache.org/jira/browse/ARROW-15568
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Vinicius Souza Roque
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Translates the input string by replacing the characters present in the 
> {{from}} string with the corresponding characters in the {{to}} string. This 
> is similar to the {{translate}} function in 
> [PostgreSQL|http://www.postgresql.org/docs/9.1/interactive/functions-string.html].
>  If any of the parameters to this UDF are NULL, the result is NULL as well.
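
The semantics described above can be sketched in Python. This is an illustration only, not the Gandiva implementation: the function name and signature are assumptions, and two details are borrowed from PostgreSQL's `translate` rather than confirmed for Gandiva, namely that characters in `from` with no counterpart in `to` are removed and that the first occurrence of a repeated `from` character wins.

```python
def translate(text, from_chars, to_chars):
    """Replace each character of `text` found in `from_chars` with the
    character at the same position in `to_chars` (PostgreSQL-like sketch)."""
    # NULL in, NULL out: any missing argument yields None.
    if text is None or from_chars is None or to_chars is None:
        return None

    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # assumption: the first occurrence in from_chars wins
        # Assumption: characters without a counterpart in to_chars are dropped.
        mapping[ch] = to_chars[i] if i < len(to_chars) else ""

    return "".join(mapping.get(ch, ch) for ch in text)
```

For example, `translate("12345", "143", "ax")` maps `1` to `a` and `4` to `x`, and drops `3` because `to` has no third character.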





[jira] [Assigned] (ARROW-15568) [C++][Gandiva] Implement Translate Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-15568:
---

Assignee: Vinicius Souza Roque

> [C++][Gandiva] Implement Translate Function
> ---
>
> Key: ARROW-15568
> URL: https://issues.apache.org/jira/browse/ARROW-15568
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Vinicius Souza Roque
>Assignee: Vinicius Souza Roque
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Translates the input string by replacing the characters present in the 
> {{from}} string with the corresponding characters in the {{to}} string. This 
> is similar to the {{translate}} function in 
> [PostgreSQL|http://www.postgresql.org/docs/9.1/interactive/functions-string.html].
>  If any of the parameters to this UDF are NULL, the result is NULL as well.





[jira] [Reopened] (ARROW-17035) [C++][Gandiva] Add Ceil Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-17035:
-

> [C++][Gandiva] Add Ceil Function
> 
>
> Key: ARROW-17035
> URL: https://issues.apache.org/jira/browse/ARROW-17035
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implementing Ceil Function.





[jira] [Resolved] (ARROW-17035) [C++][Gandiva] Add Ceil Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17035.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

> [C++][Gandiva] Add Ceil Function
> 
>
> Key: ARROW-17035
> URL: https://issues.apache.org/jira/browse/ARROW-17035
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implementing Ceil Function.





[jira] [Resolved] (ARROW-16784) [C++][Gandiva] Add alias to Upper and Lower

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16784.
-
Resolution: Fixed

> [C++][Gandiva] Add alias to Upper and Lower
> ---
>
> Key: ARROW-16784
> URL: https://issues.apache.org/jira/browse/ARROW-16784
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Vinicius Souza Roque
>Assignee: Vinicius Souza Roque
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding alias to functions Upper and Lower





[jira] [Updated] (ARROW-16784) [C++][Gandiva] Add alias to Upper and Lower

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16784:

Fix Version/s: 9.0.0

> [C++][Gandiva] Add alias to Upper and Lower
> ---
>
> Key: ARROW-16784
> URL: https://issues.apache.org/jira/browse/ARROW-16784
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Vinicius Souza Roque
>Assignee: Vinicius Souza Roque
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding alias to functions Upper and Lower





[jira] [Updated] (ARROW-16455) [CI] [Packaging] Anaconda storage size exceeded for linux-ppc64le

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16455:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [CI] [Packaging] Anaconda storage size exceeded for linux-ppc64le 
> --
>
> Key: ARROW-16455
> URL: https://issues.apache.org/jira/browse/ARROW-16455
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Packaging
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Our Anaconda storage size for nightlies is exceeded:
> {code:java}
> "[ERROR] ('Storage requirements exceeded (3221225472 bytes). Payment is 
> required to add a file. Please go to 
> https://anaconda.org/binstar.settings/billing to update your plan', 402)" 
> {code}
> It seems we forgot to add *linux-ppc64le* to the architectures list in this 
> fix: [https://github.com/apache/arrow/pull/12604]
> See original issue: https://issues.apache.org/jira/browse/ARROW-15898





[jira] [Updated] (ARROW-17140) Adding Floor Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17140:

Fix Version/s: 9.0.0

> Adding Floor Function
> -
>
> Key: ARROW-17140
> URL: https://issues.apache.org/jira/browse/ARROW-17140
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding Floor Function





[jira] [Resolved] (ARROW-17140) Adding Floor Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17140.
-
Resolution: Fixed

> Adding Floor Function
> -
>
> Key: ARROW-17140
> URL: https://issues.apache.org/jira/browse/ARROW-17140
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding Floor Function





[jira] [Reopened] (ARROW-17140) Adding Floor Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-17140:
-

> Adding Floor Function
> -
>
> Key: ARROW-17140
> URL: https://issues.apache.org/jira/browse/ARROW-17140
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding Floor Function





[jira] [Assigned] (ARROW-17140) Adding Floor Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-17140:
---

Assignee: Sahaj Gupta

> Adding Floor Function
> -
>
> Key: ARROW-17140
> URL: https://issues.apache.org/jira/browse/ARROW-17140
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding Floor Function





[jira] [Updated] (ARROW-16413) [Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16413:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem
> ---
>
> Key: ARROW-16413
> URL: https://issues.apache.org/jira/browse/ARROW-16413
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> See https://github.com/dask/dask/pull/8993 for details. 
> When using an fsspec filesystem (or maybe more generally a PyFileSystem), 
> inspecting a file through FileFormat.inspect hangs (this happens, e.g., 
> in ParquetDatasetFactory).





[jira] [Updated] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16881:

Fix Version/s: 9.0.0

> [Gandiva][C++] Fix castINTERVALYEAR implementation
> --
>
> Key: ARROW-16881
> URL: https://issues.apache.org/jira/browse/ARROW-16881
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Fix an error where LLVM could not find this function.
> Fix regex to allow negative digits for Interval Year.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-16881:
-

> [Gandiva][C++] Fix castINTERVALYEAR implementation
> --
>
> Key: ARROW-16881
> URL: https://issues.apache.org/jira/browse/ARROW-16881
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Fix an error where LLVM could not find this function.
> Fix regex to allow negative digits for Interval Year.





[jira] [Resolved] (ARROW-16881) [Gandiva][C++] Fix castINTERVALYEAR implementation

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16881.
-
Resolution: Fixed

> [Gandiva][C++] Fix castINTERVALYEAR implementation
> --
>
> Key: ARROW-16881
> URL: https://issues.apache.org/jira/browse/ARROW-16881
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Fix an error where LLVM could not find this function.
> Fix regex to allow negative digits for Interval Year.





[jira] [Updated] (ARROW-16442) [Python] The fragments for ORC dataset return base Fragment instead of FileFragment

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16442:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [Python] The fragments for ORC dataset return base Fragment instead of 
> FileFragment
> ---
>
> Key: ARROW-16442
> URL: https://issues.apache.org/jira/browse/ARROW-16442
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: dataset, dataset-dask-integration, pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From https://github.com/dask/dask/pull/8944#issuecomment-1112620037
> For the ORC file format, we return base {{Fragment}} objects instead of the 
> {{FileFragment}} subclass (which has more functionality):
> {code:python}
> import pyarrow as pa
> import pyarrow.dataset as ds
> from pyarrow import orc
> table = pa.table({'a': [1, 2, 3]})
> orc.write_table(table, "test.orc")
> dataset = ds.dataset("test.orc", format="orc")
> fragment = list(dataset.get_fragments())[0]
> {code}
> {code}
> In [9]: fragment
> Out[9]: 
> In [10]: fragment.path
> ---
> AttributeError                            Traceback (most recent call last)
>  in 
> ----> 1 fragment.path
> AttributeError: 'pyarrow._dataset.Fragment' object has no attribute 'path'
> {code}





[jira] [Reopened] (ARROW-17036) [C++][Gandiva] Add sign Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-17036:
-

> [C++][Gandiva] Add sign Function
> 
>
> Key: ARROW-17036
> URL: https://issues.apache.org/jira/browse/ARROW-17036
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Implementing Sign Function.





[jira] [Reopened] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-15661:
-

> [Gandiva][C++] Add Mask_Hash function
> -
>
> Key: ARROW-15661
> URL: https://issues.apache.org/jira/browse/ARROW-15661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Returns a hashed value based on str. The hash is consistent and can be used 
> to join masked values together across tables. 





[jira] [Resolved] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-15661.
-
Resolution: Fixed

> [Gandiva][C++] Add Mask_Hash function
> -
>
> Key: ARROW-15661
> URL: https://issues.apache.org/jira/browse/ARROW-15661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Returns a hashed value based on str. The hash is consistent and can be used 
> to join masked values together across tables. 





[jira] [Updated] (ARROW-17036) [C++][Gandiva] Add sign Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17036:

Fix Version/s: 9.0.0

> [C++][Gandiva] Add sign Function
> 
>
> Key: ARROW-17036
> URL: https://issues.apache.org/jira/browse/ARROW-17036
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Implementing Sign Function.





[jira] [Updated] (ARROW-15661) [Gandiva][C++] Add Mask_Hash function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-15661:

Fix Version/s: 9.0.0

> [Gandiva][C++] Add Mask_Hash function
> -
>
> Key: ARROW-15661
> URL: https://issues.apache.org/jira/browse/ARROW-15661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Johnnathan Rodrigo Pego de Almeida
>Assignee: Johnnathan Rodrigo Pego de Almeida
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Returns a hashed value based on str. The hash is consistent and can be used 
> to join masked values together across tables. 





[jira] [Resolved] (ARROW-17036) [C++][Gandiva] Add sign Function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17036.
-
Resolution: Fixed

> [C++][Gandiva] Add sign Function
> 
>
> Key: ARROW-17036
> URL: https://issues.apache.org/jira/browse/ARROW-17036
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Sahaj Gupta
>Assignee: Sahaj Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Implementing Sign Function.





[jira] [Resolved] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17070.
-
Resolution: Fixed

> [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
> --
>
> Key: ARROW-17070
> URL: https://issues.apache.org/jira/browse/ARROW-17070
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding functions to Gandiva:
> mask_show_first_n(string, int)
> mask_show_last_n(string, int)
> 'Masking' according to Hive specification
> (a-z : x, A-Z : X, 0-9 : n)
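
The character mapping listed above can be sketched as follows. This is an illustration in Python rather than the Gandiva code. It assumes ASCII-only classification, leaves every other character unchanged, and treats a negative count as zero; those choices are assumptions, not taken from the issue.

```python
def _mask_char(ch):
    """Map one character using the rule a-z -> x, A-Z -> X, 0-9 -> n."""
    if "a" <= ch <= "z":
        return "x"
    if "A" <= ch <= "Z":
        return "X"
    if "0" <= ch <= "9":
        return "n"
    return ch  # punctuation, spaces and non-ASCII text are left as-is


def mask_show_first_n(text, n):
    """Keep the first n characters and mask the rest."""
    n = max(n, 0)
    return text[:n] + "".join(_mask_char(c) for c in text[n:])


def mask_show_last_n(text, n):
    """Mask everything except the last n characters."""
    keep_from = max(len(text) - max(n, 0), 0)
    return "".join(_mask_char(c) for c in text[:keep_from]) + text[keep_from:]
```

For the input "Abc-1234" and n = 2, the first function yields "Abx-nnnn" and the second yields "Xxx-nn34".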





[jira] [Updated] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17070:

Fix Version/s: 9.0.0

> [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
> --
>
> Key: ARROW-17070
> URL: https://issues.apache.org/jira/browse/ARROW-17070
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding functions to Gandiva:
> mask_show_first_n(string, int)
> mask_show_last_n(string, int)
> 'Masking' according to Hive specification
> (a-z : x, A-Z : X, 0-9 : n)





[jira] [Reopened] (ARROW-17070) [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-17070:
-

> [Gandiva][C++] Adding mask-show-first-n and mask-show-last-n functions
> --
>
> Key: ARROW-17070
> URL: https://issues.apache.org/jira/browse/ARROW-17070
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding functions to Gandiva:
> mask_show_first_n(string, int)
> mask_show_last_n(string, int)
> 'Masking' according to Hive specification
> (a-z : x, A-Z : X, 0-9 : n)





[jira] [Resolved] (ARROW-17121) [Gandiva][C++] Adding mask function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17121.
-
Resolution: Fixed

> [Gandiva][C++] Adding mask function
> ---
>
> Key: ARROW-17121
> URL: https://issues.apache.org/jira/browse/ARROW-17121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to 
> Gandiva.
> By default, upper case letters are masked as 'X', lower case letters as 'x' and 
> numbers as 'n'.
> Custom mask characters can optionally be passed as the extra parameters.
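
The optional-parameter behaviour described above can be sketched like this. It is a Python illustration under stated assumptions, not the Gandiva implementation: the parameter names mirror the signature in the issue, each mask is assumed to be a single character, and only ASCII letters and digits are classified.

```python
def mask(inp, uc_mask="X", lc_mask="x", num_mask="n"):
    """Replace upper case, lower case and digit characters with mask characters."""
    out = []
    for ch in inp:
        if "A" <= ch <= "Z":
            out.append(uc_mask)
        elif "a" <= ch <= "z":
            out.append(lc_mask)
        elif "0" <= ch <= "9":
            out.append(num_mask)
        else:
            out.append(ch)  # anything else is passed through unchanged
    return "".join(out)
```

With the defaults, mask("Arrow-9.0") gives "Xxxxx-n.n"; with custom masks, mask("Arrow-9.0", "U", "l", "#") gives "Ullll-#.#".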





[jira] [Reopened] (ARROW-17121) [Gandiva][C++] Adding mask function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reopened ARROW-17121:
-

> [Gandiva][C++] Adding mask function
> ---
>
> Key: ARROW-17121
> URL: https://issues.apache.org/jira/browse/ARROW-17121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to 
> Gandiva.
> By default, upper case letters are masked as 'X', lower case letters as 'x' and 
> numbers as 'n'.
> Custom mask characters can optionally be passed as the extra parameters.





[jira] [Updated] (ARROW-17121) [Gandiva][C++] Adding mask function

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17121:

Fix Version/s: 9.0.0

> [Gandiva][C++] Adding mask function
> ---
>
> Key: ARROW-17121
> URL: https://issues.apache.org/jira/browse/ARROW-17121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Palak Pariawala
>Assignee: Palak Pariawala
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add mask(str inp[, str uc-mask[, str lc-mask[, str num-mask]]]) function to 
> Gandiva.
> By default, upper case letters are masked as 'X', lower case letters as 'x' and 
> numbers as 'n'.
> Custom mask characters can optionally be passed as the extra parameters.





[jira] [Updated] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16897:

Affects Version/s: 9.0.0

> [R][C++] Full join on Arrow objects is incorrect
> 
>
> Key: ARROW-16897
> URL: https://issues.apache.org/jira/browse/ARROW-16897
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 8.0.0, 9.0.0
> Environment: Linux
>Reporter: Oliver Reiter
>Assignee: Weston Pace
>Priority: Critical
>  Labels: joins, query-engine
> Fix For: 10.0.0
>
>
> Hello,
> I am trying to do a full join on a dataset. It produces the correct number of 
> observations, but not the correct result (the resulting data.frame is just 
> filled up with NA-rows).
> My use case: I want to include the 'full' year range for every factor value:
> {code:java}
> library(data.table)
> library(arrow)
> library(dplyr)
> year_range <- 2000:2019
> group_n <- 100
> N <- 1000 ## the resulting data should have 100 groups * 20 years
> dt <- data.table(value = rnorm(N),
>                  group = rep(paste0("g", 1:group_n), length.out = N))
> ## there are only observations for some years in every group
> dt[, year := sample(year_range, size = N / group_n), by = .(group)]
> dt[group == "g1", ]
> ## this would be the 'full' data.table
> group_years <- data.table(group = rep(unique(dt$group), each = 20),
>                           year = rep(year_range, times = 10))
> group_years[group == "g1", ]
> write_dataset(dt, path = "parquet_db")
> db <- open_dataset(sources = "parquet_db")
> ## full_join using data.table -> expected result
> db_full <- merge(dt, group_years,
>                  by = c("group", "year"),
>                  all = TRUE)
> setorder(db_full, group, year)
> db_full[group == "g1", ]
> ## try to do the full_join with arrow -> incorrect result
> db_full_arrow <- db |>
>   full_join(group_years, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]
> ## or: convert data.table to arrow_table beforehand -> incorrect result
> group_years_arrow <- group_years |>
>   as_arrow_table()
> db_full_arrow <- db |>
>   full_join(group_years_arrow, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]{code}
> The [documentation|https://arrow.apache.org/docs/r/] says equality joins are 
> supported, which should hold also for `full_join` I guess?
> Thanks for your time and work!
>  
> Oliver
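
For reference, the semantics the reporter expects from a full join on (group, year) can be sketched without Arrow. The plain-Python helper below is hypothetical and uses a much smaller data set than the report; it only illustrates that unmatched rows keep their key columns and get a missing value for the other columns.

```python
def full_join(left, right, keys):
    """Full outer join of two lists of dicts on the given key columns."""

    def key_of(row):
        return tuple(row[k] for k in keys)

    # Collect every column name so unmatched rows can be padded with None.
    columns = []
    for row in left + right:
        for col in row:
            if col not in columns:
                columns.append(col)

    right_by_key = {}
    for row in right:
        right_by_key.setdefault(key_of(row), []).append(row)

    result, matched = [], set()
    for lrow in left:
        key = key_of(lrow)
        partners = right_by_key.get(key)
        if partners:
            matched.add(key)
            for rrow in partners:
                result.append({**dict.fromkeys(columns), **rrow, **lrow})
        else:
            result.append({**dict.fromkeys(columns), **lrow})

    # Rows that exist only on the right side keep their keys.
    for key, rows in right_by_key.items():
        if key not in matched:
            for rrow in rows:
                result.append({**dict.fromkeys(columns), **rrow})
    return result


observed = [
    {"group": "g1", "year": 2000, "value": 0.5},
    {"group": "g2", "year": 2001, "value": -1.2},
]
group_years = [
    {"group": g, "year": y} for g in ("g1", "g2") for y in (2000, 2001, 2002)
]
full = sorted(
    full_join(observed, group_years, ["group", "year"]),
    key=lambda r: (r["group"], r["year"]),
)
```

Here full has one row per (group, year) pair: two rows carry an observed value, the other four have value None, and no row has a missing group or year. Rows whose key columns are themselves missing would correspond to the incorrect NA-filled output described in the report.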





[jira] [Updated] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16897:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [R][C++] Full join on Arrow objects is incorrect
> 
>
> Key: ARROW-16897
> URL: https://issues.apache.org/jira/browse/ARROW-16897
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 8.0.0
> Environment: Linux
>Reporter: Oliver Reiter
>Assignee: Weston Pace
>Priority: Critical
>  Labels: joins, query-engine
> Fix For: 10.0.0
>
>
> Hello,
> I am trying to do a full join on a dataset. It produces the correct number of 
> observations, but not the correct result (the resulting data.frame is just 
> filled up with NA-rows).
> My use case: I want to include the 'full' year range for every factor value:
> {code:java}
> library(data.table)
> library(arrow)
> library(dplyr)
> year_range <- 2000:2019
> group_n <- 100
> N <- 1000 ## the resulting data should have 100 groups * 20 years
> dt <- data.table(value = rnorm(N),
>                  group = rep(paste0("g", 1:group_n), length.out = N))
> ## there are only observations for some years in every group
> dt[, year := sample(year_range, size = N / group_n), by = .(group)]
> dt[group == "g1", ]
> ## this would be the 'full' data.table
> group_years <- data.table(group = rep(unique(dt$group), each = 20),
>                           year = rep(year_range, times = 10))
> group_years[group == "g1", ]
> write_dataset(dt, path = "parquet_db")
> db <- open_dataset(sources = "parquet_db")
> ## full_join using data.table -> expected result
> db_full <- merge(dt, group_years,
>                  by = c("group", "year"),
>                  all = TRUE)
> setorder(db_full, group, year)
> db_full[group == "g1", ]
> ## try to do the full_join with arrow -> incorrect result
> db_full_arrow <- db |>
>   full_join(group_years, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]
> ## or: convert data.table to arrow_table beforehand -> incorrect result
> group_years_arrow <- group_years |>
>   as_arrow_table()
> db_full_arrow <- db |>
>   full_join(group_years_arrow, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]{code}
> The [documentation|https://arrow.apache.org/docs/r/] says equality joins are 
> supported, which should hold also for `full_join` I guess?
> Thanks for your time and work!
>  
> Oliver





[jira] [Commented] (ARROW-16897) [R][C++] Full join on Arrow objects is incorrect

2022-07-27 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571887#comment-17571887
 ] 

Krisztian Szucs commented on ARROW-16897:
-

Postponing to 10.0 since it depends on several other unresolved issues.

> [R][C++] Full join on Arrow objects is incorrect
> 
>
> Key: ARROW-16897
> URL: https://issues.apache.org/jira/browse/ARROW-16897
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Affects Versions: 8.0.0
> Environment: Linux
>Reporter: Oliver Reiter
>Assignee: Weston Pace
>Priority: Critical
>  Labels: joins, query-engine
> Fix For: 9.0.0
>
>
> Hello,
> I am trying to do a full join on a dataset. It produces the correct number of 
> observations, but not the correct result (the resulting data.frame is just 
> filled up with NA-rows).
> My use case: I want to include the 'full' year range for every factor value:
> {code:java}
> library(data.table)
> library(arrow)
> library(dplyr)
> year_range <- 2000:2019
> group_n <- 100
> N <- 1000 ## the resulting data should have 100 groups * 20 years
> dt <- data.table(value = rnorm(N),
>                  group = rep(paste0("g", 1:group_n), length.out = N))
> ## there are only observations for some years in every group
> dt[, year := sample(year_range, size = N / group_n), by = .(group)]
> dt[group == "g1", ]
> ## this would be the 'full' data.table
> group_years <- data.table(group = rep(unique(dt$group), each = 20),
>                           year = rep(year_range, times = 10))
> group_years[group == "g1", ]
> write_dataset(dt, path = "parquet_db")
> db <- open_dataset(sources = "parquet_db")
> ## full_join using data.table -> expected result
> db_full <- merge(dt, group_years,
>                  by = c("group", "year"),
>                  all = TRUE)
> setorder(db_full, group, year)
> db_full[group == "g1", ]
> ## try to do the full_join with arrow -> incorrect result
> db_full_arrow <- db |>
>   full_join(group_years, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]
> ## or: convert data.table to arrow_table beforehand -> incorrect result
> group_years_arrow <- group_years |>
>   as_arrow_table()
> db_full_arrow <- db |>
>   full_join(group_years_arrow, by = c("group", "year")) |>
>   collect() |>
>   setDT()
> setorder(db_full_arrow, group, year)
> db_full_arrow[group == "g1", ]{code}
> The [documentation|https://arrow.apache.org/docs/r/] says equality joins are 
> supported, which should hold also for `full_join` I guess?
> Thanks for your time and work!
>  
> Oliver





[jira] [Resolved] (ARROW-17206) [R] Skip test to fix snappy sanitizer issue

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17206.
-
Resolution: Fixed

Issue resolved by pull request 13704
[https://github.com/apache/arrow/pull/13704]

> [R] Skip test to fix snappy sanitizer issue
> ---
>
> Key: ARROW-17206
> URL: https://issues.apache.org/jira/browse/ARROW-17206
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Known bug with snappy in a new test.





[jira] [Resolved] (ARROW-17211) [Java] Fix java-jar nightly on gh & self-hosted runners

2022-07-27 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-17211.
-
Resolution: Fixed

Issue resolved by pull request 13712
[https://github.com/apache/arrow/pull/13712]

> [Java] Fix java-jar nightly on gh & self-hosted runners
> ---
>
> Key: ARROW-17211
> URL: https://issues.apache.org/jira/browse/ARROW-17211
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [ARROW-16943] added clean up to {{java_full_build.sh}} to fix issues with 
> multiple jars when the job was run on a self-hosted (aka non-ephemeral) 
> runner. This fails when {{~/.m2}} does not exist.
> I marked this as a blocker because this prevents us from building the release 
> Jars.
> cc: [~kszucs] [~raulcd]
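
The failure mode above is a cleanup step that assumes its target directory exists. A minimal sketch of a cleanup that tolerates a missing directory is shown below, in Python rather than the shell used by the build script. The function name and the temporary directory are hypothetical, and this is not the change made in the pull request.

```python
import shutil
import tempfile
from pathlib import Path


def clean_local_repo(repo_dir):
    """Remove repo_dir if present; return True when something was removed."""
    path = Path(repo_dir)
    if not path.is_dir():
        return False  # fresh (ephemeral) runner: nothing to clean up
    shutil.rmtree(path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "m2-sketch"
    first = clean_local_repo(repo)   # directory absent: no error, no work
    repo.mkdir()
    second = clean_local_repo(repo)  # directory present: removed
```

The same function then behaves correctly on both ephemeral and self-hosted runners.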





[jira] [Updated] (ARROW-17051) [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN

2022-07-26 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17051:

Fix Version/s: 9.0.0
   (was: 10.0.0)

> [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN
> -
>
> Key: ARROW-17051
> URL: https://issues.apache.org/jira/browse/ARROW-17051
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Raúl Cumplido
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The CI job for ASAN UBSAN is based on Ubuntu 20.04: *C++ / AMD64 Ubuntu 20.04 
> C++ ASAN UBSAN*  
> Once Flight and Flight SQL are built on Ubuntu 20.04, the ASAN UBSAN job 
> also builds with Flight and Flight SQL. This triggers some 
> arrow-flight-sql-test failures like:
> {code:java}
>   [ RUN      ] TestFlightSqlClient.TestGetDbSchemas
> unknown file: Failure
> Unexpected mock function call - taking default action specified at:
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:151:
>     Function call: GetFlightInfo(@0x6157d948 184-byte object <00-00 00-00 
> 00-00 F0-BF 40-00 00-00 00-00 00-00 80-4C 06-49 CF-7F 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 01-01 00-00 00-00 00-00 
> 00-20 00-00 00-00 00-00 ... 01-00 00-04 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 
> @0x7fff35794e80 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 
> 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>)
>           Returns: (nullptr)
> Google Mock tried the following 1 expectation, but it didn't match:
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: EXPECT_CALL(sql_client_, 
> GetFlightInfo(Ref(call_options_), descriptor))...
>   Expected arg #1: is equal to 64-byte object <02-00 00-00 BE-BE BE-BE C0-6B 
> 05-00 C0-60 00-00 73-00 00-00 00-00 00-00 73-00 00-00 00-00 00-00 BE-BE BE-BE 
> BE-BE BE-BE 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00>
>            Actual: 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 
> 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: Failure
> Actual function call count doesn't match EXPECT_CALL(sql_client_, 
> GetFlightInfo(Ref(call_options_), descriptor))...
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> [  FAILED  ] TestFlightSqlClient.TestGetDbSchemas (1 ms){code}
> The error can be seen here: 
> [https://github.com/apache/arrow/runs/7297442828?check_suite_focus=true]
> This is the initial PR that triggered it:
> [https://github.com/apache/arrow/pull/13548]
>  





[jira] [Updated] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-15678:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571097#comment-17571097
 ] 

Krisztian Szucs commented on ARROW-15678:
-

Postponing to 10.0 for now.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-16887) [Doc][R] Document GCSFileSystem for R package

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16887.
-
Resolution: Fixed

Issue resolved by pull request 13601
[https://github.com/apache/arrow/pull/13601]

> [Doc][R] Document GCSFileSystem for R package
> -
>
> Key: ARROW-16887
> URL: https://issues.apache.org/jira/browse/ARROW-16887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Will Jones
>Assignee: Will Jones
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should update the [cloud storage 
> vignette|https://arrow.apache.org/docs/r/articles/fs.html] and the filesystem 
> RD to show configuration and usage of GCSFileSystem.





[jira] [Commented] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570779#comment-17570779
 ] 

Krisztian Szucs commented on ARROW-16919:
-

That looks like quite a journey :) 

Thanks [~lidavidm] for figuring it out!

> [C++] Flight integration tests fail on verify rc nightly on linux amd64
> ---
>
> Key: ARROW-16919
> URL: https://issues.apache.org/jira/browse/ARROW-16919
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, FlightRPC
>Reporter: Raúl Cumplido
>Priority: Critical
>  Labels: Nightly, pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some of our nightly builds to verify the release are failing:
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-almalinux-8-amd64|https://github.com/ursacomputing/crossbow/runs/7073206980?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-18.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073217433?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-20.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073210299?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-22.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073273051?check_suite_focus=true]
> with the following:
> {code:java}
>  # FAILURES #
> FAILED TEST: middleware C++ producing,  C++ consuming
> 1 failures
>   File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
>     output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
>   File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/usr/lib/python3.8/subprocess.py", line 512, in run
>     raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command 
> '['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', 
> '-host', 'localhost', '-port=36719', '-scenario', 'middleware']' died with 
> .
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/runner.py", line 379, in 
> _run_flight_test_case
>     consumer.flight_request(port, **client_args)
>   File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 134, in 
> flight_request
>     run_cmd(cmd)
>   File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
>     raise RuntimeError(sio.getvalue())
> RuntimeError: Command failed: 
> /tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client -host 
> localhost -port=36719 -scenario middleware
> With output:
> --
> Headers received successfully on failing call.
> Headers received successfully on passing call.
> free(): double free detected in tcache 2 {code}





[jira] [Updated] (ARROW-17051) [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-17051:

Fix Version/s: 9.0.0

> [C++][Flight] arrow-flight-sql-test fails with ASAN UBSAN
> -
>
> Key: ARROW-17051
> URL: https://issues.apache.org/jira/browse/ARROW-17051
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The CI job for ASAN UBSAN is based on Ubuntu 20.04: *C++ / AMD64 Ubuntu 20.04 
> C++ ASAN UBSAN*  
> Once Flight and Flight SQL are built on Ubuntu 20.04, the ASAN UBSAN job 
> also builds with Flight and Flight SQL. This triggers some 
> arrow-flight-sql-test failures like:
> {code:java}
>   [ RUN      ] TestFlightSqlClient.TestGetDbSchemas
> unknown file: Failure
> Unexpected mock function call - taking default action specified at:
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:151:
>     Function call: GetFlightInfo(@0x6157d948 184-byte object <00-00 00-00 
> 00-00 F0-BF 40-00 00-00 00-00 00-00 80-4C 06-49 CF-7F 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 01-01 00-00 00-00 00-00 
> 00-20 00-00 00-00 00-00 ... 01-00 00-04 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 
> @0x7fff35794e80 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 
> 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>)
>           Returns: (nullptr)
> Google Mock tried the following 1 expectation, but it didn't match:
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: EXPECT_CALL(sql_client_, 
> GetFlightInfo(Ref(call_options_), descriptor))...
>   Expected arg #1: is equal to 64-byte object <02-00 00-00 BE-BE BE-BE C0-6B 
> 05-00 C0-60 00-00 73-00 00-00 00-00 00-00 73-00 00-00 00-00 00-00 BE-BE BE-BE 
> BE-BE BE-BE 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00>
>            Actual: 64-byte object <02-00 00-00 00-00 00-00 C0-45 08-00 B0-60 
> 00-00 65-00 00-00 00-00 00-00 65-00 00-00 00-00 00-00 C4-A9 AE-66 00-10 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> /arrow/cpp/src/arrow/flight/sql/client_test.cc:152: Failure
> Actual function call count doesn't match EXPECT_CALL(sql_client_, 
> GetFlightInfo(Ref(call_options_), descriptor))...
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> [  FAILED  ] TestFlightSqlClient.TestGetDbSchemas (1 ms){code}
> The error can be seen here: 
> [https://github.com/apache/arrow/runs/7297442828?check_suite_focus=true]
> This is the initial PR that triggered it:
> [https://github.com/apache/arrow/pull/13548]
>  





[jira] [Updated] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16919:

Fix Version/s: 9.0.0

> [C++] Flight integration tests fail on verify rc nightly on linux amd64
> ---
>
> Key: ARROW-16919
> URL: https://issues.apache.org/jira/browse/ARROW-16919
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, FlightRPC
>Reporter: Raúl Cumplido
>Priority: Critical
>  Labels: Nightly, pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some of our nightly builds to verify the release are failing:
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-almalinux-8-amd64|https://github.com/ursacomputing/crossbow/runs/7073206980?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-18.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073217433?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-20.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073210299?check_suite_focus=true]
> {color:#1d1c1d}- 
> {color}[verify-rc-source-integration-linux-ubuntu-22.04-amd64|https://github.com/ursacomputing/crossbow/runs/7073273051?check_suite_focus=true]
> with the following:
> {code:java}
>  # FAILURES #
> FAILED TEST: middleware C++ producing,  C++ consuming
> 1 failures
>   File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
>     output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
>   File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "/usr/lib/python3.8/subprocess.py", line 512, in run
>     raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command 
> '['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', 
> '-host', 'localhost', '-port=36719', '-scenario', 'middleware']' died with 
> .
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/arrow/dev/archery/archery/integration/runner.py", line 379, in 
> _run_flight_test_case
>     consumer.flight_request(port, **client_args)
>   File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 134, in 
> flight_request
>     run_cmd(cmd)
>   File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
>     raise RuntimeError(sio.getvalue())
> RuntimeError: Command failed: 
> /tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client -host 
> localhost -port=36719 -scenario middleware
> With output:
> --
> Headers received successfully on failing call.
> Headers received successfully on passing call.
> free(): double free detected in tcache 2 {code}





[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570777#comment-17570777
 ] 

Krisztian Szucs commented on ARROW-15678:
-

[~jonkeane] can you give an update on this issue?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-14314) [C++] Sorting dictionary array not implemented

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-14314:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [C++] Sorting dictionary array not implemented
> --
>
> Key: ARROW-14314
> URL: https://issues.apache.org/jira/browse/ARROW-14314
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Ariana Villegas
>Priority: Major
>  Labels: kernel, pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type 
> column:
> {code}
> mtcars %>% 
>   mutate(cyl = as.factor(cyl)) %>% 
>   Table$create() %>% 
>   arrange(cyl) %>% 
>   collect()
> Error: Type error: Sorting not supported for type dictionary indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427  VisitTypeInline(type, 
> this)
> ../src/arrow/compute/kernels/vector_sort.cc:148  
> GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206  sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259  CallFunction("sort_indices", {datum}, 
> &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53  SortIndices(table, options_, 
> ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292  impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297  iterator_.Next()
> ../src/arrow/record_batch.cc:318  ReadNext(&batch)
> ../src/arrow/record_batch.cc:329  ReadAll(&batches)
> {code}





[jira] [Updated] (ARROW-16817) [C++][Python] Segfaults for unsupported datatypes in the ORC writer

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16817:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [C++][Python] Segfaults for unsupported datatypes in the ORC writer
> ---
>
> Key: ARROW-16817
> URL: https://issues.apache.org/jira/browse/ARROW-16817
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Ian Alexander Joiner
>Assignee: Ian Alexander Joiner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In the ORC writer, if a table has at least one column with an unsupported 
> datatype, a segfault occurs when we try to write it to ORC.





[jira] [Commented] (ARROW-16817) [C++][Python] Segfaults for unsupported datatypes in the ORC writer

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570765#comment-17570765
 ] 

Krisztian Szucs commented on ARROW-16817:
-

Postponing to 10.0.

> [C++][Python] Segfaults for unsupported datatypes in the ORC writer
> ---
>
> Key: ARROW-16817
> URL: https://issues.apache.org/jira/browse/ARROW-16817
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Ian Alexander Joiner
>Assignee: Ian Alexander Joiner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In the ORC writer, if a table has at least one column with an unsupported 
> datatype, a segfault occurs when we try to write it to ORC.





[jira] [Commented] (ARROW-16616) [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter method

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570764#comment-17570764
 ] 

Krisztian Szucs commented on ARROW-16616:
-

Postponing to 10.0, feel free to include it when the PR is ready.

> [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter 
> method
> -
>
> Key: ARROW-16616
> URL: https://issues.apache.org/jira/browse/ARROW-16616
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Alessandro Molina
>Assignee: Alessandro Molina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To keep the {{Dataset}} API compatible with the {{Table}} one in terms of 
> analytics capabilities, we should add a {{Dataset.filter}} method. The 
> initial POC was based on {{_table_filter}}, but that required materialising 
> all the {{Dataset}} content after filtering, as it returned an 
> {{InMemoryDataset}}. 
> Given that {{Scanner}} can filter a dataset without actually materialising 
> the data until a final step happens, it would be good to have 
> {{Dataset.filter}} return some form of lazy dataset where the filter is only 
> stored, and the {{Scanner}} is created when data is actually retrieved.
> PS: Also update the {{test_dataset_filter}} test to use the {{Dataset.filter}} 
> method.
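The lazy behaviour requested in this ticket can be illustrated with a small standalone sketch. This is a toy model in plain Python, not PyArrow code: `LazyDataset`, `read_rows`, and the row format are all hypothetical names chosen for illustration. The only point it makes is that `filter` records the predicate and reads nothing, while the rows are read and filtered when a scan is actually consumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class LazyDataset:
    """Hypothetical stand-in for a dataset whose rows are read only on scan."""

    source: Callable[[], Iterable[Row]]
    predicates: List[Predicate] = field(default_factory=list)

    def filter(self, predicate: Predicate) -> "LazyDataset":
        # Only store the predicate; the source is not touched here.
        return LazyDataset(self.source, self.predicates + [predicate])

    def scan(self) -> Iterator[Row]:
        # The stored predicates are applied while rows are being retrieved.
        for row in self.source():
            if all(p(row) for p in self.predicates):
                yield row


reads = []  # records every time the underlying source is actually read


def read_rows() -> List[Row]:
    reads.append(1)
    return [{"cyl": 4}, {"cyl": 6}, {"cyl": 8}]


ds = LazyDataset(read_rows).filter(lambda r: r["cyl"] > 4)
reads_before_scan = len(reads)   # filtering alone triggers no read
result = list(ds.scan())         # the read and the filtering happen here
```

In the real library the stored filter would presumably be an expression handed to a `Scanner` rather than a Python callable; that part is an assumption and is not modelled here.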





[jira] [Updated] (ARROW-16616) [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter method

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16616:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [Python] Allow lazy evaluation of filters in Dataset and add Dataset.filter 
> method
> -
>
> Key: ARROW-16616
> URL: https://issues.apache.org/jira/browse/ARROW-16616
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Alessandro Molina
>Assignee: Alessandro Molina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To keep the {{Dataset}} API compatible with the {{Table}} one in terms of 
> analytics capabilities, we should add a {{Dataset.filter}} method. The 
> initial POC was based on {{_table_filter}}, but that required materialising 
> all the {{Dataset}} content after filtering, as it returned an 
> {{InMemoryDataset}}. 
> Given that {{Scanner}} can filter a dataset without actually materialising 
> the data until a final step happens, it would be good to have 
> {{Dataset.filter}} return some form of lazy dataset where the filter is only 
> stored, and the {{Scanner}} is created when data is actually retrieved.
> PS: Also update the {{test_dataset_filter}} test to use the {{Dataset.filter}} 
> method.





[jira] [Updated] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-10739:

Fix Version/s: 10.0.0

> [Python] Pickling a sliced array serializes all the buffers
> ---
>
> Key: ARROW-10739
> URL: https://issues.apache.org/jira/browse/ARROW-10739
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Maarten Breddels
>Assignee: Alessandro Molina
>Priority: Critical
> Fix For: 10.0.0
>
>
> If a large array is sliced and then pickled, it seems the full buffer is 
> serialized. This leads to excessive memory usage and data transfer when using 
> multiprocessing or dask.
> {code:java}
> >>> import pyarrow as pa
> >>> ar = pa.array(['foo'] * 100_000)
> >>> ar.nbytes
> 74
> >>> import pickle
> >>> len(pickle.dumps(ar.slice(10, 1)))
> 700165
> # NumPy, for instance:
> >>> import numpy as np
> >>> ar_np = np.array(ar)
> >>> ar_np
> array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object)
> >>> import pickle
> >>> len(pickle.dumps(ar_np[10:11]))
> 165{code}
> I think this makes sense if you know Arrow, but it is rather unexpected as a user.
> Is there a workaround for this? For instance, copying an Arrow array to get rid 
> of the offset and trim the buffers?
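The effect described in this report, and the "copy and trim" idea from the question, can be reproduced with a toy model that does not use PyArrow at all. In the sketch below a slice is represented as a hypothetical `(buffer, offset, length)` tuple; `dumps_naive` and `dumps_trimmed` are illustrative helper names, not library functions. Serializing the tuple as-is carries the whole parent buffer, while copying out the visible window and resetting the offset keeps the payload small.

```python
import pickle
from typing import Tuple

# Hypothetical representation of a zero-copy slice: a window over a shared buffer.
Slice = Tuple[bytes, int, int]  # (buffer, offset, length)


def dumps_naive(view: Slice) -> bytes:
    # Serializes the full parent buffer together with the offset and length.
    return pickle.dumps(view)


def dumps_trimmed(view: Slice) -> bytes:
    buffer, offset, length = view
    # Copy only the visible window and reset the offset to 0 before serializing.
    return pickle.dumps((buffer[offset:offset + length], 0, length))


def visible_bytes(view: Slice) -> bytes:
    buffer, offset, length = view
    return buffer[offset:offset + length]


parent = b"foo" * 100_000        # 300,000 bytes of shared data
view = (parent, 30, 3)           # a 3-byte window into it

naive_size = len(dumps_naive(view))
trimmed_size = len(dumps_trimmed(view))
restored = pickle.loads(dumps_trimmed(view))
```

Here `naive_size` is dominated by the 300,000-byte parent buffer, whereas `trimmed_size` is a few dozen bytes and `restored` still exposes the same visible data. Whether and how Arrow should apply such trimming when pickling is exactly what the ticket leaves open.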





[jira] [Commented] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570763#comment-17570763
 ] 

Krisztian Szucs commented on ARROW-10739:
-

Postponing to 10.0 since there is no PR available at the moment.

> [Python] Pickling a sliced array serializes all the buffers
> ---
>
> Key: ARROW-10739
> URL: https://issues.apache.org/jira/browse/ARROW-10739
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Maarten Breddels
>Assignee: Alessandro Molina
>Priority: Critical
>
> If a large array is sliced and then pickled, it seems the full buffer is 
> serialized. This leads to excessive memory usage and data transfer when using 
> multiprocessing or dask.
> {code:java}
> >>> import pyarrow as pa
> >>> ar = pa.array(['foo'] * 100_000)
> >>> ar.nbytes
> 74
> >>> import pickle
> >>> len(pickle.dumps(ar.slice(10, 1)))
> 700165
> # NumPy, for instance:
> >>> import numpy as np
> >>> ar_np = np.array(ar)
> >>> ar_np
> array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object)
> >>> import pickle
> >>> len(pickle.dumps(ar_np[10:11]))
> 165{code}
> I think this makes sense if you know Arrow, but it is rather unexpected as a user.
> Is there a workaround for this? For instance, copying an Arrow array to get rid 
> of the offset and trim the buffers?





[jira] [Updated] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-10739:

Fix Version/s: (was: 9.0.0)

> [Python] Pickling a sliced array serializes all the buffers
> ---
>
> Key: ARROW-10739
> URL: https://issues.apache.org/jira/browse/ARROW-10739
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Maarten Breddels
>Assignee: Alessandro Molina
>Priority: Critical
>
> If a large array is sliced and then pickled, it seems the full buffer is 
> serialized. This leads to excessive memory usage and data transfer when using 
> multiprocessing or dask.
> {code:java}
> >>> import pyarrow as pa
> >>> ar = pa.array(['foo'] * 100_000)
> >>> ar.nbytes
> 74
> >>> import pickle
> >>> len(pickle.dumps(ar.slice(10, 1)))
> 700165
> # NumPy, for instance:
> >>> import numpy as np
> >>> ar_np = np.array(ar)
> >>> ar_np
> array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object)
> >>> import pickle
> >>> len(pickle.dumps(ar_np[10:11]))
> 165{code}
> I think this makes sense if you know Arrow, but it is rather unexpected as a user.
> Is there a workaround for this? For instance, copying an Arrow array to get rid 
> of the offset and trim the buffers?





[jira] [Resolved] (ARROW-16655) [Release] Release improvements

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16655.
-
Resolution: Fixed

Nice! Thanks [~raulcd]!

> [Release] Release improvements
> --
>
> Key: ARROW-16655
> URL: https://issues.apache.org/jira/browse/ARROW-16655
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
> Fix For: 9.0.0
>
>
> This is an umbrella ticket collecting various improvements to our existing 
> Release Process.
> The improvements are focused on:
>  * Improvements and fixes for the current release scripts and steps
>  * Documentation improvements





[jira] [Resolved] (ARROW-16665) [Release] Update 03-binary-submit.sh to comment on PR and track binary submission with badges

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16665.
-
Resolution: Fixed

Issue resolved by pull request 13612
[https://github.com/apache/arrow/pull/13612]

> [Release] Update 03-binary-submit.sh to comment on PR and track binary 
> submission with badges
> -
>
> Key: ARROW-16665
> URL: https://issues.apache.org/jira/browse/ARROW-16665
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-15961) [C++] Check nullability when validating fields on batches or struct arrays

2022-07-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-15961:

Fix Version/s: 10.0.0
   (was: 9.0.0)

> [C++] Check nullability when validating fields on batches or struct arrays
> --
>
> Key: ARROW-15961
> URL: https://issues.apache.org/jira/browse/ARROW-15961
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kaouther Abrougui
>Priority: Major
>  Labels: good-first-issue, good-second-issue, 
> pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> According to ARROW-15899, it is possible to declare a field non-nullable, 
> associate it with data that has nulls, and still pass validation.
> Validation should instead fail in such a situation (at least full validation, 
> since computing the null count can be O(n)).
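The check requested here can be sketched in a few lines. The following is a toy model in plain Python, not the Arrow C++ validation code: `Field`, `validate_full`, and the use of `None` as the null marker are assumptions made for illustration. It shows why the check belongs in full validation: counting nulls requires a scan over every value of a non-nullable column.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field: a name plus a nullability flag."""

    name: str
    nullable: bool


def validate_full(
    fields: Sequence[Field],
    columns: Sequence[Sequence[Optional[int]]],
) -> List[str]:
    """Return a list of error messages; an empty list means the data is valid."""
    if len(fields) != len(columns):
        return [f"expected {len(fields)} columns, got {len(columns)}"]
    errors = []
    for fld, column in zip(fields, columns):
        if fld.nullable:
            continue  # nulls are allowed, nothing to check
        # O(n) scan: this is why the check is reserved for full validation.
        null_count = sum(value is None for value in column)
        if null_count:
            errors.append(
                f"field '{fld.name}' is non-nullable but has {null_count} null(s)"
            )
    return errors


schema = [Field("id", nullable=False), Field("score", nullable=True)]
ok = validate_full(schema, [[1, 2, 3], [10, None, 30]])
bad = validate_full(schema, [[1, None, 3], [10, None, 30]])
```

With this sketch `ok` is empty, while `bad` reports the single null found in the non-nullable `id` column.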





[jira] [Commented] (ARROW-15961) [C++] Check nullability when validating fields on batches or struct arrays

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570759#comment-17570759
 ] 

Krisztian Szucs commented on ARROW-15961:
-

Moving to 10.0 since the PR is in progress.

> [C++] Check nullability when validating fields on batches or struct arrays
> --
>
> Key: ARROW-15961
> URL: https://issues.apache.org/jira/browse/ARROW-15961
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kaouther Abrougui
>Priority: Major
>  Labels: good-first-issue, good-second-issue, 
> pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> According to ARROW-15899, it is possible to declare a field non-nullable, 
> associate it with data that has nulls, and still pass validation.
> Validation should instead fail in such a situation (at least full validation, 
> since computing the null count can be O(n)).





[jira] [Commented] (ARROW-16797) [Python][Packaging] Update conda-recipes from conda-forge feedstock

2022-06-09 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552384#comment-17552384
 ] 

Krisztian Szucs commented on ARROW-16797:
-

conda-forge feedstocks seem more self-contained nowadays. 

One possible solution could be to copy the top-level Azure template 
https://github.com/conda-forge/arrow-cpp-feedstock/blob/main/azure-pipelines.yml, 
clone the upstream feedstock as the first step, and copy our meta.yml into 
the freshly cloned feedstock directory. We should be able to choose the 
template at runtime: 
https://docs.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops#parameters-to-select-a-template-at-runtime

Of course, this would mean that we need to remove the task parametrization from 
tasks.yml and let the upstream feedstock configuration handle the build matrix. 

Mixing it with the arrow-r feedstock is less trivial though (perhaps we should 
just drop that).

> [Python][Packaging] Update conda-recipes from conda-forge feedstock
> ---
>
> Key: ARROW-16797
> URL: https://issues.apache.org/jira/browse/ARROW-16797
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Raúl Cumplido
>Priority: Major
>
> Our conda-recipes have not been updated for the last 4 months 
> ([https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes/.ci_support])
> and they are not up-to-date with the upstream feedstocks:
> [arrow-cpp-feedstock]: [https://github.com/conda-forge/arrow-cpp-feedstock]
> [parquet-cpp-feedstock]: 
> [https://github.com/conda-forge/parquet-cpp-feedstock]
> We should keep them up-to-date.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARROW-16797) [Python][Packaging] Update conda-recipes from conda-forge feedstock

2022-06-09 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552377#comment-17552377
 ] 

Krisztian Szucs commented on ARROW-16797:
-

Could we perhaps automate this somehow?

> [Python][Packaging] Update conda-recipes from conda-forge feedstock
> ---
>
> Key: ARROW-16797
> URL: https://issues.apache.org/jira/browse/ARROW-16797
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Raúl Cumplido
>Priority: Major
>
> Our conda-recipes have not been updated for the last 4 months 
> ([https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes/.ci_support])
> and they are not up-to-date with the upstream feedstocks:
> [arrow-cpp-feedstock]: [https://github.com/conda-forge/arrow-cpp-feedstock]
> [parquet-cpp-feedstock]: 
> [https://github.com/conda-forge/parquet-cpp-feedstock]
> We should keep them up-to-date.





[jira] [Resolved] (ARROW-16553) [Java][CI] Use GitHub repository Jar assets as a repository that could be consumed by dependencies management (Maven/Gradle)

2022-06-09 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16553.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13328
[https://github.com/apache/arrow/pull/13328]

> [Java][CI] Use GitHub repository Jar assets as a repository that could be 
> consumed by dependencies management (Maven/Gradle)
> 
>
> Key: ARROW-16553
> URL: https://issues.apache.org/jira/browse/ARROW-16553
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Developer Tools, Java
>Affects Versions: 9.0.0
>Reporter: David Dali Susanibar Arce
>Assignee: David Dali Susanibar Arce
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> On the Java side, we currently offer nightly-build Jar artifacts uploaded 
> to the GitHub repository as assets.
> If users want to use them in their local projects, they need to 
> download the Jar assets from the GitHub nightly packages and [install them 
> manually, one by one, as described in the 
> documentation|https://arrow.apache.org/docs/java/install.html#installing-nightly-packages].
> We are trying to figure out whether there is an option to use the GitHub 
> nightly-build Jar artifacts as a real repository, so that it is enough to 
> configure the nightly build in pom.xml, for example, and Maven can download 
> the needed dependencies automatically.





[jira] [Resolved] (ARROW-16785) [Packaging][Linux] Add FindThrift.cmake

2022-06-08 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16785.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13337
[https://github.com/apache/arrow/pull/13337]

> [Packaging][Linux] Add FindThrift.cmake 
> 
>
> Key: ARROW-16785
> URL: https://issues.apache.org/jira/browse/ARROW-16785
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up of ARROW-1672.





[jira] [Created] (ARROW-16767) [Archery] Refactor archery.release submodule to its own subpackage

2022-06-07 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-16767:
---

 Summary: [Archery] Refactor archery.release submodule to its own 
subpackage
 Key: ARROW-16767
 URL: https://issues.apache.org/jira/browse/ARROW-16767
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Krisztian Szucs
 Fix For: 9.0.0








[jira] [Assigned] (ARROW-16767) [Archery] Refactor archery.release submodule to its own subpackage

2022-06-07 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-16767:
---

Assignee: Krisztian Szucs

> [Archery] Refactor archery.release submodule to its own subpackage
> --
>
> Key: ARROW-16767
> URL: https://issues.apache.org/jira/browse/ARROW-16767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Archery
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 9.0.0
>
>






[jira] [Resolved] (ARROW-16663) [Release][Dev] Add flag to archery release curate to only show minimal information

2022-06-07 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16663.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

> [Release][Dev] Add flag to archery release curate to only show minimal 
> information
> --
>
> Key: ARROW-16663
> URL: https://issues.apache.org/jira/browse/ARROW-16663
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, archery release curate shows a lot of information that is not 
> relevant, such as the tickets that are correctly assigned. Add a new flag to 
> show only the information that requires manual fixing.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (ARROW-16684) [CI][Archery] Add retry mechanism to git fetch

2022-05-30 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16684.
-
Resolution: Fixed

Issue resolved by pull request 13258
[https://github.com/apache/arrow/pull/13258]

> [CI][Archery] Add retry mechanism to git fetch
> --
>
> Key: ARROW-16684
> URL: https://issues.apache.org/jira/browse/ARROW-16684
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Archery, Continuous Integration, Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Archery sometimes seems to fail to fetch branches for some repositories. Some 
> of the report packaging jobs 
> ([https://github.com/ursacomputing/crossbow/runs/6643769198?check_suite_focus=true])
> have been failing due to git errors when fetching:
> {code:java}
>    File 
> "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/crossbow/cli.py",
>  line 238, in latest_prefix
>     queue.fetch()
>   File 
> "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/crossbow/core.py",
>  line 271, in fetch
>     self.origin.fetch([refspec])
>   File 
> "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/remote.py",
>  line 146, in fetch
>     payload.check_error(err)
>   File 
> "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/callbacks.py",
>  line 93, in check_error
>     check_error(error_code)
>   File 
> "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pygit2/errors.py",
>  line 65, in check_error
>     raise GitError(message)
> _pygit2.GitError: SSL error: received early EOF
> Error: Process completed with exit code 1.{code}
> I have seen that retrying the job can make it pass.
> We should add a retry mechanism to archery to allow retry on GitErrors when 
> fetching branches.





[jira] [Resolved] (ARROW-16560) [Website][Release] Version JSON files not updated in release

2022-05-30 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16560.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13257
[https://github.com/apache/arrow/pull/13257]

> [Website][Release] Version JSON files not updated in release
> 
>
> Key: ARROW-16560
> URL: https://issues.apache.org/jira/browse/ARROW-16560
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Nicola Crane
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ARROW-15366 added a script to automatically increment the version switchers 
> for the docs, which was updated as part of the changes in ARROW-1.  
> However, the latest release did not increment the version numbers (and 
> ARROW-1 changes the script to update on snapshots instead of releases - 
> could be the reason for it not happening?)





[jira] [Assigned] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases

2022-05-30 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-16654:
---

Assignee: Krisztian Szucs

> [Dev][Archery] Support cherry-picking for major releases 
> -
>
> Key: ARROW-16654
> URL: https://issues.apache.org/jira/browse/ARROW-16654
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Archery, Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases

2022-05-30 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16654.
-
Resolution: Fixed

Issue resolved by pull request 13230
[https://github.com/apache/arrow/pull/13230]

> [Dev][Archery] Support cherry-picking for major releases 
> -
>
> Key: ARROW-16654
> URL: https://issues.apache.org/jira/browse/ARROW-16654
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Archery, Developer Tools
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-16654) [Dev][Archery] Support cherry-picking for major releases

2022-05-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-16654:
---

 Summary: [Dev][Archery] Support cherry-picking for major releases 
 Key: ARROW-16654
 URL: https://issues.apache.org/jira/browse/ARROW-16654
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Archery, Developer Tools
Reporter: Krisztian Szucs
 Fix For: 9.0.0








[jira] [Updated] (ARROW-16445) [R] [Doc] Add a short summary for the Installing the Arrow package on Linux article

2022-05-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16445:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [R] [Doc] Add a short summary for the Installing the Arrow package on Linux 
> article
> ---
>
> Key: ARROW-16445
> URL: https://issues.apache.org/jira/browse/ARROW-16445
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Dragoș Moldovan-Grünfeld
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> From [~npr]: "I think [https://arrow.apache.org/docs/r/articles/install.html] 
> would benefit from a very simple summary at the top: 
> {{install.packages("arrow")}} just works; there are things you can do to make 
> it install faster (see below); if for some reason it doesn't work, set the 
> env var {{ARROW_R_DEV=true}}, retry, and share the logs with us."



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ARROW-16327) [Java][CI]: Add support for Java 17 CI process

2022-05-25 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16327:

Fix Version/s: 9.0.0
   (was: 8.0.0)

> [Java][CI]: Add support for Java 17 CI process
> --
>
> Key: ARROW-16327
> URL: https://issues.apache.org/jira/browse/ARROW-16327
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Affects Versions: 9.0.0
>Reporter: David Dali Susanibar Arce
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently the Arrow Java code is tested with JSE11.
> This ticket is for planning/mapping the activities involved in also offering 
> support for JSE17.





[jira] [Resolved] (ARROW-16317) [Archery][CI] Fix possible race condition when submitting crossbow builds

2022-05-19 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16317.
-
Resolution: Fixed

Issue resolved by pull request 13188
[https://github.com/apache/arrow/pull/13188]

> [Archery][CI] Fix possible race condition when submitting crossbow builds
> -
>
> Key: ARROW-16317
> URL: https://issues.apache.org/jira/browse/ARROW-16317
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery, Continuous Integration
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Sometimes when trying to use github-actions to submit crossbow jobs an error 
> is raised like:
> {code:java}
> Failed to push updated references, potentially because of credential issues: 
> ['refs/heads/actions-1883-github-wheel-windows-cp310-amd64', 
> 'refs/tags/actions-1883-github-wheel-windows-cp310-amd64', 
> 'refs/heads/actions-1883-github-wheel-windows-cp39-amd64', 
> 'refs/tags/actions-1883-github-wheel-windows-cp39-amd64', 
> 'refs/heads/actions-1883-github-wheel-windows-cp37-amd64', 
> 'refs/tags/actions-1883-github-wheel-windows-cp37-amd64', 
> 'refs/heads/actions-1883-github-wheel-windows-cp38-amd64', 
> 'refs/tags/actions-1883-github-wheel-windows-cp38-amd64', 
> 'refs/heads/actions-1883']
> The Archery job run can be found at: 
> https://github.com/apache/arrow/actions/runs/2195038965{code}
> As discussed in this GitHub comment 
> ([https://github.com/apache/arrow/pull/12930#issuecomment-1103772507]):
> We should remove the auto incremented IDs entirely and use unique hashes 
> instead, e.g.: actions--github-wheel-windows-cp310-amd64 instead 
> of actions-1883-github-wheel-windows-cp310-amd64. Then we wouldn't need to 
> fetch the new references either, making remote crossbow builds and local 
> submission much quicker.
> The error can also be seen here: 
> https://github.com/apache/arrow/pull/12987#issuecomment-1108516668
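
The proposal above, replacing the auto-incremented ID with a unique hash, could look roughly like the sketch below. It is only an illustration: `job_prefix` and `ref_names` are hypothetical helpers, not crossbow's actual code, and the choice of a truncated random UUID is an assumption.

```python
import uuid


def job_prefix(kind="actions"):
    """Return a prefix such as 'actions-1a2b3c4d5e' (hypothetical helper)."""
    # uuid4 is random, so two concurrent submissions do not need to
    # inspect the remote's existing branches to pick a free name.
    return f"{kind}-{uuid.uuid4().hex[:10]}"


def ref_names(prefix, task_names):
    """List the branch and tag refs that one submission would push."""
    refs = []
    for task in task_names:
        refs.append(f"refs/heads/{prefix}-{task}")
        refs.append(f"refs/tags/{prefix}-{task}")
    # The job itself also gets a branch, as in the error message above.
    refs.append(f"refs/heads/{prefix}")
    return refs


prefix = job_prefix()
refs = ref_names(prefix, ["github-wheel-windows-cp310-amd64"])
```

Because the prefix no longer depends on the highest existing ID, two submissions racing each other cannot compute the same ref names, which is the failure mode the error output suggests.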





[jira] [Created] (ARROW-16589) [CI][Dev] Make tasks.yml easier to maintain

2022-05-16 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-16589:
---

 Summary: [CI][Dev] Make tasks.yml easier to maintain
 Key: ARROW-16589
 URL: https://issues.apache.org/jira/browse/ARROW-16589
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration, Developer Tools
Reporter: Krisztian Szucs


I think {{dev/tasks/tasks.yml}} has reached its limits by using jinja2 
templated yml. 

We should think about a better way to define crossbow jobs that:
- keeps them readable
- uses a dialect which is natively supported by editors
- supports task parametrization

Just one idea is to use python files containing python objects, e.g.:

{code}
Task(
  name="wheel-macos-big-sur-cp38-arm64",
  ci="github",
  template="python-wheels/github.osx.arm64.yml",
  params=dict(
    arch="arm64",
    arrow_simd_level="DEFAULT",
    python_version="3.8",
    macos_deployment_target="11.0"
  ),
  artifacts=[
    "pyarrow-{no_rc_version}-cp38-cp38-macosx_11_0_arm64.whl"
  ]
)
{code}

where {{Task}} would be the crossbow task class (which could be refactored to 
use pydantic or another alternative for less boilerplate). Of course, porting 
the task definitions to plain python could make the situation even worse by 
giving access to too many scripting utilities. We could try a dynamic config 
language which sits between yaml and python, like HCL.
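
As a rough sketch of the plain-python idea, the task definition could be a standard-library dataclass, with parametrization done by an ordinary loop. Everything here is an assumption made for illustration: the {{Task}} fields mirror the example above, while {{wheel_tasks}} is a hypothetical helper and none of this is crossbow's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for the crossbow task class."""
    name: str
    ci: str
    template: str
    params: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)


def wheel_tasks(python_versions, arch="arm64", target="11.0"):
    """Build one macOS wheel task per Python version (illustrative only)."""
    tasks = []
    for version in python_versions:
        tag = "cp" + version.replace(".", "")  # "3.8" -> "cp38"
        platform = "macosx_{}_{}".format(target.replace(".", "_"), arch)
        tasks.append(Task(
            name=f"wheel-macos-big-sur-{tag}-{arch}",
            ci="github",
            template=f"python-wheels/github.osx.{arch}.yml",
            params=dict(
                arch=arch,
                arrow_simd_level="DEFAULT",
                python_version=version,
                macos_deployment_target=target,
            ),
            # {no_rc_version} stays a literal placeholder to be rendered
            # later, as in the example above.
            artifacts=[f"pyarrow-{{no_rc_version}}-{tag}-{tag}-{platform}.whl"],
        ))
    return tasks


tasks = wheel_tasks(["3.8", "3.9", "3.10"])
```

The loop replaces what a jinja2 macro would do in the templated yml, and editors can check it like any other python file. It also shows the risk mentioned above: nothing stops such a file from growing arbitrary scripting logic.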

[~kou] what syntax would you be comfortable working with? Do you have any 
alternatives we could use?

cc [~amol-] [~raulcd] [~assignUser]






[jira] [Updated] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning

2022-05-06 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-16420:

Fix Version/s: (was: 8.0.0)

> [Python] pq.write_to_dataset always ignores partitioning
> 
>
> Key: ARROW-16420
> URL: https://issues.apache.org/jira/browse/ARROW-16420
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 8.0.0
>Reporter: David Li
>Assignee: Alenka Frim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0, 8.0.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The code unconditionally sets {{partitioning}} to None, so the user-supplied 
> partitioning is ignored. 
> https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146





[jira] [Resolved] (ARROW-16488) [Archery][DevTools] Allow extra message to be sent on chat report

2022-05-06 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-16488.
-
Resolution: Fixed

Issue resolved by pull request 13081
[https://github.com/apache/arrow/pull/13081]

> [Archery][DevTools] Allow extra message to be sent on chat report
> -
>
> Key: ARROW-16488
> URL: https://issues.apache.org/jira/browse/ARROW-16488
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Archery, Developer Tools
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Allow some extra content to be configurable via CLI when sending a 
> chat-report.
> This will allow slightly customizing the message that is sent.
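
A minimal sketch of what such an option could look like, using the standard library's argparse. The `--extra-message` flag, the `chat-report-sketch` program name, the report text and the job name are all made up for illustration and are not taken from Archery's real command line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="chat-report-sketch")
    parser.add_argument("job_name")
    # Hypothetical option; not taken from archery's real interface.
    parser.add_argument("--extra-message", default=None,
                        help="text appended to the generated report")
    return parser


def render_report(job_name, extra_message=None):
    """Return the report text, with the extra message appended if given."""
    lines = [f"Crossbow report for {job_name}"]
    if extra_message:
        lines.append(extra_message)
    return "\n".join(lines)


args = build_parser().parse_args(
    ["nightly-2022-05-06", "--extra-message", "See the mailing list for details."]
)
report = render_report(args.job_name, args.extra_message)
```

Leaving the option unset keeps the report unchanged, so existing invocations would not be affected.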




