from:"Jacob Wujciak\-Jens \(Jira\)"

[jira] [Commented] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641471#comment-17641471
 ] 

Jacob Wujciak-Jens commented on ARROW-18375:


Those labels would make it possible to sort the commit changelog into something 
a bit more useful for users!
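
A rough Python sketch of what sorting a changelog by these labels could look like. The label names are the ones from the issue description; the commit titles and the helper names (`group_by_type`, `render_changelog`) are made up for illustration and are not part of any Arrow tooling.

```python
from collections import defaultdict

# Hypothetical commit records: (title, labels). Only the label names come from
# the issue description; the commits themselves are invented.
COMMITS = [
    ("Fix crash when reading empty file", ["Type: bug"]),
    ("Add new compute kernel", ["Type: enhancement"]),
    ("Update CI images", ["Type: task"]),
    ("Clarify install docs", []),
]


def group_by_type(commits):
    """Group commit titles by their first 'Type: ...' label."""
    groups = defaultdict(list)
    for title, labels in commits:
        type_label = next(
            (label for label in labels if label.startswith("Type: ")),
            "Unlabeled",
        )
        groups[type_label].append(title)
    return dict(groups)


def render_changelog(groups):
    """Render one section per label, sections sorted alphabetically."""
    lines = []
    for label in sorted(groups):
        lines.append(label)
        lines.extend(f"  - {title}" for title in groups[label])
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_changelog(group_by_type(COMMITS)))
```

Commits without a type label end up in their own "Unlabeled" section, which is roughly the situation without the labels.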

> MIGRATION: Enable GitHub issue type labels
> --
>
> Key: ARROW-18375
> URL: https://issues.apache.org/jira/browse/ARROW-18375
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> As part of enabling GitHub issue reporting, the following labels have been 
> defined and need to be added to the repository label options. Without these 
> labels added, [new issues|https://github.com/apache/arrow/issues/14692] do 
> not get the issue template-defined issue type labels set properly.
>  
> Labels:
>  * Type: bug
>  * Type: enhancement
>  * Type: usage
>  * Type: task
>  * Type: test
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641380#comment-17641380
 ] 

Jacob Wujciak-Jens edited comment on ARROW-18375 at 11/30/22 3:13 PM:
--

I don't think we need that granularity, but I don't hold that as a strong 
opinion, so I'll follow your experience as PMCs (y)


was (Author: JIRAUSER287549):
 (y)I don't think we need that granularity but I don't hold that as strong 
opinion, so  I'll follow your experience as PMCs

> MIGRATION: Enable GitHub issue type labels
> --
>
> Key: ARROW-18375
> URL: https://issues.apache.org/jira/browse/ARROW-18375
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> As part of enabling GitHub issue reporting, the following labels have been 
> defined and need to be added to the repository label options. Without these 
> labels added, [new issues|https://github.com/apache/arrow/issues/14692] do 
> not get the issue template-defined issue type labels set properly.
>  
> Labels:
>  * Type: bug
>  * Type: enhancement
>  * Type: usage
>  * Type: task
>  * Type: test
>  




[jira] [Commented] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641380#comment-17641380
 ] 

Jacob Wujciak-Jens commented on ARROW-18375:


 (y) I don't think we need that granularity, but I don't hold that as a strong 
opinion, so I'll follow your experience as PMCs

> MIGRATION: Enable GitHub issue type labels
> --
>
> Key: ARROW-18375
> URL: https://issues.apache.org/jira/browse/ARROW-18375
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> As part of enabling GitHub issue reporting, the following labels have been 
> defined and need to be added to the repository label options. Without these 
> labels added, [new issues|https://github.com/apache/arrow/issues/14692] do 
> not get the issue template-defined issue type labels set properly.
>  
> Labels:
>  * Type: bug
>  * Type: enhancement
>  * Type: usage
>  * Type: task
>  * Type: test
>  




[jira] [Commented] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641376#comment-17641376
 ] 

Jacob Wujciak-Jens commented on ARROW-18375:


What is the difference between enhancement and task? And what is test? Maybe 
rather feature and enhancement, where enhancement is everything that is not a 
new feature (e.g. refactoring, adding tests...), as those all enhance the 
codebase/project?

> MIGRATION: Enable GitHub issue type labels
> --
>
> Key: ARROW-18375
> URL: https://issues.apache.org/jira/browse/ARROW-18375
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> As part of enabling GitHub issue reporting, the following labels have been 
> defined and need to be added to the repository label options. Without these 
> labels added, [new issues|https://github.com/apache/arrow/issues/14692] do 
> not get the issue template-defined issue type labels set properly.
>  
> Labels:
>  * Type: bug
>  * Type: enhancement
>  * Type: usage
>  * Type: task
>  * Type: test
>  




[jira] [Created] (ARROW-18385) [Java]

2022-11-22 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-18385:
--

 Summary: [Java] 
 Key: ARROW-18385
 URL: https://issues.apache.org/jira/browse/ARROW-18385
 Project: Apache Arrow
  Issue Type: Wish
  Components: Java
Reporter: Jacob Wujciak-Jens
 Fix For: 11.0.0
 Attachments: image.png

While verifying 10.0.1 I came across this java test error that is caused by a 
mismatch in the ordering of the JSON metadata description (see attached image)
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.177 s 
<<< FAILURE! - in org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest
[ERROR] 
org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest.schemaCommentWithDatabaseMetadata
  Time elapsed: 0.141 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 
cc [~lidavidm] 
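
To illustrate the kind of ordering mismatch described above, here is a minimal Python sketch (not the Java test itself). The metadata keys and values are invented, since the real ones are only visible in the attached image. It shows that comparing serialized JSON text depends on key order, while comparing the parsed objects does not.

```python
import json

# Hypothetical metadata; the real keys used by the failing test are not shown
# in the issue text.
EXPECTED = '{"comment": "table comment", "SQL_TYPE": "INTEGER"}'
ACTUAL = '{"SQL_TYPE": "INTEGER", "comment": "table comment"}'


def same_text(a, b):
    """Order-sensitive: compares the serialized strings character by character."""
    return a == b


def same_json(a, b):
    """Order-insensitive: compares the parsed objects (dicts ignore key order)."""
    return json.loads(a) == json.loads(b)


if __name__ == "__main__":
    print("text equal:", same_text(EXPECTED, ACTUAL))
    print("json equal:", same_json(EXPECTED, ACTUAL))
```

A test that asserts on the serialized string can therefore fail even though both sides carry the same metadata.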




[jira] [Updated] (ARROW-18385) [Java] Test fails due to JSON key order

2022-11-22 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18385:
---
Description: 
While verifying 10.0.1 I came across this java test error that is caused by a 
mismatch in the ordering of the JSON metadata description (see attached image)


{{[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.177 
s <<< FAILURE! - in 
org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest}}
{{[ERROR] 
org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest.schemaCommentWithDatabaseMetadata
 Time elapsed: 0.141 s <<< FAILURE!}}
{{org.opentest4j.AssertionFailedError: }}


cc [~lidavidm] 

  was:
While verifying 10.0.1 I came across this java test error that is caused by a 
mismatch in the ordering of the JSON metadata description (see attached image)
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.177 s 
<<< FAILURE! - in org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest
[ERROR] 
org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest.schemaCommentWithDatabaseMetadata
  Time elapsed: 0.141 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 
cc [~lidavidm] 


> [Java] Test fails due to JSON key order
> ---
>
> Key: ARROW-18385
> URL: https://issues.apache.org/jira/browse/ARROW-18385
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Java
>Reporter: Jacob Wujciak-Jens
>Priority: Major
> Fix For: 11.0.0
>
> Attachments: image.png
>
>
> While verifying 10.0.1 I came across this java test error that is caused by a 
> mismatch in the ordering of the JSON metadata description (see attached image)
> {{[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 0.177 s <<< FAILURE! - in 
> org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest}}
> {{[ERROR] 
> org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest.schemaCommentWithDatabaseMetadata
>  Time elapsed: 0.141 s <<< FAILURE!}}
> {{org.opentest4j.AssertionFailedError: }}
> cc [~lidavidm] 




[jira] [Updated] (ARROW-18385) [Java] Test fails due to JSON key order

2022-11-22 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18385:
---
Summary: [Java] Test fails due to JSON key order  (was: [Java] )

> [Java] Test fails due to JSON key order
> ---
>
> Key: ARROW-18385
> URL: https://issues.apache.org/jira/browse/ARROW-18385
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Java
>Reporter: Jacob Wujciak-Jens
>Priority: Major
> Fix For: 11.0.0
>
> Attachments: image.png
>
>
> While verifying 10.0.1 I came across this java test error that is caused by a 
> mismatch in the ordering of the JSON metadata description (see attached image)
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.177 
> s <<< FAILURE! - in 
> org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest
> [ERROR] 
> org.apache.arrow.adapter.jdbc.JdbcToArrowCommentMetadataTest.schemaCommentWithDatabaseMetadata
>   Time elapsed: 0.141 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: 
> cc [~lidavidm] 




[jira] [Updated] (ARROW-18258) [CI] Substrait Integration Testing

2022-11-07 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18258:
---
Summary: [CI] Substrait Integration Testing  (was: [Docker] Substrait 
Integration Testing)

> [CI] Substrait Integration Testing
> --
>
> Key: ARROW-18258
> URL: https://issues.apache.org/jira/browse/ARROW-18258
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment the Substrait consumer test suite is developed at 
> [https://github.com/substrait-io/consumer-testing]. To evaluate the 
> performance and functionality against Acero/Substrait development, an 
> integration test suite is important.




[jira] [Updated] (ARROW-18258) [Docker] Substrait Integration Testing

2022-11-07 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18258:
---
Component/s: Continuous Integration

> [Docker] Substrait Integration Testing
> --
>
> Key: ARROW-18258
> URL: https://issues.apache.org/jira/browse/ARROW-18258
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment the Substrait consumer test suite is developed at 
> [https://github.com/substrait-io/consumer-testing]. To evaluate the 
> performance and functionality against Acero/Substrait development, an 
> integration test suite is important.




[jira] [Updated] (ARROW-18016) [CI] Add sccache to r jobs

2022-11-02 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18016:
---
Description: Building on the work in ARROW-17021 we can now activate 
sccache on more builds and save more time! To keep the PRs reviewable I have 
reduced this to only R jobs and will open follow-ups for the other tasks.  
(was: Building on the work in [ARROW-17021] we can now activate sccache on more 
builds and save more time!)

> [CI] Add sccache to r jobs
> --
>
> Key: ARROW-18016
> URL: https://issues.apache.org/jira/browse/ARROW-18016
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Building on the work in ARROW-17021 we can now activate sccache on more 
> builds and save more time! To keep the PRs reviewable I have reduced this to 
> only R jobs and will open follow-ups for the other tasks.




[jira] [Updated] (ARROW-18016) [CI] Add sccache to r jobs

2022-11-02 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-18016:
---
Summary: [CI] Add sccache to r jobs  (was: [CI] Add sccache to more jobs)

> [CI] Add sccache to r jobs
> --
>
> Key: ARROW-18016
> URL: https://issues.apache.org/jira/browse/ARROW-18016
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Building on the work in [ARROW-17021] we can now activate sccache on more 
> builds and save more time!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-18188) [CI] CUDA nightly docker upload fails due to wrong tag

2022-10-28 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-18188:
--

 Summary: [CI] CUDA nightly docker upload fails due to wrong tag
 Key: ARROW-18188
 URL: https://issues.apache.org/jira/browse/ARROW-18188
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 11.0.0


Because this job requires a different CUDA version than the one in .env, the 
CUDA envvar also needs to be set in the push step to generate the correct 
tag.

https://github.com/ursacomputing/crossbow/actions/runs/3341920350/jobs/5533611188#step:7:9
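
A small Python sketch of the failure mode, under stated assumptions: the default version, the image name and the tag layout below are placeholders and not the values used in the Arrow repository. The point is only that a step without the override falls back to the default and so computes a different tag.

```python
# Placeholder default, standing in for the value that would come from .env.
DOTENV_DEFAULTS = {"CUDA": "11.0.3"}


def image_tag(step_env, defaults=DOTENV_DEFAULTS):
    """Build an image tag from the step's environment, falling back to defaults.

    The image name and tag layout are hypothetical.
    """
    cuda = step_env.get("CUDA", defaults["CUDA"])
    return f"example/ubuntu-cuda-cpp:cuda-{cuda}"


BUILD_ENV = {"CUDA": "11.2.2"}  # override set on the build step
PUSH_ENV = {}                   # override missing on the push step

if __name__ == "__main__":
    print("built: ", image_tag(BUILD_ENV))
    print("pushed:", image_tag(PUSH_ENV))
```

With the override set on both steps the two tags match, which is the fix described above.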




[jira] [Closed] (ARROW-14536) [R] [CI] M1 tests

2022-10-27 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens closed ARROW-14536.
--
Resolution: Duplicate

> [R] [CI] M1 tests
> -
>
> Key: ARROW-14536
> URL: https://issues.apache.org/jira/browse/ARROW-14536
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Priority: Major
>
> The CRAN specifics are: https://www.stats.ox.ac.uk/pub/bdr/M1mac/README.txt
> ARROW-10657 is the more general ticket for the project




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-26 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624628#comment-17624628
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


It looks like they already support it 
[https://github.com/pypa/setuptools_scm/issues/258], but when running the docker 
job only the arrow dir is mounted, not the dir containing the .git folder. So 
it's not a bug but rather due to my specific circumstances of using worktrees + 
docker. So I think this can be closed?
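
For anyone hitting the same thing, a small Python sketch of why a worktree behaves differently. In a linked git worktree, {{.git}} is a plain file containing a {{gitdir: <path>}} line that points back into the main repository, so mounting only the worktree dir leaves that target outside the container. The helper name {{describe_git_entry}} is made up for this example.

```python
from pathlib import Path


def describe_git_entry(checkout: Path) -> str:
    """Report whether <checkout>/.git is a directory, a gitdir pointer, or missing."""
    git_entry = checkout / ".git"
    if git_entry.is_dir():
        # Regular clone: the repository data lives inside the checkout.
        return "directory"
    if git_entry.is_file():
        # Linked worktree: the file only points at the real git directory.
        text = git_entry.read_text().strip()
        if text.startswith("gitdir:"):
            return "pointer to " + text[len("gitdir:"):].strip()
    return "missing"


if __name__ == "__main__":
    print(describe_git_entry(Path.cwd()))
```

If the result is a pointer, the path it names also has to be visible inside the container for version detection from git metadata to work.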

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-26 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624619#comment-17624619
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


[~apitrou] I found out why this fails for me without the envvar: I use git 
worktrees so my {{.git}} folder is not available to scm to find the version.

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Closed] (ARROW-17726) [CI] Enable sccache on more builds

2022-10-26 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens closed ARROW-17726.
--
Resolution: Fixed

> [CI] Enable sccache on more builds
> --
>
> Key: ARROW-17726
> URL: https://issues.apache.org/jira/browse/ARROW-17726
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 11.0.0
>
>
> As a follow up to [ARROW-17021]. Enabling sccache should be as easy as adding 
> the install script to the relevant docker image and the sccache env to the 
> docker-compose service.




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-25 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623859#comment-17623859
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


You can remove it from docker-compose.yml and test it; on my machine the same 
error occurred, which is why I added it as a convenience. If there is another 
way to avoid using it I am happy to add that; I used this as it is used in all 
crossbow wheel builds.

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-25 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623848#comment-17623848
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


As a workaround you can use {{docker-compose run ubuntu-cuda-python}} directly 
for now.

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-25 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623842#comment-17623842
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


This is an archery issue: I checked within the container and it gets set to 
'None'. archery SHOULD use docker-compose by default, and that would use the yml 
to set the envvars, but apparently it is also passing them in explicitly and 
redundantly as {{-e var=value}}
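
One plausible way a literal 'None' can end up in the container, sketched in Python. This is a guess at the mechanism and not archery's actual code; the function names and the variable name {{EXAMPLE_VERSION}} are made up.

```python
def env_flags(env):
    """Turn every entry into '-e KEY=VALUE', including unset (None) values."""
    flags = []
    for key, value in env.items():
        # Formatting None into a string yields the literal text 'None'.
        flags += ["-e", f"{key}={value}"]
    return flags


def env_flags_skip_unset(env):
    """Same as env_flags, but leave unset (None) values out entirely."""
    flags = []
    for key, value in env.items():
        if value is not None:
            flags += ["-e", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    unset = {"EXAMPLE_VERSION": None}
    print(env_flags(unset))
    print(env_flags_skip_unset(unset))
```

Skipping unset values would leave it to docker-compose to resolve the variable from the yml.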

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Commented] (ARROW-18155) [Python][GPU] "archery docker run ubuntu-cuda-python" fails

2022-10-25 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623829#comment-17623829
 ] 

Jacob Wujciak-Jens commented on ARROW-18155:


[~apitrou] but it also fails with the same error if you remove that. IIUC this 
envvar has to be set for the wheels to be built and is set within the crossbow 
jobs via the -e option. I added it to the environment so it is inherited from 
the calling env without having to be passed explicitly. If you have not set the 
envvar it will remain unset (same as if it was not there).
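
The pass-through behaviour can be modelled with a few lines of Python. This only simulates how a key-only entry in a compose {{environment:}} list is resolved; it is not docker-compose code, and {{EXAMPLE_VERSION}} is a placeholder for the variable in question.

```python
def resolve_environment(entries, host_env):
    """Resolve compose-style environment entries against the calling environment.

    'KEY=value' entries are taken literally. A bare 'KEY' is inherited from the
    host if it is set there, and otherwise stays unset in the container.
    """
    resolved = {}
    for entry in entries:
        if "=" in entry:
            key, value = entry.split("=", 1)
            resolved[key] = value
        elif entry in host_env:
            resolved[entry] = host_env[entry]
    return resolved


if __name__ == "__main__":
    print(resolve_environment(["EXAMPLE_VERSION"], {}))
    print(resolve_environment(["EXAMPLE_VERSION"], {"EXAMPLE_VERSION": "1.0"}))
```

With the variable unset on the host the result is empty, i.e. the same as if the entry was not listed at all.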

> [Python][GPU] "archery docker run ubuntu-cuda-python" fails
> ---
>
> Key: ARROW-18155
> URL: https://issues.apache.org/jira/browse/ARROW-18155
> Project: Apache Arrow
>  Issue Type: Task
>  Components: GPU, Python
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> This has just started occurring. Log below:
> https://gist.github.com/pitrou/7b945b6e58d42d7aafb9d669cd31eb5f




[jira] [Closed] (ARROW-12826) [R] [CI] Add caching to revdepchecks

2022-10-25 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens closed ARROW-12826.
--
Resolution: Invalid

We have removed the revdepcheck crossbow job in favour of running it locally, 
so this issue is invalid.

> [R] [CI] Add caching to revdepchecks
> 
>
> Key: ARROW-12826
> URL: https://issues.apache.org/jira/browse/ARROW-12826
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Priority: Major
>
> With ARROW-12569 we added a (manual) reverse dependency check job. This runs 
> fine (if slow) for a one-off run. It should be possible to cache between 
> runs. There are a few issues with this currently:
> * {revdepcheck} does not (yet) [support only running new 
> runs|https://github.com/r-lib/revdepcheck/issues/94]
> * The cache doesn't cache some of the longest running tasks (installing the 
> reverse dependencies)
> * If we cache the revdeps directory, we will need to re-add packages that 
> should be re-checked.
> We should investigate contributing to revdepcheck to resolve the run-only-new 
> issue and possibly also add features for caching the installations (and only 
> change when the crancache is invalidated / finds a new package?) 
> https://github.com/HenrikBengtsson/revdepcheck.extras might also be helpful
> For posterity, the following is ~what we would need to add to 
> dev/tasks/r/github.linux.revdepcheck.yml
> ```
>   - name: Cache crancache and revdeps directory
>     uses: actions/cache@v2
>     with:
>       key: {{ "r-revdep-cache-${{ some-way-to-get-arrow-version }}" }}
>       path: |
>         arrow/r/revdep
>         arrow/.crancache
> ```




[jira] [Commented] (ARROW-16464) [C++][CI][GPU] Add CUDA CI

2022-10-24 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623068#comment-17623068
 ] 

Jacob Wujciak-Jens commented on ARROW-16464:


We will use self-hosted runners via crossbow, probably for nightlies only for 
now.

> [C++][CI][GPU] Add CUDA CI
> --
>
> Key: ARROW-16464
> URL: https://issues.apache.org/jira/browse/ARROW-16464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, GPU
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 11.0.0
>
>
> Arrow C++, PyArrow and perhaps other bindings have CUDA support, but none is 
> currently tested on CI, and I think few of the contributors enable CUDA on 
> their local builds.
> We should definitely exercise CUDA support, at least in the nightly builds 
> where we may have more flexibility to use custom machines.




[jira] [Commented] (ARROW-17949) [Python][C++][Docs] Use ccache when developing on Windows

2022-10-21 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622310#comment-17622310
 ] 

Jacob Wujciak-Jens commented on ARROW-17949:


You don't have to explicitly turn (S)CCACHE on via cmake flag, they will be 
used automatically if the program is installed. SCCACHE will have precedence 
over CCACHE if both are installed and SCCACHE credentials are available via 
envvar. I am not sure if there is caching on Appveyor we could use to cache the 
ccache cache (xD) but I could take a look.
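
The selection order described above, sketched in Python rather than CMake. This is not the actual Arrow CMake logic. The credential variable name is a placeholder, and what happens when only sccache is installed without credentials is an assumption (here: no cache is used).

```python
def pick_compiler_cache(installed, env, credential_vars=("SCCACHE_BUCKET",)):
    """Pick a compiler cache following the precedence described in the comment.

    installed:       set of program names found on the machine
    env:             mapping of environment variables
    credential_vars: placeholder names standing in for the sccache credentials
    """
    has_credentials = any(env.get(name) for name in credential_vars)
    if "sccache" in installed and has_credentials:
        return "sccache"
    if "ccache" in installed:
        return "ccache"
    # Assumption: nothing usable was found, so build without a cache.
    return None


if __name__ == "__main__":
    both = {"sccache", "ccache"}
    print(pick_compiler_cache(both, {"SCCACHE_BUCKET": "example-bucket"}))
    print(pick_compiler_cache(both, {}))
```

So having both programs installed only results in sccache being used when the credentials are present; otherwise ccache is picked up.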

> [Python][C++][Docs] Use ccache when developing on Windows
> -
>
> Key: ARROW-17949
> URL: https://issues.apache.org/jira/browse/ARROW-17949
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation, Python
>Reporter: Alenka Frim
>Priority: Major
>
> When I was trying to update the AppVeyor build to use Python 3.10 in 
> https://issues.apache.org/jira/browse/ARROW-17892 the CI build gave me an error 
> about the pyuv source install: 
> [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/44974788#L884].
> After some investigation Joris and I found that the 
> [clcache|https://github.com/frerich/clcache] project used in the AppVeyor 
> setup is archived and its dependency 
> [pyuv|https://pypi.org/project/pyuv/#files] is no longer maintained.
> h5. Is there any substitute for clcache?
> There is a 
> [PR|https://github.com/apache/arrow/pull/12230/files#diff-1380789f5b2c91702997c503a501ba8cbaa3a44747848e7c372b8b6eda1369f0]
>  from [~willjones127] that changes the documentation to advise the use of 
> [https://github.com/Nuitka/clcache]. That project unfortunately also 
> depends on pyuv. 
> But is there any other alternative to these two projects? cc [~raulcd] 
> [~assignUser] 
> h5. Arrow C++ documentation
> The use of clcache is still mentioned in our [C++ 
> docs|https://arrow.apache.org/docs/dev/developers/cpp/windows.html#building-with-ninja-and-clcache] 
> and should be updated/removed.




[jira] [Created] (ARROW-18016) [CI] Add sccache to more jobs

2022-10-12 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-18016:
--

 Summary: [CI] Add sccache to more jobs
 Key: ARROW-18016
 URL: https://issues.apache.org/jira/browse/ARROW-18016
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


Building on the work in [ARROW-17021] we can now activate sccache on more 
builds and save more time!




[jira] [Updated] (ARROW-17990) [CI][R][C++] R macOS 10.13 build fails due to bmi2 changes

2022-10-11 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17990:
---
Summary: [CI][R][C++] R macOS 10.13 build fails  due to bmi2 changes  (was: 
[CI][R][C++] R macOS 10.13 build due to bmi2 changes)

> [CI][R][C++] R macOS 10.13 build fails  due to bmi2 changes
> ---
>
> Key: ARROW-17990
> URL: https://issues.apache.org/jira/browse/ARROW-17990
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> After the changes in [ARROW-15768] the macos 10.13 build is failing: 
> {code}
> /Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
>  error: always_inline function '_pext_u64' requires target feature 'bmi2', 
> but would be inlined into function 'ExtractBits' that is compiled without 
> support for 'bmi2'
> {code}
> https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016




[jira] [Commented] (ARROW-17990) [CI][R][C++] R macOS 10.13 build due to bmi2 changes

2022-10-11 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615940#comment-17615940
 ] 

Jacob Wujciak-Jens commented on ARROW-17990:


cc: [~kou]

> [CI][R][C++] R macOS 10.13 build due to bmi2 changes
> 
>
> Key: ARROW-17990
> URL: https://issues.apache.org/jira/browse/ARROW-17990
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> After the changes in [ARROW-15768] the macos 10.13 build is failing: 
> {code}
> /Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
>  error: always_inline function '_pext_u64' requires target feature 'bmi2', 
> but would be inlined into function 'ExtractBits' that is compiled without 
> support for 'bmi2'
> {code}
> https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016




[jira] [Created] (ARROW-17990) [CI][R][C++] R macOS 10.13 build due to bmi2 changes

2022-10-11 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17990:
--

 Summary: [CI][R][C++] R macOS 10.13 build due to bmi2 changes
 Key: ARROW-17990
 URL: https://issues.apache.org/jira/browse/ARROW-17990
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


After the changes in [ARROW-15768] the macos 10.13 build is failing: 

{code}
/Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
 error: always_inline function '_pext_u64' requires target feature 'bmi2', but 
would be inlined into function 'ExtractBits' that is compiled without support 
for 'bmi2'
{code}

https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016

cc: ~kou




[jira] [Updated] (ARROW-17990) [CI][R][C++] R macOS 10.13 build due to bmi2 changes

2022-10-11 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17990:
---
Description: 
After the changes in [ARROW-15768] the macos 10.13 build is failing: 

{code}
/Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
 error: always_inline function '_pext_u64' requires target feature 'bmi2', but 
would be inlined into function 'ExtractBits' that is compiled without support 
for 'bmi2'
{code}

https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016


  was:
After the changes in [ARROW-15768] the macos 10.13 build is failing: 

{code}
/Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
 error: always_inline function '_pext_u64' requires target feature 'bmi2', but 
would be inlined into function 'ExtractBits' that is compiled without support 
for 'bmi2'
{code}

https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016

cc: ~kou


> [CI][R][C++] R macOS 10.13 build due to bmi2 changes
> 
>
> Key: ARROW-17990
> URL: https://issues.apache.org/jira/browse/ARROW-17990
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> After the changes in [ARROW-15768] the macos 10.13 build is failing: 
> {code}
> /Users/voltrondata/tmp/hbtmp/apache-arrow-20221011-20438-19rxs8w/cpp/src/parquet/level_conversion_inc.h:278:10:
>  error: always_inline function '_pext_u64' requires target feature 'bmi2', 
> but would be inlined into function 'ExtractBits' that is compiled without 
> support for 'bmi2'
> {code}
> https://github.com/ursacomputing/crossbow/actions/runs/3225397138/jobs/5278904370#step:13:7016




[jira] [Commented] (ARROW-17972) [CI] Update cuda docker jobs

2022-10-10 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615217#comment-17615217
 ] 

Jacob Wujciak-Jens commented on ARROW-17972:


The main issue was fixed with [ARROW-17952]. I added the deploy nodes to the 
build so they also work with docker compose directly. We also want these builds 
to be as small and focused as possible to keep runtime low on the cuda runners. 

> [CI] Update cuda docker jobs
> 
>
> Key: ARROW-17972
> URL: https://issues.apache.org/jira/browse/ARROW-17972
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cuda job configs in docker-compose.yml are outdated and do not work 
> anymore. Additionally, disable optional features to keep build time low.




[jira] [Updated] (ARROW-17972) [CI] Update cuda docker jobs

2022-10-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17972:
---
Description: The cuda job configs in docker-compose.yml are outdated and do 
not work anymore. Additionally, disable optional features to keep build time 
low.  (was: The cuda job configs in docker-compose.yml are outdated and do not 
work anymore.)

> [CI] Update cuda docker jobs
> 
>
> Key: ARROW-17972
> URL: https://issues.apache.org/jira/browse/ARROW-17972
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cuda job configs in docker-compose.yml are outdated and do not work 
> anymore. Additionally, disable optional features to keep build time low.




[jira] [Created] (ARROW-17972) [CI] Update cuda docker jobs

2022-10-10 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17972:
--

 Summary: [CI] Update cuda docker jobs
 Key: ARROW-17972
 URL: https://issues.apache.org/jira/browse/ARROW-17972
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


The cuda job configs in docker-compose.yml are outdated and do not work anymore.




[jira] [Commented] (ARROW-17961) Add read/write optimization for pyarrow.fs.S3FileSystem

2022-10-07 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614053#comment-17614053
 ] 

Jacob Wujciak-Jens commented on ARROW-17961:


Things like readahead and metadata caching; cc [~lidavidm] for details.

> Add read/write optimization for pyarrow.fs.S3FileSystem
> ---
>
> Key: ARROW-17961
> URL: https://issues.apache.org/jira/browse/ARROW-17961
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Volker Lorrmann
>Priority: Minor
>
> I found large differences in loading time when loading data from AWS S3 
> using {{pyarrow.fs.S3FileSystem}} compared to {{s3fs.S3FileSystem}}. See the 
> example below.
> The difference comes from {{s3fs}} optimizations, which {{pyarrow.fs}} is not 
> (yet) using.
> {code:python}
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> import pyarrow.fs as pafs
> import s3fs
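> # load_credentials is the reporter's own helper module (not shown); this
> # assumes it exposes a function of the same name
> from load_credentials import load_credentials
> credentials = load_credentials()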
> path = "path/to/data" # folder with about 300 small (~10kb) files
> fs1 = s3fs.S3FileSystem(
>     anon=False,
>     key=credentials["accessKeyId"],
>     secret=credentials["secretAccessKey"],
>     token=credentials["sessionToken"],
> )
> fs2 = pafs.S3FileSystem(
>     access_key=credentials["accessKeyId"],
>     secret_key=credentials["secretAccessKey"],
>     session_token=credentials["sessionToken"],
> )
> _ = ds.dataset(path, filesystem=fs1).to_table() # takes about 5 seconds
> _ = ds.dataset(path, filesystem=fs2).to_table() # takes about 25 seconds
> _ = pq.read_table(path, filesystem=fs1) # takes about 5 seconds
> _ = pq.read_table(path, filesystem=fs2) # takes about 10 seconds
> {code}
>  




[jira] [Commented] (ARROW-17353) [Release] R libarrow binaries have the wrong version number

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611748#comment-17611748
 ] 

Jacob Wujciak-Jens commented on ARROW-17353:


No, the R nightly versions don't conform to the X.Y.Z.devXXX format, so it 
would only work for the release script but not for the nightly build upload. 
But that regex works for a normal X.Y.Z format, so adding the param 
{{-p custom_version=10.0.0}} to the crossbow submit call should work? 
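
To illustrate the distinction between the two version formats, here is a minimal sketch. The patterns and the function name are hypothetical and only mirror the X.Y.Z and X.Y.Z.devXXX shapes mentioned above; they are not the regex used by the r-binary-packages job.

```python
import re

# Hypothetical patterns for illustration only; the real regex in the
# release tooling is not reproduced here.
RELEASE_PATTERN = re.compile(r"\d+\.\d+\.\d+")
DEV_PATTERN = re.compile(r"\d+\.\d+\.\d+\.dev\d+")


def classify_version(version: str) -> str:
    """Return 'release', 'dev', or 'unrecognized' for a version string."""
    if RELEASE_PATTERN.fullmatch(version):
        return "release"
    if DEV_PATTERN.fullmatch(version):
        return "dev"
    return "unrecognized"


# "10.0.0" is the value from the custom_version param above; the other two
# strings are made-up inputs used only to exercise the function.
for candidate in ("10.0.0", "10.0.0.dev123", "10.0.0.12345"):
    print(candidate, "->", classify_version(candidate))
```

A version string that matches neither shape falls through to "unrecognized", which is the situation where passing the version explicitly would be needed.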

> [Release] R libarrow binaries have the wrong version number
> ---
>
> Key: ARROW-17353
> URL: https://issues.apache.org/jira/browse/ARROW-17353
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 9.0.0
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> The libarrow binaries that are uploaded during the release process have the 
> wrong version number. This is an issue with the submit binaries 
> script/r-binary-packages job. The arrow version should be picked up by the 
> job even if not passed explicitly as a custom param.




[jira] [Commented] (ARROW-17876) [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611745#comment-17611745
 ] 

Jacob Wujciak-Jens commented on ARROW-17876:


I read the other one as just adding, and created this one for testing, removal 
and logic changes in nixlibs.R.

> [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries
> 
>
> Key: ARROW-17876
> URL: https://issues.apache.org/jira/browse/ARROW-17876
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> The new dts compiled centos-7 binaries ([ARROW-17594]) should be able to 
> replace the ubuntu-18.04 binaries.




[jira] [Updated] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17849:
---
Description: 
With the switch to C++17, centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]). They can, however, use 
INSTALL_opts = "--build" to get a binary package that is installable on a 
centos machine WITHOUT dts -> offline build section. Centos 7 RSPM is an 
alternative source for that binary package. 
Also add messaging in configure or build_arrow_static.sh so that if someone is 
trying to install from source with gcc 4.8, we tell them what they need to do.

This should be documented and noted in the release notes.


  was:
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]) but that they can use 
INSTALL_opts = "--build" to get a binary package that is installable on a 
centos machine WITHOUT dts -> offline build section. 
Also add messaging in configure or build_arrow_static.sh so that if someone is 
trying to install from source with gcc 4.8, we tell them what they need to do.

This should be documented and noted in the release notes.



> [R][Docs] Document changes due to C++17 for centos-7 users
> --
>
> Key: ARROW-17849
> URL: https://issues.apache.org/jira/browse/ARROW-17849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> With the switch to C++17, centos 7 users need to install and enable devtoolset 
> (and possibly change makevars) to be able to compile the R package, even when 
> using the libarrow binary (see [ARROW-17594]). They can, however, use 
> INSTALL_opts = "--build" to get a binary package that is installable on a 
> centos machine WITHOUT dts -> offline build section. Centos 7 RSPM is an 
> alternative source for that binary package. 
> Also add messaging in configure or build_arrow_static.sh so that if someone 
> is trying to install from source with gcc 4.8, we tell them what they need to 
> do.
> This should be documented and noted in the release notes.




[jira] [Updated] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17849:
---
Description: 
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]) but that they can use 
INSTALL_opts = "--build" to get a binary package that is installable on a 
centos machine WITHOUT dts -> offline build section. 
Also add messaging in configure or build_arrow_static.sh so that if someone is 
trying to install from source with gcc 4.8, we tell them what they need to do.

This should be documented and noted in the release notes.


  was:
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]). 
Also add messaging in configure or build_arrow_static.sh so that if someone is 
trying to install from source with gcc 4.8, we tell them what they need to do.

This should be documented and noted in the release notes.



> [R][Docs] Document changes due to C++17 for centos-7 users
> --
>
> Key: ARROW-17849
> URL: https://issues.apache.org/jira/browse/ARROW-17849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> With the switch to C++17 centos 7 users need to install and enable devtoolset 
> (and possibly change makevars) to be able to compile the R package, even when 
> using the libarrow binary (see [ARROW-17594]) but that they can use 
> INSTALL_opts = "--build" to get a binary package that is installable on a 
> centos machine WITHOUT dts -> offline build section. 
> Also add messaging in configure or build_arrow_static.sh so that if someone 
> is trying to install from source with gcc 4.8, we tell them what they need to 
> do.
> This should be documented and noted in the release notes.




[jira] [Updated] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17849:
---
Priority: Blocker  (was: Critical)

> [R][Docs] Document changes due to C++17 for centos-7 users
> --
>
> Key: ARROW-17849
> URL: https://issues.apache.org/jira/browse/ARROW-17849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> With the switch to C++17 centos 7 users need to install and enable devtoolset 
> (and possibly change makevars) to be able to compile the R package, even when 
> using the libarrow binary (see [ARROW-17594]). 
> Also add messaging in configure or build_arrow_static.sh so that if someone 
> is trying to install from source with gcc 4.8, we tell them what they need to 
> do.
> This should be documented and noted in the release notes.




[jira] [Updated] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-30 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17849:
---
Description: 
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]). 
Also add messaging in configure or build_arrow_static.sh so that if someone is 
trying to install from source with gcc 4.8, we tell them what they need to do.

This should be documented and noted in the release notes.


  was:
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]).

This should be documented and noted in the release notes.



> [R][Docs] Document changes due to C++17 for centos-7 users
> --
>
> Key: ARROW-17849
> URL: https://issues.apache.org/jira/browse/ARROW-17849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> With the switch to C++17 centos 7 users need to install and enable devtoolset 
> (and possibly change makevars) to be able to compile the R package, even when 
> using the libarrow binary (see [ARROW-17594]). 
> Also add messaging in configure or build_arrow_static.sh so that if someone 
> is trying to install from source with gcc 4.8, we tell them what they need to 
> do.
> This should be documented and noted in the release notes.




[jira] [Commented] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610573#comment-17610573
 ] 

Jacob Wujciak-Jens commented on ARROW-17872:


bq. 10 minutes for extracting 1.5GB seems quite unexpected

I have checked in detail and each of the bigger dependencies (aws, llvm, boost) 
takes 2-3 minutes to "pour", so OK speeds I would say. Overall that is a lot, but 
still nothing I see the cache really speeding up. 

The timeout is set to 60 minutes so we could just raise that limit if it is not 
applicable for the current build complexity (or, as you said, remove features). 
The build should already be using all 3 available cores.

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Commented] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610541#comment-17610541
 ] 

Jacob Wujciak-Jens commented on ARROW-17872:


relevant homebrew issue: https://github.com/Homebrew/brew/issues/13621

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Commented] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610540#comment-17610540
 ] 

Jacob Wujciak-Jens commented on ARROW-17872:


And we have 12 & 15 both similar size (do we need both?), aws sdk is 800M...

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Created] (ARROW-17876) [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries

2022-09-28 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17876:
--

 Summary: [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt 
binaries
 Key: ARROW-17876
 URL: https://issues.apache.org/jira/browse/ARROW-17876
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


The new dts compiled centos-7 binaries ([ARROW-17594]) should be able to 
replace the ubuntu-18.04 binaries.




[jira] [Updated] (ARROW-17876) [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17876:
---
Priority: Critical  (was: Major)

> [R][CI] Remove ubuntu-18.04 from nixlibs & prebuilt binaries
> 
>
> Key: ARROW-17876
> URL: https://issues.apache.org/jira/browse/ARROW-17876
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> The new dts compiled centos-7 binaries ([ARROW-17594]) should be able to 
> replace the ubuntu-18.04 binaries.




[jira] [Comment Edited] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610528#comment-17610528
 ] 

Jacob Wujciak-Jens edited comment on ARROW-17872 at 9/28/22 12:14 PM:
--

it looks like homebrew is using system tar to extract the gzipped bottles, 
maybe we can speed it up by symlinking in pigz to make use of the 3 cores the 
mac runners have...


was (Author: JIRAUSER287549):
it looks like homebrew is using system tar to extract the gzipped bottles, 
maybe we can speed it up by symlinking in pzip to make use of the 3 cores the 
mac runners have...

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Commented] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610528#comment-17610528
 ] 

Jacob Wujciak-Jens commented on ARROW-17872:


it looks like homebrew is using system tar to extract the gzipped bottles, 
maybe we can speed it up by symlinking in pzip to make use of the 3 cores the 
mac runners have...

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Commented] (ARROW-17872) [CI] Cache dependencies on macOS builds

2022-09-28 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610524#comment-17610524
 ] 

Jacob Wujciak-Jens commented on ARROW-17872:


I have set up a test job with debug output to see what exactly is taking so 
long: 
https://github.com/assignUser/test-repo-a/actions/runs/3142905685/jobs/5107502078#step:4:392

If you turn on timestamps you can see that what takes the time is extracting 
the archives (e.g. llvm ~1.5G), not downloading them, so caching the 
{{brew --cache}} directory would not save significant time. As the cache is also 
tar'd, extracting the cache might be the new bottleneck.
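
To check locally how much time extraction alone costs, the unpacking step can be timed in isolation. This is a minimal sketch using only the Python standard library; the archive is a small synthetic stand-in created on the fly, not a Homebrew bottle, and the helper names are made up for this example.

```python
import os
import tarfile
import tempfile
import time


def make_archive(directory: str, n_files: int = 50, size: int = 1024) -> str:
    """Create a small synthetic .tar.gz (stand-in for a bottle); return its path."""
    src = os.path.join(directory, "src")
    os.makedirs(src)
    for i in range(n_files):
        with open(os.path.join(src, f"file_{i}.bin"), "wb") as f:
            f.write(os.urandom(size))
    archive = os.path.join(directory, "bottle.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="src")
    return archive


def time_extraction(archive: str, dest: str) -> float:
    """Extract a gzipped tarball into dest and return the elapsed seconds."""
    start = time.perf_counter()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    archive = make_archive(tmp)
    out = os.path.join(tmp, "out")
    elapsed = time_extraction(archive, out)
    extracted = sorted(os.listdir(os.path.join(out, "src")))
    print(f"extracted {len(extracted)} files in {elapsed:.4f}s")
```

Pointing time_extraction at a real multi-gigabyte archive gives a rough single-threaded baseline to compare against the download time shown in the job log.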

> [CI] Cache dependencies on macOS builds
> ---
>
> Key: ARROW-17872
> URL: https://issues.apache.org/jira/browse/ARROW-17872
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration, GLib, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> Our macOS CI builds on Github Actions usually take at least 10 minutes 
> installing dependencies from Homebrew (because of compiling from source?). It 
> would be nice to cache those, especially as they probably don't change often.




[jira] [Created] (ARROW-17854) [CI][Developer] Host preview docs on S3

2022-09-27 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17854:
--

 Summary: [CI][Developer] Host preview docs on S3
 Key: ARROW-17854
 URL: https://issues.apache.org/jira/browse/ARROW-17854
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Developer Tools
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


Hosting on GitHub Pages as implemented in [ARROW-12958] is unsustainable due to 
the size of the arrow docs (~200 MB).




[jira] [Assigned] (ARROW-17477) [CI][Docs] Document Docs PR Preview

2022-09-27 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17477:
--

Assignee: (was: Jacob Wujciak-Jens)

> [CI][Docs] Document Docs PR Preview
> ---
>
> Key: ARROW-17477
> URL: https://issues.apache.org/jira/browse/ARROW-17477
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> Document the changes from [ARROW-12958] here: 
> https://arrow.apache.org/docs/developers/documentation.html




[jira] [Updated] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-26 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17849:
---
Description: 
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary (see [ARROW-17594]).

This should be documented and noted in the release notes.


  was:
With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary.

This should be documented and noted in the release notes.



> [R][Docs] Document changes due to C++17 for centos-7 users
> --
>
> Key: ARROW-17849
> URL: https://issues.apache.org/jira/browse/ARROW-17849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> With the switch to C++17 centos 7 users need to install and enable devtoolset 
> (and possibly change makevars) to be able to compile the R package, even when 
> using the libarrow binary (see [ARROW-17594]).
> This should be documented and noted in the release notes.




[jira] [Created] (ARROW-17849) [R][Docs] Document changes due to C++17 for centos-7 users

2022-09-26 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17849:
--

 Summary: [R][Docs] Document changes due to C++17 for centos-7 users
 Key: ARROW-17849
 URL: https://issues.apache.org/jira/browse/ARROW-17849
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


With the switch to C++17 centos 7 users need to install and enable devtoolset 
(and possibly change makevars) to be able to compile the R package, even when 
using the libarrow binary.

This should be documented and noted in the release notes.





[jira] [Resolved] (ARROW-17791) [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket

2022-09-23 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens resolved ARROW-17791.

Fix Version/s: 10.0.0
   Resolution: Fixed

> [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket
> 
>
> Key: ARROW-17791
> URL: https://issues.apache.org/jira/browse/ARROW-17791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Raúl Cumplido
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: Nightly
> Fix For: 10.0.0
>
>
> The following nightly failures:
>  * 
> [test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5007812721]
>  * 
> [test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3094412849/jobs/5007760110]
>  * 
> [test-conda-python-3.7-pandas-0.24|https://github.com/ursacomputing/crossbow/actions/runs/3094422644/jobs/5007779545]
>  * 
> [test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094419759/jobs/5007773935]
>  * 
> [test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/309904/jobs/5007827002]
>  * 
> [test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094405494/jobs/5007746062]
>  * 
> [test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3094407475/jobs/5007750212]
>  * 
> [test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3094450745/jobs/5007839959]
>  * 
> [test-conda-python-3.9-pandas-master|https://github.com/ursacomputing/crossbow/actions/runs/3094401032/jobs/5007736715]
>  * 
> [test-debian-11-python-3|https://github.com/ursacomputing/crossbow/runs/8465194776]
> Failed Python test_s3_real_aws_region_selection with ACCESS_DENIED:
> {code:java}
> === FAILURES ===
> __ test_s3_real_aws_region_selection ___
>     @pytest.mark.s3
>     def test_s3_real_aws_region_selection():
>         # Taken from a registry of open S3-hosted datasets
>         # at https://github.com/awslabs/open-data-registry
>         fs, path = FileSystem.from_uri('s3://mf-nwp-models/README.txt')
>         assert fs.region == 'eu-west-1'
> >       with fs.open_input_stream(path) as f:
> opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_fs.py:1660:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/_fs.pyx:805: in pyarrow._fs.FileSystem.open_input_stream
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   ???
> E   OSError: When reading information for key 'README.txt' in bucket 'mf-nwp-models': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
> pyarrow/error.pxi:115: OSError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (ARROW-17791) [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket

2022-09-23 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17791:
--

Assignee: Jacob Wujciak-Jens

> [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket
> 
>
> Key: ARROW-17791
> URL: https://issues.apache.org/jira/browse/ARROW-17791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Raúl Cumplido
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: Nightly
>
> The following nightly failures:
>  * [test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5007812721]
>  * [test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3094412849/jobs/5007760110]
>  * [test-conda-python-3.7-pandas-0.24|https://github.com/ursacomputing/crossbow/actions/runs/3094422644/jobs/5007779545]
>  * [test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094419759/jobs/5007773935]
>  * [test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/309904/jobs/5007827002]
>  * [test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094405494/jobs/5007746062]
>  * [test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3094407475/jobs/5007750212]
>  * [test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3094450745/jobs/5007839959]
>  * [test-conda-python-3.9-pandas-master|https://github.com/ursacomputing/crossbow/actions/runs/3094401032/jobs/5007736715]
>  * [test-debian-11-python-3|https://github.com/ursacomputing/crossbow/runs/8465194776]
> Failed Python test_s3_real_aws_region_selection with ACCESS_DENIED:
> {code:java}
> === FAILURES ===
> ___ test_s3_real_aws_region_selection ___
>
>     @pytest.mark.s3
>     def test_s3_real_aws_region_selection():
>         # Taken from a registry of open S3-hosted datasets
>         # at https://github.com/awslabs/open-data-registry
>         fs, path = FileSystem.from_uri('s3://mf-nwp-models/README.txt')
>         assert fs.region == 'eu-west-1'
> >       with fs.open_input_stream(path) as f:
>
> opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_fs.py:1660:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/_fs.pyx:805: in pyarrow._fs.FileSystem.open_input_stream
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >   ???
> E   OSError: When reading information for key 'README.txt' in bucket 'mf-nwp-models': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
>
> pyarrow/error.pxi:115: OSError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17791) [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket

2022-09-23 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608761#comment-17608761
 ] 

Jacob Wujciak-Jens commented on ARROW-17791:


Renaming the environment variables is not an option, because sccache would 
then no longer detect them. But we found the issue: the sccache user needs 
explicit permission to access any bucket. We have now added this permission 
and will add any other buckets that need to be accessed in jobs that use 
sccache.

Successful run here: 
https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5047216106
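
As a rough sketch of what such an explicit permission could look like, 
assuming access is granted through an IAM policy attached to the sccache user: 
the helper name and the choice of actions below are illustrative assumptions, 
not the actual CI configuration. HeadObject requests (the operation failing in 
the test) are authorized by the s3:GetObject permission.

```python
import json


def read_only_bucket_policy(buckets):
    """Build an IAM policy document granting object reads on the given buckets.

    Hypothetical helper for illustration; not the policy used by Arrow CI.
    """
    statements = []
    for bucket in buckets:
        statements.append(
            {
                "Effect": "Allow",
                # HeadObject and GetObject are both covered by s3:GetObject.
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # The bucket name is the one from the failing test.
    print(json.dumps(read_only_bucket_policy(["mf-nwp-models"]), indent=2))
```

Any further bucket read by a job that uses sccache would be added to the list 
passed to the helper.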

> [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket
> 
>
> Key: ARROW-17791
> URL: https://issues.apache.org/jira/browse/ARROW-17791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Raúl Cumplido
>Priority: Critical
>  Labels: Nightly
>
> The following nightly failures:
>  * [test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5007812721]
>  * [test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3094412849/jobs/5007760110]
>  * [test-conda-python-3.7-pandas-0.24|https://github.com/ursacomputing/crossbow/actions/runs/3094422644/jobs/5007779545]
>  * [test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094419759/jobs/5007773935]
>  * [test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/309904/jobs/5007827002]
>  * [test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094405494/jobs/5007746062]
>  * [test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3094407475/jobs/5007750212]
>  * [test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3094450745/jobs/5007839959]
>  * [test-conda-python-3.9-pandas-master|https://github.com/ursacomputing/crossbow/actions/runs/3094401032/jobs/5007736715]
>  * [test-debian-11-python-3|https://github.com/ursacomputing/crossbow/runs/8465194776]
> Failed Python test_s3_real_aws_region_selection with ACCESS_DENIED:
> {code:java}
> === FAILURES ===
> ___ test_s3_real_aws_region_selection ___
>
>     @pytest.mark.s3
>     def test_s3_real_aws_region_selection():
>         # Taken from a registry of open S3-hosted datasets
>         # at https://github.com/awslabs/open-data-registry
>         fs, path = FileSystem.from_uri('s3://mf-nwp-models/README.txt')
>         assert fs.region == 'eu-west-1'
> >       with fs.open_input_stream(path) as f:
>
> opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_fs.py:1660:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/_fs.pyx:805: in pyarrow._fs.FileSystem.open_input_stream
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >   ???
> E   OSError: When reading information for key 'README.txt' in bucket 'mf-nwp-models': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
>
> pyarrow/error.pxi:115: OSError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-17795) [C++][R] Using ARROW_ZSTD_USE_SHARED fails

2022-09-21 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17795:
--

 Summary: [C++][R] Using ARROW_ZSTD_USE_SHARED fails
 Key: ARROW-17795
 URL: https://issues.apache.org/jira/browse/ARROW-17795
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


See zulip discussion 
[here|https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/zstd.20cmake.20changes]

Changes to the find zstd module cause a failure when ARROW_ZSTD_USE_SHARED is 
used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (ARROW-17795) [C++][R] Using ARROW_ZSTD_USE_SHARED fails

2022-09-21 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17795:
---
Issue Type: Bug  (was: Improvement)

> [C++][R] Using ARROW_ZSTD_USE_SHARED fails
> --
>
> Key: ARROW-17795
> URL: https://issues.apache.org/jira/browse/ARROW-17795
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, R
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> See zulip discussion 
> [here|https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/zstd.20cmake.20changes]
> Changes to the find zstd module cause a failure when ARROW_ZSTD_USE_SHARED is 
> used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17353) [Release] R libarrow binaries have the wrong version number

2022-09-21 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607673#comment-17607673
 ] 

Jacob Wujciak-Jens commented on ARROW-17353:


Yes, all of the files should be named with only the release version and not 
with the added date component, as you said. While it might be possible to 
change the r-binary-packages job to use a different version (it is possible 
via a param, but this was intended for R-only patch releases and such), I 
think it is safer to integrate the checking/renaming into the actual Ruby 
release script.

> [Release] R libarrow binaries have the wrong version number
> ---
>
> Key: ARROW-17353
> URL: https://issues.apache.org/jira/browse/ARROW-17353
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 9.0.0
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> The libarrow binaries that are uploaded during the release process have the 
> wrong version number. This is an issue with the submit binaries 
> script/r-binary-packages job. The arrow version should be picked up by the 
> job even if not passed explicitly as a custom param.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-17782) [C++][R] R package not building on macos 10.13 with C++17 std lib

2022-09-20 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17782:
--

 Summary: [C++][R] R package not building on macos 10.13 with C++17 
std lib
 Key: ARROW-17782
 URL: https://issues.apache.org/jira/browse/ARROW-17782
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


The R package also needs `-D_LIBCPP_DISABLE_AVAILABILITY` in order to compile 
on macOS 10.13.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17353) [Release] R libarrow binaries have the wrong version number

2022-09-20 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607232#comment-17607232
 ] 

Jacob Wujciak-Jens commented on ARROW-17353:


[~kou] this needs to be changed in the Ruby scripts, right?

> [Release] R libarrow binaries have the wrong version number
> ---
>
> Key: ARROW-17353
> URL: https://issues.apache.org/jira/browse/ARROW-17353
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 9.0.0
>Reporter: Jacob Wujciak-Jens
>Priority: Blocker
> Fix For: 10.0.0
>
>
> The libarrow binaries that are uploaded during the release process have the 
> wrong version number. This is an issue with the submit binaries 
> script/r-binary-packages job. The arrow version should be picked up by the 
> job even if not passed explicitly as a custom param.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (ARROW-16190) [CI][R] Implement CI on Apple M1 for R

2022-09-14 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16190:
--

Assignee: Jacob Wujciak-Jens

> [CI][R] Implement CI on Apple M1 for R
> --
>
> Key: ARROW-16190
> URL: https://issues.apache.org/jira/browse/ARROW-16190
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (ARROW-17594) [R][Packaging] Build binaries with devtoolset 8 on CentOS 7

2022-09-14 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17594:
--

Assignee: Jacob Wujciak-Jens

> [R][Packaging] Build binaries with devtoolset 8 on CentOS 7
> ---
>
> Key: ARROW-17594
> URL: https://issues.apache.org/jira/browse/ARROW-17594
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging, R
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> ARROW-17545 switches Arrow C++ to require C++17, which does not compile with 
> the CentOS 7 system compiler. The corresponding build was disabled and needs 
> to be reenabled with the devtoolset activated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-17726) [CI] Enable sccache on more builds

2022-09-14 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17726:
--

 Summary: [CI] Enable sccache on more builds
 Key: ARROW-17726
 URL: https://issues.apache.org/jira/browse/ARROW-17726
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


This is a follow-up to [ARROW-17021]. Enabling sccache should be as easy as 
adding the install script to the relevant docker image and the sccache env to 
the docker-compose service.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (ARROW-16605) [CI][R] Fix revdep docker job

2022-09-12 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-16605:
---
Description: 
The revdep Crossbow job is currently not functioning correctly. This led to 
changed behaviour affecting a revdep with the 8.0.0 release, requiring a patch 
after initial submission.
cc: [~jonkeane]

Due to the time and performance constraints on GHA it does not make sense to 
have a crossbow job for this. A dockeR job to be able to cleanly run this 
locally does make sense though, so I renamed the ticket.

  was:
The revdep Crossbow job is currently not functioning correctly. This led to 
changed behaviour affecting a revdep with the 8.0.0 release, requiring a patch 
after initial submission.
cc: [~jonkeane]

Due to the time and performance constraints on GHA it does not make sense to 
have a crossbow job for this. A docke job to be able to cleanly run this 
locally does make sense though, so I renamed the ticket.


> [CI][R] Fix revdep docker job
> -
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]
> Due to the time and performance constraints on GHA it does not make sense to 
> have a crossbow job for this. A dockeR job to be able to cleanly run this 
> locally does make sense though, so I renamed the ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (ARROW-16605) [CI][R] Fix revdep docker job

2022-09-12 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-16605:
---
Description: 
The revdep Crossbow job is currently not functioning correctly. This led to 
changed behaviour affecting a revdep with the 8.0.0 release, requiring a patch 
after initial submission.
cc: [~jonkeane]

Due to the time and performance constraints on GHA it does not make sense to 
have a crossbow job for this. A docke job to be able to cleanly run this 
locally does make sense though, so I renamed the ticket.

  was:
The revdep Crossbow job is currently not functioning correctly. This led to 
changed behaviour affecting a revdep with the 8.0.0 release, requiring a patch 
after initial submission.
cc: [~jonkeane]


> [CI][R] Fix revdep docker job
> -
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]
> Due to the time and performance constraints on GHA it does not make sense to 
> have a crossbow job for this. A docke job to be able to cleanly run this 
> locally does make sense though, so I renamed the ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (ARROW-16605) [CI][R] Fix revdep docker job

2022-09-12 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-16605:
---
Summary: [CI][R] Fix revdep docker job  (was: [CI][R] Fix revdep Crossbow 
job)

> [CI][R] Fix revdep docker job
> -
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (ARROW-17507) [Dev][CI][R] GHA "autotune" doesn't work

2022-09-07 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17507:
--

Assignee: Jacob Wujciak-Jens

> [Dev][CI][R] GHA "autotune" doesn't work
> 
>
> Key: ARROW-17507
> URL: https://issues.apache.org/jira/browse/ARROW-17507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Developer Tools, R
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The {{@github-actions autotune}} command fails on the R step:
> https://github.com/apache/arrow/runs/7981567559?check_suite_focus=true#step:10:247
> {code}
> Error: Error: Failed to install 'roxygen2' from GitHub:
>   cannot open URL 'https://api.github.com/repos/r-lib/roxygen2/commits/HEAD'
> Execution halted
> Error: Process completed with exit code 1.
> {code}
> (sidenote: it's annoying that it doesn't at least commit the other changes 
> made, as that PR has no R changes but C++)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17507) [Dev][CI][R] GHA "autotune" doesn't work

2022-09-07 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601217#comment-17601217
 ] 

Jacob Wujciak-Jens commented on ARROW-17507:


The R section should not have triggered on that as no R files were changed...

> [Dev][CI][R] GHA "autotune" doesn't work
> 
>
> Key: ARROW-17507
> URL: https://issues.apache.org/jira/browse/ARROW-17507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Developer Tools, R
>Reporter: Antoine Pitrou
>Priority: Major
>
> The {{@github-actions autotune}} command fails on the R step:
> https://github.com/apache/arrow/runs/7981567559?check_suite_focus=true#step:10:247
> {code}
> Error: Error: Failed to install 'roxygen2' from GitHub:
>   cannot open URL 'https://api.github.com/repos/r-lib/roxygen2/commits/HEAD'
> Execution halted
> Error: Process completed with exit code 1.
> {code}
> (sidenote: it's annoying that it doesn't at least commit the other changes 
> made, as that PR has no R changes but C++)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (ARROW-17507) [Dev][CI][R] GHA "autotune" doesn't work

2022-09-07 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601210#comment-17601210
 ] 

Jacob Wujciak-Jens edited comment on ARROW-17507 at 9/7/22 9:25 AM:


This is likely caused by the GitHub API rate limit, as it uses the built-in 
PAT from {remotes} ('Using bundled GitHub PAT.').


was (Author: JIRAUSER287549):
This is likely caused by the GitHub API rate limit, as it uses the built-in 
PAT from {pak} ('Using bundled GitHub PAT.').
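
One way to check the rate-limit theory would be to look at the headers of a 
failing GitHub API response. The sketch below is only an illustration and the 
function name is made up; it assumes the response carries GitHub's standard 
`x-ratelimit-remaining` and `x-ratelimit-reset` headers and does not make any 
network request itself.

```python
from datetime import datetime, timezone


def describe_rate_limit(status, headers):
    """Summarize GitHub rate-limit headers from an API response.

    `status` is the HTTP status code, `headers` a mapping of response headers.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", -1))
    reset = lowered.get("x-ratelimit-reset")

    if status in (403, 429) and remaining == 0:
        if reset is None:
            return "rate limited"
        # The reset header is a UTC epoch timestamp in seconds.
        when = datetime.fromtimestamp(int(reset), tz=timezone.utc)
        return f"rate limited until {when.isoformat()}"
    if remaining < 0:
        return "no rate-limit headers"
    return f"{remaining} requests remaining"
```

If the remaining count is zero on the failing request, a token with its own 
quota instead of the bundled PAT should avoid the failure.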

> [Dev][CI][R] GHA "autotune" doesn't work
> 
>
> Key: ARROW-17507
> URL: https://issues.apache.org/jira/browse/ARROW-17507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Developer Tools, R
>Reporter: Antoine Pitrou
>Priority: Major
>
> The {{@github-actions autotune}} command fails on the R step:
> https://github.com/apache/arrow/runs/7981567559?check_suite_focus=true#step:10:247
> {code}
> Error: Error: Failed to install 'roxygen2' from GitHub:
>   cannot open URL 'https://api.github.com/repos/r-lib/roxygen2/commits/HEAD'
> Execution halted
> Error: Process completed with exit code 1.
> {code}
> (sidenote: it's annoying that it doesn't at least commit the other changes 
> made, as that PR has no R changes but C++)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17507) [Dev][CI][R] GHA "autotune" doesn't work

2022-09-07 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601210#comment-17601210
 ] 

Jacob Wujciak-Jens commented on ARROW-17507:


This is likely caused by the GitHub API rate limit, as it uses the built-in 
PAT from {pak} ('Using bundled GitHub PAT.').

> [Dev][CI][R] GHA "autotune" doesn't work
> 
>
> Key: ARROW-17507
> URL: https://issues.apache.org/jira/browse/ARROW-17507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Developer Tools, R
>Reporter: Antoine Pitrou
>Priority: Major
>
> The {{@github-actions autotune}} command fails on the R step:
> https://github.com/apache/arrow/runs/7981567559?check_suite_focus=true#step:10:247
> {code}
> Error: Error: Failed to install 'roxygen2' from GitHub:
>   cannot open URL 'https://api.github.com/repos/r-lib/roxygen2/commits/HEAD'
> Execution halted
> Error: Process completed with exit code 1.
> {code}
> (sidenote: it's annoying that it doesn't at least commit the other changes 
> made, as that PR has no R changes but C++)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17628) [CI][Packaging][Java] Publish latest nightly with SNAPSHOT version

2022-09-06 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600727#comment-17600727
 ] 

Jacob Wujciak-Jens commented on ARROW-17628:


+1

> [CI][Packaging][Java] Publish latest nightly with SNAPSHOT version
> --
>
> Key: ARROW-17628
> URL: https://issues.apache.org/jira/browse/ARROW-17628
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Packaging
>Reporter: Raúl Cumplido
>Priority: Major
>
> I am trying to build the arrow-cookbooks for Java with the latest published 
> version of the nightlies. Currently, in order to use the latest nightly we 
> have to specify its exact version, e.g. 10.0.0.dev234.
> In order to find out which version is the latest one published, we have to 
> parse the HTML on 
> [https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-c-data/] (or 
> other packages) and find out which one is the latest, as discussed in the 
> documentation:
> [https://github.com/apache/arrow/blob/master/docs/source/developers/java/building.rst#installing-from-apache-nightlies]
> I propose we publish the latest nightly both with its unique version (e.g. 
> 10.0.0.dev234) and as 10.0.0-SNAPSHOT to make it easier for upstream 
> automation.
> Another proposal would be to add a VERSIONS metadata file with the latest 
> published version, but from some investigation of other Java projects, 
> publishing nightlies as SNAPSHOT seems to be the most common approach.
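
To illustrate the workaround the description refers to, here is a minimal 
sketch of picking the latest nightly from a list of version strings. It 
assumes the versions have already been extracted from the HTML listing and 
follow the `X.Y.Z.devN` pattern shown above; the function name is hypothetical. 
The numeric comparison matters because a plain string sort would rank `dev99` 
above `dev234`.

```python
import re

# Matches versions such as 10.0.0.dev234 and captures the numeric parts.
DEV_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.dev(\d+)$")


def latest_nightly(versions):
    """Return the newest `X.Y.Z.devN` string, ignoring anything else."""
    parsed = []
    for version in versions:
        match = DEV_VERSION.match(version)
        if match:
            # Compare numerically: (major, minor, patch, dev number).
            key = tuple(int(part) for part in match.groups())
            parsed.append((key, version))
    if not parsed:
        raise ValueError("no nightly versions found")
    return max(parsed)[1]
```

With a fixed 10.0.0-SNAPSHOT alias, as proposed, downstream automation would 
not need this kind of parsing at all.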



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-17621) [CI] Audit workflows

2022-09-05 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17621:
--

 Summary: [CI] Audit workflows
 Key: ARROW-17621
 URL: https://issues.apache.org/jira/browse/ARROW-17621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


Set minimal permissions for the token, check for outdated actions, pin SHAs, 
etc.
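
As a small illustration of the "pin SHAs" part of the audit, the sketch below 
lists `uses:` references in a workflow file that are not pinned to a full 
40-character commit SHA. The function name and the regular expressions are 
assumptions made for this example; it works on the raw text rather than 
parsing YAML, so quoted or otherwise unusual `uses:` values are not handled.

```python
import re

# Matches `uses: owner/repo@ref` (optionally preceded by a list dash).
USES = re.compile(r"^[ \t]*-?[ \t]*uses:[ \t]*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full commit SHA."""
    return [
        (action, ref)
        for action, ref in USES.findall(workflow_text)
        if not FULL_SHA.match(ref)
    ]
```

Running it over every file in `.github/workflows` would give a list of 
references to review; tags such as `v3` show up, full SHAs do not.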



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-16385) [R] [CI] Clean up our snappy-sanitizer skipping behavior

2022-09-05 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600368#comment-17600368
 ] 

Jacob Wujciak-Jens commented on ARROW-16385:


The patch was finally merged, but sadly a release had just happened, so we 
will have to keep this behavior for a while longer.

> [R] [CI] Clean up our snappy-sanitizer skipping behavior
> 
>
> Key: ARROW-16385
> URL: https://issues.apache.org/jira/browse/ARROW-16385
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> We have a number of locations where we skip parquet tests now that snappy is 
> built by default + we use it by default when it is built.
> One recent example of needing to do this is 
> https://github.com/apache/arrow/pull/13014
> However, skipping tests like this is a little bit of misdirection, since we 
> aren't really skipping these because snappy is not available, as the helper 
> suggests; we are just using that helper to _also_ skip when we know we are in 
> a sanitizer environment.
> The ultimate answer to this, of course, is to upstream the change 
> https://github.com/google/snappy/pull/148 though that has been sitting open 
> for a few months.
> In the meantime, what if we took out these skips and instead used 
> uncompressed parquet for reading and writing in some builds? This way we 
> could make sure that snappy was not running during sanitizer tests, but still 
> have test coverage for these code paths in other runs where we don't need to 
> worry about this sanitizer error in snappy.
> https://github.com/apache/arrow/pull/13014#discussion_r859970907 proposed one 
> way to do this in this one case, but we should do it more generally for the 
> other skips that we have had to add.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (ARROW-16605) [CI][R] Fix revdep Crossbow job

2022-09-05 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600363#comment-17600363
 ] 

Jacob Wujciak-Jens edited comment on ARROW-16605 at 9/5/22 12:16 PM:
-

The difficulty with this job is that it takes a long time on the low-powered 
dual-core GitHub runners. Even with our few revdeps it takes more than 6 hours, 
which is the hard limit for a GHA step (not sure if this also applies to 
self-hosted runners). So we will need to split it up into multiple steps and 
modify the revdepcheck queue, etc. Or we just run it manually prior to release, 
which of course has the potential to be overlooked (as has happened before)...


was (Author: JIRAUSER287549):
The difficulty with this job is that it takes a long time on the low-powered 
dual-core GitHub runners. Even with our few revdeps it takes more than 6 hours, 
which is the hard limit for a GHA step. So we will need to split it up into 
multiple steps and modify the revdepcheck queue, etc. Or we just run it 
manually prior to release, which of course has the potential to be overlooked 
(as has happened before)...

> [CI][R] Fix revdep Crossbow job
> ---
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-16605) [CI][R] Fix revdep Crossbow job

2022-09-05 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600363#comment-17600363
 ] 

Jacob Wujciak-Jens commented on ARROW-16605:


The difficulty with this job is that it takes a long time on the low-powered 
dual-core GitHub runners. Even with our few revdeps it takes more than 6 hours, 
which is the hard limit for a GHA step. So we will need to split it up into 
multiple steps and modify the revdepcheck queue, etc. Or we just run it 
manually prior to release, which of course has the potential to be overlooked 
(as has happened before)...
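
A rough sketch of the splitting idea, assuming the list of reverse 
dependencies is known up front and each batch would be checked in its own 
step; the function name is hypothetical and nothing here calls revdepcheck 
itself.

```python
def split_into_batches(packages, batch_count):
    """Distribute packages round-robin over at most `batch_count` batches."""
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    batches = [[] for _ in range(batch_count)]
    # Sort first so the assignment is stable from run to run.
    for index, package in enumerate(sorted(packages)):
        batches[index % batch_count].append(package)
    # Drop empty batches when there are fewer packages than batches.
    return [batch for batch in batches if batch]
```

Round-robin keeps the batches similar in size, though not necessarily in run 
time, since individual revdeps can differ a lot in how long they take to check.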

> [CI][R] Fix revdep Crossbow job
> ---
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (ARROW-17616) [CI][Java] Java nightly upload job fails after introduction of pruning

2022-09-05 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17616:
--

 Summary: [CI][Java] Java nightly upload job fails after 
introduction of pruning
 Key: ARROW-17616
 URL: https://issues.apache.org/jira/browse/ARROW-17616
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Java
Reporter: Jacob Wujciak-Jens


The nightly java upload job has been failing ever since [ARROW-17293].
https://github.com/apache/arrow/actions/workflows/java_nightly.yml

It looks like the "Build Repository" step clashes with the synced repo?




[jira] [Assigned] (ARROW-17477) [CI][Docs] Document Docs PR Preview

2022-09-05 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17477:
--

Assignee: Jacob Wujciak-Jens

> [CI][Docs] Document Docs PR Preview
> ---
>
> Key: ARROW-17477
> URL: https://issues.apache.org/jira/browse/ARROW-17477
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Documentation
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> Document the changes from [ARROW-12958] here: 
> https://arrow.apache.org/docs/developers/documentation.html




[jira] [Commented] (ARROW-17477) [CI][Docs] Document Docs PR Preview

2022-08-20 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582205#comment-17582205
 ] 

Jacob Wujciak-Jens commented on ARROW-17477:


We could add that to archery. But it should not be a special case just for 
this; it should be a more generalized "post submit message".

> [CI][Docs] Document Docs PR Preview
> ---
>
> Key: ARROW-17477
> URL: https://issues.apache.org/jira/browse/ARROW-17477
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> Document the changes from [ARROW-12958] here: 
> https://arrow.apache.org/docs/developers/documentation.html




[jira] [Created] (ARROW-17477) [CI][Docs] Document Docs PR Preview

2022-08-19 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17477:
--

 Summary: [CI][Docs] Document Docs PR Preview
 Key: ARROW-17477
 URL: https://issues.apache.org/jira/browse/ARROW-17477
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Documentation
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


Document the changes from [ARROW-12958] here: 
https://arrow.apache.org/docs/developers/documentation.html




[jira] [Commented] (ARROW-15481) [R] [CI] Add a crossbow job that mimics CRAN's old macOS

2022-08-19 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581831#comment-17581831
 ] 

Jacob Wujciak-Jens commented on ARROW-15481:


Technically this has been done with: https://github.com/apache/arrow/pull/13769
but I want to add the 10.13 runners to the r-binary-packages job.

> [R] [CI] Add a crossbow job that mimics CRAN's old macOS
> 
>
> Key: ARROW-15481
> URL: https://issues.apache.org/jira/browse/ARROW-15481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>
> Jeroen's autobrew does this using travis:
> https://github.com/autobrew/homebrew-core/blob/high-sierra/.travis.yml
> It would be good to test this on our own before the release process




[jira] [Commented] (ARROW-16464) [C++][CI][GPU] Add CUDA CI

2022-08-18 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581372#comment-17581372
 ] 

Jacob Wujciak-Jens commented on ARROW-16464:


[~apitrou] Yes, this is planned for 10.0.0 (or ideally prior to the actual 
release phase).

> [C++][CI][GPU] Add CUDA CI
> --
>
> Key: ARROW-16464
> URL: https://issues.apache.org/jira/browse/ARROW-16464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, GPU
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> Arrow C++, PyArrow and perhaps other bindings have CUDA support, but none is 
> currently tested on CI, and I think few of the contributors enable CUDA on 
> their local builds.
> We should definitely exercise CUDA support, at least in the nightly builds 
> where we may have more flexibility to use custom machines.




[jira] [Assigned] (ARROW-12958) [CI][Developer] Build + host the docs for PR branches

2022-08-18 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-12958:
--

Assignee: Jacob Wujciak-Jens

> [CI][Developer] Build + host the docs for PR branches
> -
>
> Key: ARROW-12958
> URL: https://issues.apache.org/jira/browse/ARROW-12958
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Developer Tools, Documentation
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Major
> Fix For: 10.0.0
>
>
> We already run the docs building with crossbow, could we host the rendered 
> docs somewhere so that we can see what they look like during the PR process?
> ARROW-1299 is a ticket for nightly docs updates for what's in master.




[jira] [Commented] (ARROW-12958) [CI][Developer] Build + host the docs for PR branches

2022-08-17 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580900#comment-17580900
 ] 

Jacob Wujciak-Jens commented on ARROW-12958:


The gh_pages approach would be quite easy, but the problem is that the normal 
{{pull_request}} trigger does not have secret access or write access to the 
repo. This is intentional and an important security measure, as we are executing 
untrusted, potentially malicious code. Running the code in Docker as we do is 
a mitigation, but I would still argue against changing this. There are 
solutions; I will test them on my fork and report back/open a PR.

> [CI][Developer] Build + host the docs for PR branches
> -
>
> Key: ARROW-12958
> URL: https://issues.apache.org/jira/browse/ARROW-12958
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Developer Tools, Documentation
>Reporter: Jonathan Keane
>Priority: Major
> Fix For: 10.0.0
>
>
> We already run the docs building with crossbow, could we host the rendered 
> docs somewhere so that we can see what they look like during the PR process?
> ARROW-1299 is a ticket for nightly docs updates for what's in master.




[jira] [Commented] (ARROW-17444) [R] Windows Only: Cannot delete file previously accesed with open_dataset

2022-08-17 Thread Jacob Wujciak-Jens (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-17444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580712#comment-17580712
 ] 

Jacob Wujciak-Jens commented on ARROW-17444:


I can reproduce it. The issue seems to be caused specifically by collect. If 
you don't collect, the error does not happen.

> [R] Windows Only: Cannot delete file previously accesed with open_dataset
> -
>
> Key: ARROW-17444
> URL: https://issues.apache.org/jira/browse/ARROW-17444
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 8.0.0, 9.0.0, 8.0.1
> Environment: Windows 10
> R 4.2.1
> RStudio 22.07.1
> Arrow 9.0 (fails on arrow 8 as well)
>Reporter: Riaz Arbi
>Priority: Major
>
> Hello,
> I encountered this issue because it breaks my tests when I run
> {code:java}
> rhub::check_for_cran(){code}
> Because of this, I know it only affects Windows, all other OS checks pass.
>  
> If you write files to a directory using arrow's 
> {code:java}
> write_*{code}
>  functions, and then 
> {code:java}
> collect(open_dataset(directory)){code}
>  
>  you cannot delete a file in the directory, you get an error. This is best 
> demonstrated in a reprex:
>  
> {code:java}
> # setup ---
> library(arrow)
> library(dplyr) # provides %>% and the collect() generic
> local_prefix <- tempfile()
> df <- data.frame(a = 1:5, b = letters[1:5])
> # works fine ---
> fs <- LocalFileSystem$create()
> fs$CreateDir(local_prefix)
> fsdir <- fs$cd(local_prefix)
> write_parquet(df, fsdir$path("1.parquet"))
> #open_dataset(local_prefix) %>% collect()
> fsdir$DeleteFile("1.parquet")
> unlink(local_prefix, recursive = TRUE)
> # doesn't work ---
> fs <- LocalFileSystem$create()
> fs$CreateDir(local_prefix)
> fsdir <- fs$cd(local_prefix)
> write_parquet(df, fsdir$path("1.parquet"))
> open_dataset(local_prefix) %>% collect() # <-- ERROR IS CAUSED BY THIS
> fsdir$DeleteFile("1.parquet") # <-- HERE IS WHERE YOU GET AN ERROR
> unlink(local_prefix, recursive = TRUE)
>  
>  
> {code}
>  
> Here is the error I keep getting:
>  
> {code:java}
> Error: IOError: Cannot delete file 
> 'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'. 
> Detail: [Windows error 32] The process cannot access the file because it is 
> being used by another process.
> {code}
>  
> Note that
>  * I do not create an object from the `open_dataset` function. I simply call 
> it.
>  * I also call `collect` in order to pull the data. So I cannot see why the 
> connection to the file should exist after collect is called
>  * as mentioned above, all other OSes don't exhibit this behaviour.
>  * my environment pane looks identical in both instances.
>  * I do not need to restart R to delete the file. I can simply clear all 
> objects from the workspace (rm(list = ls())) and then it works fine.




[jira] [Updated] (ARROW-17422) [C++][CI] Linux builds are missing dependencies

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17422:
---
Summary: [C++][CI] Linux builds are missing dependencies  (was: [C++][CI] 
Travis builds are missing dependencies)

> [C++][CI] Linux builds are missing dependencies
> ---
>
> Key: ARROW-17422
> URL: https://issues.apache.org/jira/browse/ARROW-17422
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
> Fix For: 10.0.0
>
>
> [ARROW-17394] added new system dependencies that are missing in the Linux 
> builds. E.g. https://github.com/ursacomputing/crossbow/runs/7834744542 
> https://github.com/ursacomputing/crossbow/runs/7834709298




[jira] [Assigned] (ARROW-17423) [CI][C++] CUDA docker images fail building

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17423:
--

Assignee: Jacob Wujciak-Jens

> [CI][C++] CUDA docker images fail building
> --
>
> Key: ARROW-17423
> URL: https://issues.apache.org/jira/browse/ARROW-17423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> We have Ubuntu-based CUDA-enabled builds using Docker images that evidently 
> have not been exercised in a long time:
> {code}
> $ archery docker run ubuntu-cuda-cpp
> Pulling ubuntu-cuda-cpp ... done
> WARNING: Some service image(s) must be built from source by running:
> docker-compose build ubuntu-cuda-cpp
> Building ubuntu-cuda-cpp
> [+] Building 0.9s (4/4) FINISHED  
> 
>  => [internal] load build definition from ubuntu-20.04-cpp.dockerfile 
>0.0s
>  => => transferring dockerfile: 5.39kB
>0.0s
>  => [internal] load .dockerignore 
>0.0s
>  => => transferring context: 35B  
>0.0s
>  => ERROR [internal] load metadata for 
> docker.io/nvidia/cuda:9.1-devel-ubuntu20.04   
> 0.9s
>  => [auth] nvidia/cuda:pull token for registry-1.docker.io
>0.0s
> --
>  > [internal] load metadata for docker.io/nvidia/cuda:9.1-devel-ubuntu20.04:
> --
> failed to solve with frontend dockerfile.v0: failed to create LLB definition: 
> docker.io/nvidia/cuda:9.1-devel-ubuntu20.04: not found
> ERROR: Service 'ubuntu-cuda-cpp' failed to build : Build failed
> Error: `docker-compose --file /home/antoine/arrow/dev/docker-compose.yml 
> build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-cuda-cpp` exited with a 
> non-zero exit code 1, see the process log above.
> {code}
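
The log above fails because the base image tag cannot be found in the registry. 
A cheap pre-check along the following lines could surface that before a build 
is attempted. This is a sketch, not part of archery: {{cuda_image}} and 
{{image_exists}} are hypothetical helpers, the tag layout is copied from the 
failing reference in the log, and {{docker manifest inspect}} needs Docker and 
network access.

```shell
#!/bin/sh
# Sketch: check that a base image tag exists in the registry before building.
# The tag layout mirrors the failing reference in the log; helper names are
# hypothetical and not part of archery or docker-compose.

# Usage: cuda_image CUDA_VERSION UBUNTU_VERSION
cuda_image() {
    printf 'docker.io/nvidia/cuda:%s-devel-ubuntu%s\n' "$1" "$2"
}

# Usage: image_exists IMAGE
# Succeeds if the registry serves a manifest for IMAGE.
image_exists() {
    docker manifest inspect "$1" >/dev/null 2>&1
}

image="$(cuda_image 9.1 20.04)"
echo "would check: $image"

# Example guard (commented out so the sketch runs without Docker):
#   image_exists "$image" || echo "base image $image not found" >&2
```

Running the guard for each CUDA/Ubuntu combination in the compose file would 
flag a stale combination like the one in the log without starting a build.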




[jira] [Created] (ARROW-17422) [C++][CI] Travis builds are missing dependencies

2022-08-15 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17422:
--

 Summary: [C++][CI] Travis builds are missing dependencies
 Key: ARROW-17422
 URL: https://issues.apache.org/jira/browse/ARROW-17422
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


[ARROW-17394] added new system dependencies that are missing in the Linux 
builds. E.g. https://github.com/ursacomputing/crossbow/runs/7834744542 
https://github.com/ursacomputing/crossbow/runs/7834709298




[jira] [Updated] (ARROW-17421) [C++] CUDA on Windows fails to build

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-17421:
---
Summary: [C++] CUDA on Windows fails to build  (was: [C++] CUDA on Windows 
fails to buil)

> [C++] CUDA on Windows fails to build
> 
>
> Key: ARROW-17421
> URL: https://issues.apache.org/jira/browse/ARROW-17421
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 9.0.0
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
> Fix For: 10.0.0
>
>
> While working on the vcpkg port for 9.0.0 I noticed that Arrow with 
> ARROW_CUDA does not build on Windows due to an issue with unique_ptr in 
> CudaDevice.




[jira] [Created] (ARROW-17421) [C++] CUDA on Windows fails to buil

2022-08-15 Thread Jacob Wujciak-Jens (Jira)

Jacob Wujciak-Jens created ARROW-17421:
--

 Summary: [C++] CUDA on Windows fails to buil
 Key: ARROW-17421
 URL: https://issues.apache.org/jira/browse/ARROW-17421
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 9.0.0
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


While working on the vcpkg port for 9.0.0 I noticed that Arrow with ARROW_CUDA 
does not build on Windows due to an issue with unique_ptr in CudaDevice.




[jira] [Updated] (ARROW-15368) [C++] [Docs] Improve our SIMD documentation

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15368:
---
Description: 
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
on an AVX512 capable similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]

  was:
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
on an AVX512 capeable similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]


> [C++] [Docs] Improve our SIMD documentation
> ---
>
> Key: ARROW-15368
> URL: https://issues.apache.org/jira/browse/ARROW-15368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Jonathan Keane
>Priority: Major
>
> We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
> {{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).
> We should also document what the defaults are (and what that means for 
> performance and possible optimization if you're compiling and you know you'll 
> be on more/less modern hardware:
> e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
> amount of runtime dispatched simd code, and MAX there means that it will 
> compile everything it can. but at runtime it will use whatever is available. 
> so if you compile on a machine with AVX512 and run on a machine with AVX512, 
> you'll get any AVX512 runtime dispatched code that's available (probably not 
> much). There is more (esp. in the query engine) that is runtime AVX2.
> FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
> ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses
> Additionally we should document that valgrind does not support AVX512: 
> [https://bugs.kde.org/show_bug.cgi?id=383010] 
> And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run 
> valgrind on an AVX512 capable similar to what we do for our 
> [CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]




[jira] [Updated] (ARROW-15368) [C++] [Docs] Improve our SIMD documentation

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15368:
---
Description: 
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
on an AVX512 capable machine similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]

  was:
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
on an AVX512 capable similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]


> [C++] [Docs] Improve our SIMD documentation
> ---
>
> Key: ARROW-15368
> URL: https://issues.apache.org/jira/browse/ARROW-15368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Jonathan Keane
>Priority: Major
>
> We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
> {{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).
> We should also document what the defaults are (and what that means for 
> performance and possible optimization if you're compiling and you know you'll 
> be on more/less modern hardware:
> e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
> amount of runtime dispatched simd code, and MAX there means that it will 
> compile everything it can. but at runtime it will use whatever is available. 
> so if you compile on a machine with AVX512 and run on a machine with AVX512, 
> you'll get any AVX512 runtime dispatched code that's available (probably not 
> much). There is more (esp. in the query engine) that is runtime AVX2.
> FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
> ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses
> Additionally we should document that valgrind does not support AVX512: 
> [https://bugs.kde.org/show_bug.cgi?id=383010] 
> And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run 
> valgrind on an AVX512 capable machine similar to what we do for our 
> [CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]
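
As a concrete illustration of the valgrind advice in the description, the 
variable can be set for a single command rather than for the whole shell. This 
is a minimal sketch: {{ARROW_USER_SIMD_LEVEL}} and the value AVX2 come from the 
description, while {{with_simd_cap}} and the binary path in the trailing 
comment are hypothetical.

```shell
#!/bin/sh
# Sketch: cap Arrow's runtime SIMD dispatch for one command, e.g. a valgrind
# run on an AVX512-capable machine. The wrapper name is an illustrative
# assumption; only the variable name and the AVX2 value come from the ticket.

# Usage: with_simd_cap LEVEL COMMAND [ARGS...]
with_simd_cap() {
    level="$1"
    shift
    # env sets the variable for the child process only.
    env ARROW_USER_SIMD_LEVEL="$level" "$@"
}

# Demonstration with a harmless command that just prints the variable.
with_simd_cap AVX2 sh -c 'echo "ARROW_USER_SIMD_LEVEL=$ARROW_USER_SIMD_LEVEL"'

# Real use would look like this (the binary path is a placeholder):
#   with_simd_cap AVX2 valgrind ./path/to/arrow-test-binary
```

The demonstration prints "ARROW_USER_SIMD_LEVEL=AVX2". Scoping the variable to 
the child process keeps the calling shell unchanged, so later runs outside 
valgrind still use whatever the hardware supports.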




[jira] [Updated] (ARROW-15368) [C++] [Docs] Improve our SIMD documentation

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15368:
---
Description: 
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
on an AVX512 capeable similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]

  was:
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]


> [C++] [Docs] Improve our SIMD documentation
> ---
>
> Key: ARROW-15368
> URL: https://issues.apache.org/jira/browse/ARROW-15368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Jonathan Keane
>Priority: Major
>
> We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
> {{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).
> We should also document what the defaults are (and what that means for 
> performance and possible optimization if you're compiling and you know you'll 
> be on more/less modern hardware:
> e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
> amount of runtime dispatched simd code, and MAX there means that it will 
> compile everything it can. but at runtime it will use whatever is available. 
> so if you compile on a machine with AVX512 and run on a machine with AVX512, 
> you'll get any AVX512 runtime dispatched code that's available (probably not 
> much). There is more (esp. in the query engine) that is runtime AVX2.
> FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
> ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses
> Additionally we should document that valgrind does not support AVX512: 
> [https://bugs.kde.org/show_bug.cgi?id=383010] 
> And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run 
> valgrind on an AVX512 capeable similar to what we do for our 
> [CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]




[jira] [Updated] (ARROW-15368) [C++] [Docs] Improve our SIMD documentation

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15368:
---
Description: 
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
[https://bugs.kde.org/show_bug.cgi?id=383010] 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
similar to what we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]

  was:
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses

Additionally we should document that valgrind does not support AVX512: 
https://bugs.kde.org/show_bug.cgi?id=383010 

And users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind 
as we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]


> [C++] [Docs] Improve our SIMD documentation
> ---
>
> Key: ARROW-15368
> URL: https://issues.apache.org/jira/browse/ARROW-15368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Jonathan Keane
>Priority: Major
>
> We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
> {{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).
> We should also document what the defaults are, and what they mean for 
> performance and possible optimization if you're compiling and you know you'll 
> be on more/less modern hardware.
> E.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
> amount of runtime-dispatched SIMD code, and MAX there means that it will 
> compile everything it can, but at runtime it will use whatever is available. 
> So if you compile on a machine with AVX512 and run on a machine with AVX512, 
> you'll get any AVX512 runtime-dispatched code that's available (probably not 
> much). There is more (esp. in the query engine) that is runtime AVX2.
> FWIW I (neal) would always leave ARROW_RUNTIME_SIMD_LEVEL=MAX. You can set 
> ARROW_USER_SIMD_LEVEL to change or limit the level that the runtime dispatch uses.
> Additionally, we should document that valgrind does not support AVX512: 
> [https://bugs.kde.org/show_bug.cgi?id=383010] 
> Users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run 
> valgrind, similar to what we do for our 
> [CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]
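
The valgrind advice in the description can be sketched as follows. This is a
minimal illustration in Python, not code from the Arrow repository: the helper
names env_with_simd_cap and run_capped are hypothetical, and the child process
is a stand-in that only echoes the variable. In practice the command would be
a valgrind invocation of a program linked against Arrow. The only names taken
from the issue are the ARROW_USER_SIMD_LEVEL environment variable and the AVX2
value.

```python
import os
import subprocess
import sys


def env_with_simd_cap(level, base_env=None):
    """Return a copy of the environment with ARROW_USER_SIMD_LEVEL set.

    The caller's mapping (or os.environ) is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ARROW_USER_SIMD_LEVEL"] = level
    return env


def run_capped(command, level="AVX2"):
    """Run `command` (a list of arguments) with the runtime dispatch capped.

    The child process inherits the current environment plus the cap.
    """
    return subprocess.run(
        command,
        env=env_with_simd_cap(level),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in child: prints the variable it was started with. A real use
    # would pass a valgrind command line here instead (assumption).
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['ARROW_USER_SIMD_LEVEL'])",
    ]
    print(run_capped(child).stdout.strip())
```

Setting the variable only for the child process keeps the cap scoped to the
valgrind run, so other processes in the same session still use whatever SIMD
level the hardware supports.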



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (ARROW-15368) [C++] [Docs] Improve our SIMD documentation

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens updated ARROW-15368:
---
Description: 
We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
{{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).

We should also document what the defaults are, and what they mean for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware.

E.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime-dispatched SIMD code, and MAX there means that it will 
compile everything it can, but at runtime it will use whatever is available. So 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime-dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would always leave ARROW_RUNTIME_SIMD_LEVEL=MAX. You can set 
ARROW_USER_SIMD_LEVEL to change or limit the level that the runtime dispatch uses.

Additionally, we should document that valgrind does not support AVX512: 
https://bugs.kde.org/show_bug.cgi?id=383010 

Users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run valgrind, 
as we do for our 
[CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]

  was:
We should document the various env vars ({{ARROW_SIMD_LEVEL}}, 
{{ARROW_RUNTIME_SIMD_LEVEL}}, {{ARROW_USER_SIMD_LEVEL}}, others?).

We should also document what the defaults are (and what that means for 
performance and possible optimization if you're compiling and you know you'll 
be on more/less modern hardware:

e.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
amount of runtime dispatched simd code, and MAX there means that it will 
compile everything it can. but at runtime it will use whatever is available. so 
if you compile on a machine with AVX512 and run on a machine with AVX512, 
you'll get any AVX512 runtime dispatched code that's available (probably not 
much). There is more (esp. in the query engine) that is runtime AVX2.

FWIW I (neal) would leave ARROW_RUNTIME_SIMD_LEVEL=MAX always. You can set 
ARROW_USER_SIMD_LEVEL to change/limit what level the runtime dispatch uses


> [C++] [Docs] Improve our SIMD documentation
> ---
>
> Key: ARROW-15368
> URL: https://issues.apache.org/jira/browse/ARROW-15368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Jonathan Keane
>Priority: Major
>
> We should document the various env vars ({{{}ARROW_SIMD_LEVEL{}}}, 
> {{{}ARROW_RUNTIME_SIMD_LEVEL{}}}, {{{}ARROW_USER_SIMD_LEVEL{}}}, others?).
> We should also document what the defaults are, and what they mean for 
> performance and possible optimization if you're compiling and you know you'll 
> be on more/less modern hardware.
> E.g. pyarrow and the R package are compiled with SSE4_2, but there is some 
> amount of runtime-dispatched SIMD code, and MAX there means that it will 
> compile everything it can, but at runtime it will use whatever is available. 
> So if you compile on a machine with AVX512 and run on a machine with AVX512, 
> you'll get any AVX512 runtime-dispatched code that's available (probably not 
> much). There is more (esp. in the query engine) that is runtime AVX2.
> FWIW I (neal) would always leave ARROW_RUNTIME_SIMD_LEVEL=MAX. You can set 
> ARROW_USER_SIMD_LEVEL to change or limit the level that the runtime dispatch uses.
> Additionally, we should document that valgrind does not support AVX512: 
> https://bugs.kde.org/show_bug.cgi?id=383010 
> Users should set ARROW_USER_SIMD_LEVEL to AVX2 if they plan to run 
> valgrind, as we do for our 
> [CI|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/docker-compose.yml#L309]




[jira] [Closed] (ARROW-16318) [R]Timezone is not supported by to_duckdb()

2022-08-15 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens closed ARROW-16318.
--
Resolution: Fixed

Fixed by {duckdb} 0.4.0.

> [R]Timezone is not supported by to_duckdb()
> ---
>
> Key: ARROW-16318
> URL: https://issues.apache.org/jira/browse/ARROW-16318
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 7.0.0
>Reporter: Zsolt Kegyes-Brassai
>Priority: Minor
>
> Here is a reproducible example:
>  
> {code:java}
> library(tidyverse)
> library(arrow)
> df1 <- tibble(time = lubridate::now(tzone = "UTC"))
> str(df1)
> #> tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
> #>  $ time: POSIXct[1:1], format: "2022-04-25 12:50:10"
> write_dataset(df1, here::here("temp/df1"), format = "parquet")
> open_dataset(here::here("temp/df1")) |> 
>   to_duckdb()
> #> Error: duckdb_prepare_R: Failed to prepare query SELECT *
> #> FROM "arrow_001" AS "q01"
> #> WHERE (0 = 1)
> #> Error: Not implemented Error: Unsupported Internal Arrow Type tsu:UTC
> df2 <- tibble(time = lubridate::now())
> str(df2)
> #> tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
> #>  $ time: POSIXct[1:1], format: "2022-04-25 14:50:11"
> write_dataset(df2, here::here("temp/df2"), format = "parquet")
> open_dataset(here::here("temp/df2")) |> 
>   to_duckdb()
> #> # Source:   table [?? x 1]
> #> # Database: duckdb_connection
> #>   time               
> #>   <dttm>             
> #> 1 2022-04-25 12:50:11
> {code}
>  
> Timestamps without timezone information work fine.
> How can one easily remove the timezone information from a {{timestamp}}-type 
> column in a parquet dataset?




[jira] [Assigned] (ARROW-16363) [Docs] How to run CI builds locally - Windows

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16363:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Docs] How to run CI builds locally - Windows
> -
>
> Key: ARROW-16363
> URL: https://issues.apache.org/jira/browse/ARROW-16363
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Major
>





[jira] [Assigned] (ARROW-16364) [Docs] How to run CI builds locally - Linux

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16364:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Docs] How to run CI builds locally - Linux
> ---
>
> Key: ARROW-16364
> URL: https://issues.apache.org/jira/browse/ARROW-16364
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Major
>





[jira] [Assigned] (ARROW-13840) [Doc] Add FAQ section to CI docs

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-13840:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Doc] Add FAQ section to CI docs
> 
>
> Key: ARROW-13840
> URL: https://issues.apache.org/jira/browse/ARROW-13840
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Nicola Crane
>Priority: Major
> Fix For: 10.0.0
>
>
> Practical questions, split by whether the reader is using the CI (i.e. 
> running jobs) vs. developing/extending the CI.
> Possible examples:
> user
> * how do I locally run one of these tasks?
> * where can I find the results of running one of these tasks locally?
> * why might I want to use archery docker run vs. docker run?
> dev
> * how do I update a task that runs on each PR/merge?
> * how do I add a new job to the tasks run via Crossbow/Archery?
> * what's the difference between passing in env vars via -e to the docker 
> calls vs. explicitly setting them as env vars (inner/outer)?
> * conda feedstocks - what they are and why they're important to a lot of our 
> builds




[jira] [Assigned] (ARROW-16366) [Docs] CI FAQ User Section

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16366:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Docs] CI FAQ User Section 
> ---
>
> Key: ARROW-16366
> URL: https://issues.apache.org/jira/browse/ARROW-16366
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Major
>
> - Diagrams
>  - where can I find the results of running one of these tasks locally?
>  - why might I want to use archery docker run vs. docker run?
> dev
>  - ...




[jira] [Assigned] (ARROW-16365) [Docs] How to run CI builds locally - MacOS

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16365:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Docs] How to run CI builds locally - MacOS
> ---
>
> Key: ARROW-16365
> URL: https://issues.apache.org/jira/browse/ARROW-16365
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Major
>





[jira] [Assigned] (ARROW-16370) [Docs] Document Crossbow builds outside of code

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-16370:
--

Assignee: (was: Jacob Wujciak-Jens)

> [Docs] Document Crossbow builds outside of code
> ---
>
> Key: ARROW-16370
> URL: https://issues.apache.org/jira/browse/ARROW-16370
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Jacob Wujciak-Jens
>Priority: Major
>
> Find a way to document the different builds that are available on-demand and 
> run regularly (nightly) outside of docker-compose.yml & tasks.yml.
> Something like a "build matrix"; ideally we would also have a 
> "supported-config matrix" so that the two can be compared against each other.
> CC [~raulcd] 




[jira] [Assigned] (ARROW-17021) [C++][R][CI] Enable use of sccache in cpp crossbow builds

2022-08-10 Thread Jacob Wujciak-Jens (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17021:
--

Assignee: Jacob Wujciak-Jens

> [C++][R][CI] Enable use of sccache in cpp crossbow builds
> -
>
> Key: ARROW-17021
> URL: https://issues.apache.org/jira/browse/ARROW-17021
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> sccache is a ccache-like compiler cache with a cloud storage back-end; it lets 
> us circumvent the issue of cross-branch caching in crossbow (not possible by design).
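
As a rough illustration of what enabling sccache for a C++ build can look
like, here is a minimal Python sketch that assembles a CMake configure command
and an environment for it. It is an assumption-laden example, not the change
made for this issue: the function names, the source and build paths, and the
bucket name are hypothetical. CMAKE_C_COMPILER_LAUNCHER and
CMAKE_CXX_COMPILER_LAUNCHER are standard CMake variables, and SCCACHE_BUCKET
is the variable sccache reads for its S3 back-end.

```python
import os


def sccache_configure_command(source_dir, build_dir):
    """Build a CMake configure command that routes compiler calls via sccache."""
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # CMake prefixes every C and C++ compiler invocation with the launcher.
        "-DCMAKE_C_COMPILER_LAUNCHER=sccache",
        "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache",
    ]


def sccache_environment(bucket, base_env=None):
    """Return a copy of the environment pointing sccache at a shared bucket."""
    env = dict(os.environ if base_env is None else base_env)
    env["SCCACHE_BUCKET"] = bucket  # hypothetical bucket name in the example
    return env


if __name__ == "__main__":
    # Hypothetical paths and bucket, printed rather than executed.
    print(" ".join(sccache_configure_command("arrow/cpp", "build")))
    print(sccache_environment("example-bucket", {}))
```

Because the cache lives in shared storage rather than in a per-branch CI
cache, builds on different branches can reuse each other's compiled objects.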



