[jira] [Resolved] (ARROW-10777) [Packaging][Python] PyPI pyarrow source dist (sdist) contains architecture dependent binaries

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-10777.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 9145
[https://github.com/apache/arrow/pull/9145]

> [Packaging][Python] PyPI pyarrow source dist (sdist) contains architecture 
> dependent binaries 
> --
>
> Key: ARROW-10777
> URL: https://issues.apache.org/jira/browse/ARROW-10777
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 2.0.0
>Reporter: Daniel Jewell
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> After downloading the most recent pyarrow "sdist" *source* tarball from PyPI 
> and extracting it, the package contains multiple *binary* libraries compiled 
> for x86-64 (libarrow, libparquet, etc.). 
>  
> The ultimate result is that this isn't a source package at all - it would be 
> fine to include binaries in a Python wheel, but including arch/platform 
> specific binaries in an sdist breaks pip and the install. (In my case, trying 
> to install on aarch64.)
> As a general observation, this will become a larger issue as, for example, 
> the ARM-based Macs come to market. 
> That said, one commonly implemented option is to make the python source 
> package download and build any dependent libraries. 
>  
> At the very least, the source package should not contain binaries. I suppose 
> it's not much different from a *source* Debian package containing compiled 
> binary code.
>  
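The problem described above is straightforward to detect mechanically. A minimal sketch (the tarball built here is a synthetic stand-in, and the suffix list is illustrative, not exhaustive):

```python
# Scan an sdist tarball for compiled artifacts that should not ship in a
# source distribution. The demo sdist and its file names are made up.
import io
import os
import tarfile
import tempfile

BINARY_SUFFIXES = (".so", ".dylib", ".dll", ".a")

def find_binaries(sdist_path):
    """Return sdist members whose names look like compiled libraries."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if m.name.endswith(BINARY_SUFFIXES)]

# Demo on a synthetic sdist containing one stray binary.
with tempfile.TemporaryDirectory() as tmp:
    sdist = os.path.join(tmp, "pyarrow-2.0.0.tar.gz")
    with tarfile.open(sdist, "w:gz") as tar:
        for name in ("pyarrow-2.0.0/setup.py",
                     "pyarrow-2.0.0/pyarrow/libarrow.so"):
            info = tarfile.TarInfo(name)
            payload = b"placeholder"
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    stray = find_binaries(sdist)
# stray == ["pyarrow-2.0.0/pyarrow/libarrow.so"]
```

A check like this could run in packaging CI against the generated sdist before upload.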



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11204) [C++] Fix build failure with bundled gRPC and Protobuf

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-11204.
--
Resolution: Fixed

Issue resolved by pull request 9157
[https://github.com/apache/arrow/pull/9157]

> [C++] Fix build failure with bundled gRPC and Protobuf
> --
>
> Key: ARROW-11204
> URL: https://issues.apache.org/jira/browse/ARROW-11204
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is caused by ARROW-9400.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11204) [C++] Fix build failure with bundled gRPC and Protobuf

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-11204:
-
Summary: [C++] Fix build failure with bundled gRPC and Protobuf  (was: 
[C++] Fix build failure with bundled gRPC and Protobuf on non MSVC)

> [C++] Fix build failure with bundled gRPC and Protobuf
> --
>
> Key: ARROW-11204
> URL: https://issues.apache.org/jira/browse/ARROW-11204
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is caused by ARROW-9400.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11219) [CI][Ruby][MinGW] Reduce CI time

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11219:
---
Labels: pull-request-available  (was: )

> [CI][Ruby][MinGW] Reduce CI time
> 
>
> Key: ARROW-11219
> URL: https://issues.apache.org/jira/browse/ARROW-11219
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, GLib, Ruby
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11219) [CI][Ruby][MinGW] Reduce CI time

2021-01-11 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-11219:


 Summary: [CI][Ruby][MinGW] Reduce CI time
 Key: ARROW-11219
 URL: https://issues.apache.org/jira/browse/ARROW-11219
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, GLib, Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-10834) [R] Fix print method for SubTreeFileSystem

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-10834.
-
Resolution: Fixed

Issue resolved by pull request 9168
[https://github.com/apache/arrow/pull/9168]

> [R] Fix print method for SubTreeFileSystem
> --
>
> Key: ARROW-10834
> URL: https://issues.apache.org/jira/browse/ARROW-10834
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 
> Running as a container in AWS fargate.
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
> arrow::arrow_with_s3(){code}
>  returns TRUE.
> {code:java}
> arrow::s3_bucket("", role_arn=""){code}
>  causes the error:
>  
>  
> {code:java}
> ERROR while rich displaying an object: Error in get(name, x$base_fs): object 
> '.class_title' not found
> Traceback:
> 1. FUN(X[[i]], ...)
> 2. tryCatch(withCallingHandlers({
>  . if (!mime %in% names(repr::mime2repr)) 
>  . stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
>  . rpr <- repr::mime2repr[[mime]](obj)
>  . if (is.null(rpr)) 
>  . return(NULL)
>  . prepare_content(is.raw(rpr), rpr)
>  . }, error = error_handler), error = outer_handler)
> 3. tryCatchList(expr, classes, parentenv, handlers)
> 4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5. doTryCatch(return(expr), name, parentenv, handler)
> 6. withCallingHandlers({
>  . if (!mime %in% names(repr::mime2repr)) 
>  . stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
>  . rpr <- repr::mime2repr[[mime]](obj)
>  . if (is.null(rpr)) 
>  . return(NULL)
>  . prepare_content(is.raw(rpr), rpr)
>  . }, error = error_handler)
> 7. repr::mime2repr[[mime]](obj)
> 8. repr_text.default(obj)
> 9. paste(capture.output(print(obj)), collapse = "\n")
> 10. capture.output(print(obj))
> 11. evalVis(expr)
> 12. withVisible(eval(expr, pf))
> 13. eval(expr, pf)
> 14. eval(expr, pf)
> 15. print(obj)
> 16. print.R6(obj)
> 17. .subset2(x, "print")(...)
> 18. self$.class_title
> 19. `$.SubTreeFileSystem`(self, .class_title)
> 20. get(name, x$base_fs)
>  
> {code}
>  
> {code:java}
> SessionInfo(){code}
>  
> {code:java}
> R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 LTS
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C 
>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 
>  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C 
>  [9] LC_ADDRESS=C LC_TELEPHONE=C 
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] dtplyr_1.0.1 dplyr_1.0.2
> loaded via a namespace (and not attached):
>  [1] magrittr_2.0.1 tidyselect_1.1.0 bit_4.0.4 uuid_0.1-4 
>  [5] R6_2.5.0 rlang_0.4.9 tools_4.0.3 data.table_1.13.2
>  [9] arrow_2.0.0 htmltools_0.5.0 ellipsis_0.3.1 bit64_4.0.5 
> [13] digest_0.6.27 assertthat_0.2.1 tibble_3.0.4 lifecycle_0.2.0 
> [17] crayon_1.3.4 IRdisplay_0.7.0 purrr_0.3.4 repr_1.1.0 
> [21] base64enc_0.1-3 vctrs_0.3.5 IRkernel_1.1.1 glue_1.4.2 
> [25] evaluate_0.14 pbdZMQ_0.3-3.1 compiler_4.0.3 pillar_1.4.7 
> [29] generics_0.1.0 jsonlite_1.7.1 pkgconfig_2.0.3
> {code}
>  
>  
> I had to use the work-around documented here: 
> https://issues.apache.org/jira/browse/ARROW-10371?jql=project%20%3D%20ARROW%20AND%20text%20~%20%22libcurl4-openssl-dev%22
>  (Download cmake 3.19.1, build it, and set CMAKE=.) to 
> install arrow.
>  
> I'm sorry I don't have more ideas about the error. Without reading the code 
> I'm not even sure what's going on in this part of the code.
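The traceback suggests the failure mode: `$.SubTreeFileSystem` delegates unknown field lookups to `base_fs`, so an internal field the wrapper never defined (`.class_title`) falls through and fails on the wrapped object. A loose Python analogue (class names are illustrative, not Arrow's R6 classes):

```python
# Buggy delegation pattern: attribute lookups the wrapper cannot satisfy
# are forwarded to base_fs, including internal names the wrapped object
# also lacks, which mirrors "object '.class_title' not found".
class BaseFileSystem:
    def get_file_info(self, path):
        return {"path": path}

class SubTreeFileSystem:
    def __init__(self, base_fs):
        self.base_fs = base_fs

    def __getattr__(self, name):
        # Called only for names not found on the wrapper itself.
        return getattr(self.base_fs, name)

fs = SubTreeFileSystem(BaseFileSystem())
info = fs.get_file_info("a/b")   # delegated call works
# fs.class_title                 # raises AttributeError via base_fs
```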



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-10463) [R] Better messaging for currently unsupported CSV options in open_dataset

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-10463.
-
Resolution: Fixed

Issue resolved by pull request 9143
[https://github.com/apache/arrow/pull/9143]

> [R] Better messaging for currently unsupported CSV options in open_dataset
> --
>
> Key: ARROW-10463
> URL: https://issues.apache.org/jira/browse/ARROW-10463
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> While read_csv_arrow()'s signature matches readr,  the 
> readr_to_csv_parse_options() function (called by way of open_dataset()) only 
> appears to capture a subset of those options:
> (https://github.com/apache/arrow/blob/883eb572bc64430307112895976ba79df10c8c7d/r/R/csv.R#L464)
> {code:java}
> readr_to_csv_parse_options <- function(delim = ",",
>  quote = '"',
>  escape_double = TRUE,
>  escape_backslash = FALSE,
>  skip_empty_rows = TRUE){code}
> I ran into this trying to use a non-standard 'na' value:
>  
> {code:java}
> open_dataset("/path/to/csv/directory/", schema = sch, partitioning=NULL, 
> format="csv", delim=";", na="\\N", escape_backslash=TRUE, 
> escape_double=FALSE`)
> Error in readr_to_csv_parse_options(...) : unused argument (na = "\\N")
> {code}
>  
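The "better messaging" the title asks for amounts to validating the passed options against the supported subset up front. A hedged sketch (the helper is hypothetical, not Arrow's API; option names mirror the readr-style arguments quoted above):

```python
# Reject unsupported reader options with an explicit error instead of a
# generic "unused argument" failure from the underlying function call.
SUPPORTED = {"delim", "quote", "escape_double", "escape_backslash",
             "skip_empty_rows"}

def check_csv_options(**options):
    unsupported = sorted(set(options) - SUPPORTED)
    if unsupported:
        raise ValueError(
            "The following options are not supported when opening a CSV "
            f"dataset: {', '.join(unsupported)}")
    return options
```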



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11210) [CI] Restore workflows that had been blocked by INFRA

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-11210.
-
Resolution: Fixed

Issue resolved by pull request 9165
[https://github.com/apache/arrow/pull/9165]

> [CI] Restore workflows that had been blocked by INFRA
> -
>
> Key: ARROW-11210
> URL: https://issues.apache.org/jira/browse/ARROW-11210
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See INFRA-21239, ARROW-11092, ARROW-11132



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11218) [R] Make SubTreeFileSystem print method more informative

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11218:
---
Labels: pull-request-available  (was: )

> [R] Make SubTreeFileSystem print method more informative
> 
>
> Key: ARROW-11218
> URL: https://issues.apache.org/jira/browse/ARROW-11218
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The print method for {{SubTreeFileSystem}} objects prints only the class 
> name. Make it print more useful information, such as a filesystem URI 
> including scheme. For example:
> {code:r}
> print(s3_bucket("ursa-labs-taxi-data"))
> ## SubTreeFileSystem: s3://ursa-labs-taxi-data
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11218) [R] Make SubTreeFileSystem print method more informative

2021-01-11 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-11218:
-
Fix Version/s: 3.0.0

> [R] Make SubTreeFileSystem print method more informative
> 
>
> Key: ARROW-11218
> URL: https://issues.apache.org/jira/browse/ARROW-11218
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Minor
> Fix For: 3.0.0
>
>
> The print method for {{SubTreeFileSystem}} objects prints only the class 
> name. Make it print more useful information, such as a filesystem URI 
> including scheme. For example:
> {code:r}
> print(s3_bucket("ursa-labs-taxi-data"))
> ## SubTreeFileSystem: s3://ursa-labs-taxi-data
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11218) [R] Make SubTreeFileSystem print method more informative

2021-01-11 Thread Ian Cook (Jira)
Ian Cook created ARROW-11218:


 Summary: [R] Make SubTreeFileSystem print method more informative
 Key: ARROW-11218
 URL: https://issues.apache.org/jira/browse/ARROW-11218
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Ian Cook
Assignee: Ian Cook


The print method for {{SubTreeFileSystem}} objects prints only the class name. 
Make it print more useful information, such as a filesystem URI including 
scheme. For example:
{code:r}
print(s3_bucket("ursa-labs-taxi-data"))
## SubTreeFileSystem: s3://ursa-labs-taxi-data
{code}
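The proposed behavior can be sketched in Python terms: the wrapper's textual representation includes a URI built from scheme and base path instead of only the class name (the class here is illustrative, not Arrow's R6 class):

```python
# A repr that surfaces the wrapped location, matching the example above.
class SubTreeFileSystem:
    def __init__(self, scheme, base_path):
        self.scheme = scheme
        self.base_path = base_path

    def __repr__(self):
        return f"SubTreeFileSystem: {self.scheme}://{self.base_path}"

fs = SubTreeFileSystem("s3", "ursa-labs-taxi-data")
# repr(fs) == "SubTreeFileSystem: s3://ursa-labs-taxi-data"
```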



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11176) [R] Expose memory pool name and document setting it

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11176:
---
Labels: pull-request-available  (was: )

> [R] Expose memory pool name and document setting it
> ---
>
> Key: ARROW-11176
> URL: https://issues.apache.org/jira/browse/ARROW-11176
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Followup to ARROW-11009, which did this in C++ and added the binding in 
> Python. This could be useful not only for debugging but also for benchmarking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-11217) [C++] Runtime SIMD check on Apple hardware missing

2021-01-11 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262989#comment-17262989
 ] 

Antoine Pitrou commented on ARROW-11217:


Example code:

[https://github.com/numpy/numpy/pull/13339/files]

Also a list of possible features to check for:

[https://www.mersenneforum.org/showthread.php?p=561100#post561100]

 

> [C++] Runtime SIMD check on Apple hardware missing
> --
>
> Key: ARROW-11217
> URL: https://issues.apache.org/jira/browse/ARROW-11217
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Priority: Critical
> Fix For: 3.0.0
>
>
> [~jeroenooms] hit a crash in the "sum" compute kernel using the R package on 
> a new M1 machine running the rosetta emulator: 
> https://gist.github.com/jeroen/c60548b29ff7f6807a6554799bd01cb7
> According to 
> https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment,
>  we should be checking sysctlbyname for AVX* capabilities, but we are not. We 
> only use that function in 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/cpu_info.cc#L350-L359
>  to check cpu cache size. 
> This may also explain a crash we observed previously on a very old macOS CRAN 
> machine. 
> I think we should resolve this before the 3.0 release if possible, in 
> order to avoid bug reports as more people get M1s. 
> cc [~apitrou] [~uwe] [~kou] [~frankdu] [~yibo]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11217) [C++] Runtime SIMD check on Apple hardware missing

2021-01-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11217:
---

 Summary: [C++] Runtime SIMD check on Apple hardware missing
 Key: ARROW-11217
 URL: https://issues.apache.org/jira/browse/ARROW-11217
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
 Fix For: 3.0.0


[~jeroenooms] hit a crash in the "sum" compute kernel using the R package on a 
new M1 machine running the rosetta emulator: 
https://gist.github.com/jeroen/c60548b29ff7f6807a6554799bd01cb7

According to 
https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment,
 we should be checking sysctlbyname for AVX* capabilities, but we are not. We 
only use that function in 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/cpu_info.cc#L350-L359
 to check cpu cache size. 

This may also explain a crash we observed previously on a very old macOS CRAN 
machine. 

I think we should resolve this before the 3.0 release if possible, in order 
to avoid bug reports as more people get M1s. 

cc [~apitrou] [~uwe] [~kou] [~frankdu] [~yibo]
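The check Apple documents for Rosetta can be sketched from Python via ctypes, querying sysctlbyname for the AVX capability key. This is Darwin-only and returns None elsewhere; the sysctl key follows the Apple documentation linked above, and this is an illustration of the runtime probe, not Arrow's cpu_info.cc implementation:

```python
# Probe sysctlbyname("hw.optional.avx1_0") to detect AVX support under
# Rosetta, as Apple's translation-environment docs describe.
import ctypes
import ctypes.util
import sys

def rosetta_has_avx():
    if sys.platform != "darwin":
        return None  # sysctlbyname is a Darwin interface
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    val = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(val))
    rc = libc.sysctlbyname(b"hw.optional.avx1_0",
                           ctypes.byref(val), ctypes.byref(size),
                           None, 0)
    # rc != 0 means the key is absent, so assume no AVX.
    return bool(val.value) if rc == 0 else False
```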



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-11150) [Rust] Set up bi-weekly Rust sync call and update website

2021-01-11 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262983#comment-17262983
 ] 

Andy Grove edited comment on ARROW-11150 at 1/11/21, 11:36 PM:
---

I created the Google Meet for this: https://meet.google.com/ctp-yujs-aee


was (Author: andygrove):
I created the Google Meet for this: google.com/ctp-yujs-aee

> [Rust] Set up bi-weekly Rust sync call and update website
> -
>
> Key: ARROW-11150
> URL: https://issues.apache.org/jira/browse/ARROW-11150
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Given the momentum on the Rust implementation, I am going to set up a 
> bi-weekly sync call on Google Meet most likely. The call will be at the same 
> time as the current sync call but on alternate weeks.
> I will update the web site to list both calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-11150) [Rust] Set up bi-weekly Rust sync call and update website

2021-01-11 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262983#comment-17262983
 ] 

Andy Grove commented on ARROW-11150:


I created the Google Meet for this: google.com/ctp-yujs-aee

> [Rust] Set up bi-weekly Rust sync call and update website
> -
>
> Key: ARROW-11150
> URL: https://issues.apache.org/jira/browse/ARROW-11150
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Given the momentum on the Rust implementation, I am going to set up a 
> bi-weekly sync call on Google Meet most likely. The call will be at the same 
> time as the current sync call but on alternate weeks.
> I will update the web site to list both calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11216) [Rust] Improve documentation for StringDictionaryBuilder

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11216:
---
Labels: pull-request-available  (was: )

> [Rust] Improve documentation for StringDictionaryBuilder
> 
>
> Key: ARROW-11216
> URL: https://issues.apache.org/jira/browse/ARROW-11216
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I find myself trying to remember the exact incantation to create a 
> `StringDictionaryBuilder`, so it should be a doc example.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11216) [Rust] Improve documentation for StringDictionaryBuilder

2021-01-11 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11216:
---

 Summary: [Rust] Improve documentation for StringDictionaryBuilder
 Key: ARROW-11216
 URL: https://issues.apache.org/jira/browse/ARROW-11216
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


I find myself trying to remember the exact incantation to create a 
`StringDictionaryBuilder`, so it should be a doc example.
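For comparison across implementations, what the builder does is straightforward to sketch in plain Python (this is not the Rust API, just the dictionary-encoding idea it implements):

```python
# Minimal model of a string dictionary builder: each appended string is
# mapped to an index into a deduplicated table of values.
class StringDictionaryBuilder:
    def __init__(self):
        self.values = []      # dictionary (unique strings, in first-seen order)
        self.index_of = {}    # string -> dictionary index
        self.indices = []     # encoded keys, one per appended value

    def append(self, s):
        if s not in self.index_of:
            self.index_of[s] = len(self.values)
            self.values.append(s)
        self.indices.append(self.index_of[s])

b = StringDictionaryBuilder()
for s in ["apple", "banana", "apple"]:
    b.append(s)
# b.indices == [0, 1, 0]; b.values == ["apple", "banana"]
```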



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10417) [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and RecordBatchFileWriter

2021-01-11 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262950#comment-17262950
 ] 

Weston Pace edited comment on ARROW-10417 at 1/11/21, 9:54 PM:
---

[~shouheng] I'm having trouble reproducing this with the file you've provided.  
I am running on an Ubuntu 20.04.1 desktop and RAM usage stays constant.  Can 
you provide me some more details around the environment you are running in?  
From the charts it appears you might be submitting some kind of batch job?  Do 
you know any details about what kind of OS is running or how it is configured?

If you put...
{code:java}
pa.jemalloc_set_decay_ms(1)
{code}
...in your script (immediately after import pyarrow), does it change the behavior?

Also, do you see this kind of memory growth with other calls or is it 
specifically the record batch writer calls?  For example, can you try the 
following similar program and see if you see the same kind of growth?
{code:java}
import tempfile
import os
import sys

import pyarrow as pa

B = 1
KB = 1024 * B
MB = 1024 * KB

schema = pa.schema(
    [
        pa.field("a_string", pa.string()),
        pa.field("an_int", pa.int32()),
        pa.field("a_float", pa.float32()),
        pa.field("a_list_of_floats", pa.list_(pa.float32())),
    ]
)

nrows_in_a_batch = 1000
nbatches_in_a_table = 1000

column_arrays = [
    ["string"] * nrows_in_a_batch,
    [123] * nrows_in_a_batch,
    [456.789] * nrows_in_a_batch,
    [range(1000)] * nrows_in_a_batch,
]


def main(sys_args) -> None:
    for iteration in range(1000):
        if iteration % 100 == 0:
            print(f'Percent complete: {100*iteration/1000.0}')
        batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
        table = pa.Table.from_batches([batch] * nbatches_in_a_table,
                                      schema=schema)


if __name__ == "__main__":
    main(sys.argv[1:])
{code}
!arrow-10417-memtest-1.png!
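To make growth comparisons like the chart above reproducible, the loop can be instrumented to log resident set size per iteration using only the stdlib. A sketch (the allocation here is a stand-in for the RecordBatch work; note ru_maxrss is kilobytes on Linux but bytes on macOS, and the resource module is not available on Windows):

```python
# Log peak RSS per iteration to distinguish a true leak (monotonic
# growth proportional to iterations) from allocator caching (a plateau).
import resource

def max_rss_mb():
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / 1024  # assumes Linux units (kilobytes)

baseline = max_rss_mb()
samples = []
for iteration in range(5):
    data = [bytes(1024 * 1024)]  # stand-in for building a RecordBatch
    samples.append(max_rss_mb() - baseline)
# A leak shows as steadily growing samples; caching levels off.
```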


was (Author: westonpace):
[~shouheng] I'm having trouble reproducing this with the file you've provided.  
I am running on an Ubuntu 20.04.1 desktop and RAM usage stays constant.  Can 
you provide me some more details around the environment you are running in?  
From the charts it appears you might be submitting some kind of batch job?  Do 
you know any details about what kind of OS is running or how it is configured?

If you put...
{code:java}
pa.jemalloc_set_decay_ms(1)
{code}
...in your script (immediately after import pyarrow), does it change the behavior?

Also, do you see this kind of memory growth with other calls or is it 
specifically the record batch writer calls?  For example, can you try the 
following similar program and see if you see the same kind of growth?
{code:java}
import tempfile
import os
import sys

import pyarrow as pa

B = 1
KB = 1024 * B
MB = 1024 * KB

schema = pa.schema(
    [
        pa.field("a_string", pa.string()),
        pa.field("an_int", pa.int32()),
        pa.field("a_float", pa.float32()),
        pa.field("a_list_of_floats", pa.list_(pa.float32())),
    ]
)

nrows_in_a_batch = 1000
nbatches_in_a_table = 1000

column_arrays = [
    ["string"] * nrows_in_a_batch,
    [123] * nrows_in_a_batch,
    [456.789] * nrows_in_a_batch,
    [range(1000)] * nrows_in_a_batch,
]


def main(sys_args) -> None:
    for iteration in range(1000):
        if iteration % 100 == 0:
            print(f'Percent complete: {100*iteration/1000.0}')
        batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
        table = pa.Table.from_batches([batch] * nbatches_in_a_table,
                                      schema=schema)


if __name__ == "__main__":
    main(sys.argv[1:])

{code}
!arrow-10417-memtest-1.png!

> [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and 
> RecordBatchFileWriter
> ---
>
> Key: ARROW-10417
> URL: https://issues.apache.org/jira/browse/ARROW-10417
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.15.1, 2.0.0
> Environment: This is the config for my worker node:
> resources:
> - cpus: 1
> - maxMemoryMb: 4096
> - reservedMemoryMb: 2048
>Reporter: Shouheng Yi
>Priority: Major
> Fix For: 2.0.1, 3.0.0
>
> Attachments: Screen Shot 2020-10-28 at 9.43.32 PM.png, Screen Shot 
> 2020-10-28 at 9.43.40 PM.png, Screen Shot 2020-10-29 at 9.22.58 AM.png, 
> arrow-10417-memtest-1.png
>
>
> There might be a memory leak in the {{RecordBatchStreamWriter}}. The memory 
> resources were not released. It always hit the memory limit and started doing 
> virtual memory swapping. See the picture below:
> !Screen Shot 2020-10-28 at 9.43.32 PM.png!
> This was the code:
> {code:python}
> import tempfile
> import os
> import sys
> import pyarrow as pa
> B = 1
> KB = 1024 * B
> MB = 1024 * KB
> schema = pa.schema(
>

[jira] [Comment Edited] (ARROW-10417) [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and RecordBatchFileWriter

2021-01-11 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262950#comment-17262950
 ] 

Weston Pace edited comment on ARROW-10417 at 1/11/21, 9:52 PM:
---

[~shouheng] I'm having trouble reproducing this with the file you've provided.  
I am running on an Ubuntu 20.04.1 desktop and RAM usage stays constant.  Can 
you provide me some more details around the environment you are running in?  
From the charts it appears you might be submitting some kind of batch job?  Do 
you know any details about what kind of OS is running or how it is configured?

If you put...
{code:java}
pa.jemalloc_set_decay_ms(1)
{code}
...in your script (immediately after import pyarrow), does it change the behavior?

Also, do you see this kind of memory growth with other calls or is it 
specifically the record batch writer calls?  For example, can you try the 
following similar program and see if you see the same kind of growth?
{code:java}
import tempfile
import os
import sys

import pyarrow as pa

B = 1
KB = 1024 * B
MB = 1024 * KB

schema = pa.schema(
    [
        pa.field("a_string", pa.string()),
        pa.field("an_int", pa.int32()),
        pa.field("a_float", pa.float32()),
        pa.field("a_list_of_floats", pa.list_(pa.float32())),
    ]
)

nrows_in_a_batch = 1000
nbatches_in_a_table = 1000

column_arrays = [
    ["string"] * nrows_in_a_batch,
    [123] * nrows_in_a_batch,
    [456.789] * nrows_in_a_batch,
    [range(1000)] * nrows_in_a_batch,
]


def main(sys_args) -> None:
    for iteration in range(1000):
        if iteration % 100 == 0:
            print(f'Percent complete: {100*iteration/1000.0}')
        batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
        table = pa.Table.from_batches([batch] * nbatches_in_a_table,
                                      schema=schema)


if __name__ == "__main__":
    main(sys.argv[1:])

{code}
!arrow-10417-memtest-1.png!


was (Author: westonpace):
[~shouheng] I'm having trouble reproducing this with the file you've provided.  
I am running on an Ubuntu 20.04.1 desktop and RAM usage stays constant.  Can 
you provide me some more details around the environment you are running in?  
From the charts it appears you might be submitting some kind of batch job?  Do 
you know any details about what kind of OS is running or how it is configured?

If you put...
{code:java}
pa.jemalloc_set_decay_ms(1)
{code}
...in your script (immediately after import pyarrow), does it change the behavior?

Also, do you see this kind of memory growth with other calls or is it 
specifically the record batch writer calls?  For example, can you try the 
following similar program and see if you see the same kind of growth?
{code:java}
import tempfile
import os
import sys

import pyarrow as pa

B = 1
KB = 1024 * B
MB = 1024 * KB

schema = pa.schema(
    [
        pa.field("a_string", pa.string()),
        pa.field("an_int", pa.int32()),
        pa.field("a_float", pa.float32()),
        pa.field("a_list_of_floats", pa.list_(pa.float32())),
    ]
)

nrows_in_a_batch = 1000
nbatches_in_a_table = 1000

column_arrays = [
    ["string"] * nrows_in_a_batch,
    [123] * nrows_in_a_batch,
    [456.789] * nrows_in_a_batch,
    [range(1000)] * nrows_in_a_batch,
]


def main(sys_args) -> None:
    for iteration in range(1000):
        if iteration % 100 == 0:
            print(f'Percent complete: {100*iteration/1000.0}')
        batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
        table = pa.Table.from_batches([batch] * nbatches_in_a_table,
                                      schema=schema)


if __name__ == "__main__":
    main(sys.argv[1:])
{code}
!arrow-10417-memtest-1.png!

> [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and 
> RecordBatchFileWriter
> ---
>
> Key: ARROW-10417
> URL: https://issues.apache.org/jira/browse/ARROW-10417
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.15.1, 2.0.0
> Environment: This is the config for my worker node:
> resources:
> - cpus: 1
> - maxMemoryMb: 4096
> - reservedMemoryMb: 2048
>Reporter: Shouheng Yi
>Priority: Major
> Fix For: 2.0.1, 3.0.0
>
> Attachments: Screen Shot 2020-10-28 at 9.43.32 PM.png, Screen Shot 
> 2020-10-28 at 9.43.40 PM.png, Screen Shot 2020-10-29 at 9.22.58 AM.png, 
> arrow-10417-memtest-1.png
>
>
> There might be a memory leak in the {{RecordBatchStreamWriter}}. The memory 
> resources were not released. It always hit the memory limit and started doing 
> virtual memory swapping. See the picture below:
> !Screen Shot 2020-10-28 at 9.43.32 PM.png!
> This was the code:
> {code:python}
> import tempfile
> import os
> import sys
> import pyarrow as pa
> B = 1
> KB = 1024 * B
> MB = 1024 * KB
> schema = pa.schema(
> [
> 

[jira] [Commented] (ARROW-10417) [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and RecordBatchFileWriter

2021-01-11 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262950#comment-17262950
 ] 

Weston Pace commented on ARROW-10417:
-

[~shouheng] I'm having trouble reproducing this with the file you've provided.  
I am running on an Ubuntu 20.04.1 desktop and RAM usage stays constant.  Can 
you provide me some more details around the environment you are running in?  
From the charts it appears you might be submitting some kind of batch job?  Do 
you know any details about what kind of OS is running or how it is configured?

If you put...
{code:java}
pa.jemalloc_set_decay_ms(1)
{code}
...in your script (immediately after import pyarrow), does it change the behavior?

Also, do you see this kind of memory growth with other calls or is it 
specifically the record batch writer calls?  For example, can you try the 
following similar program and see if you see the same kind of growth?
{code:python}
import tempfile
import os
import sys

import pyarrow as pa

B = 1
KB = 1024 * B
MB = 1024 * KB

schema = pa.schema(
    [
        pa.field("a_string", pa.string()),
        pa.field("an_int", pa.int32()),
        pa.field("a_float", pa.float32()),
        pa.field("a_list_of_floats", pa.list_(pa.float32())),
    ]
)

nrows_in_a_batch = 1000
nbatches_in_a_table = 1000

column_arrays = [
    ["string"] * nrows_in_a_batch,
    [123] * nrows_in_a_batch,
    [456.789] * nrows_in_a_batch,
    [range(1000)] * nrows_in_a_batch,
]

def main(sys_args) -> None:
    for iteration in range(1000):
        if iteration % 100 == 0:
            print(f'Percent complete: {100*iteration/1000.0}')
        batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
        table = pa.Table.from_batches([batch] * nbatches_in_a_table, schema=schema)

if __name__ == "__main__":
    main(sys.argv[1:])
{code}
!arrow-10417-memtest-1.png!
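Since {{pa.total_allocated_bytes()}} only reports what Arrow's memory pool has handed out, comparing it against the process-level resident set size can help separate a pool leak from memory the allocator retains after it is returned to the pool. A minimal, stdlib-only sketch of the RSS side (pyarrow is deliberately omitted so the snippet stays self-contained; the {{resource}} module is Unix-only):

```python
import resource
import sys

def max_rss_bytes() -> int:
    """Peak resident set size of this process, normalized to bytes.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS,
    so this is a rough, platform-dependent number.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

before = max_rss_bytes()
blob = [bytearray(1024 * 1024) for _ in range(64)]  # hold ~64 MiB alive
after = max_rss_bytes()
print(f"peak RSS grew by ~{(after - before) / 2**20:.0f} MiB")
```

Logging both numbers inside the reporter's loop would show whether RSS keeps climbing even while the pool figure stays flat.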


[jira] [Updated] (ARROW-10417) [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and RecordBatchFileWriter

2021-01-11 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace updated ARROW-10417:

Attachment: arrow-10417-memtest-1.png

> [Python][C++] Possible Memory Leak in RecordBatchStreamWriter and 
> RecordBatchFileWriter
> ---
>
> Key: ARROW-10417
> URL: https://issues.apache.org/jira/browse/ARROW-10417
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.15.1, 2.0.0
> Environment: This is the config for my worker node:
> resources:
> - cpus: 1
> - maxMemoryMb: 4096
> - reservedMemoryMb: 2048
>Reporter: Shouheng Yi
>Priority: Major
> Fix For: 2.0.1, 3.0.0
>
> Attachments: Screen Shot 2020-10-28 at 9.43.32 PM.png, Screen Shot 
> 2020-10-28 at 9.43.40 PM.png, Screen Shot 2020-10-29 at 9.22.58 AM.png, 
> arrow-10417-memtest-1.png
>
>
> There might be a memory leak in the {{RecordBatchStreamWriter}}. The memory 
> resources were not released. It always hit the memory limit and started doing 
> virtual memory swapping. See the picture below:
> !Screen Shot 2020-10-28 at 9.43.32 PM.png!
> This was the code:
> {code:python}
> import tempfile
> import os
> import sys
> import pyarrow as pa
> B = 1
> KB = 1024 * B
> MB = 1024 * KB
> schema = pa.schema(
> [
> pa.field("a_string", pa.string()),
> pa.field("an_int", pa.int32()),
> pa.field("a_float", pa.float32()),
> pa.field("a_list_of_floats", pa.list_(pa.float32())),
> ]
> )
> nrows_in_a_batch = 1000
> nbatches_in_a_table = 1000
> column_arrays = [
> ["string"] * nrows_in_a_batch,
> [123] * nrows_in_a_batch,
> [456.789] * nrows_in_a_batch,
> [range(1000)] * nrows_in_a_batch,
> ]
> def main(sys_args) -> None:
> batch = pa.RecordBatch.from_arrays(column_arrays, schema=schema)
> table = pa.Table.from_batches([batch] * nbatches_in_a_table, 
> schema=schema)
> with tempfile.TemporaryDirectory() as tmpdir:
> filename_template = "file-{n}.arrow"
> i = 0
> while True:
> path = os.path.join(tmpdir, filename_template.format(n=i))
> i += 1
> with pa.OSFile(path, "w") as sink:
> with pa.RecordBatchStreamWriter(sink, schema) as writer:
> writer.write_table(table)
> print(f"pa.total_allocated_bytes(): 
> {pa.total_allocated_bytes() / MB} mb")
> if __name__ == "__main__":
> main(sys.argv[1:])
> {code}
> Strangely enough, printing {{total_allocated_bytes}}, it seemed normal.
> {code:python}
> pa.total_allocated_bytes(): 3.95556640625 mb
> pa.total_allocated_bytes(): 3.95556640625 mb
> pa.total_allocated_bytes(): 3.95556640625 mb
> pa.total_allocated_bytes(): 3.95556640625 mb
> pa.total_allocated_bytes(): 3.95556640625 mb
> {code}
> Am I using {{RecordBatchStreamWriter}} incorrectly? If not, how can I release 
> the resources?
> [Updates 10/29/2020]
> I tested on {{pyarrow==2.0.0}}. I still see the same issue.
> !Screen Shot 2020-10-29 at 9.22.58 AM.png! 
> I changed {{RecordBatchStreamWriter}} to {{RecordBatchFileWriter}} in my 
> code, i.e.:
> {code:python}
> ...
> with pa.OSFile(path, "w") as sink:
> with pa.RecordBatchFileWriter(sink, schema) as writer:
> writer.write_table(table)
> print(f"pa.total_allocated_bytes(): 
> {pa.total_allocated_bytes() / MB} mb")
> ...
> {code}
> I observed the same memory profile. I'm wondering if it is caused by 
> [WriteRecordBatch 
> |https://github.com/apache/arrow/blob/maint-0.15.x/cpp/src/arrow/ipc/writer.cc#L594]
>  not being able to release memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11215) [CI] Use named volumes by default for caching in docker-compose

2021-01-11 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-11215:
---

 Summary: [CI] Use named volumes by default for caching in 
docker-compose
 Key: ARROW-11215
 URL: https://issues.apache.org/jira/browse/ARROW-11215
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 3.0.0


Advantages over bind mounts:
- potentially better performance for Docker on macOS and Windows
- won't contaminate the local environment with files written from within the 
container as the root user

We still need to keep the bind mounts around for GitHub Actions in order to use 
the cache plugin.
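For illustration, a hypothetical docker-compose fragment contrasting the two mount styles (the service, image, and volume names here are invented, not taken from Arrow's actual compose file):

```yaml
services:
  build:
    image: example/arrow-build        # hypothetical image name
    volumes:
      # Named volume: managed by Docker, reused across runs, and files
      # written as root inside the container never land in the host checkout.
      - ccache:/ccache
      # Bind mount: still required where a CI cache plugin has to see
      # the files on the host filesystem (e.g. GitHub Actions).
      - ./arrow:/arrow

volumes:
  ccache:
```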





[jira] [Resolved] (ARROW-11188) [Rust] Implement crypto functions from PostgreSQL dialect

2021-01-11 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11188.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 9139
[https://github.com/apache/arrow/pull/9139]

> [Rust] Implement crypto functions from PostgreSQL dialect
> -
>
> Key: ARROW-11188
> URL: https://issues.apache.org/jira/browse/ARROW-11188
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Patsura Dmitry
>Assignee: Patsura Dmitry
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Hello!
>  
>  * md5
>  * sha224
>  * sha256
>  * sha384
>  * sha512
>  
> Thanks





[jira] [Updated] (ARROW-11212) [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11212:
---
Labels: pull-request-available  (was: )

> [Packaging][Python] Use vcpkg as dependency source for manylinux and windows 
> wheels
> ---
>
> Key: ARROW-11212
> URL: https://issues.apache.org/jira/browse/ARROW-11212
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> So we can pin an explicit version of the environment, preventing any issues 
> arising from dependency sources (like conda) that drift over time.
> We can also enforce static linking by installing only the static libraries.





[jira] [Updated] (ARROW-10834) [R] Fix print method for SubTreeFileSystem

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-10834:
---
Labels: pull-request-available  (was: )

> [R] Fix print method for SubTreeFileSystem
> --
>
> Key: ARROW-10834
> URL: https://issues.apache.org/jira/browse/ARROW-10834
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 
> Running as a container in AWS fargate.
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> arrow::arrow_with_s3(){code}
>  returns TRUE.
> {code:java}
> arrow::s3_bucket("", role_arn=""){code}
>  causes the error:
>  
>  
> {code:java}
> ERROR while rich displaying an object: Error in get(name, x$base_fs): object 
> '.class_title' not found
> Traceback:
> 1. FUN(X[[i]], ...)
> 2. tryCatch(withCallingHandlers({
>  . if (!mime %in% names(repr::mime2repr)) 
>  . stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
>  . rpr <- repr::mime2repr[[mime]](obj)
>  . if (is.null(rpr)) 
>  . return(NULL)
>  . prepare_content(is.raw(rpr), rpr)
>  . }, error = error_handler), error = outer_handler)
> 3. tryCatchList(expr, classes, parentenv, handlers)
> 4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5. doTryCatch(return(expr), name, parentenv, handler)
> 6. withCallingHandlers({
>  . if (!mime %in% names(repr::mime2repr)) 
>  . stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
>  . rpr <- repr::mime2repr[[mime]](obj)
>  . if (is.null(rpr)) 
>  . return(NULL)
>  . prepare_content(is.raw(rpr), rpr)
>  . }, error = error_handler)
> 7. repr::mime2repr[[mime]](obj)
> 8. repr_text.default(obj)
> 9. paste(capture.output(print(obj)), collapse = "\n")
> 10. capture.output(print(obj))
> 11. evalVis(expr)
> 12. withVisible(eval(expr, pf))
> 13. eval(expr, pf)
> 14. eval(expr, pf)
> 15. print(obj)
> 16. print.R6(obj)
> 17. .subset2(x, "print")(...)
> 18. self$.class_title
> 19. `$.SubTreeFileSystem`(self, .class_title)
> 20. get(name, x$base_fs)
>  
> {code}
>  
> {code:java}
> SessionInfo(){code}
>  
> {code:java}
> R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 LTS
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C 
>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 
>  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C 
>  [9] LC_ADDRESS=C LC_TELEPHONE=C 
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] dtplyr_1.0.1 dplyr_1.0.2
> loaded via a namespace (and not attached):
>  [1] magrittr_2.0.1 tidyselect_1.1.0 bit_4.0.4 uuid_0.1-4 
>  [5] R6_2.5.0 rlang_0.4.9 tools_4.0.3 data.table_1.13.2
>  [9] arrow_2.0.0 htmltools_0.5.0 ellipsis_0.3.1 bit64_4.0.5 
> [13] digest_0.6.27 assertthat_0.2.1 tibble_3.0.4 lifecycle_0.2.0 
> [17] crayon_1.3.4 IRdisplay_0.7.0 purrr_0.3.4 repr_1.1.0 
> [21] base64enc_0.1-3 vctrs_0.3.5 IRkernel_1.1.1 glue_1.4.2 
> [25] evaluate_0.14 pbdZMQ_0.3-3.1 compiler_4.0.3 pillar_1.4.7 
> [29] generics_0.1.0 jsonlite_1.7.1 pkgconfig_2.0.3
> {code}
>  
>  
> I had to use the work-around documented here: 
> https://issues.apache.org/jira/browse/ARROW-10371?jql=project%20%3D%20ARROW%20AND%20text%20~%20%22libcurl4-openssl-dev%22
>  (Download cmake 3.19.1, build it, and set CMAKE=.) to 
> install arrow.
>  
> I'm sorry I don't have more ideas about the error. Without reading the code 
> I'm not even sure what's going on in this part of the code.





[jira] [Created] (ARROW-11213) [Packaging][Python] Dockerize wheel building on windows

2021-01-11 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-11213:
---

 Summary: [Packaging][Python] Dockerize wheel building on windows
 Key: ARROW-11213
 URL: https://issues.apache.org/jira/browse/ARROW-11213
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Packaging, Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 3.0.0


So that we have clearly reproducible builds for Windows wheels that are easily 
portable between CI services.





[jira] [Created] (ARROW-11214) [Rust] [DataFusion] Add optional rust features for functions in library to keep dependencies down

2021-01-11 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11214:
---

 Summary: [Rust] [DataFusion] Add optional rust features for 
functions in library to keep dependencies down
 Key: ARROW-11214
 URL: https://issues.apache.org/jira/browse/ARROW-11214
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb


As we expand the number of functions available in DataFusion, DataFusion will 
likely pick up additional third-party dependencies. In general, I think it 
would be a nice feature for DataFusion to allow users more fine-grained 
control over the features they want to use (and the correspondingly higher 
compilation/link time and binary size they pay for). At the moment, with a 
single codebase and no feature flags, everyone's compile time and binary size 
will increase even if they don't use a specific set of features.

It seems to me like we might want to start offering a way to keep the number of 
required dependencies of DataFusion down. For example, in the case of 
https://github.com/apache/arrow/pull/9139, we could potentially put the 
crypto functions behind a feature flag. Users of DataFusion could then pick a 
subset of features like "core", "func-datetime", and "func-crypto" to have 
more control over the dependencies they pull in.
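As a rough sketch of what such gating could look like in Cargo (the feature names, crate choices, and versions below are illustrative guesses, not DataFusion's actual manifest):

```toml
# Hypothetical feature gates: heavy dependencies become optional and are
# only compiled when the corresponding feature is enabled.
[features]
default = ["core"]
core = []
func-datetime = ["chrono"]
func-crypto = ["md-5", "sha2"]

[dependencies]
chrono = { version = "0.4", optional = true }
md-5 = { version = "0.9", optional = true }
sha2 = { version = "0.9", optional = true }
```

The corresponding Rust code would then sit behind {{#[cfg(feature = "func-crypto")]}}, so users who enable only "core" skip those crates' compile time and binary size entirely.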







[jira] [Updated] (ARROW-11212) [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels

2021-01-11 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-11212:

Description: 
So we can pin an explicit version of the environment, preventing any issues 
arising from dependency sources (like conda) that drift over time.

We can also enforce static linking by installing only the static libraries.

  was:So we can pin an explicit 


> [Packaging][Python] Use vcpkg as dependency source for manylinux and windows 
> wheels
> ---
>
> Key: ARROW-11212
> URL: https://issues.apache.org/jira/browse/ARROW-11212
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 3.0.0
>
>
> So we can pin an explicit version of the environment, preventing any issues 
> arising from dependency sources (like conda) that drift over time.
> We can also enforce static linking by installing only the static libraries.





[jira] [Updated] (ARROW-11212) [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels

2021-01-11 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-11212:

Description: So we can pin an explicit 

> [Packaging][Python] Use vcpkg as dependency source for manylinux and windows 
> wheels
> ---
>
> Key: ARROW-11212
> URL: https://issues.apache.org/jira/browse/ARROW-11212
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 3.0.0
>
>
> So we can pin an explicit 





[jira] [Assigned] (ARROW-11212) [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels

2021-01-11 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-11212:
---

Assignee: Krisztian Szucs

> [Packaging][Python] Use vcpkg as dependency source for manylinux and windows 
> wheels
> ---
>
> Key: ARROW-11212
> URL: https://issues.apache.org/jira/browse/ARROW-11212
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 3.0.0
>
>
> So we can pin an explicit 





[jira] [Created] (ARROW-11212) [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels

2021-01-11 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-11212:
---

 Summary: [Packaging][Python] Use vcpkg as dependency source for 
manylinux and windows wheels
 Key: ARROW-11212
 URL: https://issues.apache.org/jira/browse/ARROW-11212
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Krisztian Szucs
 Fix For: 3.0.0








[jira] [Assigned] (ARROW-11176) [R] Expose memory pool name and document setting it

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-11176:
---

Assignee: Neal Richardson  (was: Jonathan Keane)

> [R] Expose memory pool name and document setting it
> ---
>
> Key: ARROW-11176
> URL: https://issues.apache.org/jira/browse/ARROW-11176
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 4.0.0
>
>
> Followup to ARROW-11009, which did this in C++ and added the binding in 
> Python. This could be useful not only for debugging but also for benchmarking.





[jira] [Updated] (ARROW-11211) [R] ChunkedArray$create assumes all chunks are the same type

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-11211:

Description: 
It detects the type from the first chunk and uses it for all chunks. Normally 
this works ok, but it can lead to unexpected behavior, such as:

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}

returns:
{{Error: Invalid: Value is too large to fit in C integer type}}

There are a few things that might fix/change this: 
* improved error message
* chunked arrays not assuming the first chunk's types can be cast safely to all 
others

Note that in this case, specifying the type to int64() does "work" with an 
overflowed NaN value (-9223372036854775808)

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}



  was:
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}

returns:
{{Error: Invalid: Value is too large to fit in C integer type}}

There are a few things that might fix/change this: 
* improved error message
* chunked arrays not assuming the first chunk's types can be cast safely to all 
others

Note that specifying the type to int64() does work with an overflowed NaN value 
(-9223372036854775808)

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}




> [R] ChunkedArray$create assumes all chunks are the same type
> 
>
> Key: ARROW-11211
> URL: https://issues.apache.org/jira/browse/ARROW-11211
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Minor
>
> It detects the type from the first chunk and uses it for all chunks. Normally 
> this works ok, but it can lead to unexpected behavior, such as:
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data)
> {code}
> returns:
> {{Error: Invalid: Value is too large to fit in C integer type}}
> There are a few things that might fix/change this: 
> * improved error message
> * chunked arrays not assuming the first chunk's types can be cast safely to 
> all others
> Note that in this case, specifying the type to int64() does "work" with an 
> overflowed NaN value (-9223372036854775808)
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data, type = int64())
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11211) [R] ChunkedArray$create assumes all chunks are the same type

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-11211:

Issue Type: Bug  (was: New Feature)

> [R] ChunkedArray$create assumes all chunks are the same type
> 
>
> Key: ARROW-11211
> URL: https://issues.apache.org/jira/browse/ARROW-11211
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Minor
>
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data)
> {code}
> returns:
> {{Error: Invalid: Value is too large to fit in C integer type}}
> There are a few things that might fix/change this: 
> * improved error message
> * chunked arrays not assuming the first chunk's types can be cast safely to 
> all others
> Note that specifying the type to int64() does work with an overflowed NaN 
> value (-9223372036854775808)
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data, type = int64())
> {code}





[jira] [Updated] (ARROW-11211) [R] ChunkedArray$create assumes all chunks are the same type

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-11211:

Summary: [R] ChunkedArray$create assumes all chunks are the same type  
(was: [R] Chunked arrays with integers + NaNs error oddly )

> [R] ChunkedArray$create assumes all chunks are the same type
> 
>
> Key: ARROW-11211
> URL: https://issues.apache.org/jira/browse/ARROW-11211
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Jonathan Keane
>Priority: Minor
>
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data)
> {code}
> returns:
> {{Error: Invalid: Value is too large to fit in C integer type}}
> There are a few things that might fix/change this: 
> * improved error message
> * chunked arrays not assuming the first chunk's types can be cast safely to 
> all others
> Note that specifying the type to int64() does work with an overflowed NaN 
> value (-9223372036854775808)
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data, type = int64())
> {code}





[jira] [Updated] (ARROW-10834) [R] Fix print methods for SubTreeFileSystem

2021-01-11 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-10834:
-
Summary: [R] Fix print methods for SubTreeFileSystem  (was: [R] Print 
method fails for SubTreeFileSystem)

> [R] Fix print methods for SubTreeFileSystem
> ---
>
> Key: ARROW-10834
> URL: https://issues.apache.org/jira/browse/ARROW-10834
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 
> Running as a container in AWS fargate.
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
> Fix For: 3.0.0
>
>
>  





[jira] [Updated] (ARROW-10834) [R] Fix print method for SubTreeFileSystem

2021-01-11 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-10834:
-
Summary: [R] Fix print method for SubTreeFileSystem  (was: [R] Fix print 
methods for SubTreeFileSystem)

> [R] Fix print method for SubTreeFileSystem
> --
>
> Key: ARROW-10834
> URL: https://issues.apache.org/jira/browse/ARROW-10834
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.3 
> Running as a container in AWS fargate.
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
> Fix For: 3.0.0
>
>
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11136) [R] Bindings for is.nan

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11136:
---
Labels: pull-request-available  (was: )

> [R] Bindings for is.nan
> ---
>
> Key: ARROW-11136
> URL: https://issues.apache.org/jira/browse/ARROW-11136
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Jonathan Keane
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-11043 added this compute kernel in C++, so we should wire it up in R





[jira] [Updated] (ARROW-9634) [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-9634:
-
Fix Version/s: (was: 3.0.0)
   4.0.0

> [C++][Python] Restore non-UTC time zones when reading Parquet file that was 
> previously Arrow
> 
>
> Key: ARROW-9634
> URL: https://issues.apache.org/jira/browse/ARROW-9634
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 4.0.0
>
>
> This was reported on the mailing list
> {code}
> In [20]: df = pd.DataFrame({'a': pd.Series(np.arange(0, 1, 
> 1000)).astype(pd.DatetimeTZDtype('ns', 'America/Los_Angeles'
> ...: ))}) 
>  
> In [21]: t = pa.table(df) 
>  
> In [22]: t
>  
> Out[22]: 
> pyarrow.Table
> a: timestamp[ns, tz=America/Los_Angeles]
> In [23]: pq.write_table(t, 'test.parquet')
>  
> In [24]: pq.read_table('test.parquet')
>  
> Out[24]: 
> pyarrow.Table
> a: timestamp[us, tz=UTC]
> In [25]: pq.read_table('test.parquet')[0] 
>  
> Out[25]: 
> 
> [
>   [
> 1970-01-01 00:00:00.00,
> 1970-01-01 00:00:00.01,
> 1970-01-01 00:00:00.02,
> 1970-01-01 00:00:00.03,
> 1970-01-01 00:00:00.04,
> 1970-01-01 00:00:00.05,
> 1970-01-01 00:00:00.06,
> 1970-01-01 00:00:00.07,
> 1970-01-01 00:00:00.08,
> 1970-01-01 00:00:00.09
>   ]
> ]
> {code}





[jira] [Created] (ARROW-11211) [R] Chunked arrays with integers + NaNs error oddly

2021-01-11 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-11211:
--

 Summary: [R] Chunked arrays with integers + NaNs error oddly 
 Key: ARROW-11211
 URL: https://issues.apache.org/jira/browse/ARROW-11211
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Jonathan Keane


{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}

returns:
{{Error: Invalid: Value is too large to fit in C integer type}}

There are a few things that might fix/change this: 
* improved error message
* chunked arrays not assuming that every later chunk can be safely cast to the first chunk's type

Note that explicitly specifying the type as int64() does work, though the NaN comes through as the overflowed value -9223372036854775808:

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}







[jira] [Commented] (ARROW-10472) [C++][Python] casting a scalar timestamp to date32 results in Aborted (core dump)

2021-01-11 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262915#comment-17262915
 ] 

Joris Van den Bossche commented on ARROW-10472:
---

This seems to have been fixed in the meantime. It would be good to add a test for 
this so we can close the issue. 

> [C++][Python] casting a scalar timestamp to date32 results in Aborted (core 
> dump)
> -
>
> Key: ARROW-10472
> URL: https://issues.apache.org/jira/browse/ARROW-10472
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04
>Reporter: Taras Kuzyo
>Priority: Minor
> Fix For: 3.0.0
>
>
> Consider the following example: I have an array of timestamp[s]
> {code:java}
> >>> import pyarrow.compute as pc
> >>> import pyarrow as pa
> >>> arr = pc.strptime(['2020-11-01', '2020-11-02', '2020-11-03'], 
> >>> format='%Y-%m-%d', unit='s')
> >>> arr
> 
> [
>  2020-11-01 00:00:00,
>  2020-11-02 00:00:00,
>  2020-11-03 00:00:00
> ]{code}
> I am able to cast the array to date32:
> {code:java}
> >>> pc.cast(arr, pa.date32())
> 
> [
>  2020-11-01,
>  2020-11-02,
>  2020-11-03
> ]{code}
> but when I try to do the same with a scalar I get core dumped failure
> {code:java}
> >>> arr[0]
> 
> >>> pc.cast(arr[0], pa.date32())
> terminate called after throwing an instance of 'mpark::bad_variant_access'
>  what(): bad_variant_access
> Aborted (core dumped) 
> {code}
>  Below is a stack trace from gdb
> {code:java}
> $ gdb /usr/bin/python3
> GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/bin/python3...(no debugging symbols found)...done.
> (gdb) run sample.py 
> Starting program: /usr/bin/python3 sample.py
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x739ff700 (LWP 4314)]
> [New Thread 0x7fffebfff700 (LWP 4315)]
> [New Thread 0x7fffeb7fe700 (LWP 4316)]
> [New Thread 0x7fffe8ffd700 (LWP 4317)]
> [New Thread 0x7fffe47fc700 (LWP 4318)]
> [New Thread 0x7fffe1ffb700 (LWP 4319)]
> [New Thread 0x7fffdf7fa700 (LWP 4320)]
> [New Thread 0x7fffdcff9700 (LWP 4321)]
> [New Thread 0x7fffda7f8700 (LWP 4322)]
> [New Thread 0x7fffd7ff7700 (LWP 4323)]
> [New Thread 0x7fffd57f6700 (LWP 4324)]
> [New Thread 0x7fffd2ff5700 (LWP 4325)]
> [Thread 0x7fffd2ff5700 (LWP 4325) exited]
> [Thread 0x7fffd57f6700 (LWP 4324) exited]
> [Thread 0x7fffd7ff7700 (LWP 4323) exited]
> [Thread 0x7fffda7f8700 (LWP 4322) exited]
> [Thread 0x7fffdcff9700 (LWP 4321) exited]
> [Thread 0x7fffdf7fa700 (LWP 4320) exited]
> [Thread 0x7fffe1ffb700 (LWP 4319) exited]
> [Thread 0x7fffe47fc700 (LWP 4318) exited]
> [Thread 0x7fffe8ffd700 (LWP 4317) exited]
> [Thread 0x7fffeb7fe700 (LWP 4316) exited]
> [Thread 0x7fffebfff700 (LWP 4315) exited]
> terminate called after throwing an instance of 'mpark::bad_variant_access'
>   what():  bad_variant_access
> Thread 1 "python3" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) backtrace 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x77a248b1 in __GI_abort () at abort.c:79
> #2  0x7477d957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x74783ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x74783b21 in std::terminate() () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x74783d54 in __cxa_throw () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x75104012 in mpark::throw_bad_variant_access() ()
>    from /usr/local/lib/python3.6/dist-packages/pyarrow/libarrow.so.200
> #7  0x751e1f51 in 
> arrow::compute::internal::CastFunctor arrow::TimestampType, void>::Exec(arrow::compute::KernelContext*, 
> arrow::compute::ExecBatch const&, arrow::Datum*) ()
>    from /usr/local/lib/python3.6/dist-packages/pyarrow/libarrow.so.200
> #8  0x752ab5ab 

[jira] [Updated] (ARROW-4575) [Python] Add Python Flight implementation to integration testing

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-4575:
-
Fix Version/s: (was: 3.0.0)
   4.0.0

> [Python] Add Python Flight implementation to integration testing
> 
>
> Key: ARROW-4575
> URL: https://issues.apache.org/jira/browse/ARROW-4575
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Integration, Python
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: flight
> Fix For: 4.0.0
>
>






[jira] [Assigned] (ARROW-10283) [Python] Python deprecation warning for "PY_SSIZE_T_CLEAN will be required for '#' formats"

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche reassigned ARROW-10283:
-

Assignee: Joris Van den Bossche

> [Python] Python deprecation warning for "PY_SSIZE_T_CLEAN will be required 
> for '#' formats"
> ---
>
> Key: ARROW-10283
> URL: https://issues.apache.org/jira/browse/ARROW-10283
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have a few cases that run into this python deprecation warning:
> {code}
> pyarrow/tests/test_pandas.py: 9 warnings
> pyarrow/tests/test_parquet.py: 7790 warnings
>   sys:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_decimal_with_None_explicit_type
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_decimal_with_None_infer_type
>   /buildbot/AMD64_Conda_Python_3_8/python/pyarrow/tests/test_pandas.py:114: 
> DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> result = pd.Series(arr.to_pandas(), name=s.name)
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_strided_objects
>   /buildbot/AMD64_Conda_Python_3_8/python/pyarrow/pandas_compat.py:1127: 
> DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> result = pa.lib.table_to_blocks(options, block_table, categories,
> {code}
> Related to https://bugs.python.org/issue36381
> I think one such usage example is at 
> https://github.com/apache/arrow/blob/0b481523b7502a984788d93b822a335894ffe648/cpp/src/arrow/python/decimal.cc#L106
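The real fix is on the C-API side (defining PY_SSIZE_T_CLEAN before including Python.h and using Py_ssize_t lengths with the '#' formats). As a guard against regressions, a test suite can escalate the warning to an error; this is a generic sketch, not Arrow's actual test configuration:

```python
import warnings

def assert_no_deprecation(fn, *args, **kwargs):
    """Run fn and raise if it emits any DeprecationWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return fn(*args, **kwargs)

# A quiet call passes straight through:
print(assert_no_deprecation(sorted, [3, 1, 2]))
```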





[jira] [Updated] (ARROW-10283) [Python] Python deprecation warning for "PY_SSIZE_T_CLEAN will be required for '#' formats"

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-10283:
---
Labels: pull-request-available  (was: )

> [Python] Python deprecation warning for "PY_SSIZE_T_CLEAN will be required 
> for '#' formats"
> ---
>
> Key: ARROW-10283
> URL: https://issues.apache.org/jira/browse/ARROW-10283
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have a few cases that run into this python deprecation warning:
> {code}
> pyarrow/tests/test_pandas.py: 9 warnings
> pyarrow/tests/test_parquet.py: 7790 warnings
>   sys:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_decimal_with_None_explicit_type
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_decimal_with_None_infer_type
>   /buildbot/AMD64_Conda_Python_3_8/python/pyarrow/tests/test_pandas.py:114: 
> DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> result = pd.Series(arr.to_pandas(), name=s.name)
> pyarrow/tests/test_pandas.py::TestConvertDecimalTypes::test_strided_objects
>   /buildbot/AMD64_Conda_Python_3_8/python/pyarrow/pandas_compat.py:1127: 
> DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
> result = pa.lib.table_to_blocks(options, block_table, categories,
> {code}
> Related to https://bugs.python.org/issue36381
> I think one such usage example is at 
> https://github.com/apache/arrow/blob/0b481523b7502a984788d93b822a335894ffe648/cpp/src/arrow/python/decimal.cc#L106





[jira] [Updated] (ARROW-11210) [CI] Restore workflows that had been blocked by INFRA

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11210:
---
Labels: pull-request-available  (was: )

> [CI] Restore workflows that had been blocked by INFRA
> -
>
> Key: ARROW-11210
> URL: https://issues.apache.org/jira/browse/ARROW-11210
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See INFRA-21239, ARROW-11092, ARROW-11132





[jira] [Created] (ARROW-11210) [CI] Restore workflows that had been blocked by INFRA

2021-01-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11210:
---

 Summary: [CI] Restore workflows that had been blocked by INFRA
 Key: ARROW-11210
 URL: https://issues.apache.org/jira/browse/ARROW-11210
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 3.0.0


See INFRA-21239, ARROW-11092, ARROW-11132





[jira] [Updated] (ARROW-11189) [Developer] Archery benchmark diff cannot compare two jsons

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-11189:
-
Priority: Minor  (was: Major)

> [Developer] Archery benchmark diff cannot compare two jsons
> --
>
> Key: ARROW-11189
> URL: https://issues.apache.org/jira/browse/ARROW-11189
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: archery, pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-11153) [C++][Packaging] Move debian/ubuntu/centos packaging off of Travis-CI

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-11153:
-
Fix Version/s: (was: 3.0.0)
   4.0.0

> [C++][Packaging] Move debian/ubuntu/centos packaging off of Travis-CI
> -
>
> Key: ARROW-11153
> URL: https://issues.apache.org/jira/browse/ARROW-11153
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Packaging
>Reporter: Neal Richardson
>Assignee: Kouhei Sutou
>Priority: Blocker
> Fix For: 4.0.0
>
>
> Per mailing list discussion





[jira] [Resolved] (ARROW-11205) [GLib][Dataset] Add GADFileFormat and its family

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-11205.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 9158
[https://github.com/apache/arrow/pull/9158]

> [GLib][Dataset] Add GADFileFormat and its family
> 
>
> Key: ARROW-11205
> URL: https://issues.apache.org/jira/browse/ARROW-11205
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-11203) [Developer][Website] Enable JIRA and pull request integration

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-11203.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 88
[https://github.com/apache/arrow-site/pull/88]

> [Developer][Website] Enable JIRA and pull request integration
> -
>
> Key: ARROW-11203
> URL: https://issues.apache.org/jira/browse/ARROW-11203
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Website
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-10784) [Python] Loading pyarrow.compute isn't thread safe

2021-01-11 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262898#comment-17262898
 ] 

Joris Van den Bossche commented on ARROW-10784:
---

[~emkornfield] were you able to reproduce it again? 
(If not, let's move it to the next milestone.)

> [Python] Loading pyarrow.compute isn't thread safe
> --
>
> Key: ARROW-10784
> URL: https://issues.apache.org/jira/browse/ARROW-10784
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 2.0.0
>Reporter: Micah Kornfield
>Priority: Major
> Fix For: 3.0.0
>
>
> When using Arrow in a multithreaded environment it is possible to trigger an 
> initialization race on the pyarrow.compute module when calling Array.flatten.
>  
> Flatten calls _pc(), which imports pyarrow.compute, but if two threads call 
> flatten at the same time it is possible that the global initialization of 
> functions from the registry will be incomplete, causing an AttributeError.
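A standard remedy for this kind of race is double-checked locking around the first import. The sketch below uses only the standard library and is illustrative, not the actual pyarrow fix (`lazy_import` and its cache are hypothetical names):

```python
import importlib
import threading

_lock = threading.Lock()
_cache = {}

def lazy_import(name):
    """Thread-safe lazy import: exactly one thread performs the first
    import, so no caller can observe a half-initialized module."""
    mod = _cache.get(name)
    if mod is None:
        with _lock:
            mod = _cache.get(name)
            if mod is None:
                mod = importlib.import_module(name)
                _cache[name] = mod
    return mod

print(lazy_import("json").dumps({"ok": True}))
```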





[jira] [Updated] (ARROW-9997) [Python] StructScalar.as_py() fails if the type has duplicate field names

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-9997:
-
Fix Version/s: (was: 3.0.0)
   4.0.0

> [Python] StructScalar.as_py() fails if the type has duplicate field names
> -
>
> Key: ARROW-9997
> URL: https://issues.apache.org/jira/browse/ARROW-9997
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 4.0.0
>
>
> {{StructScalar}} currently extends an abstract Mapping interface. Since the 
> type allows duplicate field names, we cannot provide that API.





[jira] [Updated] (ARROW-10469) [CI][Python] Run dask integration tests on Windows

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-10469:
--
Fix Version/s: (was: 3.0.0)
   4.0.0

> [CI][Python] Run dask integration tests on Windows
> --
>
> Key: ARROW-10469
> URL: https://issues.apache.org/jira/browse/ARROW-10469
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 4.0.0
>
>
> So we can catch bugs like ARROW-10462 in advance





[jira] [Commented] (ARROW-10469) [CI][Python] Run dask integration tests on Windows

2021-01-11 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262896#comment-17262896
 ] 

Joris Van den Bossche commented on ARROW-10469:
---

This won't happen for 3.0.0 (so I'm moving the milestone), but I will make sure 
to open a PR asking them to trigger their CI builds (including a Windows build) 
against our nightly packages.

> [CI][Python] Run dask integration tests on Windows
> --
>
> Key: ARROW-10469
> URL: https://issues.apache.org/jira/browse/ARROW-10469
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 3.0.0
>
>
> So we can catch bugs like ARROW-10462 in advance





[jira] [Resolved] (ARROW-10623) [R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-10623.
-
Resolution: Fixed

Issue resolved by pull request 9118
[https://github.com/apache/arrow/pull/9118]

> [R] Version 1.0.1 breaks data.frame attributes when reading file written by 
> 2.0.0
> -
>
> Key: ARROW-10623
> URL: https://issues.apache.org/jira/browse/ARROW-10623
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.0.1, 2.0.0
>Reporter: Fleur Kelpin
>Assignee: Jonathan Keane
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.1, 3.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> h4. How to reproduce
>  * Create a data frame:
> {noformat}
> df <- data.frame(col1 = 1:100){noformat}
>  * Write it to a parquet file using arrow 2.0.0. The demo uses R 3.6, but the 
> same happens if you use R 4.0
>  * Read the parquet file using arrow 1.0.1. I only tried that in R 3.6
> h4. Expected
> The data frame is the same as it was before:
> {noformat}
> structure(list(col1 = 1:100), row.names = c(NA, 100L), class = 
> "data.frame"){noformat}
> h4. Actual
> The data frame has lost some information:
> {noformat}
> structure(list(1:100), class = "data.frame"){noformat}
> h4. Demo
> I'm not sure what the easiest way is to put up a demo project for this, since 
> you need to switch between arrow installations. But I've created this docker 
> based demo:
> [https://github.com/fdlk/arrow2/]





[jira] [Resolved] (ARROW-10803) [R] Support R >= 3.3 and add CI

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-10803.
-
Resolution: Fixed

Issue resolved by pull request 8833
[https://github.com/apache/arrow/pull/8833]

> [R] Support R >= 3.3 and add CI
> ---
>
> Key: ARROW-10803
> URL: https://issues.apache.org/jira/browse/ARROW-10803
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: AWS EMR 5.22 (also tried EMR 5.31 and EMR 6.1)
> Amazon Linux AMI 2018.03
> R version 3.4.1 
>Reporter: Doan Le
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> R install on clean AWS EMR fails.  I've tried multiple methods:
>  
> From within R
> {{> source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")}}
> {{> install_arrow()}}
> {{And also via: }}
> sudo R -e "install.packages('arrow', repos 
> ='https://cloud.r-project.org',dependencies = TRUE)"
>  
> All dependencies seem to install ok but the arrow R package itself fails:
>  
> ++ -m64 -std=gnu++11 -I/usr/include/R -DNDEBUG 
> -I/mnt/tmp/RtmpVCRQK4/R.INSTALL591557cd8b23/arrow/libarrow/arrow-2.0.0/include
>  -DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/cpp11/include" 
> -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic 
> -c array_from_vector.cpp -o array_from_vector.o
> array_from_vector.cpp: In instantiation of 'std::shared_ptr 
> arrow::r::MakeSimpleArray(SEXP) [with int RTYPE = 13; RVector = 
> cpp11::r_vector; Type = arrow::Int32Type; SEXP = SEXPREC*]':
> array_from_vector.cpp:1406:65: required from here
> *array_from_vector.cpp:1361:59: error: 'DATAPTR' was not declared in this 
> scope*
>  *auto p_vec_start = reinterpret_cast(DATAPTR(vec));*
>  *~~~^*
> *array_from_vector.cpp:1380:10: error: unable to deduce 'auto' from 
> 'first_na'*
>  *auto p_vec = first_na;*
>  ^
> array_from_vector.cpp: In instantiation of 'std::shared_ptr 
> arrow::r::MakeSimpleArray(SEXP) [with int RTYPE = 14; RVector = 
> cpp11::r_vector; Type = arrow::Int64Type; SEXP = SEXPREC*]':
> *array_from_vector.cpp:1408:65: required from here*
> *array_from_vector.cpp:1361:59: error: 'DATAPTR' was not declared in this 
> scope*
>  *auto p_vec_start = reinterpret_cast(DATAPTR(vec));*
>  ~~~^
> array_from_vector.cpp:1380:10: error: unable to deduce 'auto' from 'first_na'
>  auto p_vec = first_na;
>  ^
> array_from_vector.cpp: In instantiation of 'std::shared_ptr 
> arrow::r::MakeSimpleArray(SEXP) [with int RTYPE = 14; RVector = 
> cpp11::r_vector; Type = arrow::DoubleType; SEXP = SEXPREC*]':
> *array_from_vector.cpp:1410:66: required from here*
> *array_from_vector.cpp:1361:59: error: 'DATAPTR' was not declared in this 
> scope*
>  auto p_vec_start = reinterpret_cast(DATAPTR(vec));
>  ~~~^
> *array_from_vector.cpp:1380:10: error: unable to deduce 'auto' from 
> 'first_na'*
>  *auto p_vec = first_na;*
>  ^
> array_from_vector.cpp: In instantiation of 'std::shared_ptr 
> arrow::r::MakeSimpleArray(SEXP) [with int RTYPE = 24; RVector = 
> cpp11::r_vector; Type = arrow::UInt8Type; SEXP = SEXPREC*]':
> array_from_vector.cpp:1412:61: required from here
> *array_from_vector.cpp:1361:59: error: 'DATAPTR' was not declared in this 
> scope*
>  *auto p_vec_start = reinterpret_cast(DATAPTR(vec));*
>  ~~~^
> *array_from_vector.cpp:1380:10: error: unable to deduce 'auto' from 
> 'first_na'*
>  *auto p_vec = first_na;*
>  ^
> make: *** [array_from_vector.o] Error 1
> ERROR: compilation failed for package 'arrow'
> * removing '/usr/lib64/R/library/arrow'
> The downloaded source packages are in
>  '/mnt/tmp/RtmpyLdG80/downloaded_packages'
> Updating HTML index of packages in '.Library'
> Making 'packages.html' ... done
> Warning message:
> In install.packages("arrow", repos = arrow_repos(repos, nightly), :
>  installation of package 'arrow' had non-zero exit status
>  





[jira] [Comment Edited] (ARROW-9019) [Python] hdfs fails to connect to for HDFS 3.x cluster

2021-01-11 Thread Bradley Miro (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262883#comment-17262883
 ] 

Bradley Miro edited comment on ARROW-9019 at 1/11/21, 7:19 PM:
---

Hey [~tgraves],

This ended up working once I did this:
{code:java}
export CLASSPATH=$(hadoop classpath --glob){code}
(assuming the Hadoop binary is in your PATH)


was (Author: bradmiro):
Hey [~tgraves],

This ended up working for me:
{code:java}
export CLASSPATH=$(hadoop classpath --glob){code}
(assuming the Hadoop binary is in your PATH)
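The same workaround can be applied from Python before calling pa.hdfs.connect; a sketch that, as the comment assumes, requires the hadoop binary on PATH (the helper name is hypothetical):

```python
import os
import shutil
import subprocess

def export_hadoop_classpath():
    """Set CLASSPATH from `hadoop classpath --glob` so libhdfs can find the jars."""
    hadoop = shutil.which("hadoop")
    if hadoop is None:
        raise RuntimeError("hadoop binary not found on PATH")
    out = subprocess.check_output([hadoop, "classpath", "--glob"])
    os.environ["CLASSPATH"] = out.decode().strip()
    return os.environ["CLASSPATH"]

# Call this before pa.hdfs.connect(...) so the JNI driver sees the full classpath.
```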

> [Python] hdfs fails to connect to for HDFS 3.x cluster
> --
>
> Key: ARROW-9019
> URL: https://issues.apache.org/jira/browse/ARROW-9019
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Thomas Graves
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: filesystem, hdfs
>
> I'm trying to use the pyarrow hdfs connector with Hadoop 3.1.3 and I get an 
> error that looks like a protobuf or jar mismatch problem with Hadoop. The 
> same code works on a Hadoop 2.9 cluster.
>  
> I'm wondering if there is something special I need to do or if pyarrow 
> doesn't support Hadoop 3.x yet?
> Note I tried with pyarrow 0.15.1, 0.16.0, and 0.17.1.
>  
>     import pyarrow as pa
>     hdfs_kwargs = dict(host="namenodehost",
>                       port=9000,
>                       user="tgraves",
>                       driver='libhdfs',
>                       kerb_ticket=None,
>                       extra_conf=None)
>     fs = pa.hdfs.connect(**hdfs_kwargs)
>     res = fs.exists("/user/tgraves")
>  
> Error that I get on Hadoop 3.x is:
>  
> dfsExists: invokeMethod((Lorg/apache/hadoop/fs/Path;)Z) error:
> ClassCastException: 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
>  cannot be cast to 
> org.apache.hadoop.shaded.com.google.protobuf.Messagejava.lang.ClassCastException:
>  
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
>  cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>         at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>         at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)





[jira] [Commented] (ARROW-9019) [Python] hdfs fails to connect to HDFS 3.x cluster

2021-01-11 Thread Bradley Miro (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262883#comment-17262883
 ] 

Bradley Miro commented on ARROW-9019:
-

Hey [~tgraves],

This ended up working for me:
{code:java}
export CLASSPATH=$(hadoop classpath --glob){code}
(assuming the Hadoop binary is in your PATH)
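For anyone scripting this, the same setup can be done from Python before the first connection. This is only a sketch, assuming the `hadoop` binary is on the PATH; the host/port/user values in the commented lines are just the placeholders from the report above:

```python
import os
import shutil
import subprocess

def hadoop_classpath():
    """Return the expanded Hadoop classpath via `hadoop classpath --glob`,
    or None when no hadoop binary is on the PATH."""
    if shutil.which("hadoop") is None:
        return None
    return subprocess.check_output(
        ["hadoop", "classpath", "--glob"]).decode().strip()

cp = hadoop_classpath()
if cp is not None:
    # libhdfs reads CLASSPATH when the JVM starts, so it must be set
    # before pyarrow first connects.
    os.environ["CLASSPATH"] = cp
    # import pyarrow as pa
    # fs = pa.hdfs.connect(host="namenodehost", port=9000, user="tgraves")
```

Setting the variable inside the process only helps if it happens before the JVM is loaded, which is why the shell `export` before launching Python is the simplest fix.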

> [Python] hdfs fails to connect to HDFS 3.x cluster
> --
>
> Key: ARROW-9019
> URL: https://issues.apache.org/jira/browse/ARROW-9019
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Thomas Graves
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: filesystem, hdfs
>





[jira] [Updated] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-10663:
---
Labels: pull-request-available  (was: )

> [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
> ---
>
> Key: ARROW-10663
> URL: https://issues.apache.org/jira/browse/ARROW-10663
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The C++ docs of {{SetLookupOptions}} has this explanation of the 
> {{skip_nulls}} option:
> {code}
>   /// Whether nulls in `value_set` count for lookup.
>   ///
>   /// If true, any null in `value_set` is ignored and nulls in the input
>   /// produce null (IndexIn) or false (IsIn) values in the output.
>   /// If false, any null in `value_set` is successfully matched in
>   /// the input.
>   bool skip_nulls;
> {code}
> (from 
> https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)
> However, for {{IsIn}} this explanation doesn't seem to hold in practice:
> {code}
> In [16]: arr = pa.array([1, 2, None])
> In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[17]: 
> 
> [
>   true,
>   false,
>   true
> ]
> In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[18]: 
> 
> [
>   true,
>   false,
>   true
> ]
> {code}
> This documentation was added in https://github.com/apache/arrow/pull/7695 
> (ARROW-8989).
> BTW, for "index_in", it works as documented:
> {code}
> In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[19]: 
> 
> [
>   0,
>   null,
>   null
> ]
> In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[20]: 
> 
> [
>   0,
>   null,
>   1
> ]
> {code}
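Reading the docstring literally, the documented semantics can be modeled in plain Python, with `None` standing in for an Arrow null. This sketch expresses what the docs promise, not what the kernel currently does:

```python
def is_in(values, value_set, skip_nulls):
    # Plain-Python model of the documented SetLookupOptions semantics;
    # None plays the role of an Arrow null.
    if skip_nulls:
        # Nulls in value_set are ignored; null inputs yield false.
        value_set = [v for v in value_set if v is not None]
        return [v is not None and v in value_set for v in values]
    # Nulls in value_set successfully match null inputs.
    return [v in value_set for v in values]

print(is_in([1, 2, None], [1, None], skip_nulls=True))   # [True, False, False]
print(is_in([1, 2, None], [1, None], skip_nulls=False))  # [True, False, True]
```

Under this model, `Out[17]` above should have been `[true, false, false]` rather than the `[true, false, true]` actually observed, which is exactly the discrepancy this issue reports.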





[jira] [Updated] (ARROW-11209) DF - Provide better error message on unsupported GROUP BY

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11209:
---
Labels: pull-request-available  (was: )

> DF - Provide better error message on unsupported GROUP BY
> -
>
> Key: ARROW-11209
> URL: https://issues.apache.org/jira/browse/ARROW-11209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Patsura Dmitry
>Assignee: Patsura Dmitry
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello!
>  
> Thanks





[jira] [Created] (ARROW-11209) DF - Provide better error message on unsupported GROUP BY

2021-01-11 Thread Patsura Dmitry (Jira)
Patsura Dmitry created ARROW-11209:
--

 Summary: DF - Provide better error message on unsupported GROUP BY
 Key: ARROW-11209
 URL: https://issues.apache.org/jira/browse/ARROW-11209
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Patsura Dmitry
Assignee: Patsura Dmitry


Hello!

 

Thanks





[jira] [Commented] (ARROW-9019) [Python] hdfs fails to connect to HDFS 3.x cluster

2021-01-11 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262875#comment-17262875
 ] 

Thomas Graves commented on ARROW-9019:
--

Pinging on this again: any information or ideas for working around or fixing this?

> [Python] hdfs fails to connect to HDFS 3.x cluster
> --
>
> Key: ARROW-9019
> URL: https://issues.apache.org/jira/browse/ARROW-9019
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Thomas Graves
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: filesystem, hdfs
>





[jira] [Closed] (ARROW-11207) Can't install pyarrow with python 3.9.1

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou closed ARROW-11207.

Resolution: Duplicate

The next release will support Python 3.9.
You can use a nightly build for now: 
https://arrow.apache.org/docs/python/install.html#installing-nightly-packages

> Can't install pyarrow with python 3.9.1
> ---
>
> Key: ARROW-11207
> URL: https://issues.apache.org/jira/browse/ARROW-11207
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 2.0.0
>Reporter: Oksana Shadura
>Priority: Major
>
> After package manager updates (OS Manjaro), I can't install pyarrow via pip:
>  
>  
> {code:java}
> $python --version 
>  Python 3.9.1
> $pip install pyarrow 
>  Defaulting to user installation because normal site-packages is not writeable
>  Collecting pyarrow
>  Using cached pyarrow-2.0.0.tar.gz (58.9 MB)
>  Installing build dependencies ... error
>  ERROR: Command errored out with exit status 1:
>  command: /usr/bin/python /usr/lib/python3.9/site-packages/pip install 
> --ignore-installed --no-user --prefix /tmp/pip-build-env-m7rsjg8s/overlay 
> --no-warn-script-location --no-binary :none: --only-binary :none: -i 
> https://pypi.org/simple – 'cython >= 0.29' 'numpy==1.14.5; 
> python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' 
> setuptools setuptools_scm wheel
>  cwd: None
>  Complete output (3738 lines):
>  Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
>  Collecting cython>=0.29
>  Using cached Cython-0.29.21-cp39-cp39-manylinux1_x86_64.whl (1.9 MB)
>  Collecting numpy==1.16.0
>  Using cached numpy-1.16.0.zip (5.1 MB)
>  Collecting setuptools
>  Using cached setuptools-51.1.2-py3-none-any.whl (784 kB)
>  Collecting setuptools_scm
>  Using cached setuptools_scm-5.0.1-py2.py3-none-any.whl (28 kB)
>  Collecting wheel
>  Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
>  Building wheels for collected packages: numpy
>  Building wheel for numpy (setup.py): started
>  Building wheel for numpy (setup.py): still running...
>  Building wheel for numpy (setup.py): finished with status 'error'
>  ERROR: Command errored out with exit status 1:
>  command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; 
> sys.argv[0] = '"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"'; 
> __file__='"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"';f=getattr(tokenize, 
> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
> bdist_wheel -d /tmp/pip-wheel-p7wh1sc1
>  cwd: /tmp/pip-install-f4fzbfyb/numpy/
>  Complete output (3222 lines):
>  Running from numpy source directory.
>  /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/misc_util.py:476: 
> SyntaxWarning: "is" with a literal. Did you mean "=="?
>  return is_string(s) and ('*' in s or '?' is s)
>  blas_opt_info:
>  blas_mkl_info:
>  customize UnixCCompiler
>  libraries mkl_rt not found in ['/usr/local/lib64', '/usr/local/lib', 
> '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> blis_info:
>  customize UnixCCompiler
>  libraries blis not found in ['/usr/local/lib64', '/usr/local/lib', 
> '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> openblas_info:
>  customize UnixCCompiler
>  customize UnixCCompiler
>  libraries openblas not found in ['/usr/local/lib64', '/usr/local/lib', 
> '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> atlas_3_10_blas_threads_info:
>  Setting PTATLAS=ATLAS
>  customize UnixCCompiler
>  libraries tatlas not found in ['/usr/local/lib64', '/usr/local/lib', 
> '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> atlas_3_10_blas_info:
>  customize UnixCCompiler
>  libraries satlas not found in ['/usr/local/lib64', '/usr/local/lib', 
> '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> atlas_blas_threads_info:
>  Setting PTATLAS=ATLAS
>  customize UnixCCompiler
>  libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', 
> '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> atlas_blas_info:
>  customize UnixCCompiler
>  libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', 
> '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
>  NOT AVAILABLE
> accelerate_info:
>  NOT AVAILABLE
> /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/system_info.py:625: 
> UserWarning:
>  Atlas (http://math-atlas.sourceforge.net/) libraries not found.
>  Directories to search for the libraries can be specified in the
>  numpy/distutils/site.cfg file (section [atlas]) or by setting
>  the ATLAS environment variable.
>  self.calc_info()
>  blas_info:
>  customize UnixCCompiler
>  customize UnixCCompiler
>  C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g 
> -fwrapv 

[jira] [Commented] (ARROW-10463) [R] Better messaging for currently unsupported CSV options in open_dataset

2021-01-11 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262867#comment-17262867
 ] 

Ian Cook commented on ARROW-10463:
--

[PR #9143|https://github.com/apache/arrow/pull/9143] improves the error message 
that occurs when the user specifies unsupported readr-style options. The 
example above would yield this error message:
{code:r}
 Error: The following option is supported in "read_delim_arrow" functions but 
not yet supported here: "na" 
{code}
The added code matches both full and partial option names and improves the 
handling of several other kinds of unsupported options.
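The matching described above can be sketched generically; the option names below are illustrative, taken from the readr-style signature quoted in this issue rather than from the actual R implementation:

```python
# Hypothetical sketch of matching full and partial (abbreviated) option
# names against the subset that readr_to_csv_parse_options() supports.
SUPPORTED = {"delim", "quote", "escape_double", "escape_backslash",
             "skip_empty_rows"}
KNOWN_READR = SUPPORTED | {"na", "col_names", "col_types", "skip"}

def unsupported_options(user_opts):
    """Return option names that match a known readr option but none of
    the supported ones, so a precise error can be raised for them."""
    bad = []
    for name in user_opts:
        matches = {o for o in KNOWN_READR if o.startswith(name)}
        if matches and not (matches & SUPPORTED):
            bad.append(name)
    return sorted(bad)

print(unsupported_options({"na": "\\N", "delim": ";"}))  # ['na']
```

Flagging the recognized-but-unsupported name, instead of failing with R's generic "unused argument" error, is what turns the example above into an actionable message.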


> [R] Better messaging for currently unsupported CSV options in open_dataset
> --
>
> Key: ARROW-10463
> URL: https://issues.apache.org/jira/browse/ARROW-10463
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
>Reporter: Gabriel Bassett
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> While read_csv_arrow()'s signature matches readr,  the 
> readr_to_csv_parse_options() function (called by way of open_dataset()) only 
> appears to capture a subset of those options:
> (https://github.com/apache/arrow/blob/883eb572bc64430307112895976ba79df10c8c7d/r/R/csv.R#L464)
> {code:java}
> readr_to_csv_parse_options <- function(delim = ",",
>  quote = '"',
>  escape_double = TRUE,
>  escape_backslash = FALSE,
>  skip_empty_rows = TRUE){code}
> I ran into this trying to use a non-standard 'na' value:
>  
> {code:java}
> open_dataset("/path/to/csv/directory/", schema = sch, partitioning=NULL, 
> format="csv", delim=";", na="\\N", escape_backslash=TRUE, 
> escape_double=FALSE`)
> Error in readr_to_csv_parse_options(...) : unused argument (na = "\\N")
> {code}
>  





[jira] [Updated] (ARROW-11204) [C++] Fix build failure with bundled gRPC and Protobuf on non MSVC

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-11204:
-
Fix Version/s: 3.0.0

> [C++] Fix build failure with bundled gRPC and Protobuf on non MSVC
> --
>
> Key: ARROW-11204
> URL: https://issues.apache.org/jira/browse/ARROW-11204
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is caused by ARROW-9400.





[jira] [Updated] (ARROW-11204) [C++] Fix build failure with bundled gRPC and Protobuf on non MSVC

2021-01-11 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-11204:
-
Priority: Blocker  (was: Major)

> [C++] Fix build failure with bundled gRPC and Protobuf on non MSVC
> --
>
> Key: ARROW-11204
> URL: https://issues.apache.org/jira/browse/ARROW-11204
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is caused by ARROW-9400.





[jira] [Updated] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-3919:
-
Fix Version/s: (was: 3.0.0)

> [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize
> -
>
> Key: ARROW-3919
> URL: https://issues.apache.org/jira/browse/ARROW-3919
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available, pyarrow-serialization
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> see https://github.com/modin-project/modin/issues/266





[jira] [Closed] (ARROW-11208) Situs Agen Judi Slot 4d Online 2021

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-11208.
--
Resolution: Invalid

> Situs Agen Judi Slot 4d Online 2021
> ---
>
> Key: ARROW-11208
> URL: https://issues.apache.org/jira/browse/ARROW-11208
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: JS-0.4.1
>Reporter: super mario
>Priority: Major
>  Labels: MNCCASH, Slotonline2021
> Fix For: 2.0.1
>
> Attachments: image-2021-01-11-23-00-01-481.png, 
> image-2021-01-11-23-36-52-954.png
>
>

[jira] [Updated] (ARROW-11175) [R] Small docs fixes

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-11175:

Fix Version/s: 3.0.0

> [R] Small docs fixes
> 
>
> Key: ARROW-11175
> URL: https://issues.apache.org/jira/browse/ARROW-11175
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Jonathan Keane
>Assignee: Jonathan Keane
>Priority: Minor
> Fix For: 3.0.0
>
>
> * shell {{\}} is missing on the first line
> * On https://arrow.apache.org/docs/r/reference/write_to_raw.html, the link to 
> {{write_ipc_stream()}} is broken
> * the source links on pkgdown need a subdir (this might need a pkgdown fix)
> * the example cmake command on README.md needs a linebreak slash on the first 
> line





[jira] [Updated] (ARROW-11208) Situs Agen Judi Slot 4d Online 2021

2021-01-11 Thread super mario (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

super mario updated ARROW-11208:

Description: 

[jira] [Assigned] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-10663:
--

Assignee: Antoine Pitrou

> [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
> ---
>
> Key: ARROW-10663
> URL: https://issues.apache.org/jira/browse/ARROW-10663
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 3.0.0
>
>
> The C++ docs of {{SetLookupOptions}} has this explanation of the 
> {{skip_nulls}} option:
> {code}
>   /// Whether nulls in `value_set` count for lookup.
>   ///
>   /// If true, any null in `value_set` is ignored and nulls in the input
>   /// produce null (IndexIn) or false (IsIn) values in the output.
>   /// If false, any null in `value_set` is successfully matched in
>   /// the input.
>   bool skip_nulls;
> {code}
> (from 
> https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)
> However, for {{IsIn}} this explanation doesn't seem to hold in practice:
> {code}
> In [16]: arr = pa.array([1, 2, None])
> In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[17]: 
> 
> [
>   true,
>   false,
>   true
> ]
> In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[18]: 
> 
> [
>   true,
>   false,
>   true
> ]
> {code}
> This documentation was added in https://github.com/apache/arrow/pull/7695 
> (ARROW-8989).
> BTW, for "index_in", it works as documented:
> {code}
> In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[19]: 
> 
> [
>   0,
>   null,
>   null
> ]
> In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[20]: 
> 
> [
>   0,
>   null,
>   1
> ]
> {code}
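For reference, the documented skip_nulls semantics can be sketched in pure Python (a simulation using None for nulls, not the pyarrow implementation; note that the IsIn result below follows the docstring, which the report above shows the actual kernel does not):

```python
def index_in(values, value_set, skip_nulls):
    """Documented IndexIn semantics: position of each value in value_set."""
    if skip_nulls:
        # Nulls in value_set are ignored; null inputs yield null outputs.
        lookup = {v: i for i, v in enumerate(value_set) if v is not None}
        return [lookup.get(v) if v is not None else None for v in values]
    # Nulls in value_set are matched, so a null input finds the null's index.
    lookup = {v: i for i, v in enumerate(value_set)}
    return [lookup.get(v) for v in values]

def is_in(values, value_set, skip_nulls):
    """Documented IsIn semantics: true exactly where index_in finds a match."""
    return [i is not None for i in index_in(values, value_set, skip_nulls)]

# Matches the index_in outputs quoted above (In [19] and In [20]).
assert index_in([1, 2, None], [1, None], skip_nulls=True) == [0, None, None]
assert index_in([1, 2, None], [1, None], skip_nulls=False) == [0, None, 1]
```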



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9293) [R] Add chunk_size to Table$create()

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-9293:
---
Fix Version/s: (was: 3.0.0)
   4.0.0

> [R] Add chunk_size to Table$create()
> 
>
> Key: ARROW-9293
> URL: https://issues.apache.org/jira/browse/ARROW-9293
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain Francois
>Priority: Major
> Fix For: 4.0.0
>
>
> While working on ARROW-3308, I noticed that write_feather has a chunk_size 
> argument, which by default will write batches of 64k rows into the file. In 
> principle, a chunking strategy like this would prevent the need to bump up to 
> large_utf8 when ingesting a large character vector because you'd end up with 
> many chunks that each fit into a regular utf8 type. However, the way the 
> function works, the data.frame is converted to a Table with all ChunkedArrays 
> containing a single chunk first, which is where the large_utf8 type gets set. 
> But if Table$create() could be instructed to make multiple chunks, this would 
> be resolved.
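The proposed behavior is easy to picture with a quick pure-Python sketch (chunk_column is a hypothetical helper, not part of the arrow package): splitting a column into fixed-size row chunks is what would keep each chunk small enough for the regular utf8 type.

```python
def chunk_column(column, chunk_size=64_000):
    """Split one column into chunks of at most chunk_size rows,
    mirroring write_feather's default 64k-row batching."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]

# 150,000 rows become chunks of 64,000, 64,000, and 22,000 rows.
chunks = chunk_column(list(range(150_000)))
assert [len(c) for c in chunks] == [64_000, 64_000, 22_000]
```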





[jira] [Resolved] (ARROW-10850) [R] Unrecognized compression type: LZ4

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-10850.
-
  Assignee: Jonathan Keane
Resolution: Duplicate

Fix was added in ARROW-11163; ARROW-10623 adds forwards/backwards compatibility 
testing to prevent future issues.

> [R] Unrecognized compression type: LZ4
> --
>
> Key: ARROW-10850
> URL: https://issues.apache.org/jira/browse/ARROW-10850
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: Windows 10, R 3.6.2, RStudio 1.3.1073
>Reporter: Chris Kennedy
>Assignee: Jonathan Keane
>Priority: Major
>  Labels: LZ4
> Fix For: 3.0.0
>
>
> Hello,
> I have recently re-installed Arrow from CRAN in R 3.6.2 and it no longer can 
> import a feather file with LZ4 compression (whereas in previous months this 
> worked fine):
> {code:java}
> > data = suppressWarnings(arrow::read_feather("blah.feather"))
> {code}
> {noformat}
> Error in ipc___feather___Reader__Read(self, columns) : Invalid: Unrecognized 
> compression type: LZ4{noformat}
> I have attempted to install from source but continue to receive this error. 
> According to the documentation though shouldn't the CRAN package also have 
> LZ4 support? Is it possible that the CRAN build has lost LZ4 support? My 
> feather file was created in pandas.
> If I install version 1.0.1 I can import the data correctly:
> {code:java}
> devtools::install_version("arrow", version = "1.0.1"){code}
> Happy to send over any other information that could be helpful, and apologies 
> if I am making some mistake on my end.
> Thanks,
> Chris





[jira] [Created] (ARROW-11208) Situs Agen Judi Slot 4d Online 2021

2021-01-11 Thread super mario (Jira)
super mario created ARROW-11208:
---

 Summary: Situs Agen Judi Slot 4d Online 2021
 Key: ARROW-11208
 URL: https://issues.apache.org/jira/browse/ARROW-11208
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Documentation
Affects Versions: JS-0.4.1
Reporter: super mario
 Fix For: 2.0.1
 Attachments: image-2021-01-11-23-00-01-481.png, 
image-2021-01-11-23-36-52-954.png

h1. Most Complete and Newest Online Slot Gambling Site 2021

MNCCASH is a trusted 2021 online slot gambling site that is easy to win on, 
with deposits via pulsa (phone credit) and, of course, no deductions. Playing 
on the trusted 2021 via-pulsa online slot gambling agent site is comfortable 
and safe because it holds an official license from PAGCOR, Philippines. Slot 
bettors have proven for themselves that MNCCASH is the best online slot site 
in Indonesia.

!image-2021-01-11-23-00-01-481.png|width=455,height=260!

*Alternative Registration Link* : [https://rebrand.ly/DJotformMNCCASH]

[!image-2021-01-11-23-36-52-954.png!|https://rebrand.ly/DaftarMNCCASH]

h2. List of Trusted Online Slot Gambling Sites 2021

As host to dozens of trusted slot agent providers, MNCCASH certainly does not 
want to disappoint its members. It provides a variety of payment methods to 
make it easy for members to deposit and play. Deposits can be made via *Bank 
BCA*, *MANDIRI*, *BNI*, *BRI*, *OVO PAYMENT*, and *Telkomsel pulsa*.

Becoming one of the trusted 2021 *online slot gambling sites* has not been 
easy for MNCCASH, as there have been many obstacles to overcome. Many member 
requests must be granted, such as an official, newest 2021 *joker slot* 
account with pulsa deposits and no deductions. That is not easy for shady 
online slot bookies to deliver, but MNCCASH is different. Joker gaming is a 
pulsa-based online slot game that is hugely popular today.
h2. Official Indonesian Soccer Gambling and Online Casino Slot Game Site

The MNCCASH site is always ready to serve bettors, providing the most complete 
and trusted online slot game platform 24 hours a day. MNCCASH is among the 
trusted 4d online slot gambling sites that accept deposits via pulsa with no 
deductions.

We should also mention MNCCASH Customer Service, on duty 24 hours a day to 
handle any problems bettors run into. Whenever online slot bettors need 
assistance, they will be helped right away by our 24-hour customer service.
h2. Trusted Indonesian 4d Online Slot Gambling Site 2021

Slot games have long been hugely popular with casino players, but no one had 
yet turned them into an online game. A slot machine is very large and tall, 
reaching up to 2 meters, and requires banknotes or coins to play. Entering the 
digital era, the trusted pulsa-deposit online slot bookie site MNCCASH 
recently made a new breakthrough, quickly transforming machine-based slot play 
into the best online slots, sultan play pro, which bettors can play from an 
Android or iOS smartphone. It took very careful planning for MNCCASH, the 
newest and best trusted pulsa-deposit online slot gambling site of 2021, to 
make such a change.
h2. Register a 4d Slot Account with Pulsa Deposit and No Deductions

Bettors no longer need the hassle of going to a casino to play slots. 
MNCCASH's new breakthrough lets bettors experience the sensation of playing on 
an official, trusted via-pulsa-deposit online slot gambling site of 2021. With 
just a smartphone, bettors can register on the newest and best online slot 
gambling site of 2021.
h2. List of Trusted 4d Online Slot Gambling Sites 2021

Every member who has registered with the most complete online slot agent and 
made a deposit can play with every slot game provider available at MNCCASH. 
With just one user ID, members can play hundreds of games on the most complete 
online slot gambling site available. We review some of the trusted 2021 online 
slot providers below, just for you.
h3. HABANERO Online Slots 2021

HABANERO slots have become a very attractive platform because several of their 
games frequently hit jackpots or trigger bonus games. The Habanero platform 
also holds official licenses from 16 countries worldwide as well as official 
certification from BMMtestlabs. The games that hit jackpots most often are Koi 
Gate, Super Twister, Fire Rooster, Fa Cai Shen, and LuckyLucky.

Every month Habanero 2021 also adds new games to keep its players satisfied. 
Habanero 2021 slot games that often pay out big, which members may want to 
try, are 5 Lucky Lions, LuckyLucky, and 

[jira] [Updated] (ARROW-11049) [Python] Expose alternate memory pools

2021-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11049:
---
Labels: pull-request-available  (was: )

> [Python] Expose alternate memory pools
> --
>
> Key: ARROW-11049
> URL: https://issues.apache.org/jira/browse/ARROW-11049
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the default memory pool is exposed in Python but not the explicit 
> memory pool singletons (jemalloc, mimalloc, system).
>  





[jira] [Resolved] (ARROW-11202) [R][CI] Nightly builds not happening (or artifacts not exported)

2021-01-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-11202.
-
Fix Version/s: 3.0.0
 Assignee: Neal Richardson
   Resolution: Fixed

> [R][CI] Nightly builds not happening (or artifacts not exported)
> 
>
> Key: ARROW-11202
> URL: https://issues.apache.org/jira/browse/ARROW-11202
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Laurent
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 3.0.0
>
>
> The instructions to install a nightly build (here: 
> https://arrow.apache.org/docs/r/#installing-a-development-version) lead to a 
> version `2.0.0.20201222`, which does not appear to be a nightly build (in 
> particular, changes from 20201230 are missing).





[jira] [Commented] (ARROW-11202) [R][CI] Nightly builds not happening (or artifacts not exported)

2021-01-11 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262752#comment-17262752
 ] 

Neal Richardson commented on ARROW-11202:
-

Thanks. GitHub "helpfully" disabled the scheduled jobs due to inactivity for 60 
days. I just re-enabled them.

> [R][CI] Nightly builds not happening (or artifacts not exported)
> 
>
> Key: ARROW-11202
> URL: https://issues.apache.org/jira/browse/ARROW-11202
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Laurent
>Priority: Major
>
> The instructions to install a nightly build (here: 
> https://arrow.apache.org/docs/r/#installing-a-development-version) lead to a 
> version `2.0.0.20201222`, which does not appear to be a nightly build (in 
> particular, changes from 20201230 are missing).





[jira] [Updated] (ARROW-11207) Can't install pyarrow with python 3.9.1

2021-01-11 Thread Oksana Shadura (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oksana Shadura updated ARROW-11207:
---
Description: 
After package manager updates (OS Manjaro), I can't install pyarrow via pip:

 

 
{code:java}
$python --version 
 Python 3.9.1
$pip install pyarrow 
 Defaulting to user installation because normal site-packages is not writeable
 Collecting pyarrow
 Using cached pyarrow-2.0.0.tar.gz (58.9 MB)
 Installing build dependencies ... error
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python /usr/lib/python3.9/site-packages/pip install 
--ignore-installed --no-user --prefix /tmp/pip-build-env-m7rsjg8s/overlay 
--no-warn-script-location --no-binary :none: --only-binary :none: -i 
https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; 
python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' 
setuptools setuptools_scm wheel
 cwd: None
 Complete output (3738 lines):
 Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
 Collecting cython>=0.29
 Using cached Cython-0.29.21-cp39-cp39-manylinux1_x86_64.whl (1.9 MB)
 Collecting numpy==1.16.0
 Using cached numpy-1.16.0.zip (5.1 MB)
 Collecting setuptools
 Using cached setuptools-51.1.2-py3-none-any.whl (784 kB)
 Collecting setuptools_scm
 Using cached setuptools_scm-5.0.1-py2.py3-none-any.whl (28 kB)
 Collecting wheel
 Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
 Building wheels for collected packages: numpy
 Building wheel for numpy (setup.py): started
 Building wheel for numpy (setup.py): still running...
 Building wheel for numpy (setup.py): finished with status 'error'
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] 
= '"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"'; 
__file__='"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"';f=getattr(tokenize, 
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
bdist_wheel -d /tmp/pip-wheel-p7wh1sc1
 cwd: /tmp/pip-install-f4fzbfyb/numpy/
 Complete output (3222 lines):
 Running from numpy source directory.
 /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/misc_util.py:476: 
SyntaxWarning: "is" with a literal. Did you mean "=="?
 return is_string(s) and ('*' in s or '?' is s)
 blas_opt_info:
 blas_mkl_info:
 customize UnixCCompiler
 libraries mkl_rt not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
blis_info:
 customize UnixCCompiler
 libraries blis not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
openblas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 libraries openblas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_3_10_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries tatlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_3_10_blas_info:
 customize UnixCCompiler
 libraries satlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_blas_info:
 customize UnixCCompiler
 libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
accelerate_info:
 NOT AVAILABLE
/tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/system_info.py:625: UserWarning:
 Atlas (http://math-atlas.sourceforge.net/) libraries not found.
 Directories to search for the libraries can be specified in the
 numpy/distutils/site.cfg file (section [atlas]) or by setting
 the ATLAS environment variable.
 self.calc_info()
 blas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv 
-O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fPIC
creating /tmp/tmpl0lk1b35/tmp
 creating /tmp/tmpl0lk1b35/tmp/tmpl0lk1b35
 compile options: '-I/usr/local/include -I/usr/include -c'
 gcc: /tmp/tmpl0lk1b35/source.c
 /tmp/tmpl0lk1b35/source.c:1:10: fatal error: cblas.h: No such file or directory
 1 | #include <cblas.h>
   |          ^
 compilation terminated.
 customize UnixCCompiler
 FOUND:
 libraries = ['blas', 'blas']
 library_dirs = ['/usr/lib64']
 include_dirs = ['/usr/local/include', '/usr/include']
FOUND:
 define_macros = [('NO_ATLAS_INFO', 1)]
 libraries = ['blas', 'blas']
 library_dirs = ['/usr/lib64']
 

[jira] [Updated] (ARROW-11207) Can't install pyarrow with python 3.9.1

2021-01-11 Thread Oksana Shadura (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oksana Shadura updated ARROW-11207:
---
Description: 
After package manager updates (OS Manjaro), I can't install pyarrow:

 

 
{code:java}
$python --version 
 Python 3.9.1
$pip install pyarrow 
 Defaulting to user installation because normal site-packages is not writeable
 Collecting pyarrow
 Using cached pyarrow-2.0.0.tar.gz (58.9 MB)
 Installing build dependencies ... error
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python /usr/lib/python3.9/site-packages/pip install 
--ignore-installed --no-user --prefix /tmp/pip-build-env-m7rsjg8s/overlay 
--no-warn-script-location --no-binary :none: --only-binary :none: -i 
https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; 
python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' 
setuptools setuptools_scm wheel
 cwd: None
 Complete output (3738 lines):
 Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
 Collecting cython>=0.29
 Using cached Cython-0.29.21-cp39-cp39-manylinux1_x86_64.whl (1.9 MB)
 Collecting numpy==1.16.0
 Using cached numpy-1.16.0.zip (5.1 MB)
 Collecting setuptools
 Using cached setuptools-51.1.2-py3-none-any.whl (784 kB)
 Collecting setuptools_scm
 Using cached setuptools_scm-5.0.1-py2.py3-none-any.whl (28 kB)
 Collecting wheel
 Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
 Building wheels for collected packages: numpy
 Building wheel for numpy (setup.py): started
 Building wheel for numpy (setup.py): still running...
 Building wheel for numpy (setup.py): finished with status 'error'
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] 
= '"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"'; 
__file__='"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"';f=getattr(tokenize, 
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
bdist_wheel -d /tmp/pip-wheel-p7wh1sc1
 cwd: /tmp/pip-install-f4fzbfyb/numpy/
 Complete output (3222 lines):
 Running from numpy source directory.
 /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/misc_util.py:476: 
SyntaxWarning: "is" with a literal. Did you mean "=="?
 return is_string(s) and ('*' in s or '?' is s)
 blas_opt_info:
 blas_mkl_info:
 customize UnixCCompiler
 libraries mkl_rt not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
blis_info:
 customize UnixCCompiler
 libraries blis not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
openblas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 libraries openblas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_3_10_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries tatlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_3_10_blas_info:
 customize UnixCCompiler
 libraries satlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
atlas_blas_info:
 customize UnixCCompiler
 libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
accelerate_info:
 NOT AVAILABLE
/tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/system_info.py:625: UserWarning:
 Atlas (http://math-atlas.sourceforge.net/) libraries not found.
 Directories to search for the libraries can be specified in the
 numpy/distutils/site.cfg file (section [atlas]) or by setting
 the ATLAS environment variable.
 self.calc_info()
 blas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv 
-O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fPIC
creating /tmp/tmpl0lk1b35/tmp
 creating /tmp/tmpl0lk1b35/tmp/tmpl0lk1b35
 compile options: '-I/usr/local/include -I/usr/include -c'
 gcc: /tmp/tmpl0lk1b35/source.c
 /tmp/tmpl0lk1b35/source.c:1:10: fatal error: cblas.h: No such file or directory
 1 | #include <cblas.h>
   |          ^
 compilation terminated.
 customize UnixCCompiler
 FOUND:
 libraries = ['blas', 'blas']
 library_dirs = ['/usr/lib64']
 include_dirs = ['/usr/local/include', '/usr/include']
FOUND:
 define_macros = [('NO_ATLAS_INFO', 1)]
 libraries = ['blas', 'blas']
 library_dirs = ['/usr/lib64']
 

[jira] [Created] (ARROW-11207) Can't install pyarrow with python 3.9.1

2021-01-11 Thread Oksana Shadura (Jira)
Oksana Shadura created ARROW-11207:
--

 Summary: Can't install pyarrow with python 3.9.1
 Key: ARROW-11207
 URL: https://issues.apache.org/jira/browse/ARROW-11207
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 2.0.0
Reporter: Oksana Shadura


After package manager updates (OS Manjaro), I can't install pyarrow:

```

$python --version 
Python 3.9.1

$pip install pyarrow 
Defaulting to user installation because normal site-packages is not writeable
Collecting pyarrow
 Using cached pyarrow-2.0.0.tar.gz (58.9 MB)
 Installing build dependencies ... error
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python /usr/lib/python3.9/site-packages/pip install 
--ignore-installed --no-user --prefix /tmp/pip-build-env-m7rsjg8s/overlay 
--no-warn-script-location --no-binary :none: --only-binary :none: -i 
https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; 
python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' 
setuptools setuptools_scm wheel
 cwd: None
 Complete output (3738 lines):
 Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
 Collecting cython>=0.29
 Using cached Cython-0.29.21-cp39-cp39-manylinux1_x86_64.whl (1.9 MB)
 Collecting numpy==1.16.0
 Using cached numpy-1.16.0.zip (5.1 MB)
 Collecting setuptools
 Using cached setuptools-51.1.2-py3-none-any.whl (784 kB)
 Collecting setuptools_scm
 Using cached setuptools_scm-5.0.1-py2.py3-none-any.whl (28 kB)
 Collecting wheel
 Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
 Building wheels for collected packages: numpy
 Building wheel for numpy (setup.py): started
 Building wheel for numpy (setup.py): still running...
 Building wheel for numpy (setup.py): finished with status 'error'
 ERROR: Command errored out with exit status 1:
 command: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] 
= '"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"'; 
__file__='"'"'/tmp/pip-install-f4fzbfyb/numpy/setup.py'"'"';f=getattr(tokenize, 
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
bdist_wheel -d /tmp/pip-wheel-p7wh1sc1
 cwd: /tmp/pip-install-f4fzbfyb/numpy/
 Complete output (3222 lines):
 Running from numpy source directory.
 /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/misc_util.py:476: 
SyntaxWarning: "is" with a literal. Did you mean "=="?
 return is_string(s) and ('*' in s or '?' is s)
 blas_opt_info:
 blas_mkl_info:
 customize UnixCCompiler
 libraries mkl_rt not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 blis_info:
 customize UnixCCompiler
 libraries blis not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 openblas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 libraries openblas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 atlas_3_10_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries tatlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 atlas_3_10_blas_info:
 customize UnixCCompiler
 libraries satlas not found in ['/usr/local/lib64', '/usr/local/lib', 
'/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 atlas_blas_threads_info:
 Setting PTATLAS=ATLAS
 customize UnixCCompiler
 libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 atlas_blas_info:
 customize UnixCCompiler
 libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', 
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/']
 NOT AVAILABLE
 
 accelerate_info:
 NOT AVAILABLE
 
 /tmp/pip-install-f4fzbfyb/numpy/numpy/distutils/system_info.py:625: 
UserWarning:
 Atlas (http://math-atlas.sourceforge.net/) libraries not found.
 Directories to search for the libraries can be specified in the
 numpy/distutils/site.cfg file (section [atlas]) or by setting
 the ATLAS environment variable.
 self.calc_info()
 blas_info:
 customize UnixCCompiler
 customize UnixCCompiler
 C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv 
-O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt 
-march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fPIC
 
 creating /tmp/tmpl0lk1b35/tmp
 creating /tmp/tmpl0lk1b35/tmp/tmpl0lk1b35
 compile options: '-I/usr/local/include -I/usr/include -c'
 gcc: /tmp/tmpl0lk1b35/source.c
 /tmp/tmpl0lk1b35/source.c:1:10: fatal error: cblas.h: No such file or directory
 1 | #include <cblas.h>
   |          ^
 compilation terminated.
 customize UnixCCompiler
 FOUND:
 libraries = ['blas', 

[jira] [Resolved] (ARROW-11163) [C++][Python] Compressed Feather file written with pyarrow 0.17 not readable in pyarrow 2.0.0+

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-11163.

Resolution: Fixed

Issue resolved by pull request 9128
[https://github.com/apache/arrow/pull/9128]

> [C++][Python] Compressed Feather file written with pyarrow 0.17 not readable 
> in pyarrow 2.0.0+
> --
>
> Key: ARROW-11163
> URL: https://issues.apache.org/jira/browse/ARROW-11163
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Originally from 
> https://stackoverflow.com/questions/65413407/reading-in-feather-file-in-pyarrow-error-arrowinvalid-unrecognized-compressio
> Writing with pyarrow 0.17:
> {code:python}
> In [1]: pa.__version__
> Out[1]: '0.17.0'
> In [2]: table = pa.table({'a': range(100)})
> In [3]: from pyarrow import feather
> In [4]: feather.write_feather(table, "test_pa017_explicit.feather", 
> compression="lz4", version=2)
> # according to docstring, this should do the same, but apparently not
> In [5]: feather.write_feather(table, "test_pa017_default.feather")
> {code}
> Reading with pyarrow 1.0.0 works for both files, but reading it with master 
> (pyarrow 2.0.0 gives the same error):
> {code:python}
> In [121]: pa.__version__
> Out[121]: '3.0.0.dev552+g634f993f4'
> In [123]: feather.read_table("test_pa017_default.feather")
> Out[123]:
> pyarrow.Table
> a: int64
> In [124]: feather.read_table("test_pa017_explicit.feather")
> ---
> ArrowInvalid  Traceback (most recent call last)
>  in 
> > 1 feather.read_table("test_pa017_explicit.feather")
> ~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, 
> memory_map)
> 238
> 239 if columns is None:
> --> 240 return reader.read()
> 241
> 242 column_types = [type(column) for column in columns]
> ~/scipy/repos/arrow/python/pyarrow/feather.pxi in 
> pyarrow.lib.FeatherReader.read()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Unrecognized compression type: LZ4
> In ../src/arrow/ipc/reader.cc, line 538, code: (_error_or_value8).status()
> In ../src/arrow/ipc/reader.cc, line 594, code: 
> GetCompressionExperimental(message, )
> In ../src/arrow/ipc/reader.cc, line 942, code: (_error_or_value23).status()
> {code}





[jira] [Updated] (ARROW-11206) [C++][Dataset][Python] Consider hiding/renaming "project"

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-11206:
-
Labels: compute dataset  (was: )

> [C++][Dataset][Python] Consider hiding/renaming "project"
> -
>
> Key: ARROW-11206
> URL: https://issues.apache.org/jira/browse/ARROW-11206
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: compute, dataset
> Fix For: 4.0.0
>
>
> The "project" compute Function is necessary for ARROW-11174. However it is 
> not intended for direct use outside an Expression ([where the correspondence 
> to projection is not immediately 
> obvious|https://github.com/apache/arrow/pull/9131#issuecomment-757764173]) so 
> it may be preferable to do one/more of:
>  * rename the function to "wrap_struct" or similar so it does make sense 
> outside Expressions
>  * ensure the function is not exposed to python/R bindings except through 
> Expressions
>  * remove the function from the default registry





[jira] [Updated] (ARROW-8919) [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8919:

Labels: compute  (was: )

> [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that 
> may require implicit casts to invoke
> --
>
> Key: ARROW-8919
> URL: https://issues.apache.org/jira/browse/ARROW-8919
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: compute
> Fix For: 4.0.0
>
>
> Currently we have "DispatchExact" which requires an exact match of input 
> types. "DispatchBest" would permit kernel selection with implicit casts 
> required. Since multiple kernels may be valid when allowing implicit casts, 
> we will need to break ties by estimating the "cost" of the implicit casts. 
> For example, casting int8 to int32 is "less expensive" than implicitly 
> casting to int64
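The tie-breaking idea above can be sketched in plain Python (illustrative only; Arrow's actual DispatchBest is C++ and its cost model is not specified here). Each kernel signature is scored by the total cost of the implicit casts it would require, and the cheapest reachable signature wins.

```python
# Hypothetical cast-cost table: wider implicit casts cost more.
CAST_COST = {
    ("int8", "int8"): 0,
    ("int8", "int32"): 2,
    ("int8", "int64"): 3,
    ("int32", "int32"): 0,
    ("int32", "int64"): 1,
}

def dispatch_best(kernels, arg_types):
    """Return the kernel signature reachable with the lowest total cast cost."""
    best, best_cost = None, None
    for sig in kernels:
        if len(sig) != len(arg_types):
            continue
        try:
            cost = sum(CAST_COST[(a, p)] for a, p in zip(arg_types, sig))
        except KeyError:  # some argument has no implicit cast to this kernel
            continue
        if best_cost is None or cost < best_cost:
            best, best_cost = sig, cost
    return best

kernels = [("int32", "int32"), ("int64", "int64")]
# For (int8, int32) arguments, casting int8 -> int32 is cheaper than
# widening both arguments to int64, so the int32 kernel is selected.
print(dispatch_best(kernels, ("int8", "int32")))  # ('int32', 'int32')
```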





[jira] [Updated] (ARROW-8919) [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-8919:

Affects Version/s: 2.0.0

> [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that 
> may require implicit casts to invoke
> --
>
> Key: ARROW-8919
> URL: https://issues.apache.org/jira/browse/ARROW-8919
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently we have "DispatchExact" which requires an exact match of input 
> types. "DispatchBest" would permit kernel selection with implicit casts 
> required. Since multiple kernels may be valid when allowing implicit casts, 
> we will need to break ties by estimating the "cost" of the implicit casts. 
> For example, casting int8 to int32 is "less expensive" than implicitly 
> casting to int64





[jira] [Assigned] (ARROW-8919) [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-8919:
---

Assignee: Ben Kietzman

> [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that 
> may require implicit casts to invoke
> --
>
> Key: ARROW-8919
> URL: https://issues.apache.org/jira/browse/ARROW-8919
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently we have "DispatchExact" which requires an exact match of input 
> types. "DispatchBest" would permit kernel selection with implicit casts 
> required. Since multiple kernels may be valid when allowing implicit casts, 
> we will need to break ties by estimating the "cost" of the implicit casts. 
> For example, casting int8 to int32 is "less expensive" than implicitly 
> casting to int64





[jira] [Resolved] (ARROW-11166) [Python][Compute] Add bindings for ProjectOptions

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-11166.
--
Resolution: Fixed

Issue resolved by pull request 9131
[https://github.com/apache/arrow/pull/9131]

> [Python][Compute] Add bindings for ProjectOptions
> -
>
> Key: ARROW-11166
> URL: https://issues.apache.org/jira/browse/ARROW-11166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Ben Kietzman
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Similarly as ARROW-10725, need to expose {{ProjectOptions}}:
> {code}
> In [4]: from pyarrow.compute import project
> /home/joris/scipy/repos/arrow/python/pyarrow/compute.py:137: RuntimeWarning: 
> Python binding for ProjectOptions not exposed
>   warnings.warn("Python binding for {} not exposed"
> {code}





[jira] [Created] (ARROW-11206) [C++][Dataset][Python] Consider hiding/renaming "project"

2021-01-11 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-11206:


 Summary: [C++][Dataset][Python] Consider hiding/renaming "project"
 Key: ARROW-11206
 URL: https://issues.apache.org/jira/browse/ARROW-11206
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 2.0.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 4.0.0


The "project" compute Function is necessary for ARROW-11174. However it is not 
intended for direct use outside an Expression ([where the correspondence to 
projection is not immediately 
obvious|https://github.com/apache/arrow/pull/9131#issuecomment-757764173]) so 
it may be preferable to do one/more of:
 * rename the function to "wrap_struct" or similar so it does make sense 
outside Expressions
 * ensure the function is not exposed to python/R bindings except through 
Expressions
 * remove the function from the default registry
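What "project" does can be illustrated in plain Python (not Arrow's API; names here are only for illustration): it wraps several equal-length columns into a single column of struct-like values, which is why "wrap_struct" reads more naturally outside an Expression.

```python
def wrap_struct(field_names, columns):
    """Combine equal-length columns into one column of struct-like records."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have equal length")
    return [dict(zip(field_names, row)) for row in zip(*columns)]

rows = wrap_struct(["a", "b"], [[1, 2], ["x", "y"]])
print(rows)  # [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]
```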





[jira] [Commented] (ARROW-11206) [C++][Dataset][Python] Consider hiding/renaming "project"

2021-01-11 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262701#comment-17262701
 ] 

Ben Kietzman commented on ARROW-11206:
--

[~jorisvandenbossche]

> [C++][Dataset][Python] Consider hiding/renaming "project"
> -
>
> Key: ARROW-11206
> URL: https://issues.apache.org/jira/browse/ARROW-11206
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 4.0.0
>
>
> The "project" compute Function is necessary for ARROW-11174. However it is 
> not intended for direct use outside an Expression ([where the correspondence 
> to projection is not immediately 
> obvious|https://github.com/apache/arrow/pull/9131#issuecomment-757764173]) so 
> it may be preferable to do one/more of:
>  * rename the function to "wrap_struct" or similar so it does make sense 
> outside Expressions
>  * ensure the function is not exposed to python/R bindings except through 
> Expressions
>  * remove the function from the default registry





[jira] [Resolved] (ARROW-11009) [Python] Add environment variable to elect default usage of system memory allocator instead of jemalloc/mimalloc

2021-01-11 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-11009.
--
Resolution: Fixed

Issue resolved by pull request 9105
[https://github.com/apache/arrow/pull/9105]

> [Python] Add environment variable to elect default usage of system memory 
> allocator instead of jemalloc/mimalloc
> 
>
> Key: ARROW-11009
> URL: https://issues.apache.org/jira/browse/ARROW-11009
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We routinely get reports like ARROW-11007 where there is suspicion of a 
> memory leak (which may or may not be valid) — having an easy way (requiring 
> no changes to application code) to toggle usage of the non-system memory 
> allocator would help with determining whether the memory usage patterns are 
> the result of the allocator being used. 
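Usage sketch for the toggle described above, assuming the variable name introduced by the linked PR is `ARROW_DEFAULT_MEMORY_POOL` (with values such as "system", "jemalloc", or "mimalloc"): because the default pool is chosen at startup, the variable must be set before pyarrow is first imported.

```python
# Hedged sketch: the variable name and accepted values are assumptions
# based on the linked PR, not verified against a specific release.
import os

# Must happen before "import pyarrow" anywhere in the process.
os.environ["ARROW_DEFAULT_MEMORY_POOL"] = "system"

# import pyarrow as pa   # pyarrow would now default to the system allocator
```

Setting the variable in the shell that launches the application (rather than in code) achieves the stated goal of requiring no application changes.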





[jira] [Commented] (ARROW-9530) [C++] Add option to disable jemalloc background thread on Linux

2021-01-11 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262635#comment-17262635
 ] 

Antoine Pitrou commented on ARROW-9530:
---

Note that you'll soon be able to change the default memory pool implementation 
at runtime:

[https://github.com/apache/arrow/pull/9105]

(I have no idea whether mimalloc also uses a background thread, btw)

> [C++] Add option to disable jemalloc background thread on Linux
> ---
>
> Key: ARROW-9530
> URL: https://issues.apache.org/jira/browse/ARROW-9530
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Rob Ambalu
>Priority: Minor
>
> We noticed that after we upgraded pyarrow from 0.9.0 to 0.17.1, we get two 
> unwanted side effects just by linking the Arrow libs into our C++ build. We 
> link the Arrow libs into one of our low-level core libraries, so all of our 
> applications are affected by these side effects:
> 1) a "jemalloc_bg_thd" thread is spawned on dlinit before we ever hit main
> 2) all our apps are now hitting valgrind leak warnings due to a (potential) 
> leak in jemalloc code:
> ==33515== 656 bytes in 1 blocks are possibly lost in loss record 1 of 1
> ==33515== at 0x402E9EA: calloc (vg_replace_malloc.c:752)
> ==33515== by 0x4011F44: _dl_allocate_tls (in /usr/lib64/ld-2.17.so)
> ==33515== by 0x5DFF9C0: pthread_create@@GLIBC_2.2.5 (in 
> /usr/lib64/libpthread-2.17.so)
> ==33515== by 0x589186B: je_arrow_private_je_pthread_create_wrapper 
> (background_thread.c:48)
> ==33515== by 0x589186B: background_thread_create_signals_masked 
> (background_thread.c:365)
> ==33515== by 0x589186B: background_thread_create_locked 
> (background_thread.c:573)
> ==33515== by 0x5891A47: je_arrow_private_je_background_thread_create 
> (background_thread.c:598)
> ==33515== by 0x400F502: _dl_init (in /usr/lib64/ld-2.17.so)
> ==33515== by 0x40011A9: ??? (in /usr/lib64/ld-2.17.so)





[jira] [Commented] (ARROW-9530) [C++] Add option to disable jemalloc background thread on Linux

2021-01-11 Thread Rob Ambalu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262633#comment-17262633
 ] 

Rob Ambalu commented on ARROW-9530:
---

fwiw I'm not seeing this behavior; the threads are at 0% when they aren't in use.

They are still a nuisance, though.

> [C++] Add option to disable jemalloc background thread on Linux
> ---
>
> Key: ARROW-9530
> URL: https://issues.apache.org/jira/browse/ARROW-9530
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Rob Ambalu
>Priority: Minor
>
> We noticed that after we upgraded pyarrow from 0.9.0 to 0.17.1, we get two 
> unwanted side effects just by linking the Arrow libs into our C++ build. We 
> link the Arrow libs into one of our low-level core libraries, so all of our 
> applications are affected by these side effects:
> 1) a "jemalloc_bg_thd" thread is spawned on dlinit before we ever hit main
> 2) all our apps are now hitting valgrind leak warnings due to a (potential) 
> leak in jemalloc code:
> ==33515== 656 bytes in 1 blocks are possibly lost in loss record 1 of 1
> ==33515== at 0x402E9EA: calloc (vg_replace_malloc.c:752)
> ==33515== by 0x4011F44: _dl_allocate_tls (in /usr/lib64/ld-2.17.so)
> ==33515== by 0x5DFF9C0: pthread_create@@GLIBC_2.2.5 (in 
> /usr/lib64/libpthread-2.17.so)
> ==33515== by 0x589186B: je_arrow_private_je_pthread_create_wrapper 
> (background_thread.c:48)
> ==33515== by 0x589186B: background_thread_create_signals_masked 
> (background_thread.c:365)
> ==33515== by 0x589186B: background_thread_create_locked 
> (background_thread.c:573)
> ==33515== by 0x5891A47: je_arrow_private_je_background_thread_create 
> (background_thread.c:598)
> ==33515== by 0x400F502: _dl_init (in /usr/lib64/ld-2.17.so)
> ==33515== by 0x40011A9: ??? (in /usr/lib64/ld-2.17.so)





[jira] [Resolved] (ARROW-11165) [Rust] [DataFusion] Document the desired SQL dialect for DataFusion

2021-01-11 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11165.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 9127
[https://github.com/apache/arrow/pull/9127]

> [Rust] [DataFusion] Document the desired SQL dialect for DataFusion
> ---
>
> Key: ARROW-11165
> URL: https://issues.apache.org/jira/browse/ARROW-11165
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As discussed in https://github.com/apache/arrow/pull/9108#issuecomment-755791235, 
> we would like to pick a single SQL dialect in DataFusion to avoid what 
> happened with Spark SQL, where functions 
> (https://spark.apache.org/docs/latest/api/sql/index.html) were added seemingly 
> ad hoc, making their usage very difficult and leaving no clear feature matrix 
> available.
> Using an existing dialect will also allow us to re-use the documentation (and 
> other tools) from that dialect.





[jira] [Updated] (ARROW-11165) [Rust] [DataFusion] Document the desired SQL dialect for DataFusion

2021-01-11 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11165:

Component/s: Rust - DataFusion

> [Rust] [DataFusion] Document the desired SQL dialect for DataFusion
> ---
>
> Key: ARROW-11165
> URL: https://issues.apache.org/jira/browse/ARROW-11165
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As discussed in https://github.com/apache/arrow/pull/9108#issuecomment-755791235, 
> we would like to pick a single SQL dialect in DataFusion to avoid what 
> happened with Spark SQL, where functions 
> (https://spark.apache.org/docs/latest/api/sql/index.html) were added seemingly 
> ad hoc, making their usage very difficult and leaving no clear feature matrix 
> available.
> Using an existing dialect will also allow us to re-use the documentation (and 
> other tools) from that dialect.





[jira] [Assigned] (ARROW-11165) [Rust] [DataFusion] Document the desired SQL dialect for DataFusion

2021-01-11 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11165:
---

Assignee: Andrew Lamb

> [Rust] [DataFusion] Document the desired SQL dialect for DataFusion
> ---
>
> Key: ARROW-11165
> URL: https://issues.apache.org/jira/browse/ARROW-11165
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As discussed in https://github.com/apache/arrow/pull/9108#issuecomment-755791235, 
> we would like to pick a single SQL dialect in DataFusion to avoid what 
> happened with Spark SQL, where functions 
> (https://spark.apache.org/docs/latest/api/sql/index.html) were added seemingly 
> ad hoc, making their usage very difficult and leaving no clear feature matrix 
> available.
> Using an existing dialect will also allow us to re-use the documentation (and 
> other tools) from that dialect.





[jira] [Updated] (ARROW-11172) [Python] NumPyBuffer does not set mutable_data_

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-11172:
--
Summary: [Python] NumPyBuffer does not set mutable_data_  (was: NumPyBuffer 
does not set mutable_data_)

> [Python] NumPyBuffer does not set mutable_data_
> ---
>
> Key: ARROW-11172
> URL: https://issues.apache.org/jira/browse/ARROW-11172
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 2.0.0
> Environment: All
>Reporter: Arthur Peters
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> NumPyBuffer sets is_mutable_, but does not set mutable_data_ in its 
> constructor 
> ([code|https://github.com/apache/arrow/blob/d1ffe7229f327de8e9dbb7785b7c6e38d2c3319e/cpp/src/arrow/python/numpy_convert.cc#L51-L53])
>  even if the numpy array is mutable (NPY_ARRAY_WRITEABLE).
>  
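The intended contract can be modeled in Python with numpy (a sketch of the reported behavior, not Arrow's C++ code; the class name is illustrative): both the mutability flag and the mutable data pointer must be derived from the same writeability check.

```python
import numpy as np

class NumpyBufferSketch:
    """Models NumPyBuffer: expose mutable data only for writeable arrays."""
    def __init__(self, arr):
        self._arr = arr
        self.is_mutable = arr.flags.writeable
        # The reported bug: is_mutable_ was set but mutable_data_ stayed
        # unset. Both must follow the NPY_ARRAY_WRITEABLE flag together.
        self.mutable_data = arr.data if self.is_mutable else None

arr = np.arange(3)
buf = NumpyBufferSketch(arr)
print(buf.is_mutable, buf.mutable_data is not None)  # True True

arr.flags.writeable = False
frozen = NumpyBufferSketch(arr)
print(frozen.is_mutable, frozen.mutable_data)  # False None
```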





[jira] [Updated] (ARROW-11202) [R][CI] Nightly builds not happening (or artifacts not exported)

2021-01-11 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-11202:
--
Summary: [R][CI] Nightly builds not happening (or artifacts not exported)  
(was: Nightly builds not happening (or artifacts not exported))

> [R][CI] Nightly builds not happening (or artifacts not exported)
> 
>
> Key: ARROW-11202
> URL: https://issues.apache.org/jira/browse/ARROW-11202
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Laurent
>Priority: Major
>
> The instructions to install a nightly build (here: 
> https://arrow.apache.org/docs/r/#installing-a-development-version) lead to a 
> version `2.0.0.20201222`, which does not appear to be a nightly build (in 
> particular changes from 20201230 are missing).





[jira] [Updated] (ARROW-10406) [C++] Unify dictionaries when writing IPC file in a single shot

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-10406:
---
Component/s: (was: Format)
 C++

> [C++] Unify dictionaries when writing IPC file in a single shot
> ---
>
> Key: ARROW-10406
> URL: https://issues.apache.org/jira/browse/ARROW-10406
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>
> I read a big (taxi) csv file and specified that I wanted to dictionary-encode 
> some columns. The resulting Table has ChunkedArrays with 1604 chunks. When I 
> go to write this Table to the IPC file format (write_feather), I get an 
> error: 
> {code}
>   Invalid: Dictionary replacement detected when writing IPC file format. 
> Arrow IPC files only support a single dictionary for a given field accross 
> all batches.
> {code}
> I can write this to Parquet and read it back in, and the roundtrip of the 
> data is correct. We should be able to do this in IPC too.
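The requested unification can be sketched in plain Python (not Arrow's API): build one combined dictionary across all chunks and remap each chunk's indices into it, so the IPC file needs only a single dictionary per field.

```python
def unify_dictionaries(chunks):
    """chunks: list of (dictionary, indices) pairs.

    Returns (unified_dictionary, remapped_index_lists).
    """
    unified, positions = [], {}
    remapped = []
    for dictionary, indices in chunks:
        # Map each local dictionary slot to a slot in the unified dictionary.
        mapping = []
        for value in dictionary:
            if value not in positions:
                positions[value] = len(unified)
                unified.append(value)
            mapping.append(positions[value])
        remapped.append([mapping[i] for i in indices])
    return unified, remapped

unified, remapped = unify_dictionaries([
    (["a", "b"], [0, 1, 0]),
    (["b", "c"], [1, 0, 1]),
])
print(unified)   # ['a', 'b', 'c']
print(remapped)  # [[0, 1, 0], [2, 1, 2]]
```

This is a single pass over the chunks, so it fits the "single shot" write path described in the title.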





[jira] [Updated] (ARROW-10406) [C++] Unify dictionaries when writing IPC file in a single shot

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-10406:
---
Summary: [C++] Unify dictionaries when writing IPC file in a single shot  
(was: [Format] Support dictionary replacement in the IPC file format)

> [C++] Unify dictionaries when writing IPC file in a single shot
> ---
>
> Key: ARROW-10406
> URL: https://issues.apache.org/jira/browse/ARROW-10406
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Format
>Reporter: Neal Richardson
>Priority: Major
>
> I read a big (taxi) csv file and specified that I wanted to dictionary-encode 
> some columns. The resulting Table has ChunkedArrays with 1604 chunks. When I 
> go to write this Table to the IPC file format (write_feather), I get an 
> error: 
> {code}
>   Invalid: Dictionary replacement detected when writing IPC file format. 
> Arrow IPC files only support a single dictionary for a given field accross 
> all batches.
> {code}
> I can write this to Parquet and read it back in, and the roundtrip of the 
> data is correct. We should be able to do this in IPC too.





[jira] [Resolved] (ARROW-5336) [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries

2021-01-11 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5336.
---
Resolution: Fixed

Issue resolved by pull request 8984
[https://github.com/apache/arrow/pull/8984]

> [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal 
> dictionaries
> --
>
> Key: ARROW-5336
> URL: https://issues.apache.org/jira/browse/ARROW-5336
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Weston Pace
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently (as of ARROW-3144) if any dictionary is different, an error is 
> returned




