[jira] [Resolved] (ARROW-13633) [Packaging][Debian] Add support for bookworm

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-13633.
--
Fix Version/s: 6.0.0
   Resolution: Fixed

Issue resolved by pull request 11297
[https://github.com/apache/arrow/pull/11297]

> [Packaging][Debian] Add support for bookworm
> 
>
> Key: ARROW-13633
> URL: https://issues.apache.org/jira/browse/ARROW-13633
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Nicola Crane
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-14178) [C++] Boost download location has moved

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-14178.
--
Fix Version/s: 6.0.0
   Resolution: Fixed

Issue resolved by pull request 11275
[https://github.com/apache/arrow/pull/11275]

> [C++] Boost download location has moved
> ---
>
> Key: ARROW-14178
> URL: https://issues.apache.org/jira/browse/ARROW-14178
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benson Muite
>Assignee: Benson Muite
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Boost downloads are now available on JFrog, so the download location needs to be updated.





[jira] [Updated] (ARROW-14207) [C++] Add missing dependencies for bundled Boost targets

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-14207:
---
Labels: pull-request-available  (was: )

> [C++] Add missing dependencies for bundled Boost targets
> 
>
> Key: ARROW-14207
> URL: https://issues.apache.org/jira/browse/ARROW-14207
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-14207) [C++] Add missing dependencies for bundled Boost targets

2021-10-02 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-14207:


 Summary: [C++] Add missing dependencies for bundled Boost targets
 Key: ARROW-14207
 URL: https://issues.apache.org/jira/browse/ARROW-14207
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Resolved] (ARROW-14180) [Packaging] Add support for AlmaLinux 8

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-14180.
--
Resolution: Fixed

Issue resolved by pull request 11295
[https://github.com/apache/arrow/pull/11295]

> [Packaging] Add support for AlmaLinux 8
> ---
>
> Key: ARROW-14180
> URL: https://issues.apache.org/jira/browse/ARROW-14180
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-14076) Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)

2021-10-02 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423600#comment-17423600
 ] 

Kouhei Sutou commented on ARROW-14076:
--

Could you try extpp 0.1.0?

> Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
> 
>
> Key: ARROW-14076
> URL: https://issues.apache.org/jira/browse/ARROW-14076
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 4.0.0
> Environment: Ruby 2.7.4 on Ubuntu 20.04/Heroku
>Reporter: Daniel Rice
>Priority: Major
>
>  
> Hello,
>  
> I am not able to get the Ruby gems, `red-arrow` and `red-parquet`, to work 
> properly on Heroku. Heroku itself is merely an Ubuntu 20.04 LTS (focal) 
> container, so this is really a question of which dependencies must be 
> installed to get these gems to work on focal.
> So far I have specified the following in Heroku's `Aptfile`: 
> {code:java}
> # Get Heroku's Ubuntu distro for your Stack.  Heroku-20 = focal
> # Running bash on ⬢ ... up, run.1471 (Hobby)
> # ~ $ lsb_release --codename --short
> :repo:deb [trusted=yes arch=amd64] https://apache.jfrog.io/artifactory/arrow/ubuntu/ focal main
> libarrow-dev
> libparquet-dev
> libarrow-glib-dev
> libparquet-glib-dev
> libgirepository-1.0-1
> libgirepository1.0-dev
> libglib2.0-dev
> libglib2.0-0
> gir1.2-glib-2.0
> gobject-introspection
> {code}
> Note: the above contains additional packages required by `red-arrow` that 
> WERE NOT SPECIFIED in the installation guide at 
> [https://arrow.apache.org/install/|https://arrow.apache.org/install/].
> Despite all my efforts, I am unable to solve this issue:
> {code:java}
> 2021-09-21T23:05:11.469561+00:00 heroku[worker.1]: Process exited with status 1
> 2021-09-21T23:05:11.263179+00:00 app[worker.1]: bundler: failed to load command: sidekiq (/app/vendor/bundle/ruby/2.7.0/bin/sidekiq)
> 2021-09-21T23:05:11.263465+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/kernel.rb:34:in `require': /tmp/build_29fd2902/vendor/bundle/ruby/2.7.0/gems/extpp-0.0.9/ext/extpp/libruby-extpp.so: cannot open shared object file: No such file or directory - /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow.so (LoadError)
> 2021-09-21T23:05:11.263508+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/kernel.rb:34:in `require'
> 2021-09-21T23:05:11.263521+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow/loader.rb:112:in `require_extension_library'
> 2021-09-21T23:05:11.263532+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow/loader.rb:31:in `post_load'
> 2021-09-21T23:05:11.263544+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/gobject-introspection-3.4.4/lib/gobject-introspection/loader.rb:45:in `load'
> 2021-09-21T23:05:11.263565+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/gobject-introspection-3.4.4/lib/gobject-introspection/loader.rb:25:in `load'
> {code}
>  What is super frustrating is that the directory, 
> `/app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib`, is specified in 
> `LD_LIBRARY_PATH`, so I'm not sure why it's not being found.
> *+_Any help determining the full list of dependent packages for Ubuntu 20.04 
> (focal) would be greatly appreciated._+*  
>  
> *Extra environment details:*
>  
> Ruby 2.7.4 on Ubuntu 20.04/Heroku
>  
> *Relevant gem versions:*
> red-arrow (4.0.0)
> red-parquet (4.0.0)
> gio2 (3.4.4)
> gobject-introspection (3.4.4)
>  
>  
>  





[jira] [Commented] (ARROW-14188) link error on ubuntu

2021-10-02 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423599#comment-17423599
 ] 

Kouhei Sutou commented on ARROW-14188:
--

Thanks.

Could you swap {{arrow_static parquet_static}} order in 
{{target_link_libraries(vision_obj)}}?
{{target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
parquet_static arrow_static)}}

> link error on ubuntu
> 
>
> Key: ARROW-14188
> URL: https://issues.apache.org/jira/browse/ARROW-14188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 4.0.0, 5.0.0
> Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow
>Reporter: Amir Ghamarian
>Priority: Major
> Attachments: completerr.txt, linkerror.txt
>
>
> I used vcpkg to install Arrow versions 4 and 5. Building my code that uses 
> Parquet fails with "undefined reference" link errors.
> The same code works on OSX but fails on Ubuntu.
> My cmake snippet is as follows:
>  
> {code:java}
> find_package(Arrow CONFIG REQUIRED)
> get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY)
> find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR})
> find_package(Thrift CONFIG REQUIRED)
> {code}
> and the linking: 
>  
> {code:java}
> target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
> arrow_static parquet_static )
> {code}
>  
>  I get a lot of errors
>  





[jira] [Assigned] (ARROW-14205) [C++] Add unicode normalization to scalar string

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-14205:


Assignee: Keisuke Okada

> [C++] Add unicode normalization to scalar string
> 
>
> Key: ARROW-14205
> URL: https://issues.apache.org/jira/browse/ARROW-14205
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Keisuke Okada
>Assignee: Keisuke Okada
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-14205) [C++] Add unicode normalization to scalar string

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-14205:
-
Summary: [C++] Add unicode normalization to scalar string  (was: [C++]Add 
unicode normalization to scalar string)

> [C++] Add unicode normalization to scalar string
> 
>
> Key: ARROW-14205
> URL: https://issues.apache.org/jira/browse/ARROW-14205
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Keisuke Okada
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-14206) [Go] Fix Build for ARM and s390x

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-14206:
---
Labels: pull-request-available  (was: )

> [Go] Fix Build for ARM and s390x
> 
>
> Key: ARROW-14206
> URL: https://issues.apache.org/jira/browse/ARROW-14206
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Go
>Affects Versions: 6.0.0
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-14206) [Go] Fix Build for ARM and s390x

2021-10-02 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-14206:
-

 Summary: [Go] Fix Build for ARM and s390x
 Key: ARROW-14206
 URL: https://issues.apache.org/jira/browse/ARROW-14206
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Go
Affects Versions: 6.0.0
Reporter: Matthew Topol
Assignee: Matthew Topol
 Fix For: 6.0.0








[jira] [Closed] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike

2021-10-02 Thread Eduardo Ponce (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduardo Ponce closed ARROW-14204.
-
Resolution: Not A Bug

> [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in 
> MatchLike
> 
>
> Key: ARROW-14204
> URL: https://issues.apache.org/jira/browse/ARROW-14204
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Eduardo Ponce
>Assignee: Eduardo Ponce
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [*RegexSubstringMatcher* is available only when RE2 is enabled as it is 
> guarded with #ifdef 
> ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
>  but it is [used in MatchLike kernel without the RE2 
> guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
>  so compilation fails.
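A minimal sketch of the guard pattern the description asks for, assuming a hypothetical, simplified `MatchLikeSketch` function rather than Arrow's real `MatchLike` kernel (the issue itself was closed as Not A Bug, so this only illustrates the technique). Every reference to RE2 sits inside the same `#ifdef ARROW_WITH_RE2` block as the code it depends on, and the non-RE2 build gets an explicit fallback instead of a reference to a symbol that was never compiled.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration of the guard pattern; not Arrow's actual code.
// Everything that touches RE2 sits behind the same ARROW_WITH_RE2 guard.
#ifdef ARROW_WITH_RE2
#include <re2/re2.h>
#endif

// Result of a (simplified) SQL LIKE match.
struct LikeResult {
  bool supported;  // false if the pattern needs RE2 but RE2 was compiled out
  bool matched;
};

// Simplified LIKE matcher: '%' matches any run of characters and '_' matches
// one character; escape sequences are not handled in this sketch.
LikeResult MatchLikeSketch(const std::string& value, const std::string& pattern) {
  // Patterns without wildcards need no regex engine: plain equality suffices.
  if (pattern.find_first_of("%_") == std::string::npos) {
    return {true, value == pattern};
  }
#ifdef ARROW_WITH_RE2
  // Translate the LIKE pattern into an equivalent regular expression.
  std::string regex;
  for (char c : pattern) {
    if (c == '%') {
      regex += ".*";
    } else if (c == '_') {
      regex += ".";
    } else {
      regex += RE2::QuoteMeta(std::string(1, c));
    }
  }
  return {true, RE2::FullMatch(value, regex)};
#else
  // Without RE2 the regex path does not exist, so report that explicitly
  // instead of referencing an RE2-only helper (the failure mode in the issue).
  return {false, false};
#endif
}

// Quick self-check of the paths that do not depend on RE2.
void CheckNonRegexPaths() {
  assert(MatchLikeSketch("abc", "abc").matched);
  assert(!MatchLikeSketch("abc", "abd").matched);
}
```

Built without `ARROW_WITH_RE2`, wildcard patterns come back with `supported == false`; built with it, they are translated and matched through RE2.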





[jira] [Commented] (ARROW-14063) [R] open_dataset() does not work on CSVs without header rows

2021-10-02 Thread Nicola Crane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423548#comment-17423548
 ] 

Nicola Crane commented on ARROW-14063:
--

We already have a ticket open (ARROW-13887) to address the confusing behaviour 
of {{read_csv_arrow()}}, but will look into the {{open_dataset()}} issue 
shortly. 

> [R] open_dataset() does not work on CSVs without header rows
> 
>
> Key: ARROW-14063
> URL: https://issues.apache.org/jira/browse/ARROW-14063
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 5.0.0
> Environment: sessionInfo()
> R version 4.0.5 (2021-03-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.5 LTS
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> locale:
>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] arrow_5.0.0.2  dplyr_1.0.5    magrittr_2.0.1 targets_0.6.0
> loaded via a namespace (and not attached):
>  [1] httr_1.4.2          rnaturalearth_0.1.0 sass_0.4.0          tidyr_1.1.3
>  [5] jsonlite_1.7.2      bit64_4.0.5         bslib_0.2.5.1       assertthat_0.2.1
>  [9] askpass_1.1         sp_1.4-5            blob_1.2.1          renv_0.13.2
> [13] yaml_2.2.1          globals_0.14.0      pillar_1.5.1        RSQLite_2.2.7
> [17] lattice_0.20-41     glue_1.4.2          digest_0.6.27       htmltools_0.5.1.1
> [21] pkgconfig_2.0.3     RPostgres_1.3.2     listenv_0.8.0       config_0.3.1
> [25] purrr_0.3.4         processx_3.5.1      openssl_1.4.3       tibble_3.1.0
> [29] proxy_0.4-25        aws.s3_0.3.21       colourvalues_0.3.7  generics_0.1.0
> [33] ellipsis_0.3.1      cachem_1.0.5        withr_2.4.1         furrr_0.2.3
> [37] cli_2.4.0           crayon_1.4.1        memoise_2.0.0       evaluate_0.14
> [41] ps_1.6.0            fs_1.5.0            future_1.21.0       fansi_0.4.2
> [45] parallelly_1.25.0   xml2_1.3.2          class_7.3-18        rsconnect_0.8.18
> [49] tools_4.0.5         data.table_1.14.0   hms_1.0.0           lifecycle_1.0.0
> [53] stringr_1.4.0       callr_3.6.0         jquerylib_0.1.4     compiler_4.0.5
> [57] e1071_1.7-6         rlang_0.4.10        classInt_0.4-3      units_0.7-1
> [61] grid_4.0.5          rstudioapi_0.13     visNetwork_2.0.9    htmlwidgets_1.5.3
> [65] aws.signature_0.6.0 crosstalk_1.1.1     igraph_1.2.6        base64enc_0.1-3
> [69] rmarkdown_2.7       codetools_0.2-18    DBI_1.1.1           curl_4.3
> [73] R6_2.5.0            lubridate_1.7.10    knitr_1.31          fastmap_1.1.0
> [77] rgeos_0.5-5         bit_4.0.4           utf8_1.2.1          tarchetypes_0.2.1
> [81] readr_1.4.0         KernSmooth_2.23-18  stringi_1.5.3       parallel_4.0.5
> [85] Rcpp_1.0.6          vctrs_0.3.7         sf_0.9-8            leaflet_2.0.4.1
> [89] dbplyr_2.1.1        tidyselect_1.1.0    xfun_0.22
>Reporter: Jared Lander
>Assignee: Nicola Crane
>Priority: Major
>  Labels: bug
> Fix For: 6.0.0
>
>
> Using {{open_dataset()}} on a CSV without a header row, followed by 
> {{collect()}}, results either in a {{tibble}} of {{NA}}s or an error 
> depending on duplication of the first row of data. This affects reading one 
> file or a directory of files.
> Here we use the `diamonds` data, where the first row of data does not have 
> any repeat values.
> {code:java}
> library(arrow)
> library(magrittr)
> data(diamonds, package='ggplot2')
> readr::write_csv(head(diamonds), file='diamonds_with_header.csv', 
> col_names=TRUE)
> readr::write_csv(head(diamonds), file='diamonds_without_header.csv', 
> col_names=FALSE)
> diamond_schema <- schema(
> carat=float32()
> , cut=string()
> , color=string()
> , clarity=string()
> , depth=float32()
> , table=float32()
> , price=float32()
> , x=float32()
> , y=float32()
> , z=float32()
> )
> diamonds_with_headers <- open_dataset('diamonds_with_header.csv', 
> schema=diamond_schema, format='csv')
> diamonds_without_headers <- open_dataset('diamonds_without_header.csv', 
> schema=diamond_schema, format='csv')
> # this works
> diamonds_with_headers %>% collect()
> # A tibble: 6 x 10
>   carat cut   color 

[jira] [Assigned] (ARROW-14063) [R] open_dataset() does not work on CSVs without header rows

2021-10-02 Thread Nicola Crane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Crane reassigned ARROW-14063:


Assignee: Nicola Crane

> [R] open_dataset() does not work on CSVs without header rows
> 
>
> Key: ARROW-14063
> URL: https://issues.apache.org/jira/browse/ARROW-14063
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 5.0.0
> Environment: sessionInfo()
> R version 4.0.5 (2021-03-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.5 LTS
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> locale:
>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] arrow_5.0.0.2  dplyr_1.0.5    magrittr_2.0.1 targets_0.6.0
> loaded via a namespace (and not attached):
>  [1] httr_1.4.2          rnaturalearth_0.1.0 sass_0.4.0          tidyr_1.1.3
>  [5] jsonlite_1.7.2      bit64_4.0.5         bslib_0.2.5.1       assertthat_0.2.1
>  [9] askpass_1.1         sp_1.4-5            blob_1.2.1          renv_0.13.2
> [13] yaml_2.2.1          globals_0.14.0      pillar_1.5.1        RSQLite_2.2.7
> [17] lattice_0.20-41     glue_1.4.2          digest_0.6.27       htmltools_0.5.1.1
> [21] pkgconfig_2.0.3     RPostgres_1.3.2     listenv_0.8.0       config_0.3.1
> [25] purrr_0.3.4         processx_3.5.1      openssl_1.4.3       tibble_3.1.0
> [29] proxy_0.4-25        aws.s3_0.3.21       colourvalues_0.3.7  generics_0.1.0
> [33] ellipsis_0.3.1      cachem_1.0.5        withr_2.4.1         furrr_0.2.3
> [37] cli_2.4.0           crayon_1.4.1        memoise_2.0.0       evaluate_0.14
> [41] ps_1.6.0            fs_1.5.0            future_1.21.0       fansi_0.4.2
> [45] parallelly_1.25.0   xml2_1.3.2          class_7.3-18        rsconnect_0.8.18
> [49] tools_4.0.5         data.table_1.14.0   hms_1.0.0           lifecycle_1.0.0
> [53] stringr_1.4.0       callr_3.6.0         jquerylib_0.1.4     compiler_4.0.5
> [57] e1071_1.7-6         rlang_0.4.10        classInt_0.4-3      units_0.7-1
> [61] grid_4.0.5          rstudioapi_0.13     visNetwork_2.0.9    htmlwidgets_1.5.3
> [65] aws.signature_0.6.0 crosstalk_1.1.1     igraph_1.2.6        base64enc_0.1-3
> [69] rmarkdown_2.7       codetools_0.2-18    DBI_1.1.1           curl_4.3
> [73] R6_2.5.0            lubridate_1.7.10    knitr_1.31          fastmap_1.1.0
> [77] rgeos_0.5-5         bit_4.0.4           utf8_1.2.1          tarchetypes_0.2.1
> [81] readr_1.4.0         KernSmooth_2.23-18  stringi_1.5.3       parallel_4.0.5
> [85] Rcpp_1.0.6          vctrs_0.3.7         sf_0.9-8            leaflet_2.0.4.1
> [89] dbplyr_2.1.1        tidyselect_1.1.0    xfun_0.22
>Reporter: Jared Lander
>Assignee: Nicola Crane
>Priority: Major
>  Labels: bug
> Fix For: 6.0.0
>
>
> Using {{open_dataset()}} on a CSV without a header row, followed by 
> {{collect()}}, results either in a {{tibble}} of {{NA}}s or an error 
> depending on duplication of the first row of data. This affects reading one 
> file or a directory of files.
> Here we use the `diamonds` data, where the first row of data does not have 
> any repeat values.
> {code:java}
> library(arrow)
> library(magrittr)
> data(diamonds, package='ggplot2')
> readr::write_csv(head(diamonds), file='diamonds_with_header.csv', 
> col_names=TRUE)
> readr::write_csv(head(diamonds), file='diamonds_without_header.csv', 
> col_names=FALSE)
> diamond_schema <- schema(
> carat=float32()
> , cut=string()
> , color=string()
> , clarity=string()
> , depth=float32()
> , table=float32()
> , price=float32()
> , x=float32()
> , y=float32()
> , z=float32()
> )
> diamonds_with_headers <- open_dataset('diamonds_with_header.csv', 
> schema=diamond_schema, format='csv')
> diamonds_without_headers <- open_dataset('diamonds_without_header.csv', 
> schema=diamond_schema, format='csv')
> # this works
> diamonds_with_headers %>% collect()
> # A tibble: 6 x 10
>   carat cut   color clarity depth table price x y z
>  
> 1 0.230 Ideal E SI2  61.555   326  3.95  3.98  2.43
> 2 0.210 Premium   E SI1  59.8   

[jira] [Updated] (ARROW-14188) link error on ubuntu

2021-10-02 Thread Amir Ghamarian (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Ghamarian updated ARROW-14188:
---
Attachment: completerr.txt

> link error on ubuntu
> 
>
> Key: ARROW-14188
> URL: https://issues.apache.org/jira/browse/ARROW-14188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 4.0.0, 5.0.0
> Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow
>Reporter: Amir Ghamarian
>Priority: Major
> Attachments: completerr.txt, linkerror.txt
>
>
> I used vcpkg to install Arrow versions 4 and 5. Building my code that uses 
> Parquet fails with "undefined reference" link errors.
> The same code works on OSX but fails on Ubuntu.
> My cmake snippet is as follows:
>  
> {code:java}
> find_package(Arrow CONFIG REQUIRED)
> get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY)
> find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR})
> find_package(Thrift CONFIG REQUIRED)
> {code}
> and the linking: 
>  
> {code:java}
> target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
> arrow_static parquet_static )
> {code}
>  
>  I get a lot of errors
>  





[jira] [Updated] (ARROW-14188) link error on ubuntu

2021-10-02 Thread Amir Ghamarian (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Ghamarian updated ARROW-14188:
---
Attachment: (was: completerr.txt)

> link error on ubuntu
> 
>
> Key: ARROW-14188
> URL: https://issues.apache.org/jira/browse/ARROW-14188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 4.0.0, 5.0.0
> Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow
>Reporter: Amir Ghamarian
>Priority: Major
> Attachments: completerr.txt, linkerror.txt
>
>
> I used vcpkg to install Arrow versions 4 and 5. Building my code that uses 
> Parquet fails with "undefined reference" link errors.
> The same code works on OSX but fails on Ubuntu.
> My cmake snippet is as follows:
>  
> {code:java}
> find_package(Arrow CONFIG REQUIRED)
> get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY)
> find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR})
> find_package(Thrift CONFIG REQUIRED)
> {code}
> and the linking: 
>  
> {code:java}
> target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
> arrow_static parquet_static )
> {code}
>  
>  I get a lot of errors
>  





[jira] [Commented] (ARROW-13879) [C++] Mixed support for binary types in regex functions

2021-10-02 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423516#comment-17423516
 ] 

David Li commented on ARROW-13879:
--

length() is only used again if you construct from a pointer instead of a 
pointer + length - so this is still an issue with usage and not with 
string_view.
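The distinction in the comment can be seen in a minimal C++17 sketch (the payload and helper names are hypothetical, chosen for illustration). A `std::string_view` built from a bare `const char*` computes its length with `std::char_traits<char>::length()`, so it stops at the first NUL byte; one built from pointer + length never scans and keeps every byte. Binary values that may contain zero bytes therefore have to be passed with an explicit length.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical binary value with an embedded NUL byte, like an element of a
// BinaryArray. Index 2 holds the NUL, so strlen-style scanning stops there.
const char kPayload[] = {'a', 'b', '\0', 'c', 'd'};
constexpr std::size_t kPayloadSize = sizeof(kPayload);  // 5 bytes

// Pointer-only construction: the length comes from
// std::char_traits<char>::length(), i.e. it scans for the first NUL.
// The data must contain a NUL somewhere, or this reads out of bounds.
std::string_view ViewFromPointer(const char* data) {
  return std::string_view(data);
}

// Pointer + length construction: no scanning, every byte is kept.
std::string_view ViewFromPointerAndLength(const char* data, std::size_t length) {
  return std::string_view(data, length);
}

void CheckViews() {
  assert(ViewFromPointer(kPayload).size() == 2);  // truncated at the NUL
  assert(ViewFromPointerAndLength(kPayload, kPayloadSize).size() == 5);
}
```

So the type itself handles binary data fine; what matters is which constructor the calling code uses.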

> [C++] Mixed support for binary types in regex functions
> ---
>
> Key: ARROW-13879
> URL: https://issues.apache.org/jira/browse/ARROW-13879
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Weston Pace
>Assignee: Eduardo Ponce
>Priority: Major
>  Labels: kernel, pull-request-available, types
> Fix For: 6.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The functions count_substring, count_substring_regex, find_substring, and 
> find_substring_regex all accept binary types, but the functions extract_regex, 
> match_substring, match_substring_regex, match_like, starts_with, ends_with, 
> split_pattern, and split_pattern_regex do not.
> They should either all accept binary types or none should.





[jira] [Commented] (ARROW-14188) link error on ubuntu

2021-10-02 Thread Amir Ghamarian (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423512#comment-17423512
 ] 

Amir Ghamarian commented on ARROW-14188:


I added another log. Thanks [~kou].

> link error on ubuntu
> 
>
> Key: ARROW-14188
> URL: https://issues.apache.org/jira/browse/ARROW-14188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 4.0.0, 5.0.0
> Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow
>Reporter: Amir Ghamarian
>Priority: Major
> Attachments: completerr.txt, linkerror.txt
>
>
> I used vcpkg to install Arrow versions 4 and 5. Building my code that uses 
> Parquet fails with "undefined reference" link errors.
> The same code works on OSX but fails on Ubuntu.
> My cmake snippet is as follows:
>  
> {code:java}
> find_package(Arrow CONFIG REQUIRED)
> get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY)
> find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR})
> find_package(Thrift CONFIG REQUIRED)
> {code}
> and the linking: 
>  
> {code:java}
> target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
> arrow_static parquet_static )
> {code}
>  
>  I get a lot of errors
>  





[jira] [Updated] (ARROW-14188) link error on ubuntu

2021-10-02 Thread Amir Ghamarian (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Ghamarian updated ARROW-14188:
---
Attachment: completerr.txt

> link error on ubuntu
> 
>
> Key: ARROW-14188
> URL: https://issues.apache.org/jira/browse/ARROW-14188
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 4.0.0, 5.0.0
> Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow
>Reporter: Amir Ghamarian
>Priority: Major
> Attachments: completerr.txt, linkerror.txt
>
>
> I used vcpkg to install Arrow versions 4 and 5. Building my code that uses 
> Parquet fails with "undefined reference" link errors.
> The same code works on OSX but fails on Ubuntu.
> My cmake snippet is as follows:
>  
> {code:java}
> find_package(Arrow CONFIG REQUIRED)
> get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY)
> find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR})
> find_package(Thrift CONFIG REQUIRED)
> {code}
> and the linking: 
>  
> {code:java}
> target_link_libraries(vision_obj PUBLIC  thrift::thrift re2::re2 
> arrow_static parquet_static )
> {code}
>  
>  I get a lot of errors
>  





[jira] [Updated] (ARROW-14205) [C++]Add unicode normalization to scalar string

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-14205:
---
Labels: pull-request-available  (was: )

> [C++]Add unicode normalization to scalar string
> ---
>
> Key: ARROW-14205
> URL: https://issues.apache.org/jira/browse/ARROW-14205
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Keisuke Okada
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-14205) [C++]Add unicode normalization to scalar string

2021-10-02 Thread Keisuke Okada (Jira)
Keisuke Okada created ARROW-14205:
-

 Summary: [C++]Add unicode normalization to scalar string
 Key: ARROW-14205
 URL: https://issues.apache.org/jira/browse/ARROW-14205
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Keisuke Okada








[jira] [Updated] (ARROW-13633) [Packaging][Debian] Add support for bookworm

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13633:
---
Labels: pull-request-available  (was: )

> [Packaging][Debian] Add support for bookworm
> 
>
> Key: ARROW-13633
> URL: https://issues.apache.org/jira/browse/ARROW-13633
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Nicola Crane
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-13633) [Packaging][Debian] Add support for bookworm

2021-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-13633:


Assignee: Kouhei Sutou

> [Packaging][Debian] Add support for bookworm
> 
>
> Key: ARROW-13633
> URL: https://issues.apache.org/jira/browse/ARROW-13633
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Nicola Crane
>Assignee: Kouhei Sutou
>Priority: Major
>






[jira] [Commented] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer

2021-10-02 Thread Truc Lam Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423486#comment-17423486
 ] 

Truc Lam Nguyen commented on ARROW-14196:
-

[~judah.rand] Sorry, I don't really understand your comment. Could you please 
explain a bit more? Thanks.

> [C++][Parquet] Default to compliant nested types in Parquet writer
> --
>
> Key: ARROW-14196
> URL: https://issues.apache.org/jira/browse/ARROW-14196
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Parquet
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++ there is already an option to get the "compliant_nested_types" (to 
> have the list columns follow the Parquet specification), and ARROW-11497 
> exposed this option in Python.
> This is still set to False by default, but in the source it says "TODO: At 
> some point we should flip this.", and in ARROW-11497 there was also some 
> discussion about what it would take to change the default.
> cc [~emkornfield] [~apitrou]





[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-14204:
---
Labels: pull-request-available  (was: )

> [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in 
> MatchLike
> 
>
> Key: ARROW-14204
> URL: https://issues.apache.org/jira/browse/ARROW-14204
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Eduardo Ponce
>Assignee: Eduardo Ponce
>Priority: Major
>  Labels: pull-request-available
> Fix For: 6.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [*RegexSubstringMatcher* is available only when RE2 is enabled as it is 
> guarded with #ifdef 
> ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
>  but it is [used in MatchLike kernel without the RE2 
> guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
>  so compilation fails.





[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike

2021-10-02 Thread Eduardo Ponce (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduardo Ponce updated ARROW-14204:
--
Summary: [C++] Fails to compile Arrow without RE2 due to missing ifdef 
guard in MatchLike  (was: [C++] Fails to compile Arrow without RE2 due to 
missing ifdef guard)

> [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in 
> MatchLike
> 
>
> Key: ARROW-14204
> URL: https://issues.apache.org/jira/browse/ARROW-14204
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Eduardo Ponce
>Assignee: Eduardo Ponce
>Priority: Major
> Fix For: 6.0.0
>
>
> [*RegexSubstringMatcher* is available only when RE2 is enabled as it is 
> guarded with #ifdef 
> ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
>  but it is [used here without the RE2 
> guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
>  so compilation fails.





[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike

2021-10-02 Thread Eduardo Ponce (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduardo Ponce updated ARROW-14204:
--
Description: [*RegexSubstringMatcher* is available only when RE2 is enabled 
as it is guarded with #ifdef 
ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
 but it is [used in MatchLike kernel without the RE2 
guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
 so compilation fails.  (was: [*RegexSubstringMatcher* is available only 
when RE2 is enabled as it is guarded with #ifdef 
ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
 but it is [used here without the RE2 
guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
 so compilation fails.)

> [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in 
> MatchLike
> 
>
> Key: ARROW-14204
> URL: https://issues.apache.org/jira/browse/ARROW-14204
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Eduardo Ponce
>Assignee: Eduardo Ponce
>Priority: Major
> Fix For: 6.0.0
>
>
> [*RegexSubstringMatcher* is available only when RE2 is enabled as it is 
> guarded with #ifdef 
> ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862]
>  but it is [used in MatchLike kernel without the RE2 
> guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113],
>  so compilation fails.


