[jira] [Resolved] (ARROW-13633) [Packaging][Debian] Add support for bookworm
[ https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-13633. -- Fix Version/s: 6.0.0 Resolution: Fixed Issue resolved by pull request 11297 [https://github.com/apache/arrow/pull/11297] > [Packaging][Debian] Add support for bookworm > > > Key: ARROW-13633 > URL: https://issues.apache.org/jira/browse/ARROW-13633 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Nicola Crane >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-14178) [C++] Boost download location has moved
[ https://issues.apache.org/jira/browse/ARROW-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-14178. -- Fix Version/s: 6.0.0 Resolution: Fixed Issue resolved by pull request 11275 [https://github.com/apache/arrow/pull/11275] > [C++] Boost download location has moved > --- > > Key: ARROW-14178 > URL: https://issues.apache.org/jira/browse/ARROW-14178 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benson Muite >Assignee: Benson Muite >Priority: Minor > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Boost downloads are now available on JFrog, so the download location needs to be updated.
[jira] [Updated] (ARROW-14207) [C++] Add missing dependencies for bundled Boost targets
[ https://issues.apache.org/jira/browse/ARROW-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14207: --- Labels: pull-request-available (was: ) > [C++] Add missing dependencies for bundled Boost targets > > > Key: ARROW-14207 > URL: https://issues.apache.org/jira/browse/ARROW-14207 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Created] (ARROW-14207) [C++] Add missing dependencies for bundled Boost targets
Kouhei Sutou created ARROW-14207: Summary: [C++] Add missing dependencies for bundled Boost targets Key: ARROW-14207 URL: https://issues.apache.org/jira/browse/ARROW-14207 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Resolved] (ARROW-14180) [Packaging] Add support for AlmaLinux 8
[ https://issues.apache.org/jira/browse/ARROW-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-14180. -- Resolution: Fixed Issue resolved by pull request 11295 [https://github.com/apache/arrow/pull/11295] > [Packaging] Add support for AlmaLinux 8 > --- > > Key: ARROW-14180 > URL: https://issues.apache.org/jira/browse/ARROW-14180 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h
[jira] [Commented] (ARROW-14076) Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
[ https://issues.apache.org/jira/browse/ARROW-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423600#comment-17423600 ] Kouhei Sutou commented on ARROW-14076: -- Could you try extpp 0.1.0? > Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal) > > > Key: ARROW-14076 > URL: https://issues.apache.org/jira/browse/ARROW-14076 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 4.0.0 > Environment: Ruby 2.7.4 on Ubuntu 20.04/Heroku >Reporter: Daniel Rice >Priority: Major > > > Hello, > > I am not able to get the Ruby gems `red-arrow` and `red-parquet` to work properly on Heroku. Heroku itself is merely an Ubuntu 20.04 LTS (focal) container, so this really is a question about which dependencies must be installed to get these gems to work on focal. > So far I have specified the following in Heroku's `Aptfile`: > {code:java} > # Get Heroku's Ubuntu distro for your Stack. Heroku-20 = focal > # Running bash on ⬢ ... up, run.1471 (Hobby) > # ~ $ lsb_release --codename --short > :repo:deb [trusted=yes arch=amd64] https://apache.jfrog.io/artifactory/arrow/ubuntu/ focal main > libarrow-dev > libparquet-dev > libarrow-glib-dev > libparquet-glib-dev > libgirepository-1.0-1 > libgirepository1.0-dev > libglib2.0-dev > libglib2.0-0 > gir1.2-glib-2.0 > gobject-introspection > {code} > Note: the above contains additional packages that were required by `red-arrow` but WERE NOT SPECIFIED in the installation guide at [https://arrow.apache.org/install/|https://arrow.apache.org/install/]. > Despite all my efforts, I am unable to solve this issue: > {code:java} > 2021-09-21T23:05:11.469561+00:00 heroku[worker.1]: Process exited with status 1 > 2021-09-21T23:05:11.263179+00:00 app[worker.1]: bundler: failed to load command: sidekiq (/app/vendor/bundle/ruby/2.7.0/bin/sidekiq) > 2021-09-21T23:05:11.263465+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/kernel.rb:34:in `require': /tmp/build_29fd2902/vendor/bundle/ruby/2.7.0/gems/extpp-0.0.9/ext/extpp/libruby-extpp.so: cannot open shared object file: No such file or directory - /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow.so (LoadError) > 2021-09-21T23:05:11.263508+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/kernel.rb:34:in `require' > 2021-09-21T23:05:11.263521+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow/loader.rb:112:in `require_extension_library' > 2021-09-21T23:05:11.263532+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib/arrow/loader.rb:31:in `post_load' > 2021-09-21T23:05:11.263544+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/gobject-introspection-3.4.4/lib/gobject-introspection/loader.rb:45:in `load' > 2021-09-21T23:05:11.263565+00:00 app[worker.1]: from /app/vendor/bundle/ruby/2.7.0/gems/gobject-introspection-3.4.4/lib/gobject-introspection/loader.rb:25:in `load' > {code} > What is super frustrating is that the directory `/app/vendor/bundle/ruby/2.7.0/gems/red-arrow-4.0.0/lib` is specified in `LD_LIBRARY_PATH`, so I'm not sure why it's not being found. > *+_Any help determining the full list of dependent packages for Ubuntu 20.04 (focal) would be greatly appreciated._+* > > *Extra environment details:* > > Ruby 2.7.4 on Ubuntu 20.04/Heroku > > *Relevant gem versions:* > red-arrow (4.0.0) > red-parquet (4.0.0) > gio2 (3.4.4) > gobject-introspection (3.4.4)
[jira] [Commented] (ARROW-14188) link error on ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423599#comment-17423599 ] Kouhei Sutou commented on ARROW-14188: -- Thanks. Could you swap the order of {{arrow_static}} and {{parquet_static}} in {{target_link_libraries(vision_obj)}}? {{target_link_libraries(vision_obj PUBLIC thrift::thrift re2::re2 parquet_static arrow_static)}} > link error on ubuntu > > > Key: ARROW-14188 > URL: https://issues.apache.org/jira/browse/ARROW-14188 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 5.0.0 > Environment: Ubuntu 18.04, gcc-9, and vcpkg installation of arrow >Reporter: Amir Ghamarian >Priority: Major > Attachments: completerr.txt, linkerror.txt > > > I used vcpkg to install Arrow versions 4 and 5, but building my code that uses Parquet fails with "undefined reference" link errors. > The same code works on OSX but fails on Ubuntu. > My CMake snippet is as follows: > > {code:java} > find_package(Arrow CONFIG REQUIRED) > get_filename_component(MY_SEARCH_DIR ${Arrow_CONFIG} DIRECTORY) > find_package(Parquet CONFIG REQUIRED PATHS ${MY_SEARCH_DIR}) > find_package(Thrift CONFIG REQUIRED) > {code} > and the linking: > > {code:java} > target_link_libraries(vision_obj PUBLIC thrift::thrift re2::re2 arrow_static parquet_static) > {code} > > I get a lot of errors.
[jira] [Assigned] (ARROW-14205) [C++] Add unicode normalization to scalar string
[ https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-14205: Assignee: Keisuke Okada > [C++] Add unicode normalization to scalar string > > > Key: ARROW-14205 > URL: https://issues.apache.org/jira/browse/ARROW-14205 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Keisuke Okada >Assignee: Keisuke Okada >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Updated] (ARROW-14205) [C++] Add unicode normalization to scalar string
[ https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-14205: - Summary: [C++] Add unicode normalization to scalar string (was: [C++]Add unicode normalization to scalar string) > [C++] Add unicode normalization to scalar string > > > Key: ARROW-14205 > URL: https://issues.apache.org/jira/browse/ARROW-14205 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Keisuke Okada >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Updated] (ARROW-14206) [Go] Fix Build for ARM and s390x
[ https://issues.apache.org/jira/browse/ARROW-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14206: --- Labels: pull-request-available (was: ) > [Go] Fix Build for ARM and s390x > > > Key: ARROW-14206 > URL: https://issues.apache.org/jira/browse/ARROW-14206 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Go >Affects Versions: 6.0.0 >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Created] (ARROW-14206) [Go] Fix Build for ARM and s390x
Matthew Topol created ARROW-14206: - Summary: [Go] Fix Build for ARM and s390x Key: ARROW-14206 URL: https://issues.apache.org/jira/browse/ARROW-14206 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Go Affects Versions: 6.0.0 Reporter: Matthew Topol Assignee: Matthew Topol Fix For: 6.0.0
[jira] [Closed] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
[ https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eduardo Ponce closed ARROW-14204. - Resolution: Not A Bug > [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike > > > Key: ARROW-14204 > URL: https://issues.apache.org/jira/browse/ARROW-14204 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Eduardo Ponce >Assignee: Eduardo Ponce >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > [*RegexSubstringMatcher* is available only when RE2 is enabled, as it is guarded with #ifdef ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862] but it is [used in the MatchLike kernel without the RE2 guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113], so compilation fails.
[jira] [Commented] (ARROW-14063) [R] open_dataset() does not work on CSVs without header rows
[ https://issues.apache.org/jira/browse/ARROW-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423548#comment-17423548 ] Nicola Crane commented on ARROW-14063: -- We already have a ticket open (ARROW-13887) to address the confusing behaviour of {{read_csv_arrow()}}, but will look into the {{open_dataset()}} issue shortly. > [R] open_dataset() does not work on CSVs without header rows > > > Key: ARROW-14063 > URL: https://issues.apache.org/jira/browse/ARROW-14063 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 5.0.0 > Environment: sessionInfo() > R version 4.0.5 (2021-03-31) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 18.04.5 LTS > Matrix products: default > BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 > LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so > locale: > [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 > [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 > [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] arrow_5.0.0.2 dplyr_1.0.5 magrittr_2.0.1 targets_0.6.0 > loaded via a namespace (and not attached): > [1] httr_1.4.2 rnaturalearth_0.1.0 sass_0.4.0 tidyr_1.1.3 > [5] jsonlite_1.7.2 bit64_4.0.5 bslib_0.2.5.1 assertthat_0.2.1 > [9] askpass_1.1 sp_1.4-5 blob_1.2.1 renv_0.13.2 > [13] yaml_2.2.1 globals_0.14.0 pillar_1.5.1 RSQLite_2.2.7 > [17] lattice_0.20-41 glue_1.4.2 digest_0.6.27 htmltools_0.5.1.1 > [21] pkgconfig_2.0.3 RPostgres_1.3.2 listenv_0.8.0 config_0.3.1 > [25] purrr_0.3.4 processx_3.5.1 openssl_1.4.3 tibble_3.1.0 > [29] proxy_0.4-25 aws.s3_0.3.21 colourvalues_0.3.7 generics_0.1.0 > [33] ellipsis_0.3.1 cachem_1.0.5 withr_2.4.1 furrr_0.2.3 > [37] cli_2.4.0 crayon_1.4.1 memoise_2.0.0 evaluate_0.14 > [41] ps_1.6.0 fs_1.5.0 future_1.21.0 fansi_0.4.2 > [45] parallelly_1.25.0 xml2_1.3.2 class_7.3-18 rsconnect_0.8.18 > [49] tools_4.0.5 data.table_1.14.0 hms_1.0.0 lifecycle_1.0.0 > [53] stringr_1.4.0 callr_3.6.0 jquerylib_0.1.4 compiler_4.0.5 > [57] e1071_1.7-6 rlang_0.4.10 classInt_0.4-3 units_0.7-1 > [61] grid_4.0.5 rstudioapi_0.13 visNetwork_2.0.9 htmlwidgets_1.5.3 > [65] aws.signature_0.6.0 crosstalk_1.1.1 igraph_1.2.6 base64enc_0.1-3 > [69] rmarkdown_2.7 codetools_0.2-18 DBI_1.1.1 curl_4.3 > [73] R6_2.5.0 lubridate_1.7.10 knitr_1.31 fastmap_1.1.0 > [77] rgeos_0.5-5 bit_4.0.4 utf8_1.2.1 tarchetypes_0.2.1 > [81] readr_1.4.0 KernSmooth_2.23-18 stringi_1.5.3 parallel_4.0.5 > [85] Rcpp_1.0.6 vctrs_0.3.7 sf_0.9-8 leaflet_2.0.4.1 > [89] dbplyr_2.1.1 tidyselect_1.1.0 xfun_0.22 >Reporter: Jared Lander >Assignee: Nicola Crane >Priority: Major > Labels: bug > Fix For: 6.0.0 > > > Using {{open_dataset()}} on a CSV without a header row, followed by {{collect()}}, results either in a {{tibble}} of {{NA}}s or an error, depending on whether the first row of data contains duplicate values. This affects reading one file or a directory of files. > Here we use the `diamonds` data, where the first row of data does not have any repeat values. > {code:java} > library(arrow) > library(magrittr) > data(diamonds, package='ggplot2') > readr::write_csv(head(diamonds), file='diamonds_with_header.csv', col_names=TRUE) > readr::write_csv(head(diamonds), file='diamonds_without_header.csv', col_names=FALSE) > diamond_schema <- schema( > carat=float32() > , cut=string() > , color=string() > , clarity=string() > , depth=float32() > , table=float32() > , price=float32() > , x=float32() > , y=float32() > , z=float32() > ) > diamonds_with_headers <- open_dataset('diamonds_with_header.csv', schema=diamond_schema, format='csv') > diamonds_without_headers <- open_dataset('diamonds_without_header.csv', schema=diamond_schema, format='csv') > # this works > diamonds_with_headers %>% collect() > # A tibble: 6 x 10 > carat cu
[jira] [Assigned] (ARROW-14063) [R] open_dataset() does not work on CSVs without header rows
[ https://issues.apache.org/jira/browse/ARROW-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Crane reassigned ARROW-14063: Assignee: Nicola Crane > [R] open_dataset() does not work on CSVs without header rows > > > Key: ARROW-14063 > URL: https://issues.apache.org/jira/browse/ARROW-14063 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 5.0.0 >Reporter: Jared Lander >Assignee: Nicola Crane >Priority: Major > Labels: bug > Fix For: 6.0.0 > > > Using {{open_dataset()}} on a CSV without a header row, followed by {{collect()}}, results either in a {{tibble}} of {{NA}}s or an error, depending on whether the first row of data contains duplicate values. This affects reading one file or a directory of files. > Here we use the `diamonds` data, where the first row of data does not have any repeat values. > {code:java} > library(arrow) > library(magrittr) > data(diamonds, package='ggplot2') > readr::write_csv(head(diamonds), file='diamonds_with_header.csv', col_names=TRUE) > readr::write_csv(head(diamonds), file='diamonds_without_header.csv', col_names=FALSE) > diamond_schema <- schema( > carat=float32() > , cut=string() > , color=string() > , clarity=string() > , depth=float32() > , table=float32() > , price=float32() > , x=float32() > , y=float32() > , z=float32() > ) > diamonds_with_headers <- open_dataset('diamonds_with_header.csv', schema=diamond_schema, format='csv') > diamonds_without_headers <- open_dataset('diamonds_without_header.csv', schema=diamond_schema, format='csv') > # this works > diamonds_with_headers %>% collect() > # A tibble: 6 x 10 > carat cut color clarity depth table price x y z > > 1 0.230 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 > 2 0.210 Premium E SI1 59.8
[jira] [Updated] (ARROW-14188) link error on ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Ghamarian updated ARROW-14188: --- Attachment: completerr.txt
[jira] [Updated] (ARROW-14188) link error on ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Ghamarian updated ARROW-14188: --- Attachment: (was: completerr.txt)
[jira] [Commented] (ARROW-13879) [C++] Mixed support for binary types in regex functions
[ https://issues.apache.org/jira/browse/ARROW-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423516#comment-17423516 ] David Li commented on ARROW-13879: -- Again, length() is only used if you construct from a pointer instead of a pointer + length, so this is still an issue with usage and not with string_view. > [C++] Mixed support for binary types in regex functions > --- > > Key: ARROW-13879 > URL: https://issues.apache.org/jira/browse/ARROW-13879 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Weston Pace >Assignee: Eduardo Ponce >Priority: Major > Labels: kernel, pull-request-available, types > Fix For: 6.0.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > The functions count_substring, count_substring_regex, find_substring, and find_substring_regex all accept binary types, but the functions extract_regex, match_substring, match_substring_regex, match_like, starts_with, ends_with, split_pattern, and split_pattern_regex do not. > They should either all accept binary types or none should.
[jira] [Commented] (ARROW-14188) link error on ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423512#comment-17423512 ] Amir Ghamarian commented on ARROW-14188: I added another log. Thanks [~kou].
[jira] [Updated] (ARROW-14188) link error on ubuntu
[ https://issues.apache.org/jira/browse/ARROW-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Ghamarian updated ARROW-14188: --- Attachment: completerr.txt
[jira] [Updated] (ARROW-14205) [C++]Add unicode normalization to scalar string
[ https://issues.apache.org/jira/browse/ARROW-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14205: --- Labels: pull-request-available (was: ) > [C++]Add unicode normalization to scalar string > --- > > Key: ARROW-14205 > URL: https://issues.apache.org/jira/browse/ARROW-14205 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Keisuke Okada >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Created] (ARROW-14205) [C++]Add unicode normalization to scalar string
Keisuke Okada created ARROW-14205: - Summary: [C++]Add unicode normalization to scalar string Key: ARROW-14205 URL: https://issues.apache.org/jira/browse/ARROW-14205 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Keisuke Okada
[jira] [Updated] (ARROW-13633) [Packaging][Debian] Add support for bookworm
[ https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13633: --- Labels: pull-request-available (was: ) > [Packaging][Debian] Add support for bookworm > > > Key: ARROW-13633 > URL: https://issues.apache.org/jira/browse/ARROW-13633 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Nicola Crane >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Assigned] (ARROW-13633) [Packaging][Debian] Add support for bookworm
[ https://issues.apache.org/jira/browse/ARROW-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-13633: Assignee: Kouhei Sutou > [Packaging][Debian] Add support for bookworm > > > Key: ARROW-13633 > URL: https://issues.apache.org/jira/browse/ARROW-13633 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Nicola Crane >Assignee: Kouhei Sutou >Priority: Major
[jira] [Commented] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423486#comment-17423486 ] Truc Lam Nguyen commented on ARROW-14196: - [~judah.rand] Sorry, I don't really understand your comment. Could you please explain a little more? Thanks. > [C++][Parquet] Default to compliant nested types in Parquet writer > -- > > Key: ARROW-14196 > URL: https://issues.apache.org/jira/browse/ARROW-14196 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet >Reporter: Joris Van den Bossche >Priority: Major > > In C++ there is already an option to get the "compliant_nested_types" (to have the list columns follow the Parquet specification), and ARROW-11497 exposed this option in Python. > This is still set to False by default, but in the source it says "TODO: At some point we should flip this.", and in ARROW-11497 there was also some discussion about what it would take to change the default. > cc [~emkornfield] [~apitrou]
[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
[ https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14204: --- Labels: pull-request-available (was: ) > [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike > > > Key: ARROW-14204 > URL: https://issues.apache.org/jira/browse/ARROW-14204 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Eduardo Ponce >Assignee: Eduardo Ponce >Priority: Major > Labels: pull-request-available > Fix For: 6.0.0 > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
[ https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eduardo Ponce updated ARROW-14204: -- Summary: [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike (was: [C++] Fails to compile Arrow without RE2 due to missing ifdef guard) > [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike > > > Key: ARROW-14204 > URL: https://issues.apache.org/jira/browse/ARROW-14204 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Eduardo Ponce >Assignee: Eduardo Ponce >Priority: Major > Fix For: 6.0.0 > > > [*RegexSubstringMatcher* is available only when RE2 is enabled, as it is guarded with #ifdef ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862] but it is [used here without the RE2 guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113], so compilation fails.
[jira] [Updated] (ARROW-14204) [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
[ https://issues.apache.org/jira/browse/ARROW-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eduardo Ponce updated ARROW-14204: -- Description: [*RegexSubstringMatcher* is available only when RE2 is enabled, as it is guarded with #ifdef ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862] but it is [used in the MatchLike kernel without the RE2 guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113], so compilation fails. (was: [*RegexSubstringMatcher* is available only when RE2 is enabled, as it is guarded with #ifdef ARROW_WITH_RE2|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L861-L862] but it is [used here without the RE2 guard|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1113], so compilation fails.) > [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike > > > Key: ARROW-14204 > URL: https://issues.apache.org/jira/browse/ARROW-14204 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Eduardo Ponce >Assignee: Eduardo Ponce >Priority: Major > Fix For: 6.0.0