egillax opened a new issue, #34519: URL: https://github.com/apache/arrow/issues/34519
### Describe the bug, including details regarding any error messages, version, and platform.

I was testing the latest arrow development version, installed from git using [this](https://arrow.apache.org/docs/dev/r/articles/install_nightly.html#install-from-git-repository) method. It now seems I cannot cast columns in a dataset: the cast results in `NA` values. I tried using both parquet and arrow files. This does work with the latest version on CRAN (11.0.0.3) and when using arrow tables instead of datasets.

Reprex:

``` r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp

mtcars %>% write_dataset('./mtcars/')

ds <- open_dataset('./mtcars')

ds %>% dplyr::collect()
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

ds %>% dplyr::mutate(mpg=as.numeric(mpg)) %>% dplyr::collect()
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1   NA   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2   NA   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3   NA   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4   NA   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5   NA   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6   NA   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7   NA   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8   NA   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9   NA   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10  NA   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11  NA   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12  NA   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13  NA   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14  NA   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15  NA   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16  NA   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17  NA   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18  NA   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19  NA   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20  NA   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21  NA   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22  NA   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23  NA   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24  NA   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25  NA   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26  NA   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27  NA   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28  NA   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29  NA   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30  NA   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31  NA   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32  NA   4 121.0 109 4.11 2.780 18.60  1  1    4    2
```

<sup>Created on 2023-03-09 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

<details>
<summary>Arrow Info</summary>

Arrow package version: 11.0.0.9000

Capabilities:

dataset TRUE
substrait FALSE
parquet TRUE
json TRUE
s3 FALSE
gcs FALSE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip FALSE
brotli FALSE
zstd FALSE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 FALSE
jemalloc FALSE
mimalloc TRUE

To reinstall with more optional capabilities enabled, see
https://arrow.apache.org/docs/r/articles/install.html

Memory:

Allocator mimalloc
Current 13.31 Kb
Max 46.31 Mb

Runtime:

SIMD Level avx2
Detected SIMD Level avx2

Build:

C++ Library Version 12.0.0-SNAPSHOT
C++ Compiler GNU
C++ Compiler Version 12.2.0
Git ID b679a96d426f4df1a2d15d452f312c968cdfc8f6

</details>

<details>
<summary>sessionInfo</summary>

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=nl_NL.UTF-8
[9] LC_ADDRESS=nl_NL.UTF-8 LC_TELEPHONE=nl_NL.UTF-8 LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=nl_NL.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] arrow_11.0.0.9000 dplyr_1.0.10 PatientLevelPrediction_6.2.0.9000

loaded via a namespace (and not attached):
[1] pkgload_1.3.2 bit64_4.0.5 jsonlite_1.8.4 DatabaseConnector_6.0.0 R.utils_2.12.2
[6] shiny_1.7.4 assertthat_0.2.1 highr_0.10 blob_1.2.3 remotes_2.4.2
[11] yaml_2.3.6 sessioninfo_1.2.2 pillar_1.8.1 RSQLite_2.2.18 lattice_0.20-45
[16] glue_1.6.2 reticulate_1.26 digest_0.6.31 promises_1.2.0.1 htmltools_0.5.4
[21] httpuv_1.6.8 Matrix_1.5-1 R.oo_1.25.0 clipr_0.8.0 pkgconfig_2.0.3
[26] devtools_2.4.5 purrr_1.0.1 xtable_1.8-4 processx_3.8.0 later_1.3.0
[31] ParallelLogger_3.0.1 tibble_3.1.8 styler_1.9.0 generics_0.1.3 usethis_2.1.6
[36] ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.6.0 magrittr_2.0.3
[41] crayon_1.5.2 mime_0.12 memoise_2.0.1 evaluate_0.20 ps_1.7.2
[46] R.methodsS3_1.8.2 Andromeda_1.0.0 fs_1.5.2 fansi_1.0.3 R.cache_0.16.0
[51] pkgbuild_1.4.0 SqlRender_1.12.0 profvis_0.3.7 tools_4.2.2 data.table_1.14.4
[56] prettyunits_1.1.1 lifecycle_1.0.3 stringr_1.5.0 reprex_2.0.2 callr_3.7.3
[61] compiler_4.2.2 rlang_1.0.6 grid_4.2.2 rstudioapi_0.14 htmlwidgets_1.6.1
[66] miniUI_0.1.1.1 rmarkdown_2.19 DBI_1.1.3 R6_2.5.1 knitr_1.41
[71] fastmap_1.1.0 bit_4.0.4 utf8_1.2.2 stringi_1.7.12 rJava_1.0-6
[76] parallel_4.2.2 Rcpp_1.0.9 vctrs_0.5.1 png_0.1-7 urlchecker_1.0.1
[81] tidyselect_1.2.0 FeatureExtraction_3.2.0 xfun_0.36

</details>

### Component(s)

R