jonkeane commented on a change in pull request #9743:
URL: https://github.com/apache/arrow/pull/9743#discussion_r606244216
##########
File path: r/R/util.R
##########
@@ -45,3 +45,16 @@ is_list_of <- function(object, class) {
}
empty_named_list <- function() structure(list(), .Names = character(0))
+
+read_compressed_error <- function(e) {
+ e <- as.character(e)
+ alg <- regmatches(e, gregexpr("(?<=\')(.*?)(?=\')", e, perl = TRUE))[[1]]
+ msg <- paste("Unsupported compressed format", alg,
+ "\nPlease visit https://arrow.apache.org/docs/r/articles/install.html",
+ "\nfor an explanation about optional features such as compression
libraries enabled.",
Review comment:
```suggestion
"\nfor information about troubleshooting installation issues.",
```
##########
File path: r/R/util.R
##########
@@ -45,3 +45,16 @@ is_list_of <- function(object, class) {
}
empty_named_list <- function() structure(list(), .Names = character(0))
+
+read_compressed_error <- function(e) {
+ e <- as.character(e)
+ alg <- regmatches(e, gregexpr("(?<=\')(.*?)(?=\')", e, perl = TRUE))[[1]]
+ msg <- paste("Unsupported compressed format", alg,
+ "\nPlease visit https://arrow.apache.org/docs/r/articles/install.html",
+ "\nfor an explanation about optional features such as compression
libraries enabled.",
+ "\nSetting LIBARROW_MINIMAL=false and then building the package from
source fixes this,",
+ sprintf("\nor building libarrow with -DARROW_WITH_%s=ON and reinstalling
the package.",
+ toupper(alg))
+ )
Review comment:
`stop` will `paste` these together, so you don't need to wrap this whole
thing in `paste`.
##########
File path: r/R/feather.R
##########
@@ -154,7 +154,15 @@ read_feather <- function(file, col_select = NULL,
as_data_frame = TRUE, ...) {
vars_select(names(reader), !!col_select)
}
- out <- reader$Read(columns)
+ out <- tryCatch(
Review comment:
This tryCatch + function is an improvement, but is there a reason you
didn't include the `tryCatch` in your function instead of repeating it?
Something like
```
ensure_compression_available <- function(expr) {
tryCatch(
expr,
error = ...
)
}
ensure_compression_available(reader$Read(columns))
```
It would also be nice to have a slightly more clear name for what the
function is doing (like `ensure_compression_available`) instead of
`ensure_compression_available`.
##########
File path: r/R/util.R
##########
@@ -45,3 +45,16 @@ is_list_of <- function(object, class) {
}
empty_named_list <- function() structure(list(), .Names = character(0))
+
+read_compressed_error <- function(e) {
+ e <- as.character(e)
+ alg <- regmatches(e, gregexpr("(?<=\')(.*?)(?=\')", e, perl = TRUE))[[1]]
+ msg <- paste("Unsupported compressed format", alg,
+ "\nPlease visit https://arrow.apache.org/docs/r/articles/install.html",
+ "\nfor an explanation about optional features such as compression
libraries enabled.",
+ "\nSetting LIBARROW_MINIMAL=false and then building the package from
source fixes this,",
+ sprintf("\nor building libarrow with -DARROW_WITH_%s=ON and reinstalling
the package.",
Review comment:
These two lines should be:
"Try setting the environment variable LIBARROW_MINIMAL=false and
reinstalling for a more complete installation (including {compression}) or
setting ARROW_WITH_{compression}=ON and reinstalling to enable support for this
codec."
Note that for everyone who is not a developer in this situation, they will
be end up triggering `inst/build_arrow_static.sh` which includes a line that
[checks that environment variable specifically to enable the correct cmake
flag](https://github.com/apache/arrow/blob/master/r/inst/build_arrow_static.sh#L63).
We want to make this as easy as possible for someone so we want to recommend
they do that instead of asking them to use cmake themselves, learn about all
the flags, etc.
You should try this out yourself by setting `ARROW_WITH_LZ4=ON` and then
using `install.packages("arrow")` and then checking that lz4 is enabled (or any
of the other flags that we turn on/off for our minimal builds.
##########
File path: r/R/util.R
##########
@@ -45,3 +45,16 @@ is_list_of <- function(object, class) {
}
empty_named_list <- function() structure(list(), .Names = character(0))
+
+read_compressed_error <- function(e) {
+ e <- as.character(e)
+ alg <- regmatches(e, gregexpr("(?<=\')(.*?)(?=\')", e, perl = TRUE))[[1]]
+ msg <- paste("Unsupported compressed format", alg,
+ "\nPlease visit https://arrow.apache.org/docs/r/articles/install.html",
+ "\nfor an explanation about optional features such as compression
libraries enabled.",
+ "\nSetting LIBARROW_MINIMAL=false and then building the package from
source fixes this,",
+ sprintf("\nor building libarrow with -DARROW_WITH_%s=ON and reinstalling
the package.",
+ toupper(alg))
+ )
+ stop(msg)
Review comment:
Could you add `call. = FALSE` here so the call isn't printed — I don't
think it will be particularly useful here.
##########
File path: r/tests/testthat/test-parquet.R
##########
@@ -234,3 +234,16 @@ test_that("ParquetFileReader $ReadRowGroup(s) methods", {
expect_true(reader$ReadRowGroups(c(0, 1), 0) == Table$create(x = 1:20))
expect_error(reader$ReadRowGroups(c(0, 1), 1))
})
+
+test_that("Error messages are shown when the compression algorithm lz4/snappy
+ is not found", {
+ if (codec_is_available("snappy")) {
+ d <- read_parquet(pq_file)
+ expect_is(d, "data.frame")
+ } else {
+ expect_error(
+ read_parquet(pq_file),
+ "Unsupported compressed format"
+ )
Review comment:
I don't think that the `else` branch of this test will ever be
exercised. On our minimal build test we disable (among other things) both
[snappy](https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=2724&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=6c939d89-0d1a-51f2-8b30-091a7a82e98c&l=309)
and
[parquet](https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=2724&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=6c939d89-0d1a-51f2-8b30-091a7a82e98c&l=248)
on that build. And then also note that [this entire file is skipped if parquet
is not
available.](https://github.com/apache/arrow/blob/master/r/tests/testthat/test-parquet.R#L18).
Though, if this test did run in an environment where both parquet and snappy
are disabled, it shouldn't be caught and talk about compression codecs, it
should mention that parquet support was not built instead (see [my comment
above](https://github.com/apache/arrow/pull/9743/files#r605862157) about that).
##########
File path: r/tests/testthat/test-feather.R
##########
@@ -196,3 +196,18 @@ test_that("Character vectors > 2GB can write to feather", {
})
unlink(feather_file)
+
+ft_file <- system.file("v0.7.1.feather", package = "arrow")
Review comment:
It looks like you added this feather file? That's shouldn't actually be
necessary. We have a number of feather files with different compression codecs
at https://github.com/apache/arrow/tree/master/r/tests/testthat/golden-files.
Please use one those if possible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]