thisisnic commented on issue #45601:
URL: https://github.com/apache/arrow/issues/45601#issuecomment-2907773208
Hi @EinMaulwurf, would this temporary workaround work for you on the kind of datasets you're working with, or is it too slow? You can use `map_batches()` to convert the labelled column to a data type that Arrow can work with. It will be slower, though, because the conversion is done in R rather than in Arrow. Here's an example with a small dataset:
```r
library(haven)
library(arrow)
library(tibble)
library(dplyr)

# Create a small data frame with labelled columns and write it to Parquet
d <- tibble(
  a = labelled(x = 1:5, label = "example variable a"),
  b = labelled(x = 11:15, label = "example variable b")
)
tf <- tempfile()
write_parquet(d, tf)

open_dataset(tf) %>%
  map_batches(~mutate(., a = as.integer(a))) %>% # remove labels
  filter(a > 3) %>%
  collect() %>%
  mutate(a = labelled(a, label = "example variable a")) # restore labels on output data
```