thisisnic commented on issue #45601:
URL: https://github.com/apache/arrow/issues/45601#issuecomment-2908121045
Thanks for trying it out and confirming the file format! Another temporary
workaround, which may be quicker than the `map_batches()` solution, would be
to rewrite the dataset without the labels:
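```
library(arrow)
library(dplyr)

# One-off: rewrite the dataset with the labelled column cast to a plain integer
open_dataset(whatever) %>%
  mutate(col = cast(col, int32())) %>%
  write_dataset(newlocation)

# Afterwards, query the rewritten copy
open_dataset(newlocation) %>%
  filter(col > 3) %>%
  collect()
```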
Again, I'm thinking of this as a workaround. Ideally we could just operate on
the underlying storage type (i.e. an integer), but that is a much bigger
design decision. Simply dropping the labels is a bit problematic because it
breaks roundtrip fidelity (i.e. some people might want to be able to read and
write the data without losing the labels). Will keep iterating on a solution!