EinMaulwurf commented on issue #45601: URL: https://github.com/apache/arrow/issues/45601#issuecomment-2907906623
Hi, thanks for working on this problem. My dataset consisted of a single Parquet file. I just ended up reading it into R completely (`arrow::read_parquet()`) and then removing all the labels (`haven::zap_labels()` and similar) before continuing work. I did not need the labels; they were only in the dataset because it had been exported from Stata, where labels are more commonly used (or so I've heard).

I also tested your suggestion with `map_batches()`. It works, but it is almost 100 times slower than a "normal" pipeline without `map_batches()` on the same dataset without labels. For my case, I would prefer a solution where arrow simply drops or ignores all labels (perhaps with a warning).
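For reference, the workaround I used can be sketched roughly like this (a minimal sketch; the file name and the exact set of `zap_*()` calls are placeholders for what I actually ran, and the `zap_label()`/`zap_formats()` calls are the haven helpers I mean by "and similar"):

```r
library(arrow)
library(haven)

# Read the whole Parquet file eagerly into an R data frame;
# labelled columns come through as haven_labelled vectors.
df <- arrow::read_parquet("data.parquet")

# Strip the Stata-style attributes so downstream arrow/dplyr
# pipelines only see plain atomic vectors.
df <- haven::zap_labels(df)   # drop value labels
df <- haven::zap_label(df)    # drop variable labels
df <- haven::zap_formats(df)  # drop Stata display-format attributes
```

This defeats the point of lazy evaluation, of course, but for a single file it was faster in practice than the `map_batches()` route.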
