[ https://issues.apache.org/jira/browse/ARROW-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578358#comment-17578358 ]
Egill Axfjord Fridgeirsson commented on ARROW-17373:
----------------------------------------------------

After some further testing it seems the copying is unnecessary. Opening a large dataset and writing to a different location seems to produce the error in most cases.

Here is a slightly simpler reprex:
{code:java}
df <- data.frame(replicate(1,sample(0:1,100e6,rep=TRUE)))
savePath <- file.path(tempdir(), 'arrowTest')
if (!dir.exists(savePath)) {
  dir.create(savePath)
}
arrow::write_feather(df, file.path(savePath, 'part-0.feather'))
writePath <- file.path(tempdir(), 'arrowTest')
if (!dir.exists(writePath)) {
  dir.create(writePath)
}
dataset <- arrow::open_dataset(savePath, format='feather')
arrow::write_dataset(dataset = dataset, path = writePath, format = 'feather')
{code}

> [R] copying dataset and immediatly writing the copy to a different location fails
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-17373
>                 URL: https://issues.apache.org/jira/browse/ARROW-17373
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Ubuntu 22.10
>            Reporter: Egill Axfjord Fridgeirsson
>            Priority: Major
>
> When I copy large feather files, open a dataset from that file and immediately write that dataset to a new location I get the following error:
>
> ```Error: Invalid: Expected to read 144 metadata bytes but got 0```
>
> I have made a reproducible example below:
>
> ``` r
> df <- data.frame(replicate(1,sample(0:1,100e6,rep=TRUE)))
> savePath <- file.path(tempdir(), 'arrowTest')
> if (!dir.exists(savePath)) {
>   dir.create(savePath)
> }
> arrow::write_feather(df, file.path(savePath, 'part-0.feather'))
> copyPath <- file.path(tempdir(),'arrowTest')
> if (!dir.exists(copyPath)) {
>   dir.create(copyPath)
> }
> writePath <- file.path(tempdir(), 'arrowTest')
> if (!dir.exists(writePath)) {
>   dir.create(writePath)
> }
> arrow::copy_files(savePath, copyPath)
> dataset <- arrow::open_dataset(copyPath, format='feather')
> arrow::write_dataset(dataset = dataset, path = writePath, format = 'feather')
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)