EB80 edited a comment on issue #11665:
URL: https://github.com/apache/arrow/issues/11665#issuecomment-975690421
I expect that the issue with arrow::read_feather was just because I had used
the very old feather::write_feather to write the file.
I have the following code to test arrow::write_feather:
```R
rm(list = ls())
# set the wd to be where this script is saved
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
# set dimensions based on the real file
numRows = 26e6 # 26M rows in the real file
numCols = 150 # 150 columns in the real file
# whip up a fake dataframe
fakeDataframe <- as.data.frame(matrix("fake string", numRows, numCols))
# change the column names for aesthetic purposes, I guess
names(fakeDataframe) <- sprintf("Fake Column %s", 1:numCols)
# save the fake file with data.table
data.table::fwrite(fakeDataframe, "fakeFile.csv")
# save the fake file with arrow
arrow::write_feather(fakeDataframe, "fakeFile.feather")
```
The fwrite step took about 10 minutes. While the dimensions of the fake file
match those of the real file, its size on disk is much larger (46 GB versus
32 GB). I wrote a while loop to trim rows off the fake dataframe until its size
matched the real file, but object.size() was painfully slow. Either way, I
figured this would be a suitable test: arrow::write_feather hangs with this
fake dataframe just as before.
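As an aside, one way to avoid the slow object.size() loop might be to scale the row count once by the ratio of on-disk sizes and trim in a single step. This is only a rough sketch, assuming the real file is a CSV in the working directory; "realFile.csv" is a placeholder name, not the actual file:
```R
# Sketch: instead of trimming rows in a loop and calling object.size()
# each time, scale the row count once by the ratio of on-disk sizes.
targetBytes <- file.size("realFile.csv")    # ~32 GB real file (placeholder name)
currentBytes <- file.size("fakeFile.csv")   # ~46 GB fake file written above
keepRows <- floor(nrow(fakeDataframe) * targetBytes / currentBytes)
trimmedDataframe <- fakeDataframe[seq_len(keepRows), ]
# Try the feather write again on the trimmed copy
arrow::write_feather(trimmedDataframe, "trimmedFakeFile.feather")
```
This is only approximate, since CSV bytes per row are not perfectly uniform, but it avoids repeatedly measuring a ~40 GB object in memory.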