jonkeane commented on pull request #11369:
URL: https://github.com/apache/arrow/pull/11369#issuecomment-952072917
Hmmm, actually one of those just finished. Maybe this is “just” paying the
cost of reading from disk plus the Arrow-to-R conversion all at once on the
write, but I’m surprised the second write here is _so much_ slower than the
first:
```
> df <- data.frame(
+ col_letters = sample(LETTERS, 10000000, replace = TRUE)
+ )
>
> system.time({
+ write_parquet(df, "df.parquet")
+ })
   user  system elapsed
  0.633   0.042   0.681
>
> df_rt <- read_parquet("df.parquet")
>
> system.time({
+ write_parquet(df_rt, "df_again.parquet")
+ })
   user  system elapsed
 94.758  17.734 114.312
```
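One way to check the guess that the second write is absorbing a deferred Arrow-to-R conversion would be to time the pieces separately. The sketch below is only that, a sketch: the smaller row count `n`, the temporary paths, and the use of `nchar()` to touch every string before the timed write are my assumptions, and whether touching the column actually moves the conversion out of the write may depend on the arrow build in use.

```r
library(arrow)

# Assumption: fewer rows than the 10 million above so the sketch runs quickly.
n <- 1e5
df <- data.frame(col_letters = sample(LETTERS, n, replace = TRUE))

path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# 1. Keep the data as an Arrow Table, so the write involves no
#    Arrow-to-R conversion at all.
tbl <- read_parquet(path, as_data_frame = FALSE)
out_table <- tempfile(fileext = ".parquet")
t_table <- system.time(write_parquet(tbl, out_table))

# 2. Read into a data.frame, touch every string first (timed on its own),
#    and only then time the write.
df_rt <- read_parquet(path)
t_touch <- system.time(invisible(nchar(df_rt$col_letters)))
out_df <- tempfile(fileext = ".parquet")
t_df <- system.time(write_parquet(df_rt, out_df))

print(t_table)  # write from the Arrow Table
print(t_touch)  # touching the round-tripped column
print(t_df)     # write from the round-tripped data.frame
```

If most of the time shows up in `t_touch` rather than `t_df`, that would point at the conversion; if `t_df` is still much slower than the first write, the write path itself is the more likely culprit.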