bknakker commented on issue #45956:
URL: https://github.com/apache/arrow/issues/45956#issuecomment-2790443728
Thanks for the response! is_arrow_altrep is exactly what I was looking for;
somehow I failed to look at the unexported functions.
test_arrow_altrep_force_materialize() seems to call
test_arrow_altrep_is_materialized(). For a column taken from
df=collect(arrowobj), e.g. x=df$col1, is_arrow_altrep(x) returns TRUE, but
test_arrow_altrep_is_materialized(x) also returns TRUE, and consequently
test_arrow_altrep_force_materialize(x) throws an error that the array is
already materialized. If I pull the same (integer) vector from the arrow object
instead (xa=pull(arrowobj,col1)), is_arrow_altrep(xa) is also TRUE, but
test_arrow_altrep_is_materialized(xa) returns FALSE. So a vector can be
materialized but still be an ALTREP, which means that your proposed function
doesn't un-altrep the variable.
```
> .Internal(inspect(x))
@0x000001eb6c9b6e30 14 REALSXP g1c0 [MARK,REF(65535)] materialized arrow::array_dbl_vector len=15812
> .Internal(inspect(xa))
@0x000001eb1d8cf390 14 REALSXP g0c0 [REF(65535)] arrow::array_dbl_vector<0x000001eb0881e510, double, 183 chunks, 0 nulls> len=3174882
```
I thought that if I raised this problem here, it might turn out that my idea is
not the right solution. Tbh I'm not sure. It's a broader question at the level
of the R ecosystem that I don't fully understand yet. I'm trying to learn the
philosophy behind ALTREP: how it is supposed to work and how it is supposed to
be used. I think I'll ask on R-help about what to read, and maybe about
specific design principles. So I'm a bit hesitant to do a PR, though I would be
happy to contribute. (I'm not a software engineer by training, but the idea of
contributing even little things to great open source software is exciting and
would be an honour to me.)
As for the RData file, the specific one I save consists of multiple data
frames and variables, including data frames that result from quite a few
transformations of the data frames loaded by arrow. Also, I needed to transfer
the data to a junior colleague, so I didn't want to bother them with details of
data storage formats. It might not be the best solution, but it is what I
needed at the time, and I'm quite sure other R users run into this as well. I
might be wrong, but I feel that, alongside arrow / IPC formats, RData still has
its place and use case in a project or system.
Short sessionInfo:
```
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
arrow_19.0.1
```