jonkeane commented on PR #45346:
URL: https://github.com/apache/arrow/pull/45346#issuecomment-2614454649

   Thanks for all the info @TysonStanley and @ben-schwen 
   
   > "Safe option" would be not to drop the data.table class but only drop the 
`.internal.selfref` attribute, so data.table will detect that and allocate the 
right memory. ... I consider the initial report as WAI, although our warning 
could be better since we also warn about the serializing case when `saveRDS` 
and subsequent `readRDS`.
   
   _nods_ yeah, this sounds good to me. I'm happy to help with a PR for 
improving that warning in `data.table`. It makes a lot of sense for it to be 
there since it's about serializing those objects in any fashion, not just with 
Arrow/Parquet, but I wonder if there is anything that we should add to our 
documentation that calls this out as well. We could add something to 
https://arrow.apache.org/docs/r/articles/metadata.html#r-object-attributes 
which calls this case out as one of the cases where we can't preserve a portion 
of metadata, but the way that `data.table` works that's working as expected and 
self heals, etc.
   
   > We can silence the warning if we do what @nealrichardson suggested in 
https://github.com/apache/arrow/pull/45346#issuecomment-2613008898 though I'm 
not sure whether we want to or not. I pushed up 
https://github.com/apache/arrow/commit/459bde8b4bac76d31e70b8fcf45088ef1d99ea8b 
for discussion.
   
   Thanks for pushing this up for discussion (and extra thanks for jumping on 
this overall), IMHO we probably don't want to do this. If I am understanding 
correctly, there will still be a (small) performance hit when the memory is 
allocated and we should defer to `data.table`'s self healing here since it 
knows best when and how (and can message appropriately that there might be a 
small performance degradation that is coming from the fact that any 
serialization happened). 
   
   It sounds like we will likely close this PR, but I can go back to the issue 
and follow up with a TL;DR there explaining what we learned here (and closing 
that too since it's working as intended). I'm a little surprised in the 
original issue they mention this (seeming to) work in older versions of Arrow. 
There's no way it _actually did_ serialize the pointer and bring it back 
faithfully, but maybe something in #41969 changed the serialization slightly 
such that `data.table` is now better able to recognize and warn about the need 
to self-heal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to