nealrichardson commented on PR #12316:
URL: https://github.com/apache/arrow/pull/12316#issuecomment-1100072852

   The last failing test was that row-level metadata isn't included when you 
collect() a dataset, with a warning: 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-metadata.R#L274-L277
   
   What was actually happening is that no KeyValueMetadata was being preserved 
on write in this PR, we just don't have any tests currently around metadata in 
write_dataset, only this one about row-level metadata (in this PR, neither 
row-level nor any other kind of of metadata is present). 
   
   I pushed a fix that will grab the KVM from the source data object in the 
query and use that, and that fixes the failing tests, but I'm not sure that's 
totally correct. If you have a join or aggregation, it won't make sense; we 
have some logic in do_exec_plan() to handle some of this, should we factor that 
out and apply it in write_dataset()? Also more than happy to defer handling 
this since (afaict) this PR is consistent with the status quo, and we don't 
have any assertions about the behavior (that I can find) of general metadata on 
dataset write, nor any handling for amalgamating metadata when joining, etc. 
Your call @jonkeane.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to