paleolimbot commented on code in PR #12817:
URL: https://github.com/apache/arrow/pull/12817#discussion_r855310102
##########
r/R/python.R:
##########
@@ -105,19 +105,25 @@ py_to_r.pyarrow.lib.ChunkedArray <- function(x, ...) {
}
r_to_py.Table <- function(x, convert = FALSE) {
- # Import with convert = FALSE so that `_import_from_c` returns a Python object
- pa <- reticulate::import("pyarrow", convert = FALSE)
- out <- pa$Table$from_arrays(x$columns, schema = x$schema)
- # But set the convert attribute on the return object to the requested value
+ # Going through RecordBatchReader maintains schema metadata (e.g.,
Review Comment:
I made a JIRA (ARROW-16269) for the schema metadata thing. The gist of it is
that the schema metadata roundtrips fine, but you end up with a situation
where `roundtripped_table$col1$type` isn't the same as
`roundtripped_table$schema$col1$type`.
Going through `RecordBatchReader` does re-chunk everything to line up into
record batches, BUT the slicing is zero-copy (according to the comments:
https://github.com/apache/arrow/blob/1157e677f9ba3e6d5b203adde4756a2e4d178713/cpp/src/arrow/table.h#L240-L244
). It looks like it zero-copy slices everything to match the column with the
smallest chunks (see details below). I don't think this will materially matter,
although it would be nice to avoid the potential re-chunking (I made
ARROW-16269 to see if we can't reinstate column-wise conversion).
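To make the re-chunking concrete, here is a hedged sketch (not from this PR)
of what the reader does with mismatched chunk layouts, assuming a version of
the R package that provides `as_record_batch_reader()`:

```r
library(arrow)

# Two columns whose chunk boundaries don't line up
tbl <- Table$create(
  a = chunked_array(1:2, 3:6),  # chunk lengths 2 and 4
  b = chunked_array(1:4, 5:6)   # chunk lengths 4 and 2
)

reader <- as_record_batch_reader(tbl)

# If the reader follows the TableBatchReader behaviour linked above, each
# batch is a zero-copy slice and the boundaries follow the smallest remaining
# chunk across the columns (here: 2, 2, 2)
while (!is.null(batch <- reader$read_next_batch())) {
  print(batch$num_rows)
}
```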