paleolimbot commented on issue #36274: URL: https://github.com/apache/arrow/issues/36274#issuecomment-1719510628
> Does the time complexity of that scale with the number of chunks or the number of entries or neither?

I am actually not sure of the time complexity of `as_record_batch_reader()` or what it depends on. My guess is that it is very low but that it might become observable if your table has many (thousands of) columns. It almost certainly does not depend on the number of rows, but it might depend on the number of chunks. You will have to benchmark and see for the type of data you're planning to pass.

> My understanding is that it is possible to do a `std::shared_ptr<arrow::Table>` to `pyarrow::Table` cast, which means that there isn't any table copying going on.

If you control the builds of both the arrow R package and whatever C++ you're writing (e.g., by setting `ARROW_HOME` and building your own arrow R package, or by distributing an R package via conda-forge), you can do this in R too. It is my understanding that the only way to do this safely in Python is via an `arrow-cpp` conda dependency (i.e., distribution via `pip` would be unsafe). Similarly, if you did this in R, distribution via the usual packaging process would not be safe because you do not control the build of the arrow R package.
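
For reference, here is a minimal sketch of what the Python-direction handoff can look like on the C++ side, using the wrap helpers declared in `arrow/python/pyarrow.h`. This is only a sketch under the assumption above: it is safe only when your extension and pyarrow are built against the exact same libarrow (e.g., both via an `arrow-cpp` conda dependency); the function name `table_to_pyarrow` is just illustrative.

```cpp
// Sketch: expose an existing arrow::Table to Python as a pyarrow.Table
// without copying any buffers. Safe only when this code and pyarrow share
// the same libarrow build (e.g., both from conda-forge).
#include <Python.h>
#include <arrow/api.h>
#include <arrow/python/pyarrow.h>

PyObject* table_to_pyarrow(const std::shared_ptr<arrow::Table>& table) {
  // Must be called before using the wrap/unwrap helpers; returns non-zero
  // (with a Python error set) if pyarrow cannot be imported.
  if (arrow::py::import_pyarrow() != 0) {
    return nullptr;
  }
  // The returned pyarrow.Table shares the std::shared_ptr with the C++ side,
  // so no table data is copied.
  return arrow::py::wrap_table(table);
}
```

Going the other direction (`arrow::py::unwrap_table()`) carries the same build-compatibility caveat.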
