[ https://issues.apache.org/jira/browse/ARROW-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104730#comment-17104730 ]
Neal Richardson commented on ARROW-8748:
----------------------------------------

We could add methods to concatenate Tables in Arrow memory (the function probably exists in the C++ library). But I'm not sure that's the best solution to your problem. If you have several Tables and want to write them to a single file, you don't need to concatenate them in memory first. You can use the lower-level {{RecordBatchStreamWriter}} that {{write_ipc_stream}} wraps. Something like:

{code:r}
library(arrow)

# `batches` is a list of RecordBatches (or Tables) that all share one schema;
# `file_name` is the path of the combined output file
file_obj <- FileOutputStream$create(file_name)
writer <- RecordBatchStreamWriter$create(file_obj, batches[[1]]$schema)
for (batch in batches) {
  writer$write(batch)  # appended to the stream one at a time, no concatenation
}
writer$close()
file_obj$close()
{code}

See {{?RecordBatchWriter}}.

> [R] Implementing methods for combining Arrow tables using dplyr::bind_rows
> and dplyr::bind_cols
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8748
>                 URL: https://issues.apache.org/jira/browse/ARROW-8748
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: features, performance, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> First of all, many thanks for your hard work! I was quite excited when you
> implemented some basic functions of the {{dplyr}} package. Is there a way to
> combine two or more Arrow tables into one, by rows or by columns? At the
> moment my workaround looks like this:
> {code:r}
> dplyr::bind_rows(
>   "a" = arrow.table.1 %>% dplyr::collect(),
>   "b" = arrow.table.2 %>% dplyr::collect(),
>   "c" = arrow.table.3 %>% dplyr::collect(),
>   "d" = arrow.table.4 %>% dplyr::collect(),
>   .id = "ID"
> ) %>%
>   arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
> {code}
> But this is not really a good solution, because it pulls all of the data back
> into the R environment as data frames/tibbles, which can exhaust the
> available RAM.
> Perhaps you have a better workaround at hand.
> It would be great if you could implement the {{bind_rows}} and
> {{bind_cols}} methods provided by {{dplyr}}, so that this works directly on
> Arrow tables:
> {code:r}
> dplyr::bind_rows(
>   "a" = arrow.table.1,
>   "b" = arrow.table.2,
>   "c" = arrow.table.3,
>   "d" = arrow.table.4,
>   .id = "ID"
> ) %>%
>   arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
> {code}
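A minimal, self-contained sketch of the streaming approach from the comment, assuming a reasonably recent {{arrow}} R package. Two small hypothetical Tables ({{a}} and {{b}}, one integer column {{x}} each) stand in for the reporter's tables. They are written one after another to a temporary IPC stream file and read back as a single Arrow Table, never being converted to R data frames. The sketch does not reproduce the {{.id}} column of {{bind_rows}}, and it assumes every Table has exactly the same schema as the first one.

```r
library(arrow)

# Hypothetical inputs: Arrow Tables that share an identical schema
tables <- list(
  a = Table$create(x = 1:2),
  b = Table$create(x = 3:4)
)

path <- tempfile(fileext = ".arrow")

# Open the output file and a stream writer using the first Table's schema
sink <- FileOutputStream$create(path)
writer <- RecordBatchStreamWriter$create(sink, tables[[1]]$schema)

# Append each Table to the same stream; nothing is concatenated in memory
for (tab in tables) {
  writer$write(tab)
}
writer$close()
sink$close()

# Read the whole stream back, keeping it as an Arrow Table
combined <- read_ipc_stream(path, as_data_frame = FALSE)
print(combined$num_rows)
```

The resulting file is an ordinary IPC stream, so it can be read with {{read_ipc_stream}} like the output of {{write_ipc_stream}}. Because each Table is handed to the writer in turn, no combined copy of the data is built in R.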