[GitHub] [arrow] pitrou commented on a diff in pull request #36377: GH-36280: [Parquet][C++] FileWriter supports WriteTable in the buffered mode

via GitHub Wed, 19 Jul 2023 08:24:15 -0700


pitrou commented on code in PR #36377:
URL: https://github.com/apache/arrow/pull/36377#discussion_r1268238707



##########
cpp/src/parquet/arrow/writer.cc:
##########
@@ -488,6 +458,59 @@ class FileWriterImpl : public FileWriter {
   std::vector<ArrowWriteContext> parallel_column_write_contexts_;
 };
 
+template <typename T>
+Status FileWriterImpl::WriteBuffered(const T& batch, int64_t max_row_group_length) {
+  if (row_group_writer_ == nullptr || !row_group_writer_->buffered() ||
+      row_group_writer_->num_rows() >= max_row_group_length) {
+    RETURN_NOT_OK(NewBufferedRowGroup());
+  }
+
+  auto WriteBatch = [&](int64_t offset, int64_t size) {
+    std::vector<std::unique_ptr<ArrowColumnWriterV2>> writers;
+    int column_index_start = 0;
+
+    for (int i = 0; i < batch.num_columns(); i++) {
+      std::shared_ptr<ChunkedArray> chunked_array = GetColumnChunkedArray(batch, i);

Review Comment:
   Ok, then you can simply call `GetColumnChunkedArray(batch, i)` outside of `WriteBatch`. It does not depend on offset/size.
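   A minimal, self-contained sketch of the hoisting suggested above. It does not use Arrow: `ToyBatch`, `GetColumn`, `SumSlices` and the `g_column_lookups` counter are hypothetical stand-ins, and the lambda sums values instead of writing Parquet columns. The only point illustrated is that a per-column lookup which depends on neither `offset` nor `size` can be done once, before the per-slice lambda is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table or record batch; not an Arrow type.
struct ToyBatch {
  std::vector<std::vector<int>> columns;
  int num_columns() const { return static_cast<int>(columns.size()); }
};

// Counts column lookups so the effect of hoisting is observable.
static int g_column_lookups = 0;

// Hypothetical analogue of GetColumnChunkedArray(batch, i): the result depends
// only on the batch and the column index, never on offset/size.
const std::vector<int>& GetColumn(const ToyBatch& batch, int i) {
  ++g_column_lookups;
  return batch.columns[static_cast<std::size_t>(i)];
}

// Sums all values covered by the given (offset, size) row slices.
int64_t SumSlices(const ToyBatch& batch,
                  const std::vector<std::pair<int64_t, int64_t>>& slices) {
  // Hoisted: each column is looked up exactly once, outside the lambda.
  std::vector<const std::vector<int>*> columns;
  columns.reserve(static_cast<std::size_t>(batch.num_columns()));
  for (int i = 0; i < batch.num_columns(); i++) {
    columns.push_back(&GetColumn(batch, i));
  }

  int64_t total = 0;
  // Only the work that really depends on offset/size stays in the lambda.
  auto WriteBatch = [&](int64_t offset, int64_t size) {
    assert(offset >= 0 && size >= 0);
    for (const std::vector<int>* column : columns) {
      assert(offset + size <= static_cast<int64_t>(column->size()));
      for (int64_t row = offset; row < offset + size; row++) {
        total += (*column)[static_cast<std::size_t>(row)];
      }
    }
  };

  for (const auto& slice : slices) {
    WriteBatch(slice.first, slice.second);
  }
  return total;
}
```

   With two columns and two slices, `GetColumn` runs twice in this sketch; had the lookup stayed inside the lambda, it would run once per column per slice, i.e. four times.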



