kou commented on code in PR #39164:
URL: https://github.com/apache/arrow/pull/39164#discussion_r1427550312
##########
cpp/src/arrow/ipc/reader.h:
##########
@@ -317,7 +317,12 @@ class ARROW_EXPORT Listener {
/// \since 0.17.0
class ARROW_EXPORT CollectListener : public Listener {
public:
- CollectListener() : schema_(), filtered_schema_(), record_batches_(), metadatas_() {}
+ explicit CollectListener(bool copy_record_batch = false)
Review Comment:
Oh, sorry. My comment was outdated:
https://github.com/apache/arrow/pull/39164#discussion_r1421772936
And it doesn't elaborate enough, so let me explain here.
`StreamDecoder::Consume(const uint8_t* data, int64_t size)` (as opposed to
`Consume(std::shared_ptr<Buffer> buffer)`) only assumes that the given
`data` is alive while `Consume()` is running; the caller may free `data` as
soon as the `Consume()` call has finished.
So the decoded record batches passed to `Listener` are only guaranteed to
be valid while `Consume()` is running. But `CollectListener` keeps the
decoded record batches after `Consume()` has finished. If we want the
decoded record batches to stay valid after the `Consume()` call, we need to
copy them.
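To make the hazard concrete, here is a minimal sketch (not code from this
PR; `DecodeChunk` is a hypothetical caller) of how the collected batches
can outlive the bytes they point into:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

#include <arrow/ipc/reader.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>

// Hypothetical caller: decodes one chunk of an IPC stream and returns the
// record batches collected so far.
arrow::Result<std::vector<std::shared_ptr<arrow::RecordBatch>>> DecodeChunk(
    const uint8_t* data, int64_t size) {
  auto listener = std::make_shared<arrow::ipc::CollectListener>();
  arrow::ipc::StreamDecoder decoder(listener);
  // The contract only requires `data` to stay alive during this call...
  ARROW_RETURN_NOT_OK(decoder.Consume(data, size));
  // ...but the collected batches may still reference `data`, which the
  // caller is free to release as soon as Consume() returns, so using the
  // returned batches later can be a use-after-free.
  return listener->record_batches();
}
```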
This only happens with `Consume(const uint8_t* data, int64_t size)`. It
doesn't happen with `Consume(std::shared_ptr<Buffer> buffer)`, because the
decoded record batches keep a reference to the given `buffer`. So I want to
avoid unconditionally copying all decoded record batches. That's why I
added the `copy_record_batch` option here.
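For comparison, a sketch of the two safe usages, assuming the
`copy_record_batch` constructor option lands as shown in the diff above:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

#include <arrow/buffer.h>
#include <arrow/ipc/reader.h>
#include <arrow/status.h>

arrow::Status DecodeSafely(std::string payload) {
  // Option 1 (this PR): ask CollectListener to copy each decoded batch, so
  // the collected batches own their data even after `payload` goes away.
  auto copying = std::make_shared<arrow::ipc::CollectListener>(
      /*copy_record_batch=*/true);
  arrow::ipc::StreamDecoder decoder1(copying);
  ARROW_RETURN_NOT_OK(decoder1.Consume(
      reinterpret_cast<const uint8_t*>(payload.data()),
      static_cast<int64_t>(payload.size())));

  // Option 2 (no copy needed): hand the decoder an owned Buffer; the
  // decoded batches keep a reference to it and stay valid after Consume().
  auto plain = std::make_shared<arrow::ipc::CollectListener>();
  arrow::ipc::StreamDecoder decoder2(plain);
  ARROW_RETURN_NOT_OK(
      decoder2.Consume(arrow::Buffer::FromString(std::move(payload))));
  return arrow::Status::OK();
}
```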
I hope this explains why...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]