kou commented on code in PR #39164:
URL: https://github.com/apache/arrow/pull/39164#discussion_r1427550312
##########
cpp/src/arrow/ipc/reader.h:
##########
@@ -317,7 +317,12 @@ class ARROW_EXPORT Listener {
/// \since 0.17.0
class ARROW_EXPORT CollectListener : public Listener {
public:
- CollectListener() : schema_(), filtered_schema_(), record_batches_(), metadatas_() {}
+ explicit CollectListener(bool copy_record_batch = false)
Review Comment:
Oh, sorry. My comment was outdated:
https://github.com/apache/arrow/pull/39164#discussion_r1421772936
And it doesn't elaborate enough, so let me explain here.
`StreamDecoder::Consume(const uint8_t* data, int64_t size)` (as opposed to
`Consume(std::shared_ptr<Buffer> buffer)`) only assumes that the given
`data` is alive while `Consume()` is running; the caller may free `data` as
soon as the `Consume()` call has finished.
So the decoded record batches passed to `Listener` are only guaranteed to
be valid while `Consume()` is running. But `CollectListener` keeps the
decoded record batches after `Consume()` has finished. If we want the
decoded record batches to stay valid after the `Consume()` call, we need to
copy them.
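To make the hazard concrete, here is a minimal sketch (not code from this
PR; `DecodeChunk` is a hypothetical caller) of how the collected batches
can outlive the bytes they point into:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

#include <arrow/ipc/reader.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>

// Hypothetical caller: decodes one chunk of an IPC stream and returns the
// record batches collected so far.
arrow::Result<std::vector<std::shared_ptr<arrow::RecordBatch>>> DecodeChunk(
    const uint8_t* data, int64_t size) {
  auto listener = std::make_shared<arrow::ipc::CollectListener>();
  arrow::ipc::StreamDecoder decoder(listener);
  // The contract only requires `data` to stay alive during this call...
  ARROW_RETURN_NOT_OK(decoder.Consume(data, size));
  // ...but the collected batches may still reference `data`, which the
  // caller is free to release as soon as Consume() returns, so using the
  // returned batches later can be a use-after-free.
  return listener->record_batches();
}
```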
This only happens with `Consume(const uint8_t* data, int64_t size)`. It
doesn't happen with `Consume(std::shared_ptr<Buffer> buffer)`, because the
decoded record batches keep a reference to the given `buffer`. So I want to
avoid unconditionally copying all decoded record batches. That's why I
added the `copy_record_batch` option here.
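For comparison, a sketch of the two safe usages, assuming the
`copy_record_batch` constructor option lands as shown in the diff above:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

#include <arrow/buffer.h>
#include <arrow/ipc/reader.h>
#include <arrow/status.h>

arrow::Status DecodeSafely(std::string payload) {
  // Option 1 (this PR): ask CollectListener to copy each decoded batch, so
  // the collected batches own their data even after `payload` goes away.
  auto copying = std::make_shared<arrow::ipc::CollectListener>(
      /*copy_record_batch=*/true);
  arrow::ipc::StreamDecoder decoder1(copying);
  ARROW_RETURN_NOT_OK(decoder1.Consume(
      reinterpret_cast<const uint8_t*>(payload.data()),
      static_cast<int64_t>(payload.size())));

  // Option 2 (no copy needed): hand the decoder an owned Buffer; the
  // decoded batches keep a reference to it and stay valid after Consume().
  auto plain = std::make_shared<arrow::ipc::CollectListener>();
  arrow::ipc::StreamDecoder decoder2(plain);
  ARROW_RETURN_NOT_OK(
      decoder2.Consume(arrow::Buffer::FromString(std::move(payload))));
  return arrow::Status::OK();
}
```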
I hope this explains why...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]