pitrou commented on code in PR #48192:
URL: https://github.com/apache/arrow/pull/48192#discussion_r2569344430
##########
cpp/src/arrow/util/compression_zstd.cc:
##########
@@ -187,9 +202,14 @@ class ZSTDCodec : public Codec {
DCHECK_EQ(output_buffer_len, 0);
output_buffer = &empty_buffer;
}
-
- size_t ret = ZSTD_decompress(output_buffer, static_cast<size_t>(output_buffer_len),
- input, static_cast<size_t>(input_len));
+ // Decompression context for ZSTD contains several large heap allocations.
Review Comment:
> Can't really say I like the idea though, that bad use of that interface
(i.e. not _**explicitly**_ sharing the factory as the user of the outermost
interface!) would still inevitably result in carrying multiple instances of the
thread local scope instead.
Two non-exclusive answers: 1) document the intended usage; 2) make caching
optional in the `Codec` constructor.
> Makes it harder to use, and potentially even backfires in terms of peak
memory consumption. Even worse, that map in the TLS is accumulating a memory
leak of expired shared ptrs - which due to extensive use of `std::make_shared`
are usually actually fused allocations.
Right, but the underlying compression context will be destroyed anyway,
which is what matters.
(A possible mitigation for the leftover entries is to scrub expired
`weak_ptr`s from the map based on some heuristic.)
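To make the scrubbing idea concrete, here is a minimal sketch, not code from this PR. `FakeContext`, `ContextCache`, `LocalCache` and the "scrub once the map has roughly doubled" heuristic are all hypothetical names and choices made up for illustration. `FakeContext` merely stands in for an expensive object such as a ZSTD decompression context.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for an expensive context (e.g. a ZSTD context).
struct FakeContext {
  int id;
};

// Assumed tuning constant: never scrub while the map is smaller than this.
constexpr std::size_t kMinScrubSize = 4;

// Hypothetical cache. It stores only weak_ptrs, so it never keeps a context
// alive by itself; callers own the returned shared_ptr.
class ContextCache {
 public:
  // Returns the live context for `key`, creating one if none is alive.
  std::shared_ptr<FakeContext> GetOrCreate(int key) {
    auto it = entries_.find(key);
    if (it != entries_.end()) {
      if (auto alive = it->second.lock()) return alive;
    }
    auto created = std::make_shared<FakeContext>(FakeContext{key});
    entries_[key] = created;
    // Heuristic (an assumption): scrub only when the map has grown to the
    // size computed after the previous scrub, which amortizes the scan.
    if (entries_.size() >= next_scrub_size_) ScrubExpired();
    return created;
  }

  // Erases entries whose context is gone. With make_shared this is also what
  // finally releases the fused control-block allocation.
  void ScrubExpired() {
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (it->second.expired()) {
        it = entries_.erase(it);
      } else {
        ++it;
      }
    }
    next_scrub_size_ = 2 * entries_.size() + kMinScrubSize;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::unordered_map<int, std::weak_ptr<FakeContext>> entries_;
  std::size_t next_scrub_size_ = kMinScrubSize;
};

// One cache per thread, mirroring a thread_local map.
inline ContextCache& LocalCache() {
  thread_local ContextCache cache;
  return cache;
}
```

The context itself is destroyed as soon as its last `shared_ptr` owner goes away. The scrub only reclaims the map entries and the control blocks that expired `weak_ptr`s keep allocated.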
> assuming that threads are usually a resource developers are good at
tracking and tearing down when no longer intended for re-use
Arrow uses its internal thread pool extensively, so that doesn't apply here.