Ext3h commented on code in PR #48192:
URL: https://github.com/apache/arrow/pull/48192#discussion_r2569505334
##########
cpp/src/arrow/util/compression_zstd.cc:
##########
@@ -187,9 +202,14 @@ class ZSTDCodec : public Codec {
DCHECK_EQ(output_buffer_len, 0);
output_buffer = &empty_buffer;
}
-
- size_t ret = ZSTD_decompress(output_buffer,
static_cast<size_t>(output_buffer_len),
- input, static_cast<size_t>(input_len));
+ // Decompression context for ZSTD contains several large heap allocations.
Review Comment:
> Arrow uses its internal thread pool extensively, so that doesn't apply
here.
One global thread pool, and the size of that pool is bounded to the number
of CPU cores. From my perspective, this means that there is already a sane
limit in the number of threads.
Meanwhile, for any application I can think of, it should be more or less be
safe to assume that any thread that did require a `thread_local` scoped context
once, will also either need it again during the lifetime of the process, or the
thread will have a limited life time itself. It's not like someone is going to
spawn thousands of threads for single-shot compression, and then somehow
additionally keeps all of them around indefinitely without ever using them for
the same task again. (While also not having enough RAM to pay for that
allocated context, while still having enough RAM to pay for the remaining
overhead of an OS thread...)
> By the way, a simpler alternative to this non-static ThreadLocal class is
to manage a bounded-size freelist of contexts at the Codec level. This would
probably cut down on implementation complexity, though it would also limit
context reuse if the number of decompressing threads goes above the limit.
That would imply a global synchronization primitive on the freelist, and
also completely messes up the cache locality in case the cached context is
jumping cores due to the non-local properties of any naively implemented
freelist. You'd at least have to implement an instance stealing pattern in
order to preserve locality in the good path. Also "bounded-size" - assuming
that you had wanted to block until an instance became free - means there's now
a source of priority inversion, and the only way to avoid is to
"coincidentally" set the upper bound exactly to the number of threads.
The only potential benefit was to be had in the case of potentially
short-lived threads - but the overhead of creating a fresh thread dominates the
cost for that ZSTD context allocation bound to the TLS by magnitudes so
certainly not worth the effort to optimize for.
By all means, `thread_local` is appropriate here for holding the context.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]