phrocker commented on pull request #140: URL: https://github.com/apache/accumulo/pull/140#issuecomment-1018608281
> After some research into recent Hadoop improvements, and since production hasn't encountered memory issues with compressors recently, I think this can be closed. Hadoop made changes to `CodecPool` in version 2.9.0 that essentially skip pooling, one of the improvements this PR was making. Here is the code from `CodecPool` that checks for the `DoNotPool` annotation: https://github.com/apache/hadoop/blob/rel/release-2.9.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java#L160-L165

@milleruntime thanks for doing that leg work! I suspect this issue wouldn't be encountered while source limiting exists on a production system. To be honest, I haven't been involved enough to know, but I did confirm it in 2020 on a production system (which is what my prior comment was meant to say) in the last round of comments, using an iterator that deep copied a lot of sources.

I believe that annotation is still needed on that class, though. A cursory search (https://github.com/apache/hadoop/search?q=DoNotPool) shows that only the built-in gzip compressor/decompressor is excluded from pooling.

I think the primary motivation here was changing how delegation of compressors worked and making it pluggable, less so correcting the initial issue. But as Keith identified, that may be better suited to the SPI anyway, which means closing this makes sense.
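For anyone following along, the `DoNotPool` check linked above works roughly like the sketch below. This is a simplified illustration, not the real Hadoop code: `CodecPoolSketch`, the local `DoNotPool` annotation, and the compressor classes here are hypothetical stand-ins, but the core behavior matches what the linked lines do: when a compressor is returned, its class is checked for the annotation, and annotated instances are discarded rather than pooled.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for Hadoop's org.apache.hadoop.io.compress.DoNotPool annotation.
@Retention(RetentionPolicy.RUNTIME)
@interface DoNotPool {}

// Minimal compressor interface for the sketch.
interface Compressor {
  void reset();
}

// Like Hadoop's BuiltInGzipDecompressor, this class opts out of pooling.
@DoNotPool
class GzipLikeCompressor implements Compressor {
  public void reset() {}
}

// An ordinary compressor with no annotation; it gets pooled on return.
class PoolableCompressor implements Compressor {
  public void reset() {}
}

public class CodecPoolSketch {
  private static final Deque<Compressor> pool = new ArrayDeque<>();

  static int poolSize() {
    return pool.size();
  }

  // Mirrors the Hadoop 2.9.0 behavior: instances whose class carries
  // @DoNotPool are simply dropped (left for GC) instead of being cached.
  static void returnCompressor(Compressor c) {
    if (c.getClass().isAnnotationPresent(DoNotPool.class)) {
      return; // not pooled
    }
    c.reset();
    pool.push(c);
  }

  public static void main(String[] args) {
    returnCompressor(new GzipLikeCompressor());
    returnCompressor(new PoolableCompressor());
    System.out.println("pooled instances: " + poolSize());
  }
}
```

Running `main` pools only the non-annotated compressor, which is why a class missing the annotation keeps accumulating in the pool.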
