[
https://issues.apache.org/jira/browse/PARQUET-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190636#comment-14190636
]
Julien Le Dem commented on PARQUET-118:
---------------------------------------
Hello Patrick,
Feel free to propose a pull request for this.
I'd be happy to review.
CC [~nongli]
> Provide option to use on-heap buffers for Snappy compression/decompression
> --------------------------------------------------------------------------
>
> Key: PARQUET-118
> URL: https://issues.apache.org/jira/browse/PARQUET-118
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0rc2
> Reporter: Patrick Wendell
>
> The current code uses direct off-heap buffers for decompression. If many
> decompressors are instantiated across multiple threads, and/or the objects
> being decompressed are large, this can lead to a huge amount of off-heap
> allocation by the JVM. This can be exacerbated when there is no overall heap
> contention, since no GC will be performed to reclaim the space used by these
> buffers.
> It would be nice if there were a flag we could use to simply allocate on-heap
> buffers here:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28
> We ran into an issue today where these buffers totaled a very large amount of
> storage and caused our Java processes (running within containers) to be
> terminated by the kernel OOM-killer.
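A minimal sketch of the kind of toggle the description asks for. The class,
method, and property name below are hypothetical illustrations, not existing
parquet-mr API:

{code:java}
import java.nio.ByteBuffer;

// Sketch only: illustrates switching the decompressor's buffer
// allocation between off-heap (direct) and on-heap based on a flag.
public class SnappyBufferAllocator {
  // Hypothetical flag; it could be wired to a Hadoop Configuration
  // property such as "parquet.snappy.use.direct.buffers" (name is
  // an assumption, not an existing option).
  private final boolean useDirectBuffers;

  public SnappyBufferAllocator(boolean useDirectBuffers) {
    this.useDirectBuffers = useDirectBuffers;
  }

  // On-heap buffers are tracked by the GC; direct buffers live
  // off-heap and are only reclaimed when the owning buffer object
  // is collected, which may never happen if the heap is idle.
  public ByteBuffer allocate(int size) {
    return useDirectBuffers
        ? ByteBuffer.allocateDirect(size)
        : ByteBuffer.allocate(size);
  }
}
{code}

SnappyDecompressor would then obtain its input and output buffers through
allocate() instead of calling ByteBuffer.allocateDirect() directly. One
caveat: snappy-java's ByteBuffer-based methods require direct buffers, so
an on-heap mode would likely need to go through its byte[]-based API
instead.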