[ https://issues.apache.org/jira/browse/PARQUET-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281418#comment-17281418 ]
Sai Sri Harsha Gudladona commented on PARQUET-118:
--------------------------------------------------

Are there any better ways to handle this for compression and decompression? Using this library in a streaming application to batch protobuf/JSON into Snappy-compressed Parquet is causing sporadic OOM errors.

> Provide option to use on-heap buffers for Snappy compression/decompression
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-118
>                 URL: https://issues.apache.org/jira/browse/PARQUET-118
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>            Reporter: Patrick Wendell
>            Priority: Major
>
> The current code uses direct off-heap buffers for decompression. If many
> decompressors are instantiated across multiple threads, and/or the objects
> being decompressed are large, this can lead to a huge amount of off-heap
> allocation by the JVM. This can be exacerbated if, overall, there is no heap
> contention, since no GC will be performed to reclaim the space used by these
> buffers.
> It would be nice if there was a flag we could use to simply allocate on-heap
> buffers here:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28
> We ran into an issue today where these buffers totaled a very large amount of
> storage and caused our Java processes (running within containers) to be
> terminated by the kernel OOM-killer.
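Below is a minimal sketch of the kind of switch this ticket asks for, not the actual parquet-mr API: the property name (parquet.snappy.use-heap-buffers) and the SnappyBufferAllocator class are hypothetical illustrations. The point is simply that ByteBuffer.allocate returns a heap-backed buffer that ordinary GC reclaims, while ByteBuffer.allocateDirect (the current behavior) allocates off-heap memory that is only released when its owning ByteBuffer object happens to be collected.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical helper, not part of parquet-mr: shows how a single flag
// could select between heap and direct buffers for the Snappy codec.
public class SnappyBufferAllocator {

    private final boolean useHeapBuffers;

    public SnappyBufferAllocator(boolean useHeapBuffers) {
        this.useHeapBuffers = useHeapBuffers;
    }

    // On-heap buffers are backed by a byte[] and reclaimed by normal GC;
    // direct buffers live off-heap and survive until their owning object
    // is collected, which may be never if the heap itself is not under
    // pressure -- the failure mode described in this ticket.
    public ByteBuffer allocate(int capacity) {
        return useHeapBuffers
                ? ByteBuffer.allocate(capacity)        // on-heap
                : ByteBuffer.allocateDirect(capacity); // off-heap (current behavior)
    }

    public static void main(String[] args) {
        // "parquet.snappy.use-heap-buffers" is a made-up property name;
        // a real patch would presumably read a Hadoop Configuration key.
        boolean onHeap = Boolean.parseBoolean(
                System.getProperty("parquet.snappy.use-heap-buffers", "false"));
        ByteBuffer buf = new SnappyBufferAllocator(onHeap).allocate(64 * 1024);
        System.out.println("direct buffer: " + buf.isDirect());
    }
}
{code}

One caveat: snappy-java's ByteBuffer-based calls (e.g. Snappy.uncompress(ByteBuffer, ByteBuffer)) expect direct buffers, so an on-heap path would likely have to go through the byte[] overloads instead, which is presumably why the decompressor allocates direct buffers today.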