[ https://issues.apache.org/jira/browse/HIVE-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571562#comment-14571562 ]
Sergey Shelukhin commented on HIVE-10068:
-----------------------------------------

Update from some test runs on TPCDS and TPCH queries: we waste around 15% of allocated memory due to buddy allocator granularity:
{noformat}
$ sed -E "s/.*ALLOCATED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
278162046976
$ sed -E "s/.*ALLOCATED_USED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
238565954908
{noformat}
Some of that is obviously unavoidable, but some could be avoided by implementing this. However, it's not as bad as I expected (bad results can be seen on very small datasets, where stripes/RGs are routinely smaller than the compression block size).

> LLAP: adjust allocation after decompression
> -------------------------------------------
>
>                 Key: HIVE-10068
>                 URL: https://issues.apache.org/jira/browse/HIVE-10068
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>
> We don't know the decompressed size of a compression buffer in ORC; all we
> know is the file-level compression buffer size. For many files, compression
> buffers can be smaller than that because of compact encoding, or because the
> compression block ends for other reasons (different streams, etc.; "present"
> streams, for example, are very small).
> BuddyAllocator should be able to accept back parts of the allocated memory
> (e.g. allocate 256Kb with a minimum allocation of 32Kb, decompress 45Kb, and
> return the last 192Kb as 64Kb+128Kb). For generality (this depends on the
> implementation), we can make an API like "offer", and the allocator can
> decide to take back however much it can.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
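The proposed "offer" semantics can be sketched as follows. This is a hypothetical illustration, not the actual Hive BuddyAllocator API: the class and method names are made up, and it only computes which power-of-two buddy chunks of the unused tail could be handed back (allocate 256Kb, use 45Kb, keep the 64Kb buddy covering the used bytes, offer back 64Kb+128Kb).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buddy-chunk math behind an "offer" API;
// not the real org.apache.hadoop.hive.llap BuddyAllocator.
public class OfferBackSketch {
  /**
   * Given a buddy allocation of {@code allocated} bytes (a power of two)
   * of which only {@code used} bytes are needed after decompression,
   * returns the sizes of the buddy-aligned power-of-two chunks (each at
   * least {@code minAlloc}) that could be offered back to the allocator.
   */
  static List<Integer> offerableChunks(int allocated, int used, int minAlloc) {
    // The allocator can only split on buddy boundaries, so we keep the
    // smallest power of two >= max(used, minAlloc).
    int keep = Integer.highestOneBit(Math.max(used, minAlloc));
    if (keep < used) {
      keep <<= 1; // round up to the next power of two
    }
    List<Integer> chunks = new ArrayList<>();
    int offset = keep;
    // Peel the unused tail off as progressively larger buddies
    // (e.g. 64Kb, then 128Kb), each aligned at its own size.
    while (offset < allocated) {
      int chunk = offset & -offset; // largest buddy aligned at this offset
      while (offset + chunk > allocated) {
        chunk >>= 1;
      }
      chunks.add(chunk);
      offset += chunk;
    }
    return chunks;
  }

  public static void main(String[] args) {
    int kb = 1024;
    // The example from the issue: 256Kb allocated, min 32Kb, 45Kb used.
    // 45Kb rounds up to the 64Kb buddy; the 192Kb tail comes back as
    // a 64Kb chunk plus a 128Kb chunk.
    System.out.println(offerableChunks(256 * kb, 45 * kb, 32 * kb));
  }
}
```

With an "offer"-style contract the allocator is free to accept only some of these chunks (or none), which is what makes the API robust to implementation details like fragmentation state.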