[ https://issues.apache.org/jira/browse/HIVE-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571562#comment-14571562 ]

Sergey Shelukhin commented on HIVE-10068:
-----------------------------------------

An update from some test runs on TPCDS and TPCH queries shows that we waste 
around 15% of allocated memory due to buddy allocator granularity:
{noformat}
$ sed -E "s/.*ALLOCATED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
278162046976
$ sed -E "s/.*ALLOCATED_USED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
238565954908
{noformat}
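
That is 238565954908 used out of 278162046976 allocated bytes, i.e. ~85.8% 
utilization, or ~14.2% waste.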

Some of that is obviously unavoidable, but some could be avoided by 
implementing this. However, it's not as bad as I expected (bad results can be 
seen on very small datasets, where stripes/RGs are routinely smaller than the 
compression block size).
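
For illustration, here is a minimal sketch (plain Java, not the actual 
BuddyAllocator API; the class and method names are hypothetical) of what an 
"offer"-style call could compute, assuming power-of-two buddy blocks: round 
the used prefix up to the next power of two and hand the tail back as 
properly aligned buddy chunks.

{noformat}
// Sketch only: a hypothetical helper, not Hive's BuddyAllocator API.
// Given an allocation of `allocated` bytes (a power of two), a minimum
// block size `minAlloc`, and `used` bytes actually needed after
// decompression, list the power-of-two tail chunks that could be offered
// back to the allocator.
import java.util.ArrayList;
import java.util.List;

public class OfferSketch {
  static List<Long> reclaimableChunks(long allocated, long minAlloc, long used) {
    // Round the kept prefix up to the next power of two (>= minAlloc) so
    // the remainder splits cleanly into aligned buddy blocks.
    long keep = Math.max(minAlloc, Long.highestOneBit(used));
    if (keep < used) {
      keep <<= 1;
    }
    List<Long> chunks = new ArrayList<>();
    for (long offset = keep; offset < allocated; ) {
      // Largest block that is aligned at `offset` and fits in the remainder.
      long size = Math.min(Long.lowestOneBit(offset), allocated - offset);
      chunks.add(size);
      offset += size;
    }
    return chunks;
  }

  public static void main(String[] args) {
    long kb = 1024;
    // The example from the description below: allocate 256Kb with a 32Kb
    // minimum, use 45Kb -> keep a 64Kb block, offer back 64Kb + 128Kb.
    System.out.println(reclaimableChunks(256 * kb, 32 * kb, 45 * kb));
  }
}
{noformat}

On the 256Kb/32Kb/45Kb example from the description below, this yields 
[65536, 131072], i.e. the 64+128Kb tail.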

> LLAP: adjust allocation after decompression
> -------------------------------------------
>
>                 Key: HIVE-10068
>                 URL: https://issues.apache.org/jira/browse/HIVE-10068
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>
> We don't know the decompressed size of a compression buffer in ORC; all we 
> know is the file-level compression buffer size. For many files, compression 
> buffers can be smaller than that because of compact encoding, or because 
> the compression block ends for other reasons (different streams, etc.; 
> "present" streams, for example, are very small).
> BuddyAllocator should be able to accept back parts of the allocated memory 
> (e.g. allocate 256Kb with a minimum allocation of 32Kb, decompress only 
> 45Kb, keep a 64Kb block, and return the last 192Kb as 64+128Kb). For 
> generality (this depends on the implementation), we can make an API like 
> "offer", and the allocator can decide to take back however much it can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
