[jira] [Comment Edited] (HIVE-20380) explore storing multiple CBs in a single cache buffer in LLAP cache

2018-08-23 Thread Sergey Shelukhin (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591009#comment-16591009 ]

Sergey Shelukhin edited comment on HIVE-20380 at 8/24/18 1:01 AM:
--

After trying various approaches, I think that since this will anyway involve
memory copying and interleaving buffers, what needs to happen instead is
decreasing the allocation size after decompression.
That is much simpler than having a separate cache and consolidating CBs into a
single buffer, doing partial cache matches, adding offsets to LlapDataBuffer-s,
etc.
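
To make that concrete, here is a minimal sketch of what "decrease the allocation
size after decompression" means for a power-of-two buddy allocator like the one
the LLAP cache uses; the class, method and parameter names below are made up for
illustration and are not the actual BuddyAllocator API:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustration only: once the decompressed size is known, the unused tail of an
// oversized allocation can be handed back to the allocator as power-of-two buddy
// blocks, without moving the data itself.
final class ShrinkAfterDecompressionSketch {

  /** Buddy blocks that could be freed from the tail of an allocation of allocatedSize bytes. */
  static List<Integer> freeableTail(int allocatedSize, int usedSize, int minAllocation) {
    List<Integer> freed = new ArrayList<>();
    // Round the used prefix up to the allocator granularity (assumed to be a power of two).
    int offset = ((usedSize + minAllocation - 1) / minAllocation) * minAllocation;
    while (offset < allocatedSize) {
      // Largest power-of-two block that is buddy-aligned at 'offset' and still fits.
      int block = (offset == 0) ? allocatedSize : Integer.lowestOneBit(offset);
      while (offset + block > allocatedSize) {
        block >>= 1;
      }
      freed.add(block);
      offset += block;
    }
    return freed;
  }

  public static void main(String[] args) {
    // 4Kb of decompressed data in a 128Kb allocation: the 124Kb tail comes back as
    // 4 + 8 + 16 + 32 + 64 Kb buddies instead of staying pinned with the data.
    System.out.println(freeableTail(128 * 1024, 4 * 1024, 4 * 1024));
  }
}
{code}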

One issue is that, in the small-cache, wide-table case where the entire cache can
become locked, it's not helpful to replace a fully locked cache of 128Kb buffers
holding 4Kb of data each with 4Kb buffers sitting in the cache every 128Kb - you
still cannot get a contiguous 128Kb. So we'd have to move data. We will not have
multiple CBs per Java buffer object, but merely change allocations so that small
CBs don't use large cache buffers.
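
To spell out the arithmetic behind "you still cannot get 128Kb" (illustrative
numbers only, not LLAP code):

{code:java}
// Even if every 128Kb allocation in a fully locked cache is shrunk in place to its
// 4Kb of data, the freed 124Kb around each live piece splits into 4+8+16+32+64 Kb
// buddy blocks, so the largest allocatable block is 64Kb and a 128Kb request still
// fails; only moving the data helps.
public class InPlaceShrinkLimitation {
  public static void main(String[] args) {
    int allocationKb = 128, dataKb = 4;
    int freedKb = allocationKb - dataKb;                 // 124Kb freed per buffer
    int largestFreeKb = Integer.highestOneBit(freedKb);  // 64Kb
    System.out.println("largest allocatable block: " + largestFreeKb
        + "Kb < " + allocationKb + "Kb");
  }
}
{code}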

If we do this shrinking before putting data into the cache, then unlike regular
cache defragmentation, which is complex, we have a set of already-locked buffers
that are also invisible to anyone else. So we can trivially consolidate within all
the buffers allocated by one read, which no one can touch in any way, and free up
some large buffers completely, as well as parts of the remaining ones (e.g. if we
have 10 ROW_INDEX streams, each with <4Kb of data but sitting in 128Kb allocations
because the ORC file CB size is 128Kb, we can create ten 4Kb buffers within one of
those 10 allocations and outright deallocate the 9 remaining 128Kb buffers, plus
the 64Kb + 16Kb + 8Kb left over in the one we keep - 10 x 4Kb uses 40Kb, so 88Kb
of buddy blocks come back). We can also do an extra step (e.g. when a single 128Kb
allocation holds only 4Kb of data): allocate a small buffer explicitly (without
defragmentation, and with a flag to not split buffers larger than the original for
this - there is no point in creating a 4Kb buffer out of another 128Kb of empty
space in this example), copy the data there, and then deallocate the big one. That
extra step can also pick up the crumbs created by other consolidations like the
one above. Without splitting and retries, the allocation can be cheap and safe.
This will be controlled by a waste threshold setting.
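
A minimal sketch of that consolidation pass, under assumptions: the Alloc and
Allocator types, the single packing target, and the waste-threshold check below
are simplifications made up for illustration, not the real LlapDataBuffer or
BuddyAllocator APIs. The only property it relies on is the one described above:
every buffer involved is locked by the current read and not yet published to the
cache, so the copies need no extra coordination.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy model of consolidating the CBs produced by one read before caching them.
final class ReadConsolidationSketch {

  /** One decompressed CB: dataSize bytes starting at 'offset' inside an oversized allocation. */
  static final class Alloc {
    final ByteBuffer buf;   // backing allocation, e.g. a 128Kb piece of the cache arena
    final int offset;       // where this CB's data starts within buf (0 for freshly read CBs)
    final int dataSize;     // actual decompressed bytes, e.g. <4Kb for a ROW_INDEX stream
    Alloc(ByteBuffer buf, int offset, int dataSize) {
      this.buf = buf; this.offset = offset; this.dataSize = dataSize;
    }
  }

  /** Stand-in for returning memory to the cache's buddy allocator. */
  interface Allocator {
    void deallocate(ByteBuffer buf);              // free a whole allocation
    void truncate(ByteBuffer buf, int usedBytes); // give back the buddy blocks past usedBytes
  }

  /** Packs wasteful CBs back-to-back into the first such allocation and frees the rest. */
  static List<Alloc> consolidate(List<Alloc> read, Allocator allocator, double wasteThreshold) {
    List<Alloc> out = new ArrayList<>();
    List<Alloc> wasteful = new ArrayList<>();
    for (Alloc a : read) {
      double waste = 1.0 - (double) a.dataSize / a.buf.capacity();
      if (waste <= wasteThreshold) {
        out.add(a);                      // dense enough - not worth touching
      } else {
        wasteful.add(a);
      }
    }
    if (wasteful.isEmpty()) {
      return out;
    }
    Alloc target = wasteful.get(0);      // reuse the first wasteful allocation as the target
    int used = 0;
    for (Alloc a : wasteful) {
      if (used + a.dataSize > target.buf.capacity()) {
        out.add(a);                      // this sketch keeps a single target; real code would start another
        continue;
      }
      if (a != target) {
        ByteBuffer src = a.buf.duplicate();
        src.position(a.offset);
        src.limit(a.offset + a.dataSize);
        ByteBuffer dst = target.buf.duplicate();
        dst.position(used);
        dst.put(src);                    // copy the payload back-to-back into the target
        allocator.deallocate(a.buf);     // e.g. 9 of the 10 128Kb ROW_INDEX buffers go away
      }
      out.add(new Alloc(target.buf, used, a.dataSize));
      used += a.dataSize;
    }
    allocator.truncate(target.buf, used); // give back the unused tail (the 64Kb + 16Kb + 8Kb above)
    return out;
  }
}
{code}

In the ROW_INDEX example above this leaves one 128Kb allocation truncated to its
packed ~40Kb of data and frees the other nine allocations outright.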

Unfortunately this will do slightly less than nothing at all for Hive 2 without
the defrag patch. But if we backport the defrag patch (pending), this will also
work for Hive 2.

I may not be able to work on this to completion immediately, so I'm just posting
a brain dump here for reference. cc [~gopalv]


was (Author: sershe):
After trying various approaches, I think that since this will anyway involve
memory copying and interleaving buffers, what needs to happen instead is
decreasing the allocation size after decompression, which won't move data either.
However, for the small-cache, wide-table case, where the entire cache can become
locked, it's not helpful to replace a fully locked cache of 128Kb buffers holding
4Kb of data each with 4Kb buffers sitting in the cache every 128Kb. So we'd have
to move data. We will not have multiple CBs per Java buffer object, but merely
change allocations so that small CBs don't use large cache buffers.

If we do this shrinking before putting data into the cache, then unlike regular
cache defragmentation, which is complex, we have a set of already-locked buffers
that are also invisible to anyone else. So we can trivially consolidate within all
the buffers allocated by one read, which no one can touch in any way, and free up
some large buffers completely, as well as parts of the remaining ones (e.g. if we
have 10 ROW_INDEX streams, each with <4Kb of data but sitting in 128Kb allocations
because the ORC file CB size is 128Kb, we can create ten 4Kb buffers within one of
those 10 allocations and outright deallocate the 9 remaining 128Kb buffers, plus
the 64Kb + 16Kb + 8Kb left over in the one we keep). We can also do an extra step
(e.g. when a single 128Kb allocation holds only 4Kb of data): allocate a small
buffer explicitly (without defragmentation, and with a flag to not split buffers
larger than the original for this - there is no point in creating a 4Kb buffer out
of another 128Kb of empty space in this example), copy the data there, and then
deallocate the big one. That extra step can also pick up the crumbs created by
other consolidations like the one above. Without splitting and retries, the
allocation can be cheap and safe.
This will be controlled by a waste threshold setting.

Unfortunately this will do slightly less than nothing at all for Hive 2 without
the defrag patch. But if we backport the defrag patch (pending), this will also
work for Hive 2.

I may not be able to work on this to completion immediately, so I'm just posting
a brain dump here for reference. cc [~gopalv]

> explore storing multiple CBs in a single cache buffer in LLAP cache
> ---
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>