[jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently

2018-11-12 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20380:

Description: 
The main thing to do is 
https://issues.apache.org/jira/browse/HIVE-20380?focusedCommentId=16591009=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16591009

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap. A rough sketch of both
items follows below.
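As a rough illustration of items 1 and 2, here is a minimal sketch (the class,
field, and method names are illustrative, not the actual LLAP cache classes):
the tag is deduplicated through a trivial intern map, and the AtomicLong is
replaced with a volatile long plus an AtomicLongFieldUpdater, which provides
exactly the compare-and-swap we need without a separate object per buffer; the
actual change could equally use sun.misc.Unsafe directly.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Illustrative only; not the actual LLAP cache classes.
public class TrackedBuffer {
  // 1) Intern the tracking tag so identical tags share one object.
  private static final ConcurrentHashMap<String, String> TAG_POOL = new ConcurrentHashMap<>();
  static String internTag(String tag) {
    String prev = TAG_POOL.putIfAbsent(tag, tag);
    return prev != null ? prev : tag;
  }

  // 2) A plain volatile long instead of an AtomicLong object; the field updater
  //    exposes the one operation we actually use, compare-and-swap.
  private volatile long state;
  private static final AtomicLongFieldUpdater<TrackedBuffer> STATE =
      AtomicLongFieldUpdater.newUpdater(TrackedBuffer.class, "state");

  boolean casState(long expected, long newValue) {
    return STATE.compareAndSet(this, expected, newValue);
  }
}
{code}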

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object. A sketch of this layout follows below.
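A minimal sketch of what such index-based tracking could look like, assuming a
hypothetical BufferStateTable (none of these names come from an actual patch):
each buffer is identified by an int index, and all per-buffer state lives in
parallel arrays instead of per-buffer objects. Less common attributes could
live in a side structure keyed by the same index.

{code:java}
// Hypothetical "struct of arrays" layout for buffer-tracking state; the class
// and field names are illustrative only.
public class BufferStateTable {
  private final long[] refAndFlags; // packed refcount + flags per buffer
  private final int[] allocSize;    // allocation size per buffer
  private final int[] priority;     // eviction priority per buffer

  public BufferStateTable(int capacity) {
    refAndFlags = new long[capacity];
    allocSize = new int[capacity];
    priority = new int[capacity];
  }

  public long getRefAndFlags(int bufferIx) { return refAndFlags[bufferIx]; }
  public int getAllocSize(int bufferIx) { return allocSize[bufferIx]; }
  public void setPriority(int bufferIx, int value) { priority[bufferIx] = value; }
}
{code}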

cc [~gopalv] [~prasanth_j]



  was:
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, everywhere in the
readers, etc.).
The new idea (see comments) is to consolidate and reduce allocation sizes once
the "real" size is known after decompression. It can be confined to the
allocator and is also more flexible - there is no dependence on the cache map,
so we don't need to make sure data is contiguous and such (for example, the R_I
streams we want to consolidate are interleaved with large bloom filters that we
don't want to read or consolidate when they are not needed - but the cache key
structure depends on offsets, so we'd need a new cache map for R_I and separate
logic for these streams). Also, streams like PRESENT with a single small CB
realistically cannot be combined with anything, but shrinking the allocation
will still help them.

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]




> LLAP cache should cache small buffers more efficiently
> --
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> The main thing to do is 
> https://issues.apache.org/jira/browse/HIVE-20380?focusedCommentId=16591009=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16591009
> There are also minor heap improvements possible.
> 1) Intern the tracking tag.
> 2) Replace the AtomicLong object with a plain long field and an unsafe CAS
> method; we only ever use one operation, compareAndSwap.
> One more idea is making the tracking less object-oriented: pass around
> integer indexes instead of objects and store the state in large arrays
> somewhere (potentially with some optimizations for less common things),
> instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]





[jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently

2018-08-24 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20380:

Description: 
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, everywhere in the
readers, etc.).
The new idea (see comments) is to consolidate and reduce allocation sizes once
the "real" size is known after decompression. It can be confined to the
allocator and is also more flexible - there is no dependence on the cache map,
so we don't need to make sure data is contiguous and such (for example, the R_I
streams we want to consolidate are interleaved with large bloom filters that we
don't want to read or consolidate when they are not needed - but the cache key
structure depends on offsets, so we'd need a new cache map for R_I and separate
logic for these streams). Also, streams like PRESENT with a single small CB
realistically cannot be combined with anything, but shrinking the allocation
will still help them.
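A hedged sketch of the allocator-side idea, assuming a hypothetical shrinkTo
method (the real BuddyAllocator API may look different): allocate at min.alloc,
decompress into the buffer, then hand the unused tail of the block back to the
allocator once the real size is known.

{code:java}
import java.nio.ByteBuffer;

// Illustrative placeholder for the cache buffer abstraction.
final class CacheBuffer {
  ByteBuffer data; // slice of the arena backing this allocation
}

// Hypothetical allocator interface; method names are illustrative.
interface ShrinkableAllocator {
  CacheBuffer allocate(int size);               // rounds up to min.alloc
  void shrinkTo(CacheBuffer buf, int usedSize); // returns the unused tail to the arena
}

final class DecompressExample {
  // Allocate at min.alloc, decompress into the buffer, then shrink it to the
  // size that was actually used.
  static CacheBuffer decompressAndShrink(ShrinkableAllocator allocator,
                                         ByteBuffer compressed, int minAlloc) {
    CacheBuffer buf = allocator.allocate(minAlloc);
    int actualSize = decompress(compressed, buf.data); // real size known only here
    allocator.shrinkTo(buf, actualSize);
    return buf;
  }

  private static int decompress(ByteBuffer in, ByteBuffer out) {
    // Stand-in for the ORC codec call; returns the number of bytes written.
    int n = in.remaining();
    out.put(in.duplicate());
    return n;
  }
}
{code}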

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]



  was:
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, everywhere in the
readers, etc.).
The new idea (see comments) is to consolidate and reduce allocation sizes once
the "real" size is known after decompression.

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]




> LLAP cache should cache small buffers more efficiently
> --
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
> (instead of 256Kb); then, after we moved the metadata cache off-heap, the
> index streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and
> probably compute) overhead to track all these buffers. Arguably, even the 4Kb
> min.alloc is too small.
> The initial idea was to store multiple CBs per block; however, this is a lot
> of work all over the place (cache mapping, cache lookups, everywhere in the
> readers, etc.).
> The new idea (see comments) is to consolidate and reduce allocation sizes
> once the "real" size is known after decompression. It can be confined to the
> allocator and is also more flexible - there is no dependence on the cache
> map, so we don't need to make sure data is contiguous and such (for example,
> the R_I streams we want to consolidate are interleaved with large bloom
> filters that we don't want to read or consolidate when they are not needed -
> but the cache key structure depends on offsets, so we'd need a new cache map
> for R_I and separate logic for these streams). Also, streams like PRESENT
> with a single small CB realistically cannot be combined with anything, but
> shrinking the allocation will still help them.

[jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently

2018-08-24 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20380:

Description: 
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, everywhere in the
readers, etc.).
The new idea (see comments) is to consolidate and reduce allocation sizes once
the "real" size is known after decompression.

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]



  was:
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, readers, etc.).

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]




> LLAP cache should cache small buffers more efficiently
> --
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
> (instead of 256Kb); then, after we moved the metadata cache off-heap, the
> index streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and
> probably compute) overhead to track all these buffers. Arguably, even the 4Kb
> min.alloc is too small.
> The initial idea was to store multiple CBs per block; however, this is a lot
> of work all over the place (cache mapping, cache lookups, everywhere in the
> readers, etc.).
> The new idea (see comments) is to consolidate and reduce allocation sizes
> once the "real" size is known after decompression.
> There are also minor heap improvements possible.
> 1) Intern the tracking tag.
> 2) Replace the AtomicLong object with a plain long field and an unsafe CAS
> method; we only ever use one operation, compareAndSwap.
> One more idea is making the tracking less object-oriented: pass around
> integer indexes instead of objects and store the state in large arrays
> somewhere (potentially with some optimizations for less common things),
> instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]





[jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently

2018-08-24 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20380:

Description: 
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

The initial idea was to store multiple CBs per block; however, this is a lot of
work all over the place (cache mapping, cache lookups, readers, etc.).

There are also minor heap improvements possible.
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long field and an unsafe CAS
method; we only ever use one operation, compareAndSwap.

One more idea is making the tracking less object-oriented: pass around integer
indexes instead of objects and store the state in large arrays somewhere
(potentially with some optimizations for less common things), instead of every
buffer getting its own object.

cc [~gopalv] [~prasanth_j]



  was:
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
(instead of 256Kb); then, after we moved the metadata cache off-heap, the index
streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and
probably compute) overhead to track all these buffers. Arguably, even the 4Kb
min.alloc is too small.

We should store contiguous CBs in the same buffer; to start, we can do it for
ROW_INDEX streams. That probably means reading all ROW_INDEX streams, instead
of doing projection, when we see that they are too small.
We need to investigate what the pattern is for ORC data blocks. One option is
to increase min.alloc and then consolidate multiple 4-8Kb CBs, but only for the
same stream. However, a larger min.alloc will result in wastage for really
small streams, so we could also consolidate multiple streams (potentially
across columns) if needed. This will result in some priority anomalies, but
they are probably OK.
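For reference, a minimal sketch of what packing several contiguous small CBs
into one backing allocation could look like (class and field names are
illustrative, not from an actual patch): the CBs share one buffer and are
addressed by offset, so a single tracking object covers all of them.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical layout for the "multiple CBs per cache buffer" idea; names are
// illustrative only. Several small, contiguous CBs share one backing
// allocation and are addressed by offset.
final class PackedCacheBuffer {
  private final ByteBuffer backing; // one min.alloc-sized (or larger) allocation
  private final int[] cbOffsets;    // start offset of each packed CB
  private final int[] cbLengths;    // length of each packed CB

  PackedCacheBuffer(ByteBuffer backing, int[] cbOffsets, int[] cbLengths) {
    this.backing = backing;
    this.cbOffsets = cbOffsets;
    this.cbLengths = cbLengths;
  }

  // View of a single CB inside the shared buffer.
  ByteBuffer cb(int ix) {
    ByteBuffer view = backing.duplicate();
    view.limit(cbOffsets[ix] + cbLengths[ix]);
    view.position(cbOffsets[ix]);
    return view.slice();
  }
}
{code}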

Another consideration is making the tracking less object-oriented: pass around
integer indexes instead of objects and store the state in large arrays
somewhere (potentially with some optimizations for less common things), instead
of every buffer getting its own object.

cc [~gopalv] [~prasanth_j]




> LLAP cache should cache small buffers more efficiently
> --
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
> (instead of 256Kb); then, after we moved the metadata cache off-heap, the
> index streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and
> probably compute) overhead to track all these buffers. Arguably, even the 4Kb
> min.alloc is too small.
> The initial idea was to store multiple CBs per block; however, this is a lot
> of work all over the place (cache mapping, cache lookups, readers, etc.).
> There are also minor heap improvements possible.
> 1) Intern the tracking tag.
> 2) Replace the AtomicLong object with a plain long field and an unsafe CAS
> method; we only ever use one operation, compareAndSwap.
> One more idea is making the tracking less object-oriented: pass around
> integer indexes instead of objects and store the state in large arrays
> somewhere (potentially with some optimizations for less common things),
> instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]





[jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently

2018-08-24 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20380:

Summary: LLAP cache should cache small buffers more efficiently  (was: 
explore storing multiple CBs in a single cache buffer in LLAP cache)

> LLAP cache should cache small buffers more efficiently
> --
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum
> (instead of 256Kb); then, after we moved the metadata cache off-heap, the
> index streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and
> probably compute) overhead to track all these buffers. Arguably, even the 4Kb
> min.alloc is too small.
> We should store contiguous CBs in the same buffer; to start, we can do it for
> ROW_INDEX streams. That probably means reading all ROW_INDEX streams, instead
> of doing projection, when we see that they are too small.
> We need to investigate what the pattern is for ORC data blocks. One option is
> to increase min.alloc and then consolidate multiple 4-8Kb CBs, but only for
> the same stream. However, a larger min.alloc will result in wastage for
> really small streams, so we could also consolidate multiple streams
> (potentially across columns) if needed. This will result in some priority
> anomalies, but they are probably OK.
> Another consideration is making the tracking less object-oriented: pass
> around integer indexes instead of objects and store the state in large
> arrays somewhere (potentially with some optimizations for less common
> things), instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)