Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

Liang Chen Wed, 17 May 2017 06:59:00 -0700

Hi Ravi

Thank you for bringing this improvement discussion to the mailing list.


One question: with point 1 alone, how do we solve the issue below? Would the
index info still be split into two parts, on the driver side and the
executor side?
----------------------------------------------------------------------------------------------------
Also, the chance of loading the Btree on each executor for every query is
higher, because there is no guarantee that the same block goes to the same
executor every time. It will be worse in the case of dynamic containers.

Regards
Liang

2017-05-17 7:33 GMT-04:00 Ravindra Pesala <[email protected]>:

> Hi,
>
> *1. Current problem.*
>  1. Creating the Btree for an index file takes a lot of Java heap. We
> create multiple objects for each leaf node, so the Btree takes more heap
> memory than the actual size of the index file. The LRU cache also
> considers only the index file size instead of the object size, which
> distorts its eviction process.
>  2. Currently we load one Btree on the driver side to find the blocks and
> another Btree on the executor side to find the blocklets. Since we
> increased the blocklet size to 128 MB and decreased the table_block size to
> 256 MB, the number of nodes in the driver-side Btree and the executor-side
> Btree is not much different, so reading the same information twice is an
> overhead.
> Also, the chance of loading the Btree on each executor for every query is
> higher, because there is no guarantee that the same block goes to the same
> executor every time. It will be worse in the case of dynamic containers.
>
> *2. Proposed solution.*
>  1. To reduce the Java heap used by the Btree, we can remove the Btree
> data structure, use a simple single array, and do a binary search on it.
> We should also move these cached arrays to unsafe memory (off-heap/on-heap)
> to reduce the burden on GC.
>  2. Unify the two Btrees into a single Btree and load it on the driver
> side, so that a single lookup finds the blocklets directly and executors
> are not required to load the Btree for every query.
>     We can consider moving this to a separate metadata service eventually,
> once the memory footprint is reduced.
>
> First I will work on point 1, reducing the Btree size; after that I will
> consider merging the Btrees.
>
> Please comment on it.
>
> --
> Thanks & Regards,
> Ravindra.
>
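
To make point 1 of the proposed solution concrete, here is a minimal sketch
of a flat sorted array searched with binary search and kept outside the Java
heap. It is only an illustration under stated assumptions, not the actual
CarbonData design: the class name FlatBlockletIndex, the use of long start
keys, and the int blocklet ids are all hypothetical, and a direct ByteBuffer
is swapped in for the "unsafe" memory mentioned in the proposal.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: one flat, sorted index stored in a single off-heap
// buffer instead of a tree of per-leaf Java objects.
final class FlatBlockletIndex {
    // Assumed entry layout: 8-byte start key followed by 4-byte blocklet id.
    private static final int ENTRY_BYTES = Long.BYTES + Integer.BYTES;

    private final ByteBuffer entries;
    private final int size;

    // sortedStartKeys must be in ascending order; blockletIds[i] belongs to
    // sortedStartKeys[i].
    FlatBlockletIndex(long[] sortedStartKeys, int[] blockletIds) {
        if (sortedStartKeys.length != blockletIds.length) {
            throw new IllegalArgumentException("keys and ids differ in length");
        }
        size = sortedStartKeys.length;
        // Direct buffer: the entries live outside the GC-managed heap.
        entries = ByteBuffer.allocateDirect(size * ENTRY_BYTES);
        for (int i = 0; i < size; i++) {
            entries.putLong(i * ENTRY_BYTES, sortedStartKeys[i]);
            entries.putInt(i * ENTRY_BYTES + Long.BYTES, blockletIds[i]);
        }
    }

    // Binary search for the last entry whose start key is <= key.
    // Returns its blocklet id, or -1 if key is below every start key.
    int findBlocklet(long key) {
        int lo = 0;
        int hi = size - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.getLong(mid * ENTRY_BYTES) <= key) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : entries.getInt(found * ENTRY_BYTES + Long.BYTES);
    }

    // Exact footprint of the entries, which an LRU cache could use for
    // eviction accounting instead of the index file size.
    long memoryBytes() {
        return (long) size * ENTRY_BYTES;
    }
}
```

With start keys 10, 20, 30 mapped to blocklet ids 0, 1, 2, findBlocklet(25)
returns 1 and findBlocklet(5) returns -1. Because every entry has a fixed
width, the memory used is simply entry count times entry size, so the cache
can account for the real footprint.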



-- 
Regards
Liang
