Hi Liang,

Yes, it will be done in two parts. First we reduce the size of the btree,
and then we merge the driver-side and executor-side btrees into a single
btree.
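
A minimal sketch of what the first part could look like, not the actual
CarbonData code. It assumes index entries can be reduced to a sorted
(start key, blocklet id) pair; real keys are byte arrays, so the long key
here is a simplification. The class name FlatBlockletIndex and its methods
are hypothetical. Entries live in one direct (off-heap) ByteBuffer, so there
are no per-node objects for the GC to track, and the size reported to the
LRU cache is the real footprint.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical flat index: sorted (startKey, blockletId) pairs in one off-heap buffer. */
final class FlatBlockletIndex {
    // Fixed-width entry: 8-byte start key followed by a 4-byte blocklet id.
    private static final int ENTRY_BYTES = Long.BYTES + Integer.BYTES;

    private final ByteBuffer buffer;
    private final int size;

    /** Keys must already be sorted in ascending order. */
    FlatBlockletIndex(long[] sortedStartKeys, int[] blockletIds) {
        if (sortedStartKeys.length != blockletIds.length) {
            throw new IllegalArgumentException("keys and ids must have the same length");
        }
        size = sortedStartKeys.length;
        // Direct buffer: the entries are stored outside the Java heap.
        buffer = ByteBuffer.allocateDirect(size * ENTRY_BYTES).order(ByteOrder.nativeOrder());
        for (int i = 0; i < size; i++) {
            buffer.putLong(i * ENTRY_BYTES, sortedStartKeys[i]);
            buffer.putInt(i * ENTRY_BYTES + Long.BYTES, blockletIds[i]);
        }
    }

    /** Bytes held by the index; an LRU cache could use this for eviction accounting. */
    long memoryBytes() {
        return (long) size * ENTRY_BYTES;
    }

    /**
     * Binary search for the last entry whose start key is <= key.
     * Returns that entry's blocklet id, or -1 if key is below every start key.
     */
    int findBlocklet(long key) {
        int lo = 0;
        int hi = size - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (buffer.getLong(mid * ENTRY_BYTES) <= key) {
                found = mid;      // candidate; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : buffer.getInt(found * ENTRY_BYTES + Long.BYTES);
    }
}
```

With start keys {10, 20, 30} mapped to blocklet ids {100, 200, 300},
findBlocklet(25) lands on the entry starting at 20, and memoryBytes()
reports 36 bytes for the three entries.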

Regards,
Ravindra.

On 17 May 2017 at 19:28, Liang Chen <chenliang6...@gmail.com> wrote:

> Hi Ravi
>
> Thank you for bringing this improvement discussion to the mailing list.
>
> One question: how does point 1 solve the issue below? Won't there still
> be two parts of index info, one on the driver side and one on the
> executor side?
> ------------------------------------------------------------
> Also, the chance of loading the btree on each executor is higher for every
> query, because there is no guarantee that the same block goes to the same
> executor every time. It will be worse in the case of dynamic containers.
>
> Regards
> Liang
>
> 2017-05-17 7:33 GMT-04:00 Ravindra Pesala <ravi.pes...@gmail.com>:
>
> > Hi,
> >
> > *1. Current problem.*
> >  1. Building the Btree for an index file takes too much space on the
> > Java heap. We create multiple objects for each leaf node, so the Btree
> > takes more heap memory than the actual size of the index file. The LRU
> > cache also considers only the index file size instead of the size of
> > the objects, which distorts its eviction process.
> >  2. Currently we load one btree on the driver side to find the blocks
> > and another btree on the executor side to find the blocklets. Since we
> > increased the blocklet size to 128 MB and decreased the table_block size
> > to 256 MB, the number of nodes in the driver-side btree and the
> > executor-side btree is not much different, so reading the same
> > information twice is overhead.
> > Also, the chance of loading the btree on each executor is higher for
> > every query, because there is no guarantee that the same block goes to
> > the same executor every time. It will be worse in the case of dynamic
> > containers.
> >
> > *2. Proposed solution.*
> >  1. To reduce the Java heap used by the Btree, we can remove the Btree
> > data structure, use a simple single array, and do binary search on it.
> > We should also move these cached arrays to unsafe (offheap/onheap)
> > memory to reduce the burden on GC.
> >  2. Unify the two btrees into a single Btree and load it on the driver
> > side, so that only one lookup is needed to find the blocklets directly
> > and executors are not required to load the btree for every query.
> >     We can consider moving this to a separate metadata service
> > eventually, once the memory footprint is reduced.
> >
> > First I will work on point 1, reducing the btree size; after that I will
> > work on merging the btrees.
> >
> > Please comment on it.
> >
> > --
> > Thanks & Regards,
> > Ravindra.
> >
>
>
>
> --
> Regards
> Liang
>



-- 
Thanks & Regards,
Ravi
