Hi Jacky,

The default blocklet size is currently 64 MB, so with a 256 MB block size
there are at most 4 blocklets per block.
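
To make the estimate concrete, here is a minimal sketch of the arithmetic. The
class and method names (BlockletEstimate, maxBlockletsPerBlock,
maxDriverEntries) are hypothetical and not part of CarbonData, and the block
count of 1000 is only an illustrative input. It assumes the driver-side array
holds one entry per blocklet and that every block is full, so the result is an
upper bound rather than an exact length.

```java
// Hypothetical helper, not CarbonData code: upper bounds for index entry counts.
final class BlockletEstimate {
    private BlockletEstimate() {}

    /** Upper bound on blocklets in one block: ceil(blockSizeMb / blockletSizeMb). */
    static long maxBlockletsPerBlock(long blockSizeMb, long blockletSizeMb) {
        if (blockSizeMb <= 0 || blockletSizeMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Integer ceiling division without floating point.
        return (blockSizeMb + blockletSizeMb - 1) / blockletSizeMb;
    }

    /** Upper bound on driver-side array entries, assuming one entry per blocklet. */
    static long maxDriverEntries(long blockCount, long blockSizeMb, long blockletSizeMb) {
        return Math.multiplyExact(blockCount,
                maxBlockletsPerBlock(blockSizeMb, blockletSizeMb));
    }

    public static void main(String[] args) {
        // 256 MB block with 64 MB blocklets, as in the defaults mentioned above.
        System.out.println(maxBlockletsPerBlock(256, 64));
        // Illustrative table of 1000 blocks.
        System.out.println(maxDriverEntries(1000, 256, 64));
    }
}
```

With the sizes above this gives 4 blocklets per block, so the array length on
the driver is bounded by roughly 4 times the number of blocks.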

Regards,
Ravindra.

On 17 May 2017 at 19:59, Jacky Li <jacky.li...@qq.com> wrote:

> +1 for both proposals 1 and 2.
>
> For point 2, do you have a rough idea of how many blocklets there are in
> one block? This will help to estimate the length of the array on the
> driver side.
>
> Regards,
> Jacky
>
> > On 17 May 2017, at 7:33 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote:
> >
> > Hi,
> >
> > *1. Current problem.*
> > 1. Building the Btree for an index file takes more space on the Java heap
> > than the index file itself, because we create multiple objects for each
> > leaf node. The LRU cache also considers only the index file size, not the
> > size of these objects, which affects its eviction process.
> > 2. Currently we load one Btree on the driver side to find the blocks and
> > another Btree on the executor side to find the blocklets. After we
> > increased the blocklet size to 128 MB and decreased the table_block size
> > to 256 MB, the number of nodes in the driver-side Btree and the
> > executor-side Btree is not much different, so reading the same
> > information twice is unnecessary overhead.
> > Also, the Btree is more likely to be loaded on each executor for every
> > query, because there is no guarantee that the same block goes to the same
> > executor every time. It will be worse with dynamic containers.
> >
> > *2. Proposed solution.*
> > 1. To reduce the Java heap used by the Btree, we can remove the Btree data
> > structure, use a simple single array, and do binary search on it. We
> > should also move these cached arrays to unsafe memory (offheap/onheap) to
> > reduce the burden on GC.
> > 2. Unify the two Btrees into a single Btree and load it on the driver
> > side, so that only one lookup is needed to find the blocklets directly
> > and executors are not required to load the Btree for every query.
> >    We can consider moving this to a separate metadata service eventually,
> > once the memory footprint is reduced.
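
The single-array idea in point 1 can be sketched roughly as follows. This is
only an illustration under simplifying assumptions, not the CarbonData
implementation: the class FlatBlockletIndex and its methods are hypothetical,
each entry is reduced to one long start key (real index entries would carry
more than that), and a direct ByteBuffer stands in for the unsafe off-heap
memory mentioned above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not CarbonData code: a flat index of sorted start keys
// kept in a direct (off-heap) buffer and searched with binary search.
final class FlatBlockletIndex {
    private final ByteBuffer keys; // direct buffer, allocated outside the Java heap
    private final int size;

    /** Copies the given keys, which must already be sorted ascending. */
    FlatBlockletIndex(long[] sortedStartKeys) {
        size = sortedStartKeys.length;
        keys = ByteBuffer.allocateDirect(Math.multiplyExact(size, Long.BYTES));
        for (int i = 0; i < size; i++) {
            keys.putLong(i * Long.BYTES, sortedStartKeys[i]);
        }
    }

    /** Returns the position of the last entry whose start key is <= key, or -1 if none. */
    int floorIndex(long key) {
        int lo = 0;
        int hi = size - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.getLong(mid * Long.BYTES) <= key) {
                result = mid;   // candidate found, look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FlatBlockletIndex index = new FlatBlockletIndex(new long[] {0, 100, 200, 300});
        // Position of the entry that would contain key 150.
        System.out.println(index.floorIndex(150));
    }
}
```

Because all keys live in one buffer, the heap holds a single object per index
instead of several per leaf node, and the size the cache has to account for is
simply the buffer capacity.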
> >
> > I will first work on point 1, reducing the Btree size, and after that
> > consider merging the Btrees.
> >
> > Please comment on it.
> >
> > --
> > Thanks & Regards,
> > Ravindra.
>
>
>
>


-- 
Thanks & Regards,
Ravi
