+1 for both proposals 1 and 2. For point 2, do you have a rough idea of how many blocklets there are within one block? That would help estimate the length of the array on the driver side.

Regards,
Jacky

> On May 17, 2017, at 7:33 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote:
>
> Hi,
>
> *1. Current problem.*
> 1. Creating the Btree for the index file takes too much Java heap. We
> create multiple objects for each leaf node, so the Btree takes more
> memory inside the heap than the actual size of the index file. The LRU
> cache also considers only the index file size instead of the size of the
> objects, which distorts its eviction process.
> 2. Currently we load one Btree on the driver side to find the blocks and
> load another Btree on the executor side to find the blocklets. After we
> increased the blocklet size to 128 MB and decreased the table_block size
> to 256 MB, the number of nodes in the driver side Btree and the executor
> side Btree is not much different. So reading the same information twice
> is an overhead.
> Also, the chance of loading the Btree on each executor for every query is
> high, because there is no guarantee that the same block goes to the same
> executor every time. It will be worse in the case of dynamic containers.
>
> *2. Proposed solution.*
> 1. To reduce the Java heap used by the Btree, we can remove the Btree
> data structure, use a simple single array, and do binary search on it. We
> should also move these cached arrays to unsafe (offheap/onheap) to reduce
> the burden on GC.
> 2. Unify the two Btrees into a single Btree and load it on the driver
> side, so that only one lookup is needed to find the blocklets directly.
> Executors are then not required to load the Btree for every query.
> We can consider moving this to a separate metadata service eventually,
> once the memory footprint is reduced.
>
> First I will take up point 1, reducing the Btree size; after that I will
> consider merging the Btrees.
>
> Please comment on it.
>
> --
> Thanks & Regards,
> Ravindra.
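
To make point 1 of the proposed solution concrete, here is a minimal sketch of a flat, sorted index held in a single off-heap buffer and searched with binary search. It is only an illustration under assumptions, not the CarbonData implementation: the class name FlatBlockletIndex, the use of a plain long as the start key, and the int blocklet id are all hypothetical simplifications (a real index would store composite byte keys and min/max values).

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: one fixed-width entry per blocklet, stored in a
// direct (off-heap) buffer instead of one Java object per Btree leaf.
public class FlatBlockletIndex {
    // Assumed entry layout: 8-byte start key followed by 4-byte blocklet id.
    private static final int ENTRY_SIZE = Long.BYTES + Integer.BYTES;

    private final ByteBuffer buf;
    private final int count;

    // sortedStartKeys must be in ascending order; blockletIds[i] belongs to key i.
    public FlatBlockletIndex(long[] sortedStartKeys, int[] blockletIds) {
        if (sortedStartKeys.length != blockletIds.length) {
            throw new IllegalArgumentException("keys and ids must have the same length");
        }
        count = sortedStartKeys.length;
        buf = ByteBuffer.allocateDirect(count * ENTRY_SIZE);
        for (int i = 0; i < count; i++) {
            buf.putLong(i * ENTRY_SIZE, sortedStartKeys[i]);
            buf.putInt(i * ENTRY_SIZE + Long.BYTES, blockletIds[i]);
        }
    }

    // Returns the id of the last blocklet whose start key is <= key,
    // or -1 if the key is smaller than every start key.
    public int findBlocklet(long key) {
        int lo = 0;
        int hi = count - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (buf.getLong(mid * ENTRY_SIZE) <= key) {
                found = mid;   // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : buf.getInt(found * ENTRY_SIZE + Long.BYTES);
    }
}
```

With this layout the memory held by the cache is simply the number of entries times the entry size, so the size seen by an LRU cache matches what is actually allocated, and the garbage collector has a single buffer to track rather than many leaf-node objects.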