Re: [DISCUSSION] Remove BTree related code

2018-08-31 Thread Jacky Li
+1
Better to clean it if it is not used

Regards,
Jacky

> On Aug 24, 2018, at 6:01 PM, Kunal Kapoor wrote:
> 
> +1 for removing unused code
> 
> 
> 
> Regards
> Kunal Kapoor
> 
> 
> On Fri, Aug 24, 2018, 2:09 PM Ravindra Pesala  wrote:
> 
>> +1
>> We can remove unused code
>> 
>> Regards,
>> Ravindra
>> 
>> On Fri, 24 Aug 2018 at 14:06, Kumar Vishal 
>> wrote:
>> 
 
>>> +1
>>>
>>> Better to remove the BTree code, as it is no longer used.
>>> -Regards
>>> Kumar Vishal
>>> 
>> 
>> 
>> --
>> Thanks & Regards,
>> Ravi
>> 
> 





Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

2018-08-31 Thread Akash Nilugal
Hi all,

With PR https://github.com/apache/carbondata/pull/2662
I have tested the performance and memory requirements of the decoder-based
fallback for local dictionary, and the results are as below.

1. With the current implementation, loading 3 million rows takes around
4GB when local dictionary is enabled, which is almost 10 times the
memory required to load the same data with local dictionary disabled.
   With decoder-based fallback, the memory requirement is reduced from
10 times to almost 2 times.


2. The data loading performance is as below.
With the current implementation, loading 1 billion rows takes around
1.1 hours, and with decoder-based fallback it takes 1.2 hours. The time
difference is small, while the memory savings are significant.
I think this PR will help.

Consolidated points:
1. Store size is not impacted.
2. GC time is not impacted.
3. The time impact is low, as mentioned above.
4. The memory requirement is significantly reduced.



Regards,
Akash R Nilugal

On Mon, Aug 27, 2018 at 11:51 AM Akash Nilugal 
wrote:

> Hi all,
>
> Currently, when fallback is initiated for a column page in case of
> local dictionary, we keep both the encoded data and the actual data in
> memory, then form a new column page without dictionary encoding, and
> finally free the encoded column page. Because of this, the offheap
> memory footprint increases.
>
> We can reduce the offheap memory footprint by using a decoder-based
> fallback mechanism. This means there is no need to keep the actual data
> along with the encoded data in the encoded column page. We keep only
> the encoded data; to form a new column page, we uncompress the encoded
> column page to get the dictionary-encoded values, use the local
> dictionary generator to convert them back to actual data, put that data
> into the newly created column page, compress it again, and hand it to
> the consumer for writing the blocklet.
>
> The above process may slow down loading, but it reduces the memory
> footprint. So we can add a property that decides whether to use the
> current fallback procedure or the decoder-based fallback mechanism
> during fallback.
> Any inputs or suggestions are welcome.
>
>
> Regards,
> Akash
>
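The core idea of the proposal above, rebuilding the plain column page from the encoded page alone via a dictionary lookup instead of keeping a second raw copy in memory, can be sketched as follows. This is a minimal illustrative example; the class, the `Map`-based dictionary, and the method names are hypothetical and do not reflect CarbonData's actual column page or local dictionary APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class DecoderFallbackSketch {

  // Stand-in for a local dictionary generator: surrogate key -> actual value.
  static final Map<Integer, String> LOCAL_DICTIONARY = new HashMap<>();

  static {
    LOCAL_DICTIONARY.put(1, "apple");
    LOCAL_DICTIONARY.put(2, "banana");
    LOCAL_DICTIONARY.put(3, "cherry");
  }

  // Decoder-based fallback: the plain (non-dictionary) page is produced
  // directly from the encoded page, so the raw data never has to be held
  // in memory alongside the encoded data.
  static String[] fallbackDecode(int[] encodedPage) {
    String[] plainPage = new String[encodedPage.length];
    for (int i = 0; i < encodedPage.length; i++) {
      plainPage[i] = LOCAL_DICTIONARY.get(encodedPage[i]);
    }
    // At this point the encoded page can be freed; only one copy of the
    // actual values ever exists.
    return plainPage;
  }

  public static void main(String[] args) {
    int[] encodedPage = {1, 3, 2, 1};
    System.out.println(String.join(",", fallbackDecode(encodedPage)));
    // prints: apple,cherry,banana,apple
  }
}
```

The trade-off discussed in the thread is visible here: the extra decode pass costs CPU time during fallback, but the peak memory is one page instead of two.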


Re: [DISCUSSION] Support Standard Spark's FileFormat interface in Carbondata

2018-08-31 Thread aaron
Does this mean that we could call carbon in pyspark?




--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/