Hi

Thanks for the detailed explanation.
+1 for introducing the new format to improve performance further.

Regards
Liang


ravipesala wrote
> Hi Liang,
> 
> Backward compatibility is already handled in the 1.0.0 version: to read
> an old store, it uses the V1/V2 format readers. So backward
> compatibility works even after we jump to the V3 format.
> 
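> As a minimal sketch (hypothetical names, not the actual CarbonData
> API), the reader layer can dispatch on the format version recorded in
> the file footer, so old stores keep working after V3 lands:
> 
>   enum FormatVersion { V1, V2, V3 }
> 
>   interface BlockletReader { void read(String filePath); }
> 
>   class V1Reader implements BlockletReader {
>       public void read(String filePath) { /* old-store layout */ }
>   }
>   class V2Reader implements BlockletReader {
>       public void read(String filePath) { /* old-store layout */ }
>   }
>   class V3Reader implements BlockletReader {
>       public void read(String filePath) { /* page-based layout */ }
>   }
> 
>   class ReaderFactory {
>       // One reader per format version; V1/V2 readers cover old stores.
>       static BlockletReader forVersion(FormatVersion v) {
>           switch (v) {
>               case V1: return new V1Reader();
>               case V2: return new V2Reader();
>               default: return new V3Reader();
>           }
>       }
>   }
> 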
> Regards,
> Ravindra.
> 
> On 16 February 2017 at 04:18, Liang Chen <chenliang6136@> wrote:
> 
>> Hi Ravi
>>
>> Thank you for bringing this discussion to the mailing list. I have one
>> question: how do we ensure backward compatibility after introducing the
>> new format?
>>
>> Regards
>> Liang
>>
>> Jean-Baptiste Onofré wrote
>> > Agree.
>> >
>> > +1
>> >
>> > Regards
>> > JB
>> >
>> > On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
>> >>+1
>> >>This will improve the IO bottleneck. Page-level min/max will improve
>> >>block pruning, and fewer false-positive blocks will improve filter
>> >>query performance. Separating decompression of data from the reader
>> >>layer will improve the overall query performance.
>> >>
>> >>-Regards
>> >>Kumar Vishal
>> >>
>> >>On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@> wrote:
>> >>
>> >>> Please find the thrift file at the location below:
>> >>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
>> >>>
>> >>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
>> >>>
>> >>> > Problems in the current format:
>> >>> > 1. IO read is slow since it needs multiple seeks on the file to
>> >>> > read the column blocklets. The current blocklet size is 120000
>> >>> > rows, so it needs to read from the file multiple times to scan the
>> >>> > data of one column. Alternatively we can increase the blocklet
>> >>> > size, but then filter queries suffer as they get a big blocklet to
>> >>> > filter.
>> >>> > 2. Decompression is slow in the current format. We use an inverted
>> >>> > index for faster filter queries, and NumberCompressor compresses
>> >>> > that inverted index with bit-wise packing. This is slow, so we
>> >>> > should avoid NumberCompressor. One alternative is to keep the
>> >>> > blocklet size within 32000 so that the inverted index can be
>> >>> > written as shorts, but then IO read suffers a lot.
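>> >>> >
>> >>> > As a minimal sketch (illustrative Java, not the CarbonData source):
>> >>> > once a page holds at most 32000 rows, every row id fits in a signed
>> >>> > short (max 32767), so the inverted index can be serialized as plain
>> >>> > 2-byte values with no bit-wise packing step:
>> >>> >
>> >>> >   import java.nio.ByteBuffer;
>> >>> >
>> >>> >   class InvertedIndexWriter {
>> >>> >       // Safe because page size <= 32000 <= Short.MAX_VALUE (32767),
>> >>> >       // so no row id overflows a short.
>> >>> >       static byte[] encode(int[] rowIds) {
>> >>> >           ByteBuffer buf = ByteBuffer.allocate(rowIds.length * 2);
>> >>> >           for (int id : rowIds) {
>> >>> >               buf.putShort((short) id);
>> >>> >           }
>> >>> >           return buf.array();
>> >>> >       }
>> >>> >   }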
>> >>> >
>> >>> > To overcome the above 2 issues we are introducing the new format
>> >>> > V3. Here each blocklet has multiple pages of size 32000, and the
>> >>> > number of pages in a blocklet is configurable. Since we keep each
>> >>> > page within the short limit, there is no need to compress the
>> >>> > inverted index here. We also maintain the max/min for each page to
>> >>> > further prune filter queries. The blocklet is read with all its
>> >>> > pages at once and kept in offheap memory. During filtering we first
>> >>> > check the max/min range, and only if the page qualifies do we go on
>> >>> > to decompress it and filter further, as sketched below.
>> >>> >
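>> >>> > A sketch of that pruning check, with hypothetical names (the real
>> >>> > page metadata comes from the V3 thrift definitions):
>> >>> >
>> >>> >   import java.util.ArrayList;
>> >>> >   import java.util.List;
>> >>> >
>> >>> >   class PageStats {
>> >>> >       final long min, max;
>> >>> >       final byte[] compressed;   // raw page bytes held offheap
>> >>> >       PageStats(long min, long max, byte[] compressed) {
>> >>> >           this.min = min; this.max = max; this.compressed = compressed;
>> >>> >       }
>> >>> >   }
>> >>> >
>> >>> >   class PagePruner {
>> >>> >       // Keep only pages whose [min, max] range can contain the
>> >>> >       // filter value; everything else is skipped before any
>> >>> >       // decompression happens.
>> >>> >       static List<byte[]> pagesToScan(List<PageStats> pages, long v) {
>> >>> >           List<byte[]> hit = new ArrayList<>();
>> >>> >           for (PageStats p : pages) {
>> >>> >               if (v >= p.min && v <= p.max) {
>> >>> >                   hit.add(p.compressed);  // decompress on demand
>> >>> >               }
>> >>> >           }
>> >>> >           return hit;
>> >>> >       }
>> >>> >   }
>> >>> >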
>> >>> > Please find the attached V3 format thrift file.
>> >>> >
>> >>> > --
>> >>> > Thanks & Regards,
>> >>> > Ravi
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Thanks & Regards,
>> >>> Ravi
>> >>>
>>
> 
> 
> 
> -- 
> Thanks & Regards,
> Ravi




