Agree.

+1

Regards
JB

On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>+1
>This will ease the IO bottleneck. Page-level min/max will improve block
>pruning, and fewer false-positive blocks will improve filter query
>performance. Separating decompression of data from the reader layer will
>improve overall query performance.
>
>-Regards
>Kumar Vishal
>
>On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pes...@gmail.com>
>wrote:
>
>> Please find the thrift file at the location below.
>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
>>
>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pes...@gmail.com>
>> wrote:
>>
>> > Problems in the current format:
>> > 1. IO reads are slow because reading column blocklets requires multiple
>> > seeks on the file. The current blocklet size is 120000, so scanning a
>> > column takes many separate reads from the file. Alternatively, we could
>> > increase the blocklet size, but filter queries would suffer because
>> > each filter would then have to process a big blocklet.
>> > 2. Decompression is slow in the current format. We use an inverted index
>> > for faster filter queries, and NumberCompressor compresses the inverted
>> > index with bit-wise packing. Unpacking it is slow, so we should avoid
>> > NumberCompressor. One alternative is to keep the blocklet size within
>> > 32000 so that the inverted index can be written as shorts, but IO reads
>> > would suffer a lot.
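
As a quick check of the sizes in point 2 above, here is a minimal Python sketch. It assumes row ids are numbered from 0 and that "short" means a signed 16-bit integer (maximum 32767). The helper names are hypothetical and are not taken from CarbonData.

```python
SHORT_MAX = 2**15 - 1  # largest signed 16-bit value (assumption: "short" is signed 16-bit)


def bits_needed(row_count: int) -> int:
    """Bits required to represent every row id in [0, row_count - 1]."""
    return max(1, (row_count - 1).bit_length())


def fits_in_short(row_count: int) -> bool:
    """True if the largest row id can be stored in a signed 16-bit short."""
    return row_count - 1 <= SHORT_MAX


for size in (120000, 32000):
    print(size, bits_needed(size), fits_in_short(size))
```

Under these assumptions, row ids for 120000 rows need 17 bits and overflow a short, which is why the index must either use a wider type or be bit-packed. Row ids for 32000 rows need 15 bits and can be stored as plain shorts with no packing step.
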
>> >
>> > To overcome the above two issues, we are introducing a new format, V3.
>> > Here each blocklet has multiple pages of size 32000, and the number of
>> > pages in a blocklet is configurable. Since each page stays within the
>> > short limit, there is no need to compress the inverted index.
>> > We also maintain the max/min for each page to further prune filter
>> > queries.
>> > The blocklet is read with all its pages at once and kept in offheap
>> > memory.
>> > During filtering, first check the max/min range; only if the filter
>> > value falls within it is the page decompressed and filtered further.
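
The read path described above can be sketched roughly as follows. This is an illustration only, not the CarbonData implementation: the Page class, build_blocklet and filter_equals are hypothetical names, zlib stands in for whatever compressor the format actually uses, and column values are assumed to be 64-bit integers.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple

PAGE_SIZE = 32000  # rows per page, as in the proposal


@dataclass
class Page:
    row_count: int
    min_value: int
    max_value: int
    compressed: bytes  # zlib-compressed little-endian 64-bit ints (stand-in codec)


def build_blocklet(values: Sequence[int], page_size: int = PAGE_SIZE) -> List[Page]:
    """Split a column into pages, recording min/max per page before compressing."""
    pages = []
    for start in range(0, len(values), page_size):
        chunk = values[start:start + page_size]
        raw = struct.pack(f"<{len(chunk)}q", *chunk)
        pages.append(Page(len(chunk), min(chunk), max(chunk), zlib.compress(raw)))
    return pages


def filter_equals(pages: List[Page], target: int) -> Tuple[List[int], int]:
    """Return (matching row ids, number of pages that had to be decompressed)."""
    matches: List[int] = []
    decompressed = 0
    row_offset = 0
    for page in pages:
        # Check the page-level min/max first; skip decompression when out of range.
        if page.min_value <= target <= page.max_value:
            raw = zlib.decompress(page.compressed)
            decompressed += 1
            values = struct.unpack(f"<{page.row_count}q", raw)
            matches.extend(row_offset + i for i, v in enumerate(values) if v == target)
        row_offset += page.row_count
    return matches, decompressed


pages = build_blocklet(list(range(100000)))
rows, touched = filter_equals(pages, 70000)
print(len(pages), rows, touched)
```

With sorted sample data like this, only the one page whose min/max range covers the filter value is decompressed; the others are skipped on metadata alone. On unsorted data the ranges overlap more, so the pruning benefit depends on how the values are distributed across pages.
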
>> >
>> > Please find the attached V3 format thrift file.
>> >
>> > --
>> > Thanks & Regards,
>> > Ravi
>> >
>>
>>
>>
>> --
>> Thanks & Regards,
>> Ravi
>>
