Hi Liang,

Backward compatibility is already handled in version 1.0.0: to read an old store, it uses the V1/V2 format readers. So backward compatibility still works even though we jump to the V3 format.
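To illustrate the idea, here is a minimal sketch of version-based reader dispatch: the reader is chosen by the format version recorded with the file, so old V1/V2 stores stay readable after the default write format moves to V3. All class and method names below are hypothetical, for illustration only, and are not CarbonData's actual API.

```java
// Illustrative sketch: dispatch to a format-specific reader based on the
// version the file was written with. Names are hypothetical, not the
// actual CarbonData API.
interface BlockletReader {
    String describe();
}

class V1Reader implements BlockletReader {
    public String describe() { return "V1"; }
}

class V2Reader implements BlockletReader {
    public String describe() { return "V2"; }
}

class V3Reader implements BlockletReader {
    public String describe() { return "V3"; }
}

public class ReaderFactory {
    // Old stores remain readable: each file is read with the reader that
    // matches its own format version, regardless of the current write format.
    public static BlockletReader forVersion(int version) {
        switch (version) {
            case 1: return new V1Reader();
            case 2: return new V2Reader();
            case 3: return new V3Reader();
            default: throw new IllegalArgumentException(
                "Unknown format version: " + version);
        }
    }
}
```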
Regards,
Ravindra.

On 16 February 2017 at 04:18, Liang Chen <chenliang6...@gmail.com> wrote:
> Hi Ravi
>
> Thank you for bringing the discussion to the mailing list. I have one
> question: how do we ensure backward compatibility after introducing the
> new format?
>
> Regards
> Liang
>
> Jean-Baptiste Onofré wrote
> > Agree.
> >
> > +1
> >
> > Regards
> > JB
> >
> > On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
> >> +1
> >> This will improve the IO bottleneck. Page-level min/max will improve
> >> block pruning, and fewer false-positive blocks will improve filter
> >> query performance. Separating decompression of data from the reader
> >> layer will improve overall query performance.
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@> wrote:
> >>
> >>> Please find the thrift file at the location below:
> >>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
> >>>
> >>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
> >>>
> >>> > Problems in the current format:
> >>> > 1. IO read is slow because it needs multiple seeks on the file to
> >>> > read column blocklets. The current blocklet size is 120000 rows, so
> >>> > scanning one column requires reading from the file multiple times.
> >>> > Alternatively we could increase the blocklet size, but then filter
> >>> > queries suffer because each filter gets a bigger blocklet to process.
> >>> > 2. Decompression is slow in the current format. We use an inverted
> >>> > index for faster filter queries and compress it with NumberCompressor
> >>> > (bit-wise packing). This is slow, so we should avoid NumberCompressor.
> >>> > One alternative is to keep the blocklet size within 32000 so the
> >>> > inverted index can be written as shorts, but then IO read suffers a
> >>> > lot.
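The 32000 figure in point 2 of the quoted list follows from Java's signed short range (Short.MAX_VALUE = 32767): if every row id in a page is below that limit, the inverted index can be stored as plain 2-byte shorts with no bit-packing. A minimal illustrative sketch, with hypothetical names, of that narrowing step:

```java
// Illustrative: with page size <= 32000, every row id fits in a Java short
// (Short.MAX_VALUE = 32767), so the inverted index can be written as plain
// 2-byte values instead of being bit-packed by a number compressor.
public class InvertedIndexWriter {
    // Narrow int row ids to shorts; valid only while the page size stays
    // within the short range, which V3 guarantees by construction.
    public static short[] toShortIndex(int[] rowIds) {
        short[] out = new short[rowIds.length];
        for (int i = 0; i < rowIds.length; i++) {
            if (rowIds[i] > Short.MAX_VALUE || rowIds[i] < 0) {
                throw new IllegalArgumentException(
                    "Row id out of short range: " + rowIds[i]);
            }
            out[i] = (short) rowIds[i];
        }
        return out;
    }
}
```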
> >>> >
> >>> > To overcome the above two issues we are introducing a new format, V3.
> >>> > Here each blocklet has multiple pages of 32000 rows each; the number
> >>> > of pages per blocklet is configurable. Since a page stays within the
> >>> > short limit, there is no need to compress the inverted index.
> >>> > We also maintain min/max for each page to further prune filter
> >>> > queries.
> >>> > The blocklet is read with all its pages at once and kept in offheap
> >>> > memory. During a filter, we first check the min/max range, and only
> >>> > if it is valid do we decompress the page to filter further.
> >>> >
> >>> > Please find the attached V3 format thrift file.
> >>> >
> >>> > --
> >>> > Thanks & Regards,
> >>> > Ravi
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Ravi
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Introducing-V3-format-tp7609p7622.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.

--
Thanks & Regards,
Ravi
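The page-level min/max pruning described in the quoted V3 proposal can be sketched as follows: a page is decompressed and scanned only if the filter value falls inside that page's [min, max] range, and every other page is skipped outright. This is a minimal sketch under that assumption; the class and method names are hypothetical, not CarbonData's actual API.

```java
// Illustrative: consult each page's min/max before decompressing it.
// Only pages whose range can contain the filter value need to be
// decompressed and scanned. Names are hypothetical.
public class PagePruner {
    // Returns true when the page may contain filterValue and must be scanned.
    public static boolean pageCanMatch(long pageMin, long pageMax, long filterValue) {
        return filterValue >= pageMin && filterValue <= pageMax;
    }

    // Count how many pages of a blocklet survive pruning for an equality
    // filter; each entry of pageMinMax is a {min, max} pair for one page.
    public static int countPagesToScan(long[][] pageMinMax, long filterValue) {
        int count = 0;
        for (long[] mm : pageMinMax) {
            if (pageCanMatch(mm[0], mm[1], filterValue)) {
                count++;
            }
        }
        return count;
    }
}
```

With, say, three pages covering ranges [0, 10], [20, 30], and [40, 50], an equality filter on 25 touches only the second page, so just one page is decompressed.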