Hi Liang,

Backward compatibility is already handled in version 1.0.0: to read an old store, it uses the V1/V2 format readers. So backward compatibility still works even though we jump to the V3 format.
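To illustrate the idea, here is a minimal sketch of version-based reader dispatch: the reader is chosen by the format version recorded with the file, so old V1/V2 stores stay readable after the default write format moves to V3. All class and method names below are hypothetical, for illustration only, and are not CarbonData's actual API.

```java
// Illustrative sketch: dispatch to a format-specific reader based on the
// version the file was written with. Names are hypothetical, not the
// actual CarbonData API.
interface BlockletReader {
    String describe();
}

class V1Reader implements BlockletReader {
    public String describe() { return "V1"; }
}

class V2Reader implements BlockletReader {
    public String describe() { return "V2"; }
}

class V3Reader implements BlockletReader {
    public String describe() { return "V3"; }
}

public class ReaderFactory {
    // Old stores remain readable: each file is read with the reader that
    // matches its own format version, regardless of the current write format.
    public static BlockletReader forVersion(int version) {
        switch (version) {
            case 1: return new V1Reader();
            case 2: return new V2Reader();
            case 3: return new V3Reader();
            default: throw new IllegalArgumentException(
                "Unknown format version: " + version);
        }
    }
}
```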
Regards,
Ravindra.

On 16 February 2017 at 04:18, Liang Chen <chenliang6...@gmail.com> wrote:
> Hi Ravi
>
> Thank you for bringing the discussion to the mailing list. I have one
> question: how do we ensure backward compatibility after introducing the
> new format?
>
> Regards
> Liang
>
> Jean-Baptiste Onofré wrote
> > Agree.
> >
> > +1
> >
> > Regards
> > JB
> >
> > On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
> >> +1
> >> This will improve the IO bottleneck. Page-level min/max will improve
> >> block pruning, and fewer false-positive blocks will improve filter
> >> query performance. Separating decompression of data from the reader
> >> layer will improve overall query performance.
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@> wrote:
> >>
> >>> Please find the thrift file at the location below:
> >>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
> >>>
> >>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
> >>>
> >>> > Problems in the current format:
> >>> > 1. IO read is slow because it needs multiple seeks on the file to
> >>> > read column blocklets. The current blocklet size is 120000 rows, so
> >>> > scanning one column requires reading from the file multiple times.
> >>> > Alternatively we could increase the blocklet size, but then filter
> >>> > queries suffer because each filter gets a bigger blocklet to process.
> >>> > 2. Decompression is slow in the current format. We use an inverted
> >>> > index for faster filter queries and compress it with NumberCompressor
> >>> > (bit-wise packing). This is slow, so we should avoid NumberCompressor.
> >>> > One alternative is to keep the blocklet size within 32000 so the
> >>> > inverted index can be written as shorts, but then IO read suffers a
> >>> > lot.
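The 32000 figure in point 2 of the quoted list follows from Java's signed short range (Short.MAX_VALUE = 32767): if every row id in a page is below that limit, the inverted index can be stored as plain 2-byte shorts with no bit-packing. A minimal illustrative sketch, with hypothetical names, of that narrowing step:

```java
// Illustrative: with page size <= 32000, every row id fits in a Java short
// (Short.MAX_VALUE = 32767), so the inverted index can be written as plain
// 2-byte values instead of being bit-packed by a number compressor.
public class InvertedIndexWriter {
    // Narrow int row ids to shorts; valid only while the page size stays
    // within the short range, which V3 guarantees by construction.
    public static short[] toShortIndex(int[] rowIds) {
        short[] out = new short[rowIds.length];
        for (int i = 0; i < rowIds.length; i++) {
            if (rowIds[i] > Short.MAX_VALUE || rowIds[i] < 0) {
                throw new IllegalArgumentException(
                    "Row id out of short range: " + rowIds[i]);
            }
            out[i] = (short) rowIds[i];
        }
        return out;
    }
}
```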
> >>> >
> >>> > To overcome the above two issues we are introducing a new format, V3.
> >>> > Here each blocklet has multiple pages of 32000 rows each; the number
> >>> > of pages per blocklet is configurable. Since a page stays within the
> >>> > short limit, there is no need to compress the inverted index.
> >>> > We also maintain min/max for each page to further prune filter
> >>> > queries.
> >>> > The blocklet is read with all its pages at once and kept in offheap
> >>> > memory. During a filter, we first check the min/max range, and only
> >>> > if it is valid do we decompress the page to filter further.
> >>> >
> >>> > Please find the attached V3 format thrift file.
> >>> >
> >>> > --
> >>> > Thanks & Regards,
> >>> > Ravi
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Ravi
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Introducing-V3-format-tp7609p7622.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.

--
Thanks & Regards,
Ravi
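The page-level min/max pruning described in the quoted V3 proposal can be sketched as follows: a page is decompressed and scanned only if the filter value falls inside that page's [min, max] range, and every other page is skipped outright. This is a minimal sketch under that assumption; the class and method names are hypothetical, not CarbonData's actual API.

```java
// Illustrative: consult each page's min/max before decompressing it.
// Only pages whose range can contain the filter value need to be
// decompressed and scanned. Names are hypothetical.
public class PagePruner {
    // Returns true when the page may contain filterValue and must be scanned.
    public static boolean pageCanMatch(long pageMin, long pageMax, long filterValue) {
        return filterValue >= pageMin && filterValue <= pageMax;
    }

    // Count how many pages of a blocklet survive pruning for an equality
    // filter; each entry of pageMinMax is a {min, max} pair for one page.
    public static int countPagesToScan(long[][] pageMinMax, long filterValue) {
        int count = 0;
        for (long[] mm : pageMinMax) {
            if (pageCanMatch(mm[0], mm[1], filterValue)) {
                count++;
            }
        }
        return count;
    }
}
```

With, say, three pages covering ranges [0, 10], [20, 30], and [40, 50], an equality filter on 25 touches only the second page, so just one page is decompressed.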