Hi, thanks for the detailed explanation. +1 for introducing the new format to improve performance further.
Regards,
Liang

ravipesala wrote
> Hi Liang,
>
> Backward compatibility is already handled in the 1.0.0 version: to read an
> old store, it uses the V1/V2 format readers. So backward compatibility
> still works even though we jump to the V3 format.
>
> Regards,
> Ravindra.
>
> On 16 February 2017 at 04:18, Liang Chen <chenliang6136@> wrote:
>
>> Hi Ravi,
>>
>> Thank you for bringing the discussion to the mailing list. I have one
>> question: how do we ensure backward compatibility after introducing the
>> new format?
>>
>> Regards,
>> Liang
>>
>> Jean-Baptiste Onofré wrote
>>> Agree.
>>>
>>> +1
>>>
>>> Regards,
>>> JB
>>>
>>> On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
>>>
>>>> +1
>>>> This will relieve the IO bottleneck. Page-level min/max will improve
>>>> block pruning, and fewer false-positive blocks will improve filter
>>>> query performance. Separating decompression of data from the reader
>>>> layer will improve overall query performance.
>>>>
>>>> -Regards,
>>>> Kumar Vishal
>>>>
>>>> On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@> wrote:
>>>>
>>>>> Please find the thrift file at the location below:
>>>>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
>>>>>
>>>>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
>>>>>
>>>>>> Problems in the current format:
>>>>>> 1. IO reads are slower because reading a column's blocklets requires
>>>>>> multiple seeks on the file. The current blocklet size is 120000 rows,
>>>>>> so scanning a column needs several reads from the file.
>>>>>> Alternatively we could increase the blocklet size, but then filter
>>>>>> queries suffer because each filter has to process a larger blocklet.
>>>>>> 2. Decompression is slower in the current format. We use an inverted
>>>>>> index for faster filter queries and compress it with NumberCompressor
>>>>>> using bit-wise packing, which is slow, so we should avoid the number
>>>>>> compressor. One alternative is to keep the blocklet size within 32000
>>>>>> rows so the inverted index can be written as shorts, but then IO
>>>>>> reads suffer a lot.
>>>>>>
>>>>>> To overcome these two issues we are introducing the new V3 format.
>>>>>> Here each blocklet has multiple pages of 32000 rows each, and the
>>>>>> number of pages per blocklet is configurable. Since each page stays
>>>>>> within the short limit, there is no need to compress the inverted
>>>>>> index.
>>>>>> We also maintain max/min values for each page to further prune filter
>>>>>> queries.
>>>>>> The reader reads a blocklet with all its pages at once and keeps it
>>>>>> in offheap memory.
>>>>>> During filtering we first check the max/min range, and only if the
>>>>>> range is valid do we decompress the page to filter further.
>>>>>>
>>>>>> Please find the attached V3 format thrift file.
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Ravi
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Ravi
>
> --
> Thanks & Regards,
> Ravi

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Introducing-V3-format-tp7609p7645.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.
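To make the page-level max/min pruning idea from the thread concrete, here is a rough Python sketch. All names (build_pages, filter_eq) are hypothetical illustrations, not CarbonData code: each blocklet holds pages of up to 32000 rows with min/max recorded per page, and a filter only decompresses and scans pages whose range can contain the filter value.

```python
# Hypothetical sketch of page-level min/max pruning (not CarbonData's
# actual classes). A blocklet is modeled as a list of pages; each page
# records its min and max so filters can skip it without decompression.

PAGE_SIZE = 32000  # rows per page, as proposed for the V3 format

def build_pages(values):
    """Split column values into pages and record min/max per page."""
    pages = []
    for start in range(0, len(values), PAGE_SIZE):
        page = values[start:start + PAGE_SIZE]
        pages.append({"data": page, "min": min(page), "max": max(page)})
    return pages

def filter_eq(pages, target):
    """Scan only pages whose [min, max] range can contain target."""
    hits = []
    scanned = 0
    for page in pages:
        if page["min"] <= target <= page["max"]:
            scanned += 1  # only now would the page be decompressed
            hits.extend(v for v in page["data"] if v == target)
    return hits, scanned

# 100000 sorted values -> 4 pages; an equality filter touches only the
# single page whose min/max range covers the target.
pages = build_pages(list(range(100000)))
hits, scanned = filter_eq(pages, 70000)
```

With sorted data only one of the four pages is scanned; the other three are pruned purely from their min/max metadata, which is the false-positive reduction Kumar Vishal mentions.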
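The point about avoiding NumberCompressor can also be sketched. This is an illustrative encoding only (the function names are hypothetical): because a page holds at most 32000 rows, every row id fits in a signed 16-bit short (max 32767), so the inverted index can be written as a plain short array with no bit-wise packing step.

```python
# Hypothetical sketch: with <= 32000 rows per page, inverted-index row
# ids fit in signed shorts, so no bit-packing compressor is needed.
import struct

PAGE_SIZE = 32000

def encode_inverted_index(row_ids):
    # Every row id is < 32000 <= 32768, so 'h' (signed short) is safe.
    assert max(row_ids) < PAGE_SIZE
    return struct.pack(f"<{len(row_ids)}h", *row_ids)

def decode_inverted_index(buf):
    return list(struct.unpack(f"<{len(buf) // 2}h", buf))

ids = [5, 0, 3, 1, 4, 2]          # row ids after sorting a tiny page
encoded = encode_inverted_index(ids)
decoded = decode_inverted_index(encoded)
```

Encoding is a fixed 2 bytes per row id, and decoding is a single unpack with no bit-unpacking loop, which is the decompression cost the thread wants to avoid.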