Hi All,

Due to some problem, the above link is not working. Please find the updated link below:
https://drive.google.com/file/d/10LqtQlrE4jeotmleoMLJ8F91rK2TrN2h/view?usp=sharing

-Regards
Kumar Vishal

On Wed, Jun 6, 2018 at 2:40 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> Hi All,
>
> Please find the link for the design doc.
>
> https://drive.google.com/file/d/1eqfIms2tMi3b63nMbKfGRZYmo7TMyE1_/view?usp=sharing
>
> -Regards
> Kumar Vishal
>
> On Wed, Jun 6, 2018 at 2:25 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>> Hi Community,
>>
>> Please find the attached Local Dictionary support design document. Please
>> let me know if any further clarification on the design document is needed.
>> Any further inputs/improvements are most welcome.
>>
>> -Regards
>> Kumar Vishal
>>
>> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:
>>> +1
>>> Good feature to add to CarbonData
>>>
>>> Regards,
>>> Jacky
>>>
>>> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>>> >
>>> > Hi Community,
>>> >
>>> > Currently CarbonData supports either a global dictionary or
>>> > No-Dictionary (plain text stored in LV format) for storing dimension
>>> > column data.
>>> >
>>> > *Bottlenecks with Global Dictionary*
>>> >
>>> > 1. As the dictionary file is mutable, it is not possible to support a
>>> >    global dictionary in storage environments that do not support append.
>>> > 2. It is difficult for the user to determine whether a column should be
>>> >    a dictionary column when the table has a large number of columns.
>>> > 3. Global dictionary generation generally slows down the load process.
>>> >
>>> > *Bottlenecks with No-Dictionary*
>>> >
>>> > 1. Storage size is high.
>>> > 2. Queries on No-Dictionary columns are slower, as more data is read
>>> >    and processed.
>>> > 3. Filtering is slower on No-Dictionary columns, as the number of
>>> >    comparisons is high.
>>> > 4. Memory footprint is high.
>>> >
>>> > The above bottlenecks can be solved by *generating a local dictionary for
>>> > low-cardinality columns at each blocklet level*, which will help to
>>> > achieve the following benefits:
>>> >
>>> > 1. It supports dictionary generation in any storage environment,
>>> >    irrespective of whether that environment supports append operations
>>> >    on files.
>>> > 2. It removes the extra I/O for reading/writing the dictionary files
>>> >    generated in the case of a global dictionary.
>>> > 3. It removes the need for the user to identify dictionary columns when
>>> >    a table has many columns.
>>> > 4. It achieves better compression on low-cardinality dimension columns.
>>> > 5. Filter queries on No-Dictionary columns with a local dictionary will
>>> >    be faster, as filtering is done on the encoded data.
>>> > 6. It reduces store size and memory footprint, as only the unique values
>>> >    are stored in the local dictionary and the corresponding column data
>>> >    is stored as encoded values.
>>> >
>>> > Please provide your comments. Any suggestions from the community are most
>>> > welcome. Please let me know if any clarification is needed.
>>> >
>>> > -Regards
>>> > Kumar Vishal
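The per-blocklet local dictionary proposed in this thread can be illustrated with a minimal Java sketch. It assumes a simple String column and a hypothetical cardinality threshold (`maxCardinality`) above which a blocklet falls back to plain No-Dictionary storage. The class and method names (`LocalDictionarySketch`, `encode`, `filterEquals`) are illustrative only. They are not CarbonData's actual API and not necessarily the design in the linked document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-blocklet local dictionary (not CarbonData's real classes). */
public class LocalDictionarySketch {

    /** Encoded form of one column within one blocklet. */
    static final class EncodedBlocklet {
        final List<String> dictionary; // surrogate key -> unique value, stored once per blocklet
        final int[] keys;              // one surrogate key per row, stored instead of the raw value

        EncodedBlocklet(List<String> dictionary, int[] keys) {
            this.dictionary = dictionary;
            this.keys = keys;
        }
    }

    /**
     * Builds a dictionary from the values of a single blocklet only.
     * Returns null when the number of distinct values exceeds the hypothetical
     * threshold, i.e. the column is not low-cardinality in this blocklet.
     */
    static EncodedBlocklet encode(List<String> values, int maxCardinality) {
        Map<String, Integer> valueToKey = new HashMap<>();
        List<String> dictionary = new ArrayList<>();
        int[] keys = new int[values.size()];
        for (int row = 0; row < values.size(); row++) {
            String value = values.get(row);
            Integer key = valueToKey.get(value);
            if (key == null) {
                if (dictionary.size() == maxCardinality) {
                    return null; // fall back to No-Dictionary storage for this blocklet
                }
                key = dictionary.size();
                valueToKey.put(value, key);
                dictionary.add(value);
            }
            keys[row] = key;
        }
        return new EncodedBlocklet(dictionary, keys);
    }

    /** Equality filter: resolve the literal once, then compare integer keys row by row. */
    static List<Integer> filterEquals(EncodedBlocklet blocklet, String literal) {
        int target = blocklet.dictionary.indexOf(literal);
        List<Integer> matches = new ArrayList<>();
        if (target < 0) {
            return matches; // literal not in this blocklet's dictionary: no row can match
        }
        for (int row = 0; row < blocklet.keys.length; row++) {
            if (blocklet.keys[row] == target) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> city = Arrays.asList("Delhi", "Pune", "Delhi", "Bangalore", "Pune", "Delhi");
        EncodedBlocklet blocklet = encode(city, 1000);
        System.out.println(blocklet.dictionary);            // [Delhi, Pune, Bangalore]
        System.out.println(Arrays.toString(blocklet.keys)); // [0, 1, 0, 2, 1, 0]
        System.out.println(filterEquals(blocklet, "Delhi")); // [0, 2, 5]
        System.out.println(encode(city, 2));                 // null (3 distinct values > 2)
    }
}
```

Because the dictionary is built from one blocklet's data and written alongside it, nothing has to be appended to a shared file. Once the filter literal is resolved to a surrogate key, each row check is an integer comparison rather than a comparison of variable-length values.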