Hi Community,

Please find attached the design document for Local Dictionary support. Let me know if any part of the design needs further clarification. Further inputs and improvements are most welcome.
-Regards
Kumar Vishal

On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:

> +1
> Good feature to add in CarbonData
>
> Regards,
> Jacky
>
> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> >
> > Hi Community,
> >
> > Currently CarbonData supports either global dictionary or No-Dictionary
> > (plain text stored in LV format) for storing dimension column data.
> >
> > *Bottlenecks with Global Dictionary*
> >
> > 1. The dictionary file is mutable, so global dictionary cannot be
> >    supported in storage environments that do not support append.
> > 2. It’s difficult for the user to decide whether a column should be a
> >    dictionary column when the table has a large number of columns.
> > 3. Global dictionary generation generally slows down the load process.
> >
> > *Bottlenecks with No-Dictionary*
> >
> > 1. Storage size is high.
> > 2. Queries on No-Dictionary columns are slower because more data is
> >    read and processed.
> > 3. Filtering is slower on No-Dictionary columns because the number of
> >    comparisons is high.
> > 4. Memory footprint is high.
> >
> > The above bottlenecks can be solved by *generating a local dictionary
> > for low cardinality columns at each blocklet level*, which brings the
> > following benefits:
> >
> > 1. Dictionary generation can be supported on different storage
> >    environments, irrespective of the file operations (such as append)
> >    they support.
> > 2. It reduces the extra read/write IO operations on the dictionary
> >    files that are generated in the case of global dictionary.
> > 3. It removes the need for the user to identify the dictionary columns
> >    when a table has a large number of columns.
> > 4. It gives better compression on dimension columns with low
> >    cardinality.
> > 5. Filter queries on No-Dictionary columns with a local dictionary will
> >    be faster, because the filter is applied on encoded data.
> > 6. It reduces the store size and memory footprint, because only the
> >    unique values are stored in the local dictionary and the
> >    corresponding data is stored in encoded form.
> >
> > Please provide your comments. Suggestions from the community are most
> > welcome. Please let me know if anything needs clarification.
> >
> > -Regards
> > Kumar Vishal
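
For readers following the thread, the idea in the quoted proposal can be illustrated with a minimal Python sketch. It is not CarbonData code: the function names `encode_blocklet` and `filter_equals`, the `max_cardinality` threshold, and the fallback to plain values are all assumptions made for illustration, and the actual design document may differ. It shows a dictionary built per blocklet, data stored as surrogate keys, and an equality filter evaluated on the encoded data.

```python
from typing import List, Optional, Tuple


def encode_blocklet(
    values: List[str], max_cardinality: int = 4
) -> Tuple[Optional[List[str]], list]:
    """Encode one blocklet of a column with its own local dictionary.

    Returns (dictionary, encoded). If the number of distinct values
    exceeds max_cardinality (a hypothetical threshold), returns
    (None, values) so the blocklet is kept as plain values.
    """
    key_of = {}  # distinct value -> surrogate key, in first-seen order
    encoded = []
    for value in values:
        key = key_of.get(value)
        if key is None:
            if len(key_of) >= max_cardinality:
                # Assumed fallback: cardinality too high for this blocklet.
                return None, list(values)
            key = len(key_of)
            key_of[value] = key
        encoded.append(key)
    # Dicts preserve insertion order, so position i holds the value for key i.
    return list(key_of), encoded


def filter_equals(dictionary: Optional[List[str]], data: list, literal: str) -> List[int]:
    """Return the row positions in the blocklet that equal `literal`."""
    if dictionary is None:
        # Plain values: one string comparison per row.
        return [i for i, v in enumerate(data) if v == literal]
    if literal not in dictionary:
        # The literal is absent from this blocklet, so no row can match.
        return []
    key = dictionary.index(literal)
    # Encoded values: one lookup, then integer comparisons per row.
    return [i for i, k in enumerate(data) if k == key]


dictionary, encoded = encode_blocklet(["cn", "us", "cn", "in", "us"])
print(dictionary, encoded)
print(filter_equals(dictionary, encoded, "us"))
```

Because each blocklet carries its own dictionary, nothing is appended to a shared file, which is the property the proposal relies on for storage environments without append support.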