Hi All,

Please find the link to the design doc below.
https://drive.google.com/file/d/1eqfIms2tMi3b63nMbKfGRZYmo7TMy E1_/view?usp=sharing

-Regards
Kumar Vishal

On Wed, Jun 6, 2018 at 2:25 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:

> Hi Community,
>
> Please find attached the local dictionary support design document.
> Please let me know if any part of the design document needs further
> clarification. Any further inputs/improvements are most welcome.
>
> -Regards
> Kumar Vishal
>
> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:
>
>> +1
>> Good feature to add in CarbonData
>>
>> Regards,
>> Jacky
>>
>> > On June 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>> >
>> > Hi Community,
>> >
>> > Currently CarbonData supports global dictionary or No-Dictionary
>> > (plain text stored in LV format) for storing dimension column data.
>> >
>> > *Bottlenecks with Global Dictionary*
>> >
>> > 1. Because the dictionary file is mutable, global dictionary cannot
>> >    be supported in storage environments that do not support append.
>> > 2. It is difficult for the user to decide whether a column should be
>> >    a dictionary column when the table has a large number of columns.
>> > 3. Global dictionary generation generally slows down the load
>> >    process.
>> >
>> > *Bottlenecks with No-Dictionary*
>> >
>> > 1. Storage size is high.
>> > 2. Queries on No-Dictionary columns are slower because more data is
>> >    read and processed.
>> > 3. Filtering is slower on No-Dictionary columns because the number
>> >    of comparisons is high.
>> > 4. Memory footprint is high.
>> >
>> > The above bottlenecks can be solved by *generating a local
>> > dictionary for low cardinality columns at each blocklet level*,
>> > which brings the following benefits:
>> >
>> > 1. Dictionary generation can be supported on different storage
>> >    environments, irrespective of the file operations (such as
>> >    append) they support.
>> > 2. It reduces the extra read/write IO operations on the dictionary
>> >    files that are generated in the case of global dictionary.
>> > 3. The user no longer has to identify the dictionary columns when a
>> >    table has a large number of columns.
>> > 4. It gives better compression on dimension columns with low
>> >    cardinality.
>> > 5. Filter queries on No-Dictionary columns with a local dictionary
>> >    will be faster, as filtering is done on encoded data.
>> > 6. Store size and memory footprint are reduced, as only unique
>> >    values are stored in the local dictionary and the corresponding
>> >    data is stored in encoded form.
>> >
>> > Please provide your comments. Any suggestions from the community
>> > are most welcome. Please let me know if anything needs
>> > clarification.
>> >
>> > -Regards
>> > Kumar Vishal
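To make the blocklet-level idea in the original proposal above a bit more concrete, here is a minimal Python sketch. It is not CarbonData code and not taken from the design document. The function names `encode_blocklet` and `filter_equals`, the `threshold` parameter, and the fallback-to-plain-storage behaviour are all assumptions made for illustration only. It shows one blocklet's column values being replaced by integer codes, with the unique values kept once in a dictionary local to that blocklet, and an equality filter evaluated on the codes rather than on the strings.

```python
from typing import Dict, List, Optional, Tuple


def encode_blocklet(
    values: List[str], threshold: int
) -> Optional[Tuple[List[str], List[int]]]:
    """Build a local dictionary for the column values of one blocklet.

    Returns (dictionary, codes), where codes[i] is the position of
    values[i] in dictionary. Returns None when the number of distinct
    values would exceed `threshold` (a hypothetical cardinality limit),
    so the caller can store the column as plain text instead.
    """
    index: Dict[str, int] = {}   # value -> code, used only while encoding
    dictionary: List[str] = []   # code -> value, stored with the blocklet
    codes: List[int] = []
    for value in values:
        code = index.get(value)
        if code is None:
            if len(dictionary) >= threshold:
                return None      # cardinality too high for a dictionary
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def filter_equals(
    dictionary: List[str], codes: List[int], literal: str
) -> List[int]:
    """Return row positions whose value equals `literal`.

    The literal is looked up once in the local dictionary; the scan
    then compares integer codes only.
    """
    try:
        target = dictionary.index(literal)
    except ValueError:
        return []                # literal does not occur in this blocklet
    return [row for row, code in enumerate(codes) if code == target]


blocklet = ["IN", "US", "IN", "CN", "US", "IN"]
encoded = encode_blocklet(blocklet, threshold=4)
if encoded is not None:
    local_dict, local_codes = encoded
    matching_rows = filter_equals(local_dict, local_codes, "US")
```

In this sketch the dictionary is written once per blocklet and never appended to afterwards, which is why it does not depend on append support in the storage layer. A blocklet whose dictionary does not contain the filter literal can be skipped without scanning its codes.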