Hi All,

Please ignore the link above.
Please comment here: https://docs.google.com/document/d/1y0dJSWOr0ZTPpbNOOUfVfU5SoANL5B1F0l7jhl8BgUs/edit?usp=sharing

-Regards
Kumar Vishal

On Wed, Jun 6, 2018 at 3:06 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> Hi All,
>
> Due to a problem, the above link is not working. Please find the updated
> link:
>
> https://drive.google.com/file/d/10LqtQlrE4jeotmleoMLJ8F91rK2TrN2h/view?usp=sharing
>
> -Regards
> Kumar Vishal
>
> On Wed, Jun 6, 2018 at 2:40 PM, Kumar Vishal <kumarvishal1...@gmail.com>
> wrote:
>> Hi All,
>>
>> Please find the link to the design doc:
>>
>> https://drive.google.com/file/d/1eqfIms2tMi3b63nMbKfGRZYmo7TMyE1_/view?usp=sharing
>>
>> -Regards
>> Kumar Vishal
>>
>> On Wed, Jun 6, 2018 at 2:25 PM, Kumar Vishal <kumarvishal1...@gmail.com>
>> wrote:
>>> Hi Community,
>>>
>>> Please find attached the local dictionary support design document.
>>> Please let me know if anything in the design document needs
>>> clarification. Any further input or improvements are most welcome.
>>>
>>> -Regards
>>> Kumar Vishal
>>>
>>> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:
>>>> +1
>>>> Good feature to add in CarbonData
>>>>
>>>> Regards,
>>>> Jacky
>>>>
>>>>> On Jun 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>>>>>
>>>>> Hi Community,
>>>>>
>>>>> Currently CarbonData supports either a global dictionary or
>>>>> No-Dictionary (plain text stored in LV format) for storing dimension
>>>>> column data.
>>>>>
>>>>> *Bottleneck with Global Dictionary*
>>>>>
>>>>> 1. Because the dictionary file is mutable, a global dictionary cannot
>>>>>    be supported on storage that does not support append.
>>>>> 2. When a table has many columns, it is difficult for the user to
>>>>>    decide which columns should be dictionary columns.
>>>>> 3. Global dictionary generation generally slows down the load process.
>>>>>
>>>>> *Bottleneck with No-Dictionary*
>>>>>
>>>>> 1. Storage size is high.
>>>>> 2. Queries on No-Dictionary columns are slower because more data is
>>>>>    read and processed.
>>>>> 3. Filtering is slower on No-Dictionary columns because the number of
>>>>>    comparisons is high.
>>>>> 4. Memory footprint is high.
>>>>>
>>>>> The above bottlenecks can be solved by *generating a local dictionary
>>>>> for low-cardinality columns at each blocklet level*. This will give the
>>>>> following benefits:
>>>>>
>>>>> 1. Dictionary generation will be supported on any storage
>>>>>    environment, irrespective of which file operations (such as append)
>>>>>    the storage supports.
>>>>> 2. It removes the extra read/write IO on the dictionary files that a
>>>>>    global dictionary requires.
>>>>> 3. The user no longer has to identify dictionary columns when a table
>>>>>    has many columns.
>>>>> 4. Dimension columns with low cardinality will get better compression.
>>>>> 5. Filter queries on No-Dictionary columns with a local dictionary
>>>>>    will be faster, as filtering is done on the encoded data.
>>>>> 6. Store size and memory footprint will be reduced, as only the unique
>>>>>    values are stored in the local dictionary and the column data is
>>>>>    stored in encoded form.
>>>>>
>>>>> Please provide your comments. Any suggestions from the community are
>>>>> most welcome. Please let me know if anything needs clarification.
>>>>>
>>>>> -Regards
>>>>> Kumar Vishal
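To make the proposal concrete, here is a minimal Python sketch of how a per-blocklet local dictionary with a cardinality fallback could work. It is an illustration under stated assumptions, not CarbonData's implementation. The threshold value (1000), the names `EncodedBlocklet`, `encode_blocklet`, `filter_equals` and `decode`, and the "fall back to plain text when the threshold is exceeded" behaviour are all hypothetical. The sketch shows two points from the benefits list. First, only unique values are stored in the dictionary, and rows are stored as integer keys. Second, an equality filter resolves the target value once per blocklet and then compares small integers instead of strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# ASSUMPTION: hypothetical cardinality threshold. If a blocklet's column has
# more distinct values than this, that blocklet is stored as plain text.
LOCAL_DICT_THRESHOLD = 1000


@dataclass
class EncodedBlocklet:
    """One column's data for a single blocklet (hypothetical layout)."""
    dictionary: Optional[List[str]]  # key -> value; None means plain storage
    data: list                       # int keys if dictionary is set, else raw strings


def encode_blocklet(values: List[str],
                    threshold: int = LOCAL_DICT_THRESHOLD) -> EncodedBlocklet:
    """Build a dictionary local to this blocklet, or fall back to plain values."""
    key_of: Dict[str, int] = {}
    dictionary: List[str] = []
    keys: List[int] = []
    for v in values:
        k = key_of.get(v)
        if k is None:
            if len(dictionary) >= threshold:
                # Cardinality too high for this blocklet: store it as plain text.
                return EncodedBlocklet(dictionary=None, data=list(values))
            k = len(dictionary)
            key_of[v] = k
            dictionary.append(v)  # each unique value is stored only once
        keys.append(k)
    return EncodedBlocklet(dictionary=dictionary, data=keys)


def filter_equals(blocklet: EncodedBlocklet, target: str) -> List[int]:
    """Return the row ids whose value equals target."""
    if blocklet.dictionary is None:
        # Plain storage: one string comparison per row.
        return [i for i, v in enumerate(blocklet.data) if v == target]
    try:
        # Resolve the filter value to its key once per blocklet.
        key = blocklet.dictionary.index(target)
    except ValueError:
        return []  # value not in this blocklet's dictionary: no row can match
    # Compare integer keys on the encoded data.
    return [i for i, k in enumerate(blocklet.data) if k == key]


def decode(blocklet: EncodedBlocklet) -> List[str]:
    """Turn a blocklet back into its original values."""
    if blocklet.dictionary is None:
        return list(blocklet.data)
    return [blocklet.dictionary[k] for k in blocklet.data]


if __name__ == "__main__":
    b = encode_blocklet(["IN", "US", "IN", "CN", "US", "IN"])
    print(b.dictionary)            # ['IN', 'US', 'CN']
    print(b.data)                  # [0, 1, 0, 2, 1, 0]
    print(filter_equals(b, "US"))  # [1, 4]
```

Building the dictionary per blocklet means there is no shared, mutable dictionary file, which is what would let this work on storage without append. The fallback bounds dictionary size when a blocklet turns out not to be low-cardinality. Choosing the threshold and persisting the dictionary next to the blocklet data are left out of this sketch.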