Hi Community,

Please find attached the design document for local dictionary support. Please
let me know if any part of the design needs further clarification.
Any further inputs or improvements are most welcome.
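
To make the proposal quoted below a bit more concrete, here is a rough sketch
of the idea, not the actual CarbonData implementation. It builds a dictionary
per blocklet, falls back to plain storage when the blocklet has too many
distinct values, and evaluates an equality filter on the encoded data. The
names encode_blocklet, filter_equals and CARDINALITY_THRESHOLD, and the
threshold value itself, are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold: if a blocklet has more distinct values than this,
# the column falls back to plain (no-dictionary) storage for that blocklet.
CARDINALITY_THRESHOLD = 4


def encode_blocklet(
    values: List[str], threshold: int = CARDINALITY_THRESHOLD
) -> Optional[Tuple[List[str], List[int]]]:
    """Return (dictionary, codes) for one blocklet, or None on fallback."""
    index = {}       # value -> surrogate code, local to this blocklet
    dictionary = []  # surrogate code -> value
    codes = []
    for value in values:
        code = index.get(value)
        if code is None:
            if len(dictionary) >= threshold:
                return None  # too many distinct values: store as plain text
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Row positions equal to `literal`, found by comparing codes only."""
    if literal not in dictionary:
        return []  # value absent from this blocklet: nothing can match
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


blocklet = ["IN", "US", "IN", "CN", "US", "IN"]
dictionary, codes = encode_blocklet(blocklet)
matches = filter_equals(dictionary, codes, "US")
```

Here only the three unique values are stored once in the dictionary, the rows
are stored as small integer codes, and the filter looks up the literal once
and then compares integers instead of strings.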



-Regards
Kumar Vishal

On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:

> +1
> Good feature to add in CarbonData
>
> Regards,
> Jacky
>
>
> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> >
> > Hi Community,
> >
> > Currently CarbonData supports either global dictionary or No-Dictionary
> > (plain text stored in LV format) encoding for storing dimension column
> > data.
> >
> > *Bottlenecks with Global Dictionary*
> >
> >   1. The dictionary file is mutable, so global dictionary cannot be
> >      supported in storage environments that do not support append.
> >   2. It is difficult for the user to decide whether a column should be a
> >      dictionary column when the table has a large number of columns.
> >   3. Global dictionary generation generally slows down the load process.
> >
> > *Bottlenecks with No-Dictionary*
> >
> >   1. Storage size is high.
> >   2. Queries on No-Dictionary columns are slower because more data is
> >      read and processed.
> >   3. Filtering is slower on No-Dictionary columns because the number of
> >      comparisons is high.
> >   4. Memory footprint is high.
> >
> > The above bottlenecks can be solved by *generating a local dictionary for
> > low cardinality columns at each blocklet level*, which will help to achieve
> > the benefits below:
> >
> >   1. It allows dictionary generation on different storage environments,
> >      irrespective of the file operations (such as append) they support.
> >   2. It avoids the extra read/write IO on the dictionary files that are
> >      generated in the case of global dictionary.
> >   3. It removes the need for the user to identify dictionary columns
> >      when a table has a large number of columns.
> >   4. It gives better compression on dimension columns with low
> >      cardinality.
> >   5. Filter queries on No-Dictionary columns with local dictionary will
> >      be faster, as the filter is applied on encoded data.
> >   6. It reduces the store size and memory footprint, as only unique
> >      values are stored in the local dictionary and the corresponding
> >      data is stored in encoded form.
> >
> > Please provide your comments. Any suggestions from the community are most
> > welcome. Please let me know if anything needs clarification.
> >
> > -Regards
> > Kumar Vishal
>
>
>
>
