Re: [Discussion] Carbon Local Dictionary Support

Kumar Vishal Wed, 06 Jun 2018 02:10:31 -0700

Hi All,

Please find the link to the design doc.


https://drive.google.com/file/d/1eqfIms2tMi3b63nMbKfGRZYmo7TMyE1_/view?usp=sharing

-Regards
Kumar Vishal

On Wed, Jun 6, 2018 at 2:25 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:

> Hi Community,
>
> Please find attached the Local Dictionary support design document. Please
> let me know if you need any further clarification on the design document.
> Any further inputs/improvements are most welcome.
>
>
>
> -Regards
> Kumar Vishal
>
> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <jacky.li...@qq.com> wrote:
>
>> +1
>> Good feature to add in CarbonData
>>
>> Regards,
>> Jacky
>>
>>
>> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>> >
>> > Hi Community,
>> >
>> > Currently CarbonData supports either a global dictionary or No-Dictionary
>> > (plain text stored in LV format) for storing dimension column data.
>> >
>> > *Bottlenecks with Global Dictionary*
>> >
>> >   1. As the dictionary file is mutable, it is not possible to support a
>> >      global dictionary on storage environments that do not support append.
>> >   2. It is difficult for the user to determine whether a column should be
>> >      a dictionary column when the table has a large number of columns.
>> >   3. Global dictionary generation generally slows down the load process.
>> >
>> > *Bottlenecks with No-Dictionary*
>> >
>> >   1. Storage size is high.
>> >   2. Queries on No-Dictionary columns are slower, as more data has to be
>> >      read and processed.
>> >   3. Filtering is slower on No-Dictionary columns, as the number of
>> >      comparisons is high.
>> >   4. Memory footprint is high.
>> >
>> > The above bottlenecks can be solved by *generating a local dictionary for
>> > low-cardinality columns at the blocklet level*, which will help to achieve
>> > the following benefits:
>> >
>> >   1. It supports dictionary generation on different storage environments,
>> >      irrespective of whether the storage supports append operations on files.
>> >   2. It avoids the extra IO (read/write) operations on the dictionary files
>> >      that are generated in the case of a global dictionary.
>> >   3. It removes the need for the user to identify dictionary columns when
>> >      a table has many columns.
>> >   4. It gives better compression on low-cardinality dimension columns.
>> >   5. Filter queries on No-Dictionary columns with a local dictionary will be
>> >      faster, as filtering is done on the encoded data.
>> >   6. It reduces the store size and memory footprint, as only unique values
>> >      are stored in the local dictionary and the corresponding data is
>> >      stored in encoded form.
>> >
>> > Please provide your comments. Any suggestions from the community are most
>> > welcome. Please let me know if you need any clarification.
>> >
>> > -Regards
>> > Kumar Vishal
>>
>>
>>
>>
>
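
The proposal above (a local dictionary built per blocklet for low-cardinality columns, with filters evaluated on the encoded data) can be illustrated with a minimal Java sketch. The class and method names (`LocalDictionarySketch`, `encodeBlocklet`, `filterEquals`) and the distinct-value `threshold` used to decide when a blocklet falls back to plain No-Dictionary storage are hypothetical assumptions for illustration; this is not CarbonData's actual implementation or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of blocklet-level local dictionary encoding; not CarbonData's actual classes or API.
final class LocalDictionarySketch {

  /** One blocklet's worth of a column: a local dictionary plus one integer surrogate key per row. */
  static final class EncodedBlocklet {
    final List<String> dictionary; // surrogate key i -> unique value, in first-seen order
    final int[] keys;              // surrogate key for each row

    EncodedBlocklet(List<String> dictionary, int[] keys) {
      this.dictionary = dictionary;
      this.keys = keys;
    }
  }

  /**
   * Builds a local dictionary for the values of a single blocklet.
   * Returns null when the number of distinct values exceeds the hypothetical threshold,
   * i.e. the column is not low-cardinality in this blocklet and should be stored as plain text.
   */
  static EncodedBlocklet encodeBlocklet(List<String> values, int threshold) {
    Map<String, Integer> valueToKey = new HashMap<>();
    List<String> dictionary = new ArrayList<>();
    int[] keys = new int[values.size()];
    for (int row = 0; row < values.size(); row++) {
      String value = values.get(row);
      Integer key = valueToKey.get(value);
      if (key == null) {
        if (dictionary.size() == threshold) {
          return null; // too many distinct values: fall back to No-Dictionary storage
        }
        key = dictionary.size();
        valueToKey.put(value, key);
        dictionary.add(value);
      }
      keys[row] = key;
    }
    return new EncodedBlocklet(dictionary, keys);
  }

  /**
   * Equality filter evaluated on encoded data: the literal is looked up once in the
   * local dictionary, then rows are matched by comparing ints instead of strings.
   */
  static List<Integer> filterEquals(EncodedBlocklet blocklet, String literal) {
    List<Integer> matchingRows = new ArrayList<>();
    int key = blocklet.dictionary.indexOf(literal);
    if (key < 0) {
      return matchingRows; // value absent from this blocklet, so the whole blocklet can be skipped
    }
    for (int row = 0; row < blocklet.keys.length; row++) {
      if (blocklet.keys[row] == key) {
        matchingRows.add(row);
      }
    }
    return matchingRows;
  }

  public static void main(String[] args) {
    List<String> country = Arrays.asList("IN", "US", "IN", "CN", "US", "IN");
    EncodedBlocklet blocklet = encodeBlocklet(country, 10);
    System.out.println(blocklet.dictionary);                // [IN, US, CN]
    System.out.println(Arrays.toString(blocklet.keys));     // [0, 1, 0, 2, 1, 0]
    System.out.println(filterEquals(blocklet, "US"));       // [1, 4]
    System.out.println(encodeBlocklet(country, 2) == null); // true: 3 distinct values exceed threshold 2
  }
}
```

Each distinct string is stored once per blocklet and rows carry only small integer keys, which is where the store-size and memory savings listed in the proposal come from; the filter resolves the literal against the dictionary once and then compares integers. Deciding the fallback per blocklet, as done here, is one possible design choice assumed for simplicity.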
