Re: [Discussion] Carbon Local Dictionary Support

Kumar Vishal Wed, 06 Jun 2018 03:28:06 -0700

Hi All,

Please ignore the above link.


Please comment here:
https://docs.google.com/document/d/1y0dJSWOr0ZTPpbNOOUfVfU5SoANL5B1F0l7jhl8BgUs/edit?usp=sharing

-Regards
Kumar Vishal

On Wed, Jun 6, 2018 at 3:06 PM, Kumar Vishal <[email protected]>
wrote:

> Hi All,
>
> Due to a problem, the above link is not working. Please find the updated
> link below.
>
> https://drive.google.com/file/d/10LqtQlrE4jeotmleoMLJ8F91rK2TrN2h/view?usp=sharing
>
> -Regards
> Kumar Vishal
>
> On Wed, Jun 6, 2018 at 2:40 PM, Kumar Vishal <[email protected]>
> wrote:
>
>> Hi All,
>>
>> Please find the link to the design document below.
>>
>> https://drive.google.com/file/d/1eqfIms2tMi3b63nMbKfGRZYmo7TMyE1_/view?usp=sharing
>>
>> -Regards
>> Kumar Vishal
>>
>> On Wed, Jun 6, 2018 at 2:25 PM, Kumar Vishal <[email protected]>
>> wrote:
>>
>>> Hi Community,
>>>
>>> Please find attached the Local Dictionary support design document.
>>> Please let me know if any part of the design needs clarification.
>>> Further inputs and improvements are most welcome.
>>>
>>>
>>>
>>> -Regards
>>> Kumar Vishal
>>>
>>> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <[email protected]> wrote:
>>>
>>>> +1
>>>> Good feature to add to CarbonData.
>>>>
>>>> Regards,
>>>> Jacky
>>>>
>>>>
>>>> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <[email protected]> wrote:
>>>> >
>>>> > Hi Community,
>>>> >
>>>> > Currently CarbonData supports either a global dictionary or
>>>> > No-Dictionary (plain text stored in LV, i.e. length-value, format) for
>>>> > storing dimension column data.
>>>> >
>>>> > *Bottlenecks with Global Dictionary*
>>>> >
>>>> >   1. As the dictionary file is mutable, global dictionary cannot be
>>>> >      supported on storage environments that do not support append.
>>>> >   2. When a table has many columns, it is difficult for the user to
>>>> >      decide which columns should be dictionary columns.
>>>> >   3. Global dictionary generation generally slows down the load
>>>> >      process.
>>>> >
>>>> > *Bottlenecks with No-Dictionary*
>>>> >
>>>> >   1. Storage size is high.
>>>> >   2. Queries on No-Dictionary columns are slower, as more data is read
>>>> >      and processed.
>>>> >   3. Filtering on No-Dictionary columns is slower, as the number of
>>>> >      comparisons is high.
>>>> >   4. Memory footprint is high.
>>>> >
>>>> > The above bottlenecks can be solved by *generating a local dictionary
>>>> > for low-cardinality columns at each blocklet level*, which will help to
>>>> > achieve the following benefits:
>>>> >
>>>> >   1. It supports dictionary generation on any storage environment,
>>>> >      irrespective of whether the storage supports append operations on
>>>> >      files.
>>>> >   2. It avoids the extra IO operations for reading/writing the
>>>> >      dictionary files generated in the case of global dictionary.
>>>> >   3. It removes the need for the user to identify dictionary columns
>>>> >      when a table has many columns.
>>>> >   4. It gives better compression on low-cardinality dimension columns.
>>>> >   5. Filter queries on No-Dictionary columns with a local dictionary
>>>> >      will be faster, as filtering is done on the encoded data.
>>>> >   6. It reduces the store size and memory footprint, as only unique
>>>> >      values are stored in the local dictionary and the corresponding
>>>> >      data is stored as encoded values.
>>>> >
>>>> > Please provide your comments. Any suggestions from the community are
>>>> > most welcome. Please let me know if you need any clarification.
>>>> >
>>>> > -Regards
>>>> > Kumar Vishal
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
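
The per-blocklet local dictionary proposed in the quoted mail can be illustrated with a minimal sketch. It assumes a single String column per blocklet and a hypothetical cardinality threshold above which the blocklet falls back to plain No-Dictionary storage. The class and method names (`LocalDictionarySketch`, `encode`, `filterEquals`) are hypothetical and are not CarbonData APIs; the actual encoding, threshold handling and storage layout are described in the linked design document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not CarbonData's implementation.
public class LocalDictionarySketch {

    /** One blocklet's column: unique values stored once, plus an int code per row. */
    static final class EncodedPage {
        final List<String> dictionary; // code -> value, stored once per blocklet
        final int[] codes;             // one dictionary code per row
        EncodedPage(List<String> dictionary, int[] codes) {
            this.dictionary = dictionary;
            this.codes = codes;
        }
    }

    /**
     * Builds a local dictionary for one blocklet's column values.
     * Returns null when the number of distinct values exceeds the (assumed)
     * threshold, i.e. the blocklet would fall back to No-Dictionary storage.
     */
    static EncodedPage encode(List<String> values, int threshold) {
        Map<String, Integer> valueToCode = new HashMap<>();
        List<String> dictionary = new ArrayList<>();
        int[] codes = new int[values.size()];
        for (int row = 0; row < values.size(); row++) {
            String v = values.get(row);
            Integer code = valueToCode.get(v);
            if (code == null) {
                if (dictionary.size() >= threshold) {
                    return null; // cardinality too high for this blocklet
                }
                code = dictionary.size();
                valueToCode.put(v, code);
                dictionary.add(v);
            }
            codes[row] = code;
        }
        return new EncodedPage(dictionary, codes);
    }

    /**
     * Equality filter on encoded data: the literal is resolved to a code once
     * via the blocklet's dictionary, then rows are matched by int comparison.
     */
    static List<Integer> filterEquals(EncodedPage page, String literal) {
        List<Integer> matches = new ArrayList<>();
        int target = page.dictionary.indexOf(literal);
        if (target < 0) {
            return matches; // value absent: the whole blocklet can be skipped
        }
        for (int row = 0; row < page.codes.length; row++) {
            if (page.codes[row] == target) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> city = List.of("Delhi", "Pune", "Delhi", "Mumbai", "Pune", "Delhi");
        EncodedPage page = encode(city, 4);
        System.out.println(page.dictionary);              // [Delhi, Pune, Mumbai]
        System.out.println(Arrays.toString(page.codes));  // [0, 1, 0, 2, 1, 0]
        System.out.println(filterEquals(page, "Delhi"));  // [0, 2, 5]
        System.out.println(encode(city, 2) == null);      // true (fallback)
    }
}
```

Only the three unique strings are stored; the six rows become small integers. A filter compares integers instead of strings and can skip a blocklet outright when the literal is absent from its dictionary, which corresponds to benefits 5 and 6 in the proposal.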
