Re: Dictionary encoding

2018-08-06 Thread Saeid Sattari
Hi Todd,

Thank you for the good description :)

Regards,
Saeid

On Mon, 6 Aug 2018, 21:26 Todd Lipcon wrote:

> Hi Saeid,
>
> It's not based on the number of distinct values, but rather on the
> combined size of the values. I believe the default is 256 KB, so assuming
> your strings are pretty short, a few thousand can likely be
> dict-encoded. Note that dictionaries are calculated per-rowset (a small
> chunk of data), so even if your overall cardinality is much larger, if you
> have some spatial locality such that rows with nearby primary keys have
> fewer distinct values, then you're likely to see a benefit here.
>
> -Todd
>
> On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:
>
>> Hi Kudu community,
>>
>> Does anybody know the maximum number of distinct values in a String
>> column for which Kudu will still set its encoding to Dictionary? Many
>> thanks :)
>>
>> br,
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Dictionary encoding

2018-08-06 Thread Todd Lipcon
Hi Saeid,

It's not based on the number of distinct values, but rather on the combined
size of the values. I believe the default is 256 KB, so assuming your
strings are pretty short, a few thousand can likely be dict-encoded. Note
that dictionaries are calculated per-rowset (a small chunk of data), so even
if your overall cardinality is much larger, if you have some spatial
locality such that rows with nearby primary keys have fewer distinct values,
then you're likely to see a benefit here.

-Todd
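
As a rough illustration of the point above (combined size per rowset, not a
global distinct count), here is a minimal Python sketch. It is not Kudu code:
the function names are hypothetical, the 256 KB budget is only the figure
recalled in the message, and the model simply sums the UTF-8 bytes of the
distinct values while ignoring any per-entry overhead a real dictionary block
would have. Rowsets are approximated as fixed-size chunks of rows in primary
key order, which is also an assumption.

```python
# Assumption: budget taken from the "I believe the default is 256kb" remark.
ASSUMED_BUDGET_BYTES = 256 * 1024


def distinct_bytes(values):
    """Combined UTF-8 size of the distinct strings in `values`."""
    return sum(len(v.encode("utf-8")) for v in set(values))


def fits_in_dictionary(values, budget_bytes=ASSUMED_BUDGET_BYTES):
    """True if the distinct values fit within the assumed size budget."""
    return distinct_bytes(values) <= budget_bytes


def rowsets_that_fit(values_in_key_order, rows_per_rowset,
                     budget_bytes=ASSUMED_BUDGET_BYTES):
    """Split rows (already in primary key order) into fixed-size chunks
    standing in for rowsets, and check each chunk independently."""
    chunks = [
        values_in_key_order[i:i + rows_per_rowset]
        for i in range(0, len(values_in_key_order), rows_per_rowset)
    ]
    return [fits_in_dictionary(chunk, budget_bytes) for chunk in chunks]


# A few thousand short strings: 5,000 distinct values of 20 bytes each.
short_values = [f"value-{i:014d}" for i in range(5000)]

# Higher overall cardinality, but nearby keys share values:
# 200,000 rows, 40,000 distinct 20-byte values, 5 consecutive rows per value.
all_values = [f"value-{i // 5:014d}" for i in range(200_000)]

print(fits_in_dictionary(short_values))
print(fits_in_dictionary(all_values))
print(rowsets_that_fit(all_values, rows_per_rowset=20_000))
```

In this toy model the whole column exceeds the budget, yet every 20,000-row
chunk stays under it because each chunk only sees the values local to its key
range.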

On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:

> Hi Kudu community,
>
> Does anybody know the maximum number of distinct values in a String column
> for which Kudu will still set its encoding to Dictionary? Many thanks
> :)
>
> br,
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Dictionary encoding

2018-08-04 Thread Saeid Sattari
Hi Kudu community,

Does anybody know the maximum number of distinct values in a String column
for which Kudu will still set its encoding to Dictionary? Many thanks
:)

br,