Re: Dictionary encoding
Hi Todd,

Thank you for the good description :)

Regards,
Saeid

On Mon, 6 Aug 2018, 21:26 Todd Lipcon wrote:
> [...]
Re: Dictionary encoding
Hi Saeid,

It's not based on the number of distinct values, but rather on the combined size of the values. I believe the default is 256kb, so assuming your strings are fairly short, a few thousand of them are likely to fit and be dict-encoded.

Note that dictionaries are calculated per rowset (a small chunk of data), so even if your overall cardinality is much larger, you're still likely to benefit if you have some spatial locality, i.e. rows with nearby primary keys have fewer distinct values.

-Todd

On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:
> [...]

--
Todd Lipcon
Software Engineer, Cloudera
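To make the size-based rule above concrete, here is a minimal Python sketch. It is not Kudu's implementation (Kudu is written in C++ and its exact fallback logic and flag names are not shown in this thread); it only assumes what is described above: a byte budget of 256 KB (taken from the "I believe" figure), a dictionary that counts the combined size of its distinct values, and one dictionary per chunk of data. `DictEncoder`, `encode_chunk` and `DICT_BUDGET_BYTES` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical budget, assumed from the "I believe the default is 256kb" figure;
# not read from any Kudu configuration.
DICT_BUDGET_BYTES = 256 * 1024


class DictEncoder:
    """Hypothetical size-based dictionary encoder for one chunk (e.g. one rowset).

    It stops dictionary encoding once the combined byte size of the distinct
    values would exceed the budget; the number of distinct values is not checked.
    """

    def __init__(self, budget: int = DICT_BUDGET_BYTES) -> None:
        self.budget = budget
        self.codes: Dict[str, int] = {}  # distinct value -> dictionary code
        self.dict_bytes = 0              # combined size of the distinct values
        self.encoded: List[int] = []     # encoded column: one code per row
        self.fell_back = False

    def add(self, value: str) -> bool:
        """Encode one value; return False once the dictionary budget is exceeded."""
        if self.fell_back:
            return False
        code = self.codes.get(value)
        if code is None:
            # Only a new distinct value grows the dictionary; repeats are free.
            size = len(value.encode("utf-8"))
            if self.dict_bytes + size > self.budget:
                self.fell_back = True
                return False
            code = len(self.codes)
            self.codes[value] = code
            self.dict_bytes += size
        self.encoded.append(code)
        return True


def encode_chunk(values: List[str], budget: int = DICT_BUDGET_BYTES) -> str:
    """Return which encoding this sketch would end up using for one chunk."""
    enc = DictEncoder(budget)
    for v in values:
        if not enc.add(v):
            return "plain"
    return "dictionary"
```

For example, 5,000 distinct 20-byte strings (about 100 KB) fit in one dictionary, while 20,000 of them (about 400 KB) do not. If those 20,000 values are instead split by key into chunks of 5,000, each chunk gets its own dictionary and every chunk stays dictionary-encoded, which is the per-rowset locality effect described above. Conversely, a few hundred distinct values can still overflow the budget if each one is long.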
Dictionary encoding
Hi Kudu community,

Does anybody know the maximum number of distinct values a String column can have for Kudu to set its encoding to Dictionary? Many thanks :)

br,