Re: Dictionary encoding
Hi Todd,

Thank you for the good description :)

Regards,
Saeid

On Mon, 6 Aug 2018, 21:26 Todd Lipcon, wrote:

> Hi Saeid,
>
> It's not based on the number of distinct values, but rather on the
> combined size of the values. I believe the default is 256kb, so assuming
> your strings are pretty short, a few thousand are likely to be able to be
> dict-encoded. Note that dictionaries are calculated per-rowset (small chunk
> of data) so even if your overall cardinality is much larger, if you have
> some spatial locality such that rows with nearby primary keys have fewer
> distinct values, then you're likely to get benefit here.
>
> -Todd
>
> On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:
>
>> Hi Kudu community,
>>
>> Does anybody know the maximum number of distinct values in a String
>> column for which Kudu will still set its encoding to Dictionary? Many
>> thanks :)
>>
>> br,
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
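To make Todd's point concrete, here is a minimal Python sketch of the idea, not Kudu's actual implementation. It assumes a budget of 256 KB (the figure Todd recalls above), measures the combined UTF-8 size of the distinct values, and models a rowset as a fixed-size run of rows in primary-key order. The names fits_in_dictionary and rowsets_that_fit, and the fixed rowset size, are hypothetical and exist only for illustration.

```python
# Assumed budget, taken from the "256kb" figure quoted in the thread.
DICT_BUDGET_BYTES = 256 * 1024


def fits_in_dictionary(values, budget_bytes=DICT_BUDGET_BYTES):
    """Return True if the combined UTF-8 size of the distinct values fits the budget."""
    total = 0
    for value in set(values):
        total += len(value.encode("utf-8"))
        if total > budget_bytes:
            return False
    return True


def rowsets_that_fit(rows, rowset_size, budget_bytes=DICT_BUDGET_BYTES):
    """Check each chunk of rows separately.

    `rows` is assumed to be in primary-key order; fixed-size chunks stand in
    for rowsets here, which is a simplification.
    """
    return [
        fits_in_dictionary(rows[i:i + rowset_size], budget_bytes)
        for i in range(0, len(rows), rowset_size)
    ]
```

With values clustered by primary key, every chunk can fit the budget even when the column as a whole would not, which is the locality effect Todd describes.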
Dictionary encoding
Hi Kudu community,

Does anybody know the maximum number of distinct values in a String column for which Kudu will still set its encoding to Dictionary? Many thanks :)

br,
Re: Kudu query error
Hi Yao,

Thanks. I will try it and let you know the result.

Regards,
Saeid

On Thu, May 17, 2018 at 4:09 PM, 徐瑶 <ocla...@gmail.com> wrote:

> Hi Saeid,
>
> I think there may have been a long interval between ScanRequests.
> Setting the parameter --scanner_ttl_ms (default 60s) to a larger value
> may solve the problem.
>
> Yao
>
> 2018-05-17 17:36 GMT+08:00 Saeid Sattari <saeid.satt...@gmail.com>:
>
>> Hi all,
>>
>> Lately I have had a critical problem with my Kudu cluster (v1.6). Most
>> of the queries submitted by Impala (even very simple, light ones)
>> regularly hit the following error. There are 120 datanodes in my
>> cluster, and I have changed the following parameters on the datanodes:
>>
>> unlock_experimental_flags=true
>> rpc_service_queue_length=15000
>> rpc_acceptor_listen_backlog=1024
>>
>> The error:
>>
>>> WARNINGS: Unable to advance iterator: Timed out: Scan RPC to
>>> 172.18.77.131:7050 timed out after 0.000s (SENT): Not found: Scanner
>>> not found
>>
>> Any hints and tips are appreciated. Thank you in advance.
>>
>> Best,
>> Saeid
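As a rough illustration of the mechanism Yao suggests, the following Python toy model shows how an idle timeout on server-side scanners can turn a slow client into a "Scanner not found" error. It is a sketch under stated assumptions, not Kudu code: the class ToyScannerRegistry and its methods are hypothetical, and the only detail taken from the thread is the 60s default of --scanner_ttl_ms.

```python
class ToyScannerRegistry:
    """Toy model of a server-side scanner table with an idle TTL (not Kudu code)."""

    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self._last_access_ms = {}  # scanner id -> time of the last request

    def open(self, scanner_id, now_ms):
        """Register a new scanner at time now_ms."""
        self._last_access_ms[scanner_id] = now_ms

    def next_batch(self, scanner_id, now_ms):
        """Serve a follow-up request, or fail if the scanner idled past the TTL."""
        self._expire_idle(now_ms)
        if scanner_id not in self._last_access_ms:
            raise LookupError("Scanner not found")
        self._last_access_ms[scanner_id] = now_ms  # each request resets the idle clock
        return "batch"

    def _expire_idle(self, now_ms):
        """Drop every scanner whose last request is older than the TTL."""
        expired = [
            sid for sid, last in self._last_access_ms.items()
            if now_ms - last > self.ttl_ms
        ]
        for sid in expired:
            del self._last_access_ms[sid]
```

In this model, a client that waits longer than the TTL between two requests loses its scanner, while a larger TTL tolerates the same gap.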
Kudu query error
Hi all,

Lately I have had a critical problem with my Kudu cluster (v1.6). Most of the queries submitted by Impala (even very simple, light ones) regularly hit the following error. There are 120 datanodes in my cluster, and I have changed the following parameters on the datanodes:

unlock_experimental_flags=true
rpc_service_queue_length=15000
rpc_acceptor_listen_backlog=1024

The error:

> WARNINGS: Unable to advance iterator: Timed out: Scan RPC to
> 172.18.77.131:7050 timed out after 0.000s (SENT): Not found: Scanner not
> found

Any hints and tips are appreciated. Thank you in advance.

Best,
Saeid
Fwd: Column Compression and Encoding
Hi all,

Folks who have used column compression and encoding in Kudu tables: can you share your experiences with the performance? Which types of fields are better or worse to compress (IO bottleneck vs. query return time, ...)? We can collect a knowledge base on these subjects that users can draw on in the future. Thanks.

Regards,