Re: Dictionary encoding

2018-08-06 Thread Saeid Sattari
Hi Todd,

Thank you for the good explanation :)

Regards,
Saeid

On Mon, 6 Aug 2018, 21:26 Todd Lipcon wrote:

> Hi Saeid,
>
> The limit is not based on the number of distinct values, but rather on the
> combined size of the values. I believe the default is 256 KB, so assuming
> your strings are fairly short, a few thousand distinct values can likely be
> dict-encoded. Note that dictionaries are built per rowset (a small chunk
> of data), so even if your overall cardinality is much larger, you are
> likely to benefit if you have some spatial locality such that rows with
> nearby primary keys have fewer distinct values.
>
> -Todd
>
> On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:
>
>> Hi Kudu community,
>>
>> Does anybody know the maximum number of distinct values a String column
>> can have for Kudu to set its encoding to Dictionary? Many thanks :)
>>
>> br,
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
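
The point in the reply above (the limit is on the combined size of the distinct values within one rowset, not on their count) can be illustrated with a toy model. This is only a sketch, not Kudu's implementation: the 256 KB budget is the figure Todd recalls in the thread, the function names are hypothetical, and real dictionary blocks carry per-entry overhead that is ignored here.

```python
# Toy model of a per-rowset dictionary size budget (not Kudu's actual code).
# ASSUMPTION: 256 KB budget, taken from the reply in this thread.
DICT_BUDGET_BYTES = 256 * 1024


def fits_in_dictionary(values, budget=DICT_BUDGET_BYTES):
    """Return True if the distinct values' combined UTF-8 size is within budget."""
    seen = set()
    total = 0
    for v in values:
        if v in seen:
            continue  # repeated values cost nothing extra in the dictionary
        seen.add(v)
        total += len(v.encode("utf-8"))
        if total > budget:
            return False
    return True


def rowsets(rows, rows_per_rowset):
    """Split rows (assumed sorted by primary key) into fixed-size chunks."""
    for i in range(0, len(rows), rows_per_rowset):
        yield rows[i:i + rows_per_rowset]


if __name__ == "__main__":
    # 20,000 distinct 20-byte strings: too large for one dictionary overall,
    # but each 5,000-row chunk only sees its own 5,000 distinct values.
    column = [f"value-{i:014d}" for i in range(20000)]
    print(fits_in_dictionary(column))
    print(all(fits_in_dictionary(rs) for rs in rowsets(column, 5000)))
```

In this model a column whose overall distinct values exceed the budget can still be dictionary-encoded chunk by chunk, as long as each chunk's distinct values stay small.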


Dictionary encoding

2018-08-04 Thread Saeid Sattari
Hi Kudu community,

Does anybody know the maximum number of distinct values a String column
can have for Kudu to set its encoding to Dictionary? Many thanks :)

br,


Re: Kudu query error

2018-05-17 Thread Saeid Sattari
Hi Yao,

Thanks. I will try it and let you know about the result.

Regards,
Saeid

On Thu, May 17, 2018 at 4:09 PM, 徐瑶 <ocla...@gmail.com> wrote:

> Hi Saeid,
>
> I think there may have been a long interval between scan requests.
> Setting the parameter --scanner_ttl_ms (default 60s) to a larger value may
> solve the problem.
>
> Yao
>
> 2018-05-17 17:36 GMT+08:00 Saeid Sattari <saeid.satt...@gmail.com>:
>
>> Hi all,
>>
>> Lately, I have had a critical problem with my Kudu cluster (v1.6). Most of
>> the queries submitted by Impala (even very simple and light ones)
>> regularly fail with the following error. There are 120 datanodes in
>> my cluster, and I changed the following parameters on the datanodes:
>>
>> unlock_experimental_flags=true
>> rpc_service_queue_length=15000
>> rpc_acceptor_listen_backlog=1024
>>
>> The error:
>>
>>> WARNINGS: Unable to advance iterator: Timed out: Scan RPC to
>>> 172.18.77.131:7050 timed out after 0.000s (SENT): Not found: Scanner
>>> not found
>>
>>
>> Any hints and tips are appreciated. Thank you in advance.
>>
>> Best,
>> Saeid
>>
>
>
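
Yao's suggestion above can be illustrated with a toy model of server-side scanner expiry. This is a sketch under stated assumptions, not Kudu's implementation: the class and method names are hypothetical, and the only detail taken from the thread is that a scanner left idle longer than --scanner_ttl_ms (60s by default, per the reply) is dropped, after which the next request fails with "Scanner not found".

```python
class ToyScannerManager:
    """Toy model of scanner expiry on a tablet server (hypothetical, not Kudu code)."""

    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self.last_access_ms = {}  # scanner id -> time of last request

    def open(self, scanner_id, now_ms):
        self.last_access_ms[scanner_id] = now_ms

    def next_batch(self, scanner_id, now_ms):
        last = self.last_access_ms.get(scanner_id)
        if last is None or now_ms - last > self.ttl_ms:
            # The scanner was idle longer than the TTL and has been dropped.
            self.last_access_ms.pop(scanner_id, None)
            raise LookupError("Scanner not found")
        self.last_access_ms[scanner_id] = now_ms
        return "batch"


def scan_survives(ttl_ms, request_times_ms):
    """Return True if every request arrives before the scanner expires."""
    mgr = ToyScannerManager(ttl_ms)
    mgr.open("s1", request_times_ms[0])
    try:
        for t in request_times_ms[1:]:
            mgr.next_batch("s1", t)
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    times = [0, 30_000, 100_000]  # a 70s gap between the last two requests
    print(scan_survives(60_000, times))
    print(scan_survives(120_000, times))
```

In this model a client that pauses for 70 seconds between requests loses its scanner with a 60s TTL but keeps it with a 120s TTL, which is the effect raising the flag is meant to have.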


Kudu query error

2018-05-17 Thread Saeid Sattari
Hi all,

Lately, I have had a critical problem with my Kudu cluster (v1.6). Most of the
queries submitted by Impala (even very simple and light ones)
regularly fail with the following error. There are 120 datanodes in
my cluster, and I changed the following parameters on the datanodes:

unlock_experimental_flags=true
rpc_service_queue_length=15000
rpc_acceptor_listen_backlog=1024

The error:

> WARNINGS: Unable to advance iterator: Timed out: Scan RPC to
> 172.18.77.131:7050 timed out after 0.000s (SENT): Not found: Scanner not
> found


Any hints and tips are appreciated. Thank you in advance.

Best,
Saeid


Fwd: Column Compression and Encoding

2018-05-07 Thread Saeid Sattari
Hi all,

Folks who have used column compression and encoding in Kudu tables: can
you share your experiences with the performance? What types of fields are
better or worse to compress (IO bottleneck vs. query return time, etc.)? We
can collect a knowledge base on these subjects that users can refer to in
the future. Thanks.

Regards,