[ 
https://issues.apache.org/jira/browse/KUDU-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296178#comment-16296178
 ] 

Dan Burkert commented on KUDU-2231:
-----------------------------------

I ran into this same issue independently in the past week with a production 
workload.  Moving a column further down in the primary key made this issue crop 
up, and I believe it's responsible for a 5 to 10x slowdown in a fully-cached 
COUNT(*) query with a predicate over 100M+ rows.

> "materializing_iterator_do_pushdown=true" cause simple query slow
> -----------------------------------------------------------------
>
>                 Key: KUDU-2231
>                 URL: https://issues.apache.org/jira/browse/KUDU-2231
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
> KUDU-1.4.0-1.cdh5.12.1.p0.10
> IMPALA 2.6.0
> x86-64 
> Intel CPU
>            Reporter: DawnZhang
>            Assignee: Dan Burkert
>         Attachments: 756ACA6F105F0905EBCB79B940FFCE86.jpg, 
> F8C604537B8E921DDCCA78995DC11BDA.jpg, screenshot-1.png
>
>
> I ran the following SQL repeatedly
> while refreshing the 8050/scans page at the same time.
> SQL:
> {code:sql}
> select count(xx_id),count(yy_id),count(time) from  test_table  where event_id 
> =29983; 
> {code}
> "Cells read from disk" is much greater than the table size when 
> materializing_iterator_do_pushdown = true (the default).
> After setting materializing_iterator_do_pushdown = false, 
> "Cells read from disk" dropped to a reasonable value (close to the table size)
> and the SQL ran faster.
> Here are the details:
> Table under test:
> {code:sql}
> CREATE TABLE rawdata.test_table (
>   day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION 
> DEFAULT_COMPRESSION,
>   time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION 
> DEFAULT_COMPRESSION,
>   event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   ...
>   ...  other fields ...
>   ...
>   PRIMARY KEY (day, user_id, time, _offset)
> )
> PARTITION BY HASH (user_id) PARTITIONS 9
> STORED AS KUDU
> TBLPROPERTIES ( ... );
> {code}
> Table size (select count(1) from test_table): 19510709
> CASE 1, materializing_iterator_do_pushdown = true
> [^756ACA6F105F0905EBCB79B940FFCE86.jpg]
> CASE 2, materializing_iterator_do_pushdown = false (sql ran faster)
> [^F8C604537B8E921DDCCA78995DC11BDA.jpg]
> It looks like Kudu scans the table multiple times for this simple SQL, 
> apparently because of some silly bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
