[ https://issues.apache.org/jira/browse/KUDU-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296178#comment-16296178 ]
Dan Burkert commented on KUDU-2231:
-----------------------------------

I ran into this same issue independently in the past week with a production workload. Moving a column further down in the primary key made this issue crop up, and I believe it's responsible for a 5 to 10x slowdown in a fully-cached COUNT(*) query with a predicate over 100M+ rows.

> "materializing_iterator_do_pushdown=true" cause simple query slow
> -----------------------------------------------------------------
>
>                 Key: KUDU-2231
>                 URL: https://issues.apache.org/jira/browse/KUDU-2231
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
> KUDU-1.4.0-1.cdh5.12.1.p0.10
> IMPALA 2.6.0
> x86-64
> Intel CPU
>            Reporter: DawnZhang
>            Assignee: Dan Burkert
>         Attachments: 756ACA6F105F0905EBCB79B940FFCE86.jpg, F8C604537B8E921DDCCA78995DC11BDA.jpg, screenshot-1.png
>
> I ran the following SQL repeatedly while refreshing the 8050/scans page at the same time.
> sql:
> {code:sql}
> select count(xx_id),count(yy_id),count(time) from test_table where event_id = 29983;
> {code}
> "Cells read from disk" is much greater than the table size when materializing_iterator_do_pushdown = true (the default).
> After setting materializing_iterator_do_pushdown = false, "Cells read from disk" dropped to a reasonable value (close to the table size) and the SQL ran faster.
> Here are the details:
> table under test:
> {code:sql}
> CREATE TABLE rawdata.test_table (
>   day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
>   distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   ...
>   ... other fields ...
>   ...
>   PRIMARY KEY (day, user_id, time, _offset)
> )
> PARTITION BY HASH (user_id) PARTITIONS 9
> STORED AS KUDU
> TBLPROPERTIES ( ... );
> {code}
> table size (select count(1) from test_table): 19510709
> CASE 1, materializing_iterator_do_pushdown = true
> [^756ACA6F105F0905EBCB79B940FFCE86.jpg]
> CASE 2, materializing_iterator_do_pushdown = false (the SQL ran faster)
> [^F8C604537B8E921DDCCA78995DC11BDA.jpg]
> It looks like Kudu scans the table multiple times for this simple SQL, apparently because of a bug.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)