[ https://issues.apache.org/jira/browse/KUDU-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DawnZhang updated KUDU-2231:
----------------------------
    Description:
I ran the following SQL repeatedly while refreshing the 8050/scans page at the same time.

h3. sql:
{code:sql}
select count(xx_id),count(yy_id),count(time) from test_table where event_id =29983;
{code}
"Cells read from disk" is much greater than the table size when materializing_iterator_do_pushdown = true (the default).
After setting materializing_iterator_do_pushdown = false, "Cells read from disk" dropped to a reasonable value (close to the table size) and the SQL ran faster.

Here are the details:

h3. table under test:
{code:sql}
CREATE TABLE rawdata.test_table (
  day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  ...
  ... other fields ...
  ...
  PRIMARY KEY (day, user_id, time, _offset)
)
PARTITION BY HASH (user_id) PARTITIONS 9
STORED AS KUDU
TBLPROPERTIES ( ... );
{code}
table size (select count(1) from test_table) : 19510709

h3. CASE 1, materializing_iterator_do_pushdown = true
scans:
https://issues.apache.org/jira/secure/attachment/12900200/756ACA6F105F0905EBCB79B940FFCE86.jpg

h3. CASE 2, materializing_iterator_do_pushdown = false (sql ran faster)
scans:
https://issues.apache.org/jira/secure/attachment/12900199/F8C604537B8E921DDCCA78995DC11BDA.jpg

It looks like Kudu scans the table multiple times for this simple SQL, apparently because of some silly bug.

  (was: the same text, with !756ACA6F105F0905EBCB79B940FFCE86.jpg|thumbnail! and !F8C604537B8E921DDCCA78995DC11BDA.jpg|thumbnail! embedded after the CASE 1 and CASE 2 scan links, respectively)
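The comparison described above ("Cells read from disk" versus table size) can be expressed as a simple ratio. The following is only a minimal sketch: the function name `read_amplification` is hypothetical, and the sample cell counts are placeholders, not figures from the attached screenshots. The only value taken from the report is the row count, 19510709.

```python
# Row count reported in the issue (select count(1) from test_table).
TABLE_ROWS = 19_510_709


def read_amplification(cells_read: int, table_rows: int = TABLE_ROWS) -> float:
    """Return cells read per table row.

    A value near 1.0 means roughly one full pass over a single column;
    larger values mean more cells were read than the table has rows.
    """
    if table_rows <= 0:
        raise ValueError("table_rows must be positive")
    return cells_read / table_rows


if __name__ == "__main__":
    # Placeholder inputs for illustration only; substitute the
    # "Cells read from disk" value shown on the 8050/scans page.
    samples = {
        "hypothetical scan A": 3 * TABLE_ROWS,
        "hypothetical scan B": TABLE_ROWS,
    }
    for label, cells in samples.items():
        print(f"{label}: {read_amplification(cells):.2f}x table size")
```

Running the same calculation on the values from each CASE would make the difference between the two flag settings directly comparable.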
> "materializing_iterator_do_pushdown=true" cause simple query slow
> -----------------------------------------------------------------
>
>                 Key: KUDU-2231
>                 URL: https://issues.apache.org/jira/browse/KUDU-2231
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
>                      KUDU-1.4.0-1.cdh5.12.1.p0.10
>                      IMPALA 2.6.0
>                      x86-64
>                      Intel CPU
>            Reporter: DawnZhang
>         Attachments: 756ACA6F105F0905EBCB79B940FFCE86.jpg,
>                      F8C604537B8E921DDCCA78995DC11BDA.jpg

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)