DawnZhang created KUDU-2231:
-------------------------------

             Summary: "materializing_iterator_do_pushdown=true" cause simple 
query slow
                 Key: KUDU-2231
                 URL: https://issues.apache.org/jira/browse/KUDU-2231
             Project: Kudu
          Issue Type: Bug
          Components: master, tserver
    Affects Versions: 1.5.0, 1.4.0
         Environment: CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
KUDU-1.4.0-1.cdh5.12.1.p0.10
IMPALA 2.6.0
x86-64 
Intel CPU
            Reporter: DawnZhang
         Attachments: 756ACA6F105F0905EBCB79B940FFCE86.jpg, 
F8C604537B8E921DDCCA78995DC11BDA.jpg

I ran the following SQL repeatedly
while refreshing the 8050/scans page at the same time.

sql:
{code:sql}
select count(xx_id), count(xx_id), count(time) from test_table where event_id = 29983;
{code}

"Cells read from disk" is much greater than the table size when 
materializing_iterator_do_pushdown = true (the default).

After setting materializing_iterator_do_pushdown = false, 
"Cells read from disk" dropped to a reasonable value (close to the table size)
and the SQL ran faster.
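
One way to put a number on this comparison is the ratio of "Cells read from disk" to the cells that a single pass over the scanned columns would need. The following is only a minimal sketch: the function name `read_amplification` and the sample reading are hypothetical, and the sample is a placeholder, not a value taken from the attached screenshots. Only the row count comes from this report.

```python
def read_amplification(cells_read: int, table_rows: int, columns_scanned: int = 1) -> float:
    """Return cells read from disk divided by the cells one full pass would need.

    A result near 1.0 means each scanned column was read roughly once;
    a much larger result suggests the same data was read repeatedly.
    """
    if table_rows <= 0 or columns_scanned <= 0:
        raise ValueError("table_rows and columns_scanned must be positive")
    return cells_read / (table_rows * columns_scanned)


# Row count reported by: select count(1) from test_table
TABLE_ROWS = 19_510_709

# Hypothetical placeholder reading (NOT from the screenshots): three times the row count.
example_cells_read = 3 * TABLE_ROWS

print(read_amplification(example_cells_read, TABLE_ROWS))  # 3.0
```

Substituting the "Cells read from disk" value shown on the scans page for each case would give the actual ratio with pushdown enabled and disabled.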

Here are the details.

Table under test:
{code:sql}
CREATE TABLE rawdata.test_table (
  day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  ...
  ...  other fields ...
  ...
  PRIMARY KEY (day, user_id, time, _offset)
)
PARTITION BY HASH (user_id) PARTITIONS 9
STORED AS KUDU
TBLPROPERTIES ( ... );
{code}

Table size (select count(1) from test_table): 19510709 rows

CASE 1, materializing_iterator_do_pushdown = true

scans:
!756ACA6F105F0905EBCB79B940FFCE86.jpg|thumbnail!

CASE 2, materializing_iterator_do_pushdown = false (the SQL ran faster)

scans:
!F8C604537B8E921DDCCA78995DC11BDA.jpg|thumbnail!

It looks like Kudu scans the table multiple times for this simple query, 
presumably because of a bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
