[ 
https://issues.apache.org/jira/browse/IMPALA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-92:
----------------------------
    Attachment:     (was: like-predicate.cc.patch)

> Significant performance difference between LIKE = 'x' AND = 'x'
> ---------------------------------------------------------------
>
>                 Key: IMPALA-92
>                 URL: https://issues.apache.org/jira/browse/IMPALA-92
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 0.6
>            Reporter: Philip Zeyliger
>            Assignee: Skye Wanderman-Milne
>             Fix For: Impala 0.7
>
>         Attachments: like-predicate.cc.patch
>
>
> I'm running the following two queries.  The only difference between them is 
> I'm using "LIKE" in one case and "=" in another, though there is no "%" in 
> the LIKE, so the effect is the same.  I was surprised to see approximately a 
> 10x difference in performance between them.
> {code}
> Query: select v1, c, count(*) FROM xxx b, yyy a  WHERE a.v1 = b.file AND v5 
> = "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
> Returned 89 row(s) in 10.13s
> Query: select v1, c, count(*) FROM xxx b, yyy a  WHERE a.v1 = b.file AND v5 
> LIKE "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
> Returned 89 row(s) in 93.76s
> {code}
> I'm running
> {code}
> impalad version 0.6 RELEASE (build e675301a90e370f694d700b395a13f0265b7f09c)
> {code}
> I've attached the two query profiles.  The basic difference is in the 
> execution rate:
> {code}
> -    Averaged Fragment 2:(1m27s 0.00%)
> -      completion times: min:1m19s  max:1m32s  mean: 1m28s  stddev:4s545ms
> -      execution rates: min:35.33 MB/sec  max:41.00 MB/sec  mean:37.37 MB/sec  stddev:1.90 MB/sec
> +         - RowsReturnedRate: 9.00 /sec
> +    Averaged Fragment 2:(7s906ms 0.00%)
> +      completion times: min:7s620ms  max:9s495ms  mean: 8s056ms  stddev:653ms
> +      execution rates: min:342.95 MB/sec  max:436.42 MB/sec  mean:409.84 MB/sec  stddev:31.25 MB/sec
> {code}
> Obviously I've fixed my query.
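
The observation above (a LIKE pattern with no wildcard behaves like "=") can be sketched in a few lines. This is only an illustration in Python, not Impala's code and not the contents of the attached patch; the function names (like_to_literal, like_to_regex, like_matches) and the backslash escape character are assumptions made for the example.

```python
import re

def like_to_literal(pattern, escape="\\"):
    """Return the literal a LIKE pattern matches if it contains no
    unescaped wildcard ('%' or '_'); otherwise return None.
    Hypothetical helper for illustration only."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is always taken literally.
            out.append(pattern[i + 1])
            i += 2
            continue
        if ch in "%_":
            return None  # a real wildcard: equality is not enough
        out.append(ch)
        i += 1
    return "".join(out)

def like_to_regex(pattern, escape="\\"):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def like_matches(value, pattern):
    """Evaluate value LIKE pattern, using plain string equality
    when the pattern has no wildcard."""
    literal = like_to_literal(pattern)
    if literal is not None:
        return value == literal  # fast path: same result as '='
    return like_to_regex(pattern).fullmatch(value) is not None
```

With this sketch, like_matches("hostId", "hostId") takes the equality path, while a pattern such as "host%" still goes through the regular expression.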



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
