Csaba Ringhofer created IMPALA-6266:
---------------------------------------

             Summary: Runtime filters should not have non-deterministic 
expression on consumer side
                 Key: IMPALA-6266
                 URL: https://issues.apache.org/jira/browse/IMPALA-6266
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.10.0
            Reporter: Csaba Ringhofer


Non-deterministic expressions (such as rand()) on the consumer side of runtime 
filters are evaluated independently of the "final" join, which gives rows an 
additional chance to be dropped. This means that the same query can return 
fewer or different rows when the runtime filter is used than when it is not.

Example:
use tpch_parquet;

set DISABLE_ROW_RUNTIME_FILTERING=0;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as 
int) = n_nationkey;
result: 9722

set DISABLE_ROW_RUNTIME_FILTERING=1;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as 
int) = n_nationkey;
result: 9803

(rand() is pseudo-random, so running the same query without changing the query 
option always returns the same result.)
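
The effect can be sketched outside Impala with a small simulation. This is a 
rough model only, not Impala's implementation: the function name simulate, 
the uniform key distribution, and the use of an exact set in place of a real 
runtime filter are all assumptions made for illustration. Each probe row 
evaluates the expression once for the filter and once more for the join, so 
a row survives only if both independent evaluations hit a build-side key.

```python
import random


def simulate(num_rows=10000, num_keys=25, seed=0):
    """Count join matches without and with a consumer-side filter.

    Hypothetical model: build-side keys are 0..num_keys-1 (like n_nationkey),
    probe keys are drawn uniformly from the same range (like s_nationkey).
    """
    rng = random.Random(seed)
    build_keys = set(range(num_keys))

    def probe_expr(key):
        # Models: s_nationkey + cast(rand()*2 as int), re-rolled on every call.
        return key + int(rng.random() * 2)

    join_only = 0
    with_filter = 0
    for _ in range(num_rows):
        key = rng.randrange(num_keys)
        # The filter and the join each evaluate the expression separately.
        passes_filter = probe_expr(key) in build_keys
        passes_join = probe_expr(key) in build_keys
        if passes_join:
            join_only += 1
            if passes_filter:
                with_filter += 1
    return join_only, with_filter


if __name__ == "__main__":
    join_only, with_filter = simulate()
    print("join only:    ", join_only)
    print("filter + join:", with_filter)
```

In this model only rows with the largest key can be dropped (when the random 
term pushes them past the last build-side key), and the filter gives those 
rows a second, independent chance to fail, so the filtered count is never 
higher than the join-only count.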

Optimizations like runtime filters should have no effect on the results, even 
in case of non-deterministic expressions.


