I have submitted the following issue:
https://issues.apache.org/jira/browse/IMPALA-10789 

In some cases, performance is better when expressions are materialized early, 
in the ScanNode.
For example:

  SELECT SUM(col), COUNT(col), MIN(col), MAX(col)
  FROM (
    SELECT CAST(regexp_extract(string_col, '(\\d+)', 0) AS bigint) col
    FROM functional_parquet.alltypesagg
  ) t
If the expression is materialized in the ScanNode, it only needs to be 
evaluated once per row, instead of being re-evaluated for each aggregate 
function that references it.
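
To make the saving concrete, here is a small toy model in Python. It is not Impala code: ROWS, CountingExpr, and both aggregate functions are hypothetical names made up for illustration, and it assumes that without early materialization each of the four aggregates evaluates the expression on its own. It also assumes every toy row contains digits.

```python
import re

# Toy values standing in for string_col; not real alltypesagg data.
ROWS = ["a12", "b7", "c305", "d7"]


class CountingExpr:
    """Toy stand-in for CAST(regexp_extract(string_col, '(\\d+)', 0) AS bigint).

    Counts how many times it is evaluated.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, s):
        self.calls += 1
        m = re.search(r"(\d+)", s)
        return int(m.group(0)) if m else None


def aggregate_reevaluating(rows, expr):
    """Assumed baseline: every aggregate evaluates the expression itself."""
    return (
        sum(expr(r) for r in rows),    # SUM(col)
        sum(1 for r in rows if expr(r) is not None),  # COUNT(col)
        min(expr(r) for r in rows),    # MIN(col)
        max(expr(r) for r in rows),    # MAX(col)
    )


def aggregate_materialized(rows, expr):
    """The scan evaluates the expression once per row; aggregates reuse it."""
    col = [expr(r) for r in rows]      # materialized at "scan" time
    return (sum(col), sum(1 for v in col if v is not None), min(col), max(col))


if __name__ == "__main__":
    for fn in (aggregate_reevaluating, aggregate_materialized):
        expr = CountingExpr()
        result = fn(ROWS, expr)
        print(fn.__name__, result, "evaluations:", expr.calls)
```

Both functions return the same aggregates; only the number of expression evaluations differs (four per row versus one per row in this model).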
I have a rough implementation of this feature, and performance improves 
significantly with it.
I would like to discuss whether this feature could be contributed.

Thanks!
Xianqing
