[ 
https://issues.apache.org/jira/browse/IMPALA-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11322:
---------------------------------------
    Description: 
IMPALA-801 adds support for the virtual column INPUT__FILE__NAME.

Memory estimations are a bit off for virtual columns because the default 
estimations are used for them. Virtual columns actually require much less 
memory, e.g. INPUT__FILE__NAME is only allocated once per file per scanner.
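A back-of-the-envelope sketch of the gap described above. It is not Impala planner code: the function names, the row count, the file count and the ~100-byte path length are all hypothetical, and it assumes each file is read by exactly one scanner.

```python
def default_estimate_bytes(num_rows: int, avg_path_len: int) -> int:
    # Default-style costing: the column is treated like an ordinary STRING
    # column, so every row is assumed to carry its own copy of the value.
    return num_rows * avg_path_len


def per_file_estimate_bytes(num_files: int, avg_path_len: int) -> int:
    # Virtual-column costing: INPUT__FILE__NAME is allocated once per file
    # per scanner; here we assume each file is read by exactly one scanner.
    return num_files * avg_path_len


# Hypothetical scan: 10 files, 10,000,000 rows in total, ~100-byte paths.
rows, files, path_len = 10_000_000, 10, 100
print(default_estimate_bytes(rows, path_len))    # 1000000000
print(per_file_estimate_bytes(files, path_len))  # 1000
```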

Cardinality estimations are also off. This can be problematic if users do 
GROUP BY on virtual columns, because we currently overestimate their 
cardinality, which leads to excessively high memory estimations. As a 
workaround, users can set MEM_LIMIT.
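A minimal sketch of why a GROUP BY on INPUT__FILE__NAME can be bounded much more tightly than a generic column, assuming the number of scanned files is known at planning time. The function names and the 150-byte per-group footprint are hypothetical illustrations, and "one group per input row" is only used here as a pessimistic stand-in for an unbounded estimate, not as a description of Impala's actual default.

```python
from typing import Optional


def groupby_cardinality(input_rows: int, num_files: Optional[int]) -> int:
    """Estimated number of groups for GROUP BY INPUT__FILE__NAME."""
    if num_files is None:
        # No virtual-column-specific rule: pessimistically assume every
        # input row may form its own group.
        return input_rows
    # Every distinct value is a file path, so there cannot be more groups
    # than files scanned, nor more groups than input rows.
    return min(input_rows, num_files)


def agg_memory_bytes(num_groups: int, bytes_per_group: int) -> int:
    # Rough hash-table size: one entry per estimated group.
    return num_groups * bytes_per_group


rows, files = 10_000_000, 10
print(agg_memory_bytes(groupby_cardinality(rows, None), 150))   # 1500000000
print(agg_memory_bytes(groupby_cardinality(rows, files), 150))  # 1500
```

The `min` keeps the bound safe in the degenerate case where a table has more files than rows (e.g. many empty files).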

  was:
IMPALA-801 adds support for virtual column INPUT__FILE__NAME.

Though memory estimations are a bit off for virtual columns because the default 
estimations are used. Virtual columns actually require much less memory. E.g. 
INPUT__FILE__NAME is only allocated per file per scanner.

Also cardinality estimations are off. These could be problematic if users do 
GROUP BY based on virtual columns, because currently we overestimate the 
cardinality. This causes too high memory estimations. As a workaround, users 
can set MEM_LIMIT.


> Add proper estimations for virtual columns
> ------------------------------------------
>
>                 Key: IMPALA-11322
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11322
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> IMPALA-801 adds support for the virtual column INPUT__FILE__NAME.
>
> Memory estimations are a bit off for virtual columns because the default 
> estimations are used for them. Virtual columns actually require much less 
> memory, e.g. INPUT__FILE__NAME is only allocated once per file per scanner.
>
> Cardinality estimations are also off. This can be problematic if users do 
> GROUP BY on virtual columns, because we currently overestimate their 
> cardinality, which leads to excessively high memory estimations. As a 
> workaround, users can set MEM_LIMIT.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
