[ 
https://issues.apache.org/jira/browse/HIVE-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27080:
---------------------------------------
    Attachment: jdbc_project_pushdown.q

> Support project pushdown in JDBC storage handler even when filters are not 
> pushed
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-27080
>                 URL: https://issues.apache.org/jira/browse/HIVE-27080
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>         Attachments: jdbc_project_pushdown.q
>
>
> {code:sql}
> CREATE EXTERNAL TABLE book
> (
>     id int,
>     title varchar(20),
>     author int
> )
> STORED BY                                          
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (                                    
>     "hive.sql.database.type" = "POSTGRES",
>     "hive.sql.jdbc.driver" = "org.postgresql.Driver",
>     "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
>     "hive.sql.dbcp.username" = "qtestuser",
>     "hive.sql.dbcp.password" = "qtestpassword",
>     "hive.sql.table" = "book"
> );
> {code}
> {code:sql}
> explain cbo select id from book where title = 'Les Miserables';
> {code}
> {noformat}
> CBO PLAN:
> HiveJdbcConverter(convention=[JDBC.POSTGRES])
>   JdbcProject(id=[$0])
>     JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')])
>       JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Good case:+ Only the id column is fetched from the underlying database (see 
> JdbcProject) since it is necessary for the result.
> {code:sql}
> explain cbo select id from book where UPPER(title) = 'LES MISERABLES';
> {code}
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE", _UTF-16LE'LES MISERABLES')])
>     HiveProject(id=[$0], title=[$1], author=[$2])
>       HiveJdbcConverter(convention=[JDBC.POSTGRES])
>         JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Bad case:+ All table columns are fetched from the database although only id 
> and title are necessary; id is the result so cannot be dropped and title is 
> needed for HiveFilter since the UPPER operation was not pushed in the DBMS. 
> The author column is not needed at all so the plan should have a JdbcProject 
> with id, and title, on top of the JdbcHiveTableScan.
> Although it doesn't seem a big deal in some cases tables are pretty wide 
> (more than 100 columns) while the queries rarely return all of them. 
> Improving project pushdown to handle such cases can give a major performance 
> boost.
> Pushing the filter with UPPER to JDBC storage handler is also a relevant 
> improvement but this should be tracked under another ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to