[ 
https://issues.apache.org/jira/browse/HIVE-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27080:
---------------------------------------
    Description: 
{code:sql}
CREATE EXTERNAL TABLE book
(
    id int,
    title varchar(20),
    author int
)
STORED BY                                          
'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (                                    
    "hive.sql.database.type" = "POSTGRES",
    "hive.sql.jdbc.driver" = "org.postgresql.Driver",
    "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
    "hive.sql.dbcp.username" = "qtestuser",
    "hive.sql.dbcp.password" = "qtestpassword",
    "hive.sql.table" = "book"
);
{code}
{code:sql}
explain cbo select id from book where title = 'Les Miserables';
{code}
{noformat}
CBO PLAN:
HiveJdbcConverter(convention=[JDBC.POSTGRES])
  JdbcProject(id=[$0])
    JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')])
      JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
{noformat}
+Good case:+ Only the id column is fetched from the underlying database (see 
JdbcProject) since it is necessary for the result.
{code:sql}
explain cbo select id from book where UPPER(title) = 'LES MISERABLES';
{code}
{noformat}
CBO PLAN:
HiveProject(id=[$0])
  HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE", _UTF-16LE'LES MISERABLES')])
    HiveProject(id=[$0], title=[$1], author=[$2])
      HiveJdbcConverter(convention=[JDBC.POSTGRES])
        JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
{noformat}
+Bad case:+ All table columns are fetched from the database although only id 
and title are necessary; id is the result so cannot be dropped and title is 
needed for HiveFilter since the UPPER operation was not pushed in the DBMS. The 
author column is not needed at all so the plan should have a JdbcProject with 
id, and title, on top of the JdbcHiveTableScan.

Although it doesn't seem a big deal in some cases tables are pretty wide (more 
than 100 columns) while the queries rarely return all of them. Improving 
project pushdown to handle such cases can give a major performance boost.

Pushing the filter with UPPER to JDBC storage handler is also a relevant 
improvement but this should be tracked under another ticket.

The problem can be reproduced by running:
{noformat}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_project_pushdown.q 
-Dtest.output.overwrite
{noformat}

  was:
{code:sql}
CREATE EXTERNAL TABLE book
(
    id int,
    title varchar(20),
    author int
)
STORED BY                                          
'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (                                    
    "hive.sql.database.type" = "POSTGRES",
    "hive.sql.jdbc.driver" = "org.postgresql.Driver",
    "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
    "hive.sql.dbcp.username" = "qtestuser",
    "hive.sql.dbcp.password" = "qtestpassword",
    "hive.sql.table" = "book"
);
{code}
{code:sql}
explain cbo select id from book where title = 'Les Miserables';
{code}
{noformat}
CBO PLAN:
HiveJdbcConverter(convention=[JDBC.POSTGRES])
  JdbcProject(id=[$0])
    JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')])
      JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
{noformat}
+Good case:+ Only the id column is fetched from the underlying database (see 
JdbcProject) since it is necessary for the result.
{code:sql}
explain cbo select id from book where UPPER(title) = 'LES MISERABLES';
{code}
{noformat}
CBO PLAN:
HiveProject(id=[$0])
  HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE", _UTF-16LE'LES MISERABLES')])
    HiveProject(id=[$0], title=[$1], author=[$2])
      HiveJdbcConverter(convention=[JDBC.POSTGRES])
        JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
{noformat}
+Bad case:+ All table columns are fetched from the database although only id 
and title are necessary; id is the result so cannot be dropped and title is 
needed for HiveFilter since the UPPER operation was not pushed in the DBMS. The 
author column is not needed at all so the plan should have a JdbcProject with 
id, and title, on top of the JdbcHiveTableScan.

Although it doesn't seem a big deal in some cases tables are pretty wide (more 
than 100 columns) while the queries rarely return all of them. Improving 
project pushdown to handle such cases can give a major performance boost.

Pushing the filter with UPPER to JDBC storage handler is also a relevant 
improvement but this should be tracked under another ticket.


> Support project pushdown in JDBC storage handler even when filters are not 
> pushed
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-27080
>                 URL: https://issues.apache.org/jira/browse/HIVE-27080
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>         Attachments: jdbc_project_pushdown.q
>
>
> {code:sql}
> CREATE EXTERNAL TABLE book
> (
>     id int,
>     title varchar(20),
>     author int
> )
> STORED BY                                          
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (                                    
>     "hive.sql.database.type" = "POSTGRES",
>     "hive.sql.jdbc.driver" = "org.postgresql.Driver",
>     "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
>     "hive.sql.dbcp.username" = "qtestuser",
>     "hive.sql.dbcp.password" = "qtestpassword",
>     "hive.sql.table" = "book"
> );
> {code}
> {code:sql}
> explain cbo select id from book where title = 'Les Miserables';
> {code}
> {noformat}
> CBO PLAN:
> HiveJdbcConverter(convention=[JDBC.POSTGRES])
>   JdbcProject(id=[$0])
>     JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')])
>       JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Good case:+ Only the id column is fetched from the underlying database (see 
> JdbcProject) since it is necessary for the result.
> {code:sql}
> explain cbo select id from book where UPPER(title) = 'LES MISERABLES';
> {code}
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE", _UTF-16LE'LES MISERABLES')])
>     HiveProject(id=[$0], title=[$1], author=[$2])
>       HiveJdbcConverter(convention=[JDBC.POSTGRES])
>         JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Bad case:+ All table columns are fetched from the database although only id 
> and title are necessary; id is the result so cannot be dropped and title is 
> needed for HiveFilter since the UPPER operation was not pushed in the DBMS. 
> The author column is not needed at all so the plan should have a JdbcProject 
> with id, and title, on top of the JdbcHiveTableScan.
> Although it doesn't seem a big deal in some cases tables are pretty wide 
> (more than 100 columns) while the queries rarely return all of them. 
> Improving project pushdown to handle such cases can give a major performance 
> boost.
> Pushing the filter with UPPER to JDBC storage handler is also a relevant 
> improvement but this should be tracked under another ticket.
> The problem can be reproduced by running:
> {noformat}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_project_pushdown.q 
> -Dtest.output.overwrite
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to