[ 
https://issues.apache.org/jira/browse/CALCITE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

itxiangkui updated CALCITE-5190:
--------------------------------
    Affects Version/s: 1.30.0

> TPC-H testing of the Calcite framework
> --------------------------------------
>
>                 Key: CALCITE-5190
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5190
>             Project: Calcite
>          Issue Type: Wish
>    Affects Versions: 1.30.0
>            Reporter: itxiangkui
>            Priority: Major
>
> What our project is doing:
> 1. Designed a time series database, made full use of the enumerable adapter, 
> and made a horizontal comparison of benchmarks with other popular time 
> databases
> 2. Design catalog->database->table, and use the adapter function of jdbc to 
> connect multiple databases such as Mysql/Es in the backend to solve the join 
> problem of heterogeneous data (for example: select * from catalog1.database 
> .table1 ,catalog2,database.table2 .. where t1.c1='xx' )
> But when we tested it, we found that, as far as Jdbc-Adapter is concerned, 
> Calcite will disassemble the logical SQL into very fragmented query plan SQL, 
> and hand it over to the underlying database instance to execute the specific 
> physical SQL. Most of the SQL has no filter conditions to push down and the 
> physical SQL looks very clumsy.
> So we did a version of the TPC-H test (an industry standard in the database 
> field), taking SQL-19 as an example:
> {code:sql}
> SELECT sum(l_extendedprice* (1 - l_discount)) AS revenue
> FROM lineitem,
>      part
> WHERE (p_partkey = l_partkey
>        AND p_brand = 'Brand#52'
>        AND p_container in ('SM CASE',
>                            'SM BOX',
>                            'SM PACK',
>                            'SM PKG')
>        AND l_quantity >= 4
>        AND l_quantity <= 4 + 10
>        AND p_size BETWEEN 1 AND 5
>        AND l_shipmode in ('AIR',
>                           'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>   OR (p_partkey = l_partkey
>       AND p_brand = 'Brand#11'
>       AND p_container in ('MED BAG',
>                           'MED BOX',
>                           'MED PKG',
>                           'MED PACK')
>       AND l_quantity >= 18
>       AND l_quantity <= 18 + 10
>       AND p_size BETWEEN 1 AND 10
>       AND l_shipmode in ('AIR',
>                          'AIR REG')
>       AND l_shipinstruct = 'DELIVER IN PERSON')
>   OR (p_partkey = l_partkey
>       AND p_brand = 'Brand#51'
>       AND p_container in ('LG CASE',
>                           'LG BOX',
>                           'LG PACK',
>                           'LG PKG')
>       AND l_quantity >= 29
>       AND l_quantity <= 29 + 10
>       AND p_size BETWEEN 1 AND 15
>       AND l_shipmode in ('AIR',
>                          'AIR REG')
>       AND l_shipinstruct = 'DELIVER IN PERSON');
> {code}
> The physical execution plan is:
> {code:sql}
> [SELECT *
> FROM `tpch`.`lineitem`
> LIMIT 3]
> [SELECT `L_PARTKEY`, `L_QUANTITY`, `L_EXTENDEDPRICE`, `L_DISCOUNT`, 
> `L_SHIPINSTRUCT`, `L_SHIPMODE`
> FROM `tpch`.`lineitem`]
> [SELECT `P_PARTKEY`, `P_BRAND`, `P_SIZE`, `P_CONTAINER`
> FROM `tpch`.`part`]
> {code}
> this looks incredible...
>  
> My question is: Has Calcite ever tested something like TPC-H.
> In theory, for a specific database instance, the query speed after using 
> calcite will not drop much performance compared to the original database, so 
> many users are happy to use Calcite to solve the problem of data islands. …



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to