[ https://issues.apache.org/jira/browse/CALCITE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
itxiangkui updated CALCITE-5190:
--------------------------------
    Affects Version/s: 1.30.0

> TPC-H testing of the Calcite framework
> --------------------------------------
>
>                 Key: CALCITE-5190
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5190
>             Project: Calcite
>          Issue Type: Wish
>    Affects Versions: 1.30.0
>            Reporter: itxiangkui
>            Priority: Major
>
> What our project is doing:
> 1. We designed a time-series database that makes full use of the
> enumerable adapter, and benchmarked it against other popular time-series
> databases.
> 2. We designed a catalog -> database -> table hierarchy and use the JDBC
> adapter to connect multiple backend databases, such as MySQL and
> Elasticsearch (ES), to solve the problem of joining heterogeneous data (for
> example: select * from catalog1.database.table1 t1, catalog2.database.table2 t2 ... where t1.c1 = 'xx').
> But in testing we found that, with the JDBC adapter, Calcite splits the
> logical SQL into highly fragmented per-source queries and hands each one to
> the underlying database instance to execute as physical SQL. Most of these
> queries have no filter conditions pushed down, and the physical SQL looks
> very clumsy.
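> To illustrate what pushing filters down could mean for TPC-H Q19 (the query
> below), here is a hand-written sketch, not Calcite output, of the per-source
> SQL one might hope the JDBC adapter would generate. It assumes a rewrite that
> sends each source (a) the conjuncts common to all three OR branches and (b) the
> OR of that table's own predicates from each branch. Both are implied by the
> original WHERE clause, so they can be applied at the source without changing
> the result. Column quoting mirrors the MySQL-style physical SQL shown below.
> {code:sql}
> -- Sketch only: hypothetical pushed-down query for the lineitem source.
> -- Common conjuncts (shipmode, shipinstruct) plus the OR of the three
> -- lineitem-only quantity ranges (4..14, 18..28, 29..39).
> SELECT `L_PARTKEY`, `L_QUANTITY`, `L_EXTENDEDPRICE`, `L_DISCOUNT`,
>        `L_SHIPINSTRUCT`, `L_SHIPMODE`
> FROM `tpch`.`lineitem`
> WHERE `L_SHIPMODE` IN ('AIR', 'AIR REG')
>   AND `L_SHIPINSTRUCT` = 'DELIVER IN PERSON'
>   AND ((`L_QUANTITY` >= 4 AND `L_QUANTITY` <= 14)
>     OR (`L_QUANTITY` >= 18 AND `L_QUANTITY` <= 28)
>     OR (`L_QUANTITY` >= 29 AND `L_QUANTITY` <= 39));
>
> -- Sketch only: hypothetical pushed-down query for the part source.
> -- OR of the part-only conditions (brand, container, size) of each branch.
> SELECT `P_PARTKEY`, `P_BRAND`, `P_SIZE`, `P_CONTAINER`
> FROM `tpch`.`part`
> WHERE (`P_BRAND` = 'Brand#52'
>        AND `P_CONTAINER` IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
>        AND `P_SIZE` BETWEEN 1 AND 5)
>    OR (`P_BRAND` = 'Brand#11'
>        AND `P_CONTAINER` IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
>        AND `P_SIZE` BETWEEN 1 AND 10)
>    OR (`P_BRAND` = 'Brand#51'
>        AND `P_CONTAINER` IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
>        AND `P_SIZE` BETWEEN 1 AND 15);
> {code}
> These filters are weaker than the original predicate, so the join on
> p_partkey = l_partkey and the full three-branch OR must still be evaluated
> after the join. For example, a Brand#52 part and a lineitem row with
> l_quantity = 20 pass both source filters but match no branch.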
> So we ran a version of the TPC-H benchmark (an industry standard in the
> database field), taking query 19 (Q19) as an example:
> {code:sql}
> SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
> FROM lineitem,
>      part
> WHERE (p_partkey = l_partkey
>        AND p_brand = 'Brand#52'
>        AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
>        AND l_quantity >= 4
>        AND l_quantity <= 4 + 10
>        AND p_size BETWEEN 1 AND 5
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>    OR (p_partkey = l_partkey
>        AND p_brand = 'Brand#11'
>        AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
>        AND l_quantity >= 18
>        AND l_quantity <= 18 + 10
>        AND p_size BETWEEN 1 AND 10
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>    OR (p_partkey = l_partkey
>        AND p_brand = 'Brand#51'
>        AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
>        AND l_quantity >= 29
>        AND l_quantity <= 29 + 10
>        AND p_size BETWEEN 1 AND 15
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON');
> {code}
> The physical execution plan is:
> {code:sql}
> [SELECT *
> FROM `tpch`.`lineitem`
> LIMIT 3]
> [SELECT `L_PARTKEY`, `L_QUANTITY`, `L_EXTENDEDPRICE`, `L_DISCOUNT`,
> `L_SHIPINSTRUCT`, `L_SHIPMODE`
> FROM `tpch`.`lineitem`]
> [SELECT `P_PARTKEY`, `P_BRAND`, `P_SIZE`, `P_CONTAINER`
> FROM `tpch`.`part`]
> {code}
> This is hard to believe: none of the Q19 predicates reaches either source,
> not even l_shipinstruct = 'DELIVER IN PERSON', which appears in every OR
> branch...
>
> My question is: has Calcite ever been tested with a benchmark such as TPC-H?
> In theory, for a given database instance, queries issued through Calcite
> should not be much slower than the same queries run directly on that
> database, which is why many users are happy to use Calcite to solve the
> problem of data islands (data silos). …



--
This message was sent by Atlassian Jira
(v8.20.7#820007)