Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/593#issuecomment-111736855
Here is the simple evaluation result.
### Query
```
select
count(*)
from
lineitem left outer join orders
on l_orderkey = o_orderkey
left outer join partsupp
on ps_suppkey = o_custkey
left outer join customer
on ps_suppkey = c_custkey
left outer join part
on p_partkey = c_nationkey
```
### Data
* TPC-H of scale factor 100
### Cluster
* One master and 3 workers
* Each worker equips 4 cores, 8 GB memory, and 2 HDDs.
### Performance comparison
* Elapsed time
* Before patch: 25 mins, 23 sec
* After patch: 19 mins, 59 sec
* Performance improvement ratio: about 20%
### Query plan
The query execution time is reduced due to the improved query plan as
follows.
#### Before patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem â tpch100.orders)
â tpch100.partsupp) â tpch100.customer) â tpch100.part) (cost:
1.0447965953264456E46)
> Optimized join order : ((((tpch100.lineitem â tpch100.orders)
â tpch100.partsupp) â tpch100.customer) â tpch100.part) (cost:
1.0447965859707236E46)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) =
tpch100.customer.c_nationkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.customer.c_nationkey (INT8),
tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8),
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4),
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8),
tpch100.part.p_comment (TEXT)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.customer.c_custkey (INT8)
=> target list: tpch100.customer.c_nationkey (INT8)
=> out schema: {(1) tpch100.customer.c_nationkey (INT8)}
=> in schema: {(3) tpch100.partsupp.ps_suppkey (INT8),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_nationkey (INT8),
tpch100.customer.c_custkey (INT8)
=> out schema: {(2) tpch100.customer.c_nationkey (INT8),
tpch100.customer.c_custkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT),
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT),
tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(2) tpch100.orders.o_custkey (INT8),
tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4),
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) =
tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.orders.o_custkey (INT8)
=> out schema: {(1) tpch100.orders.o_custkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_custkey (INT8),
tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.orders.o_custkey (INT8),
tpch100.orders.o_orderkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT),
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE),
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT),
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8),
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8),
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8),
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode
(TEXT), tpch100.lineitem.l_comment (TEXT)}
```
#### After patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem â tpch100.orders)
â tpch100.partsupp) â tpch100.customer) â tpch100.part) (cost:
7.933924122078524E46)
> Optimized join order : ((tpch100.lineitem â (tpch100.orders â
tpch100.partsupp)) â (tpch100.customer â tpch100.part)) (cost:
4.016549062824562E47)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.customer.c_custkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.partsupp.ps_suppkey (INT8),
tpch100.customer.c_custkey (INT8)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) =
tpch100.customer.c_nationkey (INT8)
=> target list: tpch100.customer.c_custkey (INT8)
=> out schema: {(1) tpch100.customer.c_custkey (INT8)}
=> in schema: {(3) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8),
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4),
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8),
tpch100.part.p_comment (TEXT)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8)
=> out schema: {(2) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT),
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT),
tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) =
tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8),
tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.partsupp.ps_suppkey (INT8),
tpch100.orders.o_orderkey (INT8)}
=> in schema: {(3) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4),
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8)
=> out schema: {(2) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT),
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE),
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT),
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8),
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8),
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8),
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode
(TEXT), tpch100.lineitem.l_comment (TEXT)}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---