Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/593#issuecomment-111736855
  
    Here is the simple evaluation result.
    
    ### Query
    ```
    select 
      count(*) 
    from 
      lineitem left outer join orders 
        on l_orderkey = o_orderkey 
      left outer join partsupp 
        on ps_suppkey = o_custkey 
      left outer join customer 
        on ps_suppkey = c_custkey 
      left outer join part 
        on p_partkey = c_nationkey
    ```
    
    ### Data
    * TPC-H of scale factor 100
    
    ### Cluster
    * One master and 3 workers
    * Each worker equips 4 cores, 8 GB memory, and 2 HDDs.
    
    ### Performance comparison
    * Elapsed time
     * Before patch: 25 mins, 23 sec
     * After patch: 19 mins, 59 sec
    * Performance improvement ratio: about 20%
    
    ### Query plan
    The query execution time is reduced due to the improved query plan as 
follows.
    #### Before patch
    ```
    -----------------------------
    Query Block Graph
    -----------------------------
    |-#ROOT
    -----------------------------
    Optimization Log:
    [LogicalPlan]
        > ProjectionNode is eliminated.
    [#ROOT]
        > Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) 
⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
1.0447965953264456E46)
        > Optimized join order    : ((((tpch100.lineitem ⟕ tpch100.orders) 
⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
1.0447965859707236E46)
    -----------------------------
    
    GROUP_BY(10)()
      => exprs: (count())
      => target list: ?count (INT8)
      => out schema:{(1) ?count (INT8)}
      => in schema:{(0) }
       JOIN(15)(LEFT_OUTER)
         => Join Cond: tpch100.part.p_partkey (INT8) = 
tpch100.customer.c_nationkey (INT8)
         => target list: 
         => out schema: {(0) }
         => in schema: {(2) tpch100.customer.c_nationkey (INT8), 
tpch100.part.p_partkey (INT8)}
          SCAN(7) on tpch100.part
            => target list: tpch100.part.p_partkey (INT8)
            => out schema: {(1) tpch100.part.p_partkey (INT8)}
            => in schema: {(9) tpch100.part.p_partkey (INT8), 
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand 
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), 
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), 
tpch100.part.p_comment (TEXT)}
          JOIN(14)(LEFT_OUTER)
            => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.customer.c_custkey (INT8)
            => target list: tpch100.customer.c_nationkey (INT8)
            => out schema: {(1) tpch100.customer.c_nationkey (INT8)}
            => in schema: {(3) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
             SCAN(5) on tpch100.customer
               => target list: tpch100.customer.c_nationkey (INT8), 
tpch100.customer.c_custkey (INT8)
               => out schema: {(2) tpch100.customer.c_nationkey (INT8), 
tpch100.customer.c_custkey (INT8)}
               => in schema: {(8) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), 
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), 
tpch100.customer.c_comment (TEXT)}
             JOIN(13)(LEFT_OUTER)
               => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.orders.o_custkey (INT8)
               => target list: tpch100.partsupp.ps_suppkey (INT8)
               => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
               => in schema: {(2) tpch100.orders.o_custkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8)}
                SCAN(3) on tpch100.partsupp
                  => target list: tpch100.partsupp.ps_suppkey (INT8)
                  => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
                  => in schema: {(5) tpch100.partsupp.ps_partkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), 
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
                JOIN(12)(LEFT_OUTER)
                  => Join Cond: tpch100.lineitem.l_orderkey (INT8) = 
tpch100.orders.o_orderkey (INT8)
                  => target list: tpch100.orders.o_custkey (INT8)
                  => out schema: {(1) tpch100.orders.o_custkey (INT8)}
                  => in schema: {(3) tpch100.lineitem.l_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
                   SCAN(1) on tpch100.orders
                     => target list: tpch100.orders.o_custkey (INT8), 
tpch100.orders.o_orderkey (INT8)
                     => out schema: {(2) tpch100.orders.o_custkey (INT8), 
tpch100.orders.o_orderkey (INT8)}
                     => in schema: {(9) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), 
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), 
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), 
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
                   SCAN(0) on tpch100.lineitem
                     => target list: tpch100.lineitem.l_orderkey (INT8)
                     => out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
                     => in schema: {(16) tpch100.lineitem.l_orderkey (INT8), 
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), 
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), 
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount 
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag 
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate 
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate 
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode 
(TEXT), tpch100.lineitem.l_comment (TEXT)}
    ```
    
    #### After patch
    ```
    -----------------------------
    Query Block Graph
    -----------------------------
    |-#ROOT
    -----------------------------
    Optimization Log:
    [LogicalPlan]
        > ProjectionNode is eliminated.
    [#ROOT]
        > Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) 
⟕ tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
7.933924122078524E46)
        > Optimized join order    : ((tpch100.lineitem ⟕ (tpch100.orders ⟕ 
tpch100.partsupp)) ⟕ (tpch100.customer ⟕ tpch100.part)) (cost: 
4.016549062824562E47)
    -----------------------------
    
    GROUP_BY(10)()
      => exprs: (count())
      => target list: ?count (INT8)
      => out schema:{(1) ?count (INT8)}
      => in schema:{(0) }
       JOIN(15)(LEFT_OUTER)
         => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.customer.c_custkey (INT8)
         => target list: 
         => out schema: {(0) }
         => in schema: {(2) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.customer.c_custkey (INT8)}
          JOIN(14)(LEFT_OUTER)
            => Join Cond: tpch100.part.p_partkey (INT8) = 
tpch100.customer.c_nationkey (INT8)
            => target list: tpch100.customer.c_custkey (INT8)
            => out schema: {(1) tpch100.customer.c_custkey (INT8)}
            => in schema: {(3) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
             SCAN(7) on tpch100.part
               => target list: tpch100.part.p_partkey (INT8)
               => out schema: {(1) tpch100.part.p_partkey (INT8)}
               => in schema: {(9) tpch100.part.p_partkey (INT8), 
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand 
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), 
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), 
tpch100.part.p_comment (TEXT)}
             SCAN(5) on tpch100.customer
               => target list: tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8)
               => out schema: {(2) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8)}
               => in schema: {(8) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), 
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), 
tpch100.customer.c_comment (TEXT)}
          JOIN(13)(LEFT_OUTER)
            => Join Cond: tpch100.lineitem.l_orderkey (INT8) = 
tpch100.orders.o_orderkey (INT8)
            => target list: tpch100.partsupp.ps_suppkey (INT8)
            => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
            => in schema: {(3) tpch100.lineitem.l_orderkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
             JOIN(12)(LEFT_OUTER)
               => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.orders.o_custkey (INT8)
               => target list: tpch100.partsupp.ps_suppkey (INT8), 
tpch100.orders.o_orderkey (INT8)
               => out schema: {(2) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.orders.o_orderkey (INT8)}
               => in schema: {(3) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
                SCAN(3) on tpch100.partsupp
                  => target list: tpch100.partsupp.ps_suppkey (INT8)
                  => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
                  => in schema: {(5) tpch100.partsupp.ps_partkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), 
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
                SCAN(1) on tpch100.orders
                  => target list: tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8)
                  => out schema: {(2) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8)}
                  => in schema: {(9) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), 
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), 
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), 
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
             SCAN(0) on tpch100.lineitem
               => target list: tpch100.lineitem.l_orderkey (INT8)
               => out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
               => in schema: {(16) tpch100.lineitem.l_orderkey (INT8), 
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), 
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), 
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount 
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag 
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate 
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate 
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode 
(TEXT), tpch100.lineitem.l_comment (TEXT)}
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to