[GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.

jianqiao Mon, 24 Oct 2016 00:32:42 -0700

GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/122


    Add backend support for LIPFilters.

    This PR follows #113 and #118 and adds backend support for LIPFilters.
    - `BuildHashOperator` supports building of LIPFilters.
    - `SelectOperator`, `HashJoinOperator` and `AggregateOperator` support 
probing of LIPFilters. 
    
    For `SelectOperator` and `AggregateOperator`, if a filter predicate is 
present, the LIPFilters are applied AFTER the filter predicate.
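
    A minimal sketch of this ordering, not the actual Quickstep code. `SimpleBitFilter`, `Row` and `selectThenProbe` are hypothetical names used only for illustration, and the single-hash bit vector merely stands in for a real LIPFilter. The point is that the (exact) filter predicate runs first, so the approximate probe only touches rows that survived it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a LIPFilter: a single-hash, Bloom-style bit vector.
// It may report false positives but never false negatives.
class SimpleBitFilter {
 public:
  // num_bits must be greater than zero.
  explicit SimpleBitFilter(std::size_t num_bits) : bits_(num_bits, false) {}

  void insert(std::int64_t key) { bits_[slot(key)] = true; }
  bool mayContain(std::int64_t key) const { return bits_[slot(key)]; }

 private:
  std::size_t slot(std::int64_t key) const {
    return std::hash<std::int64_t>{}(key) % bits_.size();
  }
  std::vector<bool> bits_;
};

// Hypothetical row layout: one join key and one value column.
struct Row {
  std::int64_t join_key;
  std::int64_t value;
};

// Returns the indices of rows that pass the predicate and might have a
// match on the build side according to the filter.
std::vector<std::size_t> selectThenProbe(
    const std::vector<Row> &rows,
    const std::function<bool(const Row &)> &predicate,
    const SimpleBitFilter &filter) {
  std::vector<std::size_t> survivors;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!predicate(rows[i])) continue;                   // filter predicate first
    if (!filter.mayContain(rows[i].join_key)) continue;  // then the filter probe
    survivors.push_back(i);
  }
  return survivors;
}
```

    In this sketch the build side would call `insert` for every join key (the role `BuildHashOperator` plays above), and the probing operators would call `selectThenProbe`.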
    
    Here are the performance results for SSB SF100 and TPC-H SF100.
    <table>
      <tr>
        <td><b>SSB SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>885</td>
        <td>955</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>738</td>
        <td>821</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>707</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>1240</td>
        <td>1114</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>853</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>751</td>
        <td>975</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>3109</td>
        <td>2116</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>1042</td>
        <td>581</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>786</td>
        <td>710</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>603</td>
        <td>558</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2851</td>
        <td>1410</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>3279</td>
        <td>908</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>1122</td>
        <td>904</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>17967</td>
        <td>12721</td>
      </tr>
    </table>
    
    For TPC-H queries, Q21 has one issue: it requires two hash tables on the 
`lineitem` relation. Because all `HashTable`s are constructed in 
`QueryContext` at the beginning of query execution, 75% of the 
available memory slots (48569 out of 64385) are occupied and cannot be 
swapped out by `StorageManager`'s `EvictionPolicy`. This causes heavy 
_spilling_ and results in a running time of over 120 seconds for Q21 on the 
master branch and an occasional DNF on the LIPFilter branch. A quick 
workaround is to enlarge the buffer pool (set 
`-buffer_pool_slots=100000`). As a long-term solution, we may
    (1) reduce hash table size by using untyped values;
    (2) delay allocating hash table memory until it is actually used;
    (3) revise the scheduler to be aware of resource requirements.
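
    Option (2) could look roughly like the sketch below. This is an illustration under stated assumptions, not Quickstep's `HashTable` API: `LazyHashTable` is a hypothetical wrapper and `std::unordered_map` stands in for the real hash table. The constructor only records the size estimate, and memory is reserved on the first insert.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical wrapper that defers hash table allocation until first use.
class LazyHashTable {
 public:
  explicit LazyHashTable(std::size_t estimated_entries)
      : estimated_entries_(estimated_entries) {}

  // True once the underlying table has been allocated.
  bool allocated() const { return table_ != nullptr; }

  // Allocates the table on first call, then inserts or overwrites the entry.
  void put(std::int64_t key, std::int64_t value) {
    ensureAllocated();
    (*table_)[key] = value;
  }

  // Returns a pointer to the stored value, or nullptr if the key is absent
  // or the table has not been allocated yet.
  const std::int64_t *get(std::int64_t key) const {
    if (!table_) return nullptr;
    auto it = table_->find(key);
    return it == table_->end() ? nullptr : &it->second;
  }

 private:
  using Map = std::unordered_map<std::int64_t, std::int64_t>;

  void ensureAllocated() {
    if (!table_) {
      table_ = std::make_unique<Map>();
      table_->reserve(estimated_entries_);  // memory is claimed only here
    }
  }

  std::size_t estimated_entries_;
  std::unique_ptr<Map> table_;
};
```

    With this shape, constructing every table up front in a query context would cost almost nothing until the corresponding build operator starts inserting.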
    
    (**master** branch's performance is from Harshad's experiment #121)
    <table>
      <tr>
        <td><b>TPCH SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
        <td><b>w/ LIPFilter (ms)<br />-buffer_pool_slots=100000</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>16,046</td>
        <td>15180</td>
        <td>15238</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>5,625</td>
        <td>710</td>
        <td>744</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>6,861</td>
        <td>5069</td>
        <td>4907</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>2,662</td>
        <td>2617</td>
        <td>2448</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>4,364</td>
        <td>5966</td>
        <td>4499</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>398</td>
        <td>401</td>
        <td>395</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>23,367</td>
        <td>25836</td>
        <td>24860</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>3,274</td>
        <td>1714</td>
        <td>1733</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>10,050</td>
        <td>13707</td>
        <td>7789</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>15,296</td>
        <td>13038</td>
        <td>12934</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2,110</td>
        <td>2344</td>
        <td>2221</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>1,805</td>
        <td>2049</td>
        <td>1969</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>34,220</td>
        <td>35116</td>
        <td>34915</td>
      </tr>
      <tr>
        <td>Q14</td>
        <td>771</td>
        <td>942</td>
        <td>852</td>
      </tr>
      <tr>
        <td>Q15</td>
        <td>4,435</td>
        <td>4882</td>
        <td>4832</td>
      </tr>
      <tr>
        <td>Q16</td>
        <td>8,661</td>
        <td>8062</td>
        <td>9522</td>
      </tr>
      <tr>
        <td>Q17</td>
        <td>160,707</td>
        <td>1749</td>
        <td>1684</td>
      </tr>
      <tr>
        <td>Q18</td>
        <td>66,309</td>
        <td>82505</td>
        <td>86376</td>
      </tr>
      <tr>
        <td>Q19</td>
        <td>1,475</td>
        <td>1871</td>
        <td>1515</td>
      </tr>
      <tr>
        <td>Q20</td>
        <td>55,381</td>
        <td>1591</td>
        <td>1491</td>
      </tr>
      <tr>
        <td>Q21</td>
        <td>121,310</td>
        <td>DNF</td>
        <td>13205</td>
      </tr>
      <tr>
        <td>Q22</td>
        <td>6,792</td>
        <td>6746</td>
        <td>7098</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>551,921</td>
        <td>232096 (w/o Q21)</td>
        <td>241228</td>
      </tr>
    </table>
    
    Note that some of these improvements overlap with Harshad's partitioned 
aggregation (#121), since LIPFilters also speed up some aggregations. Roughly 
speaking, once both PRs are merged, we estimate an overall running 
time of ~150s for TPC-H SF100. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep lip-refactor-backend

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #122
    
----
commit 31b05122f2278a3c1327674795eec71efe8ff452
Author: Jianqiao Zhu <[email protected]>
Date:   2016-09-07T18:20:43Z

    Add backend support for LIPFilters.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
