Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

Peter Moser Thu, 30 Mar 2017 05:12:10 -0700

2017-03-01 10:56 GMT+01:00 Peter Moser <pitiz...@gmail.com>:
> A similar walkthrough for ALIGN will follow soon.
>
> We are thankful for any suggestion or ideas, to be used to write a
> good SGML documentation.


The attached README explains the ALIGN operation step-by-step with a
TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
input, show how we rewrite it during parser stage, and show how the
final execution generates result tuples.


Best regards,
Anton, Michael, Johann, Peter

================================================================================
                                    ALIGN
================================================================================

  This is an exemplary walkthrough of a temporal ALIGN operation, from the
  query input, over the query rewrite, until the result tuple outputs during
  execution.



EXAMPLE
================================================================================

  These tables represent employees and projects in a company. An employee works
  for a department over a certain time period. Different projects get developed
  in a department. The start and end time points of each project is stored in
  the period column.

    CREATE TABLE emp (empname VARCHAR, dept VARCHAR, period INT4RANGE);
    INSERT INTO emp VALUES
    ('Ann', 'DB', '[1,5)'),
    ('Bea', 'DB', '[3,8)'),
    ('Ron', 'AI', '[6,9)');

    CREATE TABLE prj (prjname VARCHAR, dept VARCHAR, period INT4RANGE);
    INSERT INTO prj VALUES
    ('PR1', 'DB', '[3,7)'),
    ('PR2', 'DB', '[1,5)'),
    ('PR3', 'HW', '[2,8)');


Timeline representation
-----------------------

    EMPNAME  DEPT      PERIOD
    Ann      DB        [1,5)      ----      <- Tuple 1, valid from 1 to 5 excl.
    Bea      DB        [3,8)        -----   <- Tuple 2
    Ron      AI        [6,9)           ---  <- Tuple 3
                                  123456789 <- Timeline

    PRJNAME  DEPT      PERIOD
    PR1      DB        [3,7)        ----    <- Tuple 1, valid from 3 to 7 excl.
    PR2      DB        [1,5)      ----      <- Tuple 2
    PR3      HW        [2,8)       ------   <- Tuple 3
                                  123456789 <- Timeline


TEMPORAL LEFT OUTER JOIN query
------------------------------

  Query: At each time point, to which projects is an  employee assigned, and
         when does an employee not have an assigned project?


    WITH emp AS (SELECT period u, * FROM emp),
         prj AS (SELECT period v, * FROM prj)
    SELECT empname, prjname, dept, period FROM (
        emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
    ) emp_aligned
    LEFT OUTER JOIN (
        prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
    ) prj_aligned
    USING(dept, period)
    WHERE period = u * v OR u IS NULL OR v IS NULL;


Result
------

     empname | prjname | dept | period
    ---------+---------+------+--------
     Ann     | PR2     | DB   | [1,5)           ----
     Ann     | PR1     | DB   | [3,5)             --
     Bea     | PR1     | DB   | [3,7)             ----
     Bea     | PR2     | DB   | [3,5)             --
     Bea     |         | DB   | [7,8)                 -
     Ron     |         | AI   | [6,9)                ---
                                                123456789  <- Timeline



STEP-BY-STEP EXPLANATION
================================================================================

  In this chapter we describe first the rewrite and processing of the two ALIGN
  clauses, that are,

    ( emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
    ) emp_aligned

  ...and...

    ( prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
    ) prj_aligned.

  Secondly, we explain what the above query does in general, and why we need two
  ALIGNs, the WHERE-clause and CTEs (WITH clauses).


ALIGN subquery processing
-------------------------

  After receiving the ALIGN query (see above) as input, we rewrite it into
  the following query:

    SELECT emp.*, GREATEST(LOWER(emp.period), LOWER(prj.period)) p1,
                             LEAST(UPPER(emp.period), UPPER(prj.period)) p2
    FROM
        (SELECT *, row_id() OVER () rn FROM emp) emp
    LEFT OUTER JOIN
        prj
    ON emp.dept = prj.dept AND emp.period && prj.period
    ORDER BY rn, p1, p2;

  Then, we rewrite the second ALIGN subquery analoguously.


Intermediate results: These are the inputs of our ALIGN execution nodes
-----------------------------------------------------------------------
  (See Appendix [1] for details)

  For emp ALIGN prj...

    TUPID    empname | dept | period | rn | p1 | p2
            ---------+------+--------+----+----+----
      A      Ann     | DB   | [1,5)  |  1 |  1 |  5
      B      Ann     | DB   | [1,5)  |  1 |  3 |  5
      C      Bea     | DB   | [3,8)  |  2 |  3 |  5
      D      Bea     | DB   | [3,8)  |  2 |  3 |  7
      E      Ron     | AI   | [6,9)  |  3 |  6 |  9


  For prj ALIGN emp...

    TUPID    prjname | dept | period | rn | p1 | p2
            ---------+------+--------+----+----+----
      a      PR1     | DB   | [3,7)  |  1 |  3 |  7
      b      PR1     | DB   | [3,7)  |  1 |  3 |  5
      c      PR2     | DB   | [1,5)  |  2 |  3 |  5
      d      PR2     | DB   | [1,5)  |  2 |  1 |  5
      e      PR3     | HW   | [2,6)  |  3 |  2 |  6



We have four possibilities inside ALIGN during execution
--------------------------------------------------------
  (See Appendix [2] for details)

    1) CURR == PREV
       a) S < CURR.P1   --> generate (CURR, [S, CURR.P1)) and set S = CURR.P1
       b) otherwise:
          if OUT != CURR:
             generate (CURR, [CURR.P1, CURR.P2))
             set S = MAX(S, CURR.P2)
          fetch next tuple
          if fetched tuple is NULL --> goto(2) forcing CURR != PREV to be true
    2) CURR != PREV
       a) S < PREV.te   --> generate (PREV, [S, PREV.te))
       set S = CURR.ts and goto(1) forcing CURR == PREV to be true


Executor steps for "emp ALIGN prj ON emp.dept = prj.dept..."
------------------------------------------------------------

1) First call of ExecTemporalAdjustment: Fetch tuple A, which is now CURR.
   Set sweepline S = 1 (A.ts), and copy tuple A also into PREV. Goto(1)
   forcing CURR == PREV to be true. The last result tuple (OUT) is NULL.

2) (1): CURR == PREV, that is, tuple A is equal to itself.
   (1b): OUT == NULL, hence we generate a result tuple:
            (Ann, DB, [1,5))
        ...where the interval is [CURR.P1, CURR.P2) and all other attributes
        are copied from the current tuple (omitting helper columns: RN, P1,
        and P2). This result is now the new OUT.
   We update the sweepline: S = 5 <-- MAX(1, 5).
   Fetch tuple B; CURR = B, PREV = A

3) (1): A == B.
   (1b): OUT != CURR (A.ts != B.p1), hence we generate a result tuple:
            (Ann, DB, [3,5))
   Fetch tuple C; CURR = C, PREV = B

4) (2): B != C.
   We update the sweepline: S = 3 (CURR.ts)
   We set PREV = C. Goto(1) forcing CURR == PREV to be true.

5) (1): C == C.
   (1b): OUT != CURR (OUT.rn != CURR.rn), hence we generate a result tuple:
            (Bea, DB, [3,5))
   We update the sweepline: S = 5 (CURR.p2)
   Fetch tuple D; CURR = D, PREV = C

6) (1): C == D.
   (1b): OUT != CURR (OUT.te != CURR.p2), hence we generate a result tuple:
            (Bea, DB, [3,7))
   We update the sweepline: S = 7 (CURR.p2)
   Fetch tuple E; CURR = E, PREV = D

7) (2): D != E.
   (2a): S < D.te (7 < 8), hence we generate a result tuple:
            (Bea, DB, [3,7))
   We update the sweepline: S = 6 (CURR.ts)
   We set PREV = E. Goto(1) forcing CURR == PREV to be true.

8) (1): E == E.
   (1b): OUT != CURR (OUT.ts != CURR.p1), hence we generate a result tuple:
            (Ron, AI, [6,9))
   We update the sweepline: S = 9 (CURR.p2)


No new tuple can be fetched, nor is any new result tuple produced, hence we
terminate our executor.

The result of "emp ALIGN prj..." is as follows:

     empname | dept | period
    ---------+------+--------
     Ann     | DB   | [1,5)
     Ann     | DB   | [3,5)
     Bea     | DB   | [3,5)
     Bea     | DB   | [3,7)
     Bea     | DB   | [7,8)
     Ron     | AI   | [6,9)

Whereas, the result of "prj ALIGN emp..." is the following:

     prjname | dept | period
    ---------+------+--------
     PR1     | DB   | [3,5)
     PR1     | DB   | [3,7)
     PR2     | DB   | [1,5)
     PR2     | DB   | [3,5)
     PR3     | HW   | [2,6)

The initial goal was to find at each time point, for which projects inside a
department is an employee responsible. These can now be easily done from these
aligned inputs with standard (reused) PostgreSQL operators. In our example a
regular LEFT OUTER JOIN, which gives the output below:

     empname | prjname | dept | period
    ---------+---------+------+--------
     Ann     | PR2     | DB   | [1,5)
     Ann     | PR1     | DB   | [3,5)
     Ann     | PR2     | DB   | [3,5)
     Bea     | PR1     | DB   | [3,7)
     Bea     | PR1     | DB   | [3,5)
     Bea     | PR2     | DB   | [3,5)
     Bea     |         | DB   | [7,8)
     Ron     |         | AI   | [6,9)

As you can see, some result tuples are already covered by others. For instance,
(Ann, PR2, DB, [3,5)) is already covered by (Ann, PR2, DB, [1,5)). Therefore, we
need an "absorbing" WHERE-clause, and CTEs (WITH-clauses) to compare the 
original
periods with the new result tuples. After nesting the JOIN within these clauses
the final result is:

     empname | prjname | dept | period
    ---------+---------+------+--------
     Ann     | PR2     | DB   | [1,5)
     Ann     | PR1     | DB   | [3,5)
     Bea     | PR1     | DB   | [3,7)
     Bea     | PR2     | DB   | [3,5)
     Bea     |         | DB   | [7,8)
     Ron     |         | AI   | [6,9)




Best regards,
Anton, Michael, Johann, Peter







--------------------------------------------------------------------------------
Appendix:

[1]
After the intermediate result we get an input, which is splitted into so-called
temporal groups. Each of these groups satisfy the USING-condition (a.dept =
b.dept), has overlapping periods, and a row number as ID (rn). The input is
ordered by temporal group ID, and the start and ending timepoints, i.e., P1 and
P2. Please note, that each tuple in a certain rn ID is always equal to any other
tuple in that rn ID, except for P1 and P2.

[2]
Additinal information for ALIGN:
    - sweepline: to sweep from left-to-right on the time line (S)
    - slots to the current and previous tuple (CURR, PREV)
    - Two tuples A and B are equal, i.e. A == B, if A.rn == B.rn
    - ts means LOWER(time), and te means UPPER(time)
    - OUT is the last produced output tuple (last result tuple)
    - OUT != CURR, if OUT.rn != CURR.rn
                   or OUT.ts != CURR.p1
                   or OUT.te != CURR.p2
                   or OUT == NULL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

Reply via email to