Julian Hyde created CALCITE-838:
-----------------------------------

             Summary: Optimize join-to-MAX queries
                 Key: CALCITE-838
                 URL: https://issues.apache.org/jira/browse/CALCITE-838
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde


A common class of queries joins a fact table to an "current" record in a 
dimension table. What appears to be a many-to-many join becomes many-to-one 
because there is only one current record.

Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
  SELECT last_value(customerId) OVER w AS customerId,
    last_value(customerName) OVER w AS customerName
  FROM Customers AS c
  WINDOW w AS (PARTITION BY customerId
    ORDER BY effectiveDate DESC)){code}

This formulation avoids the self-join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to