Julian Hyde created CALCITE-838:
-----------------------------------
Summary: Optimize join-to-MAX queries
Key: CALCITE-838
URL: https://issues.apache.org/jira/browse/CALCITE-838
Project: Calcite
Issue Type: Bug
Reporter: Julian Hyde
Assignee: Julian Hyde
A common class of queries joins a fact table to an "current" record in a
dimension table. What appears to be a many-to-many join becomes many-to-one
because there is only one current record.
Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
SELECT last_value(customerId) OVER w AS customerId,
last_value(customerName) OVER w AS customerName
FROM Customers AS c
WINDOW w AS (PARTITION BY customerId
ORDER BY effectiveDate DESC)){code}
This formulation avoids the self-join.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)