[
https://issues.apache.org/jira/browse/CALCITE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated CALCITE-838:
--------------------------------
Description:
A common class of queries joins a fact table to an "current" record in a
dimension table. What appears to be a many-to-many join becomes many-to-one
because there is only one current record.
Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
SELECT last_value(customerId) OVER w AS customerId,
last_value(customerName) OVER w AS customerName
FROM Customers AS c
WINDOW w AS (PARTITION BY customerId
ORDER BY effectiveDate)){code}
This formulation avoids the self-join, and is uncorrelated. Both good things.
It could probably also be written using correlated NOT IN. (Not good, because
any plan chosen for NOT IN has to watch out for null values.)
was:
A common class of queries joins a fact table to an "current" record in a
dimension table. What appears to be a many-to-many join becomes many-to-one
because there is only one current record.
Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
SELECT 1
FROM Customers
WHERE customerId = c.CustomerId
AND effectiveDate > c.effectiveDate){code}
The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
SELECT last_value(customerId) OVER w AS customerId,
last_value(customerName) OVER w AS customerName
FROM Customers AS c
WINDOW w AS (PARTITION BY customerId
ORDER BY effectiveDate DESC)){code}
This formulation avoids the self-join.
> Optimize "join to most recent" queries
> --------------------------------------
>
> Key: CALCITE-838
> URL: https://issues.apache.org/jira/browse/CALCITE-838
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
>
> A common class of queries joins a fact table to an "current" record in a
> dimension table. What appears to be a many-to-many join becomes many-to-one
> because there is only one current record.
> Here the query on orders (fact) and customers (dimension): {code}
> SELECT *
> FROM Orders
> JOIN Customers AS c USING (customerId)
> WHERE NOT EXISTS (
> SELECT 1
> FROM Customers
> WHERE customerId = c.CustomerId
> AND effectiveDate > c.effectiveDate){code}
> Same query pushing WHERE into JOIN:{code}
> SELECT *
> FROM Orders AS o
> JOIN Customers AS c
> ON o.customerId = c.customerId
> AND NOT EXISTS (
> SELECT 1
> FROM Customers
> WHERE customerId = c.CustomerId
> AND effectiveDate > c.effectiveDate){code}
> The same query re-formulated using LAST_VALUE:{code}
> SELECT *
> FROM Orders AS o
> JOIN (
> SELECT last_value(customerId) OVER w AS customerId,
> last_value(customerName) OVER w AS customerName
> FROM Customers AS c
> WINDOW w AS (PARTITION BY customerId
> ORDER BY effectiveDate)){code}
> This formulation avoids the self-join, and is uncorrelated. Both good things.
> It could probably also be written using correlated NOT IN. (Not good, because
> any plan chosen for NOT IN has to watch out for null values.)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)