[ 
https://issues.apache.org/jira/browse/CALCITE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-838:
--------------------------------
    Description: 
A common class of queries joins a fact table to an "current" record in a 
dimension table. What appears to be a many-to-many join becomes many-to-one 
because there is only one current record.

Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
  SELECT last_value(customerId) OVER w AS customerId,
    last_value(customerName) OVER w AS customerName
  FROM Customers AS c
  WINDOW w AS (PARTITION BY customerId
    ORDER BY effectiveDate)){code}

This formulation avoids the self-join, and is uncorrelated. Both good things.

It could probably also be written using correlated NOT IN. (Not good, because 
any plan chosen for NOT IN has to watch out for null values.)

  was:
A common class of queries joins a fact table to an "current" record in a 
dimension table. What appears to be a many-to-many join becomes many-to-one 
because there is only one current record.

Here the query on orders (fact) and customers (dimension): {code}
SELECT *
FROM Orders
JOIN Customers AS c USING (customerId)
WHERE NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

Same query pushing WHERE into JOIN:{code}
SELECT *
FROM Orders AS o
JOIN Customers AS c
ON o.customerId = c.customerId
AND NOT EXISTS (
  SELECT 1
  FROM Customers
  WHERE customerId = c.CustomerId 
  AND effectiveDate > c.effectiveDate){code}

The same query re-formulated using LAST_VALUE:{code}
SELECT *
FROM Orders AS o
JOIN (
  SELECT last_value(customerId) OVER w AS customerId,
    last_value(customerName) OVER w AS customerName
  FROM Customers AS c
  WINDOW w AS (PARTITION BY customerId
    ORDER BY effectiveDate DESC)){code}

This formulation avoids the self-join.


> Optimize "join to most recent" queries
> --------------------------------------
>
>                 Key: CALCITE-838
>                 URL: https://issues.apache.org/jira/browse/CALCITE-838
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> A common class of queries joins a fact table to an "current" record in a 
> dimension table. What appears to be a many-to-many join becomes many-to-one 
> because there is only one current record.
> Here the query on orders (fact) and customers (dimension): {code}
> SELECT *
> FROM Orders
> JOIN Customers AS c USING (customerId)
> WHERE NOT EXISTS (
>   SELECT 1
>   FROM Customers
>   WHERE customerId = c.CustomerId 
>   AND effectiveDate > c.effectiveDate){code}
> Same query pushing WHERE into JOIN:{code}
> SELECT *
> FROM Orders AS o
> JOIN Customers AS c
> ON o.customerId = c.customerId
> AND NOT EXISTS (
>   SELECT 1
>   FROM Customers
>   WHERE customerId = c.CustomerId 
>   AND effectiveDate > c.effectiveDate){code}
> The same query re-formulated using LAST_VALUE:{code}
> SELECT *
> FROM Orders AS o
> JOIN (
>   SELECT last_value(customerId) OVER w AS customerId,
>     last_value(customerName) OVER w AS customerName
>   FROM Customers AS c
>   WINDOW w AS (PARTITION BY customerId
>     ORDER BY effectiveDate)){code}
> This formulation avoids the self-join, and is uncorrelated. Both good things.
> It could probably also be written using correlated NOT IN. (Not good, because 
> any plan chosen for NOT IN has to watch out for null values.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to