[
https://issues.apache.org/jira/browse/OPTIQ-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated OPTIQ-399:
------------------------------
Description:
If a predicate is of the from {{p AND q1 OR p AND q2}} it would help to
factorize out the AND factor {{p}}, to the equivalent expression {{p AND (q1 OR
q2)}}. Sometimes {{p}} contains predicates from just one side of a join and can
be pushed down, whereas the whole predicate references columns from both sides.
Converting to DNF is not sufficient, and in the worst case exponentially
increases the size of the expression.
In the following query, factorization will allow us to push down
{{store.s_store_sk = store_sales.ss_store_sk AND store_sales.ss_sold_date_sk =
date_dim.d_date_sk}}.
{code:sql}
SELECT avg(ss_quantity) ,
avg(ss_ext_sales_price) ,
avg(ss_ext_wholesale_cost) ,
sum(ss_ext_wholesale_cost)
FROM store_sales ,
store ,
customer_demographics ,
household_demographics ,
customer_address ,
date_dim
WHERE store.s_store_sk = store_sales.ss_store_sk
AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
AND date_dim.d_year = 2001
AND((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'M'
AND customer_demographics.cd_education_status = '4 yr Degree'
AND store_sales.ss_sales_price BETWEEN 100.00 AND 150.00
AND household_demographics.hd_dep_count = 3)
OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'D'
AND customer_demographics.cd_education_status = 'Primary'
AND store_sales.ss_sales_price BETWEEN 50.00 AND 100.00
AND household_demographics.hd_dep_count = 1 )
OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'U'
AND customer_demographics.cd_education_status = 'Advanced Degree'
AND store_sales.ss_sales_price BETWEEN 150.00 AND 200.00
AND household_demographics.hd_dep_count = 1))
AND((store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('KY',
'GA',
'NM')
AND store_sales.ss_net_profit BETWEEN 100 AND 200)
OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('MT',
'OR',
'IN')
AND store_sales.ss_net_profit BETWEEN 150 AND 300)
OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('WI',
'MO',
'WV')
AND store_sales.ss_net_profit BETWEEN 50 AND 250)) ;
{code}
was:
If a predicate is of the from {{p AND q1 OR p AND q2}} it would help to
factorize out the AND factor {{p}}. Sometimes {{p}} contains predicates from
just one side of a join and can be pushed down, whereas the whole predicate
references columns from both sides.
Converting to DNF is not sufficient, and in the worst case exponentially
increases the size of the expression.
In the following query, factorization will allow us to push down
{{store.s_store_sk = store_sales.ss_store_sk AND store_sales.ss_sold_date_sk =
date_dim.d_date_sk}}.
{code:sql}
SELECT avg(ss_quantity) ,
avg(ss_ext_sales_price) ,
avg(ss_ext_wholesale_cost) ,
sum(ss_ext_wholesale_cost)
FROM store_sales ,
store ,
customer_demographics ,
household_demographics ,
customer_address ,
date_dim
WHERE store.s_store_sk = store_sales.ss_store_sk
AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
AND date_dim.d_year = 2001
AND((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'M'
AND customer_demographics.cd_education_status = '4 yr Degree'
AND store_sales.ss_sales_price BETWEEN 100.00 AND 150.00
AND household_demographics.hd_dep_count = 3)
OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'D'
AND customer_demographics.cd_education_status = 'Primary'
AND store_sales.ss_sales_price BETWEEN 50.00 AND 100.00
AND household_demographics.hd_dep_count = 1 )
OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
AND customer_demographics.cd_demo_sk = ss_cdemo_sk
AND customer_demographics.cd_marital_status = 'U'
AND customer_demographics.cd_education_status = 'Advanced Degree'
AND store_sales.ss_sales_price BETWEEN 150.00 AND 200.00
AND household_demographics.hd_dep_count = 1))
AND((store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('KY',
'GA',
'NM')
AND store_sales.ss_net_profit BETWEEN 100 AND 200)
OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('MT',
'OR',
'IN')
AND store_sales.ss_net_profit BETWEEN 150 AND 300)
OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
AND customer_address.ca_country = 'United States'
AND customer_address.ca_state IN ('WI',
'MO',
'WV')
AND store_sales.ss_net_profit BETWEEN 50 AND 250)) ;
{code}
> Factorize common AND factors out of OR predicates
> -------------------------------------------------
>
> Key: OPTIQ-399
> URL: https://issues.apache.org/jira/browse/OPTIQ-399
> Project: Optiq
> Issue Type: Bug
> Reporter: Laljo John Pullokkaran
> Assignee: Laljo John Pullokkaran
>
> If a predicate is of the from {{p AND q1 OR p AND q2}} it would help to
> factorize out the AND factor {{p}}, to the equivalent expression {{p AND (q1
> OR q2)}}. Sometimes {{p}} contains predicates from just one side of a join
> and can be pushed down, whereas the whole predicate references columns from
> both sides.
> Converting to DNF is not sufficient, and in the worst case exponentially
> increases the size of the expression.
> In the following query, factorization will allow us to push down
> {{store.s_store_sk = store_sales.ss_store_sk AND store_sales.ss_sold_date_sk
> = date_dim.d_date_sk}}.
> {code:sql}
> SELECT avg(ss_quantity) ,
> avg(ss_ext_sales_price) ,
> avg(ss_ext_wholesale_cost) ,
> sum(ss_ext_wholesale_cost)
> FROM store_sales ,
> store ,
> customer_demographics ,
> household_demographics ,
> customer_address ,
> date_dim
> WHERE store.s_store_sk = store_sales.ss_store_sk
> AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
> AND date_dim.d_year = 2001
> AND((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
> AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
> AND customer_demographics.cd_marital_status = 'M'
> AND customer_demographics.cd_education_status = '4 yr Degree'
> AND store_sales.ss_sales_price BETWEEN 100.00 AND 150.00
> AND household_demographics.hd_dep_count = 3)
> OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
> AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
> AND customer_demographics.cd_marital_status = 'D'
> AND customer_demographics.cd_education_status = 'Primary'
> AND store_sales.ss_sales_price BETWEEN 50.00 AND 100.00
> AND household_demographics.hd_dep_count = 1 )
> OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
> AND customer_demographics.cd_demo_sk = ss_cdemo_sk
> AND customer_demographics.cd_marital_status = 'U'
> AND customer_demographics.cd_education_status = 'Advanced Degree'
> AND store_sales.ss_sales_price BETWEEN 150.00 AND 200.00
> AND household_demographics.hd_dep_count = 1))
> AND((store_sales.ss_addr_sk = customer_address.ca_address_sk
> AND customer_address.ca_country = 'United States'
> AND customer_address.ca_state IN ('KY',
> 'GA',
> 'NM')
> AND store_sales.ss_net_profit BETWEEN 100 AND 200)
> OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
> AND customer_address.ca_country = 'United States'
> AND customer_address.ca_state IN ('MT',
> 'OR',
> 'IN')
> AND store_sales.ss_net_profit BETWEEN 150 AND 300)
> OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
> AND customer_address.ca_country = 'United States'
> AND customer_address.ca_state IN ('WI',
> 'MO',
> 'WV')
> AND store_sales.ss_net_profit BETWEEN 50 AND 250)) ;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)