Re: [GENERAL] Equivalence Classes when using IN
On 12 October 2017 at 10:15, Kim Rose Carlsenwrote: > Why don't I see that predicate (customer_id) pushed into the outer nested > loop so we don't have to sort the whole table on each loop. > > (See original post and follow up for definitions) > QUERY PLAN > - > Nested Loop Left Join (cost=139.00..10392.96 rows=668 width=16) (actual > time=0.528..35.120 rows=200 loops=1) > Join Filter: (c.customer_id = product.customer_id) > Rows Removed by Join Filter: 199900 > -> Nested Loop (cost=0.28..199.21 rows=334 width=12) (actual > time=0.075..1.146 rows=100 loops=1) > -> Seq Scan on customer (cost=0.00..21.51 rows=334 width=8) (actual > time=0.067..0.282 rows=100 loops=1) > Filter: (age < 20) > Rows Removed by Filter: 901 > -> Index Only Scan using customer_pkey on customer c > (cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100) > Index Cond: (customer_id = customer.customer_id) > Heap Fetches: 100 > -> Materialize (cost=138.73..173.75 rows=2001 width=8) (actual > time=0.005..0.130 rows=2001 loops=100) > -> Sort (cost=138.73..143.73 rows=2001 width=8) (actual > time=0.448..0.588 rows=2001 loops=1) > Sort Key: product.customer_id, product.product_id > Sort Method: quicksort Memory: 142kB > -> Seq Scan on product (cost=0.00..29.01 rows=2001 width=8) > (actual time=0.006..0.215 rows=2001 loops=1) > Planning time: 0.214 ms > Execution time: 35.284 ms I don't really see any blockers that would mean we couldn't support this, it's just that we don't currently support it. The predicates that we do pushdown are just ones we deem as safe to pushdown of the ones that appear in the query, or ones that can be derived through equivalence. (e.g. ab.a = ab.b and ab.b = 1 --> ab.a = 1) For example, consider the difference between the following: create table ab(a int, b int); insert into ab select x,x from generate_series(1,100)x; create index on ab(a); create index on ab(b); postgres=# explain select * from (select distinct on (a) a,b from ab order by a,b) ab where ab.b < 10; QUERY PLAN --- Subquery Scan on ab (cost=127757.34..145257.34 rows=33 width=8) Filter: (ab.b < 10) -> Unique (cost=127757.34..132757.34 rows=100 width=8) -> Sort (cost=127757.34..130257.34 rows=100 width=8) Sort Key: ab_1.a, ab_1.b -> Seq Scan on ab ab_1 (cost=0.00..14425.00 rows=100 width=8) (6 rows) postgres=# explain select * from (select distinct on (a) a,b from ab order by a,b) ab where ab.a < 10; QUERY PLAN --- Unique (cost=8.73..8.77 rows=9 width=8) -> Sort (cost=8.73..8.75 rows=9 width=8) Sort Key: ab.a, ab.b -> Index Scan using ab_a_idx on ab (cost=0.42..8.58 rows=9 width=8) Index Cond: (a < 10) (5 rows) The "a < 10" was pushed down as we're distinct on (a), but pushing down "ab.b < 10" would be invalid and could cause wrong results. The predicate you'd like to see pushed down is actually a parameter in a parameterized Path and we don't currently generate any parameterized paths outside of each query level. Likely there's no good reason for this other than it's not been done yet, but it's really only been since 9.6 that the query planner has been flexible enough to possibly allow something like this to be done at all. The reason the planner may appear to push down the predicate when there's no DISTINCT ON clause is that the planner was able to pull the subquery (or view) up a level. When the planner is able to do this it's much more flexible to the types of plans it can generate. It's just that we don't ever pull up subqueries with DISTINCT ON, plus a bunch of other reasons. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
> On 11 Oct 2017, at 21.46, David Rowleywrote: > >> On 12 October 2017 at 08:37, Kim Rose Carlsen wrote: >> >>> Yeah. The ORDER BY creates a partial optimization fence, preventing >>> any such plan from being considered. >> >> I can see in the general case it semanticly means different things If you >> allow the WHERE to pass through ORDER BY. >> >> A special case can be allowed for WHERE to pass the ORDER BY if the column >> is part of DISTINCT ON. > > Yeah, we do allow predicates to be pushed down in that case. > Let's ignore that it's not a very useful query I have written. Why don't I see that predicate (customer_id) pushed into the outer nested loop so we don't have to sort the whole table on each loop. (See original post and follow up for definitions) QUERY PLAN - Nested Loop Left Join (cost=139.00..10392.96 rows=668 width=16) (actual time=0.528..35.120 rows=200 loops=1) Join Filter: (c.customer_id = product.customer_id) Rows Removed by Join Filter: 199900 -> Nested Loop (cost=0.28..199.21 rows=334 width=12) (actual time=0.075..1.146 rows=100 loops=1) -> Seq Scan on customer (cost=0.00..21.51 rows=334 width=8) (actual time=0.067..0.282 rows=100 loops=1) Filter: (age < 20) Rows Removed by Filter: 901 -> Index Only Scan using customer_pkey on customer c (cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100) Index Cond: (customer_id = customer.customer_id) Heap Fetches: 100 -> Materialize (cost=138.73..173.75 rows=2001 width=8) (actual time=0.005..0.130 rows=2001 loops=100) -> Sort (cost=138.73..143.73 rows=2001 width=8) (actual time=0.448..0.588 rows=2001 loops=1) Sort Key: product.customer_id, product.product_id Sort Method: quicksort Memory: 142kB -> Seq Scan on product (cost=0.00..29.01 rows=2001 width=8) (actual time=0.006..0.215 rows=2001 loops=1) Planning time: 0.214 ms Execution time: 35.284 ms -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
On 12 October 2017 at 08:37, Kim Rose Carlsenwrote: > >> Yeah. The ORDER BY creates a partial optimization fence, preventing >> any such plan from being considered. >>> > > I can see in the general case it semanticly means different things If you > allow the WHERE to pass through ORDER BY. > > A special case can be allowed for WHERE to pass the ORDER BY if the column is > part of DISTINCT ON. Yeah, we do allow predicates to be pushed down in that case. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
> Yeah. The ORDER BY creates a partial optimization fence, preventing > any such plan from being considered. >> I can see in the general case it semanticly means different things If you allow the WHERE to pass through ORDER BY. A special case can be allowed for WHERE to pass the ORDER BY if the column is part of DISTINCT ON. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
On Mon, Oct 09, 2017 at 07:44:50PM -0400, Tom Lane wrote: > David Rowleywrites: > > If the only reason that is_simple_subquery() rejects subqueries with > > ORDER BY is due to wanting to keep the order by of a view, then > > couldn't we make is_simple_subquery() a bit smarter and have it check > > if the subquery is going to be joined to something else, which likely > > would destroy the order, or at least it would remove any guarantees of > > it. > > I'm not on board with this. The assumption is that if the user put an > ORDER BY there, that means they want that subquery to be computed in that > order. It's not for us to decide they didn't mean what they said. > > Moreover, there are cases where the ORDER BY would be semantically > significant, eg if there's a LIMIT or volatile functions or tSRFs > involved. Or where the order is meaningful to an aggregate function applied to columns of a view result set. I'm not sure what the full set of cases where the ORDER BY on the inner query is meaningful, but I'm sure there are cases it is not. If there are no such constraints on dropping the ORDER BY, then the it could be dropped, making the view query simpler. Nico -- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
> If the only reason that is_simple_subquery() rejects subqueries with > ORDER BY is due to wanting to keep the order by of a view, then > couldn't we make is_simple_subquery() a bit smarter and have it check > if the subquery is going to be joined to something else, which likely > would destroy the order, or at least it would remove any guarantees of > it. > > Something like the attached? I dont know if it makes any difference that the ORDER BY is used in a DISTINCT ON clause. In this case the ORDER BY is important. - Kim
Re: [GENERAL] Equivalence Classes when using IN
> You would benefit from adding the age column to view_customer, or at > least consider having some view which contains all the columns you'll > ever need from those tables and if you need special views with only a > subset of columns due to some software doing "select * from > viewname;", then you could just create some. Joining to the same table > again seems like a bit of a waste of effort for the planner and > executor. I would argue that the anti pattern would be the software that insist on using "select * from viewname;" from a view that has calculated columns that you do not care for. I recommend introducing both lightweight views and heavyweight views, so you can join up probably for what you need. My example is fabricated trying to simplify things, but I seem to create more confusion than clarity in my example. My point was only to see if anything could be added to the fabricated execution path. I agree that the listed example does not make sense. So I will try and give some more context to real use cases. Imagine an invoice entity where you have one relation for invoice base data and a relation for invoice_line. The invoice has some invoice_id, customer_id, due_date, paid_date and invoice_line contains each line with a invoice_id, display_name, amount. A view (view_invoice_with_amount) where you calculate the total. so a query could be SELECT c.customer_id, i.invoice_amount_total FROM view_customer c JOIN view_invoice_with_amount i ON c.customer_id = i.customer_id WHERE c.first_name = 'John'; If you ever need to filter by invoice_amount_total, it might be necesary denormalize the relations and cache the amount in the invoice table. > I'd assume customer_id is the PRIMARY KEY of customer and > is unique. This is a continuation of the previous example, maybe I should have included it all to make it more clear. But customer_id is declared as a primary key. > It's not all that clear what your view is doing here. Confusingly > there's a Sort in the plan, yet nothing in the query asked for that, > so I guess that the view must have an ORDER BY. If you get rid of that > the planner would likely use an index on product (customer_id) to > parameterise the nested loop, at least, it likely would, if you have > one. The view is defined in the original post. What I was trying to illustrate was a DISTINCT ON clause to prioritize multiple products pr customer to a somewhat "main" product for the customer. The ORDER BY on product_id would in this case then map the first product a customer gets to its "main" product. It could also be the most valuable product or newest ordered active product etc. It is just some way of mapping one to many relation to a one to one. Again the example is simplified and fabricated and maybe looses its power to explain its intents. > It's pretty bad practice to have ORDER BY in views. I kinda wish we > didn't even allow it, but that ship sailed many years ago... It is required by DISTINCT ON and as soon as you go into reporting, datawarehouse then it gets difficult to avoid these along with group by. Instead of writing each query from the ground up you get a huge benefit by factorizing each query into meaningful entities that can stand alone and make sense by themself, and from these build up the query to answer your questions. That way you gain lots of re-use of code and definition doesn't change between queries. The down side is it leaves alot of work to the planner. It's a trade off between optimization, readability and simplicity. I hope I make more sense now. - Kim
Re: [GENERAL] Equivalence Classes when using IN
On 10 October 2017 at 12:44, Tom Lanewrote: > David Rowley writes: >> If the only reason that is_simple_subquery() rejects subqueries with >> ORDER BY is due to wanting to keep the order by of a view, then >> couldn't we make is_simple_subquery() a bit smarter and have it check >> if the subquery is going to be joined to something else, which likely >> would destroy the order, or at least it would remove any guarantees of >> it. > > I'm not on board with this. The assumption is that if the user put an > ORDER BY there, that means they want that subquery to be computed in that > order. It's not for us to decide they didn't mean what they said. > > Moreover, there are cases where the ORDER BY would be semantically > significant, eg if there's a LIMIT or volatile functions or tSRFs > involved. Ok, thanks for looking, although, FWIW, LIMIT and tSRFs are still disabled. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
David Rowleywrites: > If the only reason that is_simple_subquery() rejects subqueries with > ORDER BY is due to wanting to keep the order by of a view, then > couldn't we make is_simple_subquery() a bit smarter and have it check > if the subquery is going to be joined to something else, which likely > would destroy the order, or at least it would remove any guarantees of > it. I'm not on board with this. The assumption is that if the user put an ORDER BY there, that means they want that subquery to be computed in that order. It's not for us to decide they didn't mean what they said. Moreover, there are cases where the ORDER BY would be semantically significant, eg if there's a LIMIT or volatile functions or tSRFs involved. BTW, I noticed that I was wrong upthread about ORDER BY in subqueries being disallowed by spec --- that was true originally, but they allow it as of SQL:2008 or thereabouts. It might be interesting to see if the spec says anything concrete about the semantics of that. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
On 10 October 2017 at 02:51, Tom Lanewrote: > David Rowley writes: >> It's pretty bad practice to have ORDER BY in views. I kinda wish we >> didn't even allow it, but that ship sailed many years ago... > > I think it's actually disallowed by the SQL spec (although so are > many other things we support). IMO it's a useful facility to have > for views that are meant for direct presentation to clients --- > but if you'd like joins to the view to be optimized, you don't > want an ORDER BY in there. If the only reason that is_simple_subquery() rejects subqueries with ORDER BY is due to wanting to keep the order by of a view, then couldn't we make is_simple_subquery() a bit smarter and have it check if the subquery is going to be joined to something else, which likely would destroy the order, or at least it would remove any guarantees of it. Something like the attached? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pullup_subqueries_with_order_by_when_its_not_the_only_fromitem.patch Description: Binary data -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
David Rowleywrites: > It's not all that clear what your view is doing here. Confusingly > there's a Sort in the plan, yet nothing in the query asked for that, > so I guess that the view must have an ORDER BY. If you get rid of that > the planner would likely use an index on product (customer_id) to > parameterise the nested loop, at least, it likely would, if you have > one. Yeah. The ORDER BY creates a partial optimization fence, preventing any such plan from being considered. > It's pretty bad practice to have ORDER BY in views. I kinda wish we > didn't even allow it, but that ship sailed many years ago... I think it's actually disallowed by the SQL spec (although so are many other things we support). IMO it's a useful facility to have for views that are meant for direct presentation to clients --- but if you'd like joins to the view to be optimized, you don't want an ORDER BY in there. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
On 9 October 2017 at 22:39, Kim Rose Carlsenwrote: > EXPLAIN ANALYZE > SELECT * >FROM customer >JOIN view_customer > ON customer.customer_id = view_customer.customer_id > WHERE age < 20; > > QUERY PLAN > - > Nested Loop Left Join (cost=139.00..10392.96 rows=668 width=16) (actual > time=0.528..35.120 rows=200 loops=1) >Join Filter: (c.customer_id = product.customer_id) >Rows Removed by Join Filter: 199900 >-> Nested Loop (cost=0.28..199.21 rows=334 width=12) (actual > time=0.075..1.146 rows=100 loops=1) > -> Seq Scan on customer (cost=0.00..21.51 rows=334 width=8) > (actual time=0.067..0.282 rows=100 loops=1) >Filter: (age < 20) >Rows Removed by Filter: 901 > -> Index Only Scan using customer_pkey on customer c > (cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100) >Index Cond: (customer_id = customer.customer_id) >Heap Fetches: 100 >-> Materialize (cost=138.73..173.75 rows=2001 width=8) (actual > time=0.005..0.130 rows=2001 loops=100) > -> Sort (cost=138.73..143.73 rows=2001 width=8) (actual > time=0.448..0.588 rows=2001 loops=1) >Sort Key: product.customer_id, product.product_id >Sort Method: quicksort Memory: 142kB >-> Seq Scan on product (cost=0.00..29.01 rows=2001 width=8) > (actual time=0.006..0.215 rows=2001 loops=1) > Planning time: 0.214 ms > Execution time: 35.284 ms You would benefit from adding the age column to view_customer, or at least consider having some view which contains all the columns you'll ever need from those tables and if you need special views with only a subset of columns due to some software doing "select * from viewname;", then you could just create some. Joining to the same table again seems like a bit of a waste of effort for the planner and executor. I'd assume customer_id is the PRIMARY KEY of customer and is unique. It's not all that clear what your view is doing here. Confusingly there's a Sort in the plan, yet nothing in the query asked for that, so I guess that the view must have an ORDER BY. If you get rid of that the planner would likely use an index on product (customer_id) to parameterise the nested loop, at least, it likely would, if you have one. It's pretty bad practice to have ORDER BY in views. I kinda wish we didn't even allow it, but that ship sailed many years ago... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
>On 9 October 2017 at 08:01, Kim Rose Carlsenwrote: >> Is this because postgres never consider IN clause when building equivalence >> class's? > >Only btree equality operators are considered at the moment. After good night sleep and reading the previous discussion, I am no longer sure I have reduced my original problem to the right example. If we continue from previous setup and add the following: ALTER TABLE customer ADD COLUMN age INTEGER; UPDATE customer SET age = customer_id / 5; CREATE INDEX ON customer (age); CREATE INDEX ON product (customer_id); SET enable_hashjoin = false; SET enable_mergejoin = false; EXPLAIN ANALYZE SELECT * FROM customer JOIN view_customer ON customer.customer_id = view_customer.customer_id WHERE age < 20; QUERY PLAN - Nested Loop Left Join (cost=139.00..10392.96 rows=668 width=16) (actual time=0.528..35.120 rows=200 loops=1) Join Filter: (c.customer_id = product.customer_id) Rows Removed by Join Filter: 199900 -> Nested Loop (cost=0.28..199.21 rows=334 width=12) (actual time=0.075..1.146 rows=100 loops=1) -> Seq Scan on customer (cost=0.00..21.51 rows=334 width=8) (actual time=0.067..0.282 rows=100 loops=1) Filter: (age < 20) Rows Removed by Filter: 901 -> Index Only Scan using customer_pkey on customer c (cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100) Index Cond: (customer_id = customer.customer_id) Heap Fetches: 100 -> Materialize (cost=138.73..173.75 rows=2001 width=8) (actual time=0.005..0.130 rows=2001 loops=100) -> Sort (cost=138.73..143.73 rows=2001 width=8) (actual time=0.448..0.588 rows=2001 loops=1) Sort Key: product.customer_id, product.product_id Sort Method: quicksort Memory: 142kB -> Seq Scan on product (cost=0.00..29.01 rows=2001 width=8) (actual time=0.006..0.215 rows=2001 loops=1) Planning time: 0.214 ms Execution time: 35.284 ms The planner prefer to use hash and merge joins which is ok, when many rows are to be joined, I don't think any condition can be merged to make these case faster. I have disabled merge and hash joins to get to a nested loop join instead, in this case it would be much better if customer_id can be pulled inside the loop, so it can look at only the relevant rows and not all rows for each loop. I somehow inferred that this would be the same as selecting from the view using IN clause, now I'm not so sure anymore. I can see there is a trade off between planner time and how exotic the case is. If you want to be able to hide abstraction through views I guess the nature becomes more OLAP oriented than OLTP. Best Regards Kim Carlsen -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
David Rowleywrites: > There are more details of the discussion in [1], although there's > probably lots more threads to be found if you search the archives. > [1] > https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com#cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com That thread seems to be about transitively applying inequalities (ie, given "a = b and a < c", deduce "b < c"), which seems like a bit of a different animal than IN. Some of the issues are similar perhaps, but I'd think that being able to deduce "b IN somelist" from "a = b and a IN somelist" is more valuable, simply because the IN would typically be a much stronger constraint than an inequality. So that idea suggests that it's more worth expending planner cycles to chase the possibility. I do vaguely recall discussions specifically around IN, though I didn't have any luck finding one in the archives. There's also the recent thread https://www.postgresql.org/message-id/flat/camjna7cc4x9yr-vajs-jsycajhrdvjqnn7m2slh1wlh-_z2...@mail.gmail.com which suggests being able to simplify "a IN somelist AND a IN someotherlist". If we wanted to do that, making the "lists" be treatable as eclass members would be a good place to start, because that would naturally result in intersect-able lists ending up in the same eclass. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Equivalence Classes when using IN
On 9 October 2017 at 08:01, Kim Rose Carlsenwrote: > Is this because postgres never consider IN clause when building equivalence > class's? Only btree equality operators are considered at the moment. > Are there any interests in adding such rule? There's been some discussion on it previously, although there is lots to still be worked out, for example, it's not that clear if it will always be a win to always apply the qual. There are more details of the discussion in [1], although there's probably lots more threads to be found if you search the archives. [1] https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com#cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Equivalence Classes when using IN
Hi I have this query where I think it's strange that the join doesn't pull the where condition in since RHS is equal to LHS. It might be easier to expain with an example Setup CREATE TABLE customer ( customer_id INTEGER PRIMARY KEY ); CREATE TABLE product ( product_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customer (customer_id) ); INSERT INTO customer (SELECT generate_series FROM generate_series(0, 100)); INSERT INTO product (product_id, customer_id) (SELECT generate_series, generate_series / 2 FROM generate_series(0, 2000)); Query EXPLAIN ANALYSE SELECT * FROM customer c JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY customer_id, product_id) p ON c.customer_id = p.customer_id WHERE c.customer_id IN (500, 501); QUERY PLAN -- Merge Join (cost=172.43..186.25 rows=1 width=12) (actual time=1.350..1.353 rows=2 loops=1) Merge Cond: (c.customer_id = product.customer_id) -> Sort (cost=13.93..13.93 rows=2 width=4) (actual time=0.036..0.036 rows=2 loops=1) Sort Key: c.customer_id Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on customer c (cost=8.58..13.92 rows=2 width=4) (actual time=0.026..0.027 rows=2 loops=1) Recheck Cond: (customer_id = ANY ('{500,501}'::integer[])) Heap Blocks: exact=1 -> Bitmap Index Scan on customer_pkey (cost=0.00..8.58 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1) Index Cond: (customer_id = ANY ('{500,501}'::integer[])) -> Unique (cost=158.51..169.81 rows=200 width=8) (actual time=0.783..1.221 rows=502 loops=1) -> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.782..0.929 rows=1003 loops=1) Sort Key: product.customer_id, product.product_id Sort Method: quicksort Memory: 142kB -> Seq Scan on product (cost=0.00..32.60 rows=2260 width=8) (actual time=0.015..0.366 rows=2001 loops=1) Planning time: 0.281 ms Execution time: 1.432 ms I would expect that since c.customer_id = p.customer_id then p.customer_id IN (500, 501). If I apply this rule myself, I get a much nicer plan (and it could be even better with an index on product_id). QUERY PLAN -- Merge Join (cost=52.70..53.11 rows=1 width=12) (actual time=0.686..0.693 rows=2 loops=1) Merge Cond: (product.customer_id = c.customer_id) -> Unique (cost=38.77..38.89 rows=22 width=8) (actual time=0.647..0.651 rows=2 loops=1) -> Sort (cost=38.77..38.83 rows=23 width=8) (actual time=0.646..0.647 rows=4 loops=1) Sort Key: product.customer_id, product.product_id Sort Method: quicksort Memory: 25kB -> Seq Scan on product (cost=0.00..38.25 rows=23 width=8) (actual time=0.331..0.632 rows=4 loops=1) Filter: (customer_id = ANY ('{500,501}'::integer[])) Rows Removed by Filter: 1997 -> Sort (cost=13.93..13.93 rows=2 width=4) (actual time=0.033..0.033 rows=2 loops=1) Sort Key: c.customer_id Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on customer c (cost=8.58..13.92 rows=2 width=4) (actual time=0.025..0.026 rows=2 loops=1) Recheck Cond: (customer_id = ANY ('{500,501}'::integer[])) Heap Blocks: exact=1 -> Bitmap Index Scan on customer_pkey (cost=0.00..8.58 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1) Index Cond: (customer_id = ANY ('{500,501}'::integer[])) Planning time: 0.386 ms Execution time: 0.774 ms (19 rows) Is this because postgres never consider IN clause when building equivalence class's? Are there any interests in adding such rule? My idea is to wrap this in a view CREATE VIEW view_customer AS SELECT c.customer_id, p.product_id FROM customer c LEFT JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY customer_id, product_id) p ON c.customer_id = p.customer_id Where the LEFT JOIN can be pruned if there is no explicit need for product_id. Here I loose the power to express that both c.customer_id and p.customer_id is the same. Best regards Kim Carlsen