Re: [PERFORM] two table join with order by on both tables attributes
> On 08 Aug 2014, at 16:29, Marti Raudsepp wrote: > > On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin wrote: >> select * from users join notifications on >> users.id=notifications.user_id ORDER BY users.priority desc >> ,notifications.priority desc limit 10; > > In my understanding, i need to have two indexes > on users(priority desc, id) > and notifications(user_id, priority desc) > >> And actually with this kind of query we really want the most wanted >> notifications, by the user. >> So we really can rewrite to order by users.priority desc, id asc, >> notifications.priority desc according to business logic. > > You can rewrite it with LATERAL to trick the planner into sorting each > user's notifications separately. This should give you the nestloop > plan you expect: > > SELECT * > FROM users, > LATERAL ( > SELECT * FROM notifications WHERE notifications.user_id=users.id > ORDER BY notifications.priority DESC > ) AS notifications > ORDER BY users.priority DESC, users.id > Thank you very much. > It would be great if Postgres could do this transformation automatically. > > There's a "partial sort" patch in the current CommitFest, which would > solve the problem partially (it could use the index on users, but the > notifications sort would have to be done in memory still). > https://commitfest.postgresql.org/action/patch_view?id=1368 > > Regards, > Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin wrote: > select * from users join notifications on users.id=notifications.user_id > ORDER BY users.priority desc ,notifications.priority desc limit 10; In my understanding, i need to have two indexes on users(priority desc, id) and notifications(user_id, priority desc) > And actually with this kind of query we really want the most wanted > notifications, by the user. > So we really can rewrite to order by users.priority desc, id asc, > notifications.priority desc according to business logic. You can rewrite it with LATERAL to trick the planner into sorting each user's notifications separately. This should give you the nestloop plan you expect: SELECT * FROM users, LATERAL ( SELECT * FROM notifications WHERE notifications.user_id=users.id ORDER BY notifications.priority DESC ) AS notifications ORDER BY users.priority DESC, users.id It would be great if Postgres could do this transformation automatically. There's a "partial sort" patch in the current CommitFest, which would solve the problem partially (it could use the index on users, but the notifications sort would have to be done in memory still). https://commitfest.postgresql.org/action/patch_view?id=1368 Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
> On 08 Aug 2014, at 03:43, Evgeniy Shishkin wrote: > select * from users join notifications on users.id=notifications.user_id ORDER BY users.priority desc ,notifications.priority desc limit 10; >> >>> In my understanding, i need to have two indexes >>> on users(priority desc, id) >>> and notifications(user_id, priority desc) >>> then postgresql would choose nested loop and get sorted data from indexes. >>> But it wont. >> >> Indeed. If you think a bit harder, you'll realize that the plan you >> suggest would *not* produce the sort order requested by this query. >> It would (if I'm not confused myself) produce an ordering like >> users.priority desc, id asc, notifications.priority desc >> which would only match what the query asks for if there's just a single >> value of id per users.priority value. >> >> Offhand I think that the planner will not recognize a nestloop as >> producing a sort ordering of this kind even if the query did request the >> right ordering. That could perhaps be improved, but I've not seen many >> if any cases where it would be worth the trouble. > And actually with this kind of query we really want the most wanted notifications, by the user. So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business logic. And we will benefit if this case would be improved. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
>>> select * from users join notifications on users.id=notifications.user_id >>> ORDER BY users.priority desc ,notifications.priority desc limit 10; > >> In my understanding, i need to have two indexes >> on users(priority desc, id) >> and notifications(user_id, priority desc) >> then postgresql would choose nested loop and get sorted data from indexes. >> But it wont. > > Indeed. If you think a bit harder, you'll realize that the plan you > suggest would *not* produce the sort order requested by this query. > It would (if I'm not confused myself) produce an ordering like > users.priority desc, id asc, notifications.priority desc > which would only match what the query asks for if there's just a single > value of id per users.priority value. > > Offhand I think that the planner will not recognize a nestloop as > producing a sort ordering of this kind even if the query did request the > right ordering. That could perhaps be improved, but I've not seen many > if any cases where it would be worth the trouble. Thanks Tom, you are right. But may be some sort of skip index scan ala loose index scan will help with index on notifications(priority desc,user_id)? I know that this is currently not handled by native executors. May by i can work around this using WITH RECURSIVE query? Also, are there any plans to handle loose index scan in the upcoming release? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
Evgeniy Shishkin writes: >> select * from users join notifications on users.id=notifications.user_id >> ORDER BY users.priority desc ,notifications.priority desc limit 10; > In my understanding, i need to have two indexes > on users(priority desc, id) > and notifications(user_id, priority desc) > then postgresql would choose nested loop and get sorted data from indexes. > But it wont. Indeed. If you think a bit harder, you'll realize that the plan you suggest would *not* produce the sort order requested by this query. It would (if I'm not confused myself) produce an ordering like users.priority desc, id asc, notifications.priority desc which would only match what the query asks for if there's just a single value of id per users.priority value. Offhand I think that the planner will not recognize a nestloop as producing a sort ordering of this kind even if the query did request the right ordering. That could perhaps be improved, but I've not seen many if any cases where it would be worth the trouble. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
My question was about that you can not have fast execution of this kind of query in postgresql. With any runtime configuration you just swith from seq scan and hash join to merge join, and then you have a sort node. In my understanding, i need to have two indexes on users(priority desc, id) and notifications(user_id, priority desc) then postgresql would choose nested loop and get sorted data from indexes. But it wont. I don't understand why. Do you have any schema and GUCs which performs this kind of query well? Sorry for top posting. > Can you explain why a nested loop is best for your data? Given my > understanding of an expected "priority"cardinality I would expect your ORDER > BY to be extremely inefficient and not all that compatible with a nested > loop approach. > > You can use the various parameters listed on this page to force the desired > plan and then provide EXPLAIN ANALYZE results for the various executed plans > and compare them. > > http://www.postgresql.org/docs/9.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE > > And now for the obligatory "read this" link: > > https://wiki.postgresql.org/wiki/SlowQueryQuestions > > If you can show that in fact the nested loop (or some other plan) performs > better than the one chosen by the planner - and can provide data that the > developers can use to replicate the experiment - then improvements can be > made. At worse you will come to understand why the planner is right and can > then explore alternative models. > > David J. > > > > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/two-table-join-with-order-by-on-both-tables-attributes-tp5814135p5814137.html > Sent from the PostgreSQL - performance mailing list archive at Nabble.com. > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
Evgeniy Shishkin wrote > Hello, > > suppose you have two very simple tables with fk dependency, by which we > join them > and another attribute for sorting > > like this > select * from users join notifications on users.id=notifications.user_id > ORDER BY users.priority desc ,notifications.priority desc limit 10; > > Very typical web query. > > No matter which composite indexes i try, postgresql can not make efficient > nested loop plan using indexes. > It chooses all sorts of seq scans and hash joins or merge join and always > a sort node and then a limit 10. > > Neither plan provides acceptable performance. And tables tend to grow =\ > > Can anybody suggest something or explain this behavior? Can you explain why a nested loop is best for your data? Given my understanding of an expected "priority"cardinality I would expect your ORDER BY to be extremely inefficient and not all that compatible with a nested loop approach. You can use the various parameters listed on this page to force the desired plan and then provide EXPLAIN ANALYZE results for the various executed plans and compare them. http://www.postgresql.org/docs/9.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE And now for the obligatory "read this" link: https://wiki.postgresql.org/wiki/SlowQueryQuestions If you can show that in fact the nested loop (or some other plan) performs better than the one chosen by the planner - and can provide data that the developers can use to replicate the experiment - then improvements can be made. At worse you will come to understand why the planner is right and can then explore alternative models. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/two-table-join-with-order-by-on-both-tables-attributes-tp5814135p5814137.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] two table join with order by on both tables attributes
Hello, suppose you have two very simple tables with fk dependency, by which we join them and another attribute for sorting like this select * from users join notifications on users.id=notifications.user_id ORDER BY users.priority desc ,notifications.priority desc limit 10; Very typical web query. No matter which composite indexes i try, postgresql can not make efficient nested loop plan using indexes. It chooses all sorts of seq scans and hash joins or merge join and always a sort node and then a limit 10. Neither plan provides acceptable performance. And tables tend to grow =\ Can anybody suggest something or explain this behavior? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance