Re: [PERFORM] two table join with order by on both tables attributes
On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin itparan...@gmail.com wrote: select * from users join notifications on users.id=notifications.user_id ORDER BY users.priority desc ,notifications.priority desc limit 10; In my understanding, i need to have two indexes on users(priority desc, id) and notifications(user_id, priority desc) And actually with this kind of query we really want the most wanted notifications, by the user. So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business logic. You can rewrite it with LATERAL to trick the planner into sorting each user's notifications separately. This should give you the nestloop plan you expect: SELECT * FROM users, LATERAL ( SELECT * FROM notifications WHERE notifications.user_id=users.id ORDER BY notifications.priority DESC ) AS notifications ORDER BY users.priority DESC, users.id It would be great if Postgres could do this transformation automatically. There's a partial sort patch in the current CommitFest, which would solve the problem partially (it could use the index on users, but the notifications sort would have to be done in memory still). https://commitfest.postgresql.org/action/patch_view?id=1368 Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two table join with order by on both tables attributes
On 08 Aug 2014, at 16:29, Marti Raudsepp ma...@juffo.org wrote: On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin itparan...@gmail.com wrote: select * from users join notifications on users.id=notifications.user_id ORDER BY users.priority desc ,notifications.priority desc limit 10; In my understanding, i need to have two indexes on users(priority desc, id) and notifications(user_id, priority desc) And actually with this kind of query we really want the most wanted notifications, by the user. So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business logic. You can rewrite it with LATERAL to trick the planner into sorting each user's notifications separately. This should give you the nestloop plan you expect: SELECT * FROM users, LATERAL ( SELECT * FROM notifications WHERE notifications.user_id=users.id ORDER BY notifications.priority DESC ) AS notifications ORDER BY users.priority DESC, users.id Thank you very much. It would be great if Postgres could do this transformation automatically. There's a partial sort patch in the current CommitFest, which would solve the problem partially (it could use the index on users, but the notifications sort would have to be done in memory still). https://commitfest.postgresql.org/action/patch_view?id=1368 Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Optimization idea for long IN() lists
Folks, So one thing we tell users who have chronically long IN() lists is that they should create a temporary table and join against that instead. Other than not having the code, is there a reason why PostgreSQL shouldn't do something like this behind the scenes, automatically? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimization idea for long IN() lists
On Sat, Aug 9, 2014 at 5:15 AM, Josh Berkus j...@agliodbs.com wrote: Folks, So one thing we tell users who have chronically long IN() lists is that they should create a temporary table and join against that instead. Other than not having the code, is there a reason why PostgreSQL shouldn't do something like this behind the scenes, automatically? Hi Josh, I know that problem for many years. There are some workaround which doesn't require using the temporary tables (and I used that approach quite a lot when performance matter): Instead of using: SELECT * FROM sometable WHERE somefield IN (val1, val2, ...) AND other_filters; Query could be written as: SELECT * FROM sometable JOIN (VALUES ((val1), (val2) ...)) AS v(somefield) ON v.somefield=sometable.somefield WHERE other_filters; When there no index on somefield query plans would look like as: Original query: Filter: (somefield = ANY ('{...}'::integer[])) vs optimized query: Hash Join (cost=0.25..117.89 rows=22 width=59) (actual time=5.332..5.332 rows=0 loops=1) Hash Cond: (sometable.somefield = *VALUES*.somefield) ... - Hash (cost=0.12..0.12 rows=10 width=4) (actual time=0.010..0.010 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB - Values Scan on *VALUES* (cost=0.00..0.12 rows=10 width=4) (actual time=0.001..0.003 rows=10 loops=1) In synthetic data I observed the following performance results (fully in-memory data with integer values): List lengthIN Performance JOIN VALUES Performance 10 5.39ms 5.38ms 100 9.74ms 5.49ms 100053.02ms 9.89ms 1 231.10ms13.14ms So starting from 10 elements VALUES/HASH JOIN approach is clear winner. In case of the text literals IN list performance difference even more obvious (~2 order of magnitude for 1 list). However, if IN list used for the primary key lookup - there are no visible performance difference between these two approaches. So yes there are some space for optimization of Filter: (somefield = ANY ('{...}'::integer[])) via hashing. -- Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/ Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.