Re: Proposal: implement Postgres CLUSTER USING by default for MtM tables
On May 21, 2013, at 9:28 PM, bry...@obviously.com wrote: > The tables created by django for MtM fields are unordered. Loading all the > relationships may result in a number of I/O operations approaching the number > of relationships: > > # select * from basetable; > id |basetable_id| mtm_id > ++-- > 6 | 3 |1 > 7 | 2 |2 > 15 | 1 |1 > 16 | 1 |2 > 18 | 3 |5 > > Postgres has a feature to 'defragment' such tables packing data that gets > accessed at the same time into the same disk blocks. The number of I/O > operations can be as small as one: > > # \pset pager off > # cluster basetable using table_mem_id; > # select * from basetable; > id |basetable_id| mtm_id > ++-- > 15 | 1 |1 > 16 | 1 |2 > 7 | 2 |2 > 6 | 3 |1 > 18 | 3 |5 > > On a large table this can put hundreds of MtM relationships sequentially on > disk. This makes for less I/O operations and less cache space used. > > > This feature is postgres specific. > And it would involve django making a guess about the use pattern of the > relationship. Either the forward or reverse is optimized. > > -- > You received this message because you are subscribed to the Google Groups > "Django developers" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to django-developers+unsubscr...@googlegroups.com. > To post to this group, send email to django-developers@googlegroups.com. > Visit this group at http://groups.google.com/group/django-developers?hl=en. > For more options, visit https://groups.google.com/groups/opt_out. > > What do you mean by "clustering by default". CLUSTER is a one time operation. You use to to arrange the orders of the row in a table but after they have been arranged it has no long lasting affects past that. - Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA signature.asc Description: Message signed with OpenPGP using GPGMail
Re: Proposal: implement Postgres CLUSTER USING by default for MtM tables
Hi Bryce, An interesting idea. How are you proposing that this feature be exposed as an API? Is there any reason that this couldn't be handled as a DB-admin operation? i.e., Django just does what it normally does, and if a DB admin notices that table clustering is a problem, they apply the cluster command at the database level. Given that this is a very low level optimisation, is there any reason that this needs to be exposed as a Django-level API optimisation option? Yours, Russ Magee %-) On Wed, May 22, 2013 at 9:28 AM, wrote: > The tables created by django for MtM fields are unordered. Loading all > the relationships may result in a number of I/O operations approaching the > number of relationships: > > # select * from basetable; > id |basetable_id| mtm_id > ++-- > 6 | 3 |1 > 7 | 2 |2 > 15 | 1 |1 > 16 | 1 |2 > 18 | 3 |5 > > > Postgres has a feature to 'defragment' such tables packing data that gets > accessed at the same time into the same disk blocks. The number of I/O > operations can be as small as one: > > # \pset pager off > # cluster basetable using table_mem_id; > # select * from basetable; > > id |basetable_id| mtm_id > ++-- > 15 | 1 |1 > 16 | 1 |2 > 7 | 2 |2 > 6 | 3 |1 > 18 | 3 |5 > > On a large table this can put hundreds of MtM relationships sequentially > on disk. This makes for less I/O operations and less cache space used. > > > This feature is postgres specific. > And it would involve django making a guess about the use pattern of the > relationship. Either the forward or reverse is optimized. > > -- > You received this message because you are subscribed to the Google Groups > "Django developers" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to django-developers+unsubscr...@googlegroups.com. > To post to this group, send email to django-developers@googlegroups.com. > Visit this group at http://groups.google.com/group/django-developers?hl=en > . > For more options, visit https://groups.google.com/groups/opt_out. > > > -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Proposal: implement Postgres CLUSTER USING by default for MtM tables
The tables created by django for MtM fields are unordered. Loading all the relationships may result in a number of I/O operations approaching the number of relationships: # select * from basetable; id |basetable_id| mtm_id ++-- 6 | 3 |1 7 | 2 |2 15 | 1 |1 16 | 1 |2 18 | 3 |5 Postgres has a feature to 'defragment' such tables packing data that gets accessed at the same time into the same disk blocks. The number of I/O operations can be as small as one: # \pset pager off # cluster basetable using table_mem_id; # select * from basetable; id |basetable_id| mtm_id ++-- 15 | 1 |1 16 | 1 |2 7 | 2 |2 6 | 3 |1 18 | 3 |5 On a large table this can put hundreds of MtM relationships sequentially on disk. This makes for less I/O operations and less cache space used. This feature is postgres specific. And it would involve django making a guess about the use pattern of the relationship. Either the forward or reverse is optimized. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.