On 06 Oct 2014, at 10:02, Richard Frith-Macdonald 
<richard.frith-macdon...@brainstorm.co.uk> wrote:

> I'm wondering if anyone can help with advice on how to manage large 
> lists/sets of items in a postgresql database.
> 
> I have a database which uses multiple  lists of items roughly like this:
> 
> CREATE TABLE List (
>  ID SERIAL,
>  Name VARCHAR ....
> );
> 
> and a table containing individual entries in the lists:
> 
> CREATE TABLE ListEntry (
>  ListID INT, /* Reference the List table */
>  ItemID INT /* References an Item table */
> ) ;
> CREATE UNIQUE INDEX ListEntryIDX ON ListEntry(ListID, ItemID);

Don’t you have any PK’s? A UNIQUE INDEX is not the same as a PK, a PK does not 
allow NULLs for example.

For that matter, I’d ditch the serial column in List - it attributes to a 
larger index size which decreases the chances that the index will fit in 
memory, making it less feasable to the query planner. IMHO, natural keys are to 
be preferred here over surrogate keys. That is assuming that List.Name is 
supposed to be unique.

> Now, there are thousands of lists, many with millions of entries, and items 
> are added to and removed from lists in an unpredictable way (in response to 
> our customer's actions, not under our control).  Lists are also created by 
> customer actions.


> I think that server won't use index-only scans because, even in cases where a 
> particular list has not had any recent changes, the ListEntry table will 
> almost always have had some change (for one of the other lists) since its 
> last vacuum.
> Perhaps creating multiple ListEntry tables (one for each list) would allow 
> better performance; but that would be thousands (possibly tens of thousands) 
> of tables, and allowing new tables to be created by our clients might 
> conflict with things like nightly backups.
> 
> Is there a better way to manage list/set membership for many thousands of 
> sets and many millions of items?

Another benefit of using natural keys is that you don’t need to fetch the 
actual List entries - the Names are right there in your ListEntry table. You 
only you need to look records up in the List table when you want their details 
(columns other than Name).

A possible drawback in this case is that the PK index on ListEntry would 
probably be larger.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to