[HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
Recently, in preparation for migrating an application to postgres, I
got to this part of the manual (which is *excellent* so far, by the
way):

http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html

A quick check with the folks on #postgresql confirmed my
understanding, which was that the locking semantics of setval() and
nextval() make this unsafe:

SELECT setval('my_seq', nextval('my_seq') + 500);

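To make the hazard concrete, here is one possible interleaving (a sketch; the session labels and values are illustrative):

```sql
-- Session A: SELECT nextval('my_seq');              -- returns 1000
-- Session B: SELECT nextval('my_seq');              -- returns 1001
-- Session A: SELECT setval('my_seq', 1000 + 500);   -- sets last_value = 1500
-- Session A now assumes it owns 1000..1500, but B was already handed 1001.
```

Each call is atomic on its own, but nothing serializes the nextval()/setval() pair, so another session can slip in between them.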
Now, I was reminded that I could simply do this:

SELECT nextval('my_seq') FROM generate_series(1, 500);

But of course then I would have no guarantee of getting a contiguous
block of ids, which matters if I'm using this for a mass insert of
records that refer to each other's ids (for example, storing a
directed, linear graph). In that case I either have to do a correlated
update on the client side after transferring the keys (consider the
cost for a few million records: about 4 MB in keys per million
records, and in extreme cases 12 MB of data to be inserted, i.e. 33%
overhead in the worst case, presuming symmetric bandwidth), or I have
to insert into a temporary table, have the db backend do the update,
and then insert from there into the real table. Both are imperfect
options in terms of performance and complexity.

Thus, before I start work on it, I propose an extension to the current
nextval():

SELECT nextval('my_seq', 500);

This would advance the my_seq sequence by its increment * 500, and
return the first valid key. This both makes client code that needs a
bunch of PKs simpler to implement, and saves on performance, since the
client can just replace all its PKs (presuming they're currently a
contiguous block from 0 to n) with my_starting_pk + current_pk, so
this:

 pk | next_node
----+-----------
  0 |         1
  1 |         2
  2 |         0

can be easily updated like this:

SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table SET pk = currval('my_seq') + pk, next_node =
currval('my_seq') + next_node;

to something like this:

   pk   | next_node
--------+-----------
 521650 |    521651
 521651 |    521652
 521652 |    521650

This is a net gain of performance and ease of implementation in many
cases where a large number of ids from a sequence are needed -- with a
small added benefit of the keys being guaranteed to be contiguous.
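Put together, the intended usage would look something like this (a sketch of the proposed form; my_table and the block size of 500 are just examples):

```sql
BEGIN;
-- reserve 500 contiguous ids in one concurrency-safe call (proposed form)
SELECT nextval('my_seq', 500);
-- renumber locally-keyed rows (0..n) into the reserved block
UPDATE my_table SET pk = currval('my_seq') + pk,
                    next_node = currval('my_seq') + next_node;
COMMIT;
```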

I don't see any technical problems with this; postgres can already
pre-allocate more than one key, but the number is semi-static (the
CACHE parameter to CREATE SEQUENCE). This might break existing user
code if someone has defined their own nextval(regclass, integer), but
I don't see any way to avoid that.

Finally, I've checked sequence.c -- this looks pretty straightforward
to implement, but I figured checking with this list was wise before
starting work. Apologies if I've been overly wordy.

Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Kenneth Marshall
Hi Peter,

All you need to do is define your own sequence with an
increment of 500. Look at:

http://www.postgresql.org/docs/8.4/static/sql-createsequence.html
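Concretely, something along these lines (a sketch; the sequence name is arbitrary):

```sql
CREATE SEQUENCE my_block_seq INCREMENT BY 500;
-- each call reserves a 500-wide block: the returned value v means
-- v, v+1, ..., v+499 are yours to assign
SELECT nextval('my_block_seq');
```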

Regards,
Ken



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread hubert depesz lubaczewski
On Fri, May 14, 2010 at 02:07:27PM -0500, Kenneth Marshall wrote:
 Hi Peter,
 
 All you need to do is define your own sequence with an
 increment of 500. Look at:
 
 http://www.postgresql.org/docs/8.4/static/sql-createsequence.html

This is often not enough. For example - I want a standard increment of 1,
but right now I'm importing 10000 objects, and it would be simpler for
me to get 10000 ids. Preferably in one block.

This is not achievable now. I know I can 'alter sequence set increment
by' - but this will also affect concurrent sessions. which might not be
a problem, but it's a side effect that I don't want.

+1 for original proposition, would love to get it.

depesz



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Tom Lane
Peter Crabtree peter.crabt...@gmail.com writes:
 Now, I was reminded that I could simply do this:

 SELECT nextval('my_seq') FROM generate_series(1, 500);

 But of course then I would have no guarantee that I would get a
 contiguous block of ids,

The existing cache behavior will already handle that for you,
I believe.  I don't really see a need for new features here.

regards, tom lane



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Robert Haas
On Fri, May 14, 2010 at 5:04 PM, hubert depesz lubaczewski
dep...@depesz.com wrote:
 On Fri, May 14, 2010 at 02:07:27PM -0500, Kenneth Marshall wrote:
 Hi Peter,

 All you need to do is define your own sequence with an
 increment of 500. Look at:

 http://www.postgresql.org/docs/8.4/static/sql-createsequence.html

 This is often not enough. For example - I want a standard increment of 1,
 but right now I'm importing 10000 objects, and it would be simpler for
 me to get 10000 ids. Preferably in one block.

 This is not achievable now. I know I can 'alter sequence set increment
 by' - but this will also affect concurrent sessions. which might not be
 a problem, but it's a side effect that I don't want.

 +1 for original proposition, would love to get it.

If we do this, I'm inclined to think that the extra argument to
nextval() should be treated as overriding the base increment rather
than specifying a multiplier for it.  Other than that nitpick, it
sounds like a reasonable thing to allow.
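Under that reading, the call would look something like this (hypothetical semantics, not an existing feature):

```sql
-- advance my_seq by 500 in a single concurrency-safe step, instead of
-- by its usual increment; the 500 overrides the increment for this one
-- call rather than multiplying it
SELECT nextval('my_seq', 500);
```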

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
On Fri, May 14, 2010 at 5:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Crabtree peter.crabt...@gmail.com writes:
 Now, I was reminded that I could simply do this:

 SELECT nextval('my_seq') FROM generate_series(1, 500);

 But of course then I would have no guarantee that I would get a
 contiguous block of ids,

 The existing cache behavior will already handle that for you,
 I believe.  I don't really see a need for new features here.

I don't see how that works for this case, because the cache setting
is static, and also shared between sessions. So if I have 10 records
one time, and 100 records the next, and 587 the third time, what
should my CACHE be set to for that sequence?

And if I do ALTER SEQUENCE SET CACHE each time, I have either killed
concurrency (because I'm locking other sessions out of using that
sequence until I'm finished with it), or I have a race condition (if
someone else issues an ALTER SEQUENCE before I call nextval()). The
same problem exists with using ALTER SEQUENCE SET INCREMENT BY.
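Spelled out, the race looks like this (a hypothetical interleaving):

```sql
-- Session A: ALTER SEQUENCE my_seq INCREMENT BY 587;
-- Session B: ALTER SEQUENCE my_seq INCREMENT BY 100;  -- clobbers A's setting
-- Session A: SELECT nextval('my_seq');  -- advances by 100, not 587, so the
--            block A thinks it reserved is smaller than it believes
```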

Peter



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
On Fri, May 14, 2010 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote:

 If we do this, I'm inclined to think that the extra argument to
 nextval() should be treated as overriding the base increment rather
 than specifying a multiplier for it.  Other than that nitpick, it
 sounds like a reasonable thing to allow.


After giving it some thought, that sounds better. You gain some
functionality that way (temporarily overriding the interval) and lose
none.

Peter



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Tom Lane
Peter Crabtree peter.crabt...@gmail.com writes:
 On Fri, May 14, 2010 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote:
 If we do this, I'm inclined to think that the extra argument to
 nextval() should be treated as overriding the base increment rather
 than specifying a multiplier for it.  Other than that nitpick, it
 sounds like a reasonable thing to allow.

 After giving it some thought, that sounds better. You gain some
 functionality that way (temporarily overriding the interval) and lose
 none.

Well, what you lose is the previous assurance that values of nextval()
were always multiples of the increment.  I could see that breaking
applications that are using non-unity increments.
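For instance (hypothetical values):

```sql
CREATE SEQUENCE s INCREMENT BY 10 START 10;  -- yields 10, 20, 30, ...
-- with override semantics this call would advance by 7 instead,
-- producing a value that is no longer a multiple of 10
SELECT nextval('s', 7);
```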

regards, tom lane



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Robert Haas
On Fri, May 14, 2010 at 6:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Crabtree peter.crabt...@gmail.com writes:
 On Fri, May 14, 2010 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote:
 If we do this, I'm inclined to think that the extra argument to
 nextval() should be treated as overriding the base increment rather
 than specifying a multiplier for it.  Other than that nitpick, it
 sounds like a reasonable thing to allow.

 After giving it some thought, that sounds better. You gain some
 functionality that way (temporarily overriding the interval) and lose
 none.

 Well, what you lose is the previous assurance that values of nextval()
 were always multiples of the increment.  I could see that breaking
 applications that are using non-unity increments.

Err, right.  But those applications presumably will also not be using
this new behavior.  There are no versions of PG that have an extra
argument to nextval but still guarantee that the values of nextval()
are multiples of the increment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
