Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread James B. Byrne

On Thu, January 17, 2008 10:15, Scott Marlowe wrote:

 If race conditions are a possible issue, you use a sequence and
 increment that until you get a number that isn't used.  That way two
 clients connecting at the same time can get different, available
 numbers.


That is close to the idea that I originally had.  I was simply wondering
if the built-in sequencer could handle this case or whether I need to roll
my own.

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:[EMAIL PROTECTED]
Harte  Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread Scott Marlowe
On Jan 17, 2008 9:05 AM, James B. Byrne [EMAIL PROTECTED] wrote:

 If the entries involved numbered in the millions then Scott's approach has
 considerable merit.  In my case, as the rate of additions is very low and
 the size of the existing blocks is in the hundreds rather than hundreds of
 thousands then I believe that I will simply write my own iterator and do a
 repetitive select when on the incrementally proposed values until an
 opening is found then insert the new entry and update the iterator next
 value accordingly.

If race conditions are a possible issue, you use a sequence and
increment that until you get a number that isn't used.  That way two
clients connecting at the same time can get different, available
numbers.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread James B. Byrne

On Wed, January 16, 2008 18:40, Scott Marlowe wrote:
 You're essentially wanting to fill in the blanks here.  If you need
 good performance, then what you'll need to do is to preallocate all
 the numbers that haven't been assigned somewhere.  So, we make a table
 something like:

 create table locatorcodes (i int, count_id serial);

 Then we insert an id into that table for everyone that's missing from
 the main table:

 insert into locatorcodes (i)
 select b.i from (
 select * from generate_series(1,100)as i
 ) as b
 left join main_table a on (b.i=a.i)
 where a.i is null;

 Or something like that.

 Now, we've got a table with all the unused ids, and a serial count
 assigned to them.  Create another sequence:

 create checkout_sequence;

 and use that to check out numbers from locatorcodes:

 select i from locatorcodes where count_id=nextval('checkout_sequence');

 And since the sequence will just count up, there's little or no
 problems with performance.

 There's lots of ways of handling this.  That's just one of the ones
 that doesn't slow your database down a lot.

 If you need to, you can shuffle the numbers going into the
 locatorcodes table with an order by random() when you create it.


Martin and Scott,

Thank you both for your responses.

If the entries involved numbered in the millions then Scott's approach has
considerable merit.  In my case, as the rate of additions is very low and
the size of the existing blocks is in the hundreds rather than hundreds of
thousands then I believe that I will simply write my own iterator and do a
repetitive select when on the incrementally proposed values until an
opening is found then insert the new entry and update the iterator next
value accordingly.

Regards,

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:[EMAIL PROTECTED]
Harte  Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread Scott Marlowe
On Jan 17, 2008 9:19 AM, James B. Byrne [EMAIL PROTECTED] wrote:

 On Thu, January 17, 2008 10:15, Scott Marlowe wrote:
 
  If race conditions are a possible issue, you use a sequence and
  increment that until you get a number that isn't used.  That way two
  clients connecting at the same time can get different, available
  numbers.
 

 That is close to the idea that I originally had.  I was simply wondering
 if the built-in sequencer could handle this case or whether I need to roll
 my own.

Yeah, the built in sequencer just increments by one, nothing else.
But it should be pretty easy to write a pl/pgsql function that grabs
the next value and loop until it finds one that's available.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread Scott Marlowe
On Jan 17, 2008 9:19 AM, James B. Byrne [EMAIL PROTECTED] wrote:

 On Thu, January 17, 2008 10:15, Scott Marlowe wrote:
 
  If race conditions are a possible issue, you use a sequence and
  increment that until you get a number that isn't used.  That way two
  clients connecting at the same time can get different, available
  numbers.
 

 That is close to the idea that I originally had.  I was simply wondering
 if the built-in sequencer could handle this case or whether I need to roll
 my own.

Got bored, hacked this aggregious pl/pgsql routine up.  It looks
horrible, but I wanted it to be able to use indexes.  Seems to work.
Test has ~750k rows and returns in it and returns a new id in   1ms
on my little server.

File attached.
drop table a;
drop sequence aseq;
create table a (i integer primary key);
create sequence aseq;
insert into a(i) select * from generate_series(1,100) where random()  0.25;

create or replace function getnext() returns int as $$
DECLARE
	niq int;
	tf bool;
BEGIN
	loop
		select nextval('aseq') into niq;
		select case 
			when (select true from (select niq as i) as x join a on (a.i=x.i)) 
			then TRUE 
			else FALSE end into tf;
		exit when not tf ;
	end loop;
	return niq;
END;
$$ language plpgsql;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-17 Thread James B. Byrne

On Thu, January 17, 2008 11:48, Scott Marlowe wrote:

 Got bored, hacked this aggregious pl/pgsql routine up.  It looks
 horrible, but I wanted it to be able to use indexes.  Seems to work.
 Test has ~750k rows and returns in it and returns a new id in   1ms
 on my little server.

 File attached.


Many thanks.

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:[EMAIL PROTECTED]
Harte  Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-16 Thread James B. Byrne
I am prototyping a system migration that is to employ Ruby, Rails and
PostgreSQL.  Rails has the convention that the primary key of a row is an
arbitrary integer value assigned by the database manager through a
sequence.  As it turns out, the legacy application employs essentially the
same convention in most instances.

My question is this: Can one assign an id number to a sequenced key column
on create and override the sequencer?  If one does this then can and, if
so, how does the sequencer in Postgresql handle the eventuality of running
into a block of keys holding previously assigned numbers?

For example.  The existing client master dataset employs an eight digit
account number as primary key.  The values in use tend to cluster in
groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225,
3001..3312, ..., 2001001..2001476, ..., etc.

Assuming that these existing entries were all loaded into the new table
with the values given as their primary keys and given that one could not
simply start the sequencer at a value above the highest existing value: If
one was to add a record and auto-generate a sequence number then can the
sequencer handle looping from 1 to 375 and returning 376 and then continue
until it reaches 1001, when it needs be loop again until 1288 and then
return 1289 and so forth?

During the load of the initial table data it would probably be necessary
to disable the sequencer for this column. Is this in fact the case? If so,
how is this done and how is the sequencer restored after the initial
migration of data is complete?

I presume that I can write my own sequencer function to accomplish this in
any case but I wish to know if the existing method handles this case. 
Another solution is to simply decouple the existing key value from the new
and treat the exiting client number as a piece of data (with or without an
index), but that seems redundant since the new arbitrary key value might
just as well be the existing arbitrary key value.

Dispensing with the existing arbitrary number is another option but, the
fact is that present business practice is for employees to refer to their
clients and vendors by account number. The existing computer system is 25
years old but employs account numbers that predate automation. The firm is
well past the century mark and some of these numbers have been in use with
a few clients from the end of 1800's.  So, while not strictly a business
case, both the firm and some of its clients have a strong, if irrational,
attachment to preserving the existing scheme.

I regret if these questions appear naive but I am struggling with a lot of
new information on a number of fronts and as usual wish to get quick
answers to questions that may be far more involved than I realize.

Sincerely,

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:[EMAIL PROTECTED]
Harte  Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-16 Thread Martijn van Oosterhout
On Fri, Jan 11, 2008 at 11:43:54AM -0500, James B. Byrne wrote:
 My question is this: Can one assign an id number to a sequenced key column
 on create and override the sequencer?  If one does this then can and, if
 so, how does the sequencer in Postgresql handle the eventuality of running
 into a block of keys holding previously assigned numbers?

You can set the counter during create, or at any time later. However,
the counter is not defined by the column as such and will happily
return numbers already in the table if you screw it up.

The usual process is to insert normally when loading the data and then
do a setval() on the sequence to past the values already stored.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy


signature.asc
Description: Digital signature


Re: [GENERAL] Help with pre-loaded arbitrary key sequences

2008-01-16 Thread Scott Marlowe
On Jan 11, 2008 10:43 AM, James B. Byrne [EMAIL PROTECTED] wrote:
 I am prototyping a system migration that is to employ Ruby, Rails and
 PostgreSQL.  Rails has the convention that the primary key of a row is an
 arbitrary integer value assigned by the database manager through a
 sequence.  As it turns out, the legacy application employs essentially the
 same convention in most instances.

 My question is this: Can one assign an id number to a sequenced key column
 on create and override the sequencer?  If one does this then can and, if
 so, how does the sequencer in Postgresql handle the eventuality of running
 into a block of keys holding previously assigned numbers?

 For example.  The existing client master dataset employs an eight digit
 account number as primary key.  The values in use tend to cluster in
 groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225,
 3001..3312, ..., 2001001..2001476, ..., etc.

 Assuming that these existing entries were all loaded into the new table
 with the values given as their primary keys and given that one could not
 simply start the sequencer at a value above the highest existing value: If
 one was to add a record and auto-generate a sequence number then can the
 sequencer handle looping from 1 to 375 and returning 376 and then continue
 until it reaches 1001, when it needs be loop again until 1288 and then
 return 1289 and so forth?

You're essentially wanting to fill in the blanks here.  If you need
good performance, then what you'll need to do is to preallocate all
the numbers that haven't been assigned somewhere.  So, we make a table
something like:

create table locatorcodes (i int, count_id serial);

Then we insert an id into that table for everyone that's missing from
the main table:

insert into locatorcodes (i)
select b.i from (
select * from generate_series(1,100)as i
) as b
left join main_table a on (b.i=a.i)
where a.i is null;

Or something like that.

Now, we've got a table with all the unused ids, and a serial count
assigned to them.  Create another sequence:

create checkout_sequence;

and use that to check out numbers from locatorcodes:

select i from locatorcodes where count_id=nextval('checkout_sequence');

And since the sequence will just count up, there's little or no
problems with performance.

There's lots of ways of handling this.  That's just one of the ones
that doesn't slow your database down a lot.

If you need to, you can shuffle the numbers going into the
locatorcodes table with an order by random() when you create it.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly