[GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Rich Shepard

  I've a table (from a client, not created here) with a column that should
be the primary key, but not all rows have a value for this attribute. The
column format is VARCHAR(12) and has a variety of values, such as 96-A000672
and 9612-0881 (probably assigned by different analytical laboratories).

  A simple sequence of numbers would do the job of replacing NULL values.
What is the most parsimonious way to replace NULLs with unique values for
this column? I also need to add such values for a new set of data that I'm
in the process of translating from spreadsheet format to the table
structure.

Rich


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Chris Travers
On Thu, Aug 11, 2011 at 11:47 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  I've a table (from a client, not created here) with a column that should
 be the primary key, but not all rows have a value for this attribute. The
 column format is VARCHAR(12) and has a variety of values, such as 96-A000672
 and 9612-0881 (probably assigned by different analytical laboratories).

  A simple sequence of numbers would do the job of replacing NULL values.
 What is the most parsimonious way to replace NULLs with unique values for
 this column? I also need to add such values for a new set of data that I'm
 in the process of translating from spreadsheet format to the table
 structure.

The simplest seems to me to be a sequence and use nextval() to
populate the null values.  The major advantage would be that the
sequence could stay around in case you need it again.  So for example:

create sequence my_varchar_values;

UPDATE my_table set my_varchar =
nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;

You could also use windowing functions to get rid of the sequence, but
the queries become a lot more complicated.  For example, see
http://stackoverflow.com/questions/4358613/using-window-functions-in-an-update-statement

Best Wishes,
Chris Travers

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Rich Shepard

On Thu, 11 Aug 2011, Chris Travers wrote:


The simplest seems to me to be a sequence and use nextval() to populate
the null values. The major advantage would be that the sequence could stay
around in case you need it again. So for example:

create sequence my_varchar_values;



UPDATE my_table set my_varchar =
nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;


Chris,

  I was wondering if this was the best approach since I have new data to add
to the table. Don't need a starting value, eh?

Many thanks,

Rich

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Chris Travers
On Thu, Aug 11, 2011 at 12:34 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
 On Thu, 11 Aug 2011, Chris Travers wrote:

 The simplest seems to me to be a sequence and use nextval() to populate
 the null values. The major advantage would be that the sequence could stay
 around in case you need it again. So for example:

 create sequence my_varchar_values;

 UPDATE my_table set my_varchar =
 nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;

 Chris,

  I was wondering if this was the best approach since I have new data to add
 to the table. Don't need a starting value, eh?


TBH, it's the approach I would use.  It creates one additional
database object but the queries are simpler and thus more
maintainable.

Best Wishes,
Chris Travers

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread David Johnston
On Aug 11, 2011, at 15:08, Chris Travers chris.trav...@gmail.com wrote:

 On Thu, Aug 11, 2011 at 11:47 AM, Rich Shepard rshep...@appl-ecosys.com 
 wrote:
  I've a table (from a client, not created here) with a column that should
 be the primary key, but not all rows have a value for this attribute. The
 column format is VARCHAR(12) and has a variety of values, such as 96-A000672
 and 9612-0881 (probably assigned by different analytical laboratories).
 
  A simple sequence of numbers would do the job of replacing NULL values.
 What is the most parsimonious way to replace NULLs with unique values for
 this column? I also need to add such values for a new set of data that I'm
 in the process of translating from spreadsheet format to the table
 structure.
 

The technical aspect is covered but consider using one or two characters as a 
prefix related to the data source.  You've already taken the hit for using a 
text data type so you might as well take advantage of it.  Even if you have a 
source field this can be useful.  The only costraint is you limit the number of 
sequence values you can use (per source).

If you use the sequence you can always reset it between imports.

David J.
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Steve Atkins

On Aug 11, 2011, at 12:34 PM, Rich Shepard wrote:

 On Thu, 11 Aug 2011, Chris Travers wrote:
 
 The simplest seems to me to be a sequence and use nextval() to populate
 the null values. The major advantage would be that the sequence could stay
 around in case you need it again. So for example:
 
 create sequence my_varchar_values;
 
 UPDATE my_table set my_varchar =
 nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;
 
 Chris,
 
  I was wondering if this was the best approach since I have new data to add
 to the table. Don't need a starting value, eh?

This will fail if any of the existing values are integers in the range that
you're inserting - and it may fail in the future, as you add new records
if they clash with existing entries.

It's still a good way to go, but might need some care or some tweaking -
adding a prefix, maybe.

Cheers,
  Steve


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Filling Missing Primary Key Values

2011-08-11 Thread Rich Shepard

On Thu, 11 Aug 2011, Steve Atkins wrote:


This will fail if any of the existing values are integers in the range that
you're inserting - and it may fail in the future, as you add new records
if they clash with existing entries.


Steve/Chris/Dave:

  I had not looked in deatil at that column before. Having just done this, I
see that it's really a laboratory number, not a unique sample ID. So, I
renamed sample_id to lab_nbr, added a sample_id column, created the sequence
sample_id_seq, updated the table with it, then added the constraint that
sample_id is the primary key.

  Thank you all very much!

Rich

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general