[SQL] Cannot insert dup id in pk
Hello,

I sent this question yesterday morning, but it was not allowed through because I wasn't subscribed to the list. If it did make it through, I apologize for the dup.

I am seeing strange behavior when I try to do an insert: Postgres tells me that it cannot insert a duplicate key into a primary key index, even though I am not inserting into that column. Can anyone shed light on why this is happening, or point out the error of my ways? Here are the details:

wormbase=> \d fdata
                                    Table "public.fdata"
    Column     |          Type          |                        Modifiers
---------------+------------------------+---------------------------------------------------------
 fid           | integer                | not null default nextval('public.fdata_fid_seq'::text)
 fref          | character varying(100) | not null default ''
 fstart        | integer                | not null default '0'
 fstop         | integer                | not null default '0'
 fbin          | double precision       | not null default '0.00'
 ftypeid       | integer                | not null default '0'
 fscore        | double precision       |
 fstrand       | character varying(3)   |
 fphase        | character varying(3)   |
 gid           | integer                | not null default '0'
 ftarget_start | integer                |
 ftarget_stop  | integer                |
Indexes: pk_fdata primary key btree (fid),
         fdata_fref_idx btree (fref, fbin, fstart, fstop, ftypeid, gid),
         fdata_ftypeid_idx btree (ftypeid),
         fdata_gid_idx btree (gid)
Check constraints: "chk_fdata_fstrand" ((fstrand = '+'::character varying) OR (fstrand = '-'::character varying))
                   "chk_fdata_fphase" (((fphase = '0'::character varying) OR (fphase = '1'::character varying)) OR (fphase = '2'::character varying))

Now a chunk from my query log:

Jul 14 12:48:47 localhost postgres[2998]: [107-1] LOG:  query: INSERT INTO fdata (fref,fstart,fstop,fbin,ftypeid,fscore,fstrand,fphase,gid,ftarget_start,ftarget_stop)
Jul 14 12:48:47 localhost postgres[2998]: [107-2] VALUES ('XX',7327656,7327658,'1000.007327','41',NULL,'+',NULL,'12358',NULL,NULL)
Jul 14 12:48:47 localhost postgres[2998]: [108] ERROR:  Cannot insert a duplicate key into unique index pk_fdata

Note that I do not try to insert anything into fid, the primary key of this table. Why does Postgres think I am?

Thanks much,
Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
Re: [SQL] Cannot insert dup id in pk
On Tue, 2003-07-15 at 10:43, Dmitry Tkach wrote:
> You must have your sequence out of date with the content of the table
> (so that the next value in the sequence has already been inserted).
> One way to get into a situation like that is loading the table data with
> COPY (the input contains the pks, and the COPY command does not update
> the sequence, you have to do that manually after the copy is done).

Yes, this is exactly what happened. I had no idea that COPY didn't update the sequence. I suspect I've got users who are being bitten by this and don't realize it. I'll have to change my "bulk loading" script to update the sequence after the load is done.

Thanks much,
Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
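P.S. For anyone else bitten by this, a minimal sketch of the post-load fix (using the table and sequence names from my earlier message; it assumes the table is not empty):

    -- after the COPY finishes, advance the sequence past the highest loaded fid
    SELECT setval('public.fdata_fid_seq', (SELECT max(fid) FROM fdata));

Subsequent nextval() calls then start above the current maximum, so inserts that rely on the column default stop colliding with the rows loaded by COPY.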
[SQL] OR vs UNION
Hello,

I have a query that uses a series of ORs, and I have heard that this type of query can sometimes be rewritten to use UNION instead and be more efficient. Are there any rules of thumb for when that might be the case? As an example, here is a query of the type I am discussing:

select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
   (f.type_id = 219 OR
    f.type_id = 368 OR
    f.type_id = 514 OR
    f.type_id = 475 OR
    f.type_id = 426 OR
    f.type_id = 456 OR
    f.type_id = 461 OR
    f.type_id = 553 OR
    f.type_id = 89) and
   fl.srcfeature_id = 1 and
   f.feature_id = fl.feature_id and
   fl.fmin <= 2491413 and fl.fmax >= 2485521

This could easily be rewritten as a set of select statements, one per type_id, unioned together. For this particular query, explain analyze indicates that the OR form shown above is the more efficient one, but I could easily see that at other times, or for other parameters, the unioned version would be better. Are there any guidelines for this?

Thanks,
Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
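P.S. For reference, the UNION rewrite I have in mind would look something like this (a sketch only; it repeats the join and range conditions in each branch):

    SELECT f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
    FROM feature f, featureloc fl
    WHERE f.type_id = 219
      AND fl.srcfeature_id = 1
      AND f.feature_id = fl.feature_id
      AND fl.fmin <= 2491413 AND fl.fmax >= 2485521
    UNION
    SELECT f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
    FROM feature f, featureloc fl
    WHERE f.type_id = 368
      AND fl.srcfeature_id = 1
      AND f.feature_id = fl.feature_id
      AND fl.fmin <= 2491413 AND fl.fmax >= 2485521

with one additional UNION branch for each remaining type_id. Since UNION removes duplicate rows across the whole result, the DISTINCT from the original query is not needed in this form.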
Re: [SQL] OR vs UNION
On Thu, 2003-07-17 at 15:00, Josh Berkus wrote:
> Scott,
>
> > I have a query that uses a series of ORs and I have heard that sometimes
> > this type of query can be rewritten to use UNION instead and be more
> > efficient.
>
> I'd be interested to know where you heard that; as far as I know, it could
> only apply to conditional left outer joins.

Hmm, I don't know for sure where I heard it. However, I do know from previous experience that unioned queries worked better for a somewhat similar query, though that was against a different schema, so it is hard to compare directly.

One way in which I thought it might make a difference is if I build partial indexes on feature_id for each of the type_ids of interest (there are several thousand type_ids in the database, but only 15 or 20 that I am interested in querying). That way, when I write a separate query for each type_id, the planner would have access to the partial index for that type, and might be able to complete each individual query very quickly.

> > select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
> > from feature f, featureloc fl
> > where
> >    (f.type_id = 219 OR
> >     f.type_id = 368 OR
> >     f.type_id = 514 OR
> >     f.type_id = 475 OR
> >     f.type_id = 426 OR
> >     f.type_id = 456 OR
> >     f.type_id = 461 OR
> >     f.type_id = 553 OR
> >     f.type_id = 89) and
> >    fl.srcfeature_id = 1 and
> >    f.feature_id = fl.feature_id and
> >    fl.fmin <= 2491413 and fl.fmax >= 2485521
>
> Certainly a query of the above form would not benefit from being a union.
>
> For readability, you could use an IN() statement rather than a bunch of ORs
> ... this would not help performance, but would make your query easier to
> type/read.

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
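P.S. To make the partial-index idea concrete, a sketch of what I have in mind (the index names are just illustrative):

    -- one small partial index per type_id of interest
    CREATE INDEX feature_type_219_idx ON feature (feature_id) WHERE type_id = 219;
    CREATE INDEX feature_type_368_idx ON feature (feature_id) WHERE type_id = 368;
    -- ... and so on for the other 15 or 20 type_ids I query regularly

Each single-type_id SELECT in a unioned query could then use the matching partial index, at the cost of maintaining one extra index per type_id of interest.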
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
Hello,

Note: there is a SQL question way at the bottom of this narrative :-)

Last week I asked about doing substring operations on very long strings (>10 million characters). I was given a suggestion to use EXTERNAL storage on the column via the ALTER TABLE ... SET STORAGE command. In one test case, the performance of substring actually got worse using EXTERNAL storage.

In an effort to find the best way to do this operation, I decided to look at what is my "worst case" scenario: the DNA sequence for human chromosome 1, which is about 250 million characters long (the previous strings were about 20 million characters long). I wrote a perl script to do several substring operations over this very long string, with substring lengths varying between 1,000 and 40,000 characters spread out over various locations along the string. While EXTENDED storage won in this case, it was a hollow victory: 38 seconds per operation versus 40 seconds, both of which are way too long for an interactive application.

Time for a new method. A suggestion from my boss was to "shred" the DNA into smallish chunks, with a column giving each chunk's offset from the beginning of the string, so that the original can be reassembled when needed. Here is the test table:

string=> \d dna
      Table "public.dna"
 Column  |  Type   | Modifiers
---------+---------+-----------
 foffset | integer |
 pdna    | text    |
Indexes: foffset_idx btree (foffset)

In practice, there would also be a foreign key column giving the identifier of the dna. Then I wrote the following function (here's the SQL part promised above):

CREATE OR REPLACE FUNCTION dna_string (integer, integer) RETURNS TEXT AS '
DECLARE
    smin            ALIAS FOR $1;
    smax            ALIAS FOR $2;
    longdna         TEXT := '''';
    dna_row         dna%ROWTYPE;
    dnastring       TEXT;
    firstchunk      INTEGER;
    lastchunk       INTEGER;
    in_longdnastart INTEGER;
    in_longdnalen   INTEGER;
    chunksize       INTEGER;
BEGIN
    -- infer the chunk size from the smallest non-zero offset in the table
    SELECT INTO chunksize min(foffset) FROM dna WHERE foffset > 0;

    -- offsets of the first and last chunks that overlap the requested range
    firstchunk := chunksize * (smin / chunksize);
    lastchunk  := chunksize * (smax / chunksize);

    -- where the requested substring starts within the reassembled string,
    -- and how long it is
    in_longdnastart := smin % chunksize;
    in_longdnalen   := smax - smin + 1;

    -- stitch the relevant chunks back together in order
    FOR dna_row IN
        SELECT * FROM dna
        WHERE foffset >= firstchunk AND foffset <= lastchunk
        ORDER BY foffset
    LOOP
        longdna := longdna || dna_row.pdna;
    END LOOP;

    dnastring := substring(longdna FROM in_longdnastart FOR in_longdnalen);

    RETURN dnastring;
END;
' LANGUAGE 'plpgsql';

So here's the question: I've never written a plpgsql function before, so I don't have much experience with it. Is there anything obviously wrong with this function, or are there things that could be done better? At least it appears to work and is much faster, completing substring operations like the ones above in about 0.27 seconds (that's about two orders of magnitude improvement!).

Thanks,
Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
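P.S. In case it is useful for anyone trying this at home, a quick way to exercise the function is to compare it against a plain substring() on an unshredded copy of the same sequence (the coordinates are arbitrary, and "unshredded_dna"/"longdna_copy" are hypothetical names for a table and column still holding the full-length string):

    -- a 5,000 character slice from the shredded table
    SELECT dna_string(1000001, 1005000);

    -- the same slice from an unshredded copy, for comparison
    SELECT substring(longdna_copy FROM 1000001 FOR 5000) FROM unshredded_dna;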
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On Mon, 2003-08-04 at 11:55, Richard Huxton wrote:
> On Monday 04 August 2003 16:25, Scott Cain wrote:
> [snip]
>
> [snip]
>
> You might want some checks to make sure that smin < smax, otherwise looks
> like it does the job in a good clean fashion.

Good point -- smin is generally less than smax by virtue of the application using the database, but I shouldn't assume that will always be the case.

> Glad to hear it's going to solve your problems. Two things you might want to
> bear in mind:
> 1. There's probably a "sweet spot" where the chunk size interacts well with
> your data, usage patterns and PGs backend to give you peak performance.
> You'll have to test.

Yes, I had a feeling that was probably the case. Since this is an open source project, I will need to write directions for installers on picking a reasonable chunk size.

> 2. If you want to search for a sequence you'll need to deal with the case
> where it starts in one chunk and ends in another.

I forgot about searching -- I suspect that application is why I faced opposition to shredding in my schema development group. Maybe I should push that off to the file system and use grep (or BLAST). Otherwise, I could write a function that searches the chunks first and then, after failing to find the substring in any single chunk, starts sewing the chunks together to look for the query string. That could get ugly (and slow -- but if the user knows that and expects it to be slow, I'm OK with that).

Thanks,
Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
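P.S. For the "sewing chunks together" idea, a very rough and untested sketch of the kind of query I have in mind (it assumes a chunk size of 2000 and an arbitrary search pattern, and it only handles patterns no longer than one chunk, since such a pattern can span at most one chunk boundary):

    -- concatenate each chunk with the next one so matches that span
    -- a chunk boundary are not missed
    SELECT d1.foffset
    FROM dna d1, dna d2
    WHERE d2.foffset = d1.foffset + 2000
      AND (d1.pdna || d2.pdna) LIKE '%ACGTACGTACGT%';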
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > At least this appears to work and is much faster, completing substring
> > operations like above in about 0.27 secs (that's about two orders of
> > magnitude improvement!)
>
> I find it really, really hard to believe that a crude reimplementation
> in plpgsql of the TOAST concept could beat the built-in implementation
> at all, let alone beat it by two orders of magnitude.
>
> Either there's something unrealistic about your testing of the
> dna_string function, or your original tests are not causing TOAST to be
> invoked in the expected way, or there's a bug we need to fix. I'd
> really like to see some profiling of the poor-performing
> external-storage case, so we can figure out what's going on.

I was really hoping for a "Good job and glad to hear it" from you :-)

I don't think there is anything unrealistic about my function or its testing, as it is very much along the lines of the types of things we do now. I will really try to do some profiling this week to help figure out what is going on.

Scott

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
Joe,

Good idea, since I may not get around to profiling it this week. I created a dump of the data set I was working with. It is available at http://www.gmod.org/string_dump.bz2

Thanks,
Scott

On Mon, 2003-08-04 at 16:29, Joe Conway wrote:
> Is there a sample table schema and dataset available (external-storage
> case) that we can play with?
>
> Joe

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is 109M.

Scott

On Tue, 2003-08-05 at 11:01, Scott Cain wrote:
> Joe,
>
> Good idea, since I may not get around to profiling it this week. I
> created a dump of the data set I was working with. It is available at
> http://www.gmod.org/string_dump.bz2
>
> Thanks,
> Scott
>
> On Mon, 2003-08-04 at 16:29, Joe Conway wrote:
> > Is there a sample table schema and dataset available (external-storage
> > case) that we can play with?
> >
> > Joe

--
Scott Cain, Ph.D.                          [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/)    216-392-3087
Cold Spring Harbor Laboratory
[SQL] Writing a generic delete rule for a view
Hello,

I have a relatively straightforward setup with two schemas: the main public schema and a private schema (called "testschema" here). The tables in the two schemas are nearly identical: for each table "foo" in the public schema, there is a table "testschema._foo" and a view "testschema.foo" in the private schema, where testschema.foo is a simple union of the public and private tables. The goal is that querying in the private schema looks just like querying the public schema, except that each table appears to contain the union of the private and public rows.

Now I would like inserts, updates and deletes on the private views to perform the same actions on the private tables (that is, an insert into testschema.foo should really insert into testschema._foo). Writing rules to handle the updates and inserts was fairly straightforward. What I don't understand is how to write a general rule for handling deletes on testschema.foo. What I would like is a way to "grab" the WHERE clause of the original DELETE and append it to the DO INSTEAD of the rule. I'm thinking something like this:

CREATE OR REPLACE RULE testschema_delete_foo AS
    ON DELETE TO testschema.foo
    DO INSTEAD
    (DELETE FROM testschema._foo WHERE $where);

So, what I want to know is: is there a "magic" way to get that $where?

Thanks,
Scott

--
Scott Cain, Ph.D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)        216-392-3087
Ontario Institute for Cancer Research
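P.S. To make the setup concrete, here is a stripped-down sketch of the layout described above (the table and column names are just illustrative), including the kind of insert rule that was straightforward to write:

    CREATE TABLE public.foo (
        foo_id serial PRIMARY KEY,
        name   text
    );

    CREATE SCHEMA testschema;

    CREATE TABLE testschema._foo (
        foo_id serial PRIMARY KEY,
        name   text
    );

    -- the private view: public rows plus private rows
    CREATE VIEW testschema.foo AS
        SELECT foo_id, name FROM public.foo
        UNION
        SELECT foo_id, name FROM testschema._foo;

    -- inserts against the view are redirected to the private table
    CREATE RULE testschema_insert_foo AS
        ON INSERT TO testschema.foo
        DO INSTEAD
        INSERT INTO testschema._foo (name) VALUES (NEW.name);

The delete rule is the piece I am missing: unlike NEW in the insert rule above, I don't see an obvious placeholder for the original statement's WHERE clause.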