[SQL] Cannot insert dup id in pk

2003-07-15 Thread Scott Cain
Hello,

I sent this question yesterday morning, but it was not allowed because I
wasn't subscribed to the list.  If it did make it through, I appologize
for the dup.

I am having strange behavior when I try to do an insert.  Postgres tells
me that it cannot insert a duplicate key into a primary key index, when
I am not trying to insert into it.  Can anyone shed light on why this is
happening, or point out the error of my ways?

Here are the details:
wormbase=> \d fdata
  Table "public.fdata"
Column |  Type  |   Modifiers  
   
---++--- 
-
 fid   | integer| not null default nextval('public.fdata 
_fid_seq'::text)
 fref  | character varying(100) | not null default ''
 fstart| integer| not null default '0'
 fstop | integer| not null default '0'
 fbin  | double precision   | not null default '0.00'
 ftypeid   | integer| not null default '0'
 fscore| double precision   |
 fstrand   | character varying(3)   |
 fphase| character varying(3)   |
 gid   | integer| not null default '0'
 ftarget_start | integer|
 ftarget_stop  | integer|
Indexes: pk_fdata primary key btree (fid),
 fdata_fref_idx btree (fref, fbin, fstart, fstop, ftypeid, gid),
 fdata_ftypeid_idx btree (ftypeid),
 fdata_gid_idx btree (gid)
Check constraints: "chk_fdata_fstrand" ((fstrand = '+'::character varying) OR (f 
strand = '-'::character varying))
   "chk_fdata_fphase" (((fphase = '0'::character varying) OR (fp hase 
= '1'::character varying)) OR (fphase = '2'::character varying))

Now a chunk from my query log:
Jul 14 12:48:47 localhost postgres[2998]: [107-1] LOG:  query: INSERT INTO fdata 
(fref,fstart,fstop,fbin,ftypeid,fscore,fstrand,fphase,gid,ftarget_start,ftarget_stop)
Jul 14 12:48:47 localhost postgres[2998]: [107-2] 
VALUES('XX',7327656,7327658,'1000.007327','41',NULL,'+',NULL,'12358',NULL,NULL)
Jul 14 12:48:47 localhost postgres[2998]: [108] ERROR:  Cannot insert a duplicate key 
into unique index pk_fdata

Note that I do not try to insert anything into fid, the primary key on
this table.  Why does Postgres think I am?

Thanks much,
Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Cannot insert dup id in pk

2003-07-15 Thread Scott Cain
On Tue, 2003-07-15 at 10:43, Dmitry Tkach wrote:
> You must have your sequence out of date with the content of the table 
> (so that the next value in the sequence has already been inserted).
> One way to get into a situation like that is loading the table data with 
> COPY (the input contains the pks, and the COPY command does not update 
> the sequence, you have to do that manually after the copy is done).

Yes, this is exactly what happened.  I had no idea that copy didn't
update the sequence.  I suspect I've got users who are being bitten by
this and don't realize it.  I'll have to change my "bulk loading" script
to update the sequence after the load is done.

Thanks much,
Scott

-- 
----
Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[SQL] OR vs UNION

2003-07-17 Thread Scott Cain
Hello,

I have a query that uses a series of ORs and I have heard that sometimes
this type of query can be rewritten to use UNION instead and be more
efficient.  Are there any rules of thumb for when this might be the
case?  As an example here is a query of the type I am discussing:

 select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
 from feature f, featureloc fl
 where
   (f.type_id = 219 OR
f.type_id = 368 OR
f.type_id = 514 OR
f.type_id = 475 OR
f.type_id = 426 OR
f.type_id = 456 OR
f.type_id = 461 OR
f.type_id = 553 OR
f.type_id = 89) and
  fl.srcfeature_id = 1 and
  f.feature_id  = fl.feature_id and
  fl.fmin <= 2491413 and fl.fmax >= 2485521

which could easily be rewritten as a set of select statements for each
type_id and then union them together.  For this particular query,
explain analyze indicates that this is the more efficient form, but I
could easily see that at other times/for other parameters, a set unioned
together would be better.  Are there any guidelines for this?

Thanks,
Scott

-- 
----
Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] OR vs UNION

2003-07-17 Thread Scott Cain
On Thu, 2003-07-17 at 15:00, Josh Berkus wrote:
> Scott,
> 
> > I have a query that uses a series of ORs and I have heard that sometimes
> > this type of query can be rewritten to use UNION instead and be more
> > efficient.  
> 
> I'd be interested to know where you heard that; as far as I know, it could 
> only apply to conditional left outer joins.

Hmm, don't know for sure where I heard it, however I do know from
previous experience that unioned queries worked better in a somewhat
similar query, though it was a different schema, so it is hard to
compare directly.

One way in which I thought it might make a difference is if I build
partial indexes on feature_id for each of the type_ids of interest
(there are several thousand in the database, but only 15 or 20 that I am
interested in querying).  That way, when I write the separate queries
for each type_id, the query planner would have access to the partial
indexes for each type, and therefore may be able to complete the
individual queries very quickly.
> 
> >  select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
> >  from feature f, featureloc fl
> >  where
> >(f.type_id = 219 OR
> > f.type_id = 368 OR
> > f.type_id = 514 OR
> > f.type_id = 475 OR
> > f.type_id = 426 OR
> > f.type_id = 456 OR
> > f.type_id = 461 OR
> > f.type_id = 553 OR
> > f.type_id = 89) and
> >   fl.srcfeature_id = 1 and
> >   f.feature_id  = fl.feature_id and
> >   fl.fmin <= 2491413 and fl.fmax >= 2485521
> 
> Certainly a query of the above form would not benefit from being a union.
> 
> For readability, you could use an IN() statement rather than a bunch of ORs 
> ... this would not help performance, but would make your query easier to 
> type/read. 
-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[SQL] Cannot insert dup id in pk

2003-07-19 Thread Scott Cain
Hello,

I am having strange behavior when I try to do an insert.  Postgres tells
me that it cannot insert a duplicate key into a primary key index, when
I am not trying to insert into it.  Can anyone shed light on why this is
happen, or point out the error of my ways?

Here are the details:
wormbase=> \d fdata
  Table "public.fdata"
Column |  Type  |   Modifiers  
   
---++--- 
-
 fid   | integer| not null default nextval('public.fdata 
_fid_seq'::text)
 fref  | character varying(100) | not null default ''
 fstart| integer| not null default '0'
 fstop | integer| not null default '0'
 fbin  | double precision   | not null default '0.00'
 ftypeid   | integer| not null default '0'
 fscore| double precision   |
 fstrand   | character varying(3)   |
 fphase| character varying(3)   |
 gid   | integer| not null default '0'
 ftarget_start | integer|
 ftarget_stop  | integer|
Indexes: pk_fdata primary key btree (fid),
 fdata_fref_idx btree (fref, fbin, fstart, fstop, ftypeid, gid),
 fdata_ftypeid_idx btree (ftypeid),
 fdata_gid_idx btree (gid)
Check constraints: "chk_fdata_fstrand" ((fstrand = '+'::character varying) OR (f 
strand = '-'::character varying))
   "chk_fdata_fphase" (((fphase = '0'::character varying) OR (fp hase 
= '1'::character varying)) OR (fphase = '2'::character varying))

Now a chunk from my query log:
Jul 14 12:48:47 localhost postgres[2998]: [107-1] LOG:  query: INSERT INTO fdata 
(fref,fstart,fstop,fbin,ftypeid,fscore,fstrand,fphase,gid,ftarget_start,ftarget_stop)
Jul 14 12:48:47 localhost postgres[2998]: [107-2] 
VALUES('XX',7327656,7327658,'1000.007327','41',NULL,'+',NULL,'12358',NULL,NULL)
Jul 14 12:48:47 localhost postgres[2998]: [108] ERROR:  Cannot insert a duplicate key 
into unique index pk_fdata

Note that I do not try to insert anything into fid, the primary key on
this table.  Why does Postgres think I am?

Thanks much,
Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
Hello,

Note: there is a SQL question way at the bottom of this narrative :-)

Last week I asked about doing substring operations on very long strings
(>10 million characters).  I was given a suggestion to use EXTERNAL
storage on the column via the ALTER TABLE ... SET STORAGE command.  In
one test case, the performance of substring actually got worse using
EXTERNAL storage.  

In an effort to find the best way to do this operation, I decided to
look at what is my "worst case" scenario: the DNA sequence for human
chromosome 1, which is about 250 million characters long (previous
strings where about 20 million characters long).  I wrote a perl script
to do several substring operations over this very long string, with
substring lengths varying between 1000 and 40,000 characters spread out
over various locations along the string.  While EXTENDED storage won in
this case, it was a hollow victory: 38 seconds per operation versus 40
seconds, both of which are way too long to for an interactive
application.

Time for a new method.  A suggestion from my boss was to "shred" the DNA
into smallish chunks and a column giving offsets from the beginning of
the string, so that it can be reassembled when needed. Here is the test
table:

string=> \d dna
  Table "public.dna"
 Column  |  Type   | Modifiers
-+-+---
 foffset | integer |
 pdna| text|
Indexes: foffset_idx btree (foffset)

In practice, there would also be a foreign key column to give the
identifier of the dna.  Then I wrote the following function (here's the
SQL part promised above):

CREATE OR REPLACE FUNCTION dna_string (integer, integer) RETURNS TEXT AS '
DECLARE
smin ALIAS FOR $1;
smax ALIAS FOR $2;
longdna TEXT := '''';
dna_row dna%ROWTYPE;
dnastring   TEXT;
firstchunk  INTEGER;
lastchunk   INTEGER;
in_longdnastart INTEGER;
in_longdnalen   INTEGER;
chunksize   INTEGER;
BEGIN
SELECT INTO chunksize min(foffset) FROM dna WHERE foffset>0;
firstchunk :=  chunksize*(smin/chunksize);
lastchunk  :=  chunksize*(smax/chunksize);

in_longdnastart := smin % chunksize;
in_longdnalen   := smax - smin + 1;

FOR dna_row IN
SELECT * FROM dna
WHERE foffset >= firstchunk AND foffset <= lastchunk
ORDER BY foffset
LOOP

longdna := longdna || dna_row.pdna;
END LOOP;

dnastring := substring(longdna FROM in_longdnastart FOR in_longdnalen);

RETURN dnastring;
END;
' LANGUAGE 'plpgsql';

So here's the question: I've never written a plpgsql function before, so
I don't have much experience with it; is there anything obviously wrong
with this function, or are there things that could be done better?  At
least this appears to work and is much faster, completing substring
operations like above in about 0.27 secs (that's about two orders of
magnitude improvement!)

Thanks,
Scott


-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
On Mon, 2003-08-04 at 11:55, Richard Huxton wrote:
> On Monday 04 August 2003 16:25, Scott Cain wrote:
> [snip]
> > [snip]
> 
> You might want some checks to make sure that smin < smax, otherwise looks like 
> it does the job in a good clean fashion.

Good point--smin < smax generally by virtue of the application using the
database, but I shouldn't assume that will always be the case.
> 
> Glad to hear it's going to solve your problems. Two things you might want to 
> bear in mind:
> 1. There's probably a "sweet spot" where the chunk size interacts well with 
> your data, usage patterns and PGs backend to give you peak performance. 
> You'll have to test.

Yes, I had a feeling that was probably the case-- since this is an open
source project, I will need to write directions for installers on
picking a reasonable chunk size.

> 2. If you want to search for a sequence you'll need to deal with the case 
> where it starts in one chunk and ends in another.

I forgot about searching--I suspect that application is why I faced
opposition for shredding in my schema development group.  Maybe I should
push that off to the file system and use grep (or BLAST).  Otherwise, I
could write a function that would search the chunks first, then after
failing to find the substring in those, I could start sewing the chunks
together to look for the query string.  That could get ugly (and
slow--but if the user knows that and expects it to be slow, I'm ok with
that).

Thanks,
Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > At least this appears to work and is much faster, completing substring
> > operations like above in about 0.27 secs (that's about two orders of
> > magnitude improvement!)
> 
> I find it really, really hard to believe that a crude reimplementation
> in plpgsql of the TOAST concept could beat the built-in implementation
> at all, let alone beat it by two orders of magnitude.
> 
> Either there's something unrealistic about your testing of the
> dna_string function, or your original tests are not causing TOAST to be
> invoked in the expected way, or there's a bug we need to fix.  I'd
> really like to see some profiling of the poor-performing
> external-storage case, so we can figure out what's going on.
> 
I was really hoping for a "Good job and glad to hear it" from you :-)

I don't think there is anything unrealistic about my function or its
testing, as it is very much along the lines of the types of things we do
now.  I will really try to do some profiling this week to help figure
out what is going on.

Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-06 Thread Scott Cain
Joe,

Good idea, since I may not get around to profiling it this week.  I
created a dump of the data set I was working with.  It is available at
http://www.gmod.org/string_dump.bz2

Thanks,
Scott


On Mon, 2003-08-04 at 16:29, Joe Conway wrote:
> Is there a sample table schema and dataset available (external-storage 
> case) that we can play with?
> 
> Joe
-- 
----
Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-09 Thread Scott Cain
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is
109M.

Scott

On Tue, 2003-08-05 at 11:01, Scott Cain wrote:
> Joe,
> 
> Good idea, since I may not get around to profiling it this week.  I
> created a dump of the data set I was working with.  It is available at
> http://www.gmod.org/string_dump.bz2
> 
> Thanks,
> Scott
> 
> 
> On Mon, 2003-08-04 at 16:29, Joe Conway wrote:
> > Is there a sample table schema and dataset available (external-storage 
> > case) that we can play with?
> > 
> > Joe
-- 
--------
Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[SQL] Writing a generic delete rule for a view

2009-03-05 Thread Scott Cain
Hello,

I have a relatively straight forward set up with two schemas, the main
public schema and a private schema (called "testschema" here).  The
tables in both schemas are nearly identical: for each table "foo" in
the public schema, there is a table "testschema._foo" and a view
"testschema.foo" in the private schema, where testschema.foo is a
simple union of the public and private tables, to make it appear when
querying in the private schema that the tables are exactly the same as
the public schema, and that the tables have a union of what is in the
private and public schemas.

Now, I would like to allow insert, update and delete rules on the
private views to do the same actions in private tables (that is, an
insert into testschema.foo should really do an insert into
testschema._foo).  Writing rules to do the update and inserts was
fairly straight forward.  What I don't understand how to do is a
general rule for doing deletes on testschema._foo.  What I would like
would be a way to "grab" the where clause and append it onto the DO
INSTEAD for the rule.  I'm thinking something like this:

  CREATE OR REPLACE RULE testschema_delete_foo
 AS ON DELETE TO testschema.foo
 DO INSTEAD (DELETE FROM testschema._foo WHERE $where);

So, what I want to know is, is there a "magic" way to get that $where?

Thanks,
Scott


-- 
--------
Scott Cain, Ph. D.   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql