[SQL] NOT and AND problem

2003-07-17 Thread Richard Jones
Dear All,

I am having some confusion over a query which is supposed to achieve the
following:  To remove a record from a table if the one or both of the
columns containing references to other tables no longer point to table rows
which still exist.  There are good reasons why I cannot use foreign keys to
maintain referential integrity, but I will not go into them, but they give
rise to the need to "clean-up" my database table from time to time.  The
query that I have had most success with looks like this:

DELETE FROM myTable
WHERE (NOT myTable.item_id = item.item_id)
AND (NOT myTable.group_id = ep.group_id);

Which is odd, because logically it shouldn't work.  What I find with the
above queries is that as follows:

let myTable.item_id = item.item_id be A
let myTable.group_id = ep.group_id be B

The derived and actual truth tables for the results of the where clause
follow:

Derived:
A | B | Result
1 | 1 | 0
1 | 0 | 0
0 | 1 | 0
0 | 0 | 1

Actual:
A | B | Result
1 | 1 | 0
1 | 0 | 0
0 | 1 | 1
0 | 0 | 1

This makes no sense to me, as effectively rows 2 and 3 of the Actual results
truth table are the same (unless there's some subtle difference with regards
to the order of the statements, otherwise just substitute A for B and vice
versa).

The result that I actually want from the operation is this:

A | B | Result
1 | 1 | 0
1 | 0 | 1
0 | 1 | 1
0 | 0 | 1

which would suggest a query like:

DELETE FROM myTable
WHERE NOT (myTable.item_id = item.item_id AND myTable.group_id =
ep.group_id);

which ought to provide the above output.  Instead, using this query, the
output I get is as follows:

A | B | Result
1 | 1 | 1
1 | 0 | 1
0 | 1 | 1
0 | 0 | 1

I can only conclude that Postgres is doing something with regards to the
other two tables which I am unaware of.  Can anyone help me understand what
is going on?  Any suggestions gratefully received.

Cheers

Richard


Richard Jones
---
Systems Developer
Theses Alive! - www.thesesalive.ac.uk
Edinburgh University Library
[EMAIL PROTECTED]
0131 651 1611


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] NOT and AND problem

2003-07-17 Thread Richard Jones
Hello,

Just to say thanks for such fast responses.  The full working query is
indeed as suggested (I cut the demo query down when I posted it, as it's got
some awfully long names in it in real life):

DELETE FROM eulepersongroup2workspaceitem
WHERE NOT EXISTS (
 SELECT 1 FROM workspaceitem WHERE workspace_item_id =
eulepersongroup2workspaceitem.workspace_item_id
) OR NOT EXISTS (
 SELECT 1 FROM epersongroup WHERE eperson_group_id =
eulepersongroup2workspaceitem.eperson_group_id
);

Thanks very much for your help.

Regards

Richard


Richard Jones
---
Systems Developer
Theses Alive! - www.thesesalive.ac.uk
Edinburgh University Library
[EMAIL PROTECTED]
0131 651 1611


---(end of broadcast)---
TIP 8: explain analyze is your friend


[SQL] Referential integrity (foreign keys) across multiple tables

2006-07-22 Thread Richard Jones

Simplified schema:

  create table hosts (
id serial primary key,
hostname text not null
  );

  create table pages (
id serial primary key,
hostid int not null references hosts (id),
url text not null,
unique (hostid, url)
  );

  create table page_contents (
pageid int not null references pages (id),
section text not null
  );

(There are many hosts, many pages per host, and many page_contents
sections per page).

Now I want to add a column to page_contents, say called link_name,
which is going to reference the pages.url column for the particular
host that this page belongs to.

Something like:

  alter table page_contents add link_name text;
  alter table page_contents
add constraint foo foreign key (p.hostid, link_name)
references pages (hostid, url)
where p.id = pageid;

Obviously that second statement isn't going to compile.

I don't want to add the hostid column to page_contents table because I
have a lot of old code accessing the database which would be hard to
change (the old code would no longer be able to insert page_contents
rows).

Is this possible somehow?  Perhaps by adding a second table?  Do I
have to use triggers, and if so is that as robust as referential
integrity?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [SQL] Referential integrity (foreign keys) across multiple tables

2006-07-24 Thread Richard Jones
On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote:
> On Sat, Jul 22, 2006 at 14:32:57 +0100,
>   Richard Jones <[EMAIL PROTECTED]> wrote:
> > 
> > Now I want to add a column to page_contents, say called link_name,
> > which is going to reference the pages.url column for the particular
> > host that this page belongs to.
> 
> What are you trying to accomplish by this?

Data integrity.

> The information is available by doing a join. If you are trying to
> simplify things for applications, you can probably do it with a view
> or rules depending on whether you want to have an updatable view. If
> you are denormalizing for performance and want constraints to
> maintain consistancy, then you probably want to push the hostid down
> to page_contents as well as the url. These could both be set with a
> trigger. (I think a rule could be used as well.)

So if I get this right, I should use a trigger to ensure that the old
code causes the hostid field to be set in page_contents?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Referential integrity (foreign keys) across multiple tables

2006-07-24 Thread Richard Jones
On Mon, Jul 24, 2006 at 12:51:48PM -0500, Bruno Wolff III wrote:
> On Mon, Jul 24, 2006 at 09:59:07 +0100,
>   Richard Jones <[EMAIL PROTECTED]> wrote:
> > On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote:
> > > On Sat, Jul 22, 2006 at 14:32:57 +0100,
> > >   Richard Jones <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > Now I want to add a column to page_contents, say called link_name,
> > > > which is going to reference the pages.url column for the particular
> > > > host that this page belongs to.
> > > 
> > > What are you trying to accomplish by this?
> > 
> > Data integrity.
> 
> This doesn't make sense in isolation. If that is all you are trying to do,
> then you don't need to do anything to the database design as the information
> is already there. The application just needs to do a join when querying the
> data.

I'm not sure what this means.  By "data integrity" I just meant that I
don't want applications to create page_contents.link_name fields which
could point to non-existent URLs.  (A URL consists of a particular
hostid and url, since you're not allowed to have one host pointing to
pages on another, and this is where the requirement for a foreign key
which spans two tables comes from).  Perhaps I meant "data
consistency"?  Anyway without some sort of check, be it a reference or
a trigger -- assuming a trigger is possible -- then an application
might create such a bad link.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Referential integrity (foreign keys) across multiple tables

2006-07-24 Thread Richard Jones
On Mon, Jul 24, 2006 at 12:26:15PM -0500, Aaron Bono wrote:
> On 7/22/06, Richard Jones <[EMAIL PROTECTED]> wrote:
> >  create table hosts (
> >id serial primary key,
> >hostname text not null
> >  );
> >
> >  create table pages (
> >id serial primary key,
> >hostid int not null references hosts (id),
> >url text not null,
> >unique (hostid, url)
> >  );
> >
> >  create table page_contents (
> >pageid int not null references pages (id),
> >section text not null
> >  );
> >
> >(There are many hosts, many pages per host, and many page_contents
> >sections per page).
> >
> >Now I want to add a column to page_contents, say called link_name,
> >which is going to reference the pages.url column for the particular
> >host that this page belongs to.
> >
> >Something like:
> >
> >  alter table page_contents add link_name text;
> >  alter table page_contents
> >add constraint foo foreign key (p.hostid, link_name)
> >references pages (hostid, url)
> >where p.id = pageid;
> 
> The easiest, and not necessarily elegant, way to acomplish this is to create
> linkid rather than link_name and make it a foreign key to pages.id.

Unfortunately this isn't possible :-(  My schema above is simplified a
little too far.  In the real schema we keep old versions of pages
around in the pages table (we move the 'url' field to a 'url_deleted'
field so that the unique (hostid, url) isn't violated by the new
version of the page).  This means that the pageid can be updated, so
link_name must store a url, not a pageid.

> Then add a trigger that checks to make sure the pages you link to
> from page_contents to pages is for the same host.  If not, raise an
> exception.

I think though that this suggestion is right.  I'm not sure what
difference it makes if it's link_name or linkid, but it looks like
I'll have to write a trigger for this.  It doesn't seem like there's a
way using just ordinary foreign keys.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Richard Jones
I've been profiling a PG database / mix of applications and found that
one statement which takes a very long time to execute is:

select e.keywordid, e.quantity, e.max_cpc, i.position
 from bid3_events_impressions i, bid3_events e
where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,
$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,
$45,$46,$47,$48,
[... placeholders $49 thru $1908 omitted ...]
$1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1917,$1918,$1919,$1920,
$1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1929,$1930,$1931,$1932)
order by e.keywordid, e.creativeid, e.t

Needless to say this statement is being generated programatically.

The problem is that the code needs to execute selects of this sort on
various different number of keyword IDs quite frequently.  I'm not
sure how to rewrite it.  If I put the keyword IDs into a temporary
table then it is not at all clear that the overhead of doing each
individual INSERT to populate the table won't be just as slow (the
database is located across a network so there is a significant RTT,
and COPY isn't supported by my PG lib).

Has anyone got any suggestions?

Rich.

--
explain select e.keywordid, e.quantity, e.max_cpc, i.position from 
bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid 
in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t;  

  QUERY PLAN
 
---
 Sort  (cost=15795.79..15796.57 rows=312 width=34)
   Sort Key: e.keywordid, e.creativeid, e.t
   ->  Hash Join  (cost=11623.58..15782.87 rows=312 width=34)
 Hash Cond: ("outer".eventid = "inner".id)
 ->  Seq Scan on bid3_events_impressions i  (cost=0.00..3471.78 
rows=136878 width=8)
 ->  Hash  (cost=11622.35..11622.35 rows=489 width=34)
   ->  Seq Scan on bid3_events e  (cost=0.00..11622.35 rows=489 
width=34)
 Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid 
= 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 
7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
(8 rows)


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Richard Jones
On Sun, Feb 25, 2007 at 01:34:44PM -0500, Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
> > If 8.2, what about
> >... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
> > ?
> 
> Well, the OP wasn't using 8.2 --- judging from the selected plan, it had
> to be 8.0 or older.  But yeah, a values-list is an interesting
> alternative on 8.2.  I think actually you don't need all that much extra
> notation; this seems to work:
> 
>   WHERE foo IN (VALUES ($1),($2),($3),...)

That's right, it is in fact PG 7.4.

I will try the values suggestion to see if that makes a difference.

Rich.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate