[SQL] NOT and AND problem
Dear All, I am having some confusion over a query which is supposed to achieve the following: To remove a record from a table if the one or both of the columns containing references to other tables no longer point to table rows which still exist. There are good reasons why I cannot use foreign keys to maintain referential integrity, but I will not go into them, but they give rise to the need to "clean-up" my database table from time to time. The query that I have had most success with looks like this: DELETE FROM myTable WHERE (NOT myTable.item_id = item.item_id) AND (NOT myTable.group_id = ep.group_id); Which is odd, because logically it shouldn't work. What I find with the above queries is that as follows: let myTable.item_id = item.item_id be A let myTable.group_id = ep.group_id be B The derived and actual truth tables for the results of the where clause follow: Derived: A | B | Result 1 | 1 | 0 1 | 0 | 0 0 | 1 | 0 0 | 0 | 1 Actual: A | B | Result 1 | 1 | 0 1 | 0 | 0 0 | 1 | 1 0 | 0 | 1 This makes no sense to me, as effectively rows 2 and 3 of the Actual results truth table are the same (unless there's some subtle difference with regards to the order of the statements, otherwise just substitute A for B and vice versa). The result that I actually want from the operation is this: A | B | Result 1 | 1 | 0 1 | 0 | 1 0 | 1 | 1 0 | 0 | 1 which would suggest a query like: DELETE FROM myTable WHERE NOT (myTable.item_id = item.item_id AND myTable.group_id = ep.group_id); which ought to provide the above output. Instead, using this query, the output I get is as follows: A | B | Result 1 | 1 | 1 1 | 0 | 1 0 | 1 | 1 0 | 0 | 1 I can only conclude that Postgres is doing something with regards to the other two tables which I am unaware of. Can anyone help me understand what is going on? Any suggestions gratefully received. Cheers Richard Richard Jones --- Systems Developer Theses Alive! - www.thesesalive.ac.uk Edinburgh University Library [EMAIL PROTECTED] 0131 651 1611 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] NOT and AND problem
Hello, Just to say thanks for such fast responses. The full working query is indeed as suggested (I cut the demo query down when I posted it, as it's got some awfully long names in it in real life): DELETE FROM eulepersongroup2workspaceitem WHERE NOT EXISTS ( SELECT 1 FROM workspaceitem WHERE workspace_item_id = eulepersongroup2workspaceitem.workspace_item_id ) OR NOT EXISTS ( SELECT 1 FROM epersongroup WHERE eperson_group_id = eulepersongroup2workspaceitem.eperson_group_id ); Thanks very much for your help. Regards Richard Richard Jones --- Systems Developer Theses Alive! - www.thesesalive.ac.uk Edinburgh University Library [EMAIL PROTECTED] 0131 651 1611 ---(end of broadcast)--- TIP 8: explain analyze is your friend
[SQL] Referential integrity (foreign keys) across multiple tables
Simplified schema: create table hosts ( id serial primary key, hostname text not null ); create table pages ( id serial primary key, hostid int not null references hosts (id), url text not null, unique (hostid, url) ); create table page_contents ( pageid int not null references pages (id), section text not null ); (There are many hosts, many pages per host, and many page_contents sections per page). Now I want to add a column to page_contents, say called link_name, which is going to reference the pages.url column for the particular host that this page belongs to. Something like: alter table page_contents add link_name text; alter table page_contents add constraint foo foreign key (p.hostid, link_name) references pages (hostid, url) where p.id = pageid; Obviously that second statement isn't going to compile. I don't want to add the hostid column to page_contents table because I have a lot of old code accessing the database which would be hard to change (the old code would no longer be able to insert page_contents rows). Is this possible somehow? Perhaps by adding a second table? Do I have to use triggers, and if so is that as robust as referential integrity? Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] Referential integrity (foreign keys) across multiple tables
On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote: > On Sat, Jul 22, 2006 at 14:32:57 +0100, > Richard Jones <[EMAIL PROTECTED]> wrote: > > > > Now I want to add a column to page_contents, say called link_name, > > which is going to reference the pages.url column for the particular > > host that this page belongs to. > > What are you trying to accomplish by this? Data integrity. > The information is available by doing a join. If you are trying to > simplify things for applications, you can probably do it with a view > or rules depending on whether you want to have an updatable view. If > you are denormalizing for performance and want constraints to > maintain consistancy, then you probably want to push the hostid down > to page_contents as well as the url. These could both be set with a > trigger. (I think a rule could be used as well.) So if I get this right, I should use a trigger to ensure that the old code causes the hostid field to be set in page_contents? Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] Referential integrity (foreign keys) across multiple tables
On Mon, Jul 24, 2006 at 12:51:48PM -0500, Bruno Wolff III wrote: > On Mon, Jul 24, 2006 at 09:59:07 +0100, > Richard Jones <[EMAIL PROTECTED]> wrote: > > On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote: > > > On Sat, Jul 22, 2006 at 14:32:57 +0100, > > > Richard Jones <[EMAIL PROTECTED]> wrote: > > > > > > > > Now I want to add a column to page_contents, say called link_name, > > > > which is going to reference the pages.url column for the particular > > > > host that this page belongs to. > > > > > > What are you trying to accomplish by this? > > > > Data integrity. > > This doesn't make sense in isolation. If that is all you are trying to do, > then you don't need to do anything to the database design as the information > is already there. The application just needs to do a join when querying the > data. I'm not sure what this means. By "data integrity" I just meant that I don't want applications to create page_contents.link_name fields which could point to non-existent URLs. (A URL consists of a particular hostid and url, since you're not allowed to have one host pointing to pages on another, and this is where the requirement for a foreign key which spans two tables comes from). Perhaps I meant "data consistency"? Anyway without some sort of check, be it a reference or a trigger -- assuming a trigger is possible -- then an application might create such a bad link. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Referential integrity (foreign keys) across multiple tables
On Mon, Jul 24, 2006 at 12:26:15PM -0500, Aaron Bono wrote: > On 7/22/06, Richard Jones <[EMAIL PROTECTED]> wrote: > > create table hosts ( > >id serial primary key, > >hostname text not null > > ); > > > > create table pages ( > >id serial primary key, > >hostid int not null references hosts (id), > >url text not null, > >unique (hostid, url) > > ); > > > > create table page_contents ( > >pageid int not null references pages (id), > >section text not null > > ); > > > >(There are many hosts, many pages per host, and many page_contents > >sections per page). > > > >Now I want to add a column to page_contents, say called link_name, > >which is going to reference the pages.url column for the particular > >host that this page belongs to. > > > >Something like: > > > > alter table page_contents add link_name text; > > alter table page_contents > >add constraint foo foreign key (p.hostid, link_name) > >references pages (hostid, url) > >where p.id = pageid; > > The easiest, and not necessarily elegant, way to acomplish this is to create > linkid rather than link_name and make it a foreign key to pages.id. Unfortunately this isn't possible :-( My schema above is simplified a little too far. In the real schema we keep old versions of pages around in the pages table (we move the 'url' field to a 'url_deleted' field so that the unique (hostid, url) isn't violated by the new version of the page). This means that the pageid can be updated, so link_name must store a url, not a pageid. > Then add a trigger that checks to make sure the pages you link to > from page_contents to pages is for the same host. If not, raise an > exception. I think though that this suggestion is right. I'm not sure what difference it makes if it's link_name or linkid, but it looks like I'll have to write a trigger for this. It doesn't seem like there's a way using just ordinary foreign keys. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[SQL] Very large IN-clause is slow, but how to rewrite it?
I've been profiling a PG database / mix of applications and found that one statement which takes a very long time to execute is: select e.keywordid, e.quantity, e.max_cpc, i.position from bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7, $8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26, $27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44, $45,$46,$47,$48, [... placeholders $49 thru $1908 omitted ...] $1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1917,$1918,$1919,$1920, $1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1929,$1930,$1931,$1932) order by e.keywordid, e.creativeid, e.t Needless to say this statement is being generated programatically. The problem is that the code needs to execute selects of this sort on various different number of keyword IDs quite frequently. I'm not sure how to rewrite it. If I put the keyword IDs into a temporary table then it is not at all clear that the overhead of doing each individual INSERT to populate the table won't be just as slow (the database is located across a network so there is a significant RTT, and COPY isn't supported by my PG lib). Has anyone got any suggestions? Rich. -- explain select e.keywordid, e.quantity, e.max_cpc, i.position from bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t; QUERY PLAN --- Sort (cost=15795.79..15796.57 rows=312 width=34) Sort Key: e.keywordid, e.creativeid, e.t -> Hash Join (cost=11623.58..15782.87 rows=312 width=34) Hash Cond: ("outer".eventid = "inner".id) -> Seq Scan on bid3_events_impressions i (cost=0.00..3471.78 rows=136878 width=8) -> Hash (cost=11622.35..11622.35 rows=489 width=34) -> Seq Scan on bid3_events e (cost=0.00..11622.35 rows=489 width=34) Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid = 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10)) (8 rows) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] Very large IN-clause is slow, but how to rewrite it?
On Sun, Feb 25, 2007 at 01:34:44PM -0500, Tom Lane wrote: > Joe Conway <[EMAIL PROTECTED]> writes: > > If 8.2, what about > >... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x)) > > ? > > Well, the OP wasn't using 8.2 --- judging from the selected plan, it had > to be 8.0 or older. But yeah, a values-list is an interesting > alternative on 8.2. I think actually you don't need all that much extra > notation; this seems to work: > > WHERE foo IN (VALUES ($1),($2),($3),...) That's right, it is in fact PG 7.4. I will try the values suggestion to see if that makes a difference. Rich. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate