[SQL] optimizing a query over tree-like structure
hi. sorry for the vague syntax used below, but the query is huge so i've tried to present it in simple terms. And sorry if i'm doing obviously stupid things, i have lots of years programming behind me but NO sql involved. i have a somewhat tree-like structure of objects that link to each other via many2many associations. it looks like: (N is "root") N links to R,P,F R links to F P links to O,F O links to O,F #recursively F links to Z All links to F but the one in O are "shortcuts", to avoid looking it up recursively. each of these objects has some associated values (again many2many, ownership). what i want is to get all the values related to a given N and its sublevels, in one query. one variant of what i've invented so far is (~pseudocode, no recursion on O): SELECT ownership.*, value.* FROM Nazn, mm_N2P, mm_P2O, mm_O2O, mm_O2O AS mm_O2O1, mm_N2Z, ownership JOIN value ON ownership.value = value.dbid WHERE ( N.dbid = ownership.N OR N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R OR N.dbid = mm_N2P.left AND ( mm_N2P.right = ownership.P OR mm_N2P.right = mm_P2O.left AND ( mm_P2O.right = ownership.O OR mm_P2O.right = mm_O2O.left AND ( mm_O2O.right = ownership.O OR mm_O2O.right = mm_O2O1.left AND mm_O2O1.right = ownership.O ))) OR Nazn.dbid = mm_N2F.left AND ( mm_N2F.right = ownership.F OR mm_N2Z.right = ownership.Z ) ) AND ownership.value = value.dbid AND N.obj = whatever-filter-by-N this scales very poor. it uses the shortcut to F present in N. for just 200 rows with related associations, it takes 4 seconds to get result. if i use the shortcut to F present in P, it takes 2 seconds - but thats still inacceptable. seems that the number or consequtive ORs on same level is killing it. EXPLAIN gives nested loops all over. What am i doing wrong here? should i expand the A-to-B links of the sort mm_N2P.right = mm_P2O.left into mm_N2P.right = P.dbid and P.dbid == mm_P2O.left ? the query is generated via sqlalchemy and a layer on top, so i can tweak it any way required (and it has many other sub/filterings which i've ommited for brevity - they dont make it better/worse). any pointers of how such queries should be written are appreciated - e.g. what is considered fine, what doable and what is a no-no. thanks ahead ciao svil www.svilendobrev.com dbcook.sf.net -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] Can COPY update or skip existing records?
Hey all, I've got a table with a unique constraint across a few fields which I need to regularly import a batch of data into. Is there a way to do it with COPY without getting conflicts on the unique contraint? I have no was of being certain that some of the data I'm trying to load isn't in the table already. Ideally I'd like it to operate like MySQL's on_duplicate_key_update option, but for now I'll suffice with just ignoring existing rows and proceeding with everything else. Thanks, -- Glenn -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] optimizing a query over tree-like structure
Hi, Svil I 'd like to first fully understand the background of your problem before figurin out if I can be of any help (or not). You have a tree, of which N is the root, is this correct? Then, what are the next sublevel? F, P and R? If so, why is R linked to a sibling (F) ? And the next one? O and Z? Is O connected to itself? And i am not understanding your concept of "shortcuts". Could you please explain ? What kind of tree do you have exactly? Binary? Trenary? The mm_* tables keep relations between nodes, I guess If so , the mm_N2Z one is empty, in this example, right?As there is no edge from N to Z (not direct). But what is the Nazn table? What records does it keep? And what is the ownership table? And the value? Could you tell which columns these tables have, at least the relevant ones for your problem ? Please kindly advice me on these issues. I am not very specialized in optimizing queries, but I see you have a lot of cartesian products on your FROM clause, which, from my own experience, I guess it has tendency to be slow... Best, Oliveiros - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Tuesday, September 30, 2008 9:32 AM Subject: [SQL] optimizing a query over tree-like structure hi. sorry for the vague syntax used below, but the query is huge so i've tried to present it in simple terms. And sorry if i'm doing obviously stupid things, i have lots of years programming behind me but NO sql involved. i have a somewhat tree-like structure of objects that link to each other via many2many associations. it looks like: (N is "root") N links to R,P,F R links to F P links to O,F O links to O,F #recursively F links to Z All links to F but the one in O are "shortcuts", to avoid looking it up recursively. each of these objects has some associated values (again many2many, ownership). what i want is to get all the values related to a given N and its sublevels, in one query. one variant of what i've invented so far is (~pseudocode, no recursion on O): SELECT ownership.*, value.* FROM Nazn, mm_N2P, mm_P2O, mm_O2O, mm_O2O AS mm_O2O1, mm_N2Z, ownership JOIN value ON ownership.value = value.dbid WHERE ( N.dbid = ownership.N OR N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R OR N.dbid = mm_N2P.left AND ( mm_N2P.right = ownership.P OR mm_N2P.right = mm_P2O.left AND ( mm_P2O.right = ownership.O OR mm_P2O.right = mm_O2O.left AND ( mm_O2O.right = ownership.O OR mm_O2O.right = mm_O2O1.left AND mm_O2O1.right = ownership.O ))) OR Nazn.dbid = mm_N2F.left AND ( mm_N2F.right = ownership.F OR mm_N2Z.right = ownership.Z ) ) AND ownership.value = value.dbid AND N.obj = whatever-filter-by-N this scales very poor. it uses the shortcut to F present in N. for just 200 rows with related associations, it takes 4 seconds to get result. if i use the shortcut to F present in P, it takes 2 seconds - but thats still inacceptable. seems that the number or consequtive ORs on same level is killing it. EXPLAIN gives nested loops all over. What am i doing wrong here? should i expand the A-to-B links of the sort mm_N2P.right = mm_P2O.left into mm_N2P.right = P.dbid and P.dbid == mm_P2O.left ? the query is generated via sqlalchemy and a layer on top, so i can tweak it any way required (and it has many other sub/filterings which i've ommited for brevity - they dont make it better/worse). any pointers of how such queries should be written are appreciated - e.g. what is considered fine, what doable and what is a no-no. thanks ahead ciao svil www.svilendobrev.com dbcook.sf.net -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Finding sequential records
On Mon, Sep 29, 2008 at 11:05 PM, Steve Midgley <[EMAIL PROTECTED]> wrote: > Any reason why using serial PK's with "compound > natural unique indices" is better/worse than just using natural PK's? Not really, surrogate keys will always work well so long as unwanted duplicates are constrained. Surrogate Keys will allow as much flexibility and versatility as is possible. However as the case of unwanted duplicates illustrates, sometimes flexibility and versatility is always wanted. So if flexibility and versatility is valued less than the ability to add inflexible constraints (beyond preventing duplicates) across multiple relationships, then using natural primary keys becomes very attractive. -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] optimizing a query over tree-like structure
another idea i just got, to decrease the number of tables in one FROM, is to represent the first-level ORs with unions. but i'm not sure how exactly to do it: are these equivalent? FROM N,ownership,mm_N2R where ( N.dbid = ownership.N OR N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R OR ... ) AND N.obj = whatever-filter-by-N FROM ( (ownership join N on ownership.N = N.dbid) union (ownership join mm_N2R on ownership.R = mm_N2R.right join N on mm_N2R.left = N.dbid ) OR ) ... and should i bundle the filtering-by-N/Employment in every of above union-members? On Tuesday 30 September 2008 15:21:09 Oliveiros Cristina wrote: > Hi, Svil > > I 'd like to first fully understand the background of your problem > before figurin out if I can be of any help (or not). > > You have a tree, of which N is the root, is this correct? > Then, what are the next sublevel? > F, P and R? If so, why is R linked to a sibling (F) ? > And the next one? > O and Z? > Is O connected to itself? > And i am not understanding your concept of "shortcuts". Could you > please explain ? > What kind of tree do you have exactly? Binary? Trenary? > The mm_* tables keep relations between nodes, I guess If so , > the mm_N2Z one is empty, in this example, right?As there is no edge > from N to Z (not direct). > But what is the Nazn table? What records does it keep? > And what is the ownership table? And the value? > Could you tell which columns these tables have, at least the > relevant ones for your problem ? > > Please kindly advice me on these issues. > I am not very specialized in optimizing queries, but I see you have > a lot of cartesian products on your FROM clause, which, from my own > experience, I guess it has tendency to be slow... > > Best, > > Oliveiros > > - Original Message - > From: <[EMAIL PROTECTED]> > To: > Sent: Tuesday, September 30, 2008 9:32 AM > Subject: [SQL] optimizing a query over tree-like structure > > > hi. > > sorry for the vague syntax used below, but the query is huge so > > i've tried to present it in simple terms. And sorry if i'm doing > > obviously stupid things, i have lots of years programming behind > > me but NO sql involved. > > i have a somewhat tree-like structure of objects that link to > > each other via many2many associations. it looks like: > > (N is "root") > > N links to R,P,F > > R links to F > > P links to O,F > > O links to O,F #recursively > > F links to Z > > All links to F but the one in O are "shortcuts", to avoid looking > > it up recursively. > > each of these objects has some associated values (again > > many2many, ownership). > > > > what i want is to get all the values related to a given N and its > > sublevels, in one query. > > > > one variant of what i've invented so far is (~pseudocode, no > > recursion on O): > > > > SELECT ownership.*, value.* > > FROM Nazn, mm_N2P, mm_P2O, mm_O2O, mm_O2O AS mm_O2O1, mm_N2Z, > > ownership JOIN value ON ownership.value = value.dbid > > WHERE ( > > N.dbid = ownership.N > > OR > > N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R > > OR > > N.dbid = mm_N2P.left AND ( > > mm_N2P.right = ownership.P > > OR > > mm_N2P.right = mm_P2O.left AND ( > > mm_P2O.right = ownership.O > > OR > > mm_P2O.right = mm_O2O.left AND ( > > mm_O2O.right = ownership.O > > OR > > mm_O2O.right = mm_O2O1.left AND > >mm_O2O1.right = ownership.O > > ))) > > OR > > Nazn.dbid = mm_N2F.left AND ( > > mm_N2F.right = ownership.F > > OR > > mm_N2Z.right = ownership.Z > > ) > > ) AND ownership.value = value.dbid AND N.obj = > > whatever-filter-by-N > > > > > > this scales very poor. > > it uses the shortcut to F present in N. > > for just 200 rows with related associations, it takes 4 seconds > > to get result. > > if i use the shortcut to F present in P, it takes 2 seconds - but > > thats still inacceptable. > > seems that the number or consequtive ORs on same level is killing > > it. EXPLAIN gives nested loops all over. > > What am i doing wrong here? > > should i expand the A-to-B links of the sort > > mm_N2P.right = mm_P2O.left > > into > > mm_N2P.right = P.dbid and P.dbid == mm_P2O.left ? > > > > the query is generated via sqlalchemy and a layer on top, so i > > can tweak it any way required (and it has many other > > sub/filterings which i've ommited for brevity - they dont make it > > better/worse). > > > > any pointers of how such queries should be written are > > appreciated - e.g. what is considered fine, what doable and what > > is a no-no. > > > > thanks ahead > > ciao > > svil > > > > www.svilendobrev.com > > dbcook.sf.net > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailp
Re: [SQL] optimizing a query over tree-like structure
Svil, Please advice me, You have values and one table for N,R,P,F and O and Z, right? And you have ownership which is a "catch-all" associative table between values and whatever other table, is this correct? You want to retrieve the values for a certain N, and all to all the other entities that belong to that N, is this correct? Fine, I have a few questions on your associative tables mm_N2P -- An N can be associated to P in a many to many relationship? N Can have one or none P, am I right? It can have many Ps ? And a P can belong to many Ns ? mm_P2O, -- I understand an O can have many Positions, but a Positon can be in more than one O? is this a many to many relationship? mm_O2O, -- if it is a hierarchical tree do you need an associative table? An O can have many sub-departments but a sub-department (O) can just belong to a (super-)department, -- isn't it so? mm_N2Z -- this one was mistake of yours, it doesn't exist, right? You wrote that giant query by yourself or was it generated by some automated tool? (sqlalchemy , I have no idea :p ) Your ultimate goal is to retrieve the values associated with a certain Department( and all its sub-departments) which is associated with a certain Position, which values are also to obtain, such position being associated with a certain N, which values are also to obtain, is my understandin correct? Explain me the simplest case, where you have an N not associated with any R or P, just with an F. Which info is to be retrieved, exactly in this case ? Best, Oliveiros - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Tuesday, September 30, 2008 3:24 PM Subject: Re: [SQL] optimizing a query over tree-like structure another idea i just got, to decrease the number of tables in one FROM, is to represent the first-level ORs with unions. but i'm not sure how exactly to do it: are these equivalent? FROM N,ownership,mm_N2R where ( N.dbid = ownership.N OR N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R OR ... ) AND N.obj = whatever-filter-by-N FROM ( (ownership join N on ownership.N = N.dbid) union (ownership join mm_N2R on ownership.R = mm_N2R.right join N on mm_N2R.left = N.dbid ) OR ) ... and should i bundle the filtering-by-N/Employment in every of above union-members? On Tuesday 30 September 2008 15:21:09 Oliveiros Cristina wrote: Hi, Svil I 'd like to first fully understand the background of your problem before figurin out if I can be of any help (or not). You have a tree, of which N is the root, is this correct? Then, what are the next sublevel? F, P and R? If so, why is R linked to a sibling (F) ? And the next one? O and Z? Is O connected to itself? And i am not understanding your concept of "shortcuts". Could you please explain ? What kind of tree do you have exactly? Binary? Trenary? The mm_* tables keep relations between nodes, I guess If so , the mm_N2Z one is empty, in this example, right?As there is no edge from N to Z (not direct). But what is the Nazn table? What records does it keep? And what is the ownership table? And the value? Could you tell which columns these tables have, at least the relevant ones for your problem ? Please kindly advice me on these issues. I am not very specialized in optimizing queries, but I see you have a lot of cartesian products on your FROM clause, which, from my own experience, I guess it has tendency to be slow... Best, Oliveiros - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Tuesday, September 30, 2008 9:32 AM Subject: [SQL] optimizing a query over tree-like structure > hi. > sorry for the vague syntax used below, but the query is huge so > i've tried to present it in simple terms. And sorry if i'm doing > obviously stupid things, i have lots of years programming behind > me but NO sql involved. > i have a somewhat tree-like structure of objects that link to > each other via many2many associations. it looks like: > (N is "root") > N links to R,P,F > R links to F > P links to O,F > O links to O,F #recursively > F links to Z > All links to F but the one in O are "shortcuts", to avoid looking > it up recursively. > each of these objects has some associated values (again > many2many, ownership). > > what i want is to get all the values related to a given N and its > sublevels, in one query. > > one variant of what i've invented so far is (~pseudocode, no > recursion on O): > > SELECT ownership.*, value.* > FROM Nazn, mm_N2P, mm_P2O, mm_O2O, mm_O2O AS mm_O2O1, mm_N2Z, > ownership JOIN value ON ownership.value = value.dbid > WHERE ( > N.dbid = ownership.N > OR > N.dbid = mm_N2R.left AND mm_N2R.right = ownership.R > OR > N.dbid = mm_N2P.left AND ( > mm_N2P.right = ownership.P > OR > mm_N2P.right = mm_P2O.left AND ( > mm_P2O.right = ownership.O > OR > mm_P2O.right = mm_O2O.left AND ( > mm_O2O.right = ownership.O > OR > mm_O2O.right = mm_O2O1.left AND >
Re: [SQL] Can COPY update or skip existing records?
Hi there, The operation "on_duplicate_key_update" is in implementation on the new version of Postgres :) The only way (or, at least, the best way... I think) to do what you want is using a temporary table... let's see: /* Creating the structure of the first table (table_01)... You can do it the way you like */ create table temp_01 as (select * from table_01 limit 1); TRUNCATE TABLE table_01; /* COPY */ COPY temp_01 FROM '/tmp/table'; /* Insert the values */ insert into table_01 a where not exists (select 1 from temp_01 b where a.cod_serial = b.cod_serial) /* Or you could do like this */ delete from temp_01 a where exists (select 1 from table_01 b where a.cod_serial = b.cod_serial) I hope being helpful. Best Regards, Rafael Domiciano Postgres DBA 2008/9/30 Glenn Gillen <[EMAIL PROTECTED]> > Hey all, > > I've got a table with a unique constraint across a few fields which I > need to regularly import a batch of data into. Is there a way to do it > with COPY without getting conflicts on the unique contraint? I have no > was of being certain that some of the data I'm trying to load isn't in > the table already. > > Ideally I'd like it to operate like MySQL's on_duplicate_key_update > option, but for now I'll suffice with just ignoring existing rows and > proceeding with everything else. > > Thanks, > > -- > Glenn > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
Re: [SQL] Can COPY update or skip existing records?
On Wed, Oct 1, 2008 at 6:37 AM, Rafael Domiciano <[EMAIL PROTECTED] > wrote: > Hi there, > The operation "on_duplicate_key_update" is in implementation on the new > version of Postgres :) > The only way (or, at least, the best way... I think) to do what you want is > using a temporary table... let's see: > > /* Creating the structure of the first table (table_01)... You can do it > the way you like */ > create table temp_01 as > (select * from table_01 limit 1); > CREATE TABLE temp_01( LIKE table_01 ); OR CREATE TABLE temp_01 as select * from table_01 where 1 = 2; These statements would avoid the following TRUNCATE command. (mail follows at the end) > TRUNCATE TABLE table_01; > > /* COPY */ > COPY temp_01 FROM '/tmp/table'; > > /* Insert the values */ > insert into table_01 a > where not exists > (select 1 from temp_01 b > where > a.cod_serial = b.cod_serial) > > /* Or you could do like this */ > delete from temp_01 a > where exists > (select 1 from table_01 b > where > a.cod_serial = b.cod_serial) > > I hope being helpful. > > Best Regards, > > Rafael Domiciano > Postgres DBA > > 2008/9/30 Glenn Gillen <[EMAIL PROTECTED]> > > Hey all, >> >> I've got a table with a unique constraint across a few fields which I >> need to regularly import a batch of data into. Is there a way to do it >> with COPY without getting conflicts on the unique contraint? I have no >> was of being certain that some of the data I'm trying to load isn't in >> the table already. >> >> Ideally I'd like it to operate like MySQL's on_duplicate_key_update >> option, but for now I'll suffice with just ignoring existing rows and >> proceeding with everything else. >> >> Thanks, >> >> For your current requirement, you can create a trigger on the table, which silently rejects the duplicate rows. To implement UPDATE ON DUPLICATE, you can add additional logic to the same trigger and update the row already present. Scour these archives, you'll find this topic discussed in detail earlier. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [SQL] Can COPY update or skip existing records?
On Tue, Sep 30, 2008 at 6:16 AM, Glenn Gillen <[EMAIL PROTECTED]> wrote: > Hey all, > > I've got a table with a unique constraint across a few fields which I > need to regularly import a batch of data into. Is there a way to do it > with COPY without getting conflicts on the unique contraint? I have no > was of being certain that some of the data I'm trying to load isn't in > the table already. > > Ideally I'd like it to operate like MySQL's on_duplicate_key_update > option, but for now I'll suffice with just ignoring existing rows and > proceeding with everything else. Look into pg_loader, I believe it has that capability. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql