Re: [SQL] Query optimizing
Hi Helio,

Sorry about the parenthesis - bad copy/pasting skills!

To further discuss your suggestion: wouldn't adding n_issue = i.id as a WHERE clause filter cause the sub-query to become correlated and thus much less efficient? I may be wrong, or may have misunderstood your suggestion.

Thanks for your help,
Sebastian

On Mon, Nov 10, 2008 at 11:48 AM, Helio Campos Mello de Andrade <[EMAIL PROTECTED]> wrote:
> Hi Sebastian,
>
> - First of all I think there is an open-parenthesis missing in the query V2.
>
> Maybe in the V2 version you could restrict the results in the INNER query a
> bit more if you use a restriction clause like "WHERE n_issue = i.id" in
> that. It will certainly lower the number of rows returned by it to only 1
> result.
>
> Regards
>
> --
> Helio Campos Mello de Andrade
>
> On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I was hoping to receive some advice on a slow running query in our
>> business' Issue Tracking System.
>>
>> [snip]
>>
>> Kind regards,
>> Sebastian
Re: [SQL] Query optimizing
Cheers for your help, guys. Filtering first and then joining has substantially reduced the run time.

Much obliged,
Sebastian

On Mon, Nov 10, 2008 at 12:32 PM, Richard Huxton <[EMAIL PROTECTED]> wrote:
> Sebastian Ritter wrote:
> > Could it have something to do with the fact that it is a subquery and
> > thus the planner cannot deduce filtering conditions from the outer
> > query against it? My apologies if that made no sense.
>
> Could make a difference.
>
> > In summary, what I'm trying to understand is the following: will there
> > be a performance difference between filtering query sets first and then
> > joining them together, as opposed to joining first and then filtering?
> > Does the optimiser not choose the best course of action either way,
> > yielding the same result?
>
> There obviously is a performance difference between joining all of the
> issues table versus joining 1% of it to followups.
>
> In most cases the planner can push the condition into the subquery, but
> not in all cases because:
> 1. It's not provably correct to do so
> 2. The planner isn't smart enough to figure out that it can
>
> It's impossible to say which applies to you without knowing the full query.
>
> --
> Richard Huxton
> Archonet Ltd
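For readers following the thread: the "filter first, then join" pattern Sebastian says worked might look roughly like the sketch below. Table and column names are taken from the earlier messages in this thread; the date range is purely illustrative.

```sql
-- Sketch only: restrict issues first, then fetch each surviving issue's
-- most recent follow-up with a correlated LIMIT 1 lookup. An index on
-- followups (n_issue, dt_modified) keeps the inner lookup cheap.
SELECT i.id, i.t_title, latestfu.*
FROM manage_issue i
JOIN manage_followup latestfu
  ON latestfu.id = (SELECT fu.id
                    FROM manage_followup fu
                    WHERE fu.n_issue = i.id
                    ORDER BY fu.dt_modified DESC
                    LIMIT 1)
WHERE i.dt_modified >= '2008-11-10'   -- restrict issues first (illustrative)
  AND i.dt_modified <  '2008-11-11';
```

With only a day's worth of issues surviving the WHERE clause, the correlated lookup runs once per remaining issue rather than once per issue in the table, which matches Richard's point about joining 1% of the issues table versus all of it.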
[SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
I have this issue:

postgres=# select E'\xc5\x53\x94\x96\x83\x29';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc553
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

postgres=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

postgres=# show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

postgres=# select version();
 PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)

On postgres 8.2 this worked:

postgres=# select version();
 PostgreSQL 8.2.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
(1 row)

postgres=# select E'\xc5\x53\x94\x96\x83\x29';
 ?column?
----------
 S)
(1 row)

(client_encoding and server_encoding are both UTF8 there as well.)

I'm using the above mentioned string to store data into a bytea column. I did a pg_dump of the database on postgres 8.2, and then tried to restore it on postgres 8.3, and I got this error. The actual line that produces the error is like this:

INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription, encription_key, charset, amount_width, shop_width, counter_width) VALUES (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4', E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);

The error is:

ERROR:  invalid byte sequence for encoding "UTF8": 0xc553

Now, I see that I can type "SELECT E'\xFF'" in pg8.2, but can't do that in pg8.3. So, my question is, how do I specify a hexadecimal value of C5 to be stored in a bytea column, in an INSERT statement?

Mike

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
[SQL] Query optimizing
Hi all,

I was hoping to receive some advice on a slow running query in our business' Issue Tracking System. To shed some light on the below mentioned queries, here is a brief summary of how users interact with the system. The two main components in the system are Issues and FollowUps. An Issue is created by our technical support staff when some form of action needs to be taken in order to resolve a problem. FollowUps are entries entered against an issue depicting the sequence of events taken to resolve the issue. There are about 15,000 Issues in the system at present and about 95,000 FollowUps. As we need the system to be very responsive, each query should ideally run in under 1 second.

A lot of the reports our technical officers submit to us include a listing of all actioned issues for a given day along with the last modified followup of each said issue. With the number of rows in our database increasing at a high rate, these queries are starting to run too slowly.

Here is a condensed version of the two tables:

Issues:
=======
id - integer
dt_created - timestamp
dt_modified - timestamp
t_title - varchar
t_description - varchar

FollowUps:
==========
id - integer
dt_created - timestamp
dt_modified - timestamp
t_description - varchar
n_issue - foreign key to issues

We have identified that the slowness in our queries is in trying to return the latest followup for each actioned issue that day. Without further ado, here are two variations I have tried within the system (neither of which are making the cut):

V1 (correlated subquery - very bad performance):

(SELECT
    fu.*
 FROM
    manage_followup fu,
    manage_issue i
 WHERE
    i.id = fu.n_issue
    AND fu.id = (SELECT
                     id
                 FROM
                     manage_followup
                 WHERE
                     n_issue = i.id
                 ORDER BY
                     dt_modified DESC
                 LIMIT 1)) AS latestfu,

V2 (using GROUP BY, the "max" aggregate function and DISTINCT - better performance, but still bad because of DISTINCT):

SELECT DISTINCT ON (fu.n_issue)
    fu.id,
    fu.dt_created,
    fu.dt_modified,
    fu.t_description,
    fu.n_issue AS issue_id
FROM
    manage_followup fu,
    (SELECT
         n_issue,
         max(dt_modified) AS dt_modified
     FROM
         manage_followup
     GROUP BY
         n_issue) AS max_modified
WHERE
    max_modified.n_issue = fu.n_issue
    AND fu.dt_modified = max_modified.dt_modified)
    AS latestfu ON (latestfu.issue_id = i.id),

We must use DISTINCT here as we sometimes use batch scripts to enter followups, which will give them all similar, if not equal, modification dates. We also can't use followup ids as an indicator of the latest followup because users of the system can retrospectively go back and change older followups.

I was hoping someone could provide a solution that does not require a correlated subquery or make use of the DISTINCT keyword. Any help would be much appreciated.

Kind regards,
Sebastian
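[Editor's note: for what it's worth, the DISTINCT ON that V2 already uses can do the whole job by itself when paired with an ORDER BY, without the max() subquery. A sketch using the same columns as above; whether it beats V2 in practice depends on the data and available indexes:]

```sql
-- Sketch: one row per issue, namely the row with the greatest dt_modified.
-- DISTINCT ON keeps the first row of each n_issue group in ORDER BY order;
-- ties on dt_modified are broken arbitrarily unless further sort keys
-- (e.g. fu.id) are appended.
SELECT DISTINCT ON (fu.n_issue)
    fu.id, fu.dt_created, fu.dt_modified, fu.t_description,
    fu.n_issue AS issue_id
FROM manage_followup fu
ORDER BY fu.n_issue, fu.dt_modified DESC;
```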
[SQL] Converting between UUID and VARCHAR
I have a table, like this:

CREATE TABLE t1
(
    u1 character varying
);

And some data inside:

INSERT INTO t1 (u1) VALUES ('62d6b434-7dfd-4b3b-b1bf-87f6c20c10dd');
INSERT INTO t1 (u1) VALUES ('e3fee596-164b-4995-9e0d-7b2a79e83752');
INSERT INTO t1 (u1) VALUES ('37a42ec8-9000-44bc-bb06-13b5d4373a45');
INSERT INTO t1 (u1) VALUES ('fe160c33-846b-4843-999e-071cbc71260c');
INSERT INTO t1 (u1) VALUES ('4a8d9697-f26c-41a4-91cd-444226e075f7');
INSERT INTO t1 (u1) VALUES ('e21cddf9-9843-42a0-acb6-95933ed2d6ee');
INSERT INTO t1 (u1) VALUES ('b3c04c2d-3706-4fa2-a3f5-b15552eaaadb');
INSERT INTO t1 (u1) VALUES ('e73d128d-fcf4-427c-959e-ac989150f2c4');
INSERT INTO t1 (u1) VALUES ('c88ac916-efb6-4afe-a2e3-8f2f49316c67');
INSERT INTO t1 (u1) VALUES ('2014ab62-bee9-4a3a-b273-58859d1d8941');

Now, I can do this:

SELECT u1::uuid FROM t1;

But I can't do this:

ALTER TABLE t1 ALTER u1 TYPE uuid;

So, if I want column u1 to use the uuid type instead of varchar, I need to do this:

SELECT u1::uuid INTO _t1 FROM t1;
DROP TABLE t1;
ALTER TABLE _t1 RENAME TO t1;

That's a pain in the behind if I have several tables referenced with foreign keys and tons of data. Is there a more 'elegant' way of changing the varchar data type to uuid?

My database consists of several tables which heavily use UUIDs, but since we were on 8.2 we used varchar (actually char(36)). Now pg8.3 supports UUID, and I need to 'upgrade' my database to use that new type.

Mike
Re: [SQL] Converting between UUID and VARCHAR
On Mon, 10.11.2008 at 15:34:10 +0100, Mario Splivalo wrote:
> I have a table, like this:
>
> CREATE TABLE t1
> (
>     u1 character varying
> )
>
> And some data inside:
>
> INSERT INTO t1 (u1) VALUES ('62d6b434-7dfd-4b3b-b1bf-87f6c20c10dd');
> INSERT INTO t1 (u1) VALUES ('e3fee596-164b-4995-9e0d-7b2a79e83752');
>
> Now, I can do this:
>
> SELECT u1::uuid FROM t1;
>
> But I can't do this:
>
> ALTER TABLE t1 ALTER u1 TYPE uuid;

test=# CREATE TABLE t1
test-# (
test(#     u1 character varying
test(# )
test-# ;
CREATE TABLE
test=# INSERT INTO t1 (u1) VALUES ('62d6b434-7dfd-4b3b-b1bf-87f6c20c10dd');
INSERT 0 1
test=# alter table t1 alter column u1 type uuid using u1::uuid;
ALTER TABLE
test=# \d t1;
     Table "public.t1"
 Column | Type | Modifiers
--------+------+-----------
 u1     | uuid |

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (more: see header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
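[Editor's note: for the foreign-key case Mario mentions, the same USING clause can be applied to every column holding the UUID, dropping and re-adding the constraint around the type change. A sketch only - the table, column, and constraint names here are hypothetical, not from the thread:]

```sql
-- Sketch: convert a referenced column and its referencing column in one
-- transaction. Constraint name "child_parent_u1_fkey" is assumed.
BEGIN;
ALTER TABLE child DROP CONSTRAINT child_parent_u1_fkey;
ALTER TABLE parent ALTER COLUMN u1        TYPE uuid USING u1::uuid;
ALTER TABLE child  ALTER COLUMN parent_u1 TYPE uuid USING parent_u1::uuid;
ALTER TABLE child ADD CONSTRAINT child_parent_u1_fkey
    FOREIGN KEY (parent_u1) REFERENCES parent (u1);
COMMIT;
```

Dropping the constraint first avoids the foreign key ever having to compare a uuid column against a varchar column mid-conversion; re-adding it at the end re-validates the data once both sides are uuid.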
[SQL] Subsorting GROUP BY data
Given the following table:

ID | Cat | Num
---|-----|----
 Z |  A  |  0
 Y |  A  |  1
 X |  A  |  2
 W |  B  |  0
 V |  B  |  1
 U |  B  |  2
 T |  C  |  0
 S |  C  |  1
 R |  C  |  2

I want to do this: group the items by the cat field, then select the ID where the num is the highest in the group; so it should return something like:

Cat | ID | Num
----|----|----
 A  | X  |  2
 B  | U  |  2
 C  | R  |  2

Using SQL like this, I can get the category and the highest # in the category:

SELECT cat, MAX(num) FROM my_table GROUP BY cat;

But if I add the "id" column, of course it doesn't work, since it's not in an aggregate function or in the GROUP BY clause. So I found a post at http://archives.postgresql.org/pgsql-hackers/2006-03/msg01324.php which describes how to add a "FIRST" and "LAST" aggregate function to PGSQL. However, first and last don't seem to help unless you are able to "subsort" the grouping by the # (i.e., group by cat, then subsort on num, and select the "last" one of the group).

I would think something like the following would work, except that PGSQL does not like the SQL generated (it basically says I can't have a GROUP BY after an ORDER BY). And if I move the ORDER BY to the end, that just orders the returned groupings, so that doesn't help me either.

SELECT cat, LAST(id), LAST(num) FROM my_table ORDER BY num GROUP BY cat;

So does anyone know how to sort *within* a grouping so that FIRST and LAST return meaningful results?

Thanks in advance,
Mike Johnson
Re: [SQL] Query optimizing
Sebastian Ritter wrote:
>> Do you have an index on (id, dt_modified) for manage_followup? Can you
>> provide an EXPLAIN ANALYSE for this?
>
> Hi Richard,
>
> Firstly, thank you very much for your swift reply. To answer your
> question, I had not been using an index on dt_modified. I have added it
> now and ran EXPLAIN ANALYSE on the function snippet. I am almost too
> embarrassed to show the result
>
> QUERY PLAN
[snip]
> Total runtime: 412464.804 ms

Something wrong here. I've attached a small script that generates 10,000 issues and 10 follow-ups for each. It then pulls off the most recent follow-ups for all issues occurring on a given date. The explain analyse should show both indexes being used and a runtime of a few milliseconds.

--
Richard Huxton
Archonet Ltd

BEGIN;

CREATE SCHEMA issuefup;
SET search_path = issuefup;

CREATE TABLE issues (
    id            integer,
    dt_created    timestamp(0) with time zone,
    dt_modified   timestamp(0) with time zone,
    t_title       varchar(100),
    t_description text
);

CREATE TABLE followups (
    id            integer,
    dt_created    timestamp(0) with time zone,
    dt_modified   timestamp(0) with time zone,
    t_description text,
    n_issue       integer NOT NULL
);

INSERT INTO issues (id, dt_created, dt_modified, t_title, t_description)
SELECT
    (d*100 + i),
    '2008-01-01'::date + (d * '1 day'::interval),
    '2008-01-01'::date + (d * '1 day'::interval),
    'issue title ' || d || '/' || i,
    'issue description ' || d || '/' || i
FROM generate_series(0,99) AS d, generate_series(0,99) AS i;

INSERT INTO followups (id, dt_created, dt_modified, t_description, n_issue)
SELECT
    (i.id * 10) + d,
    '2008-01-01'::date + ((i.id + d) * '1 day'::interval),
    '2008-01-01'::date + ((i.id + d) * '1 day'::interval),
    'followup description ' || ((i.id * 10) + d),
    i.id
FROM generate_series(0,9) AS d, issues AS i;

ALTER TABLE issues ADD PRIMARY KEY (id);
ALTER TABLE followups ADD PRIMARY KEY (id);
ALTER TABLE followups ADD CONSTRAINT n_issue_fkey
    FOREIGN KEY (n_issue) REFERENCES issues (id);

CREATE INDEX issues_dt_idx ON issues (dt_modified);
CREATE INDEX followups_nissue_dt_idx ON followups (n_issue, dt_modified);

ANALYSE;

EXPLAIN ANALYSE
SELECT fu.*
FROM issues i, followups fu
WHERE i.dt_modified = '2008-01-07 00:00:00+00'
AND fu.id = (
    SELECT f.id
    FROM followups f
    WHERE f.n_issue = i.id
    ORDER BY f.dt_modified DESC
    LIMIT 1
);

ROLLBACK;
Re: [SQL] Subsorting GROUP BY data
If it is to group the items by the cat field, then select the ID where the num is the highest in the group, you could maybe try:

SELECT a.ID, b.Cat, b.Num
FROM my_table a
JOIN (
    SELECT cat, MAX(num) AS maximo
    FROM my_table
    GROUP BY cat
) b ON a.Cat = b.Cat AND a.Num = b.maximo;

It'll probably give what you need (with minor fixes...)

Best,
Oliveiros

----- Original Message -----
From: "Johnson, Michael L." <[EMAIL PROTECTED]>
To:
Sent: Monday, November 10, 2008 2:56 PM
Subject: [SQL] Subsorting GROUP BY data

[snip]
Re: [SQL] Subsorting GROUP BY data
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of
> Johnson, Michael L.
> Sent: Monday, 10 November 2008 12:57
> To: pgsql-sql@postgresql.org
> Subject: [SQL] Subsorting GROUP BY data
>
> Given the following table:
>
> ID | Cat | Num
> ---|-----|----
>  Z |  A  |  0
>  Y |  A  |  1
>  X |  A  |  2
>  W |  B  |  0
>  V |  B  |  1
>  U |  B  |  2
>  T |  C  |  0
>  S |  C  |  1
>  R |  C  |  2
>
> I want to do this: group the items by the cat field, then select the
> ID where the num is the highest in the group.
>
> [snip]

I wonder if this suits you:

SELECT sub.cat, t.id, sub.Num
FROM my_table t,
    (SELECT cat, MAX(num) AS Num
     FROM my_table
     GROUP BY cat
    ) sub
WHERE t.cat = sub.cat
  AND t.Num = sub.Num
ORDER BY t.cat;

Regards,
Fernando.
Re: [SQL] Seq scan on join, not on subselect? analyze this
Bryce,

I think that the difference between the two queries has to do with the way PostgreSQL executes them. In the first, the DBMS:

1. Creates an intermediate result with m x n rows, where m and n are the numbers of rows in the tables being joined.
2. Takes only the rows that have the same word_key.
3. Restricts the result using the WHERE clause.

Note: maybe it uses the WHERE clause on the tables first, just to minimize m and n; I'm not sure about that. Still, it builds a large intermediate result with lots of unwanted rows.

In the second query, the DBMS:

1. Selects in article_words only the rows that match the restriction for that context_key. This results in a much smaller number of rows, k <<< n.
2. Uses those k results and looks up the rows in words whose word_key is in the set produced by the inner query.

That's why you see the difference between the queries' "Total runtime".

Regards
--
Helio Campos Mello de Andrade

On Sun, Nov 2, 2008 at 3:51 PM, Bryce Nesbitt <[EMAIL PROTECTED]> wrote:
> I'm a bit confused why the query planner is not restricting my join, and
> not using the index. Two explain analyze statements follow.
> Why is the second so much better?
>
> lyell5=> select version();
> PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
> 20061115 (prerelease) (Debian 4.1.1-21)
>
> lyell5=> explain analyze select * from article_words join words using
> (word_key) where context_key=535462;
>
>  Hash Join  (cost=192092.90..276920.93 rows=45327 width=17) (actual time=6020.932..60084.817 rows=777 loops=1)
>    Hash Cond: (article_words.word_key = words.word_key)
>    ->  Index Scan using article_word_idx on article_words  (cost=0.00..55960.50 rows=45327 width=8) (actual time=0.031..0.547 rows=777 loops=1)
>          Index Cond: (context_key = 535462)
>    ->  Hash  (cost=93819.62..93819.62 rows=5653462 width=13) (actual time=6020.605..6020.605 rows=5651551 loops=1)
>          ->  Seq Scan on words  (cost=0.00..93819.62 rows=5653462 width=13) (actual time=0.006..2010.962 rows=5651551 loops=1)
>  Total runtime: 60085.616 ms
>
> lyell5=> explain analyze select * from words where word_key in (select
> word_key from article_words where context_key=535462);
>
>  Nested Loop  (cost=56073.81..56091.41 rows=2 width=13) (actual time=0.808..4.723 rows=777 loops=1)
>    ->  HashAggregate  (cost=56073.81..56073.83 rows=2 width=4) (actual time=0.795..1.072 rows=777 loops=1)
>          ->  Index Scan using article_word_idx on article_words  (cost=0.00..55960.50 rows=45327 width=4) (actual time=0.030..0.344 rows=777 loops=1)
>                Index Cond: (context_key = 535462)
>    ->  Index Scan using words_pkey on words  (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=777)
>          Index Cond: (words.word_key = article_words.word_key)
>  Total runtime: 4.936 ms
Re: [SQL] Query optimizing
Hi Sebastian,

- First of all I think there is an open-parenthesis missing in the query V2.

Maybe in the V2 version you could restrict the results in the INNER query a bit more if you use a restriction clause like "WHERE n_issue = i.id" in that. It will certainly lower the number of rows returned by it to only 1 result.

Regards

--
Helio Campos Mello de Andrade

On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I was hoping to receive some advice on a slow running query in our
> business' Issue Tracking System.
>
> [snip]
>
> I was hoping someone could provide a solution that does not require a
> correlated subquery or make use of the DISTINCT keyword. Any help would
> be much appreciated.
>
> Kind regards,
> Sebastian
Re: [SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
Mario Splivalo <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Exactly what version of pg_dump are you using?  What I get from pg_dump
>> doesn't look like that.  Bytea fields with -D look more like this:
>>
>> INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');

> Yes, I mistakenly used pg8.2's pg_dump; when I use pg8.3's pg_dump I get
> what you get.

I was quoting the output of 8.2.latest pg_dump.  Maybe you have a very old subrelease?  But no version of pg_dump would've put an explicit cast to bytea in there.

> Btw, what is that S after 305?

Hex 53 is 'S' I believe.

> 305 octal is C5 hexadecimal.  How do I enter hexadecimal C5 without UTF
> encoding errors?

bytea only supports octal, so \\305 is the way to do it.  The way you were doing it was guaranteed to fail on corner cases such as \0 and \ itself, because you were converting at the string-literal stage, not in byteain().

regards, tom lane
Re: [SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
Richard Huxton <[EMAIL PROTECTED]> writes:
> Mario Splivalo wrote:
>> That's true, but I'd still like to use hexadecimal notation.

You could use decode():

regression=# select decode('c5a4', 'hex');
  decode
----------
 \305\244
(1 row)

regards, tom lane
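[Editor's note: tying decode() back to the INSERT from the original message, the key bytes \xc5\x53\x94\x96\x83\x29 could be supplied in hex like this (same table and values as the failing statement earlier in the thread):]

```sql
-- Sketch: the same INSERT, but producing the bytea value with decode()
-- instead of string-literal escapes, so no UTF8 validation is involved.
INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription,
                                 encription_key, charset, amount_width,
                                 shop_width, counter_width)
VALUES (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4',
        decode('c55394968329', 'hex'),   -- bytes C5 53 94 96 83 29
        'ABCDEGHIKLMOPTWX', 16, 8, 16);
```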
[SQL] inserts within function, within insert
Hello All:

I am hoping to get some guidance from the local gurus before I go off and write this code. Consider this scenario:

I receive a file containing inventory info - to keep it simple, clothing items. Within the file there is info for different locations, so an inventory item can occur more than once; i.e. a certain jacket could be in the inventory at more than one location. It is also possible that an inventory item in the file could be a new item - an item that is not in my item table. When I parse and import the file there will be enough info provided so that I can create the new inventory items.

I would like to write a function that, based upon the info provided for an inventory item, will either return the item's key or, if the item is new, create the item and then return its key.

Say I am doing a simple insert to inventory like the following:

INSERT INTO INVENTORY (itemkey, locationkey, qty)
SELECT getitemkey(itemtype, style, color, size), lockey, qty
FROM IMPORT_INV;

Question: new records added to the ITEM table within getitemkey() - will they be available to the inserts that follow? I.e., if record 2 in IMPORT_INV is for the same clothing item as record 27, and this is a new inventory item, but for different locations, will this new inventory item get added twice to my ITEM table?

Many thanks for considering this issue.

Kevin Duffy
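[Editor's note: a sketch of what such a function might look like in PL/pgSQL. The function and parameter names come from the message above, but the ITEM table's columns and the sequence item_id_seq are assumptions. Because the function is declared VOLATILE and modifies the database, later calls within the same INSERT ... SELECT should see rows inserted by earlier calls, so the duplicate item in record 27 would reuse the key created for record 2 rather than inserting twice - worth verifying on your own version, and concurrent sessions would still need a unique constraint to be safe:]

```sql
-- Sketch only: ITEM's column names and the sequence are assumptions.
CREATE OR REPLACE FUNCTION getitemkey(p_itemtype text, p_style text,
                                      p_color text, p_size text)
RETURNS integer AS $$
DECLARE
    v_key integer;
BEGIN
    -- Look for an existing item matching the imported attributes.
    SELECT itemkey INTO v_key
    FROM item
    WHERE itemtype = p_itemtype AND style = p_style
      AND color = p_color AND size = p_size;

    -- New item: create it and hand back the freshly allocated key.
    IF NOT FOUND THEN
        v_key := nextval('item_id_seq');
        INSERT INTO item (itemkey, itemtype, style, color, size)
        VALUES (v_key, p_itemtype, p_style, p_color, p_size);
    END IF;

    RETURN v_key;
END;
$$ LANGUAGE plpgsql VOLATILE;
```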
Re: [SQL] Subsorting GROUP BY data
Thanks! That's perfect, because now I don't need the FIRST/LAST aggregate functions!

Mike

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Fernando Hevia
Sent: Monday, November 10, 2008 10:30 AM
To: Johnson, Michael L.; pgsql-sql@postgresql.org
Subject: Re: [SQL] Subsorting GROUP BY data

[snip]

> I wonder if this suits you:
>
> SELECT sub.cat, t.id, sub.Num
> FROM my_table t,
>     (SELECT cat, MAX(num) AS Num
>      FROM my_table
>      GROUP BY cat
>     ) sub
> WHERE t.cat = sub.cat
>   AND t.Num = sub.Num
> ORDER BY t.cat;
>
> Regards,
> Fernando.
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Subsorting GROUP BY data
What about replacing the table by

    SELECT * FROM my_table ORDER BY num

i.e. something like

    SELECT cat, LAST(id), LAST(num)
    FROM (SELECT * FROM my_table ORDER BY num) AS foo
    GROUP BY cat;

Hope it works, just guessing it might help :-)

regards
Tomas

> SELECT cat, MAX(num) FROM my_table GROUP BY cat;
>
> But if I add the "id" column, of course it doesn't work, since it's not
> in an aggregate function or in the GROUP BY clause. So I found a post
> at http://archives.postgresql.org/pgsql-hackers/2006-03/msg01324.php
> which describes how to add a "FIRST" and "LAST" aggregate function to
> PGSQL. However, first and last don't seem to help unless you are able
> to "subsort" the grouping by the num (i.e. group by cat, then subsort on
> num, and select the "last" one of the group).
>
> I would think something like the following would work, except that PGSQL
> does not like the SQL generated (it basically says I can't have a
> GROUP BY after an ORDER BY). And if I move the ORDER BY to the end,
> that just orders the returned groupings, so that doesn't help me either.
>
> SELECT cat, LAST(id), LAST(num) FROM my_table ORDER BY num GROUP BY cat;
>
> So does anyone know how to sort *within* a grouping so that FIRST and
> LAST return meaningful results?
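Not raised anywhere in this thread, but PostgreSQL's DISTINCT ON clause is the usual idiom for this top-one-per-group problem and needs neither the self-join nor the FIRST/LAST patch. Table and column names below are the ones from Michael's example:

```sql
-- One row per cat: DISTINCT ON keeps the first row of each group under
-- the ORDER BY, here the row with the highest num.
SELECT DISTINCT ON (cat) cat, id, num
FROM my_table
ORDER BY cat, num DESC;
```

Note that DISTINCT ON is a PostgreSQL extension, so this is not portable to other databases.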
Re: [SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
Mario Splivalo <[EMAIL PROTECTED]> writes:
> I have this issue:
> postgres=# select E'\xc5\x53\x94\x96\x83\x29';
> ERROR: invalid byte sequence for encoding "UTF8": 0xc553

This is expected, since the string is not valid as text.

> I'm using the above mentioned string to store data into a bytea column.
> I did pg_dump of the database on postgres 8.2, and then tried to restore
> it on postgres 8.3, and I got this error. The actual line that produces
> the error is like this:
> INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription,
> encription_key, charset, amount_width, shop_width, counter_width) VALUES
> (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4',
> E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);

Exactly what version of pg_dump are you using? What I get from pg_dump
doesn't look like that. Bytea fields with -D look more like this:

INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');

regards, tom lane
Re: [SQL] Query optimizing
Sebastian Ritter wrote:
> A lot of the reports our technical officers submit to us include a
> listing of all actioned issues for a given day along with the last
> modified followup of each said issue. With the number of rows in our
> database increasing at a high rate, these queries are starting to run
> too slowly.
>
> We have identified that the slowness in our queries is trying to return
> the latest followup for each actioned issue that day. Without further
> ado, here are two variations I have tried within the system (neither of
> which are making the cut):
>
> V1 (correlated subquery - very bad performance)
>
> (SELECT
>      fu.*
>  FROM
>      manage_followup fu,
>      manage_issue i
>  WHERE
>      i.id = fu.n_issue
>      AND fu.id = (SELECT id
>                   FROM manage_followup
>                   WHERE n_issue = i.id
>                   ORDER BY dt_modified DESC
>                   LIMIT 1)) AS latestfu,

Do you have an index on (id, dt_modified) for manage_followup? Can you
provide an EXPLAIN ANALYSE for this?

--
Richard Huxton
Archonet Ltd
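Since the inner subquery searches manage_followup by n_issue and then orders by dt_modified with LIMIT 1, the index that lets it stop after one row is a composite one over exactly those columns. A sketch (the index name is made up; the columns are from the quoted query):

```sql
-- Hypothetical index to serve the correlated subquery in V1.
CREATE INDEX manage_followup_issue_modified
    ON manage_followup (n_issue, dt_modified);

-- Re-check the plan for the inner query in isolation:
EXPLAIN ANALYSE
SELECT id
FROM manage_followup
WHERE n_issue = 42            -- substitute any real issue id
ORDER BY dt_modified DESC
LIMIT 1;
```

With that index in place, the planner can satisfy the ORDER BY ... DESC LIMIT 1 with a backward index scan instead of sorting all followups for the issue.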
Re: [SQL] Query optimizing
Cheers for this Richard. The more I think about it, I believe the join is
being made against ALL issues and followups first and then filtered by my
where clause conditions afterwards. This would incur a scan against all
15,000 issues and 95,000 followups. Set theory tells me that I should not
use the entire issue table but rather the subset of interest, and then
join it to the followup table, instead of joining the two tables and then
filtering the results. I was under the impression that the PostgreSQL
optimizer would have done this logically by itself. Could it have
something to do with the fact that it is a subquery, and thus the planner
cannot deduce filtering conditions from the outer query against it? My
apologies if that made no sense.

In summary, what I'm trying to understand is the following: will there be
a performance difference between filtering query sets first and then
joining them together, as opposed to joining first and then filtering?
Does the optimiser not choose the best course of action either way,
yielding the same result? This might be a stupid question.

Sebastian

On Mon, Nov 10, 2008 at 12:03 PM, Richard Huxton <[EMAIL PROTECTED]> wrote:

> Richard Huxton wrote:
> > Do you have an index on (id, dt_modified) for manage_followup? Can
> > you provide an EXPLAIN ANALYSE for this?
>
> > Hi Richard,
> >
> > Firstly, thank-you very much for your swift reply. To answer your
> > question, I had not been using an index on dt_modified. I have added
> > it now and ran EXPLAIN ANALYSE on the function snippet. I am almost
> > too embarrassed to show the result:
> >
> > QUERY PLAN
> [snip]
> > Total runtime: 412464.804 ms
>
> Something wrong here. I've attached a small script that generates 10,000
> issues and 10 follow-ups for each. It then pulls off the most recent
> follow-ups for all issues occurring on a given date.
>
> The explain analyse should show both indexes being used and a runtime of
> a few milliseconds.
> --
> Richard Huxton
> Archonet Ltd
Re: [SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
Tom Lane wrote:
>> I'm using the above mentioned string to store data into a bytea column.
>> I did pg_dump of the database on postgres 8.2, and then tried to
>> restore it on postgres 8.3, and I got this error. The actual line that
>> produces the error is like this:
>> INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription,
>> encription_key, charset, amount_width, shop_width, counter_width)
>> VALUES (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4',
>> E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);
>
> Exactly what version of pg_dump are you using? What I get from pg_dump
> doesn't look like that. Bytea fields with -D look more like this:
>
> INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');

Yes, I mistakenly used the pg8.2 pg_dump; when I use the pg8.3 dump I get
what you get. Btw, what is that S after 305? 305 octal is C5 hexadecimal.
How do I enter hexadecimal C5 without UTF encoding errors?

Mike
Re: [SQL] Query optimizing
Sebastian Ritter wrote:
> Could it have something to do with the fact that it is a subquery, and
> thus the planner cannot deduce filtering conditions from the outer
> query against it? My apologies if that made no sense.

Could make a difference.

> In summary, what I'm trying to understand is the following: will there
> be a performance difference between filtering query sets first and then
> joining them together, as opposed to joining first and then filtering?
> Does the optimiser not choose the best course of action either way,
> yielding the same result?

There obviously is a performance difference between joining all of the
issues table versus joining 1% of it to followups. In most cases the
planner can push the condition into the subquery, but not in all cases
because:

1. It's not provably correct to do so
2. The planner isn't smart enough to figure out that it can

It's impossible to say which applies to you without knowing the full
query.

--
Richard Huxton
Archonet Ltd
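Nobody in the thread suggests it, but on PostgreSQL the usual single-pass way to fetch the latest followup per issue while filtering the issues first is DISTINCT ON. Table and column names below are from Sebastian's messages; the WHERE condition is a placeholder for whatever "actioned that day" means in the real schema:

```sql
-- Latest followup per issue, restricted to today's issues first.
SELECT DISTINCT ON (fu.n_issue)
       fu.id, fu.dt_created, fu.dt_modified, fu.t_description,
       fu.n_issue AS issue_id
FROM manage_followup fu
JOIN manage_issue i ON i.id = fu.n_issue
WHERE i.dt_modified >= current_date   -- placeholder "actioned today" filter
ORDER BY fu.n_issue, fu.dt_modified DESC;
```

Because DISTINCT ON keeps the first row per n_issue under the ORDER BY, ties on dt_modified (from batch-entered followups) resolve to a single row without a separate DISTINCT pass.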
[SQL] [PERFORM] Can we activate WAL runtime?
Hi All,

I am using postgres 8.2. I want to use Write Ahead Log (WAL) functionality
to take a backup. I know one way to activate WAL archiving is by updating
the postgresql.conf file with an archive_command, but how do I activate it
on the server command line? Can we activate it at runtime?

Thanks in advance.

- Prakash
Re: [SQL] Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns
Mario Splivalo wrote:
> Richard Huxton wrote:
>> Mario Splivalo wrote:
>>> I have this issue:
>>>
>>> postgres=# select E'\xc5\x53\x94\x96\x83\x29';
>>> ERROR: invalid byte sequence for encoding "UTF8": 0xc553
>>
>> I think you want to be using octal escapes. That's text you're
>> generating above.
>>
>> CREATE TABLE bytea_test (b bytea);
>> INSERT INTO bytea_test (b) VALUES (E'\\305\\123\\224\\226');
>> SELECT * FROM bytea_test;
>>        b
>> ---------------
>>  \305S\224\226
>> (1 row)
>
> That's true, but I'd still like to use hexadecimal notation. Manual
> states that I could say '\xC5', but then I get those encoding errors.

I think you're reading the "text" rather than "bytea" part of the manual.

4.1.2.1. String Constants
"...and \xhexdigits, where hexdigits represents a hexadecimal byte value.
(It is your responsibility that the byte sequences you create are valid
characters in the server character set encoding.)"

No mention of hex in the bytea section of the manual.

--
Richard Huxton
Archonet Ltd
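One route to hex input that the thread doesn't mention is the built-in decode() function, which turns a hex string directly into a bytea and so never passes through text-encoding validation at all. The table name is reused from Richard's example above, and the hex digits are the bytes from Mario's original value:

```sql
-- decode(text, 'hex') returns bytea; no string-literal escaping involved,
-- so the invalid-UTF8 error cannot occur.
INSERT INTO bytea_test (b) VALUES (decode('c55394968329', 'hex'));
SELECT decode('c5', 'hex');
```

This keeps the data in the hexadecimal notation Mario wants on the input side, while storage and display remain PostgreSQL's usual bytea escape format.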
[SQL] Measuring degradation of CLUSTER INDEX operation
I've got a table for which "CLUSTER tablename USING index" makes an order
of magnitude difference.

Are there ways to determine how "unclustered" this table becomes over
time, so I can schedule downtime to recluster? I could use the pg_stat
tables, but this seems awkward.

-Bryce

NB: For manual optimization work, it would be handy to have a feature in
"ANALYZE VERBOSE" which gives a measure from 0-100%, right next to the
"rows examined". 100% would be an optimally clustered result; 0% would
mean every row came from a distinct block on disc.

Related links:
http://www.postgresql.org/docs/8.3/static/sql-cluster.html
http://www.postgresonline.com/journal/index.php?/archives/10-How-does-CLUSTER-ON-improve-index-performance.html
Re: [SQL] Measuring degradation of CLUSTER INDEX operation
On Mon, Nov 10, 2008 at 12:54 PM, Bryce Nesbitt <[EMAIL PROTECTED]> wrote:
> I've got a table for which "CLUSTER tablename USING index" makes an
> order of magnitude difference.
>
> Are there ways to determine how "unclustered" this table becomes over
> time, so I can schedule downtime to recluster? I could use the pg_stat
> tables, but this seems awkward.

You should be able to run ANALYZE, then:

    SELECT correlation FROM pg_stats
    WHERE schemaname = 'yourschemaname' AND tablename = 'yourtablename';

The closer you are to 1.0, the better the clustering.
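A slightly fuller version of that check (an elaboration, not from the thread: pg_stats has one row per column, and correlation is tracked per column, so in practice you look at the column the CLUSTER index leads with; schema, table, and index names here are placeholders):

```sql
ANALYZE mytable;   -- refresh the statistics first

SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'mytable';

-- Correlation near +1.0 (or -1.0 for a reverse-ordered column) means the
-- heap order closely follows that column. When the value for the leading
-- index column has drifted toward 0, it may be time to schedule:
-- CLUSTER mytable USING myindex;
```

This makes the reclustering decision a simple threshold check that can be scripted into a monitoring job.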
Re: [SQL] [PERFORM] Can we activate WAL runtime?
prakash wrote:
> Hi All,
>
> I am using postgres 8.2. I want to use Write Ahead Log (WAL)
> functionality to take a backup. I know one way to activate WAL is
> through updating the postgresql.conf file with archive_command, but how
> do I activate it on the server command line? Can we activate it at
> runtime?

You may be able to reload the server (/etc/init.d/postgresql reload or
pg_ctl reload) to activate WAL archiving. However, this thread suggests
that it is, or was, only settable at server restart:

http://www.nabble.com/Enabling-archive_mode-without-restart-td20267884.html

It appears to depend on version, in that there are suggestions that in 8.2
(which you're using) it can be set during a reload. There may be reasons
why that was removed, though, so I wouldn't do anything without at least
testing on a spare machine - and preferably without waiting for someone
with a clue to chime in here. Hmm, like Tom Lane did in that thread
referenced above:

"... and affects a whole bunch of other behaviors too, in processes all
across the database that could not possibly adopt a new setting
synchronously. That's exactly why it was made a postmaster-start option to
begin with. Simon's given no argument at all why it would be safe to flip
it on-the-fly."

Again, though, that may be new in 8.3; I really would wait for some
confirmation.

--
Craig Ringer
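For concreteness, the restart-versus-reload split being discussed looks like this in an 8.3 postgresql.conf (a sketch: the archive path is a placeholder, and note that 8.2 has no archive_mode setting at all - there, providing a non-empty archive_command is what enables archiving):

```
# postgresql.conf, PostgreSQL 8.3
archive_mode = on                               # requires a full server restart
archive_command = 'cp %p /path/to/archive/%f'   # placeholder path; changeable
                                                # with a reload (pg_ctl reload)
```

Splitting the on/off switch from the command is what lets 8.3 treat archive_mode as postmaster-start-only while still allowing the archive destination to change on the fly.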