Re: [PERFORM] Forcing the use of particular execution plans
Thanks Tom. The time difference did distract me from the issue. Switching Seq Scan off reduced the runtime greatly, so I am now adjusting the effective_cache_size and random_page_cost settings to favor indexes over Seq Scans.

Regards,
Tim

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 3 October 2006 1:50 PM
To: Tim Truman
Cc: 'Dave Dutcher'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Forcing the use of particular execution plans

Tim Truman [EMAIL PROTECTED] writes:
> Here is an explain analyze for the query that performs slowly,

This shows that the planner is exactly correct in thinking that all the
runtime is going into the seqscan on transaction:

 Aggregate (cost=88256.32..88256.32 rows=1 width=0) (actual time=55829.000..55829.000 rows=1 loops=1)
 ...
   -> Seq Scan on transaction t (cost=0.00..87061.04 rows=1630 width=349) (actual time=234.000..55797.000 rows=200 loops=1)
        Filter: ((transaction_date >= '2005-01-01'::date) AND (transaction_date <= '2006-09-25'::date) AND ((credit_card_no)::text ~~ '4564%549'::text))

Since that component of the plan was identical in your two original plans
(desired and undesired), it seems pretty clear that you have not correctly
identified what your problem is.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [PERFORM] High CPU Load
Hi All,

Replying to myself: we solved the CPU load problem. An external batch job was
using an expensive SQL view and taking 99% of the CPU.

Thanks all for your help!

---

I started the HAPlatform open-source project, part of the Share'nGo Project;
its goal is to define all the documentation and scripts required to install
and maintain a highly available platform. Two platforms are targeted:

* LAPJ : Linux Apache PostgreSQL Java
* LAMP : Linux Apache MySQL PHP

The first document is here (it's my postgres configuration):
http://sharengo.org/haplatform/docs/PostgreSQL/en/html_single/index.html

Cheers,
Jérôme.
--
Open-Source : http://www.sharengo.org
Corporate : http://www.argia-engineering.fr

On Friday, 22 September 2006 at 09:43 +0200, Jérôme BENOIS wrote:
> Hi, Markus,
>
> On Tuesday, 19 September 2006 at 15:09 +0200, Markus Schaber wrote:
> > Hi, Jerome,
> >
> > Jérôme BENOIS wrote:
> > > Now I have 335 concurrent connections. I decreased the work_mem
> > > parameter to 32768 and disabled Hyper-Threading in the BIOS, but my
> > > CPU load is still very high.
> >
> > What are your settings for commit_siblings and commit_delay?
>
> The defaults:
>
> #commit_delay = 0      # range 0-100000, in microseconds
> #commit_siblings = 5   # range 1-1000
>
> > You should uncomment them, and play with different settings. I'd try a
> > commit_delay of 100 and commit_siblings of 5 to start with.
> >
> > > I plan to return to the previous version, 7.4.6, and I will reinstall
> > > everything on a dedicated server in order to reproduce and solve the
> > > problem.
> >
> > You should use at least 7.4.13, as it fixes some critical bugs that were
> > in 7.4.6. They use the same on-disk format and query planner logic, so
> > they should not have any difference.
> >
> > I don't have many more ideas what the problem could be. Can you try to
> > do some profiling (e. g. with statement logging) to see which specific
> > statements are the ones that cause high CPU load?
> >
> > Are there other differences (besides the PostgreSQL version) between the
> > two installations? (Kernel, libraries, other software...)
>
> Nothing.
>
> I returned to the previous version, 7.4.6, on my production server; it
> works fine! I plan to reproduce this problem on a dedicated server, and I
> will send all the information to this list next week. I hope for your help
> in solving this problem.
>
> Cheers,
> Jérôme.

HTH,
Markus
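Markus's suggested starting point above can be written as a postgresql.conf fragment; the values are his suggestion to experiment with, not measured optima:

```
commit_delay = 100     # microseconds to wait for sibling commits before flushing WAL
commit_siblings = 5    # minimum concurrent open transactions for the delay to apply
```

Whether this helps depends on how many backends are actually committing concurrently; with few concurrent committers the delay only adds latency.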
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
> > Did you think about putting the whole data into PostgreSQL using COPY in
> > a nearly unprocessed manner, indexing it properly, and then using SQL
> > and stored functions to transform the data inside the database to the
> > desired result?
>
> This is actually what we are doing. The slowness is in the row-by-row
> transformation. Every row requires that all the inserts and updates of the
> previous row be committed - that's why we have had problems figuring out
> how to do this using SQL set logic.

Maybe group by, order by, distinct on, and hand-written functions and
aggregates (like first() or best()) may help.

You could combine all relevant columns into a user-defined compound type,
then group by entity, and have a self-defined aggregate generate the
accumulated tuple for each entity.

Markus
--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
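Markus's compound-type-plus-aggregate idea can be sketched outside the database. The Python fragment below (illustrative names and toy data, not the poster's schema) shows the shape of the per-entity reduction that a self-defined best() aggregate would perform in SQL:

```python
from itertools import groupby
from operator import itemgetter

# Toy rows: (entity_id, match_score, value). A hand-written "best()"
# aggregate keeps the highest-scoring tuple for each entity, instead of
# processing and committing row by row.
rows = [(1, 0.4, "a"), (1, 0.9, "b"), (2, 0.7, "c"), (2, 0.2, "d")]

best_per_entity = {
    key: max(group, key=itemgetter(1))          # pick best tuple in group
    for key, group in groupby(sorted(rows), key=itemgetter(0))
}
print(best_per_entity)  # {1: (1, 0.9, 'b'), 2: (2, 0.7, 'c')}
```

In PostgreSQL the same grouping would be expressed with GROUP BY over the entity key and a CREATE AGGREGATE whose state-transition function keeps the better of two tuples.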
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
> > Trying to achieve a high level of data quality in one large project is
> > not often possible. Focus on the most critical areas of checking and get
> > that working first with acceptable performance, then layer on additional
> > checks while tuning. The complexity of the load programs you have also
> > means they are susceptible to introducing data quality problems rather
> > than removing them, so an incremental approach will also aid debugging
> > of the load suite.
>
> I couldn't agree more.

I still think that using a PL in the backend might be more performant than
an external client, if only because the SPI interface is more efficient than
the network serialization needed by external applications.

HTH,
Markus
[PERFORM] Performance Optimization for Dummies 2 - the SQL
Some very helpful people had asked that I post the troublesome code that was
generated by my import program. I installed a SQL log feature in my import
program, and I have posted samples of the SQL statements that cause the
biggest delays.

Thanks for all of your help.

Carlo

-- Sample 1: This one is very expensive on my system.
select
   f.facility_id,
   provider_practice_id
from mdx_core.provider_practice as pp
join mdx_core.facility as f
   on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
   on fa.facility_id = pp.facility_id
join mdx_core.address as a
   on a.address_id = fa.address_id
where pp.provider_id = 1411311
   and f.facility_type_code != 'P'
   and (
      pp.facility_address_id is not null
      and a.state_code = 'NY'
      and '10001-2382' = a.postal_code||'%'
      and a.city = 'New York'
   ) or (
      f.default_state_code = 'NY'
      and '10001-2382' like f.default_postal_code||'%'
      and f.default_city = 'New York'
   )
limit 1

Limit (cost=3899.18..32935.21 rows=1 width=8)
  -> Hash Join (cost=3899.18..91007.27 rows=3 width=8)
       Hash Cond: ("outer".address_id = "inner".address_id)
       Join Filter: ((("outer".provider_id = 1411311) AND ("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND ('10001-2382'::text = (("inner".postal_code)::text || '%'::text)) AND (("inner".city)::text = 'New York'::text)) OR (("outer".default_state_code = 'NY'::bpchar) AND ('10001-2382'::text ~~ (("outer".default_postal_code)::text || '%'::text)) AND (("outer".default_city)::text = 'New York'::text)))
       -> Merge Join (cost=0.00..50589.20 rows=695598 width=57)
            Merge Cond: ("outer".facility_id = "inner".facility_id)
            -> Merge Join (cost=0.00..16873.90 rows=128268 width=49)
                 Merge Cond: ("outer".facility_id = "inner".facility_id)
                 -> Index Scan using facility_pkey on facility f (cost=0.00..13590.18 rows=162525 width=41)
                 -> Index Scan using facility_address_facility_idx on facility_address fa (cost=0.00..4254.46 rows=128268 width=8)
            -> Index Scan using provider_practice_facility_idx on provider_practice pp (cost=0.00..28718.27 rows=452129 width=16)
       -> Hash (cost=3650.54..3650.54 rows=99454 width=36)
            -> Seq Scan on address a (cost=0.00..3650.54 rows=99454 width=36)

-- Sample 2: This one includes a call to a custom function which performs
-- lexical comparisons and returns a rating on the likelihood that the
-- company names refer to the same facility. Replacing the code:
--    mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as comp
-- with
--    1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
-- which leads me to believe my problem is in the SQL structure itself.
select
   mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as comp,
   facil.*
from (
   select
      f.facility_id,
      fa.facility_address_id,
      a.address_id,
      f.facility_type_code,
      f.name,
      a.address,
      a.city,
      a.state_code,
      a.postal_code,
      a.country_code
   from mdx_core.facility as f
   join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   join mdx_core.address as a
      on a.address_id = fa.address_id
   where facility_address_id is not null
      and a.country_code = 'US'
      and a.state_code = 'IL'
      and '60640-5759' like a.postal_code||'%'
   union
   select
      f.facility_id,
      null as facility_address_id,
      null as address_id,
      f.facility_type_code,
      f.name,
      null as address,
      f.default_city as city,
      f.default_state_code as state_code,
      f.default_postal_code as postal_code,
      f.default_country_code as country_code
   from mdx_core.facility as f
   left outer join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   where facility_address_id is null
      and f.default_country_code = 'US'
      and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort (cost=20595.92..20598.01 rows=834 width=236)
  Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc'::text, (name)::text)
  -> Subquery Scan facil (cost=20522.10..20555.46 rows=834 width=236)
       -> Unique (cost=20522.10..20545.03 rows=834 width=103)
            -> Sort (cost=20522.10..20524.18 rows=834 width=103)
                 Sort Key: facility_id, facility_address_id, address_id, facility_type_code, name, address, city, state_code, postal_code, country_code
                 -> Append (cost=4645.12..20481.63 rows=834 width=103)
                      -> Nested Loop (cost=4645.12..8381.36 rows=21 width=103)
                           -> Hash Join (cost=4645.12..8301.35 rows=21 width=72)
                                Hash Cond: ("outer".address_id = "inner".address_id)
                                -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12)
                                     Filter: (facility_address_id IS NOT NULL)
Re: [PERFORM] Performance Optimization for Dummies
> I still think that using a PL in the backend might be more performant than
> having an external client, if only because the SPI interface is more
> efficient than the network serialization for external applications.

I would actually love for this to work better, as this is technology that I
would like to develop in general - I see db servers with strong server-side
programming languages as being able to operate as application servers, with
the enterprise's business logic centralised on the server. The import
routine that I wrote will actually work on the server as well - it will
detect the presence of the spi_* calls and replace the pg_* calls with spi_*
calls. So, you see, this WAS my intention.

However, the last time I tried to run something that complex from the db
server, it ran quite slowly compared to from a client. This may have had
something to do with the client that I used to call the stored procedure - I
thought that perhaps the client created an implicit transaction around my
SQL statement to allow a rollback, and all of the updates and inserts got
backed up in a massive transaction queue that took forever to commit.

Carlo
Re: [PERFORM] Performance Optimization for Dummies
> Maybe group by, order by, distinct on, and hand-written functions and
> aggregates (like first() or best()) may help.

We use these - we have lexical analysis functions which assign a rating to
each row in a set on the likelihood that the data is a match, and then we
sort our results. I thought this would be the cause of the slowdowns - and
it is, but only a very small part of it. I have identified the problem code,
and the problems are within some very simple joins. I have posted the code
under a related topic header. I obviously have a few things to learn about
optimising SQL joins.

Carlo

> You could combine all relevant columns into a user-defined compound type,
> then group by entity, and have a self-defined aggregate generate the
> accumulated tuple for each entity.
>
> Markus
[PERFORM] Poor performance on very simple query ?
Hi List!

I have a performance problem, but I am not sure whether it really is a
problem or not. I am running a fresh install of PostgreSQL 8.1.4 on
Windows 2000. The server is a dual-Opteron with 2GB of RAM. PostgreSQL's
data folder is on a RAID-0 array of 2 SATA WD Raptor drives (10,000 rpm,
8MB cache).

I have a very simple table, with only ~500 rows:

CREATE TABLE table1
(
  gid int4 NOT NULL DEFAULT 0,
  field1 varchar(45) NOT NULL,
  field2 int2 NOT NULL DEFAULT 1,
  field3 int2 NOT NULL DEFAULT 0,
  field4 int2 NOT NULL DEFAULT 1,
  field5 int4 NOT NULL DEFAULT -1,
  field6 int4,
  field7 int4,
  field8 int4,
  field9 int2 DEFAULT 1,
  CONSTRAINT table1_pkey PRIMARY KEY (gid)
) WITHOUT OIDS;

The problem is that simple select queries with the primary key in the WHERE
clause take very long to run. For example, this query returns only 7 rows
and takes about 1 second to run!

SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);

EXPLAIN ANALYZE SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);
                                    QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on table1 (cost=0.00..23.69 rows=10 width=35) (actual time=0.023..0.734 rows=7 loops=1)
   Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84) OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid = 68) OR (gid = 41))
 Total runtime: 0.801 ms
(3 rows)

I have run VACUUM FULL on this table many times... I don't know what to try
next! What is wrong here (because I hope that something is wrong)?

Thanks a lot for your help!

Regards
--
Arnaud
Re: [PERFORM] Poor performance on very simple query ?
Steinar H. Gunderson wrote:
> > Total runtime: 0.801 ms
>
> 0.801 ms is _far_ under a second... Where do you have the latter timing
> from?

I feel stupid... Sorry for the useless message...
Re: [PERFORM] Poor performance on very simple query ?
On Oct 3, 2006, at 13:25, Arnaud Lesauvage wrote:
> The problem is that simple select queries with the primary key in the
> WHERE clause take very long to run. For example, this query returns only
> 7 rows and takes about 1 second to run!
>
> SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);

This is a very small table, but generally speaking, such queries benefit
from an index; e.g.,

  create index table1_gid on table1 (gid);

Note that PostgreSQL may still perform a sequential scan if it thinks this
has a lower cost, e.g. for small tables that span just a few pages.

> I have run VACUUM FULL on this table many times... I don't know what to
> try next!

PostgreSQL's query planner relies on table statistics to perform certain
optimizations; make sure you run "analyze table1".

Alexander.
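The index-versus-scan behaviour Alexander describes can be seen in a tiny self-contained sketch; Python's sqlite3 is used here only for portability (PostgreSQL's planner differs in detail, but the trade-off is analogous):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (gid INTEGER, field1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, "x") for i in range(500)])  # ~500 rows, as in the post

def plan(sql):
    # Collapse the planner's output rows into one string for inspection.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM table1 WHERE gid IN (33,110,65,84,92,94,13,7,68,41)"
before = plan(q)                                   # full table scan
conn.execute("CREATE INDEX table1_gid ON table1 (gid)")
after = plan(q)                                    # index search
print(before)
print(after)
```

Before the index the plan reports a scan of table1; afterwards it reports a search using table1_gid. A planner may still prefer the scan when the table fits in a few pages, which is exactly the situation in the original post.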
Re: [PERFORM] Poor performance on very simple query ?
[Arnaud Lesauvage - Tue at 01:25:10PM +0200]
> I have a performance problem, but I am not sure whether it really is a
> problem or not.
>
>                                     QUERY PLAN
> ------------------------------------------------------------------------------
>  Seq Scan on table1 (cost=0.00..23.69 rows=10 width=35) (actual time=0.023..0.734 rows=7 loops=1)
>    Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84) OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid = 68) OR (gid = 41))
>  Total runtime: 0.801 ms
> (3 rows)
>
> I have run VACUUM FULL on this table many times... I don't know what to
> try next! What is wrong here (because I hope that something is wrong)?
> Thanks a lot for your help!

Did you try analyze as well? It's weird that it's using a seq scan; since
you have a primary key, it's supposed to have an index ... though 500 rows
is little. I just checked our own production database; it takes 0.08 ms to
fetch a row by ID from one of our tables containing 176k rows.
Re: [PERFORM] Poor performance on very simple query ?
[Tobias Brox - Tue at 02:10:04PM +0200]
> Did you try analyze as well? It's weird that it's using a seq scan; since
> you have a primary key, it's supposed to have an index ... though 500 rows
> is little. I just checked our own production database; it takes 0.08 ms to
> fetch a row by ID from one of our tables containing 176k rows.

Oh, the gid is not the primary key. I guess I should also apologize for
adding noise here :-) Make an index here! :-)
Re: [PERFORM] Poor performance on very simple query ?
Tobias Brox wrote:
> Oh, the gid is not the primary key. I guess I should also apologize for
> adding noise here :-)

Yes, it is a primary key, but I am the noise maker here! ;-)
Re: [PERFORM] Poor performance on very simple query ?
[Arnaud Lesauvage - Tue at 02:13:59PM +0200]
> Tobias Brox wrote:
> > Oh, the gid is not the primary key. I guess I should also apologize for
> > adding noise here :-)
>
> Yes, it is a primary key, but I am the noise maker here! ;-)

Oh - it is. How can you have a default value on a primary key? Will it use
the index if you do analyze? Is there an index on the table at all; do you
see it if you ask for a description of the table (\d tablename)?
Re: [PERFORM] Poor performance on very simple query ?
Tobias Brox wrote:
> Oh - it is. How can you have a default value on a primary key?

Good question, but I am not the DB designer in that case.

> Will it use the index if you do analyze? Is there an index on the table
> at all; do you see it if you ask for a description of the table
> (\d tablename)?

In this case (a simplified version of the real case), the pkey is the only
index. It is used if I only ask for one row (WHERE gid=33).
Re: [PERFORM] Poor performance on very simple query ?
Tobias Brox <tobias 'at' nordicbet.com> writes:
> Oh - it is. How can you have a default value on a primary key? Will it

You can, but it is useless :)

foo=# create table bar (uid int primary key default 0, baz text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "bar_pkey" for table "bar"
CREATE TABLE
foo=# insert into bar (baz) values ('');
INSERT 217426996 1
foo=# insert into bar (baz) values ('');
ERROR: duplicate key violates unique constraint "bar_pkey"

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
Re: [PERFORM] Unsubscribe
uwcssa wrote:
> Please unsubscribe me! Thank you! Also, it would be better to have a
> message footer saying how to unsubscribe.

It would be better if you had paid attention when you subscribed as to how
to unsubscribe.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:
> Some very helpful people had asked that I post the troublesome code that
> was generated by my import program. I installed a SQL log feature in my
> import program. I have posted samples of the SQL statements that cause
> the biggest delays.

explain analyze is more helpful because it prints the times.

sample 1, couple questions: what is the purpose of limit 1? if you break up
the 'or' which checks facility and address into two separate queries, are
the two queries' total times more, less, or the same as the large query?

merlin
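Merlin's break-up-the-OR suggestion can be sketched generically: an OR whose two arms constrain different tables often plans better as a UNION of two narrow queries. A toy illustration, using Python's sqlite3 and hypothetical tables (not the poster's actual schema), showing the two forms return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (facility_id INTEGER, default_city TEXT);
CREATE TABLE address  (facility_id INTEGER, city TEXT);
INSERT INTO facility VALUES (1, 'New York'), (2, 'Boston'), (3, 'Chicago');
INSERT INTO address  VALUES (2, 'New York'), (3, 'Denver');
""")

# Single query: the OR spans two tables, so the planner must consider both
# arms inside one join.
or_rows = conn.execute("""
    SELECT f.facility_id
    FROM facility f LEFT JOIN address a ON a.facility_id = f.facility_id
    WHERE a.city = 'New York' OR f.default_city = 'New York'
""").fetchall()

# Equivalent rewrite: two narrow queries, each using its own table's
# predicate, combined (and deduplicated) with UNION.
union_rows = conn.execute("""
    SELECT f.facility_id
    FROM facility f JOIN address a ON a.facility_id = f.facility_id
    WHERE a.city = 'New York'
    UNION
    SELECT facility_id FROM facility WHERE default_city = 'New York'
""").fetchall()

print(sorted(or_rows), sorted(union_rows))
```

Each arm of the UNION can use an index on its own predicate, which is often why the two separate queries come out cheaper than the combined OR.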
Re: [PERFORM] Poor performance on very simple query ?
On October 3, 2006 04:25 am, Arnaud Lesauvage wrote:
> The problem is that simple select queries with the primary key in the
> WHERE clause take very long to run. For example, this query returns only
> 7 rows and takes about 1 second to run!

According to your explain analyze, it's taking 0.8 of a millisecond (less
than one 1000th of a second), so I can't see how this can possibly be sped
up.

> EXPLAIN ANALYZE SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);
>  Seq Scan on table1 (cost=0.00..23.69 rows=10 width=35) (actual time=0.023..0.734 rows=7 loops=1)
>    Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84) OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid = 68) OR (gid = 41))
>  Total runtime: 0.801 ms
> (3 rows)

--
Darcy Buskermolen
Command Prompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [PERFORM] Poor performance on very simple query ?
On October 3, 2006 05:08 am, Alexander Staubo wrote:
> This is a very small table, but generally speaking, such queries benefit
> from an index; e.g.,
>
>   create index table1_gid on table1 (gid);

gid is a PRIMARY KEY, so it will already have an index in place.

> Note that PostgreSQL may still perform a sequential scan if it thinks
> this has a lower cost, e.g. for small tables that span just a few pages.
>
> PostgreSQL's query planner relies on table statistics to perform certain
> optimizations; make sure you run "analyze table1".
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 3 Oct 2006, at 16:04, Merlin Moncure wrote:
> explain analyze is more helpful because it prints the times.

You can always use the \timing flag in psql ;)

l1_historical=# \timing
Timing is on.
l1_historical=# select 1;
 ?column?
----------
        1
(1 row)

Time: 4.717 ms
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
> explain analyze is more helpful because it prints the times.

Sorry, this runs in-line in my code, and I didn't want to slow the
already-slow program with explain analyze. I have run it outside of the
code in its own query. The new results are below.

> sample 1, couple questions: what is the purpose of limit 1?

I don't need to know the results, I just need to know whether any data
which meets this criteria exists.

> if you break up the 'or' which checks facility and address into two
> separate queries, are the two queries' total times more, less, or the
> same as the large query?

They are much less; I had assumed that SQL would use lazy evaluation in
this case, not bothering to perform one half of the OR condition if the
other half had already decided the outcome. But the single query is much
heavier than the two separate ones.

Carlo

select
   f.facility_id,
   provider_practice_id
from mdx_core.provider_practice as pp
join mdx_core.facility as f
   on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
   on fa.facility_id = pp.facility_id
join mdx_core.address as a
   on a.address_id = fa.address_id
where pp.provider_id = 1411311
   and f.facility_type_code != 'P'
   and (
      pp.facility_address_id is not null
      and a.state_code = 'NY'
      and '10001-2382' = a.postal_code||'%'
      and a.city = 'New York'
   ) or (
      f.default_state_code = 'NY'
      and '10001-2382' like f.default_postal_code||'%'
      and f.default_city = 'New York'
   )
limit 1

Limit (cost=3899.18..22561.46 rows=1 width=8) (actual time=9410.970..9410.970 rows=0 loops=1)
  -> Hash Join (cost=3899.18..97210.58 rows=5 width=8) (actual time=9410.966..9410.966 rows=0 loops=1)
       Hash Cond: ("outer".address_id = "inner".address_id)
       Join Filter: ((("outer".provider_id = 1411311) AND ("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND ('10001-2382'::text = (("inner".postal_code)::text || '%' (..)
       -> Merge Join (cost=0.00..51234.97 rows=801456 width=57) (actual time=0.314..6690.241 rows=685198 loops=1)
            Merge Cond: ("outer".facility_id = "inner".facility_id)
            -> Merge Join (cost=0.00..15799.46 rows=128268 width=49) (actual time=0.197..1637.553 rows=128268 loops=1)
                 Merge Cond: ("outer".facility_id = "inner".facility_id)
                 -> Index Scan using facility_pkey on facility f (cost=0.00..13247.94 rows=176864 width=41) (actual time=0.145..591.219 rows=126624 loops=1)
                 -> Index Scan using facility_address_facility_idx on facility_address fa (cost=0.00..4245.12 rows=128268 width=8) (actual time=0.041..384.632 rows=128268 loops=1)
            -> Index Scan using provider_practice_facility_idx on provider_practice pp (cost=0.00..30346.89 rows=489069 width=16) (actual time=0.111..3031.675 rows=708714 loops=1)
       -> Hash (cost=3650.54..3650.54 rows=99454 width=36) (actual time=478.509..478.509 rows=99454 loops=1)
            -> Seq Scan on address a (cost=0.00..3650.54 rows=99454 width=36) (actual time=0.033..251.203 rows=99454 loops=1)
Total runtime: 9412.654 ms

-- Sample 2: This one includes a call to a custom function which performs
-- lexical comparisons and returns a rating on the likelihood that the
-- company names refer to the same facility. Replacing the code:
--    mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as comp
-- with
--    1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
-- which leads me to believe my problem is in the SQL structure itself.
select
   mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as comp,
   facil.*
from (
   select
      f.facility_id,
      fa.facility_address_id,
      a.address_id,
      f.facility_type_code,
      f.name,
      a.address,
      a.city,
      a.state_code,
      a.postal_code,
      a.country_code
   from mdx_core.facility as f
   join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   join mdx_core.address as a
      on a.address_id = fa.address_id
   where facility_address_id is not null
      and a.country_code = 'US'
      and a.state_code = 'IL'
      and '60640-5759' like a.postal_code||'%'
   union
   select
      f.facility_id,
      null as facility_address_id,
      null as address_id,
      f.facility_type_code,
      f.name,
      null as address,
      f.default_city as city,
      f.default_state_code as state_code,
      f.default_postal_code as postal_code,
      f.default_country_code as country_code
   from mdx_core.facility as f
   left outer join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   where facility_address_id is null
      and f.default_country_code = 'US'
      and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort (cost=21565.50..21567.78 rows=909 width=236) (actual time=1622.448..1622.456 rows=12 loops=1)
  Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc'::text, (name)::text)
  -> Subquery Scan facil (cost=21484.47..21520.83 rows=909
Re: [PERFORM] Unsubscribe
On Mon, Oct 02, 2006 at 01:36:17PM -0400, uwcssa wrote:
> Please unsubscribe me! Thank you! Also, it would be better to have a
> message footer saying how to unsubscribe.

Will this do? It's too big for a footer.

Here's how to unsubscribe: First, ask your Internet Provider to mail you an
Unsubscribing Kit. Then follow these directions.

The kit will most likely be the standard no-fault type. Depending on
requirements, System A and/or System B can be used. When operating System A,
depress lever and a plastic dalkron unsubscriber will be dispensed through
the slot immediately underneath. When you have fastened the adhesive lip,
attach connection marked by the large X outlet hose. Twist the
silver-coloured ring one inch below the connection point until you feel it
lock.

The kit is now ready for use. The Cin-Eliminator is activated by the small
switch on the lip. When securing, twist the ring back to its initial
condition, so that the two orange lines meet. Disconnect. Place the dalkron
unsubscriber in the vacuum receptacle to the rear. Activate by pressing the
blue button.

The controls for System B are located on the opposite side. The red release
switch places the Cin-Eliminator into position; it can be adjusted manually
up or down by pressing the blue manual release button. The opening is
self-adjusting. To secure after use, press the green button, which
simultaneously activates the evaporator and returns the Cin-Eliminator to
its storage position.

You may log off if the green exit light is on over the evaporator. If the
red light is illuminated, one of the Cin-Eliminator requirements has not
been properly implemented. Press the List Guy call button on the right of
the evaporator. He will secure all facilities from his control panel.

To use the Auto-Unsub, first undress and place all your clothes in the
clothes rack. Put on the velcro slippers located in the cabinet immediately
below. Enter the shower, taking the entire kit with you.

On the control panel to your upper right upon entering you will see a
"Shower seal" button. Press to activate. A green light will then be
illuminated immediately below. On the intensity knob, select the desired
setting. Now depress the Auto-Unsub activation lever. Bathe normally.

The Auto-Unsub will automatically go off after three minutes unless you
activate the "Manual off" override switch by flipping it up. When you are
ready to leave, press the blue "Shower seal" release button. The door will
open and you may leave. Please remove the velcro slippers and place them in
their container.

If you prefer the ultrasonic log-off mode, press the indicated blue button.
When the twin panels open, pull forward by rings A & B. The knob to the
left, just below the blue light, has three settings: low, medium or high.
For normal use, the medium setting is suggested.

After these settings have been made, you can activate the device by
switching the clearly marked red switch to the ON position. If during the
unsubscribing operation you wish to change the settings, place the manual
off override switch in the OFF position. You may now make the change and
repeat the cycle. When the green exit light goes on, you may log off and
have lunch. Please close the door behind you.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / [EMAIL PROTECTED]
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [PERFORM] Unsubscribe
I got one of these last Christmas. It works great, but the device has no obvious power source and now I can't find my cat. God help me when I accidentally try to unsubscribe like that ..

Carlo

[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Mon, Oct 02, 2006 at 01:36:17PM -0400, uwcssa wrote: Please unsubscribe me! Thank you! Also, it would be better to have a message foot saying how to unsubscribe. Will this do? It's too big for a footer. Here's how to unsubscribe: First, ask your Internet Provider to mail you an Unsubscribing Kit. [remainder of quoted unsubscribe instructions snipped]

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
Please ignore sample 1 - now that I have the logging feature, I can see that my query generator algorithm made an error. The SQL of concern is now script 2. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] BUG #2658: Query not using index
Hi,

Adding DESC to both columns in the ORDER BY did not make the query use the multikey index. So both

SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC;

and

SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid DESC, ts DESC;

use the same query plans, and both do sequential scans without using either the (assetid, ts) or (ts) indexes. Any other ideas on how to make this query use an index? Thanks,

-- Graham Davis Refractions Research Inc. [EMAIL PROTECTED]

On Wed, Sep 27, 2006 at 20:56:32 +, Graham Davis [EMAIL PROTECTED] wrote: SELECT assetid, max(ts) AS ts FROM asset_positions GROUP BY assetid; I have an index on (ts), another index on (assetid) and a multikey index on (assetid, ts). I know the assetid index is pointless since the multikey one takes its place, but I put it there while testing just to make sure. The EXPLAIN ANALYZE for this query is:

QUERY PLAN
-------------------------------------------------------------------
HashAggregate (cost=125423.96..125424.21 rows=20 width=12) (actual time=39693.995..39694.036 rows=20 loops=1)
  ->  Seq Scan on asset_positions (cost=0.00..116654.64 rows=1753864 width=12) (actual time=20002.362..34724.896 rows=1738693 loops=1)
Total runtime: 39694.245 ms
(3 rows)

You can see it is doing a sequential scan on the table when it should be using the (assetid, ts) index, or at the very least the (ts) index. This query takes about 40 seconds to complete with a table of 1.7 million rows. I tested running the query without the group by as follows: SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC;

This is almost what you want to do to get an alternative plan. But you need to ORDER BY assetid DESC, ts DESC to make use of the multicolumn index. If you really need the other output order, reverse it in your application or use the above as a subselect in another query that orders by assetid ASC. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
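Bruno's suggestion could be sketched as follows (a sketch only, using the table and (assetid, ts) index from this thread; whether the planner actually chooses a backward index scan here depends on the PostgreSQL version and cost settings):

```sql
-- Match the sort order of the (assetid, ts) btree scanned backwards,
-- then restore ascending output order in an outer query:
SELECT assetid, ts
FROM (
    SELECT DISTINCT ON (assetid) assetid, ts
    FROM asset_positions
    ORDER BY assetid DESC, ts DESC
) AS latest
ORDER BY assetid ASC;
```

The inner ORDER BY must lead with the DISTINCT ON expression, which is why assetid appears first in both the DISTINCT ON list and the sort.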
Re: [PERFORM] BUG #2658: Query not using index
[EMAIL PROTECTED] (Graham Davis) writes: Adding DESC to both columns in the SORT BY did not make the query use the multikey index. So both SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC; and SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid DESC, ts DESC; use the same query plans and both do sequential scans without using either the (assetid, ts) or (ts) indexes. Any other ideas on how to make this query use an index? Thanks, Why do you want to worsen performance by forcing the use of an index? You are reading through the entire table, after all, and doing so via a sequential scan is normally the fastest way to do that. An index scan would only be more efficient if you don't have enough space in memory to store all assetid values. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://www3.sympatico.ca/cbbrowne/emacs.html Expect the unexpected. -- The Hitchhiker's Guide to the Galaxy, page 7023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] BUG #2658: Query not using index
The asset_positions table has about 1.7 million rows, and this query takes over 40 seconds to do a sequential scan. Initially I was trying to get the original query: SELECT assetid, max(ts) AS ts FROM asset_positions GROUP BY assetid; to use the multikey index since I read that PostgreSQL 8 added support for aggregates to use indexes. However, the GROUP BY was causing the query plan to not use any index (removing the GROUP by allowed the query to use the ts index and it took only 50 ms to run). Since I need the query to find the max time for EACH asset, I can't just drop the GROUP BY from my query. So I was trying some alternate ways of writing the query (as described in the below email) to force the use of one of these indexes. 40 seconds is much too slow for this query to run and I'm assuming that the use of an index will make it much faster (as seen when I removed the GROUP BY clause). Any tips? Graham. Chris Browne wrote: [EMAIL PROTECTED] (Graham Davis) writes: Adding DESC to both columns in the SORT BY did not make the query use the multikey index. So both SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC; and SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid DESC, ts DESC; use the same query plans and both do sequential scans without using either the (assetid, ts) or (ts) indexes. Any other ideas on how to make this query use an index? Thanks, Why do you want to worsen performance by forcing the use of an index? You are reading through the entire table, after all, and doing so via a sequential scan is normally the fastest way to do that. An index scan would only be more efficient if you don't have enough space in memory to store all assetid values. -- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] BUG #2658: Query not using index
Also, the multikey index of (assetid, ts) would already be sorted, and that is why using such an index in this case is faster than doing a sequential scan that does the sorting afterwards.

Graham.

Chris Browne wrote: Why do you want to worsen performance by forcing the use of an index? [remainder of quoted message snipped]

-- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote: Please ignore sample 1 - now that I have the logging feature, I can see that my query generator algorithm made an error. Can you do explain analyze on the two select queries on either side of the union separately? The subquery is correctly written and unlikely to be a problem (in fact, good style imo), so let's have a look at both sides of the facil query and see where the problem is. merlin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] BUG #2658: Query not using index
[EMAIL PROTECTED] (Graham Davis) writes: 40 seconds is much too slow for this query to run and I'm assuming that the use of an index will make it much faster (as seen when I removed the GROUP BY clause). Any tips? Assumptions are dangerous things. An aggregate like this has *got to* scan the entire table, and given that that is the case, an index scan is NOT optimal; a seq scan is. An index scan is just going to be slower. -- let name=cbbrowne and tld=linuxdatabases.info in String.concat @ [name;tld];; http://cbbrowne.com/info/linux.html The computer is the ultimate polluter: its feces are indistinguishable from the food it produces. -- Alan J. Perlis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] BUG #2658: Query not using index
How come an aggregate like that has to use a sequential scan? I know that PostgreSQL used to have to do a sequential scan for all aggregates, but support was added in version 8 so that aggregates could take advantage of indexes. This is why SELECT max(ts) AS ts FROM asset_positions; uses an index on the ts column and only takes 50 milliseconds. When I added the GROUP BY it would not use the multikey index or any other index. Is there just no support for aggregates to use multikey indexes? Sorry to be so pushy, but I just want to make sure I understand why the above query can use an index and the following can't: SELECT assetid, max(ts) AS ts FROM asset_positions GROUP BY assetid; -- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] Chris Browne wrote: [EMAIL PROTECTED] (Graham Davis) writes: 40 seconds is much too slow for this query to run and I'm assuming that the use of an index will make it much faster (as seen when I removed the GROUP BY clause). Any tips? Assumptions are dangerous things. An aggregate like this has *got to* scan the entire table, and given that that is the case, an index scan is NOT optimal; a seq scan is. An index scan is just going to be slower. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] BUG #2658: Query not using index
On Tue, Oct 03, 2006 at 12:13:43 -0700, Graham Davis [EMAIL PROTECTED] wrote: Also, the multikey index of (assetid, ts) would already be sorted and that is why using such an index in this case is faster than doing a sequential scan that does the sorting afterwards. That isn't necessarily true. The sequential scan and sort will need a lot fewer disk seeks and could run faster than using an index scan that has the disk drives doing seeks for every tuple (in the worst case, where the on-disk order of tuples doesn't match the order in the index). If your server is caching most of the blocks, then the index scan might give better results. You might try disabling sequential scans to try to coerce the other plan and see what results you get. If it is substantially faster the other way, then you might want to look at lowering the random page cost factor. However, since this can affect other queries, you need to be careful that you don't speed up one query at the expense of a lot of other queries. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] BUG #2658: Query not using index
Graham Davis [EMAIL PROTECTED] writes: How come an aggreate like that has to use a sequential scan? I know that PostgreSQL use to have to do a sequential scan for all aggregates, but there was support added to version 8 so that aggregates would take advantage of indexes. Not in a GROUP BY context, only for the simple case. Per the comment in planagg.c: * We don't handle GROUP BY, because our current implementations of * grouping require looking at all the rows anyway, and so there's not * much point in optimizing MIN/MAX. The problem is that using an index to obtain the maximum value of ts for a given value of assetid is not the same thing as finding out what all the distinct values of assetid are. This could possibly be improved but it would take a considerable amount more work. It's definitely not in the category of bug fix. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] BUG #2658: Query not using index
Thanks Tom, that explains it and makes sense. I guess I will have to accept this query taking 40 seconds, unless I can figure out another way to write it so it can use indexes. If there are any more syntax suggestions, please pass them on. Thanks for the help everyone. Graham. Tom Lane wrote: Graham Davis [EMAIL PROTECTED] writes: How come an aggreate like that has to use a sequential scan? I know that PostgreSQL use to have to do a sequential scan for all aggregates, but there was support added to version 8 so that aggregates would take advantage of indexes. Not in a GROUP BY context, only for the simple case. Per the comment in planagg.c: * We don't handle GROUP BY, because our current implementations of * grouping require looking at all the rows anyway, and so there's not * much point in optimizing MIN/MAX. The problem is that using an index to obtain the maximum value of ts for a given value of assetid is not the same thing as finding out what all the distinct values of assetid are. This could possibly be improved but it would take a considerable amount more work. It's definitely not in the category of bug fix. regards, tom lane -- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
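One rewrite that is sometimes suggested for this situation (a sketch only; the `assets` lookup table here is hypothetical, and the rewrite only pays off if such a small table of distinct assetids exists, since deriving the list from asset_positions itself would still need a full scan):

```sql
-- For each assetid, fetch its latest timestamp with a single descent
-- of the (assetid, ts) index instead of one big sequential scan.
-- The LIMIT 1 form matches the index order explicitly, which even
-- 8.x-era planners could turn into an index scan.
SELECT a.assetid,
       (SELECT p.ts
          FROM asset_positions p
         WHERE p.assetid = a.assetid
         ORDER BY p.assetid DESC, p.ts DESC
         LIMIT 1) AS ts
  FROM assets a;
```

With only a few hundred assetids, a few hundred index probes should be far cheaper than scanning 1.7 million rows.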
Re: [PERFORM] BUG #2658: Query not using index
Have you looked into a materialized view sort of approach? You could create a table which had assetid as a primary key, and max_ts as a column. Then use triggers to keep that table up to date as rows are added/updated/removed from the main table. This approach would only make sense if there were far fewer distinct assetid values than rows in the main table, and would get slow if you commonly delete rows from the main table or decrease the value for ts in the row with the highest ts for a given assetid.

-- Mark Lewis

On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote: Thanks Tom, that explains it and makes sense. [remainder of quoted messages snipped]

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
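The trigger-maintained summary table Mark describes might be sketched like this (all names are illustrative, not from the thread; this handles out-of-order inserts, but as discussed below it does nothing for DELETEs or for UPDATEs that lower ts):

```sql
-- Hypothetical summary table: one row per asset with its latest ts.
CREATE TABLE asset_latest (
    assetid integer PRIMARY KEY,
    max_ts  timestamp NOT NULL
);

CREATE OR REPLACE FUNCTION asset_latest_upd() RETURNS trigger AS $$
BEGIN
    -- Only advance max_ts; a late-arriving older position is a no-op.
    UPDATE asset_latest
       SET max_ts = NEW.ts
     WHERE assetid = NEW.assetid
       AND max_ts < NEW.ts;
    IF NOT FOUND THEN
        -- First position ever seen for this asset.
        INSERT INTO asset_latest (assetid, max_ts)
        SELECT NEW.assetid, NEW.ts
        WHERE NOT EXISTS (SELECT 1 FROM asset_latest
                           WHERE assetid = NEW.assetid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER asset_latest_trg
    AFTER INSERT ON asset_positions
    FOR EACH ROW EXECUTE PROCEDURE asset_latest_upd();
```

Concurrent inserts for the same assetid would also need row locking or retry handling, which this sketch omits.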
Re: [PERFORM] BUG #2658: Query not using index
The summary table approach maintained by triggers is something we are considering, but it becomes a bit more complicated to implement. Currently we have groups of new positions coming in every few seconds or less. They are not guaranteed to be in order. So for instance, a group of positions from today could come in and be inserted, then a group of positions that got lost from yesterday could come in and be inserted afterwards. This means the triggers would have to do some sort of logic to figure out if the newly inserted position is actually the most recent by timestamp. If positions are ever deleted or updated, the same sort of query that is currently running slow will need to be executed in order to get the new most recent position. So there is the possibility that new positions can be inserted faster than the triggers can calculate and maintain the summary table. There are some other complications with maintaining such a summary table in our system too, but I won't get into those. Right now I'm just trying to see if I can get the query itself running faster, which would be the easiest solution for now.

Graham.

Mark Lewis wrote: Have you looked into a materialized view sort of approach? [remainder of quoted messages snipped]

-- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] BUG #2658: Query not using index
Hmmm. How many distinct assetids are there?

-- Mark Lewis

On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote: The summary table approach maintained by triggers is something we are considering, but it becomes a bit more complicated to implement. [remainder of quoted messages snipped]

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] BUG #2658: Query not using index
Not many. It fluctuates, but there are usually only ever a few hundred at most. Each assetid has multi-millions of positions though.

Mark Lewis wrote: Hmmm. How many distinct assetids are there? [remainder of quoted messages snipped]

-- Graham Davis Refractions Research Inc. [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] PostgreSQL Caching
Hi, I wonder how PostgreSQL caches SQL query results. For example: * does postgres cache query results in memory when they were produced by session A? * can session B then use those results? Best Regards Adnan DURSUN ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] BUG #2658: Query not using index
A few hundred is quite a lot for the next proposal and it's kind of an ugly one, but might as well throw the idea out since you never know. Have you considered creating one partial index per assetid? Something along the lines of CREATE INDEX asset_index_N ON asset_positions(ts) WHERE assetid=N? I'd guess that the planner probably wouldn't be smart enough to use the partial indexes unless you issued a separate query for each assetid, but each one of those queries should be really fast. Of course, this is all assuming that PG knows how to use partial indexes to satisfy MAX queries; I'm not sure if it does. -- Mark Lewis On Tue, 2006-10-03 at 14:35 -0700, Graham Davis wrote: Not many. It fluctuates, but there are usually only ever a few hundred at most. Each assetid has multi-millions of positions though. Mark Lewis wrote: Hmmm. How many distinct assetids are there? -- Mark Lewis On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote: The summary table approach maintained by triggers is something we are considering, but it becomes a bit more complicated to implement. Currently we have groups of new positions coming in every few seconds or less. They are not guaranteed to be in order. So for instance, a group of positions from today could come in and be inserted, then a group of positions that got lost from yesterday could come in and be inserted afterwards. This means the triggers would have to do some sort of logic to figure out if the newly inserted position is actually the most recent by timestamp. If positions are ever deleted or updated, the same sort of query that is currently running slow will need to be executed in order to get the new most recent position. So there is the possibility that new positions can be inserted faster than the triggers can calculate and maintain the summary table. There are some other complications with maintaining such a summary table in our system too, but I won't get into those. 
Right now I'm just trying to see if I can get the query itself running faster, which would be the easiest solution for now. Graham. Mark Lewis wrote: Have you looked into a materialized view sort of approach? You could create a table which had assetid as a primary key, and max_ts as a column. Then use triggers to keep that table up to date as rows are added/updated/removed from the main table. This approach would only make sense if there were far fewer distinct assetid values than rows in the main table, and would get slow if you commonly delete rows from the main table or decrease the value for ts in the row with the highest ts for a given assetid. -- Mark Lewis On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote: Thanks Tom, that explains it and makes sense. I guess I will have to accept this query taking 40 seconds, unless I can figure out another way to write it so it can use indexes. If there are any more syntax suggestions, please pass them on. Thanks for the help everyone. Graham. Tom Lane wrote: Graham Davis [EMAIL PROTECTED] writes: How come an aggregate like that has to use a sequential scan? I know that PostgreSQL used to have to do a sequential scan for all aggregates, but there was support added to version 8 so that aggregates would take advantage of indexes. Not in a GROUP BY context, only for the simple case. Per the comment in planagg.c: * We don't handle GROUP BY, because our current implementations of * grouping require looking at all the rows anyway, and so there's not * much point in optimizing MIN/MAX. The problem is that using an index to obtain the maximum value of ts for a given value of assetid is not the same thing as finding out what all the distinct values of assetid are. This could possibly be improved but it would take a considerable amount more work. It's definitely not in the category of bug fix. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
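A minimal sketch of the trigger-maintained summary table Mark proposes above, assuming a summary table named asset_latest (a name invented here; only asset_positions, assetid and ts come from the thread). It copes with the out-of-order inserts Graham describes by only updating when the incoming timestamp is newer; deletes and updates of ts would still need the slow recomputation he mentions:

```sql
-- Assumed summary table: one row per asset with its latest timestamp.
CREATE TABLE asset_latest (
    assetid integer PRIMARY KEY,
    max_ts  timestamp NOT NULL
);

CREATE OR REPLACE FUNCTION asset_latest_upd() RETURNS trigger AS $$
BEGIN
    -- Keep the newer timestamp; a late-arriving old position is a no-op.
    UPDATE asset_latest SET max_ts = NEW.ts
        WHERE assetid = NEW.assetid AND max_ts < NEW.ts;
    IF NOT FOUND THEN
        -- Either the asset is unseen (insert it) or the existing row is
        -- already newer (the NOT EXISTS guard makes this a no-op).
        INSERT INTO asset_latest (assetid, max_ts)
            SELECT NEW.assetid, NEW.ts
            WHERE NOT EXISTS (SELECT 1 FROM asset_latest
                              WHERE assetid = NEW.assetid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER asset_latest_trg AFTER INSERT ON asset_positions
    FOR EACH ROW EXECUTE PROCEDURE asset_latest_upd();
```

With this in place the slow GROUP BY query becomes a plain SELECT from asset_latest, at the cost of extra work on every insert; under heavy concurrent insert load the update/insert pair would also need locking care that this sketch omits.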
Re: [PERFORM] PostgreSQL Caching
Like many decent RDBMSs, the PostgreSQL server allocates its own shared memory area where data is cached. When it receives a query request, the Postgres engine first checks its shared memory buffers; if the data is not found there, the engine performs disk I/O to retrieve it from the PostgreSQL data files and places it in the shared buffer area before serving it back to the client. Blocks in the shared buffers are shared among sessions and can therefore be accessed by other sessions. The size of the PostgreSQL shared buffers is set with the postgresql.conf parameter shared_buffers. Sincerely, -- Husam -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adnan DURSUN Sent: Tuesday, October 03, 2006 2:49 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] PostgreSQL Caching ** This message contains confidential information intended only for the use of the addressee(s) named above and may contain information that is legally privileged. If you are not the addressee, or the person responsible for delivering it to the addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message immediately thereafter. Thank you. FADLD Tag ** ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] BUG #2658: Query not using index
Mark Lewis [EMAIL PROTECTED] writes: Have you considered creating one partial index per assetid? Something along the lines of CREATE INDEX asset_index_N ON asset_positions(ts) WHERE assetid=N? I'd guess that the planner probably wouldn't be smart enough to use the partial indexes unless you issued a separate query for each assetid, but each one of those queries should be really fast. Actually, a single index on (assetid, ts) is sufficient to handle select max(ts) from asset_positions where assetid = constant The problem is to know what values of constant to issue the query for, and this idea doesn't seem to help with that. If Graham is willing to assume that the set of assetids changes slowly, perhaps he could keep a summary table that contains all the valid assetids (or maybe there already is such a table? is assetid a foreign key?) and do select pk.assetid, (select max(ts) from asset_positions where assetid = pk.assetid) from other_table pk; I'm pretty sure the subselect would be planned the way he wants. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
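Spelled out, Tom's suggestion looks like the following (asset_positions, assetid and ts are from the thread; other_table stands in for whatever table holds the valid assetids, exactly as in his message):

```sql
-- A single composite index is enough to answer a per-asset MAX probe
-- from the index alone:
CREATE INDEX asset_positions_assetid_ts
    ON asset_positions (assetid, ts);

-- One fast index probe per assetid, driven by the list of valid ids;
-- the planner should turn each subselect into a backwards index scan
-- that stops at the first (largest) matching ts.
SELECT pk.assetid,
       (SELECT max(ts)
          FROM asset_positions ap
         WHERE ap.assetid = pk.assetid) AS max_ts
  FROM other_table pk;
```

The point of the rewrite is that the outer query supplies the distinct assetids cheaply (a few hundred rows), so the expensive part becomes a few hundred index probes instead of one sequential scan over millions of positions.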
Re: [PERFORM] PostgreSQL Caching
Thanks. I wonder about these: * When any session updates data that is already in the shared buffers, does Postgres synchronize the data between disk and the shared buffer area immediately? * Does Postgres cache SQL execution plans in memory for use by other sessions? For example, when session A executes SELECT * FROM tab WHERE col1 = val1 AND col2 = val2, does Postgres save the parser/optimizer result in memory so that other sessions can skip the duplicate parse and optimize steps and therefore save time, since an execution plan was already created? Sincerely, Adnan DURSUN - Original Message - From: Tomeh, Husam [EMAIL PROTECTED] To: Adnan DURSUN [EMAIL PROTECTED]; pgsql-performance@postgresql.org Sent: Wednesday, October 04, 2006 1:11 AM Subject: Re: [PERFORM] PostgreSQL Caching
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Forcing the use of particular execution plans
Jim C. Nasby wrote: Index scans are also pretty picky about correlation. If you have really low correlation you don't want to index scan, I still don't think correlation is the right metric at all for making this decision. If you have a list of addresses clustered by zip, the correlation of State, City, County, etc. will all be zero (since the zip codes don't match the alphabetical order of state or city names), but index scans are still big wins because the data for any given state or city will be packed on the same few pages - and in fact the pages could be read mostly sequentially. but I think our current estimates make it too eager to switch to a seqscan. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Forcing the use of particular execution plans
Adding -performance back in. On Tue, Oct 03, 2006 at 05:10:04PM -0700, Ron Mayer wrote: Jim C. Nasby wrote: Index scans are also pretty picky about correlation. If you have really low correlation you don't want to index scan, I still don't think correlation is the right metric at all for making this decision. If you have a list of addresses clustered by zip, the correlation of State, City, County, etc. will all be zero (since the zip codes don't match the alphabetical order of state or city names), but index scans are still big wins because the data for any given state or city will be packed on the same few pages - and in fact the pages could be read mostly sequentially. That's a good point that I don't think has been considered before. I think correlation is still somewhat important, but what's probably far more important is data localization. One possible way to calculate this would be to note the location of every tuple with a given value in the heap. Calculate the mean of those locations (I think you could essentially average all the ctids), and divide that by the average distance of each tuple from that mean (or maybe the reciprocal of that would be more logical). Obviously we don't want to scan the whole table to do that, but there should be some way to do it via sampling as well. Or perhaps someone knows of a research paper with real data on how to do this instead of hand-waving. :) but I think our current estimates make it too eager to switch to a seqscan. -- Jim C. Nasby, Database Architect [EMAIL PROTECTED] 512.569.9461 (cell) http://jim.nasby.net ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
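As a very rough, hand-wavy illustration of "localization" as distinct from correlation, one could count how many distinct heap blocks each value occupies. This sketch uses the thread's asset_positions table and the common trick of extracting the block number from the ctid system column; it scans the whole table, so it is an illustration of the idea, not a planner-quality sampled statistic:

```sql
-- For each assetid, compare the row count to the number of distinct heap
-- blocks those rows sit on.  Values packed into few blocks are cheap to
-- fetch by index even when the per-column 'correlation' statistic is
-- near zero (the zip-code example above).
SELECT assetid,
       count(*)                AS n_rows,
       count(DISTINCT blk)     AS n_blocks,
       max(blk) - min(blk) + 1 AS block_span
FROM (SELECT assetid,
             ((ctid::text::point)[0])::int AS blk  -- block number of each tuple
      FROM asset_positions) s
GROUP BY assetid
ORDER BY n_blocks DESC;
```

A value whose n_blocks is close to n_rows is scattered (each fetch is a random page read); one whose n_blocks is far smaller is well localized, which is the case the correlation statistic fails to reward.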
Re: [PERFORM] PostgreSQL Caching
* When any session updates data that is already in the shared buffers, does Postgres synchronize the data between disk and the shared buffer area immediately? Not necessarily. When a block is modified in the shared buffers, the modified block is written to the Postgres WAL log. A periodic DB checkpoint is performed to flush the modified blocks in the shared buffers to the data files. * Does Postgres cache SQL execution plans in memory for use by other sessions? For example, when session A executes SELECT * FROM tab WHERE col1 = val1 AND col2 = val2, does Postgres save the parser/optimizer result in memory so that other sessions can skip the duplicate parse and optimize steps and therefore save time, since an execution plan was already created? Query plans are not stored in the shared buffers and therefore cannot be reused by other sessions. They're only cached by the connection, at the session level. Sincerely, -- Husam -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adnan DURSUN Sent: Tuesday, October 03, 2006 4:53 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL Caching
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
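The session-level plan caching Husam describes can be seen directly with prepared statements: the plan is built once and reused within one backend, and the prepared name is invisible to every other session. A minimal sketch (table and parameter names are illustrative, echoing the tab/col1/col2 example from the thread):

```sql
-- Session A: parse and plan once, execute many times.  The plan lives
-- only in this backend's memory; session B cannot see or use it.
PREPARE get_row (int, int) AS
    SELECT * FROM tab WHERE col1 = $1 AND col2 = $2;

EXECUTE get_row (1, 2);   -- reuses the cached plan
EXECUTE get_row (3, 4);   -- same plan, different parameters

DEALLOCATE get_row;       -- or just disconnect; the plan dies with the
                          -- session either way
```

Running EXECUTE get_row in a second session would fail with "prepared statement does not exist", which is the concrete form of "only cached by the connection on a session level".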
Re: [PERFORM] PostgreSQL Caching
- Original Message - From: Tomeh, Husam [EMAIL PROTECTED] To: Adnan DURSUN [EMAIL PROTECTED]; pgsql-performance@postgresql.org Sent: Wednesday, October 04, 2006 4:29 AM Subject: RE: [PERFORM] PostgreSQL Caching Query plans are not stored in the shared buffers and therefore cannot be reused by other sessions. They're only cached by the connection, at the session level. Ok, I see, thanks. So that means the saved execution plan of a stored object is dropped from memory once the object is altered or dropped by any session. Is that true? And one last question :-) I want to be able to read an execution plan when I look at one. Is there any documentation on how execution plans should be read? Sincerely! Adnan DURSUN ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
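On reading plans: the tool is EXPLAIN (or EXPLAIN ANALYZE to also run the query), and the "Using EXPLAIN" chapter of the PostgreSQL manual is the documentation to start with. Plans are read from the innermost (most indented) node outward. An annotated sketch, reusing the tab/col1 example from this thread; the plan text shown is an illustrative shape, not output from a real run:

```sql
EXPLAIN ANALYZE SELECT * FROM tab WHERE col1 = 1;

-- Illustrative output shape:
--   Index Scan using tab_col1_idx on tab
--       (cost=0.00..8.27 rows=1 width=44)
--       (actual time=0.04..0.05 rows=1 loops=1)
--     Index Cond: (col1 = 1)
--
-- cost   = planner's estimated startup..total cost, in arbitrary units
--          (multiples of a sequential page fetch)
-- rows   = estimated row count; compare it against the 'actual ... rows'
--          figure -- a large mismatch usually means stale statistics,
--          fixed by running ANALYZE on the table
-- actual = real elapsed time in milliseconds, per loop, with the true
--          row count; only present with EXPLAIN ANALYZE
```

This estimated-versus-actual comparison is exactly how Tom diagnosed the seqscan problem in the "Forcing the use of particular execution plans" thread above.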