Re: [PERFORM] query slowdown after 9.0 -> 9.4 migration
On Thu, Oct 27, 2016 at 7:38 AM, Andreas Kretschmer <akretsch...@spamfence.net> wrote:

> Tomas Vondra wrote:
> >
> > Perhaps the table is organized / sorted differently, or something like
> > that. How did you do the upgrade?
>
> > Nothing special, dump + reload. The table in question is tiny - 280 kB,
> > 859 rows.
>
> Maybe table bloat? Filip, check if autovacuum runs properly.

Yes, it does. Just to be sure I ran VACUUM FULL, ANALYZE and REINDEX on all
tables and indexes - no change :-(

Any other ideas (before drawing on heavy tools like strace)? Does it make
sense to ask on the postgis-users list?

Thanks,
Filip
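Bloat and autovacuum activity can also be checked from SQL before reaching for strace (a generic diagnostic sketch, not from the thread; `storelocator` is the table from the original report):

```sql
-- Size plus live/dead tuple counts and last (auto)vacuum/analyze times.
-- A large n_dead_tup or an old last_autovacuum would point at bloat.
SELECT pg_size_pretty(pg_total_relation_size('storelocator')) AS total_size,
       n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'storelocator';
```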
[PERFORM] query slowdown after 9.0 -> 9.4 migration
Hi.

Query run time degraded after migration from Pg 9.0 + PostGIS 1.5 to Pg 9.4 + PostGIS 2.2: 1 ms versus 7 ms. Same query, same data, same schema, similar hardware. Data is small and fits in cache.

EXPLAIN shows a heap scan cost increase. What can be the reason for a 40-fold increase in the number of page scans needed to run the Bitmap Heap Scan (with Filter and Recheck)? GiST index performance looks OK.

PostgreSQL 9.0.23 on x86_64-suse-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064], 64-bit
POSTGIS="1.5.4" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8" USE_STATS (procs from 1.5 r5976 need upgrade)
-> https://explain.depesz.com/s/C3Vw

PostgreSQL 9.4.7 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit
POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.2, 08 September 2015" GDAL="GDAL 2.1.0, released 2016/04/25 GDAL_DATA not found" LIBXML="2.9.1" LIBJSON="0.12" (core procs from "2.2.1 r14555" need upgrade) TOPOLOGY (topology procs from "2.2.1 r14555" need upgrade) RASTER (raster procs from "2.2.1 r14555" need upgrade)
-> https://explain.depesz.com/s/24GA

Query:

SELECT round(meters_to_miles(st_distance_sphere(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint))::numeric, 2) AS _distance
FROM storelocator
WHERE st_expand(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), miles_to_degree(50, 38.89524)) && llpoint
  AND st_distance_sphere(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint) <= miles_to_meters(50)
ORDER BY _distance
LIMIT 10;

Thanks for any suggestions / ideas.
Filip
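To quantify how many heap pages each server actually touches, the plans can be re-captured with buffer accounting (a diagnostic sketch, not something run in the thread; only the bounding-box part of the query is reproduced):

```sql
-- The "Buffers: shared hit=... read=..." lines under each node show
-- exactly how many pages that node touched, which would make the
-- 40-fold difference between 9.0 and 9.4 directly visible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT llpoint
FROM storelocator
WHERE st_expand(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326),
                miles_to_degree(50, 38.89524)) && llpoint;
```

EXPLAIN (ANALYZE, BUFFERS) is available on both 9.0 and 9.4, so the two outputs are directly comparable.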
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
Thanks for the feedback.

This patch is my first and obvious approach.

@Merlin, I'm not sure if I get your idea:
- keep the previous behaviour as obligatory? (which is: automatic de-duplication of incoming messages by channel+payload),
- instead of a trivial search (sorting by browsing), use some kind of faster lookup?

I'm not sure if this statement in async.c is carved in stone:

 * Duplicate notifications from the same transaction are sent out as one
 * notification only. This is done to save work when for example a trigger
 * on a 2 million row table fires a notification for each row that has been
 * changed. If the application needs to receive every single notification
 * that has been sent, it can easily add some unique string into the extra
 * payload parameter.

1) the "work-saving" is disputable in some cases
2) the idea to "add some unique string" is OK logic-wise, but it's not OK performance-wise. The current search code is a sequential search:
https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139

I'm not that smart to devise an algorithm for faster lookups - probably you guys can give some advice.

Again, my rationale is... This feature can burn a lot of CPU for nothing. I was hoping to use NOTIFY/LISTEN as a superfast notification mechanism - superfast regardless of whether you insert 100, 10k or 1M rows.

On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
> <filip.rembialkow...@gmail.com> wrote:
>> patch submitted on -hackers list.
>> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com
>>
>> results after the patch:
>>
>> trigger= BEGIN RETURN NULL; END
>> rows=4
>>    228ms COPY test.tab FROM '/tmp/test.dat'
>>    205ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>    494ms COPY test.tab FROM '/tmp/test.dat'
>>    395ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>    678ms COPY test.tab FROM '/tmp/test.dat'
>>    652ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>    956ms COPY test.tab FROM '/tmp/test.dat'
>>    822ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>   1184ms COPY test.tab FROM '/tmp/test.dat'
>>   1072ms COPY test.tab FROM '/tmp/test.dat'
>>
>> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
>> rows=4
>>    440ms COPY test.tab FROM '/tmp/test.dat'
>>    406ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>    887ms COPY test.tab FROM '/tmp/test.dat'
>>    769ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>   1346ms COPY test.tab FROM '/tmp/test.dat'
>>   1171ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>   1710ms COPY test.tab FROM '/tmp/test.dat'
>>   1709ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>   2189ms COPY test.tab FROM '/tmp/test.dat'
>>   2206ms COPY test.tab FROM '/tmp/test.dat'
>
> I'm not so sure that this is a great idea.  Generally, we tend to
> discourage GUCs that control behavior at the SQL level.  Are you 100%
> certain that there is no path to optimizing this case without changing
> behavior?
>
> merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
patch submitted on -hackers list.
http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com

results after the patch:

trigger= BEGIN RETURN NULL; END
rows=4
   228ms COPY test.tab FROM '/tmp/test.dat'
   205ms COPY test.tab FROM '/tmp/test.dat'
rows=8
   494ms COPY test.tab FROM '/tmp/test.dat'
   395ms COPY test.tab FROM '/tmp/test.dat'
rows=12
   678ms COPY test.tab FROM '/tmp/test.dat'
   652ms COPY test.tab FROM '/tmp/test.dat'
rows=16
   956ms COPY test.tab FROM '/tmp/test.dat'
   822ms COPY test.tab FROM '/tmp/test.dat'
rows=20
  1184ms COPY test.tab FROM '/tmp/test.dat'
  1072ms COPY test.tab FROM '/tmp/test.dat'

trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
rows=4
   440ms COPY test.tab FROM '/tmp/test.dat'
   406ms COPY test.tab FROM '/tmp/test.dat'
rows=8
   887ms COPY test.tab FROM '/tmp/test.dat'
   769ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  1346ms COPY test.tab FROM '/tmp/test.dat'
  1171ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  1710ms COPY test.tab FROM '/tmp/test.dat'
  1709ms COPY test.tab FROM '/tmp/test.dat'
rows=20
  2189ms COPY test.tab FROM '/tmp/test.dat'
  2206ms COPY test.tab FROM '/tmp/test.dat'

On Fri, Feb 5, 2016 at 1:45 PM, Filip Rembiałkowski <filip.rembialkow...@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Filip Rembiałkowski <filip.rembialkow...@gmail.com> writes:
>> > A table has a trigger.
>> > The trigger sends a NOTIFY.
>> > Test with COPY FROM shows non-linear correlation between number of
>> > inserted rows and COPY duration.
>>
>> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
>> performance issues with sending hundreds of thousands of distinct notify
>> events from one transaction anyway, so I can't get terribly excited about
>> this.
>
> What kind of issues? Do you mean, problems in postgres or problems in the
> client?
>
> Is there an additional non-linear cost on COMMIT (extra to the cost I
> already showed)?
>
> The 8GB internal queue (referenced in a note at
> http://www.postgresql.org/docs/current/static/sql-notify.html) should be
> able to keep ~ 1E8 such notifications (assuming one notification fits in
> 80 bytes).
>
> On the client side, this seems legit - the LISTENer daemon will collect
> these notifications and process them in line.
> There might also be no LISTENer running at all.
>
> Still, the main problem I have with this approach is the quadratic cost
> on big insert transactions.
> I wonder if this behavior is possible to change in future Postgres
> versions. And how much programming work it requires.
>
> Is duplicate elimination a fundamental, non-negotiable requirement?
>
> Thank you,
> Filip
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane wrote:

> Filip Rembiałkowski writes:
> > A table has a trigger.
> > The trigger sends a NOTIFY.
> > Test with COPY FROM shows non-linear correlation between number of
> > inserted rows and COPY duration.
>
> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.

What kind of issues? Do you mean, problems in postgres or problems in the client?

Is there an additional non-linear cost on COMMIT (extra to the cost I already showed)?

The 8GB internal queue (referenced in a note at http://www.postgresql.org/docs/current/static/sql-notify.html) should be able to keep ~ 1E8 such notifications (assuming one notification fits in 80 bytes).

On the client side, this seems legit - the LISTENer daemon will collect these notifications and process them in line. There might also be no LISTENer running at all.

Still, the main problem I have with this approach is the quadratic cost on big insert transactions. I wonder if this behavior is possible to change in future Postgres versions. And how much programming work it requires.

Is duplicate elimination a fundamental, non-negotiable requirement?

Thank you,
Filip
[PERFORM] bad COPY performance with NOTIFY in a trigger
Hi.

A table has a trigger.
The trigger sends a NOTIFY.
Test with COPY FROM shows non-linear correlation between number of inserted rows and COPY duration.

                           Table "test.tab"
 Column  |  Type   |                       Modifiers
---------+---------+-------------------------------------------------------
 id      | integer | not null default nextval('test.tab_id_seq'::regclass)
 payload | text    |
Indexes:
    "tab_pkey" PRIMARY KEY, btree (id)
Triggers:
    trg AFTER INSERT ON test.tab FOR EACH ROW EXECUTE PROCEDURE test.fun()

Test Series 1. Trigger code: BEGIN RETURN NULL; END
You can see linear scaling.

rows=4
    191ms COPY test.tab FROM '/tmp/test.dat'
    201ms COPY test.tab FROM '/tmp/test.dat'
rows=8
    426ms COPY test.tab FROM '/tmp/test.dat'
    415ms COPY test.tab FROM '/tmp/test.dat'
rows=12
    634ms COPY test.tab FROM '/tmp/test.dat'
    616ms COPY test.tab FROM '/tmp/test.dat'
rows=16
    843ms COPY test.tab FROM '/tmp/test.dat'
    834ms COPY test.tab FROM '/tmp/test.dat'
rows=20
   1101ms COPY test.tab FROM '/tmp/test.dat'
   1094ms COPY test.tab FROM '/tmp/test.dat'

Test Series 2. Trigger code: BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
You can see non-linear scaling.

rows=4
   9767ms COPY test.tab FROM '/tmp/test.dat'
   8901ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  37409ms COPY test.tab FROM '/tmp/test.dat'
  38015ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  90227ms COPY test.tab FROM '/tmp/test.dat'
  87838ms COPY test.tab FROM '/tmp/test.dat'
rows=16
 160080ms COPY test.tab FROM '/tmp/test.dat'
 159801ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 247330ms COPY test.tab FROM '/tmp/test.dat'
 251191ms COPY test.tab FROM '/tmp/test.dat'

O(N^2)
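One workaround, not discussed in the thread (a sketch, under the assumption that listeners only need to know that new rows arrived, not each individual id), is to queue a single notification per bulk load instead of one per row:

```sql
-- Disable the per-row NOTIFY trigger for the duration of the load and
-- send one notification by hand; since only one notification is pending,
-- the duplicate check in async.c never degenerates to quadratic cost.
BEGIN;
ALTER TABLE test.tab DISABLE TRIGGER trg;
COPY test.tab FROM '/tmp/test.dat';
ALTER TABLE test.tab ENABLE TRIGGER trg;
NOTIFY test, 'tab_changed';   -- delivered to listeners at COMMIT
COMMIT;
```

The trade-off is that the listening daemon must then query the table for new rows itself rather than receiving them id by id.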
[PERFORM] 9.0 performance degradation with kernel 3.11
Hi

We upgraded our 9.0 database server from:
  openSUSE 11.4, kernel 2.6.37.6-24-default, Pg 9.0.13
to:
  openSUSE 13.1, kernel 3.11.10-21-default, Pg 9.0.15
... and the overall server load is +1 higher after that. We did not add any new services/daemons.

It's hard to track down to individual queries - when I tested, most individual query times are the same as before the migration.

Any - ANY - hints will be much appreciated.

Thanks
Filip
Re: [PERFORM] Simple join doesn't use index
On Tue, Jan 29, 2013 at 8:24 AM, Alex Vinnik <alvinni...@gmail.com> wrote:

> On Mon, Jan 28, 2013 at 6:55 PM, Filip Rembiałkowski <plk.zu...@gmail.com> wrote:
>
>> do you know pgtune? it's a good tool for starters, if you want a fast
>> postgres and don't really want to learn what's behind the scenes.
>
> Yeah.. I came across pgtune but noticed that the latest version is dated
> 2009-10-29 (http://pgfoundry.org/frs/?group_id=1000416), which is kind of
> outdated. The tar file has settings for pg 8.3. Is it still relevant?

Yes, I'm sure it will not do anything bad to your config.

>> random_page_cost=1 might be not what you really want. it would mean that
>> random reads are as fast as sequential reads, which probably is true only
>> for SSD
>
> What random_page_cost would be more appropriate for an EC2 EBS Provisioned
> volume that can handle 2,000 IOPS?

I'd say: don't guess. Measure. Use any tool that can test sequential disk block reads versus random disk block reads. bonnie++ is quite popular.

Filip
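Besides benchmarking the disks, a candidate cost value can also be trialled per session before touching postgresql.conf (a sketch; the query is a placeholder for one of your real workload queries):

```sql
-- Try a candidate value only for this session and watch whether the plan
-- and runtime of a representative query actually improve.
SET random_page_cost = 1.5;
EXPLAIN ANALYZE SELECT count(*) FROM visits WHERE user_id = 42;  -- hypothetical query

RESET random_page_cost;  -- back to the server-wide setting
```

Only when the per-session experiment consistently produces better plans is it worth making the change permanent.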
Re: [PERFORM] Simple join doesn't use index
On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik <alvinni...@gmail.com> wrote:

> It sure turned out that default settings are not a good fit.

do you know pgtune? it's a good tool for starters, if you want a fast postgres and don't really want to learn what's behind the scenes.

random_page_cost=1 might be not what you really want. it would mean that random reads are as fast as sequential reads, which probably is true only for SSD

Filip
Re: [PERFORM] problem with large inserts
Just an idea - how long does it take to run _only_

CREATE TEMP TABLE foo AS <your SELECT here>

On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer <lfisc...@staffmail.ed.ac.uk> wrote:

> Hi
>
> I have currently some trouble with inserts into a table
>
> INSERT INTO LPP (PPID, LID)
> SELECT DISTINCT PPid, LID
> FROM (SELECT * FROM PP WHERE s_id = sid) pp
>   INNER JOIN has_protein hp1 ON pp.p1id = hp1.pid
>   INNER JOIN has_protein hp2 ON pp.p2_id = hp2.pid
>   INNER JOIN (SELECT * FROM L WHERE s_id = sid) l
>     ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos + pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>     OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos + pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1);
>
> If I run only
>
> SELECT DISTINCT PPid, LID
> FROM (SELECT * FROM PP WHERE s_id = 708) pp
>   INNER JOIN has_protein hp1 ON pp.p1id = hp1.pid
>   INNER JOIN has_protein hp2 ON pp.p2_id = hp2.pid
>   INNER JOIN (SELECT * FROM L WHERE s_id = 708) l
>     ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos + pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>     OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos + pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1);
>
> it returns 200620 rows in 170649 ms (that's just under 3 minutes).
> I stopped the actual insert after about 8h.
>
> The table that the insert happens into is the following:
>
> CREATE TABLE LPP (
>   ppid bigint NOT NULL,
>   lid  bigint NOT NULL,
>   CONSTRAINT pk_lpp PRIMARY KEY (ppid, lid)
> );
>
> I also tried without the primary key, but that one is still running for
> more than a day. Currently the table LPP holds 471139 rows. It links the
> PP and the L table. There are no foreign keys referring to that table,
> nor are there any other constraints on it. Previously I had foreign keys
> on lid and ppid referring to the L and PP tables, but in a desperate try
> to get some speedup I deleted these. - But still...
> I am running postgresql 9.2 on a Windows 2008 R2 server with 256 GB and
> the database is on something like a raid 1+0 (actually a raid1e)
> consisting of 3x4TB disks (the limit of what could easily be fitted into
> the server). At the given time there was no concurrent access to any of
> the involved tables.
>
> Has anybody some idea why the insert takes so long and/or how to speed
> things up a bit? I could live with something like half an hour - better
> would be minutes.
>
> Thanks for any responses,
> Lutz Fischer
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
Re: [PERFORM] problem with large inserts
Hmm, so it is some kind of file / table locking issue, not a general IO system malfunction.

It would be interesting and useful to run this use case on another postgres instance (or several instances), including non-Windows ones.

OTOH, Pg-on-Windows housekeeping was always fun - I advise all my clients to avoid it for production purposes.

On Thu, Dec 13, 2012 at 5:33 PM, Lutz Fischer <lfisc...@staffmail.ed.ac.uk> wrote:

> Thanks a lot, you saved my day.
>
> create temp table foo AS SELECT DISTINCT ...
>
> did take a mere 77464.744 ms. And an additional
>
> insert into LPP select * from foo;
>
> took just 576.909 ms.
>
> I don't really understand why it's working via a temp table but not
> directly (or in any reasonable amount of time) - but at least I have a
> solution I can work with.
>
> On 13/12/12 16:09, Filip Rembiałkowski wrote:
>
>> Just an idea - how long does it take to run _only_
>>
>> CREATE TEMP TABLE foo AS <your SELECT here>
>>
>> On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer <lfisc...@staffmail.ed.ac.uk> wrote:
>>
>>> Hi
>>>
>>> I have currently some trouble with inserts into a table
>>>
>>> INSERT INTO LPP (PPID, LID)
>>> SELECT DISTINCT PPid, LID
>>> FROM (SELECT * FROM PP WHERE s_id = sid) pp
>>>   INNER JOIN has_protein hp1 ON pp.p1id = hp1.pid
>>>   INNER JOIN has_protein hp2 ON pp.p2_id = hp2.pid
>>>   INNER JOIN (SELECT * FROM L WHERE s_id = sid) l
>>>     ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos + pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>>     OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos + pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1);
>>>
>>> If I run only
>>>
>>> SELECT DISTINCT PPid, LID
>>> FROM (SELECT * FROM PP WHERE s_id = 708) pp
>>>   INNER JOIN has_protein hp1 ON pp.p1id = hp1.pid
>>>   INNER JOIN has_protein hp2 ON pp.p2_id = hp2.pid
>>>   INNER JOIN (SELECT * FROM L WHERE s_id = 708) l
>>>     ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos + pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>>     OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos + pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1);
>>>
>>> it returns 200620 rows in 170649 ms (that's just under 3 minutes).
>>> I stopped the actual insert after about 8h.
>>>
>>> The table that the insert happens into is the following:
>>>
>>> CREATE TABLE LPP (
>>>   ppid bigint NOT NULL,
>>>   lid  bigint NOT NULL,
>>>   CONSTRAINT pk_lpp PRIMARY KEY (ppid, lid)
>>> );
>>>
>>> I also tried without the primary key, but that one is still running for
>>> more than a day. Currently the table LPP holds 471139 rows. It links the
>>> PP and the L table. There are no foreign keys referring to that table,
>>> nor are there any other constraints on it. Previously I had foreign keys
>>> on lid and ppid referring to the L and PP tables, but in a desperate try
>>> to get some speedup I deleted these. - But still...
>>>
>>> I am running postgresql 9.2 on a Windows 2008 R2 server with 256 GB and
>>> the database is on something like a raid 1+0 (actually a raid1e)
>>> consisting of 3x4TB disks (the limit of what could easily be fitted into
>>> the server). At the given time there was no concurrent access to any of
>>> the involved tables.
>>>
>>> Has anybody some idea why the insert takes so long and/or how to speed
>>> things up a bit? I could live with something like half an hour - better
>>> would be minutes.
>>>
>>> Thanks for any responses,
>>> Lutz Fischer
>
> --
> Lutz Fischer
> lfisc...@staffmail.ed.ac.uk
> +44 131 6517057
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
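The temp-table trick can be extended a little (a sketch, not from the thread; the inner SELECT stands for the big join above) by analyzing the intermediate result before the final insert, so the second statement runs with fresh statistics:

```sql
-- Materialize the expensive SELECT first; the INSERT then becomes a
-- trivial sequential read of an already-deduplicated result set.
CREATE TEMP TABLE foo AS
SELECT DISTINCT PPid, LID FROM ...;   -- the big three-way join

ANALYZE foo;                          -- give the planner real row counts

INSERT INTO LPP (PPID, LID)
SELECT ppid, lid FROM foo;

DROP TABLE foo;
```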
Re: [PERFORM] parse - bind take more time than execute
hello.

1. planning time > execute time - it can happen normally for some fast-executing queries, so it is not bad per se.

2. what are your statistics settings? they influence planning time. I mean default_statistics_target and per-column SET STATISTICS.

3. upgrade to 8.4.10; it's a quick upgrade (minimal downtime) and there were some planner improvements.

4. what is "considerably more time" in absolute units?

Filip

2011/12/27 MirrorX <mirr...@gmail.com>:

> there are some performance issues on a server and by searching in the logs
> i noticed that the phases of parse and bind take considerably more time
> than execute for most of the queries. i guess that the right thing to do in
> this case is to use functions or prepared statements but in any case, what
> could be the cause of this?
>
> information about the server:
> - CentOS 5.6
> - 4 cores
> - 12GB ram
>
> shared_buffers: 1 GB
> temp_buffers = 100MB
> work_mem: 30 MB
> maintenance_mem: 512 MB
> database_size: 1.5 GB
> archive_mode is ON
> vacuum/analyze (vacuum_scale_factor 0.1, analyze 0.05)
>
> this behaviour is not related to checkpoints on the database (as indicated
> by the logs, i don't see this latency when a checkpoint occurs; i see it
> most of the time). so my question is the following: what can cause the
> bind/parse phases to take so much longer than the execute? if you need any
> more info about the server i'll be glad to provide it. thank you in advance
> for your advice
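The prepared-statement route mentioned in the question would look like this (a generic sketch with hypothetical table and column names):

```sql
-- Parse and plan once, then only bind + execute for each call;
-- repeated calls skip the expensive parse/plan phases entirely.
PREPARE get_user (int) AS
    SELECT * FROM users WHERE user_id = $1;

EXECUTE get_user(42);
EXECUTE get_user(43);

DEALLOCATE get_user;
```

Prepared statements are per-connection, so this pays off most with connection pooling or long-lived sessions.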
Re: [PERFORM] will the planner ever use an index when the condition is <>?
Normally there is no chance it could work, because (a) the planner does not know all possible values of a column, and (b) btree indexes cannot search on the "not equal" operator.

BTW I've just made a case where - logically - it could work, but it still does not:

create table nums ( num int4 not null, check(num=1 or num=2) );
insert into nums select case when random()>=0.99 then 1 else 2 end from generate_series(1,100);
create index nums_idx on nums(num);
analyze nums;
set constraint_exclusion to 'on';
explain select * from nums where num<>1;
-- planner could estimate selectivity as 1%, and use the index with an "=2" filter, basing on the check constraint?

2011/12/17 Roxanne Reid-Bennett <r...@tara-lu.com>:

> I have a query that uses <> against an indexed column. In this case I can
> use the reverse and use IN or = and get the performance I need... but in
> general... will the planner ever use an index when the related column is
> compared using <>?
>
> I feel like the answer is no, but wanted to ask.
>
> Roxanne
> Postgres Version 8.4.9, PostGIS version 1.5.2
>
> Context for question:
> I have the following query:
>
> select *
> from op_region opr, yield_segment_info ysi, data_location dl
> where opr.op_region_id in
>       (select distinct op_region_id
>        from yield_point
>        where yield > 0
>          and area > 0
>          and ST_GeometryType(location) <> 'ST_Point')
>   and ysi.op_region_id = opr.op_region_id
>   and dl.data_set_id = opr.data_set_id
>
> Yield_Point has 161,575,599 records;
>   where yield > 0 and area > 0 has 161,263,193 records,
>   where ST_GeometryType(location) <> 'ST_Point' has just 231 records
>
> yield_segment_info has 165,929 records
> op_region has 566,212 records
> data_location has 394,763
>
> All of these have a high volume of inserts/deletes. The tables have
> recently been VACUUM FULL'd and the indexes reindexed. [they are under
> the management of the autovacuum, but we forced a cleanup on the chance
> that things had degraded...]
> If I run an explain analyze:
>
>  Nested Loop  (cost=5068203.00..5068230.31 rows=3 width=225308) (actual time=192571.730..193625.728 rows=236 loops=1)
>    ->  Nested Loop  (cost=5068203.00..5068219.66 rows=1 width=57329) (actual time=192522.573..192786.698 rows=230 loops=1)
>          ->  Nested Loop  (cost=5068203.00..5068211.36 rows=1 width=57268) (actual time=192509.822..192638.446 rows=230 loops=1)
>                ->  HashAggregate  (cost=5068203.00..5068203.01 rows=1 width=4) (actual time=192471.507..192471.682 rows=230 loops=1)
>                      ->  Seq Scan on yield_point  (cost=0.00..5068203.00 rows=1 width=4) (actual time=602.174..192471.177 rows=230 loops=1)
>                            Filter: ((yield > 0::double precision) AND (area > 0::double precision) AND (st_geometrytype(location) <> 'ST_Point'::text))
>                ->  Index Scan using op_region_pkey on op_region opr  (cost=0.00..8.33 rows=1 width=57264) (actual time=0.723..0.723 rows=1 loops=230)
>                      Index Cond: (opr.op_region_id = yield_point.op_region_id)
>          ->  Index Scan using yield_segment_info_key on yield_segment_info ysi  (cost=0.00..8.29 rows=1 width=65) (actual time=0.643..0.643 rows=1 loops=230)
>                Index Cond: (ysi.op_region_id = opr.op_region_id)
>    ->  Index Scan using data_location_data_set_idx on data_location dl  (cost=0.00..10.61 rows=3 width=167979) (actual time=3.611..3.646 rows=1 loops=230)
>          Index Cond: (dl.data_set_id = opr.data_set_id)
>  Total runtime: 193625.955 ms
>
> yield_point has the following indexes:
>   btree on ST_GeometryType(location)
>   gist on location
>   btree on op_region_id
>
> I've also tried an index on ((yield > 0::double precision) AND (area > 0::double precision) AND (st_geometrytype(location) <> 'ST_Point'::text))
> ... it still goes for the sequential scan.
>
> But if I change it to st_geometrytype(location) = 'ST_Polygon' or even in
> ('ST_Polygon','ST_MultiPolygon') the planner uses the index.
> Roxanne
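Given the counts quoted above (~231 matching rows out of ~161M), one indexable formulation is a partial index whose predicate mirrors the subquery's WHERE clause exactly (a sketch, not a tested fix from the thread):

```sql
-- The index contains only the ~231 qualifying rows, so it is tiny, and
-- the planner can use it whenever a query repeats the same predicate.
CREATE INDEX yield_point_non_point_idx
    ON yield_point (op_region_id)
    WHERE yield > 0
      AND area > 0
      AND ST_GeometryType(location) <> 'ST_Point';
```

The <> condition never becomes a btree search key, but as a partial-index predicate it can still be matched against the query and avoid the full sequential scan.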
Re: [PERFORM] slow query on tables with new columns added.
2011/9/26 M. D. <li...@turnkey.bz>

> I have full access to the database, but no access to the application
> source code. If creating an index will help, I can do that, but with the
> columns I don't see it helping, as I don't have access to the application
> source to change that. So yes, by changing settings, I would like to know
> if there's any memory setting I can change to help, or an index I can
> create. There is an index on the customer_id column in the gltx table, so
> I'm not sure what else could be done. If there was a way to create a
> select trigger, I would do it and return 0 for both columns on that
> customer_id, as it should always be 0.

Hi

I didn't respond earlier, because I actually don't see any easy way of speeding up the query. The memory settings seem fine for this size of data. It does not look like you can change things by simply adding indexes.

I mean, you can certainly add a specially crafted partial index on gltx.customer_id WHERE (gltx.inactive_on IS NULL) AND (gltx.posted = 'Y') - but this can earn you a few percent max.

The problem here might be the type of the join columns - we can see they are about 10 characters, which is not an ideal choice (that's one of the reasons why I'm a fan of artificial integer pkeys).

You _could_ try running the query with enable_mergejoin = off and see what happens.

You can check if the problem persists after dumping and reloading to another db.

If app modification was possible, you could materialize the data _before_ it must be queried - using a summary table and appropriate triggers for keeping it up-to-date.

Regarding your last comment - that for that customer_id the values should always be 0 - if it's a persistent business rule, I would try to create a CHECK constraint to reflect it. With some luck and fiddling, constraint_exclusion might come to help with speeding up your query.

Also, if there is something special about the customer_id distribution - table partitioning might be an option.

Ok, that's a long list - hope this helps, and good luck.
After all you can throw more hardware at the problem - or hire some Pg magician :-)
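The partial index mentioned in the list above might look like this (a hypothetical sketch; column names are taken from the query quoted elsewhere in the thread):

```sql
-- Indexes only the rows the application's query actually touches
-- (active, posted transactions), so it stays small and needs no
-- change to the application SQL to be considered by the planner.
CREATE INDEX gltx_customer_active_posted_idx
    ON gltx (customer_id)
    WHERE inactive_on IS NULL AND posted = 'Y';
```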
Re: [PERFORM] slow query on tables with new columns added.
2011/9/23 M. D. <li...@turnkey.bz>

> I did a software upgrade, and with it came a new feature where when
> selecting a customer it queries for the sum of a few columns. This takes
> 7 seconds for the 'Cash Sale' customer - by far the most active customer.
> I'd like to see if it's possible to get it down a bit by changing settings.

To make things clear before we search for a solution: you wrote "by changing settings". Is it the only option? Can't you change the query in the software? Can't you change the database schema (add indexes etc)?

> Query:
>
> explain analyse
> select sum(item_points), sum(disc_points)
> from invoice
> left join gltx on invoice.invoice_id = gltx.gltx_id
> where gltx.inactive_on is null
>   and gltx.posted = 'Y'
>   and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'

Aside from other things, you know that the LEFT join here is useless? The WHERE conditions on gltx reject NULL-extended rows, so the planner should collapse it to a normal join - but I'd check.

Filip
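Why the LEFT JOIN is redundant can be shown with a minimal pair (a sketch with simplified column names): any row where gltx came back NULL-extended would fail the posted = 'Y' test anyway, so both forms below return the same rows.

```sql
-- Equivalent queries: the strict predicate on the nullable side filters
-- out every NULL-extended row, so the planner treats the LEFT JOIN as an
-- INNER JOIN ("join reduction").
SELECT sum(i.item_points)
FROM invoice i
LEFT JOIN gltx g ON i.invoice_id = g.gltx_id
WHERE g.posted = 'Y';

SELECT sum(i.item_points)
FROM invoice i
JOIN gltx g ON i.invoice_id = g.gltx_id
WHERE g.posted = 'Y';
```

If the application really wanted rows without a matching gltx entry, the filter would have to move into the ON clause instead.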
Re: [PERFORM] the XID question
2011/1/19 Charles.Hou <giveme...@gmail.com>:

> what's the definition of XID?

XID == Transaction ID.

> "select * from mybook" SQL command also increases the XID?

Yes. A single SELECT is a transaction. Hence, it needs a transaction ID.

greets,
Filip
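This can be observed from psql (a sketch; note that txid_current() itself forces a transaction ID to be assigned, so this shows an upper bound on XID consumption rather than proving that a plain SELECT always consumes one):

```sql
-- Each statement outside an explicit BEGIN runs in its own transaction,
-- so consecutive calls return increasing values.
SELECT txid_current();
SELECT txid_current();  -- a strictly larger value than the previous call
```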
Re: [PERFORM] Auto-clustering?
2010/12/17 Royce Ausburn <ro...@inomial.com>

> Hi all,
>
> I have a table that in the typical case holds two minute sample data for
> a few thousand sources. Often we need to report on these data for a
> particular source over a particular time period and we're finding this
> query tends to get a bit slow.
>
> The structure of the table:
>
>                      Table "public.sample"
>       Column       |           Type           |       Modifiers
> -------------------+--------------------------+--------------------------------------------
>  client            | integer                  | not null
>  aggregateid       | bigint                   | not null
>  sample            | bigint                   | not null default nextval('samplekey'::regclass)
>  customer          | integer                  |
>  period            | integer                  | not null
>  starttime         | integer                  | not null
>  duration          | integer                  | not null
>  ip                | text                     |
>  tariff            | integer                  |
>  bytessentrate     | bigint                   |
>  bytessent         | bigint                   |
>  bytesreceived     | bigint                   |
>  packets           | integer                  | not null
>  queuetype         | integer                  | not null default 0
>  collection        | integer                  |
>  bytesreceivedrate | bigint                   |
>  greatestrate      | bigint                   |
>  invalidated       | timestamp with time zone |
> Indexes:
>     "sample_pkey" PRIMARY KEY, btree (sample)
>     "sample_collection_starttime_idx" btree (collection, starttime)
>     "sample_customer_starttime_idx" btree (customer, starttime)
>     "sample_sample_idx" btree (client, sample)
> Foreign-key constraints:
>     "sample_client_fkey" FOREIGN KEY (client) REFERENCES client(client)
>
> fc=# explain analyse select collection, period, tariff, sum(bytesSent),
> sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600
> as startchunk from sample_20101001 where starttime between 1287493200
> and 1290171599 and collection=128 and ip = '10.9.125.207' group by
> startchunk, tariff, collection, period;
>
>  HashAggregate  (cost=34959.01..34959.03 rows=1 width=44) (actual time=67047.850..67047.850 rows=0 loops=1)
>    ->  Bitmap Heap Scan on sample_20101001  (cost=130.56..34958.91 rows=5 width=44) (actual time=67047.847..67047.847 rows=0 loops=1)
>          Recheck Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
>          Filter: (ip = '10.9.125.207'::text)
>          ->  Bitmap Index Scan on sample_20101001_collection_starttime_idx  (cost=0.00..130.56 rows=9596 width=0)
> (actual time=9806.115..9806.115 rows=6830 loops=1)
>                Index Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
>  Total runtime: 67048.201 ms
> (7 rows)

how about (auto)vacuuming?

> I figure at most there should only be ~20,000 rows to be read from disk,
> and I expect that the index is doing a pretty good job of making sure only
> the rows that need reading are read. (inclusion of the ip in the query is
> almost redundant, as most of the time an ip has its own collection)
>
> My suspicion is that the rows that we're interested in are very sparsely
> distributed on disk, so we're having to read too many pages for the
> query...

you can test this suspicion in a very simple way:
- create a test table (like yours, including indexes, including constraints, but with no data)
- insert into test select * from yours, ordered by the column you expect to cluster on
- analyze the test table
- test the query on the new table

If the new query is much faster, and if you have intensive random UPD/DEL/INS activity, a periodic CLUSTER could be a good idea... but it depends on actual usage patterns (SELECT/modify ratio, types of updates, and so on).

> All of the queries on this table are reporting on a single collection, so
> ideally a collection's data would all be stored in the same part of the
> disk... or at least clumped together. This can be achieved using cluster,
> however as far as I know there's no automated, non-cronesque means of
> clustering, and having the table become unusable during the cluster is
> not ideal.

cron is a way of automation, isn't it :-)

> I've considered partitioning, but I don't think that's going to give the
> effect I need. Apparently clustering is only going to scale to a few dozen
> child tables, so that's only going to give one order of magnitude
> performance for significant complexity.

regarding partitioning: I guess it starts to make sense around 10M rows or 10 GB in one table.

regarding clustering: it does not help with index bloat.

and finally, you did not specify what
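The sparse-distribution test described above might look like this (a sketch; `sample_test` is a hypothetical name and the ORDER BY is chosen to match the reporting index):

```sql
-- Build a physically-ordered copy with the same schema, indexes and
-- constraints, then compare query times against it.
CREATE TABLE sample_test (LIKE sample INCLUDING ALL);

INSERT INTO sample_test
SELECT * FROM sample
ORDER BY collection, starttime;  -- the order the reporting queries scan in

ANALYZE sample_test;
-- ...now re-run the slow query against sample_test and compare timings.
```

A large speedup on the copy would confirm that heap-page locality, not the index, is the bottleneck.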
Re: [PERFORM] Auto-clustering?
you are right, I must have missed it...

       Table "public.u"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     |
 t      | timestamp without time zone |
 d      | text                        |
Indexes:
    "u_d" btree (d)
    "u_id" btree (id)
    "u_t" btree (t)

fi...@filip=# select oid, relname, pg_relation_size(oid) from pg_class where relname in ('u','u_id','u_t','u_d');
  oid  | relname | pg_relation_size
-------+---------+------------------
 64283 | u       |         15187968
 64289 | u_id    |          6758400
 64290 | u_t     |          6086656
 64291 | u_d     |         16482304

fi...@filip=# CLUSTER u USING u_t;
CLUSTER

fi...@filip=# select oid, relname, pg_relation_size(oid) from pg_class where relname in ('u','u_id','u_t','u_d');
  oid  | relname | pg_relation_size
-------+---------+------------------
 64283 | u       |         12115968
 64289 | u_id    |          3391488
 64290 | u_t     |          3391488
 64291 | u_d     |          8216576
(4 rows)

So CLUSTER is effectively CLUSTER + REINDEX... nice.

On 17 December 2010 at 10:41, Marti Raudsepp <ma...@juffo.org> wrote:

> 2010/12/17 Filip Rembiałkowski <filip.rembialkow...@gmail.com>:
> > regarding clustering: it does not help with index bloat.
>
> I'm almost sure it does, CLUSTER re-creates all indexes from scratch
> after copying the tuples.
>
> Regards,
> Marti