Re: [PERFORM] performance on new linux box
On 09/07/10 02:31, Ryan Wexler wrote: Thanks a lot for all the comments. The fact that both my windows box and the old linux box show a massive performance improvement over the new linux box seems to point to hardware to me. I am not sure how to test the fsync issue, but I don't see how that could be it. The raid card the server has in it is: 3Ware 4 Port 9650SE-4LPML RAID Card. Looking it up, it seems to indicate that it has a BBU. The only other difference between the boxes is the postgresql version. The new one has 8.4-2 from the yum install instructions on the site: http://yum.pgrpms.org/reporpms/repoview/pgdg-centos.html Any more thoughts? Really dumb idea: you don't happen to have the build of the RPMs that had debug enabled, do you? That build resulted in significant performance problems. Regards Russell
Re: [PERFORM] Low perfomance SUM and Group by large databse
On 22/06/10 00:42, Sergio Charpinel Jr. wrote: Hi, [snip]

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto, SUM(bytes), SUM(packets), SUM(flows) FROM acct_2010_25 WHERE stamp_inserted >= '2010-06-20 10:10' AND stamp_inserted < '2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;

QUERY PLAN
--
Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual time=276981.107..276981.133 rows=50 loops=1)
  -> Sort (cost=3998662.81..4001046.07 rows=953305 width=50) (actual time=276981.105..276981.107 rows=50 loops=1)
        Sort Key: sum(bytes)
        -> GroupAggregate (cost=3499863.27..3754872.33 rows=953305 width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
              -> Sort (cost=3499863.27..3523695.89 rows=9533049 width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
                    Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto

You are having to sort and aggregate a large number of rows before you can get the top 50. That's 9 million rows in this case; at width 50 that's a 400MB+ sort, which has to happen on disk unless you bump sort memory up to 500MB (a bad idea). So unless you have really fast storage for temporary files it's going to take a while; the roughly 2.5 minutes you are experiencing at the moment is probably not too bad. Improvements have been made in this area since 8.1, and if you are able to upgrade to 8.4, which is now also offered for CentOS 5, you may get a benefit there. I can't remember the specific improvements, but I believe sorting speed is better, and newer versions' explain analyze will also give you more information about what's going on with disk/memory sorting.
              -> Seq Scan on acct_2010_25 (cost=0.00..352648.10 rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
                    Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))
Total runtime: 278791.661 ms
(9 rows)

Another one just summing bytes (still slow):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto, SUM(bytes) FROM acct_2010_25 WHERE stamp_inserted >= '2010-06-20 10:10' AND stamp_inserted < '2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto LIMIT 50 OFFSET 0;

QUERY PLAN
Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual time=106261.359..106261.451 rows=50 loops=1)
  -> GroupAggregate (cost=3395202.50..3602225.48 rows=974226 width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
        -> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42) (actual time=106261.107..106261.169 rows=176 loops=1)
              Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
              -> Seq Scan on acct_2010_25 (cost=0.00..367529.72 rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
                    Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))
Total runtime: 109911.882 ms
(7 rows)

The server has 2 Intel(R) Xeon(R) CPU E5430 @ 2.66GHz and 16GB RAM. I'm using PostgreSQL 8.1.18 with the default config from CentOS 5.5 (I just increased checkpoint_segments to 50).

Checkpoint segments won't help you here: that setting is about writing to the database and how fast that can happen, and these are read queries.

What can I change to increase performance?

Increasing sort memory (work_mem) will give you speed benefits even though you are going to disk. I don't know how much spare memory you have, but trying values between 8MB and 128MB may be useful just for the specific query runs.
If you can afford 512MB for each of the two sorts, go for that, but as mentioned it's dangerous because of the risk of using more RAM than you have: work_mem allocates that amount of memory per sort, not per query. If you are running these queries all the time, a summary table that produces these reports on a regular basis, maybe daily or even hourly, would be useful. Basically the summary table does the expensive sort and aggregate once, and the reports then read the much smaller pre-aggregated rows.
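A sketch of that summary-table idea, using the table and column names from the thread (the hourly granularity and the summary table's name are assumptions):

```sql
-- Hypothetical hourly summary: do the expensive GROUP BY once,
-- then report from the much smaller pre-aggregated table.
CREATE TABLE acct_2010_25_hourly AS
SELECT date_trunc('hour', stamp_inserted) AS stamp_hour,
       ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes) AS bytes, SUM(packets) AS packets, SUM(flows) AS flows
FROM acct_2010_25
GROUP BY 1, 2, 3, 4, 5, 6, 7;

-- Reports then aggregate over far fewer rows:
SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto, SUM(bytes)
FROM acct_2010_25_hourly
WHERE stamp_hour >= '2010-06-20 10:00' AND stamp_hour < '2010-06-21 10:00'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
ORDER BY SUM(bytes) DESC
LIMIT 50;
```

A cron job (or trigger-maintained rollup) would keep the summary table refreshed at whatever interval the reports need.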
Re: [PERFORM] Getting time of a postgresql-request
Kai Behncke wrote: But I would like to get it in a php-script, like $timerequest_result=pg_result($timerequest,0); (well, that does not work). I wonder: Is there another way to get the time a request needs? How do you handle this?

$time = microtime(true);            // microtime(true) returns a float (PHP >= 5)
$result = pg_query($conn, $query);
echo "Time to run query and return result to PHP: " . (microtime(true) - $time);

Something like that. Regards Russell -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SQL select query becomes slow when using limit (with no offset)
Kees van Dieren wrote: Hi Folks, Thanks for your response. I have added the following index (suggested by other post): CREATE INDEX events_events_cleared_eventtype ON events_events USING btree (eventtype_id, cleared) WHERE cleared = false; Also with columns in reversed order. No changes in response time noticed. Index on cleared column already is there (indices are in sql file attached to initial post). eventtype_id has a foreign key constraint, which adds an index automatically I believe? The explain analyze results for both queries:

explain analyze select events_events.id FROM events_events left join events_event_types on events_events.eventType_id=events_event_types.id where events_event_types.severity=70 and not events_events.cleared order by events_events.dateTime DESC LIMIT 100

Limit (cost=0.00..125.03 rows=100 width=16) (actual time=0.046..3897.094 rows=77 loops=1)
  -> Nested Loop (cost=0.00..120361.40 rows=96269 width=16) (actual time=0.042..3896.881 rows=77 loops=1)
        -> Index Scan Backward using events_events_datetime_ind on events_events (cost=0.00..18335.76 rows=361008 width=24) (actual time=0.025..720.345 rows=360637 loops=1)
              Filter: (NOT cleared)
        -> Index Scan using events_event_types_pkey on events_event_types (cost=0.00..0.27 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=360637)
              Index Cond: (events_event_types.id = events_events.eventtype_id)
              Filter: (events_event_types.severity = 70)
Total runtime: 3897.268 ms

The plan here guesses that we will find the 100 rows we want pretty quickly by scanning backward along the dateTime index, since it isn't expecting to have to look through many rows to find 100 that match the criteria. With no cross-column statistics it's more a guess than a good calculation. The guess turns out badly, and we end up scanning 360k rows from the index before we find what we want. My skills are not up to giving specific advice on how to avert this problem.
Maybe somebody else can help there.

explain analyze select events_events.id FROM events_events left join events_event_types on events_events.eventType_id=events_event_types.id where events_event_types.severity=70 and not events_events.cleared order by events_events.dateTime DESC

Sort (cost=20255.18..20495.85 rows=96269 width=16) (actual time=1084.842..1084.951 rows=77 loops=1)
  Sort Key: events_events.datetime
  Sort Method: quicksort Memory: 20kB
  -> Hash Join (cost=2.09..12286.62 rows=96269 width=16) (actual time=1080.789..1084.696 rows=77 loops=1)
        Hash Cond: (events_events.eventtype_id = events_event_types.id)
        -> Seq Scan on events_events (cost=0.00..9968.06 rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)
              Filter: (NOT cleared)
        -> Hash (cost=1.89..1.89 rows=16 width=8) (actual time=0.077..0.077 rows=16 loops=1)
              -> Seq Scan on events_event_types (cost=0.00..1.89 rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)
                    Filter: (severity = 70)
Total runtime: 1085.145 ms

Any suggestions?

This plan is faster because you avoid the backward index scan: the planner prefers a table scan to find what it needs, which is much faster than 360k random-I/O index lookups. You can force this type of plan with a subquery and the OFFSET 0 trick, but I'm not sure it's the best solution. eg

explain analyze SELECT * FROM (SELECT events_events.id FROM events_events LEFT JOIN events_event_types on events_events.eventType_id=events_event_types.id WHERE events_event_types.severity=70 AND not events_events.cleared ORDER BY events_events.dateTime DESC OFFSET 0) AS a LIMIT 100

Regards Russell
Re: [PERFORM] limit clause breaks query planner?
Pavel Stehule wrote: Hello 2008/9/1 David West [EMAIL PROTECTED]: Thanks for your suggestion but the result is the same. Here is the explain analyse output from different queries. Select * from my_table where A is null and B = '21' limit 15

Limit (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)
  -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)
        Filter: ((A IS NULL) AND ((B)::text = '21'::text))
Total runtime: 85896.214 ms

[snip]

Further to Pavel's comments: (actual time=85837.038..85896.091 rows=15 loops=1) means 85 seconds on a sequential scan just to return the first tuple. The table isn't bloated by any chance, is it? Regards Russell
Re: [PERFORM] Define all IP's in the world in pg_hba.conf
idc danny wrote: Hi everybody, I know that this group deals with performance but it is the only one I'm subscribed to, so my apologies in advance for the question. I want to allow everybody in the world, all IP's, to connect to my server. How do I accomplish that? Definitely, it's not a good solution to enter all of them manually in pg_hba.conf :). What's wrong with 0.0.0.0/0 ? Currently, if the above question cannot be answered, I want to allow the IP's of the Hamachi network, which are all of the form 5.*.*.* - but in the future it can expand to all IP's. Thank you, Danny
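For concreteness, a hypothetical pg_hba.conf fragment for the 0.0.0.0/0 suggestion (the auth method shown is an assumption; anything but trust is advisable when opening a server to the whole internet):

```
# TYPE  DATABASE  USER  CIDR-ADDRESS   METHOD
host    all       all   0.0.0.0/0      md5       # any IPv4 address, password auth
host    all       all   5.0.0.0/8      md5       # or just the 5.*.*.* Hamachi range
```

listen_addresses in postgresql.conf must also allow remote connections (e.g. listen_addresses = '*') for either line to matter.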
Re: [PERFORM] Heavy write activity on first vacuum of fresh TOAST data
Simon Riggs wrote: On Thu, 2007-12-13 at 15:19 -0600, Kevin Grittner wrote: What impact would lack of the hint bits have until a vacuum? Vacuum isn't important here. It's the first idiot to read the data that gets hit. Given that vacuum must then touch every page, is there a win in only setting hint bits on pages where vacuum has to do some other work on the page? Vacuum causes significant IO load for data that may not be accessed for some time. The question becomes: what is the impact of not setting hint bits, and is it better or worse than the IO caused by vacuum? Regards Russell Smith ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] TB-sized databases
Simon Riggs wrote: On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical. So it all depends upon your workload really as to how well PostgreSQL, or any other RDBMS vendor, can handle them. Anyway, my reason for replying to this thread is that I'm planning changes for PostgreSQL 8.4+ that will allow us to get bigger and faster databases. If anybody has specific concerns then I'd like to hear them so I can consider those things in the planning stages. It would be nice to do something with selects so we can recover a rowset on huge tables using a criteria with indexes without falling back to running a full scan. In my opinion, by definition, a huge database sooner or later will have tables far bigger than the RAM available (same for their indexes). I think the queries need to be solved using indexes smart enough to be fast on disk. OK, I agree with this one. I'd thought that index-only plans were only for OLTP, but now I see they can also make a big difference with DW queries. So I'm very interested in this area now. If that's true, then you want to get behind the work Gokulakannan Somasundaram (http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has done with relation to thick indexes. I would have thought that concept particularly useful in DW. Only having to scan indexes on a number of join tables would be a huge win for some of these types of queries. My tiny point of view would say that is a much better investment than setting up the proposed parameter. I can see the use of the parameter though. Most of the complaints about indexes having visibility are about update/delete contention.
I would expect in a DW that those things aren't in the critical path like they are in many other applications. Especially with partitioning, where previous partitions don't get many updates, I would think there could be great benefit. I would think that many of Pablo's requests up-thread would get significant performance benefit from this type of index. But as I mentioned at the start, that's my tiny point of view and I certainly don't have the resources to direct what gets looked at for PostgreSQL. Regards Russell Smith
Re: [PERFORM] Join performance
Pepe Barbe wrote: Hello, I am having an issue on PostgreSQL 8.0.12. In the past we had performance issues with the query planner for queries on some tables where we knew we had indexes and it was doing a sequential scan, and for this reason we issue SET enable_seqscan = FALSE for some queries. Recently we have stumbled upon one of these kinds of queries that is giving terrible performance, because seqscan is disabled. I've reduced the problem to a command like this one: SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER JOIN temp_busy_hr USING(start_time,bsc_id,sect_id); Where temp_busy_hr is a temporary table.

Have you tried analyzing the temp_busy_hr table? Adding an index to the temp table may also help if you are running lots of queries against it.

If the previous is issued with seqscan TRUE, it runs within reasonable time, else it runs forever. The query plan for the previous query with enable_seqscan = TRUE:

It would be worth knowing how far out the estimates are. Also, have you tried raising the statistics target for the relevant columns to increase their accuracy?

QUERY PLAN
Limit (cost=0.00..384555.98 rows=1 width=3092)
  -> Nested Loop (cost=0.00..384555.98 rows=1 width=3092)
        Join Filter: ((inner.bsc_id = outer.bsc_id) AND (inner.site_id = outer.site_id) AND (inner.sect_id = outer.sect_id))
        -> Nested Loop (cost=0.00..368645.64 rows=28 width=1192)
              Join Filter: ((outer.sect_id = inner.sect_id) AND (outer.bsc_id = inner.bsc_id))
              -> Seq Scan on temp_busy_hr (cost=0.00..24.00 rows=1400 width=24)
              -> Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics (cost=0.00..226.66 rows=2094 width=1168)
                    Index Cond: (outer.start_time = gsm_amr_metrics.start_time)
        -> Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1 (cost=0.00..528.77 rows=1973 width=1936)
              Index Cond: (t1.start_time = outer.start_time)
(10 rows)

and the plan for enable_seqscan = FALSE:

QUERY PLAN
Limit (cost=10097.16..100720844.01 rows=1 width=3092)
  -> Nested Loop (cost=10097.16..100720844.01 rows=1 width=3092)
        Join Filter: ((inner.bsc_id = outer.bsc_id) AND (inner.site_id = outer.site_id) AND (inner.sect_id = outer.sect_id))
        -> Merge Join (cost=10097.16..100704933.67 rows=28 width=1192)
              Merge Cond: (outer.start_time = inner.start_time)
              Join Filter: ((inner.sect_id = outer.sect_id) AND (inner.bsc_id = outer.bsc_id))
              -> Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics (cost=0.00..631211.45 rows=6005551 width=1168)
              -> Sort (cost=10097.16..10100.66 rows=1400 width=24)
                    Sort Key: temp_busy_hr.start_time
                    -> Seq Scan on temp_busy_hr (cost=1.00..10024.00 rows=1400 width=24)
        -> Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1 (cost=0.00..528.77 rows=1973 width=1936)
              Index Cond: (t1.start_time = outer.start_time)
(12 rows)

Any ideas what I could try to fix this problem? Thanks, Pepe
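A sketch of the suggestions above, using the table names from the thread (the index name and column choice are assumptions based on the join's USING clause):

```sql
-- Index the temp table's join columns and give the planner real statistics,
-- then retry without disabling sequential scans.
CREATE INDEX temp_busy_hr_idx ON temp_busy_hr (start_time, bsc_id, sect_id);
ANALYZE temp_busy_hr;
SET enable_seqscan = TRUE;
EXPLAIN ANALYZE SELECT * FROM gsm_sector_metrics
    NATURAL JOIN gsm_amr_metrics
    INNER JOIN temp_busy_hr USING (start_time, bsc_id, sect_id);
```

Temporary tables are never auto-analyzed, so without the explicit ANALYZE the planner is guessing at row counts (the suspiciously round rows=1400 estimate is the default guess).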
Re: [PERFORM] Optimising in queries
Michael Glaesemann wrote: On Aug 22, 2007, at 5:58, Russell Smith wrote: Stephen Davies wrote: select count(rdate),rdate from reading where sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc limit 1; It would have been helpful to see the table definition here. I can say up front that array processing in postgres is SLOW. Um, what array processing are you seeing here? IN (a, b, c) is not an array construct. Filter: (sensor_id = ANY ('{1137,1138,1139,1140}'::integer[])) I've never seen this plan item except when arrays are involved. I could be wrong. I'd like to know how this is generated when you don't have an array. Regards Russell Smith
Re: [PERFORM] Poor Performance after Upgrade
Ben Perrault wrote: Hi, I recently inherited a very old (PostgreSQL 7.0.3) database, and have migrated it to 8.2.4 but have run into a performance issue. Basically, I did a dump and import into the new database, vacuumed and created fresh indexes, and everything is working great except the following type of query (and similar): SELECT tsr.stepId, tsr.testType, tsr.problemReportId, tsr.excpt, tcr.caseId FROM TestCaseRun tcr, TestStepRun tsr WHERE tcr.parentSN = 194813 AND (tsr.testType <> '' OR tsr.problemReportId <> '' OR tsr.excpt <> '') AND tsr.parentSN = tcr.recordSN

This query is not similar to the plans listed below: it will not produce a sort/unique unless tcr or tsr are views. Can we also see explain analyze instead of just explain? It's much more helpful to see what's actually going on, especially since the row estimates are quite different in the two plans. You also mentioned above that you vacuumed; did you analyze as well? Plain vacuum doesn't analyze in 8.2.4. You have to say VACUUM ANALYZE, or just ANALYZE.

What used to take 250ms or so on the old database now takes between 55 and 60 seconds.
On the old database, the query plan looks like this:

Unique (cost=13074.30..13078.36 rows=32 width=68)
  -> Sort (cost=13074.30..13074.30 rows=324 width=68)
        -> Nested Loop (cost=0.00..13060.77 rows=324 width=68)
              -> Index Scan using parentsn_tcr_indx on testcaserun tcr (cost=0.00..444.83 rows=111 width=16)
              -> Index Scan using parentsn_tsr_indx on teststeprun tsr (cost=0.00..113.42 rows=27 width=52)

And on the new database it looks like this:

Unique (cost=206559152.10..206559157.14 rows=336 width=137)
  -> Sort (cost=206559152.10..206559152.94 rows=336 width=137)
        Sort Key: tsr.stepid, tsr.testtype, tsr.problemreportid, tsr.excpt, tcr.caseid
        -> Nested Loop (cost=1.00..106559138.00 rows=336 width=137)
              -> Index Scan using parentsn_tcr_indx on testcaserun tcr (cost=0.00..17.00 rows=115 width=11)
                    Index Cond: (parentsn = 186726)
              -> Index Scan using parentsn_tsr_indx on teststeprun tsr (cost=0.00..56089.00 rows=75747 width=134)
                    Index Cond: (tsr.parentsn = tcr.recordsn)
                    Filter: ((testtype <> ''::text) OR ((problemreportid)::text <> ''::text) OR (excpt <> ''::text))
(9 rows)

I'm fairly familiar with PostgreSQL, but I have no idea where to start trying to troubleshoot this huge performance discrepancy. The hardware and OS are the same, and the data size is exactly the same between the two; the total data size is about 7.5GB, with the largest table (teststeprun mentioned above) being about 15 million rows. Any pointers on where to start troubleshooting this, or how to change the query to work better, would be appreciated.

Look at row estimates vs reality; they should be pretty close in the new version. Why are the costs so high in the new plan? 100000000 happens to be a nice number that's used when you attempt to turn off a certain type of plan. EXPLAIN ANALYZE (query) is your friend. Regards Russell
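A minimal sketch of those checks, using names from the thread (the enable_seqscan check is an assumption prompted by the suspicious 1e8-scale costs in the new plan):

```sql
-- Plain VACUUM alone does not gather planner statistics in 8.2:
VACUUM ANALYZE teststeprun;
VACUUM ANALYZE testcaserun;

-- See whether a plan type has been disabled in the session or config:
SHOW enable_seqscan;

-- Then compare estimated vs actual row counts:
EXPLAIN ANALYZE
SELECT tsr.stepId, tsr.testType, tsr.problemReportId, tsr.excpt, tcr.caseId
FROM TestCaseRun tcr, TestStepRun tsr
WHERE tcr.parentSN = 194813
  AND (tsr.testType <> '' OR tsr.problemReportId <> '' OR tsr.excpt <> '')
  AND tsr.parentSN = tcr.recordSN;
```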
Re: [PERFORM] Optimising in queries
Stephen Davies wrote: I have a PostgreSQL 8.2.4 table with some seven million rows. The psql query: select count(rdate),rdate from reading where sensor_id in (1137,1138,1139) group by rdate order by rdate desc limit 1; takes a few seconds but: select count(rdate),rdate from reading where sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc limit 1;

It would have been helpful to see the table definition here. I can say up front that array processing in postgres is SLOW.

(anything with four or more values in the in list) takes several minutes. Is there any way to make the larger queries more efficient? Both rdate and sensor_id are indexed and the database is vacuumed every night. The values in the in list are seldom as neat as in the above examples. Actual values can range from 1 to about 2000. The number of values ranges from 2 to about 10. Explain outputs are:

benparts=# explain select count(rdate),rdate from reading where sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc limit 1;

QUERY PLAN
---
Limit (cost=0.00..39890.96 rows=1 width=8)
  -> GroupAggregate (cost=0.00..7938300.21 rows=199 width=8)
        -> Index Scan Backward using date on reading (cost=0.00..7937884.59 rows=82625 width=8)
              Filter: (sensor_id = ANY ('{1137,1138,1139,1140}'::integer[]))
(4 rows)

I'm unsure how you produced a plan like this without the benefit of seeing the table definition.
benparts=# explain select count(rdate),rdate from reading where sensor_id in (1137,1138,1139) group by rdate order by rdate desc limit 1;

QUERY PLAN
-
Limit (cost=48364.32..48364.32 rows=1 width=8)
  -> Sort (cost=48364.32..48364.49 rows=69 width=8)
        Sort Key: rdate
        -> HashAggregate (cost=48361.35..48362.21 rows=69 width=8)
              -> Bitmap Heap Scan on reading (cost=535.53..48218.10 rows=28650 width=8)
                    Recheck Cond: (sensor_id = ANY ('{1137,1138,1139}'::integer[]))
                    -> Bitmap Index Scan on reading_sensor (cost=0.00..528.37 rows=28650 width=0)
                          Index Cond: (sensor_id = ANY ('{1137,1138,1139}'::integer[]))
(8 rows)

As mentioned already, you need explain analyze. However, I will again say that array processing in postgres is SLOW. I would strongly recommend redesigning your schema to use a table of sensor_ids that correspond to the primary key in the reading table. Rethinking the way you are going about this will probably be the most effective solution, but we will need more information if you are not comfortable doing that yourself. Regards Russell Smith
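One hypothetical shape for that redesign, with assumed names for the new table (8.2 supports multi-row VALUES, so populating it per query is cheap):

```sql
-- Keep the sensor ids of interest in their own table and join against it,
-- instead of passing a long IN list each time.
CREATE TEMP TABLE wanted_sensors (sensor_id integer PRIMARY KEY);
INSERT INTO wanted_sensors VALUES (1137), (1138), (1139), (1140);
ANALYZE wanted_sensors;

SELECT count(r.rdate), r.rdate
FROM reading r
JOIN wanted_sensors w USING (sensor_id)
GROUP BY r.rdate
ORDER BY r.rdate DESC
LIMIT 1;
```

This gives the planner real join statistics to work with rather than a `= ANY(array)` filter estimate.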
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Guido Neitzer wrote: On 27.01.2007, at 00:35, Russell Smith wrote: Guess 1 would be that your primary key is int8, but I can't be certain that is what's causing the problem. Why could that be a problem? Before 8.0, the planner would not choose an index scan if the types were different: in int8_col = 4, the constant 4 is cast to int4, and int8 != int4, so the planner will not choose an index scan. Regards Russell Smith cug
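The pre-8.0 workaround for the mismatch described above was to make the literal's type match the column, either with an explicit cast or a quoted literal (hypothetical table and column names):

```sql
-- With an int8 primary key column "id8" on table "t", before 8.0:
SELECT * FROM t WHERE id8 = 4;        -- literal is int4: no index scan
SELECT * FROM t WHERE id8 = 4::int8;  -- explicit cast: index scan possible
SELECT * FROM t WHERE id8 = '4';      -- quoted literal coerced to int8: also works
```

From 8.0 onward the planner handles cross-type comparisons like this and can use the index without the cast.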
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Carlos Moreno wrote: Hi, I find various references on the list to this issue of queries being too slow because the planner miscalculates things and decides to go for a sequential scan when an index is available and would lead to better performance. Is this still an issue with the latest version? I'm doing some tests right now, but I have version 7.4 (and I am not sure when I will be able to spend the effort to move our system to 8.2). When I force it via set enable_seqscan to off, the index scan takes about 0.1 msec (as reported by explain analyze), whereas with the default, it chooses a seq. scan, for a total execution time around 10 msec!! (yes: 100 times slower!). The table has 20 thousand records, and the WHERE part of the query uses one field that is part of the primary key (as in, the primary key is the combination of field1,field2, and the query involves a where field1=1 and some_other_field=2). I don't think I'm doing something wrong, and I find no reason not to expect the query planner to choose an index scan. For the time being, I'm using an explicit enable_seqscan off in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version?

Please supply explain analyze output for the query with both the index scan and sequential scan plans. We may be able to tell you why it's choosing the wrong option. Guess 1 would be that your primary key is int8, but I can't be certain that is what's causing the problem. Regards Russell Smith Thanks, Carlos
Re: [PERFORM] unusual performance for vac following 8.2 upgrade
Kim wrote: snip OS: Solaris 10 write transactions/hr: 1.5 million size of pg_class: 535,226 number of relations: 108,694 That is a huge pg_class. I remember some discussion recently about problems with 8.2 and the way it scans pg_class. I also believe it's fixed in 8.2.1. Are you running that version? If not, I suggest you upgrade and see if the fault still exists. Regards Russell Smith Thanks to all, Kim
Re: [PERFORM] Improving SQL performance
Carlos H. Reimer wrote: Hi, I know that the problem with the following SQL is the LOG.CODCEP = ENDE.CODCEP||CODLOG condition, but what can I do to improve the performance?

I wouldn't say it's the join condition itself. There is a nested loop join over 500k+ rows. Is it possible to put an index on LOG.CODCEP? That might give you a better plan: as you only have 1 row on the left of the join, an index scan would be preferable. Regards Russell Smith

Is there a type of index that could help or is there another way to build this SQL? Thank you in advance!

explain analyze SELECT ENDE.*, DEND.DESEND, DEND.USOEND, DEND.DUPEND, to_char('F') as NOVO, LOG.TIPLOG FROM TT_END ENDE LEFT OUTER JOIN TD_END DEND ON DEND.CODTAB = ENDE.TIPEND LEFT OUTER JOIN TT_LOG LOG ON LOG.CODCEP = ENDE.CODCEP||CODLOG WHERE ENDE.FILCLI = '001' AND ENDE.CODCLI = ' 19475';

QUERY PLAN
--
Nested Loop Left Join (cost=0.00..25366.84 rows=1259 width=417) (actual time=1901.499..1901.529 rows=1 loops=1)
  Join Filter: ((inner.codcep)::text = ((outer.codcep)::text || (outer.codlog)::text))
  -> Nested Loop Left Join (cost=0.00..4.91 rows=1 width=412) (actual time=0.117..0.144 rows=1 loops=1)
        Join Filter: (inner.codtab = outer.tipend)
        -> Index Scan using pk_end on tt_end ende (cost=0.00..3.87 rows=1 width=388) (actual time=0.066..0.078 rows=1 loops=1)
              Index Cond: ((filcli = '001'::bpchar) AND (codcli = ' 19475'::bpchar))
        -> Seq Scan on td_end dend (cost=0.00..1.02 rows=2 width=33) (actual time=0.012..0.018 rows=2 loops=1)
  -> Seq Scan on tt_log log (cost=0.00..12254.24 rows=582424 width=17) (actual time=0.013..582.521 rows=582424 loops=1)
Total runtime: 1901.769 ms
(9 rows)

\d tt_log
Table TOTALL.tt_log
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 codbai | numeric(5,0)          | not null
 nomlog | character varying(55) | not null
 codcep | character(8)          | not null

\d tt_end
Table TOTALL.tt_end
 Column |     Type      | Modifiers
--------+---------------+-----------
 ...
 codlog | character(3)  |
 ...
 codcep | character(5)  |
 ...
Reimer
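A sketch of the index suggestion above (the index name is an assumption): since the concatenation expression is evaluated on the tt_end side, a plain index on tt_log.codcep is enough for the planner to probe instead of seq-scanning 582k rows per outer row.

```sql
CREATE INDEX tt_log_codcep_idx ON tt_log (codcep);
ANALYZE tt_log;
-- then re-run EXPLAIN ANALYZE on the query; the inner Seq Scan on tt_log
-- should become an Index Scan using tt_log_codcep_idx
```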
Re: [PERFORM] Backup/Restore too slow
Sebastián Baioni wrote: Thanks for answering. This is my configuration:
# - Memory -
shared_buffers = 1000           # min 16, at least max_connections*2, 8KB each
#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB
The PC where we are running the PostgreSQL server is: AMD Athlon(tm) 64 Processor 3000+ 1.79 GHz, 1.93 GB RAM with Windows XP Professional, Version 2002, Service Pack 2. How should we set it?

Shared buffers even on a workstation should be higher than 1000 if you want some performance. It depends how much memory you have spare to use for PostgreSQL, but something like shared_buffers = 20000 and maintenance_work_mem = 256000 will certainly give you a performance boost. You will have to adjust those figures based on whatever else you are doing on the machine. Russell Smith.

Thanks a lot! Sebastián

Tom Lane wrote: Rod Taylor writes: Rebuilding the indexes or integrity confirmations are probably taking most of the time. What is your work_mem setting? maintenance_work_mem is the thing to look at, actually. I concur that bumping it up might help. regards, tom lane
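Put together, a hypothetical postgresql.conf fragment for a ~2GB Windows workstation along the lines of the advice above (exact values are illustrative; this config generation sizes shared_buffers in 8KB pages and the *_mem settings in KB):

```
# - Memory - (values are examples, tune to what else runs on the box)
shared_buffers = 20000          # ~160MB of shared cache
maintenance_work_mem = 256000   # ~256MB; speeds index builds during restore
work_mem = 8192                 # ~8MB per sort; per-sort, so keep modest
```

A restart is needed for shared_buffers to take effect; maintenance_work_mem and work_mem can also be set per session before a restore.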
Re: [PERFORM] Slow SELECT on three or more clients
AMIR FRANCO D. JOVEN wrote: Hi! I'm new to PostgreSQL. My current project uses PostgreSQL 7.3.4.

Upgrading your version of PostgreSQL to 8.1 will give you significant performance benefits.

The problem is like this: I have a table with 94 fields, and a select with only one resultset in only one client consumes about 0.86 seconds. The client executes three 'select' statements to perform the task, which consumes 2.58 seconds. With only one client this is acceptable, but the real problem is that as I add more clients, it goes more and more slowly. For a single select with one field in one resultset, is 0.86 seconds normal?

You will need to attach the query: EXPLAIN ANALYZE SELECT ..., where SELECT ... is your query. That will help us work out what the problem is. 0.86 seconds might be slow for a query that returns 1 row, or it might be fast for a query that returns a large set with complex joins and where conditions. Fast and slow are not objective terms; they are very dependent on the query.

I tried vacuuming and reindexing but to no avail. The total record count in that particular table is 456,541.

456,541 is not all that many records. But again, you will need to post more information for us to be able to assist. Thanks in advance.
Re: [PERFORM] Postgres server crash
Craig A. James wrote: For the third time today, our server has crashed, or frozen, actually something in between. Normally there are about 30-50 connections because of mod_perl processes that keep connections open. After the crash, there are three processes remaining:

# ps -ef | grep postgres
postgres 23832     1  0 Nov11 pts/1 00:02:53 /usr/local/pgsql/bin/postmaster -D /postgres/main
postgres  1200 23832 20 14:28 pts/1 00:58:14 postgres: pubchem pubchem 66.226.76.106(58882) SELECT
postgres  4190 23832 25 14:33 pts/1 01:09:12 postgres: asinex asinex 66.226.76.106(56298) SELECT

But they're not doing anything: no CPU time consumed, no I/O going on, no progress. If I try to connect with psql(1), it says:

psql: FATAL: the database system is in recovery mode

And the server log has:

LOG: background writer process (PID 23874) was terminated by signal 9
LOG: terminating any other active server processes
LOG: statistics collector process (PID 23875) was terminated by signal 9
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited ab
... repeats about 50 times, one per process.

Questions: 1. Any idea what happened and how I can avoid this? It's a *big* problem. 2. Why didn't the database recover? Why are there two processes that couldn't be killed? 3. Where did the signal 9 come from? (Nobody but me ever logs in to the server machine.) Help! Thanks, Craig

I would guess it's the Linux OOM killer if you are running Linux. You need to turn off killing of processes when you run out of memory. Are you getting close to running out of memory?
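On Linux, the OOM-killer behaviour referred to above can be disabled by turning off memory overcommit. This is a hedged sketch: the sysctl exists on 2.6-era kernels, and the change should be tried on a non-production box first, since strict accounting can make large allocations fail that previously succeeded:

```
# /etc/sysctl.conf -- make malloc() fail cleanly instead of letting the
# kernel overcommit memory and later SIGKILL (signal 9) a large process
vm.overcommit_memory = 2
```

Apply with `sysctl -p` as root, then confirm with `cat /proc/sys/vm/overcommit_memory`.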
Re: [PERFORM] View not using index
On Mon, 13 Jun 2005 04:54 pm, Yves Vindevogel wrote: Still, when I use explain, pg says it will first sort my tables instead of using my index. How is that possible? Can we see the output of the explain analyze? The definition of the view? Regards Russell Smith
Re: [PERFORM] View not using index
Please CC the list. On Mon, 13 Jun 2005 05:11 pm, Yves Vindevogel wrote:

create or replace view vw_document_pagesperjob as
select documentname, eventdate, eventtime, loginuser, fnFormatInt(pages) as pages
from tblPrintjobs
order by descpages, documentname ;

rvponp=# explain select documentname, eventdate, eventtime, loginuser, pages
from tblPrintjobs order by descpages, documentname ;
                              QUERY PLAN
 Sort (cost=81326.07..82796.59 rows=588209 width=74)
   Sort Key: descpages, documentname
   ->  Seq Scan on tblprintjobs (cost=0.00..24958.09 rows=588209 width=74)
(3 rows)

PostgreSQL must scan the entire heap anyway, so ordering in memory will be faster, and you don't have to load the pages from disk in a random order.

rvponp=# explain select documentname, eventdate, eventtime, loginuser, pages
from tblPrintjobs order by descpages, documentname limit 10 ;
                              QUERY PLAN
 Limit (cost=0.00..33.14 rows=10 width=74)
   ->  Index Scan using ixprintjobspagesperjob on tblprintjobs (cost=0.00..1949116.68 rows=588209 width=74)
(2 rows)

That's because an index scan is only useful if you are scanning a small percentage of the table, which you are doing when you have the limit clause. Strange thing is, when I immediately add the limit clause, it runs like I want it to run. I am not sure of the usefulness of the first query anyway; it returns a lot of data. How do you expect it not to scan the whole table when you want all the data from the table? Problem is that I run this from Cocoon. Cocoon adds the limit clause itself. Maybe I need to rewrite everything in functions instead of views. Functions, views: it will make no difference. The issue is the amount of data returned relative to the amount of data in the table. Regards Russell Smith
Re: [PERFORM] Performance nightmare with dspam (urgent)
On Thu, 2 Jun 2005 06:19 am, Casey Allen Shobe wrote: I found this response to my original post, and tried every single suggestion in it, which has not helped: http://archives.postgresql.org/pgsql-performance/2004-11/msg00416.php I'm sorry to come begging for help, but this is a MAJOR problem with no logical explanation, and is almost certainly the fault of PostgreSQL, because the database and contents have been identical across all the hosts, and some work beautifully with no tuning whatsoever; so I don't feel I'm wrong in placing blame...

I would personally strongly suggest turning on logging on the PG server for about an hour, sifting through the runtimes for the queries and finding which ones are taking all the time. I'd then run explain analyze and see what is happening. I have heard you could get much better performance by rewriting some of the dspam queries to use PG features, but I've never used dspam, so I can't verify that. But a quick look through the dspam pg driver source...

/* Declare Cursor */
#ifdef VIRTUAL_USERS
  strcpy (query, "DECLARE dscursor CURSOR FOR SELECT DISTINCT username FROM dspam_virtual_uids");
#else
  strcpy (query, "DECLARE dscursor CURSOR FOR SELECT DISTINCT uid FROM dspam_stats");
#endif

If that's run often, it probably won't give the best performance, but that's a guess. Again I'd suggest turning up the logging.

All machines run Gentoo Linux. All have the same package versions. Disk I/O doesn't seem to be related - the 733MHz server had a 33MB/s IDE drive, the 2.4GHz server had a RAID 5 with 3 ultra320 drives: neither of those required any tuning. The new 3.0GHz has a mirror raid with 2 ultra320 drives, and the 3000+ that tuning fixed had an ultra160 disk not in a RAID. I really like PostgreSQL, and really don't want to use MySQL for dspam, but if I can't get this worked out ASAP I'm going to have to change for the sake of our customers. Any help is GREATLY appreciated!

Again I'd suggest turning up the logging.

I'm online on instant messengers (contact IDs shown below), monitoring my email, and will be on #postgresql on Freenode. Cheers,
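The logging suggestion amounts to a one-line configuration change (the parameter is available from PostgreSQL 7.4 on; the 100ms threshold is an illustrative assumption):

```
# postgresql.conf -- log every statement that runs longer than 100ms,
# together with its runtime, so the dominant dspam queries stand out
log_min_duration_statement = 100
```

Reload the server (`pg_ctl reload`), let it run for an hour under normal load, then EXPLAIN ANALYZE the statements that appear most often in the log.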
Re: [PERFORM] Query tuning help
On Mon, 9 May 2005 09:20 am, Dan Harris wrote: Sorry to bother everyone with yet another "my query isn't using an index" problem, but I am over my head on this one. I am open to ways of restructuring this query to perform better. I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has just been run on the table. This is the result of:

explain analyze select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
from ea, em, eg
where em.incidentid = ea.incidentid
and em.incidentid = eg.incidentid
and em.entrydate >= '2005-1-1 00:00'
and em.entrydate <= '2005-5-9 00:00'
and ea.incidentid in ( select incidentid from ea where recordtext like '%RED%' )
and ea.incidentid in ( select incidentid from ea where recordtext like '%CORVETTE%' )
and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' )
order by em.entrydate

You cannot use an index for %CORVETTE%, or %RED%. There is no way for the index to know if a row had that in the middle without scanning the whole index. So it's much cheaper to do a sequence scan. One possible way to make the query faster is to limit based on date, as you will only get about 700 rows. And then don't use subselects, as they are doing full sequence scans. I think this query does what you do above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em
JOIN ea ON (em.incidentid = ea.incidentid
        AND em.entrydate >= '2005-1-1 00:00'
        AND em.entrydate <= '2005-5-9 00:00'
        AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid
WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' );

- ANALYZE RESULTS -

 Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
   ->  Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
         Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
         ->  Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
               ->  Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
                     ->  Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
                           ->  Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
                                 ->  HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
                                       ->  Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
                                             Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
                                 ->  Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
                                       Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
                                       Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
                           ->  Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
                                 Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
                                 Filter: ((recordtext)::text ~~ '%RED%'::text)
                     ->  Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
                           Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
                           Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
               ->  Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
                     Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
 Total runtime: 446871.880 ms
(22 rows)

- EXPLANATION -

The reason for the redundant LIKE clause is that first, I only want those incidentids that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered.
Re: [PERFORM] Query tuning help
On Mon, 9 May 2005 11:49 am, Dan Harris wrote: On May 8, 2005, at 6:51 PM, Russell Smith wrote: [snip]

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em
JOIN ea ON (em.incidentid = ea.incidentid
        AND em.entrydate >= '2005-1-1 00:00'
        AND em.entrydate <= '2005-5-9 00:00'
        AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid
WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' );

I have run this, and while it is very fast, I'm concerned it's not doing what I need. How fast is very fast? Here's the situation: Due to the format of the systems with which I integrate (I have no control over these formats), we will get these 'recordtext' values one line at a time, accumulating over time. The only way I can find to make this work is to insert a new record for each line. The problem is that when someone wants to search multiple keywords, they expect these words to be matched across multiple records with a given incident number. For a very simple example:

IncidentID  Date              Recordtext
----------  ----------------  ----------------------------
1           2005-05-01 14:21  blah blah blah RED blah blah
            2005-05-01 14:23  not what we are looking for
1           2005-05-02 02:05  blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE '%RED%' AND RECORDTEXT LIKE '%CORVETTE%', will not match, because the condition will only be applied to a single row of recordtext at a time, not a whole group with the same incident number. If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I may also receive additional updates to the previous data. In that case, I need to replace the original record with the latest version of it. If I have already concatenated these rows into a single field, the logic to in-line replace only the old text that has changed is very very difficult at best. So, that's the reason I had to do two subqueries in my example. Please tell me if I misunderstood your logic and it really will match given my condition above, but it didn't seem like it would. Thanks again for the quick responses! This list has been a great resource for me. -Dan

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em
JOIN ea ON (em.incidentid = ea.incidentid
        AND em.entrydate >= '2005-1-1 00:00'
        AND em.entrydate <= '2005-5-9 00:00'
        AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid
WHERE em.incidentid IN
      (select em.incidentid FROM em
       JOIN ea ON (em.incidentid = ea.incidentid
               AND em.entrydate >= '2005-1-1 00:00'
               AND em.entrydate <= '2005-5-9 00:00'
               AND ea.recordtext like '%CORVETTE%'))
  AND em.incidentid IN
      (select em.incidentid FROM em
       JOIN ea ON (em.incidentid = ea.incidentid
               AND em.entrydate >= '2005-1-1 00:00'
               AND em.entrydate <= '2005-5-9 00:00'
               AND ea.recordtext like '%RED%'))

This may be more accurate. However, I would call it VERY NASTY. Josh's solutions may be better. However, much of the data should be in memory once the subplans are done, so it may be quite fast.
Re: [PERFORM] column name is LIMIT
On Mon, 14 Mar 2005 06:14 pm, Gourish Singbal wrote: Guys, I am having a problem firing queries on one of the tables, which has limit as the column name. If I run an insert/select/update command on that table I get the below error.

ERROR: syntax error at or near "limit" at character 71

select limit from limit_table WHERE limit > 50 LIMIT 2;

You need to quote the field name, and make sure the case is correct. Any help would be really great to solve the problem. postgresql 7.4.5 and linux OS. You should probably upgrade to 7.4.7. Regards Russell Smith.
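The quoting fix looks like this (a sketch; it assumes the column name was created in lower case, since the quoted form is case-sensitive):

```sql
-- "limit" is a reserved word, so it must be double-quoted wherever it
-- is used as an identifier; the bare word at the end is still the
-- ordinary LIMIT clause
SELECT "limit" FROM limit_table WHERE "limit" > 50 LIMIT 2;
```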
Re: [PERFORM] Index use and slow queries
On Sun, 13 Mar 2005 04:40 pm, Tom Pfeifer wrote: Hello, My version of PostgreSQL is 7.4.3. I have a simple table with 2 indexes:

                 Table "public.tst"
 Column |            Type             |              Modifiers
--------+-----------------------------+-------------------------------------
 tst_id | bigint                      | default nextval('tst_id_seq'::text)
 mmd5   | character varying(32)       | not null
 active | character(1)                | not null
 lud    | timestamp without time zone | default now()
Indexes:
    "tst_idx" unique, btree (mmd5, active)
    "tst_tst_id_key" unique, btree (tst_id)

There are exactly 1,000,000 (one million) rows in the table (tst). There are no NULLs or empty columns in any row. I get really fast response times when using the following select statement (less than 1 second):

maach=# explain select * from tst where mmd5 = '71e1c18cbc708a0bf28fe106e03256c7' and active = 'A';
                          QUERY PLAN
 Index Scan using tst_idx on tst (cost=0.00..6.02 rows=1 width=57)
   Index Cond: (((mmd5)::text = '71e1c18cbc708a0bf28fe106e03256c7'::text) AND (active = 'A'::bpchar))
(2 rows)

I get really slow response times when using the following select statement (about 20 seconds):

maach=# explain select * from tst where tst_id = 639246;

Before 8.0, bigint would not use an index unless you cast it, or quote it. eg

explain select * from tst where tst_id = 639246::int8;
explain select * from tst where tst_id = '639246';

Hope this helps. Russell Smith
Re: [PERFORM] How to boost performance of ilike queries ?
On Tue, 25 Jan 2005 07:23 pm, Antony Paul wrote: Creating an index and using lower(column) does not change the explain plan estimates. It seems that it is not using the index for like or ilike queries, irrespective of whether they have a pattern matching character in them or not. (using PostgreSQL 7.3.3) On googling I found this thread http://archives.postgresql.org/pgsql-sql/2004-11/msg00285.php It says that the index is not used if the search string begins with a % symbol.

What exactly are the types of like queries you are doing? There is a solution for having the % at the start, but you can't win every way.

rgds Antony Paul

On Mon, 24 Jan 2005 20:58:54 +1100, Russell Smith [EMAIL PROTECTED] wrote: On Mon, 24 Jan 2005 08:18 pm, Antony Paul wrote: Hi, I have a query which is executed using ilike. The query values are received from the user and it is executed using PreparedStatement. Currently all queries are executed as-is using ilike, irrespective of whether they have a pattern matching character or not. Can using = instead of ilike boost performance? If creating an index can help, then how should the index be created, on lower case or upper case? It depends on the type of queries you are doing. Changing it to something like lower(column) like lower('text%'), and creating an index on lower(column), will give you much better performance. If you have % in the middle of the query, it will still be slow, but I assume that is not the general case. I am not sure what the effect of it being prepared will be; however, I've had much success with the method above without the queries being prepared. Others may be able to offer advice about whether prepare will affect it. Regards Russell Smith
Re: [PERFORM] How to boost performance of ilike queries ?
On Mon, 24 Jan 2005 08:18 pm, Antony Paul wrote: Hi, I have a query which is executed using ilike. The query values are received from the user and it is executed using PreparedStatement. Currently all queries are executed as-is using ilike, irrespective of whether they have a pattern matching character or not. Can using = instead of ilike boost performance? If creating an index can help, then how should the index be created, on lower case or upper case? It depends on the type of queries you are doing. Changing it to something like lower(column) like lower('text%'), and creating an index on lower(column), will give you much better performance. If you have % in the middle of the query, it will still be slow, but I assume that is not the general case. I am not sure what the effect of it being prepared will be; however, I've had much success with the method above without the queries being prepared. Others may be able to offer advice about whether prepare will affect it. Regards Russell Smith
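The advice above can be sketched concretely (the table and column names are hypothetical; note that in a non-C locale the prefix match may additionally need the text_pattern_ops operator class on the index):

```sql
-- Expression index matching the rewritten predicate
CREATE INDEX idx_people_lower_name ON people (lower(name));

-- Case-insensitive prefix search that can use the index,
-- instead of: SELECT * FROM people WHERE name ILIKE 'text%'
SELECT * FROM people WHERE lower(name) LIKE lower('text%');
```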
Re: [PERFORM] index scan of whole table, can't see why
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote: On 20 Jan 2005 at 7:26, Stephan Szabo wrote: [snip] Honestly I expected it to be slower (which it was), but I figured it's worth seeing what alternate plans it'll generate (specifically to see how it cost a nested loop on that join to compare to the fast plan). Unfortunately, it generated a merge join, so I think it might require both enable_hashjoin=false and enable_mergejoin=false to get it, which is likely to be even slower in practice but still may be useful to see. Setting both to false gives a dramatic performance boost. See http://rafb.net/paste/results/b70KAi42.html

 ->  Materialize (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92)
       ->  Nested Loop (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1)

The planner here has quite an inaccurate guess at the number of rows that will match in the join. An alternative to turning off join types is to raise the statistics on the Element columns, because that's where the join is happening. Hopefully the planner will get a better idea; however, it may not be able to. 2766 rows vs 135 is quite likely to choose different plans. As you can see, you have had to turn off two join types to get something you wanted/expected. This gives suitable speed, but why does the plan vary so much with such a minor change in the WHERE clause?

Plan 1 - broken
 ->  Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
 ->  Hash Join (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1)

The performance difference appears when the WHERE is changed: you have a totally different set of selection options. Plan 1 and Plan 2, shown in your paste earlier, report that the estimate is out by a factor of 2 for plan 1, but for plan 2 it's a factor of 20. The planner is likely to make the wrong choice when the stats are out by that factor. Beware: a small typing change does not mean the queries are anything alike. Regards Russell Smith.
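Raising the statistics target, as suggested above, looks like this (hypothetical table and column names standing in for the poster's Element columns; the target value is an illustrative assumption):

```sql
-- Sample more values for the join column (the default target is small
-- on these versions), then re-analyze so the planner sees the new stats
ALTER TABLE element ALTER COLUMN element_id SET STATISTICS 100;
ANALYZE element;
```

Re-running EXPLAIN ANALYZE afterwards shows whether the estimated row count has moved closer to the actual one.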
Re: [PERFORM] query plan question
On Wed, 17 Nov 2004 02:54 pm, you wrote: I have a query for which postgres is generating a different plan on different machines. The database schema is the same, the dataset is the same, the configuration is the same (e.g., pg_autovacuum running in both cases), both systems are Solaris 9. The main difference in the two systems is that one is sparc and the other is intel. The query runs in about 40 ms on the intel box, but takes about 18 seconds on the sparc box. Now, the intel boxes we have are certainly faster, but I'm curious why the query plan might be different.

For the intel:
QUERY PLAN
 Unique (cost=11.50..11.52 rows=2 width=131)
   ->  Sort (cost=11.50..11.50 rows=2 width=131)
         Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
         ->  Hash Join (cost=10.42..11.49 rows=2 width=131)
               Hash Cond: ("outer".dbid = "inner".schema)
               ->  Seq Scan on schema s (cost=0.00..1.02 rows=2 width=128)
               ->  Hash (cost=10.41..10.41 rows=4 width=11)
                     ->  Nested Loop (cost=0.00..10.41 rows=4 width=11)
                           ->  Nested Loop (cost=0.00..2.14 rows=4 width=4)
                                 ->  Seq Scan on flow fl (cost=0.00..0.00 rows=1 width=4)
                                       Filter: (servicetype = 646)
                                 ->  Index Scan using usage_flow_i on usage u (cost=0.00..2.06 rows=6 width=8)
                                       Index Cond: (u.flow = "outer".dbid)
                           ->  Index Scan using usageparameter_usage_i on usageparameter up (cost=0.00..2.06 rows=1 width=15)
                                 Index Cond: (up.usage = "outer".dbid)
                                 Filter: ((prefix)::text <> 'xsd'::text)

For the sparc:
QUERY PLAN
 Unique (cost=10.81..10.83 rows=1 width=167)
   ->  Sort (cost=10.81..10.82 rows=1 width=167)
         Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
         ->  Nested Loop (cost=9.75..10.80 rows=1 width=167)
               Join Filter: ("outer".flow = "inner".dbid)
               ->  Hash Join (cost=9.75..10.79 rows=1 width=171)
                     Hash Cond: ("outer".dbid = "inner".schema)
                     ->  Seq Scan on schema s (cost=0.00..1.02 rows=2 width=128)
                     ->  Hash (cost=9.75..9.75 rows=1 width=51)
                           ->  Nested Loop (cost=0.00..9.75 rows=1 width=51)
                                 Join Filter: ("inner".usage = "outer".dbid)
                                 ->  Index Scan using usage_flow_i on usage u (cost=0.00..4.78 rows=1 width=8)
                                 ->  Index Scan using usageparameter_schema_i on usageparameter up (cost=0.00..4.96 rows=1 width=51)
                                       Filter: ((prefix)::text <> 'xsd'::text)
               ->  Seq Scan on flow fl (cost=0.00..0.00 rows=1 width=4)
                     Filter: (servicetype = 646)

Unique (cost=11.50..11.52 rows=2 width=131)
Unique (cost=10.81..10.83 rows=1 width=167)

The cost estimations are basically the same, around 10 for the first row. Can you supply EXPLAIN ANALYZE output to see what it's actually doing? Russell Smith
Re: [PERFORM] Postgresql is using seqscan when is should use indexes.
On Mon, 8 Nov 2004 09:40 pm, Andreas Åkre Solberg wrote: We have two tables, dst_port_hour and dst_port_day, which should be very similar; they both have about 50,000,000 rows. In both tables we have an index for period_id. We run PostgreSQL 7.4.5 on a dedicated Debian server, with dual Intel Xeon 3GHz and 4GB memory. The problem is that on the dst_port_day table, PostgreSQL is using seqscan, and not the index when it should. Forcing the use of the index by setting enable_seqscan to false makes the query lightning fast. When using seqscan, the query takes several minutes. The planner calculates the cost for the index scan to be much more than the sequence scan. Why is our query planner misbehaving? Here are the explain analyze outputs with and without index-force:

SET enable_seqscan=false;
stager=# explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC LIMIT 5;

dst_port_day cur (cost=0.00..2019931.14 rows=546150 width=12) (actual time=0.038..303.801 rows=48072 loops=1)

The guess of the number of rows returned by the index scan is out by a factor of 10. 500k rows is greater than 1% of the rows, so I think the planner is likely to choose a sequence scan at this amount, unless you have tuned things like random page cost. What is the selectivity like on that column? Have you analyzed recently? If so, you should probably increase the statistics on that column. See ALTER TABLE ... SET STATISTICS in the manual.

QUERY PLAN
 Limit (cost=2022664.62..2022664.63 rows=5 width=12) (actual time=831.772..831.816 rows=5 loops=1)
   ->  Sort (cost=2022664.62..2022664.82 rows=80 width=12) (actual time=831.761..831.774 rows=5 loops=1)
         Sort Key: sum(octets)
         ->  HashAggregate (cost=2022661.89..2022662.09 rows=80 width=12) (actual time=587.036..663.991 rows=16396 loops=1)
               ->  Index Scan using dst_port_day_period_id_key on dst_port_day cur (cost=0.00..2019931.14 rows=546150 width=12) (actual time=0.038..303.801 rows=48072 loops=1)
                     Index Cond: (period_id = 2779)
 Total runtime: 836.362 ms
(7 rows)

SET enable_seqscan=true;
stager=# explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC LIMIT 5;

QUERY PLAN
 Limit (cost=1209426.88..1209426.89 rows=5 width=12) (actual time=299053.006..299053.053 rows=5 loops=1)
   ->  Sort (cost=1209426.88..1209427.08 rows=80 width=12) (actual time=299052.995..299053.008 rows=5 loops=1)
         Sort Key: sum(octets)
         ->  HashAggregate (cost=1209424.15..1209424.35 rows=80 width=12) (actual time=298803.273..298881.020 rows=16396 loops=1)
               ->  Seq Scan on dst_port_day cur (cost=0.00..1206693.40 rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072 loops=1)
                     Filter: (period_id = 2779)
 Total runtime: 299057.643 ms
(7 rows)

Regards Russell Smith
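Rather than forcing enable_seqscan=false globally, the "random page cost" knob mentioned above can be tried. This is a hedged sketch, not a tuned recommendation for this specific box:

```
# postgresql.conf -- default is 4.0; lower values make index scans
# look cheaper to the planner relative to sequential scans
random_page_cost = 2
```

For experimentation it can also be set per-session with SET random_page_cost = 2; before re-running the EXPLAIN ANALYZE.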
Re: [PERFORM] Select with qualified join condition / Batch inserts
On Fri, 15 Oct 2004 08:47 pm, Gavin Sherry wrote: On Fri, 15 Oct 2004, Bernd wrote: Hi, [snip]

Table-def:
            Table "public.scr_well_compound"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 mat_id     | numeric(10,0)          | not null
 barcode    | character varying(240) | not null
 well_index | numeric(5,0)           | not null
 id_level   | numeric(3,0)           | not null
 compound   | character varying(240) | not null
Indexes:
    "scr_wcm_pk" PRIMARY KEY, btree (id_level, mat_id, barcode, well_index)

numeric is not optimized by PostgreSQL like it is by Oracle. You will get much better performance by changing the numeric types to integer, bigint, or smallint. That should get the query time down to somewhere near what Oracle is giving you. Regards Russell Smith. [snip]
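The suggested type change might be sketched like this. Note the hedges: this uses the ALTER COLUMN ... TYPE syntax introduced in 8.0 (on earlier versions the table would have to be recreated instead), the target types are one plausible mapping of the numeric precisions, and the primary key index will be rebuilt in the process:

```sql
-- Hypothetical migration: replace numeric key columns with integer types
ALTER TABLE scr_well_compound ALTER COLUMN mat_id     TYPE bigint;
ALTER TABLE scr_well_compound ALTER COLUMN well_index TYPE integer;
ALTER TABLE scr_well_compound ALTER COLUMN id_level   TYPE smallint;
```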
Re: [PERFORM] Query planner problem
On Sat, 2 Oct 2004 08:06 am, Ryan VanMiddlesworth wrote: [snip] Here is the query and EXPLAIN that runs quickly:

SELECT case_id FROM case_data
WHERE case_filed_date > '2004-09-16'
  AND case_filed_date < '2004-09-20'

QUERY PLAN
 Index Scan using case_data_case_filed_date on case_data (cost=0.00..13790.52 rows=3614 width=18)
   Index Cond: ((case_filed_date > '2004-09-16'::date) AND (case_filed_date < '2004-09-20'::date))

And here is the query and EXPLAIN from the version that I believe the planner should reduce to be logically equivalent:

SELECT case_id FROM case_data
WHERE (('2004-09-16' IS NULL) OR (case_filed_date > '2004-09-16'))
  AND (('2004-09-20' IS NULL) OR (case_filed_date < '2004-09-20'))

QUERY PLAN
 Seq Scan on case_data (cost=0.00..107422.02 rows=27509 width=18)
   Filter: ((('2004-09-16' IS NULL) OR (case_filed_date > '2004-09-16'::date)) AND (('2004-09-20' IS NULL) OR (case_filed_date < '2004-09-20'::date)))

I was hoping that the null comparisons would get folded out by the planner relatively cheaply. But as you can see, the first query uses indexes and the second one uses sequence scans, thereby taking much longer. I guess my question is - is there a better way to accomplish what I'm doing in SQL, or am I going to have to dynamically generate the statement based on supplied parameters?

The index does not store NULL values, so you have to do a table scan to find NULL values. That means the second query cannot use an index, even if it wanted to. Regards Russell Smith
Re: [PERFORM] Query performance issue with 8.0.0beta1
7.4.2 Aggregate (cost=46817.89..46817.89 rows=1 width=0) (actual time=401.216..401.217 rows=1 loops=1) - Index Scan using snsdata_codpar on SNS_DATA (cost=0.00..46817.22 rows=268 width=0) (actual time=165.948..400.258 rows=744 loops=1) Index Cond: ((Cod_Par)::text = '17476'::text) Filter: ((Data_Arrivo_Campione = '2004-01-01 00:00:00'::timestamp without time zone) AND (Data_Arrivo_Campione = '2004-01-31 23:59:59'::timestamp without time zone)) Total runtime: 401.302 ms Row counts are out by a factor of 3, on the low side. so the planner will guess index is better, which it is. ***while on 8.0.0*** Aggregate (cost=93932.91..93932.91 rows=1 width=0) (actual time=14916.371..14916.371 rows=1 loops=1) - Seq Scan on SNS_DATA (cost=0.00..93930.14 rows=1108 width=0) (actual time=6297.152..14915.330 rows=744 loops=1) Filter: ((Data_Arrivo_Campione = '2004-01-01 00:00:00'::timestamp without time zone) AND (Data_Arrivo_Campione = '2004-01-31 23:59:59'::timestamp without time zone) AND ((Cod_Par)::text = '17476'::text)) Total runtime: 14916.935 ms Planner guesses that 1108 row should be returned, which is out by less, but on the high side. Big question is given there are 2M rows, why does returning 1108 rows, less than 1% result in a sequence scan. Usually the selectivity on the index is bad, try increasing the stats target on the column. I know 8.0 has new stats anaylsis code, which could be effecting how it choses the plan. But it would still require a good amount of stats to get it to guess correctly. Increase stats and see if the times improve. 
And if I disable the seqscan with SET enable_seqscan = false; I get the following:

Aggregate (cost=158603.19..158603.19 rows=1 width=0) (actual time=4605.862..4605.863 rows=1 loops=1)
  -> Index Scan using snsdata_codpar on SNS_DATA (cost=0.00..158600.41 rows=1108 width=0) (actual time=2534.422..4604.865 rows=744 loops=1)
     Index Cond: ((Cod_Par)::text = '17476'::text)
     Filter: ((Data_Arrivo_Campione >= '2004-01-01 00:00:00'::timestamp without time zone) AND (Data_Arrivo_Campione <= '2004-01-31 23:59:59'::timestamp without time zone))
Total runtime: 4605.965 ms

The total runtime is bigger (x10!) than the old one.

Did you run this multiple times, or is this the first time? If it had to get the data off disk it will be slower. Are you sure the data is coming from disk in both this and the 7.4 case, or from memory in both? If 7.4 is reading from the buffer cache or the kernel cache and 8.0 is reading from disk, 8.0 is likely to be A LOT slower.

The memory runtime parameters are: shared_buffers = 2048, work_mem = sort_mem = 2048

[snip]

The table has 2M records. Can it be a datatype conversion issue?

That should not be an issue in 8.0, at least for simple type conversions like int8 to int4. I'm not 100% sure which ones were added and which were not, but the query appears to cast everything correctly anyway.

Can it depend on the type of restore (with COPY commands)?

It shouldn't, and VACUUM FULL ANALYZE will make the table as small as possible. The row order may be different on disk, but the planner won't know that, and it's the bad plan that is causing the problem.

I have no idea. Thanks in advance! Reds

Regards

Russell Smith.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Hardware upgrade for a high-traffic database
[snip]

One question I do have though - you specifically mentioned NOW() as something to watch out for, in that it's mutable. We typically use COUNT() as a subselect to retrieve the number of rows associated with the current query. Additionally, we use NOW a lot, primarily to detect the status of a date, i.e.:

SELECT id FROM subscriptions WHERE userid = 1 AND timeend > NOW();

Is there a better way to do this? I was under the impression that NOW() was pretty harmless, just returning the current timestamp.

NOW() can trigger unnecessary sequential scans. Because it is unknown at plan time what value NOW() will have when a prepared query or function is actually run, the planner plans the query with now() as a variable. This can push the planner to a seq scan over an index scan. I have seen this time and time again. You can create your own immutable now(), but don't use it in functions or prepared queries or you will get wrong results.

Based on feedback, I'm looking at a minor upgrade of our RAID controller to a 3ware 9000 series (SATA with cache, battery backup optional), and re-configuring it for RAID 10. It's a damn cheap upgrade at around $350 and an hour of downtime, so I figure that it's worth it for us to give it a shot.

Thanks, Jason

Russell Smith
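The hand-rolled "immutable now" mentioned above is usually built along these lines (a sketch; the function name is made up). Note the caveat again: labelling now() IMMUTABLE lies to the planner, so inside functions and prepared statements the "constant" gets frozen at plan time and you will get wrong results there.

```sql
-- Deliberately (and incorrectly) marked IMMUTABLE so the planner treats
-- the result as a constant and can plan an index range comparison.
CREATE FUNCTION now_immutable() RETURNS timestamp AS
  'SELECT now()::timestamp'
LANGUAGE sql IMMUTABLE;

-- Only safe in plain interactive queries, never in prepared statements:
SELECT id FROM subscriptions WHERE userid = 1 AND timeend > now_immutable();
```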
Re: [PERFORM] my boss want to migrate to ORACLE
On Thu, 29 Jul 2004 03:08 am, Stephane Tessier wrote:

Hi everyone, can somebody help me? My boss wants to migrate to ORACLE. We have a BIG performance problem; it's slow. We use postgres 7.3 for a php security application with approximately 4 million insertions per day, 4 million deletes and updates, and an archive db with 40 million archived records...

This is a heavy update load. As I say below, what is the vacuum setup like?

we have 10 databases for our clients and a centralized database for the general stuff.

database specs:
dual XEON 2.4 on a DELL PowerEdge 2650
2 GB of RAM
5 SCSI drives, 15k rpm, RAID 5

tasks:
4 million transactions per day
160 open connections
24 hours a day, 7 days a week
pg_autovacuum running 24/7
reindex at midnight

Where is your pg_autovacuum config? How often is it set to vacuum, and analyze for that matter?

postgresql.conf:

tcpip_socket = true
#ssl = false
max_connections = 256
#superuser_reserved_connections = 2
#port = 5432
#hostname_lookup = false
#show_source_port = false
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777  # octal
#virtual_host = ''
#krb_server_keyfile = ''
#
# Shared Memory Size
#
#shared_buffers = 256    # min max_connections*2 or 16, 8KB each
#shared_buffers = 196000 # min max_connections*2 or 16, 8KB each
shared_buffers = 128000  # min max_connections*2 or 16, 8KB each
#max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 100       # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8          # min 4, typically 8KB each

Given the heavy update load, I would assume you need more WAL buffers, but then I don't know a lot.
#
# Non-shared Memory Sizes
#
#sort_mem = 32168  # min 64, size in KB
#vacuum_mem = 8192 # min 1024, size in KB
vacuum_mem = 65536 # min 1024, size in KB
#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3  # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds

3 checkpoint_segments is too low for the number of inserts/deletes/updates you are doing. You need a much larger checkpoint_segments setting, something like 10+, but the tuning docs will give you a better idea.

#
#commit_delay = 0    # range 0-10, in microseconds
#commit_siblings = 5 # range 1-1000
#
#fsync = true
#wal_sync_method = fsync # the default varies across platforms:
#                        # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0           # range 0-16
#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true
#effective_cache_size = 1000  # typically 8KB each
effective_cache_size = 196608 # typically 8KB each
#random_page_cost = 4         # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01        # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025   # (same)
#default_statistics_target = 10 # range 1-1000
#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0        # default based on tables in statement, range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1     # auto-compute seed
#
# Message display
#
server_min_messages = notice # Values, in order of decreasing detail:
                             # debug5, debug4, debug3, debug2, debug1,
                             # info, notice, warning, error, log, fatal, panic
client_min_messages = notice # Values, in order of decreasing detail:
                             # debug5, debug4, debug3, debug2, debug1,
                             # log, info, notice, warning, error
#silent_mode = false
#log_connections = true
#log_pid = true
#log_statement = true
#log_duration = true
#log_timestamp = true
log_min_error_statement = error #
Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, panic (off)
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#explain_pretty_print = true
# requires USE_ASSERT_CHECKING
#debug_assertions = true
#
# Syslog
#
syslog = 0 # range 0-2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
#
# Statistics
#
show_parser_stats = false
show_planner_stats = false
show_executor_stats = false
show_statement_stats = false
# requires
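Pulling the WAL-related suggestions together, the relevant fragment of postgresql.conf might look like this (a sketch only; the numbers are illustrative starting points for a heavy-update load, not tuned values):

```
# More WAL headroom for a heavy insert/update/delete load:
checkpoint_segments = 10   # 16MB each; the default of 3 forces very frequent checkpoints
checkpoint_timeout = 300   # seconds between forced checkpoints
wal_buffers = 16           # min 4, typically 8KB each; the default of 8 is low for heavy writes
```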
[PERFORM] Use of Functional Indexs and Planner estimates
Dear All,

I have a table with approximately 570k rows.

Table "filter.rules"
  Column  |          Type          |               Modifiers
----------+------------------------+----------------------------------------
 rulename | character varying(16)  | not null default ''::character varying
 uri      | character varying(200) | not null default ''::character varying
 redirect | character varying(200) | not null default ''::character varying
 moddate  | date                   | not null default ('now'::text)::date
 active   | boolean                | not null default true
 comment  | character varying(255) | not null default ''::character varying
Indexes:
    "rules_pkey" primary key, btree (rulename, uri)
    "moddate_idx" btree (moddate)
    "rules_idx" btree (lower((uri)::text))

The statistics target on the uri column has been set to 1000. VACUUM FULL and ANALYZE were run before the tests; no alteration to the tables since then.

# analyze verbose filter.rules;
INFO: analyzing "filter.rules"
INFO: "rules": 5228 pages, 30 rows sampled, 570533 estimated total rows
ANALYZE

# explain analyze SELECT rulename, redirect FROM filter.rules WHERE lower(uri) IN (lower('land.com'), lower('com'), lower('land.com/'), lower('com/')) GROUP BY rulename, redirect;

QUERY PLAN
HashAggregate (cost=22352.79..22352.79 rows=1 width=12) (actual time=2047.331..2047.332 rows=1 loops=1)
  -> Seq Scan on rules (cost=0.00..22296.32 rows=11294 width=12) (actual time=540.149..2047.308 rows=1 loops=1)
     Filter: ((lower((uri)::text) = 'land.com'::text) OR (lower((uri)::text) = 'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 'com/'::text))
Total runtime: 2047.420 ms
(4 rows)

# SET enable_seqscan = off;
# explain analyze SELECT rulename, redirect FROM filter.rules WHERE lower(uri) IN (lower('land.com'), lower('com'), lower('land.com/'), lower('com/')) GROUP BY rulename, redirect;

QUERY PLAN
HashAggregate (cost=38970.68..38970.68 rows=1 width=12) (actual time=0.328..0.328 rows=1 loops=1)
  -> Index Scan using rules_idx, rules_idx, rules_idx, rules_idx on rules (cost=0.00..38914.21 rows=11294 width=12) (actual time=0.210..0.312 rows=1 loops=1)
     Index Cond: ((lower((uri)::text) =
'land.com'::text) OR (lower((uri)::text) = 'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 'com/'::text))
Total runtime: 0.700 ms
(4 rows)

Could anybody offer an explanation of why the planner does such a terrible job of estimating the number of rows for this query, with the stats target set so high? Tests were also done with the stats target set to 100, and to 1. The results are exactly the same, which I would have assumed.

I am also interested in how functional indexes have statistics collected for them, if they do, so as to minimize or avoid this problem in the future.

Thanks for your consideration of this matter.

Regards

Russell Smith.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
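On versions that do gather statistics for expression (functional) indexes, one way to check whether the planner has anything to work with is to look the index up in pg_stats, where collected expression statistics appear under the index's own name (a sketch; the column list is abbreviated, and whether any rows show up depends on the server version):

```sql
-- Expression-index statistics, when collected by ANALYZE, are listed with
-- the index name in the tablename column of the pg_stats view.
SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'rules_idx';
```

If this returns no rows, the planner is falling back on default selectivity guesses for lower(uri), which would explain why changing the column's stats target makes no difference.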