[PERFORM] Stored Procedure Performance
Hi,

I'm trying to evaluate PostgreSQL as a database that will have to store a high volume of data and access that data frequently. One of the features on our wish list is to be able to use stored procedures to access the data, and I was wondering: is it usual for stored procedures to perform slower on PostgreSQL than raw SQL?

A simple example of this can be shown with the following commands.

First I created a test table:

CREATE TABLE test (
    id int8,
    name varchar(128),
    description varchar(500),
    CONSTRAINT pk_test PRIMARY KEY (id)
);

Then the function I want to test:

CREATE OR REPLACE FUNCTION readTest() RETURNS SETOF test AS $$
DECLARE
    row test%ROWTYPE;
BEGIN
    FOR row IN SELECT * FROM test LOOP
        RETURN NEXT row;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

Firstly, I ran EXPLAIN ANALYZE on the raw SQL to see how long it takes to access the database. The results are as follows:

EXPLAIN ANALYZE SELECT * FROM test;
 Seq Scan on test  (cost=0.00..10.90 rows=90 width=798) (actual time=0.003..0.003 rows=0 loops=1)
 Total runtime: 0.074 ms
(2 rows)

Secondly, I ran EXPLAIN ANALYZE on the function created above, with the following results:

EXPLAIN ANALYZE SELECT * FROM readTest();
 Function Scan on readtest  (cost=0.00..12.50 rows=1000 width=798) (actual time=0.870..0.870 rows=0 loops=1)
 Total runtime: 0.910 ms
(2 rows)

I know that the function is planned the first time it is executed, so I ran the same command again to remove that overhead from the timings:

EXPLAIN ANALYZE SELECT * FROM readTest();
 Function Scan on readtest  (cost=0.00..12.50 rows=1000 width=798) (actual time=0.166..0.166 rows=0 loops=1)
 Total runtime: 0.217 ms
(2 rows)

Even with the planning removed, the function still performs significantly slower than the raw SQL. Is that normal, or am I doing something wrong with the creation or calling of the function?

Thanks for your help,
Simon
Re: [PERFORM] Takes too long to fetch the data from database
I have flushed the database, so currently the lan table holds 665280 records, but it can grow to more than 1 GB of data, and in that case the query takes more than 1 hour.

Below is EXPLAIN ANALYZE output taken from the table having 665280 records:

pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE ( ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;
NOTICE:  QUERY PLAN:

Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
Total runtime: 619140.76 msec

EXPLAIN

bsdb=# explain analyze SELECT DISTINCT sdate, stime, rbts from lan WHERE ( ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;
NOTICE:  QUERY PLAN:

Unique  (cost=17.13..17.14 rows=1 width=16) (actual time=610546.66..610564.31 rows=288 loops=1)
  ->  Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=610546.65..610546.75 rows=288 loops=1)
        ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7524.47..610533.50 rows=288 loops=1)
Total runtime: 610565.51 msec

EXPLAIN

pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE ( ( bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;
NOTICE:  QUERY PLAN:

Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=1260756.66..1260756.76 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7725.97..1260752.47 rows=288 loops=1)
Total runtime: 1260757.09 msec

pdb=# \d lan
            Table "lan"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 bname     | character varying(64) | not null
 sdate     | date                  | not null
 stime     | integer               | not null
 cno       | smallint              | not null
 pno       | smallint              | not null
 rbts      | bigint                |
 tbts      | bigint                |
 u_inpkt   | bigint                |
 u_outpkt  | bigint                |
 m_inpkt   | bigint                |
 m_outpkt  | bigint                |
 b_inpkt   | bigint                |
 b_outpkt  | bigint                |
Primary key: lan_pkey
Check constraints: lan_stime ((stime >= 0) AND (stime < 86400))

On 4/10/06, Joshua D. Drake [EMAIL PROTECTED] wrote:
> Rajesh Kumar Mallah wrote:
> > what is the query ?
>
> use LIMIT or a restricting where clause. You could also use a cursor.
>
> Joshua D. Drake

> > regds
> > mallah.
> >
> > On 4/10/06, soni de [EMAIL PROTECTED] wrote:
> > > Hello,
> > >
> > > I have difficulty in fetching the records from the database. The
> > > database table contains more than 1 GB of data. Fetching the records
> > > takes more than 1 hour, and that's why it is slowing down performance.
> > > Please provide some help regarding improving the performance, and how
> > > I should run the query so that records will be fetched in less time.
Re: [PERFORM] Stored Procedure Performance
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
> I'm trying to evaluate PostgreSQL as a database that will have to store a
> high volume of data and access that data frequently. One of the features
> on our wish list is to be able to use stored procedures to access the data
> and I was wondering if it is usual for stored procedures to perform slower
> on PostgreSQL than raw SQL?

Sorry, but your benchmark is completely flawed.

1st. The tables are empty. Will you ever run the real code on empty tables?
2nd. Do you really need a stored procedure for such a simple query?

Testing something that's far from real usage will not tell you anything useful. RETURN NEXT will of course show up as slower than a standard SELECT. The thing is: will the relative slowness of RETURN NEXT still matter once you put more logic in the procedure?

depesz
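As an illustration of the first point, the test table from the original post could be loaded with some data before timing anything. A hedged sketch — the row count and filler are arbitrary:

```sql
-- Populate the (otherwise empty) test table with 100,000 dummy rows
-- before benchmarking; generate_series() exists from PostgreSQL 8.0 on.
INSERT INTO test (id, name, description)
SELECT i,
       'name_' || i,          -- synthetic name per row
       repeat('x', 100)       -- filler text for the description column
FROM generate_series(1, 100000) AS g(i);

ANALYZE test;                 -- refresh planner statistics after the load
```

With realistic row counts, the gap between the raw SELECT and the RETURN NEXT loop becomes meaningful to measure.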
Re: [PERFORM] Stored Procedure Performance
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
> Hi, I'm trying to evaluate PostgreSQL as a database that will have to store
> a high volume of data and access that data frequently. One of the features
> on our wish list is to be able to use stored procedures to access the data
> and I was wondering if it is usual for stored procedures to perform slower
> on PostgreSQL than raw SQL?

No. RETURN NEXT keeps accumulating the data before returning; I am not sure whether any optimisations have been done on that front. In general, functions are *NOT* slower than raw SQL.

Regds
mallah.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Stored Procedure Performance
Rajesh Kumar Mallah wrote:
> On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
> > I'm trying to evaluate PostgreSQL as a database that will have to store
> > a high volume of data and access that data frequently. One of the
> > features on our wish list is to be able to use stored procedures to
> > access the data and I was wondering if it is usual for stored procedures
> > to perform slower on PostgreSQL than raw SQL?
>
> No. RETURN NEXT keeps accumulating the data before returning. I am not
> sure if any optimisations have been done to that effect. In general
> functions are *NOT* slower than RAW SQL.

Actually, in cases where there is a simple way to state the query in raw SQL, I'd expect a procedural solution to be slower; after all, you're adding another layer of processing. Of course, you normally wouldn't write a procedural solution to a simple query.

Added to this is the difference that plpgsql is planned once, whereas raw SQL will be planned on each query. This means you save planning costs with plpgsql, but have the chance of better plans with the raw SQL.

-- 
Richard Huxton
Archonet Ltd
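For a query this simple, a SQL-language function (rather than PL/pgSQL) avoids the RETURN NEXT loop entirely. A hedged sketch of an equivalent to the readTest() function from the original post:

```sql
-- SQL-language equivalent of readTest(): no row-by-row RETURN NEXT loop.
-- Simple SQL functions like this one can often be inlined by the planner,
-- so the overhead versus the raw query should be minimal.
CREATE OR REPLACE FUNCTION readTest() RETURNS SETOF test AS $$
    SELECT * FROM test;
$$ LANGUAGE sql STABLE;

-- Called the same way as before:
-- SELECT * FROM readTest();
```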
Re: [PERFORM] Takes too long to fetch the data from database
soni de wrote:
> I have flushed the database, so currently the lan table holds 665280
> records, but it can grow to more than 1 GB, and in that case it takes
> more than 1 hour.
>
> Below is explain analyze output taken from the table having 665280 records:
>
> pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE ( ( bname =
> 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' )
> AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
>   ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
> Total runtime: 619140.76 msec

OK - there is clearly something wrong here when it takes 10 minutes to fetch 288 rows from an index.

1. VACUUM FULL VERBOSE lan;
2. Test again, and if that doesn't work...
3. REINDEX TABLE lan;
4. Test again.

I'm guessing you have a *lot* of dead rows in there.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Restore performance?
> Well, your pg_dump command lost your BLOBs since the plain text format
> doesn't support them.

Well, no.. they are stored as BYTEA, not Large Objects. They are encoded in ASCII in the pg_dump output.

> But once you use the -Fc format on your dump and enable blob backups, you
> can speed up reloads by increasing your checkpoint segments to a big
> number like 256 and the checkpoint timeout to something like 10 minutes.
> All other normal tuning parameters should be what you plan to use for
> your normal operations, too.

Thanks.

Jesper
-- 
Jesper Krogh
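The restore-time tuning suggested above would look roughly like this in postgresql.conf — the values are just the ones mentioned in the thread, not general recommendations, and would normally be reverted once the reload finishes:

```
# Temporary settings for a large restore (PostgreSQL 8.x era):
checkpoint_segments = 256     # many WAL segments between checkpoints
checkpoint_timeout = 600      # seconds, i.e. about 10 minutes
```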
Re: [PERFORM] Takes too long to fetch the data from database
> pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE ( ( bname =
> 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' )
> AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;

This query would benefit from an index on (bname, cno, pno, sdate):

create index lan_idx on lan(bname, cno, pno, sdate);

> pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE ( (
> bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >=
> '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;

Ditto above. Generally, the more closely the fields in the where clause are matched by the index, the more it will speed up your query.

Merlin
Re: [PERFORM] Stored Procedure Performance
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
> I'm trying to evaluate PostgreSQL as a database that will have to store a
> high volume of data and access that data frequently. One of the features
> on our wish list is to be able to use stored procedures to access the data
> and I was wondering if it is usual for stored procedures to perform slower
> on PostgreSQL than raw SQL?

pl/pgsql procedures are a very thin layer over the query engine. Generally, they run at about the same speed as SQL, but you are not making an apples-to-apples comparison. One of the few but annoying limitations of pl/pgsql procedures is that you can't return a select directly from the query engine, but have to go through the return/return next paradigm, which will be slower than the raw query for obvious reasons.

You can, however, return a refcursor, and you may want to look at refcursors in situations where you want to return arbitrary sets outside the query engine or between pl/pgsql functions. An example of using refcursors in that way is on my blog at
http://people.planetpostgresql.org/merlin/index.php?/archives/2-Dealing-With-Recursive-Sets-With-PLPGSQL.html

Generally, in my opinion, if you want to really unlock the power of postgresql you have to master pl/pgsql. Go for it... it will work and work well.

merlin
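A minimal sketch of the refcursor approach described above, using the test table from the original post (the function and cursor names are made up for illustration):

```sql
-- Return a cursor over the test table instead of materializing rows
-- through RETURN NEXT; the caller fetches from the cursor afterwards.
CREATE OR REPLACE FUNCTION read_test_cursor() RETURNS refcursor AS $$
DECLARE
    c refcursor := 'test_cur';   -- illustrative cursor name
BEGIN
    OPEN c FOR SELECT * FROM test;
    RETURN c;
END;
$$ LANGUAGE plpgsql;

-- Usage: must run inside a transaction, since cursors only live that long.
-- BEGIN;
-- SELECT read_test_cursor();
-- FETCH ALL FROM test_cur;
-- COMMIT;
```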
Re: [PERFORM] Takes too long to fetch the data from database
Richard Huxton dev@archonet.com writes:
> soni de wrote:
> > NOTICE:  QUERY PLAN:
> >
> > Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
> >   ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
> > Total runtime: 619140.76 msec
>
> OK - there is clearly something wrong here when you take 10 minutes to
> fetch 288 rows from an index.
> I'm guessing you have a *lot* of dead rows in there.

Yeah. The other small problem here is that EXPLAIN output hasn't looked like that since PG 7.2 (unless Soni has just omitted the index-condition lines). I'd recommend updating to something modern.

regards, tom lane
Re: [PERFORM] Stored Procedure Performance
[EMAIL PROTECTED] (Simon Dale) wrote:
> Even with the planning removed, the function still performs significantly
> slower than the raw SQL. Is that normal or am I doing something wrong with
> the creation or calling of the function?

I'd expect this, yes. You're doing something via stored procedure logic that would be done more directly via straight SQL; of course it won't be faster. pl/pgsql involves planning once, then running each line of logic; in effect, you replaced one query (select * from some table) with 90 queries. Yup, there's extra cost there.

There's no magic by which stored procedures provide results faster as a natural matter of course; the performance benefits generally fall out of two improvements:

1. You eliminate client-to-server round trips. A stored proc that runs 8 queries saves you 8 round trips over submitting the 8 queries directly, saving you latency time.

2. You can eliminate the marshalling and transmission of unnecessary data. A stored proc that runs 8 queries, and only returns summarized results that all come from the last table queried, eliminates the need to marshall and transmit (possibly over a slow link) the data for the 7 preceding queries.

The case that you tried can benefit from neither of those effects; your stored procedure eliminates NO round trips and NO marshalling/transmission.

-- 
cbbrowne gmail.com
http://linuxdatabases.info/info/rdbms.html
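As an illustration of the second benefit, here is a hedged sketch over an invented schema (orders(customer_id, amount) and the function name are hypothetical): the aggregation happens server-side and only a one-row summary crosses the wire.

```sql
-- Hypothetical: compute a per-customer summary inside the server, so only
-- the two output values are marshalled and transmitted to the client,
-- instead of every matching order row.  OUT parameters need 8.1 or later.
CREATE OR REPLACE FUNCTION order_summary(cust int8,
                                         OUT n_orders bigint,
                                         OUT total numeric) AS $$
BEGIN
    SELECT count(*), sum(amount)
      INTO n_orders, total
      FROM orders
     WHERE customer_id = cust;
END;
$$ LANGUAGE plpgsql;

-- SELECT * FROM order_summary(42);
```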
Re: [PERFORM] Stored Procedure Performance
Merlin Moncure wrote:
> pl/pgsql procedures are a very thin layer over the query engine.
> Generally, they run about the same speed as SQL but you are not making an
> apples to apples comparison. One of the few but annoying limitations of
> pl/pgsql procedures is that you can't return a select directly from the
> query engine but have to go through the return/return next paradigm which
> will be slower than raw query for obvious reasons.

There's one problem that hasn't been mentioned: for the optimizer, a PL/pgSQL function (really, a function in any language except SQL) is a black box. If you have a complex join of two or three functions, and they don't return 1000 rows, it's very likely that the optimizer is going to get it wrong.

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Indexes with descending date columns
Hi, Bruce,

Bruce Momjian wrote:
> Ahh. There's a hack to do that by defining a new opclass that reverses <
> and >, and then doing ORDER BY project_id, id, date USING new_opclass.
> I think there's a TODO about this, but I'm not sure...

Yes, and updated:

* Allow the creation of indexes with mixed ascending/descending specifiers

  This is possible now by creating an operator class with reversed sort
  operators. One complexity is that NULLs would then appear at the start of
  the result set, and this might affect certain sort types, like merge join.

I think it would be better to allow index zig-zag scans for a multi-column index.[1] So it traverses in a given order on the higher-order column, and the sub-tree for each specific high-order value is traversed in reversed order. From my knowledge at least of BTrees, and given correct commutator definitions, this should not be so complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary ASC/DESC combinations of the given columns.

Just a thought,
Markus

[1] It may make sense to implement the mixed specifiers on indices as well, to allow CLUSTERing on mixed search order.
[2] But I admit that I currently don't have enough knowledge of PostgreSQL index scan internals to know whether it really is easy to implement.

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS
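The "operator class with reversed sort operators" hack mentioned in the TODO looks roughly like this — a sketch for int4 only, assuming the built-in btint4cmp comparator (other types need their own comparator and operators):

```sql
-- A reverse-sort btree opclass for int4: the support function compares its
-- arguments in swapped order, and the strategy slots hold the comparison
-- operators reversed (> where < normally goes, and so on).
CREATE OR REPLACE FUNCTION int4_revcmp(int4, int4) RETURNS int AS
    'SELECT btint4cmp($2, $1)' LANGUAGE sql IMMUTABLE STRICT;

CREATE OPERATOR CLASS int4_revsort FOR TYPE int4 USING btree AS
    OPERATOR 1 > ,
    OPERATOR 2 >= ,
    OPERATOR 3 = ,
    OPERATOR 4 <= ,
    OPERATOR 5 < ,
    FUNCTION 1 int4_revcmp(int4, int4);

-- Then e.g.:  CREATE INDEX t_mixed_idx ON t (a, b int4_revsort);
-- which makes ORDER BY a ASC, b DESC indexable.
```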
[PERFORM] FOREIGN KEYS vs PERFORMANCE
Hi,

I think this is an old question, but I want to know if it really is worthwhile not to create some foreign keys and to deal with the referential integrity at application level.

Specifically, the system we are developing is a client/server architecture where the server is the database and the fat client is an application developed in Delphi.

Thanks in advance!
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Tue, 2006-04-11 at 14:13, Rodrigo Sakai wrote:
> I think this is an old question, but I want to know if it really is
> worthwhile not to create some foreign keys and to deal with the
> referential integrity at application level.

If ref integrity is important, you'll have to do it either in the app or the database. Almost always, it's faster to let the database do it: there's less traffic across the wire required to maintain ref integrity, plus the guys who wrote the database have spent years making sure race conditions won't scram your data.

For simple, straightforward FK-PK relationships, you will likely NOT be able to beat the database in terms of either reliability or performance with your own code.
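A minimal example of the kind of FK-PK relationship being discussed (the table names are invented for illustration):

```sql
-- Declarative referential integrity: the database itself guarantees that
-- every order points at an existing customer, with no application code
-- and no extra round trips.
CREATE TABLE customers (
    id   int8 PRIMARY KEY,
    name varchar(128) NOT NULL
);

CREATE TABLE orders (
    id          int8 PRIMARY KEY,
    customer_id int8 NOT NULL REFERENCES customers(id) ON DELETE RESTRICT
);
```

The application then only has to handle the error raised when a constraint is violated, rather than re-implementing the check.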
Re: [PERFORM] Encouraging multi-table join order
Tom Lane wrote:
> [SNIP]
> So it's estimating 5775 cost units per probe into eventactivity, which is
> pretty high --- it must think that a lot of rows will be retrieved by the
> index (way more than the 20 or so it thinks will get past the filter
> condition). What does the pg_stats entry for eventactivity.incidentid
> contain?

select * from pg_stats where tablename = 'eventactivity' and attname = 'incidentid';

 schemaname | tablename     | attname    | null_frac | avg_width | n_distinct
------------+---------------+------------+-----------+-----------+------------
 public     | eventactivity | incidentid |         0 |        14 |       8157

 most_common_vals:  {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117}
 most_common_freqs: {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}
 histogram_bounds:  {P022140319,P030471058,P033090308,P041961082,P042910689,P050311006,P051350254,P052261148,P053270945,P060240316,P061000287}
 correlation: 0.241737

> It might be worth increasing the statistics target for that column to try
> to get a better estimate.

How high should I set this? I read the default is 10, but I'm not sure whether doubling it would make a difference or whether I should be using a much larger number. There are approx 45 million rows in the table, if that matters.

Thanks again,
Dan
Re: [PERFORM] Stored Procedure Performance
On 4/11/06, Alvaro Herrera [EMAIL PROTECTED] wrote:
> There's one problem that hasn't been mentioned. For the optimizer a
> PL/pgSQL function (really, a function in any language except SQL) is a
> black box. If you have a complex join of two or three functions, and they
> don't return 1000 rows, it's very likely that the optimizer is going to
> get it wrong.

This doesn't bother me that much. Those cases usually have a high overlap with views. You just have to plan on the function being fully materialized before it is involved further. What drives me crazy is that I have to do 'select * from plpgsql_srf()', but I am allowed to do the much friendlier and more versatile 'select sql_srf()', even if they do more or less the same thing.

On the flip side, what drives me crazy about sql functions is that all tables have to be in the search path for the validator. Since I frequently use the trick of having multiple schemas with one set of functions, this is annoying.

Merlin
Re: [PERFORM] Encouraging multi-table join order
Dan Harris [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > What does the pg_stats entry for eventactivity.incidentid contain?
>
> {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117} |
> {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}
>
> How high should I set this? I read the default is 10, but I'm not sure if
> doubling this would make a difference or if I should be doing a much
> larger number. There's approx 45 million rows in the table, if that
> matters.

What the stats entry is saying is that the most common entries occur about 75000 times apiece (0.0017 * 45e6), which is what's scaring the planner here ;-). I think those frequencies are artificially high though. The default statistics sample size is 3000 rows (300 * statistics target, actually), so those numbers correspond to 5 or 4 rows in the sample, which is probably just random chance.

Try increasing the stats target for this table to 100, then re-ANALYZE and see what you get. The most_common_freqs entries might drop as much as a factor of 10.

regards, tom lane
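Concretely, the suggestion above corresponds to the per-column form of the statistics-target setting (the same pattern appears later in this digest for another table):

```sql
-- Raise the sample size for the skewed column, then re-gather statistics.
-- With the default target of 10 only ~3000 rows are sampled; at 100 the
-- sample grows to ~30000 rows, so rare values distort the stats far less.
ALTER TABLE eventactivity ALTER COLUMN incidentid SET STATISTICS 100;
ANALYZE eventactivity;
```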
Re: [PERFORM] Sequencial scan instead of using index
Quoting Harry Hehl [EMAIL PROTECTED]:

> Mark, (snippage) However I am still getting seq scans on indexes for other
> queries. For example:
>
> select * from omfile where ( objectid in ( select distinct(ref_oid) from ts ) );
>
> objectid & ref_oid are non-unique indexes; omimagefile & omclipfile
> inherit from omfile.
>
> Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
>   Join Filter: ("outer".objectid = "inner".ref_oid)
>   ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
>         ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
>         ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
>         ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
>   ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
>         ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
>               ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
>                     Sort Key: ts.ref_oid
>                     ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)
> (11 rows)
>
> Time: 164.232 ms
>
> BTW, set enable_seqscan=off has no effect, i.e. it still uses seq scans.
> If I do a simple query, it is very quick, with no sequential scans. So
> how can I get index scans to work consistently with joins?
>
> explain select * from omfile where objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79';
>
> Result  (cost=2.00..7723.30 rows=102903 width=217)
>   ->  Append  (cost=2.00..7723.30 rows=102903 width=217)
>         ->  Bitmap Heap Scan on omfile  (cost=2.00..7697.60 rows=102608 width=217)
>               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>               ->  Bitmap Index Scan on omfile_objectid_idx  (cost=0.00..2.00 rows=102608 width=0)
>                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>         ->  Bitmap Heap Scan on omimagefile omfile  (cost=1.00..12.69 rows=135 width=217)
>               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>               ->  Bitmap Index Scan on omimagefile_objectid_idx  (cost=0.00..1.00 rows=135 width=0)
>                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>         ->  Bitmap Heap Scan on omclipfile omfile  (cost=1.00..13.00 rows=160 width=217)
>               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>               ->  Bitmap Index Scan on omclipfile_objectid_idx  (cost=0.00..1.00 rows=160 width=0)
>                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
> (14 rows)
>
> Time: 5.164 ms

Hmm - that first query needs to do a sort, so you might want to experiment with the sort_mem parameter. Could you show us output from explain analyze for both the above queries? At face value, selecting some 200000 rows (assuming the estimates are accurate) may mean that a seqscan is the best plan! But we'll know more after seeing the explain analyze...

Cheers
Mark
Re: [PERFORM] Sequencial scan instead of using index
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:

> Hmm - that first query needs to do a sort, so you might want to
> experiment with the sort_mem parameter

Oops - I mean work_mem...
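For reference, the knob being discussed can be set per session before re-testing the query (the value is an arbitrary example; in 8.x the unit is kilobytes, and sort_mem is the pre-8.0 name of the same setting):

```sql
-- Give sorts more memory for this session only, then re-run the query.
SET work_mem = 32768;   -- 32 MB, illustrative value
-- EXPLAIN ANALYZE SELECT ... ORDER BY ...;
RESET work_mem;         -- back to the server default
```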
Re: [PERFORM] Sequencial scan instead of using index
Mark,

> If you can upgrade to 8.1.(3), then the planner can consider paths that
> use *both* the indexes on srcobj and dstobj (which would probably be the
> business!).

Yes, 8.1.3 resolved this issue. Thanks.

However I am still getting seq scans on indexes for other queries. For example:

select * from omfile where ( objectid in ( select distinct(ref_oid) from ts ) );

objectid & ref_oid are non-unique indexes; omimagefile & omclipfile inherit from omfile.

Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
  Join Filter: ("outer".objectid = "inner".ref_oid)
  ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
        ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
        ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
        ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
  ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
        ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
              ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
                    Sort Key: ts.ref_oid
                    ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)
(11 rows)

Time: 164.232 ms

BTW, set enable_seqscan=off has no effect, i.e. it still uses seq scans. If I do a simple query, it is very quick, with no sequential scans. So how can I get index scans to work consistently with joins?

explain select * from omfile where objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79';

Result  (cost=2.00..7723.30 rows=102903 width=217)
  ->  Append  (cost=2.00..7723.30 rows=102903 width=217)
        ->  Bitmap Heap Scan on omfile  (cost=2.00..7697.60 rows=102608 width=217)
              Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
              ->  Bitmap Index Scan on omfile_objectid_idx  (cost=0.00..2.00 rows=102608 width=0)
                    Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
        ->  Bitmap Heap Scan on omimagefile omfile  (cost=1.00..12.69 rows=135 width=217)
              Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
              ->  Bitmap Index Scan on omimagefile_objectid_idx  (cost=0.00..1.00 rows=135 width=0)
                    Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
        ->  Bitmap Heap Scan on omclipfile omfile  (cost=1.00..13.00 rows=160 width=217)
              Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
              ->  Bitmap Index Scan on omclipfile_objectid_idx  (cost=0.00..1.00 rows=160 width=0)
                    Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
(14 rows)

Time: 5.164 ms

-----Original Message-----
From: Mark Kirkwood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 07, 2006 12:04 AM
To: Harry Hehl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sequencial scan instead of using index

Harry Hehl wrote:
> There seem to be many posts on this issue, but I have not yet found an
> answer to the seq scan issue. I am having an issue with joins. I am using
> 8.0.3 on FC4.
>
> Query: select * from ommemberrelation where srcobj='somevalue' and dstobj
> in (select objectid from omfilesysentry where name='dir15_file80');
>
> Columns srcobj, dstobj & name are all indexed.

The planner is over-estimating the number of rows here (33989 vs 100):

  ->  Seq Scan on ommemberrelation  (cost=0.00..2394.72 rows=33989 width=177) (actual time=0.078..70.887 rows=100 loops=1)

The usual way to attack this is to up the sample size for ANALYZE:

ALTER TABLE ommemberrelation ALTER COLUMN srcobj SET STATISTICS 100;
ALTER TABLE ommemberrelation ALTER COLUMN dstobj SET STATISTICS 100;
-- or even 1000.
ANALYZE ommemberrelation;

Then try EXPLAIN ANALYZE again.

If you can upgrade to 8.1.(3), then the planner can consider paths that use *both* the indexes on srcobj and dstobj (which would probably be the business!).

Cheers
Mark
Re: [PERFORM] Sequencial scan instead of using index
Harry Hehl [EMAIL PROTECTED] writes:
> Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
>   Join Filter: ("outer".objectid = "inner".ref_oid)
>   ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
>         ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
>         ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
>         ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
>   ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
>         ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
>               ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
>                     Sort Key: ts.ref_oid
>                     ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)
> (11 rows)
>
> Time: 164.232 ms
>
> So how can I get index scans to work consistently with joins?

It's not the join that's the problem, it's the inheritance. I recently improved the planner so that it can consider appended indexscans for an inheritance tree on the inside of a join, but no pre-8.2 release can do it.

regards, tom lane
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Apr 12, 2006, at 4:13, Rodrigo Sakai wrote:

I think this is an old question, but I want to know if it is really worthwhile not to create some foreign keys and deal with the referential integrity at application level?

If I had to choose between one or the other, I'd leave all referential integrity in the database and deal with the errors thrown when referential integrity is violated in the application. PostgreSQL is designed to handle these kinds of issues. Anything you code in your application is more likely to contain bugs or miss corner cases that would allow referential integrity to be violated. PostgreSQL has been pounded on for years by a great many users and developers, making the likelihood of bugs still remaining much smaller.

Of course, you can add some referential integrity checks in your application code, but those should be in addition to your database-level checks.

Michael Glaesemann
grzm myrealbox com
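As a concrete sketch of the database-level approach — the table and column names here are illustrative, not from the thread:

```sql
CREATE TABLE customer (
    id   int8 PRIMARY KEY,
    name varchar(128) NOT NULL
);

CREATE TABLE orders (
    id          int8 PRIMARY KEY,
    -- The database enforces that every order points at a real customer.
    customer_id int8 NOT NULL REFERENCES customer (id) ON DELETE RESTRICT
);

-- With no customer 42, this fails with a foreign_key_violation
-- (SQLSTATE 23503); the application only has to catch and report the
-- error -- the check itself stays in the database.
INSERT INTO orders (id, customer_id) VALUES (1, 42);
```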
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:

On Apr 12, 2006, at 4:13, Rodrigo Sakai wrote:

I think this is an old question, but I want to know if it is really worthwhile not to create some foreign keys and deal with the referential integrity at application level?

If I had to choose between one or the other, I'd leave all referential integrity in the database and deal with the errors thrown when referential integrity is violated in the application. PostgreSQL is designed to handle these kinds of issues. Anything you code in your application is more likely to contain bugs or miss corner cases that would allow referential integrity to be violated. PostgreSQL has been pounded on for years by a great many users and developers, making the likelihood of bugs still remaining much smaller.

It's also pretty unlikely that you can make RI in the application perform better than in the database.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com
work: 512-231-6117   vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Jim C. Nasby [EMAIL PROTECTED] writes:

On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:
... Anything you code in your application is more likely to contain bugs or miss corner cases that would allow referential integrity to be violated. PostgreSQL has been pounded on for years by a great many users and developers, making the likelihood of bugs still remaining much smaller.

It's also pretty unlikely that you can make RI in the application perform better than in the database.

I think the traditional assumption among the "you should do RI in the application" crowd is that the application has higher-level knowledge that lets it understand when it can skip doing an RI check entirely. Skipping an RI check is always faster than doing it --- so that's right, it's faster. As long as you don't make any mistakes. The question you have to ask yourself is whether you are really that smart ... not just today, but every single time. To quote Clint Eastwood: "Do you feel lucky, punk? Well, do you?"

regards, tom lane