Re: [PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9
Sorry Tom, misread your mail! My bad :-( I believe the following is the data you need ? Live Server relname relpages ctp_statcon 72 statcon_pk 135 Test Server relname relpages ctp_statcon 34 statcon_pk 28 Have executed the following query to obtain that data: SELECT relname, relpages FROM pg_class WHERE relname = 'statcon_pk' OR relname = 'sc2ctp_fk' OR relname = 'sc2mtp_fk' OR relname = 'sc2sc_fk' OR relname = 'ctp_statcon' The size difference for the index is surprisingly big I think, considering that there's only around 1000 rows more in the table on the live server than on the server. Count for Live Server: 12597 Count for Test Server: 11494 Any insight into this? Cheers Jona PS: The meta data for the table is: CREATE TABLE statcon_tbl ( id serial NOT NULL, data bytea, wm bool DEFAULT 'FALSE', created timestamp DEFAULT now(), modified timestamp DEFAULT now(), enabled bool DEFAULT 'TRUE', bitsperpixel int4 DEFAULT 0, mtpid int4, sctid int4, ctpid int4, CONSTRAINT statcon_pk PRIMARY KEY (id), CONSTRAINT sc2ctp_fk FOREIGN KEY (ctpid) REFERENCES contype_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE, CONSTRAINT sc2mtp_fk FOREIGN KEY (mtpid) REFERENCES mimetype_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE, CONSTRAINT sc2sct_fk FOREIGN KEY (sctid) REFERENCES statcontrans_tbl (id) ON UPDATE CASCADE ON DELETE CASCADE ) WITHOUT OIDS; CREATE INDEX ctp_statcon ON statcon_tbl USING btree (sctid, ctpid); Tom Lane wrote: Jona [EMAIL PROTECTED] writes: anyway, here's the info for relpages: Live Server: 424 Test Server: 338 I was asking about the indexes associated with the table, not the table itself. regards, tom lane
[PERFORM] sequence scan on PK
Hi, I understand that when a table contains only a few rows it is better to do a sequence scan than an index scan. But is this also for a table with 99 records? A table contains id integer (primary key) namevarchar(70) parent integer comment text owner integer inheritAccess integer defaultAccess integer sequence bigint contentsinheritaccessmove integer contentsinheritaccessadd integer explain select * from tblFolders where id=90; QUERY PLAN --- Seq Scan on tblfolders (cost=0.00..3.24 rows=1 width=50) Filter: (id = 90) (I have analyze table bit still a sequence scan). With how manys rows it is ok to do an index scan or sequence scan? How is this calculated in pg? Regards Jer ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] sequence scan on PK
Jeroen van Iddekinge wrote: Hi, I understand that when a table contains only a few rows it is better to do a sequence scan than an index scan. But is this also for a table with 99 records? ... explain select * from tblFolders where id=90; QUERY PLAN --- Seq Scan on tblfolders (cost=0.00..3.24 rows=1 width=50) Filter: (id = 90) (I have analyze table bit still a sequence scan). With how manys rows it is ok to do an index scan or sequence scan? How is this calculated in pg? Regards Jer It depends on how many pages need to be read. To do an index scan you need to read the index pages, and then you read the page where the actual row resides. Usually the comment is if you are selecting 5% of the rows, seqscan is faster than an index scan. If I'm reading your query correctly, it is estimating needing to read about 3 pages to get the row you are asking for. If you used an index, it probably would have to read at least that many pages, and they would not be a sequential read, so it should be slower. If you want to prove it, try: \timing EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; SET enable_seqscan TO OFF; EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90; Run multiple times to make sure everything is cached, and take the fastest time. On your machine it might be true that the index scan is slightly faster than the seqscan in this exact circumstance. But I have the feeling the time is trivially different, and if you had say 70 rows it would favor seqscan. Probably somewhere at 150-200 rows it will switch on it's own. You could tweak with several settings to get it to do an index scan earlier, but these would probably break other queries. You don't need to tune for 100 rows, more like 100k or 100M. John =:- signature.asc Description: OpenPGP digital signature
Re: [PERFORM] sequence scan on PK
Hi, Thanks for respone. The index scan was a little bit faster for id=1 and faster for id=99. Which settings shoud I change for this? cpu_index_tuple_cost , cpu_operator_cost, cpu_tuple_cost? You should lower random_page_cost to make the planner choose an index scan vs sequential scan. Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] sequence scan on PK
You should lower random_page_cost to make the planner choose an index scan vs sequential scan. Yes , it was a bit to high (18) so a lowered it. It speeded up some pages for about 5%. Reg. Jer ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] sequence scan on PK
Jeroen van Iddekinge [EMAIL PROTECTED] writes: You could tweak with several settings to get it to do an index scan earlier, but these would probably break other queries. You don't need to tune for 100 rows, morelike 100k or 100M. Which settings shoud I change for this? I'd agree with John's response: if you change any settings based on just this one test case, you're a fool. But usually random_page_cost is the best knob to twiddle if you wish to encourage indexscans. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] sequence scan on PK
Hi, Yes , it was a bit to high (18) so a lowered it. It speeded up some pages for about 5%. 18? The default is 4 if I can remember correctly. I wonder if your db has ever seen an index scan ;) Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] COPY vs INSERT
On Fri, May 06, 2005 at 09:30:46AM +0200, Dennis Bjorklund wrote: The sql standard include this, except that you can not have the outer (). So it should be INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9); Do DB2 demand these extra ()? My recollection is that it does, but it's been a few years... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] sequence scan on PK
Matteo Beccati wrote: Hi, Yes , it was a bit to high (18) so a lowered it. It speeded up some pages for about 5%. 18? The default is 4 if I can remember correctly. I wonder if your db has ever seen an index scan ;) I was expermenting how much some setting influence has on the perfomance of some web application. So I think i forgot to change the setting back and got some strange query plans. Thanks Jer ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Whence the Opterons?
Mischa Sandberg wrote: After reading the comparisons between Opteron and Xeon processors for Linux, I'd like to add an Opteron box to our stable of Dells and Sparcs, for comparison. IBM, Sun and HP have their fairly pricey Opteron systems. The IT people are not swell about unsupported purchases off ebay. Anyone care to suggest any other vendors/distributors? Check out the Tyan Transport systems. Tyan are an ex Sparc clone manufacturer, and released the second available Opteron board - widely considered the first serious Opteron board to hit the market. Sam. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Whence the Opterons?
Thanks to everyone for their pointers to suppliers of Opteron systems. The system I'm pricing is under a tighter budget than a production machine, because it will be for perftests. Our customers tend to run on Dells but occasionally run on (Sun) Opterons. ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] Query tuning help
Sorry to bother everyone with yet another my query isn't using an index problem but I am over my head on this one.. I am open to ways of restructuring this query to perform better. I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has been just run on the table. This is the result of: explain analyze select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat from ea, em, eg where em.incidentid = ea.incidentid and em.incidentid = eg.incidentid and em.entrydate = '2005-1-1 00:00' and em.entrydate = '2005-5-9 00:00' and ea.incidentid in ( select incidentid from ea where recordtext like '%RED%' ) and ea.incidentid in ( select incidentid from ea where recordtext like '%CORVETTE%' ) and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) order by em.entrydate - ANALYZE RESULTS - Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1) - Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1) Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat - Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1) - Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1) - Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1) - Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1) - HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1) - Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1) Filter: ((recordtext)::text ~~ '%CORVETTE%'::text) - Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) - Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011) Index Cond: ((outer.incidentid)::text = (ea.incidentid)::text) Filter: ((recordtext)::text ~~ '%RED%'::text) - Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701) Index Cond: ((outer.incidentid)::text = (em.incidentid)::text) Filter: ((entrydate = '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate = '2005-05-09 00:00:00'::timestamp without time zone)) - Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72) Index Cond: ((outer.incidentid)::text = (eg.incidentid)::text) Total runtime: 446871.880 ms (22 rows) - EXPLANATION - The reason for the redundant LIKE clause is that first, I only want those incidentids that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered by the user. I have investigated text searching options and have not found them to be congruous with my application. Why is it choosing a sequential scan one part of the query when searching for the words, yet using an index scan for another part of it? Is there a better way to structure the query to give it better hints? I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM. Thank you for any advice. -Dan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query tuning help
On Mon, 9 May 2005 09:20 am, Dan Harris wrote: Sorry to bother everyone with yet another my query isn't using an index problem but I am over my head on this one.. I am open to ways of restructuring this query to perform better. I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has been just run on the table. This is the result of: explain analyze select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat from ea, em, eg where em.incidentid = ea.incidentid and em.incidentid = eg.incidentid and em.entrydate = '2005-1-1 00:00' and em.entrydate = '2005-5-9 00:00' and ea.incidentid in ( select incidentid from ea where recordtext like '%RED%' ) and ea.incidentid in ( select incidentid from ea where recordtext like '%CORVETTE%' ) and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) order by em.entrydate You cannot use an index for %CORVETTE%, or %RED%. There is no way for the index to know if a row had that in the middle without scanning the whole index. So it's much cheaper to do a sequence scan. One possible way to make the query faster is to limit based on date, as you will only get about 700 rows. And then don't use subselects, as they are doing full sequence scans. I think this query does what you do above, and I think it will be faster, but I don't know. select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%') JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' ); - ANALYZE RESULTS - Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1) - Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1) Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat - Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1) - Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1) - Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1) - Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1) - HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1) - Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1) Filter: ((recordtext)::text ~~ '%CORVETTE%'::text) - Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) - Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011) Index Cond: ((outer.incidentid)::text = (ea.incidentid)::text) Filter: ((recordtext)::text ~~ '%RED%'::text) - Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701) Index Cond: ((outer.incidentid)::text = (em.incidentid)::text) Filter: ((entrydate = '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate = '2005-05-09 00:00:00'::timestamp without time zone)) - Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72) Index Cond: ((outer.incidentid)::text = (eg.incidentid)::text) Total runtime: 446871.880 ms (22 rows) - EXPLANATION - The reason for the redundant LIKE clause is that first, I only want those incidentids that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered
Re: [PERFORM] Query tuning help
Dan, and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) It is simply not possible to use B-tree indexes on these kind of text queries. B-trees require you to start at the left side of the field, because B-trees locate records via tests. Anywhere in the field text search requires a Full Text Index. The reason for the redundant LIKE clause is that first, I only want those incidentids that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered by the user. I have investigated text searching options and have not found them to be congruous with my application. Sounds like you either need to restructure your application, restructure your database (so that you're not doing anywhere in field searches), or buy 32GB of ram so that you can cache the whole table. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Query tuning help
Russell Smith [EMAIL PROTECTED] writes: On Mon, 9 May 2005 09:20 am, Dan Harris wrote: and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) You cannot use an index for %CORVETTE%, or %RED%. Not a btree index anyway. Dan might have some success here with a full-text-indexing package (eg, contrib/tsearch2) regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Query tuning help
On May 8, 2005, at 6:51 PM, Russell Smith wrote: On Mon, 9 May 2005 09:20 am, Dan Harris wrote: You cannot use an index for %CORVETTE%, or %RED%. There is no way for the index to know if a row had that in the middle without scanning the whole index. So it's much cheaper to do a sequence scan. While I believe you, I'm confused by this line in my original EXPLAIN ANALYZE: - Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) Doesn't that mean it was using an index to filter? Along those lines, before I created index 'ea1', the query was much much slower. So, it seemed like creating this index made a difference. One possible way to make the query faster is to limit based on date, as you will only get about 700 rows. And then don't use subselects, as they are doing full sequence scans. I think this query does what you do above, and I think it will be faster, but I don't know. I REALLY like this idea! If I could just filter by date first and then sequential scan through those, it should be very manageable. Hopefully I can keep this goal while still accommodating the requirement listed in my next paragraph. select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%') JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' ); I have run this, and while it is very fast, I'm concerned it's not doing what I need. Here's the situation: Due to the format of the systems with which I integrate ( I have no control over these formats ), we will get these 'recordtext' values one line at a time, accumulating over time. The only way I can find to make this work is to insert a new record for each line. The problem is, that when someone wants to search multiple keywords, they expect these words to be matched across multiple records with a given incident number. For a very simple example: IncidentID DateRecordtext -- - --- 1 2005-05-01 14:21 blah blah blah RED blah blah 2005-05-01 14:23 not what we are looking for 1 2005-05-02 02:05 blah CORVETTE blah blah So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the condition will only be applied to a single row of recordtext at a time, not a whole group with the same incident number. If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I may also receive additional updates to the previous data. In that case, I need to replace the original record with the latest version of it. If I have already concatenated these rows into a single field, the logic to in-line replace only the old text that has changed is very very difficult at best. So, that's the reason I had to do two subqueries in my example. Please tell me if I misunderstood your logic and it really will match given my condition above, but it didn't seem like it would. Thanks again for the quick responses! This list has been a great resource for me. -Dan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query tuning help
Dan, While I believe you, I'm confused by this line in my original EXPLAIN ANALYZE: - Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) The index named is matching based on incidentid -- the join condition. The filter is applied against the table rows, i.e. a scan. If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? No. Read the OpenFTS docs, they are fairly clear on how to set up a simple FTS index. (TSearch2 ~~ OpenFTS) If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. You do that by doubling up ... that is, use the FTS index to pick all rows that contain RED and CORVETTE, and then check the order. I'll also note that your current query is not checking word order. Example: WHERE recordtext_fti @@ to_tsquery ('default', 'RED CORVETTE') AND recordtext LIKE '%RED%CORVETTE%' I'm doing something fairly similar on one of my projects and it works very well. The limitations on TSearch2 indexes are: 1) they are expensive to update, so your data loads would be noticably slower. 2) they are only fast when cached in RAM (and when cached, are *very* fast). So if you have a variety of other processes that tend to fill up RAM between searches, you may find them less useful. 3) You have to create a materialized index column next to recordtext, which will increase the size of the table. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Whence the Opterons?
Mischa Sandberg wrote: After reading the comparisons between Opteron and Xeon processors for Linux, I'd like to add an Opteron box to our stable of Dells and Sparcs, for comparison. IBM, Sun and HP have their fairly pricey Opteron systems. The IT people are not swell about unsupported purchases off ebay. Anyone care to suggest any other vendors/distributors? Looking for names with national support, so that we can recommend as much to our customers. Monarch Computer http://www.monarchcomputer.com/ They have prebuilt and custom built systems. -- Until later, Geoffrey ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Query tuning help
Dan Harris [EMAIL PROTECTED] writes: - Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) Doesn't that mean it was using an index to filter? No. The Index Cond shows it is using the index only for the join condition. A Filter is an additional filter condition that happens to get applied at this plan node --- but it'll be applied to every row the index finds for the index condition. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query tuning help
On May 8, 2005, at 8:06 PM, Josh Berkus wrote: If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? No. Read the OpenFTS docs, they are fairly clear on how to set up a simple FTS index. (TSearch2 ~~ OpenFTS) If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I have been reading the Tsearch2 docs and either I don't understand something or I'm not communicating my situation clearly enough. It seems that Tsearch2 has a concept of document. And, in everything I am reading, they expect your document to be all contained in a single row. Since my words can be spread across multiple rows, I don't see that Tsearch2 will combine all 'recordtext' row values with the same incidentid into a single vector. Am I overlooking something in the docs? I'm doing something fairly similar on one of my projects and it works very well. I'd be curious what similarities they have? Is it the searching across multiple rows or the order of words? The limitations on TSearch2 indexes are: 1) they are expensive to update, so your data loads would be noticably slower. 2) they are only fast when cached in RAM (and when cached, are *very* fast). So if you have a variety of other processes that tend to fill up RAM between searches, you may find them less useful. 3) You have to create a materialized index column next to recordtext, which will increase the size of the table. Duly noted. If this method can search across rows, I'm willing to accept this overhead for the speed it would add. In the meantime, is there any way I can reach my goal without Tsearch2 by just restructuring my query to narrow down the results by date first, then seq scan for the 'likes'? -Dan ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Query tuning help
On Mon, 9 May 2005 11:49 am, Dan Harris wrote: On May 8, 2005, at 6:51 PM, Russell Smith wrote: [snip] select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%') JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' ); I have run this, and while it is very fast, I'm concerned it's not doing what I need. How fast is very fast? Here's the situation: Due to the format of the systems with which I integrate ( I have no control over these formats ), we will get these 'recordtext' values one line at a time, accumulating over time. The only way I can find to make this work is to insert a new record for each line. The problem is, that when someone wants to search multiple keywords, they expect these words to be matched across multiple records with a given incident number. For a very simple example: IncidentID DateRecordtext -- - --- 1 2005-05-01 14:21 blah blah blah RED blah blah 2005-05-01 14:23 not what we are looking for 1 2005-05-02 02:05 blah CORVETTE blah blah So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the condition will only be applied to a single row of recordtext at a time, not a whole group with the same incident number. If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field type? If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I may also receive additional updates to the previous data. In that case, I need to replace the original record with the latest version of it. If I have already concatenated these rows into a single field, the logic to in-line replace only the old text that has changed is very very difficult at best. So, that's the reason I had to do two subqueries in my example. Please tell me if I misunderstood your logic and it really will match given my condition above, but it didn't seem like it would. Thanks again for the quick responses! This list has been a great resource for me. select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%')) JOIN eg ON em.incidentid = eg.incidentid WHERE em.incidentid IN (select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%')) JOIN eg ON em.incidentid = eg.incidentid) AND em.incidentid IN (select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%')) JOIN eg ON em.incidentid = eg.incidentid) This may be more accurate. However I would cool it VERY NASTY. Josh's solutions may be better. However much of the data should be in memory once the subplans are done, so it may be quite fast. you may -Dan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Query tuning help
On May 8, 2005, at 8:32 PM, Russell Smith wrote: I have run this, and while it is very fast, I'm concerned it's not doing what I need. How fast is very fast? It took 35 seconds to complete versus ~450 my old way. select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%')) JOIN eg ON em.incidentid = eg.incidentid WHERE em.incidentid IN (select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%')) JOIN eg ON em.incidentid = eg.incidentid) AND em.incidentid IN (select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 00:00' AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%')) JOIN eg ON em.incidentid = eg.incidentid) Yes, it is nasty, but so was my previous query :) So long as this is faster, I'm ok with that. I'll see if i can make this work. Thank you very much. -Dan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Query tuning help
On Sun, 8 May 2005 20:31:38 -0600, Dan Harris [EMAIL PROTECTED] wrote: Duly noted. If this method can search across rows, I'm willing to accept this overhead for the speed it would add. You could use intersect to search across rows. Using tsearch2 will look up the RED and CORVETTE using the index and intersect will pull out the commmon rows. In the meantime, is there any way I can reach my goal without Tsearch2 by just restructuring my query to narrow down the results by date first, then seq scan for the 'likes'? select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat from ea, em, eg, ( select ea.incidentid from ea, em where em.incidentid = ea.incidentid and em.entrydate = '2005-1-1 00:00' and em.entrydate = '2005-5-9 00:00' and recordtext like '%RED%' intersect select ea.incidentid from ea, em where em.incidentid = ea.incidentid and em.entrydate = '2005-1-1 00:00' and em.entrydate = '2005-5-9 00:00' and recordtext like '%CORVETTE%' ) as iid where em.incidentid = ea.incidentid and em.incidentid = eg.incidentid and em.entrydate = '2005-1-1 00:00' and em.entrydate = '2005-5-9 00:00' and ea.incidentid = iid.incidentid and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) order by em.entrydate klint. +---+-+ : Klint Gore: Non rhyming: : EMail : [EMAIL PROTECTED] : slang - the: : Snail : A.B.R.I.: possibilities : : Mail University of New England : are useless : : Armidale NSW 2351 Australia : L.J.J. : : Fax : +61 2 6772 5376 : : +---+-+ ---(end of broadcast)--- TIP 8: explain analyze is your friend