Re: [PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9

2005-05-08 Thread Jona




Sorry Tom, misread your mail! My bad :-(

I believe the following is the data you need ?

   

  Live Server


  relname
  relpages


  ctp_statcon
  72


  statcon_pk
  135


  
  
  
  


  Test Server


  relname
  relpages


  ctp_statcon
  34


  statcon_pk
  28

  


Have executed the following query to obtain that data:
SELECT relname, relpages
FROM pg_class
WHERE relname = 'statcon_pk' OR relname = 'sc2ctp_fk' OR relname =
'sc2mtp_fk' OR relname = 'sc2sc_fk' OR relname = 'ctp_statcon'

The size difference for the index is surprisingly big I think,
considering that there's only around 1000 rows more in the table on the
live server than on the server.
Count for Live Server: 12597
Count for Test Server: 11494
Any insight into this?

Cheers
Jona

PS: The meta data for the table is:
CREATE TABLE statcon_tbl
(
 id serial NOT NULL,
 data bytea,
 wm bool DEFAULT 'FALSE',
 created timestamp DEFAULT now(),
 modified timestamp DEFAULT now(),
 enabled bool DEFAULT 'TRUE',
 bitsperpixel int4 DEFAULT 0,
 mtpid int4,
 sctid int4,
 ctpid int4,
 CONSTRAINT statcon_pk PRIMARY KEY (id),
 CONSTRAINT sc2ctp_fk FOREIGN KEY (ctpid) REFERENCES contype_tbl (id)
ON   UPDATE CASCADE ON DELETE CASCADE,
 CONSTRAINT sc2mtp_fk FOREIGN KEY (mtpid) REFERENCES mimetype_tbl (id)
ON UPDATE CASCADE ON DELETE CASCADE,
 CONSTRAINT sc2sct_fk FOREIGN KEY (sctid) REFERENCES statcontrans_tbl
(id) ON UPDATE CASCADE ON DELETE CASCADE
) 
WITHOUT OIDS;
CREATE INDEX ctp_statcon ON statcon_tbl USING btree (sctid, ctpid);


Tom Lane wrote:

  Jona [EMAIL PROTECTED] writes:
  
  
anyway, here's the info for relpages:
Live Server: 424
Test Server: 338

  
  
I was asking about the indexes associated with the table, not the table
itself.

			regards, tom lane
  






[PERFORM] sequence scan on PK

2005-05-08 Thread Jeroen van Iddekinge
Hi,
I understand that when a table contains only a few rows it is better to 
do a sequence scan than an index scan. But is this also for a table with 
99 records?

A table contains
id  integer (primary key)
namevarchar(70)
parent   integer
comment  text
owner  integer
inheritAccess   integer
defaultAccess  integer
sequence  bigint
contentsinheritaccessmove   integer
contentsinheritaccessadd  integer
explain select * from tblFolders where id=90;
   QUERY PLAN
---
Seq Scan on tblfolders  (cost=0.00..3.24 rows=1 width=50)
  Filter: (id = 90)
(I have analyze table bit still a sequence scan).
With how manys rows it is  ok to do an index scan or sequence scan? How 
is this calculated in pg?

Regards
Jer
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread John A Meinel
Jeroen van Iddekinge wrote:
Hi,
I understand that when a table contains only a few rows it is better to
do a sequence scan than an index scan. But is this also for a table with
99 records?
...
explain select * from tblFolders where id=90;
   QUERY PLAN
---
Seq Scan on tblfolders  (cost=0.00..3.24 rows=1 width=50)
  Filter: (id = 90)
(I have analyze table bit still a sequence scan).
With how manys rows it is  ok to do an index scan or sequence scan? How
is this calculated in pg?
Regards
Jer
It depends on how many pages need to be read. To do an index scan you
need to read the index pages, and then you read the page where the
actual row resides.
Usually the comment is if you are selecting 5% of the rows, seqscan is
faster than an index scan. If I'm reading your query correctly, it is
estimating needing to read about 3 pages to get the row you are asking
for. If you used an index, it probably would have to read at least that
many pages, and they would not be a sequential read, so it should be slower.
If you want to prove it, try:
\timing
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
SET enable_seqscan TO OFF;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
Run multiple times to make sure everything is cached, and take the
fastest time. On your machine it might be true that the index scan is
slightly faster than the seqscan in this exact circumstance. But I have
the feeling the time is trivially different, and if you had say 70 rows
it would favor seqscan. Probably somewhere at 150-200 rows it will
switch on it's own.
You could tweak with several settings to get it to do an index scan
earlier, but these would probably break other queries. You don't need to
tune for 100 rows, more like 100k or 100M.
John
=:-


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread Matteo Beccati
Hi,
Thanks for respone.
The index scan was a little bit faster for id=1 and faster for id=99.
Which settings shoud I change for this?  cpu_index_tuple_cost ,  
cpu_operator_cost, cpu_tuple_cost?
You should lower random_page_cost to make the planner choose an index 
scan vs sequential scan.

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread Jeroen van Iddekinge

You should lower random_page_cost to make the planner choose an index 
scan vs sequential scan.

Yes , it was a bit to high (18) so a lowered it. It speeded up some 
pages for about 5%.

Reg. Jer
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread Tom Lane
Jeroen van Iddekinge [EMAIL PROTECTED] writes:
 You could tweak with several settings to get it to do an index scan
 earlier, but these would probably break other queries. You don't need to
 tune for 100 rows, morelike 100k or 100M.

 Which settings shoud I change for this?

I'd agree with John's response: if you change any settings based on just
this one test case, you're a fool.  But usually random_page_cost is the
best knob to twiddle if you wish to encourage indexscans.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread Matteo Beccati
Hi,
Yes , it was a bit to high (18) so a lowered it. It speeded up some 
pages for about 5%.
18? The default is 4 if I can remember correctly. I wonder if your db 
has ever seen an index scan ;)

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] COPY vs INSERT

2005-05-08 Thread Jim C. Nasby
On Fri, May 06, 2005 at 09:30:46AM +0200, Dennis Bjorklund wrote:
 The sql standard include this, except that you can not have the outer ().
 So it should be
 
 INSERT INTO table VALUES
 (1,2,3),
 (4,5,6),
 (7,8,9);
 
 Do DB2 demand these extra ()?

My recollection is that it does, but it's been a few years...
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] sequence scan on PK

2005-05-08 Thread Jeroen van Iddekinge
Matteo Beccati wrote:
Hi,
Yes , it was a bit to high (18) so a lowered it. It speeded up some 
pages for about 5%.

18? The default is 4 if I can remember correctly. I wonder if your db 
has ever seen an index scan ;)

I was expermenting how much some setting influence has on the perfomance 
of some web application.
So I think i forgot to change the setting back and got some strange 
query plans.

Thanks
Jer
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Whence the Opterons?

2005-05-08 Thread Sam Vilain
Mischa Sandberg wrote:
After reading the comparisons between Opteron and Xeon processors for Linux,
I'd like to add an Opteron box to our stable of Dells and Sparcs, for 
comparison.
IBM, Sun and HP have their fairly pricey Opteron systems.
The IT people are not swell about unsupported purchases off ebay.
Anyone care to suggest any other vendors/distributors?
Check out the Tyan Transport systems.  Tyan are an ex Sparc clone 
manufacturer, and
released the second available Opteron board - widely considered the first 
serious
Opteron board to hit the market.
Sam.
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Whence the Opterons?

2005-05-08 Thread Mischa Sandberg
Thanks to everyone for their pointers to suppliers of Opteron systems.

The system I'm pricing is under a tighter budget than a production machine,
because it will be for perftests. Our customers tend to run on Dells but
occasionally run on (Sun) Opterons. 



---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
Sorry to bother everyone with yet another my query isn't using an 
index problem but I am over my head on this one..  I am open to ways 
of restructuring this query to perform better.

I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has 
been just run on the table.

This is the result of:
explain analyze
select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea, em, eg
where
em.incidentid = ea.incidentid and
em.incidentid = eg.incidentid and
em.entrydate = '2005-1-1 00:00' and
em.entrydate = '2005-5-9 00:00'
and ea.incidentid in (
select
incidentid
from
ea
where
recordtext like '%RED%'
)
	and ea.incidentid in (
		select
			incidentid
		from
			ea
		where
			recordtext like '%CORVETTE%'
	)
	and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
order by em.entrydate

-
ANALYZE RESULTS
-
 Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual 
time=446787.056..446787.342 rows=72 loops=1)
   -  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual 
time=446787.053..446787.075 rows=72 loops=1)
 Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
 -  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) 
(actual time=444378.655..446786.746 rows=72 loops=1)
   -  Nested Loop  (cost=771835.10..774688.81 rows=1 
width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
 -  Nested Loop IN Join  
(cost=771835.10..774678.88 rows=2 width=81) (actual 
time=444367.080..446191.864 rows=701 loops=1)
   -  Nested Loop  (cost=771835.10..774572.05 
rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 
loops=1)
 -  HashAggregate  
(cost=771835.10..771835.10 rows=1 width=17) (actual 
time=444366.702..444368.583 rows=473 loops=1)
   -  Seq Scan on ea  
(cost=0.00..771834.26 rows=335 width=17) (actual 
time=259.746..444358.837 rows=592 loops=1)
 Filter: 
((recordtext)::text ~~ '%CORVETTE%'::text)
 -  Index Scan using ea1 on ea  
(cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 
loops=473)
   Index Cond: 
((ea.incidentid)::text = (outer.incidentid)::text)
   Filter: (((recordtext)::text ~~ 
'%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
   -  Index Scan using ea1 on ea  
(cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 
loops=1011)
 Index Cond: 
((outer.incidentid)::text = (ea.incidentid)::text)
 Filter: ((recordtext)::text ~~ 
'%RED%'::text)
 -  Index Scan using em_incidentid_idx on em  
(cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 
loops=701)
   Index Cond: ((outer.incidentid)::text = 
(em.incidentid)::text)
   Filter: ((entrydate = '2005-01-01 
00:00:00'::timestamp without time zone) AND (entrydate = '2005-05-09 
00:00:00'::timestamp without time zone))
   -  Index Scan using eg_incidentid_idx on eg  
(cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 
loops=72)
 Index Cond: ((outer.incidentid)::text = 
(eg.incidentid)::text)
 Total runtime: 446871.880 ms
(22 rows)

-
EXPLANATION
-
The reason for the redundant LIKE clause is that first, I only want 
those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT, 
those two words may exist across multiple records with the same 
incidentid.  Then, I only want to actually work with the rows that 
contain one of the words.  This query will repeat the same logic for 
however many keywords are entered by the user.  I have investigated 
text searching options and have not found them to be congruous with my 
application.

Why is it choosing a sequential scan one part of the query when 
searching for the words, yet using an index scan for another part of 
it?  Is there a better way to structure the query to give it better 
hints?

I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.
Thank you for any advice.
-Dan

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
 Sorry to bother everyone with yet another my query isn't using an 
 index problem but I am over my head on this one..  I am open to ways 
 of restructuring this query to perform better.
 
 I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has 
 been just run on the table.
 
 This is the result of:
 
 explain analyze
 select distinct
  em.incidentid,
  ea.recordtext as retdata,
  eg.long,
  eg.lat
 from
  ea, em, eg
 where
  em.incidentid = ea.incidentid and
  em.incidentid = eg.incidentid and
  em.entrydate = '2005-1-1 00:00' and
  em.entrydate = '2005-5-9 00:00'
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%RED%'
  )
 
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%CORVETTE%'
  )
  and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
 order by em.entrydate
 
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning the whole
index.  So it's much cheaper to do a sequence scan.

One possible way to make the query faster is to limit based on date, as you 
will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans.  I think 
this query does what you do 
above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,  eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate = '2005-1-1 
00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND 
ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or 
recordtext like '%CORVETTE%'  );

 
 -
 ANALYZE RESULTS
 -
 
   Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual 
 time=446787.056..446787.342 rows=72 loops=1)
 -  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual 
 time=446787.053..446787.075 rows=72 loops=1)
   Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
   -  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) 
 (actual time=444378.655..446786.746 rows=72 loops=1)
 -  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) 
 (actual time=444378.532..446768.381 rows=72 loops=1)
   -  Nested Loop IN Join  (cost=771835.10..774678.88 
 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
 -  Nested Loop  (cost=771835.10..774572.05 
 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
   -  HashAggregate  
 (cost=771835.10..771835.10 rows=1 width=17) (actual 
 time=444366.702..444368.583 rows=473 loops=1)
 -  Seq Scan on ea  
 (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 
 rows=592 loops=1)
   Filter: ((recordtext)::text ~~ 
 '%CORVETTE%'::text)
   -  Index Scan using ea1 on ea  
 (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 
 loops=473)
 Index Cond: ((ea.incidentid)::text = 
 (outer.incidentid)::text)
 Filter: (((recordtext)::text ~~ 
 '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
 -  Index Scan using ea1 on ea  
 (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 
 loops=1011)
   Index Cond: ((outer.incidentid)::text = 
 (ea.incidentid)::text)
   Filter: ((recordtext)::text ~~ 
 '%RED%'::text)
   -  Index Scan using em_incidentid_idx on em  
 (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
 Index Cond: ((outer.incidentid)::text = 
 (em.incidentid)::text)
 Filter: ((entrydate = '2005-01-01 
 00:00:00'::timestamp without time zone) AND (entrydate = '2005-05-09 
 00:00:00'::timestamp without time zone))
 -  Index Scan using eg_incidentid_idx on eg  
 (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
   Index Cond: ((outer.incidentid)::text = 
 (eg.incidentid)::text)
   Total runtime: 446871.880 ms
 (22 rows)
 
 
 -
 EXPLANATION
 -
 The reason for the redundant LIKE clause is that first, I only want 
 those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT, 
 those two words may exist across multiple records with the same 
 incidentid.  Then, I only want to actually work with the rows that 
 contain one of the words.  This query will repeat the same logic for 
 however many keywords are entered 

Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

 and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  

It is simply not possible to use B-tree indexes on these kind of text queries.  
B-trees require you to start at the left side of the field, because B-trees 
locate records via  tests.  Anywhere in the field text search requires a 
Full Text Index.

 The reason for the redundant LIKE clause is that first, I only want
 those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT,
 those two words may exist across multiple records with the same
 incidentid.  Then, I only want to actually work with the rows that
 contain one of the words.  This query will repeat the same logic for
 however many keywords are entered by the user.  I have investigated
 text searching options and have not found them to be congruous with my
 application.

Sounds like you either need to restructure your application, restructure your 
database (so that you're not doing anywhere in field searches), or buy 32GB 
of ram so that you can cache the whole table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Russell Smith [EMAIL PROTECTED] writes:
 On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
 and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
 
 You cannot use an index for %CORVETTE%, or %RED%.

Not a btree index anyway.  Dan might have some success here with a
full-text-indexing package (eg, contrib/tsearch2)

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 6:51 PM, Russell Smith wrote:
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning  
the whole
index.  So it's much cheaper to do a sequence scan.

While I believe you, I'm confused by this line in my original EXPLAIN  
ANALYZE:

-  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)  
(actual time=2.085..2.309 rows=2 loops=473)
Index Cond:  
((ea.incidentid)::text = (outer.incidentid)::text)
Filter: (((recordtext)::text  
~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
Doesn't that mean it was using an index to filter?  Along those lines,  
before I created index 'ea1', the query was much much slower.  So, it  
seemed like creating this index made a difference.

One possible way to make the query faster is to limit based on date,  
as you will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans.   
I think this query does what you do
above, and I think it will be faster, but I don't know.

I REALLY like this idea! If I could just filter by date first and then  
sequential scan through those, it should be very manageable.  Hopefully  
I can keep this goal while still accommodating the requirement listed  
in my next paragraph.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
'%RED%' or recordtext like '%CORVETTE%'  );

I have run this, and while it is very fast, I'm concerned it's not  
doing what I need.  Here's the situation:

Due to the format of the systems with which I integrate ( I have no  
control over these formats ), we will get these 'recordtext' values one  
line at a time, accumulating over time.  The only way I can find to  
make this work is to insert a new record for each line.  The problem  
is, that when someone wants to search multiple keywords, they expect  
these words to be matched across multiple records with a given incident  
number.

 For a very simple example:
IncidentID		DateRecordtext
--		-			 
---
1			2005-05-01 14:21	blah blah blah RED blah blah
			2005-05-01 14:23	not what we are looking for
1			2005-05-02 02:05	blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
'%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
condition will only be applied to a single row of recordtext at a time,  
not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to  
create another table that merges all of my recordtext rows into a  
single 'text' field type?  If so, this is where I run into problems, as  
my logic also needs to match multiple words in their original order.  I  
may also receive additional updates to the previous data.  In that  
case, I need to replace the original record with the latest version of  
it.  If I have already concatenated these rows into a single field, the  
logic to in-line replace only the old text that has changed is very  
very difficult at best.  So, that's the reason I had to do two  
subqueries in my example.  Please tell me if I misunderstood your logic  
and it really will match given my condition above, but it didn't seem  
like it would.

Thanks again for the quick responses!  This list has been a great  
resource for me.

-Dan
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

 While I believe you, I'm confused by this line in my original EXPLAIN

 ANALYZE:
  -  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)
  (actual time=2.085..2.309 rows=2 loops=473)
  Index Cond:
  ((ea.incidentid)::text = (outer.incidentid)::text)
  Filter: (((recordtext)::text
  ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

The index named is matching based on incidentid -- the join condition.  The 
filter is applied against the table rows, i.e. a scan.

 If I were to use tsearch2 for full-text indexing, would I need to
 create another table that merges all of my recordtext rows into a
 single 'text' field type?  

No.   Read the OpenFTS docs, they are fairly clear on how to set up a simple 
FTS index. (TSearch2 ~~ OpenFTS)

 If so, this is where I run into problems, as 
 my logic also needs to match multiple words in their original order.  

You do that by doubling up ... that is, use the FTS index to pick all rows 
that contain RED and CORVETTE, and then check the order.  I'll also note 
that your current query is not checking word order. 

Example:
WHERE recordtext_fti @@ to_tsquery ('default', 'RED  CORVETTE')
AND recordtext LIKE '%RED%CORVETTE%'

I'm doing something fairly similar on one of my projects and it works very 
well.

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticably slower.  
2) they are only fast when cached in RAM (and when cached, are *very* fast).  
So if you have a variety of other processes that tend to fill up RAM between 
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, which 
will increase the size of the table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Whence the Opterons?

2005-05-08 Thread Geoffrey
Mischa Sandberg wrote:
After reading the comparisons between Opteron and Xeon processors for Linux,
I'd like to add an Opteron box to our stable of Dells and Sparcs, for 
comparison.
IBM, Sun and HP have their fairly pricey Opteron systems.
The IT people are not swell about unsupported purchases off ebay.
Anyone care to suggest any other vendors/distributors?
Looking for names with national support, so that we can recommend as much to our
customers.
Monarch Computer http://www.monarchcomputer.com/
They have prebuilt and custom built systems.
--
Until later, Geoffrey
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 -  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) 
 (actual time=2.085..2.309 rows=2 loops=473)
   Index Cond: ((ea.incidentid)::text = (outer.incidentid)::text)
   Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text 
 ~~ '%CORVETTE%'::text))

 Doesn't that mean it was using an index to filter?

No.  The Index Cond shows it is using the index only for the join
condition.  A Filter is an additional filter condition that happens to
get applied at this plan node --- but it'll be applied to every row the
index finds for the index condition.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:06 PM, Josh Berkus wrote:

If I were to use tsearch2 for full-text indexing, would I need to
create another table that merges all of my recordtext rows into a
single 'text' field type?
No.   Read the OpenFTS docs, they are fairly clear on how to set up a 
simple
FTS index. (TSearch2 ~~ OpenFTS)

If so, this is where I run into problems, as
my logic also needs to match multiple words in their original order.
I have been reading the Tsearch2 docs and either I don't understand 
something or I'm not communicating my situation clearly enough.  It 
seems that Tsearch2 has a concept of document.  And, in everything I 
am reading, they expect your document to be all contained in a single 
row.  Since my words can be spread across multiple rows, I don't see 
that Tsearch2 will combine all 'recordtext' row values with the same 
incidentid into a single vector.  Am I overlooking something in the 
docs?

I'm doing something fairly similar on one of my projects and it works 
very
well.

I'd be curious what similarities they have?  Is it the searching across 
multiple rows or the order of words?

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticably 
slower.
2) they are only fast when cached in RAM (and when cached, are *very* 
fast).
So if you have a variety of other processes that tend to fill up RAM 
between
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, 
which
will increase the size of the table.
Duly noted.  If this method can search across rows, I'm willing to 
accept this overhead for the speed it would add.

In the meantime, is there any way I can reach my goal without Tsearch2 
by just restructuring my query to narrow down the results by date 
first, then seq scan for the 'likes'?

-Dan
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
 
 On May 8, 2005, at 6:51 PM, Russell Smith wrote:
 
[snip]
  select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
  eg.lat
  FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
  '2005-1-1 00:00'
  AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
  AND ea.recordtext like '%CORVETTE%')
  JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
  '%RED%' or recordtext like '%CORVETTE%'  );
 
 
 I have run this, and while it is very fast, I'm concerned it's not  
 doing what I need.
How fast is very fast?


 Here's the situation: 
 
 Due to the format of the systems with which I integrate ( I have no  
 control over these formats ), we will get these 'recordtext' values one  
 line at a time, accumulating over time.  The only way I can find to  
 make this work is to insert a new record for each line.  The problem  
 is, that when someone wants to search multiple keywords, they expect  
 these words to be matched across multiple records with a given incident  
 number.
 
   For a very simple example:
 
 IncidentID  DateRecordtext
 --  -
 ---
 1   2005-05-01 14:21 blah blah blah RED blah blah
    2005-05-01 14:23 not what we are looking for
 1   2005-05-02 02:05 blah CORVETTE blah blah
 
 So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
 '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
 condition will only be applied to a single row of recordtext at a time,  
 not a whole group with the same incident number.
 
 If I were to use tsearch2 for full-text indexing, would I need to  
 create another table that merges all of my recordtext rows into a  
 single 'text' field type?  If so, this is where I run into problems, as  
 my logic also needs to match multiple words in their original order.  I  
 may also receive additional updates to the previous data.  In that  
 case, I need to replace the original record with the latest version of  
 it.  If I have already concatenated these rows into a single field, the  
 logic to in-line replace only the old text that has changed is very  
 very difficult at best.  So, that's the reason I had to do two  
 subqueries in my example.  Please tell me if I misunderstood your logic  
 and it really will match given my condition above, but it didn't seem  
 like it would.
 
 Thanks again for the quick responses!  This list has been a great  
 resource for me.
 
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND (ea.recordtext like '%RED%'  OR 
ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE 
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid)  AND 
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%'))
JOIN eg ON em.incidentid = eg.incidentid)

This may be more accurate.  However I would cool it VERY NASTY. Josh's 
solutions may be better.
However much of the data should be in memory once the subplans are done, so it 
may be quite fast.
you may 
 

 -Dan
 
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 
 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:32 PM, Russell Smith wrote:
I have run this, and while it is very fast, I'm concerned it's not
doing what I need.
How fast is very fast?
It took 35 seconds to complete versus ~450 my old way.
select distinct em.incidentid, ea.recordtext as retdata, eg.long, 
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND (ea.recordtext like '%RED%'  
OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, 
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like 
'%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid)  AND
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, 
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate =  
'2005-1-1 00:00'
AND em.entrydate = '2005-5-9 00:00' AND ea.recordtext like '%RED%'))
JOIN eg ON em.incidentid = eg.incidentid)

Yes, it is nasty, but so was my previous query :)  So long as this is 
faster, I'm ok with that.  I'll see if i can make this work.  Thank you 
very much.

-Dan
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Query tuning help

2005-05-08 Thread Klint Gore
On Sun, 8 May 2005 20:31:38 -0600, Dan Harris [EMAIL PROTECTED] wrote:
 Duly noted.  If this method can search across rows, I'm willing to 
 accept this overhead for the speed it would add.

You could use intersect to search across rows.  Using tsearch2 will look
up the RED and CORVETTE using the index and intersect will pull out the
commmon rows.

 In the meantime, is there any way I can reach my goal without Tsearch2 
 by just restructuring my query to narrow down the results by date 
 first, then seq scan for the 'likes'?


select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea, em, eg, 
(
select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate = '2005-1-1 00:00' and
em.entrydate = '2005-5-9 00:00' and
recordtext like '%RED%'

intersect

select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate = '2005-1-1 00:00' and
em.entrydate = '2005-5-9 00:00' and
recordtext like '%CORVETTE%'
) as iid
where
em.incidentid = ea.incidentid and
em.incidentid = eg.incidentid and
em.entrydate = '2005-1-1 00:00' and
em.entrydate = '2005-5-9 00:00'
and ea.incidentid = iid.incidentid 
and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate

klint.

+---+-+
: Klint Gore: Non rhyming:
: EMail   : [EMAIL PROTECTED]   :  slang - the:
: Snail   : A.B.R.I.:  possibilities  :
: Mail  University of New England   :  are useless   :
:   Armidale NSW 2351 Australia : L.J.J.  :
: Fax : +61 2 6772 5376 : :
+---+-+

---(end of broadcast)---
TIP 8: explain analyze is your friend