Re: [PERFORM] performance on new linux box

2010-07-09 Thread Russell Smith
On 09/07/10 02:31, Ryan Wexler wrote:
 Thanks a lot for all the comments.  The fact that both my windows box
 and the old linux box both show a massive performance improvement over
 the new linux box seems to point to hardware to me.  I am not sure how
 to test the fsync issue, but i don't see how that could be it.

 The raid card the server has in it is:
 3Ware 4 Port 9650SE-4LPML RAID Card

 Looking it up, it seems to indicate that it has BBU

 The only other difference between the boxes is the postgresql
 version.  The new one has 8.4-2 from the yum install instructions on
 the site:
 http://yum.pgrpms.org/reporpms/repoview/pgdg-centos.html

 Any more thoughts?
Really dumb idea, but you don't happen to have the build of the RPMs that
had debug enabled, do you?  That build resulted in a significant performance problem.
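
If you want to rule that out, one quick check (just a suggestion, assuming
the debug build means an assert-enabled build) is to ask the server:

SHOW debug_assertions;

An assert-enabled build reports "on" and carries a large performance
penalty; a normal production build reports "off".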

Regards

Russell


Re: [PERFORM] Low perfomance SUM and Group by large databse

2010-06-22 Thread Russell Smith
On 22/06/10 00:42, Sergio Charpinel Jr. wrote:
 Hi,

[snip]

 => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
 tcp_flags, ip_proto, SUM(bytes), SUM(packets), SUM(flows) FROM
 acct_2010_25 WHERE stamp_inserted >= '2010-06-20 10:10' AND
 stamp_inserted < '2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst,
 port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;
   
QUERY PLAN
  
 --
  Limit  (cost=3998662.81..3998662.94 rows=50 width=50) (actual
 time=276981.107..276981.133 rows=50 loops=1)
    ->  Sort  (cost=3998662.81..4001046.07 rows=953305 width=50)
 (actual time=276981.105..276981.107 rows=50 loops=1)
          Sort Key: sum(bytes)
          ->  GroupAggregate  (cost=3499863.27..3754872.33 rows=953305
 width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
                ->  Sort  (cost=3499863.27..3523695.89 rows=9533049
 width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
                      Sort Key: ip_src, port_src, ip_dst, port_dst,
 tcp_flags, ip_proto

You are having to sort and aggregate a large number of rows before you
can get the top 50.  That's 9 million rows in this case, and at width 50 that
is a 400MB+ sort.  That's going to be slow, as you are going to have to sort
it on disk unless you bump sort memory up to 500MB (a bad idea).  So unless
you have really fast storage for temporary files it's going to take a
while.  The roughly 2.5 minutes you are experiencing at the moment is probably
not too bad.

I'm sure improvements have been made in this area since 8.1, and if you
are able to upgrade to 8.4, which is also offered by Centos5 now, you
might get some benefit there.  I can't remember the specific benefits, but I
believe sorting speed has improved, and explain analyze in newer versions will
also tell you whether the sort happened on disk or in memory.

                      ->  Seq Scan on acct_2010_25
   (cost=0.00..352648.10 rows=9533049 width=50) (actual
  time=0.038..50860.391 rows=9494165 loops=1)
                            Filter: ((stamp_inserted >= '2010-06-20
  10:10:00'::timestamp without time zone) AND (stamp_inserted <
  '2010-06-21 10:10:00'::timestamp without time zone))
   Total runtime: 278791.661 ms
  (9 rows)

 Another one just summing bytes (still low):

 => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
 tcp_flags, ip_proto, SUM(bytes) FROM acct_2010_25 WHERE
 stamp_inserted >= '2010-06-20 10:10' AND stamp_inserted < '2010-06-21
 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags,
 ip_proto LIMIT 50 OFFSET 0;
   
 QUERY PLAN
   
 
  Limit  (cost=3395202.50..3395213.12 rows=50 width=42) (actual
 time=106261.359..106261.451 rows=50 loops=1)
    ->  GroupAggregate  (cost=3395202.50..3602225.48 rows=974226
 width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
          ->  Sort  (cost=3395202.50..3419558.14 rows=9742258 width=42)
 (actual time=106261.107..106261.169 rows=176 loops=1)
                Sort Key: ip_src, port_src, ip_dst, port_dst,
 tcp_flags, ip_proto
                ->  Seq Scan on acct_2010_25  (cost=0.00..367529.72
 rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
                      Filter: ((stamp_inserted >= '2010-06-20
 10:10:00'::timestamp without time zone) AND (stamp_inserted <
 '2010-06-21 10:10:00'::timestamp without time zone))
  Total runtime: 109911.882 ms
 (7 rows)


 The server has 2 Intel(R) Xeon(R) CPU  E5430 @ 2.66GHz and 16GB RAM.
 I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just
 increased checkpoint_segments to 50).

Increasing checkpoint_segments won't help you here: the number of segments only
affects how fast writes to the database can happen, and this query is read-only.


 What can I change to increase performance?

Increasing sort-memory (work_mem) will give you speed benefits even
though you are going to disk.  I don't know how much spare memory you
have, but trying other values between 8MB and 128MB may be useful just
for the specific query runs.  If you can afford 512Mb for each of the
two sorts, go for that, but it's dangerous as mentioned due to the risk
of using more RAM than you have.  work_mem allocates that amount of
memory per sort.
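
For example, to try a larger value for a single session only (the figure is
purely illustrative, not a recommendation for your hardware):

SET work_mem = 131072;   -- value is in kB on 8.1, i.e. 128MB, this session only
EXPLAIN ANALYZE SELECT ... ;  -- re-run the aggregate query here
RESET work_mem;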

If you are running these queries all the time, a summary table that
produces these reports on a regular basis, maybe daily or even hourly,
would be useful (a minimal sketch follows).  Basically the 
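
A minimal sketch of that idea (the summary table name is made up, and the
refresh would need to be scheduled with cron or similar):

CREATE TABLE acct_2010_25_hourly AS
SELECT date_trunc('hour', stamp_inserted) AS stamp_hour,
       ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes) AS bytes, SUM(packets) AS packets, SUM(flows) AS flows
FROM acct_2010_25
GROUP BY date_trunc('hour', stamp_inserted),
         ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto;

The reports then aggregate the much smaller hourly table instead of the
raw rows.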

Re: [PERFORM] Getting time of a postgresql-request

2009-08-17 Thread Russell Smith
Kai Behncke wrote:

 But I would like to get it in a php-script, like

 $timerequest_result=pg_result($timerequest,0);

 (well, that does not work).

 I wonder: Is there another way to get the time a request needs?
 How do you handle this?
   
$time = microtime(true);
$result = pg_query($conn, $query);   // $conn is your existing connection handle
echo "Time to run query and return result to PHP: " . (microtime(true) - $time);

Something like that.

Regards

Russell



Re: [PERFORM] SQL select query becomes slow when using limit (with no offset)

2009-08-05 Thread Russell Smith
Kees van Dieren wrote:
 Hi Folks,

 Thanks for your response.

 I have added the following index (suggested by other post):

 CREATE INDEX events_events_cleared_eventtype
   ON events_events
   USING btree
   (eventtype_id, cleared)
   WHERE cleared = false;

 Also with columns in reversed order.

 No changes in response time noticed.

 Index on cleared column already is there (indices are in sql file
 attached to initial post.). eventtype_id has a foreign key constraint,
 which adds an index automatically I believe?

 The explain analyze results for both queries:
 explain analyze select events_events.id FROM
 events_events
 left join events_event_types on
 events_events.eventType_id=events_event_types.id
 where events_event_types.severity=70
 and not events_events.cleared
 order by events_events.dateTime DESC LIMIT 100
 
 Limit  (cost=0.00..125.03 rows=100 width=16) (actual
 time=0.046..3897.094 rows=77 loops=1)
   ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual
 time=0.042..3896.881 rows=77 loops=1)
         ->  Index Scan Backward using events_events_datetime_ind on
 events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual
 time=0.025..720.345 rows=360637 loops=1)
               Filter: (NOT cleared)
         ->  Index Scan using events_event_types_pkey on
 events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual
 time=0.003..0.003 rows=0 loops=360637)
               Index Cond: (events_event_types.id = events_events.eventtype_id)
               Filter: (events_event_types.severity = 70)
 Total runtime: 3897.268 ms

The plan here guesses that we will find the 100 rows we want pretty
quickly by scanning the dateTime index, since it isn't expecting to have
to look through many rows to find 100 that match the criteria.  With no
cross-column statistics it's more a guess than a good calculation.  The
guess turns out to be bad, and we end up scanning 360k rows from the index
before we find what we want.  My skills are not up to giving specific
advice on how to avert this problem.  Maybe somebody else can help there.
 explain analyze select events_events.id FROM
 events_events
 left join events_event_types on
 events_events.eventType_id=events_event_types.id
 where events_event_types.severity=70
 and not events_events.cleared
 order by events_events.dateTime DESC
 
 Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual
 time=1084.842..1084.951 rows=77 loops=1)
   Sort Key: events_events.datetime
   Sort Method:  quicksort  Memory: 20kB
   ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual
 time=1080.789..1084.696 rows=77 loops=1)
         Hash Cond: (events_events.eventtype_id = events_event_types.id)
         ->  Seq Scan on events_events  (cost=0.00..9968.06
 rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)
               Filter: (NOT cleared)
         ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual
 time=0.077..0.077 rows=16 loops=1)
               ->  Seq Scan on events_event_types  (cost=0.00..1.89
 rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)
                     Filter: (severity = 70)
 Total runtime: 1085.145 ms

 Any suggestions?
This plan is faster as you avoid the index scan.  The planner is
preferring to do a tablescan to find what it needs.  This is much faster
than the 360k random I/O index lookups.  You can force this type of plan
with a subquery and the OFFSET 0 trick, but I'm not sure it's the best
solution.

eg

explain analyze SELECT * FROM
(SELECT events_events.id FROM events_events
 LEFT JOIN events_event_types on
events_events.eventType_id=events_event_types.id
WHERE events_event_types.severity=70
 AND not events_events.cleared
ORDER BY events_events.dateTime DESC OFFSET 0) AS a LIMIT 100

Regards

Russell



Re: [PERFORM] limit clause breaks query planner?

2008-09-01 Thread Russell Smith
Pavel Stehule wrote:
 Hello

 2008/9/1 David West [EMAIL PROTECTED]:
   
 Thanks for your suggestion but the result is the same.

 Here is the explain analyse output from different queries.
 Select * from my_table where A is null and B = '21' limit 15

 Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
 time=85837.043..85896.140 rows=15 loops=1)
   ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)
 Filter: ((A IS NULL) AND ((B)::text = '21'::text))
 Total runtime: 85896.214 ms

 
[snip]

Further to Pavel's comments;

(actual time=85837.038..85896.091 rows=15 loops=1)

That's 85 seconds on a sequence scan to return the first tuple.  The table is 
not bloated by any chance is it?
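
A rough way to check (just a suggestion, not something from the original
post) is to compare the planner's idea of the table size with the live row
count, and to look at what vacuum reports:

SELECT relpages, reltuples FROM pg_class WHERE relname = 'my_table';
VACUUM VERBOSE my_table;

If relpages is huge relative to reltuples, or vacuum reports a large number
of dead row versions, those 85 seconds are being spent wading through dead
space.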

Regards

Russell






Re: [PERFORM] Define all IP's in the world in pg_hba.conf

2008-07-03 Thread Russell Smith
idc danny wrote:
 Hi everybody,

 I know that this group deals with performance but is the only one on which 
 I'm subscribed, so my apologize in advance for the question.

 I want to allow everybody in the world, all IP's, to connect to my server. 
 How do I accomplish that? Definitely, it's not a good solution to enter all 
 them manually in pg_hba.conf :).
   
what's wrong with 0.0.0.0/0 ?
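
For example, a single pg_hba.conf line along these lines (pick whatever
authentication method you actually want) covers every IPv4 address:

host    all    all    0.0.0.0/0    md5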
 Currently, if above question cannot be answered, I want to achieve to allow 
 the IP's of Hamachi network, which all are of the form 5.*.*.* - but in the 
 future it can expand to all IP's.

 Thank you,
 Danny


  


   

   




Re: [PERFORM] Heavy write activity on first vacuum of fresh TOAST data

2007-12-13 Thread Russell Smith

Simon Riggs wrote:

On Thu, 2007-12-13 at 15:19 -0600, Kevin Grittner wrote:
  

What impact would lack of the hint bits have until a vacuum?



Vacuum isn't important here. Its the first idiot to read the data that
gets hit.

  
Given vacuum must then touch every page, is there a win in only setting 
hint bits on pages where vacuum has to do some other work on the page, 
since vacuum is causing significant IO load for data that may not be 
accessed for some time?


The question becomes: what is the impact of not setting hint bits?  Is it 
better or worse than the IO caused by vacuum?


Regards

Russell Smith



Re: [PERFORM] TB-sized databases

2007-11-29 Thread Russell Smith

Simon Riggs wrote:

On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote:
  

Simon Riggs wrote:


All of those responses have cooked up quite a few topics into one. Large
databases might mean text warehouses, XML message stores, relational
archives and fact-based business data warehouses.

The main thing is that TB-sized databases are performance critical. So
it all depends upon your workload really as to how well PostgreSQL, or
another other RDBMS vendor can handle them.


Anyway, my reason for replying to this thread is that I'm planning
changes for PostgreSQL 8.4+ that will make allow us to get bigger and
faster databases. If anybody has specific concerns then I'd like to hear
them so I can consider those things in the planning stages
  
it would be nice to do something with selects so we can recover a rowset 
on huge tables using a criteria with indexes without falling back to running 
a full scan.


In my opinion, by definition, a huge database sooner or later will have 
tables far bigger than RAM available (same for their indexes). I think 
the queries need to be solved using indexes smart enough to be fast on disk.



OK, I agree with this one. 


I'd thought that index-only plans were only for OLTP, but now I see they
can also make a big difference with DW queries. So I'm very interested
in this area now.

  
If that's true, then you want to get behind the work Gokulakannan 
Somasundaram 
(http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has 
done with relation to thick indexes.  I would have thought that concept 
particularly useful in DW.  Only having to scan indexes on a number of 
join tables would be a huge win for some of these types of queries.


My tiny point of view would say that is a much better investment than 
setting up the proposed parameter.  I can see the use of the parameter 
though.  Most of the complaints about indexes carrying visibility are about 
update/delete contention.  I would expect in a DW that those things 
aren't in the critical path like they are in many other applications.  
Especially with partitioning, where previous partitions don't get many 
updates, I would think there could be great benefit.  I would think that 
many of Pablo's requests up-thread would get significant performance 
benefit from this type of index.  But as I mentioned at the start, 
that's my tiny point of view and I certainly don't have the resources to 
direct what gets looked at for PostgreSQL.


Regards

Russell Smith




Re: [PERFORM] Join performance

2007-11-09 Thread Russell Smith

Pepe Barbe wrote:

Hello,

I am having an issue on PostgreSQL 8.0.12. In the past we had 
performance issues with the query planner for queries on some tables 
where we knew we had indexes and it was doing a sequential scan, and for 
this reason we issue SET enable_seqscan = FALSE for some queries.


Recently we have stumbled upon one of these kind of queries that is 
giving terrible performance, because seqscan is disabled. I've reduced 
the problem to a a command like this one:


SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER JOIN 
temp_busy_hr USING(start_time,bsc_id,sect_id);


Where temp_busy_hr is a temporary table.


Have you tried analyzing the temp_busy_hr table?
Possibly adding an index to the temp table can help if you are doing lots of 
queries.
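
For example (the column list for the index is only a guess based on the
join above):

ANALYZE temp_busy_hr;
CREATE INDEX temp_busy_hr_idx ON temp_busy_hr (start_time, bsc_id, sect_id);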



If the previous is issued with seqscan TRUE, it runs within reasonable 
time, else it runs for ever. The query plan for the previous query with 
enable_seqscan = TRUE:


It would be worth knowing how far the estimates are out.  Also, have you tried 
altering the statistics target
for relevant columns to increase the accuracy?
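
Something along these lines, where the column choice and the target value are
only illustrative:

ALTER TABLE gsm_amr_metrics ALTER COLUMN start_time SET STATISTICS 100;
ANALYZE gsm_amr_metrics;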



QUERY PLAN
 


Limit  (cost=0.00..384555.98 rows=1 width=3092)
  -  Nested Loop  (cost=0.00..384555.98 rows=1 width=3092)
Join Filter: ((inner.bsc_id = outer.bsc_id) AND (inner.site_id = outer.site_id) AND 
(inner.sect_id = outer.sect_id))
-  Nested Loop  (cost=0.00..368645.64 rows=28 width=1192)
  Join Filter: ((outer.sect_id = inner.sect_id) AND (outer.bsc_id = 
inner.bsc_id))
  -  Seq Scan on temp_busy_hr  (cost=0.00..24.00 rows=1400 
width=24)
  -  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  
(cost=0.00..226.66 rows=2094 width=1168)
Index Cond: (outer.start_time = 
gsm_amr_metrics.start_time)
-  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1 
 (cost=0.00..528.77 rows=1973 width=1936)
  Index Cond: (t1.start_time = outer.start_time)
(10 rows)

and the plan for enable_seqscan = FALSE:

QUERY PLAN
 


Limit  (cost=10097.16.. 100720844.01 rows=1 width=3092)
  -  Nested Loop  (cost=10097.16..100720844.01 rows=1 width=3092)
Join Filter: ((inner.bsc_id = outer.bsc_id) AND (inner.site_id = outer.site_id) AND 
(inner.sect_id = outer.sect_id))
-  Merge Join  (cost=10097.16..100704933.67 rows=28 width=1192)
  Merge Cond: (outer.start_time = inner.start_time)
  Join Filter: ((inner.sect_id = outer.sect_id) AND (inner.bsc_id = 
outer.bsc_id))
  -  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  
(cost=0.00..631211.45 rows=6005551 width=1168)
  -  Sort  (cost=10097.16..10100.66 rows=1400 width=24)
Sort Key: temp_busy_hr.start_time
-  Seq Scan on temp_busy_hr  
(cost=1.00..10024.00 rows=1400 width=24)
-  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1 
 (cost=0.00..528.77 rows=1973 width=1936)
  Index Cond: (t1.start_time = outer.start_time)
(12 rows)

Any ideas what could I try to fix this problem?

Thanks,
Pepe







Re: [PERFORM] Optimising in queries

2007-08-23 Thread Russell Smith

Michael Glaesemann wrote:


On Aug 22, 2007, at 5:58 , Russell Smith wrote:


Stephen Davies wrote:
select count(rdate),rdate from reading where sensor_id in 
(1137,1138,1139,1140) group by rdate order by rdate desc limit 1;



It would have been helpful to see the table definition here.  I can 
say up front that array processing in postgres is SLOW.


Um, what array processing are you seeing here? IN (a, b, b) is not an 
array construct.


Filter: (sensor_id = ANY ('{1137,1138,1139,1140}'::integer[]))

I've never seen this plan item except for when arrays are involved.  I 
could be wrong.  I'd like to know how this is generated when you don't 
have an array.


Regards

Russell Smith



Re: [PERFORM] Poor Performance after Upgrade

2007-08-22 Thread Russell Smith

Ben Perrault wrote:

Hi,

I recently inherited a very old (PostgreSQL 7.0.3) database, and have 
migrated it to 8.2.4 but have run into a performance issue.


Basically, I did a dump and import into the new database, vacuumed and 
created fresh indexes and everything is work great except the 
following type of query (and similar):


 SELECT tsr.stepId, tsr.testType, tsr.problemReportId, tsr.excpt, 
tcr.caseId

FROM   TestCaseRun tcr, TestStepRun tsr
WHERE  tcr.parentSN = 194813
AND    (tsr.testType <> ''
OR     tsr.problemReportId <> ''
OR     tsr.excpt <> '')
AND    tsr.parentSN = tcr.recordSN
This query does not match the plans listed below.  It will not 
result in a sort/unique unless tcr or tsr are views.


Can we also see explain analyze instead of just explain?  It's much more 
helpful to see what's actually going on, especially since the row 
estimates are quite different in the two plans.

You also mentioned above that you vacuumed; did you analyze as well?  
Plain vacuum doesn't do an analyze in 8.2.4.  You have to say vacuum analyze, 
or just analyze.


What used to take 250ms or so on the old database now takes between 55 
and 60 Seconds.


On the old database, the query plan looks like this:

Unique  (cost=13074.30..13078.36 rows=32 width=68)
  -  Sort  (cost=13074.30..13074.30 rows=324 width=68)
-  Nested Loop  (cost=0.00..13060.77 rows=324 width=68)
  -  Index Scan using parentsn_tcr_indx on testcaserun 
tcr (cost=0.00..444.83 rows=111 width=16)
  -  Index Scan using parentsn_tsr_indx on teststeprun 
tsr (cost=0.00..113.42 rows=27 width=52)


And on the new database it looks like this:

 Unique  (cost=206559152.10..206559157.14 rows=336 width=137)
   ->  Sort  (cost=206559152.10..206559152.94 rows=336 width=137)
         Sort Key: tsr.stepid, tsr.testtype, tsr.problemreportid, 
tsr.excpt, tcr.caseid
         ->  Nested Loop  (cost=100000000.00..106559138.00 rows=336 
width=137)
               ->  Index Scan using parentsn_tcr_indx on testcaserun 
tcr (cost=0.00..17.00 rows=115 width=11)
                     Index Cond: (parentsn = 186726)
               ->  Index Scan using parentsn_tsr_indx on teststeprun 
tsr (cost=0.00..56089.00 rows=75747 width=134)
                     Index Cond: (tsr.parentsn = tcr.recordsn)
                     Filter: ((testtype <> ''::text) OR 
((problemreportid)::text <> ''::text) OR (excpt <> ''::text))

(9 rows)

I'm fairly familiar with PostgreSQL, but I have no idea where to start 
in trying to trouble shoot this huge performance discrepancy. The 
hardware and OS are the same.


And the data size is exactly the same between the two, and the total 
data size is about 7.5GB, with the largest table (teststeprun 
mentioned above) being about 15 million rows.


Any pointers to where to start troubleshooting this or how to change 
the query to work better would be appreciated.
Look at row estimates vs reality.  They should be pretty close in the 
new version.
Why are the costs so high in the new plan?  100000000 happens to be a 
nice number that gets added to the cost when you attempt to turn off a 
certain type of plan.

EXPLAIN ANALYZE (query) is your friend.
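
A quick way to see whether one of the planner toggles has been left off in
the new cluster (only a guess at the cause) is:

SELECT name, setting FROM pg_settings WHERE name LIKE 'enable_%';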

Regards

Russell




Re: [PERFORM] Optimising in queries

2007-08-22 Thread Russell Smith

Stephen Davies wrote:

I have a PostgreSQL 8.2.4 table with some seven million rows.

The psql query:

select count(rdate),rdate from reading where sensor_id in 
(1137,1138,1139) group by rdate order by rdate desc limit 1;


takes a few seconds but:

select count(rdate),rdate from reading where sensor_id in 
(1137,1138,1139,1140) group by rdate order by rdate desc limit 1;


  
It would have been helpful to see the table definition here.  I can say 
up front that array processing in postgres is SLOW.


(anything with four or more values in the in list) takes several 
minutes.


Is there any way to make the larger queries more efficient?

Both rdate and sensor_id are indexed and the database is vacuumed every 
night.


The values in the in list are seldom as neat as in the above 
examples. Actual values can range from 1 to about 2000. The number of 
values ranges from 2 to about 10.


Explain outputs are:

benparts=# explain select count(rdate),rdate from reading where 
sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc 
limit 1;

QUERY PLAN
---
 Limit  (cost=0.00..39890.96 rows=1 width=8)
   -  GroupAggregate  (cost=0.00..7938300.21 rows=199 width=8)
 -  Index Scan Backward using date on reading  
(cost=0.00..7937884.59 rows=82625 width=8)
   Filter: (sensor_id = ANY 
('{1137,1138,1139,1140}'::integer[]))

(4 rows)
  
I'm unsure of how you produced a plan like this without the benefit of 
seeing the table definition.
benparts=# explain select count(rdate),rdate from reading where 
sensor_id in (1137,1138,1139) group by rdate order by rdate desc limit 
1;

 QUERY PLAN
-
 Limit  (cost=48364.32..48364.32 rows=1 width=8)
   -  Sort  (cost=48364.32..48364.49 rows=69 width=8)
 Sort Key: rdate
 -  HashAggregate  (cost=48361.35..48362.21 rows=69 width=8)
   -  Bitmap Heap Scan on reading  (cost=535.53..48218.10 
rows=28650 width=8)
 Recheck Cond: (sensor_id = ANY 
('{1137,1138,1139}'::integer[]))
 -  Bitmap Index Scan on reading_sensor  
(cost=0.00..528.37 rows=28650 width=0)
   Index Cond: (sensor_id = ANY 
('{1137,1138,1139}'::integer[]))

(8 rows)


  

As mentioned already, you need explain analyze.

However I again will say that array processing in postgres is SLOW.  I 
would strongly recommend redesigning your approach to use a table of 
sensor_id's that you join to the reading table, instead of the in-list.
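
A rough sketch of that idea on 8.2, using a VALUES list in place of a
separate table, so the wanted ids are joined rather than passed as an
in-list:

SELECT count(rdate), rdate
FROM reading r
JOIN (VALUES (1137), (1138), (1139), (1140)) AS s(sensor_id)
  ON r.sensor_id = s.sensor_id
GROUP BY rdate
ORDER BY rdate DESC
LIMIT 1;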


Rethinking the way you are going about this will probably be the most 
effective solution, but we will need more information if you are not 
comfortable doing that yourself.


Regards

Russell Smith




Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-27 Thread Russell Smith

Guido Neitzer wrote:

On 27.01.2007, at 00:35, Russell Smith wrote:

Guess 1 would be that your primary key is int8, but can't be certain 
that is what's causing the problem.


Why could that be a problem?
Before 8.0, the planner would not choose an index scan if the types were 
different: with int8_col = 4, the constant 4 is treated as int4, int8 != int4, 
and so the planner will not choose an index scan.
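
For example (foo and int8_col are just placeholder names), either of these
lets the pre-8.0 planner consider the index:

select * from foo where int8_col = 4::int8;
select * from foo where int8_col = '4';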


Regards

Russell Smith


cug









Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Russell Smith

Carlos Moreno wrote:


Hi,

I find various references in the list to this issue of queries
being too slow because the planner miscalculates things and
decides to go for a sequenctial scan when an index is available
and would lead to better performance.

Is this still an issue with the latest version?   I'm doing some
tests right now, but I have version 7.4  (and not sure when I will
be able to spend the effort to move our system to 8.2).

When I force it via  set enable_seqscan to off, the index scan
takes about 0.1 msec  (as reported by explain analyze), whereas
with the default, it chooses a seq. scan, for a total execution
time around 10 msec!!  (yes: 100 times slower!).  The table has
20 thousand records, and the WHERE part of the query uses one
field that is part of the primary key  (as in, the primary key
is the combination of field1,field2, and the query involves a
where field1=1 and some_other_field=2).  I don't think I'm doing
something wrong, and I find no reason not to expect the query
planner to choose an index scan.

For the time being, I'm using an explicit enable_seqscan off
in the client code, before executing the select.  But I wonder:
Is this still an issue, or has it been solved in the latest
version?
Please supply explain analyze for the query with both the index scan and 
the sequential scan.  We may be able to tell you why it's choosing 
the wrong option.  Guess 1 would be that your primary key is int8, but I 
can't be certain that is what's causing the problem.


Regards

Russell Smith


Thanks,

Carlos





Re: [PERFORM] unusual performance for vac following 8.2 upgrade

2007-01-11 Thread Russell Smith

Kim wrote:

snip

OS: Solaris 10
write transactions/hr: 1.5 million
size of pg_class: 535,226
number of relations: 108,694

That is a huge pg_class.  I remember some discussion recently about 
problems with 8.2 and the way it scans pg_class.  I also believe it's 
fixed in 8.2.1.  Are you running that?  If not, I suggest you upgrade 
and see if the fault still exists.


Regards

Russell Smith

Thanks to all,

Kim








Re: [PERFORM] Improving SQL performance

2007-01-11 Thread Russell Smith

Carlos H. Reimer wrote:

Hi,
 
I know that the problem with the following SQL is the LOG.CODCEP = 
ENDE.CODCEP||CODLOG condition, but what can I

do to improve the performance?
 
I wouldn't say it's the join condition.  There is a nested loop join on 
500k+ rows.

Is it possible to put an index on LOG.CODCEP?

That might give you a better plan, as you only have 1 row on the left side of 
the join, so an index scan would be preferable.
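
For example (the index name is arbitrary):

CREATE INDEX tt_log_codcep_idx ON tt_log (codcep);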


Regards

Russell Smith
Is there a type of index that could help or is there another way to 
build this SQL?
 
Thank you in advance!
 
explain analyze

SELECT ENDE.* , DEND.DESEND, DEND.USOEND, DEND.DUPEND,
   to_char('F') as NOVO,
   LOG.TIPLOG
  FROM TT_END ENDE LEFT OUTER JOIN TD_END DEND ON DEND.CODTAB 
= ENDE.TIPEND
   LEFT OUTER JOIN TT_LOG LOG ON LOG.CODCEP = 
ENDE.CODCEP||CODLOG

 WHERE ENDE.FILCLI = '001'
   AND ENDE.CODCLI = ' 19475';
 

QUERY 
PLAN  
--
 Nested Loop Left Join  (cost=0.00..25366.84 rows=1259 width=417) 
(actual time=1901.499..1901.529 rows=1 loops=1)
   Join Filter: ((inner.codcep)::text = ((outer.codcep)::text || 
(outer.codlog)::text))
   -  Nested Loop Left Join  (cost=0.00..4.91 rows=1 width=412) 
(actual time=0.117..0.144 rows=1 loops=1)

 Join Filter: (inner.codtab = outer.tipend)
 -  Index Scan using pk_end on tt_end ende  (cost=0.00..3.87 
rows=1 width=388) (actual time=0.066..0.078 rows=1 loops=1)
   Index Cond: ((filcli = '001'::bpchar) AND (codcli = ' 
19475'::bpchar))
 -  Seq Scan on td_end dend  (cost=0.00..1.02 rows=2 
width=33) (actual time=0.012..0.018 rows=2 loops=1)
   -  Seq Scan on tt_log log  (cost=0.00..12254.24 rows=582424 
width=17) (actual time=0.013..582.521 rows=582424 loops=1)

 Total runtime: 1901.769 ms
(9 rows)
 
\d tt_log

Table TOTALL.tt_log
 Column |  Type  | Modifiers
++---
 codbai | numeric(5,0)   | not null
 nomlog | character varying(55)  | not null
 codcep | character(8)   | not null
 
\d tt_end

  Table TOTALL.tt_end
 Column | Type  |Modifiers
+---+-
...
...
...
 codlog | character(3)  |
...
...
...
 codcep | character(5)  |
...
...

Reimer

 





Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Russell Smith

Sebastián Baioni wrote:

Thanks for answering.
This is my configuration:
# - Memory -

shared_buffers = 1000# min 16, at least max_connections*2, 8KB 
each

#work_mem = 1024# min 64, size in KB
#maintenance_work_mem = 16384# min 1024, size in KB
#max_stack_depth = 2048# min 100, size in KB

The PC where we are runing PostgreSQL server is:
AMD Athlon(tm) 64 Processor
3000+
1.79 GHz, 1.93 GB RAM
with WindowsXP Proffesional, Version 2002 Service Pack 2.

How should we set it?
Shared buffers even on a workstation should be higher than 1000 if you 
want some performance.  It depends how much memory you have spare to use 
for PostgreSQL.  But something like

shared_buffers = 20000
maintenance_work_mem = 256000

will certainly give you a performance boost.  You will have to adjust 
those figures based on whatever else you are doing on the machine.


Russell Smith.


Thanks a lot!
 Sebastián

*/Tom Lane [EMAIL PROTECTED]/* escribió:

Rod Taylor writes:
 Rebuilding the indexes or integrity confirmations are probably
taking
 most of the time.

 What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually. I concur that
bumping it up might help.

regards, tom lane







Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Russell Smith

AMIR FRANCO D. JOVEN wrote:

Hi!

Im new to PostgreSQL.

My current project uses PostgreSQL 7.3.4.
Upgrading your version of PostgreSQL to 8.1 will give you significant 
performance benefits.


the problem is like this:

I have a table with 94 fields and a select with only one resultset in 
only one client consumes about 0.86 seconds.
The client executes three 'select' statements to perform the task 
which consumes 2.58 seconds.
With only one client this is acceptable, but the real problem is as i 
add more clients, it goes more and more slower.


for a single select with one field in one resultset, is 0.86 seconds 
normal?

You will need to attach the query.
EXPLAIN ANALYZE SELECT ...

where SELECT ... is your query.  That will help us work out what the 
problem is. 

0.86 seconds might be slow for a query that returns 1 row, but it might be 
fast for a query that returns a large set with complex joins and where 
conditions.  Fast and slow are not objective terms; they are very 
dependent on the query.




I tried vacuuming and reindexing but to no avail.
the total record count in that particular table is 456,541.

456,541 is not all that many records.  But again you will need to post 
more information for us to be able to assist.

Thanks in advance.






Re: [PERFORM] Postgres server crash

2006-11-15 Thread Russell Smith

Craig A. James wrote:
For the third time today, our server has crashed, or frozen, actually 
something in between.  Normally there are about 30-50 connections 
because of mod_perl processes that keep connections open.  After the 
crash, there are three processes remaining:


# ps -ef | grep postgres
postgres 23832 1  0 Nov11 pts/100:02:53 
/usr/local/pgsql/bin/postmaster -D /postgres/main
postgres  1200 23832 20 14:28 pts/100:58:14 postgres: pubchem 
pubchem 66.226.76.106(58882) SELECT
postgres  4190 23832 25 14:33 pts/101:09:12 postgres: asinex 
asinex 66.226.76.106(56298) SELECT


But they're not doing anything: No CPU time consumed, no I/O going on, 
no progress.  If I try to connect with psql(1), it says:


  psql: FATAL:  the database system is in recovery mode

And the server log has:

LOG:  background writer process (PID 23874) was terminated by signal 9
LOG:  terminating any other active server processes
LOG:  statistics collector process (PID 23875) was terminated by signal 9
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process 
exited ab

normally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process 
exited ab

... repeats about 50 times, one per process.

Questions:
 1. Any idea what happened and how I can avoid this?  It's a *big* 
problem.

 2. Why didn't the database recover?  Why are there two processes
that couldn't be killed?
 3. Where did the signal 9 come from?  (Nobody but me ever logs
in to the server machine.)

I would guess it's the Linux OOM killer, if you are running Linux.  You need to 
stop the kernel from killing processes when it runs out of memory.  Are you 
getting close to running out of memory?
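
On Linux the usual way to do that (a general suggestion, not something
verified against this particular box) is to disable memory overcommit so
the OOM killer never has to pick a victim:

sysctl -w vm.overcommit_memory=2

and make it permanent with vm.overcommit_memory = 2 in /etc/sysctl.conf.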



Help!

Thanks,
Craig









Re: [PERFORM] View not using index

2005-06-13 Thread Russell Smith
On Mon, 13 Jun 2005 04:54 pm, Yves Vindevogel wrote:
 Still, when I use explain, pg says it will first sort my tables instead 
 of using my index
 How is that possible ?

Can we see the output of the explain analyze?
The definition of the view?

Regards

Russell Smith



Re: [PERFORM] View not using index

2005-06-13 Thread Russell Smith
Please CC the list.

On Mon, 13 Jun 2005 05:11 pm, Yves Vindevogel wrote:
 create or replace view vw_document_pagesperjob as
  select documentname, eventdate, eventtime, loginuser,
   fnFormatInt(pages) as pages
  from tblPrintjobs
  order by descpages, documentname ;
 
 rvponp=# explain select documentname, eventdate, eventtime, loginuser,  
 pages from tblPrintjobs order
 by descpages, documentname ;
   QUERY PLAN
  
 
   Sort  (cost=81326.07..82796.59 rows=588209 width=74)
 Sort Key: descpages, documentname
 -  Seq Scan on tblprintjobs  (cost=0.00..24958.09 rows=588209  
 width=74)
 (3 rows)
 
Postgresql must scan the entire heap anyway, so ordering in memory will be 
faster,
and you don't have to load the pages from disk in a random order.

 rvponp=# explain select documentname, eventdate, eventtime, loginuser,  
 pages from tblPrintjobs order
 by descpages, documentname limit 10 ;
   QUERY PLAN
  
 -
   Limit  (cost=0.00..33.14 rows=10 width=74)
 -  Index Scan using ixprintjobspagesperjob on tblprintjobs   
 (cost=0.00..1949116.68 rows=588209 width=74)
 (2 rows)
 
That's because an index scan is only useful if you are scanning a small
percentage of the table, which is what you are doing when you have the limit clause.

 Strange thing is, when I immediately add the limit clause, it runs like  
 I want it to run.

I am not sure of the usefulness of the first query anyway, as it returns a lot of 
data.
How do you expect it not to scan the whole table when you want all the data from
the table?


 Problem is that I run this from Cocoon.  Cocoon adds the limit clause  
 itself.
 Maybe I need to rewrite everything in functions instead of views.
 
Functions or views: it will make no difference.  The issue is the amount of 
data returned
relative to the amount of data in the table.

Regards

Russell Smith



Re: [PERFORM] Performance nightmare with dspam (urgent)

2005-06-07 Thread Russell Smith
On Thu, 2 Jun 2005 06:19 am, Casey Allen Shobe wrote:
 I found this response to my original post, and tried every single suggestion 
 in it, which has not helped:
 
 http://archives.postgresql.org/pgsql-performance/2004-11/msg00416.php
 
 I'm sorry to come begging for help, but this is a MAJOR problem with no 
 logical explanation, and is almost certainly the fault of PostgreSQL, because 
 the database and contents have been identical across all the hosts, and some 
 work beautifully with no tuning whatsoever; so I don't feel I'm wrong in 
 placing blame...

I would personally strongly suggest turning on logging
on the PG server for about an hour, sifting through the runtimes for the 
queries and
finding which ones are taking all the time.  I'd then run explain analyze and 
see what
is happening.  I have heard you can get much better performance by rewriting 
some of
the dspam queries to use PG features.  But I've never used dspam, so I can't 
verify that.

But a quick look through the dspam pg driver source...

/* Declare Cursor */
#ifdef VIRTUAL_USERS
strcpy (query, "DECLARE dscursor CURSOR FOR SELECT DISTINCT username FROM 
dspam_virtual_uids");
#else
strcpy (query, "DECLARE dscursor CURSOR FOR SELECT DISTINCT uid FROM 
dspam_stats");
#endif

If that's run often, it probably won't give the best performance, but that's a 
guess.
Again I'd suggest turning up the logging.


 
 All machines run Gentoo Linux.  All have the same package versions.  Disk I/O 
 doesn't seem to be related - the 733MHz server had a 33MB/s IDE drive, the 
 2.4GHz server had a RAID 5 with 3 ultra320 drives:  neither of those required 
 any tuning.  The new 3.0GHz has a mirror raid with 2 ultra320 drives, and the 
 3000+ that tuning fixed had an ultra160 disk not in a RAID.
 
 I really like PostgreSQL, and really don't want to use MySQL for dspam, but 
 if 
 I can't get this worked out ASAP I'm going to have to change for the sake of 
 our customers.  Any help is GREATLY appreciated!
Again I'd suggest turning up the logging.


 
 I'm online on instant messengers (contact IDs shown below), monitoring my 
 email, and will be on #postgresql on Freenode.
 
 Cheers,



Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
 Sorry to bother everyone with yet another my query isn't using an 
 index problem but I am over my head on this one..  I am open to ways 
 of restructuring this query to perform better.
 
 I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has 
 been just run on the table.
 
 This is the result of:
 
 explain analyze
 select distinct
  em.incidentid,
  ea.recordtext as retdata,
  eg.long,
  eg.lat
 from
  ea, em, eg
 where
  em.incidentid = ea.incidentid and
  em.incidentid = eg.incidentid and
  em.entrydate >= '2005-1-1 00:00' and
  em.entrydate <= '2005-5-9 00:00'
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%RED%'
  )
 
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%CORVETTE%'
  )
  and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
 order by em.entrydate
 
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning the whole
index.  So it's much cheaper to do a sequence scan.

One possible way to make the query faster is to limit based on date, as you 
will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans.  I think 
this query does what you do 
above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,  eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 
00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND 
ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or 
recordtext like '%CORVETTE%'  );

 
 -
 ANALYZE RESULTS
 -
 
   Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual 
 time=446787.056..446787.342 rows=72 loops=1)
 -  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual 
 time=446787.053..446787.075 rows=72 loops=1)
   Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
   -  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) 
 (actual time=444378.655..446786.746 rows=72 loops=1)
 -  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) 
 (actual time=444378.532..446768.381 rows=72 loops=1)
   -  Nested Loop IN Join  (cost=771835.10..774678.88 
 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
 -  Nested Loop  (cost=771835.10..774572.05 
 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
   -  HashAggregate  
 (cost=771835.10..771835.10 rows=1 width=17) (actual 
 time=444366.702..444368.583 rows=473 loops=1)
 -  Seq Scan on ea  
 (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 
 rows=592 loops=1)
   Filter: ((recordtext)::text ~~ 
 '%CORVETTE%'::text)
   -  Index Scan using ea1 on ea  
 (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 
 loops=473)
 Index Cond: ((ea.incidentid)::text = 
 (outer.incidentid)::text)
 Filter: (((recordtext)::text ~~ 
 '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
 -  Index Scan using ea1 on ea  
 (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 
 loops=1011)
   Index Cond: ((outer.incidentid)::text = 
 (ea.incidentid)::text)
   Filter: ((recordtext)::text ~~ 
 '%RED%'::text)
   -  Index Scan using em_incidentid_idx on em  
 (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
 Index Cond: ((outer.incidentid)::text = 
 (em.incidentid)::text)
  Filter: ((entrydate >= '2005-01-01 
  00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 
  00:00:00'::timestamp without time zone))
 -  Index Scan using eg_incidentid_idx on eg  
 (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
   Index Cond: ((outer.incidentid)::text = 
 (eg.incidentid)::text)
   Total runtime: 446871.880 ms
 (22 rows)
 
 
 -
 EXPLANATION
 -
 The reason for the redundant LIKE clause is that first, I only want 
 those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT, 
 those two words may exist across multiple records with the same 
 incidentid.  Then, I only want to actually work with the rows that 
 contain one of the words.  This query will repeat the same logic for 
 however many keywords are entered 

Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
 
 On May 8, 2005, at 6:51 PM, Russell Smith wrote:
 
[snip]
  select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
  eg.lat
  FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
  '2005-1-1 00:00'
  AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
  AND ea.recordtext like '%CORVETTE%')
  JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
  '%RED%' or recordtext like '%CORVETTE%'  );
 
 
 I have run this, and while it is very fast, I'm concerned it's not  
 doing what I need.
How fast is very fast?


 Here's the situation: 
 
 Due to the format of the systems with which I integrate ( I have no  
 control over these formats ), we will get these 'recordtext' values one  
 line at a time, accumulating over time.  The only way I can find to  
 make this work is to insert a new record for each line.  The problem  
 is, that when someone wants to search multiple keywords, they expect  
 these words to be matched across multiple records with a given incident  
 number.
 
   For a very simple example:
 
 IncidentID  Date              Recordtext
 ----------  ----------------  ------------------------------
 1           2005-05-01 14:21  blah blah blah RED blah blah
             2005-05-01 14:23  not what we are looking for
 1           2005-05-02 02:05  blah CORVETTE blah blah
 
 So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
 '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
 condition will only be applied to a single row of recordtext at a time,  
 not a whole group with the same incident number.
 
 If I were to use tsearch2 for full-text indexing, would I need to  
 create another table that merges all of my recordtext rows into a  
 single 'text' field type?  If so, this is where I run into problems, as  
 my logic also needs to match multiple words in their original order.  I  
 may also receive additional updates to the previous data.  In that  
 case, I need to replace the original record with the latest version of  
 it.  If I have already concatenated these rows into a single field, the  
 logic to in-line replace only the old text that has changed is very  
 very difficult at best.  So, that's the reason I had to do two  
 subqueries in my example.  Please tell me if I misunderstood your logic  
 and it really will match given my condition above, but it didn't seem  
 like it would.
 
 Thanks again for the quick responses!  This list has been a great  
 resource for me.
 
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%'  OR 
ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid
WHERE em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
AND em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'));

This may be more accurate.  However I would call it VERY NASTY.  Josh's 
solutions may be better.
However much of the data should be in memory once the subplans are done, so it 
may be quite fast.
 

 -Dan
 
 
 
 



Re: [PERFORM] column name is LIMIT

2005-03-13 Thread Russell Smith
On Mon, 14 Mar 2005 06:14 pm, Gourish Singbal wrote:
 Guys,
 
 I am having a problem firing queries on one of the tables which is
 having limit as the column name.
 
 If a run an insert/select/update command on that table i get the below error.
 
 ERROR:  syntax error at or near "limit" at character 71

select "limit" from limit_table WHERE "limit" > 50 LIMIT 2;

You need to quote the field name, and make sure the case is correct.
 
 Any Help would be realyl great to solve the problem.
 
 postgresql 7.4.5 and linux OS
 
You should probably upgrade to 7.4.7

Regards

Russell Smith.



Re: [PERFORM] Index use and slow queries

2005-03-12 Thread Russell Smith
On Sun, 13 Mar 2005 04:40 pm, Tom Pfeifer wrote:
 Hello,
 
 
 My version of Postgresql is 7.4.3. 
 I have a simple table with 2 indexes: 
  Table public.tst 
  Column |    Type |  Modifiers 
 +-+- 
  tst_id | bigint  | default nextval('tst_id_seq'::text) 
  mmd5   | character varying(32)   | not null 
  active | character(1)    | not null 
  lud    | timestamp without time zone | default now() 
 Indexes: 
     tst_idx unique, btree (mmd5, active) 
     tst_tst_id_key unique, btree (tst_id) 
  
 
 
 There are exactly 1,000,000 (one million) rows in the table (tst).  There are 
 no NULLS, empty columns in any row.
 
 
 I get really fast response times when using the following select statement 
 (Less than 1 second). 
 maach=# explain select * from tst where mmd5 = 
 '71e1c18cbc708a0bf28fe106e03256c7' and active = 'A'; 
   QUERY PLAN 
 --
  
  Index Scan using tst_idx on tst  (cost=0.00..6.02 rows=1 width=57) 
    Index Cond: (((mmd5)::text = '71e1c18cbc708a0bf28fe106e03256c7'::text) AND 
 (active = 'A'::bpchar)) 
 (2 rows) 
  
 
 
 I get really slow repoonse times when using the following select statement 
 (About 20 seconds). 
 maach=# explain select * from tst where tst_id = 639246; 

Before 8.0, bigint would not use an index unless you cast it, or quote it.

eg
explain select * from tst where tst_id = 639246::int8; 
explain select * from tst where tst_id = '639246'; 

Hope this helps.

Russell Smith



Re: [PERFORM] How to boost performance of ilike queries ?

2005-01-25 Thread Russell Smith
On Tue, 25 Jan 2005 07:23 pm, Antony Paul wrote:
 Creating an index and using lower(column) does not change the explain
 plan estimates.
 It seems that it is not using index for like or ilike queries
 irrespective of whether it have a pattern matching character in it or
 not. (using PostgreSQL 7.3.3)
 
 On googling I found this thread 
 
 http://archives.postgresql.org/pgsql-sql/2004-11/msg00285.php
 
 It says that index is not used if the search string begins with a % symbol.

What exactly is the type of like queries you are doing?  There is a solution
for having the % at the start, but you can't win every way.

 
 rgds
 Antony Paul
 
 On Mon, 24 Jan 2005 20:58:54 +1100, Russell Smith [EMAIL PROTECTED] wrote:
  On Mon, 24 Jan 2005 08:18 pm, Antony Paul wrote:
   Hi,
   I have a query which is executed using ilike. The query values are
   received from user and it is executed using PreparedStatement.
   Currently all queries are executed as it is using iilike irrespective
   of whether it have a pattern matching character or not. Can using =
   instead of ilike boot performance ?.  If creating index can help then
   how the index should be created on lower case or uppercase ?.
   
  It depends on the type of queries you are doing.
  
  changing it to something like  lower(column) like lower('text%'), and
  creating an index on lower(column) will give you much better performance.
  
  If you have % in the middle of the query, it will still be slow, but I 
  assume that is not
  the general case.
  
  I am not sure what the effect of it being prepared will be, however I've 
  had much success
  with the method above without the queries being prepared.  Others may be 
  able to offer advice
  about if prepare will effect it.
  
  Regards
  
  Russell Smith
 
 
 



Re: [PERFORM] How to boost performance of ilike queries ?

2005-01-24 Thread Russell Smith
On Mon, 24 Jan 2005 08:18 pm, Antony Paul wrote:
 Hi,
 I have a query which is executed using ilike. The query values are
 received from user and it is executed using PreparedStatement.
 Currently all queries are executed as it is using iilike irrespective
 of whether it have a pattern matching character or not. Can using =
 instead of ilike boot performance ?.  If creating index can help then
 how the index should be created on lower case or uppercase ?.
 
It depends on the type of queries you are doing.

changing it to something like  lower(column) like lower('text%'), and
creating an index on lower(column) will give you much better performance.
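
For example (my_table and name are placeholder identifiers):

CREATE INDEX my_table_name_lower_idx ON my_table (lower(name));
SELECT * FROM my_table WHERE lower(name) LIKE lower('text%');

Whether LIKE can use a plain btree index here also depends on your locale;
in the C locale it works directly.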

If you have % in the middle of the query, it will still be slow, but I assume 
that is not
the general case.

I am not sure what the effect of it being prepared will be, however I've had 
much success
with the method above without the queries being prepared.  Others may be able 
to offer advice
about whether prepare will affect it.

Regards

Russell Smith



Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Russell Smith
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
 On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

[snip]
  Honestly I expected it to be slower (which it was), but I figured it's
  worth seeing what alternate plans it'll generate (specifically to see how
  it cost a nested loop on that join to compare to the fast plan).
  Unfortunately, it generated a merge join, so I think it might require both
  enable_hashjoin=false and enable_mergejoin=false to get it which is likely
  to be even slower in practice but still may be useful to see.
 
 Setting both to false gives a dramatic performance boost.  See 
 http://rafb.net/paste/results/b70KAi42.html
 
 -  Materialize  (cost=15288.70..15316.36 rows=2766 width=35) (actual 
time=0.004..0.596 rows=135 loops=92)
   -  Nested Loop  (cost=0.00..15288.70 rows=2766 width=35) 
(actual time=0.060..9.130 rows=135 loops=1)

The planner here has a quite inaccurate guess at the number of rows that will 
match in the join.  An alternative to 
turning off join types is to raise the statistics target on the Element columns, 
because that's where the join is happening.  Hopefully the planner will
get a better idea, although it may not be able to.  An estimate of 2766 rows 
vs an actual 135 is quite 
likely to lead to a different plan choice.  As you can
see, you have had to turn off two join types to get something you 
wanted/expected.

 This gives suitable speed, but why does the plan vary so much with 
 such a minor change in the WHERE clause?
Plan 1 - broken
   -  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual 
time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
-  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35) (actual 
time=7.638..1158.128 rows=135 loops=1)

The performance difference arises because when the WHERE clause is changed, you
have a totally different set of selection options.
Plan 1 and Plan 2 from your earlier paste show that the row estimate is out by a
factor of 2 for plan 1, but by a factor of 20 for plan 2.  The planner is likely
to make the wrong choice when the stats are out by that much.

Beware: what looks like a small typing change does not mean the queries are
anything alike.

Regards

Russell Smith.



Re: [PERFORM] query plan question

2004-11-16 Thread Russell Smith
On Wed, 17 Nov 2004 02:54 pm, you wrote:
 I have a query for which postgres is generating a different plan on different 
 machines. The database schema is the same, the dataset is the same, the 
 configuration is the same (e.g., pg_autovacuum running in both cases), both 
 systems are Solaris 9. The main difference in the two systems is that one is 
 sparc and the other is intel.
 
 The query runs in about 40 ms on the intel box, but takes about 18 seconds on 
 the sparc box. Now, the intel boxes we have are certainly faster, but I'm 
 curious why the query plan might be different.
 
 For the intel:
 
 QUERY PLAN
 Unique  (cost=11.50..11.52 rows=2 width=131)
   -  Sort  (cost=11.50..11.50 rows=2 width=131)
 Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
 -  Hash Join  (cost=10.42..11.49 rows=2 width=131)
   Hash Cond: (outer.dbid = inner.schema)
   -  Seq Scan on schema s  (cost=0.00..1.02 rows=2 width=128)
   -  Hash  (cost=10.41..10.41 rows=4 width=11)
 -  Nested Loop  (cost=0.00..10.41 rows=4 width=11)
   -  Nested Loop  (cost=0.00..2.14 rows=4 width=4)
 -  Seq Scan on flow fl  (cost=0.00..0.00 
 rows=1 width=4)
   Filter: (servicetype = 646)
 -  Index Scan using usage_flow_i on usage 
 u  (cost=0.00..2.06 rows=6 width=8)
   Index Cond: (u.flow = outer.dbid)
   -  Index Scan using usageparameter_usage_i on 
 usageparameter up  (cost=0.00..2.06 rows=1 width=15)
 Index Cond: (up.usage = outer.dbid)
 Filter: ((prefix)::text  'xsd'::text)
 
 For the sparc:
 
 QUERY PLAN
 Unique  (cost=10.81..10.83 rows=1 width=167)
   -  Sort  (cost=10.81..10.82 rows=1 width=167)
 Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
 -  Nested Loop  (cost=9.75..10.80 rows=1 width=167)
   Join Filter: (outer.flow = inner.dbid)
   -  Hash Join  (cost=9.75..10.79 rows=1 width=171)
 Hash Cond: (outer.dbid = inner.schema)
 -  Seq Scan on schema s  (cost=0.00..1.02 rows=2 
 width=128)
 -  Hash  (cost=9.75..9.75 rows=1 width=51)
   -  Nested Loop  (cost=0.00..9.75 rows=1 width=51)
 Join Filter: (inner.usage = outer.dbid)
 -  Index Scan using usage_flow_i on usage 
 u  (cost=0.00..4.78 rows=1 width=8)
 -  Index Scan using usageparameter_schema_i 
 on usageparameter up  (cost=0.00..4.96 rows=1 width=51)
   Filter: ((prefix)::text  'xsd'::text)
   -  Seq Scan on flow fl  (cost=0.00..0.00 rows=1 width=4)
 Filter: (servicetype = 646)
 
Unique  (cost=11.50..11.52 rows=2 width=131)
Unique  (cost=10.81..10.83 rows=1 width=167)

The cost estimates are basically the same, around 10 to reach the first row.
Can you supply EXPLAIN ANALYZE output so we can see what it's actually doing?

Russell Smith



Re: [PERFORM] Postgresql is using seqscan when is should use indexes.

2004-11-08 Thread Russell Smith
On Mon, 8 Nov 2004 09:40 pm, Andreas kre Solberg wrote:
 We have two tables, dst_port_hour and dst_port_day, which should be
 very similar, they both have about 50.000.000 rows. In both tables we
 have an index for period_id.
 
 We run postgresql  7.4.5 on a dedicated Debian server, with dual Intel
 Xeon 3GHz and 4GB memory.
 
 The problem is that on the dst_port_day table, postgresql is using
 seqscan, and not the index when it should. Forcing the use of the index
 by setting enable_seqscan to false, makes the query lighthening fast.
 When using seqscan, the query takes several minutes. The planner
 calculates the cost for Index scan to be much more than sequence scan.
 
 Why is our query planner misbehaving?
 
 Here are the exaplain analyze output with and without index-force:
 
 
 SET enable_seqscan=false;
 
 stager=  explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
 cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC
   LIMIT 5;
 
dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual 
time=0.038..303.801 rows=48072 loops=1)

The guess of the number of rows returned by the index scan is out by a factor
of 10.  500k rows is greater than 1% of the table, so I think the planner is
likely to choose a sequence scan at that estimate, unless you have tuned things
like random_page_cost.

What is the selectivity like on that column?
Have you analyzed recently?

If so, you should probably increase the statistics on that column.
See ALTER TABLE SET STATISTICS in the manual.
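
For example (the target of 100 is only an illustrative starting point):

ALTER TABLE dst_port_day ALTER COLUMN period_id SET STATISTICS 100;
ANALYZE dst_port_day;

Then re-run the EXPLAIN ANALYZE and see whether the estimate for
period_id = 2779 moves closer to the actual 48072 rows.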

 QUERY PLAN
 -
   Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual 
 time=831.772..831.816 rows=5 loops=1)
 -  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual 
 time=831.761..831.774 rows=5 loops=1)
   Sort Key: sum(octets)
   -  HashAggregate  (cost=2022661.89..2022662.09 rows=80 width=12) 
 (actual time=587.036..663.991 rows=16396 loops=1)
 -  Index Scan using dst_port_day_period_id_key on 
 dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual 
 time=0.038..303.801 rows=48072 loops=1)
   Index Cond: (period_id = 2779)
   Total runtime: 836.362 ms
 (7 rows)
 
 
 
 SET enable_seqscan=true;
 
 stager=   explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE 
 cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC  
 LIMIT 5;
 
 QUERY PLAN
 --
   Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual 
 time=299053.006..299053.053 rows=5 loops=1)
 -  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual 
 time=299052.995..299053.008 rows=5 loops=1)
   Sort Key: sum(octets)
   -  HashAggregate  (cost=1209424.15..1209424.35 rows=80 width=12) 
 (actual time=298803.273..298881.020 rows=16396 loops=1)
 -  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40 
 rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072 loops=1)
   Filter: (period_id = 2779)
   Total runtime: 299057.643 ms
 (7 rows)
 

Regards

Russell Smith



Re: [PERFORM] Select with qualified join condition / Batch inserts

2004-10-19 Thread Russell Smith
On Fri, 15 Oct 2004 08:47 pm, Gavin Sherry wrote:
 On Fri, 15 Oct 2004, Bernd wrote:
 
  Hi,
[snip]

  Table-def:
  Table public.scr_well_compound
 Column   |  Type  | Modifiers
  ++---
   mat_id | numeric(10,0)  | not null
   barcode| character varying(240) | not null
   well_index | numeric(5,0)   | not null
   id_level   | numeric(3,0)   | not null
   compound   | character varying(240) | not null
  Indexes:
  scr_wcm_pk PRIMARY KEY, btree (id_level, mat_id, barcode, well_index)
 
numeric is not optimized by PostgreSQL the way it is by Oracle.  You will get much
better performance by changing the numeric columns to int, bigint, or smallint.

That should get the query time down to somewhere near what Oracle is giving you.
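
A sketch of what the table could look like with native integer types; this is
only illustrative, and depending on your PostgreSQL version you may not be able
to change a column's type in place, so it can mean recreating the table and
reloading the data:

CREATE TABLE scr_well_compound (
    mat_id     bigint       NOT NULL,  -- was numeric(10,0)
    barcode    varchar(240) NOT NULL,
    well_index integer      NOT NULL,  -- was numeric(5,0); smallint if it fits
    id_level   smallint     NOT NULL,  -- was numeric(3,0)
    compound   varchar(240) NOT NULL,
    CONSTRAINT scr_wcm_pk PRIMARY KEY (id_level, mat_id, barcode, well_index)
);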

Regards

Russell Smith.



[snip]



Re: [PERFORM] Query planner problem

2004-10-02 Thread Russell Smith
On Sat, 2 Oct 2004 08:06 am, Ryan VanMiddlesworth wrote:

[snip]
 
 
 Here is the query and EXPLAIN that runs quickly:
   SELECT case_id FROM case_data 
   WHERE case_filed_date  '2004-09-16' 
 AND case_filed_date  '2004-09-20'
 
QUERY PLAN
 -
 Index Scan using case_data_case_filed_date on case_data  
 (cost=0.00..13790.52 rows=3614 width=18)
   Index Cond: ((case_filed_date  '2004-09-16'::date) 
AND (case_filed_date  '2004-09-20'::date))
 
 
 And here is the query and EXPLAIN from the version that I believe the planner 
 should reduce to be logically equivalent:
  SELECT case_id FROM case_data 
  WHERE (('2004-09-16' IS NULL) OR (case_filed_date  '2004-09-16'))
AND (('2004-09-20' IS NULL) OR (case_filed_date  '2004-09-20'))
 
QUERY PLAN
 -
 Seq Scan on case_data  (cost=0.00..107422.02 rows=27509 width=18)
   Filter: ((('2004-09-16' IS NULL) OR (case_filed_date  '2004-09-16'::date))
AND (('2004-09-20' IS NULL) OR (case_filed_date  '2004-09-20'::date)))
 
 
 I was hoping that the null comparisons would get folded out by the planner 
 relatively cheaply. But as you can see, the first query uses indexes and the 
 second one uses sequence scans, thereby taking much longer.  I guess my 
 question is - is there a better way to accomplish what I'm doing in SQL or am 
 I going to have to dynamically generate the statement based on supplied 
 parameters?
 
The index does not store NULL values, so you have to do a table scan to find NULL 
values.
That means the second query cannot use an index, even if it wanted to.
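
In practice that means building the WHERE clause from only the parameters that
were actually supplied, for example (a sketch of the generated SQL, assuming
the comparisons you want are > and <):

-- both bounds supplied
SELECT case_id FROM case_data
 WHERE case_filed_date > '2004-09-16'
   AND case_filed_date < '2004-09-20';

-- only the lower bound supplied: the other predicate is simply left out
SELECT case_id FROM case_data
 WHERE case_filed_date > '2004-09-16';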

Regards

Russell Smith




Re: [PERFORM] Query performance issue with 8.0.0beta1

2004-08-27 Thread Russell Smith
7.4.2
  Aggregate  (cost=46817.89..46817.89 rows=1 width=0) (actual time=401.216..401.217 
 rows=1 loops=1)
-  Index Scan using snsdata_codpar on SNS_DATA  (cost=0.00..46817.22 rows=268 
 width=0) (actual time=165.948..400.258 rows=744 loops=1)
  Index Cond: ((Cod_Par)::text = '17476'::text)
  Filter: ((Data_Arrivo_Campione = '2004-01-01 00:00:00'::timestamp 
 without time zone) AND (Data_Arrivo_Campione = '2004-01-31 23:59:59'::timestamp 
 without time zone))
  Total runtime: 401.302 ms
 
Row counts are out by a factor of 3, on the low side, so the planner will guess
the index is better, which it is.

 ***while on 8.0.0***
  Aggregate  (cost=93932.91..93932.91 rows=1 width=0) (actual 
 time=14916.371..14916.371 rows=1 loops=1)
-  Seq Scan on SNS_DATA  (cost=0.00..93930.14 rows=1108 width=0) (actual 
 time=6297.152..14915.330 rows=744 loops=1)
  Filter: ((Data_Arrivo_Campione = '2004-01-01 00:00:00'::timestamp 
 without time zone) AND (Data_Arrivo_Campione = '2004-01-31 23:59:59'::timestamp 
 without time zone) AND ((Cod_Par)::text = '17476'::text))
  Total runtime: 14916.935 ms
The planner guesses that 1108 rows should be returned, which is out by less, but
on the high side.
The big question is: given there are 2M rows, why does returning 1108 rows, less
than 1% of the table, result in a sequence scan?
Usually that means the selectivity estimate for the indexed column is bad; try
increasing the stats target on the column.

I know 8.0 has new stats analysis code, which could be affecting how it chooses
the plan, but it would still require a good amount of stats to get it to guess
correctly.

Increase stats and see if the times improve.
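
As a sketch (the target of 200 is only an illustrative value; the quoted
identifiers match the mixed-case names shown in your plan):

ALTER TABLE "SNS_DATA" ALTER COLUMN "Cod_Par" SET STATISTICS 200;
ANALYZE "SNS_DATA";

Then run the EXPLAIN ANALYZE again with enable_seqscan left on and compare the
row estimate against the actual 744 rows.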

 
 And I if disable the seqscan
 SET enable_seqscan = false;
 
 I get the following:
 
 Aggregate  (cost=158603.19..158603.19 rows=1 width=0) (actual 
 time=4605.862..4605.863 rows=1 loops=1)
-  Index Scan using snsdata_codpar on SNS_DATA  (cost=0.00..158600.41 
 rows=1108 width=0) (actual time=2534.422..4604.865 rows=744 loops=1)
  Index Cond: ((Cod_Par)::text = '17476'::text)
  Filter: ((Data_Arrivo_Campione = '2004-01-01 00:00:00'::timestamp 
 without time zone) AND (Data_Arrivo_Campione = '2004-01-31 23:59:59'::timestamp 
 without time zone))
  Total runtime: 4605.965 ms
 
 The total runtime is bigger (x10 !!) than the old one.
Did you run this multiple times, or is this the first run?  If it had to get the
data off disk it will be slower.
Are you sure the data is coming from disk in both this and the 7.4 case, or from
memory in both?
If 7.4 is reading from the buffer cache or the kernel cache, and 8.0 is reading
from disk, you are likely to be A LOT slower.

 
 The memory runtime parameters are 
 shared_buffer = 2048
 work_mem = sort_mem = 2048
 
[ snip ]

 The table has 2M of records
 Can it be a datatype conversion issue?
That should not be an issue in 8.0, at least for simple type conversions like
int8 to int4.
I'm not 100% sure which ones were added and which were not, but the query appears
to cast everything correctly anyway.

 Can it be depend on the the type of restore (with COPY commands)?
It shouldn't, and VACUUM FULL ANALYZE will make the table as small as possible.
The row order may be different on disk, but the planner won't know that, and it's
a bad plan that is causing the problem.

 I have no idea.
 
 Thanks in advance!
 Reds
 
Regards

Russell Smith.



Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-11 Thread Russell Smith
[snip]
 
 One question I do have though - you specifically mentioned NOW() as
 something to watch out for, in that it's mutable.  We typically use COUNT()
 as a subselect to retrieve the number of associated rows to the current
 query.  Additionally, we use NOW a lot, primarily to detect the status of a
 date, i.e.:
 
 SELECT id FROM subscriptions WHERE userid = 1 AND timeend  NOW();
 
 Is there a better way to do this?  I was under the impression that NOW() was
 pretty harmless, just to return a current timestamp.
 
NOW() can trigger unnecessary sequence scans.  Because with prepared queries and
functions it is not known at plan time when the statement will be run, the planner
plans the query with now() as a variable.  This can push the planner to a seq scan
over an index scan.  I have seen this time and time again.

You can create your own immutable now, but don't use it in functions or prepared
queries or you will get wrong results.
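
A sketch of the trick, with a hypothetical function name, and assuming the
comparison you want is timeend greater than the current time; it deliberately
lies to the planner, so keep the caveat above in mind:

CREATE FUNCTION static_now() RETURNS timestamp with time zone
    AS 'SELECT now()' LANGUAGE sql IMMUTABLE;

SELECT id FROM subscriptions WHERE userid = 1 AND timeend > static_now();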

 Based on feedback, I'm looking at a minor upgrade of our RAID controller to
 a 3ware 9000 series (SATA with cache, battery backup optional), and
 re-configuring it for RAID 10.  It's a damn cheap upgrade at around $350 and
 an hour of downtime, so I figure that it's worth it for us to give it a
 shot.
 
 Thanks,
 
 Jason

Russell Smith



Re: [PERFORM] my boss want to migrate to ORACLE

2004-07-29 Thread Russell Smith
On Thu, 29 Jul 2004 03:08 am, Stephane Tessier wrote:
 Hi everyone,

 somebody can help me??? my boss wants to migrate to ORACLE

 we have a BIG performance problem, it's slow.
 we use postgres 7.3 for a php security application with approximately 4
 million insertions per day and 4 million deletes and updates,
 and an archive db with 40 million archived records...
This is a heavy update load.  As I say below, what is the vacuum setup like?

 we have 10 databases for our clients and a centralized database for the
 general stuff.

 database specs:

 double XEON 2.4 on DELL PowerEdge2650
 2 gigs of RAM
 5 SCSI Drive RAID 5 15rpm

 tasks:

 4 millions of transactions by day
 160 open connection 24 hours by day 7 days by week
 pg_autovacuum running 24/7
 reindex on midnight
Where is your pg_autovacuum config?  How often is it set to vacuum, and to
analyze for that matter?


 postgresql.conf:

 tcpip_socket = true
 #ssl = false

 max_connections = 256
 #superuser_reserved_connections = 2

 #port = 5432
 #hostname_lookup = false
 #show_source_port = false

 #unix_socket_directory = ''
 #unix_socket_group = ''
 #unix_socket_permissions = 0777 # octal

 #virtual_host = ''

 #krb_server_keyfile = ''


 #
 #   Shared Memory Size
 #
 #shared_buffers = 256   # min max_connections*2 or 16, 8KB each
 #shared_buffers = 196000# min max_connections*2 or 16, 8KB
 each
 shared_buffers = 128000 # min max_connections*2 or 16, 8KB each
 #max_fsm_relations = 1000   # min 10, fsm is free space map, ~40 bytes
 max_fsm_pages = 100 # min 1000, fsm is free space map, ~6 bytes
 #max_locks_per_transaction = 64 # min 10
 #wal_buffers = 8# min 4, typically 8KB each
I would assume that, given the heavy update load, you need more WAL buffers, but
then I don't know a lot here.

 #
 #   Non-shared Memory Sizes
 #
 #sort_mem = 32168   # min 64, size in KB
 #vacuum_mem = 8192  # min 1024, size in KB
 vacuum_mem = 65536  # min 1024, size in KB

 #
 #   Write-ahead log (WAL)
 #
 #checkpoint_segments = 3# in logfile segments, min 1, 16MB each
 #checkpoint_timeout = 300   # range 30-3600, in seconds
checkpoint_segments = 3 is too low for the number of inserts/deletes/updates you
are doing.  You need a much larger checkpoint_segments setting, something like
10+, but the tuning docs will give you a better idea.
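
As a rough sketch only (illustrative values; check the tuning docs before
settling on anything):

checkpoint_segments = 10   # 16MB each; up from the default of 3
wal_buffers = 16           # more WAL buffers may help a heavy-update load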

 #
 #commit_delay = 0   # range 0-10, in microseconds
 #commit_siblings = 5# range 1-1000
 #
 #fsync = true
 #wal_sync_method = fsync# the default varies across platforms:
 #   # fsync, fdatasync, open_sync, or
 open_datasync
 #wal_debug = 0  # range 0-16


 #
 #   Optimizer Parameters
 #
 #enable_seqscan = true
 #enable_indexscan = true
 #enable_tidscan = true
 #enable_sort = true
 #enable_nestloop = true
 #enable_mergejoin = true
 #enable_hashjoin = true

 #effective_cache_size = 1000# typically 8KB each
 effective_cache_size = 196608   # typically 8KB each
 #random_page_cost = 4   # units are one sequential page fetch cost
 #cpu_tuple_cost = 0.01  # (same)
 #cpu_index_tuple_cost = 0.001   # (same)
 #cpu_operator_cost = 0.0025 # (same)

 #default_statistics_target = 10 # range 1-1000

 #
 #   GEQO Optimizer Parameters
 #
 #geqo = true
 #geqo_selection_bias = 2.0  # range 1.5-2.0
 #geqo_threshold = 11
 #geqo_pool_size = 0 # default based on tables in statement,
 # range 128-1024
 #geqo_effort = 1
 #geqo_generations = 0
 #geqo_random_seed = -1  # auto-compute seed


 #
 #   Message display
 #
 server_min_messages =notice # Values, in order of decreasing detail:
 #   debug5, debug4, debug3, debug2, debug1,
 #   info, notice, warning, error, log,
 fatal,
 #   panic
 client_min_messages =notice # Values, in order of decreasing detail:
 #   debug5, debug4, debug3, debug2, debug1,
 #   log, info, notice, warning, error
 #silent_mode = false

 #log_connections =true
 #log_pid =true
 #log_statement =true
 #log_duration =true
 #log_timestamp =true

 log_min_error_statement =error
 # Values in order of increasing severity:
  #   debug5, debug4, debug3, debug2,
 debug1, #   info, notice, warning, error, panic(off)

 #debug_print_parse = false
 #debug_print_rewritten = false
 #debug_print_plan = false
 #debug_pretty_print = false

 #explain_pretty_print = true

 # requires USE_ASSERT_CHECKING
 #debug_assertions = true


 #
 #   Syslog
 #
 syslog = 0  # range 0-2
 syslog_facility = 'LOCAL0'
 syslog_ident = 'postgres'


 #
 #   Statistics
 #
 show_parser_stats = false
 show_planner_stats =false
 show_executor_stats = false
 show_statement_stats =false

 # requires 

[PERFORM] Use of Functional Indexes and Planner estimates

2004-06-08 Thread Russell Smith
Dear All,

I have a table with approximately 570k Rows.


  Table filter.rules
  Column  |  Type  |   Modifiers
--++
 rulename | character varying(16)  | not null default ''::character varying
 uri  | character varying(200) | not null default ''::character varying
 redirect | character varying(200) | not null default ''::character varying
 moddate  | date   | not null default ('now'::text)::date
 active   | boolean| not null default true
 comment  | character varying(255) | not null default ''::character varying
Indexes:
rules_pkey primary key, btree (rulename, uri)
moddate_idx btree (moddate)
rules_idx btree (lower((uri)::text))

Statistics on the uri column have been set to 1000.
VACUUM FULL and ANALYZE were run before the tests; no alterations to the table since then.

# analyze verbose filter.rules;
INFO:  analyzing filter.rules
INFO:  rules: 5228 pages, 30 rows sampled, 570533 estimated total rows
ANALYZE

# explain analyze SELECT rulename, redirect from filter.rules WHERE lower(uri) 
IN(lower('land.com'),lower('com'),lower('land.com/'),lower('com/')) GROUP BY 
rulename,redirect;
   
QUERY PLAN

 HashAggregate  (cost=22352.79..22352.79 rows=1 width=12) (actual 
time=2047.331..2047.332 rows=1 loops=1)
   -  Seq Scan on rules  (cost=0.00..22296.32 rows=11294 width=12) (actual 
time=540.149..2047.308 rows=1 loops=1)
 Filter: ((lower((uri)::text) = 'land.com'::text) OR (lower((uri)::text) = 
'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 
'com/'::text))
 Total runtime: 2047.420 ms
(4 rows)

# SET enable_seqscan=off;

# explain analyze SELECT rulename, redirect from filter.rules WHERE lower(uri) 
IN(lower('land.com'),lower('com'),lower('land.com/'),lower('com/')) GROUP BY 
rulename,redirect;
   
QUERY PLAN

 HashAggregate  (cost=38970.68..38970.68 rows=1 width=12) (actual time=0.328..0.328 
rows=1 loops=1)
   -  Index Scan using rules_idx, rules_idx, rules_idx, rules_idx on rules  
(cost=0.00..38914.21 rows=11294 width=12) (actual time=0.210..0.312 rows=1 loops=1)
 Index Cond: ((lower((uri)::text) = 'land.com'::text) OR (lower((uri)::text) = 
'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 
'com/'::text))
 Total runtime: 0.700 ms
(4 rows)

Could anybody offer an explanation of why the planner does such a terrible job of
estimating the number of rows for this query, with the stats set so high?
Tests were also done with stats set to 100 and to 1; the results were exactly the
same, which is what I would have assumed.

I am also interested in how functional indexes have statistics collected for them,
if they do, so as to possibly minimize or avoid this problem in the future.

Thanks for your consideration of this matter.

Regards

Russell Smith.
