Re: [PERFORM] 8.1 -> 8.4 regression

2010-02-15 Thread Dennis Björklund
Can you force 8.4 to generate the same plan as 8.1? For example by running

  SET enable_hashjoin = off;

before you run EXPLAIN on the query? If so, then we can compare the
numbers from the forced plan with the old plan and maybe figure out why it
didn't use the same old plan in 8.4 as it did in 8.1.

Note that the solution is not to force the plan, but it can give us more
information.
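
A sketch of what that session could look like (the query itself is just a
placeholder for the real one from your report):

  SET enable_hashjoin = off;
  EXPLAIN ANALYZE <the slow query>;
  RESET enable_hashjoin;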

/Dennis

> is at least one query which has degraded in performance quite a bit. Here
> is the plan on 8.4.2:
> http://wood.silentmedia.com/bench/842
>
> Here is the very much less compact plan for the same query on 8.1.19:
> http://wood.silentmedia.com/bench/8119



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

2008-08-28 Thread Matthew Dennis
On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe <[EMAIL PROTECTED]>wrote:

> > wait a min here, postgres is supposed to be able to survive a complete
> box
> > failure without corrupting the database, if killing a process can corrupt
> > the database it sounds like a major problem.
>
> Yes it is a major problem, but not with postgresql.  It's a major
> problem with the linux OOM killer killing processes that should not be
> killed.
>
> Would it be postgresql's fault if it corrupted data because my machine
> had bad memory?  Or a bad hard drive?  This is the same kind of
> failure.  The postmaster should never be killed.  It's the one thing
> holding it all together.
>

I fail to see the difference between the OOM killing it and the power going
out.  And yes, if the power went out and PG came up with a corrupted DB
(assuming I didn't turn off fsync, etc) I *would* blame PG.  I understand
that killing the postmaster could stop all useful PG work, that it could
cause it to stop responding to clients, that it could even "crash" PG, et
cetera, but if a particular process dying causes corrupted DBs, that sounds
borked to me.


Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-30 Thread Dennis Brakhane
Slightly off-topic, but judging from the fact that you were able to
"fix" the query, it seems you have some way to modify the application
code itself. In that case, I'd try to implement caching (at least for
this statement) on the application side, for example with memcached.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index scan cost

2008-07-17 Thread Dennis Brakhane
The "fast" server makes a much more accurate estimation of the number
of rows to expect (4 rows are returning, 1 was estimated). The "slow"
server estimates 1151 rows. Try running ANALYZE on the slow one

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible Redundancy/Performance Solution

2008-05-07 Thread Dennis Muhlestein




1) RAID1+0 to make one big volume
2) RAID1 for OS/apps/etc, RAID1 for database
3) RAID1 for OS+xlog, RAID1 for database
4) RAID1 for OS+popular tables, RAID1 for rest of database


Lots of good info, thanks for all the replies.  It seems to me then, 
that the speed increase you'd get from raid0 is not worth the downtime 
risk, even when you have multiple servers.  I'll start pricing things 
out and see what options we have.


Thanks again,
Dennis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible Redundancy/Performance Solution

2008-05-06 Thread Dennis Muhlestein

Greg Smith wrote:

On Tue, 6 May 2008, Dennis Muhlestein wrote:


 > Since disks are by far the most likely thing to fail, I think it would
be bad planning to switch to a design that doubles the chance of a disk 
failure taking out the server just because you're adding some 
server-level redundancy.  Anybody who's been in this business for a 
while will tell you that seemingly improbable double failures happen, 
and if I were you I'd want a plan that survived a) a single disk failure 
on the primary and b) a single disk failure on the secondary at the same 
time.


Let me strengthen that--I don't feel comfortable unless I'm able to 
survive a single disk failure on the primary and complete loss of the 
secondary (say by power supply failure), because a double failure that 
starts that way is a lot more likely than you might think.  Especially 
with how awful hard drives are nowadays.


Those are good points.  So you'd go ahead and add the pgpool in front (or 
another redundancy approach), but then use raid 1, 5, or perhaps 10 on 
each server?


-Dennis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible Redundancy/Performance Solution

2008-05-06 Thread Dennis Muhlestein

Greg Smith wrote:

On Tue, 6 May 2008, Dennis Muhlestein wrote:


RAID0 on two disks makes a disk failure that will wipe out the database 
twice as likely.  If your goal is better reliability, you want some sort 
of RAID1, which you can do with two disks.  That should increase read 
throughput a bit (not quite double though) while keeping write 
throughput about the same.


I was planning on pgpool being the cushion between the raid0 failure 
probability and my need for redundancy.  This way, I get protection 
against not only disks, but cpu, memory, network cards, motherboards, etc. 
Is this not a reasonable approach?




If you added four disks, then you could do a RAID1+0 combination which 
should substantially outperform your existing setup in every respect 
while also being more resilient to drive failure.


Our applications are mostly read intensive.  I don't think that having 
two databases on one machine, where previously we had just one, would 
add too much of an impact, especially if we use the load balance 
feature of pgpool as well as the redundancy feature.


A lot depends on how much RAM you've got and whether it's enough to keep 
the cache hit rate fairly high here.  A reasonable thing to consider 
here is doing a round of standard performance tuning on the servers to 
make sure they're operating efficiently before increasing their load.



Can anyone comment on any gotchas or issues we might encounter?


Getting writes to replicate to multiple instances of the database 
usefully is where all the really nasty gotchas are in this area.  
Starting with that part and working your way back toward the front-end 
pooling from there should crash you into the hard parts early in the 
process.



Thanks for the tips!
Dennis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Possible Redundancy/Performance Solution

2008-05-06 Thread Dennis Muhlestein

Right now, we have a few servers that host our databases.  None of them
are redundant.  Each hosts databases for one or more applications.
Things work reasonably well but I'm worried about the availability of
some of the sites.  Our hardware is 3-4 years old at this point and I'm 
not naive to the possibility of drives, memory, motherboards or whatever 
failing.


I'm toying with the idea of adding a little redundancy and maybe some
performance to our setup.  First, I'd replace our sata hard drives with
a scsi controller and two scsi hard drives that run raid 0 (probably 
running the OS and logs on the original sata drive).  Then I'd run the 
previous two databases on one cluster of two servers with pgpool in 
front (using the redundancy feature of pgpool).


Our applications are mostly read intensive.  I don't think that having 
two databases on one machine, where previously we had just one, would 
add too much of an impact, especially if we use the load balance feature 
of pgpool as well as the redundancy feature.


Can anyone comment on any gotchas or issues we might encounter?  Do you 
think this strategy has possibility to accomplish what I'm originally 
setting out to do?


TIA
-Dennis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Where do a novice do to make it run faster?

2008-04-28 Thread Dennis Muhlestein

A B wrote:

So, it is time to improve performance, it is running too slow.
AFAIK (as a novice) there are a few general areas:

1) hardware
2) rewriting my queries and table structures
3) using more predefined queries
4) tweak parameters in the db conf files

Of these points:
1) is nothing I can do about right now, but in the future perhaps.
2) will be quite hard right now since there is more code than time.
3) almost like 2 but perhaps more do-able with the current constraints.
4) This seems to be the easiest one to start with...

So what should I do/read concerning point 4?
If you have other good suggestions  I'd be very interested in that.

Thank you :-)



1st, change your log setting log_min_duration_statement to something 
like 1000 (one second).  This will allow you to see which statements 
take the longest.


2nd.  Use EXPLAIN ANALYZE on those statements to determine what is 
taking a long time and focus on optimizing those statements that take 
the longest to execute.
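
For example, roughly (the query here is made up, yours will come from the
log):

  # in postgresql.conf
  log_min_duration_statement = 1000

  -- then, in psql, on one of the statements the log flags as slow
  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;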


That ought to get you a long way down the road.

-Dennis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Turn correlated in subquery into join

2008-03-24 Thread Dennis Bjorklund
Looks like the mysql people found a subquery that postgresql doesn't 
handle as well as it could:


  http://s.petrunia.net/blog/

Is there some deeper issue here that I fail to see or is it simply that 
it hasn't been implemented but is fairly straightforward? In the link 
above they do state that it isn't a very common case anyway.


/Dennis

-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Postgresql server

2007-04-11 Thread Dennis Bjorklund

Jason Lustig skrev:
and work_mem to 8096. What would cause the computer to only use such a 
small percentage of the CPU, with more than half of it waiting on I/O 
requests?


Do your webpages write things to the database on each connect?

Maybe it does a bunch of writes, each individually committed? For every 
commit pg will wait for the data to be written down to the disk platter 
before it moves on. So if you do several writes you want to do them in 
one transaction so you only need one commit.
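
For example, instead of letting each statement commit on its own, something
roughly like this (table and values are made up):

  BEGIN;
  INSERT INTO page_hits (url, hit_at) VALUES ('/index', now());
  INSERT INTO page_hits (url, hit_at) VALUES ('/login', now());
  UPDATE counters SET hits = hits + 2 WHERE day = current_date;
  COMMIT;

That way there is only one wait for the disk instead of one per statement.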


/Dennis

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Upgraded to 8.2.3 --- still having performance issues

2007-03-02 Thread Dennis Bjorklund

Carlos Moreno skrev:

The system does very frequent insertions and updates --- the longest
table has, perhaps, some 20 million rows, and it's indexed (the primary
key is the combination of two integer fields).  This longest table only
has inserts (and much less frequent selects), at a peak rate of maybe
one or a few insertions per second.


One or a few inserts per second doesn't sound like that much. I would 
have expected it to work. If you can, you might want to group several 
inserts into a single transaction.


A standard hint is also to move the WAL onto its own disk. Or get a disk 
controller with battery backed up ram.


But it's hard to say from your description what the bottleneck is and 
thus hard to give any advice.



Any help/tips/guidance in troubleshooting this issue?  It will be
much appreciated!


You could try to find specific queries that are slow. Pg can for example 
log queries for you that run for longer than X seconds.


/Dennis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Dennis Bjorklund

Carlos Moreno skrev:


When I force it via  "set enable_seqscan to off", the index scan
takes about 0.1 msec  (as reported by explain analyze), whereas

>

For the time being, I'm using an explicit "enable_seqscan off"
in the client code, before executing the select.  But I wonder:
Is this still an issue, or has it been solved in the latest
version?


For most queries it has never been an issue. Every once in a while there 
is a query that the planner makes a non-optimal plan for, but it's not 
that common.


In general the optimizer has improved with every new version of pg.

Almost everyone I've talked to that has upgraded has got a faster 
database than before. It was like that for 7.4->8.0, for 8.0->8.1 and 
for 8.1->8.2. So in your case going from 7.4->8.2 is most likely going 
to give a speedup (especially if you have some queries that aren't just 
simple primary key lookups).


In your case it's hard to give any advice since you didn't share the 
EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg 
so it makes the right choice even for this query of yours but without 
the EXPLAIN ANALYZE output we would just be guessing anyway. If you want 
to share it then it might be helpful to show the plan both with and 
without seqscan enabled.


How often do you run VACUUM ANALYZE; on the database?

/Dennis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14

2006-12-31 Thread Dennis Bjorklund

Rolf Østvik skrev:


I have a simple query which uses 32ms on 7.4.14 and 1015ms on 8.2.0.
I guess 7.4.14 creates a better execution plan than 8.2.0 for this query but



Try to turn off planner options in 8.2 to make it generate the same plan 
as 7.4. Then run EXPLAIN ANALYZE on the query that generates the same 
plan as in 7.4 and we can compare the costs and maybe understand what goes 
wrong.


For example, try

set enable_hashjoin to false;
set enable_bitmapscan to false;

but you might need to turn off more things to get it to generate the 7.4 
plan.


/Dennis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] How import big amounts of data?

2005-12-29 Thread Dennis Bjorklund
On Thu, 29 Dec 2005, Arnau wrote:

>Which is the best way to import data to tables? I have to import 
> 9 rows into a column and doing it as inserts takes ages. Would be 
> faster with copy? is there any other alternative to insert/copy?

Wrap the inserts inside a BEGIN/COMMIT block and it will be a lot faster.
Copy is even faster, but for just 9 rows I wouldn't bother.
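
Roughly (table name and file path are made up):

  BEGIN;
  INSERT INTO items VALUES (1, 'first row');
  INSERT INTO items VALUES (2, 'second row');
  -- ... the rest of the rows ...
  COMMIT;

or, with copy:

  COPY items FROM '/tmp/items.txt';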

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 8.1 iss

2005-11-06 Thread Dennis Bjorklund
On Sun, 6 Nov 2005, PostgreSQL wrote:

> SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING 
> count(*) > 1;
> 
> This is a dual Opteron box with 16 Gb memory and a 3ware SATA raid
> runing 64bit SUSE.  Something seems badly wrong.
> 
> GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
>   Filter: (count(*) > 1)
>   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
> Sort Key: v_barcode
> ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

What does the plan look like in 8.0? Since it's so much faster I assume you 
get a different plan.

> shared_buffers = 5
> work_mem = 16384
> maintenance_work_mem = 16384
> max_fsm_pages = 10
> max_fsm_relations = 5000
> wal_buffers = 32
> checkpoint_segments = 32
> effective_cache_size = 5
> default_statistics_target = 50

The effective_cache_size is way too low, only 390M and you have a machine
with 16G. Try bumping it to 1000000 (which means almost 8G, how nice it
would be to be able to write 8G instead...). It could be set even higher 
but it's hard for me to know what else your memory is used for.

I don't know if this setting will affect this very query, but it should 
have a positive effect on a lot of queries.

work_mem also seems low, but it's hard to suggest a good value on it
without knowing more about how your database is used.
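
A quick way to see whether a bigger value changes this particular plan is to
set it for the session only and rerun the query (1000000 8kB pages is the
roughly-8G figure above):

  SET effective_cache_size = 1000000;
  EXPLAIN ANALYZE SELECT v_barcode, count(v_barcode) FROM lead
    GROUP BY v_barcode HAVING count(*) > 1;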
 
-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Need help in setting optimal configuration for a huge

2005-10-23 Thread Dennis Bjorklund
On Sun, 23 Oct 2005, Kishore B wrote:

> We need to insert into the bigger table almost for every second , through
> out the life time. In addition, we receive at least 200,000 records a day at
> a fixed time.
> 
> We are facing a* critical situation because of the performance of the **
> database**.* Even a basic query like select count(*) from bigger_table is
> taking about 4 minutes to return.

Count(*) like that always scans the full table, but 4 minutes still sounds
like a lot.  How often do you vacuum? Could it be that the disk is full of
garbage due to not enough vacuum?

A query like this can help find bloat:

 SELECT oid::regclass, reltuples, relpages FROM pg_class ORDER BY 3 DESC;

I assume you do updates and deletes as well, and not just inserts?

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-09-28 Thread Dennis Bjorklund
On Wed, 28 Sep 2005, Joe wrote:

> Before I post the EXPLAIN and the table schema I'd appreciate
> confirmation that this list is the appropriate forum. 

It is, and useful things to show are

 * the slow query
 * EXPLAIN ANALYZE of the query
 * the output of \d for each table involved in the query
 * the output of SHOW ALL;
 * The amount of memory the machine have

The settings that are the most important to tune in postgresql.conf for
performance are, in my opinion, shared_buffers, effective_cache_size and
(to a lesser extent) work_mem.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] How to improve Postgres performance

2005-08-29 Thread Dennis Bjorklund
On Tue, 30 Aug 2005, Hemant Pandey wrote:

> So please tell me how can i improve database performance through
> configuration parameters. I had tried to change parameters in
> postgresql.conf file but of no avail.
> Now i am trying to Auto Vacuum, but don't know how to run Auto Vacuum.

The most important part is that you need to run VACUUM ANALYZE regularly.  
Vacuum can be started each night in a cron job, started from pg_autovacuum
when it thinks it's needed, or started in some other way. In any case, it
has to be run whenever the data in the database has changed enough.
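
As a sketch, a nightly cron entry for the postgres user could look something
like this:

  # run at 03:00 every night
  0 3 * * * vacuumdb --all --analyze --quiet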

The parameters in the config that are most important in my experience are 
effective_cache_size and shared_buffers.

This is a text I like (it's for pg 7.4 but still useful):

  http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Need for speed

2005-08-16 Thread Dennis Bjorklund
On Tue, 16 Aug 2005, Ulrich Wisser wrote:

> Still when several users are on line the reporting gets very slow. 
> Queries can take more then 2 min.

Could you show an example of such a query and the output of EXPLAIN ANALYZE
on that query (preferably done when the database is slow).

It's hard to say what is wrong without more information.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Dennis Bjorklund
On Wed, 27 Jul 2005, Matthew Schumacher wrote:

> Then they do this to insert the token:
> 
> INSERT INTO bayes_token (
>   id,
>   token,
>   spam_count,
>   ham_count,
>   atime
> ) VALUES (
>   ?,
>   ?,
>   ?,
>   ?,
>   ?
> ) ON DUPLICATE KEY
>   UPDATE
> spam_count = GREATEST(spam_count + ?, 0),
> ham_count = GREATEST(ham_count + ?, 0),
> atime = GREATEST(atime, ?)
> 
> Or update the token:
> 
> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
>   WHERE id = ?
> 
> 
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2
functions that work for the above cases of greatest and least:

CREATE FUNCTION greatest_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
 LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
 LANGUAGE SQL;

and these should be inlined by pg and very fast to execute.

I wrote a function that should do what the insert above does. I've not 
looked at the update (I don't know what $token_count_update is), but it 
looks simple enough to just implement the same way in pg as in mysql.

For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER,
                                        xtoken BYTEA,
                                        xspam_count INTEGER,
                                        xham_count INTEGER,
                                        xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
  LOOP
    UPDATE bayes_token
       SET spam_count = greatest_int (spam_count + xspam_count, 0),
           ham_count  = greatest_int (ham_count + xham_count, 0),
           atime      = greatest_int (atime, xatime)
     WHERE id = xid
       AND token = xtoken;

    IF found THEN
      RETURN;
    END IF;

    BEGIN
      INSERT INTO bayes_token VALUES (xid,
                                      xtoken,
                                      xspam_count,
                                      xham_count,
                                      xatime);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing
    END;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster than what you have.  
What it does do is mimic the way you insert values in mysql. It only works
on pg 8.0 and later however, since the exception handling was added in 8.0.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-15 Thread Dennis

David Mitchell wrote:

If your table has got into this state, then vacuum analyze won't fix 
it. You will have to do a vacuum full to get it back to normal, then 
regularly vacuum (not full) to keep it in good condition. We vacuum 
our critical tables every 10 minutes to keep them in good nick.



So should I have vacuum run during the load test? At what level of 
updates should it run every ten minutes?


Dennis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

David Mitchell wrote:

What is the load average on this machine? Do you do many updates? If 
you do a lot of updates, perhaps you haven't vacuumed recently. We 
were seeing similar symptoms when we started load testing our stuff 
and it turned out we were vacuuming too infrequently.


The load average at the 100% utilization point was about 30! A vacuum 
analyze was done before the test was started. I believe there are many 
more selects than updates happening at any one time.


Dennis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

Qingqing Zhou wrote:


Are you sure 100% CPU usage is solely contributed by Postgresql? Also, from
the ps status you list, I can hardly see that's a problem because of problem
you mentioned below.
 

The postgreSQL processes are what is taking up all the cpu. There aren't 
any other major applications on the machine. It's a dedicated database 
server, only for this application.


It doesn't seem to make sense that PostgreSQL would be maxed out at this 
point. I think given the size of the box, it could do quite a bit 
better. So, what is going on? I don't know.


Dennis

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] performance problems ... 100 cpu utilization

2005-07-13 Thread Dennis

Hi,

I've got a java based web application that uses PostgreSQL 8.0.2. 
PostgreSQL runs on its own machine with RHEL 3, ia32e kernel, dual Xeon 
processor, 4 Gb ram.


The web application runs on a separate machine from the database. The 
application machine has three tomcat instances configured to use 64 
database connections each using DBCP for pooling. Most of the data 
access is via Hibernate. The database itself is about 100 meg in size.


We're perf testing the application with Loadrunner. At about 500 virtual 
users hitting the web application, the cpu utilization on the database 
server is at 100%, PostgreSQL is on its knees. The memory usage isn't 
bad, the I/O isn't bad, only the CPU seems to be maxed out.


checking the status of connections at this point ( ps -eaf | grep 
"postgres:")  where the CPU is maxed out I saw this:


127 idle
12 bind
38 parse
34 select

Hibernate is used in the application and unfortunately this seems to 
cause queries not to get logged. (see 
http://archives.postgresql.org/pgsql-admin/2005-05/msg00241.php)


I know there has been discussion about problems on Xeon MP systems. Is 
this what we are running into? Or is something else going on? Is there 
other information I can provide that might help determine what is going on?


Here are the postgresql.conf settings:

# The maximum number of connections.
max_connections = 256

# Standard performance-related settings.
shared_buffers = 16384
max_fsm_pages = 20
max_fsm_relations = 1
fsync = false
wal_sync_method = fsync
wal_buffers = 32
checkpoint_segments = 6
effective_cache_size = 38400
random_page_cost = 2
work_mem = 16384
maintenance_work_mem = 16384

# TODO - need to investigate these.
commit_delay = 0
commit_siblings = 5
max_locks_per_transaction = 512


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] General DB Tuning

2005-07-12 Thread Dennis

Tom Arthurs wrote:

we are using jdbc -- the "log_min_duration_statement = 3000 " 
statement works fine for me.  Looks like there's no other work around 
for the bug(?).  Not sure since I have no interest in logging a 
million statements a day, I only want to see the poorly performing hits. 



Doesn't it depend on what jdbc driver you are using?

Dennis

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] planner picking more expensive plan

2005-07-01 Thread Dennis Bjorklund
On Fri, 1 Jul 2005, Sam Mason wrote:

The key thing with the query that Sam has is that if you turn off seqscan
you get the first plan that runs in 0.4ms, and if seqscan is on the runtime
is 27851ms.

There are 100 ways to make it avoid the seq scan, including rewriting the 
query to something more useful, tweaking different parameters and so on. 

The interesting part is that pg gives the fast plan a cost of 202 and the
slow one a cost of 566141, but still it chooses the slow plan unless seqscan
is turned off (or some other tweak with the same effect). It knows very
well that the plan with the index scan will be much faster, it just doesn't
manage to generate it unless you force it to.

It makes you wonder if pg throws away some plans too early in the planning
phase.

>  Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 
> loops=1)
>->  Merge Left Join  (cost=0.00..6628.30 rows=3302780 width=8) (actual 
> time=0.211..0.576 rows=10 loops=1)
>  Merge Cond: ("outer".animalid = "inner".animalid)
>  ->  Index Scan using animals_pkey on animals a  
> (cost=0.00..10198983.91 rows=3302780 width=8) (actual time=0.112..0.276 
> rows=10 loops=1)
>  ->  Index Scan using movement_animal on movements m  
> (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.088..0.235 
> rows=10 loops=1)
>Filter: (mtypeid = 0)
>  Total runtime: 0.413 ms
> 
>  Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual 
> time=27769.047..27769.246 rows=10 loops=1)
>->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) 
> (actual time=27769.043..27769.228 rows=10 loops=1)
>  Merge Cond: ("outer".animalid = "inner".animalid)
>  ->  Index Scan using movement_animal on movements m  
> (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.022..0.154 
> rows=10 loops=1)
>Filter: (mtypeid = 0)
>  ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual 
> time=27768.991..27769.001 rows=10 loops=1)
>Sort Key: a.animalid
>->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 
> width=8) (actual time=0.039..5620.651 rows=3303418 loops=1)
>  Total runtime: 27851.097 ms


Another thing to notice is that if one removes the Limit node then the
situation is reversed and the plan that pg chooses (with the Limit node) is
the one with the lowest cost. The startup cost is however very high, so 
combining that Merge Join with a Limit will of course produce something 
slow compared to the upper plan where the startup cost is 0.0.

A stand-alone test case would be nice, but even without one the above plans 
are interesting.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] ported application having performance issues

2005-07-01 Thread Dennis Bjorklund
On Thu, 30 Jun 2005, John Mendenhall wrote:

> Our setting for effective_cache_size is 2048.
> 
> random_page_cost = 4, effective_cache_size = 2048   time approximately 4500ms
> random_page_cost = 3, effective_cache_size = 2048   time approximately 1050ms
> random_page_cost = 3, effective_cache_size = 4096   time approximately 1025ms
> 
> The decrease of random_page_cost to 3 caused the plan
> to work properly, using the lead_requests table as a
> join starting point and using the contacts index.

The effective_cache_size still looks small. As a rule of thumb you might
want effective_cache_size to be something like 1/2 or 2/3 of your total
memory. I don't know how much you had, but effective_cache_size = 4096 is
only 32M.

shared_buffers and effective_cache_size are normally the two most important 
settings in my experience.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] ETL optimization

2005-06-23 Thread Dennis Bjorklund
On Thu, 23 Jun 2005, Bricklen Anderson wrote:

> iii. UNIQUE constraint on table "t1". This didn't seem to perform too
> badly with fewer rows (preliminary tests), but as you'd expect, on error
> the whole transaction would roll back. Is it possible to skip a row if
> it causes an error, as opposed to aborting the transaction altogether?

You don't need to roll back the whole transaction if you use savepoints or 
the exception features in pl/pgsql.

Take a look at this example:

http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
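
With savepoints the application would do roughly this per row (table name
from your mail, values made up):

  SAVEPOINT one_row;
  INSERT INTO t1 VALUES (42, 'some data');
  -- if that fails with a unique_violation:
  ROLLBACK TO SAVEPOINT one_row;
  -- and then continue with the next row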

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Needed: Simplified guide to optimal memory configuration

2005-06-15 Thread Dennis Bjorklund
On Wed, 15 Jun 2005, Todd Landfried wrote:

> So, what I need is to be pointed to (or told) what are the best  
> settings for our database given these memory configurations.  What  
> should we do?

Maybe this will help:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

> NOTICE:  shared_buffers is 256

This looks like it's way too low. Try something like 2048.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Query plan changes after pg_dump / pg_restore

2005-06-09 Thread Dennis Bjorklund
On Thu, 9 Jun 2005, Jona wrote:

> It's the same (physical) server as well as the same PostGreSQL daemon, 
> so yes.

The only thing that can differ then is the statistics collected and the
amount of dead space in tables and indexes (but since you both reindex and
run vacuum full that should not be it).

So comparing the statistics in the system tables is the only thing I can 
think of that might bring some light on the issue. Maybe someone else have 
some ideas.

And as KL said, the effective_cache_size looked like it was way too small. 
With that setting bigger, pg should select index scans more often. It 
doesn't explain why the databases behave like they do now, but it might 
make pg select the same plan nevertheless.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Query plan changes after pg_dump / pg_restore

2005-06-09 Thread Dennis Bjorklund
On Thu, 9 Jun 2005 [EMAIL PROTECTED] wrote:

> I am continously encountering an issue with query plans that changes after 
> a pg_dump / pg_restore operation has been performed.
> 
> Have placed an export of the query, query plan etc. online at: 
> http://213.173.234.215:8080/plan.htm in order to ensure it's still 
> readable.

There is not a major difference in time, so pg is at least not way off 
(225ms vs. 280ms). The estimated cost is however not very related to the 
runtime (117 vs 1389).

What you have not shown is whether the database is properly tuned. The output
of SHOW ALL; could help explain a lot, together with info on how much
memory your computer has.

The first thing that comes to mind to me is that you probably have not 
tuned shared_buffers and effective_cache_size properly (SHOW ALL would 
tell).

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] COPY vs INSERT

2005-05-06 Thread Dennis Bjorklund
On Fri, 6 May 2005, Jim C. Nasby wrote:

> Has thought been given to supporting inserting multiple rows in a single
> insert? DB2 supported:
> 
> INSERT INTO table VALUES(
> (1,2,3),
> (4,5,6),
> (7,8,9)
> );
> 
> I'm not sure how standard that is or if other databases support it.

The sql standard includes this, except that you can not have the outer ().
So it should be

INSERT INTO table VALUES
(1,2,3),
(4,5,6),
(7,8,9);

Does DB2 demand these extra ()?

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Dennis Bjorklund
On Tue, 8 Mar 2005, Rick Schumeyer wrote:

> =# explain select * from data where x = 0;
> -
>  Index Scan using data_x_ix on data  (cost=0.00..78.25 rows=19 width=34)
>Index Cond: (x = 0::double precision)
>  
> But this command, in which the only difference if > instead of =, is a
> sequential scan.
> 
> =# explain select * from data where x > 0;
> --
>  Seq Scan on data  (cost=0.00..1722605.20 rows=62350411 width=34)
>Filter: (x > 0::double precision)
> 
> Why is this?

That is because it's faster to execute the x>0 query with a seq. scan than
an index scan. Postgresql is doing the right thing here.

Pg estimates that the first query will return 19 rows and that the second 
query will return 62350411 rows. To return 62350411 rows it's faster to 
just scan the table and not use the index.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Dennis Bjorklund
On Thu, 21 Oct 2004, Thomas F.O'Connell wrote:

>   Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual  
> time=298321.421..298321.422 rows=1 loops=1)
> ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual  
> time=1.771..298305.531 rows=2452 loops=1)
>   Join Filter: ("inner".id = "outer".id)
>   ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8)  
> (actual time=0.026..11.869 rows=2452 loops=1)
>   ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1  
> width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
> Filter: (id = 18181::bigint)

It looks like you have not run ANALYZE recently. Most people run VACUUM 
ANALYZE every night (or similar) in a cron job.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] First set of OSDL Shared Mem scalability results, some

2004-10-10 Thread Dennis Bjorklund
On Fri, 8 Oct 2004, Josh Berkus wrote:

> As you can see, the "sweet spot" appears to be between 5% and 10% of RAM, 
> which is if anything *lower* than recommendations for 7.4!   

What recommendation is that? To have shared buffers being about 10% of the
ram sounds familiar to me. What was recommended for 7.4? In the past we
used to say that the worst value is 50% since then the same things might
be cached both by pg and the os disk cache.

Why do we expect the shared buffer size sweet spot to change because of
the new arc stuff? And why would it make it better to have bigger shared 
mem?

Wouldn't it be the opposite, that now we don't invalidate as much of the
cache for vacuums and seq. scans, so now we can do as good caching as 
before but with less shared buffers?

That said, testing and getting some numbers of good sizes for shared mem
is good.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On 22 Sep 2004, Greg Stark wrote:

> Actually this looks like it's arguably a bug to me. Why does the hash
> join execute the sequential scan at all? Shouldn't it also like the
> merge join recognize that the other hashed relation is empty and skip
> the sequential scan entirely?

I'm not sure you can classify that as a bug. It's just that in one of 
the plans it started with the empty scan and because of that didn't need 
the other, but with the hash join it started with the table that had 16 
rows and then got to the empty one.

While I haven't checked, I assume that if it had started with the empty 
table then it would have skipped the other.

I don't know what criteria are used to select which part to start with when
doing a hash join. It looks like it started with the one that had the highest
estimate of rows here; doing it the other way around might be a good idea,
because in some cases you are lucky to find an empty scan and can omit
the other.

The above are just observations of the behaviour, I've not seen the source 
at all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

>   Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 
> loops=1)
>   Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 
> loops=1)

These estimated costs are almost the same, but the runtime differs a bit. 
This means that maybe you need to alter settings like random_page_cost, 
effective_cache_size and maybe some others to make the cost reflect the runtime 
better.

Since the costs are so close to each other very small changes can make it 
choose the other plan. It's also very hard to make an estimate that is 
correct in all situations. That's why it's called an estimate after all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-21 Thread Dennis Bjorklund
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

> Now my question is why the 7.4 choose the hash join ?  :-(

It looks to me that the merge join is faster because there wasn't really 
anything to merge, it resulted in 0 rows. Maybe the hash join that is 
chosen in 7.4 would have been faster had there been a couple of result 
rows (just a guess).

It would be interesting to compare the plans in 7.4 with and without 
hash_join active and see what costs it estimates for a merge join compared 
to a hash join.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres

2004-09-11 Thread Dennis Bjorklund
On Sat, 11 Sep 2004, Mark Cotner wrote:

> There are no performance concerns with MySQL, but it would benefit
> greatly from stored procedures, views, etc.  It is a very large rolling
> data warehouse that inserts about 4.5 million rows every 2 hours and
> subsequently rolls this data off the back end of a 90 day window.

While it is impossible to know without testing, postgresql has the benefit
of readers and writers that do not block each other. So in situations
where you do lots of concurrent inserts and selects postgresql should
behave well.

> Merge table definition equivalent.  We use these extensively.

As far as I can tell a merge table in mysql is the same as a view over a 
number of unions of other tables. And possibly a rule that defines how 
inserts will be done if you do inserts in the merged table.

> Merge table equivalent with all tables containing over 100M rows(and
> about 40 columns, some quite wide) will need to do index scans in at
> least 5 seconds(MySQL currently does 2, but we can live with 5) and
> return ~200 rows.

Since each table that is merged will have its own index, the speed should 
be proportional to the number of tables. Index scans in themselves are very 
fast, and if you have 30 tables you need 30 index scans.

Also, are you sure you really need merge tables? With pg having row locks
and mvcc, maybe you could go for a simpler model with just one big table.  
Often you can also combine that with partial indexes to get a smaller
index to use for lots of your queries.
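
A partial index is just an index with a WHERE clause, for example (column
and table names are made up):

  CREATE INDEX orders_open_idx ON orders (customer_id)
    WHERE status = 'open';

Queries that include the same condition can use this much smaller index.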

> Thoughts, suggestions?

I see nothing in what you have written that indicates that pg can not do 
the job, and do it well. It's however very hard to know exactly what is 
the bottleneck before one tries. There are lots of cases where people have 
converted mysql applications to postgresql and have gotten a massive 
speedup. You could be lucky and have such a case, who knows..

I spend some time each day supporting people using postgresql in the
#postgresql irc channel (on the freenode.net network). There I talk to
people doing both small and big conversions and the majority is very happy
with the postgresql performance. Postgresql has gotten faster and faster 
with each release and while speed was a fair argument a number of years 
ago it's not like that today.

That said, in the end it depends on the application.

We are all interested in how it goes (well, at least me :-), so feel free
to send more mails keeping us posted. Good luck.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] The usual sequential scan, but with LIMIT !

2004-09-06 Thread Dennis Bjorklund
On Mon, 6 Sep 2004, Pierre-Frédéric Caillaud wrote:

>   Why is it that way ? The planner should use the LIMIT values when  
> planning the query, should it not ?

And it does use the limit values; the estimated cost was lower when you had 
the limit.

What you need to do is to tune pg for your computer. For example the 
following settings:

 * effective_cache_size - this setting tells pg how much the os is caching
  (for example use top to find out during a normal work load). You said 
  that the tables fit in memory and by telling pg how much is cached it 
  might adjust its plans accordingly.

* random_page_cost - how expensive is a random access compared to seq. 
  access. This is dependent on the computer and disk system you have.
  If the setting above does not help, maybe you need to lower this
  variable to 2 or something.

And don't forget the shared_buffers setting. But most people usually have
it tuned in my experience (but usually too high). Here is an article that
might help you:

  http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Why those queries do not utilize indexes?

2004-08-27 Thread Dennis Bjorklund
On Fri, 27 Aug 2004, Artimenko Igor wrote:

> 1. Sequential search and very high cost if set enable_seqscan to on;
> Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 )
> 
> 2. Index scan but even bigger cost if set enable_seqscan to off;
> Index “messagesStatus” on messageinfo ( Cost=0.00..27220.72, rows=36802 )

So pg thinks that a sequential scan will be a little bit faster (The cost 
is a little bit smaller). If you compare the actual runtimes maybe you 
will see that pg was right. In this case the cost is almost the same so 
the runtime is probably almost the same.

When you have more data pg will start to use the index since then it will 
be faster to use an index compared to a seq. scan.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Optimizer Selecting Incorrect Index

2004-08-25 Thread Dennis Bjorklund
On Wed, 25 Aug 2004, Richard Huxton wrote:

> >  Index Scan using trptserc on trans  (cost=0.00..465.10 rows=44 width=118)
> >Index Cond: (trn_patno = 19)
> >Filter: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <=
> > '2004-08-23'::date) AND (trn_bill_inc = 'B'::bpchar))
> > (687 rows)
> 
> >  Index Scan using todate on trans  (cost=0.00..105165.74 rows=1 width=143)
> >Index Cond: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <=
> > '2004-08-23'::date))
> >Filter: ((trn_patno = 19) AND (trn_bill_inc = 'B'::bpchar))
> > (713 rows)
> 
> These queries are different. The first returns 687 rows and the second 
> 713 rows.

The 687 and 713 are the number of rows in the plan, not the number of rows 
the queries return.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Dennis Bjorklund
On Tue, 3 Aug 2004, Martin Foster wrote:

> to roughly 175 or more. Essentially, the machine seems to struggle 
> to keep up with continual requests and slows down respectively as 
> resources are tied down.

I suggest you try to find queries that are slow and check to see if the 
plans are optimal for those queries.

There are some logging options for logging queries that run longer than a 
user-set limit. That can help finding the slow queries. Just doing some 
logging for some typical page fetches often shows things that can be done 
better. For example, it's not uncommon to see the same information being 
pulled several times by mistake.

Maybe you can also try something like connection pooling. I'm not sure how
much that can give, but for small queries the connection time is usually
the big part.

> Would disabling 'fsync' provide more performance if I choose that
> information may be lost in case of a crash?

I would not do that. In most cases the performance increase is modest and
the data corruption risk after a crash is much bigger so it's not worth
it.

If you have a lot of small inserts then it might be faster with this, but
if possible it's much better to try to do more work in a transaction than 
before.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Query gets slow when where clause increases

2004-07-01 Thread Dennis Bjorklund
On Tue, 29 Jun 2004, James Antill wrote:

>->  Index Scan using idx_ticket_groups_assigned on ticket_groups g  
> (cost=0.00..241.76 rows=5 width=20) (actual time=0.13..12.67 rows=604 loops=1)
>  Index Cond: (assigned_to = 1540)

Here the planner estimated that it would find 5 rows, but it did find 604.  
I take that as a sign that you have not run VACUUM ANALYZE recently?

If you have done that, then maybe you need to change the statistics target for
that column. Before you set it on that column you could try to just alter
the default statistics target for one session like this:

SET default_statistics_target TO 100;
ANALYZE;

and then see if you get a better plan when you run the query afterwards.

If it helps you can either set the default_statistics_target in
postgresql.conf or set it just for some column using ALTER TABLE.
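
The per-column form would be something like this (using the column from
your plan), followed by a new ANALYZE:

  ALTER TABLE ticket_groups ALTER COLUMN assigned_to SET STATISTICS 100;
  ANALYZE ticket_groups;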

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] no index-usage on aggregate-functions?

2004-06-29 Thread Dennis Bjorklund
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote:

> > Average and sum can never use an index AFAIK, in any db server.  You 
> > need information from every row.
> 
> Take a look at the SQLSrv-pendant:
> create index x_1 on the_table (num_found)
> select avg(num_found) from the_table
> -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

But is it really faster? That is the question.

This sum needs all the values in that column. As far as I know it uses the
index because it uses less space on disk and thus is a little faster due
to less IO. In pg the index doesn't work like that, so in pg it's faster
to sum all values using the table itself.

If you have a WHERE clause to only sum some values, then pg will use an
index (if applicable) and you will see a speedup.

For min and max the situation is different, there an index can give you
the answer without scanning all rows. For that the workaround exists in pg.  
The pg aggregate functions are very general and no one has special cased
min/max yet. Until that happens the workaround works and is fast.
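
The workaround being roughly this (using the column from your example),
which can use an index on num_found:

  SELECT num_found FROM the_table ORDER BY num_found DESC LIMIT 1;  -- max
  SELECT num_found FROM the_table ORDER BY num_found ASC LIMIT 1;   -- min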

> So, it seems that PG is not so well suited for a datawarehouse and/or
> performing extensive statistics/calculations/reportings on large tables,
> is it?

I don't see how you can say that from your example. Just because it uses
an index for the sum above does not mean that it is a lot faster. It still 
has to do as many additions as pg has to do.

Sure, mvcc is best when you have both reads and writes. But it should still
be comparable in speed even if you only do reads.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Slow vacuum performance

2004-06-17 Thread Dennis Bjorklund
On Thu, 17 Jun 2004, Patrick Hatcher wrote:

> I have 6 million row table that I vacuum full analyze each night.  The time
> seems to be streching out further and further as I add more rows.  I read

You could try to run normal (non full) vacuum every hour or so. If you do 
normal vacuum often enough you probably don't need to run vacuum full at 
all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Quad processor options

2004-05-11 Thread Dennis Bjorklund
On Tue, 11 May 2004, Bjoern Metzdorf wrote:

> I am curious if there are any real life production quad processor setups 
> running postgresql out there. Since postgresql lacks a proper 
> replication/cluster solution, we have to buy a bigger machine.

Do you run the latest version of PG? I've read the thread but have not 
seen any information about what pg version. All I've seen was a reference 
to debian which might just as well mean that you run pg 7.2 (probably not 
but I have to ask).

Some classes of queries run much faster in pg 7.4 than in older versions
so if you are lucky that can help.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Dennis Bjorklund
On Fri, 30 Apr 2004, Gary Doades wrote:

> I should have also pointed out that MSSQL reported that same index scan
> as taking 65% of the overall query time. It was just "faster". The
> overall query took 103ms in MSSQL.

Are your results based on a single client accessing the database and no 
concurrent updates?

Would adding more clients, and maybe having some client that
updates/inserts into the tables, still make mssql faster than pg? Maybe
it's as simple as pg being optimized for more concurrent users than mssql?

I'm just asking, I don't know much about the inner workings of 
mssql.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Bruno Wolff III wrote:

> Another option would be to put the numbers into two int4s. For int4 or
> smaller types one of these would be zero. int8s would be split between
> the two. The hash function would then be defined on the two int4s.

Sure, this is an internal calculation in the hash function. The only 
important thing is that the number 7 (for example) gives the same hash 
value no matter if it is an int2 or an int8 and that the hash function 
works well also for int8 numbers (which it does not today).

At least those were the properties I understood that we wanted.

We got side tracked into talking about what datatype exists in all 
platforms, that's not an issue at all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Tom Lane wrote:

> > What do you mean? int8 is supported on all platformas
> 
> No it isn't.

So on platforms where it isn't you would use int4 as the biggest int then. 
I don't really see that as a problem. As long as you calculate the hash on 
the biggest int on that platform it should work.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Tom Lane wrote:

> That creates portability issues though.  We do not depend on there being
> a 64-bit-int type for anything except int8 itself, and I don't want to
> start doing so.

What do you mean? int8 is supported on all platforms and if the 
hash function would convert all numbers to int8 before making the hash it 
would work.

I don't see any portability problems.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Dennis Bjorklund
On Sat, 17 Apr 2004, Tom Lane wrote:

> *some* set of inputs.  (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)

The simple solution would be to always extend integers to 64 bits (or
whatever the biggest integer is) before calculating the hash. It makes the
hash function a little slower for smaller types, but it's mostly an
operation in the cpu and no memory involved, so it's probably not
noticeable.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] select count(*) on large tables

2004-04-08 Thread Dennis Bjorklund
On Thu, 8 Apr 2004, Cosimo Streppone wrote:

> The alternative solution I tried, that has an optimal
> speed up, unfortunately is not a way out, and it is based
> on "EXPLAIN SELECT count(*)" output parsing, which
> is obviously *not* reliable.

Try this to get the estimate:

   SELECT relname, reltuples from pg_class order by relname;

> The times always get better doing a vacuum (and eventually
> reindex) of the table, and they slowly lower again.

Yes, the estimate is updated by the analyze.

> Is there an estimate time for this issue to be resolved?

It's not so easy to "fix". The naive fixes make other operations slower,
most notably they make things less concurrent, which is bad since it won't
scale as well with many users.

You can always keep the count yourself and have some triggers that update 
the count on each insert and delete on the table. It will of course make 
all inserts and deletes slower, but if you count all rows often maybe it's 
worth it. Most people do not need to count all rows in a table anyway; you
usually count only the rows that match some condition.
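
A minimal sketch of that approach (table, function and trigger names are
made up, and it assumes plpgsql is installed; note that every insert and
delete then updates the same counter row, which is exactly the concurrency
cost mentioned above):

   CREATE TABLE mytable_count (n bigint NOT NULL);
   INSERT INTO mytable_count SELECT count(*) FROM mytable;

   CREATE FUNCTION mytable_count_upd() RETURNS trigger AS '
   BEGIN
     IF TG_OP = ''INSERT'' THEN
       UPDATE mytable_count SET n = n + 1;
     ELSE
       UPDATE mytable_count SET n = n - 1;
     END IF;
     RETURN NULL;
   END;
   ' LANGUAGE plpgsql;

   CREATE TRIGGER mytable_count_trig AFTER INSERT OR DELETE ON mytable
     FOR EACH ROW EXECUTE PROCEDURE mytable_count_upd();

   -- then, instead of SELECT count(*) FROM mytable:
   SELECT n FROM mytable_count;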

-- 
/Dennis Björklund




Re: [PERFORM] good pc but bad performance,why?

2004-04-07 Thread Dennis Bjorklund
On Wed, 7 Apr 2004, huang yaqin wrote:

> You said turning fsync off may cause losing data, that's terrible. I use
> SCSI disk, and file system is ext3. I tune postgresql.conf and can't get
> any improvement. So what can I do?

Make sure you do as much as possible inside one transaction. If you want 
to do 1000 inserts, then do BEGIN; insert ; insert; ... ; COMMIT;
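
A sketch of the difference (table and values made up):

   -- each statement is its own transaction; every row waits for its own
   -- disk flush
   INSERT INTO measurements VALUES (1, 42);
   INSERT INTO measurements VALUES (2, 43);

   -- grouped: the flush happens once, at COMMIT, for the whole batch
   BEGIN;
   INSERT INTO measurements VALUES (1, 42);
   INSERT INTO measurements VALUES (2, 43);
   -- ... a few hundred or thousand more ...
   COMMIT;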

-- 
/Dennis Björklund




Re: [PERFORM] simple query join

2004-03-07 Thread Dennis Bjorklund
On Mon, 8 Mar 2004, Chris Smith wrote:

>  assetid| integer | not null default 0

>  assetid| character varying(255) | not null default '0'

The types above do not match, and these are the attributes you use to
join.
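
A sketch of two ways out (table aliases are made up, and exact casting
rules vary between versions): either alter the schema so both assetid
columns really have the same type, or cast one side explicitly so the
comparison is done on matching types, for example the integer side to text:

   SELECT ...
     FROM tab_int a, tab_varchar b
    WHERE a.assetid::text = b.assetid;

In the long run it is usually cleaner to change one of the columns so the
types match.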

-- 
/Dennis Björklund




Re: [PERFORM] speeding up a select with C function?

2004-03-07 Thread Dennis Bjorklund
On Sun, 7 Mar 2004, David Teran wrote:

>   we need to optimize / speed up a simple select:
> 
> explain analyze select
> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
> from job_property t0, job_property t1
> where t0.id_job_profile = 5
> and t1.id_job_profile = 6
> and t1.id_job_attribute = t0.id_job_attribute
> and t1.int_value < t0.int_value;

Try to add an index on (id_job_profile, id_job_attribute) or maybe even 
(id_job_profile, id_job_attribute, int_value)
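
A sketch of the suggested index (the index name is made up):

   CREATE INDEX job_property_profile_attr
       ON job_property (id_job_profile, id_job_attribute, int_value);
   ANALYZE job_property;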

-- 
/Dennis Björklund




Re: [PERFORM] Fixed width rows faster?

2004-03-07 Thread Dennis Bjorklund
On Sat, 6 Mar 2004, Andrew Sullivan wrote:

> > places is to ensure that the column can be indexed. Postgres, it seems,
> > refuses to insert a string that is longer than some value into an
> > indexed column, and I'll rather have such errors flagged while inserting
> 
> Care to provide some details of this?  It sure sounds like a bug to
> me, if it's true.  I've never run into anything like this, though.

There is a limit on the size of values that can be indexed. I think it's
8k or something (a block, I assume). Something for someone with an itch to
fix in the future.

The error however comes when you try to insert the value. Doing a reindex
will not change the length of the value and will always work.

-- 
/Dennis Björklund




Re: [PERFORM] Fixed width rows faster?

2004-03-05 Thread Dennis Bjorklund
On Fri, 5 Mar 2004, Jeff wrote:

> Is there any performance advantage to using a fixed width row (with PG)?

As far as I know there is only a small win: when you want to extract some
field from a tuple with variable-width fields, you have to walk past the
preceding fields to find it. But this is a small performance problem unless
you have very many variable-size columns in the table.

> different databases and we have also some custom software here that 
> uses fixed width rows to be able to hit row N in O(1)

This cannot happen in pg since there is no row N. Every transaction can
have a different view of the table: some rows are visible and some others
are not. To find row N you have to walk from the start and inspect every
tuple to see whether it's visible to this transaction or not.

-- 
/Dennis Björklund




Re: [PERFORM] slow database

2004-02-11 Thread Dennis Bjorklund
On Wed, 11 Feb 2004 [EMAIL PROTECTED] wrote:

> somebody please knows to give tips to me to increase the
> performance

Run VACUUM ANALYZE. Find one query that is slow. Run EXPLAIN ANALYZE on
that query. Read the plan and figure out why it is slow. Fix it.

-- 
/Dennis Björklund




Re: [PERFORM] query optimization question

2004-01-30 Thread Dennis Bjorklund
On Thu, 29 Jan 2004, Jack Coates wrote:

> > Probably better to repost it as a gzip'd attachment.  That should
> 
> complete with a picture of the GUI version. 26k zipped, let's see if
> this makes it through.

Are you sure you attached it?

At least when it got here there was no attachment.

-- 
/Dennis Björklund




Re: [PERFORM] query optimization question

2004-01-30 Thread Dennis Bjorklund
On Thu, 29 Jan 2004, Tom Lane wrote:

> > jackdb-# GROUP BY memberid_ HAVING ( 
> 
> Um, that's not what I had in mind at all.  Does GROUP BY actually do
> anything at all here?  (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)

From your comment I assume that there is no transformation in pg that 
detects that the group by columns are unique?

> this is all blue-sky speculation anyway.  What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly.  I find that a bit hard to believe because I don't see any
> relevant optimization techniques.

Getting rid of the group by would not give that kind of speedup? Maybe
mssql manages to rewrite the query like that before executing it.

-- 
/Dennis Björklund




Re: [PERFORM] Really slow even after vacuum

2004-01-21 Thread Dennis Bjorklund
On Wed, 21 Jan 2004, Jeroen Baekelandt wrote:

> jms_messages again. It takes 80 seconds!?! While before, with 1000
> records, it took only a fraction of a second.

run: VACUUM FULL ANALYZE;

-- 
/Dennis Björklund




Re: [PERFORM] annoying query/planner choice

2004-01-11 Thread Dennis Bjorklund
On Sun, 11 Jan 2004, Andrew Rawnsley wrote:

> 20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything
> until I exceed 0.5, which strikes me as a bit high (though please
> correct me if I am assuming too much...). RANDOM_PAGE_COST seems to have
> no effect.

What about the effective cache size, is that set properly?

-- 
/Dennis Björklund




Re: [PERFORM] Explain not accurate

2004-01-11 Thread Dennis Bjorklund
On Fri, 9 Jan 2004, Richard van den Berg wrote:

> problems. However, I've run into an issue where explain tells us a the 
> costs of a quiry are tremendous (105849017586), but the query actually 
> runs quite fast. Even "explain analyze" shows these costs.

It would be helpful if you can show the query and the EXPLAIN ANALYZE of
the query (and not just EXPLAIN).

> This makes me wonder: can the estimates explain shows be dead wrong?

Of course they can. An estimation is just an estimation. If you have not
analyzed the database then it's most likely wrong. Dead wrong is not
common, but not impossible.

Run VACUUM ANALYZE and see if the estimate is better after that.

-- 
/Dennis Björklund




Re: [PERFORM] Slow query problem

2004-01-09 Thread Dennis Björklund
On Fri, 9 Jan 2004, Richard Huxton wrote:

> > > select invheadref, invprodref, sum(units)
> > > from invtran
> > > group by invheadref, invprodref
> 
> > For the above query, shouldn't you have one index for both columns
> > (invheadref, invprodref). Then it should not need to sort at all to do the
> > grouping and it should all be fast.
> 
> Not sure if that would make a difference here, since the whole table is being 
> read. 

The goal was to avoid the sorting which should not be needed with that 
index (I hope). So I still think that it would help in this case.

-- 
/Dennis Björklund




Re: [PERFORM] Slow query problem

2004-01-09 Thread Dennis Björklund
On Thu, 8 Jan 2004, Bradley Tate wrote:

> We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, 
> Redhat 9) to do some basic comparisons between postgresql and firebird 
> 1.0.3 and 1.5rc8. Mostly the results are comparable, with one 
> significant exception.
> 
> QUERY
> select invheadref, invprodref, sum(units)
> from invtran
> group by invheadref, invprodref
> 
> RESULTS
> pg 7.3.4  -  5.5 min
> pg 7.4.0  -  10 min
> fb 1.0.3   -  64 sec
> fb 1.5 -   44 sec
> 
> * The invtran table has about 2.5 million records, invheadref and 
> invprodref are both char(10) and indexed.

For the above query, shouldn't you have one index for both columns
(invheadref, invprodref). Then it should not need to sort at all to do the
grouping and it should all be fast.
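
That is, something like (the index name is made up):

   CREATE INDEX invtran_head_prod ON invtran (invheadref, invprodref);
   ANALYZE invtran;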

-- 
/Dennis Björklund




Re: [PERFORM] Question about difference in performance of 2 queries

2003-12-29 Thread Dennis Bjorklund
On Mon, 29 Dec 2003, Sean Shanny wrote:

> The first plan below has horrendous performance.  we only get about 2% 
> CPU usage and iostat shows 3-5 MB/sec IO.  The second plan runs at 30% 
> cpu and 15-30MB.sec IO. 
> 
> Could someone shed some light on why the huge difference in 
> performance?  Both are doing index scans plus a filter.  We have no 
> content_keys below -1 at this time so the queries return the same results.

EXPLAIN ANALYZE gives more information than EXPLAIN, and is preferred.

It uses different indexes in the two queries, and one seems to be 
faster than the other. Why, I can't tell yet.

I would assume that you would get the fastest result if you had an index 

   (content_key, date_key)

I don't know if pg will even use an index to speed up a <> operation. When 
you had > then it could use the idx_pageviews_content index. Why it chose 
that when the other would be faster I don't know. Maybe explain analyze 
will give some hint.
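
A sketch of both ideas (the table name is a placeholder, only the column
names come from the thread): a two-column index, plus rewriting the <>
filter as a range condition, which is possible since there are no
content_key values below -1:

   CREATE INDEX idx_pageviews_content_date
       ON pageviews (content_key, date_key);

   -- "content_key <> -1" becomes "content_key > -1", a condition a btree
   -- index can actually use (the date values are placeholders):
   SELECT ...
     FROM pageviews
    WHERE content_key > -1
      AND date_key BETWEEN 100 AND 130;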

-- 
/Dennis




Re: [PERFORM] Why is restored database faster?

2003-12-18 Thread Dennis Bjorklund
On Thu, 18 Dec 2003, Shridhar Daithankar wrote:

> Well, then the only issue left is file sytem defragmentation.

And the internal fragmentation that can be "fixed" with the CLUSTER 
command.

-- 
/Dennis




Re: [PERFORM] Why is restored database faster?

2003-12-16 Thread Dennis Bjorklund
On Tue, 16 Dec 2003, David Shadovitz wrote:

> I backed up my database using pg_dump, and then restored it onto a different 
> server using psql.  I see that the query "SELECT COUNT(*) FROM myTable" 
> executes immediately on the new server but takes several seconds on the old 
> one.  (The servers are identical.)
> 
> What could account for this difference?  Clustering?  How can I get the 
> original server to perform as well as the new one?

You probably need to run VACUUM FULL. It locks the tables during its 
execution so only do it when the database is not in full use.

If this helps you probably need to do normal vacuums more often and maybe
tune the max_fsm_pages to be bigger. 
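
Roughly, as a sketch (the table name and the number are placeholders, not
recommendations):

   -- one-off cleanup; it locks the table, so run it during quiet hours
   VACUUM FULL ANALYZE mytable;

and in postgresql.conf something like

   max_fsm_pages = 100000   # enough slots to track dead tuples between vacuums

followed by a restart so the new setting takes effect.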

-- 
/Dennis




Re: [PERFORM] a lot of problems with pg 7.4

2003-12-14 Thread Dennis Bjorklund
On Sat, 13 Dec 2003, Kari Lavikka wrote:

> I evaluated pg 7.4 on our development server and it looked just fine
> but performance with production loads seems to be quite poor. Most of
> performance problems are caused by nonsensical query plans

Some of the estimates that pg made in the plans you showed were way off. I
assume you have run VACUUM ANALYZE recently? If that does not help, maybe
you need to increase the statistics gathering on some columns so that pg
makes better estimates. With the wrong statistics it's not strange that pg 
chooses bad plans.
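
For example, to collect more detailed statistics on one column (the table,
column and target value are made up):

   ALTER TABLE calls ALTER COLUMN status SET STATISTICS 100;
   ANALYZE calls;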

-- 
/Dennis




Re: [PERFORM] Optimize

2003-11-24 Thread Dennis Bjorklund
On Mon, 24 Nov 2003, Torsten Schulz wrote:

> sort_mem = 32768 # min 32

32 meg per sort can be a lot in total if you have many clients sorting
things; sort_mem is per sort, so for example 100 concurrent sorts at 32 MB
each already add up to more than 3 GB. I assume you have checked that the
computer is not pushed into swapping when you hit the peak with lots of
users. A swapping computer is never fast.

Using some swap space is not bad, but a lot of page in and page out to the
swap is not good.

-- 
/Dennis




Re: [PERFORM] Help with count(*)

2003-11-15 Thread Dennis Bjorklund
On Fri, 14 Nov 2003, Tom Lane wrote:

> I believe the previous discussion also agreed that we wanted to postpone
> the freezing of now(), which currently also happens at BEGIN rather than
> the first command after BEGIN.

Or should that happen at the first call to now()?

/me should go back and try to find this previous discussion.

-- 
/Dennis




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Dennis Bjorklund
On Mon, 10 Nov 2003, Marc G. Fournier wrote:

> 
> explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> FROM company c, traffic_logs ts
>WHERE c.company_id = ts.company_id
>  AND month_trunc(ts.runtime) = '2003-10-01'
> GROUP BY company_name,ts.company_id;

What if you do

  ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'

and add an index like (runtime, company_name, company_id)?
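
Spelled out, that might look roughly like this (the index name is made up,
and since company_name lives in the company table only runtime and
company_id can go into the traffic_logs index; the plain range comparison
can use an ordinary index on runtime, which the month_trunc() expression
cannot unless there is a functional index on it):

   CREATE INDEX traffic_logs_runtime ON traffic_logs (runtime, company_id);

   SELECT ts.company_id, c.company_name, SUM(ts.bytes) AS total_traffic
     FROM company c, traffic_logs ts
    WHERE c.company_id = ts.company_id
      AND ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
    GROUP BY c.company_name, ts.company_id;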


-- 
/Dennis




Re: [PERFORM] One or more processor ?

2003-10-10 Thread Dennis Bjorklund
On Fri, 10 Oct 2003, [iso-8859-15] Hervé Piedvache wrote:

> A simple question about PostgreSQL ... I have a Pentium Xeon Quadri
> processors ... If I do a SQL request ... does PostgreSQL use one or more
> processor ?

Each connection becomes a process, and each process runs on one processor. 
So, with only one connection you use only one processor (and the OS might 
use another processor). Most databases have many concurrent users, and
then more processors will be used.

-- 
/Dennis




Re: [PERFORM] OFFTOPIC: PostgreSQL vs MySQL

2003-10-09 Thread Dennis Bjorklund
On Thu, 9 Oct 2003, David Griffiths wrote:

> > > "PostgreSQL supports constraints. MySQL doesn't; programmers need to
> > > take care of that from the client side"
> > > Again, InnoDB supports constraints.
> >
> > Really?  This is news.   We did some tests on constraints on InnoDB, and
> > found that while they parsed, they were not actually enforced.Was 
> > our test in error?
> 
> You may have turned them off to load data? I've run into constraints
> when my data-load script missed some rows in address_type. When it went
> to do the address_list table, all rows that had the missing address_type
> failed, as they should. I saw no weakness in the constraints.

It sounds like you talk about foreign keys only, while the previous writer 
talks about other constraints as well. For example, in postgresql you 
can do:

CREATE TABLE foo (
  x int,

  CONSTRAINT bar CHECK (x > 5)
);

and then

# INSERT INTO foo VALUES (4);
ERROR:  ExecInsert: rejected due to CHECK constraint "bar" on "foo"


I don't know MySQL, but I've got the impression from other posts on the
lists that innodb supports foreign keys only. I might be wrong though.

-- 
/Dennis




Re: [PERFORM] restore time: sort_mem vs. checkpoing_segments

2003-09-24 Thread Dennis Bjorklund
On Tue, 23 Sep 2003, Bruce Momjian wrote:

> With the new warning about too-frequent checkpoints, people have actual
> feedback to encourage them to increase checkpoint_segments.  One issue
> is that it is likely to recommend increasing checkpoint_segments during
> restore, even if there is no value to it being large during normal
> server operation.  Should that be decumented?

One could have a variable that turns off that warning, and have pg_dump
insert a statement to turn it off. That is, if one never wants these
warnings from a restore (from a new dump).

In any case, documentation is good and still needed.
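
For reference, the knob the quoted text is talking about lives in
postgresql.conf; the value below is only an example and can be lowered
again after the restore:

   checkpoint_segments = 30   # up from the default of 3; checkpoints happen
                              # less often during the bulk load, at the cost
                              # of more WAL disk space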

-- 
/Dennis




Re: [PERFORM] LIKE query running slow

2003-09-24 Thread Dennis Bjorklund
On Tue, 23 Sep 2003, Garrett Bladow wrote:

> Recently we upgraded the RAM in our server. After the install a LIKE
> query that used to take 5 seconds now takes 5 minutes. We have tried the
> usual suspects, VACUUM, ANALYZE and Re-indexing.

If you mean that you reinstalled postgresql then it's probably because you
previously ran the database with the "C" locale but now run it with
something else.

If all you did was to install the extra memory then I don't see how that
can affect it at all (especially so if you have not altered
postgresql.conf to make use of more memory).

-- 
/Dennis




Re: [PERFORM] Reading data in bulk - help?

2003-09-10 Thread Dennis Bjorklund
On Wed, 10 Sep 2003, Chris Huston wrote:

> Sometime before then I need to figure out how to cut the fetch times 
> from the now 200ms to something like 10ms.

You didn't say anything about Josh's first point of adjusting
postgresql.conf to match your machine. Settings like effective_cache_size
you almost always want to increase from the default setting, and the same
goes for the shared memory settings.
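
For instance, something in this direction in postgresql.conf (the numbers
are placeholders that assume roughly 1 GB of RAM, not recommendations):

   # both values are in 8 kB pages in this release
   shared_buffers = 16384          # roughly 128 MB of shared memory
   effective_cache_size = 65536    # roughly 512 MB, about what the OS caches

effective_cache_size only tells the planner how much the OS is likely to
cache, while shared_buffers actually allocates shared memory, so the
kernel's shared memory limit may need to be raised to match.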

-- 
/Dennis




Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Dennis Björklund
On Tue, 26 Aug 2003, Bill Moran wrote:

> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.

What version of pg was used in debian and redhat? For freebsd the page
says it's 7.2.4, but I see nothing about the other two. The version
that comes with Redhat 9 (Shrike) is 7.3.2.

-- 
/Dennis




[PERFORM] Please scan your computer

2003-08-28 Thread Dennis Gearon
Someone who has my:

   [EMAIL PROTECTED]

email address has an infected computer, infected with the SoBig.F virus. 
I'm getting 200+ infected emails a day from that person(s).

Go to this site and do a free online virus scan. It's safe, and done by 
one of the two top virus scanning companies in the world. I've done it 
several times.

http://housecall.antivirus.com/

--
Dennis Gearon


Re: [PERFORM] PostgreSQL performance problem -> tuning

2003-08-14 Thread Dennis Björklund
On Thu, 7 Aug 2003, Richard Huxton wrote:

> But this parameter controls how much memory can be allocated to sorts - I 
> don't see how PG can figure out a reasonable maximum by itself.

One could have one setting for the total memory usage and pg could use
statistics or some heuristics to use the memory for different things in a 
good way.

Then that setting could have an auto setting so it uses 40% of all memory 
or something like that. Not perfect but okay for most people.

-- 
/Dennis




Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Dennis Björklund
On Tue, 12 Aug 2003, mixo wrote:

> that I am currently importing data into Pg which is about 2.9 Gigs.
> Unfortunately, to maintain data intergrity, data is inserted into a table
> one row at a time.'

So you don't put a number of inserts into one transaction?

If you don't do that, then postgresql will treat each command as its own
transaction, and each insert has to be forced out to disk before the
command returns (returning while the data is only in some cache is not
safe, even if other products might do that). If it did not work like this,
the server could promise the client that the row had been stored, then go
down, and the row that was only in the cache would be lost. That would be
much faster, but not what you expect from a real database.

So, group the inserts in transactions with maybe 1000 commands each. It 
will go much faster. It can then cache the rows and in the end just make 
sure all 1000 have been written out on disk.

There is also a configuration variable that can tell postgresql to not 
wait until the insert is out on disk, but that is not recommended if you 
value your data.

And last, why does it help integrity to insert data one row at a time?

-- 
/Dennis




Re: [PERFORM] Poor delete performance AFTER vacuum analyze

2003-07-19 Thread Dennis Björklund
On Sat, 19 Jul 2003, Jeremy M. Guthrie wrote:

> 100megs of new data each day.  However, the instant the system finishes only 
> a 'vacuum analyze', the whole thing slows WAY down to where each run can take 
> 10-15 minutes.

Have you run EXPLAIN ANALYZE on the delete query before and after the 
vacuum? Does it explain why it goes slower?

-- 
/Dennis




Re: [PERFORM] index / sequential scan problem

2003-07-18 Thread Dennis Björklund
On Fri, 18 Jul 2003, Tom Lane wrote:

> >> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.
> 
> > Doesn't sound very good and it will most likely make other queries slower.
> 
> Seems like a reasonable approach to me --- certainly better than setting
> random_page_cost to physically nonsensical values.

Hehe, just before this letter there was talk about changing
random_page_cost. I kind of responded that 0.042 is not a good random page
cost. But now of course I can see that it says cpu_tuple_cost :-)

Sorry for adding confusion.

-- 
/Dennis




Re: [PERFORM] index / sequential scan problem

2003-07-17 Thread Dennis Björklund
On Fri, 18 Jul 2003, Fabian Kreitner wrote:

> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.

Doesn't sound very good and it will most likely make other queries slower.
You could always turn off sequential scan before that query and turn it on
after.
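
That is, for that one session only:

   SET enable_seqscan = off;
   SELECT ... ;              -- the query that should use the index
   SET enable_seqscan = on;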

> Anything I need to consider when raising it to such "high" values?

You could fill the table with more data and it will probably come to a 
point where it will stop using the seq. scan.

You could of course also change pg itself so it calculates a better
estimate.

-- 
/Dennis




Re: [PERFORM] Extreme high load averages

2003-07-07 Thread Dennis Björklund
On Sun, 6 Jul 2003, Martin Foster wrote:

> The processor seems to be purposely sitting there twiddling it's thumbs. 
>   Which leads me to believe that perhaps the nice levels have to be 
> changed on the server itself?

It could also be all the usual things that affect performance. Are your
queries using indexes where they should? Do you vacuum analyze after you
have updated/inserted a lot of data?

It could be that some of your queries are not as efficient as they should
be, like doing a sequential scan over a table instead of an index scan.
That translates into more IO and slower response times, especially when
you have more connections fighting for the available IO.

-- 
/Dennis

