[PERFORM] Some very weird behaviour....

2003-07-09 Thread Chris Bowlby
Hi All, 

 I'm sure some of you know me from previous questions on other lists,
but this one has myself and Marc completely stumped. We've got a
database that has about 89 Million rows, under PostgreSQL 7.3.3 on a
dual PIII 1.2 with 4 GBytes of RAM on a 5 disk RAID 5 array. The dataset
itself is about 26+ GBytes in size, all of it in the one table. 

 To give you some perspective on the size of the dataset and the
performance level we are hitting, here are some good results based on
some explains:

jnlstats=# explain analyze select count(*) from some_table where
some_time::date='2003-05-21';
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=3013.55..3013.55 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=48.40..2721.26 rows=249837 loops=1)
         Index Cond: ((some_time)::date = '2003-05-21'::date)
 Total runtime: 3015.02 msec
(4 rows)

jnlstats=# explain analyze select count(*) from stats_raw where
some_time::date='2003-05-21';
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=1401.23..1401.23 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=0.50..1118.92 rows=249837 loops=1)
         Index Cond: ((some_time)::date = '2003-05-21'::date)
 Total runtime: 1401.42 msec

 There are about 249837 items that the query identifies as valid
results, and the runtimes range between 1 and 1.4 seconds over ten runs,
with the initial query taking 3 seconds. This average is how 90% of the
queries respond, but we've got several peaks that we can not explain in
any way. For instance:

jnlstats=# explain analyze select count(*) from some_table where
some_time::date='2003-05-26';
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=307025.65..307025.65 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=51.05..306256.93 rows=374540 loops=1)
         Index Cond: ((some_time)::date = '2003-05-26'::date)
 Total runtime: 307025.81 msec
(4 rows)

jnlstats=# explain analyze select count(*) from some_table where
some_time::date='2003-05-26';
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=10837.86..10837.86 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=1.01..10304.78 rows=374540 loops=1)
         Index Cond: ((some_time)::date = '2003-05-26'::date)
 Total runtime: 10838.04 msec

 The total number of items counted is 374540, so not too much more
than the previous query, but the 300-second runtime was unexpected (we
were expecting ~4-5 seconds, and then ~1-2 seconds for the cached
results). I have 5 other dates that all exhibit this behaviour, but it's
ONLY those dates that run that slow, and the one I presented above
is the largest of them all. The database server is configured with a 5
MByte shared memory buffer, but even a larger shared memory buffer does
not help (we have had it set to 800 MBytes before). The disk is getting
hit the heaviest during that last query, with iostat results being:

  tty             da0              da1              da2             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   25 12.22   1  0.02   7.68   0  0.00   7.68   0  0.00   0  0  0  0 99
   4 3758  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   6  0 38  0 56
   0  151  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   9  0 43  0 48
   0  148  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  10  0 40  0 49
   0  153  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  10  0 40  0 49
   0  152  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  10  

Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Kaarel
Are you willing to say that the PostgreSQL database system should only be
used by DBAs?  I believe that Postgres is such a good and useful tool that
anyone should be able to start using it with little or no barrier to entry.


I quite agree.  But there is a difference between saying you should get
decent performance with no effort and you should get optimal
performance with no effort.  I think we can get to the first with
relatively little trouble (like boosting the default shared_buffers to
1000), but the second is an impractical goal.


Just wanted to repeat some of the thoughts that have already been expressed.

There is no reason why PostgreSQL shouldn't be reasonably well 
configured for a particular platform out of the box. Not for maximum 
performance, but for good-enough performance. The many complaints by new 
users about PostgreSQL being surprisingly slow, and all the standard 
answers (vacuum, pump up the memory settings), imho prove that the 
default installation can be improved. Already mentioned on the mailing 
lists: using multiple standard conf files, querying system info and 
dynamically generating all or some parts of the conf file, automating 
the vacuum process...
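The "multiple standard conf files" idea could be as small as a selector like
the following sketch (the profile names and file layout are hypothetical,
not an existing PostgreSQL feature):

```shell
#!/bin/sh
# Hypothetical sketch: map a workload profile to one of several shipped
# postgresql.conf variants; unknown profiles fall back to the default.
select_profile() {
    case $1 in
        small|medium|large) echo "postgresql.conf.$1" ;;
        *)                  echo "postgresql.conf.default" ;;
    esac
}

select_profile large    # -> postgresql.conf.large
```

The installer would then copy the chosen file into the data directory in
place of the single one-size-fits-all default.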

Kaarel

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread scott.marlowe
On Wed, 9 Jul 2003, Kaarel wrote:

 Are you willing to say that the PostgreSQL database system should only be
 used by DBAs?  I believe that Postgres is such a good and useful tool that
 anyone should be able to start using it with little or no barrier to entry.
  
  
  I quite agree.  But there is a difference between saying you should get
  decent performance with no effort and you should get optimal
  performance with no effort.  I think we can get to the first with
  relatively little trouble (like boosting the default shared_buffers to
  1000), but the second is an impractical goal.
 
 
 Just wanted to repeat some of the thoughts that have already been expressed.
 
 There is no reason why PostgreSQL shouldn't be reasonably well 
 configured for a particular platform out of the box. Not for maximum 
 performance, but for good-enough performance. The many complaints by new 
 users about PostgreSQL being surprisingly slow, and all the standard 
 answers (vacuum, pump up the memory settings), imho prove that the 
 default installation can be improved. Already mentioned on the mailing 
 lists: using multiple standard conf files, querying system info and 
 dynamically generating all or some parts of the conf file, automating 
 the vacuum process...

It would be nice to have a program that could run on any OS postgresql 
runs on and could report on the current limits of the kernel, and make 
recommendations for changes the admin might want to make.

One could probably make a good stab at effective cache size during 
install.  Anything reasonably close would probably help.

Report what % of said resources could be consumed by postgresql under 
various circumstances...
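A minimal sketch of such a checker (the knob names, paths, and the
shared_buffers target below are illustrative assumptions, not a real tool):

```shell
#!/bin/sh
# Hypothetical sketch: compare the kernel's shared-memory ceiling against
# what a desired shared_buffers setting would need, and report.
check_shmmax() {
    # $1 = kernel shmmax in bytes, $2 = desired shared_buffers in 8K pages
    if [ "$1" -lt $(( $2 * 8192 )) ]; then
        echo "too low"
    else
        echo "ok"
    fi
}

# Linux exposes the limit in /proc; the BSDs via `sysctl -n kern.ipc.shmmax`.
if [ -r /proc/sys/kernel/shmmax ]; then
    check_shmmax "$(cat /proc/sys/kernel/shmmax)" 1000
fi
```

The same pattern extends to max processes, file descriptors, and the other
limits an admin would otherwise discover one failed start at a time.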


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Scott Marlowe  wrote:
It would be nice to have a program that could run on any OS postgresql 
runs on and could report on the current limits of the kernel, and make 
recommendations for changes the admin might want to make.

One could probably make a good stab at effective cache size during 
install.  Anything reasonably close would probably help.

Report what % of said resources could be consumed by postgresql under 
various circumstances...

One of the issues that automating the process would encounter is kernel 
limits that are too low for PostgreSQL to handle. The BSDs come 
to mind, where values need to be manually increased in the kernel before 
you can reach a reasonable maximum connection count.

Another example is how OpenBSD will outright crash when trying to test 
the database at install time.   It seems that most of the tests fail 
because the maximum number of processes allowed is too low for the test 
to succeed, while FreeBSD works just fine on those same tests.

If PostgreSQL automates the configuration, that would be a plus.   But it 
should also detect the platform and inform the person that certain changes 
should be made to the kernel, sysctl or whatever in order to have that 
configuration run.

Perl may be useful here for a few reasons.   It's portable enough to 
run on multiple Unix variants and the tools would be fairly standard, so 
the code would require fewer considerations for more exotic implementations.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Sean Chittenden
 I don't have much to add because I'm pretty new to Postgres and have
 been soliciting advice here recently, but I totally agree with
 everything you said.  I don't mind if it's in the postgres.conf file
 or in a faq that is easy to find, I just would like it to be in one
 place.  A good example of the need for this is when I was tuning
 effective_cache I thought that was creating a cache for Postgres
 when in fact as it was pointed out to me, it's just hinting to
 postgres the size of the OS cache.  Lots of ways for people to get
 really confused here.

I looked through the src/doc/runtime.sgml for a good place to stick
this and couldn't find a place that this seemed appropriate, but on
FreeBSD, this can be determined with a great deal of precision in a
programmatic manner:

echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same sysctl OID is available via C too.  It'd be slick if PostgreSQL
could tune itself (on FreeBSD) at initdb time with the above code.  If
Linux exports this info via /proc and can whip out the appropriate
magic, even better.  A good guess that shows up uncommented in
postgresql.conf would be stellar, and quite possible with the use of
sed.

Maybe an initdb switch could be added to have initdb tune the config
it generates?  If a -n is added, have it generate a config and toss it
to stdout?


case `uname` in
FreeBSD)
    echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
    ;;
*)
    echo "Unable to automatically determine the effective cache size" >> /dev/stderr
    ;;
esac
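For the Linux branch there is no direct vfs.hibufspace analogue; a sketch
under assumptions would be to guess from MemTotal in /proc/meminfo (the
half-of-RAM fraction here is an arbitrary illustration, not a recommendation):

```shell
#!/bin/sh
# Hypothetical sketch: derive an effective_cache_size guess on Linux by
# taking half of physical RAM (MemTotal from /proc/meminfo, in kB) as the
# OS cache estimate and converting it to 8K pages.
guess_cache_pages() {
    # $1 = MemTotal in kB
    echo $(( $1 * 1024 / 2 / 8192 ))
}

# e.g. a 4 GB machine:
echo "effective_cache_size = $(guess_cache_pages 4194304)"   # -> effective_cache_size = 262144
```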


-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Sean Chittenden wrote:
I looked through the src/doc/runtime.sgml for a good place to stick
this and couldn't find a place that this seemed appropriate, but on
FreeBSD, this can be determined with a great deal of precision in a
programmatic manner:
echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too.  It'd be slick if PostgreSQL
could tune itself (on FreeBSD) at initdb time with the above code.  If
Linux exports this info via /proc and can whip out the appropriate
magic, even better.  An uncommented out good guess that shows up in
postgresql.conf would be stellar and quite possible with the use of
sed.
Maybe an initdb switch could be added to have initdb tune the config
it generates?  If a -n is added, have it generate a config and toss it
to stdout?
case `uname` in
FreeBSD)
    echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
    ;;
*)
    echo "Unable to automatically determine the effective cache size" >> /dev/stderr
    ;;
esac
-sc

Simplest way may be to create an 'auto-tune' directory with scripts for 
supported platforms.   When postgres installs the databases, it checks 
for 'tune.xxx' and, if found, uses that to generate the configuration itself?

This would allow for defaults on platforms that do not have them and 
optimization for those that do.
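That dispatch could look something like the following (the directory layout
and 'tune.<platform>' naming are hypothetical, per the proposal above):

```shell
#!/bin/sh
# Hypothetical sketch: run auto-tune/tune.<platform> if one is shipped,
# otherwise fall back to the stock defaults.
run_tuner() {
    # $1 = directory holding per-platform tune.<uname> scripts
    p=$(uname)
    if [ -x "$1/tune.$p" ]; then
        "$1/tune.$p"
    else
        echo "no tuner for $p, keeping defaults"
    fi
}

run_tuner auto-tune
```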

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] plpgsql vs. SQL performance (again)

2003-07-09 Thread Michael Pohl
About a month ago I asked the general list about plpgsql functions that
occasionally significantly underperform their straight SQL equivalents.  
Tom noted that a different query plan was almost certainly being chosen by
the plpgsql function:

http://archives.postgresql.org/pgsql-general/2003-05/msg00966.php
http://archives.postgresql.org/pgsql-general/2003-05/msg00998.php

Tom suggested checking for sloppy datatype declarations in the plpgsql 
functions.  Double-checked, a-ok.

Tom also suggested that indexscans might not get picked by the plpgsql
function if I have some very skewed statistics.  Is there a way to verify
the plpgsql function's planner choices?
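One way I could try to approximate it myself, assuming 7.3's
PREPARE/EXECUTE plans with parameters the way plpgsql reportedly does
(table and column names below are placeholders from my schema):

```sql
-- Hedged sketch: plpgsql plans its queries with parameters rather than
-- literal constants, so a prepared statement should show a comparable plan.
PREPARE agg_plan (date) AS
    SELECT count(*) FROM some_table WHERE some_time::date = $1;

-- Compare this generic plan against the literal-constant EXPLAIN:
EXPLAIN ANALYZE EXECUTE agg_plan ('2003-05-26');
```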

My casual observations are that this problem occurs with aggregates, and
that the big performance hit is not consistent.  I'd like advice on more
formal troubleshooting.

I can provide examples (my latest problem function is currently taking
over 4 seconds vs. .04 seconds for its straight SQL equivalent), table
schema, explain output for the straight SQL, etc., if anyone cares to work
through this with me.

thanks,

michael


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-09 Thread Martin Foster
Dennis Björklund wrote:

On Sun, 6 Jul 2003, Martin Foster wrote:


The processor seems to be purposely sitting there twiddling its thumbs, 
which leads me to believe that perhaps the nice levels have to be 
changed on the server itself?


It could also be all the usual things that affect performance. Are your 
queries using indexes where they should? Do you vacuum analyze after you 
have updated/inserted a lot of data?

It could be that some of your queries are not as efficient as they should be, 
like doing a sequential scan over a table instead of an index scan. That 
translates into more IO needed and slower response times, especially when 
you have more connections fighting for the available IO.

I actually got a bit more respect for PostgreSQL tonight.  It seems that 
one of my scripts was not committing changes after maintenance was 
conducted.  Meaning that rows that would normally be removed after 
offline archiving was completed were in fact still around.

Normally at any given point in time this table would grow by 50K rows 
during a day, be archived that night, and then lose rows that were no 
longer needed.   This process is what allowed MySQL to maintain any 
stability, as the size of this table can balloon significantly.

PostgreSQL, with tweaking, was handling a table with nearly 300K rows. 
That size alone would have dragged the MySQL system down to a near grind, 
and since most of those rows were not needed, one can imagine that 
queries were needlessly processing rows that should have been outright 
ignored.

This probably explains why row numbering based searches greatly 
accelerated the overall process.

By fixing the script and doing the appropriate full vacuum and re-index, 
the system is behaving much more like it should.  Even if the process 
may seem a bit odd to some.

The reason for removing rows on a daily basis is the perishable 
nature of the information.  Since this is a chat site, posts over a day 
old are rarely needed for any reason, which is why they are archived 
into dumps in case we really need to retrieve the information.  This 
gives us the added bonus of smaller backup sizes and smaller 
database sizes.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly