Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-11 Thread mljv
Hi all, dear Richard,

your mail about my configuration parameters was the right hint, but I am still
struggling with the problem. I would appreciate it if you or somebody else could
help me even further.

After some investigation I have some new findings on my problem. The following
query, my most important one, is not working as it should:

# select * from SpielTipp natural join Tippspiel 
# where tippspieltag_id=1254056;

- Spieltipp has about 80,000,000 records
- Tippspiel has about 10,000,000 records
- random_page_cost = 3

Both tables have indexes, of course, so there should be no seq scan in use, but
the planner chooses a sequential scan:

# explain analyze 
# select * from SpielTipp natural join Tippspiel 
# where tippspieltag_id = 817372;

                                    QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=333.93..1647646.04 rows=1383 width=28) (actual time=104193.150..104193.150 rows=0 loops=1)
   Hash Cond: ("outer".tippspiel_id = "inner".tippspiel_id)
   ->  Seq Scan on spieltipp  (cost=0.00..1253846.52 rows=78690352 width=16) (actual time=10.355..69195.235 rows=78690348 loops=1)
   ->  Hash  (cost=333.44..333.44 rows=198 width=16) (actual time=44.821..44.821 rows=9 loops=1)
         ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel  (cost=0.00..333.44 rows=198 width=16) (actual time=44.798..44.809 rows=9 loops=1)
               Index Cond: (tippspieltag_id = 817372)
 Total runtime: 104193.209 ms
(7 rows)

Just to see how wrong the plan is, I disabled seq scans:

# set enable_seqscan to off;
# explain analyze 
# select * from SpielTipp natural join Tippspiel 
# where tippspieltag_id = 817372;

                                    QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..1682807.57 rows=1383 width=28) (actual time=0.186..0.186 rows=0 loops=1)
   ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel  (cost=0.00..333.44 rows=198 width=16) (actual time=0.072..0.082 rows=9 loops=1)
         Index Cond: (tippspieltag_id = 817372)
   ->  Index Scan using ix_spieltipp_tippspiel_id on spieltipp  (cost=0.00..8458.83 rows=3081 width=16) (actual time=0.010..0.010 rows=0 loops=9)
         Index Cond: (spieltipp.tippspiel_id = "outer".tippspiel_id)
 Total runtime: 0.232 ms
(6 rows)


Now I tried to lower random_page_cost:

# set enable_seqscan to on;
# set random_page_cost to 1.5;
# explain analyze 
# select * from SpielTipp natural join Tippspiel 
# where tippspieltag_id = 817372;

                                    QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..731643.62 rows=1383 width=28) (actual time=0.089..0.089 rows=0 loops=1)
   ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel  (cost=0.00..146.17 rows=198 width=16) (actual time=0.017..0.024 rows=9 loops=1)
         Index Cond: (tippspieltag_id = 817372)
   ->  Index Scan using ix_spieltipp_tippspiel_id on spieltipp  (cost=0.00..3655.92 rows=3081 width=16) (actual time=0.005..0.005 rows=0 loops=9)
         Index Cond: (spieltipp.tippspiel_id = "outer".tippspiel_id)
 Total runtime: 0.135 ms
(6 rows)


Looks fine at first glance, but the total estimated cost of 731643 is still
far too high, right?

So this is what happened on my production server with random_page_cost = 3:
the estimated costs of a seq scan and an index scan are not very different, so
sometimes the planner uses a sequential scan after a fresh ANALYZE and
sometimes not, because the statistics vary across ANALYZE runs. That is why my
plans changed after the nightly ANALYZE run on the database.
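To see why random_page_cost = 3 puts the two plans so close together, the planner's arithmetic can be sketched very roughly in a few lines. This is a simplified model only (the real planner also weighs cpu_index_tuple_cost, correlation, and caching effects), and the page/row figures are loose readings of the EXPLAIN output above:

```python
# Crude sketch of the cost comparison the planner faces (not the real
# cost model, just its shape). Constants are PostgreSQL defaults.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01

def seq_scan_cost(pages, rows):
    # Sequential scan: read every page once, process every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(fetched_rows, random_page_cost):
    # Crude assumption: roughly one random page fetch per fetched row.
    return fetched_rows * random_page_cost

# Loose figures from the plans above: spieltipp is ~470k pages and
# ~78.7M rows; the planner over-estimates 3081 matches per outer row
# across 198 outer rows.
pages, rows = 470_000, 78_690_352
fetched = 3081 * 198

for rpc in (1.5, 3.0):
    print(f"rpc={rpc}: seq={seq_scan_cost(pages, rows):.0f} "
          f"index={index_scan_cost(fetched, rpc):.0f}")
```

With these rough numbers the index plan wins at random_page_cost = 1.5 but loses at 3.0, which is exactly the knife-edge behaviour described: small statistical shifts from one ANALYZE run to the next can flip the plan.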

As I have 200-240 connections at peak time, my conclusion after reading your
advice and the annotated postgresql.conf is:

max_connections = 250
shared_buffers = 200000 # 1.6 GB = 20% of avail. RAM
work_mem = 20000
maintenance_work_mem = 160000
effective_cache_size = 600000 # 4.8 GB = 60% of avail. RAM
random_page_cost = 2
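As a sanity check, the page-denominated settings above can be translated to bytes (assuming shared_buffers and effective_cache_size are counts of 8 kB pages, as they are in this PostgreSQL era; 200000 and 600000 pages are example values matching the GB figures stated):

```python
BLOCK_SIZE = 8 * 1024  # PostgreSQL's default block size: 8 kB

def pages_to_bytes(pages):
    # shared_buffers and effective_cache_size are counted in 8 kB
    # pages, not bytes, in PostgreSQL 8.1.
    return pages * BLOCK_SIZE

# e.g. ~1.6 GB of shared buffers and ~4.9 GB of effective cache:
print(pages_to_bytes(200_000) / 1e9)
print(pages_to_bytes(600_000) / 1e9)
```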

Are those settings reasonable for my box?
my box is:
- dedicated
- AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
- 3ware RAID 1 Controller with two rather cheap SATA disks
- 8 GB RAM

kind regards 
Janning

On Tuesday, 8 April 2008 at 17:40, Richard Huxton wrote:
  Here are some of our configuration parameters. We never really tweaked it
  as it ran fine. We just raised some 

Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-09 Thread mljv
On Tuesday, 8 April 2008 at 18:38, Scott Marlowe wrote:
 It sounds to me like two possible problems, maybe combined.

 One possibility is that you have a data distribution that results in
 statistics being gathered that don't really represent your data.  Try
 increasing the stats target for that column (or the whole db if you'd
 rather) and re-running analyze.

 The other possibility is that you've got some index bloat happening
 and you might benefit from  reindexing the problematic table.

But if I do VACUUM ANALYZE without concurrent queries, everything runs fine
afterwards.

If I run VACUUM ANALYZE with a few concurrent queries, it slows down to a
crawl.

Could it be that something like this is happening:
- In the early morning a new DB connection is opened.
- While VACUUM ANALYZE is running, the planner chooses different plans because
some index cannot be used, or the statistics are momentarily not present
because they are being updated. So the query gets a wrong plan: it uses a
seq scan instead of an index scan.
- This wrongly planned statement is prepared, so even after VACUUM ANALYZE is
done, the statement keeps using the wrong plan.
- Rising load triggers many concurrent queries with wrong plans, so everything
slows down.
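The failure mode described in the last two points, a plan chosen once and then reused for the life of the session, is essentially a cache without invalidation. A minimal illustration (purely illustrative Python, not PostgreSQL internals):

```python
class NaivePlanCache:
    """Pre-8.3 prepared statements in miniature: the first plan chosen
    is reused for the session's lifetime, even after statistics change."""

    def __init__(self, planner):
        self._planner = planner  # callable: sql -> plan description
        self._plans = {}

    def plan(self, sql):
        if sql not in self._plans:           # plan once...
            self._plans[sql] = self._planner(sql)
        return self._plans[sql]              # ...reuse forever

# First call happens mid-ANALYZE and picks a bad plan:
choices = iter(["seqscan", "indexscan"])
cache = NaivePlanCache(lambda sql: next(choices))
print(cache.plan("SELECT ..."))  # seqscan
# Later, fresh statistics would favour an index scan, but the stale
# plan sticks until the session (and its cache) is thrown away:
print(cache.plan("SELECT ..."))  # still seqscan
```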

kind regards,
janning


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-09 Thread David Wilson
On Wed, Apr 9, 2008 at 3:29 AM,  [EMAIL PROTECTED] wrote:

  But if i do VACUUM ANALYZE without concurrent queries, everything runs fine
  afterwards.

  If i run  VACUUM ANALYZE with few concurrent queries, it slows down to a
  crawl.

  Could it be that something like this is happening:
  - In the early morning a new DB connection is opened.
  - While running VACUUM ANALYZE the planner uses different plans because some
  index could not be used or the statistics are right in that moment not
  present because they are updated... So the query gets a wrong plan. It uses a
  seqScan instead of an index scan.
  - This wrongly planned statement is prepared so even after VACUUM ANALYZE is
  done, the statement keeps using the wrong plan.
  - Rising load triggers many concurrent queries with wrong plans, so
  everything slows down.


I've run into a very similar problem. I have some long-running processes that
generate a large amount of data and then query it; they must periodically drop
their connections and reconnect to ensure that query plans get re-planned
according to updated statistics. This is especially true when a new DB is
first being brought up with an initial set of data (~1 week of compute time,
~30 GB of data and ~120M rows) by processes that live through the entire build.

My solution, as mentioned above, is to rebuild the connection approximately
every hour in my long-running processes. It is a tricky trade-off, because you
want the benefit of not re-planning queries in 99.9% of cases. I'm not really
opposed to the current system that requires the reconnections, given that
their overhead is completely negligible in the grand scheme of things. There
are downsides to not re-planning queries, and if you can't live with them you
should either force re-planning at intervals or avoid prepared statements.
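The hourly-reconnect policy can be sketched independently of any particular driver; `connect_fn` below is a placeholder for whatever connect call your DB library provides:

```python
import time

class ReconnectingConnection:
    """Rebuilds the underlying connection after max_age seconds, so a
    fresh backend re-plans prepared statements with current statistics."""

    def __init__(self, connect_fn, max_age=3600.0):
        self._connect_fn = connect_fn
        self._max_age = max_age
        self._conn = connect_fn()
        self._born = time.monotonic()

    def get(self):
        # Transparently swap in a new connection once the old one ages out.
        if time.monotonic() - self._born > self._max_age:
            try:
                self._conn.close()
            finally:
                self._conn = self._connect_fn()
                self._born = time.monotonic()
        return self._conn
```

Callers fetch the connection through `get()` before each batch of work; the reconnect cost is amortized over an hour of queries, which matches the "completely negligible" overhead described above.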

-- 
- David T. Wilson
Princeton Satellite Systems
[EMAIL PROTECTED]



Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-09 Thread Pavan Deolasee
On Wed, Apr 9, 2008 at 1:41 PM, David Wilson [EMAIL PROTECTED] wrote:



  I've run into a very similar problem. I have some long-running
  processes that generate a large amount of data and then query that
  data that must periodically drop their connections and rebuild to
  ensure that query plans get re-planned according to updated
  statistics. This is especially true when a new DB is first being
  brought up with an initial set of data (~1 week of compute time, ~30gb
  of data and ~120m rows) with processes that live through the entire
  process.


I believe plan invalidation in 8.3 should address this, shouldn't it?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-09 Thread mljv
On Wednesday, 9 April 2008 at 10:11, David Wilson wrote:
 On Wed, Apr 9, 2008 at 3:29 AM,  [EMAIL PROTECTED] wrote:
   But if i do VACUUM ANALYZE without concurrent queries, everything runs
  fine afterwards.
 
   If i run  VACUUM ANALYZE with few concurrent queries, it slows down to
  a crawl.
 
   Could it be that something like this is happening:
   - In the early morning a new DB connection is opened.
   - While running VACUUM ANALYZE the planner uses different plans because
  some index could not be used or the statistics are right in that moment
  not present because they are updated... So the query gets a wrong plan.
  It uses a seqScan instead of an index scan.
   - This wrongly planned statement is prepared so even after VACUUM
  ANALYZE is done, the statement keeps using the wrong plan.
   - load raises triggers many concurrent queries with wrong plans. so
  everything slows down.

 I've run into a very similar problem. I have some long-running
 processes that generate a large amount of data and then query that
 data that must periodically drop their connections and rebuild to
 ensure that query plans get re-planned according to updated
 statistics. This is especially true when a new DB is first being
 brought up with an initial set of data (~1 week of compute time, ~30gb
 of data and ~120m rows) with processes that live through the entire
 process.

This is not the case at my place. We are not producing lots of data, just a
little, with small INSERTs all the time. I don't think the statistics really
change very often, as the tables are rather fixed and only some data is added.

 My solution, as mentioned above, is to rebuild the connection approx
 every hour on my long-running processes. This is a tricky question,
 because you want the benefit of not re-planning queries in 99.9% of
 the cases- I'm not really opposed to the current system that requires
 the reconnections, given that the overhead involved in them is
 completely negligible in the grand scheme of things. There are
 downsides to not replanning queries, and if you can't live with them
 then you should either force re-planning at intervals or avoid
 prepared statements.

In release 8.3
(http://www.postgresql.org/docs/8.3/interactive/release-8-3.html):
"Automatically re-plan cached queries when table definitions change or
statistics are updated."

So no reconnection is necessary in 8.3 to re-plan prepared statements. Maybe
that helps you.

kind regards,
janning




Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-09 Thread Gregory Stark
[EMAIL PROTECTED] writes:

 Could it be that something like this is happening:
 - In the early morning a new DB connection is opened.
 - While running VACUUM ANALYZE the planner uses different plans because some 
 index could not be used or the statistics are right in that moment not 
 present because they are updated... So the query gets a wrong plan. It uses a 
 seqScan instead of an index scan.

This isn't supposed to happen. The old statistics remain visible until the new
ones take effect. Nothing is locked or unavailable while analyze is running.

However one of the things that can happen is the statistics for one table get
updated and until the statistics for another table are updated the planner has
skewed results. It has the new statistics for the first table but old
statistics for the second table.

Normally the solution to this is to run analyze more frequently so things
don't change too drastically from one set of statistics to another. In some
situations this isn't good enough, for example if you're truncating a bunch of
tables together.

You can also run analyze (but not vacuum analyze) inside a transaction so all
the stats go into effect together when you commit. This does have downsides
however. I think it will block a vacuum analyze which tries to update the same
statistics, for example.

Another thing to be aware of is that *empty* tables cause Postgres to default
to a heuristic of assuming 1,000 records. That can cause strange things to
happen if you truncate just some tables and run analyze on them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-08 Thread Richard Huxton

[EMAIL PROTECTED] wrote:
We looked in our cpu monitoring and saw that we have huge IOwait while VACUUM 
is running, not unusual though. But just after VACUUM was finished, the 
userCPU load raised to 200% (dual cpu). At this time in the early morning 
there were only very few users using our app. We analyzed the situation 
further and saw that most queries had a very long duration time (minutes 
instead of milliseconds). BTW: the vacuum process was not running at this 
moment. It was logged before as done.


But WHAT was using all of your cpu? Was it PostgreSQL, and if so just 
one backend? If something else was using all your cpu, then it might 
just be that the PostgreSQL server wasn't getting a chance to run your 
queries.


* Maybe the Vacuum analyze process has not enough memory and therefore could 
not ANALYZE the tables correctly. It then writes wrong statistics to the 
database which results in wrong execution plans using sequence scans instead 
of index scans. This only happens if the vacuum analyze process runs 
concurrently with user requests. If it runs on its own, the vacuum process 
has enough memory and writes correct statistics to the database.


Doesn't sound likely to me. You've got 8GB of RAM, and if you were going 
into swap you'd have noticed the disk I/O.


Here are some of our configuration parameters. We never really tweaked it as 
it ran fine. We just raised some parameters. The following list should show 
all parameters changed from the default:



max_connections = 300
shared_buffers = 30000
work_mem = 10240


OK, so that's 30,000 * 8KB = 240MB of shared_buffers
You have 10MB of work_mem and if all 300 connections were using that 
much you'd have committed 3GB of your RAM for that. Of course they'll 
want more than just that.
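That back-of-the-envelope memory math is worth writing out (work_mem is in kB in this PostgreSQL era, and this is only a lower bound, since a single query can use several work_mem-sized allocations at once):

```python
def worst_case_work_mem_bytes(connections, work_mem_kb):
    # Lower bound: one full work_mem allocation per backend. A query
    # with several sorts/hashes can claim several at once.
    return connections * work_mem_kb * 1024

committed = worst_case_work_mem_bytes(300, 10240)
print(committed / 1024**3)  # roughly the "3GB" figure above
```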


Do you really have 300 concurrent connections?


maintenance_work_mem = 163840


160MB for vacuums - should be OK given how much memory you have and the 
fact that it's quiet when you vacuum.



vacuum_mem = 32000


This is what maintenance_work_mem used to be called. You can delete this 
 entry.



max_fsm_pages = 500000


You can track at most 500,000 pages with free space on them. In 8.2+ 
versions VACUUM VERBOSE will show you how many are currently being used. 
Not sure about 8.1



bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5
bgwriter_all_maxpages = 200
wal_buffers = 16 


checkpoint_segments = 10  


If you have bursts of write activity you might want to increase this.


checkpoint_warning = 3600



effective_cache_size = 180000


That's 180,000 * 8KB = 180 * 8MB = 1.4GB
If that's really all you're using as cache, I'd reduce the number of 
concurrent connections. Check free/top and see how much RAM is really 
being used as disk cache.



random_page_cost = 3


Might be too high - you don't mention what disks you have.


stats_command_string = off


If you turn this one on, you'll be able to see the queries each backend 
is executing as they happen. Might be useful, but does have some cost.



The crucial thing is to find out exactly what is happening when things 
get very slow. Check vmstat and top, look in the pg_locks system-table 
and if needs be we can see what strace says a particular backend is doing.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-08 Thread mljv
Hi Richard,

thanks for your immediate response. I will answer your questions below:

On Tuesday, 8 April 2008 at 17:40, Richard Huxton wrote:
 [EMAIL PROTECTED] wrote:
  We looked in our cpu monitoring and saw that we have huge IOwait while
  VACUUM is running, not unusual though. But just after VACUUM was
  finished, the userCPU load raised to 200% (dual cpu). At this time in the
  early morning there were only very few users using our app. We analyzed
  the situation further and saw that most queries had a very long duration
  time (minutes instead of milliseconds). BTW: the vacuum process was not
  running at this moment. It was logged before as done.

 But WHAT was using all of your cpu? Was it PostgreSQL, and if so just
 one backend? If something else was using all your cpu, then it might
 just be that the PostgreSQL server wasn't getting a chance to run your
 queries.

The CPU was used by PostgreSQL. As I saw many SELECTs when looking at ps ax, I
guess that many backends were using all of the CPU.

  * Maybe the Vacuum analyze process has not enough memory and therefore
  could not ANALYZE the tables correctly. It then writes wrong statistics
  to the database which results in wrong execution plans using sequence
  scans instead of index scans. This only happens if the vacuum analyze
  process runs concurrently with user requests. If it runs on its own, the
  vacuum process has enough memory and writes correct statistics to the
  database.

 Doesn't sound likely to me. You've got 8GB of RAM, and if you were going
 into swap you'd have noticed the disk I/O.

That makes sense. We did not have any IOwait at that moment, but 200% user
CPU. But maybe vacuum can't use the RAM because of a misconfiguration.

  Here are some of our configuration parameters. We never really tweaked it
  as it ran fine. We just raised some parameters. The following list should
  show all parameters changed from the default:
 
  max_connections = 300
  shared_buffers = 30000
  work_mem = 10240

 OK, so that's 30,000 * 8KB = 240MB of shared_buffers
 You have 10MB of work_mem and if all 300 connections were using that
 much you'd have committed 3GB of your RAM for that. Of course they'll
 want more than just that.

 Do you really have 300 concurrent connections?

At peak time, yes.

  maintenance_work_mem = 163840

 160MB for vacuums - should be OK given how much memory you have and the
 fact that it's quiet when you vacuum.

  vacuum_mem = 32000

 This is what maintenance_work_mem used to be called. You can delete this
   entry.

  max_fsm_pages = 500000

 You can track at most 500,000 pages with free space on them. In 8.2+
 versions VACUUM VERBOSE will show you how many are currently being used.
 Not sure about 8.1

  bgwriter_lru_percent = 10.0
  bgwriter_lru_maxpages = 100
  bgwriter_all_percent = 5
  bgwriter_all_maxpages = 200
  wal_buffers = 16
 
  checkpoint_segments = 10

 If you have bursts of write activity you might want to increase this.

  checkpoint_warning = 3600
 
  effective_cache_size = 180000

 That's 180,000 * 8KB = 180 * 8MB = 1.4GB
 If that's really all you're using as cache, I'd reduce the number of
 concurrent connections. Check free/top and see how much RAM is really
 being used as disk cache.

  random_page_cost = 3

 Might be too high - you don't mention what disks you have.

3ware RAID-1 Controller with plain (cheap) SATA disks

 The crucial thing is to find out exactly what is happening when things
 get very slow. Check vmstat and top, look in the pg_locks system-table
 and if needs be we can see what strace says a particular backend is doing.

The problem is that we have peak times where everything runs fine; it only
goes wrong around the vacuum run. To simplify my problem:

- If I run VACUUM ANALYZE concurrently with a few user queries, everything
slows down to a crawl after the vacuum is finished.

- If I run it while no user is connected, everything runs fine afterwards.

So it has something to do with the VACUUM ANALYZE process.
 
kind regards,
janning




Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-08 Thread Alan Hodgson
On Tuesday 08 April 2008, [EMAIL PROTECTED] wrote:
 The problem is that we have peak times where everything is running fine. It
 has something to do with the vacuum process running. To simplify my
 problem:

 - I run VACUUM ANALYZE concurrently with some few user queries: it slows
 down to a crawl after vacuum is finished.

 - if i run it while no user is connected, everything runs fine
 afterwards.

 It has something to do with the vacuum analyze process.

It's probably running the machine into swap. Check swap usage and disk I/O 
when it happens. 


-- 
Alan



Re: [GENERAL] Problem after VACUUM ANALYZE

2008-04-08 Thread Scott Marlowe
It sounds to me like two possible problems, maybe combined.

One possibility is that you have a data distribution that results in
statistics being gathered that don't really represent your data.  Try
increasing the stats target for that column (or the whole db if you'd
rather) and re-running analyze.

The other possibility is that you've got some index bloat happening
and you might benefit from  reindexing the problematic table.
