Re: [HACKERS] 8.5 release timetable, again

2009-08-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This seems a good idea.  Possibly pushing the betas more aggressively to
current users would get them tested not only by PG hackers ...


Isn't this the purpose of the new alpha releases, at least to some extent?




Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Matthew T. O'Connor

Tom Lane wrote:

However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.


I'm sure I'm missing something, but I thought the point of this was to 
lessen the impact of VACUUM, and now you are suggesting that we have to 
add vacuums to tables that have never needed one before.




Re: [HACKERS] Block-level CRC checks

2008-11-17 Thread Matthew T. O'Connor

Aidan Van Dyk wrote:

* Greg Stark [EMAIL PROTECTED] [081117 03:54]:
I thought of saying that too but it doesn't really solve the problem.  
Think of what happens if someone sets a hint bit on a dirty page.


If the page is dirty from a real change, then it has a WAL backup block
record already, so the torn-page on disk is going to be fixed with the wal
replay ... *because* of the torn-page problem already being solved in PG.
You don't get the hint-bits back, but that's no different from the current
state.  But nobody's previously cared if hint-bits weren't set on WAL replay.



What if all changes to a page (even hint bits) are WAL-logged when 
running with block-level CRC checks enabled?  Does that make things 
easier?  I'm sure it would result in some performance loss, but anyone 
enabling block-level CRCs is already trading some performance for safety.
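
To make the trade-off concrete, here is a minimal standalone sketch of a
block-level CRC check, assuming a CRC-32 over the page body computed just
before write-out and verified at read time (page_set_crc/page_verify are
names I made up for illustration, not anything from the patch):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

/* Bitwise CRC-32 (reflected, poly 0xEDB88320); slow but dependency-free. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

typedef struct
{
    uint32_t crc;                        /* covers everything after this field */
    unsigned char body[BLCKSZ - sizeof(uint32_t)];
} Page;

static void page_set_crc(Page *p)      { p->crc = crc32_buf(p->body, sizeof p->body); }
static int  page_verify(const Page *p) { return p->crc == crc32_buf(p->body, sizeof p->body); }

int main(void)
{
    Page p;
    memset(&p, 0, sizeof p);
    memcpy(p.body, "some tuple data", 15);

    page_set_crc(&p);                    /* as if done just before writing the block */
    printf("fresh page verifies: %d\n", page_verify(&p));

    p.body[100] |= 0x01;                 /* a hint bit set after the CRC was computed */
    printf("after unlogged hint-bit change: %d\n", page_verify(&p));
    return 0;
}

The second check fails, which is exactly why a hint bit set after the CRC
was computed (and not WAL-logged) is the awkward case.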


Thoughts?



Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Matthew T. O'Connor

Tom Lane wrote:

Decibel! [EMAIL PROTECTED] writes:
  
I think that's pretty seriously un-desirable. It's not at all  
uncommon for databases to stick around for a very long time and then  
jump ahead many versions. I don't think we want to tell people they  
can't do that.



Of course they can do that --- they just have to do it one version at a
time.


Also, people may be less likely to stick with an old, outdated version 
for years and years if the upgrade process is easier.





Re: [HACKERS] RAM-only temporary tables

2008-11-05 Thread Matthew T. O'Connor

Kevin Grittner wrote:

An idea for a possible enhancement to PostgreSQL: allow creation of a
temporary table without generating any disk I/O.  (Creating and
dropping a three-column temporary table within a database transaction
currently generates about 150 disk writes.)

If some circumstances don't make it feasible to always do this as a
RAM-only operation, perhaps a clause could be added to the CREATE
TEMPORARY TABLE syntax to specify this behavior, along with whatever
limitations on the temporary table are required for this to work.
(For example, maybe this is only feasible if the table will be dropped
by the end of the transaction?)


As someone else already pointed out, you can put temp tables on a RAM 
disk, but the larger issue is that temp tables still cause churn in the 
system catalogs, which will always need to be on stable media.




Re: [HACKERS] 8.3 .4 + Vista + MingW + initdb = ACCESS_DENIED

2008-10-15 Thread Matthew T. O'Connor

Tom Lane wrote:

Andrew Chernow [EMAIL PROTECTED] writes:

Be careful.  From LockFileEx docs:


However, the time it takes for the operating system to unlock these 
locks depends upon available system resources. Therefore, it is 
recommended that your process explicitly unlock all files it has locked 
when it terminates. If this is not done, access to these files may be 
denied if the operating system has not yet unlocked them.


ROTFL ... so to translate: If your program crashes, please release
locks before crashing.


Obviously that wasn't the intent of the above, but I guess it is the net 
effect.  Either way, I don't think it's a huge problem; it just means 
that PG may not be able to restart for a few seconds until the OS has 
had time to clean up the locks.




Re: [HACKERS] September CommitFest Closed

2008-10-01 Thread Matthew T. O'Connor

Josh Berkus wrote:
For the September commitfest, 29 patches were applied (one to pgFoundry) 
 and 18 patches were sent back for more work.


More importantly, six *new* reviewers completed reviews of various 
patches: Abbas Butt, Alex Hunsaker, Markus Wanner, Ibrar Ahmed, Ryan 
Bradetich and Gianni Colli.  Several other new reviewers volunteered, 
but we ran out of patches to check, and Kenneth Marshall had to quit 
reviewing because of Hurricane Ike.


Yay, reviewers.



If nothing else ever came of the Commit Fest approach, if it creates 
more reviewers, then I think the Commit Fest process would be a success.


I think the Commit Fest approach does this by lowering the bar of entry 
to becoming a reviewer.  It brings more focus to patch review on a 
regular basis, and it changes the patch review process from a 
last-minute dash that only experienced hackers should get involved with 
to a process with more time before the final deadline, and hence one 
that is friendlier for new reviewers.  Good news all around!






Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch

2008-08-24 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

Merlin Moncure wrote:

Well, there doesn't seem to be a TODO for partial/restartable vacuums,
which were mentioned upthread.  This is a really desirable feature for
big databases and removes one of the reasons to partition large
tables.
I would agree that partial vacuums would be very useful. 



I think everyone agrees that partial vacuums would be useful / *A Good 
Thing*; it's the implementation that is the issue.  I was thinking 
about Alvaro's recent work to make vacuum deal with TOAST tables 
separately, which is almost like a partial vacuum since it effectively 
splits the vacuum work up into multiple independent blocks of work, the 
limitation obviously being that it can only split the work around 
TOAST.  Is there any way that vacuum could work per relfile, since we 
already split tables into files that are never greater than 1G?  I would 
think that if VACUUM never had more than 1G of work to do at any given 
moment it would be much more manageable.
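
To be concrete about the unit of work I have in mind, a quick sketch
(RELSEG_SIZE is the stock 1GB-per-segment figure; the per-segment vacuum
itself is hypothetical):

#include <stdio.h>

#define BLCKSZ      8192
#define RELSEG_SIZE ((unsigned) (1024 * 1024 * 1024 / BLCKSZ))  /* blocks per 1GB segment */

/* Which segment file (base file, ".1", ".2", ...) holds a given block? */
static unsigned segment_for_block(unsigned blkno)
{
    return blkno / RELSEG_SIZE;
}

int main(void)
{
    unsigned relpages = 500000;          /* a ~3.8GB table */

    printf("table spans segment files 0..%u\n", segment_for_block(relpages - 1));
    /* A per-segment vacuum would process blocks
       [seg * RELSEG_SIZE, (seg + 1) * RELSEG_SIZE) in one pass,
       so no single pass ever covers more than 1GB of heap. */
    return 0;
}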





Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch

2008-08-24 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor [EMAIL PROTECTED] writes:
  
I think everyone agrees that partial vacuums would be useful / *A Good 
Thing* but it's the implementation that is the issue.



I'm not sure how important it will really be once we have support for
dead-space-map-driven vacuum.


Is that something we can expect any time soon? I haven't heard much 
about it really happening for 8.4.




Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Matthew T. O'Connor

Tom Lane wrote:

Greg Sabino Mullane [EMAIL PROTECTED] writes:

Code outside of core, is, in reality, less reviewed, less likely to work
well with recent PG versions, and more likely to cause problems. It's also
less likely to be found by people, less likely to be used by people, and
less likely to be included by distros. Not to say that everything should get
shoved into core, of course, but there are strong arguments for both sides.


These are all true statements, of course, but ISTM they should be looked
on as problems to be solved.  Pushing stuff into core instead of solving
these problems is not a scalable long-term answer.


A few random thoughts...

The application that comes to mind first for me when you talk plugins is 
Firefox.  They make it very easy to browse for plugins and to install, 
update, and remove them.  Their plug-in system also tries to account for 
Firefox version and OS platform, which we would need to do also.


Perhaps one thing that would help PostgreSQL plug-ins is a nice GUI 
plug-in browser and management application.  The logical place to add 
this IMHO is pgAdmin, since it is a GUI, already talks to the DB, and is 
cross-platform.  I'm not saying a GUI should be required to manage 
plug-ins; a full CLI option should be made available too.




Re: [HACKERS] Concurrent VACUUM and ANALYZE

2008-07-21 Thread Matthew T. O'Connor

Jonah H. Harris wrote:

On Mon, Jul 21, 2008 at 10:19 PM, Tom Lane [EMAIL PROTECTED] wrote:
  

I don't find this a compelling argument, at least not without proof that
the various vacuum-improvement projects already on the radar screen
(DSM-driven vacuum, etc) aren't going to fix your problem.



Is DSM going to be in 8.4?  The last I had heard, DSM+related
improvements weren't close to being guaranteed for this release.  If
it doesn't make it, waiting another year and a half for something
easily fixed would be fairly unacceptable.  Should I provide a patch
in the event that DSM doesn't make it?


Can't hurt to submit a patch.  Also, could you do something to help 
mitigate the worst case, something like: don't update the stats in 
pg_class if a vacuum has finished since the current analyze started?
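
Something like this sketch is what I have in mind (the function name and
the timestamp plumbing are made up for illustration):

#include <stdio.h>
#include <time.h>

/* Skip the pg_class stats update at the end of ANALYZE if a VACUUM
 * finished while our ANALYZE was running: the VACUUM's numbers are
 * fresher and shouldn't be clobbered with a stale snapshot. */
static int analyze_may_update_stats(time_t analyze_started, time_t last_vacuum_finished)
{
    return last_vacuum_finished < analyze_started;
}

int main(void)
{
    time_t now = time(NULL);

    /* vacuum finished 10s ago, our analyze started 60s ago: skip */
    printf("vacuum overlapped analyze -> update stats? %d\n",
           analyze_may_update_stats(now - 60, now - 10));

    /* last vacuum long before we started: fine to update */
    printf("no overlapping vacuum     -> update stats? %d\n",
           analyze_may_update_stats(now - 60, now - 3600));
    return 0;
}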


Matt




Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-06-27 Thread Matthew T. O'Connor

Tom Lane wrote:

We might have to rearrange the logic a bit to make that happen (I'm not
sure what order things get tested in), but a log message does seem like
a good idea.  I'd go for logging anytime an orphaned table is seen,
and dropping once it's past the anti-wraparound horizon.


Is there an easy way for an admin to clean up the lost temp tables that 
autovacuum is complaining about?  It seems like it could be a long time 
and a lot of log messages between when they are first orphaned and when 
they are finally dropped due to anti-wraparound protection.




Re: [HACKERS] XIDs and big boxes again ...

2008-05-12 Thread Matthew T. O'Connor

Hans-Juergen Schoenig wrote:
i suggest to introduce a --with-long-xids flag which would give me 62 / 
64 bit XIDs per vacuum on the entire database.

this should be fairly easy to implement.
i am not too concerned about the size of the tuple header here - if we 
waste 500 gb of storage here i am totally fine.


As you say later in the thread, you are on 8.1.  A lot of work has gone 
into reducing the effect, impact, and frequency of XID wraparound and 
vacuuming since then.  In 8.3, transactions that don't actually update a 
table no longer use a real XID, and with autovacuum you no longer need a 
database-wide vacuum to solve the XID wraparound problem, so I think the 
answer is to upgrade to 8.3 and see if you still have this problem.


Matthew O'Connor



Re: [HACKERS] Posting to hackers and patches lists

2008-05-07 Thread Matthew T. O'Connor

Alex Hunsaker wrote:

In fact I
would argue -patches should go away so we don't have that split.


+1.  I think the main argument for the split is to keep the large 
patch emails off the hackers list, but I don't think the size limit is 
such a problem.  People have to gzip their patches for the patches 
list fairly often anyway.





Re: [HACKERS] Posting to hackers and patches lists

2008-05-07 Thread Matthew T. O'Connor

Alex Hunsaker wrote:

A big part of my problem with the split is that if there is a discussion
taking place on -hackers, I want to be able to reply to the discussion
and say, well, here is what I was thinking.  Sending it to -patches
first and waiting for it to hit the archive so I can link to it in my
reply on -hackers seems pointless and convoluted.

But if that's what you want, that's what I'll try to do from now on :)

For instance, on the patch of mine Tom reviewed yesterday only -hackers
was copied, so I maintained that but also added -patches because I was
sending in a patch...

I think it will be an ongoing problem, though, especially for new people,
as they probably won't understand the logical split...


Patches are an integral part of the conversation about development; I 
think trying to split them off is awkward at best.  Do people really 
still think that the potential for larger messages is a problem?  
By the way, what is the actual size limit on hackers vs. patches?  I 
would imagine that most patches would already fit within the current 
hackers limit, especially since you can gzip them.





Re: [HACKERS] First steps with 8.3 and autovacuum launcher

2007-10-01 Thread Matthew T. O'Connor

Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:
  

This is an interesting idea, but I think it's attacking the wrong
problem.  To me, the problem here is that an ANALYZE should not block
CREATE INDEX or certain forms of ALTER TABLE.



I doubt that that will work; in particular I'm pretty dubious that you
can safely make CREATE INDEX and VACUUM run together.  Since they'd be
unlikely to be using the identical OldestXmin horizon, you'd likely end
up with dangling index entries (ie, CREATE INDEX indexes a tuple that
the VACUUM removes shortly afterward).


I think the main issue is ANALYZE, not VACUUM (at least in this thread), 
since it's DB load times that are in question.  Can CREATE INDEX and 
ANALYZE be made to run concurrently?






Re: [HACKERS] First steps with 8.3 and autovacuum launcher

2007-10-01 Thread Matthew T. O'Connor

Tom Lane wrote:

If you insist on crafting a solution that only fixes this problem for
pg_restore's narrow usage, you'll be back revisiting it before beta1
has been out a month.



I don't know much about what is involved in crafting these solutions, 
but it seems we're close to beta and probably don't want to make drastic 
changes to anything.  As such, it seems to me that solving the problem 
for ANALYZE is a nice piece of low-hanging fruit that solves an 
immediate problem that has been reported.  I would think that reducing 
the locking implications of VACUUM is much more involved, no?  Also, I 
would think that the fix for ANALYZE will be totally different than the 
fix for VACUUM, no?


Are you proposing that we solve the VACUUM locking problem before we 
release 8.3?






Re: [HACKERS] More logging for autovacuum

2007-08-07 Thread Matthew T. O'Connor

Gregory Stark wrote:

I'm having trouble following what's going on with autovacuum, and I'm
finding the existing logging insufficient.  In particular, the fact that
it only logs vacuum runs *after* the vacuum finishes makes it hard to see
what vacuums are running at any given time.  Also, I want to see what is
making autovacuum decide to forgo vacuuming a table, and the log line
with that information is at DEBUG2.

So I would like to suggest adding two options:

log_autovacuum_jobs - output every time a vacuum or analyze *starts*

log_autovacuum_level - set the log level for the autovacuum process

I would also suggest raising the level of the DEBUG2 message indicating why
tables were chosen or not. At least to DEBUG1 if not to INFO.

Am I missing anything? Are there ways to get this info already that I'm
missing? I imagine it would be pretty simple to add these and I'll be happy to
do it and send the patch to -patches assuming others (Alvaro? :) agree.



I think this sounds good.  There was talk a while ago about needing a 
special log level setting just for autovacuum, but nobody did the 
legwork.




Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Well, if a table has 10 rows, and we keep the current threshold of 1000
rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
dead + 10 live) before being vacuumed.  This seems wasteful because
there are 500 dead tuples on it and only 10 live tuples.  So each scan
must wade through all the dead tuples.

Another small table with 100 tuples will be vacuumed on every iteration
as well, even if there are just two dead tuples.  So you are right --
maybe dropping it all the way to 0 is too much.  But a small value of 10
is reasonable?  That will make the 10 tuple table be vacuumed when there
are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
there are 11 (11% of dead tuples).  It decreases quickly to the scale
factor (2%, or do we want to decrease it to 1%?)


I think it's probably fine.  I think the optimal number for the 
base_threshold is probably dependent on the width of the row: for a very 
narrow row where you might have many on the same page, 20 or 50 might be 
right, but for a very wide table a smaller number might be optimal.  
However, I think it probably doesn't matter much anyway.
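
For reference, here is the equation being tuned, in a standalone sketch
(the base values and the 2% scale factor are the ones from this thread,
not necessarily the shipped defaults):

#include <stdio.h>

/* autovacuum's trigger rule: vacuum once
 *   dead tuples > base_threshold + scale_factor * reltuples */
static double vac_threshold(double base, double scale, double reltuples)
{
    return base + scale * reltuples;
}

int main(void)
{
    double scale = 0.02;                     /* the 2% scale factor */
    double sizes[] = {10, 100, 100000};

    for (int i = 0; i < 3; i++)
        printf("%8.0f live tuples: threshold %6.1f with base=1000, %6.1f with base=10\n",
               sizes[i],
               vac_threshold(1000, scale, sizes[i]),
               vac_threshold(10, scale, sizes[i]));
    return 0;
}

With base=1000 the 10-row table waits for ~1000 dead tuples before being
vacuumed; with base=10 it is vacuumed after roughly 10, which is the
behavior discussed above.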


Reducing the default to 10 seems fine, and perhaps even removing it as a 
tuning knob.  I think there are too many autovacuum knobs and it 
confuses people.  Is it too late to possibly remove this GUC altogether?





Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby wrote:

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.


How about pushing thresholds all the way down to 0?



As long as it handles small (or zero-row) tables OK, then yes.  The 
base_threshold in the original contrib autovacuum was just an easy way 
to not vacuum really small tables too often.  If a table has only 10 
rows, it's going to get vacuumed every time one row is updated.  I guess 
that's not a big problem with a table that small, but it still seems 
excessive.


If you think this isn't a problem with the current autovacuum, then sure 
turn it down to zero, and perhaps we can even get rid of it altogether 
in another release or two.






Re: [HACKERS] COPYable logs status

2007-06-08 Thread Matthew T. O'Connor

Andrew Dunstan wrote:
The situation with this patch is that I now have it in a state where I 
think it could be applied, but there is one blocker, namely that we do 
not have a way of preventing the interleaving of log messages from 
different backends, which leads to garbled logs. This is an existing 
issue about which we have had complaints, but it becomes critical for a 
facility the whole purpose of which is to provide logs in a format 
guaranteed to work with our COPY command.


Unfortunately, there is no solution in sight for this problem, certainly 
not one which I think can be devised and implemented simply at this 
stage of the cycle. The solution we'd like to use, LWLocks, is not 
workable in this context. In consequence, I don't think we have any 
option but to shelve this item for the time being.


I think this will get shot down, but here goes anyway...

How about creating a log-writing process?  The postmaster could write to 
the log files directly until the log writer is up and running; then all 
processes can send their log output through the log writer.
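
As a minimal sketch of what I mean (plain POSIX, nothing
PostgreSQL-specific): one log-writing child owns the output, and everyone
else writes whole lines down a shared pipe.  Writes of up to PIPE_BUF
bytes are atomic, so lines from different processes can't interleave
mid-line.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return 1;

    if (fork() == 0)                       /* the log-writing process */
    {
        char buf[4096];
        ssize_t n;

        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t) n, stdout);    /* stand-in for the log file */
        _exit(0);
    }

    close(fds[0]);
    for (int backend = 1; backend <= 3; backend++) /* pretend these are backends */
    {
        char line[128];
        int  len = snprintf(line, sizeof line,
                            "LOG:  message from backend %d\n", backend);

        write(fds[1], line, (size_t) len);         /* atomic: len < PIPE_BUF */
    }
    close(fds[1]);                         /* EOF lets the log writer exit */
    wait(NULL);
    return 0;
}

The real complication would be messages longer than PIPE_BUF, which would
need chunking and reassembly in the log writer.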






Re: [HACKERS] COPYable logs status

2007-06-08 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor [EMAIL PROTECTED] writes:
How about creating a log-writing process?  The postmaster could write to 
the log files directly until the log writer is up and running; then all 
processes can send their log output through the log writer.


We *have* a log-writing process.  The problem is in getting the data to it.


By that I assume you mean the bgwriter.  I thought that was for WAL 
data; I didn't think it could, or perhaps should, be used for normal log 
file writing.  But I also know I'm way outside my comfort area in 
talking about this, so excuse the noise if this is way off base.




Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-08 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby escribió:

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.


Yes, the code in HEAD is different -- each database will be considered
separately.  So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either.  There can be more than one
worker in a single database.


OK, but I think the question posed is that in, say, a virtual hosting 
environment there might be 1,000 databases in the cluster.  Am I still 
going to have to wait a long time for my database to get vacuumed? 
 I don't think this has changed much, no?


(If the default naptime is 1 minute, then autovacuum won't even look at a 
given database but once every 1,000 minutes (16.67 hours), assuming 
there isn't enough work to keep all the workers busy.)
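
Spelling out the arithmetic behind that worry (a sketch assuming the
launcher simply round-robins across databases):

#include <stdio.h>

int main(void)
{
    double naptime_s = 60;       /* default autovacuum_naptime */
    double databases = 1000;     /* the hosted-cluster example above */

    /* one worker launched per naptime, cycling through the databases: */
    printf("each database is visited every %.0f minutes (%.2f hours)\n",
           databases * naptime_s / 60.0,
           databases * naptime_s / 3600.0);
    return 0;
}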




Re: [HACKERS] [PATCHES] [BUGS] BUG #3326: Invalid lower bound of autovacuum_cost_limit

2007-06-07 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:

But this is misleading (started postmaster with good value, then edited
postgresql.conf and entered -2):
17903 LOG:  received SIGHUP, reloading configuration files
17903 LOG:  -2 is outside the valid range for parameter 
autovacuum_vacuum_cost_limit (-1 .. 1000)
Note how it still says the range is -1 .. 1000.

Can we redefine things to make zero be the disabled value, thus
keeping the range of valid values contiguous?


That would be another solution ... though it would be different from the
valid value for autovacuum_vacuum_cost_delay (on which 0 is a valid
value).  Also it would be a different value from previous versions.

I don't think either of these is a showstopper, so let's go for that if
nobody objects.


Can you make 0 and -1 both valid disabled values?  That way it will be 
compatible with previous releases.




Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-07 Thread Matthew T. O'Connor

Tom Lane wrote:

Andrew Hammond [EMAIL PROTECTED] writes:

Hmmm... it seems to me that this points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.


It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-).  Would an hour be sane?  10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.


Do we need a configurable autovacuum naptime at all?  I know I put it in 
the original contrib autovacuum because I had no idea what knobs might 
be needed.  I can't see a good reason to ever have a naptime longer than 
the default 60 seconds, but I suppose one might want a smaller naptime 
for a very active system?




Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 


Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to run ANALYZE.



Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 
Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.

The replication is an exact block-level replication of the master. We
can't write very much at all on the slave.


Hmm, something to keep in mind is forcing cache invals when the master
causes them (for example relation cache, catalog caches and plan
caches).



Perhaps if you are a PITR master and you have active read-only slaves, 
then there should be a WAL record to note plan invalidations, etc.?




Re: [HACKERS] Autovacuum versus rolled-back transactions

2007-06-01 Thread Matthew T. O'Connor

Tom Lane wrote:

ITAGAKI Takahiro [EMAIL PROTECTED] writes:

Our documentation says
| analyze threshold = analyze base threshold
|   + analyze scale factor * number of tuples
| is compared to the total number of tuples inserted, updated, or deleted
| since the last ANALYZE. 



but deleted tuples are not considered in the total number, because the delta
of {n_live_tuples + n_dead_tuples} is not changed by DELETE. We add the number
of DELETE into n_live_tuples and subtract it from n_dead_tuples.


Yeah, I was concerned about that when I was making the patch, but didn't
see any simple fix.  A large number of DELETEs (without any inserts or
updates) would trigger a VACUUM but not an ANALYZE, which in the worst
case would be bad because the stats could have shifted.

We could fix this at the cost of carrying another per-table counter in
the stats info, but I'm not sure it's worth it.


I believe that whenever autovacuum performs a VACUUM it actually 
performs a VACUUM ANALYZE; at least the old contrib version did, and I 
think Alvaro copied that.
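
The extra per-table counter Tom mentions might look like this sketch
(the field names are hypothetical): VACUUM keys off dead tuples, ANALYZE
off all modifications, so a delete-only workload still triggers ANALYZE.

#include <stdio.h>

typedef struct
{
    long n_dead_tuples;          /* updates + deletes not yet vacuumed */
    long n_mods_since_analyze;   /* inserts + updates + deletes since the
                                    last ANALYZE (the new counter) */
} TableCounts;

static void count_delete(TableCounts *t)
{
    t->n_dead_tuples++;
    t->n_mods_since_analyze++;
}

static int needs_analyze(const TableCounts *t, double base, double scale, double reltuples)
{
    return t->n_mods_since_analyze > base + scale * reltuples;
}

int main(void)
{
    TableCounts t = {0, 0};

    for (int i = 0; i < 600; i++)        /* a delete-only workload */
        count_delete(&t);

    /* 600 modifications > 50 + 0.1 * 5000 = 550, so ANALYZE fires */
    printf("analyze due? %d\n", needs_analyze(&t, 50, 0.1, 5000));
    return 0;
}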




Re: [HACKERS] What X86/X64 OS's do we need coverage for?

2007-04-05 Thread Matthew T. O'Connor

Larry Rosenman wrote:

I might use that as the base then, since the hardware finishes getting here
tomorrow.


The other thing to consider is that CentOS 5 has Xen built right in, so 
you should be able to run VMs without VMware on it.




Re: [HACKERS] Documentation access problems.

2007-03-23 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

The big thing for me, is a single document, zero clicks, that is
searchable. PDF and plain text are the only thing that give me that. If
you are really zealous you can even use Beagle (which I don't) to
preindex the PDF for you for easy searching.


Lots of projects publish their HTML docs in two formats: one big HTML 
file with everything, and a version broken up into many HTML files that 
link to each other.  This would allow you to have one big searchable 
document.




Re: [HACKERS] Documentation access problems.

2007-03-23 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:
Lots of projects publish their HTML docs in two formats: one big HTML 
file with everything, and a version broken up into many HTML files that 
link to each other.  This would allow you to have one big searchable 
document.

The key word there being big ;-) ... I don't have any problem with
making such a version available on the website, but I don't think
shipping two versions of the HTML docs in our tarballs is reasonable.


I think having the single HTML file version available on our web site is
enough.


Agreed.




Re: [HACKERS] autovacuum next steps, take 3

2007-03-09 Thread Matthew T. O'Connor
My initial reaction is that this looks good to me, but there are still a 
few comments below.


Alvaro Herrera wrote:

Here is a low-level, very detailed description of the implementation of
the autovacuum ideas we have so far.

launcher's dealing with databases
---------------------------------


[ Snip ]


launcher and worker interactions
--------------------------------

[Snip]


worker to-do list
-----------------
When each worker starts, it determines which tables to process in the
usual fashion: get pg_autovacuum and pgstat data and compute the
equations.

The worker then takes a snapshot of what's currently going on in the
database, by storing worker PIDs, the corresponding table OID that's
being currently worked, and the to-do list for each worker.


Does a new worker really care about the PID of other workers or what 
table they are currently working on?



It removes from its to-do list the tables being processed.  Finally, it
writes the list to disk.


Just to be clear: the new worker removes from its to-do list all the 
tables mentioned in the to-do lists of all the other workers?



The table list will be written to a file in
PGDATA/vacuum/database-oid/todo.worker-pid
The file will consist of table OIDs, in the order in which they are
going to be vacuumed.

At this point, vacuuming can begin.


This all sounds good to me so far.


Before processing each table, it scans the WorkerInfos to see if there's
a new worker, in which case it reads its to-do list to memory.


It's not clear to me why a worker cares that there is a new worker, 
since the new worker is going to ignore all the tables that are already 
claimed by other workers' to-do lists.



Then it again fetches the tables being processed by other workers in the
same database, and for each other worker, removes from its own in-memory
to-do all those tables mentioned in the other lists that appear earlier
than the current table being processed (inclusive).  Then it picks the
next non-removed table in the list.  All of this must be done with the
Autovacuum LWLock grabbed in exclusive mode, so that no other worker can
pick the same table (no IO takes places here, because the whole lists
were saved in memory at the start.)


Again, it's not clear to me what this is gaining us.  It seems to me 
that when a worker starts up and writes out its to-do list, it should 
just work through it; I don't see the value in workers constantly 
updating their to-do lists.  Maybe I'm just missing something; can you 
enlighten me?



other things to consider
------------------------

This proposal doesn't deal with the hot tables stuff at all, but that is
very easy to bolt on later: just change the first phase, where the
initial to-do list is determined, to exclude cold tables.  That way,
the vacuuming will be fast.  Determining what is a cold table is still
an exercise to the reader ...


I think we can make this algorithm naturally favor small / hot tables 
with one small change: have workers remove tables that they have just 
vacuumed from their to-do lists and re-write their to-do lists to disk. 
Assuming the to-do lists are ordered by size ascending, smaller tables 
will be made available for inspection by newer workers sooner rather 
than later.





Re: [HACKERS] autovacuum next steps, take 3

2007-03-09 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:
It's not clear to me why a worker cares that there is a new worker, 
since the new worker is going to ignore all the tables that are already 
claimed by other workers' to-do lists.


That seems wrong to me, since it means that new workers will ignore
tables that are scheduled for processing by an existing worker, no
matter how far in the future that schedule extends.  As an example,
suppose you have half a dozen large tables in need of vacuuming.
The first worker in will queue them all up, and subsequent workers
will do nothing useful, at least not till the first worker is done
with the first table.  Having the first worker update its todo
list file after each table allows the earlier tables to be exposed
for reconsideration, but that's expensive and it does nothing for
later tables.


Well, the big problem that we have is not that large tables are being 
starved, so this doesn't bother me too much; plus, there is only so much 
I/O, so one worker working sequentially through the big tables seems OK 
to me.



I suggest that maybe we don't need exposed TODO lists at all.  Rather
the workers could have internal TODO lists that are priority-sorted
in some way, and expose only their current table OID in shared memory.
Then the algorithm for processing each table in your list is

1. Grab the AutovacSchedule LWLock exclusively.
2. Check to see if another worker is currently processing
   that table; if so drop LWLock and go to next list entry.
3. Recompute whether table needs vacuuming; if not,
   drop LWLock and go to next entry.  (This test covers the
   case where someone vacuumed the table since you made your
   list.)
4. Put table OID into shared memory, drop LWLock, then
   vacuum table.
5. Clear current-table OID from shared memory, then
   repeat for next list entry.

This creates a behavior of whoever gets to it first rather than
allowing workers to claim tables that they actually won't be able
to service any time soon.


Right, but you could wind up with as many workers working concurrently 
as you have tables in a database, which doesn't seem like a good idea 
either.  One thing I like about the to-do list setup Alvaro had is that 
new workers will be assigned fewer tables to work on and hence exit 
sooner.  We are going to fire off a new worker every autovac_naptime, so 
availability of new workers isn't going to be a problem.
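
For what it's worth, my reading of Tom's claim protocol is roughly this
sketch (threads and a mutex standing in for worker processes and the
AutovacSchedule LWLock):

#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 3

static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned current_table[MAX_WORKERS];  /* advertised OIDs; 0 = idle */

/* Steps 1-4 of the protocol: under the lock, skip the table if another
 * worker has advertised it, otherwise advertise it ourselves. */
static int claim_table(int me, unsigned oid)
{
    int mine = 1;

    pthread_mutex_lock(&sched_lock);
    for (int w = 0; w < MAX_WORKERS; w++)
        if (w != me && current_table[w] == oid)
            mine = 0;
    /* (a real worker would also recheck the pgstat thresholds here) */
    if (mine)
        current_table[me] = oid;
    pthread_mutex_unlock(&sched_lock);
    return mine;
}

static void release_table(int me)            /* step 5 */
{
    pthread_mutex_lock(&sched_lock);
    current_table[me] = 0;
    pthread_mutex_unlock(&sched_lock);
}

int main(void)
{
    unsigned todo[] = {16384, 16390, 16401}; /* my priority-sorted to-do list */

    current_table[1] = 16390;                /* pretend worker 1 is busy on one */

    for (int i = 0; i < 3; i++)
    {
        if (claim_table(0, todo[i]))
        {
            printf("worker 0 vacuums %u\n", todo[i]);
            release_table(0);
        }
        else
            printf("worker 0 skips %u (in progress elsewhere)\n", todo[i]);
    }
    return 0;
}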





Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:

Jim C. Nasby wrote:


The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before I realized
that's probably not the best idea. For example, I've seen clusters that
are running dozens-hundreds of databases; in that environment you really
need to turn naptime way down (to like a second). In that case you
wouldn't want to key to naptime.

Actually, I've been thinking that it would be a good idea to change the
semantics of autovacuum_naptime so that it means the average time to
start a worker in any given database.  That way, the time between
autovac runs is not dependent on the number of databases you have.


BTW, another issue that I don't think we can ignore: we actually need to
do this on a per-tablespace level, or at least have the ability to
disable or somehow limit it. While it's not common, there are users that
run a hundred or more databases in a single cluster; it would be ugly if
we suddenly had 100 vacuums trying to run on the same set of drives
concurrently.


I think we all agree that autovacuum needs to become tablespace-aware at 
some point, but I think that is further down the line; we're having 
enough trouble figuring things out without that additional complication.




Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-27 Thread Matthew T. O'Connor

Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

On Tue, 2007-02-27 at 10:37 -0600, Jim C. Nasby wrote:

... The idea would be to give vacuum a target run time, and it
would monitor how much time it had remaining, taking into account how
long it should take to scan the indexes based on how long it's been
taking to scan the heap. When the amount of time left becomes less than
the estimate of the amount of time required to scan the indexes (and
clean the heap), you stop the heap scan and start scanning indexes.



I do like this idea, but it also seems easy to calculate that bit
yourself. Run VACUUM, after X minutes issue stop_vacuum() and see how
long it takes to finish. Adjust X until you have it right.


One problem with it is that a too-small target would result in vacuum
proceeding to scan indexes after having accumulated only a few dead
tuples, resulting in increases (potentially enormous ones) in the total
work needed to vacuum the table completely.

I think it's sufficient to have two cases: abort now, and restart from
the last cycle-completion point next time (this would basically just be
SIGINT); or set a flag to stop at the next cycle-completion point.


It occurs to me that we may be thinking about this the wrong way
entirely.  Perhaps a more useful answer to the problem of using a
defined maintenance window is to allow VACUUM to respond to changes in
the vacuum cost delay settings on-the-fly.  So when your window closes,
you don't abandon your work so far, you just throttle your I/O rate back
to whatever's considered acceptable for daytime vacuuming.


I thought we already did that?  Which, BTW, was part of my plan for how 
to deal with a vacuum that is still running after its maintenance window 
has expired.




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby wrote:


That's why I'm thinking it would be best to keep the maximum size of
stuff for the second worker small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
the high-update tables every X number of seconds.

If we wanted to get fancy, we could factor in how far over the vacuum
threshold a table is, so even if the table is on the larger size, if
it's way over the threshold the second vacuum will hit it.


Ok, I think we may be actually getting somewhere.


Me too.


I propose to have two different algorithms for choosing the tables to
work on.  The worker would behave differently, depending on whether
there is one or more workers on the database already or not.

The first algorithm is the plain threshold equation stuff we use today.
If a worker connects and determines that no other worker is in the
database, it uses the plain worker mode.  A worker in this mode would
examine pgstats, determine what tables to vacuum/analyze, sort them by
size (smaller to larger), and goes about its work.  This kind of worker
can take a long time to vacuum the whole database -- we don't impose any
time limit or table size limit to what it can do.


Right, I like this.


The second mode is the hot table worker mode, enabled when the worker
detects that there's already a worker in the database.  In this mode,
the worker is limited to those tables that can be vacuumed in less than
autovacuum_naptime, so large tables are not considered.  Because of
this, it'll generally not compete with the first mode above -- the
tables in plain worker were sorted by size, so the small tables were
among the first vacuumed by the plain worker.  The estimated time to
vacuum may be calculated according to autovacuum_vacuum_delay settings,
assuming that all pages constitute cache misses.


How can you determine what tables can be vacuumed within 
autovacuum_naptime?  I agree that large tables should be excluded, but I 
don't know how we can do that calculation based on autovacuum_naptime.


So at:
t=0*autovacuum_naptime: worker1 gets started on DBX
t=1*autovacuum_naptime: worker2 gets started on DBX
worker2 determines all tables that need to be vacuumed,
worker2 excludes tables that are too big from its to-do list,
worker2 gets started working,
worker2 exits when it either:
a) finishes its entire to-do list, or
b) catches up to worker1.

I think the questions are: 1) What is the exact math you are planning on 
using to determine which tables are too big?  2) Do we want worker2 to 
exit when it catches worker1, or does the fact that we have excluded 
tables that are too big mean that we don't have to worry about this?





Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
How can you determine what tables can be vacuumed within 
autovacuum_naptime?


My assumption is that
pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to vacuum

This is of course not the reality, because the delay is not how long
it takes to fetch the pages.  But it lets us have a value with which we
can do something.  With the default values, vacuum_cost_delay=10,
vacuum_cost_page_miss=10, autovacuum_naptime=60s, we'll consider tables
of under 600 pages, 4800 kB (should we include indexes here in the
relpages count?  My guess is no).
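
Spelling that estimate out (same simplification as above: every page is a
cache miss and the delay is charged per page):

#include <stdio.h>

int main(void)
{
    double cost_delay_ms  = 10;   /* vacuum_cost_delay      */
    double cost_page_miss = 10;   /* vacuum_cost_page_miss  */
    double naptime_s      = 60;   /* autovacuum_naptime     */

    /* relpages * cost_page_miss * cost_delay = naptime, solved for relpages: */
    double max_pages = naptime_s * 1000.0 / (cost_page_miss * cost_delay_ms);

    printf("size cutoff: %.0f pages (%.0f kB at 8kB/page)\n",
           max_pages, max_pages * 8);
    return 0;
}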


I'm not sure how pg_class.relpages is maintained, but what happens to a 
bloated table?  For example, take a 100-row table that is constantly 
updated and hasn't been vacuumed in a while (say the admin disabled 
autovacuum for a while); now that small 100-row table has 1,000 pages in 
it, most of which are just bloat.  Will we miss this table?  Perhaps 
basing this on reltuples would be better?



A table over 600 pages does not sound like a good candidate for hot, so
this seems more or less reasonable to me.  On the other hand, maybe we
shouldn't tie this to the vacuum cost delay stuff.


I'm not sure it's a good idea to tie this to the vacuum cost delay 
settings either, so let me ask you this: how is this better than just 
allowing the admin to set a new GUC variable like 
autovacuum_hot_table_size_threshold (or something shorter), to which we 
can assign a decent default of, say, 8MB?


Thoughts?



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
I'm not sure how pg_class.relpages is maintained, but what happens to a 
bloated table?  For example, take a 100-row table that is constantly 
updated and hasn't been vacuumed in a while (say the admin disabled 
autovacuum for a while); now that small 100-row table has 1,000 pages in 
it, most of which are just bloat.  Will we miss this table?  Perhaps 
basing this on reltuples would be better?


Well, this would only happen the first time, until the plain worker
processed the table; next time it would be picked up by the hot table
worker.  But yeah, we can build a better estimate using the same trick
the planner uses: estimate tuple density as reltuples/relpages times the
actual number of blocks on disk.


Well, even skipping it the first time isn't good; anything that further 
delays a hot table from getting vacuumed is bad.  Also, I'm not sure it 
would just be the first time, since plain VACUUM isn't going to reclaim 
most of the space, just mark it as reusable.  This is moot, however, if 
we use a good metric; I have no idea if what you suggest above would be 
good enough.
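
For reference, the planner trick Alvaro mentions amounts to this sketch:
scale the last-recorded tuple density by the table's current size on
disk.

#include <stdio.h>

/* density from the last VACUUM/ANALYZE (reltuples/relpages), scaled by
 * how many blocks the table occupies right now. */
static double estimated_tuples(double reltuples, double relpages, double curpages)
{
    if (relpages <= 0)
        return reltuples;
    return reltuples / relpages * curpages;
}

int main(void)
{
    /* last stats saw 10,000 tuples in 100 pages; the table has since grown: */
    printf("estimate now: %.0f tuples\n", estimated_tuples(10000, 100, 150));
    return 0;
}

Whether that density assumption holds for a badly bloated table is
exactly my worry.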



A table over 600 pages does not sound like a good candidate for hot, so
this seems more or less reasonable to me.  On the other hand, maybe we
shouldn't tie this to the vacuum cost delay stuff.
I'm not sure it's a good idea to tie this to the vacuum cost delay 
settings either, so let me ask you this: how is this better than just 
allowing the admin to set a new GUC variable like 
autovacuum_hot_table_size_threshold (or something shorter), to which we 
can assign a decent default of, say, 8MB?


Yeah, maybe that's better -- it's certainly simpler.


Simple is better, at least until proven otherwise.



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:

Matthew T. O'Connor wrote:
I'm not sure it's a good idea to tie this to the vacuum cost delay 
settings either, so let me ask you this: how is this better than just 
allowing the admin to set a new GUC variable like 
autovacuum_hot_table_size_threshold (or something shorter), to which we 
can assign a decent default of, say, 8MB?



Yeah, maybe that's better -- it's certainly simpler.


I'm not liking any of these very much, as they seem critically dependent
on impossible-to-tune parameters.  I think it'd be better to design this
around having the first worker explicitly expose its state (list of
tables to process, in order) and having subsequent workers key off that
info.  The shared memory state could include the OID of the table each
worker is currently working on, and we could keep the to-do list in some
simple flat file for instance (since we don't care about crash safety).


So far we are only talking about one parameter, the 
hot_table_size_threshold, which I agree would be a guess by an admin, 
but if we went in this direction, I would also advocate adding a column 
to the pg_autovacuum table that allows an admin to explicitly define a 
table as hot or not.


Also, I think each worker should be mostly independent, the only caveat 
being that (assuming each worker works in size order) if we catch up to 
an older worker (get to the table it is currently working on) we exit.  
Personally I think this is all we need, but others felt the additional 
threshold was needed.  What do you think?  Or what do you think might be 
better?



I'm not certain exactly what key off needs to mean; perhaps each
worker should make its own to-do list and then discard items that are
either in-progress or recently done by another worker when it gets to
them.


My initial design didn't have any threshold at all, but others felt this 
would/could result in too many workers working concurrently in the same DB.



I think an absolute minimum requirement for a sane design is that no two
workers ever try to vacuum the same table concurrently, and I don't see
where that behavior will emerge from your proposal; whereas it's fairly
easy to make it happen if non-first workers pay attention to what other
workers are doing.


Maybe we never made that clear; I was always working on the assumption 
that two workers would never try to work on the same table at the same time.



BTW, it's probably necessary to treat shared catalogs specially ...


Certainly.



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
I'm not sure how pg_class.relpages is maintained, but what happens to a 
bloated table?  For example, take a 100-row table that is constantly 
updated and hasn't been vacuumed in a while (say the admin disabled 
autovacuum for a while); now that small 100-row table has 1,000 pages in 
it, most of which are just bloat.  Will we miss this table?  Perhaps 
basing this on reltuples would be better?


The entire point of this is to ensure that the second daemon will only
vacuum tables that it can finish very quickly. If you let a table bloat
so it's too big, then you just can't vacuum it very frequently without
risking all your other hot tables bloating because they're no longer
getting vacuumed.

The reality is that you can actually vacuum a pretty good-sized table in
60 seconds with typical cost-delay settings (ie: defaults except
cost_delay set to 10). That means you can do 9 pages ~100 times a
second, or 54k pages a minute. Even with a vacuum_cost_delay of 20,
that's still 27k pages per minute.


At the risk of sounding like a broken record, I still think the 
size-limit threshold is unnecessary.  Since all workers will be working 
on tables in size order, younger workers will typically catch older 
workers fairly quickly, since the tables will be either small or 
recently vacuumed and not in need of work.  And since younger workers 
exit when they catch up to an older worker, there is some inherent 
stability in the number of workers.


Here is a worst-case example: a DB with 6 tables, all of which are highly 
active and will need to be vacuumed constantly.  While this is totally 
hypothetical, it is how I envision things working (without the threshold).


table1:10 rows
table2:   100 rows
table3: 1,000 rows
table4:10,000 rows
table5:   100,000 rows
table6: 1,000,000 rows

time=0*naptime: No workers in the DB

time=1*naptime: worker1 starts on table1

time=2*naptime: worker1 has finished table1, table2, and table3; it's now 
working on table4, worker2 starts on table1.


time=3*naptime: worker1 is on table5, worker2 is working on table4, 
worker3 starts on table1.


time=4*naptime: worker1 is still on table5, worker2 has caught up to 
worker1 and exits, worker3 also catches up to worker1 since tables2-4 
didn't require vacuum at this time so it exits, worker4 starts on table1


time=5*naptime: worker1 is working on table6, worker4 is up to table4, 
worker5 starts on table1


time=6*naptime: worker1 is working on table6, worker4 catches up to 
worker1 and exits, worker5 finds no additional work to be done and 
exits, worker6 starts at table1.


time=7*naptime: worker1 still working on table6, worker6 is up to 
table4, worker7 starts at table1.


time=8*naptime: worker1 still working on table6, worker6 still working 
on table4, worker7 working on table3, worker8 starting on table1.


time=9*naptime: worker1 still working on table6, worker6 working on 
table5, worker7 catches worker6 and exits, worker8 finds nothing more 
to do and exits, worker9 starts on table1


time=10*naptime: worker1 still working on table6, worker9 working on 
table4, worker10 starts on table1.





Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

The real problem is trying to set that up in such a fashion that keeps
hot tables frequently vacuumed;


Are we assuming that no single worker instance will vacuum a given table
more than once?  (That's not a necessary assumption, certainly, but
without it there are so many degrees of freedom that I'm not sure how
it should act.)  Given that assumption, the maximum vacuuming rate for
any table is once per autovacuum_naptime, and most of the magic lies in
the launcher's algorithm for deciding which databases to launch workers
into.


Yes, I have been working under the assumption that a worker goes through 
the list of tables once and exits, and yes, the maximum vacuuming rate 
for any table would be once per autovacuum_naptime.  We can lower the 
default if necessary; as far as I'm concerned it's (or should be) fairly 
cheap to fire off a worker and have it find that there isn't anything to 
do and exit.



I'm inclined to propose an even simpler algorithm in which every worker
acts alike; its behavior is
1. On startup, generate a to-do list of tables to process, sorted in
priority order.
2. For each table in the list, if the table is still around and has not
been vacuumed by someone else since you started (including the case of
a vacuum-in-progress), then vacuum it.


That is what I'm proposing, except for one difference: when you catch up 
to an older worker, exit.  This has the benefit of reducing the number of 
workers concurrently working on big tables, which I think is a good thing.



Detecting already vacuumed since you started is a bit tricky; you
can't really rely on the stats collector since its info isn't very
up-to-date.  That's why I was thinking of exposing the to-do lists
explicitly; comparing those with an advertised current-table would
allow accurate determination of what had just gotten done.


Sounds good, but I have very little insight into how we would implement 
"already vacuumed since you started" or "have I caught up to another 
worker".




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

BTW, to what extent might this whole problem be simplified if we adopt
chunk-at-a-time vacuuming (compare current discussion with Galy Lee)?
If the unit of work has a reasonable upper bound regardless of table
size, maybe the problem of big tables starving small ones goes away.


So if we adopted chunk-at-a-time then perhaps each worker processes the 
list of tables in OID order (or some unique and stable order) and does 
one chunk per table that needs vacuuming.  This way an equal amount of 
bandwidth is given to all tables.


That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?


Matt




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:

Tom Lane wrote:

I'm inclined to propose an even simpler algorithm in which every worker
acts alike;


That is what I'm proposing except for one difference: when you catch up 
to an older worker, exit.


No, that's a bad idea, because it means that any large table starves
even-larger tables.


True, but the assumption I'm making is that there is a finite amount of 
bandwidth available and more concurrent activity will have a net 
negative effect on the time it takes to vacuum all tables.  I'm willing to 
pay that price to prevent small hot tables from getting starved, but 
less willing to pay the same price for large tables where the percentage 
of bloat will be much smaller.



(Note: in all this I assume we're all using size as a shorthand for
some sort of priority metric that considers number of dirty tuples not
only size.  We don't want every worker insisting on passing over every
small read-only table every time, for instance.)


I was using "size" to mean reltuples.  The whole concept of sorting by 
size was designed to ensure that smaller (more susceptible to bloat) 
tables got priority.  It might be useful for workers to sort their to-do 
lists by some other metric, but I don't have a clear vision of what that 
might be.
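
For illustration, a minimal sketch of the to-do list being described, 
using reltuples as the "size" metric (this should run on any release):

-- Candidate tables in the order a worker would visit them:
-- smallest first, so small hot tables are never stuck behind big ones.
SELECT oid::regclass AS table_name, reltuples
FROM pg_class
WHERE relkind = 'r'
ORDER BY reltuples ASC;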





Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:

That does sounds simpler. Is chunk-at-a-time a realistic option for 8.3?


It seems fairly trivial to me to have a scheme where you do one
fill-workmem-and-scan-indexes cycle per invocation, and store the
next-heap-page-to-scan in some handy place (new pg_class column updated
along with relpages/reltuples, likely).  Galy is off in left field with
some far more complex ideas :-( but I don't see that there's all that
much needed to support this behavior ... especially if we don't expose
it to the SQL level but only support it for autovac's use.  Then we're
not making any big commitment to support the behavior forever.


Well, if we can make it happen soon, it might be the best thing for 
autovacuum.




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:

Jim C. Nasby wrote:
Here is a worst case example: A DB with 6 tables all of which are highly 
active and will need to be vacuumed constantly.  While this is totally 
hypothetical, it is how I envision things working (without the threshold).


I fail to see how a simple 6 table case is 'worst case'. It's common to
see hundreds of tables, and I've run across more than one database with
thousands of tables (think partitioning). In cases like those it's
certainly possible, perhaps even likely that you would get many daemons
running in the database at one time just from different tables suddenly
needing vacuuming and appearing at a higher point in the list than other
tables. With 100 ~1G tables getting updates it certainly wouldn't be
hard to end up with 10 of those being vacuumed all at the same time.


Yes, 6 tables is small; the worst-case part of the example was that all 
the tables would need to be vacuumed constantly.  Most databases only 
have a few hot tables.  Most tables only need to be vacuumed every once in 
a while.



I do like the idea since it should be easier to tune, but I think we
still need some limit on it. Perhaps as a first-pass we could just have
a hard limit and log a message and/or set a flag any time we hit it.
That would hopefully allow us to get information about how big a problem
it really is. We could go one step further and say that the last daemon
that can start in a database will only vacuum tables that can be done
quickly; that's essentially what we've been talking about, except the
limit we've been discussing would be hard-coded at 2.


I'm confused; what limit would be set at 2?  The number of concurrent 
workers?  I've never said that.




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

The proposal to save enough state to be able to resume a vacuum at
pretty much any point in it's cycle might work; we'd have to benchmark
it.  With the default maintenance_work_mem of 128M it would mean writing
out 64M of state every minute on average, which is likely to take
several seconds to fsync (though, maybe we wouldn't need to fsync it...)


Which is exactly why we needn't bother benchmarking it.  Even if it
weren't complex and unsafe, it will be a net loss when you consider the
fact that it adds I/O instead of removing it.


I'm not sure what you are saying here: are you now saying that partial 
vacuum won't work for autovac?  Or are you saying that saving state as 
Jim is describing above won't work?




Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:
I'm not sure what you are saying here: are you now saying that partial 
vacuum won't work for autovac?  Or are you saying that saving state as 
Jim is describing above won't work?


I'm saying that I don't like the idea of trying to stop on a dime by
saving the current contents of vacuum's dead-TID array to disk with the
idea that we can trust those values 100% later.  Saving the array is
expensive both in runtime and code complexity, and I don't believe we
can trust it later --- at least not without even more expensive-and-
complex measures, such as WAL-logging every such save :-(

I'm for stopping only after completing an index-cleaning pass, at the
point where we empty the dead-TID array anyway.  If you really have to
have stop on a dime, just kill -INT the process, accepting that you
will have to redo your heap scan since the last restart point.


OK, so if I understand correctly, a vacuum of a table with 10 indexes on 
it can be interrupted 10 times, once after each index-cleaning pass? 
That might have some value, especially for breaking up the work required 
to vacuum a large table.  Or am I still not getting it?




Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
  
My Proposal:  If we require admins to identify hot tables, then: 
1) Launcher fires off a worker1 into database X.

2) worker1 deals with hot tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime. 
4) worker2 (or 3 or 4 etc...) checks whether it is alone in DB X; if so, it 
acts as worker1 did above.  If worker1 is still working in DB X then worker2 
looks for hot tables that are being starved because worker1 got busy. 
If worker2 finds no hot tables that need work, then worker2 exits.



Rather than required people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.

ISTM that's a lot easier than forcing admins to mark specific tables.


So the heuristic would be:
* Launcher fires off workers into a database at a given interval 
(perhaps configurable?)
* Each worker works on tables in size order. 
* If a worker ever catches up to an older worker, then the younger 
worker exits.


This sounds simple and workable to me; perhaps we can later modify this 
to include some max_workers variable so that a worker would only exit if 
it catches an older worker and there are max_workers currently active.



Thoughts?



Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
  

So the heuristic would be:
* Launcher fires off workers into a database at a given interval 
(perhaps configurable?)
* Each worker works on tables in size order. 
* If a worker ever catches up to an older worker, then the younger 
worker exits.


This sounds simple and workable to me, perhaps we can later modify this 
to include some max_workers variable so that a worker would only exit if 
it catches an older worker and there are max_workers currently active.



That would likely result in a number of workers running in one database,
unless you limited how many workers per database. And if you did that,
you wouldn't be addressing the frequently updated table problem.

A second vacuum in a database *must* exit after a fairly short time so
that we can go back in and vacuum the important tables again (well or
the 2nd vacuum has to periodically re-evaluate what tables need to be
vacuumed).
  


I'm not sure this is a great idea, but I don't see how this would result 
in large numbers of workers working in one database.   If workers work 
on tables in size order, and exit as soon as they catch up to an older 
worker, I don't see the problem.  Newer workers are going to catch up to 
older workers pretty quickly since small tables will vacuum fairly quickly.





Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Ok, scratch that :-)  Another round of braindumping below.


I still think this is a solution in search of a problem.  The main problem 
we have right now is that hot tables can be starved from vacuum.  Most 
of this proposal doesn't touch that.  I would like to see that problem 
solved first, then we can talk about adding multiple workers per 
database or per tablespace etc...



(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables).  This may be overkill.)


I think this is more along the lines of what we need first.


Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc.  He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could make that after this
proposal is implemented.  So this take 2 could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).


Agreed, this is probably a good idea in the long run, but it is 
lower on the priority list and should come next.



Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace.  I say we're not :-)  If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate.  I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.


I agree that in the long run it might be better to have multiple workers 
with an MB/s throttle that are tablespace-aware, but we don't have any of 
that infrastructure right now.  I think the piece of low-hanging fruit that 
your launcher concept can solve is the hot table starvation.


My Proposal:  If we require admins to identify hot tables, then: 
1) Launcher fires off a worker1 into database X.

2) worker1 deals with hot tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime. 
4) worker2 (or 3 or 4 etc...) checks whether it is alone in DB X; if so, it 
acts as worker1 did above.  If worker1 is still working in DB X then worker2 
looks for hot tables that are being starved because worker1 got busy. 
If worker2 finds no hot tables that need work, then worker2 exits.


This seems like a very simple solution (given your launcher work) that can 
solve the starvation problem.



Thoughts?



Re: [HACKERS] autovacuum next steps

2007-02-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:


[Snip Detailed Description]


How does this sound?


At first blush, I'm not sure I like this as it doesn't directly attack 
the table starvation problem, and I think it could be a net loss of speed.


VACUUM is I/O bound; as such, just sending multiple vacuum commands at a 
DB isn't going to make things faster: you are now going to have multiple 
processes reading from multiple tables at the same time.  I think in 
general this is a bad thing (unless we someday account for I/O made 
available from multiple tablespaces).  In general the only time it's a 
good idea to have multiple vacuums running at the same time is when a 
big table is starving a small hot table and causing bloat.


I think we can extend the current autovacuum stats to add one more 
column that specifies "is hot" or something to that effect.  Then when 
the AV launcher sends a worker to a DB, it will first look for tables 
marked as hot and work on them.  While working on hot tables, the 
launcher need not send any additional workers to this database; if the 
launcher notices that a worker is working on regular tables, it can send 
another worker which will look for hot tables to work on; if that worker 
doesn't find any hot tables that need work, then it exits, leaving the 
original worker to continue plodding along.


Thoughts?





Re: [HACKERS] autovacuum next steps

2007-02-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
At first blush, I'm not sure I like this as it doesn't directly attack 
the table starvation problem, and I think it could be a net loss of speed.


VACUUM is I/O bound; as such, just sending multiple vacuum commands at a 
DB isn't going to make things faster: you are now going to have multiple 
processes reading from multiple tables at the same time.  I think in 
general this is a bad thing (unless we someday account for I/O made 
available from multiple tablespaces).


Yeah, I understand that.  However, I think that can be remedied by using
a reasonable autovacuum_vacuum_cost_delay setting, so that each worker
uses less than the total I/O available.  The main point of the proposal
is to allow multiple workers on a DB while also allowing multiple
databases to be processed in parallel.


So you are telling people to choose an autovacuum_delay so high that 
they need to run multiple autovacuums at once to keep up?  I'm probably 
being too dramatic, but it seems inconsistent.


I think we can extend the current autovacuum stats to add one more 
column that specifies "is hot" or something to that effect.  Then when 
the AV launcher sends a worker to a DB, it will first look for tables 
marked as hot and work on them.  While working on hot tables, the 
launcher need not send any additional workers to this database; if the 
launcher notices that a worker is working on regular tables, it can send 
another worker which will look for hot tables to work on; if that worker 
doesn't find any hot tables that need work, then it exits, leaving the 
original worker to continue plodding along.


How would you define what's a hot table?


I wasn't clear; I would have the admin specify it, and we can store it 
as an additional column in the pg_autovacuum_settings table.  Or perhaps 
if the table is below some size threshold and autovacuum sees that it 
needs to be vacuumed every time it checks, say 10 times in a row, or 
something like that.
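
In the meantime, the effect can be approximated with the existing 
pg_autovacuum catalog.  A minimal sketch, assuming the 8.1 column layout 
(8.2 adds freeze_min_age/freeze_max_age); "hot_queue" is a made-up table:

-- Give a hot table far more aggressive thresholds than the defaults.
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES
    ('hot_queue'::regclass, true,
     50, 0.0,    -- vacuum after ~50 dead tuples, regardless of table size
     25, 0.0,    -- analyze even more eagerly
     -1, -1);    -- -1 = inherit the server-wide cost settings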




Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:


[snip details]


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.


I can't really speak to the PostgreSQL signaling innards, but this sounds 
logical to me.  I think having the worker processes be children of the 
postmaster and having them be single-minded (or single-tasked) also 
makes a lot of sense.




Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what did we run, at what
time, how long did it last, how many tuples did it clean, etc.  I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).


I'm not familiar with his proposal, but I'm not sure what I think of 
logging vacuum (and perhaps analyze) commands to a table.  We have never 
logged anything to tables inside PG.  I would be worried about this 
eating a lot of space in some situations.


I think most people would just be happy if we could get autovacuum to 
log its actions at a much higher log level.  I think that "autovacuum 
vacuumed table x" is important and shouldn't be all the way down at the 
debug level.


The other (more involved) solution that was proposed was to create a 
separate set of logging control params for autovacuum so that you can 
turn it up or down independently of the general server logging.



Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
that manually-run vacuums would be logged as well.


+1




Re: [HACKERS] Autovacuum improvements

2007-01-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
  
This still seems ambiguous to me: how would I handle a maintenance 
window of weekends from Friday at 8PM through Monday morning at 6AM? My 
guess from what you said is:

mon   dom   dow   starttime   endtime
null  null  6     20:00       null
null  null  1     null        06:00

So how do we know to vacuum on Saturday or Sunday?  I think clearly 
defined intervals with explicit start and stop times are cleaner.



mon   dom   dow   start   end
null  null  5     20:00   23:59:59
null  null  6     00:00   23:59:59
null  null  7     00:00   23:59:59
null  null  1     00:00   06:00

(1 = monday, 5 = friday)
  


So it takes 4 lines to handle one logical interval; I don't really like 
that.  I know that your concept of interval groups will help mask this, 
but still.



Now I'm starting to wonder what will happen between 23:59:59 of day X
and 00:00:00 of day (X+1) ...  Maybe what we should do is not specify
an end time, but a duration as an interval:

month     int
dom       int
dow       int
start     time
duration  interval

That way you can specify the above as

mon   dom   dow   start   duration
null  null  5     20:00   (4 hours + 2 days + 6 hours)

Now, if a DST boundary happens to fall in that interval you'll be an
hour short, or it'll last an hour too long :-)
  


I certainly like this better than the first proposal, but I still don't 
see how it's better than a  full set of columns for start and end 
times.  Can you tell me why you are trying to avoid that design? 

Hmm... this seems like "queue" is nearly a synonym for "group".  Can't we 
just add a num_workers property to table groups?  That seems to accomplish 
the same thing.  And yes, a GUC variable to limit the total number of 
concurrent autovacuums is probably a good idea.



queue = group of groups.  But I'm not sure about this at all, which is
why I took it away from the proposal.


I think we can live without the groups of groups, at least for now. 





Re: [HACKERS] Autovacuum improvements

2007-01-14 Thread Matthew T. O'Connor
First, thanks for working on this.  I hope to be helpful with the design 
discussion and possibly some coding if I can find the time.


My initial reaction to this proposal is that it seems overly complex, 
however I don't see a more elegant solution.  I'm a bit concerned that 
most users won't figure out all the knobs.


Alvaro Herrera wrote:

I've been thinking how to improve autovacuum so that we can convince
more people that it can be enabled by default.  


I would like to see it enabled by default too; however, the reason it 
isn't already enabled by default is that it caused failures in the 
regression test when we tried to turn it on during the 8.2 dev cycle and 
it was too close to beta to fix everything.  All this new machinery is 
great, but it doesn't address that problem.



Here are my thoughts.
There are two areas of improvements:

1. scheduling, and
2. process handling, i.e., how to have multiple vacuum processes running
   at any time.


Fair enough, but I would say the two biggest areas for improvement are 
scheduling and preventing hot tables from becoming vacuum starved 
(essentially what you said, but with a different emphasis).


[snip]


Process Handling


My idea here is to morph the current autovacuum daemon from an agent
that itself runs a vacuum command, into something that launches other
processes to run those vacuum commands.  I'll call this the autovacuum
launcher process, or the launcher for short.  The idea here is that the
launcher can take care of the scheduling while the worker processes do
their work.  If the launcher then determines that a particular instant
there should be two vacuums running, then it simply starts two worker
processes.


How about calling it the autovacuum_master process?

[snip autovacuum launcher process description]

That all sounds reasonable to me.  I think the harder part is what you 
are getting at below (how to get the launcher to figure out what to 
vacuum when).



Scheduling
==
We introduce the following concepts:

1. table groups.  We'll have a system catalog for storing OID and group
name, and another catalog for membership, linking relid to group OID.

pg_av_tablegroup
  tgrname   name

pg_av_tgroupmembers
  groupid   oid
  relid     oid



2. interval groups.  We'll have a catalog for storing igroup name and
OID, and another catalog for membership.  We identify an interval by:
   - month of year
   - day of month
   - day of week
   - start time of day
   - end time of day

This is modelled after crontabs.

pg_av_intervalgroup
  igrname    name

pg_av_igroupmembers
  groupid    oid
  month      int
  dom        int
  dow        int
  starttime  timetz
  endtime    timetz


This seems to assume that the start and end time for an interval will be 
on the same day; you probably need to specify a start month, dom, dow, 
time and an end month, dom, dow and time.


Since this is modeled after cron, do we allow wild-cards, or any of the 
other cron tricks like */20 or 1-3,5,7,9-11?


Also, your notation above is ambiguous; it took me a while to realize 
that pg_av_igroupmembers.groupid wasn't referencing the id from 
pg_av_tablegroup.



Additionally, we'll have another catalog on which we'll store table
groups to interval groups relationships.  On that catalog we'll also
store those autovacuum settings that we want to be able to override:
whether to disable it for this interval group, or the values for the
vacuum/analyze equations.

pg_av_schedule
  tgroup            oid
  igroup            oid
  enabled           bool
  queue             int
  vac_base_thresh   int
  vac_scale_factor  float
  anl_base_thresh   int
  anl_scale_factor  float
  vac_cost_delay    int
  vac_cost_limit    int
  freeze_min_age    int
  freeze_max_age    int



What is "queue" for?


So the scheduler, at startup, loads the whole schedule in memory, and
then wakes up at reasonable intervals and checks whether these equations
hold for some of the tables it's monitoring.  If they do, then launch a
new worker process to do the job.

We need a mechanism for having the scheduler rescan the schedule when a
user modifies the catalog -- maybe having a trigger that sends a signal
to the process is good enough (implementation detail: the signal must be
routed via the postmaster, since the backend cannot hope to know the
scheduler's PID.  This is easy enough to do.)


This all looks reasonable if not a bit complex.  Question: what happens 
to the current pg_autovacuum relation?


Also, what about system defaults?  Will we have a hard-coded default 
interval of "always on", and one default table group that contains all the 
tables, with one default entry in pg_av_schedule?
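
To make the proposal concrete, here is a purely hypothetical sketch of 
populating these catalogs (none of this exists yet; all names come from 
the outline above): put one table in a "queues" group and give it its own 
settings during a nightly 01:00-05:00 interval.

INSERT INTO pg_av_tablegroup (tgrname) VALUES ('queues');
INSERT INTO pg_av_tgroupmembers (groupid, relid)
    SELECT oid, 'job_queue'::regclass
    FROM pg_av_tablegroup WHERE tgrname = 'queues';

INSERT INTO pg_av_intervalgroup (igrname) VALUES ('nightly');
INSERT INTO pg_av_igroupmembers (groupid, month, dom, dow, starttime, endtime)
    SELECT oid, NULL, NULL, NULL, '01:00', '05:00'
    FROM pg_av_intervalgroup WHERE igrname = 'nightly';

-- Tie the two together, overriding the thresholds for that window.
INSERT INTO pg_av_schedule (tgroup, igroup, enabled, vac_base_thresh,
                            vac_scale_factor)
    SELECT tg.oid, ig.oid, true, 100, 0.04
    FROM pg_av_tablegroup tg, pg_av_intervalgroup ig
    WHERE tg.tgrname = 'queues' AND ig.igrname = 'nightly';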


I think we need more discussion on scheduling; we need to make sure this 
solves the vacuum starvation problem.  Does the launcher process 
consider each row in pg_av_schedule that applies at the current time 
separately?  That is, say there are three entries in pg_av_schedule that 
apply right now, does that mean that the launcher can fire off three 
different vacuums?  Perhaps we need to add a column to pg_av_tablegroup 
that specifies the max number of concurrent worker processes for this 
table group.

Re: [HACKERS] Autovacuum improvements

2007-01-14 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
  

Alvaro Herrera wrote:


pg_av_igroupmembers
groupid oid
month   int
dom int
dow int
starttime   timetz
endtime timetz
  
This seems to assume that the start and end time for an interval will be 
on the same day; you probably need to specify a start month, dom, dow, 
time and an end month, dom, dow and time.



Actually, I was thinking that if you want intervals that cross day
boundaries, you just add more tuples (one which finishes at 23:59:59 and
another which starts at 00:00:00 the next day).
  


This still seems ambiguous to me: how would I handle a maintenance 
window of weekends from Friday at 8PM through Monday morning at 6AM? My 
guess from what you said is:

mon   dom   dow   starttime   endtime
null  null  6     20:00       null
null  null  1     null        06:00

So how do we know to vacuum on Saturday or Sunday?  I think clearly 
defined intervals with explicit start and stop times are cleaner.


This all looks reasonable if not a bit complex.  Question: what happens 
to the current pg_autovacuum relation?



I had two ideas: one was to make pg_autovacuum hold default config for
all tables not mentioned in any group, so sites which are OK with 8.2's
representation can still use it.  The other idea was to remove it and
replace it with this mechanism.

  


Probably best to just get rid of it.  GUC variables hold the defaults, 
and if we create a default interval/group, it will also have defaults.


I think we need more discussion on scheduling, we need to make sure this 
solves the vacuum starvation problem.  Does the launcher process 
consider each row in pg_av_schedule that applies at the current time 
separately?  That is, say there are three entries in pg_av_schedule that 
apply right now, does that mean that the launcher can fire off three 
different vacuums?  Perhaps we need to add a column to pg_av_tablegroup 
that specifies the max number of concurrent worker processes for this 
table group.



My idea was to assign each table, or maybe each group, to a queue, and
then have as much workers as there are queues.  So you could put them
all in a single queue and it would mean there can be at most one vacuum
running at any time.  Or you could put each group in a queue, and then
there could be as many workers as there are groups.  Or you could mix.

And also there would be an autovac concurrency limit, which would be
a GUC var saying how many vacuums to have at any time.


Hmm... this seems like "queue" is nearly a synonym for "group".  Can't we 
just add a num_workers property to table groups?  That seems to accomplish 
the same thing.  And yes, a GUC variable to limit the total number of 
concurrent autovacuums is probably a good idea.






Re: [HACKERS] pdfs of the conference

2006-09-20 Thread Matthew T. O'Connor

Walter Cruz wrote:

The larger version is only hidden from everyone :)

http://people.planetpostgresql.org/mha/uploads/photo/conf/conference_group.jpg 


Very cool, I was hoping someone would post this.  Any chance we can get 
a list of names to go with the faces?






Re: [HACKERS] autovacuum causing numerous regression-test failures

2006-08-28 Thread Matthew T. O'Connor

Peter Eisentraut wrote:

Tom Lane wrote:

Not a solution for make installcheck,


Well, for make installcheck we don't have any control over whether 
autovacuum has been turned on or off manually anyway.  If you are 
concerned about build farm reliability, the build farm scripts can 
surely be made to initialize or start the instance in a particular way.


Another option might be to turn off stats_row_level on the fly.


I'm sure I'm missing some of the subtleties of the make installcheck 
issues, but autovacuum can be enabled/disabled on the fly just as easily 
as stats_row_level, so I don't see the difference.



Or we put manual exceptions for the affected tables into
pg_autovacuum.

New feature?  Or does that capability exist already?


I haven't ever used the pg_autovacuum table but the documentation 
certainly makes one believe that this is possible.


Right, if it doesn't work, that would certainly be a bug.  This feature 
was included during the original integration into the backend during the 
8.0 dev cycle.


Let's just consider some of the options a bit more closely, and if they 
don't work, we'll revert it.


Agreed.




Re: [HACKERS] [Pgsqlrpms-hackers] Safer auto-initdb for RPM init

2006-08-26 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Sat, Aug 26, 2006 at 01:32:17PM -0700, Joshua D. Drake wrote:
I am not exactly sure why we initdb at all. IMHO it would be better if 
the start script just checked if there was a cluster. If not, it 
wouldn't start, it would error with: You do not have a cluster, please 
read man page on initdb.


As Tom mentioned, it's for newbie-friendliness. While I can understand
that, I think it needs to be easy to shut that off.


I understand that, but it seems the whole problem of people overwriting 
their data dir is because we initdb from the start script.  If we 
installed the datadir during the RPM install, it would still be newbie 
friendly and would remove initdb from the start script, solving that problem.


The only downside is a larger disk footprint for something that a user 
may never use.  Given the size of today's hard drives, and the size of a 
standard RedHat install, I don't think that is much of an issue.




Re: [HACKERS] [Pgsqlrpms-hackers] Safer auto-initdb for RPM init

2006-08-26 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

Matthew T. O'Connor wrote:

Jim C. Nasby wrote:

As Tom mentioned, it's for newbie-friendliness. While I can understand
that, I think it needs to be easy to shut that off.


I understand that, but it seems the whole problem of people 
overwriting there data dir is because we initdb from the start 
script.  If we installed the datadir during the RPM install, it would 
still be newbie friendly and would removed initdb from start script 
solving that problem.


 initdb will not overwrite an existing installation. 


Poorly chosen words.  I meant the problem where the start script will 
create a new data dir when it doesn't see that one exists, even though 
one actually does exist; it's just not available at the moment.  Either 
way, if the start script never created a data dir, then there would be no 
problem.




Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Matthew T. O'Connor

Peter Eisentraut wrote:

Summarizing this thread, I see support for the following:

- autovacuum set to on by default in 8.2.


Yes.


- stats_row_level also defaults to on.


Yes.

(Perhaps stats_block_level should also default to on so it's not inconsistent, 
seeing that everything else is on, too.)


I haven't followed this closely, but are all the other stats settings 
other than block_level now on by default?  In general I would vote to 
keep it off if not needed, just for performance reasons, though I haven't 
measured the effect of block_level turned on.  Has anyone measured this?



- Delayed vacuum and delayed autovacuum will stay disabled.


Unfortunately.

- Scale factor set to 0.08 (vacuum) and 0.04 (analyze) (?)  (formerly 0.4 and 
0.2)


That seems like a big jump.  BTW, I know .08 and .04 were suggested, but I 
didn't see confirmation that it was a good idea.  I know my initial 
values were grossly over-conservative, but I am concerned about bogging 
down the server with lots of vacuums, especially since we don't have the 
delay settings on by default, nor do we have maintenance windows yet.


- Leave base thresholds alone (pending further analysis that might remove them 
altogether?)


While there is talk of removing this altogether, I think it was also 
agreed that as long as these values are there, they should be reduced. 
The defaults in 8.1 are 1000/500; I think 200/100 was suggested.
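
For concreteness, this is the documented equation those knobs feed:

    vacuum threshold = base_thresh + scale_factor * reltuples

A sketch of what the suggested 200 / 0.08 values would work out to per 
table (runnable on any recent release; the numbers are just the proposal):

SELECT relname,
       reltuples,
       200 + 0.08 * reltuples AS dead_tuples_before_vacuum
FROM pg_class
WHERE relkind = 'r'
ORDER BY reltuples DESC;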







Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:
  
While there is talk of removing this altogether, I think it was also 
agreed that as long as these values are there, they should be reduced. 
The defaults in 8.1 are 1000/500; I think 200/100 was suggested.



ISTM that if we don't want to remove the thresholds immediately,
we should make them default to zero for a release or two and see how
well it works.

At the moment I can't find the thread that discussed removing them,
but IIRC there were some good arguments why the thresholds should always
be zero.


I don't have any significant objection to removing them; it just seemed 
to me that we are late in the release cycle and that might be more than we 
want to do at the moment.  If others think it's OK, then it's OK with me.





Re: [HACKERS] Autovacuum on by default?

2006-08-18 Thread Matthew T. O'Connor

Peter Eisentraut wrote:

Am Donnerstag, 17. August 2006 18:40 schrieb Josh Berkus:

I'm in favor of this, but do we want to turn on vacuum_delay by default
as well?


People might complain that suddenly their vacuum runs take four times as long 
(or whatever).  Of course, if we turn on autovacuum and advocate a more or 
less hands-off vacuum policy, they won't have to care either way.  All of 
this would certainly be release-notes material.


Does anyone, for that matter, want to propose possible default parameters for 
vacuum_delay?


I said vacuum_delay, but I should have been more specific: it's the 
autovacuum GUC variables we should be talking about.  This way manually 
run vacuums, or nightly vacuums run by cron, are still as fast as they 
ever were.
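
For reference, the knobs in question can be inspected like this (8.1 and 
later); -1 for the autovacuum_* values means "fall back to the plain 
vacuum_cost_* settings":

SELECT name, setting
FROM pg_settings
WHERE name IN ('vacuum_cost_delay', 'vacuum_cost_limit',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_vacuum_cost_limit');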






Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread Matthew T. O'Connor

ITAGAKI Takahiro wrote:

Matthew T. O'Connor matthew@zeut.net wrote:
What is this based on?  That is, based on what information is it 
deciding to reduce the naptime?


If there are some vacuum or analyze jobs, the naptime is shortened
(i.e, autovacuum is accelerated). And if there are no jobs, the naptime
is lengthened (autovacuum is decelerated).


Yeah, I looked through the patch after I sent this email.  It's an 
interesting perspective, but I want to see some performance numbers or 
significant bloat reduction before I agree this is a good idea.  Again, 
when a table is busy, constant vacuuming will help keep down bloat, but 
at the expense of throughput.



I noticed my method is based on different views from contrib/pg_autovacuum.
I'm afraid of the lack of vacuum by autovacuum. So if the database seems to
require frequent vacuums, I'll accelerate autovacuum, and vice versa.
If we have a small heavily-updated table and a large rarely-updated table,
we should vacuum the small one soon after vacuum on the large one is done,
even if the large vacuum takes a long time -- but hmm, it may be better to
have multiple autovacuums in such a case primarily.


Yes, I think we are heading in this direction.  As of 8.2 PostgreSQL 
will allow multiple vacuums at the same time (just not on the same 
table); autovacuum hasn't been taught to use this yet, but I think it 
will be eventually.



I agree. We can use autovacuum thresholds and cost-delay parameters to
control the frequency and priority of vacuum. I don't think it is good
to control vacuums by changing naptime.


Now I'm confused: are you now saying that you don't like the concept 
behind your patch?  Or am I misunderstanding?  I think your idea might 
be a good one; I'm just not sure yet.


Matt



Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Peter Eisentraut wrote:
Is it time to turn on autovacuum by default in 8.2?  I know we wanted to 
be on the side of caution with 8.1, but perhaps we should evaluate the 
experiences now.  Comments?


Would be fine by me, but I'm curious to see what the community has to 
say.  A few comments:


Autovacuum can cause unpredictable performance issues; that is, if it 
vacuums in the middle of a busy day and people don't want that, of 
course they can turn it off easily enough, but they might be surprised.


I haven't played with CVS HEAD much, but I think the logging issue has 
been addressed, no?  My single biggest gripe with the 8.1 autovacuum is 
that it's very hard to see whether it has actually done anything without 
having to turn up the logging significantly.


The remaining big-ticket items for autovacuum are the maintenance window 
that Alvaro and I have just been discussing, and multiple concurrent 
vacuums (possibly you could add the continued reduction of vacuum impact, 
but that's a constant effort).  Do we think it's worth waiting for 
either of these two features prior to turning on autovacuum by default?


Matt



Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Matthew T. O'Connor wrote:
  
Would be fine by me, but I'm curious to see what the community has to 
say.  A few comments:


Autovacuum can cause unpredictable performance issues; that is, if it 
vacuums in the middle of a busy day and people don't want that, of 
course they can turn it off easily enough, but they might be surprised.


I haven't played with CVS HEAD much, but I think the logging issue has 
been addressed, no?  My single biggest gripe with the 8.1 autovacuum is 
that it's very hard to see whether it has actually done anything without 
having to turn up the logging significantly.



This has not been addressed, except that pg_stat_activity now shows
autovacuum.  Someone was going to work on per-module log output, but it
wasn't completed for 8.2.   Does pg_stat_activity now show the table
being vacuumed?


Hmm... I thought it had, not the full-blown per-module log output stuff, 
but just a simple reorganizing of the log levels for autovacuum 
messages.  That is, lowering the level for:

LOG:  autovacuum: processing database foo

and increasing the log level when autovacuum actually fires off a VACUUM 
or ANALYZE command.






Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Rod Taylor wrote:

The defaults could be a little more aggressive for both vacuum and
analyze scale_factor settings; 10% and 5% respectively.


I would agree with this; I'm not sure if 10%/5% are right, but the general 
feedback I have heard is that while the defaults in 8.1 are much better 
than the ones in the contrib module, they are still not aggressive enough.







Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Matthew T. O'Connor wrote:
  
and increasing the log level when autovacuum actually fires off a VACUUM 
or ANALYZE command.



This was not done because the logging control only for autovacuum was
going to be added.  Right now, if you want to see the vacuum activity,
you end up seeing debug stuff too (very ugly).


Any chance we can make this change before release?  I think it's very 
important to be able to look through the logs and *know* whether your 
tables are getting vacuumed or not.





Re: [HACKERS] Autovacuum maintenance window (was Re: Adjust autovacuum

2006-08-17 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

My vision is a little more complex than that.  You define group of
tables, and separately you define time intervals.  For each combination
of group and interval you can configure certain parameters, like a
multiplier for the autovacuum thresholds and factors; and also the
enable bit.  So you can disable vacuum for some intervals, and refine
the equation factors for some others.  This is all configured in tables,
not in GUC, so you have more flexibility in choosing stuff for different
groups of tables (say, you really want the small-but-high-update tables
to be still vacuumed even during peak periods, but you don't want that
big fat table to be vacuumed at all during the same period).


That sounds good.  I worry a bit that it's going to get overly complex.  
I suppose if we create the concept of a default window that all new 
tables will automatically be added to when created, then out of the 
box we can create one default 24-hour maintenance window that would 
effectively give us the same functionality we have now.


Could these groups also be used for concurrent vacuums?  That is, 
autovacuum would loop through each group of tables independently, thus 
allowing multiple simultaneous vacuums on different tables and giving us 
a solution to the constantly updated table problem.





Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Matthew T. O'Connor wrote:
  
Any chance we can make this change before release?  I think it's very 
important to be able to look through the logs and *know* whether your 
tables are getting vacuumed or not.



Agreed.  I just IM'ed Alvaro and he says pg_stat_activity should now
show exactly what autovacuum is doing (and if it doesn't, let's fix it).
I think that is the best solution to the monitoring problem, rather than
throwing lines in the server logs.


I'm not sure I agree with this.  I can use the pg_stat_activity table to 
see if autovacuum is doing something right now, but what I want to be 
able to do is look through my logs and see that table_foo hasn't been 
vacuumed since last week, or that table_bar has been vacuumed 7 times 
today.  Can I do that with the stats system alone?
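
For what it's worth, the pg_stat_activity check looks something like 
this (8.2-era column names; I'm assuming the autovacuum worker's command 
shows up in current_query, as Alvaro describes):

SELECT datname, procpid, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';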






Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Josh Berkus wrote:
Is it time to turn on autovacuum by default in 8.2?  I know we wanted 
to be on the side of caution with 8.1, but perhaps we should evaluate 
the experiences now.  Comments?


I'm in favor of this, but do we want to turn on vacuum_delay by default 
as well?


I thought about this; it might be a good idea, as it will mitigate the 
impact of vacuums.  However, it will also slow down vacuums, and I'm a bit 
concerned that it won't be able to keep up on a really large database, or 
that it'll fall really far behind after vacuuming a big table.


Also, if we do enable it, what is a good default?



Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Thu, Aug 17, 2006 at 03:00:00PM +0900, ITAGAKI Takahiro wrote:
  
IMO, the only reason at all for naptime is because there is a

non-trivial cost associated with checking a database to see if any
vacuuming is needed.
  


This cost is reduced significantly in the integrated version as compared 
to the contrib version, but yes, it is still not zero.



One problem that I've run across is that in a cluster with a lot of
databases it can take a very long time to cycle through all of them.

Perhaps a better idea would be to check a number of databases on each
pass. That way you won't bog the server down while checking, but it
won't take as long to get to all the databases.

Also, autovac should immediately continue checking databases after it
finishes vacuuming one. The reason for this is that while vacuuming,
the vacuum_cost_delay settings will almost certainly be in effect, which
will prevent autovac from hammering the system. Since the system won't
be hammered during the vacuum, it's ok to check more databases
immediately after finishing vacuuming on one.
  


This is basically what Itagaki's patch does. 


Does anyone have any info on how much load there actually is when
checking databases to see if they need vacuuming?
  


I haven't.




Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Thu, Aug 17, 2006 at 12:41:57PM -0400, Matthew T. O'Connor wrote:
  
Would be fine by me, but I'm curious to see what the community has to 
say.  A few comments:


Autovacuum can cause unpredictable performance issues; that is, if it 
vacuums in the middle of a busy day and people don't want that, of 
course they can turn it off easily enough, but they might be surprised.



Anyone in that situation needs to be more hands-on with the database
anyway. I think the big target for autovac is beginners who otherwise
would bloat themselves silly (of course, it's also very useful far
beyond beginners, but by that time an admin should have some clue about
what they're doing).
  


Fair enough; also, if we turn on the delay settings by default, it will 
help address this.



And +1 on Rod's suggestion to make it more aggressive. I always drop the
scale factor to at least 0.2 and 0.1 (though 0.1 and 0.05 don't seem
unreasonable), and typically drop the thresholds to 200 and 100 (though
again, lower is probably warrented).
  


OK.


Actually, on a table small enough for the thresholds to kick in it's
going to be extremely fast to vacuum anyway, and the table is probably
either static or changing very rapidly. I'm wondering if maybe they
should just default to 0?
  
I assume you are suggesting that the base value be 0?  Well, for one 
thing, if the table doesn't have any rows, that will result in constant 
vacuuming of that table, so it needs to be greater than 0.  For a small 
table, say 100 rows, there usually isn't much performance impact if the 
table is 50% dead space, so I think the base values you suggest are OK, 
but they shouldn't be 0.







Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Larry Rosenman wrote:

Alvaro Herrera wrote:
  

Bruce Momjian wrote:

Well, the problem is that it shows what it's *currently* doing, but it
doesn't let you know what has happened in the last day or whatever. 
It can't answer "has table foo been vacuumed recently?" or "what
tables haven't been vacuumed at all during this week?"



I added last vacuum and last analyze (both auto and manual) dates in the 8.2
tree.


Hmm... well that does address most of my issue.  Doesn't tell me that 
table_foo is getting vacuumed constantly, but it does tell me that it 
was vacuumed recently.  Might be good enough.
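
With the new 8.2 columns, something like this answers the "was it 
vacuumed recently?" half of the question:

SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY coalesce(last_autovacuum, last_vacuum, 'epoch'::timestamptz);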





Re: [HACKERS] Autovacuum on by default?

2006-08-17 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
  
I assume you are suggesting that the base value be 0?  Well, for one 
thing, if the table doesn't have any rows, that will result in constant 
vacuuming of that table, so it needs to be greater than 0.  For a small 
table, say 100 rows, there usually isn't much performance impact if the 
table is 50% dead space, so I think the base values you suggest are OK, 
but they shouldn't be 0.



Actually Tom suggested some time ago that we should get rid of the base
value completely, i.e. make it 0 forever.

A table with 0 rows would not show any activity in pgstats, so it would
not be vacuumed constantly.  Only once after it's truncated.


OK, I forgot that.  Well, I put it in originally as a way to give more 
flexibility to the calculation: if I want a table vacuumed every 100 
updates, then I can set the scaling factor to 0 and the base value to 
100.  But maybe that's not really needed; it would simplify things if we 
got rid of it.






Re: [HACKERS] [PATCHES] Adjust autovacuum naptime automatically

2006-08-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

ITAGAKI Takahiro wrote:


In the case of a heavily update workload, the default naptime (60 seconds)
is too long to keep the number of dead tuples low. With my patch, the naptime
will be adjusted around 3 seconds at the case of pgbench (scale=10, 80 tps)
with default other autovacuum parameters.


What is this based on?  That is, based on what information is it 
deciding to reduce the naptime?




Interesting.  To be frank I don't know what the sleep scale factor was
supposed to do.


I'm not sure whether the sleep scale factor is a good idea at this 
point, but what I was thinking back in the day when I originally wrote 
the contrib autovacuum is that I didn't want the system to get bogged 
down constantly vacuuming.  So, if it had just spent a long time working 
on one database, it would sleep for a long time.


Given that we can now specify the vacuum cost delay settings for 
autovacuum and disable tables and everything else, I'm not sure we need 
this anymore, at least not as it was originally designed.  It sounds like 
Itagaki is doing things a little differently with his patch, but I'm not 
sure I understand it.



 - I removed autovacuum_naptime guc variable, because it is adjusted
   automatically now. Is it appropriate?


I think we should provide the user with a way to stop the naptime from
changing at all.  Eventually we will have the promised maintenance
windows feature which will mean the user will not have to worry at all
about the naptime, but in the meantime I think we should keep it.


I'm not sure that's true.  I believe we will want the naptime GUC option 
even after we have the maintenance window.  I think we might ignore the 
naptime during the maintenance window, but even after we have the 
maintenance window, we will still vacuum during the day as required.


My vision of the maintenance window has always been very simple: during 
the maintenance window the thresholds get reduced by some factor 
(probably a GUC variable), so during the day it might take 10,000 
updates on a table to cause a vacuum, but during the maintenance window 
it might be 10% of that, 1,000.  Is this in line with what others were 
thinking?





Re: [HACKERS] New beginings

2006-08-15 Thread Matthew T. O'Connor

Darcy Buskermolen wrote:
Dear Community members, 

It is with great enthusiasm that I announce that I have accepted an offer from 
Joshua D. Drake of Command Prompt Inc, to join his team.  As former Vice 
President of Software Development with Wavefire Technologies Corp, I endeavor 
to leverage over 10 years of technical expertise and 8 years of PostgreSQL 
background within my new team.  Rest assured that I will continue my role 
within the Slony development group to bring additional features, robustness 
and usability to the project.


I wish all the best to my former co-workers, and plan for a bright future 
within the ranks of the Command Prompt team. 


Congrats!



Re: [HACKERS] 8.2 features status

2006-08-04 Thread Matthew T. O'Connor

Robert Treat wrote:
So, the things I hear most non-postgresql people complain about wrt postgresql 
are:


no full text indexing built in
no replication built in
no stored procedures (with a mix of wanting an in-db cron facility) 
the planner is not smart enough (with a mix of wanting hints)

vacuum leads to unpredictable performance


FTI is a biggie in my mind.  I know it ain't happening for 8.2, but is 
the general plan to integrate TSearch2 directly into the backend?




Re: [HACKERS] CSV mode option for pg_dump

2006-06-13 Thread Matthew T. O'Connor

Bill Bartlett wrote:

Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.


ODBC over an SSH tunnel?



Re: [HACKERS] [pgsql-advocacy] Toward A Positive Marketing Approach.

2006-05-19 Thread Matthew T. O'Connor

Tom Lane wrote:

Josh Berkus josh@agliodbs.com writes:
Other projects need even more intensive coding help.  OpenOffice, for example, 
doesn't offer the Postgres driver by default because it's still too buggy.  


That seems like something that it'd be worth our while to help fix.


+1 (or +10 if that's not too piggy ;-)



Re: [HACKERS] Logging pg_autovacuum

2006-04-27 Thread Matthew T. O'Connor

I think there are two things people typically want to know from the logs:
1) Is autovacuum running
2) Did autovacuum take action (issue a VACUUM or ANALYZE)

I don't think we need to mention the name of each and every database we 
touch; we can, but it should be at a lower level like DEBUG1 or something.


I don't know what logging level these things should go at, but I for one 
would like them to be fairly easy to get to, perhaps NOTICE?
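
For reference, if the per-database message does get demoted to DEBUG1, an 
admin who wants to confirm autovacuum is running can still surface it by 
lowering the server's log threshold:

-- Check the level currently in effect; setting it to 'debug1' in
-- postgresql.conf (and reloading) makes DEBUG1 messages appear in the log.
SHOW log_min_messages;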



Matt


Bruce Momjian wrote:

Tom Lane wrote:

[EMAIL PROTECTED] (Bruce Momjian) writes:

Change log message about vacuuming database name from LOG to DEBUG1.
Prevents duplicate meaningless log messages.

Could we have some discussion about this sort of thing, rather than
unilateral actions?

Those messages were at LOG level because otherwise it's difficult to be
sure from the log that autovac is running at all.


OK, so what do we want to do?  Clearly outputting something every time
pg_autovacuum touches a database isn't ideal.  By default, the server
logs should show significant events, which this is not.

Do we want something output only the first time autovacuum runs?





Re: [HACKERS] problem with large maintenance_work_mem settings and

2006-03-04 Thread Matthew T. O'Connor

Stefan Kaltenbrunner wrote:

foo=# set maintenance_work_mem to 200;
SET
foo=# VACUUM ANALYZE verbose;
INFO:  vacuuming information_schema.sql_features
ERROR:  invalid memory alloc request size 204798


Just an FYI, I reported a similar problem on my 8.0.0 database a few 
weeks ago.  I upgraded to 8.1.3 and just recently got that message again 
in a nightly maintenance email.


vacuumdb: vacuuming database postgres
vacuumdb: vacuuming of database postgres failed: ERROR:  out of memory
DETAIL:  Failed on request of size 167772156.

The interesting thing is that there aren't any tables in the postgres 
database.  This happened Thursday night, but vacuum ran fine on Friday 
night.  I'm on vacation right now, so I can't really look into it much 
deeper at the moment, but I thought I would mention it.
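
One plausible explanation, assuming the truncated numbers above were 
originally large values: maintenance_work_mem is specified in kilobytes, 
and a single palloc in PostgreSQL is capped at roughly 1 GB, so a setting 
implying more than that can make VACUUM's dead-tuple-array allocation 
fail with exactly this kind of error.  A sketch of a safer setting:

-- 524288 KB = 512 MB, comfortably under the ~1 GB single-allocation cap.
SET maintenance_work_mem = 524288;
VACUUM VERBOSE ANALYZE;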


Matt




Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Csaba Nagy wrote:
  

Now when the queue tables get 1000 times dead space compared to their
normal size, I get performance problems. So tweaking vacuum cost delay
doesn't buy me anything, as vacuum per se is not the performance 
problem; its long run time on big tables is.


So for you it would certainly help a lot to be able to vacuum the first
X pages of the big table, stop, release locks, create new transaction,
continue with the next X pages, lather, rinse, repeat.


I got the impression that Csaba is looking more for multiple 
simultaneous vacuums than for partial vacuum.  Not sure of the best way 
to set this up, but perhaps a flag in the pg_autovacuum table that says 
"vacuum this table even if there is another vacuum running"; that way 
you can control things and not have autovacuum firing off lots of 
vacuums at the same time.  Sounds to me like these frequently updated 
queue tables need to be monitored closely and not ignored for a long 
period of time because we are vacuuming another table.  Has anyone 
looked more closely at the multiple vacuum patch that was submitted to 
the patches list a while ago?


Matt




Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Matthew T. O'Connor

Csaba Nagy wrote:

So he rather needs Hannu Krosing's patch for simultaneous vacuum ...



Well, I guess that would be a good solution to the queue table
problem. The problem is that I can't deploy that patch on our production
systems without being fairly sure it won't corrupt any data... and I
can't rely on non-production testing either. Basically I'm waiting to
see Tom saying it will fly :-)


That patch is a step forward if it's deemed OK by the powers that be.  
However, autovacuum would still need to be taught to handle simultaneous 
vacuums.  I suppose that in the interim, you could disable autovacuum 
for the problematic queue table and have cron issue a manual vacuum 
command for that table at the required frequency.
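
In 8.1 that interim setup would look roughly like this (the table name is 
a placeholder, and -1 in pg_autovacuum means "use the default"):

-- Exclude the hot queue table from autovacuum via 8.1's per-table catalog:
INSERT INTO pg_autovacuum (vacrelid, enabled,
                           vac_base_thresh, vac_scale_factor,
                           anl_base_thresh, anl_scale_factor,
                           vac_cost_delay, vac_cost_limit)
VALUES ('queue_table'::regclass, false, -1, -1, -1, -1, -1, -1);

-- ...and have cron run a manual vacuum of just that table every few minutes:
VACUUM ANALYZE queue_table;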


Anyone up for working on / testing / improving Hannu's patch?  I think 
it's beyond my skill set.


Matt




Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Matthew T. O'Connor

Tom Lane wrote:

Matthew T. O'Connor matthew@zeut.net writes:
  
That patch is a step forward if it's deemed OK by the powers that be.  
However, autovacuum would still need to be taught to handle simultaneous 
vacuums.  I suppose that in the interim, you could disable autovacuum 
for the problematic queue table and have cron issue a manual vacuum 
command for that table at the required frequency.



I'm not sure you should think of that as an interim solution.  I don't
really like the idea of multiple autovacuums running concurrently.  ISTM
autovac is intended to be something that lurks in the background and
doesn't take up an unreasonable percentage of your system bandwidth ...
but if there's more than one of them, it's going to be mighty hard to
control the overall load penalty.  Plus you have to worry about keeping
them off each others' backs, ie, not all trying to vac the same table at
once.  And in a scenario like Csaba's, I think the hotspot tables are
just exactly what they'd all try to vacuum.

For small hotspot tables I think a scheduled vacuum process is just the
thing, whereas autovac is more of a free-lance thing to keep the rest of
your DB in line.


While I agree that, given the current state of affairs, the cron 
solution is elegant, I personally want autovac to solve all of our 
vacuuming needs; I really dislike the idea of requiring a cron-based 
solution for a fairly typical problem.  Besides, the cron solution is 
sloppy: it blindly vacuums whether it's needed or not, resulting in a 
net increase in cycles spent vacuuming.


Anyway, I don't know the best way to implement it, but I wasn't thinking 
of just firing off multiple autovac processes.  I was envisioning 
something like an autovacuum master process that launches (forks?) 
VACUUM commands and has some smarts about how many processes to fire 
off, or that would only fire off simultaneous VACUUMs for tables that 
have been flagged as hot-spot tables.


I recognize that teaching autovac to handle simultaneous VACUUMs in a 
sane way will require a quantum leap in complexity, but it still seems a 
better long-term solution.  I would agree that using cron makes sense if 
we were seeing lots of different scenarios that we couldn't possibly 
anticipate, but I don't think that is where we are.


BTW, this discussion is only relevant if we allow simultaneous vacuums.  
Is this something you see as inevitable, whether or not you think Hannu's 
implementation is acceptable?


Matt




Re: [HACKERS] Automatic free space map filling

2006-03-02 Thread Matthew T. O'Connor

Csaba Nagy wrote:

From my POV, there must be a way to speed up vacuums on huge tables and
small percentage of to-be-vacuumed tuples... a 200 million rows table
with frequent updates of the _same_ record is causing me some pain right
now. I would like to have that table vacuumed as often as possible, but
right now it only works to do it once per week due to load problems on
long-running transactions preventing vacuuming other tables.


Are you running 8.1?  If so, you can use autovacuum and set per-table 
thresholds (read: vacuum aggressively) and per-table cost delay settings 
so that the performance impact is minimal.  If you have tried 8.1 
autovacuum and found it unhelpful, I would be curious to find out why.
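
For example, a sketch against 8.1's pg_autovacuum catalog (the table name 
and numbers are made up; -1 means fall back to the default):

-- Vacuum hot_table after ~200 dead rows regardless of table size,
-- but throttle it with a 20 ms cost delay so it stays out of the way.
INSERT INTO pg_autovacuum (vacrelid, enabled,
                           vac_base_thresh, vac_scale_factor,
                           anl_base_thresh, anl_scale_factor,
                           vac_cost_delay, vac_cost_limit)
VALUES ('hot_table'::regclass, true, 200, 0.0, 100, 0.0, 20, -1);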



Matt




Re: [HACKERS] autovacuum

2006-02-01 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Chris Browne wrote:


It strikes me as a slick idea for autovacuum to take on that
behaviour.  If the daily backup runs for 2h, then it is quite futile
to bother vacuuming a table multiple times during that 2h period when
none of the tuples obsoleted during the 2h period will be able to be
cleaned out until the end.


Hmm, yeah, sounds useful.  There's one implementation issue to notice
however, and it's that the autovacuum process dies and restarts for each
iteration, so there's no way for it to remember previous state unless
it's saved somewhere permanent, as the stats info is.

However this seems at least slightly redundant with the maintenance
window feature -- you could set a high barrier to vacuum during the
daily backup period instead.  (Anybody up for doing this job?)


I can't promise anything, but it's on my list of things to hopefully 
find time for in the coming months.  No way I can start it in Feb, but 
maybe sometime in March.  Anyone else?



Matt



Re: [HACKERS] stats for failed transactions (was Re: [GENERAL] VACUUM

2006-01-28 Thread Matthew T. O'Connor

Tom Lane wrote:

I'd argue it's fine: there are tons of people using row-level stats
via autovacuum, and (AFAICT) just about nobody using 'em for any other
purpose.  Certainly you never see anyone suggesting them as a tool for
investigating problems on pgsql-performance.  Sure, it's a repurposing
of the stats subsystem, but we should be willing to do that when field
experience teaches us what's really needed.
  


OK.


But all that we have is *global* counts.  Inferring ratios applicable to
particular tables seems a big stretch to me.  Any given application is
likely to have some types of transactions that roll back much more often
than others.
  


Certainly a stretch, but it's far from fatal: the worst-case scenario is 
that we occasionally vacuum a table that might not need it, whereas the 
way things stand right now, the worst case is that we never vacuum a 
table that might have a lot of slack space.  BTW, I'm only arguing this 
based on what is in the stats system now and pondering how we might 
improve things if the stats system isn't changed to directly address 
this problem.



One thing we could do is tie the stats message sending more tightly to
top-level transaction commit/abort.  (It's already effectively true that
we send stats only after commit/abort, but we send 'em from the wrong
place, ie PostgresMain.)  Then the semantics of the message could be
read as "here's what I did before committing" or "here's what I did
before aborting" and the collector could interpret the counts
accordingly.  However, this still fails in the case where a committed
top-level transaction includes some failed subtransactions.  I think
the only full solution will involve backends doing some extra work at
subtransaction commit/abort so that they can report properly classified
update counts.


Any guess as to the performance implications?




Re: [HACKERS] stats for failed transactions (was Re: [GENERAL] VACUUM

2006-01-27 Thread Matthew T. O'Connor

Tom Lane wrote:

hmm... That's true.  I don't think autovacuum does anything to account
for the concept of rolled-back inserts.



I think this is the fault of the stats system design.  AFAICT from a
quick look at the code, inserted/updated/deleted tuples are reported
to the collector in the same way regardless of whether the sending
transaction committed or rolled back.  I think this is unquestionably
a bug, at least for autovacuum's purposes --- though it might be OK
for the original intent of the stats system, which was simply to track
activity levels.

Any thoughts about how it ought to work?


The fact that autovacuum bases its decisions on info from the stats 
system is arguably an abuse of the original design.  However I don't 
know of a better source of information at the moment.  It has always 
been my vision that autovacuum will eventually incorporate additional 
information sources to make better informed decisions.  There has always 
been discussion of using the FSM to help clue us in as to when we need 
another vacuum.  Perhaps the addition of the vacuum space map that 
people are talking about will also help. 

None of this directly addresses the question of what the stats system 
*should* track, but perhaps it is wrongheaded to totally redesign the 
stats system for the purposes of autovacuum.  As a quick semi-fix, 
perhaps autovacuum should look at the number of rollbacks vs. commits in 
an attempt to determine the accuracy of the stats.  For example if 50% 
of the transactions are getting rolled back, then autovacuum might 
include 50% of the inserts in the count towards the vacuum threshold.  
Obviously this isn't perfect, but it probably gets us closer to reality 
with the information already available.
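
The raw material for such a weighting is already available in the 
existing stats views; only the weighting itself would be new.  For 
example:

-- Commit ratio per database; the heuristic above would scale insert
-- counts by something like pct_committed.
SELECT datname,
       xact_commit,
       xact_rollback,
       round(100.0 * xact_commit / nullif(xact_commit + xact_rollback, 0), 1)
           AS pct_committed
FROM pg_stat_database;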


Thoughts?

Matt




Re: [HACKERS] Segv in pg_autovacuum in 8.0.x

2005-10-19 Thread Matthew T. O'Connor

daveg wrote:

Apologies if this is old news, but pg_autovacuum in 8.0.x has the bad habit
of SEGVing and exiting when a table gets dropped out from under it. This
creates problems if you rely on pg_autovacuum for the bulk of your vacuuming
as it forgets its statistics when it is restarted and so will skip some
desirable vacuums.

I have a patch for this that should apply against any 8.0.x if there is any
interest.
  


Please post the patch; it might get committed into 8.0.x.  I personally 
haven't heard any other reports of it dying when tables are dropped, 
but I don't doubt there are some lingering issues like this.



I looked at the new autovacuum in 8.1 and it appears from casual inspection
not to have the same problem.


The autovacuum in 8.1 is a very different critter.  While its design is 
based on the contrib autovacuum, it is mostly a total rewrite, and 
probably a vast improvement :-)



Matt




Re: [HACKERS] Stuff running slooow

2005-08-25 Thread Matthew T. O'Connor

Marc G. Fournier wrote:

As a couple of ppl have found out by becoming 'moderators' for the 
mailing lists, there are *a lot* of messages through the server that 
aren't list subscribers, but are legit emails ...



Perhaps that shouldn't be allowed?  Would it help things if all 
non-subscriber emails were just bounced / dropped immediately, before 
anti-virus etc.?  It seems this would save a lot of CPU time and, more 
importantly, people's time reviewing potentially legit emails.






Re: [HACKERS] Autovacuum to-do list

2005-08-01 Thread Matthew T. O'Connor
All the items you mentioned look like 8.2 issues to me.   But here are 
some thoughts.



Alvaro Herrera wrote:


* Enable autovacuum by default.

 Get some field experience with it first, so the worst bugs are covered.
 (Has anybody tested it?)
 



I have done some testing and it seems to be working OK.  I am planning 
on doing some more in-depth tests this week.




* Stop a running VACUUM if the system load is too high.



What if vacuum used a vacuum delay that was equal to the vacuum delay 
GUC setting * the system load?  Or something more sophisticated, but 
this would have the effect of having vacuum automatically throttle down 
when the system gets busy and throttle back up when the system gets 
quiet.  We would probably set some min / max values, but it sounds 
interesting.  Thoughts?
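
With entirely made-up numbers, the shape of the idea (none of this 
exists; it just shows the clamped delay calculation):

-- effective_delay = clamp(vacuum_cost_delay * load_average, min, max);
-- here vacuum_cost_delay = 10 ms, clamped to the range [5 ms, 200 ms].
SELECT load_average,
       greatest(5, least(200, 10 * load_average)) AS effective_delay_ms
FROM (VALUES (0.2), (1.0), (4.0), (30.0)) AS samples(load_average);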





Re: [HACKERS] Autovacuum to-do list

2005-08-01 Thread Matthew T. O'Connor

Andrew - Supernews wrote:


On 2005-08-01, Matthew T. O'Connor matthew@zeut.net wrote:
 


* Stop a running VACUUM if the system load is too high.
 

What if vacuum used a vacuum delay that was equal to the vacuum delay 
GUC setting * the system load?  Or something more sophisticated, but 
this would have the effect of having vacuum automatically throttle down 
when the system gets busy and throttle back up when the system gets 
quiet.  We would probably set some min / max values, but it sounds 
interesting.  Thoughts?
   



I'd be very concerned about feedback loop stability; insufficient
vacuuming can increase the system load, causing vacuum to get further
behind...



Right, which is why we would need to enforce some max value so that 
vacuuming will never be totally squeezed out.




Re: [HACKERS] Autovacuum loose ends

2005-07-14 Thread Matthew T. O'Connor
This is great news!  I will do what I can to continue improving the code 
and address these concerns as best I can.  Many of the items below will 
need to be addressed by Alvaro, but I will comment where I think I have 
something useful to say :-)


Tom Lane wrote:


I've applied Alvaro's latest integrated-autovacuum patch.  There are
still a number of loose ends to be dealt with before beta, though:

* Not all the functionality of the current contrib code seems to have
made it in.  In particular I noted the sleep scaling factor is
missing, as well as the options to use nondefault vacuum_cost_delay
settings.  (I'm not sure how important the sleep scale factor is,
but the vacuum cost options seem pretty critical for practical use.)
There may be other stuff to move over; Matthew or someone more familiar
than I with the contrib version needs to take a look.  (I have refrained
from removing the contrib module until we're sure we have extracted
everything from it.)
 



I will take a look for missing features; thanks for not removing it from 
contrib yet. 

As for the sleep factor, I'm not sure it makes sense.  It was initially 
put in as a way to prevent autovacuum from running more than X% of the 
time. However, I think the better answer these days is to use the vacuum 
delay settings.


Speaking of which, I think I mentioned this to Alvaro, but I guess it 
just didn't make it in.  The pg_autovacuum table should have a few 
additional columns that allow setting vacuum delay settings on a per 
table basis.  I also think that there should be GUC settings for the 
default autovacuum delay settings which an admin might want to be 
separate from the system wide default vacuum delay settings.



* The code does not make a provision to ignore temporary tables.
Although vacuum.c and analyze.c will disregard the request to touch
such tables, it'd probably be better to recognize the situation further
upstream.  In particular it seems that autovacuum will continually throw
ANALYZE requests for a temp table due to lack of stats.
 



Does the stats system track data about temp tables?  If it doesn't then 
autovacuum won't try to vacuum them.  Will take a look.
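
A quick probe for taking that look (assuming row-level stats are 
enabled):

-- If the temp table shows up here with nonzero counters, the stats
-- system is tracking it and autovacuum would see it.
CREATE TEMP TABLE t_probe (i int);
INSERT INTO t_probe VALUES (1);
SELECT relname, n_tup_ins FROM pg_stat_all_tables WHERE relname = 't_probe';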



* ANALYZE also refuses to do anything with pg_statistic itself, which
is another case that may need special treatment to avoid useless cycles.
 



Should be easy enough to tell autovacuum to ignore this table specifically.


* For that matter I'm unconvinced that it's a good idea to try to force
the pgstat DB to pick up every table in every database.  If there's no
entry it's because the table is not getting modified, and therefore it
seems to me that we can just leave well enough alone.  The code really
is not very good about doing nothing where nothing is called for ;-)
 



I think in a production environment, this won't be an issue, but in a 
development situation where the postmaster is getting stopped and 
started fairly often, it could be an issue.  Actually, if the stats 
system doesn't reset it's data on postmaster restart, this shouldn't be 
a problem.  Any thoughts on changing this default?



* The code ignores datallowconn and therefore will periodically vacuum
template0.  I've got mixed emotions about this --- it could save
someone's bacon if they failed to properly VACUUM FREEZE a template
database, but in 99.99% of installations it's just wasted cycles.
Maybe it'd make sense to perform XID-wraparound-prevention vacuuming,
but not anything more, in a template DB.  Thoughts?
 



Sounds like a good idea.  Bacon conservation is clearly one of the goals 
of autovacuum.



* Or actually, it would vacuum template0, except that since no regular
backend ever connects to template0, there will be no stats DB entry for
it and so the loop in AutoVacMain will ignore it.  This is definitely
BAD as it means that a database that's not been touched since postmaster
start will never be vacuumed, not even for XID wraparound prevention.
That test needs to be weakened.

* I'm still pretty concerned about the handling of shared catalogs.
AFAICS the current pgstats infrastructure simply gets this wrong,
meaning that shared catalogs will not get adequate vacuuming.  We need
to fix that.
 



This was handled in the contrib version by only vacuuming shared 
catalogs inside template1; however, it would then analyze those tables 
in each and every database.  Is there a reason this solution is not 
adequate?  Or perhaps this concept doesn't translate to the integrated 
version?



* As Alvaro noted, the default parameter settings need a lookover.
What is in the patch is not what was the default in the contrib module,
but the contrib defaults seem awfully passive.
 



Alvaro and I talked about this.  I suggested these as the new defaults, 
as there seemed to be a consensus that the defaults in the contrib 
version were not very useful for most people.  Hopefully these defaults 
are still a bit conservative, but useful.
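
For reference, these are the settings in question as GUCs, with the 
values I believe we settled on (vacuum after 1000 + 0.4 * reltuples dead 
rows, analyze after 500 + 0.2 * reltuples changed rows):

SHOW autovacuum_vacuum_threshold;      -- 1000
SHOW autovacuum_vacuum_scale_factor;   -- 0.4
SHOW autovacuum_analyze_threshold;     -- 500
SHOW autovacuum_analyze_scale_factor;  -- 0.2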



* The documentation badly needs work.  I committed some 

Re: [HACKERS] Autovacuum loose ends

2005-07-14 Thread Matthew T. O'Connor

Tom Lane wrote:


Matthew T. O'Connor matthew@zeut.net writes:
 

Speaking of which, I think I mentioned this to Alvaro, but I guess it 
just didn't make it in.  The pg_autovacuum table should have a few 
additional columns that allow setting vacuum delay settings on a per 
table basis.  I also think that there should be GUC settings for the 
default autovacuum delay settings which an admin might want to be 
separate from the system wide default vacuum delay settings.
   



I was thinking GUC settings only; is there a real use-case for
table-specific delay parameters?  ISTM the point of the delay parameters
for autovac is to put a lid on its impact on interactive response.  Seen
in that light, you do not care exactly which table it's hitting at the
moment.
 



I was thinking of users that might not want the vacuum delay settings on 
small tables that will normally be vacuumed very quickly.  This isn't a 
very strong argument, but I thought I should mention it.  Also, given 
the project's tendency towards not giving knobs to users unless we are 
sure they need them, I think GUC-only would be OK.
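
If memory serves, these shipped in 8.1 as GUC-only knobs, with -1 
deferring to the plain vacuum settings:

SHOW autovacuum_vacuum_cost_delay;  -- -1 defers to vacuum_cost_delay
SHOW autovacuum_vacuum_cost_limit;  -- -1 defers to vacuum_cost_limit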


This was handled in the contrib version by only vacuuming shared 
catalogs inside template1; however, it would then analyze those tables 
in each and every database.  Is there a reason this solution is not 
adequate?
   



The problem is that now that we've invented the default postgres
database, it becomes more plausible to think about installations that
haven't got a template1 at all.  I'd prefer a solution that does not
assume the presence of any specific database.  ISTM reasonable to
process the shared catalogs symmetrically in every DB: look to see
if they need vacuuming or not.  The problem (which was also a problem
for the contrib version) is that the stats system fails to maintain
a single set of stats for a shared catalog --- operations get counted
under whichever DB they were issued from.  This means that autovac
will underestimate the need for vacuuming of a shared catalog, since
no matter where it looks from, it will see only a portion of the
true update activity.



Ok, so without reworking the stats system, I don't see an easy answer to 
this other than autovacuum trying to sum up all the activity it finds in 
all the different databases it looks at, but that seems rather ugly.  
Any thoughts on improving the stats situation here?





Re: [HACKERS] PL/pgSQL Debugger Support

2005-06-23 Thread Matthew T. O'Connor

Tom Lane wrote:


Denis Lussier [EMAIL PROTECTED] writes:
 


I got to thinking it'd be kewl if PgAdmin3 supported an interactive debugger
for pl/pgsql.
   



That's been kicked around before, although I don't think anyone wants to
tie it to pgAdmin specifically.  Check the archives...
 



I didn't find anything relevant after a quick search, but if memory 
serves, one of the objections to PgAdmin was that it was Windows-only.  
This of course is no longer true as of PgAdmin III 1.0.  It now supports 
Win32, Linux and FreeBSD.  So perhaps that objection is no longer valid.




Re: [HACKERS] [PATCHES] default database creation with initdb

2005-06-21 Thread Matthew T. O'Connor

Tom Lane wrote:


One thing that neither Dave nor I wanted to touch is pg_autovacuum.
If that gets integrated into the backend by feature freeze then the
question is moot, but if it doesn't then we'll have to decide whether
autovac should preferentially connect to template1 or postgres.  Neither
choice seems real appealing to me: if autovac connects to template1
then it could interfere with CREATE DATABASE, but if it connects to
postgres then it could fail if postgres isn't there.

Now the latter does not bother me if autovac is considered a client,
but it does bother me if autovac is considered part of the backend.
I think that template1 and template0 can reasonably be considered
special from the point of view of the backend --- but I really don't
want postgres to be special in that way.



I'm still hoping that autovac will get integrated so this will be moot, 
but just in case.


Perhaps pg_autovacuum should try to connect to the postgres database 
and, if the connection fails, then try to connect to template1.  This 
way autovacuum will work whether the postgres database is there or not. 





[HACKERS] Autovacuum Patch Status

2005-06-20 Thread Matthew T. O'Connor
Sorry to do this on the hackers list, but I have tried to email Alvaro 
off-list and my email keeps getting bounced, so...


Alvaro,

I was just wondering what the current status of your work with the 
Autovacuum patch is, and whether there is anything you would like to 
discuss or that I can help you with.  My time is limited, but I can help 
think things through and help with design issues.  I can probably also 
contribute a few hours of actual coding time this week or this coming 
weekend.


Anyway, I know there was a lot of conversation on the hackers list and I 
just wanted to see how you were doing.


Thanks,

Matthew O'Connor



Re: [HACKERS] Autovacuum in the backend

2005-06-17 Thread Matthew T. O'Connor

Joshua D. Drake wrote:


Josh Berkus wrote:

I've personally seen at least a dozen user requests for autovacuum 
in the backend, and had this conversation about 1,100 times:


NB: After a week, my database got really slow.
Me: How often are you running VACUUM ANALYZE?
NB: Running what?



Can't argue that except... RTFM ;). I am not saying it doesn't have 
validity. I am just saying that if you actually pay attention to 
PostgreSQL and maintain it, you don't need it ;) 



I think everyone on this list would agree with you.  The only reason I 
think the newbie protection is important (and I don't think it's the 
most important reason for autovacuum) is that perception is reality to 
some extent.  Valid or not, we still suffer from a reputation of being 
more complicated and slower than MySQL.  Steps towards reducing / 
eliminating that perception can only be good for us, as I think lots of 
developers make their first database decision based solely on their 
perceptions and then just stick with what they know.




