Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-26 Thread Kenneth Marshall
On Wed, Jan 24, 2007 at 07:30:05PM -0500, Tom Lane wrote:
> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> > Not that I am aware of. Even extending the relation by one additional
> > block can make a big difference in performance
> 
> Do you have any evidence to back up that assertion?
> 
> It seems a bit nontrivial to me --- not the extension part exactly, but
> making sure that the space will get used promptly.  With the current
> code the backend extending a relation will do subsequent inserts into
> the block it just got, which is fine, but there's no mechanism for
> remembering that any other newly-added blocks are available --- unless
> you wanted to push them into the FSM, which could work but the current
> FSM code doesn't support piecemeal addition of space, and in any case
> there's some question in my mind about the concurrency cost of increasing
> FSM traffic even more.
> 
> In short, it's hardly an unquestionable improvement, so we need some
> evidence.
> 
>   regards, tom lane
> 

My comment was purely based on the reduction in fragmentation of the
file behind the relation. A result that I have seen repeatedly in file
related data processing. It does sound much more complicated to make the
additional space available to other backends. If one backend was doing
many inserts, it might still be of value even for just that backend. As
you mention, testing is needed to see if there is enough value in this
process.

Ken


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-25 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Jim C. Nasby wrote:
> >> If we extended relations by more than one page at a time we'd probably
> >> have a better shot at the blocks on disk being contiguous and all read
> >> at the same time by the OS.
> 
> > Actually, there is evidence that adding only a single page to the end
> > causes a lot of contention for that last page, and that adding a few
> > might be better.
> 
> Evidence where?  The code is designed so that the last page *isn't*
> shared --- go read the comments in hio.c sometime.

I was talking about the last page of a table, where INSERTs all cluster
on that last page and cause lots of page locking.  hio.c does look like
it avoids that problem.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-25 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Jim C. Nasby wrote:
>> If we extended relations by more than one page at a time we'd probably
>> have a better shot at the blocks on disk being contiguous and all read
>> at the same time by the OS.

> Actually, there is evidence that adding only a single page to the end
> causes a lot of contention for that last page, and that adding a few
> might be better.

Evidence where?  The code is designed so that the last page *isn't*
shared --- go read the comments in hio.c sometime.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-25 Thread Bruce Momjian
Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> > Gregory Stark wrote:
> > > 
> > > Actually no. A while back I did experiments to see how fast reading a file
> > > sequentially was compared to reading the same file sequentially but 
> > > skipping
> > > x% of the blocks randomly. The results were surprising (to me) and 
> > > depressing.
> > > The breakeven point was about 7%. [...]
> > > 
> > > The theory online was that as long as you're reading one page from each 
> > > disk
> > > track you're going to pay the same seek overhead as reading the entire 
> > > track.
> > 
> > Could one take advantage of this observation in designing the DSM?
> > 
> > Instead of a separate bit representing every page, having each bit
> > represent 20 or so pages might be a more useful unit.  It sounds
> > like the time spent reading would be similar; while the bitmap
> > would be significantly smaller.
> 
> If we extended relations by more than one page at a time we'd probably
> have a better shot at the blocks on disk being contiguous and all read
> at the same time by the OS.

Actually, there is evidence that adding only a single page to the end
causes a lot of contention for that last page, and that adding a few
might be better.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-24 Thread Tom Lane
Kenneth Marshall <[EMAIL PROTECTED]> writes:
> Not that I am aware of. Even extending the relation by one additional
> block can make a big difference in performance

Do you have any evidence to back up that assertion?

It seems a bit nontrivial to me --- not the extension part exactly, but
making sure that the space will get used promptly.  With the current
code the backend extending a relation will do subsequent inserts into
the block it just got, which is fine, but there's no mechanism for
remembering that any other newly-added blocks are available --- unless
you wanted to push them into the FSM, which could work but the current
FSM code doesn't support piecemeal addition of space, and in any case
there's some question in my mind about the concurrency cost of increasing
FSM traffic even more.

In short, it's hardly an unquestionable improvement, so we need some
evidence.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-24 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 05:11:03PM -0600, Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> > Gregory Stark wrote:
> > > 
> > > Actually no. A while back I did experiments to see how fast reading a file
> > > sequentially was compared to reading the same file sequentially but 
> > > skipping
> > > x% of the blocks randomly. The results were surprising (to me) and 
> > > depressing.
> > > The breakeven point was about 7%. [...]
> > > 
> > > The theory online was that as long as you're reading one page from each 
> > > disk
> > > track you're going to pay the same seek overhead as reading the entire 
> > > track.
> > 
> > Could one take advantage of this observation in designing the DSM?
> > 
> > Instead of a separate bit representing every page, having each bit
> > represent 20 or so pages might be a more useful unit.  It sounds
> > like the time spent reading would be similar; while the bitmap
> > would be significantly smaller.
> 
> If we extended relations by more than one page at a time we'd probably
> have a better shot at the blocks on disk being contiguous and all read
> at the same time by the OS.
> -- 
> Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
> 
Yes, most OS have some read-ahead  when reading a file from disk. Any
increment over 1 would be an improvement. If you used a counter with
a time-based decrement function, you could increase the amount that
the relation is extended based on temporal proximity. If you have
extended it several times recently, increase the size of the new
extension to reduce the overhead even further. The default should
be approximately the OS standard read-ahead amount.

Ken


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-24 Thread Kenneth Marshall
On Tue, Jan 23, 2007 at 09:01:41PM -0600, Jim Nasby wrote:
> On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:
> >The default should
> >be approximately the OS standard read-ahead amount.
> 
> Is there anything resembling a standard across the OSes we support?  
> Better yet, is there a standard call that allows you to find out what  
> the read-ahead setting is?
> --
> Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
> 
Not that I am aware of. Even extending the relation by one additional
block can make a big difference in performance and should easily fall
within every read-ahead in use today. Or a GUC variable, that defaults
to a small power of 2 number of PostgreSQL blocks, with a default arrived
at by testing.

Ken

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-23 Thread Jim Nasby

On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:

The default should
be approximately the OS standard read-ahead amount.


Is there anything resembling a standard across the OSes we support?  
Better yet, is there a standard call that allows you to find out what  
the read-ahead setting is?

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 05:51:53PM +, Gregory Stark wrote:
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.

I asusume this means you were reading 7% of the blocks, not skipping 7%
of the blocks when you broke even?

I presume by break-even you mean it took just as long, time-wise. But
did it have the same effect on system load? If reading only 7% of the
blocks allows the drive to complete other requests more quickly then
it's beneficial, even if the vacuum takes longer.

This may be a silly thought, I'm not sure how drives handle multiple
requests...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Steve Atkins


On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:


Bruce Momjian wrote:
Yep, agreed on the random I/O issue.  The larger question is if  
you have
a huge table, do you care to reclaim 3% of the table size, rather  
than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is  
going
to take a lot of time, but vacuuming to relaim 3% three times  
seems like
it is going to be more expensive than just vacuuming the 10%  
once.  And

vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than  
just paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you  
can at least do something useful with a large table if a full  
vacuum takes 24 hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way  
back when isn't directly addressing the issue. What most people  
want (I'm guessing) is to vacuum *when the user-workload allows*  
and the time-tabling is just a sysadmin first-approximation at that.


Yup. I'd really like for my app to be able to say "Hmm. No  
interactive users at the moment, no critical background tasks. Now  
would be a really good time for the DB to do some maintenance." but  
also to be able to interrupt the maintenance process if some new  
users or other system load show up.


With partial vacuuming possible, we can arrange things with just  
three thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to  
the thresholds and starts a new vacuum, kills one or does nothing.  
New vacuum processes keep getting started as long as there is  
workload spare and tables that need vacuuming.


Now the trick of course is how you measure system workload in a  
meaningful manner.


I'd settle for a "start maintenance", "stop maintenance" API.  
Anything else (for instance the heuristics you suggest above) would  
definitely be gravy.


It's not going to be simple to do, though, I don't think.

Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> Gregory Stark wrote:
> > 
> > Actually no. A while back I did experiments to see how fast reading a file
> > sequentially was compared to reading the same file sequentially but skipping
> > x% of the blocks randomly. The results were surprising (to me) and 
> > depressing.
> > The breakeven point was about 7%. [...]
> > 
> > The theory online was that as long as you're reading one page from each disk
> > track you're going to pay the same seek overhead as reading the entire 
> > track.
> 
> Could one take advantage of this observation in designing the DSM?
> 
> Instead of a separate bit representing every page, having each bit
> represent 20 or so pages might be a more useful unit.  It sounds
> like the time spent reading would be similar; while the bitmap
> would be significantly smaller.

If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 07:24:20PM +, Heikki Linnakangas wrote:
> Kenneth Marshall wrote:
> >On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
> >>Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
> >
> >There may be other functions that could leverage a similar sort of
> >infrastructure. For example, a long DB mining query could be registered
> >with the system. Then as the pieces of the table/database are brought in
> >to shared memory during the normal daily DB activity they can be acquired
> >without forcing the DB to run a very I/O expensive query when waiting a
> >bit for the results would be acceptable. As long as we are thinking
> >piggyback.
> 
> Yeah, I had the same idea when we discussed synchronizing sequential 
> scans. The biggest difference is that with queries, there's often a user 
> waiting for the query to finish, but with vacuum we don't care so much 
> how long it takes.
> 
Yes, but with trending and statistical analysis you may not need the
exact answer ASAP. An approximate answer based on a fraction of the
information would be useful. Also, "what if" queries could be run without
impacting the production uses of a database. One might imagine having a
query with results that "converge" as the table is processed during normal
use.

Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
> On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
> 
> Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
> 
> -- 
>   Simon Riggs 
>   EnterpriseDB   http://www.enterprisedb.com
> 

There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.

Ken


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Joris Dobbelsteen
>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark
>Sent: maandag 22 januari 2007 19:41
>To: Bruce Momjian
>Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen; 
>Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan 
>Deolasee; Christopher Browne; pgsql-general@postgresql.org; 
>pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements
>
>"Bruce Momjian" <[EMAIL PROTECTED]> writes:
>
>> Yep, agreed on the random I/O issue.  The larger question is if you 
>> have a huge table, do you care to reclaim 3% of the table 
>size, rather 
>> than just vacuum it when it gets to 10% dirty?  I realize the vacuum 
>> is going to take a lot of time, but vacuuming to relaim 3% 
>three times 
>> seems like it is going to be more expensive than just vacuuming the 
>> 10% once.  And vacuuming to reclaim 1% ten times seems even more 
>> expensive.  The partial vacuum idea is starting to look like 
>a loser to me again.
>
>Well the answer is of course "that depends".
>
>If you maintain the dead space at a steady state averaging 
>1.5% instead of 5% your table is 3.33% smaller on average. If 
>this is a DSS system that will translate into running your 
>queries 3.33% faster. It will take a lot of vacuums before 
>they hurt more than a 3%+ performance drop.

Good, this means a DSS system will mostly do table scans (right?). So
probably you should witness the 'table scan' statistic and rows fetched
aproaching the end of the universe (at least compared to
inserts/updates/deletes)?

>If it's an OLTP system the it's harder to figure. a 3.33% 
>increase in data density will translate to a higher cache hit 
>rate but how much higher depends on a lot of factors. In our 
>experiments we actually got bigger boost in these kinds of 
>situations than the I expected (I expected comparable to the 
>3.33% improvement). So it could be even more than 3.33%. But 
>like said it depends.
>If you already have the whole database cache you won't see any 
>improvement. If you are right on the cusp you could see a huge benefit.

These tables have high insert, update and delete rates, probably a lot
of index scans? I believe the workload on table scans should be (close
to) none.

Are you willing to share some of this measured data? I'm quite
interested in such figures.

>It sounds like you're underestimating the performance drain 
>10% wasted space has. If we found out that one routine was 
>unnecessarily taking 10% of the cpu time it would be an 
>obvious focus of attention. 10% wasted space is going to work 
>out to about 10% of the i/o time.
>
>It also sounds like we're still focused on the performance 
>impact in absolute terms. I'm much more interested in changing 
>the performance characteristics so they're predictable and 
>scalable. It doesn't matter much if your 1kb table is 100% 
>slower than necessary but it does matter if your 1TB table 
>needs 1,000x as much vacuuming as your 1GB table even if it's 
>getting the same update traffic.

Or rather, the vacuuming should pay back.
A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming.
Obviously, the higher the better.

- Joris Dobbelsteen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Ron Mayer
Gregory Stark wrote:
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%. [...]
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.

Could one take advantage of this observation in designing the DSM?

Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit.  It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > 
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
> 
> But if the partial vacuum is able to clean the busiest pages and reclaim
> useful space, currently-running transactions will be able to use that
> space and thus not have to extend the table.  Not that extension is a
> problem on itself, but it'll keep your working set smaller.

Yes, but my point is that if you are trying to avoid vacuuming the
table, I am afraid the full index scan is going to be painful too.  I
can see corner cases where partial vacuum is a win (I only have 4 hours
of idle I/O), but for the general case I am still worried that partial
vacuum will not be that useful as long as we have to scan the indexes.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Kenneth Marshall wrote:

On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...


There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.


Yeah, I had the same idea when we discussed synchronizing sequential 
scans. The biggest difference is that with queries, there's often a user 
waiting for the query to finish, but with vacuum we don't care so much 
how long it takes.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Gregory Stark wrote:

"Bruce Momjian" <[EMAIL PROTECTED]> writes:


I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.


Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.


Note that with uniformly random updates, you have dirtied every page of 
the table until you get anywhere near 5% of dead space. So we have to 
assume non-uniform distribution of update for the DSM to be of any help.


And if we assume non-uniform distribution, it's a good bet that the 
blocks that need vacuuming are also not randomly distributed. In fact, 
they might very well all be in one cluster, so that scanning that 
cluster is indeed sequential I/O.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Richard Huxton

Bruce Momjian wrote:

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than just 
paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you can at 
least do something useful with a large table if a full vacuum takes 24 
hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way back 
when isn't directly addressing the issue. What most people want (I'm 
guessing) is to vacuum *when the user-workload allows* and the 
time-tabling is just a sysadmin first-approximation at that.


With partial vacuuming possible, we can arrange things with just three 
thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to the 
thresholds and starts a new vacuum, kills one or does nothing. New 
vacuum processes keep getting started as long as there is workload spare 
and tables that need vacuuming.


Now the trick of course is how you measure system workload in a 
meaningful manner.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Alvaro Herrera
Bruce Momjian wrote:
> 
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table.  Not that extension is a
problem on itself, but it'll keep your working set smaller.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark

"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Well the answer is of course "that depends".

If you maintain the dead space at a steady state averaging 1.5% instead of 5%
your table is 3.33% smaller on average. If this is a DSS system that will
translate into running your queries 3.33% faster. It will take a lot of
vacuums before they hurt more than a 3%+ performance drop.

If it's an OLTP system the it's harder to figure. a 3.33% increase in data
density will translate to a higher cache hit rate but how much higher depends
on a lot of factors. In our experiments we actually got bigger boost in these
kinds of situations than the I expected (I expected comparable to the 3.33%
improvement). So it could be even more than 3.33%. But like said it depends.
If you already have the whole database cache you won't see any improvement. If
you are right on the cusp you could see a huge benefit.

It sounds like you're underestimating the performance drain 10% wasted space
has. If we found out that one routine was unnecessarily taking 10% of the cpu
time it would be an obvious focus of attention. 10% wasted space is going to
work out to about 10% of the i/o time.

It also sounds like we're still focused on the performance impact in absolute
terms. I'm much more interested in changing the performance characteristics so
they're predictable and scalable. It doesn't matter much if your 1kb table is
100% slower than necessary but it does matter if your 1TB table needs 1,000x
as much vacuuming as your 1GB table even if it's getting the same update
traffic.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.

---

Gregory Stark wrote:
> "Bruce Momjian" <[EMAIL PROTECTED]> writes:
> 
> > I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> > out how small, and it seems about 15% of the total table, which means if
> > we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> > 80%, assuming 5% of the table is scanned.
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.
> 
> That is, if you assum that only 5% of the table will be scanned and you
> arrange to do it sequentially then you should expect the i/o to be marginally
> faster than just reading the entire table. Vacuum does do some cpu work and
> wouldn't have to consult the clog as often, so it would still be somewhat
> faster.
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.
> I also had some theories involving linux being confused by the seeks and
> turning off read-ahead but I could never prove them.
> 
> In short, to see big benefits you would have to have a much smaller percentage
> of the table being read. That shouldn't be taken to mean that the DSM is a
> loser. There are plenty of use cases where tables can be extremely large and
> have only very small percentages that are busy. The big advantage of the DSM
> is that it takes the size of the table out of the equation and replaces it
> with the size of the busy portion of the table. So updating a single record in
> a terabyte table has the same costs as updating a single record in a kilobyte
> table.
> 
> Sadly that's not quite true due to indexes, and due to the size of the bitmap
> itself. But going back to your numbers it does mean that if you update a
> single row out of a terabyte table then we'll be removing about 85% of the i/o
> (minus the i/o needed to read the DSM, about .025%). If you update about 1%
> then you would be removing substantially less, and once you get to about 10%
> then you're back where you started.
> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark
"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.

That is, if you assum that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 12:18 -0500, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
> > 
> > In any case, for the statement "Index cleanup is the most expensive part 
> > of vacuum" to be true, you're indexes would have to take up 2x as much 
> > space as the heap, since the heap is scanned twice. I'm sure there's 
> > databases like that out there, but I don't think it's the common case.
> 
> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Clearly keeping track of what needs vacuuming will lead to a more
efficient VACUUM. Your math applies to *any* design that uses some form
of book-keeping to focus in on the hot spots.

On a separate thread, Heikki has raised a different idea for VACUUM.

Heikki's idea asks an important question: where and how should DSM
information be maintained? Up to now everybody has assumed that it would
be maintained when DML took place and that the DSM would be a
transactional data structure (i.e. on-disk). Heikki's idea requires
similar bookkeeping requirements to the original DSM concept, but the
interesting aspect is that the DSM information is collected off-line,
rather than being an overhead on every statement's response time.

That idea seems extremely valuable to me.

One of the main challenges is how we cope with large tables that have a
very fine spray of updates against them. A DSM bitmap won't help with
that situation, regrettably.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> Russell Smith wrote:
> >>> 2. Index cleanup is the most expensive part of vacuum.  So doing a 
> >>> partial vacuum actually means more I/O as you have to do index cleanup 
> >>> more often.
> >> I don't think that's usually the case. Index(es) are typically only a 
> >> fraction of the size of the table, and since 8.2 we do index vacuums in 
> >> a single scan in physical order. In fact, in many applications the index 
> >> is be mostly cached and the index scan doesn't generate any I/O at all.
> > 
> > Are _all_ the indexes cached?  I would doubt that.
> 
> Well, depends on your schema, of course. In many applications, yes.
> 
> >  Also, for typical
> > table, what percentage is the size of all indexes combined?
> 
> Well, there's no such thing as a typical table. As an anecdote here's 
> the ratios (total size of all indexes of a table)/(size of corresponding 
> heap) for the bigger tables for a DBT-2 run I have at hand:
> 
> Stock:1190470/68550 = 6%
> Order_line:   950103/274372 = 29%
> Customer: 629011 /(5711+20567) = 8%
> 
> In any case, for the statement "Index cleanup is the most expensive part 
> of vacuum" to be true, you're indexes would have to take up 2x as much 
> space as the heap, since the heap is scanned twice. I'm sure there's 
> databases like that out there, but I don't think it's the common case.

I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.
I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is be mostly cached and the index scan doesn't generate any I/O at all.


Are _all_ the indexes cached?  I would doubt that.


Well, depends on your schema, of course. In many applications, yes.


 Also, for typical
table, what percentage is the size of all indexes combined?


Well, there's no such thing as a typical table. As an anecdote here's 
the ratios (total size of all indexes of a table)/(size of corresponding 
heap) for the bigger tables for a DBT-2 run I have at hand:


Stock:  1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer:   629011 /(5711+20567) = 8%

In any case, for the statement "Index cleanup is the most expensive part 
of vacuum" to be true, you're indexes would have to take up 2x as much 
space as the heap, since the heap is scanned twice. I'm sure there's 
databases like that out there, but I don't think it's the common case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Russell Smith wrote:
> > 2. Index cleanup is the most expensive part of vacuum.  So doing a 
> > partial vacuum actually means more I/O as you have to do index cleanup 
> > more often.
> 
> I don't think that's usually the case. Index(es) are typically only a 
> fraction of the size of the table, and since 8.2 we do index vacuums in 
> a single scan in physical order. In fact, in many applications the index 
> is be mostly cached and the index scan doesn't generate any I/O at all.

Are _all_ the indexes cached?  I would doubt that.  Also, for typical
table, what percentage is the size of all indexes combined?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.


I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is be mostly cached and the index scan doesn't generate any I/O at all.


I believe the heap scans are the biggest issue at the moment.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Sun, 2007-01-21 at 14:26 -0600, Jim C. Nasby wrote:
> On Sun, Jan 21, 2007 at 11:39:45AM +, Heikki Linnakangas wrote:
> > Russell Smith wrote:
> > >Strange idea that I haven't researched,  Given Vacuum can't be run in a 
> > >transaction, it is possible at a certain point to quit the current 
> > >transaction and start another one.  There has been much chat and now a 
> > >TODO item about allowing multiple vacuums to not starve small tables.  
> > >But if a big table has a long running vacuum the vacuum of the small 
> > >table won't be effective anyway will it?  If vacuum of a big table was 
> > >done in multiple transactions you could reduce the effect of long 
> > >running vacuum.  I'm not sure how this effects the rest of the system 
> > >thought.
> > 
> > That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to 
> > ignore other vacuums in the oldest xmin calculation.
> 
> And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
> will commit and start a new transaction. 

err...It doesn't do this now and IIRC didn't do that in 8.1 either.

> That's still useful even with
> Hannu's patch in case you start a vacuum with maintenance_work_mem too
> small; you can abort the vacuum some time later and at least some of the
> work it's done will get committed.

True, but not recommended, though for a variety of reasons.

The reason is not intermediate commits, but just that the work of VACUUM
is mostly non-transactional in nature, apart from the various catalog
entries when it completes.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-21 Thread Jim C. Nasby
On Sun, Jan 21, 2007 at 11:39:45AM +, Heikki Linnakangas wrote:
> Russell Smith wrote:
> >Strange idea that I haven't researched,  Given Vacuum can't be run in a 
> >transaction, it is possible at a certain point to quit the current 
> >transaction and start another one.  There has been much chat and now a 
> >TODO item about allowing multiple vacuums to not starve small tables.  
> >But if a big table has a long running vacuum the vacuum of the small 
> >table won't be effective anyway will it?  If vacuum of a big table was 
> >done in multiple transactions you could reduce the effect of long 
> >running vacuum.  I'm not sure how this effects the rest of the system 
> >thought.
> 
> That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to 
> ignore other vacuums in the oldest xmin calculation.

And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
will commit and start a new transaction. That's still useful even with
Hannu's patch in case you start a vacuum with maintenance_work_mem too
small; you can abort the vacuum some time later and at least some of the
work it's done will get committed.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-21 Thread Martijn van Oosterhout
On Sun, Jan 21, 2007 at 12:24:38PM +, Simon Riggs wrote:
> Partial vacuum would still be possible if you remembered where you got
> to in the VACUUM and then started from that same point next time. It
> could then go to the end of the table and wrap back around.

ISTM the Dead Space Map would give you this automatically, since that's
your memory... Once you have the DSM to track where the dead pages are,
you can set it up to target clusters first, thus giving maximum bang
for buck.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-21 Thread Simon Riggs
On Sat, 2007-01-20 at 09:41 +1100, Russell Smith wrote:
> Darcy Buskermolen wrote: 
> > [snip] 
> > 
> > Another thought, is it at all possible to do a partial vacuum?  ie spend 
> > the 
> > next 30 minutes vacuuming foo table, and update the fsm with what hew have 
> > learned over the 30 mins, even if we have not done a full table scan ?
> >   
> There was a proposal for this, but it was dropped on 2 grounds.
> 1. partial vacuum would mean that parts of the table are missed, the
> user could never vacuum certain parts and transaction wraparound would
> get you.  You may also have other performance issues as you forgot
> certian parts of the table

Partial vacuum would still be possible if you remembered where you got
to in the VACUUM and then started from that same point next time. It
could then go to the end of the table and wrap back around.

> 2. Index cleanup is the most expensive part of vacuum.  So doing a
> partial vacuum actually means more I/O as you have to do index cleanup
> more often.

Again, not necessarily. A large VACUUM can currently perform more than
one set of index scans, so if you chose the right stopping place for a
partial VACUUM you need never incur any additional work. It might even
save effort in the long run.

I'm not necessarily advocating partial VACUUM, just pointing out that
the problems you raise need not be barriers to implementation, should
that be considered worthwhile.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-21 Thread Heikki Linnakangas

Russell Smith wrote:
Strange idea that I haven't researched,  Given Vacuum can't be run in a 
transaction, it is possible at a certain point to quit the current 
transaction and start another one.  There has been much chat and now a 
TODO item about allowing multiple vacuums to not starve small tables.  
But if a big table has a long running vacuum the vacuum of the small 
table won't be effective anyway will it?  If vacuum of a big table was 
done in multiple transactions you could reduce the effect of long 
running vacuum.  I'm not sure how this effects the rest of the system 
thought.


That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to 
ignore other vacuums in the oldest xmin calculation.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-19 Thread Russell Smith

Darcy Buskermolen wrote:

[snip]

Another thought, is it at all possible to do a partial vacuum?  ie spend the 
next 30 minutes vacuuming foo table, and update the fsm with what hew have 
learned over the 30 mins, even if we have not done a full table scan ?
  

There was a proposal for this, but it was dropped on 2 grounds.
1. partial vacuum would mean that parts of the table are missed, the 
user could never vacuum certain parts and transaction wraparound would 
get you.  You may also have other performance issues as you forgot 
certian parts of the table


2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.


If we are talking about autovacuum, 1 doesn't become so much of an issue 
as you just make the autovacuum remember what parts of the table it's 
vacuumed.  This really has great power when you have a dead space map.


Item 2 will still be an issue.  But if you define "partial" as either 
fill maintenance_work_mem, or finish the table, you are not increasing 
I/O at all.  As when maintenance work mem is full, you have to cleanup 
all the indexes anyway.  This is really more like VACUUM SINGLE, but the 
same principal applies.


I believe all planning really needs to think about how a dead space map 
will effect what vacuum is going to be doing in the future.



Strange idea that I haven't researched,  Given Vacuum can't be run in a 
transaction, it is possible at a certain point to quit the current 
transaction and start another one.  There has been much chat and now a 
TODO item about allowing multiple vacuums to not starve small tables.  
But if a big table has a long running vacuum the vacuum of the small 
table won't be effective anyway will it?  If vacuum of a big table was 
done in multiple transactions you could reduce the effect of long 
running vacuum.  I'm not sure how this effects the rest of the system 
thought.


Russell Smith


  




Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-19 Thread Bruce Momjian

Added to TODO:

>   o Allow multiple vacuums so large tables do not starve small
> tables
>
> http://archives.postgresql.org/pgsql-general/2007-01/msg00031.php
>
>   o Improve control of auto-vacuum
>
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00876.php


---

Darcy Buskermolen wrote:
> On Friday 19 January 2007 01:47, Simon Riggs wrote:
> > On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > > elein wrote:
> > > > > Have you made any consideration of providing feedback on autovacuum
> > > > > to users? Right now we don't even know what tables were vacuumed when
> > > > > and what was reaped.  This might actually be another topic.
> > > >
> > > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > > to have a log table, on which we'd register what did we run, at what
> > > > time, how long did it last, how many tuples did it clean, etc.  I feel
> > > > having it on the regular text log is useful but it's not good enough.
> > > > Keep in mind that in the future we may want to peek at that collected
> > > > information to be able to take better scheduling decisions (or at least
> > > > inform the DBA that he sucks).
> > > >
> > > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > > that manually-run vacuums would be logged as well.
> > >
> > > Yes I did intend this thought for vacuum, not strictly autovacuum.
> >
> > I agree, for all VACUUMs: we need a log table.
> >
> > The only way we can get a feedback loop on what has come before is by
> > remembering what happened. Simply logging it is interesting, but not
> > enough.
> 
> Correct, I think we are all saying the same thing that is this log table is 
> purely inserts so that we can see trends over time.
> 
> >
> > There is some complexity there, because with many applications a small
> > table gets VACUUMed every few minutes, so the log table would become a
> > frequently updated table itself. I'd also suggest that we might want to
> > take account of the number of tuples removed by btree pre-split VACUUMs
> > also.
> 
> Thinking on this a bit more, I suppose that this table really should allow 
> for 
> user defined triggers on it, so that a DBA can create partioning for it, not 
> to mention being able to move it off into it's own tablespace. 
> 
> 
> >
> > I also like the idea of a single scheduler and multiple child workers.
> >
> > The basic architecture is clear and obviously beneficial. What worries
> > me is how the scheduler will work; there seems to be as many ideas as we
> > have hackers. I'm wondering if we should provide the facility of a
> > pluggable scheduler? That way you'd be able to fine tune the schedule to
> > both the application and to the business requirements. That would allow
> > integration with external workflow engines and job schedulers, for when
> > VACUUMs need to not-conflict with external events.
> >
> > If no scheduler has been defined, just use a fairly simple default.
> >
> > The three main questions are
> > - what is the maximum size of VACUUM that can start *now*
> 
> How can we determine this given we have no real knowledge of the upcoming  
> adverse IO conditions ?
> 
> > - can *this* VACUUM start now?
> > - which is the next VACUUM to run?
> >
> > If we have an API that allows those 3 questions to be asked, then a
> > scheduler plug-in could supply the answers. That way any complex
> > application rules (table A is available for VACUUM now for next 60 mins,
> > table B is in constant use so we must use vacuum_delay), external events
> > (long running reports have now finished, OK to VACUUM), time-based rules
> > (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> > first 3 days of the each month is financial accounting close) can be
> > specified.
> 
> Another thought, is it at all possible to do a partial vacuum?  ie spend the 
> next 30 minutes vacuuming foo table, and update the fsm with what hew have 
> learned over the 30 mins, even if we have not done a full table scan ?
> 
> 
> -- 
> 
> 
> Darcy Buskermolen
> The PostgreSQL company, Command Prompt Inc.
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-19 Thread Darcy Buskermolen
On Friday 19 January 2007 01:47, Simon Riggs wrote:
> On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > elein wrote:
> > > > Have you made any consideration of providing feedback on autovacuum
> > > > to users? Right now we don't even know what tables were vacuumed when
> > > > and what was reaped.  This might actually be another topic.
> > >
> > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > to have a log table, on which we'd register what did we run, at what
> > > time, how long did it last, how many tuples did it clean, etc.  I feel
> > > having it on the regular text log is useful but it's not good enough.
> > > Keep in mind that in the future we may want to peek at that collected
> > > information to be able to take better scheduling decisions (or at least
> > > inform the DBA that he sucks).
> > >
> > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > that manually-run vacuums would be logged as well.
> >
> > Yes I did intend this thought for vacuum, not strictly autovacuum.
>
> I agree, for all VACUUMs: we need a log table.
>
> The only way we can get a feedback loop on what has come before is by
> remembering what happened. Simply logging it is interesting, but not
> enough.

Correct, I think we are all saying the same thing that is this log table is 
purely inserts so that we can see trends over time.

>
> There is some complexity there, because with many applications a small
> table gets VACUUMed every few minutes, so the log table would become a
> frequently updated table itself. I'd also suggest that we might want to
> take account of the number of tuples removed by btree pre-split VACUUMs
> also.

Thinking on this a bit more, I suppose that this table really should allow for 
user defined triggers on it, so that a DBA can create partioning for it, not 
to mention being able to move it off into it's own tablespace. 


>
> I also like the idea of a single scheduler and multiple child workers.
>
> The basic architecture is clear and obviously beneficial. What worries
> me is how the scheduler will work; there seems to be as many ideas as we
> have hackers. I'm wondering if we should provide the facility of a
> pluggable scheduler? That way you'd be able to fine tune the schedule to
> both the application and to the business requirements. That would allow
> integration with external workflow engines and job schedulers, for when
> VACUUMs need to not-conflict with external events.
>
> If no scheduler has been defined, just use a fairly simple default.
>
> The three main questions are
> - what is the maximum size of VACUUM that can start *now*

How can we determine this given we have no real knowledge of the upcoming  
adverse IO conditions ?

> - can *this* VACUUM start now?
> - which is the next VACUUM to run?
>
> If we have an API that allows those 3 questions to be asked, then a
> scheduler plug-in could supply the answers. That way any complex
> application rules (table A is available for VACUUM now for next 60 mins,
> table B is in constant use so we must use vacuum_delay), external events
> (long running reports have now finished, OK to VACUUM), time-based rules
> (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> first 3 days of the each month is financial accounting close) can be
> specified.

Another thought, is it at all possible to do a partial vacuum?  ie spend the 
next 30 minutes vacuuming foo table, and update the fsm with what hew have 
learned over the 30 mins, even if we have not done a full table scan ?


-- 


Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-19 Thread Simon Riggs
On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > elein wrote:
> > > Have you made any consideration of providing feedback on autovacuum to
> > > users? Right now we don't even know what tables were vacuumed when and
> > > what was reaped.  This might actually be another topic.
> >
> > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > to have a log table, on which we'd register what did we run, at what
> > time, how long did it last, how many tuples did it clean, etc.  I feel
> > having it on the regular text log is useful but it's not good enough.
> > Keep in mind that in the future we may want to peek at that collected
> > information to be able to take better scheduling decisions (or at least
> > inform the DBA that he sucks).
> >
> > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > that manually-run vacuums would be logged as well.
> 
> Yes I did intend this thought for vacuum, not strictly autovacuum.

I agree, for all VACUUMs: we need a log table.

The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
enough.

There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs
also.

I also like the idea of a single scheduler and multiple child workers.

The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seems to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler? That way you'd be able to fine tune the schedule to
both the application and to the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs need to not-conflict with external events.

If no scheduler has been defined, just use a fairly simple default.

The three main questions are
- what is the maximum size of VACUUM that can start *now*
- can *this* VACUUM start now?
- which is the next VACUUM to run?

If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for next 60 mins,
table B is in constant use so we must use vacuum_delay), external events
(long running reports have now finished, OK to VACUUM), time-based rules
(e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
first 3 days of the each month is financial accounting close) can be
specified. 


-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-17 Thread Ron Mayer
Matthew T. O'Connor wrote:
> Alvaro Herrera wrote:
>> I'd like to hear other people's opinions on Darcy Buskermolen proposal
>> to have a log table, on which we'd register what did we run, at what
>> time, how long did it last, [...] 
> 
> I think most people would just be happy if we could get autovacuum to
> log it's actions at a much higher log level.  I think that "autovacuum
> vacuumed table x" is important and shouldn't be all the way down at the
> debug level.

+1 here.  Even more than "autovacuum vacuumed table x", I'd like to see
"vacuum starting table x" and "vacuum done table x".  The reason I say
that is because the speculation "autovacuum might have been running"
is now a frequent phrase I hear when performance questions are asked.
If vacuum start and end times were logged at a much earlier level,
that feature plus log_min_duration_statement could easily disprove
the "vacuum might have been running" hypothesis.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

I'd like to hear other people's opinions on Darcy Buskermolen proposal
to have a log table, on which we'd register what did we run, at what
time, how long did it last, how many tuples did it clean, etc.  I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).


I'm not familiar with his proposal, but I'm not sure what I think of 
logging vacuum (and perhaps analyze) commands to a table.  We have never 
logged anything to tables inside PG.  I would be worried about this 
eating a lot of space in some situations.


I think most people would just be happy if we could get autovacuum to 
log it's actions at a much higher log level.  I think that "autovacuum 
vacuumed table x" is important and shouldn't be all the way down at the 
debug level.


The other (more involved) solution to this problem was proposed which 
was create a separate set of logging control params for autovacuum so 
that you can turn it up or down independent of the general server logging.



Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
that manually-run vacuums would be logged as well.


+1


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Darcy Buskermolen
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> elein wrote:
> > Have you made any consideration of providing feedback on autovacuum to
> > users? Right now we don't even know what tables were vacuumed when and
> > what was reaped.  This might actually be another topic.
>
> I'd like to hear other people's opinions on Darcy Buskermolen proposal
> to have a log table, on which we'd register what did we run, at what
> time, how long did it last, how many tuples did it clean, etc.  I feel
> having it on the regular text log is useful but it's not good enough.
> Keep in mind that in the future we may want to peek at that collected
> information to be able to take better scheduling decisions (or at least
> inform the DBA that he sucks).
>
> Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> that manually-run vacuums would be logged as well.

Yes I did intend this thought for vacuum, not strictly autovacuum.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Alvaro Herrera
elein wrote:

> Have you made any consideration of providing feedback on autovacuum to users?
> Right now we don't even know what tables were vacuumed when and what was
> reaped.  This might actually be another topic.

I'd like to hear other people's opinions on Darcy Buskermolen proposal
to have a log table, on which we'd register what did we run, at what
time, how long did it last, how many tuples did it clean, etc.  I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).

Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
that manually-run vacuums would be logged as well.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-16 Thread Alvaro Herrera
Simon Riggs wrote:

> Perhaps we should focus on the issues that might result, so that we
> address those before we spend time on the details of the user interface.
> Can we deadlock or hang from running multiple autovacuums?

If you were to run multiple autovacuum processes the way they are today,
maybe.  But that's not my intention -- the launcher would be the only
one to read the catalogs; the workers would be started only to do a
single VACUUM job.  This reduces the risk of this kind of problems.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-15 Thread Alban Hertroys
Pavan Deolasee wrote:
> Simon Riggs wrote:
>> On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
>>> Christopher Browne wrote:
>>>
 Seems to me that you could get ~80% of the way by having the simplest
 "2 queue" implementation, where tables with size < some threshold get
 thrown at the "little table" queue, and tables above that size go to
 the "big table" queue.

 That should keep any small tables from getting "vacuum-starved."
>>
> 
> This is exectly what I am trying, two process autovacuum and a GUC to
> seperate small tables.
> 
> In this case, one process takes up vacuuming of the small tables and
> other process vacuuming of the remaining tables as well as Xid
> avoidance related vacuuming. The goal is to avoid starvation of small
> tables when a large table is being vacuumed (which may take
> several hours) without adding too much complexity to the code.

Would it work to make the queues push the treshold into the direction of
the still running queue if the other queue finishes before the still
running one? This would achieve some kind of auto-tuning, but that is
usually tricky.

For example, what if one of the queues got stuck on a lock?

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-15 Thread Pavan Deolasee

Simon Riggs wrote:
> On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
>> Christopher Browne wrote:
>>
>>> Seems to me that you could get ~80% of the way by having the simplest
>>> "2 queue" implementation, where tables with size < some threshold get
>>> thrown at the "little table" queue, and tables above that size go to
>>> the "big table" queue.
>>>
>>> That should keep any small tables from getting "vacuum-starved."
>

This is exectly what I am trying, two process autovacuum and a GUC to
seperate small tables.

In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.

>
> Some feedback from initial testing is that 2 queues probably isn't
> enough. If you have tables with 100s of blocks and tables with millions
> of blocks, the tables in the mid-range still lose out. So I'm thinking
> that a design with 3 queues based upon size ranges, plus the idea that
> when a queue is empty it will scan for tables slightly above/below its
> normal range. That way we wouldn't need to specify the cut-offs with a
> difficult to understand new set of GUC parameters, define them exactly
> and then have them be wrong when databases grow.
>
> The largest queue would be the one reserved for Xid wraparound
> avoidance. No table would be eligible for more than one queue at a time,
> though it might change between queues as it grows.
>
> Alvaro, have you completed your design?
>
> Pavan, what are your thoughts?
>

IMO 2-queue is a good step forward, but in long term we may need to go
for a multiprocess autovacuum where the number and tasks of processes
are either demand based and/or user configurable.

Another idea is to vacuum the tables in round-robin fashion
where the quantum could be either "time" or "number of block". The
autovacuum process would vacuum 'x' blocks of one table and then
schedule next table in the queue. This would avoid starvation of
small tables, though cost of index cleanup might go up because of
increased IO. Any thoughts of this approach ?

Thanks,
Pavan




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-14 Thread Pavan Deolasee

Simon Riggs wrote:

On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:

Christopher Browne wrote:


Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.

That should keep any small tables from getting "vacuum-starved."




This is exectly what I am trying, two process autovacuum and a GUC to
seperate small tables.

In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.



Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
that a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult to understand new set of GUC parameters, define them exactly
and then have them be wrong when databases grow.

The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a time,
though it might change between queues as it grows.

Alvaro, have you completed your design?

Pavan, what are your thoughts?



IMO 2-queue is a good step forward, but in long term we may need to go
for a multiprocess autovacuum where the number and tasks of processes
are either demand based and/or user configurable.

Another idea is to vacuum the tables in round-robin fashion
where the quantum could be either "time" or "number of block". The
autovacuum process would vacuum 'x' blocks of one table and then
schedule next table in the queue. This would avoid starvation of
small tables, though cost of index cleanup might go up because of
increased IO. Any thoughts of this approach ?

Thanks,
Pavan





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-13 Thread elein
On Fri, Jan 12, 2007 at 07:33:05PM -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > Some feedback from initial testing is that 2 queues probably isn't
> > enough. If you have tables with 100s of blocks and tables with millions
> > of blocks, the tables in the mid-range still lose out. So I'm thinking
> > that a design with 3 queues based upon size ranges, plus the idea that
> > when a queue is empty it will scan for tables slightly above/below its
> > normal range.
> 
> Yeah, eventually it occurred to me the fact that as soon as you have 2
> queues, you may as well want to have 3 or in fact any number.  Which in
> my proposal is very easily achieved.
> 
> 
> > Alvaro, have you completed your design?
> 
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that is
> harmful because it makes other people waste time thinking on issues that
> I may already have resolved, and delays the bashing that yet others will
> surely inflict on my proposal, which is never a good thing ;-)  So maybe
> I'll put in a stub about the "queues" stuff and see how people like the
> whole thing.

Have you made any consideration of providing feedback on autovacuum to users?
Right now we don't even know what tables were vacuumed when and what was
reaped.  This might actually be another topic.

---elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-12 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Alvaro 
Herrera) transmitted:
> Simon Riggs wrote:
>
>> Some feedback from initial testing is that 2 queues probably isn't
>> enough. If you have tables with 100s of blocks and tables with
>> millions of blocks, the tables in the mid-range still lose out. So
>> I'm thinking that a design with 3 queues based upon size ranges,
>> plus the idea that when a queue is empty it will scan for tables
>> slightly above/below its normal range.
>
> Yeah, eventually it occurred to me the fact that as soon as you have
> 2 queues, you may as well want to have 3 or in fact any number.
> Which in my proposal is very easily achieved.

Adding an extra attribute to reflect a different ordering or a
different policy allows having as many queues in one queue table as
you might need.

>> Alvaro, have you completed your design?
>
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that
> is harmful because it makes other people waste time thinking on
> issues that I may already have resolved, and delays the bashing that
> yet others will surely inflict on my proposal, which is never a good
> thing ;-) So maybe I'll put in a stub about the "queues" stuff and
> see how people like the whole thing.

Seems like a good idea to me.

Implementing multiple queues amounts to having different worker
processes/threads that operate on the queue table using varying
policies.
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/lisp.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-12 Thread Simon Riggs
On Fri, 2007-01-12 at 19:33 -0300, Alvaro Herrera wrote:

> > Alvaro, have you completed your design?
> 
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that is
> harmful because it makes other people waste time thinking on issues that
> I may already have resolved, and delays the bashing that yet others will
> surely inflict on my proposal, which is never a good thing ;-)  So maybe
> I'll put in a stub about the "queues" stuff and see how people like the
> whole thing.

I've not read a word spoken against the general idea, so I think we
should pursue this actively for 8.3. It should be straightforward to
harvest the good ideas, though there will definitely be many.

Perhaps we should focus on the issues that might result, so that we
address those before we spend time on the details of the user interface.
Can we deadlock or hang from running multiple autovacuums?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-12 Thread Alvaro Herrera
Simon Riggs wrote:

> Some feedback from initial testing is that 2 queues probably isn't
> enough. If you have tables with 100s of blocks and tables with millions
> of blocks, the tables in the mid-range still lose out. So I'm thinking
> that a design with 3 queues based upon size ranges, plus the idea that
> when a queue is empty it will scan for tables slightly above/below its
> normal range.

Yeah, eventually it occurred to me the fact that as soon as you have 2
queues, you may as well want to have 3 or in fact any number.  Which in
my proposal is very easily achieved.


> Alvaro, have you completed your design?

No, I haven't, and the part that's missing is precisely the queues
stuff.  I think I've been delaying posting it for too long, and that is
harmful because it makes other people waste time thinking on issues that
I may already have resolved, and delays the bashing that yet others will
surely inflict on my proposal, which is never a good thing ;-)  So maybe
I'll put in a stub about the "queues" stuff and see how people like the
whole thing.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-12 Thread Simon Riggs
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
> Christopher Browne wrote:
> 
> > Seems to me that you could get ~80% of the way by having the simplest
> > "2 queue" implementation, where tables with size < some threshold get
> > thrown at the "little table" queue, and tables above that size go to
> > the "big table" queue.
> > 
> > That should keep any small tables from getting "vacuum-starved."
> 
> Hmm, would it make sense to keep 2 queues, one that goes through the
> tables in smaller-to-larger order, and the other one in the reverse
> direction?
> 
> I am currently writing a design on how to create "vacuum queues" but I'm
> thinking that maybe it's getting too complex to handle, and a simple
> idea like yours is enough (given sufficient polish).

Sounds good to me. My colleague Pavan has just suggested multiple
autovacuums and then prototyped something almost as a side issue while
trying to solve other problems. I'll show him this entry, maybe he saw
it already? I wasn't following this discussion until now.

The 2 queue implementation seemed to me to be the most straightforward
implementation, mirroring Chris' suggestion. A few aspects that haven't
been mentioned are:
- if you have more than one VACUUM running, we'll need to watch memory
management. Having different queues based upon table size is a good way
of doing that, since the smaller queues have a naturally limited memory
consumption.
- with different size-based queues, the larger VACUUMs can be delayed so
they take much longer, while the small tables can go straight through

Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
that a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult to understand new set of GUC parameters, define them exactly
and then have them be wrong when databases grow.

The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a time,
though it might change between queues as it grows.

Alvaro, have you completed your design?

Pavan, what are your thoughts?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq