Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-28 Thread Jim Beckstrom
Just for the humor of it, as well as to confirm Nick's perspective, 
years ago on our inhouse developed Burroughs mainframe dbms, we had a 
process called garbage collect.

Nicolai Tufar wrote:

I always wondered whether VACUUM is the right name for the process. Now that
PostgreSQL is actively competing in the enterprise space, it might be a good
idea to give it a more enterprise-like name. Think how it looks to an outside
person to see us database professionals holding lengthy discussions about the
ways we vacuum a database. Why should you need to vacuum a database? Is it
dirty? In my personal opinion, something like space reclaiming daemon,
free-list organizer, tuple recycle job or segment coalesce process
would sound more business-like.

Regards,
Nick



Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-27 Thread Nicolai Tufar
I always wondered whether VACUUM is the right name for the process. Now that
PostgreSQL is actively competing in the enterprise space, it might be a good
idea to give it a more enterprise-like name. Think how it looks to an outside
person to see us database professionals holding lengthy discussions about the
ways we vacuum a database. Why should you need to vacuum a database? Is it
dirty? In my personal opinion, something like space reclaiming daemon,
free-list organizer, tuple recycle job or segment coalesce process
would sound more business-like.

Regards,
Nick


- Original Message -
From: Bruce Momjian [EMAIL PROTECTED]
To: Curtis Faith [EMAIL PROTECTED]
Cc: Tom Lane [EMAIL PROTECTED]; Ron Johnson [EMAIL PROTECTED];
PgSQL Performance ML [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Tuesday, November 26, 2002 9:09 PM
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
insert/delete/update



 Good ideas.  I think the master solution is to hook the statistics
 daemon information into an automatic vacuum that could _know_ which
 tables need attention.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-27 Thread Tommi Maekitalo
Or just reorg.

Am Mittwoch, 27. November 2002 15:02 schrieb Nicolai Tufar:
 I always wondered whether VACUUM is the right name for the process. Now that
 PostgreSQL is actively competing in the enterprise space, it might be a good
 idea to give it a more enterprise-like name. Think how it looks to an outside
 person to see us database professionals holding lengthy discussions about the
 ways we vacuum a database. Why should you need to vacuum a database? Is it
 dirty? In my personal opinion, something like space reclaiming daemon,
 free-list organizer, tuple recycle job or segment coalesce process
 would sound more business-like.

 Regards,
 Nick



-- 
Dr. Eckhardt

Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-27 Thread Dave Page


 -Original Message-
 From: Nicolai Tufar [mailto:[EMAIL PROTECTED]] 
 Sent: 27 November 2002 14:02
 To: [EMAIL PROTECTED]; PgSQL Performance ML
 Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: 
 performance of insert/delete/update
 
 
 I always wondered whether VACUUM is the right name for the process. Now 
 that PostgreSQL is actively competing in the enterprise space, it might 
 be a good idea to give it a more enterprise-like name. Think how it 
 looks to an outside person to see us database professionals holding 
 lengthy discussions about the ways we vacuum a database. Why should 
 you need to vacuum a database? Is it dirty? In my personal opinion, 
 something like space reclaiming daemon, free-list organizer, tuple 
 recycle job or segment coalesce process would sound more business-like.

As inspired by the SQL Server Enterprise Manager I've just been swearing
at:

Database Optimizer

Regards, Dave.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-26 Thread Curtis Faith
tom lane wrote:
 Sure, it's just shuffling the housekeeping work from one place to
 another.  The thing that I like about Postgres' approach is that we
 put the housekeeping in a background task (VACUUM) rather than in the
 critical path of foreground transaction commit.

Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM
was not required (or was done automagically). The need for periodic VACUUM
just gives ammunition to the PostgreSQL opponents who can claim we are
deferring work but that it amounts to the same thing.

A fully automatic background VACUUM will significantly reduce but will not
eliminate this perceived weakness.

However, it always seemed to me there should be some way to reuse the space
more dynamically and quickly than a background VACUUM thereby reducing the
percentage of tuples that are expired in heavy update cases. If only a very
tiny number of tuples on the disk are expired this will reduce the aggregate
performance/space penalty of MVCC into insignificance for the majority of
uses.

Couldn't we reuse tuple and index space as soon as there are no transactions
that depend on the old tuple or index values? I have imagined that this was
always part of the long-term master plan.

Couldn't we keep a list of dead tuples in shared memory and look in the list
first when deciding where to place new values for inserts or updates so we
don't have to rely on VACUUM (even a background one)? If there are expired
tuple slots in the list these would be used before allocating a new slot from
the tuple heap.
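
The reuse scheme sketched above can be illustrated with a toy model (Python is used purely for illustration; PostgreSQL itself is written in C, and none of these names are real internals):

```python
# Toy model of a shared-memory dead-slot free list (illustrative only:
# these are not PostgreSQL internals, and real shared memory is far more
# constrained than an unbounded Python structure).
from collections import deque

class DeadSlotList:
    """Tracks (page, slot, deleter_xid) for expired tuple versions."""
    def __init__(self):
        self._dead = deque()   # oldest deletions first

    def note_dead(self, page, slot, deleter_xid):
        self._dead.append((page, slot, deleter_xid))

    def acquire_slot(self, oldest_active_xid):
        # A slot is reusable only if the transaction that deleted it
        # began before every transaction that is still in progress.
        if self._dead and self._dead[0][2] < oldest_active_xid:
            page, slot, _ = self._dead.popleft()
            return (page, slot)
        return None  # caller falls back to allocating from the heap

active_xids = {105, 107}             # in-progress transactions
oldest = min(active_xids)            # the "lowest transaction ID" above
fl = DeadSlotList()
fl.note_dead(3, 7, deleter_xid=100)  # deleted long ago -> reclaimable
fl.note_dead(4, 1, deleter_xid=106)  # may still be visible to xid 105
print(fl.acquire_slot(oldest))       # (3, 7)
print(fl.acquire_slot(oldest))       # None
```

The invariant is the point: a dead slot is recycled only once its deleter's transaction ID falls below the oldest transaction still running, so no open snapshot can still need the old version.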

The only issue is determining the lowest transaction ID for in-process
transactions which seems relatively easy to do (if it's not already done
somewhere).

In the normal shutdown and startup case, a tuple VACUUM could be performed
automatically. This would normally be very fast since there would not be many
tuples in the list.

Index slots would be handled differently since these cannot be substituted
one for another. However, these could be recovered as part of every index
page update. Pages would be scanned before being written, and any expired
slots with transaction IDs lower than the lowest active transaction ID would be
removed. This could be done for non-leaf pages as well and would result in
only reorganizing a page that is already going to be written thereby not
adding much to the overall work.
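
That page-level sweep can be sketched as a toy model (illustrative Python only, not PostgreSQL code; the slot layout and field names are assumptions):

```python
# Toy model of pruning an index page that is already being written out
# (illustrative only; real index pages do not look like this).
def prune_page(slots, oldest_active_xid):
    """Keep live slots and slots whose deleter might still be visible."""
    return [s for s in slots
            if s["deleter_xid"] is None or s["deleter_xid"] >= oldest_active_xid]

page = [
    {"key": 10, "deleter_xid": None},  # live entry
    {"key": 20, "deleter_xid": 90},    # expired and safely reclaimable
    {"key": 30, "deleter_xid": 120},   # deleted, but possibly still visible
]
pruned = prune_page(page, oldest_active_xid=100)
print([s["key"] for s in pruned])      # [10, 30]
```

Because the page was going to be written anyway, the scan adds only CPU work, no extra I/O.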

I don't think it will cause a problem if internal pages still contain pointers
to values that, because of this partial expired-entry elimination, no longer
exist in the leaf nodes further down the tree, since searches and scans will
still work fine.

Does VACUUM do something that could not be handled in this realtime manner?

- Curtis



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-26 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes:
 tom lane wrote:
 Sure, it's just shuffling the housekeeping work from one place to
 another.  The thing that I like about Postgres' approach is that we
 put the housekeeping in a background task (VACUUM) rather than in the
 critical path of foreground transaction commit.

 Couldn't we reuse tuple and index space as soon as there are no transactions
 that depend on the old tuple or index values. I have imagined that this was
 always part of the long-term master plan.
 Couldn't we keep a list of dead tuples in shared memory and look in the list
 first when deciding where to place new values for inserts or updates so we
 don't have to rely on VACUUM (even a background one)?

ISTM that either of these ideas would lead to pushing VACUUM overhead
into the foreground transactions, which is exactly what we don't want to
do.  Keep in mind also that shared memory is finite ... *very* finite.
It's bad enough trying to keep per-page status in there (cf FSM) ---
per-tuple status is right out.
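
The shared-memory objection can be made concrete with back-of-envelope arithmetic (the per-entry size and tuple density below are illustrative assumptions, not PostgreSQL's actual layout):

```python
# Rough cost comparison: per-page state (as in the free-space map)
# versus per-tuple state for one table. Sizes are assumptions.
table_bytes   = 10 * 1024**3     # a 10 GB table
page_size     = 8192             # 8 KB pages
tuples_per_pg = 60               # assumed average tuples per page
entry_bytes   = 6                # assumed bytes per tracked entry

pages  = table_bytes // page_size
tuples = pages * tuples_per_pg
print(f"per-page tracking:  {pages * entry_bytes / 1024**2:6.1f} MB")
print(f"per-tuple tracking: {tuples * entry_bytes / 1024**2:6.1f} MB")
# roughly 7.5 MB versus 450 MB for the same table
```

Per-tuple tracking costs a couple of orders of magnitude more shared memory than per-page tracking, which is the point being made.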

I agree that automatic background VACUUMing would go a long way towards
reducing operational problems.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-26 Thread Bruce Momjian

Good ideas.  I think the master solution is to hook the statistics
daemon information into an automatic vacuum that could _know_ which
tables need attention.
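
The idea above can be sketched as a simple threshold rule over the statistics daemon's per-table counters (a hypothetical sketch; the exact constants are assumptions, though the base-plus-scale-factor shape is what the later autovacuum settings use):

```python
# Sketch: decide which tables need vacuuming from statistics-collector
# counts. Constants are illustrative, not real configuration defaults.
def needs_vacuum(dead_tuples, live_tuples,
                 base_threshold=1000, scale_factor=0.2):
    return dead_tuples > base_threshold + scale_factor * live_tuples

stats = {
    "orders":  {"dead": 50_000, "live": 100_000},   # heavily updated
    "archive": {"dead": 200,    "live": 5_000_000}, # nearly static
}
due = [t for t, s in stats.items() if needs_vacuum(s["dead"], s["live"])]
print(due)   # ['orders']
```

A scale factor keeps large, mostly static tables from being vacuumed constantly while still catching small, hot tables quickly.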

---


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html