Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-09-11 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I've applied this but I'm now having some second thoughts about it,
>> because I'm seeing an actual *decrease* in pgbench numbers from the
>> immediately prior CVS HEAD code.

> The attached patch requires the new row to fit, and 10% to be free on
> the page.  Would someone test that?

At the moment, I cannot replicate any consistent difference between
CVS head with the patch, without the patch, with the patch plus your
BLCKSZ/10 limit addition, or with a variant BLCKSZ/32 limit addition.
That's whether I use HEAD's broken version of pgbench or one from late
July.  So I'm feeling a tad frustrated ... but I have no evidence in
favor of changing what is in CVS, and accordingly recommend that we
leave well enough alone for 8.2.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-08-09 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Have we made a decision on this issue?  Should I apply my patch that
> still forces a split unless 10% of the page has been freed?

I haven't gotten back to doing any more performance testing.  Please
stick that patch on the pending queue, so we don't forget it, but
don't apply it yet ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-08-09 Thread Bruce Momjian

Have we made a decision on this issue?  Should I apply my patch that
still forces a split unless 10% of the page has been freed?

---

Tom Lane wrote:
> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > This is a revised patch originated by Junji TERAMOTO for HEAD.
> >   [BTree vacuum before page splitting]
> >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > I think we can resurrect his idea because we will scan btree pages
> > at-atime now; the missing-restarting-point problem went away.
> 
> I've applied this but I'm now having some second thoughts about it,
> because I'm seeing an actual *decrease* in pgbench numbers from the
> immediately prior CVS HEAD code.  Using
>   pgbench -i -s 10 bench
>   pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
> with fsync off but all other settings factory-stock, what I'm seeing
> is that the first run looks really good but subsequent runs tail off in
> spectacular fashion :-(  Pre-patch there was only minor degradation in
> successive runs.
> 
> What I think is happening is that because pgbench depends so heavily on
> updating existing records, we get into a state where an index page is
> about full and there's one dead tuple on it, and then for each insertion
> we have
> 
>   * check for uniqueness marks one more tuple dead (the
> next-to-last version of the tuple)
>   * newly added code removes one tuple and does a write
>   * now there's enough room to insert one tuple
>   * lather, rinse, repeat, never splitting the page.
> 
> The problem is that we've traded splitting a page every few hundred
> inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
> record, on *every* insert.  This is not good.
> 
> Had you done any performance testing on this patch, and if so what
> tests did you use?  I'm a bit hesitant to try to fix it on the basis
> of pgbench results alone.
> 
> One possible fix that comes to mind is to only perform the cleanup
> if we are able to remove more than one dead tuple (perhaps about 10
> would be good).  Or do the deletion anyway, but then go ahead and
> split the page unless X amount of space has been freed (where X is
> more than just barely enough for the incoming tuple).
> 
> After all the thought we've put into this, it seems a shame to
> just abandon it :-(.  But it definitely needs more tweaking.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-28 Thread Jim C. Nasby
On Thu, Jul 27, 2006 at 05:24:35PM -0400, Greg Stark wrote:
> 
> Jim Nasby <[EMAIL PROTECTED]> writes:
> 
> > Even if we stopped right there it would still be a huge win in many  (most?)
> > cases. How often do the indexes on a table comprise even 50%  of the table's
> > size? 
> 
> I would say they're usually roughly comparable actually. It depends on how
> wide your table is of course but the wider your table rows the more indexes
> you're likely to have on the table too.

I think the number of fields in a table will correlate with the number
of indexes, but I don't think the width of those fields matters.

> > Even in the  50% case, you've gone from 1.5X to .6X
> 
> Sure, and a 3x speedup is nothing to sneeze at, that would be a great
> improvement to vacuum. But it's still just a linear speedup and doesn't
> address the algorithmic problem. 
> 
> The fundamental problem is we have a process that's O(m) where m is the total
> space taken by a table and its indexes. The actual amount of space it has to
> reclaim is n. Other than n figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).
> 
> We frequently assume -- and often it's a valid assumption -- that these
> figures are roughly proportional. Hence all the talk about databases reaching
> a "steady-state" where the amount of dead space is constantly being reclaimed
> at more or less the same speed it's being generated. But there are also plenty
> of use cases where a complete vacuum pass takes thousands of times longer than
> the i/o it took to generate those dead tuples. Currently Postgres just isn't
> that great a tool for those use cases.
 
This is exactly why I'm suggesting that we stop waiting for the perfect
vacuum that will only hit the exact tuples in both the heap and indexes
that it needs to and at least take the giant leap forward of only
hitting heap tuples/pages that we know are dead. Even if indexes are the
same size as the heap, you've still gained nearly a 2x speed improvement
(depending on how long you wait to vacuum).

> Unfortunately while I'm convinced of the problem I'm equally unconvinced of
> the solution. I tried to solve online index builds using retail index lookups
> in a very similar way to what's being discussed here. And ran into the same
> problems. I eventually decided that while it could be made to work that way it
> would be far too much code, far too unsafe, and far too invasive in the index
> access methods to be the right approach.
> 
> Our existing method works with minimal help from the index access methods
> which allows for an enormous degree of freedom in the index design.. To be
> able to support retail vacuum you would have to force index method
> implementors to keep information in a way that allowed them to look up a
> particular value/tid efficiently which would limit the kinds of indexes you
> could support drastically.
> 
> -- 
> greg
> 
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Greg Stark

Jim Nasby <[EMAIL PROTECTED]> writes:

> Even if we stopped right there it would still be a huge win in many  (most?)
> cases. How often do the indexes on a table comprise even 50%  of the table's
> size? 

I would say they're usually roughly comparable actually. It depends on how
wide your table is of course but the wider your table rows the more indexes
you're likely to have on the table too.

> Even in the  50% case, you've gone from 1.5X to .6X

Sure, and a 3x speedup is nothing to sneeze at, that would be a great
improvement to vacuum. But it's still just a linear speedup and doesn't
address the algorithmic problem. 

The fundamental problem is we have a process that's O(m) where m is the total
space taken by a table and its indexes. The actual amount of space it has to
reclaim is n. Other than n

Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Jim Nasby

On Jul 26, 2006, at 10:29 AM, Tom Lane wrote:

Gregory Stark <[EMAIL PROTECTED]> writes:

... Well it's not like the existing vacuum checks for this.


Right, that's exactly why the patch works at all.  But the point  
here is

that the existing vacuum does not rely on re-computing index keys; all
it cares about is matching TIDs.  The retail-vacuum idea depends on  
the
assumption that you can look at the tuple and re-compute the same  
index

keys that you computed the first time; which is an assumption much
shakier than the assumption that TID comparison works.  (In fact, it's
trivial to see how user-defined functions that are mislabeled  
immutable

could make this fail.)  So retail vacuum without any cross-check that
you got all the index tuples is a scary proposition IMHO.


Are there other cases that could cause a problem besides mislabeled  
user function indexes (a case I don't feel we need to protect  
against) and bugs in core?


If we don't worry about users who can't figure out what immutable  
means, we should be able to cover core bugs by having the buildfarm  
occasionally do long-running pg_bench (or some other workload) runs  
with autovacuum turned on. Knowing what autovacuum is set to, we can  
calculate approximately how much dead/empty space should exist in  
indexes, and make sure we're within reason (though this might mean  
having a mode that disables deleting known-dead tuples before  
splitting a page). Another possibility would be actually inspecting  
the indexes for invalid entries.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Jim Nasby

On Jul 26, 2006, at 4:29 PM, Hannu Krosing wrote:
Well the desire for it comes from a very well established need  
for dealing
with extremely large tales with relatively small hot spots. The  
basic problem
being that currently the cost of vacuum is proportional to the  
size of the
table rather than the amount of dead space. There's no link  
between those
variables (at least in one direction) and any time they're far  
out of whack it

means excruciating pain for the DBA.


I thought the suggested solution for that was the dead space map.  
That

way vacuum can ignore parts of the table that havn't changed...


It can ignore parts of the *table* but still has to scan full  
*indexes*.


Even if we stopped right there it would still be a huge win in many  
(most?) cases. How often do the indexes on a table comprise even 50%  
of the table's size? If indexes are 10% of the table's size, and you  
only scan 10% of the table that's dead, you've gone from scanning  
1.10*X pages (where X is the number of pages in the table) to 0.20*X  
pages. For large tables, that's a drastic improvement. Even in the  
50% case, you've gone from 1.5X to .6X (assuming 10% dead space). And  
I'm actually ignoring that we currently scan the heap twice; if you  
throw that into the mix the argument for doing this is even stronger.


While it would be ideal to eliminate the need to scan the indexes,  
I'd certainly rather have the 'half-solution' of not scanning the  
entire heap and still scanning the indexes over what we have today.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Hannu Krosing
Ühel kenal päeval, K, 2006-07-26 kell 23:02, kirjutas Martijn van
Oosterhout:
> On Wed, Jul 26, 2006 at 12:47:57PM -0400, Greg Stark wrote:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> > 
> > > So far, the case hasn't been made for retail vacuum even ignoring the
> > > not-so-immutable-function risk.
> > 
> > Well the desire for it comes from a very well established need for dealing
> > with extremely large tales with relatively small hot spots. The basic 
> > problem
> > being that currently the cost of vacuum is proportional to the size of the
> > table rather than the amount of dead space. There's no link between those
> > variables (at least in one direction) and any time they're far out of whack 
> > it
> > means excruciating pain for the DBA.
> 
> I thought the suggested solution for that was the dead space map. That
> way vacuum can ignore parts of the table that havn't changed...

It can ignore parts of the *table* but still has to scan full *indexes*.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Martijn van Oosterhout
On Wed, Jul 26, 2006 at 12:47:57PM -0400, Greg Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > So far, the case hasn't been made for retail vacuum even ignoring the
> > not-so-immutable-function risk.
> 
> Well the desire for it comes from a very well established need for dealing
> with extremely large tales with relatively small hot spots. The basic problem
> being that currently the cost of vacuum is proportional to the size of the
> table rather than the amount of dead space. There's no link between those
> variables (at least in one direction) and any time they're far out of whack it
> means excruciating pain for the DBA.

I thought the suggested solution for that was the dead space map. That
way vacuum can ignore parts of the table that havn't changed...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> So far, the case hasn't been made for retail vacuum even ignoring the
> not-so-immutable-function risk.

Well the desire for it comes from a very well established need for dealing
with extremely large tales with relatively small hot spots. The basic problem
being that currently the cost of vacuum is proportional to the size of the
table rather than the amount of dead space. There's no link between those
variables (at least in one direction) and any time they're far out of whack it
means excruciating pain for the DBA.

The case that perhaps hasn't been made is for whether retail vacuum is the
best solution. The only other candidate that I've seen proposed (many many
times) is some form of segregating the older tuples a la Oracle.

That doesn't mean retail vacuuming is a good idea. It may just be that we
haven't thought of a the best option yet. But it also may be that retail
vacuuming or some kind of rollback segment is just the least bad idea.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Tom Lane
Csaba Nagy <[EMAIL PROTECTED]> writes:
>> [snip] (In fact, it's
>> trivial to see how user-defined functions that are mislabeled immutable
>> could make this fail.)  So retail vacuum without any cross-check that
>> you got all the index tuples is a scary proposition IMHO.

> Wouldn't work to restrict that kind of vacuum to only tables which have
> no indexes using user defined functions ?

Of course, we never have bugs in PG core.  Nope, doesn't happen ...

> I actually wonder if such a vacuum would be useful for my scenario,
> where I have some pretty big tables, and update a relatively small
> percentage of it. Would it be faster to run such a vacuum against the
> current one ?

So far, the case hasn't been made for retail vacuum even ignoring the
not-so-immutable-function risk.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Csaba Nagy
> [snip] (In fact, it's
> trivial to see how user-defined functions that are mislabeled immutable
> could make this fail.)  So retail vacuum without any cross-check that
> you got all the index tuples is a scary proposition IMHO.

Wouldn't work to restrict that kind of vacuum to only tables which have
no indexes using user defined functions ? That would mean a very small
restriction I guess, probably 99.9% of the indexes won't use user
defined functions...

I actually wonder if such a vacuum would be useful for my scenario,
where I have some pretty big tables, and update a relatively small
percentage of it. Would it be faster to run such a vacuum against the
current one ?
One example would be a ~100 million table where I have 1-4 million
updates per day. Could I run vacuum multiple times a day for this table
and expect that individual runs are relatively fast ?

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> ... Well it's not like the existing vacuum checks for this.

Right, that's exactly why the patch works at all.  But the point here is
that the existing vacuum does not rely on re-computing index keys; all
it cares about is matching TIDs.  The retail-vacuum idea depends on the
assumption that you can look at the tuple and re-compute the same index
keys that you computed the first time; which is an assumption much
shakier than the assumption that TID comparison works.  (In fact, it's
trivial to see how user-defined functions that are mislabeled immutable
could make this fail.)  So retail vacuum without any cross-check that
you got all the index tuples is a scary proposition IMHO.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > This is a revised patch originated by Junji TERAMOTO for HEAD.
> >   [BTree vacuum before page splitting]
> >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > I think we can resurrect his idea because we will scan btree pages
> > at-atime now; the missing-restarting-point problem went away.
> > Have I missed something? Comments welcome.
> 
> I think the only serious objection to this would be that it'd mean that
> tuples that should have an index entry might not have one.  The current
> form of VACUUM does not care, but people keep raising the idea of doing
> "retail" vacuuming that operates by looking up index entries explicitly.
> You could certainly make a retail vacuumer do nothing if it fails to
> find the expected index entry, but ISTM that'd be a rather serious loss
> of consistency checking --- you could not tell the someone-already-
> deleted-it case apart from a bug in the vacuumer's index value
> computation or lookup.

Well you already have that case anyways due to online index builds. (If a
vacuum gets in between the two transactions.) I suppose you could go and check
whether the pg_index entry for the index indicates that it's valid already but
that's not something anyone has to be looking for currently.

> Personally I don't think retail vacuuming in that form will ever fly
> anyway, so I have no problem with installing the proposed patch,
> but I thought I'd better throw this comment out to see if anyone
> thinks it's a big deal.

Well it's not like the existing vacuum checks for this. It's not even
convenient to check for it in the current vacuum architecture. It would have
to maintain the list of expired htids that haven't been found yet in the
bulkdelete and see if there are any left when it's done. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-25 Thread Bruce Momjian

Tom, I ran your tests with fsync off (as you did), and saw numbers
bouncing between 400-700 tps without my patch, and sticking at 700 tps
with my patch.

---

Bruce Momjian wrote:
> 
> The attached patch requires the new row to fit, and 10% to be free on
> the page.  Would someone test that?
> 
> ---
> 
> Tom Lane wrote:
> > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > > This is a revised patch originated by Junji TERAMOTO for HEAD.
> > >   [BTree vacuum before page splitting]
> > >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > > I think we can resurrect his idea because we will scan btree pages
> > > at-atime now; the missing-restarting-point problem went away.
> > 
> > I've applied this but I'm now having some second thoughts about it,
> > because I'm seeing an actual *decrease* in pgbench numbers from the
> > immediately prior CVS HEAD code.  Using
> > pgbench -i -s 10 bench
> > pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
> > with fsync off but all other settings factory-stock, what I'm seeing
> > is that the first run looks really good but subsequent runs tail off in
> > spectacular fashion :-(  Pre-patch there was only minor degradation in
> > successive runs.
> > 
> > What I think is happening is that because pgbench depends so heavily on
> > updating existing records, we get into a state where an index page is
> > about full and there's one dead tuple on it, and then for each insertion
> > we have
> > 
> > * check for uniqueness marks one more tuple dead (the
> >   next-to-last version of the tuple)
> > * newly added code removes one tuple and does a write
> > * now there's enough room to insert one tuple
> > * lather, rinse, repeat, never splitting the page.
> > 
> > The problem is that we've traded splitting a page every few hundred
> > inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
> > record, on *every* insert.  This is not good.
> > 
> > Had you done any performance testing on this patch, and if so what
> > tests did you use?  I'm a bit hesitant to try to fix it on the basis
> > of pgbench results alone.
> > 
> > One possible fix that comes to mind is to only perform the cleanup
> > if we are able to remove more than one dead tuple (perhaps about 10
> > would be good).  Or do the deletion anyway, but then go ahead and
> > split the page unless X amount of space has been freed (where X is
> > more than just barely enough for the incoming tuple).
> > 
> > After all the thought we've put into this, it seems a shame to
> > just abandon it :-(.  But it definitely needs more tweaking.
> > 
> > regards, tom lane
> > 
> > ---(end of broadcast)---
> > TIP 4: Have you searched our list archives?
> > 
> >http://archives.postgresql.org
> 
> -- 
>   Bruce Momjian   [EMAIL PROTECTED]
>   EnterpriseDBhttp://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +


> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-25 Thread Bruce Momjian

The attached patch requires the new row to fit, and 10% to be free on
the page.  Would someone test that?

---

Tom Lane wrote:
> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > This is a revised patch originated by Junji TERAMOTO for HEAD.
> >   [BTree vacuum before page splitting]
> >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > I think we can resurrect his idea because we will scan btree pages
> > at-atime now; the missing-restarting-point problem went away.
> 
> I've applied this but I'm now having some second thoughts about it,
> because I'm seeing an actual *decrease* in pgbench numbers from the
> immediately prior CVS HEAD code.  Using
>   pgbench -i -s 10 bench
>   pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
> with fsync off but all other settings factory-stock, what I'm seeing
> is that the first run looks really good but subsequent runs tail off in
> spectacular fashion :-(  Pre-patch there was only minor degradation in
> successive runs.
> 
> What I think is happening is that because pgbench depends so heavily on
> updating existing records, we get into a state where an index page is
> about full and there's one dead tuple on it, and then for each insertion
> we have
> 
>   * check for uniqueness marks one more tuple dead (the
> next-to-last version of the tuple)
>   * newly added code removes one tuple and does a write
>   * now there's enough room to insert one tuple
>   * lather, rinse, repeat, never splitting the page.
> 
> The problem is that we've traded splitting a page every few hundred
> inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
> record, on *every* insert.  This is not good.
> 
> Had you done any performance testing on this patch, and if so what
> tests did you use?  I'm a bit hesitant to try to fix it on the basis
> of pgbench results alone.
> 
> One possible fix that comes to mind is to only perform the cleanup
> if we are able to remove more than one dead tuple (perhaps about 10
> would be good).  Or do the deletion anyway, but then go ahead and
> split the page unless X amount of space has been freed (where X is
> more than just barely enough for the incoming tuple).
> 
> After all the thought we've put into this, it seems a shame to
> just abandon it :-(.  But it definitely needs more tweaking.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/access/nbtree/nbtinsert.c
===
RCS file: /cvsroot/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.142
diff -c -c -r1.142 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c	25 Jul 2006 19:13:00 -	1.142
--- src/backend/access/nbtree/nbtinsert.c	26 Jul 2006 01:35:52 -
***
*** 438,445 
  			if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop))
  			{
  _bt_vacuum_one_page(rel, buf);
! if (PageGetFreeSpace(page) >= itemsz)
! 	break;		/* OK, now we have enough space */
  			}
  
  			/*
--- 438,451 
  			if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop))
  			{
  _bt_vacuum_one_page(rel, buf);
! /*
!  *	Free space should be large enough for the new tuple and
!  *	should be >= 10% because scanning the page over and
!  *	over again to get just a little free space is inefficient.
!  */
! if (PageGetFreeSpace(page) >= itemsz &&
! 	PageGetFreeSpace(page) >= BLCKSZ / 10)
! 	break;
  			}
  
  			/*

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-25 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
>   [BTree vacuum before page splitting]
>   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-atime now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code.  Using
pgbench -i -s 10 bench
pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-(  Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

* check for uniqueness marks one more tuple dead (the
  next-to-last version of the tuple)
* newly added code removes one tuple and does a write
* now there's enough room to insert one tuple
* lather, rinse, repeat, never splitting the page.

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert.  This is not good.

Had you done any performance testing on this patch, and if so what
tests did you use?  I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good).  Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).

After all the thought we've put into this, it seems a shame to
just abandon it :-(.  But it definitely needs more tweaking.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-25 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> I think we can resurrect his idea because we will scan btree pages
> at-atime now; the missing-restarting-point problem went away.

> Have I missed something? Comments welcome.

I was thinking for awhile just now that this would break the interlock
that guarantees VACUUM can't delete a heap tuple that an indexscanning
process is about to visit.  After further thought, it doesn't, but it's
non-obvious.  I've added the attached commentary to nbtree/README:


On-the-fly deletion of index tuples
---

If a process visits a heap tuple and finds that it's dead and removable
(ie, dead to all open transactions, not only that process), then we can
return to the index and mark the corresponding index entry "known dead",
allowing subsequent index scans to skip visiting the heap tuple.  The
"known dead" marking uses the LP_DELETE bit in ItemIds.  This is currently
only done in plain indexscans, not bitmap scans, because only plain scans
visit the heap and index "in sync" and so there's not a convenient way
to do it for bitmap scans.

Once an index tuple has been marked LP_DELETE it can actually be removed
from the index immediately; since index scans only stop "between" pages,
no scan can lose its place from such a deletion.  We separate the steps
because we allow LP_DELETE to be set with only a share lock (it's exactly
like a hint bit for a heap tuple), but physically removing tuples requires
exclusive lock.  In the current code we try to remove LP_DELETE tuples when
we are otherwise faced with having to split a page to do an insertion (and
hence have exclusive lock on it already).

This leaves the index in a state where it has no entry for a dead tuple
that still exists in the heap.  This is not a problem for the current
implementation of VACUUM, but it could be a problem for anything that
explicitly tries to find index entries for dead tuples.  (However, the
same situation is created by REINDEX, since it doesn't enter dead
tuples into the index.)

It's sufficient to have an exclusive lock on the index page, not a
super-exclusive lock, to do deletion of LP_DELETE items.  It might seem
that this breaks the interlock between VACUUM and indexscans, but that is
not so: as long as an indexscanning process has a pin on the page where
the index item used to be, VACUUM cannot complete its btbulkdelete scan
and so cannot remove the heap tuple.  This is another reason why
btbulkdelete has to get super-exclusive lock on every leaf page, not only
the ones where it actually sees items to delete.


regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-24 Thread Bruce Momjian
Tom Lane wrote:
> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > This is a revised patch originated by Junji TERAMOTO for HEAD.
> >   [BTree vacuum before page splitting]
> >   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> > I think we can resurrect his idea because we will scan btree pages
> > at-atime now; the missing-restarting-point problem went away.
> > Have I missed something? Comments welcome.
> 
> I think the only serious objection to this would be that it'd mean that
> tuples that should have an index entry might not have one.  The current
> form of VACUUM does not care, but people keep raising the idea of doing
> "retail" vacuuming that operates by looking up index entries explicitly.
> You could certainly make a retail vacuumer do nothing if it fails to
> find the expected index entry, but ISTM that'd be a rather serious loss
> of consistency checking --- you could not tell the someone-already-
> deleted-it case apart from a bug in the vacuumer's index value
> computation or lookup.
> 
> Personally I don't think retail vacuuming in that form will ever fly
> anyway, so I have no problem with installing the proposed patch,
> but I thought I'd better throw this comment out to see if anyone
> thinks it's a big deal.

Agreed.  Reverse lookup of index entries will always be too slow.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-24 Thread Heikki Linnakangas

On Mon, 24 Jul 2006, Tom Lane wrote:


Personally I don't think retail vacuuming in that form will ever fly
anyway, so I have no problem with installing the proposed patch,
but I thought I'd better throw this comment out to see if anyone
thinks it's a big deal.


My feeling is that retail vacuuming would be useful some day. But it's 
certainly not going to be there in 8.2 so I have no objection with the 
patch. It's a fairly localized change; it can easily be reverted later if 
necessary.


- Heikki

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-24 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
>   [BTree vacuum before page splitting]
>   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-atime now; the missing-restarting-point problem went away.
> Have I missed something? Comments welcome.

I think the only serious objection to this would be that it'd mean that
tuples that should have an index entry might not have one.  The current
form of VACUUM does not care, but people keep raising the idea of doing
"retail" vacuuming that operates by looking up index entries explicitly.
You could certainly make a retail vacuumer do nothing if it fails to
find the expected index entry, but ISTM that'd be a rather serious loss
of consistency checking --- you could not tell the someone-already-
deleted-it case apart from a bug in the vacuumer's index value
computation or lookup.

Personally I don't think retail vacuuming in that form will ever fly
anyway, so I have no problem with installing the proposed patch,
but I thought I'd better throw this comment out to see if anyone
thinks it's a big deal.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster