Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-09-14 Thread Bruce Momjian

OK, removed from open item list.

---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  I've applied this but I'm now having some second thoughts about it,
  because I'm seeing an actual *decrease* in pgbench numbers from the
  immediately prior CVS HEAD code.
 
  The attached patch requires the new row to fit, and 10% to be free on
  the page.  Would someone test that?
 
 At the moment, I cannot replicate any consistent difference between
 CVS head with the patch, without the patch, with the patch plus your
 BLCKSZ/10 limit addition, or with a variant BLCKSZ/32 limit addition.
 That's whether I use HEAD's broken version of pgbench or one from late
 July.  So I'm feeling a tad frustrated ... but I have no evidence in
 favor of changing what is in CVS, and accordingly recommend that we
 leave well enough alone for 8.2.
 
   regards, tom lane
 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-09-11 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I've applied this but I'm now having some second thoughts about it,
 because I'm seeing an actual *decrease* in pgbench numbers from the
 immediately prior CVS HEAD code.

 The attached patch requires the new row to fit, and 10% to be free on
 the page.  Would someone test that?

At the moment, I cannot replicate any consistent difference between
CVS head with the patch, without the patch, with the patch plus your
BLCKSZ/10 limit addition, or with a variant BLCKSZ/32 limit addition.
That's whether I use HEAD's broken version of pgbench or one from late
July.  So I'm feeling a tad frustrated ... but I have no evidence in
favor of changing what is in CVS, and accordingly recommend that we
leave well enough alone for 8.2.

regards, tom lane



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-09-02 Thread Bruce Momjian

Tom, should I apply this patch now?  Are you still considering other
options for this?

---

Bruce Momjian wrote:
 
 Tom, I ran your tests with fsync off (as you did), and saw numbers
 bouncing between 400-700 tps without my patch, and sticking at 700 tps
 with my patch.
 
 ---
 
 Bruce Momjian wrote:
  
  The attached patch requires the new row to fit, and 10% to be free on
  the page.  Would someone test that?
  
  ---
  
  Tom Lane wrote:
   ITAGAKI Takahiro [EMAIL PROTECTED] writes:
This is a revised patch originated by Junji TERAMOTO for HEAD.
  [BTree vacuum before page splitting]
  http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
I think we can resurrect his idea because we will scan btree pages
page-at-a-time now; the missing-restarting-point problem went away.
   
   I've applied this but I'm now having some second thoughts about it,
   because I'm seeing an actual *decrease* in pgbench numbers from the
   immediately prior CVS HEAD code.  Using
 pgbench -i -s 10 bench
 pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
   with fsync off but all other settings factory-stock, what I'm seeing
   is that the first run looks really good but subsequent runs tail off in
   spectacular fashion :-(  Pre-patch there was only minor degradation in
   successive runs.
   
   What I think is happening is that because pgbench depends so heavily on
   updating existing records, we get into a state where an index page is
   about full and there's one dead tuple on it, and then for each insertion
   we have
   
 * check for uniqueness marks one more tuple dead (the
   next-to-last version of the tuple)
 * newly added code removes one tuple and does a write
 * now there's enough room to insert one tuple
 * lather, rinse, repeat, never splitting the page.
   
   The problem is that we've traded splitting a page every few hundred
   inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
   record, on *every* insert.  This is not good.
   
   Had you done any performance testing on this patch, and if so what
   tests did you use?  I'm a bit hesitant to try to fix it on the basis
   of pgbench results alone.
   
   One possible fix that comes to mind is to only perform the cleanup
   if we are able to remove more than one dead tuple (perhaps about 10
   would be good).  Or do the deletion anyway, but then go ahead and
   split the page unless X amount of space has been freed (where X is
   more than just barely enough for the incoming tuple).
   
   After all the thought we've put into this, it seems a shame to
   just abandon it :-(.  But it definitely needs more tweaking.
   
 regards, tom lane
   


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-09-02 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom, should I apply this patch now?  Are you still considering other
 options for this?

Please wait.  This issue is very far down the to-do list in terms of size
or significance ... but I'll get to it.

regards, tom lane



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-08-09 Thread Bruce Momjian

Have we made a decision on this issue?  Should I apply my patch that
still forces a split unless 10% of the page has been freed?
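
For reference, the rule the patch enforces is roughly the following shape
(a sketch to show the condition only, not the patch text; the function name
is made up, though PageGetFreeSpace() is the existing server helper):

    /*
     * Sketch only: after removing dead tuples, skip the page split only
     * if the incoming tuple fits and at least 10% of the page
     * (BLCKSZ / 10) is still free; otherwise split as before.
     */
    static bool
    cleanup_is_enough(Page page, Size itemsz)
    {
        Size        freespace = PageGetFreeSpace(page);

        if (freespace < itemsz)
            return false;       /* the new row must still fit */
        if (freespace < BLCKSZ / 10)
            return false;       /* and 10% of the page must be free */
        return true;            /* enough room -- no split needed */
    }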

---

Tom Lane wrote:
 ITAGAKI Takahiro [EMAIL PROTECTED] writes:
  This is a revised patch originated by Junji TERAMOTO for HEAD.
[BTree vacuum before page splitting]
http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
  I think we can resurrect his idea because we will scan btree pages
  page-at-a-time now; the missing-restarting-point problem went away.
 
 I've applied this but I'm now having some second thoughts about it,
 because I'm seeing an actual *decrease* in pgbench numbers from the
 immediately prior CVS HEAD code.  Using
   pgbench -i -s 10 bench
   pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
 with fsync off but all other settings factory-stock, what I'm seeing
 is that the first run looks really good but subsequent runs tail off in
 spectacular fashion :-(  Pre-patch there was only minor degradation in
 successive runs.
 
 What I think is happening is that because pgbench depends so heavily on
 updating existing records, we get into a state where an index page is
 about full and there's one dead tuple on it, and then for each insertion
 we have
 
   * check for uniqueness marks one more tuple dead (the
 next-to-last version of the tuple)
   * newly added code removes one tuple and does a write
   * now there's enough room to insert one tuple
   * lather, rinse, repeat, never splitting the page.
 
 The problem is that we've traded splitting a page every few hundred
 inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
 record, on *every* insert.  This is not good.
 
 Had you done any performance testing on this patch, and if so what
 tests did you use?  I'm a bit hesitant to try to fix it on the basis
 of pgbench results alone.
 
 One possible fix that comes to mind is to only perform the cleanup
 if we are able to remove more than one dead tuple (perhaps about 10
 would be good).  Or do the deletion anyway, but then go ahead and
 split the page unless X amount of space has been freed (where X is
 more than just barely enough for the incoming tuple).
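 
 To make those two alternatives concrete, the checks might look roughly like
 this (shown together in one sketch; made-up names, not anything written
 against the tree):
 
     /*
      * Hypothetical sketch of the two ideas above.  'ndeadtuples' is how
      * many dead tuples the page holds, 'free_after_cleanup' is how much
      * space removing them would leave, 'itemsz' is the incoming tuple's
      * size, and 'x_margin' is the extra slack ("X" above).
      */
     static bool
     cleanup_avoids_split(int ndeadtuples, Size free_after_cleanup,
                          Size itemsz, Size x_margin)
     {
         /* idea 1: don't bother cleaning for just one or two dead tuples */
         if (ndeadtuples < 10)
             return false;
 
         /*
          * idea 2: even after cleaning, split unless we freed X beyond
          * what the incoming tuple itself needs
          */
         return free_after_cleanup >= itemsz + x_margin;
     }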
 
 After all the thought we've put into this, it seems a shame to
 just abandon it :-(.  But it definitely needs more tweaking.
 
   regards, tom lane
 




Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-08-09 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Have we made a decision on this issue?  Should I apply my patch that
 still forces a split unless 10% of the page has been freed?

I haven't gotten back to doing any more performance testing.  Please
stick that patch on the pending queue, so we don't forget it, but
don't apply it yet ...

regards, tom lane



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-28 Thread Jim C. Nasby
On Thu, Jul 27, 2006 at 05:24:35PM -0400, Greg Stark wrote:
 
 Jim Nasby [EMAIL PROTECTED] writes:
 
  Even if we stopped right there it would still be a huge win in many (most?)
  cases. How often do the indexes on a table comprise even 50% of the table's
  size?
 
 I would say they're usually roughly comparable actually. It depends on how
 wide your table is of course but the wider your table rows the more indexes
 you're likely to have on the table too.

I think the number of fields in a table will correlate with the number
of indexes, but I don't think the width of those fields matters.

  Even in the  50% case, you've gone from 1.5X to .6X
 
 Sure, and a 3x speedup is nothing to sneeze at, that would be a great
 improvement to vacuum. But it's still just a linear speedup and doesn't
 address the algorithmic problem. 
 
 The fundamental problem is we have a process that's O(m) where m is the total
 space taken by a table and its indexes. The actual amount of space it has to
 reclaim is n. Other than n <= m there's basically no relationship between these
 figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).
 
 We frequently assume -- and often it's a valid assumption -- that these
 figures are roughly proportional. Hence all the talk about databases reaching
 a steady-state where the amount of dead space is constantly being reclaimed
 at more or less the same speed it's being generated. But there are also plenty
 of use cases where a complete vacuum pass takes thousands of times longer than
 the i/o it took to generate those dead tuples. Currently Postgres just isn't
 that great a tool for those use cases.
 
This is exactly why I'm suggesting that we stop waiting for the perfect
vacuum that will only hit the exact tuples in both the heap and indexes
that it needs to and at least take the giant leap forward of only
hitting heap tuples/pages that we know are dead. Even if indexes are the
same size as the heap, you've still gained nearly a 2x speed improvement
(depending on how long you wait to vacuum).

 Unfortunately while I'm convinced of the problem I'm equally unconvinced of
 the solution. I tried to solve online index builds using retail index lookups
 in a very similar way to what's being discussed here. And ran into the same
 problems. I eventually decided that while it could be made to work that way it
 would be far too much code, far too unsafe, and far too invasive in the index
 access methods to be the right approach.
 
 Our existing method works with minimal help from the index access methods
 which allows for an enormous degree of freedom in the index design.  To be
 able to support retail vacuum you would have to force index method
 implementors to keep information in a way that allowed them to look up a
 particular value/tid efficiently which would limit the kinds of indexes you
 could support drastically.
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Jim Nasby

On Jul 26, 2006, at 4:29 PM, Hannu Krosing wrote:
   Well the desire for it comes from a very well established need for dealing
   with extremely large tables with relatively small hot spots. The basic
   problem being that currently the cost of vacuum is proportional to the
   size of the table rather than the amount of dead space. There's no link
   between those variables (at least in one direction) and any time they're
   far out of whack it means excruciating pain for the DBA.

  I thought the suggested solution for that was the dead space map. That
  way vacuum can ignore parts of the table that haven't changed...

 It can ignore parts of the *table* but still has to scan full *indexes*.


Even if we stopped right there it would still be a huge win in many (most?)
cases. How often do the indexes on a table comprise even 50% of the table's
size? If indexes are 10% of the table's size, and you only scan 10% of the
table that's dead, you've gone from scanning 1.10*X pages (where X is the
number of pages in the table) to 0.20*X pages. For large tables, that's a
drastic improvement. Even in the 50% case, you've gone from 1.5X to .6X
(assuming 10% dead space). And I'm actually ignoring that we currently scan
the heap twice; if you throw that into the mix the argument for doing this
is even stronger.
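
Spelling out that arithmetic (X = pages in the heap, 10% of the heap dead,
and the indexes scanned in full either way):

    indexes = 10% of heap:  today 1.0X + 0.1X = 1.1X
                            with a dead space map 0.1X + 0.1X = 0.2X
    indexes = 50% of heap:  today 1.0X + 0.5X = 1.5X
                            with a dead space map 0.1X + 0.5X = 0.6X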


While it would be ideal to eliminate the need to scan the indexes, I'd
certainly rather have the 'half-solution' of not scanning the entire heap
and still scanning the indexes over what we have today.






Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Jim Nasby

On Jul 26, 2006, at 10:29 AM, Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  ... Well it's not like the existing vacuum checks for this.

 Right, that's exactly why the patch works at all.  But the point here is
 that the existing vacuum does not rely on re-computing index keys; all it
 cares about is matching TIDs.  The retail-vacuum idea depends on the
 assumption that you can look at the tuple and re-compute the same index
 keys that you computed the first time; which is an assumption much shakier
 than the assumption that TID comparison works.  (In fact, it's trivial to
 see how user-defined functions that are mislabeled immutable could make
 this fail.)  So retail vacuum without any cross-check that you got all the
 index tuples is a scary proposition IMHO.


Are there other cases that could cause a problem besides mislabeled
user-defined function indexes (a case I don't feel we need to protect
against) and bugs in core?

If we don't worry about users who can't figure out what immutable means, we
should be able to cover core bugs by having the buildfarm occasionally do
long-running pgbench (or some other workload) runs with autovacuum turned
on. Knowing what autovacuum is set to, we can calculate approximately how
much dead/empty space should exist in indexes, and make sure we're within
reason (though this might mean having a mode that disables deleting
known-dead tuples before splitting a page). Another possibility would be
actually inspecting the indexes for invalid entries.






Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-27 Thread Greg Stark

Jim Nasby [EMAIL PROTECTED] writes:

 Even if we stopped right there it would still be a huge win in many (most?)
 cases. How often do the indexes on a table comprise even 50% of the table's
 size?

I would say they're usually roughly comparable actually. It depends on how
wide your table is of course but the wider your table rows the more indexes
you're likely to have on the table too.

 Even in the  50% case, you've gone from 1.5X to .6X

Sure, and a 3x speedup is nothing to sneeze at, that would be a great
improvement to vacuum. But it's still just a linear speedup and doesn't
address the algorithmic problem. 

The fundamental problem is we have a process that's O(m) where m is the total
space taken by a table and its indexes. The actual amount of space it has to
reclaim is n. Other than n <= m there's basically no relationship between these
figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).

We frequently assume -- and often it's a valid assumption -- that these
figures are roughly proportional. Hence all the talk about databases reaching
a steady-state where the amount of dead space is constantly being reclaimed
at more or less the same speed it's being generated. But there are also plenty
of use cases where a complete vacuum pass takes thousands of times longer than
the i/o it took to generate those dead tuples. Currently Postgres just isn't
that great a tool for those use cases.

Unfortunately while I'm convinced of the problem I'm equally unconvinced of
the solution. I tried to solve online index builds using retail index lookups
in a very similar way to what's being discussed here. And ran into the same
problems. I eventually decided that while it could be made to work that way it
would be far too much code, far too unsafe, and far too invasive in the index
access methods to be the right approach.

Our existing method works with minimal help from the index access methods
which allows for an enormous degree of freedom in the index design.  To be
able to support retail vacuum you would have to force index method
implementors to keep information in a way that allowed them to look up a
particular value/tid efficiently which would limit the kinds of indexes you
could support drastically.

-- 
greg




Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 [snip] (In fact, it's
 trivial to see how user-defined functions that are mislabeled immutable
 could make this fail.)  So retail vacuum without any cross-check that
 you got all the index tuples is a scary proposition IMHO.

 Wouldn't it work to restrict that kind of vacuum to only tables which have
 no indexes using user-defined functions?

Of course, we never have bugs in PG core.  Nope, doesn't happen ...

 I actually wonder if such a vacuum would be useful for my scenario,
 where I have some pretty big tables and update a relatively small
 percentage of them. Would such a vacuum be faster than the current one?

So far, the case hasn't been made for retail vacuum even ignoring the
not-so-immutable-function risk.

regards, tom lane



Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Martijn van Oosterhout
On Wed, Jul 26, 2006 at 12:47:57PM -0400, Greg Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  So far, the case hasn't been made for retail vacuum even ignoring the
  not-so-immutable-function risk.
 
 Well the desire for it comes from a very well established need for dealing
 with extremely large tables with relatively small hot spots. The basic problem
 being that currently the cost of vacuum is proportional to the size of the
 table rather than the amount of dead space. There's no link between those
 variables (at least in one direction) and any time they're far out of whack it
 means excruciating pain for the DBA.

I thought the suggested solution for that was the dead space map. That
way vacuum can ignore parts of the table that haven't changed...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Hannu Krosing
On Wednesday, 2006-07-26 at 23:02, Martijn van Oosterhout wrote:
 On Wed, Jul 26, 2006 at 12:47:57PM -0400, Greg Stark wrote:
  Tom Lane [EMAIL PROTECTED] writes:
  
   So far, the case hasn't been made for retail vacuum even ignoring the
   not-so-immutable-function risk.
  
  Well the desire for it comes from a very well established need for dealing
  with extremely large tables with relatively small hot spots. The basic
  problem being that currently the cost of vacuum is proportional to the size
  of the table rather than the amount of dead space. There's no link between
  those variables (at least in one direction) and any time they're far out of
  whack it means excruciating pain for the DBA.
 
 I thought the suggested solution for that was the dead space map. That
 way vacuum can ignore parts of the table that haven't changed...

It can ignore parts of the *table* but still has to scan full *indexes*.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

