Re: [HACKERS] Performance penalty of visibility info in indexes?

2007-02-05 Thread Martijn van Oosterhout
On Thu, Feb 01, 2007 at 11:57:41PM -0600, Jim Nasby wrote:
 Has anyone actually measured the performance overhead of storing  
 visibility info in indexes? I know the space overhead sounds  
 daunting, but even if it doubled the size of the index in many cases  
 that'd still be a huge win over having to scan the heap as well as  
 the index (esp. for things like count(*)). There would also be  
 overhead from having to update the old index tuple, but for the case  
 of updates you're likely to need that page for the new index tuple  
 anyway.

I thought the main problem was locking. If you change the visibility of
an existing row, you have to update the index in a way that won't kill
concurrant scans, either by returning the row twice, or skipping it.

I think it hinges on what exactly falls under visibility info. Maybe
with the page-at-a-time index scans, the problem is easier now.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Performance penalty of visibility info in indexes?

2007-02-04 Thread Jim Nasby

On Feb 2, 2007, at 1:41 PM, Simon Riggs wrote:

On Thu, 2007-02-01 at 23:57 -0600, Jim Nasby wrote:

Has anyone actually measured the performance overhead of storing
visibility info in indexes? I know the space overhead sounds
daunting, but even if it doubled the size of the index in many cases
that'd still be a huge win over having to scan the heap as well as
the index (esp. for things like count(*)). There would also be
overhead from having to update the old index tuple, but for the case
of updates you're likely to need that page for the new index tuple
anyway.

I know this wouldn't work for all cases, but ISTM there are many
cases where it would be a win.


It would prevent any optimization that sought to avoid inserting rows
into the index each time we perform an UPDATE. Improving UPDATE
performance seems more important than improving count(*), IMHO.


That depends on what you're doing; a large read-mostly table would  
likely see a lot of benefit from being able to do covering index scans.


Of course this would have to be optional; there's lots of cases where  
you wouldn't want the added index size.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Performance penalty of visibility info in indexes?

2007-02-04 Thread Hannu Krosing
Ühel kenal päeval, P, 2007-02-04 kell 22:23, kirjutas Jim Nasby:
 On Feb 2, 2007, at 1:41 PM, Simon Riggs wrote:
  On Thu, 2007-02-01 at 23:57 -0600, Jim Nasby wrote:
  Has anyone actually measured the performance overhead of storing
  visibility info in indexes? I know the space overhead sounds
  daunting, but even if it doubled the size of the index in many cases
  that'd still be a huge win over having to scan the heap as well as
  the index (esp. for things like count(*)). There would also be
  overhead from having to update the old index tuple, but for the case
  of updates you're likely to need that page for the new index tuple
  anyway.
 
  I know this wouldn't work for all cases, but ISTM there are many
  cases where it would be a win.
 
  It would prevent any optimization that sought to avoid inserting rows
  into the index each time we perform an UPDATE. 

Not always. If we do in-page update and keep the unchanged index entry
pointing to the first tuple inside the page, then the indexes visibility
info would still be valid for that tuple and also right for that field.

 Improving UPDATE
  performance seems more important than improving count(*), IMHO.
 
 That depends on what you're doing; a large read-mostly table would  
 likely see a lot of benefit from being able to do covering index scans.

A large read-mostly table would also benefit from separating the
visibility info out to a compressed visibility heap.

 Of course this would have to be optional; there's lots of cases where  
 you wouldn't want the added index size.

Of course. All alternative ways of storing MVCC info should be optional
and user-selectable so DBA can test and select the most suitable one for
each usecase.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Performance penalty of visibility info in indexes?

2007-02-02 Thread Jim Nasby
Has anyone actually measured the performance overhead of storing  
visibility info in indexes? I know the space overhead sounds  
daunting, but even if it doubled the size of the index in many cases  
that'd still be a huge win over having to scan the heap as well as  
the index (esp. for things like count(*)). There would also be  
overhead from having to update the old index tuple, but for the case  
of updates you're likely to need that page for the new index tuple  
anyway.


I know this wouldn't work for all cases, but ISTM there are many  
cases where it would be a win.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Performance penalty of visibility info in indexes?

2007-02-02 Thread Simon Riggs
On Thu, 2007-02-01 at 23:57 -0600, Jim Nasby wrote:
 Has anyone actually measured the performance overhead of storing  
 visibility info in indexes? I know the space overhead sounds  
 daunting, but even if it doubled the size of the index in many cases  
 that'd still be a huge win over having to scan the heap as well as  
 the index (esp. for things like count(*)). There would also be  
 overhead from having to update the old index tuple, but for the case  
 of updates you're likely to need that page for the new index tuple  
 anyway.
 
 I know this wouldn't work for all cases, but ISTM there are many  
 cases where it would be a win.

It would prevent any optimization that sought to avoid inserting rows
into the index each time we perform an UPDATE. Improving UPDATE
performance seems more important than improving count(*), IMHO.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org