Re: [PERFORM] Exact index overhead

2008-04-19 Thread Scott Marlowe
On Thu, Apr 17, 2008 at 9:42 AM, Gunther Mayer
<[EMAIL PROTECTED]> wrote:
> Pavan Deolasee wrote:
>

> > 2. The existing block should have enough free space to accommodate the
> > new version
> > A less than 100 fillfactor may help you given your rate of updates.
> >
> >
>  I see, as soon as a new block is required for the new version the index
> pointer needs updating too, I understand now. But at least in the common
> case of space being available the index overhead is reduced to zero. I can
> live with that.

Quick clarification, it's the table, not the index that has to have
free space for the new row version.  This rewards good normalization
practices (narrower rows) and a lower fill factor.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Exact index overhead

2008-04-17 Thread Gunther Mayer

Pavan Deolasee wrote:

On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer
<[EMAIL PROTECTED]> wrote:
  

 You see, all updates change most of the data fields but never ever touch
the time field. Assuming correct and efficient behaviour of postgresql it
should then also never touch the time index and incur zero overhead in its
presence, but is this really the case?



Normally, whenever a row is updated, Postgres inserts a new index entry in each
of the index. So to answer your question, there is certainly index
overhead during
updates, even if you are not changing the indexed column.
  
Ah, I knew these "obvious" assumptions wouldn't necessarily hold. Good 
that I checked.

But if you are using 8.3 then HOT may help you here, assuming you are
not updating
any index keys. HOT optimizes the case by *not* inserting a new index entry and
also by performing retail vacuuming. The two necessary conditions for HOT are:

1. Update should not change any of the index keys. So if you have two
indexes, one
on column A and other on column B, update must not be modifying either A or B.
  

That condition is always satisfied.

2. The existing block should have enough free space to accommodate the
new version
A less than 100 fillfactor may help you given your rate of updates.
  
I see, as soon as a new block is required for the new version the index 
pointer needs updating too, I understand now. But at least in the common 
case of space being available the index overhead is reduced to zero. I 
can live with that.

If your application satisfies 1, then I would suggest you to upgrade
to 8.3 (if you are
not using it already) and then you can create the index without
bothering much about
overheads.
  
I'm still running 8.2.7 but I guess here's a compelling reason to 
upgrade ;-) Will do so soon.


Thanks a lot to everyone who responded (and at what pace!). I love this 
community, it beats commercial support hands down.


Gunther

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Exact index overhead

2008-04-17 Thread Pavan Deolasee
On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer
<[EMAIL PROTECTED]> wrote:
>
>
>  You see, all updates change most of the data fields but never ever touch
> the time field. Assuming correct and efficient behaviour of postgresql it
> should then also never touch the time index and incur zero overhead in its
> presence, but is this really the case?

Normally, whenever a row is updated, Postgres inserts a new index entry in each
of the index. So to answer your question, there is certainly index
overhead during
updates, even if you are not changing the indexed column.

But if you are using 8.3 then HOT may help you here, assuming you are
not updating
any index keys. HOT optimizes the case by *not* inserting a new index entry and
also by performing retail vacuuming. The two necessary conditions for HOT are:

1. Update should not change any of the index keys. So if you have two
indexes, one
on column A and other on column B, update must not be modifying either A or B.

2. The existing block should have enough free space to accommodate the
new version
A less than 100 fillfactor may help you given your rate of updates.

If your application satisfies 1, then I would suggest you to upgrade
to 8.3 (if you are
not using it already) and then you can create the index without
bothering much about
overheads.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Exact index overhead

2008-04-17 Thread Heikki Linnakangas

Gunther Mayer wrote:
You see, all updates change most of the data fields but never ever touch 
the time field. Assuming correct and efficient behaviour of postgresql 
it should then also never touch the time index and incur zero overhead 
in its presence, but is this really the case? If it somehow does update 
the index too even though the value hasn't changed by some weird 
implementation detail I'd rather not have that index and live with slow 
queries for the few times a day that reporting is run.


Updates do generally modify the indexes as well. The way MVCC has been 
implemented in PostgreSQL, UPDATE is internally very much like 
DELETE+INSERT. A new row version is inserted, new index pointers are 
added for the new row version, and the old row version is marked as deleted.


In version 8.3, however, the new HOT feature reduces the need for that. 
In a nutshell, if the new row version fits on the same page as the old 
one, no new index pointers need to be created.


I would suggest just testing how much additional overhead the new index 
incurs. It might be less expensive than you think.


You didn't mention how often the inserts happen, in other words, how 
fast you expect the table to grow. If the table is expected to grow 
orders of magnitude larger, you might want to partition the table by date.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Exact index overhead

2008-04-17 Thread Richard Huxton

Gunther Mayer wrote:
You see, all updates change most of the data fields but never ever touch 
the time field. Assuming correct and efficient behaviour of postgresql 
it should then also never touch the time index and incur zero overhead 
in its presence, but is this really the case? If it somehow does update 
the index too even though the value hasn't changed by some weird 
implementation detail I'd rather not have that index and live with slow 
queries for the few times a day that reporting is run.


Well, until 8.3 PG does indeed update the index. That's because with 
MVCC an update is basically a delete+insert, so you'll end up with two 
versions (the V in MVCC) of the row.


With 8.3 there's a new feature called HOT which means updates that don't 
change an index can be more efficient.


So - if you are running 8.3, I'd say try the index and see what 
difference it makes.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Exact index overhead

2008-04-17 Thread Gunther Mayer

Hi there,

I have a table which looks similar to:

CREATE TABLE accounting
(
 id text NOT NULL,
 time timestamp with time zone,
 data1 int,
 data2 int,
 data3 int,
 data4 int,
 data5 int,
 data6 int,
 data7 int,
 data8 int,
 state int
 CONSTRAINT accounting_pkey PRIMARY KEY (id),
)

The table has about 300k rows but is growing steadily. The usage of this 
table is few selects and inserts, tons of updates and no deletes ever. 
Ratios are roughly

select:insert = 1:1
insert:update = 1:60

Now it turns out that almost all reporting queries use the time field 
and without any additional indexes it ends up doing slow and expensive 
sequential scans (10-20 seconds). Therefore I'd like to create and index 
on time to speed this up, yet I'm not entirely sure what overhead that 
introduces. Clearly there's some overhead during insertion of a new row 
which I can live with but what's not clear is the overhead during 
updates, and the postgresql manual doesn't make that explicit.


You see, all updates change most of the data fields but never ever touch 
the time field. Assuming correct and efficient behaviour of postgresql 
it should then also never touch the time index and incur zero overhead 
in its presence, but is this really the case? If it somehow does update 
the index too even though the value hasn't changed by some weird 
implementation detail I'd rather not have that index and live with slow 
queries for the few times a day that reporting is run.


Gunther

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance