Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Bruce Momjian
Pavan Deolasee wrote:
> together. In the SELECT path, we conditionally try for exclusive lock. If
> the exclusive lock is also cleanup lock, we prune and defrag the page.
> As the patch stands, during index fetch we try to prune only the chain
> originating at that tuple. I am wondering if we should change that
> and prune all the tuple chains in the page ?

I am thinking we are best just doing the index chain we already have to
read.   

If you are modifying the table (like with UPDATE) it makes sense to be
more aggressive and do the whole page because you know there are
probably other table modifications, but for an index lookup there isn't
any knowledge of whether the table is being modified so looking at more
than we need seems like overkill.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> I am thinking we are best just doing the index chain we already have to
> read.   

> If you are modifying the table (like with UPDATE) it makes sense to be
> more aggressive and do the whole page because you know there are
> probably other table modifications, but for an index lookup there isn't
> any knowledge of whether the table is being modified so looking at more
> than we need seems like overkill.

Uh, why would any of this code at all execute during a pure lookup?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > I am thinking we are best just doing the index chain we already have to
> > read.   
> 
> > If you are modifying the table (like with UPDATE) it makes sense to be
> > more aggressive and do the whole page because you know there are
> > probably other table modifications, but for an index lookup there isn't
> > any knowledge of whether the table is being modified so looking at more
> > than we need seems like overkill.
> 
> Uh, why would any of this code at all execute during a pure lookup?

No idea.  It seems an index lookup tries to prune a heap chain, and he
was asking if it should look at other chains on the page;  I said not. 
Whether the index lookup should prune the heap chain is another issue.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Uh, why would any of this code at all execute during a pure lookup?

> No idea.  It seems an index lookup tries to prune a heap chain, and he
> was asking if it should look at other chains on the page;  I said not. 
> Whether the index lookup should prune the heap chain is another issue.

ISTM the only time we should be doing HOT cleanup work is when we are
trying to insert a new tuple on that page --- and maybe even only when
we try and it doesn't fit straight off.  Otherwise you're pushing
maintenance work into foreground query pathways, which is exactly where
I don't think we should be headed.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Gregory Stark
"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> Tom Lane wrote:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> > I am thinking we are best just doing the index chain we already have to
>> > read.   
>> 
>> > If you are modifying the table (like with UPDATE) it makes sense to be
>> > more aggressive and do the whole page because you know there are
>> > probably other table modifications, but for an index lookup there isn't
>> > any knowledge of whether the table is being modified so looking at more
>> > than we need seems like overkill.
>> 
>> Uh, why would any of this code at all execute during a pure lookup?
>
> No idea.  It seems an index lookup tries to prune a heap chain, and he
> was asking if it should look at other chains on the page;  I said not. 
> Whether the index lookup should prune the heap chain is another issue.

Pruning chains is kind of the whole point of the exercise no?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Uh, why would any of this code at all execute during a pure lookup?
>
>> No idea.  It seems an index lookup tries to prune a heap chain, and he
>> was asking if it should look at other chains on the page;  I said not. 
>> Whether the index lookup should prune the heap chain is another issue.
>
> ISTM the only time we should be doing HOT cleanup work is when we are
> trying to insert a new tuple on that page --- and maybe even only when
> we try and it doesn't fit straight off.  Otherwise you're pushing
> maintenance work into foreground query pathways, which is exactly where
> I don't think we should be headed.

Ah, as I understand it you can't actually do the pruning then because the
executor holds references to source tuple on the page. In other words you can
never "get the vacuum lock" there because you already have the page pinned
yourself.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Pavan Deolasee
On 9/6/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>
> >
> > ISTM the only time we should be doing HOT cleanup work is when we are
> > trying to insert a new tuple on that page --- and maybe even only when
> > we try and it doesn't fit straight off.  Otherwise you're pushing
> > maintenance work into foreground query pathways, which is exactly where
> > I don't think we should be headed.
>
> Ah, as I understand it you can't actually do the pruning then because the
> executor holds references to source tuple on the page. In other words you
> can
> never "get the vacuum lock" there because you already have the page pinned
> yourself.
>
>
I don't think executor holding a reference is a problem because when
we check for vacuum lock, we have already pinned the page anyways.
But moving the old tuple around deep down in the UPDATE code path
(when we realize that there is no free space) is an issue. I know Heikki
tried to do it this way - but then moved the pruning code to lookup
code. Heikki ?

Another real problem with doing pruning only in UPDATE path is that
we may end up with long HOT chains if the page does not receive a
UPDATE, after many consecutive HOT updates. Every lookup to the
visible tuple in this chain would be CPU expensive since it would require
long chain following.

The downside is of course that SELECT may occasionally do more work.
But since we ensure that we do the extra work only when there is no
contention on the page, ISTM that the downside is limited.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-05 Thread Gregory Stark
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:

> On 9/6/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
>> Ah, as I understand it you can't actually do the pruning then because the
>> executor holds references to source tuple on the page. In other words you
>> can never "get the vacuum lock" there because you already have the page
>> pinned yourself.
>
> I don't think executor holding a reference is a problem because when
> we check for vacuum lock, we have already pinned the page anyways.

But that's the point. Even though the pin-count is 1 and it may look like we
have the vacuum lock we really don't. The fact that the buffer was already
pinned by us means that there are already pointers around referencing tuples
on that page. If we move them around those pointers become garbage. In fact
Postgres avoids copying data if it can get by with a pointer at the original
tuple on disk so some of the pointers will be Datums pointing at individual
columns in the tuples in the page.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Pavan Deolasee wrote:
> On 9/6/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>> "Tom Lane" <[EMAIL PROTECTED]> writes:
>>
>>> ISTM the only time we should be doing HOT cleanup work is when we are
>>> trying to insert a new tuple on that page --- and maybe even only when
>>> we try and it doesn't fit straight off.  Otherwise you're pushing
>>> maintenance work into foreground query pathways, which is exactly where
>>> I don't think we should be headed.
>> Ah, as I understand it you can't actually do the pruning then because the
>> executor holds references to source tuple on the page. In other words you
>> can
>> never "get the vacuum lock" there because you already have the page pinned
>> yourself.
>>
>>
> I don't think executor holding a reference is a problem because when
> we check for vacuum lock, we have already pinned the page anyways.
> But moving the old tuple around deep down in the UPDATE code path
> (when we realize that there is no free space) is an issue. I know Heikki
> tried to do it this way - but then moved the pruning code to lookup
> code. Heikki ?

When I suggested that we get rid of the LP_DELETE flag for heap tuples,
the tuple-level fragmentation and all that, and just take the vacuum
lock and call PageRepairFragmentation, I was thinking that we'd do it in
heap_update and only when we run out of space on the page. But as Greg
said, it doesn't work because you're already holding a reference to at
least one tuple on the page, the one you're updating, by the time you
get to heap_update. That's why I put the pruning code to heap_fetch
instead. Yes, though the amortized cost is the same, it does push the
pruning work to the foreground query path.

That was a reason for separating the pruning and defragmenting a page.
We could do the pruning in heap_update, and only do the defragmenting in
heap_fetch. I was also thinking of an optimization to the pruning, so
that while we scan the page and remove dead tuples, we also check if we
leave behind an empty gap that's big enough to accommodate the new tuple
we're inserting, and reuse that space immediately, without defragmenting.

> Another real problem with doing pruning only in UPDATE path is that
> we may end up with long HOT chains if the page does not receive a
> UPDATE, after many consecutive HOT updates. Every lookup to the
> visible tuple in this chain would be CPU expensive since it would require
> long chain following.

Yes. For that as well, we could prune only, but not defragment, the page
in the lookup path.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> When I suggested that we get rid of the LP_DELETE flag for heap tuples,
> the tuple-level fragmentation and all that, and just take the vacuum
> lock and call PageRepairFragmentation, I was thinking that we'd do it in
> heap_update and only when we run out of space on the page. But as Greg
> said, it doesn't work because you're already holding a reference to at
> least one tuple on the page, the one you're updating, by the time you
> get to heap_update. That's why I put the pruning code to heap_fetch
> instead. Yes, though the amortized cost is the same, it does push the
> pruning work to the foreground query path.

The amortized cost is only "the same" if every heap_fetch is associated
with a heap update.  I feel pretty urgently unhappy about this choice.
Have you tested the impact of the patch on read-mostly workloads?

>> Another real problem with doing pruning only in UPDATE path is that
>> we may end up with long HOT chains if the page does not receive a
>> UPDATE, after many consecutive HOT updates.

How is that, if the same number of prune attempts would occur?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> When I suggested that we get rid of the LP_DELETE flag for heap tuples,
>> the tuple-level fragmentation and all that, and just take the vacuum
>> lock and call PageRepairFragmentation, I was thinking that we'd do it in
>> heap_update and only when we run out of space on the page. But as Greg
>> said, it doesn't work because you're already holding a reference to at
>> least one tuple on the page, the one you're updating, by the time you
>> get to heap_update. That's why I put the pruning code to heap_fetch
>> instead. Yes, though the amortized cost is the same, it does push the
>> pruning work to the foreground query path.
> 
> The amortized cost is only "the same" if every heap_fetch is associated
> with a heap update.  I feel pretty urgently unhappy about this choice.
> Have you tested the impact of the patch on read-mostly workloads?

I haven't. Someone should. We have a tester working on a test suite with
many small CPU-bound performance test cases; hopefully we'll get those
test cases and results out soon.

Assuming the rule for when to prune would be the same whether we do it
in heap_fetch or heap_update, I don't see how the total cost would be
different. (that's a bad assumption, though, see below)

>>> Another real problem with doing pruning only in UPDATE path is that
>>> we may end up with long HOT chains if the page does not receive a
>>> UPDATE, after many consecutive HOT updates.
> 
> How is that, if the same number of prune attempts would occur?

It wouldn't. To avoid the long HOT chains, we want to prune more often
than what's needed to just make room for updates. I'm not sure what the
exact rules are in the current patch.

That's a pretty sensitive tradeoff, we want to prune often to cut the
long HOT chains, but not too often because it's pretty expensive to
acquire the vacuum lock and move tuples around. I don't think we've
found the optimal solution yet. Separating the pruning and defragmenting
might help.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Florian Pflug

Heikki Linnakangas wrote:

That's a pretty sensitive tradeoff, we want to prune often to cut the
long HOT chains, but not too often because it's pretty expensive to
acquire the vacuum lock and move tuples around. I don't think we've
found the optimal solution yet. Separating the pruning and defragmenting
might help.


Does defragmenting force writing a full page image to the WAL afterwards?
Or does it just log the fact that the page was defragmented, and the actual
work is redone on recovery?

In the first case, over-zealous defragmenting might be costly in terms of
WAL traffic too, not only in term of CPU usage.

greetings, Florian Pflug


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
 Another real problem with doing pruning only in UPDATE path is that
 we may end up with long HOT chains if the page does not receive a
 UPDATE, after many consecutive HOT updates.
>>> How is that, if the same number of prune attempts would occur?
> 
>> It wouldn't. To avoid the long HOT chains, we want to prune more often
>> than what's needed to just make room for updates.
> 
> I don't follow.  HOT chains can only get longer by updates.

Yes. We don't seem to be on the same wavelength...

Imagine a page with just one tuple on it:

1

After a bunch of updates, it looks like this

1 -> 2 -> 3 -> 4 -> 5

1 is the tuple the indexes point to, others are heap only.

Every time you do an index lookup, you have to follow that chain from 1
to 5. Which gets expensive. That's why we want to prune the chain to
make it shorter, even if there's still plenty of room on the page for
updates, and even if we're never going to update it again.

That's the theory. I don't know *how* expensive following the chain
really is, compared to index scans skipping over entries marked as
killed. I don't remember seeing any measurements of that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>>> Another real problem with doing pruning only in UPDATE path is that
>>> we may end up with long HOT chains if the page does not receive a
>>> UPDATE, after many consecutive HOT updates.
>> 
>> How is that, if the same number of prune attempts would occur?

> It wouldn't. To avoid the long HOT chains, we want to prune more often
> than what's needed to just make room for updates.

I don't follow.  HOT chains can only get longer by updates.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Alvaro Herrera wrote:
> Pruning is going to take place on next vacuum anyway, isn't it?

Yes. If HOT is working well, you're not going to run vacuum very often,
though.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Alvaro Herrera wrote:
> Pruning is going to take place on next vacuum anyway, isn't it?
> 
> AFAIUI the point is that you can't do cleanup selectively only on UPDATE
> because of pointers that your process may already have to the page,
> which is why you wanted to do it on every heap_fetch.  I suggest it is
> optimization which can be left for a later version (8.4?)

Wait, did you mean that we don't do pruning at all in 8.3? That's a bad
idea, the main purpose of HOT is to be able to vacuum page at a time,
reducing the need to do regular vacuums.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Florian Pflug

Tom Lane wrote:

"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

Another real problem with doing pruning only in UPDATE path is that
we may end up with long HOT chains if the page does not receive a
UPDATE, after many consecutive HOT updates.

How is that, if the same number of prune attempts would occur?



It wouldn't. To avoid the long HOT chains, we want to prune more often
than what's needed to just make room for updates.


I don't follow.  HOT chains can only get longer by updates.


I believe the case pruning-on-select protects against is
where you'd have a period of heavy updating and long running
transaction, followed by a period of (mostly) read only access.

If you didn't manage to prune most chains during the first phase
due to a rather old OldestXmin, the following selects will all
spend cycles on following the long HOT chains.

Still, it sounds like the real reason is more the technical
difficulties of doing it on update, rather than preventing
that scenario.

A rather wild idea: Could we maybe pin individual tuples, instead
of the whole page? Then we'd just have to be careful not to move
those when pruning during the update.

greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Me & Greg just had a little chat, and came up with this scheme:

1. on heap_update, if the page is full, you prune the page (but don't
defragment it, because you can't get the vacuum lock). That hopefully
leaves behind a large enough gap to put the new tuple in. Insert the new
tuple in the gap, and mark the page as Fragmented. Also make a note in
some backend-private data structure that we've left that page in
fragmented state.

2. In UnpinBuffer, if the pin count falls to zero and it's a page we've
pruned (check the backend-private data structure), defragment it.

Under little contention, all the cost of pruning will be carried by
transactions that do updates. Whether we need to prune in heap_fetch in
addition to that to keep the chains short, I don't know.

One problem with this scheme is that when the page gets full, you have
to hope that you can create a wide enough gap by pruning. It will work
well with fixed-size tuples, but not so well otherwise.

Hmm. I wonder if we could prune/defragment in bgwriter?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> Imagine a page with just one tuple on it:
> 
>> 1
> 
>> After a bunch of updates, it looks like this
> 
>> 1 -> 2 -> 3 -> 4 -> 5
> 
>> 1 is the tuple the indexes point to, others are heap only.
> 
> But if we were attempting prune at every update, at least some of the
> later updates should have managed to prune.  "2" should certainly be
> gone at this point, unless there's lots of contention for the page,
> in which case pruning at select won't make things better.

Oh I see. Yeah, hopefully you don't end up with long chains too often.
You don't need contention for it, though, a long-running transaction
will cause it. Or a transaction that inserts a tuple and updates it
multiple times in the same transaction.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Florian Pflug wrote:
> Tom Lane wrote:
> A rather wild idea: Could we maybe pin individual tuples, instead
> of the whole page? Then we'd just have to be careful not to move
> those when pruning during the update.

Uh, that's a huge change. We might be able to keep track of tuples that
we have references to in our own backend, but even that seems like a
non-starter to me.

Yet another idea is to add an "intent" argument (or somehow pass it out
of line) to heap_fetch. You would prune the page in heap_fetch, but only
if you're fetching for the purpose of updating the tuple.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Simon Riggs
On Wed, 2007-09-05 at 18:15 -0400, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> Uh, why would any of this code at all execute during a pure lookup?
> 
> > No idea.  It seems an index lookup tries to prune a heap chain, and he
> > was asking if it should look at other chains on the page;  I said not. 
> > Whether the index lookup should prune the heap chain is another issue.
> 
> ISTM the only time we should be doing HOT cleanup work is when we are
> trying to insert a new tuple on that page --- and maybe even only when
> we try and it doesn't fit straight off.  Otherwise you're pushing
> maintenance work into foreground query pathways, which is exactly where
> I don't think we should be headed.

This is one of the many heuristics for tuning HOT.

We can choose to do this immediately, when the page fills or somewhere
in between.

The rationale was: In other circumstances we dirty a page to set a hint
bit on the first time we see the need for it. That avoids the clog
lookup on all later reads of that tuple. By analogy, we do the same
thing with HOT: when we see a tuple chain that can be pruned, we prune
it. If we dirty the page for the hint bit it seems like we may as well
dirty the page some more and prune it also at the same time.

We could begin pruning only when the chain is N long. Currently N=2, but
we could set N=3+ easily enough. There's no code yet to actually count
that, but we can do that easily as we do each lookup. We should also be
able to remember the visibility result for each tuple in the chain to
decide whether pruning will be effective or not. 

I would say that if the buffer is already dirty and the chain is
prunable, we should prune it at the first possible chance.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Alvaro Herrera
Heikki Linnakangas escribió:

> Hmm. I wonder if we could prune/defragment in bgwriter?

That would be best, if at all possible.  You can prune without accessing
anything outside the page itself, right?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Alvaro Herrera wrote:
> Heikki Linnakangas escribió:
> 
>> Hmm. I wonder if we could prune/defragment in bgwriter?
> 
> That would be best, if at all possible.  You can prune without accessing
> anything outside the page itself, right?

Yes, though you do need to have an oldest xmin to determine which tuples
are dead, and the removed tuples need to be WAL-logged.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Florian Pflug wrote:
> Heikki Linnakangas wrote:
>> That's a pretty sensitive tradeoff, we want to prune often to cut the
>> long HOT chains, but not too often because it's pretty expensive to
>> acquire the vacuum lock and move tuples around. I don't think we've
>> found the optimal solution yet. Separating the pruning and defragmenting
>> might help.
> 
> Does defragmenting force writing a full page image to the WAL afterwards?
> Or does it just log the fact that the page was defragmented, and the actual
> work is redone on recovery?

No, you just log a note that it was defragmented.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Alvaro Herrera
Heikki Linnakangas escribió:

> Every time you do an index lookup, you have to follow that chain from 1
> to 5. Which gets expensive. That's why we want to prune the chain to
> make it shorter, even if there's still plenty of room on the page for
> updates, and even if we're never going to update it again.
> 
> That's the theory. I don't know *how* expensive following the chain
> really is, compared to index scans skipping over entries marked as
> killed. I don't remember seeing any measurements of that.

I suggest you let that be.  Chain following can't be *that* expensive --
and if it is, then pruning would be a lot more expensive, so it's not a
thing you want to be doing for every heap_fetch.

Pruning is going to take place on next vacuum anyway, isn't it?

AFAIUI the point is that you can't do cleanup selectively only on UPDATE
because of pointers that your process may already have to the page,
which is why you wanted to do it on every heap_fetch.  I suggest it is
optimization which can be left for a later version (8.4?)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 16:15 +0100, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> >> Imagine a page with just one tuple on it:
> > 
> >> 1
> > 
> >> After a bunch of updates, it looks like this
> > 
> >> 1 -> 2 -> 3 -> 4 -> 5
> > 
> >> 1 is the tuple the indexes point to, others are heap only.
> > 
> > But if we were attempting prune at every update, at least some of the
> > later updates should have managed to prune.  "2" should certainly be
> > gone at this point, unless there's lots of contention for the page,
> > in which case pruning at select won't make things better.
> 
> Oh I see. Yeah, hopefully you don't end up with long chains too often.
> You don't need contention for it, though, a long-running transaction
> will cause it. Or a transaction that inserts a tuple and updates it
> multiple times in the same transaction.

Yes, the main point is that an UPDATE doesn't always allow you to prune.
If it did, that would be the right place. Since it doesn't the best
place to prune is surely the first time we see we *can* prune.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Simon Riggs wrote:
> Yes, the main point is that an UPDATE doesn't always allow you to prune.

You can always remove dead HOT tuples in heap_update. But you can never
defragment the page at that point.

> If it did, that would be the right place. Since it doesn't the best
> place to prune is surely the first time we see we *can* prune.

Not necessarily. Pruning is expensive, you need to scan all tuples on
the page and write WAL record. And defragment the page if you consider
that part of pruning. You don't want to do it too aggressively.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I don't follow.  HOT chains can only get longer by updates.

> Yes. We don't seem to be on the same wavelength...

> Imagine a page with just one tuple on it:

> 1

> After a bunch of updates, it looks like this

> 1 -> 2 -> 3 -> 4 -> 5

> 1 is the tuple the indexes point to, others are heap only.

But if we were attempting prune at every update, at least some of the
later updates should have managed to prune.  "2" should certainly be
gone at this point, unless there's lots of contention for the page,
in which case pruning at select won't make things better.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Alvaro Herrera
Heikki Linnakangas escribió:
> Alvaro Herrera wrote:
> > Heikki Linnakangas escribió:
> > 
> >> Hmm. I wonder if we could prune/defragment in bgwriter?
> > 
> > That would be best, if at all possible.  You can prune without accessing
> > anything outside the page itself, right?
> 
> Yes, though you do need to have an oldest xmin to determine which tuples
> are dead,

Hmm, well, I think that with the VXID patch it might actually be
possible to get a Xmin without being in a transaction.

> and the removed tuples need to be WAL-logged.

Is this a problem for bgwriter?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Alvaro Herrera wrote:
>> That would be best, if at all possible.  You can prune without accessing
>> anything outside the page itself, right?

> Yes, though you do need to have an oldest xmin to determine which tuples
> are dead, and the removed tuples need to be WAL-logged.

Hmm ... also check commit status (pg_clog access).  I've always thought
that those things couldn't be done in bgwriter, because it wasn't
running real transactions, but right at the moment I can't see that
there is any obstacle.  Perhaps that meme dates from a time when
GetOldestXmin didn't work outside a transaction?

Pushing the prune/defrag work into bgwriter would certainly fix my
worries about having it in foreground query paths.

(And screw up all Greg Smith's work on how much bgwriter should run...
or maybe this should be a third independent task within bgwriter?)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas escribió:
>> and the removed tuples need to be WAL-logged.

> Is this a problem for bgwriter?

bgwriter does checkpoints, which emit WAL records; and they also call
GetOldestXmin.  So we know a-priori that those two things work.  There
may be some other showstopper but I'm not sure what it would be.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 19:05 +0100, Heikki Linnakangas wrote:

> Pruning is expensive, you need to scan all tuples on
> the page and write WAL record. And defragment the page if you consider
> that part of pruning. You don't want to do it too aggressively.

OK, so my understanding or the meaning of the word has changed since we
decided that initial behaviour. So yes, not too aggressively.

Elsewhere on this thread, I said:

On Thu, 2007-09-06 at 18:05 +0100, Simon Riggs wrote:

> We could begin pruning only when the chain is N long. Currently N=2, but
> we could set N=3+ easily enough. There's no code yet to actually count
> that, but we can do that easily as we do each lookup. We should also be
> able to remember the visibility result for each tuple in the chain to
> decide whether pruning will be effective or not. 
> 
> I would say that if the buffer is already dirty and the chain is
> prunable, we should prune it at the first possible chance.

If we defer pruning until the page is full, worst case we may could end
up with a chain ~240 tuples long, which might need to be scanned
repeatedly. That won't happen often, but it would be better to prune
whenever we hit one of these conditions
- when the block is full 
- when we reach the 16th tuple in a chain

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Tom Lane
I wrote:
> Hmm ... also check commit status (pg_clog access).  I've always thought
> that those things couldn't be done in bgwriter, because it wasn't
> running real transactions, but right at the moment I can't see that
> there is any obstacle.  Perhaps that meme dates from a time when
> GetOldestXmin didn't work outside a transaction?

On further thought, I think what I'm remembering is that full-scale
VACUUM can't work inside bgwriter, because you need to take table-level
locks and worry about index vs heap consistency.  But as long as HOT
pruning involves only a single heap page I see no need for it to take
more than the buffer-level locks on that page.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Heikki Linnakangas
Tom Lane wrote:
> I wrote:
>> Hmm ... also check commit status (pg_clog access).  I've always thought
>> that those things couldn't be done in bgwriter, because it wasn't
>> running real transactions, but right at the moment I can't see that
>> there is any obstacle.  Perhaps that meme dates from a time when
>> GetOldestXmin didn't work outside a transaction?
> 
> On further thought, I think what I'm remembering is that full-scale
> VACUUM can't work inside bgwriter, because you need to take table-level
> locks and worry about index vs heap consistency.  But as long as HOT
> pruning involves only a single heap page I see no need for it to take
> more than the buffer-level locks on that page.

One complication is that you need to identify heap pages; you don't want
to start pruning index pages. We could mark the buffer with a new
Prunable-flag when an it's updated, to signal the bgwriter that it can
prune it.

I wonder if pruning in bgwriter only is enough to make HOT work
efficiently. On a very frequently updated page, bgwriter will have to
work hard to keep up.

Another problematic scenario is a big table and small shared_buffers
(like on Windows). Page can easily fall out of the buffer cache, before
bgwriter has the chance to prune anything on it.

But if it works reasonably well in typical scenarios, we can go with
that for 8.3 and improve later.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-06 Thread Greg Smith

On Thu, 6 Sep 2007, Heikki Linnakangas wrote:


I wonder if pruning in bgwriter only is enough to make HOT work
efficiently. On a very frequently updated page, bgwriter will have to
work hard to keep up.


One of the goals for how I rebuilt the just-in-time BGW was to try and 
make smaller values for bgwriter_delay feasible.  None of the tunables 
*should* have to be adjusted if you need to run the bgwriter much more 
often to support some new HOT-related activity in there as well; this is 
actually my next area I wanted to do a bit more testing on myself.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-07 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Tom Lane wrote:
> > I wrote:
> >> Hmm ... also check commit status (pg_clog access).  I've always thought
> >> that those things couldn't be done in bgwriter, because it wasn't
> >> running real transactions, but right at the moment I can't see that
> >> there is any obstacle.  Perhaps that meme dates from a time when
> >> GetOldestXmin didn't work outside a transaction?
> > 
> > On further thought, I think what I'm remembering is that full-scale
> > VACUUM can't work inside bgwriter, because you need to take table-level
> > locks and worry about index vs heap consistency.  But as long as HOT
> > pruning involves only a single heap page I see no need for it to take
> > more than the buffer-level locks on that page.
> 
> One complication is that you need to identify heap pages; you don't want
> to start pruning index pages. We could mark the buffer with a new
> Prunable-flag when an it's updated, to signal the bgwriter that it can
> prune it.
> 
> I wonder if pruning in bgwriter only is enough to make HOT work
> efficiently. On a very frequently updated page, bgwriter will have to
> work hard to keep up.
> 
> Another problematic scenario is a big table and small shared_buffers
> (like on Windows). Page can easily fall out of the buffer cache, before
> bgwriter has the chance to prune anything on it.
> 
> But if it works reasonably well in typical scenarios, we can go with
> that for 8.3 and improve later.

I don't like pushing pruning to the bgwriter.  I am afraid we get into
the same problem with autovacuum where polling via the bgwriter will
frequently not happen just-in-time, when we need it.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-07 Thread Bruce Momjian
Simon Riggs wrote:
> > We could begin pruning only when the chain is N long. Currently N=2, but
> > we could set N=3+ easily enough. There's no code yet to actually count
> > that, but we can do that easily as we do each lookup. We should also be
> > able to remember the visibility result for each tuple in the chain to
> > decide whether pruning will be effective or not. 
> > 
> > I would say that if the buffer is already dirty and the chain is
> > prunable, we should prune it at the first possible chance.
> 
> If we defer pruning until the page is full, worst case we may could end
> up with a chain ~240 tuples long, which might need to be scanned
> repeatedly. That won't happen often, but it would be better to prune
> whenever we hit one of these conditions
> - when the block is full 
> - when we reach the 16th tuple in a chain

I don't see how following a HOT chain is any slower than following an
UPDATE chain like we do now.  In fact, what we do now is to have an
index row for every row version, while with HOT we will have one index
entry and all the row versions on the same page.  That has to be better
than what we have now, even without pruning.

Let me define two terms:

prune -- modify ctid pointers to reduce the length of one HOT chain

defragment -- reduce the length of all HOT chains on the page
and collect free space

I think we all agree defragmenting should only happen when we need free
space on the page.  But it seems by the time we _know_ we are going to
be adding to the page we can't defragment.  I am thinking we should go
the direction of passing a boolean down into the routines so they know
to defragment before they get into a case where they can't.  

As for pruning, I am thinking Simon's idea of just saying don't prune
unless the chain is longer than X is correct.  If there are a lot of
updates to the page the page will fill and the chain pruned during a
defragment, and if not the chains are guaranteed to be shorter than X.

Ideally you would want to say if the chain has been Y for a certain
length of time (meaning it isn't growing and defragement hasn't happened
in a while) you would start to prune more aggressively but I see no
easy way to track that.

Also, why all the talk of index lookups doing pruning?  Can't a
sequential scan do pruning?

Would someone tell us exactly when pruning and defragmenting happens in
the current version of the patch?  If we don't nail this issue down soon
PostgreSQL 8.3 is going to sail without this patch.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-07 Thread Simon Riggs
On Fri, 2007-09-07 at 22:08 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > We could begin pruning only when the chain is N long. Currently N=2, but
> > > we could set N=3+ easily enough. There's no code yet to actually count
> > > that, but we can do that easily as we do each lookup. We should also be
> > > able to remember the visibility result for each tuple in the chain to
> > > decide whether pruning will be effective or not. 
> > > 
> > > I would say that if the buffer is already dirty and the chain is
> > > prunable, we should prune it at the first possible chance.
> > 
> > If we defer pruning until the page is full, worst case we may could end
> > up with a chain ~240 tuples long, which might need to be scanned
> > repeatedly. That won't happen often, but it would be better to prune
> > whenever we hit one of these conditions
> > - when the block is full 
> > - when we reach the 16th tuple in a chain

Thanks for defining terms.

Just to answer a few of your points:

> I don't see how following a HOT chain is any slower than following an
> UPDATE chain like we do now.  

The speed/cost is the same. The issue is *when* we do this. Normal
SELECTs will follow the chain each time we do an index lookup. 

So if our read/write ratio is 1000:1, then we will waste many cycles and
yet the chain is never pruned on SELECT. So there really must be some
point at which we prune on a SELECT.

Perhaps we could say if Xid % 16 == 0 then we prune, i.e. every 16th
transaction walking the chain will prune it.

> Also, why all the talk of index lookups doing pruning?  Can't a
> sequential scan do pruning?

The SeqScan doesn't follow the chains, so can't easily determine whether
there are any long chains that need pruning. Its only when we come in
via an index that we need to walk the chain to the latest tuple version
and in that case we learn how long the chain is.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2007-09-07 at 22:08 -0400, Bruce Momjian wrote:
>> I don't see how following a HOT chain is any slower than following an
>> UPDATE chain like we do now.  

> The speed/cost is the same. The issue is *when* we do this. Normal
> SELECTs will follow the chain each time we do an index lookup. 

> So if our read/write ratio is 1000:1, then we will waste many cycles and
> yet the chain is never pruned on SELECT. So there really must be some
> point at which we prune on a SELECT.

This argument is too weak to stand up by itself.

Once a tuple is marked as committed dead (which it certainly must be
before you could even consider pruning it) the cost of skipping over
it in a HOT-chain search is going to be very very small.  You have
already paid all the costs of searching the index, fetching and
pinning the heap page, etc.  The incremental cost of rejecting one
more HOT-chain entry is the cost to
* fetch the line pointer;
* check HEAP_XMIN_COMMITTED and see that it's set;
* check xmin and see that it's valid per snapshot;
* check HEAP_XMAX_COMMITTED and see that it's set;
* check xmax and see that it's valid per snapshot;
* fetch t_ctid to chain to next tuple.
Now the snapshot checks could be relatively expensive --- but in any
situation where a tuple is potentially prunable, they'll reduce to *one*
comparison: is xid less than snapshot's xmin?  So you've got two or at
most three shared-memory cache line fetches, no write to shared memory,
no locks taken or released, and a trivial number of CPU operations.

Compared to what it currently takes to check the same tuple (a separate
index entry fetch and traversal to the heap page), this is already an
enormous performance improvement.  Insisting that we add complexity and
pay performance penalties elsewhere to reduce this cost still more does
not strike me as a sane tradeoff --- certainly not if it's being made
without conclusive performance measurements to back it up.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Heikki Linnakangas
Tom Lane wrote:
> Compared to what it currently takes to check the same tuple (a separate
> index entry fetch and traversal to the heap page), this is already an
> enormous performance improvement.

Though keep in mind that we kill index tuples as soon as they're deemed
to be dead. Nevertheless, I'm not very worried about the cost of
following the chain either. But that's something we can quite easily
measure if we want to.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Florian Pflug

Heikki Linnakangas wrote:

Tom Lane wrote:

Compared to what it currently takes to check the same tuple (a separate
index entry fetch and traversal to the heap page), this is already an
enormous performance improvement.


Though keep in mind that we kill index tuples as soon as they're deemed
to be dead. Nevertheless, I'm not very worried about the cost of
following the chain either. But that's something we can quite easily
measure if we want to.


I'm confused now. I though that pruning would be enough to shorten HOT-Chains -
because the root line pointer afterwards points directly to the first live
tuple. But we can *prune* (without actually defragmenting) without holding
a VACUUM-strength lock, right? Or did I get that wrong?

greetings, Florian Pflug



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Heikki Linnakangas
Florian Pflug wrote:
> Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> Compared to what it currently takes to check the same tuple (a separate
>>> index entry fetch and traversal to the heap page), this is already an
>>> enormous performance improvement.
>>
>> Though keep in mind that we kill index tuples as soon as they're deemed
>> to be dead. Nevertheless, I'm not very worried about the cost of
>> following the chain either. But that's something we can quite easily
>> measure if we want to.
> 
> I'm confused now. I though that pruning would be enough to shorten
> HOT-Chains -
> because the root line pointer afterwards points directly to the first live
> tuple. But we can *prune* (without actually defragmenting) without holding
> a VACUUM-strength lock, right? Or did I get that wrong?

Yes, that's right. You don't seem to be confused at all.

Tom argued that following the tuple chain is cheap enough, and might
even be cheaper than what we have now, that we don't need to prune just
for the purpose of keeping the chains short. To which I pointed out that
currently, without HOT, we mark index tuples pointing to dead tuples as
killed to avoid following them in the future, so HOT without pruning is
not cheaper than what we have now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Simon Riggs
On Sat, 2007-09-08 at 17:46 +0100, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Compared to what it currently takes to check the same tuple (a separate
> > index entry fetch and traversal to the heap page), this is already an
> > enormous performance improvement.
> 
> Though keep in mind that we kill index tuples as soon as they're deemed
> to be dead. Nevertheless, I'm not very worried about the cost of
> following the chain either. But that's something we can quite easily
> measure if we want to.

OK, I'm good about it if you guys are.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom argued that following the tuple chain is cheap enough, and might
> even be cheaper than what we have now, that we don't need to prune just
> for the purpose of keeping the chains short. To which I pointed out that
> currently, without HOT, we mark index tuples pointing to dead tuples as
> killed to avoid following them in the future, so HOT without pruning is
> not cheaper than what we have now.

That hack only works in plain indexscans, though, not bitmapped scans.
Anyway, I remain unconvinced that the chains would normally get very
long in the first place, if we could prune when updating.

The we-already-pinned-the-page problem is a bit nasty but may not be
insurmountable.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Florian Pflug

Heikki Linnakangas wrote:

Florian Pflug wrote:

Heikki Linnakangas wrote:

Tom Lane wrote:

Compared to what it currently takes to check the same tuple (a separate
index entry fetch and traversal to the heap page), this is already an
enormous performance improvement.

Though keep in mind that we kill index tuples as soon as they're deemed
to be dead. Nevertheless, I'm not very worried about the cost of
following the chain either. But that's something we can quite easily
measure if we want to.

I'm confused now. I though that pruning would be enough to shorten
HOT-Chains -
because the root line pointer afterwards points directly to the first live
tuple. But we can *prune* (without actually defragmenting) without holding
a VACUUM-strength lock, right? Or did I get that wrong?


Yes, that's right. You don't seem to be confused at all.

Tom argued that following the tuple chain is cheap enough, and might
even be cheaper than what we have now, that we don't need to prune just
for the purpose of keeping the chains short. To which I pointed out that
currently, without HOT, we mark index tuples pointing to dead tuples as
killed to avoid following them in the future, so HOT without pruning is
not cheaper than what we have now.


Hm.. In that case, couldn't we prune the chain while we followed it, with
nearly zero overhead at all? It seems that we'd just have to always update
the root line pointer / tuples to point at the first RECENTLY_DEAD tuple in the 
chain, together with maybe compacting the orphaned tuples to just a line pointer


I'm not sure how HOT handles this currently, but it seems that this sort of
minimal pruning shouldn't need WAL.

greetings, Florian Pflug

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Florian Pflug wrote:
> > Heikki Linnakangas wrote:
> >> Tom Lane wrote:
> >>> Compared to what it currently takes to check the same tuple (a separate
> >>> index entry fetch and traversal to the heap page), this is already an
> >>> enormous performance improvement.
> >>
> >> Though keep in mind that we kill index tuples as soon as they're deemed
> >> to be dead. Nevertheless, I'm not very worried about the cost of
> >> following the chain either. But that's something we can quite easily
> >> measure if we want to.
> > 
> > I'm confused now. I though that pruning would be enough to shorten
> > HOT-Chains -
> > because the root line pointer afterwards points directly to the first live
> > tuple. But we can *prune* (without actually defragmenting) without holding
> > a VACUUM-strength lock, right? Or did I get that wrong?
> 
> Yes, that's right. You don't seem to be confused at all.
> 
> Tom argued that following the tuple chain is cheap enough, and might
> even be cheaper than what we have now, that we don't need to prune just
> for the purpose of keeping the chains short. To which I pointed out that
> currently, without HOT, we mark index tuples pointing to dead tuples as
> killed to avoid following them in the future, so HOT without pruning is
> not cheaper than what we have now.

The central idea I now understand is that pruning only shrinks HOT
chains.  It does not allow reuse of space.  Only defragmentation does
that, and that is triggered when the page is almost full.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Bruce Momjian
Simon Riggs wrote:
> > > If we defer pruning until the page is full, worst case we may could end
> > > up with a chain ~240 tuples long, which might need to be scanned
> > > repeatedly. That won't happen often, but it would be better to prune
> > > whenever we hit one of these conditions
> > > - when the block is full 
> > > - when we reach the 16th tuple in a chain
> 
> Thanks for defining terms.
> 
> Just to answer a few of your points:
> 
> > I don't see how following a HOT chain is any slower than following an
> > UPDATE chain like we do now.  
> 
> The speed/cost is the same. The issue is *when* we do this. Normal
> SELECTs will follow the chain each time we do an index lookup. 

But a sequential scan still follows the chain, and that isn't going to
prune.  Why are we more worried about index chain lookups than
sequential scan lookups?

As I understand it, the pruning is only to reduce the chains.  It
doesn't allow space reuse.  I have updated my README to reflect that:

ftp://momjian.us/pub/postgresql/mypatches/README.HOT

> So if our read/write ratio is 1000:1, then we will waste many cycles and
> yet the chain is never pruned on SELECT. So there really must be some
> point at which we prune on a SELECT.
> 
> Perhaps we could say if Xid % 16 == 0 then we prune, i.e. every 16th
> transaction walking the chain will prune it.
> 
> > Also, why all the talk of index lookups doing pruning?  Can't a
> > sequential scan do pruning?
> 
> The SeqScan doesn't follow the chains, so can't easily determine whether
> there are any long chains that need pruning. Its only when we come in
> via an index that we need to walk the chain to the latest tuple version
> and in that case we learn how long the chain is.

Uh, as I understand it, every access follows the ctid, so why doesn't a
sequential scan follow the chain?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Florian Pflug

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Florian Pflug wrote:

Heikki Linnakangas wrote:

Tom Lane wrote:

Compared to what it currently takes to check the same tuple (a separate
index entry fetch and traversal to the heap page), this is already an
enormous performance improvement.

Though keep in mind that we kill index tuples as soon as they're deemed
to be dead. Nevertheless, I'm not very worried about the cost of
following the chain either. But that's something we can quite easily
measure if we want to.

I'm confused now. I though that pruning would be enough to shorten
HOT-Chains -
because the root line pointer afterwards points directly to the first live
tuple. But we can *prune* (without actually defragmenting) without holding
a VACUUM-strength lock, right? Or did I get that wrong?

Yes, that's right. You don't seem to be confused at all.

Tom argued that following the tuple chain is cheap enough, and might
even be cheaper than what we have now, that we don't need to prune just
for the purpose of keeping the chains short. To which I pointed out that
currently, without HOT, we mark index tuples pointing to dead tuples as
killed to avoid following them in the future, so HOT without pruning is
not cheaper than what we have now.


The central idea I now understand is that pruning only shrinks HOT
chains.  It does not allow reuse of space.  Only defragmentation does
that, and that is triggered when the page is almost full.


I was under the impression that pruning *does* free the space occupied
by DEAD HOT-Tuples (minus the size of a redirection line pointer). It
just doesn't defragment, so how much of that freed space you can actually
use to store new tuples is another question...

Or is that wrong - do we need a VACUUM-strength lock to turn a tuple
into a redirecting line pointer, and can pruning therefore only really
free *no* space?

greetings, Florian Pflug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Tom Lane
Florian Pflug <[EMAIL PROTECTED]> writes:
> I was under the impression that pruning *does* free the space occupied
> by DEAD HOT-Tuples (minus the size of a redirection line pointer). It
> just doesn't defragment, so how much of that freed space you can actually
> use to store new tuples is another question...

None, unless the freed tuple happens to be directly adjacent to the
pd_lower --- pd_upper hole, or unless the patch has done a lot more to
the page-free-space-management code than I thought.  We have never
looked for free space anywhere except the main "hole".

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Simon Riggs wrote:
>> The speed/cost is the same. The issue is *when* we do this. Normal
>> SELECTs will follow the chain each time we do an index lookup. 

> But a sequential scan still follows the chain, and that isn't going to
> prune.  Why are we more worried about index chain lookups than
> sequential scan lookups?

No, a seqscan *doesn't* follow HOT chains.  It just looks at every live
line pointer on the page, sequentially.  An indexscan will have to
follow HOT chains, or it will miss tuples it needs to return.

>> The SeqScan doesn't follow the chains, so can't easily determine whether
>> there are any long chains that need pruning. Its only when we come in
>> via an index that we need to walk the chain to the latest tuple version
>> and in that case we learn how long the chain is.

> Uh, as I understand it, every access follows the ctid, so why doesn't a
> sequential scan follow the chain?

Up to now, normal searches (whether seq or index) have never paid the
slightest attention to t_ctid.  That's only used when a READ COMMITTED
update realizes it's trying to update a non-latest version of a row.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-08 Thread Bruce Momjian
Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > Tom argued that following the tuple chain is cheap enough, and might
> > even be cheaper than what we have now, that we don't need to prune just
> > for the purpose of keeping the chains short. To which I pointed out that
> > currently, without HOT, we mark index tuples pointing to dead tuples as
> > killed to avoid following them in the future, so HOT without pruning is
> > not cheaper than what we have now.
> 
> That hack only works in plain indexscans, though, not bitmapped scans.
> Anyway, I remain unconvinced that the chains would normally get very
> long in the first place, if we could prune when updating.
> 
> The we-already-pinned-the-page problem is a bit nasty but may not be
> insurmountable.

As I understand it, there are two HOT features:

Single-chain pruning, which trims HOT chains but doesn't reuse
the space

Defragementation, which prunes the entire page and reuses space
and handles deleted rows, etc.

Defragementation is the HOT feature we really want.  Single-chain
pruning is kind of nice because it speeds things up, but isn't
necessary.  The fact that the entire chain is on the same page makes me
think that we could just leave single-chain pruning for 8.4 if
necessary.

I think allowing the chain to be more often on the same page via
defragmentation and having a single index entry for the chain is going
to be a win, and the fact we don't have marked-dead index entries for
some of the chain isn't going to be a problem.  My guess is that the
marked-dead index entries were a win only when the chain was on several
pages, which isn't the case for HOT chains.

FYI, I saw this comment in the patch:

+   /*
+* If the free space left in the page is less than the average FSM
+* request size (or a percentage of it), prune all the tuples or
+* tuple chains in the page. Since the operation requires exclusive
+* access to the page and needs to be WAL logged, we want to do as
+* much as possible. At the same time, since the function may be
+* called from a critical path, we want it to be as fast as
+* possible.
+*
+* Disregard the free space if PAGE_PRUNE_DEFRAG_FORCE option is set.
+*
+* XXX The value of 120% is a ad-hoc choice and we may want to
+* tweak it if required:
+*
+* XXX The average request size for a relation is currently
+* initialized to a small value such as 256. So for a table with
+* large size tuples, during initial few UPDATEs we may not prune
+* a page even if the free space available is less than the new
+* tuple size - resulting in unnecessary extention of the relation.
+* Add a temporary hack to prune the page if the free space goes
+* below a certain percentage of the block size (set to 12.5% here))
+*/

So this is how the system determines if it should defrag the whole page.
The defrag function is heap_page_prune_defrag().  The big downside of
this function is it has to get a lock to survey things and it often has
to guess if it should activate or not, meaning it has no idea if free
space is needed on this page or not.

In summary, I feel we have the HOT mechanics down well, but the open
issue is _when_ to activate each operation.

(Can someone time the access time for following a chain that fills an
entire page (the worst case) vs. having a single tuple on the page?)

In an ideal world, we would prune single chains only when they were long
enough to cause a performance impact, and would defragment only when a
new row will not fit on the page.  Other than those two cases, we don't
care how much dead space there is on a page.

However, there are two complexities to this.  One, we can't be sure we
can defragment when we need it because we might not get the lock, and
second, we are only going to try to put a row on a page if we are
updating a row on that page.  If the page is 90% dead but no rows are
being updated on that page no one will try to add a row to the page
because FSM thinks it is full.  That might be OK, it might not.

Another issue.  My guess is that it will take 2-3 weeks to get HOT
applied, meaning we aren't going to go to beta before October 1.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-09 Thread Simon Riggs
On Sat, 2007-09-08 at 23:13 -0400, Bruce Momjian wrote:

> Defragementation is the HOT feature we really want.  Single-chain
> pruning is kind of nice because it speeds things up, but isn't
> necessary.  The fact that the entire chain is on the same page makes me
> think that we could just leave single-chain pruning for 8.4 if
> necessary.

Agreed

> I think allowing the chain to be more often on the same page via
> defragmentation and having a single index entry for the chain is going
> to be a win, and the fact we don't have marked-dead index entries for
> some of the chain isn't going to be a problem.  My guess is that the
> marked-dead index entries were a win only when the chain was on several
> pages, which isn't the case for HOT chains.

Agreed

> In summary, I feel we have the HOT mechanics down well, but the open
> issue is _when_ to activate each operation.

That has been the only open issue for months. The interplay of behaviour
is where HOT succeeds and fails. Some behaviours sound like they will be
good, but aren't. 

> However, there are two complexities to this.  One, we can't be sure we
> can defragment when we need it because we might not get the lock, and

That is a probability call. Heikki measured that, and its a good bet.

If we do fail to fragment, then one of the rows will migrate to another
block. There will likely be fewer backends contending for that block and
then fragmentation will succeed. So this behaviour is self-balancing.

> second, we are only going to try to put a row on a page if we are
> updating a row on that page.  If the page is 90% dead but no rows are
> being updated on that page no one will try to add a row to the page
> because FSM thinks it is full.  That might be OK, it might not.

We must consider data row contention as well as space efficiency. 

In the current system UPDATE contention is not an issue because we treat
them as INSERTs and there is a good scheme in place to deal with INSERT
contention. It would be easy to neglect UPDATE contention issues with
HOT because of this.

If we actively send more backends towards a full block, we may increase
contention which is a bad thing, and we may cause defragmentation to
fail, which would be even worse. The net result will be that some
backends go to a different block again, so it seems simpler to not send
backends towards other blocks just because they are full.

> Another issue.  My guess is that it will take 2-3 weeks to get HOT
> applied, meaning we aren't going to go to beta before October 1.

Seems reasonable.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT patch - version 15

2007-09-09 Thread Bruce Momjian
Simon Riggs wrote:
> On Sat, 2007-09-08 at 23:13 -0400, Bruce Momjian wrote:
> > In summary, I feel we have the HOT mechanics down well, but the open
> > issue is _when_ to activate each operation.
> 
> That has been the only open issue for months. The interplay of behaviour
> is where HOT succeeds and fails. Some behaviours sound like they will be
> good, but aren't. 

Looking at the patch, it seems defragmentation is checked every time a
heap page is pinned in heapam.c by calling heap_page_prune_defrag().  If
page free space < 1.2 * average_tuple_size the page is defragmented.  It
would seem defragmentation is checked as much as possible.  Seems
pruning is also as aggressive as possible.

I think an open question is whether defragmentation should happen _only_
when we are trying to add something to the page and it doesn't fit. 
Another open question is whether pruning should happen only when the
chain gets to a certain length.

The larger question is how are we going to get these answers?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-09 Thread Simon Riggs
On Sun, 2007-09-09 at 07:55 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Sat, 2007-09-08 at 23:13 -0400, Bruce Momjian wrote:
> > > In summary, I feel we have the HOT mechanics down well, but the open
> > > issue is _when_ to activate each operation.
> > 
> > That has been the only open issue for months. The interplay of behaviour
> > is where HOT succeeds and fails. Some behaviours sound like they will be
> > good, but aren't. 
> 
> Looking at the patch, it seems defragmentation is checked every time a
> heap page is pinned in heapam.c by calling heap_page_prune_defrag().  If
> page free space < 1.2 * average_tuple_size the page is defragmented.  It
> would seem defragmentation is checked as much as possible.  Seems
> pruning is also as aggressive as possible.

I think the factor should be 1.0. A Setting of 1.2 will lead to twice as
many defragmentation attempts when we have a fixed row length.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 09:03 +0100, Heikki Linnakangas wrote:

> > As I understand it, there are two HOT features:
> > 
> > Single-chain pruning, which trims HOT chains but doesn't reuse
> > the space
> > 
> > Defragementation, which prunes the entire page and reuses space
> > and handles deleted rows, etc.
> 
> I'm repeating myself, but I would split that second operation into two
> parts while we think about this: pruning the entire page, and
> defragmenting (PageRepairFragmentation). Tom wondered why they're
> separated in the patch. As the patch stands, there is no reason, but I
> feel that separating them and doing them at different times might be an
> important piece in the puzzle.

Yes, it does seem important to keep that separation, since it is
possible to do these things at different times, even if, for now, we
choose not to.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Heikki Linnakangas
Bruce Momjian wrote:
> Tom Lane wrote:
>> The we-already-pinned-the-page problem is a bit nasty but may not be
>> insurmountable.

If you find a way, that would be great.

> As I understand it, there are two HOT features:
> 
>   Single-chain pruning, which trims HOT chains but doesn't reuse
>   the space
> 
>   Defragementation, which prunes the entire page and reuses space
>   and handles deleted rows, etc.

I'm repeating myself, but I would split that second operation into two
parts while we think about this: pruning the entire page, and
defragmenting (PageRepairFragmentation). Tom wondered why they're
separated in the patch. As the patch stands, there is no reason, but I
feel that separating them and doing them at different times might be an
important piece in the puzzle.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/10/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
>
>
> I'm repeating myself, but I would split that second operation into two
> parts while we think about this: pruning the entire page, and
> defragmenting (PageRepairFragmentation). Tom wondered why they're
> separated in the patch. As the patch stands, there is no reason, but I
> feel that separating them and doing them at different times might be an
> important piece in the puzzle.
>
>

I think we all agree that defragmentation should be done less aggressively
and ideally when we are going to insert a tuple in the page and running
low on free space in the page. If we could really do it in heap_update(),
after
we figure out that there is not enough free space available in the page,
that will be great. We already have exclusive lock on the page and hopefully
we also have the vacuum-lock. (This is how earlier version of the patch used
to work - before we took out all the complexity associated with tracking
LP_DELETEd tuples,
tracking fragmented tuples and replaced them with simple
PageRepairFragmentation
Since in the earlier version, we never used to call PageRepairFragmentation,
we could easily do pruning in heap_update())

If we can't find a way to defragment inside heap_update(), then we have
three
choices:

1. Pass a hint to heap_fetch that a UPDATE may follow. If we are running low
on free space, we try to prune and defragment at that point.

2. Move some of the work to bgwriter.

3. Use some heuristic to decide whether to defragment or not.

I am not sure how easy it is to know apriori that we are fetching a tuple
for UPDATE. Assuming we can, this seems like a good idea.

Eventually we may want to move some work to bgwriter or some other
background process. But my take would be to save that for 8.4

As a starting point, we are doing 3 in the patch. We always try to
keep just one tuple worth of space free in the page. So if an UPDATE
takes up  remaining free space in the page, the page will be
pruned/defraged in the subsequent lookup (index or seq). In read-mostly
scenario,
only the first lookup would trigger prune/defrag, but next many lookups
would be quick. In update-mostly scenario, most of the heap_fetches
would anyways end up doing update, so doing pruning/defragmentation
in heap_fetch shouldn't be too bad. For a balanced scenario, we might
be moving cost of pruning/defragmentation to the SELECT path, but
UPDATEs would be quick.


The other issue is when to prune the HOT chain. I tend to agree that
the chances of having long HOT chains are less and the cost of following the
chain won't be too significant. At the same we probably don't want to live
with very long chains forever. An example would be: a tuple gets HOT updated
several times in a transaction. If the page is never updated again, the long
HOT
chain would never be pruned. I like Simon's idea of pruning the chain if it
goes beyond a limit. Instead of adding any significant complexity to
track length of each HOT chain, we can just mark the page with a flag
if heap_hot_fetch() detects a long chain. The next lookup (index or seq)
of the page will try to prune the page and clear the flag if successful.

I also agree that we should avoid taking exclusive lock before checking
free space in heap_page_prune_defrag(). That might be unsafe, but we
don't care if you occasionally skip the maintenance work (or do it a
little early)

Thanks,
Pavan

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Heikki Linnakangas
Bruce Momjian wrote:
> (Can someone time the access time for following a chain that fills an
> entire page (the worst case) vs. having a single tuple on the page?)

Here is some results, on my laptop. The test case is a table with a
single integer column, and a single row in the table. The row is updated
250 times to make a 251 tuples long chain. Page can hold 255 tuples, so
this is pretty much the worst case. After that, the row is fetched
100 times, in a PL/pgSQL function. I ran the tests with
enable_seqscan on and off, see "seqscan" and "idxscan" rows below.
Numbers are the times spent doing the fetches, in seconds. I repeated
each test a few times to make sure that the results are repeatable, they
seem to be repeatable to ~0.1s precision. The test script used is
attached. Autovacuum was disabled, and shared_buffers=320MB, otherwise
all settings were left to defaults.

HEADHOT HOT-opt HOT-pruned
seqscan 19.921.120.111.5
idxscan 27.831.430.413.7

Explanations of the columns:

HEAD:   CVS HEAD
HOT-pruned: CVS HEAD + HOT patch v15
HOT:CVS HEAD + HOT patch v15, but with heap_page_prune_defrag short
circuited to do nothing
HOT-opt:CVS HEAD + HOT patch v15, but with static XidInMVCCSnapshot
like in CVS HEAD

I didn't expect a difference in seqscan performance between HEAD and
HOT. I oprofiled it, and figured out that it's because HOT patch removed
the static-qualifier XidInMVCCSnapshot, because it's needed in
plancat.c. I changed it back to static, dummying out the call in
plancat.c, and the results are now closer to each other (HOT-opt column).

Comparing the idxscan columns, it looks like following the chain *is*
more expensive than having to go through killed index pointers. Pruning
clearly does help.

Given that this test is pretty much the worst case scenario, I'm ok with
not pruning for the purpose of keeping chains short.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
set enable_seqscan=on;

DROP TABLE hottest;

CREATE TABLE hottest (id integer);

INSERT INTO hottest VALUES (1);

CREATE INDEX i_hottest ON hottest(id);

-- Fill the table 

CREATE OR REPLACE FUNCTION longchain (n integer) RETURNS void LANGUAGE PLPGSQL AS 
$$
DECLARE
  i integer;
BEGIN
  FOR i IN 1..n LOOP
UPDATE hottest SET id=id WHERE id = 1;
  END LOOP;
END;
$$;

SELECT longchain(250);

CREATE OR REPLACE FUNCTION fetchchain (n integer) RETURNS void LANGUAGE PLPGSQL AS 
$$
DECLARE
  i integer;
  foo integer;
BEGIN
  FOR i IN 1..n LOOP
SELECT id INTO foo FROM hottest WHERE id = 1;
  END LOOP;
END;
$$;

\timing
SELECT fetchchain(100);
\timing

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
> Explanations of the columns:
> 
> HEAD: CVS HEAD
> HOT-pruned:   CVS HEAD + HOT patch v15
> HOT:  CVS HEAD + HOT patch v15, but with heap_page_prune_defrag short
> circuited to do nothing
> HOT-opt:  CVS HEAD + HOT patch v15, but with static XidInMVCCSnapshot
> like in CVS HEAD

One correction: HOT-opt was also short circuited like HOT. IOW, HOT and
HOT-opt are the same, except for the static qualifier in XidInMVCCSnapshot

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/6/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > When I suggested that we get rid of the LP_DELETE flag for heap tuples,
> > the tuple-level fragmentation and all that, and just take the vacuum
> > lock and call PageRepairFragmentation, I was thinking that we'd do it in
> > heap_update and only when we run out of space on the page. But as Greg
> > said, it doesn't work because you're already holding a reference to at
> > least one tuple on the page, the one you're updating, by the time you
> > get to heap_update. That's why I put the pruning code to heap_fetch
> > instead. Yes, though the amortized cost is the same, it does push the
> > pruning work to the foreground query path.
>
> The amortized cost is only "the same" if every heap_fetch is associated
> with a heap update.  I feel pretty urgently unhappy about this choice.
> Have you tested the impact of the patch on read-mostly workloads?
>
>
For read-mostly workloads, only the first SELECT after an UPDATE
would trigger pruning/defragmentation. heap_page_prune_defrag()
would be a no-op for subsequent SELECTs  (PageIsPrunable() would
be false until the next UPDATE)

I think Heikki's recent test confirms this.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 12:17 +0100, Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > (Can someone time the access time for following a chain that fills an
> > entire page (the worst case) vs. having a single tuple on the page?)
> 
> Here is some results, on my laptop. 

>   HEADHOT HOT-opt HOT-pruned
> seqscan   19.921.120.111.5
> idxscan   27.831.430.413.7
> 

> Comparing the idxscan columns, it looks like following the chain *is*
> more expensive than having to go through killed index pointers. Pruning
> clearly does help.
> 
> Given that this test is pretty much the worst case scenario, I'm ok with
> not pruning for the purpose of keeping chains short.

I wasn't expecting that result and had accepted the counter argument.
IMHO it overturns the argument, or at least begs more test results.
Perhaps we need to compare this with a situation where we do 50% reads
and 50% writes where we try to prune aggressively on reads. There are
clearly big wins to be had in comparison with HEAD, but also the
possibility of regressions (in multiple directions) for various worst
cases.

ISTM that if we made 1 in every 16 index scans prune chains longer than
N (=16?) then we would avoid worst cases like this for almost no
additional (amortized) cost.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/8/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>
>
>
> Would someone tell us exactly when pruning and defragmenting happens in
> the current version of the patch?  If we don't nail this issue down soon
> PostgreSQL 8.3 is going to sail without this patch.
>
>
>

I guess you already have the answers by now, but let me repeat here:

Until patch version 14, defragmentation and pruning were separate
operations, though we used to invoke both at the same time. The only
difference was that pruning can work with a simple exclusive lock
whereas defragmentation needs vacuum-strength lock. Tom argued
that its not worth the complexity, so I changed that and clubbed
both the operations together. What it means: pruning also now needs
vacuum-strength lock and is always followed by defragmentation.

So as the patch stands (version 15), we call heap_page_prune_defrag()
inside heap_release_fetch() and heapgetpage(), apart from the vacuum
loop. It checks for PageIsPrunable() before proceeding. PageIsPrunable
is set when a tuple is updated/deleted. So for read-mostly workloads
heap_page_prune_defrag will mostly be no-op.

If PageIsPrunable is set, cleanup lock is available (exclusive lock is tried
conditionally, so we don't wait if there is contention) and we are running
low on free space (right now if free space is less than 1.2 times the
average
tuple size or less than BLCKSZ/8), the entire page is pruned and
fragmentation
is repaired.

We also prune-defrag is vacuum lazy and vacuum full. But I assume we
are not worried about these maintenance activities.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/10/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2007-09-10 at 12:17 +0100, Heikki Linnakangas wrote:
> > Bruce Momjian wrote:
> > > (Can someone time the access time for following a chain that fills an
> > > entire page (the worst case) vs. having a single tuple on the page?)
> >
> > Here is some results, on my laptop.
>
> >   HEADHOT HOT-opt HOT-pruned
> > seqscan   19.921.120.111.5
> > idxscan   27.831.430.413.7
> >
>
> > Comparing the idxscan columns, it looks like following the chain *is*
> > more expensive than having to go through killed index pointers. Pruning
> > clearly does help.
> >
> > Given that this test is pretty much the worst case scenario, I'm ok with
> > not pruning for the purpose of keeping chains short.
>
> I wasn't expecting that result and had accepted the counter argument.
>
>

Yes, I go with Simon. I am also surprised that HOT-pruned did
so well in this setup. I always thought that HOT would do well
in update-intensive scenarios, but from the results it seems that
HOT is also doing well for read-mostly queries.

In this particular example, the first SELECT after the 250 UPDATEs
would have pruned all the dead tuples and reduced HOT chain
to a single tuple. Hence the total time for subsequent SELECTs improved
tremendously.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Heikki Linnakangas wrote:
>   HEADHOT HOT-opt HOT-pruned
> seqscan   19.921.120.111.5
> idxscan   27.831.430.413.7
> 
> Explanations of the columns:
> 
> HEAD: CVS HEAD
> HOT-pruned:   CVS HEAD + HOT patch v15
> HOT:  CVS HEAD + HOT patch v15, but with heap_page_prune_defrag short
> circuited to do nothing
> HOT-opt:  CVS HEAD + HOT patch v15, but with static XidInMVCCSnapshot
> like in CVS HEAD
> 
> I didn't expect a difference in seqscan performance between HEAD and
> HOT. I oprofiled it, and figured out that it's because HOT patch removed
> the static-qualifier XidInMVCCSnapshot, because it's needed in
> plancat.c. I changed it back to static, dummying out the call in
> plancat.c, and the results are now closer to each other (HOT-opt column).
> 
> Comparing the idxscan columns, it looks like following the chain *is*
> more expensive than having to go through killed index pointers. Pruning
> clearly does help.
> 
> Given that this test is pretty much the worst case scenario, I'm ok with
> not pruning for the purpose of keeping chains short.

Wow, that is very helpful.  You have shown that pruning a single chain
is indeed a valuable operation.  My guess is that once you prune a tuple
you no longer have to check its visibility, and that is where the win is
coming from.  

If we check a tuple in a chain and the tuple is dead is it possible the
pruning operation is cheaper than having to check the tuple again for
visibility the next time we see it?  If so, we can just always prune
when we see a dead tuple in the chain, which I believe was the original
design.  Pruning becomes an operation similar to marking an index entry
as dead.  (I assuming pruing does not generate WAL traffic.)

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Heikki Linnakangas
Bruce Momjian wrote:
> My guess is that once you prune a tuple
> you no longer have to check its visibility, and that is where the win is
> coming from.  

Yes, I think that's right.

Oh, one more thing occured to me. Without HOT, we not only mark index
tuples pointing to dead tuples as killed, we remove them altogether if
the index page gets full. If you modify the test case so that after
doing the updates, you insert a bunch of tuples with a different key to
fill the index page, you should see CVS HEAD winning HOT without pruning
hands down.

> If we check a tuple in a chain and the tuple is dead is it possible the
> pruning operation is cheaper than having to check the tuple again for
> visibility the next time we see it?  If so, we can just always prune
> when we see a dead tuple in the chain, which I believe was the original
> design.  Pruning becomes an operation similar to marking an index entry
> as dead.  (I assuming pruing does not generate WAL traffic.)

Pruning does generate a WAL record at the moment. Maybe you could do
some kind of a quick pruning without a WAL record. Like just modify the
ctid of the oldest dead tuple in the chain, or the redirecting line
pointer if there is one, to point to the latest live tuple, without
removing the dead tuples or the line pointers.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Pavan Deolasee wrote:
> Until patch version 14, defragmentation and pruning were separate
> operations, though we used to invoke both at the same time. The only
> difference was that pruning can work with a simple exclusive lock
> whereas defragmentation needs vacuum-strength lock. Tom argued
> that its not worth the complexity, so I changed that and clubbed
> both the operations together. What it means: pruning also now needs
> vacuum-strength lock and is always followed by defragmentation.

Interesting.  See my previous post to Heikki stating we might need to
change that based on Heikki test results.

> So as the patch stands (version 15), we call heap_page_prune_defrag()
> inside heap_release_fetch() and heapgetpage(), apart from the vacuum
> loop. It checks for PageIsPrunable() before proceeding. PageIsPrunable
> is set when a tuple is updated/deleted. So for read-mostly workloads
> heap_page_prune_defrag will mostly be no-op.
> 
> If PageIsPrunable is set, cleanup lock is available (exclusive lock is tried
> conditionally, so we don't wait if there is contention) and we are running
> low on free space (right now if free space is less than 1.2 times the
> average
> tuple size or less than BLCKSZ/8), the entire page is pruned and
> fragmentation
> is repaired.

Looking at the patch I see:

+   /*
+* Mark the page as clear of prunable tuples. If we find a tuple which
+* may become prunable, we shall set the hint again.
+*/
+   PageClearPrunable(page);

I like the idea of the page hint bit, but my question is if there is a
long-running transaction, isn't each SELECT going to try to defragment a
page over and over again because there is still something prunable on
the page?

I think we need to find out how hard it would be to try the
defragmentation only on INSERT or UPDATE.  The hint bit might still be
useful.

> We also prune-defrag is vacuum lazy and vacuum full. But I assume we
> are not worried about these maintenance activities.

Right.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Gregory Stark
"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> Looking at the patch I see:
>
> +   /*
> +* Mark the page as clear of prunable tuples. If we find a tuple which
> +* may become prunable, we shall set the hint again.
> +*/
> +   PageClearPrunable(page);
>
> I like the idea of the page hint bit, but my question is if there is a
> long-running transaction, isn't each SELECT going to try to defragment a
> page over and over again because there is still something prunable on
> the page?

Well it'll try to prune the chains over and over again. If it doesn't find
anything it won't defragment, but yes.

I think we could tackle that by storing on the page the oldest xmax which
would allow us to prune a tuple. Then you could compare RecentGlobalXmin
against that and not bother looking at any other chains if it hasn't been
passed yet. 

It would be hard to do that with single-chain pruning though, once you the
limiting tuple you would then wouldn't know what the next limiting xmax is
until the next time you do a full-page prune. Still that gets us down to at
most two full-page prunes per update instead of a potentially unbounded number
of prunes.

This seems like a further optimization to think about after we have a place to
trigger the pruning where it'll do the most good.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Florian Pflug

Bruce Momjian wrote:

Looking at the patch I see:

+   /*
+* Mark the page as clear of prunable tuples. If we find a tuple which
+* may become prunable, we shall set the hint again.
+*/
+   PageClearPrunable(page);

I like the idea of the page hint bit, but my question is if there is a
long-running transaction, isn't each SELECT going to try to defragment a
page over and over again because there is still something prunable on
the page?


Maybe that risk could be lowered if instead of a flag, we stored the
minimal global xmin needed to prune at least one tuple.

greetings, Florian Pflug



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 18:23 +0100, Heikki Linnakangas wrote:

> > If we check a tuple in a chain and the tuple is dead is it possible the
> > pruning operation is cheaper than having to check the tuple again for
> > visibility the next time we see it?  If so, we can just always prune
> > when we see a dead tuple in the chain, which I believe was the original
> > design.  Pruning becomes an operation similar to marking an index entry
> > as dead.  (I assuming pruing does not generate WAL traffic.)
> 
> Pruning does generate a WAL record at the moment. Maybe you could do
> some kind of a quick pruning without a WAL record. Like just modify the
> ctid of the oldest dead tuple in the chain, or the redirecting line
> pointer if there is one, to point to the latest live tuple, without
> removing the dead tuples or the line pointers.

Sounds like a great idea.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/10/07, Florian Pflug <[EMAIL PROTECTED]> wrote:
>
>
> Maybe that risk could be lowered if instead of a flag, we stored the
> minimal global xmin needed to prune at least one tuple.
>
>

I like the idea. The question is whether the chances of a Prunable
page being looked up again and again in presence of a long
running transaction are high enough to justify adding 4 bytes
to page header.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/10/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
>
> Oh, one more thing occured to me. Without HOT, we not only mark index
> tuples pointing to dead tuples as killed, we remove them altogether if
> the index page gets full. If you modify the test case so that after
> doing the updates, you insert a bunch of tuples with a different key to
> fill the index page, you should see CVS HEAD winning HOT without pruning
> hands down.
>
>
Um. I am not sure I follow that. With HOT even if the HOT chain is long,
there is still just one index entry for the entire chain. So I don't see
why CVS HEAD would do better than HOT in the above case. Net-net
there will be equal number of index keys after the inserts.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/11/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>
> Heikki Linnakangas wrote:
>
> >
> > Pruning does generate a WAL record at the moment. Maybe you could do
> > some kind of a quick pruning without a WAL record. Like just modify the
> > ctid of the oldest dead tuple in the chain, or the redirecting line
> > pointer if there is one, to point to the latest live tuple, without
> > removing the dead tuples or the line pointers.
>
> I am wondering what you even mean by removing the dead tuples when
> pruning.  I thought only defragmentation removed tuples.
>
>
Pruning removes intermediate dead tuples by marking their line pointers
~LP_USED and redirecting the root line pointer to the first
live/recently_dead
tuple in the chain. OTOH defragmentation moves tuples around to repair
fragmentation. IOW defragmentation is nothing but calling
PageRepairFragmentation on the page.

For example, if we have a HOT chain of 5 tuples:

1 --> 2 --> 3 --> 4 --> 5

of which, 1, 2 and 3 are DEAD, 4 is RECENTLY_DEAD and 5 is LIVE

At the end of pruning:

- line pointer of 1 is redirected to 4
- line pointers of 2 and 3 are marked ~LP_USED
- the offset of 4 and 5 is unchanged

At the end of defragmentation:

- 4 and 5 are moved to the end of the page to create a larger
contiguous free space in the page.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > My guess is that once you prune a tuple
> > you no longer have to check its visibility, and that is where the win is
> > coming from.  
> 
> Yes, I think that's right.
> 
> Oh, one more thing occured to me. Without HOT, we not only mark index
> tuples pointing to dead tuples as killed, we remove them altogether if
> the index page gets full. If you modify the test case so that after
> doing the updates, you insert a bunch of tuples with a different key to
> fill the index page, you should see CVS HEAD winning HOT without pruning
> hands down.
> 
> > If we check a tuple in a chain and the tuple is dead is it possible the
> > pruning operation is cheaper than having to check the tuple again for
> > visibility the next time we see it?  If so, we can just always prune
> > when we see a dead tuple in the chain, which I believe was the original
> > design.  Pruning becomes an operation similar to marking an index entry
> > as dead.  (I assuming pruing does not generate WAL traffic.)
> 
> Pruning does generate a WAL record at the moment. Maybe you could do
> some kind of a quick pruning without a WAL record. Like just modify the
> ctid of the oldest dead tuple in the chain, or the redirecting line
> pointer if there is one, to point to the latest live tuple, without
> removing the dead tuples or the line pointers.

I am wondering what you even mean by removing the dead tuples when
pruning.  I thought only defragmentation removed tuples.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Pavan Deolasee wrote:
> > > Pruning does generate a WAL record at the moment. Maybe you could do
> > > some kind of a quick pruning without a WAL record. Like just modify the
> > > ctid of the oldest dead tuple in the chain, or the redirecting line
> > > pointer if there is one, to point to the latest live tuple, without
> > > removing the dead tuples or the line pointers.
> >
> > I am wondering what you even mean by removing the dead tuples when
> > pruning.  I thought only defragmentation removed tuples.
> >
> >
> Pruning removes intermediate dead tuples by marking their line pointers
> ~LP_USED and redirecting the root line pointer to the first
> live/recently_dead
> tuple in the chain. OTOH defragmentation moves tuples around to repair
> fragmentation. IOW defragmentation is nothing but calling
> PageRepairFragmentation on the page.
> 
> For example, if we have a HOT chain of 5 tuples:
> 
> 1 --> 2 --> 3 --> 4 --> 5
> 
> of which, 1, 2 and 3 are DEAD, 4 is RECENTLY_DEAD and 5 is LIVE
> 
> At the end of pruning:
> 
> - line pointer of 1 is redirected to 4
> - line pointers of 2 and 3 are marked ~LP_USED
> - the offset of 4 and 5 is unchanged
> 
> At the end of defragmentation:
> 
> - 4 and 5 are moved to the end of the page to create a larger
> contiguous free space in the page.

Right.  My point is that pruning only modifies item pointers.  It does
not change the actual heap tuples.  In the quote above, how is Heikki's
"quick pruning" different from the pruning you are describing?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Heikki Linnakangas
Bruce Momjian wrote:
> Pavan Deolasee wrote:
 Pruning does generate a WAL record at the moment. Maybe you could do
 some kind of a quick pruning without a WAL record. Like just modify the
 ctid of the oldest dead tuple in the chain, or the redirecting line
 pointer if there is one, to point to the latest live tuple, without
 removing the dead tuples or the line pointers.
>>> I am wondering what you even mean by removing the dead tuples when
>>> pruning.  I thought only defragmentation removed tuples.
>>>
>>>
>> Pruning removes intermediate dead tuples by marking their line pointers
>> ~LP_USED and redirecting the root line pointer to the first
>> live/recently_dead
>> tuple in the chain. OTOH defragmentation moves tuples around to repair
>> fragmentation. IOW defragmentation is nothing but calling
>> PageRepairFragmentation on the page.
>>
>> For example, if we have a HOT chain of 5 tuples:
>>
>> 1 --> 2 --> 3 --> 4 --> 5
>>
>> of which, 1, 2 and 3 are DEAD, 4 is RECENTLY_DEAD and 5 is LIVE
>>
>> At the end of pruning:
>>
>> - line pointer of 1 is redirected to 4
>> - line pointers of 2 and 3 are marked ~LP_USED
>> - the offset of 4 and 5 is unchanged
>>
>> At the end of defragmentation:
>>
>> - 4 and 5 are moved to the end of the page to create a larger
>> contiguous free space in the page.
> 
> Right.  My point is that pruning only modifies item pointers.  It does
> not change the actual heap tuples.  In the quote above, how is Heikki's
> "quick pruning" different from the pruning you are describing?

In "quick pruning", you don't mark the line pointer as not used.
Otherwise someone might reuse the line pointer for another tuple, which
is a problem if you then crash. WAL replay would see an insert to a line
pointer that's already in use (the quick pruning wouldn't be WAL
logged), which would raise an error.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/11/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>
>
>
> Right.  My point is that pruning only modifies item pointers.  It does
> not change the actual heap tuples.  In the quote above, how is Heikki's
> "quick pruning" different from the pruning you are describing?
>
>
>
I think the only difference is that the quick pruning does not mark
intermediate tuples ~LP_USED and hence we may avoid WAL logging.

Btw, I am not too enthusiastic about quick pruning because it leaves
behind dead heap-only tuples which are not reachable from any root
heap tuples. Not that we can't handle them, but I am worried about
making such changes right now unless we are sure about the benefits.
We can always tune and tweak in 8.4

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 20:46 +0100, Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Pavan Deolasee wrote:
> >> At the end of pruning:
> >>
> >> - line pointer of 1 is redirected to 4
> >> - line pointers of 2 and 3 are marked ~LP_USED
> >> - the offset of 4 and 5 is unchanged
> >>

> > In the quote above, how is Heikki's
> > "quick pruning" different from the pruning you are describing?

> In "quick pruning", you don't mark the line pointer as not used.
> Otherwise someone might reuse the line pointer for another tuple, which
> is a problem if you then crash. WAL replay would see an insert to a line
> pointer that's already in use (the quick pruning wouldn't be WAL
> logged), which would raise an error.

That seems like the best way forward to me. It's just a hint that says
where the first live tuple is in the chain.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Gregory Stark wrote:
> "Bruce Momjian" <[EMAIL PROTECTED]> writes:
> 
> > Looking at the patch I see:
> >
> > +   /*
> > +* Mark the page as clear of prunable tuples. If we find a tuple 
> > which
> > +* may become prunable, we shall set the hint again.
> > +*/
> > +   PageClearPrunable(page);
> >
> > I like the idea of the page hint bit, but my question is if there is a
> > long-running transaction, isn't each SELECT going to try to defragment a
> > page over and over again because there is still something prunable on
> > the page?
> 
> Well it'll try to prune the chains over and over again. If it doesn't find
> anything it won't defragment, but yes.
> 
> I think we could tackle that by storing on the page the oldest xmax which
> would allow us to prune a tuple. Then you could compare RecentGlobalXmin
> against that and not bother looking at any other chains if it hasn't been
> passed yet. 

Yea, that might work if we have space on the page.

> It would be hard to do that with single-chain pruning though, once you the
> limiting tuple you would then wouldn't know what the next limiting xmax is
> until the next time you do a full-page prune. Still that gets us down to at
> most two full-page prunes per update instead of a potentially unbounded number
> of prunes.
> 
> This seems like a further optimization to think about after we have a place to
> trigger the pruning where it'll do the most good.

I was thinking if we could only try defragementing when we are doing an
INSERT or UPDATE, then the defragment would either free enough space so
we could stay on the page or the new row would go on another page and we
would probably not return to that page again anytime soon.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Tue, 2007-09-11 at 01:22 +0530, Pavan Deolasee wrote:

> I think the only difference is that the quick pruning does not mark
> intermediate tuples ~LP_USED and hence we may avoid WAL logging.

Sounds great.

> Btw, I am not too enthusiastic about quick pruning because it leaves 
> behind dead heap-only tuples which are not reachable from any root
> heap tuples. Not that we can't handle them, but I am worried about
> making such changes right now unless we are sure about the benefits.
> We can always tune and tweak in 8.4

...but we don't want to reach them. They are waiting to be removed.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Gregory Stark

"Pavan Deolasee" <[EMAIL PROTECTED]> writes:

> On 9/11/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>
>> Right.  My point is that pruning only modifies item pointers.  It does
>> not change the actual heap tuples.  In the quote above, how is Heikki's
>> "quick pruning" different from the pruning you are describing?
>
> I think the only difference is that the quick pruning does not mark
> intermediate tuples ~LP_USED and hence we may avoid WAL logging.
>
> Btw, I am not too enthusiastic about quick pruning because it leaves
> behind dead heap-only tuples which are not reachable from any root
> heap tuples. Not that we can't handle them, but I am worried about
> making such changes right now unless we are sure about the benefits.
> We can always tune and tweak in 8.4

You could mark such tuples with LP_DELETE. That would also let other
transactions quickly tot up how much space would be available if they were to
run PageRepairFragmentation.

But if you don't WAL log setting LP_DELETE then you would still have to deal
with unreachable HOT tuples who lost their LP_DELETE flag. I suppose that as
long as PageRepairFragmentation first looks for any dead HOT tuples without
depending on LP_DELETE that would be enough.

I wonder if you could do the quick prune without even taking the exclusive
page lock? You're overwriting 16 bits and you know nobody else will be
modifying any of the line pointers in question to anything else. (because your
pin prevents the vacuum lock from coming in and trying to mark it unused).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> Pruning removes intermediate dead tuples by marking their line pointers
> ~LP_USED and redirecting the root line pointer to the first
> live/recently_dead tuple in the chain.

It seems utterly unsafe to do this without a vacuum-grade lock on the
page.  Someone might be examining the root tuple (eg, in a SnapshotAny
scan) and I think they are entitled to assume that the line pointer
stays valid while they are looking at the tuple.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2007-09-11 at 01:22 +0530, Pavan Deolasee wrote:
>> I think the only difference is that the quick pruning does not mark
>> intermediate tuples ~LP_USED and hence we may avoid WAL logging.

> Sounds great.

What it sounds is utterly unsafe.  You can get away with not WAL-logging
individual bit flips (that is, hint-bit-setting) because either state of
the page is valid.  If I read this proposal correctly it is to change
t_ctid without WAL-logging, which means that a partial page write (torn
page syndrome) could leave the page undetectably corrupted --- t_ctid
is 6 bytes and could easily cross a hardware sector boundary.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2007-09-11 at 01:22 +0530, Pavan Deolasee wrote:
> >> I think the only difference is that the quick pruning does not mark
> >> intermediate tuples ~LP_USED and hence we may avoid WAL logging.
> 
> > Sounds great.
> 
> What it sounds is utterly unsafe.  You can get away with not WAL-logging
> individual bit flips (that is, hint-bit-setting) because either state of
> the page is valid.  If I read this proposal correctly it is to change
> t_ctid without WAL-logging, which means that a partial page write (torn
> page syndrome) could leave the page undetectably corrupted --- t_ctid
> is 6 bytes and could easily cross a hardware sector boundary.

OK, we certainly want to avoid WAL writes for pruning a single chain if
possible.  One idea would be to just mark the item pointer as dead but
not repoint the first item pointer to point to the first live tuple.

Heikki, would that give us speedups similar to what you saw in your
earlier tests?  It would if the majority of the cost of scanning the
page looking for the first live tuple was in doing the visibility tests
on every tuple.

Basically, when we are walking the chain via an index and we see the
tuple is dead we mark it as dead but don't repoint.  We would repoint
and reclaim on a defragementation.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Pavan Deolasee
On 9/11/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
>
>
>
> You could mark such tuples with LP_DELETE. That would also let other
> transactions quickly tot up how much space would be available if they were
> to
> run PageRepairFragmentation.
>
>
>
>

IMHO we are making full circles here. We have already tried LP_DELETE
techniques
and moved away to simplify things. We also tried reusing dead space without
running PageRepairFragmentation. Each of these techniques worked just fine
with slightly different performance characteristics. What we now have is a
simplified algorithm which is much easier to follow and is safer, yet giving
us a very good performance boost. I am not sure if this is the right time to
throw new ideas because we would never be sure as what we are doing
would be the most optimal solution. Would it help if we go with some
solution right now, get rest of the review process done and then use
the feedback during beta testing to tune things ? We may have far more
data points at that time to choose one technique over other. And we
would also know what areas to focus on.

I am also worried that by focusing too much on this issue we may
overlook some other correctness issue in the patch.

>From whatever we have discussed so far, IMHO we should do the following
things and let rest of the review process proceed

- Defragment a page only when the free space left in the page is not
enough to accommodate even a single tuple (use average tuple length for
this decision). This would mean we might be defragmenting even though
there is no immediate UPDATE to the page. But we can treat this as
fillfactor which allows us to provision for the next UPDATE coming to
the page. Since we are defragmenting when the page is almost full
hopefully we would reclaim good amount of space in the page and
won't call defragmentation for next few UPDATEs.

We already have mechanism to track average tuple request size in
relcache. May be we can have some relcache invalidation to keep the
information in sync (send invalidation when the average request size
changes by say 5%)

- Avoid pruning chains in every index or seq lookup. But if the chain
becomes longer than X tuples, mark the page to be pruned in the
next lookup. We can choose to separate prune and defragmentation
and only do pruning in this case. But I would prefer to keep them
together for now.

- Track the minimum xmin in the page header to avoid repeated
(wasted) attempts to prune a Prunable page in the presence of long running
transactions.

We can save rest of the techniques for beta testing period or 8.4.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Bruce Momjian
Pavan Deolasee wrote:
> IMHO we are making full circles here. We have already tried LP_DELETE
> techniques
> and moved away to simplify things. We also tried reusing dead space without
> running PageRepairFragmentation. Each of these techniques worked just fine
> with slightly different performance characteristics. What we now have is a
> simplified algorithm which is much easier to follow and is safer, yet giving
> us a very good performance boost. I am not sure if this is the right time to
> throw new ideas because we would never be sure as what we are doing
> would be the most optimal solution. Would it help if we go with some
> solution right now, get rest of the review process done and then use
> the feedback during beta testing to tune things ? We may have far more
> data points at that time to choose one technique over other. And we
> would also know what areas to focus on.

I totally understand.  I was just throwing out ideas to make sure I
fully understood it and to explore various options.  I do agree we
should just wait for the review, knowing we have these trade-offs.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 22:24 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2007-09-11 at 01:22 +0530, Pavan Deolasee wrote:
> >> I think the only difference is that the quick pruning does not mark
> >> intermediate tuples ~LP_USED and hence we may avoid WAL logging.
> 
> > Sounds great.
> 
> What it sounds is utterly unsafe.  You can get away with not WAL-logging
> individual bit flips (that is, hint-bit-setting) because either state of
> the page is valid.  If I read this proposal correctly it is to change
> t_ctid without WAL-logging, which means that a partial page write (torn
> page syndrome) could leave the page undetectably corrupted --- t_ctid
> is 6 bytes and could easily cross a hardware sector boundary.

We can calculate where they are, if thats the objection.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-10 Thread Simon Riggs
On Tue, 2007-09-11 at 09:21 +0530, Pavan Deolasee wrote:

> IMHO we are making full circles here. We have already tried LP_DELETE
> techniques and moved away to simplify things. 

> We can save rest of the techniques for beta testing period or 8.4.

Agreed, we're just fine tuning by reinventing old ideas. HOT works, but
everybody has their own opinion of which subtle factors are important.
Seems like time to find out in Beta.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> What it sounds is utterly unsafe.  You can get away with not WAL-logging
> individual bit flips (that is, hint-bit-setting) because either state of
> the page is valid.  If I read this proposal correctly it is to change
> t_ctid without WAL-logging, which means that a partial page write (torn
> page syndrome) could leave the page undetectably corrupted --- t_ctid
> is 6 bytes and could easily cross a hardware sector boundary.

Well we would never be overwriting the blockid, only the posid which is 2
bytes. And the ctid (and posid) should always be 4-byte aligned. So actually
it would never cross a hardware sector boundary.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Pavan Deolasee
On 9/11/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
>
>
> We're only changing the offsetnumber part of it, which is 2 bytes. That
> shouldn't cross a hardware sector boundary on any reasonable hardware.
>
>
Not entirely true if we consider line pointer redirection, which involves
changing 4 bytes. You can argue that for quick pruning we don't do any
line pointer redirection though.

Also my worry about dealing with unreachable heap-only dead tuples
remain. We may need to do far more work to identify that they are not part
of
any reachable HOT chain and hence can be removed.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Heikki Linnakangas
Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> On Tue, 2007-09-11 at 01:22 +0530, Pavan Deolasee wrote:
>>> I think the only difference is that the quick pruning does not mark
>>> intermediate tuples ~LP_USED and hence we may avoid WAL logging.
> 
>> Sounds great.
> 
> What it sounds is utterly unsafe.  You can get away with not WAL-logging
> individual bit flips (that is, hint-bit-setting) because either state of
> the page is valid.  If I read this proposal correctly it is to change
> t_ctid without WAL-logging, which means that a partial page write (torn
> page syndrome) could leave the page undetectably corrupted --- t_ctid
> is 6 bytes and could easily cross a hardware sector boundary.

We're only changing the offsetnumber part of it, which is 2 bytes. That
shouldn't cross a hardware sector boundary on any reasonable hardware.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Pavan Deolasee
On 9/11/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > Pruning removes intermediate dead tuples by marking their line pointers
> > ~LP_USED and redirecting the root line pointer to the first
> > live/recently_dead tuple in the chain.
>
> It seems utterly unsafe to do this without a vacuum-grade lock on the
> page.  Someone might be examining the root tuple (eg, in a SnapshotAny
> scan) and I think they are entitled to assume that the line pointer
> stays valid while they are looking at the tuple.
>
>
As the patch stands, we prune holding a vacuum-grade lock. But ISTM
that there are only limited users of SnapshotAny and we make them
recheck the line pointer after acquiring the buffer lock. So hopefully we
should be fine.

Anyways, its not a concern right now because we hold vacuum-grade lock while
pruning. But if we decide to what Heikki/Greg suggested about quick
pruning then we will have to make sure that your concern is addressed.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Pavan Deolasee
On 9/11/07, Pavan Deolasee <[EMAIL PROTECTED]> wrote:
>
>
> - Track the minimum xmin in the page header to avoid repeated
> (wasted) attempts to prune a Prunable page in the presence of long running
> transactions.
>
>
I would actually think twice before even doing this because this would lead
to
complete change in heap page structure and stop people from upgrading
to 8.3 without a complete dump/restore. I don't remember 8.3 introduces any
other
significant change which already enforces that.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> I would actually think twice before even doing this because this would
> lead to complete change in heap page structure and stop people from
> upgrading to 8.3 without a complete dump/restore. I don't remember 8.3
> introduces any other significant change which already enforces that.

Then you don't remember far enough --- either the HeapTupleHeader change
or the varvarlena change would be enough to prevent that.  For that
matter, the pd_flags header field that the patch is already relying on
is new for 8.3.

Adding an xmin field might or might not be a good solution, but to
complain about it on compatibility grounds is silly.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Pavan Deolasee
On 9/11/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > I would actually think twice before even doing this because this would
> > lead to complete change in heap page structure and stop people from
> > upgrading to 8.3 without a complete dump/restore. I don't remember 8.3
> > introduces any other significant change which already enforces that.
>
> Then you don't remember far enough --- either the HeapTupleHeader change
> or the varvarlena change would be enough to prevent that.  For that
> matter, the pd_flags header field that the patch is already relying on
> is new for 8.3.
>
>
Oops! I forgot combo command id. That went into last major release of EDB
and that confused me. Regarding pd_flags, I thought somebody can just
fix that in-place and avoid dump/restore (of course, its too invasive, but
isn't it feasible with pg_migrator ?)

Anyways, given this, should we go for storing xmin in the page header ?
For that matter should there be a separate heap page header ?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Heikki Linnakangas
Tom Lane wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
>> I would actually think twice before even doing this because this would
>> lead to complete change in heap page structure and stop people from
>> upgrading to 8.3 without a complete dump/restore. I don't remember 8.3
>> introduces any other significant change which already enforces that.
> 
> Then you don't remember far enough --- either the HeapTupleHeader change
> or the varvarlena change would be enough to prevent that.  For that
> matter, the pd_flags header field that the patch is already relying on
> is new for 8.3.

Yeah, those changes will need to be dealt with in the conversion. But
none of the changes this far have increased the on-disk size. Adding a
new field to page header means that in some corner cases it might not be
possible to convert a page from old format to the new one because the
data no longer fits.

But let's not hang ourselves on that. I'm sure there's a way to deal
with the page format conversion if we have to. The crucial design
decision for HOT is when to prune and when to defragment the page, so
that when we're doing the UPDATE, there's room in the page.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Then you don't remember far enough --- either the HeapTupleHeader change
>> or the varvarlena change would be enough to prevent that.  For that
>> matter, the pd_flags header field that the patch is already relying on
>> is new for 8.3.

> Yeah, those changes will need to be dealt with in the conversion. But
> none of the changes this far have increased the on-disk size. Adding a
> new field to page header means that in some corner cases it might not be
> possible to convert a page from old format to the new one because the
> data no longer fits.

Counting on my fingers, it seems that the 4-byte savings in
HeapTupleHeader size would guarantee that adding 4 bytes to the page
header wouldn't kill you.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Heikki Linnakangas
Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Then you don't remember far enough --- either the HeapTupleHeader change
>>> or the varvarlena change would be enough to prevent that.  For that
>>> matter, the pd_flags header field that the patch is already relying on
>>> is new for 8.3.
> 
>> Yeah, those changes will need to be dealt with in the conversion. But
>> none of the changes this far have increased the on-disk size. Adding a
>> new field to page header means that in some corner cases it might not be
>> possible to convert a page from old format to the new one because the
>> data no longer fits.
> 
> Counting on my fingers, it seems that the 4-byte savings in
> HeapTupleHeader size would guarantee that adding 4 bytes to the page
> header wouldn't kill you.

It depends on the alignment of the data and null-bitmap. For example:
- a platform with 8 bytes alignment
- the table has 9 columns
- every tuple on the page has a null bitmap
- there is zero bytes left on the page
- no tuples where varvarlen frees up space

The header size after alignment is MAXALIGN(23+2) = 32. Which is the
same as before combo cid, MAXALIGN(27+2) = 32. It's a corner-case, but
it's possible.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2007-09-11 at 09:21 +0530, Pavan Deolasee wrote:
>> IMHO we are making full circles here. We have already tried LP_DELETE
>> techniques and moved away to simplify things. 

>> We can save rest of the techniques for beta testing period or 8.4.

> Agreed, we're just fine tuning by reinventing old ideas. HOT works, but
> everybody has their own opinion of which subtle factors are important.
> Seems like time to find out in Beta.

I am worried by this idea that it's fine to make major changes in HOT
after beta starts, or that the attentions of beta testers will allow us
to collect data on variant implementations.  Neither is the case.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT patch - version 15

2007-09-11 Thread Simon Riggs
On Tue, 2007-09-11 at 16:10 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Tue, 2007-09-11 at 09:21 +0530, Pavan Deolasee wrote:
> >> IMHO we are making full circles here. We have already tried LP_DELETE
> >> techniques and moved away to simplify things. 
> 
> >> We can save rest of the techniques for beta testing period or 8.4.
> 
> > Agreed, we're just fine tuning by reinventing old ideas. HOT works, but
> > everybody has their own opinion of which subtle factors are important.
> > Seems like time to find out in Beta.
> 
> I am worried by this idea that it's fine to make major changes in HOT
> after beta starts, or that the attentions of beta testers will allow us
> to collect data on variant implementations.  Neither is the case.

Understood.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] HOT patch - version 15

2007-09-12 Thread Pavan Deolasee
On 9/11/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
>
>
> The crucial design
> decision for HOT is when to prune and when to defragment the page, so
> that when we're doing the UPDATE, there's room in the page.
>
>
Right.

For defragmentation, I am still inclined towards doing it when we see
that the free space is less than (or slightly more than) the average tuple
size of the table - unless we have a better solution.

For pruning, we can do the quick pruning. But I won't feel comfortable
doing it without an exclusive lock on the buffer. Also I would avoid
line pointer redirection during quick prune. A simpler solution would
be to flag the page whenever HOT chain becomes longer than, say
5, (which can easily be detected in heap_hot_fetch) and prune it in
the next lookup. Hopefully we would only rarely have long HOT chains
and if at all we have them, we will have some mechanism to recover
from it.

Any other ideas ?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


  1   2   >