On Thu, Aug  4, 2016 at 04:29:09PM +0530, Pavan Deolasee wrote:
> Write Amplification Reduction Method (WARM)
> ====================================
> 
> A few years back, we developed HOT to address the problem associated with MVCC
> and frequent updates and it has served us very well. But in the light of 
> Uber's
> recent technical blog highlighting some of the problems that are still
> remaining, especially for workloads where HOT does not help to the full 
> extent,
> Simon, myself and others at 2ndQuadrant discussed several ideas and what I
> present below is an outcome of that. This is not to take away credit from
> anybody else. Others may have thought about similar ideas, but I haven’t seen
> any concrete proposal so far.

HOT was a huge win for Postgres and I am glad you are reviewing
improvements.

> This method succeeds in reducing the write amplification, but causes other
> issues which also need to be solved. WARM breaks the invariant that all tuples
> in a HOT chain have the same index values and so an IndexScan would need to
> re-check the index scan conditions against the visible tuple returned from
> heap_hot_search(). We must still check visibility, so we only need to re-check
> scan conditions on that one tuple version.
>   
> We don’t want to force a recheck for every index fetch because that will slow
> everything down. So we would like a simple and efficient way of knowing about
> the existence of a WARM tuple in a chain and do a recheck in only those cases,
> ideally O(1). Having a HOT chain contain a WARM tuple is discussed below as
> being a “WARM chain”, implying it needs re-check.

In summary, we are already doing visibility checks on the HOT chain, so
a recheck if the heap tuple matches the index value is only done at most
on the one visible tuple in the chain, not ever tuple in the chain.

> 2. Mark the root line pointer (or the root tuple) with a special
> HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in 
> the
> chain. Since all indexes point to the root line pointer, it should be enough 
> to
> just mark the root line pointer (or tuple) with this flag. 

Yes, I think #2 is the easiest.  Also, if we modify the index page, we
would have to WAL the change and possibly WAL log the full page write
of the index page.  :-(

> Approach 2 seems more reasonable and simple. 
> 
> There are only 2 bits for lp_flags and all combinations are already used. But
> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> special flag, which is not used for LP_REDIRECT line pointers. So we are able
> to mark the root line pointer.

Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.

Also, what is currently in the lp_len field for LP_REDIRECT?  Zeros, or
random data?  I am asking for pg_upgrade purposes.

> One idea is to somehow do this as part of the vacuum where we collect root 
> line
> pointers of  WARM chains during the first phase of heap scan, check the 
> indexes
> for all such tuples (may be combined with index vacuum) and then clear the 
> heap
> flags during the second pass, unless new tuples are added to the WARM chain. 
> We
> can detect that by checking that all tuples in the WARM chain still have XID
> less than the OldestXmin that VACUUM is using.

Yes, it seems natural to clear the ctid HEAP_RECHECK_REQUIRED flag where
you are adjusting the HOT chain anyway.

> It’s important to ensure that the flag is set when it is absolutely necessary,
> while having false positives is not a problem. We might do a little wasteful
> work if the flag is incorrectly set. Since flag will be set only during
> heap_update() and the operation is already WAL logged, this can be piggybacked
> with the heap_update WAL record. Similarly, when a root tuple is pruned to a
> redirect line pointer, the operation is already WAL logged and we can 
> piggyback
> setting of line pointer flag with that WAL record.
> 
> Flag clearing need not be WAL logged, unless we can piggyback that to some
> existing WAL logging.

Agreed, good point.

Very nice!

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to