On Thu, Aug 4, 2016 at 04:29:09PM +0530, Pavan Deolasee wrote:
> Write Amplification Reduction Method (WARM)
> ====================================
>
> A few years back, we developed HOT to address the problem associated with MVCC
> and frequent updates and it has served us very well. But in the light of Uber's
> recent technical blog highlighting some of the problems that are still
> remaining, especially for workloads where HOT does not help to the full extent,
> Simon, myself and others at 2ndQuadrant discussed several ideas and what I
> present below is an outcome of that. This is not to take away credit from
> anybody else. Others may have thought about similar ideas, but I haven’t seen
> any concrete proposal so far.

HOT was a huge win for Postgres and I am glad you are reviewing
improvements.

> This method succeeds in reducing the write amplification, but causes other
> issues which also need to be solved. WARM breaks the invariant that all tuples
> in a HOT chain have the same index values and so an IndexScan would need to
> re-check the index scan conditions against the visible tuple returned from
> heap_hot_search(). We must still check visibility, so we only need to re-check
> scan conditions on that one tuple version.
>
> We don’t want to force a recheck for every index fetch because that will slow
> everything down. So we would like a simple and efficient way of knowing about
> the existence of a WARM tuple in a chain and do a recheck in only those cases,
> ideally O(1). Having a HOT chain contain a WARM tuple is discussed below as
> being a “WARM chain”, implying it needs re-check.

In summary, we are already doing visibility checks on the HOT chain, so
the recheck of whether the heap tuple matches the index value is done on
at most the one visible tuple in the chain, not on every tuple in the
chain.

> 2. Mark the root line pointer (or the root tuple) with a special
> HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in the
> chain. Since all indexes point to the root line pointer, it should be enough to
> just mark the root line pointer (or tuple) with this flag.

Yes, I think #2 is the easiest. Also, if we modify the index page, we
would have to WAL the change and possibly WAL log the full page write
of the index page. :-(

> Approach 2 seems more reasonable and simple.
>
> There are only 2 bits for lp_flags and all combinations are already used. But
> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> special flag, which is not used for LP_REDIRECT line pointers. So we are able
> to mark the root line pointer.

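To make sure I am reading the lp_len idea correctly, here is a minimal
sketch of it. It is not taken from any patch. SketchItemId is a
simplified stand-in for the real line pointer struct, and
SKETCH_LP_RECHECK_REQUIRED and the sketch_* helpers are hypothetical
names I made up for illustration. It also assumes lp_len is zero on a
plain redirect, which is exactly what my question below is about.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a heap line pointer: 15 + 2 + 15 bits. */
typedef struct
{
    unsigned    lp_off:15;      /* tuple offset, or redirect target */
    unsigned    lp_flags:2;     /* state; all four values are taken */
    unsigned    lp_len:15;      /* tuple length; free for LP_REDIRECT */
} SketchItemId;

#define LP_UNUSED       0
#define LP_NORMAL       1
#define LP_REDIRECT     2
#define LP_DEAD         3

/* Hypothetical marker stored in lp_len of a redirect line pointer. */
#define SKETCH_LP_RECHECK_REQUIRED  1

/* Turn a root line pointer into a redirect (assumes lp_len starts at 0). */
static void
sketch_set_redirect(SketchItemId *lp, unsigned target_offnum)
{
    lp->lp_flags = LP_REDIRECT;
    lp->lp_off = target_offnum;
    lp->lp_len = 0;
}

/* Mark a redirect root as heading a WARM chain. */
static void
sketch_mark_warm(SketchItemId *lp)
{
    assert(lp->lp_flags == LP_REDIRECT);
    lp->lp_len = SKETCH_LP_RECHECK_REQUIRED;
}

/* O(1) test an index scan could make before rechecking scan conditions. */
static bool
sketch_recheck_required(const SketchItemId *lp)
{
    return lp->lp_flags == LP_REDIRECT &&
           lp->lp_len == SKETCH_LP_RECHECK_REQUIRED;
}
```

Note that this only covers a root that is already LP_REDIRECT. A root
that is still LP_NORMAL would need the flag on the tuple itself.
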
Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.

Also, what is currently in the lp_len field for LP_REDIRECT? Zeros, or
random data? I am asking for pg_upgrade purposes.

> One idea is to somehow do this as part of the vacuum where we collect root line
> pointers of WARM chains during the first phase of heap scan, check the indexes
> for all such tuples (may be combined with index vacuum) and then clear the heap
> flags during the second pass, unless new tuples are added to the WARM chain. We
> can detect that by checking that all tuples in the WARM chain still have XID
> less than the OldestXmin that VACUUM is using.

Yes, it seems natural to clear the ctid HEAP_RECHECK_REQUIRED flag where
you are adjusting the HOT chain anyway.

> It’s important to ensure that the flag is set when it is absolutely necessary,
> while having false positives is not a problem. We might do a little wasteful
> work if the flag is incorrectly set. Since flag will be set only during
> heap_update() and the operation is already WAL logged, this can be piggybacked
> with the heap_update WAL record. Similarly, when a root tuple is pruned to a
> redirect line pointer, the operation is already WAL logged and we can piggyback
> setting of line pointer flag with that WAL record.
>
> Flag clearing need not be WAL logged, unless we can piggyback that to some
> existing WAL logging.

Agreed, good point.

Very nice!

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)