On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
> On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> On 4/6/15 5:18 PM, Greg Stark wrote:
>>>
>>> Only I would suggest thinking of it in terms of two orthogonal boolean
>>> flags rather than three states. It's easier to reason about whether a
>>> table has a specific property than trying to control a state machine in
>>> a predefined pathway.
>>>
>>> So I would say the two flags are:
>>> READONLY: guarantees nothing can be dirtied
>>> ALLFROZEN: guarantees no unfrozen tuples are present
>>>
>>> In practice you can't have the latter without the former, since vacuum
>>> can't know everything is frozen unless it knows nobody is inserting. But
>>> perhaps there will be cases in the future where that's not true.
>>
>>
>> I'm not so sure about that. There's a logical state progression here (see
>> below). ISTM it's easier to just enforce that in one place instead of a
>> bunch of places having to check multiple conditions. But, I'm not wed to a
>> single field.
>>
>>> Incidentally there are a number of other optimisations that I've had in
>>> mind that are only possible on frozen read-only tables:
>>>
>>> 1) Compression: compress the pages and pack them one after the other.
>>> Build a new fork with offsets for each page.
>>>
>>> 2) Automatic partition elimination where the statistics track the
>>> minimum and maximum value per partition (and number of tuples) and treat
>>> them as implicit constraints. In particular it would magically make
>>> read-only empty parent partitions be excluded regardless of the where clause.
>>
>>
>> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
>> uncompact and re-compact for #1 when you actually freeze (which maybe isn't
>> worth it), but freezing isn't absolutely required. #2 would only require
>> that everything in the relation is visible; not frozen.
>>
>> I think there's value here to having an ALLVISIBLE state as well as
>> ALLFROZEN.
>>
>
> Based on many suggestions, I'm going to deal with the FM first as one
> patch. The first patch would be a simple mechanism similar to the VM:
> - Each bit of the FM represents a single page
> - The bit is set only by vacuum
> - The bit is unset by inserting, updating, and deleting
>
> Second, I'll deal with a simple read-only table with two states,
> Read/Write (default) and ReadOnly, as one patch. ISTM that having the
> Frozen state needs more discussion. A read-only table simply disallows
> any updates to the table, controlled by a read-only flag in pg_class.
> The DDL commands that change this status would be something like ALTER
> TABLE SET READ ONLY, or READ WRITE.
> Also, as Alvaro suggested, a read-only table benefits not only table
> freezing but also performance optimizations. I'll consider including
> them when I deal with the read-only table.
>

The attached WIP patch adds a Frozen Map, which enables us to avoid
scanning the whole table even when a full-table vacuum is required to
prevent XID wraparound failures.

The Frozen Map is a bitmap with one bit per heap page, quite similar
to the Visibility Map. A set bit means that all tuples on the heap
page are completely frozen, so we don't need to freeze that page
again.
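For illustration, the mapping from a heap block to its map page, byte,
and bit uses the same arithmetic as the Visibility Map; here is a
minimal standalone sketch (assuming the default 8 kB block size and a
24-byte MAXALIGN'd page header, mirroring the macros in frozenmap.c
below):

#include <stdio.h>

/* Mirrors the HEAPBLK_TO_* macros added in frozenmap.c; BLCKSZ = 8192
 * and a 24-byte page header are assumptions made for this sketch. */
#define MAPSIZE              (8192 - 24)
#define HEAPBLOCKS_PER_BYTE  8
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

int
main(void)
{
	unsigned int heapBlk = 100000;	/* an arbitrary heap block number */

	printf("heap block %u -> map block %u, byte %u, bit %u\n",
		   heapBlk,
		   heapBlk / HEAPBLOCKS_PER_PAGE,
		   (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE,
		   heapBlk % HEAPBLOCKS_PER_BYTE);
	return 0;
}

With those numbers one map page covers 65,344 heap pages, i.e. roughly
510 MB of heap, so the fork stays tiny.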
A bit is set when vacuum (or autovacuum) finds that all tuples on the
corresponding heap page are completely frozen, and a bit is cleared
when an INSERT or UPDATE (on the new heap page) is executed.
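To make the intended interaction with vacuum concrete, here is a rough
sketch (not the patch itself; locking, pruning and error handling are
omitted, and variables such as onerel, nblocks, scan_all and
vacrelstats are assumed from lazy_scan_heap) of how an anti-wraparound
scan could consult the map with the frozenmap_* functions added below:

	Buffer		fmbuffer = InvalidBuffer;
	BlockNumber	blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		/* Skip pages whose frozen map bit says every tuple is frozen. */
		if (scan_all && frozenmap_test(onerel, blkno, &fmbuffer))
		{
			vacrelstats->fmskipped_pages++;
			continue;
		}

		/* ... otherwise read the page and freeze tuples as usual ... */
	}

	if (BufferIsValid(fmbuffer))
		ReleaseBuffer(fmbuffer);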

The current patch adds a new source file, src/backend/access/heap/frozenmap.c,
which is quite similar to visibilitymap.c. They contain similar code but
are kept separate for now; I can refactor them into something like a
common bitmap.c if needed.
Also, when skipping vacuum using the visibility map we only skip runs
of at least SKIP_PAGES_THRESHOLD consecutive pages, but there is no
such mechanism for the frozen map yet.
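For reference, the visibility map skipping in today's lazy_scan_heap
looks roughly like this (a simplified sketch of the existing
vacuumlazy.c logic, not part of this patch); something similar could
later be added for the frozen map:

	/*
	 * Look ahead for the next block whose VM bit is clear, and only skip
	 * the all-visible run if it is longer than SKIP_PAGES_THRESHOLD, so we
	 * keep OS readahead and usually still get to advance relfrozenxid.
	 */
	if (blkno == next_not_all_visible_block)
	{
		for (next_not_all_visible_block++;
			 next_not_all_visible_block < nblocks;
			 next_not_all_visible_block++)
		{
			if (!visibilitymap_test(onerel, next_not_all_visible_block,
									&vmbuffer))
				break;
		}
		skipping_all_visible_blocks =
			(next_not_all_visible_block - blkno > SKIP_PAGES_THRESHOLD);
	}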

Please give me feedback.

Regards,

-------
Sawada Masahiko
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..53f07fd 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o frozenmap.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/frozenmap.c b/src/backend/access/heap/frozenmap.c
new file mode 100644
index 0000000..6e64cb8
--- /dev/null
+++ b/src/backend/access/heap/frozenmap.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * frozenmap.c
+ *	  bitmap for tracking frozen heap tuples
+ *
+ * Portions Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/heap/frozenmap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/frozenmap.h"
+#include "access/heapam_xlog.h"
+#include "access/xlog.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/smgr.h"
+#include "utils/inval.h"
+
+
+/* #define TRACE_FROZENMAP */
+
+/*
+ * Size of the bitmap on each frozen map page, in bytes. There's no
+ * extra headers, so the whole page minus the standard page header is
+ * used for the bitmap.
+ */
+#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
+
+/* Number of bits allocated for each heap block. */
+#define BITS_PER_HEAPBLOCK 1
+
+/* Number of heap blocks we can represent in one byte. */
+#define HEAPBLOCKS_PER_BYTE 8
+
+/* Number of heap blocks we can represent in one frozen map page. */
+#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)
+
+/* Mapping from heap block number to the right bit in the frozen map */
+#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
+#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
+#define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
+
+/* table for fast counting of set bits */
+static const uint8 number_of_ones[256] = {
+	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
+	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+	1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+	2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+	3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+	4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
+};
+
+/* prototypes for internal routines */
+static Buffer fm_readbuf(Relation rel, BlockNumber blkno, bool extend);
+static void fm_extend(Relation rel, BlockNumber nfmblocks);
+
+
+/*
+ *	frozenmap_clear - clear a bit in frozen map
+ *
+ * This function has the same logic as visibilitymap_clear.
+ * You must pass a buffer containing the correct map page to this function.
+ * Call frozenmap_pin first to pin the right one. This function doesn't do
+ * any I/O.
+ */
+void
+frozenmap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	int			mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	int			mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+	uint8		mask = 1 << mapBit;
+	char	   *map;
+
+#ifdef TRACE_FROZENMAP
+	elog(DEBUG1, "fm_clear %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	if (!BufferIsValid(buf) || BufferGetBlockNumber(buf) != mapBlock)
+		elog(ERROR, "wrong buffer passed to frozenmap_clear");
+
+	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+	map = PageGetContents(BufferGetPage(buf));
+
+	if (map[mapByte] & mask)
+	{
+		map[mapByte] &= ~mask;
+
+		MarkBufferDirty(buf);
+	}
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+}
+
+/*
+ *	frozenmap_pin - pin a map page for setting a bit
+ *
+ * This function has the same logic as visibilitymap_pin.
+ * Setting a bit in the frozen map is a two-phase operation. First, call
+ * frozenmap_pin, to pin the frozen map page containing the bit for
+ * the heap page. Because that can require I/O to read the map page, you
+ * shouldn't hold a lock on the heap page while doing that. Then, call
+ * frozenmap_set to actually set the bit.
+ *
+ * On entry, *buf should be InvalidBuffer or a valid buffer returned by
+ * an earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation. On return, *buf is a valid buffer with the map page containing
+ * the bit for heapBlk.
+ *
+ * If the page doesn't exist in the map file yet, it is extended.
+ */
+void
+frozenmap_pin(Relation rel, BlockNumber heapBlk, Buffer *buf)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+
+	/* Reuse the old pinned buffer if possible */
+	if (BufferIsValid(*buf))
+	{
+		if (BufferGetBlockNumber(*buf) == mapBlock)
+			return;
+
+		ReleaseBuffer(*buf);
+	}
+	*buf = fm_readbuf(rel, mapBlock, true);
+}
+
+/*
+ *	frozenmap_pin_ok - do we already have the correct page pinned?
+ *
+ * On entry, buf should be InvalidBuffer or a valid buffer returned by
+ * an earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation.  The return value indicates whether the buffer covers the
+ * given heapBlk.
+ */
+bool
+frozenmap_pin_ok(BlockNumber heapBlk, Buffer buf)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+
+	return BufferIsValid(buf) && BufferGetBlockNumber(buf) == mapBlock;
+}
+
+/*
+ *	frozenmap_set - set a bit on a previously pinned page
+ *
+ * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
+ * or InvalidXLogRecPtr in normal running.  The page LSN is advanced to the
+ * one provided; in normal running, we generate a new XLOG record and set the
+ * page LSN to that value.  cutoff_xid is the largest xmin on the page being
+ * marked all-frozen; it is needed for Hot Standby, and can be
+ * InvalidTransactionId if the page contains no tuples.
+ *
+ * Caller is expected to set the heap page's PD_ALL_FROZEN bit before calling
+ * this function. Except in recovery, caller should also pass the heap
+ * buffer. When checksums are enabled and we're not in recovery, we must add
+ * the heap buffer to the WAL chain to protect it from being torn.
+ *
+ * You must pass a buffer containing the correct map page to this function.
+ * Call frozenmap_pin first to pin the right one. This function doesn't do
+ * any I/O.
+ */
+void
+frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+				  XLogRecPtr recptr, Buffer fmBuf)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+	Page		page;
+	char	   *map;
+
+#ifdef TRACE_FROZENMAP
+	elog(DEBUG1, "fm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+	Assert(InRecovery || BufferIsValid(heapBuf));
+
+	/* Check that we have the right heap page pinned, if present */
+	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
+		elog(ERROR, "wrong heap buffer passed to frozenmap_set");
+
+	/* Check that we have the right FM page pinned */
+	if (!BufferIsValid(fmBuf) || BufferGetBlockNumber(fmBuf) != mapBlock)
+		elog(ERROR, "wrong FM buffer passed to frozenmap_set");
+
+	page = BufferGetPage(fmBuf);
+	map = PageGetContents(page);
+	LockBuffer(fmBuf, BUFFER_LOCK_EXCLUSIVE);
+
+	if (!(map[mapByte] & (1 << mapBit)))
+	{
+		START_CRIT_SECTION();
+
+		map[mapByte] |= (1 << mapBit);
+		MarkBufferDirty(fmBuf);
+
+		if (RelationNeedsWAL(rel))
+		{
+			if (XLogRecPtrIsInvalid(recptr))
+			{
+				Assert(!InRecovery);
+				recptr = log_heap_frozenmap(rel->rd_node, heapBuf, fmBuf);
+
+				/*
+				 * If data checksums are enabled (or wal_log_hints=on), we
+				 * need to protect the heap page from being torn.
+				 */
+				if (XLogHintBitIsNeeded())
+				{
+					Page		heapPage = BufferGetPage(heapBuf);
+
+					/* caller is expected to set PD_ALL_FROZEN first */
+					Assert(PageIsAllFrozen(heapPage));
+					PageSetLSN(heapPage, recptr);
+				}
+			}
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+	}
+
+	LockBuffer(fmBuf, BUFFER_LOCK_UNLOCK);
+}
+
+/*
+ *	frozenmap_test - test if a bit is set
+ *
+ * Are all tuples on heapBlk completely frozen, according to the frozen map?
+ *
+ * On entry, *buf should be InvalidBuffer or a valid buffer returned by an
+ * earlier call to frozenmap_pin or frozenmap_test on the same
+ * relation. On return, *buf is a valid buffer with the map page containing
+ * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
+ * releasing *buf after it's done testing and setting bits.
+ *
+ * NOTE: This function is typically called without a lock on the heap page,
+ * so somebody else could change the bit just after we look at it.  In fact,
+ * since we don't lock the frozen map page either, it's even possible that
+ * someone else could have changed the bit just before we look at it, but yet
+ * we might see the old value.  It is the caller's responsibility to deal with
+ * all concurrency issues!
+ */
+bool
+frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *buf)
+{
+	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+	uint8		mapBit = HEAPBLK_TO_MAPBIT(heapBlk);
+	bool		result;
+	char	   *map;
+
+#ifdef TRACE_FROZENMAP
+	elog(DEBUG1, "fm_test %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+	/* Reuse the old pinned buffer if possible */
+	if (BufferIsValid(*buf))
+	{
+		if (BufferGetBlockNumber(*buf) != mapBlock)
+		{
+			ReleaseBuffer(*buf);
+			*buf = InvalidBuffer;
+		}
+	}
+
+	if (!BufferIsValid(*buf))
+	{
+		*buf = fm_readbuf(rel, mapBlock, false);
+		if (!BufferIsValid(*buf))
+			return false;
+	}
+
+	map = PageGetContents(BufferGetPage(*buf));
+
+	/*
+	 * A single-bit read is atomic.  There could be memory-ordering effects
+	 * here, but for performance reasons we make it the caller's job to worry
+	 * about that.
+	 */
+	result = (map[mapByte] & (1 << mapBit)) ? true : false;
+
+	return result;
+}
+
+/*
+ *	frozenmap_count  - count number of bits set in frozen map
+ *
+ * Note: we ignore the possibility of race conditions when the table is being
+ * extended concurrently with the call.  New pages added to the table aren't
+ * going to be marked all-frozen, so they won't affect the result.
+ */
+BlockNumber
+frozenmap_count(Relation rel)
+{
+	BlockNumber result = 0;
+	BlockNumber mapBlock;
+
+	for (mapBlock = 0;; mapBlock++)
+	{
+		Buffer		mapBuffer;
+		unsigned char *map;
+		int			i;
+
+		/*
+		 * Read till we fall off the end of the map.  We assume that any extra
+		 * bytes in the last page are zeroed, so we don't bother excluding
+		 * them from the count.
+		 */
+		mapBuffer = fm_readbuf(rel, mapBlock, false);
+		if (!BufferIsValid(mapBuffer))
+			break;
+
+		/*
+		 * We choose not to lock the page, since the result is going to be
+		 * immediately stale anyway if anyone is concurrently setting or
+		 * clearing bits, and we only really need an approximate value.
+		 */
+		map = (unsigned char *) PageGetContents(BufferGetPage(mapBuffer));
+
+		for (i = 0; i < MAPSIZE; i++)
+		{
+			result += number_of_ones[map[i]];
+		}
+
+		ReleaseBuffer(mapBuffer);
+	}
+
+	return result;
+}
+
+/*
+ *	frozenmap_truncate - truncate the frozen map
+ *
+ * The caller must hold AccessExclusiveLock on the relation, to ensure that
+ * other backends receive the smgr invalidation event that this function sends
+ * before they access the FM again.
+ *
+ * nheapblocks is the new size of the heap.
+ */
+void
+frozenmap_truncate(Relation rel, BlockNumber nheapblocks)
+{
+	BlockNumber newnblocks;
+
+	/* last remaining block, byte, and bit */
+	BlockNumber truncBlock = HEAPBLK_TO_MAPBLOCK(nheapblocks);
+	uint32		truncByte = HEAPBLK_TO_MAPBYTE(nheapblocks);
+	uint8		truncBit = HEAPBLK_TO_MAPBIT(nheapblocks);
+
+#ifdef TRACE_FROZENMAP
+	elog(DEBUG1, "fm_truncate %s %d", RelationGetRelationName(rel), nheapblocks);
+#endif
+
+	RelationOpenSmgr(rel);
+
+	/*
+	 * If no frozen map has been created yet for this relation, there's
+	 * nothing to truncate.
+	 */
+	if (!smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+		return;
+
+	/*
+	 * Unless the new size is exactly at a frozen map page boundary, the
+	 * tail bits in the last remaining map page, representing truncated heap
+	 * blocks, need to be cleared. This is not only tidy, but also necessary
+	 * because we don't get a chance to clear the bits if the heap is extended
+	 * again.
+	 */
+	if (truncByte != 0 || truncBit != 0)
+	{
+		Buffer		mapBuffer;
+		Page		page;
+		char	   *map;
+
+		newnblocks = truncBlock + 1;
+
+		mapBuffer = fm_readbuf(rel, truncBlock, false);
+		if (!BufferIsValid(mapBuffer))
+		{
+			/* nothing to do, the file was already smaller */
+			return;
+		}
+
+		page = BufferGetPage(mapBuffer);
+		map = PageGetContents(page);
+
+		LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Clear out the unwanted bytes. */
+		MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1));
+
+		/*----
+		 * Mask out the unwanted bits of the last remaining byte.
+		 *
+		 * ((1 << 0) - 1) = 00000000
+		 * ((1 << 1) - 1) = 00000001
+		 * ...
+		 * ((1 << 6) - 1) = 00111111
+		 * ((1 << 7) - 1) = 01111111
+		 *----
+		 */
+		map[truncByte] &= (1 << truncBit) - 1;
+
+		MarkBufferDirty(mapBuffer);
+		UnlockReleaseBuffer(mapBuffer);
+	}
+	else
+		newnblocks = truncBlock;
+
+	if (smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM) <= newnblocks)
+	{
+		/* nothing to do, the file was already smaller than requested size */
+		return;
+	}
+
+	/* Truncate the unused FM pages, and send smgr inval message */
+	smgrtruncate(rel->rd_smgr, FROZENMAP_FORKNUM, newnblocks);
+
+	/*
+	 * We might as well update the local smgr_fm_nblocks setting. smgrtruncate
+	 * sent an smgr cache inval message, which will cause other backends to
+	 * invalidate their copy of smgr_fm_nblocks, and this one too at the next
+	 * command boundary.  But this ensures it isn't outright wrong until then.
+	 */
+	if (rel->rd_smgr)
+		rel->rd_smgr->smgr_fm_nblocks = newnblocks;
+}
+
+/*
+ * Read a frozen map page.
+ *
+ * If the page doesn't exist, InvalidBuffer is returned, or if 'extend' is
+ * true, the frozen map file is extended.
+ */
+static Buffer
+fm_readbuf(Relation rel, BlockNumber blkno, bool extend)
+{
+	Buffer		buf;
+
+	/*
+	 * We might not have opened the relation at the smgr level yet, or we
+	 * might have been forced to close it by a sinval message.  The code below
+	 * won't necessarily notice relation extension immediately when extend =
+	 * false, so we rely on sinval messages to ensure that our ideas about the
+	 * size of the map aren't too far out of date.
+	 */
+	RelationOpenSmgr(rel);
+
+	/*
+	 * If we haven't cached the size of the frozen map fork yet, check it
+	 * first.
+	 */
+	if (rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber)
+	{
+		if (smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+			rel->rd_smgr->smgr_fm_nblocks = smgrnblocks(rel->rd_smgr,
+													  FROZENMAP_FORKNUM);
+		else
+			rel->rd_smgr->smgr_fm_nblocks = 0;
+	}
+
+	/* Handle requests beyond EOF */
+	if (blkno >= rel->rd_smgr->smgr_fm_nblocks)
+	{
+		if (extend)
+			fm_extend(rel, blkno + 1);
+		else
+			return InvalidBuffer;
+	}
+
+	/*
+	 * Use ZERO_ON_ERROR mode, and initialize the page if necessary. It's
+	 * always safe to clear bits, so it's better to clear corrupt pages than
+	 * error out.
+	 */
+	buf = ReadBufferExtended(rel, FROZENMAP_FORKNUM, blkno,
+							 RBM_ZERO_ON_ERROR, NULL);
+	if (PageIsNew(BufferGetPage(buf)))
+		PageInit(BufferGetPage(buf), BLCKSZ, 0);
+	return buf;
+}
+
+/*
+ * Ensure that the frozen map fork is at least fm_nblocks long, extending
+ * it if necessary with zeroed pages.
+ */
+static void
+fm_extend(Relation rel, BlockNumber fm_nblocks)
+{
+	BlockNumber fm_nblocks_now;
+	Page		pg;
+
+	pg = (Page) palloc(BLCKSZ);
+	PageInit(pg, BLCKSZ, 0);
+
+	/*
+	 * We use the relation extension lock to lock out other backends trying to
+	 * extend the frozen map at the same time. It also locks out extension
+	 * of the main fork, unnecessarily, but extending the frozen map
+	 * happens seldom enough that it doesn't seem worthwhile to have a
+	 * separate lock tag type for it.
+	 *
+	 * Note that another backend might have extended or created the relation
+	 * by the time we get the lock.
+	 */
+	LockRelationForExtension(rel, ExclusiveLock);
+
+	/* Might have to re-open if a cache flush happened */
+	RelationOpenSmgr(rel);
+
+	/*
+	 * Create the file first if it doesn't exist.  If smgr_fm_nblocks is
+	 * positive then it must exist, no need for an smgrexists call.
+	 */
+	if ((rel->rd_smgr->smgr_fm_nblocks == 0 ||
+		 rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber) &&
+		!smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM))
+		smgrcreate(rel->rd_smgr, FROZENMAP_FORKNUM, false);
+
+	fm_nblocks_now = smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM);
+
+	/* Now extend the file */
+	while (fm_nblocks_now < fm_nblocks)
+	{
+		PageSetChecksumInplace(pg, fm_nblocks_now);
+
+		smgrextend(rel->rd_smgr, FROZENMAP_FORKNUM, fm_nblocks_now,
+				   (char *) pg, false);
+		fm_nblocks_now++;
+	}
+
+	/*
+	 * Send a shared-inval message to force other backends to close any smgr
+	 * references they may have for this rel, which we are about to change.
+	 * This is a useful optimization because it means that backends don't have
+	 * to keep checking for creation or extension of the file, which happens
+	 * infrequently.
+	 */
+	CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
+
+	/* Update local cache with the up-to-date size */
+	rel->rd_smgr->smgr_fm_nblocks = fm_nblocks_now;
+
+	UnlockRelationForExtension(rel, ExclusiveLock);
+
+	pfree(pg);
+}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..7f7c147 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -38,6 +38,7 @@
  */
 #include "postgres.h"
 
+#include "access/frozenmap.h"
 #include "access/heapam.h"
 #include "access/heapam_xlog.h"
 #include "access/hio.h"
@@ -86,7 +87,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup,
 				HeapTuple newtup, HeapTuple old_key_tup,
-				bool all_visible_cleared, bool new_all_visible_cleared);
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 							 Bitmapset *hot_attrs,
 							 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2067,8 +2069,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
-	Buffer		vmbuffer = InvalidBuffer;
+	Buffer		vmbuffer = InvalidBuffer,
+				fmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
+	bool		all_frozen_cleared = false;
 
 	/*
 	 * Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2092,12 +2096,14 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 									   InvalidBuffer, options, bistate,
-									   &vmbuffer, NULL);
+									   &vmbuffer, NULL,
+									   &fmbuffer, NULL);
 
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
@@ -2113,6 +2119,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 							vmbuffer);
 	}
 
+	if (PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		frozenmap_clear(relation,
+						ItemPointerGetBlockNumber(&(heaptup->t_self)),
+						fmbuffer);
+	}
+
 	/*
 	 * XXX Should we set PageSetPrunable on this page ?
 	 *
@@ -2157,6 +2172,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 
 		xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);
 		xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+		if (all_frozen_cleared)
+			xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
 		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
 
 		/*
@@ -2199,6 +2216,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	UnlockReleaseBuffer(buffer);
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
+	if (fmbuffer != InvalidBuffer)
+		ReleaseBuffer(fmbuffer);
 
 	/*
 	 * If tuple is cachable, mark it for invalidation from the caches in case
@@ -2346,8 +2365,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	while (ndone < ntuples)
 	{
 		Buffer		buffer;
-		Buffer		vmbuffer = InvalidBuffer;
+		Buffer		vmbuffer = InvalidBuffer,
+					fmbuffer = InvalidBuffer;
 		bool		all_visible_cleared = false;
+		bool		all_frozen_cleared = false;
 		int			nthispage;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2358,7 +2379,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		 */
 		buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
 										   InvalidBuffer, options, bistate,
-										   &vmbuffer, NULL);
+										   &vmbuffer, NULL,
+										   &fmbuffer, NULL);
 		page = BufferGetPage(buffer);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -2395,6 +2417,15 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 								vmbuffer);
 		}
 
+		if (PageIsAllFrozen(page))
+		{
+			all_frozen_cleared = true;
+			PageClearAllFrozen(page);
+			frozenmap_clear(relation,
+							BufferGetBlockNumber(buffer),
+							fmbuffer);
+		}
+
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
 		 */
@@ -2437,6 +2468,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			tupledata = scratchptr;
 
 			xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+			if (all_frozen_cleared)
+				xlrec->flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
 			xlrec->ntuples = nthispage;
 
 			/*
@@ -2509,6 +2542,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		UnlockReleaseBuffer(buffer);
 		if (vmbuffer != InvalidBuffer)
 			ReleaseBuffer(vmbuffer);
+		if (fmbuffer != InvalidBuffer)
+			ReleaseBuffer(fmbuffer);
 
 		ndone += nthispage;
 	}
@@ -3053,7 +3088,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	Buffer		buffer,
 				newbuf,
 				vmbuffer = InvalidBuffer,
-				vmbuffer_new = InvalidBuffer;
+				vmbuffer_new = InvalidBuffer,
+				fmbuffer = InvalidBuffer,
+				fmbuffer_new = InvalidBuffer;
 	bool		need_toast,
 				already_marked;
 	Size		newtupsize,
@@ -3067,6 +3104,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	bool		key_intact;
 	bool		all_visible_cleared = false;
 	bool		all_visible_cleared_new = false;
+	bool		all_frozen_cleared = false;
+	bool		all_frozen_cleared_new = false;
 	bool		checked_lockers;
 	bool		locker_remains;
 	TransactionId xmax_new_tuple,
@@ -3100,14 +3139,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	page = BufferGetPage(buffer);
 
 	/*
-	 * Before locking the buffer, pin the visibility map page if it appears to
-	 * be necessary.  Since we haven't got the lock yet, someone else might be
-	 * in the middle of changing this, so we'll need to recheck after we have
-	 * the lock.
+	 * Before locking the buffer, pin the visibility map and frozen map page
+	 * if it appears to be necessary.  Since we haven't got the lock yet,
+	 * someone else might be in the middle of changing this, so we'll need to
+	 * recheck after we have the lock.
 	 */
 	if (PageIsAllVisible(page))
 		visibilitymap_pin(relation, block, &vmbuffer);
 
+	if (PageIsAllFrozen(page))
+		frozenmap_pin(relation, block, &fmbuffer);
+
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
@@ -3390,19 +3432,21 @@ l2:
 			UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
 		if (vmbuffer != InvalidBuffer)
 			ReleaseBuffer(vmbuffer);
+		if (fmbuffer != InvalidBuffer)
+			ReleaseBuffer(fmbuffer);
 		bms_free(hot_attrs);
 		bms_free(key_attrs);
 		return result;
 	}
 
 	/*
-	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, or during some
-	 * subsequent window during which we had it unlocked, we'll have to unlock
-	 * and re-lock, to avoid holding the buffer lock across an I/O.  That's a
-	 * bit unfortunate, especially since we'll now have to recheck whether the
-	 * tuple has been locked or updated under us, but hopefully it won't
-	 * happen very often.
+	 * If we didn't pin the visibility (or frozen) map page and the page has
+	 * become all visible (or all frozen) while we were busy locking the buffer,
+	 * or during some subsequent window during which we had it unlocked,
+	 * we'll have to unlock and re-lock, to avoid holding the buffer lock
+	 * across an I/O.  That's a bit unfortunate, especially since we'll now
+	 * have to recheck whether the tuple has been locked or updated under us,
+	 * but hopefully it won't happen very often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -3412,6 +3456,15 @@ l2:
 		goto l2;
 	}
 
+	if (fmbuffer == InvalidBuffer && PageIsAllFrozen(page))
+	{
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		frozenmap_pin(relation, block, &fmbuffer);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		goto l2;
+
+	}
+
 	/*
 	 * We're about to do the actual update -- check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -3570,7 +3623,8 @@ l2:
 			/* Assume there's no chance to put heaptup on same page. */
 			newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 											   buffer, 0, NULL,
-											   &vmbuffer_new, &vmbuffer);
+											   &vmbuffer_new, &vmbuffer,
+											   &fmbuffer_new, &fmbuffer);
 		}
 		else
 		{
@@ -3588,7 +3642,8 @@ l2:
 				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
-												   &vmbuffer_new, &vmbuffer);
+												   &vmbuffer_new, &vmbuffer,
+												   &fmbuffer_new, &fmbuffer);
 			}
 			else
 			{
@@ -3713,6 +3768,22 @@ l2:
 							vmbuffer_new);
 	}
 
+	/* clear PD_ALL_FROZEN flags */
+	if (newbuf == buffer && PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		frozenmap_clear(relation, BufferGetBlockNumber(buffer),
+						fmbuffer);
+	}
+	else if (newbuf != buffer && PageIsAllFrozen(BufferGetPage(newbuf)))
+	{
+		all_frozen_cleared_new = true;
+		PageClearAllFrozen(BufferGetPage(newbuf));
+		frozenmap_clear(relation, BufferGetBlockNumber(newbuf),
+						fmbuffer_new);
+	}
+
 	if (newbuf != buffer)
 		MarkBufferDirty(newbuf);
 	MarkBufferDirty(buffer);
@@ -3736,7 +3807,9 @@ l2:
 								 newbuf, &oldtup, heaptup,
 								 old_key_tuple,
 								 all_visible_cleared,
-								 all_visible_cleared_new);
+								 all_visible_cleared_new,
+								 all_frozen_cleared,
+								 all_frozen_cleared_new);
 		if (newbuf != buffer)
 		{
 			PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -3768,6 +3841,10 @@ l2:
 		ReleaseBuffer(vmbuffer_new);
 	if (BufferIsValid(vmbuffer))
 		ReleaseBuffer(vmbuffer);
+	if (BufferIsValid(fmbuffer_new))
+		ReleaseBuffer(fmbuffer_new);
+	if (BufferIsValid(fmbuffer))
+		ReleaseBuffer(fmbuffer);
 
 	/*
 	 * Release the lmgr tuple lock, if we had it.
@@ -6534,6 +6611,34 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
 }
 
 /*
+ * Perform XLogInsert for a heap-all-frozen operation. heap_buffer is the block
+ * being marked all-frozen, and fm_buffer is the buffer containing the
+ * corresponding frozen map block. Both should already have been modified and dirtied.
+ */
+XLogRecPtr
+log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer, Buffer fm_buffer)
+{
+	XLogRecPtr	recptr;
+	uint8		flags;
+
+	Assert(BufferIsValid(heap_buffer));
+	Assert(BufferIsValid(fm_buffer));
+
+	XLogBeginInsert();
+
+	XLogRegisterBuffer(0, fm_buffer, 0);
+
+	flags = REGBUF_STANDARD;
+	if (!XLogHintBitIsNeeded())
+		flags |= REGBUF_NO_IMAGE;
+	XLogRegisterBuffer(1, heap_buffer, flags);
+
+	recptr = XLogInsert(RM_HEAP3_ID, XLOG_HEAP3_FROZENMAP);
+
+	return recptr;
+}
+
+/*
  * Perform XLogInsert for a heap-visible operation.  'block' is the block
  * being marked all-visible, and vm_buffer is the buffer containing the
  * corresponding visibility map block.  Both should have already been modified
@@ -6577,7 +6682,8 @@ static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
 				HeapTuple old_key_tuple,
-				bool all_visible_cleared, bool new_all_visible_cleared)
+				bool all_visible_cleared, bool new_all_visible_cleared,
+				bool all_frozen_cleared, bool new_all_frozen_cleared)
 {
 	xl_heap_update xlrec;
 	xl_heap_header xlhdr;
@@ -6660,6 +6766,10 @@ log_heap_update(Relation reln, Buffer oldbuf,
 		xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
 	if (new_all_visible_cleared)
 		xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
+	if (all_frozen_cleared)
+		xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
+	if (new_all_frozen_cleared)
+		xlrec.flags |= XLOG_HEAP_NEW_ALL_FROZEN_CLEARED;
 	if (prefixlen > 0)
 		xlrec.flags |= XLOG_HEAP_PREFIX_FROM_OLD;
 	if (suffixlen > 0)
@@ -7198,6 +7308,75 @@ heap_xlog_visible(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 }
 
+
+/*
+ * Replay XLOG_HEAP3_FROZENMAP records.
+ */
+static void
+heap_xlog_frozenmap(XLogReaderState *record)
+{
+	XLogRecPtr	lsn = record->EndRecPtr;
+	Buffer		fmbuffer = InvalidBuffer;
+	Buffer		buffer;
+	Page		page;
+	RelFileNode rnode;
+	BlockNumber blkno;
+	XLogRedoAction action;
+
+	XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+
+	/*
+	 * Read the heap page, if it still exists. If the heap file has been
+	 * dropped or truncated later in recovery, we don't need to update the
+	 * page, but we'd better still update the frozen map.
+	 */
+	action = XLogReadBufferForRedo(record, 1, &buffer);
+	if (action == BLK_NEEDS_REDO)
+	{
+		page = BufferGetPage(buffer);
+		PageSetAllFrozen(page);
+		MarkBufferDirty(buffer);
+	}
+	else if (action == BLK_RESTORED)
+	{
+		/*
+		 * If heap block was backed up, restore it. This can only happen with
+		 * checksums enabled.
+		 */
+		Assert(DataChecksumsEnabled());
+	}
+	if (BufferIsValid(buffer))
+		UnlockReleaseBuffer(buffer);
+
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
+									  &fmbuffer) == BLK_NEEDS_REDO)
+	{
+		Page		fmpage = BufferGetPage(fmbuffer);
+		Relation	reln;
+
+		/* initialize the page if it was read as zeros */
+		if (PageIsNew(fmpage))
+			PageInit(fmpage, BLCKSZ, 0);
+
+		/*
+		 * XLogReadBufferForRedoExtended locked the buffer. But frozenmap_set
+		 * will handle locking itself.
+		 */
+		LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);
+
+		reln = CreateFakeRelcacheEntry(rnode);
+		frozenmap_pin(reln, blkno, &fmbuffer);
+
+		if (lsn > PageGetLSN(fmpage))
+			frozenmap_set(reln, blkno, InvalidBuffer, lsn, fmbuffer);
+
+		ReleaseBuffer(fmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+	else if (BufferIsValid(fmbuffer))
+		UnlockReleaseBuffer(fmbuffer);
+}
+
 /*
  * Replay XLOG_HEAP2_FREEZE_PAGE records
  */
@@ -7384,6 +7563,20 @@ heap_xlog_insert(XLogReaderState *record)
 		FreeFakeRelcacheEntry(reln);
 	}
 
+	/* The frozen map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(target_node);
+		Buffer		fmbuffer = InvalidBuffer;
+
+		frozenmap_pin(reln, blkno, &fmbuffer);
+		frozenmap_clear(reln, blkno, fmbuffer);
+		ReleaseBuffer(fmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
 	/*
 	 * If we inserted the first and only tuple on the page, re-initialize the
 	 * page from scratch.
@@ -7439,6 +7632,9 @@ heap_xlog_insert(XLogReaderState *record)
 		if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
+		if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+			PageClearAllFrozen(page);
+
 		MarkBufferDirty(buffer);
 	}
 	if (BufferIsValid(buffer))
@@ -7504,6 +7700,21 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		FreeFakeRelcacheEntry(reln);
 	}
 
+	/*
+	 * The frozen map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rnode);
+		Buffer		fmbuffer = InvalidBuffer;
+
+		frozenmap_pin(reln, blkno, &fmbuffer);
+		frozenmap_clear(reln, blkno, fmbuffer);
+		ReleaseBuffer(fmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
 	if (isinit)
 	{
 		buffer = XLogInitBufferForRedo(record, 0);
@@ -7577,6 +7788,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 
 		if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
+		if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+			PageClearAllFrozen(page);
 
 		MarkBufferDirty(buffer);
 	}
@@ -7660,6 +7873,22 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	}
 
 	/*
+	 * The frozen map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rnode);
+		Buffer		fmbuffer = InvalidBuffer;
+
+		frozenmap_pin(reln, oldblk, &fmbuffer);
+		frozenmap_clear(reln, oldblk, fmbuffer);
+		ReleaseBuffer(fmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
+
+	/*
 	 * In normal operation, it is important to lock the two pages in
 	 * page-number order, to avoid possible deadlocks against other update
 	 * operations going the other way.  However, during WAL replay there can
@@ -7705,6 +7934,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 
 		if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
+		if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED)
+			PageClearAllFrozen(page);
 
 		PageSetLSN(page, lsn);
 		MarkBufferDirty(obuffer);
@@ -7743,6 +7974,21 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		FreeFakeRelcacheEntry(reln);
 	}
 
+	/*
+	 * The frozen map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLOG_HEAP_NEW_ALL_FROZEN_CLEARED)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rnode);
+		Buffer		fmbuffer = InvalidBuffer;
+
+		frozenmap_pin(reln, newblk, &fmbuffer);
+		frozenmap_clear(reln, newblk, fmbuffer);
+		ReleaseBuffer(fmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
 	/* Deal with new tuple */
 	if (newaction == BLK_NEEDS_REDO)
 	{
@@ -7840,6 +8086,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 
 		if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
+		if (xlrec->flags & XLOG_HEAP_NEW_ALL_FROZEN_CLEARED)
+			PageClearAllFrozen(page);
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
@@ -8072,6 +8320,21 @@ heap2_redo(XLogReaderState *record)
 	}
 }
 
+void
+heap3_redo(XLogReaderState *record)
+{
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	switch (info & XLOG_HEAP_OPMASK)
+	{
+		case XLOG_HEAP3_FROZENMAP:
+			heap_xlog_frozenmap(record);
+			break;
+		default:
+			elog(PANIC, "heap3_redo: unknown op code %u", info);
+	}
+}
+
 /*
  *	heap_sync		- sync a heap, for use when no WAL has been written
  *
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 6d091f6..5460d4f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -15,6 +15,7 @@
 
 #include "postgres.h"
 
+#include "access/frozenmap.h"
 #include "access/heapam.h"
 #include "access/hio.h"
 #include "access/htup_details.h"
@@ -156,6 +157,62 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * For each heap page which is all-frozen, acquire a pin on the appropriate
+ * frozen map page, if we haven't already got one.
+ *
+ * This function has the same logic as GetVisibilityMapPins.
+ */
+static void
+GetFrozenMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
+					 BlockNumber block1, BlockNumber block2,
+					 Buffer *fmbuffer1, Buffer *fmbuffer2)
+{
+	bool		need_to_pin_buffer1;
+	bool		need_to_pin_buffer2;
+
+	Assert(BufferIsValid(buffer1));
+	Assert(buffer2 == InvalidBuffer || buffer1 <= buffer2);
+
+	while (1)
+	{
+		/* Figure out which pins we need but don't have. */
+		need_to_pin_buffer1 = PageIsAllFrozen(BufferGetPage(buffer1))
+			&& !frozenmap_pin_ok(block1, *fmbuffer1);
+		need_to_pin_buffer2 = buffer2 != InvalidBuffer
+			&& PageIsAllFrozen(BufferGetPage(buffer2))
+			&& !frozenmap_pin_ok(block2, *fmbuffer2);
+		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
+			return;
+
+		/* We must unlock both buffers before doing any I/O. */
+		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
+		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
+			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
+
+		/* Get pins. */
+		if (need_to_pin_buffer1)
+			frozenmap_pin(relation, block1, fmbuffer1);
+		if (need_to_pin_buffer2)
+			frozenmap_pin(relation, block2, fmbuffer2);
+
+		/* Relock buffers. */
+		LockBuffer(buffer1, BUFFER_LOCK_EXCLUSIVE);
+		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
+			LockBuffer(buffer2, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * If there are two buffers involved and we pinned just one of them,
+		 * it's possible that the second one became all-frozen while we were
+		 * busy pinning the first one.  If it looks like that's a possible
+		 * scenario, we'll need to make a second pass through this loop.
+		 */
+		if (buffer2 == InvalidBuffer || buffer1 == buffer2
+			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
+			break;
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -215,7 +272,8 @@ Buffer
 RelationGetBufferForTuple(Relation relation, Size len,
 						  Buffer otherBuffer, int options,
 						  BulkInsertState bistate,
-						  Buffer *vmbuffer, Buffer *vmbuffer_other)
+						  Buffer *vmbuffer, Buffer *vmbuffer_other,
+						  Buffer *fmbuffer, Buffer *fmbuffer_other)
 {
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
@@ -316,6 +374,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 			buffer = ReadBufferBI(relation, targetBlock, bistate);
 			if (PageIsAllVisible(BufferGetPage(buffer)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
+			if (PageIsAllFrozen(BufferGetPage(buffer)))
+				frozenmap_pin(relation, targetBlock, fmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
 		else if (otherBlock == targetBlock)
@@ -324,6 +384,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 			buffer = otherBuffer;
 			if (PageIsAllVisible(BufferGetPage(buffer)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
+			if (PageIsAllFrozen(BufferGetPage(buffer)))
+				frozenmap_pin(relation, targetBlock, fmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
 		else if (otherBlock < targetBlock)
@@ -332,6 +394,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 			buffer = ReadBuffer(relation, targetBlock);
 			if (PageIsAllVisible(BufferGetPage(buffer)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
+			if (PageIsAllFrozen(BufferGetPage(buffer)))
+				frozenmap_pin(relation, targetBlock, fmbuffer);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
@@ -341,6 +405,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 			buffer = ReadBuffer(relation, targetBlock);
 			if (PageIsAllVisible(BufferGetPage(buffer)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
+			if (PageIsAllFrozen(BufferGetPage(buffer)))
+				frozenmap_pin(relation, targetBlock, fmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 		}
@@ -367,13 +433,23 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		 * done.
 		 */
 		if (otherBuffer == InvalidBuffer || buffer <= otherBuffer)
+		{
 			GetVisibilityMapPins(relation, buffer, otherBuffer,
 								 targetBlock, otherBlock, vmbuffer,
 								 vmbuffer_other);
+			GetFrozenMapPins(relation, buffer, otherBuffer,
+								 targetBlock, otherBlock, fmbuffer,
+								 fmbuffer_other);
+		}
 		else
+		{
 			GetVisibilityMapPins(relation, otherBuffer, buffer,
 								 otherBlock, targetBlock, vmbuffer_other,
 								 vmbuffer);
+			GetFrozenMapPins(relation, otherBuffer, buffer,
+								 otherBlock, targetBlock, fmbuffer_other,
+								 fmbuffer);
+		}
 
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 4f06a26..9a67733 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -149,6 +149,20 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+void
+heap3_desc(StringInfo buf, XLogReaderState *record)
+{
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_HEAP3_FROZENMAP)
+	{
+		/*
+		 * XLOG_HEAP3_FROZENMAP records register only the heap and frozen
+		 * map buffers; there is no record-specific payload to describe.
+		 */
+	}
+}
+
 const char *
 heap_identify(uint8 info)
 {
@@ -226,3 +240,18 @@ heap2_identify(uint8 info)
 
 	return id;
 }
+
+const char *
+heap3_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_HEAP3_FROZENMAP:
+			id = "FROZENMAP";
+			break;
+	}
+
+	return id;
+}
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ce398fc..961775e 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -19,6 +19,7 @@
 
 #include "postgres.h"
 
+#include "access/frozenmap.h"
 #include "access/visibilitymap.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -228,6 +229,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	bool		fm;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -238,6 +240,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_targblock = InvalidBlockNumber;
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
+	rel->rd_smgr->smgr_fm_nblocks = InvalidBlockNumber;
 
 	/* Truncate the FSM first if it exists */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
@@ -249,6 +252,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	if (vm)
 		visibilitymap_truncate(rel, nblocks);
 
+	/* Truncate the frozen map too if it exists. */
+	fm = smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM);
+	if (fm)
+		frozenmap_truncate(rel, nblocks);
+
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
 	 * trouble if the truncation fails. If we then crash, the WAL replay
@@ -282,7 +290,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 		 * with a truncated heap, but the FSM or visibility map would still
 		 * contain entries for the non-existent heap pages.
 		 */
-		if (fsm || vm)
+		if (fsm || vm || fm)
 			XLogFlush(lsn);
 	}
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3febdd5..80a9f96 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -17,6 +17,7 @@
  */
 #include "postgres.h"
 
+#include "access/frozenmap.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
 #include "access/rewriteheap.h"
@@ -1484,6 +1485,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	Oid			mapped_tables[4];
 	int			reindex_flags;
 	int			i;
+	Buffer		fmbuffer = InvalidBuffer,
+				buf = InvalidBuffer;
+	Relation	rel;
+	BlockNumber	nblocks, blkno;
 
 	/* Zero out possible results from swapped_relation_files */
 	memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1591,6 +1596,26 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 		RelationMapRemoveMapping(mapped_tables[i]);
 
 	/*
+	 * At this point we can be sure that all tuples of the new relation are
+	 * completely frozen, since we already hold AccessExclusiveLock.
+	 * Set the frozen map bit and the page header flag for every page.
+	 */
+	rel = relation_open(OIDOldHeap, NoLock);
+	nblocks = RelationGetNumberOfBlocks(rel);
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		buf = ReadBuffer(rel, blkno);
+		PageSetAllFrozen(BufferGetPage(buf));
+		frozenmap_pin(rel, blkno, &fmbuffer);
+		frozenmap_set(rel, blkno, buf, InvalidXLogRecPtr, fmbuffer);
+		ReleaseBuffer(buf);
+	}
+
+	if (fmbuffer != InvalidBuffer)
+		ReleaseBuffer(fmbuffer);
+	relation_close(rel, NoLock);
+
+	/*
 	 * At this point, everything is kosher except that, if we did toast swap
 	 * by links, the toast table's name corresponds to the transient table.
 	 * The name is irrelevant to the backend because it's referenced by OID,
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c3d6e59..8e9940b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -37,6 +37,7 @@
 
 #include <math.h>
 
+#include "access/frozenmap.h"
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/heapam_xlog.h"
@@ -106,6 +107,7 @@ typedef struct LVRelStats
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
 	BlockNumber	pinskipped_pages; /* # of pages we skipped due to a pin */
+	BlockNumber fmskipped_pages; /* # of pages we skipped by frozen map */
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -222,6 +224,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * than or equal to the requested Xid full-table scan limit; or if the
 	 * table's minimum MultiXactId is older than or equal to the requested
 	 * mxid full-table scan limit.
+	 * Even if scan_all is set, we may still be able to skip some pages
+	 * according to the frozen map.
 	 */
 	scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
 											 xidFullScanLimit);
@@ -247,20 +251,22 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/*
-	 * Compute whether we actually scanned the whole relation. If we did, we
-	 * can adjust relfrozenxid and relminmxid.
+	 * Compute whether we actually scanned the whole relation. If we did,
+	 * we can adjust relfrozenxid and relminmxid.
 	 *
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+	if ((vacrelstats->scanned_pages + vacrelstats->fmskipped_pages)
+		< vacrelstats->rel_pages)
 	{
-		Assert(!scan_all);
 		scanned_all = false;
 	}
 	else
 		scanned_all = true;
 
+	scanned_all |= scan_all;
+
 	/*
 	 * Optionally truncate the relation.
 	 *
@@ -450,7 +456,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
-	Buffer		vmbuffer = InvalidBuffer;
+	Buffer		vmbuffer = InvalidBuffer,
+				fmbuffer = InvalidBuffer;
 	BlockNumber next_not_all_visible_block;
 	bool		skipping_all_visible_blocks;
 	xl_heap_freeze_tuple *frozen;
@@ -533,6 +540,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					hastup;
 		int			prev_dead_count;
 		int			nfrozen;
+		int			already_nfrozen; /* # of tuples already frozen */
+		int			ntup_blk; /* # of tuples in single page */
 		Size		freespace;
 		bool		all_visible_according_to_vm;
 		bool		all_visible;
@@ -562,12 +571,33 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			else
 				skipping_all_visible_blocks = false;
 			all_visible_according_to_vm = false;
+
+			/* Even if the current block is not all-visible, we can skip
+			 * vacuuming it when the corresponding frozen map bit is set
+			 * and a whole-table scan is required.
+			 */
+			if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)
+			{
+				vacrelstats->fmskipped_pages++;
+				continue;
+			}
 		}
 		else
 		{
-			/* Current block is all-visible */
+			/*
+			 * Current block is all-visible.
+			 * If the frozen map shows it is also all frozen and this scan
+			 * needs to freeze tuples (scan_all), we can skip vacuuming
+			 * the block.
+			 */
+			if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)
+			{
+				vacrelstats->fmskipped_pages++;
+				continue;
+			}
 			if (skipping_all_visible_blocks && !scan_all)
 				continue;
+
 			all_visible_according_to_vm = true;
 		}
 
@@ -592,6 +622,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 				vmbuffer = InvalidBuffer;
 			}
 
+			if (BufferIsValid(fmbuffer))
+			{
+				ReleaseBuffer(fmbuffer);
+				fmbuffer = InvalidBuffer;
+			}
+
 			/* Log cleanup info before we touch indexes */
 			vacuum_log_cleanup_info(onerel, vacrelstats);
 
@@ -621,6 +657,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		 * and did a cycle of index vacuuming.
 		 */
 		visibilitymap_pin(onerel, blkno, &vmbuffer);
+		frozenmap_pin(onerel, blkno, &fmbuffer);
 
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
 								 RBM_NORMAL, vac_strategy);
@@ -763,6 +800,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		all_visible = true;
 		has_dead_tuples = false;
 		nfrozen = 0;
+		already_nfrozen = 0;
+		ntup_blk = 0;
 		hastup = false;
 		prev_dead_count = vacrelstats->num_dead_tuples;
 		maxoff = PageGetMaxOffsetNumber(page);
@@ -917,8 +956,13 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			else
 			{
 				num_tuples += 1;
+				ntup_blk += 1;
 				hastup = true;
 
+				/* If current tuple is already frozen, count it up */
+				if (HeapTupleHeaderXminFrozen(tuple.t_data))
+					already_nfrozen += 1;
+
 				/*
 				 * Each non-removable tuple must be checked to see if it needs
 				 * freezing.  Note we already have exclusive buffer lock.
@@ -952,6 +996,27 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 				heap_execute_freeze_tuple(htup, &frozen[i]);
 			}
 
+			/*
+			 * If any unfrozen tuples remain on the current page and the
+			 * page is marked ALL_FROZEN, we must clear the flag and bit.
+			 */
+			if (ntup_blk != (nfrozen + already_nfrozen)
+				&& PageIsAllFrozen(page))
+			{
+				PageClearAllFrozen(page);
+				frozenmap_clear(onerel, blkno, fmbuffer);
+			}
+			/*
+			 * Otherwise, scanning the page has shown that every tuple is
+			 * completely frozen, so set the frozen map bit and the
+			 * PD_ALL_FROZEN flag on the page.
+			 */
+			else if (ntup_blk == (nfrozen + already_nfrozen))
+			{
+				PageSetAllFrozen(page);
+				frozenmap_set(onerel, blkno, buf, InvalidXLogRecPtr, fmbuffer);
+			}
+
 			/* Now WAL-log freezing if neccessary */
 			if (RelationNeedsWAL(onerel))
 			{
@@ -1077,13 +1142,18 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 														 num_tuples);
 
 	/*
-	 * Release any remaining pin on visibility map page.
+	 * Release any remaining pins on the visibility map and frozen map pages.
 	 */
 	if (BufferIsValid(vmbuffer))
 	{
 		ReleaseBuffer(vmbuffer);
 		vmbuffer = InvalidBuffer;
 	}
+	if (BufferIsValid(fmbuffer))
+	{
+		ReleaseBuffer(fmbuffer);
+		fmbuffer = InvalidBuffer;
+	}
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f96fb24..67898df 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -92,7 +92,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 			if (exprType((Node *) tle->expr) != attr->atttypid)
 				ereport(ERROR,
 						(errcode(ERRCODE_DATATYPE_MISMATCH),
-						 errmsg("table row type and query-specified row type do not match"),
+				  errmsg("table row type and query-specified row type do not match"),
 						 errdetail("Table has type %s at ordinal position %d, but query expects %s.",
 								   format_type_be(attr->atttypid),
 								   attno,
@@ -117,7 +117,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 	if (attno != resultDesc->natts)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATATYPE_MISMATCH),
-		  errmsg("table row type and query-specified row type do not match"),
+				 errmsg("table row type and query-specified row type do not match"),
 				 errdetail("Query has too few columns.")));
 }
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eb7293f..d66660d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -55,6 +55,7 @@ typedef struct XLogRecordBuffer
 static void DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 
@@ -104,6 +105,10 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 			DecodeStandbyOp(ctx, &buf);
 			break;
 
+		case RM_HEAP3_ID:
+			DecodeHeap3Op(ctx, &buf);
+			break;
+
 		case RM_HEAP2_ID:
 			DecodeHeap2Op(ctx, &buf);
 			break;
@@ -300,6 +305,29 @@ DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 }
 
 /*
+ * Handle rmgr HEAP3_ID records for DecodeRecordIntoReorderBuffer().
+ */
+static void
+DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
+{
+	uint8		info = XLogRecGetInfo(buf->record) & XLOG_HEAP_OPMASK;
+	SnapBuild	*builder = ctx->snapshot_builder;
+
+	/* no point in doing anything yet */
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+		return;
+
+	switch (info)
+	{
+		case XLOG_HEAP3_FROZENMAP:
+			break;
+		default:
+			elog(ERROR, "unexpected RM_HEAP3_ID record type: %u", info);
+	}
+
+}
+
+/*
  * Handle rmgr HEAP2_ID records for DecodeRecordIntoReorderBuffer().
  */
 static void
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 244b4ea..666e682 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -168,6 +168,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 		reln->smgr_targblock = InvalidBlockNumber;
 		reln->smgr_fsm_nblocks = InvalidBlockNumber;
 		reln->smgr_vm_nblocks = InvalidBlockNumber;
+		reln->smgr_fm_nblocks = InvalidBlockNumber;
 		reln->smgr_which = 0;	/* we only have md.c at present */
 
 		/* mark it not open */
diff --git a/src/common/relpath.c b/src/common/relpath.c
index 66dfef1..7eba9ee 100644
--- a/src/common/relpath.c
+++ b/src/common/relpath.c
@@ -35,6 +35,7 @@ const char *const forkNames[] = {
 	"main",						/* MAIN_FORKNUM */
 	"fsm",						/* FSM_FORKNUM */
 	"vm",						/* VISIBILITYMAP_FORKNUM */
+	"fm",						/* FROZENMAP_FORKNUM */
 	"init"						/* INIT_FORKNUM */
 };
 
@@ -58,7 +59,7 @@ forkname_to_number(const char *forkName)
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 			 errmsg("invalid fork name"),
 			 errhint("Valid fork names are \"main\", \"fsm\", "
-					 "\"vm\", and \"init\".")));
+					 "\"vm\", \"fm\", and \"init\".")));
 #endif
 
 	return InvalidForkNumber;
diff --git a/src/include/access/frozenmap.h b/src/include/access/frozenmap.h
new file mode 100644
index 0000000..0f2e54e
--- /dev/null
+++ b/src/include/access/frozenmap.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * frozenmap.h
+ *		frozen map interface
+ *
+ *
+ * Portions Copyright (c) 2007-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/frozenmap.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FROZENMAP_H
+#define FROZENMAP_H
+
+#include "access/xlogdefs.h"
+#include "storage/block.h"
+#include "storage/buf.h"
+#include "utils/relcache.h"
+
+extern void frozenmap_clear(Relation rel, BlockNumber heapBlk,
+					Buffer fmbuf);
+extern void frozenmap_pin(Relation rel, BlockNumber heapBlk,
+				  Buffer *fmbuf);
+extern bool frozenmap_pin_ok(BlockNumber heapBlk, Buffer fmbuf);
+extern void frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+				  XLogRecPtr recptr, Buffer fmBuf);
+extern bool frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *fmbuf);
+extern BlockNumber frozenmap_count(Relation rel);
+extern void frozenmap_truncate(Relation rel, BlockNumber nheapblocks);
+
+#endif   /* FROZENMAP_H */
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f0f89de..087cfeb 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,6 +60,13 @@
 #define XLOG_HEAP2_NEW_CID		0x70
 
 /*
+ * heapam.c has a third RmgrId now. These opcodes are associated with
+ * RM_HEAP3_ID, but are not logically different from the ones above
+ * associated with RM_HEAP_ID. XLOG_HEAP_OPMASK applies to these, too.
+ */
+#define XLOG_HEAP3_FROZENMAP	0x00
+
+/*
  * xl_heap_* ->flag values, 8 bits are available.
  */
 /* PD_ALL_VISIBLE was cleared */
@@ -73,6 +80,10 @@
 #define XLOG_HEAP_SUFFIX_FROM_OLD			(1<<6)
 /* last xl_heap_multi_insert record for one heap_multi_insert() call */
 #define XLOG_HEAP_LAST_MULTI_INSERT			(1<<7)
+/* PD_ALL_FROZEN was cleared by an INSERT or UPDATE */
+#define XLOG_HEAP_ALL_FROZEN_CLEARED		(1<<8)
+/* PD_ALL_FROZEN was cleared in the new page of an UPDATE */
+#define XLOG_HEAP_NEW_ALL_FROZEN_CLEARED	(1<<9)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLOG_HEAP_CONTAINS_OLD						\
@@ -110,12 +121,12 @@ typedef struct xl_heap_header
 typedef struct xl_heap_insert
 {
 	OffsetNumber offnum;		/* inserted tuple's offset */
-	uint8		flags;
+	uint16		flags;
 
 	/* xl_heap_header & TUPLE DATA in backup block 0 */
 } xl_heap_insert;
 
-#define SizeOfHeapInsert	(offsetof(xl_heap_insert, flags) + sizeof(uint8))
+#define SizeOfHeapInsert	(offsetof(xl_heap_insert, flags) + sizeof(uint16))
 
 /*
  * This is what we need to know about a multi-insert.
@@ -130,7 +141,7 @@ typedef struct xl_heap_insert
  */
 typedef struct xl_heap_multi_insert
 {
-	uint8		flags;
+	uint16		flags;
 	uint16		ntuples;
 	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
 } xl_heap_multi_insert;
@@ -170,7 +181,7 @@ typedef struct xl_heap_update
 	TransactionId old_xmax;		/* xmax of the old tuple */
 	OffsetNumber old_offnum;	/* old tuple's offset */
 	uint8		old_infobits_set;		/* infomask bits to set on old tuple */
-	uint8		flags;
+	uint16		flags;
 	TransactionId new_xmax;		/* xmax of the new tuple */
 	OffsetNumber new_offnum;	/* new tuple's offset */
 
@@ -342,6 +353,9 @@ extern const char *heap_identify(uint8 info);
 extern void heap2_redo(XLogReaderState *record);
 extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
+extern void heap3_redo(XLogReaderState *record);
+extern void heap3_desc(StringInfo buf, XLogReaderState *record);
+extern const char *heap3_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
 extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
@@ -354,6 +368,8 @@ extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
 extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
 				TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples,
 				int ntuples);
+extern XLogRecPtr log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer,
+									 Buffer fm_buffer);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId cutoff_xid,
 						  TransactionId cutoff_multi,
diff --git a/src/include/access/hio.h b/src/include/access/hio.h
index b014029..1a27ee8 100644
--- a/src/include/access/hio.h
+++ b/src/include/access/hio.h
@@ -40,6 +40,8 @@ extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
 extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
 						  Buffer otherBuffer, int options,
 						  BulkInsertState bistate,
-						  Buffer *vmbuffer, Buffer *vmbuffer_other);
+						  Buffer *vmbuffer, Buffer *vmbuffer_other,
+						  Buffer *fmbuffer,
+						  Buffer *fmbuffer_other);
 
 #endif   /* HIO_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 48f04c6..e49c0b0 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -34,6 +34,7 @@ PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, N
 PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL)
 PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL)
 PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL)
+PG_RMGR(RM_HEAP3_ID, "Heap3", heap3_redo, heap3_desc, heap3_identify, NULL, NULL)
 PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL)
 PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
 PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 8b4c35c..8420e47 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -47,6 +47,8 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
 	float4		reltuples;		/* # of tuples (not always up-to-date) */
 	int32		relallvisible;	/* # of all-visible blocks (not always
 								 * up-to-date) */
+	int32		relallfrozen;	/* # of all-frozen blocks (not always
+								 * up-to-date) */
 	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
 	bool		relhasindex;	/* T if has (or has had) any indexes */
 	bool		relisshared;	/* T if shared across databases */
@@ -95,7 +97,7 @@ typedef FormData_pg_class *Form_pg_class;
  * ----------------
  */
 
-#define Natts_pg_class					30
+#define Natts_pg_class					31
 #define Anum_pg_class_relname			1
 #define Anum_pg_class_relnamespace		2
 #define Anum_pg_class_reltype			3
@@ -107,25 +109,26 @@ typedef FormData_pg_class *Form_pg_class;
 #define Anum_pg_class_relpages			9
 #define Anum_pg_class_reltuples			10
 #define Anum_pg_class_relallvisible		11
-#define Anum_pg_class_reltoastrelid		12
-#define Anum_pg_class_relhasindex		13
-#define Anum_pg_class_relisshared		14
-#define Anum_pg_class_relpersistence	15
-#define Anum_pg_class_relkind			16
-#define Anum_pg_class_relnatts			17
-#define Anum_pg_class_relchecks			18
-#define Anum_pg_class_relhasoids		19
-#define Anum_pg_class_relhaspkey		20
-#define Anum_pg_class_relhasrules		21
-#define Anum_pg_class_relhastriggers	22
-#define Anum_pg_class_relhassubclass	23
-#define Anum_pg_class_relrowsecurity	24
-#define Anum_pg_class_relispopulated	25
-#define Anum_pg_class_relreplident		26
-#define Anum_pg_class_relfrozenxid		27
-#define Anum_pg_class_relminmxid		28
-#define Anum_pg_class_relacl			29
-#define Anum_pg_class_reloptions		30
+#define Anum_pg_class_relallfrozen		12
+#define Anum_pg_class_reltoastrelid		13
+#define Anum_pg_class_relhasindex		14
+#define Anum_pg_class_relisshared		15
+#define Anum_pg_class_relpersistence	16
+#define Anum_pg_class_relkind			17
+#define Anum_pg_class_relnatts			18
+#define Anum_pg_class_relchecks			19
+#define Anum_pg_class_relhasoids		20
+#define Anum_pg_class_relhaspkey		21
+#define Anum_pg_class_relhasrules		22
+#define Anum_pg_class_relhastriggers	23
+#define Anum_pg_class_relhassubclass	24
+#define Anum_pg_class_relrowsecurity	25
+#define Anum_pg_class_relispopulated	26
+#define Anum_pg_class_relreplident		27
+#define Anum_pg_class_relfrozenxid		28
+#define Anum_pg_class_relminmxid		29
+#define Anum_pg_class_relacl			30
+#define Anum_pg_class_reloptions		31
 
 /* ----------------
  *		initial contents of pg_class
@@ -140,13 +143,13 @@ typedef FormData_pg_class *Form_pg_class;
  * Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
  * similarly, "1" in relminmxid stands for FirstMultiXactId
  */
-DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 31 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
 
 
diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
index a263779..5d40997 100644
--- a/src/include/common/relpath.h
+++ b/src/include/common/relpath.h
@@ -27,6 +27,7 @@ typedef enum ForkNumber
 	MAIN_FORKNUM = 0,
 	FSM_FORKNUM,
 	VISIBILITYMAP_FORKNUM,
+	FROZENMAP_FORKNUM,
 	INIT_FORKNUM
 
 	/*
@@ -38,7 +39,7 @@ typedef enum ForkNumber
 
 #define MAX_FORKNUM		INIT_FORKNUM
 
-#define FORKNAMECHARS	4		/* max chars for a fork name */
+#define FORKNAMECHARS	5		/* max chars for a fork name */
 
 extern const char *const forkNames[];
 
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index c2fbffc..f46375d 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -178,8 +178,10 @@ typedef PageHeaderData *PageHeader;
 										 * tuple? */
 #define PD_ALL_VISIBLE		0x0004		/* all tuples on page are visible to
 										 * everyone */
+#define PD_ALL_FROZEN		0x0008		/* all tuples on page are completely
+										 * frozen */
 
-#define PD_VALID_FLAG_BITS	0x0007		/* OR of all valid pd_flags bits */
+#define PD_VALID_FLAG_BITS	0x000F		/* OR of all valid pd_flags bits */
 
 /*
  * Page layout version number 0 is for pre-7.3 Postgres releases.
@@ -367,6 +369,13 @@ typedef PageHeaderData *PageHeader;
 #define PageClearAllVisible(page) \
 	(((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
 
+#define PageIsAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags & PD_ALL_FROZEN)
+#define PageSetAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags |= PD_ALL_FROZEN)
+#define PageClearAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags &= ~PD_ALL_FROZEN)
+
 #define PageIsPrunable(page, oldestxmin) \
 ( \
 	AssertMacro(TransactionIdIsNormal(oldestxmin)), \
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 69a624f..2173c20 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -55,6 +55,7 @@ typedef struct SMgrRelationData
 	BlockNumber smgr_targblock; /* current insertion target block */
 	BlockNumber smgr_fsm_nblocks;		/* last known size of fsm fork */
 	BlockNumber smgr_vm_nblocks;	/* last known size of vm fork */
+	BlockNumber smgr_fm_nblocks;	/* last known size of fm fork */
 
 	/* additional public fields may someday exist here */
 
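
For reviewers, here is a minimal sketch (not part of the patch) of how a caller
could consult the frozen map through the API declared in frozenmap.h above,
assuming the patch is applied. The function name example_can_skip_block is
hypothetical, and locking and error handling are omitted; it simply mirrors the
skip logic in lazy_scan_heap.

#include "postgres.h"

#include "access/frozenmap.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Hypothetical helper (illustration only): decide whether a freeze-everything
 * (scan_all) vacuum can skip the given heap block by consulting the frozen map.
 */
static bool
example_can_skip_block(Relation rel, BlockNumber blkno, bool scan_all)
{
	Buffer		fmbuffer = InvalidBuffer;
	bool		skip;

	/* Pin the frozen map page covering this heap block. */
	frozenmap_pin(rel, blkno, &fmbuffer);

	/* An all-frozen block needs no freezing work, even when scan_all is set. */
	skip = scan_all && frozenmap_test(rel, blkno, &fmbuffer);

	if (BufferIsValid(fmbuffer))
		ReleaseBuffer(fmbuffer);

	return skip;
}

Conversely, once a page scan finds every tuple frozen, the patch marks the page
with PageSetAllFrozen() and sets the corresponding map bit with
frozenmap_set(), in the same way the visibility map is maintained.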
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
