On Wed, Feb 10, 2016 at 7:06 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:

I have tested the relation extension patch from various angles; the
performance results and other statistics are presented below.

Test 1: Identify whether the heavyweight lock is the problem or the actual
context switches are.
1. I converted the RelationExtensionLock to a simple LWLock and tested with a
single relation. Results are below.

This is a simple script that COPYs 10000 records of 4 bytes each in one
transaction. All numbers are TPS.

Client    Base    LWLock    Multi-extend by 50 blocks
1         155     156       160
2         282     276       284
4         248     319       428
8         161     267       675
16        143     241       889

LWLock performance is better than base; the obvious reason may be that we
save some instructions by converting to an LWLock, but it does not scale any
better than the base code.
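
A minimal sketch of how the Test 1 conversion might look inside
RelationGetBufferForTuple() in hio.c (relext_lock is a hypothetical LWLock
standing in for the per-relation heavyweight lock; this is not the posted
patch):

    /* Test 1 sketch: heavyweight extension lock replaced by an LWLock.
     * relext_lock is an assumed LWLock * allocated at startup. */
    if (needLock)
        LWLockAcquire(relext_lock, LW_EXCLUSIVE);   /* was LockRelationForExtension() */

    buffer = ReadBufferBI(relation, P_NEW, bistate);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);      /* lock the new page */

    if (needLock)
        LWLockRelease(relext_lock);                 /* was UnlockRelationForExtension() */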


Test 2: Identify whether the improvement with multi-extend comes from
avoiding context switches or from some other factor, such as reusing blocks
between backends by putting them in the FSM.

1. Test by extending multiple blocks at a time but reusing them only in the
extending backend (don't put them in the FSM; a sketch of this variant
follows the results below).
Inserting 1024-byte records; the data does not fit in shared buffers (512MB).


Client    Base    Extend 800 blocks (self use)    Extend 1000 blocks
1         117     131                             118
2         111     203                             140
3          51     242                             178
4          51     231                             190
5          52     259                             224
6          51     263                             243
7          43     253                             254
8          43     240                             254
16         40     190                             243

We can see the same improvement when a backend uses the extra blocks itself,
which shows that sharing the blocks between backends was not the win;
avoiding the context switches was the major win.
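
One way the "self use" variant could be implemented is to keep the extra
pages backend-local instead of publishing them via RecordPageWithFreeSpace().
A minimal sketch, where local_extra_blocks and MAX_EXTEND_BLOCKS are
assumptions for illustration and not code from the patch:

    /* Backend-local stash of pre-extended pages; never shared via FSM. */
    static BlockNumber local_extra_blocks[MAX_EXTEND_BLOCKS];
    static int         local_extra_count = 0;

    /* Inside the extension loop, for each extra page: */
    page = BufferGetPage(buffer);
    PageInit(page, BufferGetPageSize(buffer), 0);
    MarkBufferDirty(buffer);
    local_extra_blocks[local_extra_count++] = BufferGetBlockNumber(buffer);
    UnlockReleaseBuffer(buffer);

    /* Later inserts in this backend consume local_extra_blocks[] before
     * falling back to the FSM or extending the relation again. */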

2. Measured the number of ProcSleep calls during the run.
Again a simple script that COPYs 10000 records of 4 bytes each in one
transaction.

          BASE CODE                        PATCH, EXTEND BY 10 BLOCKS
Client    TPS     ProcSleep count          TPS     ProcSleep count
2         280         457,506              311          62,641
3         235       1,098,701              358         141,624
4         216       1,155,735              368         188,173

What we can see in the above test is that in the base code performance
degrades after 2 clients, while the ProcSleep count increases enormously.

With the patch, extending 10 blocks at a time, the ProcSleep count drops to
roughly 1/8 and we can see that performance keeps scaling.
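
A minimal sketch of how such ProcSleep counts can be collected, assuming a
plain counter added to ProcSleep() in src/backend/storage/lmgr/proc.c (the
counter and the exit-time report are measurement scaffolding, not part of
the patch):

    static uint64 proc_sleep_calls = 0;

    int
    ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
    {
        proc_sleep_calls++;     /* count each heavyweight-lock sleep */
        /* ... existing body unchanged ... */
    }

    /* Registered with on_proc_exit() to dump the per-backend total. */
    static void
    report_proc_sleep_calls(int code, Datum arg)
    {
        elog(LOG, "ProcSleep called " UINT64_FORMAT " times",
             proc_sleep_calls);
    }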

The ProcSleep test for inserts where the data does not fit in shared buffers,
with big 1024-byte records, is currently running; I will post the numbers
once I have them.

Posting the re-based version and moving it to the next CF.
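
Note that with the attached patch the batch size is an ordinary per-relation
storage parameter, so the configurations above can be reproduced with the
usual reloption syntax, e.g. CREATE TABLE t (...) WITH (extend_by_blocks = 50)
or ALTER TABLE t SET (extend_by_blocks = 50); per the patch it accepts values
from 1 to 10000 and defaults to 1 (the current single-block behaviour).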

Open points:
1. After getting the lock, recheck the FSM in case some other backend has
already added extra blocks, and reuse them; see the sketch below.
2. Is it a good idea to have a user-level parameter for extend_by_blocks, or
can we try some approach that internally identifies how many blocks are
needed and adds only that many? That would make it more flexible.
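
For open point 1, the recheck could look roughly like this, right after
taking the extension lock (a sketch only, not part of the attached patch;
the retry label is hypothetical):

    if (needLock)
        LockRelationForExtension(relation, ExclusiveLock);

    /*
     * Another backend may have extended the relation and published the new
     * pages in the FSM while we were waiting for the lock.
     */
    targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
    if (targetBlock != InvalidBlockNumber)
    {
        /* Reuse the other backend's block instead of extending again. */
        UnlockRelationForExtension(relation, ExclusiveLock);
        goto loop;      /* hypothetical label restarting the normal path */
    }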


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 86b9ae1..78e81dd 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -268,6 +268,16 @@ static relopt_int intRelOpts[] =
 #endif
 	},
 
+	{
+		{
+			"extend_by_blocks",
+			"Number of blocks to be added to the relation in each extend call",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		1, 1, 10000
+	},
+
 	/* list terminator */
 	{{NULL}}
 };
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"user_catalog_table", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, user_catalog_table)}
+		offsetof(StdRdOptions, user_catalog_table)},
+		{"extend_by_blocks", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, extend_by_blocks)}
 	};
 
 	options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..eb3ce17 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -238,6 +238,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 				otherBlock;
 	bool		needLock;
+	int			extraBlocks;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -443,25 +444,50 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	if (needLock)
 		LockRelationForExtension(relation, ExclusiveLock);
 
+	if (use_fsm)
+		extraBlocks = RelationGetExtendBlocks(relation) - 1;
+	else
+		extraBlocks = 0;
 	/*
 	 * XXX This does an lseek - rather expensive - but at the moment it is the
 	 * only way to accurately determine how many blocks are in a relation.  Is
 	 * it worth keeping an accurate file length in shared memory someplace,
 	 * rather than relying on the kernel to do it for us?
 	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
 
-	/*
-	 * We can be certain that locking the otherBuffer first is OK, since it
-	 * must have a lower page number.
-	 */
-	if (otherBuffer != InvalidBuffer)
-		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+	do
+	{
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
 
-	/*
-	 * Now acquire lock on the new page.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		/*
+		 * We can be certain that locking the otherBuffer first is OK, since
+		 * it must have a lower page number.
+		 */
+		if ((otherBuffer != InvalidBuffer) && !extraBlocks)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Now acquire lock on the new page.
+		 */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		if (extraBlocks)
+		{
+			Page		page;
+			Size		freespace;
+			BlockNumber blockNum;
+
+			page = BufferGetPage(buffer);
+			PageInit(page, BufferGetPageSize(buffer), 0);
+
+			freespace = PageGetHeapFreeSpace(page);
+			MarkBufferDirty(buffer);
+			blockNum = BufferGetBlockNumber(buffer);
+			UnlockReleaseBuffer(buffer);
+			RecordPageWithFreeSpace(relation, blockNum, freespace);
+		}
+
+	} while (extraBlocks--);
 
 	/*
 	 * Release the file-extension lock; it's now OK for someone else to extend
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..26f6b8e 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -203,6 +203,7 @@ typedef struct StdRdOptions
 	AutoVacOpts autovacuum;		/* autovacuum-related options */
 	bool		user_catalog_table;		/* use as an additional catalog
 										 * relation */
+	int			extend_by_blocks;
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10
@@ -239,6 +240,13 @@ typedef struct StdRdOptions
 	((relation)->rd_options ?				\
 	 ((StdRdOptions *) (relation)->rd_options)->user_catalog_table : false)
 
+/*
+ * RelationGetExtendBlocks
+ *		Returns the number of blocks to add to the relation in one extension call.
+ */
+#define RelationGetExtendBlocks(relation) \
+	((relation)->rd_options ? \
+	 ((StdRdOptions *) (relation)->rd_options)->extend_by_blocks : 1)
 
 /*
  * ViewOptions
-- 