Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-24 Thread Simon Riggs
On 24 September 2017 at 21:32, Tomas Vondra wrote:

> Attached is an updated version of the patch, tweaking the comments.

That looks good, thanks. Marking Ready for Committer to give notice
before commit.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-24 Thread Tomas Vondra
Hi,

Attached is an updated version of the patch, tweaking the comments.

1) I've added a section at the end of src/backend/utils/mmgr/README,
briefly explaining the alternative memory allocators we have. I don't
think we should get into too much low-level detail here; that's more
appropriate for each context's .c file.

2) I've slightly reworded a paragraph in generation.c describing what
use cases are suitable for the memory context (a short standalone sketch
illustrating the idea follows at the end of this list). It used to say:

   This memory context is based on the assumption that the allocated
   chunks have similar lifespan, i.e. that chunks allocated close from
   each other (by time) will also be freed in close proximity, and
   mostly in the same order. This is typical for various queue-like use
   cases, i.e. when tuples are constructed, processed and then thrown
   away.

and now it says:

   This memory context is based on the assumption that the chunks are
   freed roughly in the same order as they were allocated (FIFO), or in
   groups with similar lifespan (generations - hence the name of the
   context). This is typical for various queue-like use cases, i.e. when
   tuples are constructed, processed and then thrown away.

3) I've also added a brief note into reorderbuffer.c mentioning that it
uses SlabContext and GenerationContext. As I explained, I don't think we
should include details about how we tested the patch or whatever here.
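
To make the FIFO / generations assumption a bit more tangible, here is a
minimal standalone sketch of the per-block bookkeeping described in the
commit message below. All the names are made up, and alignment, error
handling and oversized requests are glossed over - this is not the actual
generation.c code, just the core idea:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (8 * 1024)

typedef struct GenBlock
{
    struct GenBlock *next;      /* live blocks, newest first */
    size_t      nchunks;        /* chunks handed out from this block */
    size_t      nfree;          /* chunks returned so far */
    char       *freeptr;        /* next allocation position */
    char       *endptr;         /* end of this block's storage */
} GenBlock;

typedef struct GenChunk
{
    GenBlock   *block;          /* back-pointer so gen_free() finds the block */
} GenChunk;

static GenBlock *blocks = NULL;

static void *
gen_alloc(size_t size)
{
    size_t      needed = sizeof(GenChunk) + size;   /* alignment ignored */
    GenChunk   *chunk;

    assert(needed <= BLOCK_SIZE - sizeof(GenBlock));    /* no oversized chunks */

    /* open a fresh block when the newest one can't fit the request */
    if (blocks == NULL || blocks->freeptr + needed > blocks->endptr)
    {
        GenBlock   *block = malloc(BLOCK_SIZE); /* error checking omitted */

        block->next = blocks;
        block->nchunks = 0;
        block->nfree = 0;
        block->freeptr = (char *) block + sizeof(GenBlock);
        block->endptr = (char *) block + BLOCK_SIZE;
        blocks = block;
    }

    chunk = (GenChunk *) blocks->freeptr;
    chunk->block = blocks;
    blocks->freeptr += needed;
    blocks->nchunks++;

    return (char *) chunk + sizeof(GenChunk);
}

static void
gen_free(void *ptr)
{
    GenChunk   *chunk = (GenChunk *) ((char *) ptr - sizeof(GenChunk));
    GenBlock   *block = chunk->block;
    GenBlock  **link;

    /* the freed space is never reused, we only count the free ... */
    block->nfree++;

    /* ... and once all chunks are freed, the whole block goes at once */
    if (block->nfree == block->nchunks)
    {
        for (link = &blocks; *link != block; link = &(*link)->next)
            ;
        *link = block->next;
        free(block);
    }
}

int
main(void)
{
    void       *p[4];
    int         i;

    for (i = 0; i < 4; i++)
        p[i] = gen_alloc(64);   /* allocated together ... */
    for (i = 0; i < 4; i++)
        gen_free(p[i]);         /* ... freed together, in FIFO order */

    printf("all blocks released: %s\n", blocks == NULL ? "yes" : "no");
    return 0;
}

The point is that gen_free() never builds a freelist - it only counts, so
both allocation and freeing stay close to plain pointer arithmetic.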

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 25806c68a05287f3294f2d8508bd45599232f67b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 24 Sep 2017 22:19:17 +0200
Subject: [PATCH] Generational memory allocator

This memory context is based on the assumption that the allocated chunks
have similar lifespan, i.e. that chunks allocated close to each other
(in time) will also be freed in close proximity, and mostly in the same
order. This is typical for various queue-like use cases, i.e. when
tuples are constructed, processed and then thrown away.

The memory context uses a very simple approach to free space management.
Instead of a complex global freelist, each block tracks a number
of allocated and freed chunks. The space released by freed chunks is not
reused, and once all chunks are freed (i.e. when nallocated == nfreed),
the whole block is thrown away. When the allocated chunks have similar
lifespan, this works very well and is extremely cheap.
---
 src/backend/replication/logical/reorderbuffer.c |  80 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/README   |  23 +
 src/backend/utils/mmgr/generation.c | 768 
 src/include/nodes/memnodes.h|   4 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   5 +
 8 files changed, 819 insertions(+), 79 deletions(-)
 create mode 100644 src/backend/utils/mmgr/generation.c

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..dc0ad5b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -43,6 +43,12 @@
  *	  transaction there will be no other data carrying records between a row's
  *	  toast chunks and the row data itself. See ReorderBufferToast* for
  *	  details.
+ *
+ *	  ReorderBuffer uses two special memory context types - SlabContext for
+ *	  allocations of fixed-length structures (changes and transactions), and
+ *	  GenerationContext for the variable-length transaction data (allocated
+ *	  and freed in groups with similar lifespan).
+ *
  * -
  */
 #include "postgres.h"
@@ -150,15 +156,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;	/* ~8MB */
-
 /* ---
  * primary reorderbuffer support routines
  * ---
@@ -248,6 +245,10 @@ ReorderBufferAllocate(void)
 			SLAB_DEFAULT_BLOCK_SIZE,
 			sizeof(ReorderBufferTXN));
 
+	buffer->tup_context = GenerationContextCreate(new_ctx,
+		   "Tuples",
+		   SLAB_LARGE_BLOCK_SIZE);
+
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
 	hash_ctl.hcxt = buffer->context;
@@ -258,15 +259,12 @@ ReorderBufferAllocate(void)
 	

Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-14 Thread Tomas Vondra
On 09/14/2017 04:21 PM, Simon Riggs wrote:
> On 14 August 2017 at 01:35, Tomas Vondra  wrote:
>> Hi,
>>
>> Attached is a rebased version of the Generational context, originally
>> submitted with SlabContext (which was already committed into Pg 10).
>>
>> The main change is that I've abandoned the pattern of defining a Data
>> structure and then a pointer typedef, i.e.
>>
>> typedef struct GenerationContextData { ... } GenerationContextData;
>> typedef struct GenerationContextData *GenerationContext;
>>
>> Now it's just
>>
>> typedef struct GenerationContext { ... } GenerationContext;
>>
>> mostly because SlabContext was committed like that, and because Andres was
>> complaining about this code pattern ;-)
>>
>> Otherwise the design is the same as repeatedly discussed before.
>>
>> To show that this is still a valuable change (even after SlabContext and
>> adding a doubly-linked list to AllocSet), I've repeated the test done by
>> Andres in [1] using the test case described in [2], that is
>>
>>   -- generate data
>>   SELECT COUNT(*) FROM (SELECT test1()
>>   FROM generate_series(1, 5)) foo;
>>
>>   -- benchmark (measure time and VmPeak)
>>   SELECT COUNT(*) FROM (SELECT *
>>   FROM pg_logical_slot_get_changes('test', NULL,
>> NULL, 'include-xids', '0')) foo;
>>
>> with different values passed to the first step (instead of the 5). The
>> VmPeak numbers look like this:
>>
>>   N        master      patched
>>  ------------------------------
>>  10    1155220 kB    361604 kB
>>  20    2020668 kB    434060 kB
>>  30    2890236 kB    502452 kB
>>  40    3751592 kB    570816 kB
>>  50    4621124 kB    639168 kB
>>
>> and the timing (on assert-enabled build):
>>
>>   N         master       patched
>>  --------------------------------
>>  10    1103.182 ms    412.734 ms
>>  20    2216.711 ms    820.438 ms
>>  30    3320.095 ms   1223.576 ms
>>  40    4584.919 ms   1621.261 ms
>>  50    5590.444 ms   2113.820 ms
>>
>> So it seems it's still a significant improvement, both in terms of memory
>> usage and timing. Admittedly, this is a single test, so ideas of other
>> useful test cases are welcome.
> 
> This all looks good.
> 
> What I think this needs is changes to
>    src/backend/utils/mmgr/README
> which describe the various options that now exist (normal?, slab) and
> will exist (generational)
> 
> Don't really care about the name, as long as it's clear when to use it
> and when not to use it.
> 
> This description of generational seems wrong: "When the allocated
> chunks have similar lifespan, this works very well and is extremely
> cheap."
> They don't need the same lifespan; they just need to be freed in groups,
> in the order they were allocated.
> 

Agreed, it should be described differently. What matters is the (mostly)
FIFO pattern of the palloc/pfree requests, which is what allows us to
release whole blocks of memory at once.
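
As a purely illustrative example of that pattern (assuming the patch is
applied; fifo_pattern_example, the context name and the sizes are all
made up for the sketch):

#include "postgres.h"

#include "utils/memutils.h"

/*
 * A queue-like lifecycle matching the FIFO assumption: chunks are freed
 * in roughly the order they were allocated, so blocks empty out and can
 * be released wholesale, with no per-chunk freelist management.
 */
static void
fifo_pattern_example(void)
{
    MemoryContext gen;
    MemoryContext oldctx;
    char       *items[64];
    int         i;

    gen = GenerationContextCreate(CurrentMemoryContext,
                                  "fifo example",
                                  SLAB_LARGE_BLOCK_SIZE);
    oldctx = MemoryContextSwitchTo(gen);

    for (i = 0; i < 64; i++)
        items[i] = palloc(128);     /* allocated in order ... */

    for (i = 0; i < 64; i++)
        pfree(items[i]);            /* ... and freed in (roughly) the same order */

    MemoryContextSwitchTo(oldctx);
    MemoryContextDelete(gen);
}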

>
> For this patch specifically, we need additional comments in 
> reorderbuffer.c to describe the memory allocation pattern in that 
> module so that it is clear that the choice of allocator is useful
> and appropriate, possibly with details of how that testing was
> performed, so it can be re-tested later or tested on a variety of
> platforms.
> 

Including details about the testing in reorderbuffer.c does not seem
very attractive to me; I don't recall any other place describing the
tests in that level of detail. Why not discuss the details here, and
then include a link to this thread in the commit message?

In any case, I doubt anyone will repeat the testing on a variety of
platforms (and I don't have any such comprehensive test suite anyway).
What will likely happen is someone bumping into a poorly performing
corner case; we will then analyze and fix it as usual.

>
> Particularly in reorderbuffer, surely we will almost immediately
> reuse chunks again, so is it worth issuing free() and then malloc()
> again soon after? Does that cause additional overhead we could also
> avoid? Could we possibly keep the last/few free'd chunks around
> rather than re-malloc?
> 

I haven't seen anything like that in the tests I've done. The malloc/free
overhead is negligible, as our allocators significantly reduce the number
of calls to those functions. If we do happen to run into such a case, it
shouldn't be difficult to keep a few empty blocks around instead of
releasing them. But perhaps we can leave that as a future optimization.
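
Just to sketch what that might look like (reusing the GenBlock and
BLOCK_SIZE definitions from the standalone sketch earlier in this
thread - again, one possible shape of a future optimization, not part
of the patch):

#define MAX_CACHED_BLOCKS 4

static GenBlock *cached_blocks = NULL;  /* small stash of emptied blocks */
static int  ncached = 0;

/* grab a block, preferring a cached one over calling malloc() again */
static GenBlock *
block_get(void)
{
    GenBlock   *block = cached_blocks;

    if (block != NULL)
    {
        cached_blocks = block->next;
        ncached--;
    }
    else
        block = malloc(BLOCK_SIZE); /* error checking omitted */

    return block;
}

/* instead of free()ing an emptied block right away, keep a few around */
static void
block_put(GenBlock *block)
{
    if (ncached < MAX_CACHED_BLOCKS)
    {
        block->next = cached_blocks;
        cached_blocks = block;
        ncached++;
    }
    else
        free(block);                /* stash is full, really release it */
}

Whether the extra bookkeeping pays off would need the kind of corner
case described above to actually show up in practice.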

>
> Seems like we should commit this soon.
> 

Thanks.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-14 Thread Simon Riggs
On 14 August 2017 at 01:35, Tomas Vondra  wrote:
> Hi,
>
> Attached is a rebased version of the Generational context, originally
> submitted with SlabContext (which was already committed into Pg 10).
>
> The main change is that I've abandoned the pattern of defining a Data
> structure and then a pointer typedef, i.e.
>
> typedef struct GenerationContextData { ... } GenerationContextData;
> typedef struct GenerationContextData *GenerationContext;
>
> Now it's just
>
> typedef struct GenerationContext { ... } GenerationContext;
>
> mostly because SlabContext was committed like that, and because Andres was
> complaining about this code pattern ;-)
>
> Otherwise the design is the same as repeatedly discussed before.
>
> To show that this is still a valuable change (even after SlabContext and
> adding a doubly-linked list to AllocSet), I've repeated the test done by
> Andres in [1] using the test case described in [2], that is
>
>   -- generate data
>   SELECT COUNT(*) FROM (SELECT test1()
>   FROM generate_series(1, 5)) foo;
>
>   -- benchmark (measure time and VmPeak)
>   SELECT COUNT(*) FROM (SELECT *
>   FROM pg_logical_slot_get_changes('test', NULL,
> NULL, 'include-xids', '0')) foo;
>
> with different values passed to the first step (instead of the 5). The
> VmPeak numbers look like this:
>
>   N        master      patched
>  ------------------------------
>  10    1155220 kB    361604 kB
>  20    2020668 kB    434060 kB
>  30    2890236 kB    502452 kB
>  40    3751592 kB    570816 kB
>  50    4621124 kB    639168 kB
>
> and the timing (on assert-enabled build):
>
>   N         master       patched
>  --------------------------------
>  10    1103.182 ms    412.734 ms
>  20    2216.711 ms    820.438 ms
>  30    3320.095 ms   1223.576 ms
>  40    4584.919 ms   1621.261 ms
>  50    5590.444 ms   2113.820 ms
>
> So it seems it's still a significant improvement, both in terms of memory
> usage and timing. Admittedly, this is a single test, so ideas of other
> useful test cases are welcome.

This all looks good.

What I think this needs is changes to
   src/backend/utils/mmgr/README
which describe the various options that now exist (normal?, slab) and
will exist (generational)

Don't really care about the name, as long as it's clear when to use it
and when not to use it.

This description of generational seems wrong: "When the allocated
chunks have similar lifespan, this works very well and is extremely
cheap."
They don't need the same lifespan; they just need to be freed in groups,
in the order they were allocated.

For this patch specifically, we need additional comments in
reorderbuffer.c to describe the memory allocation pattern in that
module so that it is clear that the choice of allocator is useful and
appropriate, possibly with details of how that testing was performed,
so it can be re-tested later or tested on a variety of platforms.

Particularly in reorderbuffer, surely we will almost immediately reuse
chunks again, so is it worth issuing free() and then malloc() again
soon after? Does that cause additional overhead we could also avoid?
Could we possibly keep the last/few free'd chunks around rather than
re-malloc?

Seems like we should commit this soon.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-08-13 Thread Tomas Vondra

Hi,

Attached is a rebased version of the Generational context, originally 
submitted with SlabContext (which was already committed into Pg 10).


The main change is that I've abandoned the pattern of defining a Data 
structure and then a pointer typedef, i.e.


typedef struct GenerationContextData { ... } GenerationContextData;
typedef struct GenerationContextData *GenerationContext;

Now it's just

typedef struct GenerationContext { ... } GenerationContext;

mostly because SlabContext was committed like that, and because Andres 
was complaining about this code pattern ;-)


Otherwise the design is the same as repeatedly discussed before.

To show that this is still a valuable change (even after SlabContext and
adding a doubly-linked list to AllocSet), I've repeated the test done by
Andres in [1] using the test case described in [2], that is


  -- generate data
  SELECT COUNT(*) FROM (SELECT test1()
  FROM generate_series(1, 5)) foo;

  -- benchmark (measure time and VmPeak)
  SELECT COUNT(*) FROM (SELECT *
  FROM pg_logical_slot_get_changes('test', NULL,
NULL, 'include-xids', '0')) foo;

with different values passed to the first step (instead of the 5). 
The VmPeak numbers look like this:


  N        master      patched
 ------------------------------
 10    1155220 kB    361604 kB
 20    2020668 kB    434060 kB
 30    2890236 kB    502452 kB
 40    3751592 kB    570816 kB
 50    4621124 kB    639168 kB

and the timing (on assert-enabled build):

  N         master       patched
 --------------------------------
 10    1103.182 ms    412.734 ms
 20    2216.711 ms    820.438 ms
 30    3320.095 ms   1223.576 ms
 40    4584.919 ms   1621.261 ms
 50    5590.444 ms   2113.820 ms

So it seems it's still a significant improvement, both in terms of 
memory usage and timing. Admittedly, this is a single test, so ideas of 
other useful test cases are welcome.


regards


[1] 
https://www.postgresql.org/message-id/20170227111732.vrx5v72ighehwpkf%40alap3.anarazel.de


[2] 
https://www.postgresql.org/message-id/20160706185502.1426.28143%40wrigleys.postgresql.org


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1c46d25ffa9bb104c415cba7c7b3a013958b6ab5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 14 Aug 2017 01:52:50 +0200
Subject: [PATCH] Generational memory allocator

This memory context is based on the assumption that the allocated chunks
have similar lifespan, i.e. that chunks allocated close to each other
(in time) will also be freed in close proximity, and mostly in the same
order. This is typical for various queue-like use cases, i.e. when
tuples are constructed, processed and then thrown away.

The memory context uses a very simple approach to free space management.
Instead of a complex global freelist, each block tracks a number
of allocated and freed chunks. The space released by freed chunks is not
reused, and once all chunks are freed (i.e. when nallocated == nfreed),
the whole block is thrown away. When the allocated chunks have similar
lifespan, this works very well and is extremely cheap.
---
 src/backend/replication/logical/reorderbuffer.c |  74 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/generation.c | 768 
 src/include/nodes/memnodes.h|   4 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   5 +
 7 files changed, 790 insertions(+), 79 deletions(-)
 create mode 100644 src/backend/utils/mmgr/generation.c

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5567bee..5309170 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -150,15 +150,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;	/* ~8MB */
-
 /* ---
  * primary reorderbuffer support routines
  * ---
@@ -248,6 +239,10 @@ ReorderBufferAllocate(void)
 			SLAB_DEFAULT_BLOCK_SIZE,
 			sizeof(ReorderBufferTXN));
 
+