Re: [HACKERS] Dynamic shared memory areas

2017-01-04 Thread Peter Geoghegan
On Tue, Nov 15, 2016 at 5:31 PM, Robert Haas  wrote:
> I think we should develop versions of this that (1) allocate from the
> main shared memory segment and (2) allocate from backend-private
> memory.  Per my previous benchmarking results, allocating from
> backend-private memory would be a substantial win for tuplesort.c
> because this allocator is substantially more memory-efficient for
> large memory contexts than aset.c, and Tomas Vondra tested it out and
> found that it is also faster for logical decoding than the approach he
> proposed.

The approach that I'd prefer to take with tuplesort.c is to have a
buffer for caller tuples that is written to sequentially, and
repalloc()'d as needed, much like the memtuples array. It would be
slightly tricky to make this work when memtuples needs to be a heap
(I'm mostly thinking of top-N heapsorts here). That has perhaps
unbeatable efficiency, while also helping cases with significant
physical/logical correlation in their input, which is pretty common.
Creating an index on a serial PK within pg_restore would probably get
notably faster if we went this way.
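As a rough illustration of the idea, here is a minimal sketch of such a sequentially written, doubling buffer (all names hypothetical; plain malloc()/realloc() stand in for palloc()/repalloc() so it is self-contained):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: caller tuples are packed sequentially into one
 * buffer that is grown by doubling, much like the memtuples array. */
typedef struct TupleBuffer
{
    char   *data;
    size_t  used;
    size_t  capacity;
} TupleBuffer;

static void
tuplebuffer_init(TupleBuffer *buf, size_t initial_size)
{
    buf->data = malloc(initial_size);
    buf->used = 0;
    buf->capacity = initial_size;
}

/* Append one tuple's bytes; returns its offset within the buffer, so
 * a memtuples-style array could refer to it by offset rather than by
 * pointer (offsets survive the realloc()-style growth). */
static size_t
tuplebuffer_append(TupleBuffer *buf, const void *tuple, size_t len)
{
    size_t  offset = buf->used;

    while (buf->used + len > buf->capacity)
    {
        buf->capacity *= 2;
        buf->data = realloc(buf->data, buf->capacity);
    }
    memcpy(buf->data + offset, tuple, len);
    buf->used += len;
    return offset;
}
```

Sequential writes like this are friendly to inputs with strong physical/logical correlation, since neighboring tuples end up adjacent in memory.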

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic shared memory areas

2016-12-05 Thread Robert Haas
On Fri, Dec 2, 2016 at 3:46 PM, Thomas Munro
 wrote:
> Here's a patch to provide the right format string for dsa_pointer to
> printf-like functions, which clears a warning coming from dsa_dump (a
> debugging function) on 32 bit systems.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Thomas Munro
On Sat, Dec 3, 2016 at 9:02 AM, Robert Haas  wrote:
> On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas  wrote:
>> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
>>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>>>  wrote:
>>>> Please find attached dsa-v8.patch, and also a small test module for
>>>> running random allocate/free exercises and dumping the internal
>>>> allocator state.
>>>
>>> OK, I've committed the main patch.
>>
>> ...but the buildfarm isn't very happy about it.
>>
>> tern complains:
>>
>> In file included from dsa.c:58:0:
>> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
>> 'pg_atomic_uint64'
>>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>>
>> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
>> to be true.  And that should always be true unless
>> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
>> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
>> pg_atomic_uint64?  That doesn't seem right.
>
> No, that's not the problem.  Just a garden variety thinko in dsa.h.
> Will push a fix presently.

Here's a patch to provide the right format string for dsa_pointer to
printf-like functions, which clears a warning coming from dsa_dump (a
debugging function) on 32 bit systems.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-dsa-pointer-format.patch
Description: Binary data



Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas
On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas  wrote:
> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>>  wrote:
>>> Please find attached dsa-v8.patch, and also a small test module for
>>> running random allocate/free exercises and dumping the internal
>>> allocator state.
>>
>> OK, I've committed the main patch.
>
> ...but the buildfarm isn't very happy about it.
>
> tern complains:
>
> In file included from dsa.c:58:0:
> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
> 'pg_atomic_uint64'
>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>
> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
> to be true.  And that should always be true unless
> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
> pg_atomic_uint64?  That doesn't seem right.

No, that's not the problem.  Just a garden variety thinko in dsa.h.
Will push a fix presently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas
On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas  wrote:
> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>  wrote:
>> Please find attached dsa-v8.patch, and also a small test module for
>> running random allocate/free exercises and dumping the internal
>> allocator state.
>
> OK, I've committed the main patch.

...but the buildfarm isn't very happy about it.

tern complains:

In file included from dsa.c:58:0:
../../../../src/include/utils/dsa.h:59:1: error: unknown type name
'pg_atomic_uint64'
 typedef pg_atomic_uint64 dsa_pointer_atomic;

...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
to be true.  And that should always be true unless
PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
pg_atomic_uint64?  That doesn't seem right.

The failures on several other BF members appear to be similar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Robert Haas
On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
 wrote:
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

OK, I've committed the main patch.  As far as test-dsa.patch, can we
tie that into make check-world so that committing it delivers some
buildfarm coverage for this code?  Of course, the test settings would
have to be fairly conservative given that some buildfarm machines have
very limited resources, but it still seems worth doing.  test_shm_mq
might provide some useful precedent.

Note that you don't need the prototype if you've already used
PG_FUNCTION_INFO_V1.

I'm not sure that using the same random seed every time is a good
idea.  Maybe you should provide a way to set the seed as part of
starting the test, or to not do that (pass NULL?) and then elog(LOG,
...) the seed that's chosen.  Then if the BF crashes, we can see what
seed was in use for that particular test.
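Something along those lines could look like this sketch (hypothetical helper; the elog(LOG, ...) call is replaced by a return value so the sketch stands alone):

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper for the test module: use the caller's seed if
 * one was passed, otherwise pick one from the clock.  The chosen seed
 * is returned so the caller can log it -- in the real module it would
 * be elog(LOG, ...)'d, making buildfarm crashes reproducible. */
static unsigned int
choose_test_seed(const unsigned int *requested)
{
    unsigned int seed;

    if (requested != NULL)
        seed = *requested;
    else
        seed = (unsigned int) time(NULL);
    srand(seed);
    return seed;
}
```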

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic shared memory areas

2016-12-02 Thread Haribabu Kommi
On Thu, Dec 1, 2016 at 10:33 PM, Thomas Munro  wrote:

>
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.


Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Dynamic shared memory areas

2016-11-29 Thread Robert Haas
More review:

+ * For large objects, we just stick all of the allocations in fullness class
+ * 0. Since we can just return the space directly to the free page manager,
+ * we don't really need them on a list at all, except that if someone wants
+ * to bulk release everything allocated using this BlockAreaContext, we
+ * have no other way of finding them.

This comment is out-of-date.

+   /*
+    * If this is the only span, and there is no active span, then maybe
+    * we should probably move this span to fullness class 1.  (Otherwise
+    * if you allocate exactly all the objects in the only span, it moves
+    * to class 3, then you free them all, it moves to 2, and then is
+    * given back, leaving no active span).
+    */

"maybe we should probably" seems to have one more doubt-expressing
word than it needs.

+   if (size_class == DSA_SCLASS_SPAN_LARGE)
+       /* Large object frees give back segments aggressively already. */
+       continue;

We generally use braces in this kind of case.

+    * Search the fullness class 1 only.  That is where we expect to find

extra "the"

+   /* Call for effect (we don't need the result). */
+   get_segment_by_index(area, index);
...
+   return area->segment_maps[index].mapped_address + offset;

It isn't guaranteed that area->segment_maps[index].mapped_address will
be non-NULL on return from get_segment_by_index, and then this
function will return a completely bogus pointer to the caller.  I
think you should probably elog() instead.

+   elog(ERROR, "dsa: can't attach to area handle %u", handle);

Avoid ":" in elog messages.   You don't really need to - and it isn't
project style to - tag these with "dsa:"; that's what \errverbose or
\set VERBOSITY verbose is for.  In this particular case, I might just
adopt the formulation from parallel.c:

ereport(ERROR,
        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
         errmsg("could not map dynamic shared memory segment")));

+   elog(FATAL,
+        "dsa couldn't find run of pages: fpm_largest out of sync");

Here I'd go with "dsa could not find %u free pages".

+   elog(ERROR, "dsa_pin: area already pinned");

"dsa_area already pinned"

+   elog(ERROR, "dsa_unpin: area not pinned");

"dsa_area not pinned"

+   if (segment == NULL)
+   elog(ERROR, "dsa: can't attach to segment");

As above.

+static dsa_segment_map *
+get_segment_by_index(dsa_area *area, dsa_segment_index index)
+{
+   if (unlikely(area->segment_maps[index].mapped_address == NULL))
+   {
+   dsm_handle  handle;
+   dsm_segment *segment;
+   dsa_segment_map *segment_map;
+
+   handle = area->control->segment_handles[index];

Don't you need to acquire the lock for this?

+   /* Check all currently mapped segments to find what's been freed. */
+   for (i = 0; i <= area->high_segment_index; ++i)
+   {
+   if (area->segment_maps[i].header != NULL &&
+   area->segment_maps[i].header->freed)
+   {
+   dsm_detach(area->segment_maps[i].segment);
+   area->segment_maps[i].segment = NULL;
+   area->segment_maps[i].header = NULL;
+   area->segment_maps[i].mapped_address = NULL;
+   }
+   }
+   area->freed_segment_counter = freed_segment_counter;

And this?

+/*
+ * Release a DSA area that was produced by dsa_create_in_place or
+ * dsa_attach_in_place.  It is preferable to use one of the 'dsa_on_XXX'
+ * callbacks so that this is managed automatically, because failure to release
+ * an area created in-place leaks its segments permanently.
+ */
+void
+dsa_release_in_place(void *place)
+{
+   decrement_reference_count((dsa_area_control *) place);
+}

Since this seems to be the only caller of decrement_reference_count,
you could just put the logic here.  The contract for this function is
also a bit unclear from the header comment.  I initially thought that
it was your intention that this should be called from every process
that has either created or attached the segment.  But that doesn't
seem like it will work, because decrement_reference_count calls
dsm_unpin_segment on every segment, and a segment can only be pinned
once, so everybody else would fail.  So maybe the idea is that ANY ONE
process has to call dsa_release_in_place.  But then that could lead to
failures in other backends inside get_segment_by_index(), because
segments they don't have mapped might already be gone.  OK, third try:
maybe the idea is that the LAST proce

Re: [HACKERS] Dynamic shared memory areas

2016-11-28 Thread Robert Haas
On Wed, Nov 23, 2016 at 7:07 AM, Thomas Munro
 wrote:
> Those let you create an area in existing memory (in a DSM segment,
> traditional inherited shmem).  The in-place versions will still create
> DSM segments on demand as required, though I suppose if you wanted to
> prevent that you could with dsa_set_size_limit(area, size).  One
> complication is that of course the automatic detach feature doesn't
> work if you're in some random piece of memory.  I have exposed
> dsa_on_dsm_detach, so that there is a way to hook it up to the detach
> hook for a pre-existing DSM segment, but that's the caller's
> responsibility.

shm_mq_attach() made the opposite decision about how to solve this
problem, and frankly I think that API is a lot more convenient: if the
first argument to shm_mq_attach() happens to be located inside of a
DSM, you can pass the DSM as the second argument and it registers the
on_dsm_detach() hook for you.  If not, you can pass NULL and deal with
it in some other way.  But this makes the common case very simple.
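The convention being described could be sketched generically like this (all types and names hypothetical, not the actual shm_mq or dsm API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DSM segment with a detach-hook slot. */
typedef struct Region
{
    void  (*on_detach)(void *arg);
    void   *arg;
} Region;

static void
release_object(void *arg)
{
    (void) arg;                 /* cleanup would happen here */
}

/* attach() in the shm_mq_attach() style: if the object happens to live
 * inside a managed region, the caller passes that region and the
 * detach hook gets registered automatically; otherwise the caller
 * passes NULL and must arrange cleanup some other way.  The common
 * case stays a one-liner for the caller. */
static void *
attach(void *object, Region *region)
{
    if (region != NULL)
    {
        region->on_detach = release_object;
        region->arg = object;
    }
    return object;
}
```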

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic shared memory areas

2016-11-15 Thread Robert Haas
On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro
 wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
>  wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
>>  wrote:
>>> [dsa-v3.patch]
>>
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to 
>> dsa_free.
>
> Here is a new version that fixes a bug I discovered in freepage.c today.
>
> Details:  When dsa_free decides to give a whole superblock back
> to the free page manager for a segment with FreePageManagerPut, and
> there was already exactly one span of exactly one free page in that
> segment, and the span being 'put' is not adjacent to that existing
> free page, then the singleton format must be converted to a btree with
> the existing page as root and the newly put span as the sole leaf.
> But in that special case we forgot to add the newly put span to the
> appropriate free list.  Not only did we lose track of it, but a future
> call to FreePageManagerPut might try to merge it with another adjacent
> span, which will try to manipulate the freelist that it expects it to
> be in and blow up.  The fix is just to add a call to
> FreePagePushSpanLeader in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it. And
I'd like to get it committed, because other parallelism work depends
on it (bitmap heap scan and parallel hash join in particular), and
because it's got other uses as well.  However, I don't want to be
perceived as slamming my code (or that of my colleagues) into the tree
without due opportunity for other people to provide feedback, so if
anyone has questions, comments, concerns, or review to offer, please
do.

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c) but it feels a bit schizophrenic to have half of
the code in storage/ipc and the other half in utils/mmgr.  I guess my
view is that utils/mmgr is a better fit, because I think that this is
basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view. Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation at https://gcc.gnu.org/onlinedocs/ seems to suggest
that __builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.
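For what it's worth, the combination in question can be sketched like this (both are GCC extensions; illustrative only, not the actual relptr.h macro):

```c
#include <assert.h>

/* Sketch of a compile-time type check built from __typeof__ plus
 * __builtin_types_compatible_p.  The point above is that a compiler
 * might provide only the former: pre-3.1 GCC had __typeof__ but not
 * __builtin_types_compatible_p, and some non-GCC compilers may be in
 * the same position. */
#define types_compatible(a, b) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* Returns 1 iff the builtins distinguish int from long as expected. */
static int
check_types(void)
{
    int     i = 0;
    long    l = 0;

    return types_compatible(i, i) && !types_compatible(i, l);
}
```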

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a dsa_handle.
Actually, I think dsa_handle goes away: it becomes the caller's
responsibility to

Re: [HACKERS] Dynamic shared memory areas

2016-10-05 Thread Dilip Kumar
On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro
 wrote:
> Here's a new version that does that.

While testing this patch, I found an issue:

+ total_size = DSA_INITIAL_SEGMENT_SIZE;
+ total_pages = total_size / FPM_PAGE_SIZE;
+ metadata_bytes =
+ MAXALIGN(sizeof(dsa_area_control)) +
+ MAXALIGN(sizeof(FreePageManager)) +
+ total_pages * sizeof(dsa_pointer);
+ /* Add padding up to next page boundary. */
+ if (metadata_bytes % FPM_PAGE_SIZE != 0)
+ metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+ usable_pages =
+ (total_size - metadata_bytes) / FPM_PAGE_SIZE;

+ segment = dsm_create(total_size, 0);
+ dsm_pin_segment(segment);

Actually the problem is that the size of dsa_area_control is bigger than
DSA_INITIAL_SEGMENT_SIZE, but we are allocating a segment of only
DSA_INITIAL_SEGMENT_SIZE bytes.

(gdb) p sizeof(dsa_area_control)
$8 = 67111000
(gdb) p DSA_INITIAL_SEGMENT_SIZE
$9 = 1048576

In dsa-v1 the problem did not exist because DSA_MAX_SEGMENTS was 1024,
but in dsa-v2 I think it is calculated wrongly.

(gdb) p DSA_MAX_SEGMENTS
$10 = 16777216
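For what it's worth, the numbers line up if dsa_area_control contains a 4-byte dsm_handle for every possible segment (an assumption about the struct layout, but consistent with the gdb output above):

```c
#include <assert.h>

/* Back-of-envelope check: a per-segment array of 4-byte handles alone
 * accounts for 16777216 * 4 = 67108864 bytes -- nearly all of the
 * 67111000 bytes reported above, and far more than the 1048576-byte
 * initial segment. */
static long
handle_array_bytes(long max_segments)
{
    return max_segments * 4L;
}
```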

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

