On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
> from this because ours can grow to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently
> large shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case, IMHO, is well worth it.

You are right that this might depend heavily on the OS. But you are
only partially right that it's impossible to give the memory back once
you've touched it. It is possible in many cases with additional
measures, that is, with additional control over memory mapping.
Surprisingly, Windows has the most straightforward solution here:
VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags. On
various Unix flavours it is possible to play with the mmap
MAP_NORESERVE flag and the madvise syscall. Finally, it is possible
to repeatedly mmap and munmap portions of a contiguous address
space, providing an explicit addr argument to both calls. The last
option is, of course, susceptible to having that portion of the
address space hijacked by an inadvertent caller of mmap with a NULL
addr argument, but that could probably be avoided by imposing a
disciplined use of mmap in the PostgreSQL core and extensions. A
sketch of the reserve/release idea follows below.
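
To make that concrete, here is a minimal Linux-flavoured sketch of
reserving a huge contiguous region up front without committing pages
and later handing touched pages back to the kernel while keeping the
address range reserved. The sizes, structure, and use of MADV_REMOVE
on a shmem-backed anonymous shared mapping are illustrative only,
not an existing PostgreSQL facility.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define RESERVE_SIZE	((size_t) 64 << 30)		/* 64 GB of address space */
#define CHUNK_SIZE		((size_t) 8 << 20)		/* work in 8 MB chunks */

int
main(void)
{
	/*
	 * MAP_NORESERVE asks the kernel not to account swap for the whole
	 * region; untouched pages cost nothing but address space.
	 */
	char	   *base = mmap(NULL, RESERVE_SIZE, PROT_READ | PROT_WRITE,
							MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE,
							-1, 0);

	if (base == MAP_FAILED)
	{
		perror("mmap");
		return 1;
	}

	/* Touching pages is what actually allocates physical memory. */
	memset(base, 0x7f, CHUNK_SIZE);

	/*
	 * Give the chunk back.  For a shared (shmem-backed) mapping,
	 * MADV_REMOVE punches a hole and frees the pages; MADV_DONTNEED
	 * would only drop this process's page-table entries.
	 */
	if (madvise(base, CHUNK_SIZE, MADV_REMOVE) != 0)
		perror("madvise");

	munmap(base, RESERVE_SIZE);
	return 0;
}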

Thus providing a single contiguous shared address space is doable.
The other question is how much it would buy. In terms of the
development time of an allocator it is a clear win. In terms of
easily passing direct memory pointers between backends it is a clear
win again.

In terms of resulting performance, I don't know. Relative pointers
would cost a few cycles on every step. Suppose you have a shared
hash table: you cannot keep raw pointers in it, you need to store
offsets from the base address, and every reference then involves
additional arithmetic, as in the sketch below. When these things add
up, the net effect might become noticeable.
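
For illustration, a tiny sketch of what every access through a
relative pointer looks like; the names (relptr, relptr_access,
relptr_store) are made up for this example, not taken from any
existing API.

#include <stdint.h>

typedef uint64_t relptr;		/* offset from the segment base address */

/* Turning an offset into a usable pointer costs an extra addition. */
static inline void *
relptr_access(char *base, relptr off)
{
	return (void *) (base + off);
}

/* Storing a pointer costs a subtraction against the base. */
static inline relptr
relptr_store(char *base, void *ptr)
{
	return (relptr) ((char *) ptr - base);
}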

Regards,
Aleksey

