Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

knizhnik Sun, 05 Jan 2014 10:30:03 -0800

From my point of view, it is not a big problem that LWLocks cannot be placed in DSM. I can allocate LWLocks in the standard way, using RequestAddinLWLocks, and use them for synchronization.

Concerning support for huge pages: I do not think it should involve anything more than setting the MAP_HUGETLB flag. Reserving the corresponding number of huge pages should be done by the system administrator.
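
To make the MAP_HUGETLB point concrete, here is a minimal Linux-only sketch in plain C. It is not PostgreSQL code; the function name alloc_shared_region and its used_huge out-parameter are hypothetical, chosen for illustration. It asks for a huge-page mapping first and falls back to normal pages when the administrator has not reserved any.

```c
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS and MAP_HUGETLB on Linux */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous shared region of `size` bytes.
 * Tries huge pages first; if that fails (for example, no huge pages are
 * reserved), retries with normal pages. Sets *used_huge to 1 or 0.
 * Returns NULL if both attempts fail.
 */
void *alloc_shared_region(size_t size, int *used_huge)
{
    void *p = MAP_FAILED;

    *used_huge = 0;
#ifdef MAP_HUGETLB
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        *used_huge = 1;
#endif
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}
```

With MAP_HUGETLB the requested size should be a multiple of the huge page size (commonly 2 MB on x86-64), so a real allocator would have to round the request up accordingly.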

What I still do not completely understand is how DSM ensures that a segment created by one PostgreSQL process is mapped at the same virtual memory address in all other PostgreSQL processes. As far as I understand, right now (with standard PostgreSQL shared memory segments) this is ensured by fork(): shared memory segments are allocated in one process, and all other processes are forked from it, inheriting these segments.

But if a new DSM segment is allocated during execution of some query, then we have to add it to the virtual address space of all PostgreSQL processes. Even if we somehow notify them all about the new segment, there is absolutely no guarantee that all of them can map it at the specified address (which may already be used by some other shared object). Or maybe DSM doesn't guarantee that a DSM segment is mapped at the same address in all processes? In that case it significantly complicates DSM usage: it will not be possible to use direct pointers.
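
To illustrate what "no direct pointers" would mean in practice, here is a minimal sketch in plain POSIX C. It does not use the PostgreSQL DSM API; the Node layout and the helper names node_at and sum_via_second_mapping are hypothetical. One backing file is mapped twice at different addresses, standing in for two processes, and the list links are stored as byte offsets from the segment base so that they remain valid in either mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical node: the link is an offset from the segment base, not a pointer. */
typedef struct {
    size_t next_off;            /* offset of the next node; 0 marks the end */
    int    value;
} Node;

/* Translate an offset into a pointer that is valid for this particular mapping. */
static Node *node_at(void *base, size_t off)
{
    return off ? (Node *) ((char *) base + off) : NULL;
}

/*
 * Build a three-node list through mapping `a`, then walk it through
 * mapping `b`. Returns the sum of the node values, or -1 on failure.
 */
int sum_via_second_mapping(void)
{
    const size_t size = 4096;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t) size) != 0) {
        fclose(f);
        return -1;
    }

    /* Two shared mappings of the same bytes, at two different addresses. */
    void *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, size);
        if (b != MAP_FAILED) munmap(b, size);
        fclose(f);
        return -1;
    }
    assert(a != b);

    /* The head offset lives at offset 0; nodes live at fixed aligned offsets. */
    const size_t offs[3] = {64, 128, 192};
    const int    vals[3] = {10, 20, 30};
    *(size_t *) a = offs[0];
    for (int i = 0; i < 3; i++) {
        Node *n = node_at(a, offs[i]);
        n->value = vals[i];
        n->next_off = (i < 2) ? offs[i + 1] : 0;
    }

    /* Walk the list through the other mapping; the offsets still resolve. */
    int sum = 0;
    for (Node *n = node_at(b, *(size_t *) b); n != NULL; n = node_at(b, n->next_off))
        sum += n->value;

    munmap(a, size);
    munmap(b, size);
    fclose(f);
    return sum;
}
```

Had the nodes stored raw Node pointers obtained through mapping `a`, a process that mapped the segment at a different address could not safely follow them. The cost of offsets is one extra addition per dereference.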

Can you please clarify how dynamically allocated DSM segments will be shared by all PostgreSQL processes?



On 01/05/2014 08:50 PM, Robert Haas wrote:

On Sat, Jan 4, 2014 at 3:27 PM, knizhnik <knizh...@garret.ru> wrote:

1. I want IMCS to work with PostgreSQL versions not supporting DSM (dynamic
shared memory), like 9.2, 9.3.1,...

Yeah.  If it's loaded at postmaster start time, then it can work with
any version.  On 9.4+, you could possibly make it work even if it's
loaded on the fly by using the dynamic shared memory facilities.
However, there are currently some limitations to those facilities that
make some things you might want to do tricky.  There are pending
patches to lift some of these limitations.

2. IMCS is using PostgreSQL hash table implementation (ShmemInitHash,
hash_search,...)
Maybe I missed something - I just noticed DSM and have had no chance to
investigate it, but it looks like a hash table cannot be allocated in DSM...

It wouldn't be very difficult to write an analog of ShmemInitHash() on
top of the dsm_toc patch that is currently pending.  A problem,
though, is that it's not currently possible to put LWLocks in dynamic
shared memory, and even spinlocks will be problematic if
--disable-spinlocks is used.  I'm due to write a post about these
problems; perhaps I should go do that.

3. IMCS is allocating memory using ShmemAlloc. In case of using DSM I have
to provide own allocator (although creation of non-releasing memory
allocator should not be a big issue).

The dsm_toc infrastructure would solve this problem.

4. The current implementation of DSM still suffers from the 256GB problem. Certainly
I can create multiple segments and so provide workaround without using huge
pages, but it complicates allocator.

So it sounds like DSM should also support huge pages somehow.  I'm not
sure what that should look like.

5. I wonder if I dynamically add new DSM segment - will it be available for
other PostgreSQL processes? For example I run query which loads data in IMCS
and so needs more space and allocates new DSM segment. Then another query is
executed by other PostgreSQL process which tries to access this data. This
process is not forked from the process that created this new DSM segment, so I do
not understand how this segment will be mapped to the address space of this
process, preserving address... Certainly I can prohibit dynamic extension of
IMCS storage (hoping that in this case there will be no such problem with
DSM). But in this case we will lose the main advantage of using DSM instead
of the old scheme of plugin-private shared memory.

You can definitely dynamically add a new DSM segment; that's the point
of making it *dynamic* shared memory.  What's a bit tricky as things
stand today is making sure that it sticks around.  The current model
is that the DSM segment is destroyed when the last process unmaps it.
It would be easy enough to lift that limitation on systems other than
Windows; we could just add a dsm_keep_until_shutdown() API or
something similar.  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

6. IMCS has some configuration parameters which have to be set through
postgresql.conf. So in any case the user has to edit the postgresql.conf file.
In case of using DSM it will not be necessary to add IMCS to
shared_preload_libraries list. But I do not think that it is so restrictive
and critical requirement, is it?

I don't really see a problem here.  One of the purposes of dynamic
shared memory (and dynamic background workers) is precisely that you
don't *necessarily* need to put extensions that use shared memory in
shared_preload_libraries - or in other words, you can add the
extension to a running server without restarting it.  If you know in
advance that you will want it, you probably still *want* to put it in
shared_preload_libraries, but part of the idea is that we can get away
from requiring that.




