Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Sat, Jun 18, 2016 at 3:43 AM, Tom Lane wrote:
> DSM already exists, and for many purposes its lack of a within-a-shmem-segment dynamic allocator is irrelevant; the same purpose is served (with more speed, more reliability, and less code) by releasing the whole DSM segment when no longer needed. The DSM segment effectively acts like a memory context, saving code from having to account precisely for every single allocation it makes.
>
> I grant that having a dynamic allocator added to DSM will support even more use-cases. What I'm not convinced of is that we need a dynamic allocator within the fixed-size shmem segment. Robert already listed some reasons why that's rather dubious, but I'll add one more: any leak becomes a really serious bug, because the only way to recover the space is to restart the whole database instance.

Okay, if you are saying that DSM segments work best for accumulating transient data that can all be freed at once when it becomes unnecessary, then I agree with that. My code is for long-lived data that is allocated and freed chunk by chunk, as when an extension wants to store more data, in a more complicated fashion, than fits into an ordinary dynahash created with the HASH_SHARED_MEM flag.

Regards,
Aleksey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleksey Demakov <adema...@gmail.com> writes:
>> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> In my opinion, that's not going to fly. If I thought otherwise, I would not have developed the DSM facility in the first place.
>
>> Essentially this is pessimizing for the lowest common denominator among OSes.
>
> You're right, but that doesn't mean that the community is going to take much interest in an unportable replacement for code that already exists.

Excuse me, what code already exists? As far as I understand, we are comparing the approach taken in my code against Robert's code, which is not yet available to the community. Discussing DSM is beside the point.

My code can be smoothly hooked into the existing system from an extension module with just a couple of calls: RequestAddinShmemSpace() and ShmemInitStruct(). After that the extension can use my concurrent memory allocator and safe memory reclamation to implement highly optimized concurrent data structures of its choice, for example the concurrent data structures that I am going to add to the package in the future.

All in all, this is currently not a replacement for anything. It is an experimental add-on and food for thought for interested people. Integrating my code right into the core to replace anything there is a very remote possibility. I understand that if it ever happens it will take very serious work and multiple iterations.

> Especially not an unportable replacement that also needs sweeping assumptions like "disciplined use of mmap in postgresql core and extensions". You don't have to look further than the availability of mmap to plperlu programmers to realize that that won't fly. (Even if we threw all the untrusted PLs overboard, I believe plain old stdio is willing to use mmap in many versions of libc.)

Sorry. I made a sloppy statement about mmap/munmap use.
As Andres Freund correctly pointed out, it is problematic. So the whole line about "disciplined use of mmap in postgresql core and extensions" goes away. Forget it. The other techniques that I mentioned do not require such special discipline, though. The corrected statement is that a single contiguous shared address space is practically doable on many platforms with some effort, and this approach would make the implementation of many shared data structures more efficient.

Furthermore, I'd guess there is not much point in enabling parallel query execution on a MacBook, or at least one wouldn't expect superb results from it anyway. I'd make a wild claim that the users who would benefit most from parallel queries or my concurrency work are largely the same users who run platforms that can support a single address space. So if there is a solution that benefits, say, 95% of the target users, why refrain from it in the name of the other 5%? Shouldn't the support of those 5% be treated as a lower-priority fallback, while the main effort goes into optimizing for the 95-percenters?

Regards,
Aleksey
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Sat, Jun 18, 2016 at 12:31 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
>> Finally, it's possible to repeatedly mmap and munmap portions of a contiguous address space, providing a given addr argument for both calls. The last option is, of course, susceptible to hijacking of that portion of the address space by an inadvertent caller of mmap with a NULL addr argument, but probably this could be avoided by imposing a disciplined use of mmap in postgresql core and extensions.
>
> I don't think that's particularly realistic. malloc() uses mmap(NULL) internally. And you can't portably mmap non-file-backed memory from different processes; you need something like tmpfs-backed / POSIX shared memory for it. On Linux you can do stuff like madvise(MADV_FREE), which kinda helps.

Oops. Agreed.
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
Sorry for the unclear language; late Friday evening in my place is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <adema...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>>> I expect that to be useful for parallel query and anything else where processes need to share variable-size data. However, that's different from this because ours can grow to arbitrary size and shrink again by allocating and freeing with DSM segments. We also do everything with relative pointers since DSM segments can be mapped at different addresses in different processes, whereas this would only work with memory carved out of the main shared memory segment (or some new DSM facility that guaranteed identical placement in every address space).
>>>
>>> I believe it would be perfectly okay to allocate a huge amount of address space with mmap on startup. If the pages are not touched, the OS VM subsystem will not commit them.
>>
>> In my opinion, that's not going to fly. If I thought otherwise, I would not have developed the DSM facility in the first place.
>>
>> First, the behavior in this area is highly dependent on the choice of operating system and configuration parameters. We've had plenty of experience with requiring non-default configuration parameters to run PostgreSQL, and it's all bad. I don't really want to have to tell users that they must run with a particular value of vm.overcommit_memory in order to run the server. Nor do I want to tell users of other operating systems that their ability to run PostgreSQL is dependent on the behavior their OS has in this area. I had a MacBook Pro up until a year or two ago where a sufficiently large shared memory request would cause a kernel panic. That bug will probably be fixed at some point if it hasn't been already, but probably by returning an error rather than making it work.
>>
>> Second, there's no way to give memory back once you've touched it. If you decide to do a hash join on a 250GB inner table using a shared hash table, you're going to have 250GB in swap-backed pages floating around when you're done. If the user has swap configured (and more and more people don't), the operating system will eventually page those out, but until that happens those pages are reducing the amount of page cache that's available, and after it happens they're using up swap. In either case, the space consumed is consumed to no purpose. You don't care about that hash table any more once the query completes; there's just no way to tell the operating system that. If your workload follows an entirely predictable pattern and you always have about the same amount of usage of this facility then you can just reuse the same pages and everything is fine. But if your usage fluctuates I believe it will be a big problem. With DSM, we can and do explicitly free the memory back to the OS as soon as we don't need it any more - and that's a big benefit.
>
> Essentially this is pessimizing for the lowest common denominator among OSes. Having a contiguous address space makes things so much simpler that, IMHO, considering this case is well worth it.
>
> You are right that this may highly depend on the OS. But you are only partially right that it's impossible to give the memory back once you have touched it. It is possible in many cases with additional measures, that is, with additional control over memory mapping. Surprisingly, in this case Windows has the most straightforward solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags. On various Unix flavours it is possible to play with the mmap MAP_NORESERVE flag and the madvise syscall. Finally, it's possible to repeatedly mmap and munmap portions of a contiguous address space, providing a given addr argument for both calls. The last option is, of course, susceptible to hijacking of that portion of the address space by an inadvertent caller of mmap with a NULL addr argument, but probably this could be avoided by imposing a disciplined use of mmap in postgresql core and extensions.
>
> Thus providing a single contiguous shared address space is doable. The other question is how much it would buy. As for the development time of an allocator, it is a clear win. In terms of easily passing direct memory pointers between backends it is a clear win again.
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where processes need to share variable-size data. However, that's different from this because ours can grow to arbitrary size and shrink again by allocating and freeing with DSM segments. We also do everything with relative pointers since DSM segments can be mapped at different addresses in different processes, whereas this would only work with memory carved out of the main shared memory segment (or some new DSM facility that guaranteed identical placement in every address space).
>>
>> I believe it would be perfectly okay to allocate a huge amount of address space with mmap on startup. If the pages are not touched, the OS VM subsystem will not commit them.
>
> In my opinion, that's not going to fly. If I thought otherwise, I would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on the choice of operating system and configuration parameters. We've had plenty of experience with requiring non-default configuration parameters to run PostgreSQL, and it's all bad. I don't really want to have to tell users that they must run with a particular value of vm.overcommit_memory in order to run the server. Nor do I want to tell users of other operating systems that their ability to run PostgreSQL is dependent on the behavior their OS has in this area. I had a MacBook Pro up until a year or two ago where a sufficiently large shared memory request would cause a kernel panic. That bug will probably be fixed at some point if it hasn't been already, but probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it. If you decide to do a hash join on a 250GB inner table using a shared hash table, you're going to have 250GB in swap-backed pages floating around when you're done. If the user has swap configured (and more and more people don't), the operating system will eventually page those out, but until that happens those pages are reducing the amount of page cache that's available, and after it happens they're using up swap. In either case, the space consumed is consumed to no purpose. You don't care about that hash table any more once the query completes; there's just no way to tell the operating system that. If your workload follows an entirely predictable pattern and you always have about the same amount of usage of this facility then you can just reuse the same pages and everything is fine. But if your usage fluctuates I believe it will be a big problem. With DSM, we can and do explicitly free the memory back to the OS as soon as we don't need it any more - and that's a big benefit.

Essentially this is pessimizing for the lowest common denominator among OSes. Having a contiguous address space makes things so much simpler that, IMHO, considering this case is well worth it.

You are right that this may highly depend on the OS. But you are only partially right that it's impossible to give the memory back once you have touched it. It is possible in many cases with additional measures, that is, with additional control over memory mapping. Surprisingly, in this case Windows has the most straightforward solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags. On various Unix flavours it is possible to play with the mmap MAP_NORESERVE flag and the madvise syscall. Finally, it's possible to repeatedly mmap and munmap portions of a contiguous address space, providing a given addr argument for both calls. The last option is, of course, susceptible to hijacking of that portion of the address space by an inadvertent caller of mmap with a NULL addr argument, but probably this could be avoided by imposing a disciplined use of mmap in postgresql core and extensions.

Thus providing a single contiguous shared address space is doable. The other question is how much it would buy. As for the development time of an allocator, it is a clear win. In terms of easily passing direct memory pointers between backends it is a clear win again. In terms of the resulting performance, I don't know. It would take a few extra cycles on every step: you have a shared hash table, you cannot keep pointers in it, you need to store offsets against the base address, and any reference involves additional arithmetic. When these things add up, the net effect might become noticeable.

Regards,
Aleksey
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Fri, Jun 17, 2016 at 10:18 PM, Robert Haas wrote:
> On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane wrote:
> But I'm a bit confused about where it gets the bytes it wants to manage. There's no call to dsm_create() or ShmemAlloc() anywhere in the code, at least not that I could find quickly. The only way to get shar_base set to a non-NULL value seems to be to call SharAttach(), and if there's no SharCreate() where would we get that non-NULL value?

You are right; I just have to tidy up the initialisation code before publishing it.

> I expect that to be useful for parallel query and anything else where processes need to share variable-size data. However, that's different from this because ours can grow to arbitrary size and shrink again by allocating and freeing with DSM segments. We also do everything with relative pointers since DSM segments can be mapped at different addresses in different processes, whereas this would only work with memory carved out of the main shared memory segment (or some new DSM facility that guaranteed identical placement in every address space).

I believe it would be perfectly okay to allocate a huge amount of address space with mmap on startup. If the pages are not touched, the OS VM subsystem will not commit them.

> I've been a bit reluctant to put it out there until we have a tangible application of the allocator working, for fear people will say "that's not good for anything!". I'm confident it's good for lots of things, but other people have been known not to share my confidence.

This is what I've been told by the Postgres Pro folks too. But I felt that this thing deserves to be shown to the community sooner rather than later.

Regards,
Aleksey
Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory
On Fri, Jun 17, 2016 at 9:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleksey Demakov <adema...@gmail.com> writes:
>> I have some very experimental code to enable dynamic memory allocation of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?

It operates over a single large shared memory segment. Within this segment it lets you allocate and free small chunks of memory, from 16 bytes to 16 kilobytes. Chunks are carved out of fixed-size 32 KB blocks, and each block is used to allocate chunks of a single size class. When a block is full, another block for the given size class is taken from the top-level shared segment.

The goal is to support high levels of concurrency for alloc/free calls, so the allocator is mostly non-blocking. Currently it uses Heller's lazy list algorithm to maintain the block lists of a given size class, so it takes spinlocks once in a while, when a block is added or removed. If this proves to cause scalability problems, Heller's list might be replaced with Maged Michael's lock-free list to make the whole allocator completely lock-free.

Additionally it provides an epoch-based memory reclamation facility that solves the ABA problem for lock-free algorithms. I am going to implement some lock-free algorithms (extendable hash tables and probably skip lists) on top of this facility.

Regards,
Aleksey
[HACKERS] Experimental dynamic memory allocation of postgresql shared memory
Hi all,

I have some very experimental code to enable dynamic memory allocation of shared memory for postgresql backend processes. The source code in the repository is not complete yet, and it is not immediately useful by itself. However, it might serve as the basis for higher-level features, such as expandable hash tables or other data structures that share data between backends. Ultimately it might be used for an in-memory data store usable via the FDW interface.

Although such higher-level features are not available yet, the code might still be interesting to curious eyes:

https://github.com/ademakov/sharena

The first stage of this project was funded by Postgres Pro. Many thanks to this wonderful team.

Regards,
Aleksey
Re: [HACKERS] Using FDW AddForeignUpdateTargets for a hidden pseudo-column
A very quick and dirty hack I did in src/backend/optimizer/plan/initsplan.c (in 9.5.3):

--- initsplan.c.orig	2016-06-14 19:08:27.0 +0600
+++ initsplan.c	2016-06-14 19:10:55.0 +0600
@@ -185,9 +185,12 @@
 	if (IsA(node, Var))
 	{
 		Var		   *var = (Var *) node;
-		RelOptInfo *rel = find_base_rel(root, var->varno);
+		RelOptInfo *rel;
 		int			attno = var->varattno;
 
+		if (var->varno == INDEX_VAR)
+			continue;
+		rel = find_base_rel(root, var->varno);
 		if (bms_is_subset(where_needed, rel->relids))
 			continue;
 		Assert(attno >= rel->min_attr && attno <= rel->max_attr);

And then in my FDW I add the tuple id column like this:

static void
MyAddForeignUpdateTargets(Query *parsetree,
			  RangeTblEntry *target_rte,
			  Relation target_relation)
{
	Var		   *var;
	TargetEntry *tle;

	/* Make a Var representing the desired value */
	var = makeVar(INDEX_VAR,	/* instead of parsetree->resultRelation */
		      target_relation->rd_att->natts + 1,
		      INT8OID,
		      -1,
		      InvalidOid,
		      0);

	/* Wrap it in a resjunk TLE with the right name ... */
	tle = makeTargetEntry((Expr *) var,
			      list_length(parsetree->targetList) + 1,
			      pstrdup(MY_FDW_TUPLE_ID),
			      true);

	/* ... and add it to the query's targetlist */
	parsetree->targetList = lappend(parsetree->targetList, tle);
}

I was able to successfully run a couple of very simple tests with these. This seems to indicate that tweaking the core to handle this case properly is doable. The question is whether this approach is conceptually correct and, if so, what other places need to be patched.

Regards,
Aleksey
[HACKERS] Using FDW AddForeignUpdateTargets for a hidden pseudo-column
Hi all,

I have a data store where tuples have unique identities that normally are not visible. I also have an FDW to work with this data store. As per the docs, to implement updates for this data store I have an AddForeignUpdateTargets() function that adds an artificial column to the target list.

It seems to me that I cannot re-use a system attribute number for this artificial resjunk column (as, for instance, postgres_fdw uses SelfItemPointerAttributeNumber). These attributes have specific meanings that are not compatible with my tuple identity. On the other hand, using a regular AttrNumber might confuse the query planner. In contrast, e.g., to the Oracle FDW, which can use a unique key to identify a row, in my data store the tuple identity is normally not visible. So the query planner might break if it sees a Var node with an unexpected varattno number.

What is the best approach to handle such a case?

1. Give up on this entirely and require a unique key for any table used through the FDW.
2. Force the FDW to expose the identity column, either explicitly by the user who creates a foreign table or automatically by adding it in the corresponding trigger (preferably still keeping it hidden from normal scans).
3. Modify the postgresql core to nicely handle the case of an unknown target column added by an FDW.
4. Something else?

Regards,
Aleksey
Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories
Hi there,

> On 23 Mar 2016, at 22:38, Tom Lane wrote:
> Anybody want to bikeshed the directory name src/feutils? Maybe fe-utils would be more readable. And where to put the corresponding header files? src/include/fe-utils?

To me, "utils" sounds like something of an auxiliary nature. If some pretty basic stuff is going to be added there, then I believe that fe_common or perhaps fe_shared would be a bit more suitable.

Regards,
Aleksey