On Thu, Dec 5, 2013 at 4:56 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Hi Robert,
>
> On 2013-12-04 23:32:27 -0500, Robert Haas wrote:
>> But I'm also learning painfully that this kind of thing only goes so
>> far. For example, I spent some time looking at what it would take to
>> provide a dynamic shared memory equivalent of palloc/pfree, a facility
>> that I feel fairly sure would attract a few fans.
>
> I have to admit, I don't fully see the point in a real palloc()
> equivalent for shared memory. If you're allocating shared memory in a
> granular fashion at somewhat high frequency you're doing something
> *wrong*. And the result is not going to be scalable.
Why?  Lots of people have written lots of programs that do just that.  I
agree it's overdone, but saying it should never happen seems like an
over-rotation in the opposite direction.

In any case, the application that led me to this was actually parallel
internal sort.  Let's suppose we had an implementation of parallel
internal sort.  You'd load all the tuples you want to sort into dynamic
shared memory (just as we now load them into palloc'd chunks), start a
bunch of workers, and the original backend and the newly-started workers
would cooperate to sort your data.  Win.

It's fairly clear that, technically speaking, you do NOT need a real
allocator for this.  You can just shove the first tuple that you store
into shared memory right up against the end of the region.  Then you can
place the next tuple immediately before it and the following one just
before that, and so on.  This is very simple to implement and also
extremely memory-efficient, so superficially it's quite an appealing
approach.

But now let's suppose the input data was estimated to fit in work_mem
but in the end did not, and therefore we need to do an external sort
instead.  That means we're going to start evicting tuples from memory to
make room for new tuples read from the input stream.  Well, now our
strategy of tight-packing everything does not look so good, because we
have no way of tracking which tuples we no longer need and reusing that
space for new tuples.  If external sort is not something we know how to
do in parallel, we can potentially copy all of the tuples out of the
dynamic shared memory segment and back to backend-private memory in
palloc'd chunks, and then deallocate the dynamic shared memory
segment... but that's potentially expensive if work_mem is large, and it
temporarily uses double what we're allowed by work_mem.
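For what it's worth, the tight-packing strategy described above can be
sketched in a few lines; the names here are mine, not anything in the
tree.  Each "allocation" just moves a high-water mark downward from the
end of the region, which is exactly why individual tuples can never be
freed or their space reused:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: pack variable-length tuples from the end of a
 * fixed region toward the start.  There is no per-tuple bookkeeping,
 * so space for an individual tuple can never be reclaimed.
 */
typedef struct
{
    char   *base;               /* start of the region */
    size_t  top;                /* offset of the last tuple placed */
} pack_region;

static void
pack_init(pack_region *r, char *mem, size_t size)
{
    r->base = mem;
    r->top = size;              /* nothing placed yet; top is the end */
}

/*
 * Reserve "len" bytes immediately below the previous tuple.  Returns
 * NULL when the region is exhausted, at which point the caller would
 * have to fall back to an external sort.
 */
static void *
pack_push(pack_region *r, size_t len)
{
    if (len > r->top)
        return NULL;
    r->top -= len;
    return r->base + r->top;
}
```

The appeal is obvious (two words of state, zero per-tuple overhead), and
so is the limitation: there is no pack_pop for an arbitrary tuple.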
And suppose we want to do a parallel external sort with, say, the worker
backend pushing tuples into a heap stored in the dynamic shared memory
segment and other backends writing them out to tapes and removing them
from the heap.  You can't do that without some kind of space management,
i.e. an allocator.

> This is one of those times where life really, really would be easier
> if we were using c++. Just have a small smart-pointer wrapper, and we
> wouldn't need to worry too much.

Yes, a template class would be perfect here.

>> I still have mixed feelings about the idea of same-address mappings.
>> On the one hand, on 64-bit architectures, there's a huge amount of
>> address space available. Assuming the OS does something even vaguely
>> sensible in terms of laying out the text, heap, stack, and shared
>> library mappings, there ought to be many petabytes of address space
>> that never gets touched, and it's hard to see why we couldn't find
>> some place in there to stick our stuff.
>
> It's actually quite a bit away from petabytes. Most x86-64 CPUs have
> 48 bits of virtual memory, with the OS eating up a fair bit of that
> (so it doesn't need to tear down TLBs in system calls). I think
> cross-platform you end up with something around 8TB, up to 64TB on
> other OSs.
> Now, there are some CPUs coming out with bigger virtual memory sizes,
> but it's going to be longer than I am willing to wait till we are
> there.

Oh.  Well, that's a shame.

>> Any thoughts on what the least painful compromise is here?
>
> I think a reasonable route is having some kind of smart pointer at the
> C level that abstracts away the offset math and allows pointers to be
> used locally. Something like void *sptr_deref(sptr *); where the
> returned pointer can be used as long as it is purely in memory. And
> sptr_update(sptr *, void *); which allows an sptr to point elsewhere
> in the same segment.
> + lots of smarts

Can you elaborate on this a little bit?  I'm not sure I understand what
you're getting at here.
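My best guess at what you mean is something like the following; this is
only my reading of your sketch, with invented names (the global
segment_base stands in for whatever per-segment lookup the real thing
would need).  The sptr stores an offset from the segment's mapped base,
so its stored value is meaningful in every backend even when the segment
lands at a different address in each one:

```c
#include <stddef.h>

/*
 * Guessed-at "smart pointer" for C: holds an offset relative to the
 * segment base rather than a raw address, so the value survives the
 * segment being mapped at different addresses in different backends.
 */
typedef struct
{
    size_t  off;                /* offset from the segment base */
} sptr;

/* This backend's mapping of the segment (a real version would need
 * one base per segment, not a single global). */
static char *segment_base;

/* Turn an sptr into a pointer usable in this backend only. */
static void *
sptr_deref(sptr *p)
{
    return segment_base + p->off;
}

/* Make an sptr point at addr, which must lie within the segment. */
static void
sptr_update(sptr *p, void *addr)
{
    p->off = (size_t) ((char *) addr - segment_base);
}
```

Is the "+ lots of smarts" part about validating that addr actually falls
inside the segment, or about caching the deref result?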
> I continue to believe that a) using pointers in dynamically allocated
> segments is going to end up with lots of pain. b) the pain from not
> having real pointers is manageable.

Fair opinion, but I think we will certainly need to pass around memory
offsets in some form for certain things.  Even for purely internal
parallel sort, the Tuplesort objects need to store the offset of the
actual tuple within the segment.  My first guess as to how to do that
was to assume that sizeof(Size) == sizeof(void *) and just store offsets
as a Size, but then I thought maybe it'd be better to do something like
typedef Size relptr.  And then I thought, boy, it sucks not to be able
to declare what kind of thing we're pointing *at* here, but apart from
using C++ I see no solution to that problem.  I guess we could do
something like this:

#define relptr(type) Size

So the compiler wouldn't enforce anything, but at least notationally
we'd know what sort of object we were supposedly referencing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
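P.S. To make the relptr(type) idea concrete, here is a compilable sketch
of how it might hang together; the accessor macros and the
shared_sort_state struct are invented for illustration, not proposed
names.  Because the macro discards its argument and expands to a plain
Size, the type annotation is pure documentation and the compiler checks
nothing:

```c
#include <stddef.h>

typedef size_t Size;            /* stand-in for PostgreSQL's Size */

/* Expands to a bare Size; the "type" argument is only notation. */
#define relptr(type) Size

/* Hypothetical helper macros doing the base-relative offset math. */
#define relptr_store(base, rp, ptr) \
    ((rp) = (Size) ((char *) (ptr) - (char *) (base)))
#define relptr_access(base, rp) \
    ((void *) ((char *) (base) + (rp)))

/*
 * Usage: the declaration records that this offset locates a SortTuple,
 * even though SortTuple need not be defined for this to compile --
 * which is exactly the "no enforcement" caveat.
 */
struct shared_sort_state
{
    relptr(SortTuple) first_tuple;  /* offset of first tuple in segment */
};
```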