On 2014-11-10 14:54:15 -0500, Robert Haas wrote:
> On Mon, Nov 10, 2014 at 1:29 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> >> That's an issue to which we may need to engineer some solution, but
> >> not in this patch. Overall, the patch's architecture is modeled
> >> closely on the way we synchronize GUCs to new backends when using
> >> EXEC_BACKEND, and I don't think we're going to do any better than
> >> starting with that and working to file down the rough edges as we
> >> go. So I'd like to go ahead and commit this.
> >
> > Did you check out whether PGC_BACKEND GUCs work? Other than that I agree
> > that we can just solve further issues as we notice them. This isn't
> > particularly intrusive.
>
> Is the scenario you're concerned about this one:
>
> 1. Start postmaster.
> 2. Start a user session.
> 3. Change a PGC_BACKEND GUC in postgresql.conf.
> 4. Reload.
> 5. Launch a background worker that uses this code.

No, that's not what I was thinking of. Consider:

export pg_libdir=$(pg_config --libdir)
mkdir $pg_libdir/plugins
ln -s $pg_libdir/auto_explain.so $pg_libdir/plugins/
PGOPTIONS='-c local_preload_libraries=auto_explain' psql

and use pg_background() (or something else using this infrastructure) in
that context. Without having reread your relevant code, I'd expect a:

ERROR:  55P02: parameter "local_preload_libraries" cannot be set after connection start

because the code would try to import the "user session's"
local_preload_libraries value into the background process - after that
process has actually already finished its initialization.

> It should end up with the same values there as the active session, not
> the current one from postgresql.conf. But we want to verify that's
> the behavior we actually get. Right?

But that's also something worthwhile to check.
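To make that concrete: when the worker replays the session's GUC state,
PGC_BACKEND variables presumably have to be applied as if we were still
at connection start. A sketch of what I mean - my illustration, not
code from your patch; restore_one_guc is an invented name, and I'm
assuming the current set_config_option() signature:

#include "postgres.h"

#include "utils/guc.h"

/* Sketch only: apply one saved GUC from the initiating session. */
static void
restore_one_guc(const char *name, const char *value)
{
	/*
	 * PGC_BACKEND as the calling context plus PGC_S_CLIENT as the
	 * source mirrors how a regular backend absorbs startup-packet
	 * options, and is what lets PGC_BACKEND variables through without
	 * the 55P02 error.
	 */
	(void) set_config_option(name, value,
							 PGC_BACKEND,	/* calling context */
							 PGC_S_CLIENT,	/* value's source */
							 GUC_ACTION_SET,
							 true,			/* actually change the value */
							 ERROR);		/* elevel on failure */
}

Whether the point at which the worker restores this state is actually
early enough for local_preload_libraries to take effect is exactly the
open question above.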
> >> -- It doesn't handle types without send/receive functions. I thought
> >> that was tolerable since the types without such functions seem like
> >> mostly edge cases, but Andres doesn't like it. Apparently, BDR
> >> makes provisions to either ship the tuple as one big blob - if all
> >> built-in types - or else use binary format where supported and text
> >> format otherwise. Since this issue is common to BDR, parallelism, and
> >> pg_background, we might want to introduce some common infrastructure
> >> for it, but it's not too clear to me right now what that should look
> >> like.
> >
> > I think we definitely need to solve this - but I'm also not at all
> > sure how.
> >
> > For something like pg_background it's pretty trivial to fall back to
> > in/out when there's no send/recv. It's just a couple of lines, and the
> > portal code has the necessary provisions. So solving it for
> > pg_background itself is pretty trivial.
>
> That's not really the interesting part of the problem, I think. Yeah,
> pg_background can be made to speak text format if we really care. But
> the real issue is that even with the binary format, converting a tuple
> in on-disk format into a DataRow message so that the other side can
> reconstruct the tuple and shove it into an executor slot (or wherever
> it wants to shove it) seems like it might be expensive. You might
> have data on that; if it's not expensive, stop here. If it is
> expensive, what we really want to do is figure out some way to make it
> safe to copy the tuple into the shared memory queue, or send it out
> over the socket, and have the process on the other end use it without
> having to revalidate the whole thing column-by-column.

Yes, sending the tuples as-is is a noticeable win. I've not fully
analyzed where the difference is - I suspect it's to a large degree
because it requires less copying and fewer memory allocations. But
also, some of the send/recv functions aren't exactly cheap in
themselves.

For BDR we've sidestepped problems around changing type definitions and
such by saying only builtin types get to do the 'as-is' stuff. That can
be expanded later, but that's it for now. For builtin types there's no
chance that the type definition changes or anything like that.

A general problem is that you really can't allow this to happen outside
a controlled setting - manipulating data at the 'Datum' level allows
you to do bad things.

> > But, as you say, pg_background
> > itself isn't a primary goal (although a nice demonstration, with some
> > real world applications). There's enough differences between the
> > parallelism and the replication cases that I'm not entirely sure how
> > much common ground there is. Namely replication has the additional
> > concerns of version differences, trust (don't use blobs if you don't
> > fully trust the other side), and differences in the allocated oids
> > (especially relevant for arrays which, very annoyingly, embed the oid
> > in the send/recv format).
>
> Yeah. It would be nice to use the same code for deciding what to do
> in a particular case. It seems like it ought to be possible to have
> one rule for whether a tuple with a given set of data types is safe to
> ship in the on-disk format. For pg_background/parallelism, it's
> enough if that function returns true; for BDR, there might be
> additional criteria applied before deciding to do it that way. But
> they could use the same decision tree for part of it.

Yes, that'd be a good idea.
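Concretely, the shared part could be little more than a walk over the
tuple descriptor. A sketch of the shape I have in mind - nothing like
this exists yet; TupleTransferFormat, type_has_send_function and
tuple_transfer_format are all invented names:

#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"		/* FirstNormalObjectId */
#include "access/tupdesc.h"
#include "catalog/pg_type.h"
#include "utils/syscache.h"

typedef enum
{
	TRANSFER_AS_IS,				/* ship the raw on-disk tuple */
	TRANSFER_BINARY,			/* per-column send/recv */
	TRANSFER_TEXT				/* per-column out/in */
} TupleTransferFormat;

/* Does this type have a binary send function (pg_type.typsend)? */
static bool
type_has_send_function(Oid typid)
{
	HeapTuple	tup;
	bool		result;

	tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for type %u", typid);
	result = OidIsValid(((Form_pg_type) GETSTRUCT(tup))->typsend);
	ReleaseSysCache(tup);
	return result;
}

static TupleTransferFormat
tuple_transfer_format(TupleDesc tupdesc)
{
	bool		all_builtin = true;
	bool		all_have_send = true;
	int			i;

	for (i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = tupdesc->attrs[i];

		if (att->attisdropped)
			continue;

		/*
		 * OIDs below FirstNormalObjectId are assigned at initdb time,
		 * so the type definition can't have changed underneath us -
		 * the property the 'as-is' path relies on.
		 */
		if (att->atttypid >= FirstNormalObjectId)
			all_builtin = false;
		if (!type_has_send_function(att->atttypid))
			all_have_send = false;
	}

	if (all_builtin)
		return TRANSFER_AS_IS;
	if (all_have_send)
		return TRANSFER_BINARY;
	return TRANSFER_TEXT;
}

pg_background/parallelism would just insist on TRANSFER_AS_IS and fall
through to the row protocol otherwise; BDR would apply its version and
trust checks on top before believing the result.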
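And the in/out fallback mentioned above really is only a couple of
lines per column. Again just a sketch, not the actual pg_background
code - serialize_datum is an invented name, and the use_binary flag
would come from a rule like the one above:

#include "postgres.h"

#include "fmgr.h"
#include "utils/lsyscache.h"

/* Emit one non-null datum, binary if possible, else as text. */
static void
serialize_datum(Datum value, Oid typid, bool use_binary)
{
	if (use_binary)
	{
		Oid			sendfunc;
		bool		isvarlena;
		bytea	   *bin;

		/* errors out if the type has no send function */
		getTypeBinaryOutputInfo(typid, &sendfunc, &isvarlena);
		bin = OidSendFunctionCall(sendfunc, value);
		/* ... append VARDATA(bin), VARSIZE(bin) - VARHDRSZ bytes ... */
	}
	else
	{
		Oid			outfunc;
		bool		isvarlena;
		char	   *str;

		/* every type has in/out, so this always works */
		getTypeOutputInfo(typid, &outfunc, &isvarlena);
		str = OidOutputFunctionCall(outfunc, value);
		/* ... append str as a text-format column ... */
	}
}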
> What would be even better is to find some way to MAKE IT SAFE to send
> the undecoded tuple. I'm not sure what that would look like.

How do you define 'SAFE' here? Safe against concurrent SQL-level
activity? Safe against receiving arbitrary data?

I see little chance of being safe against the latter. But imo that's
fine. Neither is WAL shipping, and that's perfectly fine for most use
cases...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services