On Tue, Mar 21, 2017 at 7:37 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>>> Isn't that an essential part of having a refcount, in general? You
>>> were the one that suggested refcounting.
>>
>> No, quite the opposite. My point in suggesting adding a refcount was
>> to avoid needing to have a single owner. Instead, the process that
>> decrements the reference count to zero becomes responsible for doing
>> the cleanup. What you've done with the ref count is use it as some
>> kind of medium for transferring responsibility from backend A to
>> backend B; what I want is to allow backends A, B, C, D, E, and F to
>> attach to the same shared resource, and whichever one of them happens
>> to be the last one out of the room shuts off the lights.
>
> Actually, that's quite possible with the design I came up with.
I don't think it is. What sequence of calls to the APIs you've
proposed would accomplish that goal? I don't see anything in this
patch set that would permit anything other than a handoff from the
worker to the leader. There seems to be no way for the ref count to be
more than 1 (or 2?).

> The
> restriction that Thomas can't live with as I've left things is that
> you have to know the number of BufFiles ahead of time. I'm pretty sure
> that that's all it is. (I do sympathize with the fact that that isn't
> very helpful to him, though.)

I feel like there's some cognitive dissonance here. On the one hand,
you're saying we should use your design. On the other hand, you are
admitting that in at least one key respect, it won't meet Thomas's
requirements. On the third hand, you just said that you weren't
arguing for two mechanisms for sharing a BufFile across cooperating
parallel processes. I don't see how you can hold all three of those
positions simultaneously.

>> As I've said before, I think that's an anti-goal. This is a different
>> problem, and trying to reuse the solution we chose for the
>> non-parallel case doesn't really work. resowner.c could end up owning
>> a shared reference count which it's responsible for decrementing --
>> and then decrementing it removes the file if the result is zero. But
>> it can't own performing the actual unlink(), because then we can't
>> support cases where the file may have multiple readers, since whoever
>> owns the unlink() might try to zap the file out from under one of the
>> others.
>
> Define "zap the file". I think, based on your remarks here, that
> you've misunderstood my design. I think you should at least understand
> it fully if you're going to dismiss it.

"Zap" was a colloquialism for unlink(). I concede that I don't fully
understand your design, and I am trying to fill in the parts I do not
yet understand.
> It is true that a worker resowner can unlink() the files
> mid-unification, in the same manner as with conventional temp files,
> and not decrement its refcount in shared memory, or care at all in any
> special way. This is okay because the leader (in the case of parallel
> tuplesort) will realize that it should not "turn out the lights",
> finding that remaining reference when it calls BufFileClose() in
> registered callback, as it alone must. It doesn't matter that the
> unlink() may have already occurred, or may be just about to occur,
> because we are only operating on already-opened files, and never on
> the link itself (we don't have to stat() the file link for example,
> which is naturally only a task for the unlink()'ing backend anyway).
> You might say that the worker only blows away the link itself, not the
> file proper, since it may still be open in leader (say).

Well, that sounds like it's counting on fd.c not to close the file
descriptor at an inconvenient point in time and reopen it later, which
is not guaranteed.

> Thomas' design cannot reliably know how many segments there are in
> workers in error paths, which necessitates his unlink()-ENOENT-ignore
> hack. My solution is that workers/owners look after their own temp
> segments in the conventional way, until they reach BufFileClose(),
> which may never come if there is an error. The only way that clean-up
> won't happen in conventional resowner.c-in-worker fashion is if
> BufFileClose() is reached in owner/worker. BufFileClose() must be
> reached when there is no error, which has to happen anyway when using
> temp files. (Else there is a temp file leak warning from resowner.c.)
>
> This is the only way to avoid the unlink()-ENOENT-ignore hack, AFAICT,
> since only the worker itself can reliably know how many segments it
> has opened at every single instant in time. Because it's the owner!
Above, you said that your design would allow for a group of processes
to share access to a file, with the last one that abandons it "turning
out the lights". But here, you are referring to it as having one owner
-- "only the worker itself" can know the number of segments. Those
things are exact opposites of each other.

I don't think there's any problem with ignoring ENOENT, and I don't
think there's any need for a process to know the exact number of
segments in some temporary file. In a shared-ownership environment,
that information can't be stored in a backend-private cache; it's got
to be available to whichever backend ends up being the last one out.
There are only two ways to do that. One is to store it in shared
memory, and the other is to discover it from the filesystem. The
former is conceptually more appealing, but it can't handle Thomas's
requirement of an unlimited number of files, so I think it makes sense
to go with the latter. The only problem with that which I can see is
that we might orphan some temporary files if the disk is flaky and
filesystem operations are failing intermittently, but that's already a
pretty bad situation which we're not going to make much worse with
this approach.

>> I want the
>> last process to detach to be responsible for cleanup, regardless of
>> which process that ends up being. I want that for lots of good
>> reasons which I have articulated including (1) it's how all other
>> resource management for parallel query already works, e.g. DSM, DSA,
>> and group locking; (2) it avoids the need for one process to sit and
>> wait until another process assumes ownership, which isn't a feature
>> even if (as you contend, and I'm not convinced) it doesn't hurt much;
>> and (3) it allows for use cases where multiple processes are reading
>> from the same shared BufFile without the risk that some other process
>> will try to unlink() the file while it's still in use.
>> The point for
>> me isn't so much whether unlink() ever ignores errors as whether
>> cleanup (however defined) is an operation guaranteed to happen exactly
>> once.
>
> My patch demonstrably has these properties. I've done quite a bit of
> fault injection testing to prove it.

I don't understand this comment, because 0 of the 3 properties that I
just articulated are things which can be proved or disproved by fault
injection. Fault injection can confirm the presence of bugs or suggest
their absence, but none of those properties have to do with whether
there are bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company