On Thu, Sep 3, 2015 at 10:42 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>>> Jeff Janes <jeff.ja...@gmail.com> writes:
>>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> >>
>>> > But we would still have to deal with the fact that an unconditional
>>> > acquire attempt by the backends will cause a vacuum to cancel itself,
>>> > which is undesirable.
>>>
>>> Good point.
>>>
>>> > If we define a new namespace for this lock (like the relation extension
>>> > lock has its own namespace) then perhaps the cancellation code could be
>>> > made to not cancel on that condition.  But that too seems like a lot of
>>> > work to backpatch.
>>>
>>> We could possibly teach the autocancel logic to distinguish this lock type
>>> from others without using a new namespace.  That seems a bit klugy, but
>>> maybe better than adding a new namespace.  (For example, there are
>>> probably only a couple of modes in which we take page-level locks at
>>> present.  Choosing a currently unused, but self-exclusive, mode for
>>> taking such a lock might serve.)
>>
>> Like the attached?  (The conditional variant for user backends was
>> unceremoniously yanked out.)
>
> A problem here is that now we have the user backends waiting on vacuum to
> do the cleanup, but vacuum is using throttled IO and so taking its sweet
> time at it.  Under the old code, the user backend could pass up the vacuum
> while it was sleeping.
>
> Maybe we could have the vacuum detect when someone is waiting on it, and
> temporarily suspend throttling and just run at full speed.  I don't believe
> there is any precedent for that, but I do think there are other places
> where such a technique could be useful.  That is kind of a scary change to
> backpatch.
>
> I am running out of ideas.

Teodor published a patch in another thread
(http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru), but I
thought it would be best to discuss it here.  It is similar to my most
recent patch.

He removes the parts of the code that anticipate concurrent cleanup and
replaces them with asserts, which I was too lazy to do until we have a
final design.

He uses a different lock mode (ExclusiveLock rather than
ShareUpdateExclusiveLock) when heavy-locking the metapage.  That doesn't
make a difference, as long as the mode is self-exclusive and no one else
uses it in a way that causes false sharing (which is currently the case--the
only other user of page-level locks is the hash index code).

He always takes the lock conditionally in regular backends.  That means he
doesn't have to hack the lmgr to keep vacuum from canceling itself, the way
I did.  It also means there is no "priority inversion" of the kind I
mention above, where a user backend blocks on vacuum while vacuum is
intentionally throttling itself.

On the other hand, taking conditional locks in normal backends means the
pending list can grow without limit, as there is nothing to throttle the
user backends from adding tuples faster than they are cleaned up.  Perhaps
worse, it can pin down a vacuum worker without limit, as vacuum keeps
finding that more pages have been added by the time it finishes the prior
set of pages.  I actually do see this in my (not very realistic) testing.
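
To make the comparison concrete, the locking difference is roughly the
following shape (just a sketch with a made-up function name, not code
lifted from either patch):

/*
 * Sketch only -- the function name is made up and this is not code from
 * either patch.  Vacuum waits for the page-level heavyweight lock on the
 * GIN metapage; a regular backend only tries for it, and skips the cleanup
 * if someone else already holds it.
 */
#include "postgres.h"

#include "access/gin_private.h"
#include "storage/lmgr.h"

static void
gin_pending_cleanup_sketch(Relation index, bool calledByVacuum)
{
    if (calledByVacuum)
    {
        /* vacuum must do the cleanup, so wait for the lock */
        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
    }
    else
    {
        /* user backend: don't wait, just skip the cleanup if the lock is taken */
        if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
            return;
    }

    /* ... move pending-list entries into the main index structure ... */

    UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}

The interesting part is the ConditionalLockPage() call: the backend never
waits on the lock, so it never shows up as a waiter that the autocancel
logic would kill a vacuum for.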
I think that for correctness, vacuum only needs to clean the part of the
pending list that existed as of the time vacuum started.  So as long as it
gets that far, it can consider itself done even if more pages have since
been added.

I'm not sure of the best way to implement that; I guess you remember the
blkno of the tail page from when you started, and set a flag once you
truncate away the page with that same blkno.  That would solve the problem
of a vacuum worker being pinned down for an unlimited amount of time, but
it would not solve the unlimited growth of the pending list.
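
Something like this, in sketch form (the names are mine, not from the
actual code, and buffer locking, WAL, and the real tuple-moving work are
all left out):

/*
 * Sketch only: terminate pending-list cleanup at whatever page was the
 * tail when cleanup started.
 */
#include "postgres.h"

#include "access/gin_private.h"
#include "storage/bufmgr.h"

static void
pending_list_cleanup_sketch(Relation index, Page metapage)
{
    GinMetaPageData *metadata = GinPageGetMeta(metapage);
    BlockNumber blknoFinish = metadata->tail;   /* tail as of cleanup start */
    BlockNumber blkno = metadata->head;

    if (blkno == InvalidBlockNumber)
        return;                 /* pending list is empty */

    for (;;)
    {
        Buffer      buffer = ReadBuffer(index, blkno);
        Page        page = BufferGetPage(buffer);
        BlockNumber next = GinPageGetOpaque(page)->rightlink;

        /* ... move this page's tuples into the main index structure ... */

        ReleaseBuffer(buffer);

        /*
         * If this was the tail when we started, stop here: anything to the
         * right was appended after vacuum began, so it can wait for the
         * next cleanup rather than pinning this worker down indefinitely.
         */
        if (blkno == blknoFinish || next == InvalidBlockNumber)
            break;

        blkno = next;
    }
}

Cheers,

Jeff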