Excerpts from Greg Stark's message of Sat Jun 25 21:01:36 -0400 2011:
I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372
Seems way too implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals, we don't talk about "shared locks on virtual transaction
identifiers"; we just talk about waiting for a transaction to complete.

Looks like I missed this when it passed by before, and looks like Greg Stark may have missed the message on pgsql-docs that kicked this all off: http://archives.postgresql.org/message-id/4ddb64cb.7070...@2ndquadrant.com

I will happily accept that the description there may have suffered from me not using all of the terms optimally, and that the resulting commit could be improved. Some more feedback to get the description correct and useful would be much appreciated.

What I cannot agree with is the idea that the implementation details I suggested documenting should not be documented. There are extremely user-hostile things that can happen here, and they are unique to this command. Saying "this is too complicated for users to make heads or tails of" may very well be true in many cases, but I think it's not giving PostgreSQL users very much credit. And when problems with this happen, and I wouldn't have spent any time on this if they didn't, right now the only way to make heads or tails of it is to read the source code.

If the code were simple, quick, and had no failure modes, it would be fine to not describe it. This is complicated, the run time cannot be bounded, and it can ripple into nasty lock queue issues--at some impossible-to-predict future time, long after you start the creation. I don't have a good idea how to unload the potential foot gun. The best I could think of after being shot with it was describing how it fires.

And looping over the transactions one by one is purely an
implementation detail and uninteresting to users.

That particular suggestion came from me having a painful session I didn't want anyone else to ever go through again. By the end of that, this implementation detail felt like the most important missing piece of PostgreSQL documentation in the world to me--I'm too busy to send in doc patches describing things that I haven't been shot by.

To provide some more context, the server I ran into this on always has at least two reports active that each take 10 to 16 hours to run. I happily kicked off a concurrent index build on a heavily bloated 1GB table whose indexes are typically >5GB, which I expected to take a few minutes given the small size involved. Six hours later, when I come back and discover it's still not done, I find a single lock waiting for a transaction to finish. Since I'm used to multiple locks forming into a tree structure, and I only see one, I expect I'm OK once that's done. Fine; I estimate how much time that report has left and leave for a bit.

Four hours later, I come back. That original transaction lock is gone. Now the index build has created a new one I didn't expect, moving on to the second-oldest active report. I actually have six more hours to go still. This locking pattern is unique to this command, and if I'd had the slightest idea that it worked this way I'd have approached the rebuild differently.
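
To make the pattern concrete, here is a toy model of it in Python. This is only a sketch of the behavior described above, not PostgreSQL code: the function name visible_waits is made up, and the 10 and 16 hour figures are hypothetical numbers chosen to resemble the reports on that server. It assumes the build waits on each pre-existing transaction in turn, oldest first, so that only one lock wait is ever visible at a time.

```python
def visible_waits(remaining_hours):
    """Toy model: wait on each old transaction in turn, oldest first.

    remaining_hours: hours each already-running transaction still needs,
    measured from the moment the waiting starts, listed oldest first.
    Returns one (transaction_index, start_hour, end_hour) tuple per wait
    that someone looking at the lock view would actually see.
    """
    waits = []
    now = 0.0
    for idx, remaining in enumerate(remaining_hours):
        if remaining <= now:
            # This transaction already finished during an earlier wait,
            # so waiting on it returns at once and is never observed.
            continue
        waits.append((idx, now, remaining))
        now = remaining
    return waits


# Hypothetical numbers: two long reports with 10 and 16 hours left.
for idx, start, end in visible_waits([10, 16]):
    print(f"hours {start:g}-{end:g}: one lock wait, on transaction {idx}")
```

In this model the total wait is set by the longest-running old transaction, but a look at the locks at any single moment shows only the one currently being waited on. That is why estimating from the one visible lock comes up short.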


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)