date:20111130

Re: [HACKERS] Word-smithing doc changes

2011-11-30 Thread Greg Smith

Excerpts from Greg Stark's message of Sat Jun 25 21:01:36 -0400 2011:

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372
Seems way too implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals, we don't talk about shared locks on virtual transaction
identifiers; we just talk about waiting for a transaction to complete.

Looks like I missed this when it passed by before, and looks like Greg
Stark may have missed the message on pgsql-docs that kicked this all
off:
http://archives.postgresql.org/message-id/4ddb64cb.7070...@2ndquadrant.com

I will happily accept that the description there may have suffered from
me not using all of the terms optimally, and that the resulting commit
could be improved. Some more feedback to get the description correct
and useful would be much appreciated.

What I cannot agree with is the idea that the implementation details I
suggested documenting should not be. There are extremely user-hostile
things that can happen here, and that are unique to this command.
Saying this is too complicated for users to make heads or tails of may
very well be true in many cases, but I think it's not giving PostgreSQL
users very much credit. And when problems with this happen, and I
wouldn't have spent any time on this if they didn't, right now the only
way to make heads or tails of it is to read the source code.

If the code was simple, quick, and had no failure modes, it would be
fine to not describe it. This is complicated, the run time cannot be
bounded, and it can ripple to nasty lock queue issues--at some
impossible-to-predict future time, long after you start the creation. I
don't have a good idea how to unload the potential foot gun. The best I
could think of after being shot with it was describing how it fires.

And looping over the transactions one by one is purely an
implementation detail and uninteresting to users.

That particular suggestion came from me having a painful session I
didn't want anyone else to ever go through again. By the end of that,
this implementation detail felt like the most important missing piece of
PostgreSQL documentation in the world to me--I'm too busy to send in doc
patches describing things that I haven't been shot by.

To provide some more context, the server I ran into this on always has
at least 2 active reports that take 10 to 16 hours to run. I happily
kicked off a concurrent index build on a heavily bloated 1GB table whose
indexes are typically 5GB, which I expected to take a few minutes given
the small size involved. Six hours later, when I come back and discover
it's still not done, I find a single lock waiting for a transaction to
finish. Since I'm used to multiple locks forming into a tree
structure, and I only see one, I expect I'm OK once that's done. Fine;
I estimate how much time that report has left and leave for a bit.

Four hours later, I come back. That original transaction lock is gone.
Now the build has created a new one I didn't expect, moving on to the
second oldest active report. I actually have six more hours to go still. This
locking pattern is unique to this command, and if I'd had the slightest
idea that it worked this way I'd have approached the rebuild differently.
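
The pattern described above can be illustrated with a toy model. This is a
minimal Python sketch, not PostgreSQL's code: the Txn class, the function
names, and the end times (hours 10 and 16, loosely taken from the story) are
all assumptions made for illustration. It assumes the build waits on one
older transaction at a time, in order, so only a single wait is ever visible.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str        # hypothetical label for a long-running report
    ends_at: float   # hours after the index build started


def sequential_waits(txns, start=0.0):
    """Wait on each transaction in turn, one lock at a time.

    Returns the hour at which all waits are over, plus the list of
    (name, wait_began, wait_ended) intervals an observer could see.
    """
    now = start
    waits = []
    for txn in txns:
        if txn.ends_at > now:          # already-finished ones cost nothing
            waits.append((txn.name, now, txn.ends_at))
            now = txn.ends_at
    return now, waits


def visible_wait_at(waits, hour):
    """Name of the single transaction being waited on at a given hour."""
    for name, began, ended in waits:
        if began <= hour < ended:
            return name
    return None


# Illustrative numbers only: two reports, oldest first.
reports = [Txn("oldest report", 10.0), Txn("second oldest report", 16.0)]
done_at, waits = sequential_waits(reports)

for hour in (6, 10):
    print(f"hour {hour}: waiting on {visible_wait_at(waits, hour)}")
print(f"all waits over at hour {done_at}")
```

In this model, an observer checking at hour 6 sees exactly one wait and
nothing that hints at the second one, which only appears once the first
transaction has ended.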

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
