Re: [HACKERS] a raft of parallelism-related bug fixes

Robert Haas Mon, 08 Feb 2016 12:25:25 -0800

On Mon, Feb 8, 2016 at 2:48 PM, Peter Geoghegan <p...@heroku.com> wrote:
> FWIW, I appreciate your candor. However, I think that you could have
> done a better job of making things easier for reviewers, even if that
> might not have made an enormous difference. I suspect I would have not
> been able to get UPSERT done as a non-committer if it wasn't for the
> epic wiki page, that made it at least possible for someone to jump in.


I'm not going to argue with the proposition that it could have been
done better.  Equally, I'm going to disclaim having the ability to
have done it better.  I've been working on this for three years, and
most of the work that I've put into it has gone into tinkering with C
code that was not in any way user-testable.  I've modified essentially
every major component of the system.  We had a shared memory facility;
I built another one.  We had background workers; I overhauled them.  I
invented a message queueing system, and then layered a modified
version of the FE/BE protocol on top of that message queue, and then
later layered tuple-passing on top of that same message queue and then
invented a bespoke protocol that is used to handle typemod mapping.
We had a transaction system; I made substantial, invasive
modifications to it.  I tinkered with the GUC subsystem, the combocid
system, and the system for loading loadable modules.  Amit added read
functions to a whole class of nodes that never had them before and
together we overhauled core pieces of the executor machinery.  Then I
hit the planner with a hammer.  Finally there's this patch, which
affects heavyweight locking and deadlock detection.  I don't believe
that during the time I've been involved with this project anyone else
has ever attempted a project that required changing as many subsystems
as this one did - in some cases rather lightly, but in a number of
cases in pretty significant, invasive ways.  No other project in
recent memory has been this invasive to my knowledge.  Hot Standby
probably comes closest, but I think (admittedly being much closer to
this work than I was to that work) that this has its fingers in more
places.  So, there may be a person who knows how to do all of that
work and get it done in a reasonable time frame and also knows how to
make sure that everybody has the opportunity to be as involved in the
process as they want to be and that there are no bugs or controversial
design decisions, but I am not that person.  I am doing my best.

> To be more specific, I thought it was really hard to test parallel
> sequential scan a few months ago, because there were so many threads
> and so many dependencies. I appreciate that we now use git
> format-patch patch series for complicated stuff these days, but it's
> important to make it clear how everything fits together. That's
> actually what I was thinking about when I said we need to be clear on
> how things fit together from the CF app patch page, because there
> doesn't seem to be a culture of being particular about that, having
> good "annotations", etc.

I agree that you had to be pretty deeply involved in that thread to
follow everything that was going on.  But it's not entirely fair to
say that it was impossible for anyone else to get involved.  Both
Amit and I, mostly Amit, posted directions at various times saying:
here is the sequence of patches that you currently need to apply as of
this time.  There was not a heck of a lot of evidence that anyone was
doing that, although I think a few people did, and towards the
end things changed very quickly as I committed patches in the series.
We certainly each knew what the other was doing, and not because of some
hidden off-list collaboration that we kept secret from the community -
we do talk every week, but almost all of our correspondence on those
patches was on-list.

I think it's an inherent peril of complicated patch sets that people
who are not intimately involved in what is going on will have trouble
following just because it takes a lot of work.  Is anybody here
following what is going on on the postgres_fdw join pushdown thread?
There's only one patch to apply there right now (though there have
been as many as four at times in the past) and the people who are
actually working on it can follow along, but I'm not a bit surprised
if other people feel lost.  It's hard to think that the cause of that
is anything other than "it's hard to find the time to get invested in
a patch that other people are already working hard and apparently
diligently on, especially if you're not personally interested in
seeing that patch get committed, but sometimes even if you are".  For
example, I really want the work Fabien and Andres are doing on the
checkpointer to get committed this release.  I am reading the emails,
but I haven't tried the patches and I probably won't.  I don't have
time to be that involved in every patch.  I'm trusting that whatever
Andres commits - which will probably be a whole lot more complex than
what Fabien initially did - will be the right thing to commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
