Hi Paul,

As you know, I try my hardest to post well-researched comments to this
mailing list, and this time I fell short of that. Please accept my
apologies. Let me try and re-frame the problem, and respond to your
criticisms.

My point is: we need more public design discussions and review, and we
need those discussions to have a logical conclusion. I think the RFC,
coupled with more traffic on dev@, is the answer to that.

That said, I counted the number of comments on those 4 PRs from the
general public:

  * Clustered purge PR #1370 has 0 non-Cloudant comments on it.

  * PSE PR #496 has one comment from me asking you to write
    documentation (that I don't think landed). That's the only
    non-Cloudant post.

  * Replicator scheduler PR #470 has a number of community
    comments on it that resulted in a higher quality PR.

  ...and I'm not going to even attempt to recap the BigCouch
  mess, but a lot of non-Cloudant people were involved.

So 50% of the PRs, though developed in the open, might as well have
happened on an IBM private repo. That's unfortunate.

There are a number of possible, valid explanations for why these PRs
were so unengaging, in my view. It may be a natural reflection of the
fact that Cloudant employees are the only people who are paid to take an
interest in the code. Or it may be that the PRs themselves are not as
discoverable
as posts to the mailing list. Perhaps it's because big PRs are
intimidating and difficult to interpret to those who don't live and
breathe the CouchDB code base daily. I was wrong to say that there was
just one reason why this is the case.

But I don't think I am wrong to point out that something smells wrong
when features land without community comment on either the design or the
code itself. I do think it's fair to say that the mailing list
discussions for these features were minimal as compared to the
discussions that happened in the PRs, regardless of participant. (Your
PSE dev@ post got no responses, for instance. Maybe it's a bad example,
being a somewhat esoteric feature.)

Recent traffic on FDB and resharding proves to me that the ML is still a
valid venue to discuss proposals, and that these proposals are getting
better as a result of those discussions. The RFC is intended to be a cap to
those discussions, just a slightly more ritualised way of voting on the
discussion and writing up the result.

As to the PR side of things, because PRs go to notifications@, they are
largely ignored by the dev@ community. Subscribing to all of the GitHub
emails from all of the CouchDB repos is overwhelming. Even if you were
to filter that only to new PRs and forward them to dev@ somehow, it's
still a lot of emails to wade through, so I'm not sure that's a solution
to the problem. PRs that reference an RFC, though, could be the "happy
medium" that we need, and again a simple bot could help here.

As a PMC member, I feel it is my responsibility to try and steer more of
our community into these discussions, so that the best possible solution
can be reached. It's less about "Cloudant vs. non-Cloudant" and more
about serving the needs of our developer and user base.

(In fact, none of the feature proposals in this thread were posted to
the user@ mailing list - where we might have reached even more people
who could have informed the design phase of the work. Something to
consider.)

> Yes these were big PRs, and yes they took a long time to review. But
> there was plenty of time for anyone to do that review (and there were
> a number of non Cloudant people involved in these listed).

Being open for a long time and helping people through reading the PR
are very different things. Again, not until recently did these PRs
start including top-level READMEs that helped people understand the code
involved. Nick's README on the replicator scheduler is a great example
of something very positive:

https://github.com/apache/couchdb/pull/470/files#diff-a3be920760d32aca56cc1d2b838d07ef

I feel the RFC could be the initial README.md, which would then be
supplemented by a short intro to how the code is written and actually
works. But one thing at a time ;)

> While I'm not sure about prototyping, I do think RFCs would help solve
> this problem. It definitely helps to know what the reason a PR even
> exists and maybe why various other approaches were discarded before
> starting to review it. I don't personally know of much prototyping
> related to these sorts of features. There's definitely evolution to
> them based on various restrictions and that is captured on our
> commits@ lists (obviously in a difficult to consume format post facto,
> but useful for anyone following along at least).

My comment on the prototyping was specifically with FDB in mind, where I
expect we will have multiple throwaway bits of code written to try and
determine how exactly we'll make it work. Those don't necessarily need
to be shared, but if they helped someone reach a decision, it could be
useful.

> Adding RFCs won't solve the issue that large features almost by
> definition have correspondingly large PRs that can be daunting to
> review. I do think having an RFC may make it easier, but I don't think
> its solving the problem as posed.

It's a two-pronged approach.

The RFC is intended to solve the design end of things, so that even if
we don't have more community members involved in the Pull Request review
process, they can at least rest assured that there was agreement on what
*should* be implemented. Those people *should* be able to ignore the PR
and not be surprised by what it contains when it lands, since we've got
a nice summary that was agreed to of what it actually will contain.

And, should they want to engage more fully with our development process,
they *should* be able to read documentation aimed at CouchDB developers
in the PR itself that explains how the feature was implemented. The RFC
can be a start on writing this README.md. (We do a very poor job on
this, and I will continue to harp about how hard it is to onboard new
CouchDB developers until it gets easier.)

Does this help?

-Joan
