Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Greg Stein
On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon wrote:

> Hi all,
>
> I am an Apache Spark PMC,


You are a member of the Apache Spark PMC. You are *not* a PMC. Please stop
with that terminology. The Foundation has about 200 PMCs, and you are a
member of one of them. You are NOT a "PMC" .. you're a person. A PMC is a
construct of the Foundation.

>...

> I am aware of the limited GitHub Actions resources that are shared
> across all projects in ASF,
> and many projects suffer from it. This issue significantly slows down the
> development cycle of
>  other projects, at least Apache Spark.
>

And the Foundation gets those build minutes for GitHub Actions provided to
us from GitHub and Microsoft, and we are thankful that they provide them to
the Foundation. Maybe it isn't all the build minutes that every group
wants, but that is what we have. So it is incumbent upon all of us to
figure out how to build more, with fewer minutes.

Say "thank you" to GitHub, please.

Regards,
-g


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
[last reply for tonite; let others read; and after the next drink or three,
I shouldn't be replying...]

On Thu, Nov 6, 2014 at 11:38 PM, Matei Zaharia 
wrote:

> Alright, Greg, I think I understand how Subversion's model is different,
> which is that the PMC members are all full committers. However, I still
> think that the model proposed here is purely organizational (how the PMC
> and committers organize themselves), and in no way changes people's
> ownership or rights.


That was not my impression, when your proposal said that maintainers need
to provide "sign-off".

Okay. Now my next item of feedback starts here:


> Certainly the reason I proposed it was organizational, to make sure
> patches get seen by the right people. I believe that every PMC member still
> has the same responsibility for two reasons:
>
> 1) The PMC is actually what selects the maintainers, so basically this
> mechanism is a way for the PMC to make sure certain people review each
> patch.
>
> 2) Code changes are all still made by consensus, where any individual has
> veto power over the code. The maintainer model mentioned here is only meant
> to make sure that the "experts" in an area get to see each patch *before*
> it is merged, and choose whether to exercise their veto power.
>
> Let me give a simple example, which is a patch to the Spark core public
> API. Say I'm a maintainer in this API. Without the maintainer model, the
> decision on the patch would be made as follows:
>
> - Any committer could review the patch and merge it
> - At any point during this process, I (as the main expert on this) could
> come in and -1 it, or give feedback
> - In addition, any other committer beyond me is allowed to -1 this patch
>
> With the maintainer model, the process is as follows:
>
> - Any committer could review the patch and merge it, but they would need
> to forward it to me (or another core API maintainer) to make sure we also
> approve
> - At any point during this process, I could come in and -1 it, or give
> feedback
> - In addition, any other committer beyond me is still allowed to -1 this
> patch
>
> The only change in this model is that committers are responsible to
> forward patches in these areas to certain other committers. If every
> committer had perfect oversight of the project, they could have also seen
> every patch to their component on their own, but this list ensures that
> they see it even if they somehow overlooked it.
>
> It's true that technically this model might "gate" development in the
> sense of adding some latency, but it doesn't "gate" it any more than
> consensus as a whole does, where any committer (not even PMC member) can -1
> any code change. In fact I believe this will speed development by
> motivating the maintainers to be active in reviewing their areas and by
> reducing the chance that mistakes happen that require a revert.
>
> I apologize if this wasn't clear in any way, but I do think it's pretty
> clear in the original wording of the proposal. The sign-off by a maintainer
> is simply an extra step in the merge process, it does *not* mean that other
> committers can't -1 a patch, or that the maintainers get to review all
> patches, or that they somehow have more "ownership" of the component (since
> they already had the ability to -1). I also wanted to clarify another thing
> -- it seems there is a misunderstanding that only PMC members can be
> maintainers, but this was not the point; the PMC *assigns* maintainers but
> they can do it out of the whole committer pool (and if we move to
> separating the PMC from the committers, I fully expect some non-PMC
> committers to be made maintainers).
>

... and ends here.

All of that text is about a process for applying Vetoes. ... That is just
the wrong focus (IMO).

Back around 2000, in httpd, we ran into vetoes. It was horrible. The
community suffered. We actually had a face-to-face at one point, flying in
people from around the US, gathering a bunch of the httpd committers to
work through some basic problems. The vetoes were flying fast and furious,
and it was just the wrong dynamic. Discussion and consensus had been thrown
aside. Trust was absent. Peer relationships were ruined. (tho thankfully,
our personal relationships never suffered, and that basis helped us pull it
back together)

Contrast that with Subversion. We've had some vetoes, yes. But invariably,
MOST of them would really be considered "woah. -1 on that. let's talk".
Only a few were about somebody laying down the veto hammer. Outside those
few, a -1 was always about opening a discussion to fix a particular commit.

It looks like you are creating a process to apply vetoes. That seems
backwards.

It seems like you want a process to ensure that reviews are performed. IMO,
all committers/PMC members should begin as *trusted*. Why not? You've
already voted them in as committers/PMCers. So trust them. Trust.

And that leads to "trust, but verify". The review process. So how about
creating a w

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
On Thu, Nov 6, 2014 at 7:28 PM, Sandy Ryza wrote:

> It looks like the difference between the proposed Spark model and the
> CloudStack / SVN model is:
> * In the former, maintainers / partial committers are a way of
> centralizing oversight over particular components among committers
> * In the latter, maintainers / partial committers are a way of giving
> non-committers some power to make changes
>

I can't speak for CloudStack, but for Subversion: yes, you're exactly
right, Sandy.

We use the "partial committer" role as a way to bring in new committers.
"Great idea, go work >there<, and have fun". Any PMC member can give a
single +1, and that new (partial) committer gets an account/access, and is
off and running. We don't even ask for a PMC vote (though, we almost always
have a brief discussion).

The "svnrdump" tool was written by a *Git* Google Summer of Code student.
He wanted a quick way to get a Subversion dumpfile from a remote
repository, in order to drop that into Git. We gave him commit access
directly into trunk/svnrdump, and he wrote the tool. Technically, he could
commit anywhere in our tree, but we just asked him not to, without a +1
from a PMC member.

Partial committers are a way to *include* people into the [coding]
community. And hopefully, over time, they grow into something more.

"Maintainers" are a way (IMO) to *exclude* people from certain commit
activity. (or more precisely: limit/restrict, rather than exclude)

You can see why it concerns me :-)

Cheers,
-g


Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
[ I'm going to try and pull a couple thread directions into this one, to
avoid explosion :-) ]

On Thu, Nov 6, 2014 at 6:44 PM, Corey Nolet wrote:

Note: I'm going to use "you" generically; I understand you [Corey] are not
a PMC member, at this time.

> +1 (non-binding) [for original process proposal]
>
> Greg, the first time I've seen the word "ownership" on this thread is in
> your message. The first time the word "lead" has appeared in this thread is
> in your message as well. I don't think that was the intent. The PMC and
> Committers have a
>

The word "ownership" is there, but with a different term. If you are a PMC
member, and *cannot* alter a line of code without another's consent, then
you don't "own" that code. Your ownership is subservient to another. You
are not a *peer*, but a second-class citizen at this point.

The term "maintainer" in this context is being used as a word for "lead".
The maintainers are a *gate* for any change. That is not consensus. The
proposal attempts to soften that, and turn it into an oligarchy of several
maintainers. But the simple fact is that you have "some" with the ability
to set direction, and those who do not. They are called "leaders" in most
contexts, but however you want to slice it... the dynamic creates people
with unequal commit ability.

But as the PMC member you *are* responsible for it. That is the very basic
definition of being a PMC member. You are responsible for "all things
Spark".

> responsibility to the community to make sure that their patches are being
> reviewed and committed. I don't see in Apache's recommended bylaws anywhere
> that says establishing responsibility on paper for specific areas cannot be
> taken on by different members of the PMC. What's been proposed looks, to
> me, to be an empirical process and it looks like it has pretty much a
> consensus from the side able to give binding votes. I don't at all think this
> model establishes any form of ownership over anything. I also don't see in
> the process proposal where it mentions that nobody other than the persons
> responsible for a module can review or commit code.
>

"where each patch to that component needs to get sign-off from at least one
of its maintainers"

That establishes two types of PMC members: those who require sign-off, and
those who don't. Apache is intended to be a group of peers, none "more
equal" than others.

That said, we *do* recognize various levels of merit. This is where you see
differences between committers, their range of access, and PMC members. But
when you hit the *PMC member* role, then you are talking about a legal
construct established by the Foundation. You move outside of community
norms, and into how the umbrella of the Foundation operates. PMC members
are individually responsible for all of the code under their purview, which
is then at the direction of the Foundation itself. I'll skip that
conversation, and leave it with the simple phrase: as a PMC member, you're
responsible for the whole codebase.

So following from that, anything that *restricts* your ability to work on
that code, is a problem.

> In fact, I'll go as far as to say that since Apache is a meritocracy, the
> people who have been aligned to the responsibilities probably were aligned
> based on some sort of merit, correct? Perhaps we could dig in and find out
> for sure... I'm still getting familiar with the Spark community myself.
>

Once you are a PMC member, then there is no difference in your merit. Merit
ends. You're a PMC member, and that is all there is to it. That Jane
commits 1000 times per month makes her no better than John, who
commits 10/month. They are peers on the PMC and have equal rights and
responsibility to the codebase.

Historically, some PMCs have attempted to create variant levels within the
PMC, or create different groups and rights, or different partitions over
the code, and ... again, historically: it has failed. This is why Apache
stresses consensus. The failure modes are crazy and numerous when moving
away from that, into silos.

>...
On Thu, Nov 6, 2014 at 6:49 PM, Matei Zaharia 
 wrote:

> So I don't understand, Greg, are the partial committers committers, or are
> they not? Spark also has a PMC, but our PMC currently consists of all
> committers (we decided not to have a differentiation when we left the
> incubator). I see the Subversion partial committers listed as "committers"
> on https://people.apache.org/committers-by-project.html#subversion, so I
> assume they are committers. As far as I can see, CloudStack is similar.
>

PMC members are responsible for the code. They provide the oversight,
direction, and management. (they're also responsible for the community, but
that distinction isn't relevant in this contrasting example)

Committers can make changes to the code, with the
acknowledgement/agreement/direction of the PMC.

When these groups are equal, like Spark, then things are pretty simple.

But many communities in Apache define them as dispa

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
Partial committers are people invited to work on a particular area, and
they do not require sign-off to work on that area. They can get a sign-off
and commit outside that area. That approach doesn't compare to this
proposal.

Full committers are PMC members. As each PMC member is responsible for
*every* line of code, then every PMC member should have complete rights to
every line of code. Creating disparity flies in the face of a PMC member's
responsibility. If I am a Spark PMC member, then I have responsibility for
GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
interposing a barrier inhibits my responsibility to ensure GraphX is
designed, maintained, and delivered to the Public.

Cheers,
-g

(and yes, I'm aware of COMMITTERS; I've been changing that file for the
past 12 years :-) )

On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell wrote:

> In fact, if you look at the subversion committer list, the majority of
> people here have commit access only for particular areas of the
> project:
>
> http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
>
> On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell 
> wrote:
> > Hey Greg,
> >
> > Regarding subversion - I think the reference is to partial vs full
> > committers here:
> > https://subversion.apache.org/docs/community-guide/roles.html
> >
> > - Patrick
> >
> > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein wrote:
> >> -1 (non-binding)
> >>
> >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> >> to be severely frowned upon. This creates *unequal* ownership of the
> >> codebase.
> >>
> >> Each Member of the PMC should have *equal* rights to all areas of the
> >> codebase under their purview. It should not be subjected to others'
> >> "ownership" except through the standard mechanisms of reviews and
> >> if/when absolutely necessary, to vetoes.
> >>
> >> Apache does not want "leads", "benevolent dictators" or "assigned
> >> maintainers", no matter how you may dress it up with multiple
> >> maintainers per component. The fact is that this creates an unequal
> >> level of ownership and responsibility. The Board has shut down
> >> projects that attempted or allowed for "Leads". Just a few months ago,
> >> there was a problem with somebody calling themself a "Lead".
> >>
> >> I don't know why you suggest that Apache Subversion does this. We
> >> absolutely do not. Never have. Never will. The Subversion codebase is
> >> owned by all of us, and we all care for every line of it. Some people
> >> know more than others, of course. But any one of us can change any
> >> part, without being subjected to a "maintainer". Of course, we ask
> >> people with more knowledge of the component when we feel
> >> uncomfortable, but we also know when it is safe or not to make a
> >> specific change. And *always*, our fellow committers can review our
> >> work and let us know when we've done something wrong.
> >>
> >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> >> project ownership, and creates a more open and inviting project.
> >>
> >> So again: -1 on this entire concept. Not good, to be polite.
> >>
> >> Regards,
> >> Greg Stein
> >> Director, Vice Chairman
> >> Apache Software Foundation
> >>
> >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> >>> Hi all,
> >>>
> >>> I wanted to share a discussion we've been having on the PMC list, as
> >>> well as call for an official vote on it on a public list. Basically, as the
> >>> Spark project scales up, we need to define a model to make sure there is
> >>> still great oversight of key components (in particular internal
> >>> architecture and public APIs), and to this end I've proposed implementing a
> >>> maintainer model for some of these components, similar to other large
> >>> projects.
> >>>
> >>> As background on this, Spark has grown a lot since joining Apache.
> >>> We've had over 80 contributors/month for the past 3 months, which I believe
> >>> makes us the most active project in contributors/month at Apache, as well
> >>> as over 500 patches/month. The codebase has also grown significantly, with
> >>> new libraries for SQL, ML, graphs and more.
> >>>
> >>> In this kind of large project, one common way to scale development is
> >>> to assign "maintainers" to oversee key components, where each patch to that
> >>> component n

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Greg Stein
-1 (non-binding)

This is an idea that runs COMPLETELY counter to the Apache Way, and is
to be severely frowned upon. This creates *unequal* ownership of the
codebase.

Each Member of the PMC should have *equal* rights to all areas of the
codebase under their purview. It should not be subjected to others'
"ownership" except through the standard mechanisms of reviews and
if/when absolutely necessary, to vetoes.

Apache does not want "leads", "benevolent dictators" or "assigned
maintainers", no matter how you may dress it up with multiple
maintainers per component. The fact is that this creates an unequal
level of ownership and responsibility. The Board has shut down
projects that attempted or allowed for "Leads". Just a few months ago,
there was a problem with somebody calling themself a "Lead".

I don't know why you suggest that Apache Subversion does this. We
absolutely do not. Never have. Never will. The Subversion codebase is
owned by all of us, and we all care for every line of it. Some people
know more than others, of course. But any one of us can change any
part, without being subjected to a "maintainer". Of course, we ask
people with more knowledge of the component when we feel
uncomfortable, but we also know when it is safe or not to make a
specific change. And *always*, our fellow committers can review our
work and let us know when we've done something wrong.

Equal ownership reduces fiefdoms, enhances a feeling of community and
project ownership, and creates a more open and inviting project.

So again: -1 on this entire concept. Not good, to be polite.

Regards,
Greg Stein
Director, Vice Chairman
Apache Software Foundation

On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> Hi all,
> 
> I wanted to share a discussion we've been having on the PMC list, as well as 
> call for an official vote on it on a public list. Basically, as the Spark 
> project scales up, we need to define a model to make sure there is still 
> great oversight of key components (in particular internal architecture and 
> public APIs), and to this end I've proposed implementing a maintainer model 
> for some of these components, similar to other large projects.
> 
> As background on this, Spark has grown a lot since joining Apache. We've had 
> over 80 contributors/month for the past 3 months, which I believe makes us 
> the most active project in contributors/month at Apache, as well as over 500 
> patches/month. The codebase has also grown significantly, with new libraries 
> for SQL, ML, graphs and more.
> 
> In this kind of large project, one common way to scale development is to 
> assign "maintainers" to oversee key components, where each patch to that 
> component needs to get sign-off from at least one of its maintainers. Most 
> existing large projects do this -- at Apache, some large ones with this model 
> are CloudStack (the second-most active project overall), Subversion, and 
> Kafka, and other examples include Linux and Python. This is also by-and-large 
> how Spark operates today -- most components have a de-facto maintainer.
> 
> IMO, adopting this model would have two benefits:
> 
> 1) Consistent oversight of design for that component, especially regarding 
> architecture and API. This process would ensure that the component's 
> maintainers see all proposed changes and consider them to fit together in a 
> good way.
> 
> 2) More structure for new contributors and committers -- in particular, it 
> would be easy to look up who's responsible for each module and ask them for
> reviews, etc, rather than having patches slip between the cracks.
> 
> We'd like to start in a light-weight manner, where the model only
> applies to certain key components (e.g. scheduler, shuffle) and user-facing 
> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it 
> if we deem it useful. The specific mechanics would be as follows:
> 
> - Some components in Spark will have maintainers assigned to them, where one 
> of the maintainers needs to sign off on each patch to the component.
> - Each component with maintainers will have at least 2 maintainers.
> - Maintainers will be assigned from the most active and knowledgeable 
> committers on that component by the PMC. The PMC can vote to add / remove 
> maintainers, and maintained components, through consensus.
> - Maintainers are expected to be active in responding to patches for their 
> components, though they do not need to be the main reviewers for them (e.g. 
> they might just sign off on architecture / API). To prevent inactive 
> maintainers from blocking the project, if a maintainer isn't responding in a 
> reasonable time period (say 2 weeks), other committers