Re: [gentoo-dev] Killing herds, again

2019-04-14 Thread Mart Raudsepp
On one fine day, Wed, 03.04.2019 at 19:35, Michał Górny wrote:
> My goal here is to make sure that we have clear and correct
> information
> about package maintainers.  Most notably, if a package has no active
> maintainer, we really need to have 'up for grabs' issued and package
> marked as maintainer-needed, rather than hidden behind some project
> whose members may not even be aware of the fact that they're its
> maintainers.
> 
> 
> What do you think?

I agree with most of what was written in the original post, but
regarding this point, I'd hate to see packages that belong to a project
that makes sense thrown into the generic maintainer-needed bucket.
We need a maintainer-needed status for projects, plus tooling
improvements to announce it. This is similar to what Alec said, though
I think it's fine to throw packages into the generic bucket when the
maintaining project was indeed just herd-like.
But I don't think we should do that for projects where it makes sense
to have the packages grouped under a project. A recent example was/is
MATE; an older example is maybe enlightenment.
We could mark projects as maintainer-needed and even have a script add
comments in metadata.xml (and a matching script to remove those once
the project status changes); have any maintainer-needed list generation
also consider packages marked as maintained by such a maintainer-needed
project (possibly grouping them together in the list), etc.
This could then also be used to signify huge staffing-needs, even if
someone is sort-of trying to take care of them within that project.
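
The list-generation idea above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the data layout, the "maintainer-needed" status string, the function name and the sample package names are all hypothetical, and nothing here reflects existing Gentoo tooling.

```python
from collections import defaultdict

# Hypothetical status table: project name -> status string (assumed layout).
PROJECT_STATUS = {
    "project-x": "maintainer-needed",
    "project-y": "active",
}

# Hypothetical sample data: package -> list of maintainers (projects or people).
PACKAGES = {
    "cat-a/pkg-one": ["project-x"],
    "cat-a/pkg-two": ["project-x"],
    "cat-b/pkg-three": ["project-y"],
    "cat-c/pkg-four": [],
}


def maintainer_needed_report(packages, project_status):
    """Group packages that need a maintainer by their owning project.

    Packages with no maintainer at all are collected under the key None.
    Packages are included only if every listed maintainer is a project
    flagged maintainer-needed; a human maintainer or an active project
    keeps the package off the list.
    """
    report = defaultdict(list)
    for pkg, maintainers in sorted(packages.items()):
        if not maintainers:
            report[None].append(pkg)
        elif all(project_status.get(m) == "maintainer-needed" for m in maintainers):
            # Group under the first listed project to keep related packages together.
            report[maintainers[0]].append(pkg)
    return dict(report)


for project, pkgs in maintainer_needed_report(PACKAGES, PROJECT_STATUS).items():
    label = project if project is not None else "(no maintainer)"
    print(f"{label}: {', '.join(pkgs)}")
```

Grouping by project keeps related packages together in the list, so someone willing to adopt the whole set can see it at a glance.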

But, as said, I agree with almost everything else mentioned in the
thread starter.


Mart




Re: [gentoo-dev] Killing herds, again

2019-04-04 Thread Michał Górny
On Wed, 2019-04-03 at 18:52 -0400, Alec Warner wrote:
> On Wed, Apr 3, 2019 at 1:36 PM Michał Górny wrote:
> > Less structured projects often have problems tracking member activity.
> > More than once a project effectively died when all members became
> > inactive, yet effectively hid the fact that the relevant packages were
> > unmaintained and sometimes discouraged more timid developers from fixing
> > bugs.
> > 
> 
> I'm not sure I follow this logic.
> 
> 1) We know who is in every project.
> 2) We know the state of every developer.
> 
> We should be able to detect if:
> 
> a) A project is empty.
> b) A project has no active developers.
> 
> I don't see how this is markedly different from a package, assigned to no
> maintainer. Or a package assigned to a maintainer who is not active.

I'm afraid it's not that simple.  Just because someone is active
as a Gentoo developer doesn't mean that person is active in this
specific project.  There's no simple way to know that, and I don't think
we should try to figure out complex ways of doing that as they are
capable of causing more harm than good with false positives.

There is also a deeper problem to this.  I believe that having one
personally listed as the maintainer (and especially as the sole
maintainer) makes that person feel more responsible about the package
and its state.  If the developer in question doesn't want to maintain it
anymore, an 'up for grabs' announcement is quite likely.

On the other hand, I honestly doubt most of the projects regularly look
for unmaintained packages.  If a project's developer loses interest
in a particular package, the developer can easily assume someone else
will take care of it now.  Which sometimes happens and sometimes does
not.

There is also another problem with assumptions I myself make
as an Undertaker.  When a developer is inactive and package reassignment
happens, I usually don't remove that developer from all projects,
and instead just look for projects that have no other developers.  After
all, the purpose is to make sure packages aren't left unmaintained,
and removing the developer from active projects doesn't serve that goal.
However, as you can see, this way I can easily end up leaving a project
consisting solely of inactive developers because I lack the necessary
context.

> So the solution here seems to be to fix the tools to detect this situation
> and make it clearer that the package has no active maintainer?

Better tools are always welcome, and I say thankya if you can deliver
them.  However, as I said they won't solve the problem completely.

[...]

> > Firstly, they usually tend to include packages that none of the project
> > members is actually interested in maintaining.  This also includes
> > packages added by other developers (let's shove it in here, it matches
> > their job description!) or packages leftover from other developers
> > (where the project was backup maintainer).  This means having a lot of
> > packages that seem to have a maintainer but actually don't.
> > 
> 
> I have a lot of empathy for this point FWIW. Tooling can find empty /
> abandoned projects, but we cannot do things like clearly say "This package
> shouldn't be in this project"
> or "This package is not actually maintained by a project".
> 
> One rule we might use here is that packages always need at least a single
> human maintainer, and the project is just an annotation that doesn't affect
> maintainer status.
> So e.g. if there are 8 competing cron implementations, "cron-team" can't
> maintain all 8, they have to find individual humans to vouch for each[0].

I don't think we want to impose this on all projects since --
as I mentioned -- many of them make sense as co-maintaining arbitrary
packages.  And then, I don't think we can really impose it on only some
of the projects.

> > Secondly, they frequently lack proper structure and handling of leaving
> > members.  Therefore, whenever a member maintaining a specific set of
> > packages leaves, it is possible that the number of not-really-maintained
> > packages increases.
> > 
> > Thirdly, they tend to degenerate and become defunct (much more than
> > projects that make sense).  Then, the number of not-really-maintained
> > packages ends up being really high.
> > 
> > My goal here is to make sure that we have clear and correct information
> > about package maintainers.  Most notably, if a package has no active
> > maintainer, we really need to have 'up for grabs' issued and package
> > marked as maintainer-needed, rather than hidden behind some project
> > whose members may not even be aware of the fact that they're its
> > maintainers.
> > 
> > What do you think?
> > 
> > 
> [0] This is itself a question the project needs to decide for itself; does
> every package need to be maintained actively? Some might answer no, and
> maybe running for months / years without a maintainer is OK for Gentoo. It's
> not an opinion I personally hold, but I suspect some community members do
> hold it. 

Re: [gentoo-dev] Killing herds, again

2019-04-03 Thread Alec Warner
On Wed, Apr 3, 2019 at 1:36 PM Michał Górny wrote:

> Hello, everyone.
>
> Back in 2016, we killed the technical representation of herds.  Some
> of them were disbanded completely, others merged with existing projects
> or converted into new projects.  This solved some of the problems with
> maintainer declarations but it didn't solve the most important problem
> herds posed.  Sadly, it seems that the spirit of herds survived along
> with those problems.
>
> Herds served as a method of grouping packages by a common topic,
> somewhat similar to (but usually broader than) categories.  In their
> mature state, herds had either their specific maintainers, or were
> directly connected to projects (which in turn provided maintainers for
> the herds).  Today, we still have many herds that are masked either
> as complete projects, or semi-projects (i.e. project entries without
> explicit lead, policies or anything else).
>
>
> What's wrong with herds?
> 
> The main problem with herds is that they represent an artificial
> relation between packages.  The only common thing about them is topic,
> and there is no real reason why a group of people would maintain all
> packages regarding the same topic.  In fact, it is absurd -- say, why
> would a single person maintain 10+ competing cron implementations?
> Surely, there is some common knowledge related to running cron,
> and it is entirely possible that a single person would use a few
> different cron implementations on different systems.  But that doesn't
> justify creating an artificial project to maintain all cron
> implementations.
>
> Mapping this to reality, projects usually represent a few developers,
> each of them interested in a specific subset of packages maintained by
> the project.  In some cases, this is explicitly noted as project member
> roles; in others, it is not stated clearly anywhere.  In both cases,
> there is usually some group of packages that are assigned to
> the specific project but not maintained by any of the project members.
>
> Less structured projects often have problems tracking member activity.
> More than once a project effectively died when all members became
> inactive, yet effectively hid the fact that the relevant packages were
> unmaintained and sometimes discouraged more timid developers from fixing
> bugs.
>

I'm not sure I follow this logic.

1) We know who is in every project.
2) We know the state of every developer.

We should be able to detect if:

a) A project is empty.
b) A project has no active developers.

I don't see how this is markedly different from a package, assigned to no
maintainer. Or a package assigned to a maintainer who is not active.

So the solution here seems to be to fix the tools to detect this situation
and make it clearer that the package has no active maintainer?
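
A minimal sketch of the two checks listed above, in Python. It assumes project membership and developer status are available as plain data; the names `classify_project`, `projects_needing_attention`, `PROJECTS` and `ACTIVE_DEVS` are hypothetical, as is the sample data. Note that it only looks at a developer's overall status and cannot tell whether that developer is active within a particular project.

```python
from enum import Enum


class ProjectState(Enum):
    EMPTY = "empty"
    NO_ACTIVE_DEVS = "no active developers"
    HAS_ACTIVE_DEVS = "has active developers"


def classify_project(members, active_devs):
    """Classify one project from its member list and the set of active developers."""
    if not members:
        return ProjectState.EMPTY
    if not any(dev in active_devs for dev in members):
        return ProjectState.NO_ACTIVE_DEVS
    return ProjectState.HAS_ACTIVE_DEVS


def projects_needing_attention(projects, active_devs):
    """Return {project: state} for projects that are empty or have no active developers."""
    flagged = {}
    for name, members in projects.items():
        state = classify_project(members, active_devs)
        if state is not ProjectState.HAS_ACTIVE_DEVS:
            flagged[name] = state
    return flagged


# Hypothetical example data, not real Gentoo projects or developers.
PROJECTS = {
    "project-a": [],
    "project-b": ["dev1", "dev2"],
    "project-c": ["dev2", "dev3"],
}
ACTIVE_DEVS = {"dev3"}

for name, state in projects_needing_attention(PROJECTS, ACTIVE_DEVS).items():
    print(f"{name}: {state.value}")
```

With this sample data, project-a is reported as empty and project-b as having no active developers, while project-c passes because one of its members is active.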

(I tend to agree with your general thrust of the rest of the proposal, but
I think in general limiting how projects are used on a statutory basis
seems incorrect.)


>
>
> What kind of projects make sense?
>
> If we are to fight herd-like projects, I think it is important to
> consider which kinds of projects make sense and which ones cause
> herd-like trouble.
>
> The two projects maintaining the largest number of packages in Gentoo
> are respectively the Perl project and the Python project.  Strictly
> speaking, both could be considered herd-like -- after all, they maintain
> a lot of packages belonging to the same category.  To some degree, this
> is true.  However, I believe those make sense because:
>
> a. They maintain a central group of packages, eclasses, policies etc.
> related to writing ebuilds using the specific programming language,
> and help other developers with it.  The existence of such a project is
> really useful.
>
> b. The packages maintained by them have many common properties,
> frequently come from common sources (CPAN, pypi) and that makes it
> possible for a large number of developers to actually maintain all
> of them.
>
> The Python project I know better, so I'll add something.  It does not
> accept all Python packages (although some developers insist on adding us
> to them without asking), and especially not random programs written in
> the Python language.  It specifically focuses on Python module packages,
> i.e. resources generally useful to Python programmers.  This is what
> makes it different from a common herd project.
>
> The third biggest project in Gentoo is -- in my opinion -- a perfect
> example of a problematic herd-project.  The games project maintains
> a total of 877 packages and, sad to say, many are in really bad shape.
> Even if we presumed all developers were active, this gives us 175
> packages per person, and I seriously doubt one person can actively
> maintain that many programs.  Add to that the fact that many of them are
> proprietary and fetch-restricted, and only the people possessing a copy
> can maintain them, and you 

[gentoo-dev] Killing herds, again

2019-04-03 Thread Michał Górny
Hello, everyone.

Back in 2016, we killed the technical representation of herds.  Some
of them were disbanded completely, others merged with existing projects
or converted into new projects.  This solved some of the problems with
maintainer declarations but it didn't solve the most important problem
herds posed.  Sadly, it seems that the spirit of herds survived along
with those problems.

Herds served as a method of grouping packages by a common topic,
somewhat similar to (but usually broader than) categories.  In their
mature state, herds had either their specific maintainers, or were
directly connected to projects (which in turn provided maintainers for
the herds).  Today, we still have many herds that are masked either
as complete projects, or semi-projects (i.e. project entries without
explicit lead, policies or anything else).


What's wrong with herds?

The main problem with herds is that they represent an artificial
relation between packages.  The only common thing about them is topic,
and there is no real reason why a group of people would maintain all
packages regarding the same topic.  In fact, it is absurd -- say, why
would a single person maintain 10+ competing cron implementations? 
Surely, there is some common knowledge related to running cron,
and it is entirely possible that a single person would use a few
different cron implementations on different systems.  But that doesn't
justify creating an artificial project to maintain all cron
implementations.

Mapping this to reality, projects usually represent a few developers,
each of them interested in a specific subset of packages maintained by
the project.  In some cases, this is explicitly noted as project member
roles; in others, it is not stated clearly anywhere.  In both cases,
there is usually some group of packages that are assigned to
the specific project but not maintained by any of the project members.

Less structured projects often have problems tracking member activity. 
More than once a project effectively died when all members became
inactive, yet effectively hid the fact that the relevant packages were
unmaintained and sometimes discouraged more timid developers from fixing
bugs.


What kind of projects make sense?

If we are to fight herd-like projects, I think it is important to
consider which kinds of projects make sense and which ones cause
herd-like trouble.

The two projects maintaining the largest number of packages in Gentoo
are respectively the Perl project and the Python project.  Strictly
speaking, both could be considered herd-like -- after all, they maintain
a lot of packages belonging to the same category.  To some degree, this
is true.  However, I believe those make sense because:

a. They maintain a central group of packages, eclasses, policies etc.
related to writing ebuilds using the specific programming language,
and help other developers with it.  The existence of such a project is
really useful.

b. The packages maintained by them have many common properties,
frequently come from common sources (CPAN, pypi) and that makes it
possible for a large number of developers to actually maintain all
of them.

The Python project I know better, so I'll add something.  It does not
accept all Python packages (although some developers insist on adding us
to them without asking), and especially not random programs written in
the Python language.  It specifically focuses on Python module packages,
i.e. resources generally useful to Python programmers.  This is what
makes it different from a common herd project.

The third biggest project in Gentoo is -- in my opinion -- a perfect
example of a problematic herd-project.  The games project maintains
a total of 877 packages and, sad to say, many are in really bad shape.
Even if we presumed all developers were active, this gives us 175
packages per person, and I seriously doubt one person can actively
maintain that many programs.  Add to that the fact that many of them are
proprietary and fetch-restricted, and only the people possessing a copy
can maintain them, and you see how blurry the package mapping is.
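
The per-person figure can be checked with a couple of lines of Python. The two input numbers come from the paragraph above; the member count of five is inferred from them and is not stated in the message, and the function name is hypothetical.

```python
def implied_member_count(total_packages, packages_per_person):
    """Members implied by an even split, rounded to the nearest whole person."""
    return round(total_packages / packages_per_person)


TOTAL_PACKAGES = 877       # figure quoted in the message
PACKAGES_PER_PERSON = 175  # figure quoted in the message

members = implied_member_count(TOTAL_PACKAGES, PACKAGES_PER_PERSON)
print(members)                    # 5
print(TOTAL_PACKAGES // members)  # 175
```

So the quoted numbers are consistent with a project of about five listed members, each nominally responsible for roughly 175 packages.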

Let's look at the next projects on the list.  Proxy-maint is very
specific as it proxies contributors; however, it is technically valid
since all project members can (and should) actively proxy for any
maintainers we have.  Though I have to admit the number of maintained
packages simply overburdens us.

Haskell, Java, Ruby are other examples of projects focused on
programming languages.  KDE and GNOME projects generally make sense
since packages maintained by those projects have many common features,
and the core set has common upstream and sometimes synced releases.  It
is reasonable to assume members of those projects will maintain all, or
at least the majority, of those packages.

The next project is Sound -- and in my experience, it involves a lot of
poorly maintained or unmaintained packages.  Again, the problem