On Mon, Jul 23, 2018 at 11:46 AM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers <ralf.gomm...@gmail.com>
> wrote:
>
>> Then, I think it's not unreasonable to draw a couple of hard lines. For
>> example, removing complete submodules like linalg or random has ended up on
>> some draft brainstorm roadmap list because someone (no idea who) put it
>> there after a single meeting. Clearly the cost-benefit of that is such that
>> there's no point even discussing that more, so I'd rather draw that line
>> here than every time someone opens an issue.
>>
>
> I'm happy to give the broader context here. This came up in the NumPy
> sprint in Berkeley back in May of this year.
>
> The existence of all of these submodules in NumPy is mostly a historical
> artifact, due to the previously poor state of Python packaging.
>

That's true.

> Our thinking was that perhaps this could be revisited in this age of conda
> and manylinux wheels.
>
> This isn't to say that it would actually be a good idea to remove any of
> these submodules today. Separate modules bring both benefits and downsides.
>
> Benefits:
> - It can be easier to maintain projects separately rather than inside
> NumPy, e.g., bug fixes do not need to be tied to NumPy releases.
> - Separate modules could reduce the maintenance burden for NumPy itself,
> because energy gets focused on core features.
>

That's certainly not a given though. Those things still need to be
maintained, and splitting up packages increases overhead for e.g. doing
releases. It's quite unclear if splitting would increase the developer pool.

> - For projects for which a rewrite would be warranted (e.g., numpy.ma and
> scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.
>

Agreed. That can happen and is already happening though (e.g.
https://github.com/pydata/sparse). It doesn't have much to do with removing
existing submodules.

> - Packaging. As mentioned above, this is no longer as beneficial as it once
> was.
>

True, no longer as beneficial - that's not really a benefit though,
packaging just works fine either way.


> Downsides:
> - It's harder to find separate packages than NumPy modules.
> - If the maintainers and maintenance processes are very similar, then
> separate projects can add unnecessary overhead.
> - Changing from bundled to separate packages imposes a significant cost
> upon their users (e.g., due to changed import paths).
>
> Coming back to the NEP:
>
>> The impact on downstream libraries and users would be very large, and
>> maintenance of these modules would still have to happen. Therefore this
>> is simply not a good idea; removing these submodules should not happen
>> even for a new major version of NumPy.
>>
>
> I'm afraid I disagree pretty strongly here. There should absolutely be a
> high bar for removing submodules, but we should not rule out the
> possibility entirely.
>

My thinking here is: given that we're not even willing to remove
MaskedArray (NEP 17), for which the benefits of removing are a lot higher
and the user base smaller, we are certainly not going to be removing random
or linalg or distutils in the foreseeable future. So we may as well say
that. Otherwise we have the discussions regularly (we actually just did
have one for numpy.testing in gh-11457), which is just a waste of energy.


> It is certainly true that modules need to be maintained for them to
> remain usable, but I particularly object to the idea that this should be
> forced upon NumPy maintainers.
>

Nothing is "forced on you" as a NumPy maintainer - we are all individuals
who do things voluntarily (okay, almost all - we have some funding now) and
can choose to not spend any time on certain parts of NumPy. MaskedArray
languished for quite a while before Marten and Eric spent a lot of time in
improving it and closing lots of issues related to it. That can happen.

> Open source projects need to be maintained by their users, and if their
> users cannot devote energy to maintain them then the open source project
> deserves to die. This is just as true for NumPy submodules as for external
> packages.
>
> NumPy itself only has an obligation to maintain submodules if they are
> actively needed by the NumPy project and valued by active NumPy
> contributors.
>

This is a very developer-centric view. We have lots of users and also lots of
no-longer-active contributors. The needs, interests and previous work put
into NumPy of those groups of people matter.

> Otherwise, they should be maintained by users who care about them --
> whether that means inside or outside NumPy. It serves nobody well to insist
> on NumPy developers maintaining projects that they don't use or care about.
>

> I would like to suggest the following criteria for considering removing a
> NumPy submodule:
> 1. It cannot be relied upon by other portions of NumPy.
> 2. Either
> (a) the submodule imposes a significant maintenance burden upon the rest
> of NumPy that is not balanced by the level of dedicated contributions, or
> (b) much better alternatives exist outside of NumPy
>

To quote Nathaniel: "the rest of our policy is all about measuring
disruption based on effects on users". That's absent from your criteria.

Why I would like to keep this point in is:
- the discussion does come up, see the draft brainstorm roadmap list and
gh-11457.
- the outcome of such discussions is in practice 100% clear.
- I would like to avoid having drawn-out discussions each time (this eats
up a lot of energy for me), and I *really* would like to avoid saying "I
don't have time to discuss, but this is just not going to happen" or
"consider it vetoed".
- Hence: just write it down, so we can refer to it.

Cheers,
Ralf
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
