On Thu, Feb 22, 2024 at 7:03 PM <sean.gill...@gmail.com> wrote:

> Hi folks,
>
> My name is Sean and I'm the author of several GIS packages using Numpy:
> Fiona, Rasterio, and Shapely.


Hi Sean, thanks for this very good question, and for all your work on GIS
packages.


> I've followed Numpy's trail when it comes to wheel building for many years
> and now I'm seeking advice on how to prioritize platforms to support and
> how to pay for the labor and computing that it takes to build wheels and
> maintain the infrastructure over time.


I'm probably best placed to answer your questions, because I've both been
involved in NumPy build & packaging for a long time and am responsible for
overseeing a significant fraction of the funded work on NumPy as well as
coordinating unrestricted funding coming in (mostly via Tidelift, as you
can see at https://opencollective.com/numpy). I'll do my best to accurately
represent the situation for NumPy. Your questions are challenging though,
so if you want a higher-bandwidth conversation I'd be happy to chat. Or we
can use part of a community meeting for this, since I imagine other folks
may be interested in this topic as well.


> Fiona and Rasterio have an order of magnitude more C library dependencies
> than Numpy, via GDAL (https://gdal.org/), which is almost more of an OS
> than a library.
>

Dealing with NumPy's BLAS dependency is already a large amount of work, so
I don't envy your task. PyPI really isn't well-suited to that many C
libraries (as I'm sure you know); for a long time the geospatial stack was
only usable from conda-forge, where packaging is a much easier task. I'm
not sure that was a terrible situation - there are a couple of domains like
that where things just get too challenging. So if you want to support a
much more restricted set of platforms with wheels than NumPy does, that
seems perfectly okay.


> I found a thread in the archive about adding musllinux wheels, but it
> wasn't clear to me how the work gets done, who does it, and how it gets
> paid for.



The funded fraction of all work on NumPy has increased steadily. Until
~2016 that fraction was zero, and now a lot of the heavy lifting is funded
work - 9 of the top 10 committers over the past 1.5 years get paid for at
least part of their time spent on NumPy. This is supported in several ways
(partially documented at https://numpy.org/about, but that's a bit out of
date):

1. a number of grants received over the years, from the Moore and Sloan
Foundations (>$1M), the Chan Zuckerberg Initiative (>$1M), and NASA (~$400k)
2. maintainers employed by companies who allow those maintainers to spend
part of their day job time on NumPy:
    - Quansight (Matti, Nathan, Rohit, Mateusz, Melissa, me)
    - NVIDIA (Sebastian - long-time maintainer, now ~2 years at NVIDIA)
    - Intel (Raghuveer, contributor for several years, just gained commit
rights)
    - Arm (Chris, contributor for ~1 year, just gained commit rights)
    - I'm not sure if I should list Berkeley here too; folks at Berkeley
contributed a lot in the past, but I'm not sure whether that was all
grant-funded or whether there was unrestricted BIDS money to support NumPy.
3. unrestricted project funds from individual and corporate donations
(Tidelift (>$100k), Bloomberg ($10k)), which support Sayed's Developer in
Residence position:
https://blog.scientific-python.org/numpy/fellowship-program/.
4. contracts for work on NumPy from clients of Quansight (and possibly of
other companies; that is hard to know) that aligned with the NumPy project
roadmap. Noteworthy mentions here go to the Sovereign Tech Fund, which
supported packaging-related work
(https://www.sovereigntechfund.de/tech/openblas), and the D. E. Shaw group,
which supported recent work on string ufuncs.

That said, *funding for packaging work is still quite challenging*. While
the above is an impressive list of funding, the vast majority of funders
care a lot about what exactly they fund, and "keep the package installable"
or "do general maintenance work" typically doesn't fare well in grant
applications. Funders have improved in this regard, and the ones mentioned
in (1) above do allow a general maintenance bucket, which is some
percentage of an overall grant.

The people doing most of the work on packaging and wheel build CI jobs for
NumPy are Matti Picus, Andrew Nelson and myself. Andrew's time is unfunded;
for Matti and me, I'd say a significant fraction of the time spent on this
topic is also unfunded.

> Does NumFOCUS support pay a maintainer to do it?


No. NumFOCUS administers our project funds, but doesn't supply funding to
NumPy. Nor does it structurally support any other open source projects with
direct funding, with the exception of its Small Development Grant program,
which is meant for smaller one-off projects (amounts in the $2k - $10k
range) rather than part-time or full-time employment.


> Are Numpy maintainers adding new platform builds as part of their day
> jobs?


In general, no. This has always been volunteer work. There is only one
exception I can think of: one of the goals of the recent CZI grant for
Scientific Python
(https://blog.scientific-python.org/scientific-python/2022-czi-grant/) is
to make the HTML docs of projects interactive. To achieve that, we need
improved support for Pyodide, and that's going to include wheels that can
be built and deployed as part of doc builds in CI.


> Are they donating their own time to the effort?
>

Mostly yes, as described above.


> Does the Numpy project aspire to provide wheels for all of the top N
> platforms? Is it more than an aspiration?


Not quite. We have to judge carefully, because adding platforms is costly
in terms of maintainer time. Adding CI jobs is easier, because there is no
long-term commitment. Once we decide to upload wheels for a platform to
PyPI and people start to rely on them, we can no longer remove them without
breaking the vast majority of users on that platform, since they won't be
set up to build from source (the `pip` defaults are pretty harmful here,
unfortunately).
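To make the `pip` point concrete, here is a deliberately simplified model of the default install decision. It is not pip's actual code: it compares only platform tags (real installers match full interpreter/ABI/platform tag triples), and the function name `choose_artifact` and the tags in the example are hypothetical. The behavior it mirrors is real, though: unless the user passes `--only-binary :all:`, pip falls back to building the sdist when no uploaded wheel matches.

```python
from typing import Iterable, Optional


def choose_artifact(
    wheel_platforms: Iterable[str],
    supported_platforms: Iterable[str],
    has_sdist: bool,
    only_binary: bool = False,
) -> Optional[str]:
    """Simplified (hypothetical) model of pip's default wheel-vs-sdist choice.

    wheel_platforms: platform tags of the wheels uploaded for a release.
    supported_platforms: tags the installing interpreter accepts,
        most preferred first.
    """
    available = set(wheel_platforms)
    for tag in supported_platforms:
        if tag in available:
            return f"wheel:{tag}"
    if has_sdist and not only_binary:
        # Default behavior: try to compile from source, which fails for
        # users without compilers and the needed C libraries installed.
        return "sdist"
    # Nothing installable (e.g. when binaries only were requested).
    return None


# Hypothetical release that stopped shipping musllinux wheels.
uploaded = ["manylinux_2_17_x86_64", "win_amd64"]
musl_host = ["musllinux_1_1_x86_64", "linux_x86_64"]
print(choose_artifact(uploaded, musl_host, has_sdist=True))  # sdist
print(choose_artifact(uploaded, musl_host, has_sdist=True, only_binary=True))  # None
```

The musl user in this sketch silently ends up on the source-build path once the matching wheels disappear, which is why removing a platform after people rely on it is so disruptive.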

We've always had issues with deciding on niche platforms. We once settled
on a rough rule of thumb that if usage of a platform fell below 0.5%, we
could drop it (IIRC this was for making SSE2 the baseline on Windows, and
hence dropping support for ancient CPUs).
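The 0.5% rule of thumb boils down to a simple share calculation. A minimal sketch, assuming usage is approximated by download counts (the email doesn't say how usage was measured, and downloads are only a rough proxy); the function names and the numbers in the example are made up for illustration:

```python
DROP_THRESHOLD = 0.005  # the 0.5% rule of thumb


def usage_share(platform_downloads: int, total_downloads: int) -> float:
    """Fraction of all downloads that went to one platform."""
    if total_downloads <= 0:
        raise ValueError("total_downloads must be positive")
    return platform_downloads / total_downloads


def could_drop(platform_downloads: int, total_downloads: int) -> bool:
    """True if the platform's share falls below the 0.5% threshold."""
    return usage_share(platform_downloads, total_downloads) < DROP_THRESHOLD


# Made-up numbers: 4,000 of 1,000,000 downloads is a 0.4% share.
print(could_drop(4_000, 1_000_000))  # True
print(could_drop(6_000, 1_000_000))  # False
```

As described, falling below the threshold means a platform *could* be dropped, not that it must be.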

You mentioned musllinux - that's an example where the number of users is
fairly low, but also the cost of doing that work is fairly low: hardware
and CI is easily available, builds are fast, and bugs need fixing anyway.
Also, musl is interesting to support: from a "technical progress"
perspective it's nicer to work on that than on a legacy platform.

Another example of a toss-up platform is 32-bit Windows. It's a pain and
not interesting to work on; the only reason we restored it is that shipping
it without OpenBLAS isn't too difficult. Those builds then have poor
performance of course (np.dot can be 100x slower), but the only genuine use
case is folks who interact with other apps like Excel, where one doesn't
care about performance.

Yet another toss-up that we do support is PyPy. We've had wheels for the
latest stable PyPy version for a long time. The main reason is that
Matti (who is also a PyPy maintainer) has been willing to do the work to
support these builds.

Platforms that we don't plan to support include:
- 32-bit x86 Linux: we dropped those wheels, because demand is quite low
and users on that platform should be able to build from source
- ppc64le: no freely available CI system (TravisCI offered one for a while,
but it was very unreliable)
- s390x: same as ppc64le
- ppc64: never considered this one, seems much more niche than ppc64le
- macOS universal2: this is pointless; we have thin wheels for arm64 now,
and universal2 was simply a bad idea that should fade away

The last remaining valid one (I think; see PEP 600) is `armv7l`. We have
never seriously considered it, but if cross-compiling becomes smoother I
can see us adding support at some point in the future.
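For reference, the architecture is the last component of a PEP 600 `manylinux_X_Y_arch` platform tag (and of the PEP 656 `musllinux_X_Y_arch` tags used for musl wheels), while the wheel filename layout itself comes from PEP 427. A small standard-library sketch that pulls the platform tags out of a wheel filename and classifies them; the helper names and the example filename are hypothetical, and legacy aliases such as `manylinux2014_*` are deliberately not matched:

```python
import re
from typing import List, Optional, Tuple

# PEP 600: manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH}
MANYLINUX = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")
# PEP 656: musllinux_${MUSLMAJOR}_${MUSLMINOR}_${ARCH}
MUSLLINUX = re.compile(r"^musllinux_(\d+)_(\d+)_(\w+)$")


def platform_tags(wheel_filename: str) -> List[str]:
    """Return the platform tags of a wheel filename.

    Wheel names have the form
    {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    and a compressed tag set joins several platform tags with '.'.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename}")
    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {wheel_filename}")
    return parts[-1].split(".")


def linux_libc_and_arch(tag: str) -> Optional[Tuple[str, str]]:
    """Return (libc, arch) for a PEP 600/656 tag, else None."""
    for libc, pattern in (("glibc", MANYLINUX), ("musl", MUSLLINUX)):
        match = pattern.match(tag)
        if match:
            return libc, match.group(3)
    return None


# Hypothetical wheel carrying a compressed tag set.
name = "example-1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"
for tag in platform_tags(name):
    print(tag, linux_libc_and_arch(tag))
```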

I was planning to write a NEP covering platform support and how we decide
on it, but still haven't gotten around to that. This email seems like a
reasonable start :)

> Is support for one or more platforms part of any sponsorship agreement?


No. None of our sponsorships come with commitments, nor do we have any
fixed commitments as a project for anything.


> Maybe users and sponsorship cover the cost of building wheels and
> maintenance entirely for Numpy, but it's not so in my case.
>
> I've got so many questions in this vein, and I'm grateful for any answers
> or insights or more discussion. Thanks!
>

I've also written a bit about this in the pypackaging-native docs. These
pages may interest you:
https://pypackaging-native.github.io/meta-topics/user_expectations_wheels/
https://pypackaging-native.github.io/meta-topics/pypi_social_model/
https://pypackaging-native.github.io/key-issues/native-dependencies/geospatial_stack/

Cheers,
Ralf
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/