Re: musings on rust packaging [was Re: F38 proposal: RPM Sequoia (System-Wide Change proposal)]

Matthew Miller Tue, 01 Nov 2022 07:40:51 -0700

On Wed, Oct 19, 2022 at 01:04:39PM +0200, Fabio Valentini wrote:
> I'll respond inline.


Me too -- and apologies for the delay.


> > I fundamentally disagree with Kevin on a deep level about "entirely
> > useless", but ... find myself kind of agreeing about the "unpackagable"
> > part. I mean: clearly we've found a way, but I'm really not sure we're
> > providing a lot of _value_ in this approach, and I'm also not sure it's
> > as successful as it could be.
> We *do* provide value to both users *and* developers by doing things
> the way we do, but the benefits might not be obvious to people who
> don't know how (Rust) packaging works, and what we as package
> maintainers do.

Let me rephrase: I absolutely think you've provided value and are providing
value (and I appreciate it). I am not convinced that the value is in the
RPM-izing part, though.


[...]
> This is due to a limitation of how cargo handles target-specific
> dependencies - all dependencies that are *mentioned in any way* need
> to be *present* for it to resolve dependencies / enabled optional
> features / update its lockfile etc. But since we don't want to package
> bindings for Windows and macOS system APIs, we need to actually patch
> them out, otherwise builds will fail.

Theoretically, if we had our own crate repository, we could either make
those changes there (possibly using something like packit to carry the
patches) -- or, just, not make the changes and not worry because we know
those won't end up used anyway?
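
A minimal sketch of the "patch them out" step described above, assuming the manifest has already been parsed into a dict (for example with Python's tomllib). The marker list and the sample manifest are hypothetical, and the substring match is a crude stand-in for real cfg() evaluation: it would also drop a table such as cfg(not(windows)).

```python
import copy

# Substrings that mark a target table as irrelevant for a Linux-only build.
# Assumption: a crude heuristic for illustration, not a real cfg() evaluator.
NON_LINUX_MARKERS = ("windows", "macos", "apple")


def strip_foreign_targets(manifest):
    """Return a copy of a parsed Cargo.toml without non-Linux target tables."""
    result = copy.deepcopy(manifest)
    targets = result.get("target", {})
    for spec in list(targets):
        if any(marker in spec for marker in NON_LINUX_MARKERS):
            del targets[spec]
    if not targets:
        # Drop the now-empty [target] table entirely.
        result.pop("target", None)
    return result


# Hypothetical manifest, shaped like the dict tomllib.load() would return.
manifest = {
    "package": {"name": "example", "version": "0.1.0"},
    "dependencies": {"libc": "0.2"},
    "target": {
        "cfg(windows)": {"dependencies": {"winapi": "0.3"}},
        'cfg(target_os = "macos")': {"dependencies": {"core-foundation": "0.9"}},
        "cfg(unix)": {"dependencies": {"nix": "0.26"}},
    },
}

cleaned = strip_foreign_targets(manifest)
print(sorted(cleaned["target"]))  # ['cfg(unix)']
```

The original dict is left untouched, so the same manifest can still be compared against upstream.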


> You must realize that this is an extreme case. For many Rust
> applications that people want to package for Fedora, the number of
> dependencies that are missing is rather small, *because* most popular
> libraries are already packaged.

It may be that I just hear about the difficult cases.


> We might need to reconsider how to package projects like this. I'm
> pretty sure we could find a way to package them in a way that's
> compatible with how we're currently doing things but would be much
> less busywork.

Okay, I'm open to that.



> Sure, but isn't that the case for most projects that a newcomer wants
> to package, regardless of programming language? Say, somebody wants to
> package some cool new Python project for machine learning, then
> there's probably also some linear algebra package or SIMD math library
> in the dependency tree that's missing from Fedora. How is that
> different?

Rust tends to be more fine-grained. I don't think this is necessarily
rust-specific _really_ — I think it's a trend as people get more used to
this way of doing things. With Python, there are some big packages
(including "batteries included" standard Python itself) which tend to group
big related sets of functionality. (notably: numpy, scipy, pandas...)

> For intra-project dependencies (i.e. bevy components depending on
> exact versions of bevy components), this is kind of expected, and we
> have tools to deal with this kind of situation (though bevy is on a
> different scale). For dependencies on third-party libraries, this is
> kind of unexpected, and I wonder why they do things like that? Locking
> some dependencies to exact versions is usually handled by relying on
> the lockfile, instead.

I was wrong about this. I actually didn't realize that the ^ was optional. I
was, um, cargo-culting that around. Ah well. Anyway, that's less of a
problem than I worried.
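
For reference, a bare version requirement in Cargo.toml such as "1.2.3" means the same thing as "^1.2.3". The sketch below computes the range a caret requirement allows, following the rules in the Cargo documentation. It is an illustration only: it handles plain numeric versions with one to three components and ignores pre-release tags, wildcards, and comparison operators.

```python
def caret_bounds(requirement):
    """Return (lower, upper) version tuples for a caret requirement.

    lower is inclusive, upper is exclusive. A leading "^" is optional.
    """
    text = requirement.strip()
    if text.startswith("^"):
        text = text[1:]
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    lower = parts + [0] * (3 - len(parts))
    # Bump the first non-zero component that was written out; if every
    # written component is zero, bump the last one that was written.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = lower[:bump] + [lower[bump] + 1] + [0] * (2 - bump)
    return tuple(lower), tuple(upper)


def matches(requirement, version):
    """Check whether a (major, minor, patch) tuple satisfies the requirement."""
    lower, upper = caret_bounds(requirement)
    return lower <= version < upper


print(caret_bounds("1.2.3"))   # ((1, 2, 3), (2, 0, 0))
print(caret_bounds("^0.2.3"))  # ((0, 2, 3), (0, 3, 0))
print(matches("1.2.3", (1, 9, 0)))  # True
```

An exact pin needs an explicit "=" in the requirement, which is the case the quoted paragraph is asking about.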

> > The packaging guidelines say that I SHOULD create patches to update to
> > latest versions of dependencies, and that I should further convince the
> > upstream to take them. Candidly, that seems like a waste of everyone's
> > time.
> This is *not* a waste of time. If we don't invest time to do that, many
> projects' dependencies grow stale, and actually *increase* the need for us
> to maintain compat packages.

I have not tried this with any Rust package. My experience in the past is
that many upstreams find this the kind of thing that makes them go on long
blog rants about distro packaging -- they picked a version, it works fine,
they don't need the distraction of being told they must update it.

But even when this doesn't happen, it gets into the matter of expertise. If
I need to update a dependency for a newer version of the sub-dependency, and
I don't know enough about either code base to do anything other than file a
"please update" bug, then everything is blocked on that.

I don't dispute that helping projects keep up to the latest is valuable
work. It even seems like it might be in-scope work for Fedora. But couldn't
we do that as something _separate_ from blocking ourselves (either literally
or through the extra overhead of compat packages) from packaging the
dependent app?

> > The guidelines provide for creating compat packages, but that means 1) the
> > existing shared work is less useful, 2) requires even more extra steps, and
> > 3) even without reviews for compat has extra administrative overhead.
> 
> We only maintain compat packages where porting to the new version (and
> submitting the changes upstream) is not feasible. Again, isn't that
> how Fedora is supposed to work?

I guess it depends on how broadly one reads "feasible". :) 


> The barrier for participation is too high in some cases, I agree.
> However, in my experience, that's for a different reason:
> 
> The "shiny new things that happen to be written in Rust" that new
> contributors want to have in Fedora are often very complicated
> projects that even experienced Rust packagers would need to spend a
> lot of time on.
> 
> Examples of that might be:
> - wasmtime: I ultimately abandoned the attempt to package it "because
> Fedora Legal", but the packages themselves worked fine

An aside, but: did I miss something with this on the Legal list? The only
thing I'm finding is a question about how to phrase `Apache-2.0 WITH
LLVM-exception`.

> - deno: requires dozens of new packages, some of which also have
> unclear / questionable licenses as well, but the packages themselves
> worked

I'm not sure this example isn't agreeing with me. :)



> On the other hand, many "nice" CLI tools that people want to package
> often require minimal knowledge of Rust packaging (our tools are
> pretty nice for "standard" projects), and often only need very few new
> dependencies to be packaged.
> 
> Just as an example, I just today started reviewing a "simple" Rust
> application here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1990713
> 
> The spec file is very simple and almost entirely automatically
> generated (with the exception of the missing License breakdown for the
> statically linked binary), no dependencies were missing from Fedora.
> Even Rust newbies would not have trouble packaging this, and that
> would be a way better entry point than packaging stuff like Bevy.

How many lines of that are unique to that package?

I guess my impression of this is a little bit colored by a Python packaging
adventure from a little bit ago:
https://lists.fedoraproject.org/archives/list/python-de...@lists.fedoraproject.org/thread/MDYIFGPZS775FKXZU3LZPAJYJ36HGIDH/#WFH5G43DC2GMNLAJ2MSTXFZEROWCSYLZ
... where it kind of turns out that what looks like it could be automated
ends up with a hand-tuned specfile with a lot of exceptions.

I'm hopeful that maybe the Rust version of this could be more streamlined.

What if, instead of the specfile + boilerplate, there was a toml (or yaml,
or whatever) file that just listed whatever is unique to the package?

For this package, maybe it is... nothing? Just need an indicator that this
is a Rust package.
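
As a rough illustration of that idea, the sketch below expands a tiny per-package descriptor into a full build configuration by layering it over per-ecosystem defaults. Every key name here ("type", "run_tests", "patches", "disabled_features") is hypothetical; no such format exists in Fedora tooling as far as this thread says.

```python
import copy

# Hypothetical defaults per ecosystem; a real tool would define its own.
DEFAULTS_BY_TYPE = {
    "rust": {
        "run_tests": True,
        "patches": [],
        "disabled_features": [],
    },
}


def effective_config(descriptor):
    """Expand a minimal descriptor into a full configuration dict."""
    kind = descriptor["type"]
    defaults = copy.deepcopy(DEFAULTS_BY_TYPE[kind])
    overrides = {k: v for k, v in descriptor.items() if k != "type"}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    return {"type": kind, **defaults, **overrides}


# The simple case: nothing is unique to the package.
print(effective_config({"type": "rust"}))

# A package that needs one exception lists only that exception.
print(effective_config({"type": "rust", "run_tests": False}))
```

The descriptor stays as short as the package is ordinary, and the domain knowledge lives in the shared defaults rather than in each file.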


I know it looks simple to an experienced packager, but that specfile has a
_lot_ of complicated domain knowledge -- both general Fedora RPM packaging
and the rust packaging macros.


> > And, I led with: I appreciate all the work you've all done to make this
> > work. That's definitely true — I think it was super-valuable to pilot this
> > approach. But I think that the Rust ecosystem would be a great place to
> > pilot a different way. Something lightweight where we cache crates and use
> > them _directly_ in the build process for _application_ RPMs.
> 
> We have talked about this multiple times, but it won't work.
> I think this was tried with first-class maven artifact support in
> koji, but we all know how the Java packaging fiasco ended.

I would rather see it as: we learned some lessons from that approach and can
do it better.

> Or even if making Rust crates first-class deliverables *did work*, it
> wouldn't give us the benefits of the current approach:
> - we ensure that all crates in Fedora *build* on all architectures
> - we ensure that most crates in Fedora pass their test suites on all
> architectures

But those things aren't attached to making them into RPMs.

> - we check all crates for objectionable content, licensing problems, etc.

Nor is this. And I don't think we _should_ skip this part. It is clear
value.


> - we change build flags to default to dynamically linking to system
> libraries instead of statically linking against vendored copies

This too.

Mostly, at least. Assuming this isn't _prebuilt binaries_ or similar,
upstream may or may not have a good reason or strong opinion. Like any
bundling, we need a system which can track and react to security problems
with those libraries, though. (And we don't meaningfully have that for RPMs
now either.)



> This would mean that we basically stop contributing things to the
> upstream Rust ecosystem:
> - we diagnose / report / fix architecture support issues
> - we port projects to new versions of dependencies
> - etc.

Why? I think it would give us _more_ time to do those things.

> I see this work in the upstream ecosystem as an important part of the
> work we do in packaging Rust crates for Fedora,
> and I would not want to endorse an approach that meant we no longer do
> these things.


Sure!

> > Rust packages include a lot of machine-readable metadata. We should be
> > able to watch for CVEs, RustSec, and other security notices even without
> > encoding the metadata in RPMs. License review could also be automated —
> > the field in Cargo.toml is supposed to be SPDX, so that's convenient.
> > [3]
> 
> I already monitor RustSec advisories and check *all of them* against
> Fedora packages. This takes up a minuscule amount of the time I spend
> on Rust packaging (because there's so few Rust security advisories).
> If I remember correctly, there were only 2-3 CVE issues in the Rust
> stack that actually affected our packages, and dealing with those was
> very simple:
> 1) Push the patched version of the library, 2) rebuild dependent
> applications, 3) submit to bodhi.
> There's some amount of automation that *could* be done (mostly in
> figuring out which applications need to be rebuilt for a given library
> change), but that's also pretty easily done with a "dnf repoquery" or
> two.

I appreciate that you do this (very much!). But it seems like this could be
_entirely_ automated (with alerts for dependency or license changes, etc).
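
One piece of that automation, working out which packages need a rebuild after a library is patched, amounts to a reverse-dependency walk. The sketch below assumes the build dependencies are already available as a plain mapping (for instance, collected from repoquery output); the package names are made up.

```python
from collections import deque


def needs_rebuild(build_requires, patched):
    """Return every package that depends on `patched`, directly or transitively."""
    # Invert the graph: library -> packages that build against it.
    dependents = {}
    for package, deps in build_requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(package)

    seen = set()
    queue = deque([patched])
    while queue:
        current = queue.popleft()
        for package in dependents.get(current, ()):
            if package not in seen:
                seen.add(package)
                queue.append(package)
    return seen


# Hypothetical dependency data: package -> list of build dependencies.
build_requires = {
    "rust-libbar": ["rust-libfoo"],
    "app-one": ["rust-libbar"],
    "app-two": ["rust-libbaz"],
}

print(sorted(needs_rebuild(build_requires, "rust-libfoo")))
# ['app-one', 'rust-libbar']
```

Because Rust applications are statically linked, the transitive set matters: app-one never names rust-libfoo directly but still carries its code.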



> On the other hand, license review is still important, even if it's
> already available in SPDX format in the upstream metadata.
> Just because sometimes, that metadata is either wrong or incomplete.
> And even more often, package review flags other problems (like missing
> LICENSE files for licenses that *require* redistributed sources to
> contain a copy of the license text). Relying on SPDX metadata alone is
> *not* safe.

Again, yes -- and we should talk to Jilayne and others on the Legal list,
because we can do this better generally. For most packages, there's a
one-time review gate, applied with various diligence depending on the
packager and reviewer. Then, maybe never looked at again. For packages that
go into RHEL, there's another review by RH that _hopefully_ (and by policy!)
should go back to the Fedora packages.
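
A first-pass automated check along these lines could flag the two problems mentioned above: license identifiers that are not on an allow-list, and crates that ship no license text. The sketch below is only that first pass and does not replace a human review. The allow-list is hypothetical, not Fedora's actual list, and the tokenizer is a simplification of SPDX expression syntax.

```python
import re

# Hypothetical allow-list for illustration; not Fedora's actual list.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause", "LLVM-exception"}
OPERATORS = {"AND", "OR", "WITH"}
LICENSE_FILE_PREFIXES = ("LICENSE", "LICENCE", "COPYING")


def license_ids(expression):
    """Extract license and exception identifiers from an SPDX-style expression."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return [t for t in tokens if t not in OPERATORS]


def review_flags(expression, shipped_files):
    """Return a list of human-readable problems that need a reviewer's attention."""
    flags = []
    for ident in license_ids(expression):
        if ident not in ALLOWED:
            flags.append(f"license not on allow-list: {ident}")
    has_text = any(
        name.upper().startswith(LICENSE_FILE_PREFIXES) for name in shipped_files
    )
    if not has_text:
        flags.append("no license text shipped")
    return flags


print(review_flags("MIT OR Apache-2.0", ["Cargo.toml", "README.md"]))
# ['no license text shipped']
print(review_flags("Apache-2.0 WITH LLVM-exception", ["LICENSE"]))
# []
```

Running such a check on every update, not just at import, would address the "one-time review gate" concern without relying on the metadata being correct.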


> 
> > We could also attach other metadata to the packages in the cache. Maybe some
> > popularity, update frequency from crates.io, but also package review flags:
> > checked license against source, and whatever other auditing we think should
> > be done. This moves the focus from specfile-correctness to the package
> > itself, and the effort from packaging to reviewing. (I'd suggest that for
> > the experiment, we not make any deep auditing mandatory, but instead
> > encouraged.) And these flags should be able to be added by anyone in the
> > Rust SIG, not necessarily just at import.
> 
> This is already the case, though?
> Writing a spec file for a new crate is already automated to the point
> where "standard" crates can be 100% automatically generated and need
> zero manual edits.

See my comment above -- there are a lot of steps and a l

> If manual changes *are* required, then these changes would also be
> required in the "first-class crate artifact" scenario, so you don't
> gain anything.
> And if there's other problems that are caught during package review,
> the distribution mechanism doesn't matter, either.

But our mechanism is really complicated -- a barrier to entry, lots of
places for mistakes, and even with collaboration with other distros, very
Fedora-specific. So I think there is something to gain.

> In my experience, changing the distribution mechanism or packaging
> paradigm will often make things *worse* instead of better. For
> example, the implosion of the NodeJS package ecosystem in Fedora was
> not only caused by the horrid state of NPM, but also because the new
> packaging guidelines which prefer bundling essentially made it
> impossible for packagers to verify that no objectionable content is
> present in vendored dependencies. For Java, Modularity was seen as a
> "solution", but the result was that basically everybody - except for
> the Red Hat maintainers who maintained the modules - just stopped
> doing Java packaging because of the hostile environment.

I really hope we can look at these and learn how to do it better, instead of
deciding that better isn't possible. And — while I'm not really up on node —
I have pretty good hindsight on what went wrong with modularity. (Not enough
to try modularity _again_ just yet... but that's a different thing. A whole
talk for next year's Nest/Flock, maybe....)


> > Rust packaging seems like a great place to lead the way — and then we can
> > maybe expand to Go, which has similar issues, and then Java (where, you
> > know, things have already collapsed despite heroic effort.)
> 
> Oh, actually, I don't think Rust packaging is a good place to start
> here at all. :)
> 
> The way cargo works already maps very neatly onto how RPM packages
> work, which is definitely *not* the case for other language
> ecosystems. I also think we could even massively improve handling of
> "large" projects with many sub-components (like bevy, zola, wasmtime,
> deno, etc.) - which are currently the only projects that are "painful"
> to package - *without* completely changing the underlying packaging
> paradigm or distribution mechanism. (I've been wanting to actually
> write better tooling for this use case, but alas, Bachelor thesis is
> more important for now.)

I think we can both be right, here: the simple mapping seems like it makes
it good to experiment with.


> alternatives, all attempts at trying different approaches (maven
> artifacts in koji, vendoring NodeJS dependencies, Java Modules, etc.)
> have *failed* and ultimately made things worse instead of improving
> the situation - the only thing that has proven to be sustainable (for
> now) is ... maybe surprisingly, plain RPM packages.

I'll take "for now". :)



-- 
Matthew Miller
<mat...@fedoraproject.org>
Fedora Project Leader

_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
