Bug#1030223: gobject-introspection: make cross-compilation possible
Hi Simon,

On Sun, Jan 14, 2024 at 02:30:54PM +, Simon McVittie wrote:
> I'm testing an implementation of that. arch-test needs specific
> porting for each new architecture because of how it's written,
> so I'm not intending to use that directly, but it's easy to add a
> precompiled arch-test-like binary of the host architecture to the
> gobject-introspection binary package, and have the wrapper script try
> to invoke it and see what happens.

Thank you!

> Right - if we decide that qemu is not so good for some pair of (real,
> emulated) architectures, and actually we'd prefer to use some other
> user-space emulator like FEX or box86 for a particular pair, I don't
> want to have to make new sourceful changes in all 238 source packages[1]
> that produce public GIR/typelibs, plus however many packages produce
> private GIR/typelibs. It seems like it would be better to only change
> src:gobject-introspection and O(1) other packages.

You definitely managed to convince me that having the dependency on
gobject-introspection itself is preferable.

> If the cross-toolchain team implements such a thing, it would be fine to
> add it as an alternative dependency in a later version. I'd prefer not to
> do that while it's still hypothetical, because until there's a concrete
> implementation we'd have no way to test it.

The draft on how this might work sounds quite nice. Of course, the devil
is in the detail, but my takeaway here is that there is a relatively easy
way forward for extending this mechanism to avoid a lock-in on qemu and
that extension does not require uploading tons of packages. My fear was
that your qemu dependency would make us inflexible for extending it in
future, but you successfully argued that it is actually the other way
round. For the reasons you give, I agree that going with the direct qemu
dependency is a good way to do it now.

> Another option (which could perhaps be combined with this) would be for
> the cross-toolchain team to define an interface to "the preferred way to
> run executables from architecture A if they can't be run directly", and
> then gobject-introspection could try that in preference to qemu. Meson
> calls this an "EXE wrapper", which seems like as good a name as any other.

Please allow me to defer this. I think we'll end up with a better
interface if we can gain experience with the moving parts in practice.
Your way of extending gobject-introspection will yield this experience
and hopefully we'll encounter another user of this scheme (again hard
coding qemu initially). And then, I hope to remember reading this
excellent mail of yours with that gained experience.

> - For the trivial case, cross-exe-wrapper-TUPLE:ARCH, where TUPLE and
>   ARCH match, contains a /usr/bin/TUPLE-exe-wrapper which just runs its
>   arguments as-is
>   (like a trivial shell script that just does an 'exec "$@"')

I am thinking that perhaps this exe-wrapper could try forwarding to a
location in /etc (that normally does not exist) first and then fall back
to the behavior you describe. That way an administrator may supply an
external exe-wrapper and thus configure the wrapping for the entire
chroot solving the choice part that I took issue with.

> For the gobject-introspection use-case it would be possible to skip the
> cross-exe-wrapper package name and make g-i depend directly on
> cross-exe-wrapper-${local:DEB-HOST-GNU-TYPE}, but I think we would need
> the intermediate package name anyway if you want to be able to add
> Build-Depends: cross-exe-wrapper , for use-cases where the upstream
> build system will invoke TUPLE-exe-wrapper itself.

Yes. Please move forward with what you have without waiting for your
sketched cross-exe-wrapper machinery to appear.

> A Lintian check could make sense, perhaps? It could have a table of
> package names that should not be directly (build-)depended on, each with
> its allowed exceptions if any, for example:
>
>     gobject-introspection-%-endian  src:gobject-introspection
>     gobject-introspection-bin       src:gobject-introspection
>     libmutter%                      src:budgie-desktop src:gnome-remote-desktop
>                                     src:gnome-shell src:mutter
>     liburweb0                       src:urweb
>     lighttpd-modules-%              src:lighttpd
>     perl-modules-%                  src:perl
>     python-dev-is-%                 src:what-is-python
>     python-is-%                     src:what-is-python
>     python3-minimal                 python3
>     python3.%-minimal               python3.%
      libbinutils                     src:binutils

The examples you give sound sensible, but I think lintian is not a useful
place to store it. The people best knowing these properties are the
respective package maintainers and not the (dormant) lintian maintainers.
Unfortunately, lintian does not have an archive-wide view, so storing
this outside lintian makes it difficult to check. The more I look at
problems like these, the more I think we should have some other tool next
to lintian that has an archive-wide view and can check such properties.
I'm also guilty of having written two specialized tools in this area:
the multiarch hinter and dumat are both specialized
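
The kind of archive-wide check discussed above could look roughly like the
following Python sketch. It is only an illustration under stated
assumptions: "%" is read as a wildcard matching any run of characters, the
table holds just a few rows from the example, and the dependency data passed
in (including package names such as libmutter-12-0) is made up. No existing
tool or data format is implied.

```python
import re

# A few rows from the example table: restricted package pattern -> the
# dependers that are allowed to (build-)depend on it directly.
RESTRICTED = {
    "gobject-introspection-bin": {"src:gobject-introspection"},
    "libmutter%": {
        "src:budgie-desktop",
        "src:gnome-remote-desktop",
        "src:gnome-shell",
        "src:mutter",
    },
    "perl-modules-%": {"src:perl"},
}


def compile_pattern(pattern):
    """Turn a table pattern into a regex (assumption: '%' matches anything)."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("%")))


def find_violations(depends, restricted=RESTRICTED):
    """Return sorted (depender, package) pairs that break the table.

    depends maps a depender such as 'src:foo' to the package names it
    directly (build-)depends on. The input format is hypothetical.
    """
    compiled = [(compile_pattern(p), allowed) for p, allowed in restricted.items()]
    found = []
    for depender, packages in sorted(depends.items()):
        for package in sorted(packages):
            for regex, allowed in compiled:
                if regex.fullmatch(package) and depender not in allowed:
                    found.append((depender, package))
    return found
```

Keeping the table as plain data means it could live next to the packages it
describes and be collected by whichever tool has the archive-wide view.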
Bug#1030223: gobject-introspection: make cross-compilation possible
On Thu, 11 Jan 2024 at 14:15:50 +0100, Helmut Grohne wrote:
> On Thu, Jan 11, 2024 at 12:08:53PM +, Simon McVittie wrote:
> > The ${GNU_TYPE}-g-ir-compiler wrapper script (which happens to be written
> > in Python, the same as the upstream g-ir-compiler) explicitly tells
> > g-ir-compiler to run the "dumper" binary under qemu-user if it detects
> > that the Python architecture is not one that can run the host architecture.
>
> Do I understand correctly that cross building to i386 on amd64 would
> cause this wrapper to run the i386 binary in qemu?

Until today: no, but only because there was a hard-coded special case
for the common i386-on-amd64. Cross-building to armhf on arm64, or to
powerpc on ppc64, *would* have used qemu automatically, even if not
actually necessary.

In the version I hope to upload today: no, because I've added
auto-detection of whether we can execute host binaries as you suggested.

> I agree with the approach taken, but I think g-ir-compiler could be more
> clever. Rather than assume that the host architecture is not runnable
> when it differs from the build architecture, could it detect that? A
> simple way would be invoking arch-test ${DEB_HOST_ARCH}, but it can as
> well compile and run trivial program (as autoconf does all the time).

I'm testing an implementation of that. arch-test needs specific porting
for each new architecture because of how it's written, so I'm not
intending to use that directly, but it's easy to add a precompiled
arch-test-like binary of the host architecture to the
gobject-introspection binary package, and have the wrapper script try
to invoke it and see what happens.

> > I would tend to think that qemu dependencies in Build-Depends are
> > appropriate if and only if it's the source package that is making the
> > choice to invoke qemu.
>
> The argument is reasonable. Your way of looking at it also lowers
> maintenance cost as we don't have to modify tons of B-D.

Right - if we decide that qemu is not so good for some pair of (real,
emulated) architectures, and actually we'd prefer to use some other
user-space emulator like FEX or box86 for a particular pair, I don't
want to have to make new sourceful changes in all 238 source packages[1]
that produce public GIR/typelibs, plus however many packages produce
private GIR/typelibs. It seems like it would be better to only change
src:gobject-introspection and O(1) other packages.

We will want to make sourceful changes in all of those 238 packages
eventually, to replace libgirepository1.0-dev (which cannot be M-A:same
without breaking some dependent packages) with a cross-friendly
alternative, but I only want to do that once per source package in most
cases. For some of those packages (the ones where GIR is optional, like
src:flatpak, but not the ones where GIR is required functionality, like
src:gnome-shell) we will eventually also want to add support for , and
maybe split out gir1.2-NAMESPACE-VERSION-dev into its own binary package
so that the nogir build profile can be a reproducible/"safe" one - but,
again, that should be something we can do once per source package, not
something that we have to repeat every time an implementation detail
changes.

[1] grep-dctrl -FPackage-List -sPackage -e --pattern='gir1\.2-' \
    /var/lib/apt/lists/deb.debian.org_debian_dists_sid_main_source_Sources \
    | sort -u | wc -l

> I am wondering
> about a middle-ground of having a package can-run-arch being M-A:same
> and having a maintainer script that validates the property. Then you
> could Depends: qemu-user | can-run-arch (expressing the preference for
> qemu-user) while any builder could still --add-depends=can-run-arch to
> opt out of qemu.

If the cross-toolchain team implements such a thing, it would be fine to
add it as an alternative dependency in a later version. I'd prefer not to
do that while it's still hypothetical, because until there's a concrete
implementation we'd have no way to test it.

Another option (which could perhaps be combined with this) would be for
the cross-toolchain team to define an interface to "the preferred way to
run executables from architecture A if they can't be run directly", and
then gobject-introspection could try that in preference to qemu. Meson
calls this an "EXE wrapper", which seems like as good a name as any other.

Here's a straw-man design, assuming for the sake of concrete examples
that the build architecture is amd64 and the host architecture is riscv64:

- gobject-introspection:riscv64 Depends: cross-exe-wrapper | can-run-arch

- cross-exe-wrapper:riscv64 is M-A:same and Depends on
  cross-exe-wrapper-riscv64-linux-gnu

- cross-exe-wrapper-TUPLE is M-A:foreign (or perhaps a virtual package
  provided by cross-exe-wrapper-bin, which is M-A:foreign)

- For the trivial case, cross-exe-wrapper-TUPLE:ARCH, where TUPLE and
  ARCH match, contains a /usr/bin/TUPLE-exe-wrapper which just runs its
  arguments as-is
  (like a trivial shell script that just does an 'exec
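
A minimal Python sketch of what such a TUPLE-exe-wrapper might do, under
stated assumptions. The trivial case passes its arguments through unchanged;
a non-trivial wrapper is assumed to prepend an emulator command; and an
optional administrator-supplied override is tried first, as suggested
elsewhere in this thread. The function names and the idea of passing the
override as a path are hypothetical, since the thread only says "a location
in /etc" and no such package exists yet.

```python
import os


def wrapper_command(args, override=None, emulator=()):
    """Return the argv that a TUPLE-exe-wrapper would execute.

    override: path of an administrator-supplied wrapper (hypothetical);
              used only if it exists and is executable.
    emulator: argv prefix such as ("qemu-riscv64",); empty in the trivial
              case where TUPLE and ARCH match.
    """
    args = list(args)
    if override is not None and os.access(override, os.X_OK):
        return [override] + args
    return list(emulator) + args


def exec_wrapped(args, override=None, emulator=()):
    """Replace the current process, like the shell's exec "$@"."""
    argv = wrapper_command(args, override, emulator)
    os.execvp(argv[0], argv)
```

With no override and no emulator, wrapper_command() returns its arguments
unchanged, which corresponds to the trivial case in the list above.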
Bug#1030223: gobject-introspection: make cross-compilation possible
Hi Simon,

On Thu, Jan 11, 2024 at 12:08:53PM +, Simon McVittie wrote:
> On Thu, 04 Jan 2024 at 09:54:52 +0100, on #1059929, Helmut Grohne wrote:
> > On Wed, Jan 03, 2024 at 07:22:26PM +, Simon McVittie wrote:
> > > Or do I need to [...]
> > > replace the gobject-introspection-bin | qemu-user | qemu-user-static
> > > dependency by python3 | qemu-user | qemu-user-static or similar?
> >
> > I am not sure that you are the one who should express a qemu dependency.
>
> Part of how g-ir-compiler works is that it generates and compiles a
> "dumper" for the host architecture, links it to the library we are
> introspecting (let's say libflatpak), runs it, and parses its output.
> This is the "introspection" part of the gobject-introspection name.
>
> The ${GNU_TYPE}-g-ir-compiler wrapper script (which happens to be written
> in Python, the same as the upstream g-ir-compiler) explicitly tells
> g-ir-compiler to run the "dumper" binary under qemu-user if it detects
> that the Python architecture is not one that can run the host architecture.

Do I understand correctly that cross building to i386 on amd64 would
cause this wrapper to run the i386 binary in qemu?

> It is not particularly straightforward for the package that is currently
> being built to set this up, particularly if we want to do that without
> changing its upstream source code (which I think we do, because changing
> upstream source for this would scale very poorly). The one thing that
> we can straightforwardly do across multiple build systems (Autotools
> and Meson tested, CMake probably also OK) is to substitute a different
> executable to be used instead of g-ir-compiler, and the executable I'm
> substituting in this case is ${GNU_TYPE}-g-ir-compiler.

I agree with the approach taken, but I think g-ir-compiler could be more
clever. Rather than assume that the host architecture is not runnable
when it differs from the build architecture, could it detect that?
A simple way would be invoking arch-test ${DEB_HOST_ARCH}, but it can as
well compile and run trivial program (as autoconf does all the time).
If that happens to not run, it can still prepend qemu. That's not the
part I'm objecting to. I object to qemu being a hard dependency. I think
there are roughly three ways to make this work and I'd prefer to leave
more of this flexibility to builders:

 a. The host architecture is directly runnable on the CPU. Examples:
    native builds, amd64 -> i386, and often arm64 -> armhf
 b. The build system has qemu-user-static installed outside the build
    chroot.
 c. The chroot contains qemu-user and this needs to be run explicitly.

Making this work deviates from your current setup in two ways:
 * Work in the absence of a qemu binary when the host arch is runnable.
 * Make the qemu dependency optional somehow.

> I'm using the Python architecture as an approximation of the build
> architecture, on the basis that, if we have already successfully started
> a Python script, then we already know we can run binaries of the same
> architecture as the Python interpreter :-)

I agree this is a guess that likely does not misdetect a non-runnable
host architecture as runnable. It still produces misdetections of the
other kind.

> The dumper binary is really rather simple: it loads libraries, it
> initializes the GObject type system, and it does some very simple file
> I/O with the fopen()/fwrite() family. It doesn't need to do any elaborate
> computation, so performance is not a concern; and it doesn't need to call
> any complicated syscalls, unless the library we're introspecting makes
> those syscalls during class initialization (which would be weird, normally
> that would happen during instance initialization at the earliest).

I agree that performance is not a concern. Emulation bugs and
satisfiability are.
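
The "try it instead of assuming" idea can be sketched in Python roughly as
follows. This is an illustration only, not the code shipped in
gobject-introspection: the probe is assumed to be a tiny precompiled
host-architecture executable that exits 0, and both the probe path and the
emulator command are supplied by the caller (hypothetical values). Cases a
and b from the list above both show up as "the probe runs directly"; case c
is the fallback with an explicit emulator.

```python
import subprocess


def can_run(argv, timeout=10.0):
    """Return True if argv starts and exits with status 0."""
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "Exec format error" as well as a missing file.
        return False
    return result.returncode == 0


def choose_launcher(probe_argv, emulator_argv):
    """Return the argv prefix needed to run host-architecture binaries.

    An empty list means they run directly (natively or through binfmt_misc);
    otherwise the emulator command is returned.
    """
    probe_argv = list(probe_argv)
    emulator_argv = list(emulator_argv)
    if can_run(probe_argv):
        return []
    if can_run(emulator_argv + probe_argv):
        return emulator_argv
    raise RuntimeError("cannot execute host-architecture binaries")
```

Because the direct attempt comes first, the same code path serves native
and cross builds, and the emulator only has to be present when it is
actually needed.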

> At the moment, i686-linux-gnu-g-ir-compiler running on x86_64-linux-gnu
> Python optimistically always runs the dumper binary natively, without qemu
> - but it would not be a problem to change that so that it pessimistically
> always uses qemu, if you are concerned about corner-cases. As I said,
> performance isn't important here.

Is it actually hard to try both ways so you can do away with such corner
cases? Given that you try running first, it would work the same way for
native and cross.

> At the moment, the ${GNU_TYPE}-g-ir-compiler scripts pessimistically assume
> that there is no binfmt set up, and will always run qemu-user if it seems
> that it might be necessary. Again, if this means we run qemu a bit more
> often than we need to, I'm fine with that.

How about trying instead of assuming? I seem to repeat myself.

> In practice non-Linux architectures don't have qemu-user, so the practical
> result is that you can build natively on any architecture, or you can
> cross-compile for Linux on any other Linux of the same endianness
> (endianness must match because of tools limitations).

Expected.

> The problem with tests is that they test
Bug#1030223: gobject-introspection: make cross-compilation possible
Control: retitle -1 gobject-introspection: make cross-compilation possible

Moving discussion of the finer points of gobject-introspection
cross-compilation from release team bug #1059929 to g-i bug #1030223
since the release team probably don't want this on their list, and
retitling the g-i bug to be more general.

On Thu, 04 Jan 2024 at 09:54:52 +0100, on #1059929, Helmut Grohne wrote:
> On Wed, Jan 03, 2024 at 07:22:26PM +, Simon McVittie wrote:
> > Or do I need to [...]
> > replace the gobject-introspection-bin | qemu-user | qemu-user-static
> > dependency by python3 | qemu-user | qemu-user-static or similar?
>
> I am not sure that you are the one who should express a qemu dependency.

Part of how g-ir-compiler works is that it generates and compiles a
"dumper" for the host architecture, links it to the library we are
introspecting (let's say libflatpak), runs it, and parses its output.
This is the "introspection" part of the gobject-introspection name.

The ${GNU_TYPE}-g-ir-compiler wrapper script (which happens to be written
in Python, the same as the upstream g-ir-compiler) explicitly tells
g-ir-compiler to run the "dumper" binary under qemu-user if it detects
that the Python architecture is not one that can run the host architecture.

It is not particularly straightforward for the package that is currently
being built to set this up, particularly if we want to do that without
changing its upstream source code (which I think we do, because changing
upstream source for this would scale very poorly). The one thing that
we can straightforwardly do across multiple build systems (Autotools
and Meson tested, CMake probably also OK) is to substitute a different
executable to be used instead of g-ir-compiler, and the executable I'm
substituting in this case is ${GNU_TYPE}-g-ir-compiler.

I'm using the Python architecture as an approximation of the build
architecture, on the basis that, if we have already successfully started
a Python script, then we already know we can run binaries of the same
architecture as the Python interpreter :-)

The dumper binary is really rather simple: it loads libraries, it
initializes the GObject type system, and it does some very simple file
I/O with the fopen()/fwrite() family. It doesn't need to do any elaborate
computation, so performance is not a concern; and it doesn't need to call
any complicated syscalls, unless the library we're introspecting makes
those syscalls during class initialization (which would be weird, normally
that would happen during instance initialization at the earliest).

> When we reason about dependencies, we care about how they behave
> assuming that you can run them. Whether you can run an executable from a
> package or not is something that is not expressed in our package
> relationships. It's also rather difficult. Consider a few corner cases:
>
> * Some amd64 can run i386.

To the best of my knowledge, all amd64 can run i386? Although I suppose
32-bit compat syscalls could conceivably have been disabled at kernel
level (although I don't know why you'd do that and then compile i386
software).

At the moment, i686-linux-gnu-g-ir-compiler running on x86_64-linux-gnu
Python optimistically always runs the dumper binary natively, without qemu
- but it would not be a problem to change that so that it pessimistically
always uses qemu, if you are concerned about corner-cases. As I said,
performance isn't important here.

> * Most arm64, but not all, can run armhf.

At the moment, arm-linux-gnueabihf-g-ir-compiler pessimistically assumes
that nothing can run armhf, except for armhf itself. If this means we
run qemu a bit more often than we need to, that's fine: it's unlikely
to be a performance or functionality bottleneck.

> * You may operate in a chroot with some external qemu-binfmt and thus
>   execute any arch.

At the moment, the ${GNU_TYPE}-g-ir-compiler scripts pessimistically assume
that there is no binfmt set up, and will always run qemu-user if it seems
that it might be necessary. Again, if this means we run qemu a bit more
often than we need to, I'm fine with that.

> * You cannot run hurd-i386 on amd64 even in the presence of qemu-user.

That's a good point, I'll tighten up the dependency so that
gobject-introspection:hurd-i386 (or more generally, non-Linux) requires
a gobject-introspection-bin (and therefore Python) from the matching OS.

In practice non-Linux architectures don't have qemu-user, so the practical
result is that you can build natively on any architecture, or you can
cross-compile for Linux on any other Linux of the same endianness
(endianness must match because of tools limitations).

> When we considered whether cross building should imply disabling tests,
> we went for "no, but yes by default". When you cross build a package for
> i386 on amd64, sbuild and pbuilder will automatically add nocheck to
> DEB_BUILD_OPTIONS and DEB_BUILD_PROFILES. However, you can opt out of
> this behaviour to really run tests despite