Losing signing keys for custom Guix channel
Hey devs,

So I lost the PGP key that I was using to sign commits on a private Guix channel of mine. Is there a way to introduce a hard break in my channel authentication? Despite updating authorization settings, pulls complain that my latest commit isn't signed by an authorized key. Here are the changes I've made:

- Added the new public key to the keyring branch
- Appended the new key fingerprint to .guix-authorizations (at commit X)
- Updated the introduction in .config/guix/channels.scm:
  - Pointed it to commit X
  - Updated the openpgp-fingerprint

As a sanity check, I've confirmed that the fingerprint on commit X, the fingerprint in .guix-authorizations, and the openpgp-fingerprint in my channels.scm are all the same. What am I missing?
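For reference, a channel introduction after a hard key rotation looks roughly like the sketch below. Every value in it (channel name, URL, commit hash, fingerprint) is a hypothetical placeholder, not data from this thread:

```scheme
;; ~/.config/guix/channels.scm -- sketch only; all values are placeholders.
(cons (channel
       (name 'my-channel)                         ;hypothetical channel name
       (url "https://example.org/my-channel.git")
       (introduction
        (make-channel-introduction
         ;; "Commit X": the first commit signed with the NEW key.  This
         ;; commit itself must carry a signature matching the fingerprint
         ;; given just below.
         "0123456789abcdef0123456789abcdef01234567"
         (openpgp-fingerprint
          "AAAA 1111 BBBB 2222 CCCC  3333 DDDD 4444 EEEE 5555"))))
      %default-channels)
```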
Re: Should commits rather be buildable or small
Hi John,

On 3/25/2024 9:15 AM, John Kehayias wrote:
> dan: Do you know if just a version/hash bump is at least buildable? Or are
> the changes necessary for the packages to build/function at all? Or I guess
> if the non-version changes are applicable to the current version first?

I'll give it a try tomorrow. I think if the packages are updated in a certain sequence, it should be at least buildable. I'll give an update or send a new patch series if I have any progress on this.

--
dan
Re: Should commits rather be buildable or small
Hi,

Apologies for the delay. I would like to get things rolling on mesa-updates and building, including the vulkan updates, so a choice will have to be made :) Thanks for the input so far!

On Tue, Mar 05, 2024 at 06:19 AM, Liliana Marie Prikler wrote:

> Hi,
>
> On Monday, 2024-03-04 at 21:38, John Kehayias wrote:
>> [...]
>> 1. Essentially squash to one commit where all of vulkan is updated in
>> one commit. The main upside is that nothing should break (within
>> vulkan, dependents to be fixed as needed) and it shows as "one"
>> change; the main downside is that the proposed changes are not just
>> trivial version bumps. Harder to then disentangle as needed.
>>
>> 2. Make each commit update a package, but don't use the variable
>> %vulkan-sdk-version, updating each package with a version as it is
>> done. Then do a commit where all the versions are replaced by the
>> variable. This seems like unnecessary work to me, and while it stops
>> the obvious breaking (source hashes don't match once the variable is
>> updated but the package hasn't been yet), versions are still mixed,
>> which is likely a problem.
>>
>> 3. Go with the series as proposed: this means after the first commit
>> for sure all other vulkan packages and dependents don't build, as the
>> source hashes won't match until the commit that updates that package.
>> Along with version mixing, this perhaps doesn't give you a helpful
>> git bisect either?
>>
>> None are perfect. What do people think?
>
> I think 1 would be workable if the changes to the packages are minimal.
> You should also check whether you can just do the version bumps and
> then the other changes – or flip the order.
> As currently proposed, the changes are not minimal.

dan: Do you know if just a version/hash bump is at least buildable? Or are the changes necessary for the packages to build/function at all? Or I guess if the non-version changes are applicable to the current version first?

> I don't really see the benefit with 2. Normally, we'd have "-next"
> variants to catch nontrivial updates (among other things), but those
> don't seem a good approach here.
>
> If nothing else works, 3 is indeed an option to fall back to, albeit
> begrudgingly. As noted for 1, you could check whether bumping all the
> hashes and then only fixing whatever else for the builds is an option
> here.

That's what I'll have to do, I think, unless indeed the version changes can be made separately and still build. I can mark each patch in the commit log as being part of a series updating all the vulkan packages. That might be something worth doing in general for cases like this, to help out future time travelers, e.g. when searching the log and finding a commit.

> Alternative 4 would be to build those -next variants and then replace
> the base vulkan all at once. This has the advantage of not doing any
> version mixing in-between IIUC.

That's also an idea. Add a %vulkan-version-next or something like that, and -next variants of all the packages using that version instead. A bit clumsy and perhaps convoluted with the extra work for maybe minimal gain.

I'll wait to see if dan has any information about what changes can be made independently, but I guess I'll just have to make a decision on mesa-updates.

Thanks!
John
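To make alternative 4 concrete, here is a hypothetical sketch of what a "-next" variant could look like. The variable name, version string, and hash are illustrative placeholders, not actual definitions from Guix:

```scheme
;; Sketch of alternative 4 (all names and values are hypothetical).
(define %vulkan-sdk-version-next "1.3.280.0")   ;placeholder version

(define-public vulkan-headers-next
  (package
    (inherit vulkan-headers)
    (name "vulkan-headers-next")
    (version %vulkan-sdk-version-next)
    (source (origin
              (method git-fetch)
              (uri (git-reference
                    (url "https://github.com/KhronosGroup/Vulkan-Headers")
                    (commit (string-append "v" version))))
              (file-name (git-file-name name version))
              (sha256
               (base32
                ;; Placeholder hash.
                "0000000000000000000000000000000000000000000000000000"))))))
```

Once every -next package builds, a final commit would swap them in over the base packages, so no intermediate commit mixes versions.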
Shepherd timers
Hello Guix!

I pushed to the ‘devel’ branch of the Shepherd a new module that implements “timers”, along with ‘herd’ support to display information about them. It lets you provide configuration like this one:

--8<---cut here---start->8---
(use-modules (shepherd service timer))

(define timer
  (service '(my-timer)
           #:start (make-timer-constructor
                    (calendar-event #:seconds '(0 7 15 22 30 45))
                    (command '("sh" "-c" "echo Hi from $PWD.; sleep 20; echo done")))
           #:stop (make-timer-destructor)))

(register-services (list timer))
(start-in-the-background '(my-timer))
--8<---cut here---end--->8---

And then ‘my-timer’ invokes the given command at the moments that match the constraints defined by ‘calendar-event’—in this case any time the number of seconds is equal to 0, 7, 15, 22, 30, or 45. You can also make it every Monday at 9AM etc., as you would expect.

The ‘herd’ command provides detailed information about the timer:

--8<---cut here---start->8---
$ ./herd -s sock status my-timer
Status of my-timer:
  It is running since 21:09:32 (68 seconds ago).
  Timed service.
    Periodically running: sh -c "echo Hi from $PWD.; sleep 20; echo done".
    Child process: 1814
  It is enabled.
  Provides (my-timer).
  Requires ().
  Will not be respawned.

Recent runs:
  2024-03-24 21:10:04 Process exited successfully.
  2024-03-24 21:10:19 Process exited successfully.
  2024-03-24 21:10:26 Process exited successfully.
  2024-03-24 21:10:34 Process exited successfully.
  2024-03-24 21:10:35 Process terminated with signal 15.

Recent messages:
  2024-03-24 21:10:29 Hi from /home/ludo.

Upcoming timer alarms:
  21:10:45 (in 5 seconds)
  21:11:00 (in 20 seconds)
  21:11:07 (in 27 seconds)
  21:11:15 (in 35 seconds)
  21:11:22 (in 42 seconds)
--8<---cut here---end--->8---

And of course you can do anything you can do with a service: stop it, unload it, load a replacement, and so on.

Feedback & suggestions welcome!

Ludo’.
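For the “every Monday at 9AM” case mentioned above, the configuration could look something like the following. This is a sketch assuming `calendar-event` accepts `#:days-of-week`, `#:hours`, and `#:minutes` keywords; check the Shepherd manual for the exact names:

```scheme
(use-modules (shepherd service timer))

;; Hypothetical weekly timer: run a command every Monday at 9:00 AM.
;; Keyword names are my reading of the calendar-event API.
(define weekly-timer
  (service '(weekly-timer)
           #:start (make-timer-constructor
                    (calendar-event #:days-of-week '(monday)
                                    #:hours '(9)
                                    #:minutes '(0))
                    (command '("sh" "-c" "echo Weekly run")))
           #:stop (make-timer-destructor)))

(register-services (list weekly-timer))
```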
Re: PyTorch with ROCm
Hi Ricardo,

thanks for the information!

Ricardo Wurmus writes:

> Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added
> python-pytorch2 at version 2.2.1. Do you think you could adjust your
> patches to modify that one instead?

I already adjusted the patches yesterday to remove the python-pytorch2 package you added, as the patch series updates the main python-pytorch package to version 2. The new inputs of your package were already included, with the exception of python-opt-einsum. I had overlooked it before and have included it now. :)

Is there a reason to keep version 1 around? Then I could adjust the patches again. Otherwise, it makes sense to me to move the python-pytorch package to version 2.2.1 and have a package variant with 2.0.1 for r-torch (which I kept and adjusted).

Due to problems when building dependencies, the new package only builds successfully on x86_64. As I explained in the patch series, asmjit fails on armhf because GCC runs out of memory (it reaches 4 GB I think, and more is of course not possible), and cpuinfo has a known bug on aarch64 [1], which causes the tests to fail and AFAICT also breaks PyTorch at runtime. Through python-pytorch -> python-expecttest -> poetry -> python-keyring -> python-secretstorage -> python-cryptography, the python-pytorch package now depends on rust, which currently requires too much memory to build on 32-bit systems, so i686 is not supported either. What do you think should be done here?

I added all packages required for the core tests to native-inputs, but decided to disable them as they take a long time to run. If I remove the test inputs (in particular python-expecttest), the package could probably also be built for i686. Would it be acceptable to keep them as a comment for reference?

> I think it is sufficient to only have the current version of ROCm; other
> versions could be added if there is reasonable demand.

That sounds good to me.

Best,
David

[1] https://github.com/pytorch/cpuinfo/issues/14
Re: PyTorch with ROCm
Hi David,

> after seeing that ROCm packages [1] are available in the Guix-HPC
> channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

Excellent initiative!

> For this, I first unbundled the (many) remaining dependencies of the
> python-pytorch package and updated it to 2.2.1, the patch series for
> which can be found here [2,3].

Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added python-pytorch2 at version 2.2.1. Do you think you could adjust your patches to modify that one instead?

> It would be really great to have these packages in Guix proper, but
> first of course the base ROCm packages need to be added after deciding
> how to deal with the different architectures. Also, are several ROCm
> versions necessary or would only one (the current latest) version
> suffice?

As for the ROCm-specific work, I'm not qualified to comment. I do support a move of the ROCm packages from Guix HPC to Guix proper. I think it is sufficient to only have the current version of ROCm; other versions could be added if there is reasonable demand.

--
Ricardo
Re: Bug#1066113: guix: CVE-2024-27297
On 2024-03-16, Vagrant Cascadian wrote:

> For anyone with Guix or Nix installed, if I understand correctly, it
> basically allows arbitrarily replacing the source code for anything that
> you might build using Guix or Nix.

Yes, for multi-user systems and people running untrusted code in “guix shell -CW” container isolation, there is risk.

Regards,
Florian
PyTorch with ROCm
Hello,

after seeing that ROCm packages [1] are available in the Guix-HPC channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2. For this, I first unbundled the (many) remaining dependencies of the python-pytorch package and updated it to 2.2.1, the patch series for which can be found here [2,3].

For building ROCm and the remaining packages, I did not apply the same quality standard as for python-pytorch and just tried to get it working at all with ROCm 6.0.2. To reduce the build time, I also only tested them for gfx1101 as set in the %amdgpu-targets variable in amd/rocm-base.scm (which needs to be adjusted for other GPUs). Here, it seemed to work fine on my GPU. The changes for the ROCm packages are here [4] as a modification of Guix-HPC.

There, the python-pytorch-rocm package in amd/machine-learning.scm depends on the python-pytorch-avx package in [2,3]. Both python-pytorch and python-pytorch-avx support AVX2 / AVX-512 instructions, but the latter also has support for fbgemm and nnpack. I used it over python-pytorch because AVX2 or AVX-512 instructions should be available on a CPU with PCIe atomics anyway, which ROCm requires.

For some packages, such as composable-kernel, the build time and memory requirements are already very high when building for only one GPU architecture, so maybe it would be best to make a separate package for each architecture? I'm not sure they can be combined, however, as the GPU code is included in the shared libraries. Thus all dependent packages like python-pytorch-rocm would need to be built for each architecture as well, which is a large duplication for the non-GPU parts.

There were a few other issues as well; some of them should probably be addressed upstream:

- Many tests assume a GPU to be present, so they need to be disabled.

- For several packages (e.g. rocfft), I had to disable the validate-runpath? phase, as there was an error when reading ELF files. It is however possible that I also disabled it for packages where it was not necessary, but it was the case for rocblas at least. Here, the generated kernels are contained in ELF files, which are detected by elf-file? in guix/build/utils.scm but rejected by has-elf-header? in guix/elf.scm, which leads to an error.

- Dependencies of python-tensile copy source files and later copy them again with shutil.copy, sometimes twice. This leads to permission errors, as the permissions in the store are kept, so I patched it to use shutil.copyfile instead.

- There were a few errors due to using the GCC 11 system headers with rocm-toolchain (which is based on Clang+LLVM). For roctracer, replacing std::experimental::filesystem with std::filesystem suffices, but for rocthrust, the placement new operator is not found. I applied the patch from Gentoo [5], where it is replaced by a simple assignment. It looks like UB to me though, even if it happens to work. The question is whether this is a bug in libstdc++, clang, or amdclang++...

- rocMLIR also contains a fork of the LLVM source tree, and it is not clear at first glance how exactly it differs from the main ROCm fork of LLVM or from upstream LLVM.

It would be really great to have these packages in Guix proper, but first of course the base ROCm packages need to be added, after deciding how to deal with the different architectures. Also, are several ROCm versions necessary, or would only one (the current latest) version suffice?

Cheers,
David

[1] https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix/
[2] https://issues.guix.gnu.org/69591
[3] https://codeberg.org/dtelsing/Guix/src/branch/pytorch
[4] https://codeberg.org/dtelsing/Guix-HPC/src/branch/pytorch-rocm
[5] https://gitweb.gentoo.org/repo/gentoo.git/tree/sci-libs/rocThrust/files/rocThrust-4.0-operator_new.patch
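The shutil.copy issue described above is easy to reproduce outside of Guix. Here is a minimal sketch (file names and contents are made up): shutil.copy copies the mode bits along with the data, so a destination copied from a read-only store file becomes read-only itself, and a second copy over it fails for a regular user; shutil.copyfile copies only the data and leaves the destination writable.

```python
import os
import shutil
import stat
import tempfile

# Minimal reproduction sketch (hypothetical file names): a read-only
# "store" file copied twice, as python-tensile's dependencies do.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "store-file")
with open(src, "w") as f:
    f.write("kernel source")
os.chmod(src, 0o444)  # read-only, like a file in /gnu/store

dst = os.path.join(workdir, "copy-with-mode")
shutil.copy(src, dst)  # copies data AND mode bits: dst ends up 0o444
# A second shutil.copy over dst now fails for a regular user, because
# opening the read-only destination for writing raises PermissionError.

dst2 = os.path.join(workdir, "copy-without-mode")
shutil.copyfile(src, dst2)  # copies data only: dst2 gets default perms
shutil.copyfile(src, dst2)  # so a second copy over it succeeds
```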
Re: cuirass building and deployment
I will report more; for now you can see there are thousands of build errors trying to bootstrap: http://34.41.82.208:8080/eval/7?status=failed

On Sun, Mar 24, 2024, 07:27 Ludovic Courtès wrote:

> Hi,
>
> Jim Dupont skribis:
>
>> have been struggling to deploy cuirass on a gcp cluster.
>> I was able to install it manually on one machine and use it from guix
>> talking to postgres and nginx running in ubuntu. the shepherd wont build
>> or install, needs more love.
>> I have gotten to finally build on guix and can export the package from
>> one system to another.
>> Can someone help me understand how to document the exact steps to
>> reproduce a build?
>> There are constant issues with guile modules not found or loading,
>> autoconfig failing or other errors,
>
> Could you be more specific about the build errors you encounter?
>
> To run Cuirass, the best option is to install the ‘cuirass’ package in
> Guix: ‘guix install cuirass’. Then you can start the ‘cuirass register’
> and ‘cuirass web’ processes manually or get systemd to handle them (I
> don’t think Cuirass provides ‘.service’ files, but it should.)
>
> If you want to be on the bleeding edge (not recommended), the easiest
> way is to run ‘guix build -f guix.scm’ from the Cuirass source tree.
>
> HTH!
>
> Ludo’.
Re: cuirass building and deployment
Hi,

Jim Dupont skribis:

> have been struggling to deploy cuirass on a gcp cluster.
> I was able to install it manually on one machine and use it from guix
> talking to postgres and nginx running in ubuntu. the shepherd wont build
> or install, needs more love.
> I have gotten to finally build on guix and can export the package from
> one system to another.
> Can someone help me understand how to document the exact steps to
> reproduce a build?
> There are constant issues with guile modules not found or loading,
> autoconfig failing or other errors,

Could you be more specific about the build errors you encounter?

To run Cuirass, the best option is to install the ‘cuirass’ package in Guix: ‘guix install cuirass’. Then you can start the ‘cuirass register’ and ‘cuirass web’ processes manually or get systemd to handle them (I don’t think Cuirass provides ‘.service’ files, but it should).

If you want to be on the bleeding edge (not recommended), the easiest way is to run ‘guix build -f guix.scm’ from the Cuirass source tree.

HTH!
Ludo’.
Status of ‘core-updates’
Hello!

What’s the status of ‘core-updates’? What are the areas where help is needed? I know a lot has happened since the last update¹, which is roughly when I dropped the ball due to other commitments, but I’m not sure where we are now.

Thanks,
Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2024-01/msg00096.html
Re: Error handling when 'guix substitute' dies
Hi Ada,

Ada Stevenson skribis:

> Sometimes, usually when I'm on an enterprise network like my
> university's or library's wifi, the `guix substitute` process dies
> with a "TLS error in procedure 'write_to_session_record_port': Error
> in the push function" error message. My connection is rock-solid
> otherwise, and sometimes it doesn't happen at all.

What version of guix-daemon are you using? Was it installed through ‘guix pull’ or via another distro? I’ve seen this before but I haven’t experienced it in a long time, so I wonder if I’m just lucky or if there are other factors at play.

> I'm not sure if this is a fault in the actual Guix code, or there's
> some Guile library somewhere that has this bug. Anyway, I think it
> would be a useful feature to have a way to automatically restart the
> `guix substitute` process or otherwise recover from this error. Some
> sort of `--restart=no.restarts.permitted` flag. Whenever I'm updating
> my system I tend to leave and do something else, and when this happens
> I come back and nothing's actually been done, and the error is
> transient so I don't gain anything from seeing this message.

‘guix substitute’ is a ‘guix-daemon’ helper, which automatically (re)starts it when needed. Now, what we could do is have ‘guix substitute’ gracefully handle those errors and keep going. I believe this one-liner should do it:

diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 37cd08e289..3af0bf0019 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -494,7 +494,9 @@ (define* (download-nar narinfo destination
 (define (try-fetch choices)
   (match choices
     (((uri compression file-size) rest ...)
-     (guard (c ((and (pair? rest) (http-get-error? c))
+     (guard (c ((and (pair? rest)
+                     (or (http-get-error? c)
+                         (network-error? c)))
                (warning (G_ "download from '~a' failed, trying next URL~%")
                         (uri->string uri))
                (try-fetch rest)))

I’ll go ahead with this change if there are no objections.

Thanks,
Ludo’.
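The control flow of that guard, falling back to the next substitute URL on a recoverable error but letting the error propagate when no URLs remain, can be sketched outside of Guile. Here is an illustrative Python version; the function and exception names are stand-ins, not Guix code:

```python
# Illustrative sketch of the try-fetch fallback logic in the patch:
# try each substitute URL in turn; on a recoverable network error,
# warn and move on to the next; re-raise only when none are left.
def try_fetch(urls, fetch, recoverable=(ConnectionError, TimeoutError)):
    for i, url in enumerate(urls):
        try:
            return fetch(url)
        except recoverable:
            if i == len(urls) - 1:  # no fallback left: propagate
                raise
            print(f"download from '{url}' failed, trying next URL")

# Example with a fake fetcher: the first mirror fails, the second succeeds.
def fake_fetch(url):
    if "bad" in url:
        raise ConnectionError("TLS error in push function")
    return f"nar from {url}"

result = try_fetch(["https://bad.example", "https://good.example"], fake_fetch)
```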