Losing signing keys for custom Guix channel

2024-03-24 Thread elaexuotee
Hey devs,

So I lost the PGP key that I was using to sign commits on a private Guix
channel of mine. Is there a way to introduce a hard break in my channel
authentication?

Despite updating authorization settings, pulls complain that my latest commit
isn't signed by an authorized key.

Here are the changes I've made:
- Added the new public key to the keyring branch
- Appended the new key's fingerprint to .guix-authorizations (at commit X)
- Updated the introduction in ~/.config/guix/channels.scm:
  - Point it at commit X
  - Update the openpgp-fingerprint

As a sanity check, I've confirmed that the fingerprint on commit X, the
fingerprint in .guix-authorizations, and the openpgp-fingerprint in my
channels.scm are all the same.
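For concreteness, the channel declaration in channels.scm now looks
roughly like this (name, URL, commit, and fingerprint are placeholders):

    (list (channel
            (name 'my-channel)
            (url "https://example.org/my-channel.git")
            (introduction
             (make-channel-introduction
              "0123456789abcdef0123456789abcdef01234567" ;full SHA-1 of commit X
              (openpgp-fingerprint
               "AAAA BBBB CCCC DDDD EEEE  FFFF 0000 1111 2222 3333")))))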

What am I missing?



Re: Should commits rather be buildable or small

2024-03-24 Thread dan

Hi John,

On 3/25/2024 9:15 AM, John Kehayias wrote:

> dan: Do you know if just a version/hash bump is at least buildable? Or
> are the non-version changes necessary for the packages to build or
> function at all? Or, I guess, can the non-version changes be applied
> to the current version first?


I'll give it a try tomorrow.  I think that if the packages are updated
in a certain sequence, the result should at least be buildable.  I'll
send an update or a new patch series if I make any progress on this.


--
dan




Re: Should commits rather be buildable or small

2024-03-24 Thread John Kehayias
Hi,

Apologies for the delay. I would like to get things rolling on
mesa-updates and building, including the vulkan updates, so a choice
will have to be made :)

Thanks for the input so far!

On Tue, Mar 05, 2024 at 06:19 AM, Liliana Marie Prikler wrote:

> Hi,
>
> On Monday, 2024-03-04 at 21:38 +0000, John Kehayias wrote:
>> [...]
>> 1. Essentially squash everything so that all of vulkan is updated in
>> one commit. The main upside is that nothing should break (within
>> vulkan; dependents to be fixed as needed) and it shows as "one"
>> change; the main downside is that the proposed changes are not just
>> trivial version bumps, and are harder to disentangle later if needed.
>>
>> 2. Make each commit update one package, but don't use the variable
>> %vulkan-sdk-version, giving each package an explicit version as it
>> is updated. Then add a final commit where all the versions are
>> replaced by the variable. This seems like unnecessary work to me,
>> and while it avoids the obvious breakage (source hashes no longer
>> matching once the variable is updated but a package isn't yet),
>> versions are still mixed, which is likely a problem.
>>
>> 3. Go with the series as proposed: after the first commit, none of
>> the other vulkan packages or their dependents will build, as the
>> source hashes won't match until the commit that updates each
>> package. Along with the version mixing, this perhaps doesn't give
>> you a helpful git bisect either?
>>
>> None are perfect. What do people think?
> I think 1 would be workable if the changes to the packages are minimal.
> You should also check whether you can just do the version bumps and
> then the other changes – or flip the order.
>

As currently proposed, the changes are not minimal.

dan: Do you know if just a version/hash bump is at least buildable? Or
are the non-version changes necessary for the packages to build or
function at all? Or, I guess, can the non-version changes be applied
to the current version first?

> I don't really see the benefit of 2.  Normally, we'd have "-next"
> variants to catch nontrivial updates (among other things), but those
> don't seem like a good approach here.
>
> If nothing else works, 3 is indeed an option to fall back to, albeit
> begrudgingly.  As noted for 1, you could check whether bumping all the
> hashes and then only fixing whatever else is needed for the builds is
> an option here.
>

That's what I'll have to do, I think, unless the version changes can
indeed be made separately and still build. I can mark each patch in
the commit log as part of a series updating all the vulkan packages.
That might be worth doing in general for cases like this, to help out
future time travelers, e.g. when searching the log and finding a
commit.

> Alternative 4 would be to build those -next variants and then replace
> the base vulkan all at once.  This has the advantage of not doing any
> version mixing in-between IIUC.
>

That's also an idea. Add a %vulkan-version-next or something like
that, and -next variants of all the packages, using that version
instead. A bit clumsy and perhaps convoluted, with extra work for
maybe minimal gain.
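Roughly something like this, I suppose (names and version string are
hypothetical, and the real packages would of course also need new
source hashes):

    (define %vulkan-version-next "X.Y.Z")

    (define-public vulkan-headers-next
      (package
        (inherit vulkan-headers)
        (name "vulkan-headers-next")
        (version %vulkan-version-next)))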

I'll wait to see if dan has any information on which changes can be
made independently, but I guess I'll just have to make a decision on
mesa-updates.

Thanks!

John




Shepherd timers

2024-03-24 Thread Ludovic Courtès
Hello Guix!

I pushed to the ‘devel’ branch of the Shepherd a new module that
implements “timers” along with ‘herd’ support to display information
about them.

It lets you provide a configuration like this one:

--8<---------------cut here---------------start------------->8---
(use-modules (shepherd service timer))

(define timer
  (service '(my-timer)
           #:start (make-timer-constructor
                    (calendar-event #:seconds '(0 7 15 22 30 45))
                    (command '("sh" "-c" "echo Hi from $PWD.; sleep 20; echo done")))
           #:stop (make-timer-destructor)))

(register-services (list timer))
(start-in-the-background '(my-timer))
--8<---------------cut here---------------end--------------->8---

And then ‘my-timer’ invokes the given command at the moments that match
the constraints defined by ‘calendar-event’—in this case, any time the
number of seconds equals 0, 7, 15, 22, 30, or 45.  You can also make it
run every Monday at 9AM and so on, as you would expect.
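For instance, the Monday-at-9AM case would look along these lines (a
sketch of the ‘calendar-event’ keyword arguments):

--8<---------------cut here---------------start------------->8---
(calendar-event #:days-of-week '(monday)
                #:hours '(9)
                #:minutes '(0))
--8<---------------cut here---------------end--------------->8---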

The ‘herd’ command provides detailed information about the timer:

--8<---------------cut here---------------start------------->8---
$ ./herd -s sock status my-timer
Status of my-timer:
  It is running since 21:09:32 (68 seconds ago).
  Timed service.
  Periodically running: sh -c "echo Hi from $PWD.; sleep 20; echo done".
  Child process: 1814
  It is enabled.
  Provides (my-timer).
  Requires ().
  Will not be respawned.

Recent runs:
  2024-03-24 21:10:04 Process exited successfully.
  2024-03-24 21:10:19 Process exited successfully.
  2024-03-24 21:10:26 Process exited successfully.
  2024-03-24 21:10:34 Process exited successfully.
  2024-03-24 21:10:35 Process terminated with signal 15.

Recent messages:
  2024-03-24 21:10:29 Hi from /home/ludo.

Upcoming timer alarms:
  21:10:45 (in 5 seconds)
  21:11:00 (in 20 seconds)
  21:11:07 (in 27 seconds)
  21:11:15 (in 35 seconds)
  21:11:22 (in 42 seconds)
--8<---------------cut here---------------end--------------->8---

And of course you can do anything you can do with a service: stop it,
unload it, load a replacement, and so on.
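For instance, with the same socket as above:

--8<---------------cut here---------------start------------->8---
$ ./herd -s sock stop my-timer
$ ./herd -s sock unload root my-timer
--8<---------------cut here---------------end--------------->8---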

Feedback & suggestions welcome!

Ludo’.



Re: PyTorch with ROCm

2024-03-24 Thread David Elsing
Hi Ricardo,

thanks for the information!

Ricardo Wurmus  writes:

> Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added
> python-pytorch2 at version 2.2.1.  Do you think you could adjust your
> patches to modify that one instead?

I already adjusted the patches yesterday to remove the python-pytorch2
package you added, as the patch series updates the main python-pytorch
package to version 2.  The new inputs of your package were already
included, with the exception of python-opt-einsum, which I had
overlooked before and have now included. :)
Is there a reason to keep version 1 around?  If so, I could adjust the
patches again.  Otherwise, it makes sense to me to move the
python-pytorch package to version 2.2.1 and have a package variant at
version 2.0.1 for r-torch (which I kept and adjusted).

Due to problems when building dependencies, the new package only
builds successfully on x86_64.  As I explained in the patch series,
asmjit fails on armhf because GCC runs out of memory (it reaches 4 GB,
I think, and more is of course not possible), and cpuinfo has a known
bug on aarch64 [1] which causes the tests to fail and, AFAICT, also
breaks PyTorch at runtime.  Through python-pytorch -> python-expecttest ->
poetry -> python-keyring -> python-secretstorage -> python-cryptography,
the python-pytorch package now depends on rust, which currently requires
too much memory to build on 32-bit systems, so i686 is not supported
either.

What do you think should be done here?

I added all packages required for the core tests to native-inputs, but
decided to disable the tests as they take a long time to run.  If I
remove the test inputs (in particular python-expecttest), the package
could probably also be built for i686.  Would it be acceptable to keep
them as a comment for reference?

> I think it is sufficient to only have the current version of ROCm; other
> versions could be added if there is reasonable demand.

That sounds good to me.

Best,
David

[1] https://github.com/pytorch/cpuinfo/issues/14



Re: PyTorch with ROCm

2024-03-24 Thread Ricardo Wurmus


Hi David,

> after seeing that ROCm packages [1] are available in the Guix-HPC
> channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

Excellent initiative!

> For this, I first unbundled the (many) remaining dependencies of the
> python-pytorch package and updated it to 2.2.1, the patch series for
> which can be found here [2,3].

Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added
python-pytorch2 at version 2.2.1.  Do you think you could adjust your
patches to modify that one instead?

> It would be really great to have these packages in Guix proper, but
> first of course the base ROCm packages need to be added after deciding
> how to deal with the different architectures. Also, are several ROCm
> versions necessary or would only one (the current latest) version
> suffice?

As for the ROCm-specific work, I'm not qualified to comment.  I do
support a move of the ROCm packages from Guix HPC to Guix proper.

I think it is sufficient to only have the current version of ROCm; other
versions could be added if there is reasonable demand.

-- 
Ricardo



Re: Bug#1066113: guix: CVE-2024-27297

2024-03-24 Thread pelzflorian (Florian Pelz)
On 2024-03-16, Vagrant Cascadian wrote:
> For anyone with Guix or Nix installed, if I understand correctly, it
> basically allows arbitrarily replacing the source code for anything that
> you might build using Guix or Nix.

Yes, for multi-user systems and people running untrusted code in “guix
shell -CW” container isolation, there is risk.

Regards,
Florian



PyTorch with ROCm

2024-03-24 Thread David Elsing
Hello,

after seeing that ROCm packages [1] are available in the Guix-HPC
channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

For this, I first unbundled the (many) remaining dependencies of the
python-pytorch package and updated it to 2.2.1, the patch series for
which can be found here [2,3].

For building ROCm and the remaining packages, I did not apply the same
quality standard as for python-pytorch and just tried to get things
working at all with ROCm 6.0.2.  To reduce the build time, I also only
tested them for gfx1101, as set in the %amdgpu-targets variable in
amd/rocm-base.scm (which needs to be adjusted for other GPUs).  With
that, everything seemed to work fine on my GPU.

The changes for the ROCm packages are here [4], as a modification of
Guix-HPC.  There, the python-pytorch-rocm package in
amd/machine-learning.scm depends on the python-pytorch-avx package from
[2,3].  Both python-pytorch and python-pytorch-avx support AVX2/AVX-512
instructions, but the latter also has support for fbgemm and nnpack.  I
used it over python-pytorch because AVX2 or AVX-512 instructions should
be available anyway on a CPU with PCIe atomics, which ROCm requires.

For some packages, such as composable-kernel, the build time and
memory requirements are already very high when building for only one
GPU architecture, so maybe it would be best to make a separate package
for each architecture?  I'm not sure they can be combined, however, as
the GPU code is included in the shared libraries.  All dependent
packages like python-pytorch-rocm would thus need to be built for each
architecture as well, which means a lot of duplication for the non-GPU
parts.

There were a few other issues as well, some of which should probably
be addressed upstream:
- Many tests assume a GPU to be present, so they need to be disabled.
- For several packages (e.g. rocfft), I had to disable the
  validate-runpath? phase because of an error when reading ELF files
  (a sketch of the override follows this list).  I may also have
  disabled it for some packages where it was not necessary, but it was
  needed for rocblas at least: the generated kernels are contained in
  ELF files, which are detected by elf-file? in guix/build/utils.scm
  but rejected by has-elf-header? in guix/elf.scm, leading to an error.
- Dependencies of python-tensile copy source files and later copy them
  again with shutil.copy, sometimes twice.  This leads to permission
  errors, as the read-only permissions from the store are kept, so I
  patched them to use shutil.copyfile instead.
- There were a few errors due to using the GCC 11 system headers with
  rocm-toolchain (which is based on Clang+LLVM).  For roctracer,
  replacing std::experimental::filesystem with std::filesystem
  suffices, but for rocthrust, the placement new operator is not
  found.  I applied the patch from Gentoo [5], where it is replaced by
  a simple assignment.  That looks like undefined behavior to me,
  though, even if it happens to work.  The question is whether this is
  a bug in libstdc++, clang, or amdclang++...
- rocMLIR also contains a fork of the LLVM source tree, and it is not
  clear at first glance how exactly it differs from the main ROCm fork
  of LLVM or from upstream LLVM.
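For the validate-runpath? issue, the workaround is just a
package-argument override, roughly along these lines (rocfft as an
example; the actual definitions in [4] carry more changes):

    (use-modules (guix packages) (guix utils))

    (define rocfft/no-runpath-check
      (package
        (inherit rocfft)
        (arguments
         (substitute-keyword-arguments (package-arguments rocfft)
           ((#:validate-runpath? _ #t) #f)))))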

It would be really great to have these packages in Guix proper, but
first of course the base ROCm packages need to be added after deciding
how to deal with the different architectures. Also, are several ROCm
versions necessary or would only one (the current latest) version
suffice?

Cheers,
David

[1] https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix/
[2] https://issues.guix.gnu.org/69591
[3] https://codeberg.org/dtelsing/Guix/src/branch/pytorch
[4] https://codeberg.org/dtelsing/Guix-HPC/src/branch/pytorch-rocm
[5] 
https://gitweb.gentoo.org/repo/gentoo.git/tree/sci-libs/rocThrust/files/rocThrust-4.0-operator_new.patch



Re: cuirass building and deployment

2024-03-24 Thread Jim Dupont
I will report more; for now you can see that there are thousands of
build errors trying to bootstrap:
http://34.41.82.208:8080/eval/7?status=failed

On Sun, Mar 24, 2024, 07:27 Ludovic Courtès  wrote:

> Hi,
>
> Jim Dupont  skribis:
>
> > have been struggling to deploy cuirass on a GCP cluster.
> > I was able to install it manually on one machine and use it from guix
> > talking to postgres and nginx running in ubuntu. the shepherd won't
> > build or install, needs more love.
> > I have gotten it to finally build on guix and can export the package
> > from one system to another.
> > Can someone help me understand how to document the exact steps to
> > reproduce a build?
> > There are constant issues with guile modules not found or loading,
> > autoconfig failing or other errors,
>
> Could you be more specific about the build errors you encounter?
>
> To run Cuirass, the best option is to install the ‘cuirass’ package in
> Guix: ‘guix install cuirass’.  Then you can start the ‘cuirass register’
> and ‘cuirass web’ processes manually or get systemd to handle them (I
> don’t think Cuirass provides ‘.service’ files, but it should.)
>
> If you want to be on the bleeding edge (not recommended), the easiest
> way is to run ‘guix build -f guix.scm’ from the Cuirass source tree.
>
> HTH!
>
> Ludo’.
>


Re: cuirass building and deployment

2024-03-24 Thread Ludovic Courtès
Hi,

Jim Dupont  skribis:

> have been struggling to deploy cuirass on a GCP cluster.
> I was able to install it manually on one machine and use it from guix
> talking to postgres and nginx running in ubuntu. the shepherd won't
> build or install, needs more love.
> I have gotten it to finally build on guix and can export the package
> from one system to another.
> Can someone help me understand how to document the exact steps to
> reproduce a build?
> There are constant issues with guile modules not found or loading,
> autoconfig failing or other errors,

Could you be more specific about the build errors you encounter?

To run Cuirass, the best option is to install the ‘cuirass’ package in
Guix: ‘guix install cuirass’.  Then you can start the ‘cuirass register’
and ‘cuirass web’ processes manually or get systemd to handle them (I
don’t think Cuirass provides ‘.service’ files, but it should.)

If you want to be on the bleeding edge (not recommended), the easiest
way is to run ‘guix build -f guix.scm’ from the Cuirass source tree.

HTH!

Ludo’.



Status of ‘core-updates’

2024-03-24 Thread Ludovic Courtès
Hello!

What’s the status of ‘core-updates’?  What are the areas where help is
needed?

I know a lot has happened since the last update¹, which is roughly when
I dropped the ball due to other commitments, but I’m not sure where we
are now.

Thanks,
Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2024-01/msg00096.html



Re: Error handling when 'guix substitute' dies

2024-03-24 Thread Ludovic Courtès
Hi Ada,

Ada Stevenson  skribis:

> Sometimes, usually when I'm on an enterprise network like my
> university's or library's wifi, the `guix substitute` process dies
> with a "TLS error in procedure 'write_to_session_record_port': Error
> in the push function" error message. My connection is rock-solid
> otherwise, and sometimes it doesn't happen at all.

What version of guix-daemon are you using?  Was it installed through
‘guix pull’ or via another distro?

I’ve seen this before but I haven’t experienced it in a long time, so I
wonder if I’m just lucky or if there are other factors at play.

> I'm not sure if this is a fault in the actual Guix code, or there's
> some Guile library somewhere that has this bug. Anyway, I think it
> would be a useful feature to have a way to automatically restart the
> `guix substitute` process or otherwise recover from this error. Some
> sort of `--restart=no.restarts.permitted` flag. Whenever I'm updating
> my system I tend to leave and do something else, and when this happens
> I come back and nothing's actually been done, and the error is
> transient so I don't gain anything from seeing this message.

‘guix substitute’ is a ‘guix-daemon’ helper, which automatically
(re)starts it when needed.

Now, what we could do is have ‘guix substitute’ gracefully handle those
errors and keep going.  I believe this one-liner should do it:

diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 37cd08e289..3af0bf0019 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -494,7 +494,9 @@ (define* (download-nar narinfo destination
   (define (try-fetch choices)
     (match choices
       (((uri compression file-size) rest ...)
-       (guard (c ((and (pair? rest) (http-get-error? c))
+       (guard (c ((and (pair? rest)
+                       (or (http-get-error? c)
+                           (network-error? c)))
                   (warning (G_ "download from '~a' failed, trying next URL~%")
                            (uri->string uri))
                   (try-fetch rest)))

I’ll go ahead with this change if there are no objections.

Thanks,
Ludo’.