Re: Mechanism for helping in multi-channels configuration (and Xapian index)

2024-05-06 Thread Simon Tournier
Hi,

Sorry for the long delay.

On Mon, 18 Mar 2024 at 16:05, Christina O'Donnell  wrote:

>> 2: https://issues.guix.gnu.org/issue/39258

> As I said above, [2] is a fairly long thread, but I think I get the 
> general idea. It seems that Xapian was implemented but didn't have the 
> desired speedup. Am I getting the right impression there?

Not really.  From my memory, the first blocker for implementing search
with Xapian was adding Xapian as a dependency of Guix.  This addition
would be a bad idea, IMHO.  However…

At the time of discussing Xapian-based “guix search”,
GUIX_EXTENSIONS_PATH was in its infancy.  Therefore, it was not really
on the table.

… a Xapian-based “package search” seems to me an option if it is turned
into a Guix extension.  This way, the Xapian dependency is not imposed on
everyone, only on those who want more features. :-)

Resuming that work is on my TODO list:

 1. Benchmark Xapian-based search
 2. Benchmark Xapian index building

Then, depending on the results, we can decide which direction to take.


Cheers,
simon




Re: Mechanism for helping in multi-channels configuration (and Xapian index)

2024-03-18 Thread Christina O'Donnell

Hi Simon,

Sorry for the really long delay. I meant to reply after I'd had a good
read through the conversation you linked, but I haven't had a chance to
really get into it yet. I have, however, read enough to get a surface-level
idea of the project. The project looks fun and looks like it will help Guix
users and developers, so I'd be on board in principle.


On 15/02/2024 15:05, Simon Tournier wrote:

...

> I think you mean ’fold-packages’.
>
>>   2. Have a script that determines the symbols needed by each file. (Macros
>>      make this more difficult, but.)
>
> Well, this would be difficult, IMHO.  In a way, it is what the compiler
> does. :-)

I asked on guile-devel and Maxime suggested using `module-map` or
`module-for-each`, which map over all symbols in a module. Presumably
this would know what's quoted literally and what's a proper symbol.
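A minimal sketch of what `module-map` returns, assuming Guile 3.0; the helper name `module-exported-symbols` is made up for illustration. `resolve-interface` returns a module's public interface, and `module-map` calls its procedure with each binding's symbol and variable. Note that this enumerates what a module provides (closer to step 1); it does not tell which symbols a file uses, which is what step 2 needs.

```scheme
;; Hypothetical helper: return the sorted list of symbols exported by
;; the module called NAME, e.g. '(srfi srfi-1).
(define (module-exported-symbols name)
  ;; RESOLVE-INTERFACE loads the module if needed and returns its
  ;; public interface, which is itself a module object.
  (let ((interface (resolve-interface name)))
    ;; MODULE-MAP calls the procedure with each binding's symbol and
    ;; variable; keep the symbol, then sort for a stable result.
    (sort (module-map (lambda (symbol variable) symbol) interface)
          (lambda (a b)
            (string<? (symbol->string a) (symbol->string b))))))

(display (if (memq 'fold (module-exported-symbols '(srfi srfi-1)))
             "fold is exported"
             "fold is not exported"))
(newline)
```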

>>   3. Have both scripts have an incremental version that runs on diffs (for
>>      performance).
>>   4. Run this for every commit on every branch on every channel caching the
>>      result.
>>   5. Have a CI script keep this updated for new commits.
>>   6. Have a server track incompatibilities.
>
> Here, I think the issue is that one server needs to track all the
> channels.  And that is too strong an assumption, IMHO.
>
> I think the design should be something on the channel maintainer side.
> In a way, the main Guix channel could be seen as a Git submodule from
> the channel side, and the issue is that this information is not tracked.
>
> There is this ’.guix-channel’ file, which allows describing channel
> dependencies.  Improvements could add more there.  The question is
> what to add and how to add it, while keeping things simple and free
> of maintenance burden. :-)


Okay, this makes sense. I'm thinking that you could have something like
an SQLite index that can be generated by running a script on the code.
The index could live on a server separate from the channel repo and be
pointed at by the .guix-channel file. The commit hook could: (1) update
the local index to include the latest commit; (2) update the hash inside
.guix-channel. Then a push hook could also push the index to the server.


It's a bit clunky because you've got this binary blob that you have to 
synchronize with the channel, and it's easy to get this wrong. Putting 
the index in the channel repo would bloat the channel with old versions 
of the index. Forcing users to generate the index from scratch is 
undesirable too.


As an alternative to having the index referenced in the .guix-channel,
we could use git-annex. This would take care of fetching the index,
uploading the new index on push, and updating the hash. No extra steps
would be /required/ from developers, as it won't be necessary to have the
index 100% up to date, but developers could choose to regenerate the
index and call `git annex sync`. I suspect that adding git-annex as a
dependency would be resisted, but that's the way I think would work
best, and it could also apply to existing indexes. It depends on how long
it takes to generate the index from scratch. There was some talk of
data.guix.gnu.org using PostgreSQL to index packages. It would probably be
worth figuring out what they do, to see whether any of their SQL or code
could be ported to SQLite.



>> Full disclosure: I've got nothing lined up for the summer yet, so I'm on the
>> prowl for GSoC projects :)
>
> Cool!
>
> In that spirit, one missing tool is searching packages across the whole
> history.  The need is described by this message [1]: how do you find
> which Guix revision provides which version of Foo?
>
> In addition, “guix search” is slow [2].
>
> Well, I have started an embryonic extension based on Guile-Xapian
> for indexing and improving the search.  Really an embryo. :-)
>
> I think this would fit a GSoC project. ;-)


As I said above, [2] is a fairly long thread, but I think I get the 
general idea. It seems that Xapian was implemented but didn't have the 
desired speedup. Am I getting the right impression there?


It's certainly an interesting problem. I'll keep thinking about it.

Kind regards,

Christina


1: Re: List available versions of package.
Philippe Veber 
Tue, 11 Jun 2019 09:43:08 +0200
id:CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com
https://lists.gnu.org/archive/html/help-guix/2019-06
https://yhetil.org/guix/CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com

2: https://issues.guix.gnu.org/issue/39258




Re: Mechanism for helping in multi-channels configuration

2024-03-12 Thread Attila Lendvai
> Although I concur with this need, I do not see how it would help in
> detecting compatibility between channels. :-)


maybe i'm overthinking this, and all we need is a way to point to git commit 
ranges that are compatible.

more specifically, i'm maintaining the guix-crypto channel, and i often miss
the ability to point to a guix commit beyond which there is a change in guix
that my channel is not yet compatible with. if my users issued a `guix pull`,
then it would not pull the guix channel beyond that commit, and would warn
the users that it's being held back.
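A minimal sketch of the hold-back rule described above. Everything here is hypothetical: Guix has no such upper-bound declaration today, the procedure name `held-back-commit` is made up, and the history is modelled as a plain list of commit IDs, oldest first. A real implementation would have to ask Git about ancestry (for instance with `git merge-base --is-ancestor`), since history is not linear.

```scheme
(use-modules (srfi srfi-1))   ; for LIST-INDEX

;; Hypothetical: HISTORY lists 'guix' commit IDs, oldest first.
;; UPPER-BOUND is the last commit the dependent channel declares itself
;; compatible with.  Returns two values: the commit to pull, and #t when
;; the request was held back.
(define (held-back-commit history requested upper-bound)
  (define (position commit)
    (list-index (lambda (c) (string=? c commit)) history))
  (let ((requested-pos (position requested))
        (bound-pos (position upper-bound)))
    (if (and requested-pos bound-pos (> requested-pos bound-pos))
        (values upper-bound #t)    ; requested commit is newer than the bound
        (values requested #f))))   ; compatible, or unknown commit: keep it

(call-with-values
    (lambda () (held-back-commit '("a1" "b2" "c3" "d4") "d4" "b2"))
  (lambda (commit held-back?)
    (when held-back?
      (display (string-append "warning: holding 'guix' back at " commit))
      (newline))))
```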

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“In the electronics industry, patents are of no value whatsoever in spurring 
research and development.”
— vice-president of Intel Corporation, Business Week, 11 May 1981.




Re: Mechanism for helping in multi-channels configuration

2024-02-15 Thread Simon Tournier
Hi Attila,

On Tue, 06 Feb 2024 at 17:16, Attila Lendvai  wrote:
>> The wishlist is: provide a machine-readable description on guix-science
>> channel side in order to help in finding the good overlap between
>> commits of different channels.
>
> i wrote about a missing abstraction here:
>
> https://lists.gnu.org/archive/html/guix-devel/2023-12/msg00104.html

You wrote in [1]:

    it's probably the same thing that causes the discrepancy between
    git commits and substitutes: the build servers are not building
    every commit of the git repo. they pick an unpredictable (?)
    series of commits, skipping some in between.  if i guix pull, or
    guix time-machine to the "wrong" commit, then i'll need to build
    some stuff locally. sometimes these can be heavy packages.

To my knowledge:

 + ci.guix (Cuirass) fetches every 5 minutes (IIRC) and builds the latest
   commit.

 + bordeaux.guix (Build Coordinator) fetches the batches from the
   guix-commits mailing list:
   

For CI, yes, it is unpredictable.  For Bordeaux, not really. :-)
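To illustrate why polling skips commits, here is a toy model, assuming (as recalled above, IIRC) that Cuirass polls periodically and builds only the newest commit at each poll. Commit and poll times are made-up numbers in minutes, and `built-commits` is a hypothetical name.

```scheme
(use-modules (srfi srfi-1))   ; FILTER-MAP, LAST, DELETE-DUPLICATES

;; Toy model: COMMITS is a list of (id . push-time) pairs sorted by time;
;; POLLS is a list of poll times.  At each poll, only the newest commit
;; pushed so far is built.  Returns the built IDs, without duplicates.
(define (built-commits commits polls)
  (delete-duplicates
   (filter-map (lambda (poll)
                 (let ((visible (filter (lambda (c) (<= (cdr c) poll))
                                        commits)))
                   ;; #f (dropped by FILTER-MAP) if nothing is pushed yet.
                   (and (pair? visible) (car (last visible)))))
               polls)))

(display (built-commits '(("a" . 1) ("b" . 2) ("c" . 3) ("d" . 7))
                        '(5 10)))
(newline)
```

With pushes at minutes 1, 2, 3 and 7 and polls at 5 and 10, only c and d are built; in this model a and b are never built by that builder.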

1: Re: Should commits rather be buildable or small
Attila Lendvai 
Sun, 10 Dec 2023 23:20:25 +
id:SXjFmdTgxwHYE-Z6t7SZOykuXMBiD454EF2uad96jGQemgJ6hXki_f1C7VxVHKHa4b7_j5UwJmffh_FiQqEz_bIYIBn9tpG4s9F7W1eIDAQ=@lendvai.name
https://lists.gnu.org/archive/html/guix-devel/2023-12
https://yhetil.org/guix/SXjFmdTgxwHYE-Z6t7SZOykuXMBiD454EF2uad96jGQemgJ6hXki_f1C7VxVHKHa4b7_j5UwJmffh_FiQqEz_bIYIBn9tpG4s9F7W1eIDAQ=@lendvai.name

> the git commit log is too fine-grained here. there
> should be something like a 'guix log' above the git log that could be
> used, among other things, to encode inter-channel dependencies.

Given the current situation and how substitutes are garbage-collected,
the first step would be to retain some substitutes, and thus to specify
a retention policy.  That would allow building a database that could be
queried by this hypothetical “guix log”, which IMHO should rather be
something under “guix weather”.

For interested readers, here are the threads about retention:

Building and caching old Guix derivations for a faster time machine
Ricardo Wurmus 
Fri, 10 Nov 2023 10:29:28 +0100
id:87o7g29c94@elephly.net
https://lists.gnu.org/archive/html/guix-devel/2023-11
https://yhetil.org/guix/87o7g29c94@elephly.net

Substitute retention
Ludovic Courtès 
Tue, 12 Oct 2021 18:04:25 +0200
id:87y26ytek6.fsf...@inria.fr
https://lists.gnu.org/archive/html/guix-devel/2021-10
https://yhetil.org/guix/87y26ytek6.fsf...@inria.fr

Although I concur with this need, I do not see how it would help in
detecting compatibility between channels. :-)

Cheers,
simon



Re: Mechanism for helping in multi-channels configuration

2024-02-15 Thread Simon Tournier
Hi Christina,

On Sat, 03 Feb 2024 at 15:27, Christina O'Donnell  wrote:

>   1. Have a script that scrapes all the define-public symbols in every 
> file in
>      every package.

I think you mean ’fold-packages’.

>   2. Have a script that determines the symbols needed by each file. (Macros
>      make this more difficult, but.)

Well, this would be difficult, IMHO.  In a way, it is what the compiler
does. :-)

>   3. Have both scripts have an incremental version that runs on diffs (for
>      performance).
>   4. Run this for every commit on every branch on every channel caching the
>      result.
>   5. Have a CI script keep this updated for new commits.
>   6. Have a server track incompatibilities.

Here, I think the issue is that one server needs to track all the
channels.  And that is too strong an assumption, IMHO.

I think the design should be something on the channel maintainer side.
In a way, the main Guix channel could be seen as a Git submodule from
the channel side, and the issue is that this information is not tracked.

There is this ’.guix-channel’ file, which allows describing channel
dependencies.  Improvements could add more there.  The question is
what to add and how to add it, while keeping things simple and free
of maintenance burden. :-)
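As a strawman for "what to add", a sketch of reading one extra field from a `.guix-channel` file. The `minimum-guix-commit` field is purely hypothetical and not part of the current format; the surrounding `(channel (version 0) ...)` shape follows the existing convention. Since the file is a single S-expression, the reading side stays trivial.

```scheme
;; A .guix-channel-like text carrying a HYPOTHETICAL field that Guix
;; does not understand today: 'minimum-guix-commit'.
(define example-guix-channel "
(channel
 (version 0)
 (minimum-guix-commit \"7dfe41aa71a4a4a9d6065a44e9c6271717215b3e\"))")

;; Read the channel description from TEXT and return the value of
;; FIELD, or #f when the field is absent or the form is not a channel.
(define (channel-field text field)
  (let ((sexp (call-with-input-string text read)))
    (and (pair? sexp)
         (eq? (car sexp) 'channel)
         (let ((entry (assq field (cdr sexp))))
           (and entry (cadr entry))))))

(display (channel-field example-guix-channel 'minimum-guix-commit))
(newline)
```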


> Full disclosure: I've got nothing lined up for the summer yet, so I'm on the
> prowl for GSoC projects :)

Cool!

In that spirit, one missing tool is searching packages across the whole
history.  The need is described by this message [1]: how do you find
which Guix revision provides which version of Foo?

In addition, “guix search” is slow [2].

Well, I have started an embryonic extension based on Guile-Xapian
for indexing and improving the search.  Really an embryo. :-)

I think this would fit a GSoC project. ;-)


Cheers,
simon

1: Re: List available versions of package.
Philippe Veber 
Tue, 11 Jun 2019 09:43:08 +0200
id:CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com
https://lists.gnu.org/archive/html/help-guix/2019-06
https://yhetil.org/guix/CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com

2: https://issues.guix.gnu.org/issue/39258



Re: Mechanism for helping in multi-channels configuration

2024-02-06 Thread Attila Lendvai
> The wishlist is: provide a machine-readable description on guix-science
> channel side in order to help in finding the good overlap between
> commits of different channels.


i wrote about a missing abstraction here:

https://lists.gnu.org/archive/html/guix-devel/2023-12/msg00104.html

which is more or less related to this.

the git commit log is too fine-grained here. there should be
something like a 'guix log' above the git log that could be used, among other
things, to encode inter-channel dependencies.

maybe frequent semver releases for guix channels could work as reference points
to be used to formally encode inter-channel dependencies? (and to guide the
substitute caching/building; mark "safe points" for the time-machine; etc)
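If channels did cut semver releases, encoding a dependency could be as simple as a version range. A minimal sketch, assuming MAJOR.MINOR.PATCH strings and a hypothetical rule where a channel declares a half-open range of supported 'guix' releases; nothing like this exists in Guix today, and all names are made up.

```scheme
;; Parse "MAJOR.MINOR.PATCH" into a list of integers, e.g. (1 4 0).
(define (parse-semver str)
  (map string->number (string-split str #\.)))

;; #t if version list A sorts strictly before version list B.
(define (semver<? a b)
  (cond ((null? a) (not (null? b)))
        ((null? b) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (semver<? (cdr a) (cdr b)))))

;; Hypothetical rule: a dependent channel supports 'guix' releases
;; VERSION with MINIMUM <= VERSION < UPPER.
(define (release-compatible? version minimum upper)
  (let ((v (parse-semver version)))
    (and (not (semver<? v (parse-semver minimum)))
         (semver<? v (parse-semver upper)))))

(display (release-compatible? "1.4.0" "1.4.0" "2.0.0"))
(newline)
```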





Re: Mechanism for helping in multi-channels configuration

2024-02-06 Thread Attila Lendvai
> Anything is better than an obscure failure/backtrace


i disagree with this specific statement. in the long run, the (inconspicuous) 
cost of added complexity can easily move anything into net negative territory.

IOW, feel encouraged to account for the cost of complexity. it's rarely done 
prior to setbacks.





Re: Mechanism for helping in multi-channels configuration

2024-02-06 Thread Maxim Cournoyer
Hi,

Simon Tournier  writes:

> Hi,
>
> Well, using Guix bdab356, a little more than one month old, together
> with the guix-science channel at 0b3d4a2f from last week, I get the
> failure:
>
> $ guix build /gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv
> The following derivation will be built:
>   /gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv
> building /gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv...
> (repl-version 0 1 1)
> WARNING: (guix-science build bazel-build-system): imported module (guix build utils) overrides core binding `delete'
> (exception unbound-variable (value #f) (value "Unbound variable: ~S") (value (python-nr-stream)) (value #f))
> builder for `/gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv' failed to produce output path `/gnu/store/qzgj4vig3vklbznz1i0pgy11nr3z4rv9-guix-science'
> build of /gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv failed
> View build log at '/var/log/guix/drvs/g3/aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv.gz'.
> guix build: error: build of `/gnu/store/g3aa5rh7bs5pyxd3q1gvhwz1s9z1vh3z-guix-science.drv' failed
>
> Well, that’s expected!  Guix bdab356 does not contain python-nr-stream,
> which was introduced by commit 7dfe41aa71a4a4a9d6065a44e9c6271717215b3e.
>
> The wishlist is: provide a machine-readable description on guix-science
> channel side in order to help in finding the good overlap between
> commits of different channels.
>
> It would be nice if, instead of a hard error, “guix pull” could say:
> « the channel ’guix’ needs to be at least at commit 1234abc ».

Anything is better than an obscure failure/backtrace, so I'd say it's a
good idea, especially if you are motivated to hack on it.
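A minimal sketch of the diagnostic Simon describes, assuming a dependent channel could declare a minimum 'guix' commit (hypothetical; no such declaration exists today). The ancestry list stands in for what `git rev-list` would print for the 'guix' commit in use; the list in the example is illustrative, not the real history of bdab356, and `check-minimum-commit` is a made-up name.

```scheme
;; Hypothetical sketch.  ANCESTRY lists the commits reachable from the
;; 'guix' commit in use; REQUIRED is the minimum commit a dependent
;; channel would declare.  Returns #f when the requirement holds,
;; otherwise a human-readable message (REQUIRED must have >= 7 chars).
(define (check-minimum-commit channel ancestry required)
  (if (member required ancestry)
      #f
      (string-append "the channel '" channel
                     "' needs to be at least at commit "
                     (substring required 0 7))))

;; Illustrative ancestry only.
(let ((message (check-minimum-commit
                "guix" '("bdab356")
                "7dfe41aa71a4a4a9d6065a44e9c6271717215b3e")))
  (when message
    (display message)
    (newline)))
```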

-- 
Thanks,
Maxim



Re: Mechanism for helping in multi-channels configuration

2024-02-03 Thread Christina O'Donnell

Hi Simon,


> The wishlist is: provide a machine-readable description on guix-science
> channel side in order to help in finding the good overlap between
> commits of different channels.
>
> It would be nice if, instead of a hard error, “guix pull” could say:
> « the channel ’guix’ needs to be at least at commit 1234abc ».


I was just thinking about these kinds of errors. It would also happen
between channels when packages are split out of a single file (e.g.
golang.scm to golang-xyz.scm). Then channels immediately go out of sync,
as we're doing continual releases. So it wouldn't just be a problem for
time-machine. It's all a bit too fragile for my liking. I assume we won't
be moving to frequent versioned releases any time soon.

A sketch of a solution might be:

 1. Have a script that scrapes all the define-public symbols in every
    file in every package.
 2. Have a script that determines the symbols needed by each file. (Macros
    make this more difficult, but.)
 3. Have both scripts have an incremental version that runs on diffs (for
    performance).
 4. Run this for every commit on every branch on every channel caching the
    result.
 5. Have a CI script keep this updated for new commits.
 6. Have a server track incompatibilities.
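`fold-packages`, from (gnu packages), iterates over package objects. A complementary, purely syntactic take on step 1 is to read a file's top-level forms without evaluating them and collect the names bound by `define-public`. A minimal sketch with made-up input; note that real Guix source files use reader extensions such as `#~` (gexps), which plain `read` only understands once the module providing them, e.g. (guix gexp), has been loaded, so treat this as a starting point under that assumption.

```scheme
(use-modules (ice-9 match))

;; Read every top-level form from PORT and collect the names bound by
;; DEFINE-PUBLIC.  Forms are only read, never evaluated, so macros that
;; expand into definitions are missed (the macro caveat from step 2).
(define (scrape-define-public port)
  (let loop ((form (read port)) (names '()))
    (if (eof-object? form)
        (reverse names)
        (loop (read port)
              (match form
                (('define-public (name . _) . _) (cons name names)) ; procedure
                (('define-public name . _)       (cons name names)) ; variable
                (_ names))))))

(call-with-input-string
 "(define-module (gnu packages example))
  (define-public hello-2 (package (name \"hello\")))
  (define (private-helper x) x)
  (define-public (make-thing y) y)"
 (lambda (port)
   (display (scrape-define-public port))
   (newline)))
```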

For example, a 'definition-reference' could look like,

(definition-reference (commit-range start-hash end-hash)
  file-path
  identifier)

(definition-reference (commit-range "44b340d..." "06dba3b...")
  "gnu/packages/golang"
  'go-github-com-rs-xid)
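One possible concrete shape for the 'definition-reference' sketch, using SRFI-9 records in Guile. The record and accessor names are illustrative assumptions that mirror the pseudo-code; the example values are the ones above, elided hashes included.

```scheme
(use-modules (srfi srfi-9))

;; Range of commits over which a definition is available.
(define-record-type <commit-range>
  (commit-range start end)
  commit-range?
  (start commit-range-start)     ; first commit providing the definition
  (end   commit-range-end))      ; last commit providing it

;; Where an identifier is defined, and during which commit range.
(define-record-type <definition-reference>
  (definition-reference range file identifier)
  definition-reference?
  (range      definition-reference-range)
  (file       definition-reference-file)
  (identifier definition-reference-identifier))

(define example
  (definition-reference (commit-range "44b340d..." "06dba3b...")
                        "gnu/packages/golang"
                        'go-github-com-rs-xid))

(display (definition-reference-identifier example))
(newline)
```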

Commit ranges keep the number of entries tractable (since packages
probably aren't getting moved / deleted / added very much).
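To show why commit ranges keep the index small, a minimal sketch that collapses per-commit presence into ranges. The input format (a list of `(commit-id . defined?)` pairs, oldest first) and the name `presence->ranges` are hypothetical.

```scheme
;; COMMITS is a list of (commit-id . defined?) pairs, oldest first.
;; Consecutive commits where the identifier is defined collapse into
;; one (start . end) pair.
(define (presence->ranges commits)
  ;; CURRENT is the open range being extended, or #f if none.
  (let loop ((rest commits) (current #f) (ranges '()))
    (cond ((null? rest)
           (reverse (if current (cons current ranges) ranges)))
          ((cdr (car rest))                        ; defined here: extend
           (let ((id (car (car rest))))
             (loop (cdr rest)
                   (if current (cons (car current) id) (cons id id))
                   ranges)))
          (else                                    ; not defined: close
           (loop (cdr rest) #f
                 (if current (cons current ranges) ranges))))))

(display (presence->ranges '(("c1" . #f) ("c2" . #t) ("c3" . #t)
                             ("c4" . #f) ("c5" . #t))))
(newline)
```

Here five commits collapse into two entries; over a long history in which a package rarely moves, the saving grows accordingly.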

Then use a hash table (or a trie, B+ tree, distributed hash table,
etc.) to go from identifier to definition-reference.
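A minimal sketch of the identifier-to-reference lookup with a Guile hash table. To keep it self-contained, a reference is a plain list `(start end file identifier)` rather than a record; the second entry uses placeholder commit IDs and models the golang.scm to golang-xyz.scm split mentioned earlier.

```scheme
;; Maps an identifier (a symbol) to the list of references defining it;
;; one symbol can have several, e.g. after a file split.
(define definition-index (make-hash-table))

;; Hypothetical reference format: (start-commit end-commit file identifier).
(define (index-add! ref)
  (let ((id (list-ref ref 3)))
    (hashq-set! definition-index id
                (cons ref (hashq-ref definition-index id '())))))

(define (index-lookup id)
  (hashq-ref definition-index id '()))

(index-add! '("44b340d..." "06dba3b..." "gnu/packages/golang"
              go-github-com-rs-xid))
;; Placeholder commit IDs for the range after a hypothetical move.
(index-add! '("commit-x" "commit-y" "gnu/packages/golang-xyz"
              go-github-com-rs-xid))

(for-each (lambda (ref) (display (list-ref ref 2)) (newline))
          (index-lookup 'go-github-com-rs-xid))
```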

You would probably also want to know the commit date so you could index
on it. That would let you find the versions that supplied the identifier
and are chronologically as close as possible to a particular version of
a different channel.
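A minimal sketch of the "chronologically closest" query, assuming each candidate is a `(commit-id . unix-time)` pair and using a linear scan for clarity; an index sorted by date (for instance in SQLite) would replace the scan in practice. `closest-commit` is a made-up name, and the timestamps in the example are arbitrary.

```scheme
(use-modules (srfi srfi-1))   ; for FOLD

;; CANDIDATES: commits providing the identifier, as (id . unix-time).
;; TARGET: commit date of the other channel's revision.
;; Returns the candidate whose date is closest to TARGET, or #f.
(define (closest-commit candidates target)
  (and (pair? candidates)
       (fold (lambda (candidate best)
               (if (< (abs (- (cdr candidate) target))
                      (abs (- (cdr best) target)))
                   candidate
                   best))
             (car candidates)
             (cdr candidates))))

(display (car (closest-commit '(("c1" . 100) ("c2" . 200) ("c3" . 400))
                              260)))
(newline)
```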

Now this isn't perfect (in case anyone was getting that impression ;):
 - It won't have any idea about version incompatibilities.
 - It couldn't trace renamed variables.
 - And probably more.

It might be useful to additionally track package versions, but that
might run into resource issues.

I'm thinking of a Guile daemon backed by SQLite. What do you think?

Full disclosure: I've got nothing lined up for the summer yet, so I'm on the
prowl for GSoC projects :)

Kind regards,
 - Christina