On +2024-04-30 22:18:03 -0400, Richard Sent wrote:
> Hi Guix!
>
> When running guix search, relevance in synopsis and description fields
> is computed strictly by the number of matches, both as a word and as a
> subword. Ideally, if a search string matches an isolated word in a
> search, that result should be considered more relevant than simply
> matching a subword, even multiple times.
>
> To illustrate, imagine trying to find what package provides the `rsh`
> binary and running `$ guix search rsh`. This binary is part of
> `inetutils` and the description field contains:
>
> > Inetutils is a collection of common network programs, such as an ftp
> > client and server, a telnet client and server, an rsh client and
> > server, and hostname.
>
> Most likely, this is what the user is interested in. However, inetutils
> does not show up until roughly the ~75th result with a relevance of 2
> (the lowest possible relevance).
>
> Almost every search result beforehand contains the string "rsh" as a
> component of another word, such as "marshaling", "powershell", and
> "hershey". However, these match multiple times and are weighted
> significantly higher.
>
> Ideally, guix search should rate inetutils higher because the string
> "rsh" occurs as its own word, not as a component of another, unrelated
> word. (Very, very few people would search "rsh" looking for matches with
> "hershey", even if "hershey" occurs multiple times.)
>
> Another example of where this can happen is with "dig", part of the bind
> package. Searching for "dig" returns garbage because "dig" is a common
> subword. Bind is scored with a relevance of 2, even though bind's
> description emphasises that dig is part of it.
>
> This would improve the experience when searching with strings that
> commonly occur as subwords.
>
> Since this change can't occur in a vacuum, care should be taken not to
> reduce the effectiveness of other reasonably foreseeable search queries.
>
> --
> Take it easy,
> Richard Sent
> Making my computer weirder one commit at a time.
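
To make the ranking idea above concrete, here is a minimal sketch in
Python. It is not how guix search is implemented (Guix is written in
Guile Scheme); the names match_counts and rank, and the package
"hypothetical-serializer" with its description, are made up for
illustration. It counts whole-word and subword matches separately and
sorts on the pair, so a single whole-word hit outranks any number of
subword hits.

```python
import re


def match_counts(query, text):
    """Return (whole_word_matches, subword_only_matches) for query in text."""
    pattern = re.escape(query)
    # \b marks a word boundary, so this only counts isolated words.
    whole = len(re.findall(rf"\b{pattern}\b", text, re.IGNORECASE))
    # Every occurrence, including those embedded in longer words.
    total = len(re.findall(pattern, text, re.IGNORECASE))
    return whole, max(total - whole, 0)


def rank(query, descriptions):
    """Sort package names by whole-word matches first, subword matches second."""
    return sorted(
        descriptions,
        key=lambda name: match_counts(query, descriptions[name]),
        reverse=True,
    )


# "hypothetical-serializer" is an invented package used only as a foil.
descriptions = {
    "hypothetical-serializer":
        "Marshaling library: marshals and unmarshals objects.",
    "inetutils":
        "Inetutils is a collection of common network programs, such as an "
        "ftp client and server, a telnet client and server, an rsh client "
        "and server, and hostname.",
}

ranked = rank("rsh", descriptions)
print(ranked)
```

A strict lexicographic ordering like this is only one option. A weighted
sum with a larger weight for whole-word matches would be gentler on
queries where subword matches are what the user actually wants, which is
the concern raised at the end of the quoted message.
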

I like your proposal :)

I'm wondering how [1] compares in what it does for your use(ful) case.
(I am not familiar with Hyper Estraier beyond being prompted for gnu.org
searching)

[1] <https://directory.fsf.org/wiki/Hyper_Estraier>
--
Regards,
Bengt Richter