December 9, 2022 12:32 PM, "zimoun" <zimon.touto...@gmail.com> wrote:
> Hi,
>
> Preparing some Python stuff, I was toying with the package
> python-networkx. And Guix is awesome because it is easy to extract the
> graph of dependencies.
>
> Here dependencies are just inputs, native-inputs and propagated-inputs.
> It could be interesting to also include build-system dependencies, I
> have been lazy. :-)
>
> My initial question is to know what are the “essentials”? By essential,
> I mean the “important” ones, the “hot” ones, etc. The ones which are
> “influencers” – yeah the world is a social network. :-)
>
> First, let's extract the graph with a tiny Scheme script:
>
>     $ guix repl -- packages-to-dict.scm > dod.py
>
> Then, let's import that into an IPython session:
>
>     $ guix shell python python-ipython \
>          python-scipy python-matplotlib python-networkx -- ipython
>
> and run another tiny Python script for plotting. See Figure attached.
>
> We can compare a link analysis metric [1] and a centrality measure
> [2]; say PageRank [3] and Eigenvector [4]. The larger the value, the
> more “important” the package is (for this metric).
>
> And the Directed and Undirected graphs can be compared, using Networkx
> [5,6]. Well, Eigenvector centrality (or Katz centrality [7]) is failing
> because the power iteration does not converge but other metrics could
> also be considered. Here is just a first rough toy.
> :-)
>
> According to PageRank applied to the Directed Graph, the 10 most
> “important” packages are:
>
> --8<---------------cut here---------------start------------->8---
> [('pkg-config-0.29.2', 0.02418335991713879),
>  ('perl-5.34.0', 0.015404032767249512),
>  ('coreutils-minimal-8.32', 0.013240458675517012),
>  ('zlib-1.2.11', 0.009107245584307803),
>  ('python-pytest-6.2.5', 0.008413060648307678),
>  ('ncurses-6.2.20210619', 0.007598925467605917),
>  ('r-knitr-1.41', 0.00554772892485958),
>  ('sbcl-rt-1990.12.19-1.a6a7503', 0.004884721933452539),
>  ('bzip2-1.0.8', 0.004800877844001881),
>  ('python-3.9.9', 0.00415536078558266)]
> --8<---------------cut here---------------end--------------->8---
>
> And if we compare the 3 results (Undirected with PageRank and
> Eigenvector, and Directed with PageRank only), then the 10 most
> “important” packages are:
>
> --8<---------------cut here---------------start------------->8---
> ['pkg-config-0.29.2',
>  'glib-2.70.2',
>  'zlib-1.2.11',
>  'gtk+-3.24.30',
>  'perl-5.34.0',
>  'gettext-minimal-0.21',
>  'qtbase-5.15.5',
>  'libxml2-2.9.12',
>  'python-3.9.9',
>  'autoconf-2.69']
> --8<---------------cut here---------------end--------------->8---
>
> Somehow, it means that these packages have a high influence on all the
> others.
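For anyone reading without the attached script, here is a minimal pure-Python sketch of the Directed versus Undirected PageRank comparison described above. It is not the code behind the numbers quoted here, and it is a plain power iteration, not NetworkX's implementation. The dict-of-dicts layout (package -> {input: {}}), the toy graph and the helper names pagerank and undirected are all assumptions made for illustration; the real dod.py may look different.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an assumed {node: {successor: {}}} layout.

    Edges point from a package to its inputs, so rank flows towards
    the packages that many others depend on.
    """
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            succs = graph.get(v) or {}
            if succs:
                share = damping * rank[v] / len(succs)
                for w in succs:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def undirected(graph):
    """Return a symmetric copy: every edge v -> w also gets w -> v."""
    sym = {}
    for v, succs in graph.items():
        sym.setdefault(v, {})
        for w in succs:
            sym[v][w] = {}
            sym.setdefault(w, {})[v] = {}
    return sym


# Hypothetical toy graph, NOT real Guix data.
toy = {
    "hello": {"gcc": {}, "pkg-config": {}},
    "glib": {"pkg-config": {}, "zlib": {}},
    "python": {"pkg-config": {}, "zlib": {}},
    "gcc": {"zlib": {}},
    "pkg-config": {},
    "zlib": {},
}

directed_rank = pagerank(toy)
undirected_rank = pagerank(undirected(toy))

# Packages sorted from most to least "important" in the directed graph.
top = sorted(directed_rank, key=directed_rank.get, reverse=True)
print(top[:3])
```

With edges pointing from a package to its inputs, rank piles up on widely used inputs, which fits pkg-config and zlib landing near the top. About Eigenvector failing: a dependency graph is normally acyclic, so its adjacency matrix is nilpotent and there is no dominant eigenvector for a power iteration to find. That may be part of why only the Undirected variant gives a usable result.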
> Now, we can roughly compare with the release-manifest.scm [8],
>
> --8<---------------cut here---------------start------------->8---
> '("bootstrap-tarballs" "gcc-toolchain" "nss-certs"
>   "openssh" "emacs" "vim" "python" "guile" "guix")))
> '("coreutils" "grep" "findutils" "gawk" "make"
>   #;"gcc-toolchain" "tar" "xz")))
> '("xorg-server" "xfce" "gnome" "mate" "enlightenment"
>   "openbox" "awesome" "i3-wm" "ratpoison"
>   "emacs" "emacs-exwm" "emacs-desktop-environment"
>   "xlockmore" "slock" "libreoffice"
>   "connman" "network-manager" "network-manager-applet"
>   "openssh" "ntp" "tor"
>   "linux-libre" "grub-hybrid"
> '("coreutils" "grep" "sed" "findutils" "diffutils" "patch"
>   "gawk" "gettext" "gzip" "xz"
>   "hello" "zlib"))))
> --8<---------------cut here---------------end--------------->8---
>
> Well, we could investigate more and play more with some graph tools.
> For instance, include all the build-system dependencies and so on.
>
> Some list about “statistically important” packages could help for
> improving the list of “essential” packages.
>
> Although Python is great, I would like to run Guile. Is there any Guile
> library for manipulating graphs around?

https://packages.guix.gnu.org/packages/guile2.2-charting/0.2.0-1.75f755b/

Though it may be Guile 2 only...?

> All that to say, Guix is great! :-) And perhaps some of you have already
> some Guile code for analysing graphs. Maybe.
>
> Well, comments or ideas are welcome.
> :-)
>
> 1: <https://en.wikipedia.org/wiki/Network_theory#Link_analysis>
> 2: <https://en.wikipedia.org/wiki/Network_theory#Centrality_measures>
> 3: <https://en.wikipedia.org/wiki/PageRank>
> 4: <https://en.wikipedia.org/wiki/Eigenvector_centrality>
> 5: <https://networkx.org/documentation/stable/reference/algorithms/link_analysis.html>
> 6: <https://networkx.org/documentation/stable/reference/algorithms/centrality.html>
> 7: <https://en.wikipedia.org/wiki/Katz_centrality>
> 8: <https://git.savannah.gnu.org/cgit/guix.git/tree/etc/release-manifest.scm#n47>
>
> Cheers,
> simon
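On the rough comparison with release-manifest.scm: the ranked entries carry versions ('zlib-1.2.11') while the manifest uses bare names ("zlib"), so the names need splitting first. Below is a small sketch of that check. The helper split_name_version and its rule (the version starts at the first hyphen followed by a digit) are my own assumptions, not something Guix or the original script provides, and the heuristic would misfire on a name with a digit right after a hyphen. The two lists are copied from the quoted message; the manifest set covers only the names in that excerpt, not the whole file.

```python
import re


def split_name_version(spec):
    """Split 'name-version' at the first hyphen followed by a digit.

    Hypothetical heuristic for illustration only.
    """
    match = re.match(r"^(.*?)-(\d.*)$", spec)
    return (match.group(1), match.group(2)) if match else (spec, "")


# The combined top 10 quoted in the message.
top10 = [
    "pkg-config-0.29.2", "glib-2.70.2", "zlib-1.2.11", "gtk+-3.24.30",
    "perl-5.34.0", "gettext-minimal-0.21", "qtbase-5.15.5",
    "libxml2-2.9.12", "python-3.9.9", "autoconf-2.69",
]

# Names from the release-manifest.scm excerpt quoted in the message.
manifest_names = {
    "bootstrap-tarballs", "gcc-toolchain", "nss-certs", "openssh", "emacs",
    "vim", "python", "guile", "guix", "coreutils", "grep", "findutils",
    "gawk", "make", "tar", "xz", "xorg-server", "xfce", "gnome", "mate",
    "enlightenment", "openbox", "awesome", "i3-wm", "ratpoison",
    "emacs-exwm", "emacs-desktop-environment", "xlockmore", "slock",
    "libreoffice", "connman", "network-manager", "network-manager-applet",
    "ntp", "tor", "linux-libre", "grub-hybrid", "sed", "diffutils",
    "patch", "gettext", "gzip", "hello", "zlib",
}

names = [split_name_version(spec)[0] for spec in top10]
in_manifest = sorted(name for name in names if name in manifest_names)
missing = [name for name in names if name not in manifest_names]

print("in manifest:", in_manifest)
print("not in manifest:", missing)
```

This is exact name matching only, so 'gettext-minimal' does not count as "gettext"; whether such variants should be folded together is a judgment call I have not tried to make here.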