Hi, Preparing some Python stuff, I was toying with the package python-networkx. And Guix is awesome because it is easy to extract the graph of dependencies.
Here dependencies are just inputs, native-inputs and propagated-inputs. It could be interesting to also include build-system dependencies, I have been lazy. :-) My initial question is to know what are the “essentials”? By essential, I mean the “important“ ones, the “hot” ones, etc. The ones which are “influencers” – yeah the world is a social network. :-) First, let extract the graph with a tiny Scheme script: $ guix repl -- packages-to-dict.scm > dod.py Then, let import that into an IPython session: $ guix shell python python-ipython \ python-scipy python-matplotlib python-networkx -- ipython and run another tiny Python script for plotting. See Figure attached. We can compare a link analysis metrics [1] and a centrality measure [2]; say PageRank [3] and Eigenvector [4]. More the value is large and higher the package is “important“ (for this metrics). And the Directed and Undirected graphs can be compared, using Networkx [5,6]. Well, Eigenvector centrality (or Katz centrality [7]) is failing because the power iteration does not converge but other metrics could be also considered. Here is just a first rough toy. :-) According to PageRank applied to the Directed Graph, the 10 most “important” packages are: --8<---------------cut here---------------start------------->8--- [('pkg-config-0.29.2', 0.02418335991713879), ('perl-5.34.0', 0.015404032767249512), ('coreutils-minimal-8.32', 0.013240458675517012), ('zlib-1.2.11', 0.009107245584307803), ('python-pytest-6.2.5', 0.008413060648307678), ('ncurses-6.2.20210619', 0.007598925467605917), ('r-knitr-1.41', 0.00554772892485958), ('sbcl-rt-1990.12.19-1.a6a7503', 0.004884721933452539), ('bzip2-1.0.8', 0.004800877844001881), ('python-3.9.9', 0.00415536078558266)] --8<---------------cut here---------------end--------------->8--- And if we compare the 3 results (Undirected with PageRank and Eigenvector, and Directed with PageRank only, then 10 most “important” packages are: --8<---------------cut here---------------start------------->8--- ['pkg-config-0.29.2', 'glib-2.70.2', 'zlib-1.2.11', 'gtk+-3.24.30', 'perl-5.34.0', 'gettext-minimal-0.21', 'qtbase-5.15.5', 'libxml2-2.9.12', 'python-3.9.9', 'autoconf-2.69'] --8<---------------cut here---------------end--------------->8--- Somehow, it means that these packages have an high influence on all the others. Now, we can roughly compare with the release-manifest.scm [8], --8<---------------cut here---------------start------------->8--- '("bootstrap-tarballs" "gcc-toolchain" "nss-certs" "openssh" "emacs" "vim" "python" "guile" "guix"))) '("coreutils" "grep" "findutils" "gawk" "make" #;"gcc-toolchain" "tar" "xz"))) '("xorg-server" "xfce" "gnome" "mate" "enlightenment" "openbox" "awesome" "i3-wm" "ratpoison" "emacs" "emacs-exwm" "emacs-desktop-environment" "xlockmore" "slock" "libreoffice" "connman" "network-manager" "network-manager-applet" "openssh" "ntp" "tor" "linux-libre" "grub-hybrid" '("coreutils" "grep" "sed" "findutils" "diffutils" "patch" "gawk" "gettext" "gzip" "xz" "hello" "zlib")))) --8<---------------cut here---------------end--------------->8--- Well, we could investigate more and play more with some graphs tools. For instance, include all the build-system dependencies and so on. Some list about “statistically important” packages could help for improving the list of “essential” packages. Although Python is great, I would like to run Guile. Any Guile library for manipulating graph is around? All that to say, Guix is great! :-) And perhaps some of you have already some Guile code for analysing graphs. Maybe. Well, comment or idea is welcome. :-) 1: <https://en.wikipedia.org/wiki/Network_theory#Link_analysis> 2: <https://en.wikipedia.org/wiki/Network_theory#Centrality_measures> 3: <https://en.wikipedia.org/wiki/PageRank> 4: <https://en.wikipedia.org/wiki/Eigenvector_centrality> 5: <https://networkx.org/documentation/stable/reference/algorithms/link_analysis.html> 6: <https://networkx.org/documentation/stable/reference/algorithms/centrality.html> 7: <https://en.wikipedia.org/wiki/Katz_centrality> 8: <https://git.savannah.gnu.org/cgit/guix.git/tree/etc/release-manifest.scm#n47> Cheers, simon
packages-to-dict.scm
Description: packages-to-dict.scm
import networkx as nx import matplotlib matplotlib.use('TkAgg') import matplotlib.pyplot as plt import numpy as np exec(open("dod.py").read()) UG = nx.Graph(dod) DG = nx.DiGraph(dod) nnodes = DG.number_of_nodes() # nx.draw(G, with_labels=True);plt.show() metrics = [ (UG, nx.eigenvector_centrality), (UG, nx.pagerank), (DG, nx.pagerank), ] results = [sorted(metric(G).items(), key=lambda k: k[1], reverse=True) for G, metric in metrics] x = [i+1 for i in range(nnodes)] y = lambda rs: [v for _, v in rs] plt.loglog(x, y(results[0]), label="eigen Undirected", color='red', linestyle='-.') plt.loglog(x, y(results[1]), label="pagerank Undirected", color='blue', linestyle='--') plt.loglog(x, y(results[2]), label="pagerank Directed", color='blue', linestyle='-') plt.legend() plt.xlabel("packages") plt.ylabel("metric value") plt.grid() plt.xlim([1, nnodes+2]) plt.ylim([1e-5, 1]) plt.show() def common(results, n): packages = results[0][:2*n].copy() for result in results[1:]: for name, value in packages: if not name in [name for name, _ in result]: packages.remove((name, value)) return [name for name, _ in packages[:n]]