I finally went the "independent table that I manually (de)serialize before and after download" route. The current state of the script is available at
https://github.com/gasche/opam/blob/ocamlbuild-migration-script/admin-scripts/add_ocamlbuild_dependency.ml
and I'm looking for feedback on the preliminary results at
https://github.com/ocaml/opam-repository/pull/5140

On Thu, Nov 12, 2015 at 6:47 PM, Gabriel Scherer <[email protected]> wrote:
> Hi opam-devel,
>
> I'm currently hacking on a script to do a bulk update of OPAM
> metadata, adding "ocamlbuild" as an explicit dependency of all
> packages my killer heuristic decides certainly use ocamlbuild (right
> now: there is a _tags or myocamlbuild.ml somewhere, but I'm soon going
> to integrate the fact that an _oasis file explicitly lists ocamlbuild
> as the relied-upon build system).
>
> This is rather simple, with most of the time spent browsing through
> the rich opam-library API.
> - iterate over all packages in the repository (using the nice
>   Opam_admin_top.iter_packages function)
> - for each package download the archive (I used
>   OpamAction.download_package for this, although it requires an
>   OpamState.t argument that I wasn't sure how to build¹)
> - extract each archive (OpamFilename.extract_generic_file, under some
>   OpamFilename.with_tmp_dir call to get automatic cleanup)
> - walk the archive to test ocamlbuild usage
>
> Caching downloaded archives works very well, so re-running the script
> (during my test-refine feedback loops) does not re-download them.
> Unfortunately, for a handful of packages, download fails, and it
> only fails after a rather long timeout has expired, so just
> re-iterating on those failed packages makes a process that should be
> instantaneous take several minutes.
>
> So here is my question: how can I test whether a package archive is
> already in the cache? Because I know now that all packages that won't
> time out have been cached by previous runs of my script, I could
> iterate only on those. But I didn't find a clear way to do that (this
> seems to be available internally in some OpamHTTP backend, but I
> haven't seen this exported).
>
> A way to cache not only the successfully downloaded archives, but also
> the "did not work" last time decision would also fit the bill. In the
> worst case I could store that information in an independent table that
> I would (de)serialize across invocations of my script.
>
> (Opam seems to have fancy download functions designed to download a
> lot of stuff in parallel, but that seems incompatible with the
> sequential workflow imposed by `iter_packages`. I could first iterate
> to build a list of URLs, then download everything in parallel, then
> re-iterate but then again I need to only access the archives whose
> download actually succeeded.)
>
> While we're at it: is there a simple way to get a pretty string from a
> Package.t value? I use
>   Printf.sprintf "%s.%s"
>     (OpamPackage.name_to_string package)
>     (OpamPackage.version_to_string package)
> but would expect this to be available already.
>
> The complete code of the current prototype script (it is not editing
> any metadata so far, just printing out the results that seem reasonable,
> except that the _oasis part of the heuristic needs to be implemented
> to get realistic results) is available at
>
> https://github.com/gasche/opam/blob/2badfa0810e25ded1495b28b2ec8ff53f03a90cc/admin-scripts/add_ocamlbuild_dependency.ml
>
> Any comment or advice is warmly welcome. In particular there is a
> question in a comment about: what is the right way to build a
> OpamState.t value?
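
For readers who want the gist of the "independent table" route without
opening the script, here is a minimal sketch of the idea. It is not the
code from the linked script: it uses only the OCaml standard library,
stores one package name per line in a plain text file, and all the names
(load_failures, save_failures, try_download, the ~download callback) are
hypothetical. The callback stands in for whatever opam-lib call performs
the actual download.

```ocaml
(* Sketch only: hypothetical helper names, standard library only. *)
module StringSet = Set.Make (String)

(* Load the names of packages whose download failed in a previous run.
   A missing file simply means that no failure has been recorded yet. *)
let load_failures path =
  if not (Sys.file_exists path) then StringSet.empty
  else begin
    let ic = open_in path in
    let rec loop acc =
      match input_line ic with
      | "" -> loop acc
      | line -> loop (StringSet.add line acc)
      | exception End_of_file -> acc
    in
    let set = loop StringSet.empty in
    close_in ic;
    set
  end

(* Write the table back, one package name per line. *)
let save_failures path set =
  let oc = open_out path in
  StringSet.iter (fun name -> output_string oc (name ^ "\n")) set;
  close_out oc

(* Skip packages already known to fail; otherwise call [download]
   (assumed to return [true] on success) and record any new failure.
   Returns whether the archive is usable, plus the updated table. *)
let try_download ~download failures name =
  if StringSet.mem name failures then (false, failures)
  else if download name then (true, failures)
  else (false, StringSet.add name failures)
```

The script would call load_failures once at startup, thread the set
through the package iteration with try_download, and call save_failures
before exiting, so that a timed-out package costs its timeout only once.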

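The file-based part of the heuristic described in the quoted message (a
_tags or myocamlbuild.ml file somewhere in the extracted archive) can be
sketched as below. Again this is only an illustration with hypothetical
names (ocamlbuild_markers, uses_ocamlbuild), written against the standard
library rather than OpamFilename, and it leaves out the _oasis check.

```ocaml
(* Sketch only: hypothetical names, standard library only. *)

(* File names whose presence is taken as a sign of ocamlbuild usage. *)
let ocamlbuild_markers = ["_tags"; "myocamlbuild.ml"]

(* Treat unreadable entries (e.g. dangling symlinks) as non-directories. *)
let is_directory path =
  try Sys.is_directory path with Sys_error _ -> false

(* Return [true] if any file below [dir] is named like a marker.
   Note: symlinks to directories are followed, so a symlink cycle in the
   archive would make this loop forever. *)
let rec uses_ocamlbuild dir =
  let entries = Array.to_list (Sys.readdir dir) in
  List.exists
    (fun entry ->
      let path = Filename.concat dir entry in
      if is_directory path then uses_ocamlbuild path
      else List.mem entry ocamlbuild_markers)
    entries
```

Calling uses_ocamlbuild on the temporary directory where an archive was
extracted gives the yes/no answer for that package.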