I've spent the last week getting intimate with the CPAN metadata, or in many cases, the lack thereof, and want to share what I've learned.
First, some background. We're trying to implement a script called "cpan2efs", which will essentially do what cpanp/cpanm do for installing a single module, and then some. Primarily, it will map one or more module names to distributions, determine their dependencies, and install the missing dependencies before installing the requested module(s). Sounds simple, right?
For modules that have META.yml files, this is usually pretty straightforward, provided the META.yml contents are correct. However, there are currently 20831 active (meaning: listed in 02packages.details.txt.gz) distributions, and of those only about 75% have META.yml files. The problem is how to deal with the other 25%.
Modules like CPAN::FindDependencies, which got the initial cpan2efs implementation working, figure out the dependencies by running the Makefile.PL and parsing the output, or by parsing the Makefile.PL directly. This is problematic, because a lot of modules have obnoxiously interactive Makefile.PL files that don't take reasonable defaults. In many cases, they go into infinite loops when you don't answer the questions they ask. There's a very long list of special cases to deal with here, if one were to attempt to automate this.
CPAN::FindDependencies also compresses all the dependencies into a single list, while we really need them separated into runtime, build and test dependencies. Tools like cpanp/cpanm can easily handle this by parsing the Makefile.PL output and then recursively installing those modules, but that only works because everything is being installed in the same target directory. In our case, we need to get each distribution installed into its own EFS release, so that approach doesn't work so well. I've spent the last day trying to figure out if we can solve the metadata problem on the CPAN side.
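To make the first step above (mapping module names to distributions) concrete, here is a rough sketch in Python rather than Perl, purely for illustration. It assumes the usual layout of 02packages.details.txt: a header block, a blank line, then whitespace-separated "module version distribution-path" rows. The function names are hypothetical and are not part of cpan2efs.

```python
import gzip


def parse_packages_index(text):
    """Return {module_name: distribution_path} from 02packages.details.txt text."""
    mapping = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # The header block ends at the first blank line.
            if line.strip() == "":
                in_body = True
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty rows
        module, _version, dist_path = fields[0], fields[1], fields[2]
        mapping[module] = dist_path
    return mapping


def load_packages_index(path):
    """Read a gzipped index file, e.g. modules/02packages.details.txt.gz."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return parse_packages_index(fh.read())


def distributions_for(modules, mapping):
    """Group requested modules by distribution; also list unknown modules."""
    found = {}
    missing = []
    for module in modules:
        if module in mapping:
            found.setdefault(mapping[module], []).append(module)
        else:
            missing.append(module)
    return found, missing
```

Counting len(set(mapping.values())) on the real index should give the number of active distributions in the sense used above. Several modules commonly map to one distribution, which is why the lookup groups them.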
We have created a CPAN::Mini mirror on madefsd01:/home/minicpan/latest, and I have been trying to create a complete cache of the META.yml files, so that our installation tools don't have to do this dynamically. This has had very mixed results so far. First, I simply retrieved any existing META.yml files over the web from search.cpan.org, if they were found. Then I started processing the Makefile.PLs, and that's where things just fall apart. Before you even get through the authors starting with A, you encounter 5 or 6 modules that can't be processed automatically.
My strategy was to try to run: perl Makefile.PL && make metafile, or the Build.PL equivalent, and then copy the resulting META.yml into my cache. With almost 5000 missing META.yml files, this is going to be time consuming.
What I'm still trying to do is create a script that can run after we've updated a minicpan repo, that incrementally processes new archives and extracts the META.yml file if one is included, and if not, attempts to generate one. I think this approach can work, and then we can code cpan2efs to use META.yml exclusively. When a META.yml file is missing, the module will require manual intervention to install the first time. Once you have it installed, you will have a working efsdeploy.conf (and possibly some hooks), and future upgrades should be very easy.
I have also just started to look at things like CPANDB, and the plethora of CPAN-related modules, and it looks like everyone's fighting with the slowly evolving state of CPAN metadata. Once we have this basic setup working, it might be worth the time to look deeper into the cpantesters code, too.
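As a rough illustration of the incremental cache step described above, the sketch below (in Python, for illustration only) pulls a top-level META.yml out of a distribution tarball and reports when there is none. The cache file naming (archive name plus ".META.yml") and the status strings are assumptions made up for this example. It does not attempt the Makefile.PL / Build.PL generation step, and it treats non-tar archives such as .zip as unreadable.

```python
import os
import tarfile


def find_meta_member(tar):
    """Return the tar member for <topdir>/META.yml, or None if absent."""
    for member in tar.getmembers():
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        # Only accept META.yml directly under the top-level directory.
        if member.isfile() and len(parts) == 2 and parts[1] == "META.yml":
            return member
    return None


def cache_meta(archive_path, cache_dir):
    """Copy an archive's META.yml into cache_dir.

    Returns "cached" (already done on an earlier run), "extracted",
    "missing" (no META.yml shipped), or "unreadable".
    """
    target = os.path.join(cache_dir, os.path.basename(archive_path) + ".META.yml")
    if os.path.exists(target):
        return "cached"  # this is what makes repeated runs incremental
    try:
        # "r:*" lets tarfile detect gzip or bzip2 compression itself.
        with tarfile.open(archive_path, "r:*") as tar:
            member = find_meta_member(tar)
            if member is None:
                return "missing"
            data = tar.extractfile(member).read()
    except (tarfile.TarError, OSError):
        return "unreadable"
    os.makedirs(cache_dir, exist_ok=True)
    with open(target, "wb") as out:
        out.write(data)
    return "extracted"
```

Archives that come back as "missing" are the ones that would need the generation attempt or manual intervention. Keeping that list separate makes it easy to see how much of the 25% is left.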
