[Natalia is not able to send this week's report herself, so I'm sending the draft she sent me yesterday; the footnotes are mine]
Hello,

This is my fourth report on the progress of the PyPI to Debian Repository Converter project.

Work
----

Over the past weeks I’ve worked mainly on improving the components of my tool, refining the algorithm, implementing all plugin methods and increasing their efficiency, handling command-line options, and creating the Debian source/binary repository that contains the converted packages.

Overall functionality
---------------------

The current status of my program looks like this:

* initial part, based on generators:
  - creating a list of packages to convert (from PyPI via its XML-RPC methods, or from specified directories)
  - preparing packages by downloading tarballs, repacking and renaming them if needed, and extracting the archives
* part of the work based on the plugin system:
  - converting packages (currently implemented: stdeb and pkgme)
  - building source packages (currently implemented: dpkg-source)
  - building binary packages (currently implemented: dpkg-buildpackage)
  - exporting to a repository (currently implemented: apt-ftparchive)
* storing all information in the database.
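To make the plugin-based part of the list above more concrete, here is a minimal sketch of how plugins for one stage could be filtered and ordered. It is only an illustration, not the tool's actual code: the `Plugin` class, the `priority` field (lower means tried first), the `plugins_for` helper and the choice of executable checked for each plugin are all assumptions. Only the plugin names and the `is_usable` method name come from the report.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Plugin:
    """Hypothetical plugin record; the real PyPI2Deb classes may differ."""
    name: str
    stage: str      # "convert", "source", "binary" or "export"
    tool: str       # executable the plugin is assumed to need
    priority: int   # assumption: lower value means "try earlier"

    def is_usable(self, which=shutil.which):
        # Usable only if the required external tool is found on PATH.
        return which(self.tool) is not None


# Plugin names are taken from the report; tools and priorities are guesses.
PLUGINS = [
    Plugin("pkgme", "convert", "pkgme", 20),
    Plugin("stdeb", "convert", "py2dsc", 10),
    Plugin("dpkg-source", "source", "dpkg-source", 10),
    Plugin("dpkg-buildpackage", "binary", "dpkg-buildpackage", 10),
    Plugin("apt-ftparchive", "export", "apt-ftparchive", 10),
]


def plugins_for(stage, plugins=PLUGINS, which=shutil.which):
    """Return the usable plugins for one stage, best priority first."""
    usable = [p for p in plugins if p.stage == stage and p.is_usable(which)]
    return sorted(usable, key=lambda p: p.priority)
```

Calling `plugins_for("convert")` on a machine with only stdeb installed would return just the stdeb entry; the `which` parameter exists so the lookup can be faked in tests.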
A very nice moment during my recent work on the tool was generating the final product, that is, a complete Debian repository based on converted packages, and even successfully installing a package (source and binary) from the repository created by PyPI2Deb :-)

Algorithm
---------

Following discussions with my mentor, and based on lessons learned from trials and several design approaches, the algorithm has (hopefully) been finalized and almost entirely implemented as follows:

[l] get list of packages
[p] select a new package/version pair
    - if there are no more, end the program
[h] get status of selected pair from the database
    - if the package is already in the repo, go to [p]
    - if conversion files are on disk, go to [s]
[c] select next convert plugin:
    - if there are no more, go to [p]
    - if option --force-conversion is enabled, go to [cc]
    - if selected plugin has already been used, go to [c]
[cc] convert package/version using selected convert plugin
    - if it fails, go to [c]
[s] build a source package
    - if it was not successful:
      - go to [c] if --try-next-conversion-plugin-if-building-src-pkg-fails is enabled (at the command line or in the configuration)
      - otherwise, go to [p]
[b] select next build plugin
    - if there are no more, go to [p]
    - if option --force-build is enabled, go to [bb]
    - if plugin has already been used, go to [b]
[bb] build a binary package using selected build plugin
    - if it failed:
      - go to [b] if --try-next-build-plugin-if-building-fails is enabled
      - go to [c] if --try-next-conversion-plugin-if-building-fails is enabled
      - otherwise, go to [p]
[t] select next test plugin
    - if there are no more, go to [r]
[tt] run tests for selected test plugin:
    - if test result is less than 50%:
      - if --try-next-converter-if-tests-fail is enabled, go to [c]
      - otherwise, go to [p]
    - otherwise, go to [t]
[r] add to the repository

Plugins
-------

Thanks to storing results in a database, I was able to determine the most frequent causes of plugin failures.
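As an illustration of that kind of analysis, here is a minimal sketch using SQLite. The report does not say which database or schema the tool uses, so the `results` table, its columns, the `failure_causes` helper and the sample rows are all made up for this example.

```python
import sqlite3

# Hypothetical schema: one row per plugin run on a package/version pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (package TEXT, version TEXT, plugin TEXT, "
    "stage TEXT, ok INTEGER, reason TEXT)"
)

# Invented sample data, only to make the query runnable.
rows = [
    ("foo", "1.0", "stdeb", "binary", 0, "missing build dependency"),
    ("bar", "0.3", "stdeb", "convert", 0, "no setup.py"),
    ("baz", "2.1", "pkgme", "binary", 0, "missing build dependency"),
    ("qux", "1.2", "stdeb", "binary", 1, None),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)


def failure_causes(conn):
    """Return (reason, count) pairs for failed runs, most frequent first."""
    return conn.execute(
        "SELECT reason, COUNT(*) AS n FROM results WHERE ok = 0 "
        "GROUP BY reason ORDER BY n DESC, reason"
    ).fetchall()


for reason, count in failure_causes(conn):
    print(count, reason)
```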
On this basis, I've improved the implementation of the plugins (particularly the convert plugin’s “post_process” method), which significantly increased their effectiveness. The most troublesome problems are still missing build dependencies, but I'll work on that (by adding a new build plugin, pbuilder and/or sbuild, and by improving build dependency detection in the conversion plugins).¹

Options
-------

I didn’t want to confuse users with too many options, but as the work progressed, an increasing number of settings seemed useful from the point of view of a future user:

* --config (path to the config file) - seems trivial, but because of the way Python imports modules, combining config settings with arguments given on the command line requires... considerable creativity to supply the correct current values²
* --pyversion (version of Python that packages have to support³) - the classifier system implemented in the PyPI repository is not used correctly by many developers, so the previously obvious choice of PyPI XML-RPC methods for downloading the list of desired packages didn't bring the expected results. Since I think the more converted packages the better, I had to resort to some tricks to download information about many packages efficiently. Currently, for Python 2, I'm able to retrieve information from PyPI about ~16 000 packages (of about 23 thousand available), and about almost all packages that support Python 3.
* --packages (convert only requested packages) - this is my favorite option; it makes debugging a lot easier. Implementing it properly cost me a lot of work (due to problems with constructing appropriate XML-RPC queries and with reasonably searching for the tarballs stored on disk). A problem that accompanied me for a long time was which set of package names to operate on: the original PyPI ones or those already adapted to Debian's Python Policy requirements.
  I am really happy with this option: I can use it to convert a newly released version of a given package and update the repository rapidly, it checks whether the requested package/version is available for the selected Python interpreter, etc.

[plugins]
* --converter, --builder, --exporter - they allow the user to select which (available) plugin(s) should perform a given action. The default plugin order is based on priorities set by plugin authors and on the availability of required tools ("is_usable" method)

[actions]
* --force, --force-conversion, --force-build - give plugins another try at converting/building a package/version pair (if a plugin failed once, it is skipped by default for the given package/version)
* --try-next-converter-if-building-src-package-fails, --try-next-build-plugin-if-building-fails, --try-next-conversion-plugin-if-building-fails (still looking for better names ;-) - these options make the build machine do a bit more work, but as a result more packages may end up in the repository.

Thanks to all these options, the program can resume its work on a particular package at any moment: the tool checks the status of the package in the database, checks which files are already on disk, verifies whether the package was already converted, built and exported, and skips the steps that are already done (unless the user forces it to redo the work). I think that's a reasonable solution.

TODO
----

To fulfill the objectives of the program and the ideas developed in the course of its implementation, here's what I want to do next:

* implement some test plugins (lintian, lintian4py (with pyflakes), ...)
* implement more advanced export plugins (mini-dinstall, reprepro)⁴
* an option to provide HTML logs from all actions (per plugin, per author, per package, etc.)
* improve conversion tools to support Python 3 packages
* testing, fixing bugs
* improve PyPI XML-RPC methods to be more efficient for the tool's needs

Piotr's comments:
[¹] This means Debian would benefit from fixing #652617; does anyone care to provide a patch?
[²] Natalia spent some time figuring out why default config values were not updated with the ones from the command line
[³] i.e. python-foo vs. python3-foo packages
[⁴] a simple dput to an external repo would be handy as well
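
On the --config issue from footnote [²], here is a minimal sketch of one common way to layer settings so that command-line values win over config-file values, which in turn win over built-in defaults. This is not the tool's actual code: `DEFAULTS`, `load_settings` and the `file_values` parameter are hypothetical, and a real tool would fill `file_values` by parsing the file named by --config. The idea is to build one settings dictionary at run time and pass it around, instead of relying on values copied from a module at import time.

```python
import argparse

# Hypothetical built-in defaults; the real option set is larger.
DEFAULTS = {"pyversion": "2", "force": False}


def load_settings(argv, file_values=None):
    """Merge defaults, config-file values and command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    # default=None lets us tell "not given" apart from an explicit value.
    parser.add_argument("--pyversion", default=None)
    parser.add_argument("--force", action="store_true", default=None)
    args = parser.parse_args(argv)

    settings = dict(DEFAULTS)                # lowest precedence
    settings.update(file_values or {})       # config file overrides defaults
    given = {k: v for k, v in vars(args).items() if v is not None}
    settings.update(given)                   # command line overrides both
    return settings
```

For example, `load_settings(["--pyversion", "3"], {"force": True})` keeps `force` from the config file and takes `pyversion` from the command line.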