Hi,

as you might be aware, I've done my share of bitching about my mirror (f.pypi.python.org) breaking.

I picked pep381client apart yesterday and rebuilt it, mostly from the ground up.

You can find a working version here:
https://bitbucket.org/ctheune/bandersnatch

The focus has been on making it a lot more robust and a lot easier to repair a mirror when it's known to be broken. To achieve that I:

- refactored the code, trying to make it more intentional, less mechanical
- stopped parsing the simple pages' HTML and made more use of the XML-RPC API
- added Tarek's worker/queue approach for parallelizing it
- kept as little state as possible on the client
- switched from timestamps to serial counters for checking what and how much to update
- handled locking of concurrent runs more gracefully
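
To illustrate the serial-counter and worker/queue points, here is a rough sketch of the idea. It is not the actual bandersnatch code: it is written for Python 3, the names `changed_packages`, `sync` and `sync_package` are made up for this example, and it assumes the client object exposes PyPI's XML-RPC `changelog_since_serial` call, which returns (name, version, timestamp, action, serial) rows.

```python
import queue
import threading


def changed_packages(client, last_serial):
    """Return (names changed since last_serial, newest serial seen)."""
    # Assumption: rows look like (name, version, timestamp, action, serial).
    changes = client.changelog_since_serial(last_serial)
    names = {row[0] for row in changes}
    newest = max((row[4] for row in changes), default=last_serial)
    return names, newest


def sync(client, last_serial, sync_package, workers=3):
    """Sync changed packages in parallel; return (serial to store, failures)."""
    names, newest = changed_packages(client, last_serial)
    todo = queue.Queue()
    for name in sorted(names):
        todo.put(name)
    failed = []

    def worker():
        # Each worker pulls package names until the queue is drained.
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                return
            try:
                sync_package(name)  # hypothetical per-package download step
            except Exception:
                failed.append(name)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Only advance the stored serial when everything succeeded, so that
    # a rerun picks up the same changes again after a failure.
    return (newest if not failed else last_serial), sorted(failed)
```

The only state the client has to remember between runs is the one integer returned here, which is what makes a broken mirror easy to repair: reset the serial and run again.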

I think I have a good grasp of what's going on now, so I can keep maintaining this in the future.

I'm currently re-initializing my own mirror. This can basically be done in place: remove the existing state data and call my sync script (bsn-mirror) instead of pep381run, with the same parameters.

Tomorrow I'll update the documentation, make it use a config file and put some lipstick on the main entry point. After that I should be ready for a release.

If you want to give it a try already, you just do this:

$ hg clone https://bitbucket.org/ctheune/bandersnatch
$ cd bandersnatch
$ virtualenv-2.7 .
$ bin/python bootstrap.py
$ bin/buildout
$ bin/bsn-mirror /my/mirror/path

Cheers,
Christian
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
