Just two things regarding the Python scraping and parsing code in svn:
1) mixture of tabs and spaces - quite a few of the files have
inconsistent indentation. Didn't check everything, but for example:
    python -tt pyscraper/createhansardindex.py
    python -tt pyscraper/miscfuncs.py
    python -tt pyscraper/patchtool.py
all give a TabError.
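For a quick look at which lines are affected before reindenting, something like the sketch below might help. It is only a rough heuristic and not what python -tt does internally: it looks at the leading whitespace of each non-blank line, so continuation lines inside brackets or multi-line strings get counted too. The function names are made up for illustration.

```python
def indentation_report(text):
    """Return (tab_lines, space_lines): the 1-based numbers of non-blank
    lines whose leading whitespace contains a tab / a space."""
    tab_lines, space_lines = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank or whitespace-only line, nothing to check
        indent = line[:len(line) - len(stripped)]
        if "\t" in indent:
            tab_lines.append(lineno)
        if " " in indent:
            space_lines.append(lineno)
    return tab_lines, space_lines


def mixes_tabs_and_spaces(text):
    """True if the text indents some lines with tabs and some with spaces."""
    tab_lines, space_lines = indentation_report(text)
    return bool(tab_lines) and bool(space_lines)
```

Reading a file and passing its contents to mixes_tabs_and_spaces() gives a yes/no answer; indentation_report() gives the line numbers to look at.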
There is a reindent script here, which converts files to 4-space
indents with no hard tabs:
http://svn.python.org/view/*checkout*/python/trunk/Tools/scripts/reindent.py?revision=66903&content-type=text%2Fplain
2) Trying to understand the sequencing of the scripts, I ended up
playing about with this Dispatcher:
http://pastebin.com/m7e8b0b3d
the idea being to try and avoid those nested ifs and elifs in lazyrunall.py
Not fully thought out, but (fwiw) you would instead end up with
something like:
    dispatcher = Dispatcher()
    dispatcher.on__scrape__hansard = UpdateHansardIndex
    dispatcher.on__scrape__lords = UpdateLordsHansardIndex
    dispatcher.on__scrape__standing = UpdateStandingHansardIndex
    dispatcher.on__scrape__chgpages = (GrabWatchCopies,
        (datetime.date.today().isoformat(),), None)
    dispatcher.on__scrape__force_scrape__regmem = RegMemPullGluePages
    # etc.

    options, args = parser.parse_args()
    dispatcher.run(args)
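To make the idea a bit more concrete, here is a minimal sketch of how such a Dispatcher could work. It is not the pastebin code, and the matching rule is an assumption: a handler registered as on__a__b runs when both "a" and "b" appear among the command-line args, and handlers run in the order they were registered. A handler is either a plain callable or a (callable, args, kwargs) tuple, as in the chgpages line above.

```python
class Dispatcher(object):
    """Illustrative sketch only; the matching rule is an assumption."""

    PREFIX = "on__"

    def __init__(self):
        # Bypass our own __setattr__ while creating the handler list.
        object.__setattr__(self, "_handlers", [])

    def __setattr__(self, name, value):
        if name.startswith(self.PREFIX):
            # "on__scrape__hansard" -> frozenset(["scrape", "hansard"])
            keys = frozenset(name[len(self.PREFIX):].split("__"))
            self._handlers.append((keys, value))
        object.__setattr__(self, name, value)

    def run(self, args):
        """Call every handler whose keys all appear in args."""
        # Assumption: "force-scrape" on the command line should match
        # "force_scrape" in a handler name.
        given = set(arg.replace("-", "_") for arg in args)
        results = []
        for keys, handler in self._handlers:
            if not keys <= given:
                continue
            if callable(handler):
                func, fargs, fkwargs = handler, (), None
            else:
                func, fargs, fkwargs = handler
            results.append(func(*fargs, **(fkwargs or {})))
        return results
```

With that, dispatcher.run(["scrape", "hansard"]) would call only the handler registered as on__scrape__hansard, and the registration order doubles as the run order, which keeps the sequencing of the scripts visible in one place.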
Just an idea.