https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf
I'm trying to make the following code run in parallel on separate CPU
cores but haven't had any success.
def make_links(self):
    for db in databases:
        link = create_useful_link(self, Link, db)
        if link:
            scrape_db(self, link, db)
This is a web scraper which is working nicely in a leisurely sequential
manner. databases is a list of URLs with gaps that are filled in by
create_useful_link(), which makes a link record from the Link class. The
self instance supplies the attributes used to fill those URL gaps: self
is a chemical substance, and when the link record's url field is clicked
in a browser it brings up that external website with the chemical
substance already selected for the viewer to research. If the link is
created successfully, we then fetch the external page, scrape a bunch of
interesting data from it, and turn that into substance notes.
scrape_db() doesn't return anything, but it does create up to nine other
records.
from joblib import Parallel, delayed

class Substance( etc ..
    ...
    def make_links(self):
        #Parallel(n_jobs=-2)(delayed(
        #    scrape_db(self, create_useful_link(self, Link, db), db)
        #    for db in databases
        #))
I'm getting a TypeError from Parallel/delayed(): can't pickle generator
objects.
So my question is: how do I write the commented code properly? I suspect
I haven't done enough comprehension.
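For context, the usual joblib pattern is to wrap the *function itself* in delayed() and pass the arguments to the wrapper, so nothing is actually called until Parallel schedules it. In the commented code above, delayed() is applied to an expression that has already been evaluated, and the generator expression ends up inside delayed's argument list, which is why joblib tries (and fails) to pickle a generator. A minimal sketch of the pattern with a stand-in square() function (not the scraper code, just an illustration):

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# delayed(square) wraps the function; the (x) call captures its arguments
# without executing it, so Parallel receives picklable (func, args, kwargs)
# tuples, one per item of the generator it consumes.
results = Parallel(n_jobs=2)(delayed(square)(x) for x in range(5))
# results == [0, 1, 4, 9, 16]
```

Applied to the method above, one approach (a sketch, assuming scrape_db and create_useful_link pickle cleanly) would be to move the per-database work into a helper so the `if link:` check is preserved, then write Parallel(n_jobs=-2)(delayed(helper)(self, db) for db in databases).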
Thanks for any help
Mike
_______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug