https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf

I'm trying to make the following code run in parallel on separate CPU cores but haven't had any success.

    def make_links(self):
        for db in databases:
            link = create_useful_link(self, Link, db)
            if link:
                scrape_db(self, link, db)

This is a web scraper that works nicely, if leisurely, when run sequentially. databases is a list of URLs with gaps to be filled by create_useful_link(), which builds a link record from the Link class. The self instance supplies the attributes for filling the URL gaps: self is a chemical substance, and clicking the link record's URL field in a browser brings up that external website with the substance selected for the viewer to research. If the link is created successfully, we then fetch the external page, scrape a bunch of interesting data from it, and turn that into substance notes. scrape_db() doesn't return anything, but it does create up to nine other records.

        from joblib import Parallel, delayed

        class Substance( etc ..
            ...
            def make_links(self):
                #Parallel(n_jobs=-2)(delayed(
                #    scrape_db(self, create_useful_link(self, Link, db), db)
                #    for db in databases
                #))

I'm getting a TypeError from Parallel/delayed(): "can't pickle generator objects".

So my question is: how do I write the commented code properly? I suspect I haven't done enough comprehension.
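For reference, here is my understanding of the pattern delayed() expects, sketched with a toy square() function rather than my scraper (square and the n_jobs value are just placeholders): delayed() wraps the function object itself, and the call that follows only records the arguments, so nothing is passed a generator.

    from joblib import Parallel, delayed

    def square(x):
        return x * x

    # delayed(square) wraps the function object; calling the wrapper just
    # records (function, args, kwargs) instead of running square right away.
    # Parallel then consumes the generator of these tuples itself.
    results = Parallel(n_jobs=2)(delayed(square)(n) for n in range(5))
    print(results)  # [0, 1, 4, 9, 16]

If that reading is right, the fix for my code would be wrapping scrape_db alone, i.e. delayed(scrape_db)(self, ...) inside the generator, rather than handing delayed() the whole generator expression.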

Thanks for any help

Mike
_______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug
