https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf

I'm trying to make the following code run in parallel on separate CPU cores but haven't had any success.

    def make_links(self):
        for db in databases:
            link = create_useful_link(self, Link, db)
            if link:
                scrape_db(self, link, db)

This is a web scraper that works nicely, if leisurely, when run sequentially. databases is a list of URLs with gaps to be filled by create_useful_link(), which builds a link record from the Link class. The self instance supplies the attributes for filling the URL gaps: self is a chemical substance, and clicking the link record's URL field in a browser brings up that external website with the substance selected for the viewer to research. If the link is created successfully, we then fetch the external page, scrape a bunch of interesting data from it, and turn that into substance notes. scrape_db() doesn't return anything, but it does create up to nine other records.

        from joblib import Parallel, delayed

        class Substance( etc ..
            ...
            def make_links(self):
                #Parallel(n_jobs=-2)(delayed(
                #    scrape_db(self, create_useful_link(self, Link, db), db)
                #    for db in databases
                #))

I'm getting a TypeError from Parallel/delayed(): "can't pickle generator objects".

So my question is: how do I write the commented code properly? I suspect I haven't done enough comprehension.
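For reference, here is my understanding of the pattern delayed() expects, sketched with a toy square() function rather than my scraper (square and the n_jobs value are just placeholders): delayed() wraps the function object itself, and the call that follows only records the arguments, so nothing is passed a generator.

    from joblib import Parallel, delayed

    def square(x):
        return x * x

    # delayed(square) wraps the function object; calling the wrapper just
    # records (function, args, kwargs) instead of running square right away.
    # Parallel then consumes the generator of these tuples itself.
    results = Parallel(n_jobs=2)(delayed(square)(n) for n in range(5))
    print(results)  # [0, 1, 4, 9, 16]

If that reading is right, the fix for my code would be wrapping scrape_db alone, i.e. delayed(scrape_db)(self, ...) inside the generator, rather than handing delayed() the whole generator expression.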

Thanks for any help

Mike
_______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug
