Hello scrapy-users!

I really like scrapy, and I've been working on adding some machine learning to scraping. An EXTREMELY ROUGH DRAFT is available on GitHub:
https://github.com/johncadigan/sciscrapy

It requires numpy, scipy and scikit. So far it can only build new classifiers and test them through a console app, manager.py. Ultimately, the classifiers will be used with scrapy's pipeline feature to roughly categorize the data as scrapy crawls a site, so that it is available for manual review later.

This is my first attempt at a real GitHub project, and I'd appreciate any feedback you have this early in development, especially with regard to the structure of the project. I'd also say it's not really even ready for testing. I've tried to imitate some of scrapy's methods, but there is lots of room for improvement. I'll probably rewrite manager.py to be more like the scrapy command-line tool.

Thanks,
-John
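To make the pipeline idea concrete, here is a minimal sketch of how a classifier might tag items as they pass through a scrapy item pipeline. It is not code from sciscrapy. The names `KeywordClassifier`, `ClassifierPipeline`, `text_field` and `label_field` are hypothetical, and the keyword matcher is only a stand-in for a trained classifier that exposes a scikit-learn-style `predict()` method. The one scrapy convention relied on is that a pipeline is a plain class with a `process_item(self, item, spider)` method that returns the item.

```python
class KeywordClassifier:
    """Hypothetical stand-in for a trained classifier.

    It mimics the scikit-learn convention of predict(list_of_samples)
    returning one label per sample, so a real estimator could replace it.
    """

    def __init__(self, keywords_by_label, default="unknown"):
        self.keywords_by_label = keywords_by_label
        self.default = default

    def predict(self, texts):
        labels = []
        for text in texts:
            words = set(text.lower().split())
            best_label, best_hits = self.default, 0
            # Choose the label whose keyword set overlaps the text the most.
            for label, keywords in self.keywords_by_label.items():
                hits = len(words & keywords)
                if hits > best_hits:
                    best_label, best_hits = label, hits
            labels.append(best_label)
        return labels


class ClassifierPipeline:
    """Sketch of an item pipeline that adds a rough category to each item."""

    def __init__(self, classifier, text_field="text", label_field="category"):
        self.classifier = classifier
        self.text_field = text_field    # assumed name of the field to classify
        self.label_field = label_field  # assumed name of the field to fill in

    def process_item(self, item, spider):
        # Classify one item at a time and store the label for later review.
        text = item.get(self.text_field, "")
        item[self.label_field] = self.classifier.predict([text])[0]
        return item


# Example use outside a crawl, with plain dicts as items.
classifier = KeywordClassifier({
    "job": {"hiring", "salary"},
    "event": {"conference", "tickets"},
})
pipeline = ClassifierPipeline(classifier)
job_item = pipeline.process_item(
    {"text": "We are hiring engineers with a competitive salary"}, spider=None
)
other_item = pipeline.process_item({"text": "hello world"}, spider=None)
```

In a real project the pipeline would be enabled through the `ITEM_PIPELINES` setting, and the classifier would have to be loaded inside the pipeline itself (for example from a file saved by manager.py) rather than passed to the constructor as it is here.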
