Hello scrapy-users!

I really like scrapy, and I've been working on adding some machine learning 
to scraping. An EXTREMELY ROUGH DRAFT is available on GitHub:

https://github.com/johncadigan/sciscrapy

It requires numpy, scipy and scikit-learn.

So far it can only create new classifiers and test them via a console app, 
manager.py. Ultimately, the classifiers will be used with scrapy's item 
pipeline feature to roughly categorize the data as scrapy crawls a site and 
make it available for manual review later.
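
To give a rough idea of the pipeline step, here is a minimal sketch. It is not 
code from the repository. It assumes a classifier object with a 
scikit-learn-style predict() method; the class names, the "text" and 
"category" fields, and the keyword-based stand-in classifier are all 
hypothetical.

```python
class ClassifierPipeline:
    """Hypothetical item pipeline that tags each item with a predicted category."""

    def __init__(self, classifier, text_field="text", label_field="category"):
        # classifier: any object exposing predict(list_of_inputs),
        # in the style of a trained scikit-learn estimator (assumption)
        self.classifier = classifier
        self.text_field = text_field    # hypothetical field holding the scraped text
        self.label_field = label_field  # hypothetical field that receives the label

    def process_item(self, item, spider):
        # scrapy calls process_item() once for every scraped item
        text = item.get(self.text_field, "")
        item[self.label_field] = self.classifier.predict([text])[0]
        return item  # pass the labeled item on for storage and manual review


class KeywordClassifier:
    """Stand-in for a trained model, for illustration only."""

    def predict(self, texts):
        return ["science" if "experiment" in t.lower() else "other" for t in texts]
```

In a real project the pipeline would be enabled through the ITEM_PIPELINES 
setting, and because scrapy builds pipelines itself, the trained classifier 
would probably be loaded in a from_crawler() or open_spider() hook rather 
than passed to the constructor as it is here.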

This is my first attempt at a real GitHub project, so I'd appreciate any 
feedback you have this early in development, especially with regard to the 
structure of the project. It's not really even ready for testing yet. I've 
tried to imitate some of scrapy's conventions, but there is lots of room for 
improvement. I'll probably rewrite manager.py to work more like the scrapy 
command-line tool.


Thanks,

-John

