Hi everyone,
My firm, ReportLab, builds quite a lot of solutions that publish content for people in PDF form. Sometimes the quickest way to start a project (or even to assess the quality and consistency of a client's content) is a one-off scrape of a web site to build a database. Sometimes we need to set something up to scrape regularly. We did this for years before Scrapy and CSS selectors existed. Scrapy is impressive, but it's one more framework for my team to learn, on top of a dozen others, and this may be something people on this list can do better.
A typical target might be a university web site with a few hundred courses, or a travel site with a few hundred attractions. We would aim to extract "hard facts", images, and relevant chunks of rich text (which all go through a cleanup/normalisation function we already have). FWIW, the ultimate destination would be a Django-powered database, but stored JSON files are fine.
If anyone here is experienced and fast with Scrapy, and interested in an occasional piece of outsourced work, please contact me directly at "andy at reportlab dot com". I do not have a work order today, but I have a couple of sites which could be discussed and which might come up in the next 6 weeks.
Best Regards,
Andy Robinson
CEO/Chief Architect
ReportLab
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
