I'm just getting started with Scrapy and hacking together some proofs of concept for a bigger project...
I have a basic spider running and saving my data to a MySQL db. So far so good! :) Now I'm trying to figure out the following...

Say I have two domains, companyA and companyB, and each site has the same page, possibly with the same or different content:

companyA.com/about-us.html
companyB.com/about-us.html

How would you go about spidering both and comparing pages? I don't really need to know WHAT is different - just whether the pages are the same or not.

Right now, while I'm spidering companyA, I'm storing a hash of the page in my db - but I'm not sure where in the process I should check companyB. Do I do that while spidering? Or do I run two spiders, one per site, and then compare afterwards?

Thanks for the help!

Jim
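To make the "compare afterwards" option concrete, here is a rough sketch of what I have in mind, using only the standard library. It assumes each crawl yields (url, body) pairs, which in a spider callback would presumably come from response.url and response.body. The helper names (page_key, page_hash, compare_sites) and the sample page bodies are made up for illustration, and plain dicts stand in for the MySQL table.

```python
import hashlib
from urllib.parse import urlparse


def page_key(url):
    """Return the path part of a URL so the same page on two domains shares a key."""
    return urlparse(url).path or "/"


def page_hash(body):
    """Return a SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(body).hexdigest()


def compare_sites(hashes_a, hashes_b):
    """Compare two {path: hash} mappings and classify each path."""
    result = {}
    for path in sorted(set(hashes_a) | set(hashes_b)):
        if path not in hashes_a or path not in hashes_b:
            result[path] = "missing"      # page exists on only one site
        elif hashes_a[path] == hashes_b[path]:
            result[path] = "same"         # byte-for-byte identical
        else:
            result[path] = "different"
    return result


# Hypothetical crawl results: (url, body bytes) pairs, one list per site.
site_a = [("http://companyA.com/about-us.html", b"<html>About us</html>")]
site_b = [("http://companyB.com/about-us.html", b"<html>About us!</html>")]

hashes_a = {page_key(u): page_hash(b) for u, b in site_a}
hashes_b = {page_key(u): page_hash(b) for u, b in site_b}
print(compare_sites(hashes_a, hashes_b))
```

Keying on the URL path rather than the full URL is what lets the two domains line up. One caveat: hashing the raw bytes flags any difference at all, so pages that differ only in a timestamp or tracking snippet would count as different unless the body is normalized before hashing.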
