It's definitely an edge case - and I'm not against spidering them both and doing the comparison outside of Scrapy. I'm stuffing everything in the db so that should be trivial.
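A minimal sketch of that outside-of-Scrapy comparison, assuming each crawled page is stored as one row holding its domain, its path, and a hash of its body. The `pages` table, its column names, and the `page_hash` and `compare_sites` helpers are all hypothetical, and an in-memory SQLite database stands in for the MySQL db mentioned in the thread:

```python
import hashlib
import sqlite3


def page_hash(body: bytes) -> str:
    """Return a hex digest of the raw page body."""
    return hashlib.sha256(body).hexdigest()


def compare_sites(conn, domain_a, domain_b):
    """Return [(path, same)] for every path stored for both domains."""
    rows = conn.execute(
        """
        SELECT a.path, a.body_hash = b.body_hash
        FROM pages AS a
        JOIN pages AS b ON a.path = b.path
        WHERE a.domain = ? AND b.domain = ?
        ORDER BY a.path
        """,
        (domain_a, domain_b),
    )
    return [(path, bool(same)) for path, same in rows]


# Stand-in for the real database (assumption: hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (domain TEXT, path TEXT, body_hash TEXT)")

# Made-up sample rows in place of what the spiders would store.
sample = [
    ("companyA.com", "/about-us.html", b"<h1>About us</h1>"),
    ("companyB.com", "/about-us.html", b"<h1>About us</h1>"),
    ("companyA.com", "/contact.html", b"<p>Call A</p>"),
    ("companyB.com", "/contact.html", b"<p>Call B</p>"),
]
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(domain, path, page_hash(body)) for domain, path, body in sample],
)

result = compare_sites(conn, "companyA.com", "companyB.com")
for path, same in result:
    print(path, "same" if same else "different")
```

Note that an exact hash only matches byte-identical bodies, so any per-site difference in the markup (a logo URL, a navigation link) would make two otherwise equal pages count as different.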
Was just curious if maybe someone had done anything weird like this before :)

Jim

On Mon, Nov 9, 2015 at 1:03 PM, Travis <[email protected]> wrote:

> Someone please correct me if they feel otherwise, but I don't really think
> that's Scrapy's strength.
>
> I think of it as a great framework for the spidering and data extraction.
> I usually do any post processing (like dupe id) in a separate script. That
> way if you improve your dupe detection algo, it's not tied with your data
> acquisition.
>
> I could see a situation where you'd want to limit spidering based on dupe
> content. Is that what you want to do? Or is it more of a content survey?
>
> On Nov 9, 2015, at 12:38 PM, Jim Priest <[email protected]> wrote:
>
> I'm just getting started with Scrapy and hacking up some proof of concepts
> for a bigger project...
>
> So far I have a basic spider going and saving my data to a MySQL db. So
> far so good! :)
>
> So now I'm trying to figure out the following...
>
> Say I have two domains companyA / companyB and each site has the same page
> - with possibly the same or different content on each page.
>
> companyA.com/about-us.html <http://companya.com/about-us.html>
> companyB.com/about-us.html <http://companyb.com/about-us.html>
>
> How would you go about spidering both and comparing pages?
>
> I don't really need to know WHAT is different - just that either the pages
> are the same or not.
>
> Right now while I'm spidering companyA I'm storing a hash of the page in
> my db - but I'm not sure where in the process I could check companyB?
>
> Do I do that while spidering? Do I run two spiders over each site and then
> compare afterwards?
>
> Thanks for the help!
> Jim
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
