Hi, I have a blog that I crawl every day. This works fine, but I don't want to crawl/download everything again each time, just the new pages.
I thought about generating a list of downloaded URLs and checking in the Downloader Middleware, for every request, whether the URL has already been downloaded. The problem is that the list is huge, so the lookup takes some time, and it happens on every request. Any better ideas? Is there a good way, or maybe some Scrapy functionality I don't know about?
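
For reference, here is a minimal sketch of what I had in mind. The middleware name, the file name, and the helper logic are all my own placeholders, not anything built into Scrapy; it keeps the seen URLs in a set for fast membership checks and drops duplicates with IgnoreRequest:

    import os
    from scrapy.exceptions import IgnoreRequest

    class SeenUrlsMiddleware:
        """Downloader middleware that skips URLs downloaded on earlier runs.
        (Hypothetical sketch; 'seen_urls.txt' is a placeholder path.)"""

        def __init__(self, seen_file='seen_urls.txt'):
            self.seen_file = seen_file
            # Load previously downloaded URLs into a set for O(1) lookups.
            self.seen = set()
            if os.path.exists(seen_file):
                with open(seen_file) as f:
                    self.seen = set(line.strip() for line in f)

        def process_request(self, request, spider):
            # Raising IgnoreRequest tells Scrapy to drop this request.
            if request.url in self.seen:
                raise IgnoreRequest('already downloaded: %s' % request.url)
            return None  # returning None lets the request proceed normally

        def process_response(self, request, response, spider):
            # Record the URL only after a successful download.
            if response.status == 200 and request.url not in self.seen:
                self.seen.add(request.url)
                with open(self.seen_file, 'a') as f:
                    f.write(request.url + '\n')
            return response

I would then enable it in settings.py via DOWNLOADER_MIDDLEWARES (the project path is a placeholder):

    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.SeenUrlsMiddleware': 543,
    }

This works, but it still means loading and holding the whole URL set in memory, which is what I'd like to avoid if Scrapy already offers something better.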
