Thank you for your answer. What you posted is perfect! Cheers

On Tuesday, 14 June 2016 13:19:08 UTC+1, Neverlast N wrote:
>
> Hello,
>
> This is a quite common requirement. I think one good idea is to use a
> lightweight semi-persistent storage solution, typically redis. Calculate a
> hash of your item if it's large and "SET ID, HASH". Use Twisted
> asynchronous Redis clients and you can get just slightly increased latency
> but no noticeable throughput decrease. You can use this code as a starting
> point:
> https://github.com/scalingexcellence/scrapybook/blob/master/ch09/properties/properties/pipelines/redis.py
>
> Cheers,
> Dimitris
>
> ------------------------------
> Date: Tue, 14 Jun 2016 01:41:59 -0700
> From: [email protected]
> To: [email protected]
> Subject: Drop scraped data if it matches all previous data
>
> Hello,
>
> I scrape sites and now I want to drop the scraped data if there's 'no
> update'.
>
> Fortunately I have a unique ID per scraped data record so I could use this
> ID field to compare if the data has changed or not.
>
> I run the scrapy with scrapy crawl in crontabs so every time I scrape I
> startup a new instance, meaning if I would hold the scraped data in memory
> using python code that wouldn't work.
>
> I don't think this is possible with item pipelines? A solution is just
> that I post everything in a database and then use the item pipelines to
> check the database using the unique ID and compare the data if it's new or
> not, and drop the scraped data if it is the same.
>
> Thanks for the help,
>
> Cheers
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
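
For anyone finding this thread later: the idea in the quoted reply (keep one hash per unique ID in a small persistent store, and drop the item when the hash has not changed) could look roughly like the sketch below. This is not the code from the linked repository. It is synchronous rather than Twisted-based, and the names DropUnchangedPipeline, InMemoryStore, item_hash and the "id" field are made up for illustration. The store only needs get() and set(), so a Redis client object exposing those two methods should be able to take the place of the in-memory stand-in.

```python
import hashlib
import json

try:
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in so the sketch also runs without Scrapy installed."""


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only get/set are used."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def item_hash(item):
    """Return a stable hash of the item's fields (key order does not matter)."""
    payload = json.dumps(dict(item), sort_keys=True, default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class DropUnchangedPipeline:
    """Drop items whose content hash matches the one stored for their ID."""

    def __init__(self, store, id_field="id"):
        self.store = store        # assumption: any object with get() and set()
        self.id_field = id_field  # assumption: the unique ID lives in this field

    def process_item(self, item, spider):
        key = "item:%s" % item[self.id_field]
        new_hash = item_hash(item)
        old_hash = self.store.get(key)
        if isinstance(old_hash, bytes):
            # Some Redis clients return bytes unless configured to decode.
            old_hash = old_hash.decode("ascii")
        if old_hash == new_hash:
            raise DropItem("unchanged item %s" % item[self.id_field])
        # New or changed item: remember its hash and pass it on.
        self.store.set(key, new_hash)
        return item
```

Because the hashes live outside the Scrapy process, they survive between cron runs, which is the part that in-memory state could not solve. In a real project the store would be created once per crawl (for example in the pipeline's open_spider method) and the pipeline enabled through ITEM_PIPELINES in the settings.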
