Thank you for your answer; what you posted is perfect!

Cheers


On Tuesday, 14 June 2016 13:19:08 UTC+1, Neverlast N wrote:
>
> Hello,
>
> This is quite a common requirement. One good approach is to use a 
> lightweight semi-persistent storage solution, typically Redis. Calculate a 
> hash of your item (especially if it's large) and do "SET ID, HASH". If you 
> use a Twisted asynchronous Redis client, you get only slightly increased 
> latency and no noticeable drop in throughput. You can use this code as a 
> starting point: 
> https://github.com/scalingexcellence/scrapybook/blob/master/ch09/properties/properties/pipelines/redis.py
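>
> A rough sketch of such a pipeline is below. It uses the blocking redis-py 
> client for brevity rather than a Twisted async client, and the 'id' field, 
> the Redis host, and the class name are placeholders you'd adjust for your 
> own project (it also assumes your item fields are JSON-serializable):
>
>     import hashlib
>     import json
>
>     import redis
>     from scrapy.exceptions import DropItem
>
>
>     class DropUnchangedItemsPipeline(object):
>         """Drop items whose content hash matches what was stored last run."""
>
>         def open_spider(self, spider):
>             # Assumes a local Redis instance; adjust host/port as needed.
>             self.redis = redis.StrictRedis(host='localhost', port=6379)
>
>         def process_item(self, item, spider):
>             key = 'item:%s' % item['id']   # your unique ID field
>             # Hash the whole item so any change in the data is detected.
>             digest = hashlib.sha1(
>                 json.dumps(dict(item), sort_keys=True).encode('utf-8')
>             ).hexdigest()
>             stored = self.redis.get(key)
>             if stored is not None and stored.decode('utf-8') == digest:
>                 raise DropItem('No update for item %s' % item['id'])
>             # New or changed item: remember its latest hash and keep it.
>             self.redis.set(key, digest)
>             return item
>
> Because the hashes live in Redis rather than in the process's memory, they 
> survive across the separate crawl processes your crontab starts. You would 
> enable it in settings.py with something like (module path is hypothetical):
>
>     ITEM_PIPELINES = {'myproject.pipelines.DropUnchangedItemsPipeline': 300}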
>  
>  
> Cheers,
> Dimitris
>
>  
> ------------------------------
> Date: Tue, 14 Jun 2016 01:41:59 -0700
> From: [email protected]
> To: [email protected]
> Subject: Drop scraped data if it matches all previous data
>
> Hello, 
>
> I scrape sites and now I want to drop the scraped data if there's 'no 
> update'. 
>
> Fortunately I have a unique ID per scraped record, so I could use this 
> ID field to check whether the data has changed or not. 
>
> I run Scrapy with "scrapy crawl" from a crontab, so every scrape starts a 
> new process, which means that holding the scraped data in memory with 
> Python code wouldn't work. 
>
> I don't think this is possible with item pipelines alone? One solution 
> would be to post everything to a database and then use an item pipeline to 
> look up the record by its unique ID, compare the data, and drop the 
> scraped item if it is unchanged. 
>
> Thanks for the help, 
>
> Cheers
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
