I don't want to use a pipeline, because a pipeline only takes effect after the page has already been crawled (downloaded); I need duplicate URLs to be filtered out before the download happens.
On Tuesday, December 24, 2013 2:58:49 PM UTC+5:30, Mrudul Tarwatkar wrote:

> I managed to use this middleware
> <http://snipplr.com/view/67018/middleware-to-avoid-revisiting-already-visited-items/>
> to avoid re-crawling URLs that were already crawled in earlier spider runs.
>
> I added the two fields
>
>     visit_id = Field()
>     visit_status = Field()
>
> to my items.py file, but the same URLs are still being crawled every time I
> run the spider. What is wrong with this implementation? I can see that a
> single URL has the same fingerprint in the visit_id field across subsequent
> spider runs. What am I doing wrong here?
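
For what it's worth, here is the general shape of a middleware that does the filtering before the download. This is a minimal sketch of the technique, not the snipplr snippet itself: the class name PersistentVisitFilterMiddleware, the SEEN_DB_PATH setting, and the seen_requests.db filename are all made up for illustration. It persists request fingerprints (computed with scrapy.utils.request.request_fingerprint, the same helper Scrapy's built-in dupefilter uses) to a SQLite file so they survive between runs, and raises IgnoreRequest for anything already seen:

    import sqlite3

    from scrapy.exceptions import IgnoreRequest
    from scrapy.utils.request import request_fingerprint


    class PersistentVisitFilterMiddleware(object):
        """Drop requests whose fingerprint was stored by an earlier run."""

        def __init__(self, db_path):
            # One table of fingerprints; the file outlives the process, so
            # the filter works across separate spider invocations.
            self.conn = sqlite3.connect(db_path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS seen (fingerprint TEXT PRIMARY KEY)")
            self.conn.commit()

        @classmethod
        def from_crawler(cls, crawler):
            # SEEN_DB_PATH is a hypothetical setting name for this sketch.
            return cls(crawler.settings.get('SEEN_DB_PATH', 'seen_requests.db'))

        def process_request(self, request, spider):
            fp = request_fingerprint(request)
            row = self.conn.execute(
                "SELECT 1 FROM seen WHERE fingerprint = ?", (fp,)).fetchone()
            if row is not None:
                # Raising IgnoreRequest here stops the request before the
                # download happens, which an item pipeline (running after the
                # response is fetched) cannot do.
                raise IgnoreRequest(
                    "already fetched in a previous run: %s" % request.url)
            return None  # not seen yet: let the download proceed

        def process_response(self, request, response, spider):
            # Record the fingerprint only once the page actually downloaded,
            # so failed requests get retried on the next run.
            fp = request_fingerprint(request)
            self.conn.execute(
                "INSERT OR IGNORE INTO seen (fingerprint) VALUES (?)", (fp,))
            self.conn.commit()
            return response

To try it, you would enable it under DOWNLOADER_MIDDLEWARES in settings.py (e.g. 'myproject.middlewares.PersistentVisitFilterMiddleware': 543, where the module path is whatever your project uses). On the original question: the fingerprint being identical across runs is expected and is exactly what makes this work; request_fingerprint is a stable hash of the request, so the only question is whether anything persists it between runs. Also note that running the spider with a JOBDIR (scrapy crawl myspider -s JOBDIR=crawls/run-1) persists Scrapy's own dupefilter state between runs, which may already be enough.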
