Hello, has no one ever needed such a thing? If it doesn't exist, I should build it. One possible implementation is to use a priority queue: instead of just appending new pages to crawl to a FIFO, you assign each page a priority in the queue. Does anyone have an idea about how to implement that? Some pointers to the relevant part of the Scrapy code would be much appreciated! Thanks
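
To make the idea more concrete, here is a rough sketch of what I have in mind. It is plain Python and does not use any Scrapy API; the names RefreshQueue, Entry, pop_due, reschedule and the factor parameter are all made up for illustration. The priority is the time at which a page is next due, and the interval is doubled or halved (an arbitrary choice on my part) while staying inside the min/max range:

```python
import hashlib
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    due: float                                   # only field used for ordering
    url: str = field(compare=False)
    interval: float = field(compare=False)       # current refresh period
    digest: str = field(compare=False, default="")  # hash of the last body seen


class RefreshQueue:
    """Hypothetical recrawl queue: pages ordered by due time, not FIFO."""

    def __init__(self, min_interval, max_interval, factor=2.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.factor = factor  # assumption: multiplicative back-off / speed-up
        self._heap = []

    def add(self, url, now):
        # A new page is due immediately and starts at the min interval.
        heapq.heappush(self._heap, Entry(now, url, self.min_interval))

    def pop_due(self, now):
        # Return the most overdue page, or None if nothing is due yet.
        if self._heap and self._heap[0].due <= now:
            return heapq.heappop(self._heap)
        return None

    def reschedule(self, entry, body, now):
        # Compare a hash of the fetched body with the previous one.
        digest = hashlib.sha1(body).hexdigest()
        if entry.digest and digest == entry.digest:
            # Unchanged content: drift toward the max interval.
            interval = min(entry.interval * self.factor, self.max_interval)
        else:
            # Changed (or first fetch): drift toward the min interval.
            interval = max(entry.interval / self.factor, self.min_interval)
        heapq.heappush(
            self._heap, Entry(now + interval, entry.url, interval, digest)
        )
        return interval
```

The crawl loop would call pop_due(now) to get the next page, fetch it, then call reschedule(entry, body, now). Comparing a hash of the whole body is the simplest possible change detection; a real site with timestamps or ads in the page would probably need something smarter.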

On Thursday, July 17, 2014 12:30:01 UTC+2, Magikmeuh wrote:
> What I called a 'smart refresh algorithm' (though I realise it may not be
> the right term) is the ability to schedule and adjust the crawl refresh
> period of each page depending on how often its content changes.
> You specify two bounds, the min and max crawl refresh periods. If the
> content of a page never changes, the period tends toward the max.
> If the content has changed every time you fetch it again, the period tends
> toward the min.
> If it changes only some of the time, the period settles somewhere within
> this range.
>
> Is there something similar? It would be very strange if it didn't exist,
> because I just can't imagine crawling a big site without this
> functionality (while keeping a good refresh rate for pages, of course).
>
> If I understand correctly, your DeltaFetch is a way to avoid recrawling
> pages that have already been fetched.
>
> On Wednesday, July 16, 2014 11:01:04 UTC+2, Paul Tremberth wrote:
>> Hi Frédéric,
>>
>> what do you mean by "smart refresh crawling"?
>> scrapylib has the DeltaFetch spider middleware
>> https://github.com/scrapinghub/scrapylib/blob/master/scrapylib/deltafetch.py
>>
>> Paul.
>>
>> On Wednesday, July 16, 2014 10:15:11 AM UTC+2, Magikmeuh wrote:
>>> Hello everyone,
>>>
>>> Does Scrapy have a smart refresh crawling algorithm?
>>>
>>> I don't see any trace of it in the documentation or on this Google group.
>>>
>>> Has someone already implemented it?
>>>
>>> Thanks
>>>
>>> --
>>> Frédéric Passaniti
