We would like to implement something like that moving forward. In the meantime, we have a lot of pages already cached that we'd like to check (these may never get updated, so they would never see the on_save hook), and we also have a lot of static resources to check that have no 'save now' hook available.
Ideally we'd have something that ran on a schedule for a broad update (once a week?) and then implement hooks where we can; that would cover everything else.

Jim

On Mon, Nov 2, 2015 at 4:55 PM, Travis Leleu <[email protected]> wrote:

> Jim, I'd probably add a hook to the on_save event in your blogs that
> pushes the URL into a queue. Have a simple script that saves the content
> to your static failover. There's no need for a spider/crawler when you
> just want to grab one page's content on an event trigger.
>
> Perhaps I'm not understanding why you'd need something heavy like Scrapy;
> you could write a 30-line Python program to monitor the queue,
> requests.get() the page, then save it to a static location.
>
> On Mon, Nov 2, 2015 at 5:16 PM, Jim Priest <[email protected]> wrote:
>
>> I should have provided a bit more info on our use case :)
>>
>> We have a lot of dynamic content in Drupal, blogs, etc. The failover
>> content is static versions of this dynamic content. Currently this is
>> done via a rather clunky Akamai tool, which we're hoping to replace.
>>
>> Another goal is to update this content more immediately: someone updates
>> a Drupal page, it is immediately spidered (via an API call or something),
>> and that content is then saved to the failover.
>>
>> I could probably cobble something together with wget or some other tool,
>> but I'm trying not to reinvent the wheel here as much as possible.
>>
>> Thanks!
>> Jim
>>
>> On Monday, November 2, 2015 at 7:28:39 AM UTC-5, Jakob de Maeyer wrote:
>>>
>>> Hey Jim,
>>>
>>> Scrapy is great at two things:
>>> 1. downloading web pages, and
>>> 2. extracting structured data from them.
>>>
>>> In your case, you should already have access to the raw files (via FTP,
>>> etc.), as well as to the data in a structured format. It would be
>>> possible to do what you're aiming at with Scrapy, but it doesn't seem
>>> to be the most elegant solution. What speaks against setting up an
>>> rsync cronjob or similar to keep the failover in sync?
>>>
>>> Cheers,
>>> -Jakob
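
A minimal sketch of the queue-consumer approach Travis describes above, assuming a Redis list named "failover_urls" and a failover directory of /var/www/failover (both are placeholder names, not details from the thread):

    # The Drupal on_save hook only needs to RPUSH the saved page's URL onto
    # the "failover_urls" list; this script pops URLs, fetches them with
    # requests, and writes the HTML into the static failover tree.
    import os
    from urllib.parse import urlparse

    import redis
    import requests

    QUEUE_NAME = "failover_urls"          # assumed queue name
    FAILOVER_ROOT = "/var/www/failover"   # assumed failover directory

    r = redis.Redis()

    def save_page(url):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        path = urlparse(url).path.lstrip("/") or "index.html"
        if path.endswith("/"):
            path += "index.html"
        dest = os.path.join(FAILOVER_ROOT, path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(resp.content)

    if __name__ == "__main__":
        while True:
            # blpop blocks until the on_save hook pushes another URL
            _, raw_url = r.blpop(QUEUE_NAME)
            save_page(raw_url.decode("utf-8"))

The producer side stays tiny: the on_save hook just pushes the page's URL, and this consumer does the fetching and writing.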

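For the weekly "broad update" over cached pages and static resources that never fire an on_save hook, a similarly hedged sketch: re-fetch a flat list of known URLs into the same failover directory and run it from cron. The list file path, schedule, and directory below are assumptions for illustration only.

    # Weekly re-mirror of every URL in a flat list, e.g. exported from
    # Drupal or from the existing Akamai configuration.
    # Example cron entry (Sundays at 03:00):
    #   0 3 * * 0  python refresh_failover.py /etc/failover/urls.txt
    import os
    import sys
    from urllib.parse import urlparse

    import requests

    FAILOVER_ROOT = "/var/www/failover"   # assumed failover directory

    def mirror(url):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        path = urlparse(url).path.lstrip("/") or "index.html"
        if path.endswith("/"):
            path += "index.html"
        dest = os.path.join(FAILOVER_ROOT, path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(resp.content)

    if __name__ == "__main__":
        with open(sys.argv[1]) as url_file:
            for line in url_file:
                url = line.strip()
                if url:
                    mirror(url)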