I should have provided a bit more info on our use case :) We have a lot of dynamic content in Drupal (blogs, etc.). The failover content is static versions of that dynamic content. Currently this is handled by a rather clunky Akamai tool, which we're hoping to replace.
Another goal is to update this content more immediately - i.e. someone updates a Drupal page, it is immediately spidered (via an API call or something), and that content is then saved to the failover. I could probably cobble something together with wget or some other tool, but I'm trying not to reinvent the wheel here as much as possible.

Thanks!
Jim

On Monday, November 2, 2015 at 7:28:39 AM UTC-5, Jakob de Maeyer wrote:
>
> Hey Jim,
>
> Scrapy is great at two things:
> 1. downloading web pages, and
> 2. extracting unstructured data.
>
> In your case, you should already have access to the raw files (via
> FTP, etc.), as well as to the data in a structured format. It would be
> possible to do what you're aiming at with Scrapy, but it doesn't seem to be
> the most elegant solution. What speaks against setting up an rsync cronjob
> or similar to keep the failover in sync?
>
>
> Cheers,
> -Jakob
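
Here's a rough sketch of the "spider on update" piece, in case it helps frame the discussion - assuming a Drupal hook (or rules module, cron, whatever fires on save) can shell out to a script with the URL(s) that just changed, and that the failover host simply serves flat files out of a document root. The /var/www/failover path and the SnapshotSpider name are just placeholders, not anything Drupal or Scrapy provides:

import os
import sys
from urllib.parse import urlparse

import scrapy
from scrapy.crawler import CrawlerProcess


class SnapshotSpider(scrapy.Spider):
    """Fetch freshly updated pages and save them as static HTML."""
    name = "failover_snapshot"

    def __init__(self, urls=None, failover_root="/var/www/failover", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = urls or []
        self.failover_root = failover_root

    def parse(self, response):
        # Mirror the URL path under the failover docroot, e.g.
        # /blog/my-post -> /var/www/failover/blog/my-post/index.html
        path = urlparse(response.url).path.strip("/") or "index"
        out_dir = os.path.join(self.failover_root, path)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, "index.html"), "wb") as f:
            f.write(response.body)


if __name__ == "__main__":
    # e.g. python snapshot.py https://example.com/blog/my-post
    process = CrawlerProcess(settings={"LOG_LEVEL": "WARNING"})
    process.crawl(SnapshotSpider, urls=sys.argv[1:])
    process.start()

Admittedly that's close to wget with extra steps, which I think is Jakob's point - if the raw files are already reachable over FTP/SSH, an rsync cronjob is fewer moving parts; the spider route mainly buys us the rendered (post-Drupal) HTML and the per-page trigger.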
