We would like to implement something like that moving forward.
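
Roughly the shape I'm picturing for the hook side - just a sketch, and the
Redis list name, failover root, and URL-to-path mapping below are all
placeholders:

import pathlib
import urllib.parse

import redis
import requests

QUEUE_KEY = "failover:urls"                        # placeholder list name
FAILOVER_ROOT = pathlib.Path("/var/www/failover")  # placeholder doc root


def target_path(url):
    """Map a URL to a file under the failover root ('/' -> index.html)."""
    path = urllib.parse.urlparse(url).path
    if not path or path.endswith("/"):
        path += "index.html"
    return FAILOVER_ROOT / path.lstrip("/")


def snapshot(url):
    """Fetch one page and write it into the static failover tree."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    dest = target_path(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(resp.content)


def main():
    r = redis.Redis()
    while True:
        # Blocks until a URL is pushed onto the list.
        _key, raw = r.blpop(QUEUE_KEY)
        url = raw.decode()
        try:
            snapshot(url)
        except requests.RequestException as exc:
            print("failed to snapshot %s: %s" % (url, exc))


if __name__ == "__main__":
    main()

The on_save hook in Drupal would then only need to RPUSH the page URL onto
that list.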

In the meantime we have a lot of currently cached pages we'd like to check
(these may never get updated, so they would never see the on_save hook), and
we also have a lot of static resources to check that have no 'save now' hook
available.

Ideally we'd have something that runs on a schedule for a broad update (once
a week?), plus hooks implemented where we can - between the two, that should
cover everything else.
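
For the scheduled sweep, something this small run weekly from cron might be
enough - again just a sketch, assuming a plain text list of URLs (urls.txt,
covering the cached pages and static resources) and the same failover root:

import pathlib
import urllib.parse

import requests

URL_LIST = pathlib.Path("urls.txt")                # one URL per line
FAILOVER_ROOT = pathlib.Path("/var/www/failover")  # placeholder doc root


def refresh(url):
    """Re-fetch one URL and rewrite its static failover copy."""
    path = urllib.parse.urlparse(url).path
    if not path or path.endswith("/"):
        path += "index.html"
    dest = FAILOVER_ROOT / path.lstrip("/")
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(resp.content)


def main():
    for url in URL_LIST.read_text().split():
        try:
            refresh(url)
        except requests.RequestException as exc:
            print("skipping %s: %s" % (url, exc))


if __name__ == "__main__":
    main()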

Jim



On Mon, Nov 2, 2015 at 4:55 PM, Travis Leleu <[email protected]> wrote:

> Jim, I'd probably add a hook to the on_save event in your blogs that
> pushes the URL into a queue.  Have a simple script that saves the content
> to your static failover.  No need for a spider/crawler when you just want
> to grab one page's content on an event trigger.
>
> Perhaps I'm not understanding why you'd need something heavy like Scrapy;
> you could write a 30-line Python program to monitor the queue,
> requests.get() the page, then save it to a static location.
>
> On Mon, Nov 2, 2015 at 5:16 PM, Jim Priest <[email protected]> wrote:
>
>> I should have provided a bit more info on our use case :)
>>
>> We have a lot of dynamic content in Drupal, blogs, etc.  The failover
>> content is static versions of this dynamic content.  Currently this is done
>> via a rather clunky Akamai tool, which we're hoping to replace.
>>
>> Another goal is to update this content more immediately - i.e. someone
>> updates a Drupal page, it is immediately spidered (via an API call or
>> something) and that content is then saved to the failover.
>>
>> I could probably cobble something together with wget or some other tool,
>> but I'm trying not to reinvent the wheel here as much as possible.
>>
>> Thanks!
>> Jim
>>
>>
>>
>> On Monday, November 2, 2015 at 7:28:39 AM UTC-5, Jakob de Maeyer wrote:
>>>
>>> Hey Jim,
>>>
>>> Scrapy is great at two things:
>>> 1. downloading web pages, and
>>> 2. extracting structured data from unstructured pages.
>>>
>>> In your case, you should already have access to the raw files (via
>>> FTP, etc.), as well as to the data in a structured format. It would be
>>> possible to do what you're aiming at with Scrapy, but it doesn't seem to be
>>> the most elegant solution. What speaks against setting up an rsync cronjob
>>> or similar to keep the failover in sync?
>>>
>>>
>>> Cheers,
>>> -Jakob
>>>