Ahh! Yes, by google search API I meant Twitter search API!

I'm using a cron job to trigger a special URL every 5 minutes.  Originally I
had this job on my own webhost, but I ran into two problems: a) updating the
trend lists can sometimes take a long time, and the very basic PHP fetch I
was doing waited for a return value (which it doesn't really need to do).
This exceeded the CPU limits on my cheap host and breached the terms of
service. b) My cheap host only allows jobs to be scheduled every 15 minutes!

I ended up with a two part solution:

1) I use http://www.webcron.org to schedule jobs that call a URL on my
webhost every 5 minutes for longer jobs, or call GAE directly for shorter
jobs.  Webcron charges by the length of the job, so sub-30 seconds is
cheapest (0.0001 Euro cents or 1000 jobs per cent)

2) On my webhost I use cURL instead of a standard PHP fetch (which is how I
first did it).  The script just triggers the job, then terminates after a
short timeout.  GAE will happily continue to execute the job even though the
listening party has terminated, so I get what I want and my webhost doesn't
get upset.  I need to do it in this "2-part" way because webcron won't let
you terminate a job after calling it.  This achieves what I wanted fairly
cheaply.

Here is the PHP script I use

Note the URL doesn't need the HTTP:// part in front of it.

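<?php
// Placeholder URL of the GAE job; cURL assumes HTTP when no scheme is given.
$url = "myurl.appspot.com/somejob";
$ch = curl_init($url);
// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Stop waiting after 2 seconds; the job keeps running on GAE.
curl_setopt($ch, CURLOPT_TIMEOUT, 2);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
?>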
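
If you want to tell the expected timeout apart from a real failure (a DNS
error, say), something like the following might work.  This is only a rough
sketch, not what I actually run: the function names trigger_options and
trigger_job are made up for illustration, and 28 is libcurl's error code for
an operation that timed out.

```php
<?php
// Hypothetical helper: the cURL options for a fire-and-forget trigger.
function trigger_options($timeoutSeconds = 2) {
    return array(
        CURLOPT_RETURNTRANSFER => true,      // don't print the response
        CURLOPT_TIMEOUT => $timeoutSeconds,  // stop waiting after this long
    );
}

// Hypothetical helper: calls the URL and reports how the attempt ended.
function trigger_job($url, $timeoutSeconds = 2) {
    $ch = curl_init($url);
    curl_setopt_array($ch, trigger_options($timeoutSeconds));
    curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);

    if ($errno === 0) {
        return "completed";
    }
    if ($errno === 28) {
        // libcurl error 28: the request timed out, which is the normal
        // outcome when the job takes longer than the timeout.
        return "timed out (expected for long jobs)";
    }
    return "failed (cURL error $errno)";
}

// Example call (placeholder URL, commented out so nothing is fetched here):
// echo trigger_job("myurl.appspot.com/somejob");
```

A timeout here only means the script stopped listening; it says nothing
about whether the job itself succeeded on GAE.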


On Sun, Mar 15, 2009 at 4:52 PM, lock <lachlan.hu...@gmail.com> wrote:

>
> Hi Tim,
>
> Just had a look at Twendly, looks good! I've just got a few quick
> questions, if you wouldn't mind...
>
> 1. By 'google search API' you actually mean 'twitter search API',
> yeah? ;-)
>
> 2. How do you go about pulling data from twitter every 5 minutes?
> Unless I'm missing something there are no scheduled tasks in
> app engine (yet).  Using a cron job on another server to call a
> special URL maybe?
>
> The API key sounds like the proper solution, would be nice if
> there was a solution now though.
>
> Just an idea that probably won't work for most cases.  Get the
> client (via javascript) to pull data from twitter and send it on to
> app engine for processing/storage.  Not real pretty.
>
> Thanks, lock
>
> On Mar 15, 9:16 am, Tim Bull <tim.b...@binaryplex.com> wrote:
> > Interesting,
> >
> > I have a Twitter app (http://twendly.appspot.com) but I don't seem to be
> > having this issue at the moment.  However, while I read information every
> > 5 minutes from the google search API (which is rate limited differently) I
> > only send a few messages (no more than 5 or 6 max and usually only 4) as
> > the hour clicks over.  Although occasionally this drops a message, it's
> > generally pretty solid.  Perhaps because of when I'm sending them, I get
> > in at the start of the allocation.
> >
> > As far as scalability goes, I would say GAE is really suited for its read
> > scalability, so unless your Twitter bot writes are going to be massive,
> > scalability shouldn't be an issue if you move these writes over to a
> > separate host.  I guess a (nasty but possible) pattern would be to have
> > the Twitter interaction come from your host, which could act as a proxy,
> > then use App Engine for all the processing and reporting on the data.  At
> > least in my application this would be a potential work-around if this
> > becomes an issue.
> >
> > Cheers
> >
> > Tim
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
