[google-appengine] Re: Parallel urlfetch utility class / function.

bFlood Mon, 16 Mar 2009 09:16:10 -0700


@joe - fire/forget - you can just skip the fetcher.wait() call (which
calls AsyncAPIProxy.wait). I'm not sure if you would need a valid
callback, but even if you did, it could be a simple stub that does
nothing.
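
A rough sketch of the stub-callback idea, in plain Python 3 threads so it runs anywhere. StandInFetcher is a made-up stand-in, not the megafetch or AsyncAPIProxy interface, and the URLs and status code are placeholders:

```python
import threading
import time


def noop_callback(result):
    """Stub callback: accept the result and do nothing with it."""


class StandInFetcher:
    """Hypothetical stand-in for an async fetcher (NOT the megafetch API)."""

    def __init__(self):
        self._threads = []

    def start(self, url, callback=noop_callback):
        # Simulate an in-flight request on a background thread.
        def run():
            time.sleep(0.05)  # pretend network latency
            callback((url, 200))

        thread = threading.Thread(target=run)
        thread.start()
        self._threads.append(thread)

    def wait(self):
        # Block until every started request has invoked its callback.
        for thread in self._threads:
            thread.join()


results = []
fetcher = StandInFetcher()
fetcher.start("http://example.com/a", results.append)  # caller wants the result
fetcher.start("http://example.com/b")  # fire and forget: stub callback
fetcher.wait()  # a fire/forget caller would leave this call out
```

Whether an un-waited call actually completes on App Engine once the request handler returns is not something this sketch can show.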


@david - have you made this work with datastore calls yet? having some
issues trying to figure out how to set pbrequest/pbresponse variables

cheers
brian


On Mar 16, 12:05 pm, Joe Bowman <bowman.jos...@gmail.com> wrote:
> Wow that's great. The SDK might be problematic for you, as it appears
> to be very single-threaded; I know for a fact it can't reply to
> requests to itself.
>
> Out of curiosity, are you still using base urlfetch, or is it your own
> creation? It will be less of an issue once Google releases their
> scheduled tasks functionality, but if your solution had the ability to
> fire off urlfetch calls and not wait for a response, it could be a
> perfect fit for the gaeutilities cron utility.
>
> Currently it grabs a list of tasks it's supposed to run on request,
> sets a timestamp, runs one, then compares now() to the timestamp and,
> if the timedelta is more than 1 second, stops running tasks and
> finishes the request. It already appears your project would be perfect
> for running all necessary tasks at once, and I believe the MIT License
> is compatible with the BSD license I've released gaeutilities under,
> so would you have any personal objection to me including it in
> gaeutilities at some point, with proper attribution of course?
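
The time-budgeted loop described above could be sketched roughly like this. It is not the gaeutilities code; run_due_tasks, its budget parameter and the sleeping tasks are names and values made up for illustration:

```python
import time
from datetime import datetime, timedelta


def run_due_tasks(tasks, budget=timedelta(seconds=1)):
    """Run tasks in order until the time budget is exceeded.

    Returns the tasks that were not run, so a later request can pick them up.
    """
    started = datetime.now()  # the timestamp set before running anything
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        task()
        # Compare now() to the timestamp; stop once over budget.
        if datetime.now() - started > budget:
            break
    return pending


# Five tasks of roughly 30 ms each against a 50 ms budget.
tasks = [lambda: time.sleep(0.03)] * 5
leftover = run_due_tasks(tasks, budget=timedelta(seconds=0.05))
print("tasks left for the next request:", len(leftover))
```

Note that the check happens after each task, so at least one task always runs and a single slow task can overshoot the budget.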
>
> If you haven't seen that project, its URL is http://gaeutilities.appspot.com/
>
> On Mar 16, 11:03 am, David Wilson <d...@botanicus.net> wrote:
>
> > Joe,
>
> > I've only tested it in production. ;)
>
> > The code should work serially on the SDK, but I haven't tried yet.
>
> > David.
>
> > 2009/3/16 Joe Bowman <bowman.jos...@gmail.com>:
>
> > > Does the batch fetching work on live appengine applications, or
> > > only on the SDK?
>
> > > On Mar 16, 10:19 am, David Wilson <d...@botanicus.net> wrote:
> > >> I have no idea how definitive this is, but literally it means wall
> > >> clock time seems to be how CPU cost is measured. I guess this makes
> > >> sense for a few different reasons.
>
> > >> I found some internal function
> > >> "google3.apphosting.runtime._apphosting_runtime___python__apiproxy.get_request_cpu_usage"
> > >> with the docstring:
>
> > >>     Returns the number of megacycles used so far by this request.
> > >>     Does not include CPU used by API calls.
>
> > >> Calling it, then running time.sleep(5), then calling it again,
> > >> indicates thousands of megacycles used, yet in real terms the CPU was
> > >> probably doing nothing. I guess Datastore CPU, etc., is added on top
> > >> of this, but it seems to suggest to me that if you can drastically
> > >> reduce request time, quota usage should drop too.
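
The distinction between wall-clock time and time actually spent on the CPU can be seen in ordinary Python 3 with the standard-library timers. This does not call get_request_cpu_usage, which only exists inside the App Engine runtime; measure is a helper name made up for this sketch:

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent while running fn()."""
    wall_start = time.perf_counter()  # elapsed real time
    cpu_start = time.process_time()   # CPU time of this process, excludes sleep
    fn()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)


wall, cpu = measure(lambda: time.sleep(0.5))
print("wall: %.3fs  cpu: %.3fs" % (wall, cpu))
```

A sleeping process accumulates wall-clock time but almost no CPU time, so a meter that climbs during time.sleep() is behaving like the first number rather than the second.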
>
> > >> I have yet to do any kind of rough measurements of Datastore CPU, so
> > >> I'm not sure how correct this all is.
>
> > >> David.
>
> > >>  - One of the guys on IRC suggested this means that per-request cost
> > >> is scaled during peak usage (and thus internal services running
> > >> slower).
>
> > >> 2009/3/16 peterk <peter.ke...@gmail.com>:
>
> > >> > A couple of questions re. CPU usage..
>
> > >> > "CPU time quota appears to be calculated based on literal time"
>
> > >> > Can you clarify what you mean here? I presume each async request eats
> > >> > into your CPU budget. But you say:
>
> > >> > "since you can burn a whole lot more AppEngine CPU more cheaply using
> > >> > the async api"
>
> > >> > Can you clarify how that's the case?
>
> > >> > I would guess as long as you're being billed for the cpu-ms spent in
> > >> > your asynchronous calls, Google would let you hang yourself with them
> > >> > when it comes to billing.. :) so I presume they'd let you squeeze in
> > >> > as many as your original request, and its limit, will allow for?
>
> > >> > Thanks again.
>
> > >> > On Mar 16, 2:00 pm, David Wilson <d...@botanicus.net> wrote:
> > >> >> It's completely undocumented (at this stage, anyway), but definitely
> > >> >> seems to work. A few notes I've gathered:
>
> > >> >>  - CPU time quota appears to be calculated based on literal time,
> > >> >> rather than e.g. the UNIX concept of "time spent in running state".
>
> > >> >>  - I can fetch 100 URLs in 1.3 seconds from a machine colocated in
> > >> >> Germany using the asynchronous API. I can't begin to imagine how slow
> > >> >> (and therefore expensive in monetary terms) this would be using the
> > >> >> standard API.
>
> > >> >>  - The user-specified callback function appears to be invoked in a
> > >> >> separate thread; the RPC isn't "complete" until this callback
> > >> >> completes. The callback thread is still subject to the request
> > >> >> deadline.
>
> > >> >>  - It's a standard interface, and seems to have no parallel
> > >> >> restrictions at least for urlfetch and Datastore. However, I imagine
> > >> >> that it's possible restrictions may be placed here at some later
> > >> >> stage, since you can burn a whole lot more AppEngine CPU more cheaply
> > >> >> using the async api.
>
> > >> >>  - It's "standard" only insomuch as you have to fiddle with
> > >> >> AppEngine-internal protocolbuffer definitions for each service type.
> > >> >> This mostly means copy-pasting the standard sync call code from the
> > >> >> SDK, and hacking it to use pubsubhubbub's proxy code.
>
> > >> >> Per the last point, you might be better off waiting for an officially
> > >> >> sanctioned API for doing this, albeit I doubt the protocolbuffer
> > >> >> definitions change all that often.
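
To give a feel for why issuing fetches in parallel shortens request time so much, here is a small Python 3 sketch. It uses a thread pool from the standard library, not the App Engine async API, and fake_fetch sleeps for an invented 50 ms instead of touching the network:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, latency=0.05):
    """Stand-in for a URL fetch: sleeps instead of touching the network."""
    time.sleep(latency)
    return url, 200


urls = ["http://example.com/%d" % i for i in range(20)]

# One after another: total time is roughly the sum of the latencies.
start = time.perf_counter()
serial = [fake_fetch(u) for u in urls]
serial_time = time.perf_counter() - start

# All at once: total time is roughly the slowest single fetch.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    parallel = list(pool.map(fake_fetch, urls))
parallel_time = time.perf_counter() - start

print("serial: %.2fs  parallel: %.2fs" % (serial_time, parallel_time))
```

If request cost really does track wall-clock time, as the notes above suggest, the parallel version is the cheaper one even though it does the same work.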
>
> > >> >> Thanks to Brett Slatkin & co. for doing the digging required to get
> > >> >> the async stuff working! :)
>
> > >> >> David.
>
> > >> >> 2009/3/16 peterk <peter.ke...@gmail.com>:
>
> > >> >> > Very neat.. Thank you.
>
> > >> >> > Just to clarify, can we use this for all API calls? Datastore too? I
> > >> >> > didn't look very closely at the async proxy in pubsubhubbub..
>
> > >> >> > Asynchronous calls available on all apis might give a lot to chew
> > >> >> > on.. :) It's been a while since I've worked with async function
> > >> >> > calls or threading, might have to dig up some old notes to see
> > >> >> > where I could extract gains from it in my app. Some common cases
> > >> >> > might be worth the community documenting for all to benefit from,
> > >> >> > too.
>
> > >> >> > On Mar 16, 1:26 pm, David Wilson <d...@botanicus.net> wrote:
> > >> >> >> I've created a Google Code project to contain some batch utilities
> > >> >> >> I'm working on, based on async_apiproxy.py from pubsubhubbub[0]. The
> > >> >> >> project currently contains just a modified async_apiproxy.py that
> > >> >> >> doesn't require dummy google3 modules on the local machine, and a
> > >> >> >> megafetch.py, for batch-fetching URLs.
>
> > >> >> >>    http://code.google.com/p/appengine-async-tools/
>
> > >> >> >> David
>
> > >> >> >> [0] http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/async_a...
>
> > >> >> >> --
> > >> >> >> It is better to be wrong than to be vague.
> > >> >> >>   — Freeman Dyson