Hey Joe,

With the gdata package you can do something like this instead:
As usual, completely untested code, but looks about right..

import logging

import megafetch
from gdata.youtube import YouTubeVideoFeedFromString


def get_feeds_async(usernames):
    fetcher = megafetch.Fetcher()
    output = {}

    def cb(username, result):
        # megafetch hands the callback an Exception instance on failure.
        if isinstance(result, Exception):
            logging.error('could not fetch %s: %s', username, result)
            content = None
        else:
            content = YouTubeVideoFeedFromString(result.content)
        output[username] = content

    for username in usernames:
        url = 'http://gdata.youtube.com/feeds/api/users/%s/uploads' % \
              (username,)
        # Bind username as a default argument: a bare closure would see
        # only the final loop value by the time the callback fires.
        fetcher.start(url,
                      lambda result, username=username: cb(username, result))
    fetcher.wait()
    return output


feeds = get_feeds_async([
    'davemw',
    'waverlyflams',
    'googletechtalks',
    'TheOnion',
    'winterelaxation',
])
# feeds is now a mapping of usernames to YouTubeVideoFeed instances,
# or None if the feed could not be fetched.

2009/3/18 Joe Bowman <bowman.jos...@gmail.com>:
>
> This may be a really dumb question, but.. I'm still learning so...
>
> Is there a way to do something other than a direct api call
> asynchronously? I'm writing a script that pulls from multiple sources,
> sometimes with higher-level calls that use urlfetch, such as gdata.
> Since I'm attempting to pull from multiple sources, and sometimes
> multiple urls from each source, I'm trying to figure out if it's
> possible to run other methods at the same time.
>
> For example, I want to pull a youtube entry for several different
> authors. The youtube api doesn't allow multiple authors in a request
> (I have an enhancement request in for that though), so I need to do a
> yt_service.GetYouTubeVideoFeed() for each author, then splice them
> together into one feed. As I'm also working with Boss, and eventually
> Twitter, I'll have feeds to pull from those sources as well.
>
> My current application layout is using appengine-patch to provide
> django. I've set up a Boss and Youtube "model" with get methods that
> handle getting the data.
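One pitfall worth flagging with callbacks created in a loop like the one
above: Python closures are late-binding, so a plain
`lambda result: cb(username, result)` would see only whatever username
holds when the callback finally fires, i.e. the last one. Binding it as
a default argument freezes the value at definition time. A quick
plain-Python illustration, no App Engine bits involved:

```python
# Late binding: each lambda closes over the loop variable itself, so
# every one of them sees the value it held when the loop finished.
late = [lambda: name for name in ['davemw', 'TheOnion', 'winterelaxation']]
print([f() for f in late])  # three times 'winterelaxation'

# A default argument is evaluated at definition time, freezing the value.
bound = [lambda name=name: name
         for name in ['davemw', 'TheOnion', 'winterelaxation']]
print([f() for f in bound])  # ['davemw', 'TheOnion', 'winterelaxation']
```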
> So I can do something similar to:
>
> web_results = models.Boss.get(request.GET['term'], start=start)
> news_results = models.Boss.get(request.GET['term'], vertical="news",
>                                start=start)
> youtube = models.Youtube.get(request.GET['term'], start=start)
>
> Ideally, I'd like some of those models to be able to do asynchronous
> tasks within their get function, and then also I'd like to run the
> above requests at the same time, which should really speed the request
> up.
>
> On Mar 17, 9:20 am, Joe Bowman <bowman.jos...@gmail.com> wrote:
>> Thanks,
>>
>> I'm going to give it a go for urlfetch calls for one project I'm
>> working on this week.
>>
>> Not sure when I'd be able to include it in gaeutilities for cron and
>> such; that project is currently lower on my priority list at the
>> moment, but I can't wait until I get a chance to play with it.
>> Another idea I had for it is the ROTmodel (retry-on-timeout model)
>> in the project, which it could speed up.
>>
>> On Mar 17, 9:11 am, David Wilson <d...@botanicus.net> wrote:
>>> 2009/3/16 Joe Bowman <bowman.jos...@gmail.com>:
>>>
>>>> Wow, that's great. The SDK might be problematic for you, as it
>>>> appears to be very single-threaded; I know for a fact it can't
>>>> reply to requests to itself.
>>>>
>>>> Out of curiosity, are you still using base urlfetch, or is it your
>>>> own creation? While it will be less of an issue once Google
>>>> releases their scheduled tasks functionality, if your solution had
>>>> the ability to fire off urlfetch calls and not wait for a
>>>> response, it could be a perfect fit for the gaeutilities cron
>>>> utility.
>>>>
>>>> Currently it grabs a list of tasks it's supposed to run on
>>>> request, sets a timestamp, runs one, then compares now() to the
>>>> timestamp, and if the timedelta is more than 1 second, stops
>>>> running tasks and finishes the request.
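That time-budgeted loop is simple enough to sketch in plain Python.
Everything below (function name, task list, budget) is invented for
illustration rather than lifted from gaeutilities, but it follows the
steps described above: stamp the start time, run a task, compare now()
to the stamp, and stop once the delta exceeds the budget.

```python
import datetime
import time


def run_cron_tasks(tasks, budget_seconds=1):
    """Run queued tasks until a wall-clock budget is spent.

    Returns the tasks that actually ran; anything left over waits
    for the next request to pick it up.
    """
    started = datetime.datetime.now()
    ran = []
    for task in tasks:
        task()
        ran.append(task)
        elapsed = datetime.datetime.now() - started
        if elapsed > datetime.timedelta(seconds=budget_seconds):
            break
    return ran


# With a generous budget every task runs; with a tiny budget the loop
# stops after the first slow task.
tasks = [lambda: None, lambda: None, lambda: None]
print(len(run_cron_tasks(tasks, budget_seconds=5)))  # 3

slow = [lambda: time.sleep(0.01), lambda: None, lambda: None]
print(len(run_cron_tasks(slow, budget_seconds=0.005)))  # 1
```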
>>>> It already appears your project would be perfect for running all
>>>> necessary tasks at once, and the MIT License I believe is
>>>> compatible with the BSD license I've released gaeutilities under,
>>>> so would you have any personal objection to me including it in
>>>> gaeutilities at some point, with proper attribution of course?
>>>
>>> Sorry I missed this in the first reply - yeah, work away! :)
>>>
>>> David
>>>
>>>> If you haven't seen that project, its url is
>>>> http://gaeutilities.appspot.com/
>>>>
>>>> On Mar 16, 11:03 am, David Wilson <d...@botanicus.net> wrote:
>>>>> Joe,
>>>>>
>>>>> I've only tested it in production. ;)
>>>>>
>>>>> The code should work serially on the SDK, but I haven't tried yet.
>>>>>
>>>>> David.
>>>>>
>>>>> 2009/3/16 Joe Bowman <bowman.jos...@gmail.com>:
>>>>>
>>>>>> Does the batch fetching work on live appengine applications, or
>>>>>> only on the SDK?
>>>>>>
>>>>>> On Mar 16, 10:19 am, David Wilson <d...@botanicus.net> wrote:
>>>>>>> I have no idea how definitive this is, but it literally means
>>>>>>> wall-clock time seems to be how CPU cost is measured. I guess
>>>>>>> this makes sense for a few different reasons.
>>>>>>>
>>>>>>> I found some internal function
>>>>>>> "google3.apphosting.runtime._apphosting_runtime___python__apiproxy.get_request_cpu_usage"
>>>>>>> with the docstring:
>>>>>>>
>>>>>>>     Returns the number of megacycles used so far by this request.
>>>>>>>     Does not include CPU used by API calls.
>>>>>>>
>>>>>>> Calling it, then running time.sleep(5), then calling it again,
>>>>>>> indicates thousands of megacycles used, yet in real terms the
>>>>>>> CPU was probably doing nothing. I guess Datastore CPU, etc., is
>>>>>>> added on top of this, but it seems to suggest to me that if you
>>>>>>> can drastically reduce request time, quota usage should drop
>>>>>>> too.
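(The sleep experiment translates directly to the standard library if
you want to see the wall-clock vs. actual-CPU distinction for yourself;
time.process_time() is the modern CPython way to read CPU actually
consumed by the process, and the numbers here are just for
illustration:)

```python
import time

wall_start = time.time()
cpu_start = time.process_time()

time.sleep(0.5)  # idle: no computation, just elapsed time

wall_used = time.time() - wall_start
cpu_used = time.process_time() - cpu_start

# The wall clock advances ~0.5s while process CPU time barely moves;
# a quota charged on literal time bills the idle half-second anyway.
print('wall: %.3fs  cpu: %.3fs' % (wall_used, cpu_used))
```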
>>>>>>> I have yet to do any kind of rough measurements of Datastore
>>>>>>> CPU, so I'm not sure how correct this all is.
>>>>>>>
>>>>>>> David.
>>>>>>>
>>>>>>> - One of the guys on IRC suggested this means that per-request
>>>>>>> cost is scaled during peak usage (and thus internal services
>>>>>>> running slower).
>>>>>>>
>>>>>>> 2009/3/16 peterk <peter.ke...@gmail.com>:
>>>>>>>
>>>>>>>> A couple of questions re. CPU usage..
>>>>>>>>
>>>>>>>> "CPU time quota appears to be calculated based on literal time"
>>>>>>>>
>>>>>>>> Can you clarify what you mean here? I presume each async
>>>>>>>> request eats into your CPU budget. But you say:
>>>>>>>>
>>>>>>>> "since you can burn a whole lot more AppEngine CPU more
>>>>>>>> cheaply using the async api"
>>>>>>>>
>>>>>>>> Can you clarify how that's the case?
>>>>>>>>
>>>>>>>> I would guess as long as you're being billed for the cpu-ms
>>>>>>>> spent in your asynchronous calls, Google would let you hang
>>>>>>>> yourself with them when it comes to billing.. :) so I presume
>>>>>>>> they'd let you squeeze in as many as your original request,
>>>>>>>> and its limit, will allow for?
>>>>>>>>
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>> On Mar 16, 2:00 pm, David Wilson <d...@botanicus.net> wrote:
>>>>>>>>> It's completely undocumented (at this stage, anyway), but it
>>>>>>>>> definitely seems to work. A few notes I've gathered:
>>>>>>>>>
>>>>>>>>> - CPU time quota appears to be calculated based on literal
>>>>>>>>> time, rather than e.g. the UNIX concept of "time spent in
>>>>>>>>> running state".
>>>>>>>>>
>>>>>>>>> - I can fetch 100 URLs in 1.3 seconds from a machine
>>>>>>>>> colocated in Germany using the asynchronous API. I can't
>>>>>>>>> begin to imagine how slow (and therefore expensive in
>>>>>>>>> monetary terms) this would be using the standard API.
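(That 100-URLs-in-1.3s figure is just round-trips overlapping instead
of queueing. The effect is easy to simulate with sleeps standing in for
network latency; the plain threads below are purely for the simulation,
since the App Engine version rides the async apiproxy rather than user
threads, and fake_fetch is obviously made up:)

```python
import threading
import time


def fake_fetch(url, latency=0.05):
    time.sleep(latency)  # stand-in for a network round-trip
    return 'body of ' + url


urls = ['http://example.com/%d' % i for i in range(10)]

# Serial: total time is roughly the sum of all round-trips.
start = time.time()
serial = [fake_fetch(u) for u in urls]
serial_elapsed = time.time() - start

# Overlapped: round-trips run concurrently, so total time is roughly
# one round-trip regardless of how many URLs there are.
results = {}
threads = [threading.Thread(target=lambda u=u: results.__setitem__(u, fake_fetch(u)))
           for u in urls]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel_elapsed = time.time() - start

print('serial: %.2fs  overlapped: %.2fs' % (serial_elapsed, parallel_elapsed))
```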
>>>>>>>>> - The user-specified callback function appears to be invoked
>>>>>>>>> in a separate thread; the RPC isn't "complete" until this
>>>>>>>>> callback completes. The callback thread is still subject to
>>>>>>>>> the request deadline.
>>>>>>>>>
>>>>>>>>> - It's a standard interface, and seems to have no parallel
>>>>>>>>> restrictions, at least for urlfetch and Datastore. However, I
>>>>>>>>> imagine it's possible restrictions may be placed here at some
>>>>>>>>> later stage, since you can burn a whole lot more AppEngine
>>>>>>>>> CPU more cheaply using the async api.
>>>>>>>>>
>>>>>>>>> - It's "standard" only insomuch as you have to fiddle with
>>>>>>>>> AppEngine-internal protocolbuffer definitions for each
>>>>>>>>> service type. This mostly means copy-pasting the standard
>>>>>>>>> sync call code from the SDK, and hacking it to use
>>>>>>>>> pubsubhubbub's proxy code.
>>>>>>>>>
>>>>>>>>> Per the last point, you might be better off waiting for an
>>>>>>>>> officially sanctioned API for doing this, albeit I doubt the
>>>>>>>>> protocolbuffer definitions change all that often.
>>>>>>>>>
>>>>>>>>> Thanks to Brett Slatkin & co. for doing the digging required
>>>>>>>>> to get the async stuff working! :)
>>>>>>>>>
>>>>>>>>> David.
>>>>>>>>>
>>>>>>>>> 2009/3/16 peterk <peter.ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Very neat.. Thank you.
>>>>>>>>>>
>>>>>>>>>> Just to clarify, can we use this for all API calls?
>>>>>>>>>> Datastore too? I didn't look very closely at the async proxy
>>>>>>>>>> in pubsubhubbub..
>>>>>>>>>>
>>>>>>>>>> Asynchronous calls available on all apis might give a lot to
>>>>>>>>>> chew on.. :) It's been a while since I've worked with async
>>>>>>>>>> function calls or threading; might have to dig up some old
>>>>>>>>>> notes to see where I could extract gains from it in my app.
>>>>>>>>>> Some common cases might be worth the community documenting
>>>>>>>>>> for all to benefit from, too.
>>>>>>>>>>
>>>>>>>>>> On Mar 16, 1:26 pm, David Wilson <d...@botanicus.net> wrote:
>>>>>>>>>>> I've created a Google Code project to contain some batch
>>>>>>>>>>> utilities I'm working on, based on async_apiproxy.py from
>>>>>>>>>>> pubsubhubbub[0]. The project currently contains just a
>>>>>>>>>>> modified async_apiproxy.py that doesn't require dummy
>>>>>>>>>>> google3 modules on the local machine, and a megafetch.py,
>>>>>>>>>>> for batch-fetching URLs.
>>>>>>>>>>>
>>>>>>>>>>> http://code.google.com/p/appengine-async-tools/
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> [0] http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/async_a...
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> It is better to be wrong than to be vague.
>>>>>>>>>>> — Freeman Dyson

--
It is better to be wrong than to be vague.
— Freeman Dyson

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---