Re: [google-appengine] Threaded Python AppEngine for Dummies

2011-10-14 Thread Emlyn
I've posted a modified version of my original post here as a blog post.

Multi-threaded Python 2.7 WTFAQ?
http://appenginedevelopment.blogspot.com/2011/10/multi-threaded-python-27-wtfaq.html

-- 
Emlyn

http://my.syyn.cc - Synchronise Google+, Facebook, WordPress and Google
Buzz posts, comments and all.
http://point7.wordpress.com - My blog
Find me on Facebook and Buzz




Re: [google-appengine] Threaded Python AppEngine for Dummies

2011-10-14 Thread Emlyn
On 15 October 2011 09:12, Ikai Lan (Google)  wrote:
> Yep, your thinking here is correct! Be careful when using global memory as a
> cache, though. Instances are capped at 128mb of memory, and if you exceed
> that, your instance will be killed. This could lead to instance thrashing.
> [On another note: congrats, you got me to read a long email ;).]
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> plus.ikailan.com | twitter.com/ikai

Thanks for reading the long email. Sorry, I should keep them shorter,
but I'm a natural blatherer.

I want to run some tests on the efficacy of using threadsafe:true.
Actually hitting real resources in those tests is a bit rude
(datastore might be ok, but urlfetch is a bit tough on the
target/victim).

If I use time.sleep() (e.g. frontend tasks that basically just do
time.sleep(10)), is that going to block in the same way as urlfetch
or db gets/puts, i.e. in a way that'll let the instance process more
work?
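For local experiments only: in plain CPython, `time.sleep()` releases the interpreter lock, so other threads keep running while one sleeps. The minimal sketch below (hypothetical names `fake_request` and `run_concurrently`) shows five threads, each sleeping 0.2s, finishing in roughly 0.2s rather than 1s. It says nothing about whether App Engine's scheduler treats a sleeping request like one blocked on urlfetch or the datastore; that's exactly what the experiment above would have to measure.

```python
import threading
import time


def fake_request(delay, results, index):
    # Stand-in for a handler that blocks on something slow (e.g. an RPC).
    # time.sleep() releases the GIL, so other threads run meanwhile.
    time.sleep(delay)
    results[index] = "done"


def run_concurrently(n_requests, delay):
    """Run n_requests sleeping 'requests' in parallel threads.

    Returns (elapsed_seconds, results).
    """
    results = [None] * n_requests
    threads = [
        threading.Thread(target=fake_request, args=(delay, results, i))
        for i in range(n_requests)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start, results


if __name__ == "__main__":
    elapsed, results = run_concurrently(5, 0.2)
    # Expect roughly 0.2s, not 5 * 0.2 = 1.0s.
    print("elapsed: %.2fs" % elapsed)
```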



Re: [google-appengine] Threaded Python AppEngine for Dummies

2011-10-14 Thread Anand Mistry
If you are using instance caching, you can use the runtime API 
(http://code.google.com/appengine/docs/python/backends/runtimeapi.html) to 
look up your instance memory usage in order to control your cache size.
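A minimal sketch of that idea, assuming the cache is an `OrderedDict` (oldest entries first) guarded by a lock, and a `memory_probe` callable that returns current usage in MB. On App Engine the probe might be something like `lambda: runtime.memory_usage().current()` from `google.appengine.api.runtime`, but check the runtime API docs above; here a fake probe is used so the sketch runs anywhere. `trim_cache` and the 100 MB budget are illustrative, not part of any API.

```python
import threading
from collections import OrderedDict


def trim_cache(cache, lock, memory_probe, budget_mb):
    """Evict the oldest cache entries until memory use is within budget.

    cache:        an OrderedDict with the oldest entries first
    lock:         the lock that guards every access to `cache`
    memory_probe: zero-argument callable returning current usage in MB
    budget_mb:    the usage level to trim back down to
    Returns the number of entries evicted.
    """
    evicted = 0
    with lock:
        while cache and memory_probe() > budget_mb:
            cache.popitem(last=False)  # drop the oldest entry
            evicted += 1
    return evicted


if __name__ == "__main__":
    cache = OrderedDict((i, "x" * 100) for i in range(10))
    # Fake probe for local runs: pretend each entry costs 10 MB.
    fake_probe = lambda: 50 + len(cache) * 10
    print(trim_cache(cache, threading.Lock(), fake_probe, 100))  # prints 5
```

One caveat: a real process-level probe may not drop right after an eviction, since Python doesn't necessarily hand freed memory back to the OS, so in practice you might also cap how many entries one call may evict.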

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/FBIsJ9U7h7cJ.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



Re: [google-appengine] Threaded Python AppEngine for Dummies

2011-10-14 Thread Ikai Lan (Google)
Yep, your thinking here is correct! Be careful when using global memory as a
cache, though. Instances are capped at 128 MB of memory, and if you exceed
that, your instance will be killed. This could lead to instance thrashing.
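Capping a global-memory cache by entry count is only a rough proxy for memory, since entries vary in size, but it at least stops the cache growing without limit. A minimal sketch of a thread-safe LRU cache; `BoundedCache` and `max_entries` are illustrative names, and a sensible cap depends on your entry sizes and the instance memory limit.

```python
import threading
from collections import OrderedDict


class BoundedCache(object):
    """A small thread-safe LRU cache capped at max_entries items."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()  # least recently used first
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            value = self._data.pop(key)
            self._data[key] = value  # re-insert: now most recently used
            return value

    def set(self, key, value):
        with self._lock:
            self._data.pop(key, None)
            self._data[key] = value
            while len(self._data) > self.max_entries:
                self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        with self._lock:
            return len(self._data)
```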

[On another note: congrats, you got me to read a long email ;).]

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com | twitter.com/ikai




[google-appengine] Threaded Python AppEngine for Dummies

2011-10-14 Thread Emlyn
These are my first thoughts about approaching threaded python 2.7
apps. Please critique this, I could be totally wrong here! And I don't
want to be wrong. Thanks in advance.



Hello, dummy here.

I'm just beginning my first experiments with python 2.7 apps, using
"threadsafe: true". But I'm a clueless n00b as far as python goes.
Well, not a n00b, but still a beginner. And then this multi-threading
thing turns up, and I find myself groaning "oh man, really, does it
have to get this complex?" I think I hear a lot of similar groans out
there ;-)

I'm betting that the whole "multithreaded" thing in python appengine
apps is scaring plenty of people. I've done a lot of concurrent
programming, but the prospect of dealing with threading in python has
daunted me a bit because I'm a beginner with python and appengine as
it is - this just makes life harder. But hey, it's being added for a
reason; I'd best quit complaining and start figuring it out!

Thinking about threads and python, I realised that I didn't know how I
needed to actually use multi-threading to make my apps leaner and
meaner. I mean, why would I use them? They're for doing inherently
concurrent things. Serving up pages isn't inherently concurrent stuff,
at the app development level. What exactly is expected here? Shouldn't
the framework be doing that kind of thing for me?

And of course that was the aha moment. The framework *is* doing the work for me.

The situation with python appengine development up until now has been
that instances process serially. They take a request, see it through
to its end. They take another request. And so on. That's cool, but
instances spend a lot of time sitting around waiting when they could
be doing more work.

But with the new python 2.7 support, you can tell appengine that it
would be ok to give instances more work when they are blocked waiting
for something, e.g. a big URL fetch or a long datastore query. Then
it's cool to give them another request to begin working on, and come
back to the waiting request later when it's ready. You do that by
setting "threadsafe: true" in your app.yaml.

Being threadsafe sounds scary! But actually it shouldn't be a huge
deal. Pretty much it's about what you shouldn't do.

Multi-threading means having multiple points of execution on the one
codebase in the one address space. Anything you do that touches things
external to that (like datastore, memcache, URL fetches) shouldn't be
affected (assuming the client libraries are threadsafe). And normal
code touching local variables will be fine, because each thread's
function calls get their own copies of those.

Probably the only real thing you've got to worry about is using
instance memory (global variables, more or less). That's because
multiple requests, i.e. multiple threads, can come in and fiddle with
that global memory at the same time. You can fix that with some
concurrency primitives, but if that sounds scary you can just avoid
touching global memory in the first place.
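A minimal sketch of the "concurrency primitives" option, assuming a simple module-level hit counter (`record_hit`, `handle_request`, `hit_count` and `_hit_count` are hypothetical names). The shared global is guarded by a `threading.Lock`; the per-request local variable needs nothing.

```python
import threading

# Module-level ("instance memory") state shared by every request thread.
_hit_count = 0
_hit_lock = threading.Lock()


def record_hit():
    """Increment the shared counter safely from any thread."""
    global _hit_count
    # `_hit_count += 1` is a read-modify-write, not one atomic step;
    # without the lock two threads can read the same old value.
    with _hit_lock:
        _hit_count += 1


def handle_request(n_items):
    # `total` is a local variable, private to this call, so no lock.
    total = sum(range(n_items))
    record_hit()
    return total


def hit_count():
    with _hit_lock:
        return _hit_count
```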

So if you're using instance memory as part of a caching strategy, for
instance (caching like instance-memory -> memcache -> datastore), then
you either need to make the instance memory caching threadsafe, or
just stop using instance memory for that purpose.
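If you'd rather keep the instance-memory layer than drop it, here's a minimal thread-safe read-through sketch. The `loaders` callables are hypothetical stand-ins for the memcache and datastore steps (not the App Engine APIs) so it runs anywhere, and the dict is unbounded, so in practice it still needs a size limit.

```python
import threading

_local_cache = {}
_local_lock = threading.Lock()


def cached_lookup(key, loaders):
    """Look key up in instance memory, then in each loader in turn.

    loaders: callables tried in order (e.g. a memcache lookup, then a
    datastore lookup); each returns the value, or None on a miss.
    """
    with _local_lock:
        if key in _local_cache:
            return _local_cache[key]
    # Do the slow lookups outside the lock so other threads aren't stuck
    # behind an RPC; two threads may occasionally both load the same key.
    for load in loaders:
        value = load(key)
        if value is not None:
            with _local_lock:
                # Keep whichever value got stored first.
                _local_cache.setdefault(key, value)
                return _local_cache[key]
    return None
```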

The other big gotcha, implied by this issue with global memory, is
libraries. Which libraries are threadsafe? Plenty probably aren't,
especially some of those shady 3rd party python libs you found lying
around on code.google.com. Why not? Because they use global memory.
But the built-in libs should be ok, unless we've been specifically
told they're not, and I don't recall any information like that.
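To see why module-level state breaks under threads, here's a hypothetical library that remembers the "current user" in a global, next to a version using `threading.local()`, which gives each thread its own copy. It's a generic Python sketch, not taken from any particular library, and only illustrates one common cause; it's no substitute for checking what a given library actually does.

```python
import threading

# Unsafe pattern: one module-level slot shared by every request thread.
_current_user = None

# Safer pattern: threading.local() gives each thread its own attributes.
_request_state = threading.local()


def set_current_user_unsafe(user):
    global _current_user
    _current_user = user  # another thread may overwrite this at any time


def get_current_user_unsafe():
    return _current_user


def set_current_user(user):
    _request_state.user = user


def get_current_user():
    # Threads that never called set_current_user() get None.
    return getattr(_request_state, "user", None)
```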

Oh, and your app needs to use WSGI script handlers, presumably because
the cgi method we were recommended to use in py 2.5 apps is not
threadsafe.
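For reference, a WSGI application is just a callable taking `environ` and `start_response`. A framework-free sketch (the Python 2.7 getting-started guide uses webapp2 instead); with the 2.7 runtime, app.yaml's `script:` entry names a module-level object such as `main.app` rather than a CGI script. The module name `main` and this handler are illustrative assumptions.

```python
def app(environ, start_response):
    """Minimal WSGI application that keeps no request data in globals."""
    # Everything request-specific comes from `environ` and stays local.
    name = environ.get("PATH_INFO", "/").strip("/") or "World"
    body = ("Hello, %s!" % name).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To poke at it locally, the standard library's `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` will serve it, one request at a time.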

So to sum up, if you aren't too sure about multi-threading and want to
keep it simple, it seems like you can get your existing app processing
parallel requests by doing the following:
 - Remove uses of global instance memory (if you don't know what that
   means, you're probably not doing it anyway)
 - Remove/replace non-threadsafe libraries (tricky - do more
   experienced pythonistas know of any way to easily determine this?
   e.g. pre-existing lists?)
 - Modify your app starting point, the bit that wrangles your
   WSGIApplication, so that it works like this:
   http://code.google.com/appengine/docs/python/gettingstartedpython27/helloworld.html
   and not like this:
   http://code.google.com/appengine/docs/python/gettingstarted/usingwebapp.html
 - Set up your app.yaml properly, as per:
   http://code.google.com/appengine/docs/python/gettingstartedpython27/helloworld.html
 - Update your SDK to 1.5.5 (or later), otherwise it'll refuse to upload.

I don't think the dev appserver will run your code concurrently yet,
but you can always set threadsafe: false for local development, then
change it before you upload.

On a related note, there is other stuff that you need to check to make
sure your ap