We are using backends to parallelize large batches of work for our
users. This involves adding hundreds of tasks to a dedicated queue, which
spreads the tasks across 5 dynamic backends. Many of these tasks do a
URL fetch to external web services.
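
For context, the enqueue side looks roughly like the sketch below (the
queue name "batch-queue", backend name "worker", handler path and item
ids are placeholders for our real ones):

    import java.util.List;
    import com.google.appengine.api.backends.BackendServiceFactory;
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    // Fan the batch out as one task per work item, routed to the
    // dedicated pool of dynamic backends.
    public void enqueueBatch(List<String> itemIds) {
        Queue queue = QueueFactory.getQueue("batch-queue");
        for (String itemId : itemIds) {
            queue.add(TaskOptions.Builder
                .withUrl("/tasks/fetch")
                .param("itemId", itemId)
                // Route the task to the dynamic backend named "worker".
                .header("Host",
                    BackendServiceFactory.getBackendService()
                        .getBackendAddress("worker")));
        }
    }

Each task handler then does the URL fetch against the external service.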

When we run this for one user it seems to work fine, but as soon as we
run it for many users (e.g. in a nightly batch run), most of the
backends eventually stop processing new tasks. We've tried rate
limiting, but that has not helped either. I can manually stop the
frozen backends and new ones will fire up and start processing until
they too freeze. Eventually it all completes after a few iterations of
stopping frozen backends.

Obviously this is severely limiting our ability to scale, so I'm
wondering how to diagnose the problem. We cannot reproduce it on
localhost, since it's not really multi-threaded and doesn't truly
replicate the deployed data patterns.

We could possibly handle this by using pull queues but that's more
code to write and I'd rather let the queue/backend scheduler do that
work for us.
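
(For reference, my understanding is that the pull-queue version of the
worker loop would look roughly like this; the queue name, lease time and
batch size are made up, and the queue would need mode "pull" in
queue.xml:)

    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskHandle;

    // Backend-side loop: lease a batch of tasks, process them,
    // then delete the ones that succeeded.
    public void drainPullQueue() {
        Queue pullQueue = QueueFactory.getQueue("batch-pull-queue");
        // Lease up to 100 tasks for 10 minutes.
        List<TaskHandle> leased =
            pullQueue.leaseTasks(600, TimeUnit.SECONDS, 100);
        for (TaskHandle task : leased) {
            // processItem is a stand-in for our own processing code.
            processItem(new String(task.getPayload()));
            pullQueue.deleteTask(task);
        }
    }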

What I'd like to do is attach a profiler to a backend - that would
immediately tell me where the freezes are coming from. Is that even
possible? If not, does anybody have any other tricks that we could use
to diagnose frozen backends?
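
One thing I've considered as a poor man's profiler is exposing a handler
on the backend that dumps every live thread's stack trace, and hitting
it via the backend address when an instance looks frozen. Something like
the sketch below (the servlet name and mapping are my own, and I'm
assuming the sandbox allows Thread.getAllStackTraces() on backends):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Map;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Dumps the stack of every live thread so we can see what a
    // "frozen" instance is actually blocked on.
    public class ThreadDumpServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            PrintWriter out = resp.getWriter();
            for (Map.Entry<Thread, StackTraceElement[]> entry
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = entry.getKey();
                out.println(t.getName() + " (" + t.getState() + ")");
                for (StackTraceElement frame : entry.getValue()) {
                    out.println("    at " + frame);
                }
                out.println();
            }
        }
    }

Would that be reliable, or is there a better built-in way to get the
same information?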

FYI: the "frozen" instances show memory usage ranging from 150 MB to
250 MB, consume zero CPU while frozen, and process no new tasks.

Thanks for your attention and help, Steve
