Hi Pol, I think I have a change that will fix this, which should be available once 1.5.4 is pushed out. What is the app-id here?
On Mon, Sep 5, 2011 at 10:41 AM, Pol <i...@pol-online.net> wrote:
> Hi,
>
> Our app settings are as follows:
> - Python + HRD
> - Max Idle Instances: ( 2 )
> - Min Pending Latency: ( 100ms )
> As of right now, there are 3 instances alive.
>
> Without going too much into details, we have GAE integrated with EC2,
> on which we run remote image processing tools. The tools are called
> directly using HTTP GETs from GAE and they return their results as
> JSON (with gzip content encoding).
>
> There are currently 3 tasks in the processing queue on GAE
> continuously failing: the urlfetch() calls to the EC2 tool reach the
> 10-second timeout and bail. What doesn't make sense is that calling
> the EC2 tool directly using curl from random machines succeeds in less
> than 1 second.
>
> But here's the trick: under certain circumstances, the EC2 tool will
> call back to GAE (a HEAD request that does a single db.get()) to check
> if the image has already been processed, and this happens for these 3
> stuck tasks.
>
> When calling the EC2 tool from the command line with curl, we see the
> normal behavior:
> - EC2 tool retrieves the image from an arbitrary URL and computes a hash
> - EC2 tool does a HEAD call to GAE passing this hash to see if the image
>   has already been processed
> - If yes, return empty JSON
> - If no, process the image and return full JSON
> This takes about 1 second.
>
> The exact same call done from GAE produces this behavior:
> - EC2 tool retrieves the image from an arbitrary URL and computes a hash
> - EC2 tool does a HEAD call to GAE passing this hash to see if the image
>   has already been processed
>   -> HEAD call hangs <--- RE-ENTRANCY / DEADLOCK BUG in GAE
>   -> urlfetch() from GAE to EC2 reaches its 10-second deadline and
>      aborts <-- BREAKS DEADLOCK
>   -> HEAD call finally returns
> - EC2 tool completes normally
>
> GAE logs confirm the bug:
>
> HEAD call from EC2 / curl origin
> 2011-09-05 10:19:52.502 /api/has_backing?
> bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703
> 200 368ms 48cpu_ms 8api_cpu_ms 0kb Everpix-Processor
>
> HEAD call from EC2 / GAE origin
> 2011-09-05 10:20:44.670 /api/has_backing?
> bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703
> 200 9712ms 48cpu_ms 8api_cpu_ms 0kb Everpix-Processor
> 2011-09-05 10:20:44.547 /task/import_photo 500 10348ms 28cpu_ms
> 8api_cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
> (See how the HEAD /api/has_backing call hangs for almost 10 seconds
> and only returns *after* /task/import_photo and its urlfetch() call to
> EC2 have aborted.)
>
> And finally, AppStats confirms that it's not the head() Python
> execution itself that's hanging:
>
> (1) 2011-09-05 09:16:06.843 "HEAD /api/has_backing?
> bid=3bc4aeb08e01d3ba4bfab3282d2a198984a4fc1fab2ad9d1e8a39ee3cddd14da"
> 200 real=227ms cpu=24ms api=8ms overhead=0ms (1 RPC)
> (2) 2011-09-05 09:15:56.422 "POST /task/import_photo" 500 real=10002ms
> cpu=33ms api=8ms overhead=0ms (3 RPCs)
> (3) 2011-09-05 09:15:49.328 "HEAD /api/has_backing?
> bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703"
> 200 real=297ms cpu=21ms api=8ms overhead=0ms (1 RPC)
>
> This issue is currently 100% reproducible.
>
> - Pol
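
For anyone else following along, the failure mode above boils down to two
handlers in the same app, one of which calls out to EC2 and thereby
indirectly triggers the other. Here is a minimal sketch of what that looks
like on the GAE Python side; the handler classes, the ImageBacking model and
the EC2 URL are placeholders, not the actual Everpix code:

    from google.appengine.api import urlfetch
    from google.appengine.ext import db, webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    # Placeholder URL for the EC2-hosted image processing tool.
    EC2_TOOL_URL = 'http://ec2-tool.example.com/process'

    class ImageBacking(db.Model):
        """Placeholder model, keyed by the image hash ('bid')."""
        pass

    class HasBackingHandler(webapp.RequestHandler):
        # HEAD /api/has_backing?bid=<hash>: a single db.get(), normally a few
        # hundred milliseconds according to the logs above.
        def head(self):
            bid = self.request.get('bid')
            entity = db.get(db.Key.from_path('ImageBacking', bid))
            self.response.set_status(200 if entity else 404)

    class ImportPhotoHandler(webapp.RequestHandler):
        # POST /task/import_photo (task queue): calls out to the EC2 tool. The
        # tool then issues a HEAD /api/has_backing back into this same app, and
        # that inner request is the one that hangs until the outer urlfetch()
        # hits its 10-second deadline.
        def post(self):
            result = urlfetch.fetch(
                EC2_TOOL_URL + '?image=' + self.request.get('image_url'),
                method=urlfetch.GET,
                deadline=10)
            # ... parse result.content (gzip'd JSON) and store it ...

    application = webapp.WSGIApplication([
        ('/api/has_backing', HasBackingHandler),
        ('/task/import_photo', ImportPhotoHandler),
    ], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()

The inner HEAD request only completing after the outer urlfetch() gives up is
consistent with the 9712ms / 10348ms pairing in the logs above.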
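
And for completeness, the external check that works from any other machine is
nothing more than a HEAD request against the same endpoint. A rough Python
equivalent of the curl call mentioned above; the hash is copied from the logs,
and the appspot hostname is only a guess based on the app name:

    import httplib

    # Same check the EC2 tool (or curl) performs against the app.
    BID = '90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703'

    conn = httplib.HTTPConnection('everpix-processor.appspot.com')
    conn.request('HEAD', '/api/has_backing?bid=' + BID)
    # Completes in well under a second when issued from outside the app,
    # per the report above.
    print conn.getresponse().status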