[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-19 Thread Cezary Wagner
Alexis,

It looks like GAE has hidden load-balancing or resource-shortage
problems, or deadlocks; either would result in RANDOM failures.

I am thinking about moving to HRD, but it looks like you have been
suffering the same on HRD over the last few days, so it will not help
as long as GAE keeps generating RANDOM problems. I did some
optimization; it helps with response time but not yet with
availability.

Day         Total checks  Outages  Failed checks  Avg. response time  Uptime
2012.01.14  26            1        1              3.152               96.154 %
2012.01.15  47            1        1              4.937               97.872 %
2012.01.16  48            4        5              3.699               89.583 %
2012.01.17  48            3        5              2.507               89.583 %
2012.01.18  48            1        1              2.419               97.917 %
2012.01.19  47            2        2              3.143               95.745 %
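
The Uptime column appears to be the share of checks that did not fail.
A minimal sketch that recomputes it from the rows above, assuming that
definition (the names CHECKS and uptime_percent are made up for this
illustration and do not come from the monitoring tool):

```python
# Rows copied from the report above: (day, total checks, failed checks).
CHECKS = [
    ("2012.01.14", 26, 1),
    ("2012.01.15", 47, 1),
    ("2012.01.16", 48, 5),
    ("2012.01.17", 48, 5),
    ("2012.01.18", 48, 1),
    ("2012.01.19", 47, 2),
]


def uptime_percent(total_checks, failed_checks):
    """Share of checks that passed, as a percentage rounded to 3 places.

    Assumption: uptime = passed checks / total checks.
    """
    return round(100.0 * (total_checks - failed_checks) / total_checks, 3)


for day, total, failed in CHECKS:
    print(day, uptime_percent(total, failed))

# Uptime over the whole period, weighting each day by its number of checks.
all_checks = sum(total for _, total, _ in CHECKS)
all_failed = sum(failed for _, _, failed in CHECKS)
print("period", uptime_percent(all_checks, all_failed))
```

Under that assumption the recomputed daily values agree with the Uptime
column, and the six days together come to 249 passed checks out of 264,
roughly 94.3 %.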


-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-18 Thread Alexis
Cezary, I was in the same position in August, with our Python apps on
M/S.
The DEE (DeadlineExceededError) errors began to appear, then became
more and more frequent and finally had a high impact on our app's
availability.
Our warmup requests, which only load code, were randomly taking from a
few seconds to more than 30 seconds, and so triggering the DEE.

I could not figure out how this could be linked to the datastore, but
the Google team had only one answer: move to HRD.
There was no explanation at the time and it made no sense to me, but I
decided to trust them.

We migrated to HRD by writing our own mapreduce, and then all went fine
(except for the costs, which went up, but hey, that was the price of a
more reliable solution).

But since around mid-December, exactly the same symptoms have begun to
appear again. They are less frequent and not really impacting our
availability yet, but they are still there.
And there has been no official acknowledgement of this issue so far...







[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-18 Thread Alexis
I was in the same position in August, with our apps on M/S.
The DEE errors began to appear, then became more and more frequent and
finally had a high impact on our app's availability.
We migrated to HRD by writing our own mapreduce, and then all went fine
(except for the costs, but hey, that was the price of a more reliable
solution).

But since around mid-December, exactly the same symptoms have begun
to appear again.
They are less frequent and not really impacting our availability yet,
but they are still there.
And there has been no official acknowledgement of this issue so far...




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Cezary Wagner
Brandon,

The test is on Master/Slave.

I am not claiming that my code is perfect. My point is that GAE LOAD
behavior is RANDOM/UNPREDICTABLE: sometimes the app loads fast (5s),
sometimes it does not load at all and hits the deadline (70s). That is
not good enough for a production system. The problem started in the
last few weeks.

I will do some import tracing to find which imports have the major
impact, but the problem is that load time is RANDOM, sometimes faster,
sometimes slower. I downgraded one module, but GAE still works worse
than before, although it should work faster since I added very fast
caching and optimizations.

I am planning to move to HRD to see if it has any impact on customer
experience/availability.

See how random it is (the first time is counted from module start, the
second is for the single module).

Only times above one second are shown. Both requests have the same
input, yet the load time varies from 17s through 30s up to the
deadline. It is NOT STABLE:

2012-01-17 14:16:07.807 ImportLogger 1.03 1.02 - frontend.web.handlers - frontend.web.handlers.common:17.
D 2012-01-17 14:16:09.310 ImportLogger 2.53 1.50 - google - frontend.web.handlers.common:28.
D 2012-01-17 14:16:10.164 ImportLogger 3.38 0.53 - frontend.web.cookie - frontend.web.handlers.common:45.
D 2012-01-17 14:16:11.118 ImportLogger 4.34 0.95 - frontend.web - frontend.web.handlers.common:47.
D 2012-01-17 14:16:12.106 ImportLogger 5.32 0.93 - frontend.web.authentication - frontend.web.handlers.common:52.
D 2012-01-17 14:16:12.239 ImportLogger 6.66 6.66 - common handler - __main__:13.
D 2012-01-17 14:16:18.552 ImportLogger 12.97 6.31 - frontend.restaurant - __main__:29.
D 2012-01-17 14:16:19.886 ImportLogger 1.23 1.23 - cssutils - css_inliner:8.
D 2012-01-17 14:16:19.924 ImportLogger 1.30 1.28 - css_inliner - frontend.ordering.messages:24.
D 2012-01-17 14:16:22.118 ImportLogger 3.50 2.13 - utilities.time - frontend.ordering.messages:39.
D 2012-01-17 14:16:22.579 ImportLogger 3.97 3.96 - frontend.ordering - frontend.ordering.processing:22.
D 2012-01-17 14:16:22.642 ImportLogger 17.06 4.04 - frontend.ordering - __main__:35.

2012-01-17 14:15:41.905 ImportLogger 0.95 0.61 - frontend.ordering.order - __main__:33.
D 2012-01-17 14:15:59.633 ImportLogger 17.32 17.32 - cssutils - css_inliner:8.
D 2012-01-17 14:16:03.322 ImportLogger 21.01 3.69 - BeautifulSoup - css_inliner:13.
D 2012-01-17 14:16:04.559 ImportLogger 22.25 1.24 - pynliner.soupselect - css_inliner:17.
D 2012-01-17 14:16:04.559 ImportLogger 22.51 22.37 - css_inliner - frontend.ordering.messages:24.
D 2012-01-17 14:16:06.902 ImportLogger 24.85 2.34 - messaging.sms.smsapipl - frontend.ordering.messages:27.
D 2012-01-17 14:16:07.636 ImportLogger 25.59 0.73 - link_shortener - frontend.ordering.messages:29.
D 2012-01-17 14:16:08.990 ImportLogger 26.94 0.64 - globals - frontend.ordering.messages:54.
D 2012-01-17 14:16:08.990 ImportLogger 27.02 27.01 - frontend.ordering - frontend.ordering.processing:22.
D 2012-01-17 14:16:10.166 ImportLogger 28.20 0.85 - frontend.ordering.order_state_controller - frontend.ordering.processing:26.
D 2012-01-17 14:16:12.010 ImportLogger 30.04 1.84 - frontend.billing - frontend.ordering.processing:30.
D 2012-01-17 14:16:12.010 ImportLogger 31.06 30.10 - frontend.ordering - __main__:35.
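
For readers who want to collect similar numbers, here is a rough sketch
of one way to time imports. It is not the ImportLogger used above, whose
code was not posted; import_timer and its min_seconds parameter are
names invented for this example. It is written for Python 3, where the
hook lives in builtins (the Python 2 runtime of that time would use
__builtin__ instead), and it does not record the importing file and
line as the log above does:

```python
import builtins
import contextlib
import sys
import time


@contextlib.contextmanager
def import_timer(min_seconds=1.0):
    """Temporarily wrap builtins.__import__ and record slow first-time imports.

    Yields a list of (seconds since start, seconds for this import, name).
    Only imports that take at least min_seconds are recorded.
    """
    original_import = builtins.__import__
    started = time.time()
    records = []

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        # A module already in sys.modules is served from the cache,
        # so only first-time loads are interesting here.
        first_load = name not in sys.modules
        began = time.time()
        module = original_import(name, globals, locals, fromlist, level)
        took = time.time() - began
        if first_load and took >= min_seconds:
            records.append((time.time() - started, took, name))
        return module

    builtins.__import__ = timed_import
    try:
        yield records
    finally:
        # Always restore the real import function.
        builtins.__import__ = original_import


# Demo with a standard-library module; threshold 0 so that it is always recorded.
with import_timer(min_seconds=0.0) as records:
    import colorsys

for since_start, single, name in records:
    print("ImportLogger %.2f %.2f - %s" % (since_start, single, name))
```

Nested imports are timed inside their parent, so a parent's figure
includes its children, which is consistent with how frontend.ordering
accumulates the cssutils and css_inliner times in the log above.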


RE: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Brandon Wirtz
You keep pointing to reasons you believe it isn't your code, and why we are
wrong, but you aren't trying the things we point out.  If your timeouts are
at 70 seconds you are doing something wrong; it shouldn't take that long to
start up. EVER.

Also you are on M/S, which has its own rules. Jump into the modern world and
run on HR with the rest of us happy shiny people, rather than running on the
unsupported, failed experiment that M/S is.





RE: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Brandon Wirtz
You are on M/S. You could be doing "Echo Hello World" and nothing else and
get Deadline Exceeded Errors. Maybe not quite, but if you have any imports
that use Memcache then you would.

M/S has Zero advantage at this point.



[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Cezary Wagner
Kenneth,

The timeouts cannot be 10s or 15s: the DeadlineExceededErrors in my
logs occur at about 60s. I think the 10s or 15s figure is a myth or a
wrong implementation. What is the sense of allowing 60s if the app
should start within 15s?

Common timeouts are: 64608ms, 70806ms, 63093ms, 64499ms, ...
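
A quick check of those figures against the two limits discussed in
this thread, as a sketch. The 60s request limit and the 10s startup
limit are the values quoted here and are not verified against the
documentation; the variable names are invented for the example:

```python
REQUEST_DEADLINE_S = 60.0  # total request timeout quoted in the thread
STARTUP_TIMEOUT_S = 10.0   # startup timeout as understood in the thread (unconfirmed)

observed_ms = [64608, 70806, 63093, 64499]  # "common timeouts" from the logs
observed_s = [ms / 1000.0 for ms in observed_ms]

# Which limit do the observed failures line up with?
over_request_deadline = [s for s in observed_s if s > REQUEST_DEADLINE_S]
within_startup_limit = [s for s in observed_s if s <= STARTUP_TIMEOUT_S]
mean_s = sum(observed_s) / len(observed_s)

print("past the request deadline: %d of %d" % (len(over_request_deadline), len(observed_s)))
print("within the startup limit: %d of %d" % (len(within_startup_limit), len(observed_s)))
print("mean time to failure: %.1f s" % mean_s)
```

All four samples fall a few seconds past 60s and none is anywhere near
10s, which fits the request deadline rather than a separate startup
limit.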






[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Cezary Wagner
Brandon,

I have been doing service management/development for many years, and
your explanation does not convince me, since it does not show even a
partially valid root cause. Here is why I find it unconvincing.

Low availability is a fact. See the report again; the results are
random for the same thing:
Day         Total checks  Outages  Failed checks  Avg. response time  Uptime
2012.01.14  26            1        1              3.152               96.154 %
2012.01.15  47            1        1              4.937               97.872 %
2012.01.16  48            4        5              3.699               89.583 %
2012.01.17  20            0        0              1.699               100 %

Now it looks like it works, but when I open some test pages they open
slowly compared to the previous week. TODAY the customer experience is
still bad. I did some optimization; it is visible in the times.

You said that the root cause could be:
1. Configuration of the application. Then why is it RANDOM? Sometimes
something loads, sometimes it does not; the same configuration should
give the same results.

2. Instances die early if I hit the memory limit. Good point, but I do
not see any message about memory in the log, and the application is
well below 48M, so it should not die. Again, why is it RANDOM?

3. Master/Slave is NOT the ROOT CAUSE, since the DeadlineExceededError
occurs before any datastore is touched, so it has no impact. Do you
agree with that?

4. If Master/Slave is not supported by Google, is there any information
stating that it is not supported, and from which date?

QUICK example of a DeadlineExceededError (it dies in imports):
:
Traceback (most recent call last):
  File "/base/data/home/apps/wcinamy/1-9-9-6.356145314073492481/frontend/web/order_in_restaurant.py", line 20, in 
    from frontend.web.cache import restaurant_menu as cache_restaurant_menu
  File "/base/data/home/apps/wcinamy/1-9-9-6.356145314073492481/frontend/web/cache/restaurant_menu.py", line 4, in 
    from google.appengine.api import memcache

[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Kenneth
Wow, sorry to hear that.  Is it Java or Python?  I've been on HRD now for
about a month.  I have the sliders set to el cheapo mode and haven't had
any timeouts at all.

My understanding is that the startup timeout is 10 seconds. The total
request timeout is 60 seconds.




Re: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-17 Thread Kenneth
I always found that startup errors and datastore errors happened at the
same time. I assume instances are getting the code from the datastore.

For reasons unknown, the master/slave datastore is broken and Google's
solution is HRD.  I was in the same mental position as you at the beginning
of December, cursing Google over the POS that is the M/S datastore, filing
issues and appealing to Googlers to fix it, looking at the mess that is
migration and thinking no way.  Then I just jumped before Christmas and
haven't looked back.





RE: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Brandon Wirtz
If it is a software / configuration issue in your setup then it is not down
time.

If you have errors on M/S you are on your own.

Instances die early if you hit the soft memory limit. Likely if you have
load time issues you have memory usage issues as well.



On 16 Jan, 18:58, "Brandon Wirtz"  wrote:
> I agree multiple imports is not supposed to be a problem, but I have
> seen it cause issues, or seen issues be resolved by not doing it.
>
> Remember that the Google Implementation of Python has its own "Specialness"
> and what is true in traditional python is not always quite the same in
> GAE land.
>
> From: google-appengine@googlegroups.com
> [mailto:google-appengine@googlegroups.com] On Behalf Of Karl Rosaen
> Sent: Monday, January 16, 2012 7:32 AM
> To: google-appengine@googlegroups.com
> Subject: Re: [google-appengine] Re: Why are several production issues
> related to DeadlineExceededErrors being ignored?
>
> Brandon, thanks so much for taking the time to put together the video,
> very helpful.
>
> The key insight seems to be: time spent in the queue waiting for a
> frontend counts towards the limit for a DeadlineExceededError.  This
> seems silly - seems to me user visible latency, and framework level
> timeout enforcement should be decoupled in this case.  But good
> insight and glad to better understand this behavior.  This also makes
> me wonder what the benefit of having 'auto' for max pending latency
> would ever be - I'm going to slide mine down to ~1s.
>
> One quibble about your advice for 'avoid importing code more than once':
> this shouldn't be a major issue in python unless you are importing a
> module from within a function that is called several times:
>
> Although Python's interpreter is optimized to not import the same
> module multiple times, repeatedly executing an import statement can
> seriously affect performance in some circumstances.
>
> http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statem...
> rhead
>
> Just wanted to clarify that one needn't fret about the same utility
> module being imported from two modules or anything like that.
>
> Karl
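
Karl's point about repeated imports can be checked with a small
sketch. It assumes standard CPython behaviour, where a module is
executed once and later import statements only look it up in
sys.modules; as Brandon notes, GAE's runtime may not behave
identically. The function names are made up for the illustration:

```python
import sys
import timeit

import json  # imported once at module level


def use_module_level():
    # Uses the name bound by the module-level import above.
    return json


def import_each_call():
    # Re-executes the import statement on every call. After the first
    # import this is only a cache lookup, but the lookup is repeated work.
    import json
    return json


# Both routes return the very same module object: it was loaded only once.
assert import_each_call() is use_module_level() is sys.modules["json"]

calls = 10000
print("module-level import:   ", timeit.timeit(use_module_level, number=calls))
print("import inside function:", timeit.timeit(import_each_call, number=calls))
```

Importing the same utility module from two different modules is the
first case: it costs one load in total. Only the second pattern, an
import statement inside a frequently called function, pays a small
extra cost on every call.
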

[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Cezary Wagner
Brandom,

I think that problem of DeadlinesExceeds is different:
1st of GAE availability is not measure considering DeadlinesExceeds -
it will be rather not 100%. If it not monitored it has not impact on
quality. Am I wrong?
This is data for the last week:
Day Total checksOutages Failed checks   Avg. response time  Uptime
2012.01.14  26  1   1   3.152   96.154 %
2012.01.15  47  1   1   4.937   97.872 %
2012.01.16  47  4   5   3.657   89.362 %

2nd, my code, unchanged for months, has become slower. I have optimized
it and it is still no faster. On the SDK it executes in 5s-1s (excluding
load time); see the production results above. It could be suffering from
imports, but then why does it load one time and not the next? If imports
were the cause it should fail to load every time. Or is it a problem with
balancing/resources? Why IS IT RANDOM?

3rd, an instance DIES just after loading and does not survive the 15 min
period. The instance was loaded for 1 min and then it died. Is that a
bug, or a lack of balancing/resources?

4th, maybe master/slave has an impact?

5th, in my case the application does not work under low traffic: EARLY
DEATHS of INSTANCES and DeadlineExceededErrors kill 3%-10% of customer
traffic (stats above). I imagine that GAE behaves the same under higher
traffic, except that the share of errors will maybe be lower, but how do
you reach high traffic if it does not work at low traffic?

Please answer these questions or propose another solution. The GAE
concept is good, but my customers' current experience is not excellent.
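
The uptime percentages in the table above appear to follow from the check
counts as (total checks - failed checks) / total checks. A minimal Python
sketch of that arithmetic, assuming this is how the monitoring tool derives
the column (the function name is only illustrative):

```python
def uptime_percent(total_checks, failed_checks):
    """Share of monitoring checks that succeeded, as a percentage."""
    return round(100.0 * (total_checks - failed_checks) / total_checks, 3)


# (day, total checks, failed checks) taken from the table above
week = [
    ("2012.01.14", 26, 1),
    ("2012.01.15", 47, 1),
    ("2012.01.16", 47, 5),
]

for day, total, failed in week:
    print(day, uptime_percent(total, failed))
```

With only 26-48 checks per day, a single failed check moves the daily
figure by roughly 2-4 percentage points, so the numbers are coarse.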

On 16 Jan, 18:58, "Brandon Wirtz" wrote:
> I agree multiple imports is not supposed to be a problem, but I have seen it
> cause issues, or seen issues be resolved by not doing it.
>
> Remember that the Google Implementation of Python has its own "Specialness"
> and what is true in traditional python is not always quite the same in GAE
> land.
>
> From: google-appengine@googlegroups.com
> [mailto:google-appengine@googlegroups.com] On Behalf Of Karl Rosaen
> Sent: Monday, January 16, 2012 7:32 AM
> To: google-appengine@googlegroups.com
> Subject: Re: [google-appengine] Re: Why are several production issues
> related to DeadlineExceededErrors being ignored?
>
> Brandon, thanks so much for taking the time to put together the video, very
> helpful.
>
> The key insight seems to be: time spent in the queue waiting for a frontend
> counts towards the limit for a DeadlineExceededError.  This seems silly -
> seems to me user visible latency, and framework level timeout enforcement
> should be decoupled in this case.  But good insight and glad to better
> understand this behavior.  This also makes me wonder what the benefit of
> having 'auto' for max pending latency would ever be - I'm going to slide
> mine down to ~1s.
>
> One quibble about your advice for 'avoid importing code more than once':
> this shouldn't be a major issue in python unless you are importing a module
> from within a function that is called several times:
>
> Although Python's interpreter is optimized to not import the same module
> multiple times, repeatedly executing an import statement can seriously
> affect performance in some circumstances.
>
> http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statem...
> rhead
>
> Just wanted to clarify that one needn't fret about the same utility module
> being imported from two modules or anything like that.
>
> Karl




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Alexis

Kenneth: I'm talking HRD




RE: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Brandon Wirtz
I agree multiple imports are not supposed to be a problem, but I have seen
them cause issues, or seen issues resolved by avoiding them.
 
Remember that the Google Implementation of Python has its own "Specialness"
and what is true in traditional python is not always quite the same in GAE
land.
 
From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of Karl Rosaen
Sent: Monday, January 16, 2012 7:32 AM
To: google-appengine@googlegroups.com
Subject: Re: [google-appengine] Re: Why are several production issues
related to DeadlineExceededErrors being ignored?
 
Brandon, thanks so much for taking the time to put together the video, very
helpful.
 
The key insight seems to be: time spent in the queue waiting for a frontend
counts towards the limit for a DeadlineExceededError.  This seems silly -
seems to me user visible latency, and framework level timeout enforcement
should be decoupled in this case.  But good insight and glad to better
understand this behavior.  This also makes me wonder what the benefit of
having 'auto' for max pending latency would ever be - I'm going to slide
mine down to ~1s.
 
One quibble about your advice for 'avoid importing code more than once':
this shouldn't be a major issue in python unless you are importing a module
from within a function that is called several times:
 
Although Python's interpreter is optimized to not import the same module
multiple times, repeatedly executing an import statement can seriously
affect performance in some circumstances.
 
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
 
Just wanted to clarify that one needn't fret about the same utility module
being imported from two modules or anything like that.
 
Karl



Re: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Karl Rosaen
Kenneth, 

Agreed that running on the master / slave datastore is a liability these 
days (which IMO is pretty irresponsible on GAE's part: I see no reason why 
major latency spikes can't be avoided, even if there are advantages to the 
high replication datastore and we can eventually migrate. If we must 
migrate, give us a deadline, and fully support master/slave until then. 
But I digress...), but the DEEs we're talking about here are happening 
during warmup, unrelated to datastore operations, and could affect anyone 
regardless of master/slave vs HR.  For these cases I think Brandon's advice 
is spot on: lower your max pending latency to 1s or so.

Usually when we get hit with DEEs related to being on master / slave, I can 
see a corresponding spike in the system status dashboard:

http://code.google.com/status/appengine/detail/datastore/2012/01/16#ae-trust-detail-datastore-get-latency

and at least I know what's going on.  Actually, despite my whining, the 
last time we were hit by DEEs related to master/slave datastore latency 
spikes was over a month ago - GAE folks if you are listening and you have 
been trying to improve the reliability of master / slave, thank you.  I 
think when we were getting pounded by DEEs last week I started to freak out 
and pointed the finger at master / slave latency prematurely when it was 
really related to DEEs during warmup.

Anyways, thanks to Brandon, I also know what's going on in the case of DEEs 
during warmup too!

Best,
Karl




Re: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Karl Rosaen
Brandon, thanks so much for taking the time to put together the video, very 
helpful.

The key insight seems to be: time spent in the queue waiting for a frontend 
counts towards the limit for a DeadlineExceededError.  This seems silly - 
seems to me user visible latency, and framework level timeout enforcement 
should be decoupled in this case.  But good insight and glad to better 
understand this behavior.  This also makes me wonder what the benefit of 
having 'auto' for max pending latency would ever be - I'm going to slide 
mine down to ~1s.

One quibble about your advice for 'avoid importing code more than once': 
this shouldn't be a major issue in python unless you are importing a module 
from within a function that is called several times:

> Although Python's interpreter is optimized to not import the same module 
> multiple times, repeatedly executing an import statement can seriously 
> affect performance in some circumstances.


http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead

Just wanted to clarify that one needn't fret about the same utility module 
being imported from two modules or anything like that.
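
To illustrate the point about imports, here is a minimal Python sketch. It
uses the standard-library json module purely as a stand-in for any utility
module, and the helper names are hypothetical. A repeated import returns the
cached module from sys.modules rather than re-executing it, while an import
statement inside a frequently called function still pays a small lookup cost
on every call. Timings will vary by machine, and GAE's Python runtime may
behave differently from a local interpreter.

```python
import sys
import timeit

import json  # module-level import: executed once when this file loads


def dumps_with_global_import(value):
    return json.dumps(value)


def dumps_with_local_import(value):
    import json as local_json  # import statement runs on every call
    return local_json.dumps(value)


# A second import does not reload the module; it returns the cached object.
import json as json_again
assert json_again is sys.modules["json"]

if __name__ == "__main__":
    n = 100000
    print("global import:",
          timeit.timeit(lambda: dumps_with_global_import(1), number=n))
    print("local import: ",
          timeit.timeit(lambda: dumps_with_local_import(1), number=n))
```

Both functions return the same result; only the per-call overhead differs.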

Karl




Re: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Kenneth
Are we talking master/slave or high replication datastores?  If we're 
talking master/slave, and I'm pretty certain we are, then forget it. 
Google isn't going to help you.  You need to migrate to HRD to avoid these 
problems.  My life was hell with deadline errors.  Then I bit the bullet 
and migrated, and I have not had a single timeout error since.

If you are on HRD then I'm at a loss.




RE: [google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Brandon Wirtz
We may have to get someone from GOOG to weigh in.

Requests have a 60s time limit, but from testing it seems that
Initialization has a 15s limit.  I don't know if this is a bug, a feature,
or a flaw in my testing.

Also my "best practices" aren't guaranteed to fix everything; there could
still be a bug, but I was hoping that by eliminating the obvious causes we
might get closer to resolution.



-----Original Message-----
From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of Alexis
Sent: Monday, January 16, 2012 1:30 AM
To: Google App Engine
Subject: [google-appengine] Re: Why are several production issues related to
DeadlineExceededErrors being ignored?

Thanks for the shot!

Here are some comments:

- isn't the limit 60sec instead of 15sec?

- We indeed have the pending latency slider set to default, but out of our
12 instances we have 3 resident ones, so the slider will have little effect
on our application's performance.

- Requests that DEE typically have this signature:
ms=63498 cpu_ms=1097 api_cpu_ms=0 cpm_usd=0.030639 loading_request=1
pending_ms=373 exit_code=104
So it did not spend much time waiting to be served... And the DEE is raised
during the import phase.

- Our warmup requests, which import most of our modules and then return a
simple "ok" string, take on average 2100ms to complete (min 860, max 3800).
But during DEE spikes they can also raise DEE.




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-16 Thread Alexis
Thanks for the shot!

Here are some comments:

- isn't the limit 60sec instead of 15sec?

- We indeed have the pending latency slider set to default,
but out of our 12 instances we have 3 resident ones, so the slider
will have little effect on our application's performance.

- Requests that DEE typically have this signature:
ms=63498 cpu_ms=1097 api_cpu_ms=0 cpm_usd=0.030639 loading_request=1
pending_ms=373 exit_code=104
So it did not spend much time waiting to be served... And the DEE is
raised during the import phase.

- Our warmup requests, which import most of our modules and then return a
simple "ok" string, take on average 2100ms to complete (min 860, max
3800).
But during DEE spikes they can also raise DEE.
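
The request signature above can be checked mechanically. This is a minimal
Python sketch, not an App Engine API: the parser and its name are
hypothetical, and the field meanings (ms as total wall-clock time,
pending_ms as time spent queued) are assumed from how they are used in this
thread.

```python
def parse_request_log(line):
    """Split 'key=value' tokens into a dict, converting numeric values."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = float(value) if "." in value else int(value)
    return fields


line = ("ms=63498 cpu_ms=1097 api_cpu_ms=0 cpm_usd=0.030639 "
        "loading_request=1 pending_ms=373 exit_code=104")
fields = parse_request_log(line)

# Fraction of the wall-clock time spent waiting in the pending queue.
pending_share = fields["pending_ms"] / float(fields["ms"])
print("pending share: %.2f%%" % (100 * pending_share))
```

For this request the pending time is well under 1% of the total, which
matches the observation that the time was not lost waiting in the queue.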




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-15 Thread Cezary Wagner
I think it is a big problem that GAE does not handle the customer
experience of start-up/low-traffic applications; it kills traffic
growth.

The first instance loads too slowly to serve the first customer. The
result is a drop with a 500 status and the customer goes away. That
should not happen, since it blocks traffic growth.

One method I know to deal with it is to write a pinger that generates
traffic to keep an instance alive, but that goes against good
architecture design; GAE should offer the same itself.

Another method is Always On, but it has to be paid for. It is better to
write a pinger using cron (it should be called at intervals of less than
15 min), on GAE or on another server.

A pinger also lets you track availability: you can measure whether all
pings succeed.
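
A minimal sketch of such a pinger in plain Python (standard library only),
meant to run from cron on any machine. The function names and the injectable
fetch argument are illustrative assumptions, not part of any GAE API, and
the URL in the usage note is a placeholder.

```python
import time
import urllib.request


def default_fetch(url, timeout):
    """Fetch url and return the HTTP status code (raises on errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def ping(url, timeout=10.0, fetch=default_fetch):
    """Ping url once. Returns (ok, elapsed_seconds)."""
    start = time.time()
    try:
        ok = 200 <= fetch(url, timeout) < 300
    except Exception:
        # Timeouts, connection errors and HTTP error responses count as failures.
        ok = False
    return ok, time.time() - start


def availability_percent(results):
    """Percentage of successful pings in a list of (ok, elapsed) results."""
    if not results:
        return 0.0
    good = sum(1 for ok, _ in results if ok)
    return 100.0 * good / len(results)
```

Calling ping("http://your-app-id.appspot.com/") every few minutes (under the
15 min mentioned above) and storing the results gives both the keep-alive
traffic and a simple availability figure.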

On 14 Jan, 23:15, "Brandon Wirtz" wrote:
> Do you have a warm up handler configured in your Yaml?
> If you don't then the new instance has to warm up and handle a request.
> Specifying a Warm up that simply initializes some variables and  logs an
> event "Warm up complete".
>
> Should fix your issue.
>
> I don't think you have "Platform issues" I think you have Google hasn't
> documented all best practices issues.
>
> From: google-appengine@googlegroups.com
> [mailto:google-appengine@googlegroups.com] On Behalf Of Karl Rosaen
> Sent: Saturday, January 14, 2012 6:26 AM
> To: google-appengine@googlegroups.com
> Subject: Re: [google-appengine] Why are several production issues related to
> DeadlineExceededErrors being ignored?
>
> Thanks Brandon.  Many of the DeadlineExceededErrors were occurring during
> warmup requests, according to the stacktraces, during python import
> statements.  I upped the number of idle instances in an attempt to mitigate
> this sort of thrashing, and your advice makes sense for this case.  Our
> pending latency is set to 'Automatic' on both ends.
>
> I'm attaching some graphs from the period when this was the worst
>
> Instances:
>
> [image: Screen Shot 2012-01-14 at 9.08.59 AM.png]
>
> Requests per second:
>
> [image: Screen Shot 2012-01-14 at 9.17.39 AM.png]
>
> Milliseconds per request:
>
> [image: Screen Shot 2012-01-14 at 9.09.41 AM.png]
>
> This suggests that some higher latency handlers were hit (some people were
> editing content), taking up the existing front end instances, after which
> GAE was trying to spin up some dynamic instances to serve other requests.
> But during warmup, there were DeadlineExceededErrors during file imports,
> suggesting that the dynamic instances aren't being given enough time to
> warmup.
>
> Increasing the idle instances helps.  So perhaps the revised question, at
> least for our particular situation is: why, under load, do the dynamic
> instances timeout during warmup?  That seems to compound the problem as the
> dynamic instances aren't able to serve the requests that are backed up,
> leading to user visible 500 errors, and more attempts to dynamically load
> instances.
>
> Does my theory have any holes?  Is relying on dynamic instances to handle
> spikes without 500 errors unrealistic?  I know the docs state, "A smaller
> number of idle Instances means your application costs less to run, but may
> encounter more startup latency during load spikes." but thrashing on
> DeadlineExceededErrors during warmup seems to indicate that dynamic
> instances can't be relied upon for load spikes at all right now.
>



[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-14 Thread Alexis
+1

On 14 jan, 03:54, GAEfan  wrote:
> +1




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-13 Thread GAEfan
+1




[google-appengine] Re: Why are several production issues related to DeadlineExceededErrors being ignored?

2012-01-13 Thread Olivier
+1

On Jan 13, 11:21 am, Karl Rosaen  wrote:
> We've been hit by random DeadlineExceededErrors in the past 36 hours, so I
> filed a production ticket:
> http://code.google.com/p/googleappengine/issues/detail?id=6729
>
> it's disconcerting to see that several similar issues have been left
> ignored over the past few weeks:
>
> http://code.google.com/p/googleappengine/issues/detail?id=6629
> http://code.google.com/p/googleappengine/issues/detail?id=6688
> http://code.google.com/p/googleappengine/issues/detail?id=6701
> http://code.google.com/p/googleappengine/issues/detail?id=6707
>
> Could someone please help?
