Hey Kaan,

I'll echo what Jesse said about the new efforts to bring the developer community and Cloud Platform Support closer together. I look forward to the discussions we can have here, and to working together on Stack Overflow and the public issue tracker to make the best use of those forums.

Thanks for taking the time to bring up the issues you've been seeing. I'll go through them one to four, in the order they appeared in your post. For each, I'll give my impression of what the issue may be, or what information is missing for a good issue report, along with some advice on where to go next to get support eyes on it.

1. "Request was aborted after waiting too long to attempt to service your request"

- If these log lines appear when the number of tasks in the queue seems to overload the processing power of your available instances, this may indicate a platform issue, or it may indicate an issue in your own app's config/code. It's not possible to tell without more details, such as:
  * the .yaml/.xml config files (mostly the scaling settings are of interest)
  * a brief description of what the system was doing, preferring code snippets, numbers, and logs over an informal verbal description
  * a time-frame and the name of an affected instance
- With such details, you can create an adequate issue report in the public issue tracker <https://code.google.com/p/googleappengine/issues/list>, or a valid Stack Overflow question <http://stackoverflow.com/questions/tagged/google-app-engine>, depending on whether you perceive it to be a platform or user code issue.

2. "google.appengine.api.taskqueue.taskqueue.TransientError"

- As documented here <https://cloud.google.com/appengine/docs/python/taskqueue/overview-pull#Python_Leasing_tasks>, this can happen when using Pull queues. As you correctly observe, it can be related to rate-limiting in the infrastructure, although you feel the details of how rates are set are not sufficiently documented.
  It's likely that this derives from calling lease_tasks() on the queue too often, but it's true that we can't be sure.
- I definitely understand you here, and I encourage you to create a public issue tracker thread which other users can star to demonstrate interest in more detailed documentation around this limit.
- In the meantime, we still need to be able to handle these errors. On a platform which does allow you to scale up aggressively, inside a data-center network with shared but well-isolated and ample resources, error responses such as these will occur periodically. A well-scalable app can ride out transient errors and rate-limiting by applying exponential back-off, avoiding spikes, etc. I encourage you to take the advice of the docs and rate-limit when you see this error, as it's likely that lease_tasks() is being called too frequently on the queue.
- If a behaviour still appears anomalous to you, that is to say if the system seems out of sync with its documented behaviour, then you should open a public issue tracker <https://code.google.com/p/googleappengine/issues/list> issue with enough information to allow investigation. A report with sufficient information is likely to produce a positive result, and quickly.

3. "DeadlineExceededError"

- This can have the same cause as 2., and it's worth investigating. My advice again is to create a public issue tracker issue as soon as you notice something that you perceive to be anomalous about the behaviour of any App Engine system.

4. "push/pull queue anomalies"

- I'm unsure what you mean by this, although as above, if you feel there's an issue on the platform, I want to encourage you to report it adequately, as we're here and happy to help.

So, to conclude: each of the issues you bring up can be checked against the documented behaviour and, if necessary, developed into a proper issue report for the platform. The public issue tracker issue you create will be picked up and brought to the attention of platform developers / engineers / support. If, rather than a platform issue, the issue looks related to your specific use of the services on the platform, you should instead create a Stack Overflow question on the related tags to get support in that form.

Finally, to address what you say in parentheses before the final "------": it's definitely possible to implement easing and rate-limiting on pull queue task execution, since the frequency of task lease/execution is tunable in whatever timing logic you set up.

For push queues, you can implement easing by defining a stepped gradient of queues with different configured processing rates, bucket sizes, etc. <https://cloud.google.com/appengine/docs/java/config/queue#Defining_Push_Queues_and_Processing_Rates>. The task-adding logic can then read how full the various queues currently are (you can store queue fullness/rate information in Memcache or Datastore, or just use API calls to the Task Queue API), possibly along with API calls to get the number of instances in the handler module <https://cloud.google.com/appengine/docs/python/modules/functions#get_num_instances>, to determine which queue to step up to, or to include in the rotation of queues which receive tasks (at your discretion), when adding tasks with given payloads, etc.

Using these basic building blocks, fairly complex timing logic can be implemented. If you'd like to make a feature request such as "provide easing parameter in queue configuration", describing how it would work, the place for feature requests is the public issue tracker <https://code.google.com/p/googleappengine/issues/list>.

I hope you've come away from this feeling heard, and with a better understanding of where and how to get support with any issues you may encounter. I tried to address each of the issues you brought up to make sure you get useful information.

Have a great day!

- Nick

On Tuesday, May 26, 2015 at 4:57:06 AM UTC-4, Kaan Soral wrote:
>
> I've been using App Engine for probably something like 5 years, I have one
> major app that has been running for 5 years, it's very well polished, and
> the traffic and behaviour of the app is very predictable *knocking on wood*
> I have another app that I've been working on for 3 years, it didn't take
> off yet, the new app is unpredictable in behaviour, it's vast and
> unthrottled
>
> While the old app has been handling millions of requests without errors
> and issues, the new one is failing on even the simplest tasks, the logs are
> filled with TransientError's, instance failures, transaction failures, the
> whole thing is chaotic
>
> The old app has throttled queues and basic tasks, the throughput is well
> calibrated to complete all the tasks in 5 hours, using optimal amount of
> instances, the traffic is regular, it eases in and eases out throughout the
> day (without throttling, the old app was in similar state before)
> The new app is built to perform, so it's queues have no limits, it trusts
> App Engine to scale as much as it can
>
> Well turns out that trust isn't well placed, App Engine is supposed to
> scale on it's own, yet when you leave the limits to the App Engine, it
> fails to perform
> You might ask: "Why would I use App Engine if I'm going to manually scale
> the limits myself?" - That's a good question, If you're going to have to
> adjust the limits and throughput manually while your app grows, you might
> as well use AWS or a similar more reliable service
>
> This is mostly a rant post, but the advice is still solid, one has to
> manually calibrate the throughput of routines to prevent app spikes, the
> instance births and deaths should always be eased in and eased out,
> otherwise various services of app engine fail to perform
> On the bright side, throttling also reduces the costs significantly, so
> it's a good idea to always keep an eye on the app and manually calibrate
> all routines - on the other side, if your app gains additional traffic
> without your supervision, these routines will hog and halt
>
> ------
>
> On a more technical side, some of these errors are:
> "Request was aborted after waiting too long to attempt to service your
> request." - they come in 100's - flood the logs - these are taskqueue
> push tasks, so the error is pretty stupid, if they can't be handled, they
> should be left in the queue
> "google.appengine.api.taskqueue.taskqueue.TransientError" - these are
> from pull queues, there are invisible/untold limits of pull queues, this is
> also very concerning, because if your app grows, your scale might be bound
> by these limits, so try not to use pull queues too much
> "DeadlineExceededError's" - these are pretty random and rare, yet when
> you run thousands of tasks, you get these in your logs, they might be
> omitted
> Transactions errors and anomalies: these used to happen a lot, but I
> switched to a pull queue based logic to prevent them, now they are replaced
> by pull queue anomalies
>
> (It would have been great if limits and capacities of each service was
> more transparent, and I really think taskqueues need some eased bucket
> configurations, things that will help task batches to be executed in an
> eased manner, currently the only way to achieve this is to put flat and low
> throughput limits -
> similarly, same kind of control can be achieved on the
> instance scheduler level)
>
> ------
>
> Also, after 5 years, I gave up on app engine support, during a time we
> used to get actual support from this google groups, currently it's just
> random initial replies and no follow ups, so unless you are paying $500 or
> something monthly for support, don't expect much support, you are on your
> own to detect the issues and prevent them through experimentation and
> volunteer help