[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
@Kaan, thank you for continuing to respond. I think it's time to move beyond this thread and take action to have the potential issue looked at properly. If you think you've found an issue, you should create a Public Issue Tracker thread https://code.google.com/p/googleappengine/issues/list with one of the following: 1. A timeframe on an affected instance, or 2. An example that can be used to reproduce, or 3. At least your custom task monitoring solution and leave the creation of a massive burst of tasks to the engineer that attempts to reproduce 4. You could even, in theory, request the thread you create to be set to private so you can upload your whole codebase The bottom line is this: no issue can be looked at or worked on, if it is claimed by somebody who saw it in the wild, until it's reproducible or at least has some kind of trace in logs somewhere. We're happy to look up logs on an affected version/module/instance/etc. during a given timeframe, and try to determine what happened, if anything indeed happened. Either of these will suffice, but anything less is not really valid when reporting a bug. Every software project in the galaxy beyond hobbyist-size will have these requirements for bug reporting. At present, no viable data has been produced on how to observe the behaviour claimed to occur. All we have is the fact that a custom implementation of task-counting produced the numbers 11476 and 11507 as output. I hope you can see the gap between what's produced and what's needed. If nobody who has observed the issue (if indeed there is such an issue to observe) and wants to see a fix (I assume this is you) wants to point towards a reproducing example, the very least information requested was a time-frame on an affected instance. All that's needed is to say: On my project primal-buttress-1337, instance 1 of module devel, version v1, began enqueuing 1,000,000 tasks at [TIMESTAMP], and you can see a log line for each task being added, up until the last task is added at [TIMESTAMP]. The last task was executed at [TIMESTAMP]. The number of Flobbb entities created in the Datastore is only 999,998. Therefore I believe this indicates 2 tasks were enqueued but never executed. There are no records in the logs of any task failures. If and when such information is produced in a Public Issue Tracker thread, you'll be pleased to see it taken up, worked on, and a solution will come in a course of time appropriate to the severity of the issue. Up to this point, such viable data has not been received, and hence the deadlocked nature of this thread. Please proceed to open a Public Issue Tracker thread with the information necessary to report an issue. On Sunday, March 1, 2015 at 1:23:12 PM UTC-5, Kaan Soral wrote: I just ran my tasks, in 2 stages In the first stage, task sequences executed as expected, I only noticed a single additional incoming task, thought it was a glitch / repetition in the counting logic I also experienced a ndb.delete_multi lock issue that is not related to the taskqueue, but works to count/combine some side metrics, this deadlock prevented that operation, caused some data loss, yet the taskqueue's were unaffected (first time I experienced this issue) I just ran another set of task sequences, this time I noticed a significant increase in additionally executed tasks Workers Out: 11476 In: 11507 31 tasks re-executed, since these tasks are unique by name, they don't cause task explosions, yet I verify that these additional tasks really executed, because the total number of elements visited are proportionally increased (I don't see any task error logs, these issues / executions are all silent) To sum up, I didn't notice any missing tasks this time around, yet I noticed some invisibly re-executed tasks To sum up this entire flurry, the main issue is that, tasks re-execute, fail silently, these are all counted as retries, yet they might happen 100 times and count as 100 retries, these retry's should actually be user failures and not internal ones -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/057d423c-95f8-4f61-b7e7-d48c68f61c1e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
I just ran my tasks, in 2 stages In the first stage, task sequences executed as expected, I only noticed a single additional incoming task, thought it was a glitch / repetition in the counting logic I also experienced a ndb.delete_multi lock issue that is not related to the taskqueue, but works to count/combine some side metrics, this deadlock prevented that operation, caused some data loss, yet the taskqueue's were unaffected (first time I experienced this issue) I just ran another set of task sequences, this time I noticed a significant increase in additionally executed tasks Workers Out: 11476 In: 11507 31 tasks re-executed, since these tasks are unique by name, they don't cause task explosions, yet I verify that these additional tasks really executed, because the total number of elements visited are proportionally increased (I don't see any task error logs, these issues / executions are all silent) To sum up, I didn't notice any missing tasks this time around, yet I noticed some invisibly re-executed tasks To sum up this entire flurry, the main issue is that, tasks re-execute, fail silently, these are all counted as retries, yet they might happen 100 times and count as 100 retries, these retry's should actually be user failures and not internal ones -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/2e55afcf-cf1e-41d3-8fbd-894681cbbb52%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
For me the way I noticed it was by chance, also it seems it happened many times unnoticed. I was running a task for each namespace and each task would send an email. I noticed I received less emails than number of namespaces. I looked at logs there and couldn't find any errors nor even any evidence of these tasks ever executing, also i am sure they were sent to queue. As for max retries param, I believe I had a max set to 2-3. Thanks On Friday, February 27, 2015 at 12:02:07 AM UTC, paynen wrote: In addition, re: anybody experiencing this issue, it would be very helpful to include your queue config file, to make sure whether you're specifying any kind of max retries parameter. On Thursday, February 26, 2015 at 8:54:02 AM UTC-5, husayt wrote: Hi @paynen, this is the problem. It's almost not possible to replicate externally, as it happens somewhere in internal appengine stack. and the main problem, as also explained by Kaan, it never hits logs. So there is not much we can do here as GAE users. This can be replicated only with access to internals of GAE. Can I also stress, that this is the number one issue on my list. I had a support case created and it didn't go forward because I couldn't replicate the problem. One thing I can say it more likely to happen when we have bursts of taks. Hope this helps, HG On Wednesday, February 25, 2015 at 10:52:06 PM UTC, paynen wrote: If anybody reading other than OP is also affected by this and can provide minimally a reproducing example or an affected timeframe on a given instance, this will be the minimum information needed to look into a potential issue. I'm continuing to monitor this thread, and I hope we can get this addressed as soon as possible, as soon as it's demonstrated/repro'd. On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote: rate: 500/s bucket_size: 100 retry_parameters: task_retry_limit: 6 min_backoff_seconds: 2 max_doublings: 3 Although my queue configuration is broad enough to handle occasional internal failures, I noticed and verified that the taskqueue leaves some tasks unexecuted ( 1% to 10%, happens when you burst tasks / run a mapreduce job [custom] - happens both with normal instances and basic_scaling/B4 instances ) I first noticed the issue when some operations that should have done were left undone Than I inspected the taskqueue execution with a custom routine that tracks / counts ingoing and executing tasks, a routine that I perfected long ago, and noticed the missing executions The issue isn't persistent, after a re-deployment and re-test, the same routine managed to traverse all the entities as it's supposed to TL;DR - some taskqueue tasks silently fail to execute, this should never happen, but it happens very frequently without any reason, causes damage and confusion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/275d8af0-b49f-4157-ab3a-164c82e57693%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Hi @paynen, this is the problem. It's almost not possible to replicate externally, as it happens somewhere in internal appengine stack. and the main problem, as also explained by Kaan, it never hits logs. So there is not much we can do here as GAE users. This can be replicated only with access to internals of GAE. Can I also stress, that this is the number one issue on my list. I had a support case created and it didn't go forward because I couldn't replicate the problem. One thing I can say it more likely to happen when we have bursts of taks. Hope this helps, HG On Wednesday, February 25, 2015 at 10:52:06 PM UTC, paynen wrote: If anybody reading other than OP is also affected by this and can provide minimally a reproducing example or an affected timeframe on a given instance, this will be the minimum information needed to look into a potential issue. I'm continuing to monitor this thread, and I hope we can get this addressed as soon as possible, as soon as it's demonstrated/repro'd. On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote: rate: 500/s bucket_size: 100 retry_parameters: task_retry_limit: 6 min_backoff_seconds: 2 max_doublings: 3 Although my queue configuration is broad enough to handle occasional internal failures, I noticed and verified that the taskqueue leaves some tasks unexecuted ( 1% to 10%, happens when you burst tasks / run a mapreduce job [custom] - happens both with normal instances and basic_scaling/B4 instances ) I first noticed the issue when some operations that should have done were left undone Than I inspected the taskqueue execution with a custom routine that tracks / counts ingoing and executing tasks, a routine that I perfected long ago, and noticed the missing executions The issue isn't persistent, after a re-deployment and re-test, the same routine managed to traverse all the entities as it's supposed to TL;DR - some taskqueue tasks silently fail to execute, this should never happen, but it happens very frequently without any reason, causes damage and confusion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/0b23d4fb-28f3-4af7-af58-447a51c5b31f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Hi @husayt, Of course, I've been following this thread and understand the issue doesn't appear in logs directly. I guess I'm just wondering how you've managed to determine that this is happening if there isn't any trace anywhere... If there is such a trace, I'd appreciate if you could get it to us, as it's necessary to look into it. Whether that's an affected timeframe on a given instance, some logs demonstrating the issue (@Kaan above mentioned 5xx spikes in their logs, you could try to also make calls to the REST API to demonstrate that X tasks were enqueued, but demonstrate only X - N finished, thus meaning N went missing), or a minimally-reproducing example. Access to the internals of GAE is our specialty ;) We just need a little help from you in pointing where to look, since there are a lot of internals to look at. So, do you have the info required? Again, please note that an affected timeframe on a given instance (app id, version + module) are potentially enough info. This is a high priority for us if it's a high priority for you. Regards, NP On Thursday, February 26, 2015 at 8:54:02 AM UTC-5, husayt wrote: Hi @paynen, this is the problem. It's almost not possible to replicate externally, as it happens somewhere in internal appengine stack. and the main problem, as also explained by Kaan, it never hits logs. So there is not much we can do here as GAE users. This can be replicated only with access to internals of GAE. Can I also stress, that this is the number one issue on my list. I had a support case created and it didn't go forward because I couldn't replicate the problem. One thing I can say it more likely to happen when we have bursts of taks. Hope this helps, HG On Wednesday, February 25, 2015 at 10:52:06 PM UTC, paynen wrote: If anybody reading other than OP is also affected by this and can provide minimally a reproducing example or an affected timeframe on a given instance, this will be the minimum information needed to look into a potential issue. I'm continuing to monitor this thread, and I hope we can get this addressed as soon as possible, as soon as it's demonstrated/repro'd. On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote: rate: 500/s bucket_size: 100 retry_parameters: task_retry_limit: 6 min_backoff_seconds: 2 max_doublings: 3 Although my queue configuration is broad enough to handle occasional internal failures, I noticed and verified that the taskqueue leaves some tasks unexecuted ( 1% to 10%, happens when you burst tasks / run a mapreduce job [custom] - happens both with normal instances and basic_scaling/B4 instances ) I first noticed the issue when some operations that should have done were left undone Than I inspected the taskqueue execution with a custom routine that tracks / counts ingoing and executing tasks, a routine that I perfected long ago, and noticed the missing executions The issue isn't persistent, after a re-deployment and re-test, the same routine managed to traverse all the entities as it's supposed to TL;DR - some taskqueue tasks silently fail to execute, this should never happen, but it happens very frequently without any reason, causes damage and confusion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/6c46efeb-d975-4cc9-bf52-b44a39fc3d1e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
I have a large mapping task I need to run soon, I will report with more detail if I detect the issue on that execution I can't try to reproduce the issue, mainly because I don't have much funds to spare, both time and money-wise, however I will help as the issue arises, and I really appreciate your help and interest in the issue I would like to humbly suggest a general method I use a lot instead: 1) Implement a task burst routine, that bursts tasks with different configurations, names, no-names, repetitions, loads, no-loads, run them on different retry queues (the tasks can be error-free / simple tasks, as task errors are not the issue here) 2) Implement a counting routine / a routine to make sure the tasks executed as they should 3) Implement extreme scenarios, for example natural instance deaths etc. that happens during these task bursts 4) Run this routine, at different times, on different natural production environments (as it seems to me, there are good environments, and there are bad ones, and we get random environments instead, this is just an impression) (I would like to humbly suggest a simple approach for all critical appengine services, like datastore transactions for example, I noticed there is a small possibility transactions re-execute, but I don't bring it up yet, as it's impossible to track and report, yet from my internal checks, I regularly get flags of routines that re-execute, just an fyi on another similar appengine anomaly, these two issues are also similar, they both happen on bursts and non-deterministically, a glitchy transaction doesn't do much damage, yet a taskqueue one does, as queue executions are more vital in most-scenarios) -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/fda827fd-5d48-49db-a4ba-563b471049f1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
In addition, re: anybody experiencing this issue, it would be very helpful to include your queue config file, to make sure whether you're specifying any kind of max retries parameter. On Thursday, February 26, 2015 at 8:54:02 AM UTC-5, husayt wrote: Hi @paynen, this is the problem. It's almost not possible to replicate externally, as it happens somewhere in internal appengine stack. and the main problem, as also explained by Kaan, it never hits logs. So there is not much we can do here as GAE users. This can be replicated only with access to internals of GAE. Can I also stress, that this is the number one issue on my list. I had a support case created and it didn't go forward because I couldn't replicate the problem. One thing I can say it more likely to happen when we have bursts of taks. Hope this helps, HG On Wednesday, February 25, 2015 at 10:52:06 PM UTC, paynen wrote: If anybody reading other than OP is also affected by this and can provide minimally a reproducing example or an affected timeframe on a given instance, this will be the minimum information needed to look into a potential issue. I'm continuing to monitor this thread, and I hope we can get this addressed as soon as possible, as soon as it's demonstrated/repro'd. On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote: rate: 500/s bucket_size: 100 retry_parameters: task_retry_limit: 6 min_backoff_seconds: 2 max_doublings: 3 Although my queue configuration is broad enough to handle occasional internal failures, I noticed and verified that the taskqueue leaves some tasks unexecuted ( 1% to 10%, happens when you burst tasks / run a mapreduce job [custom] - happens both with normal instances and basic_scaling/B4 instances ) I first noticed the issue when some operations that should have done were left undone Than I inspected the taskqueue execution with a custom routine that tracks / counts ingoing and executing tasks, a routine that I perfected long ago, and noticed the missing executions The issue isn't persistent, after a re-deployment and re-test, the same routine managed to traverse all the entities as it's supposed to TL;DR - some taskqueue tasks silently fail to execute, this should never happen, but it happens very frequently without any reason, causes damage and confusion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/60a43deb-1899-4df8-ae8d-96204af1448d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
If anybody reading other than OP is also affected by this and can provide minimally a reproducing example or an affected timeframe on a given instance, this will be the minimum information needed to look into a potential issue. I'm continuing to monitor this thread, and I hope we can get this addressed as soon as possible, as soon as it's demonstrated/repro'd. On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote: rate: 500/s bucket_size: 100 retry_parameters: task_retry_limit: 6 min_backoff_seconds: 2 max_doublings: 3 Although my queue configuration is broad enough to handle occasional internal failures, I noticed and verified that the taskqueue leaves some tasks unexecuted ( 1% to 10%, happens when you burst tasks / run a mapreduce job [custom] - happens both with normal instances and basic_scaling/B4 instances ) I first noticed the issue when some operations that should have done were left undone Than I inspected the taskqueue execution with a custom routine that tracks / counts ingoing and executing tasks, a routine that I perfected long ago, and noticed the missing executions The issue isn't persistent, after a re-deployment and re-test, the same routine managed to traverse all the entities as it's supposed to TL;DR - some taskqueue tasks silently fail to execute, this should never happen, but it happens very frequently without any reason, causes damage and confusion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/b97a79e6-cb98-4b34-b9c6-59b76efea41c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Hi Kaan, Just lending support to this: engineer told me that internal/silent/invisible execution failures also count as task retries I have also been told this. We have an open-source framework for doing workflows based on task queue and one of the features we tried to build in to it was an alerting system when the final task retry failed (and thus the task would go away permanently). We had to give up on the feature because it was possible (and we saw it) for tasks to fail before ever hitting our code, and this counts as a retry, and if it happened to be on the last retry, then our alert code wouldn't get a chance to operate. In short, it would have been an unreliable feature. Note, this was a long time ago though (3+ years). j On Monday, 23 February 2015 23:49:44 UTC-6, Kaan Soral wrote: If there was any logs, it wouldn't be an issue, I would likely solved the issue on the logs, the issue is there are no logs With this recent issue, the only error logs are some instances dying critically, from memory overflows's (after going over 256mb's - my theory is a ndb memory leak, but it's offtopic) (There are 5-6 error logs like these, all instance overflows, since there are tens of thousands of other operations, the error ratio is low) In this specific case, the critical death of instances might be related to the issue, those instances might be taking some tasks with them, and leave the tasks unexecuted (theory) I will dig in and find the discussion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/db137884-3510-4042-909a-7781f5283d9d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
You can't see these issues from the logs, yet you can see them as 5xx spikes from the error *graphs*/dashboard (most of the time) -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/ae6947f5-0ec1-4011-8efa-3e9361a6e781%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Thanks a lot for the support, j and HG, it's great to hear I'm not the only one Jason, it makes sense to let some queues retry indefinitely, I initially had only 1 retries, as I thought those retries for user failures, than I moved onto my current config, as it's extremely unlikely that 6 retries with backoffs could be exhausted, however it seems they can be, similarly, there is a possibility that, the issue might also occur with indefinite retries too, it's still not 100% certain internal retry failures are the cause, although several engineers claimed they were (I generally try to avoid indefinite retries, as they might produce serious side effects) The conversation that I learned about the retry behaviour was with: hanssens[t]google.com titled TaskQueue Followup on 3/22/14 (He was very helpful, thanks) You can't see these issues from the logs, yet you can see them as 5xx spikes from the error logs (most of the time) (By the way, this specific instance of the issue cost me almost 200$'s - I'm constantly running and inspecting routines that trigger many entities, costs a lot, I initially decided to use basic_scaling to prevent instance/cost spikes, limit instances, yet basic_scaling made the issue worse somehow) -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/047eaae6-f6b4-44ad-a95a-b993ab2a4aad%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
basic_scaling made the issue worse somehow So I suspect that the silent failures occur when the dispatcher is attempting to service a dequeued task, but cannot find an available instance (nor spin one up) in some unknown period of time. Just a guess though. j On Tuesday, 24 February 2015 12:45:18 UTC-6, Kaan Soral wrote: You can't see these issues from the logs, yet you can see them as 5xx spikes from the error *graphs*/dashboard (most of the time) -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/39b982b5-9e9d-4c73-8775-7642933e78ad%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Hi Kaan, I had exactly same issue (and maybe still have, as there is no easy way to find it). This caused us a huge problem as we were migrating our namespaces to new dataschema and we only found out after some namespaces stopped working after migration. Upon examination we found out that that tasks didn't run for these namespaces, there were no error logs and I don't know how many of these missed tasks we had before and after. This is scary. It happens during bursts when we run many tasks at once. The major problem is the absence of logs. It is difficult to reproduce and even more difficult to catch them. Task queues are offered as a guaranteed task execution solution, but this shows they are not. Best, HG On Tuesday, February 24, 2015 at 2:48:34 PM UTC, Jason Collins wrote: Hi Kaan, Just lending support to this: engineer told me that internal/silent/invisible execution failures also count as task retries I have also been told this. We have an open-source framework for doing workflows based on task queue and one of the features we tried to build in to it was an alerting system when the final task retry failed (and thus the task would go away permanently). We had to give up on the feature because it was possible (and we saw it) for tasks to fail before ever hitting our code, and this counts as a retry, and if it happened to be on the last retry, then our alert code wouldn't get a chance to operate. In short, it would have been an unreliable feature. Note, this was a long time ago though (3+ years). j On Monday, 23 February 2015 23:49:44 UTC-6, Kaan Soral wrote: If there was any logs, it wouldn't be an issue, I would likely solved the issue on the logs, the issue is there are no logs With this recent issue, the only error logs are some instances dying critically, from memory overflows's (after going over 256mb's - my theory is a ndb memory leak, but it's offtopic) (There are 5-6 error logs like these, all instance overflows, since there are tens of thousands of other operations, the error ratio is low) In this specific case, the critical death of instances might be related to the issue, those instances might be taking some tasks with them, and leave the tasks unexecuted (theory) I will dig in and find the discussion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/0cbf6d1e-a898-4bb1-be38-2cfe46975d96%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
If you're looking for guaranteed task execution, you probably don't want to specify max retries or max age. But regardless, the fact (if it is still a fact) that a task with max retries might never hit your code before disappearing is still a pretty unfortunate thing. j On Tuesday, 24 February 2015 11:20:55 UTC-6, husayt wrote: Hi Kaan, I had exactly same issue (and maybe still have, as there is no easy way to find it). This caused us a huge problem as we were migrating our namespaces to new dataschema and we only found out after some namespaces stopped working after migration. Upon examination we found out that that tasks didn't run for these namespaces, there were no error logs and I don't know how many of these missed tasks we had before and after. This is scary. It happens during bursts when we run many tasks at once. The major problem is the absence of logs. It is difficult to reproduce and even more difficult to catch them. Task queues are offered as a guaranteed task execution solution, but this shows they are not. Best, HG On Tuesday, February 24, 2015 at 2:48:34 PM UTC, Jason Collins wrote: Hi Kaan, Just lending support to this: engineer told me that internal/silent/invisible execution failures also count as task retries I have also been told this. We have an open-source framework for doing workflows based on task queue and one of the features we tried to build in to it was an alerting system when the final task retry failed (and thus the task would go away permanently). We had to give up on the feature because it was possible (and we saw it) for tasks to fail before ever hitting our code, and this counts as a retry, and if it happened to be on the last retry, then our alert code wouldn't get a chance to operate. In short, it would have been an unreliable feature. Note, this was a long time ago though (3+ years). j On Monday, 23 February 2015 23:49:44 UTC-6, Kaan Soral wrote: If there was any logs, it wouldn't be an issue, I would likely solved the issue on the logs, the issue is there are no logs With this recent issue, the only error logs are some instances dying critically, from memory overflows's (after going over 256mb's - my theory is a ndb memory leak, but it's offtopic) (There are 5-6 error logs like these, all instance overflows, since there are tens of thousands of other operations, the error ratio is low) In this specific case, the critical death of instances might be related to the issue, those instances might be taking some tasks with them, and leave the tasks unexecuted (theory) I will dig in and find the discussion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/dad9e53b-c5e5-461e-8ca9-82acff11ed50%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
If there was any logs, it wouldn't be an issue, I would likely solved the issue on the logs, the issue is there are no logs With this recent issue, the only error logs are some instances dying critically, from memory overflows's (after going over 256mb's - my theory is a ndb memory leak, but it's offtopic) (There are 5-6 error logs like these, all instance overflows, since there are tens of thousands of other operations, the error ratio is low) In this specific case, the critical death of instances might be related to the issue, those instances might be taking some tasks with them, and leave the tasks unexecuted (theory) I will dig in and find the discussion -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/33cb9edb-60cd-4174-9519-36b3bee5e4c2%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
Hey Kaan, That sounds like a serious problem. Is it possible for you to share some logs that demonstrate the issue occurring, or better yet a minimally-reproducing example? This is vital to identifying and fixing a possible issue on the platform. Secondly, could you possibly point to the issue ticket # or forum thread where the discussion with the engineer occurred? I'm interested to see the full context of whether this is a known issue or something else. Regards, paynen On Monday, February 23, 2015 at 6:49:57 PM UTC-5, Kaan Soral wrote: I also experienced this same issue previously, an engineer told me that internal/silent/invisible execution failures also count as task retries, which is illogical and likely a design/logic fault - That's why I increased my retry parameters after that issue/research-flurry and didn't experience task loss up until now, now even with the increased retry parameters, I'm experiencing this issue, on multiple instance/scaling configurations, it's killing me slowly -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/705f171a-e394-46a2-bf82-5d18bfa74f25%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[google-appengine] Re: [SEVERE] App Engine can't consistently execute tasks
I also experienced this same issue previously, an engineer told me that internal/silent/invisible execution failures also count as task retries, which is illogical and likely a design/logic fault - That's why I increased my retry parameters after that issue/research-flurry and didn't experience task loss up until now, now even with the increased retry parameters, I'm experiencing this issue, on multiple instance/scaling configurations, it's killing me slowly -- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/db90c598-2c01-4936-a276-9c6bae345ee2%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.