Hi Kaan,
I had exactly the same issue (and may still have it, as there is no easy way to 
detect it). This caused us a huge problem while we were migrating our namespaces 
to a new data schema: we only found out after some namespaces stopped 
working after the migration.
Upon examination we found that tasks didn't run for these 
namespaces. There were no error logs, and I don't know how many of these 
missed tasks we had before and after.

This is scary. It happens during bursts when we run many tasks at once. The 
major problem is the absence of logs.
The failures are difficult to reproduce and even more difficult to catch.

Task queues are offered as a guaranteed task execution solution, but this 
shows they are not.
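
Since the missed tasks leave no logs, one way to at least detect them is to keep our own record of what was enqueued and what actually started, and to compare the two afterwards. Below is a rough sketch of that idea only, not something we run in production. The TaskLedger class, its method names and the task names are all hypothetical, and it keeps everything in memory, whereas a real app would need to persist the records (for example in the datastore).

```python
import time


class TaskLedger(object):
    """Hypothetical in-memory ledger of enqueued vs. started tasks.

    Assumption: each task has a unique name, and the task handler calls
    record_started() as the very first thing it does.
    """

    def __init__(self):
        self._enqueued = {}    # task name -> enqueue timestamp (seconds)
        self._started = set()  # names of tasks whose handler actually ran

    def record_enqueued(self, task_name, now=None):
        # Call this right after adding the task to the queue.
        self._enqueued[task_name] = time.time() if now is None else now

    def record_started(self, task_name):
        # Call this at the top of the task handler.
        self._started.add(task_name)

    def find_missing(self, older_than_seconds, now=None):
        # Tasks enqueued more than older_than_seconds ago that never started.
        now = time.time() if now is None else now
        return sorted(
            name for name, enqueued_at in self._enqueued.items()
            if name not in self._started
            and now - enqueued_at > older_than_seconds
        )
```

A periodic job could call find_missing() with a generous threshold and raise an alert for anything it returns. This only tells you that a task never reached your handler; it does not explain why.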

Best,
HG


On Tuesday, February 24, 2015 at 2:48:34 PM UTC, Jason Collins wrote:
>
> Hi Kaan,
>
> Just lending support to this:
>
>> engineer told me that internal/silent/invisible execution failures also 
>> count as task retries
>
>
> I have also been told this. We have an open-source framework for doing 
> workflows based on task queues, and one of the features we tried to build 
> into it was an alerting system for when the final task retry failed (and thus the 
> task would go away permanently). We had to give up on the feature because 
> it was possible (and we saw it) for tasks to fail before ever hitting our 
> code, and this counts as a retry, and if it happened to be on the last 
> retry, then our alert code wouldn't get a chance to operate. 
>
> In short, it would have been an unreliable feature. Note, this was a long 
> time ago though (3+ years).
>
> j
>
> On Monday, 23 February 2015 23:49:44 UTC-6, Kaan Soral wrote:
>>
>> If there were any logs, it wouldn't be an issue; I would likely have solved 
>> it from the logs. The issue is that there are no logs.
>>
>> With this recent issue, the only error logs are from some instances dying 
>> critically from memory overflows (after going over 256 MB; my theory 
>> is an ndb memory leak, but that's off-topic).
>> (There are 5-6 error logs like these, all instance overflows. Since there 
>> are tens of thousands of other operations, the error ratio is low.)
>>
>> In this specific case, the critical death of instances might be related 
>> to the issue: those instances might be taking some tasks with them and 
>> leaving those tasks unexecuted (theory).
>>
>> I will dig in and find the discussion
>>
>>
