Hi @husayt,

Of course, I've been following this thread and understand that the issue 
doesn't appear in the logs directly. I'm just wondering how you managed to 
determine that this is happening if there isn't any trace anywhere... If 
there is such a trace, I'd appreciate it if you could get it to us, as we 
need it to look into this. That could be an affected timeframe on a given 
instance, some logs demonstrating the issue (@Kaan above mentioned 5xx 
spikes in their logs; you could also make calls to the REST API to show 
that X tasks were enqueued but only X - N finished, meaning N went 
missing), or a minimally reproducing example. Access to the internals of 
GAE is our specialty ;) We just need a little help from you in pointing 
out where to look, since there are a lot of "internals" to look at.

So, do you have the info required? Again, please note that an affected 
timeframe on a given instance (app id, version + module) is potentially 
enough. This is a high priority for us if it's a high priority for you.

Regards,

NP

On Thursday, February 26, 2015 at 8:54:02 AM UTC-5, husayt wrote:
>
> Hi @paynen,
> this is the problem: it's almost impossible to replicate externally, as 
> it happens somewhere in the internal App Engine stack.
>
> And the main problem, as Kaan also explained, is that it never hits the 
> logs.
>
> So there is not much we can do here as GAE users. This can be replicated 
> only with access to the internals of GAE.
>
> Can I also stress that this is the number-one issue on my list. I had a 
> support case created, and it didn't go forward because I couldn't 
> replicate the problem.
>
> One thing I can say is that it's more likely to happen when we have 
> bursts of tasks.
>
> Hope this helps,
> HG
> On Wednesday, February 25, 2015 at 10:52:06 PM UTC, paynen wrote:
>>
>> If anybody else reading this is also affected and can provide either a 
>> minimal reproducing example or an affected timeframe on a given 
>> instance, that would be the minimum information needed to look into a 
>> potential issue.
>>
>> I'm continuing to monitor this thread, and I hope we can get this 
>> addressed as soon as it's demonstrated/repro'd.
>>
>> On Monday, February 23, 2015 at 6:46:49 PM UTC-5, Kaan Soral wrote:
>>>
>>>>   rate: 500/s
>>>>   bucket_size: 100
>>>>   retry_parameters:
>>>>     task_retry_limit: 6
>>>>     min_backoff_seconds: 2
>>>>     max_doublings: 3
>>>
>>> Although my queue configuration is broad enough to handle occasional 
>>> internal failures, I noticed and verified that the taskqueue leaves 
>>> some tasks unexecuted (1% to 10%; it happens when you burst tasks / 
>>> run a [custom] mapreduce job, with both normal instances and 
>>> basic_scaling/B4 instances).
>>>
>>> I first noticed the issue when some operations that should have been 
>>> done were left undone.
>>>
>>> Then I inspected the taskqueue execution with a custom routine that 
>>> tracks/counts incoming and executing tasks, a routine I perfected long 
>>> ago, and noticed the missing executions.
>>>
>>> The issue isn't persistent: after a re-deployment and re-test, the 
>>> same routine managed to traverse all the entities as it's supposed to.
>>>
>>> TL;DR: some taskqueue tasks silently fail to execute. This should 
>>> never happen, but it happens very frequently without any apparent 
>>> reason, and it causes damage and confusion.
>>>
>>
