FYI. The site automatically restarted the instances at about 6am GMT, 
after 18 hours of outage.

So I presume it was simply the zone being exhausted and nothing to do with 
my specific instances, and my 10 hours of trying to fix it were all in vain. 
Next time I won't bother.

On Saturday, 12 March 2022 at 19:05:40 UTC Pip Jones wrote:

> When trying to deploy my App Engine Flex app today, I am getting this 
> error, after the build has completed.
>
> `ERROR: (gcloud.app.deploy) Error Response: [9] An internal error occurred 
> while processing task 
> /app-engine-flex/flex_await_healthy/flex_await_healthy>2022-03-12T14:09:32.742Z6575.hg.2:
>  
> The region us-west2 does not have enough resources available to fulfill the 
> request. Please try again later.`
>
> After about 10 attempts and about 2 hours, it killed off my previous 
> running version's two instances, so now my company's app is down, and I 
> cannot bring it back up. 
>
> I get this error pretty regularly when deploying new versions (almost 
> every time) but it usually succeeds after a couple of attempts, so I've 
> just lived with it. But now my site is down and it's 4 hours later, and I'd 
> really like to know if there's something I can do to fix it, or is it just 
> a case of waiting for the zone to have more capacity?
>
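
The retry-until-it-succeeds routine described above could be scripted. This is only a minimal Python sketch, assuming the command being retried is `gcloud app deploy --quiet`; `deploy_with_retry` and its parameters are hypothetical names made up for illustration, not part of gcloud. It does not create capacity: if the zone stays exhausted, it simply gives up after the last attempt.

```python
import subprocess
import time


def deploy_with_retry(cmd, attempts=5, base_delay=60.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with code 0 or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    failed attempts. Returns True on success, False otherwise.
    `run` and `sleep` are injectable so the logic can be tested
    without calling gcloud or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return False


# Example call (assumes gcloud is installed and already configured):
# deploy_with_retry(["gcloud", "app", "deploy", "--quiet"])
```

Capping the number of attempts is deliberate, since repeated failed deploys did not help here.
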
> A newly deployed version shows up in the console (and in the command-line 
> version list) for a while, but then disappears on its own.
>
> I have checked my quotas under IAM & Admin, and nothing is above 30% 
> allocation at most. Besides, now that my site is not running, I don't have 
> many resources in use.
>
> I noticed the previous version had 2 instances which seemed stuck in 
> "restarting" state from a couple of weeks ago. I killed them off manually 
> in the console thinking these might have been consuming resources. I wonder 
> if this has somehow skewed the auto-scaler? I was hoping it would 
> eventually repair itself, as it seems GCP sometimes takes a while to do 
> stuff in the background.
>
> I have tried restarting the previous version in the console, but it just 
> sits at 0 instances. It's autoscaled, but the autoscaling has stopped 
> working. I tried stopping it, waiting, then restarting it. 
>
> I have checked the Stackdriver logs and it's definitely a 
> ZONE_RESOURCE_POOL_EXHAUSTED error, e.g.
> serviceName: "compute.googleapis.com"
> status: {
>   code: 8
>   details: [
>     0: {
>       @type: "type.googleapis.com/google.protobuf.Struct"
>       value: {
>         zoneResourcePoolExhausted: {
>           resource: {
>             project: {
>               canonicalProjectId: "XXX"
>             }
>             resourceName: "us-west2-b"
>             resourceType: "ZONE"
>             scope: {
>               scopeName: "global"
>               scopeType: "GLOBAL"
>             }
>           }
>         }
>       }
>     }
>   ]
>   message: "ZONE_RESOURCE_POOL_EXHAUSTED"
> }
>
>
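
To see which zone a failure like the one above names without reading each entry by hand, something along these lines might help. It is a minimal Python sketch, assuming the log entry has been exported and loaded as a dict shaped like the excerpt above; `exhausted_zones` is a hypothetical helper, and the sample dict only mirrors the fields shown in the excerpt.

```python
def exhausted_zones(entry):
    """Return the zone names reported as exhausted in one log entry.

    Assumes `entry` is a dict shaped like the excerpt above:
    status -> details (list) -> value -> zoneResourcePoolExhausted.
    """
    zones = []
    for detail in entry.get("status", {}).get("details", []):
        pool = detail.get("value", {}).get("zoneResourcePoolExhausted")
        if pool:
            zones.append(pool.get("resource", {}).get("resourceName"))
    return zones


# Sample entry mirroring the fields in the excerpt (project ID redacted).
sample_entry = {
    "serviceName": "compute.googleapis.com",
    "status": {
        "code": 8,
        "details": [
            {
                "@type": "type.googleapis.com/google.protobuf.Struct",
                "value": {
                    "zoneResourcePoolExhausted": {
                        "resource": {
                            "project": {"canonicalProjectId": "XXX"},
                            "resourceName": "us-west2-b",
                            "resourceType": "ZONE",
                            "scope": {"scopeName": "global",
                                      "scopeType": "GLOBAL"},
                        }
                    }
                },
            }
        ],
        "message": "ZONE_RESOURCE_POOL_EXHAUSTED",
    },
}

print(exhausted_zones(sample_entry))
```
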
> I have tried increasing the readiness_check: app_start_timeout_sec and 
> increasing failure_threshold and timeouts etc in case this was on the edge, 
> but judging by the logs, the instance doesn't even begin to get booted (due 
> to the VM not being allocated).
>
> I tried re-deploying the previous version again.
>
> I tried stopping the current version (which previously was "SERVING" but 
> with 0 instances) and then deploying, but this doesn't help. So at this 
> point I'm deploying over nothing running at all in my project, confirming 
> it cannot be quotas.
>
> I did notice, though, seemingly inconsistent reports of the number of 
> instances in my service logs. This doesn't make sense because there are NO 
> instances running either before or after deployment.
>
> 2022-03-12 13:09:55.422 GMT
> The number of running VMs for version 20220211t141828 changed from 2 to 1
> 2022-03-12 13:10:19.089 GMT
> The number of running VMs for version 20220211t141828 changed from 1 to 3
> 2022-03-12 13:10:25.919 GMT
> The number of running VMs for version 20220211t141828 changed from 3 to 4
> 2022-03-12 13:10:41.160 GMT
> The number of running VMs for version 20220211t141828 changed from 4 to 2
>
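
To compare these reported counts against what the console shows, the messages can be pulled out of exported log text. This is a minimal Python sketch, assuming the log is available as plain text lines in the format quoted above; `VM_CHANGE` and `vm_changes` are hypothetical names chosen for illustration.

```python
import re

# Matches the service log message format quoted above.
VM_CHANGE = re.compile(
    r"The number of running VMs for version (\S+) changed from (\d+) to (\d+)"
)


def vm_changes(lines):
    """Yield (version, before, after) for each matching log line.

    Lines that do not match (for example the timestamp lines) are skipped.
    """
    for line in lines:
        match = VM_CHANGE.search(line)
        if match:
            yield match.group(1), int(match.group(2)), int(match.group(3))


sample_log = """\
2022-03-12 13:09:55.422 GMT
The number of running VMs for version 20220211t141828 changed from 2 to 1
2022-03-12 13:10:19.089 GMT
The number of running VMs for version 20220211t141828 changed from 1 to 3
2022-03-12 13:10:25.919 GMT
The number of running VMs for version 20220211t141828 changed from 3 to 4
2022-03-12 13:10:41.160 GMT
The number of running VMs for version 20220211t141828 changed from 4 to 2
""".splitlines()

for version, before, after in vm_changes(sample_log):
    print(version, before, "->", after)
```
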
> I tried deploying the app to a different service name (and was going to 
> change my dispatch to reroute to that), but that service deployment failed 
> with the same error.
>
> The status pages look OK. 
>
> I've tried --verbosity=debug which didn't reveal any extra info.
>
> I've read every post I can find (including my own previous post in this 
> group, where it was a "prerequisite" error caused by quotas), and the only 
> option I seem to be left with is migrating my app to a new project in a more 
> reliable region like us-central. However, this will be a lot of work as I'm 
> using GCS, Functions, and networking to other providers, which will all have 
> to be migrated.
>
> Is there any way to get more detailed information on the resource problem?
>
> thanks
> Pip Jones
>
>

