Does that have potential to break other things? We could presumably also update https://github.com/apache/beam/blob/4718cdff87fed4f92636e94dbf3a04c2315d6a95/.test-infra/jenkins/job_IODatastoresCredentialsRotation.groovy#L38 to pool-1 instead.
I put up https://github.com/apache/beam/pull/24466 in case that is preferable.

On Thu, Dec 1, 2022 at 1:29 PM Yi Hu <ya...@google.com> wrote:

> Thanks for reporting. I have bumped the pool size of io-datastore as we
> have more tests being added and the default-pool frequently becomes
> unschedulable due to memory constraints. A simple fix is just rename the
> 'pool1' back to 'default-pool'.
>
> On Thu, Dec 1, 2022 at 1:26 PM Danny McCormick <dannymccorm...@google.com>
> wrote:
>
>> Yes, I was just starting to look into this. Looks like this is the result
>> of this job failing -
>> https://github.com/apache/beam/blob/ec2a07b38c1f640c62e7c3b96966f18b334a7ce9/.test-infra/jenkins/job_IODatastoresCredentialsRotation.groovy#L49
>>
>> The error is:
>>
>> ```
>> 21:25:58 + gcloud container clusters upgrade io-datastores --node-pool=default-pool --zone=us-central1-a --quiet
>> 21:25:59 ERROR: (gcloud.container.clusters.upgrade) No node pool found matching the name [default-pool].
>> ```
>>
>> from
>> https://ci-beam.apache.org/job/Rotate%20IO-Datastores%20Cluster%20Credentials/6/console
>>
>> It looks like there's been some change to the cluster that is causing the
>> job to fail. If we don't fix this and rerun, the cluster's creds will
>> expire (probably in like a monthish). I'm not sure what the impact of that
>> would be, I think probably broken IO integration tests.
>>
>> @John Casey <johnjca...@google.com> or @Yi Hu <ya...@google.com> might
>> know more about this, I think the cluster in question is
>> https://pantheon.corp.google.com/kubernetes/clusters/details/us-central1-a/io-datastores/details?mods=dataflow_dev&project=apache-beam-testing
>>
>> Next steps are:
>> 1) figuring out why there's no longer a default-pool
>> 2) Either recreating it or modifying the cred rotation logic
>> 3) (Minor) Fixing the url in the Jenkins job so it actually points to the
>> failing job when we get emails like this
>>
>> On Thu, Dec 1, 2022 at 1:18 PM Byron Ellis via dev <dev@beam.apache.org>
>> wrote:
>>
>>> Is there something we need to do here?
>>>
>>> On Thu, Dec 1, 2022 at 10:10 AM Apache Jenkins Server <
>>> jenk...@builds.apache.org> wrote:
>>>
>>>> Something went wrong during the automatic credentials rotation for
>>>> IO-Datastores Cluster, performed at Thu Dec 01 15:00:47 UTC 2022. It may be
>>>> necessary to check the state of the cluster certificates. For further
>>>> details refer to the following links:
>>>> * https://ci-beam.apache.org/job/beam_SeedJob_Standalone/
>>>> * https://ci-beam.apache.org/.
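For the "modifying the cred rotation logic" option, here is a rough sketch of how the upgrade step could look up the pool name instead of hard-coding it. This is not what the Jenkins job does today: `pick_node_pool` and `rotate_with_existing_pool` are hypothetical helper names, and falling back to the only existing pool is an assumption about the desired behavior. The cluster, zone, and upgrade command are the ones from the error log above.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not part of the Beam job.

# Read node pool names from stdin (one per line) and print the one to use:
# the preferred name if it is present, otherwise the sole pool when exactly
# one other pool exists. Returns 1 when the choice would be ambiguous or empty.
pick_node_pool() {
  preferred="$1"
  count=0
  last=""
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    if [ "$name" = "$preferred" ]; then
      printf '%s\n' "$name"
      return 0
    fi
    count=$((count + 1))
    last="$name"
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$last"
    return 0
  fi
  return 1
}

# List the cluster's node pools, pick one, then run the same upgrade command
# that failed in the log, with the resolved pool name.
rotate_with_existing_pool() {
  cluster="$1"
  zone="$2"
  preferred="$3"
  pool="$(gcloud container node-pools list --cluster="$cluster" \
    --zone="$zone" --format='value(name)' | pick_node_pool "$preferred")" || {
    echo "No unambiguous node pool found in cluster $cluster" >&2
    return 1
  }
  gcloud container clusters upgrade "$cluster" \
    --node-pool="$pool" --zone="$zone" --quiet
}
```

The job would then call something like `rotate_with_existing_pool io-datastores us-central1-a default-pool`. If the cluster ends up with more than one pool and none is named default-pool, this fails loudly rather than guessing, which seems safer for a credentials job.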