Yeah, I just noticed that. May I know how I can abort all the jobs at once? I only saw that I can cancel the jobs one-by-one.
Thanks,
--Gautham

On 2024/04/28 15:19:13 Ayush Saxena wrote:
> Thanx Gautham for chasing this.
>
> I think there are still some 119 in the build queue, if you see on the left
> here [1] (Search for Build Queue). They are all stuck on "Waiting for next
> available executor on Windows"
>
> If you aborted all previously & they showed up now again, then something is
> still messed up with the configurations that the pipeline is getting
> triggered for the existing PR (not new), if you didn't abort earlier then
> maybe you need to abort all the ones in queue and free up the resources.
>
> One example of build waiting (as of now) for resource since past 7 hours [2]
>
> Let me know if you are stuck, we can together get things figured out :-)
>
> -Ayush
>
> [1] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console
>
> On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra <gaur...@apache.org> wrote:
>
> > Hi folks,
> >
> > I apologize for the inconvenience caused. I've now applied the mitigation
> > described in [3].
> >
> > Unfortunately, there are only 12 Windows nodes in the whole swarm of
> > Jenkins build nodes.
> > Thus, this caused a starvation of the Windows nodes for other projects.
> >
> > I had reached out to the infra team several months ago and requested them
> > to add more Windows nodes, but it was turned down. I'm not sure if there's
> > a way around this, other than getting more Windows nodes.
> >
> > Thanks,
> > --Gautham
> >
> > On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > > Found this on dev@hadoop -> Moving to common-dev (the ML we use)
> > >
> > > I think there was some initiative to enable Windows Pre-Commit for every PR
> > > and that seems to have gone wild, either the number of PRs raised are way
> > > more than the capacity the nodes can handle or something got misconfigured
> > > in the job itself that the build is getting triggered for all the open PR
> > > not just new, which is leading to starvation of resources.
> > >
> > > To the best of my knowledge
> > > @Gautham Banasandra <gaur...@apache.org> / @Iñigo Goiri <elgo...@gmail.com> are
> > > chasing the initiative, can you folks help check?
> > >
> > > There are concerns raised by the Infra team here [1] on dev@hadoop
> > >
> > > Most probably something messed up while configuring the
> > > hadoop-multibranch-windows job, it shows some 613 PR scheduled [2], I think
> > > it scheduled for all open ones, something similar happened long-long ago
> > > when we were doing migrations, can fetch pointers from [3]
> > >
> > > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> > >
> > > -Ayush
> > >
> > > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > > I'm not familiar with Windows build.
> > > > But you may have better luck reaching out to Apache Infra
> > > > https://infra.apache.org/contact.html
> > > >
> > > > mailing list, jira or even slack
> > > >
> > > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez <cesargu...@gmail.com> wrote:
> > > >
> > > > > Hello,
> > > > > An option that can be implemented in the Hadoop pipeline [1] is to set a
> > > > > timeout [2] on critical stages within the pipelines, for example in the
> > > > > "Windows 10" stage.
> > > > > As for the issue the CI build is logging [3] in the hadoop-multibranch jobs
> > > > > reported by Chris, it seems the issue is around the Post (cleanup) pipeline
> > > > > process. My two cents is to use cleanWs() instead of deleteDir() as
> > > > > documented in: https://plugins.jenkins.io/ws-cleanup/
> > > > >
> > > > > [1] https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > > > [2] https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > > > [3]
> > > > >
> > > > > Still waiting to schedule task
> > > > > Waiting for next available executor on ‘Windows <https://ci-hadoop.apache.org/label/Windows/>’
> > > > > [Pipeline] // node
> > > > > [Pipeline] stage <https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console#>
> > > > > [Pipeline] { (Declarative: Post Actions)
> > > > > [Pipeline] script
> > > > > [Pipeline] {
> > > > > [Pipeline] deleteDir
> > > > > [Pipeline] }
> > > > > [Pipeline] // script
> > > > > Error when executing cleanup post condition:
> > > > > Also: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: ca1b7f2f-ec16-4bde-ac51-85f964794e37
> > > > > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException: Required context class hudson.FilePath is missing
> > > > > Perhaps you forgot to surround the code with a step that provides this, such as: node
> > > > >     at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:265)
> > > > >     at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:300)
> > > > >     at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
> > > > >     at jdk.internal.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
> > > > >     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > >     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > >     at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
> > > > >     at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> > > > >     at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
> > > > >     at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
> > > > >     at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
> > > > >     at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
> > > > >     at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> > > > >     at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
> > > > >     at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
> > > > >     at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:163)
> > > > >     at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
> > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
> > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > >     at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
> > > > >     at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
> > > > >     at WorkflowScript.run(WorkflowScript:196)
> > > > >     at ___cps.transform___(Native Method)
> > > > >     at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
> > > > >     at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
> > > > >     at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:80)
> > > > >     at jdk.internal.reflect.GeneratedMethodAccessor1046.invoke(Unknown Source)
> > > > >     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > >     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > >     at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
> > > > >     at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
> > > > >     at com.cloudbees.groovy.cps.Next.step(Next.java:83)
> > > > >     at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
> > > > >     at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
> > > > >     at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
> > > > >     at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
> > > > >     at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
> > > > >     at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
> > > > >     at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
> > > > >     at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
> > > > >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > >     at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
> > > > >     at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
> > > > >     at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
> > > > >     at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
> > > > >     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > > > >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > >     at java.base/java.lang.Thread.run(Thread.java:829)
> > > > > [Pipeline] }
> > > > > [Pipeline] // stage
> > > > > [Pipeline] End of Pipeline
> > > > > Queue task was cancelled
> > > > > org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: dc84ec50-8661-44a1-a7c0-ba575feca31d
> > > > >
> > > > > On Fri, 26 Apr 2024 at 7:56, Chris Thistlethwaite (<chr...@apache.org>) wrote:
> > > > >
> > > > > > Greetings all!
> > > > > >
> > > > > > It was brought to my attention this morning that all the shared Jenkins
> > > > > > Windows nodes were leased out to ci-hadoop. Upon investigation, it
> > > > > > looks like there are several builds stuck for the last 3+ days. The
> > > > > > particular build in question is
> > > > > > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/
> > > > > >
> > > > > > There are a ton of Windows builds in the queue as well, so even if I
> > > > > > start killing these off, they are going to be taking over the nodes
> > > > > > again and likely failing/sticking at the same place.
> > > > > >
> > > > > > Can someone take a look at the build config? I'll have to force stop
> > > > > > these builds.
> > > > > >
> > > > > > Please add me to any replies as I'm not subbed to this list.
> > > > > >
> > > > > > Thanks!
> > > > > > -Chris T.
> > > > > > #asfinfra
> > > > >
> > > > > --
> > > > > Sincerely,
> > > > > César Hernández.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org