I'm following along on lists.a.o. I can cancel all the Windows jobs in the
queue; we have a Groovy script for that.
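
A minimal sketch of what a queue-cancelling script for the Jenkins script
console can look like follows. It is an illustration only, not the actual
Infra script. The label name 'Windows' is taken from the "Waiting for next
available executor on Windows" message quoted below, and the dryRun switch is
a hypothetical safety flag added for the example.

```groovy
// Sketch for the Jenkins script console (Manage Jenkins > Script Console).
// Assumption: the stuck items are the ones whose assigned label is 'Windows'.
import jenkins.model.Jenkins

def queue = Jenkins.get().queue
def dryRun = true   // hypothetical safety switch: set to false to really cancel

// Select the queued items that are waiting for the Windows label.
def stuck = queue.items.findAll { item ->
    item.assignedLabel?.name == 'Windows'
}

stuck.each { item ->
    println "${dryRun ? 'Would cancel' : 'Cancelling'}: " +
            "${item.task.fullDisplayName} (${item.why})"
    if (!dryRun) {
        queue.cancel(item)
    }
}
println "Matched ${stuck.size()} queued item(s)"
```

Running it once with dryRun left at true prints what would be cancelled, which
makes it easier to confirm that only the intended items match before anything
is removed from the queue.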

-Chris T.
#asfinfra

On 2024/04/28 17:35:21 Gautham Banasandra wrote:
> Yeah, I just noticed that. May I know how I can abort all the jobs at once?
> I only saw that I can cancel the jobs one-by-one.
> 
> Thanks,
> --Gautham
> 
> On 2024/04/28 15:19:13 Ayush Saxena wrote:
> > Thanks Gautham for chasing this.
> >
> > I think there are still some 119 builds in the build queue; you can see
> > them on the left here [1] (search for "Build Queue"). They are all stuck
> > on "Waiting for next available executor on Windows".
> >
> > If you aborted them all previously and they have shown up again, then
> > something is still wrong with the configuration and the pipeline is being
> > triggered for existing PRs (not just new ones). If you didn't abort them
> > earlier, then you may need to abort all the ones in the queue to free up
> > the resources.
> >
> > One example of a build that has been waiting for resources for the past
> > 7 hours (as of now): [2]
> >
> > Let me know if you are stuck; we can figure things out together :-)
> > 
> > -Ayush
> > 
> > 
> > [1]
> > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > [2]
> > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console
> > 
> > On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra <gaur...@apache.org> wrote:
> > 
> > > Hi folks,
> > >
> > > I apologize for the inconvenience caused. I've now applied the mitigation
> > > described in [3].
> > >
> > > Unfortunately, there are only 12 Windows nodes in the whole swarm of
> > > Jenkins build nodes, so this starved other projects of Windows nodes.
> > >
> > > I reached out to the infra team several months ago and asked them to add
> > > more Windows nodes, but the request was turned down. I'm not sure if
> > > there's a way around this, other than getting more Windows nodes.
> > >
> > > Thanks,
> > > --Gautham
> > >
> > > On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > > > Found this on dev@hadoop -> Moving to common-dev (the ML we use)
> > > >
> > > > > I think there was an initiative to enable Windows pre-commit for every
> > > > > PR, and that seems to have gone wild. Either the number of PRs raised is
> > > > > far more than the nodes can handle, or something got misconfigured in
> > > > > the job itself so that builds are being triggered for all open PRs, not
> > > > > just new ones, which is leading to starvation of resources.
> > > > >
> > > > > To the best of my knowledge, @Gautham Banasandra <gaur...@apache.org> /
> > > > > @Iñigo Goiri <elgo...@gmail.com> are chasing the initiative. Can you
> > > > > folks help check?
> > > > >
> > > > > There are concerns raised by the Infra team here [1] on dev@hadoop.
> > > > >
> > > > > Most probably something got messed up while configuring the
> > > > > hadoop-multibranch-windows job. It shows some 613 PRs scheduled [2]; I
> > > > > think it scheduled builds for all open ones. Something similar happened
> > > > > long ago when we were doing migrations; pointers can be fetched from [3].
> > > >
> > > > > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > > > > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > > > > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> > > >
> > > > -Ayush
> > > >
> > > >
> > > > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > > > > I'm not familiar with the Windows build, but you may have better luck
> > > > > > reaching out to Apache Infra (https://infra.apache.org/contact.html)
> > > > > > via the mailing list, Jira or even Slack.
> > > > >
> > > > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez <cesargu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > > > One option for the Hadoop pipeline [1] is to set a timeout [2] on
> > > > > > > critical stages within the pipeline, for example the "Windows 10"
> > > > > > > stage.
> > > > > > > As for the error the CI build is logging [3] in the
> > > > > > > hadoop-multibranch jobs reported by Chris, it seems to occur in the
> > > > > > > post (cleanup) part of the pipeline. My two cents is to use
> > > > > > > cleanWs() instead of deleteDir(), as documented at
> > > > > > > https://plugins.jenkins.io/ws-cleanup/
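
A rough sketch of how the two suggestions above could look in a declarative
Jenkinsfile follows. It is not taken from jenkinsfile-windows-10: the stage
name and the 'Windows' label come from this thread, while the 4-hour limit and
the echo step are placeholders. cleanWs() assumes the ws-cleanup plugin is
installed. Note that the log in [3] reports a missing hudson.FilePath context,
which means deleteDir ran outside any node; cleanWs() needs a workspace as
well, so the sketch keeps the cleanup in the stage-level post block rather
than a top-level one.

```groovy
// Sketch only: the time limit and build step are placeholders.
pipeline {
    agent none

    stages {
        stage('Windows 10') {
            agent { label 'Windows' }
            options {
                // Stage-level options are applied before the agent is
                // allocated, so this limit should also cover time spent
                // waiting for a Windows executor.
                timeout(time: 4, unit: 'HOURS')
            }
            steps {
                echo 'build and test steps go here'
            }
            post {
                cleanup {
                    // Placed in the stage's post block so that it normally
                    // runs on the stage's agent, where a workspace exists.
                    cleanWs()
                }
            }
        }
    }
}
```

With a limit like this, a build that cannot get an executor would be aborted
after the configured time instead of holding a queue slot indefinitely.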
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > >
> > > https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > > > >
> > > > > > [2]
> > > > > >
> > > > > >
> > > >
> > > https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > > > >
> > > > > > [3]
> > > > > >
> > > > > > > Still waiting to schedule task
> > > > > > > Waiting for next available executor on ‘Windows <https://ci-hadoop.apache.org/label/Windows/>’
> > > > > > > [Pipeline] // node
> > > > > > > [Pipeline] stage
> > > > > > > [Pipeline] { (Declarative: Post Actions)
> > > > > > > [Pipeline] script
> > > > > > > [Pipeline] {
> > > > > > > [Pipeline] deleteDir
> > > > > > > [Pipeline] }
> > > > > > > [Pipeline] // script
> > > > > > > Error when executing cleanup post condition:
> > > > > > > Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: ca1b7f2f-ec16-4bde-ac51-85f964794e37
> > > > > > > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException: Required context class hudson.FilePath is missing
> > > > > > > Perhaps you forgot to surround the code with a step that provides this, such as: node
> > > > > > >         at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:265)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:300)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
> > > > > > >         at jdk.internal.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
> > > > > > >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > > > >         at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
> > > > > > >         at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> > > > > > >         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
> > > > > > >         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
> > > > > > >         at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
> > > > > > >         at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
> > > > > > >         at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> > > > > > >         at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
> > > > > > >         at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
> > > > > > >         at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:163)
> > > > > > >         at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
> > > > > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
> > > > > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > > > >         at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
> > > > > > >         at WorkflowScript.run(WorkflowScript:196)
> > > > > > >         at ___cps.transform___(Native Method)
> > > > > > >         at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
> > > > > > >         at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
> > > > > > >         at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:80)
> > > > > > >         at jdk.internal.reflect.GeneratedMethodAccessor1046.invoke(Unknown Source)
> > > > > > >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > > > >         at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
> > > > > > >         at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
> > > > > > >         at com.cloudbees.groovy.cps.Next.step(Next.java:83)
> > > > > > >         at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
> > > > > > >         at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
> > > > > > >         at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
> > > > > > >         at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
> > > > > > >         at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
> > > > > > >         at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
> > > > > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > > > >         at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
> > > > > > >         at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
> > > > > > >         at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
> > > > > > >         at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
> > > > > > >         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > > > > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > > > >         at java.base/java.lang.Thread.run(Thread.java:829)
> > > > > > > [Pipeline] }
> > > > > > > [Pipeline] // stage
> > > > > > > [Pipeline] End of Pipeline
> > > > > > > Queue task was cancelled
> > > > > > > org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: dc84ec50-8661-44a1-a7c0-ba575feca31d
> > > > > >
> > > > > >
> > > > > > > On Fri, 26 Apr 2024 at 7:56, Chris Thistlethwaite
> > > > > > > (<chr...@apache.org>) wrote:
> > > > > >
> > > > > > > Greetings all!
> > > > > > >
> > > > > > > > It was brought to my attention this morning that all the shared
> > > > > > > > Jenkins Windows nodes were leased out to ci-hadoop. Upon
> > > > > > > > investigation, it looks like there are several builds stuck for
> > > > > > > > the last 3+ days. The particular build in question is
> > > > > > > > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/
> > > > > > > >
> > > > > > > > There are a ton of Windows builds in the queue as well, so even
> > > > > > > > if I start killing these off, they are going to be taking over
> > > > > > > > the nodes again and likely failing/sticking at the same place.
> > > > > > > >
> > > > > > > > Can someone take a look at the build config? I'll have to force
> > > > > > > > stop these builds.
> > > > > > >
> > > > > > > Please add me to any replies as I'm not subbed to this list.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > -Chris T.
> > > > > > > #asfinfra
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > > Sincerely,
> > > > > > César Hernández.
> > > > > >
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > >
> > >
> > 
> 
> 
