Hi Chris,

Would it be possible for us (members outside the infra team) to run the Groovy script to cancel all the jobs? We're still experimenting with the Jenkins configuration so that it doesn't run for all the open PRs, so it would be great if we could run it ourselves instead of reaching out to you folks.

Thanks,
--Gautham

-----Original Message-----
From: Ayush Saxena <ayush...@gmail.com>
Sent: Monday, April 29, 2024 7:29 PM
To: Chris Thistlethwaite <chr...@apache.org>
Cc: common-dev@hadoop.apache.org
Subject: Re: Hadoop Windows Build

Thanx Chris, that would be great.

-Ayush

On Mon, 29 Apr 2024 at 19:07, Chris Thistlethwaite <chr...@apache.org> wrote:
> I'm following along on lists.a.o. I can cancel all the Windows jobs in the queue; we have a Groovy script for that.
>
> -Chris T.
> #asfinfra
>
> On 2024/04/28 17:35:21 Gautham Banasandra wrote:
> > Yeah, I just noticed that. May I know how I can abort all the jobs at once? I only saw that I can cancel the jobs one by one.
> >
> > Thanks,
> > --Gautham
> >
> > On 2024/04/28 15:19:13 Ayush Saxena wrote:
> > > Thanx Gautham for chasing this.
> > >
> > > I think there are still some 119 in the build queue, if you look on the left here [1] (search for "Build Queue"). They are all stuck on "Waiting for next available executor on Windows".
> > >
> > > If you aborted all of them previously and they have shown up again now, then something is still messed up in the configuration, so that the pipeline is getting triggered for the existing PRs (not just new ones). If you didn't abort them earlier, then maybe you need to abort all the ones in the queue and free up the resources.
> > >
> > > One example of a build that has been waiting for resources for the past 7 hours (as of now): [2]
> > >
> > > Let me know if you are stuck; we can figure things out together :-)
> > >
> > > -Ayush
> > >
> > > [1] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console
> > >
> > > On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra <gaur...@apache.org> wrote:
> > > > Hi folks,
> > > >
> > > > I apologize for the inconvenience caused.
> > > > I've now applied the mitigation described in [3].
> > > >
> > > > Unfortunately, there are only 12 Windows nodes in the whole swarm of Jenkins build nodes, so this caused starvation of the Windows nodes for other projects.
> > > >
> > > > I had reached out to the infra team several months ago and requested that they add more Windows nodes, but the request was turned down. I'm not sure there's a way around this other than getting more Windows nodes.
> > > >
> > > > Thanks,
> > > > --Gautham
> > > >
> > > > On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > > > > Found this on dev@hadoop -> moving to common-dev (the ML we use).
> > > > >
> > > > > I think there was an initiative to enable Windows pre-commit for every PR, and it seems to have gone wild: either the number of PRs raised is way more than the nodes can handle, or something got misconfigured in the job itself so that the build is triggered for all the open PRs, not just new ones, which is leading to starvation of resources.
> > > > >
> > > > > To the best of my knowledge, @Gautham Banasandra <gaur...@apache.org> / @Iñigo Goiri <elgo...@gmail.com> are chasing the initiative; can you folks help check?
> > > > >
> > > > > There are concerns raised by the Infra team here [1] on dev@hadoop.
> > > > >
> > > > > Most probably something got messed up while configuring the hadoop-multibranch-windows job. It shows some 613 PRs scheduled [2]; I think it scheduled builds for all the open ones. Something similar happened long, long ago when we were doing migrations; we can fetch pointers from [3].
> > > > >
> > > > > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > > > > [2] https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > > > > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> > > > >
> > > > > -Ayush
> > > > >
> > > > > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > > > > I'm not familiar with the Windows build, but you may have better luck reaching out to Apache Infra (mailing list, Jira, or even Slack): https://infra.apache.org/contact.html
> > > > > >
> > > > > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez <cesargu...@gmail.com> wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > An option that can be implemented in the Hadoop pipeline [1] is to set a timeout [2] on critical stages within the pipeline, for example the "Windows 10" stage.
> > > > > > >
> > > > > > > As for the issue the CI build is logging [3] in the hadoop-multibranch jobs reported by Chris, it seems the issue is around the post (cleanup) pipeline process.
> > > > > > > My two cents is to use cleanWs() instead of deleteDir(), as documented in https://plugins.jenkins.io/ws-cleanup/
> > > > > > >
> > > > > > > [1] https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > > > > > [2] https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > > > > > [3] From https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console:
> > > > > > >
> > > > > > > Still waiting to schedule task
> > > > > > > Waiting for next available executor on ‘Windows’
> > > > > > > [Pipeline] // node
> > > > > > > [Pipeline] stage
> > > > > > > [Pipeline] { (Declarative: Post Actions)
> > > > > > > [Pipeline] script
> > > > > > > [Pipeline] {
> > > > > > > [Pipeline] deleteDir
> > > > > > > [Pipeline] }
> > > > > > > [Pipeline] // script
> > > > > > > Error when executing cleanup post condition:
> > > > > > > Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: ca1b7f2f-ec16-4bde-ac51-85f964794e37
> > > > > > > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException: Required context class hudson.FilePath is missing
> > > > > > > Perhaps you forgot to surround the code with a step that provides this, such as: node
> > > > > > >     at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:265)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:300)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
> > > > > > >     at jdk.internal.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
> > > > > > >     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > > > >     at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
> > > > > > >     at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> > > > > > >     at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
> > > > > > >     at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
> > > > > > >     at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
> > > > > > >     at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
> > > > > > >     at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> > > > > > >     at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
> > > > > > >     at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
> > > > > > >     at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:163)
> > > > > > >     at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
> > > > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
> > > > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > > > >     at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > > > > >     at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
> > > > > > >     at WorkflowScript.run(WorkflowScript:196)
> > > > > > >     at ___cps.transform___(Native Method)
> > > > > > >     at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
> > > > > > >     at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
> > > > > > >     at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:80)
> > > > > > >     at jdk.internal.reflect.GeneratedMethodAccessor1046.invoke(Unknown Source)
> > > > > > >     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > > > > >     at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
> > > > > > >     at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
> > > > > > >     at com.cloudbees.groovy.cps.Next.step(Next.java:83)
> > > > > > >     at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
> > > > > > >     at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
> > > > > > >     at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
> > > > > > >     at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
> > > > > > >     at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
> > > > > > >     at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
> > > > > > >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > > > >     at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
> > > > > > >     at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
> > > > > > >     at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
> > > > > > >     at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
> > > > > > >     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > > > > > >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > > > >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > > > >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > > > >     at java.base/java.lang.Thread.run(Thread.java:829)
> > > > > > > [Pipeline] }
> > > > > > > [Pipeline] // stage
> > > > > > > [Pipeline] End of Pipeline
> > > > > > > Queue task was cancelled
> > > > > > > org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: dc84ec50-8661-44a1-a7c0-ba575feca31d
> > > > > > >
> > > > > > > On Fri, 26 Apr 2024 at 7:56, Chris Thistlethwaite (<chr...@apache.org>) wrote:
> > > > > > > > Greetings all!
> > > > > > > >
> > > > > > > > It was brought to my attention this morning that all the shared Jenkins Windows nodes were leased out to ci-hadoop. Upon investigation, it looks like there are several builds that have been stuck for the last 3+ days.
> > > > > > > > The particular build in question is https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/
> > > > > > > >
> > > > > > > > There are a ton of Windows builds in the queue as well, so even if I start killing these off, they are going to take over the nodes again and likely fail or get stuck at the same place.
> > > > > > > >
> > > > > > > > Can someone take a look at the build config? I'll have to force-stop these builds.
> > > > > > > >
> > > > > > > > Please add me to any replies, as I'm not subbed to this list.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > -Chris T.
> > > > > > > > #asfinfra
> > > > > > >
> > > > > > > --
> > > > > > > Sincerely,
> > > > > > > César Hernández
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
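On the question raised at the top of the thread (cancelling all the queued Windows jobs at once instead of one by one): the infra team's actual Groovy script is not shown here, so the following is only a minimal sketch of what such a script could look like. It assumes it is pasted into the Jenkins script console (Manage Jenkins » Script Console), which requires administrator rights. The job path is taken from the URLs in the thread; the DRY_RUN flag and the URL-based filter are illustrative assumptions, not anything Infra has confirmed.

```groovy
// Hypothetical script-console sketch (needs Overall/Administer). NOT the infra team's script.
// JOB_PATH and DRY_RUN are illustrative assumptions.
import hudson.model.Queue
import jenkins.model.Jenkins

// Relative URL prefix of the multibranch job whose queued items should be cancelled.
final String JOB_PATH = 'job/hadoop-multibranch-windows-10/'
// Run once with true to only list the matches; set to false to actually cancel them.
final boolean DRY_RUN = true

Queue queue = Jenkins.get().queue
int matched = 0

// getItems() returns a snapshot array, so cancelling inside the loop does not disturb iteration.
queue.items.each { Queue.Item item ->
    // Queue.Task#getUrl() is relative to the Jenkins root URL.
    String url = item.task.url ?: ''
    if (url.contains(JOB_PATH)) {
        matched++
        println "${DRY_RUN ? 'Would cancel' : 'Cancelling'} queue item ${item.id}: ${item.task.fullDisplayName}"
        if (!DRY_RUN) {
            queue.cancel(item)
        }
    }
}
println "${matched} queued item(s) matched ${JOB_PATH}"
```

Cancelling the queue item of a Pipeline build that is already waiting at a node step should make that step fail, which appears to be what the "Queue task was cancelled" line in the quoted log shows. As far as I understand the permission model, people without script-console access can still cancel individual queue items for jobs where they hold the Cancel permission (from the web UI, or with a POST to /queue/cancelItem?id=<id>); whether ci-hadoop grants that to non-infra members is a question for Infra.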
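Cesar's two suggestions (a timeout on the "Windows 10" stage, and doing the cleanup with cleanWs()) could be combined roughly as below. This is a hypothetical Declarative Pipeline skeleton, not the real dev-support/jenkinsfile-windows-10; the ‘Windows’ label comes from the log above, while the 12-hour limit and the echo placeholder are assumptions. Two hedged observations: per the Declarative Pipeline syntax documentation, stage-level options are applied before the stage's agent is allocated, so this timeout should also bound time spent waiting for a Windows executor. Also, the MissingContextVariableException in the log seems to come from deleteDir() running in a post block with no workspace (hudson.FilePath) available; cleanWs() also operates on the workspace, so it likely needs an agent context too, which is why the cleanup below sits in the stage-level post rather than the top-level one.

```groovy
// Hypothetical skeleton, NOT the real dev-support/jenkinsfile-windows-10.
// The time limit and the placeholder step are assumptions.
pipeline {
    // No global agent: a workspace only exists inside the stage below.
    agent none
    stages {
        stage('Windows 10') {
            agent {
                label 'Windows'
            }
            options {
                // Stage-level options wrap the agent allocation,
                // so this also limits time spent queued for an executor.
                timeout(time: 12, unit: 'HOURS')
            }
            steps {
                // Placeholder for the real build and test steps.
                echo 'Building on Windows'
            }
            post {
                // Stage-level post runs on the stage's agent, where a workspace exists.
                cleanup {
                    cleanWs()
                }
            }
        }
    }
}
```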