I am looking at our builds and trying to understand why our agents are often disconnected during builds. We generally get a stack trace like this:

maven6 was marked offline: Connection was broken: java.io.IOException: Pipe closed after 0 cycles
    at org.apache.sshd.common.channel.ChannelPipedInputStream.read(ChannelPipedInputStream.java:118)
    at org.apache.sshd.common.channel.ChannelPipedInputStream.read(ChannelPipedInputStream.java:101)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:92)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:73)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

As far as I can see, we are using 16 GB "hosts" for the Linux agents.

Something very strange is that the Jenkins agent (the small component that links the build host to the controller) is configured with `-Xms8g -Xmx8g`, so we are reserving 50% of the server's memory for it (even more once non-heap memory is counted). It should normally need far less; 1 GB is already a lot in my experience.

Because of this, the OS can see it as the biggest process on the host and decide to kill it when the rest of the memory is used up by the build.

I think we should decrease this value.
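To put rough numbers on the point above, here is a minimal Python sketch that computes the configured maximum heap as a share of host RAM. It assumes the 16 GB host size mentioned above (treated as 16 GiB) and that JVM size suffixes are 1024-based; the function names are hypothetical helpers, not part of Jenkins or the JVM. It says nothing about non-heap memory or about what the OS will actually decide to kill.

```python
import re

# JVM size suffixes are powers of 1024 (k, m, g, t), case-insensitive.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_jvm_size(text):
    """Convert a JVM size such as '8g' or '128m' into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized JVM size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def max_heap_bytes(jvm_args):
    """Return the -Xmx value from a JVM argument string; the last one wins."""
    heap = None
    for arg in jvm_args.split():
        if arg.startswith("-Xmx"):
            heap = parse_jvm_size(arg[len("-Xmx"):])
    if heap is None:
        raise ValueError("no -Xmx option found")
    return heap


def heap_share(jvm_args, host_bytes):
    """Fraction of host RAM that the maximum heap may occupy."""
    return max_heap_bytes(jvm_args) / host_bytes


if __name__ == "__main__":
    host = 16 * 1024**3  # assumption: 16 GiB host, as stated in the message
    for args in ("-Xms8g -Xmx8g", "-Xms128m -Xmx1g"):
        print(f"{args}: max heap is {heap_share(args, host):.1%} of host RAM")
```

Comparing the two settings this way only shows how much headroom the build itself is left with; whether the disconnects stop is something only a trial on the agents can confirm.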
(I can do it, but I don't know how the ci.apache.org agents were configured, and I would rather not add more issues if this setting was already there in the past.)

I don't think it is the root cause of our instabilities (at least not all of them), and there is something else I still have to find, but it's a cheap fix to try.

FYI, our agent VMs look roughly like this today:

- Java
  + Home: `/usr/local/asfpackages/java/oraclejdk-1.8.0-291/jre`
  + Vendor: Oracle Corporation
  + Version: 1.8.0_291
  + Maximum memory: 7.67 GB (8232370176)
  + Allocated memory: 7.67 GB (8232370176)
  + Free memory: 6.03 GB (6470953760)
  + In-use memory: 1.64 GB (1761416416)
  + GC strategy: ParallelGC
  + Available CPUs: 4

8 GB is reserved, but only about 1.6 GB is in use (and the GC does nothing because free memory is high).

I would be in favor of trying to launch them with

    -Xms128m -Xmx1g -XX:+UseG1GC -XX:+UseStringDeduplication

I think that's enough customization to start with.

Cheers

On Wed, Jul 21, 2021 at 1:28 PM Arnaud Héritier <aherit...@gmail.com> wrote:

> I am not sure about the setup
> AFAICS we don't use any JDK installer (
> https://ci-maven.apache.org/configureTools/ ) thus I suppose that the
> different JDKs are supposed to be installed directly on the agent ?
> I am not sure how it was done on the previous environment
>
> On Sun, Jul 18, 2021 at 5:30 PM Tibor Digana <tibordig...@apache.org>
> wrote:
>
>> The new CI system has the following issue:
>>
>> /home/jenkins/tools/java/latest1.7/bin/java: not found
>>
>> https://ci-maven.apache.org/job/Maven/job/maven-box/job/maven-surefire/job/master/104/execution/node/183/log/
>>
>> On Wed, Jun 30, 2021 at 8:03 PM Gavin McDonald <gmcdon...@apache.org>
>> wrote:
>>
>> > Hi Maven folks.
>> >
>> > Infra has decided to separate off the Maven build jobs from
>> > ci-builds.apache.org over to its very own Jenkins Controller and Agents.
>> >
>> > This means that Maven now has a dedicated Jenkins environment for itself.
>> > It also means that no other projects build jobs can build on the Maven
>> > nodes; and then Maven jobs will no longer be able to build on the
>> > ci-builds jobs.
>> >
>> > Your new Controller is set up as https://ci-maven.apache.org and all
>> > Maven Committers can login via LDAP and create jobs.
>> >
>> > At the time of writing, there is one node/agent attached but I am
>> > building 4 more - all Ubuntu 20.04 and based in our Azure account.
>> >
>> > We can automagically move all your jobs over from ci-builds to ci-maven
>> > - I just need someone to tell me go ahead and do it.
>> >
>> > In the meantime, feel free to have a test. The remaining 4 agents will
>> > be online by tomorrow. We will review after a month if 5 is enough nodes.
>> >
>> > As with other projects having their own dedicated controller, who have
>> > taken advantage of this isolation by having some nodes/agents given to
>> > the project as a 'targeted donation' so someone here may know of a
>> > Company will to donate 5 - 10 or more nodes specifically for Maven
>> > Jenkins environment. Infra can afford to hand you over 5 right now.
>> >
>> > Let me know if you have any questions, otherwise let me know when I can
>> > make the transfer of your jobs.
>> >
>> > Thanks
>> >
>> > --
>> >
>> > *Gavin McDonald*
>> > Systems Administrator
>> > ASF Infrastructure Team
>> >
>>
>
> --
> Arnaud Héritier
> Twitter/Skype : aheritier
>

--
Arnaud Héritier
Twitter/Skype : aheritier