I think it was a memory issue, but I couldn't see any Out of Memory or related errors initially. Regardless, I have some room to play with on my ESX servers, so I cranked up the agents to 32 GB each and updated the configs. So far so good!
Thanks!!

On Tue, Mar 12, 2019 at 3:31 PM Aravind SV <[email protected]> wrote:

> Hello Jeff,
>
> I remember an issue where a large test artifact made the GoCD agent run
> out of memory and one of the threads died. Can you check if you see any
> OutOfMemory exceptions in the log?
>
> Another thing to try: If possible, make those artifacts be non-test
> artifacts. That way they won't be parsed (and you won't see the exception
> you saw). If that stops the agents from going into LostContact, maybe
> the problem was with parsing the test results?
>
> Cheers,
> Aravind
>
> On Mon, Mar 11, 2019 at 11:51:50 -0600, Jeff wrote:
> > All,
> >
> > In my GoCD setup, all running on Ubuntu 16.04, I have the latest version
> > of GoCD (server version: 19.2.0-8641, agent version: 19.2.0-8641). I have
> > six agent machines, each running a total of 4 agent processes. Ubuntu
> > 16.04 is up-to-date as of 3/11/19.
> >
> > Two of the agent machines are constantly going offline, meaning the GoCD
> > server shows their status as "LostContact". In looking at the logs, there
> > appear to be no errors. Here is a 'tail' of one of the agent process logs
> > that is currently "LostContact":
> >
> > 2019-03-08 12:32:55,309 WARN [scheduler-2] BaseCommandBuilder:92 - return code is 1
> > 2019-03-08 12:32:55,349 INFO [scheduler-2] HttpService:138 - Got back 200 from server
> > 2019-03-08 12:32:55,349 INFO [scheduler-2] DefaultGoPublisher:101 - Agent [go-agent06.mycompany.com, 10.131.24.205, 8fad4457-695b-45d5-9af8-f1eb9ff9a397] is reporting build result [Failed] to Go Server for Build [responders_INT/46/Validate/1/validate/28411]
> > 2019-03-08 12:32:55,381 INFO [scheduler-2] HttpService:138 - Got back 200 from server
> > 2019-03-08 12:32:55,382 INFO [scheduler-2] DefaultGoPublisher:95 - Agent [go-agent06.mycompany.com, 10.131.24.205, 8fad4457-695b-45d5-9af8-f1eb9ff9a397] is reporting status [Completing] to Go Server for Build [responders_INT/46/Validate/1/validate/28411]
> > 2019-03-08 12:32:55,410 INFO [scheduler-2] ArtifactsPublisher:69 - Pluggable metadata folder is empty.
> > 2019-03-08 12:32:57,641 INFO [scheduler-2] HttpService:71 - Uploading file [/var/lib/go-agent/data/cruise-5730ea53-05c1-4215-a110-b76c17960317/9efb8cf5-8076-4c6b-b75d-67ccf3891a16/surefire-reports.zip] to url [https://go.mycompany.com:8154/go/remoting/files/responders_INT/46/Validate/1/validate/respondders_results/qa-libs/qa-responders/target?attempt=1&buildId=28411]
> > 2019-03-08 12:32:58,768 INFO [scheduler-2] HttpService:138 - Got back 201 from server
> > 2019-03-08 12:32:59,008 INFO [pool-4-thread-1] HttpService:138 - Got back 200 from server
> >
> > It looks like the failed test run was reported to the server correctly.
> > The agent process is still running and no other errors appear.
> >
> > On the server I see a lot of these errors in the go-shine.log:
> >
> > com.thoughtworks.studios.shine.ShineRuntimeException: Could not create graph from XML RDF!
> >     at com.thoughtworks.studios.shine.semweb.sesame.SesameGraph.addTriplesFromRDFXMLAbbrev(SesameGraph.java:287)
> >     at com.thoughtworks.studios.shine.semweb.grddl.GRDDLTransformer.transform(GRDDLTransformer.java:57)
> >     at com.thoughtworks.studios.shine.xunit.AntJUnitReportRDFizer.importXUnit(AntJUnitReportRDFizer.java:57)
> >     at com.thoughtworks.studios.shine.xunit.AntJUnitReportRDFizer.importFile(AntJUnitReportRDFizer.java:52)
> >     at com.thoughtworks.studios.shine.cruise.stage.details.XMLArtifactImporter.importXML(XMLArtifactImporter.java:71)
> >     ...
> > Caused by: org.eclipse.rdf4j.rio.RDFParseException: unexpected literal [line 34, column 20]
> >     at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:322)
> >     at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportError(AbstractRDFParser.java:684)
> >     at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.reportError(RDFXMLParser.java:1237)
> >     at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.text(RDFXMLParser.java:563)
> >
> > Could this be killing the connection somehow on the server side?
> >
> > Anyone else see this?
> >
> > Thanks!
> >
> > --
> > You received this message because you are subscribed to the Google Groups "go-cd" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
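
For anyone following the thread later: the suggestion above to check the agent logs for OutOfMemory exceptions could be done with a small script along these lines. This is only a sketch, not something from the thread. The function name `find_oom_lines` is made up for illustration, and `/var/log/go-agent` is an assumed log location for a package-installed agent; point it at wherever your agent processes actually write their logs.

```python
import re
from pathlib import Path

# Matches "java.lang.OutOfMemoryError" as well as looser "out of memory" wording.
OOM_PATTERN = re.compile(r"OutOfMemory|out of memory", re.IGNORECASE)


def find_oom_lines(log_dir, pattern="*.log"):
    """Return (file name, line number, text) for every log line mentioning an OOM."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if OOM_PATTERN.search(line):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Assumed log directory; adjust for your installation.
    for name, number, text in find_oom_lines("/var/log/go-agent"):
        print(f"{name}:{number}: {text}")
```

With four agent processes per machine, each process may log to its own directory, so the script would need to be run once per directory. An empty result only means the pattern was not found in the files scanned; it does not rule out a memory problem.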
