Dear All,

Thank you for your suggestions.
Just a quick update: I have successfully migrated the database from H2 to PostgreSQL (Amazon Aurora). In the two weeks since the migration, the GoCD server has not restarted itself and the DB lock issues have not recurred. I'm sharing my DB migration procedure below; I hope it helps anyone else facing this issue.

*Procedure*

*1. Provision the DB (Amazon Aurora)*

Connect to the instance:

psql -h rdsdbhost.rds.amazonaws.com -U gocd -d postgres

Create the role and database in PostgreSQL:

CREATE ROLE "gocd_database_user" PASSWORD 'gocd_database_password' NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;
CREATE DATABASE "gocd" ENCODING="UTF8" TEMPLATE="template0";
GRANT ALL PRIVILEGES ON DATABASE "gocd" TO "gocd_database_user";

*The GoCD user must be the DB owner:*

ALTER DATABASE gocd OWNER TO gocd_database_user;

*2. Provision an EC2 instance*

*3. Prepare the DB migration tool on the EC2 instance*

Install Java 11:

yum install java-1.11.0-openjdk    (Amazon Linux 2)
yum install java-11-openjdk        (Red Hat)

Download the DB migration tool and unpack it:

curl -L -o gocd-database-migrator-1.0.4.tgz https://github.com/gocd/gocd-database-migrator/releases/download/1.0.4-229-exp/gocd-database-migrator-1.0.4.tgz
gunzip gocd-database-migrator-1.0.4.tgz
tar xf gocd-database-migrator-1.0.4.tar

*4. Mount the EFS volume on the EC2 instance*

mkdir -p /gocd/godata
mount -t efs -o tls fs-efsid:/gocd/godata /gocd/godata

*5. Put the GoCD server into maintenance mode*

*6. Stop the GoCD server*

*7. Run the DB migration tool*

cd gocd-database-migrator-1.0.4
./bin/gocd-database-migrator \
  --insert \
  --progress \
  --source-db-url='jdbc:h2:/gocd/godata/db/h2db/cruise' \
  --target-db-url='jdbc:postgresql://rdsdbhost.rds.amazonaws.com:5432/gocd' \
  --target-db-user='gocd_database_user' \
  --target-db-password='gocd_database_password'

*8. Change the DB config*

cd /gocd/godata/config

Create a db.properties file:

db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://rdsdbhost.rds.amazonaws.com:5432/gocd
db.user=gocd_database_user
db.password=gocd_database_password

Set the file's ownership and permissions:

chown 1000:1000 db.properties
chmod 644 db.properties

*9. Start the GoCD server*
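If it helps, step 8 can also be scripted so it is repeatable. This is only a sketch: the host, database name, user, and password are the example placeholders from the procedure (substitute your own), and the target directory is parameterised so you can dry-run it somewhere harmless before pointing it at the real config directory.

```shell
#!/bin/sh
set -eu

# Example placeholders from the procedure above -- substitute real values.
DB_HOST='rdsdbhost.rds.amazonaws.com'
DB_NAME='gocd'
DB_USER='gocd_database_user'
DB_PASS='gocd_database_password'

# On the GoCD server this would be /gocd/godata/config; defaulting to a
# temp dir lets you dry-run the script elsewhere first.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

# Write the PostgreSQL connection settings that GoCD reads on startup.
cat > "$CONFIG_DIR/db.properties" <<EOF
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
db.user=${DB_USER}
db.password=${DB_PASS}
EOF

# The GoCD server process runs as uid/gid 1000, so the file must be
# owned by (or at least readable to) that user. chown needs root, so
# tolerate failure on a dry run.
chown 1000:1000 "$CONFIG_DIR/db.properties" 2>/dev/null || true
chmod 644 "$CONFIG_DIR/db.properties"

echo "wrote $CONFIG_DIR/db.properties"
```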
Best Regards,
Komgrit

On Thursday, September 12, 2024 at 6:07:20 PM UTC+7 Sriram Narayanan wrote:

> On Thu, Sep 12, 2024 at 6:33 PM Chad Wilson <[email protected]> wrote:
>
>> The warnings on materials not matching and upstream pipelines may be able
>> to be ignored. Not relevant to this. Similar with the maximum backtracking
>> limit (unrelated problem to this issue).
>>
>> But
>>
>> 1. is the GoCD server process being restarted after these "The
>> database has been closed" errors? *Before* you then start seeing the
>> locking errors? (e.g do you see it logging from something like
>> Jetty9Server:199 - Configuring Jetty using /etc/go/jetty.xml again,
>> which only happens at )
>> 2. Are there other errors before "The database has been closed
>> [90098-200]" if you go back further looking for stack traces or "Out of
>> memory"?
>>
>> There are other threads which describe very similar problems, and similar
>> to them you probably need to keep finding your root cause:
>> https://groups.google.com/g/go-cd/c/KPyCqTpxS-k/m/61Ps4wHvDQAJ
>> https://groups.google.com/g/go-cd/c/4yuK8dx8m-Q/m/dpre3JAhAgAJ
>>
>> Note that the users both traced back H2DB issues to "Out of memory"
>> errors. Switching to Postgres is unlikely to fix memory problems, which is
>> why it's important to eliminate this, in my opinion.
>
> After reading through the various messages, I am inclined to agree with
> Chad. While switching to Postgres has its own benefits, it is wise to
> identify and address the root cause.
>
> Komgrit, do you have backups configured? See:
> https://docs.gocd.org/current/advanced_usage/one_click_backup.html
>
>> -Chad
>>
>> On Thu, Sep 12, 2024 at 4:08 PM Komgrit Aneksri <[email protected]> wrote:
>>
>>> Thank you Chad,
>>>
>>> I dig into GoCD Server logs before DB locked.
>>> >>> *I always found many ERROR messages below.* >>> >>> 2024-09-10 08:14:09,413 WARN [118@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: ar-eod-service-deploy-prod. Possible Reasons: (1) >>> Upstream pipelines have not been built yet. (2) Materials do not match >>> between configuration and build-cause. >>> 2024-09-10 08:14:09,416 WARN [120@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: thinslice-eligibility-service-deploy-nonProd. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:09,437 WARN [122@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: mercury-bff-order-prod. Possible Reasons: (1) Upstream >>> pipelines have not been built yet. (2) Materials do not match between >>> configuration and build-cause. >>> 2024-09-10 08:14:09,441 WARN [121@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: digital-help-ios-payment-publish-6.11. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:09,446 WARN [121@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-litigation-web-keyfong-ka-deploy-prod. >>> Possible Reasons: (1) Upstream pipelines have not been built yet. (2) >>> Materials do not match between configuration and build-cause. >>> 2024-09-10 08:14:09,447 WARN [113@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: sonarqube-venus-backend. Possible Reasons: (1) >>> Upstream pipelines have not been built yet. 
(2) Materials do not match >>> between configuration and build-cause. >>> 2024-09-10 08:14:09,449 WARN [118@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-enforcement-bff-deploy-qa. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:09,451 WARN [122@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: ar-eod-cdc-deploy-prod. Possible Reasons: (1) Upstream >>> pipelines have not been built yet. (2) Materials do not match between >>> configuration and build-cause. >>> 2024-09-10 08:14:09,472 WARN [114@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: deep-product-deploy-prod. Possible Reasons: (1) >>> Upstream pipelines have not been built yet. (2) Materials do not match >>> between configuration and build-cause. >>> 2024-09-10 08:14:09,480 WARN [121@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: portal-goloyalty-digital-campaign-prod-deployment. >>> Possible Reasons: (1) Upstream pipelines have not been built yet. (2) >>> Materials do not match between configuration and build-cause. >>> 2024-09-10 08:14:09,503 WARN [117@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-frontend-notification-service-deploy-qa. >>> Possible Reasons: (1) Upstream pipelines have not been built yet. (2) >>> Materials do not match between configuration and build-cause. 
>>> 2024-09-10 08:14:09,512 ERROR [117@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:220 - Error while >>> scheduling pipeline: ar-loan-job-deploy-prod >>> com.thoughtworks.go.server.service.dd.MaxBackTrackLimitReachedException: >>> Maximum Backtracking limit reached while trying to resolve revisions for >>> material >>> DependencyMaterialConfig{pipelineName='ar-loan-job-deploy-nonProd', >>> stageName='Deployment-uat'} >>> at >>> com.thoughtworks.go.server.service.dd.DependencyFanInNode.hasMoreInstances(DependencyFanInNode.java:233) >>> at >>> com.thoughtworks.go.server.service.dd.DependencyFanInNode.fillNextRevisions(DependencyFanInNode.java:122) >>> at >>> com.thoughtworks.go.server.service.dd.DependencyFanInNode.handleNeedMoreRevisions(DependencyFanInNode.java:83) >>> at >>> com.thoughtworks.go.server.service.dd.DependencyFanInNode.initRevision(DependencyFanInNode.java:75) >>> at >>> com.thoughtworks.go.server.service.dd.DependencyFanInNode.populateRevisions(DependencyFanInNode.java:61) >>> at >>> com.thoughtworks.go.server.service.dd.FanInGraph.initChildren(FanInGraph.java:311) >>> at >>> com.thoughtworks.go.server.service.dd.FanInGraph.computeRevisions(FanInGraph.java:174) >>> at >>> com.thoughtworks.go.server.service.PipelineService.getRevisionsBasedOnDependencies(PipelineService.java:219) >>> at >>> com.thoughtworks.go.server.service.AutoBuild.fanInOn(AutoBuild.java:108) >>> at >>> com.thoughtworks.go.server.service.AutoBuild.onModifications(AutoBuild.java:67) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:191) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) >>> at >>> 
com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) >>> at >>> com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) >>> at >>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) >>> at >>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) >>> at java.base/java.lang.Thread.run(Unknown Source) >>> >>> >>> *And When DB locked, There were messages below* >>> >>> 2024-09-10 08:14:18,425 INFO [qtp1814840342-32487118] Stage:236 - Stage >>> is being completed by transition id: 2129759 >>> 2024-09-10 08:14:18,623 WARN [121@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: application-domain-security-group-prod. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,626 WARN [117@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-enforcement-bff-deploy-prod. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,632 WARN [118@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-enforcement-service-deploy-qa. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,637 WARN [115@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-litigation-ka-bff-deploy-prod. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. 
(2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,641 WARN [120@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-recovery-work-list-deploy-qa. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,644 WARN [116@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,644 WARN [115@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,660 ERROR [115@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been >>> closed [90098-200] >>> 2024-09-10 08:14:18,658 ERROR [116@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been >>> closed [90098-200] >>> 2024-09-10 08:14:18,647 WARN [122@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-litigation-web-ka-deploy-prod. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,644 WARN [121@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,660 ERROR [121@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been >>> closed [90098-200] >>> 2024-09-10 08:14:18,665 WARN [122@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-onescreen-cdc-deploy-uat. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. 
(2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,686 WARN [114@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,696 WARN [119@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: collections-recovery-web-ui-deploy-qa. Possible >>> Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do >>> not match between configuration and build-cause. >>> 2024-09-10 08:14:18,686 ERROR [114@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been >>> closed [90098-200] >>> 2024-09-10 08:14:18,706 WARN [122@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,706 ERROR [122@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been >>> closed [90098-200] >>> 2024-09-10 08:14:18,713 WARN [119@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:175 - Error while >>> scheduling pipeline: mercury-bff-report-test-env. Possible Reasons: (1) >>> Upstream pipelines have not been built yet. (2) Materials do not match >>> between configuration and build-cause. 
>>> 2024-09-10 08:14:18,715 WARN [118@MessageListener for >>> ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, >>> SQLState: 90098 >>> 2024-09-10 08:14:18,719 ERROR [116@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:220 - Error while >>> scheduling pipeline: collections-litigation-cdc-deploy-qa >>> org.springframework.dao.DataAccessResourceFailureException: Hibernate >>> operation: could not execute query; SQL [SELECT materials.id FROM >>> pipelineMaterialRevisions INNER JOIN pipelines ON >>> pipelineMaterialRevisions.pipelineId = pipelines.id INNER JOIN >>> modifications on modifications.id = >>> pipelineMaterialRevisions.torevisionId INNER JOIN materials on >>> modifications.materialId = materials.id WHERE materials.id = ? AND >>> pipelineMaterialRevisions.toRevisionId >= ? AND >>> pipelineMaterialRevisions.fromRevisionId <= ? AND pipelines.name = ? >>> GROUP BY materials.id;]; The database has been closed [90098-200]; >>> nested exception is org.h2.jdbc.JdbcSQLNonTransientConnectionException: The >>> database has been closed [90098-200] >>> at >>> org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:79) >>> at >>> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) >>> at >>> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82) >>> at >>> org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428) >>> at >>> org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:414) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:416) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342) >>> at >>> 
com.thoughtworks.go.server.persistence.MaterialRepository.hasPipelineEverRunWith(MaterialRepository.java:853) >>> at >>> com.thoughtworks.go.server.materials.MaterialChecker.hasPipelineEverRunWith(MaterialChecker.java:100) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:186) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) >>> at >>> com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) >>> at >>> com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) >>> at >>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) >>> at >>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) >>> at java.base/java.lang.Thread.run(Unknown Source) >>> Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The >>> database has been closed [90098-200] >>> at org.h2.message.DbException.getJdbcSQLException(DbException.java:622) >>> at org.h2.message.DbException.getJdbcSQLException(DbException.java:429) >>> at org.h2.message.DbException.get(DbException.java:205) >>> at org.h2.message.DbException.get(DbException.java:181) >>> at org.h2.message.DbException.get(DbException.java:170) >>> at org.h2.engine.Database.checkPowerOff(Database.java:506) >>> at org.h2.command.Command.executeQuery(Command.java:224) >>> at >>> org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:114) >>> at >>> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) >>> at >>> 
org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) >>> at >>> org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208) >>> at org.hibernate.loader.Loader.getResultSet(Loader.java:1953) >>> at org.hibernate.loader.Loader.doQuery(Loader.java:802) >>> at >>> org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274) >>> at org.hibernate.loader.Loader.doList(Loader.java:2542) >>> at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276) >>> at org.hibernate.loader.Loader.list(Loader.java:2271) >>> at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:316) >>> at org.hibernate.impl.SessionImpl.listCustomQuery(SessionImpl.java:1842) >>> at >>> org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:165) >>> at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:157) >>> at >>> com.thoughtworks.go.server.persistence.MaterialRepository.lambda$hasPipelineEverRunWith$10(MaterialRepository.java:876) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411) >>> ... 11 common frames omitted >>> 2024-09-10 08:14:18,719 ERROR [121@MessageListener for >>> ScheduleCheckListener] BuildCauseProducerService:220 - Error while >>> scheduling pipeline: ar-loan-infrastructure-rds-snapshot-prod >>> org.springframework.dao.DataAccessResourceFailureException: Hibernate >>> operation: could not execute query; SQL [SELECT materials.id FROM >>> pipelineMaterialRevisions INNER JOIN pipelines ON >>> pipelineMaterialRevisions.pipelineId = pipelines.id INNER JOIN >>> modifications on modifications.id = >>> pipelineMaterialRevisions.torevisionId INNER JOIN materials on >>> modifications.materialId = materials.id WHERE materials.id = ? AND >>> pipelineMaterialRevisions.toRevisionId >= ? AND >>> pipelineMaterialRevisions.fromRevisionId <= ? AND pipelines.name = ? 
>>> GROUP BY materials.id;]; The database has been closed [90098-200]; >>> nested exception is org.h2.jdbc.JdbcSQLNonTransientConnectionException: The >>> database has been closed [90098-200] >>> at >>> org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:79) >>> at >>> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) >>> at >>> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82) >>> at >>> org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428) >>> at >>> org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:414) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:416) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342) >>> at >>> com.thoughtworks.go.server.persistence.MaterialRepository.hasPipelineEverRunWith(MaterialRepository.java:853) >>> at >>> com.thoughtworks.go.server.materials.MaterialChecker.hasPipelineEverRunWith(MaterialChecker.java:100) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:186) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) >>> at >>> com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) >>> at >>> com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) >>> at >>> com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) >>> at >>> 
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) >>> at >>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) >>> at java.base/java.lang.Thread.run(Unknown Source) >>> Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The >>> database has been closed [90098-200] >>> at org.h2.message.DbException.getJdbcSQLException(DbException.java:622) >>> at org.h2.message.DbException.getJdbcSQLException(DbException.java:429) >>> at org.h2.message.DbException.get(DbException.java:194) >>> at org.h2.engine.Session.getTransaction(Session.java:1792) >>> at >>> org.h2.engine.Session.startStatementWithinTransaction(Session.java:1815) >>> at org.h2.command.Command.executeQuery(Command.java:190) >>> at >>> org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:114) >>> at >>> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) >>> at >>> org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) >>> at >>> org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208) >>> at org.hibernate.loader.Loader.getResultSet(Loader.java:1953) >>> at org.hibernate.loader.Loader.doQuery(Loader.java:802) >>> at >>> org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274) >>> at org.hibernate.loader.Loader.doList(Loader.java:2542) >>> at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276) >>> at org.hibernate.loader.Loader.list(Loader.java:2271) >>> at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:316) >>> at org.hibernate.impl.SessionImpl.listCustomQuery(SessionImpl.java:1842) >>> at >>> org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:165) >>> at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:157) >>> at >>> 
com.thoughtworks.go.server.persistence.MaterialRepository.lambda$hasPipelineEverRunWith$10(MaterialRepository.java:876) >>> at >>> org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411) >>> ... 11 common frames omitted >>> >>> Best Regards, >>> Komgrit >>> On Wednesday, September 11, 2024 at 2:11:45 PM UTC+7 Chad Wilson wrote: >>> >>>> Before we get into memory stats, did you look at the logs to see if the >>>> server is being restarted internally as I described below? No point >>>> looking >>>> at memory stats unless we have evidence there is a memory problem. >>>> >>>> You generally cannot use OS-level stats to debug memory usage for Java >>>> applications like GoCD on its own - you need to look at the internal Java >>>> heap used/free stats. You might have available memory at container/host >>>> level, but the JVM is not using it due to the settings, and so you are >>>> still running out of memory. Furthermore, the increase you show is almost >>>> all file buffer/cache rather than application usage. >>>> >>>> If I recall correctly, by default the GoCD server only starts with a >>>> max heap size of 1G which is relatively small for a bigger server with >>>> perhaps hundreds of pipelines, however we should try to find evidence of >>>> that before randomly changing things or going deeper. >>>> >>>> -Chad >>>> >>>> On Wed, Sep 11, 2024 at 12:31 PM Komgrit Aneksri <[email protected]> >>>> wrote: >>>> >>>>> Thank you Chad for help me to investigate and suggest. >>>>> >>>>> Here are more information about my GoCD server resources below. >>>>> >>>>> We are using worker node as c6g.2xlarge. 
>>>>> >>>>> Currently, CPU usage is 40 - 45 % >>>>> >>>>> After restarted GoCD has memory free around 3GB, Then GoCD server run >>>>> >>>>> After restart >>>>> bash-5.1$ free -m >>>>> total used free shared buff/cache >>>>> available >>>>> Mem: 15678 3037 4500 3 8395 >>>>> 12640 >>>>> Swap: 0 0 0 >>>>> >>>>> A while has passed to now, memory free was reduced to 260MB >>>>> bash-5.1$ free -m >>>>> total used free shared buff/cache >>>>> available >>>>> Mem: 15678 3394 260 3 12278 >>>>> 12283 >>>>> Swap: 0 0 0 >>>>> >>>>> JVM is default setting. >>>>> >>>>> Regards, >>>>> Komgrit >>>>> >>>>> On Wednesday, September 11, 2024 at 9:38:20 AM UTC+7 Chad Wilson wrote: >>>>> >>>>>> If this has never happened before, and only just started happening, >>>>>> then *something* must have changed. Might be worth figuring that out. >>>>>> >>>>>> A database becomes locked like this only when two instances are >>>>>> trying to connect to the same H2 database file, or one crashed somehow >>>>>> without releasing the lock. Probably need to see the full error/stack >>>>>> trace >>>>>> to see the root cause, however usually it's something like "Caused by: >>>>>> java.lang.IllegalStateException: The file is locked: >>>>>> nio:/godata/db/h2db/cruise.mv.db [1.4.200/7]" >>>>>> >>>>>> I suggest you look inside the GoCD server log file more directly, not >>>>>> just k8s stats. GoCD runs as a multi-process container, and has its own >>>>>> process manager (Tanuki Java wrapper) so it is possible that even >>>>>> without >>>>>> Kubernetes showing container or pod restarts that GoCD itself has been >>>>>> restarted by the Tanuki process manager. it will log when it does so. >>>>>> I'd >>>>>> look for when the errors started, and then scroll back through the >>>>>> container logs to see if the process was restarted by Tanuki. It will >>>>>> restart the main JVM if it thinks the main server process is not >>>>>> responding, or due to OOM errors etc. 
Perhaps the lock is not being >>>>>> released fast enough. Anyway - your root problem may be heap >>>>>> size/memory/CPU constraints rather than the database itself. >>>>>> >>>>>> Even if you use Postgres, if you have cases where there are two GoCD >>>>>> server instances overlapping or trying to share the database file you >>>>>> will >>>>>> have other issues of some sort (due to race conditions) and if you have >>>>>> some other server stability issue causing restarts it's probably wise to >>>>>> understand how it is getting into this state first so you're addressing >>>>>> the >>>>>> right problem. >>>>>> >>>>>> As for migration to Postgres, the docs are at >>>>>> https://github.com/gocd/gocd-database-migrator . There's nothing >>>>>> specific for EKS/Kubernetes however generally speaking you'd need to >>>>>> >>>>>> - prepare your postgres instance per >>>>>> >>>>>> https://docs.gocd.org/current/installation/configuring_database/postgres.html >>>>>> - (when ready to do the "proper" run) stop your GoCD server >>>>>> instance >>>>>> - get your H2 DB file off EFS somewhere to run the migration tool >>>>>> against >>>>>> - run the migrator tool >>>>>> - change GoCD server Helm chart to mount the db.properties that >>>>>> tell it how to connect to Postgres >>>>>> - start the GoCD server instances against postgres >>>>>> >>>>>> >>>>>> -Chad >>>>>> >>>>>> On Wed, Sep 11, 2024 at 9:56 AM Komgrit Aneksri <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> I have no any change in configurations. >>>>>>> >>>>>>> But We have add new users and new pipelines every day. >>>>>>> >>>>>>> Pods is still running status and no restart/spawn/evicted. >>>>>>> >>>>>>> If we have to migrate h2 to postgresql. Do you have any migration >>>>>>> documentations for K8s? >>>>>>> >>>>>>> Regards, >>>>>>> Komgrit >>>>>>> On Tuesday, September 10, 2024 at 10:32:31 PM UTC+7 Chad Wilson >>>>>>> wrote: >>>>>>> >>>>>>>> What changed in your setup when this started happening? 
>>>>>>>>
>>>>>>>> Is your GoCD server pod crashing and being automatically restarted?
>>>>>>>> Are nodes it is running on dying and the pod being re-scheduled
>>>>>>>> elsewhere?
>>>>>>>>
>>>>>>>> On Tue, 10 Sept 2024, 17:02 Komgrit Aneksri, <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi team,
>>>>>>>>>
>>>>>>>>> I am facing issue about database.
>>>>>>>>>
>>>>>>>>> Error message below
>>>>>>>>> Could not open JDBC Connection for transaction; nested exception
>>>>>>>>> is org.h2.jdbc.JdbcSQLNonTransientConnectionException: Database may be
>>>>>>>>> already in use: null. Possible solutions: close all other connection(s);
>>>>>>>>> use the server mode [90020-200]
>>>>>>>>>
>>>>>>>>> Now I did restarted the gocd server then it is back to normal now.
>>>>>>>>>
>>>>>>>>> I used GoCD version 23.1.0 running on EKS
>>>>>>>>>
>>>>>>>>> And store files and database (h2) on EFS.
>>>>>>>>>
>>>>>>>>> I have found this issue 2 times to now (last Thursday and today)
>>>>>>>>>
>>>>>>>>> Cloud you please help me what should improve for fix this issue?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Komgrit
--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/go-cd/363329d0-0a0a-4e7d-83f5-608e8f59738fn%40googlegroups.com.
