The warnings about materials not matching and upstream pipelines can likely be ignored; they are not relevant to this. The same goes for the maximum backtracking limit (an unrelated problem to this issue).
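If it helps, something like the following is a quick way to hunt through the server log for the things I'd look for - startup markers and memory errors. (This is only a sketch: the log path and the demo input are placeholders, not your real data - point LOG at your actual go-server log, or on Kubernetes capture `kubectl logs <pod>` to a file first.)

```shell
# Sketch only: hunt for restart / out-of-memory evidence in the GoCD server log.
# LOG is a placeholder path - replace it with your real go-server log file.
LOG="${LOG:-./go-server.log}"

# Demo input so the commands below run as-is; delete this when using a real log.
[ -s "$LOG" ] || cat > "$LOG" <<'EOF'
2024-09-10 08:13:55,001 ERROR [main] java.lang.OutOfMemoryError: Java heap space
2024-09-10 08:14:02,100 INFO [main] Jetty9Server:199 - Configuring Jetty using /etc/go/jetty.xml
2024-09-10 08:14:18,660 ERROR [115@MessageListener] JDBCExceptionReporter:234 - The database has been closed [90098-200]
EOF

echo '--- startup markers (Jetty is only configured when the JVM starts or restarts) ---'
grep -n 'Configuring Jetty' "$LOG"

echo '--- memory problems / closed-database errors, with 2 lines of leading context ---'
grep -n -B2 -iE 'OutOfMemory|Out of memory|database has been closed' "$LOG"
```

If the Jetty line shows up again shortly before the closed-database errors, that is evidence the wrapper restarted the JVM underneath you.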
But:

1. Is the GoCD server process being restarted after these "The database has been closed" errors, *before* you then start seeing the locking errors? (e.g. do you see it logging something like "Jetty9Server:199 - Configuring Jetty using /etc/go/jetty.xml" again, which only happens at startup?)
2. Are there other errors before "The database has been closed [90098-200]" if you go back further, looking for stack traces or "Out of memory"?

There are other threads which describe very similar problems, and like them you probably need to keep hunting for your root cause:
https://groups.google.com/g/go-cd/c/KPyCqTpxS-k/m/61Ps4wHvDQAJ
https://groups.google.com/g/go-cd/c/4yuK8dx8m-Q/m/dpre3JAhAgAJ

Note that both of those users traced their H2 DB issues back to "Out of memory" errors. Switching to Postgres is unlikely to fix memory problems, which is why it's important to eliminate this possibility first, in my opinion.

-Chad

On Thu, Sep 12, 2024 at 4:08 PM Komgrit Aneksri <[email protected]> wrote:

> Thank you Chad,
>
> I dug into the GoCD Server logs from before the DB locked.
>
> *I always found many ERROR messages below.*
>
> 2024-09-10 08:14:09,413 WARN [118@MessageListener for ScheduleCheckListener] BuildCauseProducerService:175 - Error while scheduling pipeline: ar-eod-service-deploy-prod. Possible Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do not match between configuration and build-cause.
> 2024-09-10 08:14:09,416 WARN [120@MessageListener for ScheduleCheckListener] BuildCauseProducerService:175 - Error while scheduling pipeline: thinslice-eligibility-service-deploy-nonProd. Possible Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do not match between configuration and build-cause.
> 2024-09-10 08:14:09,437 WARN [122@MessageListener for ScheduleCheckListener] BuildCauseProducerService:175 - Error while scheduling pipeline: mercury-bff-order-prod. Possible Reasons: (1) Upstream pipelines have not been built yet.
(2) Materials do not match between > configuration and build-cause. > 2024-09-10 08:14:09,441 WARN [121@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: digital-help-ios-payment-publish-6.11. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:09,446 WARN [121@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-litigation-web-keyfong-ka-deploy-prod. > Possible Reasons: (1) Upstream pipelines have not been built yet. (2) > Materials do not match between configuration and build-cause. > 2024-09-10 08:14:09,447 WARN [113@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: sonarqube-venus-backend. Possible Reasons: (1) > Upstream pipelines have not been built yet. (2) Materials do not match > between configuration and build-cause. > 2024-09-10 08:14:09,449 WARN [118@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-enforcement-bff-deploy-qa. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:09,451 WARN [122@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: ar-eod-cdc-deploy-prod. Possible Reasons: (1) Upstream > pipelines have not been built yet. (2) Materials do not match between > configuration and build-cause. > 2024-09-10 08:14:09,472 WARN [114@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: deep-product-deploy-prod. Possible Reasons: (1) > Upstream pipelines have not been built yet. (2) Materials do not match > between configuration and build-cause. 
> 2024-09-10 08:14:09,480 WARN [121@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: portal-goloyalty-digital-campaign-prod-deployment. > Possible Reasons: (1) Upstream pipelines have not been built yet. (2) > Materials do not match between configuration and build-cause. > 2024-09-10 08:14:09,503 WARN [117@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-frontend-notification-service-deploy-qa. > Possible Reasons: (1) Upstream pipelines have not been built yet. (2) > Materials do not match between configuration and build-cause. > 2024-09-10 08:14:09,512 ERROR [117@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:220 - Error while > scheduling pipeline: ar-loan-job-deploy-prod > com.thoughtworks.go.server.service.dd.MaxBackTrackLimitReachedException: > Maximum Backtracking limit reached while trying to resolve revisions for > material > DependencyMaterialConfig{pipelineName='ar-loan-job-deploy-nonProd', > stageName='Deployment-uat'} > at > com.thoughtworks.go.server.service.dd.DependencyFanInNode.hasMoreInstances(DependencyFanInNode.java:233) > at > com.thoughtworks.go.server.service.dd.DependencyFanInNode.fillNextRevisions(DependencyFanInNode.java:122) > at > com.thoughtworks.go.server.service.dd.DependencyFanInNode.handleNeedMoreRevisions(DependencyFanInNode.java:83) > at > com.thoughtworks.go.server.service.dd.DependencyFanInNode.initRevision(DependencyFanInNode.java:75) > at > com.thoughtworks.go.server.service.dd.DependencyFanInNode.populateRevisions(DependencyFanInNode.java:61) > at > com.thoughtworks.go.server.service.dd.FanInGraph.initChildren(FanInGraph.java:311) > at > com.thoughtworks.go.server.service.dd.FanInGraph.computeRevisions(FanInGraph.java:174) > at > com.thoughtworks.go.server.service.PipelineService.getRevisionsBasedOnDependencies(PipelineService.java:219) > at 
com.thoughtworks.go.server.service.AutoBuild.fanInOn(AutoBuild.java:108) > at > com.thoughtworks.go.server.service.AutoBuild.onModifications(AutoBuild.java:67) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:191) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) > at java.base/java.lang.Thread.run(Unknown Source) > > > *And When DB locked, There were messages below* > > 2024-09-10 08:14:18,425 INFO [qtp1814840342-32487118] Stage:236 - Stage > is being completed by transition id: 2129759 > 2024-09-10 08:14:18,623 WARN [121@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: application-domain-security-group-prod. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,626 WARN [117@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-enforcement-bff-deploy-prod. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. 
> 2024-09-10 08:14:18,632 WARN [118@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-enforcement-service-deploy-qa. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,637 WARN [115@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-litigation-ka-bff-deploy-prod. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,641 WARN [120@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-recovery-work-list-deploy-qa. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,644 WARN [116@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,644 WARN [115@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,660 ERROR [115@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been > closed [90098-200] > 2024-09-10 08:14:18,658 ERROR [116@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been > closed [90098-200] > 2024-09-10 08:14:18,647 WARN [122@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-litigation-web-ka-deploy-prod. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. 
> 2024-09-10 08:14:18,644 WARN [121@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,660 ERROR [121@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been > closed [90098-200] > 2024-09-10 08:14:18,665 WARN [122@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-onescreen-cdc-deploy-uat. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,686 WARN [114@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,696 WARN [119@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: collections-recovery-web-ui-deploy-qa. Possible > Reasons: (1) Upstream pipelines have not been built yet. (2) Materials do > not match between configuration and build-cause. > 2024-09-10 08:14:18,686 ERROR [114@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been > closed [90098-200] > 2024-09-10 08:14:18,706 WARN [122@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,706 ERROR [122@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:234 - The database has been > closed [90098-200] > 2024-09-10 08:14:18,713 WARN [119@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:175 - Error while > scheduling pipeline: mercury-bff-report-test-env. Possible Reasons: (1) > Upstream pipelines have not been built yet. (2) Materials do not match > between configuration and build-cause. 
> 2024-09-10 08:14:18,715 WARN [118@MessageListener for > ScheduleCheckListener] JDBCExceptionReporter:233 - SQL Error: 90098, > SQLState: 90098 > 2024-09-10 08:14:18,719 ERROR [116@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:220 - Error while > scheduling pipeline: collections-litigation-cdc-deploy-qa > org.springframework.dao.DataAccessResourceFailureException: Hibernate > operation: could not execute query; SQL [SELECT materials.id FROM > pipelineMaterialRevisions INNER JOIN pipelines ON > pipelineMaterialRevisions.pipelineId = pipelines.id INNER JOIN > modifications on modifications.id = > pipelineMaterialRevisions.torevisionId INNER JOIN materials on > modifications.materialId = materials.id WHERE materials.id = ? AND > pipelineMaterialRevisions.toRevisionId >= ? AND > pipelineMaterialRevisions.fromRevisionId <= ? AND pipelines.name = ? > GROUP BY materials.id;]; The database has been closed [90098-200]; nested > exception is org.h2.jdbc.JdbcSQLNonTransientConnectionException: The > database has been closed [90098-200] > at > org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:79) > at > org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) > at > org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82) > at > org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428) > at > org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:414) > at > org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:416) > at > org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342) > at > 
com.thoughtworks.go.server.persistence.MaterialRepository.hasPipelineEverRunWith(MaterialRepository.java:853) > at > com.thoughtworks.go.server.materials.MaterialChecker.hasPipelineEverRunWith(MaterialChecker.java:100) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:186) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The > database has been closed [90098-200] > at org.h2.message.DbException.getJdbcSQLException(DbException.java:622) > at org.h2.message.DbException.getJdbcSQLException(DbException.java:429) > at org.h2.message.DbException.get(DbException.java:205) > at org.h2.message.DbException.get(DbException.java:181) > at org.h2.message.DbException.get(DbException.java:170) > at org.h2.engine.Database.checkPowerOff(Database.java:506) > at org.h2.command.Command.executeQuery(Command.java:224) > at > org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:114) > at > org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) > at > org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) > at > 
org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208) > at org.hibernate.loader.Loader.getResultSet(Loader.java:1953) > at org.hibernate.loader.Loader.doQuery(Loader.java:802) > at > org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274) > at org.hibernate.loader.Loader.doList(Loader.java:2542) > at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276) > at org.hibernate.loader.Loader.list(Loader.java:2271) > at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:316) > at org.hibernate.impl.SessionImpl.listCustomQuery(SessionImpl.java:1842) > at > org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:165) > at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:157) > at > com.thoughtworks.go.server.persistence.MaterialRepository.lambda$hasPipelineEverRunWith$10(MaterialRepository.java:876) > at > org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411) > ... 11 common frames omitted > 2024-09-10 08:14:18,719 ERROR [121@MessageListener for > ScheduleCheckListener] BuildCauseProducerService:220 - Error while > scheduling pipeline: ar-loan-infrastructure-rds-snapshot-prod > org.springframework.dao.DataAccessResourceFailureException: Hibernate > operation: could not execute query; SQL [SELECT materials.id FROM > pipelineMaterialRevisions INNER JOIN pipelines ON > pipelineMaterialRevisions.pipelineId = pipelines.id INNER JOIN > modifications on modifications.id = > pipelineMaterialRevisions.torevisionId INNER JOIN materials on > modifications.materialId = materials.id WHERE materials.id = ? AND > pipelineMaterialRevisions.toRevisionId >= ? AND > pipelineMaterialRevisions.fromRevisionId <= ? AND pipelines.name = ? 
> GROUP BY materials.id;]; The database has been closed [90098-200]; nested > exception is org.h2.jdbc.JdbcSQLNonTransientConnectionException: The > database has been closed [90098-200] > at > org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:79) > at > org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) > at > org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82) > at > org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428) > at > org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:414) > at > org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:416) > at > org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342) > at > com.thoughtworks.go.server.persistence.MaterialRepository.hasPipelineEverRunWith(MaterialRepository.java:853) > at > com.thoughtworks.go.server.materials.MaterialChecker.hasPipelineEverRunWith(MaterialChecker.java:100) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:186) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.newProduceBuildCause(BuildCauseProducerService.java:148) > at > com.thoughtworks.go.server.scheduling.BuildCauseProducerService.autoSchedulePipeline(BuildCauseProducerService.java:110) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:44) > at > com.thoughtworks.go.server.scheduling.ScheduleCheckListener.onMessage(ScheduleCheckListener.java:24) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83) > at > 
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The > database has been closed [90098-200] > at org.h2.message.DbException.getJdbcSQLException(DbException.java:622) > at org.h2.message.DbException.getJdbcSQLException(DbException.java:429) > at org.h2.message.DbException.get(DbException.java:194) > at org.h2.engine.Session.getTransaction(Session.java:1792) > at org.h2.engine.Session.startStatementWithinTransaction(Session.java:1815) > at org.h2.command.Command.executeQuery(Command.java:190) > at > org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:114) > at > org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) > at > org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122) > at > org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208) > at org.hibernate.loader.Loader.getResultSet(Loader.java:1953) > at org.hibernate.loader.Loader.doQuery(Loader.java:802) > at > org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274) > at org.hibernate.loader.Loader.doList(Loader.java:2542) > at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276) > at org.hibernate.loader.Loader.list(Loader.java:2271) > at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:316) > at org.hibernate.impl.SessionImpl.listCustomQuery(SessionImpl.java:1842) > at > org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:165) > at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:157) > at > com.thoughtworks.go.server.persistence.MaterialRepository.lambda$hasPipelineEverRunWith$10(MaterialRepository.java:876) > at > org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411) > ... 
11 common frames omitted
>
> Best Regards,
> Komgrit
>
> On Wednesday, September 11, 2024 at 2:11:45 PM UTC+7 Chad Wilson wrote:
>
>> Before we get into memory stats, did you look at the logs to see if the server is being restarted internally, as I described below? There's no point looking at memory stats unless we have evidence of a memory problem.
>>
>> You generally cannot use OS-level stats on their own to debug memory usage for Java applications like GoCD - you need to look at the internal Java heap used/free stats. You might have available memory at the container/host level which the JVM is not using due to its settings, so you can still be running out of memory. Furthermore, the increase you show is almost all file buffer/cache rather than application usage.
>>
>> If I recall correctly, by default the GoCD server starts with a max heap size of only 1G, which is relatively small for a bigger server with perhaps hundreds of pipelines; however, we should try to find evidence of that before randomly changing things or going deeper.
>>
>> -Chad
>>
>> On Wed, Sep 11, 2024 at 12:31 PM Komgrit Aneksri <[email protected]> wrote:
>>
>>> Thank you Chad for helping me investigate, and for the suggestions.
>>>
>>> Here is more information about my GoCD server resources.
>>>
>>> We are using a c6g.2xlarge worker node.
>>>
>>> Currently, CPU usage is 40-45%.
>>>
>>> After a restart, GoCD has around 3GB memory free, and the GoCD server runs:
>>>
>>> bash-5.1$ free -m
>>>               total        used        free      shared  buff/cache   available
>>> Mem:          15678        3037        4500           3        8395       12640
>>> Swap:             0           0           0
>>>
>>> A while later (now), free memory had reduced to 260MB:
>>>
>>> bash-5.1$ free -m
>>>               total        used        free      shared  buff/cache   available
>>> Mem:          15678        3394         260           3       12278       12283
>>> Swap:             0           0           0
>>>
>>> The JVM is on default settings.
>>>
>>> Regards,
>>> Komgrit
>>>
>>> On Wednesday, September 11, 2024 at 9:38:20 AM UTC+7 Chad Wilson wrote:
>>>
>>>> If this has never happened before, and only just started happening, then *something* must have changed. It might be worth figuring out what.
>>>>
>>>> A database becomes locked like this only when two instances are trying to connect to the same H2 database file, or when one crashed somehow without releasing the lock. We'd probably need to see the full error/stack trace to find the root cause; however, usually it's something like "Caused by: java.lang.IllegalStateException: The file is locked: nio:/godata/db/h2db/cruise.mv.db [1.4.200/7]"
>>>>
>>>> I suggest you look inside the GoCD server log file more directly, not just at k8s stats. GoCD runs as a multi-process container with its own process manager (the Tanuki Java wrapper), so it is possible that GoCD itself has been restarted by the Tanuki process manager even though Kubernetes shows no container or pod restarts. It will log when it does so. I'd look for when the errors started, and then scroll back through the container logs to see if the process was restarted by Tanuki. It will restart the main JVM if it thinks the main server process is not responding, or due to OOM errors etc. Perhaps the lock is not being released fast enough. Anyway - your root problem may be heap size/memory/CPU constraints rather than the database itself.
>>>>
>>>> Even if you use Postgres, if you have cases where two GoCD server instances overlap or try to share the database file, you will have other issues of some sort (due to race conditions); and if you have some other server stability issue causing restarts, it's probably wise to understand how it is getting into this state first, so you're addressing the right problem.
>>>>
>>>> As for migration to Postgres, the docs are at https://github.com/gocd/gocd-database-migrator . There's nothing specific to EKS/Kubernetes; however, generally speaking you'd need to:
>>>>
>>>> - prepare your Postgres instance per https://docs.gocd.org/current/installation/configuring_database/postgres.html
>>>> - (when ready to do the "proper" run) stop your GoCD server instance
>>>> - get your H2 DB file off EFS somewhere, to run the migration tool against
>>>> - run the migrator tool
>>>> - change the GoCD server Helm chart to mount the db.properties that tells it how to connect to Postgres
>>>> - start the GoCD server instance against Postgres
>>>>
>>>> -Chad
>>>>
>>>> On Wed, Sep 11, 2024 at 9:56 AM Komgrit Aneksri <[email protected]> wrote:
>>>>
>>>>> I have not made any changes to the configuration.
>>>>>
>>>>> But we have been adding new users and new pipelines every day.
>>>>>
>>>>> The pods are still in running status, with no restarts/spawns/evictions.
>>>>>
>>>>> If we have to migrate H2 to PostgreSQL, do you have any migration documentation for K8s?
>>>>>
>>>>> Regards,
>>>>> Komgrit
>>>>>
>>>>> On Tuesday, September 10, 2024 at 10:32:31 PM UTC+7 Chad Wilson wrote:
>>>>>
>>>>>> What changed in your setup when this started happening?
>>>>>>
>>>>>> Is your GoCD server pod crashing and being automatically restarted? Are nodes it is running on dying and the pod being re-scheduled elsewhere?
>>>>>>
>>>>>> On Tue, 10 Sept 2024, 17:02 Komgrit Aneksri, <[email protected]> wrote:
>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> I am facing an issue with the database.
>>>>>>>
>>>>>>> The error message is below:
>>>>>>> Could not open JDBC Connection for transaction; nested exception is org.h2.jdbc.JdbcSQLNonTransientConnectionException: Database may be already in use: null.
Possible solutions: close all other connection(s); use the server mode [90020-200]
>>>>>>>
>>>>>>> I restarted the GoCD server, and it is back to normal now.
>>>>>>>
>>>>>>> I am using GoCD version 23.1.0 running on EKS, and storing files and the database (H2) on EFS.
>>>>>>>
>>>>>>> I have hit this issue twice so far (last Thursday and today).
>>>>>>>
>>>>>>> Could you please advise what I should improve to fix this issue?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Komgrit
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "go-cd" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/9d1a20c3-b463-4dd3-a376-b2bcd014091cn%40googlegroups.com.
>>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "go-cd" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/30709745-11f5-40dc-a194-0365f39a1c1en%40googlegroups.com.
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "go-cd" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/df53a982-2be6-40d4-b64c-2ed551b5a191n%40googlegroups.com.
>>>
> --
> You received this message because you are subscribed to the Google Groups "go-cd" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/34ef081a-c453-4687-b923-5a477bc88841n%40googlegroups.com.
>
--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/CAA1RwH85ZVPpv6m4tNGtBdta79ONS3dE1HD4Jm8WJxr7ShxLYg%40mail.gmail.com.
