When the issue happened (10:24), the wrapper log only shows:
INFO | jvm 3 | 2022/05/18 10:23:44 | May 18, 2022 10:23:44 AM
jakarta.mail.Session loadResource
INFO | jvm 3 | 2022/05/18 10:23:44 | WARNING: expected resource not
found: /META-INF/javamail.default.address.map
INFO | jvm 3 | 2022/05/18 10:24:06 | May 18, 2022 10:24:05 AM
jakarta.mail.Session loadResource
INFO | jvm 3 | 2022/05/18 10:24:06 | WARNING: expected resource not
found: /META-INF/javamail.default.address.map
INFO | jvm 3 | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc]
qtp1718515850-41: Retried waiting for GCLocker too often allocating 256
words
INFO | jvm 3 | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc]
105@MessageListener for WorkFinder: Retried waiting for GCLocker too often
allocating 386 words
INFO | jvm 3 | 2022/05/18 10:29:42 | May 18, 2022 10:29:42 AM
jakarta.mail.Session loadResource
INFO | jvm 3 | 2022/05/18 10:29:42 | WARNING: expected resource not
found: /META-INF/javamail.default.address.map
A closer look at the log showed that the OOM situation had already happened
one day earlier than expected. On 05/17 at around 11:30 the wrapper log shows
the out-of-memory error messages:
2022-05-17 11:30:39,871 ERROR [104@MessageListener for WorkFinder]
JMSMessageListenerAdapter:87 - Exception thrown in message handling by
listener com.thoughtworks.go.server.messaging.scheduling.WorkFinder@4cf95d4c
org.springframework.transaction.TransactionSystemException: Could not roll
back JDBC transaction; nested exception is
org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been
closed [90098-200]
at
org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:329)
at
org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:857)
at
org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:834)
at
org.springframework.transaction.support.TransactionTemplate.rollbackOnException(TransactionTemplate.java:164)
at
org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
at
com.thoughtworks.go.server.transaction.TransactionTemplate.execute(TransactionTemplate.java:28)
at
com.thoughtworks.go.server.service.ScheduleService.rescheduleAbandonedBuildIfNecessary(ScheduleService.java:640)
at
com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:183)
at
com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:165)
at
com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:60)
at
com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:32)
at
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
at
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database
has been closed [90098-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:194)
at org.h2.engine.Session.getTransaction(Session.java:1792)
at org.h2.engine.Session.getStatementSavepoint(Session.java:1804)
at org.h2.engine.Session.setSavepoint(Session.java:915)
at org.h2.command.Command.executeUpdate(Command.java:244)
at org.h2.jdbc.JdbcConnection.rollbackInternal(JdbcConnection.java:1530)
at org.h2.jdbc.JdbcConnection.rollback(JdbcConnection.java:561)
at
org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
at
org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
at
org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:326)
... 13 common frames omitted
2022-05-17 11:30:54,167 INFO [WrapperJarAppMain] Jetty9Server:199 -
Configuring Jetty using /etc/go/jetty.xml
2022-05-17 11:30:54,236 WARN [WrapperJarAppMain] Server:357 -
ErrorPageMapper not supported for Server level Error Handling
2022-05-17 11:30:54,372 WARN [WrapperJarAppMain] AbstractHandler:96 - No
Server set for ResourceHandler@51be8b61{STOPPED}
2022-05-17 11:30:58,763 WARN [WrapperJarAppMain] ConnectionManager:117 -
The file /etc/go/db.properties specified by `go.db.config` does not exist.
2022-05-17 11:30:59,154 INFO [WrapperJarAppMain] DatabaseMigrator:40 -
Upgrading database, this might take a while depending on the size of the
database.
2022-05-17 11:30:59,154 INFO [WrapperJarAppMain] DatabaseMigrator:49 -
************************************************************************
2022-05-17 11:30:59,155 INFO [WrapperJarAppMain] DatabaseMigrator:49 -
WARNING: Shutting down your server at this point will lead to a database
corruption. Please wait until the database upgrade completes.
2022-05-17 11:30:59,155 INFO [WrapperJarAppMain] DatabaseMigrator:49 -
************************************************************************
2022-05-17 11:31:01,356 INFO [WrapperJarAppMain] DatabaseMigrator:57 -
Database upgrade completed successfully.
2022-05-17 11:31:01,357 INFO [WrapperJarAppMain] DataMigrationRunner:34 -
Running data migrations...
2022-05-17 11:31:01,388 INFO [WrapperJarAppMain] DataMigrationRunner:49 -
Data migration took 3 ms
2022-05-17 11:31:01,391 INFO [WrapperJarAppMain] DataMigrationRunner:49 -
Data migration took 0 ms
2022-05-17 11:31:01,391 INFO [WrapperJarAppMain] DataMigrationRunner:39 -
Data migrations completed.
2022-05-17 11:31:02,206 WARN [WrapperJarAppMain] ConfigurationFactory:136
- No configuration found. Configuring ehcache from ehcache-failsafe.xml
found in the classpath:
jar:file:/var/lib/go-server/work/jetty-0_0_0_0-8153-cruise_war-_go-any-/webapp/WEB-INF/lib/ehcache-2.10.9.2.jar!/ehcache-failsafe.xml
2022-05-17 11:31:02,383 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration [com.thoughtworks.go.domain.AccessToken];
using defaults.
2022-05-17 11:31:02,436 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration
[com.thoughtworks.go.domain.NotificationFilter]; using defaults.
2022-05-17 11:31:02,538 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration [com.thoughtworks.go.domain.User]; using
defaults.
2022-05-17 11:31:02,560 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration [com.thoughtworks.go.domain.Plugin]; using
defaults.
2022-05-17 11:31:02,570 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration
[com.thoughtworks.go.domain.EnvironmentVariable]; using defaults.
2022-05-17 11:31:02,705 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration
[com.thoughtworks.go.domain.User.notificationFilters]; using defaults.
2022-05-17 11:31:02,789 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration [org.hibernate.cache.UpdateTimestampsCache];
using defaults.
2022-05-17 11:31:02,826 WARN [WrapperJarAppMain] EhCacheProvider:93 -
Could not find configuration [org.hibernate.cache.StandardQueryCache];
using defaults.
2022-05-17 11:31:04,815 WARN [WrapperJarAppMain] BrokerService:2163 -
Temporary Store limit is 51200 mb (current store usage is 0 mb). The data
directory: /var/lib/go-server only has 45488 mb of usable space. -
resetting to maximum available disk space: 45488 mb
2022-05-17 11:31:04,961 INFO [WrapperJarAppMain] ConnectionManager:98 -
Done loading query extensions, found
com.thoughtworks.go.server.database.h2.H2QueryExtensions@6a25c389
2022-05-17 11:31:08,045 INFO [WrapperJarAppMain] GoConfigMigration:93 -
Upgrading config file from version 139 to version 139
2022-05-17 11:31:08,097 INFO [WrapperJarAppMain] GoConfigMigration:101 -
Finished upgrading config file
2022-05-17 11:31:08,097 INFO [WrapperJarAppMain] GoConfigMigrator:106 -
[Config Save] Starting Config Save post upgrade using
FullConfigSaveNormalFlow
2022-05-17 11:31:08,183 INFO [Thread-76] DefaultPluginJarChangeListener:67
- Plugin load starting:
/var/lib/go-server/plugins/bundled/gocd-ldap-authentication-plugin.jar
Afterwards the server seems to run as usual, until the OOM and "locked" issue
occurs 23 hours later. So I can imagine that you are right and it is doing
some "unclean restart". I suppose GoCD itself is innocent, since you can never
fully plan how to behave once an out-of-memory error occurs; that is
unpredictable. I assume that raising the heap size will solve the issue.
Thanks again for all the support, this is really a great community.
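In case it helps anyone finding this thread later, here is a rough sketch of
what we changed; the heap values and the property indices are our own choices
(not GoCD defaults), and the paths assume a standard Linux package install:

# added to wrapper-properties.conf (any unused wrapper.java.additional index works)
wrapper.java.additional.100=-Xms2g
wrapper.java.additional.101=-Xmx4g

If the lock problem shows up again before the larger heap helps, we plan to
run something like "lslocks | grep cruise" or "lsof +D /var/lib/go-server/db/h2db"
before restarting, as suggested, to see which PIDs still have handles on the
H2 database files (assuming the default DB location).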
Chad Wilson wrote on Friday, May 20, 2022 at 10:38:37 UTC+2:
> No problem!
>
> Perhaps what is happening is that after the OOM has occurred, it is being
> detected by the "wrapper" process shipped with GoCD and the Java
> process is being restarted. If so, you should be able to see that
> auto-restart happening in the logs, including the go-server-wrapper.log.
>
> Normally I'd expect that to ensure the file lock is cleanly released
> before the new process tries to acquire it, but perhaps it is not shutting
> down cleanly, or there is something unusual about the file system.
>
> If it keeps happening, in addition to lsof, perhaps lslocks (or looking
> at /proc/locks) is of use to see which PIDs have locked which files more
> explicitly.
>
> In any case, fixing the OOMs is probably a good idea regardless of this,
> so it seems sensible to try with a larger heap to address that issue first.
>
> -Chad
>
>
> On Fri, May 20, 2022 at 3:39 PM AquilaNiger <[email protected]> wrote:
>
>> Hi Chad,
>>
>> thanks for your support.
>>
>>>
>>> - Does the server recover after some time, or you need to restart
>>> GoCD or take some other action to fix it?
>>>
>> No, it does not recover; I have to restart GoCD.
>>
>>>
>>> - How are you running GoCD? i.e in which environment? Container?
>>> Standard server?
>>>
>> Standard server on an Ubuntu 18.04.6 LTS virtual machine
>>
>>>
>>> - Is your DB file on some kind of network mount or something like
>>> that?
>>>
>> No, it isn't.
>>
>>>
>>> - Is there a way to verify there aren't multiple processes/GoCD
>>> Instances trying to access the file?
>>> - when it happens, are you able to use OS-level commands such as
>>> lsof to see if other/multiple processes have handles on the DB file
>>> (depends on whether storage is local)
>>>
>> Currently this happens only once in a while (last time there were 6 days
>> between the database issues). lsof is a good idea! I'll try that the next
>> time it happens.
>>
>>>
>>> - Would be good to confirm you don't see GoCD crashing or getting
>>> auto-restarted in your logs to rule out GoCD itself having a different
>>> problem, and then this problem is being caused by a zombie GoCD process
>>> or
>>> some kind of stale lock which takes time to expire.
>>>
>> Actually, we found out from yesterday's go-server.log that the root cause
>> seems to be a Java out-of-memory error:
>> 2022-05-18 10:24:18,741 WARN [105@MessageListener for WorkFinder]
>> BasicDataSource:58 - An internal object pool swallowed an Exception.
>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been
>> closed; SQL statement: ROLLBACK [90098-200]
>> ...
>> 2022-05-18 10:24:18,742 WARN [105@MessageListener for WorkFinder]
>> BasicDataSource:58 - An internal object pool swallowed an Exception.
>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: Out of memory.
>> [90108-200]
>> ...
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>
>> Actually we haven't touched the heap size of GoCD up to now. We have now
>> increased it in wrapper-properties.conf and hope that the error will be
>> gone. I hope this does not just defer the error by a few days.
>>
>>>
>>> - Do you have any overrides to DB configuration, e.g a custom
>>> *config/db.properties* file?
>>>
>> No.
>>
>>
>>> To answer your question on the trace files, I think you get two files
>>> when the main trace file reaches an H2-configured maximum size. I ask the
>>> above question on DB properties as I think GoCD sets that to 16MB by
>>> default, whereas yours seems to have got to 64MB which seems curious.
>>>
>>
>> Thanks, that explains a lot. You're right, the "old" file contains
>> timestamps from 3:16 to 6:38 and the new one from 6:38 to 8:16.
>>
>>
>>> There is a way to change the locking approach H2 uses
>>> <https://www.h2database.com/html/advanced.html#file_locking_protocols>
>>> (back to the older ;FILE_LOCK=FS - which creates the stale
>>> cruise.lock.db you have in your screenshot) if the issue is with the
>>> filesystem, however I imagine you'd want to rule out multiple processes or
>>> some other issue first.
>>>
>>
>> Thanks for the hint, I'll keep that in mind as "last resort".
>>
>> Thank you for your support. I think we'll now wait and see whether the error
>> occurs again.
>>
>> Julia
>>