No problem!

Perhaps what is happening is that after the OOM occurs, it is detected by
the "wrapper" process shipped with GoCD and the Java process is restarted.
If so, you should be able to see that auto-restart happening in the logs,
including go-server-wrapper.log.
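
If you want a quick check, something like this should surface a restart
(the log path and the exact wrapper log wording below are assumptions based
on a standard Linux package install; adjust for your setup):

    grep -iE 'restart|jvm exited|out of memory' \
        /var/log/go-server/go-server-wrapper.log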

Normally I'd expect that restart mechanism to release the file lock cleanly
before the new process tries to acquire it, but perhaps the old process is
not shutting down cleanly, or there is something unusual about the file
system.

If it keeps happening then, in addition to lsof, lslocks (or looking at
/proc/locks) might help show more explicitly which PIDs hold locks on which
files.
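
For example (the DB path below is an assumption based on a default Linux
install; adjust to wherever your cruise database lives):

    # which processes hold open handles on the H2 database files
    lsof /var/lib/go-server/db/h2db/cruise*

    # which PIDs hold file locks, system-wide
    lslocks

    # raw view of the kernel's lock table, if lslocks isn't available
    cat /proc/locks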

In any case, fixing the OOMs is probably a good idea regardless of this
lock issue, so it seems sensible to try a larger heap first.
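
For reference, on a standard install that would be an entry along these
lines in wrapper-properties.conf (the property index and heap value are
just examples; if I recall correctly, GoCD's docs suggest using indices of
100 or above so you don't clash with the shipped defaults):

    # raise the maximum JVM heap (example value)
    wrapper.java.additional.100=-Xmx4g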

-Chad


On Fri, May 20, 2022 at 3:39 PM AquilaNiger <[email protected]> wrote:

> Hi Chad,
>
> thanks for your support.
>
>>
>>    - Does the server recover after some time, or do you need to restart
>>    GoCD or take some other action to fix it?
>>
> No, it does not recover; I have to restart GoCD.
>
>>
>>    - How are you running GoCD? i.e in which environment? Container?
>>    Standard server?
>>
> A standard server on an Ubuntu 18.04.6 LTS virtual machine.
>
>>
>>    - Is your DB file on some kind of network mount or something like
>>    that?
>>
> No, it isn't.
>
>>
>>    - Is there a way to verify there aren't multiple processes/GoCD
>>    Instances trying to access the file?
>>       - when it happens, are you able to use OS-level commands such as
>>       lsof to see if other/multiple processes have handles on the DB file
>>       (depends on whether storage is local)
>>
> Currently this happens only once in a while (last time there were 6 days
> between the database issues). lsof is a good idea! I'll try that the next
> time it happens.
>
>>
>>    - Would be good to confirm you don't see GoCD crashing or getting
>>    auto-restarted in your logs to rule out GoCD itself having a different
>>    problem, and then this problem is being caused by a zombie GoCD process or
>>    some kind of stale lock which takes time to expire.
>>
> Actually, we found out from yesterday's go-server.log that the root cause
> seems to be a Java out-of-memory error:
> 2022-05-18 10:24:18,741 WARN [105@MessageListener for WorkFinder]
> BasicDataSource:58 - An internal object pool swallowed an Exception.
> org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been
> closed; SQL statement: ROLLBACK [90098-200]
> ...
> 2022-05-18 10:24:18,742 WARN [105@MessageListener for WorkFinder]
> BasicDataSource:58 - An internal object pool swallowed an Exception.
> org.h2.jdbc.JdbcSQLNonTransientConnectionException: Out of memory.
> [90108-200]
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
> Actually, we hadn't touched GoCD's heap size until now. We have increased
> it in wrapper-properties.conf and hope the error will be gone; I just hope
> this doesn't merely defer the error by a few days.
>
>>
>>    - Do you have any overrides to DB configuration, e.g a custom
>>    *config/db.properties* file?
>>
> No.
>
>
>> To answer your question on the trace files, I think you get two files
>> when the main trace file reaches an H2-configured maximum size. I ask the
>> above question on DB properties as I think GoCD sets that to 16MB by
>> default, whereas yours seems to have got to 64MB, which seems curious.
>>
>
> Thanks, that explains a lot. You're right, the "old" file contains
> timestamps from 3:16 to 6:38 and the new one from 6:38 to 8:16.
>
>
>> There is a way to change the locking approach H2 uses
>> <https://www.h2database.com/html/advanced.html#file_locking_protocols>
>> (back to the older ;FILE_LOCK=FS - which creates the stale
>> cruise.lock.db you have in your screenshot) if the issue is with the
>> filesystem, however I imagine you'd want to rule out multiple processes or
>> some other issue first.
>>
>
> Thanks for the hint, I'll keep that in mind as "last resort".
>
> Thank you for your support. I think we'll now wait and see whether the
> error occurs again.
>
> Julia
>
