On Thu, Sep 26, 2024 at 11:50 AM Rémy Maucherat <r...@apache.org> wrote:
>
> On Thu, Sep 26, 2024 at 11:18 AM <build...@apache.org> wrote:
> >
> > Build status: BUILD FAILED: failed compile (failure)
> > Worker used: bb_worker2_ubuntu
> > URL: https://ci2.apache.org/#builders/120/builds/81
> > Blamelist: remm <r...@apache.org>
> > Build Text: failed compile (failure)
> > Status Detected: new failure
> > Build Source Stamp: [branch main] 36c226f2141573a841b6cd4d7ad8e8b582080510
>
> Certificate reloading, plus the use of the FFM version of the
> implementation in parallel, causes a reliable crash on shutdown for
> tomcat-native/JNI. I haven't been able to find out yet what is wrong
> (certainly this kind of use is inappropriate, you would use either
> tomcat-native or FFM, not both).

The crash scenario seems to be like this:
- The OpenSSL/JNI test runs, everything is good.
- During the test, a certificate reload is done, leaving a dangling
SSL_CTX from OpenSSL, associated with a now unreferenced OpenSSL
context object. SSL_CTX_free only runs later, through a cleaner, when
GC collects that object.
- Library.terminate() is called since Tomcat is destroyed at the end
of the test.
- GC is independent from that and the JVM is still running, so the
OpenSSL context object is still there. No crash (yet), but the JVM
crash is now unavoidable.
- *In the same JVM*, the OpenSSL/FFM test now starts.
- The test runs fine, does its own certificate reloading, leaving
another unreferenced OpenSSL context object.
- The JVM sees some objects to GC, and GC finally runs in parallel
with everything (it would not be fun otherwise).
- GC cleans up the OpenSSL context from the OpenSSL/JNI test run.
However, since APR has already been released in terminateAPR, this
crashes the JVM in some random location (OpenSSL, APR, etc.), but
always while running the org.apache.tomcat.jni.SSLContext.free method.
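
The ordering problem in the list above can be modelled in plain Java
without any native code. This is only a sketch under assumptions:
FakeLibrary, its free() and terminate() methods and the value 42 are
hypothetical stand-ins for tomcat-native/APR, SSLContext.free() and
the SSL_CTX pointer, and the explicit clean() call stands in for GC
reaching the object at some later point.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Tomcat code: a "native library" that can be
// terminated while a cleaner action for one of its objects is pending.
class CleanerAfterTerminateDemo {

    // Stands in for the APR/tomcat-native library state.
    static final class FakeLibrary {
        private boolean initialized = true;
        private final List<String> log = new ArrayList<>();

        synchronized void terminate() {
            initialized = false;
            log.add("terminate");
        }

        // Stands in for SSLContext.free(); real native code would crash
        // here instead of logging when the library is already gone.
        synchronized void free(long ctx) {
            log.add(initialized ? "free ok" : "free after terminate");
        }

        synchronized List<String> log() {
            return new ArrayList<>(log);
        }
    }

    static List<String> run() {
        Cleaner cleaner = Cleaner.create();
        FakeLibrary library = new FakeLibrary();
        Object context = new Object();   // the old context left by the reload
        long fakeCtx = 42L;              // placeholder for the SSL_CTX pointer

        // The action must not capture 'context' itself, or the object
        // could never become unreachable.
        Cleaner.Cleanable cleanable =
                cleaner.register(context, () -> library.free(fakeCtx));

        library.terminate();   // Tomcat is destroyed at the end of the test
        cleanable.clean();     // simulates GC getting to the object later
        return library.log();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [terminate, free after terminate]
    }
}
```

In the model the late free is only logged. With a real native library
the same ordering means the free call runs against state that has
already been released.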

It seems to be fixed if I add a System.gc() in
AprLifecycleListener.terminateAPR(), which would cause the
unreferenced OpenSSL contexts to go through their SSL_CTX_free
immediately. Fingers crossed ... This seems to be an acceptable
compromise to be able to run the testsuite efficiently.
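
A rough model of that workaround, again as a sketch and not the actual
AprLifecycleListener code: FakeLibrary, reloadCertificate() and the
outcome strings are hypothetical. The bounded wait after System.gc()
is an assumption of this sketch and is not part of the change described
above; it is there because System.gc() is only a request and cleaner
actions run on a separate thread, so the outcome is not guaranteed.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Tomcat code: names and the bounded wait are
// assumptions made for this sketch.
class GcBeforeTerminateDemo {

    static final Cleaner CLEANER = Cleaner.create();

    // Stands in for the APR/tomcat-native library state.
    static final class FakeLibrary {
        private boolean initialized = true;
        private String outcome = "not freed yet";
        private final CountDownLatch freed = new CountDownLatch(1);

        // Stands in for SSLContext.free(), called from the cleaner thread.
        synchronized void free() {
            outcome = initialized ? "freed before terminate"
                                  : "freed after terminate";
            freed.countDown();
        }

        void terminate() {
            System.gc();   // only a request; cleaning runs on another thread
            try {
                // Assumption, not in the mail: give the cleaner thread a
                // bounded amount of time to run pending actions.
                freed.await(2, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            synchronized (this) {
                initialized = false;   // the library is released here
            }
        }

        synchronized String outcome() {
            return outcome;
        }
    }

    // Registers a context that is unreachable as soon as this returns.
    static void reloadCertificate(FakeLibrary library) {
        Object oldContext = new Object();
        // The method reference captures the library, not oldContext.
        CLEANER.register(oldContext, library::free);
    }

    static String run() {
        FakeLibrary library = new FakeLibrary();
        reloadCertificate(library);
        library.terminate();
        return library.outcome();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On a typical JVM this tends to report the free happening before
terminate, but nothing in the Java specification promises it, hence
the fingers crossed.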

Rémy

> Rémy
>
> >
> > Steps:
> >
> >   worker_preparation: 0
> >
> >   git: 0
> >
> >   shell: 0
> >
> >   shell_1: 0
> >
> >   shell_2: 0
> >
> >   shell_3: 0
> >
> >   shell_4: 0
> >
> >   shell_5: 0
> >
> >   shell_6: 0
> >
> >   compile: 1
> >
> >   shell_7: 0
> >
> >   shell_8: 0
> >
> >   shell_9: 0
> >
> >   shell_10: 0
> >
> >   Rsync docs to nightlies.apache.org: 0
> >
> >   shell_11: 0
> >
> >   Rsync RAT to nightlies.apache.org: 0
> >
> >   compile_1: 2
> >
> >   shell_12: 0
> >
> >   Rsync Logs to nightlies.apache.org: 0
> >
> >
> > -- ASF Buildbot
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: dev-h...@tomcat.apache.org
> >

