Here's some more detail.

It looks like the ConcurrentAuthorityFactory#ShutdownHook is not actually
shutting down the TimerThread which is created by Derby from the
LocalDataSource#getConnection method.

Here's the stack trace where the thread is being created all the way down
to Derby:

[image: image.png]

Here's the *ConcurrentAuthorityFactory#ShutdownHook*

/**
 * A hook to be executed either when the {@link
ConcurrentAuthorityFactory} is collected by the garbage collector,
 * when the Java Virtual Machine is shutdown, or when the module is
uninstalled by the OSGi or Servlet container.
 *
 * <p><strong>Do not keep reference to the enclosing factory</strong>
- in particular,
 * this class must not be static - otherwise the factory would never
been garbage collected.</p>
 */
private static final class ShutdownHook<DAO extends
GeodeticAuthorityFactory>           // MUST be static!
        extends PhantomReference<ConcurrentAuthorityFactory<DAO>>
implements Disposable, Callable<Object>
{
    /**
     * The {@link ConcurrentAuthorityFactory#availableDAOs} queue.
     */
    private final Deque<DataAccessRef<DAO>> availableDAOs;

    /**
     * Creates a new shutdown hook for the given factory.
     */
    ShutdownHook(final ConcurrentAuthorityFactory<DAO> factory) {
        super(factory, ReferenceQueueConsumer.QUEUE);
        availableDAOs = factory.availableDAOs;
    }

    /**
     * Invoked indirectly by the garbage collector when the {@link
ConcurrentAuthorityFactory} is disposed.
     */
    @Override
    public void dispose() {
        Shutdown.unregister(this);
        try {
            call();
        } catch (Exception exception) {
            /*
             * Pretend that the exception is logged by
`ConcurrentAuthorityFactory.dispose()`.
             * This is not true, but carries the idea that the error
occurred while cleaning
             * ConcurrentAuthorityFactory after garbage collection.
             */
            unexpectedException("dispose", exception);
        }
    }

    /**
     * Invoked at JVM shutdown time, or when the container (OSGi or
Servlet) uninstall the bundle containing SIS.
     */
    @Override
    public Object call() throws Exception {
        final List<DAO> factories;
        synchronized (availableDAOs) {
            factories = ConcurrentAuthorityFactory.clear(availableDAOs);
        }
        // No call to confirmClose(List<DAO>) as we want to force close.
        close(factories);
        return null;
    }
}


When tracing the above code (invoked through *Shutdown.stop*) there doesn't
seem to be any attempt to stop the TimerThread; so it leaks. Is this a
derby bug?

Nicholas Knize, Ph.D., GISP
Chief Technology Officer  |  Lucenia <https://lucenia.io>
Apache Lucene PMC Member and Committer
[email protected]


On Wed, Dec 4, 2024 at 12:09 PM Nicholas Knize <[email protected]> wrote:

> This is great. Thank you for the detailed information. Here are some
> follow ups:
>
> > All other 6 dependencies (geoapi, sis-utility, etc.) are transitive
> dependencies brought back automatically.
>
> This is true, except we have special gradle plugins that disable
> transitive dependency resolution  (`transitive = false`) because of the
> environments we operate in. We can't use gradle's "behind the scenes" to
> silently pull in dependencies w/o explicit consent (think CVE risk
> mitigation).
> My comment was more around the fact that, transitive or not, some
> sensitive environments make it more difficult to bring in a "new" project
> when it has so many dependencies that require special security scanning (a
> concept we put into practice when working on Lucene core).
>
> > Can you try to invoke the following method:
> org.apache.sis.system.Shutdown.stop(YourClassOnlyForLoggingPurpose.class);
> > Do you mean that you already tried above-cited Shutdown.stop method and
> it didn't shutdown the Derby database? If yes, this is a bug.
>
> I tried that method and the derby database does get shutdown but the
> EPSGFactory thread does not. Here is the code:
>
>
> @After
> @Override
> public void tearDown() throws Exception {
>
>     Shutdown.stop(ReprojectionProcessorFactoryTests.class);
>
>     super.tearDown();
> }
>
> Here is the exception:
>
>
> 1 thread leaked from SUITE scope at io.test.ReprojectionProcessorFactoryTests:
>    1) Thread[id=64, name=Timer-1, state=WAITING, 
> group=TGRP-ReprojectionProcessorFactoryTests]
>         at java.base/java.lang.Object.wait0(Native Method)
>         at java.base/java.lang.Object.wait(Object.java:378)
>         at java.base/java.lang.Object.wait(Object.java:352)
>         at java.base/java.util.TimerThread.mainLoop(Timer.java:543)
>         at java.base/java.util.TimerThread.run(Timer.java:522)
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from 
> SUITE scope at io.test.ReprojectionProcessorFactoryTests:
>    1) Thread[id=64, name=Timer-1, state=WAITING, 
> group=TGRP-ReprojectionProcessorFactoryTests]
>         at java.base/java.lang.Object.wait0(Native Method)
>         at java.base/java.lang.Object.wait(Object.java:378)
>         at java.base/java.lang.Object.wait(Object.java:352)
>         at java.base/java.util.TimerThread.mainLoop(Timer.java:543)
>         at java.base/java.util.TimerThread.run(Timer.java:522)
>       at __randomizedtesting.SeedInfo.seed([9200700DD47A228]:0)
>
>
> > The embedded Derby database is currently the easiest approach. We could
> provide alternatives if there are easier ones, but I don't know yet which
> one it could be. A SQL database is needed
>
> Thanks for that confirmation. This is where I'm looking at creating my own
> jar'd up Lucene index and using simple Lucene APIs to query the EPSG
> database instead. We did this with maxmind's GeoIP database and it resulted
> in a much smaller database because Lucene can highly compress the data (its
> designed for text search use cases like this after all).
>
> >  Indeed, the embedded Derby database targets one specific database
> engine... The rational was that if an application already has a database,
> it would be unfortunate to add another database on top of it.
>
> This makes sense and is definitely reasonable so I'm not knocking the
> approach. We have a use case that's more cloud native, thus needs to be
> lightweight and not carry all the sidecar SQL database overhead.
>
> > But do you have a list of the permissions that are causing problems?
>
> Here are the required permissions:
>
> grant {
>   permission java.lang.RuntimePermission "createClassLoader";
>   permission java.lang.RuntimePermission "getClassLoader";
>   permission java.lang.RuntimePermission "getStackWalkerWithClassReference";
>   permission java.lang.RuntimePermission "shutdownHooks";
>   permission org.apache.derby.shared.common.security.SystemPermission 
> "engine", "usederbyinternals";
>   permission java.io.FilePermission "derby.log", "read,write";
> };
>
>
> I wouldn't necessarily classify all of them as "problems" per se. This
> one, though, definitely causes concerns:
>
> ...
>   permission java.lang.RuntimePermission "shutdownHooks";
> ...
>
>
> The concern is that we have a "plugin" framework that enables users to
> write extension points. I have to look at limiting this permission to just
> the codebase needing it so users don't write nefarious shutdown hooks in
> their plugin that could kill the process.
>
> > SIS could be used when centimetric precision is desired, or for
> n-dimensional operations.
>
> This is why I'm spending time migrating from Proj4j to SIS as our default
> reprojection provider. In fact, the code I'm showing here is just small
> snippet in a larger SPI framework that enables the user to pick whatever
> Projection library they want ("proj4j", "sis",
> "my-projection-implementation").
>
> If I'm not mistaken, SIS has the ability to support the OGC dynamic CRS
> spec as well?
>
> Thanks again for all the help and information and let me know if you have
> any ideas how to resolve the remaining leaked thread.
>
>
> Nicholas Knize, Ph.D., GISP
> Chief Technology Officer  |  Lucenia <https://lucenia.io>
> Apache Lucene PMC Member and Committer
> [email protected]
>
>
> On Wed, Dec 4, 2024 at 11:23 AM Martin Desruisseaux <
> [email protected]> wrote:
>
>> Hello Nicholas
>>
>> Le 2024-12-04 à 16 h 31, Nicholas Knize a écrit :
>>
>> > (…snip…) Then the test harness goes to shutdown the derby memory
>> > database and SIS threads and I'm informed of the following leaked
>> threads:
>> >
>> In the list of threads that follow, I can identify the following:
>>
>>    * From Apache SIS:
>>       o id=54, name=ReferenceQueueConsumer
>>       o id=63, name=DelayedExecutor
>>   * Apparently from Derby:
>>       o id=57, name=derby.rawStoreDaemon
>>   * Apparently from Java itself, maybe through the use of java.util.Timer:
>>       o id=56, name=Timer-0
>>
>>
>> > (…snip about Derby shutdown…) Here I get the following exception
>> > thrown: java.sql.SQLNonTransientConnectionException: Database
>> > classpath:SIS_DATA/Databases/spatial-metadata' shutdown.
>> >
>> Yes this is normal. Derby shutdown always throws a SQLException as per
>> Derby specification. It sound surprising, but it is specified that way
>> in their method contract.
>>
>>
>> > Which gets me further, but I still have a lingering timer thread that
>> > I can only trace to the *EPSGFactory* class.
>> >
>> Can you try to invoke the following method:
>>
>>
>> org.apache.sis.system.Shutdown.stop(YourClassOnlyForLoggingPurpose.class);
>>
>> The above is not in public API, and its intend was to be invoked from
>> Servlet container or OSGi. There is classes below starting to do that in
>> an incubator module. They may be used as a source of inspiration:
>>
>>
>> https://github.com/apache/sis/tree/main/incubator/src/org.apache.sis.webapp/main/org/apache/sis/services
>>
>>
>> > Do I really need this derby shutdown gymnastics? If so, why doesn't
>> > SIS Shutdown class handle all this for me?
>> >
>> Do you mean that you already tried above-cited Shutdown.stop method and
>> it didn't shutdown the Derby database? If yes, this is a bug.
>>
>>
>> > 2. How can I gracefully shutdown the EPSGFactory timer thread? Are
>> > there more code acrobats needed to add a special Shutdown hook to
>> > handle this zombie thread?
>> >
>> Same as above. If the above-cited Shutdown.stop method was tried and
>> didn't worked, this is a bug that I will investigate.
>>
>>
>> > (…snip…) Is there a more simple approach to using the EPSG database
>> > that I'm missing
>> >
>> The embedded Derby database is currently the easiest approach. We could
>> provide alternatives if there are easier ones, but I don't know yet
>> which one it could be. A SQL database is needed (a list of CRS
>> definitions in a properties file is not sufficient), because the EPSG
>> registry is not only about CRS definitions. EPSG is also about
>> coordinate operations, and SIS refers to that database extensively every
>> times that a transform between a pair of CRS is requested.
>>
>>
>> > so I'm considering just investing the time in creating a jar that
>> > contains a simple Lucene index with the entire EPSG database so I can
>> > query it in a disconnected deployment without having to carry this
>> > Derby in memory SQL DB ball and chain around.
>> >
>> Indeed, the embedded Derby database targets one specific database
>> engine, and then the `sis-epsg` module provides the script for creating
>> the database in other environments. The rational was that if an
>> application already has a database, it would be unfortunate to add
>> another database on top of it. The `sis-epsg` module is aimed to allow
>> developers to use their own database engine. It could have been Lucene,
>> but can Lucene execute SQL scripts?
>>
>>
>> > *Here are some suggestions:*
>> >
>> > (…snip…) There are quite a few security permissions needed just to get
>> > SIS working for a "simple" use case of reprojecting a coordinate to
>> WGS84.
>> >
>> In a previous version, we were providing a security policy file, and SIS
>> also had some `catch (SecurityException e)` statements for execution in
>> security-constrained environment. Those statements have been removed in
>> SIS 1.4 (I do not remember if they were present in SIS 1.3). I way try
>> to check tomorrow. But do you have a list of the permissions that are
>> causing problems?
>>
>>
>> > Same goes for the *eight dependencies* needed for a "simple"
>> reprojection.
>> >
>> We already plan to avoid the Jakarta dependency (or make it optional),
>> but it may take a year.
>>
>> For the OpenGIS GeoAPI dependency, actually it could be used at Lucene's
>> advantage. That dependency is aimed to provide an implementation-neutral
>> API for coordinate operations, like JDBC provides an
>> implementation-neutral API for databases. If the code can use only
>> org.opengis interfaces as much as possible, it should be possible to not
>> depend explicitly on SIS, but give the choice to users. Proj4J do not
>> implement those interfaces yet, but it would be easy to do. Proj4J has
>> limited capabilities, but it may be sufficient for 2-dimensional
>> operations when errors up to 3 kilometers does not matter. SIS could be
>> used when centimetric precision is desired, or for n-dimensional
>> operations.
>>
>> Another part of the dependency is the metadata package, which is indeed
>> as large as the referencing part. But it would be worth to put some
>> parts of it to user's attention. In particular, the following code:
>>
>>     return CRS.findOperation(fromCRS, toCRS, bbox).getMathTransform();
>>
>> Should not invoke getMathTransform() so fast and paid some attention to
>> the CoordinateOperation object. In particular, the domain of validity
>> and the accuracy should, if possible, be reported to the user in some way.
>>
>>      Martin
>>
>>

Reply via email to