Here's some more detail.
It looks like the ConcurrentAuthorityFactory#ShutdownHook is not actually
shutting down the TimerThread which is created by Derby from the
LocalDataSource#getConnection method.
Here's the stack trace where the thread is being created all the way down
to Derby:
[image: image.png]
Here's the *ConcurrentAuthorityFactory#ShutdownHook*
/**
* A hook to be executed either when the {@link
ConcurrentAuthorityFactory} is collected by the garbage collector,
* when the Java Virtual Machine is shutdown, or when the module is
uninstalled by the OSGi or Servlet container.
*
* <p><strong>Do not keep reference to the enclosing factory</strong>
- in particular,
* this class must not be static - otherwise the factory would never
been garbage collected.</p>
*/
private static final class ShutdownHook<DAO extends
GeodeticAuthorityFactory> // MUST be static!
extends PhantomReference<ConcurrentAuthorityFactory<DAO>>
implements Disposable, Callable<Object>
{
/**
* The {@link ConcurrentAuthorityFactory#availableDAOs} queue.
*/
private final Deque<DataAccessRef<DAO>> availableDAOs;
/**
* Creates a new shutdown hook for the given factory.
*/
ShutdownHook(final ConcurrentAuthorityFactory<DAO> factory) {
super(factory, ReferenceQueueConsumer.QUEUE);
availableDAOs = factory.availableDAOs;
}
/**
* Invoked indirectly by the garbage collector when the {@link
ConcurrentAuthorityFactory} is disposed.
*/
@Override
public void dispose() {
Shutdown.unregister(this);
try {
call();
} catch (Exception exception) {
/*
* Pretend that the exception is logged by
`ConcurrentAuthorityFactory.dispose()`.
* This is not true, but carries the idea that the error
occurred while cleaning
* ConcurrentAuthorityFactory after garbage collection.
*/
unexpectedException("dispose", exception);
}
}
/**
* Invoked at JVM shutdown time, or when the container (OSGi or
Servlet) uninstall the bundle containing SIS.
*/
@Override
public Object call() throws Exception {
final List<DAO> factories;
synchronized (availableDAOs) {
factories = ConcurrentAuthorityFactory.clear(availableDAOs);
}
// No call to confirmClose(List<DAO>) as we want to force close.
close(factories);
return null;
}
}
When tracing the above code (invoked through *Shutdown.stop*) there doesn't
seem to be any attempt to stop the TimerThread; so it leaks. Is this a
derby bug?
Nicholas Knize, Ph.D., GISP
Chief Technology Officer | Lucenia <https://lucenia.io>
Apache Lucene PMC Member and Committer
[email protected]
On Wed, Dec 4, 2024 at 12:09 PM Nicholas Knize <[email protected]> wrote:
> This is great. Thank you for the detailed information. Here are some
> follow ups:
>
> > All other 6 dependencies (geoapi, sis-utility, etc.) are transitive
> dependencies brought back automatically.
>
> This is true, except we have special gradle plugins that disable
> transitive dependency resolution (`transitive = false`) because of the
> environments we operate in. We can't use gradle's "behind the scenes" to
> silently pull in dependencies w/o explicit consent (think CVE risk
> mitigation).
> My comment was more around the fact that, transitive or not, some
> sensitive environments make it more difficult to bring in a "new" project
> when it has so many dependencies that require special security scanning (a
> concept we put into practice when working on Lucene core).
>
> > Can you try to invoke the following method:
> org.apache.sis.system.Shutdown.stop(YourClassOnlyForLoggingPurpose.class);
> > Do you mean that you already tried above-cited Shutdown.stop method and
> it didn't shutdown the Derby database? If yes, this is a bug.
>
> I tried that method and the derby database does get shutdown but the
> EPSGFactory thread does not. Here is the code:
>
>
> @After
> @Override
> public void tearDown() throws Exception {
>
> Shutdown.stop(ReprojectionProcessorFactoryTests.class);
>
> super.tearDown();
> }
>
> Here is the exception:
>
>
> 1 thread leaked from SUITE scope at io.test.ReprojectionProcessorFactoryTests:
> 1) Thread[id=64, name=Timer-1, state=WAITING,
> group=TGRP-ReprojectionProcessorFactoryTests]
> at java.base/java.lang.Object.wait0(Native Method)
> at java.base/java.lang.Object.wait(Object.java:378)
> at java.base/java.lang.Object.wait(Object.java:352)
> at java.base/java.util.TimerThread.mainLoop(Timer.java:543)
> at java.base/java.util.TimerThread.run(Timer.java:522)
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from
> SUITE scope at io.test.ReprojectionProcessorFactoryTests:
> 1) Thread[id=64, name=Timer-1, state=WAITING,
> group=TGRP-ReprojectionProcessorFactoryTests]
> at java.base/java.lang.Object.wait0(Native Method)
> at java.base/java.lang.Object.wait(Object.java:378)
> at java.base/java.lang.Object.wait(Object.java:352)
> at java.base/java.util.TimerThread.mainLoop(Timer.java:543)
> at java.base/java.util.TimerThread.run(Timer.java:522)
> at __randomizedtesting.SeedInfo.seed([9200700DD47A228]:0)
>
>
> > The embedded Derby database is currently the easiest approach. We could
> provide alternatives if there are easier ones, but I don't know yet which
> one it could be. A SQL database is needed
>
> Thanks for that confirmation. This is where I'm looking at creating my own
> jar'd up Lucene index and using simple Lucene APIs to query the EPSG
> database instead. We did this with maxmind's GeoIP database and it resulted
> in a much smaller database because Lucene can highly compress the data (its
> designed for text search use cases like this after all).
>
> > Indeed, the embedded Derby database targets one specific database
> engine... The rational was that if an application already has a database,
> it would be unfortunate to add another database on top of it.
>
> This makes sense and is definitely reasonable so I'm not knocking the
> approach. We have a use case that's more cloud native, thus needs to be
> lightweight and not carry all the sidecar SQL database overhead.
>
> > But do you have a list of the permissions that are causing problems?
>
> Here are the required permissions:
>
> grant {
> permission java.lang.RuntimePermission "createClassLoader";
> permission java.lang.RuntimePermission "getClassLoader";
> permission java.lang.RuntimePermission "getStackWalkerWithClassReference";
> permission java.lang.RuntimePermission "shutdownHooks";
> permission org.apache.derby.shared.common.security.SystemPermission
> "engine", "usederbyinternals";
> permission java.io.FilePermission "derby.log", "read,write";
> };
>
>
> I wouldn't necessarily classify all of them as "problems" per se. This
> one, though, definitely causes concerns:
>
> ...
> permission java.lang.RuntimePermission "shutdownHooks";
> ...
>
>
> The concern is that we have a "plugin" framework that enables users to
> write extension points. I have to look at limiting this permission to just
> the codebase needing it so users don't write nefarious shutdown hooks in
> their plugin that could kill the process.
>
> > SIS could be used when centimetric precision is desired, or for
> n-dimensional operations.
>
> This is why I'm spending time migrating from Proj4j to SIS as our default
> reprojection provider. In fact, the code I'm showing here is just small
> snippet in a larger SPI framework that enables the user to pick whatever
> Projection library they want ("proj4j", "sis",
> "my-projection-implementation").
>
> If I'm not mistaken, SIS has the ability to support the OGC dynamic CRS
> spec as well?
>
> Thanks again for all the help and information and let me know if you have
> any ideas how to resolve the remaining leaked thread.
>
>
> Nicholas Knize, Ph.D., GISP
> Chief Technology Officer | Lucenia <https://lucenia.io>
> Apache Lucene PMC Member and Committer
> [email protected]
>
>
> On Wed, Dec 4, 2024 at 11:23 AM Martin Desruisseaux <
> [email protected]> wrote:
>
>> Hello Nicholas
>>
>> Le 2024-12-04 à 16 h 31, Nicholas Knize a écrit :
>>
>> > (…snip…) Then the test harness goes to shutdown the derby memory
>> > database and SIS threads and I'm informed of the following leaked
>> threads:
>> >
>> In the list of threads that follow, I can identify the following:
>>
>> * From Apache SIS:
>> o id=54, name=ReferenceQueueConsumer
>> o id=63, name=DelayedExecutor
>> * Apparently from Derby:
>> o id=57, name=derby.rawStoreDaemon
>> * Apparently from Java itself, maybe through the use of java.util.Timer:
>> o id=56, name=Timer-0
>>
>>
>> > (…snip about Derby shutdown…) Here I get the following exception
>> > thrown: java.sql.SQLNonTransientConnectionException: Database
>> > classpath:SIS_DATA/Databases/spatial-metadata' shutdown.
>> >
>> Yes this is normal. Derby shutdown always throws a SQLException as per
>> Derby specification. It sound surprising, but it is specified that way
>> in their method contract.
>>
>>
>> > Which gets me further, but I still have a lingering timer thread that
>> > I can only trace to the *EPSGFactory* class.
>> >
>> Can you try to invoke the following method:
>>
>>
>> org.apache.sis.system.Shutdown.stop(YourClassOnlyForLoggingPurpose.class);
>>
>> The above is not in public API, and its intend was to be invoked from
>> Servlet container or OSGi. There is classes below starting to do that in
>> an incubator module. They may be used as a source of inspiration:
>>
>>
>> https://github.com/apache/sis/tree/main/incubator/src/org.apache.sis.webapp/main/org/apache/sis/services
>>
>>
>> > Do I really need this derby shutdown gymnastics? If so, why doesn't
>> > SIS Shutdown class handle all this for me?
>> >
>> Do you mean that you already tried above-cited Shutdown.stop method and
>> it didn't shutdown the Derby database? If yes, this is a bug.
>>
>>
>> > 2. How can I gracefully shutdown the EPSGFactory timer thread? Are
>> > there more code acrobats needed to add a special Shutdown hook to
>> > handle this zombie thread?
>> >
>> Same as above. If the above-cited Shutdown.stop method was tried and
>> didn't worked, this is a bug that I will investigate.
>>
>>
>> > (…snip…) Is there a more simple approach to using the EPSG database
>> > that I'm missing
>> >
>> The embedded Derby database is currently the easiest approach. We could
>> provide alternatives if there are easier ones, but I don't know yet
>> which one it could be. A SQL database is needed (a list of CRS
>> definitions in a properties file is not sufficient), because the EPSG
>> registry is not only about CRS definitions. EPSG is also about
>> coordinate operations, and SIS refers to that database extensively every
>> times that a transform between a pair of CRS is requested.
>>
>>
>> > so I'm considering just investing the time in creating a jar that
>> > contains a simple Lucene index with the entire EPSG database so I can
>> > query it in a disconnected deployment without having to carry this
>> > Derby in memory SQL DB ball and chain around.
>> >
>> Indeed, the embedded Derby database targets one specific database
>> engine, and then the `sis-epsg` module provides the script for creating
>> the database in other environments. The rational was that if an
>> application already has a database, it would be unfortunate to add
>> another database on top of it. The `sis-epsg` module is aimed to allow
>> developers to use their own database engine. It could have been Lucene,
>> but can Lucene execute SQL scripts?
>>
>>
>> > *Here are some suggestions:*
>> >
>> > (…snip…) There are quite a few security permissions needed just to get
>> > SIS working for a "simple" use case of reprojecting a coordinate to
>> WGS84.
>> >
>> In a previous version, we were providing a security policy file, and SIS
>> also had some `catch (SecurityException e)` statements for execution in
>> security-constrained environment. Those statements have been removed in
>> SIS 1.4 (I do not remember if they were present in SIS 1.3). I way try
>> to check tomorrow. But do you have a list of the permissions that are
>> causing problems?
>>
>>
>> > Same goes for the *eight dependencies* needed for a "simple"
>> reprojection.
>> >
>> We already plan to avoid the Jakarta dependency (or make it optional),
>> but it may take a year.
>>
>> For the OpenGIS GeoAPI dependency, actually it could be used at Lucene's
>> advantage. That dependency is aimed to provide an implementation-neutral
>> API for coordinate operations, like JDBC provides an
>> implementation-neutral API for databases. If the code can use only
>> org.opengis interfaces as much as possible, it should be possible to not
>> depend explicitly on SIS, but give the choice to users. Proj4J do not
>> implement those interfaces yet, but it would be easy to do. Proj4J has
>> limited capabilities, but it may be sufficient for 2-dimensional
>> operations when errors up to 3 kilometers does not matter. SIS could be
>> used when centimetric precision is desired, or for n-dimensional
>> operations.
>>
>> Another part of the dependency is the metadata package, which is indeed
>> as large as the referencing part. But it would be worth to put some
>> parts of it to user's attention. In particular, the following code:
>>
>> return CRS.findOperation(fromCRS, toCRS, bbox).getMathTransform();
>>
>> Should not invoke getMathTransform() so fast and paid some attention to
>> the CoordinateOperation object. In particular, the domain of validity
>> and the accuracy should, if possible, be reported to the user in some way.
>>
>> Martin
>>
>>