> On Wed, Oct 14, 2020 at 6:32 PM Michael Osipov <micha...@apache.org> wrote:
>
> > On 2020-10-14 at 12:32, Rémy Maucherat wrote:
> > > On Tue, Oct 13, 2020 at 8:27 PM Michael Osipov <micha...@apache.org> wrote:
> > >
> > >> On 2020-10-13 at 16:05, r...@apache.org wrote:
> > >>> This is an automated email from the ASF dual-hosted git repository.
> > >>>
> > >>> remm pushed a commit to branch 8.5.x
> > >>> in repository https://gitbox.apache.org/repos/asf/tomcat.git
> > >>>
> > >>>
> > >>> The following commit(s) were added to refs/heads/8.5.x by this push:
> > >>>      new 883600b  Always retry on a new connection, even when pooling
> > >>> 883600b is described below
> > >>>
> > >>> commit 883600b8a77c0be93b3a8dc2404e8d1110970ee7
> > >>> Author: remm <r...@apache.org>
> > >>> AuthorDate: Tue Oct 13 14:19:54 2020 +0200
> > >>>
> > >>>     Always retry on a new connection, even when pooling
> > >>>
> > >>>     This keeps the same very simple design as for the single connection
> > >>>     scenario, for now.
> > >>> ---
> > >>>  java/org/apache/catalina/realm/JNDIRealm.java | 22 +++++++++++++++++++---
> > >>>  webapps/docs/changelog.xml                    |  5 +++++
> > >>>  2 files changed, 24 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/java/org/apache/catalina/realm/JNDIRealm.java b/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> index 72087ab..98007f7 100644
> > >>> --- a/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> +++ b/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> @@ -1311,7 +1311,7 @@ public class JNDIRealm extends RealmBase {
> > >>>                  close(connection);
> > >>>
> > >>>                  // open a new directory context.
> > >>> -                connection = get();
> > >>> +                connection = get(true);
> > >>>
> > >>>                  // Try the authentication again.
> > >>>                  principal = authenticate(connection, username, credentials);
> > >>> @@ -2389,12 +2389,28 @@ public class JNDIRealm extends RealmBase {
> > >>>       * @exception NamingException if a directory server error occurs
> > >>>       */
> > >>>      protected JNDIConnection get() throws NamingException {
> > >>> +        return get(false);
> > >>> +    }
> > >>> +
> > >>> +    /**
> > >>> +     * Open (if necessary) and return a connection to the configured
> > >>> +     * directory server for this Realm.
> > >>> +     * @param create when pooling, this forces creation of a new connection,
> > >>> +     * for example after an error
> > >>> +     * @return the connection
> > >>> +     * @exception NamingException if a directory server error occurs
> > >>> +     */
> > >>> +    protected JNDIConnection get(boolean create) throws NamingException {
> > >>>          JNDIConnection connection = null;
> > >>>          // Use the pool if available, otherwise use the single connection
> > >>>          if (connectionPool != null) {
> > >>> -            connection = connectionPool.pop();
> > >>> -            if (connection == null) {
> > >>> +            if (create) {
> > >>>                  connection = create();
> > >>> +            } else {
> > >>> +                connection = connectionPool.pop();
> > >>> +                if (connection == null) {
> > >>> +                    connection = create();
> > >>> +                }
> > >>>              }
> > >>>          } else {
> > >>>              singleConnectionLock.lock();
> > >>
> > >> That's a suitable and simple approach.
> > >
> > > If you have the code for adding a max idle check on hand and tested, you
> > > should add it IMO, it will be more efficient.
> >
> > I will need to give this a couple more weeks of testing. This is what I
> > have observed today:
> > > 2020-10-14T16:01:47.039 147.54.155.198 w99sezx0... "GET /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 92132
> > >
> > > 20609 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Acquiring directory server connection from pool
> > > 20610 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Directory server connection from pool exceeds max idle time, closing it
> > > 20611 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.close Closing directory server connection
> > > 20612 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open Opening new directory server connection
> > > 20613 2020-10-14T16:01:47 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getUser Searching for username 'w99sezx0' in base ...
> >
> > As you can see, it took 90 seconds to serve the request because the
> > connection had expired and close took way too long. On average the
> > request takes:
> > > 2020-10-14T13:57:06.730 10.81.50.232 osipovmi@... "GET /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 70
> >
> > when the connection is healthy.
>
> Ok, so there's some real incentive to avoid reusing a connection that was
> idle for too long.

I did some further analysis and was partially wrong in my statement above.
The cause of the 90 s delay was a faulty KDC which did not respond in time
to a service ticket request. Java Kerberos does 3 retries with a 30 s
timeout. I have now set it to 1 retry and a 1000 ms wait before the next
KDC is tried.

Anyway, I am still convinced that some idle timeout is the right choice to
avoid resource depletion on both sides. If hundreds of clients keep tens of
connections open to a directory server for no reason, sooner or later
everything will stall. I will let you know in a couple of weeks.

Related to this, I have made a very interesting observation when the webapp
is shut down:

> 2020-10-15T14:24:15.120 WARNUNG [deblndw024v...-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [x2tc-proxy-dsv] appears to have started a thread named [Thread-7] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
>  java.lang.Object.wait(Native Method)
>  java.lang.Object.wait(Object.java:502)
>  com.sun.jndi.ldap.Connection.pauseReader(Connection.java:804)
>  com.sun.jndi.ldap.Connection.run(Connection.java:944)
>  java.lang.Thread.run(Thread.java:748)

When you look at how LdapCtx#close() is implemented, you'll see that it
keeps a reference count of all naming enumerations. As long as they aren't
really closed, the connection is not closed immediately. I need to analyze
why this is happening. The result could be that stopping and starting an
application repeatedly drowns the entire VM sooner or later.

Michael
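
For illustration, the max idle check argued for in this thread could look
roughly like the sketch below. It is a minimal sketch under stated
assumptions, not the code of JNDIRealm or ActiveDirectoryRealm: the class
IdleCheckingPool, its acquire() and release() methods, the injected clock
and the maxIdleMillis parameter are all hypothetical names chosen for this
example. A connection that sat in the pool longer than the limit is closed
and replaced by a new one instead of being reused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a max idle check; not taken from Tomcat or
// ActiveDirectoryRealm. All names here are assumptions for illustration.
final class IdleCheckingPool<C extends AutoCloseable> {

    // A pooled connection plus the time at which it was handed back.
    private static final class Entry<C> {
        final C connection;
        final long releasedAtMillis;

        Entry(C connection, long releasedAtMillis) {
            this.connection = connection;
            this.releasedAtMillis = releasedAtMillis;
        }
    }

    private final Deque<Entry<C>> idle = new ArrayDeque<>();
    private final Supplier<C> factory;   // opens a new connection
    private final LongSupplier clock;    // current time in milliseconds
    private final long maxIdleMillis;

    IdleCheckingPool(Supplier<C> factory, LongSupplier clock, long maxIdleMillis) {
        this.factory = factory;
        this.clock = clock;
        this.maxIdleMillis = maxIdleMillis;
    }

    // Returns a pooled connection that has not exceeded the max idle time,
    // or opens a new one if the pool holds none that qualifies.
    C acquire() {
        List<C> stale = new ArrayList<>();
        C fresh = null;
        synchronized (idle) {
            Entry<C> entry;
            while (fresh == null && (entry = idle.poll()) != null) {
                long idleFor = clock.getAsLong() - entry.releasedAtMillis;
                if (idleFor <= maxIdleMillis) {
                    fresh = entry.connection;
                } else {
                    stale.add(entry.connection);
                }
            }
        }
        // Close outside the lock so a slow close does not block other threads.
        for (C connection : stale) {
            try {
                connection.close();
            } catch (Exception e) {
                // Best effort: the connection is being discarded anyway.
            }
        }
        return fresh != null ? fresh : factory.get();
    }

    // Hands a connection back to the pool and records when that happened.
    void release(C connection) {
        synchronized (idle) {
            idle.push(new Entry<>(connection, clock.getAsLong()));
        }
    }
}
```

The clock is injected only so the behaviour can be exercised without
waiting; in real use it would presumably be System::currentTimeMillis.
Stale connections are closed after the lock is released, so that a close
which takes long does not hold up other request threads.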