> On Wed, Oct 14, 2020 at 6:32 PM Michael Osipov <micha...@apache.org> wrote:
>
> > > On 2020-10-14 at 12:32, Rémy Maucherat wrote:
> > > On Tue, Oct 13, 2020 at 8:27 PM Michael Osipov <micha...@apache.org>
> > wrote:
> > >
> > >> On 2020-10-13 at 16:05, r...@apache.org wrote:
> > >>> This is an automated email from the ASF dual-hosted git repository.
> > >>>
> > >>> remm pushed a commit to branch 8.5.x
> > >>> in repository https://gitbox.apache.org/repos/asf/tomcat.git
> > >>>
> > >>>
> > >>> The following commit(s) were added to refs/heads/8.5.x by this push:
> > >>>        new 883600b  Always retry on a new connection, even when pooling
> > >>> 883600b is described below
> > >>>
> > >>> commit 883600b8a77c0be93b3a8dc2404e8d1110970ee7
> > >>> Author: remm <r...@apache.org>
> > >>> AuthorDate: Tue Oct 13 14:19:54 2020 +0200
> > >>>
> > >>>       Always retry on a new connection, even when pooling
> > >>>
> > >>>       This keeps the same very simple design as for the single
> > connection
> > >>>       scenario, for now.
> > >>> ---
> > >>>    java/org/apache/catalina/realm/JNDIRealm.java | 22
> > >> +++++++++++++++++++---
> > >>>    webapps/docs/changelog.xml                    |  5 +++++
> > >>>    2 files changed, 24 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/java/org/apache/catalina/realm/JNDIRealm.java
> > >> b/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> index 72087ab..98007f7 100644
> > >>> --- a/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> +++ b/java/org/apache/catalina/realm/JNDIRealm.java
> > >>> @@ -1311,7 +1311,7 @@ public class JNDIRealm extends RealmBase {
> > >>>                    close(connection);
> > >>>
> > >>>                    // open a new directory context.
> > >>> -                connection = get();
> > >>> +                connection = get(true);
> > >>>
> > >>>                    // Try the authentication again.
> > >>>                    principal = authenticate(connection, username,
> > >> credentials);
> > >>> @@ -2389,12 +2389,28 @@ public class JNDIRealm extends RealmBase {
> > >>>         * @exception NamingException if a directory server error occurs
> > >>>         */
> > >>>        protected JNDIConnection get() throws NamingException {
> > >>> +        return get(false);
> > >>> +    }
> > >>> +
> > >>> +    /**
> > >>> +     * Open (if necessary) and return a connection to the configured
> > >>> +     * directory server for this Realm.
> > >>> +     * @param create when pooling, this forces creation of a new
> > >> connection,
> > >>> +     *   for example after an error
> > >>> +     * @return the connection
> > >>> +     * @exception NamingException if a directory server error occurs
> > >>> +     */
> > >>> +    protected JNDIConnection get(boolean create) throws
> > NamingException
> > >> {
> > >>>            JNDIConnection connection = null;
> > >>>            // Use the pool if available, otherwise use the single
> > >> connection
> > >>>            if (connectionPool != null) {
> > >>> -            connection = connectionPool.pop();
> > >>> -            if (connection == null) {
> > >>> +            if (create) {
> > >>>                    connection = create();
> > >>> +            } else {
> > >>> +                connection = connectionPool.pop();
> > >>> +                if (connection == null) {
> > >>> +                    connection = create();
> > >>> +                }
> > >>>                }
> > >>>            } else {
> > >>>                singleConnectionLock.lock();
> > >>
> > >> That's a suitable and simple approach.
> > >>
> > >
> > > If you have the code for adding a max idle check on hand and tested, you
> > > should add it IMO, it will be more efficient.
> >
> > I will need to give this a couple more weeks of testing. This is what I
> > have observed today:
> > > 2020-10-14T16:01:47.039 147.54.155.198 w99sezx0... "GET
> > /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 92132
> > >
> > > 20609 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166]
> > net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Acquiring
> > directory server connection from pool
> > > 20610 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166]
> > net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Directory server
> > connection from pool exceeds max idle time, closing it
> > > 20611 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166]
> > net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.close Closing directory
> > server connection
> > > 20612 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166]
> > net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open Opening new
> > directory server connection
> > > 20613 2020-10-14T16:01:47 FEIN [https-openssl-apr-8444-exec-166]
> > net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getUser Searching for
> > username 'w99sezx0' in base ...
> >
> > As you can see, it took 90 seconds to serve the request because the
> > connection had expired and closing it took way too long. On average the
> > request takes:
> > > 2020-10-14T13:57:06.730 10.81.50.232 osipovmi@... "GET
> > /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 70
> >
> > when the connection is healthy.
> >
>
> Ok, so there's some real incentive to avoid reusing a connection that was
> idle for too long.

I did some further analysis, and my earlier statement was partially wrong. The
cause of the 90 s delay was a faulty KDC which did not respond in time with a
service ticket. Java Kerberos does 3 retries with a 30 s timeout each. I have
set it to 1 retry and a 1000 ms timeout before the next KDC is tried.
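For reference, a hypothetical krb5.conf fragment matching those settings could
look like the following. `max_retries` and `kdc_timeout` are the
`[libdefaults]` keys Java's Kerberos implementation reads, but the unit of
`kdc_timeout` has varied between JDK releases (recent JDKs treat it as
milliseconds), so verify against the JDK in use:

```ini
[libdefaults]
    # Try each KDC only once instead of the default 3 attempts
    max_retries = 1
    # Wait at most 1000 ms for a KDC reply before moving to the next KDC
    # (recent JDKs interpret this value as milliseconds)
    kdc_timeout = 1000
```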

Anyway, I am still convinced that some idle timeout is the right choice to
avoid resource depletion on both sides. If hundreds of clients keep tens of
connections open to a directory server for no reason, sooner or later
everything will stall. I will let you know in a couple of weeks.
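As a sketch of what such an idle check could look like: the following is an
illustrative standalone class, not JNDIRealm's actual pool; names like
`IdleAwarePool`, `push`, `pop` and `maxIdleMillis` are invented for this
example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: a pool that discards connections which sat
// idle longer than maxIdleMillis. Names are invented, not from JNDIRealm.
class IdleAwarePool<C> {

    private static final class Entry<C> {
        final C conn;
        final long returnedAt; // millis when pushed back into the pool
        Entry(C conn, long returnedAt) {
            this.conn = conn;
            this.returnedAt = returnedAt;
        }
    }

    private final Deque<Entry<C>> pool = new ArrayDeque<>();
    private final long maxIdleMillis;

    IdleAwarePool(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Return a connection to the pool, stamping the current time.
    synchronized void push(C conn, long nowMillis) {
        pool.push(new Entry<>(conn, nowMillis));
    }

    // Pop a connection, skipping any entry that idled too long (real code
    // would close the stale connection). Returns null when nothing fresh
    // is left, so the caller creates a new connection, mirroring the
    // pop()/create() pattern of the committed get(boolean).
    synchronized C pop(long nowMillis) {
        Entry<C> e;
        while ((e = pool.poll()) != null) {
            if (nowMillis - e.returnedAt <= maxIdleMillis) {
                return e.conn;
            }
            // stale entry: a real Realm would close e.conn here
        }
        return null;
    }
}
```

Passing the clock in explicitly keeps the sketch testable; a real pool would
call System.currentTimeMillis() itself.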

But related to this, I made a very interesting observation when the webapp
is shut down:
> 2020-10-15T14:24:15.120 WARNUNG [deblndw024v...-startStop-2] 
> org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The 
> web application [x2tc-proxy-dsv] appears to have started a thread named 
> [Thread-7] but has failed to stop it. This is very likely to create a memory 
> leak. Stack trace of thread:
>  java.lang.Object.wait(Native Method)
>  java.lang.Object.wait(Object.java:502)
>  com.sun.jndi.ldap.Connection.pauseReader(Connection.java:804)
>  com.sun.jndi.ldap.Connection.run(Connection.java:944)
>  java.lang.Thread.run(Thread.java:748)

When you look at how LdapCtx#close() is implemented, you'll see that it keeps
a reference count of all open naming enumerations. As long as they aren't
explicitly closed, the connection is not closed immediately. I need to analyze
why this is happening. The result could be that stopping and starting an
application drowns the entire VM sooner or later.
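To make that reference-counting behavior concrete, here is a toy model (my own
simplification for illustration, not the JDK's LdapCtx) of "close() only takes
effect once every enumeration handed out by the context is closed":

```java
// Toy model of LdapCtx-style reference counting: a close() request is
// deferred until every enumeration obtained from the context is closed.
// This is a simplification for illustration, not the JDK implementation.
class ToyContext {
    private int openEnums = 0;
    private boolean closeRequested = false;
    private boolean reallyClosed = false;

    ToyEnumeration search() {
        openEnums++;
        return new ToyEnumeration(this);
    }

    void close() {
        closeRequested = true;
        maybeReallyClose();
    }

    void enumClosed() {
        openEnums--;
        maybeReallyClose();
    }

    private void maybeReallyClose() {
        // The underlying connection (and its reader thread) only goes
        // away once no enumeration holds a reference anymore.
        if (closeRequested && openEnums == 0) {
            reallyClosed = true;
        }
    }

    boolean isReallyClosed() {
        return reallyClosed;
    }
}

class ToyEnumeration {
    private final ToyContext ctx;
    private boolean closed = false;

    ToyEnumeration(ToyContext ctx) {
        this.ctx = ctx;
    }

    void close() {
        if (!closed) {
            closed = true;
            ctx.enumClosed();
        }
    }
}
```

In this model, as in the thread dump above, an unclosed enumeration keeps the
"connection" alive even after close() was called, which would explain why the
reader thread survives webapp shutdown.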

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
