Vyom,

The NPE is originating from the finally block of the LdapClient
`authenticate` method. In the finally block there is an attempt
to restore the "read" timeout, since it was changed earlier in
`authenticate`, to reflect the connect timeout value, since the
subsequent operation(s) are considered part of the connect phase.

An unsolicited message may close the connection during
authenticate, which is does and LdapClient.ldapBind throws
"javax.naming.CommunicationException: Request: 1 cancelled". The
finally block is then executed and the previous exception is
effectively throw away when the NPE is encountered.

If we agree that such behaviour is reasonable ( and I think it
most likely is ), then the finally block needs to be more
defensive - check that conn is not null. It makes no sense to
attempt to reset the read timeout if conn is null.

With the following change, then the CommunicationException will
be thrown from `authenticate`. I think this is the behaviour we
want, no?


--- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
+++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
@@ -297,7 +297,8 @@
             conn.setV3(isLdapv3);
             return res;
         } finally {
-            conn.readTimeout = readTimeout;
+            if (conn != null)
+                conn.readTimeout = readTimeout;
         }
     }


-Chris.


On 07/09/18 17:47, Daniel Fuchs wrote:
Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:
On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here.

I'm referring to this pattern (in forceClose):

             conn.cleanup(null, false);
             conn = null;

             if (cleanPool) {
                 pcb.removePooledConnection(this);
             }

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

As you said server is closing the same connection which is retrieved from pool which  is correct but this should not throw unintentional NPE.

Agreed.

In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE.

LdapClient does not set `Ldapclient.conn` to null if connection is not pooled.

Really? That's not what I see... Am I missing something?

 >> [...]
If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

  I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.

OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...

I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection is not a problem  but setting `LdapClient.conn'  to null asynchronously is causing NPE.

Yes - I agree that setting conn to null is problematic.
I wonder why it's set to null. Maybe in an attempt to
make it eligible for GC earlier?

If that's not the issue, then I would suggest making it
final (AFAICS it's only set to non-null once) and cleaning
up the code that tests for conn == null.

If GC is the issue, then the best you can do is probably
only set it to null when the client is not pooled.

And if we could reduce the time window in which a thread
could get a stale client from the pool that might be good
too?

Is there anyway we could somehow add a test that reproduce
the NPE?

best regards,

-- daniel

[...]

On 24/08/18 11:35, vyom tewari wrote:
Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly making "null" to LdapClient.conn.
[...]

Reply via email to