Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
On 11/09/2018 12:15, Chris Hegarty wrote: On 11 Sep 2018, at 09:50, vyom tewari wrote: Hi Chris,Daniel, Please find the latest patch( http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ). Thanks Vyom, Reviewed. Agreed. best regards, -- daniel -Chris.
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
> On 11 Sep 2018, at 09:50, vyom tewari wrote: > > Hi Chris,Daniel, > > Please find the latest patch( > http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ). Thanks Vyom, Reviewed. -Chris.
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi Chris,Daniel, Please find the latest patch( http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ). I did the additional cleanup(removed redundant null check) as you suggested. Thanks, Vyom On Monday 10 September 2018 09:03 PM, Daniel Fuchs wrote: On 10/09/2018 15:53, vyom tewari wrote: Yes , this will definitely solve this NPE issue but we may hit NPE other places as well because we are setting 'LdapClient.com' to null asynchronously. I agree with Vyom here. Other solutions that have been investigated - such as only setting the connection to null when its ref count reaches zero now seem to complex (read: too risky because over the complexity threshold) to me. Maybe the best thing to do here is to stick to Vyom original's idea, but declare conn final and do all the necessary (small) cleanup that goes with it? Possibly also add a comment saying that CommunicationException will be thrown if some other thread uses the LdapClient after forceClose has been closed... best regards, -- daniel
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Vyom, You are correct, the change that I proposed is minimal and some fragility still remains. I talked with Daniel, walked through the code one more time, and I am now convinced that your original proposal is correct ( since the pooling is at the level of LdapClient, rather than at the level of Connection ). Can I also suggest: 1) Remove the null assignments, rather commenting them out. 2) Make Connection `conn` final. 3) Remove all null checks for conn ( since it can no longer be null ). -Chris. On 10/09/18 15:53, vyom tewari wrote: On Monday 10 September 2018 05:30 PM, Chris Hegarty wrote: Vyom, The NPE is originating from the finally block of the LdapClient `authenticate` method. In the finally block there is an attempt to restore the "read" timeout, since it was changed earlier in `authenticate`, to reflect the connect timeout value, since the subsequent operation(s) are considered part of the connect phase. An unsolicited message may close the connection during authenticate, which is does and LdapClient.ldapBind throws "javax.naming.CommunicationException: Request: 1 cancelled". The finally block is then executed and the previous exception is effectively throw away when the NPE is encountered. If we agree that such behaviour is reasonable ( and I think it most likely is ), then the finally block needs to be more defensive - check that conn is not null. It makes no sense to attempt to reset the read timeout if conn is null. With the following change, then the CommunicationException will be thrown from `authenticate`. I think this is the behaviour we want, no? Yes , this will definitely solve this NPE issue but we may hit NPE other places as well because we are setting 'LdapClient.com' to null asynchronously. Vyom --- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java +++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java @@ -297,7 +297,8 @@ conn.setV3(isLdapv3); return res; } finally { - conn.readTimeout = readTimeout; + if (conn != null) + conn.readTimeout = readTimeout; } } -Chris. On 07/09/18 17:47, Daniel Fuchs wrote: Hi Vyom, Please see inline... I've elided some parts... On 07/09/2018 10:11, vyom tewari wrote: On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote: So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. I checked the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here. I'm referring to this pattern (in forceClose): conn.cleanup(null, false); conn = null; if (cleanPool) { pcb.removePooledConnection(this); } The connection is cleaned up, then nulled, then only removed from the pool. I'm questioning whether the first thing to do would be to remove the connection from the pool, to reduce the time window in which the client could be retrieved from the pool by another thread while forceClose is in process. I am not sure what side effect this could have - as I haven't studied the code that retrieves the client from the pool, but I'm wondering whether that's worth exploring. As you said server is closing the same connection which is retrieved from pool which is correct but this should not throw unintentional NPE. Agreed. In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE. LdapClient does not set `Ldapclient.conn` to null if connection is not pooled. Really? That's not what I see... Am I missing something? >> [...] If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE. OK - that's good to know. I agree that CommunicationException is better than NPE. If so then not nulling the connection is at least an improvement. But this would deserve a comment in the code, I think. More below... I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For inst
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
On 10/09/2018 15:53, vyom tewari wrote: Yes , this will definitely solve this NPE issue but we may hit NPE other places as well because we are setting 'LdapClient.com' to null asynchronously. I agree with Vyom here. Other solutions that have been investigated - such as only setting the connection to null when its ref count reaches zero now seem to complex (read: too risky because over the complexity threshold) to me. Maybe the best thing to do here is to stick to Vyom original's idea, but declare conn final and do all the necessary (small) cleanup that goes with it? Possibly also add a comment saying that CommunicationException will be thrown if some other thread uses the LdapClient after forceClose has been closed... best regards, -- daniel
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
On Monday 10 September 2018 05:30 PM, Chris Hegarty wrote: Vyom, The NPE is originating from the finally block of the LdapClient `authenticate` method. In the finally block there is an attempt to restore the "read" timeout, since it was changed earlier in `authenticate`, to reflect the connect timeout value, since the subsequent operation(s) are considered part of the connect phase. An unsolicited message may close the connection during authenticate, which is does and LdapClient.ldapBind throws "javax.naming.CommunicationException: Request: 1 cancelled". The finally block is then executed and the previous exception is effectively throw away when the NPE is encountered. If we agree that such behaviour is reasonable ( and I think it most likely is ), then the finally block needs to be more defensive - check that conn is not null. It makes no sense to attempt to reset the read timeout if conn is null. With the following change, then the CommunicationException will be thrown from `authenticate`. I think this is the behaviour we want, no? Yes , this will definitely solve this NPE issue but we may hit NPE other places as well because we are setting 'LdapClient.com' to null asynchronously. Vyom --- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java +++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java @@ -297,7 +297,8 @@ conn.setV3(isLdapv3); return res; } finally { - conn.readTimeout = readTimeout; + if (conn != null) + conn.readTimeout = readTimeout; } } -Chris. On 07/09/18 17:47, Daniel Fuchs wrote: Hi Vyom, Please see inline... I've elided some parts... On 07/09/2018 10:11, vyom tewari wrote: On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote: So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. I checked the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here. I'm referring to this pattern (in forceClose): conn.cleanup(null, false); conn = null; if (cleanPool) { pcb.removePooledConnection(this); } The connection is cleaned up, then nulled, then only removed from the pool. I'm questioning whether the first thing to do would be to remove the connection from the pool, to reduce the time window in which the client could be retrieved from the pool by another thread while forceClose is in process. I am not sure what side effect this could have - as I haven't studied the code that retrieves the client from the pool, but I'm wondering whether that's worth exploring. As you said server is closing the same connection which is retrieved from pool which is correct but this should not throw unintentional NPE. Agreed. In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE. LdapClient does not set `Ldapclient.conn` to null if connection is not pooled. Really? That's not what I see... Am I missing something? >> [...] If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE. OK - that's good to know. I agree that CommunicationException is better than NPE. If so then not nulling the connection is at least an improvement. But this would deserve a comment in the code, I think. More below... I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For instance taking the connection out of the pool before closing it, instead of closing it before taking it out of the pool, etc... Changing order will not help, i think closing the same connection is not a problem but setting `LdapClient.conn' to null asynchronously is causing NPE. Yes - I agree that setting conn to null is problematic. I wonder why it's set to null. Maybe in an attempt to make it eligible for GC earlier? If that's not the issue, then I would suggest making it final (AFAICS it's only set to non-null once) and cleaning up the code t
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Nice find Chris. But I have to wonder why the NPE wasn't thrown at the very beginning of `authenticate` namely one of these two lines: int readTimeout = conn.readTimeout; conn.readTimeout = conn.connectTimeout; which happen to be called even before ensureOpen() is called a few lines later! cheers, -- daniel On 10/09/2018 13:00, Chris Hegarty wrote: Vyom, The NPE is originating from the finally block of the LdapClient `authenticate` method. In the finally block there is an attempt to restore the "read" timeout, since it was changed earlier in `authenticate`, to reflect the connect timeout value, since the subsequent operation(s) are considered part of the connect phase. An unsolicited message may close the connection during authenticate, which is does and LdapClient.ldapBind throws "javax.naming.CommunicationException: Request: 1 cancelled". The finally block is then executed and the previous exception is effectively throw away when the NPE is encountered. If we agree that such behaviour is reasonable ( and I think it most likely is ), then the finally block needs to be more defensive - check that conn is not null. It makes no sense to attempt to reset the read timeout if conn is null. With the following change, then the CommunicationException will be thrown from `authenticate`. I think this is the behaviour we want, no? --- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java +++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java @@ -297,7 +297,8 @@ conn.setV3(isLdapv3); return res; } finally { - conn.readTimeout = readTimeout; + if (conn != null) + conn.readTimeout = readTimeout; } } -Chris.
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Vyom, The NPE is originating from the finally block of the LdapClient `authenticate` method. In the finally block there is an attempt to restore the "read" timeout, since it was changed earlier in `authenticate`, to reflect the connect timeout value, since the subsequent operation(s) are considered part of the connect phase. An unsolicited message may close the connection during authenticate, which is does and LdapClient.ldapBind throws "javax.naming.CommunicationException: Request: 1 cancelled". The finally block is then executed and the previous exception is effectively throw away when the NPE is encountered. If we agree that such behaviour is reasonable ( and I think it most likely is ), then the finally block needs to be more defensive - check that conn is not null. It makes no sense to attempt to reset the read timeout if conn is null. With the following change, then the CommunicationException will be thrown from `authenticate`. I think this is the behaviour we want, no? --- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java +++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java @@ -297,7 +297,8 @@ conn.setV3(isLdapv3); return res; } finally { -conn.readTimeout = readTimeout; +if (conn != null) +conn.readTimeout = readTimeout; } } -Chris. On 07/09/18 17:47, Daniel Fuchs wrote: Hi Vyom, Please see inline... I've elided some parts... On 07/09/2018 10:11, vyom tewari wrote: On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote: So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. I checked the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here. I'm referring to this pattern (in forceClose): conn.cleanup(null, false); conn = null; if (cleanPool) { pcb.removePooledConnection(this); } The connection is cleaned up, then nulled, then only removed from the pool. I'm questioning whether the first thing to do would be to remove the connection from the pool, to reduce the time window in which the client could be retrieved from the pool by another thread while forceClose is in process. I am not sure what side effect this could have - as I haven't studied the code that retrieves the client from the pool, but I'm wondering whether that's worth exploring. As you said server is closing the same connection which is retrieved from pool which is correct but this should not throw unintentional NPE. Agreed. In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE. LdapClient does not set `Ldapclient.conn` to null if connection is not pooled. Really? That's not what I see... Am I missing something? >> [...] If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE. OK - that's good to know. I agree that CommunicationException is better than NPE. If so then not nulling the connection is at least an improvement. But this would deserve a comment in the code, I think. More below... I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For instance taking the connection out of the pool before closing it, instead of closing it before taking it out of the pool, etc... Changing order will not help, i think closing the same connection is not a problem but setting `LdapClient.conn' to null asynchronously is causing NPE. Yes - I agree that setting conn to null is problematic. I wonder why it's set to null. Maybe in an attempt to make it eligible for GC earlier? If that's not the issue, then I would suggest making it final (AFAICS it's only set to non-null once) and cleaning up the code that tests for conn == null. If GC is the issue, then the best you can do is probably only set it to null when the client is not pooled. And if we could reduce the time window in which a thread could get a stale client from t
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi Vyom, Please see inline... I've elided some parts... On 07/09/2018 10:11, vyom tewari wrote: On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote: So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. I checked the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here. I'm referring to this pattern (in forceClose): conn.cleanup(null, false); conn = null; if (cleanPool) { pcb.removePooledConnection(this); } The connection is cleaned up, then nulled, then only removed from the pool. I'm questioning whether the first thing to do would be to remove the connection from the pool, to reduce the time window in which the client could be retrieved from the pool by another thread while forceClose is in process. I am not sure what side effect this could have - as I haven't studied the code that retrieves the client from the pool, but I'm wondering whether that's worth exploring. As you said server is closing the same connection which is retrieved from pool which is correct but this should not throw unintentional NPE. Agreed. In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE. LdapClient does not set `Ldapclient.conn` to null if connection is not pooled. Really? That's not what I see... Am I missing something? >> [...] If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE. OK - that's good to know. I agree that CommunicationException is better than NPE. If so then not nulling the connection is at least an improvement. But this would deserve a comment in the code, I think. More below... I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For instance taking the connection out of the pool before closing it, instead of closing it before taking it out of the pool, etc... Changing order will not help, i think closing the same connection is not a problem but setting `LdapClient.conn' to null asynchronously is causing NPE. Yes - I agree that setting conn to null is problematic. I wonder why it's set to null. Maybe in an attempt to make it eligible for GC earlier? If that's not the issue, then I would suggest making it final (AFAICS it's only set to non-null once) and cleaning up the code that tests for conn == null. If GC is the issue, then the best you can do is probably only set it to null when the client is not pooled. And if we could reduce the time window in which a thread could get a stale client from the pool that might be good too? Is there anyway we could somehow add a test that reproduce the NPE? best regards, -- daniel [...] On 24/08/18 11:35, vyom tewari wrote: Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. [...]
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi Daniel, Thanks for detailed review and comment. Please find my answers inline. Thanks, Vyom On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote: Hi Vyom, IIUC, the issue only happens when connections (clients?) to the server are pooled. I assume the server may decide to close what it thinks is an idle connection at any time. correct So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. I checked the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here. As you said server is closing the same connection which is retrieved from pool which is correct but this should not throw unintentional NPE. In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE. LdapClient does not set `Ldapclient.conn` to null if connection is not pooled. The question for me would be "what is the expected behavior if the server decides to close an idle connection?" I would expect that new InitialDirContext() should have some way of dealing with that with retrying? If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE. I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For instance taking the connection out of the pool before closing it, instead of closing it before taking it out of the pool, etc... Changing order will not help, i think closing the same connection is not a problem but setting `LdapClient.conn' to null asynchronously is causing NPE. Please do let me know if i am missing something. best regards, -- daniel On 04/09/2018 15:04, vyom tewari wrote: On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote: Hi Vyom, On 24/08/18 11:35, vyom tewari wrote: Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. Sorry, I don't know this code all that well, but I think that more explanation will be needed before this code can be reviewed. Hi Chris, let me try to explain issue little bit. The issue is a if ldap connection has already been established and then the LDAP directory server sends an (unsolicited) "Notice of Disconnection", the client's processing of this LDAP message can race with an application thread calling new InitialDirContext() to authenticate user credentials (i.e.bind) and throw NPE. I did some analysis and found out that when server send an (unsolicited) "Notice of Disconnection", LdapClient.forceClose() will be called in LdapClient.processUnsolicited() which is called asynchronously by the reader thread in Connection, this means 'LdapClient.conn' may set to null anytime after it received "Notice of Disconnection". The LdapClient and the Connection seem to be loosely coupled. I think part of this is to allow the LdapClient to be GC'ed and finalized separately to the Connection ( that can be reused ). Not setting `conn` to null could have a negative impact on this loose coupling. I also note that the synchronization is implemented poorly in the LdapClient, `conn` is operated on both from within synchronized blocks and from unsynchronized blocks ( which I think is the reason you see the unexpected null ). I agree, not setting 'conn' to null will definitely have some impact. I check the LdapClient and it looks to me it is intentional(i may be wrong) that 'conn' is operated on both from within synchronize blocks and from unsynchronize block. LdapClient, java doc says that connection(conn) take care of it's own sync. ## access from outside LdapClient must sync; * conn - no sync; Connection takes care of its own sync * unsolicited - sync Vector; multiple operations sync'ed ## Please have a look and do let me know your thought on the above. Thanks,
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi Vyom, IIUC, the issue only happens when connections (clients?) to the server are pooled. I assume the server may decide to close what it thinks is an idle connection at any time. So what we see here is actually a race between an open connection being retrieved from the pool, and the server deciding to close that connection. The connection/client is retrieved from the pool, then unfortunately closed by the server after having been selected for the next operation. The question for me would be "what is the expected behavior if the server decides to close an idle connection?" I would expect that new InitialDirContext() should have some way of dealing with that with retrying? If so will leaving the 'conn' field assigned ensure that the retry happens, or will new InitialDirContext() fail with some other exception because the connection has been closed? Or does the code simply assume that asynchronous closing of the connection by the server can only happen while it's sitting idle in the pool? I wonder if some other mechanism could be used to reduce and hopefully eliminate the time window in which the race could appear. For instance taking the connection out of the pool before closing it, instead of closing it before taking it out of the pool, etc... best regards, -- daniel On 04/09/2018 15:04, vyom tewari wrote: On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote: Hi Vyom, On 24/08/18 11:35, vyom tewari wrote: Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. Sorry, I don't know this code all that well, but I think that more explanation will be needed before this code can be reviewed. Hi Chris, let me try to explain issue little bit. The issue is a if ldap connection has already been established and then the LDAP directory server sends an (unsolicited) "Notice of Disconnection", the client's processing of this LDAP message can race with an application thread calling new InitialDirContext() to authenticate user credentials (i.e.bind) and throw NPE. I did some analysis and found out that when server send an (unsolicited) "Notice of Disconnection", LdapClient.forceClose() will be called in LdapClient.processUnsolicited() which is called asynchronously by the reader thread in Connection, this means 'LdapClient.conn' may set to null anytime after it received "Notice of Disconnection". The LdapClient and the Connection seem to be loosely coupled. I think part of this is to allow the LdapClient to be GC'ed and finalized separately to the Connection ( that can be reused ). Not setting `conn` to null could have a negative impact on this loose coupling. I also note that the synchronization is implemented poorly in the LdapClient, `conn` is operated on both from within synchronized blocks and from unsynchronized blocks ( which I think is the reason you see the unexpected null ). I agree, not setting 'conn' to null will definitely have some impact. I check the LdapClient and it looks to me it is intentional(i may be wrong) that 'conn' is operated on both from within synchronize blocks and from unsynchronize block. LdapClient, java doc says that connection(conn) take care of it's own sync. ## access from outside LdapClient must sync; * conn - no sync; Connection takes care of its own sync * unsolicited - sync Vector; multiple operations sync'ed ## Please have a look and do let me know your thought on the above. Thanks, Vyom -Chris.
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote: Hi Vyom, On 24/08/18 11:35, vyom tewari wrote: Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. Sorry, I don't know this code all that well, but I think that more explanation will be needed before this code can be reviewed. Hi Chris, let me try to explain issue little bit. The issue is a if ldap connection has already been established and then the LDAP directory server sends an (unsolicited) "Notice of Disconnection", the client's processing of this LDAP message can race with an application thread calling new InitialDirContext() to authenticate user credentials (i.e.bind) and throw NPE. I did some analysis and found out that when server send an (unsolicited) "Notice of Disconnection", LdapClient.forceClose() will be called in LdapClient.processUnsolicited() which is called asynchronously by the reader thread in Connection, this means 'LdapClient.conn' may set to null anytime after it received "Notice of Disconnection". The LdapClient and the Connection seem to be loosely coupled. I think part of this is to allow the LdapClient to be GC'ed and finalized separately to the Connection ( that can be reused ). Not setting `conn` to null could have a negative impact on this loose coupling. I also note that the synchronization is implemented poorly in the LdapClient, `conn` is operated on both from within synchronized blocks and from unsynchronized blocks ( which I think is the reason you see the unexpected null ). I agree, not setting 'conn' to null will definitely have some impact. I check the LdapClient and it looks to me it is intentional(i may be wrong) that 'conn' is operated on both from within synchronize blocks and from unsynchronize block. LdapClient, java doc says that connection(conn) take care of it's own sync. ## access from outside LdapClient must sync; * conn - no sync; Connection takes care of its own sync * unsolicited - sync Vector; multiple operations sync'ed ## Please have a look and do let me know your thought on the above. Thanks, Vyom -Chris.
Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi Vyom, On 24/08/18 11:35, vyom tewari wrote: Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. Sorry, I don't know this code all that well, but I think that more explanation will be needed before this code can be reviewed. The LdapClient and the Connection seem to be loosely coupled. I think part of this is to allow the LdapClient to be GC'ed and finalized separately to the Connection ( that can be reused ). Not setting `conn` to null could have a negative impact on this loose coupling. I also note that the synchronization is implemented poorly in the LdapClient, `conn` is operated on both from within synchronized blocks and from unsynchronized blocks ( which I think is the reason you see the unexpected null ). -Chris.
RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
Hi All, Please review this simple fix below webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html bugid: https://bugs.openjdk.java.net/browse/JDK-8205330 This fix will resolve the race in LdapClient where we are explicitly making "null" to LdapClient.conn. Thanks, Vyom