Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-11 Thread Daniel Fuchs

On 11/09/2018 12:15, Chris Hegarty wrote:



On 11 Sep 2018, at 09:50, vyom tewari  wrote:

Hi Chris,Daniel,

Please find the latest patch( 
http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ).


Thanks Vyom, Reviewed.


Agreed.

best regards,

-- daniel



-Chris.





Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-11 Thread Chris Hegarty


> On 11 Sep 2018, at 09:50, vyom tewari  wrote:
> 
> Hi Chris,Daniel,
> 
> Please find the latest patch( 
> http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ).

Thanks Vyom, Reviewed.

-Chris.

Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-11 Thread vyom tewari

Hi Chris,Daniel,

Please find the latest patch( 
http://cr.openjdk.java.net/~vtewari/8205330/webrev0.1/index.html ).


I did the additional cleanup(removed redundant null check) as you suggested.

Thanks,

Vyom


On Monday 10 September 2018 09:03 PM, Daniel Fuchs wrote:

On 10/09/2018 15:53, vyom tewari wrote:
Yes , this will definitely solve this NPE issue but we may hit NPE 
other places as well because we are setting 'LdapClient.com' to null 
asynchronously.


I agree with Vyom here.

Other solutions that have been investigated - such as only
setting the connection to null when its ref count reaches zero
now seem to complex (read: too risky because over the complexity
threshold) to me.

Maybe the best thing to do here is to stick to Vyom original's
idea, but declare conn final and do all the necessary (small)
cleanup that goes with it? Possibly also add a comment saying
that CommunicationException will be thrown if some other thread
uses the LdapClient after forceClose has been closed...

best regards,

-- daniel





Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-10 Thread Chris Hegarty

Vyom,

You are correct, the change that I proposed is minimal
and some fragility still remains.

I talked with Daniel, walked through the code one more
time, and I am now convinced that your original proposal
is correct ( since the pooling is at the level of
LdapClient, rather than at the level of Connection ).

Can I also suggest:
 1) Remove the null assignments, rather commenting them
out.
 2) Make Connection `conn` final.
 3) Remove all null checks for conn ( since it can no
longer be null ).

-Chris.

On 10/09/18 15:53, vyom tewari wrote:



On Monday 10 September 2018 05:30 PM, Chris Hegarty wrote:

Vyom,

The NPE is originating from the finally block of the LdapClient
`authenticate` method. In the finally block there is an attempt
to restore the "read" timeout, since it was changed earlier in
`authenticate`, to reflect the connect timeout value, since the
subsequent operation(s) are considered part of the connect phase.

An unsolicited message may close the connection during
authenticate, which is does and LdapClient.ldapBind throws
"javax.naming.CommunicationException: Request: 1 cancelled". The
finally block is then executed and the previous exception is
effectively throw away when the NPE is encountered.

If we agree that such behaviour is reasonable ( and I think it
most likely is ), then the finally block needs to be more
defensive - check that conn is not null. It makes no sense to
attempt to reset the read timeout if conn is null.

With the following change, then the CommunicationException will
be thrown from `authenticate`. I think this is the behaviour we
want, no?

Yes , this will definitely solve this NPE issue but we may hit NPE other 
places as well because we are setting 'LdapClient.com' to null 
asynchronously.

Vyom


--- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
+++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
@@ -297,7 +297,8 @@
 conn.setV3(isLdapv3);
 return res;
 } finally {
-    conn.readTimeout = readTimeout;
+    if (conn != null)
+    conn.readTimeout = readTimeout;
 }
 }


-Chris.


On 07/09/18 17:47, Daniel Fuchs wrote:

Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:

On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and 
"removePooledConnection" are "synchronized" so there should not be 
any problem(concurrency issue) here.


I'm referring to this pattern (in forceClose):

 conn.cleanup(null, false);
 conn = null;

 if (cleanPool) {
 pcb.removePooledConnection(this);
 }

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

As you said server is closing the same connection which is retrieved 
from pool which  is correct but this should not throw unintentional 
NPE.


Agreed.

In LdapClient, we set the "Ldapclient.conn" to explicitly null 
asynchronously after we remove the connection from pool and which is 
creating the NPE.


LdapClient does not set `Ldapclient.conn` to null if connection is 
not pooled.


Really? That's not what I see... Am I missing something?

 >> [...]

If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

  I don't know if retrying is feasible in 'IntialDirContext' but if 
we leave 'LdapClient.conn' assigned then we will get 
"javax.naming.CommunicationException" which tells that, underline 
connection is closed. I am not 100% sure if this is right approach 
but if we set 'LdapClient.conn' to null asynchronously then we will 
hit NPE.


OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...


I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For inst

Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-10 Thread Daniel Fuchs

On 10/09/2018 15:53, vyom tewari wrote:
Yes , this will definitely solve this NPE issue but we may hit NPE other 
places as well because we are setting 'LdapClient.com' to null 
asynchronously.


I agree with Vyom here.

Other solutions that have been investigated - such as only
setting the connection to null when its ref count reaches zero
now seem to complex (read: too risky because over the complexity
threshold) to me.

Maybe the best thing to do here is to stick to Vyom original's
idea, but declare conn final and do all the necessary (small)
cleanup that goes with it? Possibly also add a comment saying
that CommunicationException will be thrown if some other thread
uses the LdapClient after forceClose has been closed...

best regards,

-- daniel



Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-10 Thread vyom tewari




On Monday 10 September 2018 05:30 PM, Chris Hegarty wrote:

Vyom,

The NPE is originating from the finally block of the LdapClient
`authenticate` method. In the finally block there is an attempt
to restore the "read" timeout, since it was changed earlier in
`authenticate`, to reflect the connect timeout value, since the
subsequent operation(s) are considered part of the connect phase.

An unsolicited message may close the connection during
authenticate, which is does and LdapClient.ldapBind throws
"javax.naming.CommunicationException: Request: 1 cancelled". The
finally block is then executed and the previous exception is
effectively throw away when the NPE is encountered.

If we agree that such behaviour is reasonable ( and I think it
most likely is ), then the finally block needs to be more
defensive - check that conn is not null. It makes no sense to
attempt to reset the read timeout if conn is null.

With the following change, then the CommunicationException will
be thrown from `authenticate`. I think this is the behaviour we
want, no?

Yes , this will definitely solve this NPE issue but we may hit NPE other 
places as well because we are setting 'LdapClient.com' to null 
asynchronously.

Vyom


--- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
+++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
@@ -297,7 +297,8 @@
 conn.setV3(isLdapv3);
 return res;
 } finally {
-    conn.readTimeout = readTimeout;
+    if (conn != null)
+    conn.readTimeout = readTimeout;
 }
 }


-Chris.


On 07/09/18 17:47, Daniel Fuchs wrote:

Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:

On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and 
"removePooledConnection" are "synchronized" so there should not be 
any problem(concurrency issue) here.


I'm referring to this pattern (in forceClose):

 conn.cleanup(null, false);
 conn = null;

 if (cleanPool) {
 pcb.removePooledConnection(this);
 }

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

As you said server is closing the same connection which is retrieved 
from pool which  is correct but this should not throw unintentional 
NPE.


Agreed.

In LdapClient, we set the "Ldapclient.conn" to explicitly null 
asynchronously after we remove the connection from pool and which is 
creating the NPE.


LdapClient does not set `Ldapclient.conn` to null if connection is 
not pooled.


Really? That's not what I see... Am I missing something?

 >> [...]

If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

  I don't know if retrying is feasible in 'IntialDirContext' but if 
we leave 'LdapClient.conn' assigned then we will get 
"javax.naming.CommunicationException" which tells that, underline 
connection is closed. I am not 100% sure if this is right approach 
but if we set 'LdapClient.conn' to null asynchronously then we will 
hit NPE.


OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...


I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection 
is not a problem  but setting `LdapClient.conn'  to null 
asynchronously is causing NPE.


Yes - I agree that setting conn to null is problematic.
I wonder why it's set to null. Maybe in an attempt to
make it eligible for GC earlier?

If that's not the issue, then I would suggest making it
final (AFAICS it's only set to non-null once) and cleaning
up the code t

Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-10 Thread Daniel Fuchs

Nice find Chris.

But I have to wonder why the NPE wasn't thrown at the
very beginning of `authenticate` namely one of these
two lines:

int readTimeout = conn.readTimeout;
conn.readTimeout = conn.connectTimeout;

which happen to be called even before ensureOpen() is
called a few lines later!

cheers,

-- daniel

On 10/09/2018 13:00, Chris Hegarty wrote:

Vyom,

The NPE is originating from the finally block of the LdapClient
`authenticate` method. In the finally block there is an attempt
to restore the "read" timeout, since it was changed earlier in
`authenticate`, to reflect the connect timeout value, since the
subsequent operation(s) are considered part of the connect phase.

An unsolicited message may close the connection during
authenticate, which is does and LdapClient.ldapBind throws
"javax.naming.CommunicationException: Request: 1 cancelled". The
finally block is then executed and the previous exception is
effectively throw away when the NPE is encountered.

If we agree that such behaviour is reasonable ( and I think it
most likely is ), then the finally block needs to be more
defensive - check that conn is not null. It makes no sense to
attempt to reset the read timeout if conn is null.

With the following change, then the CommunicationException will
be thrown from `authenticate`. I think this is the behaviour we
want, no?


--- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
+++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
@@ -297,7 +297,8 @@
  conn.setV3(isLdapv3);
  return res;
  } finally {
-    conn.readTimeout = readTimeout;
+    if (conn != null)
+    conn.readTimeout = readTimeout;
  }
  }


-Chris.


Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-10 Thread Chris Hegarty

Vyom,

The NPE is originating from the finally block of the LdapClient
`authenticate` method. In the finally block there is an attempt
to restore the "read" timeout, since it was changed earlier in
`authenticate`, to reflect the connect timeout value, since the
subsequent operation(s) are considered part of the connect phase.

An unsolicited message may close the connection during
authenticate, which is does and LdapClient.ldapBind throws
"javax.naming.CommunicationException: Request: 1 cancelled". The
finally block is then executed and the previous exception is
effectively throw away when the NPE is encountered.

If we agree that such behaviour is reasonable ( and I think it
most likely is ), then the finally block needs to be more
defensive - check that conn is not null. It makes no sense to
attempt to reset the read timeout if conn is null.

With the following change, then the CommunicationException will
be thrown from `authenticate`. I think this is the behaviour we
want, no?


--- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
+++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
@@ -297,7 +297,8 @@
 conn.setV3(isLdapv3);
 return res;
 } finally {
-conn.readTimeout = readTimeout;
+if (conn != null)
+conn.readTimeout = readTimeout;
 }
 }


-Chris.


On 07/09/18 17:47, Daniel Fuchs wrote:

Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:

On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and 
"removePooledConnection" are "synchronized" so there should not be any 
problem(concurrency issue) here.


I'm referring to this pattern (in forceClose):

     conn.cleanup(null, false);
     conn = null;

     if (cleanPool) {
     pcb.removePooledConnection(this);
     }

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

As you said server is closing the same connection which is retrieved 
from pool which  is correct but this should not throw unintentional NPE.


Agreed.

In LdapClient, we set the "Ldapclient.conn" to explicitly null 
asynchronously after we remove the connection from pool and which is 
creating the NPE.


LdapClient does not set `Ldapclient.conn` to null if connection is not 
pooled.


Really? That's not what I see... Am I missing something?

 >> [...]

If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

  I don't know if retrying is feasible in 'IntialDirContext' but if we 
leave 'LdapClient.conn' assigned then we will get 
"javax.naming.CommunicationException" which tells that, underline 
connection is closed. I am not 100% sure if this is right approach but 
if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.


OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...


I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection is 
not a problem  but setting `LdapClient.conn'  to null asynchronously 
is causing NPE.


Yes - I agree that setting conn to null is problematic.
I wonder why it's set to null. Maybe in an attempt to
make it eligible for GC earlier?

If that's not the issue, then I would suggest making it
final (AFAICS it's only set to non-null once) and cleaning
up the code that tests for conn == null.

If GC is the issue, then the best you can do is probably
only set it to null when the client is not pooled.

And if we could reduce the time window in which a thread
could get a stale client from t

Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-07 Thread Daniel Fuchs

Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:

On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and 
"removePooledConnection" are "synchronized" so there should not be any 
problem(concurrency issue) here.


I'm referring to this pattern (in forceClose):

conn.cleanup(null, false);
conn = null;

if (cleanPool) {
pcb.removePooledConnection(this);
}

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

As you said server is closing the same connection which is retrieved 
from pool which  is correct but this should not throw unintentional NPE.


Agreed.

In LdapClient, we set the "Ldapclient.conn" to explicitly null 
asynchronously after we remove the connection from pool and which is 
creating the NPE.


LdapClient does not set `Ldapclient.conn` to null if connection is not 
pooled.


Really? That's not what I see... Am I missing something?

>> [...]

If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

  I don't know if retrying is feasible in 'IntialDirContext' but if we 
leave 'LdapClient.conn' assigned then we will get 
"javax.naming.CommunicationException" which tells that, underline 
connection is closed. I am not 100% sure if this is right approach but 
if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.


OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...


I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection is 
not a problem  but setting `LdapClient.conn'  to null asynchronously is 
causing NPE.


Yes - I agree that setting conn to null is problematic.
I wonder why it's set to null. Maybe in an attempt to
make it eligible for GC earlier?

If that's not the issue, then I would suggest making it
final (AFAICS it's only set to non-null once) and cleaning
up the code that tests for conn == null.

If GC is the issue, then the best you can do is probably
only set it to null when the client is not pooled.

And if we could reduce the time window in which a thread
could get a stale client from the pool that might be good
too?

Is there anyway we could somehow add a test that reproduce
the NPE?

best regards,

-- daniel

[...]


On 24/08/18 11:35, vyom tewari wrote:

Hi All,

Please review this simple fix below

webrev: 
http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html


bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are 
explicitly making "null" to LdapClient.conn.

[...]


Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-07 Thread vyom tewari

Hi Daniel,

Thanks for detailed review and comment. Please find my answers inline.

Thanks,

Vyom


On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:

Hi Vyom,

IIUC, the issue only happens when connections (clients?) to the
server are pooled. I assume the server may decide to
close what it thinks is an idle connection at any time.


correct

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and 
"removePooledConnection" are "synchronized" so there should not be any 
problem(concurrency issue) here.


As you said server is closing the same connection which is retrieved 
from pool which  is correct but this should not throw unintentional NPE.


In LdapClient, we set the "Ldapclient.conn" to explicitly null 
asynchronously after we remove the connection from pool and which is 
creating the NPE.


LdapClient does not set `Ldapclient.conn` to null if connection is not 
pooled.



The question for me would be "what is the expected behavior
if the server decides to close an idle connection?"
I would expect that new InitialDirContext() should have some
way of dealing with that with retrying?
If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

 I don't know if retrying is feasible in 'IntialDirContext' but if we 
leave 'LdapClient.conn' assigned then we will get 
"javax.naming.CommunicationException" which tells that, underline 
connection is closed. I am not 100% sure if this is right approach but 
if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.



I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection is 
not a problem  but setting `LdapClient.conn'  to null asynchronously is 
causing NPE.


Please do let me know if i am missing something.


best regards,

-- daniel

On 04/09/2018 15:04, vyom tewari wrote:



On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:

Hi Vyom,

On 24/08/18 11:35, vyom tewari wrote:

Hi All,

Please review this simple fix below

webrev: 
http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html


bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are 
explicitly making "null" to LdapClient.conn.


Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.


Hi Chris, let me try to explain issue little bit.

The issue is a if ldap connection has already been established and 
then the LDAP directory server sends an (unsolicited) "Notice of 
Disconnection", the client's processing of this LDAP message can race 
with an application thread calling new InitialDirContext() to 
authenticate user credentials (i.e.bind) and throw NPE.


I did some analysis and found out that when server send an 
(unsolicited) "Notice of Disconnection", LdapClient.forceClose() will 
be called in LdapClient.processUnsolicited() which is called 
asynchronously by the reader thread in Connection, this means 
'LdapClient.conn' may set to null anytime after it received  "Notice 
of Disconnection".



The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).

I agree, not setting  'conn' to null will definitely have some 
impact. I check the LdapClient and it looks to me it is intentional(i 
may be wrong) that 'conn' is operated on both from within synchronize 
blocks and from unsynchronize block.


LdapClient, java doc says that connection(conn) take care of it's own 
sync.


##
    access from outside LdapClient must sync;
  *   conn - no sync; Connection takes care of its own sync
  *   unsolicited - sync Vector; multiple operations sync'ed

##

Please have a look and do let me know your thought on the above.

Thanks,

Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-04 Thread Daniel Fuchs

Hi Vyom,

IIUC, the issue only happens when connections (clients?) to the
server are pooled. I assume the server may decide to
close what it thinks is an idle connection at any time.

So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

The question for me would be "what is the expected behavior
if the server decides to close an idle connection?"
I would expect that new InitialDirContext() should have some
way of dealing with that with retrying?
If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

best regards,

-- daniel

On 04/09/2018 15:04, vyom tewari wrote:



On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:

Hi Vyom,

On 24/08/18 11:35, vyom tewari wrote:

Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly 
making "null" to LdapClient.conn.


Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.


Hi Chris, let me try to explain issue little bit.

The issue is a if ldap connection has already been established and then 
the LDAP directory server sends an (unsolicited) "Notice of 
Disconnection", the client's processing of this LDAP message can race 
with an application thread calling new InitialDirContext() to 
authenticate user credentials (i.e.bind) and throw NPE.


I did some analysis and found out that when server send an (unsolicited) 
"Notice of Disconnection",  LdapClient.forceClose() will be called in 
LdapClient.processUnsolicited() which is called asynchronously by the 
reader thread in Connection, this means 'LdapClient.conn' may set to 
null anytime after it received  "Notice of Disconnection".



The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).

I agree, not setting  'conn' to null will definitely have some impact. I 
check the LdapClient and it looks to me it is intentional(i may be 
wrong) that 'conn' is operated on both from within synchronize blocks 
and from unsynchronize block.


LdapClient, java doc says that connection(conn) take care of it's own sync.

##
    access from outside LdapClient must sync;
  *   conn - no sync; Connection takes care of its own sync
  *   unsolicited - sync Vector; multiple operations sync'ed

##

Please have a look and do let me know your thought on the above.

Thanks,
Vyom

-Chris.






Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-09-04 Thread vyom tewari




On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:

Hi Vyom,

On 24/08/18 11:35, vyom tewari wrote:

Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly 
making "null" to LdapClient.conn.


Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.


Hi Chris, let me try to explain issue little bit.

The issue is a if ldap connection has already been established and then 
the LDAP directory server sends an (unsolicited) "Notice of 
Disconnection", the client's processing of this LDAP message can race 
with an application thread calling new InitialDirContext() to 
authenticate user credentials (i.e.bind) and throw NPE.


I did some analysis and found out that when server send an (unsolicited) 
"Notice of Disconnection",  LdapClient.forceClose() will be called in 
LdapClient.processUnsolicited() which is called asynchronously by the 
reader thread in Connection, this means 'LdapClient.conn' may set to 
null anytime after it received  "Notice of Disconnection".



The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).

I agree, not setting  'conn' to null will definitely have some impact.  
I check the LdapClient and it looks to me it is intentional(i may be 
wrong) that 'conn' is operated on both from within synchronize blocks 
and from unsynchronize block.


LdapClient, java doc says that connection(conn) take care of it's own sync.

##
   access from outside LdapClient must sync;
 *   conn - no sync; Connection takes care of its own sync
 *   unsolicited - sync Vector; multiple operations sync'ed

##

Please have a look and do let me know your thought on the above.

Thanks,
Vyom

-Chris.




Re: RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-08-24 Thread Chris Hegarty

Hi Vyom,

On 24/08/18 11:35, vyom tewari wrote:

Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly 
making "null" to LdapClient.conn.


Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.

The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).

-Chris.


RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

2018-08-24 Thread vyom tewari

Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly 
making "null" to LdapClient.conn.


Thanks,

Vyom