[ https://issues.apache.org/jira/browse/DERBY-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725630#action_12725630 ]

Mamta A. Satoor commented on DERBY-4053:
----------------------------------------

Based on Kathey's suggestion, I tried putting a sleep in the Network Server
code right in the middle of the ping protocol handshake.

This is what the server side does for ping (in NetworkServerControlImpl):

    private void sendOK(DDMWriter writer) throws Exception
    {
        writeCommandReplyHeader(writer);  // reply header
        writer.writeByte(OK);             // OK byte
        writer.flush();                   // header and OK go out in one flush
    }

I copied the sendOK code inline where the ping is handled in
NetworkServerControlImpl.processCommands(). I then changed the copied code so
that the server flushes and sleeps after writing the header but before sending
the OK to the ping client, as shown below:

    writeCommandReplyHeader(writer);
    writer.flush();                    // added: send the header on its own
    System.out.println("before going to sleep");
    Thread.sleep(10000);               // added: pause mid-handshake for 10 s
    System.out.println("after sleep");
    writer.writeByte(OK);
    System.out.println("after sending OK");
    writer.flush();
    System.out.println("after flushing OK");

With the code changes above, I thought I would be able to reproduce the bug by
shutting down the server while it was still sleeping during the ping
handshake (i.e. before the ping protocol handshake had finished). What I found
was that the server shut down properly and the ping client got the expected
"Invalid reply from network server: Insufficient data." error. We thought that
if we brought the server back up and tried ping on the new server session, it
would hang because of the earlier insufficient data, but that didn't happen. A
hang here would probably have reproduced the intermittent hang we see when the
nightly tests are running.
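
To make the expected failure concrete, here is a minimal, self-contained Java
sketch that simulates the shutdown-during-handshake case over a loopback
socket. It is not Derby code: the class name PingShutdownSketch, the 4-byte
"RPY:" header and the 0 status byte are hypothetical stand-ins for whatever
writeCommandReplyHeader() and writeByte(OK) really send. The simulated server
flushes the header and then closes the connection before the OK byte, so a
client that needs the header plus the status byte hits end-of-stream, roughly
the situation the real ping client reported as insufficient data.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PingShutdownSketch {
    // Hypothetical reply layout: a 4-byte header followed by a 1-byte status.
    static final byte[] HEADER = "RPY:".getBytes(StandardCharsets.US_ASCII);
    static final byte OK = 0;

    /** Pings a simulated server that is shut down after sending only the header. */
    static String pingDuringShutdown() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    out.write(HEADER);
                    out.flush();
                    // Closing here stands in for the shutdown that arrives while the
                    // instrumented server is still sleeping, so OK is never sent.
                } catch (IOException ignored) {
                    // Not relevant to this sketch.
                }
            });
            serverThread.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                DataInputStream in = new DataInputStream(client.getInputStream());
                byte[] reply = new byte[HEADER.length + 1];
                try {
                    in.readFully(reply); // blocks until header + status byte, or EOF
                    return reply[HEADER.length] == OK ? "OK" : "Unexpected status";
                } catch (EOFException e) {
                    return "Insufficient data"; // connection closed mid-reply
                }
            } finally {
                serverThread.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pingDuringShutdown());
    }
}
```

Running main prints "Insufficient data", because the connection closes after
only the 4 header bytes.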

A little more information on the exact steps for the test case above:
Window 1: start the server
        java org.apache.derby.drda.NetworkServerControl -noSecurityManager start -p 1639
Window 2: ping the server (this puts the server to sleep mid-handshake)
        java org.apache.derby.drda.NetworkServerControl ping -p 1639
Window 3: while the server is sleeping, send a shutdown request
        java org.apache.derby.drda.NetworkServerControl shutdown -p 1639

After spending more time on the experiment above, I found that the ping client
was getting insufficient data because of the writer.flush() that I added right
before Thread.sleep(...). This happened with both the Sun and IBM JVMs (1.6
versions). Once I took the additional writer.flush() out, the ping client ran
successfully and there was no insufficient data error.
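
One possible explanation for why the extra flush alone produced the error (a
hypothesis, not something established in this comment) is a short read: if the
ping client parses the reply from a single InputStream.read() call, it only
gets the bytes that have arrived so far, and with the extra flush the header
arrives well before the OK byte. The sketch below uses a hypothetical class
name (ShortReadSketch), a hypothetical 4-byte header and a fixed reply length,
none of which are taken from Derby. It compares a single read with a loop that
keeps reading until the expected length has arrived.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ShortReadSketch {
    // Hypothetical reply layout: a 4-byte header followed by a 1-byte status.
    static final byte[] HEADER = "RPY:".getBytes(StandardCharsets.US_ASCII);
    static final byte OK = 0;
    static final int REPLY_LEN = HEADER.length + 1;

    /**
     * Pings a simulated server that flushes the header, pauses, then sends OK.
     * Returns the number of reply bytes the client collected.
     */
    static int ping(boolean readUntilComplete) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    out.write(HEADER);
                    out.flush();       // extra flush: the header is sent on its own
                    Thread.sleep(500); // shortened stand-in for Thread.sleep(10000)
                    out.write(OK);
                    out.flush();
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have closed; fine for this sketch.
                }
            });
            serverThread.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                InputStream in = client.getInputStream();
                byte[] reply = new byte[REPLY_LEN];
                int count = 0;
                do {
                    // read() returns whatever has arrived, possibly less than asked for
                    int n = in.read(reply, count, REPLY_LEN - count);
                    if (n == -1) {
                        break; // peer closed: the reply really is short
                    }
                    count += n;
                } while (readUntilComplete && count < REPLY_LEN);
                return count;
            } finally {
                serverThread.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("single read: " + ping(false) + " of " + REPLY_LEN + " bytes");
        System.out.println("read loop:   " + ping(true) + " of " + REPLY_LEN + " bytes");
    }
}
```

The loop still stops on end-of-stream, so a server that genuinely goes away
mid-reply (as in the shutdown case) is still reported as a short reply.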

The goal here is to get a small, consistently reproducible test case, which
will make debugging the problem easier, but I have not yet been able to make
the ping run into insufficient data in a small repro. I will brainstorm more,
but in the meantime, if anyone has any ideas on what may be causing the
insufficient data error, I can pursue those.

> suites.All hang with message java.net.BindException: Address already in use: NET_Bind in derby.log
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4053
>                 URL: https://issues.apache.org/jira/browse/DERBY-4053
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server, Test
>    Affects Versions: 10.5.1.1
>            Reporter: Kathey Marsden
>         Attachments: derby-4053_repro_dont_commit_diff.txt, derby.log, 
> javacore-20090420-1735.txt, javacore.20090211.123031.4000.0001.txt, 
> suites.All.out
>
>
> Running suites.All with IBM 1.5 on 10.5.0.0 alpha (743198), I got a hang
> in the test run. The last test to run successfully was
> xtestNestedSavepoints, but I am not sure exactly which test caused the hang.
> I took a thread dump, which I will attach. It showed the network server up
> and running, but no ClientThread and a blocked ping attempt.
> This hang is very similar to the hang that was seen after the fix attempts
> for DERBY-1465, but that change was backed out, so it is not related to that
> change. It could be that the change for DERBY-1465 just made this highly
> intermittent problem more likely.

-- 
This message is automatically generated by JIRA.