[ 
https://issues.apache.org/jira/browse/DERBY-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005332#comment-13005332
 ] 

Kathey Marsden commented on DERBY-4319:
---------------------------------------

I have the machine back in a state where this reproduces. I am sorry to say 
that there is still a hang, now in a different method, even with my prior 
attempt to get past it. Since I can reproduce it now, I should be able to make 
some progress on this issue. I'll record some info here in case it becomes hard 
to reproduce again.


In the current hang, the launched network server process is still running. It 
appears to have been started with all the drda properties specified without 
values:
cloudtst 6488248 4390978   0 14:41:38      -  0:20 /local1/IBM_JDK/15sr13/sdk/jre/bin/java -classpath /local1/kmarsden/repro/derby-4319/jars//derby.jar:/local1/kmarsden/repro/derby-4319/jars//derbyrun.jar:/local1/kmarsden/repro/derby-4319/jars//derbyTesting.jar:/local1/kmarsden/repro/derby-4319/jars//junit.jar -Dderby.drda.logConnections= -Dderby.drda.traceAll= -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice= -Dderby.drda.host= -Dderby.drda.portNumber= -Dderby.drda.minThreads= -Dderby.drda.maxThreads= -Dderby.drda.startNetworkServer= -Dderby.drda.debug= org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527

I will attach the javacore with thread dump as 
LaunchedNetworkServer.javacore.20110309.160148.6488248.0001.txt
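
To illustrate why properties passed without values may matter, here is a 
minimal sketch in plain Java. It is not Derby's property handling, which is not 
shown in this comment. The class EmptyPropertyDemo, the helper intProperty and 
the default values are hypothetical; only the property name is taken from the 
command line above. A flag such as -Dderby.drda.timeSlice= sets the property to 
the empty string rather than leaving it unset, so a supplied default is not 
applied and naive parsing either fails or silently yields false.

```java
import java.util.Properties;

class EmptyPropertyDemo {

    /**
     * Hypothetical helper: reads an int property, treating a missing or
     * blank value as "use the default".
     */
    static int intProperty(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().length() == 0) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Same effect as passing -Dderby.drda.timeSlice= on the command line.
        props.setProperty("derby.drda.timeSlice", "");

        // The key is present, so the supplied default is NOT used.
        String value = props.getProperty("derby.drda.timeSlice", "0");
        System.out.println("value is empty: " + value.isEmpty());

        // Naive parsing of the empty string fails outright.
        try {
            Integer.parseInt(value);
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejected the empty value");
        }

        // Boolean parsing silently yields false for the empty string.
        System.out.println("parseBoolean: " + Boolean.parseBoolean(value));

        // The blank-tolerant helper falls back to the default (5 is arbitrary).
        System.out.println("helper: " + intProperty(props, "derby.drda.timeSlice", 5));
    }
}
```

Whether any of these cases is what actually happens inside the server on AIX is 
an open question; the sketch only shows that "set but empty" and "not set" are 
different inputs.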

The server threads look pretty normal, with a ClientThread running and waiting 
to accept connections.

The test process is hung in NetworkServerTestSetup.complete(). I am not sure 
whether this hang comes later than the one I tried to fix or whether the change 
I made just did not work. I will attach the javacore for the test process as:
TestProcess.javacore.20110310.123703.4390978.0001.txt

If I try to ping the server from the command line, I get a Connection reset error:
$ java org.apache.derby.drda.NetworkServerControl ping
Thu Mar 10 12:47:39 PST 2011 : Error on client socket:
 Connection reset
Thu Mar 10 12:47:39 PST 2011 : Connection reset
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:116)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(NetworkServerControlImpl.java:2873)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(NetworkServerControlImpl.java:2817)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(NetworkServerControlImpl.java:1253)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(NetworkServerControlImpl.java:1228)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2260)
        at org.apache.derby.drda.NetworkServerControl.main(NetworkServerControl.java:320)


After that, subsequent ping attempts hang, and a new thread dump of the 
Network Server process shows that the ClientThread is no longer there. I 
think this should never happen: a lot of work has been put into making 
sure that the ClientThread always survives any type of error so that it can 
host more connections. See attachment 
LaunchedNetworkServerAfterPing.javacore.20110310.124948.6488248.0002.txt
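
The expectation that the accepting thread survives a failed connection can be 
sketched as follows. This is a generic illustration and not Derby's ClientThread 
code; the class ResilientAcceptLoop and its methods are hypothetical names. The 
loop logs a per-connection failure such as a Connection reset and goes back to 
accept(), and it only exits on an explicit shutdown or when the server socket 
itself has been closed.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class ResilientAcceptLoop implements Runnable {
    private final ServerSocket serverSocket;
    private volatile boolean shutdown;

    ResilientAcceptLoop(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    /** Stops the loop; closing the socket unblocks a pending accept(). */
    void requestShutdown() throws IOException {
        shutdown = true;
        serverSocket.close();
    }

    public void run() {
        while (!shutdown && !serverSocket.isClosed()) {
            Socket client = null;
            try {
                client = serverSocket.accept();
                handle(client);
            } catch (Exception e) {
                // A closed listener cannot recover, so stop instead of spinning.
                if (shutdown || serverSocket.isClosed()) {
                    break;
                }
                // One bad connection must not end the accepting thread.
                System.err.println("Connection failed, still accepting: " + e);
            } finally {
                closeQuietly(client);
            }
        }
    }

    /** Hypothetical per-connection work: replies with a single byte. */
    void handle(Socket client) throws IOException {
        client.getOutputStream().write('K');
        client.getOutputStream().flush();
    }

    private static void closeQuietly(Socket socket) {
        if (socket != null) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close fails.
            }
        }
    }
}
```

If the real ClientThread disappears after a reset, one thing to check is 
whether some exception or Error escapes a loop like this one, or whether the 
server socket was closed underneath it.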


Another thing to note is that, prior to the defaultProperties test, there was 
actually a stack trace with a Connection reset in the setPortPriority test, 
which did not cause a failure. See TestOutput2011-03-09.txt .out


This issue actually has many facets that are worth working on:

1) How do we make sure a spawned network server process is destroyed if it 
hangs the whole suite?

2) Under what circumstances can the Network Server ClientThread that loops 
accepting new connections be destroyed?

3) What sort of problem is being caused on AIX by starting network server with 
these odd options? I am thinking maybe it is related to soTimeout or keepalive 
getting set to an unexpected value, but I am not sure.

I have been holding off on working on 3 because it provides a good 
reproduction for issues 1 and 2, but I think that at this point the best thing 
to do would be to disable the problematic fixture on AIX, whether it is 
testSetPortPriority or testDefaultProperties. Then I can work on all three 
issues in a logical order and pace without release concerns. I'll look into 
doing that.
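
For question 1) above, one possible shape of a bounded wait is sketched here. 
It is not the existing NetworkServerTestSetup code or either of the attached 
patches. SpawnedProcessWatchdog and waitOrDestroy are hypothetical names, and 
the sketch uses only Process.waitFor() and Process.destroy(), which are 
available on a 1.5 JVM. Whether destroy() is enough to stop a hung server on 
AIX is an assumption that has not been verified here.

```java
class SpawnedProcessWatchdog {

    /**
     * Waits up to timeoutMillis for the process to exit and destroys it
     * if it is still running after that.
     *
     * @return true if the process exited on its own, false if it was destroyed
     */
    static boolean waitOrDestroy(final Process process, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis <= 0) {
            // Thread.join(0) would wait forever, which defeats the purpose.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                try {
                    process.waitFor();
                } catch (InterruptedException e) {
                    // The watchdog gave up on this process; just exit.
                }
            }
        });
        waiter.setDaemon(true);
        waiter.start();

        // Returns when the process has exited or the timeout has elapsed.
        waiter.join(timeoutMillis);
        if (waiter.isAlive()) {
            process.destroy();
            waiter.interrupt();
            return false;
        }
        return true;
    }
}
```

A test teardown could call waitOrDestroy(spawnedServer, someTimeout) instead of 
an unbounded waitFor(), so that a server that never exits costs the suite a 
fixed delay rather than a hang.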




> hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties
> -------------------------------------------------------------------
>
>                 Key: DERBY-4319
>                 URL: https://issues.apache.org/jira/browse/DERBY-4319
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Client
>    Affects Versions: 10.5.2.0
>         Environment: ibm jvm 1.5 SR9-0 on IBM AIX 3.5
>            Reporter: Myrna van Lunteren
>            Assignee: Kathey Marsden
>              Labels: derby_triage10_8
>         Attachments: derby-4317_timeout_for_complete_diff.txt, 
> derby-4319_teardown_kill_on_bad_ping.txt, 
> javacore.20090723.093837.25380.0001.txt, 
> javacore.20090723.093909.24726.0001.txt
>
>
> The test run for 10.5.2.0 hung in suites.All. The console output (the run was 
> with -Dderby.tests.trace=true) showed ttestDefaultProperties had successfully 
> completed but the run was halted.
> ps -eaf | grep java showed the process that kicked off suites.All, and a 
> networkserver process with the following flags:
> - classpath <classpath including derby.jar, derbytools.jar, derbyclient.jar, 
> derbynet.jar, derbyTesting.jar, derbyrun.jar, derbyTesting.jar and junit.jar> 
> -Dderby.drda.logConnections= -Dderby.drda.traceAll= 
> -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice= 
> -Dderby.drda.host= -Dderby.drda.portNumber= -derby.drda.minThreads= 
> -Dderby.drda.maxThreads= -Dderby.drda.startNetworkServer= -Dderby.drda.debug= 
> org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527
> This process had been sitting for 2 days.
> After killing the NetworkServerControl process, the test continued 
> successfully (except for DERBY-4186, fixed in trunk), but the following was 
> put out to the console:
>  START-SPAWNED:SpawnedNetworkServer STANDARD OUTPUT: exit code=137
> 2009-07-18 03:16:07.157 GMT : Security manager installed using the Basic 
> server
> security policy.
> 2009-07-18 03:16:09.169 GMT : Apache Derby Network Server - 10.5.2.0 - 
> (794445)
> started and ready to accept connections on port 1527
> END-SPAWNED  :SpawnedNetworkServer STANDARD OUTPUT:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
