[ https://issues.apache.org/jira/browse/NIFI-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898390#comment-15898390 ]
Dima Kovalyov commented on NIFI-2699:
-------------------------------------

I hit this problem all the time when I work with large NiFi flows: 6 groups, up to 8 processors in each, with 30000+ flow files in queue.

> Improve handling of response timeouts in cluster
> ------------------------------------------------
>
>                 Key: NIFI-2699
>                 URL: https://issues.apache.org/jira/browse/NIFI-2699
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Core UI
>            Reporter: Jeff Storck
>            Priority: Minor
>
> When running as a cluster, if a node is unable to respond within the socket
> timeout (eg, hitting a breakpoint while debugging), an
> IllegalClusterStateException will be thrown that causes the UI to show the
> "check config and fix errors" page. Once the node is communicating with the
> cluster again (i.e., breakpoint in the code is passed), the UI can be
> reloaded and the cluster recovers from the timeout without any user
> intervention at the service level. However, user experience could be
> improved. If a user initiates a replicated request to a node that is unable
> to respond within the socket timeout duration, the user might think NiFi
> crashed, when it in fact didn't.
> Here is the stack trace that was encountered during testing:
> {code}
> 2016-08-29 11:36:59,041 DEBUG [NiFi Web Server-22] o.a.n.w.a.c.IllegalClusterStateExceptionMapper
> org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: Node localhost:8443 is unable to fulfill this request due to: Unexpected Response Code 500
>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$2.onCompletion(ThreadPoolRequestReplicator.java:471) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:729) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_92]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_92]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
>     at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
>     at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
>     at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560) ~[jersey-client-1.19.jar:1.19]
>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:537) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:720) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>     ... 5 common frames omitted
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_92]
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_92]
>     at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_92]
>     at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_92]
>     at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_92]
>     at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_92]
>     at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_92]
>     at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_92]
>     at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_92]
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_92]
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_92]
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_92]
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_92]
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_92]
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_92]
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_92]
>     at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_92]
>     at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_92]
>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
>     ... 11 common frames omitted
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
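
For illustration only: in the quoted trace, the root cause is a java.net.SocketTimeoutException wrapped in a ClientHandlerException. One way the distinction between "node is slow" and "node is broken" could be surfaced is to walk the cause chain before choosing a message. The sketch below is not NiFi code; the class name TimeoutCauseCheck, its methods, and the timeout message wording are all hypothetical, and it assumes only that the timeout appears somewhere in the cause chain, as it does in the trace.

```java
import java.net.SocketTimeoutException;

// Hypothetical helper, NOT part of NiFi: decides whether a replication
// failure was ultimately caused by a socket read timeout.
final class TimeoutCauseCheck {

    private TimeoutCauseCheck() {}

    // Walks the cause chain; the depth cap guards against cyclic chains.
    static boolean isCausedByTimeout(Throwable failure) {
        int depth = 0;
        for (Throwable t = failure; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    // Builds a user-facing message. The timeout wording is an assumption
    // made for this sketch; the fallback mirrors the message in the trace.
    static String describe(String nodeId, Throwable failure) {
        if (isCausedByTimeout(failure)) {
            return "Node " + nodeId + " did not respond within the socket timeout; "
                    + "it may still be running. Try reloading.";
        }
        return "Node " + nodeId + " is unable to fulfill this request due to: "
                + failure.getMessage();
    }

    public static void main(String[] args) {
        // Same shape as the trace: outer exception -> wrapper -> read timeout.
        Throwable timedOut = new IllegalStateException("Unexpected Response Code 500",
                new RuntimeException(new SocketTimeoutException("Read timed out")));
        Throwable other = new IllegalStateException("Unexpected Response Code 500");

        System.out.println(describe("localhost:8443", timedOut));
        System.out.println(describe("localhost:8443", other));
    }
}
```

Checking the cause chain rather than the outermost exception type matters here because the outer exception only reports "Unexpected Response Code 500", which hides the timeout from the user.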