[ https://issues.apache.org/jira/browse/NIFI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hsin-Ying Lee updated NIFI-7866:
--------------------------------
    Comment: was deleted

(was: Hi Mark, can we catch the NPE and throw a ConnectionException from the loadFromConnectionResponse method?)

When cluster coordinator dies, other nodes may have trouble rejoining cluster
-----------------------------------------------------------------------------

                Key: NIFI-7866
                URL: https://issues.apache.org/jira/browse/NIFI-7866
            Project: Apache NiFi
         Issue Type: Bug
         Components: Core Framework
           Reporter: Mark Payne
           Priority: Major

When the cluster coordinator is lost, the nodes must begin communicating with a newly elected cluster coordinator. This is handled by the StandardFlowService.

When the `handleReconnectionRequest` method is called and the request does not contain the dataflow, the node is supposed to connect to the cluster coordinator and request the dataflow:
{code:java}
private void handleReconnectionRequest(final ReconnectionRequestMessage request) {
    try {
        logger.info("Processing reconnection request from cluster coordinator.");

        // reconnect
        ConnectionResponse connectionResponse = new ConnectionResponse(getNodeId(), request.getDataFlow(),
            request.getInstanceId(), request.getNodeConnectionStatuses(), request.getComponentRevisions());

        if (connectionResponse.getDataFlow() == null) {
            logger.info("Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.");
            connectionResponse = connect(false, false, createDataFlowFromController());
        }

        loadFromConnectionResponse(connectionResponse);
        ...
{code}

However, if the call above to `connect(false, false, createDataFlowFromController())` returns null (which is a valid case), that null value is passed along to loadFromConnectionResponse.
This method expects a non-null connectionResponse, so it throws a NullPointerException, producing the following stack trace (taken from NiFi 1.11.4):
{code:java}
2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
    at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
    at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
    at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
    at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
    at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
    ... 4 common frames omitted
{code}

As a result, the node does not rejoin the cluster.
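One possible guard is to check the result of `connect(...)` for null before calling `loadFromConnectionResponse`, instead of letting the NPE surface and catching it afterward. The sketch below is a self-contained simulation under stated assumptions, not NiFi code: `ReconnectGuardSketch`, the stand-in `ConnectionResponse`, the `coordinatorAvailable` flag, and the boolean "loaded / retry later" return value are all hypothetical, and the actual fix in StandardFlowService may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ReconnectGuardSketch {
    // Hypothetical stand-in for NiFi's ConnectionResponse (NOT the real class).
    static final class ConnectionResponse {
        private final String dataFlow;
        ConnectionResponse(String dataFlow) { this.dataFlow = dataFlow; }
        String getDataFlow() { return dataFlow; }
    }

    // Records which flows were loaded so the outcome can be inspected.
    static final List<String> loadedFlows = new ArrayList<>();

    // Hypothetical stand-in for connect(...): returns null when no coordinator
    // answers, mirroring the valid-but-unhandled case described in the issue.
    static ConnectionResponse connect(boolean coordinatorAvailable) {
        return coordinatorAvailable ? new ConnectionResponse("flow-from-coordinator") : null;
    }

    // Like loadFromConnectionResponse, this assumes a non-null argument.
    static void loadFromConnectionResponse(ConnectionResponse response) {
        loadedFlows.add(response.getDataFlow());
    }

    /** Returns true if a flow was loaded, false if reconnection should be retried later. */
    static boolean handleReconnectionRequest(String requestDataFlow, boolean coordinatorAvailable) {
        ConnectionResponse response = new ConnectionResponse(requestDataFlow);
        if (response.getDataFlow() == null) {
            response = connect(coordinatorAvailable);
            if (response == null) {
                // Guard: never pass null along; report failure so the caller can retry.
                System.err.println("No connection response from cluster coordinator; not loading flow");
                return false;
            }
        }
        loadFromConnectionResponse(response);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(handleReconnectionRequest("flow-in-request", false)); // true
        System.out.println(handleReconnectionRequest(null, true));               // true
        System.out.println(handleReconnectionRequest(null, false));              // false
        System.out.println(loadedFlows); // [flow-in-request, flow-from-coordinator]
    }
}
```

An explicit null check keeps the failure path visible at the point where null can legitimately appear, whereas catching `NullPointerException` around `loadFromConnectionResponse` could also mask unrelated bugs inside that method. Throwing a descriptive `ConnectionException` instead of returning false, as the deleted comment proposed, would be an equally valid variant of the same guard.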