[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Li reassigned STORM-3767:
-------------------------------

    Assignee: Ethan Li

> NPE on getComponentPendingProfileActions
> -----------------------------------------
>
>                 Key: STORM-3767
>                 URL: https://issues.apache.org/jira/browse/STORM-3767
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>         Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a topology is newly submitted and the scheduling loop takes too long,
> the component UI may return an HTTP 500 error.
> This is caused by an NPE in the nimbus code. An example:
> 1. When a scheduling loop finishes, nimbus eventually updates the
> assignmentsBackend. If a topology is newly submitted, its entry is added
> to the idToAssignment map; otherwise, the entry is updated with the new
> assignments. The key point is that the new topology id does not exist in
> idToAssignment until this point is reached.
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64
> 2. However, this assignmentsBackend update did not happen until
> 2021-04-23 15:30:14.299
> {code:java}
> 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment for topology
> {code}
> while the topology topo1-52-1619191499 had already been scheduled at
> 2021-04-23 15:25:13.887. The scheduling loop took longer than 5 minutes.
> {code:java}
> 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy (1297 states traversed in 1275 ms, backtracked 0 times)
> other topologies were taking long time
> 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy (111 states traversed in 34 ms, backtracked 0 times)
> ...
> 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting lower priority topologies. Additional Memory Required: 20128.0 MB (Available: 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed in 299804 ms, backtracked 65555 times, 89 of 150 executors scheduled)
> ...
> 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - evaluateplus-dev-47-1605825401 Running - Fully Scheduled by GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 times)
> {code}
> 3. During this period, the idToAssignment map in assignmentsBackend had no
> entry for topo1-52-1619191499, so when a component UI page was visited,
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69
> the lookup returned null as the assignment, and dereferencing it caused the NPE.
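The window described in steps 1 to 3 can be illustrated with a minimal, self-contained sketch. It is not Storm code and not necessarily the fix applied for STORM-3767: the class `AssignmentRaceSketch`, its methods, and the use of a list of executor names as the "assignment" are hypothetical stand-ins. Only the name `idToAssignment` and the topology id come from the report. The sketch assumes the map simply has no entry for a topology until the first assignment is written, so an unguarded lookup dereferences null, while a guarded one treats the missing entry as "nothing assigned yet".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the nimbus state; none of these names are Storm APIs.
class AssignmentRaceSketch {
    // Plays the role of idToAssignment: topology id -> assigned executor names.
    static final Map<String, List<String>> idToAssignment = new ConcurrentHashMap<>();

    // Unguarded lookup: throws NullPointerException while the entry is missing.
    static int executorCountUnsafe(String topoId) {
        List<String> assignment = idToAssignment.get(topoId); // null before the first write
        return assignment.size();
    }

    // Guarded lookup: a missing entry is reported as zero executors.
    static int executorCountSafe(String topoId) {
        List<String> assignment = idToAssignment.get(topoId);
        return assignment == null ? 0 : assignment.size();
    }

    public static void main(String[] args) {
        String topoId = "topo1-52-1619191499";

        // The topology is scheduled, but the backend has not been updated yet.
        System.out.println("safe count before update: " + executorCountSafe(topoId));
        try {
            executorCountUnsafe(topoId);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw an NPE");
        }

        // The scheduling loop finishes and the assignment is finally stored.
        idToAssignment.put(topoId, Arrays.asList("exec-1", "exec-2"));
        System.out.println("unsafe count after update: " + executorCountUnsafe(topoId));
    }
}
```

Whether a caller should return an empty result, retry, or report "not yet assigned" for the missing entry is a design decision the sketch does not settle.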
> This can be reproduced easily by adding a sleep anywhere between
> {code:title=Nimbus.java}
> Map<String, SchedulerAssignment> newSchedulerAssignments =
>     computeNewSchedulerAssignments(existingAssignments, topologies, bases, scratchTopoId);
> {code}
> and
> {code:title=Nimbus.java}
> state.setAssignment(topoId, assignment, td.getConf());
> {code}
> then submitting a new topology and visiting its component UI.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)