[ https://issues.apache.org/jira/browse/SLIDER-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Billie Rinaldi updated SLIDER-1154: ----------------------------------- Attachment: SLIDER-1154.1.patch [~gsaha], here is my analysis and a patch. Another pair of eyes would be appreciated. h6. slider-core/src/main/java/org/apache/slider/server/appmaster/SliderAppMaster.java {code} 1594 if (amRegistrationData == null) { {code} {code} 1614 asyncRMClient.unregisterApplicationMaster(appStatus, appMessage, null); {code} These occur in a synchronized method that cleans up after AM stop has been initiated. These lines do not rely on appState, so additional appState lock does not need to be held. {code} 1258 agentOpsUrl = 1259 "https://" + appMasterHostname + ":" + agentWebApp.getSecuredPort(); 1260 agentStatusUrl = 1261 "https://" + appMasterHostname + ":" + agentWebApp.getPort(); {code} The private startAgentWebApp method containing these lines is only called when appState lock is held. h6. slider-core/src/main/java/org/apache/slider/server/appmaster/state/AppState.java {code} 496 return instanceDefinitionSnapshot; {code} This get method should be made synchronized so partial instance definition updates are not seen. {code} 480 return appConfSnapshot; 484 return internalsSnapshot; 476 return resourcesSnapshot; 500 return unresolvedInstanceDefinition; 387 return failedContainers; {code} The appConfSnapshot, internalsSnapshot, resourcesSnapshot, unresolvedInstanceDefinition, and failedContainers appear to be modified atomically, so these getters do not need to be synchronized. h6. slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java {code} 283 failedRecently = 0; {code} I'll synchronize this method. h6. slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeInstance.java {code} 255 for (NodeEntry entry : nodeEntries) { {code} This is in a "toFullString" method that is only used for logging, so it doesn't need to be synchronized. {code} 180 return !nodeState.isUnusable(); {code} This is changed atomically and doesn't need to be synchronized (NodeState is an enum). h6. slider-core/src/main/java/org/apache/slider/server/appmaster/state/OutstandingRequest.java {code} 168 return escalated; {code} I'll synchronize this method. {code} 172 return mayEscalate; 164 return escalationTimeoutMillis; {code} These getters don't need to be synchronized. One is only used in a test, the other is unused. h6. slider-core/src/main/java/org/apache/slider/server/appmaster/state/RoleHistory.java {code} 991 return outstandingRequests.listPlacedRequests(); 1000 return outstandingRequests.listOpenRequests(); {code} These methods are VisibleForTesting. {code} 210 if (roleCountInSource != roleSize) { {code} This is in a private method only called from a synchronized method. {code} 1008 return outstandingRequests.escalateOutstandingRequests(now()); {code} I'll synchronize this method. {code} 333 NodeInstance nodeInstance = nodemap.get(hostname); {code} It looks like this getNodeInformation method is only for displaying node information to the user through REST/IPC endpoints. Also the javadoc says "It's OK to be incomplete," which is probably why it wasn't synchronized in the first place. Not sure if we should add synchronization or not. {code} 1015 return outstandingRequests.cancelOutstandingAARequests(); {code} The method outstandingRequests.cancelOutstandingAARequests has synchronization which should be sufficient. > Code issues - 21 concurrent data access violations > -------------------------------------------------- > > Key: SLIDER-1154 > URL: https://issues.apache.org/jira/browse/SLIDER-1154 > Project: Slider > Issue Type: Bug > Components: other > Affects Versions: Slider 0.91 > Reporter: Gour Saha > Fix For: Slider 1.0.0 > > Attachments: SLIDER-1154.1.patch > > > Following 21 possible concurrent data access violations needs to be evaluated > and fixed. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/SliderAppMaster.java > {code} > 1594 if (amRegistrationData == null) { > {code} > missing_lock: Accessing amRegistrationData without holding lock > SliderAppMaster.appState. Elsewhere, > "org.apache.slider.server.appmaster.SliderAppMaster.amRegistrationData" is > accessed with SliderAppMaster.appState held 5 out of 6 times. > {code} > 1614 asyncRMClient.unregisterApplicationMaster(appStatus, appMessage, > null); > {code} > missing_lock: Accessing asyncRMClient without holding lock > SliderAppMaster.appState. Elsewhere, > "org.apache.slider.server.appmaster.SliderAppMaster.asyncRMClient" is > accessed with SliderAppMaster.appState held 5 out of 7 times. > {code} > 1258 agentOpsUrl = > 1259 "https://" + appMasterHostname + ":" + > agentWebApp.getSecuredPort(); > 1260 agentStatusUrl = > 1261 "https://" + appMasterHostname + ":" + agentWebApp.getPort(); > {code} > missing_lock: Accessing appMasterHostname without holding lock > SliderAppMaster.appState. Elsewhere, > "org.apache.slider.server.appmaster.SliderAppMaster.appMasterHostname" is > accessed with SliderAppMaster.appState held 8 out of 10 times. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/state/AppState.java > {code} > 496 return instanceDefinitionSnapshot; > {code} > missing_lock: Accessing instanceDefinitionSnapshot without holding lock > AppState.this. Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.instanceDefinitionSnapshot" > is accessed with AppState.this held 2 out of 3 times. > {code} > 480 return appConfSnapshot; > {code} > missing_lock: Accessing appConfSnapshot without holding lock AppState.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.appConfSnapshot" is > accessed with AppState.this held 2 out of 3 times. > {code} > 484 return internalsSnapshot; > {code} > missing_lock: Accessing internalsSnapshot without holding lock AppState.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.internalsSnapshot" is > accessed with AppState.this held 2 out of 3 times. > {code} > 476 return resourcesSnapshot; > {code} > missing_lock: Accessing resourcesSnapshot without holding lock AppState.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.resourcesSnapshot" is > accessed with AppState.this held 2 out of 3 times. > {code} > 500 return unresolvedInstanceDefinition; > {code} > missing_lock: Accessing unresolvedInstanceDefinition without holding lock > AppState.this. Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.unresolvedInstanceDefinition" > is accessed with AppState.this held 2 out of 3 times. > {code} > 387 return failedContainers; > {code} > missing_lock: Accessing failedContainers without holding lock AppState.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.AppState.failedContainers" is > accessed with AppState.this held 2 out of 3 times. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java > {code} > 283 failedRecently = 0; > {code} > missing_lock: Accessing failedRecently without holding lock NodeEntry.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.NodeEntry.failedRecently" is > accessed with NodeEntry.this held 4 out of 5 times. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeInstance.java > {code} > 255 for (NodeEntry entry : nodeEntries) { > {code} > missing_lock: Accessing nodeEntries without holding lock NodeInstance.this. > Elsewhere, > "org.apache.slider.server.appmaster.state.NodeInstance.nodeEntries" is > accessed with NodeInstance.this held 8 out of 9 times. > {code} > 180 return !nodeState.isUnusable(); > {code} > missing_lock: Accessing nodeState without holding lock NodeInstance.this. > Elsewhere, "org.apache.slider.server.appmaster.state.NodeInstance.nodeState" > is accessed with NodeInstance.this held 4 out of 5 times. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/state/OutstandingRequest.java > {code} > 168 return escalated; > {code} > missing_lock: Accessing escalated without holding lock > OutstandingRequest.this. Elsewhere, > "org.apache.slider.server.appmaster.state.OutstandingRequest.escalated" is > accessed with OutstandingRequest.this held 5 out of 6 times. > {code} > 172 return mayEscalate; > {code} > missing_lock: Accessing mayEscalate without holding lock > OutstandingRequest.this. Elsewhere, > "org.apache.slider.server.appmaster.state.OutstandingRequest.mayEscalate" is > accessed with OutstandingRequest.this held 4 out of 5 times. > {code} > 164 return escalationTimeoutMillis; > {code} > missing_lock: Accessing escalationTimeoutMillis without holding lock > OutstandingRequest.this. Elsewhere, > "org.apache.slider.server.appmaster.state.OutstandingRequest.escalationTimeoutMillis" > is accessed with OutstandingRequest.this held 2 out of 3 times. > h6. > slider-core/src/main/java/org/apache/slider/server/appmaster/state/RoleHistory.java > {code} > 991 return outstandingRequests.listPlacedRequests(); > {code} > missing_lock: Accessing outstandingRequests without holding lock > RoleHistory.this. Elsewhere, > "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is > accessed with RoleHistory.this held 10 out of 14 times. > {code} > 1000 return outstandingRequests.listOpenRequests(); > {code} > missing_lock: Accessing outstandingRequests without holding lock > RoleHistory.this. Elsewhere, > "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is > accessed with RoleHistory.this held 10 out of 14 times. > {code} > 210 if (roleCountInSource != roleSize) { > {code} > missing_lock: Accessing roleSize without holding lock RoleHistory.this. > Elsewhere, "org.apache.slider.server.appmaster.state.RoleHistory.roleSize" is > accessed with RoleHistory.this held 5 out of 7 times. > {code} > 1008 return outstandingRequests.escalateOutstandingRequests(now()); > {code} > missing_lock: Accessing outstandingRequests without holding lock > RoleHistory.this. Elsewhere, > "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is > accessed with RoleHistory.this held 10 out of 14 times. > {code} > 333 NodeInstance nodeInstance = nodemap.get(hostname); > {code} > missing_lock: Accessing nodemap without holding lock RoleHistory.this. > Elsewhere, "org.apache.slider.server.appmaster.state.RoleHistory.nodemap" is > accessed with RoleHistory.this held 16 out of 17 times. > {code} > 1015 return outstandingRequests.cancelOutstandingAARequests(); > {code} > missing_lock: Accessing outstandingRequests without holding lock > RoleHistory.this. Elsewhere, > "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is > accessed with RoleHistory.this held 10 out of 14 times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)