[ 
https://issues.apache.org/jira/browse/SLIDER-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated SLIDER-1154:
-----------------------------------
    Attachment: SLIDER-1154.1.patch

[~gsaha], here is my analysis and a patch. Another pair of eyes would be 
appreciated.

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/SliderAppMaster.java
{code}
 1594    if (amRegistrationData == null) {
{code}
{code}
 1614      asyncRMClient.unregisterApplicationMaster(appStatus, appMessage, 
null);
{code}
These occur in a synchronized method that cleans up after AM stop has been 
initiated. These lines do not rely on appState, so additional appState lock 
does not need to be held.

{code}
1258    agentOpsUrl =
1259        "https://"; + appMasterHostname + ":" + agentWebApp.getSecuredPort();
1260    agentStatusUrl =
1261        "https://"; + appMasterHostname + ":" + agentWebApp.getPort();
{code}
The private startAgentWebApp method containing these lines is only called when 
appState lock is held.

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/state/AppState.java
{code}
496    return instanceDefinitionSnapshot;
{code}
This get method should be made synchronized so partial instance definition 
updates are not seen.

{code}
480    return appConfSnapshot;
484    return internalsSnapshot;
476    return resourcesSnapshot;
500    return unresolvedInstanceDefinition;
387    return failedContainers;
{code}
The appConfSnapshot, internalsSnapshot, resourcesSnapshot, 
unresolvedInstanceDefinition, and failedContainers appear to be modified 
atomically, so these getters do not need to be synchronized.

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java
{code}
283    failedRecently = 0;
{code}
I'll synchronize this method.

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeInstance.java
{code}
255    for (NodeEntry entry : nodeEntries) {
{code}
This is in a "toFullString" method that is only used for logging, so it doesn't 
need to be synchronized.

{code}
180    return !nodeState.isUnusable();
{code}
This is changed atomically and doesn't need to be synchronized (NodeState is an 
enum).

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/state/OutstandingRequest.java
{code}
168    return escalated;
{code}
I'll synchronize this method.

{code}
172    return mayEscalate;
164    return escalationTimeoutMillis;
{code}
These getters don't need to be synchronized. One is only used in a test, the 
other is unused.

h6. 
slider-core/src/main/java/org/apache/slider/server/appmaster/state/RoleHistory.java
{code}
 991    return outstandingRequests.listPlacedRequests();
1000    return outstandingRequests.listOpenRequests();
{code}
These methods are VisibleForTesting.

{code}
210    if (roleCountInSource != roleSize) {
{code}
This is in a private method only called from a synchronized method.

{code}
1008    return outstandingRequests.escalateOutstandingRequests(now());
{code}
I'll synchronize this method.

{code}
333    NodeInstance nodeInstance = nodemap.get(hostname);
{code}
It looks like this getNodeInformation method is only for displaying node 
information to the user through REST/IPC endpoints. Also the javadoc says "It's 
OK to be incomplete," which is probably why it wasn't synchronized in the first 
place. Not sure if we should add synchronization or not.

{code}
1015    return outstandingRequests.cancelOutstandingAARequests();
{code}
The method outstandingRequests.cancelOutstandingAARequests has synchronization 
which should be sufficient.

> Code issues - 21 concurrent data access violations
> --------------------------------------------------
>
>                 Key: SLIDER-1154
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1154
>             Project: Slider
>          Issue Type: Bug
>          Components: other
>    Affects Versions: Slider 0.91
>            Reporter: Gour Saha
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1154.1.patch
>
>
> Following 21 possible concurrent data access violations needs to be evaluated 
> and fixed.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/SliderAppMaster.java
> {code}
> 1594    if (amRegistrationData == null) {
> {code}
> missing_lock: Accessing amRegistrationData without holding lock 
> SliderAppMaster.appState. Elsewhere, 
> "org.apache.slider.server.appmaster.SliderAppMaster.amRegistrationData" is 
> accessed with SliderAppMaster.appState held 5 out of 6 times.
> {code}
> 1614      asyncRMClient.unregisterApplicationMaster(appStatus, appMessage, 
> null);
> {code}
> missing_lock: Accessing asyncRMClient without holding lock 
> SliderAppMaster.appState. Elsewhere, 
> "org.apache.slider.server.appmaster.SliderAppMaster.asyncRMClient" is 
> accessed with SliderAppMaster.appState held 5 out of 7 times.
> {code}
> 1258    agentOpsUrl =
> 1259        "https://"; + appMasterHostname + ":" + 
> agentWebApp.getSecuredPort();
> 1260    agentStatusUrl =
> 1261        "https://"; + appMasterHostname + ":" + agentWebApp.getPort();
> {code}
> missing_lock: Accessing appMasterHostname without holding lock 
> SliderAppMaster.appState. Elsewhere, 
> "org.apache.slider.server.appmaster.SliderAppMaster.appMasterHostname" is 
> accessed with SliderAppMaster.appState held 8 out of 10 times.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/state/AppState.java
> {code}
> 496    return instanceDefinitionSnapshot;
> {code}
> missing_lock: Accessing instanceDefinitionSnapshot without holding lock 
> AppState.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.instanceDefinitionSnapshot"
>  is accessed with AppState.this held 2 out of 3 times.
> {code}
> 480    return appConfSnapshot;
> {code}
> missing_lock: Accessing appConfSnapshot without holding lock AppState.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.appConfSnapshot" is 
> accessed with AppState.this held 2 out of 3 times.
> {code}
> 484    return internalsSnapshot;
> {code}
> missing_lock: Accessing internalsSnapshot without holding lock AppState.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.internalsSnapshot" is 
> accessed with AppState.this held 2 out of 3 times.
> {code}
> 476    return resourcesSnapshot;
> {code}
> missing_lock: Accessing resourcesSnapshot without holding lock AppState.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.resourcesSnapshot" is 
> accessed with AppState.this held 2 out of 3 times.
> {code}
> 500    return unresolvedInstanceDefinition;
> {code}
> missing_lock: Accessing unresolvedInstanceDefinition without holding lock 
> AppState.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.unresolvedInstanceDefinition"
>  is accessed with AppState.this held 2 out of 3 times.
> {code}
> 387    return failedContainers;
> {code}
> missing_lock: Accessing failedContainers without holding lock AppState.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.AppState.failedContainers" is 
> accessed with AppState.this held 2 out of 3 times.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java
> {code}
> 283    failedRecently = 0;
> {code}
> missing_lock: Accessing failedRecently without holding lock NodeEntry.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.NodeEntry.failedRecently" is 
> accessed with NodeEntry.this held 4 out of 5 times.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeInstance.java
> {code}
> 255    for (NodeEntry entry : nodeEntries) {
> {code}
> missing_lock: Accessing nodeEntries without holding lock NodeInstance.this. 
> Elsewhere, 
> "org.apache.slider.server.appmaster.state.NodeInstance.nodeEntries" is 
> accessed with NodeInstance.this held 8 out of 9 times.
> {code}
> 180    return !nodeState.isUnusable();
> {code}
> missing_lock: Accessing nodeState without holding lock NodeInstance.this. 
> Elsewhere, "org.apache.slider.server.appmaster.state.NodeInstance.nodeState" 
> is accessed with NodeInstance.this held 4 out of 5 times.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/state/OutstandingRequest.java
> {code}
> 168    return escalated;
> {code}
> missing_lock: Accessing escalated without holding lock 
> OutstandingRequest.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.OutstandingRequest.escalated" is 
> accessed with OutstandingRequest.this held 5 out of 6 times.
> {code}
> 172    return mayEscalate;
> {code}
> missing_lock: Accessing mayEscalate without holding lock 
> OutstandingRequest.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.OutstandingRequest.mayEscalate" is 
> accessed with OutstandingRequest.this held 4 out of 5 times.
> {code}
> 164    return escalationTimeoutMillis;
> {code}
> missing_lock: Accessing escalationTimeoutMillis without holding lock 
> OutstandingRequest.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.OutstandingRequest.escalationTimeoutMillis"
>  is accessed with OutstandingRequest.this held 2 out of 3 times.
> h6. 
> slider-core/src/main/java/org/apache/slider/server/appmaster/state/RoleHistory.java
> {code}
>  991    return outstandingRequests.listPlacedRequests();
> {code}
> missing_lock: Accessing outstandingRequests without holding lock 
> RoleHistory.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is 
> accessed with RoleHistory.this held 10 out of 14 times.
> {code}
> 1000    return outstandingRequests.listOpenRequests();
> {code}
> missing_lock: Accessing outstandingRequests without holding lock 
> RoleHistory.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is 
> accessed with RoleHistory.this held 10 out of 14 times.
> {code}
> 210    if (roleCountInSource != roleSize) {
> {code}
> missing_lock: Accessing roleSize without holding lock RoleHistory.this. 
> Elsewhere, "org.apache.slider.server.appmaster.state.RoleHistory.roleSize" is 
> accessed with RoleHistory.this held 5 out of 7 times.
> {code}
> 1008    return outstandingRequests.escalateOutstandingRequests(now());
> {code}
> missing_lock: Accessing outstandingRequests without holding lock 
> RoleHistory.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is 
> accessed with RoleHistory.this held 10 out of 14 times.
> {code}
> 333    NodeInstance nodeInstance = nodemap.get(hostname);
> {code}
> missing_lock: Accessing nodemap without holding lock RoleHistory.this. 
> Elsewhere, "org.apache.slider.server.appmaster.state.RoleHistory.nodemap" is 
> accessed with RoleHistory.this held 16 out of 17 times.
> {code}
> 1015    return outstandingRequests.cancelOutstandingAARequests();
> {code}
> missing_lock: Accessing outstandingRequests without holding lock 
> RoleHistory.this. Elsewhere, 
> "org.apache.slider.server.appmaster.state.RoleHistory.outstandingRequests" is 
> accessed with RoleHistory.this held 10 out of 14 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to