[ 
https://issues.apache.org/jira/browse/HADOOP-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032618#comment-15032618
 ] 

Colin Patrick McCabe commented on HADOOP-12593:
-----------------------------------------------

[~gliptak], as the comment you pasted states, "there is only a single writer to 
thread-local StatisticsData objects."  A single writer cannot race with itself, 
as [~sjlee0] commented.

[~sjlee0] wrote:
bq. I'd be the first to admit that the way it is written is not very clear or 
intentional, but t appears correct. The only exception to the single-writer 
rule is Statistics.rootData. However, that is also currently protected by the 
Statistics instance lock. Again, that could be made more explicit (e.g. 
sometimes synchronization is done in a couple of layers above in the call 
chain).

Hmm.  The declaration of {{rootData}} has a comment saying that it is protected 
by the Statistics lock.  There's also a comment near {{reset}} talking about 
the locking, commenting that "Both reads and writes to rootData are done under 
the lock, so we're free to modify rootData from any thread that holds the 
lock".  Are other places we could add comments to make it clearer?  To be 
honest, I think the locking here is better documented than most places in our 
code base.

I did a quick scan of our uses of {{volatile long}}, and the only place I can 
find that might be a problem is here:
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java}}
 might have a "lost update" here:
{code}
    void benchmarkOne() throws IOException {
      for(int idx = 0; idx < opsPerThread; idx++) {
        if((localNumOpsExecuted+1) % statsOp.ugcRefreshCount == 0)
          refreshUserMappingsProto.refreshUserToGroupsMappings();
        long stat = statsOp.executeOp(daemonId, idx, arg1);
        localNumOpsExecuted++;
        localCumulativeTime += stat;  <=== should be using AtomicLong
      }
    }
{code}

{{localCumulativeTime}} should be an AtomicLong to prevent lost updates.

Perhaps we should change this JIRA to be about fixing this issue.  Or maybe we 
should close this and file the {{localCumulativeTime}} issue as a follow-on, to 
make it less confusing.  [~steve_l], what do you think?

> multiple "volatile long" field declarations exist in the Hadoop codebase
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-12593
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12593
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> If you get your IDE to scan for "volatile long", you find 20-30 entries. 
> Volatile operations on `long` variables are not guaranteed to be atomic, so 
> these usages can be vulnerable to race conditions generating invalid data.
> they need to be replaced by AtomicLong references, except in the specific 
> case that you want performance values for statistics, and are prepared to 
> take the risk



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to