Github user zd-project commented on a diff in the pull request:
https://github.com/apache/storm/pull/2764#discussion_r209048016
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -1984,11 +2074,13 @@ private int fragmentedCpu() {
         Cluster cluster = new Cluster(inimbus, supervisors, topoToSchedAssignment, topologies, conf);
         cluster.setStatusMap(idToSchedStatus.get());
-        long beforeSchedule = System.currentTimeMillis();
+        schedulingStartTime.set(Time.nanoTime());
         scheduler.schedule(topologies, cluster);
-        long scheduleTimeElapsedMs = System.currentTimeMillis() - beforeSchedule;
-        LOG.debug("Scheduling took {} ms for {} topologies", scheduleTimeElapsedMs, topologies.getTopologies().size());
-        scheduleTopologyTimeMs.update(scheduleTimeElapsedMs);
+        //Will compiler optimize the order of evalutation and cause race condition?
--- End diff ---
If no code reordering happens, the gauge should always evaluate `currTime` first, and it has to read the start time from `schedulingStartTimeNs` before that field is set back to null. So as long as we guarantee that `elapsed` is evaluated after that, longest-scheduling-time-ms will never exceed the real longest scheduling time.
That being said, I think the race here should be pretty negligible, especially since we discard the fractional part in the ns-to-ms conversion. Meanwhile, I have heard complaints about hanging schedulers before, so I say we keep the partial measurement and remove the comment, or just note that "it is normal to see minor jitter in the longest scheduling time due to the race condition."
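For what it's worth, here is a minimal sketch of that ordering argument (the field and method names below are my own assumptions for illustration, not the actual code in this PR): because the gauge reads the wall clock before it reads the start timestamp, a concurrent reset to null can only make the reported in-progress time smaller, never larger than the real elapsed time.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

class SchedulingTimerSketch {
    // Null while no scheduling round is in progress.
    private final AtomicReference<Long> schedulingStartTimeNs = new AtomicReference<>(null);
    private final AtomicLong longestSchedulingTimeMs = new AtomicLong(0);

    void onScheduleStart() {
        schedulingStartTimeNs.set(System.nanoTime());
    }

    void onScheduleEnd(long elapsedMs) {
        schedulingStartTimeNs.set(null);
        longestSchedulingTimeMs.accumulateAndGet(elapsedMs, Math::max);
    }

    // Gauge callback: take currTime BEFORE reading the start timestamp, so a
    // concurrent reset to null between the two reads can only shrink the
    // reported in-progress time, never inflate it past the real value.
    long longestSchedulingTimeGaugeMs() {
        long currTimeNs = System.nanoTime();
        Long startNs = schedulingStartTimeNs.get();
        long inProgressMs = (startNs == null) ? 0 : (currTimeNs - startNs) / 1_000_000;
        return Math.max(longestSchedulingTimeMs.get(), inProgressMs);
    }
}
```

Under this ordering the worst case is a slight undercount for an in-progress round, which is the "minor jitter" mentioned above.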
---