Github user zd-project commented on a diff in the pull request:
https://github.com/apache/storm/pull/2764#discussion_r209048016
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -1984,11 +2074,13 @@ private int fragmentedCpu() {
         Cluster cluster = new Cluster(inimbus, supervisors, topoToSchedAssignment, topologies, conf);
         cluster.setStatusMap(idToSchedStatus.get());
-        long beforeSchedule = System.currentTimeMillis();
+        schedulingStartTime.set(Time.nanoTime());
         scheduler.schedule(topologies, cluster);
-        long scheduleTimeElapsedMs = System.currentTimeMillis() - beforeSchedule;
-        LOG.debug("Scheduling took {} ms for {} topologies", scheduleTimeElapsedMs, topologies.getTopologies().size());
-        scheduleTopologyTimeMs.update(scheduleTimeElapsedMs);
+        //Will compiler optimize the order of evalutation and cause race condition?
--- End diff ---
If no code reordering happens, the gauge should always evaluate `currTime` first, and it has to read the start time from `schedulingStartTimeNs` before that field is set back to null. So as long as we guarantee that `elapsed` is evaluated after that, longest-scheduling-time-ms will never exceed the real longest scheduling time.
That being said, I think the race here should be pretty negligible, especially since we discard the fractional part in the ns-to-ms conversion. Meanwhile, I have heard complaints about hanging schedulers before, so I say we keep the partial measurement and remove the comment, or just note that "it is normal to see minor jitter in the longest scheduling time due to the race condition."
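For what it's worth, here is a minimal sketch of that ordering argument (the field and method names below are my own assumptions for illustration, not the actual code in this PR): because the gauge reads the wall clock before it reads the start timestamp, a concurrent reset to null can only make the reported in-progress time smaller, never larger than the real elapsed time.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

class SchedulingTimerSketch {
    // Null while no scheduling round is in progress.
    private final AtomicReference<Long> schedulingStartTimeNs = new AtomicReference<>(null);
    private final AtomicLong longestSchedulingTimeMs = new AtomicLong(0);

    void onScheduleStart() {
        schedulingStartTimeNs.set(System.nanoTime());
    }

    void onScheduleEnd(long elapsedMs) {
        schedulingStartTimeNs.set(null);
        longestSchedulingTimeMs.accumulateAndGet(elapsedMs, Math::max);
    }

    // Gauge callback: take currTime BEFORE reading the start timestamp, so a
    // concurrent reset to null between the two reads can only shrink the
    // reported in-progress time, never inflate it past the real value.
    long longestSchedulingTimeGaugeMs() {
        long currTimeNs = System.nanoTime();
        Long startNs = schedulingStartTimeNs.get();
        long inProgressMs = (startNs == null) ? 0 : (currTimeNs - startNs) / 1_000_000;
        return Math.max(longestSchedulingTimeMs.get(), inProgressMs);
    }
}
```

Under this ordering the worst case is a slight undercount for an in-progress round, which is the "minor jitter" mentioned above.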
---