priyeshkaratha commented on PR #10564:
URL: https://github.com/apache/ozone/pull/10564#issuecomment-4764847203

   > > Fixes a datanode metrics lifecycle issue where VolumeInfoMetrics 
remained registered after failVolume(), which could keep triggering 
MethodMetric -> getCommitted and flood logs with NPEs in failed-volume 
scenarios.
   > 
   > I believe this has been fixed in 
[ec2634d](https://github.com/apache/ozone/commit/ec2634d8d25bc8163c7c48fa869fc8bd584f0a6d).
   > 
   > Can you please explain how `HddsVolume.committedBytes` can be `null` for 
failed volume in current code?
   
   You are right, `committedBytes` won't be null in the current code. My idea 
is to unregister VolumeInfoMetrics in failVolume().
   Currently the VolumeInfoMetrics source stays in the metrics registry after 
failure. The timer keeps calling registry.snapshot(), which in turn calls 
getCommitted(), getContainers(), getVolumeState(), etc. on the failed volume 
indefinitely, until shutdown() is eventually called.
   
   Today those methods are safe (they return values from AtomicLong, 
ConcurrentSkipListSet, or enum state), but the flooding mechanism is still 
structurally active. Any future change that makes one of those @Metric methods 
throw on a failed volume would immediately reproduce the same log-flood pattern.
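
A minimal sketch of the idea, to make the lifecycle concrete. It does not use the real Ozone or Hadoop metrics2 classes: `ToyMetricsRegistry` and `ToyVolume` are hypothetical stand-ins, and only the names `failVolume()`, `shutdown()`, `getCommitted()` and `snapshot()` are borrowed from the discussion above. The assumption is simply that the source is removed from the registry when the volume fails, so the periodic snapshot no longer polls it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for a metrics registry; NOT the Hadoop metrics2 API.
final class ToyMetricsRegistry {
  private final Map<String, Supplier<Long>> sources = new ConcurrentHashMap<>();

  void register(String name, Supplier<Long> source) {
    sources.put(name, source);
  }

  // Removing an absent name is a no-op, so calling this twice is harmless.
  void unregister(String name) {
    sources.remove(name);
  }

  boolean isRegistered(String name) {
    return sources.containsKey(name);
  }

  // Mimics the periodic timer: polls every registered source once
  // and returns how many sources were polled.
  int snapshot() {
    int polled = 0;
    for (Supplier<Long> source : sources.values()) {
      source.get();
      polled++;
    }
    return polled;
  }
}

// Hypothetical stand-in for a volume that owns one metrics source.
final class ToyVolume {
  private final String metricsName;
  private final ToyMetricsRegistry registry;
  private final AtomicLong committedBytes = new AtomicLong();
  private volatile boolean failed;

  ToyVolume(String id, ToyMetricsRegistry registry) {
    this.metricsName = "VolumeInfoMetrics-" + id; // illustrative name only
    this.registry = registry;
    registry.register(metricsName, this::getCommitted);
  }

  long getCommitted() {
    return committedBytes.get();
  }

  boolean isFailed() {
    return failed;
  }

  // The proposed change: drop the metrics source as soon as the volume fails.
  void failVolume() {
    failed = true;
    registry.unregister(metricsName);
  }

  // shutdown() still unregisters, and stays safe after failVolume().
  void shutdown() {
    registry.unregister(metricsName);
  }
}
```

With this shape, a snapshot taken after `failVolume()` never reaches the failed volume's getters, so a getter that later starts throwing on a failed volume cannot be hit by the timer.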


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]