github-actions[bot] commented on code in PR #64347:
URL: https://github.com/apache/doris/pull/64347#discussion_r3389553590


##########
be/src/common/cgroup_memory_ctl.cpp:
##########
@@ -107,15 +109,31 @@ struct CgroupsV2Reader : CGroupMemoryCtl::ICgroupsReader {
         std::unordered_map<std::string, int64_t> metrics_map;
         CGroupUtil::read_int_metric_from_cgroup_file((_mount_file_dir / 
"memory.stat"),
                                                      metrics_map);
-        if (*value < metrics_map["inactive_file"]) {
-            return Status::CgroupError("CgroupsV2Reader read_memory_usage 
negative memory usage");
+        int64_t inactive_file =
+                metrics_map.contains("inactive_file") ? 
metrics_map["inactive_file"] : 0;
+        int64_t active_file = metrics_map.contains("active_file") ? 
metrics_map["active_file"] : 0;
+        int64_t slab_reclaimable =
+                metrics_map.contains("slab_reclaimable") ? 
metrics_map["slab_reclaimable"] : 0;
+        if (inactive_file < 1 || active_file < 1 || slab_reclaimable < 1) {

Review Comment:
   These counters can legitimately be zero; that is different from the metric 
being absent. For example, a cgroup with only anonymous memory and no 
file-cache pages reports `inactive_file 0` and `active_file 0`, but 
`refresh_proc_meminfo()` calls this path every 
`memory_maintenance_sleep_time_ms` (50 ms by default), so this warning would be 
emitted continuously. Please check key presence separately from the value, and 
rate-limit any warning for truly missing metrics.



##########
be/src/common/cgroup_memory_ctl.cpp:
##########
@@ -107,15 +109,31 @@ struct CgroupsV2Reader : CGroupMemoryCtl::ICgroupsReader {
         std::unordered_map<std::string, int64_t> metrics_map;
         CGroupUtil::read_int_metric_from_cgroup_file((_mount_file_dir / 
"memory.stat"),
                                                      metrics_map);
-        if (*value < metrics_map["inactive_file"]) {
-            return Status::CgroupError("CgroupsV2Reader read_memory_usage 
negative memory usage");
+        int64_t inactive_file =
+                metrics_map.contains("inactive_file") ? 
metrics_map["inactive_file"] : 0;
+        int64_t active_file = metrics_map.contains("active_file") ? 
metrics_map["active_file"] : 0;
+        int64_t slab_reclaimable =
+                metrics_map.contains("slab_reclaimable") ? 
metrics_map["slab_reclaimable"] : 0;
+        if (inactive_file < 1 || active_file < 1 || slab_reclaimable < 1) {
+            // In this scenario, not return error, ignore it and print log.
+            LOG(WARNING) << "CgroupsV2Reader read_memory_usage missing 
expected metrics in "
+                            "memory.stat, inactive_file: "
+                         << inactive_file << ", active_file: " << active_file
+                         << ", slab_reclaimable: " << slab_reclaimable;
         }
-        // the reason why we subtract inactive_file described here:
+
+        const int64_t reclaimable_usage = inactive_file + active_file + 
slab_reclaimable;
+        if (*value < reclaimable_usage) {

Review Comment:
   `memory.current` and `memory.stat` are read in separate syscalls, so this 
sum is not an atomic snapshot. With concurrent page-cache growth, 
`memory.current` can be from before the growth while `active_file + 
inactive_file + slab_reclaimable` is from after it; returning `CgroupError` 
makes `MemInfo::refresh_proc_meminfo()` drop the cgroup refresh state and 
temporarily recompute physical/available memory from host `/proc/meminfo` 
instead of the cgroup limit. Please handle the race by clamping/retrying rather 
than treating it as a fatal invariant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to