On 21 June 2016 at 14:13, "Yannis Aribaud" <b...@d6bell.net> wrote:
> Hi everyone,
>
> I recently hit this bug in the kernel using a vanilla 4.6.2 release.
> It seems that somewhere in the load average calculation a division by 0
> occurs (see the stack trace at the end).
>
> [snipped]
>
> I'm not an expert at all but I suspect that is the issue's origin. Shouldn't
> the function cfs_rq_load_avg use an atomic_long_read() to avoid this?
After digging a bit more, this can't be the problem, as this function
obviously can't return a negative value.

I found that it may instead come from the update_cfs_rq_load_avg function,
in the following block:

    if (atomic_long_read(&cfs_rq->removed_load_avg)) {
        s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
        sa->load_avg = max_t(long, sa->load_avg - r, 0);
        sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
        removed_load = 1;
    }

The max_t(long, sa->load_avg - r, 0) can result in a negative value being
kept by max_t, because the long would wrap around, and this would then
generate a division by zero in the task_h_load function.

Best regards,
--
Yannis Aribaud
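
For illustration, one way to avoid any wraparound question in a subtraction like the one quoted above is to compare before subtracting, so that no intermediate value can go below zero. This is only a minimal userspace sketch under that assumption: sub_clamped and self_check are hypothetical names, not kernel functions, and this is not claimed to be how the kernel handles it.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): subtract `removed` from
 * `avg`, clamping the result at zero. The comparison happens before the
 * subtraction, so the unsigned arithmetic can never wrap around.
 */
static unsigned long sub_clamped(unsigned long avg, unsigned long removed)
{
    return avg > removed ? avg - removed : 0;
}

/* Small self-check that doubles as a usage example. */
static void self_check(void)
{
    assert(sub_clamped(100, 30) == 70);  /* normal case */
    assert(sub_clamped(30, 100) == 0);   /* would underflow, clamped */
    assert(sub_clamped(30, 30) == 0);    /* exact removal */
}
```

The design choice is simply that both operands keep the same unsigned type throughout, so the result does not depend on integer conversion rules or on the width of long.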