On 2023-07-25 09:02, Niels Möller wrote:

Curious. I would suspect some environmental factors (e.g., high processor
temperature sometimes leading to some frequency throttling, or something
else on the machine periodically competing for cpu).

Thanks for responding.
That machine runs completely alone.
It just completed 908 rounds of the 11121 rounds.
The pattern changed a bit; once there was an upward phase before it went back:
https://stamm-wilbrandt.de/en/forum/908.png

Each of the 908 rounds on the x-axis takes nearly 10 minutes, so the plateaus seen last a long time. I am on that machine only briefly, to fetch the latest data or so.
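
As a rough cross-check of these numbers, the round counts can be converted into days. This is only a sketch: it assumes exactly 10 minutes per round (the text says "nearly 10 minutes", so the results are slight overestimates), and the function name is made up for illustration.

```python
# Rough wall-clock estimate for the computation.
# Assumption: exactly 10 minutes per round (the post says "nearly 10"),
# so both results are upper bounds.
MINUTES_PER_ROUND = 10   # assumption, see above
ROUNDS_DONE = 908        # from the post
ROUNDS_TOTAL = 11121     # from the post


def rounds_to_days(rounds, minutes_per_round=MINUTES_PER_ROUND):
    """Convert a number of rounds into days of computation."""
    return rounds * minutes_per_round / (60 * 24)


print(f"done:  {rounds_to_days(ROUNDS_DONE):.1f} days")
print(f"total: {rounds_to_days(ROUNDS_TOTAL):.1f} days")
```

Under that assumption, 908 rounds come to about 6.3 days and all 11121 rounds to about 77 days.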

Because of the Ryzen 7000 series burnout risk I applied the latest stable 1.24 BIOS before that run. The 7600X CPU's maximum operating temperature is 95°C, and I operated it with PBO at 85°C before the BIOS upgrade. But after the upgrade I cannot set it above 75°C anymore, and the maximum temperature measured during a stress test on all threads was 76°C. So high temperature cannot be the cause, and neither can throttling.

I did "perf stat process" before starting the run (now >6 days) and saw that the CPU frequency was always in the 5.2..5.4GHz range (5.3GHz is the nominal burst CPU frequency).


Here is the current state of the PC:

hermann@7600x:~/9383761-digit-prime$ uptime
 21:46:24 up 6 days,  4:12,  1 user,  load average: 1.00, 1.00, 1.00
hermann@7600x:~/9383761-digit-prime$
hermann@7600x:~/9383761-digit-prime$ ps -ef | grep job
root      629465  629454 99 Jul22 ?        4-04:00:15 ./job 297
hermann  1947666 1942377  0 21:46 pts/0    00:00:00 grep --color=auto job
hermann@7600x:~/9383761-digit-prime$

I had to do a hotfix after 2 days of computation to avoid a bug that was guaranteed to happen after 75 days, details here:
https://github.com/Hermann-SW/9383761-digit-prime#hotfix

Total non-computing time caused by the hotfix tool was less than 10 seconds.


Maybe it would be helpful to monitor performance related measurements
such as cpu frequency (or cycle counter), temperature, and count of
cache misses as computation progresses?

AMD is not good at temperature measurement support for Linux, but I found an article describing a kernel driver change that should fix this. I do not want to risk the current >70 day computation though, so on the Linux side I am blind temperature-wise at the moment (all tools report wrong values because of the bad kernel support).

But I just learned that I can attach perf to a running process, and did that: twice for 5 seconds and once for 15 seconds. It always showed 5.23GHz, near the nominal 5.3GHz burst frequency.

How do you want me to count cache misses, with perf?
Over a longer timeframe I think, a multiple of 10min/round?
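
One possible shape for such a measurement, as a hedged sketch only: the helper below builds a perf stat command line that attaches to the running process for a whole number of rounds. The helper name and the 600 seconds per round are assumptions for illustration. `cache-references` and `cache-misses` are perf's generic hardware event names; what they actually count on this CPU is not confirmed here and should be checked with `perf list`.

```python
def perf_cache_argv(pid, rounds=1, seconds_per_round=600):
    """Build a perf stat command line that counts cache events for one process.

    Assumptions: about 600 s per round ("nearly 10 minutes"), and that the
    generic events cache-references and cache-misses are available on the CPU.
    """
    events = "cache-references,cache-misses,cycles,task-clock"
    duration = rounds * seconds_per_round
    # Same pattern as the runs in this mail: attach with -p, and let
    # "sleep" bound the measurement time.
    return ["sudo", "perf", "stat", "-e", events,
            "-p", str(pid), "sleep", str(duration)]


# Print the command for one round of the running job (PID from the ps output).
print(" ".join(perf_cache_argv(629465)))
```

The printed command can be pasted into the shell as is; passing `rounds=2` or more covers several rounds in one measurement.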


hermann@7600x:~/9383761-digit-prime$ ps -ef | grep job
root      629465  629454 99 Jul22 ?        4-04:08:14 ./job 297
hermann  1949425 1942377  0 21:54 pts/0    00:00:00 grep --color=auto job
hermann@7600x:~/9383761-digit-prime$ echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
[sudo] password for hermann:
1
hermann@7600x:~/9383761-digit-prime$ sudo perf stat -e cycles,task-clock -p 629465 sleep 5

 Performance counter stats for process id '629465':

    26,173,724,427      cycles                    #    5.234 GHz
          5,000.98 msec task-clock                #    1.000 CPUs utilized

       5.001249395 seconds time elapsed

hermann@7600x:~/9383761-digit-prime$ sudo perf stat -e cycles,task-clock -p 629465 sleep 5

 Performance counter stats for process id '629465':

    26,156,801,154      cycles                    #    5.231 GHz
          5,000.58 msec task-clock                #    1.000 CPUs utilized

       5.000851191 seconds time elapsed

hermann@7600x:~/9383761-digit-prime$ sudo perf stat -e cycles,task-clock -p 629465 sleep 15

 Performance counter stats for process id '629465':

    78,483,835,106      cycles                    #    5.232 GHz
         14,999.67 msec task-clock                #    1.000 CPUs utilized

      15.000559563 seconds time elapsed

hermann@7600x:~/9383761-digit-prime$
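
The GHz column above is just cycles divided by task-clock time. A small sketch to recompute it from the perf text, assuming the comma-separated number format shown in these runs (the function names are made up for illustration):

```python
import re

# First measurement from this mail, pasted as plain text.
SAMPLE = """
    26,173,724,427      cycles                    #    5.234 GHz
          5,000.98 msec task-clock                #    1.000 CPUs utilized
"""


def parse_perf_stat(text):
    """Return (cycles, task_clock_msec) found in perf stat output text."""
    cycles = re.search(r"([\d,]+)\s+cycles", text)
    clock = re.search(r"([\d,.]+)\s+msec\s+task-clock", text)
    if not cycles or not clock:
        raise ValueError("cycles or task-clock line not found")
    return (int(cycles.group(1).replace(",", "")),
            float(clock.group(1).replace(",", "")))


def frequency_ghz(cycles, task_clock_msec):
    """Average clock frequency in GHz: cycles per second of task-clock time."""
    return cycles / (task_clock_msec / 1000.0) / 1e9


cycles, msec = parse_perf_stat(SAMPLE)
print(f"{frequency_ghz(cycles, msec):.3f} GHz")
```

For the three runs this reproduces 5.234, 5.231 and 5.232 GHz, so the frequency stayed essentially constant over the measured intervals.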


Regards,

Hermann.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
