Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)

2023-04-10 Thread Xiyue Deng


Xiyue Deng  writes:

> Xiyue Deng  writes:
>
>> Xiyue Deng  writes:
>>
>>> So after some more tries, it looks like this issue is not directly
>>> related to memory usage.  I've tried the following:
>>>
>>> * Using the older kernel version from when I was on Bullseye.
>>> * Having a cronjob drop the memory caches every minute.
>>> * Using GNOME on Wayland (the default) or on Xorg.
>>>
>>> And this can still happen when I am running a qemu-based Win11 VM using
>>> virt-manager.  So this rules out the possibility of a kernel issue
>>> or an OOM killer issue.  All that is certain is that this issue can be
>>> reproduced by running my qemu-based Win11 VM, which triggers the
>>> lockup within a few hours.
>>>
>>> As this system has been running Bullseye for a few years with zero
>>> problems, I'm hopeful this should work for Bookworm as well.  If you have
>>> anything in mind that may be worth a try, please feel free to share.  The
>>> more ideas the better.
>>>
>>> Thanks in advance!
>>
>> So, to rule out possible software issues, I've done clean installs of
>> Bookworm and Bullseye, and this issue still happens.  I guess this
>> largely lowers the possibility of a software cause.  I've also run a
>> 10-hour memtest session and it passed, so I guess the memory has been
>> shown to be fine as well.
>>
>> For the next step, I'll look at the hardware side.  I want to thank the
>> various people from the #debian{,-next} IRC channels for their help,
>> suggestions, and brainstorming!  Will try to get to the bottom of this.
>>
>
> Actually, I decided to contact the customer service for my box[1], and
> after a few rounds of suggestions (reset CMOS, reinstall system, etc.),
> they provided a BIOS update that is supposed to fix Windows 10/11
> freezing when accessing the fTPM module.  After flashing the new BIOS,
> I've been running the system under high load for 12+ hours without issue.
> Though a much longer testing period is needed to make sure the fix is
> sufficient, I think this is looking very promising!  Will report back
> after a week.
>
> Hope this is useful for anyone having similar issues.

It has been over a week since I applied the BIOS update to my Minisforum
Elitemini HX90[1], and apart from one manual reboot my system has been
running totally fine!  So I'd consider this issue resolved.  In case you
are using a similar system from the same vendor and experiencing similar
system freezes, please contact their customer support for a similar BIOS
update.

I'd like to thank the wonderful people at #debian{,-next} on IRC again
for their help and suggestions during the debugging!
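
For anyone comparing firmware versions before and after such an update,
here is a minimal Python sketch that reads the BIOS details Linux exposes
under /sys/class/dmi/id.  The function name `read_dmi` and the selection of
fields are illustrative assumptions, not something provided by the vendor or
used in this thread.

```python
from pathlib import Path

# sysfs directory where the kernel exposes DMI/SMBIOS identification strings
DMI_DIR = Path("/sys/class/dmi/id")

# Attribute files to read; a missing or unreadable file is reported as None
FIELDS = ("sys_vendor", "product_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(dmi_dir=DMI_DIR):
    """Return a dict of DMI fields (hypothetical helper for illustration)."""
    info = {}
    for name in FIELDS:
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


# Usage: print the fields, e.g. once before and once after flashing
# for key, value in read_dmi().items():
#     print(f"{key}: {value}")
```

Saving the output before flashing makes it easy to confirm afterwards that
the new firmware is really the one in use.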

>
> [1] https://store.minisforum.com/products/hx90
>
>>>
>>> (Replies to Timothy below inline.)
>>>
>>> Timothy M Butterworth  writes:
>>>
>>>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng  wrote:
>>>>
>>>>  Timothy M Butterworth  writes:
>>>>
>>>>  > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng  wrote:
>>>>  >
>>>>  >  Hi,
>>>>  >
>>>>  >  I have an AMD64 system[1] that has been running fine on Bullseye for a
>>>>  >  few years, and recently following the soft freeze on Bookworm I upgraded
>>>>  >  my system to try it out, and the system has been frequently becoming
>>>>  >  unresponsive.  Initially I thought it was because of some issue with my
>>>>  >  qemu-based Win11 virtual machine, as it happens most frequently when it
>>>>  >  is running, and filed a bug report[2].  But then it happened again
>>>>  >  without it running because some other program had slowly used up most of
>>>>  >  the memory again, though not as frequently as when the VM was running.
>>>>  >
>>>>  >  Now in retrospect, when I was using Bullseye the total memory was also
>>>>  >  mostly used up most of the time, with a few hundred megabytes
>>>>  >  reported as free and a few gigabytes reported as cache, and it had been
>>>>  >  running fine.  I'm not sure what has changed in Bookworm, and having to
>>>>  >  manually restart the machine is a pretty annoying and unpleasant
>>>>  >  experience.
>>>>  >
>>>>  >  Is anyone seeing a similar problem as well?  What can I do to avoid
>>>>  >  this?  Any suggestion is welcome.
>>>>  >
>>>>  >  Thanks in advance.
>>>>  >
>>>>  > Open the command prompt and run `su` to switch user to root. Then run
>>>>  > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write
>>>>  > RAM caches to the hard drive to free up memory. You have to run this
>>>>  > as root, as sudo, my preferred method, returns a permission denied
>>>>  > error.
>>>>
>>>>  Thanks for the tip!  I'll try it out.
>>>
>>> So unfortunately this doesn't help either, as it happens again with very
>>> low cache usage.
>>>
>>> `free -h`:
>>>
>>>                total        used        free      shared  buff/cache   available
>>> Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
>>> Swap:          979Mi          0B       979Mi
>>>
>>> `top` excerpt:
>>>
>>> top - 14:55:05 up 18 min, 11 users,  load average: 1.77, 1.65, 1.09
>>> Tasks: 504 

Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)

2023-03-30 Thread Xiyue Deng


Xiyue Deng  writes:

> Xiyue Deng  writes:
>
>> So after some more tries, it looks like this issue is not directly
>> related to memory usage.  I've tried the following:
>>
>> * Using the older kernel version from when I was on Bullseye.
>> * Having a cronjob drop the memory caches every minute.
>> * Using GNOME on Wayland (the default) or on Xorg.
>>
>> And this can still happen when I am running a qemu-based Win11 VM using
>> virt-manager.  So this rules out the possibility of a kernel issue
>> or an OOM killer issue.  All that is certain is that this issue can be
>> reproduced by running my qemu-based Win11 VM, which triggers the
>> lockup within a few hours.
>>
>> As this system has been running Bullseye for a few years with zero
>> problems, I'm hopeful this should work for Bookworm as well.  If you have
>> anything in mind that may be worth a try, please feel free to share.  The
>> more ideas the better.
>>
>> Thanks in advance!
>
> So, to rule out possible software issues, I've done clean installs of
> Bookworm and Bullseye, and this issue still happens.  I guess this
> largely lowers the possibility of a software cause.  I've also run a
> 10-hour memtest session and it passed, so I guess the memory has been
> shown to be fine as well.
>
> For the next step, I'll look at the hardware side.  I want to thank the
> various people from the #debian{,-next} IRC channels for their help,
> suggestions, and brainstorming!  Will try to get to the bottom of this.
>

Actually, I decided to contact the customer service for my box[1], and
after a few rounds of suggestions (reset CMOS, reinstall system, etc.),
they provided a BIOS update that is supposed to fix Windows 10/11
freezing when accessing the fTPM module.  After flashing the new BIOS,
I've been running the system under high load for 12+ hours without issue.
Though a much longer testing period is needed to make sure the fix is
sufficient, I think this is looking very promising!  Will report back
after a week.

Hope this is useful for anyone having similar issues.

[1] https://store.minisforum.com/products/hx90

>>
>> (Replies to Timothy below inline.)
>>
>> Timothy M Butterworth  writes:
>>
>>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng  wrote:
>>>
>>>  Timothy M Butterworth  writes:
>>>
>>>  > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng  wrote:
>>>  >
>>>  >  Hi,
>>>  >
>>>  >  I have an AMD64 system[1] that has been running fine on Bullseye for a
>>>  >  few years, and recently following the soft freeze on Bookworm I upgraded
>>>  >  my system to try it out, and the system has been frequently becoming
>>>  >  unresponsive.  Initially I thought it was because of some issue with my
>>>  >  qemu-based Win11 virtual machine, as it happens most frequently when it
>>>  >  is running, and filed a bug report[2].  But then it happened again
>>>  >  without it running because some other program had slowly used up most of
>>>  >  the memory again, though not as frequently as when the VM was running.
>>>  >
>>>  >  Now in retrospect, when I was using Bullseye the total memory was also
>>>  >  mostly used up most of the time, with a few hundred megabytes
>>>  >  reported as free and a few gigabytes reported as cache, and it had been
>>>  >  running fine.  I'm not sure what has changed in Bookworm, and having to
>>>  >  manually restart the machine is a pretty annoying and unpleasant
>>>  >  experience.
>>>  >
>>>  >  Is anyone seeing a similar problem as well?  What can I do to avoid
>>>  >  this?  Any suggestion is welcome.
>>>  >
>>>  >  Thanks in advance.
>>>  >
>>>  > Open the command prompt and run `su` to switch user to root. Then run
>>>  > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write
>>>  > RAM caches to the hard drive to free up memory. You have to run this
>>>  > as root, as sudo, my preferred method, returns a permission denied
>>>  > error.
>>>
>>>  Thanks for the tip!  I'll try it out.
>>
>> So unfortunately this doesn't help either, as it happens again with very
>> low cache usage.
>>
>> `free -h`:
>>
>>                total        used        free      shared  buff/cache   available
>> Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
>> Swap:          979Mi          0B       979Mi
>>
>> `top` excerpt:
>>
>> top - 14:55:05 up 18 min, 11 users,  load average: 1.77, 1.65, 1.09
>> Tasks: 504 total,   1 running, 503 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 68.8 id,  0.0 wa,  0.0 hi,  6.2 si,  0.0 st
>> MiB Mem :  31519.9 total,  16972.6 free,  13759.0 used,   1447.6 buff/cache
>> MiB Swap:    980.0 total,    980.0 free,      0.0 used.  17760.8 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>    8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
>>    5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
>>    5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65

Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)

2023-03-28 Thread Xiyue Deng


Xiyue Deng  writes:

> So after some more tries, it looks like this issue is not directly
> related to memory usage.  I've tried the following:
>
> * Using the older kernel version from when I was on Bullseye.
> * Having a cronjob drop the memory caches every minute.
> * Using GNOME on Wayland (the default) or on Xorg.
>
> And this can still happen when I am running a qemu-based Win11 VM using
> virt-manager.  So this rules out the possibility of a kernel issue
> or an OOM killer issue.  All that is certain is that this issue can be
> reproduced by running my qemu-based Win11 VM, which triggers the
> lockup within a few hours.
>
> As this system has been running Bullseye for a few years with zero
> problems, I'm hopeful this should work for Bookworm as well.  If you have
> anything in mind that may be worth a try, please feel free to share.  The
> more ideas the better.
>
> Thanks in advance!

So, to rule out possible software issues, I've done clean installs of
Bookworm and Bullseye, and this issue still happens.  I guess this
largely lowers the possibility of a software cause.  I've also run a
10-hour memtest session and it passed, so I guess the memory has been
shown to be fine as well.

For the next step, I'll look at the hardware side.  I want to thank the
various people from the #debian{,-next} IRC channels for their help,
suggestions, and brainstorming!  Will try to get to the bottom of this.

>
> (Replies to Timothy below inline.)
>
> Timothy M Butterworth  writes:
>
>> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng  wrote:
>>
>>  Timothy M Butterworth  writes:
>>
>>  > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng  wrote:
>>  >
>>  >  Hi,
>>  >
>>  >  I have an AMD64 system[1] that has been running fine on Bullseye for a
>>  >  few years, and recently following the soft freeze on Bookworm I upgraded
>>  >  my system to try it out, and the system has been frequently becoming
>>  >  unresponsive.  Initially I thought it was because of some issue with my
>>  >  qemu-based Win11 virtual machine, as it happens most frequently when it
>>  >  is running, and filed a bug report[2].  But then it happened again
>>  >  without it running because some other program had slowly used up most of
>>  >  the memory again, though not as frequently as when the VM was running.
>>  >
>>  >  Now in retrospect, when I was using Bullseye the total memory was also
>>  >  mostly used up most of the time, with a few hundred megabytes
>>  >  reported as free and a few gigabytes reported as cache, and it had been
>>  >  running fine.  I'm not sure what has changed in Bookworm, and having to
>>  >  manually restart the machine is a pretty annoying and unpleasant
>>  >  experience.
>>  >
>>  >  Is anyone seeing a similar problem as well?  What can I do to avoid
>>  >  this?  Any suggestion is welcome.
>>  >
>>  >  Thanks in advance.
>>  >
>>  > Open the command prompt and run `su` to switch user to root. Then run
>>  > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write
>>  > RAM caches to the hard drive to free up memory. You have to run this
>>  > as root, as sudo, my preferred method, returns a permission denied
>>  > error.
>>
>>  Thanks for the tip!  I'll try it out.
>
> So unfortunately this doesn't help either, as it happens again with very
> low cache usage.
>
> `free -h`:
>
>                total        used        free      shared  buff/cache   available
> Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
> Swap:          979Mi          0B       979Mi
>
> `top` excerpt:
>
> top - 14:55:05 up 18 min, 11 users,  load average: 1.77, 1.65, 1.09
> Tasks: 504 total,   1 running, 503 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 68.8 id,  0.0 wa,  0.0 hi,  6.2 si,  0.0 st
> MiB Mem :  31519.9 total,  16972.6 free,  13759.0 used,   1447.6 buff/cache
> MiB Swap:    980.0 total,    980.0 free,      0.0 used.  17760.8 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>    8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
>    5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
>    5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65 gnome-s+
> ...
>
>>
>>  >  
>>  >  
>>  >  [1] System info from inxi:
>>  >  CPU: 8-core AMD Ryzen 9 5900HX with Radeon Graphics (-MT MCP-)
>>  >  speed/min/max: 1199/1200/4679 MHz Kernel: 6.1.0-5-amd64 x86_64 Up: 7m
>>  >  Mem: 4844.4/31521.3 MiB (15.4%) Storage: 476.94 GiB (54.5% used) Procs: 535
>>  >  Shell: Bash inxi: 3.3.25
>>  >
>>  > Your system has 32 GB of RAM, it should not be getting used up. Run
>>  > `free -h`. What desktop are you using: KDE, GNOME, LXQT etc? Are you
>>  > using Wayland or X11? It looks like you have a memory leak in one of
>>  > your applications. Try running `top` and press `M` (Shift+m) to sort
>>  > by memory utilization.
>>
>>  I actually have a cronjob that runs every 5 minutes and collects memory
>>  

Re: Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)

2023-03-14 Thread Anssi Saari
Xiyue Deng  writes:

> As this system has been running Bullseye for a few years with zero
> problems, I'm hopeful this should work for Bookworm as well.  If you have
> anything in mind that may be worth a try, please feel free to share.  The
> more ideas the better.

To me the interesting question is, does the problem disappear if you go
back to Bullseye? If not then it's likely a hardware problem.



Bookworm system randomly not responding (was Re: Bookworm system not responding on high memory usage)

2023-03-12 Thread Xiyue Deng
So after some more tries, it looks like this issue is not directly
related to memory usage.  I've tried the following:

* Using the older kernel version from when I was on Bullseye.
* Having a cronjob drop the memory caches every minute.
* Using GNOME on Wayland (the default) or on Xorg.

And this can still happen when I am running a qemu-based Win11 VM using
virt-manager.  So this rules out the possibility of a kernel issue
or an OOM killer issue.  All that is certain is that this issue can be
reproduced by running my qemu-based Win11 VM, which triggers the
lockup within a few hours.

As this system has been running Bullseye for a few years with zero
problems, I'm hopeful this should work for Bookworm as well.  If you have
anything in mind that may be worth a try, please feel free to share.  The
more ideas the better.
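
The cron job in the list above is essentially a scheduled version of the
`sync && echo 1 > /proc/sys/vm/drop_caches` tip quoted further down.  A
minimal Python sketch of the same operation follows, assuming it runs as
root.  The function name `drop_caches` and its `path` parameter are
illustrative assumptions; the parameter exists only so the function can be
tried against an ordinary file.

```python
import os

# Kernel control file; writing 1, 2 or 3 asks the kernel to drop clean caches
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(level=1, path=DROP_CACHES):
    """Flush dirty pages to disk, then drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real control file requires root privileges.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # same role as `sync` in the shell one-liner
    with open(path, "w") as control:
        control.write(f"{level}\n")


# Usage (as root, for example from root's crontab):
# drop_caches()
```

Dropping caches only discards data the kernel could reclaim on its own
anyway, which fits the observation here that it made no difference to the
lockups.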

Thanks in advance!

(Replies to Timothy below inline.)

Timothy M Butterworth  writes:

> On Sat, Mar 11, 2023 at 3:30 AM Xiyue Deng  wrote:
>
>  Timothy M Butterworth  writes:
>
>  > On Fri, Mar 10, 2023 at 7:57 PM Xiyue Deng  wrote:
>  >
>  >  Hi,
>  >
>  >  I have an AMD64 system[1] that has been running fine on Bullseye for a
>  >  few years, and recently following the soft freeze on Bookworm I upgraded
>  >  my system to try it out, and the system has been frequently becoming
>  >  unresponsive.  Initially I thought it was because of some issue with my
>  >  qemu-based Win11 virtual machine, as it happens most frequently when it
>  >  is running, and filed a bug report[2].  But then it happened again
>  >  without it running because some other program had slowly used up most of
>  >  the memory again, though not as frequently as when the VM was running.
>  >
>  >  Now in retrospect, when I was using Bullseye the total memory was also
>  >  mostly used up most of the time, with a few hundred megabytes
>  >  reported as free and a few gigabytes reported as cache, and it had been
>  >  running fine.  I'm not sure what has changed in Bookworm, and having to
>  >  manually restart the machine is a pretty annoying and unpleasant
>  >  experience.
>  >
>  >  Is anyone seeing a similar problem as well?  What can I do to avoid
>  >  this?  Any suggestion is welcome.
>  >
>  >  Thanks in advance.
>  >
>  > Open the command prompt and run `su` to switch user to root. Then run
>  > `sync && echo 1 > /proc/sys/vm/drop_caches` as root. This will write
>  > RAM caches to the hard drive to free up memory. You have to run this
>  > as root, as sudo, my preferred method, returns a permission denied
>  > error.
>
>  Thanks for the tip!  I'll try it out.

So unfortunately this doesn't help either, as it happens again with very
low cache usage.

`free -h`:

               total        used        free      shared  buff/cache   available
Mem:            30Gi        13Gi        16Gi       206Mi       1.4Gi        17Gi
Swap:          979Mi          0B       979Mi

`top` excerpt:

top - 14:55:05 up 18 min, 11 users,  load average: 1.77, 1.65, 1.09
Tasks: 504 total,   1 running, 503 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 68.8 id,  0.0 wa,  0.0 hi,  6.2 si,  0.0 st 
MiB Mem :  31519.9 total,  16972.6 free,  13759.0 used,   1447.6 buff/cache 
MiB Swap:    980.0 total,    980.0 free,      0.0 used.  17760.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   8886 libvirt+  20   0   11.1g   8.1g  26580 S  87.5  26.4  17:38.47 qemu-sy+
   5434 xiyueden  20   0 4047004   1.2g 170036 S   0.0   4.0   0:41.00 thunder+
   5143 xiyueden  20   0 7056664 526296 191152 S   0.0   1.6   2:19.65 gnome-s+
...
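
The figures that `free` prints are derived from /proc/meminfo, so periodic
logging of the kind discussed in this thread can read that file directly.
Below is a minimal Python sketch.  The function names are hypothetical and
the sample values are made up for illustration; they are not taken from the
affected machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; HugePages_* counters are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize_gib(fields):
    """Convert the headline kB figures to GiB, rounded to one decimal."""
    kib_per_gib = 1024 * 1024
    keys = ("MemTotal", "MemFree", "MemAvailable")
    return {k: round(fields[k] / kib_per_gib, 1) for k in keys if k in fields}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Made-up sample in the same layout as /proc/meminfo
SAMPLE = """MemTotal:       33554432 kB
MemFree:        16777216 kB
MemAvailable:   17825792 kB
HugePages_Total:       0
"""

# Usage:
# print(summarize_gib(parse_meminfo(SAMPLE)))
# print(summarize_gib(read_meminfo()))
```

Logging MemAvailable alongside MemFree matters here, because a low "free"
figure with a large cache does not by itself mean the system is short of
memory.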

>
>  >  
>  >  
>  >  [1] System info from inxi:
>  >  CPU: 8-core AMD Ryzen 9 5900HX with Radeon Graphics (-MT MCP-)
>  >  speed/min/max: 1199/1200/4679 MHz Kernel: 6.1.0-5-amd64 x86_64 Up: 7m
>  >  Mem: 4844.4/31521.3 MiB (15.4%) Storage: 476.94 GiB (54.5% used) Procs: 535
>  >  Shell: Bash inxi: 3.3.25
>  >
>  > Your system has 32 GB of RAM, it should not be getting used up. Run
>  > `free -h`. What desktop are you using: KDE, GNOME, LXQT etc? Are you
>  > using Wayland or X11? It looks like you have a memory leak in one of
>  > your applications. Try running `top` and press `M` (Shift+m) to sort
>  > by memory utilization.
>
>  I actually have a cronjob that runs every 5 minutes and collects memory
>  usage.  As I mentioned, it usually happens when I use qemu (see [1] for
>  free and [2] for top).  Another time it happened when deluge was
>  leaking memory (see [3] for free and [4] for top).
>
>  Interestingly, as you can see, in all such cases, even though the free
>  amount is low, the buff/cache is still pretty large, so the system is not
>  really overloaded.  Plus, on Bullseye such memory usage also happened all
>  the time and the freeze never occurred.  I suspected that maybe the
>  kernel was panicking when memory hit a certain limit, but I don't see
>  anything in kern.log or syslog.
>
>  Any suggestions for restoring the Bullseye behavior are appreciated.
>  Thanks in advance!
>
>  [1] `free