Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Sean Kamath



> On Dec 3, 2022, at 09:06, Stuart Henderson  wrote:
> AFAIK the main options available at that point are:
> 
> deadlocks waiting for resources
> detect the problem and randomly kill processes (e.g. linux oom killer)
> detect the problem and panic


I recall a long time ago reading on LKML that if the oom-killer is triggered, 
the recommendation is to reboot as soon as possible.  Many people don’t even 
know it runs until they can’t figure out why some random daemon is not running.

> (in particular a lot
> of software really doesn't behave well when malloc fails)

Thank you Linux for giving us the guaranteed success of malloc(). . .

I hate overcommit.  I get why it’s there, but that doesn’t mean I can’t hate it.

Sean


Re: Xorg freeze with ThinkPad A485 / ATI Radeon Vega

2022-12-03 Thread Joel Carnat

On 03/12/2022 at 21:51, Adriano Barbosa wrote:
> On Sat, Dec 03, 2022 at 06:01:38PM +0100, Joel Carnat wrote:
> > On 02/12/2022 at 10:21, Bodie wrote:
> > > On Fri Dec 2, 2022 at 12:14 AM CET, Joel Carnat wrote:
> > > > Hi,
> > > >
> > > > About once a week, Xorg freezes while I'm using my ThinkPad A485 with
> > > > OpenBSD 7.2. I've tried switching the window manager (XFCE, Gnome,
> > > > WindowMaker, cwm) but it still happens. I only have a few apps open
> > > > (Firefox ESR, a terminal, a file manager). Tonight, I had just rebooted
> > > > the system (because of syspatch and fw_update) and uptime was 1h30. The
> > > > other times, I could suspend/resume a few times until the freeze happened.
> > > >
> > > > Xorg is frozen in the sense that the cursor can still move but can't
> > > > interact with windows. Same for the keyboard: no shortcut works. I can't
> > > > even switch to the console with Ctrl+Alt+F1. I'm stuck with a
> > > > screenshot-like image of what I was doing.
> > > >
> > > > Note that sshd does work. I can remotely connect to the laptop. If I
> > > > restart xenodm/gdm, it just fails. So I have to `reboot`.
> > > >
> > > > dmesg only outputs:
> > > > [drm] *ERROR* ring sdma0 timeout, signaled seq=19512, emitted seq=19512
> > > > [drm] *ERROR* Process information: process  pid 0 thread  pid
> > > >
> > > > I've attached the full dmesg and Xorg logs.
> > > >
> > > > Is there something I can do to debug further?
> > >
> > > What is happening elsewhere, e.g. in top(1), systat(1), vmstat(8)?
> > > Any modifications in /etc/sysctl.conf for more files, connections, ...?
> > > What login class are you in?
> >
> > Oh, it seems nextcloud (Nextcloud Client) is taking a bunch of resources
> > (about 80% CPU in top) and has to be killed from ssh. Then Xorg starts
> > responding again...
> >
> > What is weird is that commands issued over SSH do not suffer from
> > this freeze / slowdown. Only the X environment does.
>
> Did you experience this on older versions?
> In the last update I added a build dependency (x11/gnome/libcloudproviders),
> which is the significant difference besides the upgrade itself, but I have
> no reason to think that is the cause.
> Could you test without this dependency?


The version I'm using is 3.6.1p0.

Cheers,
Joel C.



Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Sven F.
On Sat, Dec 3, 2022 at 12:08 PM Stuart Henderson
 wrote:
>
> On 2022-12-03, Sven F.  wrote:
> > Bit sad the kernel stopped working though.
>
> AFAIK the main options available at that point are:
>
> deadlocks waiting for resources
> detect the problem and randomly kill processes (e.g. linux oom killer)
> detect the problem and panic

I was picturing the ideal case: keep enough resources for critical processes like
 * shell
 * sshd

>
> There isn't really a lot else it could do, it has already done things
> like reduce buffer cache by this point (ok not 100% of cache in the top
> output you show, but a fair bit of it).
>
> Actually I was wrong with "Tweaking login.conf won't help"; you could
> reduce the max datasize to something that fits, to protect the machine.
> While this won't help actually run the software (in particular a lot
> of software really doesn't behave well when malloc fails), it might
> help avoid deadlocks.
>

Yes, I am going that route and checking the memory usage of the DB, which is
clearly the problem here.


-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do



Re: Xorg freeze with ThinkPad A485 / ATI Radeon Vega

2022-12-03 Thread Adriano Barbosa
On Sat, Dec 03, 2022 at 06:01:38PM +0100, Joel Carnat wrote:
> On 02/12/2022 at 10:21, Bodie wrote:
> > On Fri Dec 2, 2022 at 12:14 AM CET, Joel Carnat wrote:
> > > Hi,
> > >
> > > About once a week, Xorg freezes while I'm using my ThinkPad A485 with
> > > OpenBSD 7.2. I've tried switching the window manager (XFCE, Gnome,
> > > WindowMaker, cwm) but it still happens. I only have a few apps open
> > > (Firefox ESR, a terminal, a file manager). Tonight, I had just rebooted
> > > the system (because of syspatch and fw_update) and uptime was 1h30. The
> > > other times, I could suspend/resume a few times until the freeze happened.
> > >
> > > Xorg is frozen in the sense that the cursor can still move but can't
> > > interact with windows. Same for the keyboard: no shortcut works. I can't
> > > even switch to the console with Ctrl+Alt+F1. I'm stuck with a
> > > screenshot-like image of what I was doing.
> > >
> > > Note that sshd does work. I can remotely connect to the laptop. If I
> > > restart xenodm/gdm, it just fails. So I have to `reboot`.
> > >
> > > dmesg only outputs:
> > > [drm] *ERROR* ring sdma0 timeout, signaled seq=19512, emitted seq=19512
> > > [drm] *ERROR* Process information: process  pid 0 thread  pid
> > >
> > > I've attached the full dmesg and Xorg logs.
> > >
> > > Is there something I can do to debug further?
> > >
> >
> > What is happening elsewhere, e.g. in top(1), systat(1), vmstat(8)?
> > Any modifications in /etc/sysctl.conf for more files, connections, ...?
> > What login class are you in?
> >
>
> Oh, it seems nextcloud (Nextcloud Client) is taking a bunch of resources
> (about 80% CPU in top) and has to be killed from ssh. Then Xorg starts
> responding again...
>
> What is weird is that commands issued over SSH do not suffer from
> this freeze / slowdown. Only the X environment does.
>

Did you experience this on older versions?
In the last update I added a build dependency (x11/gnome/libcloudproviders),
which is the significant difference besides the upgrade itself, but I have
no reason to think that is the cause.
Could you test without this dependency?

Obrigado!
--
Adriano


Index: Makefile
===================================================================
RCS file: /cvs/ports/net/nextcloudclient/Makefile,v
retrieving revision 1.42
diff -u -p -r1.42 Makefile
--- Makefile	17 Nov 2022 06:16:04 -	1.42
+++ Makefile	3 Dec 2022 20:34:23 -
@@ -32,18 +32,16 @@ WANTLIB += sqlite3 ssl z
 MODULES =		devel/cmake \
 			x11/qt5
 
-BUILD_DEPENDS =		devel/gettext,-tools \
-			x11/gnome/libcloudproviders
+BUILD_DEPENDS =		devel/gettext,-tools
 
 # for converting svg icons to png
 BUILD_DEPENDS +=	x11/gnome/librsvg
 
 # for tests, but detected during configure
-BUILD_DEPENDS +=	devel/cmocka \
+BUILD_DEPENDS +=	devel/cmocka
 
 RUN_DEPENDS =		devel/desktop-file-utils \
 			misc/shared-mime-info \
-			x11/gnome/libcloudproviders \
 			x11/gtk+3,-guic \
 			x11/qt5/qtgraphicaleffects \
 			x11/qt5/qtquickcontrols



Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Stuart Henderson
On 2022-12-03, Sven F.  wrote:
> Bit sad the kernel stopped working though.

AFAIK the main options available at that point are:

deadlocks waiting for resources
detect the problem and randomly kill processes (e.g. linux oom killer)
detect the problem and panic

There isn't really a lot else it could do, it has already done things
like reduce buffer cache by this point (ok not 100% of cache in the top
output you show, but a fair bit of it).

Actually I was wrong with "Tweaking login.conf won't help"; you could
reduce the max datasize to something that fits, to protect the machine.
While this won't help actually run the software (in particular a lot
of software really doesn't behave well when malloc fails), it might
help avoid deadlocks.
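
To illustrate the point above: the datasize capability in login.conf ends up as the process's RLIMIT_DATA resource limit, so a capped process gets an allocation failure instead of dragging the whole machine down. What follows is only a rough sketch in Python, not anything taken from the reporter's setup. The helper name try_allocation, the 64 MiB cap and the 256 MiB request are made up for illustration, and whether the request is actually refused depends on how the platform enforces RLIMIT_DATA (OpenBSD and recent Linux kernels should refuse it; some other systems may not).

```python
import subprocess
import sys

# Code run in a child interpreter, so the lowered limit never affects the parent.
CHILD = r"""
import resource, sys
limit, request = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
# Lower only the soft limit, and never ask for more than the hard limit.
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)
resource.setrlimit(resource.RLIMIT_DATA, (limit, hard))
try:
    buf = bytearray(request)   # one large allocation
    print("granted")
except MemoryError:
    print("refused")           # the allocator said no; report it and carry on
"""

def try_allocation(limit_bytes, request_bytes):
    """Hypothetical helper: report 'granted' or 'refused' for one allocation
    made by a child process running under a data-size cap."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(request_bytes)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

MIB = 1024 * 1024
# Illustrative numbers only: a 64 MiB cap and a 256 MiB request.
outcome = try_allocation(limit_bytes=64 * MIB, request_bytes=256 * MIB)
print(outcome)
```

The child here handles the failure and exits cleanly. Software that ignores a failed allocation will still crash or misbehave under the cap, which is the caveat about malloc failures above.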


-- 
Please keep replies on the mailing list.



Re: Xorg freeze with ThinkPad A485 / ATI Radeon Vega

2022-12-03 Thread Joel Carnat

On 02/12/2022 at 10:21, Bodie wrote:
> On Fri Dec 2, 2022 at 12:14 AM CET, Joel Carnat wrote:
> > Hi,
> >
> > About once a week, Xorg freezes while I'm using my ThinkPad A485 with
> > OpenBSD 7.2. I've tried switching the window manager (XFCE, Gnome,
> > WindowMaker, cwm) but it still happens. I only have a few apps open
> > (Firefox ESR, a terminal, a file manager). Tonight, I had just rebooted
> > the system (because of syspatch and fw_update) and uptime was 1h30. The
> > other times, I could suspend/resume a few times until the freeze happened.
> >
> > Xorg is frozen in the sense that the cursor can still move but can't
> > interact with windows. Same for the keyboard: no shortcut works. I can't
> > even switch to the console with Ctrl+Alt+F1. I'm stuck with a
> > screenshot-like image of what I was doing.
> >
> > Note that sshd does work. I can remotely connect to the laptop. If I
> > restart xenodm/gdm, it just fails. So I have to `reboot`.
> >
> > dmesg only outputs:
> > [drm] *ERROR* ring sdma0 timeout, signaled seq=19512, emitted seq=19512
> > [drm] *ERROR* Process information: process  pid 0 thread  pid
> >
> > I've attached the full dmesg and Xorg logs.
> >
> > Is there something I can do to debug further?
>
> What is happening elsewhere, e.g. in top(1), systat(1), vmstat(8)?
> Any modifications in /etc/sysctl.conf for more files, connections, ...?
> What login class are you in?

Oh, it seems nextcloud (Nextcloud Client) is taking a bunch of resources
(about 80% CPU in top) and has to be killed from ssh. Then Xorg starts
responding again...

What is weird is that commands issued over SSH do not suffer from
this freeze / slowdown. Only the X environment does.




Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Bodie
On Sat Dec 3, 2022 at 2:39 PM CET, Sven F. wrote:
> On Sat, Dec 3, 2022 at 6:44 AM Stuart Henderson 
> wrote:
>
> > On 2022-12-02, Sven F.  wrote:
> > > On Fri, Dec 2, 2022 at 11:33 AM Stuart Henderson
> > > wrote:
> > >>
> > >> On 2022-12-02, Sven F.  wrote:
> > >> > Hello,
> > >> >
> > >> > Main problem is the kernel goes into a loop and never breaks,
> > >> > so no ddb.
> > >> > I have similar setups (same driver and stack), and this one alone
> > >> > is more prone to the error. Even if the virt / qemu driver is partly
> > >> > responsible,
> > >> > the kernel should not loop the `scsi_xfer pool exhausted`
> > >> > message forever; it should maybe fall into ddb after a while or
> > >> > handle this differently.
> > >> >
> > >> > Is there a step I can take to avoid or better document the bug?
> > >> > (I would very much like not to upgrade this one to 7.2 just yet.)
> > >> >
> > >> >  * I had an eye on it:
> > >> >
> > >> > load averages:  5.22,  2.50,  1.74
> > >> > 111 processes: 3 running, 107 idle, 1 on processor
> > >> > CPU states:  0.0% user,  0.0% nice, 34.3% sys,  0.0% spin,  0.0% intr,
> > >> > 65.7% idle
> > >> > Memory: Real: 1101M/1915M act/tot Free: 24K Cache: 96M Swap: 1012M/1012M
> > >>
> > >> You have run out of RAM, don't do that
> > >>
> > >>
> > >
> > > Okay, I will tweak login.conf more, but what ran out of RAM? :'(
> >
> > Your 2GB VM that you're trying to run a database on ran out of RAM.
> >
> > Tweaking login.conf won't help. You either need to add RAM or change
> > something so the software you're running uses less RAM. (You might
> > possibly avoid some hangs by increasing swap space, but running a
> > database in swap really isn't going to work).
> >
> > --
> > Please keep replies on the mailing list.
>
>
>
>
> Thank you. You’re right. I'm currently figuring out how much RAM I need,
> and this makes me like SQL databases even less.
>

If you want something smaller, then maybe SQLite is a better option than
regular MySQL (MariaDB). But even mobile phones have more RAM these days
than you provided to your guest. So it can run, but then it depends on what
you are doing in that database. So e.g. if your queries cache in mem

>
> Bit sad the kernel stopped working though.
>
> >
> >
> > --
> --
> -
> Knowing is not enough; we must apply. Willing is not enough; we must do



Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Sven F.
On Sat, Dec 3, 2022 at 6:44 AM Stuart Henderson 
wrote:

> On 2022-12-02, Sven F.  wrote:
> > On Fri, Dec 2, 2022 at 11:33 AM Stuart Henderson
> > wrote:
> >>
> >> On 2022-12-02, Sven F.  wrote:
> >> > Hello,
> >> >
> >> > Main problem is the kernel goes into a loop and never breaks,
> >> > so no ddb.
> >> > I have similar setups (same driver and stack), and this one alone
> >> > is more prone to the error. Even if the virt / qemu driver is partly
> >> > responsible,
> >> > the kernel should not loop the `scsi_xfer pool exhausted`
> >> > message forever; it should maybe fall into ddb after a while or
> >> > handle this differently.
> >> >
> >> > Is there a step I can take to avoid or better document the bug?
> >> > (I would very much like not to upgrade this one to 7.2 just yet.)
> >> >
> >> >  * I had an eye on it:
> >> >
> >> > load averages:  5.22,  2.50,  1.74
> >> > 111 processes: 3 running, 107 idle, 1 on processor
> >> > CPU states:  0.0% user,  0.0% nice, 34.3% sys,  0.0% spin,  0.0% intr,
> >> > 65.7% idle
> >> > Memory: Real: 1101M/1915M act/tot Free: 24K Cache: 96M Swap: 1012M/1012M
> >>
> >> You have run out of RAM, don't do that
> >>
> >>
> >
> > Okay, I will tweak login.conf more, but what ran out of RAM? :'(
>
> Your 2GB VM that you're trying to run a database on ran out of RAM.
>
> Tweaking login.conf won't help. You either need to add RAM or change
> something so the software you're running uses less RAM. (You might
> possibly avoid some hangs by increasing swap space, but running a
> database in swap really isn't going to work).
>
> --
> Please keep replies on the mailing list.




Thank you. You’re right. I'm currently figuring out how much RAM I need,
and this makes me like SQL databases even less.


Bit sad the kernel stopped working though.

>
>
> --
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


Re: Possible Bug - 7.1 stable - scsi_xfer pool exhausted

2022-12-03 Thread Stuart Henderson
On 2022-12-02, Sven F.  wrote:
> On Fri, Dec 2, 2022 at 11:33 AM Stuart Henderson
> wrote:
>>
>> On 2022-12-02, Sven F.  wrote:
>> > Hello,
>> >
>> > Main problem is the kernel goes into a loop and never breaks,
>> > so no ddb.
>> > I have similar setups (same driver and stack), and this one alone
>> > is more prone to the error. Even if the virt / qemu driver is partly
>> > responsible,
>> > the kernel should not loop the `scsi_xfer pool exhausted`
>> > message forever; it should maybe fall into ddb after a while or
>> > handle this differently.
>> >
>> > Is there a step I can take to avoid or better document the bug?
>> > (I would very much like not to upgrade this one to 7.2 just yet.)
>> >
>> >  * I had an eye on it:
>> >
>> > load averages:  5.22,  2.50,  1.74
>> > 111 processes: 3 running, 107 idle, 1 on processor
>> > CPU states:  0.0% user,  0.0% nice, 34.3% sys,  0.0% spin,  0.0% intr,
>> > 65.7% idle
>> > Memory: Real: 1101M/1915M act/tot Free: 24K Cache: 96M Swap: 1012M/1012M
>>
>> You have run out of RAM, don't do that
>>
>>
>
> Okay, I will tweak login.conf more, but what ran out of RAM? :'(

Your 2GB VM that you're trying to run a database on ran out of RAM.

Tweaking login.conf won't help. You either need to add RAM or change
something so the software you're running uses less RAM. (You might
possibly avoid some hangs by increasing swap space, but running a
database in swap really isn't going to work).
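
The top(1) "Memory:" line quoted earlier in this message already carries the diagnosis: 24K free and swap at 1012M of 1012M. As a rough illustration only, the figures can be pulled out of that line with a few lines of Python. The function name parse_top_memory, the field names and the regular expression are assumptions made for this sketch, not part of any OpenBSD tool, and the pattern only covers the layout shown in this thread.

```python
import re

# Size suffixes as printed in the quoted line; treated here as powers of 1024.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def _to_bytes(text):
    """Convert a string such as '24K' or '1012M' to a byte count."""
    return int(text[:-1]) * _UNITS[text[-1]]

def parse_top_memory(line):
    """Hypothetical helper: split a top(1) 'Memory:' line into byte counts."""
    pattern = (
        r"Real:\s*(\d+[KMG])/(\d+[KMG])\s+act/tot\s+"
        r"Free:\s*(\d+[KMG])\s+Cache:\s*(\d+[KMG])\s+"
        r"Swap:\s*(\d+[KMG])/(\d+[KMG])"
    )
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("unrecognised Memory: line")
    names = ("real_active", "real_total", "free", "cache",
             "swap_used", "swap_total")
    return dict(zip(names, (_to_bytes(g) for g in m.groups())))

# The line from the report, joined back onto one line.
line = ("Memory: Real: 1101M/1915M act/tot Free: 24K "
        "Cache: 96M Swap: 1012M/1012M")
stats = parse_top_memory(line)
swap_full = stats["swap_used"] >= stats["swap_total"]
print(f"free={stats['free']} bytes, swap full={swap_full}")
```

With almost no free memory and swap completely used, there is nothing left for the kernel to hand out, which matches the "ran out of RAM" reading above.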

-- 
Please keep replies on the mailing list.