Bug#1072299: Compositor-related crashes

2024-06-25 Thread Andres Salomon

On 6/25/24 00:23, Daniel Richard G. wrote:
> On Thu, 2024 Jun 13 15:07-04:00, Andres Salomon wrote:
>>
>> Oops, sorry, I didn't see this email. Let me finish getting v126 out the
>> door, and then circle back if the crash is still present there.
>
> I still see the problem with .114, alas. Tabs crash, sometimes the whole
> browser goes. I often leave my home PC with Chromium up in the morning,
> only to find it gone in the evening.
>
> While stopping on SIGILL has not caught the crashes leading to the
> "invalid opcode" syslog error, GDB does occasionally stop at a location
> indicating an out of memory condition. Here is a typical backtrace:
>
>  (gdb) bt
>  #0  0x5572f6c3dc4d in partition_alloc::internal::OnNoMemoryInternal(unsigned long) ()
>  #1  0x5572f6c3dc59 in partition_alloc::TerminateBecauseOutOfMemory(unsigned long) ()
>  #2  0x5572f31a33cf in gpu::ClientSharedImageInterface::CreateSharedImage(gpu::SharedImageInfo const&) ()
>  #3  0x5572f82ba0bd in cc::BitmapRasterBufferProvider::AcquireBufferForRaster(cc::ResourcePool::InUsePoolResource const&, unsigned long, unsigned long, bool, bool, bool) ()
>  #4  0x5572f8225d8f in cc::TileManager::CreateRasterTask(cc::PrioritizedTile const&, cc::TargetColorParams const&, cc::TileManager::PrioritizedWorkToSchedule*) ()
>  #5  0x5572f82237f4 in cc::TileManager::AssignGpuMemoryToTiles() ()
>  [...]




That backtrace makes sense; it's running OnNoMemoryInternal(), which 
calls PA_IMMEDIATE_CRASH(), which generates an invalid opcode customized 
for various architectures (ironically to provide a better backtrace 
compared to calling abort() or something).


It's a maze of #ifdefs, but it appears that on x86, it calls 'int3' 
(which should emit SIGTRAP), followed by 'ud2' (undefined instruction). 
You can probably get gdb to catch the SIGTRAP, but that honestly doesn't 
help diagnose _why_ you're running out of memory.
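
For reference, a minimal sketch of how a process killed by either signal looks to its parent. It does not execute 'int3' or 'ud2'; as a stand-in, a child Python process sends itself the signal with os.kill(). The helper name death_signal is made up for this example, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

def death_signal(signum):
    """Run a child that sends itself `signum`; return the name of the
    signal that terminated it, as seen by the parent."""
    # Stand-in for a real trap instruction: the child signals itself.
    child_code = f"import os; os.kill(os.getpid(), {int(signum)})"
    proc = subprocess.run([sys.executable, "-c", child_code])
    # On POSIX, subprocess reports death-by-signal as a negative returncode.
    if proc.returncode >= 0:
        raise RuntimeError("child exited normally instead of being signalled")
    return signal.Signals(-proc.returncode).name

if __name__ == "__main__":
    for sig in (signal.SIGTRAP, signal.SIGILL):
        print(sig.name, "->", death_signal(sig))
```

Whichever of the two signals is delivered first is the one the parent (or a debugger) gets to see, which is why it may matter whether gdb is told to stop on SIGTRAP as well as SIGILL.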


I don't know how chromium handles GPU memory, but I wonder if the issue 
is that GPU memory can't be swapped, and the partition allocator is just 
grabbing way too much of it and wasting it?


Try changing the following in
base/allocator/partition_allocator/src/partition_alloc/partition_alloc_constants.h,
around line 144:


constexpr size_t kMaxPartitionPagesPerRegularSlotSpan = 4;

Change that value to be 3 or 2, and see if that helps. Keep in mind by 
doing this you're trading memory for speed, so memory allocations might 
be a bit slower.
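
To make the knob a little more concrete, here is a toy model in Python. It is not Chromium's actual slot-span selection code: it assumes 16 KiB partition pages and simply picks, for a given slot size, the span length (up to the maximum) with the lowest fraction of unusable bytes. The names PARTITION_PAGE_SIZE and pick_span_pages are invented for the sketch.

```python
from fractions import Fraction

# Assumption for this sketch: 16 KiB partition pages. The real value is
# platform-dependent and defined in the PartitionAlloc sources.
PARTITION_PAGE_SIZE = 16 * 1024

def pick_span_pages(slot_size, max_pages):
    """Return (pages, wasted_bytes) for the span length in 1..max_pages
    that leaves the smallest fraction of the span unusable."""
    best = None
    for pages in range(1, max_pages + 1):
        span = pages * PARTITION_PAGE_SIZE
        if span < slot_size:
            continue  # span too small to hold even one slot
        waste = span % slot_size  # bytes left over after whole slots
        ratio = Fraction(waste, span)
        # Strict comparison keeps the smallest span on ties.
        if best is None or ratio < best[0]:
            best = (ratio, pages, waste)
    if best is None:
        raise ValueError("slot does not fit in any allowed span")
    return best[1], best[2]

if __name__ == "__main__":
    for limit in (4, 3, 2):
        pages, waste = pick_span_pages(6144, limit)
        print(f"max={limit}: {pages} pages, {waste} bytes wasted")
```

In this model a 6 KiB slot fits a 3-page span exactly, so limits of 4 and 3 behave the same, while a limit of 2 forces a smaller span with some leftover bytes. Whether the real allocator's behaviour changes enough to affect the crash is something only testing the rebuilt package can tell.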


If that doesn't help, it could also be that the CreateSharedImage() code 
for your specific graphics driver is leaking memory or something. If you 
can try a different graphics driver/stack, that could also point to a 
specific bug in the driver.




> I am running Chromium under strong memory pressure (3 GB RAM with
> hundreds of tabs open), but with a good amount of swap space (16 GB) of
> which not more than 2 GB is typically used. When loading a new page or
> the like, Chromium will sometimes spend a few seconds paging to swap,
> and at other times it will crash. Prior to this bug report, however, the
> crashes were very rare---swap access may have slowed things down, but
> the browser otherwise ran like a tank. Now, I often get crashes in the
> middle of typing text into a form window, which is especially
> aggravating as the text is not restored with the page.
>
> I have Memory Saver enabled. Are there any other settings you're aware
> of that I should experiment with? I already keep an eye on Chromium's
> Task Manager with the "Memory footprint" column in descending order, so
> if a site goes nuts with RAM usage, I can usually catch it.



Memory Saver is exactly what I was going to recommend as a workaround, 
before I saw the backtrace.  ;)
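
If it helps to watch system-wide memory next to Chromium's own Task Manager, a small sketch that reads /proc/meminfo-style text could look like this. The function names are hypothetical, and the sample figures are made up to roughly match the setup described above (about 3 GB RAM, 16 GB swap, 2 GB of swap in use); they are not measurements from the affected machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values
    (kB for the fields that carry a unit)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def swap_used_kb(info):
    """Swap in use, derived from the SwapTotal and SwapFree fields."""
    return info["SwapTotal"] - info["SwapFree"]

# Made-up sample, NOT taken from the reporter's system.
SAMPLE = """MemTotal:        3072000 kB
MemAvailable:     204800 kB
SwapTotal:      16777216 kB
SwapFree:       14680064 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("available kB:", info["MemAvailable"])
    print("swap used kB:", swap_used_kb(info))
```

On a live Linux system the same functions can be fed the contents of /proc/meminfo instead of SAMPLE.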






Bug#1072299: Compositor-related crashes

2024-06-03 Thread Daniel Richard G.
I'm going to need a spot of help with this.

I have Chromium running under GDB, with surprisingly low overhead (I can
browse like normal if I drop the --single-process flag). As far as I
could find, the "trap invalid opcode" error reported in syslog is
synonymous with a SIGILL, so I set "handle SIGILL stop pass".
Unfortunately, the trap errors continue to occur without GDB stopping
execution.

Do you know how to set this up to get to a backtrace? Maybe a way of
disabling the signal/crash handler?



Bug#1072299: Compositor-related crashes

2024-05-31 Thread Daniel Richard G.
On Fri, 2024 May 31 21:49-04:00, Andres Salomon wrote:
> Oh! Apparently my info is outdated. According to 
> , this was fixed back in August. It does 
> indeed look like
>  
> has the dbgsym packages for .141.

Thanks for the pointer. I did not know about debian-security-debug, as
the Debian wiki pages make no mention of it.

I've installed .141 and the dbgsym package, and confirmed that at
least the tab crash still occurs. Will try to get some useful
telemetry out of this.


--Daniel



Bug#1072299: Compositor-related crashes

2024-05-31 Thread Andres Salomon

On 5/31/24 18:59, Daniel Richard G. wrote:

> On Fri, 2024 May 31 18:36-04:00, Andres Salomon wrote:
>>
>> I'm going from memory here, but I believe the dak installation on
>> security.debian.org doesn't keep dbgsym packages for historical reasons.
>> Thus, they're only available once chromium gets moved to
>> stable-proposed-updates. https://tracker.debian.org/pkg/chromium shows
>> .60 as being the last one in stable-p-u. At some point in the next week
>> or two, someone from the release team will likely accept the newer
>> chromium packages into stable-p-u, at which point the dbgsym packages
>> for .141 (or whatever the latest version is) will be available.
>
> Eeegh, not a great state of affairs for a package that revs this often.


Oh! Apparently my info is outdated. According to 
, this was fixed back in August. It does 
indeed look like
 
has the dbgsym packages for .141.




Bug#1072299: Compositor-related crashes

2024-05-31 Thread Daniel Richard G.
On Fri, 2024 May 31 18:36-04:00, Andres Salomon wrote:
>
> I'm going from memory here, but I believe the dak installation on 
> security.debian.org doesn't keep dbgsym packages for historical reasons. 
> Thus, they're only available once chromium gets moved to 
> stable-proposed-updates. https://tracker.debian.org/pkg/chromium shows 
> .60 as being the last one in stable-p-u. At some point in the next week 
> or two, someone from the release team will likely accept the newer 
> chromium packages into stable-p-u, at which point the dbgsym packages 
> for .141 (or whatever the latest version is) will be available.

Eeegh, not a great state of affairs for a package that revs this often.

> It sucks, but it is what it is. You could either spend a bunch of time 
> building chromium for the dbgsym packages, or I could put my local build 
> of .141 online w/ dbgsym packages for you to try out (assuming amd64?), 
> or you could downgrade to .60 and use those dbgsym packages.

If it's not too much trouble to put up that .141 package (and the
problem still persists in that version), I'll gladly make use of it.

> Yes, just running 'chromium -g' will launch it inside gdb; you may have 
> to manually type 'run' to start it inside gdb, I forget.  But then 
> you'll get a backtrace (or you can ctrl-c and run 'bt', if it's a 
> deadlock or something). I haven't bothered w/ core dumps of chromium 
> before, so I can't speak to that.

Understood. The system in question is a bit tight on memory, so
hopefully it won't fall over with Chromium under GDB.


--Daniel



Bug#1072299: Compositor-related crashes

2024-05-31 Thread Andres Salomon

On 5/31/24 15:51, Daniel Richard G. wrote:

> I believe I've found a correlation: The crashes seem to have started
> with an instance of firefox-esr (115.11.0esr-1~deb12u1) that I was
> running on the side, since earlier today. Once I closed Firefox, the
> crashiness went away, completely.
>
> (This is on the same laptop that needs --use-gl=egl to avoid visual
> artifacts, so that might have something to do with this)
>
> On Fri, 2024 May 31 15:27-04:00, Andres Salomon wrote:
>>
>> Interesting. Any chance of a backtrace (with the chromium-dbgsym
>> package)? I'm wondering if some (bundled) third party lib has started
>> requiring newer cpu extensions or something.
>
> I'm happy to provide this, but two questions:
>
> 1. In http://debug.mirrors.debian.org/debian-debug/pool/main/c/chromium/
>    as well as https://deb.debian.org/debian-debug/pool/main/c/chromium/,
>    I don't see any packages with a matching version string of
>    "125.0.6422.112-1~deb12u1" (and .141 isn't there yet). Am I missing
>    something?


I'm going from memory here, but I believe the dak installation on 
security.debian.org doesn't keep dbgsym packages for historical reasons. 
Thus, they're only available once chromium gets moved to 
stable-proposed-updates. https://tracker.debian.org/pkg/chromium shows 
.60 as being the last one in stable-p-u. At some point in the next week 
or two, someone from the release team will likely accept the newer 
chromium packages into stable-p-u, at which point the dbgsym packages 
for .141 (or whatever the latest version is) will be available.


It sucks, but it is what it is. You could either spend a bunch of time 
building chromium for the dbgsym packages, or I could put my local build 
of .141 online w/ dbgsym packages for you to try out (assuming amd64?), 
or you could downgrade to .60 and use those dbgsym packages.






> 2. To get the stack trace, is the right way just running the whole
>    thing in GDB, using "chromium -g"? Or do you set it up to make a
>    core dump? (Sure would be nice to have an Apport-like after-the-fact
>    workflow for this)




Yes, just running 'chromium -g' will launch it inside gdb; you may have 
to manually type 'run' to start it inside gdb, I forget.  But then 
you'll get a backtrace (or you can ctrl-c and run 'bt', if it's a 
deadlock or something). I haven't bothered w/ core dumps of chromium 
before, so I can't speak to that.




Bug#1072299: Compositor-related crashes

2024-05-31 Thread Daniel Richard G.
Sigh, spoke too soon.

Chromium still crashes in both modes (tab and browser) even without
Firefox running, but much less frequently. I had a good half-hour
without crashes after closing Firefox, enough to lead me to think that
was the cause.

At this point, we're probably better off waiting to see if .141
still has the issue. But is there a reason why that -dbgsym package
isn't there?


--Daniel



Bug#1072299: Compositor-related crashes

2024-05-31 Thread Daniel Richard G.
I believe I've found a correlation: The crashes seem to have started
with an instance of firefox-esr (115.11.0esr-1~deb12u1) that I was
running on the side, since earlier today. Once I closed Firefox, the
crashiness went away, completely.

(This is on the same laptop that needs --use-gl=egl to avoid visual
artifacts, so that might have something to do with this)

On Fri, 2024 May 31 15:27-04:00, Andres Salomon wrote:
>
> Interesting. Any chance of a backtrace (with the chromium-dbgsym 
> package)? I'm wondering if some (bundled) third party lib has started 
> requiring newer cpu extensions or something.

I'm happy to provide this, but two questions:

1. In http://debug.mirrors.debian.org/debian-debug/pool/main/c/chromium/
   as well as https://deb.debian.org/debian-debug/pool/main/c/chromium/,
   I don't see any packages with a matching version string of
   "125.0.6422.112-1~deb12u1" (and .141 isn't there yet). Am I missing
   something?

2. To get the stack trace, is the right way just running the whole
   thing in GDB, using "chromium -g"? Or do you set it up to make a
   core dump? (Sure would be nice to have an Apport-like after-the-fact
   workflow for this)


--Daniel



Bug#1072299: Compositor-related crashes

2024-05-31 Thread Andres Salomon

On 5/31/24 14:48, Daniel Richard G. wrote:

> Package: chromium
> Version: 125.0.6422.112-1~deb12u1
> Severity: important
>
> Recently, I have been observing crashes of individual tabs, and even
> of the entire browser, when navigating some Web pages. The crashed
> tabs correlate with the following syslog messages (multiple instances
> listed below):
>
> 2024-05-31T12:42:35.334876-04:00 runabout kernel: [1324259.940186] traps: Compositor[125485] trap invalid opcode ip:55a9cc8c18cd sp:7ff9a0ded490 error:0 in chromium[55a9c7e22000+b13d000]
> 2024-05-31T12:57:20.174268-04:00 runabout kernel: [1325144.782743] traps: Compositor[125761] trap invalid opcode ip:55a9cc8c18cd sp:7ff9a0ded1e0 error:0 in chromium[55a9c7e22000+b13d000]
> 2024-05-31T13:24:20.059327-04:00 runabout kernel: [1326764.664063] traps: Compositor[126515] trap invalid opcode ip:55daaca498cd sp:7f9f6c1ed1e0 error:0 in chromium[55daa7faa000+b13d000]
> 2024-05-31T13:55:26.767783-04:00 runabout kernel: [1328631.258090] traps: Compositor[126307] trap invalid opcode ip:55daaca498cd sp:7f9f6c1ecfb0 error:0 in chromium[55daa7faa000+b13d000]
>
> The whole-browser crash occurs with no unusual messages to syslog or
> ~/.xsession-errors, strangely enough.
>
> And even stranger, only today (May 31) have I started observing these
> crashes. This particular version has been installed and running fine
> since May 23, and now it's crashing left and right. /var/log/dpkg.log
> shows no new package installs since the 25th, and I haven't futzed with
> any configurations for at least that long.
>
> Since the error relates to an invalid opcode, I'll include some details
> about the CPU:
>
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 15
> model name  : Intel(R) Core(TM)2 CPU T7200  @ 2.00GHz
> stepping: 6
> microcode   : 0xc7
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti tpr_shadow dtherm
>
>
> --Daniel


Interesting. Any chance of a backtrace (with the chromium-dbgsym 
package)? I'm wondering if some (bundled) third party lib has started 
requiring newer cpu extensions or something.
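
As a quick way to check that theory against the flags in the report, here is a small Python sketch. The flag list is copied from the /proc/cpuinfo excerpt above; the shortlist of extensions to look for is only a guess at what "newer" might mean, not a list of what chromium or any bundled library actually requires, and the helper name is made up.

```python
# Flags copied from the /proc/cpuinfo excerpt in the bug report.
FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm "
    "constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni "
    "dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti "
    "tpr_shadow dtherm"
).split()

# Hypothetical shortlist of extensions newer than SSSE3; an assumption
# for illustration, not a statement of chromium's requirements.
CANDIDATES = ["sse4_1", "sse4_2", "popcnt", "avx", "avx2"]

def missing_extensions(flags, candidates):
    """Return the candidates that do not appear among the CPU flags."""
    present = set(flags)
    return [name for name in candidates if name not in present]

if __name__ == "__main__":
    print("missing:", missing_extensions(FLAGS, CANDIDATES))
```

This only tells us which extensions the CPU lacks; a backtrace is still needed to see which instruction actually faulted.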




Bug#1072299: Compositor-related crashes

2024-05-31 Thread Daniel Richard G.
Package: chromium
Version: 125.0.6422.112-1~deb12u1
Severity: important

Recently, I have been observing crashes of individual tabs, and even
of the entire browser, when navigating some Web pages. The crashed
tabs correlate with the following syslog messages (multiple instances
listed below):

2024-05-31T12:42:35.334876-04:00 runabout kernel: [1324259.940186] traps: Compositor[125485] trap invalid opcode ip:55a9cc8c18cd sp:7ff9a0ded490 error:0 in chromium[55a9c7e22000+b13d000]
2024-05-31T12:57:20.174268-04:00 runabout kernel: [1325144.782743] traps: Compositor[125761] trap invalid opcode ip:55a9cc8c18cd sp:7ff9a0ded1e0 error:0 in chromium[55a9c7e22000+b13d000]
2024-05-31T13:24:20.059327-04:00 runabout kernel: [1326764.664063] traps: Compositor[126515] trap invalid opcode ip:55daaca498cd sp:7f9f6c1ed1e0 error:0 in chromium[55daa7faa000+b13d000]
2024-05-31T13:55:26.767783-04:00 runabout kernel: [1328631.258090] traps: Compositor[126307] trap invalid opcode ip:55daaca498cd sp:7f9f6c1ecfb0 error:0 in chromium[55daa7faa000+b13d000]
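
A note on reading these trap lines: subtracting the mapping base (the first number inside chromium[base+size]) from ip gives the offset of the faulting instruction within the binary, which stays comparable across runs even though the load address changes. A minimal Python sketch follows; the helper name trap_offset is made up, and the regular expression is an assumption based only on the lines quoted here.

```python
import re

# Captures the instruction pointer and the mapping base from a kernel
# "traps:" line. Pattern derived only from the log lines in this report.
TRAP_RE = re.compile(r"ip:([0-9a-f]+) .* in \S+\[([0-9a-f]+)\+[0-9a-f]+\]")

def trap_offset(line):
    """Return ip minus the mapping base: the offset into the binary."""
    match = TRAP_RE.search(line)
    if match is None:
        raise ValueError("not a recognised trap line")
    return int(match.group(1), 16) - int(match.group(2), 16)

# One line from each of the two browser sessions seen in the log.
lines = [
    "traps: Compositor[125485] trap invalid opcode ip:55a9cc8c18cd "
    "sp:7ff9a0ded490 error:0 in chromium[55a9c7e22000+b13d000]",
    "traps: Compositor[126515] trap invalid opcode ip:55daaca498cd "
    "sp:7f9f6c1ed1e0 error:0 in chromium[55daa7faa000+b13d000]",
]
offsets = {hex(trap_offset(line)) for line in lines}

if __name__ == "__main__":
    print(offsets)
```

Both sessions reduce to a single offset, which is consistent with every tab crash hitting the same instruction in the chromium binary.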

The whole-browser crash occurs with no unusual messages to syslog or
~/.xsession-errors, strangely enough.

And even stranger, only today (May 31) have I started observing these
crashes. This particular version has been installed and running fine
since May 23, and now it's crashing left and right. /var/log/dpkg.log
shows no new package installs since the 25th, and I haven't futzed with
any configurations for at least that long.

Since the error relates to an invalid opcode, I'll include some details
about the CPU:

vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T7200  @ 2.00GHz
stepping: 6
microcode   : 0xc7
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 
monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti tpr_shadow dtherm


--Daniel