Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
On 01/09/2022 01:03, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> Don't understand why strace log has exit(0) without the underscore, I know
> for a fact that it was with the underscore.

Because exit() and _exit() are C library functions, but both call the SYS_exit system call, and that is what strace shows.

The difference is that _exit doesn't run atexit() handlers or do any other cleanup before calling SYS_exit.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/

___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
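The difference Tom describes can be seen in a rough Python analogue. This is only an illustration, not the C program from this thread: Python's `sys.exit()` runs handlers registered with the `atexit` module, while `os._exit()` ends the process immediately without running them. The helper name `run_child` is made up for this sketch.

```python
import subprocess
import sys

# Script run in a child interpreter: it registers an exit handler and then
# terminates using whatever statement is substituted for {call}.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("atexit handler ran", flush=True))
{call}
"""


def run_child(call):
    """Run the child script in a fresh interpreter; return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(call=call)],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # Normal exit: the handler runs before the process terminates.
    print(run_child("sys.exit(0)"))
    # Immediate exit: the handler is skipped.
    print(run_child("os._exit(0)"))
```

Both children report exit code 0; only the first one produces the handler's output.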
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> Normally, if it is the OOM that kills a process, you should find a trace of
> this in the system logs.

I looked in every system log I could find; there was no indication of the OOM killer in any of them.

> I do not understand what you mean by reducing the nr of callers from 12 to 6.
> What are these callers ? Is that some threads of the process you are running
> under valgrind ?

I mean the --num-callers core option to valgrind. By default this is 12, and I hadn't specified it. I tried --num-callers=6 to reduce memory consumption. From the valgrind manual, this option "Specifies the maximum number of entries shown in stack traces that identify program locations." By reducing it to 6 I was hoping to reduce valgrind's memory consumption in case it really was the OOM killer, which I really doubt now.

> And just in case: are you using the last version of Valgrind ?

Yes, I used the latest version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use strace and dmesg. Neither indicated it was the OOM killer. I did happen to save the strace log when the SIGKILL happened. Here is the part around the _exit(0):

    read(2040, "R", 1) = 1
    gettid() = 3332
    rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
    rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
    rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
    gettid() = 3332
    write(2041, "S", 1) = 1
    exit(0) = ?
    +++ killed by SIGKILL +++

I don't understand why the strace log has exit(0) without the underscore; I know for a fact that it was with the underscore.

The strace log doesn't indicate anything special happening around the _exit(0). When I removed it, the SIGKILL went away.

> You might also start valgrind with some debug trace e.g. -d -d -d -d -v -v
> -v -v

I was not aware of this and didn't try it. I don't have time to try it now.

Regards,
Rob
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
On Wed, 2022-08-31 at 17:42 +, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > When running memcheck on a massive monolith embedded executable
> > (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> > valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> > reports are printed. The parent process sees that the return status of
> > memcheck is that it was SIGKILLed (status returned in waitpid call is '9').
>
> We found that removing a call to _exit(0) made it so that valgrind is no
> longer SIGKILLed.
>
> Any ideas why removing _exit(0) may get rid of valgrind getting SIGKILLed?
>
> Previously exit(0) was called, without the leading underscore, but changed it
> to _exit(0) to really make sure no memory was being deallocated. This worked
> well on a different process, so we carried it over to this one, that is why
> we did it.
>
> Even with exit(0) (no underscore), in this process there is not much
> deallocation going on in exit handlers, so have lots of doubts that
> valgrind/memcheck was using too much memory and invoking the OOM killer.
>
> Using strace and dmesg while we had _exit(0) in use didn't show that OOM
> killer was SIGKILLing valgrind.
>
> I also tried reducing number of callers from 12 to 6 when using _exit(0),
> still got the SIGKILL.
>
> Also tried using a system that had an additional 4GByte of memory, and also
> got the SIGKILL there.
>
> So I have many doubts that Valgrind was getting SIGKILLed due to too much
> memory usage.
>
> Don't know why removing _exit(0) got rid of the SIGKILL. Was wondering if
> anyone had any ideas?

Normally, if it is the OOM killer that kills a process, you should find a trace of this in the system logs.

I do not understand what you mean by reducing the number of callers from 12 to 6. What are these callers? Are they threads of the process you are running under valgrind?

And just in case: are you using the latest version of Valgrind?

You might use "strace" on valgrind to see what is going on at the time _exit(0) is called.

You might also start valgrind with some debug trace, e.g. -d -d -d -d -v -v -v -v

Philippe
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> When running memcheck on a massive monolith embedded executable
> (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> reports are printed. The parent process sees that the return status of
> memcheck is that it was SIGKILLed (status returned in waitpid call is '9').

We found that removing a call to _exit(0) made it so that valgrind is no longer SIGKILLed.

Any ideas why removing _exit(0) may get rid of valgrind getting SIGKILLed?

Previously exit(0) was called, without the leading underscore, but we changed it to _exit(0) to really make sure no memory was being deallocated. This worked well on a different process, so we carried it over to this one; that is why we did it.

Even with exit(0) (no underscore), there is not much deallocation going on in exit handlers in this process, so I have lots of doubts that valgrind/memcheck was using too much memory and invoking the OOM killer.

Using strace and dmesg while we had _exit(0) in use didn't show that the OOM killer was SIGKILLing valgrind.

I also tried reducing the number of callers from 12 to 6 when using _exit(0), and still got the SIGKILL.

I also tried using a system that had an additional 4GByte of memory, and got the SIGKILL there too.

So I have many doubts that Valgrind was getting SIGKILLed due to too much memory usage.

I don't know why removing _exit(0) got rid of the SIGKILL. Was wondering if anyone had any ideas?
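For readers unsure how a waitpid status of '9' maps to SIGKILL, here is a minimal sketch using Python's wrappers around the usual wait-status macros. It assumes Linux, where a raw status of 9 means "terminated by signal 9, no core dump". The function name `describe_wait_status` is made up for this example.

```python
import os
import signal


def describe_wait_status(status):
    """Decode a raw wait status as returned by os.waitpid() on Linux."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; report its number and name.
        signum = os.WTERMSIG(status)
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    if os.WIFEXITED(status):
        # The child ended via exit()/_exit(); report its exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"other status {status:#x}"


if __name__ == "__main__":
    print(describe_wait_status(9))
    print(describe_wait_status(0))
```

The status only tells the parent which signal ended the child; it says nothing about who sent it.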
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> > > Is there anything that can be done with memcheck to make it consume less
> > > memory?
> >
> > No.

In fact, yes :). Or more precisely, yes, memory can be somewhat reduced :).
See my other mail.

Philippe
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
On Fri, 2022-08-05 at 15:34 +, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > If finding memory leaks is the only goal (for instance, if you are
> > satisfied that memcheck has found all the overrun blocks, uninitialized
> > reads, etc.) then https://github.com/KDE/heaptrack is the best tool.
>
> Thanks! I didn't know about heaptrack. I will definitely look into that.
> Does heaptrack also show the 'still reachable' types of leaks that memcheck
> does?
>
> Any chance that the 'massif' tool would survive the OOM killer? This may be
> easier for me to get going as I already have valgrind built.
>
> Is there anything that can be done with memcheck to make it consume less
> memory?

You might be interested in looking at the slides of the FOSDEM presentation 'Tuning Valgrind for your workload':
https://archive.fosdem.org/2015/schedule/event/valgrind_tuning/attachments/slides/743/export/events/attachments/valgrind_tuning/slides/743/tuning_V_for_your_workload.pdf

There are several things you can do to reduce memcheck memory usage.

Note also that you can run a leak search while your program runs, either via memcheck client requests or from the shell, using vgdb.

Philippe
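As a rough sketch of the "from the shell, using vgdb" route, the snippet below only builds the command line; it does not run anything. It assumes memcheck's `leak_check` monitor command and vgdb's `--pid=` option as described in the Valgrind manual, so check the manual for your version before relying on the exact arguments. The function name and the PID are placeholders.

```python
import shlex


def vgdb_leak_check_argv(pid, kinds="reachable", delta="any"):
    """Build a vgdb command asking a running memcheck for a full leak search.

    pid   -- process ID of the program running under valgrind (placeholder)
    kinds -- which leak kinds to show; "reachable" is the most inclusive setting
    delta -- "any" reports all matching blocks, not only new or increased ones
    """
    return ["vgdb", f"--pid={pid}", "leak_check", "full", kinds, delta]


if __name__ == "__main__":
    # 1234 is a made-up PID; substitute the PID of your valgrind process.
    print(shlex.join(vgdb_leak_check_argv(1234)))
```

Running the leak search before the process exits may sidestep whatever goes wrong at exit time, although the search itself still needs memory.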
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> Is there anything that can be done with memcheck to make it consume less
> memory?

First of all, figure out whether memcheck got SIGKILLed because the machine ran out of memory, or because you hit some shell limit/ulimit. In the former case, you can try adding swap space to the machine. In the latter case, you'll need to adjust the shell's ulimit settings.

You could also try reducing the (data) size of the workload.

Massif and Memcheck are different tools and do largely different things. Whether you can use one or the other depends a lot on the specifics of the problem you're trying to solve.

J
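One way to inspect the limits mentioned above is Python's standard `resource` module. This is a sketch, assuming a Linux system; resource limits are inherited by child processes, so it is most informative when run from the same shell or launcher that starts valgrind. The helper names are made up, and `getrlimit` reports bytes where the shell's `ulimit` usually reports KiB.

```python
import resource

# Limits most likely to matter for a memory-hungry process.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
}


def format_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    report = {}
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[label] = (format_limit(soft), format_limit(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```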
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> Is there anything that can be done with memcheck to make it consume less
> memory?

No.

Well, you can use the command-line argument "--num-callers=" to reduce the length of tracebacks that are stored in the "red zones" just before and after an allocated block. This might help enough if you have zillions of "still reachable" blocks. But you get shorter tracebacks, which might not give enough information to find and fix the leak quickly.

If you do not have zillions of "still reachable" blocks, then --num-callers will not help so much; but probably would not be needed anyway.
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> Does heaptrack also show the 'still reachable' types of leaks that memcheck
> does?

Heaptrack intercepts malloc+free+etc, then logs the parameters, result, and traceback; but otherwise lets the process-original malloc+free+etc do the work. Heaptrack does not notice, and does not care, what you do with the result of malloc(), except whether or not the pointer returned by malloc() ever gets passed as an argument to free(). When heaptrack performs analysis, then any result from malloc() that has not been free()d is a "leak" as far as heaptrack is concerned. So that includes what memcheck calls "still reachable" but not (yet) a leak.

> Any chance that the 'massif' tool would survive the OOM killer? This may be
> easier for me to get going as I already have valgrind built.

Worth a try if you have a day or so to spend. Like all valgrind tools, massif relies on emulating the instruction stream, so the basic ~10X run-time slowdown applies.

> Is there anything that can be done with memcheck to make it consume less
> memory?

No.
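The rule described above (anything returned by malloc() and never passed to free() counts as a leak) can be sketched as a toy event replay. This is not heaptrack's implementation or its data format; the event tuples and the function name are invented for illustration.

```python
def unfreed_allocations(events):
    """Replay allocation events and return the blocks that were never freed.

    events -- iterable of ("malloc", address, size) or ("free", address)
    Returns a dict mapping address -> size for every still-live block.
    """
    live = {}
    for event in events:
        if event[0] == "malloc":
            _, address, size = event
            live[address] = size
        elif event[0] == "free":
            # Freeing an unknown address is ignored in this toy model.
            live.pop(event[1], None)
    return live


if __name__ == "__main__":
    trace = [
        ("malloc", 0x1000, 32),
        ("malloc", 0x2000, 64),
        ("free", 0x1000),
    ]
    print(unfreed_allocations(trace))
```

Note that the block at 0x2000 is reported whether or not the program still holds a pointer to it, which is why this style of analysis does not distinguish "still reachable" from lost blocks.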
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> > If you want to know for sure who killed it then strace it while it
> > runs and it should show you who sends the signal but my bet is that
> > it's the kernel.

I tried strace -p on my process before I triggered its exit. The strace output ends with "+++ killed by SIGKILL +++", but I don't find anything about who sent it.

> Or possibly watch `dmesg -w` running in another shell.

I tried 'dmesg -w' but it didn't say anything about the SIGKILL. Is there something that has to be configured for dmesg to say the source of the SIGKILL?
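When scanning a saved kernel log for OOM-killer activity, a small filter like the one below can help. It is a sketch: the pattern covers message fragments the Linux OOM killer commonly logs ("invoked oom-killer", "Out of memory: Killed process"), but the exact wording varies between kernel versions, so an empty result is not proof that the OOM killer was not involved. The names are made up.

```python
import re

# Fragments commonly seen in kernel log lines written by the OOM killer.
# Wording differs across kernel versions; treat this pattern as a guess.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process|oom-kill:",
    re.IGNORECASE,
)


def find_oom_lines(log_text):
    """Return the lines of a kernel log that look like OOM-killer messages."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for line in find_oom_lines(sys.stdin.read()):
        print(line)
```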
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
Thanks Tom.

Do you think I'd have better luck using the "massif" tool? Would "massif" be able to avoid the OOM killer?

Or is there a way to reduce the amount of memory that memcheck will use?

-----Original Message-----
From: Tom Hughes
Sent: Friday, August 5, 2022 10:08 AM
To: Bresalier, Rob (Nokia - US/Murray Hill); valgrind-users@lists.sourceforge.net
Subject: Re: memcheck is getting SIGKILLed before leak report is output

On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the “HEAP SUMMARY” but then valgrind dies before any
> leak reports are printed. The parent process sees that the return
> status of memcheck is that it was SIGKILLed (status returned in
> waitpid call is ‘9’). I am 99.9% sure that the parent process is not
> the one sending the SIGKILL.
> Is it possible that valgrind SIGKILLs itself? Is there a reason that
> the linux kernel (Wind River Linux) could be sending a SIGKILL to
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM
> killer killing valgrind. Previous experience with this executable is
> that there are almost 3 million leak reports (most of them are “still
> reachable”), could that be occupying too much memory? Any ideas/advice
> to figure out what is going on?

Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while it runs and it should show you who sends the signal but my bet is that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no
> size/direction hints”. Could this be a trigger for this crash/sigkill?

Not really, no.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> If finding memory leaks is the only goal (for instance, if you are satisfied
> that memcheck has found all the overrun blocks, uninitialized reads, etc.)
> then https://github.com/KDE/heaptrack is the best tool.

Thanks! I didn't know about heaptrack. I will definitely look into that. Does heaptrack also show the 'still reachable' types of leaks that memcheck does?

Any chance that the 'massif' tool would survive the OOM killer? This may be easier for me to get going as I already have valgrind built.

Is there anything that can be done with memcheck to make it consume less memory?
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
On 05/08/2022 16:08, Tom Hughes via Valgrind-users wrote:

> If you want to know for sure who killed it then strace it while it
> runs and it should show you who sends the signal but my bet is that
> it's the kernel.

Or possibly watch `dmesg -w` running in another shell.

J
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under valgrind
> I see the “HEAP SUMMARY” but then valgrind dies before any leak reports
> are printed.

If finding memory leaks is the only goal (for instance, if you are satisfied that memcheck has found all the overrun blocks, uninitialized reads, etc.) then https://github.com/KDE/heaptrack is the best tool.

The data-gathering phase runs in any Linux process using LD_PRELOAD and libunwind. The analysis phase runs a GUI under KDE, and/or generates *useful* text reports: leaks by individual size, leaks by total size for a given traceback, allocations (leaked or not) by frequency or total size, etc. I like the text-only analysis, which avoids the requirement for KDE.

Heaptrack CPU overhead tends to be around 20% or less, so it does not take forever. Heaptrack does require disk space to record data (sequential access only), so you may need several gigabytes (locally or via network).
Re: [Valgrind-users] memcheck is getting SIGKILLed before leak report is output
On 05/08/2022 14:09, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> When running memcheck on a massive monolith embedded executable (237MB
> stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the “HEAP SUMMARY” but then valgrind dies before any
> leak reports are printed. The parent process sees that the return
> status of memcheck is that it was SIGKILLed (status returned in
> waitpid call is ‘9’). I am 99.9% sure that the parent process is not
> the one sending the SIGKILL.
> Is it possible that valgrind SIGKILLs itself? Is there a reason that
> the linux kernel (Wind River Linux) could be sending a SIGKILL to
> valgrind/memcheck? I do not see any messages about Out of Memory/OOM
> killer killing valgrind. Previous experience with this executable is
> that there are almost 3 million leak reports (most of them are “still
> reachable”), could that be occupying too much memory? Any ideas/advice
> to figure out what is going on?

Almost certainly the kernel OOM killed it.

If you want to know for sure who killed it then strace it while it runs and it should show you who sends the signal but my bet is that it's the kernel.

> One thing I see in the logs is about “unhandled ioctl 0xa5 with no
> size/direction hints”. Could this be a trigger for this crash/sigkill?

Not really, no.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/