Re: [lttng-dev] Capturing snapshot on kernel panic

2024-05-16 Thread Mathieu Desnoyers via lttng-dev

Hi Damien,

If kexec is not an option on your system, you might be able to
access the pmem+dax filesystem after a warm reboot, but that very
much depends on whether your BIOS clears memory on a warm
reboot.
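
One rough way to find out whether memory survives a warm reboot on a given board is to leave a marker file on the pmem-backed filesystem, reboot, mount the filesystem again without re-formatting it, and look for the marker. The sketch below assumes a hypothetical mount point (/mnt/pmem) and made-up helper names; it only covers the file check, not the mount itself.

```shell
#!/bin/sh
# Sketch only: MARKER is a placeholder path on the pmem-backed filesystem.
MARKER=${MARKER:-/mnt/pmem/warm-reboot-marker}

# Before rebooting: write a recognizable token and flush it out.
write_marker() {
    printf 'lttng-pmem-check\n' > "$1" && sync
}

# After rebooting and re-mounting (not re-formatting) the filesystem:
# succeed only if the token is still there.
marker_survived() {
    [ -f "$1" ] && [ "$(cat "$1")" = 'lttng-pmem-check' ]
}

# Usage: "script write" before the reboot, "script check" after it.
case "${1:-}" in
    write) write_marker "$MARKER" ;;
    check)
        if marker_survived "$MARKER"; then
            echo "marker found: memory was preserved"
        else
            echo "marker missing: memory was likely cleared"
        fi
        ;;
esac
```

If the marker is gone after several warm reboots, the firmware is probably clearing RAM and this route is unlikely to work on that hardware.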

Cheers,

Mathieu

On 2024-05-16 14:22, Damien Berget via lttng-dev wrote:

Thanks Kienan for these quick suggestions,
we'll investigate the pmem route (I was not aware of the lttng-crash
utility, it's pretty nice). Even if I'm not sure how fast it would burn
through our SSD, it might still be worth trying.
As for kexec-tools, it's not officially supported on our embedded modules
unfortunately, so we might be SOL there. We may have to try adding our
own tracepoint in the kernel to use as a trigger.

Cheers
Damien

On Thu, May 16, 2024 at 8:12 AM Kienan Stewart wrote:


Hi Damien,

I want to expand on one of the options that could work for your case.

On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
 > Hi Damien,
 >
 >
 > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
 >> Good day,
 >> we have been using LTTng successfully to capture snapshots on
 >> user-defined tracepoints and it has proved invaluable for debugging
 >> our issues. Thanks to all the contributors of this project!
 >>
 >> We'd like to know if it would be possible to trigger on a kernel
 >> panic? It might be dubiously possible, as you would still need to
 >> have the file system working to write the results, but I should ask.
 >>
 >
 > For userspace tracing, I think the recommendation is usually to use a
 > dax/pmem device and have the buffers for the session mapped there.
 > After a panic, the contents of the buffers can be restored using
 > lttng-crash[1].
 >
 > Note that dax/pmem isn't supported by the kernel space tracer at this
 > time.
 >
 > If I recall, there are other ways to do things in the panic sequence
 > (that aren't lttng specific), but I'm personally not as familiar with
 > the details of that stage of Linux.
 >

It's possible to use kexec-tools to load a new kernel post-panic[1]. If
your system uses kexec, the contents of RAM aren't necessarily flushed,
and if both the initial kernel and the post-panic kernel started by kexec
have the same configuration for an emulated PMEM device using the memmap
parameter [2,3], that region of memory can have a daxfs created in it
after a clean boot.

Note: some systems may not flush the memory during a warm reboot, but
this is dependent on the BIOS.

When your system boots you could do something like the following:

   * If it's a clean boot, create the daxfs.
   * If it's an "unclean" boot (e.g. the daxfs already exists, or a
     kernel parameter informs you that it started post-panic), then you
     can copy or move the buffers, or use lttng-crash to extract them,
     to persistent storage for analysis.
   * Start tracing using a snapshot session and the userspace
     buffers on the daxfs.

In this type of situation the "snapshot" command is never invoked
directly, but the recovery of the buffers to create a snapshot is
possible.

[1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap


thanks,
kienan

 >> Looking at the available kernel syscalls, the "reboot" one seems like
 >> a good candidate, however I was not able to capture a snapshot on it.
 >> I have tested the setup below with the "--name=chdir" syscall and it
 >> works: "cd" to a directory will create a trace. But no dice with
 >> reboot.
 >>
 >
 > The details of how this works will depend on your system. For example,
 > my installations tend to use systemd as PID 1. The broad strokes seem
 > to be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
 > believe then kicks off the reboot.service, the PID 1 is swapped to
 > /usr/lib/systemd/systemd-shutdown, SIGTERM then SIGKILL are sent to
 > all processes, unmounts, syncs, calls the reboot system call [2,3].
 >
 > As both the SIGTERM and the unmounts are done before the syscall,
 > lttng-sessiond and the consumers will have already shut down by the
 > time it enters.
 >
 > While this doesn't necessarily help your original question of panics,
 > if you want to snapshot before shutdown or reboot and are using
 > systemd, it's possible to leave a script or 

Re: [lttng-dev] Capturing snapshot on kernel panic

2024-05-16 Thread Damien Berget via lttng-dev
Thanks Kienan for these quick suggestions,
we'll investigate the pmem route (I was not aware of the lttng-crash
utility, it's pretty nice). Even if I'm not sure how fast it would burn
through our SSD, it might still be worth trying.
As for kexec-tools, it's not officially supported on our embedded modules
unfortunately, so we might be SOL there. We may have to try adding our own
tracepoint in the kernel to use as a trigger.
Cheers
Damien

On Thu, May 16, 2024 at 8:12 AM Kienan Stewart 
wrote:

> Hi Damien,
>
> I want to expand on one of the options that could work for your case.
>
> On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
> > Hi Damien,
> >
> >
> > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
> >> Good day,
> >> we have been using LTTng successfully to capture snapshots on
> >> user-defined tracepoints and it has proved invaluable for debugging
> >> our issues. Thanks to all the contributors of this project!
> >>
> >> We'd like to know if it would be possible to trigger on a kernel
> >> panic? It might be dubiously possible, as you would still need to have
> >> the file system working to write the results, but I should ask.
> >>
> >
> > For userspace tracing, I think the recommendation is usually to use a
> > dax/pmem device and have the buffers for the session mapped there. After
> > a panic, the contents of the buffers can be restored using
> > lttng-crash[1].
> >
> > Note that dax/pmem isn't supported by the kernel space tracer at this
> > time.
> >
> > If I recall, there are other ways to do things in the panic sequence
> > (that aren't lttng specific), but I'm personally not as familiar with
> > the details of that stage of Linux.
> >
>
> It's possible to use kexec-tools to load a new kernel post-panic[1]. If
> your system uses kexec, the contents of RAM aren't necessarily flushed,
> and if both the initial kernel and the post-panic kernel started by kexec
> have the same configuration for an emulated PMEM device using the memmap
> parameter [2,3], that region of memory can have a daxfs created in it
> after a clean boot.
>
> Note: some systems may not flush the memory during a warm reboot, but
> this is dependent on the BIOS.
>
> When your system boots you could do something like the following:
>
>   * If it's a clean boot, create the daxfs.
>   * If it's an "unclean" boot (e.g. the daxfs already exists, or a
>     kernel parameter informs you that it started post-panic), then you
>     can copy or move the buffers, or use lttng-crash to extract them,
>     to persistent storage for analysis.
>   * Start tracing using a snapshot session and the userspace buffers on
>     the daxfs.
>
> In this type of situation the "snapshot" command is never invoked
> directly, but the recovery of the buffers to create a snapshot is possible.
>
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
> [2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> [3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap
>
> thanks,
> kienan
>
> >> Looking at the available kernel syscalls, the "reboot" one seems like
> >> a good candidate, however I was not able to capture a snapshot on it.
> >> I have tested the setup below with the "--name=chdir" syscall and it
> >> works: "cd" to a directory will create a trace. But no dice with reboot.
> >>
> >
> > The details of how this works will depend on your system. For example, my
> > installations tend to use systemd as PID 1. The broad strokes seem to
> > be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
> > believe then kicks off the reboot.service, the PID 1 is swapped to
> > /usr/lib/systemd/systemd-shutdown, SIGTERM then SIGKILL are sent to all
> > processes, unmounts, syncs, calls the reboot system call [2,3].
> >
> > As both the SIGTERM and the unmounts are done before the syscall,
> > lttng-sessiond and the consumers will have already shut down by the time
> > it enters.
> >
> > While this doesn't necessarily help your original question of panics, if
> > you want to snapshot before shutdown or reboot and are using systemd,
> > it's possible to leave a script or binary in a known directory so that
> > it's invoked prior to the rest of the shutdown sequence[4].
> >
> > [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
> > [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
> > [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
> > [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
> >
> > hope this helps,
> > kienan
> >
> >> Would you have any suggestions?
> >> Thanks for your help,
> >> Cheers
> >> Damien
> >>
> >> 
> >>
> >> # Prep output dir
> >> mkdir /application/trace/
> >> rm -rf /application/trace/*
> >>
> >> # Create session
> >> sudo lttng destroy snapshot-trace-session
> >> sudo lttng create snapshot-trace-session --snapshot
> >> 

Re: [lttng-dev] Capturing snapshot on kernel panic

2024-05-16 Thread Kienan Stewart via lttng-dev

Hi Damien,

I want to expand on one of the options that could work for your case.

On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:

Hi Damien,


On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:

Good day,
we have been using LTTng successfully to capture snapshots on
user-defined tracepoints and it has proved invaluable for debugging our
issues. Thanks to all the contributors of this project!


We'd like to know if it would be possible to trigger on a kernel
panic? It might be dubiously possible, as you would still need to have
the file system working to write the results, but I should ask.




For userspace tracing, I think the recommendation is usually to use a 
dax/pmem device and have the buffers for the session mapped there. After 
a panic, the contents of the buffers can be restored using lttng-crash[1].


Note that dax/pmem isn't supported by the kernel space tracer at this time.

If I recall, there are other ways to do things in the panic sequence (that
aren't lttng specific), but I'm personally not as familiar with the
details of that stage of Linux.




It's possible to use kexec-tools to load a new kernel post-panic[1]. If
your system uses kexec, the contents of RAM aren't necessarily flushed,
and if both the initial kernel and the post-panic kernel started by kexec
have the same configuration for an emulated PMEM device using the memmap
parameter [2,3], that region of memory can have a daxfs created in it
after a clean boot.
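
As a rough sketch of the memmap configuration mentioned above: the 1G size, the 4G offset, the /dev/pmem0 device name and the /mnt/pmem mount point are placeholders rather than values from this thread, and the helper function names are made up for illustration. The right offset depends on your board's memory map, so treat this as a starting point only.

```shell
#!/bin/sh
# Sketch only: sizes, offsets, device and mount point are assumptions.

# Build a memmap= kernel parameter of the form memmap=SIZE!OFFSET, which
# marks SIZE bytes of RAM starting at OFFSET as emulated persistent memory.
pmem_memmap_param() {
    printf 'memmap=%s!%s\n' "$1" "$2"
}

# Succeed if the given parameter appears verbatim on a kernel command
# line (defaults to the running kernel's /proc/cmdline).
cmdline_has_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}

# Both the initial kernel and the kexec'd kernel would need this same
# parameter on their command lines.
pmem_memmap_param 1G 4G    # prints memmap=1G!4G

# On a clean boot with that parameter in effect, the region usually shows
# up as a pmem block device and can be formatted and mounted with DAX
# (run as root, only once):
#   mkfs.ext4 /dev/pmem0
#   mount -o dax /dev/pmem0 /mnt/pmem
```

Checking cmdline_has_param 'memmap=1G!4G' early in boot is one way to confirm that the kernel you landed in actually carries the same reservation.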


Note: some systems may not flush the memory during a warm reboot, but 
this is dependent on the BIOS.


When your system boots you could do something like the following:

 * If it's a clean boot, create the daxfs.
 * If it's an "unclean" boot (e.g. the daxfs already exists, or a
   kernel parameter informs you that it started post-panic), then you
   can copy or move the buffers, or use lttng-crash to extract them,
   to persistent storage for analysis.
 * Start tracing using a snapshot session and the userspace buffers on
   the daxfs.


In this type of situation the "snapshot" command is never invoked 
directly, but the recovery of the buffers to create a snapshot is possible.
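
The boot-time steps above could be sketched roughly as follows. This assumes the daxfs is already mounted, uses placeholder paths (/mnt/pmem/lttng-shm and /application/trace/recovered), and detects an "unclean" boot simply by finding leftover buffer files; the function names are made up for illustration. It relies on the --shm-path option of lttng create and on lttng-crash --extract.

```shell
#!/bin/sh
# Sketch only: paths below are placeholders, adjust to your system.
SHM_DIR=${SHM_DIR:-/mnt/pmem/lttng-shm}
RECOVERED_DIR=${RECOVERED_DIR:-/application/trace/recovered}

# Treat leftover files in the shm directory as the sign of an unclean boot.
shm_has_leftovers() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

recover_and_restart() {
    if shm_has_leftovers "$SHM_DIR"; then
        # Extract the surviving ring buffers into a regular trace
        # directory on persistent storage, then clear them.
        out="$RECOVERED_DIR/$(date +%Y%m%d-%H%M%S)"
        mkdir -p "$out"
        lttng-crash --extract="$out" "$SHM_DIR"
        rm -rf "${SHM_DIR:?}"/*
    fi

    # Start a fresh snapshot session with its userspace buffers on the daxfs.
    mkdir -p "$SHM_DIR"
    lttng create snapshot-trace-session --snapshot --shm-path="$SHM_DIR"
    lttng enable-channel --userspace --num-subbuf=8 channelu
    lttng enable-event --userspace --all --channel channelu
    lttng start
}

# Only act when explicitly asked to, e.g. RUN_RECOVERY=1 from a boot script.
if [ "${RUN_RECOVERY:-0}" = 1 ]; then
    recover_and_restart
fi
```

The extracted directory can then be opened with the usual trace viewers, the same way as a snapshot written by the "snapshot" command.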


[1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap


thanks,
kienan

Looking at the available kernel syscalls, the "reboot" one seems like a
good candidate, however I was not able to capture a snapshot on it. I
have tested the setup below with the "--name=chdir" syscall and it
works: "cd" to a directory will create a trace. But no dice with reboot.




The details of how this works will depend on your system. For example, my
installations tend to use systemd as PID 1. The broad strokes seem to
be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
believe then kicks off the reboot.service, the PID 1 is swapped to
/usr/lib/systemd/systemd-shutdown, SIGTERM then SIGKILL are sent to all
processes, unmounts, syncs, calls the reboot system call [2,3].


As both the SIGTERM and the unmounts are done before the syscall,
lttng-sessiond and the consumers will have already shut down by the time
it enters.


While this doesn't necessarily help your original question of panics, if 
you want to snapshot before shutdown or reboot and are using systemd, 
it's possible to leave a script or binary in a known directory so that 
it's invoked prior to the rest of the shutdown sequence[4].


[1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
[2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
[3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
[4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/

hope this helps,
kienan


Would you have any suggestions?
Thanks for your help,
Cheers
Damien



# Prep output dir
mkdir /application/trace/
rm -rf /application/trace/*

# Create session
sudo lttng destroy snapshot-trace-session
sudo lttng create snapshot-trace-session --snapshot \
         --output="/application/trace/"

sudo lttng enable-channel --kernel --num-subbuf=8 channelk
sudo lttng enable-channel --userspace --num-subbuf=8 channelu

# Configure session
sudo lttng enable-event --kernel --syscall --all --channel channelk
sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
sudo lttng enable-event --userspace --all --channel channelu
sudo lttng add-context -u -t vtid -t procname
sudo lttng remove-trigger trig_reboot
sudo lttng add-trigger --name=trig_reboot \
         --condition=event-rule-matches --type=kernel:syscall:entry \
         --name=reboot \
         --action=snapshot-session snapshot-trace-session \
         --rate-policy=once-after:1

# start & list info
sudo lttng start
sudo lttng list snapshot-trace-session
sudo lttng list-triggers

# test 

Re: [lttng-dev] Capturing snapshot on kernel panic

2024-05-16 Thread Kienan Stewart via lttng-dev

Hi Damien,


On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:

Good day,
we have been using LTTng successfully to capture snapshots on
user-defined tracepoints and it has proved invaluable for debugging our
issues. Thanks to all the contributors of this project!


We'd like to know if it would be possible to trigger on a kernel panic?
It might be dubiously possible, as you would still need to have the
file system working to write the results, but I should ask.




For userspace tracing, I think the recommendation is usually to use a 
dax/pmem device and have the buffers for the session mapped there. After 
a panic, the contents of the buffers can be restored using lttng-crash[1].


Note that dax/pmem isn't supported by the kernel space tracer at this time.

If I recall, there are other ways to do things in the panic sequence (that
aren't lttng specific), but I'm personally not as familiar with the
details of that stage of Linux.


Looking at the available kernel syscalls, the "reboot" one seems like a
good candidate, however I was not able to capture a snapshot on it. I have
tested the setup below with the "--name=chdir" syscall and it works: "cd"
to a directory will create a trace. But no dice with reboot.




The details of how this works will depend on your system. For example, my
installations tend to use systemd as PID 1. The broad strokes seem to
be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
believe then kicks off the reboot.service, the PID 1 is swapped to
/usr/lib/systemd/systemd-shutdown, SIGTERM then SIGKILL are sent to all
processes, unmounts, syncs, calls the reboot system call [2,3].


As both the SIGTERM and the unmounts are done before the syscall,
lttng-sessiond and the consumers will have already shut down by the time
it enters.


While this doesn't necessarily help your original question of panics, if 
you want to snapshot before shutdown or reboot and are using systemd, 
it's possible to leave a script or binary in a known directory so that 
it's invoked prior to the rest of the shutdown sequence[4].


[1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
[2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
[3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
[4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/

hope this helps,
kienan


Would you have any suggestions?
Thanks for your help,
Cheers
Damien



# Prep output dir
mkdir /application/trace/
rm -rf /application/trace/*

# Create session
sudo lttng destroy snapshot-trace-session
sudo lttng create snapshot-trace-session --snapshot \
         --output="/application/trace/"

sudo lttng enable-channel --kernel --num-subbuf=8 channelk
sudo lttng enable-channel --userspace --num-subbuf=8 channelu

# Configure session
sudo lttng enable-event --kernel --syscall --all --channel channelk
sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
sudo lttng enable-event --userspace --all --channel channelu
sudo lttng add-context -u -t vtid -t procname
sudo lttng remove-trigger trig_reboot
sudo lttng add-trigger --name=trig_reboot \
         --condition=event-rule-matches --type=kernel:syscall:entry \
         --name=reboot \
         --action=snapshot-session snapshot-trace-session \
         --rate-policy=once-after:1

# start & list info
sudo lttng start
sudo lttng list snapshot-trace-session
sudo lttng list-triggers

# test it...
sudo reboot

#=== reconnect and Nothing :(
$ ls -alu /application/trace/
drwxr-xr-x    2 u  u       4096 May 15  2024 .
drwxr-xr-x   10 u  u       4096 May 15  2024 ..


___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] Capturing snapshot on kernel panic

2024-05-15 Thread Damien Berget via lttng-dev
Good day,
we have been using LTTng successfully to capture snapshots on user-defined
tracepoints and it has proved invaluable for debugging our issues. Thanks to
all the contributors of this project!

We'd like to know if it would be possible to trigger on a kernel panic? It
might be dubiously possible, as you would still need to have the file system
working to write the results, but I should ask.

Looking at the available kernel syscalls, the "reboot" one seems like a good
candidate, however I was not able to capture a snapshot on it. I have
tested the setup below with the "--name=chdir" syscall and it works: "cd" to
a directory will create a trace. But no dice with reboot.

Would you have any suggestions?
Thanks for your help,
Cheers
Damien



# Prep output dir
mkdir /application/trace/
rm -rf /application/trace/*

# Create session
sudo lttng destroy snapshot-trace-session
sudo lttng create snapshot-trace-session --snapshot \
         --output="/application/trace/"
sudo lttng enable-channel --kernel --num-subbuf=8 channelk
sudo lttng enable-channel --userspace --num-subbuf=8 channelu

# Configure session
sudo lttng enable-event --kernel --syscall --all --channel channelk
sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
sudo lttng enable-event --userspace --all --channel channelu
sudo lttng add-context -u -t vtid -t procname
sudo lttng remove-trigger trig_reboot
sudo lttng add-trigger --name=trig_reboot \
         --condition=event-rule-matches --type=kernel:syscall:entry \
         --name=reboot \
         --action=snapshot-session snapshot-trace-session \
         --rate-policy=once-after:1

# start & list info
sudo lttng start
sudo lttng list snapshot-trace-session
sudo lttng list-triggers

# test it...
sudo reboot

#=== reconnect and Nothing :(
$ ls -alu /application/trace/
drwxr-xr-x    2 u  u       4096 May 15  2024 .
drwxr-xr-x   10 u  u       4096 May 15  2024 ..