Re: [lttng-dev] rcu_cmpxchg_pointer() documentation patch
On 2024-07-04 13:33, Ondřej Surý via lttng-dev wrote:
> Hi,
>
> looks like my git-send-email configuration is not correct and my mailserver ate the patch, so here's one created by git-format-patch...
>
> Nothing important, but it bit me today...

Merged into master, stable-0.14, stable-0.13, thanks!

Mathieu

> Ondrej
> --
> Ondřej Surý (He/Him) ond...@sury.org

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] Common Trace Format 2 (CTF2) specification and sub-specs
Hi,

Here is the status of the specification and sub-specifications:

- "CTF2‑SPEC‑2.0: Common Trace Format version 2"
  https://diamon.org/ctf/files/CTF2-SPEC-2.0.html

  This is the official CTF 2.0 spec. The URL https://diamon.org/ctf points to the latest version of the Common Trace Format spec.

- "CTF2‑DOCID‑2.0: CTF 2 document identifier format"
  https://diamon.org/ctf/files/CTF2-DOCID-2.0.html

  The CTF 2 specification refers to it.

- "CTF2-FS-1.0: Layout of a CTF 2 trace stored on a file system"
  https://diamon.org/ctf/files/CTF2-FS-1.0.html

  This document covers how Babeltrace and Trace Compass will expect the CTF2 files on the filesystem, and how Babeltrace and LTTng plan to produce them. This is by all means an "optional" specification, which means it is up to each implementation to decide whether it wants to abide by it or not.

  Philippe plans to soon release a CTF2-FS-2.0 document with pretty much the same content as version 1.0, but formatted following the CTF2‑DOCID‑2.0 specification.

We are planning to add an index of those relevant "optional" specifications within the CTF2 specification so they can easily be found.

If a new storage pattern ends up being commonly used by implementations, e.g. storing a CTF2 trace within a binary blob in OpenTelemetry [1], we can always create a new sub-specification similar to CTF2-FS to cover this.

The following specification files were deprecated by the time the CTF2 specification was finalized:

- https://diamon.org/ctf/files/CTF2-BASICATTRS-1.0.html
- https://diamon.org/ctf/files/CTF2-PMETA-1.0.html

As always, feedback is welcome!

Thanks,

Mathieu

[1] https://github.com/open-telemetry/opentelemetry-specification/issues/3979

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [lttng-relayd] is there existing cases for relayd to stream over Android usb based adb?
On 2024-06-04 11:25, Wu, Yannan wrote:
> The device is a rooted android device
>
> *On the device:*
> lttng-sessiond -d --no-kernel
> lttng create my-live-session --live
> lttng enable-event -u
> lttng start
>
> *On the host:*
> adb reverse tcp:5343 tcp:5343
>
> The adb reverse will fail for "/adb.exe: error: cannot bind to socket/"
>
> In the reversed order, if set up adb reverse from the host first and create the live session after, lttng-relayd on device cannot be started. Here is the error message:

The "reverse order" you describe is the order you need. What you are missing is to run lttng-relayd on the host and to forward both ports 5342 *and* 5343.

You will also need to either override the target URLs for the live control and data ports to prevent sessiond from auto-spawning a relayd, or forward the live viewer port as well through adb (5344).

Overall:

* First on the Host:

  lttng-relayd
  adb reverse tcp:5342 tcp:5342 # control port
  adb reverse tcp:5343 tcp:5343 # data port
  adb reverse tcp:5344 tcp:5344 # live viewer port

* Then on the Android Device:

  lttng-sessiond -d --no-kernel
  lttng create my-live-session --live --ctrl-url=tcp://localhost:5342 --data-url=tcp://localhost:5343
  lttng enable-event -u
  lttng start

The reason why the relayd auto-spawn needs to be prevented is because the "lttng create" command line attempts to connect to the localhost relayd as a viewer (default port tcp 5344). So if you don't forward this port as well through adb, the sessiond will always try to auto-spawn a relayd which conflicts with your forwarded ports on the Android device.

Technically either forwarding port 5344 or specifying control/data URL override is sufficient to prevent the relayd auto-spawn, but I'd recommend doing both if it is possible.

Thanks,

Mathieu

> /PERROR - 15:23:30.915938387 [9813/9829]: Failed to bind socket: Address already in use (in relay_socket_create() at /src/VodkaLttngTool/build/private/source/src/bin/lttng-relayd/main.c:1036)/
> /Error: Health error occurred in relay_thread_listener/
> /Error: A file descriptor leak has been detected: 1 tracked file descriptors are still being tracked/

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [lttng-relayd] is there existing cases for relayd to stream over Android usb based adb?
Hi Amanda,

For each of the 4 commands described below, please clarify on which device they are executed, whether on the Android device or on the Development device.

Please make sure to follow to the letter the commands proposed by Kienan: in the correct order, and on the appropriate device.

Thanks,

Mathieu

On 2024-06-02 22:55, Wu, Yannan via lttng-dev wrote:
> Yes. My test command is like below:
>
> 1. lttng-sessiond --d --no-kernel
> 2. yannanwu@ue91e96f2951b5c:~/trees/lttng_test_run$ lttng create my-user-space-live-session --live
>    Live session my-user-space-live-session created.
>    Traces will be output to tcp4://127.0.0.1:5342/ [data: 5343]
>    Live timer interval set to 100 us
> 3. After this, I could "ps -Ax|grep lttng" and see lttng-relayd started. But once I start adb reverse, it fails because it cannot bind to the socket.
> 4. In the other order, if I start adb reverse first and lttng-create later, lttng-create will not fail but lttng-relayd is not started. Manually starting lttng-relayd also fails because it cannot bind to the socket.
>
> Amanda
>
> *From:* Kienan Stewart
> *Sent:* Friday, May 31, 2024 3:12:16 AM
> *To:* Wu, Yannan; lttng-dev@lists.lttng.org
> *Subject:* RE: [EXTERNAL] [lttng-dev] [lttng-relayd] is there existing cases for relayd to stream over Android usb based adb?
>
> Hi Amanda,
>
> I'd like to confirm my understanding of the situation.
>
> Android device
> - Running lttng-sessiond with one or more configured sessions
>
> Development device
> - Connected to the android device over usb using adb
>
> You want the data captured on the android device to be streamed via the usb connection rather than the other networks on the android device.
>
> Could you expand on the commands you used to set up the tracing sessions and relay, and where each of those commands were run?
> It sounds to me like you might want to be doing something like the following:
>
> (Development device) Start lttng-relayd:
> - tcp://0.0.0.0:5342 and :5343 will be bound on the development device
> - tcp://127.0.0.1:5344 will be available for the live reader
>
> (Development device) Create the reverses for the following ports: 5342 and 5343
> - At this point :5342 and :5343 should be available on the android device and reach the relayd running on the development device
>
> (Android device) Start lttng-sessiond
>
> (Android device) Create session(s): `lttng create -U tcp://localhost/`
> - Using `-U/--set-url`, no relayd will be spawned on the android device
>
> (Android device) Start session(s)
>
> This setup should have the relayd running on the development device and writing the traces there and/or viewing them with a live viewer. On the android device, the UST applications (if any) will connect to the local sessiond and consumers, which will shuttle the information over :5342 and :5343 to the developer device via the reverse sockets.
>
> Please note that I didn't have time to test this, so there might be some mistakes. As I requested above, clear details of the exact commands you use for the tracing setup would be very helpful to have the clearest understanding of what you're doing.
>
> hope this helps,
> kienan
>
> On 5/30/24 1:53 AM, Wu, Yannan via lttng-dev wrote:
>> Hihi, there,
>>
>> I am currently working on enabling lttng live mode over android usb adb. Here is the situation: while debugging some network related issues, we don't want the trace data to be streamed via the network, which would cause extra load on the system being profiled. So we chose to connect lttng-relayd with adb via port forwarding so that the data is "forwarded" to the host.
>>
>> *Here is the set up and the problem:*
>> for the device: adb reverse tcp:5342 tcp:5342; adb reverse tcp:5343 tcp:5343; adb reverse tcp:5344 tcp:5344
>> Then starting up lttng with --live enabled.
>>
>> *What is expected:* lttng starts streaming to the localhost.
>> *What is seen:* lttng-relayd fails to start because it cannot bind to the socket.
>>
>> *The cause of this issue:* both adb reverse and lttng-relayd need to bind to the same socket, so they conflict with each other.
>>
>> So what I want to ask is: for embedded system use cases, has anyone on the team successfully streamed trace data in live mode to the host over usb based adb? If not, any idea or suggestion on how to move forward?
>>
>> Amanda

--
Mathieu Desnoyers
EfficiOS Inc
Re: [lttng-dev] [PATCH] Fix mm_vmscan_lru_isolate tracepoint for RHEL 9.4 kernel
On 2024-05-17 12:04, Kienan Stewart via lttng-dev wrote:
> Hi Martin,
>
> thanks for the patch. I changed the version range slightly. The RHEL kernel 5.14.0-427.13.1 still has the `isolate_mode` parameter in the `mm_vmscan_lru_isolate` tracepoint; it was only removed in 5.14.0-427.16.1.
>
> I also forward ported the patch to the master branch.
>
> The updated patches will be reviewed at: https://review.lttng.org/q/topic:%22buildfix-el9.4%22

Merged into lttng-modules master and stable-2.13, thanks!

Mathieu

> thanks,
> kienan
>
> On 5/17/24 10:30 AM, Martin Hicks via lttng-dev wrote:
>> Redhat has moved to using the format first found in the 6.7 kernel for the mm_vmscan_lru_isolate tracepoint.
>>
>> Signed-off-by: Martin Hicks
>> ---
>>  include/instrumentation/events/mm_vmscan.h | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/instrumentation/events/mm_vmscan.h b/include/instrumentation/events/mm_vmscan.h
>> index ea6f4b7..49a9eae 100644
>> --- a/include/instrumentation/events/mm_vmscan.h
>> +++ b/include/instrumentation/events/mm_vmscan.h
>> @@ -369,7 +369,9 @@ LTTNG_TRACEPOINT_EVENT_MAP(mm_shrink_slab_end,
>>  )
>>  #endif
>> -#if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(6,7,0))
>> +#if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(6,7,0) || \
>> +	LTTNG_RHEL_KERNEL_RANGE(5,14,0,427,0,0, 5,15,0,0,0,0))
>> +
>>  LTTNG_TRACEPOINT_EVENT(mm_vmscan_lru_isolate,
>>  	TP_PROTO(int classzone_idx,

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
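To make the version-range discussion above more concrete, here is a minimal sketch of a distribution kernel range check. It is not the lttng-modules implementation: the `demo_` names are hypothetical, the six-number layout simply mirrors the arguments visible in the patch, and the range is assumed to be half-open (lower bound included, upper bound excluded). The lower bound 5.14.0-427.16.1 is taken from Kienan's remark about where `isolate_mode` was removed; the exact bound used in the merged patch is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of a RHEL kernel version such as 5.14.0-427.16.1. */
struct demo_rhel_version {
	int v[6];	/* major, minor, patch, rhel_major, rhel_minor, rhel_patch */
};

/* Lexicographic comparison: returns <0, 0 or >0, like strcmp(). */
static int demo_version_cmp(const struct demo_rhel_version *a,
		const struct demo_rhel_version *b)
{
	for (int i = 0; i < 6; i++) {
		if (a->v[i] != b->v[i])
			return a->v[i] < b->v[i] ? -1 : 1;
	}
	return 0;
}

/* Assumption: the range is half-open, [low, high). */
static bool demo_in_range(const struct demo_rhel_version *cur,
		const struct demo_rhel_version *low,
		const struct demo_rhel_version *high)
{
	assert(demo_version_cmp(low, high) <= 0);
	return demo_version_cmp(cur, low) >= 0 &&
		demo_version_cmp(cur, high) < 0;
}
```

With a range of [5.14.0-427.16.1, 5.15.0-0.0.0), `demo_in_range()` rejects 5.14.0-427.13.1 (which still has `isolate_mode`) and accepts 5.14.0-427.16.1, which is the distinction the adjusted range is meant to capture.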
Re: [lttng-dev] Capturing snapshot on kernel panic
he time
> it enters.
>
> While this doesn't necessarily help your original question of panics, if
> you want to snapshot before shutdown or reboot and are using systemd,
> it's possible to leave a script or binary in a known directory so that
> it's invoked prior to the rest of the shutdown sequence[4].
>
> [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
> [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
> [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
> [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
>
> hope this helps,
> kienan
>
>> Would you have any suggestions?
>> Thanks for your help,
>> Cheers
>> Damien
>>
>>
>>
>> # Prep output dir
>> mkdir /application/trace/
>> rm -rf /application/trace/*
>>
>> # Create session
>> sudo lttng destroy snapshot-trace-session
>> sudo lttng create snapshot-trace-session --snapshot --output="/application/trace/"
>> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
>> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
>>
>> # Configure session
>> sudo lttng enable-event --kernel --syscall --all --channel channelk
>> sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
>> sudo lttng enable-event --userspace --all --channel channelu
>> sudo lttng add-context -u -t vtid -t procname
>> sudo lttng remove-trigger trig_reboot
>> sudo lttng add-trigger --name=trig_reboot \
>> --condition=event-rule-matches --type=kernel:syscall:entry \
>> --name=reboot\
>> --action=snapshot-session snapshot-trace-session \
>> --rate-policy=once-after:1
>>
>> # start & list info
>> sudo lttng start
>> sudo lttng list snapshot-trace-session
>> sudo lttng list-triggers
>>
>> # test it...
>> sudo reboot
>>
>> #=== reconnect and Nothing :(
>> $ ls -alu /application/trace/
>> drwxr-xr-x 2 u u 4096 May 15 2024 .
>> drwxr-xr-x 10 u u 4096 May 15 2024 ..

--
*Damien Berget*

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
[lttng-dev] [RELEASE] LTTng-modules 2.13.13 and 2.12.17 (Linux kernel tracer)
Hi,

This is a stable release announcement for the LTTng kernel tracer, an out-of-tree kernel tracer for the Linux kernel.

The LTTng project provides low-overhead, correlated userspace and kernel tracing on Linux. Its use of the Common Trace Format and a flexible control interface allows it to fulfill various workloads.

* New in these releases:

- LTTng-modules 2.13.13:

  - Introduce support for Linux v6.9.

  - Remove unused duplicated code, add missing static to function definitions, and add missing includes for function declarations. These issues were observed when building against recent kernels with newer toolchains. We plan to adapt our CI to add jobs that will report warnings as errors when building lttng-modules against recent kernels with a recent toolchain so we can catch and fix those warnings earlier in the future.

- In both LTTng-modules 2.12.17 and 2.13.13:

  - Fix incorrect get_pfnblock_flags_mask prototype which did not match upstream after upstream commit 535b81e209219 (v5.9). Fix the prototype mismatch detection code as well. This affects the event mm_page_alloc_extfrag which uses get_pageblock_migratetype(). Note that because the kernel macro get_pageblock_migratetype was also updated to pass 3 parameters to get_pfnblock_flags_mask as its kernel prototype was updated to expect three parameters, it does not matter that the lttng-modules wrapper expects 4 parameters and provides those 4 parameters to the kernel function. This issue should therefore not affect the runtime behavior.

  - Instrumentation updates to support EL 8.4+.

  - Instrumentation updates for RHEL kernels.

  - Instrumentation updates to the timer subsystem to adapt to changes backported in the 4.19 stable kernels.
* Detailed change logs:

2024-05-13 (National Leprechaun Day) LTTng modules 2.13.13
        * splice wrapper: Fix missing declaration
        * page alloc wrapper: Fix get_pfnblock_flags_mask prototype
        * lttng probe: include events-internal.h
        * syscalls: Remove unused duplicated code
        * statedump: Add missing events-internal.h include
        * lttng-events: Add missing static
        * event notifier: Add missing static
        * context callstack: Add missing static
        * lttng-clock: Add missing lttng/events-internal.h include
        * lttng-calibrate: Add missing static and include
        * lttng-bytecode: Remove dead code
        * lttng-abi: Add missing static to function definitions
        * ring buffer: Add missing static to function definitions
        * blkdev wrapper: Fix constness warning
        * Fix: timer_expire_entry changed in 4.19.312
        * Fix: dev_base_lock removed in linux 6.9-rc1
        * Fix: mm_compaction_migratepages changed in linux 6.9-rc1
        * Fix: ASoC add component to set_bias_level events in linux 6.9-rc1
        * Fix: ASoC snd_doc_dapm on linux 6.9-rc1
        * Fix: build kvm probe on EL 8.4+
        * Fix: support ext4_journal_start on EL 8.4+
        * Fix: correct RHEL range for kmem_cache_free define

2024-05-13 (National Leprechaun Day) 2.12.17
        * page alloc wrapper: Fix get_pfnblock_flags_mask prototype
        * Fix: timer_expire_entry changed in 4.19.312
        * Fix: build kvm probe on EL 8.4+
        * Fix: support ext4_journal_start on EL 8.4+
        * Fix: correct RHEL range for kmem_cache_free define

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [PATCH urcu] fix: handle EINTR correctly in get_cpu_mask_from_sysfs
On 2024-05-02 10:32, Michael Jeanson wrote:
> On 2024-05-02 09:54, Mathieu Desnoyers wrote:
>> On 2024-05-01 19:42, Benjamin Marzinski via lttng-dev wrote:
>>> If the read() in get_cpu_mask_from_sysfs() fails with EINTR, the code is supposed to retry, but the while loop condition has (bytes_read > 0), which is false when read() fails with EINTR. The result is that the code exits the loop, having only read part of the string.
>>>
>>> Use (bytes_read != 0) in the while loop condition instead, since the (bytes_read < 0) case is already handled in the loop.
>>
>> Thanks for the fix! It is indeed the right thing to do. I would like to integrate this fix into the librseq and libside projects as well though, but I notice that the copy in liburcu is LGPLv2.1 whereas the copies in librseq and libside are MIT.
>>
>> Michael, should we first relicense the liburcu src/compat-smp.h implementation to MIT so it matches the license of the copies in librseq and libside?
>
> Sure, please go ahead.

For the record, we also have a copy of this code in lttng-ust, also under MIT license. So liburcu's copy is the only outlier there.

Thanks,

Mathieu

>> Thanks,
>>
>> Mathieu
>>
>>> Signed-off-by: Benjamin Marzinski
>>> ---
>>>  src/compat-smp.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/compat-smp.h b/src/compat-smp.h
>>> index 31fa979..075a332 100644
>>> --- a/src/compat-smp.h
>>> +++ b/src/compat-smp.h
>>> @@ -164,7 +164,7 @@ static inline int get_cpu_mask_from_sysfs(char *buf, size_t max_bytes, const cha
>>>  		total_bytes_read += bytes_read;
>>>  		assert(total_bytes_read <= max_bytes);
>>> -	} while (max_bytes > total_bytes_read && bytes_read > 0);
>>> +	} while (max_bytes > total_bytes_read && bytes_read != 0);
>>>  	/*
>>>  	 * Make sure the mask read is a null terminated string.

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [PATCH urcu] fix: handle EINTR correctly in get_cpu_mask_from_sysfs
On 2024-05-01 19:42, Benjamin Marzinski via lttng-dev wrote:
> If the read() in get_cpu_mask_from_sysfs() fails with EINTR, the code is supposed to retry, but the while loop condition has (bytes_read > 0), which is false when read() fails with EINTR. The result is that the code exits the loop, having only read part of the string.
>
> Use (bytes_read != 0) in the while loop condition instead, since the (bytes_read < 0) case is already handled in the loop.

Thanks for the fix! It is indeed the right thing to do. I would like to integrate this fix into the librseq and libside projects as well though, but I notice that the copy in liburcu is LGPLv2.1 whereas the copies in librseq and libside are MIT.

Michael, should we first relicense the liburcu src/compat-smp.h implementation to MIT so it matches the license of the copies in librseq and libside?

Thanks,

Mathieu

> Signed-off-by: Benjamin Marzinski
> ---
>  src/compat-smp.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/compat-smp.h b/src/compat-smp.h
> index 31fa979..075a332 100644
> --- a/src/compat-smp.h
> +++ b/src/compat-smp.h
> @@ -164,7 +164,7 @@ static inline int get_cpu_mask_from_sysfs(char *buf, size_t max_bytes, const cha
>  		total_bytes_read += bytes_read;
>  		assert(total_bytes_read <= max_bytes);
> -	} while (max_bytes > total_bytes_read && bytes_read > 0);
> +	} while (max_bytes > total_bytes_read && bytes_read != 0);
>  	/*
>  	 * Make sure the mask read is a null terminated string.

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
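For readers who only see the one-line hunk, the following is a minimal, self-contained sketch of the loop shape the patch is about. It is not the liburcu function: `read_retry_eintr` is a hypothetical helper written for illustration, and only the loop condition and the EINTR handling are meant to match the description in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the liburcu function): read up to max_bytes
 * from fd into buf, retrying when read() is interrupted by a signal.
 * Returns the number of bytes read, or -1 on a non-EINTR error.
 */
static ssize_t read_retry_eintr(int fd, char *buf, size_t max_bytes)
{
	size_t total_bytes_read = 0;
	ssize_t bytes_read;

	do {
		bytes_read = read(fd, buf + total_bytes_read,
				max_bytes - total_bytes_read);
		if (bytes_read < 0) {
			if (errno == EINTR)
				continue;	/* Interrupted: go re-check the loop condition. */
			return -1;		/* Any other error is reported to the caller. */
		}
		total_bytes_read += (size_t) bytes_read;
		assert(total_bytes_read <= max_bytes);
		/*
		 * After "continue", bytes_read is -1. With "bytes_read != 0" the
		 * loop runs again; with "bytes_read > 0" it would exit here and
		 * return a partially read buffer.
		 */
	} while (max_bytes > total_bytes_read && bytes_read != 0);

	return (ssize_t) total_bytes_read;
}
```

The point to notice is that `continue` inside a do/while jumps to the condition, so the condition itself has to tolerate the negative `bytes_read` left behind by the interrupted call.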
[lttng-dev] [RELEASE] LTTng-UST 2.12.10 and 2.13.8 (Linux user-space tracer)
LTTng-UST, the Linux Trace Toolkit Next Generation Userspace Tracer, is a low-overhead application tracer. The library "liblttng-ust" enables tracing of applications and libraries.

New in both 2.12.10 and 2.13.8:

* Add close_range wrapper to liblttng-ust-fd.so

  GNU libc 2.34 implements a new close_range symbol which is used by the ssh client and other applications to close all file descriptors, including those which do not belong to the application.

  Override this symbol to prevent the application from closing file descriptors actively used by lttng-ust.

* Fix: libc wrapper: use initial-exec for malloc_nesting TLS

  Use the initial-exec TLS model for the malloc_nesting nesting guard variable to ensure that the GNU libc implementation of the TLS access doesn't trigger infinite recursion by calling the memory allocator wrapper functions, which can happen with global-dynamic.

  This fixes a liblttng-ust-libc-wrapper.so regression on recent Fedora distributions.

* lttng-ust(3): Fix wrong len_type for sequence

  The `len_type' of a sequence field must be an unsigned integer type. Some examples provided in the man page incorrectly used a signed integer type, which compiles correctly but results in errors while decoding.

New in 2.13.8:

* ust-tracepoint-event: Add static check of sequences length type

  Add a compile-time check to validate that unsigned types are used for the length field of sequences.

Detailed change logs:

2024-04-19 (National Garlic Day) lttng-ust 2.13.8
        * Add close_range wrapper to liblttng-ust-fd.so
        * ust-tracepoint-event: Add static check of sequences length type
        * lttng-ust(3): Fix wrong len_type for sequence
        * Fix: libc wrapper: use initial-exec for malloc_nesting TLS

2024-04-19 (National Garlic Day) lttng-ust 2.12.10
        * Add close_range wrapper to liblttng-ust-fd.so
        * lttng-ust(3): Fix wrong len_type for sequence
        * Fix: libc wrapper: use initial-exec for malloc_nesting TLS

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
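As an illustration of what a compile-time check on a sequence length type can look like, here is a minimal C11 sketch. It is not the check added to lttng-ust: the `DEMO_` macro names are hypothetical and the technique shown (casting -1 to the type and comparing against zero) is just one common way to detect signedness at build time.

```c
#include <stdint.h>

/*
 * Hypothetical macro (not the lttng-ust one): true when the integer
 * type cannot represent negative values. Converting -1 to an unsigned
 * type yields its maximum value, which compares greater than zero.
 */
#define DEMO_IS_UNSIGNED_TYPE(type)	((type) -1 > (type) 0)

/* Fails the build, with a readable message, if the length type is signed. */
#define DEMO_CHECK_SEQUENCE_LEN_TYPE(type)				\
	_Static_assert(DEMO_IS_UNSIGNED_TYPE(type),			\
		"sequence length type must be an unsigned integer")

DEMO_CHECK_SEQUENCE_LEN_TYPE(unsigned int);	/* Accepted. */
DEMO_CHECK_SEQUENCE_LEN_TYPE(uint16_t);		/* Accepted. */
/* DEMO_CHECK_SEQUENCE_LEN_TYPE(int); would stop compilation. */
```

Catching the mistake at build time is preferable to the situation described above, where a signed length type compiles fine and only shows up as an error when the trace is decoded.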
Re: [lttng-dev] Software Heritage archival notification for git.liburcu.org
On 2024-04-15 10:20, Michael Jeanson via lttng-dev wrote:
> On 2024-04-14 20:39, Paul Wise wrote:
>> On Thu, 2024-04-11 at 13:45 -0400, Michael Jeanson wrote:
>>> I see no issues with this, thanks for the heads-up.
>>
>> PS: I note that git.liburcu.org and git.lttng.org seem to have identical contents. I wonder if SWH should be archiving just one of them or if we should archive both just in case they get split up?
>
> At the moment 'git.liburu.org' is just a CNAME for 'git.lttng.org', we

Typo: git.liburcu.org

Thanks,

Mathieu

> don't have plans to split them up so far.

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] Compile fix for urcu-bp.c
Hi,

There are a few things missing before I can take this patch:

- Missing commit message describing the issue,
- Missing "Signed-off-by" tag.

Thanks!

Mathieu

On 2024-03-29 10:06, Duncan Sands via lttng-dev wrote:
> --- src/urcu-bp.c
> +++ src/urcu-bp.c
> @@ -409,7 +409,7 @@ void expand_arena(struct registry_arena *arena)
>  			new_chunk_size_bytes, 0);
>  		if (new_chunk != MAP_FAILED) {
>  			/* Should not have moved. */
> -			assert(new_chunk == last_chunk);
> +			urcu_posix_assert(new_chunk == last_chunk);
>  			memset((char *) last_chunk + old_chunk_size_bytes, 0,
>  				new_chunk_size_bytes - old_chunk_size_bytes);
>  			last_chunk->capacity = new_capacity;

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
[lttng-dev] [RELEASE] LTTng-modules 2.12.16 and 2.13.12 (Linux kernel tracer)
Hi,

This is a release announcement for the currently maintained LTTng-modules Linux kernel tracer stable branches.

* New and noteworthy in these releases:

Linux kernel v6.8 is now supported by LTTng modules 2.13.12. If you need support for recent kernels (v5.18+), you will need to upgrade to a recent LTTng-modules 2.13.x.

Both releases correct issues with the detection of SLE kernel version ranges.

A compilation fix for RHEL 9.3 kernels is present in v2.13.12.

Feedback is welcome!

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-03-21 (National Common Courtesy Day) LTTng modules 2.13.12
        * docs: Add supported versions and fix-backport policy
        * docs: Add links to project resources
        * Fix: Correct minimum version in jbd2 SLE kernel range
        * Fix: Handle recent SLE major version codes
        * Fix: build on sles15sp4
        * Compile fixes for RHEL 9.3 kernels
        * Fix: ext4_discard_preallocations changed in linux 6.8.0-rc3
        * Fix: btrfs_get_extent flags and compress_type changed in linux 6.8.0-rc1
        * Fix: btrfs_chunk tracepoints changed in linux 6.8.0-rc1
        * Fix: strlcpy removed in linux 6.8.0-rc1
        * Fix: timer_start changed in linux 6.8.0-rc1
        * Fix: sched_stat_runtime changed in linux 6.8.0-rc1

2024-03-21 (National Common Courtesy Day) 2.12.16
        * fix: lttng-probe-kvm-x86-mmu build with linux 6.6
        * docs: Add supported versions and fix-backport policy
        * docs: Add links to project resources
        * Fix: Correct minimum version in jbd2 SLE kernel range
        * Fix: Handle recent SLE major version codes
        * Fix: build on sles15sp4

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [PATCH] coredump debugging: add a tracepoint to report the coredumping
On 2024-02-23 09:26, Steven Rostedt wrote:
> On Mon, 19 Feb 2024 13:01:16 -0500 Mathieu Desnoyers wrote:
>> Between "sched_process_exit" and "sched_process_free", the task can still be observed by a trace analysis looking at sched and signal events: it's a zombie at that stage.
>
> Looking at the history of this tracepoint, it was added in 2008 by commit 0a16b60758433 ("tracing, sched: LTTng instrumentation - scheduler"). Hmm, LLTng? I wonder who the author was?

[ common typo: LLTng -> LTTng ;-) ]

> Author: Mathieu Desnoyers
>
> :-D
>
> Mathieu, I would say it's your call on where the tracepoint can be located. You added it, you own it!

Wow! that's now 16 years ago :)

I've checked with Matthew Khouzam (maintainer of Trace Compass), which cares about this tracepoint, and we have not identified any significant impact of moving it on its model of the scheduler, other than slightly changing its timing.

I've also checked quickly in lttng-analyses and have not found any code that cares about its specific placement.

So I would say go ahead and move it earlier in do_exit(), it's fine by me.

If you are interested in a bit of archeology, "sched_process_free" originated from my ltt-experimental 0.1.99.13 kernel patch against 2.6.12-rc4-mm2 back in September 2005 (that's 19 years ago). It was a precursor to the LTTng 0.x kernel patchset.
https://lttng.org/files/ltt-experimental/patch-2.6.12-rc4-mm2-ltt-exp-0.1.99.13.gz Index: kernel/exit.c === --- a/kernel/exit.c (.../trunk/kernel/linux-2.6.12-rc4-mm2) (revision 41) +++ b/kernel/exit.c (.../branches/mathieu/linux-2.6.12-rc4-mm2) (revision 41) @@ -4,6 +4,7 @@ * Copyright (C) 1991, 1992 Linus Torvalds */ +#include #include #include #include @@ -55,6 +56,7 @@ static void __unhash_process(struct task } REMOVE_LINKS(p); + trace_process_free(p->pid); } void release_task(struct task_struct * p) @@ -832,6 +834,8 @@ fastcall NORET_TYPE void do_exit(long co } exit_mm(tsk); + trace_process_exit(tsk->pid); + exit_sem(tsk); __exit_files(tsk); __exit_fs(tsk); This was a significant improvement over the prior LTT which only had the equivalent of "sched_process_exit", which caused issues with the Linux scheduler model in LTTV due to zombie processes. Here is where it appeared in LTT back in 1999: http://www.opersys.com/ftp/pub/LTT/TracePackage-0.9.0.tgz patch-ltt-2.2.13-991118 diff -urN linux/kernel/exit.c linux-2.2.13/kernel/exit.c --- linux/kernel/exit.c Tue Oct 19 20:14:02 1999 +++ linux-2.2.13/kernel/exit.c Sun Nov 7 23:49:17 1999 @@ -14,6 +14,8 @@ #include #endif +#include + #include #include #include @@ -386,6 +388,8 @@ del_timer(>real_timer); end_bh_atomic(); + TRACE_PROCESS(TRACE_EV_PROCESS_EXIT, 0, 0); + lock_kernel(); fake_volatile: #ifdef CONFIG_BSD_PROCESS_ACCT And it was moved to its current location (after exit_mm()) a bit later (2001): http://www.opersys.com/ftp/pub/LTT/TraceToolkit-0.9.5pre2.tgz Patches/patch-ltt-linux-2.4.5-vanilla-010909-1.10 diff -urN linux/kernel/exit.c /ext2/home/karym/kernel/linux-2.4.5/kernel/exit.c --- linux/kernel/exit.c Fri May 4 17:44:06 2001 +++ /ext2/home/karym/kernel/linux-2.4.5/kernel/exit.c Wed Jun 20 12:39:24 2001 @@ -14,6 +14,8 @@ #include #endif +#include + #include #include #include @@ -439,6 +441,8 @@ #endif __exit_mm(tsk); + TRACE_PROCESS(TRACE_EV_PROCESS_EXIT, 0, 0); + lock_kernel(); sem_exit(); 
__exit_files(tsk); So this sched_process_exit placement was actually decided by Karim Yaghmour back in the LTT days (2001). I don't think he will mind us moving it around some 23 years later. ;) Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] New TLS usage in libgcc_s.so.1, compatibility impact
On 2024-01-15 14:42, Florian Weimer wrote:
> * Mathieu Desnoyers:

[...]

> General use of lttng should be fine, I think, only the malloc wrapper has this problem.

The purpose of the nesting counter TLS variable in the malloc wrapper is to catch situations like this where a global-dynamic TLS access (or any unexpected memory access done as a side-effect from calling libc) from within LTTng-UST instrumentation would internally attempt to call recursively into the malloc wrapper. In that nested case, we skip the instrumentation and call the libc function directly.

I agree with your conclusion that only this nesting counter gating variable actually needs to be initial-exec. But moving all TLS variables used by lttng-ust from global-dynamic to initial-exec is tricky, because a prior attempt to do so introduced regressions in use-cases where lttng-ust was dlopen'd by Java or Python, AFAIU situations where the runtimes were already using most of the extra memory pool for dlopen'd libraries initial-exec variables, causing dlopen of lttng-ust to fail.

> Oh, right, that makes it quite difficult. Could you link a private copy of the libraries into the wrapper that uses initial-exec TLS?

Unfortunately not easily, because by design LTTng-UST is meant to be a singleton per-process. Changing this would have far-reaching impacts on interactions with the LTTng-UST tracepoint instrumentation, as well as impacts on synchronization between the LTTng-UST agent thread and application calling fork/clone. Also AFAIR, the LTTng session daemon (at least until recently) does not expect multiple concurrent registrations from a given process.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
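The nesting-counter idea described in this message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the LTTng-UST wrapper: `demo_malloc`, `demo_instrument` and the counter names are hypothetical, the wrapper is an ordinary function rather than an interposed `malloc` symbol, and the `tls_model` attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Nesting guard. With the initial-exec TLS model the variable sits at a
 * fixed offset from the thread pointer, so accessing it does not need
 * to call into the dynamic loader (and therefore never allocates).
 */
static __thread int demo_malloc_nesting
	__attribute__((tls_model("initial-exec")));

/* Counts instrumented allocations (stand-in for emitting a tracepoint). */
static int demo_traced_allocations;

/* Hypothetical wrapper, standing in for an interposed malloc(). */
static void *demo_malloc(size_t size);

/* Stand-in for instrumentation that itself allocates memory. */
static void demo_instrument(void)
{
	void *scratch = demo_malloc(16);	/* Nested call: must not recurse. */

	demo_traced_allocations++;
	free(scratch);
}

static void *demo_malloc(size_t size)
{
	void *ptr;

	demo_malloc_nesting++;
	ptr = malloc(size);			/* Always forward to libc. */
	if (demo_malloc_nesting == 1)
		demo_instrument();		/* Outermost call only. */
	demo_malloc_nesting--;
	assert(demo_malloc_nesting >= 0);
	return ptr;
}
```

A single call to `demo_malloc()` increments `demo_traced_allocations` once, even though the instrumentation allocates memory itself: the nested call sees a nesting level of 2 and goes straight to libc. The guard only works if reading the counter cannot itself re-enter the allocator, which is why its TLS model matters.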
Re: [lttng-dev] New TLS usage in libgcc_s.so.1, compatibility impact
ht stage of the test.)

In this particular case, we can also paper over the test failure in glibc by not calling free at all because the argument is a null pointer:

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 7b3dd9ab60..14c71cbd06 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t new_gen)
       dtv entry free it. Note: this is not AS-safe. */
    /* XXX Ideally we will at some point create a memory pool. */
-   free (dtv[modid].pointer.to_free);
+   if (dtv[modid].pointer.to_free != NULL)
+     free (dtv[modid].pointer.to_free);
    dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
    dtv[modid].pointer.to_free = NULL;

As the comment hints, we shouldn't be using malloc for TLS memory at all because it is not AS-safe, but that's a long-term change.

This change seems rather specific to this particular test case failure because it relies on libgcc_s.so.1 never using TLS before it gets unloaded.

Regarding the libgcc_s side, I'm not sure if the TLS usage there should be considered a real problem, although I'm a bit nervous about it. However, the current implementation caches one page of trampolines past the outermost nested function pointer deallocation (otherwise creating one function pointer per thread in a loop would be really expensive). It looks to me like that page is never freed, so if the thread exits even with proper unwinding (e.g., on glibc with code compiled with -fexceptions), there is a memory leak. Integration with glibc could avoid this issue, and also help with the longjmp problem, and fix setcontext/swapcontext, too.

Thanks,
Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
[lttng-dev] [RELEASE] LTTng-modules 2.12.15 and 2.13.11 (Linux kernel tracer)
The LTTng modules provide Linux kernel tracing capability to the LTTng tracer toolset.

* New and noteworthy in these releases:

Newer Linux kernels (v6.6 and v6.7) are now supported by LTTng modules 2.13.11. If you need support for recent kernels (v5.18+), you will need to upgrade to a recent LTTng-modules 2.13.x.

The "prio" context has been fixed in 2.13.11 to eliminate a crash triggered by calling a NULL pointer address when using the "prio" context (lttng add-context -k -t prio). This issue was introduced when refactoring the prio context code during the 2.13 development. The missing initialization was re-introduced, and the use of the kernel "task_prio()" symbol was entirely replaced by inlining a copy of this trivial function into lttng-modules instead.

The "built-in.sh" script, which can be used to add a link to lttng-modules within a kernel source tree to build LTTng into a Linux kernel image, has been updated to adapt to changes introduced in Linux v6.1.

A work-around to ensure that LTTng-modules works fine on CPUs and kernels with IBT support enabled has been integrated:

When the Intel IBT feature is enabled, a CPU supporting this feature validates that all indirect jumps/calls land on an ENDBR64 instruction. The kernel seals functions which are not meant to be called indirectly, which means that calling functions indirectly from their address fetched using kallsyms or kprobes triggers a crash.

Use the MSR_IA32_S_CET CET_ENDBR_EN MSR bit to temporarily disable ENDBR validation around indirect calls to kernel functions. Considering that the main purpose of this feature is to prevent ROP-style attacks, disabling the ENDBR validation temporarily around the call from a kernel module does not affect the ROP protection.

Both 2.13.11 and 2.12.15:

- Fix an issue with importing VFS namespace for Android kernels.
- Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
- Fix a hardening OOPS during validation of immediate strings in the bytecode validator when CONFIG_UBSAN_BOUNDS and/or CONFIG_FORTIFY_SOURCE are configured. It boils down to changing 0-len arrays to flexible arrays to let the toolchain know about our intent.
- Add Ubuntu Kinetic kernel ranges for jbd2 instrumentation.

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-01-10 (National Houseplant Appreciation Day) LTTng modules 2.13.11
        * Fix: Include linux/sched/rt.h for kernels v3.9 to v3.14
        * Fix: Disable IBT around indirect function calls
        * Inline implementation of task_prio()
        * Fix: prio context NULL pointer exception
        * Fix: MODULE_IMPORT_NS is introduced in kernel 5.4
        * Android: Import VFS namespace for android common kernel
        * Fix: get_file_rcu is missing in kernels < 4.1
        * fix: lookup_fd_rcu replaced by lookup_fdget_rcu in linux 6.7.0-rc1
        * fix: mm, vmscan signatures changed in linux 6.7.0-rc1
        * fix: phys_proc_id and cpu_core_id moved in linux 6.7.0-rc1
        * Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
        * Fix: bytecode validator: oops during validation of immediate string
        * fix: lttng-probe-kvm-x86-mmu build with linux 6.6
        * fix: built-in lttng with kernel >= v6.1
        * fix: ubuntu kinetic kernel range for jdb2

2024-01-10 (National Houseplant Appreciation Day) 2.12.15
        * Fix: MODULE_IMPORT_NS is introduced in kernel 5.4
        * Android: Import VFS namespace for android common kernel
        * Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
        * Fix: bytecode validator: oops during validation of immediate string
        * fix: ubuntu kinetic kernel range for jdb2

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
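The bytecode validator item above mentions "changing 0-len arrays to flexible arrays". As a rough sketch of what that kind of change looks like in general (the structure and helper below are hypothetical, not the lttng-modules bytecode definitions): a trailing `char data[0]` can be treated by bounds-checking instrumentation as an array of size zero, whereas a C99 flexible array member tells the toolchain that the trailing storage has a run-time size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operand carrying an immediate string. */
struct imm_string {
	unsigned int len;
	char data[];	/* flexible array member (C99), instead of "char data[0]" */
};

/* Allocate the header and the string payload in a single block. */
static struct imm_string *imm_string_create(const char *s)
{
	size_t len;
	struct imm_string *op;

	assert(s != NULL);
	len = strlen(s) + 1;	/* include the terminating NUL */
	op = malloc(sizeof(*op) + len);
	if (!op)
		return NULL;
	op->len = (unsigned int) len;
	memcpy(op->data, s, len);
	return op;
}
```

The allocation pattern is the same for both declarations; only the declared intent changes, which is what hardening options need to avoid reporting accesses to `data` as out of bounds.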
[lttng-dev] [RELEASE] LTTng-UST 2.12.9 and 2.13.7 (Linux user-space tracer)
LTTng-UST, the Linux Trace Toolkit Next Generation Userspace Tracer, is a low-overhead application tracer. The library "liblttng-ust" enables tracing of applications and libraries.

* New and noteworthy in these releases:

Specific to 2.13.7, a fix for misaligned urcu reader accesses was introduced. It only applies to the lttng-ust 2.13 branch because it implements its own "lttng-ust-urcu" flavor.

Also specific to 2.13.7, "sync" vs "unsync" enablers are introduced to eliminate an O(n*m) algorithm: Eliminate iteration over unmodified enablers when synchronizing the enablers vs event state. The intent is to turn an O(m*n) algorithm (m = number of enablers, n = number of event probes) into an O(n) one when enabling many additional events while tracing is active.

Specifically in 2.12.9, the rfork() wrapper is fixed: it was not passing the flags arguments. This was fixed in a larger commit in the master and stable-2.13 branches.

Both stable branches include:

- a build system fix for documentation examples with old autoconf when used with a relative path,
- a clang warning fix around volatile qualifier on function pointers,
- a Python agent uplift to adapt to modern Python (>= 3.10),
- a fix for a possible race condition in the ustfork helper.

Enjoy!

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-01-10 (National Houseplant Appreciation Day) lttng-ust 2.13.7
        * fix: invoke MKDIR_P before changing directories
        * fix: -Wsingle-bit-bitfield-constant-conversion with clang16
        * fix: clean java inner class files in examples
        * Introduce sync vs unsync enablers
        * Fix: misaligned urcu reader accesses
        * ustfork: Fix warning about volatile qualifier
        * ustfork: Fix possible race conditions
        * Fix: tracepoint: Remove trailing \ at the end of macro
        * fix: python agent: use stdlib distutils when setuptools is installed
        * fix: python agent: install on Debian python >= 3.10
        * fix: python agent: Add a dependency on generated files
        * python: use setuptools with python >= 3.12

2024-01-10 (National Houseplant Appreciation Day) lttng-ust 2.12.9
        * fix: invoke MKDIR_P before changing directories
        * fix: clean java inner class files in examples
        * ustfork: Fix warning about volatile qualifier
        * ustfork: Fix possible race conditions
        * Fix: FreeBSD: Pass flags arguments to rfork wrapper
        * fix: python agent: use stdlib distutils when setuptools is installed
        * fix: python agent: install on Debian python >= 3.10
        * fix: python agent: Add a dependency on generated files
        * python: use setuptools with python >= 3.12

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [PATCH lttng-modules] Android: Import VFS namespace for android common kernel
On 2023-12-18 05:16, Lei wang via lttng-dev wrote:

Android GKI kernels add limitations on fs interface usage. The VFS namespace needs to be imported explicitly for lttng-modules to work.

Merged into lttng-modules master and 2.13 branches, thanks!

Mathieu

Signed-off-by: Lei wang
---
 src/wrapper/kallsyms.c | 4
 1 file changed, 4 insertions(+)

diff --git a/src/wrapper/kallsyms.c b/src/wrapper/kallsyms.c
index 97897c4..9398c83 100644
--- a/src/wrapper/kallsyms.c
+++ b/src/wrapper/kallsyms.c
@@ -113,3 +113,7 @@ unsigned long wrapper_kallsyms_lookup_name(const char *name)
 EXPORT_SYMBOL_GPL(wrapper_kallsyms_lookup_name);
 
 #endif
+
+#ifdef CONFIG_ANDROID
+MODULE_IMPORT_NS(VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver);
+#endif

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] TSAN build broken on master branch
On 9/21/23 21:21, Olivier Dion via lttng-dev wrote: On Thu, 21 Sep 2023, Ondřej Surý via lttng-dev wrote: [...] It fails with: rculfhash.c:1189:2: error: address argument to atomic operation must be a pointer to integer ('typeof (node_next)' (aka 'struct cds_lfht_node **') invalid) uatomic_or_mo(node_next, REMOVED_FLAG, CMM_RELEASE); ^~~ ../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 'uatomic_or_mo' (void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \ ^ ~~~ rculfhash.c:1440:3: error: address argument to atomic operation must be a pointer to integer ('typeof (fini_bucket_next)' (aka 'struct cds_lfht_node **') invalid) uatomic_or(fini_bucket_next, REMOVED_FLAG); ^~ ../include/urcu/uatomic/builtins-generic.h:130:2: note: expanded from macro 'uatomic_or' uatomic_or_mo(addr, mask, CMM_RELAXED) ^~ ../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 'uatomic_or_mo' (void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \ ^ ~~~ Eh I thought we fixed that. Clang is very strict about these things. You can apply the following <https://review.lttng.org/c/userspace-rcu/+/10911/1>. That ought to fix the issue until we merge the patch. Fix merged into liburcu master, thanks! Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
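For readers hitting the same Clang diagnostic: the error above comes from passing a pointer-to-pointer to an integer-only atomic builtin. One common way to express a flag OR on a pointer-sized slot is to operate on it through `uintptr_t`, as sketched below. This is only an illustration under that assumption, not necessarily what the merged liburcu fix does; `struct node`, `set_removed_flag` and `is_removed` are hypothetical names, and the cast assumes the code is built the way such low-level libraries usually are with respect to aliasing.

```c
#include <assert.h>
#include <stdint.h>

#define REMOVED_FLAG	((uintptr_t) 1)	/* hypothetical low-bit tag */

struct node {
	struct node *next;
};

/* Set a flag bit in a pointer-sized slot by operating on it as an integer. */
static void set_removed_flag(struct node **slot)
{
	assert(sizeof(uintptr_t) == sizeof(*slot));
	(void) __atomic_or_fetch((uintptr_t *) slot, REMOVED_FLAG,
			__ATOMIC_RELEASE);
}

/* Test whether a (possibly tagged) pointer value carries the flag. */
static int is_removed(struct node *ptr)
{
	return ((uintptr_t) ptr & REMOVED_FLAG) != 0;
}
```

GCC accepts the pointer form as an extension, which is why the problem only showed up with Clang builds.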
Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms
On 9/10/23 10:18, Mousa, Anas wrote: Hey Mathieu, Hi Anas, We see that upon recording a tracepoint, there are multiple stages of reserve-commit-write, where atomics and shared memory accesses take up a big part of the recording time, we're wondering, is there a "light-mode" of recording a tracepoint involving less logic or a mode which can potentially have lower latency? I've been working on the rseq(2) system call for a few years now, and this is intended to help reduce the cost of lttng-ust's ring buffer atomics on the tracing fast-path. The road ahead there is integration of rseq with lttng-ust, which did not show up on our customer feature requirements radar yet. In terms of logic involved in the lttng-ust tracepoints, I hope that my current work on "libside" will help steer away from tracepoint providers based on macros and generated code, replacing this by an efficient bytecode interpreter. This should allow me to inline many of the calls that are currently needed between the tracepoint probe provider and the lttng-ust ring buffer. Again, this is an area where I think we can have great speed improvements, but it did not show up on our customer's feature requirement radar yet. Also, are there any recent docs to share regarding tracepoint latency? There is a Polytechnique student who extensively analyzed this recently. Michel, do you have a pointer to his work ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [RFC] Deprecating RCU signal flavor
On 8/23/23 10:47, Paul E. McKenney wrote:
On Mon, Aug 21, 2023 at 11:43:32AM -0400, Mathieu Desnoyers wrote:
On 8/15/23 08:38, Mathieu Desnoyers via lttng-dev wrote:
On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:

After discussing it with Mathieu, we agree on the following 3 phases for deprecating the signal flavor:

1) liburcu-signal will be implemented in terms of liburcu-mb. The only difference between the two flavors will be the public header files, linked symbols and library name. Note that this adds a performance regression, since the implementation of liburcu-mb adds memory barriers on the reader side which are not present in the original liburcu-signal implementation.

2) Adding the deprecated attribute to every public function exposed by the liburcu-signal flavor. At this point, tests for liburcu-signal will also be removed from the project. There will be no more support for this flavor.

3) Removing the liburcu-signal flavor completely from the project.

Finally, here are my tentative release versions for each phase:

1) 0.15.0 [October 2023] (also TSAN support yay!)
2) 0.15.1
3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking change)

There is a distinction between the version number of the liburcu project (0.14) and the ABI soname for the shared objects. We may be able to do step (3) without going to 1.0.0 (I don't see removal of the urcu-signal flavor as a strong enough motivation for hitting 1.0.0 yet).

Technically speaking, given that we would be removing the entire liburcu-signal.so shared object, we would not be changing _symbols_ within an existing shared object, therefore I'm not even sure we need to bump the soname for all the other remaining shared objects.

So after merging this commit:

Phase 1 of deprecating liburcu-signal

The first phase of liburcu-signal deprecation consists of implementing it in terms of liburcu-mb. In other words, liburcu-signal is identical to liburcu-mb with the exception of the function symbols and public header files.

This is done by:

1) Removing the RCU_SIGNAL specific code in urcu.c
2) Making the RCU_MB specific code also specific to RCU_SIGNAL in urcu.c
3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use an atomic store with CMM_SEQ_CST instead of a CMM_RELAXED store with cmm_barrier() around it. We could keep the explicit barriers, but that would require adding some cmm_annotate annotations. Therefore, to be less intrusive in a public header file, simply use CMM_SEQ_CST like for the mb flavor.

I notice that an application previously built against urcu-signal with _LGPL_SOURCE defined would have to be rebuilt, which would require a soname bump of urcu-signal.

So considering that this phase 1 is not really a "drop in" replacement, I favor removing the urcu-signal flavor entirely before the next release.

Thoughts ?

The replacement is liburcu-mb, correct?

After merging this "phase 1" of the removal, I noticed that we would need to require applications built with _LGPL_SOURCE defined and using liburcu-signal to be rebuilt, which would require a major library soname bump, which I would prefer to avoid unless necessary.

Therefore, I went ahead and pushed additional commits in the master branch which completely remove liburcu-signal from the tree. As a result, the next release of liburcu will have neither the liburcu-signal header files nor its library shared objects.

I will need to change perfbook, but that should be an easy change, plus sys_membarrier() is widely available by now.

Users of liburcu-signal would be expected to migrate to liburcu-memb, which relies on membarrier to achieve similar performance, but with lower-overhead grace periods.

Thanks,
Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
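Point 3 of the quoted commit message (replacing a relaxed store surrounded by cmm_barrier() with a single CMM_SEQ_CST store) can be illustrated with the GCC/Clang `__atomic` builtins. This is a sketch under assumptions, not the liburcu code: `reader_ctr` and both function names are hypothetical, and the empty asm statement with a "memory" clobber stands in for a compiler-only barrier of the kind cmm_barrier() provides.

```c
#include <assert.h>

static unsigned long reader_ctr;	/* hypothetical reader nesting count */

/* Before: relaxed store surrounded by compiler-only barriers. */
static void read_unlock_relaxed_with_barriers(void)
{
	unsigned long v = __atomic_load_n(&reader_ctr, __ATOMIC_RELAXED);

	assert(v > 0);	/* unlock must pair with a prior lock */
	__asm__ __volatile__ ("" : : : "memory");	/* compiler barrier */
	__atomic_store_n(&reader_ctr, v - 1, __ATOMIC_RELAXED);
	__asm__ __volatile__ ("" : : : "memory");	/* compiler barrier */
}

/* After: a single sequentially consistent store. */
static void read_unlock_seq_cst(void)
{
	unsigned long v = __atomic_load_n(&reader_ctr, __ATOMIC_RELAXED);

	assert(v > 0);
	__atomic_store_n(&reader_ctr, v - 1, __ATOMIC_SEQ_CST);
}
```

Both variants leave the same value in the counter; the difference is the ordering guarantee, and the second form keeps the ordering visible in the operation itself, so tools that reason about memory orders do not need separate annotations for the barriers.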
Re: [lttng-dev] [RFC] Deprecating RCU signal flavor
On 8/15/23 08:38, Mathieu Desnoyers via lttng-dev wrote:
On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:

After discussing it with Mathieu, we agree on the following 3 phases for deprecating the signal flavor:

1) liburcu-signal will be implemented in terms of liburcu-mb. The only difference between the two flavors will be the public header files, linked symbols and library name. Note that this adds a performance regression, since the implementation of liburcu-mb adds memory barriers on the reader side which are not present in the original liburcu-signal implementation.

2) Adding the deprecated attribute to every public function exposed by the liburcu-signal flavor. At this point, tests for liburcu-signal will also be removed from the project. There will be no more support for this flavor.

3) Removing the liburcu-signal flavor completely from the project.

Finally, here are my tentative release versions for each phase:

1) 0.15.0 [October 2023] (also TSAN support yay!)
2) 0.15.1
3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking change)

There is a distinction between the version number of the liburcu project (0.14) and the ABI soname for the shared objects. We may be able to do step (3) without going to 1.0.0 (I don't see removal of the urcu-signal flavor as a strong enough motivation for hitting 1.0.0 yet).

Technically speaking, given that we would be removing the entire liburcu-signal.so shared object, we would not be changing _symbols_ within an existing shared object, therefore I'm not even sure we need to bump the soname for all the other remaining shared objects.

So after merging this commit:

Phase 1 of deprecating liburcu-signal

The first phase of liburcu-signal deprecation consists of implementing it in terms of liburcu-mb. In other words, liburcu-signal is identical to liburcu-mb with the exception of the function symbols and public header files.

This is done by:

1) Removing the RCU_SIGNAL specific code in urcu.c
2) Making the RCU_MB specific code also specific to RCU_SIGNAL in urcu.c
3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use an atomic store with CMM_SEQ_CST instead of a CMM_RELAXED store with cmm_barrier() around it. We could keep the explicit barriers, but that would require adding some cmm_annotate annotations. Therefore, to be less intrusive in a public header file, simply use CMM_SEQ_CST like for the mb flavor.

I notice that an application previously built against urcu-signal with _LGPL_SOURCE defined would have to be rebuilt, which would require a soname bump of urcu-signal.

So considering that this phase 1 is not really a "drop in" replacement, I favor removing the urcu-signal flavor entirely before the next release.

Thoughts ?

Thanks,
Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [RFC] Deprecating RCU signal flavor
On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:

After discussing it with Mathieu, we agree on the following 3 phases for deprecating the signal flavor:

1) liburcu-signal will be implemented in terms of liburcu-mb. The only difference between the two flavors will be the public header files, linked symbols and library name. Note that this adds a performance regression, since the implementation of liburcu-mb adds memory barriers on the reader side which are not present in the original liburcu-signal implementation.

2) Adding the deprecated attribute to every public function exposed by the liburcu-signal flavor. At this point, tests for liburcu-signal will also be removed from the project. There will be no more support for this flavor.

3) Removing the liburcu-signal flavor completely from the project.

Finally, here are my tentative release versions for each phase:

1) 0.15.0 [October 2023] (also TSAN support yay!)
2) 0.15.1
3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking change)

There is a distinction between the version number of the liburcu project (0.14) and the ABI soname for the shared objects. We may be able to do step (3) without going to 1.0.0 (I don't see removal of the urcu-signal flavor as a strong enough motivation for hitting 1.0.0 yet).

Technically speaking, given that we would be removing the entire liburcu-signal.so shared object, we would not be changing _symbols_ within an existing shared object, therefore I'm not even sure we need to bump the soname for all the other remaining shared objects.

Thanks,
Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] [PATCH] Fix: list lttng sub-directory in Kbuild
On 8/10/23 06:05, Richa Bharti wrote:
From: Richa Bharti

Hi!

Thanks for your patch. I'm adding Michael Jeanson and the lttng-dev mailing list in CC.

Thanks,
Mathieu

* Linux kernel>=6.1 reads sub-directory from Kbuild
* Kernel < 6.1 reads sub-directory from Makefile

Signed-off-by: Richa Bharti
---
 scripts/built-in.sh | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/scripts/built-in.sh b/scripts/built-in.sh
index f0594ec..2451230 100755
--- a/scripts/built-in.sh
+++ b/scripts/built-in.sh
@@ -14,9 +14,19 @@ KERNEL_DIR="$(readlink --canonicalize-existing "$1")"
 # Symlink the lttng-modules directory in the kernel source
 ln -sf "$(pwd)" "${KERNEL_DIR}/lttng"
 
+# Get kernel version from Makefile
+version=$(grep -m 1 VERSION ${KERNEL_DIR}/Makefile | sed 's/^.*= //g')
+patchlevel=$(grep -m 1 PATCHLEVEL ${KERNEL_DIR}/Makefile | sed 's/^.*= //g')
+kernel_version=${version}.${patchlevel}
+
 # Graft ourself to the kernel build system
 echo 'source "lttng/src/Kconfig"' >> "${KERNEL_DIR}/Kconfig"
-sed -i 's#+= kernel/#+= kernel/ lttng/#' "${KERNEL_DIR}/Makefile"
+
+if awk "BEGIN {exit !(${kernel_version} >= 6.1)}"; then
+	echo 'obj-y += lttng/' >> "${KERNEL_DIR}/Kbuild"
+else
+	sed -i 's#+= kernel/#+= kernel/ lttng/#' "${KERNEL_DIR}/Makefile"
+fi
 
 echo >&2
 echo "$0: done." >&2

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] Status of LTTng-scope and Lttng-analyses
On 7/18/23 15:27, Cook, Layne via lttng-dev wrote:

Can you tell me the status of the beta projects listed on the web site?

LTTng scope
LTTng analyses

The github projects haven't had any activity for quite a while. Have these projects been abandoned, or superseded by something else?

Hi Layne,

Thanks for your interest in those projects!

The LTTng scope beta project was an attempt at doing a significant UX redesign of Trace Compass, starting from a use-cases/user workflow perspective. We currently don't have the resources/funding/staff to work on this project further, so it has not progressed for a while. You should look at the Trace Compass and VSCode trace extension projects instead, which have a lot more activity:

https://tracecompass.org
https://github.com/eclipse-cdt-cloud/vscode-trace-extension

The LTTng analyses beta project is a set of Python scripts to analyze LTTng traces. Our original intent with that project was that EfficiOS would fund the work to create those analyses as prototypes in Python, and eventually customers would fund the rather large amount of work required to go from a prototype (slow scripts) to a production quality project (faster C++ implementation, generic state tracking module). Unfortunately, this never materialized, so this beta project has been on the back burner as well.

In recent years we have focused our efforts on the Babeltrace 2 project and on CTF2 (Common Trace Format version 2).

Feel free to have a look at Trace Compass and VSCode trace extension, and please let us know if LTTng scope and LTTng analyses fill a gap that is not covered by those other tools.

Thanks,
Mathieu

Thanks,
LC

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] Status of the RCU Red Black Tree
On 7/12/23 14:44, Uttormark, Mike via lttng-dev wrote:

What became of the red-black tree effort? I see it in the git repo, 10 years old. It never made it onto master. What would it take to get it onto master and into a release branch?

Hi Mike,

There are a few things that are in the way of merging it into a liburcu release, namely:

* An end user with a clearly defined use-case to allow defining a solid API,
* Validation that those use-cases are not better covered by some variation of my RCU Judy Array prototype instead, ref.: https://github.com/urcu/userspace-rcu/tree/urcu/rcuja-simple-int
* More testing, both within the liburcu project and in terms of use of the API from an application perspective,
* Funding for all that work, allowing us to prioritize this effort with respect to our various other projects.

Thanks for your interest in the liburcu Red-Black Tree prototype! Please don't hesitate to reach out to EfficiOS if HPE would like to explore supporting this project.

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] Fwd: lttng issue
Re: [lttng-dev] [PATCH lttng-modules 0/1] Introduce configure script to describe changes in linux kernel interface
On 7/4/23 14:39, Roxana Nicolescu wrote: Hi, Thanks a lot for your feedback. I realize I did not say the reason why I did not go for LTTNG_UBUNTU_KERNEL_RANGE. We deliver a bunch of different derivatives (inherited from the main kernel), each with its own version and it's impossible to use LTTNG_UBUNTU_KERNEL_RANGE alone. Derivatives in the same cycle don't have the same version number, so I cannot rely on the version alone to determine when a change has happened. For example these are some kernels we released last cycle: - linux (main kernel): 5.19.0-46 - linux-kvm: 5.19.0-1026 - linux-lowlatency: 5.19.0-1028 As you can see, linux-kvm and linux-lowlatency versions are not the same, and linux-lowlatency from 2 months ago version version number coincides with linux-kvm from now, but they don't match the same base. I hope that explains it. Initially I thought about exposing the version of the main kernel in the kernel headers that can be later used in the module, but then I came across openvswitch and that's how I came up with the idea of an initial configure step. But I totally understand if you think this is not worth it. LTTng modules use the UTS_UBUNTU_RELEASE_ABI from the Ubuntu generated/utsrelease.h kernel headers to detect tracepoint instrumentation changes. I don't understand why many kernel flavors would have the same ABI number with different ABI semantics, but I guess that's just how things are now. One way to solve this would be to detect the "-lowlatency" and "-kvm" suffixes in the string within generated/utsrelease.h UTS_RELEASE, e.g.: #define UTS_RELEASE "5.15.0-76-lowlatency" This could be done by LTTng modules by implementing a script similar to what we do for debian, fedora, rhel, and sle (see scripts/ in lttng-modules). Then we could have: * LTTNG_UBUNTU_KERNEL_RANGE for kernels where all flavors have the same kernel ABI. 
* LTTNG_UBUNTU_GENERIC_KERNEL_RANGE for generic kernels only, for situations where the kernel ABI differ between flavors, * LTTNG_UBUNTU_LOWLATENCY_KERNEL_RANGE for lowlatency kernels only, for situations where the kernel ABI differ between flavors, * LTTNG_UBUNTU_KVM_KERNEL_RANGE for kvm kernels only, for situations where the kernel ABI differ between flavors. It would all have been simpler if the UTS_UBUNTU_RELEASE_ABI would actually have been a versioned kernel ABI without different semantics across kernel flavors, but considering the current situation we will need to deal with this with scripts as we have done for other distributions. Thanks, Mathieu All the best, Roxana On 04/07/2023 20:07, Mathieu Desnoyers wrote: On 7/4/23 11:35, Michael Jeanson via lttng-dev wrote: On 2023-07-03 14:28, Roxana Nicolescu via lttng-dev wrote: This script described the changes in the linux kernel interface that affect compatibility with lttng-modules. It is introduced for a specific usecase where commit d87a7b4c77a9: "jbd2: use the correct print format" broke the interface between the kernel and lttng-module. 3 variables changed their type to tid_t (transaction, head and tid) in multiple function declarations. The lttng module was updated properly to ensure backwards compatibility by using the version of the kernel. But this change took into account only long term supported versions. As an example, ubuntu 5.19 kernels picked the linux kernel change from 5.15 without actually changing the linux kernel upstream version. This means the current tooling does not allow to fix the module for newer ubuntu 5.19 kernels. This script is supposed to solve the problem mentioned above, but to also make this change easier to integrate. 
We check the linux kernel header (include/trace/events/jbd2.h) if the types of tid, transaction and head variable have changed to tid_t and define these 3 variables in 'include/generated/config.h': TID_IS_TID_T 1 TRANSACTION_IS_TID_T 1 HEAD_IS_TID_T 1 In 'include/instrumentation/events/jbd2.h' we then check these to define the proper type of transaction, head and tid variables that will be later used in the function declarations that need them. This change is meant to remove the dependency on linux kernel version and the outcome is a bit cleaner that before. As with the previous implementation, this may need changes in the future if the kernel interface changes again. Note: This is a proposal for a simpler way of integrating linux kernel changes in lttng-modules. The implementation is very simple due to the fact that tid_t was introduced everywhere in one commit in include/trace/events/jbd2.h. I would like to get your opinion on this approach. If needed, it can be improved. Roxana Nicolescu (1): Introduce configure script to describe changes in linux kernel interface README.md | 3 +- configure | 36 + include/instrumentation/events/jbd2.h | 110 ++ 3 files changed, 61 insertions(+), 88 deletio
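The flavor detection suggested earlier in this message (looking for the "-lowlatency" and "-kvm" suffixes in UTS_RELEASE) would live in a script under scripts/ in lttng-modules; the matching logic itself is simple and is sketched here in C purely as an illustration. The enum, the function names and the "-kvm" and "-generic" example strings used to exercise it are assumptions; only the "5.15.0-76-lowlatency" string comes from the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flavor classification for an Ubuntu kernel release string. */
enum ubuntu_flavor { FLAVOR_GENERIC, FLAVOR_LOWLATENCY, FLAVOR_KVM };

/* Return nonzero if `s` ends with `suffix`. */
static int ends_with(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Map a UTS_RELEASE-style string to a flavor based on its suffix. */
static enum ubuntu_flavor flavor_from_uts_release(const char *uts_release)
{
	assert(uts_release != NULL);
	if (ends_with(uts_release, "-lowlatency"))
		return FLAVOR_LOWLATENCY;
	if (ends_with(uts_release, "-kvm"))
		return FLAVOR_KVM;
	return FLAVOR_GENERIC;	/* anything else is treated as generic here */
}
```

A per-flavor range macro such as the proposed LTTNG_UBUNTU_LOWLATENCY_KERNEL_RANGE would then combine this flavor with the existing version and ABI number comparison.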
Re: [lttng-dev] [PATCH lttng-modules 0/1] Introduce configure script to describe changes in linux kernel interface
On 7/4/23 11:35, Michael Jeanson via lttng-dev wrote: On 2023-07-03 14:28, Roxana Nicolescu via lttng-dev wrote: This script described the changes in the linux kernel interface that affect compatibility with lttng-modules. It is introduced for a specific usecase where commit d87a7b4c77a9: "jbd2: use the correct print format" broke the interface between the kernel and lttng-module. 3 variables changed their type to tid_t (transaction, head and tid) in multiple function declarations. The lttng module was updated properly to ensure backwards compatibility by using the version of the kernel. But this change took into account only long term supported versions. As an example, ubuntu 5.19 kernels picked the linux kernel change from 5.15 without actually changing the linux kernel upstream version. This means the current tooling does not allow to fix the module for newer ubuntu 5.19 kernels. This script is supposed to solve the problem mentioned above, but to also make this change easier to integrate. We check the linux kernel header (include/trace/events/jbd2.h) if the types of tid, transaction and head variable have changed to tid_t and define these 3 variables in 'include/generated/config.h': TID_IS_TID_T 1 TRANSACTION_IS_TID_T 1 HEAD_IS_TID_T 1 In 'include/instrumentation/events/jbd2.h' we then check these to define the proper type of transaction, head and tid variables that will be later used in the function declarations that need them. This change is meant to remove the dependency on linux kernel version and the outcome is a bit cleaner that before. As with the previous implementation, this may need changes in the future if the kernel interface changes again. Note: This is a proposal for a simpler way of integrating linux kernel changes in lttng-modules. The implementation is very simple due to the fact that tid_t was introduced everywhere in one commit in include/trace/events/jbd2.h. I would like to get your opinion on this approach. If needed, it can be improved. 
Roxana Nicolescu (1): Introduce configure script to describe changes in linux kernel interface README.md | 3 +- configure | 36 + include/instrumentation/events/jbd2.h | 110 ++ 3 files changed, 61 insertions(+), 88 deletions(-) create mode 100755 configure Hi Roxana, While I can see advantages to a configure script approach to detect kernel source changes I don't think it's worth the added complexity on top of our current kernel version range system. We already have an Ubuntu specific kernel range macro that supplements the upstream version with Ubuntu's kernel ABI number: LTTNG_UBUNTU_KERNEL_RANGE(5,19,17,X, 6,0,0,0) I'll let Mathieu make the final call but I think that would be the preferred approach. Indeed, many of the kernel tracepoint code changes we had to deal with in the past 10 years would not be easy to track with configure scripts, so we would end up with not just one, but with a combination of two different mechanisms to adapt to kernel code changes. In order to keep things maintainable long-term, I prefer that we stay with the version-based approach as recommended by Michael. Thanks, Mathieu Regards, Michael ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
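As a rough illustration of the version-range approach preferred in this thread, the sketch below packs an upstream version plus a distribution ABI number into one comparable value and tests it against a half-open range. This is not the lttng-modules implementation: the `EX_` macro, the helper, the packing scheme and the ABI number 100 are all made up for the example, and the half-open `[low, high)` convention is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical encoding, for illustration only: pack
 * (major, minor, patch, distro ABI) into one 64-bit value so that
 * versions compare with ordinary integer comparison. Each component is
 * assumed to fit in 16 bits.
 */
#define EX_UBUNTU_VERSION(a, b, c, abi)			\
	(((unsigned long long)(a) << 48) |		\
	 ((unsigned long long)(b) << 32) |		\
	 ((unsigned long long)(c) << 16) |		\
	 (unsigned long long)(abi))

/* Half-open range check: low <= v < high (an assumption of this sketch). */
static inline int ex_in_range(unsigned long long v,
		unsigned long long low, unsigned long long high)
{
	return v >= low && v < high;
}

/* Small self-check using a made-up ABI number (100). */
static inline void ex_selfcheck(void)
{
	unsigned long long low = EX_UBUNTU_VERSION(5, 19, 17, 100);
	unsigned long long high = EX_UBUNTU_VERSION(6, 0, 0, 0);

	assert(ex_in_range(EX_UBUNTU_VERSION(5, 19, 17, 100), low, high));
	assert(!ex_in_range(EX_UBUNTU_VERSION(5, 19, 17, 99), low, high));
}
```

The point of the extra ABI component is that a distribution kernel which backports a change without bumping the upstream version can still be told apart from earlier builds of the same upstream version.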
Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
On 6/29/23 13:27, Olivier Dion wrote:
On Thu, 29 Jun 2023, Olivier Dion wrote:

[0] https://godbolt.org/z/3nW14M3v1
[1] https://godbolt.org/z/TcTeMeKbW

Sorry. That was: [0] https://godbolt.org/z/ETcxnz4TW

Change

    (volatile __typeof__(ptr))(ptr);

for:

    (volatile __typeof__(*(ptr)) *)(ptr);

and:

    void love_iso(int *x) { __atomic_store_n(cast_volatile(), 1, __ATOMIC_RELAXED); }

for

    void love_iso(int *x) { __atomic_store_n(cast_volatile(x), 1, __ATOMIC_RELAXED); }

Thanks, Mathieu

[1] https://godbolt.org/z/jMjh8YoM4

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
On 6/29/23 13:22, Olivier Dion wrote: On Thu, 22 Jun 2023, "Paul E. McKenney" wrote: On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote: On 6/21/23 19:19, Paul E. McKenney wrote: I suggest C11 volatile atomic load/store. Load/store fusing is permitted for non-volatile atomic loads and stores, and such fusing can ruin your code's entire day. ;-) After some testing, I got a wall of warnings: -Wignored-qualifiers: Warn if the return type of a function has a type qualifier such as "const". For ISO C such a type qualifier has no effect, since the value returned by a function is not an lvalue. For C++, the warning is only emitted for scalar types or "void". ISO C prohibits qualified "void" return types on function definitions, so such return types always receive a warning even without this option. Since we are using atomic builtins, for example load: type __atomic_load_n (type *ptr, int memorder) If we put the qualifier volatile to TYPE, we end up with the same qualifier on the return value, triggering a warning for each atomic operation. This seems to be only a problem when compiling in C++ [0] while in C it seems the compiler is more relaxed on this [1]. Ideas to make the toolchains happy? :-) Change: (__typeof__(*ptr) *volatile)(ptr); (which applies the volatile to the pointer, rather than what is pointed to) to either: (volatile __typeof__(*ptr) *)(ptr); or: (__typeof__(*ptr) volatile *)(ptr); Thanks, Mathieu [0] https://godbolt.org/z/3nW14M3v1 [1] https://godbolt.org/z/TcTeMeKbW -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
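To make the cast placement discussed in this message concrete, here is a minimal sketch, not the actual liburcu code. The `ex_cast_volatile` macro and the `ex_*` helpers are made-up names; only `__typeof__`, `__atomic_store_n`, `__atomic_load_n` and `__ATOMIC_RELAXED` are real GCC/Clang facilities. It is written as C; per the thread, a C++ build may emit -Wignored-qualifiers warnings for this kind of code.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only. The position of
 * "volatile" decides what is qualified:
 *
 *   (__typeof__(*(ptr)) *volatile)(ptr)   volatile pointer to plain T
 *   (volatile __typeof__(*(ptr)) *)(ptr)  plain pointer to volatile T
 *
 * Only the second form makes the pointed-to access volatile.
 */
#define ex_cast_volatile(ptr)	((volatile __typeof__(*(ptr)) *)(ptr))

/* Relaxed atomic store through a pointer-to-volatile. */
static inline void ex_store_relaxed(int *addr, int v)
{
	__atomic_store_n(ex_cast_volatile(addr), v, __ATOMIC_RELAXED);
}

/* Relaxed atomic load through a pointer-to-volatile. */
static inline int ex_load_relaxed(int *addr)
{
	return __atomic_load_n(ex_cast_volatile(addr), __ATOMIC_RELAXED);
}

/* Store a value, then read it back on the same thread. */
static inline int ex_roundtrip(int v)
{
	int x = 0;

	ex_store_relaxed(&x, v);
	return ex_load_relaxed(&x);
}

/* Small self-check. */
static inline void ex_selfcheck(void)
{
	assert(ex_roundtrip(42) == 42);
}
```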
Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
On 6/22/23 15:53, Olivier Dion wrote: On Thu, 22 Jun 2023, "Paul E. McKenney" wrote: I suggest C11 volatile atomic load/store. Load/store fusing is permitted for non-volatile atomic loads and stores, and such fusing can ruin your code's entire day. ;-) Good catch. Seems like not a problem on GCC (yet), but Clang is extremely aggressive and seems to do store fusing on some corner cases [0]. I don't think this is an example of store fusing, but rather just that the compiler can eliminate stores to static variables which are otherwise unused, making the entire variable useless. Thanks, Mathieu However, I do not find any simple reproducer of load/store fusing. Do you have example of such fusing, or is this a precaution? In the meantime, back to reading the standard to be certain :-) [0] https://godbolt.org/z/odKG9a75a -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
On 6/22/23 14:32, Paul E. McKenney wrote: On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote: On 6/21/23 19:19, Paul E. McKenney wrote: [...] diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h new file mode 100644 index 000..8e6a9b5 --- /dev/null +++ b/include/urcu/uatomic/builtins-generic.h @@ -0,0 +1,85 @@ +/* + * urcu/uatomic/builtins-generic.h + * + * Copyright (c) 2023 Olivier Dion + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H +#define _URCU_UATOMIC_BUILTINS_GENERIC_H + +#include + +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED) + +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED) Does this lose the volatile semantics that the old-style definitions had? Yes. [...] +++ b/include/urcu/uatomic/builtins-x86.h @@ -0,0 +1,85 @@ +/* + * urcu/uatomic/builtins-x86.h + * + * Copyright (c) 2023 Olivier Dion + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef _URCU_UATOMIC_BUILTINS_X86_H +#define _URCU_UATOMIC_BUILTINS_X86_H + +#include + +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED) + +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED) And same question here. Yes, this opens interesting questions: * what semantic do we want for uatomic_read/set ? * what semantic do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ? * do we want to allow load/store-shared to work on variables larger than a word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure) * what are the guarantees of a volatile type ? * what are the guarantees of a load/store relaxed in C11 ? Does the delta between volatile and C11 relaxed guarantees matter ? Is there an advantage to use C11 load/store relaxed over volatile ? Should we combine both C11 load/store _and_ volatile ? Should we use atomic_signal_fence instead ? I suggest C11 volatile atomic load/store. Load/store fusing is permitted for non-volatile atomic loads and stores, and such fusing can ruin your code's entire day. ;-) I'm OK with erring towards a safer approach, but just out of curiosity, do you have examples of compilers doing load or store fusion on C11 or C++11 relaxed atomics, or is it out of caution due to lack of explicit guarantees in the standards ? Does this lack of guarantee about fusion also apply to other MO such as acquire, release and seq.cst. ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. 
https://www.efficios.com
Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured
On 6/21/23 19:19, Paul E. McKenney wrote: [...] diff --git a/include/urcu/uatomic/builtins-generic.h b/include/urcu/uatomic/builtins-generic.h new file mode 100644 index 000..8e6a9b5 --- /dev/null +++ b/include/urcu/uatomic/builtins-generic.h @@ -0,0 +1,85 @@ +/* + * urcu/uatomic/builtins-generic.h + * + * Copyright (c) 2023 Olivier Dion + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H +#define _URCU_UATOMIC_BUILTINS_GENERIC_H + +#include + +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED) + +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED) Does this lose the volatile semantics that the old-style definitions had? Yes. [...] +++ b/include/urcu/uatomic/builtins-x86.h @@ -0,0 +1,85 @@ +/* + * urcu/uatomic/builtins-x86.h + * + * Copyright (c) 2023 Olivier Dion + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef _URCU_UATOMIC_BUILTINS_X86_H +#define _URCU_UATOMIC_BUILTINS_X86_H + +#include + +#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED) + +#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED) And same question here. Yes, this opens interesting questions: * what semantic do we want for uatomic_read/set ? * what semantic do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ? * do we want to allow load/store-shared to work on variables larger than a word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure) * what are the guarantees of a volatile type ? * what are the guarantees of a load/store relaxed in C11 ? Does the delta between volatile and C11 relaxed guarantees matter ? Is there an advantage to use C11 load/store relaxed over volatile ? Should we combine both C11 load/store _and_ volatile ? Should we use atomic_signal_fence instead ? Thanks, Mathieu Thanx, Paul + -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH] Avoid calling caa_container_of on NULL pointer in cds_lfhash macros
On 6/22/23 06:45, Ondřej Surý via lttng-dev wrote: (Sorry, I missed closing brackets in both macros, so resending fixed patch...) The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros would call caa_container_of() macro on NULL pointer. This is not a problem under normal circumstances as the check in the for loop fails and the loop-statement is not called with invalid (pos) value. However AddressSanitizer doesn't like that and complains about this: runtime error: applying non-zero offset 18446744073709551056 to null pointer Move the cds_lfht_iter_get_node(iter) != NULL from the cond-expression of the for loop into both init-clause and iteration-expression as conditional operator and check for (pos) value in the cond-expression instead. I've taken the liberty to reimplement this with a new helper "cds_lfht_entry". Can you review and try the following commits please ? https://review.lttng.org/c/userspace-rcu/+/10445 compiler.h: Introduce caa_unqual_scalar_typeof https://review.lttng.org/c/userspace-rcu/+/10446 Avoid calling caa_container_of on NULL pointer in cds_lfht macros Thanks! Mathieu Signed-off-by: Ondřej Surý --- include/urcu/rculfhash.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/include/urcu/rculfhash.h b/include/urcu/rculfhash.h index fbd33cc..64cc18f 100644 --- a/include/urcu/rculfhash.h +++ b/include/urcu/rculfhash.h @@ -546,22 +546,22 @@ void cds_lfht_resize(struct cds_lfht *ht, unsigned long new_size); #define cds_lfht_for_each_entry(ht, iter, pos, member) \ for (cds_lfht_first(ht, iter), \ - pos = caa_container_of(cds_lfht_iter_get_node(iter), \ - __typeof__(*(pos)), member);\ - cds_lfht_iter_get_node(iter) != NULL; \ + pos = (cds_lfht_iter_get_node(iter) != NULL ? 
caa_container_of(cds_lfht_iter_get_node(iter), \ + __typeof__(*(pos)), member) : NULL); \ + pos != NULL;\ cds_lfht_next(ht, iter),\ - pos = caa_container_of(cds_lfht_iter_get_node(iter), \ - __typeof__(*(pos)), member)) + pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \ + __typeof__(*(pos)), member) : NULL)) #define cds_lfht_for_each_entry_duplicate(ht, hash, match, key, \ iter, pos, member) \ for (cds_lfht_lookup(ht, hash, match, key, iter), \ - pos = caa_container_of(cds_lfht_iter_get_node(iter), \ - __typeof__(*(pos)), member);\ - cds_lfht_iter_get_node(iter) != NULL; \ + pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \ + __typeof__(*(pos)), member) : NULL); \ + pos != NULL;\ cds_lfht_next_duplicate(ht, match, key, iter), \ - pos = caa_container_of(cds_lfht_iter_get_node(iter), \ - __typeof__(*(pos)), member)) + pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \ + __typeof__(*(pos)), member) : NULL)) #ifdef __cplusplus } -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
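The idea behind the patch, applying the member offset only when the node pointer is non-NULL, can be sketched outside of liburcu as follows. The `ex_container_of` and `ex_entry` macros and the two structs are stand-ins invented for this example; they are not the `caa_container_of` definition nor the `cds_lfht_entry` helper mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a container_of macro (not the liburcu one). */
#define ex_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical NULL-safe wrapper: the offset is only subtracted when the
 * node pointer is non-NULL, so no arithmetic is done on a null pointer.
 * Note that (ptr) is evaluated twice.
 */
#define ex_entry(ptr, type, member) \
	((ptr) != NULL ? ex_container_of(ptr, type, member) : (type *)NULL)

struct ex_node {
	int dummy;
};

struct ex_item {
	long key;
	struct ex_node node;	/* not the first member: non-zero offset */
};

/* Map an embedded node back to its item, or NULL to NULL. */
static inline struct ex_item *ex_item_from_node(struct ex_node *n)
{
	return ex_entry(n, struct ex_item, node);
}

/* Small self-check covering both cases. */
static inline void ex_selfcheck(void)
{
	struct ex_item it = { .key = 5 };

	assert(ex_item_from_node(&it.node) == &it);
	assert(ex_item_from_node(NULL) == NULL);
}
```

With a wrapper of this shape, a for-each loop can test the resulting entry pointer against NULL in its condition instead of testing the iterator node, which is the restructuring the patch description outlines.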
Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured
On 6/21/23 20:53, Olivier Dion wrote:
On Wed, 21 Jun 2023, "Paul E. McKenney" wrote:
On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:

    #ifndef cmm_mb
    #define cmm_mb() __sync_synchronize()

Just out of curiosity, why not also implement cmm_mb() in terms of __atomic_thread_fence(__ATOMIC_SEQ_CST)? (Or is that a later patch?)

IIRC, Mathieu and I agree that the definition of a thread fence -- acts as a synchronization fence between threads -- is too weak for what we want here. For example, with I/O devices. Although __sync_synchronize() is probably an alias for a SEQ_CST thread fence, its definition -- issues a full memory barrier -- is stronger. We do not want to rely on this assumption (alias) and prefer to rely on the documented definition instead.

We should document this rationale with a new comment near the #define, in case anyone mistakenly decides to use a thread fence there to make it similar to the rest of the code in the future.

Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
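One possible shape for the comment suggested in this message is sketched below. The comment wording is only an illustration of the rationale given in the thread, and `ex_mb`, `ex_data`, `ex_flag` and `ex_publish` are made-up names; only `__sync_synchronize()` is a real GCC/Clang builtin.

```c
#include <assert.h>

/*
 * Illustrative rationale comment (wording is an assumption):
 *
 * Keep using __sync_synchronize() here rather than
 * __atomic_thread_fence(__ATOMIC_SEQ_CST). The former is documented as
 * issuing a full memory barrier; the latter is documented as a
 * synchronization fence between threads, which is a weaker statement
 * (for example with respect to I/O devices). Even if both happen to
 * compile to the same instruction today, rely on the documented
 * definition, not on that coincidence.
 */
#ifndef ex_mb
#define ex_mb()	__sync_synchronize()
#endif

static int ex_data;
static int ex_flag;

/* Write the payload, issue a full barrier, then raise the flag. */
static inline void ex_publish(int v)
{
	ex_data = v;
	ex_mb();
	ex_flag = 1;
}

/* Single-threaded self-check: only verifies that the code runs. */
static inline void ex_selfcheck(void)
{
	ex_publish(3);
	assert(ex_flag == 1 && ex_data == 3);
}
```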
Re: [lttng-dev] I'm still getting empty ust traces using tracef
On 6/20/23 18:02, Brian Hutchinson wrote: On Thu, May 11, 2023 at 2:14 PM Mathieu Desnoyers wrote: On 2023-05-11 14:13, Mathieu Desnoyers via lttng-dev wrote: On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote: ... more background. I've always used ltt in the kernel so I don't have much experience with the user side of it and especially multi-threaded, multi-core so I'm probably missing some fundamental concepts that I need to understand. Which are the exact versions of LTTng-UST and LTTng-Tools you are using now ? (2.13.N or which git commit ?) Also, can you try using lttng-ust stable-2.13 branch, which includes the following commit ? commit be2ca8b563bab81be15cbce7b9f52422369f79f7 Author: Mathieu Desnoyers Date: Tue Feb 21 14:29:49 2023 -0500 Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included Fix issues with missing symbols in use-cases where tracef.h is included before defining LTTNG_UST_TRACEPOINT_DEFINE, e.g.: #include #define LTTNG_UST_TRACEPOINT_DEFINE #include It is caused by the fact that tracef.h includes tracepoint.h in a context which has LTTNG_UST_TRACEPOINT_DEFINE undefined, and this is not re-evaluated for the following includes. Fix this by lifting the definition code in tracepoint.h outside of the header include guards, and #undef the old LTTNG_UST__DEFINE_TRACEPOINT before re-defining it to its new semantic. Use a new _LTTNG_UST_TRACEPOINT_DEFINE_ONCE include guard within the LTTNG_UST_TRACEPOINT_DEFINE defined case to ensure symbols are not duplicated. Signed-off-by: Mathieu Desnoyers Change-Id: I0ef720435003a7ca0bfcf29d7bf27866c5ff8678 I applied this patch and if I use "tracef" type calls in our application that is made up of a bunch of static libs ... the UST trace calls work. I verified that traces that were called from several different static libs all worked. But as soon as I include a "tracepoint" style tracepoint (that uses trace provider include files etc.) 
then doing a "lttng list -u" returns "None" for UST events. Is there some kind of rule that says a file can't use both tracef and tracepoint calls? Is there something special you have to do to use tracef and tracepoints in same file? Doing so appears to have broken everything. It should just work. Can you provide a minimal example of the compile unit having this issue ? Also you mention "static libs". Make sure you do *not* define "LTTNG_UST_TRACEPOINT_PROBE_DYNAMIC_LINKAGE" in this case. See the lttng-ust(3) man page for details (section "Statically linking the tracepoint provider"). Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms
On 6/21/23 01:39, Yitschak, Yehuda wrote:
On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote:
On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello, Are there any suggestions to root cause the high latency and potentially improve it on *platform1*? Thanks and best regards, Anas.

I recommend using "perf" when tracing with the sample program in a loop to figure out the hot spots. With that information on the "fast" and "slow" system, we might be able to figure out what differs. Also, comparing the kernel configurations of the two systems can help. Also comparing the glibc versions of the two systems would be relevant.

Also make sure you benchmark the lttng "snapshot" mode [1] to make sure you don't run into a situation where the disk/network I/O throughput cannot cope with the generated event throughput, thus causing the ring buffer to discard events. This would therefore "speed up" tracing from the application perspective because discarding an event is faster than writing it to a ring buffer.

You mean we should avoid the "discard" loss mode and use "overwrite" loss mode since discard mode can fake fast performance ?

Yes. In addition to using "overwrite-when-buffer-full" mode, the "snapshot" session also ensures that no consumer daemon extracts the trace data (unless an explicit snapshot record is performed), which allows comparing the ring buffer producer performance with minimal noise.

If you really want to benchmark the discard-when-buffer-full mode and the consumer daemon I/O behavior, then you need to take into account event discarded counts and the actual trace data size that was written to disk.

Thanks, Mathieu

Thanks, Mathieu

[1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot

Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms
On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote:
On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello, Are there any suggestions to root cause the high latency and potentially improve it on *platform1*? Thanks and best regards, Anas.

I recommend using "perf" when tracing with the sample program in a loop to figure out the hot spots. With that information on the "fast" and "slow" system, we might be able to figure out what differs. Also, comparing the kernel configurations of the two systems can help. Also comparing the glibc versions of the two systems would be relevant.

Also make sure you benchmark the lttng "snapshot" mode [1] to make sure you don't run into a situation where the disk/network I/O throughput cannot cope with the generated event throughput, thus causing the ring buffer to discard events. This would therefore "speed up" tracing from the application perspective because discarding an event is faster than writing it to a ring buffer.

Thanks, Mathieu

[1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot

Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms
On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello, Are there any suggestions to root cause the high latency and potentially improve it on *platform1*? Thanks and best regards, Anas.

I recommend using "perf" when tracing with the sample program in a loop to figure out the hot spots. With that information on the "fast" and "slow" system, we might be able to figure out what differs. Also, comparing the kernel configurations of the two systems can help. Also comparing the glibc versions of the two systems would be relevant.

Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] [PATCH] Fix: revise urcu_read_lock_update() comment
On 6/13/23 21:51, Li-Kuan Ou wrote: Read-side critical section nesting is tracked in lower-order bits and grace-period phase number use a single high-order bit Merged, thanks! Mathieu Signed-off-by: Li-Kuan Ou --- include/urcu/static/urcu-bp.h | 6 +++--- include/urcu/static/urcu-mb.h | 6 +++--- include/urcu/static/urcu-memb.h | 6 +++--- include/urcu/static/urcu-signal.h | 6 +++--- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h index 8ba3830..b163a90 100644 --- a/include/urcu/static/urcu-bp.h +++ b/include/urcu/static/urcu-bp.h @@ -137,9 +137,9 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr) /* * Helper for _urcu_bp_read_lock(). The format of urcu_bp_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _urcu_bp_read_lock() nesting, and a lower-order bit that contains either zero - * or URCU_BP_GP_CTR_PHASE. The smp_mb_slave() ensures that the accesses in + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _urcu_bp_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit + * that contains either zero or one. The smp_mb_slave() ensures that the accesses in * _urcu_bp_read_lock() happen before the subsequent read-side critical section. */ static inline void _urcu_bp_read_lock_update(unsigned long tmp) diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h index b97e42a..253d29b 100644 --- a/include/urcu/static/urcu-mb.h +++ b/include/urcu/static/urcu-mb.h @@ -63,9 +63,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_mb_reader); /* * Helper for _urcu_mb_read_lock(). The format of urcu_mb_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _urcu_mb_read_lock() nesting, and a lower-order bit that contains either zero - * or URCU_GP_CTR_PHASE. 
The cmm_smp_mb() ensures that the accesses in + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _urcu_mb_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit + * that contains either zero or one. The cmm_smp_mb() ensures that the accesses in * _urcu_mb_read_lock() happen before the subsequent read-side critical section. */ static inline void _urcu_mb_read_lock_update(unsigned long tmp) diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h index c8d102f..f64cb57 100644 --- a/include/urcu/static/urcu-memb.h +++ b/include/urcu/static/urcu-memb.h @@ -86,9 +86,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader); /* * Helper for _rcu_read_lock(). The format of urcu_memb_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _rcu_read_lock() nesting, and a lower-order bit that contains either zero - * or URCU_GP_CTR_PHASE. The smp_mb_slave() ensures that the accesses in + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _rcu_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit + * that contains either zero or one. The smp_mb_slave() ensures that the accesses in * _rcu_read_lock() happen before the subsequent read-side critical section. */ static inline void _urcu_memb_read_lock_update(unsigned long tmp) diff --git a/include/urcu/static/urcu-signal.h b/include/urcu/static/urcu-signal.h index c7577d3..707eaf8 100644 --- a/include/urcu/static/urcu-signal.h +++ b/include/urcu/static/urcu-signal.h @@ -64,9 +64,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_signal_reader); /* * Helper for _rcu_read_lock(). The format of urcu_signal_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _rcu_read_lock() nesting, and a lower-order bit that contains either zero - * or URCU_GP_CTR_PHASE. 
The cmm_barrier() ensures that the accesses in + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _rcu_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit + * that contains either zero or one. The cmm_barrier() ensures that the accesses in * _rcu_read_lock() happen before the subsequent read-side critical section. */ static inline void _urcu_signal_read_lock_update(unsigned long tmp) -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
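The counter layout described by the revised comment (nesting count in the low-order bits, one phase bit above them) can be sketched as below. The `EX_` constants and helper functions are illustrative names with assumed values modelled on that description, not the definitions from the urcu headers, and the memory barriers of the real read-lock path are left out entirely.

```c
#include <assert.h>

/* Illustrative constants; names and exact values are assumptions. */
#define EX_GP_COUNT		(1UL << 0)
/* Single phase bit, placed at the middle of the word. */
#define EX_GP_CTR_PHASE		(1UL << (sizeof(unsigned long) << 2))
/* Every bit below the phase bit belongs to the nesting count. */
#define EX_GP_CTR_NEST_MASK	(EX_GP_CTR_PHASE - 1)

/* Read-side nesting depth stored in the low-order bits. */
static inline unsigned long ex_nesting(unsigned long ctr)
{
	return ctr & EX_GP_CTR_NEST_MASK;
}

/* Grace-period phase (0 or 1) stored in the phase bit. */
static inline int ex_phase(unsigned long ctr)
{
	return (ctr & EX_GP_CTR_PHASE) != 0;
}

/*
 * Sketch of a read-lock update with barriers omitted: the outermost
 * lock snapshots the global counter (count of one plus current phase),
 * nested locks only increment the count.
 */
static inline unsigned long ex_read_lock(unsigned long reader_ctr,
		unsigned long gp_ctr)
{
	if (ex_nesting(reader_ctr) == 0)
		return gp_ctr;
	return reader_ctr + EX_GP_COUNT;
}

/* Small self-check: outermost lock, then one nested lock. */
static inline void ex_selfcheck(void)
{
	unsigned long gp = EX_GP_COUNT | EX_GP_CTR_PHASE;
	unsigned long r = ex_read_lock(0, gp);

	assert(ex_nesting(r) == 1 && ex_phase(r) == 1);
	r = ex_read_lock(r, gp ^ EX_GP_CTR_PHASE);
	assert(ex_nesting(r) == 2 && ex_phase(r) == 1);
}
```

In the self-check, the nested lock keeps the phase captured by the outermost lock even though the global phase has flipped in between, which is why the phase bit must sit above the nesting bits rather than below them.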
Re: [lttng-dev] [PATCH] Fix: revise urcu_read_lock_update() comment
On 6/13/23 11:45, Li-Kuan Ou via lttng-dev wrote: Read-side critical section nesting is tracked in lower-order bits and grace-period phase number use a single high-order bit Thanks for the fix. Here is a comment below, Signed-off-by: Li-Kuan Ou --- include/urcu/static/urcu-bp.h | 4 ++-- include/urcu/static/urcu-mb.h | 4 ++-- include/urcu/static/urcu-memb.h | 4 ++-- include/urcu/static/urcu-signal.h | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h index 8ba3830..c90c9f1 100644 --- a/include/urcu/static/urcu-bp.h +++ b/include/urcu/static/urcu-bp.h @@ -137,8 +137,8 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr) /* * Helper for _urcu_bp_read_lock(). The format of urcu_bp_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _urcu_bp_read_lock() nesting, and a lower-order bit that contains either zero + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _urcu_bp_read_lock() nesting, and a single high-order bit that contains either zero I think it would be clearer to state: Helper for _urcu_bp_read_lock(). The format of urcu_bp_gp.ctr (as well as the per-thread rcu_reader.ctr) has the lower-order bits containing a count of urcu_bp_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit that contains either zero or one. The smp_mb_slave() ensures that the accesses in urcu_bp_read_lock() happen before the subsequent read-side critical section. (likewise for similar comments in other files). Can you submit an updated patch please ? Thanks, Mathieu * or URCU_BP_GP_CTR_PHASE. The smp_mb_slave() ensures that the accesses in * _urcu_bp_read_lock() happen before the subsequent read-side critical section. 
*/ diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h index b97e42a..218e2f3 100644 --- a/include/urcu/static/urcu-mb.h +++ b/include/urcu/static/urcu-mb.h @@ -63,8 +63,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_mb_reader); /* * Helper for _urcu_mb_read_lock(). The format of urcu_mb_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _urcu_mb_read_lock() nesting, and a lower-order bit that contains either zero + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _urcu_mb_read_lock() nesting, and a single high-order bit that contains either zero * or URCU_GP_CTR_PHASE. The cmm_smp_mb() ensures that the accesses in * _urcu_mb_read_lock() happen before the subsequent read-side critical section. */ diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h index c8d102f..b923f73 100644 --- a/include/urcu/static/urcu-memb.h +++ b/include/urcu/static/urcu-memb.h @@ -86,8 +86,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader); /* * Helper for _rcu_read_lock(). The format of urcu_memb_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _rcu_read_lock() nesting, and a lower-order bit that contains either zero + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _rcu_read_lock() nesting, and a single high-order bit that contains either zero * or URCU_GP_CTR_PHASE. The smp_mb_slave() ensures that the accesses in * _rcu_read_lock() happen before the subsequent read-side critical section. */ diff --git a/include/urcu/static/urcu-signal.h b/include/urcu/static/urcu-signal.h index c7577d3..00588b8 100644 --- a/include/urcu/static/urcu-signal.h +++ b/include/urcu/static/urcu-signal.h @@ -64,8 +64,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_signal_reader); /* * Helper for _rcu_read_lock(). 
The format of urcu_signal_gp.ctr (as well as - * the per-thread rcu_reader.ctr) has the upper bits containing a count of - * _rcu_read_lock() nesting, and a lower-order bit that contains either zero + * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of + * _rcu_read_lock() nesting, and a single high-order bit that contains either zero * or URCU_GP_CTR_PHASE. The cmm_barrier() ensures that the accesses in * _rcu_read_lock() happen before the subsequent read-side critical section. */ -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
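The counter layout described in the corrected comment can be sketched in a few lines of C: the nesting count lives in the low-order bits and a single high-order bit carries the grace-period phase. This is only an illustration. The DEMO_ names and both helper functions are hypothetical, and putting the phase bit at half the width of unsigned long is an assumption made for the example; check the liburcu headers for the real definitions.

```c
#include <stdio.h>

/* Hypothetical names for illustration; these are not the liburcu macros. */
#define DEMO_GP_COUNT		(1UL << 0)
/* Assumed placement: one bit at half the width of unsigned long. */
#define DEMO_GP_CTR_PHASE	(1UL << (sizeof(unsigned long) << 2))
/* Every bit below the phase bit holds the read-side nesting count. */
#define DEMO_GP_CTR_NEST_MASK	(DEMO_GP_CTR_PHASE - 1)

/* Return the read-side critical section nesting depth stored in ctr. */
static unsigned long demo_nesting(unsigned long ctr)
{
	return ctr & DEMO_GP_CTR_NEST_MASK;
}

/* Return 1 if the grace-period phase bit is set in ctr, 0 otherwise. */
static int demo_phase(unsigned long ctr)
{
	return (ctr & DEMO_GP_CTR_PHASE) != 0;
}

/* Print both fields of a counter value. */
static void demo_print(unsigned long ctr)
{
	printf("nesting=%lu phase=%d\n", demo_nesting(ctr), demo_phase(ctr));
}
```

With this layout, adding DEMO_GP_COUNT on each nested lock changes only the low-order bits, so the phase bit is left alone until the nesting count would overflow into it.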
[lttng-dev] Tracing Summit - Last year's 2022 talk recordings are available online!
Hello all, The recordings for last year’s 2022 Tracing Summit talks were just posted to the DiaMon Workgroup channel! 2022 Tracing Summit Talks: https://www.youtube.com/playlist?list=PLuo4E47p5_7YbvyBpSHh-wO3KUVQ81BQR If you did not get the chance to attend last year, we invite you to take a look at the diverse tracing talks that included eBPF and Perfetto developments as well as updates for the core Linux kernel tracers. This year, we’re looking forward to hearing about your new tracing developments and challenging use cases at the 2023 Tracing Summit! If you’re interested in exchanging ideas with experts in state-of-the-art tracing, we invite you to submit a talk proposal soon as the deadline is coming up next week (June 16th). You can submit your 2023 Tracing Summit talk abstract here: https://cfp.tracingsummit.org/ts2023/cfp Best regards, Mathieu The 2023 Tracing Summit will be held in Bilbao, Spain on September 17th and 18th, at the Euskalduna Conference Centre, co-located with Open Source Summit Europe. To register, you can include the Tracing Summit as an add-on to your Open Source Summit ticket or use these links to register solely for the Tracing Summit: https://cvent.me/Gn0nkR (in-person, 80$), https://cvent.me/xywylX (virtual). For more info: https://tracingsummit.org/ The 2023 Tracing Summit is sponsored by EfficiOS and organized by Erica Bugden (EfficiOS), Olivier Dion (EfficiOS), and Mathieu Desnoyers (EfficiOS) on behalf of the Linux Foundation Diagnostic and Monitoring Workgroup. -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] [RELEASE] LTTng UST 2.12.8/2.13.6 and LTTng modules 2.12.14/2.13.10 tracers
Hi, This is a stable release announcement for the LTTng UST and LTTng modules tracer projects. Those contain mainly bug fixes and add support for recent distributions and upstream kernels. What's new in both LTTng-UST 2.12.8 and 2.13.6: - Fix: use unaligned pointer accesses for lttng_inline_memcpy lttng_inline_memcpy receives pointers which can be unaligned. This causes issues (traps) specifically on arm 32-bit with 8-byte strings (including \0). - Fix: trace events in C constructors/destructors Adding a priority (150) to the tracepoint and tracepoint provider constructors/destructors ensures that we trace tracepoints located within C constructors/destructors with a higher priority value, including the default init priority of 65535, when the tracepoint vs tracepoint definition vs tracepoint probe provider are in different compile units (and in various link order one compared to another). - Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included Fix issues with missing symbols in use-cases where tracef.h is included before defining LTTNG_UST_TRACEPOINT_DEFINE - Fix: segmentation fault on filter interpretation in "switch" mode Fix a bytecode interpreter crash when building with INTERPRETER_USE_SWITCH defined (used mainly for debugging purposes). What's new specifically in LTTng-UST 2.13.6: - Fix: `ip` context is expressed as a base-10 field The base for UST context field `ip` was changed from 16 (hexadecimal) to 10 (decimal), most likely an unintentional copy error in 4e48b5d. - Various fixes to build with -std=c99. - Fix: trace events in C++ constructors/destructors Wrap constructor and destructor functions to invoke them as functions with the constructor/destructor GNU C attributes, which ensures that those constructors/destructors are ordered before/after C++ constructors/destructors. 
What's new in LTTng modules 2.12.14 and 2.13.10: - fix: kallsyms wrapper on CONFIG_PPC64_ELF_ABI_V1 Work-around PPC64 ELF ABI v1 function descriptor issues when using kallsyms. - Add support for RHEL 9.0 and 9.1. What's new specifically in LTTng modules 2.12.14: - Various tracepoint instrumentation fixes to support kernel v5.18. What's new specifically in LTTng modules 2.13.10: - Various tracepoint instrumentation fixes to support kernel v6.3. Feedback is welcome! Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
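The constructor priority fix in the LTTng-UST notes above relies on the GNU C constructor attribute, where a lower priority value runs earlier and constructors without an explicit priority run last. Here is a minimal sketch of that ordering, assuming GCC or Clang on an ELF platform. The demo_ names are hypothetical and this is not the LTTng-UST code itself.

```c
/* Records the order in which the two constructors below run. */
static int demo_order[2];
static int demo_count;

/*
 * Explicit priority 150: runs before constructors with a higher priority
 * value and before constructors with no explicit priority.
 */
__attribute__((constructor(150)))
static void demo_early_ctor(void)
{
	demo_order[demo_count++] = 150;
}

/* No explicit priority: runs after every prioritized constructor. */
__attribute__((constructor))
static void demo_default_ctor(void)
{
	demo_order[demo_count++] = 65535;
}
```

By the time main() starts, demo_order holds 150 followed by 65535, which is why a tracepoint set up at priority 150 is ready before application constructors that use the default priority.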
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-18 15:20, Brian Hutchinson wrote: On Thu, May 18, 2023 at 3:07 PM Brian Hutchinson wrote: On Thu, May 18, 2023 at 3:03 PM Mathieu Desnoyers wrote: On 2023-05-18 14:58, Brian Hutchinson wrote: On Thu, May 18, 2023 at 11:00 AM Brian Hutchinson wrote: On Thu, May 18, 2023 at 10:45 AM Mathieu Desnoyers wrote: On 2023-05-18 10:10, Brian Hutchinson wrote: [...] I updated my hello world to have a function I'd like to use the --userspace-probe method on with the very original name of 'probe_function': #include <stdio.h> #include <lttng/tracef.h> void probe_function(int i); int main(int argc, char *argv[]) { unsigned int i; puts("Hello, World!\nPress Enter to continue..."); /* * The following getchar() call only exists for the purpose of this * demonstration, to pause the application in order for you to have * time to list its tracepoints. You don't need it otherwise. */ getchar(); lttng_ust_tracef("Number %d, string %s", 23, "hi there!"); printf("Number %d, string %s", 23, "hi there!"); for (i = 0; i < argc; i++) { lttng_ust_tracef("Number %d, argv %s", i, argv[i]); printf("Number %d, argv %s", i, argv[i]); } puts("Quitting now!"); probe_function(i); return 0; } void probe_function(int i) { lttng_ust_tracef("Number %d, string %s", i * i, "i^2"); printf("Number %d, string %s", i * i, "i^2"); } ... and I get the same error as before when I try to enable the probe: # lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function Error: Missing event name(s). As the error states, you are missing the event name. See man 1 lttng-enable-event lttng [GENERAL OPTIONS] enable-event --kernel [--probe=SOURCE | --function=SOURCE | --syscall | --userspace-probe=SOURCE] [--filter=EXPR] [--session=SESSION] [--channel=CHANNEL] EVENT[,EVENT]... 
You will want something like: lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function Where "my_probe_function" is the event name that will appear in the collected traces. Wow!
I must not have woken up this morning ha, ha. Thanks for that! The event is enabled now. Hope to actually get tracing data now. Well, I guess we just have the app that thwarts all attempts at tracing. I did a dynamic probe on several functions that should be getting called like crazy and again I get no tracing data. Tried it with my hello world example above after Mathieu set me straight on the event syntax and it works. I saw this comment in the documentation "As of this version, only USDT probes that are not surrounded by a reference counter (semaphore) are supported." I don't know that I can say that this function I'm probing isn't "surrounded" by a reference counter, it's in a large multi-threaded application so I guess it's possible. Sigh, I'm striking out every which way. No offense (since this is lttng list - please don't flame me ... I want/need lttng), but I think I'm going to try just straight kprobes and uprobes and see if trace compass can show those traces in an attempt to get "something/anything" working. If you attach to an ELF symbol (function), then there is no USDT in play, so it should not be related to the issue you have. That is what I was thinking which is why I wanted to try it. But if your functions happen to be inlined, then there will be nothing to attach to. Perhaps this is what happens there ? I don't see any evidence of anything being inlined in this module. I grepped the code to verify. Back to being stumped/stuck. I can do trace-cmd stuff and it works. The hello world above works so I don't "think" this is a problem but again in full disclosure I'll mention/ask about it. Does any of the lttng tools/libs depend on kernel headers? I ask because old yocto (Dunfell) built lttng package against a 4.something kernel and we're running a 5.10.69 kernel that lttng modules were added to it with the "builtin" script and built that way. 
Should probably have yocto build the local kernel too, but kernel is being built stand alone due to vendor stuff that hasn't been mainlined yet. I'm running out of things to think about that could be the issue. If lttng-modules can trace your smaller test application through uprobes, then the problem is likely elsewhere. Only lttng-modules has dependencies on kernel headers. lttng-tools/ust don't depend on kernel headers. Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-18 15:07, Brian Hutchinson wrote: [...] If you attach to an ELF symbol (function), then there is no USDT in play, so it should not be related to the issue you have. That is what I was thinking which is why I wanted to try it. But if your functions happen to be inlined, then there will be nothing to attach to. Perhaps this is what happens there ? I don't see any evidence of anything being inlined in this module. I grepped the code to verify. Back to being stumped/stuck. Make sure to check the resulting assembler and ELF symbol tables. The compiler is free to inline various functions unless they are explicitly marked as __attribute__((noinline)). Also, if LTO is enabled, further optimization can be done at link-time. One purpose of the UST tracepoints is to be less fragile with respect to specific optimizations done by the compiler and linker, thus guaranteeing that whatever is instrumented with a tracepoint is indeed available for tracing. Also, double-check that the path you pass to --userspace-probe really targets your executable or .so binary file, and is not just a symbolic link. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
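To illustrate the inlining point above, here is a minimal sketch of keeping a function available as an ELF symbol for --userspace-probe. It reuses the probe_function name from the hello world example earlier in the thread. The return value is added purely for this sketch, and the behaviour described assumes GCC or Clang.

```c
#include <stdio.h>

/*
 * noinline asks GCC/Clang to keep an out-of-line copy of the function,
 * so there is a function entry for a probe to attach to. External
 * linkage (no "static") keeps the name in the ELF symbol table of an
 * unstripped binary.
 */
__attribute__((noinline))
int probe_function(int i)
{
	int square = i * i;

	printf("Number %d, string %s\n", square, "i^2");
	return square;
}
```

After building, listing the symbols of the binary that is actually installed (for example with nm or readelf -s) and looking for probe_function is a quick way to confirm the symbol survived optimization and, if enabled, LTO.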
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-18 14:58, Brian Hutchinson wrote: On Thu, May 18, 2023 at 11:00 AM Brian Hutchinson wrote: On Thu, May 18, 2023 at 10:45 AM Mathieu Desnoyers wrote: On 2023-05-18 10:10, Brian Hutchinson wrote: [...] I updated my hello world to have a function I'd like to use the --userspace-probe method on with the very original name of 'probe_function': #include <stdio.h> #include <lttng/tracef.h> void probe_function(int i); int main(int argc, char *argv[]) { unsigned int i; puts("Hello, World!\nPress Enter to continue..."); /* * The following getchar() call only exists for the purpose of this * demonstration, to pause the application in order for you to have * time to list its tracepoints. You don't need it otherwise. */ getchar(); lttng_ust_tracef("Number %d, string %s", 23, "hi there!"); printf("Number %d, string %s", 23, "hi there!"); for (i = 0; i < argc; i++) { lttng_ust_tracef("Number %d, argv %s", i, argv[i]); printf("Number %d, argv %s", i, argv[i]); } puts("Quitting now!"); probe_function(i); return 0; } void probe_function(int i) { lttng_ust_tracef("Number %d, string %s", i * i, "i^2"); printf("Number %d, string %s", i * i, "i^2"); } ... and I get the same error as before when I try to enable the probe: # lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function Error: Missing event name(s). As the error states, you are missing the event name. See man 1 lttng-enable-event lttng [GENERAL OPTIONS] enable-event --kernel [--probe=SOURCE | --function=SOURCE | --syscall | --userspace-probe=SOURCE] [--filter=EXPR] [--session=SESSION] [--channel=CHANNEL] EVENT[,EVENT]... You will want something like: lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function Where "my_probe_function" is the event name that will appear in the collected traces. Wow! I must not have woken up this morning ha, ha. Thanks for that! The event is enabled now. Hope to actually get tracing data now.
Well, I guess we just have the app that thwarts all attempts at tracing. I did a dynamic probe on several functions that should be getting called like crazy and again I get no tracing data. Tried it with my hello world example above after Mathieu set me straight on the event syntax and it works. I saw this comment in the documentation "As of this version, only USDT probes that are not surrounded by a reference counter (semaphore) are supported." I don't know that I can say that this function I'm probing isn't "surrounded" by a reference counter, it's in a large multi-threaded application so I guess it's possible. Sigh, I'm striking out every which way. No offense (since this is lttng list - please don't flame me ... I want/need lttng), but I think I'm going to try just straight kprobes and uprobes and see if trace compass can show those traces in an attempt to get "something/anything" working. If you attach to an ELF symbol (function), then there is no USDT in play, so it should not be related to the issue you have. But if your functions happen to be inlined, then there will be nothing to attach to. Perhaps this is what happens there ? Mathieu Regards, Brian -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-18 10:10, Brian Hutchinson wrote: [...] I updated my hello world to have a function I'd like to use the --userspace-probe method on with the very original name of 'probe_function': #include <stdio.h> #include <lttng/tracef.h> void probe_function(int i); int main(int argc, char *argv[]) { unsigned int i; puts("Hello, World!\nPress Enter to continue..."); /* * The following getchar() call only exists for the purpose of this * demonstration, to pause the application in order for you to have * time to list its tracepoints. You don't need it otherwise. */ getchar(); lttng_ust_tracef("Number %d, string %s", 23, "hi there!"); printf("Number %d, string %s", 23, "hi there!"); for (i = 0; i < argc; i++) { lttng_ust_tracef("Number %d, argv %s", i, argv[i]); printf("Number %d, argv %s", i, argv[i]); } puts("Quitting now!"); probe_function(i); return 0; } void probe_function(int i) { lttng_ust_tracef("Number %d, string %s", i * i, "i^2"); printf("Number %d, string %s", i * i, "i^2"); } ... and I get the same error as before when I try to enable the probe: # lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function Error: Missing event name(s). As the error states, you are missing the event name. See man 1 lttng-enable-event lttng [GENERAL OPTIONS] enable-event --kernel [--probe=SOURCE | --function=SOURCE | --syscall | --userspace-probe=SOURCE] [--filter=EXPR] [--session=SESSION] [--channel=CHANNEL] EVENT[,EVENT]... You will want something like: lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function Where "my_probe_function" is the event name that will appear in the collected traces. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. 
https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-17 12:37, Brian Hutchinson wrote: On Wed, May 17, 2023 at 12:08 PM Mathieu Desnoyers wrote: On 2023-05-16 22:11, Brian Hutchinson via lttng-dev wrote: Hi, I'm trying to figure out how to use uprobes with lttng. I can't use a normal uprobe for a line number just using the address I want to probe obtained from objdump? As in: echo 'p /usr/local/bin/my_app:0x2c3a8' >> /sys/kernel/debug/tracing/uprobe_events ... which isn't a function entry, it's just a line of code I want to probe on. This link says it has to be elf or sdt: https://lttng.org/man/1/lttng-enable-event/v2.11/#doc-opt--userspace-probe So can I not probe on just a line of code by specifying an address??? It doesn't look like these methods above will do what I'm wanting to do. I've tried to find examples of using enable-event --kernel --userspace-probe= but there doesn't appear to be many. There are examples here: https://lttng.org/docs/v2.13/#doc-enabling-disabling-events Indeed inserting a lttng-modules uprobe within functions is not supported at the moment, mainly because we prefer to err towards safety and don't have the validation in place to prevent corrupting the program's instructions if an end user would try to insert a uprobe at an address which is not an instruction boundary. Hmm, was really hoping to be able to do dynamic tracing without having to modify code. uprobes with the proper validations about instruction boundaries would eventually provide this. Another approach we want to invest time in is to integrate libpatch from Olivier Dion into lttng-ust. This would provide dynamic instrumentation with the performance of a purely userspace tracer. But those are all things that were never prioritized by any of our customers, so they progress at a "back burner" pace. I guess if I add a function call to a debug statement or something at the point I want to probe then I could use the elf example. Yes. So we only support inserting uprobe on functions and SDT probes at the moment. 
I've heard of system tap but never used it. Will have to look into that. I really want to get lttng-ust working but I'm getting pushback on the time I'm spending trying to get it to work ... and would really like to demonstrate something (was hoping kernel events and uprobes) quickly to an audience that knows nothing about lttng or full stack tracing to gain "buy in" for the effort. Understood. The main thing we are missing to help you on the UST front is a console log of the _application_ with LTTNG_UST_DEBUG=1. I suspect it is not collected in your tests. Thanks, Mathieu You know, those pesky things called schedules. Thanks, Brian Thanks, Mathieu Thanks, Brian ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=
On 2023-05-16 22:11, Brian Hutchinson via lttng-dev wrote: Hi, I'm trying to figure out how to use uprobes with lttng. I can't use a normal uprobe for a line number just using the address I want to probe obtained from objdump? As in: echo 'p /usr/local/bin/my_app:0x2c3a8' >> /sys/kernel/debug/tracing/uprobe_events ... which isn't a function entry, it's just a line of code I want to probe on. This link says it has to be elf or sdt: https://lttng.org/man/1/lttng-enable-event/v2.11/#doc-opt--userspace-probe So can I not probe on just a line of code by specifying an address??? It doesn't look like these methods above will do what I'm wanting to do. I've tried to find examples of using enable-event --kernel --userspace-probe= but there doesn't appear to be many. There are examples here: https://lttng.org/docs/v2.13/#doc-enabling-disabling-events Indeed inserting a lttng-modules uprobe within functions is not supported at the moment, mainly because we prefer to err towards safety and don't have the validation in place to prevent corrupting the program's instructions if an end user would try to insert a uprobe at an address which is not an instruction boundary. So we only support inserting uprobe on functions and SDT probes at the moment. Thanks, Mathieu Thanks, Brian ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] Tracing Summit 2023 Announcement and CFP
Hello all! This is a Call for Proposals for the Tracing Summit 2023[0] which will be held in Bilbao, Spain on the 17th and 18th of September, 2023. This year, the event is co-located with Open Source Summit Europe 2023 [1]. - Event dates: Sunday, September 17th - Monday, September 18th - Location: Bilbao, Spain and virtually (co-located with Open Source Summit Europe) - Registration cost - In person: $80.00 USD (Free for speakers) - Virtual: Free - Call for proposals link: [2] Important dates: - Call for proposals close: Friday, June 16th, at 11:59PM EDT - Call for proposals notifications: Friday, June 23rd - Schedule announcement: Tuesday, June 27th - Event dates: Sunday, September 17th - Monday, September 18th Stand-alone registration is expected to open next week. In the meantime, you can subscribe to the mailing list to get the latest information on the event: [3] The 2023 Tracing Summit is a two-day, single-track conference on the topic of tracing. The event focuses on software and hardware tracing, gathering developers and end-users of tracing and trace analysis tools. The main goal of the Tracing Summit is to provide space for discussion between people of the various areas that benefit from tracing, namely parallel, distributed and/or real-time systems, as well as kernel development. We are welcoming 30 minute presentations from both end users and developers, on topics covering, but not limited to: - Investigation workflow of real-time, latency, and throughput issues, - Trace collection and extraction, - Trace filtering, - Trace aggregation, - Trace formats, - Tracing multi-core systems, - Trace abstraction, - Trace modeling, - Automated trace analysis (e.g. dependency analysis), - Tracing large clusters and distributed systems, - Hardware-level tracing (e.g. 
DSP, GPU, bare-metal), - Trace visualization, - Interaction between debugging and tracing, - Tracing remote control, - Analysis of large trace datasets, - Cloud trace collection and analysis, - Integration between trace tools, - Live tracing & monitoring, - Dynamic instrumentation, - Programmable tracing (e.g. eBPF). Talks can cover recently available technologies, ongoing work, and yet non-existing technologies (that are compelling to end-users). Talks covering interesting or challenging tracing use cases are also welcome as they can reveal future directions or tooling needs. Please understand that this open forum is not the proper place to present sales or marketing pitches, nor technologies which are prevented from being freely used in open source. Please send any questions about this conference to . This event is organized by EfficiOS on behalf of the Linux Foundation Diagnostic and Monitoring Workgroup [4]. The organizers of this event are Mathieu Desnoyers (EfficiOS), Erica Bugden (EfficiOS) and Olivier Dion (EfficiOS). [0]: https://tracingsummit.org [1]: https://events.linuxfoundation.org/open-source-summit-europe/ [2]: https://cfp.tracingsummit.org/ts2023/cfp [3]: https://eepurl.com/goakfv [4]: https://diamon.org/ -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] I'm still getting empty ust traces using tracef
On 2023-05-12 10:52, Brian Hutchinson wrote: Hi Mathieu, On Fri, May 12, 2023 at 9:33 AM Mathieu Desnoyers wrote: On 2023-05-12 00:10, Brian Hutchinson wrote: Hmm, I missed this earlier somehow. So, I'm not the greatest at updating OE and Yocto recipes. I'm currently using this recipe: http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-kernel/lttng/lttng-ust_2.13.5.bb?h=master ... and it looks like the commit you are talking about is newer. I always think, oh, I'll just update the source URI in the recipe but it's never that simple ... and there are patches in the recipe etc. I've got a sdk (external toolchain) built for my embedded platform. Would it be too hard to just download stable-2.13 of everything and cross compile it outside of Yocto? What do you suggest? And do I need to do anything besides just get 2.13 stable working? I was kind of confused if I need to put a #define LTTNG_TRACEPOINT_DEFINE somewhere in my code. I'm not using a tracepoint provider packages at this point Hi Brian, You might want to provide a trimmed-down reproducer of your issue: example .c compile unit instrumented with tracepoints, example .c compile unit containing the tracepoint probes, and the log of the console when this application is run with LTTNG_UST_DEBUG=1. The code has two different areas where I'm trying to use tracef. The way the app is put together, each of these areas end up becoming static libs that all get lumped together to make the final executable (which is then linked with -llttng-ust and -ldl). If I'm reading between the lines correctly with respect to the commit you pointed out (that I'm missing), if I reduce the inclusion of I #include to one instance (like with the hello world that worked), I'm thinking the version I have might work. I don't know how I could trim down the large multi threaded app I'm trying to debug to share. 
Another dynamic I should mention in full disclosure, the app in question has been ported from a different OS and was on a single core cpu. The new host ( imx8) is a quad core A53 and since the app wasn't written for multicore, the cpu's are isolated and systemd is starting the app on cpu 0 but once it's up it switches it's affinity to cpu 1 so I don't know if that's a factor here or not so just mentioning it. I did try with LTTNG_UST_DEBUG=1 last night and it didn't put out much: export LTTNG_UST_DEBUG=1 # systemctl start my_app I suspect that because you run your application under systemctl, we are not seeing the console output from the application. The console output below appears to come from liblttng-ust-ctl.so linked within lttng-sessiond/consumerd, not the application. Can you find a way to run your application and capture the console output ? Thanks, Mathieu #lttng create my_tc_trace --output=/tmp/my_tc_trace Spawning a session daemon libringbuffer-clients[711/711] : LTT : ltt ring buffer client "relay-metadata-mmap" init (in lttng_ring_buffer_metadata_client_init() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/metadata-template.h:364) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-overwrite-mmap" init (in lttng_ring_buffer_client_overwrite_init() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-overwrite-rt-mmap" init (in lttng_ring_buffer_client_overwrite_rt_init() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-discard-mmap" init (in lttng_ring_buffer_client_discard_init() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-discard-rt-mmap" init (in lttng_ring_buffer_client_discard_rt_init() at 
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826) [ 179.384456] LTTng: Loaded modules v2.13.9 (Nordicité) [ 179.390366] LTTng: Experimental bitwise enum enabled. libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-discard-rt-mmap" exit (in lttng_ring_buffer_client_discard_rt_exit() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-discard-mmap" exit (in lttng_ring_buffer_client_discard_exit() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-overwrite-rt-mmap" exit (in lttng_ring_buffer_client_overwrite_rt_exit() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833) libringbuffer-clients[711/711]: LTT : ltt ring buffer client "relay-overwrite-mmap" exit (in lttng_ring_buffer_client_overwrite_exit() at ../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:83
Re: [lttng-dev] I'm still getting empty ust traces using tracef
[adding back the mailing list] On 2023-05-12 09:33, Mathieu Desnoyers wrote: On 2023-05-12 00:10, Brian Hutchinson wrote: Hmm, I missed this earlier somehow. So, I'm not the greatest at updating OE and Yocto recipes. I'm currently using this recipe: http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-kernel/lttng/lttng-ust_2.13.5.bb?h=master ... and it looks like the commit you are talking about is newer. I always think, oh, I'll just update the source URI in the recipe but it's never that simple ... and there are patches in the recipe etc. I've got a sdk (external toolchain) built for my embedded platform. Would it be too hard to just download stable-2.13 of everything and cross compile it outside of Yocto? What do you suggest? And do I need to do anything besides just get 2.13 stable working? I was kind of confused if I need to put a #define LTTNG_TRACEPOINT_DEFINE somewhere in my code. I'm not using a tracepoint provider packages at this point Hi Brian, You might want to provide a trimmed-down reproducer of your issue: example .c compile unit instrumented with tracepoints, example .c compile unit containing the tracepoint probes, and the log of the console when this application is run with LTTNG_UST_DEBUG=1. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] I'm still getting empty ust traces using tracef
On 2023-05-11 14:13, Mathieu Desnoyers via lttng-dev wrote: On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote: ... more background. I've always used ltt in the kernel so I don't have much experience with the user side of it and especially multi-threaded, multi-core so I'm probably missing some fundamental concepts that I need to understand. Which are the exact versions of LTTng-UST and LTTng-Tools you are using now ? (2.13.N or which git commit ?) Also, can you try using lttng-ust stable-2.13 branch, which includes the following commit ? commit be2ca8b563bab81be15cbce7b9f52422369f79f7 Author: Mathieu Desnoyers Date: Tue Feb 21 14:29:49 2023 -0500 Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included Fix issues with missing symbols in use-cases where tracef.h is included before defining LTTNG_UST_TRACEPOINT_DEFINE, e.g.: #include #define LTTNG_UST_TRACEPOINT_DEFINE #include It is caused by the fact that tracef.h includes tracepoint.h in a context which has LTTNG_UST_TRACEPOINT_DEFINE undefined, and this is not re-evaluated for the following includes. Fix this by lifting the definition code in tracepoint.h outside of the header include guards, and #undef the old LTTNG_UST__DEFINE_TRACEPOINT before re-defining it to its new semantic. Use a new _LTTNG_UST_TRACEPOINT_DEFINE_ONCE include guard within the LTTNG_UST_TRACEPOINT_DEFINE defined case to ensure symbols are not duplicated. Signed-off-by: Mathieu Desnoyers Change-Id: I0ef720435003a7ca0bfcf29d7bf27866c5ff8678 Thanks, Mathieu Thanks, Mathieu Regards, Brian On Thu, May 11, 2023 at 11:53 AM Brian Hutchinson wrote: Hi, I posted a while ago (thread - Using lttng 2.11 and UST doesn't appear to work - getting empty trace files) about this problem I'm having with getting empty trace logs. 
I've since upgraded to lttng v2.13 and while I can do a simple hello world program with tracef and get events in the log files, my more complicated large multi-threaded app I'm trying to debug is still getting empty log file traces. I can list the user space events in my app. Next I do: lttng enable-event --userspace 'lttng_ust_tracef:*' ... to enable the events, start lttng, start my app, and I get a trace directory structure that's empty. I feel like I've read every thread in the archives about people having the same problem. I did try using LD_PRELOAD with various libs thinking that was the problem but so far I'm still getting empty traces. So far I've tried: LD_PRELOAD=liblttng-ust-libc-wrapper.so.1:liblttng-ust-pthread-wrapper.so.1:liblttng-ust-dl.so.1:liblttng-ust-fork.so.1:liblttng-ust-fd.so.1 /usr/local/bin/my_app I guess one question I have is how do I determine which "helper libs" I need to preload? The application I'm working on is made up of a bunch of smaller static libs linked together into one big executable and that is linked with -llttng-ust and -ldl. I'm pretty stuck at the moment. Anyone have any wisdom on what I might be doing wrong or how I can tell why I'm not getting events in the logs? Thanks, Brian ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] I'm still getting empty ust traces using tracef
On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote: ... more background. I've always used ltt in the kernel so I don't have much experience with the user side of it and especially multi-threaded, multi-core so I'm probably missing some fundamental concepts that I need to understand. Which are the exact versions of LTTng-UST and LTTng-Tools you are using now ? (2.13.N or which git commit ?) Thanks, Mathieu Regards, Brian On Thu, May 11, 2023 at 11:53 AM Brian Hutchinson wrote: Hi, I posted a while ago (thread - Using lttng 2.11 and UST doesn't appear to work - getting empty trace files) about this problem I'm having with getting empty trace logs. I've since upgraded to lttng v2.13 and while I can do a simple hello world program with tracef and get events in the log files, my more complicated large multi-threaded app I'm trying to debug is still getting empty log file traces. I can list the user space events in my app. Next I do: lttng enable-event --userspace 'lttng_ust_tracef:*' ... to enable the events, start lttng, start my app, and I get a trace directory structure that's empty. I feel like I've read every thread in the archives about people having the same problem. I did try using LD_PRELOAD with various libs thinking that was the problem but so far I'm still getting empty traces. So far I've tried: LD_PRELOAD=liblttng-ust-libc-wrapper.so.1:liblttng-ust-pthread-wrapper.so.1:liblttng-ust-dl.so.1:liblttng-ust-fork.so.1:liblttng-ust-fd.so.1 /usr/local/bin/my_app I guess one question I have is how do I determine which "helper libs" I need to preload? The application I'm working on is made up of a bunch of smaller static libs linked together into one big executable and that is linked with -llttng-ust and -ldl. I'm pretty stuck at the moment. Anyone have any wisdom on what I might be doing wrong or how I can tell why I'm not getting events in the logs? 
Thanks, Brian -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] https://lists.lttng.org/pipermail/lttng-dev/2020-May/029631.html
On 2023-03-26 11:00, yashvardhan kukreti wrote: Hi Mathieu, I have a question about this patch for lttng-modules and its use of register_kprobe() to fetch the function pointer. My question is specifically from the PPC64 ELF ABI v1 perspective. Functions on PPC64 are accessed through a function descriptor, whereas register_kprobe() returns the entry point of the function. Using the returned pointer therefore interprets the address as the address of a function descriptor, dereferences the ppc_inst as the function entry point, and crashes: [ 4145.483594] kernel tried to execute exec-protected page (7c0802a6fb81ffe0) - exploit attempt? (uid: 0) Here 7c0802a6 is the mfspr instruction from the code text section of kallsyms_lookup_name(). Note that for PPC_ELF_ABI_v1, register_kprobe() searches for the dot variant of the symbol, and only if it cannot find the dot variant does it look for the normal symbol. register_kprobe() -> kprobe_addr() -> kprobe_lookup_name() [arch variant replaces weak symbol] https://elixir.bootlin.com/linux/v5.10.174/C/ident/kprobe_lookup_name Please let me know if this makes sense or if I have missed something. I have looked at the code of versions 2.12.8 and 2.12.3 of lttng-modules. Please have a look at these commits (from the stable-2.12 branch of lttng-modules): commit 53772db24facd84f1f3ddcf21a1ef5f162608721 Author: He Zhe Date: Tue Sep 27 15:59:42 2022 +0800 wrapper: powerpc64: fix kernel crash caused by do_get_kallsyms commit 8fe888d86ccad4226b05a536efb73d71bb091062 Author: Michael Jeanson Date: Thu Nov 24 14:25:33 2022 -0500 fix: kallsyms wrapper on ppc64el I suspect you'll also need this change currently in review: https://review.lttng.org/c/lttng-modules/+/9113 Please let us know whether this last change in particular fixes things on your side. Thanks, Mathieu Regards, Shashank -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
nqueue. */
    while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
            if (___cds_wfcq_busy_wait(&attempt, blocking))
                    return CDS_WFCQ_WOULDBLOCK;
    }
    return next;
    }

So the release semantic is provided by the implicit SEQ_CST barrier in:

___cds_wfcq_append():

    old_tail = uatomic_xchg(&tail->p, new_tail);

(release)

and the acquire semantic is provided by the implicit SEQ_CST barrier in:

___cds_wfcq_splice():

    /*
     * Memory barrier implied before uatomic_xchg() orders store to
     * src_q->head before store to src_q->tail. This is required by
     * concurrent enqueue on src_q, which exchanges the tail before
     * updating the previous tail's next pointer.
     */
    tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node);

Notice how the release/acquire semantic is provided by tail->p, which is atomically modified _before_ we set the node->next pointer.

With this information, is there a specific annotation that would make sense ?

Thanks, Mathieu

Ondrej -- Ondřej Surý (He/Him) ond...@sury.org

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
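The ordering described in this message can be sketched in isolation. The following is a minimal illustration, not the liburcu wfcqueue code: all `sketch_*` names are hypothetical, and plain GCC `__atomic` builtins stand in for `uatomic_xchg()` and `CMM_STORE_SHARED()`. It shows the tail being exchanged first (the SEQ_CST operation) and the previous tail's next pointer being linked only afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified queue; not the cds_wfcq data structures. */
struct sketch_node {
	struct sketch_node *next;
};

struct sketch_queue {
	struct sketch_node head;	/* dummy head node */
	struct sketch_node *tail;	/* last enqueued node, or &head */
};

static void sketch_queue_init(struct sketch_queue *q)
{
	q->head.next = NULL;
	q->tail = &q->head;
}

static void sketch_enqueue(struct sketch_queue *q, struct sketch_node *node)
{
	struct sketch_node *old_tail;

	node->next = NULL;
	/* The SEQ_CST exchange on the tail is what carries the ordering. */
	old_tail = __atomic_exchange_n(&q->tail, node, __ATOMIC_SEQ_CST);
	/*
	 * The previous tail's next pointer is linked only after the
	 * exchange; a concurrent reader may observe NULL until this store.
	 */
	__atomic_store_n(&old_tail->next, node, __ATOMIC_RELAXED);
}

static struct sketch_node *sketch_next(struct sketch_node *node)
{
	return __atomic_load_n(&node->next, __ATOMIC_RELAXED);
}
```

A reader that finds a NULL next pointer on a node that is not the tail has to wait for the enqueuer's second store, which is the role of the busy-wait loop quoted at the top of this message.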
Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
he stack for other uses. So somehow we should add an annotation about the lifetime of this object, which begins with DEFINE_URCU_WAIT_QUEUE() and ends right after "urcu_posix_assert(uatomic_read(>state) & URCU_WAIT_TEARDOWN);". Thanks, Mathieu which lead me to the fact that ThreadSanitizer doesn't intercept futex, but we can annotate the futexes: https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4 Oh boy... Ondrej -- Ondřej Surý (He/Him) ond...@sury.org ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] RCU API usage from call_rcu callbacks?
On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote: Hi, the documentation is pretty silent on this, and asking here is probably going to be faster than me trying to use the source to figure this out. Is it legal to call_rcu() from within the call_rcu() callback? Yes. call_rcu callbacks can be chained. Note that you'll need to issue rcu_barrier() on program exit as many times as you chained call_rcu callbacks if you intend to make sure no queued callbacks still exist on program clean shutdown. See this comment above urcu_call_rcu_exit(): * Teardown the default call_rcu worker thread if there are no queued * callbacks on process exit. This prevents leaking memory. * * Here is how an application can ensure graceful teardown of this * worker thread: * * - An application queuing call_rcu callbacks should invoke * rcu_barrier() before it exits. * - When chaining call_rcu callbacks, the number of calls to * rcu_barrier() on application exit must match at least the maximum * number of chained callbacks. * - If an application chains callbacks endlessly, it would have to be * modified to stop chaining callbacks when it detects an application * exit (e.g. with a flag), and wait for quiescence with rcu_barrier() * after setting that flag. * - The statements above apply to a library which queues call_rcu * callbacks, only it needs to invoke rcu_barrier in its library * destructor. What about the other RCU (and CDS) API calls? They can be unless stated otherwise. For instance, rcu_barrier() cannot be called from a call_rcu worker thread. How does that interact with create_call_rcu_data()? I have event loops and I am initializing 1:1 call_rcu helper threads as I need to do some per-thread initialization as some of the destroy-like functions use random numbers (don't ask). As I recall, set_thread_call_rcu_data() will associate a call_rcu worker instance for the current thread. 
So all following call_rcu() invocations from that thread will be queued into this per-thread call_rcu queue, and handled by the call_rcu worker thread. But I wonder why you inherently need this 1:1 mapping, rather than using the content of the structure containing the rcu_head to figure out which per-thread data should be used ? If you manage to separate the context from the worker thread instances, then you could use per-cpu call_rcu worker threads, which will eventually scale even better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids [1]. If it's legal to call_rcu() from call_rcu thread, which thread is going to be used? The call_rcu invoked from the call_rcu worker thread will queue the call_rcu callback onto the queue handled by that worker thread. It does so by setting URCU_TLS(thread_call_rcu_data) = crdp; early in call_rcu_thread(). So any chained call_rcu is handled by the same call_rcu worker thread doing the chaining, with the exception of teardown where the pending callbacks are moved to the default worker thread. Thanks, Mathieu [1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoy...@efficios.com/ Thank you, Ondrej -- Ondřej Surý (He/Him) ond...@sury.org ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Fwd: how to disable local file writing in relayd?
On 2023-03-22 02:39, Yuan Bin via lttng-dev wrote: Can I disable local-file-writing in lttng-relayd to avoid the disk space overhead, only using it as a live viewer? I am not sure why you bump this email thread. I already answered here. Perhaps you did not receive my reply ? https://lists.lttng.org/pipermail/lttng-dev/2023-March/030358.html Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for implementation
On 2023-03-20 15:38, Duncan Sands via lttng-dev wrote:

Hi Mathieu,

While OK for the general case, I would recommend that we immediately implement something more efficient on x86 32/64 which takes into account that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK prefixed atomic ops, which imply the barrier already, leaving the before/after_uatomic_*() as no-ops.

Maybe first check whether the GCC optimizers merge them. I believe some optimizations of atomic primitives are allowed and implemented, but I couldn't say which ones.

Best wishes, Duncan.

Tested on godbolt.org with:

    int a;

    void fct(void)
    {
            (void) __atomic_add_fetch(&a, 1, __ATOMIC_RELAXED);
            __atomic_thread_fence(__ATOMIC_SEQ_CST);
    }

x86-64 gcc 12.2 -O2 -std=c11:

    fct:
            lock add DWORD PTR a[rip], 1
            lock or QWORD PTR [rsp], 0
            ret
    a:
            .zero 4

x86-64 clang 16.0.0 -O2 -std=c11:

    fct:                                    # @fct
            lock inc dword ptr [rip + a]
            mfence
            ret
    a:
            .long 0

So neither gcc nor clang optimizes this today, hence the need for an x86-specific implementation.

Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
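One possible shape for the x86-specific implementation mentioned above is sketched here. It is an assumption for illustration only, not the code that was merged: the `sketch_*` names are hypothetical. The idea is that on x86 the LOCK-prefixed instruction emitted for the atomic add already orders memory, so the "after" barrier only needs to stop compiler reordering, while other architectures fall back to a full fence.

```c
#include <assert.h>

/* Hypothetical macro name; not the liburcu definition. */
#if defined(__i386__) || defined(__x86_64__)
/*
 * Assumption: the read-modify-write below compiles to a LOCK-prefixed
 * instruction on x86, which already acts as a full memory barrier, so
 * only a compiler barrier is emitted here.
 */
#define sketch_mb_after_atomic_add()	__asm__ __volatile__ ("" : : : "memory")
#else
/* Generic fallback: an explicit sequentially consistent fence. */
#define sketch_mb_after_atomic_add()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static int sketch_counter;

/* Increment the counter, then order the increment before later accesses. */
static int sketch_inc_ordered(void)
{
	(void) __atomic_add_fetch(&sketch_counter, 1, __ATOMIC_RELAXED);
	sketch_mb_after_atomic_add();
	return __atomic_load_n(&sketch_counter, __ATOMIC_RELAXED);
}
```

Compiling this on x86-64 should avoid the second locked instruction (or `mfence`) shown in the godbolt output above, at the cost of an architecture-specific code path.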
Re: [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
On 2023-03-21 10:51, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash implementation, retype the generic pointer to unsigned long to fix the following compiler error:

You will need to update the patch subject as well.

Thanks, Mathieu

    rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
            uatomic_or(&node->next, REMOVED_FLAG);
            ^
    ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
            (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
            ^ ~~
    rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
            uatomic_or(&fini_bucket->next, REMOVED_FLAG);
            ^~~~
    ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
            (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
            ^ ~~

This was not a problem before because of the way uatomic_or() was implemented, but now we pass the addr directly to __atomic_or_fetch() and the compiler doesn't like the implicit conversion from a pointer to pointer to a pointer to integer.

Signed-off-by: Ondřej Surý
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..5292725 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 	 * Knowing which wins the race will be known after the garbage
 	 * collection phase, stay tuned!
 	 */
-	uatomic_or(&node->next, REMOVED_FLAG);
+	uatomic_or((unsigned long *)&node->next, REMOVED_FLAG);
 	/* We performed the (logical) deletion. */
 	/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
 	dbg_printf("remove entry: order %lu index %lu hash %lu\n", i, j, j);
 	/* Set the REMOVED_FLAG to freeze the ->next for gc */
-	uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+	uatomic_or((unsigned long *)&fini_bucket->next, REMOVED_FLAG);
 	_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
 	}
 	ht->flavor->read_unlock();

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers with __atomic builtins
: : : "memory") - -#define cmm_mb() membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad") -#define cmm_rmb() membar_safe("#LoadLoad") -#define cmm_wmb() membar_safe("#StoreStore") Same comment as for ppc. - #ifdef __cplusplus } #endif diff --git a/include/urcu/arch/x86.h b/include/urcu/arch/x86.h index 744f9f9..af4487d 100644 --- a/include/urcu/arch/x86.h +++ b/include/urcu/arch/x86.h @@ -46,44 +46,8 @@ extern "C" { /* For backwards compat */ #define CONFIG_RCU_HAVE_FENCE 1 -#define cmm_mb()__asm__ __volatile__ ("mfence":::"memory") - -/* - * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when - * using SSE or working with I/O areas. cmm_smp_rmb/cmm_smp_wmb are - * only compiler barriers, which is enough for general use. - */ -#define cmm_rmb() __asm__ __volatile__ ("lfence":::"memory") -#define cmm_wmb() __asm__ __volatile__ ("sfence"::: "memory") -#define cmm_smp_rmb() cmm_barrier() -#define cmm_smp_wmb() cmm_barrier() Relying on the generic barrier for rmb and wmb would slow things down on x86, we may want to do like I suggest for ppc. - -#else - -/* - * We leave smp_rmb/smp_wmb as full barriers for processors that do not have - * fence instructions. - * - * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor - * systems, due to an erratum. The Linux kernel says that "Even distro - * kernels should think twice before enabling this", but for now let's - * be conservative and leave the full barrier on 32-bit processors. Also, - * IDT WinChip supports weak store ordering, and the kernel may enable it - * under our feet; cmm_smp_wmb() ceases to be a nop for these processors. 
- */ -#if (CAA_BITS_PER_LONG == 32) -#define cmm_mb()__asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#define cmm_rmb()__asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#define cmm_wmb()__asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory") -#else -#define cmm_mb()__asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#define cmm_rmb()__asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#define cmm_wmb()__asm__ __volatile__ ("lock; addl $0,0(%%rsp)":::"memory") -#endif Removing this removes support for older i686 and for URCU_ARCH_K1OM (Xeon Phi). Do we intend to remove that support ? Thanks, Mathieu #endif -#define caa_cpu_relax() __asm__ __volatile__ ("rep; nop" : : : "memory") - #define HAS_CAA_GET_CYCLES #define rdtscll(val) \ @@ -98,10 +62,10 @@ typedef uint64_t caa_cycles_t; static inline caa_cycles_t caa_get_cycles(void) { -caa_cycles_t ret = 0; + caa_cycles_t ret = 0; -rdtscll(ret); -return ret; + rdtscll(ret); + return ret; } This whitespace to tab cleanup should be moved to its own patch. Thanks, Mathieu /* -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtins for implementation
LAXED) + +#define uatomic_sub_return(addr, v) \ + __atomic_sub_fetch((addr), (v), __ATOMIC_SEQ_CST) + +#define uatomic_sub(addr, v) \ + (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED) + +#define uatomic_and(addr, mask) \ + (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED) + +#define uatomic_or(addr, mask) \ + (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) + +#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED) +#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED) + +#define cmm_smp_mb__before_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_and() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_or() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_add() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__after_uatomic_add() __atomic_thread_fence(__ATOMIC_SEQ_CST) +#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add() +#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add() +#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add() +#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add() + +#define cmm_smp_mb() cmm_mb() #endif /* _URCU_UATOMIC_H */ [...] Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH 1/7] Require __atomic builtins to build
On 2023-03-21 09:30, Ondřej Surý via lttng-dev wrote: Add autoconf checks for all __atomic builtins that urcu require, and adjust the gcc and clang versions in the README.md. Signed-off-by: Ondřej Surý --- README.md| 33 + configure.ac | 15 +++ 2 files changed, 24 insertions(+), 24 deletions(-) diff --git a/README.md b/README.md index ba5bb08..a65a07a 100644 --- a/README.md +++ b/README.md @@ -68,30 +68,15 @@ Should also work on: (more testing needed before claiming support for these OS). -Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or -better. - -The C compiler used needs to support at least C99. The C++ compiler used -needs to support at least C++11. - -The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are -supported, with the following exceptions: - - - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile -accesses to offsets in a TLS structure on 32-bit x86. These versions are -therefore not compatible with `liburcu` on x86 32-bit -(i386, i486, i586, i686). -The problem has been reported to the GCC community: -<http://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg281255.html> - - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build. -See <http://kerneltrap.org/node/7507> - - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins -support. For ARM this was introduced with GCC 4.4: -<http://gcc.gnu.org/gcc-4.4/changes.html>. - - Linux aarch64 depends on GCC 5.1 or better because prior versions -perform unsafe access to deallocated stack. - -Clang version 3.0 (based on LLVM 3.0) is supported. +Linux ARM depends on running a Linux kernel 2.6.15 or better. + +The C compiler used needs to support at least C99 and __atomic +builtins. The C++ compiler used needs to support at least C++11 +and __atomic builtins. + +The GCC compiler versions 4.7 or better are supported. + +Clang version 3.1 (based on LLVM 3.1) is supported. 
 Glibc >= 2.4 should work but the older version we test against is currently 2.17.

diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
 	AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is detected.])
 ])
+# Require __atomic builtins
+AC_COMPILE_IFELSE(
+	[AC_LANG_PROGRAM(
+		[[int x, y;]],
+		[[__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
+		  __atomic_load_n(&x, __ATOMIC_CONSUME);
+		  y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
+		  __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);
+		  __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+		  __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+		  __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],

I think we also want to test for __atomic_signal_fence here.

Thanks, Mathieu

+	[],
+	[AC_MSG_ERROR([The compiler does not support __atomic builtins])])
 ##                             ##
 ## Optional features selection ##

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
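For reference, the body of such a configure probe can be tried as ordinary C before it is wrapped in AC_LANG_PROGRAM. The sketch below mirrors the builtins listed in the patch and adds the `__atomic_signal_fence()` call suggested in the review. The function name `sketch_probe_atomic_builtins` is hypothetical, and the arguments assume the `x` and `y` integers declared by the probe.

```c
#include <assert.h>

/*
 * Exercise every __atomic builtin the probe checks for. If any of them
 * is unsupported, this fails to compile or link, which is what the
 * configure check relies on.
 */
static int sketch_probe_atomic_builtins(void)
{
	int x = 0, y = 0;

	__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
	(void) __atomic_load_n(&x, __ATOMIC_CONSUME);
	y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);	/* y = 0, x = 1 */
	/* Fails because x (1) differs from the expected y (0); y becomes 1. */
	(void) __atomic_compare_exchange_n(&x, &y, 0, 0,
			__ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
	(void) __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);	/* x = 2 */
	(void) __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);	/* x = 1 */
	(void) __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);	/* x = 1 */
	(void) __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);	/* x = 1 */
	__atomic_thread_fence(__ATOMIC_ACQ_REL);
	/* Addition suggested in the review: also probe the signal fence. */
	__atomic_signal_fence(__ATOMIC_SEQ_CST);
	return x + y;
}
```

The return value only exists so the compiler cannot discard the operations; the configure check itself only cares whether the program compiles.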
Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
On 2023-03-21 10:48, Ondřej Surý wrote: On 21. 3. 2023, at 15:46, Mathieu Desnoyers wrote: On 2023-03-21 06:21, Ondřej Surý wrote: On 20. 3. 2023, at 19:37, Mathieu Desnoyers wrote: On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote: FIXME: This is an experiment that adds an explicit memory barrier in free_completion() in workqueue.c, so ThreadSanitizer knows it's ok to free the resources. Signed-off-by: Ondřej Surý --- src/workqueue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/workqueue.c b/src/workqueue.c index 1039d72..f21907f 100644 --- a/src/workqueue.c +++ b/src/workqueue.c @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref) struct urcu_workqueue_completion *completion; completion = caa_container_of(ref, struct urcu_workqueue_completion, ref); + assert(!urcu_ref_get_unless_zero(&completion->ref)); Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ? I guess? My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release semantics and you do: if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) { /* __ATOMIC_ACQUIRE needed here */ free(obj); } We ended up using the following code in BIND 9: if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) { free(obj); } So, I am guessing after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL, this patch should no longer be needed. Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL. Yeah, I think I already did that, but wrote the email before that. Nevertheless, my main point was that it should not be needed anymore. Agreed, let's see how it holds up to testing under TSAN. :) Thanks, Mathieu Ondrej -- Ondřej Surý (He/Him) ond...@sury.org -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
On 2023-03-21 06:21, Ondřej Surý wrote: On 20. 3. 2023, at 19:37, Mathieu Desnoyers wrote: On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote: FIXME: This is an experiment that adds an explicit memory barrier in free_completion() in workqueue.c, so ThreadSanitizer knows it's ok to free the resources. Signed-off-by: Ondřej Surý --- src/workqueue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/workqueue.c b/src/workqueue.c index 1039d72..f21907f 100644 --- a/src/workqueue.c +++ b/src/workqueue.c @@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref) struct urcu_workqueue_completion *completion; completion = caa_container_of(ref, struct urcu_workqueue_completion, ref); + assert(!urcu_ref_get_unless_zero(&completion->ref)); Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort ? I guess? My experience with TSAN tells me that you need some kind of memory barrier when using acquire-release semantics and you do: if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) { /* __ATOMIC_ACQUIRE needed here */ free(obj); } We ended up using the following code in BIND 9: if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) { free(obj); } So, I am guessing after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL, this patch should no longer be needed. Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL. Thanks, Mathieu Ondrej -- Ondřej Surý (He/Him) ond...@sury.org -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
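The reference-count release pattern discussed in this thread can be written out as a small standalone sketch. The `sketch_obj` type and its helpers are hypothetical and use the GCC builtins directly rather than `urcu_ref` or `uatomic_sub_return()`. The decrement uses `__ATOMIC_SEQ_CST`, as requested in the reply, so the thread that drops the last reference is ordered after every earlier access to the object before it frees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not the urcu_ref type. */
struct sketch_obj {
	long refcount;
	int payload;
};

static struct sketch_obj *sketch_obj_new(int payload)
{
	struct sketch_obj *obj = malloc(sizeof(*obj));

	if (obj) {
		obj->refcount = 1;	/* the creator holds the first reference */
		obj->payload = payload;
	}
	return obj;
}

static void sketch_obj_get(struct sketch_obj *obj)
{
	(void) __atomic_add_fetch(&obj->refcount, 1, __ATOMIC_SEQ_CST);
}

/* Drop one reference; returns 1 if this call freed the object, else 0. */
static int sketch_obj_put(struct sketch_obj *obj)
{
	/*
	 * SEQ_CST covers both the release side (our earlier writes are
	 * visible to whoever frees) and the acquire side (the freeing
	 * thread sees everyone else's writes before calling free()).
	 */
	if (__atomic_sub_fetch(&obj->refcount, 1, __ATOMIC_SEQ_CST) == 0) {
		free(obj);
		return 1;
	}
	return 0;
}
```

With a release-only decrement, a separate acquire fence would be needed before `free()`, which is the gap the experimental patch was working around.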
Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
On 2023-03-21 10:44, Mathieu Desnoyers wrote: On 2023-03-21 06:15, Ondřej Surý wrote: On 20. 3. 2023, at 19:31, Mathieu Desnoyers wrote: On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote: When adding REMOVED_FLAG to the pointers in the rculfhash implementation, retype the generic pointer to uintptr_t to fix the compiler error. What is the compiler error ? I'm wondering whether the expected choice to match the rest of this file's content would be to use "uintptr_t *" or "unsigned long *" ? This is the error: rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid) uatomic_or(&node->next, REMOVED_FLAG); ^ ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or' (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) ^ ~~ rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid) uatomic_or(&fini_bucket->next, REMOVED_FLAG); ^~~~ ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or' (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) ^ ~~ uintptr_t is defined as "unsigned integer type capable of holding a pointer to void" while unsigned long is at least 32-bit; I guess that works in practice, but using unsigned long to retype the pointers might blow up (thinking of x32 which I know little about, but it's kind of a hybrid architecture, isn't it?) x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even that architecture is OK with casting a pointer to unsigned long. I agree with you that uintptr_t is the semantically correct type, but it should come as a separate change across the urcu code base: currently there are many places where void * is cast to unsigned long to do bitwise operations.
I therefore recommend using unsigned long here to stay consistent with the rest of the code base, and keeping the transition from unsigned long to uintptr_t for the future, as it is not an immediate issue we have to address. I forgot to mention: you should add the compiler error to the commit message. You should also explain why this was not an issue until now. It's probably related to the newly introduced use of __atomic builtins. Thanks, Mathieu Thanks, Mathieu Ondrej -- Ondřej Surý (He/Him) ond...@sury.org -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
On 2023-03-21 06:15, Ondřej Surý wrote: On 20. 3. 2023, at 19:31, Mathieu Desnoyers wrote: On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote: When adding REMOVED_FLAG to the pointers in the rculfhash implementation, retype the generic pointer to uintptr_t to fix the compiler error. What is the compiler error ? I'm wondering whether the expected choice to match the rest of this file's content would be to use "uintptr_t *" or "unsigned long *" ? This is the error: rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid) uatomic_or(&node->next, REMOVED_FLAG); ^ ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or' (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) ^ ~~ rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid) uatomic_or(&fini_bucket->next, REMOVED_FLAG); ^~~~ ../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or' (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED) ^ ~~ uintptr_t is defined as "unsigned integer type capable of holding a pointer to void" while unsigned long is at least 32-bit; I guess that works in practice, but using unsigned long to retype the pointers might blow up (thinking of x32 which I know little about, but it's kind of a hybrid architecture, isn't it?) x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even that architecture is OK with casting a pointer to unsigned long. I agree with you that uintptr_t is the semantically correct type, but it should come as a separate change across the urcu code base: currently there are many places where void * is cast to unsigned long to do bitwise operations. I therefore recommend using unsigned long here to stay consistent with the rest of the code base, and keeping the transition from unsigned long to uintptr_t for the future, as it is not an immediate issue we have to address.
Thanks,
Mathieu
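The retyping discussed in this message can be illustrated with a minimal sketch. `struct node`, `MY_REMOVED_FLAG` and the helper names are hypothetical stand-ins, not the rculfhash definitions. The sketch calls the GCC/Clang `__atomic_or_fetch` builtin directly, which only accepts a pointer to an integer type, and casts to `unsigned long *` as recommended above. It assumes `unsigned long` and `void *` have the same size, which does not hold on LLP64 targets such as 64-bit Windows.

```c
#include <stddef.h>

/* Hypothetical flag stored in the low-order bit of an aligned pointer. */
#define MY_REMOVED_FLAG 1UL

struct node {
    struct node *next;
};

/* Assumption of this sketch: a pointer fits exactly in an unsigned long. */
_Static_assert(sizeof(unsigned long) == sizeof(void *),
               "unsigned long must be pointer-sized");

/*
 * Atomically set the flag bit in n->next. The builtin rejects a
 * 'struct node **' argument, hence the cast. Accessing a pointer object
 * through an unsigned long lvalue is outside what ISO C's aliasing rules
 * guarantee; the sketch relies on the compiler accepting it.
 */
static void node_mark_removed(struct node *n)
{
    (void) __atomic_or_fetch((unsigned long *) &n->next,
                             MY_REMOVED_FLAG, __ATOMIC_RELAXED);
}

static int node_is_removed(struct node *n)
{
    unsigned long v = (unsigned long) __atomic_load_n(&n->next, __ATOMIC_RELAXED);

    return (v & MY_REMOVED_FLAG) != 0;
}

/* Returns 1 when the flag is set and the pointer bits are preserved. */
static int demo_mark_removed(void)
{
    struct node a = { NULL };
    struct node b = { &a };
    unsigned long v;

    if (node_is_removed(&b))
        return 0;
    node_mark_removed(&b);
    v = (unsigned long) __atomic_load_n(&b.next, __ATOMIC_RELAXED);
    return node_is_removed(&b) && (v & ~MY_REMOVED_FLAG) == (unsigned long) &a;
}
```

Swapping the cast for `uintptr_t *` would be the semantically cleaner variant mentioned in the thread; on the platforms discussed here both compile to the same operation.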
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation
On 2023-03-20 14:38, Mathieu Desnoyers via lttng-dev wrote:
On 2023-03-20 14:28, Ondřej Surý wrote:
On 20. 3. 2023, at 19:03, Mathieu Desnoyers wrote:

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this sequence of operations atomically: check if `addr` contains `old`. If true, then replace the content of `addr` by `new`. Return the value previously contained by `addr`. This function implies a full memory barrier before and after the atomic operation."

This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure rather than "__ATOMIC_CONSUME".

From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

If desired is written into *ptr then true is returned and memory is affected according to the memory order specified by success_memorder. There are no restrictions on what memory order can be used here. Otherwise, false is returned and memory is affected according to failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor __ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by success_memorder.

I think it makes sense that the failure_memorder has the same memorder as uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - it's the same as with __atomic_load_n, where only the following are permitted:

The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.

Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.

And it _would_ make sense to use the same memorder on cmpxchg failure as uatomic_read if we were exposing a new API, but we are modifying an already exposed documented API, so I would stick to SEQ_CST for both cmpxchg success/failure.

If we want to expose a new cmpxchg_relaxed_failure with a relaxed memorder on failure that would be fine, but we cannot change the semantic that is already documented.
Thanks,
Mathieu
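A minimal sketch of the conclusion reached in this message: a compare-and-exchange that returns the previous value and uses `__ATOMIC_SEQ_CST` for both the success and the failure memory order. `my_cmpxchg` and `demo_cmpxchg` are hypothetical names chosen for illustration; this is not the macro that was eventually merged into liburcu.

```c
/*
 * Returns the value *addr held before the operation, matching the
 * documented uatomic_cmpxchg() contract quoted above. Uses GNU C
 * statement expressions and __typeof__, like the patch under review.
 */
#define my_cmpxchg(addr, old, new_val)                                  \
    __extension__                                                       \
    ({                                                                  \
        __typeof__(*(addr)) _expected = (old);                          \
        /* On failure the builtin stores the current *addr here. */     \
        __atomic_compare_exchange_n((addr), &_expected, (new_val), 0,   \
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);\
        _expected;                                                      \
    })

/* Returns 1 when both the success and the failure path behave as documented. */
static int demo_cmpxchg(void)
{
    unsigned long v = 5;
    unsigned long prev;

    prev = my_cmpxchg(&v, 5UL, 7UL);    /* matches: v becomes 7 */
    if (prev != 5 || v != 7)
        return 0;
    prev = my_cmpxchg(&v, 5UL, 9UL);    /* does not match: v stays 7 */
    return prev == 7 && v == 7;
}
```

On success `_expected` still holds `old`, which is the previous content of `*addr`; on failure the builtin overwrites it with the current content, so the macro returns the previous value in both cases.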
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation
On 2023-03-20 14:28, Ondřej Surý wrote:
On 20. 3. 2023, at 19:03, Mathieu Desnoyers wrote:

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this sequence of operations atomically: check if `addr` contains `old`. If true, then replace the content of `addr` by `new`. Return the value previously contained by `addr`. This function implies a full memory barrier before and after the atomic operation."

This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure rather than "__ATOMIC_CONSUME".

From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

If desired is written into *ptr then true is returned and memory is affected according to the memory order specified by success_memorder. There are no restrictions on what memory order can be used here. Otherwise, false is returned and memory is affected according to failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor __ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by success_memorder.

I think it makes sense that the failure_memorder has the same memorder as uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - it's the same as with __atomic_load_n, where only the following are permitted:

The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.

Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.

Thanks,
Mathieu
Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

FIXME: This is an experiment that adds an explicit memory barrier in free_completion() in workqueue.c, so ThreadSanitizer knows it's OK to free the resources.

Signed-off-by: Ondřej Surý
---
 src/workqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
     struct urcu_workqueue_completion *completion;

     completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
+    assert(!urcu_ref_get_unless_zero(&completion->ref));

Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of some sort?

Thanks,
Mathieu

     free(completion);
 }
Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash implementation, retype the generic pointer to uintptr_t to fix the compiler error.

What is the compiler error? I'm wondering whether the expected choice to match the rest of this file's content would be to use "uintptr_t *" or "unsigned long *"?

Thanks,
Mathieu

Signed-off-by: Ondřej Surý
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
      * Knowing which wins the race will be known after the garbage
      * collection phase, stay tuned!
      */
-    uatomic_or(&node->next, REMOVED_FLAG);
+    uatomic_or((uintptr_t *)&node->next, REMOVED_FLAG);
     /* We performed the (logical) deletion. */

     /*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
         dbg_printf("remove entry: order %lu index %lu hash %lu\n",
             i, j, j);
         /* Set the REMOVED_FLAG to freeze the ->next for gc */
-        uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+        uatomic_or((uintptr_t *)&fini_bucket->next, REMOVED_FLAG);
         _cds_lfht_gc_bucket(parent_bucket, fini_bucket);
     }
     ht->flavor->read_unlock();
Re: [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý
---
 include/urcu/system.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..99e7443 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)        CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)        __atomic_load_n(&(p), __ATOMIC_RELAXED)

 /*
  * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)    __extension__ ({ CMM_ACCESS_ONCE(x) = (v); })
+#define _CMM_STORE_SHARED(x, v)    __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)

__atomic_store_n() is void. _CMM_STORE_SHARED() should evaluate to (v) (unless we decide to change the semantic, which I would rather avoid).

Thanks,
Mathieu

 /*
  * Store v into x, where x is located in shared memory. Performs the
@@ -51,9 +51,8 @@
 #define CMM_STORE_SHARED(x, v)                          \
     __extension__                                       \
     ({                                                  \
-        __typeof__(x) _v = _CMM_STORE_SHARED(x, v);     \
+        _CMM_STORE_SHARED(x, v);                        \
         cmm_smp_wmc();                                  \
-        _v = _v;    /* Work around clang "unused result" */ \
     })

 #endif /* _URCU_SYSTEM_H */
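The review comment above asks that the store macro keep evaluating to `(v)` even though `__atomic_store_n()` returns void. One way to satisfy both constraints, sketched under the assumption that GNU C statement expressions remain acceptable (the existing macros already use them), is shown below. `MY_STORE_SHARED` and `demo_store_shared` are hypothetical names, not the definitions that ended up in include/urcu/system.h.

```c
/*
 * Relaxed atomic store that still evaluates to the stored value, so
 * existing callers written as "y = STORE(x, v)" keep working.
 */
#define MY_STORE_SHARED(x, v)                                   \
    __extension__                                               \
    ({                                                          \
        __typeof__(x) _v = (v);                                 \
        __atomic_store_n(&(x), _v, __ATOMIC_RELAXED);           \
        _v;                                                     \
    })

/* Returns 1 when the macro both stores the value and yields it. */
static int demo_store_shared(void)
{
    int shared = 0;
    int ret = MY_STORE_SHARED(shared, 42);

    return ret == 42 && shared == 42;
}
```

Evaluating `(v)` once into a temporary also avoids double evaluation of an argument with side effects.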
Re: [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins
EREFERENCE_USE_VOLATILE, the user requires use of - * volatile access to implement rcu_dereference rather than - * memory_order_consume load from the C11/C++11 standards. - * * This may improve performance on weakly-ordered architectures where * the compiler implements memory_order_consume as a * memory_order_acquire, which is stricter than required by the @@ -83,35 +73,7 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. */ - -#if !defined (URCU_DEREFERENCE_USE_VOLATILE) &&\ - ((defined (__cplusplus) && __cplusplus >= 201103L) ||\ - (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)) -# define __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -#endif - -/* - * If p is const (the pointer itself, not what it points to), using - * __typeof__(p) would declare a const variable, leading to - * -Wincompatible-pointer-types errors. Using the statement expression - * makes it an rvalue and gets rid of the const-ness. - */ -#ifdef __URCU_DEREFERENCE_USE_ATOMIC_CONSUME -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(__extension__ ({ \ - __typeof__(p) __attribute__((unused)) _p0 = { 0 }; \ - _p0; \ - })) _p1; \ - __atomic_load(&(p), &_p1, __ATOMIC_CONSUME);\ - (_p1); \ - }) -#else -# define _rcu_dereference(p) __extension__ ({ \ - __typeof__(p) _p1 = CMM_LOAD_SHARED(p); \ - cmm_smp_read_barrier_depends(); \ - (_p1); \ - }) -#endif +#define _rcu_dereference(p) _rcu_get_pointer(&(p)) /** * _rcu_cmpxchg_pointer - same as rcu_assign_pointer, but tests if the pointer @@ -126,12 +88,12 @@ extern "C" { * meets the 10-line criterion in LGPL, allowing this function to be * expanded directly in non-LGPL code. 
  */
-#define _rcu_cmpxchg_pointer(p, old, _new)             \
-    __extension__                                      \
-    ({                                                 \
-        __typeof__(*p) _pold = (old);                  \
-        __typeof__(*p) _pnew = (_new);                 \
-        uatomic_cmpxchg(p, _pold, _pnew);              \
+#define _rcu_cmpxchg_pointer(p, old, _new)             \
+    ({                                                 \
+        __typeof__(*(p)) __old = old;                  \
+        __atomic_compare_exchange_n(p, &__old, _new, 0, \
+            __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);       \

__ATOMIC_SEQ_CST on both success and failure.

+        __old;                                         \
     })

 /**
@@ -145,22 +107,11 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_xchg_pointer(p, v)                \
-    __extension__                              \
-    ({                                         \
-        __typeof__(*p) _pv = (v);              \
-        uatomic_xchg(p, _pv);                  \
-    })
-
+#define _rcu_xchg_pointer(p, v) \
+    __atomic_exchange_n(p, v, __ATOMIC_ACQ_REL)

__ATOMIC_SEQ_CST.

-#define _rcu_set_pointer(p, v)                 \
-    do {                                       \
-        __typeof__(*p) _pv = (v);              \
-        if (!__builtin_constant_p(v) ||        \
-            ((v) != NULL))                     \
-            cmm_wmb();                         \
-        uatomic_set(p, _pv);                   \
-    } while (0)
+#define _rcu_set_pointer(p, v) \
+    __atomic_store_n(p, v, __ATOMIC_RELEASE)

OK.

Thanks,
Mathieu

 /**
  * _rcu_assign_pointer - assign (publicize) a pointer to a new data structure
@@ -178,7 +129,7 @@ extern "C" {
  * meets the 10-line criterion in LGPL, allowing this function to be
  * expanded directly in non-LGPL code.
  */
-#define _rcu_assign
Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()
On 2023-03-20 14:06, Mathieu Desnoyers via lttng-dev wrote:
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so ThreadSanitizer can understand the memory synchronization.

You should update the patch subject and commit message to replace "thread" by "signal".

FIXME: What should be the correct memory ordering here?

ACQ_REL is what we want here, I think this is fine. We want to prevent the compiler from reordering loads/stores across the fence, but don't want any barrier instructions issued.

We should probably make it SEQ_CST here as well, even though I doubt it changes anything in this very particular case of atomic_signal_fence.

Thanks,
Mathieu

Signed-off-by: Ondřej Surý
---
 include/urcu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
 #define caa_likely(x)   __builtin_expect(!!(x), 1)
 #define caa_unlikely(x) __builtin_expect(!!(x), 0)

-#define cmm_barrier()   __asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#define cmm_barrier()   __atomic_signal_fence(__ATOMIC_ACQ_REL)

 /*
  * Instruct the compiler to perform only a single access to a variable
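For comparison, a minimal sketch of the two compiler-barrier spellings discussed in this message, using the SEQ_CST ordering suggested in the reply. The `my_` macro names and `demo_barrier` are hypothetical. `__atomic_signal_fence()` only constrains the compiler (it orders accesses with respect to a signal handler running on the same thread) and emits no hardware barrier instruction, which is the property wanted for cmm_barrier().

```c
/* Traditional spelling: an empty asm statement with a "memory" clobber. */
#define my_barrier_asm()    __asm__ __volatile__ ("" : : : "memory")

/* Builtin spelling, visible to tools that model the __atomic builtins. */
#define my_barrier_fence()  __atomic_signal_fence(__ATOMIC_SEQ_CST)

static int demo_flag;
static int demo_data;

/*
 * The compiler may not move the store to demo_data below the barrier,
 * nor the store to demo_flag above it. Returns the sum for checking.
 */
static int demo_barrier(void)
{
    demo_data = 41;
    my_barrier_fence();
    demo_flag = 1;
    my_barrier_asm();
    return demo_data + demo_flag;
}
```

Neither form orders accesses as seen by other CPUs; that remains the job of `__atomic_thread_fence()` or of the ordering attached to the atomic operations themselves.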
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation
On 2023-03-20 14:03, Mathieu Desnoyers via lttng-dev wrote:
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Replace the custom assembly code in include/urcu/uatomic/ with __atomic builtins provided by C11-compatible compiler.

[...]

+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+    ({ \
+        __typeof__(*(addr)) __old = old; \
+        __atomic_compare_exchange_n(addr, &__old, new, 0, \
+            __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\

Actually, I suspect we'd want to change __ATOMIC_ACQ_REL to __ATOMIC_SEQ_CST everywhere, because we want total order.

Thanks,
Mathieu

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this sequence of operations atomically: check if `addr` contains `old`. If true, then replace the content of `addr` by `new`. Return the value previously contained by `addr`. This function implies a full memory barrier before and after the atomic operation."

This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure rather than "__ATOMIC_CONSUME".

+        __old; \
+    })
+
+#define uatomic_add_return(addr, v) \
+    __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+    (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+    __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+    (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+    (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+    (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
+#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_and() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_or() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_or() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_add() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_add() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add()
+
+#define cmm_smp_mb() cmm_mb()

While OK for the general case, I would recommend that we immediately implement something more efficient on x86 32/64 which takes into account that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK prefixed atomic ops, which imply the barrier already, leaving the before/after_uatomic_*() as no-ops.

Thanks,
Mathieu
Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so ThreadSanitizer can understand the memory synchronization.

You should update the patch subject and commit message to replace "thread" by "signal".

FIXME: What should be the correct memory ordering here?

ACQ_REL is what we want here, I think this is fine. We want to prevent the compiler from reordering loads/stores across the fence, but don't want any barrier instructions issued.

Thanks,
Mathieu

Signed-off-by: Ondřej Surý
---
 include/urcu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
 #define caa_likely(x)   __builtin_expect(!!(x), 1)
 #define caa_unlikely(x) __builtin_expect(!!(x), 0)

-#define cmm_barrier()   __asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#define cmm_barrier()   __atomic_signal_fence(__ATOMIC_ACQ_REL)

 /*
  * Instruct the compiler to perform only a single access to a variable
Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation
On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Replace the custom assembly code in include/urcu/uatomic/ with __atomic builtins provided by C11-compatible compiler.

[...]

+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+    ({ \
+        __typeof__(*(addr)) __old = old; \
+        __atomic_compare_exchange_n(addr, &__old, new, 0, \
+            __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this sequence of operations atomically: check if `addr` contains `old`. If true, then replace the content of `addr` by `new`. Return the value previously contained by `addr`. This function implies a full memory barrier before and after the atomic operation."

This would map to a "__ATOMIC_ACQ_REL" semantic on cmpxchg failure rather than "__ATOMIC_CONSUME".

+        __old; \
+    })
+
+#define uatomic_add_return(addr, v) \
+    __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+    (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+    __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+    (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+    (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+    (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
+#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_and() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_or() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_or() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_add() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_add() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_sub() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_sub() cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_inc() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_inc() cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_dec() cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_dec() cmm_smp_mb__after_uatomic_add()
+
+#define cmm_smp_mb() cmm_mb()

While OK for the general case, I would recommend that we immediately implement something more efficient on x86 32/64 which takes into account that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK prefixed atomic ops, which imply the barrier already, leaving the before/after_uatomic_*() as no-ops.

Thanks,
Mathieu
Re: [lttng-dev] userspace-rcu and ThreadSanitizer
On 2023-03-17 13:02, Ondřej Surý wrote:
On 17. 3. 2023, at 14:44, Mathieu Desnoyers wrote:

I would indeed like to remove all the custom atomics assembly code from liburcu now that there is good atomics support in the major compilers (gcc and clang).

Here's a very preliminary implementation:

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/2

I just did something wrong somewhere along the path and it doesn't compile now, but it did for me locally. I am submitting this now as it's 18:00 Friday evening and my kids are starting to be angry at me :).

This will need some more work - I think some of the cmm_ macros might be dropped now, and somebody who does that more often than I should take a look at the memory orderings.

A few comments:

cmm_barrier() should rather be __atomic_signal_fence().

Also I notice this macro pattern (coding style):

#define uatomic_set(addr, v) __atomic_store_n((addr), (v), __ATOMIC_RELEASE)

The extra parentheses for parameters are not needed, because the comma is pretty much the last operator in terms of priority. The following would be preferred specifically because those are separated by comma:

#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)

Our memory barrier semantics are similar to the Linux kernel, where the following imply ACQ_REL because they return something: cmpxchg, add_return, sub_return, xchg. The rest (add, sub, and, or, inc, dec) are __ATOMIC_RELAXED.

Note that cmm_smp_mb__before/after_uatomic_*() need to be implemented as __atomic_thread_fence(__ATOMIC_ACQ_REL).

There are some architectures where we will want to keep a specialized version of those add, sub, and, or, inc, dec operations which include the ACQ_REL semantic, e.g. x86, where this is implied by the LOCK prefix. For those the cmm_smp_mb__before/after_uatomic_*() will be no-ops.

The CMM_STORE_SHARED is not meant to have a RELEASE semantic. It is meant to update variables that don't need the release ordering. The ATOMIC_CONSUME was not the intent at the CMM_LOAD_SHARED level either.

(this is just from looking around at the patches, it would be better if we can have the patches posted to the mailing list for further discussion)

Thanks!
Mathieu
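The split described above (operations that return a value are ordered, the others are relaxed and rely on explicit before/after barriers) can be sketched as follows. All `my_` names are hypothetical and the sketch uses `__ATOMIC_SEQ_CST`, following the later replies in this thread that prefer SEQ_CST over ACQ_REL; it is not the liburcu implementation.

```c
/* Relaxed: returns nothing, implies no ordering. */
#define my_add(addr, v) \
    ((void) __atomic_add_fetch(addr, v, __ATOMIC_RELAXED))

/* Ordered: returns the new value, so it carries full ordering. */
#define my_add_return(addr, v) \
    __atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)

/* Generic fallback: explicit fences around the relaxed operation. */
#define my_smp_mb__before_add()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define my_smp_mb__after_add()   __atomic_thread_fence(__ATOMIC_SEQ_CST)

static long demo_counter(void)
{
    long counter = 0;

    my_add(&counter, 2);        /* plain statistics-style update */

    my_smp_mb__before_add();    /* caller requests ordering explicitly */
    my_add(&counter, 3);
    my_smp_mb__after_add();

    return my_add_return(&counter, 1);  /* ordered, yields the new value */
}
```

On an architecture where the relaxed operation already implies a full barrier, as the message notes for LOCK-prefixed instructions on x86, the two fence macros could be defined as empty instead.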
Re: [lttng-dev] userspace-rcu and ThreadSanitizer
On 2023-03-17 11:50, Ondřej Surý wrote:
On 17. 3. 2023, at 14:44, Mathieu Desnoyers wrote:

Sure, can you please submit the patch as a separate email with subject/commit message/signed-off-by tag?

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/1.patch

Would this work for you? Or do you need to have the patch attached?

Having the patch attached (e.g. using git send-email) would be better, but I don't mind downloading the file for this time.

Merged into liburcu master branch, thanks!

Mathieu
Re: [lttng-dev] userspace-rcu and ThreadSanitizer
ded with clang-17 as noreturn is now reserved word:

Sure, can you please submit the patch as a separate email with subject/commit message/signed-off-by tag?

Thanks!
Mathieu

diff --git a/include/urcu/uatomic/generic.h b/include/urcu/uatomic/generic.h
index 89d1cfa..c3762b0 100644
--- a/include/urcu/uatomic/generic.h
+++ b/include/urcu/uatomic/generic.h
@@ -38,7 +38,7 @@ extern "C" {
 #endif

 #if !defined __OPTIMIZE__ || defined UATOMIC_NO_LINK_ERROR
-static inline __attribute__((always_inline, noreturn))
+static inline __attribute__((always_inline, __noreturn__))
 void _uatomic_link_error(void)
 {
 #ifdef ILLEGAL_INSTR
diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
index 187727e..cc76f53 100644
--- a/src/urcu-call-rcu-impl.h
+++ b/src/urcu-call-rcu-impl.h
@@ -1055,7 +1055,7 @@ void urcu_register_rculfhash_atfork(struct urcu_atfork *atfork)
  * This unregistration function is deprecated, meant only for internal
  * use by rculfhash.
  */
-__attribute__((noreturn))
+__attribute__((__noreturn__))
 void urcu_unregister_rculfhash_atfork(struct urcu_atfork *atfork __attribute__((unused)))
 {
     urcu_die(EPERM);

Ondrej
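A small sketch of why the double-underscore spelling is the more robust one. One way the plain `noreturn` attribute name can break, which may or may not be the exact failure reported above, is when `noreturn` is defined as a macro: `<stdnoreturn.h>` defines it as `_Noreturn`, so `__attribute__((noreturn))` would expand to something else. The `__noreturn__` spelling is not affected. `my_die` and `checked_div` are hypothetical names.

```c
#include <stdlib.h>
#include <stdnoreturn.h>    /* defines the macro noreturn as _Noreturn */

/* Still names the intended attribute even with the macro above in scope. */
static __attribute__((__noreturn__)) void my_die(int code)
{
    exit(code);
}

/* The compiler knows control never continues past my_die(). */
static int checked_div(int a, int b)
{
    if (b == 0)
        my_die(1);
    return a / b;
}
```

The same reasoning applies to other GNU attributes used in public headers, which is why spellings such as `__unused__` or `__always_inline__` are often preferred there.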
Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU
On 2023-03-13 11:30, Ondřej Surý wrote:

Hi Matthieu, I spent some more time with the userspace-rcu on Friday and over the weekend and now I am in a much better place.

On 13. 3. 2023, at 15:29, Mathieu Desnoyers wrote:
On 2023-03-11 01:04, Ondřej Surý via lttng-dev wrote:

Hey, so, we are integrating userspace-rcu to BIND 9 (yay!) and as an experiment, I am rewriting the internal address database (keeps the infrastructure information about names and addresses).

That's indeed very interesting!

Thanks for the userspace-rcu! It saves a lot of time - while my colleague Tony Finch already wrote our internal QSBR implementation from scratch, it would be a waste of time to try to reimplement the CDS part of the library.

This is part of larger work to replace the internal BIND 9 database that's currently implemented as rwlocked RBT with qptrie. If you are interested, Tony has a good summary here: https://dotat.at/@/2023-02-28-qp-bind.html

Speaking of tries, I have implemented RCU Judy arrays in liburcu feature branches a while back. Those never made it to the liburcu master branch because I had no real-life use for those so far, and I did not want to expose a public API that would bitrot without real-life user feedback.

The lookups and ordered traversals (next/prev) are entirely RCU, and updates are either single-threaded, or use a strategy where locking is distributed within the trie so updates to data spatially discontinuous would not contend with each other.

My original implementation supported integer keys as well as variable-length string keys.

The advantage of Judy arrays is that they minimize the number of cache-lines touched on lookup traversal. Let me know if this would be useful for your use-cases, and if so I can provide links to prototype branches.

[...]
So this is part with the hashtable lookup which seems to work well: rcu_read_lock(); struct cds_lfht_iter iter; struct cds_lfht_node *ht_node; cds_lfht_lookup(adb->names_ht, hashval, names_match, , ); ht_node = cds_lfht_iter_get_node(); bool unlink = false; if (ht_node == NULL) { /* Allocate a new name and add it to the hash table. */ adbname = new_adbname(adb, name, start_at_zone); ht_node = cds_lfht_add_unique(adb->names_ht, hashval, names_match, , >ht_node); if (ht_node != >ht_node) { /* ISC_R_EXISTS */ destroy_adbname(adbname); adbname = NULL; } } if (adbname == NULL) { INSIST(ht_node != NULL); adbname = caa_container_of(ht_node, dns_adbname_t, ht_node); unlink = true; } dns_adbname_ref(adbname); What is this dns_adbname_ref() supposed to do ? And is there a reference to adbname that is still used after rcu_read_unlock() ? What guarantees the existence of the adbname after rcu_read_unlock() ? This is part of the internal reference counting - there's a macro that expects `isc_refcount_t references;` member on the struct and it creates _ref, _unref, _attach and _detach functions for each struct. The last _detach/_unref calls a destroy function. rcu_read_unlock(); and here's the part where LRU gets updated: LOCK(>lock); /* Must be unlocked by the caller */ I suspect you use a scheme where you hold the RCU read-side to perform the lookup, and then you use the object with an internal lock held. But expecting the object to still exist after rcu read unlock is incorrect, unless some other reference counting scheme is used. Yeah, I was trying to minimize the sections where we hold the rcu_read locks, but I gave up and now there's rcu_read lock held for longer periods of time. We've used that kind of scheme in LTTng lttng-relayd, where we use RCU for short-term existence guarantee, and reference counting for longer-term existence guarantee. 
An example can be found here: https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-relayd/viewer-stream.cpp viewer_stream_get_by_id() attempts lookup from the hash table, and re-validates that the object exists with viewer_stream_get(), which checks if the refcount is already 0 as it tries to increment it with urcu_ref_get_unless_zero(). If zero, it does as if the object was not found. I recommend this kind of scheme if you intend to use both RCU and reference counting. Then you can place a mutex within the object, and use that mutex to provide mutual exclusion between concurrent accesses to the object that need to be serialized. In the destroy handler (called when the reference count reaches 0), you will typically want to unlink your object from the various data structures holding references to it (hash tables, lists), and th
Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU
start_at_zone=true, now=) at adb.c:1446
#2 0x7fae87a392bf in dns_adb_createfind (adb=0x7fae830142a0, loop=0x7fae842c3a20, cb=cb@entry=0x7fae87b28d9f , cbarg=0x7fae7c679000, name=name@entry=0x7fae804fc9b0, qname=0x7fae7c679010, qtype=1, options=63, now=, target=0x0, port=53, depth=1, qc=0x7fae7c651060, findp=0x7fae804fc698) at adb.c:2149
(gdb) frame 0
#0 0x7fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at /usr/include/x86_64-linux-gnu/urcu/rculist.h:71
71 elem->next->prev = elem->prev;
(gdb) print elem->next
$1 = (struct cds_list_head *) 0x0
(gdb) print elem
$2 = (struct cds_list_head *) 0x7fae37e78880

So, I suspect, I am doing something wrong when updating the position of the name in the LRU list. There are a couple of places where we iterate through the LRU list (overmem cleaning can kick in, the user-initiated cleaning can start, shutdown can be happening...)

It gets me to wonder whether you really need RCU for the LRU list? Are those lookups very frequent? And do they typically end up needing to grab a lock to protect against concurrent list modifications?

Is there perhaps already some LRU implementation using Userspace-RCU that I can take a look at?

I don't have an example implementing an LRU with a linked list specifically, but this is not different from other linked-list uses.

Thanks,
Mathieu

Thank you!
Ondrej
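The scheme recommended in this thread (RCU for the short-term existence guarantee during lookup, a reference count for the longer-term one, with the lookup failing if the count already reached zero) can be sketched as below. This is a self-contained illustration, not liburcu or lttng-relayd code: `struct obj`, `obj_get_unless_zero()` and `obj_get_by_id()` are hypothetical, the table is a plain array standing in for a hash table, and the `my_rcu_read_lock()`/`my_rcu_read_unlock()` stubs only mark where liburcu's `rcu_read_lock()`/`rcu_read_unlock()` would go.

```c
#include <stdbool.h>
#include <stddef.h>

struct obj {
    long refcount;  /* 0 means the object is being destroyed */
    int id;
};

/* Placeholders marking the RCU read-side critical section. */
static inline void my_rcu_read_lock(void) { }
static inline void my_rcu_read_unlock(void) { }

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_get_unless_zero(struct obj *o)
{
    long old = __atomic_load_n(&o->refcount, __ATOMIC_RELAXED);

    while (old != 0) {
        if (__atomic_compare_exchange_n(&o->refcount, &old, old + 1, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return true;
        /* A failed compare-exchange reloaded old; retry with it. */
    }
    return false;
}

/* Lookup that behaves as "not found" for objects already being torn down. */
static struct obj *obj_get_by_id(struct obj *table, size_t len, int id)
{
    struct obj *found = NULL;

    my_rcu_read_lock();
    for (size_t i = 0; i < len; i++) {
        if (table[i].id == id && obj_get_unless_zero(&table[i])) {
            found = &table[i];
            break;
        }
    }
    my_rcu_read_unlock();
    return found;   /* the caller owns a reference, or gets NULL */
}

/* Returns 1 when a live object is found and a dying one is not. */
static int demo_lookup(void)
{
    struct obj table[2] = { { 1, 10 }, { 0, 20 } };  /* id 20 is dying */

    return obj_get_by_id(table, 2, 10) == &table[0]
        && table[0].refcount == 2
        && obj_get_by_id(table, 2, 20) == NULL;
}
```

Once a reference is held, the object can safely be used after the read-side critical section ends, and a mutex inside the object can serialize accesses that need mutual exclusion.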
Re: [lttng-dev] how to disable local file writing in relayd?
On 2023-03-06 00:12, Yuan Bin via lttng-dev wrote:
> Can I disable local-file-writing in lttng-relayd to avoid the disk space overhead, only using it as a live viewer?

Not explicitly, but you can store your temporary files on a tmpfs file system (see the lttng-relayd(8) --output command line parameter), which will keep the relayd files in memory only, and use the tracefile rotation feature to prevent the files from growing forever, e.g.:

https://lttng.org/docs/v2.13/#doc-enabling-disabling-channels

Example: Create a Linux kernel channel which rotates eight trace files of 4 MiB each for each stream.

    lttng enable-channel --kernel --tracefile-count=8 \
                         --tracefile-size=4194304 my-channel

See lttng-enable-channel(1) for more info.

I hope this helps!

Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
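To put a number on the rotation settings above: the space used by trace data for a channel is bounded by tracefile count times tracefile size times number of streams. Here is a minimal sketch of that arithmetic in C. The function name is hypothetical, the number of streams depends on your setup and is an assumption here, and metadata and index files are not counted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound, in bytes, on the trace data kept for one channel when
 * tracefile rotation is configured. Hypothetical helper for
 * illustration only; metadata and index files are not counted.
 */
static uint64_t max_channel_bytes(uint64_t tracefile_count,
				  uint64_t tracefile_size,
				  uint64_t nr_streams)
{
	/* This sketch assumes rotation is configured with non-zero values. */
	assert(tracefile_count > 0 && tracefile_size > 0);
	return tracefile_count * tracefile_size * nr_streams;
}
```

With the values from the command above, max_channel_bytes(8, 4194304, 1) gives 33554432 bytes, i.e. 32 MiB per stream, so the tmpfs mount needs at least that much room for every stream the relay daemon receives.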
[lttng-dev] [RELEASE] LTTng-modules 2.12.13 and 2.13.9 (Linux kernel tracer)
This is a release announcement for the two currently maintained stable branches of the LTTng-modules project.

* New in these releases

LTTng-modules v2.13.9 contains a fix required to build against Linux v6.2. Both v2.12.13 and v2.13.9 contain a set of build fixes to follow the evolution of the jbd2 tracepoint instrumentation within the Linux kernel 5.4 and 5.10 stable branches.

* Changelog

2023-03-03 (Canadian Bacon Day) LTTng modules 2.13.9
        * fix: jbd2: use the correct print format (v5.4.229)
        * fix: jbd2 upper bound for v5.10.163
        * fix: jbd2: use the correct print format (v5.10.163)
        * fix: btrfs: move accessor helpers into accessors.h (v6.2)

2023-03-03 (Canadian Bacon Day) LTTng modules 2.12.13
        * fix: jbd2: use the correct print format (v5.4.229)
        * fix: jbd2 upper bound for v5.10.163
        * fix: jbd2: use the correct print format (v5.10.163)

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] Filtering tracing by process name or PID/TID
On 2023-02-15 04:09, Rengar Stinkt via lttng-dev wrote: Dear community, I only recently started working with lttng tracing due to work related projects, so I am very new to this. I have done some research before posting this but I can't seem to find an answer. I am running several CPU load tests for specific processes on different devices using lttng and TraceCompass for visualization. I am running into the issue that 99.9% of traced processes are not of value to me and the tracing files get extremely big and hard to work with (filtering with TraceCompass is very slow). Now I thought of filtering the processes before tracing and I found filtering by PID and TID. The issue with this is that the PIDs and TIDs are unique on each device but change between devices. I then found the command "htop -d 0.1 -u **String**" to see currently running processes with a certain name. Now if I run this it shows me the running process IF they are running. I have time triggered and event triggered processes. There are many inconvenient workarounds to make it work, like triggering the events and finding out the PID and then manually copying all of the IDs and pasting them into "lttng track --kernel --pid=""". But I am trying to find a way to either filter by name right away, avoiding relying on PIDs or at least to have an automated process of doing it. But I am unfamiliar with running code in the PuTTY terminal that we are using, so I am trying to avoid this (for now). If this is the only option though, I will have to look into it. Is there any way to filter by name right away like in the mentioned htop command? Thank you so much in advance. This would be: lttng enable-event -k event_name --filter '$ctx.procname == "string"' Where "string" can include wildcards as well. Hoping this helps, Mathieu Dom ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. 
[lttng-dev] [RELEASE] Userspace RCU 0.14.0, 0.13.3, 0.12.5 [EOL]
Hi,

This is a release announcement for the Userspace RCU project. This is a set of releases, including the new 0.14 branch with the 0.14.0 release, and bug fix releases for the 0.13 and 0.12 branches. The 0.12.5 release is the last of the 0.12 branch, which reaches end of life with the release of 0.14.

Here are the new features introduced in urcu 0.14.0:

- C99 and C++11 are now the baseline requirements, as documented in README.md.

- Introduce public APIs for C++. An important point to consider: urcu/compiler.h needs to include in C++, which prevents including urcu/compiler.h from extern "C" code.

- Introduce new grace period polling APIs in urcu-memb,mb,signal,qsbr,bp flavors:

      struct urcu_gp_poll_state start_poll_synchronize_rcu(void);
      bool poll_state_synchronize_rcu(struct urcu_gp_poll_state state);

  This allows periodically polling to check whether a started grace period has completed, so a caller can wait for grace period completion and some other condition at the same time.

- rculfhash: introduce cds_lfht_node_init_deleted

  Allow initializing an lfht node to the "removed" state, which allows querying whether the node is published in a hash table both before it is added to the hash table and after it has been removed from it.

- Disable signals in URCU background threads

  Applications using signalfd depend on signals being blocked in all threads of the process, otherwise threads with unblocked signals can receive them and starve the signalfd. While some threads in URCU do block signals (e.g. workqueue worker for rculfhash), the call_rcu, defer_rcu, and rculfhash partition_resize_helper threads do not. Always block all signals before creating threads, and only unblock SIGRCU when registering a urcu-signal thread. Restore the SIGRCU signal to its pre-registration blocked state on unregistration. For rculfhash, cds_lfht_worker_init can be removed, because its only effect is to block all signals except SIGRCU. Blocking all signals is already done by the workqueue code, and unblocking SIGRCU is now done by the urcu signal flavor thread registration.

- Always use '__thread' for Thread local storage except on MSVC

  Use the GCC extension '__thread' [1] for Thread local storage on all C and C++ compilers except MSVC. While C11 and C++11 respectively offer '_Thread_local' and 'thread_local' as potentially faster implementations, they offer no guarantees of compatibility when used in a library interface which might be used by both C and C++ client code.

- Various test framework improvements.

- Wire up membarrier system call on Alpha. The only remaining architecture without membarrier wired up is MIPS. https://bugs.lttng.org/issues/940

Here are the fixes introduced in urcu 0.14.0, 0.13.3 and 0.12.5:

- Fix: auto-resize hash table destroy deadlock

  Fix a deadlock for auto-resize hash tables when cds_lfht_destroy is called with RCU read-side lock held.

- Join call_rcu worker thread in call_rcu_data_free (eliminate leaks)

- Teardown default call_rcu worker on application exit

  Teardown the default call_rcu worker thread if there are no queued callbacks on process exit. This prevents leaking memory. Here is how an application can ensure graceful teardown of this worker thread:

  - An application queuing call_rcu callbacks should invoke rcu_barrier() before it exits.
  - When chaining call_rcu callbacks, the number of calls to rcu_barrier() on application exit must match at least the maximum number of chained callbacks.
  - If an application chains callbacks endlessly, it would have to be modified to stop chaining callbacks when it detects an application exit (e.g. with a flag), and wait for quiescence with rcu_barrier() after setting that flag.
  - The statements above also apply to a library which queues call_rcu callbacks, except that it needs to invoke rcu_barrier() in its library destructor.

- Allow building on MSYS2

  Update cygwin libtool config in `configure.ac` to match MSYS2 build environments as well. MSYS2 is also a Windows build environment that produces DLLs.

Feedback is welcome!

Mathieu

Project website: https://liburcu.org
Git repository: git://git.liburcu.org/urcu.git

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
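To illustrate why chained callbacks need one rcu_barrier() call per link in the chain, here is a toy model in plain C. It is not liburcu and does not use any liburcu API: model_call(), model_barrier() and model_pending() are hypothetical stand-ins that only mimic one property, namely that a barrier waits for the callbacks already queued when it starts, and not for callbacks queued while it runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT liburcu: a fixed-size callback queue. */
#define MODEL_QUEUE_MAX 64

typedef void (*model_cb)(void *arg);

struct model_entry {
	model_cb func;
	void *arg;
};

static struct model_entry model_queue[MODEL_QUEUE_MAX];
static size_t model_head, model_tail;	/* pending = tail - head */

/* Hypothetical stand-in for call_rcu(): queue a callback. */
static void model_call(model_cb func, void *arg)
{
	assert(model_tail < MODEL_QUEUE_MAX);
	model_queue[model_tail].func = func;
	model_queue[model_tail].arg = arg;
	model_tail++;
}

/* Hypothetical stand-in for rcu_barrier(): run what was queued on entry. */
static void model_barrier(void)
{
	size_t stop = model_tail;	/* snapshot: later arrivals are not covered */

	while (model_head < stop) {
		struct model_entry e = model_queue[model_head++];

		e.func(e.arg);
	}
}

/* Number of callbacks still waiting to run. */
static size_t model_pending(void)
{
	return model_tail - model_head;
}

/* A callback that re-queues itself until its counter reaches zero. */
static void chained_cb(void *arg)
{
	int *remaining = arg;

	if (--(*remaining) > 0)
		model_call(chained_cb, remaining);
}
```

In this model, a chain of three callbacks started with a counter of 3 still has one callback pending after the first and second model_barrier() calls, and is only fully drained after the third one, which matches the rule stated in the release notes.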
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
Hi Micke,

I made some tweaks to make the code C++ compatible even though it's currently only built in C. It makes it more future-proof. I've merged the resulting patch into lttng-ust master/stable-2.13/stable-2.12.

Thanks for testing!

Mathieu

On 2023-02-06 11:15, Beckius, Mikael wrote:
> Hello Mathieu! I added your latest implementation to my test and it seems to perform well on both arm and arm64. Since the test was written in C++ I had to make a small change to the cast in order for the test to compile.
>
> Micke
>
> -----Original Message-----
> From: Mathieu Desnoyers
> Sent: 2 February 2023 17:26
> To: Beckius, Mikael; lttng- d...@lists.lttng.org
> Subject: Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
>
> Hi Mikael, I just tried another approach to fix this issue, see: https://review.lttng.org/c/lttng-ust/+/9413 Fix: use unaligned pointer accesses for lttng_inline_memcpy It is less intrusive than other approaches, and does not change the generated code on the most relevant architectures. Feedback is welcome, Thanks, Mathieu

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
Hi Mikael, I just tried another approach to fix this issue, see: https://review.lttng.org/c/lttng-ust/+/9413 Fix: use unaligned pointer accesses for lttng_inline_memcpy It is less intrusive than other approaches, and does not change the generated code on the most relevant architectures. Feedback is welcome, Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
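For readers unfamiliar with the technique named in the patch title, here is a rough sketch of one way to express unaligned accesses in GNU C. It is not the code from the Gerrit change above: the sketch_ names are hypothetical, and it relies on the GCC/Clang `packed` and `may_alias` type attributes. Accessing a field of a packed struct tells the compiler that the address may not be naturally aligned, so it emits whatever load/store sequence is safe for the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * GNU C sketch (GCC/Clang). "packed" removes the natural-alignment
 * assumption, "may_alias" lets these types overlay arbitrary bytes.
 */
struct sketch_u16 { uint16_t v; } __attribute__((packed, may_alias));
struct sketch_u32 { uint32_t v; } __attribute__((packed, may_alias));
struct sketch_u64 { uint64_t v; } __attribute__((packed, may_alias));

/* Hypothetical helper, not the lttng-ust implementation. */
static inline void sketch_inline_memcpy(void *dest, const void *src, size_t len)
{
	assert(dest != NULL && src != NULL);

	switch (len) {
	case 1:
		*(uint8_t *) dest = *(const uint8_t *) src;
		break;
	case 2:
		((struct sketch_u16 *) dest)->v = ((const struct sketch_u16 *) src)->v;
		break;
	case 4:
		((struct sketch_u32 *) dest)->v = ((const struct sketch_u32 *) src)->v;
		break;
	case 8:
		((struct sketch_u64 *) dest)->v = ((const struct sketch_u64 *) src)->v;
		break;
	default:
		/* Any other size goes through the regular memcpy. */
		memcpy(dest, src, len);
		break;
	}
}
```

On targets with cheap unaligned accesses the compiler can still emit a single load and store for each case, which is consistent with the remark above that the generated code does not change on the most relevant architectures.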
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
On 2023-01-31 11:18, Mathieu Desnoyers wrote: On 2023-01-31 11:08, Mathieu Desnoyers wrote: On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote: Hello Matthieu! I have looked at this in place of Anders and as far as I can tell this is not an arm64 issue but an arm issue. And even on arm __ARM_FEATURE_UNALIGNED is 1 so it seems the problem only occurs if size equals 8. So for ARM, perhaps we should do the following in include/lttng/ust-arch.h: #if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options Based on that documentation, it is possible to build with -mno-unaligned-access, and for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, unaligned accesses are not enabled. I would only push this kind of change into the master branch though, due to its impact and the fact that this is only a performance improvement. But setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for arm32 when __ARM_FEATURE_UNALIGNED is defined would still cause issues for 8-byte lttng_inline_memcpy with my proposed patch right ? AFAIU 32-bit arm with __ARM_FEATURE_UNALIGNED has unaligned accesses for 2 and 4 bytes accesses, but somehow traps for unaligned 8-bytes accesses ? Re-reading your analysis, I may have mistakenly concluded that using the lttng ust ring buffer in "packed" mode would be faster than aligned mode on arm32 and aarch64, but that's not really what you have benchmarked there. So forget what I said about setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS to 1 for arm32 and aarch64. There is a distinction between having efficient unaligned access and supporting unaligned accesses at all. For aarch64, it appears to support unaligned accesses, but it may be slower than aligned accesses AFAIU. 
For arm32, it supports unaligned accesses for 2 and 4 bytes when __ARM_FEATURE_UNALIGNED is set, but not for 8 bytes (it traps). Then it's not clear whether a 2 or 4 bytes access is slower when unaligned compared to aligned. At the end of the day, it's a question of compactness of the generated trace data (added throughput overhead) vs cpu time required to perform an unaligned access vs aligned. Thoughts ? Thanks, Mathieu Thanks, Mathieu In addition I did some performance testing of lttng_inline_memcpy by extracting it and adding it to a simple test program. It appears that the general performance increases on arm, arm64, arm on arm64 hardware and x86-64. But it also appears that on arm if you end up in memcpy the old code where you call memcpy directly is actually slightly faster. Nothing unexpected here. Just make sure that your test program does not call lttng_inline_memcpy with constant size values which end up optimizing away branches. In the context where lttng_inline_memcpy is used, most of the time its arguments are not constants. Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4 further improves the performance This would be naturally done on your board if we conditionally set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for __ARM_FEATURE_UNALIGNED right ? and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the best performance on arm64. This could go into lttng-ust master branch as well, e.g.: #if defined(LTTNG_UST_ARCH_AARCH64) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif Thanks! Mathieu Micke ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
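The arm32 behaviour described above (8-byte unaligned accesses trap while smaller ones work) is the kind of case where a runtime alignment test with a memcpy fallback, as suggested earlier in this thread, can be used. A minimal sketch follows, assuming GNU C for the `may_alias` attribute; the sketch_ names are hypothetical and this is not the lttng-ust code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GNU C: a 64-bit type allowed to overlay arbitrary bytes. */
typedef uint64_t __attribute__((may_alias)) sketch_u64_alias;

/* Return non-zero if p is a multiple of align (align is a power of two). */
static inline int sketch_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t) p & (align - 1)) == 0;
}

/*
 * Copy exactly 8 bytes. Use a single 64-bit access only when both
 * pointers are 8-byte aligned; otherwise fall back to memcpy, which is
 * valid for any alignment.
 */
static inline void sketch_copy8(void *dest, const void *src)
{
	assert(dest != NULL && src != NULL);

	if (sketch_is_aligned(dest, 8) && sketch_is_aligned(src, 8))
		*(sketch_u64_alias *) dest = *(const sketch_u64_alias *) src;
	else
		memcpy(dest, src, 8);
}
```

The cost of this approach is one extra test and branch per copy, which is part of the trade-off discussed above between trace compactness and the CPU time spent on each access.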
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
On 2023-01-31 11:08, Mathieu Desnoyers wrote: On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote: Hello Matthieu! I have looked at this in place of Anders and as far as I can tell this is not an arm64 issue but an arm issue. And even on arm __ARM_FEATURE_UNALIGNED is 1 so it seems the problem only occurs if size equals 8. So for ARM, perhaps we should do the following in include/lttng/ust-arch.h: #if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options Based on that documentation, it is possible to build with -mno-unaligned-access, and for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, unaligned accesses are not enabled. I would only push this kind of change into the master branch though, due to its impact and the fact that this is only a performance improvement. But setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for arm32 when __ARM_FEATURE_UNALIGNED is defined would still cause issues for 8-byte lttng_inline_memcpy with my proposed patch right ? AFAIU 32-bit arm with __ARM_FEATURE_UNALIGNED has unaligned accesses for 2 and 4 bytes accesses, but somehow traps for unaligned 8-bytes accesses ? Thanks, Mathieu In addition I did some performance testing of lttng_inline_memcpy by extracting it and adding it to a simple test program. It appears that the general performance increases on arm, arm64, arm on arm64 hardware and x86-64. But it also appears that on arm if you end up in memcpy the old code where you call memcpy directly is actually slightly faster. Nothing unexpected here. Just make sure that your test program does not call lttng_inline_memcpy with constant size values which end up optimizing away branches. In the context where lttng_inline_memcpy is used, most of the time its arguments are not constants. 
Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4 further improves the performance This would be naturally done on your board if we conditionally set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for __ARM_FEATURE_UNALIGNED right ? and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the best performance on arm64. This could go into lttng-ust master branch as well, e.g.: #if defined(LTTNG_UST_ARCH_AARCH64) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif Thanks! Mathieu Micke ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote: Hello Matthieu! I have looked at this in place of Anders and as far as I can tell this is not an arm64 issue but an arm issue. And even on arm __ARM_FEATURE_UNALIGNED is 1 so it seems the problem only occurs if size equals 8. So for ARM, perhaps we should do the following in include/lttng/ust-arch.h: #if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options Based on that documentation, it is possible to build with -mno-unaligned-access, and for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, unaligned accesses are not enabled. I would only push this kind of change into the master branch though, due to its impact and the fact that this is only a performance improvement. In addition I did some performance testing of lttng_inline_memcpy by extracting it and adding it to a simple test program. It appears that the general performance increases on arm, arm64, arm on arm64 hardware and x86-64. But it also appears that on arm if you end up in memcpy the old code where you call memcpy directly is actually slightly faster. Nothing unexpected here. Just make sure that your test program does not call lttng_inline_memcpy with constant size values which end up optimizing away branches. In the context where lttng_inline_memcpy is used, most of the time its arguments are not constants. Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4 further improves the performance This would be naturally done on your board if we conditionally set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for __ARM_FEATURE_UNALIGNED right ? and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the best performance on arm64. 
This could go into lttng-ust master branch as well, e.g.: #if defined(LTTNG_UST_ARCH_AARCH64) #define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 #endif Thanks! Mathieu Micke ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
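Regarding the caution above about constant sizes in a test program, one way to keep the compiler from folding the size into the call site is to read each size through a volatile pointer. The sketch below shows only that idea; copy_under_test() is a hypothetical stand-in for the function being measured, and timing is left out (wrap run_copies() with the clock of your choice).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the copy routine being benchmarked. */
static inline void copy_under_test(void *dest, const void *src, size_t len)
{
	switch (len) {
	case 1:
		*(unsigned char *) dest = *(const unsigned char *) src;
		break;
	case 2:
		memcpy(dest, src, 2);	/* constant size: typically inlined */
		break;
	case 4:
		memcpy(dest, src, 4);
		break;
	case 8:
		memcpy(dest, src, 8);
		break;
	default:
		memcpy(dest, src, len);
		break;
	}
}

/*
 * Run "iterations" copies, cycling through the given sizes. Each size
 * is loaded through a volatile pointer, so it is not a compile-time
 * constant and the size dispatch above cannot be optimized away.
 * Returns the total number of bytes copied.
 */
static size_t run_copies(unsigned char *dest, const unsigned char *src,
			 const volatile size_t *sizes, size_t nr_sizes,
			 size_t iterations)
{
	size_t total = 0;

	assert(nr_sizes > 0);
	for (size_t i = 0; i < iterations; i++) {
		size_t len = sizes[i % nr_sizes];	/* volatile read */

		copy_under_test(dest, src, len);
		total += len;
	}
	return total;
}
```

Mixing several sizes in the table also exercises the branch on the length, which is closer to how the function is called in the tracer, where the lengths are rarely constant.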
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
On 2023-01-26 14:32, Anders Wallin wrote: Hi Mathieu, I've retired and no longer have access to any aarch64 target to test it on. Thanks for your reply Anders, I've talked to Henrik and Pär today and they are already testing it out. Enjoy your retirement :) Best regards, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization
Hi Anders, Sorry for the long delay on this one, can you have a look at the following fix? https://review.lttng.org/c/lttng-ust/+/9319 Fix: aarch64: do not perform unaligned stores If it passes your testing, I'll merge this into lttng-ust. Thanks, Mathieu On 2017-12-28 09:13, Anders Wallin wrote: Hi Mathieu, I finally got some time to dig into this issue. The crash only happens when metadata is written AND the size of the metadata ends up in a write that is 8, 4, 2 or 1 bytes long AND the source or destination is not aligned correctly according to the hardware's limitations. I have not found any simple way to keep the performance enhancement code that is run most of the time. Maybe the metadata writes should have their own write function instead. Here is an example of a crash (code is from lttng-ust 2.9.1 and lttng-tools 2.9.6) where the size is 8 bytes and the src address is unaligned at 0xf3b7eeb2: #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 No locals. #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, config=0xf737c560 ) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 __len = 8 handle = 0xf3b2e0c0 backend_pages = chanb = 0xf3b2e2e0 offset = #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 No locals.
#3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=out>, metadata_str=0xf3b7eeb2 "", len=) at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size = 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' times>, backend_pages = 0xf690c000} chan = 0xf3b2e4d8 str = 0xf3b7eeb2 "" reserve_len = 8 ret = __func__ = '\000' __PRETTY_FUNCTION__ = '\000' #4 0x000344cc in commit_one_metadata_packet (stream=stream@entry=0xf3b2e560) at ust-consumer.c:2206 write_len = ret = __PRETTY_FUNCTION__ = "commit_one_metadata_packet" #5 0x00036538 in lttng_ustconsumer_read_subbuffer (stream=stream@entry=0xf3b2e560, ctx=ctx@entry=0x25e6e8) at ust-consumer.c:2452 len = 4096 subbuf_size = 4093 padding = err = -11 write_index = 1 ret = ustream = index = {offset = 0, packet_size = 575697416355872, content_size = 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end = 34359738496, Regards Anders On Fri, 24 Nov 2017 at 20:18, Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote: On Nov 24, 2017, at 3:23 AM, Anders Wallin <walli...@gmail.com> wrote: Hi, architectures that have memory alignment restrictions may/will fail with the optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3. Please revert the patch or make it x86-specific. Hi Anders, This was added in the development cycle of lttng-ust 2.9. We could perhaps add a test on the pointer alignment for architectures that care about it, and fall back to memcpy in those cases. The revert approach would have been justified if this commit had been backported as a "fix" to a stable branch, which is not the case here.
We should work on finding an acceptable solution that takes care of dealing with unaligned pointers on architectures that care about the difference. Thanks, Mathieu Regards Anders Wallin commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3 Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com> Date: Sun Sep 25 12:31:11 2016 -0400 Performance: implement lttng_inline_memcpy Because all length parameters received for serializing data coming from applications go through a callback, they are never constant, and it hurts performance to perform a call to memcpy each time. Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com> diff --git a/libringbuffer/backend_internal.h b/libringbuffer/backend_internal.h index 90088b89..e597cf4d 100644 --- a/libringbuffer/backend_internal.h +++ b/libringbuffer/backend_internal.h @@ -592,6 +592,28 @@ int update_read_sb_index(const struct lttng_ust_lib_ring_buffer_config
[lttng-dev] [RELEASE] LTTng-modules 2.12.12 and 2.13.8 (Linux kernel tracer)
Hi,

These are stable release updates of the LTTng modules project. The most relevant changes are that the 2.13.8 version introduces support for the 6.1 Linux kernel, kernel version range updates for the RHEL kernels, and a kallsyms wrapper fix on ppc64el.

The LTTng modules provide Linux kernel tracing capability to the LTTng tracer toolset.

* New in these releases:

2023-01-13 (National Sticker Day) LTTng modules 2.13.8
        * fix: jbd2: use the correct print format
        * Fix: in_x32_syscall was introduced in v4.7.0
        * Explicitly skip tracing x32 system calls
        * fix: kallsyms wrapper on ppc64el
        * fix: Adjust ranges for RHEL 8.6 kernels
        * fix: kvm-x86 requires CONFIG_KALLSYMS_ALL
        * fix: mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using (v6.1)

2023-01-13 (National Sticker Day) LTTng modules 2.12.12
        * fix: jbd2: use the correct print format
        * Fix: in_x32_syscall was introduced in v4.7.0
        * Explicitly skip tracing x32 system calls
        * fix: kallsyms wrapper on ppc64el
        * fix: Adjust ranges for RHEL 8.6 kernels
        * fix: kvm-x86 requires CONFIG_KALLSYMS_ALL

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [lttng-dev] LTTng UST structure support
On 2023-01-09 09:02, chafraysse--- via lttng-dev wrote:
> Hi, I'm looking for a CTF writer to serialize instrumentations in an embedded Linux/Rust framework. LTTng UST looked like a very strong option, but I want to serialize structures as CTF compound type structures and I did not see those supported in the doc or API.

This is correct. I am currently working on a new project called "libside" (see https://git.efficios.com/?p=libside.git;a=summary) which features support for compound types. However, we still need to do the heavy-lifting implementation work of integrating this with LTTng-UST. This is the plan towards supporting compound types in LTTng-UST.

> I'd love to have confirmation that I did not just miss something :) If LTTng UST is out for me I will probably try to use the ctf-writer module of babeltrace 2 instead.

For now the ctf-writer modules of bt2 would be an alternative to consider, but remember that they are not designed for low-impact tracing the way lttng-ust is. So it depends on how much tracer overhead/runtime impact you can afford in your use case.

Thanks, Mathieu

> Best regards, Charles

-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
[lttng-dev] lttv: Document project status as unmaintained
Hi Florian, I'll pull your patch into the lttv master branch, but please be aware that the LTTV project has not seen activity since 2013, is currently unmaintained, and that we do not plan on doing further releases of this project. Our efforts were diverted elsewhere on the trace analysis front, namely into Trace Compass and Babeltrace. In order to clarify the situation, I will introduce a commit into LTTV's master branch which will remove Yannick Brosseau from the maintainer role in the README file, and add this section at the beginning: PROJECT STATUS The LTTV project is currently unmaintained. If you need up-to-date tools to view/analyze LTTng traces, please consider the following alternatives: - Trace Compass (https://www.eclipse.org/tracecompass) - Babeltrace (https://babeltrace.org) Thank you Yannick for stepping into the role of maintainer near the end of this project lifetime. Michael Jeanson noticed that the lttv Fedora package was orphaned. He just adopted it and is currently investigating the Fedora documentation to figure out how to request its removal from Fedora. For those interested in historical artifacts, I created the lttv svn repository back in 2003 when I was sitting at the Decelles building at Ecole Polytechnique, working for Prof. Michel Dagenais: commit bbdf43d6e0e3bd3f9ade420e81915408cbe4fbba Author: compudj Date: Thu May 15 13:07:17 2003 + Initial repository layout git-svn-id: http://ltt.polymtl.ca/svn@1 04897980-b3bd-0310-b5e0-8ef037075253 This was the beginning of a fun ride which turned out motivating the creation of LTTng, the Linux kernel Tracepoints, the Common Trace Format, Trace Compass, Babeltrace, liburcu, the membarrier(2), and the rseq(2) system calls. LTTV had a good 10 years of activity from 2003 to 2013, but it is now high time to redirect users to Trace Compass and Babeltrace instead. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. 
Re: [lttng-dev] README issue of liburcu
On 2022-11-02 03:24, Yongwei Wu via lttng-dev wrote: I apologize if this is not the right place. I do not see an Issues page on GitHub. The README on GitHub now says MacOS is among "Tested on", so should we remove Darwin from "Should also work on"? Removed from README file in the master branch. Thanks, Mathieu Best regards, Yongwei -- Yongwei Wu URL: http://wyw.dcweb.cn/ <http://wyw.dcweb.cn/> ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev