Re: [lttng-dev] [lttng-relayd] is there existing cases for relayd to stream over Android usb based adb?

2024-06-04 Thread Mathieu Desnoyers via lttng-dev

On 2024-06-04 11:25, Wu, Yannan wrote:

The device is a rooted Android device.

*On the device:*

lttng-sessiond -d --no-kernel

lttng create my-live-session --live

lttng enable-event -u 

lttng start


*On the host:*

adb reverse tcp:5343 tcp:5343

The adb reverse command fails with "adb.exe: error: cannot bind to socket".


In the reverse order, if I set up adb reverse from the host first and
create the live session afterwards, lttng-relayd on the device cannot be started.

Here is the error message:

PERROR - 15:23:30.915938387 [9813/9829]: Failed to bind socket: Address 
already in use (in relay_socket_create() at 
/src/VodkaLttngTool/build/private/source/src/bin/lttng-relayd/main.c:1036)

Error: Health error occurred in relay_thread_listener
Error: A file descriptor leak has been detected: 1 tracked file 
descriptors are still being tracked


The "reverse order" you describe is the order you need. What you are
missing is to run lttng-relayd on the host and to forward both ports
5342 *and* 5343. You will also need to either override the target URLs
for the live control and data ports to prevent sessiond from auto-spawning
a relayd, or forward the live viewer port (5344) through adb as well.

Overall:

* First on the Host:

lttng-relayd
adb reverse tcp:5342 tcp:5342  # control port
adb reverse tcp:5343 tcp:5343  # data port
adb reverse tcp:5344 tcp:5344  # live viewer port

* Then on the Android Device:

lttng-sessiond -d --no-kernel
lttng create my-live-session --live --ctrl-url=tcp://localhost:5342 --data-url=tcp://localhost:5343
lttng enable-event -u 
lttng start

The relayd auto-spawn needs to be prevented because the "lttng create"
command line attempts to connect to the localhost relayd as a viewer
(default port tcp 5344). So if you don't forward this port through adb
as well, the sessiond will always try to auto-spawn a relayd, which
conflicts with your forwarded ports on the Android device.

Technically, either forwarding port 5344 or specifying the control/data
URL overrides is sufficient to prevent the relayd auto-spawn, but
I'd recommend doing both if possible.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [lttng-relayd] is there existing cases for relayd to stream over Android usb based adb?

2024-06-04 Thread Mathieu Desnoyers via lttng-dev

Hi Amanda,

For each of the 4 commands described below, please clarify on
which device each one is executed: on the Android device
or on the Development device.

Please make sure to follow the commands proposed by Kienan
to the letter: in the correct order, and on the appropriate device.

Thanks,

Mathieu


On 2024-06-02 22:55, Wu, Yannan via lttng-dev wrote:

Yes. My test command is like below:

 1. lttng-sessiond --d --no-kernel
 2. yannanwu@ue91e96f2951b5c:~/trees/lttng_test_run$ lttng create my-user-space-live-session --live
    Live session my-user-space-live-session created.
    Traces will be output to tcp4://127.0.0.1:5342/ [data: 5343]
    Live timer interval set to 100 us
 3. After this, I could run "ps -Ax|grep lttng" and see that lttng-relayd
    had started. But once I start adb reverse, it fails because it cannot
    bind to the socket.
 4. In the other order, if I start adb reverse first and run lttng create
    later, lttng create does not fail but lttng-relayd is not started.
    Starting lttng-relayd manually also fails because it is unable to
    bind to the socket.

Amanda


*From:* Kienan Stewart 
*Sent:* Friday, May 31, 2024 3:12:16 AM
*To:* Wu, Yannan; lttng-dev@lists.lttng.org
*Subject:* RE: [EXTERNAL] [lttng-dev] [lttng-relayd] is there existing 
cases for relayd to stream over Android usb based adb?




Hi Amanda,

I'd like to confirm my understanding of the situation.

Android device
    - Running lttng-sessiond with one or more configured sessions

Development device
    - Connected to the Android device over USB using adb

You want the data captured on the Android device to be streamed via
the USB connection rather than over the other networks on the Android device.

Could you expand on the commands you used to set up the tracing sessions
and relay, and where each of those commands were run?

It sounds to me like you might want to be doing something like the
following:

(Development device) Start lttng-relayd:
    - tcp://0.0.0.0:5342 and :5343 will be bound on the development device
    - tcp://127.0.0.1:5344 will be available for the live reader

(Development device) Create adb reverse forwards for ports 5342 and 5343
    - At this point :5342 and :5343 should be available on the Android
      device and reach the relayd running on the development device

(Android device) Start lttng-sessiond
(Android device) Create session(s): `lttng create -U tcp://localhost/`
    - Using `-U/--set-url`, no relayd will be spawned on the Android device
(Android device) Start session(s)

This setup should have the relayd running on the development device,
writing the traces there and/or serving them to a live viewer. On the
Android device, the UST applications (if any) will connect to the local
sessiond and consumers, which will shuttle the information over :5342
and :5343 to the development device via the reverse sockets.

Please note that I didn't have time to test this, so there might be some
mistakes. As I requested above, clear details of the exact commands you
use for the tracing setup would be very helpful to have the clearest
understanding of what you're doing.

hope this helps,
kienan

On 5/30/24 1:53 AM, Wu, Yannan via lttng-dev wrote:

Hi there,

I am currently working on enabling lttng live mode over Android USB adb.
Here is the situation: while debugging some network-related issues, we
don't want the trace data to be streamed over the network, which would
add extra load to the system being profiled. So we chose to connect
lttng-relayd through adb port forwarding so that the data is "forwarded"
to the host.

*Here is the set up and the problem:*

For the device:  adb reverse tcp:5342 tcp:5342; adb reverse tcp:5343
tcp:5343; adb reverse tcp:5344 tcp:5344
Then start lttng with --live enabled.

*What is expected:*
lttng starts streaming to localhost.
*What is seen:*
lttng-relayd fails to start because it is unable to bind to the socket.

*The cause of this issue:*

Both adb reverse and lttng-relayd need to bind to the same sockets, so
they conflict with each other.
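The conflict is an ordinary bind() collision: after `adb reverse tcp:5342 tcp:5342`, adbd on the device is the process listening on that port, so a relayd started on the same device and asking for the same port gets EADDRINUSE ("Address already in use", as in the PERROR above). A minimal POSIX C sketch of that failure, with hypothetical helper names and a kernel-chosen loopback port standing in for 5342/5343:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP socket listening on 127.0.0.1:port (port 0 lets the
 * kernel pick a free one). Returns the fd, or -1 with errno set.
 */
static int listen_loopback(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		int saved_errno = errno;	/* close() may clobber errno */

		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
}

/* Return the local port a bound socket is using. */
static unsigned short local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int ret = getsockname(fd, (struct sockaddr *) &addr, &len);

	assert(ret == 0);
	(void) ret;
	return ntohs(addr.sin_port);
}
```

The first listener plays the role of adbd and the second the role of a relayd started on the same device. Since only one process can own the port, the relayd has to run on the other end of the adb tunnel, i.e. on the host.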


So what I want to ask is: for embedded system use cases, has anyone
successfully streamed trace data in live mode to the host over USB-based
adb? If not, do you have any ideas or suggestions on how to proceed?

Amanda







--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Re: [lttng-dev] [PATCH] Fix mm_vmscan_lru_isolate tracepoint for RHEL 9.4 kernel

2024-05-22 Thread Mathieu Desnoyers via lttng-dev

On 2024-05-17 12:04, Kienan Stewart via lttng-dev wrote:

Hi Martin,

thanks for the patch.

I changed the version range slightly. The RHEL kernel 5.14.0-427.13.1 
still has the `isolate_mode` parameter in the `mm_vmscan_lru_isolate` 
tracepoint; it was only removed in 5.14.0-427.16.1.
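Why the lower bound matters can be seen with a small comparison sketch. It does not reproduce lttng-modules' LTTNG_RHEL_KERNEL_RANGE macro: the struct, the function names and the half-open [low, high) semantics are assumptions made for illustration. With the 427.0.0 bound from the original patch, 5.14.0-427.13.1 would select the new tracepoint layout even though that kernel still has `isolate_mode`; with the 427.16.1 bound it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical representation of a RHEL kernel version such as
 * 5.14.0-427.16.1: three upstream components followed by three
 * RHEL release components.
 */
struct rhel_version {
	unsigned int v[6];
};

/* Lexicographic comparison; returns <0, 0 or >0 like strcmp(). */
static int rhel_version_cmp(const struct rhel_version *a,
			    const struct rhel_version *b)
{
	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < 6; i++) {
		if (a->v[i] != b->v[i])
			return a->v[i] < b->v[i] ? -1 : 1;
	}
	return 0;
}

/* Half-open range [low, high), the semantics assumed here for the range check. */
static bool rhel_version_in_range(const struct rhel_version *ver,
				  const struct rhel_version *low,
				  const struct rhel_version *high)
{
	return rhel_version_cmp(ver, low) >= 0 &&
	       rhel_version_cmp(ver, high) < 0;
}
```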


I also forward ported the patch to the master branch.

The updated patches will be reviewed at: 
https://review.lttng.org/q/topic:%22buildfix-el9.4%22


Merged into lttng-modules master and stable-2.13, thanks!

Mathieu



thanks,
kienan

On 5/17/24 10:30 AM, Martin Hicks via lttng-dev wrote:



Red Hat has moved to using the format first found in the 6.7 kernel
for the mm_vmscan_lru_isolate tracepoint.

Signed-off-by: Martin Hicks 
---
  include/instrumentation/events/mm_vmscan.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/instrumentation/events/mm_vmscan.h b/include/instrumentation/events/mm_vmscan.h
index ea6f4b7..49a9eae 100644
--- a/include/instrumentation/events/mm_vmscan.h
+++ b/include/instrumentation/events/mm_vmscan.h
@@ -369,7 +369,9 @@ LTTNG_TRACEPOINT_EVENT_MAP(mm_shrink_slab_end,
  )
  #endif
-#if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(6,7,0))
+#if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(6,7,0) || \
+ LTTNG_RHEL_KERNEL_RANGE(5,14,0,427,0,0, 5,15,0,0,0,0))
+
  LTTNG_TRACEPOINT_EVENT(mm_vmscan_lru_isolate,
  TP_PROTO(int classzone_idx,



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Capturing snapshot on kernel panic

2024-05-16 Thread Mathieu Desnoyers via lttng-dev

Hi Damien,

If kexec is not an option on your system, you might be able to
access the pmem+dax filesystem after a warm reboot, but it very
much depends on whether or not your BIOS clears your memory on
warm reboot.

Cheers,

Mathieu

On 2024-05-16 14:22, Damien Berget via lttng-dev wrote:

Thanks Kienan for these quick suggestions,
we'll investigate the pmem route (I was not aware of the lttng-crash 
utility, it's pretty nice). Even if I'm not sure how fast it would burn 
through our SSD, it might still be worth trying.
As for kexec-tools, it's not officially supported on our embedded modules 
unfortunately, so we might be SOL there. We may have to try to add our 
own tracepoint in the kernel to use as a trigger.

Cheers
Damien

On Thu, May 16, 2024 at 8:12 AM Kienan Stewart > wrote:


Hi Damien,

I want to expand on one of the options that could work for your case.

On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
 > Hi Damien,
 >
 >
 > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
 >> Good day,
 >> we have been using LTTng successfully to capture snapshots on user
 >> defined tracepoints and it proved invaluable to debug our issues.
 >> Thanks to all the contributors of this project!
 >>
 >> We'd like to know if it would be possible to trigger on a kernel
 >> panic? It might be dubiously possible as you would still need to have
 >> the file-system working to write the results, but I thought I should ask.
 >>
 >
 > For userspace tracing, I think the recommendation is usually to use a
 > dax/pmem device and have the buffers for the session mapped there. After
 > a panic, the contents of the buffers can be restored using lttng-crash[1].
 >
 > Note that dax/pmem isn't supported by the kernel space tracer at this time.
 >
 > If I recall, there are other ways to do things in the panic sequence (that
 > aren't lttng specific), but I'm personally not as familiar with the
 > details of that stage of linux.
 >

It's possible to use kexec-tools to load a new kernel post-panic[1]. If
your system uses kexec, the contents of RAM aren't necessarily flushed,
and if both the initial kernel and the post-panic kernel started by kexec
have the same configuration for an emulated PMEM device using the memmap
parameter [2,3], that region of memory can have a daxfs created in it
after a clean boot.

Note: some systems may not flush the memory during a warm reboot, but
this is dependent on the BIOS.

When your system boots you could do something like the following:

   * If it's a clean boot, create the daxfs
   * If it's an "unclean" boot (e.g. the daxfs already exists, or a
     kernel parameter informs you that it started post-panic) then you can
     copy/move/use lttng-crash to persistent storage for analysis
   * Start tracing using a snapshot session and the userspace buffers on
     the daxfs.

In this type of situation the "snapshot" command is never invoked
directly, but the recovery of the buffers to create a snapshot is
possible.
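The boot-time decision in the list above can be sketched as plain C logic. Everything here is hypothetical: how `daxfs_present` is detected and the name of the kernel parameter that marks a post-panic boot are left to the integrator, and the actual recovery and tracing steps (running lttng-crash, creating the snapshot session) are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum boot_action {
	BOOT_CREATE_DAXFS,	/* clean boot: create the daxfs, then trace */
	BOOT_RECOVER_BUFFERS,	/* unclean boot: save buffers with lttng-crash first */
};

/* True if param appears in cmdline as a whole token (optionally "param=..."). */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	if (len == 0)
		return false;
	while ((p = strstr(p, param)) != NULL) {
		bool starts = (p == cmdline || p[-1] == ' ');
		char end = p[len];
		bool ends = (end == '\0' || end == ' ' || end == '\n' || end == '=');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/*
 * daxfs_present: whether a daxfs was found in the memmap= region.
 * postpanic_param: hypothetical parameter passed to the kexec'd kernel.
 */
static enum boot_action decide_boot_action(const char *cmdline,
					   bool daxfs_present,
					   const char *postpanic_param)
{
	assert(cmdline != NULL && postpanic_param != NULL);
	if (daxfs_present || cmdline_has_param(cmdline, postpanic_param))
		return BOOT_RECOVER_BUFFERS;
	return BOOT_CREATE_DAXFS;
}
```

In both cases the boot then continues with the third step, starting the snapshot session with its userspace buffers on the daxfs.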

[1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap


thanks,
kienan

 >> Looking at available kernel syscalls, the "reboot" one seems like a
 >> good candidate, however I was not able to capture a snapshot on it. I
 >> have tested the setup below with the "--name=chdir" syscall and it
 >> works: "cd" to a directory will create a trace. But no dice with reboot.
 >>
 >
 > The details of how this works will depend on your system. For example, my
 > installations tend to use systemd as PID 1. The broad strokes seem to
 > be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
 > believe then kicks off the reboot.service, the PID 1 is swapped to
 > /usr/lib/systemd/systemd-shutdown, sigterm then sigkill are sent to all
 > processes, unmounts, syncs, calls the reboot system call [2,3].
 >
 > As both the sigterm and the unmounts are done before the syscall,
 > lttng-sessiond and the consumers will have already shut down by the time
 > it enters.
 >
 > While this doesn't necessarily help your original question of panics, if
 > you want to snapshot before shutdown or reboot and are using systemd,
 > it's possible to leave a script or 

[lttng-dev] [RELEASE] LTTng-modules 2.13.13 and 2.12.17 (Linux kernel tracer)

2024-05-13 Thread Mathieu Desnoyers via lttng-dev

Hi,

This is a stable release announcement for the LTTng kernel tracer,
an out-of-tree kernel tracer for the Linux kernel.

The LTTng project provides low-overhead, correlated userspace and
kernel tracing on Linux. Its use of the Common Trace Format and a
flexible control interface allows it to fulfill various workloads.

* New in these releases:

- LTTng-modules 2.13.13:

  - Introduce support for Linux v6.9.

  - Remove unused duplicated code, add missing static to
function definitions, and add missing includes for function
declarations. These issues were observed when building against recent
kernels with newer toolchains.
We plan to adapt our CI to add jobs that will report warnings
as errors when building lttng-modules against recent kernels
with a recent toolchain so we can catch and fix those warnings
earlier in the future.

- In both LTTng-modules 2.12.17 and 2.13.13:

  - Fix incorrect get_pfnblock_flags_mask prototype which did not match
upstream after upstream commit 535b81e209219 (v5.9). Fix the prototype
mismatch detection code as well. This affects the event
mm_page_alloc_extfrag which uses get_pageblock_migratetype(). Note that
because the kernel macro get_pageblock_migratetype was also updated
to pass 3 parameters to get_pfnblock_flags_mask as its kernel prototype
was updated to expect three parameters, it does not matter that the
lttng-modules wrapper expects 4 parameters and provides those 4 parameters
to the kernel function. This issue should therefore not affect the
runtime behavior.

  - Instrumentation updates to support EL 8.4+.

  - Instrumentation updates for RHEL kernels.

  - Instrumentation updates to the timer subsystem to adapt to
changes backported in the 4.19 stable kernels.


* Detailed change logs:

2024-05-13 (National Leprechaun Day) LTTng modules 2.13.13
* splice wrapper: Fix missing declaration
* page alloc wrapper: Fix get_pfnblock_flags_mask prototype
* lttng probe: include events-internal.h
* syscalls: Remove unused duplicated code
* statedump: Add missing events-internal.h include
* lttng-events: Add missing static
* event notifier: Add missing static
* context callstack: Add missing static
* lttng-clock: Add missing lttng/events-internal.h include
* lttng-calibrate: Add missing static and include
* lttng-bytecode: Remove dead code
* lttng-abi: Add missing static to function definitions
* ring buffer: Add missing static to function definitions
* blkdev wrapper: Fix constness warning
* Fix: timer_expire_entry changed in 4.19.312
* Fix: dev_base_lock removed in linux 6.9-rc1
* Fix: mm_compaction_migratepages changed in linux 6.9-rc1
* Fix: ASoC add component to set_bias_level events in linux 6.9-rc1
* Fix: ASoC snd_doc_dapm on linux 6.9-rc1
* Fix: build kvm probe on EL 8.4+
* Fix: support ext4_journal_start on EL 8.4+
* Fix: correct RHEL range for kmem_cache_free define

2024-05-13 (National Leprechaun Day) 2.12.17
* page alloc wrapper: Fix get_pfnblock_flags_mask prototype
* Fix: timer_expire_entry changed in 4.19.312
* Fix: build kvm probe on EL 8.4+
* Fix: support ext4_journal_start on EL 8.4+
* Fix: correct RHEL range for kmem_cache_free define

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Re: [lttng-dev] [PATCH urcu] fix: handle EINTR correctly in get_cpu_mask_from_sysfs

2024-05-02 Thread Mathieu Desnoyers via lttng-dev

On 2024-05-02 10:32, Michael Jeanson wrote:

On 2024-05-02 09:54, Mathieu Desnoyers wrote:

On 2024-05-01 19:42, Benjamin Marzinski via lttng-dev wrote:

If the read() in get_cpu_mask_from_sysfs() fails with EINTR, the code is
supposed to retry, but the while loop condition has (bytes_read > 0),
which is false when read() fails with EINTR. The result is that the code
exits the loop, having only read part of the string.

Use (bytes_read != 0) in the while loop condition instead, since the
(bytes_read < 0) case is already handled in the loop.
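Assuming the EINTR case is handled with `continue`, the reason the condition matters is that `continue` inside a do/while jumps to the loop condition: with `bytes_read > 0`, an interrupted read (bytes_read == -1) ends the loop instead of retrying. A minimal sketch of the corrected loop shape, on a file descriptor rather than a sysfs path; `read_retry_eintr` is a hypothetical name and this is not the liburcu function itself:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to max_bytes from fd into buf, retrying reads interrupted by
 * a signal. Returns the number of bytes read, or -1 on a real error.
 */
static ssize_t read_retry_eintr(int fd, char *buf, size_t max_bytes)
{
	size_t total_bytes_read = 0;
	ssize_t bytes_read;

	do {
		bytes_read = read(fd, buf + total_bytes_read,
				  max_bytes - total_bytes_read);
		if (bytes_read < 0) {
			if (errno == EINTR)
				continue;	/* jumps to the loop condition below */
			return -1;
		}
		total_bytes_read += (size_t) bytes_read;
		assert(total_bytes_read <= max_bytes);
		/*
		 * "!= 0" keeps looping after an EINTR (bytes_read == -1);
		 * "> 0" would stop there with a partial read.
		 */
	} while (max_bytes > total_bytes_read && bytes_read != 0);

	return (ssize_t) total_bytes_read;
}
```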


Thanks for the fix! It is indeed the right thing to do.

I would like to integrate this fix into the librseq and libside
projects as well, but I notice that the copy in liburcu
is LGPLv2.1 whereas the copies in librseq and libside are
MIT.

Michael, should we first relicense the liburcu src/compat-smp.h
implementation to MIT so it matches the license of the copies
in librseq and libside ?


Sure, please go ahead.


For the record, we also have a copy of this code in lttng-ust,
also under the MIT license. So liburcu's copy is the only outlier
there.

Thanks,

Mathieu




Thanks,

Mathieu



Signed-off-by: Benjamin Marzinski 
---
  src/compat-smp.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compat-smp.h b/src/compat-smp.h
index 31fa979..075a332 100644
--- a/src/compat-smp.h
+++ b/src/compat-smp.h
@@ -164,7 +164,7 @@ static inline int get_cpu_mask_from_sysfs(char *buf, size_t max_bytes, const cha
 
 		total_bytes_read += bytes_read;
 		assert(total_bytes_read <= max_bytes);
-	} while (max_bytes > total_bytes_read && bytes_read > 0);
+	} while (max_bytes > total_bytes_read && bytes_read != 0);
 
 	/*
 	 * Make sure the mask read is a null terminated string.






--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH urcu] fix: handle EINTR correctly in get_cpu_mask_from_sysfs

2024-05-02 Thread Mathieu Desnoyers via lttng-dev

On 2024-05-01 19:42, Benjamin Marzinski via lttng-dev wrote:

If the read() in get_cpu_mask_from_sysfs() fails with EINTR, the code is
supposed to retry, but the while loop condition has (bytes_read > 0),
which is false when read() fails with EINTR. The result is that the code
exits the loop, having only read part of the string.

Use (bytes_read != 0) in the while loop condition instead, since the
(bytes_read < 0) case is already handled in the loop.


Thanks for the fix! It is indeed the right thing to do.

I would like to integrate this fix into the librseq and libside
projects as well, but I notice that the copy in liburcu
is LGPLv2.1 whereas the copies in librseq and libside are
MIT.

Michael, should we first relicense the liburcu src/compat-smp.h
implementation to MIT so it matches the license of the copies
in librseq and libside ?

Thanks,

Mathieu



Signed-off-by: Benjamin Marzinski 
---
  src/compat-smp.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compat-smp.h b/src/compat-smp.h
index 31fa979..075a332 100644
--- a/src/compat-smp.h
+++ b/src/compat-smp.h
@@ -164,7 +164,7 @@ static inline int get_cpu_mask_from_sysfs(char *buf, size_t max_bytes, const cha
 
 		total_bytes_read += bytes_read;
 		assert(total_bytes_read <= max_bytes);
-	} while (max_bytes > total_bytes_read && bytes_read > 0);
+	} while (max_bytes > total_bytes_read && bytes_read != 0);
 
 	/*
 	 * Make sure the mask read is a null terminated string.


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



[lttng-dev] [RELEASE] LTTng-UST 2.12.10 and 2.13.8 (Linux user-space tracer)

2024-04-19 Thread Mathieu Desnoyers via lttng-dev

LTTng-UST, the Linux Trace Toolkit Next Generation Userspace Tracer,
is a low-overhead application tracer. The library "liblttng-ust" enables
tracing of applications and libraries.

New in both 2.12.10 and 2.13.8:

* Add close_range wrapper to liblttng-ust-fd.so

GNU libc 2.34 implements a new close_range symbol which is used
by the ssh client and other applications to close all file descriptors,
including those which do not belong to the application. Override
this symbol to prevent the application from closing file descriptors
actively used by lttng-ust.

* Fix: libc wrapper: use initial-exec for malloc_nesting TLS

Use the initial-exec TLS model for the malloc_nesting nesting guard
variable to ensure that the GNU libc implementation of the TLS access
doesn't trigger infinite recursion by calling the memory allocator wrapper
functions, which can happen with global-dynamic.

This fixes a liblttng-ust-libc-wrapper.so regression on recent
Fedora distributions.

* lttng-ust(3): Fix wrong len_type for sequence

`len_type' of a sequence field must be of type unsigned integer. Some
examples provided in the man page were incorrectly using a signed integer
type, which compiled correctly but caused errors while decoding.
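The idea behind the close_range wrapper can be sketched as follows: close the requested range one descriptor at a time, skipping descriptors the tracer has registered as its own. This is a hypothetical illustration, not the liblttng-ust-fd.so code: `protect_fd()`, `close_range_sparing()` and the fixed-size registry are invented names, flags such as CLOSE_RANGE_CLOEXEC are not modelled, and the 1024 fallback used when sysconf() reports no limit is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

#define MAX_PROTECTED_FDS 16

/* Hypothetical registry of descriptors that the tracer must keep open. */
static int protected_fds[MAX_PROTECTED_FDS];
static int nr_protected_fds;

static void protect_fd(int fd)
{
	assert(fd >= 0 && nr_protected_fds < MAX_PROTECTED_FDS);
	protected_fds[nr_protected_fds++] = fd;
}

static bool fd_is_protected(unsigned int fd)
{
	for (int i = 0; i < nr_protected_fds; i++) {
		if ((unsigned int) protected_fds[i] == fd)
			return true;
	}
	return false;
}

/*
 * close_range()-like sketch: close every descriptor in [first, last]
 * except protected ones. "last" is clamped to the descriptor limit so
 * a call such as close_range_sparing(3, ~0U) stays bounded.
 */
static int close_range_sparing(unsigned int first, unsigned int last)
{
	long limit = sysconf(_SC_OPEN_MAX);

	if (first > last) {
		errno = EINVAL;
		return -1;
	}
	if (limit <= 0)
		limit = 1024;		/* assumed fallback when no limit is reported */
	if ((unsigned long) last >= (unsigned long) limit)
		last = (unsigned int) (limit - 1);
	if (first > last)
		return 0;		/* whole range lies above the limit */
	for (unsigned int fd = first;; fd++) {
		if (!fd_is_protected(fd))
			(void) close((int) fd);	/* EBADF for unused slots is ignored */
		if (fd == last)
			break;		/* explicit exit avoids unsigned wrap-around */
	}
	return 0;
}
```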

New in 2.13.8:

* ust-tracepoint-event: Add static check of sequences length type

Add a compile-time check to validate that unsigned types are used
for the length field of sequences.
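A compile-time check of this kind can be written with a C11 static assertion. The macro names below are hypothetical and this is not the ust-tracepoint-event implementation; it only shows the common "cast -1 and compare with 0" trick for detecting an unsigned integer type, which turns the decoding-time error described above into a build failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Converting -1 to an unsigned type yields that type's maximum value,
 * which compares greater than zero; for a signed type it stays -1.
 */
#define IS_UNSIGNED_TYPE(type) (((type) -1) > (type) 0)

/* Hypothetical compile-time guard for a sequence length type. */
#define CHECK_SEQUENCE_LENGTH_TYPE(type) \
	_Static_assert(IS_UNSIGNED_TYPE(type), \
		       "sequence length type must be unsigned")

CHECK_SEQUENCE_LENGTH_TYPE(uint32_t);	/* accepted */
CHECK_SEQUENCE_LENGTH_TYPE(size_t);	/* accepted */
/* CHECK_SEQUENCE_LENGTH_TYPE(int);	would fail to compile */
```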

Detailed change logs:

2024-04-19 (National Garlic Day) lttng-ust 2.13.8
* Add close_range wrapper to liblttng-ust-fd.so
* ust-tracepoint-event: Add static check of sequences length type
* lttng-ust(3): Fix wrong len_type for sequence
* Fix: libc wrapper: use initial-exec for malloc_nesting TLS

2024-04-19 (National Garlic Day) lttng-ust 2.12.10
* Add close_range wrapper to liblttng-ust-fd.so
* lttng-ust(3): Fix wrong len_type for sequence
* Fix: libc wrapper: use initial-exec for malloc_nesting TLS


Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Re: [lttng-dev] Software Heritage archival notification for git.liburcu.org

2024-04-15 Thread Mathieu Desnoyers via lttng-dev

On 2024-04-15 10:20, Michael Jeanson via lttng-dev wrote:

On 2024-04-14 20:39, Paul Wise wrote:

On Thu, 2024-04-11 at 13:45 -0400, Michael Jeanson wrote:


I see no issues with this, thanks for the heads-up.


PS: I note that git.liburcu.org and git.lttng.org seem to have
identical contents. I wonder if SWH should be archiving just one of
them or if we should archive both just in case they get split up?


At the moment 'git.liburu.org' is just a CNAME for 'git.lttng.org', we 


Typo: git.liburcu.org

Thanks,

Mathieu


don't have plans to split them up so far.



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Compile fix for urcu-bp.c

2024-04-01 Thread Mathieu Desnoyers via lttng-dev

Hi,

There are a few things missing before I can take this patch:

- Missing commit message describing the issue,
- Missing "Signed-off-by" tag.

Thanks!

Mathieu

On 2024-03-29 10:06, Duncan Sands via lttng-dev wrote:

--- src/urcu-bp.c
+++ src/urcu-bp.c
@@ -409,7 +409,7 @@ void expand_arena(struct registry_arena *arena)
  new_chunk_size_bytes, 0);
  if (new_chunk != MAP_FAILED) {
  /* Should not have moved. */
-    assert(new_chunk == last_chunk);
+    urcu_posix_assert(new_chunk == last_chunk);
  memset((char *) last_chunk + old_chunk_size_bytes, 0,
  new_chunk_size_bytes - old_chunk_size_bytes);
  last_chunk->capacity = new_capacity;


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



[lttng-dev] [RELEASE] LTTng-modules 2.12.16 and 2.13.12 (Linux kernel tracer)

2024-03-21 Thread Mathieu Desnoyers via lttng-dev

Hi,

This is a release announcement for the currently maintained
LTTng-modules Linux kernel tracer stable branches.

* New and noteworthy in these releases:

Linux kernel v6.8 is now supported by LTTng modules 2.13.12. If you need
support for recent kernels (v5.18+), you will need to upgrade to a
recent LTTng-modules 2.13.x.

Both releases correct issues with SLE kernel version range detection.

A compilation fix for RHEL 9.3 kernel is present in v2.13.12.

Feedback is welcome!

Thanks,

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-03-21 (National Common Courtesy Day) LTTng modules 2.13.12
* docs: Add supported versions and fix-backport policy
* docs: Add links to project resources
* Fix: Correct minimum version in jbd2 SLE kernel range
* Fix: Handle recent SLE major version codes
* Fix: build on sles15sp4
* Compile fixes for RHEL 9.3 kernels
* Fix: ext4_discard_preallocations changed in linux 6.8.0-rc3
* Fix: btrfs_get_extent flags and compress_type changed in linux 6.8.0-rc1
* Fix: btrfs_chunk tracepoints changed in linux 6.8.0-rc1
* Fix: strlcpy removed in linux 6.8.0-rc1
* Fix: timer_start changed in linux 6.8.0-rc1
* Fix: sched_stat_runtime changed in linux 6.8.0-rc1

2024-03-21 (National Common Courtesy Day) 2.12.16
* fix: lttng-probe-kvm-x86-mmu build with linux 6.6
* docs: Add supported versions and fix-backport policy
* docs: Add links to project resources
* Fix: Correct minimum version in jbd2 SLE kernel range
* Fix: Handle recent SLE major version codes
* Fix: build on sles15sp4

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Re: [lttng-dev] [PATCH] coredump debugging: add a tracepoint to report the coredumping

2024-02-23 Thread Mathieu Desnoyers via lttng-dev

On 2024-02-23 09:26, Steven Rostedt wrote:

On Mon, 19 Feb 2024 13:01:16 -0500
Mathieu Desnoyers  wrote:


Between "sched_process_exit" and "sched_process_free", the task can still be
observed by a trace analysis looking at sched and signal events: it's a zombie at
that stage.


Looking at the history of this tracepoint, it was added in 2008 by commit
0a16b60758433 ("tracing, sched: LTTng instrumentation - scheduler").
Hmm, LLTng? I wonder who the author was?


[ common typo: LLTng -> LTTng ;-) ]



   Author: Mathieu Desnoyers 

  :-D

Mathieu, I would say it's your call on where the tracepoint can be located.
You added it, you own it!


Wow! that's now 16 years ago :)

I've checked with Matthew Khouzam (maintainer of Trace Compass),
who cares about this tracepoint, and we have not identified any
significant impact of moving it on its model of the scheduler, other
than slightly changing its timing.

I've also checked quickly in lttng-analyses and have not found
any code that cares about its specific placement.

So I would say go ahead and move it earlier in do_exit(), it's
fine by me.

If you are interested in a bit of archeology, "sched_process_free"
originated from my ltt-experimental 0.1.99.13 kernel patch against
2.6.12-rc4-mm2 back in September 2005 (that's 19 years ago). It was
a precursor to the LTTng 0.x kernel patchset.

https://lttng.org/files/ltt-experimental/patch-2.6.12-rc4-mm2-ltt-exp-0.1.99.13.gz

Index: kernel/exit.c
===
--- a/kernel/exit.c (.../trunk/kernel/linux-2.6.12-rc4-mm2) (revision 41)
+++ b/kernel/exit.c (.../branches/mathieu/linux-2.6.12-rc4-mm2) (revision 41)
@@ -4,6 +4,7 @@
  *  Copyright (C) 1991, 1992  Linus Torvalds
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +56,7 @@ static void __unhash_process(struct task
}
 
 	REMOVE_LINKS(p);
+  trace_process_free(p->pid);
 }
 
 void release_task(struct task_struct * p)
@@ -832,6 +834,8 @@ fastcall NORET_TYPE void do_exit(long co
}
exit_mm(tsk);
 
+	trace_process_exit(tsk->pid);
+
exit_sem(tsk);
__exit_files(tsk);
__exit_fs(tsk);

This was a significant improvement over the prior LTT which only
had the equivalent of "sched_process_exit", which caused issues
with the Linux scheduler model in LTTV due to zombie processes.

Here is where it appeared in LTT back in 1999:

http://www.opersys.com/ftp/pub/LTT/TracePackage-0.9.0.tgz

patch-ltt-2.2.13-991118

diff -urN linux/kernel/exit.c linux-2.2.13/kernel/exit.c
--- linux/kernel/exit.c Tue Oct 19 20:14:02 1999
+++ linux-2.2.13/kernel/exit.c  Sun Nov  7 23:49:17 1999
@@ -14,6 +14,8 @@
 #include 
 #endif
 
+#include 
+
 #include 
 #include 
 #include 
@@ -386,6 +388,8 @@
del_timer(>real_timer);
end_bh_atomic();
 
+	TRACE_PROCESS(TRACE_EV_PROCESS_EXIT, 0, 0);
+
lock_kernel();
 fake_volatile:
 #ifdef CONFIG_BSD_PROCESS_ACCT

And it was moved to its current location (after exit_mm()) a bit
later (2001):

http://www.opersys.com/ftp/pub/LTT/TraceToolkit-0.9.5pre2.tgz

Patches/patch-ltt-linux-2.4.5-vanilla-010909-1.10

diff -urN linux/kernel/exit.c /ext2/home/karym/kernel/linux-2.4.5/kernel/exit.c
--- linux/kernel/exit.c Fri May  4 17:44:06 2001
+++ /ext2/home/karym/kernel/linux-2.4.5/kernel/exit.c   Wed Jun 20 12:39:24 2001
@@ -14,6 +14,8 @@
 #include 
 #endif
 
+#include 
+
 #include 
 #include 
 #include 
@@ -439,6 +441,8 @@
 #endif
__exit_mm(tsk);
 
+	TRACE_PROCESS(TRACE_EV_PROCESS_EXIT, 0, 0);
+
lock_kernel();
sem_exit();
__exit_files(tsk);

So this sched_process_exit placement was actually decided
by Karim Yaghmour back in the LTT days (2001). I don't think
he will mind us moving it around some 23 years later. ;)

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] New TLS usage in libgcc_s.so.1, compatibility impact

2024-01-15 Thread Mathieu Desnoyers via lttng-dev

On 2024-01-15 14:42, Florian Weimer wrote:

* Mathieu Desnoyers:


[...]


General use of lttng should be fine, I think, only the malloc wrapper
has this problem.


The purpose of the nesting counter TLS variable in the malloc wrapper
is to catch situations like this where a global-dynamic TLS access
(or any unexpected memory access done as a side-effect from calling
libc) from within LTTng-UST instrumentation would internally attempt to
call recursively into the malloc wrapper. In that nested case, we skip
the instrumentation and call the libc function directly.

I agree with your conclusion that only this nesting counter gating variable
actually needs to be initial-exec.
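The gating pattern described above can be sketched as follows. This is a minimal illustration, not the actual LTTng-UST libc wrapper: wrapped_malloc(), trace_malloc_event() and traced_calls are hypothetical names, the wrapper calls libc's malloc() directly instead of being interposed over it, and the fake instrumentation allocates on purpose to simulate a nested call.

```c
#include <stdlib.h>

/*
 * Per-thread nesting counter using the initial-exec TLS model: it lives in
 * the static TLS block, so accessing it never goes through __tls_get_addr()
 * and therefore can never call back into malloc()/free().
 */
static __thread int malloc_nesting __attribute__((tls_model("initial-exec")));

/* Hypothetical counter of instrumented (outermost) allocations. */
static unsigned long traced_calls;

void *wrapped_malloc(size_t size);

/*
 * Hypothetical instrumentation. It allocates on purpose, to simulate
 * tracing code whose side effects (e.g. a DTV update) re-enter malloc().
 */
static void trace_malloc_event(size_t size, void *ptr)
{
	void *scratch = wrapped_malloc(16);

	free(scratch);
	(void) size;
	(void) ptr;
	traced_calls++;
}

void *wrapped_malloc(size_t size)
{
	void *ptr;

	malloc_nesting++;
	ptr = malloc(size);
	/* Only instrument the outermost call; nested calls skip tracing. */
	if (malloc_nesting == 1)
		trace_malloc_event(size, ptr);
	malloc_nesting--;
	return ptr;
}
```

Without the malloc_nesting check, the allocation made from trace_malloc_event() would recurse until the stack overflows, which is the same shape as the backtrace in Florian's report.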




But moving all TLS variables used by lttng-ust from global-dynamic to
initial-exec is tricky, because a prior attempt to do so introduced
regressions in use-cases where lttng-ust was dlopen'd by Java or
Python, AFAIU situations where the runtimes were already using most of
the extra memory pool for dlopen'd libraries' initial-exec variables,
causing dlopen of lttng-ust to fail.


Oh, right, that makes it quite difficult.  Could you link a private copy
of the libraries into the wrapper that uses initial-exec TLS?


Unfortunately not easily, because by design LTTng-UST is meant to be a
singleton per-process. Changing this would have far-reaching impacts on
interactions with the LTTng-UST tracepoint instrumentation, as well as
impacts on synchronization between the LTTng-UST agent thread and
application calling fork/clone. Also AFAIR, the LTTng session daemon
(at least until recently) does not expect multiple concurrent
registrations from a given process.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] New TLS usage in libgcc_s.so.1, compatibility impact

2024-01-15 Thread Mathieu Desnoyers via lttng-dev

On 2024-01-13 07:49, Florian Weimer via lttng-dev wrote:

This commit

commit 8abddb187b33480d8827f44ec655f45734a1749d
Author: Andrew Burgess 
Date:   Sat Aug 5 14:31:06 2023 +0200

 libgcc: support heap-based trampolines
 
 Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
 and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
 __builtin_nested_func_ptr_deleted functions for these targets.
 
 Co-Authored-By: Maxim Blinov 
 Co-Authored-By: Iain Sandoe 
 Co-Authored-By: Francois-Xavier Coudert 

added TLS usage to libgcc_s.so.1.  The way that libgcc_s is currently
built, it ends up using a dynamic TLS variant on the Linux targets.
This means that there is no up-front TLS allocation with glibc (but
there would be one with musl).


Trying to wrap my head around this:

If I get this right, the previous behavior was that glibc allocated
global-dynamic variables from libraries that are preloaded or loaded
at C startup as if they were initial-exec, but now that libgcc_s.so.1
has a dynamic TLS variable, all those libraries loaded at C startup that
have global-dynamic TLS no longer get that special initial-allocation
treatment. Is that more or less correct?

(note: it's entirely possible that my understanding is wrong,
please correct me if it's the case)



There is still a compatibility impact because glibc assigns a TLS module
ID upfront.  This seems to be what causes the
ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail.  We end
up with an infinite regress during process termination because
libgcc_s.so.1 has been loaded, resulting in a DTV update.  When this
happens, the bottom of the stack looks like this:

#4447 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
#4448 0x77fdb142 in free (ptr=)
 at ../include/rtld-malloc.h:50
#4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822
#4450 0x77fdb214 in update_get_addr (ti=0x77f2bfc0,
 gen=) at ../elf/dl-tls.c:916
#4451 0x77fddccc in __tls_get_addr ()
 at ../sysdeps/x86_64/tls_get_addr.S:55
#4452 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
#4453 0x77fdb142 in free (ptr=)
 at ../include/rtld-malloc.h:50
#4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822
#4455 0x77fdb214 in update_get_addr (ti=0x77f39fa0,
 gen=) at ../elf/dl-tls.c:916
#4456 0x77fddccc in __tls_get_addr ()
 at ../sysdeps/x86_64/tls_get_addr.S:55
#4457 0x77f36113 in lttng_ust_cancelstate_disable_push ()
from /lib64/liblttng-ust-common.so.1
#4458 0x77f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1
#4459 0x77f5175a in lttng_ust_cleanup () from /lib64/liblttng-ust.so.1
#4460 0x77fca0f2 in _dl_call_fini (
 closure_map=closure_map@entry=0x77fbe000) at dl-call_fini.c:43
#4461 0x77fce06e in _dl_fini () at dl-fini.c:114
#4462 0x77d82fe6 in __run_exit_handlers () from /lib64/libc.so.6

Cc:ing  for awareness.


I've prepared a change for lttng-ust to move the lttng-ust libc wrapper
"malloc nesting" guard variable from global-dynamic to initial-exec:

https://review.lttng.org/c/lttng-ust/+/11677 Fix: libc wrapper: use initial-exec for malloc_nesting TLS

This should help with the infinite recursion issue, but if my understanding
of the impact of this libgcc change is correct (it effectively changes the
behavior used for global-dynamic variables in preloaded and
startup-loaded libraries), I suspect we have other new issues here,
such as problems with the async-signal safety of other global-dynamic
variables within LTTng-UST.

But moving all TLS variables used by lttng-ust from global-dynamic to
initial-exec is tricky, because a prior attempt to do so introduced regressions
in use-cases where lttng-ust was dlopen'd by Java or Python, AFAIU situations
where the runtimes were already using most of the extra memory pool for
dlopen'd libraries' initial-exec variables, causing dlopen of lttng-ust
to fail.

Thanks Florian for letting us know about this,

Mathieu



The issue also requires a recent glibc with changes to DTV management:
commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls
access after dlopen [BZ #19924]").  If I understand things correctly,
before this glibc change, we didn't deallocate the old DTV, so there was
no call to the free function.

On the glibc side, we should recommend that intercepting mallocs and their
dependencies use initial-exec TLS because that kind of TLS does not use
malloc.  If intercepting mallocs using dynamic TLS works at all, that's
totally by accident, and was in the past helped by glibc bug 19924.  (I
don't think there is anything special about libgcc_s.so.1 that triggers
the test failure above, it is just an object with dynamic TLS that is
implicitly loaded via dlopen at the right stage of the test.)  In 

[lttng-dev] [RELEASE] LTTng-modules 2.12.15 and 2.13.11 (Linux kernel tracer)

2024-01-10 Thread Mathieu Desnoyers via lttng-dev

The LTTng modules provide Linux kernel tracing capability to the LTTng
tracer toolset.

* New and noteworthy in these releases:

Newer Linux kernels (v6.6 and v6.7) are now supported by LTTng modules
2.13.11. If you need support for recent kernels (v5.18+), you will
need to upgrade to a recent LTTng-modules 2.13.x.

The "prio" context has been fixed in 2.13.11 to eliminate a crash
caused by a call through a NULL function pointer when using the "prio"
context (lttng add-context -k -t prio). This issue was introduced
when refactoring the prio context code during the 2.13 development
cycle. The missing initialization was re-introduced, and the use of the
kernel "task_prio()" symbol was entirely replaced by inlining a copy of
this trivial function into lttng-modules instead.
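For reference, the upstream kernel's task_prio() simply returns p->prio - MAX_RT_PRIO, which is why inlining a copy is cheap. The sketch below reproduces that arithmetic in user space; struct mock_task and MOCK_MAX_RT_PRIO are hypothetical stand-ins for struct task_struct and the kernel's MAX_RT_PRIO (100 in mainline), not kernel headers.

```c
/* Stand-in for the kernel's MAX_RT_PRIO (100 in mainline kernels). */
#define MOCK_MAX_RT_PRIO 100

/* Hypothetical stand-in for struct task_struct, keeping only "prio". */
struct mock_task {
	int prio;	/* kernel-internal priority: 0..99 RT, 100..139 normal */
};

/* Same arithmetic as task_prio(): a nice-0 task (prio 120) maps to 20. */
static inline int mock_task_prio(const struct mock_task *p)
{
	return p->prio - MOCK_MAX_RT_PRIO;
}
```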

The "built-in.sh" script, which can be used to add a link to lttng-modules
within a kernel source tree to build LTTng into a Linux kernel image,
has been updated to adapt to changes introduced in Linux v6.1.

A work-around to ensure that LTTng-modules works on CPUs and kernels
with IBT support enabled has been integrated:

When the Intel IBT feature is enabled, a CPU supporting this feature
validates that all indirect jumps/calls land on an ENDBR64 instruction.

The kernel seals functions which are not meant to be called indirectly,
which means that indirectly calling functions through addresses fetched
using kallsyms or kprobes triggers a crash.

Use the MSR_IA32_S_CET CET_ENDBR_EN MSR bit to temporarily disable ENDBR
validation around indirect calls to kernel functions. Considering that
the main purpose of this feature is to prevent ROP-style attacks,
temporarily disabling ENDBR validation around the call from a kernel
module does not affect the ROP protection.


Both 2.13.11 and 2.12.15:

- Fix an issue with importing VFS namespace for Android kernels.

- Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+

- Fix a hardening OOPS during validation of immediate strings in the bytecode
  validator when CONFIG_UBSAN_BOUNDS and/or CONFIG_FORTIFY_SOURCE are
  configured. It boils down to changing zero-length arrays into flexible
  array members to let the toolchain know about our intent.

- Add Ubuntu Kinetic kernel ranges for jbd2 instrumentation.
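As a sketch of what the zero-length-to-flexible-array change looks like in C (the struct and function names below are hypothetical, not the actual bytecode validator types):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical immediate-string container. The old GNU idiom
 * "char str[0];" declares a zero-length array, so bounds-checking
 * instrumentation may treat any access to str[i] as out of bounds.
 * A C99 flexible array member tells the compiler that the trailing
 * storage is intentionally variable-sized.
 */
struct immediate_string {
	size_t len;
	char str[];	/* was: char str[0]; */
};

static struct immediate_string *immediate_string_create(const char *s)
{
	size_t len = strlen(s);
	struct immediate_string *is = malloc(sizeof(*is) + len + 1);

	if (!is)
		return NULL;
	is->len = len;
	memcpy(is->str, s, len + 1);	/* copy including the NUL terminator */
	return is;
}
```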

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-01-10 (National Houseplant Appreciation Day) LTTng modules 2.13.11
* Fix: Include linux/sched/rt.h for kernels v3.9 to v3.14
* Fix: Disable IBT around indirect function calls
* Inline implementation of task_prio()
* Fix: prio context NULL pointer exception
* Fix: MODULE_IMPORT_NS is introduced in kernel 5.4
* Android: Import VFS namespace for android common kernel
* Fix: get_file_rcu is missing in kernels < 4.1
* fix: lookup_fd_rcu replaced by lookup_fdget_rcu in linux 6.7.0-rc1
* fix: mm, vmscan signatures changed in linux 6.7.0-rc1
* fix: phys_proc_id and cpu_core_id moved in linux 6.7.0-rc1
* Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
* Fix: bytecode validator: oops during validation of immediate string
* fix: lttng-probe-kvm-x86-mmu build with linux 6.6
* fix: built-in lttng with kernel >= v6.1
* fix: ubuntu kinetic kernel range for jdb2

2024-01-10 (National Houseplant Appreciation Day) 2.12.15
* Fix: MODULE_IMPORT_NS is introduced in kernel 5.4
* Android: Import VFS namespace for android common kernel
* Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
* Fix: bytecode validator: oops during validation of immediate string
* fix: ubuntu kinetic kernel range for jdb2

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


[lttng-dev] [RELEASE] LTTng-UST 2.12.9 and 2.13.7 (Linux user-space tracer)

2024-01-10 Thread Mathieu Desnoyers via lttng-dev

LTTng-UST, the Linux Trace Toolkit Next Generation Userspace Tracer,
is a low-overhead application tracer. The library "liblttng-ust" enables
tracing of applications and libraries.

* New and noteworthy in these releases:

Specific to 2.13.7, a fix for misaligned urcu reader accesses was
introduced. It only applies to the lttng-ust 2.13 branch because
it implements its own "lttng-ust-urcu" flavor.

Also specific to 2.13.7, "sync" vs "unsync" enablers are introduced
to eliminate an O(m*n) algorithm:

Eliminate iteration over unmodified enablers when synchronizing the
enablers vs event state.

The intent is to turn an O(m*n) algorithm (m = number of enablers, n =
number of event probes) into an O(n) one when enabling many additional
events while tracing is active.
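The idea behind "sync" vs "unsync" enablers can be sketched as follows. This is a hypothetical, simplified model (exact-name matching only, no disabling), not the lttng-ust data structures: each enabler carries a synced flag, and the synchronization pass skips enablers that were already applied, so adding one enabler while tracing only walks the n event probes once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified data model (not the lttng-ust structures). */
struct enabler {
	const char *name;	/* event name to enable (exact match only) */
	bool synced;		/* already applied to the event state? */
};

struct event_probe {
	const char *name;
	bool enabled;
};

/*
 * Apply only enablers added since the last sync. Previously-synced
 * enablers are skipped instead of being re-matched against every probe.
 * Returns the number of (enabler, probe) pairs examined.
 */
static size_t sync_enablers(struct enabler *enablers, size_t m,
			    struct event_probe *probes, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < m; i++) {
		if (enablers[i].synced)
			continue;
		for (size_t j = 0; j < n; j++) {
			visited++;
			if (strcmp(enablers[i].name, probes[j].name) == 0)
				probes[j].enabled = true;
		}
		enablers[i].synced = true;
	}
	return visited;
}
```

With 3 probes, a first sync of 2 enablers examines 6 pairs; adding a third enabler afterwards examines only 3 more, rather than re-walking all 9.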

Specific to 2.12.9, the FreeBSD rfork() wrapper is fixed: it was not
passing the flags argument. This was fixed as part of a larger commit
in the master and stable-2.13 branches.

Both stable branches include:

- a build system fix for documentation examples with old autoconf when
  used with a relative path,

- a clang warning fix around the volatile qualifier on function pointers,

- a Python agent uplift to adapt to modern Python (>= 3.10),

- a fix for a possible race condition in the ustfork helper.

Enjoy!

Mathieu

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

Detailed change logs:

2024-01-10 (National Houseplant Appreciation Day) lttng-ust 2.13.7
* fix: invoke MKDIR_P before changing directories
* fix: -Wsingle-bit-bitfield-constant-conversion with clang16
* fix: clean java inner class files in examples
* Introduce sync vs unsync enablers
* Fix: misaligned urcu reader accesses
* ustfork: Fix warning about volatile qualifier
* ustfork: Fix possible race conditions
* Fix: tracepoint: Remove trailing \ at the end of macro
* fix: python agent: use stdlib distutils when setuptools is installed
* fix: python agent: install on Debian python >= 3.10
* fix: python agent: Add a dependency on generated files
* python: use setuptools with python >= 3.12

2024-01-10 (National Houseplant Appreciation Day) lttng-ust 2.12.9
* fix: invoke MKDIR_P before changing directories
* fix: clean java inner class files in examples
* ustfork: Fix warning about volatile qualifier
* ustfork: Fix possible race conditions
* Fix: FreeBSD: Pass flags arguments to rfork wrapper
* fix: python agent: use stdlib distutils when setuptools is installed
* fix: python agent: install on Debian python >= 3.10
* fix: python agent: Add a dependency on generated files
* python: use setuptools with python >= 3.12


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Re: [lttng-dev] [PATCH lttng-modules] Android: Import VFS namespace for android common kernel

2023-12-18 Thread Mathieu Desnoyers via lttng-dev

On 2023-12-18 05:16, Lei wang via lttng-dev wrote:

The Android GKI kernel restricts usage of the fs interfaces.
lttng-modules needs to import the VFS namespace explicitly to
work with it.



Merged into lttng-modules master and 2.13 branches, thanks!

Mathieu


Signed-off-by: Lei wang 
---
  src/wrapper/kallsyms.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/wrapper/kallsyms.c b/src/wrapper/kallsyms.c
index 97897c4..9398c83 100644
--- a/src/wrapper/kallsyms.c
+++ b/src/wrapper/kallsyms.c
@@ -113,3 +113,7 @@ unsigned long wrapper_kallsyms_lookup_name(const char *name)
  EXPORT_SYMBOL_GPL(wrapper_kallsyms_lookup_name);
  
  #endif
+
+#ifdef CONFIG_ANDROID
+MODULE_IMPORT_NS(VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver);
+#endif


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] TSAN build broken on master branch

2023-09-23 Thread Mathieu Desnoyers via lttng-dev

On 9/21/23 21:21, Olivier Dion via lttng-dev wrote:

On Thu, 21 Sep 2023, Ondřej Surý via lttng-dev  
wrote:
[...]

It fails with:

rculfhash.c:1189:2: error: address argument to atomic operation must be a 
pointer to integer ('typeof (node_next)' (aka 'struct cds_lfht_node **') 
invalid)
 uatomic_or_mo(node_next, REMOVED_FLAG, CMM_RELEASE);
 ^~~
../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 
'uatomic_or_mo'
 (void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \
^ ~~~
rculfhash.c:1440:3: error: address argument to atomic operation must be a 
pointer to integer ('typeof (fini_bucket_next)' (aka 'struct cds_lfht_node **') 
invalid)
 uatomic_or(fini_bucket_next, REMOVED_FLAG);
 ^~
../include/urcu/uatomic/builtins-generic.h:130:2: note: expanded from macro 
'uatomic_or'
 uatomic_or_mo(addr, mask, CMM_RELAXED)
 ^~
../include/urcu/uatomic/builtins-generic.h:123:10: note: expanded from macro 
'uatomic_or_mo'
 (void) __atomic_or_fetch(cmm_cast_volatile(addr), mask, \
^ ~~~
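The diagnostic arises because clang requires the address argument of __atomic_or_fetch() to point to an integer type, while here it points to a struct cds_lfht_node pointer (GCC performs such operations on pointers as if they were uintptr_t). One common way to keep tag bits in a link word while satisfying clang is to store the link as a uintptr_t. The sketch below is hypothetical (types, names and the REMOVED_FLAG value are assumptions) and is not the fix that was merged into liburcu.

```c
#include <stdbool.h>
#include <stdint.h>

#define REMOVED_FLAG ((uintptr_t) 1)	/* hypothetical low-bit tag */

/* Hypothetical node: the link is kept as an integer-typed tagged pointer. */
struct node {
	uintptr_t next;	/* (struct node *) | flag bits */
};

/* The atomic OR now operates on an integer object, which clang accepts. */
static void node_mark_removed(struct node *n)
{
	(void) __atomic_or_fetch(&n->next, REMOVED_FLAG, __ATOMIC_RELEASE);
}

static bool node_is_removed(struct node *n)
{
	return __atomic_load_n(&n->next, __ATOMIC_ACQUIRE) & REMOVED_FLAG;
}

/* Strip the flag bits to recover the successor pointer. */
static struct node *node_next(struct node *n)
{
	uintptr_t v = __atomic_load_n(&n->next, __ATOMIC_ACQUIRE);

	return (struct node *) (v & ~REMOVED_FLAG);
}
```

This relies on nodes being at least 2-byte aligned so that bit 0 of a valid node address is always zero.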


Eh I thought we fixed that.  Clang is very strict about these things.

You can apply the following
.  That ought to fix
the issue until we merge the patch.


Fix merged into liburcu master, thanks!

Mathieu

  


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

2023-09-11 Thread Mathieu Desnoyers via lttng-dev

On 9/10/23 10:18, Mousa, Anas wrote:

Hey Mathieu,


Hi Anas,



We see that when recording a tracepoint, there are multiple stages of
reserve-commit-write, where atomics and shared memory accesses take up a
big part of the recording time. We're wondering: is there a "light mode"
for recording a tracepoint that involves less logic, or a mode which can
potentially have lower latency?


I've been working on the rseq(2) system call for a few years now, and
it is intended to help reduce the cost of lttng-ust's ring buffer
atomics on the tracing fast-path. The road ahead there is integration of
rseq with lttng-ust, which has not shown up on our customers' feature
requirements radar yet.
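To make the reserve/write/commit stages more concrete, here is a heavily simplified, hypothetical single-buffer sketch using C11 atomics. It is not lttng-ust's ring buffer (which uses per-CPU sub-buffers, wrap-around and more elaborate commit accounting); it only shows where the atomic read-modify-write operations sit on the fast path, which are the costs rseq(2) is meant to help reduce.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096	/* hypothetical size; no wrap-around handling */

struct ring_buffer {
	_Atomic size_t write_offset;	/* next free byte (reserve position) */
	_Atomic size_t commit_count;	/* bytes fully written (committed) */
	char data[BUF_SIZE];
};

/* Stage 1: reserve space with an atomic compare-and-swap loop. */
static long rb_reserve(struct ring_buffer *rb, size_t len)
{
	size_t old = atomic_load_explicit(&rb->write_offset,
					  memory_order_relaxed);

	do {
		if (old + len > BUF_SIZE)
			return -1;	/* full: a real tracer would switch sub-buffers or discard */
	} while (!atomic_compare_exchange_weak_explicit(&rb->write_offset,
			&old, old + len, memory_order_relaxed,
			memory_order_relaxed));
	return (long) old;
}

/* Stage 3: publish the written bytes to the consumer. */
static void rb_commit(struct ring_buffer *rb, size_t len)
{
	atomic_fetch_add_explicit(&rb->commit_count, len, memory_order_release);
}

/* Record one event: reserve, write the payload (stage 2), commit. */
static int rb_record(struct ring_buffer *rb, const void *payload, size_t len)
{
	long off = rb_reserve(rb, len);

	if (off < 0)
		return -1;
	memcpy(&rb->data[off], payload, len);
	rb_commit(rb, len);
	return 0;
}
```

Each event costs at least two atomic read-modify-write operations here (the CAS and the fetch-add); with per-CPU buffers, rseq can in principle replace those with restartable non-atomic sequences.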


In terms of logic involved in the lttng-ust tracepoints, I hope that my
current work on "libside" will help steer away from tracepoint providers
based on macros and generated code, replacing them with an efficient
bytecode interpreter. This should allow me to inline many of the calls
that are currently needed between the tracepoint probe provider and the
lttng-ust ring buffer. Again, this is an area where I think we can have
great speed improvements, but it has not shown up on our customers'
feature requirements radar yet.



Also, are there any recent docs to share regarding tracepoint latency?


There is a Polytechnique student who extensively analyzed this recently. 
Michel, do you have a pointer to his work ?


Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [RFC] Deprecating RCU signal flavor

2023-08-23 Thread Mathieu Desnoyers via lttng-dev

On 8/23/23 10:47, Paul E. McKenney wrote:

On Mon, Aug 21, 2023 at 11:43:32AM -0400, Mathieu Desnoyers wrote:

On 8/15/23 08:38, Mathieu Desnoyers via lttng-dev wrote:

On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:


After discussing it with Mathieu, we agree on the following 3 phases for
deprecating the signal flavor:

   1) liburcu-signal will be implemented in terms of liburcu-mb. The only
   difference between the two flavors will be the public header files,
   linked symbols and library name.  Note that this adds a performance
   regression, since the implementation of liburcu-mb adds memory
   barriers on the reader side which are not present in the original
   liburcu-signal implementation.

   2) Adding the deprecated attribute to every public function exposed by
   the liburcu-signal flavor.  At this point, tests for liburcu-signal
   will also be removed from the project.  There will be no more support
   for this flavor.

   3) Removing the liburcu-signal flavor completely from the project.

Finally, here is my tentative release version for each phase:

   1) 0.15.0 [October 2023] (also TSAN support yay!)

   2) 0.15.1

   3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking
   change)


There is a distinction between the version number of the liburcu project
(0.14) and the ABI soname for the shared objects. We may be able to do
step (3) without going to 1.0.0 (I don't see removal of the urcu-signal
flavor as a strong enough motivation for hitting 1.0.0 yet).

Technically speaking, given that we would be removing the entire
liburcu-signal.so shared object, we would not be changing _symbols_
within an existing shared object, therefore I'm not even sure we need to
bump the soname for all the other remaining shared objects.


So after merging this commit:

 Phase 1 of deprecating liburcu-signal
 The first phase of liburcu-signal deprecation consists of implementing
 it in terms of liburcu-mb. In other words, liburcu-signal is identical to
 liburcu-mb with the exception of the function symbols and public header
 files.
 This is done by:
   1) Removing the RCU_SIGNAL specific code in urcu.c
   2) Making the RCU_MB specific code also specific to RCU_SIGNAL in
   urcu.c
   3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use an
   atomic store with CMM_SEQ_CST instead of a CMM_RELAXED store with
   cmm_barrier() around it. We could keep the explicit barriers, but that
   would require adding some cmm_annotate annotations. Therefore, to be
   less intrusive in a public header file, simply use CMM_SEQ_CST
   as the mb flavor does.
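To illustrate point 3), here is a C11 sketch of the two unlock styles. reader_ctr, read_unlock_relaxed() and read_unlock_seq_cst() are hypothetical names; the real liburcu code uses its own uatomic/cmm primitives and also handles the grace-period wakeup, which is omitted here. atomic_signal_fence() plays the role of cmm_barrier(), a compiler-only barrier.

```c
#include <stdatomic.h>

/* Hypothetical reader state; liburcu's real per-reader layout differs. */
static _Atomic unsigned long reader_ctr = 1;

/*
 * Old signal-flavor style: a relaxed store surrounded by compiler-only
 * barriers. Hardware ordering was provided by the writer, which made each
 * reader execute a memory barrier in a signal handler during grace periods.
 */
static void read_unlock_relaxed(void)
{
	atomic_signal_fence(memory_order_seq_cst);
	atomic_store_explicit(&reader_ctr, 0, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * mb-flavor style: a sequentially consistent store, so ordering is
 * enforced on the reader side itself rather than by the writer.
 */
static void read_unlock_seq_cst(void)
{
	atomic_store_explicit(&reader_ctr, 0, memory_order_seq_cst);
}
```

Once the signal flavor is implemented in terms of liburcu-mb, readers no longer receive barriers from the writer's signals, which is why the stronger store (and the reader-side cost mentioned in phase 1) becomes necessary.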

I notice that an application previously built against urcu-signal with
_LGPL_SOURCE defined would have to be rebuilt, which would require a
soname bump of urcu-signal.

So considering that this phase 1 is not really a "drop in" replacement,
I favor removing the urcu-signal flavor entirely before the next release.

Thoughts ?


The replacement is liburcu-mb, correct?


After merging this "phase 1" of the removal, I noticed that applications
built with _LGPL_SOURCE defined and using liburcu-signal would need to
be rebuilt, which would require a major library soname bump, something
I would prefer to avoid unless necessary.

So I went ahead and pushed additional commits in the master branch
which completely remove liburcu-signal from the tree. As a result, the
next release of liburcu will ship neither the liburcu-signal header
files nor its library shared objects.



I will need to change perfbook, but that should be an easy change,
plus sys_membarrier() is widely available by now.


Users of liburcu-signal would be expected to migrate to liburcu-memb, which
relies on membarrier to achieve similar performance, but with lower-overhead
grace periods.

Thanks,

Mathieu





--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [RFC] Deprecating RCU signal flavor

2023-08-21 Thread Mathieu Desnoyers via lttng-dev

On 8/15/23 08:38, Mathieu Desnoyers via lttng-dev wrote:

On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:


After discussing it with Mathieu, we agree on the following 3 phases for
deprecating the signal flavor:

  1) liburcu-signal will be implemented in terms of liburcu-mb. The only
  difference between the two flavors will be the public header files,
  linked symbols and library name.  Note that this adds a performance
  regression, since the implementation of liburcu-mb adds memory
  barriers on the reader side which are not present in the original
  liburcu-signal implementation.

  2) Adding the deprecated attribute to every public function exposed by
  the liburcu-signal flavor.  At this point, tests for liburcu-signal
  will also be removed from the project.  There will be no more support
  for this flavor.

  3) Removing the liburcu-signal flavor completely from the project.

Finally, here is my tentative release version for each phase:

  1) 0.15.0 [October 2023] (also TSAN support yay!)

  2) 0.15.1

  3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking
  change)


There is a distinction between the version number of the liburcu project 
(0.14) and the ABI soname for the shared objects. We may be able to do 
step (3) without going to 1.0.0 (I don't see removal of the urcu-signal 
flavor as a strong enough motivation for hitting 1.0.0 yet).


Technically speaking, given that we would be removing the entire 
liburcu-signal.so shared object, we would not be changing _symbols_ 
within an existing shared object, therefore I'm not even sure we need to 
bump the soname for all the other remaining shared objects.


So after merging this commit:

Phase 1 of deprecating liburcu-signal

The first phase of liburcu-signal deprecation consists of implementing
it in terms of liburcu-mb. In other words, liburcu-signal is identical to
liburcu-mb with the exception of the function symbols and public header
files.

This is done by:

  1) Removing the RCU_SIGNAL specific code in urcu.c

  2) Making the RCU_MB specific code also specific to RCU_SIGNAL in
  urcu.c

  3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use an
  atomic store with CMM_SEQ_CST instead of a CMM_RELAXED store with
  cmm_barrier() around it. We could keep the explicit barriers, but that
  would require adding some cmm_annotate annotations. Therefore, to be
  less intrusive in a public header file, simply use CMM_SEQ_CST
  as the mb flavor does.

I notice that an application previously built against urcu-signal with
_LGPL_SOURCE defined would have to be rebuilt, which would require a
soname bump of urcu-signal.

So considering that this phase 1 is not really a "drop in" replacement,
I favor removing the urcu-signal flavor entirely before the next release.

Thoughts ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [RFC] Deprecating RCU signal flavor

2023-08-15 Thread Mathieu Desnoyers via lttng-dev

On 8/14/23 17:05, Olivier Dion via lttng-dev wrote:


After discussing it with Mathieu, we agree on the following 3 phases for
deprecating the signal flavor:

  1) liburcu-signal will be implemented in terms of liburcu-mb. The only
  difference between the two flavors will be the public header files,
  linked symbols and library name.  Note that this adds a performance
  regression, since the implementation of liburcu-mb adds memory
  barriers on the reader side which are not present in the original
  liburcu-signal implementation.

  2) Adding the deprecated attribute to every public function exposed by
  the liburcu-signal flavor.  At this point, tests for liburcu-signal
  will also be removed from the project.  There will be no more support
  for this flavor.

  3) Removing the liburcu-signal flavor completely from the project.

Finally, here is my tentative release version for each phase:

  1) 0.15.0 [October 2023] (also TSAN support yay!)

  2) 0.15.1

  3) 0.16.0 || 1.0.0 (maybe a major bump since this is an API breaking
  change)


There is a distinction between the version number of the liburcu project 
(0.14) and the ABI soname for the shared objects. We may be able to do 
step (3) without going to 1.0.0 (I don't see removal of the urcu-signal 
flavor as a strong enough motivation for hitting 1.0.0 yet).


Technically speaking, given that we would be removing the entire 
liburcu-signal.so shared object, we would not be changing _symbols_ 
within an existing shared object, therefore I'm not even sure we need to 
bump the soname for all the other remaining shared objects.


Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH] Fix: list lttng sub-directory in Kbuild

2023-08-10 Thread Mathieu Desnoyers via lttng-dev

On 8/10/23 06:05, Richa Bharti wrote:

From: Richa Bharti 


Hi!

Thanks for your patch. I'm adding Michael Jeanson and the lttng-dev 
mailing list in CC.


Thanks,

Mathieu



* Linux kernel >= 6.1 reads sub-directories from Kbuild
* Linux kernel < 6.1 reads sub-directories from Makefile

Signed-off-by: Richa Bharti 
---
  scripts/built-in.sh | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/scripts/built-in.sh b/scripts/built-in.sh
index f0594ec..2451230 100755
--- a/scripts/built-in.sh
+++ b/scripts/built-in.sh
@@ -14,9 +14,19 @@ KERNEL_DIR="$(readlink --canonicalize-existing "$1")"
  # Symlink the lttng-modules directory in the kernel source
  ln -sf "$(pwd)" "${KERNEL_DIR}/lttng"
  
+# Get kernel version from Makefile
+version=$(grep -m 1 VERSION ${KERNEL_DIR}/Makefile | sed 's/^.*= //g')
+patchlevel=$(grep -m 1 PATCHLEVEL ${KERNEL_DIR}/Makefile | sed 's/^.*= //g')
+kernel_version=${version}.${patchlevel}
+
  # Graft ourself to the kernel build system
  echo 'source "lttng/src/Kconfig"' >> "${KERNEL_DIR}/Kconfig"
-sed -i 's#+= kernel/#+= kernel/ lttng/#' "${KERNEL_DIR}/Makefile"
+
+if awk "BEGIN {exit !(${kernel_version} >= 6.1)}"; then
+   echo 'obj-y += lttng/' >> "${KERNEL_DIR}/Kbuild"
+else
+   sed -i 's#+= kernel/#+= kernel/ lttng/#' "${KERNEL_DIR}/Makefile"
+fi
  
  echo >&2
  echo "$0: done." >&2
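The awk test in this patch compares MAJOR.MINOR as a decimal number. That gives the right answer for the 6.1 threshold, but in general it misorders versions such as 6.10 and 6.9 (awk reads 6.10 as 6.1). A component-wise comparison avoids this; the helper below is a hypothetical C sketch of that approach, not part of built-in.sh.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the "MAJOR.MINOR" string in ver is
 * >= major.minor, 0 if it is older, and -1 if ver cannot be parsed.
 * Components are compared as integers, so "6.10" is newer than "6.9".
 */
static int kernel_version_at_least(const char *ver, int major, int minor)
{
	int v_major, v_minor;

	if (sscanf(ver, "%d.%d", &v_major, &v_minor) != 2)
		return -1;
	if (v_major != major)
		return v_major > major;
	return v_minor >= minor;
}
```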


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Status of LTTng-scope and Lttng-analyses

2023-07-19 Thread Mathieu Desnoyers via lttng-dev

On 7/18/23 15:27, Cook, Layne via lttng-dev wrote:

Can you tell me the status of the beta projects listed on the web site?

LTTng scope
LTTng analyses

The GitHub projects haven't had any activity for quite a while. Have
these projects been abandoned, or superseded by something else?


Hi Layne,

Thanks for your interest in those projects!

The LTTng scope beta project was an attempt at doing a significant UX
redesign of Trace Compass, starting from a use-cases/user workflow
perspective. We currently don't have the resources/funding/staff to
work on this project further, so it has not progressed for a while.

You should look at the Trace Compass and VSCode trace extension
projects instead, which have a lot more activity:

https://tracecompass.org
https://github.com/eclipse-cdt-cloud/vscode-trace-extension

The LTTng analyses beta project is a set of Python scripts to analyze
LTTng traces. Our original intent with that project was that EfficiOS
would fund the work to create those analyses as prototypes in Python,
and eventually customers would fund the rather large amount of work
required to go from a prototype (slow scripts) to a production quality
project (faster C++ implementation, generic state tracking module).
Unfortunately, this never materialized, so this beta project has been
on the back burner as well.

In recent years we have focused our efforts on the Babeltrace 2
project and on CTF2 (Common Trace Format version 2).

Feel free to have a look at Trace Compass and VSCode trace extension, and
please let us know if LTTng scope and LTTng analyses fill a gap that is
not covered by those other tools.

Thanks,

Mathieu



Thanks,

LC



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Status of the RCU Red Black Tree

2023-07-12 Thread Mathieu Desnoyers via lttng-dev

On 7/12/23 14:44, Uttormark, Mike via lttng-dev wrote:
What became of the red-black tree effort?  I see it in the git repo, 10 
years old.  It never made it onto master.


What would it take to get it onto master and into a release branch?


Hi Mike,

There are a few things that are in the way of merging it into a liburcu
release, namely:

* An end user with a clearly defined use-case to allow defining a solid
  API,

* Validation that those use-cases are not better covered by some
  variation of my RCU Judy Array prototype instead, ref.:

  https://github.com/urcu/userspace-rcu/tree/urcu/rcuja-simple-int

* More testing, both within the liburcu project and in terms of use of
  the API from an application perspective,

* Funding for all that work, allowing us to prioritize this effort with
  respect to our various other projects.

Thanks for your interest in the liburcu Red-Black Tree prototype! Please
don't hesitate to reach out to EfficiOS if HPE would like to explore
supporting this project.

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Fwd: lttng issue

2023-07-12 Thread Mathieu Desnoyers via lttng-dev

On 7/10/23 13:28, Bala Gundeboina via lttng-dev wrote:



Hi,
      I copied the dependencies that I had installed on the TDA4 EVM
board over to the TI board, and I am getting the error below. I found
where this lock file should be created, but no lock file is present in
that folder (/var/run/lttng).

lttng create my-kernel-session --output=/home/root/
Spawning a session daemon
Warning: Failed to produce a random seed using getrandom(), falling back 
to pseudo-random device seed generation which will block until its pool 
is initialized: Failed to get true random data using getre


The issue appears to be caused by a failure to obtain pseudo-random numbers
with getrandom(), so there may be something missing on your system in 
that area. Running lttng-sessiond under strace may help figure out 
exactly how this is failing.


Also providing the output of lttng-sessiond in verbose mode would be 
helpful (lttng-sessiond -vvv).
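To check the getrandom() path on the board independently of lttng-sessiond, a small probe can help. This is a minimal sketch, assuming a glibc toolchain recent enough (2.25 or later) to provide <sys/random.h>; the probe_getrandom() helper is hypothetical and is not part of LTTng.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/random.h> /* getrandom(), provided by glibc >= 2.25 */

/*
 * Ask the kernel for a few random bytes without blocking.
 * Returns 0 on success, otherwise the errno value reported by getrandom():
 *   ENOSYS: the running kernel does not implement the getrandom() syscall;
 *   EAGAIN: the kernel entropy pool is not initialized yet.
 */
int probe_getrandom(void)
{
	unsigned char buf[16];
	ssize_t ret = getrandom(buf, sizeof(buf), GRND_NONBLOCK);

	if (ret < 0)
		return errno;
	/* Small requests are not expected to return short reads. */
	return ret == (ssize_t) sizeof(buf) ? 0 : EIO;
}
```

Calling probe_getrandom() from a tiny main() on the target and printing strerror() of a non-zero result would show whether the kernel lacks the syscall (ENOSYS) or the entropy pool is not yet initialized (EAGAIN), either of which would be consistent with the warning above.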


Thanks,

Mathieu


PERROR - 15:35:47.868588025 [Main]: remove lock file: No such file or 
directory (in sessiond_cleanup_lock_file() at main.cpp:955)

Error: Session daemon terminated with an error (exit status: 1)
Error: Problem occurred while launching session daemon 
(//bin/lttng-sessiond)


Can you please help me sort this out? Is there any dependency I am missing? These are the files I copied:
sudo cp -Par bin/lttng /media/bala/root/bin/
sudo cp -Par bin/lttng-sessiond /media/bala/root/bin/
sudo cp -Par lib/liblttng-ctl.so /media/bala/root/lib/
sudo cp -Par lib/liblttng-ctl.so.0 /media/bala/root/lib/
sudo cp -Par lib/liblttng-ctl.so.0.0.0 /media/bala/root/lib/
sudo cp -Par lib/liblttng-ust-ctl.so.5 /media/bala/root/lib/
sudo cp -Par lib/librt.so.1 /media/bala/root/lib/
sudo cp -Par lib/librt-2.30.so /media/bala/root/lib/
sudo cp -Par lib/libm-2.30.so /media/bala/root/lib/
sudo cp -Par lib/libm.so.6 /media/bala/root/lib/
sudo cp -aPr liburcu-common.so.8 /media/bala/root1/usr/lib/
sudo cp -aPr liburcu-common.so /media/bala/root1/usr/lib/
sudo cp -aPr liburcu-common.so.8.1.0 /media/bala/root1/usr/lib/

sudo cp -Par lib/libgcc_s.so /media/bala/root/lib/
sudo cp -Par lib/libgcc_s.so.1 /media/bala/root/lib/
sudo cp -Par lib/libpthread-2.30.so /media/bala/root/lib/
sudo cp -Par lib/libpthread.so.0 /media/bala/root/lib/
sudo cp -Par lib/libc.so.6 /media/bala/root/lib/
sudo cp -Par lib/libdl.so.2 /media/bala/root/lib/
sudo cp -Par lib/libdl-2.30.so /media/bala/root/lib/
sudo cp -Par lib/libz.so.1 /media/bala/root/lib/
sudo cp -Par lib/libz.so.1.2.11 /media/bala/root/lib/
sudo cp -Par lib/ld-linux-aarch64.so.1 /media/bala/root/lib/
sudo cp -Par lib/ld-2.30.so /media/bala/root/lib
sudo cp -aPr liburcu-cds.so /media/bala/root1/usr/lib/
sudo cp -aPr liburcu-cds.so.8 /media/bala/root1/usr/
sudo cp -aPr liburcu-cds.so.8.1.0 /media/bala/root1/usr/lib/


sudo cp -Par lib/liblttng-ust-common.so /media/bala/root/lib/
sudo cp -Par lib/liblttng-ust-ctl.so.5.0.0 /media/bala/root/lib/
sudo cp -Par lib/liblttng-ust-common.so.1.0.0 /media/bala/root/lib/
sudo cp -Par usr/lib/libstdc++.so /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libstdc++.so.6 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libstdc++.so.6.0.28 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libxml2.so /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libxml2.so.2 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libxml2.so.2.9.10 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libpopt.so /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libpopt.so.0 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libpopt.so.0.0.0 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libpopt.so.0.0.2 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu.so /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu.so.8 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu.so.8.1.0 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu-cds.so.8 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu-common.so.8 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu-cds.so.8.1.0 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/liburcu-common.so.8.1.0 /media/bala/root/usr/lib/
sudo cp -Par usr/local/local/lib/libnuma.so.1 /media/bala/root/usr/lib/
sudo cp -Par usr/local/local/lib/libnuma.so.1.0.0 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libkmod.so.2.3.4 /media/bala/root/usr/lib/
sudo cp -Par usr/lib/libkmod.so.2 /media/bala/root/usr/lib/


Thanks & Regards
Bala G



Re: [lttng-dev] [PATCH lttng-modules 0/1] Introduce configure script to describe changes in linux kernel interface

2023-07-04 Thread Mathieu Desnoyers via lttng-dev

On 7/4/23 14:39, Roxana Nicolescu wrote:

Hi,

Thanks a lot for your feedback.

I realize I did not explain why I did not go for
LTTNG_UBUNTU_KERNEL_RANGE. We deliver a bunch of different
derivatives (inherited from the main kernel), each with its own
version, so it's impossible to use LTTNG_UBUNTU_KERNEL_RANGE alone.
Derivatives in the same cycle don't have the same version number, so
I cannot rely on the version alone to determine when a change has
happened. For example, these are some kernels we released last cycle:

* linux (main kernel): 5.19.0-46
* linux-kvm: 5.19.0-1026
* linux-lowlatency: 5.19.0-1028

As you can see, the linux-kvm and linux-lowlatency versions are not the
same, and the version number of linux-lowlatency from 2 months ago
coincides with that of linux-kvm now, even though they do not share the
same base. I hope that explains it.

Initially I thought about exposing the version of the main kernel in
the kernel headers that can be later used in the module, but then I
came across openvswitch and that's how I came up with the idea of an
initial configure step.

But I totally understand if you think this is not worth it.


LTTng modules use the UTS_UBUNTU_RELEASE_ABI from the Ubuntu 
generated/utsrelease.h kernel headers to detect tracepoint 
instrumentation changes. I don't understand why many kernel flavors 
would have the same ABI number with different ABI semantics, but I guess 
that's just how things are now.


One way to solve this would be to detect the "-lowlatency" and "-kvm" 
suffixes in the string within generated/utsrelease.h UTS_RELEASE, e.g.:


#define UTS_RELEASE "5.15.0-76-lowlatency"

This could be done by LTTng modules by implementing a script similar to 
what we do for debian, fedora, rhel, and sle (see scripts/ in 
lttng-modules).
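The suffix matching itself is simple. Below is a sketch in C of the classification such a script would perform; the real implementation would be a build-time script under scripts/ as suggested above, and the enum and function names here are hypothetical.

```c
#include <string.h>

/* Kernel flavors discussed in this thread (hypothetical enum). */
enum ubuntu_flavor {
	UBUNTU_FLAVOR_GENERIC,
	UBUNTU_FLAVOR_LOWLATENCY,
	UBUNTU_FLAVOR_KVM,
	UBUNTU_FLAVOR_OTHER,
};

/* Returns nonzero if str ends with suffix. */
static int ends_with(const char *str, const char *suffix)
{
	size_t len = strlen(str), slen = strlen(suffix);

	return len >= slen && strcmp(str + len - slen, suffix) == 0;
}

/*
 * Classify a UTS_RELEASE string such as "5.15.0-76-lowlatency"
 * by its trailing flavor suffix.
 */
enum ubuntu_flavor ubuntu_flavor_from_uts_release(const char *uts_release)
{
	if (ends_with(uts_release, "-lowlatency"))
		return UBUNTU_FLAVOR_LOWLATENCY;
	if (ends_with(uts_release, "-kvm"))
		return UBUNTU_FLAVOR_KVM;
	if (ends_with(uts_release, "-generic"))
		return UBUNTU_FLAVOR_GENERIC;
	return UBUNTU_FLAVOR_OTHER;
}
```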


Then we could have:

* LTTNG_UBUNTU_KERNEL_RANGE for kernels where all flavors have the same
  kernel ABI.

* LTTNG_UBUNTU_GENERIC_KERNEL_RANGE for generic kernels only, for
  situations where the kernel ABI differ between flavors,

* LTTNG_UBUNTU_LOWLATENCY_KERNEL_RANGE for lowlatency kernels only, for
  situations where the kernel ABI differ between flavors,

* LTTNG_UBUNTU_KVM_KERNEL_RANGE for kvm kernels only, for situations
  where the kernel ABI differ between flavors.
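How flavor-specific variants could compose with a version range check can be sketched with the preprocessor. All SKETCH_* names, the hard-coded current version and flavor, and the version encoding (upstream version shifted left by 16 bits plus the Ubuntu ABI number) are assumptions for illustration; they are not the lttng-modules definitions.

```c
/* Assumed encodings: upstream a.b.c, then ABI number d in the low 16 bits. */
#define SKETCH_KERNEL_VERSION(a, b, c)	(((a) << 16) + ((b) << 8) + (c))
#define SKETCH_UBUNTU_VERSION(a, b, c, d) \
	(SKETCH_KERNEL_VERSION(a, b, c) * 65536ULL + (d))

/* Hypothetical flavor identifiers. */
#define SKETCH_FLAVOR_GENERIC		1
#define SKETCH_FLAVOR_LOWLATENCY	2
#define SKETCH_FLAVOR_KVM		3

/*
 * Build-time inputs, hard-coded for this example as a linux-lowlatency
 * 5.19.0-1028 kernel (one of the versions quoted above).
 */
#define SKETCH_CUR_VERSION	SKETCH_UBUNTU_VERSION(5, 19, 0, 1028)
#define SKETCH_CUR_FLAVOR	SKETCH_FLAVOR_LOWLATENCY

/* Half-open version range check: [low, high). */
#define SKETCH_UBUNTU_KERNEL_RANGE(a1, b1, c1, d1, a2, b2, c2, d2)	\
	(SKETCH_CUR_VERSION >= SKETCH_UBUNTU_VERSION(a1, b1, c1, d1) &&	\
	 SKETCH_CUR_VERSION < SKETCH_UBUNTU_VERSION(a2, b2, c2, d2))

/* Flavor-specific variants only match when the flavor matches too. */
#define SKETCH_UBUNTU_FLAVOR_KERNEL_RANGE(flavor, a1, b1, c1, d1, a2, b2, c2, d2) \
	(SKETCH_CUR_FLAVOR == (flavor) &&				\
	 SKETCH_UBUNTU_KERNEL_RANGE(a1, b1, c1, d1, a2, b2, c2, d2))

#define SKETCH_UBUNTU_GENERIC_KERNEL_RANGE(a1, b1, c1, d1, a2, b2, c2, d2) \
	SKETCH_UBUNTU_FLAVOR_KERNEL_RANGE(SKETCH_FLAVOR_GENERIC,	\
		a1, b1, c1, d1, a2, b2, c2, d2)
#define SKETCH_UBUNTU_LOWLATENCY_KERNEL_RANGE(a1, b1, c1, d1, a2, b2, c2, d2) \
	SKETCH_UBUNTU_FLAVOR_KERNEL_RANGE(SKETCH_FLAVOR_LOWLATENCY,	\
		a1, b1, c1, d1, a2, b2, c2, d2)
#define SKETCH_UBUNTU_KVM_KERNEL_RANGE(a1, b1, c1, d1, a2, b2, c2, d2) \
	SKETCH_UBUNTU_FLAVOR_KERNEL_RANGE(SKETCH_FLAVOR_KVM,		\
		a1, b1, c1, d1, a2, b2, c2, d2)
```

Because every operand is an integer constant, these expressions can also be evaluated in #if directives, which is where version-range macros of this kind are usually used.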

It would all have been simpler if the UTS_UBUNTU_RELEASE_ABI had
actually been a versioned kernel ABI without different semantics
across kernel flavors, but considering the current situation we will
need to deal with this with scripts as we have done for other distributions.


Thanks,

Mathieu



All the best, Roxana

On 04/07/2023 20:07, Mathieu Desnoyers wrote:


Re: [lttng-dev] [PATCH lttng-modules 0/1] Introduce configure script to describe changes in linux kernel interface

2023-07-04 Thread Mathieu Desnoyers via lttng-dev

On 7/4/23 11:35, Michael Jeanson via lttng-dev wrote:

On 2023-07-03 14:28, Roxana Nicolescu via lttng-dev wrote:

This script describes the changes in the linux kernel interface that
affect compatibility with lttng-modules.

It is introduced for a specific use case where commit
d87a7b4c77a9: "jbd2: use the correct print format"
broke the interface between the kernel and lttng-modules. 3 variables
changed their type to tid_t (transaction, head and tid) in multiple
function declarations. The lttng module was updated properly to ensure
backwards compatibility by using the version of the kernel.
But this change took into account only long term supported versions.
As an example, ubuntu 5.19 kernels picked the linux kernel change from
5.15 without actually changing the linux kernel upstream version. This
means the current tooling does not allow fixing the module for newer
ubuntu 5.19 kernels.

This script is supposed to solve the problem mentioned above, but
also to make this kind of change easier to integrate.
We check the linux kernel header (include/trace/events/jbd2.h) to see
whether the types of the tid, transaction and head variables have changed
to tid_t, and define these 3 macros in 'include/generated/config.h':
TID_IS_TID_T 1
TRANSACTION_IS_TID_T 1
HEAD_IS_TID_T 1

In 'include/instrumentation/events/jbd2.h' we then check these to define
the proper type of transaction, head and tid variables that will be
later used in the function declarations that need them.
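The mechanism described above can be sketched as follows. The configure-generated macro is defined by hand, tid_t is a local stand-in for the kernel typedef, and the alias name lttng_jbd2_tid_type as well as the unsigned long fallback type are hypothetical placeholders, not the actual patch contents.

```c
/* Local stand-in for the kernel's jbd2 transaction ID type. */
typedef unsigned int tid_t;

/*
 * In the proposal this would come from include/generated/config.h,
 * written by the configure script after inspecting
 * include/trace/events/jbd2.h. Defined by hand here.
 */
#define TID_IS_TID_T 1

/*
 * Hypothetical alias consumed by the instrumentation prototypes.
 * The fallback branch uses unsigned long only as a placeholder for
 * whatever type older kernels used.
 */
#ifdef TID_IS_TID_T
typedef tid_t lttng_jbd2_tid_type;
#else
typedef unsigned long lttng_jbd2_tid_type;
#endif

/* A probe prototype written once against the alias. */
static inline unsigned long long sketch_probe_tid(lttng_jbd2_tid_type tid)
{
	return (unsigned long long) tid;
}
```

Note that this adds a build-time detection step alongside, rather than instead of, the existing version-range macros.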

This change is meant to remove the dependency on linux kernel version
and the outcome is a bit cleaner than before.
As with the previous implementation, this may need changes in the future
if the kernel interface changes again.

Note:
This is a proposal for a simpler way of integrating linux kernel changes
in lttng-modules. The implementation is very simple due to the fact that
tid_t was introduced everywhere in one commit in
include/trace/events/jbd2.h.
I would like to get your opinion on this approach. If needed, it can be
improved.

Roxana Nicolescu (1):
   Introduce configure script to describe changes in linux kernel
 interface

  README.md |   3 +-
  configure |  36 +
  include/instrumentation/events/jbd2.h | 110 ++
  3 files changed, 61 insertions(+), 88 deletions(-)
  create mode 100755 configure



Hi Roxana,

While I can see advantages to a configure script approach to detect 
kernel source changes I don't think it's worth the added complexity on 
top of our current kernel version range system.


We already have an Ubuntu specific kernel range macro that supplements 
the upstream version with Ubuntu's kernel ABI number:


LTTNG_UBUNTU_KERNEL_RANGE(5,19,17,X, 6,0,0,0)

I'll let Mathieu make the final call but I think that would be the 
preferred approach.


Indeed, many of the kernel tracepoint code changes we had to deal with 
in the past 10 years would not be easy to track with configure scripts, 
so we would end up with not just one, but with a combination of two 
different mechanisms to adapt to kernel code changes.


In order to keep things maintainable long-term, I prefer that we stay 
with the version-based approach as recommended by Michael.


Thanks,

Mathieu


Regards,

Michael


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured

2023-06-29 Thread Mathieu Desnoyers via lttng-dev

On 6/29/23 13:27, Olivier Dion wrote:

On Thu, 29 Jun 2023, Olivier Dion  wrote:


   [0] https://godbolt.org/z/3nW14M3v1
   [1] https://godbolt.org/z/TcTeMeKbW


Sorry.  That was:

 [0] https://godbolt.org/z/ETcxnz4TW


Change

(volatile __typeof__(ptr))(ptr);

for:

(volatile __typeof__(*(ptr)) *)(ptr);

and:

void love_iso(int *x)
{
 __atomic_store_n(cast_volatile(), 1,
  __ATOMIC_RELAXED);
}

for

void love_iso(int *x)
{
 __atomic_store_n(cast_volatile(x), 1,
  __ATOMIC_RELAXED);
}

Thanks,

Mathieu



 [1] https://godbolt.org/z/jMjh8YoM4
--

Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured

2023-06-29 Thread Mathieu Desnoyers via lttng-dev

On 6/29/23 13:22, Olivier Dion wrote:

On Thu, 22 Jun 2023, "Paul E. McKenney"  wrote:

On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:

On 6/21/23 19:19, Paul E. McKenney wrote:

I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
for non-volatile atomic loads and stores, and such fusing can ruin your
code's entire day.  ;-)


After some testing, I got a wall of warnings:

   -Wignored-qualifiers:

 Warn if the return type of a function has a type qualifier such as
 "const".  For ISO C such a type qualifier has no effect, since the
 value returned by a function is not an lvalue.  For C++, the warning
 is only emitted for scalar types or "void".  ISO C prohibits
 qualified "void" return types on function definitions, so such
 return types always receive a warning even without this option.

Since we are using atomic builtins, for example load:

   type __atomic_load_n (type *ptr, int memorder)

If we put the qualifier volatile to TYPE, we end up with the same
qualifier on the return value, triggering a warning for each atomic
operation.

This seems to be only a problem when compiling in C++ [0] while in C it
seems the compiler is more relaxed on this [1].

Ideas to make the toolchains happy? :-)


Change:

(__typeof__(*ptr) *volatile)(ptr);

(which applies the volatile to the pointer, rather than what is pointed to)

to either:

(volatile __typeof__(*ptr) *)(ptr);

or:

(__typeof__(*ptr) volatile *)(ptr);
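The difference between the two placements of volatile can be checked in C11 with _Generic. This sketch (the macro names are hypothetical, and it relies on the GCC/Clang __typeof__ extension) shows that the two suggested spellings produce the same type, while casting to `T *volatile` yields a plain `T *`: a cast to a qualified type has the same effect as a cast to its unqualified version (C11 6.5.4), so no volatile access semantics survive at all.

```c
/* Evaluates to 1 if the expression has type "pointer to volatile int". */
#define IS_PTR_TO_VOLATILE_INT(e) _Generic((e), volatile int *: 1, default: 0)

/* Original form: volatile qualifies the pointer value (dropped by the cast). */
#define CAST_VOLATILE_POINTER(ptr)	((__typeof__(*(ptr)) *volatile)(ptr))

/* Suggested forms: volatile qualifies the pointed-to object (equivalent). */
#define CAST_VOLATILE_PTR_A(ptr)	((volatile __typeof__(*(ptr)) *)(ptr))
#define CAST_VOLATILE_PTR_B(ptr)	((__typeof__(*(ptr)) volatile *)(ptr))
```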

Thanks,

Mathieu



   [0] https://godbolt.org/z/3nW14M3v1
   [1] https://godbolt.org/z/TcTeMeKbW



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured

2023-06-22 Thread Mathieu Desnoyers via lttng-dev

On 6/22/23 15:53, Olivier Dion wrote:

On Thu, 22 Jun 2023, "Paul E. McKenney"  wrote:


I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
for non-volatile atomic loads and stores, and such fusing can ruin your
code's entire day.  ;-)


Good catch.  Seems like not a problem on GCC (yet), but Clang is extremely
aggressive and seems to do store fusing on some corner cases [0].


I don't think this is an example of store fusing, but rather just that 
the compiler can eliminate stores to static variables which are 
otherwise unused, making the entire variable useless.
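To illustrate the distinction, here is a sketch with hypothetical names. The first function writes a static variable that nothing ever reads, so its store may be removed entirely as a dead store. The second performs two back-to-back relaxed stores, which is the pattern store fusing would merge; here they go through a volatile-qualified pointer so that each store is expected to be emitted. Whether a given compiler actually performs either optimization depends on its version and flags.

```c
/*
 * Written but never read: the compiler may delete this store entirely
 * (dead-store elimination), even though it is a relaxed atomic.
 */
static int never_read;

void write_unused(void)
{
	__atomic_store_n(&never_read, 1, __ATOMIC_RELAXED);
}

/*
 * Two consecutive relaxed stores to an observable variable could be
 * fused into the final store. Accessing it through a volatile-qualified
 * pointer asks the compiler to emit both stores.
 */
int shared_flag;

void store_twice_volatile(void)
{
	__atomic_store_n((volatile int *) &shared_flag, 1, __ATOMIC_RELAXED);
	__atomic_store_n((volatile int *) &shared_flag, 2, __ATOMIC_RELAXED);
}
```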


Thanks,

Mathieu



However, I cannot find any simple reproducer of load/store fusing.  Do
you have an example of such fusing, or is this a precaution?  In the
meantime, back to reading the standard to be certain :-)

  [0] https://godbolt.org/z/odKG9a75a



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured

2023-06-22 Thread Mathieu Desnoyers via lttng-dev

On 6/22/23 14:32, Paul E. McKenney wrote:

On Thu, Jun 22, 2023 at 11:55:55AM -0400, Mathieu Desnoyers wrote:

On 6/21/23 19:19, Paul E. McKenney wrote:
[...]

diff --git a/include/urcu/uatomic/builtins-generic.h 
b/include/urcu/uatomic/builtins-generic.h
new file mode 100644
index 000..8e6a9b5
--- /dev/null
+++ b/include/urcu/uatomic/builtins-generic.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-generic.h
+ *
+ * Copyright (c) 2023 Olivier Dion 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
+#define _URCU_UATOMIC_BUILTINS_GENERIC_H
+
+#include 
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)


Does this lose the volatile semantics that the old-style definitions
had?



Yes.

[...]


+++ b/include/urcu/uatomic/builtins-x86.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-x86.h
+ *
+ * Copyright (c) 2023 Olivier Dion 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_X86_H
+#define _URCU_UATOMIC_BUILTINS_X86_H
+
+#include 
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)


And same question here.


Yes, this opens interesting questions:

* what semantics do we want for uatomic_read/set ?

* what semantics do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ?

* do we want to allow load/store-shared to work on variables larger than a
word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure)

* what are the guarantees of a volatile type ?

* what are the guarantees of a load/store relaxed in C11 ?

Does the delta between volatile and C11 relaxed guarantees matter ?

Is there an advantage to using C11 load/store relaxed over volatile ? Should
we combine both C11 load/store _and_ volatile ? Should we use
atomic_signal_fence instead ?


I suggest C11 volatile atomic load/store.  Load/store fusing is permitted
for non-volatile atomic loads and stores, and such fusing can ruin your
code's entire day.  ;-)


I'm OK with erring towards a safer approach, but just out of curiosity, 
do you have examples of compilers doing load or store fusion on C11 or 
C++11 relaxed atomics, or is it out of caution due to lack of explicit 
guarantees in the standards ?


Does this lack of guarantee about fusion also apply to other MO such as 
acquire, release and seq.cst. ?
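For reference, here is a sketch of relaxed load/store helpers that combine the __atomic builtins with a volatile-qualified access, in the spirit of the "C11 volatile atomic load/store" suggestion. The sketch_* names are hypothetical and do not reflect the liburcu API that was eventually merged.

```c
/* Volatile-qualified view of *addr: volatile applies to the pointee. */
#define sketch_cast_volatile(addr) \
	((volatile __typeof__(*(addr)) *)(addr))

/* Relaxed atomic store performed through a volatile-qualified lvalue. */
#define sketch_uatomic_set(addr, v) \
	__atomic_store_n(sketch_cast_volatile(addr), (v), __ATOMIC_RELAXED)

/* Relaxed atomic load performed through a volatile-qualified lvalue. */
#define sketch_uatomic_read(addr) \
	__atomic_load_n(sketch_cast_volatile(addr), __ATOMIC_RELAXED)
```

In C this compiles cleanly; in C++ the volatile-qualified result type of __atomic_load_n is what triggers the -Wignored-qualifiers warnings discussed later in the thread.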


Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 02/11] urcu/uatomic: Use atomic builtins if configured

2023-06-22 Thread Mathieu Desnoyers via lttng-dev

On 6/21/23 19:19, Paul E. McKenney wrote:
[...]

diff --git a/include/urcu/uatomic/builtins-generic.h 
b/include/urcu/uatomic/builtins-generic.h
new file mode 100644
index 000..8e6a9b5
--- /dev/null
+++ b/include/urcu/uatomic/builtins-generic.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-generic.h
+ *
+ * Copyright (c) 2023 Olivier Dion 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_GENERIC_H
+#define _URCU_UATOMIC_BUILTINS_GENERIC_H
+
+#include 
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)


Does this lose the volatile semantics that the old-style definitions
had?



Yes.

[...]


+++ b/include/urcu/uatomic/builtins-x86.h
@@ -0,0 +1,85 @@
+/*
+ * urcu/uatomic/builtins-x86.h
+ *
+ * Copyright (c) 2023 Olivier Dion 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _URCU_UATOMIC_BUILTINS_X86_H
+#define _URCU_UATOMIC_BUILTINS_X86_H
+
+#include 
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELAXED)
+
+#define uatomic_read(addr) __atomic_load_n(addr, __ATOMIC_RELAXED)


And same question here.


Yes, this opens interesting questions:

* what semantics do we want for uatomic_read/set ?

* what semantics do we want for CMM_LOAD_SHARED/CMM_STORE_SHARED ?

* do we want to allow load/store-shared to work on variables larger than 
a word ? (e.g. on a uint64_t on a 32-bit architecture, or on a structure)


* what are the guarantees of a volatile type ?

* what are the guarantees of a load/store relaxed in C11 ?

Does the delta between volatile and C11 relaxed guarantees matter ?

Is there an advantage to using C11 load/store relaxed over volatile ?
Should we combine both C11 load/store _and_ volatile ? Should we use 
atomic_signal_fence instead ?
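On the question of variables larger than a word, the GCC/Clang builtin __atomic_always_lock_free() gives a compile-time answer for a given size on the target. A minimal sketch, with hypothetical macro and type names:

```c
/*
 * Nonzero when objects of this type's size are always accessed with
 * lock-free atomic instructions on the target (a compile-time constant).
 * When zero, __atomic operations of that size may be implemented with
 * lock-based library calls (libatomic), and plain or volatile accesses
 * of that size may be split into several instructions.
 */
#define SKETCH_SINGLE_ACCESS_OK(type) \
	__atomic_always_lock_free(sizeof(type), 0)

/* An aggregate larger than any native atomic width. */
struct sketch_big {
	char bytes[64];
};
```

For uint64_t the answer depends on the target: it is typically true on 64-bit architectures and may be false on some 32-bit ones, which is exactly the case raised above.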


Thanks,

Mathieu



Thanx, Paul


+


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH] Avoid calling caa_container_of on NULL pointer in cds_lfhash macros

2023-06-22 Thread Mathieu Desnoyers via lttng-dev

On 6/22/23 06:45, Ondřej Surý via lttng-dev wrote:

(Sorry, I missed closing brackets in both macros, so resending fixed patch...)

The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
would call the caa_container_of() macro on a NULL pointer.  This is not a
problem under normal circumstances as the check in the for loop fails
and the loop-statement is not called with an invalid (pos) value.

However AddressSanitizer doesn't like that and complains about this:

 runtime error: applying non-zero offset 18446744073709551056 to null pointer

Move the cds_lfht_iter_get_node(iter) != NULL from the cond-expression
of the for loop into both init-clause and iteration-expression as
conditional operator and check for (pos) value in the cond-expression
instead.


I've taken the liberty to reimplement this with a new helper "cds_lfht_entry".

Can you review and try the following commits please ?

https://review.lttng.org/c/userspace-rcu/+/10445 compiler.h: Introduce 
caa_unqual_scalar_typeof
https://review.lttng.org/c/userspace-rcu/+/10446 Avoid calling caa_container_of 
on NULL pointer in cds_lfht macros
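The general idea of a NULL-safe entry helper can be sketched as follows. The sketch_* macros, struct item, and the simple linked chain are hypothetical stand-ins; the actual cds_lfht_entry helper in the linked change may be defined differently (for instance, to avoid evaluating its argument twice, which this sketch does).

```c
#include <stddef.h>

/* Classic container_of: recover the enclosing struct from a member pointer. */
#define sketch_container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/*
 * NULL-safe variant: only apply the offset when the node pointer is
 * non-NULL, so no offset is ever applied to a NULL pointer.
 * Note: evaluates ptr twice.
 */
#define sketch_entry(ptr, type, member) \
	((ptr) ? sketch_container_of(ptr, type, member) : (type *) NULL)

struct sketch_node {
	struct sketch_node *next;
};

struct item {
	int value;
	struct sketch_node node;
};

/* Sum values over a NULL-terminated chain, iterating on the entry pointer. */
int sum_items(struct sketch_node *head)
{
	int sum = 0;
	struct item *pos;

	for (pos = sketch_entry(head, struct item, node); pos != NULL;
	     pos = sketch_entry(pos->node.next, struct item, node))
		sum += pos->value;
	return sum;
}
```

Testing `pos != NULL` in the loop condition mirrors the approach in the patch below, where the iterator node is checked before the container pointer is computed.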

Thanks!

Mathieu



Signed-off-by: Ondřej Surý 
---
  include/urcu/rculfhash.h | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/urcu/rculfhash.h b/include/urcu/rculfhash.h
index fbd33cc..64cc18f 100644
--- a/include/urcu/rculfhash.h
+++ b/include/urcu/rculfhash.h
@@ -546,22 +546,22 @@ void cds_lfht_resize(struct cds_lfht *ht, unsigned long new_size);
 
 #define cds_lfht_for_each_entry(ht, iter, pos, member)			\
 	for (cds_lfht_first(ht, iter),					\
-			pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-					__typeof__(*(pos)), member);	\
-		cds_lfht_iter_get_node(iter) != NULL;			\
+			pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \
+					__typeof__(*(pos)), member) : NULL); \
+		pos != NULL;						\
 		cds_lfht_next(ht, iter),				\
-			pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-					__typeof__(*(pos)), member))
+			pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \
+					__typeof__(*(pos)), member) : NULL))
 
 #define cds_lfht_for_each_entry_duplicate(ht, hash, match, key,		\
 				iter, pos, member)			\
 	for (cds_lfht_lookup(ht, hash, match, key, iter),		\
-			pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-					__typeof__(*(pos)), member);	\
-		cds_lfht_iter_get_node(iter) != NULL;			\
+			pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \
+					__typeof__(*(pos)), member) : NULL); \
+		pos != NULL;						\
 		cds_lfht_next_duplicate(ht, match, key, iter),		\
-			pos = caa_container_of(cds_lfht_iter_get_node(iter), \
-					__typeof__(*(pos)), member))
+			pos = (cds_lfht_iter_get_node(iter) != NULL ? caa_container_of(cds_lfht_iter_get_node(iter), \
+					__typeof__(*(pos)), member) : NULL))
 
 #ifdef __cplusplus
 }


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 04/11] urcu/arch/generic: Use atomic builtins if configured

2023-06-21 Thread Mathieu Desnoyers via lttng-dev

On 6/21/23 20:53, Olivier Dion wrote:

On Wed, 21 Jun 2023, "Paul E. McKenney"  wrote:

On Mon, May 15, 2023 at 04:17:11PM -0400, Olivier Dion wrote:

  #ifndef cmm_mb
  #define cmm_mb()	__sync_synchronize()


Just out of curiosity, why not also implement cmm_mb() in terms of
__atomic_thread_fence(__ATOMIC_SEQ_CST)?  (Or is that a later patch?)


IIRC, Mathieu and I agree that the definition of a thread fence -- acts
as a synchronization fence between threads -- is too weak for what we
want here.  For example, with I/O devices.

Although __sync_synchronize() is probably an alias for a SEQ_CST thread
fence, its definition -- issues a full memory barrier -- is stronger.

We do not want to rely on this assumption (alias) and prefer to rely on
the documented definition instead.



We should document this rationale with a new comment near the #define,
in case anyone mistakenly decides to use a thread fence there to make it
similar to the rest of the code in the future.
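A sketch of such a comment, paraphrasing the rationale from this thread (the exact wording is only a suggestion):

```c
/*
 * cmm_mb(): full memory barrier.
 *
 * Intentionally implemented with __sync_synchronize(), which the GCC
 * documentation describes as issuing a full memory barrier, rather than
 * with __atomic_thread_fence(__ATOMIC_SEQ_CST), which is documented as a
 * synchronization fence between threads. The latter definition is too
 * weak for our needs (e.g. ordering with respect to I/O devices), even if
 * both currently generate the same code. Do not replace this with a
 * thread fence.
 */
#ifndef cmm_mb
#define cmm_mb()	__sync_synchronize()
#endif
```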

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] I'm still getting empty ust traces using tracef

2023-06-21 Thread Mathieu Desnoyers via lttng-dev

On 6/20/23 18:02, Brian Hutchinson wrote:

On Thu, May 11, 2023 at 2:14 PM Mathieu Desnoyers
 wrote:


On 2023-05-11 14:13, Mathieu Desnoyers via lttng-dev wrote:

On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote:

... more background.  I've always used ltt in the kernel so I don't
have much experience with the user side of it and especially
multi-threaded, multi-core so I'm probably missing some fundamental
concepts that I need to understand.


Which are the exact versions of LTTng-UST and LTTng-Tools you are using
now ? (2.13.N or which git commit ?)



Also, can you try using lttng-ust stable-2.13 branch, which includes the 
following commit ?

commit be2ca8b563bab81be15cbce7b9f52422369f79f7
Author: Mathieu Desnoyers 
Date:   Tue Feb 21 14:29:49 2023 -0500

  Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included

  Fix issues with missing symbols in use-cases where tracef.h is included
  before defining LTTNG_UST_TRACEPOINT_DEFINE, e.g.:

   #include 
   #define LTTNG_UST_TRACEPOINT_DEFINE
   #include 

  It is caused by the fact that tracef.h includes tracepoint.h in a
  context which has LTTNG_UST_TRACEPOINT_DEFINE undefined, and this is not
  re-evaluated for the following includes.

  Fix this by lifting the definition code in tracepoint.h outside of the
  header include guards, and #undef the old LTTNG_UST__DEFINE_TRACEPOINT
  before re-defining it to its new semantic. Use a new
  _LTTNG_UST_TRACEPOINT_DEFINE_ONCE include guard within the
  LTTNG_UST_TRACEPOINT_DEFINE defined case to ensure symbols are not
  duplicated.

  Signed-off-by: Mathieu Desnoyers 
  Change-Id: I0ef720435003a7ca0bfcf29d7bf27866c5ff8678



I applied this patch and if I use "tracef" type calls in our
application that is made up of a bunch of static libs ... the UST
trace calls work.  I verified that traces that were called from
several different static libs all worked.

But as soon as I include a "tracepoint" style tracepoint (that uses
trace provider include files etc.) then doing a "lttng list -u"
returns "None" for UST events.

Is there some kind of rule that says a file can't use both tracef and
tracepoint calls?  Is there something special you have to do to use
tracef and tracepoints in same file?  Doing so appears to have broken
everything.


It should just work.

Can you provide a minimal example of the compile unit having this
issue ?

Also you mention "static libs". Make sure you do *not* define 
"LTTNG_UST_TRACEPOINT_PROBE_DYNAMIC_LINKAGE" in this case. See the 
lttng-ust(3) man page for details (section "Statically linking the 
tracepoint provider").


Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

2023-06-21 Thread Mathieu Desnoyers via lttng-dev

On 6/21/23 01:39, Yitschak, Yehuda wrote:

On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote:

On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello,






Are there any suggestions to root cause the high latency and potentially improve it
on *platform1*?


Thanks and best regards,

Anas.



I recommend using "perf" when tracing with the sample program in a
loop to figure out the hot spots. With that information on the "fast"
and "slow" system, we might be able to figure out what differs.

Also, comparing the kernel configurations of the two systems can help.
Also comparing the glibc versions of the two systems would be relevant.



Also make sure you benchmark with the lttng "snapshot" mode [1], so that
you don't run into a situation where the disk/network I/O throughput cannot
cope with the generated event throughput, causing the ring buffer to
discard events. Discarding would artificially "speed up" tracing from the
application's perspective, because discarding an event is faster than
writing it to a ring buffer.


You mean we should avoid the "discard" loss mode and use "overwrite" loss mode 
since discard mode can fake fast performance ?


Yes. In addition to using "overwrite-when-buffer-full" mode, the 
"snapshot" session also ensures that no consumer daemon extracts the 
trace data (unless an explicit snapshot record is performed), which 
allows comparing the ring buffer producer performance with minimal noise.


If you really want to benchmark the discard-when-buffer-full mode and 
the consumer daemon I/O behavior, then you need to take into account 
event discarded counts and the actual trace data size that was written 
to disk.
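To illustrate why discard mode can flatter producer-side numbers, here is a toy single-threaded ring buffer in C with both policies. It is only a sketch under simplifying assumptions (fixed-size slots, no consumer, no sub-buffers) and the names (`toy_rb`, `toy_rb_write`) are hypothetical; it is not how LTTng's lock-free ring buffer is implemented. In discard mode, a full buffer rejects the new event and only bumps a counter, which is the cheap path and also the count you need to account for when benchmarking that mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ring buffer, for illustration only. */
#define TOY_RB_SLOTS 4

enum toy_rb_mode { TOY_RB_DISCARD, TOY_RB_OVERWRITE };

struct toy_rb {
	enum toy_rb_mode mode;
	uint64_t slots[TOY_RB_SLOTS];
	size_t head;          /* index of the oldest stored event */
	size_t count;         /* number of events currently stored */
	uint64_t discarded;   /* new events dropped (discard mode) */
	uint64_t overwritten; /* old events lost (overwrite mode) */
};

static void toy_rb_init(struct toy_rb *rb, enum toy_rb_mode mode)
{
	assert(rb != NULL);
	*rb = (struct toy_rb){ .mode = mode };
}

/* Returns 1 if the event was stored, 0 if it was discarded. */
static int toy_rb_write(struct toy_rb *rb, uint64_t event)
{
	if (rb->count == TOY_RB_SLOTS) {
		if (rb->mode == TOY_RB_DISCARD) {
			/* Cheapest path: nothing is copied, a counter is bumped. */
			rb->discarded++;
			return 0;
		}
		/* Overwrite mode: drop the oldest event to make room. */
		rb->head = (rb->head + 1) % TOY_RB_SLOTS;
		rb->count--;
		rb->overwritten++;
	}
	rb->slots[(rb->head + rb->count) % TOY_RB_SLOTS] = event;
	rb->count++;
	return 1;
}
```

Writing 10 events into a 4-slot buffer, discard mode keeps events 0 to 3 and reports 6 discarded, while overwrite mode keeps events 6 to 9 and reports 6 overwritten.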


Thanks,

Mathieu





Thanks,

Mathieu

[1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot


Thanks,

Mathieu






Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

2023-06-20 Thread Mathieu Desnoyers via lttng-dev

On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote:

On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello,




Are there any suggestions to root cause the high latency and potentially improve it on *platform1*?

Thanks and best regards,

Anas.



I recommend using "perf" when tracing with the sample program in a loop 
to figure out the hot spots. With that information on the "fast" and 
"slow" system, we might be able to figure out what differs.


Also, comparing the kernel configurations of the two systems can help. 
Also comparing the glibc versions of the two systems would be relevant.




Also make sure you benchmark with the lttng "snapshot" mode [1], so that 
you don't run into a situation where the disk/network I/O throughput 
cannot cope with the generated event throughput, causing the ring 
buffer to discard events. Discarding would artificially "speed up" tracing 
from the application's perspective, because discarding an event is faster 
than writing it to a ring buffer.


Thanks,

Mathieu

[1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot


Thanks,

Mathieu






Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

2023-06-20 Thread Mathieu Desnoyers via lttng-dev

On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:

Hello,




Are there any suggestions to root cause the high latency and potentially improve it on *platform1*?

Thanks and best regards,

Anas.



I recommend using "perf" when tracing with the sample program in a loop 
to figure out the hot spots. With that information on the "fast" and 
"slow" system, we might be able to figure out what differs.


Also, comparing the kernel configurations of the two systems can help. 
Also comparing the glibc versions of the two systems would be relevant.
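As a starting point for that comparison, a minimal timing harness can call the instrumented code in a tight loop and report the average cost per call; the same loop is also a convenient target for `perf record`. This is a sketch only: `traced_operation()` is a hypothetical stand-in for the code under test (for instance a function hitting a tracepoint), and POSIX `clock_gettime()` with `CLOCK_MONOTONIC` is assumed to be available on both platforms.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the instrumented code path under test. */
static volatile uint64_t sink;

static void traced_operation(uint64_t i)
{
	sink = i; /* replace with the tracepoint call being measured */
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average nanoseconds per call over `iterations` calls. */
static double average_ns_per_call(uint64_t iterations)
{
	uint64_t start, end, i;

	assert(iterations > 0);
	start = now_ns();
	for (i = 0; i < iterations; i++)
		traced_operation(i);
	end = now_ns();
	return (double)(end - start) / (double)iterations;
}
```

Running the same binary under `perf record` on both platforms and comparing the `perf report` output is one way to locate where the extra time goes.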


Thanks,

Mathieu




Re: [lttng-dev] [PATCH] Fix: revise urcu_read_lock_update() comment

2023-06-15 Thread Mathieu Desnoyers via lttng-dev

On 6/13/23 21:51, Li-Kuan Ou wrote:

Read-side critical section nesting is tracked in lower-order bits
and the grace-period phase number uses a single high-order bit.



Merged, thanks!

Mathieu


Signed-off-by: Li-Kuan Ou 
---
  include/urcu/static/urcu-bp.h | 6 +++---
  include/urcu/static/urcu-mb.h | 6 +++---
  include/urcu/static/urcu-memb.h   | 6 +++---
  include/urcu/static/urcu-signal.h | 6 +++---
  4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h
index 8ba3830..b163a90 100644
--- a/include/urcu/static/urcu-bp.h
+++ b/include/urcu/static/urcu-bp.h
@@ -137,9 +137,9 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
  
  /*
   * Helper for _urcu_bp_read_lock().  The format of urcu_bp_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _urcu_bp_read_lock() nesting, and a lower-order bit that contains either zero
- * or URCU_BP_GP_CTR_PHASE.  The smp_mb_slave() ensures that the accesses in
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _urcu_bp_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit
+ * that contains either zero or one.  The smp_mb_slave() ensures that the accesses in
   * _urcu_bp_read_lock() happen before the subsequent read-side critical section.
   */
  static inline void _urcu_bp_read_lock_update(unsigned long tmp)
diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h
index b97e42a..253d29b 100644
--- a/include/urcu/static/urcu-mb.h
+++ b/include/urcu/static/urcu-mb.h
@@ -63,9 +63,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_mb_reader);
  
  /*
   * Helper for _urcu_mb_read_lock().  The format of urcu_mb_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _urcu_mb_read_lock() nesting, and a lower-order bit that contains either zero
- * or URCU_GP_CTR_PHASE.  The cmm_smp_mb() ensures that the accesses in
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _urcu_mb_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit
+ * that contains either zero or one.  The cmm_smp_mb() ensures that the accesses in
   * _urcu_mb_read_lock() happen before the subsequent read-side critical section.
   */
  static inline void _urcu_mb_read_lock_update(unsigned long tmp)
diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h
index c8d102f..f64cb57 100644
--- a/include/urcu/static/urcu-memb.h
+++ b/include/urcu/static/urcu-memb.h
@@ -86,9 +86,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader);
  
  /*
   * Helper for _rcu_read_lock().  The format of urcu_memb_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _rcu_read_lock() nesting, and a lower-order bit that contains either zero
- * or URCU_GP_CTR_PHASE.  The smp_mb_slave() ensures that the accesses in
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _rcu_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit
+ * that contains either zero or one.  The smp_mb_slave() ensures that the accesses in
   * _rcu_read_lock() happen before the subsequent read-side critical section.
   */
  static inline void _urcu_memb_read_lock_update(unsigned long tmp)
diff --git a/include/urcu/static/urcu-signal.h b/include/urcu/static/urcu-signal.h
index c7577d3..707eaf8 100644
--- a/include/urcu/static/urcu-signal.h
+++ b/include/urcu/static/urcu-signal.h
@@ -64,9 +64,9 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_signal_reader);
  
  /*
   * Helper for _rcu_read_lock().  The format of urcu_signal_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _rcu_read_lock() nesting, and a lower-order bit that contains either zero
- * or URCU_GP_CTR_PHASE.  The cmm_barrier() ensures that the accesses in
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _rcu_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit
+ * that contains either zero or one.  The cmm_barrier() ensures that the accesses in
   * _rcu_read_lock() happen before the subsequent read-side critical section.
   */
  static inline void _urcu_signal_read_lock_update(unsigned long tmp)
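The counter layout described by the revised comment can be modeled in a few lines of C. This is a simplified, single-threaded sketch: the constant names (`GP_COUNT`, `GP_CTR_PHASE`, `GP_CTR_NEST_MASK`) are local and assumed to mirror liburcu's `URCU_GP_COUNT`, `URCU_GP_CTR_PHASE` and `URCU_GP_CTR_NEST_MASK`, and all memory barriers and TLS are omitted. The nesting count lives in the lower-order bits and the phase is a single higher-order bit; the outermost lock snapshots the global counter, while nested locks only increment the count.

```c
#include <assert.h>
#include <limits.h>

/* Local mirrors of the liburcu counter layout (assumed, simplified). */
#define GP_COUNT         (1UL << 0)
#define GP_CTR_PHASE     (1UL << (sizeof(unsigned long) * CHAR_BIT / 2))
#define GP_CTR_NEST_MASK (GP_CTR_PHASE - 1)

static unsigned long gp_ctr = GP_COUNT; /* global grace-period counter */
static unsigned long reader_ctr;        /* one reader's counter */

static void read_lock(void)
{
	unsigned long tmp = reader_ctr;

	if (!(tmp & GP_CTR_NEST_MASK))
		reader_ctr = gp_ctr;         /* outermost: nest = 1, copy phase */
	else
		reader_ctr = tmp + GP_COUNT; /* nested: bump the low-order count */
}

static void read_unlock(void)
{
	assert(reader_ctr & GP_CTR_NEST_MASK);
	reader_ctr -= GP_COUNT;
}

/* Updater side: flip the single high-order phase bit. */
static void flip_phase(void)
{
	gp_ctr ^= GP_CTR_PHASE;
}

/* A reader holds up the grace period if it is inside a critical
 * section that started before the last phase flip. */
static int reader_in_old_phase(void)
{
	return (reader_ctr & GP_CTR_NEST_MASK) &&
	       ((reader_ctr ^ gp_ctr) & GP_CTR_PHASE);
}
```

On an LP64 system `GP_CTR_PHASE` is bit 32, so the phase bit is "high-order" relative to the nesting count rather than the most significant bit of the word.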




Re: [lttng-dev] [PATCH] Fix: revise urcu_read_lock_update() comment

2023-06-13 Thread Mathieu Desnoyers via lttng-dev

On 6/13/23 11:45, Li-Kuan Ou via lttng-dev wrote:

Read-side critical section nesting is tracked in lower-order bits
and the grace-period phase number uses a single high-order bit.



Thanks for the fix. Here is a comment below,


Signed-off-by: Li-Kuan Ou 
---
  include/urcu/static/urcu-bp.h | 4 ++--
  include/urcu/static/urcu-mb.h | 4 ++--
  include/urcu/static/urcu-memb.h   | 4 ++--
  include/urcu/static/urcu-signal.h | 4 ++--
  4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/urcu/static/urcu-bp.h b/include/urcu/static/urcu-bp.h
index 8ba3830..c90c9f1 100644
--- a/include/urcu/static/urcu-bp.h
+++ b/include/urcu/static/urcu-bp.h
@@ -137,8 +137,8 @@ static inline enum urcu_bp_state urcu_bp_reader_state(unsigned long *ctr)
  
  /*
   * Helper for _urcu_bp_read_lock().  The format of urcu_bp_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _urcu_bp_read_lock() nesting, and a lower-order bit that contains either zero
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _urcu_bp_read_lock() nesting, and a single high-order bit that contains either zero


I think it would be clearer to state:

Helper for _urcu_bp_read_lock().  The format of urcu_bp_gp.ctr (as well as
the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
urcu_bp_read_lock() nesting, and a single high-order URCU_BP_GP_CTR_PHASE bit
that contains either zero or one.  The smp_mb_slave() ensures that the accesses
in urcu_bp_read_lock() happen before the subsequent read-side critical section.

(likewise for similar comments in other files).

Can you submit an updated patch please ?

Thanks,

Mathieu




   * or URCU_BP_GP_CTR_PHASE.  The smp_mb_slave() ensures that the accesses in
   * _urcu_bp_read_lock() happen before the subsequent read-side critical section.
   */
diff --git a/include/urcu/static/urcu-mb.h b/include/urcu/static/urcu-mb.h
index b97e42a..218e2f3 100644
--- a/include/urcu/static/urcu-mb.h
+++ b/include/urcu/static/urcu-mb.h
@@ -63,8 +63,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_mb_reader);
  
  /*
   * Helper for _urcu_mb_read_lock().  The format of urcu_mb_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _urcu_mb_read_lock() nesting, and a lower-order bit that contains either zero
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _urcu_mb_read_lock() nesting, and a single high-order bit that contains either zero
   * or URCU_GP_CTR_PHASE.  The cmm_smp_mb() ensures that the accesses in
   * _urcu_mb_read_lock() happen before the subsequent read-side critical section.
   */
diff --git a/include/urcu/static/urcu-memb.h b/include/urcu/static/urcu-memb.h
index c8d102f..b923f73 100644
--- a/include/urcu/static/urcu-memb.h
+++ b/include/urcu/static/urcu-memb.h
@@ -86,8 +86,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_memb_reader);
  
  /*
   * Helper for _rcu_read_lock().  The format of urcu_memb_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _rcu_read_lock() nesting, and a lower-order bit that contains either zero
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _rcu_read_lock() nesting, and a single high-order bit that contains either zero
   * or URCU_GP_CTR_PHASE.  The smp_mb_slave() ensures that the accesses in
   * _rcu_read_lock() happen before the subsequent read-side critical section.
   */
diff --git a/include/urcu/static/urcu-signal.h b/include/urcu/static/urcu-signal.h
index c7577d3..00588b8 100644
--- a/include/urcu/static/urcu-signal.h
+++ b/include/urcu/static/urcu-signal.h
@@ -64,8 +64,8 @@ extern DECLARE_URCU_TLS(struct urcu_reader, urcu_signal_reader);
  
  /*
   * Helper for _rcu_read_lock().  The format of urcu_signal_gp.ctr (as well as
- * the per-thread rcu_reader.ctr) has the upper bits containing a count of
- * _rcu_read_lock() nesting, and a lower-order bit that contains either zero
+ * the per-thread rcu_reader.ctr) has the lower-order bits containing a count of
+ * _rcu_read_lock() nesting, and a single high-order bit that contains either zero
   * or URCU_GP_CTR_PHASE.  The cmm_barrier() ensures that the accesses in
   * _rcu_read_lock() happen before the subsequent read-side critical section.
   */




[lttng-dev] Tracing Summit - Last year's 2022 talk recordings are available online!

2023-06-09 Thread Mathieu Desnoyers via lttng-dev

Hello all,

The recordings for last year’s 2022 Tracing Summit talks were just
posted to the DiaMon Workgroup channel!

2022 Tracing Summit Talks:
https://www.youtube.com/playlist?list=PLuo4E47p5_7YbvyBpSHh-wO3KUVQ81BQR

If you did not get the chance to attend last year, we invite you to take
a look at the diverse tracing talks that included eBPF and Perfetto
developments as well as updates for the core Linux kernel tracers.


This year, we’re looking forward to hearing about your new tracing
developments and challenging use cases at the 2023 Tracing Summit! If
you’re interested in exchanging ideas with experts in state-of-the-art
tracing, we invite you to submit a talk proposal soon as the deadline is
coming up next week (June 16th).

You can submit your 2023 Tracing Summit talk abstract here:
https://cfp.tracingsummit.org/ts2023/cfp

Best regards,

Mathieu



The 2023 Tracing Summit will be held in Bilbao, Spain on September 17th
and 18th, at the Euskalduna Conference Centre, co-located with Open
Source Summit Europe.

To register, you can include the Tracing Summit as an add-on to your
Open Source Summit ticket or use these links to register solely for the
Tracing Summit: https://cvent.me/Gn0nkR (in-person, 80$),
https://cvent.me/xywylX (virtual).

For more info: https://tracingsummit.org/

The 2023 Tracing Summit is sponsored by EfficiOS and organized by Erica
Bugden (EfficiOS), Olivier Dion (EfficiOS), and Mathieu Desnoyers
(EfficiOS) on behalf of the Linux Foundation Diagnostic and Monitoring
Workgroup.



[lttng-dev] [RELEASE] LTTng UST 2.12.8/2.13.6 and LTTng modules 2.12.14/2.13.10 tracers

2023-06-07 Thread Mathieu Desnoyers via lttng-dev

Hi,

This is a stable release announcement for the LTTng UST and LTTng modules 
tracer projects.
Those contain mainly bug fixes and add support for recent distributions and 
upstream kernels.

What's new in both LTTng-UST 2.12.8 and 2.13.6:

- Fix: use unaligned pointer accesses for lttng_inline_memcpy

  lttng_inline_memcpy receives pointers which can be unaligned. This
  causes issues (traps) specifically on arm 32-bit with 8-byte strings
  (including \0).

- Fix: trace events in C constructors/destructors

  Adding a priority (150) to the tracepoint and tracepoint provider
  constructors/destructors ensures that tracepoints located within C
  constructors/destructors with a higher priority value (including the
  default init priority of 65535) are traced, even when the tracepoint,
  the tracepoint definition and the tracepoint probe provider are in
  different compile units linked in various orders.

- Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included

  Fix issues with missing symbols in use-cases where tracef.h is included
  before defining LTTNG_UST_TRACEPOINT_DEFINE

- Fix: segmentation fault on filter interpretation in "switch" mode

  Fix a bytecode interpreter crash when building with INTERPRETER_USE_SWITCH
  defined (used mainly for debugging purposes).
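The unaligned-access fix above reflects a generic C rule: dereferencing a pointer cast to a wider integer type (for example `*(const uint64_t *)p`) assumes `p` is suitably aligned, is undefined behavior otherwise, and can trap on some 32-bit ARM configurations. A minimal sketch of the portable alternative goes through `memcpy()`, which compilers typically lower to efficient loads and stores where the target allows it. The helper names are hypothetical; this is not the actual lttng-ust code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: access a 64-bit value at a possibly unaligned
 * address without dereferencing a misaligned uint64_t pointer. */
static uint64_t load_u64_unaligned(const void *src)
{
	uint64_t v;

	assert(src != NULL);
	memcpy(&v, src, sizeof(v));
	return v;
}

static void store_u64_unaligned(void *dst, uint64_t v)
{
	assert(dst != NULL);
	memcpy(dst, &v, sizeof(v));
}
```

For example, storing to and loading from `buf + 1` in a byte buffer round-trips the value without touching the neighboring bytes, regardless of the buffer's alignment.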


What's new specifically in LTTng-UST 2.13.6:

- Fix: `ip` context is expressed as a base-10 field

  The base for UST context field `ip` was changed from 16 (hexadecimal) to
  10 (decimal), most likely an unintentional copy error in 4e48b5d.

- Various fixes to build with -std=c99.

- Fix: trace events in C++ constructors/destructors

  Wrap constructor and destructor functions to invoke them as functions with
  the constructor/destructor GNU C attributes, which ensures that those
  constructors/destructors are ordered before/after C++
  constructors/destructors.
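The constructor-ordering fixes listed above rely on constructor priorities. With GCC/Clang's `constructor(priority)` attribute on ELF targets, a lower priority number runs earlier, and constructors declared without a priority (the default 65535 mentioned above) run after prioritized ones. The sketch below is hypothetical (`init_registry`, `app_constructor`, `emit_event` are illustrative names) and only models the idea of registering at priority 150 so that an application constructor at the default priority can already emit events; it is not the lttng-ust implementation.

```c
#include <assert.h>

/* Hypothetical model: a registry that must be ready before any
 * application constructor tries to "emit" an event. */
static int registry_ready;
static int events_recorded;
static int events_lost;

static void emit_event(void)
{
	if (registry_ready)
		events_recorded++;
	else
		events_lost++;
}

/* Priority 150: runs before default-priority constructors. */
__attribute__((constructor(150)))
static void init_registry(void)
{
	registry_ready = 1;
}

/* No priority given: runs after the prioritized constructor above. */
__attribute__((constructor))
static void app_constructor(void)
{
	emit_event();
}
```

If the registration constructor had no priority, the relative order of the two constructors would depend on compile-unit and link order, which is the situation the release notes describe.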


What's new in LTTng modules 2.12.14 and 2.13.10:

- fix: kallsyms wrapper on CONFIG_PPC64_ELF_ABI_V1

  Work around PPC64 ELF ABI v1 function descriptor issues when using kallsyms.

- Add support for RHEL 9.0 and 9.1.


What's new specifically in LTTng modules 2.12.14:

- Various tracepoint instrumentation fixes to support kernel v5.18.


What's new specifically in LTTng modules 2.13.10:

- Various tracepoint instrumentation fixes to support kernel v6.3.


Feedback is welcome!

Thanks,

Mathieu



Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-18 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-18 15:20, Brian Hutchinson wrote:

On Thu, May 18, 2023 at 3:07 PM Brian Hutchinson  wrote:


On Thu, May 18, 2023 at 3:03 PM Mathieu Desnoyers
 wrote:


On 2023-05-18 14:58, Brian Hutchinson wrote:

On Thu, May 18, 2023 at 11:00 AM Brian Hutchinson  wrote:


On Thu, May 18, 2023 at 10:45 AM Mathieu Desnoyers
 wrote:


On 2023-05-18 10:10, Brian Hutchinson wrote:
[...]

I updated my hello world to have a function I'd like to use the
--userspace-probe method on with the very original name of
'probe_function':

#include <stdio.h>
#include <lttng/tracef.h>

void probe_function(int i);

int main(int argc, char *argv[])
{
  int i;
  puts("Hello, World!\nPress Enter to continue...");
  /*
   * The following getchar() call only exists for the purpose of this
   * demonstration, to pause the application in order for you to have
   * time to list its tracepoints. You don't need it otherwise.
   */
  getchar();

  lttng_ust_tracef("Number %d, string %s", 23, "hi there!");
  printf("Number %d, string %s\n", 23, "hi there!");

  for (i = 0; i < argc; i++) {
    lttng_ust_tracef("Number %d, argv %s", i, argv[i]);
    printf("Number %d, argv %s\n", i, argv[i]);
  }

  puts("Quitting now!");

  probe_function(i);

  return 0;
}

void probe_function(int i) {

  lttng_ust_tracef("Number %d, string %s", i * i, "i^2");
  printf("Number %d, string %s\n", i * i, "i^2");

}

... and I get the same error as before when I try to enable the probe:
# lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function
Error: Missing event name(s).


As the error states, you are missing the event name. See

man 1 lttng-enable-event

  lttng [GENERAL OPTIONS] enable-event --kernel
[--probe=SOURCE | --function=SOURCE | --syscall |
 --userspace-probe=SOURCE]
[--filter=EXPR] [--session=SESSION]
[--channel=CHANNEL] EVENT[,EVENT]...


You will want something like:

lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function

Where "my_probe_function" is the event name that will appear in the collected traces.


Wow!  I must not have woken up this morning ha, ha.  Thanks for that!
The event is enabled now.  Hope to actually get tracing data now.


Well, I guess we just have the app that thwarts all attempts at tracing.

I did a dynamic probe on several functions that should be getting
called like crazy and again I get no tracing data.

Tried it with my hello world example above after Mathieu set me
straight on the event syntax and it works.

I saw this comment in the documentation "As of this version, only USDT
probes that are not surrounded by a reference counter (semaphore) are
supported."

I don't know that I can say that this function I'm probing isn't
"surrounded" by a reference counter, it's in a large multi-threaded
application so I guess it's possible.

Sigh, I'm striking out every which way.

No offense (since this is lttng list - please don't flame me ... I
want/need lttng), but I think I'm going to try just straight kprobes
and uprobes and see if trace compass can show those traces in an
attempt to get "something/anything" working.


If you attach to an ELF symbol (function), then there is no USDT in
play, so it should not be related to the issue you have.


That is what I was thinking which is why I wanted to try it.



But if your functions happen to be inlined, then there will be nothing
to attach to. Perhaps this is what happens there ?


I don't see any evidence of anything being inlined in this module.  I
grepped the code to verify.

Back to being stumped/stuck.


I can do trace-cmd stuff and it works.  The hello world above works so
I don't "think" this is a problem but again in full disclosure I'll
mention/ask about it.

Do any of the lttng tools/libs depend on kernel headers?  I ask
because old Yocto (Dunfell) built the lttng package against a 4.x
kernel, and we're running a 5.10.69 kernel to which lttng-modules were
added with the "builtin" script and built that way.

I should probably have Yocto build the local kernel too, but the kernel is
being built standalone due to vendor changes that haven't been mainlined
yet.

I'm running out of things to think about that could be the issue.


If lttng-modules can trace your smaller test application through 
uprobes, then the problem is likely elsewhere.


Only lttng-modules has dependencies on kernel headers. lttng-tools/ust 
don't depend on kernel headers.


Mathieu




Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-18 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-18 15:07, Brian Hutchinson wrote:

[...]



If you attach to an ELF symbol (function), then there is no USDT in
play, so it should not be related to the issue you have.


That is what I was thinking which is why I wanted to try it.



But if your functions happen to be inlined, then there will be nothing
to attach to. Perhaps this is what happens there ?


I don't see any evidence of anything being inlined in this module.  I
grepped the code to verify.

Back to being stumped/stuck.


Make sure to check the resulting assembler and ELF symbol tables.

The compiler is free to inline various functions unless they are 
explicitly marked as __attribute__((noinline)). Also, if LTO is enabled, 
further optimization can be done at link-time.


One purpose of the UST tracepoints is to be less fragile with respect to 
specific optimizations done by the compiler and linker, thus 
guaranteeing that whatever is instrumented with a tracepoint is indeed 
available for tracing.


Also, double-check that the path you pass to --userspace-probe really 
targets your executable or .so binary file, and is not just a symbolic link.
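One way to check this from C is to combine `lstat()`, which does not follow symbolic links, with `realpath()`, which resolves them (both POSIX). The helper below is a hypothetical sketch: it reports whether the given path is itself a symlink and writes the canonical path that could then be passed to `--userspace-probe`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: resolve `path` to a canonical absolute path in
 * `resolved` (a buffer of at least PATH_MAX bytes) and report whether
 * `path` itself is a symbolic link. Returns 0 on success, -1 on error
 * (errno is set by lstat() or realpath()).
 */
static int resolve_probe_target(const char *path, char *resolved,
				int *is_symlink)
{
	struct stat st;

	assert(path && resolved && is_symlink);
	if (lstat(path, &st) != 0)
		return -1;
	*is_symlink = S_ISLNK(st.st_mode) ? 1 : 0;
	if (realpath(path, resolved) == NULL)
		return -1;
	return 0;
}
```

If `is_symlink` comes back set, the resolved path (not the link) is the one to use in the `--userspace-probe=PATH:SYMBOL` argument.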


Thanks,

Mathieu



Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-18 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-18 14:58, Brian Hutchinson wrote:

On Thu, May 18, 2023 at 11:00 AM Brian Hutchinson  wrote:


On Thu, May 18, 2023 at 10:45 AM Mathieu Desnoyers
 wrote:


On 2023-05-18 10:10, Brian Hutchinson wrote:
[...]

I updated my hello world to have a function I'd like to use the
--userspace-probe method on with the very original name of
'probe_function':

#include <stdio.h>
#include <lttng/tracef.h>

void probe_function(int i);

int main(int argc, char *argv[])
{
  int i;
  puts("Hello, World!\nPress Enter to continue...");
  /*
   * The following getchar() call only exists for the purpose of this
   * demonstration, to pause the application in order for you to have
   * time to list its tracepoints. You don't need it otherwise.
   */
  getchar();

  lttng_ust_tracef("Number %d, string %s", 23, "hi there!");
  printf("Number %d, string %s\n", 23, "hi there!");

  for (i = 0; i < argc; i++) {
    lttng_ust_tracef("Number %d, argv %s", i, argv[i]);
    printf("Number %d, argv %s\n", i, argv[i]);
  }

  puts("Quitting now!");

  probe_function(i);

  return 0;
}

void probe_function(int i) {

  lttng_ust_tracef("Number %d, string %s", i * i, "i^2");
  printf("Number %d, string %s\n", i * i, "i^2");

}

... and I get the same error as before when I try to enable the probe:
# lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function
Error: Missing event name(s).


As the error states, you are missing the event name. See

man 1 lttng-enable-event

 lttng [GENERAL OPTIONS] enable-event --kernel
   [--probe=SOURCE | --function=SOURCE | --syscall |
--userspace-probe=SOURCE]
   [--filter=EXPR] [--session=SESSION]
   [--channel=CHANNEL] EVENT[,EVENT]...


You will want something like:

lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function

Where "my_probe_function" is the event name that will appear in the collected traces.


Wow!  I must not have woken up this morning ha, ha.  Thanks for that!
The event is enabled now.  Hope to actually get tracing data now.


Well, I guess we just have the app that thwarts all attempts at tracing.

I did a dynamic probe on several functions that should be getting
called like crazy and again I get no tracing data.

Tried it with my hello world example above after Mathieu set me
straight on the event syntax and it works.

I saw this comment in the documentation "As of this version, only USDT
probes that are not surrounded by a reference counter (semaphore) are
supported."

I don't know that I can say that this function I'm probing isn't
"surrounded" by a reference counter, it's in a large multi-threaded
application so I guess it's possible.

Sigh, I'm striking out every which way.

No offense (since this is lttng list - please don't flame me ... I
want/need lttng), but I think I'm going to try just straight kprobes
and uprobes and see if trace compass can show those traces in an
attempt to get "something/anything" working.


If you attach to an ELF symbol (function), then there is no USDT in 
play, so it should not be related to the issue you have.


But if your functions happen to be inlined, then there will be nothing 
to attach to. Perhaps this is what happens there ?


Mathieu



Regards,

Brian




Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-18 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-18 10:10, Brian Hutchinson wrote:
[...]

I updated my hello world to have a function I'd like to use the
--userspace-probe method on with the very original name of
'probe_function':

#include <stdio.h>
#include <lttng/tracef.h>

void probe_function(int i);

int main(int argc, char *argv[])
{
  int i;
  puts("Hello, World!\nPress Enter to continue...");
  /*
   * The following getchar() call only exists for the purpose of this
   * demonstration, to pause the application in order for you to have
   * time to list its tracepoints. You don't need it otherwise.
   */
  getchar();

  lttng_ust_tracef("Number %d, string %s", 23, "hi there!");
  printf("Number %d, string %s\n", 23, "hi there!");

  for (i = 0; i < argc; i++) {
    lttng_ust_tracef("Number %d, argv %s", i, argv[i]);
    printf("Number %d, argv %s\n", i, argv[i]);
  }

  puts("Quitting now!");

  probe_function(i);

  return 0;
}

void probe_function(int i) {

  lttng_ust_tracef("Number %d, string %s", i * i, "i^2");
  printf("Number %d, string %s\n", i * i, "i^2");

}

... and I get the same error as before when I try to enable the probe:
# lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function
Error: Missing event name(s).


As the error states, you are missing the event name. See

man 1 lttng-enable-event

   lttng [GENERAL OPTIONS] enable-event --kernel
 [--probe=SOURCE | --function=SOURCE | --syscall |
  --userspace-probe=SOURCE]
 [--filter=EXPR] [--session=SESSION]
 [--channel=CHANNEL] EVENT[,EVENT]...


You will want something like:

lttng enable-event --kernel --userspace-probe=/usr/local/bin/hello:probe_function my_probe_function

Where "my_probe_function" is the event name that will appear in the collected traces.

Thanks,

Mathieu



Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-17 12:37, Brian Hutchinson wrote:

On Wed, May 17, 2023 at 12:08 PM Mathieu Desnoyers
 wrote:


On 2023-05-16 22:11, Brian Hutchinson via lttng-dev wrote:

Hi,

I'm trying to figure out how to use uprobes with lttng.

I can't use a normal uprobe for a line number just using the address I
want to probe obtained from objdump?  As in:

echo 'p /usr/local/bin/my_app:0x2c3a8' >> /sys/kernel/debug/tracing/uprobe_events

... which isn't a function entry, it's just a line of code I want to probe on.

This link says it has to be elf or sdt:
https://lttng.org/man/1/lttng-enable-event/v2.11/#doc-opt--userspace-probe

So can I not probe on just a line of code by specifying an address???

It doesn't look like these methods above will do what I'm wanting to
do.  I've tried to find examples of using enable-event --kernel
--userspace-probe= but there doesn't appear to be many.



There are examples here:

https://lttng.org/docs/v2.13/#doc-enabling-disabling-events

Indeed, inserting an lttng-modules uprobe within functions is not
supported at the moment, mainly because we prefer to err on the side of
safety and don't have the validation in place to prevent corrupting the
program's instructions if an end user tried to insert a uprobe at an
address which is not an instruction boundary.
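To make "instruction boundary" concrete: a breakpoint address is only safe if it is the first byte of an instruction. Below is a minimal sketch of that kind of check, assuming the instruction start offsets have already been obtained from a disassembler. The offsets and function names are hypothetical, and this is not how lttng-modules would implement the validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical start offsets of consecutive instructions, as a disassembler
 * (e.g. objdump -d) would list them on a variable-length ISA like x86-64.
 * The values are made up for illustration. */
static const uint64_t example_insn_starts[] = {
    0x2c3a0, 0x2c3a4, 0x2c3a8, 0x2c3ad, 0x2c3b1,
};
#define N_EXAMPLE_INSNS (sizeof(example_insn_starts) / sizeof(example_insn_starts[0]))

/* True only if addr is the first byte of a decoded instruction.
 * starts[] must be sorted in increasing order (binary search). */
static bool is_insn_boundary(const uint64_t *starts, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(starts != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (starts[mid] == addr)
            return true;
        if (starts[mid] < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

/* On a fixed-width ISA such as AArch64 (4-byte instructions), alignment is a
 * necessary condition but not a sufficient one: data embedded in the text
 * section can also sit at 4-byte-aligned addresses. */
static bool aarch64_insn_aligned(uint64_t addr)
{
    return (addr & 3) == 0;
}
```

On a variable-length ISA, an address one byte into an instruction fails the boundary test even though nothing else about it looks wrong, which is exactly the case a kernel-side validation would have to reject.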


Hmm, was really hoping to be able to do dynamic tracing without having
to modify code.


Uprobes with proper validation of instruction boundaries would
eventually provide this. Another approach we want to invest time in is
to integrate libpatch from Olivier Dion into lttng-ust. This would
provide dynamic instrumentation with the performance of a purely
userspace tracer.


But those are all things that were never prioritized by any of our 
customers, so they progress at a "back burner" pace.




I guess if I add a function call to a debug statement or something at
the point I want to probe then I could use the elf example.


Yes.
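A minimal sketch of that approach, assuming GCC or Clang (my_probe_marker and scale_sample are hypothetical names): an empty, non-inlined marker function called at the line of interest gives the ELF userspace probe a function entry to attach to, e.g. with lttng enable-event --kernel --userspace-probe=elf:/usr/local/bin/my_app:my_probe_marker my_marker_event.

```c
/* Hypothetical marker function. Its only purpose is to give the line of
 * interest an ELF symbol that a userspace probe can target.
 * - noinline keeps a real call to it in the caller.
 * - The empty volatile asm gives the body a side effect, so the compiler
 *   cannot delete calls to this otherwise-empty function. */
__attribute__((noinline)) void my_probe_marker(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Hypothetical application code with the marker at the interesting line. */
static long scale_sample(long x)
{
    long y = x * 3 + 1;

    my_probe_marker(); /* <- the "line of code" to be probed */
    return y;
}
```

The probe presumably needs my_probe_marker to remain in the binary's symbol table, so avoid stripping it; that is an assumption worth checking against the lttng-enable-event(1) man page for your version.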





So we only support inserting uprobe on functions and SDT probes at
the moment.


I've heard of SystemTap but never used it. Will have to look into that.

I really want to get lttng-ust working but I'm getting pushback on the
time I'm spending trying to get it to work ... and would really like
to demonstrate something (was hoping kernel events and uprobes)
quickly to an audience that knows nothing about lttng or full stack
tracing to gain "buy in" for the effort.


Understood.

The main thing we are missing to help you on the UST front is a console 
log of the _application_ with LTTNG_UST_DEBUG=1. I suspect it is not 
collected in your tests.


Thanks,

Mathieu




You know, those pesky things called schedules.

Thanks,

Brian



Thanks,

Mathieu



Thanks,

Brian


Re: [lttng-dev] Trying to understand use of lttng enable-event --kernel --userspace-probe=

2023-05-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-16 22:11, Brian Hutchinson via lttng-dev wrote:

Hi,

I'm trying to figure out how to use uprobes with lttng.

I can't use a normal uprobe for a line number just using the address I
want to probe obtained from objdump?  As in:

echo 'p /usr/local/bin/my_app:0x2c3a8' >>
/sys/kernel/debug/tracing/uprobe_events

... which isn't a function entry, it's just a line of code I want to probe on.

This link says it has to be elf or sdt:
https://lttng.org/man/1/lttng-enable-event/v2.11/#doc-opt--userspace-probe

So can I not probe on just a line of code by specifying an address???

It doesn't look like these methods above will do what I'm wanting to
do.  I've tried to find examples of using enable-event --kernel
--userspace-probe= but there don't appear to be many.



There are examples here:

https://lttng.org/docs/v2.13/#doc-enabling-disabling-events

Indeed, inserting an lttng-modules uprobe within functions is not
supported at the moment, mainly because we prefer to err on the side of
safety and don't have the validation in place to prevent corrupting the
program's instructions if an end user tried to insert a uprobe at an
address which is not an instruction boundary.


So we only support inserting uprobe on functions and SDT probes at
the moment.

Thanks,

Mathieu



Thanks,

Brian


[lttng-dev] Tracing Summit 2023 Announcement and CFP

2023-05-15 Thread Mathieu Desnoyers via lttng-dev

Hello all!

This is a Call for Proposals for the Tracing Summit 2023 [0], which will be held in
Bilbao, Spain on the 17th and 18th of September, 2023. This year, the event is
co-located with Open Source Summit Europe 2023 [1].

- Event dates: Sunday, September 17th - Monday, September 18th
- Location: Bilbao, Spain and virtually (co-located with Open Source Summit Europe)
- Registration cost:
  - In person: $80.00 USD (free for speakers)
  - Virtual: free
- Call for proposals link: [2]

Important dates:
- Call for proposals close: Friday, June 16th, at 11:59PM EDT
- Call for proposals notifications: Friday, June 23rd
- Schedule announcement: Tuesday, June 27th
- Event dates: Sunday, September 17th - Monday, September 18th

Stand-alone registration is expected to open next week. In the meantime, you can
subscribe to the mailing list to get the latest information on the event: [3]

The 2023 Tracing Summit is a two-day, single-track conference on the topic of tracing.
The event focuses on software and hardware tracing, gathering developers and end-users
of tracing and trace analysis tools. The main goal of the Tracing Summit is to provide
space for discussion between people of the various areas that benefit from tracing,
namely parallel, distributed and/or real-time systems, as well as kernel development.

We welcome 30-minute presentations from both end users and developers, on topics
covering, but not limited to:

- Investigation workflow of real-time, latency, and throughput issues,
- Trace collection and extraction,
- Trace filtering,
- Trace aggregation,
- Trace formats,
- Tracing multi-core systems,
- Trace abstraction,
- Trace modeling,
- Automated trace analysis (e.g. dependency analysis),
- Tracing large clusters and distributed systems,
- Hardware-level tracing (e.g. DSP, GPU, bare-metal),
- Trace visualization,
- Interaction between debugging and tracing,
- Tracing remote control,
- Analysis of large trace datasets,
- Cloud trace collection and analysis,
- Integration between trace tools,
- Live tracing & monitoring,
- Dynamic instrumentation,
- Programmable tracing (e.g. eBPF).

Talks can cover recently available technologies, ongoing work, and not-yet-existing
technologies (that are compelling to end-users). Talks covering interesting or
challenging tracing use cases are also welcome, as they can reveal future directions
or tooling needs.

Please understand that this open forum is not the proper place to present sales or
marketing pitches, nor technologies which are prevented from being freely used in
open source.

Please send any questions about this conference to .

This event is organized by EfficiOS on behalf of the Linux Foundation Diagnostic and
Monitoring Workgroup [4].

The organizers of this event are Mathieu Desnoyers (EfficiOS), Erica Bugden (EfficiOS)
and Olivier Dion (EfficiOS).

[0]: https://tracingsummit.org
[1]: https://events.linuxfoundation.org/open-source-summit-europe/
[2]: https://cfp.tracingsummit.org/ts2023/cfp
[3]: https://eepurl.com/goakfv
[4]: https://diamon.org/



Re: [lttng-dev] I'm still getting empty ust traces using tracef

2023-05-12 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-12 10:52, Brian Hutchinson wrote:

Hi Mathieu,

On Fri, May 12, 2023 at 9:33 AM Mathieu Desnoyers
 wrote:


On 2023-05-12 00:10, Brian Hutchinson wrote:

Hmm, I missed this earlier somehow.

So, I'm not the greatest at updating OE and Yocto recipes.  I'm
currently using this recipe:
http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-kernel/lttng/lttng-ust_2.13.5.bb?h=master

... and it looks like the commit you are talking about is newer.

I always think, oh, I'll just update the source URI in the recipe but
it's never that simple ... and there are patches in the recipe etc.

I've got an SDK (external toolchain) built for my embedded platform.
Would it be too hard to just download stable-2.13 of everything and
cross compile it outside of Yocto?

What do you suggest?

And do I need to do anything besides just getting 2.13 stable working? I
was kind of confused about whether I need to put a #define
LTTNG_TRACEPOINT_DEFINE somewhere in my code. I'm not using
tracepoint provider packages at this point.


Hi Brian,

You might want to provide a trimmed-down reproducer of your issue:
example .c compile unit instrumented with tracepoints, example .c
compile unit containing the tracepoint probes, and the log of the
console when this application is run with LTTNG_UST_DEBUG=1.
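As a starting point, a trimmed-down compile unit (repro.c, a hypothetical file name) might look like the sketch below. It assumes LTTng-UST 2.13, where <lttng/tracef.h> provides tracef(); process_item() is a hypothetical function, and the stderr fallback exists only so the file also builds where LTTng-UST is not installed. tracef() alone is documented as needing only that header plus linking with liblttng-ust; LTTNG_UST_TRACEPOINT_DEFINE comes into play once custom tracepoint providers are added.

```c
#include <stdio.h>

/* Use the real tracef() when LTTng-UST's header is available
 * (build with: cc repro.c -llttng-ust -ldl). Otherwise fall back to
 * stderr so the sketch still compiles without LTTng-UST. */
#if defined(__has_include)
#  if __has_include(<lttng/tracef.h>)
#    include <lttng/tracef.h>
#    define REPRO_HAVE_LTTNG_UST 1
#  endif
#endif

#ifndef REPRO_HAVE_LTTNG_UST
#  define tracef(...) (fprintf(stderr, __VA_ARGS__), fputc('\n', stderr))
#endif

/* Hypothetical instrumented function: with LTTng-UST, each call emits one
 * lttng_ust_tracef:event event. */
int process_item(int id)
{
    tracef("process_item: id=%d", id);
    return id * 2;
}
```

To turn it into a reproducer, add a main() that calls process_item() a few times, then run it from a terminal (not through systemctl) with LTTNG_UST_DEBUG=1 ./repro while a session with lttng enable-event --userspace 'lttng_ust_tracef:*' is started, so the application's own debug output reaches the console.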


The code has two different areas where I'm trying to use tracef. The
way the app is put together, each of these areas ends up becoming a
static lib, and they all get lumped together to make the final executable
(which is then linked with -llttng-ust and -ldl).

If I'm reading between the lines correctly with respect to the commit
you pointed out (that I'm missing), if I reduce the inclusion of
#include  to one instance (like with the hello world
that worked), I'm thinking the version I have might work.

I don't know how I could trim down the large multi threaded app I'm
trying to debug to share.

Another dynamic I should mention in full disclosure: the app in
question has been ported from a different OS and was on a single-core
CPU. The new host (imx8) is a quad-core A53, and since the app wasn't
written for multicore, the CPUs are isolated and systemd starts the
app on CPU 0, but once it's up it switches its affinity to CPU 1.
I don't know if that's a factor here or not, so I'm just mentioning it.

I did try with LTTNG_UST_DEBUG=1 last night and it didn't put out much:

export LTTNG_UST_DEBUG=1
# systemctl start my_app


I suspect that because you run your application under systemctl, we are 
not seeing the console output from the application.


The console output below appears to come from liblttng-ust-ctl.so linked 
within lttng-sessiond/consumerd, not the application.


Can you find a way to run your application and capture the console output?

Thanks,

Mathieu




#lttng create my_tc_trace --output=/tmp/my_tc_trace
Spawning a session daemon
libringbuffer-clients[711/711]
: LTT : ltt ring buffer client
"relay-metadata-mmap" init
(in lttng_ring_buffer_metadata_client_init() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/metadata-template.h:364)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-overwrite-mmap" init
(in lttng_ring_buffer_client_overwrite_init() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-overwrite-rt-mmap" init
(in lttng_ring_buffer_client_overwrite_rt_init() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-discard-mmap" init
(in lttng_ring_buffer_client_discard_init() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-discard-rt-mmap" init
(in lttng_ring_buffer_client_discard_rt_init() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:826)
[  179.384456] LTTng: Loaded modules v2.13.9 (Nordicité)
[  179.390366] LTTng: Experimental bitwise enum enabled.
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-discard-rt-mmap" exit
(in lttng_ring_buffer_client_discard_rt_exit() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-discard-mmap" exit
(in lttng_ring_buffer_client_discard_exit() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-overwrite-rt-mmap" exit
(in lttng_ring_buffer_client_overwrite_rt_exit() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-overwrite-mmap" exit
(in lttng_ring_buffer_client_overwrite_exit() at
../../../lttng-ust-2.13.5/src/common/ringbuffer-clients/template.h:833)
libringbuffer-clients[711/711]: LTT : ltt ring buffer client
"relay-metadata-mmap" exit
(in 

Re: [lttng-dev] I'm still getting empty ust traces using tracef

2023-05-12 Thread Mathieu Desnoyers via lttng-dev

[adding back the mailing list]

On 2023-05-12 09:33, Mathieu Desnoyers wrote:

On 2023-05-12 00:10, Brian Hutchinson wrote:

Hmm, I missed this earlier somehow.

So, I'm not the greatest at updating OE and Yocto recipes.  I'm
currently using this recipe:
http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-kernel/lttng/lttng-ust_2.13.5.bb?h=master

... and it looks like the commit you are talking about is newer.

I always think, oh, I'll just update the source URI in the recipe but
it's never that simple ... and there are patches in the recipe etc.

I've got an SDK (external toolchain) built for my embedded platform.
Would it be too hard to just download stable-2.13 of everything and
cross compile it outside of Yocto?

What do you suggest?

And do I need to do anything besides just getting 2.13 stable working? I
was kind of confused about whether I need to put a #define
LTTNG_TRACEPOINT_DEFINE somewhere in my code. I'm not using
tracepoint provider packages at this point.


Hi Brian,

You might want to provide a trimmed-down reproducer of your issue: 
example .c compile unit instrumented with tracepoints, example .c 
compile unit containing the tracepoint probes, and the log of the 
console when this application is run with LTTNG_UST_DEBUG=1.






Thanks,

Mathieu





Re: [lttng-dev] I'm still getting empty ust traces using tracef

2023-05-11 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-11 14:13, Mathieu Desnoyers via lttng-dev wrote:

On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote:

... more background.  I've always used ltt in the kernel so I don't
have much experience with the user side of it and especially
multi-threaded, multi-core so I'm probably missing some fundamental
concepts that I need to understand.


What are the exact versions of LTTng-UST and LTTng-Tools you are using
now? (2.13.N, or which git commit?)




Also, can you try using the lttng-ust stable-2.13 branch, which includes the
following commit?

commit be2ca8b563bab81be15cbce7b9f52422369f79f7
Author: Mathieu Desnoyers 
Date:   Tue Feb 21 14:29:49 2023 -0500

Fix: Reevaluate LTTNG_UST_TRACEPOINT_DEFINE each time tracepoint.h is included

Fix issues with missing symbols in use-cases where tracef.h is included
before defining LTTNG_UST_TRACEPOINT_DEFINE, e.g.:

 #include 

 #define LTTNG_UST_TRACEPOINT_DEFINE
 #include 

It is caused by the fact that tracef.h includes tracepoint.h in a
context which has LTTNG_UST_TRACEPOINT_DEFINE undefined, and this is not
re-evaluated for the following includes.

Fix this by lifting the definition code in tracepoint.h outside of the
header include guards, and #undef the old LTTNG_UST__DEFINE_TRACEPOINT
before re-defining it to its new semantic. Use a new
_LTTNG_UST_TRACEPOINT_DEFINE_ONCE include guard within the
LTTNG_UST_TRACEPOINT_DEFINE defined case to ensure symbols are not
duplicated.

Signed-off-by: Mathieu Desnoyers 

Change-Id: I0ef720435003a7ca0bfcf29d7bf27866c5ff8678

Thanks,

Mathieu



Thanks,

Mathieu



Regards,

Brian

On Thu, May 11, 2023 at 11:53 AM Brian Hutchinson 
 wrote:


Hi,

I posted a while ago (thread - Using lttng 2.11 and UST doesn't appear
to work - getting empty trace files) about this problem I'm having
with getting empty trace logs.

I've since upgraded to lttng v2.13 and while I can do a simple hello
world program with tracef and get events in the log files, my more
complicated large multi-threaded app I'm trying to debug is still
getting empty log file traces.

I can list the user space events in my app.

Next I do:

lttng enable-event --userspace 'lttng_ust_tracef:*'

... to enable the events, start lttng, start my app, and I get a
trace directory structure that's empty.

I feel like I've read every thread in the archives about people having
the same problem.

I did try using LD_PRELOAD with various libs thinking that was the
problem but so far I'm still getting empty traces.

So far I've tried:

LD_PRELOAD=liblttng-ust-libc-wrapper.so.1:liblttng-ust-pthread-wrapper.so.1:liblttng-ust-dl.so.1:liblttng-ust-fork.so.1:liblttng-ust-fd.so.1
/usr/local/bin/my_app

I guess one question I have is how do I determine which "helper libs"
I need to preload?

The application I'm working on is made up of a bunch of smaller static
libs linked together into one big executable and that is linked with
-llttng-ust and -ldl.

I'm pretty stuck at the moment.  Anyone have any wisdom on what I
might be doing wrong or how I can tell why I'm not getting events in
the logs?

Thanks,

Brian



Re: [lttng-dev] I'm still getting empty ust traces using tracef

2023-05-11 Thread Mathieu Desnoyers via lttng-dev

On 2023-05-11 12:36, Brian Hutchinson via lttng-dev wrote:

... more background.  I've always used ltt in the kernel so I don't
have much experience with the user side of it and especially
multi-threaded, multi-core so I'm probably missing some fundamental
concepts that I need to understand.


What are the exact versions of LTTng-UST and LTTng-Tools you are using
now? (2.13.N, or which git commit?)


Thanks,

Mathieu



Regards,

Brian

On Thu, May 11, 2023 at 11:53 AM Brian Hutchinson  wrote:


Hi,

I posted a while ago (thread - Using lttng 2.11 and UST doesn't appear
to work - getting empty trace files) about this problem I'm having
with getting empty trace logs.

I've since upgraded to lttng v2.13 and while I can do a simple hello
world program with tracef and get events in the log files, my more
complicated large multi-threaded app I'm trying to debug is still
getting empty log file traces.

I can list the user space events in my app.

Next I do:

lttng enable-event --userspace 'lttng_ust_tracef:*'

... to enable the events, start lttng, start my app, and I get a
trace directory structure that's empty.

I feel like I've read every thread in the archives about people having
the same problem.

I did try using LD_PRELOAD with various libs thinking that was the
problem but so far I'm still getting empty traces.

So far I've tried:

LD_PRELOAD=liblttng-ust-libc-wrapper.so.1:liblttng-ust-pthread-wrapper.so.1:liblttng-ust-dl.so.1:liblttng-ust-fork.so.1:liblttng-ust-fd.so.1
/usr/local/bin/my_app

I guess one question I have is how do I determine which "helper libs"
I need to preload?

The application I'm working on is made up of a bunch of smaller static
libs linked together into one big executable and that is linked with
-llttng-ust and -ldl.

I'm pretty stuck at the moment.  Anyone have any wisdom on what I
might be doing wrong or how I can tell why I'm not getting events in
the logs?

Thanks,

Brian



Re: [lttng-dev] https://lists.lttng.org/pipermail/lttng-dev/2020-May/029631.html

2023-03-27 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-26 11:00, yashvardhan kukreti wrote:


Hi Mathew,

I have a question about this patch for lttng-modules and the use of
register_kprobe() to fetch the function pointer, especially from a
PPC64 ELF_ABI_v1 perspective.

Functions on PPC64 are accessed via a function descriptor,
while what register_kprobe() returns is the entry point of the function.
Hence the returned pointer is interpreted as the address of a function
descriptor: the instruction found there (ppc_inst) is dereferenced as if
it were the function entry point, and the kernel crashes:

[ 4145.483594] kernel tried to execute exec-protected page (7c0802a6fb81ffe0) - exploit attempt? (uid: 0)

Here, 7c0802a6 is the mfspr instruction from the text section of
kallsyms_lookup_name().

Note that for PPC_ELF_ABI_v1, register_kprobe() searches for the dot
variant of the symbol first, and only looks for the normal symbol if it
cannot find the dot variant.
register_kprobe() -> kprobe_addr() -> kprobe_lookup_name() [arch
variant replaces weak symbol]
https://elixir.bootlin.com/linux/v5.10.174/C/ident/kprobe_lookup_name 
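A sketch of that lookup order over a hypothetical in-memory symbol table (made-up addresses, not the kernel's kprobe_lookup_name()): because the dot symbol wins, the address obtained is the function's first instruction rather than its descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct sym {
    const char *name;
    uint64_t addr;
};

/* Hypothetical ELFv1 kernel symbols (addresses are made up): the plain name
 * labels the function descriptor, the dot name labels the first instruction. */
static const struct sym example_syms[] = {
    { "kallsyms_lookup_name",  0xc000000001a0b5c8ULL },
    { ".kallsyms_lookup_name", 0xc0000000001f2a40ULL },
};
#define N_EXAMPLE_SYMS (sizeof(example_syms) / sizeof(example_syms[0]))

/* Exact-name lookup; returns 0 when the symbol is absent. */
static uint64_t lookup_exact(const struct sym *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return 0;
}

/* Try ".name" first, then fall back to "name". Returns 0 if neither exists. */
static uint64_t lookup_prefer_dot(const struct sym *tab, size_t n, const char *name)
{
    char dot_name[128];
    int len;
    uint64_t addr = 0;

    assert(name != NULL);
    len = snprintf(dot_name, sizeof(dot_name), ".%s", name);
    if (len > 0 && (size_t)len < sizeof(dot_name))
        addr = lookup_exact(tab, n, dot_name);
    return addr ? addr : lookup_exact(tab, n, name);
}
```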


Please let me know if this makes sense or if I have missed something.

I have looked at the code of both the 2.12.8 and 2.12.3 versions of
lttng-modules.


Please have a look at commits (from stable-2.12 branch of lttng-modules):

commit 53772db24facd84f1f3ddcf21a1ef5f162608721
Author: He Zhe 
Date:   Tue Sep 27 15:59:42 2022 +0800

wrapper: powerpc64: fix kernel crash caused by do_get_kallsyms

commit 8fe888d86ccad4226b05a536efb73d71bb091062
Author: Michael Jeanson 
Date:   Thu Nov 24 14:25:33 2022 -0500

fix: kallsyms wrapper on ppc64el

I suspect you'll also need this change currently in review:

https://review.lttng.org/c/lttng-modules/+/9113

Please let us know whether this last change in particular fixes things on your side.

Thanks,

Mathieu




Regards,
Shashank





Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up

2023-03-22 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-22 07:01, Ondřej Surý via lttng-dev wrote:

On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev  
wrote:

That's pretty weird, because the "Write" happens on a stack-local variable,
while the "Previous write" happens after the futex, which led me to conclude that
ThreadSanitizer doesn't intercept futex, but we can annotate the futexes:

https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4


FTR, neither annotating the futex with __tsan_acquire(addr) and __tsan_release(addr)
nor falling back to compat_futex_async() for ThreadSanitizer has helped.

It seems to me that TSAN still doesn't understand the synchronization between
RCU read-side critical sections and call_rcu()/synchronize_rcu(), as I am also getting
the following reports:

   Write of size 8 at 0x7b5409c0 by thread T102:
 #0 __tsan_memset  (badcache_test+0x49257d) (BuildId: 
a7c1595d61e3ee411276cf89a536a8daefa959a3)
 #1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 
(libisc-9.19.12-dev.so+0x7d136) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 
(libisc-9.19.12-dev.so+0x7e0c3) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #3 bcentry_destroy_rcu 
/home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 
(libdns-9.19.12-dev.so+0x4e071) (BuildId: 
8a550b795003cd1075ff29590734c806d84e76e6)
 #4 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 
(liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)

   Previous atomic write of size 8 at 0x7b5409c0 by main thread (mutexes: 
write M0):
 #0 ___cds_wfcq_append 
/home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 
(liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
 #1 _cds_wfcq_enqueue 
/home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 
(liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
 #2 _call_rcu 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 
(liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
 #3 urcu_mb_barrier 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 
(liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
 #4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 
(libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 
8a550b795003cd1075ff29590734c806d84e76e6)
 [...]

E.g. ThreadSanitizer reports a race between the place where bcentry->rcu_head is added
to the call_rcu() queue and the point where call_rcu callbacks are called. Annotating the
bcentry with acquire/release here helps with this particular data race, but it does not
feel right to me to add annotations at this level.

The code is not very complicated there:

static void
bcentry_destroy_rcu(struct rcu_head *rcu_head) {
 dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
   rcu_head);
 /* __tsan_release(bad); <-- this helps */
 dns_badcache_t *bc = bad->bc;

 isc_mem_put(bc->mctx, bad, sizeof(*bad));

 dns_badcache_detach(&bc);
}

static void
bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
 /* There can be multiple deleters now */
 if (cds_lfht_del(ht, &bad->ht_node) == 0) {
 /* __tsan_acquire(bad); <- this helps */
 call_rcu(&bad->rcu_head, bcentry_destroy_rcu);
 }
}

static void
badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
 struct cds_lfht *oldht = rcu_xchg_pointer(&bc->ht, ht);

 synchronize_rcu();

 rcu_read_lock();
 dns_bcentry_t *bad = NULL;
 struct cds_lfht_iter iter;
 cds_lfht_for_each_entry (oldht, &iter, bad, ht_node) {
 bcentry_evict(oldht, bad);
 }
 rcu_read_unlock();
 rcu_barrier();
 RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
}

Any ideas?


I suspect what happens here is that TSAN is considering the

/*
 * Implicit memory barrier after uatomic_xchg() orders store to
 * q->tail before store to old_tail->next.
 *
 * At this point, dequeuers see a NULL tail->p->next, which
 * indicates that the queue is being appended to. The following
 * store will append "node" to the queue from a dequeuer
 * perspective.
 */
CMM_STORE_SHARED(old_tail->next, new_head);

within ___cds_wfcq_append() as racy.

This pairs with:

/*
 * Waiting for enqueuer to complete enqueue and return the next node.
 */
static inline struct cds_wfcq_node *
___cds_wfcq_node_sync_next(struct cds_wfcq_node *node, int blocking)
{
struct cds_wfcq_node *next;
int attempt = 0;

/*
 * Adaptative busy-looping waiting for enqueuer to complete enqueue.
 */
while ((next = 

Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up

2023-03-22 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-22 04:02, Ondřej Surý via lttng-dev wrote:

Hi,

this happens with all the patches fully applied and doesn't seem to be caused by anything
I am doing :)

WARNING: ThreadSanitizer: data race (pid=3995296)
   Write of size 8 at 0x7fb51e5fd048 by thread T296:
 #0 __tsan_memset  (badcache_test+0x49257d) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
 #1 urcu_mb_synchronize_rcu 
/home/ondrej/Projects/userspace-rcu/src/urcu.c:412:2 (liburcu-mb.so.8+0x35e0) 
(BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #2 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 
(liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

   Previous atomic write of size 4 at 0x7fb51e5fd048 by thread T295:
 #0 urcu_adaptative_wake_up 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:138:2 
(liburcu-mb.so.8+0x8cb9) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #1 urcu_wake_all_waiters 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:214:3 
(liburcu-mb.so.8+0x41de) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #2 urcu_mb_synchronize_rcu 
/home/ondrej/Projects/userspace-rcu/src/urcu.c:522:2 (liburcu-mb.so.8+0x3766) 
(BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #3 call_rcu_thread 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 
(liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

   Location is stack of thread T296.

   Thread T296 (tid=3995606, running) created by thread T272 at:
 #0 pthread_create  (badcache_test+0x44d5fb) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
 #1 call_rcu_data_init 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 
(liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #2 __create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 
(liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #3 urcu_mb_create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 
(liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 
(libisc-9.19.12-dev.so+0x7a0a0) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 
(libisc-9.19.12-dev.so+0x77890) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #6 isc__trampoline_run 
/home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 
(libisc-9.19.12-dev.so+0xaa6be) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)

   Thread T295 (tid=3995605, running) created by thread T261 at:
 #0 pthread_create  (badcache_test+0x44d5fb) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7)
 #1 call_rcu_data_init 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 
(liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #2 __create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 
(liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #3 urcu_mb_create_call_rcu_data 
/home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 
(liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
 #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 
(libisc-9.19.12-dev.so+0x7a0a0) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 
(libisc-9.19.12-dev.so+0x77890) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)
 #6 isc__trampoline_run 
/home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 
(libisc-9.19.12-dev.so+0xaa6be) (BuildId: 
a33cd26e483b73684928b4782627f1278c001605)

SUMMARY: ThreadSanitizer: data race 
(/home/ondrej/Projects/bind9/tests/dns/.libs/badcache_test+0x49257d) (BuildId: 
166dea93b2dca28264fc85c79b56d116cd491ed7) in __tsan_memset

This is between:
- DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
and
- uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);

That's pretty weird, because the "Write" happens on a stack-local variable,


Yes, it's the initialization of this "wait state" variable, which is located on 
the stack of the thread being blocked.

At initialization time, there are no concurrent accesses to this variable.


while the "Previous write" happens after futex,


That "previous" write was clearly for an unrelated prior blocking of the same thread.
Basically, what is missing here is information about the lifetime of this urcu_wait_node object.
After urcu_adaptative_busy_wait() has observed the state (uatomic_read(&wait->state) &
URCU_WAIT_TEARDOWN), it knows that the object has no concurrent user anymore, which means the
thread can move forward and reclaim this area of the stack for other uses.
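The lifetime rule described above can be sketched with C11 atomics, assuming simplified state bits modeled on the URCU_WAIT_* values (the names here are hypothetical, the futex path is omitted, and this is not liburcu code). The waker's last access is a release read-modify-write that sets TEARDOWN; the waiter returns, and so lets its stack slot be reused, only after an acquire load observes that bit, which is the happens-before edge a race detector needs to see:

```c
#include <stdatomic.h>

/* Hypothetical state bits, modeled on the ones discussed above. */
enum {
    WAIT_WAITING  = 0,
    WAIT_WAKEUP   = 1 << 0,
    WAIT_RUNNING  = 1 << 1,
    WAIT_TEARDOWN = 1 << 2,
};

struct wait_node {
    atomic_int state;
};

static void wait_node_init(struct wait_node *w)
{
    atomic_init(&w->state, WAIT_WAITING);
}

/* Waker: after setting TEARDOWN it must never touch *w again. */
static void wait_node_wake(struct wait_node *w)
{
    atomic_store_explicit(&w->state, WAIT_WAKEUP, memory_order_release);
    /* A real implementation would futex-wake the waiter here if needed. */
    atomic_fetch_or_explicit(&w->state, WAIT_TEARDOWN, memory_order_release);
}

/* Waiter: *w lives on this thread's stack. */
static void wait_node_wait(struct wait_node *w)
{
    /* Wait for the wakeup (real code spins briefly, then blocks on a futex). */
    while (atomic_load_explicit(&w->state, memory_order_acquire) == WAIT_WAITING)
        ;
    atomic_fetch_or_explicit(&w->state, WAIT_RUNNING, memory_order_relaxed);
    /* Do not return (and so reuse this stack slot) until the waker is done
     * with *w. This acquire load pairs with the waker's release fetch_or. */
    while (!(atomic_load_explicit(&w->state, memory_order_acquire) & WAIT_TEARDOWN))
        ;
}

static int wait_node_state(struct wait_node *w)
{
    return atomic_load(&w->state);
}
```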

So somehow we should add an 

Re: [lttng-dev] RCU API usage from call_rcu callbacks?

2023-03-22 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:

Hi,

the documentation is pretty silent on this, and asking here is probably going to be faster
than me trying to use the source to figure this out.

Is it legal to call_rcu() from within the call_rcu() callback?


Yes. call_rcu callbacks can be chained.

Note that you'll need to issue rcu_barrier() on program exit at least as many times as
the length of your longest chain of call_rcu callbacks if you intend to make sure no
queued callbacks remain on clean program shutdown. See this comment above
urcu_call_rcu_exit():

 * Teardown the default call_rcu worker thread if there are no queued
 * callbacks on process exit. This prevents leaking memory.
 *
 * Here is how an application can ensure graceful teardown of this
 * worker thread:
 *
 * - An application queuing call_rcu callbacks should invoke
 *   rcu_barrier() before it exits.
 * - When chaining call_rcu callbacks, the number of calls to
 *   rcu_barrier() on application exit must match at least the maximum
 *   number of chained callbacks.
 * - If an application chains callbacks endlessly, it would have to be
 *   modified to stop chaining callbacks when it detects an application
 *   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
 *   after setting that flag.
 * - The statements above apply to a library which queues call_rcu
 *   callbacks, only it needs to invoke rcu_barrier in its library
 *   destructor.
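The counting rule in that comment can be seen with a toy, single-threaded model. This is not liburcu code: `toy_call_rcu()` and `toy_rcu_barrier()` are hypothetical names, and the model assumes, as the comment implies, that one rcu_barrier() only waits for callbacks queued before it, so a callback re-queued from inside a callback needs another barrier.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

struct toy_cb {
	int remaining_chain;	/* how many more times this callback re-queues itself */
};

static struct toy_cb *pending[MAX_PENDING];
static size_t npending;

/* Models call_rcu(): just append to the queue. */
static void toy_call_rcu(struct toy_cb *cb)
{
	assert(npending < MAX_PENDING);
	pending[npending++] = cb;
}

/* Models rcu_barrier(): runs only the callbacks queued before the barrier.
 * Anything a callback chains while running is left for the next barrier. */
static void toy_rcu_barrier(void)
{
	struct toy_cb *batch[MAX_PENDING];
	size_t n = npending;

	for (size_t i = 0; i < n; i++)
		batch[i] = pending[i];
	npending = 0;
	for (size_t i = 0; i < n; i++) {
		if (batch[i]->remaining_chain > 0) {
			batch[i]->remaining_chain--;
			toy_call_rcu(batch[i]);	/* chained call_rcu from a callback */
		}
	}
}
```

With a chain of three callback invocations, two barriers still leave one callback queued; the third barrier drains it.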




What about the other RCU (and CDS) API calls?


They can be, unless stated otherwise. For instance, rcu_barrier() cannot be 
called from a call_rcu worker thread.



How does that interact with create_call_rcu_data()? I have event loops and I am
initializing 1:1 call_rcu helper threads, as I need to do some per-thread
initialization: some of the destroy-like functions use random numbers (don't ask).


As I recall, set_thread_call_rcu_data() will associate a call_rcu worker 
instance with the current thread. So all following call_rcu() invocations from 
that thread will be queued into this per-thread call_rcu queue, and handled by 
the call_rcu worker thread.

But I wonder why you inherently need this 1:1 mapping, rather than using the 
content of the structure containing the rcu_head to figure out which per-thread 
data should be used?
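A minimal sketch of that alternative: the callback recovers its context from the object that embeds the rcu_head, so it does not matter which worker thread runs it. `struct toy_rcu_head` is a stand-in for liburcu's `struct rcu_head` (only its callback pointer is modeled), `toy_container_of` mirrors what `caa_container_of()` does, and `struct loop_ctx` / `struct my_obj` are hypothetical types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for liburcu's struct rcu_head; only the callback pointer is modeled. */
struct toy_rcu_head {
	void (*func)(struct toy_rcu_head *head);
};

/* Same idea as caa_container_of(): step back from a member to its container. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-event-loop context (e.g. holding per-loop RNG state). */
struct loop_ctx {
	int loop_id;
};

struct my_obj {
	struct loop_ctx *ctx;		/* the loop that owns this object */
	struct toy_rcu_head rcu;
};

static int last_loop_id = -1;

static void my_obj_free_cb(struct toy_rcu_head *head)
{
	struct my_obj *obj = toy_container_of(head, struct my_obj, rcu);

	assert(obj->ctx != NULL);
	/* Context comes from the object, not from the worker running us. */
	last_loop_id = obj->ctx->loop_id;
}
```

Decoupling the context from the worker in this way is what would allow per-CPU workers instead of a 1:1 thread mapping.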

If you manage to separate the context from the worker thread instances, then 
you could use per-cpu call_rcu worker threads, which will eventually scale even 
better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids 
[1].



If it's legal to call_rcu() from the call_rcu thread, which thread is going to
be used?


The call_rcu invoked from the call_rcu worker thread will queue the call_rcu 
callback onto the queue handled by that worker thread. It does so by setting

  URCU_TLS(thread_call_rcu_data) = crdp;

early in call_rcu_thread(). So any chained call_rcu is handled by the same 
call_rcu worker thread doing the chaining, with the exception of teardown where 
the pending callbacks are moved to the default worker thread.
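A toy model of that queue selection, assuming a single thread-local pointer plays the role of `URCU_TLS(thread_call_rcu_data)`. All names are hypothetical, and it only falls back to a default worker; liburcu may also consult per-CPU workers before the default one, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_worker {
	const char *name;
	int queued;
};

static struct toy_worker default_worker = { "default", 0 };

/* Models URCU_TLS(thread_call_rcu_data); NULL means "use the default worker". */
static _Thread_local struct toy_worker *thread_worker;

static struct toy_worker *toy_pick_worker(void)
{
	return thread_worker ? thread_worker : &default_worker;
}

static void toy_call_rcu(void)
{
	toy_pick_worker()->queued++;
}

/* What a worker thread does early on: claim itself as this thread's queue,
 * so call_rcu() issued from its own callbacks lands back on itself. */
static void toy_worker_thread_start(struct toy_worker *self)
{
	assert(self != NULL);
	thread_worker = self;
}
```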

Thanks,

Mathieu

[1] 
https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoy...@efficios.com/




Thank you,
Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] Fwd: how to disable local file writing in relayd?

2023-03-22 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-22 02:39, Yuan Bin via lttng-dev wrote:



  Can I disable local-file-writing in lttng-relayd to avoid the disk 
space overhead, only using it as a live viewer?




I am not sure why you bumped this email thread. I already answered here. 
Perhaps you did not receive my reply?


https://lists.lttng.org/pipermail/lttng-dev/2023-March/030358.html

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-20 15:38, Duncan Sands via lttng-dev wrote:

Hi Mathieu,

While OK for the general case, I would recommend that we immediately 
implement something more efficient on x86 32/64 which takes into 
account that __ATOMIC_ACQ_REL atomic operations are implemented with 
LOCK prefixed atomic ops, which imply the barrier already, leaving the 
before/after_uatomic_*() as no-ops.


Maybe first check whether the GCC optimizers merge them. I believe some 
optimizations of atomic primitives are allowed and implemented, but I 
couldn't say which ones.


Best wishes, Duncan.


Tested on godbolt.org with:

int a;

void fct(void)
{
	(void) __atomic_add_fetch(&a, 1, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

x86-64 gcc 12.2 -O2 -std=c11:

fct:
        lock add        DWORD PTR a[rip], 1
        lock or QWORD PTR [rsp], 0
        ret
a:
        .zero   4

x86-64 clang 16.0.0 -O2 -std=c11:

fct:                                    # @fct
        lock            inc     dword ptr [rip + a]
        mfence
        ret
a:
        .long   0

So none of gcc/clang optimize this today, hence the need for an 
x86-specific implementation.
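A minimal sketch of the x86 special case being argued for, assuming GCC or clang. On x86 the read-modify-write itself is issued with SEQ_CST, so its LOCK prefix provides the full barrier and the before/after hooks compile to nothing; other architectures keep a RELAXED RMW bracketed by SEQ_CST fences. The `sketch_*` names are hypothetical, not liburcu's API, and the x86 branch relies on the hardware semantics of LOCK-prefixed instructions rather than on anything the C11 model promises.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
/* LOCK-prefixed RMW already acts as a full barrier on x86. */
#define sketch_mb_before_rmw()	do { } while (0)
#define sketch_mb_after_rmw()	do { } while (0)
#define SKETCH_RMW_ORDER	__ATOMIC_SEQ_CST
#else
/* Generic path: relaxed RMW bracketed by full fences. */
#define sketch_mb_before_rmw()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_mb_after_rmw()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define SKETCH_RMW_ORDER	__ATOMIC_RELAXED
#endif

/* Atomic add returning the new value, with full-barrier semantics around it. */
static long sketch_add_return(long *addr, long v)
{
	long ret;

	sketch_mb_before_rmw();
	ret = __atomic_add_fetch(addr, v, SKETCH_RMW_ORDER);
	sketch_mb_after_rmw();
	return ret;
}
```

Per the godbolt output above, this avoids the extra `lock or`/`mfence` that compilers currently emit after the RMW on x86.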


Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 10:51, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to unsigned long to fix the
following compiler error:


You will need to update the patch subject as well.

Thanks,

Mathieu



rculfhash.c:1201:2: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
uatomic_or(&node->next, REMOVED_FLAG);
^
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
  ^ ~~
rculfhash.c:1444:3: error: address argument to atomic operation must be a pointer to integer ('struct cds_lfht_node **' invalid)
uatomic_or(&fini_bucket->next, REMOVED_FLAG);
^~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
(void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
  ^ ~~

This was not a problem before because of the way uatomic_or was
implemented, but now we pass addr directly to __atomic_or_fetch()
and the compiler doesn't like the implicit conversion from
pointer-to-pointer to pointer-to-integer.

Signed-off-by: Ondřej Surý 
---
  src/rculfhash.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..5292725 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(&node->next, REMOVED_FLAG);
+   uatomic_or((unsigned long *)&node->next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
  
  	/*

@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+   uatomic_or((unsigned long *)&fini_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();
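A minimal standalone sketch of the pattern this patch touches: a flag kept in the low bit of a `next` pointer and set atomically through an `unsigned long *` view of that pointer, which is the integer type the cast hands to `__atomic_or_fetch()`. `struct sketch_node` and the helpers are hypothetical; the sketch assumes nodes are at least 2-byte aligned (so bit 0 of a valid pointer is free) and that `unsigned long` is pointer-sized, as the rest of liburcu assumes.

```c
#include <assert.h>

#define SKETCH_REMOVED_FLAG	(1UL << 0)

struct sketch_node {
	struct sketch_node *next;	/* bit 0 doubles as a "removed" flag */
};

/* Atomically set the flag: the builtin needs a pointer to an integer type,
 * hence the retyping of &node->next, as in the patch. */
static void sketch_mark_removed(struct sketch_node *node)
{
	(void) __atomic_or_fetch((unsigned long *)&node->next,
				 SKETCH_REMOVED_FLAG, __ATOMIC_RELAXED);
}

static int sketch_is_removed(const struct sketch_node *p)
{
	return ((unsigned long)p & SKETCH_REMOVED_FLAG) != 0;
}

static struct sketch_node *sketch_clear_flag(struct sketch_node *p)
{
	return (struct sketch_node *)((unsigned long)p & ~SKETCH_REMOVED_FLAG);
}
```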


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 5/7] Replace the arch-specific memory barriers with __atomic builtins

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 09:31, Ondřej Surý via lttng-dev wrote:

Instead of custom code, use the __atomic_thread_fence() builtin to
implement the cmm_mb(), cmm_rmb(), cmm_wmb(), cmm_smp_mb(),
cmm_smp_rmb(), and cmm_smp_wmb() on all architectures, and
cmm_read_barrier_depends() on alpha (otherwise it's still no-op).

family of functions

Signed-off-by: Ondřej Surý 
---
  include/urcu/arch/alpha.h   |  6 +++---
  include/urcu/arch/arm.h | 14 -
  include/urcu/arch/generic.h |  6 +++---
  include/urcu/arch/mips.h|  6 --
  include/urcu/arch/nios2.h   |  2 --
  include/urcu/arch/ppc.h | 25 --
  include/urcu/arch/s390.h|  2 --
  include/urcu/arch/sparc64.h | 13 
  include/urcu/arch/x86.h | 42 +++--
  9 files changed, 9 insertions(+), 107 deletions(-)

diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..61687c7 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,9 @@
  extern "C" {
  #endif
  
-#define cmm_mb()			__asm__ __volatile__ ("mb":::"memory")

-#define cmm_wmb()  __asm__ __volatile__ ("wmb":::"memory")
-#define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory")
+#ifndef cmm_read_barrier_depends
+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_CONSUME)
+#endif



I don't expect a #ifndef in arch-specific code. I would expect the 
ifndef in the generic code.


[...]


diff --git a/include/urcu/arch/generic.h b/include/urcu/arch/generic.h
index be6e41e..2715162 100644
--- a/include/urcu/arch/generic.h
+++ b/include/urcu/arch/generic.h
@@ -44,15 +44,15 @@ extern "C" {
   */
  
  #ifndef cmm_mb

-#define cmm_mb()__sync_synchronize()
+#define cmm_mb()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
  #endif
  
  #ifndef cmm_rmb

-#define cmm_rmb()  cmm_mb()
+#define cmm_rmb()  __atomic_thread_fence(__ATOMIC_ACQUIRE)
  #endif
  
  #ifndef cmm_wmb

-#define cmm_wmb()  cmm_mb()
+#define cmm_wmb()  __atomic_thread_fence(__ATOMIC_RELEASE)


I don't think rmb/wmb map to ACQUIRE/RELEASE semantic. This is incorrect 
AFAIU. ACQUIRE/RELEASE are semi-permeable barriers preventing code 
motion in one direction or the other, whereas rmb/wmb are barriers that 
only affect code motion of either loads or stores (but in both directions).


In the generic case, rmb/wmb could map to 
__atomic_thread_fence(__ATOMIC_SEQ_CST).
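A standalone sketch of the layering suggested in this review, not liburcu's actual headers: an arch section defines only what it specializes (unconditionally), and the generic section holds every `#ifndef` fallback, with rmb/wmb mapped to a SEQ_CST fence rather than ACQUIRE/RELEASE. The `sketch_publish()` / `sketch_consume()` pair is a hypothetical message-passing example showing where each barrier sits.

```c
#include <assert.h>

/* "Arch" part: define only what is specialized, with no #ifndef. */
#if defined(__alpha__)
#define cmm_read_barrier_depends()	__asm__ __volatile__ ("mb":::"memory")
#endif

/* "Generic" part: all fallbacks live here. */
#ifndef cmm_mb
#define cmm_mb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif
#ifndef cmm_rmb
/* Not ACQUIRE: rmb must order loads against loads in both directions. */
#define cmm_rmb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif
#ifndef cmm_wmb
/* Not RELEASE: wmb must order stores against stores in both directions. */
#define cmm_wmb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif
#ifndef cmm_read_barrier_depends
#define cmm_read_barrier_depends()	do { } while (0)
#endif

static int sketch_data, sketch_flag;

static void sketch_publish(int v)
{
	__atomic_store_n(&sketch_data, v, __ATOMIC_RELAXED);
	cmm_wmb();	/* data store ordered before flag store */
	__atomic_store_n(&sketch_flag, 1, __ATOMIC_RELAXED);
}

static int sketch_consume(void)
{
	if (!__atomic_load_n(&sketch_flag, __ATOMIC_RELAXED))
		return -1;
	cmm_rmb();	/* flag load ordered before data load */
	return __atomic_load_n(&sketch_data, __ATOMIC_RELAXED);
}
```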




  #endif
  
  #define cmm_mc()	cmm_barrier()

[...]

diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h
index 791529e..618f79c 100644
--- a/include/urcu/arch/ppc.h
+++ b/include/urcu/arch/ppc.h
@@ -34,31 +34,6 @@ extern "C" {
  /* Include size of POWER5+ L3 cache lines: 256 bytes */
  #define CAA_CACHE_LINE_SIZE   256
  
-#ifdef __NO_LWSYNC__

-#define LWSYNC_OPCODE  "sync\n"
-#else
-#define LWSYNC_OPCODE  "lwsync\n"
-#endif
-
-/*
- * Use sync for all cmm_mb/rmb/wmb barriers because lwsync does not
- * preserve ordering of cacheable vs. non-cacheable accesses, so it
- * should not be used to order with respect to MMIO operations.  An
- * eieio+lwsync pair is also not enough for cmm_rmb, because it will
- * order cacheable and non-cacheable memory operations separately---i.e.
- * not the latter against the former.
- */
-#define cmm_mb() __asm__ __volatile__ ("sync":::"memory")


I agree that we will want to use the generic implementation for smp_mb.


-
-/*
- * lwsync orders loads in cacheable memory with respect to other loads,
- * and stores in cacheable memory with respect to other stores.
- * Therefore, use it for barriers ordering accesses to cacheable memory
- * only.
- */
-#define cmm_smp_rmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
-#define cmm_smp_wmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")


I suspect that using the generic implementation will be slower than 
lwsync. I am tempted to keep a custom implementation for rmb/wmb on ppc. 
We could have a build mode specific for TSAN which overrides those to 
use smp_mb instead.
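A minimal sketch of that idea, assuming GCC or clang: keep `lwsync` for the ppc rmb/wmb fast path, and switch to a full barrier when building under ThreadSanitizer. `SKETCH_TSAN_BUILD` and the `sketch_smp_*` names are hypothetical. The detection uses `__SANITIZE_THREAD__` (defined by GCC with `-fsanitize=thread`) or clang's `__has_feature(thread_sanitizer)`; `__NO_LWSYNC__` handling mirrors the removed ppc.h code.

```c
#include <assert.h>

#if defined(__SANITIZE_THREAD__)
#define SKETCH_TSAN_BUILD 1
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define SKETCH_TSAN_BUILD 1
#endif
#endif

#if (defined(__powerpc__) || defined(__powerpc64__)) && !defined(SKETCH_TSAN_BUILD)
#ifdef __NO_LWSYNC__
#define SKETCH_LWSYNC "sync"
#else
#define SKETCH_LWSYNC "lwsync"
#endif
/* ppc fast path: lwsync orders cacheable loads vs loads, stores vs stores. */
#define sketch_smp_rmb()	__asm__ __volatile__ (SKETCH_LWSYNC ::: "memory")
#define sketch_smp_wmb()	__asm__ __volatile__ (SKETCH_LWSYNC ::: "memory")
#else
/* Generic and TSAN builds: fall back to a full barrier, like smp_mb(). */
#define sketch_smp_rmb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_smp_wmb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif
```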



-
  #define mftbl()   \
__extension__   \
({  \

[...]

diff --git a/include/urcu/arch/sparc64.h b/include/urcu/arch/sparc64.h
index 1ff40f5..b4e25ca 100644
--- a/include/urcu/arch/sparc64.h
+++ b/include/urcu/arch/sparc64.h
@@ -40,19 +40,6 @@ extern "C" {
  
  #define CAA_CACHE_LINE_SIZE	256
  
-/*

- * Inspired from the Linux kernel. Workaround Spitfire bug #51.
- */
-#define membar_safe(type)  \
-__asm__ __volatile__("ba,pt %%xcc, 1f\n\t"   \
-"membar " type "\n"\
-"1:\n"   \
-: : : "memory")
-
-#define cmm_mb()   membar_safe("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad")
-#define cmm_rmb()  membar_safe("#LoadLoad")
-#define cmm_wmb()  

Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 09:30, Ondřej Surý via lttng-dev wrote:

Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.

Signed-off-by: Ondřej Surý 
---
  include/Makefile.am|  16 -
  include/urcu/uatomic.h |  84 +++--
  include/urcu/uatomic/aarch64.h |  41 ---
  include/urcu/uatomic/alpha.h   |  32 --
  include/urcu/uatomic/arm.h |  57 ---
  include/urcu/uatomic/gcc.h |  46 ---
  include/urcu/uatomic/generic.h | 613 ---
  include/urcu/uatomic/hppa.h|  10 -
  include/urcu/uatomic/ia64.h|  41 ---
  include/urcu/uatomic/m68k.h|  44 ---
  include/urcu/uatomic/mips.h|  32 --
  include/urcu/uatomic/nios2.h   |  32 --
  include/urcu/uatomic/ppc.h | 237 
  include/urcu/uatomic/riscv.h   |  44 ---
  include/urcu/uatomic/s390.h| 170 -
  include/urcu/uatomic/sparc64.h |  81 -
  include/urcu/uatomic/tile.h|  41 ---
  include/urcu/uatomic/x86.h | 646 -
  18 files changed, 53 insertions(+), 2214 deletions(-)
  delete mode 100644 include/urcu/uatomic/aarch64.h
  delete mode 100644 include/urcu/uatomic/alpha.h
  delete mode 100644 include/urcu/uatomic/arm.h
  delete mode 100644 include/urcu/uatomic/gcc.h
  delete mode 100644 include/urcu/uatomic/generic.h
  delete mode 100644 include/urcu/uatomic/hppa.h
  delete mode 100644 include/urcu/uatomic/ia64.h
  delete mode 100644 include/urcu/uatomic/m68k.h
  delete mode 100644 include/urcu/uatomic/mips.h
  delete mode 100644 include/urcu/uatomic/nios2.h
  delete mode 100644 include/urcu/uatomic/ppc.h
  delete mode 100644 include/urcu/uatomic/riscv.h
  delete mode 100644 include/urcu/uatomic/s390.h
  delete mode 100644 include/urcu/uatomic/sparc64.h
  delete mode 100644 include/urcu/uatomic/tile.h
  delete mode 100644 include/urcu/uatomic/x86.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..53a28fd 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -59,24 +59,8 @@ nobase_include_HEADERS = \
urcu/syscall-compat.h \
urcu/system.h \
urcu/tls-compat.h \
-   urcu/uatomic/aarch64.h \
-   urcu/uatomic/alpha.h \
urcu/uatomic_arch.h \
-   urcu/uatomic/arm.h \
-   urcu/uatomic/gcc.h \
-   urcu/uatomic/generic.h \
urcu/uatomic.h \
-   urcu/uatomic/hppa.h \
-   urcu/uatomic/ia64.h \
-   urcu/uatomic/m68k.h \
-   urcu/uatomic/mips.h \
-   urcu/uatomic/nios2.h \
-   urcu/uatomic/ppc.h \
-   urcu/uatomic/riscv.h \
-   urcu/uatomic/s390.h \
-   urcu/uatomic/sparc64.h \
-   urcu/uatomic/tile.h \
-   urcu/uatomic/x86.h \
urcu/urcu-bp.h \
urcu/urcu-futex.h \
urcu/urcu.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..0327810 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,37 +22,59 @@
  #define _URCU_UATOMIC_H
  
  #include 

+#include 
  
-#if defined(URCU_ARCH_X86)

-#include 
-#elif defined(URCU_ARCH_PPC)
-#include 
-#elif defined(URCU_ARCH_S390)
-#include 
-#elif defined(URCU_ARCH_SPARC64)
-#include 
-#elif defined(URCU_ARCH_ALPHA)
-#include 
-#elif defined(URCU_ARCH_IA64)
-#include 
-#elif defined(URCU_ARCH_ARM)
-#include 
-#elif defined(URCU_ARCH_AARCH64)
-#include 
-#elif defined(URCU_ARCH_MIPS)
-#include 
-#elif defined(URCU_ARCH_NIOS2)
-#include 
-#elif defined(URCU_ARCH_TILE)
-#include 
-#elif defined(URCU_ARCH_HPPA)
-#include 
-#elif defined(URCU_ARCH_M68K)
-#include 
-#elif defined(URCU_ARCH_RISCV)
-#include 
-#else
-#error "Cannot build: unrecognized architecture, see ."
-#endif
+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_SEQ_CST)
+
+#define uatomic_cmpxchg(addr, old, new) \
+   ({                                                          \
+   __typeof__(*(addr)) __old = old;                            \
+   __atomic_compare_exchange_n(addr, &__old, new, 0,           \
+   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);                        \
+   __old;                                                      \
+   })
+
+#define uatomic_add_return(addr, v) \
+   __atomic_add_fetch((addr), (v), __ATOMIC_SEQ_CST)


The extra parentheses around "addr" and "v" here are not needed due to the
operator precedence of the comma ",". Likewise elsewhere in this patch.


Also, as mentioned earlier, please special-case the x86 implementation 
to include the __ATOMIC_SEQ_CST into atomic operations.


Thanks,

Mathieu


+
+#define uatomic_add(addr, v) \
+   (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+   

Re: [lttng-dev] [PATCH 1/7] Require __atomic builtins to build

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 09:30, Ondřej Surý via lttng-dev wrote:

Add autoconf checks for all __atomic builtins that urcu requires, and
adjust the gcc and clang versions in the README.md.

Signed-off-by: Ondřej Surý 
---
  README.md| 33 +
  configure.ac | 15 +++
  2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ba5bb08..a65a07a 100644
--- a/README.md
+++ b/README.md
@@ -68,30 +68,15 @@ Should also work on:
  
  (more testing needed before claiming support for these OS).
  
-Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or

-better.
-
-The C compiler used needs to support at least C99. The C++ compiler used
-needs to support at least C++11.
-
-The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
-supported, with the following exceptions:
-
-  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
-accesses to offsets in a TLS structure on 32-bit x86. These versions are
-therefore not compatible with `liburcu` on x86 32-bit
-(i386, i486, i586, i686).
-The problem has been reported to the GCC community:
-
-  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
-See 
-  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
-support. For ARM this was introduced with GCC 4.4:
-.
-  - Linux aarch64 depends on GCC 5.1 or better because prior versions
-perform unsafe access to deallocated stack.
-
-Clang version 3.0 (based on LLVM 3.0) is supported.
+Linux ARM depends on running a Linux kernel 2.6.15 or better.
+
+The C compiler used needs to support at least C99 and __atomic
+builtins. The C++ compiler used needs to support at least C++11
+and __atomic builtins.
+
+The GCC compiler versions 4.7 or better are supported.
+
+Clang version 3.1 (based on LLVM 3.1) is supported.
  
  Glibc >= 2.4 should work but the older version we test against is

  currently 2.17.
diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is 
detected.])
  ])
  
+# Require __atomic builtins

+AC_COMPILE_IFELSE(
+   [AC_LANG_PROGRAM(
+   [[int x, y;]],
+   [[__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
+     __atomic_load_n(&x, __ATOMIC_CONSUME);
+     y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);
+     __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);
+     __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);
+     __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);
+     __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+     __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],


I think we also want to test for __atomic_signal_fence here.
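A minimal C sketch of what that probe compiles, with `__atomic_signal_fence()` added as suggested. Wrapping the statements in a function named `sketch_atomic_probe()` is an assumption for illustration (autoconf's `AC_LANG_PROGRAM` generates its own `main`). A compiler lacking any of these builtins fails to compile it, which is what the configure check relies on.

```c
#include <assert.h>

static int sketch_atomic_probe(void)
{
	int x, y;

	__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
	(void) __atomic_load_n(&x, __ATOMIC_CONSUME);
	y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);	/* x = 1, y = 0 */
	/* x (1) != y (0): the exchange fails and y receives 1. */
	(void) __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL,
					   __ATOMIC_CONSUME);
	__atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);		/* x = 2 */
	__atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);		/* x = 1 */
	__atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);		/* x = 1 */
	__atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);		/* x = 1 */
	__atomic_thread_fence(__ATOMIC_ACQ_REL);
	__atomic_signal_fence(__ATOMIC_SEQ_CST);		/* the suggested addition */
	return x;
}
```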

Thanks,

Mathieu



+   [],
+   [AC_MSG_ERROR([The compiler does not support __atomic builtins])])
  
  ## ##

  ## Optional features selection ##


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 10:48, Ondřej Surý wrote:

On 21. 3. 2023, at 15:46, Mathieu Desnoyers wrote:

On 2023-03-21 06:21, Ondřej Surý wrote:

On 20. 3. 2023, at 19:37, Mathieu Desnoyers wrote:

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

FIXME: This is an experiment that adds an explicit memory barrier in the
free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
free the resources.
Signed-off-by: Ondřej Surý 
---
  src/workqueue.c | 1 +
  1 file changed, 1 insertion(+)
diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
   struct urcu_workqueue_completion *completion;
 completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
+ assert(!urcu_ref_get_unless_zero(&completion->ref));


Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of 
some sort ?

I guess?
My experience with TSAN tells me that you need some kind of memory barrier
when using acquire-release semantics and you do:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
	/* __ATOMIC_ACQUIRE needed here */
	free(obj);
}

We ended up using the following code in BIND 9:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
	free(obj);
}

So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
this patch should no longer be needed.


Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.


Yeah, I think I already did that, but wrote the email before that.
Nevertheless, my main point was that it should not be needed anymore.


Agreed, let's see how it holds up to testing under TSAN. :)
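A minimal sketch contrasting the two release patterns discussed above, using a hypothetical `struct sketch_obj`. The first uses a SEQ_CST decrement, as Mathieu suggests, so the final decrement also orders the other owners' earlier accesses before `free()`. The second is the common C11 idiom (RELEASE decrement, ACQUIRE fence only on the freeing path); it is valid C11, but ThreadSanitizer has limited support for standalone fences (GCC 11 and later can warn about this with `-Wtsan`), which plausibly explains the reports Ondřej describes.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_obj {
	int ref;	/* reference count */
	int payload;
};

/* Returns 1 if this put dropped the last reference and freed obj. */
static int sketch_put_seq_cst(struct sketch_obj *obj)
{
	if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_SEQ_CST) == 0) {
		free(obj);
		return 1;
	}
	return 0;
}

/* Same contract; RELEASE on every put, ACQUIRE fence only before free(). */
static int sketch_put_release_fence(struct sketch_obj *obj)
{
	if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		free(obj);
		return 1;
	}
	return 0;
}
```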

Thanks,

Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 06:21, Ondřej Surý wrote:

On 20. 3. 2023, at 19:37, Mathieu Desnoyers wrote:

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

FIXME: This is an experiment that adds an explicit memory barrier in the
free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
free the resources.
Signed-off-by: Ondřej Surý 
---
  src/workqueue.c | 1 +
  1 file changed, 1 insertion(+)
diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
   struct urcu_workqueue_completion *completion;
 completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);
+ assert(!urcu_ref_get_unless_zero(&completion->ref));


Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() of 
some sort ?


I guess?

My experience with TSAN tells me that you need some kind of memory barrier
when using acquire-release semantics and you do:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_RELEASE) == 0) {
	/* __ATOMIC_ACQUIRE needed here */
	free(obj);
}

We ended up using the following code in BIND 9:

if (__atomic_sub_fetch(&obj->ref, 1, __ATOMIC_ACQ_REL) == 0) {
	free(obj);
}

So, I am guessing that after the change of uatomic_sub_return() to __ATOMIC_ACQ_REL,
this patch should no longer be needed.


Actually we want __ATOMIC_SEQ_CST, which is even stronger than ACQ_REL.

Thanks,

Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 10:44, Mathieu Desnoyers wrote:

On 2023-03-21 06:15, Ondřej Surý wrote:


On 20. 3. 2023, at 19:31, Mathieu Desnoyers wrote:


On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.


What is the compiler error? I'm wondering whether the expected choice
to match the rest of this file's content would be to use "uintptr_t *"
or "unsigned long *"?


This is the error:

rculfhash.c:1201:2: error: address argument to atomic operation must 
be a pointer to integer ('struct cds_lfht_node **' invalid)

 uatomic_or(>next, REMOVED_FLAG);
 ^
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
 (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
   ^ ~~
rculfhash.c:1444:3: error: address argument to atomic operation must 
be a pointer to integer ('struct cds_lfht_node **' invalid)

 uatomic_or(_bucket->next, REMOVED_FLAG);
 ^~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
 (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
   ^ ~~

uintptr_t is defined as "unsigned integer type capable of holding a 
pointer to void" while unsigned long is at least 32 bits;


I guess that works in practice, but using unsigned long to retype 
the pointers might blow up (thinking of x32, which I know little about,
but it's kind of a hybrid architecture, isn't it?)


x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even 
that architecture is OK with casting pointer to unsigned long.


I agree with you that uintptr_t is the semantically correct type, but it 
should come as a separate change across the urcu code base: currently 
there are many places where void * is cast to unsigned long to do 
bitwise operations.


I therefore recommend using unsigned long here to stay similar to the 
rest of the code base, and keeping the transition from unsigned long to 
uintptr_t for the future, as it is not an immediate issue we have to 
address.
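If the code base keeps casting pointers to `unsigned long` for now, one low-cost way to make that assumption explicit is a compile-time check. This is a hedged sketch, not something proposed in the thread; the helper names are hypothetical. The assertion holds on LP64, ILP32 and x32, but would fail on LLP64 targets such as 64-bit Windows, where `unsigned long` is 32 bits while pointers are 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The pointer <-> unsigned long casts rely on this. */
_Static_assert(sizeof(unsigned long) == sizeof(void *),
	       "unsigned long must be exactly pointer-sized");

static unsigned long sketch_ptr_to_ulong(const void *p)
{
	return (unsigned long)(uintptr_t)p;
}

static void *sketch_ulong_to_ptr(unsigned long v)
{
	return (void *)(uintptr_t)v;
}
```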


I forgot to mention: you should add the compiler error to the commit 
message.


You should also explain why this was not an issue until now. It's 
probably related to the introduced use of __atomic builtins.


Thanks,

Mathieu



Thanks,

Mathieu




Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org





--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-21 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-21 06:15, Ondřej Surý wrote:



On 20. 3. 2023, at 19:31, Mathieu Desnoyers wrote:

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.


What is the compiler error? I'm wondering whether the expected choice
to match the rest of this file's content would be to use "uintptr_t *" or "unsigned 
long *"?


This is the error:

rculfhash.c:1201:2: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
 uatomic_or(>next, REMOVED_FLAG);
 ^
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
 (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
   ^ ~~
rculfhash.c:1444:3: error: address argument to atomic operation must be a 
pointer to integer ('struct cds_lfht_node **' invalid)
 uatomic_or(_bucket->next, REMOVED_FLAG);
 ^~~~
../include/urcu/uatomic.h:60:8: note: expanded from macro 'uatomic_or'
 (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
   ^ ~~

uintptr_t is defined as "unsigned integer type capable of holding a pointer to 
void" while unsigned long is at least 32 bits;

I guess that works in practice, but using unsigned long to retype the 
pointers might blow up (thinking of x32, which I know
little about, but it's kind of a hybrid architecture, isn't it?)


x32 uses 4 bytes for unsigned long, uintptr_t, and void * size. So even 
that architecture is OK with casting pointer to unsigned long.


I agree with you that uintptr_t is the semantically correct type, but it 
should come as a separate change across the urcu code base: currently 
there are many places where void * is cast to unsigned long to do 
bitwise operations.


I therefore recommend using unsigned long here to stay similar to the 
rest of the code base, and keeping the transition from unsigned long to 
uintptr_t for the future, as it is not an immediate issue we have to 
address.


Thanks,

Mathieu




Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-20 14:38, Mathieu Desnoyers via lttng-dev wrote:

On 2023-03-20 14:28, Ondřej Surý wrote:


On 20. 3. 2023, at 19:03, Mathieu Desnoyers wrote:


In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this
sequence of operations atomically: check if `addr` contains `old`.
If true, then replace the content of `addr` by `new`. Return the
value previously contained by `addr`. This function implies a full
memory barrier before and after the atomic operation."

This would map to an "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
rather than "__ATOMIC_CONSUME".



From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

If desired is written into *ptr then true is returned and memory is 
affected according to the memory order specified by success_memorder. 
There are no restrictions on what memory order can be used here.


Otherwise, false is returned and memory is affected according to 
failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor 
__ATOMIC_ACQ_REL. It also cannot be a stronger order than that 
specified by success_memorder.


I think it makes sense that the failure_memorder has the same memorder 
as uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - 
it's the same as with __atomic_load_n, only the following are permitted:


The valid memory order variants are __ATOMIC_RELAXED, 
__ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.


Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.


And it _would_ make sense to use the same memorder on cmpxchg failure as 
uatomic_read if we were exposing a new API, but we are modifying an 
already exposed documented API, so I would stick to SEQ_CST for both 
cmpxchg success/failure.


If we want to expose a new cmpxchg_relaxed_failure with a relaxed 
memorder on failure that would be fine, but we cannot change the 
semantic that is already documented.
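A minimal sketch of keeping the documented semantic with GCC builtins: SEQ_CST for both the success and the failure order (SEQ_CST is an allowed failure order, unlike RELEASE and ACQ_REL), and the previous value of `*addr` returned in both cases. `sketch_cmpxchg` is a hypothetical name; like the patch, it relies on GNU statement expressions and `__typeof__`.

```c
#include <assert.h>

/* Returns the value that was in *addr, whether or not the exchange happened. */
#define sketch_cmpxchg(addr, old, new)					\
	__extension__							\
	({								\
		__typeof__(*(addr)) _old = (old);			\
		/* On failure, _old is updated to the current value. */	\
		(void) __atomic_compare_exchange_n(addr, &_old, new, 0,	\
				__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);	\
		_old;							\
	})
```

On success the expected value is left unchanged, so the macro returns `old`; on failure the builtin writes the current contents into `_old`, which is then returned, matching "return the value previously contained by `addr`".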


Thanks,

Mathieu



Thanks,

Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org





--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-20 14:28, Ondřej Surý wrote:



On 20. 3. 2023, at 19:03, Mathieu Desnoyers wrote:

In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this
sequence of operations atomically: check if `addr` contains `old`.
If true, then replace the content of `addr` by `new`. Return the
value previously contained by `addr`. This function implies a full
memory barrier before and after the atomic operation."

This would map to an "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
rather than "__ATOMIC_CONSUME".



From: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


If desired is written into *ptr then true is returned and memory is affected 
according to the memory order specified by success_memorder. There are no 
restrictions on what memory order can be used here.

Otherwise, false is returned and memory is affected according to 
failure_memorder. This memory order cannot be __ATOMIC_RELEASE nor 
__ATOMIC_ACQ_REL. It also cannot be a stronger order than that specified by 
success_memorder.


I think it makes sense that the failure_memorder has the same memorder as 
uatomic_read(), but it definitely cannot be __ATOMIC_ACQ_REL - it's the same as 
with __atomic_load_n, where only the following are permitted:


The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, 
__ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.


Based on my other reply, we want "SEQ_CST" rather than ACQ_REL everywhere.

Thanks,

Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org





Re: [lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

FIXME: This is an experiment that adds an explicit memory barrier in
free_completion() in workqueue.c, so ThreadSanitizer knows it's ok to
free the resources.

Signed-off-by: Ondřej Surý 
---
  src/workqueue.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
struct urcu_workqueue_completion *completion;
  
  	completion = caa_container_of(ref, struct urcu_workqueue_completion, ref);

+   assert(!urcu_ref_get_unless_zero(>ref));


Perhaps what we really want here is an ANNOTATE_UNPUBLISH_MEMORY_RANGE() 
of some sort ?


Thanks,

Mathieu


free(completion);
  }
  




Re: [lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.


What is the compiler error ? I'm wondering whether the expected choice
to match the rest of this file's content would be to use "uintptr_t *" 
or "unsigned long *" ?
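For context, a sketch of the retyping the patch performs, using hypothetical stand-ins (struct node, mark_removed(), load_next(), is_removed(), clear_flag()) rather than the real rculfhash code. It assumes the compiler rejects the bitwise __atomic builtins on a pointer-typed object, which is plausibly the error in question but is not confirmed in this thread. Either uintptr_t or unsigned long works as the integer view on platforms where both have the size of a pointer; which one fits the rest of the file is the open question above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the rculfhash node and flag. The flag lives
 * in the low bit of ->next, assumed always clear for aligned nodes.
 */
#define REMOVED_FLAG	(1UL << 0)

struct node {
	struct node *next;
};

/*
 * The bitwise __atomic builtins need an integer object, so ->next is
 * accessed through uintptr_t (assumed to match the pointer's size and
 * representation). The helpers below all use this same integer view.
 */
static inline void mark_removed(struct node *node)
{
	assert(((uintptr_t) node & REMOVED_FLAG) == 0);	/* alignment assumption */
	(void) __atomic_or_fetch((uintptr_t *) &node->next, REMOVED_FLAG,
				 __ATOMIC_RELAXED);
}

static inline struct node *load_next(struct node *node)
{
	return (struct node *) __atomic_load_n((uintptr_t *) &node->next,
						__ATOMIC_RELAXED);
}

static inline int is_removed(const struct node *ptr)
{
	return ((uintptr_t) ptr & REMOVED_FLAG) != 0;
}

static inline struct node *clear_flag(struct node *ptr)
{
	return (struct node *) ((uintptr_t) ptr & ~(uintptr_t) REMOVED_FLAG);
}
```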


Thanks,

Mathieu



Signed-off-by: Ondřej Surý 
---
  src/rculfhash.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(>next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)>next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
  
  	/*

@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(_bucket->next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();




Re: [lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý 
---
  include/urcu/system.h | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..99e7443 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
   * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
   * before the load.
   */
-#define _CMM_LOAD_SHARED(p)   CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)   __atomic_load_n(&(p), __ATOMIC_RELAXED)
  
  /*

   * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
   * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
   * follow the store.
   */
-#define _CMM_STORE_SHARED(x, v)   __extension__ ({ CMM_ACCESS_ONCE(x) = (v); })
+#define _CMM_STORE_SHARED(x, v)   __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)


__atomic_store_n() is void. _CMM_STORE_SHARED() should evaluate to (v) 
(unless we decide to change the semantic, which I would rather avoid).
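One way to keep the existing semantic, sketched under the assumption that GCC/Clang statement expressions are acceptable (the current macros already rely on __extension__): keep the value in a temporary, store it with __atomic_store_n(), and make the temporary the value of the whole expression. store_and_return() is a hypothetical caller.

```c
#include <assert.h>

/* Relaxed shared load, as in the patch. */
#define _CMM_LOAD_SHARED(p)	__atomic_load_n(&(p), __ATOMIC_RELAXED)

/*
 * Relaxed shared store that still evaluates to (v), like the
 * CMM_ACCESS_ONCE() assignment it replaces, even though
 * __atomic_store_n() itself returns void.
 */
#define _CMM_STORE_SHARED(x, v)						\
	__extension__ ({						\
		__typeof__(x) _v_store = (v);				\
		__atomic_store_n(&(x), _v_store, __ATOMIC_RELAXED);	\
		_v_store;						\
	})

/* Usage: callers can keep relying on the value of the store expression. */
static inline unsigned long store_and_return(unsigned long *shared,
					     unsigned long v)
{
	unsigned long stored = _CMM_STORE_SHARED(*shared, v);

	assert(stored == v);
	return stored;
}
```

With this shape, the existing CMM_STORE_SHARED() wrapper that does `__typeof__(x) _v = _CMM_STORE_SHARED(x, v);` keeps compiling unchanged.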


Thanks,

Mathieu

  
  /*

   * Store v into x, where x is located in shared memory. Performs the
@@ -51,9 +51,8 @@
  #define CMM_STORE_SHARED(x, v)                          \
__extension__   \
({  \
-   __typeof__(x) _v = _CMM_STORE_SHARED(x, v); \
+   _CMM_STORE_SHARED(x, v);\
cmm_smp_wmc();  \
-   _v = _v;/* Work around clang "unused result" */   \
})
  
  #endif /* _URCU_SYSTEM_H */




Re: [lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Instead of custom code, use the __atomic builtins to implement the
rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
rcu_assign_pointer().


This also changes the cmm_mb() family of functions, but not everywhere. 
This should be documented.


I'm also unsure why architecture code has #ifndef cmm_mb when we would 
expect the generic arch implementation to be conditional (the other way 
around).




Signed-off-by: Ondřej Surý 
---
  include/urcu/arch.h   | 20 +
  include/urcu/arch/alpha.h |  6 +++
  include/urcu/arch/arm.h   | 12 ++
  include/urcu/arch/mips.h  |  2 +
  include/urcu/arch/nios2.h |  2 +
  include/urcu/arch/ppc.h   |  6 +++
  include/urcu/arch/s390.h  |  2 +
  include/urcu/arch/sparc64.h   |  6 +++
  include/urcu/arch/x86.h   | 20 +
  include/urcu/static/pointer.h | 77 +++
  10 files changed, 90 insertions(+), 63 deletions(-)

diff --git a/include/urcu/arch.h b/include/urcu/arch.h
index d3914da..aec6fa1 100644
--- a/include/urcu/arch.h
+++ b/include/urcu/arch.h
@@ -21,6 +21,26 @@
  #ifndef _URCU_ARCH_H
  #define _URCU_ARCH_H
  
+#if !defined(__has_feature)

+#define __has_feature(x) 0
+#endif /* if !defined(__has_feature) */
+
+/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */
+#if __has_feature(address_sanitizer)
+#define __SANITIZE_ADDRESS__ 1
+#endif /* if __has_feature(address_sanitizer) */
+
+#ifdef __SANITIZE_THREAD__
+/* FIXME: Somebody who understands the barriers should look into this */
+#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)


This really needs to be __ATOMIC_SEQ_CST.


+#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)


I am really unsure that rmb/wmb semantics map to acq/rel. Paul, can you 
confirm ?



+#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)


SEQ_CST.


+#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)


Unsure (see above).


+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL)


This would map to __ATOMIC_CONSUME, but AFAIK the current implementation 
of this semantic is done with __ATOMIC_ACQUIRE which is stronger than 
what is really needed here. So we can expect a slowdown on some 
architectures if we go that way.


Should we favor code simplicity and long-term maintainability at the 
expense of performance in the short-term ? Or should we keep 
arch-specific implementations until the toolchains end up implementing a 
proper consume semantic ?
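To make the CONSUME trade-off concrete, a minimal publish/subscribe sketch using the same builtins as _rcu_get_pointer(). The my_rcu_* macros, struct config and both functions are hypothetical, not liburcu API. The GCC documentation states that __ATOMIC_CONSUME is currently implemented with the stronger __ATOMIC_ACQUIRE, which is where the expected slowdown on some architectures would come from.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names: RELEASE publication, CONSUME subscription. */
#define my_rcu_assign_pointer(p, v)	__atomic_store_n(&(p), (v), __ATOMIC_RELEASE)
#define my_rcu_dereference(p)		__atomic_load_n(&(p), __ATOMIC_CONSUME)

struct config {
	int a;
	int b;
};

static struct config *current_config;

/* Writer: fully initialize the object before publishing the pointer. */
static inline void publish_config(int a, int b)
{
	struct config *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->a = a;
	c->b = b;
	my_rcu_assign_pointer(current_config, c);
	/* Reclaiming the previous version would need a grace period; omitted. */
}

/* Reader: loads through c depend on, and are ordered after, the pointer load. */
static inline int read_sum(void)
{
	struct config *c = my_rcu_dereference(current_config);

	return c ? c->a + c->b : -1;
}
```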



+#endif
+
  /*
   * Architecture detection using compiler defines.
   *
diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..84526ef 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,15 @@
  extern "C" {
  #endif
  
+#ifndef cmm_mb

  #define cmm_mb()  __asm__ __volatile__ ("mb":::"memory")
+#endif
+#ifndef cmm_wmb
  #define cmm_wmb() __asm__ __volatile__ ("wmb":::"memory")
+#endif
+#ifndef cmm_read_barrier_depends
  #define cmm_read_barrier_depends()__asm__ __volatile__ ("mb":::"memory")
+#endif
  

[...]

diff --git a/include/urcu/static/pointer.h b/include/urcu/static/pointer.h
index 9e46a57..3f116f3 100644
--- a/include/urcu/static/pointer.h
+++ b/include/urcu/static/pointer.h
@@ -38,6 +38,8 @@
  extern "C" {
  #endif
  
+#define _rcu_get_pointer(addr) __atomic_load_n(addr, __ATOMIC_CONSUME)

+
  /**
   * _rcu_dereference - reads (copy) a RCU-protected pointer to a local variable
   * into a RCU read-side critical section. The pointer can later be safely
@@ -49,14 +51,6 @@ extern "C" {
   * Inserts memory barriers on architectures that require them (currently only
   * Alpha) and documents which pointers are protected by RCU.
   *
- * With C standards prior to C11/C++11, the compiler memory barrier in
- * CMM_LOAD_SHARED() ensures that value-speculative optimizations (e.g.
- * VSS: Value Speculation Scheduling) does not perform the data read
- * before the pointer read by speculating the value of the pointer.
- * Correct ordering is ensured because the pointer is read as a volatile
- * access. This acts as a global side-effect operation, which forbids
- * reordering of dependent memory operations.


We should document that we end up relying on CONSUME for rcu_dereference 
in the patch commit message.



- *
   * With C standards C11/C++11, concerns about dependency-breaking
   * optimizations are taken care of by the "memory_order_consume" atomic
   * load.
@@ -65,10 +59,6 @@ extern "C" {
   * explicit because the pointer used as input argument is a pointer,
   * not an _Atomic type as required by C11/C++11.
   *
- * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requires use of
- * volatile access to implement rcu_dereference rather than
- 

Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-20 14:06, Mathieu Desnoyers via lttng-dev wrote:

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.


You should update the patch subject and commit message to replace 
"thread" by "signal".




FIXME: What should be the correct memory ordering here?


ACQ_REL is what we want here, I think this is fine. We want to prevent
the compiler from reordering loads/stores across the fence, but don't
want any barrier instructions issued.


We should probably make it SEQ_CST here as well, even though I doubt it 
changes anything in this very particular case of atomic_signal_fence.
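A sketch of the suggested definition, plus a small same-thread signal-handler example (hypothetical names, POSIX SIGUSR1 assumed) showing the kind of ordering __atomic_signal_fence() is meant for: it only constrains the compiler, with respect to a handler running on the same thread, and does not emit a hardware fence instruction.

```c
#define _POSIX_C_SOURCE 200809L	/* for SIGUSR1 under strict ISO modes */
#include <assert.h>
#include <signal.h>

/* Compiler-only barrier, SEQ_CST as suggested in this reply. */
#define cmm_barrier()	__atomic_signal_fence(__ATOMIC_SEQ_CST)

static int payload;
static int payload_ready;
static volatile sig_atomic_t seen_in_handler = -1;

static void on_signal(int sig)
{
	(void) sig;
	if (__atomic_load_n(&payload_ready, __ATOMIC_RELAXED)) {
		cmm_barrier();	/* read the flag before the payload */
		seen_in_handler = __atomic_load_n(&payload, __ATOMIC_RELAXED);
	}
}

/* Same-thread producer: the handler must never see the flag without the payload. */
static int publish_and_signal(int value)
{
	void (*prev)(int) = signal(SIGUSR1, on_signal);

	assert(prev != SIG_ERR);
	__atomic_store_n(&payload, value, __ATOMIC_RELAXED);
	cmm_barrier();	/* keep the payload store before the flag store */
	__atomic_store_n(&payload_ready, 1, __ATOMIC_RELAXED);
	raise(SIGUSR1);	/* the handler runs before raise() returns */
	return seen_in_handler;
}
```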


Thanks,

Mathieu



Thanks,

Mathieu



Signed-off-by: Ondřej Surý 
---
  include/urcu/compiler.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
  #define caa_likely(x)    __builtin_expect(!!(x), 1)
  #define caa_unlikely(x)    __builtin_expect(!!(x), 0)
-#define    cmm_barrier()    __asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#define    cmm_barrier()    __atomic_signal_fence(__ATOMIC_ACQ_REL)
  /*
   * Instruct the compiler to perform only a single access to a variable






Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-20 14:03, Mathieu Desnoyers via lttng-dev wrote:

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.


[...]

+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)

+
+#define uatomic_cmpxchg(addr, old, new) \
+    ({    \
+    __typeof__(*(addr)) __old = old;    \
+    __atomic_compare_exchange_n(addr, &__old, new, 0,    \
+    __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\




Actually, I suspect we'd want to change __ATOMIC_ACQ_REL to 
__ATOMIC_SEQ_CST everywhere, because we want total order.


Thanks,

Mathieu


In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this
sequence of operations atomically: check if `addr` contains `old`.
If true, then replace the content of `addr` by `new`. Return the
value previously contained by `addr`. This function implies a full
memory barrier before and after the atomic operation."

This would map to an "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
rather than "__ATOMIC_CONSUME".


+    __old;    \
+    })
+
+#define uatomic_add_return(addr, v) \
+    __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+    (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+    __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+    (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+    (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask)    \
+    (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
+#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)

+
+#define cmm_smp_mb__before_uatomic_and()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_and()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_or()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_or()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_add()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_add()    __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_sub()    cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_sub()    cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_inc()    cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_inc()    cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_dec()    cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_dec()    cmm_smp_mb__after_uatomic_add()

+
+#define cmm_smp_mb()    cmm_mb()


While OK for the general case, I would recommend that we immediately 
implement something more efficient on x86 32/64 which takes into account 
that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK 
prefixed atomic ops, which imply the barrier already, leaving the 
before/after_uatomic_*() as no-ops.
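A hedged sketch of that x86 specialization, assuming GCC or Clang. Since the LOCK-prefixed read-modify-write already acts as a full barrier on x86, uatomic_add() can typically be given SEQ_CST without changing the emitted instruction, and the before/after helpers then only need to stop compiler reordering; other architectures keep the relaxed operation and a real fence. This illustrates the proposal, not liburcu's merged code; record() is a hypothetical caller.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
/* LOCK-prefixed RMW: full barrier in hardware, SEQ_CST for the compiler. */
#define uatomic_add(addr, v)	(void) __atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
#define cmm_smp_mb__before_uatomic_add()	__atomic_signal_fence(__ATOMIC_SEQ_CST)
#define cmm_smp_mb__after_uatomic_add()		__atomic_signal_fence(__ATOMIC_SEQ_CST)
#else
/* Generic: relaxed RMW, explicit full fences around it when requested. */
#define uatomic_add(addr, v)	(void) __atomic_add_fetch(addr, v, __ATOMIC_RELAXED)
#define cmm_smp_mb__before_uatomic_add()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define cmm_smp_mb__after_uatomic_add()		__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Usage: order a store before a counter update that other threads poll. */
static inline void record(int *slot, int value, unsigned long *count)
{
	assert(slot && count);
	__atomic_store_n(slot, value, __ATOMIC_RELAXED);
	cmm_smp_mb__before_uatomic_add();
	uatomic_add(count, 1);
	cmm_smp_mb__after_uatomic_add();
}
```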


Thanks,

Mathieu





Re: [lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.


You should update the patch subject and commit message to replace 
"thread" by "signal".




FIXME: What should be the correct memory ordering here?


ACQ_REL is what we want here, I think this is fine. We want to prevent
the compiler from reordering loads/stores across the fence, but don't
want any barrier instructions issued.

Thanks,

Mathieu



Signed-off-by: Ondřej Surý 
---
  include/urcu/compiler.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
  #define caa_likely(x) __builtin_expect(!!(x), 1)
  #define caa_unlikely(x)   __builtin_expect(!!(x), 0)
  
-#define	cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")

+/* FIXME: What would be a correct memory ordering here? */
+#define    cmm_barrier()    __atomic_signal_fence(__ATOMIC_ACQ_REL)
  
  /*

   * Instruct the compiler to perform only a single access to a variable




Re: [lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-20 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 17:37, Ondřej Surý via lttng-dev wrote:

Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.


[...]

+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+   ({    \
+   __typeof__(*(addr)) __old = old;    \
+   __atomic_compare_exchange_n(addr, &__old, new, 0,   \
+   __ATOMIC_ACQ_REL, __ATOMIC_CONSUME);\


In doc/uatomic-api.md, we document:

"```c
type uatomic_cmpxchg(type *addr, type old, type new);
```

An atomic read-modify-write operation that performs this
sequence of operations atomically: check if `addr` contains `old`.
If true, then replace the content of `addr` by `new`. Return the
value previously contained by `addr`. This function implies a full
memory barrier before and after the atomic operation."

This would map to an "__ATOMIC_ACQ_REL" semantic on cmpxchg failure
rather than "__ATOMIC_CONSUME".


+   __old;    \
+   })
+
+#define uatomic_add_return(addr, v) \
+   __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+   (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+   __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+   (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+   (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+   (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_inc(addr) (void)__atomic_add_fetch((addr), 1, __ATOMIC_RELAXED)
+#define uatomic_dec(addr) (void)__atomic_sub_fetch((addr), 1, __ATOMIC_RELAXED)
+
+#define cmm_smp_mb__before_uatomic_and()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_and()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_or()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_or()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_add()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__after_uatomic_add()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_mb__before_uatomic_sub()   cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_sub()   cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_inc()   cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_inc()   cmm_smp_mb__after_uatomic_add()
+#define cmm_smp_mb__before_uatomic_dec()   cmm_smp_mb__before_uatomic_add()
+#define cmm_smp_mb__after_uatomic_dec()   cmm_smp_mb__after_uatomic_add()
+
+#define cmm_smp_mb()   cmm_mb()


While OK for the general case, I would recommend that we immediately 
implement something more efficient on x86 32/64 which takes into account 
that __ATOMIC_ACQ_REL atomic operations are implemented with LOCK 
prefixed atomic ops, which imply the barrier already, leaving the 
before/after_uatomic_*() as no-ops.


Thanks,

Mathieu



Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 13:02, Ondřej Surý wrote:

On 17. 3. 2023, at 14:44, Mathieu Desnoyers wrote:

I would indeed like to remove all the custom atomics assembly code from liburcu 
now that there is good atomics support in the major compilers (gcc and clang).


Here's very preliminary implementation:

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/2

I just did something wrong somewhere along the path and it doesn't compile now,
but it did for me locally.

I am submitting this now as it's 18:00 Friday evening and my kids are starting
to be angry at me :).

This will need some more work - I think some of the cmm_ macros might be dropped
now, and somebody who does that more often than I should take a look at the
memory orderings.


A few comments:

cmm_barrier() should rather be __atomic_signal_fence().

Also I notice this macro pattern (coding style):

#define uatomic_set(addr, v) __atomic_store_n((addr), (v), __ATOMIC_RELEASE)

The extra parentheses for parameters are not needed, because the comma is pretty
much the last operator in terms of priority. The following would be preferred
specifically because those are separated by comma:

#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)

Our memory barrier semantics are similar to the Linux kernel, where the following
imply ACQ_REL because they return something: cmpxchg, add_return, sub_return,
xchg.

The rest (add, sub, and, or, inc, dec) are __ATOMIC_RELAXED. Note that
cmm_smp_mb__before/after_uatomic_*() need to be implemented as
__atomic_thread_fence(__ATOMIC_ACQ_REL).

There are some architectures where we will want to keep a specialized version
of those add, sub, and, or, inc, dec operations which include the ACQ_REL
semantic, e.g. x86, where this is implied by the LOCK prefix. For those the
cmm_smp_mb__before/after_uatomic_*() will be no-ops.
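The convention described above, sketched with GCC/Clang builtins: value-returning operations carry a full barrier, while non-returning ones are relaxed and get explicit fences from the cmm_smp_mb__before/after_uatomic_*() helpers. __ATOMIC_SEQ_CST is used instead of ACQ_REL, following the 2023-03-20 replies in this thread; put_ref() is a hypothetical caller, and only a subset of the operations is shown.

```c
#include <assert.h>

/* Value-returning operations: full barrier. */
#define uatomic_xchg(addr, v)		__atomic_exchange_n(addr, v, __ATOMIC_SEQ_CST)
#define uatomic_add_return(addr, v)	__atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST)
#define uatomic_sub_return(addr, v)	__atomic_sub_fetch(addr, v, __ATOMIC_SEQ_CST)

/* Non-returning operations: relaxed. */
#define uatomic_inc(addr)		(void) __atomic_add_fetch(addr, 1, __ATOMIC_RELAXED)
#define uatomic_or(addr, mask)		(void) __atomic_or_fetch(addr, mask, __ATOMIC_RELAXED)

/* Explicit full barriers for callers that need ordering around relaxed ops. */
#define cmm_smp_mb__before_uatomic_inc()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define cmm_smp_mb__after_uatomic_inc()		__atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Usage: a reference put where the value-returning form provides ordering. */
static inline int put_ref(long *refcount)
{
	long left = uatomic_sub_return(refcount, 1);

	assert(left >= 0);
	return left == 0;	/* caller frees the object when this returns 1 */
}
```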

The CMM_STORE_SHARED is not meant to have a RELEASE semantic. It is meant to
update variables that don't need the release ordering. The ATOMIC_CONSUME was
not the intent at the CMM_LOAD_SHARED level either.

(this is just from looking around at the patches, it would be better if we can have the
patches posted to the mailing list for further discussion)

Thanks!

Mathieu




Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org





Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 11:50, Ondřej Surý wrote:

On 17. 3. 2023, at 14:44, Mathieu Desnoyers wrote:

Sure, can you please submit the patch as a separate email with subject/commit 
message/signed-off-by tag ?



https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/1.patch

Would this work for you?

Or do you need to have the patch attached?


Having the patch attached (e.g. using git send-email) would be better, 
but I don't mind downloading the file for this time. Merged into liburcu 
master branch, thanks!


Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org





Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-14 08:26, Ondřej Surý via lttng-dev wrote:

Hey,

we use ThreadSanitizer in BIND 9 CI quite extensively and with userspace-rcu
it's lit like an American house on Saturnalia ;).


Haha, I have no doubt about it. Userspace RCU is all about concurrent 
accesses, and so far possesses no TSAN annotations.




I have two questions:

1. I've tried to help TSAN by replacing the custom atomics with __atomic gcc
   primitives - that seems to work pretty well.  Note that using C11 stdatomics
   is frankly not possible here because it would require wrapping everything
   into _Atomic().


Agreed. gcc __atomic seems to be the way to go.



   Do you want me to contribute this back? And how should I plug this into the
   existing structure?  This touches:

include/urcu/static/pointer.h
include/urcu/system.h
include/urcu/uatomic.h


I would indeed like to remove all the custom atomics assembly code from 
liburcu now that there is good atomics support in the major compilers 
(gcc and clang). I am also tempted to bump the base compiler version 
requirements (for both gcc and clang) to something less ancient than 
what we have currently for the next liburcu releases if it helps us rely 
on non-buggy atomics implementations. The currently supported compilers 
are stated in the README.md file:


"Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or better.


The C compiler used needs to support at least C99. The C++ compiler used
needs to support at least C++11.

The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
supported, with the following exceptions:

  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
    accesses to offsets in a TLS structure on 32-bit x86. These versions are
    therefore not compatible with `liburcu` on x86 32-bit
    (i386, i486, i586, i686).
The problem has been reported to the GCC community:

  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
See 
  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
    support. For ARM this was introduced with GCC 4.4:
.
  - Linux aarch64 depends on GCC 5.1 or better because prior versions
perform unsafe access to deallocated stack.

Clang version 3.0 (based on LLVM 3.0) is supported."

For gcc, I wonder if gcc-4.8 has appropriate support for __atomic on all 
architectures supported by liburcu ?


I also wonder what would be a good conservative baseline version for clang.

As we introduce a newer compiler baseline version, I would be tempted to 
add a compiler version detection in include/urcu/compiler.h and emit a #warning 
whenever the compiler is too old. This is similar to what we do for the
compiler disallow list with URCU_GCC_VERSION, but enforced with a 
warning rather than a #error. The last thing I want is to end up wasting 
people's time due to compiling with a buggy old compiler, so I favor a 
"fail early" approach.
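A sketch of that version check, meant for a header such as include/urcu/compiler.h. The GCC 4.8 threshold is only the candidate raised above and no Clang baseline has been chosen, so the numbers are placeholders; the URCU_SKETCH_* names are hypothetical and deliberately distinct from the existing URCU_GCC_VERSION. Note that #warning is a GCC/Clang extension (standardized only in C23).

```c
/* Hypothetical placeholder baseline: GCC 4.8.0, encoded as MMmmpp. */
#define URCU_SKETCH_MIN_GCC	40800

/* Clang also defines __GNUC__, so exclude it from the GCC check. */
#if defined(__GNUC__) && !defined(__clang__)
# define URCU_SKETCH_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
# if URCU_SKETCH_GCC_VERSION < URCU_SKETCH_MIN_GCC
#  warning "GCC older than the liburcu baseline: __atomic builtins may be incomplete or buggy"
# endif
#endif
/* A Clang branch would go here once a baseline version is decided. */
```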





2. I know there's KTSAN, so it must work somehow, but was there any success
   on using ThreadSanitizer on projects using Userspace-RCU?  It mostly seems
   to highlight the CDS parts of the code.


Not AFAIK.



I can help TSAN to understand some of the code or suppress some of the warnings,
but I do want to prevent the code from being full of stuff like this:

static void
destroy_adbname_rcu_head(struct rcu_head *rcu_head) {
 dns_adbname_t *adbname = caa_container_of(rcu_head, dns_adbname_t,
   rcu_head);

#ifdef __SANITIZE_THREAD__
 SPINLOCK(>lock);
 SPINUNLOCK(>lock);
#endif

 destroy_adbname(adbname);
}


Indeed, we'd want to improve the liburcu header files and implementation 
by adding the appropriate annotation there.




I am absolutely sure that the adbname can be destroyed here (because of the
reference counting), but TSAN had a problem with it. Doing the "fake" barrier
with a spinlock here made it stop considering this a data race.

I also had to disable the auto_resize of cds_lfht when running under TSAN.

I am also worried that by hiding some code from TSAN, we might miss a legitimate
error.

All I found using Google was this notice from 2014:
https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg05121.html

and perhaps this:
https://github.com/google/sanitizers/issues/1415

(Perhaps, I should look into annotating urcu code with TSAN annotations?)


Yes, I suspect we'll want to add TSAN annotation to liburcu code, and 
perhaps Helgrind and DRD annotations as well while we are at it. Those 
tools are very valuable development tools, which makes it worthwhile to 
add the relevant annotations to help them figure out liburcu intricacies.






3. As an extra bonus, this is going to be needed with clang-17 as noreturn is
   now a reserved word:


Sure, can 

Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU

2023-03-13 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-13 11:30, Ondřej Surý wrote:

Hi Matthieu,

I spent some more time with the userspace-rcu on Friday and over the weekend and
now I am in a much better place.


On 13. 3. 2023, at 15:29, Mathieu Desnoyers wrote:

On 2023-03-11 01:04, Ondřej Surý via lttng-dev wrote:

Hey,
so, we are integrating userspace-rcu into BIND 9 (yay!) and as an experiment,
I am rewriting the internal address database (keeps the infrastructure
information about names and addresses).


That's indeed very interesting !


Thanks for the userspace-rcu! It saves a lot of time - while my colleague Tony Finch
already wrote our internal QSBR implementation from scratch, it would be a waste of
time to try to reimplement the CDS part of the library.

This is part of a larger effort to replace the internal BIND 9 database, currently
implemented as a rwlocked RBT, with a qptrie; if you are interested, Tony has a good
summary here: https://dotat.at/@/2023-02-28-qp-bind.html


Speaking of tries, I have implemented RCU Judy arrays in liburcu feature branches a
while back. Those never made it to the liburcu master branch because I had no real-life
use for those so far, and I did not want to expose a public API that would bitrot without
real-life user feedback.

The lookups and ordered traversals (next/prev) are entirely RCU, and updates are
either single-threaded, or use a strategy where locking is distributed within
the trie so updates to spatially discontinuous data would not contend with each other.

My original implementation supported integer keys as well as variable-length string keys.

The advantage of Judy arrays is that they minimize the number of cache-lines touched
on lookup traversal. Let me know if this would be useful for your use-cases, and if
so I can provide links to prototype branches.

[...]




So this is the part with the hashtable lookup, which seems to work well:
 rcu_read_lock();
 struct cds_lfht_iter iter;
 struct cds_lfht_node *ht_node;
 cds_lfht_lookup(adb->names_ht, hashval, names_match, , );
 ht_node = cds_lfht_iter_get_node();
 bool unlink = false;
 if (ht_node == NULL) {
 /* Allocate a new name and add it to the hash table. */
 adbname = new_adbname(adb, name, start_at_zone);
 ht_node = cds_lfht_add_unique(adb->names_ht, hashval,
   names_match, ,
   >ht_node);
 if (ht_node != >ht_node) {
 /* ISC_R_EXISTS */
 destroy_adbname(adbname);
 adbname = NULL;
 }
 }
 if (adbname == NULL) {
 INSIST(ht_node != NULL);
 adbname = caa_container_of(ht_node, dns_adbname_t, ht_node);
 unlink = true;
 }
 dns_adbname_ref(adbname);


What is this dns_adbname_ref() supposed to do? And is there a reference to
adbname that is still used after rcu_read_unlock()? What guarantees the
existence of the adbname after rcu_read_unlock()?


This is part of the internal reference counting - there's a macro that expects
an `isc_refcount_t references;` member in the struct and creates _ref, _unref,
_attach and _detach functions for each struct.

The last _detach/_unref calls a destroy function.
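To make that convention concrete, here is a minimal, hypothetical sketch of such a generator macro. It is not BIND's actual macro: it assumes C11 <stdatomic.h> in place of isc_refcount_t, and the names REFCOUNT_IMPL, struct name and name_destroy() are invented for illustration. It generates T_ref/T_unref/T_attach/T_detach, and dropping the last reference calls T_destroy().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical generator: for "struct T" with an "atomic_uint references"
 * member and a T##_destroy() function, define T##_ref(), T##_unref(),
 * T##_attach() and T##_detach(). Dropping the last reference destroys.
 */
#define REFCOUNT_IMPL(T)                                                  \
	static inline void T##_ref(struct T *ptr)                         \
	{                                                                 \
		atomic_fetch_add_explicit(&ptr->references, 1,            \
					  memory_order_relaxed);          \
	}                                                                 \
	static inline void T##_unref(struct T *ptr)                       \
	{                                                                 \
		/* acq_rel: prior writes are visible to the destroyer. */ \
		if (atomic_fetch_sub_explicit(&ptr->references, 1,        \
					      memory_order_acq_rel) == 1)  \
			T##_destroy(ptr);                                 \
	}                                                                 \
	static inline void T##_attach(struct T *src, struct T **targetp)  \
	{                                                                 \
		T##_ref(src);                                             \
		*targetp = src;                                           \
	}                                                                 \
	static inline void T##_detach(struct T **ptrp)                    \
	{                                                                 \
		struct T *ptr = *ptrp;                                    \
		*ptrp = NULL; /* the caller's pointer is consumed */      \
		T##_unref(ptr);                                           \
	}

/* Hypothetical user of the macro. */
struct name {
	atomic_uint references;
	int value;
};

static int names_destroyed; /* counts destroy calls (for the test) */

static void name_destroy(struct name *n)
{
	names_destroyed++;
	free(n);
}

REFCOUNT_IMPL(name)
```

Note that a reference count alone does not make a pointer found under rcu_read_lock() safe to use after rcu_read_unlock(): the count must be taken while the object is still guaranteed to exist.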


    rcu_read_unlock();
and here's the part where the LRU gets updated:
    LOCK(&adbname->lock); /* Must be unlocked by the caller */


I suspect you use a scheme where you hold the RCU read-side lock to perform
the lookup, and then you use the object with an internal lock held. But
expecting the object to still exist after rcu_read_unlock() is incorrect,
unless some other reference counting scheme is used.


Yeah, I was trying to minimize the sections where we hold the RCU read-side
lock, but I gave up and now the RCU read-side lock is held for longer periods
of time.


We've used that kind of scheme in LTTng's lttng-relayd, where we use RCU for
short-term existence guarantees, and reference counting for longer-term
existence guarantees. An example can be found here:

https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-relayd/viewer-stream.cpp

viewer_stream_get_by_id() attempts a lookup in the hash table, and re-validates
that the object exists with viewer_stream_get(), which checks whether the
refcount is already 0 as it tries to increment it with
urcu_ref_get_unless_zero(). If it is zero, it behaves as if the object was not
found. I recommend this kind of scheme if you intend to use both RCU and
reference counting.
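The lookup-then-revalidate pattern can be sketched as follows. This is a simplified, self-contained illustration rather than lttng-relayd's code: it re-implements the semantics of liburcu's urcu_ref_get_unless_zero() with a C11 compare-and-swap loop, uses a small array as a stand-in for the RCU hash table (so the rcu_read_lock()/rcu_read_unlock() calls a real version needs appear only as comments), and the names struct stream, stream_get() and stream_get_by_id() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct stream {
	uint64_t id;
	atomic_long refcount; /* 0 means "being torn down" */
};

/* Increment the refcount unless it is already zero (CAS loop). */
static bool ref_get_unless_zero(atomic_long *ref)
{
	long cur = atomic_load(ref);

	do {
		if (cur == 0)
			return false; /* object is dying: never resurrect it */
	} while (!atomic_compare_exchange_weak(ref, &cur, cur + 1));
	return true;
}

static bool stream_get(struct stream *s)
{
	return ref_get_unless_zero(&s->refcount);
}

/* Stand-in for the RCU hash table. */
static struct stream *table[4];

static struct stream *stream_get_by_id(uint64_t id)
{
	struct stream *found = NULL;

	/* rcu_read_lock(); in a real RCU implementation */
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		struct stream *s = table[i];

		if (s && s->id == id) {
			/* Found, but only usable if a reference can be taken. */
			if (stream_get(s))
				found = s;
			break;
		}
	}
	/* rcu_read_unlock(); the reference now keeps 'found' alive. */
	return found;
}
```

Finding the object in the table is not sufficient on its own: a concurrent final put may already have dropped the count to zero, in which case the object is only waiting for its grace period to elapse and must be treated as absent.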

Then you can place a mutex within the object, and use that mutex to provide
mutual exclusion between concurrent accesses to the object that need to be
serialized.

In the destroy handler (called when the reference count reaches 0), you will
typically want to unlink your object from the various data structures holding
references to it (hash tables, lists), and then use call_rcu()

Re: [lttng-dev] urcu/rculist.h clarifications - for implementing LRU

2023-03-13 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-11 01:04, Ondřej Surý via lttng-dev wrote:

Hey,

so, we are integrating userspace-rcu into BIND 9 (yay!) and, as an experiment,
I am rewriting the internal address database (which keeps the infrastructure
information about names and addresses).


That's indeed very interesting !



There's a hashtable to keep the entries and there's an associated LRU list.

For the hashtable the cds_lfht seems to work well, but I am kind of struggling
with cds_list (the urcu/rculist.h variant).

The names and entries work in a pretty similar way, so I am going to
describe just one.

The workhorse is the get_attached_and_locked_name() function (I am going to
skip the parts where we create keys, check if the LRU needs to be updated, etc.)


It would help if you could share a git branch of your prototype. In order to
reason about RCU, we typically need to look at both the update-side and the
read-side. For instance, I don't see the read-side of the LRU linked-list in
the code snippets below. We also need to have a complete picture of the object
lifetime, from allocation to reclaim/reuse. I don't see where the grace periods
(either synchronize_rcu or call_rcu) are before reclaim or reuse in the code
snippets below.



So this is the part with the hashtable lookup, which seems to work well:

    rcu_read_lock();

    struct cds_lfht_iter iter;
    struct cds_lfht_node *ht_node;

    cds_lfht_lookup(adb->names_ht, hashval, names_match, , &iter);

    ht_node = cds_lfht_iter_get_node(&iter);

    bool unlink = false;

    if (ht_node == NULL) {
        /* Allocate a new name and add it to the hash table. */
        adbname = new_adbname(adb, name, start_at_zone);

        ht_node = cds_lfht_add_unique(adb->names_ht, hashval,
                                      names_match, ,
                                      &adbname->ht_node);
        if (ht_node != &adbname->ht_node) {
            /* ISC_R_EXISTS */
            destroy_adbname(adbname);
            adbname = NULL;
        }
    }
    if (adbname == NULL) {
        INSIST(ht_node != NULL);
        adbname = caa_container_of(ht_node, dns_adbname_t, ht_node);
        unlink = true;
    }

    dns_adbname_ref(adbname);


What is this dns_adbname_ref() supposed to do? And is there a reference to
adbname that is still used after rcu_read_unlock()? What guarantees the
existence of the adbname after rcu_read_unlock()?



    rcu_read_unlock();

and here's the part where the LRU gets updated:

    LOCK(&adbname->lock); /* Must be unlocked by the caller */


I suspect you use a scheme where you hold the RCU read-side lock to perform
the lookup, and then you use the object with an internal lock held. But
expecting the object to still exist after rcu_read_unlock() is incorrect,
unless some other reference counting scheme is used.


    if (NAME_DEAD(adbname)) {
        UNLOCK(&adbname->lock);
        dns_adbname_detach(&adbname);
        goto again;
    }

    if (adbname->last_used + ADB_CACHE_MINIMUM <= last_update) {
        adbname->last_used = now;

        LOCK(&adb->names_lru_lock);
        if (unlink) {
            cds_list_del_rcu(&adbname->list_node);
        }


This looks odd. I don't see the code implementing traversal of this list, but
I would expect a grace period between the unlink of the node from a list and
its insertion into another list, otherwise if there are RCU readers traversing
the list concurrently, they can observe an inconsistent state.


        cds_list_add_tail_rcu(&adbname->list_node, &adb->names_lru);
        UNLOCK(&adb->names_lru_lock);
    }

The NAME_DEAD flag gets updated under adbname->lock in expire_name():

    if (!NAME_DEAD(adbname)) {
        adbname->flags |= NAME_IS_DEAD;

        /* Remove the adbname from the hashtable... */
        (void)cds_lfht_del(adb->names_ht, &adbname->ht_node);


I don't have the full context here, but AFAIR cds_lfht_del() allows two
removals of the same ht_node to be done concurrently, and only one will
succeed (which is probably what happens here). cds_list_del_rcu(), however,
does not allow concurrent removals of a list_node. So if you somehow get two
RCU lookups to find the same node in expire_name(), one will likely do an
extra, unexpected cds_list_del_rcu().
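One way to avoid the double cds_list_del_rcu() is to let only the caller that wins the hash-table removal touch the list. liburcu's cds_lfht_del() returns 0 when it removed the node and a negative value when the node was already removed, so the list removal can be conditioned on that return value. The sketch below illustrates the idea in a self-contained way, with a C11 atomic flag standing in for the hash table node state; struct entry, ht_del_stub() and expire_entry() are hypothetical names, and the stub's use of -ENOENT is an assumption of this sketch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct entry {
	atomic_bool in_ht;  /* stand-in for "node published in the hash table" */
	int list_removals;  /* counts list unlinks, for illustration */
};

/* Mimics cds_lfht_del(): 0 for the single winner, negative otherwise. */
static int ht_del_stub(struct entry *e)
{
	return atomic_exchange(&e->in_ht, false) ? 0 : -ENOENT;
}

static void expire_entry(struct entry *e)
{
	if (ht_del_stub(e) != 0)
		return; /* someone else already removed it: leave the list alone */
	/* Only the winner unlinks from the LRU list (under the LRU lock). */
	e->list_removals++;
}
```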



        /* ... and LRU list */
        LOCK(&adb->names_lru_lock);
        cds_list_del_rcu(&adbname->list_node);
        UNLOCK(&adb->names_lru_lock);
    }

So, now the problem is that sometimes I get a crash under load:

(gdb) bt
#0  0x7fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at /usr/include/x86_64-linux-gnu/urcu/rculist.h:71
#1  get_attached_and_locked_name (adb=adb@entry=0x7fae830142a0, name=name@entry=0x7fae804fc9b0, start_at_zone=true, now=) at adb.c:1446
#2  0x7fae87a392bf in 

Re: [lttng-dev] how to disable local file writing in relayd?

2023-03-08 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-06 00:12, Yuan Bin via lttng-dev wrote:
Can I disable local file writing in lttng-relayd to avoid the disk space
overhead, and only use it as a live viewer?


Not explicitly, but you can store your temporary files on a tmpfs file
system (see the lttng-relayd(8) --output command line parameter), which will
only keep the relayd files in memory, and use the tracefile rotation
feature to prevent the files from growing forever, e.g.:


https://lttng.org/docs/v2.13/#doc-enabling-disabling-channels

Example: Create a Linux kernel channel which rotates eight trace files of
4 MiB each for each stream.


lttng enable-channel --kernel --tracefile-count=8 \
 --tracefile-size=4194304 my-channel

See lttng-enable-channel(1) for more info.
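As a back-of-the-envelope check, 4194304 bytes is 4 MiB, so with --tracefile-count=8 each stream is bounded at roughly 8 * 4 MiB = 32 MiB on the tmpfs. The total grows with the number of streams, which (an assumption for this sketch) is typically one per CPU per channel with per-CPU buffers; metadata and index files are ignored here. The helper is a hypothetical illustration of that arithmetic, not an LTTng API.

```c
#include <stdint.h>

/* Approximate upper bound on the bytes kept for one channel when
 * tracefile rotation is enabled: each stream keeps at most
 * tracefile_count files of tracefile_size bytes. */
static uint64_t max_channel_bytes(uint64_t tracefile_count,
				  uint64_t tracefile_size,
				  uint64_t nr_streams)
{
	return tracefile_count * tracefile_size * nr_streams;
}
```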

I hope this helps!

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [RELEASE] LTTng-modules 2.12.13 and 2.13.9 (Linux kernel tracer)

2023-03-03 Thread Mathieu Desnoyers via lttng-dev
This is a release announcement for the two currently maintained stable 
branches of the LTTng-modules project.


* New in these releases

LTTng-modules v2.13.9 contains a fix required to build against Linux v6.2.

Both v2.12.13 and v2.13.9 contain a set of build fixes to follow the
evolution of the jbd2 tracepoint instrumentation within the Linux kernel
5.4 and 5.10 stable branches.


* Changelog

2023-03-03 (Canadian Bacon Day) LTTng modules 2.13.9
* fix: jbd2: use the correct print format (v5.4.229)
* fix: jbd2 upper bound for v5.10.163
* fix: jbd2: use the correct print format (v5.10.163)
* fix: btrfs: move accessor helpers into accessors.h (v6.2)

2023-03-03 (Canadian Bacon Day) 2.12.13
* fix: jbd2: use the correct print format (v5.4.229)
* fix: jbd2 upper bound for v5.10.163
* fix: jbd2: use the correct print format (v5.10.163)

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download



Re: [lttng-dev] Filtering tracing by process name or PID/TID

2023-02-15 Thread Mathieu Desnoyers via lttng-dev

On 2023-02-15 04:09, Rengar Stinkt via lttng-dev wrote:

Dear community,
I only recently started working with lttng tracing due to work-related
projects, so I am very new to this. I have done some research before 
posting this but I can't seem to find an answer.
I am running several CPU load tests for specific processes on different 
devices using lttng and TraceCompass for visualization. I am running 
into the issue that 99.9% of traced processes are not of value to me and 
the tracing files get extremely big and hard to work with (filtering 
with TraceCompass is very slow).
Now I thought of filtering the processes before tracing and I found 
filtering by PID and TID. The issue with this is that the PIDs and TIDs 
are unique on each device but change between devices.
I then found the command "htop -d 0.1 -u **String**" to see currently 
running processes with a certain name.
Now if I run this, it shows me the running processes IF they are running. I
have time-triggered and event-triggered processes. There are many
inconvenient workarounds to make it work, like triggering the events and 
finding out the PID and then manually copying all of the IDs and pasting 
them into "lttng track --kernel --pid=""". But I am trying to find a way 
to either filter by name right away, avoiding relying on PIDs or at 
least to have an automated process of doing it. But I am unfamiliar with 
running code in the PuTTY terminal that we are using, so I am trying to 
avoid this (for now). If this is the only option though, I will have to 
look into it.
Is there any way to filter by name right away like in the mentioned htop 
command?

Thank you so much in advance.


This would be:

lttng enable-event -k event_name --filter '$ctx.procname == "string"'

Where "string" can include wildcards as well.

Hoping this helps,

Mathieu



Dom



[lttng-dev] [RELEASE] Userspace RCU 0.14.0, 0.13.3, 0.12.5 [EOL]

2023-02-14 Thread Mathieu Desnoyers via lttng-dev

Hi,

This is a release announcement for the Userspace RCU project.

This is a set of releases, including the new 0.14 branch with the 0.14.0 
release, and bug fix releases for the 0.13 and 0.12 branches. The 0.12.5 
release is the last of the 0.12 branch, which reaches end of life with 
the release of 0.14.


Here are the new features introduced in urcu 0.14.0:

- C99 and C++11 are now the baseline requirements, as documented in
  README.md.

- Introduce public APIs for C++.

  An important point to consider: urcu/compiler.h needs to include
   in C++, which prevents including urcu/compiler.h
  from extern "C" code.

- Introduce new grace period polling APIs in urcu-memb,mb,signal,qsbr,bp
  flavors:

  struct urcu_gp_poll_state start_poll_synchronize_rcu(void);
  bool poll_state_synchronize_rcu(struct urcu_gp_poll_state state);

  This allows periodically polling whether a started grace period has
  completed, so that grace period completion can be checked together with
  some other condition.

- rculfhash: introduce cds_lfht_node_init_deleted

  Allow initializing an lfht node to the "removed" state, so that it can
  be queried whether the node is published in a hash table both before it
  is added to the hash table and after it has been removed from it.

- Disable signals in URCU background threads

  Applications using signalfd depend on signals being blocked in all
  threads of the process, otherwise threads with unblocked signals
  can receive them and starve the signalfd.

  While some threads in URCU do block signals (e.g. workqueue
  worker for rculfhash), the call_rcu, defer_rcu, and rculfhash
  partition_resize_helper threads do not.

  Always block all signals before creating threads, and only unblock
  SIGRCU when registering a urcu-signal thread. Restore the SIGRCU
  signal to its pre-registration blocked state on unregistration.

  For rculfhash, cds_lfht_worker_init can be removed, because its only
  effect is to block all signals except SIGRCU. Blocking all signals is
  already done by the workqueue code, and unblocking SIGRCU is now done
  by the urcu signal flavor thread registration.

- Always use '__thread' for Thread local storage except on MSVC

  Use the GCC extension '__thread' [1] for Thread local storage on all C
  and C++ compilers except MSVC.

  While C11 and C++11 respectively offer '_Thread_local' and
  'thread_local' as potentially faster implementations, they offer no
  guarantees of compatibility when used in a library interface which
  might be used by both C and C++ client code.

- Various test framework improvements.

- Wire up membarrier system call on Alpha. The only architecture still
  without the membarrier system call wired up is MIPS. https://bugs.lttng.org/issues/940
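The "Disable signals in URCU background threads" entry above boils down to a general POSIX pattern: block every signal, create the thread (new threads inherit the creating thread's signal mask), then restore the caller's mask. The sketch below shows that pattern with plain pthread calls. It is an illustration under that reading of the release note, not liburcu's internal code, and create_thread_signals_blocked() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Create a thread that starts with every signal blocked, leaving the
 * caller's own signal mask unchanged afterwards. */
static int create_thread_signals_blocked(pthread_t *tid,
					 void *(*fn)(void *), void *arg)
{
	sigset_t all, old;
	int ret;

	sigfillset(&all);
	ret = pthread_sigmask(SIG_SETMASK, &all, &old);
	if (ret)
		return ret;
	/* The new thread inherits the fully blocked mask. */
	ret = pthread_create(tid, NULL, fn, arg);
	/* Restore the caller's mask whether or not creation succeeded. */
	(void) pthread_sigmask(SIG_SETMASK, &old, NULL);
	return ret;
}
```

A thread that really needs one signal (as urcu-signal readers need SIGRCU) would then unblock just that signal itself.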


Here are the fixes introduced in urcu 0.14.0, 0.13.3 and 0.12.5:

- Fix: auto-resize hash table destroy deadlock

  Fix a deadlock for auto-resize hash tables when cds_lfht_destroy
  is called with RCU read-side lock held.

- Join call_rcu worker thread in call_rcu_data_free (eliminate leaks)

- Teardown default call_rcu worker on application exit

  Tear down the default call_rcu worker thread if there are no queued
  callbacks on process exit. This prevents leaking memory.

  Here is how an application can ensure graceful teardown of this
  worker thread:

  - An application queuing call_rcu callbacks should invoke
    rcu_barrier() before it exits.
  - When chaining call_rcu callbacks, the number of calls to
    rcu_barrier() on application exit must match at least the maximum
    number of chained callbacks.
  - If an application chains callbacks endlessly, it would have to be
    modified to stop chaining callbacks when it detects an application
    exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
    after setting that flag.
  - The statements above also apply to a library which queues call_rcu
    callbacks, except that it needs to invoke rcu_barrier() in its
    library destructor.

- Allow building on MSYS2

  Update cygwin libtool config in `configure.ac` to match MSYS2 build
  environments as well. MSYS2 is also a Windows build environment that
  produces DLLs.

Feedback is welcome!

Mathieu


Project website: https://liburcu.org
Git repository: git://git.liburcu.org/urcu.git



Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-02-06 Thread Mathieu Desnoyers via lttng-dev

Hi Micke,

I made some tweaks to make the code C++-compatible even though it's
currently only built as C. This makes it more future-proof.


I've merged the resulting patch into lttng-ust 
master/stable-2.13/stable-2.12. Thanks for testing !


Mathieu

On 2023-02-06 11:15, Beckius, Mikael wrote:

Hello Mathieu!

I added your latest implementation to my test and it seems to perform well on 
both arm and arm64. Since the test was written in C++ I had to make a small 
change to the cast in order for the test to compile.

Micke


-----Original Message-----
From: Mathieu Desnoyers
Sent: 2 February 2023 17:26
To: Beckius, Mikael; lttng-dev@lists.lttng.org
Subject: Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch
specific optimization


Hi Mikael,

I just tried another approach to fix this issue, see:

https://review.lttng.org/c/lttng-ust/+/9413 Fix: use unaligned pointer
accesses for lttng_inline_memcpy

It is less intrusive than other approaches, and does not change the generated
code on the most relevant architectures.

Feedback is welcome,

Thanks,

Mathieu




Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-02-02 Thread Mathieu Desnoyers via lttng-dev

Hi Mikael,

I just tried another approach to fix this issue, see:

https://review.lttng.org/c/lttng-ust/+/9413 Fix: use unaligned pointer
accesses for lttng_inline_memcpy

It is less intrusive than other approaches, and does not change the generated
code on the most relevant architectures.
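For readers unfamiliar with the idea of unaligned pointer accesses, a common GCC/Clang technique is to read and write through a packed struct, which tells the compiler the address may be misaligned, so it emits accesses that are safe on strict-alignment targets (and plain loads/stores where unaligned access is cheap). This is a generic sketch of that technique assuming GCC or Clang; it is not necessarily what change 9413 implements, and load_u64_unaligned(), store_u64_unaligned() and load_u64_memcpy() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* A packed wrapper has alignment 1, so accesses through it must not
 * assume natural alignment; may_alias exempts it from strict aliasing. */
struct una_u64 { uint64_t v; } __attribute__((packed, may_alias));

static inline uint64_t load_u64_unaligned(const void *p)
{
	return ((const struct una_u64 *) p)->v;
}

static inline void store_u64_unaligned(void *p, uint64_t v)
{
	((struct una_u64 *) p)->v = v;
}

/* Strictly portable alternative: a constant-size memcpy, which
 * compilers typically lower to equivalent instructions. */
static inline uint64_t load_u64_memcpy(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```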

Feedback is welcome,

Thanks,

Mathieu




Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-01-31 Thread Mathieu Desnoyers via lttng-dev

On 2023-01-31 11:18, Mathieu Desnoyers wrote:

On 2023-01-31 11:08, Mathieu Desnoyers wrote:

On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote:

Hello Mathieu!

I have looked at this in place of Anders and as far as I can tell
this is not an arm64 issue but an arm issue. And even on arm
__ARM_FEATURE_UNALIGNED is 1, so it seems the problem only occurs if
size equals 8.


So for ARM, perhaps we should do the following in include/lttng/ust-arch.h:

#if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options


Based on that documentation, it is possible to build with
-mno-unaligned-access, and for all pre-ARMv6, all ARMv6-M and for ARMv8-M
Baseline architectures, unaligned accesses are not enabled.

I would only push this kind of change into the master branch though, due to
its impact and the fact that this is only a performance improvement.


But setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for arm32
when __ARM_FEATURE_UNALIGNED is defined would still cause issues for
8-byte lttng_inline_memcpy with my proposed patch right ?

AFAIU 32-bit arm with __ARM_FEATURE_UNALIGNED supports unaligned 2- and
4-byte accesses, but somehow traps on unaligned 8-byte accesses?


Re-reading your analysis, I may have mistakenly concluded that using the
lttng ust ring buffer in "packed" mode would be faster than aligned mode 
on arm32 and aarch64, but that's not really what you have benchmarked there.


So forget what I said about setting 
LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS to 1 for arm32 and aarch64.


There is a distinction between having efficient unaligned access and
supporting unaligned accesses at all.

For aarch64, it appears to support unaligned accesses, but they may be
slower than aligned accesses AFAIU.

For arm32, unaligned 2- and 4-byte accesses are supported when
__ARM_FEATURE_UNALIGNED is set, but not 8-byte ones (they trap). It is
then not clear whether an unaligned 2- or 4-byte access is slower than
an aligned one.


At the end of the day, it's a trade-off between the compactness of the
generated trace data (alignment padding adds throughput overhead) and the
CPU time required to perform an unaligned access compared to an aligned one.
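To put a number on the compactness side of that trade-off, here is a small sketch computing the serialized size of a sequence of fields with and without natural alignment padding. It is a simplified model (each field aligned on its own size, record starting at offset 0), not the actual lttng-ust ring buffer layout code; align_up() and record_size() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/* Size of a record made of fields of the given sizes, either packed
 * back to back or with each field naturally aligned on its size. */
static size_t record_size(const size_t *field_sizes, size_t nr_fields,
			  bool aligned)
{
	size_t offset = 0;

	for (size_t i = 0; i < nr_fields; i++) {
		if (aligned)
			offset = align_up(offset, field_sizes[i]);
		offset += field_sizes[i];
	}
	return offset;
}
```

For a uint8_t, uint64_t, uint16_t, uint32_t sequence, the aligned layout takes 24 bytes versus 15 packed; the packed layout, however, places the 8-, 2- and 4-byte fields at unaligned offsets 1, 9 and 11.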


Thoughts ?

Thanks,

Mathieu



Thanks,

Mathieu





In addition, I did some performance testing of lttng_inline_memcpy by
extracting it and adding it to a simple test program. It appears that
the general performance increases on arm, arm64, arm on arm64
hardware and x86-64. But it also appears that on arm, if you end up in
memcpy, the old code where you call memcpy directly is actually
slightly faster.


Nothing unexpected here. Just make sure that your test program does not
call lttng_inline_memcpy with constant size values, which end up
optimizing away branches. In the context where lttng_inline_memcpy is
used, most of the time its arguments are not constants.
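One way to follow that advice in a micro-benchmark is to hide the copy sizes from the optimizer, for instance by reading them through a volatile object, so the compiler cannot constant-fold the length and drop the size-dispatch branches. The sketch assumes a hypothetical copy_under_test() standing in for the extracted lttng_inline_memcpy; timing code is omitted, only the volatile trick is the point.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the function being benchmarked (hypothetical). */
static void copy_under_test(void *dest, const void *src, size_t len)
{
	memcpy(dest, src, len);
}

/* Sizes are read through a volatile array: each load must really
 * happen at run time, so len is never a compile-time constant. */
static volatile size_t bench_sizes[] = { 1, 2, 4, 8, 3, 16 };

static size_t run_copies(unsigned char *dst, const unsigned char *src,
			 size_t iterations)
{
	size_t total = 0;
	size_t nr = sizeof(bench_sizes) / sizeof(bench_sizes[0]);

	for (size_t i = 0; i < iterations; i++) {
		size_t len = bench_sizes[i % nr];

		copy_under_test(dst, src, len);
		total += len;
	}
	return total;
}
```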



Skipping the memcpy fallback on arm for unaligned copies of sizes 2 
and 4 further improves the performance


This would be naturally done on your board if we conditionally
set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for
__ARM_FEATURE_UNALIGNED, right?

and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the 
best performance on arm64.


This could go into lttng-ust master branch as well, e.g.:

#if defined(LTTNG_UST_ARCH_AARCH64)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

Thanks!

Mathieu



Micke


Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-01-31 Thread Mathieu Desnoyers via lttng-dev

On 2023-01-31 11:08, Mathieu Desnoyers wrote:

On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote:

Hello Mathieu!

I have looked at this in place of Anders and as far as I can tell this
is not an arm64 issue but an arm issue. And even on arm
__ARM_FEATURE_UNALIGNED is 1, so it seems the problem only occurs if
size equals 8.


So for ARM, perhaps we should do the following in include/lttng/ust-arch.h:

#if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options


Based on that documentation, it is possible to build with
-mno-unaligned-access, and for all pre-ARMv6, all ARMv6-M and for ARMv8-M
Baseline architectures, unaligned accesses are not enabled.

I would only push this kind of change into the master branch though, due to
its impact and the fact that this is only a performance improvement.


But setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for arm32
when __ARM_FEATURE_UNALIGNED is defined would still cause issues for
8-byte lttng_inline_memcpy with my proposed patch right ?

AFAIU 32-bit arm with __ARM_FEATURE_UNALIGNED supports unaligned 2- and
4-byte accesses, but somehow traps on unaligned 8-byte accesses?

Thanks,

Mathieu





In addition I did some performance testing of lttng_inline_memcpy by 
extracting it and adding it to a simple test program. It appears that 
the general performance increases on arm, arm64, arm on arm64 hardware 
and x86-64. But it also appears that on arm if you end up in memcpy 
the old code where you call memcpy directly is actually slightly faster.


Nothing unexpected here. Just make sure that your test program does not
call lttng_inline_memcpy with constant size values, which end up
optimizing away branches. In the context where lttng_inline_memcpy is
used, most of the time its arguments are not constants.



Skipping the memcpy fallback on arm for unaligned copies of sizes 2 
and 4 further improves the performance


This would be naturally done on your board if we conditionally
set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for
__ARM_FEATURE_UNALIGNED, right?

and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the 
best performance on arm64.


This could go into lttng-ust master branch as well, e.g.:

#if defined(LTTNG_UST_ARCH_AARCH64)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

Thanks!

Mathieu



Micke


Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-01-31 Thread Mathieu Desnoyers via lttng-dev

On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote:

Hello Mathieu!

I have looked at this in place of Anders and as far as I can tell this is not
an arm64 issue but an arm issue. And even on arm __ARM_FEATURE_UNALIGNED is 1,
so it seems the problem only occurs if size equals 8.


So for ARM, perhaps we should do the following in include/lttng/ust-arch.h:

#if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options

Based on that documentation, it is possible to build with -mno-unaligned-access,
and for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures,
unaligned accesses are not enabled.

I would only push this kind of change into the master branch though, due to
its impact and the fact that this is only a performance improvement.



In addition I did some performance testing of lttng_inline_memcpy by extracting 
it and adding it to a simple test program. It appears that the general 
performance increases on arm, arm64, arm on arm64 hardware and x86-64. But it 
also appears that on arm if you end up in memcpy the old code where you call 
memcpy directly is actually slightly faster.


Nothing unexpected here. Just make sure that your test program does not
call lttng_inline_memcpy with constant size values, which end up
optimizing away branches. In the context where lttng_inline_memcpy is
used, most of the time its arguments are not constants.



Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4 
further improves the performance


This would be naturally done on your board if we conditionally
set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for __ARM_FEATURE_UNALIGNED
right ?

and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the best 
performance on arm64.

This could go into lttng-ust master branch as well, e.g.:

#if defined(LTTNG_UST_ARCH_AARCH64)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

Thanks!

Mathieu



Micke


Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-01-26 Thread Mathieu Desnoyers via lttng-dev

On 2023-01-26 14:32, Anders Wallin wrote:

Hi Mathieu,

I've retired and no longer have access to any aarch64 target to test it on.



Thanks for your reply Anders,

I've talked to Henrik and Pär today and they are already testing it out.

Enjoy your retirement :)

Best regards,

Mathieu



Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

2023-01-25 Thread Mathieu Desnoyers via lttng-dev

Hi Anders,

Sorry for the long delay on this one. Can you have a look at the following fix?

https://review.lttng.org/c/lttng-ust/+/9319 Fix: aarch64: do not perform unaligned stores

If it passes your testing, I'll merge this into lttng-ust.

Thanks,

Mathieu

On 2017-12-28 09:13, Anders Wallin wrote:

Hi Mathieu,

I finally got some time to dig into this issue. The crash only happens
when metadata is written AND the size of the metadata will end up in a
write that is 8, 4, 2 or 1 bytes long AND the source or destination is
not aligned correctly according to HW limitations. I have not found any
simple way to keep the performance enhancement code that is run most of
the time.

Maybe the metadata writes should have their own write function instead.

Here is an example of a crash (code is from lttng-ust 2.9.1 and 
lttng-tools 2.9.6) where the size is 8 bytes and the src address is 
unaligned at 0xf3b7eeb2;


#0  lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=) at 
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610

No locals.
#1  lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, 
config=0xf737c560 ) at 
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100

         __len = 8
         handle = 0xf3b2e0c0
         backend_pages = 
         chanb = 0xf3b2e2e0
         offset = 

#2  lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at 
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267

No locals.

#3  0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=out>, metadata_str=0xf3b7eeb2 "", len=) at 
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
         ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, 
data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size 
= 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, 
ctx_len = 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' times>, backend_pages = 0xf690c000}

         chan = 0xf3b2e4d8
         str = 0xf3b7eeb2 ""
         reserve_len = 8
         ret = 
         __func__ = '\000' 
         __PRETTY_FUNCTION__ = '\000' 

#4  0x000344cc in commit_one_metadata_packet 
(stream=stream@entry=0xf3b2e560) at ust-consumer.c:2206

         write_len = 
         ret = 
         __PRETTY_FUNCTION__ = "commit_one_metadata_packet"

#5  0x00036538 in lttng_ustconsumer_read_subbuffer 
(stream=stream@entry=0xf3b2e560, ctx=ctx@entry=0x25e6e8) at 
ust-consumer.c:2452

         len = 4096
         subbuf_size = 4093
         padding = 
         err = -11
         write_index = 1
         ret = 
         ustream = 
         index = {offset = 0, packet_size = 575697416355872, 
content_size = 17564043391468256584, timestamp_begin = 
17564043425827782792, timestamp_end = 34359738496,

Regards
Anders

On Fri, 24 Nov 2017 at 20:18, Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:


- On Nov 24, 2017, at 3:23 AM, Anders Wallin <walli...@gmail.com> wrote:

Hi,
Architectures that have memory alignment restrictions may/will fail with the
optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3.
Please revert the patch or make it x86-specific.


Hi Anders,

This was added in the development cycle of lttng-ust 2.9. We could perhaps
add a test on the pointer alignment for architectures that care about it,
and fall back to memcpy in those cases.

The revert approach would have been justified if this commit had been
backported as a "fix" to a stable branch, which is not the case here. We
should work on finding an acceptable solution that takes care of dealing
with unaligned pointers on architectures that care about the difference.
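A minimal sketch of the approach suggested above, assuming a function instead of the original macro: small power-of-two copies use a single load/store only when both pointers are naturally aligned, and every other case falls back to memcpy(). The names aligned_inline_memcpy(), both_aligned() and roundtrip_ok() are hypothetical; this is not the lttng-ust implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if both pointers are multiples of `align` (a power of two). */
static inline int both_aligned(const void *dest, const void *src, size_t align)
{
	return ((((uintptr_t) dest) | ((uintptr_t) src)) & (align - 1)) == 0;
}

/*
 * Hypothetical sketch: copy lengths 1, 2, 4 and 8 with a single load/store
 * only when both pointers are naturally aligned; otherwise use memcpy(),
 * which is safe for any alignment on every architecture.
 */
static inline void aligned_inline_memcpy(void *dest, const void *src, size_t n)
{
	switch (n) {
	case 1:
		*(uint8_t *) dest = *(const uint8_t *) src;
		return;
	case 2:
		if (both_aligned(dest, src, 2)) {
			*(uint16_t *) dest = *(const uint16_t *) src;
			return;
		}
		break;
	case 4:
		if (both_aligned(dest, src, 4)) {
			*(uint32_t *) dest = *(const uint32_t *) src;
			return;
		}
		break;
	case 8:
		if (both_aligned(dest, src, 8)) {
			*(uint64_t *) dest = *(const uint64_t *) src;
			return;
		}
		break;
	default:
		break;
	}
	memcpy(dest, src, n);	/* unaligned pointers or other lengths */
}

/* Demo helper: copy n bytes from a source shifted by `misalign` bytes. */
static int roundtrip_ok(size_t n, size_t misalign)
{
	unsigned char *src = malloc(n + misalign + 8);
	unsigned char *dest = malloc(n + 8);
	int ok;

	if (!src || !dest) {
		free(src);
		free(dest);
		return 0;
	}
	for (size_t i = 0; i < n + misalign + 8; i++)
		src[i] = (unsigned char) (i * 7 + 1);
	aligned_inline_memcpy(dest, src + misalign, n);
	ok = memcmp(dest, src + misalign, n) == 0;
	free(src);
	free(dest);
	return ok;
}
```

Testing both pointers with a single OR keeps the alignment check to one branch per copy, so the fast path stays cheap when buffers happen to be aligned.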

Thanks,

Mathieu



Regards

Anders Wallin


commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3
Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Date:   Sun Sep 25 12:31:11 2016 -0400

     Performance: implement lttng_inline_memcpy

     Because all length parameters received for serializing data coming from
     applications go through a callback, they are never constant, and it
     hurts performance to perform a call to memcpy each time.

     Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>

diff --git a/libringbuffer/backend_internal.h b/libringbuffer/backend_internal.h
index 90088b89..e597cf4d 100644
--- a/libringbuffer/backend_internal.h
+++ b/libringbuffer/backend_internal.h
@@ -592,6 +592,28 @@ int update_read_sb_index(const struct lttng_ust_lib_ring_buffer_config *config,
  #define inline_memcpy(dest, src, n)    

[lttng-dev] [RELEASE] LTTng-modules 2.12.12 and 2.13.8 (Linux kernel tracer)

2023-01-13 Thread Mathieu Desnoyers via lttng-dev

Hi,

These are stable release updates of the LTTng modules project.

The most relevant change is that 2.13.8 introduces support for the 6.1
Linux kernel. Both releases also update the kernel version ranges for RHEL
kernels and fix the kallsyms wrapper on ppc64el.

The LTTng modules provide Linux kernel tracing capability to the LTTng
tracer toolset.

* New in these releases:

2023-01-13 (National Sticker Day) LTTng modules 2.13.8
* fix: jbd2: use the correct print format
* Fix: in_x32_syscall was introduced in v4.7.0
* Explicitly skip tracing x32 system calls
* fix: kallsyms wrapper on ppc64el
* fix: Adjust ranges for RHEL 8.6 kernels
* fix: kvm-x86 requires CONFIG_KALLSYMS_ALL
* fix: mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using (v6.1)

2023-01-13 (National Sticker Day) LTTng modules 2.12.12
* fix: jbd2: use the correct print format
* Fix: in_x32_syscall was introduced in v4.7.0
* Explicitly skip tracing x32 system calls
* fix: kallsyms wrapper on ppc64el
* fix: Adjust ranges for RHEL 8.6 kernels
* fix: kvm-x86 requires CONFIG_KALLSYMS_ALL

Project website: https://lttng.org
Documentation: https://lttng.org/docs
Download link: https://lttng.org/download

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] LTTng UST structure support

2023-01-12 Thread Mathieu Desnoyers via lttng-dev

On 2023-01-09 09:02, chafraysse--- via lttng-dev wrote:

Hi,

I'm looking for a CTF writer to serialize instrumentation in an embedded
Linux/Rust framework.
LTTng UST looked like a very strong option, but I want to serialize
structures as CTF compound type structures, and I did not see those
supported in the docs or API.


This is correct. I am currently working on a new project called 
"libside" (see https://git.efficios.com/?p=libside.git;a=summary) which 
features support for compound types.


However, we still need to do the heavy-lifting implementation work of 
integrating this with LTTng-UST. This is the plan towards supporting 
compound types in LTTng-UST.



I'd love to have confirmation that I did not just miss something :)
If LTTng UST is out for me, I will probably try to use the ctf-writer
module of Babeltrace 2 instead.


For now, the ctf-writer module of bt2 would be an alternative to
consider, but remember that it is not designed for low-impact tracing
the way lttng-ust is. So it depends on how much tracer overhead/runtime
impact you can afford in your use case.


Thanks,

Mathieu



Best regards,

Charles


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



[lttng-dev] lttv: Document project status as unmaintained

2023-01-10 Thread Mathieu Desnoyers via lttng-dev

Hi Florian,

I'll pull your patch into the lttv master branch, but please be aware
that the LTTV project has not seen activity since 2013, is currently
unmaintained, and that we do not plan on doing further releases of this
project. Our efforts were diverted elsewhere on the trace analysis
front, namely into Trace Compass and Babeltrace.

In order to clarify the situation, I will introduce a commit into
LTTV's master branch which will remove Yannick Brosseau from the
maintainer role in the README file, and add this section at the
beginning:

PROJECT STATUS


The LTTV project is currently unmaintained. If you need up-to-date tools
to view/analyze LTTng traces, please consider the following alternatives:

- Trace Compass (https://www.eclipse.org/tracecompass)
- Babeltrace (https://babeltrace.org)


Thank you, Yannick, for stepping into the role of maintainer near the
end of this project's lifetime.

Michael Jeanson noticed that the lttv Fedora package was orphaned.
He just adopted it and is currently investigating the Fedora
documentation to figure out how to request its removal from Fedora.

For those interested in historical artifacts, I created the lttv
svn repository back in 2003 when I was sitting at the Decelles building
at Ecole Polytechnique, working for Prof. Michel Dagenais:

commit bbdf43d6e0e3bd3f9ade420e81915408cbe4fbba
Author: compudj 
Date:   Thu May 15 13:07:17 2003 +

Initial repository layout

git-svn-id: http://ltt.polymtl.ca/svn@1 04897980-b3bd-0310-b5e0-8ef037075253


This was the beginning of a fun ride which ended up motivating the
creation of LTTng, the Linux kernel Tracepoints, the Common Trace Format,
Trace Compass, Babeltrace, liburcu, and the membarrier(2) and rseq(2)
system calls.

LTTV had a good 10 years of activity from 2003 to 2013, but it is now high
time to redirect users to Trace Compass and Babeltrace instead.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] README issue of liburcu

2022-11-10 Thread Mathieu Desnoyers via lttng-dev

On 2022-11-02 03:24, Yongwei Wu via lttng-dev wrote:
I apologize if this is not the right place. I do not see an Issues page 
on GitHub.


The README on GitHub now says MacOS is among "Tested on", so should we 
remove Darwin from "Should also work on"?


Removed from the README file in the master branch.

Thanks,

Mathieu



Best regards,

Yongwei

--
Yongwei Wu
URL: http://wyw.dcweb.cn/ 



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



Re: [lttng-dev] [PATCH] always check pthread_create for failures

2022-10-03 Thread Mathieu Desnoyers via lttng-dev

On 2022-10-02 12:13, Eric Wong via lttng-dev wrote:

pthread_create may fail with EAGAIN (which is no fault of the
programmer), so don't allow the check to be compiled out.


Merged into master, stable-0.13, stable-0.12, thanks!

Mathieu



Signed-off-by: Eric Wong 
---
  src/urcu-defer-impl.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/urcu-defer-impl.h b/src/urcu-defer-impl.h
index 1c96287..cbb0ca8 100644
--- a/src/urcu-defer-impl.h
+++ b/src/urcu-defer-impl.h
@@ -417,7 +417,8 @@ static void start_defer_thread(void)
 	urcu_posix_assert(!ret);
 
 	ret = pthread_create(_defer, NULL, thr_defer, NULL);
-	urcu_posix_assert(!ret);
+	if (ret)
+		urcu_die(ret);
 
 	ret = pthread_sigmask(SIG_SETMASK, , NULL);
 	urcu_posix_assert(!ret);
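The patch swaps an assertion, which can be compiled out, for an unconditional check of the pthread_create() return value. A small standalone illustration of the same pattern, assuming a hypothetical die_on_error() helper in place of liburcu's internal urcu_die(); worker() and run_worker() are also illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for liburcu's internal urcu_die(): report, then abort. */
static void die_on_error(int err, const char *what)
{
	fprintf(stderr, "%s: %s\n", what, strerror(err));
	abort();
}

static void *worker(void *arg)
{
	*(int *) arg = 42;
	return NULL;
}

/*
 * pthread_create() returns an error number (e.g. EAGAIN) rather than
 * setting errno. An assert(!ret) check would vanish in a -DNDEBUG build,
 * so the return value is tested explicitly here.
 */
static int run_worker(void)
{
	pthread_t tid;
	int result = 0;
	int ret;

	ret = pthread_create(&tid, NULL, worker, &result);
	if (ret)
		die_on_error(ret, "pthread_create");
	ret = pthread_join(tid, NULL);
	if (ret)
		die_on_error(ret, "pthread_join");
	return result;	/* written by worker(), visible after the join */
}
```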



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


[lttng-dev] [RELEASE] LTTng-modules 2.13.7, 2.12.11 and LTTng-UST 2.13.5, 2.12.7

2022-09-30 Thread Mathieu Desnoyers via lttng-dev

Hi,

These bug fix releases of the LTTng kernel and user-space tracers
contain security fixes which address memory disclosure and denial of
service issues. These are of relatively low severity, mainly because they
require specific uses of the tracer by users who belong to the
`tracing` group.


Here is an explanation of the impact of each corrected issue. The
issues that have a security impact are tagged with [security].


The issues that were corrected in LTTng 2.12 were likely present in
older versions, which are no longer maintained. All users of
LTTng-modules and LTTng-UST should upgrade.


* Kernel tracer (LTTng-modules) 2.13.7:

[security] A user belonging to the `tracing` group can use the event 
notification capture or the filtering features to target a userspace 
string (e.g. pathname input field of the openat system call) while any 
user on the system feeds an invalid pointer or a pointer to kernel 
memory to the instrumented system call. This results in a kernel OOPS in 
case of an invalid pointer, or disclosure of kernel memory to the 
tracing group if the pointer targets a kernel memory address. This is 
corrected by properly keeping track of user pointers and using the 
appropriate methods to access userspace memory.


[security] A user belonging to the `tracing` group can use the event 
notification capture or the filtering features to target a userspace 
array of integers (e.g. fildes output field of the pipe2 system call) 
while any user on the system feeds an invalid pointer or a pointer to 
kernel memory to the instrumented system call. This results in a kernel 
OOPS in case of an invalid pointer, or disclosure of kernel memory to 
the tracing group if the pointer targets a kernel memory address. This 
is corrected by properly keeping track of user pointers and using the 
appropriate methods to access userspace memory.


[security] A `tracing` group user crafting an ill-intended event 
notification capture or filter bytecode can emit load and load-field-ref 
instructions which are already specialized for the wrong field type, 
thus bypassing the instruction selection performed by the bytecode 
linker and bytecode specialization phases. When combined with passing 
invalid or kernel memory pointers to userspace memory arguments (e.g. 
pathname input field of openat or fildes output field of pipe2), this 
can result in a kernel OOPS in case of an invalid pointer, or a 
disclosure of kernel memory to the tracing group if the pointer targets 
a kernel memory address. This is corrected by rejecting specialized load 
and load-field-ref instructions in the bytecode validation phase.
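As a rough illustration of this fix, the sketch rejects already-specialized load opcodes in user-supplied bytecode, so that only the linker and specialization phases can introduce them. The opcode names and validate_user_bytecode() are hypothetical and do not reproduce the LTTng-modules bytecode interpreter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical opcode set: generic loads, which the bytecode linker and
 * specializer rewrite based on the actual field type, and loads that are
 * already specialized for a given type.
 */
enum sketch_op {
	OP_LOAD_FIELD_REF,		/* generic: type resolved by the linker */
	OP_LOAD_FIELD,			/* generic */
	OP_LOAD_FIELD_REF_STRING,	/* specialized */
	OP_LOAD_FIELD_REF_S64,		/* specialized */
	OP_LOAD_FIELD_S64,		/* specialized */
	OP_RETURN,
};

/* Reject any opcode that only the linker/specializer may produce. */
static int validate_user_bytecode(const enum sketch_op *ops, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		switch (ops[i]) {
		case OP_LOAD_FIELD_REF:
		case OP_LOAD_FIELD:
		case OP_RETURN:
			break;
		case OP_LOAD_FIELD_REF_STRING:
		case OP_LOAD_FIELD_REF_S64:
		case OP_LOAD_FIELD_S64:
		default:
			return -EINVAL;
		}
	}
	return 0;
}
```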


Event notification capture fields that end up using more than 512 bytes 
of msgpack buffer space for a single event notification emit warnings in 
the kernel console and result in a corrupted msgpack buffer. This is 
fixed by emitting a "NIL" msgpack field rather than the field that would 
require too much space.
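A hypothetical sketch of this fallback, assuming a fixed 512-byte buffer per event notification: a string field is serialized as a msgpack str only if its header and payload fit in the remaining space; otherwise a msgpack nil (0xc0) is emitted. struct capture_buf and capture_string() are illustrative names, not LTTng-modules code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAPTURE_BUF_SIZE 512	/* per-notification budget from the release note */

/* Hypothetical bounded buffer for one event notification's captures. */
struct capture_buf {
	uint8_t data[CAPTURE_BUF_SIZE];
	size_t len;
};

/*
 * Append `s` as a msgpack str (fixstr, str8 or str16 format). If it does
 * not fit in the remaining space, append msgpack nil (0xc0) instead of
 * producing a corrupted buffer. Returns -1 only if not even nil fits.
 */
static int capture_string(struct capture_buf *buf, const char *s)
{
	size_t n = strlen(s);
	size_t hdr = n < 32 ? 1 : n < 256 ? 2 : 3;
	size_t avail = CAPTURE_BUF_SIZE - buf->len;

	if (hdr + n > avail) {
		if (avail < 1)
			return -1;
		buf->data[buf->len++] = 0xc0;			/* nil */
		return 0;
	}
	if (hdr == 1) {
		buf->data[buf->len++] = (uint8_t) (0xa0 | n);	/* fixstr */
	} else if (hdr == 2) {
		buf->data[buf->len++] = 0xd9;			/* str8 */
		buf->data[buf->len++] = (uint8_t) n;
	} else {
		buf->data[buf->len++] = 0xda;			/* str16, big-endian length */
		buf->data[buf->len++] = (uint8_t) (n >> 8);
		buf->data[buf->len++] = (uint8_t) n;
	}
	memcpy(&buf->data[buf->len], s, n);
	buf->len += n;
	return 0;
}
```

Checking the full encoded size before writing anything means a field is either emitted whole or replaced by nil, so later fields still decode correctly.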


When an event notification capture for a userspace string or a userspace 
integer triggers a page fault, emit a "NIL" msgpack field rather than an 
empty string or a zero-value integer.


Fix a kernel OOPS on powerpc64 when the lttng_tracer module initializes, 
because the do_get_kallsyms LTTng wrapper returns the address of the 
local entry point rather than the global entry point. This is corrected 
by adjusting the offset (+4 and then -4) to get the global entry point 
on PPC64_ELF_ABI_v2.



* Kernel tracer (LTTng-modules) 2.12.11:

[security] A user belonging to the `tracing` group can use the filtering 
feature to target a userspace array of integers (e.g. fildes output 
field of the pipe2 system call) while any user on the system feeds an 
invalid pointer or a pointer to kernel memory to the instrumented system 
call. This results in a kernel OOPS in case of an invalid pointer, or 
disclosure of kernel memory to the tracing group if the pointer targets 
a kernel memory address. This is corrected by properly keeping track of 
user pointers and using the appropriate methods to access userspace memory.


[security] A `tracing` group user crafting an ill-intended filter 
bytecode can emit load and load-field-ref instructions which are already 
specialized for the wrong field type, thus bypassing the instruction 
selection performed by the bytecode linker and bytecode specialization 
phases. When combined with passing invalid or kernel memory pointers to 
userspace memory arguments (e.g. pathname input field of openat or 
fildes output field of pipe2), this can result in a kernel OOPS in case 
of an invalid pointer, or a disclosure of kernel memory to the tracing 
group if the pointer targets a kernel memory address. This is corrected 
by rejecting specialized load and load-field-ref instructions in the 
bytecode validation phase.


Fix a kernel OOPS on powerpc64 when the lttng_tracer module initializes, 
because the do_get_kallsyms LTTng wrapper returns the address of the 
local entry 
