[lttng-dev] [PATCH 5/7] Use __atomic builtins to implement CMM_{LOAD, STORE}_SHARED

2023-03-17 Thread Ondřej Surý via lttng-dev
Instead of using CMM_ACCESS_ONCE() with memory barriers, use __atomic
builtins with relaxed memory ordering to implement CMM_LOAD_SHARED() and
CMM_STORE_SHARED().

Signed-off-by: Ondřej Surý 
---
 include/urcu/system.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/urcu/system.h b/include/urcu/system.h
index faae390..99e7443 100644
--- a/include/urcu/system.h
+++ b/include/urcu/system.h
@@ -26,7 +26,7 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)   CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)   __atomic_load_n(&(p), __ATOMIC_RELAXED)
 
 /*
  * Load a data from shared memory, doing a cache flush if required.
@@ -42,7 +42,7 @@
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)__extension__ ({ CMM_ACCESS_ONCE(x) = 
(v); })
+#define _CMM_STORE_SHARED(x, v)__atomic_store_n(&(x), (v), 
__ATOMIC_RELAXED)
 
 /*
  * Store v into x, where x is located in shared memory. Performs the
@@ -51,9 +51,8 @@
 #define CMM_STORE_SHARED(x, v) \
__extension__   \
({  \
-   __typeof__(x) _v = _CMM_STORE_SHARED(x, v); \
+   _CMM_STORE_SHARED(x, v);\
cmm_smp_wmc();  \
-   _v = _v;/* Work around clang "unused result" */ \
})
 
 #endif /* _URCU_SYSTEM_H */
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 7/7] Experiment: Add explicit memory barrier in free_completion()

2023-03-17 Thread Ondřej Surý via lttng-dev
FIXME: This is experiment that adds explicit memory barrier in the
free_completion in the workqueue.c, so ThreadSanitizer knows it's ok to
free the resources.

Signed-off-by: Ondřej Surý 
---
 src/workqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/workqueue.c b/src/workqueue.c
index 1039d72..f21907f 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -377,6 +377,7 @@ void free_completion(struct urcu_ref *ref)
struct urcu_workqueue_completion *completion;
 
completion = caa_container_of(ref, struct urcu_workqueue_completion, 
ref);
+   assert(!urcu_ref_get_unless_zero(&completion->ref));
free(completion);
 }
 
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 6/7] Fix: uatomic_or() need retyping to uintptr_t in rculfhash.c

2023-03-17 Thread Ondřej Surý via lttng-dev
When adding REMOVED_FLAG to the pointers in the rculfhash
implementation, retype the generic pointer to uintptr_t to fix the
compiler error.

Signed-off-by: Ondřej Surý 
---
 src/rculfhash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index b456415..863387e 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1198,7 +1198,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long size,
 * Knowing which wins the race will be known after the garbage
 * collection phase, stay tuned!
 */
-   uatomic_or(&node->next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)&node->next, REMOVED_FLAG);
/* We performed the (logical) deletion. */
 
/*
@@ -1441,7 +1441,7 @@ void remove_table_partition(struct cds_lfht *ht, unsigned 
long i,
dbg_printf("remove entry: order %lu index %lu hash %lu\n",
   i, j, j);
/* Set the REMOVED_FLAG to freeze the ->next for gc */
-   uatomic_or(&fini_bucket->next, REMOVED_FLAG);
+   uatomic_or((uintptr_t *)&fini_bucket->next, REMOVED_FLAG);
_cds_lfht_gc_bucket(parent_bucket, fini_bucket);
}
ht->flavor->read_unlock();
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 4/7] Replace the internal pointer manipulation with __atomic builtins

2023-03-17 Thread Ondřej Surý via lttng-dev
Instead of custom code, use the __atomic builtins to implement the
rcu_dereference(), rcu_cmpxchg_pointer(), rcu_xchg_pointer() and
rcu_assign_pointer().

Signed-off-by: Ondřej Surý 
---
 include/urcu/arch.h   | 20 +
 include/urcu/arch/alpha.h |  6 +++
 include/urcu/arch/arm.h   | 12 ++
 include/urcu/arch/mips.h  |  2 +
 include/urcu/arch/nios2.h |  2 +
 include/urcu/arch/ppc.h   |  6 +++
 include/urcu/arch/s390.h  |  2 +
 include/urcu/arch/sparc64.h   |  6 +++
 include/urcu/arch/x86.h   | 20 +
 include/urcu/static/pointer.h | 77 +++
 10 files changed, 90 insertions(+), 63 deletions(-)

diff --git a/include/urcu/arch.h b/include/urcu/arch.h
index d3914da..aec6fa1 100644
--- a/include/urcu/arch.h
+++ b/include/urcu/arch.h
@@ -21,6 +21,26 @@
 #ifndef _URCU_ARCH_H
 #define _URCU_ARCH_H
 
+#if !defined(__has_feature)
+#define __has_feature(x) 0
+#endif /* if !defined(__has_feature) */
+
+/* GCC defines __SANITIZE_ADDRESS__, so reuse the macro for clang */
+#if __has_feature(address_sanitizer)
+#define __SANITIZE_ADDRESS__ 1
+#endif /* if __has_feature(address_sanitizer) */
+
+#ifdef __SANITIZE_THREAD__
+/* FIXME: Somebody who understands the barriers should look into this */
+#define cmm_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_smp_mb() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#define cmm_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+#define cmm_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
+#define cmm_read_barrier_depends() __atomic_thread_fence(__ATOMIC_ACQ_REL)
+#endif
+
 /*
  * Architecture detection using compiler defines.
  *
diff --git a/include/urcu/arch/alpha.h b/include/urcu/arch/alpha.h
index dc33e28..84526ef 100644
--- a/include/urcu/arch/alpha.h
+++ b/include/urcu/arch/alpha.h
@@ -29,9 +29,15 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ ("mb":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()  __asm__ __volatile__ ("wmb":::"memory")
+#endif
+#ifndef cmm_read_barrier_depends
 #define cmm_read_barrier_depends() __asm__ __volatile__ ("mb":::"memory")
+#endif
 
 /*
  * On Linux, define the membarrier system call number if not yet available in
diff --git a/include/urcu/arch/arm.h b/include/urcu/arch/arm.h
index 54ca4fa..4950e13 100644
--- a/include/urcu/arch/arm.h
+++ b/include/urcu/arch/arm.h
@@ -42,16 +42,28 @@ extern "C" {
 /*
  * Issues full system DMB operation.
  */
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_rmb
 #define cmm_rmb()  __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
+#ifndef cmm_wmb
 #define cmm_wmb()  __asm__ __volatile__ ("dmb sy":::"memory")
+#endif
 
 /*
  * Issues DMB operation only to the inner shareable domain.
  */
+#ifndef cmm_smp_mb
 #define cmm_smp_mb()   __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()  __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
+#ifndef cmm_smp_wmb
 #define cmm_smp_wmb()  __asm__ __volatile__ ("dmb ish":::"memory")
+#endif
 
 #endif /* URCU_ARCH_ARMV7 */
 
diff --git a/include/urcu/arch/mips.h b/include/urcu/arch/mips.h
index ea5b7e9..b9ee021 100644
--- a/include/urcu/arch/mips.h
+++ b/include/urcu/arch/mips.h
@@ -30,11 +30,13 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   __asm__ __volatile__ (  \
"   .setmips2   \n" \
"   sync\n" \
"   .setmips0   \n" \
:::"memory")
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/nios2.h b/include/urcu/arch/nios2.h
index b4f3e50..5def45c 100644
--- a/include/urcu/arch/nios2.h
+++ b/include/urcu/arch/nios2.h
@@ -29,7 +29,9 @@
 extern "C" {
 #endif
 
+#ifndef cmm_mb
 #define cmm_mb()   cmm_barrier()
+#endif
 
 #ifdef __cplusplus
 }
diff --git a/include/urcu/arch/ppc.h b/include/urcu/arch/ppc.h
index 791529e..b8ec40d 100644
--- a/include/urcu/arch/ppc.h
+++ b/include/urcu/arch/ppc.h
@@ -48,7 +48,9 @@ extern "C" {
  * order cacheable and non-cacheable memory operations separately---i.e.
  * not the latter against the former.
  */
+#ifndef cmm_mb
 #define cmm_mb() __asm__ __volatile__ ("sync":::"memory")
+#endif
 
 /*
  * lwsync orders loads in cacheable memory with respect to other loads,
@@ -56,8 +58,12 @@ extern "C" {
  * Therefore, use it for barriers ordering accesses to cacheable memory
  * only.
  */
+#ifndef cmm_smp_rmb
 #define cmm_smp_rmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memory")
+#endif
+#ifndef cmm_smp_rmb
 #define cmm_smp_wmb()__asm__ __volatile__ (LWSYNC_OPCODE:::"memo

[lttng-dev] [PATCH 3/7] Use __atomic_thread_fence() for cmm_barrier()

2023-03-17 Thread Ondřej Surý via lttng-dev
Use __atomic_thread_fence(__ATOMIC_ACQ_REL) for cmm_barrier(), so
ThreadSanitizer can understand the memory synchronization.

FIXME: What should be the correct memory ordering here?

Signed-off-by: Ondřej Surý 
---
 include/urcu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/urcu/compiler.h b/include/urcu/compiler.h
index 2f32b38..ede909f 100644
--- a/include/urcu/compiler.h
+++ b/include/urcu/compiler.h
@@ -28,7 +28,8 @@
 #define caa_likely(x)  __builtin_expect(!!(x), 1)
 #define caa_unlikely(x)__builtin_expect(!!(x), 0)
 
-#definecmm_barrier()   __asm__ __volatile__ ("" : : : "memory")
+/* FIXME: What would be a correct memory ordering here? */
+#definecmm_barrier()   __atomic_signal_fence(__ATOMIC_ACQ_REL)
 
 /*
  * Instruct the compiler to perform only a single access to a variable
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 2/7] Use gcc __atomic builtis for implementation

2023-03-17 Thread Ondřej Surý via lttng-dev
Replace the custom assembly code in include/urcu/uatomic/ with __atomic
builtins provided by C11-compatible compiler.

Signed-off-by: Ondřej Surý 
---
 include/Makefile.am|  16 -
 include/urcu/uatomic.h |  84 +++--
 include/urcu/uatomic/aarch64.h |  41 ---
 include/urcu/uatomic/alpha.h   |  32 --
 include/urcu/uatomic/arm.h |  57 ---
 include/urcu/uatomic/gcc.h |  46 ---
 include/urcu/uatomic/generic.h | 613 ---
 include/urcu/uatomic/hppa.h|  10 -
 include/urcu/uatomic/ia64.h|  41 ---
 include/urcu/uatomic/m68k.h|  44 ---
 include/urcu/uatomic/mips.h|  32 --
 include/urcu/uatomic/nios2.h   |  32 --
 include/urcu/uatomic/ppc.h | 237 
 include/urcu/uatomic/riscv.h   |  44 ---
 include/urcu/uatomic/s390.h| 170 -
 include/urcu/uatomic/sparc64.h |  81 -
 include/urcu/uatomic/tile.h|  41 ---
 include/urcu/uatomic/x86.h | 646 -
 18 files changed, 53 insertions(+), 2214 deletions(-)
 delete mode 100644 include/urcu/uatomic/aarch64.h
 delete mode 100644 include/urcu/uatomic/alpha.h
 delete mode 100644 include/urcu/uatomic/arm.h
 delete mode 100644 include/urcu/uatomic/gcc.h
 delete mode 100644 include/urcu/uatomic/generic.h
 delete mode 100644 include/urcu/uatomic/hppa.h
 delete mode 100644 include/urcu/uatomic/ia64.h
 delete mode 100644 include/urcu/uatomic/m68k.h
 delete mode 100644 include/urcu/uatomic/mips.h
 delete mode 100644 include/urcu/uatomic/nios2.h
 delete mode 100644 include/urcu/uatomic/ppc.h
 delete mode 100644 include/urcu/uatomic/riscv.h
 delete mode 100644 include/urcu/uatomic/s390.h
 delete mode 100644 include/urcu/uatomic/sparc64.h
 delete mode 100644 include/urcu/uatomic/tile.h
 delete mode 100644 include/urcu/uatomic/x86.h

diff --git a/include/Makefile.am b/include/Makefile.am
index ba1fe60..53a28fd 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -59,24 +59,8 @@ nobase_include_HEADERS = \
urcu/syscall-compat.h \
urcu/system.h \
urcu/tls-compat.h \
-   urcu/uatomic/aarch64.h \
-   urcu/uatomic/alpha.h \
urcu/uatomic_arch.h \
-   urcu/uatomic/arm.h \
-   urcu/uatomic/gcc.h \
-   urcu/uatomic/generic.h \
urcu/uatomic.h \
-   urcu/uatomic/hppa.h \
-   urcu/uatomic/ia64.h \
-   urcu/uatomic/m68k.h \
-   urcu/uatomic/mips.h \
-   urcu/uatomic/nios2.h \
-   urcu/uatomic/ppc.h \
-   urcu/uatomic/riscv.h \
-   urcu/uatomic/s390.h \
-   urcu/uatomic/sparc64.h \
-   urcu/uatomic/tile.h \
-   urcu/uatomic/x86.h \
urcu/urcu-bp.h \
urcu/urcu-futex.h \
urcu/urcu.h \
diff --git a/include/urcu/uatomic.h b/include/urcu/uatomic.h
index 2fb5fd4..4dbef4c 100644
--- a/include/urcu/uatomic.h
+++ b/include/urcu/uatomic.h
@@ -22,37 +22,59 @@
 #define _URCU_UATOMIC_H
 
 #include 
+#include 
 
-#if defined(URCU_ARCH_X86)
-#include 
-#elif defined(URCU_ARCH_PPC)
-#include 
-#elif defined(URCU_ARCH_S390)
-#include 
-#elif defined(URCU_ARCH_SPARC64)
-#include 
-#elif defined(URCU_ARCH_ALPHA)
-#include 
-#elif defined(URCU_ARCH_IA64)
-#include 
-#elif defined(URCU_ARCH_ARM)
-#include 
-#elif defined(URCU_ARCH_AARCH64)
-#include 
-#elif defined(URCU_ARCH_MIPS)
-#include 
-#elif defined(URCU_ARCH_NIOS2)
-#include 
-#elif defined(URCU_ARCH_TILE)
-#include 
-#elif defined(URCU_ARCH_HPPA)
-#include 
-#elif defined(URCU_ARCH_M68K)
-#include 
-#elif defined(URCU_ARCH_RISCV)
-#include 
-#else
-#error "Cannot build: unrecognized architecture, see ."
-#endif
+#define UATOMIC_HAS_ATOMIC_BYTE
+#define UATOMIC_HAS_ATOMIC_SHORT
+
+#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)
+
+#define uatomic_read(addr) __atomic_load_n((addr), __ATOMIC_CONSUME)
+
+#define uatomic_xchg(addr, v) __atomic_exchange_n((addr), (v), 
__ATOMIC_ACQ_REL)
+
+#define uatomic_cmpxchg(addr, old, new) \
+   ({  
\
+   __typeof__(*(addr)) __old = old;
\
+   __atomic_compare_exchange_n(addr, &__old, new, 0,   
\
+   __ATOMIC_ACQ_REL, 
__ATOMIC_CONSUME);\
+   __old;  
\
+   })
+
+#define uatomic_add_return(addr, v) \
+   __atomic_add_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_add(addr, v) \
+   (void)__atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_sub_return(addr, v) \
+   __atomic_sub_fetch((addr), (v), __ATOMIC_ACQ_REL)
+
+#define uatomic_sub(addr, v) \
+   (void)__atomic_sub_fetch((addr), (v), __ATOMIC_RELAXED)
+
+#define uatomic_and(addr, mask) \
+   (void)__atomic_and_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define uatomic_or(addr, mask) \
+   (void)__atomic_or_fetch((addr), (mask), __ATOMIC_RELAXED)
+
+#define

[lttng-dev] [PATCH 1/7] Require __atomic builtins to build

2023-03-17 Thread Ondřej Surý via lttng-dev
Add autoconf checks for all __atomic builtins that urcu require, and
adjust the gcc and clang versions in the README.md.

Signed-off-by: Ondřej Surý 
---
 README.md| 33 +
 configure.ac | 15 +++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index ba5bb08..a65a07a 100644
--- a/README.md
+++ b/README.md
@@ -68,30 +68,15 @@ Should also work on:
 
 (more testing needed before claiming support for these OS).
 
-Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 or
-better.
-
-The C compiler used needs to support at least C99. The C++ compiler used
-needs to support at least C++11.
-
-The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
-supported, with the following exceptions:
-
-  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
-accesses to offsets in a TLS structure on 32-bit x86. These versions are
-therefore not compatible with `liburcu` on x86 32-bit
-(i386, i486, i586, i686).
-The problem has been reported to the GCC community:
-
-  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
-See 
-  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic builtins
-support. For ARM this was introduced with GCC 4.4:
-.
-  - Linux aarch64 depends on GCC 5.1 or better because prior versions
-perform unsafe access to deallocated stack.
-
-Clang version 3.0 (based on LLVM 3.0) is supported.
+Linux ARM depends on running a Linux kernel 2.6.15 or better.
+
+The C compiler used needs to support at least C99 and __atomic
+builtins. The C++ compiler used needs to support at least C++11
+and __atomic builtins.
+
+The GCC compiler versions 4.7 or better are supported.
+
+Clang version 3.1 (based on LLVM 3.1) is supported.
 
 Glibc >= 2.4 should work but the older version we test against is
 currently 2.17.
diff --git a/configure.ac b/configure.ac
index 909cf1d..cb7ba18 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,21 @@ AC_SEARCH_LIBS([clock_gettime], [rt], [
   AC_DEFINE([CONFIG_RCU_HAVE_CLOCK_GETTIME], [1], [clock_gettime() is 
detected.])
 ])
 
+# Require __atomic builtins
+AC_COMPILE_IFELSE(
+   [AC_LANG_PROGRAM(
+   [[int x, y;]],
+   [[__atomic_store_n(&x, 0, __ATOMIC_RELEASE);
+ __atomic_load_n(&x, __ATOMIC_CONSUME);
+ y = __atomic_exchange_n(&x, 1, __ATOMIC_ACQ_REL);
+ __atomic_compare_exchange_n(&x, &y, 0, 0, __ATOMIC_ACQ_REL, 
__ATOMIC_CONSUME);
+ __atomic_add_fetch(&x, 1, __ATOMIC_ACQ_REL);
+ __atomic_sub_fetch(&x, 1, __ATOMIC_ACQ_REL);
+ __atomic_and_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_or_fetch(&x, 0x01, __ATOMIC_ACQ_REL);
+ __atomic_thread_fence(__ATOMIC_ACQ_REL)]])],
+   [],
+   [AC_MSG_ERROR([The compiler does not support __atomic builtins])])
 
 ## ##
 ## Optional features selection ##
-- 
2.39.2

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH 0/7] Replace the custom code with gcc/clang __atomic builtins

2023-03-17 Thread Ondřej Surý via lttng-dev
Hi,

(this is my first time using git send-email, so I apologise in advance if
anything breaks).

Here's my attempt to convert the Userspace RCU to use __atomic builtins whenever
possible instead of custom assembly.

The __atomic builtins were first introduced in gcc 4.7.0 and clang 3.1.0.

Apart from simplifying the code, this should also help ThreadSanitizer to
understand the memory synchronization and report less (or no) warnings.

The code compiles and the tests passed (on amd64).

This is by no means complete, and most probably I missed or misunderstood
something, but it's a solid start, so I am submitting the patch set for
discussion and review.

Thanks,
--
Ondřej Surý 

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 13:02, Ondřej Surý wrote:

On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
wrote:

I would indeed like to remove all the custom atomics assembly code from liburcu 
now that there are good atomics support in the major compilers (gcc and clang).


Here's very preliminary implementation:

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/2

I just did something wrong somewhere along the path and it doesn't compile now,
but it did for me locally.

I am submitting this now as it's 18:00 Friday evening and my kids are starting 
to
be angry at me :).

This will need some more work - I think some of the cmm_ macros might be dropped
now, and somebody who does that more often than I should take a look at the 
memory
orderings.


A few comments:

cmm_barrier() should rather be __atomic_signal_fence().

Also I notice this macro pattern (coding style):

#define uatomic_set(addr, v) __atomic_store_n((addr), (v), __ATOMIC_RELEASE)

The extra parentheses for parameters are not needed, because the comma is pretty
much the last operator in terms of priority. The following would be preferred
specifically because those are separated by comma:

#define uatomic_set(addr, v) __atomic_store_n(addr, v, __ATOMIC_RELEASE)

Our memory barrier semantic are similar to the Linux kernel, where the following
imply ACQ_REL because they return something: cmpxchg, add_return, sub_return, 
xchg.

The rest (add, sub, and, or, inc, dec) are __ATOMIC_RELAXED. Note that
cmm_smp_mb__before/after_uatomic_*() need to be implemented as
__atomic_thread_fence(__ATOMIC_ACQ_REL).

There are some architectures where we will want to keep a specialized version
of those add, sub, and, or, inc, dec operations which include the ACQ_REL 
semantic,
e.g. x86, where this is implied by the LOCK prefix. For those the 
cmm_smp_mb__before/after_uatomic_*()
will be no-ops.

The CMM_STORE_SHARED is not meant to have a RELEASE semantic. It is meant to
update variables that don't need the release ordering. The ATOMIC_CONSUME was
not the intent at the CMM_LOAD_SHARED level neither.

(this is just from looking around at the patches, it would be better if we can 
have the
patches posted to the mailing list for further discussion)

Thanks!

Mathieu




Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Ondřej Surý via lttng-dev
> On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
> wrote:
> 
> I would indeed like to remove all the custom atomics assembly code from 
> liburcu now that there are good atomics support in the major compilers (gcc 
> and clang). 

Here's very preliminary implementation:

https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/2

I just did something wrong somewhere along the path and it doesn't compile now,
but it did for me locally.

I am submitting this now as it's 18:00 Friday evening and my kids are starting 
to
be angry at me :).

This will need some more work - I think some of the cmm_ macros might be dropped
now, and somebody who does that more often than I should take a look at the 
memory
orderings.

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Duncan Sands via lttng-dev

Hi Ondřej,


I have two questions:

1. I've tried to help TSAN by replacing the custom atomics with __atomic gcc
   primitives - that seems to work pretty well.  Note that using C11 stdatomics
   is frankly not possible here because it would require wrapping everything 
into
   _Atomic().


I'm using the attached patch which does much the same.

Ciao, Duncan.diff --git a/External/UserspaceRCU/userspace-rcu/include/urcu/system.h b/External/UserspaceRCU/userspace-rcu/include/urcu/system.h
index faae390554..82898d1dbe 100644
--- a/External/UserspaceRCU/userspace-rcu/include/urcu/system.h
+++ b/External/UserspaceRCU/userspace-rcu/include/urcu/system.h
@@ -26,34 +26,45 @@
  * Identify a shared load. A cmm_smp_rmc() or cmm_smp_mc() should come
  * before the load.
  */
-#define _CMM_LOAD_SHARED(p)	   CMM_ACCESS_ONCE(p)
+#define _CMM_LOAD_SHARED(p)	\
+	__extension__		\
+	({			\
+		__typeof__(p) v;\
+		__atomic_load(&p, &v, __ATOMIC_RELAXED);	\
+		v;		\
+	})
 
 /*
  * Load a data from shared memory, doing a cache flush if required.
  */
-#define CMM_LOAD_SHARED(p)			\
-	__extension__			\
-	({\
-		cmm_smp_rmc();		\
-		_CMM_LOAD_SHARED(p);	\
+#define CMM_LOAD_SHARED(p)	\
+	__extension__		\
+	({			\
+		__typeof__(p) v;\
+		__atomic_load(&p, &v, __ATOMIC_ACQUIRE);	\
+		v;		\
 	})
 
 /*
  * Identify a shared store. A cmm_smp_wmc() or cmm_smp_mc() should
  * follow the store.
  */
-#define _CMM_STORE_SHARED(x, v)	__extension__ ({ CMM_ACCESS_ONCE(x) = (v); })
+#define _CMM_STORE_SHARED(x, v)	\
+	__extension__		\
+	({			\
+		__typeof__(x) w = v;\
+		__atomic_store(&x, &w, __ATOMIC_RELAXED);	\
+	})
 
 /*
  * Store v into x, where x is located in shared memory. Performs the
  * required cache flush after writing. Returns v.
  */
-#define CMM_STORE_SHARED(x, v)		\
-	__extension__			\
-	({\
-		__typeof__(x) _v = _CMM_STORE_SHARED(x, v);		\
-		cmm_smp_wmc();		\
-		_v = _v;	/* Work around clang "unused result" */	\
+#define CMM_STORE_SHARED(x, v)	\
+	__extension__		\
+	({			\
+		__typeof__(x) w = v;\
+		__atomic_store(&x, &w, __ATOMIC_RELEASE);	\
 	})
 
 #endif /* _URCU_SYSTEM_H */
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Tracing a docker containerized java application from the host

2023-03-17 Thread Jérémie Galarneau via lttng-dev
I'm re-adding the list so every one can benefit from the exchange.


The agent and session daemon both attempt to communicate on "0.0.0.0" (the 
local host). I would assume you can expose/publish (in docker terms) the 
session daemon's port on the local interface too. I have not tried that though.

As for the contents of the /var/run/lttng folder, you ultimately have to make 
those files accessible to the applications. You can use the LTTNG_HOME 
environment variable to control where the session daemon creates those files 
and where the applications look for them. Then, adjust the container's mount 
points accordingly.

To record the events, the shared memory must be accessible to both ends (host 
and container). You will need to mount /dev/shm inside the container.

I think Michael's configuration for Kubernetes can help you understand how the 
various parts fit together:
https://github.com/mjeanson/gcloud-lttng/blob/master/pods/demo1.yaml.in

Jérémie

--
Jérémie Galarneau
EfficiOS Inc.
https://www.efficios.com

From: Adel Belkhiri 
Sent: March 16, 2023 19:23
To: Jérémie Galarneau 
Subject: Re: [lttng-dev] Tracing a docker containerized java application from 
the host

Hello again,

I solved the issue. The problem was that the host address from inside the 
container was 172.0.0.1 but the agent was looking for 127.0.0.1. So, one last 
question, is it possible to instruct the agent to look for/communicate with the 
sessiond using a specific address (different from 127.0.0.1)? Also, I'm 
currently sharing the folder "/var/run/lttng" with the container, which in my 
opinion isn't a very elegant way. Then, is it possible to do otherwise? If I 
only copy the file "agent.port" inside the container, I can list the registered 
events, but I cannot record them. (I guess it is related to the location of the 
app socket file).

Adel




On Thu, Mar 16, 2023 at 3:57 PM Adel Belkhiri 
mailto:adel.belkh...@gmail.com>> wrote:
Hi Jérémie,

Thank you for your reply. Yes, the container can communicate with the LTTtng 
session daemon that is running on the host machine. For instance, if I execute 
"lttng list -u" on the host I can see that there is an application registered 
and there are userspace events (e.g., lttng_jul:event,  statedumps, etc.). 
However, when I execute "lttng --jul -a", no java events are displayed.

Adel

On Thu, Mar 16, 2023 at 12:14 PM Jérémie Galarneau 
mailto:jga...@efficios.com>> wrote:
Hi Adel,

The java tracing facilities make use of an agent that communicates with the 
session deamon through a TCP socket. You must to ensure your java application 
can connect to the session daemon's 'agent-tcp-port'.

https://lttng.org/man/8/lttng-sessiond/v2.13/#doc-opt--agent-tcp-port

Is that port properly exposed by your container configuration?

Jérémie

--
Jérémie Galarneau
EfficiOS Inc.
https://www.efficios.com

From: lttng-dev 
mailto:lttng-dev-boun...@lists.lttng.org>> 
on behalf of Adel Belkhiri via lttng-dev 
mailto:lttng-dev@lists.lttng.org>>
Sent: March 15, 2023 20:03
To: lttng-dev@lists.lttng.org 
mailto:lttng-dev@lists.lttng.org>>
Subject: [lttng-dev] Tracing a docker containerized java application from the 
host

Hello everyone,

I am reaching out to seek assistance in tracing an instrumented Java 
application running within a Docker container from the host side.

In my setup, the LTTng session daemon, running on the host side, indicates that 
application registration has been completed (please refer to the logs). 
However, when running the command "lttng list --jul," no registered 
applications are displayed. Although the application successfully creates the 
"LttngLogHandler" object and attaches it to the logger, no traces are generated.

To allow the application to access the lttng socket file, the host's 
/var/run/lttng directory is shared with the container at launch (using the 
command "sudo docker run -p 8080:8080 -v /var/run/lttng:/var/run/lttng 
sample-image"). I am only interested in collecting userspace Java (--jul) 
traces.

I would greatly appreciate any assistance or guidance that you can provide. 
Thank you for your time.

Additional details include:

Ubuntu 22.04
Kernel: 5.15.0-60-generic
LTTng session daemon version: 2.13.2
Docker version 20.10.21, build 20.10.21-0ubuntu1~22.04.2
Regards,
Adel Belkhiri
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-17 11:50, Ondřej Surý wrote:

On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
wrote:

Sure, can you please submit the patch as a separate email with subject/commit 
message/signed-off-by tag ?



https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/1.patch

Would this work for you?

Or do you need to have the patch attached?


Having the patch attached (e.g. using git send-email) would be better, 
but I don't mind downloading the file for this time. Merged into liburcu 
master branch, thanks!


Mathieu



Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Ondřej Surý via lttng-dev
> On 17. 3. 2023, at 14:44, Mathieu Desnoyers  
> wrote:
> 
> Sure, can you please submit the patch as a separate email with subject/commit 
> message/signed-off-by tag ?


https://gitlab.isc.org/isc-projects/userspace-rcu/-/merge_requests/1.patch

Would this work for you?

Or do you need to have the patch attached?

Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] userspace-rcu and ThreadSanitizer

2023-03-17 Thread Mathieu Desnoyers via lttng-dev

On 2023-03-14 08:26, Ondřej Surý via lttng-dev wrote:

Hey,

we use ThreadSanitizer in BIND 9 CI quite extensively and with userspace-rcu
it's lit like an American house on Saturnalia ;).


Haha, I have no doubt about it. Userspace RCU is all about concurrent 
accesses, and so far possesses no TSAN annotations.




I have two questions:

1. I've tried to help TSAN by replacing the custom atomics with __atomic gcc
   primitives - that seems to work pretty well.  Note that using C11 stdatomics
   is frankly not possible here because it would require wrapping everything 
into
   _Atomic().


Agreed. gcc __atomic seems to be the way to go.



   Do you want me to contribute this back? And how should I plug this into the
   existing structure?  This touches:

include/urcu/static/pointer.h
include/urcu/system.h
include/urcu/uatomic.h


I would indeed like to remove all the custom atomics assembly code from 
liburcu now that there are good atomics support in the major compilers 
(gcc and clang). I am also tempted to bump the base compiler version 
requirements (for both gcc and clang) to something less ancient than 
what we have currently for the next liburcu releases if it helps us rely 
on non-buggy atomics implementations. The currently supported compilers 
are stated in the README.md file:


"Linux ARM depends on running a Linux kernel 2.6.15 or better, GCC 4.4 
or better.


The C compiler used needs to support at least C99. The C++ compiler used
needs to support at least C++11.

The GCC compiler versions 3.3, 3.4, 4.0, 4.1, 4.2, 4.3, 4.4 and 4.5 are
supported, with the following exceptions:

  - GCC 3.3 and 3.4 have a bug that prevents them from generating volatile
accesses to offsets in a TLS structure on 32-bit x86. These 
versions are

therefore not compatible with `liburcu` on x86 32-bit
(i386, i486, i586, i686).
The problem has been reported to the GCC community:

  - GCC 3.3 cannot match the "xchg" instruction on 32-bit x86 build.
See 
  - Alpha, ia64 and ARM architectures depend on GCC 4.x with atomic 
builtins

support. For ARM this was introduced with GCC 4.4:
.
  - Linux aarch64 depends on GCC 5.1 or better because prior versions
perform unsafe access to deallocated stack.

Clang version 3.0 (based on LLVM 3.0) is supported."

For gcc, I wonder if gcc-4.8 has appropriate support for __atomic on all 
supported architectures supported by liburcu ?


I also wonder what would be a good conservative baseline version for clang.

As we introduce a newer compiler baseline version, I would be tempted to 
add a compiler version detection in include/urcu/compiler.h and #warn 
whenever the compiler is too old. This is similar to what we do for the
compiler disallow list with URCU_GCC_VERSION, but enforced with a 
warning rather than a #error. The last thing I want is to end up wasting 
people's time due to compiling with a buggy old compiler, so I favor a 
"fail early" approach.





2. I know there's KTSAN, so it must work somehow, but was there any success
   on using ThreadSanitizer on projects using Userspace-RCU?  It mostly seems
   to highlight the CDS parts of the code.


Not AFAIK.



I can help TSAN to understand some of the code or suppress some of the warnings,
but I do want to prevent the code to be full of stuff like this:

static void
destroy_adbname_rcu_head(struct rcu_head *rcu_head) {
 dns_adbname_t *adbname = caa_container_of(rcu_head, dns_adbname_t,
   rcu_head);

#ifdef __SANITIZE_THREAD__
 SPINLOCK(&adbname->lock);
 SPINUNLOCK(&adbname->lock);
#endif

 destroy_adbname(adbname);
}


Indeed, we'd want to improve the liburcu header files and implementation 
by adding the appropriate annotation there.




I am absolutely sure that the adbname can be destroyed here (because of the
reference counting), but TSAN had a problem with it. Doing the "fake" barrier
with a spinlock here made it stop consider this to be a data race.

I also had to disable the auto_resize of cds_lfht when running under TSAN.

I am also worried that by hiding some code from TSAN, we might miss a legitimate
error.

All I found using Google was this notice from 2014:
https://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg05121.html

and perhaps this:
https://github.com/google/sanitizers/issues/1415

(Perhaps, I should look into annotating urcu code with TSAN annotations?)


Yes, I suspect we'll want to add TSAN annotation to liburcu code, and 
perhaps Helgrind and DRD annotations as well while we are at it. Those 
tools are very valuable development tools, which makes it worthwhile to 
add the relevant annotations to help them figure out liburcu intricacies.






3. As an extra bonus, this is going to be needed with clang-17 as noreturn is 
now
reserved w