date:20240710

Re: [PATCH] test/vsock: add install target

2024-07-10 Thread Jakub Kicinski

On Wed, 10 Jul 2024 13:58:39 +0200 Stefano Garzarella wrote:
> There is a comment there:
> 
>  # Avoid changing the rest of the logic here and lib.mk.
> 
> Added by commit 17eac6c2db8b2cdfe33d40229bdda2acd86b304a.
> 
> IIUC they re-used INSTALL_PATH, just to avoid too many changes in that 
> file and in tools/testing/selftests/lib.mk
> 
> So, IMHO we should not care about it and only use VSOCK_INSTALL_PATH if 
> you don't want to conflict with INSTALL_PATH.

Any reason why vsock isn't part of selftests in the first place?

Re: [PATCH] bootconfig: Remove duplicate included header file linux/bootconfig.h

2024-07-10 Thread Google

On Thu, 11 Jul 2024 02:21:53 +0200
Thorsten Blum  wrote:

> The header file linux/bootconfig.h is included whether __KERNEL__ is
> defined or not.
> 
> Include it only once before the #ifdef/#else/#endif preprocessor
> directives and remove the following make includecheck warning:
> 
>   linux/bootconfig.h is included more than once
> 

OK, but this leaves only a comment in !__KERNEL__ part.
That comment should be moved and updated too. Please move this on the top of
this file and remove `!__KERNEL__` part.

/*
 * NOTE: This file is also used by tools/bootconfig, because tools/bootconfig
 * will run the parser for sanity test.
 * This does NOT mean lib/bootconfig.c is always available in the user space.
 * However, if you change this file, please make sure the tools/bootconfig
 * has no issue on building and running.
 */

Thank you,

> Signed-off-by: Thorsten Blum 
> ---
>  lib/bootconfig.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index 97f8911ea339..297871455db5 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -4,8 +4,9 @@
>   * Masami Hiramatsu 
>   */
>  
> -#ifdef __KERNEL__
>  #include 
> +
> +#ifdef __KERNEL__
>  #include 
>  #include 
>  #include 
> @@ -33,7 +34,6 @@ const char * __init xbc_get_embedded_bootconfig(size_t 
> *size)
>   * However, if you change this file, please make sure the tools/bootconfig
>   * has no issue on building and running.
>   */
> -#include 
>  #endif
>  
>  /*
> -- 
> 2.45.2
> 

-- 
Masami Hiramatsu (Google)

[ANNOUNCE] 5.10.220-rt112

2024-07-10 Thread Luis Claudio R. Goncalves

Hello RT-list!

I'm pleased to announce the 5.10.220-rt112 stable release.

This release is just an update to the new stable 5.10.220
version and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v5.10-rt
  Head SHA1: b7364cea19a02eda9039a2d34c5f105d77f4697a

Or to build 5.10.220-rt112 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz

  https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.220.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.220-rt112.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis

[PATCH] bootconfig: Remove duplicate included header file linux/bootconfig.h

2024-07-10 Thread Thorsten Blum

The header file linux/bootconfig.h is included whether __KERNEL__ is
defined or not.

Include it only once before the #ifdef/#else/#endif preprocessor
directives and remove the following make includecheck warning:

  linux/bootconfig.h is included more than once

Signed-off-by: Thorsten Blum 
---
 lib/bootconfig.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 97f8911ea339..297871455db5 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -4,8 +4,9 @@
  * Masami Hiramatsu 
  */
 
-#ifdef __KERNEL__
 #include 
+
+#ifdef __KERNEL__
 #include 
 #include 
 #include 
@@ -33,7 +34,6 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
  * However, if you change this file, please make sure the tools/bootconfig
  * has no issue on building and running.
  */
-#include 
 #endif
 
 /*
-- 
2.45.2

[PATCH v2 4/4] trace: platform/x86/intel/ifs: Add SBAF trace support

2024-07-10 Thread Kuppuswamy Sathyanarayanan

From: Jithu Joseph 

Add tracing support for the SBAF IFS tests, which may be useful for
debugging systems that fail these tests. Log details like test content
batch number, SBAF bundle ID, program index and the exact errors or
warnings encountered by each HT thread during the test.

Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Signed-off-by: Jithu Joseph 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 include/trace/events/intel_ifs.h | 27 
 drivers/platform/x86/intel/ifs/runtest.c |  1 +
 2 files changed, 28 insertions(+)

diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h
index 0d88ebf2c980..70323acde1de 100644
--- a/include/trace/events/intel_ifs.h
+++ b/include/trace/events/intel_ifs.h
@@ -35,6 +35,33 @@ TRACE_EVENT(ifs_status,
__entry->status)
 );
 
+TRACE_EVENT(ifs_sbaf,
+
+   TP_PROTO(int batch, union ifs_sbaf activate, union ifs_sbaf_status 
status),
+
+   TP_ARGS(batch, activate, status),
+
+   TP_STRUCT__entry(
+   __field(u64,status  )
+   __field(int,batch   )
+   __field(u16,bundle  )
+   __field(u16,pgm )
+   ),
+
+   TP_fast_assign(
+   __entry->status = status.data;
+   __entry->batch  = batch;
+   __entry->bundle = activate.bundle_idx;
+   __entry->pgm= activate.pgm_idx;
+   ),
+
+   TP_printk("batch: 0x%.2x, bundle_idx: 0x%.4x, pgm_idx: 0x%.4x, status: 
0x%.16llx",
+   __entry->batch,
+   __entry->bundle,
+   __entry->pgm,
+   __entry->status)
+);
+
 #endif /* _TRACE_IFS_H */
 
 /* This part must be outside protection */
diff --git a/drivers/platform/x86/intel/ifs/runtest.c 
b/drivers/platform/x86/intel/ifs/runtest.c
index a812ec7761b9..2ec9a8c93d04 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -530,6 +530,7 @@ static int dosbaf(void *data)
 */
wrmsrl(MSR_ACTIVATE_SBAF, run_params->activate->data);
rdmsrl(MSR_SBAF_STATUS, status.data);
+   trace_ifs_sbaf(ifsd->cur_batch, *run_params->activate, status);
 
/* Pass back the result of the test */
if (cpu == first)
-- 
2.25.1

[PATCH v2 3/4] platform/x86/intel/ifs: Add SBAF test support

2024-07-10 Thread Kuppuswamy Sathyanarayanan

From: Jithu Joseph 

In a core, the SBAF test engine is shared between sibling CPUs.

An SBAF test image contains multiple bundles. Each bundle is further
composed of subunits called programs. When a SBAF test (for a particular
core) is triggered by the user, each SBAF bundle from the loaded test
image is executed sequentially on all the threads on the core using
the stop_core_cpuslocked mechanism. Each bundle execution is initiated by
writing to MSR_ACTIVATE_SBAF.

SBAF test bundle execution may be aborted when an interrupt occurs or
if the CPU does not have enough power budget for the test. In these
cases the kernel restarts the test from the aborted bundle. SBAF
execution is not retried if the test fails or if the test makes no
forward progress after 5 retries.

Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Signed-off-by: Jithu Joseph 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/platform/x86/intel/ifs/ifs.h |  30 +++
 drivers/platform/x86/intel/ifs/runtest.c | 234 +++
 2 files changed, 264 insertions(+)

diff --git a/drivers/platform/x86/intel/ifs/ifs.h 
b/drivers/platform/x86/intel/ifs/ifs.h
index 600bb8a1b285..b261be46bce8 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -157,6 +157,8 @@
 #define MSR_SBAF_HASHES_STATUS 0x02b9
 #define MSR_AUTHENTICATE_AND_COPY_SBAF_CHUNK   0x02ba
 #define MSR_SBAF_CHUNKS_AUTHENTICATION_STATUS  0x02bb
+#define MSR_ACTIVATE_SBAF  0x02bc
+#define MSR_SBAF_STATUS0x02bd
 
 #define MSR_COPY_SCAN_HASHES   0x02c2
 #define MSR_SCAN_HASHES_STATUS 0x02c3
@@ -283,6 +285,34 @@ union ifs_array {
};
 };
 
+/* MSR_ACTIVATE_SBAF bit fields */
+union ifs_sbaf {
+   u64 data;
+   struct {
+   u32 bundle_idx  :9;
+   u32 rsvd1   :5;
+   u32 pgm_idx :2;
+   u32 rsvd2   :16;
+   u32 delay   :31;
+   u32 sigmce  :1;
+   };
+};
+
+/* MSR_SBAF_STATUS bit fields */
+union ifs_sbaf_status {
+   u64 data;
+   struct {
+   u32 bundle_idx  :9;
+   u32 rsvd1   :5;
+   u32 pgm_idx :2;
+   u32 rsvd2   :16;
+   u32 error_code  :8;
+   u32 rsvd3   :21;
+   u32 test_fail   :1;
+   u32 sbaf_status :2;
+   };
+};
+
 /*
  * Driver populated error-codes
  * 0xFD: Test timed out before completing all the chunks.
diff --git a/drivers/platform/x86/intel/ifs/runtest.c 
b/drivers/platform/x86/intel/ifs/runtest.c
index 282e4bfe30da..a812ec7761b9 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -29,6 +29,13 @@ struct run_params {
union ifs_status status;
 };
 
+struct sbaf_run_params {
+   struct ifs_data *ifsd;
+   int *retry_cnt;
+   union ifs_sbaf *activate;
+   union ifs_sbaf_status status;
+};
+
 /*
  * Number of TSC cycles that a logical CPU will wait for the other
  * logical CPU on the core in the WRMSR(ACTIVATE_SCAN).
@@ -146,6 +153,7 @@ static bool can_restart(union ifs_status status)
 #define SPINUNIT 100 /* 100 nsec */
 static atomic_t array_cpus_in;
 static atomic_t scan_cpus_in;
+static atomic_t sbaf_cpus_in;
 
 /*
  * Simplified cpu sibling rendezvous loop based on microcode loader 
__wait_for_cpus()
@@ -387,6 +395,226 @@ static void ifs_array_test_gen1(int cpu, struct device 
*dev)
ifsd->status = SCAN_TEST_PASS;
 }
 
+#define SBAF_STATUS_PASS   0
+#define SBAF_STATUS_SIGN_FAIL  1
+#define SBAF_STATUS_INTR   2
+#define SBAF_STATUS_TEST_FAIL  3
+
+enum sbaf_status_err_code {
+   IFS_SBAF_NO_ERROR   = 0,
+   IFS_SBAF_OTHER_THREAD_COULD_NOT_JOIN= 1,
+   IFS_SBAF_INTERRUPTED_BEFORE_RENDEZVOUS  = 2,
+   IFS_SBAF_UNASSIGNED_ERROR_CODE3 = 3,
+   IFS_SBAF_INVALID_BUNDLE_INDEX   = 4,
+   IFS_SBAF_MISMATCH_ARGS_BETWEEN_THREADS  = 5,
+   IFS_SBAF_CORE_NOT_CAPABLE_CURRENTLY = 6,
+   IFS_SBAF_UNASSIGNED_ERROR_CODE7 = 7,
+   IFS_SBAF_EXCEED_NUMBER_OF_THREADS_CONCURRENT= 8,
+   IFS_SBAF_INTERRUPTED_DURING_EXECUTION   = 9,
+   IFS_SBAF_INVALID_PROGRAM_INDEX  = 0xA,
+   IFS_SBAF_CORRUPTED_CHUNK= 0xB,
+   IFS_SBAF_DID_NOT_START  = 0xC,
+};
+
+static const char * const sbaf_test_status[] = {
+   [IFS_SBAF_NO_ERROR] = "SBAF no error",
+   [IFS_SBAF_OTHER_THREAD_COULD_NOT_JOIN] = "Other thread could not join.",
+

[PATCH v2 2/4] platform/x86/intel/ifs: Add SBAF test image loading support

2024-07-10 Thread Kuppuswamy Sathyanarayanan

From: Jithu Joseph 

Structural Based Functional Test at Field (SBAF) is a new type of
testing that provides comprehensive core test coverage complementing
existing IFS tests like Scan at Field (SAF) or ArrayBist.

SBAF device will appear as a new device instance (intel_ifs_2) under
/sys/devices/virtual/misc. The user interaction necessary to load the
test image and test a particular core is the same as the existing scan
test (intel_ifs_0).

During the loading stage, the driver will look for a file named
ff-mm-ss-.sbft in the /lib/firmware/intel/ifs_2 directory.
The hardware interaction needed for loading the image is similar to
SAF, with the only difference being the MSR addresses used. Reuse the
SAF image loading code, passing the SBAF-specific MSR addresses via
struct ifs_test_msrs in the driver device data.

Unlike SAF, the SBAF test image chunks are further divided into smaller
logical entities called bundles. Since the SBAF test is initiated per
bundle, cache the maximum number of bundles in the current image, which
is used for iterating through bundles during SBAF test execution.

Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Signed-off-by: Jithu Joseph 
Co-developed-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 arch/x86/include/asm/msr-index.h  |  2 ++
 drivers/platform/x86/intel/ifs/ifs.h  | 37 ++-
 drivers/platform/x86/intel/ifs/core.c | 24 +
 drivers/platform/x86/intel/ifs/load.c | 15 ---
 4 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e022e6eb766c..503d7acdda3f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -247,6 +247,8 @@
 #define MSR_INTEGRITY_CAPS_ARRAY_BIST  
BIT(MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT)
 #define MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT   4
 #define MSR_INTEGRITY_CAPS_PERIODIC_BIST   
BIT(MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT)
+#define MSR_INTEGRITY_CAPS_SBAF_BIT8
+#define MSR_INTEGRITY_CAPS_SBAF
BIT(MSR_INTEGRITY_CAPS_SBAF_BIT)
 #define MSR_INTEGRITY_CAPS_SAF_GEN_MASKGENMASK_ULL(10, 9)
 
 #define MSR_LBR_NHM_FROM   0x0680
diff --git a/drivers/platform/x86/intel/ifs/ifs.h 
b/drivers/platform/x86/intel/ifs/ifs.h
index 738cbc7a5d00..600bb8a1b285 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -126,11 +126,38 @@
  * The driver does not make use of this, it only tests one core at a time.
  *
  * .. [#f1] https://github.com/intel/TBD
+ *
+ *
+ * Structural Based Functional Test at Field (SBAF):
+ * 
+ *
+ * SBAF is a new type of testing that provides comprehensive core test
+ * coverage complementing Scan at Field (SAF) testing. SBAF mimics the
+ * manufacturing screening environment and leverages the same test suite.
+ * It makes use of Design For Test (DFT) observation sites and features
+ * to maximize coverage in minimum time.
+ *
+ * Similar to the SAF test, SBAF isolates the core under test from the
+ * rest of the system during execution. Upon completion, the core
+ * seamlessly resets to its pre-test state and resumes normal operation.
+ * Any machine checks or hangs encountered during the test are confined to
+ * the isolated core, preventing disruption to the overall system.
+ *
+ * Like the SAF test, the SBAF test is also divided into multiple batches,
+ * and each batch test can take hundreds of milliseconds (100-200 ms) to
+ * complete. If such a lengthy interruption is undesirable, it is
+ * recommended to relocate the time-sensitive applications to other cores.
  */
 #include 
 #include 
 
 #define MSR_ARRAY_BIST 0x0105
+
+#define MSR_COPY_SBAF_HASHES   0x02b8
+#define MSR_SBAF_HASHES_STATUS 0x02b9
+#define MSR_AUTHENTICATE_AND_COPY_SBAF_CHUNK   0x02ba
+#define MSR_SBAF_CHUNKS_AUTHENTICATION_STATUS  0x02bb
+
 #define MSR_COPY_SCAN_HASHES   0x02c2
 #define MSR_SCAN_HASHES_STATUS 0x02c3
 #define MSR_AUTHENTICATE_AND_COPY_CHUNK0x02c4
@@ -140,6 +167,7 @@
 #define MSR_ARRAY_TRIGGER  0x02d6
 #define MSR_ARRAY_STATUS   0x02d7
 #define MSR_SAF_CTRL   0x04f0
+#define MSR_SBAF_CTRL  0x04f8
 
 #define SCAN_NOT_TESTED0
 #define SCAN_TEST_PASS 1
@@ -147,6 +175,7 @@
 
 #define IFS_TYPE_SAF   0
 #define IFS_TYPE_ARRAY_BIST1
+#define IFS_TYPE_SBAF  2
 
 #define ARRAY_GEN0 0
 #define ARRAY_GEN1 1
@@ -196,7 +225,8 @@ union ifs_chunks_auth_status_gen2 {
u16 valid_chunks;
u16 total_chunks;

[PATCH v2 1/4] platform/x86/intel/ifs: Refactor MSR usage in IFS test code

2024-07-10 Thread Kuppuswamy Sathyanarayanan

IFS tests such as Scan at Field (SAF) or Structural Based Functional
Test at Field (SBAF), require the user to load a test image. The image
loading process is similar across these tests, with the only difference
being MSR addresses used. To reuse the code between these tests, remove
the hard coding of MSR addresses and allow the driver to pass the MSR
addresses per IFS test (via driver device data).

Add a new structure named "struct ifs_test_msrs" to specify the
test-specific MSR addresses. Each IFS test will provide this structure,
enabling them to reuse the common code.

This is a preliminary patch in preparation for the addition of SBAF
support.

Reviewed-by: Ashok Raj 
Reviewed-by: Tony Luck 
Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/platform/x86/intel/ifs/ifs.h  | 25 +
 drivers/platform/x86/intel/ifs/core.c |  9 +
 drivers/platform/x86/intel/ifs/load.c | 24 ++--
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/platform/x86/intel/ifs/ifs.h 
b/drivers/platform/x86/intel/ifs/ifs.h
index 56b9f3e3cf76..738cbc7a5d00 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -266,6 +266,22 @@ struct ifs_test_caps {
int test_num;
 };
 
+/**
+ * struct ifs_test_msrs - MSRs used in IFS tests
+ * @copy_hashes: Copy test hash data
+ * @copy_hashes_status: Status of copied test hash data
+ * @copy_chunks: Copy chunks of the test data
+ * @copy_chunks_status: Status of the copied test data chunks
+ * @test_ctrl: Control the test attributes
+ */
+struct ifs_test_msrs {
+   u32 copy_hashes;
+   u32 copy_hashes_status;
+   u32 copy_chunks;
+   u32 copy_chunks_status;
+   u32 test_ctrl;
+};
+
 /**
  * struct ifs_data - attributes related to intel IFS driver
  * @loaded_version: stores the currently loaded ifs image version.
@@ -299,6 +315,7 @@ struct ifs_work {
 
 struct ifs_device {
const struct ifs_test_caps *test_caps;
+   const struct ifs_test_msrs *test_msrs;
struct ifs_data rw_data;
struct miscdevice misc;
 };
@@ -319,6 +336,14 @@ static inline const struct ifs_test_caps 
*ifs_get_test_caps(struct device *dev)
return d->test_caps;
 }
 
+static inline const struct ifs_test_msrs *ifs_get_test_msrs(struct device *dev)
+{
+   struct miscdevice *m = dev_get_drvdata(dev);
+   struct ifs_device *d = container_of(m, struct ifs_device, misc);
+
+   return d->test_msrs;
+}
+
 extern bool *ifs_pkg_auth;
 int ifs_load_firmware(struct device *dev);
 int do_core_test(int cpu, struct device *dev);
diff --git a/drivers/platform/x86/intel/ifs/core.c 
b/drivers/platform/x86/intel/ifs/core.c
index 7b11198d85a1..1a7ca74abb61 100644
--- a/drivers/platform/x86/intel/ifs/core.c
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -40,9 +40,18 @@ static const struct ifs_test_caps array_test = {
.test_num = IFS_TYPE_ARRAY_BIST,
 };
 
+static const struct ifs_test_msrs scan_msrs = {
+   .copy_hashes = MSR_COPY_SCAN_HASHES,
+   .copy_hashes_status = MSR_SCAN_HASHES_STATUS,
+   .copy_chunks = MSR_AUTHENTICATE_AND_COPY_CHUNK,
+   .copy_chunks_status = MSR_CHUNKS_AUTHENTICATION_STATUS,
+   .test_ctrl = MSR_SAF_CTRL,
+};
+
 static struct ifs_device ifs_devices[] = {
[IFS_TYPE_SAF] = {
.test_caps = _test,
+   .test_msrs = _msrs,
.misc = {
.name = "intel_ifs_0",
.minor = MISC_DYNAMIC_MINOR,
diff --git a/drivers/platform/x86/intel/ifs/load.c 
b/drivers/platform/x86/intel/ifs/load.c
index 39f19cb51749..ad0c107f0922 100644
--- a/drivers/platform/x86/intel/ifs/load.c
+++ b/drivers/platform/x86/intel/ifs/load.c
@@ -118,15 +118,17 @@ static void copy_hashes_authenticate_chunks(struct 
work_struct *work)
union ifs_scan_hashes_status hashes_status;
union ifs_chunks_auth_status chunk_status;
struct device *dev = local_work->dev;
+   const struct ifs_test_msrs *msrs;
int i, num_chunks, chunk_size;
struct ifs_data *ifsd;
u64 linear_addr, base;
u32 err_code;
 
ifsd = ifs_get_data(dev);
+   msrs = ifs_get_test_msrs(dev);
/* run scan hash copy */
-   wrmsrl(MSR_COPY_SCAN_HASHES, ifs_hash_ptr);
-   rdmsrl(MSR_SCAN_HASHES_STATUS, hashes_status.data);
+   wrmsrl(msrs->copy_hashes, ifs_hash_ptr);
+   rdmsrl(msrs->copy_hashes_status, hashes_status.data);
 
/* enumerate the scan image information */
num_chunks = hashes_status.num_chunks;
@@ -147,8 +149,8 @@ static void copy_hashes_authenticate_chunks(struct 
work_struct *work)
linear_addr = base + i * chunk_size;
linear_addr |= i;
 
-   wrmsrl(MSR_AUTHENTICATE_AND_COPY_CHUNK, linear_addr);
-   rdmsrl(MSR_CHUNKS_AUTHENTICATION_STATUS, chunk_status.data);
+   wrmsrl(msrs->copy_chunks,

[PATCH v2 0/4] Add SBAF test to IFS

2024-07-10 Thread Kuppuswamy Sathyanarayanan

This patch series adds support for Structural Based Functional Test at
Field (SBAF) in the IFS driver. SBAF is a new type of testing that
provides comprehensive core test coverage, complementing existing IFS
tests like Scan at Field (SAF) and ArrayBist. Granite Rapids (GNR) is
the first platform that supports SBAF.

SBAF mimics the manufacturing screening environment and leverages the
same test suite. It makes use of Design For Test (DFT) observation
sites and features to maximize coverage in minimum time.

Similar to the SAF test, SBAF isolates the core under test from the
rest of the system during execution. Upon completion, the core
seamlessly resets to its pre-test state and resumes normal operation.
Any machine checks or hangs encountered during the test are confined to
the isolated core, preventing disruption to the overall system. Like
SAF test, the SBAF test is also divided into multiple batches, and each
batch test can take hundreds of milliseconds (100-200 ms) to complete.
If such a lengthy interruption is undesirable, it is recommended to
relocate the time-sensitive applications to other cores for the
duration of the test.

Patch Details:

Patch 1/4: Refactors MSR usage in IFS image loading code to share the
   code between SBAF and SAF tests.
Patch 2/4: Leverages SAF image loading logic and adds SBAF image loading 
support.
Patch 3/4: Adds support for user to trigger SBAF test.
Patch 4/4: Adds trace support for SBAF tests.

This series is originally authored by Jithu Joseph. I have made cleanups
related to code reuse between the SBAF and SAF tests and resubmitting it for
review.

Changes since v1:
 * Addressed trace struct hole issue (Steven)
 * Fixed initialization issue in ifs_sbaf_test_core() (Ilpo)

Jithu Joseph (3):
  platform/x86/intel/ifs: Add SBAF test image loading support
  platform/x86/intel/ifs: Add SBAF test support
  trace: platform/x86/intel/ifs: Add SBAF trace support

Kuppuswamy Sathyanarayanan (1):
  platform/x86/intel/ifs: Refactor MSR usage in IFS test code

 arch/x86/include/asm/msr-index.h |   2 +
 drivers/platform/x86/intel/ifs/ifs.h |  92 -
 include/trace/events/intel_ifs.h |  27 +++
 drivers/platform/x86/intel/ifs/core.c|  33 
 drivers/platform/x86/intel/ifs/load.c|  39 ++--
 drivers/platform/x86/intel/ifs/runtest.c | 235 +++
 6 files changed, 413 insertions(+), 15 deletions(-)

-- 
2.25.1

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 03:54:22PM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 1:39 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote:
> > > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > virtio balloon communicates to the core that in some
> > > > > > configurations vq #s are non-contiguous by setting name
> > > > > > pointer to NULL.
> > > > > >
> > > > > > Unfortunately, core then turned around and just made them
> > > > > > contiguous again. Result is that driver is out of spec.
> > > > >
> > > > > Thanks for fixing this - I think the overall approach of the patch 
> > > > > looks good.
> > > > >
> > > > > > Implement what the API was supposed to do
> > > > > > in the 1st place. Compatibility with buggy hypervisors
> > > > > > is handled inside virtio-balloon, which is the only driver
> > > > > > making use of this facility, so far.
> > > > >
> > > > > In addition to virtio-balloon, I believe the same problem also affects
> > > > > the virtio-fs device, since queue 1 is only supposed to be present if
> > > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > > > > meant to be queue indexes 2 and up. From a look at the Linux driver
> > > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > > > > and assumes that request queues start at index 1 rather than 2, which
> > > > > looks out of spec to me, but the current device implementations (that
> > > > > I am aware of, anyway) are also broken in the same way, so it ends up
> > > > > working today. Queue numbering in a spec-compliant device and the
> > > > > current Linux driver would mismatch; what the driver considers to be
> > > > > the first request queue (index 1) would be ignored by the device since
> > > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> > > >
> > > >
> > > > Oh, thanks a lot for pointing this out!
> > > >
> > > > I see so this patch is no good as is, we need to add a workaround for
> > > > virtio-fs first.
> > > >
> > > > QEMU workaround is simple - just add an extra queue. But I did not
> > > > reasearch how this would interact with vhost-user.
> > > >
> > > > From driver POV, I guess we could just ignore queue # 1 - would that be
> > > > ok or does it have performance implications?
> > >
> > > As a driver workaround for non-compliant devices, I think ignoring the
> > > first request queue would be a reasonable approach if the device's
> > > config advertises num_request_queues > 1. Unfortunately, both
> > > virtiofsd and crosvm's virtio-fs device have hard-coded
> > > num_request_queues =1, so this won't help with those existing devices.
> >
> > Do they care what the vq # is though?
> > We could do some magic to translate VQ #s in qemu.
> >
> >
> > > Maybe there are other devices that we would need to consider as well;
> > > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
> > > benchmarks that seem to be from a different virtio-fs implementation
> > > that does support multiple request queues, so the workaround could
> > > possibly be used there.
> > >
> > > > Or do what I did for balloon here: try with spec compliant #s first,
> > > > if that fails then assume it's the spec issue and shift by 1.
> > >
> > > If there is a way to "guess and check" without breaking spec-compliant
> > > devices, that sounds reasonable too; however, I'm not sure how this
> > > would work out in practice: an existing non-compliant device may fail
> > > to start if the driver tries to enable queue index 2 when it only
> > > supports one request queue,
> >
> > You don't try to enable queue - driver starts by checking queue size.
> > The way my patch works is that it assumes a non existing queue has
> > size 0 if not available.
> >
> > This was actually a documented way to check for PCI and MMIO:
> > Read the virtqueue size from queue_size. This controls how big the 
> > virtqueue is (see 2.6 Virtqueues).
> > If this field is 0, the virtqueue does not exist.
> > MMIO:
> > If the returned value is zero (0x0) the queue is not available.
> >
> > unfortunately not for CCW, but I guess CCW implementations outside
> > of QEMU are uncommon enough that we can assume it's the same?
> >
> >
> > To me the above is also a big hint that drivers are allowed to
> > query size for queues that do not exist.
> 
> Ah, that makes total sense - detecting queue presence by non-zero
> queue size sounds good to me, and it should work in the normal virtio
> device case.
> 
> I am not sure about vhost-user, since there is no way for the
> front-end to ask the back-end for a queue's size; the confusingly
> named VHOST_USER_SET_VRING_NUM allows the front-end to configure the
> size of a queue, but there's no

Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

2024-07-10 Thread Naik, Avadhut




On 7/10/2024 04:38, Borislav Petkov wrote:
> On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> 
>> Userspace error decoding tools like the rasdaemon gather related hardware 
>> error
>> information through the tracepoints. As such, its important to have these two
>> registers in the tracepoint so that the tools like rasdaemon can parse them
>> and output the supplemental error information like FRU Text contained in 
>> them.
> 
> Put *that* in the commit message - do not explain what the patch does.
> 
Will do.
 
>> With this set, the first two elements of the vendor data dynamic array are 
>> SYND 1/2
>> registers while the third element is MCA_CONFIG (added through patch 4 of 
>> the set).
>> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are 
>> interpreted
>> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
>>
>> Thus, we depend on array's layout for accurate FRU Text decoding in the 
>> rasdaemon.
> 
> So it sounds to me like we want to document and thus freeze the
> vendor-specific blob layout because tools are going to be using and parsing
> it. And this will spare us the kernel version checks.
> 
> And new additions to that AMD-specific blob will come at the end and will
> have to be documented too.
> 
> That sounds like an ok compromise to me...
> 
> Thx.
>
Sounds good!
Is it okay to document this where the new wrapper and vendor-specific data
structures are being defined, in arch/x86/include/asm/mce.h?
Similar approach has been taken for struct mce.
Or do you have any other recommendations?

-- 
Thanks,
Avadhut Naik

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Daniel Verkamp

On Wed, Jul 10, 2024 at 1:39 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote:
> > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > virtio balloon communicates to the core that in some
> > > > > configurations vq #s are non-contiguous by setting name
> > > > > pointer to NULL.
> > > > >
> > > > > Unfortunately, core then turned around and just made them
> > > > > contiguous again. Result is that driver is out of spec.
> > > >
> > > > Thanks for fixing this - I think the overall approach of the patch 
> > > > looks good.
> > > >
> > > > > Implement what the API was supposed to do
> > > > > in the 1st place. Compatibility with buggy hypervisors
> > > > > is handled inside virtio-balloon, which is the only driver
> > > > > making use of this facility, so far.
> > > >
> > > > In addition to virtio-balloon, I believe the same problem also affects
> > > > the virtio-fs device, since queue 1 is only supposed to be present if
> > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > > > meant to be queue indexes 2 and up. From a look at the Linux driver
> > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > > > and assumes that request queues start at index 1 rather than 2, which
> > > > looks out of spec to me, but the current device implementations (that
> > > > I am aware of, anyway) are also broken in the same way, so it ends up
> > > > working today. Queue numbering in a spec-compliant device and the
> > > > current Linux driver would mismatch; what the driver considers to be
> > > > the first request queue (index 1) would be ignored by the device since
> > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> > >
> > >
> > > Oh, thanks a lot for pointing this out!
> > >
> > > I see so this patch is no good as is, we need to add a workaround for
> > > virtio-fs first.
> > >
> > > QEMU workaround is simple - just add an extra queue. But I did not
> > > reasearch how this would interact with vhost-user.
> > >
> > > From driver POV, I guess we could just ignore queue # 1 - would that be
> > > ok or does it have performance implications?
> >
> > As a driver workaround for non-compliant devices, I think ignoring the
> > first request queue would be a reasonable approach if the device's
> > config advertises num_request_queues > 1. Unfortunately, both
> > virtiofsd and crosvm's virtio-fs device have hard-coded
> > num_request_queues =1, so this won't help with those existing devices.
>
> Do they care what the vq # is though?
> We could do some magic to translate VQ #s in qemu.
>
>
> > Maybe there are other devices that we would need to consider as well;
> > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
> > benchmarks that seem to be from a different virtio-fs implementation
> > that does support multiple request queues, so the workaround could
> > possibly be used there.
> >
> > > Or do what I did for balloon here: try with spec compliant #s first,
> > > if that fails then assume it's the spec issue and shift by 1.
> >
> > If there is a way to "guess and check" without breaking spec-compliant
> > devices, that sounds reasonable too; however, I'm not sure how this
> > would work out in practice: an existing non-compliant device may fail
> > to start if the driver tries to enable queue index 2 when it only
> > supports one request queue,
>
> You don't try to enable queue - driver starts by checking queue size.
> The way my patch works is that it assumes a non existing queue has
> size 0 if not available.
>
> This was actually a documented way to check for PCI and MMIO:
> Read the virtqueue size from queue_size. This controls how big the 
> virtqueue is (see 2.6 Virtqueues).
> If this field is 0, the virtqueue does not exist.
> MMIO:
> If the returned value is zero (0x0) the queue is not available.
>
> unfortunately not for CCW, but I guess CCW implementations outside
> of QEMU are uncommon enough that we can assume it's the same?
>
>
> To me the above is also a big hint that drivers are allowed to
> query size for queues that do not exist.

Ah, that makes total sense - detecting queue presence by non-zero
queue size sounds good to me, and it should work in the normal virtio
device case.

I am not sure about vhost-user, since there is no way for the
front-end to ask the back-end for a queue's size; the confusingly
named VHOST_USER_SET_VRING_NUM allows the front-end to configure the
size of a queue, but there's no corresponding GET message.

> > and a spec-compliant device would probably
> > balk if the driver tries to enable queue 1 but does not negotiate
> > VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the
> > whole virtio device initialization process if a device

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 1:18 PM Oleg Nesterov  wrote:
>
> On 07/10, Andrii Nakryiko wrote:
> >
> > On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov  wrote:
> > >
> > > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() 
> > > +
> > > put_uprobe(). And to me this change simplifies the code a bit.
> > >
> > > Signed-off-by: Oleg Nesterov 
> > > ---
> > >  include/linux/uprobes.h | 14 ++--
> > >  kernel/events/uprobes.c | 45 -
> > >  kernel/trace/bpf_trace.c| 12 +-
> > >  kernel/trace/trace_uprobe.c | 28 +++
> > >  4 files changed, 41 insertions(+), 58 deletions(-)
> > >
> > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > > index aa89a8b67039..399509befcf4 100644
> > > --- a/include/linux/uprobes.h
> > > +++ b/include/linux/uprobes.h
> >
> > I don't see struct uprobe forward-declared in this header, maybe we
> > should add it?
>
> Probably yes, thanks... Although the current code already uses
> struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y.
> Thanks, will add.
>

Yep, I saw that and was wondering as well.

> > >  static inline int
> > > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer 
> > > *uc, bool add)
> > > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
> > >  {
> > > return -ENOSYS;
> > >  }
> >
> > complete aside, when I was looking at this code I was wondering why we
> > even need uprobe_apply, it looks like some hacky variant of
> > uprobe_register and uprobe_unregister.
>
> All I can say is that
>
> - I can hardly recall this logic, I'll try to do this tomorrow
>   and write another email
>
> - in any case this logic is ugly and needs more cleanups
>
> - but this patch only tries to simplify this code without any
>   visible changes.

yep, that's why it's an aside, up to you

>
> > > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
> > >   * refcount is released when the last @uc for the @uprobe
> > >   * unregisters. Caller of uprobe_register() is required to keep @inode
> > >   * (and the containing mount) referenced.
> > > - *
> > > - * Return errno if it cannot successully install probes
> > > - * else return 0 (success)
> >
> > mention that it never returns NULL, but rather encodes error code
> > inside the pointer on error? It's an important part of the contract.
>
> OK...
>
> > >  /*
> >
> > this should be /** for doccomment checking (you'd get a warning for
> > missing @uprobe if there was this extra star)
>
> Well, this is what we have before this patch, but OK
>
> > >   * uprobe_apply - unregister an already registered probe.
> > > - * @inode: the file in which the probe has to be removed.
> > > - * @offset: offset from the start of the file.
> >
> > add @uprobe description now?
>
> If only I knew what this @uprobe description can say ;)

I'm pointing this out because I accidentally used /** for comment for
some function, and I got some bot report about missing argument. I
think /** makes sense for documenting "public API" function, so which
is why all the above.

>
> > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path 
> > > *path, struct bpf_uprobe *uprobes,
> > >  {
> > > u32 i;
> > >
> > > -   for (i = 0; i < cnt; i++) {
> > > -   uprobe_unregister(d_real_inode(path->dentry), 
> > > uprobes[i].offset,
> > > - [i].consumer);
> > > -   }
> > > +   for (i = 0; i < cnt; i++)
> >
> > you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL
> > check if you null-out it below)
>
> Hmm... are you sure? I'll re-check... See also the end of my email.

no, you are right, it should be fine

>
> > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union 
> > > bpf_attr *attr, struct bpf_prog *pr
> > >   _uprobe_multi_link_lops, prog);
> > >
> > > for (i = 0; i < cnt; i++) {
> > > -   err = uprobe_register(d_real_inode(link->path.dentry),
> > > +   uprobes[i].uprobe = 
> > > uprobe_register(d_real_inode(link->path.dentry),
> >
> > will uprobe keep inode alive as long as uprobe is attached?
>
> Why? No, this patch doesn't (shouldn't) change the current behaviour.
>
> The users still need kern_path/path_put to, say, prevent umount.

ok, then link->path has to stay, I was just wondering if we need to
keep it alive

>
> > we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle
> > only NULL vs non-NULL case
>
> Not sure I understand...
>
> > or maybe better yet let's just have local struct uprobe variable and
> > only assign it if registration succeeded
>
> We can, but
>
> > (still need NULL check in
> > bpf_uprobe_unregister above)
>
> Why?
>
> If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on
> error, it passes cnt = i where is the 1st

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > virtio balloon communicates to the core that in some
> > > > configurations vq #s are non-contiguous by setting name
> > > > pointer to NULL.
> > > >
> > > > Unfortunately, core then turned around and just made them
> > > > contiguous again. Result is that driver is out of spec.
> > >
> > > Thanks for fixing this - I think the overall approach of the patch looks 
> > > good.
> > >
> > > > Implement what the API was supposed to do
> > > > in the 1st place. Compatibility with buggy hypervisors
> > > > is handled inside virtio-balloon, which is the only driver
> > > > making use of this facility, so far.
> > >
> > > In addition to virtio-balloon, I believe the same problem also affects
> > > the virtio-fs device, since queue 1 is only supposed to be present if
> > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > > meant to be queue indexes 2 and up. From a look at the Linux driver
> > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > > and assumes that request queues start at index 1 rather than 2, which
> > > looks out of spec to me, but the current device implementations (that
> > > I am aware of, anyway) are also broken in the same way, so it ends up
> > > working today. Queue numbering in a spec-compliant device and the
> > > current Linux driver would mismatch; what the driver considers to be
> > > the first request queue (index 1) would be ignored by the device since
> > > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> >
> >
> > Oh, thanks a lot for pointing this out!
> >
> > I see so this patch is no good as is, we need to add a workaround for
> > virtio-fs first.
> >
> > QEMU workaround is simple - just add an extra queue. But I did not
> > reasearch how this would interact with vhost-user.
> >
> > From driver POV, I guess we could just ignore queue # 1 - would that be
> > ok or does it have performance implications?
> 
> As a driver workaround for non-compliant devices, I think ignoring the
> first request queue would be a reasonable approach if the device's
> config advertises num_request_queues > 1. Unfortunately, both
> virtiofsd and crosvm's virtio-fs device have hard-coded
> num_request_queues =1, so this won't help with those existing devices.

Do they care what the vq # is though?
We could do some magic to translate VQ #s in qemu.


> Maybe there are other devices that we would need to consider as well;
> commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
> benchmarks that seem to be from a different virtio-fs implementation
> that does support multiple request queues, so the workaround could
> possibly be used there.
> 
> > Or do what I did for balloon here: try with spec compliant #s first,
> > if that fails then assume it's the spec issue and shift by 1.
> 
> If there is a way to "guess and check" without breaking spec-compliant
> devices, that sounds reasonable too; however, I'm not sure how this
> would work out in practice: an existing non-compliant device may fail
> to start if the driver tries to enable queue index 2 when it only
> supports one request queue,

You don't try to enable queue - driver starts by checking queue size.
The way my patch works is that it assumes a non existing queue has
size 0 if not available.

This was actually a documented way to check for PCI and MMIO:
Read the virtqueue size from queue_size. This controls how big the 
virtqueue is (see 2.6 Virtqueues).
If this field is 0, the virtqueue does not exist.
MMIO:
If the returned value is zero (0x0) the queue is not available.

unfortunately not for CCW, but I guess CCW implementations outside
of QEMU are uncommon enough that we can assume it's the same?


To me the above is also a big hint that drivers are allowed to
query size for queues that do not exist.



> and a spec-compliant device would probably
> balk if the driver tries to enable queue 1 but does not negotiate
> VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the
> whole virtio device initialization process if a device fails like
> this, then maybe it's feasible. (Or can the driver tweak the virtqueue
> configuration and try to set DRIVER_OK repeatedly until it works? It's
> not clear to me if this is allowed by the spec, or what device
> implementations actually do in practice in this scenario.)
> 
> Thanks,
> -- Daniel

My patch starts with a spec compliant behaviour. If that fails,
try non-compliant one as a fallback.

-- 
MST

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Oleg Nesterov

On 07/10, Andrii Nakryiko wrote:
>
> On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov  wrote:
> >
> > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
> > put_uprobe(). And to me this change simplifies the code a bit.
> >
> > Signed-off-by: Oleg Nesterov 
> > ---
> >  include/linux/uprobes.h | 14 ++--
> >  kernel/events/uprobes.c | 45 -
> >  kernel/trace/bpf_trace.c| 12 +-
> >  kernel/trace/trace_uprobe.c | 28 +++
> >  4 files changed, 41 insertions(+), 58 deletions(-)
> >
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index aa89a8b67039..399509befcf4 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
>
> I don't see struct uprobe forward-declared in this header, maybe we
> should add it?

Probably yes, thanks... Although the current code already uses
struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y.
Thanks, will add.

> >  static inline int
> > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer 
> > *uc, bool add)
> > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
> >  {
> > return -ENOSYS;
> >  }
>
> complete aside, when I was looking at this code I was wondering why we
> even need uprobe_apply, it looks like some hacky variant of
> uprobe_register and uprobe_unregister.

All I can say is that

- I can hardly recall this logic, I'll try to do this tomorrow
  and write another email

- in any case this logic is ugly and needs more cleanups

- but this patch only tries to simplify this code without any
  visible changes.

> > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
> >   * refcount is released when the last @uc for the @uprobe
> >   * unregisters. Caller of uprobe_register() is required to keep @inode
> >   * (and the containing mount) referenced.
> > - *
> > - * Return errno if it cannot successully install probes
> > - * else return 0 (success)
>
> mention that it never returns NULL, but rather encodes error code
> inside the pointer on error? It's an important part of the contract.

OK...

> >  /*
>
> this should be /** for doccomment checking (you'd get a warning for
> missing @uprobe if there was this extra star)

Well, this is what we have before this patch, but OK

> >   * uprobe_apply - unregister an already registered probe.
> > - * @inode: the file in which the probe has to be removed.
> > - * @offset: offset from the start of the file.
>
> add @uprobe description now?

If only I knew what this @uprobe description can say ;)

> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, 
> > struct bpf_uprobe *uprobes,
> >  {
> > u32 i;
> >
> > -   for (i = 0; i < cnt; i++) {
> > -   uprobe_unregister(d_real_inode(path->dentry), 
> > uprobes[i].offset,
> > - [i].consumer);
> > -   }
> > +   for (i = 0; i < cnt; i++)
>
> you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL
> check if you null-out it below)

Hmm... are you sure? I'll re-check... See also the end of my email.

> > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union 
> > bpf_attr *attr, struct bpf_prog *pr
> >   _uprobe_multi_link_lops, prog);
> >
> > for (i = 0; i < cnt; i++) {
> > -   err = uprobe_register(d_real_inode(link->path.dentry),
> > +   uprobes[i].uprobe = 
> > uprobe_register(d_real_inode(link->path.dentry),
>
> will uprobe keep inode alive as long as uprobe is attached?

Why? No, this patch doesn't (shouldn't) change the current behaviour.

The users still need kern_path/path_put to, say, prevent umount.

> we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle
> only NULL vs non-NULL case

Not sure I understand...

> or maybe better yet let's just have local struct uprobe variable and
> only assign it if registration succeeded

We can, but

> (still need NULL check in
> bpf_uprobe_unregister above)

Why?

If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on
error, it passes cnt = i where is the 1st failed uprobe index. IOW,
uptobes[i].uprobe can't be IS_ERR_OR_NULL() in the 0..cnt-1 range.

If it is called by bpf_uprobe_multi_link_release() then the whole
0..umulti_link->cnt-1 range should be fine?

Oleg.

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Daniel Verkamp

On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
> > >
> > > virtio balloon communicates to the core that in some
> > > configurations vq #s are non-contiguous by setting name
> > > pointer to NULL.
> > >
> > > Unfortunately, core then turned around and just made them
> > > contiguous again. Result is that driver is out of spec.
> >
> > Thanks for fixing this - I think the overall approach of the patch looks 
> > good.
> >
> > > Implement what the API was supposed to do
> > > in the 1st place. Compatibility with buggy hypervisors
> > > is handled inside virtio-balloon, which is the only driver
> > > making use of this facility, so far.
> >
> > In addition to virtio-balloon, I believe the same problem also affects
> > the virtio-fs device, since queue 1 is only supposed to be present if
> > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> > meant to be queue indexes 2 and up. From a look at the Linux driver
> > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> > and assumes that request queues start at index 1 rather than 2, which
> > looks out of spec to me, but the current device implementations (that
> > I am aware of, anyway) are also broken in the same way, so it ends up
> > working today. Queue numbering in a spec-compliant device and the
> > current Linux driver would mismatch; what the driver considers to be
> > the first request queue (index 1) would be ignored by the device since
> > queue index 1 has no function if F_NOTIFICATION isn't negotiated.
>
>
> Oh, thanks a lot for pointing this out!
>
> I see so this patch is no good as is, we need to add a workaround for
> virtio-fs first.
>
> QEMU workaround is simple - just add an extra queue. But I did not
> reasearch how this would interact with vhost-user.
>
> From driver POV, I guess we could just ignore queue # 1 - would that be
> ok or does it have performance implications?

As a driver workaround for non-compliant devices, I think ignoring the
first request queue would be a reasonable approach if the device's
config advertises num_request_queues > 1. Unfortunately, both
virtiofsd and crosvm's virtio-fs device have hard-coded
num_request_queues =1, so this won't help with those existing devices.
Maybe there are other devices that we would need to consider as well;
commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes
benchmarks that seem to be from a different virtio-fs implementation
that does support multiple request queues, so the workaround could
possibly be used there.

> Or do what I did for balloon here: try with spec compliant #s first,
> if that fails then assume it's the spec issue and shift by 1.

If there is a way to "guess and check" without breaking spec-compliant
devices, that sounds reasonable too; however, I'm not sure how this
would work out in practice: an existing non-compliant device may fail
to start if the driver tries to enable queue index 2 when it only
supports one request queue, and a spec-compliant device would probably
balk if the driver tries to enable queue 1 but does not negotiate
VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the
whole virtio device initialization process if a device fails like
this, then maybe it's feasible. (Or can the driver tweak the virtqueue
configuration and try to set DRIVER_OK repeatedly until it works? It's
not clear to me if this is allowed by the spec, or what device
implementations actually do in practice in this scenario.)

Thanks,
-- Daniel

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 12:38 PM Jiri Olsa  wrote:
>
> On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa  wrote:
> > >
> > > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
> > >
> > > SNIP
> > >
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index 467f358c8ce7..7571811127a2 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> > > >   loff_t offset;
> > > >   unsigned long ref_ctr_offset;
> > > >   u64 cookie;
> > > > + struct uprobe *uprobe;
> > > >   struct uprobe_consumer consumer;
> > > >  };
> > > >
> > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path 
> > > > *path, struct bpf_uprobe *uprobes,
> > > >  {
> > > >   u32 i;
> > > >
> > > > - for (i = 0; i < cnt; i++) {
> > > > - uprobe_unregister(d_real_inode(path->dentry), 
> > > > uprobes[i].offset,
> > > > -   [i].consumer);
> > > > - }
> > >
> > > nice, we could also drop path argument now
> >
> > see my comments to Oleg, I think we can/should get rid of link->path
> > altogether if uprobe itself keeps inode alive.
>
> yea, I was thinking of that, but then it's kind of useful to have it in
> bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from
> first uprobe in the link, but ok, still probably worth to remove it ;-)

if we need it for link_info, probably cleaner to just keep it, no big deal then

>
> anyway as you wrote it's ok for follow up cleanup, I'll check on that
>
> >
> > BTW, Jiri, do we have any test for multi-uprobe that simulates partial
> > attachment success/failure (whichever way you want to look at it). It
> > would be super useful to have to check at least some error handling
> > code in the uprobe code base. If we don't, do you mind adding
> > something simple to BPF selftests?
>
> there's test_attach_api_fails, but I think all checked fails are before
> actually calling uprobe_register function
>
> I think there are few ways to fail the uprobe_register, like install it
> on top of int3.. will check add some test for that
>

great, thank you!

> jirka
>
> >
> > >
> > > jirka
> > >
> > > > + for (i = 0; i < cnt; i++)
> > > > + uprobe_unregister(uprobes[i].uprobe, 
> > > > [i].consumer);
> > > >  }
> > > >
> > > >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union 
> > > > bpf_attr *attr, struct bpf_prog *pr
> > > > _uprobe_multi_link_lops, prog);
> > > >
> > > >   for (i = 0; i < cnt; i++) {
> > > > - err = uprobe_register(d_real_inode(link->path.dentry),
> > > > + uprobes[i].uprobe = 
> > > > uprobe_register(d_real_inode(link->path.dentry),
> > > >uprobes[i].offset,
> > > >uprobes[i].ref_ctr_offset,
> > > >[i].consumer);
> > > > - if (err) {
> > > > + if (IS_ERR(uprobes[i].uprobe)) {
> > > > + err = PTR_ERR(uprobes[i].uprobe);
> > > >   bpf_uprobe_unregister(, uprobes, i);
> > > >   goto error_free;
> > > >   }

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Jiri Olsa

On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa  wrote:
> >
> > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
> >
> > SNIP
> >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 467f358c8ce7..7571811127a2 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> > >   loff_t offset;
> > >   unsigned long ref_ctr_offset;
> > >   u64 cookie;
> > > + struct uprobe *uprobe;
> > >   struct uprobe_consumer consumer;
> > >  };
> > >
> > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path 
> > > *path, struct bpf_uprobe *uprobes,
> > >  {
> > >   u32 i;
> > >
> > > - for (i = 0; i < cnt; i++) {
> > > - uprobe_unregister(d_real_inode(path->dentry), 
> > > uprobes[i].offset,
> > > -   [i].consumer);
> > > - }
> >
> > nice, we could also drop path argument now
> 
> see my comments to Oleg, I think we can/should get rid of link->path
> altogether if uprobe itself keeps inode alive.

yea, I was thinking of that, but then it's kind of useful to have it in
bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from
first uprobe in the link, but ok, still probably worth to remove it ;-)

anyway as you wrote it's ok for follow up cleanup, I'll check on that

> 
> BTW, Jiri, do we have any test for multi-uprobe that simulates partial
> attachment success/failure (whichever way you want to look at it). It
> would be super useful to have to check at least some error handling
> code in the uprobe code base. If we don't, do you mind adding
> something simple to BPF selftests?

there's test_attach_api_fails, but I think all checked fails are before
actually calling uprobe_register function

I think there are few ways to fail the uprobe_register, like install it
on top of int3.. will check add some test for that

jirka

> 
> >
> > jirka
> >
> > > + for (i = 0; i < cnt; i++)
> > > + uprobe_unregister(uprobes[i].uprobe, [i].consumer);
> > >  }
> > >
> > >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union 
> > > bpf_attr *attr, struct bpf_prog *pr
> > > _uprobe_multi_link_lops, prog);
> > >
> > >   for (i = 0; i < cnt; i++) {
> > > - err = uprobe_register(d_real_inode(link->path.dentry),
> > > + uprobes[i].uprobe = 
> > > uprobe_register(d_real_inode(link->path.dentry),
> > >uprobes[i].offset,
> > >uprobes[i].ref_ctr_offset,
> > >[i].consumer);
> > > - if (err) {
> > > + if (IS_ERR(uprobes[i].uprobe)) {
> > > + err = PTR_ERR(uprobes[i].uprobe);
> > >   bpf_uprobe_unregister(, uprobes, i);
> > >   goto error_free;
> > >   }

[PATCH v5] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-07-10 Thread Andrii Nakryiko

When tracing user functions with uprobe functionality, it's common to
install the probe (e.g., a BPF program) at the first instruction of the
function. This is often going to be `push %rbp` instruction in function
preamble, which means that within that function frame pointer hasn't
been established yet. This leads to consistently missing an actual
caller of the traced function, because perf_callchain_user() only
records current IP (capturing traced function) and then following frame
pointer chain (which would be caller's frame, containing the address of
caller's caller).

So when we have target_1 -> target_2 -> target_3 call chain and we are
tracing an entry to target_3, captured stack trace will report
target_1 -> target_3 call chain, which is wrong and confusing.

This patch proposes a x86-64-specific heuristic to detect `push %rbp`
(`push %ebp` on 32-bit architecture) instruction being traced. Given
entire kernel implementation of user space stack trace capturing works
under assumption that user space code was compiled with frame pointer
register (%rbp/%ebp) preservation, it seems pretty reasonable to use
this instruction as a strong indicator that this is the entry to the
function. In that case, return address is still pointed to by %rsp/%esp,
so we fetch it and add to stack trace before proceeding to unwind the
rest using frame pointer-based logic.

We also check for `endbr64` (for 64-bit modes) as another common pattern
for function entry, as suggested by Josh Poimboeuf. Even if we get this
wrong sometimes for uprobes attached not at the function entry, it's OK
because stack trace will still be overall meaningful, just with one
extra bogus entry. If we don't detect this, we end up with guaranteed to
be missing caller function entry in the stack trace, which is worse
overall.

Signed-off-by: Andrii Nakryiko 
---
 arch/x86/events/core.c  | 63 +
 include/linux/uprobes.h |  2 ++
 kernel/events/uprobes.c |  2 ++
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..780b8dc36f05 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -41,6 +41,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "perf_event.h"
 
@@ -2813,6 +2815,46 @@ static unsigned long get_segment_base(unsigned int 
segment)
return get_desc_base(desc);
 }
 
+#ifdef CONFIG_UPROBES
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+static bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+   struct arch_uprobe *auprobe;
+
+   if (!current->utask)
+   return false;
+
+   auprobe = current->utask->auprobe;
+   if (!auprobe)
+   return false;
+
+   /* push %rbp/%ebp */
+   if (auprobe->insn[0] == 0x55)
+   return true;
+
+   /* endbr64 (64-bit only) */
+   if (user_64bit_mode(regs) && is_endbr(*(u32 *)auprobe->insn))
+   return true;
+
+   return false;
+}
+
+#else
+static bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+   return false;
+}
+#endif /* CONFIG_UPROBES */
+
 #ifdef CONFIG_IA32_EMULATION
 
 #include 
@@ -2824,6 +2866,7 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry_ctx *ent
unsigned long ss_base, cs_base;
struct stack_frame_ia32 frame;
const struct stack_frame_ia32 __user *fp;
+   u32 ret_addr;
 
if (user_64bit_mode(regs))
return 0;
@@ -2833,6 +2876,12 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry_ctx *ent
 
fp = compat_ptr(ss_base + regs->bp);
pagefault_disable();
+
+   /* see perf_callchain_user() below for why we do this */
+   if (is_uprobe_at_func_entry(regs) &&
+   !get_user(ret_addr, (const u32 __user *)regs->sp))
+   perf_callchain_store(entry, ret_addr);
+
while (entry->nr < entry->max_stack) {
if (!valid_user_frame(fp, sizeof(frame)))
break;
@@ -2861,6 +2910,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *regs
 {
struct stack_frame frame;
const struct stack_frame __user *fp;
+   unsigned long ret_addr;
 
if (perf_guest_state()) {
/* TODO: We don't support guest os callchain now */
@@ -2884,6 +2934,19 @@ perf_callchain_user(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *regs
return;
 
pagefault_disable();
+
+   /*
+* If we are called from uprobe handler, and we are indeed at the very
+

Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()

2024-07-10 Thread Oleg Nesterov

On 07/10, Andrii Nakryiko wrote:
>
> LGTM with few nits below.
>
> Acked-by: Andrii Nakryiko 

Thanks for looking at this.

> > @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr 
> > *attr, struct bpf_prog *pr
> >   _uprobe_multi_link_lops, prog);
> >
> > for (i = 0; i < cnt; i++) {
> > -   err = 
> > uprobe_register_refctr(d_real_inode(link->path.dentry),
> > +   err = uprobe_register(d_real_inode(link->path.dentry),
> >  uprobes[i].offset,
> >  uprobes[i].ref_ctr_offset,
> >  [i].consumer);
>
> please adjust indentation here

OK,

> > -   if (tu->ref_ctr_offset)
> > -   ret = uprobe_register_refctr(tu->inode, tu->offset,
> > -   tu->ref_ctr_offset, >consumer);
> > -   else
> > -   ret = uprobe_register(tu->inode, tu->offset, >consumer);
> > -
> > +   ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
> > + >consumer);
>
> doesn't fit under 100 characters? If it does, please keep as a single line.

OK, will do.

Oleg.

Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 9:24 AM Josh Poimboeuf  wrote:
>
> On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra  wrote:
> > > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > > > You can see it replaced the first byte, the following 3 bytes are
> > > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > > > you can see I stepped through a few more instructions.
> > > >
> > > > Works by accident?
> > >
> > > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> > > that supports it.
> >
> > OK, I don't know what the implications are, but it's a good accident :)
> >
> > Anyways, what should I do for v4? Drop is_endbr6() check or keep it?
>
> Given the current behavior of uprobe overwriting ENDBR64 with INT3, the
> is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry()
> would never return true on OSes which have the ENDBR64 compiled in.
>
> However, once userspace IBT actually gets enabled, uprobe should skip
> the ENDBR64 and patch the subsequent instruction.  Then the is_endbr6()
> check would no longer be needed.
>

Ok, I'll keep it then, thanks.

> --
> Josh

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Oleg Nesterov

On 07/10, Jiri Olsa wrote:
>
> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, 
> > struct bpf_uprobe *uprobes,
> >  {
> > u32 i;
> >
> > -   for (i = 0; i < cnt; i++) {
> > -   uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > - [i].consumer);
> > -   }
>
> nice, we could also drop path argument now

Indeed, I missed that. Will send V2 when/if this makes any sense.

Thanks!

Oleg.

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
> >
> > virtio balloon communicates to the core that in some
> > configurations vq #s are non-contiguous by setting name
> > pointer to NULL.
> >
> > Unfortunately, core then turned around and just made them
> > contiguous again. Result is that driver is out of spec.
> 
> Thanks for fixing this - I think the overall approach of the patch looks good.
> 
> > Implement what the API was supposed to do
> > in the 1st place. Compatibility with buggy hypervisors
> > is handled inside virtio-balloon, which is the only driver
> > making use of this facility, so far.
> 
> In addition to virtio-balloon, I believe the same problem also affects
> the virtio-fs device, since queue 1 is only supposed to be present if
> VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> meant to be queue indexes 2 and up. From a look at the Linux driver
> (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> and assumes that request queues start at index 1 rather than 2, which
> looks out of spec to me, but the current device implementations (that
> I am aware of, anyway) are also broken in the same way, so it ends up
> working today. Queue numbering in a spec-compliant device and the
> current Linux driver would mismatch; what the driver considers to be
> the first request queue (index 1) would be ignored by the device since
> queue index 1 has no function if F_NOTIFICATION isn't negotiated.
> 
> [...]
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index 7d82facafd75..fa606e7321ad 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > struct virtqueue_info *vqi;
> > u16 msix_vec;
> > -   int i, err, nvectors, allocated_vectors, queue_idx = 0;
> > +   int i, err, nvectors, allocated_vectors;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > msix_vec = allocated_vectors++;
> > else
> > msix_vec = VP_MSIX_VQ_VECTOR;
> > -   vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > +   vqs[i] = vp_setup_vq(vdev, i, vqi->callback,
> >  vqi->name, vqi->ctx, msix_vec);
> > if (IS_ERR(vqs[i])) {
> > err = PTR_ERR(vqs[i]);
> > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtqueue_info vqs_info[])
> >  {
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > -   int i, err, queue_idx = 0;
> > +   int i, err;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device 
> > *vdev, unsigned int nvqs,
> > vqs[i] = NULL;
> > continue;
> > }
> > +<<< HEAD
> > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> >  vqi->name, vqi->ctx,
> > +===
> > +   vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> > +ctx ? ctx[i] : false,
> > +>>> f814759f80b7... virtio: fix vq # for balloon
> 
> This still has merge markers in it.
> 
> Thanks,
> -- Daniel

ouch forgot to commit ;)

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote:
> On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
> >
> > virtio balloon communicates to the core that in some
> > configurations vq #s are non-contiguous by setting name
> > pointer to NULL.
> >
> > Unfortunately, core then turned around and just made them
> > contiguous again. Result is that driver is out of spec.
> 
> Thanks for fixing this - I think the overall approach of the patch looks good.
> 
> > Implement what the API was supposed to do
> > in the 1st place. Compatibility with buggy hypervisors
> > is handled inside virtio-balloon, which is the only driver
> > making use of this facility, so far.
> 
> In addition to virtio-balloon, I believe the same problem also affects
> the virtio-fs device, since queue 1 is only supposed to be present if
> VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
> meant to be queue indexes 2 and up. From a look at the Linux driver
> (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
> and assumes that request queues start at index 1 rather than 2, which
> looks out of spec to me, but the current device implementations (that
> I am aware of, anyway) are also broken in the same way, so it ends up
> working today. Queue numbering in a spec-compliant device and the
> current Linux driver would mismatch; what the driver considers to be
> the first request queue (index 1) would be ignored by the device since
> queue index 1 has no function if F_NOTIFICATION isn't negotiated.


Oh, thanks a lot for pointing this out!

I see so this patch is no good as is, we need to add a workaround for
virtio-fs first.

QEMU workaround is simple - just add an extra queue. But I did not
reasearch how this would interact with vhost-user.

>From driver POV, I guess we could just ignore queue # 1 - would that be
ok or does it have performance implications?
Or do what I did for balloon here: try with spec compliant #s first,
if that fails then assume it's the spec issue and shift by 1.


> [...]
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index 7d82facafd75..fa606e7321ad 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > struct virtqueue_info *vqi;
> > u16 msix_vec;
> > -   int i, err, nvectors, allocated_vectors, queue_idx = 0;
> > +   int i, err, nvectors, allocated_vectors;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > msix_vec = allocated_vectors++;
> > else
> > msix_vec = VP_MSIX_VQ_VECTOR;
> > -   vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > +   vqs[i] = vp_setup_vq(vdev, i, vqi->callback,
> >  vqi->name, vqi->ctx, msix_vec);
> > if (IS_ERR(vqs[i])) {
> > err = PTR_ERR(vqs[i]);
> > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> > unsigned int nvqs,
> > struct virtqueue_info vqs_info[])
> >  {
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > -   int i, err, queue_idx = 0;
> > +   int i, err;
> >
> > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> > if (!vp_dev->vqs)
> > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device 
> > *vdev, unsigned int nvqs,
> > vqs[i] = NULL;
> > continue;
> > }
> > +<<< HEAD
> > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> >  vqi->name, vqi->ctx,
> > +===
> > +   vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> > +ctx ? ctx[i] : false,
> > +>>> f814759f80b7... virtio: fix vq # for balloon
> 
> This still has merge markers in it.
> 
> Thanks,
> -- Daniel

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa  wrote:
>
> On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
>
> SNIP
>
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 467f358c8ce7..7571811127a2 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> >   loff_t offset;
> >   unsigned long ref_ctr_offset;
> >   u64 cookie;
> > + struct uprobe *uprobe;
> >   struct uprobe_consumer consumer;
> >  };
> >
> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, 
> > struct bpf_uprobe *uprobes,
> >  {
> >   u32 i;
> >
> > - for (i = 0; i < cnt; i++) {
> > - uprobe_unregister(d_real_inode(path->dentry), 
> > uprobes[i].offset,
> > -   [i].consumer);
> > - }
>
> nice, we could also drop path argument now

see my comments to Oleg, I think we can/should get rid of link->path
altogether if uprobe itself keeps inode alive.

BTW, Jiri, do we have any test for multi-uprobe that simulates partial
attachment success/failure (whichever way you want to look at it). It
would be super useful to have to check at least some error handling
code in the uprobe code base. If we don't, do you mind adding
something simple to BPF selftests?

>
> jirka
>
> > + for (i = 0; i < cnt; i++)
> > + uprobe_unregister(uprobes[i].uprobe, [i].consumer);
> >  }
> >
> >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union 
> > bpf_attr *attr, struct bpf_prog *pr
> > _uprobe_multi_link_lops, prog);
> >
> >   for (i = 0; i < cnt; i++) {
> > - err = uprobe_register(d_real_inode(link->path.dentry),
> > + uprobes[i].uprobe = 
> > uprobe_register(d_real_inode(link->path.dentry),
> >uprobes[i].offset,
> >uprobes[i].ref_ctr_offset,
> >[i].consumer);
> > - if (err) {
> > + if (IS_ERR(uprobes[i].uprobe)) {
> > + err = PTR_ERR(uprobes[i].uprobe);
> >   bpf_uprobe_unregister(, uprobes, i);
> >   goto error_free;
> >   }

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov  wrote:
>
> This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
> put_uprobe(). And to me this change simplifies the code a bit.
>
> Signed-off-by: Oleg Nesterov 
> ---
>  include/linux/uprobes.h | 14 ++--
>  kernel/events/uprobes.c | 45 -
>  kernel/trace/bpf_trace.c| 12 +-
>  kernel/trace/trace_uprobe.c | 28 +++
>  4 files changed, 41 insertions(+), 58 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index aa89a8b67039..399509befcf4 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h

I don't see struct uprobe forward-declared in this header, maybe we
should add it?

> @@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
>  extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
> *mm, unsigned long vaddr, uprobe_opcode_t);
> -extern int uprobe_register(struct inode *inode, loff_t offset, loff_t 
> ref_ctr_offset, struct uprobe_consumer *uc);
> -extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
> uprobe_consumer *uc, bool);
> -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
> uprobe_consumer *uc);
> +extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, 
> loff_t ref_ctr_offset, struct uprobe_consumer *uc);
> +extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, 
> bool);
> +extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer 
> *uc);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
>  extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, 
> unsigned long end);
>  extern void uprobe_start_dup_mmap(void);
> @@ -147,18 +147,18 @@ static inline void uprobes_init(void)
>
>  #define uprobe_get_trap_addr(regs) instruction_pointer(regs)
>
> -static inline int
> +static inline struct uprobe *
>  uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, 
> struct uprobe_consumer *uc)
>  {
> -   return -ENOSYS;
> +   return ERR_PTR(-ENOSYS);
>  }
>  static inline int
> -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
> bool add)
> +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
>  {
> return -ENOSYS;
>  }

complete aside, when I was looking at this code I was wondering why we
even need uprobe_apply, it looks like some hacky variant of
uprobe_register and uprobe_unregister. I didn't dig deeper, but think
whether we even need this? If this is just to avoid (for some period)
some consumer callback calling, then that could be handled at the
consumer side by ignoring such calls.

callback call is cheap, it's the int3 handling that's expensive and
with uprobe_apply we are already paying it anyways, so what is this
for?

>  static inline void
> -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer 
> *uc)
> +uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
>  }
>  static inline int uprobe_mmap(struct vm_area_struct *vma)

[...]

>
> @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
>   * refcount is released when the last @uc for the @uprobe
>   * unregisters. Caller of uprobe_register() is required to keep @inode
>   * (and the containing mount) referenced.
> - *
> - * Return errno if it cannot successully install probes
> - * else return 0 (success)

mention that it never returns NULL, but rather encodes error code
inside the pointer on error? It's an important part of the contract.

>   */
> -int uprobe_register(struct inode *inode, loff_t offset, loff_t 
> ref_ctr_offset,
> -   struct uprobe_consumer *uc)
> +struct uprobe *uprobe_register(struct inode *inode,
> +   loff_t offset, loff_t ref_ctr_offset,
> +   struct uprobe_consumer *uc)
>  {

[...]

> @@ -1186,35 +1177,27 @@ int uprobe_register(struct inode *inode, loff_t 
> offset, loff_t ref_ctr_offset,
>
> if (unlikely(ret == -EAGAIN))
> goto retry;
> -   return ret;
> +
> +   return ret ? ERR_PTR(ret) : uprobe;
>  }
>  EXPORT_SYMBOL_GPL(uprobe_register);
>
>  /*

this should be /** for doccomment checking (you'd get a warning for
missing @uprobe if there was this extra star)

>   * uprobe_apply - unregister an already registered probe.
> - * @inode: the file in which the probe has to be removed.
> - * @offset: offset from the start of the file.

add @uprobe description now?

>   * @uc: consumer which wants to add more or remove some breakpoints
>   * @add: add or remove the breakpoints
>   */
> -int uprobe_apply(struct inode *inode, loff_t offset,
> -   struct uprobe_consumer *uc,

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Daniel Verkamp

On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin  wrote:
>
> virtio balloon communicates to the core that in some
> configurations vq #s are non-contiguous by setting name
> pointer to NULL.
>
> Unfortunately, core then turned around and just made them
> contiguous again. Result is that driver is out of spec.

Thanks for fixing this - I think the overall approach of the patch looks good.

> Implement what the API was supposed to do
> in the 1st place. Compatibility with buggy hypervisors
> is handled inside virtio-balloon, which is the only driver
> making use of this facility, so far.

In addition to virtio-balloon, I believe the same problem also affects
the virtio-fs device, since queue 1 is only supposed to be present if
VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are
meant to be queue indexes 2 and up. From a look at the Linux driver
(virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION
and assumes that request queues start at index 1 rather than 2, which
looks out of spec to me, but the current device implementations (that
I am aware of, anyway) are also broken in the same way, so it ends up
working today. Queue numbering in a spec-compliant device and the
current Linux driver would mismatch; what the driver considers to be
the first request queue (index 1) would be ignored by the device since
queue index 1 has no function if F_NOTIFICATION isn't negotiated.

[...]
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 7d82facafd75..fa606e7321ad 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned int nvqs,
> struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> struct virtqueue_info *vqi;
> u16 msix_vec;
> -   int i, err, nvectors, allocated_vectors, queue_idx = 0;
> +   int i, err, nvectors, allocated_vectors;
>
> vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> if (!vp_dev->vqs)
> @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned int nvqs,
> msix_vec = allocated_vectors++;
> else
> msix_vec = VP_MSIX_VQ_VECTOR;
> -   vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> +   vqs[i] = vp_setup_vq(vdev, i, vqi->callback,
>  vqi->name, vqi->ctx, msix_vec);
> if (IS_ERR(vqs[i])) {
> err = PTR_ERR(vqs[i]);
> @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> unsigned int nvqs,
> struct virtqueue_info vqs_info[])
>  {
> struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> -   int i, err, queue_idx = 0;
> +   int i, err;
>
> vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
> if (!vp_dev->vqs)
> @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> unsigned int nvqs,
> vqs[i] = NULL;
> continue;
> }
> +<<< HEAD
> vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
>  vqi->name, vqi->ctx,
> +===
> +   vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> +ctx ? ctx[i] : false,
> +>>> f814759f80b7... virtio: fix vq # for balloon

This still has merge markers in it.

Thanks,
-- Daniel

Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 9:32 AM Oleg Nesterov  wrote:
>
> It doesn't make any sense to have 2 versions of _register(). Note that
> trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
> to check tu->ref_ctr_offset to decide which one should be used, it could
> safely pass ref_ctr_offset == 0 to uprobe_register_refctr().
>
> Add this argument to uprobe_register(), update the callers, and kill
> uprobe_register_refctr().
>
> Signed-off-by: Oleg Nesterov 
> ---
>  include/linux/uprobes.h |  9 ++---
>  kernel/events/uprobes.c | 23 +--
>  kernel/trace/bpf_trace.c|  2 +-
>  kernel/trace/trace_uprobe.c |  8 ++--
>  4 files changed, 10 insertions(+), 32 deletions(-)
>

LGTM with few nits below.

Acked-by: Andrii Nakryiko 

>  /*
>   * uprobe_apply - unregister an already registered probe.
>   * @inode: the file in which the probe has to be removed.
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d1daeab1bbc1..467f358c8ce7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr 
> *attr, struct bpf_prog *pr
>   _uprobe_multi_link_lops, prog);
>
> for (i = 0; i < cnt; i++) {
> -   err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> +   err = uprobe_register(d_real_inode(link->path.dentry),
>  uprobes[i].offset,
>  uprobes[i].ref_ctr_offset,
>  [i].consumer);

please adjust indentation here

> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c98e3b3386ba..78a5c40e885a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1089,12 +1089,8 @@ static int trace_uprobe_enable(struct trace_uprobe 
> *tu, filter_func_t filter)
> tu->consumer.filter = filter;
> tu->inode = d_real_inode(tu->path.dentry);
>
> -   if (tu->ref_ctr_offset)
> -   ret = uprobe_register_refctr(tu->inode, tu->offset,
> -   tu->ref_ctr_offset, >consumer);
> -   else
> -   ret = uprobe_register(tu->inode, tu->offset, >consumer);
> -
> +   ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
> + >consumer);

doesn't fit under 100 characters? If it does, please keep as a single line.

> if (ret)
> tu->inode = NULL;
>
> --
> 2.25.1.362.g51ebf55
>
>

[PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Oleg Nesterov

This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
put_uprobe(). And to me this change simplifies the code a bit.

Signed-off-by: Oleg Nesterov 
---
 include/linux/uprobes.h | 14 ++--
 kernel/events/uprobes.c | 45 -
 kernel/trace/bpf_trace.c| 12 +-
 kernel/trace/trace_uprobe.c | 28 +++
 4 files changed, 41 insertions(+), 58 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index aa89a8b67039..399509befcf4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
*mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
-extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
+extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, 
bool);
+extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer 
*uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, 
unsigned long end);
 extern void uprobe_start_dup_mmap(void);
@@ -147,18 +147,18 @@ static inline void uprobes_init(void)
 
 #define uprobe_get_trap_addr(regs) instruction_pointer(regs)
 
-static inline int
+static inline struct uprobe *
 uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, 
struct uprobe_consumer *uc)
 {
-   return -ENOSYS;
+   return ERR_PTR(-ENOSYS);
 }
 static inline int
-uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
bool add)
+uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
 {
return -ENOSYS;
 }
 static inline void
-uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer 
*uc)
+uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 }
 static inline int uprobe_mmap(struct vm_area_struct *vma)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e5b7c6239970..dba41403d7b8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1100,22 +1100,15 @@ __uprobe_unregister(struct uprobe *uprobe, struct 
uprobe_consumer *uc)
 
 /*
  * uprobe_unregister - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
+ * @uprobe: uprobe to remove
  * @offset: offset from the start of the file.
  * @uc: identify which probe if multiple probes are colocated.
  */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc)
+void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
-   struct uprobe *uprobe;
-
-   uprobe = find_uprobe(inode, offset);
-   if (WARN_ON(!uprobe))
-   return;
-
down_write(>register_rwsem);
__uprobe_unregister(uprobe, uc);
up_write(>register_rwsem);
-   put_uprobe(uprobe);
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
@@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * refcount is released when the last @uc for the @uprobe
  * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
- *
- * Return errno if it cannot successully install probes
- * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
-   struct uprobe_consumer *uc)
+struct uprobe *uprobe_register(struct inode *inode,
+   loff_t offset, loff_t ref_ctr_offset,
+   struct uprobe_consumer *uc)
 {
struct uprobe *uprobe;
int ret;
 
/* Uprobe must have at least one set consumer */
if (!uc->handler && !uc->ret_handler)
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
 
/* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
if (!inode->i_mapping->a_ops->read_folio &&
!shmem_mapping(inode->i_mapping))
-   return -EIO;
+   return ERR_PTR(-EIO);
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
 
/*
 * This ensures that copy_from_page(), copy_to_page() and
 * __update_ref_ctr() can't cross page boundary.
 */

Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *

2024-07-10 Thread Jiri Olsa

On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:

SNIP

> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 467f358c8ce7..7571811127a2 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
>   loff_t offset;
>   unsigned long ref_ctr_offset;
>   u64 cookie;
> + struct uprobe *uprobe;
>   struct uprobe_consumer consumer;
>  };
>  
> @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, 
> struct bpf_uprobe *uprobes,
>  {
>   u32 i;
>  
> - for (i = 0; i < cnt; i++) {
> - uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> -   [i].consumer);
> - }

nice, we could also drop path argument now

jirka

> + for (i = 0; i < cnt; i++)
> + uprobe_unregister(uprobes[i].uprobe, [i].consumer);
>  }
>  
>  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr 
> *attr, struct bpf_prog *pr
> _uprobe_multi_link_lops, prog);
>  
>   for (i = 0; i < cnt; i++) {
> - err = uprobe_register(d_real_inode(link->path.dentry),
> + uprobes[i].uprobe = 
> uprobe_register(d_real_inode(link->path.dentry),
>uprobes[i].offset,
>uprobes[i].ref_ctr_offset,
>[i].consumer);
> - if (err) {
> + if (IS_ERR(uprobes[i].uprobe)) {
> + err = PTR_ERR(uprobes[i].uprobe);
>   bpf_uprobe_unregister(, uprobes, i);
>   goto error_free;
>   }

[PATCH 2/3] uprobes: simplify error handling for alloc_uprobe()

2024-07-10 Thread Oleg Nesterov

From: Andrii Nakryiko 

Return -ENOMEM instead of NULL, which makes caller's error handling just
a touch simpler.

Signed-off-by: Andrii Nakryiko 
Signed-off-by: Oleg Nesterov 
---
 kernel/events/uprobes.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4950decebe5c..e5b7c6239970 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
 
uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
if (!uprobe)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
uprobe->inode = inode;
uprobe->offset = offset;
@@ -1166,8 +1166,6 @@ int uprobe_register(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset,
 
  retry:
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
-   if (!uprobe)
-   return -ENOMEM;
if (IS_ERR(uprobe))
return PTR_ERR(uprobe);
 
-- 
2.25.1.362.g51ebf55

[PATCH 1/3] uprobes: kill uprobe_register_refctr()

2024-07-10 Thread Oleg Nesterov

It doesn't make any sense to have 2 versions of _register(). Note that
trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
to check tu->ref_ctr_offset to decide which one should be used, it could
safely pass ref_ctr_offset == 0 to uprobe_register_refctr().

Add this argument to uprobe_register(), update the callers, and kill
uprobe_register_refctr().

Signed-off-by: Oleg Nesterov 
---
 include/linux/uprobes.h |  9 ++---
 kernel/events/uprobes.c | 23 +--
 kernel/trace/bpf_trace.c|  2 +-
 kernel/trace/trace_uprobe.c |  8 ++--
 4 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..aa89a8b67039 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,8 +110,7 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
*mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
-extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_register(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -149,11 +148,7 @@ static inline void uprobes_init(void)
 #define uprobe_get_trap_addr(regs) instruction_pointer(regs)
 
 static inline int
-uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-   return -ENOSYS;
-}
-static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, 
struct uprobe_consumer *uc)
 {
return -ENOSYS;
 }
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1bdf386485d4..4950decebe5c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1120,25 +1120,25 @@ void uprobe_unregister(struct inode *inode, loff_t 
offset, struct uprobe_consume
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
 /*
- * __uprobe_register - register a probe
+ * uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, __uprobe_register() takes a creation
+ * Apart from the access refcount, uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of __uprobe_register() is required to keep @inode
+ * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-static int __uprobe_register(struct inode *inode, loff_t offset,
-loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
+   struct uprobe_consumer *uc)
 {
struct uprobe *uprobe;
int ret;
@@ -1190,21 +1190,8 @@ static int __uprobe_register(struct inode *inode, loff_t 
offset,
goto retry;
return ret;
 }
-
-int uprobe_register(struct inode *inode, loff_t offset,
-   struct uprobe_consumer *uc)
-{
-   return __uprobe_register(inode, offset, 0, uc);
-}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
-int uprobe_register_refctr(struct inode *inode, loff_t offset,
-  loff_t ref_ctr_offset, struct uprobe_consumer *uc)
-{
-   return __uprobe_register(inode, offset, ref_ctr_offset, uc);
-}
-EXPORT_SYMBOL_GPL(uprobe_register_refctr);
-
 /*
  * uprobe_apply - unregister an already registered probe.
  * @inode: the file in which the probe has to be removed.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d1daeab1bbc1..467f358c8ce7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr 
*attr, struct bpf_prog *pr
  _uprobe_multi_link_lops, prog);
 
for (i = 0; i < cnt; i++) {
-   err =

[PATCH 0/3] uprobes: future cleanups for review

2024-07-10 Thread Oleg Nesterov

On 07/10, Oleg Nesterov wrote:
>
> Peter, these simple cleanups should not conflict with your changes,
> but I can resend them later if it causes any inconvenience.

In fact I would like to push 2 more cleanups before the more significant
changes, but they certainly conflict with your ongoing work, albeit only
textually.

Let me send the patches for review anyway, perhaps you can take at least
the 1st one.

3/3 is only compile tested so far. Andrii, can you take a look?

Oleg.

Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-07-10 Thread Josh Poimboeuf

On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra  wrote:
> > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > > You can see it replaced the first byte, the following 3 bytes are
> > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > > you can see I stepped through a few more instructions.
> > >
> > > Works by accident?
> >
> > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> > that supports it.
> 
> OK, I don't know what the implications are, but it's a good accident :)
> 
> Anyways, what should I do for v4? Drop is_endbr6() check or keep it?

Given the current behavior of uprobe overwriting ENDBR64 with INT3, the
is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry()
would never return true on OSes which have the ENDBR64 compiled in.

However, once userspace IBT actually gets enabled, uprobe should skip
the ENDBR64 and patch the subsequent instruction.  Then the is_endbr6()
check would no longer be needed.

-- 
Josh

Re: [PATCH v3 9/9] dt-bindings: interconnect: qcom: msm8953: Fix 'See also' in description

2024-07-10 Thread Rob Herring (Arm)



On Tue, 09 Jul 2024 12:22:54 +0200, Adam Skladowski wrote:
> "See also" in description seems to be wrongly defined,
> make it inline with other yamls.
> 
> Fixes: 791ed23f735b ("dt-bindings: interconnect: qcom: Add Qualcomm MSM8953 
> NoC")
> Signed-off-by: Adam Skladowski 
> ---
>  .../devicetree/bindings/interconnect/qcom,msm8953.yaml | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 

Acked-by: Rob Herring (Arm)

Re: [PATCH v3 7/9] dt-bindings: interconnect: qcom: msm8939: Fix example

2024-07-10 Thread Rob Herring (Arm)



On Tue, 09 Jul 2024 12:22:52 +0200, Adam Skladowski wrote:
> For now example list snoc_mm as children of bimc which is obviously
> not valid, drop bimc and move snoc_mm into snoc.
> 
> Signed-off-by: Adam Skladowski 
> ---
>  .../devicetree/bindings/interconnect/qcom,msm8939.yaml  | 6 --
>  1 file changed, 6 deletions(-)
> 

Reviewed-by: Rob Herring (Arm)

Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support

2024-07-10 Thread David Woodhouse

On Wed, 2024-07-10 at 15:07 +0200, Peter Hilber wrote:
> On 08.07.24 11:27, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > The vmclock "device" provides a shared memory region with precision clock
> > information. By using shared memory, it is safe across Live Migration.
> > 
> > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > actually helpful.
> > 
> > The memory region of the device is also exposed to userspace so it can be
> > read or memory mapped by application which need reliable notification of
> > clock disruptions.
> > 
> > Signed-off-by: David Woodhouse 
> 
> [...]
> 
> > +
> > +struct vmclock_abi {
> > +   /* CONSTANT FIELDS */
> > +   uint32_t magic;
> > +#define VMCLOCK_MAGIC  0x4b4c4356 /* "VCLK" */
> > +   uint32_t size;  /* Size of region containing this structure 
> > */
> > +   uint16_t version;   /* 1 */
> > +   uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except 
> > INVALID */
> > +#define VMCLOCK_COUNTER_ARM_VCNT   0
> > +#define VMCLOCK_COUNTER_X86_TSC1
> > +#define VMCLOCK_COUNTER_INVALID0xff
> > +   uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> > +#define VMCLOCK_TIME_UTC   0   /* Since 1970-01-01 
> > 00:00:00z */
> > +#define VMCLOCK_TIME_TAI   1   /* Since 1970-01-01 
> > 00:00:00z */
> > +#define VMCLOCK_TIME_MONOTONIC 2   /* Since undefined 
> > epoch */
> > +#define VMCLOCK_TIME_INVALID_SMEARED   3   /* Not supported */
> > +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4   /* Not supported */
> > +
> > +   /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> > +   uint32_t seq_count; /* Low bit means an update is in progress */
> > +   /*
> > +    * This field changes to another non-repeating value when the CPU
> > +    * counter is disrupted, for example on live migration. This lets
> > +    * the guest know that it should discard any calibration it has
> > +    * performed of the counter against external sources (NTP/PTP/etc.).
> > +    */
> > +   uint64_t disruption_marker;
> > +   uint64_t flags;
> > +   /* Indicates that the tai_offset_sec field is valid */
> > +#define VMCLOCK_FLAG_TAI_OFFSET_VALID  (1 << 0)
> > +   /*
> > +    * Optionally used to notify guests of pending maintenance events.
> > +    * A guest which provides latency-sensitive services may wish to
> > +    * remove itself from service if an event is coming up. Two flags
> > +    * indicate the approximate imminence of the event.
> > +    */
> > +#define VMCLOCK_FLAG_DISRUPTION_SOON   (1 << 1) /* About a day */
> > +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT   (1 << 2) /* About an hour */
> > +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3)
> > +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4)
> > +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID   (1 << 5)
> > +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID   (1 << 6)
> > +   /*
> > +    * Even regardless of leap seconds, the time presented through this
> > +    * mechanism may not be strictly monotonic. If the counter slows 
> > down
> > +    * and the host adapts to this discovery, the time calculated from
> > +    * the value of the counter immediately after an update to this
> > +    * structure, may appear to be *earlier* than a calculation just
> > +    * before the update (while the counter was believed to be running
> > +    * faster than it now is). A guest operating system will typically
> > +    * *skew* its own system clock back towards the reference clock
> > +    * exposed here, rather than following this clock directly. If,
> > +    * however, this structure is being populated from such a system
> > +    * clock which is already handled in such a fashion and the results
> > +    * *are* guaranteed to be monotonic, such monotonicity can be
> > +    * advertised by setting this bit.
> > +    */
> 
> I wonder if this might be difficult to define in a standard.

I'm sure we could do better than my attempt above, but surely it isn't
*so* hard to define monotonicity?

> Is there a need to define device and driver behavior in more detail? What
> would happen if e.g. the device first decides how to update the clock, but
> is then slow to update the SHM?

That's OK, isn't it?

The key in the VMCLOCK_FLAG_TIME_MONOTONIC flag is that by setting it,
the host guarantees that the time calculated according to this
structure at any given moment, shall not appear to be later than the
time calculated via the structure at any *later* moment.

The kernel typically takes create care to ensure that time as seen by
userspace (e.g. in the real vDSO) *is* monotonic. If the underlying
counter is speeding up and the

Re: [PATCH v3 3/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8937 NoC

2024-07-10 Thread Rob Herring (Arm)



On Tue, 09 Jul 2024 12:22:48 +0200, Adam Skladowski wrote:
> Add bindings for Qualcomm MSM8937 Network-On-Chip interconnect devices.
> 
> Signed-off-by: Adam Skladowski 
> ---
>  .../bindings/interconnect/qcom,msm8939.yaml   |  8 +-
>  .../dt-bindings/interconnect/qcom,msm8937.h   | 93 +++
>  2 files changed, 99 insertions(+), 2 deletions(-)
>  create mode 100644 include/dt-bindings/interconnect/qcom,msm8937.h
> 

Reviewed-by: Rob Herring (Arm)

Re: [PATCH 1/1] remoteproc: mediatek: Support multiple reserved memory regions

2024-07-10 Thread Mathieu Poirier

On Wed, Jul 03, 2024 at 07:53:08PM +0800, Shun-yi Wang wrote:
> From: "shun-yi.wang" 
> 
> SCP supports multiple reserved memory regions, intended for
> specific hardwards.
>
> Signed-off-by: shun-yi.wang 
> ---
>  drivers/remoteproc/mtk_scp.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
> index 9ecd5ea04b5f3..1902826cea0af 100644
> --- a/drivers/remoteproc/mtk_scp.c
> +++ b/drivers/remoteproc/mtk_scp.c
> @@ -1006,22 +1006,31 @@ EXPORT_SYMBOL_GPL(scp_mapping_dm_addr);
>  
>  static int scp_map_memory_region(struct mtk_scp *scp)
>  {
> - int ret;
> + int ret, i, err;
>   const struct mtk_scp_sizes_data *scp_sizes;
> + struct device_node *node = scp->dev->of_node;
> + struct of_phandle_iterator it;
> +
> + i = 0;
> + of_for_each_phandle(, err, node, "memory-region", NULL, 0) {
> + ret = of_reserved_mem_device_init_by_idx(scp->dev, node, i);
> +
> + if (ret) {
> + dev_err(scp->dev, "failed to assign memory-region: 
> %s\n",
> + it.node->name);
> + of_node_put(it.node);
> + return -ENOMEM;
> + }

With this patch the code is out of sync with the bindings which are still
specifying a maxItems of 1 - please address.

Thanks,
Mathieu

>  
> - ret = of_reserved_mem_device_init(scp->dev);
> + i++;
> + }
>  
>   /* reserved memory is optional. */
> - if (ret == -ENODEV) {
> + if (!i) {
>   dev_info(scp->dev, "skipping reserved memory initialization.");
>   return 0;
>   }
>  
> - if (ret) {
> - dev_err(scp->dev, "failed to assign memory-region: %d\n", ret);
> - return -ENOMEM;
> - }
> -
>   /* Reserved SCP code size */
>   scp_sizes = scp->data->scp_sizes;
>   scp->cpu_addr = dma_alloc_coherent(scp->dev, scp_sizes->max_dram_size,
> -- 
> 2.18.0
>

Re: [PATCH v3 3/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8937 NoC

2024-07-10 Thread Rob Herring

On Tue, Jul 09, 2024 at 12:22:48PM +0200, Adam Skladowski wrote:
> Add bindings for Qualcomm MSM8937 Network-On-Chip interconnect devices.

That is obvious. What would be useful is detailing how 8937 is similar 
to the existing devices.

> 
> Signed-off-by: Adam Skladowski 
> ---
>  .../bindings/interconnect/qcom,msm8939.yaml   |  8 +-
>  .../dt-bindings/interconnect/qcom,msm8937.h   | 93 +++
>  2 files changed, 99 insertions(+), 2 deletions(-)
>  create mode 100644 include/dt-bindings/interconnect/qcom,msm8937.h
> 
> diff --git a/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml 
> b/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml
> index 0641a3c992a5..d19e20247df8 100644
> --- a/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml
> +++ b/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml
> @@ -4,13 +4,13 @@
>  $id: http://devicetree.org/schemas/interconnect/qcom,msm8939.yaml#
>  $schema: http://devicetree.org/meta-schemas/core.yaml#
>  
> -title: Qualcomm MSM8939/MSM8976 Network-On-Chip interconnect
> +title: Qualcomm MSM8937/MSM8939/MSM8976 Network-On-Chip interconnect
>  
>  maintainers:
>- Konrad Dybcio 
>  
>  description:
> -  The Qualcomm MSM8939/MSM8976 interconnect providers support
> +  The Qualcomm MSM8937/MSM8939/MSM8976 interconnect providers support
>adjusting the bandwidth requirements between the various NoC fabrics.
>  
>  allOf:
> @@ -19,6 +19,9 @@ allOf:
>  properties:
>compatible:
>  enum:
> +  - qcom,msm8937-bimc
> +  - qcom,msm8937-pcnoc
> +  - qcom,msm8937-snoc
>- qcom,msm8939-bimc
>- qcom,msm8939-pcnoc
>- qcom,msm8939-snoc
> @@ -43,6 +46,7 @@ patternProperties:
>  properties:
>compatible:
>  enum:
> +  - qcom,msm8937-snoc-mm
>- qcom,msm8939-snoc-mm
>- qcom,msm8976-snoc-mm
>  
> diff --git a/include/dt-bindings/interconnect/qcom,msm8937.h 
> b/include/dt-bindings/interconnect/qcom,msm8937.h
> new file mode 100644
> index ..98b8a4637aab
> --- /dev/null
> +++ b/include/dt-bindings/interconnect/qcom,msm8937.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */
> +/*
> + * Qualcomm MSM8937 interconnect IDs
> + */
> +
> +#ifndef __DT_BINDINGS_INTERCONNECT_QCOM_MSM8937_H
> +#define __DT_BINDINGS_INTERCONNECT_QCOM_MSM8937_H
> +
> +/* BIMC fabric */
> +#define MAS_APPS_PROC0
> +#define MAS_OXILI1
> +#define MAS_SNOC_BIMC_0  2
> +#define MAS_SNOC_BIMC_2  3
> +#define MAS_SNOC_BIMC_1  4
> +#define MAS_TCU_05
> +#define SLV_EBI  6
> +#define SLV_BIMC_SNOC7
> +
> +/* PCNOC fabric */
> +#define MAS_SPDM 0
> +#define MAS_BLSP_1   1
> +#define MAS_BLSP_2   2
> +#define MAS_USB_HS1  3
> +#define MAS_XI_USB_HS1   4
> +#define MAS_CRYPTO   5
> +#define MAS_SDCC_1   6
> +#define MAS_SDCC_2   7
> +#define MAS_SNOC_PCNOC   8
> +#define PCNOC_M_09
> +#define PCNOC_M_110
> +#define PCNOC_INT_0  11
> +#define PCNOC_INT_1  12
> +#define PCNOC_INT_2  13
> +#define PCNOC_INT_3  14
> +#define PCNOC_S_015
> +#define PCNOC_S_116
> +#define PCNOC_S_217
> +#define PCNOC_S_318
> +#define PCNOC_S_419
> +#define PCNOC_S_620
> +#define PCNOC_S_721
> +#define PCNOC_S_822
> +#define SLV_SDCC_2   23
> +#define SLV_SPDM 24
> +#define SLV_PDM  25
> +#define SLV_PRNG 26
> +#define SLV_TCSR 27
> +#define SLV_SNOC_CFG 28
> +#define SLV_MESSAGE_RAM  29
> +#define SLV_CAMERA_SS_CFG30
> +#define SLV_DISP_SS_CFG  31
> +#define SLV_VENUS_CFG32
> +#define SLV_GPU_CFG  33
> +#define SLV_TLMM 34
> +#define SLV_BLSP_1   35
> +#define SLV_BLSP_2   36
> +#define SLV_PMIC_ARB 37
> +#define SLV_SDCC_1   38
> +#define SLV_CRYPTO_0_CFG 39
> +#define SLV_USB_HS   40
> +#define SLV_TCU  41
> +#define SLV_PCNOC_SNOC   42
> +
> +/* SNOC fabric */
> +#define MAS_QDSS_BAM 0
> +#define MAS_BIMC_SNOC1
> +#define MAS_PCNOC_SNOC   2
> +#define MAS_QDSS_ETR 3
> +#define QDSS_INT 4
> +#define SNOC_INT_0   5
> +#define SNOC_INT_1   6
> +#define SNOC_INT_2   7
> +#define SLV_KPSS_AHB 8
> +#define SLV_WCSS 9
> +#define SLV_SNOC_BIMC_1  10
> +#define SLV_IMEM 11
> +#define SLV_SNOC_PCNOC   12
> +#define SLV_QDSS_STM 13
> +#define SLV_CATS_1   14
> +#define SLV_LPASS15
> +
> +/* SNOC-MM fabric */
> +#define MAS_JPEG

Re: [PATCH v3 1/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8976 NoC

2024-07-10 Thread Rob Herring

On Tue, Jul 09, 2024 at 12:22:46PM +0200, Adam Skladowski wrote:
> Add bindings for Qualcomm MSM8976 Network-On-Chip interconnect devices.
> 
> Signed-off-by: Adam Skladowski 
> ---
>  .../bindings/interconnect/qcom,msm8939.yaml   | 15 ++-
>  .../dt-bindings/interconnect/qcom,msm8976.h   | 97 +++
>  2 files changed, 107 insertions(+), 5 deletions(-)
>  create mode 100644 include/dt-bindings/interconnect/qcom,msm8976.h

Reviewed-by: Rob Herring (Arm)

Re: [PATCH 0/1] Support multiple reserved memory regions

2024-07-10 Thread Mathieu Poirier

Good morning,

On Wed, Jul 03, 2024 at 07:53:07PM +0800, Shun-yi Wang wrote:
> From: "shun-yi.wang" 
> 
> Besides the reserved memory region for SCP, there are additional 
> reserved memory regions for specific hardware use.
> Currently, only a single memory region is supported.
> Modifications are made to support multiple memory regions.
> 

This is the changelog that should be in the patch.  Usually there is no need for
a cover letter when there is a single patch in the set.

> shun-yi.wang (1):
>   remoteproc: mediatek: support multiple reserved memory
> 
>  drivers/remoteproc/mtk_scp.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> -- 
> 2.18.0
>

Re: [PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Mathieu Poirier

On Wed, 10 Jul 2024 at 05:43, Michael S. Tsirkin  wrote:
>
> virtio balloon communicates to the core that in some
> configurations vq #s are non-contiguous by setting name
> pointer to NULL.
>
> Unfortunately, core then turned around and just made them
> contiguous again. Result is that driver is out of spec.
>
> Implement what the API was supposed to do
> in the 1st place. Compatibility with buggy hypervisors
> is handled inside virtio-balloon, which is the only driver
> making use of this facility, so far.
>
> Message-ID: 
> Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page 
> reports to host")
> Cc: "Alexander Duyck" 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  arch/um/drivers/virtio_uml.c   |  4 ++--
>  drivers/remoteproc/remoteproc_virtio.c |  4 ++--

Reviewed-by: Mathieu Poirier 

>  drivers/s390/virtio/virtio_ccw.c   |  4 ++--
>  drivers/virtio/virtio_mmio.c   |  4 ++--
>  drivers/virtio/virtio_pci_common.c | 11 ---
>  drivers/virtio/virtio_vdpa.c   |  4 ++--
>  6 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
> index 2b6e701776b6..c903e4959f51 100644
> --- a/arch/um/drivers/virtio_uml.c
> +++ b/arch/um/drivers/virtio_uml.c
> @@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
>struct irq_affinity *desc)
>  {
> struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
> -   int i, queue_idx = 0, rc;
> +   int i, rc;
> struct virtqueue *vq;
>
> /* not supported for now */
> @@ -1038,7 +1038,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
> continue;
> }
>
> -   vqs[i] = vu_setup_vq(vdev, queue_idx++, vqi->callback,
> +   vqs[i] = vu_setup_vq(vdev, i, vqi->callback,
>  vqi->name, vqi->ctx);
> if (IS_ERR(vqs[i])) {
> rc = PTR_ERR(vqs[i]);
> diff --git a/drivers/remoteproc/remoteproc_virtio.c 
> b/drivers/remoteproc/remoteproc_virtio.c
> index d3f39009b28e..1019b2825c26 100644
> --- a/drivers/remoteproc/remoteproc_virtio.c
> +++ b/drivers/remoteproc/remoteproc_virtio.c
> @@ -185,7 +185,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
> *vdev, unsigned int nvqs,
>  struct virtqueue_info vqs_info[],
>  struct irq_affinity *desc)
>  {
> -   int i, ret, queue_idx = 0;
> +   int i, ret;
>
> for (i = 0; i < nvqs; ++i) {
> struct virtqueue_info *vqi = _info[i];
> @@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
> *vdev, unsigned int nvqs,
> continue;
> }
>
> -   vqs[i] = rp_find_vq(vdev, queue_idx++, vqi->callback,
> +   vqs[i] = rp_find_vq(vdev, i, vqi->callback,
> vqi->name, vqi->ctx);
> if (IS_ERR(vqs[i])) {
> ret = PTR_ERR(vqs[i]);
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index 62eca9419ad7..82a3440bbabb 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -694,7 +694,7 @@ static int virtio_ccw_find_vqs(struct virtio_device 
> *vdev, unsigned nvqs,
>  {
> struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> dma64_t *indicatorp = NULL;
> -   int ret, i, queue_idx = 0;
> +   int ret, i;
> struct ccw1 *ccw;
> dma32_t indicatorp_dma = 0;
>
> @@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device 
> *vdev, unsigned nvqs,
> continue;
> }
>
> -   vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, vqi->callback,
> +   vqs[i] = virtio_ccw_setup_vq(vdev, i, vqi->callback,
>  vqi->name, vqi->ctx, ccw);
> if (IS_ERR(vqs[i])) {
> ret = PTR_ERR(vqs[i]);
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 90e784e7b721..db6a0366f082 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -494,7 +494,7 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned int nvqs,
>  {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> int irq = platform_get_irq(vm_dev->pdev, 0);
> -   int i, err, queue_idx = 0;
> +   int i, err;
>
> if (irq < 0)
> return irq;
> @@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned int nvqs,
> continue;
> }
>
> -   vqs[i] = vm_setup_vq(vdev, queue_idx++, vqi->callback,
> +   vqs[i] = vm_setup_vq(vdev, i, vqi->callback,

Re: [PATCH v4 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race

2024-07-10 Thread Haitao Huang

On Fri, 05 Jul 2024 02:45:24 -0500, Dmitrii Kuvaiskii  
 wrote:



Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:

/*
 * CPU1: User space performs
 * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
 * on enclave page X
 */
sgx_encl_remove_pages() {

  mutex_lock(>lock);

  entry = sgx_encl_load_page(encl);
  /*
   * verify that page is
   * trimmed and accepted
   */

  mutex_unlock(>lock);

  /*
   * remove PTE entry; cannot
   * be performed under lock
   */
  sgx_zap_enclave_ptes(encl);
 /*
  * Fault on CPU2 on same page X
  */
 sgx_vma_fault() {
   /*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
   xa_load(>page_array) != NULL ->
   /*
* SGX driver thinks that this page
* was swapped out and loads it
*/
   mutex_lock(>lock);
   /*
* this is effectively a no-op
*/
   entry = sgx_encl_load_page_in_vma();
   /*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
   vmf_insert_pfn(...);

   mutex_unlock(>lock);
   return VM_FAULT_NOPAGE;
 }
  /*
   * continue with page removal
   */
  mutex_lock(>lock);

  sgx_encl_free_epc_page(epc_page) {
/*
 * remove page via EREMOVE
 */
/*
 * free EPC page
 */
sgx_free_epc_page(epc_page);
  }

  xa_erase(>page_array);

  mutex_unlock(>lock);
}

Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.

Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.

Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: sta...@vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii 
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c  
b/arch/x86/kernel/cpu/sgx/ioctl.c

index 5d390df21440..02441883401d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl  
*encl,

/*
 * Do not keep encl->lock because of dependency on
 * mmap_lock acquired in sgx_zap_enclave_ptes().
+*
+* Releasing encl->lock leads to a data race: while CPU1
+* performs sgx_zap_enclave_ptes() and removes the PTE entry
+* for the enclave page, CPU2 may attempt to load this page
+* (because the page is still in enclave's xarray). To prevent
+* CPU2 from loading the page, mark the page as busy.
 */
+   entry->desc |= SGX_ENCL_PAGE_BUSY;

Re: [PATCH v4 1/3] x86/sgx: Split SGX_ENCL_PAGE_BEING_RECLAIMED into two flags

2024-07-10 Thread Haitao Huang

On Fri, 05 Jul 2024 02:45:22 -0500, Dmitrii Kuvaiskii  
 wrote:



SGX_ENCL_PAGE_BEING_RECLAIMED flag is set when the enclave page is being
reclaimed (moved to the backing store). This flag however has two
logical meanings:

1. Don't attempt to load the enclave page (the page is busy).
2. Don't attempt to remove the PCMD page corresponding to this enclave
   page (the PCMD page is busy).

To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed. A
future commit will introduce a new case when the enclave page is being
removed; this new case will set only the SGX_ENCL_PAGE_BUSY flag.


LGTM.
Reviewed-by: Haitao Huang 
Thanks
Haitao

Re: [PATCH v2 01/12] uprobes: update outdated comment

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 6:33 AM Oleg Nesterov  wrote:
>
> On 07/03, Oleg Nesterov wrote:
> >
> > > /*
> > > -* The NULL 'tsk' here ensures that any faults that occur here
> > > -* will not be accounted to the task.  'mm' *is* current->mm,
> > > -* but we treat this as a 'remote' access since it is
> > > -* essentially a kernel access to the memory.
> > > +* 'mm' *is* current->mm, but we treat this as a 'remote' access since
> > > +* it is essentially a kernel access to the memory.
> > >  */
> > > result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, , NULL);
> >
> > OK, this makes it less confusing, so
> >
> > Acked-by: Oleg Nesterov 
> >
> > -
> > but it still looks confusing to me. This code used to pass tsk = NULL
> > only to avoid tsk->maj/min_flt++ in faultin_page().
> >
> > But today mm_account_fault() increments these counters without checking
> > FAULT_FLAG_REMOTE, mm == current->mm, so it seems it would be better to
> > just use get_user_pages() and remove this comment?
>
> Well, yes, it still looks confusing, imo.
>
> Andrii, I hope you won't mind if I redo/resend this and the next cleanup?
>
> The next one only updates the comment above uprobe_write_opcode(), but
> it would be nice to explain mmap_write_lock() in register_for_each_vma().
>

I don't mind a bit, thanks for sending the patches!

> Oleg.
>
>

Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-07-10 Thread Andrii Nakryiko

On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra  wrote:
>
> On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> > On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra  wrote:
> > >
> > > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > > > +#ifdef CONFIG_UPROBES
> > > > +/*
> > > > + * Heuristic-based check if uprobe is installed at the function entry.
> > > > + *
> > > > + * Under assumption of user code being compiled with frame pointers,
> > > > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > > > + *
> > > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common 
> > > > pattern.
> > > > + * If we get this wrong, captured stack trace might have one extra 
> > > > bogus
> > > > + * entry, but the rest of stack trace will still be meaningful.
> > > > + */
> > > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > > > +{
> > > > + struct arch_uprobe *auprobe;
> > > > +
> > > > + if (!current->utask)
> > > > + return false;
> > > > +
> > > > + auprobe = current->utask->auprobe;
> > > > + if (!auprobe)
> > > > + return false;
> > > > +
> > > > + /* push %rbp/%ebp */
> > > > + if (auprobe->insn[0] == 0x55)
> > > > + return true;
> > > > +
> > > > + /* endbr64 (64-bit only) */
> > > > + if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > > > + return true;
> > >
> > > I meant to reply to Josh suggesting this, but... how can this be? If you
> > > scribble the ENDBR with an INT3 things will #CP and we'll never get to
> > > the #BP.
> >
> > Well, it seems like it works in practice, I just tried. Here's the
> > disassembly of the function:
> >
> > 19d0 :
> > 19d0: f3 0f 1e fa   endbr64
> > 19d4: 55pushq   %rbp
> > 19d5: 48 89 e5  movq%rsp, %rbp
> > 19d8: 48 83 ec 10   subq$0x10, %rsp
> > 19dc: 48 8d 3d fe ed ff ff  leaq-0x1202(%rip), %rdi
> >  # 0x7e1 <__isoc99_scanf+0x7e1>
> > 19e3: 48 8d 75 fc   leaq-0x4(%rbp), %rsi
> > 19e7: b0 00 movb$0x0, %al
> > 19e9: e8 f2 00 00 00callq   0x1ae0 
> > <__isoc99_scanf+0x1ae0>
> > 19ee: b8 01 00 00 00movl$0x1, %eax
> > 19f3: 48 83 c4 10   addq$0x10, %rsp
> > 19f7: 5dpopq%rbp
> > 19f8: c3retq
> > 19f9: 0f 1f 80 00 00 00 00  nopl(%rax)
> >
> > And here's the state when uprobe is attached:
> >
> > (gdb) disass/r urandlib_api_v1
> > Dump of assembler code for function urandlib_api_v1:
> >0x7ffb734e39d0 <+0>: cc  int3
> >0x7ffb734e39d1 <+1>: 0f 1e fanop%edx
> >0x7ffb734e39d4 <+4>: 55  push   %rbp
> >0x7ffb734e39d5 <+5>: 48 89 e5mov%rsp,%rbp
> >0x7ffb734e39d8 <+8>: 48 83 ec 10 sub$0x10,%rsp
> >0x7ffb734e39dc <+12>:48 8d 3d fe ed ff fflea
> > -0x1202(%rip),%rdi# 0x7ffb734e27e1
> >0x7ffb734e39e3 <+19>:48 8d 75 fc lea
> > -0x4(%rbp),%rsi
> > => 0x7ffb734e39e7 <+23>:b0 00   mov$0x0,%al
> >0x7ffb734e39e9 <+25>:e8 f2 00 00 00  call
> > 0x7ffb734e3ae0 <__isoc99_scanf@plt>
> >0x7ffb734e39ee <+30>:b8 01 00 00 00  mov$0x1,%eax
> >0x7ffb734e39f3 <+35>:48 83 c4 10 add$0x10,%rsp
> >0x7ffb734e39f7 <+39>:5d  pop%rbp
> >0x7ffb734e39f8 <+40>:c3  ret
> >
> >
> > You can see it replaced the first byte, the following 3 bytes are
> > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> > you can see I stepped through a few more instructions.
> >
> > Works by accident?
>
> Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
> that supports it.

OK, I don't know what the implications are, but it's a good accident :)

Anyways, what should I do for v4? Drop is_endbr6() check or keep it?

Re: [PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override

2024-07-10 Thread Google

On Wed, 10 Jul 2024 15:13:07 +0200
Jiri Olsa  wrote:

> On Wed, Jul 10, 2024 at 04:59:39PM +0800, Menglong Dong wrote:
> > After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction
> 
> should be in Fixes: tag probably ?

Yes, I'll add a Fixed tag.

> 
> > pointer with original one"), "bpf_kprobe_override" is not used anywhere
> > anymore, and we can remove it now.
> > 
> > Signed-off-by: Menglong Dong 
> 
> lgtm, cc-ing Masami
> 
> Acked-by: Jiri Olsa 

Thanks!

> 
> jirka
> 
> > ---
> >  include/linux/trace_events.h | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index 9df3e2973626..9435185c10ef 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -880,7 +880,6 @@ do {
> > \
> >  struct perf_event;
> >  
> >  DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
> > -DECLARE_PER_CPU(int, bpf_kprobe_override);
> >  
> >  extern int  perf_trace_init(struct perf_event *event);
> >  extern void perf_trace_destroy(struct perf_event *event);
> > -- 
> > 2.39.2
> > 
> > 


-- 
Masami Hiramatsu (Google)

Re: [PATCH v2 01/12] uprobes: update outdated comment

2024-07-10 Thread Oleg Nesterov

On 07/03, Oleg Nesterov wrote:
>
> > /*
> > -* The NULL 'tsk' here ensures that any faults that occur here
> > -* will not be accounted to the task.  'mm' *is* current->mm,
> > -* but we treat this as a 'remote' access since it is
> > -* essentially a kernel access to the memory.
> > +* 'mm' *is* current->mm, but we treat this as a 'remote' access since
> > +* it is essentially a kernel access to the memory.
> >  */
> > result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, , NULL);
>
> OK, this makes it less confusing, so
>
> Acked-by: Oleg Nesterov 
>
> -
> but it still looks confusing to me. This code used to pass tsk = NULL
> only to avoid tsk->maj/min_flt++ in faultin_page().
>
> But today mm_account_fault() increments these counters without checking
> FAULT_FLAG_REMOTE, mm == current->mm, so it seems it would be better to
> just use get_user_pages() and remove this comment?

Well, yes, it still looks confusing, imo.

Andrii, I hope you won't mind if I redo/resend this and the next cleanup?

The next one only updates the comment above uprobe_write_opcode(), but
it would be nice to explain mmap_write_lock() in register_for_each_vma().

Oleg.

Re: [PATCH v3] init: staging: Fix missing warning/taint on builtin code

2024-07-10 Thread Greg Kroah-Hartman

On Sat, Jul 06, 2024 at 12:15:01AM -0300, Ágatha Isabelle Chris Moreira Guedes 
wrote:
> Fix the absence of warning message and kernel tainting when initializing
> drivers from the `drivers/staging` subtree from initcalls (when
> configured as built-in).
> 
> When such a driver is built as module and the module is loaded, the
> `load_module()` function taints the kernel to signal code of unknown
> quality is loaded, and produces a warning like this:
> 
> [8.076352] rts5208: module is from the staging directory, the
> quality is unknown, you have been warned.
> 
> The same behaviour is absent, however, when a staging driver is compiled
> as built-in on the kernel image, since loading it happens through
> initcalls and not through load_module().
> 
> This might prevent relevant information of being available on a bug
> report (i.e. on a panic log) among other possible problems.
> 
> NOTES:
> - The patch is written in such a way that all non-staging drivers are
>   kept the way they were, except for staging drivers built with
>   `-DSTAGING_CODE`.

That's good!

> - Since it changes some macros related to clang LTO as well, I tested it
>   and it works properly in kernels compiled with both clang and gcc.

This is odd, why is it messing with LTO stuff?  It should be much more
"self contained" than this I feel like.

I see what you are doing by trying to use some of the LTO macros again,
but in doing so, it makes it really hard to understand the diff and feel
comfortable with this.

If you want to stick with what you have done here, can you split it up a
bit more?  Once patch for the LTO header file changes and then another
that only adds the staging stuff.  That way it's easier to review and
justify that nothing is going to be broken with this patch.

> - Some `checkpatch.pl` errors, warnings and checks (with `--strict`) are
>   present. Some were already there, some I introduced but I think
>   they're unavoidable. Some IMHO don´t make sense at all, I think they
>   would apply for most regular macros but initcall macros are just way
>   different.

Yeah, checkpatch and macros can get tricky, use your best judgement here
and it looks ok.

> Fixes: 061b1bd394ca ("Staging: add TAINT_CRAP for all drivers/staging code")

I think it really fixes the commit _after_ this one that turns on the
taint for the build :)

anyway, nice work, I think it's almost there!

greg k-h

Re: [PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override

2024-07-10 Thread Jiri Olsa

On Wed, Jul 10, 2024 at 04:59:39PM +0800, Menglong Dong wrote:
> After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction

should be in Fixes: tag probably ?

> pointer with original one"), "bpf_kprobe_override" is not used anywhere
> anymore, and we can remove it now.
> 
> Signed-off-by: Menglong Dong 

lgtm, cc-ing Masami

Acked-by: Jiri Olsa 

jirka

> ---
>  include/linux/trace_events.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 9df3e2973626..9435185c10ef 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -880,7 +880,6 @@ do {  
> \
>  struct perf_event;
>  
>  DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
> -DECLARE_PER_CPU(int, bpf_kprobe_override);
>  
>  extern int  perf_trace_init(struct perf_event *event);
>  extern void perf_trace_destroy(struct perf_event *event);
> -- 
> 2.39.2
> 
>

Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support

2024-07-10 Thread Peter Hilber

On 08.07.24 11:27, David Woodhouse wrote:
> From: David Woodhouse 
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
> 
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by application which need reliable notification of
> clock disruptions.
> 
> Signed-off-by: David Woodhouse 

[...]

> +
> +struct vmclock_abi {
> + /* CONSTANT FIELDS */
> + uint32_t magic;
> +#define VMCLOCK_MAGIC0x4b4c4356 /* "VCLK" */
> + uint32_t size;  /* Size of region containing this structure */
> + uint16_t version;   /* 1 */
> + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
> +#define VMCLOCK_COUNTER_ARM_VCNT 0
> +#define VMCLOCK_COUNTER_X86_TSC  1
> +#define VMCLOCK_COUNTER_INVALID  0xff
> + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> +#define VMCLOCK_TIME_UTC 0   /* Since 1970-01-01 
> 00:00:00z */
> +#define VMCLOCK_TIME_TAI 1   /* Since 1970-01-01 
> 00:00:00z */
> +#define VMCLOCK_TIME_MONOTONIC   2   /* Since 
> undefined epoch */
> +#define VMCLOCK_TIME_INVALID_SMEARED 3   /* Not supported */
> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED   4   /* Not supported */
> +
> + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> + uint32_t seq_count; /* Low bit means an update is in progress */
> + /*
> +  * This field changes to another non-repeating value when the CPU
> +  * counter is disrupted, for example on live migration. This lets
> +  * the guest know that it should discard any calibration it has
> +  * performed of the counter against external sources (NTP/PTP/etc.).
> +  */
> + uint64_t disruption_marker;
> + uint64_t flags;
> + /* Indicates that the tai_offset_sec field is valid */
> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID(1 << 0)
> + /*
> +  * Optionally used to notify guests of pending maintenance events.
> +  * A guest which provides latency-sensitive services may wish to
> +  * remove itself from service if an event is coming up. Two flags
> +  * indicate the approximate imminence of the event.
> +  */
> +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */
> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */
> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID   (1 << 3)
> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID   (1 << 4)
> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5)
> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6)
> + /*
> +  * Even regardless of leap seconds, the time presented through this
> +  * mechanism may not be strictly monotonic. If the counter slows down
> +  * and the host adapts to this discovery, the time calculated from
> +  * the value of the counter immediately after an update to this
> +  * structure, may appear to be *earlier* than a calculation just
> +  * before the update (while the counter was believed to be running
> +  * faster than it now is). A guest operating system will typically
> +  * *skew* its own system clock back towards the reference clock
> +  * exposed here, rather than following this clock directly. If,
> +  * however, this structure is being populated from such a system
> +  * clock which is already handled in such a fashion and the results
> +  * *are* guaranteed to be monotonic, such monotonicity can be
> +  * advertised by setting this bit.
> +  */

I wonder if this might be difficult to define in a standard.

Is there a need to define device and driver behavior in more detail? What
would happen if e.g. the device first decides how to update the clock, but
is then slow to update the SHM?

> +#define VMCLOCK_FLAG_TIME_MONOTONIC  (1 << 7)
> +
> + uint8_t pad[2];
> + uint8_t clock_status;
> +#define VMCLOCK_STATUS_UNKNOWN   0
> +#define VMCLOCK_STATUS_INITIALIZING  1
> +#define VMCLOCK_STATUS_SYNCHRONIZED  2
> +#define VMCLOCK_STATUS_FREERUNNING   3
> +#define VMCLOCK_STATUS_UNRELIABLE4
> +
> + /*
> +  * The time exposed through this device is never smeared. This field
> +  * corresponds to the 'subtype' field in virtio-rtc, which indicates
> +  * the smearing method. However in this case it provides a *hint* to
> +  * the guest operating system, such that *if* the guest OS wants to
> +  * provide its users with an alternative clock which does not follow
> +  * the POSIX CLOCK_REALTIME standard, it may do so in a fashion
> +  * consistent with the other systems in the nearby environment.

Re: [PATCH V2] test/vsock: add install target

2024-07-10 Thread Stefano Garzarella


On Wed, Jul 10, 2024 at 08:27:28PM GMT, Peng Fan (OSS) wrote:

From: Peng Fan 

Add install target for vsock to make Yocto easy to install the images.

Signed-off-by: Peng Fan 
---


LGTM! This is a net-next material, so next time better to specify it 
(e.g. [PATCH net-next]).


If not queued within a week, please resend specifying net-next.

Reviewed-by: Stefano Garzarella 



V2:
Use VSOCK_INSTALL_PATH, drop INSTALL_PATH

tools/testing/vsock/Makefile | 13 +
1 file changed, 13 insertions(+)

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index a7f56a09ca9f..6e0b4e95e230 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -13,3 +13,16 @@ CFLAGS += -g -O2 -Werror -Wall -I. -I../../include 
-I../../../usr/include -Wno-p
clean:
${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
+
+VSOCK_INSTALL_PATH ?=
+
+install: all
+ifdef VSOCK_INSTALL_PATH
+   mkdir -p $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_test $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_perf $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_diag_test $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_uring_test $(VSOCK_INSTALL_PATH)
+else
+   $(error Error: set VSOCK_INSTALL_PATH to use install)
+endif
--
2.37.1

[PATCH V2] test/vsock: add install target

2024-07-10 Thread Peng Fan (OSS)

From: Peng Fan 

Add install target for vsock to make Yocto easy to install the images.

Signed-off-by: Peng Fan 
---

V2:
 Use VSOCK_INSTALL_PATH, drop INSTALL_PATH

 tools/testing/vsock/Makefile | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index a7f56a09ca9f..6e0b4e95e230 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -13,3 +13,16 @@ CFLAGS += -g -O2 -Werror -Wall -I. -I../../include 
-I../../../usr/include -Wno-p
 clean:
${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
 -include *.d
+
+VSOCK_INSTALL_PATH ?=
+
+install: all
+ifdef VSOCK_INSTALL_PATH
+   mkdir -p $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_test $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_perf $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_diag_test $(VSOCK_INSTALL_PATH)
+   install -m 744 vsock_uring_test $(VSOCK_INSTALL_PATH)
+else
+   $(error Error: set VSOCK_INSTALL_PATH to use install)
+endif
-- 
2.37.1

Re: [PATCH] test/vsock: add install target

2024-07-10 Thread Stefano Garzarella


On Wed, Jul 10, 2024 at 11:34:05AM GMT, Peng Fan wrote:

Subject: Re: [PATCH] test/vsock: add install target

On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote:
>> Subject: Re: [PATCH] test/vsock: add install target
>>
>> On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote:
>> >From: Peng Fan 
>> >
>> >Add install target for vsock to make Yocto easy to install the
images.
>> >
>> >Signed-off-by: Peng Fan 
>> >---
>> > tools/testing/vsock/Makefile | 12 
>> > 1 file changed, 12 insertions(+)
>> >
>> >diff --git a/tools/testing/vsock/Makefile
>> >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460
>> 100644
>> >--- a/tools/testing/vsock/Makefile
>> >+++ b/tools/testing/vsock/Makefile
>> >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o
>> msg_zerocopy_common.o
>> > vsock_uring_test: LDLIBS = -luring
>> > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>> >msg_zerocopy_common.o
>> >
>> >+VSOCK_INSTALL_PATH ?= $(abspath .)
>> >+# Avoid changing the rest of the logic here and lib.mk.
>> >+INSTALL_PATH := $(VSOCK_INSTALL_PATH)
>> >+
>> > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include
>> > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow
>> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -
>> D_GNU_SOURCE
>> > .PHONY: all test clean
>> > clean:
>> >   ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
>> vsock_uring_test
>> > -include *.d
>> >+
>> >+install: all
>> >+  @# Ask all targets to install their files
>> >+  mkdir -p $(INSTALL_PATH)/vsock
>>
>> why using the "vsock" subdir?
>>
>> IIUC you were inspired by selftests/Makefile, but it installs under
>> $(INSTALL_PATH)/kselftest/ the scripts used by the main one
>> `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead.
>> So in this case I would install everything in $(INSTALL_PATH).
>>
>> WDYT?
>
>I agree.
>
>>
>> >+  install -m 744 vsock_test $(INSTALL_PATH)/vsock/
>> >+  install -m 744 vsock_perf $(INSTALL_PATH)/vsock/
>> >+  install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/
>> >+  install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/
>>
>> Also from selftests/Makefile, what about using the ifdef instead of
>> using $(abspath .) as default place?
>>
>> I mean this:
>>
>> install: all
>> ifdef INSTALL_PATH
>>...
>> else
>>$(error Error: set INSTALL_PATH to use install) endif
>
>Is the following looks good to you?
>
># Avoid conflict with INSTALL_PATH set by the main Makefile
>VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH)

I'm not a super Makefile expert, but why do we need both
VSOCK_INSTALL_PATH and INSTALL_PATH?


INSTALL_PATH is exported by kernel root directory makefile.
So to user, we need to avoid export INSTALL_PATH here.
So I just follow selftests/Makefile using KSFT_INSTALL_PATH


There is a comment there:

# Avoid changing the rest of the logic here and lib.mk.

Added by commit 17eac6c2db8b2cdfe33d40229bdda2acd86b304a.

IIUC they re-used INSTALL_PATH, just to avoid too many changes in that 
file and in tools/testing/selftests/lib.mk


So, IMHO we should not care about it and only use VSOCK_INSTALL_PATH if 
you don't want to conflict with INSTALL_PATH.


Stefano

[PATCH v2 2/2] virtio: fix vq # for balloon

2024-07-10 Thread Michael S. Tsirkin

virtio balloon communicates to the core that in some
configurations vq #s are non-contiguous by setting name
pointer to NULL.

Unfortunately, core then turned around and just made them
contiguous again. Result is that driver is out of spec.

Implement what the API was supposed to do
in the 1st place. Compatibility with buggy hypervisors
is handled inside virtio-balloon, which is the only driver
making use of this facility, so far.

Message-ID: 
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page 
reports to host")
Cc: "Alexander Duyck" 
Signed-off-by: Michael S. Tsirkin 
---
 arch/um/drivers/virtio_uml.c   |  4 ++--
 drivers/remoteproc/remoteproc_virtio.c |  4 ++--
 drivers/s390/virtio/virtio_ccw.c   |  4 ++--
 drivers/virtio/virtio_mmio.c   |  4 ++--
 drivers/virtio/virtio_pci_common.c | 11 ---
 drivers/virtio/virtio_vdpa.c   |  4 ++--
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 2b6e701776b6..c903e4959f51 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct irq_affinity *desc)
 {
struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
-   int i, queue_idx = 0, rc;
+   int i, rc;
struct virtqueue *vq;
 
/* not supported for now */
@@ -1038,7 +1038,7 @@ static int vu_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = vu_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = vu_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
rc = PTR_ERR(vqs[i]);
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index d3f39009b28e..1019b2825c26 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -185,7 +185,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
 struct virtqueue_info vqs_info[],
 struct irq_affinity *desc)
 {
-   int i, ret, queue_idx = 0;
+   int i, ret;
 
for (i = 0; i < nvqs; ++i) {
struct virtqueue_info *vqi = _info[i];
@@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
continue;
}
 
-   vqs[i] = rp_find_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = rp_find_vq(vdev, i, vqi->callback,
vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 62eca9419ad7..82a3440bbabb 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -694,7 +694,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
dma64_t *indicatorp = NULL;
-   int ret, i, queue_idx = 0;
+   int ret, i;
struct ccw1 *ccw;
dma32_t indicatorp_dma = 0;
 
@@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
continue;
}
 
-   vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = virtio_ccw_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx, ccw);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 90e784e7b721..db6a0366f082 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -494,7 +494,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
int irq = platform_get_irq(vm_dev->pdev, 0);
-   int i, err, queue_idx = 0;
+   int i, err;
 
if (irq < 0)
return irq;
@@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
continue;
}
 
-   vqs[i] = vm_setup_vq(vdev, queue_idx++, vqi->callback,
+   vqs[i] = vm_setup_vq(vdev, i, vqi->callback,
 vqi->name, vqi->ctx);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 7d82facafd75..fa606e7321ad 100644
--- a/drivers/virtio/virtio_pci_common.c
+++

[PATCH v2 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin

QEMU implemented the configuration
VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
incorrectly: it then uses vq3 for reporting, spec says it is always 4.

This is masked by a corresponding bug in driver:
add a work around as I'm going to try and fix the driver bug.

Message-ID: 
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page 
reports to host")
Cc: "Alexander Duyck" 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 54469277ca30..eebeab863697 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -589,8 +589,23 @@ static int init_vqs(struct virtio_balloon *vb)
 
err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
  vqs_info, NULL);
-   if (err)
-   return err;
+   if (err) {
+   /*
+* Try to work around QEMU bug which since 2020 confused vq 
numbers
+* when VIRTIO_BALLOON_F_REPORTING but not
+* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
+*/
+   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING) &&
+   !virtio_has_feature(vb->vdev, 
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].name = 
"reporting_vq";
+   vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].callback = 
balloon_ack;
+   err = virtio_find_vqs(vb->vdev,
+ VIRTIO_BALLOON_VQ_REPORTING, 
vqs_info, NULL);
+   }
+
+   if (err)
+   return err;
+   }
 
vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
-- 
MST

Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe

2024-07-10 Thread Peter Zijlstra

On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote:
> On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra  wrote:
> >
> > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote:
> > > +#ifdef CONFIG_UPROBES
> > > +/*
> > > + * Heuristic-based check if uprobe is installed at the function entry.
> > > + *
> > > + * Under assumption of user code being compiled with frame pointers,
> > > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > > + *
> > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > > + * If we get this wrong, captured stack trace might have one extra bogus
> > > + * entry, but the rest of stack trace will still be meaningful.
> > > + */
> > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs)
> > > +{
> > > + struct arch_uprobe *auprobe;
> > > +
> > > + if (!current->utask)
> > > + return false;
> > > +
> > > + auprobe = current->utask->auprobe;
> > > + if (!auprobe)
> > > + return false;
> > > +
> > > + /* push %rbp/%ebp */
> > > + if (auprobe->insn[0] == 0x55)
> > > + return true;
> > > +
> > > + /* endbr64 (64-bit only) */
> > > + if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
> > > + return true;
> >
> > I meant to reply to Josh suggesting this, but... how can this be? If you
> > scribble the ENDBR with an INT3 things will #CP and we'll never get to
> > the #BP.
> 
> Well, it seems like it works in practice, I just tried. Here's the
> disassembly of the function:
> 
> 19d0 :
> 19d0: f3 0f 1e fa   endbr64
> 19d4: 55pushq   %rbp
> 19d5: 48 89 e5  movq%rsp, %rbp
> 19d8: 48 83 ec 10   subq$0x10, %rsp
> 19dc: 48 8d 3d fe ed ff ff  leaq-0x1202(%rip), %rdi
>  # 0x7e1 <__isoc99_scanf+0x7e1>
> 19e3: 48 8d 75 fc   leaq-0x4(%rbp), %rsi
> 19e7: b0 00 movb$0x0, %al
> 19e9: e8 f2 00 00 00callq   0x1ae0 <__isoc99_scanf+0x1ae0>
> 19ee: b8 01 00 00 00movl$0x1, %eax
> 19f3: 48 83 c4 10   addq$0x10, %rsp
> 19f7: 5dpopq%rbp
> 19f8: c3retq
> 19f9: 0f 1f 80 00 00 00 00  nopl(%rax)
> 
> And here's the state when uprobe is attached:
> 
> (gdb) disass/r urandlib_api_v1
> Dump of assembler code for function urandlib_api_v1:
>0x7ffb734e39d0 <+0>: cc  int3
>0x7ffb734e39d1 <+1>: 0f 1e fanop%edx
>0x7ffb734e39d4 <+4>: 55  push   %rbp
>0x7ffb734e39d5 <+5>: 48 89 e5mov%rsp,%rbp
>0x7ffb734e39d8 <+8>: 48 83 ec 10 sub$0x10,%rsp
>0x7ffb734e39dc <+12>:48 8d 3d fe ed ff fflea
> -0x1202(%rip),%rdi# 0x7ffb734e27e1
>0x7ffb734e39e3 <+19>:48 8d 75 fc lea-0x4(%rbp),%rsi
> => 0x7ffb734e39e7 <+23>:b0 00   mov$0x0,%al
>0x7ffb734e39e9 <+25>:e8 f2 00 00 00  call
> 0x7ffb734e3ae0 <__isoc99_scanf@plt>
>0x7ffb734e39ee <+30>:b8 01 00 00 00  mov$0x1,%eax
>0x7ffb734e39f3 <+35>:48 83 c4 10 add$0x10,%rsp
>0x7ffb734e39f7 <+39>:5d  pop%rbp
>0x7ffb734e39f8 <+40>:c3  ret
> 
> 
> You can see it replaced the first byte, the following 3 bytes are
> remnants of endb64 (gdb says it's a nop? :)), and then we proceeded,
> you can see I stepped through a few more instructions.
> 
> Works by accident?

Yeah, we don't actually have Userspace IBT enabled yet, even on hardware
that supports it.

Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 03:37:49PM +0800, Jason Wang wrote:
> On Wed, Jul 10, 2024 at 2:16 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote:
> > > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin  wrote:
> > > >
> > > > QEMU implemented the configuration
> > > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > > > incorrectly: it then uses vq3 for reporting, spec says it is always 4.
> > > >
> > > > This is masked by a corresponding bug in driver:
> > > > add a work around as I'm going to try and fix the driver bug.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  drivers/virtio/virtio_balloon.c | 19 +--
> > > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c 
> > > > b/drivers/virtio/virtio_balloon.c
> > > > index 9a61febbd2f7..7dc3fcd56238 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
> > > >
> > > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
> > > >   callbacks, names, NULL);
> > > > -   if (err)
> > > > -   return err;
> > > > +   if (err) {
> > > > +   /*
> > > > +* Try to work around QEMU bug which since 2020 
> > > > confused vq numbers
> > > > +* when VIRTIO_BALLOON_F_REPORTING but not
> > > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
> > > > +*/
> > > > +   if (virtio_has_feature(vb->vdev, 
> > > > VIRTIO_BALLOON_F_REPORTING) &&
> > > > +   !virtio_has_feature(vb->vdev, 
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > > > +   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > > "reporting_vq";
> > > > +   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > > balloon_ack;
> > > > +   err = virtio_find_vqs(vb->vdev,
> > > > + 
> > > > VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL);
> > > > +   }
> > > > +
> > > > +   if (err)
> > > > +   return err;
> > > > +   }
> > > >
> > > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
> > > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
> > > > --
> > > > MST
> > > >
> > >
> > > Acked-by: Jason Wang 
> > >
> > > Do we need a spec to say this is something that needs to be considered
> > > by the driver?
> > >
> > > Thanks
> >
> > I'd say it's a temporary situation that we won't need to bother
> > about in several years.
> 
> I mean, should a newly-written virtio-balloon driver care about this?
> If not, it means it can't work for several Qemu versions.
> 
> Thanks

True - I could not find a way to make it work in a way that
would be compatible with old qemu.


> >
> > --
> > MST
> >

RE: [PATCH] test/vsock: add install target

2024-07-10 Thread Peng Fan

> Subject: Re: [PATCH] test/vsock: add install target
> 
> On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote:
> >> Subject: Re: [PATCH] test/vsock: add install target
> >>
> >> On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote:
> >> >From: Peng Fan 
> >> >
> >> >Add install target for vsock to make Yocto easy to install the
> images.
> >> >
> >> >Signed-off-by: Peng Fan 
> >> >---
> >> > tools/testing/vsock/Makefile | 12 
> >> > 1 file changed, 12 insertions(+)
> >> >
> >> >diff --git a/tools/testing/vsock/Makefile
> >> >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460
> >> 100644
> >> >--- a/tools/testing/vsock/Makefile
> >> >+++ b/tools/testing/vsock/Makefile
> >> >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o
> >> msg_zerocopy_common.o
> >> > vsock_uring_test: LDLIBS = -luring
> >> > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
> >> >msg_zerocopy_common.o
> >> >
> >> >+VSOCK_INSTALL_PATH ?= $(abspath .)
> >> >+# Avoid changing the rest of the logic here and lib.mk.
> >> >+INSTALL_PATH := $(VSOCK_INSTALL_PATH)
> >> >+
> >> > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include
> >> > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow
> >> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -
> >> D_GNU_SOURCE
> >> > .PHONY: all test clean
> >> > clean:
> >> >  ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
> >> vsock_uring_test
> >> > -include *.d
> >> >+
> >> >+install: all
> >> >+ @# Ask all targets to install their files
> >> >+ mkdir -p $(INSTALL_PATH)/vsock
> >>
> >> why using the "vsock" subdir?
> >>
> >> IIUC you were inspired by selftests/Makefile, but it installs under
> >> $(INSTALL_PATH)/kselftest/ the scripts used by the main one
> >> `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead.
> >> So in this case I would install everything in $(INSTALL_PATH).
> >>
> >> WDYT?
> >
> >I agree.
> >
> >>
> >> >+ install -m 744 vsock_test $(INSTALL_PATH)/vsock/
> >> >+ install -m 744 vsock_perf $(INSTALL_PATH)/vsock/
> >> >+ install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/
> >> >+ install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/
> >>
> >> Also from selftests/Makefile, what about using the ifdef instead of
> >> using $(abspath .) as default place?
> >>
> >> I mean this:
> >>
> >> install: all
> >> ifdef INSTALL_PATH
> >>...
> >> else
> >>$(error Error: set INSTALL_PATH to use install) endif
> >
> >Is the following looks good to you?
> >
> ># Avoid conflict with INSTALL_PATH set by the main Makefile
> >VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH)
> 
> I'm not a super Makefile expert, but why do we need both
> VSOCK_INSTALL_PATH and INSTALL_PATH?

INSTALL_PATH is exported by kernel root directory makefile.
So to user, we need to avoid export INSTALL_PATH here.
So I just follow selftests/Makefile using KSFT_INSTALL_PATH

Regards,
Peng.

> 
> Stefano
> 
> >
> >install: all
> >ifdef INSTALL_PATH
> >mkdir -p $(INSTALL_PATH)
> >install -m 744 vsock_test $(INSTALL_PATH)
> >install -m 744 vsock_perf $(INSTALL_PATH)
> >install -m 744 vsock_diag_test $(INSTALL_PATH)
> >install -m 744 vsock_uring_test $(INSTALL_PATH) else
> >$(error Error: set INSTALL_PATH to use install) Endif
> >
> >Thanks,
> >Peng.
> >>
> >> Thanks,
> >> Stefano
> >

RE: [EXTERNAL] [PATCH net-next v10 04/15] mm: page_frag: add '_va' suffix to page_frag API

2024-07-10 Thread Subbaraya Sundeep Bhatta

From: Yunsheng Lin  
Sent: Tuesday, July 9, 2024 6:57 PM
To: da...@davemloft.net; k...@kernel.org; pab...@redhat.com
Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; Yunsheng Lin 
; Alexander Duyck ; Jeroen 
de Borst ; Praveen Kaligineedi ; 
Shailend Chand ; Eric Dumazet ; Tony 
Nguyen ; Przemek Kitszel 
; Sunil Kovvuri Goutham ; 
Geethasowjanya Akula ; Subbaraya Sundeep Bhatta 
; Hariprasad Kelam ; Felix Fietkau 
; Sean Wang ; Mark Lee 
; Lorenzo Bianconi ; Matthias 
Brugger ; AngeloGioacchino Del Regno 
; Keith Busch ; 
Jens Axboe ; Christoph Hellwig ; Sagi Grimberg 
; Chaitanya Kulkarni ; Michael S. Tsirkin 
; Jason Wang ; Eugenio Pérez 
; Andrew Morton ; Alexei 
Starovoitov ; Daniel Borkmann ; Jesper 
Dangaard Brouer ; John Fastabend ; 
Andrii Nakryiko ; Martin KaFai Lau ; 
Eduard Zingerman ; Song Liu ; Yonghong Song 
; KP Singh ; Stanislav Fomichev 
; Hao Luo ; Jiri Olsa ; 
David Howells ; Marc Dionne ; 
Trond Myklebust ; Anna Schumaker ; Chuck 
Lever ; Jeff Layton ; Neil Brown 
; Olga Kornievskaia ; Dai Ngo 
; Tom Talpey ; 
intel-wired-...@lists.osuosl.org; linux-arm-ker...@lists.infradead.org; 
linux-media...@lists.infradead.org; linux-n...@lists.infradead.org; 
k...@vger.kernel.org; virtualizat...@lists.linux.dev; linux...@kvack.org; 
b...@vger.kernel.org; linux-...@lists.infradead.org; linux-...@vger.kernel.org
Subject: [PATCH net-next v10 04/15] mm: page_frag: add '_va' suffix to 
page_frag API

Currently the page_frag API is returning 'virtual address'
or 'va' when allocing and expecting 'virtual address' or
'va' as input when freeing.

As we are about to support new use cases that the caller
need to deal with 'struct page' or need to deal with both
'va' and 'struct page'. In order to differentiate the API
handling between 'va' and 'struct page', add '_va' suffix
to the corresponding API mirroring the page_pool_alloc_va()
API of the page_pool. So that callers expecting to deal with
va, page or both va and page may call page_frag_alloc_va*,
page_frag_alloc_pg*, or page_frag_alloc* API accordingly.

CC: Alexander Duyck 
Signed-off-by: Yunsheng Lin 

Looks good.

Reviewed-by: Subbaraya Sundeep 
---
 drivers/net/ethernet/google/gve/gve_rx.c  |  4 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
 .../marvell/octeontx2/nic/otx2_common.c   |  2 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c|  4 ++--
 drivers/nvme/host/tcp.c   |  8 +++
 drivers/nvme/target/tcp.c | 22 +--
 drivers/vhost/net.c   |  6 ++---
 include/linux/page_frag_cache.h   | 21 +-
 include/linux/skbuff.h|  2 +-
 kernel/bpf/cpumap.c   |  2 +-
 mm/page_frag_cache.c  | 12 +-
 mm/page_frag_test.c   | 13 ++-
 net/core/skbuff.c | 14 ++--
 net/core/xdp.c|  2 +-
 net/rxrpc/txbuf.c | 15 +++--
 net/sunrpc/svcsock.c  |  6 ++---
 19 files changed, 74 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_rx.c 
b/drivers/net/ethernet/google/gve/gve_rx.c
index acb73d4d0de6..b6c10100e462 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -729,7 +729,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct 
gve_rx_ring *rx,

total_len = headroom + SKB_DATA_ALIGN(len) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-   frame = page_frag_alloc(>page_cache, total_len, GFP_ATOMIC);
+   frame = page_frag_alloc_va(>page_cache, total_len, GFP_ATOMIC);
if (!frame) {
u64_stats_update_begin(>statss);
rx->xdp_alloc_fails++;
@@ -742,7 +742,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct 
gve_rx_ring *rx,

err = xdp_do_redirect(dev, , xdp_prog);
if (err)
-   page_frag_free(frame);
+   page_frag_free_va(frame);

return err;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c 
b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 8bb743f78fcb..399b317c509d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -126,7 +126,7 @@ ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct 
ice_tx_buf *tx_buf)
dev_kfree_skb_any(tx_buf->skb);
break;
case ICE_TX_BUF_XDP_TX:
-   page_frag_free(tx_buf->raw_buf);
+   page_frag_free_va(tx_buf->raw_buf);
break;
case ICE_TX_BUF_XDP_XMIT:
xdp_return_frame(tx_buf->xdpf);
diff --git

Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool

2024-07-10 Thread Cindy Lu

On Wed, 10 Jul 2024 at 14:10, Michael S. Tsirkin  wrote:
>
> On Wed, Jul 10, 2024 at 11:05:48AM +0800, Jason Wang wrote:
> > On Tue, Jul 9, 2024 at 8:42 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote:
> > > > On Tue, 9 Jul 2024 at 11:59, Parav Pandit  wrote:
> > > > >
> > > > > Hi Cindy,
> > > > >
> > > > > > From: Cindy Lu 
> > > > > > Sent: Monday, July 8, 2024 12:17 PM
> > > > > >
> > > > > > Add support for setting the MAC address using the VDPA tool.
> > > > > > This feature will allow setting the MAC address using the VDPA tool.
> > > > > > For example, in vdpa_sim_net, the implementation sets the MAC 
> > > > > > address to
> > > > > > the config space. However, for other drivers, they can implement 
> > > > > > their own
> > > > > > function, not limited to the config space.
> > > > > >
> > > > > > Changelog v2
> > > > > >  - Changed the function name to prevent misunderstanding
> > > > > >  - Added check for blk device
> > > > > >  - Addressed the comments
> > > > > > Changelog v3
> > > > > >  - Split the function of the net device from 
> > > > > > vdpa_nl_cmd_dev_attr_set_doit
> > > > > >  - Add a lock for the network device's dev_set_attr operation
> > > > > >  - Address the comments
> > > > > >
> > > > > > Cindy Lu (2):
> > > > > >   vdpa: support set mac address from vdpa tool
> > > > > >   vdpa_sim_net: Add the support of set mac address
> > > > > >
> > > > > >  drivers/vdpa/vdpa.c  | 81 
> > > > > > 
> > > > > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++-
> > > > > >  include/linux/vdpa.h |  9 
> > > > > >  include/uapi/linux/vdpa.h|  1 +
> > > > > >  4 files changed, 109 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > --
> > > > > > 2.45.0
> > > > >
> > > > > Mlx5 device already allows setting the mac and mtu during the vdpa 
> > > > > device creation time.
> > > > > Once the vdpa device is created, it binds to vdpa bus and other 
> > > > > driver vhost_vdpa etc bind to it.
> > > > > So there was no good reason in the past to support explicit config 
> > > > > after device add complicate the flow for synchronizing this.
> > > > >
> > > > > The user who wants a device with new attributes, as well destroy and 
> > > > > recreate the vdpa device with new desired attributes.
> > > > >
> > > > > vdpa_sim_net can also be extended for similar way when adding the 
> > > > > vdpa device.
> > > > >
> > > > > Have you considered using the existing tool and kernel in place since 
> > > > > 2021?
> > > > > Such as commit d8ca2fa5be1.
> > > > >
> > > > > An example of it is,
> > > > > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 
> > > > > 9000
> > > > >
> > > > Hi Parav
> > > > Really thanks for your comments. The reason for adding this function
> > > > is to support Kubevirt.
> > > > the problem we meet is that kubevirt chooses one random vdpa device
> > > > from the pool and we don't know which one it going to pick. That means
> > > > we can't get to know the Mac address before it is created. So we plan
> > > > to have this function to change the mac address after it is created
> > > > Thanks
> > > > cindy
> > >
> > > Well you will need to change kubevirt to teach it to set
> > > mac address, right?
> >
> > That's the plan. Adding Leonardo.
> >
> > Thanks
>
> So given you are going to change kubevirt, can we
> change it to create devices as needed with the
> existing API?
>
Hi Micheal and Parav,
I'm really not familiar with kubevirt, hope Leonardo can help answer
these questions
Hi @Leonardo Milleri
would you help answer these questions?

Thanks
Cindy
> > >
> > > --
> > > MST
> > >
>

Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

2024-07-10 Thread Borislav Petkov

On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> IIUC, its an abbreviation of a Latin word and is used as a synonym for 
> "namely"
> or "that is to say".
> Might not be the best choice in this case. Will change it.

I learn new stuff every day:

https://en.wikipedia.org/wiki/Viz.

> Userspace error decoding tools like the rasdaemon gather related hardware 
> error
> information through the tracepoints. As such, its important to have these two
> registers in the tracepoint so that the tools like rasdaemon can parse them
> and output the supplemental error information like FRU Text contained in them.

Put *that* in the commit message - do not explain what the patch does.

> Got it. The first SoB entry is of the primary author. The successive SoB's are
> from the people handling and transporting the patch.

Exactly!

> IOW, the route taken by a patch, as its propagated, to maintainers and 
> eventually
> to Linus, should be evident from the SoB chain.

You got it.

> With this set, the first two elements of the vendor data dynamic array are 
> SYND 1/2
> registers while the third element is MCA_CONFIG (added through patch 4 of the 
> set).
> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are 
> interpreted
> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
> 
> Thus, we depend on array's layout for accurate FRU Text decoding in the 
> rasdaemon.

So it sounds to me like we want to document and thus freeze the
vendor-specific blob layout because tools are going to be using and parsing
it. And this will spare us the kernel version checks.

And new additions to that AMD-specific blob will come at the end and will
have to be documented too.

That sounds like an ok compromise to me...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[PATCH] tracing: Fix overflow in get_free_elt()

2024-07-10 Thread Tze-nan Wu

"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.

Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.

Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".

Co-developed-by: Cheng-Jui Wang 
Signed-off-by: Cheng-Jui Wang 
Signed-off-by: Tze-nan Wu 
---
 kernel/trace/tracing_map.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index a4dcf0f24352..3a56e7c8aa4f 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -454,7 +454,7 @@ static struct tracing_map_elt *get_free_elt(struct 
tracing_map *map)
struct tracing_map_elt *elt = NULL;
int idx;
 
-   idx = atomic_inc_return(>next_elt);
+   idx = atomic_fetch_add_unless(>next_elt, 1, map->max_elts);
if (idx < map->max_elts) {
elt = *(TRACING_MAP_ELT(map->elts, idx));
if (map->ops && map->ops->elt_init)
@@ -699,7 +699,7 @@ void tracing_map_clear(struct tracing_map *map)
 {
unsigned int i;
 
-   atomic_set(>next_elt, -1);
+   atomic_set(>next_elt, 0);
atomic64_set(>hits, 0);
atomic64_set(>drops, 0);
 
@@ -783,7 +783,7 @@ struct tracing_map *tracing_map_create(unsigned int 
map_bits,
 
map->map_bits = map_bits;
map->max_elts = (1 << map_bits);
-   atomic_set(>next_elt, -1);
+   atomic_set(>next_elt, 0);
 
map->map_size = (1 << (map_bits + 1));
map->ops = ops;
-- 
2.18.0

Re: [PATCH] test/vsock: add install target

2024-07-10 Thread Stefano Garzarella


On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote:

Subject: Re: [PATCH] test/vsock: add install target

On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote:
>From: Peng Fan 
>
>Add install target for vsock to make Yocto easy to install the images.
>
>Signed-off-by: Peng Fan 
>---
> tools/testing/vsock/Makefile | 12 
> 1 file changed, 12 insertions(+)
>
>diff --git a/tools/testing/vsock/Makefile
>b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460
100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o
msg_zerocopy_common.o
> vsock_uring_test: LDLIBS = -luring
> vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
>msg_zerocopy_common.o
>
>+VSOCK_INSTALL_PATH ?= $(abspath .)
>+# Avoid changing the rest of the logic here and lib.mk.
>+INSTALL_PATH := $(VSOCK_INSTALL_PATH)
>+
> CFLAGS += -g -O2 -Werror -Wall -I. -I../../include
> -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow
> -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -
D_GNU_SOURCE
> .PHONY: all test clean
> clean:
>${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
vsock_uring_test
> -include *.d
>+
>+install: all
>+   @# Ask all targets to install their files
>+   mkdir -p $(INSTALL_PATH)/vsock

why using the "vsock" subdir?

IIUC you were inspired by selftests/Makefile, but it installs under
$(INSTALL_PATH)/kselftest/ the scripts used by the main one
`run_kselftest.sh`, which is installed in $(INSTALL_PATH instead.
So in this case I would install everything in $(INSTALL_PATH).

WDYT?


I agree.



>+   install -m 744 vsock_test $(INSTALL_PATH)/vsock/
>+   install -m 744 vsock_perf $(INSTALL_PATH)/vsock/
>+   install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/
>+   install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/

Also from selftests/Makefile, what about using the ifdef instead of
using $(abspath .) as default place?

I mean this:

install: all
ifdef INSTALL_PATH
   ...
else
$(error Error: set INSTALL_PATH to use install) endif


Is the following looks good to you?

# Avoid conflict with INSTALL_PATH set by the main Makefile
VSOCK_INSTALL_PATH ?=
INSTALL_PATH := $(VSOCK_INSTALL_PATH)


I'm not a super Makefile expert, but why do we need both 
VSOCK_INSTALL_PATH and INSTALL_PATH?


Stefano



install: all
ifdef INSTALL_PATH
   mkdir -p $(INSTALL_PATH)
   install -m 744 vsock_test $(INSTALL_PATH)
   install -m 744 vsock_perf $(INSTALL_PATH)
   install -m 744 vsock_diag_test $(INSTALL_PATH)
   install -m 744 vsock_uring_test $(INSTALL_PATH)
else
   $(error Error: set INSTALL_PATH to use install)
Endif

Thanks,
Peng.


Thanks,
Stefano

[PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override

2024-07-10 Thread Menglong Dong

After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction
pointer with original one"), "bpf_kprobe_override" is not used anywhere
anymore, and we can remove it now.

Signed-off-by: Menglong Dong 
---
 include/linux/trace_events.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 9df3e2973626..9435185c10ef 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -880,7 +880,6 @@ do {
\
 struct perf_event;
 
 DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
-DECLARE_PER_CPU(int, bpf_kprobe_override);
 
 extern int  perf_trace_init(struct perf_event *event);
 extern void perf_trace_destroy(struct perf_event *event);
-- 
2.39.2

[PATCH RFC 1/1] remoteproc: mediatek: Support reserved CMA regions

2024-07-10 Thread Shun-yi Wang

From: "shun-yi.wang" 

In order to reserve specific Contiguous Memory Allocator (CMA) regions
for hardware use. When the name of the reserved region contains "cma",
then a corresponding CMA heap is added.

Signed-off-by: shun-yi.wang 
---
 drivers/remoteproc/mtk_scp.c | 38 
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
index e744c07507ee..ca0a4a52ece9 100644
--- a/drivers/remoteproc/mtk_scp.c
+++ b/drivers/remoteproc/mtk_scp.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1006,22 +1007,43 @@ EXPORT_SYMBOL_GPL(scp_mapping_dm_addr);
 
 static int scp_map_memory_region(struct mtk_scp *scp)
 {
-   int ret;
+   int ret, i, err;
const struct mtk_scp_sizes_data *scp_sizes;
+   struct device_node *node = scp->dev->of_node;
+   struct of_phandle_iterator it;
+
+   i = 0;
+   of_for_each_phandle(, err, node, "memory-region", NULL, 0) {
+   ret = of_reserved_mem_device_init_by_idx(scp->dev, node, i);
+
+   if (ret) {
+   dev_err(scp->dev, "failed to assign memory-region: 
%s\n",
+   it.node->name);
+   of_node_put(it.node);
+   return -ENOMEM;
+   }
+
+#ifdef CONFIG_DMABUF_HEAPS_CMA
+   if (strstr(it.node->name, "cma")) {
+   /* Reserved cma memory region */
+   ret = dma_heap_add_cma(scp->dev);
+   if (ret) {
+   dev_err(scp->dev, "Failed to add reserved cma");
+   of_node_put(it.node);
+   return ret;
+   }
+   }
+#endif /* CONFIG_DMABUF_HEAPS_CMA */
 
-   ret = of_reserved_mem_device_init(scp->dev);
+   i++;
+   }
 
/* reserved memory is optional. */
-   if (ret == -ENODEV) {
+   if (!i) {
dev_info(scp->dev, "skipping reserved memory initialization.");
return 0;
}
 
-   if (ret) {
-   dev_err(scp->dev, "failed to assign memory-region: %d\n", ret);
-   return -ENOMEM;
-   }
-
/* Reserved SCP code size */
scp_sizes = scp->data->scp_sizes;
scp->cpu_addr = dma_alloc_coherent(scp->dev, scp_sizes->max_dram_size,
-- 
2.18.0

[PATCH RFC 0/1] Support CMA regions

2024-07-10 Thread Shun-yi Wang

From: "shun-yi.wang" 

In order to reserve specific Contiguous Memory Allocator (CMA) regions 
for hardware use. When the name of the reserved region contains "cma", 
then a corresponding CMA heap is added.

In the DTS (Device Tree Source), we may have several memory regions 
with different names, e.g.,
 {
...
memory-region = <_reserved_1>, <_reserved_2>;
};

mem_reserved_1: xxx-xxx-region {
...
};
mem_reserved_2: xxx-xxx-cma-region {
...
};

When the name of the region contains "cma", a corresponding heap 
allocator is added by cma_heap_add().
However, we are unsure if using the name "cma" as an identifier
is a good practice.
I sincerely hope that you can provide me some suggestions, thanks.
Especially within the segment of '#ifdef CONFIG_DMA_BUF_HEAPS_CMA'

Note: The cma_heap_add() is introduced from this patch:
dma-buf: heaps: Introduce cma_heap_add() for non-default CMA heap

shun-yi.wang (1):
  remoteproc: mediatek: Support reserved CMA regions

 drivers/remoteproc/mtk_scp.c | 38 
 1 file changed, 30 insertions(+), 8 deletions(-)

-- 
2.18.0

Re: [PATCH v2] filemap: add trace events for get_pages, map_pages, and fault

2024-07-10 Thread Takaya Saeki

Hello Matthew, I'd appreciate it if you could comment on this.

Thank you.

RE: [PATCH] test/vsock: add install target

2024-07-10 Thread Peng Fan

> Subject: Re: [PATCH] test/vsock: add install target
> 
> On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote:
> >From: Peng Fan 
> >
> >Add install target for vsock to make Yocto easy to install the images.
> >
> >Signed-off-by: Peng Fan 
> >---
> > tools/testing/vsock/Makefile | 12 
> > 1 file changed, 12 insertions(+)
> >
> >diff --git a/tools/testing/vsock/Makefile
> >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460
> 100644
> >--- a/tools/testing/vsock/Makefile
> >+++ b/tools/testing/vsock/Makefile
> >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o
> msg_zerocopy_common.o
> > vsock_uring_test: LDLIBS = -luring
> > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
> >msg_zerocopy_common.o
> >
> >+VSOCK_INSTALL_PATH ?= $(abspath .)
> >+# Avoid changing the rest of the logic here and lib.mk.
> >+INSTALL_PATH := $(VSOCK_INSTALL_PATH)
> >+
> > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include
> > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow
> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -
> D_GNU_SOURCE
> > .PHONY: all test clean
> > clean:
> > ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
> vsock_uring_test
> > -include *.d
> >+
> >+install: all
> >+@# Ask all targets to install their files
> >+mkdir -p $(INSTALL_PATH)/vsock
> 
> why using the "vsock" subdir?
> 
> IIUC you were inspired by selftests/Makefile, but it installs under
> $(INSTALL_PATH)/kselftest/ the scripts used by the main one
> `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead.
> So in this case I would install everything in $(INSTALL_PATH).
> 
> WDYT?

I agree.

> 
> >+install -m 744 vsock_test $(INSTALL_PATH)/vsock/
> >+install -m 744 vsock_perf $(INSTALL_PATH)/vsock/
> >+install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/
> >+install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/
> 
> Also from selftests/Makefile, what about using the ifdef instead of
> using $(abspath .) as default place?
> 
> I mean this:
> 
> install: all
> ifdef INSTALL_PATH
>...
> else
>   $(error Error: set INSTALL_PATH to use install) endif

Is the following looks good to you?

# Avoid conflict with INSTALL_PATH set by the main Makefile 

VSOCK_INSTALL_PATH ?=   

INSTALL_PATH := $(VSOCK_INSTALL_PATH)   



install: all

ifdef INSTALL_PATH  

mkdir -p $(INSTALL_PATH)

install -m 744 vsock_test $(INSTALL_PATH)   

install -m 744 vsock_perf $(INSTALL_PATH)   

install -m 744 vsock_diag_test $(INSTALL_PATH)  

install -m 744 vsock_uring_test $(INSTALL_PATH) 

else

$(error Error: set INSTALL_PATH to use install) 

Endif

Thanks,
Peng.
> 
> Thanks,
> Stefano

Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Jason Wang

On Wed, Jul 10, 2024 at 2:16 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote:
> > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin  wrote:
> > >
> > > QEMU implemented the configuration
> > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > > incorrectly: it then uses vq3 for reporting, spec says it is always 4.
> > >
> > > This is masked by a corresponding bug in driver:
> > > add a work around as I'm going to try and fix the driver bug.
> > >
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  drivers/virtio/virtio_balloon.c | 19 +--
> > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_balloon.c 
> > > b/drivers/virtio/virtio_balloon.c
> > > index 9a61febbd2f7..7dc3fcd56238 100644
> > > --- a/drivers/virtio/virtio_balloon.c
> > > +++ b/drivers/virtio/virtio_balloon.c
> > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
> > >
> > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
> > >   callbacks, names, NULL);
> > > -   if (err)
> > > -   return err;
> > > +   if (err) {
> > > +   /*
> > > +* Try to work around QEMU bug which since 2020 confused 
> > > vq numbers
> > > +* when VIRTIO_BALLOON_F_REPORTING but not
> > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
> > > +*/
> > > +   if (virtio_has_feature(vb->vdev, 
> > > VIRTIO_BALLOON_F_REPORTING) &&
> > > +   !virtio_has_feature(vb->vdev, 
> > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > > +   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > "reporting_vq";
> > > +   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > > balloon_ack;
> > > +   err = virtio_find_vqs(vb->vdev,
> > > + 
> > > VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL);
> > > +   }
> > > +
> > > +   if (err)
> > > +   return err;
> > > +   }
> > >
> > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
> > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
> > > --
> > > MST
> > >
> >
> > Acked-by: Jason Wang 
> >
> > Do we need a spec to say this is something that needs to be considered
> > by the driver?
> >
> > Thanks
>
> I'd say it's a temporary situation that we won't need to bother
> about in several years.

I mean, should a newly-written virtio-balloon driver care about this?
If not, it means it can't work for several Qemu versions.

Thanks

>
> --
> MST
>

Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter

2024-07-10 Thread Stefano Garzarella


On Wed, Jul 10, 2024 at 03:28:31PM GMT, Jason Wang wrote:

On Wed, Jul 10, 2024 at 3:19 PM Stefano Garzarella  wrote:


On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote:
>On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella  wrote:
>>
>> On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote:
>> >On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella  
wrote:
>> >>
>> >> Hi Cindy, Jason,
>> >>
>> >> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote:
>> >> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu  wrote:
>> >> >>
>> >> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella  
wrote:
>> >> >> >
>> >> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote:
>> >> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella wrote:
>> >> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, but 
some
>> >> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see
>> >> >> > >> https://issues.redhat.com/browse/RHEL-45951).
>> >> >> > >>
>> >> >> > >> So to allow us to test these filesystems, let's add a module 
parameter
>> >> >> > >> to control the size of the simulated virtio-blk devices.
>> >> >> > >> The value is mapped directly to the `capacity` field of the 
virtio-blk
>> >> >> > >> configuration space, so it must be expressed in sector numbers of 
512
>> >> >> > >> bytes.
>> >> >> > >>
>> >> >> > >> The default value (0x4) is the same as the previous value, so 
the
>> >> >> > >> behavior without setting `capacity` remains unchanged.
>> >> >> > >>
>> >> >> > >> Before this patch or with this patch without setting `capacity`:
>> >> >> > >>   $ modprobe vdpa-sim-blk
>> >> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>> >> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>> >> >> > >>   virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 
MB/128 MiB)
>> >> >> > >>
>> >> >> > >> After this patch:
>> >> >> > >>   $ modprobe vdpa-sim-blk capacity=614400
>> >> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>> >> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>> >> >> > >>   virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 
MB/300 MiB)
>> >> >> > >>
>> >> >> > >> Signed-off-by: Stefano Garzarella 
>> >> >> > >
>> >> >> > >What a hack. Cindy was working on adding control over config
>> >> >> > >space, why can't that be used?
>> >> >> >
>> >> >> > If it can be used easily with virtio-blk device too, it will be 
great.
>> >> >> > @Cindy do you plan to support that changes for a virtio-blk device 
too?
>> >> >> >
>> >> >> Hi Stefano
>> >> >> I plan to add support to change the vdpa device's configuration after
>> >> >> it is created.
>> >> >
>> >> >I think for Stefano's case, we can just implement it via provisioning
>> >> >parameters?
>> >>
>> >> Yep, I think we don't need to change it after creation, but specifying
>> >> while creating should be enough.
>> >>
>> >> So, IIUC we can already do it, implementing something similar to
>> >> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right?
>> >
>> >Right.
>> >
>> >>
>> >> What about when we have `shared_backend` set to true for the
>> >> vdpa_sim_blk.ko? In this case the backend is supposed to be shared
>> >> between all the devices to test live migration.
>> >
>> >This seems to be another topic.
>>
>> Yep, but really related. I think we need to handle that case when
>> supporting the `capacity` setting.
>
>Ok, so if I was not wrong, the goal is to test migration.

Sorry, I was not clear, I try to rephrase:
vdpa_sim_blk already supports a module parameter called `shared_backend`
introduced mainly to test live migration on the same host. When that
parameter is on, all the created devices share the same backend and so
we can easily do migration from one to another.

With that parameter on or off, the device is always 128 MB, but now
that's a problem for testing, because it looks like XFS requires at
least 300 MB: https://issues.redhat.com/browse/RHEL-45951

That's why I sent this patch.

When `shared_backend` is off (default), using the provisioning
parameters seems feasible to me, but when it's on, how do I deal with
it?

Being a simulator, we can maybe make it so that only the first device
can change the size for example, or that all devices control the size,
but then we would have to handle the size change at runtime, which I
think is feasible, but it requires some work to send a notification of
configuration change, etc.


Can we mandate the size parameter to be exactly the same as the first
vDPA block simulator?



>
>>
>> >
>> >>
>> >> Maybe we can just change the size of the shared ramdisk to be reflected
>> >> to all devices.
>> >>
>> >> Suggestions?
>> >
>> >Could we specify the path to tmpfs or others during provisioning
>> >instead?  It seems more general (but more work).
>>
>> Then it would almost become a real device, no longer just a simulator.
>> It's enough work, though, as you said, but at that point we'd just

Re: [PATCH] test/vsock: add install target

2024-07-10 Thread Stefano Garzarella


On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote:

From: Peng Fan 

Add install target for vsock to make Yocto easy to install the images.

Signed-off-by: Peng Fan 
---
tools/testing/vsock/Makefile | 12 
1 file changed, 12 insertions(+)

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index a7f56a09ca9f..5c8442fa9460 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o msg_zerocopy_common.o
vsock_uring_test: LDLIBS = -luring
vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o 
msg_zerocopy_common.o

+VSOCK_INSTALL_PATH ?= $(abspath .)
+# Avoid changing the rest of the logic here and lib.mk.
+INSTALL_PATH := $(VSOCK_INSTALL_PATH)
+
CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include 
-Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD 
-U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
clean:
${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
+
+install: all
+   @# Ask all targets to install their files
+   mkdir -p $(INSTALL_PATH)/vsock


why using the "vsock" subdir?

IIUC you were inspired by selftests/Makefile, but it installs under 
$(INSTALL_PATH)/kselftest/ the scripts used by the main one 
`run_kselftest.sh`, which is installed in $(INSTALL_PATH instead.

So in this case I would install everything in $(INSTALL_PATH).

WDYT?


+   install -m 744 vsock_test $(INSTALL_PATH)/vsock/
+   install -m 744 vsock_perf $(INSTALL_PATH)/vsock/
+   install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/
+   install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/


Also from selftests/Makefile, what about using the ifdef instead of 
using $(abspath .) as default place?


I mean this:

install: all
ifdef INSTALL_PATH
  ...
else
$(error Error: set INSTALL_PATH to use install)
endif

Thanks,
Stefano

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-07-10 Thread Petr Pavlu

On 6/17/24 19:58, Sami Tolvanen wrote:
> Hi folks,
> 
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.

Thanks for working on this. Below is some feedback with my (open)SUSE
hat on, although it should be quite general.

> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.

Using the DWARF data makes sense to me. Distribution kernels are
normally built with debuginfo because one has to be able to debug them
later, unsurprisingly. Besides that, one now typically wants to use BPF
together with built-in BTF data (CONFIG_DEBUG_INFO_BTF), which also
requires building the kernel with debuginfo.

I would however keep in mind that producing all DWARF data has some
cost. From a quick test, an x86_64-defconfig build is ~30% slower for me
with CONFIG_DEBUG_INFO=y. The current genksyms tool allows to have
debuginfo disabled when backporting some patches and consequently have
a quicker feedback whether modversions changed. This option would
disappear with gendwarfksyms but I think it is acceptable.

> 
> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.
> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).

In addition to calculating CRCs of exported symbols, genksyms has other
features which I think are important.

Firstly, the genksyms tool has a human-readable storage format for input
data used in the calculation of symbol CRCs. Setting the make variable
KBUILD_SYMTYPES enables dumping this data and storing it in *.symtypes
files.

When a developer later modifies the kernel and wants to check if some
symbols have changed, they can take these files and feed them as
*.symref back to genksyms. This allows the tool to provide an actual
reason why some symbols have changed, instead of just printing that
their CRCs are different.

Is there any plan to add the same functionality to gendwarfksyms, or do
you envison that people will use libabigail, Symbol-Type Graph, or
another tool for making this type of comparison?

Secondly, when distributions want to maintain stable kABI, they need to
be able to deal with patch backports that add new members to structures.
One common approach is to have placeholders in important structures
which can be later replaced by the new members as needed. __GENKSYMS__
ifdefs are then used at the C source level to hide these kABI-compatible
changes from genksyms.

Gendwarfksyms works on the resulting binary and so using such ifdefs
wouldn't work. Instead, I suspect that what is required is a mechanism
to tell the tool that a given change is ok, probably by allowing to
specify some map from the original definition to the new one.

Is there a plan to implement something like this, or how could it be
addressed?

> Another compatibility issue is fitting the extremely long mangled
> Rust symbol names into struct modversion_info, which can only hold
> 55-character names (on 64-bit systems). Previous proposals suggested
> changing the structure to support longer names, but the conclusion was
> that we cannot break userspace tools that parse the version table.
> 
> The next two patches of the series implement support for hashed
> symbol names in struct modversion_info, where names longer than 55
> characters are hashed, and the hash is stored in the name field. To
> avoid breaking userspace tools, the binary hash is prefixed with a
> null-terminated string containing the name of the hash function. While
> userspace tools can later be updated to potentially produce more
> useful information about the long symbols, this allows them to
> continue working in the meantime.

I think this approach with hashed names is quite complex. I'd personally
be also in favor of having a new section with variable-length strings to
store the names of imported symbols. As yet another alternative, it
should be also possible

Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter

2024-07-10 Thread Jason Wang

On Wed, Jul 10, 2024 at 3:19 PM Stefano Garzarella  wrote:
>
> On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote:
> >On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote:
> >> >On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella  
> >> >wrote:
> >> >>
> >> >> Hi Cindy, Jason,
> >> >>
> >> >> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote:
> >> >> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu  wrote:
> >> >> >>
> >> >> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella 
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote:
> >> >> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella 
> >> >> >> > >wrote:
> >> >> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, 
> >> >> >> > >> but some
> >> >> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see
> >> >> >> > >> https://issues.redhat.com/browse/RHEL-45951).
> >> >> >> > >>
> >> >> >> > >> So to allow us to test these filesystems, let's add a module 
> >> >> >> > >> parameter
> >> >> >> > >> to control the size of the simulated virtio-blk devices.
> >> >> >> > >> The value is mapped directly to the `capacity` field of the 
> >> >> >> > >> virtio-blk
> >> >> >> > >> configuration space, so it must be expressed in sector numbers 
> >> >> >> > >> of 512
> >> >> >> > >> bytes.
> >> >> >> > >>
> >> >> >> > >> The default value (0x4) is the same as the previous value, 
> >> >> >> > >> so the
> >> >> >> > >> behavior without setting `capacity` remains unchanged.
> >> >> >> > >>
> >> >> >> > >> Before this patch or with this patch without setting `capacity`:
> >> >> >> > >>   $ modprobe vdpa-sim-blk
> >> >> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
> >> >> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
> >> >> >> > >>   virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 
> >> >> >> > >> MB/128 MiB)
> >> >> >> > >>
> >> >> >> > >> After this patch:
> >> >> >> > >>   $ modprobe vdpa-sim-blk capacity=614400
> >> >> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
> >> >> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
> >> >> >> > >>   virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 
> >> >> >> > >> MB/300 MiB)
> >> >> >> > >>
> >> >> >> > >> Signed-off-by: Stefano Garzarella 
> >> >> >> > >
> >> >> >> > >What a hack. Cindy was working on adding control over config
> >> >> >> > >space, why can't that be used?
> >> >> >> >
> >> >> >> > If it can be used easily with virtio-blk device too, it will be 
> >> >> >> > great.
> >> >> >> > @Cindy do you plan to support that changes for a virtio-blk device 
> >> >> >> > too?
> >> >> >> >
> >> >> >> Hi Stefano
> >> >> >> I plan to add support to change the vdpa device's configuration after
> >> >> >> it is created.
> >> >> >
> >> >> >I think for Stefano's case, we can just implement it via provisioning
> >> >> >parameters?
> >> >>
> >> >> Yep, I think we don't need to change it after creation, but specifying
> >> >> while creating should be enough.
> >> >>
> >> >> So, IIUC we can already do it, implementing something similar to
> >> >> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right?
> >> >
> >> >Right.
> >> >
> >> >>
> >> >> What about when we have `shared_backend` set to true for the
> >> >> vdpa_sim_blk.ko? In this case the backend is supposed to be shared
> >> >> between all the devices to test live migration.
> >> >
> >> >This seems to be another topic.
> >>
> >> Yep, but really related. I think we need to handle that case when
> >> supporting the `capacity` setting.
> >
> >Ok, so if I was not wrong, the goal is to test migration.
>
> Sorry, I was not clear, I try to rephrase:
> vdpa_sim_blk already supports a module parameter called `shared_backend`
> introduced mainly to test live migration on the same host. When that
> parameter is on, all the created devices share the same backend and so
> we can easily do migration from one to another.
>
> With that parameter on or off, the device is always 128 MB, but now
> that's a problem for testing, because it looks like XFS requires at
> least 300 MB: https://issues.redhat.com/browse/RHEL-45951
>
> That's why I sent this patch.
>
> When `shared_backend` is off (default), using the provisioning
> parameters seems feasible to me, but when it's on, how do I deal with
> it?
>
> Being a simulator, we can maybe make it so that only the first device
> can change the size for example, or that all devices control the size,
> but then we would have to handle the size change at runtime, which I
> think is feasible, but it requires some work to send a notification of
> configuration change, etc.

Can we mandate the size parameter to be exactly the same as the first
vDPA block simulator?

>
> >
> >>
> >> >
> >> >>
> >> >> Maybe we can just change the size of the shared ramdisk to be reflected
> >> >> to all

Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter

2024-07-10 Thread Stefano Garzarella

On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote:

On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella  wrote:

On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote:
>On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella  wrote:
>>
>> Hi Cindy, Jason,
>>
>> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote:
>> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu  wrote:
>> >>
>> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella  
wrote:
>> >> >
>> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote:
>> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella wrote:
>> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, but 
some
>> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see
>> >> > >> https://issues.redhat.com/browse/RHEL-45951).
>> >> > >>
>> >> > >> So to allow us to test these filesystems, let's add a module 
parameter
>> >> > >> to control the size of the simulated virtio-blk devices.
>> >> > >> The value is mapped directly to the `capacity` field of the 
virtio-blk
>> >> > >> configuration space, so it must be expressed in sector numbers of 512
>> >> > >> bytes.
>> >> > >>
>> >> > >> The default value (0x4) is the same as the previous value, so the
>> >> > >> behavior without setting `capacity` remains unchanged.
>> >> > >>
>> >> > >> Before this patch or with this patch without setting `capacity`:
>> >> > >>   $ modprobe vdpa-sim-blk
>> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>> >> > >>   virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 
MB/128 MiB)
>> >> > >>
>> >> > >> After this patch:
>> >> > >>   $ modprobe vdpa-sim-blk capacity=614400
>> >> > >>   $ vdpa dev add mgmtdev vdpasim_blk name blk0
>> >> > >>   virtio_blk virtio6: 1/0/0 default/read/poll queues
>> >> > >>   virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 
MB/300 MiB)
>> >> > >>
>> >> > >> Signed-off-by: Stefano Garzarella 
>> >> > >
>> >> > >What a hack. Cindy was working on adding control over config
>> >> > >space, why can't that be used?
>> >> >
>> >> > If it can be used easily with virtio-blk device too, it will be great.
>> >> > @Cindy do you plan to support that changes for a virtio-blk device too?
>> >> >
>> >> Hi Stefano
>> >> I plan to add support to change the vdpa device's configuration after
>> >> it is created.
>> >
>> >I think for Stefano's case, we can just implement it via provisioning
>> >parameters?
>>
>> Yep, I think we don't need to change it after creation, but specifying
>> while creating should be enough.
>>
>> So, IIUC we can already do it, implementing something similar to
>> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right?
>
>Right.
>
>>
>> What about when we have `shared_backend` set to true for the
>> vdpa_sim_blk.ko? In this case the backend is supposed to be shared
>> between all the devices to test live migration.
>
>This seems to be another topic.

Yep, but really related. I think we need to handle that case when
supporting the `capacity` setting.

Ok, so if I was not wrong, the goal is to test migration.

Sorry, I was not clear, I try to rephrase:
vdpa_sim_blk already supports a module parameter called `shared_backend` 
introduced mainly to test live migration on the same host. When that 
parameter is on, all the created devices share the same backend and so 
we can easily do migration from one to another.

With that parameter on or off, the device is always 128 MB, but now 
that's a problem for testing, because it looks like XFS requires at 
least 300 MB: https://issues.redhat.com/browse/RHEL-45951

That's why I sent this patch.

When `shared_backend` is off (default), using the provisioning 
parameters seems feasible to me, but when it's on, how do I deal with 
it?

Being a simulator, we can maybe make it so that only the first device 
can change the size for example, or that all devices control the size, 
but then we would have to handle the size change at runtime, which I 
think is feasible, but it requires some work to send a notification of 
configuration change, etc.

>
>>
>> Maybe we can just change the size of the shared ramdisk to be reflected
>> to all devices.
>>
>> Suggestions?
>
>Could we specify the path to tmpfs or others during provisioning
>instead?  It seems more general (but more work).

Then it would almost become a real device, no longer just a simulator.
It's enough work, though, as you said, but at that point we'd just have
to specify the backend file to use for the device.

In that case what API would we need to use to allow the user to set the
backend file?

Yes, I think we can allow some vendor specific provisioning parameters.

But not sure it's an overkill for the use case here. If others are
happy with the shared_backed. I'm fine.

Yeah, maybe it's overkill and I don't have much time these days :-(

I think the easiest way is to merge this patch, but I

Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote:
> On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin  wrote:
> >
> > QEMU implemented the configuration
> > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT
> > incorrectly: it then uses vq3 for reporting, spec says it is always 4.
> >
> > This is masked by a corresponding bug in driver:
> > add a work around as I'm going to try and fix the driver bug.
> >
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/virtio/virtio_balloon.c | 19 +--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c 
> > b/drivers/virtio/virtio_balloon.c
> > index 9a61febbd2f7..7dc3fcd56238 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb)
> >
> > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
> >   callbacks, names, NULL);
> > -   if (err)
> > -   return err;
> > +   if (err) {
> > +   /*
> > +* Try to work around QEMU bug which since 2020 confused vq 
> > numbers
> > +* when VIRTIO_BALLOON_F_REPORTING but not
> > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered.
> > +*/
> > +   if (virtio_has_feature(vb->vdev, 
> > VIRTIO_BALLOON_F_REPORTING) &&
> > +   !virtio_has_feature(vb->vdev, 
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > +   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "reporting_vq";
> > +   callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = 
> > balloon_ack;
> > +   err = virtio_find_vqs(vb->vdev,
> > + VIRTIO_BALLOON_VQ_REPORTING, 
> > vqs, callbacks, names, NULL);
> > +   }
> > +
> > +   if (err)
> > +   return err;
> > +   }
> >
> > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
> > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
> > --
> > MST
> >
> 
> Acked-by: Jason Wang 
> 
> Do we need a spec to say this is something that needs to be considered
> by the driver?
> 
> Thanks

I'd say it's a temporary situation that we won't need to bother
about in several years.

-- 
MST

Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool

2024-07-10 Thread Michael S. Tsirkin

On Wed, Jul 10, 2024 at 11:05:48AM +0800, Jason Wang wrote:
> On Tue, Jul 9, 2024 at 8:42 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote:
> > > On Tue, 9 Jul 2024 at 11:59, Parav Pandit  wrote:
> > > >
> > > > Hi Cindy,
> > > >
> > > > > From: Cindy Lu 
> > > > > Sent: Monday, July 8, 2024 12:17 PM
> > > > >
> > > > > Add support for setting the MAC address using the VDPA tool.
> > > > > This feature will allow setting the MAC address using the VDPA tool.
> > > > > For example, in vdpa_sim_net, the implementation sets the MAC address 
> > > > > to
> > > > > the config space. However, for other drivers, they can implement 
> > > > > their own
> > > > > function, not limited to the config space.
> > > > >
> > > > > Changelog v2
> > > > >  - Changed the function name to prevent misunderstanding
> > > > >  - Added check for blk device
> > > > >  - Addressed the comments
> > > > > Changelog v3
> > > > >  - Split the function of the net device from 
> > > > > vdpa_nl_cmd_dev_attr_set_doit
> > > > >  - Add a lock for the network device's dev_set_attr operation
> > > > >  - Address the comments
> > > > >
> > > > > Cindy Lu (2):
> > > > >   vdpa: support set mac address from vdpa tool
> > > > >   vdpa_sim_net: Add the support of set mac address
> > > > >
> > > > >  drivers/vdpa/vdpa.c  | 81 
> > > > > 
> > > > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++-
> > > > >  include/linux/vdpa.h |  9 
> > > > >  include/uapi/linux/vdpa.h|  1 +
> > > > >  4 files changed, 109 insertions(+), 1 deletion(-)
> > > > >
> > > > > --
> > > > > 2.45.0
> > > >
> > > > Mlx5 device already allows setting the mac and mtu during the vdpa 
> > > > device creation time.
> > > > Once the vdpa device is created, it binds to vdpa bus and other driver 
> > > > vhost_vdpa etc bind to it.
> > > > So there was no good reason in the past to support explicit config 
> > > > after device add complicate the flow for synchronizing this.
> > > >
> > > > The user who wants a device with new attributes, as well destroy and 
> > > > recreate the vdpa device with new desired attributes.
> > > >
> > > > vdpa_sim_net can also be extended for similar way when adding the vdpa 
> > > > device.
> > > >
> > > > Have you considered using the existing tool and kernel in place since 
> > > > 2021?
> > > > Such as commit d8ca2fa5be1.
> > > >
> > > > An example of it is,
> > > > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 
> > > > 9000
> > > >
> > > Hi Parav
> > > Really thanks for your comments. The reason for adding this function
> > > is to support Kubevirt.
> > > the problem we meet is that kubevirt chooses one random vdpa device
> > > from the pool and we don't know which one it going to pick. That means
> > > we can't get to know the Mac address before it is created. So we plan
> > > to have this function to change the mac address after it is created
> > > Thanks
> > > cindy
> >
> > Well you will need to change kubevirt to teach it to set
> > mac address, right?
> 
> That's the plan. Adding Leonardo.
> 
> Thanks

So given you are going to change kubevirt, can we
change it to create devices as needed with the
existing API?

> >
> > --
> > MST
> >

78 matches

Mail list logo