Re: [PATCH] test/vsock: add install target
On Wed, 10 Jul 2024 13:58:39 +0200 Stefano Garzarella wrote: > There is a comment there: > > # Avoid changing the rest of the logic here and lib.mk. > > Added by commit 17eac6c2db8b2cdfe33d40229bdda2acd86b304a. > > IIUC they re-used INSTALL_PATH, just to avoid too many changes in that > file and in tools/testing/selftests/lib.mk > > So, IMHO we should not care about it and only use VSOCK_INSTALL_PATH if > you don't want to conflict with INSTALL_PATH. Any reason why vsock isn't part of selftests in the first place?
Re: [PATCH] bootconfig: Remove duplicate included header file linux/bootconfig.h
On Thu, 11 Jul 2024 02:21:53 +0200 Thorsten Blum wrote: > The header file linux/bootconfig.h is included whether __KERNEL__ is > defined or not. > > Include it only once before the #ifdef/#else/#endif preprocessor > directives and remove the following make includecheck warning: > > linux/bootconfig.h is included more than once > OK, but this leaves only a comment in !__KERNEL__ part. That comment should be moved and updated too. Please move this on the top of this file and remove `!__KERNEL__` part. /* * NOTE: This file is also used by tools/bootconfig, because tools/bootconfig * will run the parser for sanity test. * This does NOT mean lib/bootconfig.c is always available in the user space. * However, if you change this file, please make sure the tools/bootconfig * has no issue on building and running. */ Thank you, > Signed-off-by: Thorsten Blum > --- > lib/bootconfig.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/bootconfig.c b/lib/bootconfig.c > index 97f8911ea339..297871455db5 100644 > --- a/lib/bootconfig.c > +++ b/lib/bootconfig.c > @@ -4,8 +4,9 @@ > * Masami Hiramatsu > */ > > -#ifdef __KERNEL__ > #include > + > +#ifdef __KERNEL__ > #include > #include > #include > @@ -33,7 +34,6 @@ const char * __init xbc_get_embedded_bootconfig(size_t > *size) > * However, if you change this file, please make sure the tools/bootconfig > * has no issue on building and running. > */ > -#include > #endif > > /* > -- > 2.45.2 > -- Masami Hiramatsu (Google)
[ANNOUNCE] 5.10.220-rt112
Hello RT-list! I'm pleased to announce the 5.10.220-rt112 stable release. This release is just an update to the new stable 5.10.220 version and no RT specific changes have been made. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v5.10-rt Head SHA1: b7364cea19a02eda9039a2d34c5f105d77f4697a Or to build 5.10.220-rt112 directly, the following patches should be applied: https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.220.xz https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.220-rt112.patch.xz Signing key fingerprint: 9354 0649 9972 8D31 D464 D140 F394 A423 F8E6 7C26 All keys used for the above files and repositories can be found on the following git repository: git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git Enjoy! Luis
[PATCH] bootconfig: Remove duplicate included header file linux/bootconfig.h
The header file linux/bootconfig.h is included whether __KERNEL__ is defined or not. Include it only once before the #ifdef/#else/#endif preprocessor directives and remove the following make includecheck warning: linux/bootconfig.h is included more than once Signed-off-by: Thorsten Blum --- lib/bootconfig.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/bootconfig.c b/lib/bootconfig.c index 97f8911ea339..297871455db5 100644 --- a/lib/bootconfig.c +++ b/lib/bootconfig.c @@ -4,8 +4,9 @@ * Masami Hiramatsu */ -#ifdef __KERNEL__ #include + +#ifdef __KERNEL__ #include #include #include @@ -33,7 +34,6 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size) * However, if you change this file, please make sure the tools/bootconfig * has no issue on building and running. */ -#include #endif /* -- 2.45.2
[PATCH v2 4/4] trace: platform/x86/intel/ifs: Add SBAF trace support
From: Jithu Joseph Add tracing support for the SBAF IFS tests, which may be useful for debugging systems that fail these tests. Log details like test content batch number, SBAF bundle ID, program index and the exact errors or warnings encountered by each HT thread during the test. Reviewed-by: Ashok Raj Reviewed-by: Tony Luck Signed-off-by: Jithu Joseph Signed-off-by: Kuppuswamy Sathyanarayanan --- include/trace/events/intel_ifs.h | 27 drivers/platform/x86/intel/ifs/runtest.c | 1 + 2 files changed, 28 insertions(+) diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h index 0d88ebf2c980..70323acde1de 100644 --- a/include/trace/events/intel_ifs.h +++ b/include/trace/events/intel_ifs.h @@ -35,6 +35,33 @@ TRACE_EVENT(ifs_status, __entry->status) ); +TRACE_EVENT(ifs_sbaf, + + TP_PROTO(int batch, union ifs_sbaf activate, union ifs_sbaf_status status), + + TP_ARGS(batch, activate, status), + + TP_STRUCT__entry( + __field(u64,status ) + __field(int,batch ) + __field(u16,bundle ) + __field(u16,pgm ) + ), + + TP_fast_assign( + __entry->status = status.data; + __entry->batch = batch; + __entry->bundle = activate.bundle_idx; + __entry->pgm= activate.pgm_idx; + ), + + TP_printk("batch: 0x%.2x, bundle_idx: 0x%.4x, pgm_idx: 0x%.4x, status: 0x%.16llx", + __entry->batch, + __entry->bundle, + __entry->pgm, + __entry->status) +); + #endif /* _TRACE_IFS_H */ /* This part must be outside protection */ diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c index a812ec7761b9..2ec9a8c93d04 100644 --- a/drivers/platform/x86/intel/ifs/runtest.c +++ b/drivers/platform/x86/intel/ifs/runtest.c @@ -530,6 +530,7 @@ static int dosbaf(void *data) */ wrmsrl(MSR_ACTIVATE_SBAF, run_params->activate->data); rdmsrl(MSR_SBAF_STATUS, status.data); + trace_ifs_sbaf(ifsd->cur_batch, *run_params->activate, status); /* Pass back the result of the test */ if (cpu == first) -- 2.25.1
[PATCH v2 3/4] platform/x86/intel/ifs: Add SBAF test support
From: Jithu Joseph In a core, the SBAF test engine is shared between sibling CPUs. An SBAF test image contains multiple bundles. Each bundle is further composed of subunits called programs. When a SBAF test (for a particular core) is triggered by the user, each SBAF bundle from the loaded test image is executed sequentially on all the threads on the core using the stop_core_cpuslocked mechanism. Each bundle execution is initiated by writing to MSR_ACTIVATE_SBAF. SBAF test bundle execution may be aborted when an interrupt occurs or if the CPU does not have enough power budget for the test. In these cases the kernel restarts the test from the aborted bundle. SBAF execution is not retried if the test fails or if the test makes no forward progress after 5 retries. Reviewed-by: Ashok Raj Reviewed-by: Tony Luck Signed-off-by: Jithu Joseph Signed-off-by: Kuppuswamy Sathyanarayanan --- drivers/platform/x86/intel/ifs/ifs.h | 30 +++ drivers/platform/x86/intel/ifs/runtest.c | 234 +++ 2 files changed, 264 insertions(+) diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h index 600bb8a1b285..b261be46bce8 100644 --- a/drivers/platform/x86/intel/ifs/ifs.h +++ b/drivers/platform/x86/intel/ifs/ifs.h @@ -157,6 +157,8 @@ #define MSR_SBAF_HASHES_STATUS 0x02b9 #define MSR_AUTHENTICATE_AND_COPY_SBAF_CHUNK 0x02ba #define MSR_SBAF_CHUNKS_AUTHENTICATION_STATUS 0x02bb +#define MSR_ACTIVATE_SBAF 0x02bc +#define MSR_SBAF_STATUS0x02bd #define MSR_COPY_SCAN_HASHES 0x02c2 #define MSR_SCAN_HASHES_STATUS 0x02c3 @@ -283,6 +285,34 @@ union ifs_array { }; }; +/* MSR_ACTIVATE_SBAF bit fields */ +union ifs_sbaf { + u64 data; + struct { + u32 bundle_idx :9; + u32 rsvd1 :5; + u32 pgm_idx :2; + u32 rsvd2 :16; + u32 delay :31; + u32 sigmce :1; + }; +}; + +/* MSR_SBAF_STATUS bit fields */ +union ifs_sbaf_status { + u64 data; + struct { + u32 bundle_idx :9; + u32 rsvd1 :5; + u32 pgm_idx :2; + u32 rsvd2 :16; + u32 error_code :8; + u32 rsvd3 :21; + u32 test_fail :1; + u32 sbaf_status :2; + }; +}; + /* * Driver populated error-codes * 0xFD: Test timed out before completing all the chunks. diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c index 282e4bfe30da..a812ec7761b9 100644 --- a/drivers/platform/x86/intel/ifs/runtest.c +++ b/drivers/platform/x86/intel/ifs/runtest.c @@ -29,6 +29,13 @@ struct run_params { union ifs_status status; }; +struct sbaf_run_params { + struct ifs_data *ifsd; + int *retry_cnt; + union ifs_sbaf *activate; + union ifs_sbaf_status status; +}; + /* * Number of TSC cycles that a logical CPU will wait for the other * logical CPU on the core in the WRMSR(ACTIVATE_SCAN). @@ -146,6 +153,7 @@ static bool can_restart(union ifs_status status) #define SPINUNIT 100 /* 100 nsec */ static atomic_t array_cpus_in; static atomic_t scan_cpus_in; +static atomic_t sbaf_cpus_in; /* * Simplified cpu sibling rendezvous loop based on microcode loader __wait_for_cpus() @@ -387,6 +395,226 @@ static void ifs_array_test_gen1(int cpu, struct device *dev) ifsd->status = SCAN_TEST_PASS; } +#define SBAF_STATUS_PASS 0 +#define SBAF_STATUS_SIGN_FAIL 1 +#define SBAF_STATUS_INTR 2 +#define SBAF_STATUS_TEST_FAIL 3 + +enum sbaf_status_err_code { + IFS_SBAF_NO_ERROR = 0, + IFS_SBAF_OTHER_THREAD_COULD_NOT_JOIN= 1, + IFS_SBAF_INTERRUPTED_BEFORE_RENDEZVOUS = 2, + IFS_SBAF_UNASSIGNED_ERROR_CODE3 = 3, + IFS_SBAF_INVALID_BUNDLE_INDEX = 4, + IFS_SBAF_MISMATCH_ARGS_BETWEEN_THREADS = 5, + IFS_SBAF_CORE_NOT_CAPABLE_CURRENTLY = 6, + IFS_SBAF_UNASSIGNED_ERROR_CODE7 = 7, + IFS_SBAF_EXCEED_NUMBER_OF_THREADS_CONCURRENT= 8, + IFS_SBAF_INTERRUPTED_DURING_EXECUTION = 9, + IFS_SBAF_INVALID_PROGRAM_INDEX = 0xA, + IFS_SBAF_CORRUPTED_CHUNK= 0xB, + IFS_SBAF_DID_NOT_START = 0xC, +}; + +static const char * const sbaf_test_status[] = { + [IFS_SBAF_NO_ERROR] = "SBAF no error", + [IFS_SBAF_OTHER_THREAD_COULD_NOT_JOIN] = "Other thread could not join.", +
[PATCH v2 2/4] platform/x86/intel/ifs: Add SBAF test image loading support
From: Jithu Joseph Structural Based Functional Test at Field (SBAF) is a new type of testing that provides comprehensive core test coverage complementing existing IFS tests like Scan at Field (SAF) or ArrayBist. SBAF device will appear as a new device instance (intel_ifs_2) under /sys/devices/virtual/misc. The user interaction necessary to load the test image and test a particular core is the same as the existing scan test (intel_ifs_0). During the loading stage, the driver will look for a file named ff-mm-ss-.sbft in the /lib/firmware/intel/ifs_2 directory. The hardware interaction needed for loading the image is similar to SAF, with the only difference being the MSR addresses used. Reuse the SAF image loading code, passing the SBAF-specific MSR addresses via struct ifs_test_msrs in the driver device data. Unlike SAF, the SBAF test image chunks are further divided into smaller logical entities called bundles. Since the SBAF test is initiated per bundle, cache the maximum number of bundles in the current image, which is used for iterating through bundles during SBAF test execution. Reviewed-by: Ashok Raj Reviewed-by: Tony Luck Signed-off-by: Jithu Joseph Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan --- arch/x86/include/asm/msr-index.h | 2 ++ drivers/platform/x86/intel/ifs/ifs.h | 37 ++- drivers/platform/x86/intel/ifs/core.c | 24 + drivers/platform/x86/intel/ifs/load.c | 15 --- 4 files changed, 73 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index e022e6eb766c..503d7acdda3f 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -247,6 +247,8 @@ #define MSR_INTEGRITY_CAPS_ARRAY_BIST BIT(MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT) #define MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT 4 #define MSR_INTEGRITY_CAPS_PERIODIC_BIST BIT(MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT) +#define MSR_INTEGRITY_CAPS_SBAF_BIT8 +#define MSR_INTEGRITY_CAPS_SBAF BIT(MSR_INTEGRITY_CAPS_SBAF_BIT) #define MSR_INTEGRITY_CAPS_SAF_GEN_MASKGENMASK_ULL(10, 9) #define MSR_LBR_NHM_FROM 0x0680 diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h index 738cbc7a5d00..600bb8a1b285 100644 --- a/drivers/platform/x86/intel/ifs/ifs.h +++ b/drivers/platform/x86/intel/ifs/ifs.h @@ -126,11 +126,38 @@ * The driver does not make use of this, it only tests one core at a time. * * .. [#f1] https://github.com/intel/TBD + * + * + * Structural Based Functional Test at Field (SBAF): + * + * + * SBAF is a new type of testing that provides comprehensive core test + * coverage complementing Scan at Field (SAF) testing. SBAF mimics the + * manufacturing screening environment and leverages the same test suite. + * It makes use of Design For Test (DFT) observation sites and features + * to maximize coverage in minimum time. + * + * Similar to the SAF test, SBAF isolates the core under test from the + * rest of the system during execution. Upon completion, the core + * seamlessly resets to its pre-test state and resumes normal operation. + * Any machine checks or hangs encountered during the test are confined to + * the isolated core, preventing disruption to the overall system. + * + * Like the SAF test, the SBAF test is also divided into multiple batches, + * and each batch test can take hundreds of milliseconds (100-200 ms) to + * complete. If such a lengthy interruption is undesirable, it is + * recommended to relocate the time-sensitive applications to other cores. */ #include #include #define MSR_ARRAY_BIST 0x0105 + +#define MSR_COPY_SBAF_HASHES 0x02b8 +#define MSR_SBAF_HASHES_STATUS 0x02b9 +#define MSR_AUTHENTICATE_AND_COPY_SBAF_CHUNK 0x02ba +#define MSR_SBAF_CHUNKS_AUTHENTICATION_STATUS 0x02bb + #define MSR_COPY_SCAN_HASHES 0x02c2 #define MSR_SCAN_HASHES_STATUS 0x02c3 #define MSR_AUTHENTICATE_AND_COPY_CHUNK0x02c4 @@ -140,6 +167,7 @@ #define MSR_ARRAY_TRIGGER 0x02d6 #define MSR_ARRAY_STATUS 0x02d7 #define MSR_SAF_CTRL 0x04f0 +#define MSR_SBAF_CTRL 0x04f8 #define SCAN_NOT_TESTED0 #define SCAN_TEST_PASS 1 @@ -147,6 +175,7 @@ #define IFS_TYPE_SAF 0 #define IFS_TYPE_ARRAY_BIST1 +#define IFS_TYPE_SBAF 2 #define ARRAY_GEN0 0 #define ARRAY_GEN1 1 @@ -196,7 +225,8 @@ union ifs_chunks_auth_status_gen2 { u16 valid_chunks; u16 total_chunks;
[PATCH v2 1/4] platform/x86/intel/ifs: Refactor MSR usage in IFS test code
IFS tests such as Scan at Field (SAF) or Structural Based Functional Test at Field (SBAF), require the user to load a test image. The image loading process is similar across these tests, with the only difference being MSR addresses used. To reuse the code between these tests, remove the hard coding of MSR addresses and allow the driver to pass the MSR addresses per IFS test (via driver device data). Add a new structure named "struct ifs_test_msrs" to specify the test-specific MSR addresses. Each IFS test will provide this structure, enabling them to reuse the common code. This is a preliminary patch in preparation for the addition of SBAF support. Reviewed-by: Ashok Raj Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan --- drivers/platform/x86/intel/ifs/ifs.h | 25 + drivers/platform/x86/intel/ifs/core.c | 9 + drivers/platform/x86/intel/ifs/load.c | 24 ++-- 3 files changed, 48 insertions(+), 10 deletions(-) diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h index 56b9f3e3cf76..738cbc7a5d00 100644 --- a/drivers/platform/x86/intel/ifs/ifs.h +++ b/drivers/platform/x86/intel/ifs/ifs.h @@ -266,6 +266,22 @@ struct ifs_test_caps { int test_num; }; +/** + * struct ifs_test_msrs - MSRs used in IFS tests + * @copy_hashes: Copy test hash data + * @copy_hashes_status: Status of copied test hash data + * @copy_chunks: Copy chunks of the test data + * @copy_chunks_status: Status of the copied test data chunks + * @test_ctrl: Control the test attributes + */ +struct ifs_test_msrs { + u32 copy_hashes; + u32 copy_hashes_status; + u32 copy_chunks; + u32 copy_chunks_status; + u32 test_ctrl; +}; + /** * struct ifs_data - attributes related to intel IFS driver * @loaded_version: stores the currently loaded ifs image version. @@ -299,6 +315,7 @@ struct ifs_work { struct ifs_device { const struct ifs_test_caps *test_caps; + const struct ifs_test_msrs *test_msrs; struct ifs_data rw_data; struct miscdevice misc; }; @@ -319,6 +336,14 @@ static inline const struct ifs_test_caps *ifs_get_test_caps(struct device *dev) return d->test_caps; } +static inline const struct ifs_test_msrs *ifs_get_test_msrs(struct device *dev) +{ + struct miscdevice *m = dev_get_drvdata(dev); + struct ifs_device *d = container_of(m, struct ifs_device, misc); + + return d->test_msrs; +} + extern bool *ifs_pkg_auth; int ifs_load_firmware(struct device *dev); int do_core_test(int cpu, struct device *dev); diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c index 7b11198d85a1..1a7ca74abb61 100644 --- a/drivers/platform/x86/intel/ifs/core.c +++ b/drivers/platform/x86/intel/ifs/core.c @@ -40,9 +40,18 @@ static const struct ifs_test_caps array_test = { .test_num = IFS_TYPE_ARRAY_BIST, }; +static const struct ifs_test_msrs scan_msrs = { + .copy_hashes = MSR_COPY_SCAN_HASHES, + .copy_hashes_status = MSR_SCAN_HASHES_STATUS, + .copy_chunks = MSR_AUTHENTICATE_AND_COPY_CHUNK, + .copy_chunks_status = MSR_CHUNKS_AUTHENTICATION_STATUS, + .test_ctrl = MSR_SAF_CTRL, +}; + static struct ifs_device ifs_devices[] = { [IFS_TYPE_SAF] = { .test_caps = _test, + .test_msrs = _msrs, .misc = { .name = "intel_ifs_0", .minor = MISC_DYNAMIC_MINOR, diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c index 39f19cb51749..ad0c107f0922 100644 --- a/drivers/platform/x86/intel/ifs/load.c +++ b/drivers/platform/x86/intel/ifs/load.c @@ -118,15 +118,17 @@ static void copy_hashes_authenticate_chunks(struct work_struct *work) union ifs_scan_hashes_status hashes_status; union ifs_chunks_auth_status chunk_status; struct device *dev = local_work->dev; + const struct ifs_test_msrs *msrs; int i, num_chunks, chunk_size; struct ifs_data *ifsd; u64 linear_addr, base; u32 err_code; ifsd = ifs_get_data(dev); + msrs = ifs_get_test_msrs(dev); /* run scan hash copy */ - wrmsrl(MSR_COPY_SCAN_HASHES, ifs_hash_ptr); - rdmsrl(MSR_SCAN_HASHES_STATUS, hashes_status.data); + wrmsrl(msrs->copy_hashes, ifs_hash_ptr); + rdmsrl(msrs->copy_hashes_status, hashes_status.data); /* enumerate the scan image information */ num_chunks = hashes_status.num_chunks; @@ -147,8 +149,8 @@ static void copy_hashes_authenticate_chunks(struct work_struct *work) linear_addr = base + i * chunk_size; linear_addr |= i; - wrmsrl(MSR_AUTHENTICATE_AND_COPY_CHUNK, linear_addr); - rdmsrl(MSR_CHUNKS_AUTHENTICATION_STATUS, chunk_status.data); + wrmsrl(msrs->copy_chunks,
[PATCH v2 0/4] Add SBAF test to IFS
This patch series adds support for Structural Based Functional Test at Field (SBAF) in the IFS driver. SBAF is a new type of testing that provides comprehensive core test coverage, complementing existing IFS tests like Scan at Field (SAF) and ArrayBist. Granite Rapids (GNR) is the first platform that supports SBAF. SBAF mimics the manufacturing screening environment and leverages the same test suite. It makes use of Design For Test (DFT) observation sites and features to maximize coverage in minimum time. Similar to the SAF test, SBAF isolates the core under test from the rest of the system during execution. Upon completion, the core seamlessly resets to its pre-test state and resumes normal operation. Any machine checks or hangs encountered during the test are confined to the isolated core, preventing disruption to the overall system. Like SAF test, the SBAF test is also divided into multiple batches, and each batch test can take hundreds of milliseconds (100-200 ms) to complete. If such a lengthy interruption is undesirable, it is recommended to relocate the time-sensitive applications to other cores for the duration of the test. Patch Details: Patch 1/4: Refactors MSR usage in IFS image loading code to share the code between SBAF and SAF tests. Patch 2/4: Leverages SAF image loading logic and adds SBAF image loading support. Patch 3/4: Adds support for user to trigger SBAF test. Patch 4/4: Adds trace support for SBAF tests. This series is originally authored by Jithu Joseph. I have made cleanups related to code reuse between the SBAF and SAF tests and resubmitting it for review. Changes since v1: * Addressed trace struct hole issue (Steven) * Fixed initialization issue in ifs_sbaf_test_core() (Ilpo) Jithu Joseph (3): platform/x86/intel/ifs: Add SBAF test image loading support platform/x86/intel/ifs: Add SBAF test support trace: platform/x86/intel/ifs: Add SBAF trace support Kuppuswamy Sathyanarayanan (1): platform/x86/intel/ifs: Refactor MSR usage in IFS test code arch/x86/include/asm/msr-index.h | 2 + drivers/platform/x86/intel/ifs/ifs.h | 92 - include/trace/events/intel_ifs.h | 27 +++ drivers/platform/x86/intel/ifs/core.c| 33 drivers/platform/x86/intel/ifs/load.c| 39 ++-- drivers/platform/x86/intel/ifs/runtest.c | 235 +++ 6 files changed, 413 insertions(+), 15 deletions(-) -- 2.25.1
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 03:54:22PM -0700, Daniel Verkamp wrote: > On Wed, Jul 10, 2024 at 1:39 PM Michael S. Tsirkin wrote: > > > > On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote: > > > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin > > > wrote: > > > > > > > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > > > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > virtio balloon communicates to the core that in some > > > > > > configurations vq #s are non-contiguous by setting name > > > > > > pointer to NULL. > > > > > > > > > > > > Unfortunately, core then turned around and just made them > > > > > > contiguous again. Result is that driver is out of spec. > > > > > > > > > > Thanks for fixing this - I think the overall approach of the patch > > > > > looks good. > > > > > > > > > > > Implement what the API was supposed to do > > > > > > in the 1st place. Compatibility with buggy hypervisors > > > > > > is handled inside virtio-balloon, which is the only driver > > > > > > making use of this facility, so far. > > > > > > > > > > In addition to virtio-balloon, I believe the same problem also affects > > > > > the virtio-fs device, since queue 1 is only supposed to be present if > > > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > > > > > meant to be queue indexes 2 and up. From a look at the Linux driver > > > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > > > > > and assumes that request queues start at index 1 rather than 2, which > > > > > looks out of spec to me, but the current device implementations (that > > > > > I am aware of, anyway) are also broken in the same way, so it ends up > > > > > working today. Queue numbering in a spec-compliant device and the > > > > > current Linux driver would mismatch; what the driver considers to be > > > > > the first request queue (index 1) would be ignored by the device since > > > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated. > > > > > > > > > > > > Oh, thanks a lot for pointing this out! > > > > > > > > I see so this patch is no good as is, we need to add a workaround for > > > > virtio-fs first. > > > > > > > > QEMU workaround is simple - just add an extra queue. But I did not > > > > reasearch how this would interact with vhost-user. > > > > > > > > From driver POV, I guess we could just ignore queue # 1 - would that be > > > > ok or does it have performance implications? > > > > > > As a driver workaround for non-compliant devices, I think ignoring the > > > first request queue would be a reasonable approach if the device's > > > config advertises num_request_queues > 1. Unfortunately, both > > > virtiofsd and crosvm's virtio-fs device have hard-coded > > > num_request_queues =1, so this won't help with those existing devices. > > > > Do they care what the vq # is though? > > We could do some magic to translate VQ #s in qemu. > > > > > > > Maybe there are other devices that we would need to consider as well; > > > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes > > > benchmarks that seem to be from a different virtio-fs implementation > > > that does support multiple request queues, so the workaround could > > > possibly be used there. > > > > > > > Or do what I did for balloon here: try with spec compliant #s first, > > > > if that fails then assume it's the spec issue and shift by 1. > > > > > > If there is a way to "guess and check" without breaking spec-compliant > > > devices, that sounds reasonable too; however, I'm not sure how this > > > would work out in practice: an existing non-compliant device may fail > > > to start if the driver tries to enable queue index 2 when it only > > > supports one request queue, > > > > You don't try to enable queue - driver starts by checking queue size. > > The way my patch works is that it assumes a non existing queue has > > size 0 if not available. > > > > This was actually a documented way to check for PCI and MMIO: > > Read the virtqueue size from queue_size. This controls how big the > > virtqueue is (see 2.6 Virtqueues). > > If this field is 0, the virtqueue does not exist. > > MMIO: > > If the returned value is zero (0x0) the queue is not available. > > > > unfortunately not for CCW, but I guess CCW implementations outside > > of QEMU are uncommon enough that we can assume it's the same? > > > > > > To me the above is also a big hint that drivers are allowed to > > query size for queues that do not exist. > > Ah, that makes total sense - detecting queue presence by non-zero > queue size sounds good to me, and it should work in the normal virtio > device case. > > I am not sure about vhost-user, since there is no way for the > front-end to ask the back-end for a queue's size; the confusingly > named VHOST_USER_SET_VRING_NUM allows the front-end to configure the > size of a queue, but there's no
Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
On 7/10/2024 04:38, Borislav Petkov wrote: > On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote: > >> Userspace error decoding tools like the rasdaemon gather related hardware >> error >> information through the tracepoints. As such, its important to have these two >> registers in the tracepoint so that the tools like rasdaemon can parse them >> and output the supplemental error information like FRU Text contained in >> them. > > Put *that* in the commit message - do not explain what the patch does. > Will do. >> With this set, the first two elements of the vendor data dynamic array are >> SYND 1/2 >> registers while the third element is MCA_CONFIG (added through patch 4 of >> the set). >> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are >> interpreted >> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set. >> >> Thus, we depend on array's layout for accurate FRU Text decoding in the >> rasdaemon. > > So it sounds to me like we want to document and thus freeze the > vendor-specific blob layout because tools are going to be using and parsing > it. And this will spare us the kernel version checks. > > And new additions to that AMD-specific blob will come at the end and will > have to be documented too. > > That sounds like an ok compromise to me... > > Thx. > Sounds good! Is it okay to document this where the new wrapper and vendor-specific data structures are being defined, in arch/x86/include/asm/mce.h? Similar approach has been taken for struct mce. Or do you have any other recommendations? -- Thanks, Avadhut Naik
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 1:39 PM Michael S. Tsirkin wrote: > > On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote: > > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin wrote: > > > > > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > virtio balloon communicates to the core that in some > > > > > configurations vq #s are non-contiguous by setting name > > > > > pointer to NULL. > > > > > > > > > > Unfortunately, core then turned around and just made them > > > > > contiguous again. Result is that driver is out of spec. > > > > > > > > Thanks for fixing this - I think the overall approach of the patch > > > > looks good. > > > > > > > > > Implement what the API was supposed to do > > > > > in the 1st place. Compatibility with buggy hypervisors > > > > > is handled inside virtio-balloon, which is the only driver > > > > > making use of this facility, so far. > > > > > > > > In addition to virtio-balloon, I believe the same problem also affects > > > > the virtio-fs device, since queue 1 is only supposed to be present if > > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > > > > meant to be queue indexes 2 and up. From a look at the Linux driver > > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > > > > and assumes that request queues start at index 1 rather than 2, which > > > > looks out of spec to me, but the current device implementations (that > > > > I am aware of, anyway) are also broken in the same way, so it ends up > > > > working today. Queue numbering in a spec-compliant device and the > > > > current Linux driver would mismatch; what the driver considers to be > > > > the first request queue (index 1) would be ignored by the device since > > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated. > > > > > > > > > Oh, thanks a lot for pointing this out! > > > > > > I see so this patch is no good as is, we need to add a workaround for > > > virtio-fs first. > > > > > > QEMU workaround is simple - just add an extra queue. But I did not > > > reasearch how this would interact with vhost-user. > > > > > > From driver POV, I guess we could just ignore queue # 1 - would that be > > > ok or does it have performance implications? > > > > As a driver workaround for non-compliant devices, I think ignoring the > > first request queue would be a reasonable approach if the device's > > config advertises num_request_queues > 1. Unfortunately, both > > virtiofsd and crosvm's virtio-fs device have hard-coded > > num_request_queues =1, so this won't help with those existing devices. > > Do they care what the vq # is though? > We could do some magic to translate VQ #s in qemu. > > > > Maybe there are other devices that we would need to consider as well; > > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes > > benchmarks that seem to be from a different virtio-fs implementation > > that does support multiple request queues, so the workaround could > > possibly be used there. > > > > > Or do what I did for balloon here: try with spec compliant #s first, > > > if that fails then assume it's the spec issue and shift by 1. > > > > If there is a way to "guess and check" without breaking spec-compliant > > devices, that sounds reasonable too; however, I'm not sure how this > > would work out in practice: an existing non-compliant device may fail > > to start if the driver tries to enable queue index 2 when it only > > supports one request queue, > > You don't try to enable queue - driver starts by checking queue size. > The way my patch works is that it assumes a non existing queue has > size 0 if not available. > > This was actually a documented way to check for PCI and MMIO: > Read the virtqueue size from queue_size. This controls how big the > virtqueue is (see 2.6 Virtqueues). > If this field is 0, the virtqueue does not exist. > MMIO: > If the returned value is zero (0x0) the queue is not available. > > unfortunately not for CCW, but I guess CCW implementations outside > of QEMU are uncommon enough that we can assume it's the same? > > > To me the above is also a big hint that drivers are allowed to > query size for queues that do not exist. Ah, that makes total sense - detecting queue presence by non-zero queue size sounds good to me, and it should work in the normal virtio device case. I am not sure about vhost-user, since there is no way for the front-end to ask the back-end for a queue's size; the confusingly named VHOST_USER_SET_VRING_NUM allows the front-end to configure the size of a queue, but there's no corresponding GET message. > > and a spec-compliant device would probably > > balk if the driver tries to enable queue 1 but does not negotiate > > VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the > > whole virtio device initialization process if a device
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 1:18 PM Oleg Nesterov wrote: > > On 07/10, Andrii Nakryiko wrote: > > > > On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov wrote: > > > > > > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() > > > + > > > put_uprobe(). And to me this change simplifies the code a bit. > > > > > > Signed-off-by: Oleg Nesterov > > > --- > > > include/linux/uprobes.h | 14 ++-- > > > kernel/events/uprobes.c | 45 - > > > kernel/trace/bpf_trace.c| 12 +- > > > kernel/trace/trace_uprobe.c | 28 +++ > > > 4 files changed, 41 insertions(+), 58 deletions(-) > > > > > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > > > index aa89a8b67039..399509befcf4 100644 > > > --- a/include/linux/uprobes.h > > > +++ b/include/linux/uprobes.h > > > > I don't see struct uprobe forward-declared in this header, maybe we > > should add it? > > Probably yes, thanks... Although the current code already uses > struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y. > Thanks, will add. > Yep, I saw that and was wondering as well. > > > static inline int > > > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer > > > *uc, bool add) > > > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add) > > > { > > > return -ENOSYS; > > > } > > > > complete aside, when I was looking at this code I was wondering why we > > even need uprobe_apply, it looks like some hacky variant of > > uprobe_register and uprobe_unregister. > > All I can say is that > > - I can hardly recall this logic, I'll try to do this tomorrow > and write another email > > - in any case this logic is ugly and needs more cleanups > > - but this patch only tries to simplify this code without any > visible changes. yep, that's why it's an aside, up to you > > > > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister); > > > * refcount is released when the last @uc for the @uprobe > > > * unregisters. Caller of uprobe_register() is required to keep @inode > > > * (and the containing mount) referenced. > > > - * > > > - * Return errno if it cannot successully install probes > > > - * else return 0 (success) > > > > mention that it never returns NULL, but rather encodes error code > > inside the pointer on error? It's an important part of the contract. > > OK... > > > > /* > > > > this should be /** for doccomment checking (you'd get a warning for > > missing @uprobe if there was this extra star) > > Well, this is what we have before this patch, but OK > > > > * uprobe_apply - unregister an already registered probe. > > > - * @inode: the file in which the probe has to be removed. > > > - * @offset: offset from the start of the file. > > > > add @uprobe description now? > > If only I knew what this @uprobe description can say ;) I'm pointing this out because I accidentally used /** for comment for some function, and I got some bot report about missing argument. I think /** makes sense for documenting "public API" function, so which is why all the above. > > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path > > > *path, struct bpf_uprobe *uprobes, > > > { > > > u32 i; > > > > > > - for (i = 0; i < cnt; i++) { > > > - uprobe_unregister(d_real_inode(path->dentry), > > > uprobes[i].offset, > > > - [i].consumer); > > > - } > > > + for (i = 0; i < cnt; i++) > > > > you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL > > check if you null-out it below) > > Hmm... are you sure? I'll re-check... See also the end of my email. no, you are right, it should be fine > > > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union > > > bpf_attr *attr, struct bpf_prog *pr > > > _uprobe_multi_link_lops, prog); > > > > > > for (i = 0; i < cnt; i++) { > > > - err = uprobe_register(d_real_inode(link->path.dentry), > > > + uprobes[i].uprobe = > > > uprobe_register(d_real_inode(link->path.dentry), > > > > will uprobe keep inode alive as long as uprobe is attached? > > Why? No, this patch doesn't (shouldn't) change the current behaviour. > > The users still need kern_path/path_put to, say, prevent umount. ok, then link->path has to stay, I was just wondering if we need to keep it alive > > > we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle > > only NULL vs non-NULL case > > Not sure I understand... > > > or maybe better yet let's just have local struct uprobe variable and > > only assign it if registration succeeded > > We can, but > > > (still need NULL check in > > bpf_uprobe_unregister above) > > Why? > > If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on > error, it passes cnt = i where is the 1st
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 12:58:11PM -0700, Daniel Verkamp wrote: > On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin wrote: > > > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin > > > wrote: > > > > > > > > virtio balloon communicates to the core that in some > > > > configurations vq #s are non-contiguous by setting name > > > > pointer to NULL. > > > > > > > > Unfortunately, core then turned around and just made them > > > > contiguous again. Result is that driver is out of spec. > > > > > > Thanks for fixing this - I think the overall approach of the patch looks > > > good. > > > > > > > Implement what the API was supposed to do > > > > in the 1st place. Compatibility with buggy hypervisors > > > > is handled inside virtio-balloon, which is the only driver > > > > making use of this facility, so far. > > > > > > In addition to virtio-balloon, I believe the same problem also affects > > > the virtio-fs device, since queue 1 is only supposed to be present if > > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > > > meant to be queue indexes 2 and up. From a look at the Linux driver > > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > > > and assumes that request queues start at index 1 rather than 2, which > > > looks out of spec to me, but the current device implementations (that > > > I am aware of, anyway) are also broken in the same way, so it ends up > > > working today. Queue numbering in a spec-compliant device and the > > > current Linux driver would mismatch; what the driver considers to be > > > the first request queue (index 1) would be ignored by the device since > > > queue index 1 has no function if F_NOTIFICATION isn't negotiated. > > > > > > Oh, thanks a lot for pointing this out! > > > > I see so this patch is no good as is, we need to add a workaround for > > virtio-fs first. > > > > QEMU workaround is simple - just add an extra queue. But I did not > > reasearch how this would interact with vhost-user. > > > > From driver POV, I guess we could just ignore queue # 1 - would that be > > ok or does it have performance implications? > > As a driver workaround for non-compliant devices, I think ignoring the > first request queue would be a reasonable approach if the device's > config advertises num_request_queues > 1. Unfortunately, both > virtiofsd and crosvm's virtio-fs device have hard-coded > num_request_queues =1, so this won't help with those existing devices. Do they care what the vq # is though? We could do some magic to translate VQ #s in qemu. > Maybe there are other devices that we would need to consider as well; > commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes > benchmarks that seem to be from a different virtio-fs implementation > that does support multiple request queues, so the workaround could > possibly be used there. > > > Or do what I did for balloon here: try with spec compliant #s first, > > if that fails then assume it's the spec issue and shift by 1. > > If there is a way to "guess and check" without breaking spec-compliant > devices, that sounds reasonable too; however, I'm not sure how this > would work out in practice: an existing non-compliant device may fail > to start if the driver tries to enable queue index 2 when it only > supports one request queue, You don't try to enable queue - driver starts by checking queue size. The way my patch works is that it assumes a non existing queue has size 0 if not available. This was actually a documented way to check for PCI and MMIO: Read the virtqueue size from queue_size. This controls how big the virtqueue is (see 2.6 Virtqueues). If this field is 0, the virtqueue does not exist. MMIO: If the returned value is zero (0x0) the queue is not available. unfortunately not for CCW, but I guess CCW implementations outside of QEMU are uncommon enough that we can assume it's the same? To me the above is also a big hint that drivers are allowed to query size for queues that do not exist. > and a spec-compliant device would probably > balk if the driver tries to enable queue 1 but does not negotiate > VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the > whole virtio device initialization process if a device fails like > this, then maybe it's feasible. (Or can the driver tweak the virtqueue > configuration and try to set DRIVER_OK repeatedly until it works? It's > not clear to me if this is allowed by the spec, or what device > implementations actually do in practice in this scenario.) > > Thanks, > -- Daniel My patch starts with a spec compliant behaviour. If that fails, try non-compliant one as a fallback. -- MST
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On 07/10, Andrii Nakryiko wrote: > > On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov wrote: > > > > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() + > > put_uprobe(). And to me this change simplifies the code a bit. > > > > Signed-off-by: Oleg Nesterov > > --- > > include/linux/uprobes.h | 14 ++-- > > kernel/events/uprobes.c | 45 - > > kernel/trace/bpf_trace.c| 12 +- > > kernel/trace/trace_uprobe.c | 28 +++ > > 4 files changed, 41 insertions(+), 58 deletions(-) > > > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > > index aa89a8b67039..399509befcf4 100644 > > --- a/include/linux/uprobes.h > > +++ b/include/linux/uprobes.h > > I don't see struct uprobe forward-declared in this header, maybe we > should add it? Probably yes, thanks... Although the current code already uses struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y. Thanks, will add. > > static inline int > > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer > > *uc, bool add) > > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add) > > { > > return -ENOSYS; > > } > > complete aside, when I was looking at this code I was wondering why we > even need uprobe_apply, it looks like some hacky variant of > uprobe_register and uprobe_unregister. All I can say is that - I can hardly recall this logic, I'll try to do this tomorrow and write another email - in any case this logic is ugly and needs more cleanups - but this patch only tries to simplify this code without any visible changes. > > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister); > > * refcount is released when the last @uc for the @uprobe > > * unregisters. Caller of uprobe_register() is required to keep @inode > > * (and the containing mount) referenced. > > - * > > - * Return errno if it cannot successully install probes > > - * else return 0 (success) > > mention that it never returns NULL, but rather encodes error code > inside the pointer on error? It's an important part of the contract. OK... > > /* > > this should be /** for doccomment checking (you'd get a warning for > missing @uprobe if there was this extra star) Well, this is what we have before this patch, but OK > > * uprobe_apply - unregister an already registered probe. > > - * @inode: the file in which the probe has to be removed. > > - * @offset: offset from the start of the file. > > add @uprobe description now? If only I knew what this @uprobe description can say ;) > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, > > struct bpf_uprobe *uprobes, > > { > > u32 i; > > > > - for (i = 0; i < cnt; i++) { > > - uprobe_unregister(d_real_inode(path->dentry), > > uprobes[i].offset, > > - [i].consumer); > > - } > > + for (i = 0; i < cnt; i++) > > you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL > check if you null-out it below) Hmm... are you sure? I'll re-check... See also the end of my email. > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union > > bpf_attr *attr, struct bpf_prog *pr > > _uprobe_multi_link_lops, prog); > > > > for (i = 0; i < cnt; i++) { > > - err = uprobe_register(d_real_inode(link->path.dentry), > > + uprobes[i].uprobe = > > uprobe_register(d_real_inode(link->path.dentry), > > will uprobe keep inode alive as long as uprobe is attached? Why? No, this patch doesn't (shouldn't) change the current behaviour. The users still need kern_path/path_put to, say, prevent umount. > we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle > only NULL vs non-NULL case Not sure I understand... > or maybe better yet let's just have local struct uprobe variable and > only assign it if registration succeeded We can, but > (still need NULL check in > bpf_uprobe_unregister above) Why? If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on error, it passes cnt = i where is the 1st failed uprobe index. IOW, uptobes[i].uprobe can't be IS_ERR_OR_NULL() in the 0..cnt-1 range. If it is called by bpf_uprobe_multi_link_release() then the whole 0..umulti_link->cnt-1 range should be fine? Oleg.
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 11:39 AM Michael S. Tsirkin wrote: > > On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin wrote: > > > > > > virtio balloon communicates to the core that in some > > > configurations vq #s are non-contiguous by setting name > > > pointer to NULL. > > > > > > Unfortunately, core then turned around and just made them > > > contiguous again. Result is that driver is out of spec. > > > > Thanks for fixing this - I think the overall approach of the patch looks > > good. > > > > > Implement what the API was supposed to do > > > in the 1st place. Compatibility with buggy hypervisors > > > is handled inside virtio-balloon, which is the only driver > > > making use of this facility, so far. > > > > In addition to virtio-balloon, I believe the same problem also affects > > the virtio-fs device, since queue 1 is only supposed to be present if > > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > > meant to be queue indexes 2 and up. From a look at the Linux driver > > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > > and assumes that request queues start at index 1 rather than 2, which > > looks out of spec to me, but the current device implementations (that > > I am aware of, anyway) are also broken in the same way, so it ends up > > working today. Queue numbering in a spec-compliant device and the > > current Linux driver would mismatch; what the driver considers to be > > the first request queue (index 1) would be ignored by the device since > > queue index 1 has no function if F_NOTIFICATION isn't negotiated. > > > Oh, thanks a lot for pointing this out! > > I see so this patch is no good as is, we need to add a workaround for > virtio-fs first. > > QEMU workaround is simple - just add an extra queue. But I did not > reasearch how this would interact with vhost-user. > > From driver POV, I guess we could just ignore queue # 1 - would that be > ok or does it have performance implications? As a driver workaround for non-compliant devices, I think ignoring the first request queue would be a reasonable approach if the device's config advertises num_request_queues > 1. Unfortunately, both virtiofsd and crosvm's virtio-fs device have hard-coded num_request_queues =1, so this won't help with those existing devices. Maybe there are other devices that we would need to consider as well; commit 529395d2ae64 ("virtio-fs: add multi-queue support") quotes benchmarks that seem to be from a different virtio-fs implementation that does support multiple request queues, so the workaround could possibly be used there. > Or do what I did for balloon here: try with spec compliant #s first, > if that fails then assume it's the spec issue and shift by 1. If there is a way to "guess and check" without breaking spec-compliant devices, that sounds reasonable too; however, I'm not sure how this would work out in practice: an existing non-compliant device may fail to start if the driver tries to enable queue index 2 when it only supports one request queue, and a spec-compliant device would probably balk if the driver tries to enable queue 1 but does not negotiate VIRTIO_FS_F_NOTIFICATION. If there's a way to reset and retry the whole virtio device initialization process if a device fails like this, then maybe it's feasible. (Or can the driver tweak the virtqueue configuration and try to set DRIVER_OK repeatedly until it works? It's not clear to me if this is allowed by the spec, or what device implementations actually do in practice in this scenario.) Thanks, -- Daniel
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 12:38 PM Jiri Olsa wrote: > > On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote: > > On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa wrote: > > > > > > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote: > > > > > > SNIP > > > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > > > index 467f358c8ce7..7571811127a2 100644 > > > > --- a/kernel/trace/bpf_trace.c > > > > +++ b/kernel/trace/bpf_trace.c > > > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe { > > > > loff_t offset; > > > > unsigned long ref_ctr_offset; > > > > u64 cookie; > > > > + struct uprobe *uprobe; > > > > struct uprobe_consumer consumer; > > > > }; > > > > > > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path > > > > *path, struct bpf_uprobe *uprobes, > > > > { > > > > u32 i; > > > > > > > > - for (i = 0; i < cnt; i++) { > > > > - uprobe_unregister(d_real_inode(path->dentry), > > > > uprobes[i].offset, > > > > - [i].consumer); > > > > - } > > > > > > nice, we could also drop path argument now > > > > see my comments to Oleg, I think we can/should get rid of link->path > > altogether if uprobe itself keeps inode alive. > > yea, I was thinking of that, but then it's kind of useful to have it in > bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from > first uprobe in the link, but ok, still probably worth to remove it ;-) if we need it for link_info, probably cleaner to just keep it, no big deal then > > anyway as you wrote it's ok for follow up cleanup, I'll check on that > > > > > BTW, Jiri, do we have any test for multi-uprobe that simulates partial > > attachment success/failure (whichever way you want to look at it). It > > would be super useful to have to check at least some error handling > > code in the uprobe code base. If we don't, do you mind adding > > something simple to BPF selftests? > > there's test_attach_api_fails, but I think all checked fails are before > actually calling uprobe_register function > > I think there are few ways to fail the uprobe_register, like install it > on top of int3.. will check add some test for that > great, thank you! > jirka > > > > > > > > > jirka > > > > > > > + for (i = 0; i < cnt; i++) > > > > + uprobe_unregister(uprobes[i].uprobe, > > > > [i].consumer); > > > > } > > > > > > > > static void bpf_uprobe_multi_link_release(struct bpf_link *link) > > > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union > > > > bpf_attr *attr, struct bpf_prog *pr > > > > _uprobe_multi_link_lops, prog); > > > > > > > > for (i = 0; i < cnt; i++) { > > > > - err = uprobe_register(d_real_inode(link->path.dentry), > > > > + uprobes[i].uprobe = > > > > uprobe_register(d_real_inode(link->path.dentry), > > > >uprobes[i].offset, > > > >uprobes[i].ref_ctr_offset, > > > >[i].consumer); > > > > - if (err) { > > > > + if (IS_ERR(uprobes[i].uprobe)) { > > > > + err = PTR_ERR(uprobes[i].uprobe); > > > > bpf_uprobe_unregister(, uprobes, i); > > > > goto error_free; > > > > }
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote: > On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa wrote: > > > > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote: > > > > SNIP > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > > index 467f358c8ce7..7571811127a2 100644 > > > --- a/kernel/trace/bpf_trace.c > > > +++ b/kernel/trace/bpf_trace.c > > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe { > > > loff_t offset; > > > unsigned long ref_ctr_offset; > > > u64 cookie; > > > + struct uprobe *uprobe; > > > struct uprobe_consumer consumer; > > > }; > > > > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path > > > *path, struct bpf_uprobe *uprobes, > > > { > > > u32 i; > > > > > > - for (i = 0; i < cnt; i++) { > > > - uprobe_unregister(d_real_inode(path->dentry), > > > uprobes[i].offset, > > > - [i].consumer); > > > - } > > > > nice, we could also drop path argument now > > see my comments to Oleg, I think we can/should get rid of link->path > altogether if uprobe itself keeps inode alive. yea, I was thinking of that, but then it's kind of useful to have it in bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from first uprobe in the link, but ok, still probably worth to remove it ;-) anyway as you wrote it's ok for follow up cleanup, I'll check on that > > BTW, Jiri, do we have any test for multi-uprobe that simulates partial > attachment success/failure (whichever way you want to look at it). It > would be super useful to have to check at least some error handling > code in the uprobe code base. If we don't, do you mind adding > something simple to BPF selftests? there's test_attach_api_fails, but I think all checked fails are before actually calling uprobe_register function I think there are few ways to fail the uprobe_register, like install it on top of int3.. will check add some test for that jirka > > > > > jirka > > > > > + for (i = 0; i < cnt; i++) > > > + uprobe_unregister(uprobes[i].uprobe, [i].consumer); > > > } > > > > > > static void bpf_uprobe_multi_link_release(struct bpf_link *link) > > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union > > > bpf_attr *attr, struct bpf_prog *pr > > > _uprobe_multi_link_lops, prog); > > > > > > for (i = 0; i < cnt; i++) { > > > - err = uprobe_register(d_real_inode(link->path.dentry), > > > + uprobes[i].uprobe = > > > uprobe_register(d_real_inode(link->path.dentry), > > >uprobes[i].offset, > > >uprobes[i].ref_ctr_offset, > > >[i].consumer); > > > - if (err) { > > > + if (IS_ERR(uprobes[i].uprobe)) { > > > + err = PTR_ERR(uprobes[i].uprobe); > > > bpf_uprobe_unregister(, uprobes, i); > > > goto error_free; > > > }
[PATCH v5] perf,x86: avoid missing caller address in stack traces captured in uprobe
When tracing user functions with uprobe functionality, it's common to install the probe (e.g., a BPF program) at the first instruction of the function. This is often going to be `push %rbp` instruction in function preamble, which means that within that function frame pointer hasn't been established yet. This leads to consistently missing an actual caller of the traced function, because perf_callchain_user() only records current IP (capturing traced function) and then following frame pointer chain (which would be caller's frame, containing the address of caller's caller). So when we have target_1 -> target_2 -> target_3 call chain and we are tracing an entry to target_3, captured stack trace will report target_1 -> target_3 call chain, which is wrong and confusing. This patch proposes a x86-64-specific heuristic to detect `push %rbp` (`push %ebp` on 32-bit architecture) instruction being traced. Given entire kernel implementation of user space stack trace capturing works under assumption that user space code was compiled with frame pointer register (%rbp/%ebp) preservation, it seems pretty reasonable to use this instruction as a strong indicator that this is the entry to the function. In that case, return address is still pointed to by %rsp/%esp, so we fetch it and add to stack trace before proceeding to unwind the rest using frame pointer-based logic. We also check for `endbr64` (for 64-bit modes) as another common pattern for function entry, as suggested by Josh Poimboeuf. Even if we get this wrong sometimes for uprobes attached not at the function entry, it's OK because stack trace will still be overall meaningful, just with one extra bogus entry. If we don't detect this, we end up with guaranteed to be missing caller function entry in the stack trace, which is worse overall. Signed-off-by: Andrii Nakryiko --- arch/x86/events/core.c | 63 + include/linux/uprobes.h | 2 ++ kernel/events/uprobes.c | 2 ++ 3 files changed, 67 insertions(+) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 5b0dd07b1ef1..780b8dc36f05 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -41,6 +41,8 @@ #include #include #include +#include +#include #include "perf_event.h" @@ -2813,6 +2815,46 @@ static unsigned long get_segment_base(unsigned int segment) return get_desc_base(desc); } +#ifdef CONFIG_UPROBES +/* + * Heuristic-based check if uprobe is installed at the function entry. + * + * Under assumption of user code being compiled with frame pointers, + * `push %rbp/%ebp` is a good indicator that we indeed are. + * + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern. + * If we get this wrong, captured stack trace might have one extra bogus + * entry, but the rest of stack trace will still be meaningful. + */ +static bool is_uprobe_at_func_entry(struct pt_regs *regs) +{ + struct arch_uprobe *auprobe; + + if (!current->utask) + return false; + + auprobe = current->utask->auprobe; + if (!auprobe) + return false; + + /* push %rbp/%ebp */ + if (auprobe->insn[0] == 0x55) + return true; + + /* endbr64 (64-bit only) */ + if (user_64bit_mode(regs) && is_endbr(*(u32 *)auprobe->insn)) + return true; + + return false; +} + +#else +static bool is_uprobe_at_func_entry(struct pt_regs *regs) +{ + return false; +} +#endif /* CONFIG_UPROBES */ + #ifdef CONFIG_IA32_EMULATION #include @@ -2824,6 +2866,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent unsigned long ss_base, cs_base; struct stack_frame_ia32 frame; const struct stack_frame_ia32 __user *fp; + u32 ret_addr; if (user_64bit_mode(regs)) return 0; @@ -2833,6 +2876,12 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent fp = compat_ptr(ss_base + regs->bp); pagefault_disable(); + + /* see perf_callchain_user() below for why we do this */ + if (is_uprobe_at_func_entry(regs) && + !get_user(ret_addr, (const u32 __user *)regs->sp)) + perf_callchain_store(entry, ret_addr); + while (entry->nr < entry->max_stack) { if (!valid_user_frame(fp, sizeof(frame))) break; @@ -2861,6 +2910,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs { struct stack_frame frame; const struct stack_frame __user *fp; + unsigned long ret_addr; if (perf_guest_state()) { /* TODO: We don't support guest os callchain now */ @@ -2884,6 +2934,19 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs return; pagefault_disable(); + + /* +* If we are called from uprobe handler, and we are indeed at the very +
Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()
On 07/10, Andrii Nakryiko wrote: > > LGTM with few nits below. > > Acked-by: Andrii Nakryiko Thanks for looking at this. > > @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr > > *attr, struct bpf_prog *pr > > _uprobe_multi_link_lops, prog); > > > > for (i = 0; i < cnt; i++) { > > - err = > > uprobe_register_refctr(d_real_inode(link->path.dentry), > > + err = uprobe_register(d_real_inode(link->path.dentry), > > uprobes[i].offset, > > uprobes[i].ref_ctr_offset, > > [i].consumer); > > please adjust indentation here OK, > > - if (tu->ref_ctr_offset) > > - ret = uprobe_register_refctr(tu->inode, tu->offset, > > - tu->ref_ctr_offset, >consumer); > > - else > > - ret = uprobe_register(tu->inode, tu->offset, >consumer); > > - > > + ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, > > + >consumer); > > doesn't fit under 100 characters? If it does, please keep as a single line. OK, will do. Oleg.
Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe
On Wed, Jul 10, 2024 at 9:24 AM Josh Poimboeuf wrote: > > On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote: > > On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra wrote: > > > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote: > > > > You can see it replaced the first byte, the following 3 bytes are > > > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded, > > > > you can see I stepped through a few more instructions. > > > > > > > > Works by accident? > > > > > > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware > > > that supports it. > > > > OK, I don't know what the implications are, but it's a good accident :) > > > > Anyways, what should I do for v4? Drop is_endbr6() check or keep it? > > Given the current behavior of uprobe overwriting ENDBR64 with INT3, the > is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry() > would never return true on OSes which have the ENDBR64 compiled in. > > However, once userspace IBT actually gets enabled, uprobe should skip > the ENDBR64 and patch the subsequent instruction. Then the is_endbr6() > check would no longer be needed. > Ok, I'll keep it then, thanks. > -- > Josh
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On 07/10, Jiri Olsa wrote: > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, > > struct bpf_uprobe *uprobes, > > { > > u32 i; > > > > - for (i = 0; i < cnt; i++) { > > - uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset, > > - [i].consumer); > > - } > > nice, we could also drop path argument now Indeed, I missed that. Will send V2 when/if this makes any sense. Thanks! Oleg.
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin wrote: > > > > virtio balloon communicates to the core that in some > > configurations vq #s are non-contiguous by setting name > > pointer to NULL. > > > > Unfortunately, core then turned around and just made them > > contiguous again. Result is that driver is out of spec. > > Thanks for fixing this - I think the overall approach of the patch looks good. > > > Implement what the API was supposed to do > > in the 1st place. Compatibility with buggy hypervisors > > is handled inside virtio-balloon, which is the only driver > > making use of this facility, so far. > > In addition to virtio-balloon, I believe the same problem also affects > the virtio-fs device, since queue 1 is only supposed to be present if > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > meant to be queue indexes 2 and up. From a look at the Linux driver > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > and assumes that request queues start at index 1 rather than 2, which > looks out of spec to me, but the current device implementations (that > I am aware of, anyway) are also broken in the same way, so it ends up > working today. Queue numbering in a spec-compliant device and the > current Linux driver would mismatch; what the driver considers to be > the first request queue (index 1) would be ignored by the device since > queue index 1 has no function if F_NOTIFICATION isn't negotiated. > > [...] > > diff --git a/drivers/virtio/virtio_pci_common.c > > b/drivers/virtio/virtio_pci_common.c > > index 7d82facafd75..fa606e7321ad 100644 > > --- a/drivers/virtio/virtio_pci_common.c > > +++ b/drivers/virtio/virtio_pci_common.c > > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > > unsigned int nvqs, > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > struct virtqueue_info *vqi; > > u16 msix_vec; > > - int i, err, nvectors, allocated_vectors, queue_idx = 0; > > + int i, err, nvectors, allocated_vectors; > > > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > > if (!vp_dev->vqs) > > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > > unsigned int nvqs, > > msix_vec = allocated_vectors++; > > else > > msix_vec = VP_MSIX_VQ_VECTOR; > > - vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > > + vqs[i] = vp_setup_vq(vdev, i, vqi->callback, > > vqi->name, vqi->ctx, msix_vec); > > if (IS_ERR(vqs[i])) { > > err = PTR_ERR(vqs[i]); > > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > > unsigned int nvqs, > > struct virtqueue_info vqs_info[]) > > { > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > - int i, err, queue_idx = 0; > > + int i, err; > > > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > > if (!vp_dev->vqs) > > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device > > *vdev, unsigned int nvqs, > > vqs[i] = NULL; > > continue; > > } > > +<<< HEAD > > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > > vqi->name, vqi->ctx, > > +=== > > + vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], > > +ctx ? ctx[i] : false, > > +>>> f814759f80b7... virtio: fix vq # for balloon > > This still has merge markers in it. > > Thanks, > -- Daniel ouch forgot to commit ;)
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 11:12:34AM -0700, Daniel Verkamp wrote: > On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin wrote: > > > > virtio balloon communicates to the core that in some > > configurations vq #s are non-contiguous by setting name > > pointer to NULL. > > > > Unfortunately, core then turned around and just made them > > contiguous again. Result is that driver is out of spec. > > Thanks for fixing this - I think the overall approach of the patch looks good. > > > Implement what the API was supposed to do > > in the 1st place. Compatibility with buggy hypervisors > > is handled inside virtio-balloon, which is the only driver > > making use of this facility, so far. > > In addition to virtio-balloon, I believe the same problem also affects > the virtio-fs device, since queue 1 is only supposed to be present if > VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are > meant to be queue indexes 2 and up. From a look at the Linux driver > (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION > and assumes that request queues start at index 1 rather than 2, which > looks out of spec to me, but the current device implementations (that > I am aware of, anyway) are also broken in the same way, so it ends up > working today. Queue numbering in a spec-compliant device and the > current Linux driver would mismatch; what the driver considers to be > the first request queue (index 1) would be ignored by the device since > queue index 1 has no function if F_NOTIFICATION isn't negotiated. Oh, thanks a lot for pointing this out! I see so this patch is no good as is, we need to add a workaround for virtio-fs first. QEMU workaround is simple - just add an extra queue. But I did not reasearch how this would interact with vhost-user. >From driver POV, I guess we could just ignore queue # 1 - would that be ok or does it have performance implications? Or do what I did for balloon here: try with spec compliant #s first, if that fails then assume it's the spec issue and shift by 1. > [...] > > diff --git a/drivers/virtio/virtio_pci_common.c > > b/drivers/virtio/virtio_pci_common.c > > index 7d82facafd75..fa606e7321ad 100644 > > --- a/drivers/virtio/virtio_pci_common.c > > +++ b/drivers/virtio/virtio_pci_common.c > > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > > unsigned int nvqs, > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > struct virtqueue_info *vqi; > > u16 msix_vec; > > - int i, err, nvectors, allocated_vectors, queue_idx = 0; > > + int i, err, nvectors, allocated_vectors; > > > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > > if (!vp_dev->vqs) > > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > > unsigned int nvqs, > > msix_vec = allocated_vectors++; > > else > > msix_vec = VP_MSIX_VQ_VECTOR; > > - vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > > + vqs[i] = vp_setup_vq(vdev, i, vqi->callback, > > vqi->name, vqi->ctx, msix_vec); > > if (IS_ERR(vqs[i])) { > > err = PTR_ERR(vqs[i]); > > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > > unsigned int nvqs, > > struct virtqueue_info vqs_info[]) > > { > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > - int i, err, queue_idx = 0; > > + int i, err; > > > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > > if (!vp_dev->vqs) > > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device > > *vdev, unsigned int nvqs, > > vqs[i] = NULL; > > continue; > > } > > +<<< HEAD > > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > > vqi->name, vqi->ctx, > > +=== > > + vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], > > +ctx ? ctx[i] : false, > > +>>> f814759f80b7... virtio: fix vq # for balloon > > This still has merge markers in it. > > Thanks, > -- Daniel
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa wrote: > > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote: > > SNIP > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index 467f358c8ce7..7571811127a2 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe { > > loff_t offset; > > unsigned long ref_ctr_offset; > > u64 cookie; > > + struct uprobe *uprobe; > > struct uprobe_consumer consumer; > > }; > > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, > > struct bpf_uprobe *uprobes, > > { > > u32 i; > > > > - for (i = 0; i < cnt; i++) { > > - uprobe_unregister(d_real_inode(path->dentry), > > uprobes[i].offset, > > - [i].consumer); > > - } > > nice, we could also drop path argument now see my comments to Oleg, I think we can/should get rid of link->path altogether if uprobe itself keeps inode alive. BTW, Jiri, do we have any test for multi-uprobe that simulates partial attachment success/failure (whichever way you want to look at it). It would be super useful to have to check at least some error handling code in the uprobe code base. If we don't, do you mind adding something simple to BPF selftests? > > jirka > > > + for (i = 0; i < cnt; i++) > > + uprobe_unregister(uprobes[i].uprobe, [i].consumer); > > } > > > > static void bpf_uprobe_multi_link_release(struct bpf_link *link) > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union > > bpf_attr *attr, struct bpf_prog *pr > > _uprobe_multi_link_lops, prog); > > > > for (i = 0; i < cnt; i++) { > > - err = uprobe_register(d_real_inode(link->path.dentry), > > + uprobes[i].uprobe = > > uprobe_register(d_real_inode(link->path.dentry), > >uprobes[i].offset, > >uprobes[i].ref_ctr_offset, > >[i].consumer); > > - if (err) { > > + if (IS_ERR(uprobes[i].uprobe)) { > > + err = PTR_ERR(uprobes[i].uprobe); > > bpf_uprobe_unregister(, uprobes, i); > > goto error_free; > > }
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov wrote: > > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() + > put_uprobe(). And to me this change simplifies the code a bit. > > Signed-off-by: Oleg Nesterov > --- > include/linux/uprobes.h | 14 ++-- > kernel/events/uprobes.c | 45 - > kernel/trace/bpf_trace.c| 12 +- > kernel/trace/trace_uprobe.c | 28 +++ > 4 files changed, 41 insertions(+), 58 deletions(-) > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > index aa89a8b67039..399509befcf4 100644 > --- a/include/linux/uprobes.h > +++ b/include/linux/uprobes.h I don't see struct uprobe forward-declared in this header, maybe we should add it? > @@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn); > extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); > extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); > extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct > *mm, unsigned long vaddr, uprobe_opcode_t); > -extern int uprobe_register(struct inode *inode, loff_t offset, loff_t > ref_ctr_offset, struct uprobe_consumer *uc); > -extern int uprobe_apply(struct inode *inode, loff_t offset, struct > uprobe_consumer *uc, bool); > -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct > uprobe_consumer *uc); > +extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, > loff_t ref_ctr_offset, struct uprobe_consumer *uc); > +extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, > bool); > +extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer > *uc); > extern int uprobe_mmap(struct vm_area_struct *vma); > extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, > unsigned long end); > extern void uprobe_start_dup_mmap(void); > @@ -147,18 +147,18 @@ static inline void uprobes_init(void) > > #define uprobe_get_trap_addr(regs) instruction_pointer(regs) > > -static inline int > +static inline struct uprobe * > uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, > struct uprobe_consumer *uc) > { > - return -ENOSYS; > + return ERR_PTR(-ENOSYS); > } > static inline int > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, > bool add) > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add) > { > return -ENOSYS; > } complete aside, when I was looking at this code I was wondering why we even need uprobe_apply, it looks like some hacky variant of uprobe_register and uprobe_unregister. I didn't dig deeper, but think whether we even need this? If this is just to avoid (for some period) some consumer callback calling, then that could be handled at the consumer side by ignoring such calls. callback call is cheap, it's the int3 handling that's expensive and with uprobe_apply we are already paying it anyways, so what is this for? > static inline void > -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer > *uc) > +uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) > { > } > static inline int uprobe_mmap(struct vm_area_struct *vma) [...] > > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister); > * refcount is released when the last @uc for the @uprobe > * unregisters. Caller of uprobe_register() is required to keep @inode > * (and the containing mount) referenced. > - * > - * Return errno if it cannot successully install probes > - * else return 0 (success) mention that it never returns NULL, but rather encodes error code inside the pointer on error? It's an important part of the contract. > */ > -int uprobe_register(struct inode *inode, loff_t offset, loff_t > ref_ctr_offset, > - struct uprobe_consumer *uc) > +struct uprobe *uprobe_register(struct inode *inode, > + loff_t offset, loff_t ref_ctr_offset, > + struct uprobe_consumer *uc) > { [...] > @@ -1186,35 +1177,27 @@ int uprobe_register(struct inode *inode, loff_t > offset, loff_t ref_ctr_offset, > > if (unlikely(ret == -EAGAIN)) > goto retry; > - return ret; > + > + return ret ? ERR_PTR(ret) : uprobe; > } > EXPORT_SYMBOL_GPL(uprobe_register); > > /* this should be /** for doccomment checking (you'd get a warning for missing @uprobe if there was this extra star) > * uprobe_apply - unregister an already registered probe. > - * @inode: the file in which the probe has to be removed. > - * @offset: offset from the start of the file. add @uprobe description now? > * @uc: consumer which wants to add more or remove some breakpoints > * @add: add or remove the breakpoints > */ > -int uprobe_apply(struct inode *inode, loff_t offset, > - struct uprobe_consumer *uc,
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, Jul 10, 2024 at 4:43 AM Michael S. Tsirkin wrote: > > virtio balloon communicates to the core that in some > configurations vq #s are non-contiguous by setting name > pointer to NULL. > > Unfortunately, core then turned around and just made them > contiguous again. Result is that driver is out of spec. Thanks for fixing this - I think the overall approach of the patch looks good. > Implement what the API was supposed to do > in the 1st place. Compatibility with buggy hypervisors > is handled inside virtio-balloon, which is the only driver > making use of this facility, so far. In addition to virtio-balloon, I believe the same problem also affects the virtio-fs device, since queue 1 is only supposed to be present if VIRTIO_FS_F_NOTIFICATION is negotiated, and the request queues are meant to be queue indexes 2 and up. From a look at the Linux driver (virtio_fs.c), it appears like it never acks VIRTIO_FS_F_NOTIFICATION and assumes that request queues start at index 1 rather than 2, which looks out of spec to me, but the current device implementations (that I am aware of, anyway) are also broken in the same way, so it ends up working today. Queue numbering in a spec-compliant device and the current Linux driver would mismatch; what the driver considers to be the first request queue (index 1) would be ignored by the device since queue index 1 has no function if F_NOTIFICATION isn't negotiated. [...] > diff --git a/drivers/virtio/virtio_pci_common.c > b/drivers/virtio/virtio_pci_common.c > index 7d82facafd75..fa606e7321ad 100644 > --- a/drivers/virtio/virtio_pci_common.c > +++ b/drivers/virtio/virtio_pci_common.c > @@ -293,7 +293,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > unsigned int nvqs, > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > struct virtqueue_info *vqi; > u16 msix_vec; > - int i, err, nvectors, allocated_vectors, queue_idx = 0; > + int i, err, nvectors, allocated_vectors; > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > if (!vp_dev->vqs) > @@ -332,7 +332,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, > unsigned int nvqs, > msix_vec = allocated_vectors++; > else > msix_vec = VP_MSIX_VQ_VECTOR; > - vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > + vqs[i] = vp_setup_vq(vdev, i, vqi->callback, > vqi->name, vqi->ctx, msix_vec); > if (IS_ERR(vqs[i])) { > err = PTR_ERR(vqs[i]); > @@ -368,7 +368,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > unsigned int nvqs, > struct virtqueue_info vqs_info[]) > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > - int i, err, queue_idx = 0; > + int i, err; > > vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); > if (!vp_dev->vqs) > @@ -388,8 +388,13 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, > unsigned int nvqs, > vqs[i] = NULL; > continue; > } > +<<< HEAD > vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback, > vqi->name, vqi->ctx, > +=== > + vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], > +ctx ? ctx[i] : false, > +>>> f814759f80b7... virtio: fix vq # for balloon This still has merge markers in it. Thanks, -- Daniel
Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()
On Wed, Jul 10, 2024 at 9:32 AM Oleg Nesterov wrote: > > It doesn't make any sense to have 2 versions of _register(). Note that > trace_uprobe_enable(), the only user of uprobe_register(), doesn't need > to check tu->ref_ctr_offset to decide which one should be used, it could > safely pass ref_ctr_offset == 0 to uprobe_register_refctr(). > > Add this argument to uprobe_register(), update the callers, and kill > uprobe_register_refctr(). > > Signed-off-by: Oleg Nesterov > --- > include/linux/uprobes.h | 9 ++--- > kernel/events/uprobes.c | 23 +-- > kernel/trace/bpf_trace.c| 2 +- > kernel/trace/trace_uprobe.c | 8 ++-- > 4 files changed, 10 insertions(+), 32 deletions(-) > LGTM with few nits below. Acked-by: Andrii Nakryiko > /* > * uprobe_apply - unregister an already registered probe. > * @inode: the file in which the probe has to be removed. > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index d1daeab1bbc1..467f358c8ce7 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr > *attr, struct bpf_prog *pr > _uprobe_multi_link_lops, prog); > > for (i = 0; i < cnt; i++) { > - err = uprobe_register_refctr(d_real_inode(link->path.dentry), > + err = uprobe_register(d_real_inode(link->path.dentry), > uprobes[i].offset, > uprobes[i].ref_ctr_offset, > [i].consumer); please adjust indentation here > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c > index c98e3b3386ba..78a5c40e885a 100644 > --- a/kernel/trace/trace_uprobe.c > +++ b/kernel/trace/trace_uprobe.c > @@ -1089,12 +1089,8 @@ static int trace_uprobe_enable(struct trace_uprobe > *tu, filter_func_t filter) > tu->consumer.filter = filter; > tu->inode = d_real_inode(tu->path.dentry); > > - if (tu->ref_ctr_offset) > - ret = uprobe_register_refctr(tu->inode, tu->offset, > - tu->ref_ctr_offset, >consumer); > - else > - ret = uprobe_register(tu->inode, tu->offset, >consumer); > - > + ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, > + >consumer); doesn't fit under 100 characters? If it does, please keep as a single line. > if (ret) > tu->inode = NULL; > > -- > 2.25.1.362.g51ebf55 > >
[PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() + put_uprobe(). And to me this change simplifies the code a bit. Signed-off-by: Oleg Nesterov --- include/linux/uprobes.h | 14 ++-- kernel/events/uprobes.c | 45 - kernel/trace/bpf_trace.c| 12 +- kernel/trace/trace_uprobe.c | 28 +++ 4 files changed, 41 insertions(+), 58 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index aa89a8b67039..399509befcf4 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); -extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); -extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); +extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool); +extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end); extern void uprobe_start_dup_mmap(void); @@ -147,18 +147,18 @@ static inline void uprobes_init(void) #define uprobe_get_trap_addr(regs) instruction_pointer(regs) -static inline int +static inline struct uprobe * uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) { - return -ENOSYS; + return ERR_PTR(-ENOSYS); } static inline int -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add) { return -ENOSYS; } static inline void -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { } static inline int uprobe_mmap(struct vm_area_struct *vma) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index e5b7c6239970..dba41403d7b8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1100,22 +1100,15 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) /* * uprobe_unregister - unregister an already registered probe. - * @inode: the file in which the probe has to be removed. + * @uprobe: uprobe to remove * @offset: offset from the start of the file. * @uc: identify which probe if multiple probes are colocated. */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - down_write(>register_rwsem); __uprobe_unregister(uprobe, uc); up_write(>register_rwsem); - put_uprobe(uprobe); } EXPORT_SYMBOL_GPL(uprobe_unregister); @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister); * refcount is released when the last @uc for the @uprobe * unregisters. Caller of uprobe_register() is required to keep @inode * (and the containing mount) referenced. - * - * Return errno if it cannot successully install probes - * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, - struct uprobe_consumer *uc) +struct uprobe *uprobe_register(struct inode *inode, + loff_t offset, loff_t ref_ctr_offset, + struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; /* Uprobe must have at least one set consumer */ if (!uc->handler && !uc->ret_handler) - return -EINVAL; + return ERR_PTR(-EINVAL); /* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */ if (!inode->i_mapping->a_ops->read_folio && !shmem_mapping(inode->i_mapping)) - return -EIO; + return ERR_PTR(-EIO); /* Racy, just to catch the obvious mistakes */ if (offset > i_size_read(inode)) - return -EINVAL; + return ERR_PTR(-EINVAL); /* * This ensures that copy_from_page(), copy_to_page() and * __update_ref_ctr() can't cross page boundary. */
Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote: SNIP > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index 467f358c8ce7..7571811127a2 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -3157,6 +3157,7 @@ struct bpf_uprobe { > loff_t offset; > unsigned long ref_ctr_offset; > u64 cookie; > + struct uprobe *uprobe; > struct uprobe_consumer consumer; > }; > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, > struct bpf_uprobe *uprobes, > { > u32 i; > > - for (i = 0; i < cnt; i++) { > - uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset, > - [i].consumer); > - } nice, we could also drop path argument now jirka > + for (i = 0; i < cnt; i++) > + uprobe_unregister(uprobes[i].uprobe, [i].consumer); > } > > static void bpf_uprobe_multi_link_release(struct bpf_link *link) > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr > *attr, struct bpf_prog *pr > _uprobe_multi_link_lops, prog); > > for (i = 0; i < cnt; i++) { > - err = uprobe_register(d_real_inode(link->path.dentry), > + uprobes[i].uprobe = > uprobe_register(d_real_inode(link->path.dentry), >uprobes[i].offset, >uprobes[i].ref_ctr_offset, >[i].consumer); > - if (err) { > + if (IS_ERR(uprobes[i].uprobe)) { > + err = PTR_ERR(uprobes[i].uprobe); > bpf_uprobe_unregister(, uprobes, i); > goto error_free; > }
[PATCH 2/3] uprobes: simplify error handling for alloc_uprobe()
From: Andrii Nakryiko Return -ENOMEM instead of NULL, which makes caller's error handling just a touch simpler. Signed-off-by: Andrii Nakryiko Signed-off-by: Oleg Nesterov --- kernel/events/uprobes.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 4950decebe5c..e5b7c6239970 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL); if (!uprobe) - return NULL; + return ERR_PTR(-ENOMEM); uprobe->inode = inode; uprobe->offset = offset; @@ -1166,8 +1166,6 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, retry: uprobe = alloc_uprobe(inode, offset, ref_ctr_offset); - if (!uprobe) - return -ENOMEM; if (IS_ERR(uprobe)) return PTR_ERR(uprobe); -- 2.25.1.362.g51ebf55
[PATCH 1/3] uprobes: kill uprobe_register_refctr()
It doesn't make any sense to have 2 versions of _register(). Note that trace_uprobe_enable(), the only user of uprobe_register(), doesn't need to check tu->ref_ctr_offset to decide which one should be used, it could safely pass ref_ctr_offset == 0 to uprobe_register_refctr(). Add this argument to uprobe_register(), update the callers, and kill uprobe_register_refctr(). Signed-off-by: Oleg Nesterov --- include/linux/uprobes.h | 9 ++--- kernel/events/uprobes.c | 23 +-- kernel/trace/bpf_trace.c| 2 +- kernel/trace/trace_uprobe.c | 8 ++-- 4 files changed, 10 insertions(+), 32 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f46e0ca0169c..aa89a8b67039 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -110,8 +110,7 @@ extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); -extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); -extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); +extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -149,11 +148,7 @@ static inline void uprobes_init(void) #define uprobe_get_trap_addr(regs) instruction_pointer(regs) static inline int -uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - return -ENOSYS; -} -static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) { return -ENOSYS; } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 1bdf386485d4..4950decebe5c 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1120,25 +1120,25 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume EXPORT_SYMBOL_GPL(uprobe_unregister); /* - * __uprobe_register - register a probe + * uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe.. * - * Apart from the access refcount, __uprobe_register() takes a creation + * Apart from the access refcount, uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of __uprobe_register() is required to keep @inode + * unregisters. Caller of uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -static int __uprobe_register(struct inode *inode, loff_t offset, -loff_t ref_ctr_offset, struct uprobe_consumer *uc) +int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, + struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -1190,21 +1190,8 @@ static int __uprobe_register(struct inode *inode, loff_t offset, goto retry; return ret; } - -int uprobe_register(struct inode *inode, loff_t offset, - struct uprobe_consumer *uc) -{ - return __uprobe_register(inode, offset, 0, uc); -} EXPORT_SYMBOL_GPL(uprobe_register); -int uprobe_register_refctr(struct inode *inode, loff_t offset, - loff_t ref_ctr_offset, struct uprobe_consumer *uc) -{ - return __uprobe_register(inode, offset, ref_ctr_offset, uc); -} -EXPORT_SYMBOL_GPL(uprobe_register_refctr); - /* * uprobe_apply - unregister an already registered probe. * @inode: the file in which the probe has to be removed. diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index d1daeab1bbc1..467f358c8ce7 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr _uprobe_multi_link_lops, prog); for (i = 0; i < cnt; i++) { - err =
[PATCH 0/3] uprobes: future cleanups for review
On 07/10, Oleg Nesterov wrote: > > Peter, these simple cleanups should not conflict with your changes, > but I can resend them later if it causes any inconvenience. In fact I would like to push 2 more cleanups before the more significant changes, but they certainly conflict with your ongoing work, albeit only textually. Let me send the patches for review anyway, perhaps you can take at least the 1st one. 3/3 is only compile tested so far. Andrii, can you take a look? Oleg.
Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe
On Wed, Jul 10, 2024 at 08:11:57AM -0700, Andrii Nakryiko wrote: > On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra wrote: > > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote: > > > You can see it replaced the first byte, the following 3 bytes are > > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded, > > > you can see I stepped through a few more instructions. > > > > > > Works by accident? > > > > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware > > that supports it. > > OK, I don't know what the implications are, but it's a good accident :) > > Anyways, what should I do for v4? Drop is_endbr6() check or keep it? Given the current behavior of uprobe overwriting ENDBR64 with INT3, the is_endbr6() check still makes sense, otherwise is_uprobe_at_func_entry() would never return true on OSes which have the ENDBR64 compiled in. However, once userspace IBT actually gets enabled, uprobe should skip the ENDBR64 and patch the subsequent instruction. Then the is_endbr6() check would no longer be needed. -- Josh
Re: [PATCH v3 9/9] dt-bindings: interconnect: qcom: msm8953: Fix 'See also' in description
On Tue, 09 Jul 2024 12:22:54 +0200, Adam Skladowski wrote: > "See also" in description seems to be wrongly defined, > make it inline with other yamls. > > Fixes: 791ed23f735b ("dt-bindings: interconnect: qcom: Add Qualcomm MSM8953 > NoC") > Signed-off-by: Adam Skladowski > --- > .../devicetree/bindings/interconnect/qcom,msm8953.yaml | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > Acked-by: Rob Herring (Arm)
Re: [PATCH v3 7/9] dt-bindings: interconnect: qcom: msm8939: Fix example
On Tue, 09 Jul 2024 12:22:52 +0200, Adam Skladowski wrote: > For now example list snoc_mm as children of bimc which is obviously > not valid, drop bimc and move snoc_mm into snoc. > > Signed-off-by: Adam Skladowski > --- > .../devicetree/bindings/interconnect/qcom,msm8939.yaml | 6 -- > 1 file changed, 6 deletions(-) > Reviewed-by: Rob Herring (Arm)
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On Wed, 2024-07-10 at 15:07 +0200, Peter Hilber wrote: > On 08.07.24 11:27, David Woodhouse wrote: > > From: David Woodhouse > > > > The vmclock "device" provides a shared memory region with precision clock > > information. By using shared memory, it is safe across Live Migration. > > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > > actually helpful. > > > > The memory region of the device is also exposed to userspace so it can be > > read or memory mapped by application which need reliable notification of > > clock disruptions. > > > > Signed-off-by: David Woodhouse > > [...] > > > + > > +struct vmclock_abi { > > + /* CONSTANT FIELDS */ > > + uint32_t magic; > > +#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */ > > + uint32_t size; /* Size of region containing this structure > > */ > > + uint16_t version; /* 1 */ > > + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except > > INVALID */ > > +#define VMCLOCK_COUNTER_ARM_VCNT 0 > > +#define VMCLOCK_COUNTER_X86_TSC1 > > +#define VMCLOCK_COUNTER_INVALID0xff > > + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */ > > +#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 > > 00:00:00z */ > > +#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 > > 00:00:00z */ > > +#define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined > > epoch */ > > +#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */ > > +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */ > > + > > + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */ > > + uint32_t seq_count; /* Low bit means an update is in progress */ > > + /* > > + * This field changes to another non-repeating value when the CPU > > + * counter is disrupted, for example on live migration. This lets > > + * the guest know that it should discard any calibration it has > > + * performed of the counter against external sources (NTP/PTP/etc.). > > + */ > > + uint64_t disruption_marker; > > + uint64_t flags; > > + /* Indicates that the tai_offset_sec field is valid */ > > +#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0) > > + /* > > + * Optionally used to notify guests of pending maintenance events. > > + * A guest which provides latency-sensitive services may wish to > > + * remove itself from service if an event is coming up. Two flags > > + * indicate the approximate imminence of the event. > > + */ > > +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */ > > +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */ > > +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3) > > +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4) > > +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5) > > +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6) > > + /* > > + * Even regardless of leap seconds, the time presented through this > > + * mechanism may not be strictly monotonic. If the counter slows > > down > > + * and the host adapts to this discovery, the time calculated from > > + * the value of the counter immediately after an update to this > > + * structure, may appear to be *earlier* than a calculation just > > + * before the update (while the counter was believed to be running > > + * faster than it now is). A guest operating system will typically > > + * *skew* its own system clock back towards the reference clock > > + * exposed here, rather than following this clock directly. If, > > + * however, this structure is being populated from such a system > > + * clock which is already handled in such a fashion and the results > > + * *are* guaranteed to be monotonic, such monotonicity can be > > + * advertised by setting this bit. > > + */ > > I wonder if this might be difficult to define in a standard. I'm sure we could do better than my attempt above, but surely it isn't *so* hard to define monotonicity? > Is there a need to define device and driver behavior in more detail? What > would happen if e.g. the device first decides how to update the clock, but > is then slow to update the SHM? That's OK, isn't it? The key in the VMCLOCK_FLAG_TIME_MONOTONIC flag is that by setting it, the host guarantees that the time calculated according to this structure at any given moment, shall not appear to be later than the time calculated via the structure at any *later* moment. The kernel typically takes create care to ensure that time as seen by userspace (e.g. in the real vDSO) *is* monotonic. If the underlying counter is speeding up and the
Re: [PATCH v3 3/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8937 NoC
On Tue, 09 Jul 2024 12:22:48 +0200, Adam Skladowski wrote: > Add bindings for Qualcomm MSM8937 Network-On-Chip interconnect devices. > > Signed-off-by: Adam Skladowski > --- > .../bindings/interconnect/qcom,msm8939.yaml | 8 +- > .../dt-bindings/interconnect/qcom,msm8937.h | 93 +++ > 2 files changed, 99 insertions(+), 2 deletions(-) > create mode 100644 include/dt-bindings/interconnect/qcom,msm8937.h > Reviewed-by: Rob Herring (Arm)
Re: [PATCH 1/1] remoteproc: mediatek: Support multiple reserved memory regions
On Wed, Jul 03, 2024 at 07:53:08PM +0800, Shun-yi Wang wrote: > From: "shun-yi.wang" > > SCP supports multiple reserved memory regions, intended for > specific hardwards. > > Signed-off-by: shun-yi.wang > --- > drivers/remoteproc/mtk_scp.c | 25 + > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c > index 9ecd5ea04b5f3..1902826cea0af 100644 > --- a/drivers/remoteproc/mtk_scp.c > +++ b/drivers/remoteproc/mtk_scp.c > @@ -1006,22 +1006,31 @@ EXPORT_SYMBOL_GPL(scp_mapping_dm_addr); > > static int scp_map_memory_region(struct mtk_scp *scp) > { > - int ret; > + int ret, i, err; > const struct mtk_scp_sizes_data *scp_sizes; > + struct device_node *node = scp->dev->of_node; > + struct of_phandle_iterator it; > + > + i = 0; > + of_for_each_phandle(, err, node, "memory-region", NULL, 0) { > + ret = of_reserved_mem_device_init_by_idx(scp->dev, node, i); > + > + if (ret) { > + dev_err(scp->dev, "failed to assign memory-region: > %s\n", > + it.node->name); > + of_node_put(it.node); > + return -ENOMEM; > + } With this patch the code is out of sync with the bindings which are still specifying a maxItems of 1 - please address. Thanks, Mathieu > > - ret = of_reserved_mem_device_init(scp->dev); > + i++; > + } > > /* reserved memory is optional. */ > - if (ret == -ENODEV) { > + if (!i) { > dev_info(scp->dev, "skipping reserved memory initialization."); > return 0; > } > > - if (ret) { > - dev_err(scp->dev, "failed to assign memory-region: %d\n", ret); > - return -ENOMEM; > - } > - > /* Reserved SCP code size */ > scp_sizes = scp->data->scp_sizes; > scp->cpu_addr = dma_alloc_coherent(scp->dev, scp_sizes->max_dram_size, > -- > 2.18.0 >
Re: [PATCH v3 3/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8937 NoC
On Tue, Jul 09, 2024 at 12:22:48PM +0200, Adam Skladowski wrote: > Add bindings for Qualcomm MSM8937 Network-On-Chip interconnect devices. That is obvious. What would be useful is detailing how 8937 is similar to the existing devices. > > Signed-off-by: Adam Skladowski > --- > .../bindings/interconnect/qcom,msm8939.yaml | 8 +- > .../dt-bindings/interconnect/qcom,msm8937.h | 93 +++ > 2 files changed, 99 insertions(+), 2 deletions(-) > create mode 100644 include/dt-bindings/interconnect/qcom,msm8937.h > > diff --git a/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml > b/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml > index 0641a3c992a5..d19e20247df8 100644 > --- a/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml > +++ b/Documentation/devicetree/bindings/interconnect/qcom,msm8939.yaml > @@ -4,13 +4,13 @@ > $id: http://devicetree.org/schemas/interconnect/qcom,msm8939.yaml# > $schema: http://devicetree.org/meta-schemas/core.yaml# > > -title: Qualcomm MSM8939/MSM8976 Network-On-Chip interconnect > +title: Qualcomm MSM8937/MSM8939/MSM8976 Network-On-Chip interconnect > > maintainers: >- Konrad Dybcio > > description: > - The Qualcomm MSM8939/MSM8976 interconnect providers support > + The Qualcomm MSM8937/MSM8939/MSM8976 interconnect providers support >adjusting the bandwidth requirements between the various NoC fabrics. > > allOf: > @@ -19,6 +19,9 @@ allOf: > properties: >compatible: > enum: > + - qcom,msm8937-bimc > + - qcom,msm8937-pcnoc > + - qcom,msm8937-snoc >- qcom,msm8939-bimc >- qcom,msm8939-pcnoc >- qcom,msm8939-snoc > @@ -43,6 +46,7 @@ patternProperties: > properties: >compatible: > enum: > + - qcom,msm8937-snoc-mm >- qcom,msm8939-snoc-mm >- qcom,msm8976-snoc-mm > > diff --git a/include/dt-bindings/interconnect/qcom,msm8937.h > b/include/dt-bindings/interconnect/qcom,msm8937.h > new file mode 100644 > index ..98b8a4637aab > --- /dev/null > +++ b/include/dt-bindings/interconnect/qcom,msm8937.h > @@ -0,0 +1,93 @@ > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */ > +/* > + * Qualcomm MSM8937 interconnect IDs > + */ > + > +#ifndef __DT_BINDINGS_INTERCONNECT_QCOM_MSM8937_H > +#define __DT_BINDINGS_INTERCONNECT_QCOM_MSM8937_H > + > +/* BIMC fabric */ > +#define MAS_APPS_PROC0 > +#define MAS_OXILI1 > +#define MAS_SNOC_BIMC_0 2 > +#define MAS_SNOC_BIMC_2 3 > +#define MAS_SNOC_BIMC_1 4 > +#define MAS_TCU_05 > +#define SLV_EBI 6 > +#define SLV_BIMC_SNOC7 > + > +/* PCNOC fabric */ > +#define MAS_SPDM 0 > +#define MAS_BLSP_1 1 > +#define MAS_BLSP_2 2 > +#define MAS_USB_HS1 3 > +#define MAS_XI_USB_HS1 4 > +#define MAS_CRYPTO 5 > +#define MAS_SDCC_1 6 > +#define MAS_SDCC_2 7 > +#define MAS_SNOC_PCNOC 8 > +#define PCNOC_M_09 > +#define PCNOC_M_110 > +#define PCNOC_INT_0 11 > +#define PCNOC_INT_1 12 > +#define PCNOC_INT_2 13 > +#define PCNOC_INT_3 14 > +#define PCNOC_S_015 > +#define PCNOC_S_116 > +#define PCNOC_S_217 > +#define PCNOC_S_318 > +#define PCNOC_S_419 > +#define PCNOC_S_620 > +#define PCNOC_S_721 > +#define PCNOC_S_822 > +#define SLV_SDCC_2 23 > +#define SLV_SPDM 24 > +#define SLV_PDM 25 > +#define SLV_PRNG 26 > +#define SLV_TCSR 27 > +#define SLV_SNOC_CFG 28 > +#define SLV_MESSAGE_RAM 29 > +#define SLV_CAMERA_SS_CFG30 > +#define SLV_DISP_SS_CFG 31 > +#define SLV_VENUS_CFG32 > +#define SLV_GPU_CFG 33 > +#define SLV_TLMM 34 > +#define SLV_BLSP_1 35 > +#define SLV_BLSP_2 36 > +#define SLV_PMIC_ARB 37 > +#define SLV_SDCC_1 38 > +#define SLV_CRYPTO_0_CFG 39 > +#define SLV_USB_HS 40 > +#define SLV_TCU 41 > +#define SLV_PCNOC_SNOC 42 > + > +/* SNOC fabric */ > +#define MAS_QDSS_BAM 0 > +#define MAS_BIMC_SNOC1 > +#define MAS_PCNOC_SNOC 2 > +#define MAS_QDSS_ETR 3 > +#define QDSS_INT 4 > +#define SNOC_INT_0 5 > +#define SNOC_INT_1 6 > +#define SNOC_INT_2 7 > +#define SLV_KPSS_AHB 8 > +#define SLV_WCSS 9 > +#define SLV_SNOC_BIMC_1 10 > +#define SLV_IMEM 11 > +#define SLV_SNOC_PCNOC 12 > +#define SLV_QDSS_STM 13 > +#define SLV_CATS_1 14 > +#define SLV_LPASS15 > + > +/* SNOC-MM fabric */ > +#define MAS_JPEG
Re: [PATCH v3 1/9] dt-bindings: interconnect: qcom: Add Qualcomm MSM8976 NoC
On Tue, Jul 09, 2024 at 12:22:46PM +0200, Adam Skladowski wrote: > Add bindings for Qualcomm MSM8976 Network-On-Chip interconnect devices. > > Signed-off-by: Adam Skladowski > --- > .../bindings/interconnect/qcom,msm8939.yaml | 15 ++- > .../dt-bindings/interconnect/qcom,msm8976.h | 97 +++ > 2 files changed, 107 insertions(+), 5 deletions(-) > create mode 100644 include/dt-bindings/interconnect/qcom,msm8976.h Reviewed-by: Rob Herring (Arm)
Re: [PATCH 0/1] Support multiple reserved memory regions
Good morning, On Wed, Jul 03, 2024 at 07:53:07PM +0800, Shun-yi Wang wrote: > From: "shun-yi.wang" > > Besides the reserved memory region for SCP, there are additional > reserved memory regions for specific hardware use. > Currently, only a single memory region is supported. > Modifications are made to support multiple memory regions. > This is the changelog that should be in the patch. Usually there is no need for a cover letter when there is a single patch in the set. > shun-yi.wang (1): > remoteproc: mediatek: support multiple reserved memory > > drivers/remoteproc/mtk_scp.c | 25 + > 1 file changed, 17 insertions(+), 8 deletions(-) > > -- > 2.18.0 >
Re: [PATCH v2 2/2] virtio: fix vq # for balloon
On Wed, 10 Jul 2024 at 05:43, Michael S. Tsirkin wrote: > > virtio balloon communicates to the core that in some > configurations vq #s are non-contiguous by setting name > pointer to NULL. > > Unfortunately, core then turned around and just made them > contiguous again. Result is that driver is out of spec. > > Implement what the API was supposed to do > in the 1st place. Compatibility with buggy hypervisors > is handled inside virtio-balloon, which is the only driver > making use of this facility, so far. > > Message-ID: > Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page > reports to host") > Cc: "Alexander Duyck" > Signed-off-by: Michael S. Tsirkin > --- > arch/um/drivers/virtio_uml.c | 4 ++-- > drivers/remoteproc/remoteproc_virtio.c | 4 ++-- Reviewed-by: Mathieu Poirier > drivers/s390/virtio/virtio_ccw.c | 4 ++-- > drivers/virtio/virtio_mmio.c | 4 ++-- > drivers/virtio/virtio_pci_common.c | 11 --- > drivers/virtio/virtio_vdpa.c | 4 ++-- > 6 files changed, 18 insertions(+), 13 deletions(-) > > diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c > index 2b6e701776b6..c903e4959f51 100644 > --- a/arch/um/drivers/virtio_uml.c > +++ b/arch/um/drivers/virtio_uml.c > @@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, > unsigned nvqs, >struct irq_affinity *desc) > { > struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev); > - int i, queue_idx = 0, rc; > + int i, rc; > struct virtqueue *vq; > > /* not supported for now */ > @@ -1038,7 +1038,7 @@ static int vu_find_vqs(struct virtio_device *vdev, > unsigned nvqs, > continue; > } > > - vqs[i] = vu_setup_vq(vdev, queue_idx++, vqi->callback, > + vqs[i] = vu_setup_vq(vdev, i, vqi->callback, > vqi->name, vqi->ctx); > if (IS_ERR(vqs[i])) { > rc = PTR_ERR(vqs[i]); > diff --git a/drivers/remoteproc/remoteproc_virtio.c > b/drivers/remoteproc/remoteproc_virtio.c > index d3f39009b28e..1019b2825c26 100644 > --- a/drivers/remoteproc/remoteproc_virtio.c > +++ b/drivers/remoteproc/remoteproc_virtio.c > @@ -185,7 +185,7 @@ static int rproc_virtio_find_vqs(struct virtio_device > *vdev, unsigned int nvqs, > struct virtqueue_info vqs_info[], > struct irq_affinity *desc) > { > - int i, ret, queue_idx = 0; > + int i, ret; > > for (i = 0; i < nvqs; ++i) { > struct virtqueue_info *vqi = _info[i]; > @@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device > *vdev, unsigned int nvqs, > continue; > } > > - vqs[i] = rp_find_vq(vdev, queue_idx++, vqi->callback, > + vqs[i] = rp_find_vq(vdev, i, vqi->callback, > vqi->name, vqi->ctx); > if (IS_ERR(vqs[i])) { > ret = PTR_ERR(vqs[i]); > diff --git a/drivers/s390/virtio/virtio_ccw.c > b/drivers/s390/virtio/virtio_ccw.c > index 62eca9419ad7..82a3440bbabb 100644 > --- a/drivers/s390/virtio/virtio_ccw.c > +++ b/drivers/s390/virtio/virtio_ccw.c > @@ -694,7 +694,7 @@ static int virtio_ccw_find_vqs(struct virtio_device > *vdev, unsigned nvqs, > { > struct virtio_ccw_device *vcdev = to_vc_device(vdev); > dma64_t *indicatorp = NULL; > - int ret, i, queue_idx = 0; > + int ret, i; > struct ccw1 *ccw; > dma32_t indicatorp_dma = 0; > > @@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device > *vdev, unsigned nvqs, > continue; > } > > - vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, vqi->callback, > + vqs[i] = virtio_ccw_setup_vq(vdev, i, vqi->callback, > vqi->name, vqi->ctx, ccw); > if (IS_ERR(vqs[i])) { > ret = PTR_ERR(vqs[i]); > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > index 90e784e7b721..db6a0366f082 100644 > --- a/drivers/virtio/virtio_mmio.c > +++ b/drivers/virtio/virtio_mmio.c > @@ -494,7 +494,7 @@ static int vm_find_vqs(struct virtio_device *vdev, > unsigned int nvqs, > { > struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > int irq = platform_get_irq(vm_dev->pdev, 0); > - int i, err, queue_idx = 0; > + int i, err; > > if (irq < 0) > return irq; > @@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, > unsigned int nvqs, > continue; > } > > - vqs[i] = vm_setup_vq(vdev, queue_idx++, vqi->callback, > + vqs[i] = vm_setup_vq(vdev, i, vqi->callback,
Re: [PATCH v4 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race
On Fri, 05 Jul 2024 02:45:24 -0500, Dmitrii Kuvaiskii wrote: Two enclave threads may try to add and remove the same enclave page simultaneously (e.g., if the SGX runtime supports both lazy allocation and MADV_DONTNEED semantics). Consider some enclave page added to the enclave. User space decides to temporarily remove this page (e.g., emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user space performs a memory access on the same page on CPU2, which results in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as follows: /* * CPU1: User space performs * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES) * on enclave page X */ sgx_encl_remove_pages() { mutex_lock(>lock); entry = sgx_encl_load_page(encl); /* * verify that page is * trimmed and accepted */ mutex_unlock(>lock); /* * remove PTE entry; cannot * be performed under lock */ sgx_zap_enclave_ptes(encl); /* * Fault on CPU2 on same page X */ sgx_vma_fault() { /* * PTE entry was removed, but the * page is still in enclave's xarray */ xa_load(>page_array) != NULL -> /* * SGX driver thinks that this page * was swapped out and loads it */ mutex_lock(>lock); /* * this is effectively a no-op */ entry = sgx_encl_load_page_in_vma(); /* * add PTE entry * * *BUG*: a PTE is installed for a * page in process of being removed */ vmf_insert_pfn(...); mutex_unlock(>lock); return VM_FAULT_NOPAGE; } /* * continue with page removal */ mutex_lock(>lock); sgx_encl_free_epc_page(epc_page) { /* * remove page via EREMOVE */ /* * free EPC page */ sgx_free_epc_page(epc_page); } xa_erase(>page_array); mutex_unlock(>lock); } Here, CPU1 removed the page. However CPU2 installed the PTE entry on the same page. This enclave page becomes perpetually inaccessible (until another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is marked accessible in the PTE entry but is not EAUGed, and any subsequent access to this page raises a fault: with the kernel believing there to be a valid VMA, the unlikely error code X86_PF_SGX encountered by code path do_user_addr_fault() -> access_error() causes the SGX driver's sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead. The userspace SIGSEGV handler cannot perform EACCEPT because the page was not EAUGed. Thus, the user space is stuck with the inaccessible page. Fix this race by forcing the fault handler on CPU2 to back off if the page is currently being removed (on CPU1). This is achieved by setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this page is busy, and if yes then CPU2 backs off and waits until the page is completely removed. After that, any memory access to this page results in a normal "allocate and EAUG a page on #PF" flow. Fixes: 9849bb27152c ("x86/sgx: Support complete page removal") Cc: sta...@vger.kernel.org Signed-off-by: Dmitrii Kuvaiskii --- arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 5d390df21440..02441883401d 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl, /* * Do not keep encl->lock because of dependency on * mmap_lock acquired in sgx_zap_enclave_ptes(). +* +* Releasing encl->lock leads to a data race: while CPU1 +* performs sgx_zap_enclave_ptes() and removes the PTE entry +* for the enclave page, CPU2 may attempt to load this page +* (because the page is still in enclave's xarray). To prevent +* CPU2 from loading the page, mark the page as busy. */ + entry->desc |= SGX_ENCL_PAGE_BUSY;
Re: [PATCH v4 1/3] x86/sgx: Split SGX_ENCL_PAGE_BEING_RECLAIMED into two flags
On Fri, 05 Jul 2024 02:45:22 -0500, Dmitrii Kuvaiskii wrote: SGX_ENCL_PAGE_BEING_RECLAIMED flag is set when the enclave page is being reclaimed (moved to the backing store). This flag however has two logical meanings: 1. Don't attempt to load the enclave page (the page is busy). 2. Don't attempt to remove the PCMD page corresponding to this enclave page (the PCMD page is busy). To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently, both flags are set only when the enclave page is being reclaimed. A future commit will introduce a new case when the enclave page is being removed; this new case will set only the SGX_ENCL_PAGE_BUSY flag. LGTM. Reviewed-by: Haitao Huang Thanks Haitao
Re: [PATCH v2 01/12] uprobes: update outdated comment
On Wed, Jul 10, 2024 at 6:33 AM Oleg Nesterov wrote: > > On 07/03, Oleg Nesterov wrote: > > > > > /* > > > -* The NULL 'tsk' here ensures that any faults that occur here > > > -* will not be accounted to the task. 'mm' *is* current->mm, > > > -* but we treat this as a 'remote' access since it is > > > -* essentially a kernel access to the memory. > > > +* 'mm' *is* current->mm, but we treat this as a 'remote' access since > > > +* it is essentially a kernel access to the memory. > > > */ > > > result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, , NULL); > > > > OK, this makes it less confusing, so > > > > Acked-by: Oleg Nesterov > > > > - > > but it still looks confusing to me. This code used to pass tsk = NULL > > only to avoid tsk->maj/min_flt++ in faultin_page(). > > > > But today mm_account_fault() increments these counters without checking > > FAULT_FLAG_REMOTE, mm == current->mm, so it seems it would be better to > > just use get_user_pages() and remove this comment? > > Well, yes, it still looks confusing, imo. > > Andrii, I hope you won't mind if I redo/resend this and the next cleanup? > > The next one only updates the comment above uprobe_write_opcode(), but > it would be nice to explain mmap_write_lock() in register_for_each_vma(). > I don't mind a bit, thanks for sending the patches! > Oleg. > >
Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe
On Wed, Jul 10, 2024 at 4:39 AM Peter Zijlstra wrote: > > On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote: > > On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra wrote: > > > > > > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote: > > > > +#ifdef CONFIG_UPROBES > > > > +/* > > > > + * Heuristic-based check if uprobe is installed at the function entry. > > > > + * > > > > + * Under assumption of user code being compiled with frame pointers, > > > > + * `push %rbp/%ebp` is a good indicator that we indeed are. > > > > + * > > > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common > > > > pattern. > > > > + * If we get this wrong, captured stack trace might have one extra > > > > bogus > > > > + * entry, but the rest of stack trace will still be meaningful. > > > > + */ > > > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs) > > > > +{ > > > > + struct arch_uprobe *auprobe; > > > > + > > > > + if (!current->utask) > > > > + return false; > > > > + > > > > + auprobe = current->utask->auprobe; > > > > + if (!auprobe) > > > > + return false; > > > > + > > > > + /* push %rbp/%ebp */ > > > > + if (auprobe->insn[0] == 0x55) > > > > + return true; > > > > + > > > > + /* endbr64 (64-bit only) */ > > > > + if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3) > > > > + return true; > > > > > > I meant to reply to Josh suggesting this, but... how can this be? If you > > > scribble the ENDBR with an INT3 things will #CP and we'll never get to > > > the #BP. > > > > Well, it seems like it works in practice, I just tried. Here's the > > disassembly of the function: > > > > 19d0 : > > 19d0: f3 0f 1e fa endbr64 > > 19d4: 55pushq %rbp > > 19d5: 48 89 e5 movq%rsp, %rbp > > 19d8: 48 83 ec 10 subq$0x10, %rsp > > 19dc: 48 8d 3d fe ed ff ff leaq-0x1202(%rip), %rdi > > # 0x7e1 <__isoc99_scanf+0x7e1> > > 19e3: 48 8d 75 fc leaq-0x4(%rbp), %rsi > > 19e7: b0 00 movb$0x0, %al > > 19e9: e8 f2 00 00 00callq 0x1ae0 > > <__isoc99_scanf+0x1ae0> > > 19ee: b8 01 00 00 00movl$0x1, %eax > > 19f3: 48 83 c4 10 addq$0x10, %rsp > > 19f7: 5dpopq%rbp > > 19f8: c3retq > > 19f9: 0f 1f 80 00 00 00 00 nopl(%rax) > > > > And here's the state when uprobe is attached: > > > > (gdb) disass/r urandlib_api_v1 > > Dump of assembler code for function urandlib_api_v1: > >0x7ffb734e39d0 <+0>: cc int3 > >0x7ffb734e39d1 <+1>: 0f 1e fanop%edx > >0x7ffb734e39d4 <+4>: 55 push %rbp > >0x7ffb734e39d5 <+5>: 48 89 e5mov%rsp,%rbp > >0x7ffb734e39d8 <+8>: 48 83 ec 10 sub$0x10,%rsp > >0x7ffb734e39dc <+12>:48 8d 3d fe ed ff fflea > > -0x1202(%rip),%rdi# 0x7ffb734e27e1 > >0x7ffb734e39e3 <+19>:48 8d 75 fc lea > > -0x4(%rbp),%rsi > > => 0x7ffb734e39e7 <+23>:b0 00 mov$0x0,%al > >0x7ffb734e39e9 <+25>:e8 f2 00 00 00 call > > 0x7ffb734e3ae0 <__isoc99_scanf@plt> > >0x7ffb734e39ee <+30>:b8 01 00 00 00 mov$0x1,%eax > >0x7ffb734e39f3 <+35>:48 83 c4 10 add$0x10,%rsp > >0x7ffb734e39f7 <+39>:5d pop%rbp > >0x7ffb734e39f8 <+40>:c3 ret > > > > > > You can see it replaced the first byte, the following 3 bytes are > > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded, > > you can see I stepped through a few more instructions. > > > > Works by accident? > > Yeah, we don't actually have Userspace IBT enabled yet, even on hardware > that supports it. OK, I don't know what the implications are, but it's a good accident :) Anyways, what should I do for v4? Drop is_endbr6() check or keep it?
Re: [PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override
On Wed, 10 Jul 2024 15:13:07 +0200 Jiri Olsa wrote: > On Wed, Jul 10, 2024 at 04:59:39PM +0800, Menglong Dong wrote: > > After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction > > should be in Fixes: tag probably ? Yes, I'll add a Fixed tag. > > > pointer with original one"), "bpf_kprobe_override" is not used anywhere > > anymore, and we can remove it now. > > > > Signed-off-by: Menglong Dong > > lgtm, cc-ing Masami > > Acked-by: Jiri Olsa Thanks! > > jirka > > > --- > > include/linux/trace_events.h | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > > index 9df3e2973626..9435185c10ef 100644 > > --- a/include/linux/trace_events.h > > +++ b/include/linux/trace_events.h > > @@ -880,7 +880,6 @@ do { > > \ > > struct perf_event; > > > > DECLARE_PER_CPU(struct pt_regs, perf_trace_regs); > > -DECLARE_PER_CPU(int, bpf_kprobe_override); > > > > extern int perf_trace_init(struct perf_event *event); > > extern void perf_trace_destroy(struct perf_event *event); > > -- > > 2.39.2 > > > > -- Masami Hiramatsu (Google)
Re: [PATCH v2 01/12] uprobes: update outdated comment
On 07/03, Oleg Nesterov wrote: > > > /* > > -* The NULL 'tsk' here ensures that any faults that occur here > > -* will not be accounted to the task. 'mm' *is* current->mm, > > -* but we treat this as a 'remote' access since it is > > -* essentially a kernel access to the memory. > > +* 'mm' *is* current->mm, but we treat this as a 'remote' access since > > +* it is essentially a kernel access to the memory. > > */ > > result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, , NULL); > > OK, this makes it less confusing, so > > Acked-by: Oleg Nesterov > > - > but it still looks confusing to me. This code used to pass tsk = NULL > only to avoid tsk->maj/min_flt++ in faultin_page(). > > But today mm_account_fault() increments these counters without checking > FAULT_FLAG_REMOTE, mm == current->mm, so it seems it would be better to > just use get_user_pages() and remove this comment? Well, yes, it still looks confusing, imo. Andrii, I hope you won't mind if I redo/resend this and the next cleanup? The next one only updates the comment above uprobe_write_opcode(), but it would be nice to explain mmap_write_lock() in register_for_each_vma(). Oleg.
Re: [PATCH v3] init: staging: Fix missing warning/taint on builtin code
On Sat, Jul 06, 2024 at 12:15:01AM -0300, Ágatha Isabelle Chris Moreira Guedes wrote: > Fix the absence of warning message and kernel tainting when initializing > drivers from the `drivers/staging` subtree from initcalls (when > configured as built-in). > > When such a driver is built as module and the module is loaded, the > `load_module()` function taints the kernel to signal code of unknown > quality is loaded, and produces a warning like this: > > [8.076352] rts5208: module is from the staging directory, the > quality is unknown, you have been warned. > > The same behaviour is absent, however, when a staging driver is compiled > as built-in on the kernel image, since loading it happens through > initcalls and not through load_module(). > > This might prevent relevant information of being available on a bug > report (i.e. on a panic log) among other possible problems. > > NOTES: > - The patch is written in such a way that all non-staging drivers are > kept the way they were, except for staging drivers built with > `-DSTAGING_CODE`. That's good! > - Since it changes some macros related to clang LTO as well, I tested it > and it works properly in kernels compiled with both clang and gcc. This is odd, why is it messing with LTO stuff? It should be much more "self contained" than this I feel like. I see what you are doing by trying to use some of the LTO macros again, but in doing so, it makes it really hard to understand the diff and feel comfortable with this. If you want to stick with what you have done here, can you split it up a bit more? Once patch for the LTO header file changes and then another that only adds the staging stuff. That way it's easier to review and justify that nothing is going to be broken with this patch. > - Some `checkpatch.pl` errors, warnings and checks (with `--strict`) are > present. Some were already there, some I introduced but I think > they're unavoidable. Some IMHO don´t make sense at all, I think they > would apply for most regular macros but initcall macros are just way > different. Yeah, checkpatch and macros can get tricky, use your best judgement here and it looks ok. > Fixes: 061b1bd394ca ("Staging: add TAINT_CRAP for all drivers/staging code") I think it really fixes the commit _after_ this one that turns on the taint for the build :) anyway, nice work, I think it's almost there! greg k-h
Re: [PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override
On Wed, Jul 10, 2024 at 04:59:39PM +0800, Menglong Dong wrote: > After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction should be in Fixes: tag probably ? > pointer with original one"), "bpf_kprobe_override" is not used anywhere > anymore, and we can remove it now. > > Signed-off-by: Menglong Dong lgtm, cc-ing Masami Acked-by: Jiri Olsa jirka > --- > include/linux/trace_events.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > index 9df3e2973626..9435185c10ef 100644 > --- a/include/linux/trace_events.h > +++ b/include/linux/trace_events.h > @@ -880,7 +880,6 @@ do { > \ > struct perf_event; > > DECLARE_PER_CPU(struct pt_regs, perf_trace_regs); > -DECLARE_PER_CPU(int, bpf_kprobe_override); > > extern int perf_trace_init(struct perf_event *event); > extern void perf_trace_destroy(struct perf_event *event); > -- > 2.39.2 > >
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 08.07.24 11:27, David Woodhouse wrote: > From: David Woodhouse > > The vmclock "device" provides a shared memory region with precision clock > information. By using shared memory, it is safe across Live Migration. > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > actually helpful. > > The memory region of the device is also exposed to userspace so it can be > read or memory mapped by application which need reliable notification of > clock disruptions. > > Signed-off-by: David Woodhouse [...] > + > +struct vmclock_abi { > + /* CONSTANT FIELDS */ > + uint32_t magic; > +#define VMCLOCK_MAGIC0x4b4c4356 /* "VCLK" */ > + uint32_t size; /* Size of region containing this structure */ > + uint16_t version; /* 1 */ > + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */ > +#define VMCLOCK_COUNTER_ARM_VCNT 0 > +#define VMCLOCK_COUNTER_X86_TSC 1 > +#define VMCLOCK_COUNTER_INVALID 0xff > + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */ > +#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 > 00:00:00z */ > +#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 > 00:00:00z */ > +#define VMCLOCK_TIME_MONOTONIC 2 /* Since > undefined epoch */ > +#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */ > +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */ > + > + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */ > + uint32_t seq_count; /* Low bit means an update is in progress */ > + /* > + * This field changes to another non-repeating value when the CPU > + * counter is disrupted, for example on live migration. This lets > + * the guest know that it should discard any calibration it has > + * performed of the counter against external sources (NTP/PTP/etc.). > + */ > + uint64_t disruption_marker; > + uint64_t flags; > + /* Indicates that the tai_offset_sec field is valid */ > +#define VMCLOCK_FLAG_TAI_OFFSET_VALID(1 << 0) > + /* > + * Optionally used to notify guests of pending maintenance events. > + * A guest which provides latency-sensitive services may wish to > + * remove itself from service if an event is coming up. Two flags > + * indicate the approximate imminence of the event. > + */ > +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */ > +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */ > +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3) > +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4) > +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5) > +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6) > + /* > + * Even regardless of leap seconds, the time presented through this > + * mechanism may not be strictly monotonic. If the counter slows down > + * and the host adapts to this discovery, the time calculated from > + * the value of the counter immediately after an update to this > + * structure, may appear to be *earlier* than a calculation just > + * before the update (while the counter was believed to be running > + * faster than it now is). A guest operating system will typically > + * *skew* its own system clock back towards the reference clock > + * exposed here, rather than following this clock directly. If, > + * however, this structure is being populated from such a system > + * clock which is already handled in such a fashion and the results > + * *are* guaranteed to be monotonic, such monotonicity can be > + * advertised by setting this bit. > + */ I wonder if this might be difficult to define in a standard. Is there a need to define device and driver behavior in more detail? What would happen if e.g. the device first decides how to update the clock, but is then slow to update the SHM? > +#define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7) > + > + uint8_t pad[2]; > + uint8_t clock_status; > +#define VMCLOCK_STATUS_UNKNOWN 0 > +#define VMCLOCK_STATUS_INITIALIZING 1 > +#define VMCLOCK_STATUS_SYNCHRONIZED 2 > +#define VMCLOCK_STATUS_FREERUNNING 3 > +#define VMCLOCK_STATUS_UNRELIABLE4 > + > + /* > + * The time exposed through this device is never smeared. This field > + * corresponds to the 'subtype' field in virtio-rtc, which indicates > + * the smearing method. However in this case it provides a *hint* to > + * the guest operating system, such that *if* the guest OS wants to > + * provide its users with an alternative clock which does not follow > + * the POSIX CLOCK_REALTIME standard, it may do so in a fashion > + * consistent with the other systems in the nearby environment.
Re: [PATCH V2] test/vsock: add install target
On Wed, Jul 10, 2024 at 08:27:28PM GMT, Peng Fan (OSS) wrote: From: Peng Fan Add install target for vsock to make Yocto easy to install the images. Signed-off-by: Peng Fan --- LGTM! This is a net-next material, so next time better to specify it (e.g. [PATCH net-next]). If not queued within a week, please resend specifying net-next. Reviewed-by: Stefano Garzarella V2: Use VSOCK_INSTALL_PATH, drop INSTALL_PATH tools/testing/vsock/Makefile | 13 + 1 file changed, 13 insertions(+) diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index a7f56a09ca9f..6e0b4e95e230 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -13,3 +13,16 @@ CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-p clean: ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test -include *.d + +VSOCK_INSTALL_PATH ?= + +install: all +ifdef VSOCK_INSTALL_PATH + mkdir -p $(VSOCK_INSTALL_PATH) + install -m 744 vsock_test $(VSOCK_INSTALL_PATH) + install -m 744 vsock_perf $(VSOCK_INSTALL_PATH) + install -m 744 vsock_diag_test $(VSOCK_INSTALL_PATH) + install -m 744 vsock_uring_test $(VSOCK_INSTALL_PATH) +else + $(error Error: set VSOCK_INSTALL_PATH to use install) +endif -- 2.37.1
[PATCH V2] test/vsock: add install target
From: Peng Fan Add install target for vsock to make Yocto easy to install the images. Signed-off-by: Peng Fan --- V2: Use VSOCK_INSTALL_PATH, drop INSTALL_PATH tools/testing/vsock/Makefile | 13 + 1 file changed, 13 insertions(+) diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index a7f56a09ca9f..6e0b4e95e230 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -13,3 +13,16 @@ CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-p clean: ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test -include *.d + +VSOCK_INSTALL_PATH ?= + +install: all +ifdef VSOCK_INSTALL_PATH + mkdir -p $(VSOCK_INSTALL_PATH) + install -m 744 vsock_test $(VSOCK_INSTALL_PATH) + install -m 744 vsock_perf $(VSOCK_INSTALL_PATH) + install -m 744 vsock_diag_test $(VSOCK_INSTALL_PATH) + install -m 744 vsock_uring_test $(VSOCK_INSTALL_PATH) +else + $(error Error: set VSOCK_INSTALL_PATH to use install) +endif -- 2.37.1
Re: [PATCH] test/vsock: add install target
On Wed, Jul 10, 2024 at 11:34:05AM GMT, Peng Fan wrote: Subject: Re: [PATCH] test/vsock: add install target On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote: >> Subject: Re: [PATCH] test/vsock: add install target >> >> On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote: >> >From: Peng Fan >> > >> >Add install target for vsock to make Yocto easy to install the images. >> > >> >Signed-off-by: Peng Fan >> >--- >> > tools/testing/vsock/Makefile | 12 >> > 1 file changed, 12 insertions(+) >> > >> >diff --git a/tools/testing/vsock/Makefile >> >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460 >> 100644 >> >--- a/tools/testing/vsock/Makefile >> >+++ b/tools/testing/vsock/Makefile >> >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o >> msg_zerocopy_common.o >> > vsock_uring_test: LDLIBS = -luring >> > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o >> >msg_zerocopy_common.o >> > >> >+VSOCK_INSTALL_PATH ?= $(abspath .) >> >+# Avoid changing the rest of the logic here and lib.mk. >> >+INSTALL_PATH := $(VSOCK_INSTALL_PATH) >> >+ >> > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include >> > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow >> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE - >> D_GNU_SOURCE >> > .PHONY: all test clean >> > clean: >> > ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf >> vsock_uring_test >> > -include *.d >> >+ >> >+install: all >> >+ @# Ask all targets to install their files >> >+ mkdir -p $(INSTALL_PATH)/vsock >> >> why using the "vsock" subdir? >> >> IIUC you were inspired by selftests/Makefile, but it installs under >> $(INSTALL_PATH)/kselftest/ the scripts used by the main one >> `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead. >> So in this case I would install everything in $(INSTALL_PATH). >> >> WDYT? > >I agree. > >> >> >+ install -m 744 vsock_test $(INSTALL_PATH)/vsock/ >> >+ install -m 744 vsock_perf $(INSTALL_PATH)/vsock/ >> >+ install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/ >> >+ install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/ >> >> Also from selftests/Makefile, what about using the ifdef instead of >> using $(abspath .) as default place? >> >> I mean this: >> >> install: all >> ifdef INSTALL_PATH >>... >> else >>$(error Error: set INSTALL_PATH to use install) endif > >Is the following looks good to you? > ># Avoid conflict with INSTALL_PATH set by the main Makefile >VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH) I'm not a super Makefile expert, but why do we need both VSOCK_INSTALL_PATH and INSTALL_PATH? INSTALL_PATH is exported by kernel root directory makefile. So to user, we need to avoid export INSTALL_PATH here. So I just follow selftests/Makefile using KSFT_INSTALL_PATH There is a comment there: # Avoid changing the rest of the logic here and lib.mk. Added by commit 17eac6c2db8b2cdfe33d40229bdda2acd86b304a. IIUC they re-used INSTALL_PATH, just to avoid too many changes in that file and in tools/testing/selftests/lib.mk So, IMHO we should not care about it and only use VSOCK_INSTALL_PATH if you don't want to conflict with INSTALL_PATH. Stefano
[PATCH v2 2/2] virtio: fix vq # for balloon
virtio balloon communicates to the core that in some configurations vq #s are non-contiguous by setting name pointer to NULL. Unfortunately, core then turned around and just made them contiguous again. Result is that driver is out of spec. Implement what the API was supposed to do in the 1st place. Compatibility with buggy hypervisors is handled inside virtio-balloon, which is the only driver making use of this facility, so far. Message-ID: Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host") Cc: "Alexander Duyck" Signed-off-by: Michael S. Tsirkin --- arch/um/drivers/virtio_uml.c | 4 ++-- drivers/remoteproc/remoteproc_virtio.c | 4 ++-- drivers/s390/virtio/virtio_ccw.c | 4 ++-- drivers/virtio/virtio_mmio.c | 4 ++-- drivers/virtio/virtio_pci_common.c | 11 --- drivers/virtio/virtio_vdpa.c | 4 ++-- 6 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c index 2b6e701776b6..c903e4959f51 100644 --- a/arch/um/drivers/virtio_uml.c +++ b/arch/um/drivers/virtio_uml.c @@ -1019,7 +1019,7 @@ static int vu_find_vqs(struct virtio_device *vdev, unsigned nvqs, struct irq_affinity *desc) { struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev); - int i, queue_idx = 0, rc; + int i, rc; struct virtqueue *vq; /* not supported for now */ @@ -1038,7 +1038,7 @@ static int vu_find_vqs(struct virtio_device *vdev, unsigned nvqs, continue; } - vqs[i] = vu_setup_vq(vdev, queue_idx++, vqi->callback, + vqs[i] = vu_setup_vq(vdev, i, vqi->callback, vqi->name, vqi->ctx); if (IS_ERR(vqs[i])) { rc = PTR_ERR(vqs[i]); diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c index d3f39009b28e..1019b2825c26 100644 --- a/drivers/remoteproc/remoteproc_virtio.c +++ b/drivers/remoteproc/remoteproc_virtio.c @@ -185,7 +185,7 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue_info vqs_info[], struct irq_affinity *desc) { - int i, ret, queue_idx = 0; + int i, ret; for (i = 0; i < nvqs; ++i) { struct virtqueue_info *vqi = _info[i]; @@ -195,7 +195,7 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned int nvqs, continue; } - vqs[i] = rp_find_vq(vdev, queue_idx++, vqi->callback, + vqs[i] = rp_find_vq(vdev, i, vqi->callback, vqi->name, vqi->ctx); if (IS_ERR(vqs[i])) { ret = PTR_ERR(vqs[i]); diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 62eca9419ad7..82a3440bbabb 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -694,7 +694,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs, { struct virtio_ccw_device *vcdev = to_vc_device(vdev); dma64_t *indicatorp = NULL; - int ret, i, queue_idx = 0; + int ret, i; struct ccw1 *ccw; dma32_t indicatorp_dma = 0; @@ -710,7 +710,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs, continue; } - vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, vqi->callback, + vqs[i] = virtio_ccw_setup_vq(vdev, i, vqi->callback, vqi->name, vqi->ctx, ccw); if (IS_ERR(vqs[i])) { ret = PTR_ERR(vqs[i]); diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 90e784e7b721..db6a0366f082 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -494,7 +494,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs, { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); int irq = platform_get_irq(vm_dev->pdev, 0); - int i, err, queue_idx = 0; + int i, err; if (irq < 0) return irq; @@ -515,7 +515,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs, continue; } - vqs[i] = vm_setup_vq(vdev, queue_idx++, vqi->callback, + vqs[i] = vm_setup_vq(vdev, i, vqi->callback, vqi->name, vqi->ctx); if (IS_ERR(vqs[i])) { vm_del_vqs(vdev); diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 7d82facafd75..fa606e7321ad 100644 --- a/drivers/virtio/virtio_pci_common.c +++
[PATCH v2 1/2] virtio_balloon: add work around for out of spec QEMU
QEMU implemented the configuration VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT incorrectly: it then uses vq3 for reporting, spec says it is always 4. This is masked by a corresponding bug in driver: add a work around as I'm going to try and fix the driver bug. Message-ID: Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host") Cc: "Alexander Duyck" Signed-off-by: Michael S. Tsirkin --- drivers/virtio/virtio_balloon.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 54469277ca30..eebeab863697 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -589,8 +589,23 @@ static int init_vqs(struct virtio_balloon *vb) err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, vqs_info, NULL); - if (err) - return err; + if (err) { + /* +* Try to work around QEMU bug which since 2020 confused vq numbers +* when VIRTIO_BALLOON_F_REPORTING but not +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered. +*/ + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING) && + !virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { + vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].name = "reporting_vq"; + vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].callback = balloon_ack; + err = virtio_find_vqs(vb->vdev, + VIRTIO_BALLOON_VQ_REPORTING, vqs_info, NULL); + } + + if (err) + return err; + } vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; -- MST
Re: [PATCH v4] perf,x86: avoid missing caller address in stack traces captured in uprobe
On Tue, Jul 09, 2024 at 10:50:00AM -0700, Andrii Nakryiko wrote: > On Tue, Jul 9, 2024 at 3:11 AM Peter Zijlstra wrote: > > > > On Mon, Jul 08, 2024 at 04:11:27PM -0700, Andrii Nakryiko wrote: > > > +#ifdef CONFIG_UPROBES > > > +/* > > > + * Heuristic-based check if uprobe is installed at the function entry. > > > + * > > > + * Under assumption of user code being compiled with frame pointers, > > > + * `push %rbp/%ebp` is a good indicator that we indeed are. > > > + * > > > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern. > > > + * If we get this wrong, captured stack trace might have one extra bogus > > > + * entry, but the rest of stack trace will still be meaningful. > > > + */ > > > +static bool is_uprobe_at_func_entry(struct pt_regs *regs) > > > +{ > > > + struct arch_uprobe *auprobe; > > > + > > > + if (!current->utask) > > > + return false; > > > + > > > + auprobe = current->utask->auprobe; > > > + if (!auprobe) > > > + return false; > > > + > > > + /* push %rbp/%ebp */ > > > + if (auprobe->insn[0] == 0x55) > > > + return true; > > > + > > > + /* endbr64 (64-bit only) */ > > > + if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3) > > > + return true; > > > > I meant to reply to Josh suggesting this, but... how can this be? If you > > scribble the ENDBR with an INT3 things will #CP and we'll never get to > > the #BP. > > Well, it seems like it works in practice, I just tried. Here's the > disassembly of the function: > > 19d0 : > 19d0: f3 0f 1e fa endbr64 > 19d4: 55pushq %rbp > 19d5: 48 89 e5 movq%rsp, %rbp > 19d8: 48 83 ec 10 subq$0x10, %rsp > 19dc: 48 8d 3d fe ed ff ff leaq-0x1202(%rip), %rdi > # 0x7e1 <__isoc99_scanf+0x7e1> > 19e3: 48 8d 75 fc leaq-0x4(%rbp), %rsi > 19e7: b0 00 movb$0x0, %al > 19e9: e8 f2 00 00 00callq 0x1ae0 <__isoc99_scanf+0x1ae0> > 19ee: b8 01 00 00 00movl$0x1, %eax > 19f3: 48 83 c4 10 addq$0x10, %rsp > 19f7: 5dpopq%rbp > 19f8: c3retq > 19f9: 0f 1f 80 00 00 00 00 nopl(%rax) > > And here's the state when uprobe is attached: > > (gdb) disass/r urandlib_api_v1 > Dump of assembler code for function urandlib_api_v1: >0x7ffb734e39d0 <+0>: cc int3 >0x7ffb734e39d1 <+1>: 0f 1e fanop%edx >0x7ffb734e39d4 <+4>: 55 push %rbp >0x7ffb734e39d5 <+5>: 48 89 e5mov%rsp,%rbp >0x7ffb734e39d8 <+8>: 48 83 ec 10 sub$0x10,%rsp >0x7ffb734e39dc <+12>:48 8d 3d fe ed ff fflea > -0x1202(%rip),%rdi# 0x7ffb734e27e1 >0x7ffb734e39e3 <+19>:48 8d 75 fc lea-0x4(%rbp),%rsi > => 0x7ffb734e39e7 <+23>:b0 00 mov$0x0,%al >0x7ffb734e39e9 <+25>:e8 f2 00 00 00 call > 0x7ffb734e3ae0 <__isoc99_scanf@plt> >0x7ffb734e39ee <+30>:b8 01 00 00 00 mov$0x1,%eax >0x7ffb734e39f3 <+35>:48 83 c4 10 add$0x10,%rsp >0x7ffb734e39f7 <+39>:5d pop%rbp >0x7ffb734e39f8 <+40>:c3 ret > > > You can see it replaced the first byte, the following 3 bytes are > remnants of endb64 (gdb says it's a nop? :)), and then we proceeded, > you can see I stepped through a few more instructions. > > Works by accident? Yeah, we don't actually have Userspace IBT enabled yet, even on hardware that supports it.
Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU
On Wed, Jul 10, 2024 at 03:37:49PM +0800, Jason Wang wrote: > On Wed, Jul 10, 2024 at 2:16 PM Michael S. Tsirkin wrote: > > > > On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote: > > > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin wrote: > > > > > > > > QEMU implemented the configuration > > > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT > > > > incorrectly: it then uses vq3 for reporting, spec says it is always 4. > > > > > > > > This is masked by a corresponding bug in driver: > > > > add a work around as I'm going to try and fix the driver bug. > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > > drivers/virtio/virtio_balloon.c | 19 +-- > > > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/virtio/virtio_balloon.c > > > > b/drivers/virtio/virtio_balloon.c > > > > index 9a61febbd2f7..7dc3fcd56238 100644 > > > > --- a/drivers/virtio/virtio_balloon.c > > > > +++ b/drivers/virtio/virtio_balloon.c > > > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb) > > > > > > > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, > > > > callbacks, names, NULL); > > > > - if (err) > > > > - return err; > > > > + if (err) { > > > > + /* > > > > +* Try to work around QEMU bug which since 2020 > > > > confused vq numbers > > > > +* when VIRTIO_BALLOON_F_REPORTING but not > > > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered. > > > > +*/ > > > > + if (virtio_has_feature(vb->vdev, > > > > VIRTIO_BALLOON_F_REPORTING) && > > > > + !virtio_has_feature(vb->vdev, > > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > > > > + names[VIRTIO_BALLOON_VQ_FREE_PAGE] = > > > > "reporting_vq"; > > > > + callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = > > > > balloon_ack; > > > > + err = virtio_find_vqs(vb->vdev, > > > > + > > > > VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL); > > > > + } > > > > + > > > > + if (err) > > > > + return err; > > > > + } > > > > > > > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; > > > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; > > > > -- > > > > MST > > > > > > > > > > Acked-by: Jason Wang > > > > > > Do we need a spec to say this is something that needs to be considered > > > by the driver? > > > > > > Thanks > > > > I'd say it's a temporary situation that we won't need to bother > > about in several years. > > I mean, should a newly-written virtio-balloon driver care about this? > If not, it means it can't work for several Qemu versions. > > Thanks True - I could not find a way to make it work in a way that would be compatible with old qemu. > > > > -- > > MST > >
RE: [PATCH] test/vsock: add install target
> Subject: Re: [PATCH] test/vsock: add install target > > On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote: > >> Subject: Re: [PATCH] test/vsock: add install target > >> > >> On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote: > >> >From: Peng Fan > >> > > >> >Add install target for vsock to make Yocto easy to install the > images. > >> > > >> >Signed-off-by: Peng Fan > >> >--- > >> > tools/testing/vsock/Makefile | 12 > >> > 1 file changed, 12 insertions(+) > >> > > >> >diff --git a/tools/testing/vsock/Makefile > >> >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460 > >> 100644 > >> >--- a/tools/testing/vsock/Makefile > >> >+++ b/tools/testing/vsock/Makefile > >> >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o > >> msg_zerocopy_common.o > >> > vsock_uring_test: LDLIBS = -luring > >> > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o > >> >msg_zerocopy_common.o > >> > > >> >+VSOCK_INSTALL_PATH ?= $(abspath .) > >> >+# Avoid changing the rest of the logic here and lib.mk. > >> >+INSTALL_PATH := $(VSOCK_INSTALL_PATH) > >> >+ > >> > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include > >> > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow > >> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE - > >> D_GNU_SOURCE > >> > .PHONY: all test clean > >> > clean: > >> > ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf > >> vsock_uring_test > >> > -include *.d > >> >+ > >> >+install: all > >> >+ @# Ask all targets to install their files > >> >+ mkdir -p $(INSTALL_PATH)/vsock > >> > >> why using the "vsock" subdir? > >> > >> IIUC you were inspired by selftests/Makefile, but it installs under > >> $(INSTALL_PATH)/kselftest/ the scripts used by the main one > >> `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead. > >> So in this case I would install everything in $(INSTALL_PATH). > >> > >> WDYT? > > > >I agree. > > > >> > >> >+ install -m 744 vsock_test $(INSTALL_PATH)/vsock/ > >> >+ install -m 744 vsock_perf $(INSTALL_PATH)/vsock/ > >> >+ install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/ > >> >+ install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/ > >> > >> Also from selftests/Makefile, what about using the ifdef instead of > >> using $(abspath .) as default place? > >> > >> I mean this: > >> > >> install: all > >> ifdef INSTALL_PATH > >>... > >> else > >>$(error Error: set INSTALL_PATH to use install) endif > > > >Is the following looks good to you? > > > ># Avoid conflict with INSTALL_PATH set by the main Makefile > >VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH) > > I'm not a super Makefile expert, but why do we need both > VSOCK_INSTALL_PATH and INSTALL_PATH? INSTALL_PATH is exported by kernel root directory makefile. So to user, we need to avoid export INSTALL_PATH here. So I just follow selftests/Makefile using KSFT_INSTALL_PATH Regards, Peng. > > Stefano > > > > >install: all > >ifdef INSTALL_PATH > >mkdir -p $(INSTALL_PATH) > >install -m 744 vsock_test $(INSTALL_PATH) > >install -m 744 vsock_perf $(INSTALL_PATH) > >install -m 744 vsock_diag_test $(INSTALL_PATH) > >install -m 744 vsock_uring_test $(INSTALL_PATH) else > >$(error Error: set INSTALL_PATH to use install) Endif > > > >Thanks, > >Peng. > >> > >> Thanks, > >> Stefano > >
RE: [EXTERNAL] [PATCH net-next v10 04/15] mm: page_frag: add '_va' suffix to page_frag API
From: Yunsheng Lin Sent: Tuesday, July 9, 2024 6:57 PM To: da...@davemloft.net; k...@kernel.org; pab...@redhat.com Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; Yunsheng Lin ; Alexander Duyck ; Jeroen de Borst ; Praveen Kaligineedi ; Shailend Chand ; Eric Dumazet ; Tony Nguyen ; Przemek Kitszel ; Sunil Kovvuri Goutham ; Geethasowjanya Akula ; Subbaraya Sundeep Bhatta ; Hariprasad Kelam ; Felix Fietkau ; Sean Wang ; Mark Lee ; Lorenzo Bianconi ; Matthias Brugger ; AngeloGioacchino Del Regno ; Keith Busch ; Jens Axboe ; Christoph Hellwig ; Sagi Grimberg ; Chaitanya Kulkarni ; Michael S. Tsirkin ; Jason Wang ; Eugenio Pérez ; Andrew Morton ; Alexei Starovoitov ; Daniel Borkmann ; Jesper Dangaard Brouer ; John Fastabend ; Andrii Nakryiko ; Martin KaFai Lau ; Eduard Zingerman ; Song Liu ; Yonghong Song ; KP Singh ; Stanislav Fomichev ; Hao Luo ; Jiri Olsa ; David Howells ; Marc Dionne ; Trond Myklebust ; Anna Schumaker ; Chuck Lever ; Jeff Layton ; Neil Brown ; Olga Kornievskaia ; Dai Ngo ; Tom Talpey ; intel-wired-...@lists.osuosl.org; linux-arm-ker...@lists.infradead.org; linux-media...@lists.infradead.org; linux-n...@lists.infradead.org; k...@vger.kernel.org; virtualizat...@lists.linux.dev; linux...@kvack.org; b...@vger.kernel.org; linux-...@lists.infradead.org; linux-...@vger.kernel.org Subject: [PATCH net-next v10 04/15] mm: page_frag: add '_va' suffix to page_frag API Currently the page_frag API is returning 'virtual address' or 'va' when allocing and expecting 'virtual address' or 'va' as input when freeing. As we are about to support new use cases that the caller need to deal with 'struct page' or need to deal with both 'va' and 'struct page'. In order to differentiate the API handling between 'va' and 'struct page', add '_va' suffix to the corresponding API mirroring the page_pool_alloc_va() API of the page_pool. So that callers expecting to deal with va, page or both va and page may call page_frag_alloc_va*, page_frag_alloc_pg*, or page_frag_alloc* API accordingly. CC: Alexander Duyck Signed-off-by: Yunsheng Lin Looks good. Reviewed-by: Subbaraya Sundeep --- drivers/net/ethernet/google/gve/gve_rx.c | 4 ++-- drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +- drivers/net/ethernet/intel/ice/ice_txrx.h | 2 +- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +- .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++-- .../marvell/octeontx2/nic/otx2_common.c | 2 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c| 4 ++-- drivers/nvme/host/tcp.c | 8 +++ drivers/nvme/target/tcp.c | 22 +-- drivers/vhost/net.c | 6 ++--- include/linux/page_frag_cache.h | 21 +- include/linux/skbuff.h| 2 +- kernel/bpf/cpumap.c | 2 +- mm/page_frag_cache.c | 12 +- mm/page_frag_test.c | 13 ++- net/core/skbuff.c | 14 ++-- net/core/xdp.c| 2 +- net/rxrpc/txbuf.c | 15 +++-- net/sunrpc/svcsock.c | 6 ++--- 19 files changed, 74 insertions(+), 69 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index acb73d4d0de6..b6c10100e462 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -729,7 +729,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx, total_len = headroom + SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - frame = page_frag_alloc(>page_cache, total_len, GFP_ATOMIC); + frame = page_frag_alloc_va(>page_cache, total_len, GFP_ATOMIC); if (!frame) { u64_stats_update_begin(>statss); rx->xdp_alloc_fails++; @@ -742,7 +742,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx, err = xdp_do_redirect(dev, , xdp_prog); if (err) - page_frag_free(frame); + page_frag_free_va(frame); return err; } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 8bb743f78fcb..399b317c509d 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -126,7 +126,7 @@ ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf) dev_kfree_skb_any(tx_buf->skb); break; case ICE_TX_BUF_XDP_TX: - page_frag_free(tx_buf->raw_buf); + page_frag_free_va(tx_buf->raw_buf); break; case ICE_TX_BUF_XDP_XMIT: xdp_return_frame(tx_buf->xdpf); diff --git
Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool
On Wed, 10 Jul 2024 at 14:10, Michael S. Tsirkin wrote: > > On Wed, Jul 10, 2024 at 11:05:48AM +0800, Jason Wang wrote: > > On Tue, Jul 9, 2024 at 8:42 PM Michael S. Tsirkin wrote: > > > > > > On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote: > > > > On Tue, 9 Jul 2024 at 11:59, Parav Pandit wrote: > > > > > > > > > > Hi Cindy, > > > > > > > > > > > From: Cindy Lu > > > > > > Sent: Monday, July 8, 2024 12:17 PM > > > > > > > > > > > > Add support for setting the MAC address using the VDPA tool. > > > > > > This feature will allow setting the MAC address using the VDPA tool. > > > > > > For example, in vdpa_sim_net, the implementation sets the MAC > > > > > > address to > > > > > > the config space. However, for other drivers, they can implement > > > > > > their own > > > > > > function, not limited to the config space. > > > > > > > > > > > > Changelog v2 > > > > > > - Changed the function name to prevent misunderstanding > > > > > > - Added check for blk device > > > > > > - Addressed the comments > > > > > > Changelog v3 > > > > > > - Split the function of the net device from > > > > > > vdpa_nl_cmd_dev_attr_set_doit > > > > > > - Add a lock for the network device's dev_set_attr operation > > > > > > - Address the comments > > > > > > > > > > > > Cindy Lu (2): > > > > > > vdpa: support set mac address from vdpa tool > > > > > > vdpa_sim_net: Add the support of set mac address > > > > > > > > > > > > drivers/vdpa/vdpa.c | 81 > > > > > > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++- > > > > > > include/linux/vdpa.h | 9 > > > > > > include/uapi/linux/vdpa.h| 1 + > > > > > > 4 files changed, 109 insertions(+), 1 deletion(-) > > > > > > > > > > > > -- > > > > > > 2.45.0 > > > > > > > > > > Mlx5 device already allows setting the mac and mtu during the vdpa > > > > > device creation time. > > > > > Once the vdpa device is created, it binds to vdpa bus and other > > > > > driver vhost_vdpa etc bind to it. > > > > > So there was no good reason in the past to support explicit config > > > > > after device add complicate the flow for synchronizing this. > > > > > > > > > > The user who wants a device with new attributes, as well destroy and > > > > > recreate the vdpa device with new desired attributes. > > > > > > > > > > vdpa_sim_net can also be extended for similar way when adding the > > > > > vdpa device. > > > > > > > > > > Have you considered using the existing tool and kernel in place since > > > > > 2021? > > > > > Such as commit d8ca2fa5be1. > > > > > > > > > > An example of it is, > > > > > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu > > > > > 9000 > > > > > > > > > Hi Parav > > > > Really thanks for your comments. The reason for adding this function > > > > is to support Kubevirt. > > > > the problem we meet is that kubevirt chooses one random vdpa device > > > > from the pool and we don't know which one it going to pick. That means > > > > we can't get to know the Mac address before it is created. So we plan > > > > to have this function to change the mac address after it is created > > > > Thanks > > > > cindy > > > > > > Well you will need to change kubevirt to teach it to set > > > mac address, right? > > > > That's the plan. Adding Leonardo. > > > > Thanks > > So given you are going to change kubevirt, can we > change it to create devices as needed with the > existing API? > Hi Micheal and Parav, I'm really not familiar with kubevirt, hope Leonardo can help answer these questions Hi @Leonardo Milleri would you help answer these questions? Thanks Cindy > > > > > > -- > > > MST > > > >
Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote: > IIUC, its an abbreviation of a Latin word and is used as a synonym for > "namely" > or "that is to say". > Might not be the best choice in this case. Will change it. I learn new stuff every day: https://en.wikipedia.org/wiki/Viz. > Userspace error decoding tools like the rasdaemon gather related hardware > error > information through the tracepoints. As such, its important to have these two > registers in the tracepoint so that the tools like rasdaemon can parse them > and output the supplemental error information like FRU Text contained in them. Put *that* in the commit message - do not explain what the patch does. > Got it. The first SoB entry is of the primary author. The successive SoB's are > from the people handling and transporting the patch. Exactly! > IOW, the route taken by a patch, as its propagated, to maintainers and > eventually > to Linus, should be evident from the SoB chain. You got it. > With this set, the first two elements of the vendor data dynamic array are > SYND 1/2 > registers while the third element is MCA_CONFIG (added through patch 4 of the > set). > Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are > interpreted > as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set. > > Thus, we depend on array's layout for accurate FRU Text decoding in the > rasdaemon. So it sounds to me like we want to document and thus freeze the vendor-specific blob layout because tools are going to be using and parsing it. And this will spare us the kernel version checks. And new additions to that AMD-specific blob will come at the end and will have to be documented too. That sounds like an ok compromise to me... Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
[PATCH] tracing: Fix overflow in get_free_elt()
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing. Once it overflows, new elements can still be inserted into the tracing_map even though the maximum number of elements (`max_elts`) has been reached. Continuing to insert elements after the overflow could result in the tracing_map containing "tracing_map->max_size" elements, leaving no empty entries. If any attempt is made to insert an element into a full tracing_map using `__tracing_map_insert()`, it will cause an infinite loop with preemption disabled, leading to a CPU hang problem. Fix this by preventing any further increments to "tracing_map->next_elt" once it reaches "tracing_map->max_elt". Co-developed-by: Cheng-Jui Wang Signed-off-by: Cheng-Jui Wang Signed-off-by: Tze-nan Wu --- kernel/trace/tracing_map.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c index a4dcf0f24352..3a56e7c8aa4f 100644 --- a/kernel/trace/tracing_map.c +++ b/kernel/trace/tracing_map.c @@ -454,7 +454,7 @@ static struct tracing_map_elt *get_free_elt(struct tracing_map *map) struct tracing_map_elt *elt = NULL; int idx; - idx = atomic_inc_return(>next_elt); + idx = atomic_fetch_add_unless(>next_elt, 1, map->max_elts); if (idx < map->max_elts) { elt = *(TRACING_MAP_ELT(map->elts, idx)); if (map->ops && map->ops->elt_init) @@ -699,7 +699,7 @@ void tracing_map_clear(struct tracing_map *map) { unsigned int i; - atomic_set(>next_elt, -1); + atomic_set(>next_elt, 0); atomic64_set(>hits, 0); atomic64_set(>drops, 0); @@ -783,7 +783,7 @@ struct tracing_map *tracing_map_create(unsigned int map_bits, map->map_bits = map_bits; map->max_elts = (1 << map_bits); - atomic_set(>next_elt, -1); + atomic_set(>next_elt, 0); map->map_size = (1 << (map_bits + 1)); map->ops = ops; -- 2.18.0
Re: [PATCH] test/vsock: add install target
On Wed, Jul 10, 2024 at 08:11:32AM GMT, Peng Fan wrote: Subject: Re: [PATCH] test/vsock: add install target On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote: >From: Peng Fan > >Add install target for vsock to make Yocto easy to install the images. > >Signed-off-by: Peng Fan >--- > tools/testing/vsock/Makefile | 12 > 1 file changed, 12 insertions(+) > >diff --git a/tools/testing/vsock/Makefile >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460 100644 >--- a/tools/testing/vsock/Makefile >+++ b/tools/testing/vsock/Makefile >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o msg_zerocopy_common.o > vsock_uring_test: LDLIBS = -luring > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o >msg_zerocopy_common.o > >+VSOCK_INSTALL_PATH ?= $(abspath .) >+# Avoid changing the rest of the logic here and lib.mk. >+INSTALL_PATH := $(VSOCK_INSTALL_PATH) >+ > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE - D_GNU_SOURCE > .PHONY: all test clean > clean: >${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test > -include *.d >+ >+install: all >+ @# Ask all targets to install their files >+ mkdir -p $(INSTALL_PATH)/vsock why using the "vsock" subdir? IIUC you were inspired by selftests/Makefile, but it installs under $(INSTALL_PATH)/kselftest/ the scripts used by the main one `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead. So in this case I would install everything in $(INSTALL_PATH). WDYT? I agree. >+ install -m 744 vsock_test $(INSTALL_PATH)/vsock/ >+ install -m 744 vsock_perf $(INSTALL_PATH)/vsock/ >+ install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/ >+ install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/ Also from selftests/Makefile, what about using the ifdef instead of using $(abspath .) as default place? I mean this: install: all ifdef INSTALL_PATH ... else $(error Error: set INSTALL_PATH to use install) endif Is the following looks good to you? # Avoid conflict with INSTALL_PATH set by the main Makefile VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH) I'm not a super Makefile expert, but why do we need both VSOCK_INSTALL_PATH and INSTALL_PATH? Stefano install: all ifdef INSTALL_PATH mkdir -p $(INSTALL_PATH) install -m 744 vsock_test $(INSTALL_PATH) install -m 744 vsock_perf $(INSTALL_PATH) install -m 744 vsock_diag_test $(INSTALL_PATH) install -m 744 vsock_uring_test $(INSTALL_PATH) else $(error Error: set INSTALL_PATH to use install) Endif Thanks, Peng. Thanks, Stefano
[PATCH bpf-next] bpf: kprobe: remove unused declaring of bpf_kprobe_override
After the commit 5ad2f102 ("tracing/kprobe: bpf: Compare instruction pointer with original one"), "bpf_kprobe_override" is not used anywhere anymore, and we can remove it now. Signed-off-by: Menglong Dong --- include/linux/trace_events.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 9df3e2973626..9435185c10ef 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -880,7 +880,6 @@ do { \ struct perf_event; DECLARE_PER_CPU(struct pt_regs, perf_trace_regs); -DECLARE_PER_CPU(int, bpf_kprobe_override); extern int perf_trace_init(struct perf_event *event); extern void perf_trace_destroy(struct perf_event *event); -- 2.39.2
[PATCH RFC 1/1] remoteproc: mediatek: Support reserved CMA regions
From: "shun-yi.wang" In order to reserve specific Contiguous Memory Allocator (CMA) regions for hardware use. When the name of the reserved region contains "cma", then a corresponding CMA heap is added. Signed-off-by: shun-yi.wang --- drivers/remoteproc/mtk_scp.c | 38 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c index e744c07507ee..ca0a4a52ece9 100644 --- a/drivers/remoteproc/mtk_scp.c +++ b/drivers/remoteproc/mtk_scp.c @@ -4,6 +4,7 @@ #include #include +#include #include #include #include @@ -1006,22 +1007,43 @@ EXPORT_SYMBOL_GPL(scp_mapping_dm_addr); static int scp_map_memory_region(struct mtk_scp *scp) { - int ret; + int ret, i, err; const struct mtk_scp_sizes_data *scp_sizes; + struct device_node *node = scp->dev->of_node; + struct of_phandle_iterator it; + + i = 0; + of_for_each_phandle(, err, node, "memory-region", NULL, 0) { + ret = of_reserved_mem_device_init_by_idx(scp->dev, node, i); + + if (ret) { + dev_err(scp->dev, "failed to assign memory-region: %s\n", + it.node->name); + of_node_put(it.node); + return -ENOMEM; + } + +#ifdef CONFIG_DMABUF_HEAPS_CMA + if (strstr(it.node->name, "cma")) { + /* Reserved cma memory region */ + ret = dma_heap_add_cma(scp->dev); + if (ret) { + dev_err(scp->dev, "Failed to add reserved cma"); + of_node_put(it.node); + return ret; + } + } +#endif /* CONFIG_DMABUF_HEAPS_CMA */ - ret = of_reserved_mem_device_init(scp->dev); + i++; + } /* reserved memory is optional. */ - if (ret == -ENODEV) { + if (!i) { dev_info(scp->dev, "skipping reserved memory initialization."); return 0; } - if (ret) { - dev_err(scp->dev, "failed to assign memory-region: %d\n", ret); - return -ENOMEM; - } - /* Reserved SCP code size */ scp_sizes = scp->data->scp_sizes; scp->cpu_addr = dma_alloc_coherent(scp->dev, scp_sizes->max_dram_size, -- 2.18.0
[PATCH RFC 0/1] Support CMA regions
From: "shun-yi.wang" In order to reserve specific Contiguous Memory Allocator (CMA) regions for hardware use. When the name of the reserved region contains "cma", then a corresponding CMA heap is added. In the DTS (Device Tree Source), we may have several memory regions with different names, e.g., { ... memory-region = <_reserved_1>, <_reserved_2>; }; mem_reserved_1: xxx-xxx-region { ... }; mem_reserved_2: xxx-xxx-cma-region { ... }; When the name of the region contains "cma", a corresponding heap allocator is added by cma_heap_add(). However, we are unsure if using the name "cma" as an identifier is a good practice. I sincerely hope that you can provide me some suggestions, thanks. Especially within the segment of '#ifdef CONFIG_DMA_BUF_HEAPS_CMA' Note: The cma_heap_add() is introduced from this patch: dma-buf: heaps: Introduce cma_heap_add() for non-default CMA heap shun-yi.wang (1): remoteproc: mediatek: Support reserved CMA regions drivers/remoteproc/mtk_scp.c | 38 1 file changed, 30 insertions(+), 8 deletions(-) -- 2.18.0
Re: [PATCH v2] filemap: add trace events for get_pages, map_pages, and fault
Hello Matthew, I'd appreciate it if you could comment on this. Thank you.
RE: [PATCH] test/vsock: add install target
> Subject: Re: [PATCH] test/vsock: add install target > > On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote: > >From: Peng Fan > > > >Add install target for vsock to make Yocto easy to install the images. > > > >Signed-off-by: Peng Fan > >--- > > tools/testing/vsock/Makefile | 12 > > 1 file changed, 12 insertions(+) > > > >diff --git a/tools/testing/vsock/Makefile > >b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460 > 100644 > >--- a/tools/testing/vsock/Makefile > >+++ b/tools/testing/vsock/Makefile > >@@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o > msg_zerocopy_common.o > > vsock_uring_test: LDLIBS = -luring > > vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o > >msg_zerocopy_common.o > > > >+VSOCK_INSTALL_PATH ?= $(abspath .) > >+# Avoid changing the rest of the logic here and lib.mk. > >+INSTALL_PATH := $(VSOCK_INSTALL_PATH) > >+ > > CFLAGS += -g -O2 -Werror -Wall -I. -I../../include > > -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow > > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE - > D_GNU_SOURCE > > .PHONY: all test clean > > clean: > > ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf > vsock_uring_test > > -include *.d > >+ > >+install: all > >+@# Ask all targets to install their files > >+mkdir -p $(INSTALL_PATH)/vsock > > why using the "vsock" subdir? > > IIUC you were inspired by selftests/Makefile, but it installs under > $(INSTALL_PATH)/kselftest/ the scripts used by the main one > `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead. > So in this case I would install everything in $(INSTALL_PATH). > > WDYT? I agree. > > >+install -m 744 vsock_test $(INSTALL_PATH)/vsock/ > >+install -m 744 vsock_perf $(INSTALL_PATH)/vsock/ > >+install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/ > >+install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/ > > Also from selftests/Makefile, what about using the ifdef instead of > using $(abspath .) as default place? > > I mean this: > > install: all > ifdef INSTALL_PATH >... > else > $(error Error: set INSTALL_PATH to use install) endif Is the following looks good to you? # Avoid conflict with INSTALL_PATH set by the main Makefile VSOCK_INSTALL_PATH ?= INSTALL_PATH := $(VSOCK_INSTALL_PATH) install: all ifdef INSTALL_PATH mkdir -p $(INSTALL_PATH) install -m 744 vsock_test $(INSTALL_PATH) install -m 744 vsock_perf $(INSTALL_PATH) install -m 744 vsock_diag_test $(INSTALL_PATH) install -m 744 vsock_uring_test $(INSTALL_PATH) else $(error Error: set INSTALL_PATH to use install) Endif Thanks, Peng. > > Thanks, > Stefano
Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU
On Wed, Jul 10, 2024 at 2:16 PM Michael S. Tsirkin wrote: > > On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote: > > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin wrote: > > > > > > QEMU implemented the configuration > > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT > > > incorrectly: it then uses vq3 for reporting, spec says it is always 4. > > > > > > This is masked by a corresponding bug in driver: > > > add a work around as I'm going to try and fix the driver bug. > > > > > > Signed-off-by: Michael S. Tsirkin > > > --- > > > drivers/virtio/virtio_balloon.c | 19 +-- > > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/virtio/virtio_balloon.c > > > b/drivers/virtio/virtio_balloon.c > > > index 9a61febbd2f7..7dc3fcd56238 100644 > > > --- a/drivers/virtio/virtio_balloon.c > > > +++ b/drivers/virtio/virtio_balloon.c > > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb) > > > > > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, > > > callbacks, names, NULL); > > > - if (err) > > > - return err; > > > + if (err) { > > > + /* > > > +* Try to work around QEMU bug which since 2020 confused > > > vq numbers > > > +* when VIRTIO_BALLOON_F_REPORTING but not > > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered. > > > +*/ > > > + if (virtio_has_feature(vb->vdev, > > > VIRTIO_BALLOON_F_REPORTING) && > > > + !virtio_has_feature(vb->vdev, > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > > > + names[VIRTIO_BALLOON_VQ_FREE_PAGE] = > > > "reporting_vq"; > > > + callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = > > > balloon_ack; > > > + err = virtio_find_vqs(vb->vdev, > > > + > > > VIRTIO_BALLOON_VQ_REPORTING, vqs, callbacks, names, NULL); > > > + } > > > + > > > + if (err) > > > + return err; > > > + } > > > > > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; > > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; > > > -- > > > MST > > > > > > > Acked-by: Jason Wang > > > > Do we need a spec to say this is something that needs to be considered > > by the driver? > > > > Thanks > > I'd say it's a temporary situation that we won't need to bother > about in several years. I mean, should a newly-written virtio-balloon driver care about this? If not, it means it can't work for several Qemu versions. Thanks > > -- > MST >
Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter
On Wed, Jul 10, 2024 at 03:28:31PM GMT, Jason Wang wrote: On Wed, Jul 10, 2024 at 3:19 PM Stefano Garzarella wrote: On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote: >On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella wrote: >> >> On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote: >> >On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella wrote: >> >> >> >> Hi Cindy, Jason, >> >> >> >> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote: >> >> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu wrote: >> >> >> >> >> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella wrote: >> >> >> > >> >> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote: >> >> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella wrote: >> >> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, but some >> >> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see >> >> >> > >> https://issues.redhat.com/browse/RHEL-45951). >> >> >> > >> >> >> >> > >> So to allow us to test these filesystems, let's add a module parameter >> >> >> > >> to control the size of the simulated virtio-blk devices. >> >> >> > >> The value is mapped directly to the `capacity` field of the virtio-blk >> >> >> > >> configuration space, so it must be expressed in sector numbers of 512 >> >> >> > >> bytes. >> >> >> > >> >> >> >> > >> The default value (0x4) is the same as the previous value, so the >> >> >> > >> behavior without setting `capacity` remains unchanged. >> >> >> > >> >> >> >> > >> Before this patch or with this patch without setting `capacity`: >> >> >> > >> $ modprobe vdpa-sim-blk >> >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 >> >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues >> >> >> > >> virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 MB/128 MiB) >> >> >> > >> >> >> >> > >> After this patch: >> >> >> > >> $ modprobe vdpa-sim-blk capacity=614400 >> >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 >> >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues >> >> >> > >> virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 MB/300 MiB) >> >> >> > >> >> >> >> > >> Signed-off-by: Stefano Garzarella >> >> >> > > >> >> >> > >What a hack. Cindy was working on adding control over config >> >> >> > >space, why can't that be used? >> >> >> > >> >> >> > If it can be used easily with virtio-blk device too, it will be great. >> >> >> > @Cindy do you plan to support that changes for a virtio-blk device too? >> >> >> > >> >> >> Hi Stefano >> >> >> I plan to add support to change the vdpa device's configuration after >> >> >> it is created. >> >> > >> >> >I think for Stefano's case, we can just implement it via provisioning >> >> >parameters? >> >> >> >> Yep, I think we don't need to change it after creation, but specifying >> >> while creating should be enough. >> >> >> >> So, IIUC we can already do it, implementing something similar to >> >> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right? >> > >> >Right. >> > >> >> >> >> What about when we have `shared_backend` set to true for the >> >> vdpa_sim_blk.ko? In this case the backend is supposed to be shared >> >> between all the devices to test live migration. >> > >> >This seems to be another topic. >> >> Yep, but really related. I think we need to handle that case when >> supporting the `capacity` setting. > >Ok, so if I was not wrong, the goal is to test migration. Sorry, I was not clear, I try to rephrase: vdpa_sim_blk already supports a module parameter called `shared_backend` introduced mainly to test live migration on the same host. When that parameter is on, all the created devices share the same backend and so we can easily do migration from one to another. With that parameter on or off, the device is always 128 MB, but now that's a problem for testing, because it looks like XFS requires at least 300 MB: https://issues.redhat.com/browse/RHEL-45951 That's why I sent this patch. When `shared_backend` is off (default), using the provisioning parameters seems feasible to me, but when it's on, how do I deal with it? Being a simulator, we can maybe make it so that only the first device can change the size for example, or that all devices control the size, but then we would have to handle the size change at runtime, which I think is feasible, but it requires some work to send a notification of configuration change, etc. Can we mandate the size parameter to be exactly the same as the first vDPA block simulator? > >> >> > >> >> >> >> Maybe we can just change the size of the shared ramdisk to be reflected >> >> to all devices. >> >> >> >> Suggestions? >> > >> >Could we specify the path to tmpfs or others during provisioning >> >instead? It seems more general (but more work). >> >> Then it would almost become a real device, no longer just a simulator. >> It's enough work, though, as you said, but at that point we'd just
Re: [PATCH] test/vsock: add install target
On Tue, Jul 09, 2024 at 09:50:51PM GMT, Peng Fan (OSS) wrote: From: Peng Fan Add install target for vsock to make Yocto easy to install the images. Signed-off-by: Peng Fan --- tools/testing/vsock/Makefile | 12 1 file changed, 12 insertions(+) diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index a7f56a09ca9f..5c8442fa9460 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -8,8 +8,20 @@ vsock_perf: vsock_perf.o msg_zerocopy_common.o vsock_uring_test: LDLIBS = -luring vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o msg_zerocopy_common.o +VSOCK_INSTALL_PATH ?= $(abspath .) +# Avoid changing the rest of the logic here and lib.mk. +INSTALL_PATH := $(VSOCK_INSTALL_PATH) + CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE .PHONY: all test clean clean: ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test -include *.d + +install: all + @# Ask all targets to install their files + mkdir -p $(INSTALL_PATH)/vsock why using the "vsock" subdir? IIUC you were inspired by selftests/Makefile, but it installs under $(INSTALL_PATH)/kselftest/ the scripts used by the main one `run_kselftest.sh`, which is installed in $(INSTALL_PATH instead. So in this case I would install everything in $(INSTALL_PATH). WDYT? + install -m 744 vsock_test $(INSTALL_PATH)/vsock/ + install -m 744 vsock_perf $(INSTALL_PATH)/vsock/ + install -m 744 vsock_diag_test $(INSTALL_PATH)/vsock/ + install -m 744 vsock_uring_test $(INSTALL_PATH)/vsock/ Also from selftests/Makefile, what about using the ifdef instead of using $(abspath .) as default place? I mean this: install: all ifdef INSTALL_PATH ... else $(error Error: set INSTALL_PATH to use install) endif Thanks, Stefano
Re: [PATCH 00/15] Implement MODVERSIONS for Rust
On 6/17/24 19:58, Sami Tolvanen wrote: > Hi folks, > > This series implements CONFIG_MODVERSIONS for Rust, an important > feature for distributions like Android that want to ship Rust > kernel modules, and depend on modversions to help ensure module ABI > compatibility. Thanks for working on this. Below is some feedback with my (open)SUSE hat on, although it should be quite general. > There have been earlier proposals [1][2] that would allow Rust > modules to coexist with modversions, but none that actually implement > symbol versioning. Unlike C, Rust source code doesn't have sufficient > information about the final ABI, as the compiler has considerable > freedom in adjusting structure layout for improved performance [3], > for example, which makes using a source code parser like genksyms > a non-starter. Based on Matt's suggestion and previous feedback > from maintainers, this series uses DWARF debugging information for > computing versions. DWARF is an established and relatively stable > format, which includes all the necessary ABI details, and adding a > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a > reasonable trade-off. Using the DWARF data makes sense to me. Distribution kernels are normally built with debuginfo because one has to be able to debug them later, unsurprisingly. Besides that, one now typically wants to use BPF together with built-in BTF data (CONFIG_DEBUG_INFO_BTF), which also requires building the kernel with debuginfo. I would however keep in mind that producing all DWARF data has some cost. From a quick test, an x86_64-defconfig build is ~30% slower for me with CONFIG_DEBUG_INFO=y. The current genksyms tool allows to have debuginfo disabled when backporting some patches and consequently have a quicker feedback whether modversions changed. This option would disappear with gendwarfksyms but I think it is acceptable. > > The first 12 patches of this series add a small tool for computing > symbol versions from DWARF, called gendwarfksyms. When passed a list > of exported symbols, the tool generates an expanded type string > for each symbol, and computes symbol CRCs similarly to genksyms. > gendwarfksyms is written in C and uses libdw to process DWARF, mainly > because of the existing support for C host tools that use elfutils > (e.g., objtool). In addition to calculating CRCs of exported symbols, genksyms has other features which I think are important. Firstly, the genksyms tool has a human-readable storage format for input data used in the calculation of symbol CRCs. Setting the make variable KBUILD_SYMTYPES enables dumping this data and storing it in *.symtypes files. When a developer later modifies the kernel and wants to check if some symbols have changed, they can take these files and feed them as *.symref back to genksyms. This allows the tool to provide an actual reason why some symbols have changed, instead of just printing that their CRCs are different. Is there any plan to add the same functionality to gendwarfksyms, or do you envison that people will use libabigail, Symbol-Type Graph, or another tool for making this type of comparison? Secondly, when distributions want to maintain stable kABI, they need to be able to deal with patch backports that add new members to structures. One common approach is to have placeholders in important structures which can be later replaced by the new members as needed. __GENKSYMS__ ifdefs are then used at the C source level to hide these kABI-compatible changes from genksyms. Gendwarfksyms works on the resulting binary and so using such ifdefs wouldn't work. Instead, I suspect that what is required is a mechanism to tell the tool that a given change is ok, probably by allowing to specify some map from the original definition to the new one. Is there a plan to implement something like this, or how could it be addressed? > Another compatibility issue is fitting the extremely long mangled > Rust symbol names into struct modversion_info, which can only hold > 55-character names (on 64-bit systems). Previous proposals suggested > changing the structure to support longer names, but the conclusion was > that we cannot break userspace tools that parse the version table. > > The next two patches of the series implement support for hashed > symbol names in struct modversion_info, where names longer than 55 > characters are hashed, and the hash is stored in the name field. To > avoid breaking userspace tools, the binary hash is prefixed with a > null-terminated string containing the name of the hash function. While > userspace tools can later be updated to potentially produce more > useful information about the long symbols, this allows them to > continue working in the meantime. I think this approach with hashed names is quite complex. I'd personally be also in favor of having a new section with variable-length strings to store the names of imported symbols. As yet another alternative, it should be also possible
Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter
On Wed, Jul 10, 2024 at 3:19 PM Stefano Garzarella wrote: > > On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote: > >On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella > >wrote: > >> > >> On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote: > >> >On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella > >> >wrote: > >> >> > >> >> Hi Cindy, Jason, > >> >> > >> >> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote: > >> >> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu wrote: > >> >> >> > >> >> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella > >> >> >> wrote: > >> >> >> > > >> >> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote: > >> >> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella > >> >> >> > >wrote: > >> >> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, > >> >> >> > >> but some > >> >> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see > >> >> >> > >> https://issues.redhat.com/browse/RHEL-45951). > >> >> >> > >> > >> >> >> > >> So to allow us to test these filesystems, let's add a module > >> >> >> > >> parameter > >> >> >> > >> to control the size of the simulated virtio-blk devices. > >> >> >> > >> The value is mapped directly to the `capacity` field of the > >> >> >> > >> virtio-blk > >> >> >> > >> configuration space, so it must be expressed in sector numbers > >> >> >> > >> of 512 > >> >> >> > >> bytes. > >> >> >> > >> > >> >> >> > >> The default value (0x4) is the same as the previous value, > >> >> >> > >> so the > >> >> >> > >> behavior without setting `capacity` remains unchanged. > >> >> >> > >> > >> >> >> > >> Before this patch or with this patch without setting `capacity`: > >> >> >> > >> $ modprobe vdpa-sim-blk > >> >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 > >> >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues > >> >> >> > >> virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 > >> >> >> > >> MB/128 MiB) > >> >> >> > >> > >> >> >> > >> After this patch: > >> >> >> > >> $ modprobe vdpa-sim-blk capacity=614400 > >> >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 > >> >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues > >> >> >> > >> virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 > >> >> >> > >> MB/300 MiB) > >> >> >> > >> > >> >> >> > >> Signed-off-by: Stefano Garzarella > >> >> >> > > > >> >> >> > >What a hack. Cindy was working on adding control over config > >> >> >> > >space, why can't that be used? > >> >> >> > > >> >> >> > If it can be used easily with virtio-blk device too, it will be > >> >> >> > great. > >> >> >> > @Cindy do you plan to support that changes for a virtio-blk device > >> >> >> > too? > >> >> >> > > >> >> >> Hi Stefano > >> >> >> I plan to add support to change the vdpa device's configuration after > >> >> >> it is created. > >> >> > > >> >> >I think for Stefano's case, we can just implement it via provisioning > >> >> >parameters? > >> >> > >> >> Yep, I think we don't need to change it after creation, but specifying > >> >> while creating should be enough. > >> >> > >> >> So, IIUC we can already do it, implementing something similar to > >> >> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right? > >> > > >> >Right. > >> > > >> >> > >> >> What about when we have `shared_backend` set to true for the > >> >> vdpa_sim_blk.ko? In this case the backend is supposed to be shared > >> >> between all the devices to test live migration. > >> > > >> >This seems to be another topic. > >> > >> Yep, but really related. I think we need to handle that case when > >> supporting the `capacity` setting. > > > >Ok, so if I was not wrong, the goal is to test migration. > > Sorry, I was not clear, I try to rephrase: > vdpa_sim_blk already supports a module parameter called `shared_backend` > introduced mainly to test live migration on the same host. When that > parameter is on, all the created devices share the same backend and so > we can easily do migration from one to another. > > With that parameter on or off, the device is always 128 MB, but now > that's a problem for testing, because it looks like XFS requires at > least 300 MB: https://issues.redhat.com/browse/RHEL-45951 > > That's why I sent this patch. > > When `shared_backend` is off (default), using the provisioning > parameters seems feasible to me, but when it's on, how do I deal with > it? > > Being a simulator, we can maybe make it so that only the first device > can change the size for example, or that all devices control the size, > but then we would have to handle the size change at runtime, which I > think is feasible, but it requires some work to send a notification of > configuration change, etc. Can we mandate the size parameter to be exactly the same as the first vDPA block simulator? > > > > >> > >> > > >> >> > >> >> Maybe we can just change the size of the shared ramdisk to be reflected > >> >> to all
Re: [PATCH] vdpa_sim_blk: add `capacity` module parameter
On Wed, Jul 10, 2024 at 11:08:48AM GMT, Jason Wang wrote: On Tue, Jul 9, 2024 at 8:41 PM Stefano Garzarella wrote: On Tue, Jul 09, 2024 at 10:56:16AM GMT, Jason Wang wrote: >On Mon, Jul 8, 2024 at 4:15 PM Stefano Garzarella wrote: >> >> Hi Cindy, Jason, >> >> On Mon, Jul 08, 2024 at 03:59:34PM GMT, Jason Wang wrote: >> >On Mon, Jul 8, 2024 at 3:06 PM Cindy Lu wrote: >> >> >> >> On Fri, 5 Jul 2024 at 20:42, Stefano Garzarella wrote: >> >> > >> >> > On Fri, Jul 05, 2024 at 07:30:51AM GMT, Michael S. Tsirkin wrote: >> >> > >On Fri, Jul 05, 2024 at 01:28:21PM +0200, Stefano Garzarella wrote: >> >> > >> The vDPA block simulator always allocated a 128 MiB ram-disk, but some >> >> > >> filesystems (e.g. XFS) may require larger minimum sizes (see >> >> > >> https://issues.redhat.com/browse/RHEL-45951). >> >> > >> >> >> > >> So to allow us to test these filesystems, let's add a module parameter >> >> > >> to control the size of the simulated virtio-blk devices. >> >> > >> The value is mapped directly to the `capacity` field of the virtio-blk >> >> > >> configuration space, so it must be expressed in sector numbers of 512 >> >> > >> bytes. >> >> > >> >> >> > >> The default value (0x4) is the same as the previous value, so the >> >> > >> behavior without setting `capacity` remains unchanged. >> >> > >> >> >> > >> Before this patch or with this patch without setting `capacity`: >> >> > >> $ modprobe vdpa-sim-blk >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues >> >> > >> virtio_blk virtio6: [vdb] 262144 512-byte logical blocks (134 MB/128 MiB) >> >> > >> >> >> > >> After this patch: >> >> > >> $ modprobe vdpa-sim-blk capacity=614400 >> >> > >> $ vdpa dev add mgmtdev vdpasim_blk name blk0 >> >> > >> virtio_blk virtio6: 1/0/0 default/read/poll queues >> >> > >> virtio_blk virtio6: [vdb] 614400 512-byte logical blocks (315 MB/300 MiB) >> >> > >> >> >> > >> Signed-off-by: Stefano Garzarella >> >> > > >> >> > >What a hack. Cindy was working on adding control over config >> >> > >space, why can't that be used? >> >> > >> >> > If it can be used easily with virtio-blk device too, it will be great. >> >> > @Cindy do you plan to support that changes for a virtio-blk device too? >> >> > >> >> Hi Stefano >> >> I plan to add support to change the vdpa device's configuration after >> >> it is created. >> > >> >I think for Stefano's case, we can just implement it via provisioning >> >parameters? >> >> Yep, I think we don't need to change it after creation, but specifying >> while creating should be enough. >> >> So, IIUC we can already do it, implementing something similar to >> vdpasim_net_setup_config() to call during vdpasim_blk_dev_add(), right? > >Right. > >> >> What about when we have `shared_backend` set to true for the >> vdpa_sim_blk.ko? In this case the backend is supposed to be shared >> between all the devices to test live migration. > >This seems to be another topic. Yep, but really related. I think we need to handle that case when supporting the `capacity` setting. Ok, so if I was not wrong, the goal is to test migration. Sorry, I was not clear, I try to rephrase: vdpa_sim_blk already supports a module parameter called `shared_backend` introduced mainly to test live migration on the same host. When that parameter is on, all the created devices share the same backend and so we can easily do migration from one to another. With that parameter on or off, the device is always 128 MB, but now that's a problem for testing, because it looks like XFS requires at least 300 MB: https://issues.redhat.com/browse/RHEL-45951 That's why I sent this patch. When `shared_backend` is off (default), using the provisioning parameters seems feasible to me, but when it's on, how do I deal with it? Being a simulator, we can maybe make it so that only the first device can change the size for example, or that all devices control the size, but then we would have to handle the size change at runtime, which I think is feasible, but it requires some work to send a notification of configuration change, etc. > >> >> Maybe we can just change the size of the shared ramdisk to be reflected >> to all devices. >> >> Suggestions? > >Could we specify the path to tmpfs or others during provisioning >instead? It seems more general (but more work). Then it would almost become a real device, no longer just a simulator. It's enough work, though, as you said, but at that point we'd just have to specify the backend file to use for the device. In that case what API would we need to use to allow the user to set the backend file? Yes, I think we can allow some vendor specific provisioning parameters. But not sure it's an overkill for the use case here. If others are happy with the shared_backed. I'm fine. Yeah, maybe it's overkill and I don't have much time these days :-( I think the easiest way is to merge this patch, but I
Re: [PATCH 1/2] virtio_balloon: add work around for out of spec QEMU
On Wed, Jul 10, 2024 at 11:23:20AM +0800, Jason Wang wrote: > On Fri, Jul 5, 2024 at 6:09 PM Michael S. Tsirkin wrote: > > > > QEMU implemented the configuration > > VIRTIO_BALLOON_F_REPORTING && ! VIRTIO_BALLOON_F_FREE_PAGE_HINT > > incorrectly: it then uses vq3 for reporting, spec says it is always 4. > > > > This is masked by a corresponding bug in driver: > > add a work around as I'm going to try and fix the driver bug. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > drivers/virtio/virtio_balloon.c | 19 +-- > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/virtio/virtio_balloon.c > > b/drivers/virtio/virtio_balloon.c > > index 9a61febbd2f7..7dc3fcd56238 100644 > > --- a/drivers/virtio/virtio_balloon.c > > +++ b/drivers/virtio/virtio_balloon.c > > @@ -597,8 +597,23 @@ static int init_vqs(struct virtio_balloon *vb) > > > > err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, > > callbacks, names, NULL); > > - if (err) > > - return err; > > + if (err) { > > + /* > > +* Try to work around QEMU bug which since 2020 confused vq > > numbers > > +* when VIRTIO_BALLOON_F_REPORTING but not > > +* VIRTIO_BALLOON_F_FREE_PAGE_HINT are offered. > > +*/ > > + if (virtio_has_feature(vb->vdev, > > VIRTIO_BALLOON_F_REPORTING) && > > + !virtio_has_feature(vb->vdev, > > VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { > > + names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "reporting_vq"; > > + callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = > > balloon_ack; > > + err = virtio_find_vqs(vb->vdev, > > + VIRTIO_BALLOON_VQ_REPORTING, > > vqs, callbacks, names, NULL); > > + } > > + > > + if (err) > > + return err; > > + } > > > > vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; > > vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; > > -- > > MST > > > > Acked-by: Jason Wang > > Do we need a spec to say this is something that needs to be considered > by the driver? > > Thanks I'd say it's a temporary situation that we won't need to bother about in several years. -- MST
Re: [PATCH v3 0/2] vdpa: support set mac address from vdpa tool
On Wed, Jul 10, 2024 at 11:05:48AM +0800, Jason Wang wrote: > On Tue, Jul 9, 2024 at 8:42 PM Michael S. Tsirkin wrote: > > > > On Tue, Jul 09, 2024 at 02:19:19PM +0800, Cindy Lu wrote: > > > On Tue, 9 Jul 2024 at 11:59, Parav Pandit wrote: > > > > > > > > Hi Cindy, > > > > > > > > > From: Cindy Lu > > > > > Sent: Monday, July 8, 2024 12:17 PM > > > > > > > > > > Add support for setting the MAC address using the VDPA tool. > > > > > This feature will allow setting the MAC address using the VDPA tool. > > > > > For example, in vdpa_sim_net, the implementation sets the MAC address > > > > > to > > > > > the config space. However, for other drivers, they can implement > > > > > their own > > > > > function, not limited to the config space. > > > > > > > > > > Changelog v2 > > > > > - Changed the function name to prevent misunderstanding > > > > > - Added check for blk device > > > > > - Addressed the comments > > > > > Changelog v3 > > > > > - Split the function of the net device from > > > > > vdpa_nl_cmd_dev_attr_set_doit > > > > > - Add a lock for the network device's dev_set_attr operation > > > > > - Address the comments > > > > > > > > > > Cindy Lu (2): > > > > > vdpa: support set mac address from vdpa tool > > > > > vdpa_sim_net: Add the support of set mac address > > > > > > > > > > drivers/vdpa/vdpa.c | 81 > > > > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 19 ++- > > > > > include/linux/vdpa.h | 9 > > > > > include/uapi/linux/vdpa.h| 1 + > > > > > 4 files changed, 109 insertions(+), 1 deletion(-) > > > > > > > > > > -- > > > > > 2.45.0 > > > > > > > > Mlx5 device already allows setting the mac and mtu during the vdpa > > > > device creation time. > > > > Once the vdpa device is created, it binds to vdpa bus and other driver > > > > vhost_vdpa etc bind to it. > > > > So there was no good reason in the past to support explicit config > > > > after device add complicate the flow for synchronizing this. > > > > > > > > The user who wants a device with new attributes, as well destroy and > > > > recreate the vdpa device with new desired attributes. > > > > > > > > vdpa_sim_net can also be extended for similar way when adding the vdpa > > > > device. > > > > > > > > Have you considered using the existing tool and kernel in place since > > > > 2021? > > > > Such as commit d8ca2fa5be1. > > > > > > > > An example of it is, > > > > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu > > > > 9000 > > > > > > > Hi Parav > > > Really thanks for your comments. The reason for adding this function > > > is to support Kubevirt. > > > the problem we meet is that kubevirt chooses one random vdpa device > > > from the pool and we don't know which one it going to pick. That means > > > we can't get to know the Mac address before it is created. So we plan > > > to have this function to change the mac address after it is created > > > Thanks > > > cindy > > > > Well you will need to change kubevirt to teach it to set > > mac address, right? > > That's the plan. Adding Leonardo. > > Thanks So given you are going to change kubevirt, can we change it to create devices as needed with the existing API? > > > > -- > > MST > >