[PATCH] kselftests: fix grammar and non-ASCII space
From: Randy Dunlap

This is a small cleanup to kselftest.rst:
- Fix some language typos in the usage instructions.
- Change one non-ASCII space to an ASCII space.

Signed-off-by: Randy Dunlap
---
 Documentation/dev-tools/kselftest.rst | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

--- linux-next-20180426.orig/Documentation/dev-tools/kselftest.rst
+++ linux-next-20180426/Documentation/dev-tools/kselftest.rst
@@ -9,7 +9,7 @@ and booting a kernel.
 On some systems, hot-plug tests could hang forever waiting for cpu and
 memory to be ready to be offlined. A special hot-plug target is created
-to run full range of hot-plug tests. In default mode, hot-plug tests run
+to run the full range of hot-plug tests. In default mode, hot-plug tests run
 in safe mode with a limited scope. In limited mode, cpu-hotplug test is
 run on a single cpu as opposed to all hotplug capable cpus, and memory
 hotplug test is run on 2% of hotplug capable memory instead of 10%.
@@ -89,9 +89,9 @@ Note that some tests will require root p
 Install selftests
 =
-You can use kselftest_install.sh tool installs selftests in default
-location which is tools/testing/selftests/kselftest or a user specified
-location.
+You can use the kselftest_install.sh tool to install selftests in the
+default location, which is tools/testing/selftests/kselftest, or in a
+user specified location.
 To install selftests in default location::
@@ -109,7 +109,7 @@ Running installed selftests
 Kselftest install as well as the Kselftest tarball provide a script
 named "run_kselftest.sh" to run the tests.
-You can simply do the following to run the installed Kselftests. Please
+You can simply do the following to run the installed Kselftests. Please
 note some tests will require root privileges::
 $ cd kselftest
@@ -139,7 +139,7 @@ Contributing new tests (details)
 default.
 TEST_CUSTOM_PROGS should be used by tests that require custom build
- rule and prevent common build rule use.
+ rules and prevent common build rule use.
 TEST_PROGS are for test shell scripts. Please ensure shell script has its
 exec bit set. Otherwise, lib.mk run_tests will generate a warning.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] documentation: core-api: rearrange a few kernel-api chapters and sections
On 04/27/2018 04:17 PM, Jonathan Corbet wrote:
> On Thu, 26 Apr 2018 18:11:02 -0700
> Randy Dunlap wrote:
>
>> Rearrange some kernel-api chapters and sections to group them
>> together better.
>>
>> - move Bit Operations from Basic C Library Functions to Basic
>>   Kernel Library Functions (now adjacent to Bitmap Operations since
>>   they are not typical C library functions)
>>
>> - move Sorting from Math Functions to Basic Kernel Library Functions
>>   since sort functions are more Basic than Math Functions
>>
>> - move Text Searching from Math Functions to Basic Kernel Library
>>   Functions (keep Sorting and Searching close to each other)
>>
>> - combine CRC and Math functions together into the (newly named)
>>   CRC and Math Functions chapter
>
> The changes look good. But ... grr... some of the stuff you are moving
> around was introduced in -rc2 via the networking tree. That kind of thing
> makes life harder than it needs to be. I've sorted it out and applied the
> patch, thanks.

Sorry, I thought that making the patch on linux-next 20180426 (which is
based on -rc2 and a few hundred git trees) would help with that.

Thanks for fixing.

--
~Randy
Re: [net-next v2] ipv6: sr: Add documentation for seg_flowlabel sysctl
From: Ahmed Abdelsalam
Date: Fri, 27 Apr 2018 17:51:48 +0200

> This patch adds a documentation for seg_flowlabel sysctl into
> Documentation/networking/ip-sysctl.txt
>
> Signed-off-by: Ahmed Abdelsalam

Applied, thank you.
Re: [PATCH 2/4] fpga: manager: change api, don't use drvdata
On 04/27/2018 04:30 PM, Alan Tull wrote:
> On Fri, Apr 27, 2018 at 1:26 PM, Florian Fainelli wrote:
>> On 04/26/2018 06:26 PM, Moritz Fischer wrote:
>>> From: Alan Tull
>>>
>>> Change fpga_mgr_register to not set or use drvdata. This supports
>>> the case where a PCIe device has more than one manager.
>>>
>>> Add fpga_mgr_create/free functions. Change fpga_mgr_register and
>>> fpga_mgr_unregister functions to take the mgr struct as their only
>>> parameter.
>>>
>>> struct fpga_manager *fpga_mgr_create(struct device *dev,
>>>                                      const char *name,
>>>                                      const struct fpga_manager_ops *mops,
>>>                                      void *priv);
>>> void fpga_mgr_free(struct fpga_manager *mgr);
>>> int fpga_mgr_register(struct fpga_manager *mgr);
>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>
>>> Update the drivers that call fpga_mgr_register with the new API.
>>
>> Apologies for chiming in so late, this commit does not make it clear
>> that fpga_mgr_unregister() now also free the 'mgr' argument by calling
>> fpga_mgr_free(), this is kind of detail, but an API should make that
>> clear IMHO.
>
> If people follow the usage information, in
> Documentation/fpga/fpga-mgr.txt, they'll do the right thing. But I
> can add a patch that clarifies the description of fpga_mgr_unregister
> in fpga-mgr.c that it "unregisters and frees" the manager.

Just mentioning that because not all APIs do this, take the network
devices: there is an unregister_netdev() and a free_netdev(). Either way
is fine with me as long as it is documented as such, I had to look at
the API implementation to figure out that, no, all the drivers were not
leaking their fpga_manager instance in their .remove() function :)
--
Florian
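For readers following the ownership question in this thread, here is a minimal user-space sketch of the "unregister also frees" contract. It is not the fpga-mgr implementation: every `mock_*` name, the `mock_mgr` struct and the `mock_live_managers` counter are hypothetical stand-ins invented for illustration, and the real functions take different arguments.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a manager object; not struct fpga_manager. */
struct mock_mgr {
	const char *name;
	bool registered;
};

/* Counts objects that were created but not yet freed (leak detector). */
static int mock_live_managers;

static struct mock_mgr *mock_mgr_create(const char *name)
{
	struct mock_mgr *mgr = calloc(1, sizeof(*mgr));

	if (!mgr)
		return NULL;
	mgr->name = name;
	mock_live_managers++;
	return mgr;
}

/* Only needed directly on error paths where register was never reached. */
static void mock_mgr_free(struct mock_mgr *mgr)
{
	if (!mgr)
		return;
	mock_live_managers--;
	free(mgr);
}

static int mock_mgr_register(struct mock_mgr *mgr)
{
	mgr->registered = true;
	return 0;
}

/*
 * Unregister AND free, mirroring the behaviour described in the thread.
 * The caller must not touch mgr after this returns.
 */
static void mock_mgr_unregister(struct mock_mgr *mgr)
{
	mgr->registered = false;
	mock_mgr_free(mgr);
}
```

With this contract a driver's remove path calls only the unregister function, which is why the drivers in question were not leaking. The contrast drawn in the thread is with APIs such as unregister_netdev()/free_netdev(), where the two steps stay separate.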
Re: [PATCH 0/7] docs/vm: update KSM documentation
On Tue, 24 Apr 2018 09:40:21 +0300
Mike Rapoport wrote:

> These patches extend KSM documentation with high level design overview and
> some details about reverse mappings and split the userspace interface
> description to Documentation/admin-guide/mm.
>
> The description of some KSM sysfs attributes is changed so that it won't
> include implementation detail. The description of these implementation
> details are moved to the new "Design" section.
>
> The last patch in the series depends on the patchset that create
> Documentation/admin-guide/mm [1], all the rest applies cleanly to the
> current docs-next.

I've applied the set, thanks.

jon
Re: [PATCH] Documentation: driver-api: fix device_connection.rst kernel-doc error
On Thu, 26 Apr 2018 18:29:41 -0700
Randy Dunlap wrote:

> Using incorrect :functions: syntax (extra space) causes an odd kernel-doc
> warning, so fix that.
>
> Documentation/driver-api/device_connection.rst:42: ERROR: Error in
> "kernel-doc" directive:

Applied, thanks.

jon
Re: [PATCH] documentation: core-api: rearrange a few kernel-api chapters and sections
On Thu, 26 Apr 2018 18:11:02 -0700
Randy Dunlap wrote:

> Rearrange some kernel-api chapters and sections to group them
> together better.
>
> - move Bit Operations from Basic C Library Functions to Basic
>   Kernel Library Functions (now adjacent to Bitmap Operations since
>   they are not typical C library functions)
>
> - move Sorting from Math Functions to Basic Kernel Library Functions
>   since sort functions are more Basic than Math Functions
>
> - move Text Searching from Math Functions to Basic Kernel Library
>   Functions (keep Sorting and Searching close to each other)
>
> - combine CRC and Math functions together into the (newly named)
>   CRC and Math Functions chapter

The changes look good. But ... grr... some of the stuff you are moving
around was introduced in -rc2 via the networking tree. That kind of thing
makes life harder than it needs to be. I've sorted it out and applied the
patch, thanks.

jon
Re: [PATCH 0/7] docs/vm: start moving files to Documentation/admin-guide
On Wed, 18 Apr 2018 11:07:43 +0300
Mike Rapoport wrote:

> These patches begin categorizing memory management documentation. The
> documents that describe userspace APIs and do not overload the reader with
> implementation details can be moved to Documentation/admin-guide, so let's
> do it :)

Looks good, set applied, thanks.

jon
Re: [PATCH v2 0/3] coresight: Refresh documentation
On Tue, 17 Apr 2018 10:08:04 -0600
Mathieu Poirier wrote:

> Now that the perf tools CoreSight support is upstream this set adds
> documentation to go with it and move things around so that topics
> are located together.

I've applied the set, thanks.

jon
Re: [PATCH v2] doc: dev-tools: kselftest.rst: update contributing new tests
On Thu, 19 Apr 2018 12:28:25 +0200
Anders Roxell wrote:

> Add a description that the kernel headers should be used as far as it is
> possible and then the system headers.
>
> Signed-off-by: Anders Roxell

Applied, thanks.

jon
Re: [PATCH v3] docs: kernel-parameters.txt: Fix whitespace
On Wed, 18 Apr 2018 20:51:39 +0200
Thymo van Beers wrote:

> Some lines used spaces instead of tabs at line start.
> This can cause mangled lines in editors due to inconsistency.
>
> Replace spaces for tabs where appropriate.

Applied, thanks.

jon
Re: [PATCH] linux-next: ftrace/docs: Fix spelling typos in ftrace-users.rst
On Fri, 27 Apr 2018 18:17:09 -0400
Steven Rostedt wrote:

> I just noticed that this was never applied.
>
> Jon, can you take this?

Wow...from November. Not sure what happened...applied now, thanks.

jon
Re: [PATCH] linux-next: ftrace/docs: Fix spelling typos in ftrace-users.rst
I just noticed that this was never applied.

Jon, can you take this?

-- Steve

On Mon, 27 Nov 2017 22:46:36 -0500
Steven Rostedt wrote:

> On Tue, 28 Nov 2017 12:26:13 +0900
> Masanari Iida wrote:
>
> > This patch corrects some spelling typo in ftrace-users.rst
> >
> > Signed-off-by: Masanari Iida
> > ---
> >  Documentation/trace/ftrace-uses.rst | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/trace/ftrace-uses.rst
> > b/Documentation/trace/ftrace-uses.rst
> > index 8494a801d341..9df5ee15859a 100644
> > --- a/Documentation/trace/ftrace-uses.rst
> > +++ b/Documentation/trace/ftrace-uses.rst
> > @@ -12,7 +12,7 @@ Written for: 4.14
> >  Introduction
> >
> >
> > -The ftrace infrastructure was originially created to attach callbacks to
> > the
> > +The ftrace infrastructure was originally created to attach callbacks to the
> >  beginning of functions in order to record and trace the flow of the kernel.
> >  But callbacks to the start of a function can have other use cases. Either
> >  for live kernel patching, or for security monitoring. This document
> > describes
> > @@ -29,7 +29,7 @@ going to idle, during CPU bring up and takedown, or going
> > to user space.
> >  This requires extra care to what can be done inside a callback. A callback
> >  can be called outside the protective scope of RCU.
> >
> > -The ftrace infrastructure has some protections agains recursions and RCU
> > +The ftrace infrastructure has some protections against recursions and RCU
> >  but one must still be very careful how they use the callbacks.
> >
> >
>
> Acked-by: Steven Rostedt (VMware)
>
> -- Steve
[PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range
Even with clamped sysctl parameters, it is still not that straight forward to figure out the exact range of those parameters. One may try to write extreme parameter values to see if they get clamped. To make it easier, a warning with the expected range will now be printed into the kernel ring buffer when a clamped sysctl parameter receives an out of range value. The pr_warn_ratelimited() macro is used to limit the number of warning messages that can be printed within a given period of time. Signed-off-by: Waiman Long--- kernel/sysctl.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 5b84c1d..76b2f1b 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -17,6 +17,7 @@ * The list_for_each() macro wasn't appropriate for the sysctl loop. * Removed it and replaced it with older style, 03/23/00, Bill Wendling */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include @@ -2516,6 +2517,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * @flags: pointer to flags + * @name: sysctl parameter name * * The do_proc_dointvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2525,6 +2527,7 @@ struct do_proc_dointvec_minmax_conv_param { int *min; int *max; uint16_t *flags; + const char *name; }; static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, @@ -2534,12 +2537,14 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, struct do_proc_dointvec_minmax_conv_param *param = data; if (write) { int val = *negp ? 
-*lvalp : *lvalp; + bool clamped = false; bool clamp = param->flags && (*param->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE); if (param->min && *param->min > val) { if (clamp) { val = *param->min; + clamped = true; } else { return -EINVAL; } @@ -2547,11 +2552,17 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, if (param->max && *param->max < val) { if (clamp) { val = *param->max; + clamped = true; } else { return -EINVAL; } } *valp = val; + if (clamped && param->name) + pr_warn_ratelimited("\"%s\" was set out of range [%d, %d], clamped to %d.\n", + param->name, + param->min ? *param->min : -INT_MAX, + param->max ? *param->max : INT_MAX, val); } else { int val = *valp; if (val < 0) { @@ -2589,6 +2600,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, .min = (int *) table->extra1, .max = (int *) table->extra2, .flags = >flags, + .name = table->procname, }; return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_minmax_conv, ); @@ -2599,6 +2611,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * @flags: pointer to flags + * @name: sysctl parameter name * * The do_proc_douintvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2608,6 +2621,7 @@ struct do_proc_douintvec_minmax_conv_param { unsigned int *min; unsigned int *max; uint16_t *flags; + const char *name; }; static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, @@ -2618,6 +2632,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, if (write) { unsigned int val = *lvalp; + bool clamped = false; bool clamp = param->flags && (*param->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE); @@ -2627,6 +2642,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, if (param->min && *param->min > val) { if (clamp) { val = *param->min; + clamped = true; } else { return -ERANGE; } 
@@ -2634,11 +2650,17 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
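As a rough illustration of the warning described in this patch, the sketch below builds the same message text in user space. It is an assumption-laden model, not the kernel code: `format_clamp_warning` is a hypothetical helper, `snprintf()` stands in for `pr_warn_ratelimited()`, and no rate limiting is modeled. A missing bound is reported as `-INT_MAX` or `INT_MAX`, as in the hunk above.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the "out of range" warning for a clamped
 * parameter. min/max may be NULL when that side of the range is unbounded.
 * Returns the number of characters snprintf() would have written.
 */
static int format_clamp_warning(char *buf, size_t len, const char *name,
				const int *min, const int *max, int val)
{
	return snprintf(buf, len,
			"\"%s\" was set out of range [%d, %d], clamped to %d.\n",
			name,
			min ? *min : -INT_MAX,	/* no lower bound given */
			max ? *max : INT_MAX,	/* no upper bound given */
			val);
}
```

Printing the range together with the clamped value means a user who overshoots once can read the valid limits straight from the log instead of probing for them.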
[PATCH v6 2/8] proc/sysctl: Provide additional ctl_table.flags checks
Checking code is added to provide the following additional ctl_table.flags checks: 1) No unknown flag is allowed. 2) Minimum of a range cannot be larger than the maximum value. 3) The signed and unsigned flags are mutually exclusive. 4) The proc_handler should be consistent with the signed or unsigned flags. The separation of signed and unsigned flags helps to provide more comprehensive checking than it would have been if there is only one flag available. Signed-off-by: Waiman Long--- fs/proc/proc_sysctl.c | 60 +++ 1 file changed, 60 insertions(+) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 8989936..fb09454 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1092,6 +1092,64 @@ static int sysctl_check_table_array(const char *path, struct ctl_table *table) return err; } +/* + * This code assumes that only one integer value is allowed in an integer + * sysctl when one of the clamping flags is used. If that assumption is no + * longer true, we may need to add another flag to indicate the entry size. + */ +static int sysctl_check_flags(const char *path, struct ctl_table *table) +{ + int err = 0; + + if ((table->flags & ~CTL_TABLE_FLAGS_ALL) || + ((table->flags & CTL_FLAGS_CLAMP_RANGE) == CTL_FLAGS_CLAMP_RANGE)) + err = sysctl_err(path, table, "invalid flags"); + + if (table->flags & CTL_FLAGS_CLAMP_RANGE) { + int range_err = 0; + bool is_int = (table->maxlen == sizeof(int)); + + if (!is_int && (table->maxlen != sizeof(long))) { + range_err++; + } else if (!table->extra1 || !table->extra2) { + /* No min > max checking needed */ + } else if (table->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) { + unsigned long min, max; + + min = is_int ? *(unsigned int *)table->extra1 +: *(unsigned long *)table->extra1; + max = is_int ? *(unsigned int *)table->extra2 +: *(unsigned long *)table->extra2; + range_err += (min > max); + } else { /* table->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE */ + + long min, max; + + min = is_int ? 
*(int *)table->extra1 +: *(long *)table->extra1; + max = is_int ? *(int *)table->extra2 +: *(long *)table->extra2; + range_err += (min > max); + } + + /* +* proc_handler and flag consistency check. +*/ + if (((table->proc_handler == proc_douintvec_minmax) || +(table->proc_handler == proc_doulongvec_minmax)) && + !(table->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE)) + range_err++; + + if ((table->proc_handler == proc_dointvec_minmax) && + !(table->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE)) + range_err++; + + if (range_err) + err |= sysctl_err(path, table, "Invalid range"); + } + return err; +} + static int sysctl_check_table(const char *path, struct ctl_table *table) { int err = 0; @@ -,6 +1169,8 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) { if (!table->data) err |= sysctl_err(path, table, "No data"); + if (table->flags) + err |= sysctl_check_flags(path, table); if (!table->maxlen) err |= sysctl_err(path, table, "No maxlen"); else -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
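The four checks listed in this patch description can be modeled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the code from the patch: `struct entry`, `enum handler_kind` and `entry_flags_valid` are hypothetical, both bounds are assumed to be present and stored as `long`, and the flag bit values mirror the enum proposed in patch 1/8 of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values follow the enum proposed in patch 1/8; names are shortened. */
#define CLAMP_SIGNED	(1u << 0)
#define CLAMP_UNSIGNED	(1u << 1)
#define CLAMP_BOTH	(CLAMP_SIGNED | CLAMP_UNSIGNED)
#define FLAGS_ALL	((1u << 2) - 1)

/* Hypothetical stand-in for "which kind of proc_handler is attached". */
enum handler_kind { HANDLER_SIGNED, HANDLER_UNSIGNED };

/* Hypothetical, reduced view of a table entry. */
struct entry {
	uint16_t flags;
	enum handler_kind handler;
	long min, max;		/* assumption: both bounds are provided */
};

static bool entry_flags_valid(const struct entry *e)
{
	/* 1) No unknown flag is allowed. */
	if (e->flags & ~FLAGS_ALL)
		return false;
	/* 3) The signed and unsigned flags are mutually exclusive. */
	if ((e->flags & CLAMP_BOTH) == CLAMP_BOTH)
		return false;
	/* Without a clamping flag there is nothing more to check. */
	if (!(e->flags & CLAMP_BOTH))
		return true;
	/* 2) The minimum cannot be larger than the maximum. */
	if (e->min > e->max)
		return false;
	/* 4) The handler must agree with the signedness of the flag. */
	if (e->handler == HANDLER_UNSIGNED && !(e->flags & CLAMP_UNSIGNED))
		return false;
	if (e->handler == HANDLER_SIGNED && !(e->flags & CLAMP_SIGNED))
		return false;
	return true;
}
```

Check 4 is the one that needs two distinct flags: with a single "clamp" bit there would be no way to notice a signed flag attached to an unsigned handler.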
[PATCH v6 4/8] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
A user can write arbitrary integer values to msgmni and shmmni sysctl parameters without getting error, but the actual limit is really IPCMNI (32k). This can mislead users as they think they can get a value that is not real. Enforcing the limit by failing the sysctl parameter write, however, can break existing user applications in case they are writing a value greater than 32k. Instead, the range clamping flag is set to enforce the limit without failing existing user code. Users can easily figure out if the sysctl parameter value is out of range by either reading back the parameter value or checking the kernel ring buffer for warning. Signed-off-by: Waiman Long--- ipc/ipc_sysctl.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 8ad93c2..d71f949 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, static int zero; static int one = 1; static int int_max = INT_MAX; +static int ipc_mni = IPCMNI; static struct ctl_table ipc_kern_table[] = { { @@ -120,7 +121,10 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, .data = _ipc_ns.shm_ctlmni, .maxlen = sizeof(init_ipc_ns.shm_ctlmni), .mode = 0644, - .proc_handler = proc_ipc_dointvec, + .proc_handler = proc_ipc_dointvec_minmax, + .extra1 = , + .extra2 = _mni, + .flags = CTL_FLAGS_CLAMP_SIGNED_RANGE, }, { .procname = "shm_rmid_forced", @@ -147,7 +151,8 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, .mode = 0644, .proc_handler = proc_ipc_dointvec_minmax, .extra1 = , - .extra2 = _max, + .extra2 = _mni, + .flags = CTL_FLAGS_CLAMP_SIGNED_RANGE, }, { .procname = "auto_msgmni", -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 1/8] sysctl: Add flags to support min/max range clamping
When minimum/maximum values are specified for a sysctl parameter in the ctl_table structure with the proc_dointvec_minmax() handler, an update to that parameter will fail with an error if the given value is outside of the required range.

There are use cases where it may be better to clamp the value of the sysctl parameter to the given range without failing the update, especially if the users are not aware of the actual range limits. Reading the value back after the update will now be a good practice to see if the provided value exceeds the range limits.

To provide this less restrictive form of range checking, a new flags field is added to the ctl_table structure. The new field is a 16-bit value that just fits into the hole left by the 16-bit umode_t field without increasing the size of the structure.

When either the CTL_FLAGS_CLAMP_SIGNED_RANGE or the CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag is set in the ctl_table entry, any update from userspace will be clamped to the given range without error if the proc_dointvec_minmax() or the proc_douintvec_minmax() handler is used, respectively.

In the case of proc_doulongvec_minmax(), an out-of-range input value is ignored by default, or clamped if the CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag is set.

The clamped value is either the maximum or minimum value that is closest to the input value provided by the user.

This patch, by itself, does not require the use of separate signed and unsigned flags. However, the use of separate flags allows us to perform more comprehensive checking in a later patch.

Extra braces are also used in this patch to make a later patch easier to read.
Signed-off-by: Waiman Long--- include/linux/sysctl.h | 32 ++ kernel/sysctl.c| 74 ++ 2 files changed, 94 insertions(+), 12 deletions(-) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b769ecf..3a628cf 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -116,6 +116,7 @@ struct ctl_table void *data; int maxlen; umode_t mode; + uint16_t flags; struct ctl_table *child;/* Deprecated */ proc_handler *proc_handler; /* Callback for text formatting */ struct ctl_table_poll *poll; @@ -123,6 +124,37 @@ struct ctl_table void *extra2; } __randomize_layout; +/** + * enum ctl_table_flags - flags for the ctl table (struct ctl_table.flags) + * + * @CTL_FLAGS_CLAMP_SIGNED_RANGE: Set to indicate that the entry holds a + * signed value and should be flexibly clamped to the provided + * min/max signed value in case the user provided a value outside + * of the given range. The clamped value is either the provided + * minimum or maximum value that is closest to the input value. + * No lower bound or upper bound checking will be done if the + * corresponding minimum or maximum value isn't provided. + * + * @CTL_FLAGS_CLAMP_UNSIGNED_RANGE: Set to indicate that the entry holds + * an unsigned value and should be flexibly clamped to the provided + * min/max unsigned value in case the user provided a value outside + * of the given range. The clamped value is either the provided + * minimum or maximum value that is closest to the input value. + * No lower bound or upper bound checking will be done if the + * corresponding minimum or maximum value isn't provided. + * + * At most 16 different flags are currently allowed. 
+ */ +enum ctl_table_flags { + CTL_FLAGS_CLAMP_SIGNED_RANGE= BIT(0), + CTL_FLAGS_CLAMP_UNSIGNED_RANGE = BIT(1), + __CTL_FLAGS_MAX = BIT(2), +}; + +#define CTL_FLAGS_CLAMP_RANGE (CTL_FLAGS_CLAMP_SIGNED_RANGE|\ +CTL_FLAGS_CLAMP_UNSIGNED_RANGE) +#define CTL_TABLE_FLAGS_ALL(__CTL_FLAGS_MAX - 1) + struct ctl_node { struct rb_node node; struct ctl_table_header *header; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 6a78cf7..5b84c1d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2515,6 +2515,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value + * @flags: pointer to flags * * The do_proc_dointvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2523,6 +2524,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, struct do_proc_dointvec_minmax_conv_param { int *min; int *max; + uint16_t *flags; }; static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, @@ -2532,9 +2534,23 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, struct do_proc_dointvec_minmax_conv_param
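The difference between the strict and the clamping behaviour described in this patch can be shown with a small user-space model. This is a sketch under assumptions, not the kernel conversion routine: `update_int_in_range` is a hypothetical name, and a plain `bool clamp` stands in for testing CTL_FLAGS_CLAMP_SIGNED_RANGE in the entry's flags.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a range-checked write to an int parameter.
 * min/max may be NULL, meaning no bound is checked on that side.
 * Returns 0 on success, -EINVAL if the value is rejected.
 */
static int update_int_in_range(int *valp, int input,
			       const int *min, const int *max, bool clamp)
{
	if (min && input < *min) {
		if (!clamp)
			return -EINVAL;	/* strict mode: fail the write */
		input = *min;		/* clamping mode: snap to minimum */
	}
	if (max && input > *max) {
		if (!clamp)
			return -EINVAL;
		input = *max;		/* clamping mode: snap to maximum */
	}
	*valp = input;
	return 0;
}
```

In clamping mode the write always succeeds, so the only way for a caller to learn that the value was adjusted is to read it back, which is the practice the commit message recommends.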
[PATCH v6 6/8] test_sysctl: Add range clamping test
Add a range clamping test to verify that the input value will be clamped if it exceeds the builtin maximum or minimum value. Below is the expected test run result: Running test: sysctl_test_0006 - run #0 Checking range minimum clamping ... ok Checking range maximum clamping ... ok Checking range minimum clamping ... ok Checking range maximum clamping ... ok Signed-off-by: Waiman Long--- lib/test_sysctl.c| 29 ++ tools/testing/selftests/sysctl/sysctl.sh | 52 2 files changed, 81 insertions(+) diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c index 3dd801c..3c619b9 100644 --- a/lib/test_sysctl.c +++ b/lib/test_sysctl.c @@ -38,12 +38,18 @@ static int i_zero; static int i_one_hundred = 100; +static int signed_min = -10; +static int signed_max = 10; +static unsigned int unsigned_min = 10; +static unsigned int unsigned_max = 30; struct test_sysctl_data { int int_0001; int int_0002; int int_0003[4]; + int range_0001; + unsigned int urange_0001; unsigned int uint_0001; char string_0001[65]; @@ -58,6 +64,9 @@ struct test_sysctl_data { .int_0003[2] = 2, .int_0003[3] = 3, + .range_0001 = 0, + .urange_0001 = 20, + .uint_0001 = 314, .string_0001 = "(none)", @@ -102,6 +111,26 @@ struct test_sysctl_data { .mode = 0644, .proc_handler = proc_dostring, }, + { + .procname = "range_0001", + .data = _data.range_0001, + .maxlen = sizeof(test_data.range_0001), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .flags = CTL_FLAGS_CLAMP_SIGNED_RANGE, + .extra1 = _min, + .extra2 = _max, + }, + { + .procname = "urange_0001", + .data = _data.urange_0001, + .maxlen = sizeof(test_data.urange_0001), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .flags = CTL_FLAGS_CLAMP_UNSIGNED_RANGE, + .extra1 = _min, + .extra2 = _max, + }, { } }; diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh index ec232c3..1aa1bba 100755 --- a/tools/testing/selftests/sysctl/sysctl.sh +++ b/tools/testing/selftests/sysctl/sysctl.sh @@ -34,6 +34,7 @@ 
ALL_TESTS="$ALL_TESTS 0002:1:1" ALL_TESTS="$ALL_TESTS 0003:1:1" ALL_TESTS="$ALL_TESTS 0004:1:1" ALL_TESTS="$ALL_TESTS 0005:3:1" +ALL_TESTS="$ALL_TESTS 0006:1:1" test_modprobe() { @@ -543,6 +544,38 @@ run_stringtests() test_rc } +# TARGET, RANGE_MIN & RANGE_MAX need to be defined before running test. +run_range_clamping_test() +{ + rc=0 + + echo -n "Checking range minimum clamping ... " + VAL=$((RANGE_MIN - 1)) + echo -n $VAL > "${TARGET}" 2> /dev/null + EXITVAL=$? + NEWVAL=$(cat "${TARGET}") + if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MIN ]]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + echo -n "Checking range maximum clamping ... " + VAL=$((RANGE_MAX + 1)) + echo -n $VAL > "${TARGET}" 2> /dev/null + EXITVAL=$? + NEWVAL=$(cat "${TARGET}") + if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MAX ]]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + test_rc +} + sysctl_test_0001() { TARGET="${SYSCTL}/int_0001" @@ -600,6 +633,25 @@ sysctl_test_0005() run_limit_digit_int_array } +sysctl_test_0006() +{ + TARGET="${SYSCTL}/range_0001" + ORIG=$(cat "${TARGET}") + RANGE_MIN=-10 + RANGE_MAX=10 + + run_range_clamping_test + set_orig + + TARGET="${SYSCTL}/urange_0001" + ORIG=$(cat "${TARGET}") + RANGE_MIN=10 + RANGE_MAX=30 + + run_range_clamping_test + set_orig +} + list_tests() { echo "Test ID list:" -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 5/8] ipc: Clamp semmni to the real IPCMNI limit
For SysV semaphores, the semmni value is the last element of the 4-element
sem number array. To make semmni behave in a similar way to msgmni and
shmmni, we can't directly use the _minmax handler. Instead, a special
sem-specific handler is added that checks the last element, clamps it to
the [0, IPCMNI] range and prints a rate-limited warning message when an
out-of-range value is written. This does require duplicating some of the
code in the _minmax handlers.

Signed-off-by: Waiman Long
---
 ipc/ipc_sysctl.c | 12 +++-
 ipc/sem.c        | 25 +
 ipc/util.h       |  4 
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index d71f949..478e634 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -88,12 +88,22 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 	return proc_dointvec_minmax(_table, write, buffer, lenp, ppos);
 }
 
+static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret = proc_ipc_dointvec(table, write, buffer, lenp, ppos);
+
+	sem_check_semmni(table, current->nsproxy->ipc_ns);
+	return ret;
+}
+
 #else
 #define proc_ipc_doulongvec_minmax NULL
 #define proc_ipc_dointvec NULL
 #define proc_ipc_dointvec_minmax NULL
 #define proc_ipc_dointvec_minmax_orphans NULL
 #define proc_ipc_auto_msgmni NULL
+#define proc_ipc_sem_dointvec NULL
 #endif
 
 static int zero;
@@ -177,7 +187,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.data = _ipc_ns.sem_ctls,
 		.maxlen = 4*sizeof(int),
 		.mode = 0644,
-		.proc_handler = proc_ipc_dointvec,
+		.proc_handler = proc_ipc_sem_dointvec,
 	},
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	{
diff --git a/ipc/sem.c b/ipc/sem.c
index 06be75d..96bdec6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2397,3 +2397,28 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_PROC_SYSCTL
+/*
+ * Check to see if semmni is out of range and clamp it if necessary.
+ */
+void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns)
+{
+	bool clamped = false;
+
+	/*
+	 * Clamp semmni to the range [0, IPCMNI].
+	 */
+	if (ns->sc_semmni < 0) {
+		ns->sc_semmni = 0;
+		clamped = true;
+	}
+	if (ns->sc_semmni > IPCMNI) {
+		ns->sc_semmni = IPCMNI;
+		clamped = true;
+	}
+	if (clamped)
+		pr_warn_ratelimited("sysctl: \"sem[3]\" was set out of range [%d, %d], clamped to %d.\n",
+			0, IPCMNI, ns->sc_semmni);
+}
+#endif
diff --git a/ipc/util.h b/ipc/util.h
index acc5159..7c20871 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -218,6 +218,10 @@ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 		void (*free)(struct ipc_namespace *, struct kern_ipc_perm *));
 
+#ifdef CONFIG_PROC_SYSCTL
+extern void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns);
+#endif
+
 #ifdef CONFIG_COMPAT
 #include
 struct compat_ipc_perm {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M
The maximum number of unique System V IPC identifiers was limited to 32k. That limit should be big enough for most use cases. However, there are some users out there requesting for more. To satisfy the need of those users, a new boot time kernel option "ipcmni_extend" is added to extend the IPCMNI value to 2M. This is a 64X increase which hopefully is big enough for them. This new option does have the side effect of reducing the maximum number of unique sequence numbers from 64k down to 1k. So it is a trade-off. Signed-off-by: Waiman Long--- Documentation/admin-guide/kernel-parameters.txt | 3 +++ ipc/ipc_sysctl.c| 12 +- ipc/util.c | 12 +- ipc/util.h | 30 ++--- 4 files changed, 42 insertions(+), 15 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 11fc28e..00bc0cb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1735,6 +1735,9 @@ ip= [IP_PNP] See Documentation/filesystems/nfs/nfsroot.txt. + ipcmni_extend [KNL] Extend the maximum number of unique System V + IPC identifiers from 32768 to 2097152. + irqaffinity=[SMP] Set the default irq affinity mask The argument is a cpu list, as described above. 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 478e634..4e2cb6d 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -109,7 +109,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, static int zero; static int one = 1; static int int_max = INT_MAX; -static int ipc_mni = IPCMNI; +int ipc_mni __read_mostly = IPCMNI; +int ipc_mni_shift __read_mostly = IPCMNI_SHIFT; static struct ctl_table ipc_kern_table[] = { { @@ -237,3 +238,12 @@ static int __init ipc_sysctl_init(void) } device_initcall(ipc_sysctl_init); + +static int __init ipc_mni_extend(char *str) +{ + ipc_mni = IPCMNI_EXTEND; + ipc_mni_shift = IPCMNI_EXTEND_SHIFT; + pr_info("IPCMNI extended to %d.\n", ipc_mni); + return 0; +} +early_param("ipcmni_extend", ipc_mni_extend); diff --git a/ipc/util.c b/ipc/util.c index 4e81182..782a8d0 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -113,7 +113,7 @@ static int __init ipc_init(void) * @ids: ipc identifier set * * Set up the sequence range to use for the ipc identifier range (limited - * below IPCMNI) then initialise the keys hashtable and ids idr. + * below ipc_mni) then initialise the keys hashtable and ids idr. 
*/ int ipc_init_ids(struct ipc_ids *ids) { @@ -214,7 +214,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, ids->next_id = -1; } - return SEQ_MULTIPLIER * new->seq + id; + return (new->seq << SEQ_SHIFT) + id; } #else @@ -228,7 +228,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, if (ids->seq > IPCID_SEQ_MAX) ids->seq = 0; - return SEQ_MULTIPLIER * new->seq + id; + return (new->seq << SEQ_SHIFT) + id; } #endif /* CONFIG_CHECKPOINT_RESTORE */ @@ -252,8 +252,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit) kgid_t egid; int id, err; - if (limit > IPCMNI) - limit = IPCMNI; + if (limit > ipc_mni) + limit = ipc_mni; if (!ids->tables_initialized || ids->in_use >= limit) return -ENOSPC; @@ -777,7 +777,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos, if (total >= ids->in_use) return NULL; - for (; pos < IPCMNI; pos++) { + for (; pos < ipc_mni; pos++) { ipc = idr_find(>ipcs_idr, pos); if (ipc != NULL) { *new_pos = pos + 1; diff --git a/ipc/util.h b/ipc/util.h index 7c20871..e4d14b6 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -15,8 +15,22 @@ #include #include -#define IPCMNI 32768 /* <= MAX_INT limit for ipc arrays (including sysctl changes) */ -#define SEQ_MULTIPLIER (IPCMNI) +/* + * By default, the ipc arrays can have up to 32k (15 bits) entries. + * When IPCMNI extension mode is turned on, the ipc arrays can have up + * to 2M (21 bits) entries. However, the space for sequence number will + * be shrunk from 16 bits to 10 bits. + */ +#define IPCMNI_SHIFT 15 +#define IPCMNI_EXTEND_SHIFT21 +#define IPCMNI (1 << IPCMNI_SHIFT) +#define IPCMNI_EXTEND (1 << IPCMNI_EXTEND_SHIFT) + +extern int ipc_mni; +extern int ipc_mni_shift; + +#define SEQ_SHIFT ipc_mni_shift +#define SEQ_MASK ((1 << ipc_mni_shift) - 1) int sem_init(void); int msg_init(void); @@ -96,9 +110,9 @@ void __init ipc_init_proc_interface(const
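The trade-off described in this patch (more identifiers, fewer sequence numbers) can be illustrated with a small userspace sketch. It is not kernel code: the two shift values are taken from the patch, while build_id(), id_to_idx(), id_to_seq() and seq_values() are hypothetical helper names, and the assumption that an ID must stay a positive 32-bit int is made only for this illustration.

```c
#include <assert.h>

/* Shift values taken from the patch above. */
#define IPCMNI_SHIFT        15
#define IPCMNI_EXTEND_SHIFT 21

/* Hypothetical helper: pack a sequence number and an array index into one ID,
 * mirroring "(new->seq << SEQ_SHIFT) + id" from the patch. */
static int build_id(int seq, int idx, int shift)
{
	assert(idx >= 0 && idx < (1 << shift));
	return (seq << shift) + idx;
}

/* Hypothetical helpers: split an ID back into its two parts. */
static int id_to_idx(int id, int shift)
{
	return id & ((1 << shift) - 1);
}

static int id_to_seq(int id, int shift)
{
	return id >> shift;
}

/* Number of sequence values that fit above the index bits, assuming the
 * whole ID has to remain a positive 32-bit int (31 usable bits). */
static int seq_values(int shift)
{
	return 1 << (31 - shift);
}
```

With a 15-bit index there is room for 65536 sequence values; with a 21-bit index only 1024 remain, which matches the "64k down to 1k" figure in the commit message.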
[PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode
The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as much as possible. With extended IPCMNI mode, the number of usable sequence numbers is greatly reduced leading to higher chance of ID reuse. To address this issue, we need to conserve the sequence number space as much as possible. Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when one or more IDs have been removed previously to make sure that those IDs will not be reused when a new one is built. This is being done in the extended IPCMNI mode, Signed-off-by: Waiman Long--- include/linux/ipc_namespace.h | 1 + ipc/ipc_sysctl.c | 2 ++ ipc/util.c| 29 ++--- ipc/util.h| 1 + 4 files changed, 26 insertions(+), 7 deletions(-) diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index b5630c8..9c86fd9 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -16,6 +16,7 @@ struct ipc_ids { int in_use; unsigned short seq; + unsigned short deleted; bool tables_initialized; struct rw_semaphore rwsem; struct idr ipcs_idr; diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 4e2cb6d..b7fb38c 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -111,6 +111,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, static int int_max = INT_MAX; int ipc_mni __read_mostly = IPCMNI; int ipc_mni_shift __read_mostly = IPCMNI_SHIFT; +bool ipc_mni_extended __read_mostly; static struct ctl_table ipc_kern_table[] = { { @@ -243,6 +244,7 @@ static int __init ipc_mni_extend(char *str) { ipc_mni = IPCMNI_EXTEND; ipc_mni_shift = IPCMNI_EXTEND_SHIFT; + ipc_mni_extended = true; pr_info("IPCMNI extended to %d.\n", ipc_mni); return 0; } diff --git a/ipc/util.c b/ipc/util.c index 782a8d0..7c8e733 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -119,7 +119,8 @@ int ipc_init_ids(struct ipc_ids *ids) { int err; ids->in_use = 0; - ids->seq = 0; + ids->deleted = 
false; + ids->seq = ipc_mni_extended ? 0 : -1; /* seq # is pre-incremented */ init_rwsem(>rwsem); err = rhashtable_init(>key_ht, _kht_params); if (err) @@ -193,6 +194,11 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key) return NULL; } +/* + * To conserve sequence number space with extended ipc_mni when new ID + * is built, the sequence number is incremented only when one or more + * IDs have been removed previously. + */ #ifdef CONFIG_CHECKPOINT_RESTORE /* * Specify desired id for next allocated IPC object. @@ -206,9 +212,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, struct kern_ipc_perm *new) { if (ids->next_id < 0) { /* default, behave as !CHECKPOINT_RESTORE */ - new->seq = ids->seq++; - if (ids->seq > IPCID_SEQ_MAX) - ids->seq = 0; + if (!ipc_mni_extended || ids->deleted) { + ids->seq++; + if (ids->seq > IPCID_SEQ_MAX) + ids->seq = 0; + ids->deleted = false; + } + new->seq = ids->seq; } else { new->seq = ipcid_to_seqx(ids->next_id); ids->next_id = -1; @@ -224,9 +234,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, static inline int ipc_buildid(int id, struct ipc_ids *ids, struct kern_ipc_perm *new) { - new->seq = ids->seq++; - if (ids->seq > IPCID_SEQ_MAX) - ids->seq = 0; + if (!ipc_mni_extended || ids->deleted) { + ids->seq++; + if (ids->seq > IPCID_SEQ_MAX) + ids->seq = 0; + ids->deleted = false; + } + new->seq = ids->seq; return (new->seq << SEQ_SHIFT) + id; } @@ -436,6 +450,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp) idr_remove(>ipcs_idr, lid); ipc_kht_remove(ids, ipcp); ids->in_use--; + ids->deleted = true; ipcp->deleted = true; if (unlikely(lid == ids->max_id)) { diff --git a/ipc/util.h b/ipc/util.h index e4d14b6..54a86fc 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -28,6 +28,7 @@ extern int ipc_mni; extern int ipc_mni_shift; +extern bool ipc_mni_extended; #define SEQ_SHIFT ipc_mni_shift #define SEQ_MASK ((1 << ipc_mni_shift) - 1) -- 1.8.3.1 -- To unsubscribe from this 
list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
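As a rough illustration of the conservation scheme in this patch, the following userspace model mimics the new ipc_buildid() sequence logic. It is a sketch under assumptions: struct ids_model and its helpers are hypothetical names, and seq_max stands in for IPCID_SEQ_MAX, whose definition is not shown in this mail.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the relevant struct ipc_ids fields. */
struct ids_model {
	unsigned short seq;
	bool deleted;
	bool extended;	/* models ipc_mni_extended */
	int seq_max;	/* stand-in for IPCID_SEQ_MAX (assumed value) */
};

static void ids_init(struct ids_model *ids, bool extended, int seq_max)
{
	ids->extended = extended;
	ids->seq_max = seq_max;
	ids->deleted = false;
	/* In default mode seq is pre-incremented, so start one below zero. */
	ids->seq = extended ? 0 : (unsigned short)-1;
}

/* Sequence number handed to the next new ID. */
static unsigned short next_seq(struct ids_model *ids)
{
	if (!ids->extended || ids->deleted) {
		ids->seq++;
		if (ids->seq > ids->seq_max)
			ids->seq = 0;
		ids->deleted = false;
	}
	return ids->seq;
}

/* Models the effect of ipc_rmid() on the sequence bookkeeping. */
static void remove_id(struct ids_model *ids)
{
	ids->deleted = true;
}
```

In this model, default mode hands out 0, 1, 2, ... for successive IDs, while extended mode keeps returning the same sequence number until remove_id() has been called at least once.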
[PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
v5->v6:
 - Consolidate the 3 ctl_table flags into 2.
 - Make similar changes to proc_doulongvec_minmax() and its associates
   to complete the clamping change.
 - Remove the sysctl registration failure test patch for now for later
   consideration.
 - Add extra braces to patch 1 to reduce code diff in a later patch.

v4->v5:
 - Revert the flags back to 16-bit so that there will be no change to
   the size of ctl_table.
 - Enhance the sysctl_check_flags() as requested by Luis to perform
   more checks to spot incorrect ctl_table entries.
 - Change the sysctl selftest to use dummy sysctls instead of production
   ones & enhance it to do more checks.
 - Add one more sysctl selftest for registration failure.
 - Add 2 ipc patches to add an extended mode to increase IPCMNI from
   32k to 2M.
 - Miscellaneous changes to incorporate feedback comments from
   reviewers.

v3->v4:
 - Remove v3 patches 1 & 2 as they have been merged into the mm tree.
 - Change flags from uint16_t to unsigned int.
 - Remove CTL_FLAGS_OOR_WARNED and use pr_warn_ratelimited() instead.
 - Simplify the warning message code.
 - Add a new patch to fail the ctl_table registration with invalid flag.
 - Add a test case for range clamping in sysctl selftest.

v2->v3:
 - Fix kdoc comment errors.
 - Incorporate comments and suggestions from Luis R. Rodriguez.
 - Add a patch to fix a typo error in fs/proc/proc_sysctl.c.

v1->v2:
 - Add kdoc comments to the do_proc_do{u}intvec_minmax_conv_param
   structures.
 - Add a new flags field to the ctl_table structure for specifying
   whether range clamping should be activated instead of adding new
   sysctl parameter handlers.
 - Clamp the semmni value embedded in the multi-values sem parameter.

v1 patch: https://lkml.org/lkml/2018/2/19/453
v2 patch: https://lkml.org/lkml/2018/2/27/627
v3 patch: https://lkml.org/lkml/2018/3/1/716
v4 patch: https://lkml.org/lkml/2018/3/12/867
v5 patch: https://lkml.org/lkml/2018/3/16/1106

The sysctl parameters msgmni, shmmni and semmni have an inherent limit
of IPCMNI (32k). However, users may not be aware of that because they
can write a value much higher than that without getting any error or
notification. Reading the parameters back will show the newly written
values, which are not real.

Enforcing the limit by failing the sysctl parameter write, however, may
cause regressions if existing user setup scripts set those parameters
above 32k, as those scripts will now fail.

To address this dilemma, a new flags field is introduced into the
ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any ctl_table
entry to enable a looser range clamping without returning any error.
For example,

  .flags = CTL_FLAGS_CLAMP_RANGE,

This flag value is now used for the range checking of shmmni, msgmni
and semmni without breaking existing applications. If any out-of-range
value is written to those sysctl parameters, the following warning will
be printed instead.

  sysctl: "shmmni" was set out of range [0, 32768], clamped to 32768.

Reading the values back will show 32768 instead of some fake values.

New sysctl selftests are added to exercise the new code added by this
patchset.

There are users out there requesting an increase in the IPCMNI value.
The last 2 patches attempt to do that by using a boot kernel parameter
"ipcmni_extend" to increase the IPCMNI limit from 32k to 2M.

Eric Biederman had posted an RFC patch to just scrap the IPCMNI limit
and open up the whole positive integer space for IPC IDs. A major issue
that I have with this approach is that SysV IPC has been in use for over
20 years. We just don't know if there are user applications that depend
on the way that the IDs are built. So a drastic change like this may
have the potential of breaking some applications.

I prefer a more conservative approach where users will observe no change
in behavior unless they explicitly opt in to enable the extended mode.
I could open up the whole positive integer space in this case like what
Eric did, but that will make the code more complex. So I just extend
IPCMNI to 2M in this case and keep similar ID generation logic.

Waiman Long (8):
  sysctl: Add flags to support min/max range clamping
  proc/sysctl: Provide additional ctl_table.flags checks
  sysctl: Warn when a clamped sysctl parameter is set out of range
  ipc: Clamp msgmni and shmmni to the real IPCMNI limit
  ipc: Clamp semmni to the real IPCMNI limit
  test_sysctl: Add range clamping test
  ipc: Allow boot time extension of IPCMNI from 32k to 2M
  ipc: Conserve sequence numbers in extended IPCMNI mode

 Documentation/admin-guide/kernel-parameters.txt | 3 +
 fs/proc/proc_sysctl.c                           | 60 ++
 include/linux/ipc_namespace.h                   | 1 +
 include/linux/sysctl.h                          | 32
 ipc/ipc_sysctl.c                                | 33 +++-
 ipc/sem.c                                       | 25 ++
 ipc/util.c
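The "clamp instead of fail" behavior described in this cover letter can be sketched in plain userspace C. This is only an illustration of the idea, not the sysctl implementation: clamp_write() is a hypothetical helper, and the message format is copied from the warning quoted in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: accept any written value, clamp it into [min, max]
 * and report through *clamped whether clamping was needed. No error is
 * returned, so existing setup scripts keep working. */
static int clamp_write(const char *name, int val, int min, int max,
		       bool *clamped)
{
	int out = val;

	assert(min <= max);
	if (out < min)
		out = min;
	if (out > max)
		out = max;

	*clamped = (out != val);
	if (*clamped)
		fprintf(stderr,
			"sysctl: \"%s\" was set out of range [%d, %d], clamped to %d.\n",
			name, min, max, out);
	return out;
}
```

A write of 100000 to a parameter limited to [0, 32768] would store 32768 and emit the warning, while an in-range write is stored unchanged and stays silent.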
[PATCH 0/3] Better integrate seccomp logging and auditing
Seccomp received improved logging controls in v4.14. Applications can opt
into logging of "handled" actions (SECCOMP_RET_TRAP, SECCOMP_RET_TRACE,
SECCOMP_RET_ERRNO) using the SECCOMP_FILTER_FLAG_LOG bit when loading
filters. They can also debug filter matching with the new SECCOMP_RET_LOG
action. Administrators can prevent specific actions from being logged using
the kernel.seccomp.actions_logged sysctl.

However, one corner case intentionally wasn't addressed in those v4.14
changes. When a process is being inspected by the audit subsystem, seccomp's
decision making for logging ignores the new controls and unconditionally logs
every action taken except for SECCOMP_RET_ALLOW. This isn't particularly
useful since many existing applications don't intend to log handled actions
due to them occurring very frequently. This amount of logging fills the audit
logs without providing many benefits now that application authors have fine
grained controls at their disposal.

This patch set aligns the seccomp logging behavior for both audited and
unaudited processes. It also emits an audit record, if auditing is enabled,
when the kernel.seccomp.actions_logged sysctl is written to so that there's a
paper trail when entire actions are quieted.

Tyler
[PATCH 1/3] seccomp: Separate read and write code for actions_logged sysctl
Break the read and write paths of the kernel.seccomp.actions_logged sysctl into separate functions to maintain readability. An upcoming change will need to audit writes, but not reads, of this sysctl which would introduce too many conditional code paths on whether or not the 'write' parameter evaluates to true. Signed-off-by: Tyler Hicks--- kernel/seccomp.c | 60 +++- 1 file changed, 38 insertions(+), 22 deletions(-) diff --git a/kernel/seccomp.c b/kernel/seccomp.c index dc77548..f4afe67 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -1199,48 +1199,64 @@ static bool seccomp_actions_logged_from_names(u32 *actions_logged, char *names) return true; } -static int seccomp_actions_logged_handler(struct ctl_table *ro_table, int write, - void __user *buffer, size_t *lenp, - loff_t *ppos) +static int read_actions_logged(struct ctl_table *ro_table, void __user *buffer, + size_t *lenp, loff_t *ppos) { char names[sizeof(seccomp_actions_avail)]; struct ctl_table table; + + memset(names, 0, sizeof(names)); + + if (!seccomp_names_from_actions_logged(names, sizeof(names), + seccomp_actions_logged)) + return -EINVAL; + + table = *ro_table; + table.data = names; + table.maxlen = sizeof(names); + return proc_dostring(, 0, buffer, lenp, ppos); +} + +static int write_actions_logged(struct ctl_table *ro_table, void __user *buffer, + size_t *lenp, loff_t *ppos) +{ + char names[sizeof(seccomp_actions_avail)]; + struct ctl_table table; + u32 actions_logged; int ret; - if (write && !capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN)) return -EPERM; memset(names, 0, sizeof(names)); - if (!write) { - if (!seccomp_names_from_actions_logged(names, sizeof(names), - seccomp_actions_logged)) - return -EINVAL; - } - table = *ro_table; table.data = names; table.maxlen = sizeof(names); - ret = proc_dostring(, write, buffer, lenp, ppos); + ret = proc_dostring(, 1, buffer, lenp, ppos); if (ret) return ret; - if (write) { - u32 actions_logged; - - if 
(!seccomp_actions_logged_from_names(_logged, - table.data)) - return -EINVAL; - - if (actions_logged & SECCOMP_LOG_ALLOW) - return -EINVAL; + if (!seccomp_actions_logged_from_names(_logged, table.data)) + return -EINVAL; - seccomp_actions_logged = actions_logged; - } + if (actions_logged & SECCOMP_LOG_ALLOW) + return -EINVAL; + seccomp_actions_logged = actions_logged; return 0; } +static int seccomp_actions_logged_handler(struct ctl_table *ro_table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + if (write) + return write_actions_logged(ro_table, buffer, lenp, ppos); + else + return read_actions_logged(ro_table, buffer, lenp, ppos); +} + static struct ctl_path seccomp_sysctl_path[] = { { .procname = "kernel", }, { .procname = "seccomp", }, -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl
The decision to log a seccomp action will always be subject to the value of the kernel.seccomp.actions_logged sysctl, even for processes that are being inspected via the audit subsystem, in an upcoming patch. Therefore, we need to emit an audit record on attempts at writing to the actions_logged sysctl when auditing is enabled. This patch updates the write handler for the actions_logged sysctl to emit an audit record on attempts to write to the sysctl. Successful writes to the sysctl will result in a record that includes a normalized list of logged actions in the "actions" field and a "res" field equal to 0. Unsuccessful writes to the sysctl will result in a record that doesn't include the "actions" field and has a "res" field equal to 1. Not all unsuccessful writes to the sysctl are audited. For example, an audit record will not be emitted if an unprivileged process attempts to open the sysctl file for reading since that access control check is not part of the sysctl's write handler. Below are some example audit records when writing various strings to the actions_logged sysctl. Writing "not-a-real-action" emits: type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0 auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" op=seccomp-logging res=1 Writing "kill_process kill_thread errno trace log" emits: type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0 auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" op=seccomp-logging actions="kill_process kill_thread errno trace log" res=0 Writing the string "log log errno trace kill_process kill_thread", which is unordered and contains the log action twice, results in the same value as the previous example for the actions field: type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0 auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" op=seccomp-logging actions="kill_process kill_thread errno trace log" res=0 No audit records are generated when reading the actions_logged sysctl. 
Suggested-by: Steve GrubbSigned-off-by: Tyler Hicks --- include/linux/audit.h | 3 +++ kernel/auditsc.c | 37 + kernel/seccomp.c | 43 ++- 3 files changed, 74 insertions(+), 9 deletions(-) diff --git a/include/linux/audit.h b/include/linux/audit.h index 75d5b03..b311d7d 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent, const struct dentry *dentry, const unsigned char type); extern void __audit_seccomp(unsigned long syscall, long signr, int code); +extern void audit_seccomp_actions_logged(const char *names, int res); extern void __audit_ptrace(struct task_struct *t); static inline bool audit_dummy_context(void) @@ -502,6 +503,8 @@ static inline void __audit_seccomp(unsigned long syscall, long signr, int code) { } static inline void audit_seccomp(unsigned long syscall, long signr, int code) { } +static inline void audit_seccomp_actions_logged(const char *names, int res) +{ } static inline int auditsc_get_stamp(struct audit_context *ctx, struct timespec64 *t, unsigned int *serial) { diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 4e0a4ac..3496238 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -2478,6 +2478,43 @@ void __audit_seccomp(unsigned long syscall, long signr, int code) audit_log_end(ab); } +void audit_seccomp_actions_logged(const char *names, int res) +{ + struct tty_struct *tty; + const struct cred *cred; + struct audit_buffer *ab; + char comm[sizeof(current->comm)]; + + if (!audit_enabled) + return; + + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE); + if (unlikely(!ab)) + return; + + cred = current_cred(); + tty = audit_get_tty(current); + audit_log_format(ab, "pid=%d uid=%u auid=%u tty=%s ses=%u", +task_tgid_nr(current), +from_kuid(_user_ns, cred->uid), +from_kuid(_user_ns, +audit_get_loginuid(current)), +tty ? 
tty_name(tty) : "(none)", +audit_get_sessionid(current)); + audit_put_tty(tty); + audit_log_task_context(ab); + audit_log_format(ab, " comm="); + audit_log_untrustedstring(ab, get_task_comm(comm, current)); + audit_log_d_path_exe(ab, current->mm); + audit_log_format(ab, " op=seccomp-logging"); + + if (names) + audit_log_format(ab, " actions=\"%s\"", names); + + audit_log_format(ab, " res=%d", res); + audit_log_end(ab); +} + struct list_head *audit_killed_trees(void) { struct audit_context *ctx = current->audit_context; diff --git a/kernel/seccomp.c
[PATCH 3/3] seccomp: Don't special case audited processes when logging
Seccomp logging for "handled" actions such as RET_TRAP, RET_TRACE, or RET_ERRNO can be very noisy for processes that are being audited. This patch modifies the seccomp logging behavior to treat processes that are being inspected via the audit subsystem the same as processes that aren't under inspection. Handled actions will no longer be logged just because the process is being inspected. Since v4.14, applications have the ability to request logging of handled actions by using the SECCOMP_FILTER_FLAG_LOG flag when loading seccomp filters. With this patch, the logic for deciding if an action will be logged is: if action == RET_ALLOW: do not log else if action not in actions_logged: do not log else if action == RET_KILL: log else if action == RET_LOG: log else if filter-requests-logging: log else: do not log Reported-by: Steve GrubbSigned-off-by: Tyler Hicks --- Documentation/userspace-api/seccomp_filter.rst | 7 --- include/linux/audit.h | 10 +- kernel/auditsc.c | 2 +- kernel/seccomp.c | 15 +-- 4 files changed, 7 insertions(+), 27 deletions(-) diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst index 099c412..82a468b 100644 --- a/Documentation/userspace-api/seccomp_filter.rst +++ b/Documentation/userspace-api/seccomp_filter.rst @@ -207,13 +207,6 @@ directory. Here's a description of each file in that directory: to the file do not need to be in ordered form but reads from the file will be ordered in the same way as the actions_avail sysctl. - It is important to note that the value of ``actions_logged`` does not - prevent certain actions from being logged when the audit subsystem is - configured to audit a task. If the action is not found in - ``actions_logged`` list, the final decision on whether to audit the - action for that task is ultimately left up to the audit subsystem to - decide for all seccomp return values other than ``SECCOMP_RET_ALLOW``. 
- The ``allow`` string is not accepted in the ``actions_logged`` sysctl as it is not possible to log ``SECCOMP_RET_ALLOW`` actions. Attempting to write ``allow`` to the sysctl will result in an EINVAL being diff --git a/include/linux/audit.h b/include/linux/audit.h index b311d7d..1964fbd 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -232,7 +232,7 @@ extern void __audit_file(const struct file *); extern void __audit_inode_child(struct inode *parent, const struct dentry *dentry, const unsigned char type); -extern void __audit_seccomp(unsigned long syscall, long signr, int code); +extern void audit_seccomp(unsigned long syscall, long signr, int code); extern void audit_seccomp_actions_logged(const char *names, int res); extern void __audit_ptrace(struct task_struct *t); @@ -303,12 +303,6 @@ static inline void audit_inode_child(struct inode *parent, } void audit_core_dumps(long signr); -static inline void audit_seccomp(unsigned long syscall, long signr, int code) -{ - if (audit_enabled && unlikely(!audit_dummy_context())) - __audit_seccomp(syscall, signr, code); -} - static inline void audit_ptrace(struct task_struct *t) { if (unlikely(!audit_dummy_context())) @@ -499,8 +493,6 @@ static inline void audit_inode_child(struct inode *parent, { } static inline void audit_core_dumps(long signr) { } -static inline void __audit_seccomp(unsigned long syscall, long signr, int code) -{ } static inline void audit_seccomp(unsigned long syscall, long signr, int code) { } static inline void audit_seccomp_actions_logged(const char *names, int res) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 3496238..1e64b91 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -2464,7 +2464,7 @@ void audit_core_dumps(long signr) audit_log_end(ab); } -void __audit_seccomp(unsigned long syscall, long signr, int code) +void audit_seccomp(unsigned long syscall, long signr, int code) { struct audit_buffer *ab; diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 
e28ddcc..947cc0f 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -584,18 +584,13 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action, } /* -* Force an audit message to be emitted when the action is RET_KILL_*, -* RET_LOG, or the FILTER_FLAG_LOG bit was set and the action is -* allowed to be logged by the admin. +* Emit an audit message when the action is RET_KILL_*, RET_LOG, or the +* FILTER_FLAG_LOG bit was set. The admin has the ability to silence +* any action from being logged by removing the action name from the +* seccomp_actions_logged
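The decision logic listed in this commit message can be written out as a small userspace sketch. The enum values and should_log() are hypothetical stand-ins for illustration only; they are not the kernel's SECCOMP_RET_* constants or its seccomp_log() implementation.

```c
#include <stdbool.h>

/* Hypothetical action identifiers, used as bit positions in actions_logged. */
enum action {
	ACT_ALLOW,
	ACT_KILL,
	ACT_TRAP,
	ACT_ERRNO,
	ACT_TRACE,
	ACT_LOG
};

/* Returns true when the action should be logged, following the order of
 * checks given in the commit message. */
static bool should_log(enum action a, unsigned int actions_logged,
		       bool filter_requests_logging)
{
	if (a == ACT_ALLOW)
		return false;			/* never logged */
	if (!(actions_logged & (1u << a)))
		return false;			/* silenced by the admin */
	if (a == ACT_KILL || a == ACT_LOG)
		return true;			/* logged when permitted */
	return filter_requests_logging;	/* handled actions: opt-in only */
}
```

Note that whether the process is being audited does not appear as an input at all, which is the point of the patch.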
Re: [PATCH 2/4] fpga: manager: change api, don't use drvdata
On 04/26/2018 06:26 PM, Moritz Fischer wrote:
> From: Alan Tull
>
> Change fpga_mgr_register to not set or use drvdata. This supports
> the case where a PCIe device has more than one manager.
>
> Add fpga_mgr_create/free functions. Change fpga_mgr_register and
> fpga_mgr_unregister functions to take the mgr struct as their only
> parameter.
>
> struct fpga_manager *fpga_mgr_create(struct device *dev,
>                                      const char *name,
>                                      const struct fpga_manager_ops *mops,
>                                      void *priv);
> void fpga_mgr_free(struct fpga_manager *mgr);
> int fpga_mgr_register(struct fpga_manager *mgr);
> void fpga_mgr_unregister(struct fpga_manager *mgr);
>
> Update the drivers that call fpga_mgr_register with the new API.

Apologies for chiming in so late. This commit does not make it clear that
fpga_mgr_unregister() now also frees the 'mgr' argument by calling
fpga_mgr_free(). This is kind of a detail, but an API should make that clear
IMHO.

Thanks
--
Florian
[PATCH 00/10] Add MSI-X support on pcitest tool
Depends on the following series [1].

Add MSI-X support to the pcitest tool.

Add new callback methods and handlers to trigger the MSI-X interrupts on
the EP DesignWare IP driver.

Allow setting/getting the MSI-X EP maximum capability number.

Rework the MSI set/get and trigger methods on the EP DesignWare IP driver.

Add a new input parameter (msix) to the pcitest tool to test the MSI-X
feature.

Update the pcitest.sh script to support MSI-X feature tests.

[1] -> https://lkml.org/lkml/2018/4/27/342

Gustavo Pimentel (10):
  PCI: endpoint: Add MSI-X interfaces
  PCI: dwc: Add MSI-X callbacks handler
  PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
  PCI: dwc: Rework MSI callbacks handler
  PCI: dwc: Add legacy interrupt callback handler
  misc: pci_endpoint_test: Add MSI-X support
  misc: pci_endpoint_test: Replace lower into upper case characters
  PCI: endpoint: functions/pci-epf-test: Replace lower into upper case characters
  tools: PCI: Add MSI-X support
  misc: pci_endpoint_test: Use pci_irq_vector function

 Documentation/misc-devices/pci-endpoint-test.txt | 3 +
 drivers/misc/pci_endpoint_test.c                 | 147 +++--
 drivers/pci/cadence/pcie-cadence-ep.c            | 2 +-
 drivers/pci/dwc/pci-dra7xx.c                     | 2 +-
 drivers/pci/dwc/pcie-artpec6.c                   | 2 +-
 drivers/pci/dwc/pcie-designware-ep.c             | 202 +--
 drivers/pci/dwc/pcie-designware-plat.c           | 7 +-
 drivers/pci/dwc/pcie-designware.h                | 31 ++--
 drivers/pci/endpoint/functions/pci-epf-test.c    | 104
 drivers/pci/endpoint/pci-ep-cfs.c                | 24 +++
 drivers/pci/endpoint/pci-epc-core.c              | 60 ++-
 include/linux/pci-epc.h                          | 11 +-
 include/linux/pci-epf.h                          | 1 +
 include/uapi/linux/pcitest.h                     | 1 +
 tools/pci/pcitest.c                              | 18 +-
 tools/pci/pcitest.sh                             | 25 +++
 16 files changed, 517 insertions(+), 123 deletions(-)

-- 
2.7.4
[PATCH 08/10] PCI: endpoint: functions/pci-epf-test: Replace lower into upper case characters
Replace all initial lower case character into upper case in comments and debug printks. Signed-off-by: Gustavo Pimentel--- drivers/pci/endpoint/functions/pci-epf-test.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c index be5547f..e9ff4aa 100644 --- a/drivers/pci/endpoint/functions/pci-epf-test.c +++ b/drivers/pci/endpoint/functions/pci-epf-test.c @@ -92,7 +92,7 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test) src_addr = pci_epc_mem_alloc_addr(epc, _phys_addr, reg->size); if (!src_addr) { - dev_err(dev, "failed to allocate source address\n"); + dev_err(dev, "Failed to allocate source address\n"); reg->status = STATUS_SRC_ADDR_INVALID; ret = -ENOMEM; goto err; @@ -101,14 +101,14 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test) ret = pci_epc_map_addr(epc, epf->func_no, src_phys_addr, reg->src_addr, reg->size); if (ret) { - dev_err(dev, "failed to map source address\n"); + dev_err(dev, "Failed to map source address\n"); reg->status = STATUS_SRC_ADDR_INVALID; goto err_src_addr; } dst_addr = pci_epc_mem_alloc_addr(epc, _phys_addr, reg->size); if (!dst_addr) { - dev_err(dev, "failed to allocate destination address\n"); + dev_err(dev, "Failed to allocate destination address\n"); reg->status = STATUS_DST_ADDR_INVALID; ret = -ENOMEM; goto err_src_map_addr; @@ -117,7 +117,7 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test) ret = pci_epc_map_addr(epc, epf->func_no, dst_phys_addr, reg->dst_addr, reg->size); if (ret) { - dev_err(dev, "failed to map destination address\n"); + dev_err(dev, "Failed to map destination address\n"); reg->status = STATUS_DST_ADDR_INVALID; goto err_dst_addr; } @@ -154,7 +154,7 @@ static int pci_epf_test_read(struct pci_epf_test *epf_test) src_addr = pci_epc_mem_alloc_addr(epc, _addr, reg->size); if (!src_addr) { - dev_err(dev, "failed to allocate address\n"); + dev_err(dev, "Failed to allocate 
address\n"); reg->status = STATUS_SRC_ADDR_INVALID; ret = -ENOMEM; goto err; @@ -163,7 +163,7 @@ static int pci_epf_test_read(struct pci_epf_test *epf_test) ret = pci_epc_map_addr(epc, epf->func_no, phys_addr, reg->src_addr, reg->size); if (ret) { - dev_err(dev, "failed to map address\n"); + dev_err(dev, "Failed to map address\n"); reg->status = STATUS_SRC_ADDR_INVALID; goto err_addr; } @@ -206,7 +206,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test) dst_addr = pci_epc_mem_alloc_addr(epc, _addr, reg->size); if (!dst_addr) { - dev_err(dev, "failed to allocate address\n"); + dev_err(dev, "Failed to allocate address\n"); reg->status = STATUS_DST_ADDR_INVALID; ret = -ENOMEM; goto err; @@ -215,7 +215,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test) ret = pci_epc_map_addr(epc, epf->func_no, phys_addr, reg->dst_addr, reg->size); if (ret) { - dev_err(dev, "failed to map address\n"); + dev_err(dev, "Failed to map address\n"); reg->status = STATUS_DST_ADDR_INVALID; goto err_addr; } @@ -409,7 +409,7 @@ static int pci_epf_test_set_bar(struct pci_epf *epf) ret = pci_epc_set_bar(epc, epf->func_no, epf_bar); if (ret) { pci_epf_free_space(epf, epf_test->reg[bar], bar); - dev_err(dev, "failed to set BAR%d\n", bar); + dev_err(dev, "Failed to set BAR%d\n", bar); if (bar == test_reg_bar) return ret; } @@ -436,7 +436,7 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf) base = pci_epf_alloc_space(epf, sizeof(struct pci_epf_test_reg), test_reg_bar); if (!base) { - dev_err(dev, "failed to allocated register space\n"); + dev_err(dev, "Failed to allocated register space\n"); return -ENOMEM; } epf_test->reg[test_reg_bar] = base; @@ -446,7 +446,7 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf) continue; base =
[PATCH 05/10] PCI: dwc: Add legacy interrupt callback handler
Add a legacy interrupt callback handler. Currently the DesignWare IP doesn't
allow triggering the legacy interrupt.

Signed-off-by: Gustavo Pimentel
---
 drivers/pci/dwc/pcie-designware-ep.c   | 10 ++
 drivers/pci/dwc/pcie-designware-plat.c |  3 +--
 drivers/pci/dwc/pcie-designware.h      |  6 ++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
index 1ba3a7f..b599169 100644
--- a/drivers/pci/dwc/pcie-designware-ep.c
+++ b/drivers/pci/dwc/pcie-designware-ep.c
@@ -370,6 +370,16 @@ static const struct pci_epc_ops epc_ops = {
 	.stop = dw_pcie_ep_stop,
 };

+int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+	struct device *dev = pci->dev;
+
+	dev_err(dev, "EP cannot trigger legacy IRQs\n");
+
+	return -EINVAL;
+}
+
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 			     u8 interrupt_num)
 {
diff --git a/drivers/pci/dwc/pcie-designware-plat.c b/drivers/pci/dwc/pcie-designware-plat.c
index 654dcb5..90a8c95 100644
--- a/drivers/pci/dwc/pcie-designware-plat.c
+++ b/drivers/pci/dwc/pcie-designware-plat.c
@@ -84,8 +84,7 @@ static int dw_plat_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,

 	switch (type) {
 	case PCI_EPC_IRQ_LEGACY:
-		dev_err(pci->dev, "EP cannot trigger legacy IRQs\n");
-		return -EINVAL;
+		return dw_pcie_ep_raise_legacy_irq(ep, func_no);
 	case PCI_EPC_IRQ_MSI:
 		return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
 	case PCI_EPC_IRQ_MSIX:
diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h
index a0ab12f..69e6e17 100644
--- a/drivers/pci/dwc/pcie-designware.h
+++ b/drivers/pci/dwc/pcie-designware.h
@@ -350,6 +350,7 @@ static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
 void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init(struct dw_pcie_ep *ep);
 void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
+int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no);
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 			     u8 interrupt_num);
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
@@ -369,6 +370,11 @@ static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
 {
 }

+static inline int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
+{
+	return 0;
+}
+
 static inline int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 					   u8 interrupt_num)
 {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/10] misc: pci_endpoint_test: Replace lower into upper case characters
Replace the initial lower case character with an upper case one in comments
and debug printks.

Signed-off-by: Gustavo Pimentel
---
 drivers/misc/pci_endpoint_test.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index b003079..c9b6e26 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -250,7 +250,7 @@ static bool pci_endpoint_test_copy(struct pci_endpoint_test *test, size_t size)
 	orig_src_addr = dma_alloc_coherent(dev, size + alignment,
 					   _src_phys_addr, GFP_KERNEL);
 	if (!orig_src_addr) {
-		dev_err(dev, "failed to allocate source buffer\n");
+		dev_err(dev, "Failed to allocate source buffer\n");
 		ret = false;
 		goto err;
 	}
@@ -276,7 +276,7 @@ static bool pci_endpoint_test_copy(struct pci_endpoint_test *test, size_t size)
 	orig_dst_addr = dma_alloc_coherent(dev, size + alignment,
 					   _dst_phys_addr, GFP_KERNEL);
 	if (!orig_dst_addr) {
-		dev_err(dev, "failed to allocate destination address\n");
+		dev_err(dev, "Failed to allocate destination address\n");
 		ret = false;
 		goto err_orig_src_addr;
 	}
@@ -340,7 +340,7 @@ static bool pci_endpoint_test_write(struct pci_endpoint_test *test, size_t size)
 	orig_addr = dma_alloc_coherent(dev, size + alignment, _phys_addr,
 				       GFP_KERNEL);
 	if (!orig_addr) {
-		dev_err(dev, "failed to allocate address\n");
+		dev_err(dev, "Failed to allocate address\n");
 		ret = false;
 		goto err;
 	}
@@ -403,7 +403,7 @@ static bool pci_endpoint_test_read(struct pci_endpoint_test *test, size_t size)
 	orig_addr = dma_alloc_coherent(dev, size + alignment, _phys_addr,
 				       GFP_KERNEL);
 	if (!orig_addr) {
-		dev_err(dev, "failed to allocate destination address\n");
+		dev_err(dev, "Failed to allocate destination address\n");
 		ret = false;
 		goto err;
 	}
@@ -543,7 +543,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 	case IRQ_TYPE_MSI:
 		irq = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
 		if (irq < 0)
-			dev_err(dev, "failed to get MSI interrupts\n");
+			dev_err(dev, "Failed to get MSI interrupts\n");
 		test->num_irqs = irq;
 		break;
 	case IRQ_TYPE_MSIX:
@@ -560,7 +560,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 	err = devm_request_irq(dev, pdev->irq, pci_endpoint_test_irqhandler,
 			       IRQF_SHARED, DRV_MODULE_NAME, test);
 	if (err) {
-		dev_err(dev, "failed to request IRQ %d\n", pdev->irq);
+		dev_err(dev, "Failed to request IRQ %d\n", pdev->irq);
 		goto err_disable_msi;
 	}

@@ -578,7 +578,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 		if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
 			base = pci_ioremap_bar(pdev, bar);
 			if (!base) {
-				dev_err(dev, "failed to read BAR%d\n", bar);
+				dev_err(dev, "Failed to read BAR%d\n", bar);
 				WARN_ON(bar == test_reg_bar);
 			}
 			test->bar[bar] = base;
@@ -598,7 +598,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 	id = ida_simple_get(_endpoint_test_ida, 0, 0, GFP_KERNEL);
 	if (id < 0) {
 		err = id;
-		dev_err(dev, "unable to get id\n");
+		dev_err(dev, "Unable to get id\n");
 		goto err_iounmap;
 	}

@@ -614,7 +614,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,

 	err = misc_register(misc_device);
 	if (err) {
-		dev_err(dev, "failed to register device\n");
+		dev_err(dev, "Failed to register device\n");
 		goto err_kfree_name;
 	}

-- 
2.7.4
[PATCH 10/10] misc: pci_endpoint_test: Use pci_irq_vector function
Replace the "pdev->irq + index" operation with a pci_irq_vector() call, which
converts a device vector number to a Linux IRQ number (suggested by Alan
Douglas).

Signed-off-by: Gustavo Pimentel
---
 drivers/misc/pci_endpoint_test.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index c9b6e26..cbdd0c6 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -220,7 +220,7 @@ static bool pci_endpoint_test_msi_irq(struct pci_endpoint_test *test,
 	if (!val)
 		return false;

-	if (test->last_irq - pdev->irq == msi_num - 1)
+	if (pci_irq_vector(pdev, irq_num - 1) == test->last_irq)
 		return true;

 	return false;
@@ -565,12 +565,12 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 	}

 	for (i = 1; i < irq; i++) {
-		err = devm_request_irq(dev, pdev->irq + i,
+		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
 				       pci_endpoint_test_irqhandler,
 				       IRQF_SHARED, DRV_MODULE_NAME, test);
 		if (err)
 			dev_err(dev, "Failed to request IRQ %d for MSI%s %d\n",
-				pdev->irq + i,
+				pci_irq_vector(pdev, i),
 				irq_type == IRQ_TYPE_MSIX ? "-X" : "", i + 1);
 	}

@@ -633,7 +633,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 	}

 	for (i = 0; i < irq; i++)
-		devm_free_irq(dev, pdev->irq + i, test);
+		devm_free_irq(>dev, pci_irq_vector(pdev, i), test);

 err_disable_msi:
 	pci_disable_msi(pdev);
@@ -667,7 +667,7 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev)
 		pci_iounmap(pdev, test->bar[bar]);
 	}
 	for (i = 0; i < test->num_irqs; i++)
-		devm_free_irq(>dev, pdev->irq + i, test);
+		devm_free_irq(>dev, pci_irq_vector(pdev, i), test);
 	pci_disable_msi(pdev);
 	pci_disable_msix(pdev);
 	pci_release_regions(pdev);
-- 
2.7.4
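A note on why the lookup matters: with MSI-X, the Linux IRQ numbers assigned
to successive device vectors need not be contiguous, so "pdev->irq + i" can
name the wrong IRQ. The following is only a userspace sketch of that
difference. struct mock_pci_dev, mock_irq_vector() and mock_irq_by_offset()
are hypothetical names invented for illustration; they are not kernel APIs and
only model the behaviour that pci_irq_vector() provides.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; NOT the kernel's struct pci_dev. */
struct mock_pci_dev {
	int irq;                /* Linux IRQ of vector 0, like pdev->irq */
	const int *vector_irqs; /* Linux IRQ assigned to each device vector */
	size_t nvec;            /* number of allocated vectors */
};

/* Models a pci_irq_vector()-style lookup: map vector nr to its Linux IRQ. */
static int mock_irq_vector(const struct mock_pci_dev *dev, size_t nr)
{
	if (nr >= dev->nvec)
		return -1; /* out of range: report an error, do not guess */
	return dev->vector_irqs[nr];
}

/* The arithmetic the patch removes: assumes the IRQs are contiguous. */
static int mock_irq_by_offset(const struct mock_pci_dev *dev, size_t nr)
{
	return dev->irq + (int)nr;
}
```

With a vector table of {40, 41, 57}, the lookup returns 57 for vector 2 while
the offset arithmetic yields 42, which belongs to something else.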
[PATCH 04/10] PCI: dwc: Rework MSI callbacks handler
Remove duplicate defines located on pcie-designware.h file already available on /include/uapi/linux/pci-regs.h file. Add pci_epc_set_msi() maximum 32 interrupts validation. Signed-off-by: Gustavo Pimentel--- drivers/pci/dwc/pcie-designware-ep.c | 49 drivers/pci/dwc/pcie-designware.h| 11 drivers/pci/endpoint/pci-epc-core.c | 3 ++- 3 files changed, 35 insertions(+), 28 deletions(-) diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c index 9b0d396..1ba3a7f 100644 --- a/drivers/pci/dwc/pcie-designware-ep.c +++ b/drivers/pci/dwc/pcie-designware-ep.c @@ -246,29 +246,38 @@ static int dw_pcie_ep_map_addr(struct pci_epc *epc, u8 func_no, static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 func_no) { - int val; struct dw_pcie_ep *ep = epc_get_drvdata(epc); struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + u32 val, reg; + + if (!ep->msi_cap) + return 0; - val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL); - if (!(val & MSI_CAP_MSI_EN_MASK)) + reg = ep->msi_cap + PCI_MSI_FLAGS; + val = dw_pcie_readw_dbi(pci, reg); + if (!(val & PCI_MSI_FLAGS_ENABLE)) return -EINVAL; - val = (val & MSI_CAP_MME_MASK) >> MSI_CAP_MME_SHIFT; + val = (val & PCI_MSI_FLAGS_QSIZE) >> 4; + return val; } -static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int) +static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts) { - int val; struct dw_pcie_ep *ep = epc_get_drvdata(epc); struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + u32 val, reg; - val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL); - val &= ~MSI_CAP_MMC_MASK; - val |= (encode_int << MSI_CAP_MMC_SHIFT) & MSI_CAP_MMC_MASK; + if (!ep->msi_cap) + return 0; + + reg = ep->msi_cap + PCI_MSI_FLAGS; + val = dw_pcie_readw_dbi(pci, reg); + val &= ~PCI_MSI_FLAGS_QMASK; + val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK; dw_pcie_dbi_ro_wr_en(pci); - dw_pcie_writew_dbi(pci, MSI_MESSAGE_CONTROL, val); + dw_pcie_writew_dbi(pci, reg, val); dw_pcie_dbi_ro_wr_dis(pci); return 0; @@ -367,21 
+376,29 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no, struct dw_pcie *pci = to_dw_pcie_from_ep(ep); struct pci_epc *epc = ep->epc; u16 msg_ctrl, msg_data; - u32 msg_addr_lower, msg_addr_upper; + u32 msg_addr_lower, msg_addr_upper, reg; u64 msg_addr; bool has_upper; int ret; + if (!ep->msi_cap) + return 0; + /* Raise MSI per the PCI Local Bus Specification Revision 3.0, 6.8.1. */ - msg_ctrl = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL); + reg = ep->msi_cap + PCI_MSI_FLAGS; + msg_ctrl = dw_pcie_readw_dbi(pci, reg); has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT); - msg_addr_lower = dw_pcie_readl_dbi(pci, MSI_MESSAGE_ADDR_L32); + reg = ep->msi_cap + PCI_MSI_ADDRESS_LO; + msg_addr_lower = dw_pcie_readl_dbi(pci, reg); if (has_upper) { - msg_addr_upper = dw_pcie_readl_dbi(pci, MSI_MESSAGE_ADDR_U32); - msg_data = dw_pcie_readw_dbi(pci, MSI_MESSAGE_DATA_64); + reg = ep->msi_cap + PCI_MSI_ADDRESS_HI; + msg_addr_upper = dw_pcie_readl_dbi(pci, reg); + reg = ep->msi_cap + PCI_MSI_DATA_64; + msg_data = dw_pcie_readw_dbi(pci, reg); } else { msg_addr_upper = 0; - msg_data = dw_pcie_readw_dbi(pci, MSI_MESSAGE_DATA_32); + reg = ep->msi_cap + PCI_MSI_DATA_32; + msg_data = dw_pcie_readw_dbi(pci, reg); } msg_addr = ((u64) msg_addr_upper) << 32 | msg_addr_lower; ret = dw_pcie_ep_map_addr(epc, func_no, ep->msi_mem_phys, msg_addr, diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h index b22c5bb..a0ab12f 100644 --- a/drivers/pci/dwc/pcie-designware.h +++ b/drivers/pci/dwc/pcie-designware.h @@ -96,17 +96,6 @@ #define PCIE_GET_ATU_INB_UNR_REG_OFFSET(region) \ ((0x3 << 20) | ((region) << 9) | (0x1 << 8)) -#define MSI_MESSAGE_CONTROL0x52 -#define MSI_CAP_MMC_SHIFT 1 -#define MSI_CAP_MMC_MASK (7 << MSI_CAP_MMC_SHIFT) -#define MSI_CAP_MME_SHIFT 4 -#define MSI_CAP_MSI_EN_MASK0x1 -#define MSI_CAP_MME_MASK (7 << MSI_CAP_MME_SHIFT) -#define MSI_MESSAGE_ADDR_L32 0x54 -#define MSI_MESSAGE_ADDR_U32 0x58 -#define MSI_MESSAGE_DATA_320x58 -#define 
MSI_MESSAGE_DATA_640x5C - #define MAX_MSI_IRQS 256 #define MAX_MSI_IRQS_PER_CTRL 32 #define MAX_MSI_CTRLS
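For readers following the register arithmetic in this patch, here is a small
userspace sketch of the MSI Message Control fields it touches. The three
PCI_MSI_FLAGS_* values are the ones defined in include/uapi/linux/pci_regs.h;
the helper names (msi_encode_count, msi_set_capable, msi_get_enabled) are
hypothetical and exist only for this illustration. It assumes, as the MSI
capability layout specifies, that both fields hold a log2-encoded count.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_MSI_FLAGS_ENABLE 0x0001 /* MSI feature enabled */
#define PCI_MSI_FLAGS_QMASK  0x000e /* Multiple Message Capable, bits 3:1 */
#define PCI_MSI_FLAGS_QSIZE  0x0070 /* Multiple Message Enable, bits 6:4 */

/* Smallest e with (1 << e) >= n; the caller guarantees 1 <= n <= 32. */
static unsigned int msi_encode_count(unsigned int n)
{
	unsigned int e = 0;

	while ((1u << e) < n)
		e++;
	return e;
}

/* Store the encoded count for n vectors in Multiple Message Capable. */
static uint16_t msi_set_capable(uint16_t flags, unsigned int n)
{
	flags &= (uint16_t)~PCI_MSI_FLAGS_QMASK;
	flags |= (uint16_t)((msi_encode_count(n) << 1) & PCI_MSI_FLAGS_QMASK);
	return flags;
}

/* Decode Multiple Message Enable; returns -1 when MSI is not enabled. */
static int msi_get_enabled(uint16_t flags)
{
	if (!(flags & PCI_MSI_FLAGS_ENABLE))
		return -1;
	return 1 << ((flags & PCI_MSI_FLAGS_QSIZE) >> 4);
}
```

The shifts by 1 and 4 in the patch correspond to the bit positions of the two
fields; the sketch keeps the same shifts so the two can be compared directly.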
[PATCH 06/10] misc: pci_endpoint_test: Add MSI-X support
Add MSI-X support and update driver documentation accordingly. Add new driver parameter to allow interruption type selection. Modify the Legacy/MSI/MSI-X test process, by: - Add and use a specific register located in a BAR, which defines the interrupt type is been triggered. - Move the interrupt ID number from the command section to a register located in a BAR. Signed-off-by: Gustavo Pimentel--- Documentation/misc-devices/pci-endpoint-test.txt | 3 + drivers/misc/pci_endpoint_test.c | 121 +++ drivers/pci/endpoint/functions/pci-epf-test.c| 78 +++ 3 files changed, 143 insertions(+), 59 deletions(-) diff --git a/Documentation/misc-devices/pci-endpoint-test.txt b/Documentation/misc-devices/pci-endpoint-test.txt index 4ebc359..fdfa0f6 100644 --- a/Documentation/misc-devices/pci-endpoint-test.txt +++ b/Documentation/misc-devices/pci-endpoint-test.txt @@ -10,6 +10,7 @@ The PCI driver for the test device performs the following tests *) verifying addresses programmed in BAR *) raise legacy IRQ *) raise MSI IRQ + *) raise MSI-X IRQ *) read data *) write data *) copy data @@ -25,6 +26,8 @@ ioctl PCITEST_LEGACY_IRQ: Tests legacy IRQ PCITEST_MSI: Tests message signalled interrupts. The MSI number to be tested should be passed as argument. + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number + to be tested should be passed as argument. PCITEST_WRITE: Perform write tests. The size of the buffer should be passed as argument. PCITEST_READ: Perform read tests. 
The size of the buffer should be passed diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c index 58a88ba..b003079 100644 --- a/drivers/misc/pci_endpoint_test.c +++ b/drivers/misc/pci_endpoint_test.c @@ -35,38 +35,44 @@ #include -#define DRV_MODULE_NAME"pci-endpoint-test" - -#define PCI_ENDPOINT_TEST_MAGIC0x0 - -#define PCI_ENDPOINT_TEST_COMMAND 0x4 -#define COMMAND_RAISE_LEGACY_IRQ BIT(0) -#define COMMAND_RAISE_MSI_IRQ BIT(1) -#define MSI_NUMBER_SHIFT 2 -/* 6 bits for MSI number */ -#define COMMAND_READBIT(8) -#define COMMAND_WRITE BIT(9) -#define COMMAND_COPYBIT(10) - -#define PCI_ENDPOINT_TEST_STATUS 0x8 -#define STATUS_READ_SUCCESS BIT(0) -#define STATUS_READ_FAILBIT(1) -#define STATUS_WRITE_SUCCESSBIT(2) -#define STATUS_WRITE_FAIL BIT(3) -#define STATUS_COPY_SUCCESS BIT(4) -#define STATUS_COPY_FAILBIT(5) -#define STATUS_IRQ_RAISED BIT(6) -#define STATUS_SRC_ADDR_INVALID BIT(7) -#define STATUS_DST_ADDR_INVALID BIT(8) - -#define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR 0xc +#define DRV_MODULE_NAME"pci-endpoint-test" + +#define IRQ_TYPE_LEGACY0 +#define IRQ_TYPE_MSI 1 +#define IRQ_TYPE_MSIX 2 + +#define PCI_ENDPOINT_TEST_MAGIC0x0 + +#define PCI_ENDPOINT_TEST_COMMAND 0x4 +#define COMMAND_RAISE_LEGACY_IRQ BIT(0) +#define COMMAND_RAISE_MSI_IRQ BIT(1) +#define COMMAND_RAISE_MSIX_IRQ BIT(2) +#define COMMAND_READ BIT(3) +#define COMMAND_WRITE BIT(4) +#define COMMAND_COPY BIT(5) + +#define PCI_ENDPOINT_TEST_STATUS 0x8 +#define STATUS_READ_SUCCESSBIT(0) +#define STATUS_READ_FAIL BIT(1) +#define STATUS_WRITE_SUCCESS BIT(2) +#define STATUS_WRITE_FAIL BIT(3) +#define STATUS_COPY_SUCCESSBIT(4) +#define STATUS_COPY_FAIL BIT(5) +#define STATUS_IRQ_RAISED BIT(6) +#define STATUS_SRC_ADDR_INVALIDBIT(7) +#define STATUS_DST_ADDR_INVALIDBIT(8) + +#define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR 0x0c #define PCI_ENDPOINT_TEST_UPPER_SRC_ADDR 0x10 #define PCI_ENDPOINT_TEST_LOWER_DST_ADDR 0x14 #define PCI_ENDPOINT_TEST_UPPER_DST_ADDR 0x18 -#define PCI_ENDPOINT_TEST_SIZE 
0x1c -#define PCI_ENDPOINT_TEST_CHECKSUM 0x20 +#define PCI_ENDPOINT_TEST_SIZE 0x1c +#define PCI_ENDPOINT_TEST_CHECKSUM 0x20 + +#define PCI_ENDPOINT_TEST_IRQ_TYPE 0x24 +#define PCI_ENDPOINT_TEST_IRQ_NUMBER 0x28 static DEFINE_IDA(pci_endpoint_test_ida); @@ -77,6 +83,10 @@ static bool no_msi; module_param(no_msi, bool, 0444); MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in pci_endpoint_test"); +static int irq_type = IRQ_TYPE_MSI; +module_param(irq_type, int, 0444);
[PATCH 09/10] tools: PCI: Add MSI-X support
Add MSI-X support to the pcitest tool. Modify the pcitest.sh script to
accommodate MSI-X interrupt tests.

Signed-off-by: Gustavo Pimentel
---
 include/uapi/linux/pcitest.h |  1 +
 tools/pci/pcitest.c          | 18 +-
 tools/pci/pcitest.sh         | 25 +
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pcitest.h b/include/uapi/linux/pcitest.h
index 953cf03..d746fb1 100644
--- a/include/uapi/linux/pcitest.h
+++ b/include/uapi/linux/pcitest.h
@@ -16,5 +16,6 @@
 #define PCITEST_WRITE		_IOW('P', 0x4, unsigned long)
 #define PCITEST_READ		_IOW('P', 0x5, unsigned long)
 #define PCITEST_COPY		_IOW('P', 0x6, unsigned long)
+#define PCITEST_MSIX		_IOW('P', 0x7, int)

 #endif /* __UAPI_LINUX_PCITEST_H */
diff --git a/tools/pci/pcitest.c b/tools/pci/pcitest.c
index 9074b47..9d145a3 100644
--- a/tools/pci/pcitest.c
+++ b/tools/pci/pcitest.c
@@ -37,6 +37,7 @@ struct pci_test {
 	char		barnum;
 	bool		legacyirq;
 	unsigned int	msinum;
+	unsigned int	msixnum;
 	bool		read;
 	bool		write;
 	bool		copy;
@@ -83,6 +84,15 @@ static int run_test(struct pci_test *test)
 			fprintf(stdout, "%s\n", result[ret]);
 	}

+	if (test->msixnum > 0 && test->msixnum <= 2048) {
+		ret = ioctl(fd, PCITEST_MSIX, test->msixnum);
+		fprintf(stdout, "MSI-X%d:\t\t", test->msixnum);
+		if (ret < 0)
+			fprintf(stdout, "TEST FAILED\n");
+		else
+			fprintf(stdout, "%s\n", result[ret]);
+	}
+
 	if (test->write) {
 		ret = ioctl(fd, PCITEST_WRITE, test->size);
 		fprintf(stdout, "WRITE (%7ld bytes):\t\t", test->size);
@@ -133,7 +143,7 @@ int main(int argc, char **argv)
 	/* set default endpoint device */
 	test->device = "/dev/pci-endpoint-test.0";

-	while ((c = getopt(argc, argv, "D:b:m:lrwcs:")) != EOF)
+	while ((c = getopt(argc, argv, "D:b:m:x:lrwcs:")) != EOF)
 	switch (c) {
 	case 'D':
 		test->device = optarg;
@@ -151,6 +161,11 @@ int main(int argc, char **argv)
 		if (test->msinum < 1 || test->msinum > 32)
 			goto usage;
 		continue;
+	case 'x':
+		test->msixnum = atoi(optarg);
+		if (test->msixnum < 1 || test->msixnum > 2048)
+			goto usage;
+		continue;
 	case 'r':
 		test->read = true;
 		continue;
@@ -173,6 +188,7 @@ int main(int argc, char **argv)
 			"\t-D PCI endpoint test device {default: /dev/pci-endpoint-test.0}\n"
 			"\t-b BAR test (bar number between 0..5)\n"
 			"\t-m MSI test (msi number between 1..32)\n"
+			"\t-x MSI-X test (msix number between 1..2048)\n"
 			"\t-l Legacy IRQ test\n"
 			"\t-r Read buffer test\n"
 			"\t-w Write buffer test\n"
diff --git a/tools/pci/pcitest.sh b/tools/pci/pcitest.sh
index 77e8c85..86709a2 100644
--- a/tools/pci/pcitest.sh
+++ b/tools/pci/pcitest.sh
@@ -4,6 +4,8 @@
 echo "BAR tests"
 echo

+modprobe pci_endpoint_test
+sleep 2
 bar=0

 while [ $bar -lt 6 ]
@@ -16,7 +18,14 @@ echo
 echo "Interrupt tests"
 echo

+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=0
 pcitest -l
+
+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=1
 msi=1

 while [ $msi -lt 33 ]
@@ -26,9 +35,25 @@ do
 done
 echo

+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=2
+msix=1
+
+while [ $msix -lt 2049 ]
+do
+	pcitest -x $msix
+	msix=`expr $msix + 1`
+done
+echo
+
 echo "Read Tests"
 echo

+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=1
+
 pcitest -r -s 1
 pcitest -r -s 1024
 pcitest -r -s 1025
-- 
2.7.4
[PATCH 01/10] PCI: endpoint: Add MSI-X interfaces
Add PCI_EPC_IRQ_MSIX type. Add MSI-X callbacks signatures to ops structure. Add sysfs interface for set/get MSI-X capability maximum number. Signed-off-by: Gustavo Pimentel--- drivers/pci/endpoint/pci-ep-cfs.c | 24 drivers/pci/endpoint/pci-epc-core.c | 57 + include/linux/pci-epc.h | 11 ++- include/linux/pci-epf.h | 1 + 4 files changed, 92 insertions(+), 1 deletion(-) diff --git a/drivers/pci/endpoint/pci-ep-cfs.c b/drivers/pci/endpoint/pci-ep-cfs.c index 018ea34..d1288a0 100644 --- a/drivers/pci/endpoint/pci-ep-cfs.c +++ b/drivers/pci/endpoint/pci-ep-cfs.c @@ -286,6 +286,28 @@ static ssize_t pci_epf_msi_interrupts_show(struct config_item *item, to_pci_epf_group(item)->epf->msi_interrupts); } +static ssize_t pci_epf_msix_interrupts_store(struct config_item *item, +const char *page, size_t len) +{ + u16 val; + int ret; + + ret = kstrtou16(page, 0, ); + if (ret) + return ret; + + to_pci_epf_group(item)->epf->msix_interrupts = val; + + return len; +} + +static ssize_t pci_epf_msix_interrupts_show(struct config_item *item, + char *page) +{ + return sprintf(page, "%d\n", + to_pci_epf_group(item)->epf->msix_interrupts); +} + PCI_EPF_HEADER_R(vendorid) PCI_EPF_HEADER_W_u16(vendorid) @@ -327,6 +349,7 @@ CONFIGFS_ATTR(pci_epf_, subsys_vendor_id); CONFIGFS_ATTR(pci_epf_, subsys_id); CONFIGFS_ATTR(pci_epf_, interrupt_pin); CONFIGFS_ATTR(pci_epf_, msi_interrupts); +CONFIGFS_ATTR(pci_epf_, msix_interrupts); static struct configfs_attribute *pci_epf_attrs[] = { _epf_attr_vendorid, @@ -340,6 +363,7 @@ static struct configfs_attribute *pci_epf_attrs[] = { _epf_attr_subsys_id, _epf_attr_interrupt_pin, _epf_attr_msi_interrupts, + _epf_attr_msix_interrupts, NULL, }; diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c index b0ee427..7d77bd0 100644 --- a/drivers/pci/endpoint/pci-epc-core.c +++ b/drivers/pci/endpoint/pci-epc-core.c @@ -218,6 +218,63 @@ int pci_epc_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts) 
EXPORT_SYMBOL_GPL(pci_epc_set_msi); /** + * pci_epc_get_msix() - get the number of MSI-X interrupt numbers allocated + * @epc: the EPC device to which MSI-X interrupts was requested + * @func_no: the endpoint function number in the EPC device + * + * Invoke to get the number of MSI-X interrupts allocated by the RC + */ +int pci_epc_get_msix(struct pci_epc *epc, u8 func_no) +{ + int interrupt; + unsigned long flags; + + if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions) + return 0; + + if (!epc->ops->get_msix) + return 0; + + spin_lock_irqsave(>lock, flags); + interrupt = epc->ops->get_msix(epc, func_no); + spin_unlock_irqrestore(>lock, flags); + + if (interrupt < 0) + return 0; + + return interrupt + 1; +} +EXPORT_SYMBOL_GPL(pci_epc_get_msix); + +/** + * pci_epc_set_msix() - set the number of MSI-X interrupt numbers required + * @epc: the EPC device on which MSI-X has to be configured + * @func_no: the endpoint function number in the EPC device + * @interrupts: number of MSI-X interrupts required by the EPF + * + * Invoke to set the required number of MSI-X interrupts. 
+ */ +int pci_epc_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts) +{ + int ret; + unsigned long flags; + + if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions || + interrupts < 1 || interrupts > 2048) + return -EINVAL; + + if (!epc->ops->set_msix) + return 0; + + spin_lock_irqsave(>lock, flags); + ret = epc->ops->set_msix(epc, func_no, interrupts - 1); + spin_unlock_irqrestore(>lock, flags); + + return ret; +} +EXPORT_SYMBOL_GPL(pci_epc_set_msix); + +/** * pci_epc_unmap_addr() - unmap CPU address from PCI address * @epc: the EPC device on which address is allocated * @func_no: the endpoint function number in the EPC device diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h index af657ca..32e8961 100644 --- a/include/linux/pci-epc.h +++ b/include/linux/pci-epc.h @@ -17,6 +17,7 @@ enum pci_epc_irq_type { PCI_EPC_IRQ_UNKNOWN, PCI_EPC_IRQ_LEGACY, PCI_EPC_IRQ_MSI, + PCI_EPC_IRQ_MSIX, }; /** @@ -30,6 +31,10 @@ enum pci_epc_irq_type { * capability register * @get_msi: ops to get the number of MSI interrupts allocated by the RC from * the MSI capability register + * @set_msix: ops to set the requested number of MSI-X interrupts in the + * MSI-X capability register + * @get_msix: ops to get the number of MSI-X interrupts allocated by the RC + * from the
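The "interrupts - 1" on the set path and "interrupt + 1" on the get path above
follow from how the MSI-X Table Size field is encoded: it stores N - 1, so an
11-bit field covers 1..2048 vectors. A minimal userspace sketch of that
encoding, assuming only the PCI_MSIX_FLAGS_* values from
include/uapi/linux/pci_regs.h; MSIX_MAX_VECTORS and both helper names are
hypothetical and used here purely for illustration.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_MSIX_FLAGS_QSIZE  0x07ff /* Table Size field, encoded as N - 1 */
#define PCI_MSIX_FLAGS_ENABLE 0x8000 /* MSI-X enable */

#define MSIX_MAX_VECTORS 2048 /* hypothetical name for the 2048 limit */

/* Store count in the Table Size field; -1 if count is outside 1..2048. */
static int msix_set_table_size(uint16_t *flags, unsigned int count)
{
	if (count < 1 || count > MSIX_MAX_VECTORS)
		return -1;
	*flags = (uint16_t)((*flags & ~PCI_MSIX_FLAGS_QSIZE) | (count - 1));
	return 0;
}

/* Return the number of vectors, or 0 when MSI-X is not enabled. */
static unsigned int msix_get_table_size(uint16_t flags)
{
	if (!(flags & PCI_MSIX_FLAGS_ENABLE))
		return 0;
	return (flags & PCI_MSIX_FLAGS_QSIZE) + 1u;
}
```

Returning 0 for the disabled case mirrors the core helper above, which maps a
negative result from the controller callback to "no interrupts allocated".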
[PATCH 02/10] PCI: dwc: Add MSI-X callbacks handler
Change pcie_raise_irq() signature, namely the interrupt_num variable type from u8 to u16 to accommodate the 2048 maximum MSI-X interrupts. Add PCIe config space capability search function. Add sysfs set/get interface to allow to change of EP MSI-X maximum number. Add EP MSI-X callback for triggering interruptions. Signed-off-by: Gustavo Pimentel--- drivers/pci/dwc/pci-dra7xx.c | 2 +- drivers/pci/dwc/pcie-artpec6.c | 2 +- drivers/pci/dwc/pcie-designware-ep.c | 143 - drivers/pci/dwc/pcie-designware-plat.c | 4 +- drivers/pci/dwc/pcie-designware.h | 14 +++- 5 files changed, 160 insertions(+), 5 deletions(-) diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c index ed8558d..5265725 100644 --- a/drivers/pci/dwc/pci-dra7xx.c +++ b/drivers/pci/dwc/pci-dra7xx.c @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx, } static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no, -enum pci_epc_irq_type type, u8 interrupt_num) +enum pci_epc_irq_type type, u16 interrupt_num) { struct dw_pcie *pci = to_dw_pcie_from_ep(ep); struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci); diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c index e66cede..96dc259 100644 --- a/drivers/pci/dwc/pcie-artpec6.c +++ b/drivers/pci/dwc/pcie-artpec6.c @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep) } static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no, - enum pci_epc_irq_type type, u8 interrupt_num) + enum pci_epc_irq_type type, u16 interrupt_num) { struct dw_pcie *pci = to_dw_pcie_from_ep(ep); diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c index 15b22a6..9b0d396 100644 --- a/drivers/pci/dwc/pcie-designware-ep.c +++ b/drivers/pci/dwc/pcie-designware-ep.c @@ -40,6 +40,39 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar) __dw_pcie_ep_reset_bar(pci, bar, 0); } +u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr, 
+ u8 cap) +{ + u8 cap_id, next_cap_ptr; + u16 reg; + + reg = dw_pcie_readw_dbi(pci, cap_ptr); + next_cap_ptr = (reg & 0xff00) >> 8; + cap_id = (reg & 0x00ff); + + if (!next_cap_ptr || cap_id > PCI_CAP_ID_MAX) + return 0; + + if (cap_id == cap) + return cap_ptr; + + return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap); +} + +u8 dw_pcie_ep_find_capability(struct dw_pcie *pci, u8 cap) +{ + u8 next_cap_ptr; + u16 reg; + + reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST); + next_cap_ptr = (reg & 0x00ff); + + if (!next_cap_ptr) + return 0; + + return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap); +} + static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no, struct pci_epf_header *hdr) { @@ -241,8 +274,47 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int) return 0; } +static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no) +{ + struct dw_pcie_ep *ep = epc_get_drvdata(epc); + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + u32 val, reg; + + if (!ep->msix_cap) + return 0; + + reg = ep->msix_cap + PCI_MSIX_FLAGS; + val = dw_pcie_readw_dbi(pci, reg); + if (!(val & PCI_MSIX_FLAGS_ENABLE)) + return -EINVAL; + + val &= PCI_MSIX_FLAGS_QSIZE; + + return val; +} + +static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts) +{ + struct dw_pcie_ep *ep = epc_get_drvdata(epc); + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + u32 val, reg; + + if (!ep->msix_cap) + return 0; + + reg = ep->msix_cap + PCI_MSIX_FLAGS; + val = dw_pcie_readw_dbi(pci, reg); + val &= ~PCI_MSIX_FLAGS_QSIZE; + val |= interrupts; + dw_pcie_dbi_ro_wr_en(pci); + dw_pcie_writew_dbi(pci, reg, val); + dw_pcie_dbi_ro_wr_dis(pci); + + return 0; +} + static int dw_pcie_ep_raise_irq(struct pci_epc *epc, u8 func_no, - enum pci_epc_irq_type type, u8 interrupt_num) + enum pci_epc_irq_type type, u16 interrupt_num) { struct dw_pcie_ep *ep = epc_get_drvdata(epc); @@ -282,6 +354,8 @@ static const struct pci_epc_ops epc_ops = { .unmap_addr = 
dw_pcie_ep_unmap_addr, .set_msi= dw_pcie_ep_set_msi, .get_msi= dw_pcie_ep_get_msi, + .set_msix = dw_pcie_ep_set_msix, + .get_msix = dw_pcie_ep_get_msix,
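The capability search added by this patch can be sketched in userspace as an
iterative walk over a mock 256-byte configuration space. This is not the
patch's code: find_capability() is a hypothetical name, the walk is iterative
rather than recursive, it is bounded by an arbitrary guard count so a
malformed, looping list terminates, and it compares the capability ID before
testing the next pointer so that the last entry in the list can still match.
The PCI_CAP_ID_MAX value differs between kernel versions and is an assumption
here.

```c
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_MSI      0x05 /* Message Signalled Interrupts */
#define PCI_CAP_ID_MSIX     0x11 /* MSI-X */
#define PCI_CAP_ID_MAX      0x14 /* assumption: varies by kernel version */

/* Return the config-space offset of capability cap, or 0 if not found. */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap)
{
	uint8_t ptr = cfg[PCI_CAPABILITY_LIST];
	int guard = 48; /* bound the walk so a looping list terminates */

	while (ptr && guard--) {
		uint8_t id = cfg[ptr];                  /* byte 0: capability ID */
		uint8_t next = cfg[(uint8_t)(ptr + 1)]; /* byte 1: next pointer */

		if (id > PCI_CAP_ID_MAX)
			return 0; /* invalid ID: stop walking */
		if (id == cap)
			return ptr;
		ptr = next;
	}
	return 0;
}
```

A zero return doubles as "not found", which works because no capability can
live at offset 0 of the configuration space.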
[PATCH 03/10] PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
Change cdns_pcie_ep_raise_irq() signature, namely the interrupt_num variable
type from u8 to u16 to accommodate the 2048 maximum MSI-X interrupts.

Signed-off-by: Gustavo Pimentel
Acked-by: Alan Douglas
---
 drivers/pci/cadence/pcie-cadence-ep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/cadence/pcie-cadence-ep.c b/drivers/pci/cadence/pcie-cadence-ep.c
index 3d8283e..6d6322c 100644
--- a/drivers/pci/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/cadence/pcie-cadence-ep.c
@@ -363,7 +363,7 @@ static int cdns_pcie_ep_send_msi_irq(struct cdns_pcie_ep *ep, u8 fn,
 }

 static int cdns_pcie_ep_raise_irq(struct pci_epc *epc, u8 fn,
-				  enum pci_epc_irq_type type, u8 interrupt_num)
+				  enum pci_epc_irq_type type, u16 interrupt_num)
 {
 	struct cdns_pcie_ep *ep = epc_get_drvdata(epc);

-- 
2.7.4
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
On Fri, 27 Apr 2018 17:09:14 +0100
Will Deacon wrote:

> Kim,
>
> [Ganapat: please don't let this discussion disrupt your PMU driver
> development. You can safely ignore it for now :)]
>
> On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 15:37:20 +0100
> > Will Deacon wrote:
> >
> > > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > > Mark Rutland wrote:
> > > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > > Ganapatrao Kulkarni wrote:
> > > > > >
> > > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > > >
> > > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > > message describing the specific error via dmesg.
> > > > >
> > > > > As has previously been discussed on several occasions, patches which
> > > > > log
> > > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > > of driver-specific constraints.
> > > >
> > > > I disagree - drivers do it all the time, using dev_err(), dev_warn(),
> > > > etc.
> > > >
> > > > > I would appreciate if in future you could qualify your suggestion with
> > > > > the requirement that pr_debug() is used.
> > > >
> > > > It shouldn't - the driver isn't being debugged, it's in regular use.
> > >
> > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > and instead see efforts to improve error reporting via the perf system
> > > call interface.
> >
> > We'd all prefer that, and for all PMU drivers, why should ones under
> > drivers/perf be treated differently?
>
> Because they're the ones I maintain...

You represent a minority on your opinion on this matter though.

> > As you are already aware, I've personally tried to fix this problem -
> > that has existed since before the introduction of the perf tool (I
> > consider it a syscall-independent enhanced error interface), multiple
> > times, and failed.
>
> Why is that my problem? Try harder?

It's your problem because we're here reviewing a patch that happens to
fall under your maintainership. I'll be the first person to tell you
I'm obviously incompetent and haven't been able to come up with a
solution that is acceptable for everyone up to and including Linus
Torvalds. I'm just noticing a chronic usability problem that can be
easily alleviated in the context of this patch review.

> > So until someone comes up with a solution that works for everyone
> > up to and including Linus Torvalds (who hasn't put up a problem
> > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > keep PMU drivers' errors silent preference of yours is unnecessarily
> > impeding people trying to measure system performance on Arm based
> > machines - all other archs' maintainers are fine with PMU drivers using
> > dmesg.
>
> Good for them, although I'm pretty sure that at least the x86 folks are
> against this crap too.

Unfortunately, it doesn't affect them nearly as much as it does our
more diverse platforms, which is why I don't think they care to do
much about it.

> > > Anyway, I think this driver has bigger problems that need addressing.
> >
> > To me it represents yet another PMU driver submission - as the years go
> > by - that is lacking in the user messaging area. Which reminds me, can
> > you take another look at applying this?:
>
> As I said before, I'm not going to take anything that logs above pr_debug
> for things that are directly triggerable from userspace. Spin a version

Why? There are plenty of things that emit stuff into dmesg that are
directly triggerable from userspace. Is it because it upsets fuzzing
tests? How about those be run with a patched kernel that somehow
mitigates the printing?

> using pr_debug and I'll queue it.

How about using a ratelimited dev_err variant?

> Have a good weekend,

You too.

Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
Kim,

[Ganapat: please don't let this discussion disrupt your PMU driver
development. You can safely ignore it for now :)]

On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 15:37:20 +0100
> Will Deacon wrote:
>
> > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > Mark Rutland wrote:
> > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > Ganapatrao Kulkarni wrote:
> > > > >
> > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > >
> > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > message describing the specific error via dmesg.
> > > >
> > > > As has previously been discussed on several occasions, patches which log
> > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > of driver-specific constraints.
> > >
> > > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> > >
> > > > I would appreciate if in future you could qualify your suggestion with
> > > > the requirement that pr_debug() is used.
> > >
> > > It shouldn't - the driver isn't being debugged, it's in regular use.
> >
> > For anything under drivers/perf/, I'd prefer not to have these prints
> > and instead see efforts to improve error reporting via the perf system
> > call interface.
>
> We'd all prefer that, and for all PMU drivers, why should ones under
> drivers/perf be treated differently?

Because they're the ones I maintain...

> As you are already aware, I've personally tried to fix this problem -
> that has existed since before the introduction of the perf tool (I
> consider it a syscall-independent enhanced error interface), multiple
> times, and failed.

Why is that my problem? Try harder?

> So until someone comes up with a solution that works for everyone
> up to and including Linus Torvalds (who hasn't put up a problem
> pulling PMU drivers emitting things to dmesg so far, by the way), this
> keep PMU drivers' errors silent preference of yours is unnecessarily
> impeding people trying to measure system performance on Arm based
> machines - all other archs' maintainers are fine with PMU drivers using
> dmesg.

Good for them, although I'm pretty sure that at least the x86 folks are
against this crap too.

> > Anyway, I think this driver has bigger problems that need addressing.
>
> To me it represents yet another PMU driver submission - as the years go
> by - that is lacking in the user messaging area. Which reminds me, can
> you take another look at applying this?:

As I said before, I'm not going to take anything that logs above pr_debug
for things that are directly triggerable from userspace. Spin a version
using pr_debug and I'll queue it.

Have a good weekend,

Will
Re: [net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
On Fri, 27 Apr 2018 08:47:14 -0700
Randy Dunlap wrote:

> On 04/27/2018 03:35 AM, Ahmed Abdelsalam wrote:
> > This patch adds a documentation for seg_flowlabel sysctl into
> > Documentation/networking/ip-sysctl.txt
> >
> > Signed-off-by: Ahmed Abdelsalam
> > ---
> >  Documentation/networking/ip-sysctl.txt | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/Documentation/networking/ip-sysctl.txt
> > b/Documentation/networking/ip-sysctl.txt
> > index 5dc1a04..7528f71 100644
> > --- a/Documentation/networking/ip-sysctl.txt
> > +++ b/Documentation/networking/ip-sysctl.txt
> > @@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
> >  ip6frag_time - INTEGER
> >          Time in seconds to keep an IPv6 fragment in memory.
> >
> > +IPv6 Segment Routing:
> > +
> > +seg6_flowlabel - INTEGER
> > +        Controls the behaviour of computing the flowlabel of outer
> > +        IPv6 header in case of SR T.encaps
> > +
> > +        -1 set flowlabel to zero.
> > +        0 copy flowlabel from Inner paceket in case of Inner IPv6
>
> packet
>

Thanks I fixed it in v2 of the patch.

--
Ahmed Abdelsalam
[net-next v2] ipv6: sr: Add documentation for seg_flowlabel sysctl
This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt

Signed-off-by: Ahmed Abdelsalam
---
 Documentation/networking/ip-sysctl.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5dc1a04..7c14747 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
 ip6frag_time - INTEGER
         Time in seconds to keep an IPv6 fragment in memory.

+IPv6 Segment Routing:
+
+seg6_flowlabel - INTEGER
+        Controls the behaviour of computing the flowlabel of outer
+        IPv6 header in case of SR T.encaps
+
+        -1 set flowlabel to zero.
+        0 copy flowlabel from Inner packet in case of Inner IPv6
+                (Set flowlabel to 0 in case IPv4/L2)
+        1 Compute the flowlabel using seg6_make_flowlabel()
+
+        Default is 0.
+
 conf/default/*:
         Change the interface-specific default settings.
--
2.1.4
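As a reading aid only (not part of the patch): the three seg6_flowlabel settings documented above can be modelled as a small selection function. This is a minimal Python sketch under stated assumptions. The `Inner` enum, the `outer_flowlabel` name and the `compute` callable are hypothetical; `compute` merely stands in for the kernel's seg6_make_flowlabel(), whose real inputs are not described in this thread. Rejecting values other than -1, 0 and 1 is also an assumption, since the patch text does not say what happens to them.

```python
from enum import Enum
from typing import Callable, Optional

FLOWLABEL_MASK = 0xFFFFF  # the IPv6 flow label field is 20 bits wide


class Inner(Enum):
    """Kind of packet being encapsulated (illustrative, not a kernel type)."""
    IPV6 = "ipv6"
    IPV4 = "ipv4"
    L2 = "l2"


def outer_flowlabel(mode: int, inner: Inner, inner_label: Optional[int],
                    compute: Callable[[], int]) -> int:
    """Pick the outer IPv6 flow label for SR encapsulation.

    mode follows the documented seg6_flowlabel values (-1, 0 or 1).
    compute is a hypothetical stand-in for seg6_make_flowlabel().
    """
    if mode == -1:
        # -1: always set the outer flow label to zero.
        return 0
    if mode == 0:
        # 0: copy the inner flow label for an inner IPv6 packet,
        # otherwise (IPv4/L2) fall back to zero.
        if inner is Inner.IPV6 and inner_label is not None:
            return inner_label & FLOWLABEL_MASK
        return 0
    if mode == 1:
        # 1: compute a fresh flow label.
        return compute() & FLOWLABEL_MASK
    # Assumption: anything else is treated as invalid in this model.
    raise ValueError(f"unsupported seg6_flowlabel value: {mode}")
```

With the documented default of 0, an inner IPv6 packet keeps its flow label on the outer header, while IPv4 and L2 payloads get a zero flow label.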
Re: [net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
On 04/27/2018 03:35 AM, Ahmed Abdelsalam wrote:
> This patch adds a documentation for seg_flowlabel sysctl into
> Documentation/networking/ip-sysctl.txt
>
> Signed-off-by: Ahmed Abdelsalam
> ---
>  Documentation/networking/ip-sysctl.txt | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/Documentation/networking/ip-sysctl.txt
> b/Documentation/networking/ip-sysctl.txt
> index 5dc1a04..7528f71 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
>  ip6frag_time - INTEGER
>          Time in seconds to keep an IPv6 fragment in memory.
>
> +IPv6 Segment Routing:
> +
> +seg6_flowlabel - INTEGER
> +        Controls the behaviour of computing the flowlabel of outer
> +        IPv6 header in case of SR T.encaps
> +
> +        -1 set flowlabel to zero.
> +        0 copy flowlabel from Inner paceket in case of Inner IPv6

                                       packet

> +                (Set flowlabel to 0 in case IPv4/L2)
> +        1 Compute the flowlabel using seg6_make_flowlabel()
> +
> +        Default is 0.
> +
>  conf/default/*:
>          Change the interface-specific default settings.
>
>

--
~Randy
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
On Fri, 27 Apr 2018 15:37:20 +0100
Will Deacon wrote:

> On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 10:30:27 +0100
> > Mark Rutland wrote:
> > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > Ganapatrao Kulkarni wrote:
> > > >
> > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > >
> > > > This PMU driver can be made more user-friendly by not just silently
> > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > message describing the specific error via dmesg.
> > >
> > > As has previously been discussed on several occasions, patches which log
> > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > of driver-specific constraints.
> >
> > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> >
> > > I would appreciate if in future you could qualify your suggestion with
> > > the requirement that pr_debug() is used.
> >
> > It shouldn't - the driver isn't being debugged, it's in regular use.
>
> For anything under drivers/perf/, I'd prefer not to have these prints
> and instead see efforts to improve error reporting via the perf system
> call interface.

We'd all prefer that, and for all PMU drivers, why should ones under
drivers/perf be treated differently?

As you are already aware, I've personally tried to fix this problem -
that has existed since before the introduction of the perf tool (I
consider it a syscall-independent enhanced error interface), multiple
times, and failed.

So until someone comes up with a solution that works for everyone
up to and including Linus Torvalds (who hasn't put up a problem
pulling PMU drivers emitting things to dmesg so far, by the way), this
keep PMU drivers' errors silent preference of yours is unnecessarily
impeding people trying to measure system performance on Arm based
machines - all other archs' maintainers are fine with PMU drivers using
dmesg.

> Anyway, I think this driver has bigger problems that need addressing.

To me it represents yet another PMU driver submission - as the years go
by - that is lacking in the user messaging area. Which reminds me, can
you take another look at applying this?:

https://patchwork.kernel.org/patch/10068535/

Thanks,

Kim
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 10:30:27 +0100
> Mark Rutland wrote:
> > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > Ganapatrao Kulkarni wrote:
> > >
> > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> >
> > > This PMU driver can be made more user-friendly by not just silently
> > > returning an error code such as -EINVAL, but by emitting a useful
> > > message describing the specific error via dmesg.
> >
> > As has previously been discussed on several occasions, patches which log
> > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > not acceptable -- dmesg is not intended as a mechanism to inform users
> > of driver-specific constraints.
>
> I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
>
> > I would appreciate if in future you could qualify your suggestion with
> > the requirement that pr_debug() is used.
>
> It shouldn't - the driver isn't being debugged, it's in regular use.

For anything under drivers/perf/, I'd prefer not to have these prints
and instead see efforts to improve error reporting via the perf system
call interface.

Anyway, I think this driver has bigger problems that need addressing.

Will
Re: [RFC tip/locking/lockdep v6 01/20] lockdep/Documention: Recursive read lock detection reasoning
(Copy more people)

On Wed, Apr 11, 2018 at 09:50:51PM +0800, Boqun Feng wrote:
> This patch add the documentation piece for the reasoning of deadlock
> detection related to recursive read lock. The following sections are
> added:
>
> * Explain what is a recursive read lock, and what deadlock cases
>   they could introduce.
>
> * Introduce the notations for different types of dependencies, and
>   the definition of strong paths.
>
> * Proof for a closed strong path is both sufficient and necessary
>   for deadlock detections with recursive read locks involved. The
>   proof could also explain why we call the path "strong"
>
> Signed-off-by: Boqun Feng
> ---
>  Documentation/locking/lockdep-design.txt | 178 +++
>  1 file changed, 178 insertions(+)
>
> diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt
> index 9de1c158d44c..6bb9e90e2c4f 100644
> --- a/Documentation/locking/lockdep-design.txt
> +++ b/Documentation/locking/lockdep-design.txt
> @@ -284,3 +284,181 @@ Run the command and save the output, then compare against the output from
>  a later run of this command to identify the leakers. This same output
>  can also help you find situations where runtime lock initialization has
>  been omitted.
> +
> +Recursive read locks:
> +---------------------
> +
> +Lockdep now is equipped with deadlock detection for recursive read locks.
> +
> +Recursive read locks, as their name indicates, are the locks able to be
> +acquired recursively. Unlike non-recursive read locks, recursive read locks
> +only get blocked by current write lock *holders* other than write lock
> +*waiters*, for example:
> +
> +        TASK A:                 TASK B:
> +
> +        read_lock(X);
> +
> +                                write_lock(X);
> +
> +        read_lock(X);
> +
> +is not a deadlock for recursive read locks, as while the task B is waiting for
> +the lock X, the second read_lock() doesn't need to wait because it's a recursive
> +read lock. However if the read_lock() is non-recursive read lock, then the above
> +case is a deadlock, because even if the write_lock() in TASK B can not get the
> +lock, but it can block the second read_lock() in TASK A.
> +
> +Note that a lock can be a write lock (exclusive lock), a non-recursive read
> +lock (non-recursive shared lock) or a recursive read lock (recursive shared
> +lock), depending on the lock operations used to acquire it (more specifically,
> +the value of the 'read' parameter for lock_acquire()). In other words, a single
> +lock instance has three types of acquisition depending on the acquisition
> +functions: exclusive, non-recursive read, and recursive read.
> +
> +To be concise, we call that write locks and non-recursive read locks as
> +"non-recursive" locks and recursive read locks as "recursive" locks.
> +
> +Recursive locks don't block each other, while non-recursive locks do (this is
> +even true for two non-recursive read locks). A non-recursive lock can block the
> +corresponding recursive lock, and vice versa.
> +
> +A deadlock case with recursive locks involved is as follow:
> +
> +        TASK A:                 TASK B:
> +
> +        read_lock(X);
> +                                read_lock(Y);
> +        write_lock(Y);
> +                                write_lock(X);
> +
> +Task A is waiting for task B to read_unlock() Y and task B is waiting for task
> +A to read_unlock() X.
> +
> +Dependency types and strong dependency paths:
> +---------------------------------------------
> +In order to detect deadlocks as above, lockdep needs to track different dependencies.
> +There are 4 categories for dependency edges in the lockdep graph:
> +
> +1) -(NN)->: non-recursive to non-recursive dependency. "X -(NN)-> Y" means
> +   X -> Y and both X and Y are non-recursive locks.
> +
> +2) -(RN)->: recursive to non-recursive dependency. "X -(RN)-> Y" means
> +   X -> Y and X is recursive read lock and Y is non-recursive lock.
> +
> +3) -(NR)->: non-recursive to recursive dependency, "X -(NR)-> Y" means
> +   X -> Y and X is non-recursive lock and Y is recursive lock.
> +
> +4) -(RR)->: recursive to recursive dependency, "X -(RR)-> Y" means
> +   X -> Y and both X and Y are recursive locks.
> +
> +Note that given two locks, they may have multiple dependencies between them, for example:
> +
> +        TASK A:
> +
> +        read_lock(X);
> +        write_lock(Y);
> +        ...
> +
> +        TASK B:
> +
> +        write_lock(X);
> +        write_lock(Y);
> +
> +, we have both X -(RN)-> Y and X -(NN)-> Y in the dependency graph.
> +
> +We use -(*N)-> for edges that is either -(RN)-> or -(NN)->, the similar for -(N*)->,
> +-(*R)-> and -(R*)->
> +
> +A "path" is a series of conjunct dependency edges in the graph. And we define a
> +"strong" path, which indicates the strong dependency throughout each dependency
> +in
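As a reading aid only (not part of the quoted patch): the four edge categories and the wildcard notation described in the patch can be sketched in a few lines of Python. Everything here is an assumption made for illustration, including the function names and the simplification that only consecutive acquisitions in one task produce an edge. It does not model how lockdep itself records dependencies, and it stops short of the "strong path" definition, which is cut off in this archive.

```python
from typing import List, Set, Tuple

# (lock name, True if acquired as a recursive read lock)
Acquisition = Tuple[str, bool]


def edge_type(held_recursive: bool, next_recursive: bool) -> str:
    """Return "NN", "RN", "NR" or "RR" for a held -> next dependency."""
    return ("R" if held_recursive else "N") + ("R" if next_recursive else "N")


def task_edges(acquisitions: List[Acquisition]) -> Set[Tuple[str, str, str]]:
    """Edges (held, next, type) between consecutive acquisitions of one task.

    Simplification for illustration: only adjacent pairs are considered.
    """
    edges = set()
    for (held, held_rec), (nxt, nxt_rec) in zip(acquisitions, acquisitions[1:]):
        edges.add((held, nxt, edge_type(held_rec, nxt_rec)))
    return edges


def matches(edge: str, pattern: str) -> bool:
    """Match an edge type against a wildcard pattern such as "*N" or "R*"."""
    return len(edge) == len(pattern) and all(
        p in ("*", e) for p, e in zip(pattern, edge)
    )
```

Feeding it the two tasks from the patch's example (task A takes X as a recursive read lock and then Y for write, task B takes both for write) yields the two edges the patch names, X -(RN)-> Y and X -(NN)-> Y, and both match the -(*N)-> pattern.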
Re: [PATCH] documentation: core-api: rearrange a few kernel-api chapters and sections
On Thu, Apr 26, 2018 at 06:11:02PM -0700, Randy Dunlap wrote:
> Rearrange some kernel-api chapters and sections to group them
> together better.
>
> - move Bit Operations from Basic C Library Functions to Basic
>   Kernel Library Functions (now adjacent to Bitmap Operations since
>   they are not typical C library functions)
>
> - move Sorting from Math Functions to Basic Kernel Library Functions
>   since sort functions are more Basic than Math Functions
>
> - move Text Searching from Math Functions to Basic Kernel Library
>   Functions (keep Sorting and Searching close to each other)
>
> - combine CRC and Math functions together into the (newly named)
>   CRC and Math Functions chapter
>
> Signed-off-by: Randy Dunlap

This all makes sense to me.

Acked-by: Matthew Wilcox
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
On Fri, 27 Apr 2018 10:30:27 +0100
Mark Rutland wrote:

> Hi Kim,
>
> On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > On Wed, 25 Apr 2018 14:30:47 +0530
> > Ganapatrao Kulkarni wrote:
> >
> > > +static int thunderx2_uncore_event_init(struct perf_event *event)
>
> > This PMU driver can be made more user-friendly by not just silently
> > returning an error code such as -EINVAL, but by emitting a useful
> > message describing the specific error via dmesg.
>
> As has previously been discussed on several occasions, patches which log
> to dmesg in a pmu::event_init() path at any level above pr_debug() are
> not acceptable -- dmesg is not intended as a mechanism to inform users
> of driver-specific constraints.

I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.

> I would appreciate if in future you could qualify your suggestion with
> the requirement that pr_debug() is used.

It shouldn't - the driver isn't being debugged, it's in regular use.

Thanks,

Kim
Re: [PATCH] Documentation: driver-api: fix device_connection.rst kernel-doc error
On Thu, Apr 26, 2018 at 06:29:41PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap
>
> Using incorrect :functions: syntax (extra space) causes an odd kernel-doc
> warning, so fix that.
>
> Documentation/driver-api/device_connection.rst:42: ERROR: Error in
> "kernel-doc" directive:
>
> Signed-off-by: Randy Dunlap
> Cc: Heikki Krogerus

FWIW:

Reviewed-by: Heikki Krogerus

> ---
>  Documentation/driver-api/device_connection.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linux-next-20180426.orig/Documentation/driver-api/device_connection.rst
> +++ linux-next-20180426/Documentation/driver-api/device_connection.rst
> @@ -40,4 +40,4 @@ API
>  ---
>
>  .. kernel-doc:: drivers/base/devcon.c
> -   : functions: device_connection_find_match device_connection_find device_connection_add device_connection_remove
> +   :functions: device_connection_find_match device_connection_find device_connection_add device_connection_remove
>

Thanks,

--
heikki
Re: [PATCH v3 11/15] ARM: dts: dra7-evm: Add wilink8 wlan support
Hi Rob,

On Wednesday 25 April 2018 08:17 PM, Rob Herring wrote:
> On Wed, Apr 25, 2018 at 7:54 AM, Kishon Vijay Abraham I wrote:
>> From: Hari Nagalla
>>
>> The wilink module is a combo wireless connectivity sdio
>> card based on Texas Instrument's wl18xx solution. It is a
>> 4-wire, 1.8V, embedded sdio wlan device with an external
>> irq line and is power-controlled by a gpio-based fixed
>> regulator.
>>
>> Add pinmux configuration and IODelay values for MMC4.
>> On dra7-evm, MMC4 is used for connecting to wilink module.
>>
>> IODelay data credits to : Vishal Mahaveer
>> and Sekhar Nori
>>
>> Signed-off-by: Ido Yariv
>> Signed-off-by: Eyal Reizer
>> Signed-off-by: Hari Nagalla
>> Signed-off-by: Sekhar Nori
>> Signed-off-by: Kishon Vijay Abraham I
>> ---
>>  arch/arm/boot/dts/dra7-evm-common.dtsi | 15 +++
>>  arch/arm/boot/dts/dra7-evm.dts | 25 +
>>  2 files changed, 40 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/dra7-evm-common.dtsi
>> b/arch/arm/boot/dts/dra7-evm-common.dtsi
>> index 05a7b1a01bc3..3590c40fc112 100644
>> --- a/arch/arm/boot/dts/dra7-evm-common.dtsi
>> +++ b/arch/arm/boot/dts/dra7-evm-common.dtsi
>> @@ -260,3 +260,18 @@
>>  _rc {
>>          status = "okay";
>>  };
>> +
>> + {
>> +        bus-width = <4>;
>> +        cap-power-off-card;
>> +        keep-power-in-suspend;
>> +        non-removable;
>> +        #address-cells = <1>;
>> +        #size-cells = <0>;
>> +        wlcore: wlcore@2 {
>
> wifi@2

sure, I'll fix it in the next revision.

Thanks
Kishon
Re: [PATCH v3 04/15] ARM: dts: dra74x-mmc-iodelay: Add a new pinctrl group for clk line without pullup
Hi Tony,

On Wednesday 25 April 2018 07:05 PM, Tony Lindgren wrote:
> * Kishon Vijay Abraham I [180425 12:57]:
>> --- a/arch/arm/boot/dts/dra74x-mmc-iodelay.dtsi
>> +++ b/arch/arm/boot/dts/dra74x-mmc-iodelay.dtsi
>> @@ -49,6 +49,17 @@
>>                  >;
>>          };
>>
>> +        mmc1_pins_default_no_clk_pu: mmc1_pins_default_no_clk_pu {
>> +                pinctrl-single,pins = <
>> +                        DRA7XX_CORE_IOPAD(0x3754, PIN_INPUT_PULLDOWN | MUX_MODE0) /* mmc1_clk.clk */
>> +                        DRA7XX_CORE_IOPAD(0x3758, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_cmd.cmd */
>> +                        DRA7XX_CORE_IOPAD(0x375c, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_dat0.dat0 */
>> +                        DRA7XX_CORE_IOPAD(0x3760, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_dat1.dat1 */
>> +                        DRA7XX_CORE_IOPAD(0x3764, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_dat2.dat2 */
>> +                        DRA7XX_CORE_IOPAD(0x3768, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_dat3.dat3 */
>> +                >;
>> +        };
>> +
>>          mmc1_pins_sdr12: mmc1_pins_sdr12 {
>>                  pinctrl-single,pins = <
>>                          DRA7XX_CORE_IOPAD(0x3754, PIN_INPUT_PULLUP | MUX_MODE0) /* mmc1_clk.clk */
>> --
>
> If this data is the same for all of them, why don't you add something
> like dra7-iodelay.dtsi that can be included as needed?

okay, I'll add dra7-mmc-iodelay.dtsi and send a new revision.

Thanks
Kishon
[net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt

Signed-off-by: Ahmed Abdelsalam
---
 Documentation/networking/ip-sysctl.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5dc1a04..7528f71 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
 ip6frag_time - INTEGER
         Time in seconds to keep an IPv6 fragment in memory.

+IPv6 Segment Routing:
+
+seg6_flowlabel - INTEGER
+        Controls the behaviour of computing the flowlabel of outer
+        IPv6 header in case of SR T.encaps
+
+        -1 set flowlabel to zero.
+        0 copy flowlabel from Inner paceket in case of Inner IPv6
+                (Set flowlabel to 0 in case IPv4/L2)
+        1 Compute the flowlabel using seg6_make_flowlabel()
+
+        Default is 0.
+
 conf/default/*:
         Change the interface-specific default settings.
--
2.1.4
Re: [PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
On 04/27/2018 12:02 PM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
>
> Suggested-by: Daniel Borkmann
> Signed-off-by: Leo Yan

Applied to bpf-next, thanks Leo!
[PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
for JIT opcode dumping; this patch is to update the doc for it.

Suggested-by: Daniel Borkmann
Signed-off-by: Leo Yan
---
 Documentation/networking/filter.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..5032e12 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3

+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:

--
1.9.1
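As a reading aid only (not part of the patch): the rule added to filter.txt can be expressed as a tiny model, together with a helper that reads the live setting. This is a minimal Python sketch under stated assumptions. The function names are hypothetical; the accepted set {0, 1, 2} for kernels without CONFIG_BPF_JIT_ALWAYS_ON is taken from the values filter.txt describes; the sysctl is assumed to be exposed at /proc/sys/net/core/bpf_jit_enable. The model does not query the kernel configuration, so the caller supplies `jit_always_on`.

```python
from pathlib import Path
from typing import Optional

# Assumed procfs location of the sysctl discussed in this thread.
SYSCTL_PATH = Path("/proc/sys/net/core/bpf_jit_enable")


def write_is_accepted(value: int, jit_always_on: bool) -> bool:
    """Model whether writing `value` to bpf_jit_enable would succeed.

    With CONFIG_BPF_JIT_ALWAYS_ON the setting is pinned to 1, so even
    2 (JIT plus opcode dump to the kernel log) is refused.
    """
    if jit_always_on:
        return value == 1
    # Without the option: 0 = off, 1 = on, 2 = on with debug dump.
    return value in (0, 1, 2)


def read_current_setting(path: Path = SYSCTL_PATH) -> Optional[int]:
    """Return the current sysctl value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

On a kernel built with the option, the model says a write of 2 fails, which is why the patch text points to bpftool for inspecting JIT images instead of the kernel log.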
Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
On 04/27/2018 11:49 AM, Leo Yan wrote:
> On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
>> On 04/26/2018 04:26 AM, Leo Yan wrote:
>>> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
>>> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
>>> for JIT opcode dumping; this patch is to update the doc for it.
>>>
>>> Signed-off-by: Leo Yan
>>> ---
>>>  Documentation/networking/filter.txt | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/Documentation/networking/filter.txt
>>> b/Documentation/networking/filter.txt
>>> index fd55c7d..feddab9 100644
>>> --- a/Documentation/networking/filter.txt
>>> +++ b/Documentation/networking/filter.txt
>>> @@ -483,6 +483,12 @@ Example output from dmesg:
>>>  [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>>>  [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3
>>>
>>> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
>>> +and it returns failure if change to any other value from proc node; this is
>>> +for security consideration to avoid leaking info to unprivileged users. In this
>>> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
>>> +need to use bpf tool for the dumping.
>>> +
>>
>> Could you change this doc text a bit, I think it's slightly misleading. From the first
>> sentence one could also interpret that value 0 would leaking info to unprivileged users
>> whereas here we're only talking about the case of value 2. Maybe something roughly like
>> this to make it more clear:
>>
>>   When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
>>   setting any other value than that will return in failure. This is even the case for
>>   setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
>>   is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
>>   generally recommended approach instead.
>
> Yeah, your rephrasing is more clear and better. Will do this and send
> new patch soon. Thanks for your helping.

Awesome, thank you!
Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote: > On 04/26/2018 04:26 AM, Leo Yan wrote: > > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for > > bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2 > > for JIT opcode dumping; this patch is to update the doc for it. > > > > Signed-off-by: Leo Yan> > --- > > Documentation/networking/filter.txt | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/Documentation/networking/filter.txt > > b/Documentation/networking/filter.txt > > index fd55c7d..feddab9 100644 > > --- a/Documentation/networking/filter.txt > > +++ b/Documentation/networking/filter.txt > > @@ -483,6 +483,12 @@ Example output from dmesg: > > [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff > > ff 00 00 > > [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3 > > > > +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by > > default > > +and it returns failure if change to any other value from proc node; this is > > +for security consideration to avoid leaking info to unprivileged users. In > > this > > +case, we can't directly dump JIT opcode image from kernel log, > > alternatively we > > +need to use bpf tool for the dumping. > > + > > Could you change this doc text a bit, I think it's slightly misleading. From > the first > sentence one could also interpret that value 0 would leaking info to > unprivileged users > whereas here we're only talking about the case of value 2. Maybe something > roughly like > this to make it more clear: > > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set > to 1 and > setting any other value than that will return in failure. This is even the > case for > setting bpf_jit_enable to 2, since dumping the final JIT image into the > kernel log > is discouraged and introspection through bpftool (under tools/bpf/bpftool/) > is the > generally recommended approach instead. 

Yeah, your rephrasing is clearer and better. I will do this and send a
new patch soon. Thanks for your help.

> Thanks,
> Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
On 04/26/2018 04:26 AM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
>
> Signed-off-by: Leo Yan
> ---
>  Documentation/networking/filter.txt | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index fd55c7d..feddab9 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -483,6 +483,12 @@ Example output from dmesg:
>  [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>  [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3
>
> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> +and it returns failure if change to any other value from proc node; this is
> +for security consideration to avoid leaking info to unprivileged users. In this
> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> +need to use bpf tool for the dumping.
> +

Could you change this doc text a bit, I think it's slightly misleading. From the first
sentence one could also interpret that value 0 would leaking info to unprivileged users
whereas here we're only talking about the case of value 2. Maybe something roughly like
this to make it more clear:

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
setting any other value than that will return in failure. This is even the case for
setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
generally recommended approach instead.

Thanks,
Daniel
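
The behaviour discussed above can be observed from user space. The following is a minimal sketch, not part of the patch: it assumes the sysctl is exposed at /proc/sys/net/core/bpf_jit_enable and that a rejected write surfaces as an OSError. The helper names (describe_jit_mode, read_jit_mode, try_set_jit_mode) are made up for illustration. Writing to the real sysctl normally requires root.

```python
from pathlib import Path

# Sysctl file for the BPF JIT switch; everything else here is illustrative.
BPF_JIT_ENABLE = Path("/proc/sys/net/core/bpf_jit_enable")

# Meaning of the three documented values of bpf_jit_enable.
JIT_MODES = {
    0: "JIT disabled",
    1: "JIT enabled",
    2: "JIT enabled, compiled image dumped to the kernel log",
}


def describe_jit_mode(value: int) -> str:
    """Return a human-readable description of a bpf_jit_enable value."""
    return JIT_MODES.get(value, f"unknown value {value}")


def read_jit_mode(path: Path = BPF_JIT_ENABLE) -> int:
    """Read the current setting from the given sysctl file."""
    return int(path.read_text().strip())


def try_set_jit_mode(value: int, path: Path = BPF_JIT_ENABLE) -> bool:
    """Try to write a new setting; return False if the write is rejected.

    Assumption: with CONFIG_BPF_JIT_ALWAYS_ON the kernel refuses any value
    other than 1, and that refusal reaches Python as an OSError.
    """
    try:
        path.write_text(f"{value}\n")
    except OSError as exc:
        print(f"write of {value} rejected: {exc.strerror}")
        return False
    return True
```

On a kernel built with CONFIG_BPF_JIT_ALWAYS_ON, try_set_jit_mode(2) would be expected to return False, in which case bpftool (under tools/bpf/bpftool/) is the way to inspect the JIT image.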
Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
Hi Kim,

On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> On Wed, 25 Apr 2018 14:30:47 +0530
> Ganapatrao Kulkarni wrote:
>
> > +static int thunderx2_uncore_event_init(struct perf_event *event)
>
> This PMU driver can be made more user-friendly by not just silently
> returning an error code such as -EINVAL, but by emitting a useful
> message describing the specific error via dmesg.

As has previously been discussed on several occasions, patches which log
to dmesg in a pmu::event_init() path at any level above pr_debug() are
not acceptable -- dmesg is not intended as a mechanism to inform users of
driver-specific constraints.

I would appreciate if in future you could qualify your suggestion with
the requirement that pr_debug() is used.

Thanks,
Mark.
Re: [PATCH bpf-next v4 00/10] bpf: document eBPF helpers and add a script to generate man page
On 04/25/2018 07:16 PM, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
>
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
>
>     $ ./scripts/bpf_helpers_doc.py \
>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>     $ man /tmp/bpf-helpers.7
>
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions.
>
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
>
> Regarding the description of each helper, it comprises several items:
>
>   - The function prototype.
>   - A description of the function and of its arguments (except for a
>     couple of cases, when there are no arguments and the return value
>     makes the function usage really obvious).
>   - A description of return values (if not void).
>
> Additional items such as the list of compatible eBPF program and map
> types for each helper, Linux kernel version that introduced the helper,
> GPL-only restriction, and commit hash could be added in the future, but
> it was decided on the mailing list to leave them aside for now.
>
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! Some sentences were also adapted from comments
> from the reviews, thanks to the reviewers as well. Descriptions were
> completed as much as possible, the objective being to have something easily
> accessible even for people just starting with eBPF. There is probably a bit
> more work to do in this direction for some helpers.
[...]

Applied yesterday night to bpf-next (and now in net-next), thanks Quentin!
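
To illustrate the prototype-parsing idea mentioned in the cover letter, here is a minimal sketch of how a helper prototype might be split into return type, name, and arguments, then rendered as a small reStructuredText entry. It is not taken from scripts/bpf_helpers_doc.py: the regular expression, the HelperProto type, and the render_rst layout are all assumptions made for illustration, and the real script may work quite differently.

```python
import re
from typing import List, NamedTuple


class HelperProto(NamedTuple):
    """Parsed pieces of one helper prototype (illustrative type)."""
    ret_type: str
    name: str
    args: List[str]


# Hypothetical pattern: return type, then a name starting with "bpf_",
# then a parenthesised argument list.
PROTO_RE = re.compile(r"^(?P<ret>.+?[\s*]+)(?P<name>bpf_\w+)\((?P<args>.*)\)$")


def parse_proto(line: str) -> HelperProto:
    """Split a one-line C prototype into return type, name and arguments."""
    match = PROTO_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a helper prototype: {line!r}")
    raw_args = match.group("args").strip()
    # "(void)" and "()" both mean the helper takes no arguments.
    if raw_args in ("", "void"):
        args = []
    else:
        args = [arg.strip() for arg in raw_args.split(",")]
    return HelperProto(match.group("ret").strip(), match.group("name"), args)


def render_rst(proto: HelperProto, description: str) -> str:
    """Render one helper as a bold prototype followed by its description."""
    args = ", ".join(proto.args) or "void"
    header = f"**{proto.ret_type} {proto.name}({args})**"
    return f"{header}\n\n    Description\n        {description}\n"


proto = parse_proto("u64 bpf_ktime_get_ns(void)")
print(render_rst(proto, "Return the time elapsed since system boot."))
```

Keeping parsing separate from rendering is what would let a different printer reuse parse_proto to emit a header file instead of a manual page.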