[PATCH 00/12] Major code reorganization to make all i2c transfers working
The current driver fails in the following test cases:

1. Handling of failure cases does not work in long runs for BAM mode.
   It sometimes generates the error message
   “bam-dma-engine 7884000.dma: Cannot free busy channel”.

2. The following I2C transfers fail:
   a. Single transfer with multiple read messages
   b. Single transfer with multiple read/write messages with the maximum
      allowed length per message (65K) in BAM mode
   c. Single transfer with a write greater than 32 bytes in QUP v1 and a
      write greater than 64 bytes in QUP v2 for non-DMA mode

3. No handling is present for Block/FIFO interrupts. Any non-error
   interrupt is treated as transfer completion, and the driver then
   polls for available/free bytes in the FIFO.

Fixing all of these issues requires major code changes. This patch
series fixes all of the above issues and makes the driver interrupt
based instead of polling based. After these changes, all of the
mentioned test cases work properly.

The code changes have been tested on QUP v1 (IPQ8064) and QUP v2
(IPQ8074) with a sample application written over i2c-dev.

Abhishek Sahu (12):
  i2c: qup: fixed releasing dma without flush operation completion
  i2c: qup: minor code reorganization for use_dma
  i2c: qup: remove redundant variables for BAM SG count
  i2c: qup: schedule EOT and FLUSH tags at the end of transfer
  i2c: qup: fix the transfer length for BAM rx EOT FLUSH tags
  i2c: qup: proper error handling for i2c error in BAM mode
  i2c: qup: use the complete transfer length to choose DMA mode
  i2c: qup: change completion timeout according to transfer length
  i2c: qup: fix buffer overflow for multiple msg of maximum xfer len
  i2c: qup: send NACK for last read sub transfers
  i2c: qup: reorganization of driver code to remove polling for qup v1
  i2c: qup: reorganization of driver code to remove polling for qup v2

 drivers/i2c/busses/i2c-qup.c | 1538 +-
 1 file changed, 924 insertions(+), 614 deletions(-)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 02/12] i2c: qup: minor code reorganization for use_dma
1. Assigns use_dma in qup_dev structure itself which will
   help in subsequent patches to determine the mode in IRQ handler.
2. Does minor code reorganization for loops to reduce the
   unnecessary comparison and assignment.

Signed-off-by: Abhishek Sahu
---
 drivers/i2c/busses/i2c-qup.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 9faa26c41a..c68f433 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -190,6 +190,8 @@ struct qup_i2c_dev {
 
 	/* dma parameters */
 	bool			is_dma;
+	/* To check if the current transfer is using DMA */
+	bool			use_dma;
 	struct			dma_pool *dpool;
 	struct			qup_i2c_tag start_tag;
 	struct			qup_i2c_bam brx;
@@ -1297,7 +1299,7 @@ static int qup_i2c_xfer_v2(struct i2c_adapter *adap,
 			   int num)
 {
 	struct qup_i2c_dev *qup = i2c_get_adapdata(adap);
-	int ret, len, idx = 0, use_dma = 0;
+	int ret, len, idx = 0;
 
 	qup->bus_err = 0;
 	qup->qup_err = 0;
@@ -1326,13 +1328,12 @@ static int qup_i2c_xfer_v2(struct i2c_adapter *adap,
 			len = (msgs[idx].len > qup->out_fifo_sz) ||
 			      (msgs[idx].len > qup->in_fifo_sz);
 
-			if ((!is_vmalloc_addr(msgs[idx].buf)) && len) {
-				use_dma = 1;
-			} else {
-				use_dma = 0;
+			if (is_vmalloc_addr(msgs[idx].buf) || !len)
 				break;
-			}
 		}
+
+		if (idx == num)
+			qup->use_dma = true;
 	}
 
 	idx = 0;
@@ -1356,15 +1357,17 @@ static int qup_i2c_xfer_v2(struct i2c_adapter *adap,
 
 		reinit_completion(&qup->xfer);
 
-		if (use_dma) {
+		if (qup->use_dma) {
 			ret = qup_i2c_bam_xfer(adap, &msgs[idx], num);
+			qup->use_dma = false;
+			break;
 		} else {
 			if (msgs[idx].flags & I2C_M_RD)
 				ret = qup_i2c_read_one_v2(qup, &msgs[idx]);
 			else
 				ret = qup_i2c_write_one_v2(qup, &msgs[idx]);
 		}
-	} while ((idx++ < (num - 1)) && !use_dma && !ret);
+	} while ((idx++ < (num - 1)) && !ret);
 
 	if (!ret)
 		ret = qup_i2c_change_state(qup, QUP_RESET_STATE);
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation
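The mode-selection rule in the reorganized loop can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions: `struct msg`, its `in_vmalloc` flag (a stand-in for the result of is_vmalloc_addr()), and `should_use_dma()` are hypothetical names, not driver code. The point it shows is that DMA is chosen only when the loop runs to the end, that is, when no message broke out early.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i2c_msg; not the kernel type. */
struct msg {
	size_t len;		/* payload length in bytes */
	bool in_vmalloc;	/* stand-in for is_vmalloc_addr(buf) */
};

/*
 * Return true only if every message qualifies for DMA: it must be
 * larger than at least one FIFO and must not live in vmalloc memory.
 */
static bool should_use_dma(const struct msg *msgs, int num,
			   size_t out_fifo_sz, size_t in_fifo_sz)
{
	int idx;

	for (idx = 0; idx < num; idx++) {
		bool big = msgs[idx].len > out_fifo_sz ||
			   msgs[idx].len > in_fifo_sz;

		if (msgs[idx].in_vmalloc || !big)
			break;	/* one unsuitable message disables DMA */
	}

	/* idx reaches num only if no message broke out of the loop. */
	return idx == num;
}
```

Compared with setting and clearing a flag on every iteration, the single `idx == num` check after the loop does one comparison per transfer, which seems to be the "unnecessary comparison and assignment" the commit message refers to.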
[PATCH 01/12] i2c: qup: fixed releasing dma without flush operation completion
The QUP BSLP BAM sometimes generates the following error if the
current I2C DMA transfer fails and the flush operation has been
scheduled:

“bam-dma-engine 7884000.dma: Cannot free busy channel”

If an I2C error occurs during a BAM DMA transfer, the QUP I2C
interrupt is generated and the flush operation is carried out to
make the I2C controller consume all scheduled DMA transfers.
Currently, the flush reuses, without reinit, the same completion
structure that has already completed for the BAM transfer. As a
result, the flush operation's wait_for_completion_timeout() returns
immediately, and the driver proceeds to free the DMA resources while
the descriptors are still in process.

Signed-off-by: Abhishek Sahu
---
 drivers/i2c/busses/i2c-qup.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 08f8e01..9faa26c41a 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009-2013, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2009-2013, 2016-2018, The Linux Foundation. All rights reserved.
  * Copyright (c) 2014, Sony Mobile Communications AB.
  *
  *
@@ -844,6 +844,8 @@ static int qup_i2c_bam_do_xfer(struct qup_i2c_dev *qup, struct i2c_msg *msg,
 	}
 
 	if (ret || qup->bus_err || qup->qup_err) {
+		reinit_completion(&qup->xfer);
+
 		if (qup_i2c_change_state(qup, QUP_RUN_STATE)) {
 			dev_err(qup->dev, "change to run state timed out");
 			goto desc_err;
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation
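To see why a reused completion lets the flush wait return at once, here is a toy single-threaded model in user-space C. It is a sketch, not the kernel's struct completion: `toy_completion` and the `toy_*` helpers are hypothetical, and the assumption that two completion events arrive for one failed transfer is just one plausible way for a stale event to be left behind. The model only keeps the counting behaviour: each complete adds an event, each successful wait consumes one, and reinit discards any that remain.

```c
#include <stdbool.h>

/* Toy model of a completion counter; NOT the kernel's struct completion. */
struct toy_completion {
	unsigned int done;	/* number of pending completion events */
};

static void toy_init(struct toy_completion *c)
{
	c->done = 0;
}

/* Discard any stale events, as reinit_completion() does. */
static void toy_reinit(struct toy_completion *c)
{
	c->done = 0;
}

/* Signal one completion event. */
static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/*
 * Non-blocking model of a wait: consume one pending event and return
 * true, or return false where a real waiter would have to sleep.
 */
static bool toy_try_wait(struct toy_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}
```

In this model, a wait issued right after a leftover event succeeds immediately even though the flush has not finished. Calling `toy_reinit()` first forces the wait to depend on a fresh event, which mirrors what the added reinit_completion() call is meant to achieve.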
Re: [PATCH v2 2/2] HID: core: Fix size as type u32
Hi Aaron, On Mon, Jan 08, 2018 at 10:41:41AM +0800, Aaron Ma wrote: > When size is negative, calling memset will make segment fault. > Declare the size as type u32 to keep memset safe. > > size in struct hid_report is unsigned, fix return type of > hid_report_len to u32. > > Cc: sta...@vger.kernel.org > Signed-off-by: Aaron Ma > --- > drivers/hid/hid-core.c | 10 +- > include/linux/hid.h| 6 +++--- > 2 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c > index 0c3f608131cf..cf81c53e3b98 100644 > --- a/drivers/hid/hid-core.c > +++ b/drivers/hid/hid-core.c > @@ -1390,7 +1390,7 @@ u8 *hid_alloc_report_buf(struct hid_report *report, > gfp_t flags) >* of implement() working on 8 byte chunks >*/ > > - int len = hid_report_len(report) + 7; > + u32 len = hid_report_len(report) + 7; > > return kmalloc(len, flags); > } > @@ -1455,7 +1455,7 @@ void __hid_request(struct hid_device *hid, struct > hid_report *report, > { > char *buf; > int ret; > - int len; > + u32 len; > > buf = hid_alloc_report_buf(report, GFP_KERNEL); > if (!buf) > @@ -1481,14 +1481,14 @@ void __hid_request(struct hid_device *hid, struct > hid_report *report, > } > EXPORT_SYMBOL_GPL(__hid_request); > > -int hid_report_raw_event(struct hid_device *hid, int type, u8 *data, int > size, > +int hid_report_raw_event(struct hid_device *hid, int type, u8 *data, u32 > size, > int interrupt) > { > struct hid_report_enum *report_enum = hid->report_enum + type; > struct hid_report *report; > struct hid_driver *hdrv; > unsigned int a; > - int rsize, csize = size; > + u32 rsize, csize = size; > u8 *cdata = data; > int ret = 0; > > @@ -1546,7 +1546,7 @@ EXPORT_SYMBOL_GPL(hid_report_raw_event); > * > * This is data entry for lower layers. 
> */ > -int hid_input_report(struct hid_device *hid, int type, u8 *data, int size, > int interrupt) > +int hid_input_report(struct hid_device *hid, int type, u8 *data, u32 size, > int interrupt) > { > struct hid_report_enum *report_enum; > struct hid_driver *hdrv; > diff --git a/include/linux/hid.h b/include/linux/hid.h > index d491027a7c22..9bc296eebc98 100644 > --- a/include/linux/hid.h > +++ b/include/linux/hid.h > @@ -841,7 +841,7 @@ extern int hidinput_connect(struct hid_device *hid, > unsigned int force); > extern void hidinput_disconnect(struct hid_device *); > > int hid_set_field(struct hid_field *, unsigned, __s32); > -int hid_input_report(struct hid_device *, int type, u8 *, int, int); > +int hid_input_report(struct hid_device *, int type, u8 *, u32, int); > int hidinput_find_field(struct hid_device *hid, unsigned int type, unsigned > int code, struct hid_field **field); > struct hid_field *hidinput_get_led_field(struct hid_device *hid); > unsigned int hidinput_count_leds(struct hid_device *hid); > @@ -1088,13 +1088,13 @@ static inline void hid_hw_wait(struct hid_device > *hdev) > * > * @report: the report we want to know the length > */ > -static inline int hid_report_len(struct hid_report *report) > +static inline u32 hid_report_len(struct hid_report *report) hid_report_len() is used in several files. If we think it is a good idea to change the return type, we should fix these files as well. 
[08:47:56]marcus@little:~/git/linux$ git grep -l hid_report_len
drivers/hid/hid-core.c
drivers/hid/hid-input.c
drivers/hid/hid-multitouch.c
drivers/hid/hid-rmi.c
drivers/hid/usbhid/hid-core.c
drivers/hid/wacom_sys.c
drivers/staging/greybus/hid.c
include/linux/hid.h

>  {
>  	/* equivalent to DIV_ROUND_UP(report->size, 8) + !!(report->id > 0) */
>  	return ((report->size - 1) >> 3) + 1 + (report->id > 0);
>  }
>
> -int hid_report_raw_event(struct hid_device *hid, int type, u8 *data, int size,
> +int hid_report_raw_event(struct hid_device *hid, int type, u8 *data, u32 size,
>  		int interrupt);
>
>  /* HID quirks API */
> --
> 2.14.3

Best regards
Marcus Folkesson
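As background for the signedness discussion, the following user-space sketch shows the two pieces of arithmetic involved. `report_len()` copies the expression from hid_report_len() onto plain integers, and `as_memset_len()` is a hypothetical helper that shows what a negative int becomes once it is converted to the size_t that memset() takes. Neither function is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Same arithmetic as hid_report_len(): bits rounded up to bytes,
 * plus one byte when a report ID is present.
 */
static uint32_t report_len(uint32_t size_bits, uint32_t id)
{
	return ((size_bits - 1) >> 3) + 1 + (id > 0);
}

/*
 * A negative int converted to size_t wraps to a huge value, which is
 * why a negative length handed to memset() runs far past the buffer.
 */
static size_t as_memset_len(int len)
{
	return (size_t)len;
}
```

For example, a 9-bit report with a non-zero ID needs 3 bytes, while `as_memset_len(-1)` yields `SIZE_MAX`. Making the length unsigned does not by itself make a bad value safe, so callers would presumably still need a bounds check.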
Re: [BUG] x86 : i486 reporting to be vulnerable to Meltdown/Spectre_V1/Spectre_V2
On Fri, 2018-02-02 at 23:52 -0500, tedheadster wrote:
> I just tested the 4.15 kernel and it is reporting that my old i486
> (non-cpuid capable) cpu is vulnerable to all three issues: Meltdown,
> Spectre V1, and Spectre V2.
>
> I find this to be _unlikely_.

This should be fixed in Linus' tree already by commit fec9434a1
("x86/pti: Do not enable PTI on CPUs which are not vulnerable to
Meltdown").

We'll make sure it ends up in the stable tree too, if it hasn't already.
Re: [PATCH 1/3] net: stmmac: dwmac-sun8i: drop V3s compatible and add V3 one
On 3 February 2018 at 6:13:01 AM GMT+08:00, Maxime Ripard wrote:
>On Sat, Feb 03, 2018 at 02:04:54AM +0800, Icenowy Zheng wrote:
>> The V3s is just a differently packaged version of the V3 chip, which has
>> a MAC with the same capability with H3. The V3s just doesn't wire out
>> the external MII/RMII/RGMII bus. (V3 wired out it).
>>
>> Drop the compatible string of V3s in the dwmac-sun8i driver, and add a
>> V3 compatible string, which has all capabilities.
>>
>> Signed-off-by: Icenowy Zheng
>
>This breaks the DT ABI, so NAK.

I have asked about this on IRC. The V3s compatible string has never been
used in any mainline kernel, not even in any RC version.

>
>Maxime
Re: Coccinelle: zalloc-simple: Checking consistency for SmPL rules
>> * Do we agree that a proper size determination is essential for every
>>   condition in the discussed SmPL rules together with forwarding
>>   this information?
>
> No. I don't mind a few false positives.

Do you care to split SmPL rules by their confidence category in such a use case?

Regards,
Markus
Re: [linux-sunxi] [PATCH 1/3] net: stmmac: dwmac-sun8i: drop V3s compatible and add V3 one
On 3 February 2018 at 2:00:33 PM GMT+08:00, Julian Calaby wrote:
>Hi Icenowy,
>
>On Sat, Feb 3, 2018 at 5:04 AM, Icenowy Zheng wrote:
>> The V3s is just a differently packaged version of the V3 chip, which has
>> a MAC with the same capability with H3. The V3s just doesn't wire out
>> the external MII/RMII/RGMII bus. (V3 wired out it).
>>
>> Drop the compatible string of V3s in the dwmac-sun8i driver, and add a
>> V3 compatible string, which has all capabilities.
>
>Aren't compatible strings technically API, so don't we need to support
>those that are out in the wild "forever"?
>
>Therefore shouldn't we leave the v3s variant around for compatibility
>with existing device trees?

You can run grep in arch/arm/boot/dts; this compatible string is not used at all.

>
>Thanks,
Re: clang warning: implicit conversion in intel_ddi.c:1481
On Fri, 2018-02-02 at 16:50 +0100, Greg KH wrote: > On Fri, Feb 02, 2018 at 04:37:55PM +0200, Jani Nikula wrote: > > On Fri, 02 Feb 2018, Greg KH wrote: > > > On Fri, Feb 02, 2018 at 12:44:38PM +0200, Jani Nikula wrote: > > >> > > >> +Knut, Fengguang > > >> > > >> On Fri, 02 Feb 2018, Greg KH wrote: > > >> >- If clang now builds the kernel "cleanly", yes, I want to take > > >> > warning fixes in the stable tree. And even better yet, if you > > >> > keep working to ensure the tree is "clean", that would be > > >> > wonderful. > > >> > > >> So we can run sparse using 'make C=1' and friends, or other static > > >> analysis tools using 'make CHECK=foo C=1', as long as the passed command > > >> line params work. There was work by Knut to extend this make checker > > >> stuff [1]. Since mixing different HOSTCC's in a single workdir seems > > >> like a bad idea, I wonder how hard it would be to make clang work like > > >> this: > > >> > > >> $ make CHECK=clang C=1 > > >> > > >> Or using Knut's wrapper. Feels like that could increase the use of clang > > >> for static analysis of patches. > > > > > > Why not just build with clang itself: > > > make CC=clang > > > > Same as HOSTCC, mixing different CC's in a single build dir seems like a > > bad idea. Sure, everyone can setup a separate build dir for clang, but > > IMHO having 'make CHECK=clang C=1' work has least resistance. YMMV. > > "O=some_output_dir" is your friend. If you aren't doing that already > for your test builds, you don't know what you are missing :) I use O= a lot myself - so good not to have all the output files "pollute" the source tree, and to be able to switch branches and compile without having to recompile everything by having multiple O= set up. 
I think what my runchecks wrapper script brings in addition is the
ability to do a number of checks which may or may not pass, or even
return error codes, from the same 'make' command, and to configure
which errors to fix now and which to postpone/ignore (and thus not
fail on).

As an example, I just tried clang (on v4.15-rc6) with:

  cd $HOME/src/kernel
  make O=$HOME/build/kernel/clang
  cd $HOME/build/kernel/clang
  make

and it fails to compile for me in arch/x86/xen/mmu_pv.o.

If I wanted to just make sure that some patches did not introduce new
errors with clang, I would waste some time on unrelated errors, and
there would be noise in the output, also consuming personal "cycles".

I haven't really looked at the details of much of what clang reports
as errors yet, but I can imagine that specific errors reported by
clang might be useful to correct even in old kernels, where some files
inevitably will fail to compile like this. This would be easy to
handle with runchecks using a few exceptions for those problems/files
not yet fixed, allowing a run to easily detect (while compiling with
gcc as the main compiler) that no new clang errors were introduced of
any kind other than those suppressed.

Thanks,
Knut

>
> thanks,
>
> greg k-h
Re: [PATCH RESEND v3] perf/core: Fix installing cgroup event into cpu
Hi leilei.lin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.15 next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/linxiulei-gmail-com/perf-core-Fix-installing-cgroup-event-into-cpu/20180203-133110
config: i386-randconfig-s0-201804 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   kernel/events/core.c: In function '__perf_install_in_context':
>> kernel/events/core.c:2332:10: error: implicit declaration of function 'perf_cgroup_from_task' [-Werror=implicit-function-declaration]
      cgrp = perf_cgroup_from_task(current, ctx);
             ^
   kernel/events/core.c:2332:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      cgrp = perf_cgroup_from_task(current, ctx);
           ^
   kernel/events/core.c:2333:40: error: dereferencing pointer to incomplete type 'struct perf_cgroup'
      reprogram = cgroup_is_descendant(cgrp->css.cgroup,
                                           ^~
   kernel/events/core.c:2334:11: error: 'struct perf_event' has no member named 'cgrp'
         event->cgrp->css.cgroup);
              ^~
   cc1: some warnings being treated as errors

vim +/perf_cgroup_from_task +2332 kernel/events/core.c

  2284	
  2285	/*
  2286	 * Cross CPU call to install and enable a performance event
  2287	 *
  2288	 * Very similar to remote_function() + event_function() but cannot assume that
  2289	 * things like ctx->is_active and cpuctx->task_ctx are set.
  2290	 */
  2291	static int  __perf_install_in_context(void *info)
  2292	{
  2293		struct perf_event *event = info;
  2294		struct perf_event_context *ctx = event->ctx;
  2295		struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
  2296		struct perf_event_context *task_ctx = cpuctx->task_ctx;
  2297		struct perf_cgroup *cgrp;
  2298		bool reprogram = true;
  2299		int ret = 0;
  2300	
  2301		raw_spin_lock(&cpuctx->ctx.lock);
  2302		if (ctx->task) {
  2303			raw_spin_lock(&ctx->lock);
  2304			task_ctx = ctx;
  2305	
  2306			reprogram = (ctx->task == current);
  2307	
  2308			/*
  2309			 * If the task is running, it must be running on this CPU,
  2310			 * otherwise we cannot reprogram things.
  2311			 *
  2312			 * If its not running, we don't care, ctx->lock will
  2313			 * serialize against it becoming runnable.
  2314			 */
  2315			if (task_curr(ctx->task) && !reprogram) {
  2316				ret = -ESRCH;
  2317				goto unlock;
  2318			}
  2319	
  2320			WARN_ON_ONCE(reprogram && cpuctx->task_ctx && cpuctx->task_ctx != ctx);
  2321		} else if (task_ctx) {
  2322			raw_spin_lock(&task_ctx->lock);
  2323		}
  2324	
  2325		if (is_cgroup_event(event)) {
  2326			/*
  2327			 * Only care about cgroup events.
  2328			 *
  2329			 * If only the task belongs to cgroup of this event,
  2330			 * we will continue the installment
  2331			 */
> 2332			cgrp = perf_cgroup_from_task(current, ctx);
  2333			reprogram = cgroup_is_descendant(cgrp->css.cgroup,
  2334						event->cgrp->css.cgroup);
  2335		}
  2336	
  2337		if (reprogram) {
  2338			ctx_sched_out(ctx, cpuctx, EVENT_TIME);
  2339			add_event_to_ctx(event, ctx);
  2340			ctx_resched(cpuctx, task_ctx, get_event_type(event));
  2341		} else {
  2342			add_event_to_ctx(event, ctx);
  2343		}
  2344	
  2345	unlock:
  2346		perf_ctx_unlock(cpuctx, task_ctx);
  2347	
  2348		return ret;
  2349	}
  2350	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
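The three errors are consistent with a randconfig in which cgroup support for perf is compiled out (in mainline that option is CONFIG_CGROUP_PERF), although the report does not name the option, so treat that as an assumption. The usual way to keep such callers buildable is to hide the optional member behind a helper that has a stub variant. The sketch below shows the pattern in user-space C; `HAVE_CGROUP_SUPPORT`, `struct event`, `event_matches_cgroup()` and `should_reprogram()` are all hypothetical names, not perf code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build switch standing in for a Kconfig option. */
/* #define HAVE_CGROUP_SUPPORT 1 */

struct event {
	int id;
#ifdef HAVE_CGROUP_SUPPORT
	int cgrp_id;	/* only exists when the feature is built in */
#endif
};

#ifdef HAVE_CGROUP_SUPPORT
/* Real variant: compare the event's cgroup with the task's cgroup. */
static bool event_matches_cgroup(const struct event *ev, int task_cgrp_id)
{
	return ev->cgrp_id == task_cgrp_id;
}
#else
/* Stub variant: without the feature, every event counts as matching. */
static bool event_matches_cgroup(const struct event *ev, int task_cgrp_id)
{
	(void)ev;
	(void)task_cgrp_id;
	return true;
}
#endif

/* The caller contains no #ifdef and never touches the optional member. */
static bool should_reprogram(const struct event *ev, int task_cgrp_id)
{
	return event_matches_cgroup(ev, task_cgrp_id);
}
```

With the switch left undefined, as here, `should_reprogram()` compiles and returns true for any event, so the common code path builds in both configurations.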
Re: [PATCH] media: cx25821: prevent out-of-bounds read on array card
Hi Colin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linuxtv-media/master]
[also build test WARNING on v4.15 next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Colin-King/media-cx25821-prevent-out-of-bounds-read-on-array-card/20180203-130958
base:   git://linuxtv.org/media_tree.git master
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:7:0,
                    from include/linux/kernel.h:14,
                    from include/linux/list.h:9,
                    from include/linux/kobject.h:20,
                    from include/linux/device.h:17,
                    from include/linux/i2c.h:30,
                    from drivers/media/pci/cx25821/cx25821-core.c:22:
   drivers/media/pci/cx25821/cx25821-core.c: In function 'cx25821_dev_setup':
>> include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
    #define KERN_INFO KERN_SOH "6" /* informational */
                      ^~~~
   include/linux/printk.h:308:9: note: in expansion of macro 'KERN_INFO'
     printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
            ^
>> drivers/media/pci/cx25821/cx25821.h:380:2: note: in expansion of macro 'pr_info'
     pr_info("(%d): " fmt, dev->board, ##args)
     ^~~
>> drivers/media/pci/cx25821/cx25821-core.c:871:3: note: in expansion of macro 'CX25821_INFO'
      CX25821_INFO("dev->nr >= %ld", ARRAY_SIZE(card));
      ^~~~

vim +/pr_info +380 drivers/media/pci/cx25821/cx25821.h

02b20b0b drivers/staging/cx25821/cx25821.h Mauro Carvalho Chehab 2009-09-15  374  
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07  375  #define CX25821_ERR(fmt, args...)			\
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07  376  	pr_err("(%d): " fmt, dev->board, ##args)
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07  377  #define CX25821_WARN(fmt, args...)			\
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07  378  	pr_warn("(%d): " fmt, dev->board, ##args)
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07  379  #define CX25821_INFO(fmt, args...)			\
36d89f7d drivers/staging/cx25821/cx25821.h Joe Perches           2010-11-07 @380  	pr_info("(%d): " fmt, dev->board, ##args)
02b20b0b drivers/staging/cx25821/cx25821.h Mauro Carvalho Chehab 2009-09-15  381  

:: The code at line 380 was first introduced by commit
:: 36d89f7de4a4937848de86d9b35cb03a9f0357e1 [media] drivers/staging/cx25821: Use pr_fmt and pr_

:: TO: Joe Perches
:: CC: Mauro Carvalho Chehab

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
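The warning arises because ARRAY_SIZE() evaluates to a size_t, which is an unsigned int on this xtensa build, while the format string asks for a long. The portable conversion for size_t is `%zu`, which printk-style formats accept as well. A minimal user-space sketch follows; the three-element `card` array and the `format_limit()` helper are hypothetical stand-ins, not the driver's data.

```c
#include <stddef.h>
#include <stdio.h>

/* User-space equivalent of the kernel's ARRAY_SIZE(); yields a size_t. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver's card table. */
static const int card[] = { 10, 20, 30 };

/*
 * Format the array bound with %zu, the conversion that matches size_t
 * on every architecture, instead of %ld or %d.
 */
static int format_limit(char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "dev->nr >= %zu", ARRAY_SIZE(card));
}
```

Using `%zu` avoids both this 32-bit warning and the opposite warning that `%d` would trigger on 64-bit builds.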
Re: [linux-sunxi] [PATCH 1/3] net: stmmac: dwmac-sun8i: drop V3s compatible and add V3 one
Hi Icenowy,

On Sat, Feb 3, 2018 at 5:04 AM, Icenowy Zheng wrote:
> The V3s is just a differently packaged version of the V3 chip, which has
> a MAC with the same capability with H3. The V3s just doesn't wire out
> the external MII/RMII/RGMII bus. (V3 wired out it).
>
> Drop the compatible string of V3s in the dwmac-sun8i driver, and add a
> V3 compatible string, which has all capabilities.

Aren't compatible strings technically API, so don't we need to support
those that are out in the wild "forever"?

Therefore shouldn't we leave the v3s variant around for compatibility
with existing device trees?

Thanks,

--
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
Re: [PATCH AUTOSEL for 3.18 36/40] powerpc/xmon: Avoid tripping SMP hardlockup watchdog
On Tue, 30 Jan 2018 15:35:54 +1100 Michael Ellerman wrote: > alexander.le...@verizon.com writes: > > > On Thu, Dec 14, 2017 at 12:10:39AM +1100, Michael Ellerman wrote: > >>alexander.le...@verizon.com writes: > >> > >>> From: Nicholas Piggin > >>> > >>> [ Upstream commit 064996d62a33ffe10264b5af5dca92d54f60f806 ] > >>> > >>> The SMP hardlockup watchdog cross-checks other CPUs for lockups, which > >>> causes xmon headaches because it's assuming interrupts hard disabled > >>> means no watchdog troubles. Try to improve that by calling > >>> touch_nmi_watchdog() in obvious places where secondaries are spinning. > >>> > >>> Also annotate these spin loops with spin_begin/end calls. > >> > >>These macros didn't exist until 4.13, and haven't been backported AFAIK. > > > > But the touch_nmi_watchdog() bits are something we want in stable, right? > > I don't think you need them unless you've also back ported > arch/powerpc/kernel/watchdog.c, which I don't think you have. > > Maybe Nick can confirm? I'm not 100% sure. The CPUs only check themselves for lockups. They will blow their threshold when in xmon, but when they come out of xmon, I think by a quirk of our local_irq_enable() implementation that actually checks timers explicitly and runs them first before re-enabling hard interrupts, then our heartbeat starts up again just before the perf interrupt would come in to report the lockup. I think. Given that we've had no reports of misbehaviour of the old perf watchdog, I would say you can skip the backport. Thanks, Nick
Re: [PATCH RESEND v3] perf/core: Fix installing cgroup event into cpu
Hi leilei.lin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.15 next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/linxiulei-gmail-com/perf-core-Fix-installing-cgroup-event-into-cpu/20180203-133110
config: i386-randconfig-x071-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   kernel/events/core.c: In function '__perf_install_in_context':
>> kernel/events/core.c:2332:10: error: implicit declaration of function 'perf_cgroup_from_task'; did you mean 'perf_cgroup_match'? [-Werror=implicit-function-declaration]
      cgrp = perf_cgroup_from_task(current, ctx);
             ^
             perf_cgroup_match
   kernel/events/core.c:2332:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      cgrp = perf_cgroup_from_task(current, ctx);
           ^
>> kernel/events/core.c:2333:40: error: dereferencing pointer to incomplete type 'struct perf_cgroup'
      reprogram = cgroup_is_descendant(cgrp->css.cgroup,
                                           ^~
>> kernel/events/core.c:2334:11: error: 'struct perf_event' has no member named 'cgrp'
         event->cgrp->css.cgroup);
              ^~
   cc1: some warnings being treated as errors

vim +2332 kernel/events/core.c

  2284	
  2285	/*
  2286	 * Cross CPU call to install and enable a performance event
  2287	 *
  2288	 * Very similar to remote_function() + event_function() but cannot assume that
  2289	 * things like ctx->is_active and cpuctx->task_ctx are set.
  2290	 */
  2291	static int  __perf_install_in_context(void *info)
  2292	{
  2293		struct perf_event *event = info;
  2294		struct perf_event_context *ctx = event->ctx;
  2295		struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
  2296		struct perf_event_context *task_ctx = cpuctx->task_ctx;
  2297		struct perf_cgroup *cgrp;
  2298		bool reprogram = true;
  2299		int ret = 0;
  2300	
  2301		raw_spin_lock(&cpuctx->ctx.lock);
  2302		if (ctx->task) {
  2303			raw_spin_lock(&ctx->lock);
  2304			task_ctx = ctx;
  2305	
  2306			reprogram = (ctx->task == current);
  2307	
  2308			/*
  2309			 * If the task is running, it must be running on this CPU,
  2310			 * otherwise we cannot reprogram things.
  2311			 *
  2312			 * If its not running, we don't care, ctx->lock will
  2313			 * serialize against it becoming runnable.
  2314			 */
  2315			if (task_curr(ctx->task) && !reprogram) {
  2316				ret = -ESRCH;
  2317				goto unlock;
  2318			}
  2319	
  2320			WARN_ON_ONCE(reprogram && cpuctx->task_ctx && cpuctx->task_ctx != ctx);
  2321		} else if (task_ctx) {
  2322			raw_spin_lock(&task_ctx->lock);
  2323		}
  2324	
  2325		if (is_cgroup_event(event)) {
  2326			/*
  2327			 * Only care about cgroup events.
  2328			 *
  2329			 * If only the task belongs to cgroup of this event,
  2330			 * we will continue the installment
  2331			 */
> 2332			cgrp = perf_cgroup_from_task(current, ctx);
> 2333			reprogram = cgroup_is_descendant(cgrp->css.cgroup,
> 2334						event->cgrp->css.cgroup);
  2335		}
  2336	
  2337		if (reprogram) {
  2338			ctx_sched_out(ctx, cpuctx, EVENT_TIME);
  2339			add_event_to_ctx(event, ctx);
  2340			ctx_resched(cpuctx, task_ctx, get_event_type(event));
  2341		} else {
  2342			add_event_to_ctx(event, ctx);
  2343		}
  2344	
  2345	unlock:
  2346		perf_ctx_unlock(cpuctx, task_ctx);
  2347	
  2348		return ret;
  2349	}
  2350	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH] audit: update bugtracker and source URIs
Since the Linux Audit project has transitioned completely over to
github, update the MAINTAINERS file and the primary audit source file to
reflect that reality.

Signed-off-by: Richard Guy Briggs
---
 MAINTAINERS    | 1 -
 kernel/audit.c | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 845fc25..fba4875 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2479,7 +2479,6 @@ M:	Paul Moore
 M:	Eric Paris
 L:	linux-au...@redhat.com (moderated for non-subscribers)
 W:	https://github.com/linux-audit
-W:	https://people.redhat.com/sgrubb/audit
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
 S:	Supported
 F:	include/linux/audit.h
diff --git a/kernel/audit.c b/kernel/audit.c
index 227db99..5c25449 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -38,7 +38,8 @@
  *	  6) Support low-overhead kernel-based filtering to minimize the
  *	     information that must be passed to user-space.
  *
- * Example user-space utilities: http://people.redhat.com/sgrubb/audit/
+ * Audit userspace, documentation, tests, and bug/issue trackers:
+ * 	https://github.com/linux-audit
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
--
1.8.3.1
[BUG] x86 : i486 reporting to be vulnerable to Meltdown/Spectre_V1/Spectre_V2
I just tested the 4.15 kernel and it is reporting that my old i486
(non-cpuid capable) cpu is vulnerable to all three issues: Meltdown,
Spectre V1, and Spectre V2.

I find this to be _unlikely_.

/sys/devices/system/cpu/vulnerabilities/* reports the following:

meltdown: "Vulnerable"
spectre_v1: "Vulnerable"
spectre_v2: "Vulnerable: Minimal generic ASM retpoline"

The output of dmesg includes:

"Spectre V2 mitigation: Vulnerable: Minimal generic ASM retpoline"
"Spectre V2 mitigation: Filling RSB on context switch"

Also, /proc/cpuinfo reports the following:

cpuid level: -1
flags: fpu retpoline rsb_ctxsw
bugs: cpu_meltdown spectre_v1 spectre_v2

I have the hardware to test on. Send me your patches.

- Matthew Whitehead
Re: [PATCH] net: mlx5: remove pointless memcpy
On 02/02/2018 12:26 PM, Arnd Bergmann wrote: > On Fri, Feb 2, 2018 at 8:06 PM, Jason Gunthorpe wrote: >> On Fri, Feb 02, 2018 at 04:46:30PM +0100, Arnd Bergmann wrote: >>> gcc-8 notices that the memcpy in mlx5_core_query_xsrq() makes no >>> sense because the source and destination variables are identical: >>> >>> drivers/net/ethernet/mellanox/mlx5/core/transobj.c: In function >>> 'mlx5_core_query_xsrq': >>> drivers/net/ethernet/mellanox/mlx5/core/transobj.c:347:3: error: 'memcpy' >>> source argument is the same as destination [-Werror=restrict] >>> >>> Either one of the pointers should be something else, or the code is >>> completely bogus. Removing the memcpy() won't change the behavior >>> but gets rid of the warning. >>> >>> Fixes: 01949d0109ee ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > >>> 0") >>> Signed-off-by: Arnd Bergmann >>> Please review carefully, I have no idea what the author actually >>> intended here. >> >> I think they intended to adjust the command return between >> mlx5_ifc_query_srq_out_bits and mlx5_ifc_query_xrc_srq_out_bits? >> >>> index 9e38343a951f..75450f7d53bf 100644 >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/transobj.c >>> @@ -332,20 +332,12 @@ int mlx5_core_destroy_xsrq(struct mlx5_core_dev *dev, >>> u32 xsrqn) >>> int mlx5_core_query_xsrq(struct mlx5_core_dev *dev, u32 xsrqn, u32 *out) >>> { >>> u32 in[MLX5_ST_SZ_DW(query_xrc_srq_in)] = {0}; >>> - void *srqc; >>> - void *xrc_srqc; >>> int err; >>> >>> MLX5_SET(query_xrc_srq_in, in, opcode, MLX5_CMD_OP_QUERY_XRC_SRQ); >>> MLX5_SET(query_xrc_srq_in, in, xrc_srqn, xsrqn); >>> err = mlx5_cmd_exec(dev, in, sizeof(in), out, >>> MLX5_ST_SZ_BYTES(query_xrc_srq_out)); >>> - if (!err) { >>> - xrc_srqc = MLX5_ADDR_OF(query_xrc_srq_out, out, >>> - xrc_srq_context_entry); >>> - srqc = MLX5_ADDR_OF(query_srq_out, out, srq_context_entry); >>> - memcpy(srqc, xrc_srqc, MLX5_ST_SZ_BYTES(srqc)); >>> - } OMG! 
>> >> Probably should add a >> >> BUILD_BUG_ON(MLX5_BYTE_OFF(query_xrc_srq_out, xrc_srq_context_entry) == >> MLX5_BYTE_OFF(query_srq_out, srq_context_entry)); >> >> Just for clarity that the SRQ and XRC_SRQ are being used interchangeably. >> >> and the 'err' variable can be eliminated. >> >> Curious though that I can't find a call site for it, and removing the >> prototype doesn't break the build.. Seems like dead code. > > I checked the git history and don't see any user ever added after the function > first showed up in the kernel, same for a couple of other functions from > commit 01949d0109ee ("net/mlx5_core: Enable XRCs and SRQs when > using ISSI > 0"). > > Can you come up with a proper patch for this issue, either removing the > dead code, or fixing it appropriately? You clearly understand what this > file is about, and I don't ;-) Simply this is just pointless dead code, will remove it, there is no point of trying to figure out what the author was thinking the day he wrote that patch :) Thank you Arnd for spotting this. > > Arnd >
Re: [PATCH 4.15 00/55] 4.15.1-stable review
On Fri, Feb 02, 2018 at 05:58:18PM +0100, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.15.1 release. > There are 55 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Sun Feb 4 14:07:50 UTC 2018. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.1-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.15.y > and the diffstat can be found below. Results from Linaro’s test farm. No regressions since 4.15 release, but you'll notice high failure counts in kselftest. These are because it was the first RC and I ran the tests multiple times - first without a skipfile, and then again with a partial skipfile. All of the failures look like known issues that we also saw on 4.15 release. 
Summary

kernel: 4.15.1-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.15.y
git commit: b01b3d9519f250398695c7cc6493ba1e8fb072f4
git describe: v4.15-56-gb01b3d9519f2
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.15-oe/build/v4.15-56-gb01b3d9519f2

No regressions (compared to build )

Boards, architectures and test suites:

hi6220-hikey - arm64
* boot - pass: 38
* kselftest - pass: 98, skip: 14, fail: 12
* libhugetlbfs - pass: 180, skip: 2
* ltp-cap_bounds-tests - pass: 4
* ltp-containers-tests - pass: 128
* ltp-fcntl-locktests-tests - pass: 4
* ltp-filecaps-tests - pass: 4
* ltp-fs-tests - pass: 120
* ltp-fs_bind-tests - pass: 4
* ltp-fs_perms_simple-tests - pass: 38
* ltp-fsx-tests - pass: 4
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 6
* ltp-ipc-tests - pass: 18
* ltp-math-tests - pass: 22
* ltp-nptl-tests - pass: 4
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 20
* ltp-securebits-tests - pass: 8
* ltp-syscalls-tests - pass: 1968, skip: 242
* ltp-timers-tests - pass: 24

juno-r2 - arm64
* boot - pass: 31
* kselftest - pass: 111, skip: 28, fail: 12
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 4
* ltp-containers-tests - pass: 128
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 4
* ltp-fs-tests - pass: 120
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 38
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 44
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 18
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 4
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 987, skip: 121
* ltp-timers-tests - pass: 24

x15 - arm
* boot - pass: 41
* kselftest - pass: 92, skip: 32, fail: 15
* libhugetlbfs - pass: 174, skip: 2
* ltp-cap_bounds-tests - pass: 4
* ltp-containers-tests - pass: 124, fail: 4
* ltp-fcntl-locktests-tests - pass: 4
* ltp-filecaps-tests - pass: 4
* ltp-fs-tests - pass: 120
* ltp-fs_bind-tests - pass: 4
* ltp-fs_perms_simple-tests - pass: 38
* ltp-fsx-tests - pass: 4
* ltp-hugetlb-tests - pass: 40, skip: 4
* ltp-io-tests - pass: 6
* ltp-ipc-tests - pass: 18
* ltp-math-tests - pass: 22
* ltp-nptl-tests - pass: 4
* ltp-pty-tests - pass: 8
* ltp-sched-tests - pass: 26, skip: 2
* ltp-securebits-tests - pass: 8
* ltp-syscalls-tests - pass: 2076, skip: 132
* ltp-timers-tests - pass: 24

x86_64
* boot - pass: 40
* kselftest - pass: 121, skip: 16, fail: 14
* libhugetlbfs - pass: 180, skip: 2
* ltp-cap_bounds-tests - pass: 4
* ltp-containers-tests - pass: 128
* ltp-fcntl-locktests-tests - pass: 4
* ltp-filecaps-tests - pass: 4
* ltp-fs-tests - pass: 122, skip: 2
* ltp-fs_bind-tests - pass: 4
* ltp-fs_perms_simple-tests - pass: 38
* ltp-fsx-tests - pass: 4
* ltp-hugetlb-tests - pass: 44
* ltp-io-tests - pass: 6
* ltp-ipc-tests - pass: 18
* ltp-math-tests - pass: 22
* ltp-nptl-tests - pass: 4
* ltp-pty-tests - pass: 8
* ltp-sched-tests - pass: 18, skip: 2
* ltp-securebits-tests - pass: 8
* ltp-syscalls-tests - pass: 2032, skip: 232
* ltp-timers-tests - pass: 24

-- 
Linaro QA (beta)
https://qa-reports.linaro.org
Re: [PATCH 2/2] HID: i2c-hid: Fix resume issue on Raydium touchscreen device
Hi, Could anyone review and apply this single patch? The 2nd patch has been sent as v2. Regards, Aaron
Re: [PATCH v2 2/2] HID: core: Fix size as type u32
Hi, Could anyone review and apply these 2 patches? Regards, Aaron
Re: [PATCH] of: cache phandle nodes to decrease cost of of_find_node_by_phandle()
On 02/01/18 21:53, Chintan Pandya wrote:
>
>
> On 2/2/2018 2:39 AM, Frank Rowand wrote:
>> On 02/01/18 06:24, Rob Herring wrote:
>>> And so
>>> far, no one has explained why a bigger cache got slower.
>>
>> Yes, I still find that surprising.
>
> I thought a bit about this. And realized that increasing the cache size
> should help improve the performance only if there are too many misses with
> the smaller cache. So, from my experiments some time back, I looked up the
> logs and saw the access pattern. Seems like, there is *not_too_much* juggling
> during look up by phandles.
>
> See the access pattern here:
> https://drive.google.com/file/d/1qfAD8OsswNJABgAwjJf6Gr_JZMeK7rLV/view?usp=sharing

Thanks! Very interesting.

I was somewhat limited at playing detective with this, because the phandle values are not consistent with the dts file you are currently working with (arch/arm64/boot/dts/qcom/sda670-mtp.dts). For example, I could not determine what the target nodes are for the hot phandle values. That information _could_ possibly point at algorithms within the devicetree core code that could be improved. Or maybe not. Hard to tell until actually looking at the data.

Anyway, some observations were possible.

There are 485 unique phandle values searched for.

The ten phandle values most frequently referenced account for 3932 / 6745 (or 58%) of all references.

Without the corresponding devicetree I can not tell how many nodes need to be scanned to locate each of these ten values (using the existing algorithm). Thus I can not determine how much scanning would be eliminated by caching just the nodes corresponding to these ten phandle values.

There are 89 phandle values that were searched for 10 times or more, accounting for 86% of the searches.

Only 164 phandle values were searched for just one time.

303 phandle values were searched for just one or two times.
Here is a more complete picture:

   10 values each used 100 or more times; searches: 3932   58%
   11 values each used  90 or more times; searches: 3994   59%
   12 values each used  80 or more times; searches: 4045   60%
   13 values each used  70 or more times; searches: 4093   61%
   14 values each used  60 or more times; searches: 4136   61%
   15 values each used  50 or more times; searches: 4178   62%
   18 values each used  40 or more times; searches: 4300   64%
   32 values each used  30 or more times; searches: 4774   71%
   54 values each used  20 or more times; searches: 5293   78%
   89 values each used  10 or more times; searches: 5791   86%
   93 values each used   9 or more times; searches: 5827   86%
  117 values each used   8 or more times; searches: 6019   89%
  122 values each used   7 or more times; searches: 6054   90%
  132 values each used   6 or more times; searches: 6114   91%
  144 values each used   5 or more times; searches: 6174   92%
  162 values each used   4 or more times; searches: 6246   93%
  181 values each used   3 or more times; searches: 6303   93%
  320 values each used   2 or more times; searches: 6581   98%
  484 values each used   1 or more times; searches: 6746  100%

A single system does not prove anything. It is possible that other devicetrees would exhibit similarly long tailed behavior, but that is just wild speculation on my part.

_If_ the long tail is representative of other systems, then identifying a few hot spots could be useful, but fixing them is not likely to significantly reduce the overhead of calls to of_find_node_by_phandle(). Some method of reducing the overhead of each call would be the answer for a system of this class.

> Sample log is pasted below where number in the last is phandle value.
> Line 8853: [ 37.425405] OF: want to search this 262 > Line 8854: [ 37.425453] OF: want to search this 262 > Line 8855: [ 37.425499] OF: want to search this 262 > Line 8856: [ 37.425549] OF: want to search this 15 > Line 8857: [ 37.425599] OF: want to search this 5 > Line 8858: [ 37.429989] OF: want to search this 253 > Line 8859: [ 37.430058] OF: want to search this 253 > Line 8860: [ 37.430217] OF: want to search this 253 > Line 8861: [ 37.430278] OF: want to search this 253 > Line 8862: [ 37.430337] OF: want to search this 253 > Line 8863: [ 37.430399] OF: want to search this 254 > Line 8864: [ 37.430597] OF: want to search this 254 > Line 8865: [ 37.430656] OF: want to search this 254 > > > Above explains why results with cache size 64 and 128 have almost similar > results. Now, for cache size 256 we have degrading performance. I don't have > a good theory here but I'm assuming that by making large SW cache, we miss > the benefits of real HW cache which is typically smaller than our array size. > Also, in my set up, I've set max_cpu=1 to reduce the variance. That again, > should affect the cache holding pattern in HW and affect the perf numbers. > > > Chintan
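For readers following the cache discussion, here is a minimal userspace sketch of the kind of direct-mapped phandle cache being debated. It is not the code from the patch under review: `struct node`, `slow_find()`, `cached_find()` and `CACHE_SIZE` are hypothetical names chosen for illustration, and the linked list merely stands in for a full devicetree scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct device_node; only the phandle matters. */
struct node {
	uint32_t phandle;
	struct node *next;	/* all nodes chained, mimicking a full-tree scan */
};

#define CACHE_SIZE 64u		/* assumed size; must be a power of two */

struct phandle_cache {
	struct node *slot[CACHE_SIZE];
	unsigned long hits, misses;
};

/* Uncached lookup: walk every node until the phandle matches. */
static struct node *slow_find(struct node *all, uint32_t phandle)
{
	for (struct node *n = all; n; n = n->next)
		if (n->phandle == phandle)
			return n;
	return NULL;
}

/* Direct-mapped lookup: the low bits of the phandle select one slot. */
static struct node *cached_find(struct phandle_cache *c, struct node *all,
				uint32_t phandle)
{
	struct node **slot = &c->slot[phandle & (CACHE_SIZE - 1)];
	struct node *n;

	if (*slot && (*slot)->phandle == phandle) {
		c->hits++;
		return *slot;
	}

	c->misses++;
	n = slow_find(all, phandle);
	if (n)
		*slot = n;	/* evict whatever occupied the slot before */
	return n;
}
```

Replaying the thirteen lookups from the sample log above (262 three times, 15, 5, 253 five times, 254 three times) through this sketch gives 8 hits and 5 misses, which matches the observation that repeated lookups cluster tightly and a small cache already captures them.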
Re: [RESEND RFC PATCH V3] sched: Improve scalability of select_idle_sibling using SMT balance
On Fri, 2018-02-02 at 13:34 -0500, Steven Sistare wrote: > On 2/2/2018 12:39 PM, Steven Sistare wrote: > > On 2/2/2018 12:21 PM, Peter Zijlstra wrote: > >> On Fri, Feb 02, 2018 at 11:53:40AM -0500, Steven Sistare wrote: > >>> It might be interesting to add a tunable for the number of random choices > >>> to > >>> make, and clamp it at the max nr computed from avg_cost in > >>> select_idle_cpu. > >> > >> This needs a fairly complicated PRNG for it would need to visit each > >> possible CPU once before looping. A LFSR does that, but requires 2^n-1 > >> elements and we have topology masks that don't match that.. The trivial > >> example is something with 6 cores. > > > > Or keep it simple and accept the possibility of choosing the same candidate > > more than once. > > > >>> Or, choose a random starting point and then search for nr sequential > >>> candidates; possibly limited by a tunable. > >> > >> And this is basically what we already do. Except with the task-cpu > >> instead of a per-cpu rotor. > > > > Righto. Disregard this suggestion. > > Actually, I take back my take back. I suspect the primary benefit > of random selection is that it breaks up resonance states where > CPUs that are busy tend to stay busy, and CPUs that are idle tend > to stay idle, which is reinforced by starting the search at target = > last cpu ran. I suspect the primary benefit is reduction of bouncing. The absolutely maddening thing about SIS is that some stuff out there (like FB's load) doesn't give a rats ass about anything other than absolute minimum sched latency while other stuff notices cache going missing. Joy. -Mike
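To make the LFSR remark in the quoted text concrete, here is a small userspace sketch, not scheduler code. A 3-bit Galois LFSR cycles through the 7 non-zero states, so for the "6 cores" case one state has to be skipped. The helper names `lfsr3_next()` and `lfsr_visit_order()` are made up for this illustration, and skipping out-of-range states is just one possible workaround.

```c
/* One step of a 3-bit Galois LFSR (tap mask 0x6); cycles through 1..7. */
static unsigned lfsr3_next(unsigned state)
{
	unsigned lsb = state & 1u;

	state >>= 1;
	if (lsb)
		state ^= 0x6u;
	return state;
}

/*
 * Fill order[] with every CPU index in [0, nr_cpus) exactly once, in
 * LFSR order. seed must be in 1..7 and nr_cpus at most 7. States that
 * map to a CPU index >= nr_cpus are skipped. Returns the number of
 * entries written.
 */
static int lfsr_visit_order(unsigned seed, int nr_cpus, int *order)
{
	unsigned state = seed;
	int n = 0;

	do {
		int cpu = (int)state - 1;	/* state 1..7 -> cpu 0..6 */

		if (cpu < nr_cpus)
			order[n++] = cpu;
		state = lfsr3_next(state);
	} while (state != seed);

	return n;
}
```

With seed 1 and six CPUs the visiting order is 0, 5, 2, 4, 3, 1: each CPU is seen once, at the cost of one wasted step for the state that falls outside the mask.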
Re: [PATCH v2 5/6] arm64: Detect current view of GIC priorities
Hi, Julien On 2018/1/17 19:54, Julien Thierry wrote: The values non secure EL1 needs to use for priority registers depends on the value of SCR_EL3.FIQ. Since we don't have access to SCR_EL3, we fake an interrupt and compare the GIC priority with the one present in the [re]distributor. Also, add firmware requirements related to SCR_EL3. Signed-off-by: Julien Thierry Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Gleixner Cc: Jason Cooper Cc: Marc Zyngier --- Documentation/arm64/booting.txt | 5 +++ arch/arm64/include/asm/arch_gicv3.h | 5 +++ arch/arm64/include/asm/irqflags.h | 6 +++ arch/arm64/include/asm/sysreg.h | 1 + drivers/irqchip/irq-gic-v3.c| 86 + 5 files changed, 103 insertions(+) diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt index 8d0df62..e387938 100644 --- a/Documentation/arm64/booting.txt +++ b/Documentation/arm64/booting.txt @@ -188,6 +188,11 @@ Before jumping into the kernel, the following conditions must be met: the kernel image will be entered must be initialised by software at a higher exception level to prevent execution in an UNKNOWN state. + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is +executing on. + - The value of SCR_EL3.FIQ must be the same as the one present at boot +time whenever the kernel is executing. + For systems with a GICv3 interrupt controller to be used in v3 mode: - If EL3 is present: ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1. 
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h index 490bb3a..ac7b7f6 100644 --- a/arch/arm64/include/asm/arch_gicv3.h +++ b/arch/arm64/include/asm/arch_gicv3.h @@ -124,6 +124,11 @@ static inline void gic_write_bpr1(u32 val) write_sysreg_s(val, SYS_ICC_BPR1_EL1); } +static inline u32 gic_read_rpr(void) +{ + return read_sysreg_s(SYS_ICC_RPR_EL1); +} + #define gic_read_typer(c) readq_relaxed(c) #define gic_write_irouter(v, c) writeq_relaxed(v, c) #define gic_read_lpir(c) readq_relaxed(c) diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h index 3d5d443..d25e7ee 100644 --- a/arch/arm64/include/asm/irqflags.h +++ b/arch/arm64/include/asm/irqflags.h @@ -217,6 +217,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags) !(ARCH_FLAGS_GET_PMR(flags) & ICC_PMR_EL1_EN_BIT); } +/* Mask IRQs at CPU level instead of GIC level */ +static inline void arch_irqs_daif_disable(void) +{ + asm volatile ("msr daifset, #2" : : : "memory"); +} + void maybe_switch_to_sysreg_gic_cpuif(void); #endif /* CONFIG_IRQFLAGS_GIC_MASKING */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 08cc885..46fa869 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -304,6 +304,7 @@ #define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5) #define SYS_ICC_IGRPEN0_EL1 sys_reg(3, 0, 12, 12, 6) #define SYS_ICC_IGRPEN1_EL1 sys_reg(3, 0, 12, 12, 7) +#define SYS_ICC_RPR_EL1sys_reg(3, 0, 12, 11, 3) #define SYS_CONTEXTIDR_EL1sys_reg(3, 0, 13, 0, 1) #define SYS_TPIDR_EL1 sys_reg(3, 0, 13, 0, 4) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index df51d96..58b5e89 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -63,6 +63,10 @@ struct gic_chip_data { static struct gic_chip_data gic_data __read_mostly; static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE; +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS 
+DEFINE_STATIC_KEY_FALSE(have_non_secure_prio_view); +#endif + static struct gic_kvm_info gic_v3_kvm_info; static DEFINE_PER_CPU(bool, has_rss); @@ -997,6 +1001,84 @@ static int partition_domain_translate(struct irq_domain *d, .select = gic_irq_domain_select, }; +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +/* + * The behaviours of RPR and PMR registers differ depending on the value of + * SCR_EL3.FIQ, while the behaviour of priority registers of the distributor + * and redistributors is always the same. + * + * If SCR_EL3.FIQ == 1, the values used for RPR and PMR are the same as the ones + * programmed in the distributor and redistributors registers. + * + * Otherwise, the value presented by RPR as well as the value which will be + * compared against PMR is: (GIC_(R)DIST_PRI[irq] >> 1) | 0x80; + * + * see GICv3/GICv4 Architecture Specification (IHI0069D): + * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt + * priorities. + * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1 + * interrupt. + */ I think we can use write/read PMR to check if SCR_EL3.FIQ == 1. Like this: gic_write_pmr(0xf0); if (gic_read_pmr() == 0xf0)// if SCR_EL3.FIQ ==
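As a worked illustration of the priority translation described in the patch comment quoted above, a standalone sketch follows. It only models the arithmetic from that comment; `gic_ns_prio_view()` is a hypothetical helper, not a function from the GIC driver.

```c
#include <stdint.h>

/*
 * Value seen through RPR/PMR for a priority programmed in the
 * (re)distributor, following the rule stated in the patch comment:
 * unchanged when SCR_EL3.FIQ == 1, otherwise (prio >> 1) | 0x80.
 */
static inline uint8_t gic_ns_prio_view(uint8_t dist_prio, int scr_el3_fiq)
{
	if (scr_el3_fiq)
		return dist_prio;
	return (uint8_t)((dist_prio >> 1) | 0x80);
}
```

For example, a distributor priority of 0xa0 would read back as 0xd0 when SCR_EL3.FIQ == 0, and as 0xa0 when SCR_EL3.FIQ == 1.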
[PATCH 2/2] f2fs: add GC_WRITTEN_PAGE to gc atomic file
This patch enables to gc atomic file by adding GC_WRITTEN_PAGE to identify the gced pages of atomic file, which can avoid register_inmem_page in set_page_dirty, so the gced pages will not mix with the inmem pages. Signed-off-by: Yunlong Song --- fs/f2fs/data.c| 7 ++- fs/f2fs/gc.c | 25 ++--- fs/f2fs/segment.h | 3 +++ 3 files changed, 27 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index edafcb6..5e1fc5d 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -120,6 +120,10 @@ static void f2fs_write_end_io(struct bio *bio) dec_page_count(sbi, type); clear_cold_data(page); + if (IS_GC_WRITTEN_PAGE(page)) { + set_page_private(page, 0); + ClearPagePrivate(page); + } end_page_writeback(page); } if (!get_pages(sbi, F2FS_WB_CP_DATA) && @@ -2418,7 +2422,8 @@ static int f2fs_set_data_page_dirty(struct page *page) if (!PageUptodate(page)) SetPageUptodate(page); - if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) { + if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode) + && !IS_GC_WRITTEN_PAGE(page)) { if (!IS_ATOMIC_WRITTEN_PAGE(page)) { register_inmem_page(inode, page); return 1; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 84ab3ff..9d54ddb 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,10 +622,6 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode) && - !f2fs_is_commit_atomic_write(inode)) - goto out; - if (f2fs_is_pinned_file(inode)) { f2fs_pin_file_control(inode, true); goto out; @@ -680,6 +676,12 @@ static void move_data_block(struct inode *inode, block_t bidx, goto put_page_out; } + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode) && + !IS_GC_WRITTEN_PAGE(fio.encrypted_page)) { + set_page_private(fio.encrypted_page, (unsigned long)GC_WRITTEN_PAGE); + SetPagePrivate(fio.encrypted_page); + } set_page_dirty(fio.encrypted_page); f2fs_wait_on_page_writeback(fio.encrypted_page, 
DATA, true); if (clear_page_dirty_for_io(fio.encrypted_page)) @@ -730,9 +732,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode) && - !f2fs_is_commit_atomic_write(inode)) - goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) f2fs_pin_file_control(inode, true); @@ -742,6 +741,12 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (gc_type == BG_GC) { if (PageWriteback(page)) goto out; + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode) && + !IS_GC_WRITTEN_PAGE(page)) { + set_page_private(page, (unsigned long)GC_WRITTEN_PAGE); + SetPagePrivate(page); + } set_page_dirty(page); set_cold_data(page); } else { @@ -762,6 +767,12 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, int err; retry: + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode) && + !IS_GC_WRITTEN_PAGE(page)) { + set_page_private(page, (unsigned long)GC_WRITTEN_PAGE); + SetPagePrivate(page); + } set_page_dirty(page); f2fs_wait_on_page_writeback(page, DATA, true); if (clear_page_dirty_for_io(page)) { diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index f11c4bc..f0a6432 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -203,11 +203,14 @@ struct segment_allocation { */ #define ATOMIC_WRITTEN_PAGE((unsigned long)-1) #define DUMMY_WRITTEN_PAGE ((unsigned long)-2) +#define GC_WRITTEN_PAGE((unsigned long)-3) #define IS_ATOMIC_WRITTEN_PAGE(page) \ (page_private(page) == (unsigned long)ATOMIC_WRITTEN_PAGE) #define IS_DUMMY_WRITTEN_PAGE(page)\ (page_private(page) == (unsigned long)DUMMY_WRITTEN_PAGE) +#define IS_GC_WRITTEN_PAGE(page) \ + (page_private(page) == (unsigned long)GC_WRITTEN_PAGE) struct inm
[PATCH 1/2] f2fs: enable to gc page whose inode already atomic commit
If inode has already started to atomic commit, then set_page_dirty will not mix the gc pages with the inmem atomic pages, so the page can be gced safely. Signed-off-by: Yunlong Song --- fs/f2fs/data.c | 5 ++--- fs/f2fs/gc.c | 6 -- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7435830..edafcb6 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1580,14 +1580,13 @@ bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) return true; if (S_ISDIR(inode->i_mode)) return true; - if (f2fs_is_atomic_file(inode)) - return true; if (fio) { if (is_cold_data(fio->page)) return true; if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) return true; - } + } else if (f2fs_is_atomic_file(inode)) + return true; return false; } diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index b9d93fd..84ab3ff 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -622,7 +622,8 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { @@ -729,7 +730,8 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode) && + !f2fs_is_commit_atomic_write(inode)) goto out; if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) -- 1.8.5.2
Re: [PATCH v4] Fix loading of module radeonfb on PowerMac
Hi Mathieu, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.15 next-20180202] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Mathieu-Malaterre/Fix-loading-of-module-radeonfb-on-PowerMac/20180203-085907 config: x86_64-randconfig-x009-201804 (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from drivers/video/fbdev/aty/radeon_base.c:91:0: >> drivers/video/fbdev/aty/../edid.h:21:0: warning: "EDID_LENGTH" redefined #define EDID_LENGTH0x80 In file included from include/drm/drm_crtc.h:44:0, from include/drm/drm_fb_helper.h:35, from drivers/video/fbdev/aty/radeon_base.c:73: include/drm/drm_edid.h:32:0: note: this is the location of the previous definition #define EDID_LENGTH 128 Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64 Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64 Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order Cyclomatic Complexity 1 include/linux/string.h:strnlen Cyclomatic Complexity 4 include/linux/string.h:strlen Cyclomatic Complexity 6 include/linux/string.h:strlcpy Cyclomatic Complexity 4 include/linux/string.h:memcpy Cyclomatic Complexity 1 arch/x86/include/asm/paravirt.h:arch_local_irq_disable Cyclomatic Complexity 1 arch/x86/include/asm/paravirt.h:arch_local_irq_enable Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore Cyclomatic Complexity 1 include/linux/jiffies.h:_msecs_to_jiffies Cyclomatic Complexity 3 include/linux/jiffies.h:msecs_to_jiffies Cyclomatic Complexity 1 arch/x86/include/asm/io.h:readb Cyclomatic Complexity 1 arch/x86/include/asm/io.h:readw Cyclomatic Complexity 1 
arch/x86/include/asm/io.h:readl Cyclomatic Complexity 1 arch/x86/include/asm/io.h:writeb Cyclomatic Complexity 1 arch/x86/include/asm/io.h:writel Cyclomatic Complexity 1 arch/x86/include/asm/io.h:ioremap Cyclomatic Complexity 1 include/linux/kobject.h:kobject_name Cyclomatic Complexity 2 include/linux/device.h:dev_name Cyclomatic Complexity 1 include/linux/device.h:dev_get_drvdata Cyclomatic Complexity 1 include/linux/device.h:dev_set_drvdata Cyclomatic Complexity 1 include/linux/io.h:arch_phys_wc_add Cyclomatic Complexity 1 include/linux/io.h:arch_phys_wc_del Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large Cyclomatic Complexity 3 include/linux/slab.h:kmalloc Cyclomatic Complexity 1 include/linux/slab.h:kzalloc Cyclomatic Complexity 1 include/linux/pci.h:pci_get_drvdata Cyclomatic Complexity 1 include/linux/pci.h:pci_set_drvdata Cyclomatic Complexity 1 include/linux/pci.h:pci_name Cyclomatic Complexity 2 include/linux/fb.h:alloc_apertures Cyclomatic Complexity 2 drivers/video/fbdev/aty/radeonfb.h:radeon_pll_errata_after_index Cyclomatic Complexity 2 drivers/video/fbdev/aty/radeonfb.h:radeon_pll_errata_after_data Cyclomatic Complexity 1 drivers/video/fbdev/aty/radeonfb.h:round_div Cyclomatic Complexity 3 drivers/video/fbdev/aty/radeonfb.h:var_to_depth Cyclomatic Complexity 5 drivers/video/fbdev/aty/radeonfb.h:radeon_get_dstbpp Cyclomatic Complexity 1 drivers/video/fbdev/aty/radeonfb.h:radeonfb_bl_init Cyclomatic Complexity 1 drivers/video/fbdev/aty/radeonfb.h:radeonfb_bl_exit Cyclomatic Complexity 1 include/drm/drm_fb_helper.h:drm_fb_helper_remove_conflicting_framebuffers Cyclomatic Complexity 21 drivers/video/fbdev/aty/radeon_base.c:radeon_calc_pll_regs Cyclomatic Complexity 1 drivers/video/fbdev/aty/radeon_base.c:radeonfb_exit Cyclomatic Complexity 6 drivers/video/fbdev/aty/radeon_base.c:radeon_find_mem_vbios Cyclomatic Complexity 4 drivers/video/fbdev/aty/radeon_base.c:radeon_kick_out_firmware_fb Cyclomatic Complexity 5 
drivers/video/fbdev/aty/radeon_base.c:radeonfb_pci_unregister Cyclomatic Complexity 1 drivers/video/fbdev/aty/radeon_base.c:radeon_show_one_edid Cyclomatic Complexity 3 drivers/video/fbdev/aty/radeon_base.c:radeon_show_edid2 Cyclomatic Complexity 3 drivers/video/fbdev/aty/radeon_base.c:radeon_show_edid1 Cyclomatic Complexity 2 drivers/video/fbdev/aty/radeon_base.c:radeon_set_fbinfo Cyclomatic Complexity 18 drivers/video/fbdev/aty/radeon_base.c:radeonfb_check_var Cyclomatic Complexity 2 drivers/video/fbdev/aty/radeon_base.c:radeon_unmap_ROM Cyclomatic Complexity 7 drivers/video/fbdev/aty/radeon_base.c:radeon_map_ROM Cyclomatic Complexity 16 drivers/video/fbdev/aty/radeon_base.c:radeonfb_setup Cyclomatic Complexity 2 drivers/video/fbdev/aty/radeo
Re: bisected bd4c82c22c367e is the first bad commit (was [Bug 198617] New: zswap causing random applications to crash)
On (02/03/18 10:34), Sergey Senozhatsky wrote: > so we are basically looking at 4.14-rc0+ [..] > # first bad commit: [bd4c82c22c367e068acb1ec9ec02be2fac3e09e2] mm, THP, swap: > delay splitting THP after swapped out To re-confirm, disabling CONFIG_TRANSPARENT_HUGEPAGE fixes my 4.15.0-next -ss
[PATCH v2] x86/perf : Add check for CPUID instruction before using
We still officially support the ancient i486 cpu. First generation versions of this processor do not have the CPUID instruction, though later versions do. Therefore you must check that the cpu supports it before using it. At present it fails with an "Illegal Instruction" signal on the early processors. v1: cpuid detection code based on GCC gcc/config/i386/cpuid.h https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/config/i386/cpuid.h;hb=HEAD v2: cpuid detection code based on Linux kernel arch/x86/kernel/cpu/common.c Signed-off-by: Matthew Whitehead --- tools/perf/arch/x86/util/header.c | 54 +++ tools/perf/util/header.h | 2 ++ 2 files changed, 56 insertions(+) diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c index fb0d71a..d2508b3 100644 --- a/tools/perf/arch/x86/util/header.c +++ b/tools/perf/arch/x86/util/header.c @@ -7,6 +7,57 @@ #include "../../util/header.h" +#ifndef __x86_64__ + +/* This code based on arch/x86/kernel/cpu/common.c + * Standard macro to see if a specific flag is changeable. + */ +static inline int flag_is_changeable_p(u32 flag) +{ + u32 f1, f2; + + /* +* Cyrix and IDT cpus allow disabling of CPUID +* so the code below may return different results +* when it is executed before and after enabling +* the CPUID. Add "volatile" to not allow gcc to +* optimize the subsequent calls to this function. 
+*/ + asm volatile ("pushfl \n\t" + "pushfl \n\t" + "popl %0 \n\t" + "movl %0, %1 \n\t" + "xorl %2, %0 \n\t" + "pushl %0 \n\t" + "popfl\n\t" + "pushfl \n\t" + "popl %0 \n\t" + "popfl\n\t" + + : "=&r" (f1), "=&r" (f2) + : "ir" (flag)); + + return ((f1^f2) & flag) != 0; +} + +#define X86_EFLAGS_ID 0x0020 + +/* Probe for the CPUID instruction */ +int have_cpuid_p(void) +{ + return flag_is_changeable_p(X86_EFLAGS_ID); +} + +#else /* CONFIG_X86_64 */ + +/* All X86_64 have cpuid instruction */ +int have_cpuid_p(void) +{ + return 1; +} + +#endif /* CONFIG_X86_64 */ + static inline void cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c, unsigned int *d) @@ -28,6 +79,9 @@ int nb; char vendor[16]; + if (!have_cpuid_p()) + return -1; + cpuid(0, &lvl, &b, &c, &d); strncpy(&vendor[0], (char *)(&b), 4); strncpy(&vendor[4], (char *)(&d), 4); diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h index f28..f4de656 100644 --- a/tools/perf/util/header.h +++ b/tools/perf/util/header.h @@ -171,6 +171,8 @@ int write_padded(struct feat_fd *fd, const void *bf, /* * arch specific callback */ +int have_cpuid_p(void); + int get_cpuid(char *buffer, size_t sz); char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused); -- 1.8.3.1
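For comparison with the hand-written EFLAGS.ID probe in the patch, the compiler-provided `<cpuid.h>` header mentioned under v1 offers `__get_cpuid_max()`, which returns 0 when CPUID is not supported. The sketch below is an illustration only and not part of the patch; `cpuid_available()` is a hypothetical wrapper name, and the non-x86 fallback is an assumption added so the snippet builds elsewhere.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Returns 1 if the CPUID instruction can be used, 0 otherwise. */
static int cpuid_available(void)
{
	/* __get_cpuid_max(0, NULL) yields the highest basic leaf, or 0. */
	return __get_cpuid_max(0, (unsigned int *)0) != 0;
}
#else
/* Assumption for this sketch: no CPUID instruction on non-x86 builds. */
static int cpuid_available(void)
{
	return 0;
}
#endif
```

A caller would check `cpuid_available()` before issuing CPUID, in the same way the patch guards `cpuid(0, ...)` with `have_cpuid_p()`.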
Re: [PATCH] locking/qspinlock: Ensure node is initialised before updating prev->next
Hi Will, I love your patch! Yet something to improve: [auto build test ERROR on v4.15] [cannot apply to tip/locking/core tip/core/locking tip/auto-latest next-20180202] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Will-Deacon/locking-qspinlock-Ensure-node-is-initialised-before-updating-prev-next/20180203-095222 config: x86_64-randconfig-x017-201804 (attached as .config) compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All error/warnings (new ones prefixed by >>): In file included from include/linux/kernel.h:10:0, from include/linux/list.h:9, from include/linux/smp.h:12, from kernel/locking/qspinlock.c:25: kernel/locking/qspinlock.c: In function 'queued_spin_lock_slowpath': >> include/linux/compiler.h:264:8: error: conversion to non-scalar type >> requested union { typeof(x) __val; char __c[1]; } __u = \ ^ >> arch/x86/include/asm/barrier.h:71:2: note: in expansion of macro 'WRITE_ONCE' WRITE_ONCE(*p, v); \ ^~ include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~ >> kernel/locking/qspinlock.c:419:3: note: in expansion of macro >> 'smp_store_release' smp_store_release(prev->next, node); ^ -- In file included from include/linux/kernel.h:10:0, from include/linux/list.h:9, from include/linux/smp.h:12, from kernel//locking/qspinlock.c:25: kernel//locking/qspinlock.c: In function 'queued_spin_lock_slowpath': >> include/linux/compiler.h:264:8: error: conversion to non-scalar type >> requested union { typeof(x) __val; char __c[1]; } __u = \ ^ >> arch/x86/include/asm/barrier.h:71:2: note: in expansion of macro 'WRITE_ONCE' WRITE_ONCE(*p, v); \ ^~ include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~ 
kernel//locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release' smp_store_release(prev->next, node); ^ vim +/WRITE_ONCE +71 arch/x86/include/asm/barrier.h 47933ad4 Peter Zijlstra 2013-11-06 66 1638fb72 Michael S. Tsirkin 2015-12-27 67 #define __smp_store_release(p, v) \ 47933ad4 Peter Zijlstra 2013-11-06 68 do { \ 47933ad4 Peter Zijlstra 2013-11-06 69 compiletime_assert_atomic_type(*p); \ 47933ad4 Peter Zijlstra 2013-11-06 70 barrier(); \ 76695af2 Andrey Konovalov 2015-08-02 @71 WRITE_ONCE(*p, v); \ 47933ad4 Peter Zijlstra 2013-11-06 72 } while (0) 47933ad4 Peter Zijlstra 2013-11-06 73 :: The code at line 71 was first introduced by commit :: 76695af20c015206cffb84b15912be6797d0cca2 locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire() :: TO: Andrey Konovalov :: CC: Ingo Molnar --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
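A likely reading of the error above: smp_store_release() takes a pointer to the location being written, so passing prev->next (the pointer's value) makes WRITE_ONCE() operate on *prev->next, which is a whole structure rather than a scalar. The sketch below is a userspace illustration only, using C11 atomics in place of the kernel macros; struct node and publish_node() are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue node; not the kernel definition. */
struct node {
	int locked;
	struct node *_Atomic next;
};

/*
 * Initialise the new node first, then publish it with release semantics.
 * The store targets the address of prev->next (a pointer-sized scalar),
 * not the structure that prev->next points to.
 */
static void publish_node(struct node *prev, struct node *node)
{
	node->locked = 0;
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&prev->next, node, memory_order_release);
}
```

A reader that loads prev->next with acquire semantics is then guaranteed to see the initialised fields of the node it finds there.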
Re: [PATCH] locking/qspinlock: Ensure node is initialised before updating prev->next
Hi Will, I love your patch! Perhaps something to improve: [auto build test WARNING on v4.15] [cannot apply to tip/locking/core tip/core/locking tip/auto-latest next-20180202] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Will-Deacon/locking-qspinlock-Ensure-node-is-initialised-before-updating-prev-next/20180203-095222 config: sparc64-allyesconfig (attached as .config) compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=sparc64 All warnings (new ones prefixed by >>): In file included from include/linux/kernel.h:10:0, from include/linux/list.h:9, from include/linux/smp.h:12, from kernel/locking/qspinlock.c:25: kernel/locking/qspinlock.c: In function 'queued_spin_lock_slowpath': include/linux/compiler.h:264:8: error: conversion to non-scalar type requested union { typeof(x) __val; char __c[1]; } __u = \ ^ >> arch/sparc/include/asm/barrier_64.h:45:2: note: in expansion of macro >> 'WRITE_ONCE' WRITE_ONCE(*p, v); \ ^~ include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~ kernel/locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release' smp_store_release(prev->next, node); ^ -- In file included from include/linux/kernel.h:10:0, from include/linux/list.h:9, from include/linux/smp.h:12, from kernel//locking/qspinlock.c:25: kernel//locking/qspinlock.c: In function 'queued_spin_lock_slowpath': include/linux/compiler.h:264:8: error: conversion to non-scalar type requested union { typeof(x) __val; char __c[1]; } __u = \ ^ >> arch/sparc/include/asm/barrier_64.h:45:2: note: in expansion of macro >> 'WRITE_ONCE' WRITE_ONCE(*p, v); \ ^~ include/asm-generic/barrier.h:157:33: 
note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~ kernel//locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release' smp_store_release(prev->next, node); ^ vim +/WRITE_ONCE +45 arch/sparc/include/asm/barrier_64.h d550bbd4 David Howells 2012-03-28 40 45d9b859 Michael S. Tsirkin 2015-12-27 41 #define __smp_store_release(p, v) \ 47933ad4 Peter Zijlstra 2013-11-06 42 do { \ 47933ad4 Peter Zijlstra 2013-11-06 43 compiletime_assert_atomic_type(*p); \ 47933ad4 Peter Zijlstra 2013-11-06 44 barrier(); \ 76695af2 Andrey Konovalov 2015-08-02 @45 WRITE_ONCE(*p, v); \ 47933ad4 Peter Zijlstra 2013-11-06 46 } while (0) 47933ad4 Peter Zijlstra 2013-11-06 47 :: The code at line 45 was first introduced by commit :: 76695af20c015206cffb84b15912be6797d0cca2 locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire() :: TO: Andrey Konovalov :: CC: Ingo Molnar --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
linux-next: Signed-off-by missing for commits in the s390 tree
Hi all,

Commits

  a39892ed47bf ("s390/runtime_instrumentation: re-add signum system call parameter")
  279d2cea3aad ("s390/cio: fix kernel-doc usage")

are missing a Signed-off-by from their committer.

-- 
Cheers,
Stephen Rothwell
Re: [PATCH bpf-next v8 0/5] libbpf: add XDP binding support
On Wed, Jan 31, 2018 at 05:53:13PM +0100, Daniel Borkmann wrote:
> On 01/30/2018 09:50 PM, Eric Leblond wrote:
> > Hello Daniel,
> >
> > No problem with the delay in the answer. I'm doing far worse.
> >
> > Here is an updated version:
> > - add if_link.h in uapi and remove the definition
> > - fix a commit message
> > - remove uapi from a include
>
> Fyi, this still needs to wait for a bit in the queue due to current
> merge window where bpf-next is closed during that time [0]. Thanks!
>
> [0] https://www.spinics.net/lists/netdev/msg481490.html

I've tested it and applied it to the bpf tree, considering that the series
was practically ready long before bpf-next was closed. Thank you Eric.

The perf build was also fine, but please watch out for any unexpected
breakages, since perf has to be built on a variety of distros.
Re: RFC(V3): Audit Kernel Container IDs
On Fri, Feb 02, 2018 at 05:05:22PM -0500, Paul Moore wrote:
> On Tue, Jan 9, 2018 at 7:16 AM, Richard Guy Briggs wrote:
> > Containers are a userspace concept. The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions. Audit needs the kernel's help to do
> > this.
>
> Two small comments below, but I tend to think we are at a point where
> you can start cobbling together some prototype/RFC patches. Surely

Agreed. LGTM.

> there are going to be a few changes, and new comments, that come out
> once we see an initial implementation so let's see what those are.

thanks,
-serge
bisected bd4c82c22c367e is the first bad commit (was [Bug 198617] New: zswap causing random applications to crash)
Hello, On (01/30/18 11:48), Andrew Morton wrote: > Subject: [Bug 198617] New: zswap causing random applications to crash > > https://bugzilla.kernel.org/show_bug.cgi?id=198617 > > Bug ID: 198617 >Summary: zswap causing random applications to crash >Product: Memory Management >Version: 2.5 > Kernel Version: 4.14.15 > Hardware: x86-64 > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Page Allocator > Assignee: a...@linux-foundation.org > Reporter: kernel_...@dlk.pl > Regression: No > > https://bugs.freedesktop.org/show_bug.cgi?id=104709 > https://bugs.kde.org/show_bug.cgi?id=389542 > > I did have zswap enabled for a long while, and a lot of wine games, > plasmashell, xorg, kwin_x11 (and other) did crash randomly when reached 100% > of > physical ram and swap was like almost never used. > > I could esilly open a lot of browser tabs and the browser or xorg would fail > every time. > > After disabling zswap no crashes at all. > > /etc/systemd/swap.conf > zswap_enabled=1 > zswap_compressor=lz4 # lzo lz4 > zswap_max_pool_percent=25 # 1-99 > zswap_zpool=zbud # zbud z3fold So I did a number of tests and I confirm that under memory pressure with frontswap enabled I do see segfaults and memory corruptions in random user space applications. 
kernel: urxvt[338]: segfault at 20 ip 7fc08889ae0d sp 7ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x7fc08889ae0d _int_malloc (libc.so.6) #1 0x7fc08889c2f3 malloc (libc.so.6) #2 0x560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) #3 0x560e6005e75c n/a (urxvt) #4 0x560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #5 0x560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) #6 0x560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) #7 0x560e6005c10f _Z17ev_invoke_pendingv (urxvt) #8 0x560e6005cb55 ev_run (urxvt) #9 0x560e6003b9b9 main (urxvt) #10 0x7fc08883af4a __libc_start_main (libc.so.6) #11 0x560e6003f9da _start (urxvt) kernel: urxvt[343]: segfault at 10 ip 7fa56bd7d52b sp 7ffc09783a40 error 4 in libc-2.26.so[7fa56bcfd000+1ae000] #0 0x7fa56bd7d52b _int_malloc (libc.so.6) #1 0x7fa56bd7f2f3 malloc (libc.so.6) #2 0x7fa56b3d6097 n/a (libxcb.so.1) #3 0x7fa56b3d64d8 n/a (libxcb.so.1) #4 0x7fa56c921b79 n/a (libX11.so.6) #5 0x7fa56c921ceb n/a (libX11.so.6) #6 0x7fa56c921fdd _XEventsQueued (libX11.so.6) #7 0x7fa56c913c49 XEventsQueued (libX11.so.6) #8 0x55b35cfc3262 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt) #9 0x55b35cfc910f _Z17ev_invoke_pendingv (urxvt) #10 0x55b35cfc9c02 ev_run (urxvt) #11 0x55b35cfa89b9 main (urxvt) #12 0x7fa56bd1df4a __libc_start_main (libc.so.6) #13 0x55b35cfac9da _start (urxvt) Stack trace of thread 351: #0 0x7f5baaee7860 raise (libc.so.6) #1 0x7f5baaee8ec9 abort (libc.so.6) #2 0x7f5baaf30849 __malloc_assert (libc.so.6) #3 0x7f5baaf34011 _int_malloc (libc.so.6) #4 0x7f5baaf352f3 malloc (libc.so.6) #5 0x7f5baaf71cad __alloc_dir (libc.so.6) #6 0x7f5baaf71dbd opendir_tail (libc.so.6) #7 0x7f5bab5bbac4 Perl_pp_open_dir (libperl.so) #8 0x7f5bab55fec6 Perl_runops_standard (libperl.so) #9 0x7f5bab4d9390 Perl_call_sv (libperl.so) #10 0x5611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #11 0x5611f0947acb _ZN9rxvt_term14init_resourcesEiPKPKc (urxvt) #12 0x5611f0948da8 _ZN9rxvt_term5init2EiPKPKc (urxvt) 
#13 0x5611f097a0af n/a (urxvt)
#14 0x7f5bab568259 Perl_pp_entersub (libperl.so)
#15 0x7f5bab55fec6 Perl_runops_standard (libperl.so)
#16 0x7f5bab4d9390 Perl_call_sv (libperl.so)
#17 0x5611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
#18 0x5611f0939a77 _ZN9rxvt_term9key_pressER9XKeyEvent (urxvt)
#19 0x5611f093d77a _ZN9rxvt_term4x_cbER7_XEvent (urxvt)
#20 0x5611f09572e8 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt)
#21 0x5611f095d10f _Z17ev_invoke_pendingv (urxvt)
#22 0x5611f095dc02 ev_run (urxvt)
#23 0x5611f093c9b9 main (urxvt)
#24 0x7f5baaed3f4a __libc_start_main (libc.so.6)
#25 0x5611f09409da _start (urxvt)

and so on.

However, the problem is not specific to 4.14.15 or 4.14.11. I managed to
track it down to the 4.14 merge window, so we are basically looking at
4.14-rc0+.

The bisect log looks as follows:

git bisect start
# bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# good: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/dav
[PATCH] pvcalls-back: do not return error on inet_accept EAGAIN
When the client sends a regular blocking accept request, the backend is
expected to return only when the accept is completed, simulating a
blocking behavior, or return an error.

Specifically, on EAGAIN from inet_accept, the backend shouldn't return
"EAGAIN" to the client. Instead, it should simply continue the wait.
Otherwise, the client will send another accept request, which will cause
another EAGAIN to be sent back, which is a waste of resources and not
conforming to the expected behavior. Change the behavior by turning the
"goto error" into a return.

Signed-off-by: Stefano Stabellini

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index c7822d8..156e5ae 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -548,7 +548,7 @@ static void __pvcalls_back_accept(struct work_struct *work)
 	ret = inet_accept(mappass->sock, sock, O_NONBLOCK, true);
 	if (ret == -EAGAIN) {
 		sock_release(sock);
-		goto out_error;
+		return;
 	}
 
 	map = pvcalls_new_active_socket(fedata,
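The behavior this patch describes can be sketched in plain userspace C. This is only an illustration of the decision, not the backend's code: enum accept_result and classify_accept() are hypothetical names, and the ret argument stands in for the value returned by a non-blocking accept attempt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one pass of the accept worker. */
enum accept_result {
	ACCEPT_KEEP_WAITING,	/* nothing is sent; the request stays pending */
	ACCEPT_SEND_RESPONSE,	/* a response (success or error) goes to the client */
};

/*
 * Decide what to do with the result of one non-blocking accept attempt.
 * On -EAGAIN the client is not told anything, so from its point of view
 * the accept keeps blocking. Any other value is reported through *status.
 */
static enum accept_result classify_accept(int ret, int *status)
{
	if (ret == -EAGAIN)
		return ACCEPT_KEEP_WAITING;

	*status = ret;
	return ACCEPT_SEND_RESPONSE;
}
```

Keeping -EAGAIN out of the response path is what avoids the request/EAGAIN ping-pong described in the commit message.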
Re: [PATCH 1/2] Documentation/memory-barriers.txt: cross-reference "tools/memory-model/"
On Fri, Feb 02, 2018 at 10:12:48AM +0100, Andrea Parri wrote:
> Recent efforts led to the specification of a memory consistency model
> for the Linux kernel [1], which "can (roughly speaking) be thought of
> as an automated version of memory-barriers.txt" and which is (in turn)
> "accompanied by extensive documentation on its use and its design".
>
> Make sure that the (occasional) reader of memory-barriers.txt will be
> aware of these developments.
>
> [1] https://marc.info/?l=linux-kernel&m=151687290114799&w=2
>
> Signed-off-by: Andrea Parri

I am inclined to pull in something along these lines, but would like
some feedback on the wording, especially how "official" we want to
make the memory model to be.

Thoughts?

If I don't hear otherwise in a couple of days, I will pull this as is.

							Thanx, Paul

> ---
>  Documentation/memory-barriers.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index a863009849a3b..8cc3f098f4a7d 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -17,7 +17,9 @@ meant as a guide to using the various memory barriers provided by Linux, but
>  in case of any doubt (and there are many) please ask.
>
>  To repeat, this document is not a specification of what Linux expects from
> -hardware.
> +hardware. For such a specification, in the form of a memory consistency
> +model, and for documentation about its usage and its design, the reader is
> +referred to "tools/memory-model/".
>
>  The purpose of this document is twofold:
>
> --
> 2.7.4
>
[PATCH v4 4/5] irqchip/gic-v3-its: add ability to resend MAPC on resume
This adds functionality to resend the MAPC command to an ITS node on resume. If the ITS is powered down during suspend and the collections are not backed by memory, the ITS will lose that state. This just sets up the known state for the collections after the ITS is restored. This feature is enabled via Kconfig and a device tree entry. Signed-off-by: Derek Basehore --- arch/arm64/Kconfig | 10 drivers/irqchip/irq-gic-v3-its.c | 101 --- 2 files changed, 73 insertions(+), 38 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 53612879fe56..f38f1a7b4266 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -571,6 +571,16 @@ config HISILICON_ERRATUM_161600802 If unsure, say Y. +config ARM_GIC500_COLLECTIONS_RESET + bool "GIC-500 Collections: Workaround for GIC-500 Collections on suspend reset" + default y + help + The GIC-500 can store Collections state internally for the ITS. If + the ITS is reset on suspend (ie from power getting disabled), the + collections need to be reconfigured on resume. + + If unsure, say Y. 
+ config QCOM_FALKOR_ERRATUM_E1041 bool "Falkor E1041: Speculative instruction fetches might cause errant memory access" default y diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index e13515cdb68f..63764efa4dcc 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -48,6 +48,7 @@ #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1) #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2) #define ITS_FLAGS_SAVE_SUSPEND_STATE (1ULL << 3) +#define ITS_FLAGS_WORKAROUND_GIC500_MAPC (1ULL << 4) #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING(1 << 0) @@ -1950,52 +1951,53 @@ static void its_cpu_init_lpis(void) dsb(sy); } -static void its_cpu_init_collection(void) +static void its_cpu_init_collection(struct its_node *its) { - struct its_node *its; - int cpu; - - spin_lock(&its_lock); - cpu = smp_processor_id(); - - list_for_each_entry(its, &its_nodes, entry) { - u64 target; + int cpu = smp_processor_id(); + u64 target; - /* avoid cross node collections and its mapping */ - if (its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) { - struct device_node *cpu_node; + /* avoid cross node collections and its mapping */ + if (its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) { + struct device_node *cpu_node; - cpu_node = of_get_cpu_node(cpu, NULL); - if (its->numa_node != NUMA_NO_NODE && - its->numa_node != of_node_to_nid(cpu_node)) - continue; - } + cpu_node = of_get_cpu_node(cpu, NULL); + if (its->numa_node != NUMA_NO_NODE && + its->numa_node != of_node_to_nid(cpu_node)) + return; + } + /* +* We now have to bind each collection to its target +* redistributor. +*/ + if (gic_read_typer(its->base + GITS_TYPER) & GITS_TYPER_PTA) { /* -* We now have to bind each collection to its target +* This ITS wants the physical address of the * redistributor. */ - if (gic_read_typer(its->base + GITS_TYPER) & GITS_TYPER_PTA) { - /* -* This ITS wants the physical address of the -* redistributor. 
-*/ - target = gic_data_rdist()->phys_base; - } else { - /* -* This ITS wants a linear CPU number. -*/ - target = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER); - target = GICR_TYPER_CPU_NUMBER(target) << 16; - } + target = gic_data_rdist()->phys_base; + } else { + /* This ITS wants a linear CPU number. */ + target = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER); + target = GICR_TYPER_CPU_NUMBER(target) << 16; + } - /* Perform collection mapping */ - its->collections[cpu].target_address = target; - its->collections[cpu].col_id = cpu; + /* Perform collection mapping */ + its->collections[cpu].target_address = target; + its->collections[cpu].col_id = cpu; - its_send_mapc(its, &its->collections[cpu], 1); - its_send_invall(its, &its->collections[cpu]); - } + its_send_mapc(its, &its->collections[cpu], 1); + its_send_invall(its, &its->collections[cpu]); +} + +static void it
[PATCH v4 2/5] irqchip/gic-v3-its: add ability to save/restore ITS state
Some platforms power off GIC logic in suspend, so we need to save/restore state. The distributor and redistributor registers need to be handled in platform code due to access permissions on those registers, but the ITS registers can be restored in the kernel. Signed-off-by: Derek Basehore --- drivers/irqchip/irq-gic-v3-its.c | 101 +++ 1 file changed, 101 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 06f025fd5726..e13515cdb68f 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -46,6 +47,7 @@ #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0) #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1) #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2) +#define ITS_FLAGS_SAVE_SUSPEND_STATE (1ULL << 3) #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING(1 << 0) @@ -83,6 +85,15 @@ struct its_baser { u32 psz; }; +/* + * Saved ITS state - this is where saved state for the ITS is stored + * when it's disabled during system suspend. 
+ */ +struct its_ctx { + u64 cbaser; + u32 ctlr; +}; + struct its_device; /* @@ -101,6 +112,7 @@ struct its_node { struct its_collection *collections; struct fwnode_handle*fwnode_handle; u64 (*get_msi_base)(struct its_device *its_dev); + struct its_ctx its_ctx; struct list_headits_device_list; u64 flags; unsigned long list_nr; @@ -3042,6 +3054,90 @@ static void its_enable_quirks(struct its_node *its) gic_enable_quirks(iidr, its_quirks, its); } +static int its_save_disable(void) +{ + struct its_node *its; + int err = 0; + + spin_lock(&its_lock); + list_for_each_entry(its, &its_nodes, entry) { + struct its_ctx *ctx; + void __iomem *base; + + if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE)) + continue; + + ctx = &its->its_ctx; + base = its->base; + ctx->ctlr = readl_relaxed(base + GITS_CTLR); + err = its_force_quiescent(base); + if (err) { + pr_err("ITS failed to quiesce\n"); + writel_relaxed(ctx->ctlr, base + GITS_CTLR); + goto err; + } + + ctx->cbaser = gits_read_cbaser(base + GITS_CBASER); + } + +err: + if (err) { + list_for_each_entry_continue_reverse(its, &its_nodes, entry) { + if (its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE) { + struct its_ctx *ctx = &its->its_ctx; + void __iomem *base = its->base; + + writel_relaxed(ctx->ctlr, base + GITS_CTLR); + } + } + } + + spin_unlock(&its_lock); + + return err; +} + +static void its_restore_enable(void) +{ + struct its_node *its; + + spin_lock(&its_lock); + list_for_each_entry(its, &its_nodes, entry) { + if (its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE) { + struct its_ctx *ctx = &its->its_ctx; + void __iomem *base = its->base; + /* +* Only the lower 32 bits matter here since the upper 32 +* don't include any of the offset. +*/ + u32 creader = readl_relaxed(base + GITS_CREADR); + int i; + + /* +* Reset the write location to where the ITS is +* currently at. 
+*/ + gits_write_cbaser(ctx->cbaser, base + GITS_CBASER); + gits_write_cwriter(creader, base + GITS_CWRITER); + its->cmd_write = &its->cmd_base[ + creader / sizeof(struct its_cmd_block)]; + /* Restore GITS_BASER from the value cache. */ + for (i = 0; i < GITS_BASER_NR_REGS; i++) { + struct its_baser *baser = &its->tables[i]; + + its_write_baser(its, baser, baser->val); + } + writel_relaxed(ctx->ctlr, base + GITS_CTLR); + } + } + spin_unlock(&its_lock); +} + +static struct syscore_ops its_syscore_ops = { + .suspend = its_save_disable, + .resume = its_restore_enable, +}; + static int its_init_domain(struct fwnode_handle *handle, struct its_node *its) { struct irq_domain *inner_domain; @@ -3261,6 +3357,9 @@ static int
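The restore path above resets the driver's software write pointer from the byte offset read back out of GITS_CREADR. A minimal sketch of that offset-to-slot arithmetic follows; struct cmd_block and slot_from_offset() are hypothetical stand-ins, and the 32-byte command size (four 64-bit words) is an assumption stated here rather than something shown in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's command block; assumed to be four 64-bit words. */
struct cmd_block {
	uint64_t raw_cmd[4];
};

/*
 * Translate a byte offset into the command queue (as read back from the
 * hardware read pointer) into the matching queue slot, so that the
 * software write pointer and the hardware pointers agree after resume.
 */
static struct cmd_block *slot_from_offset(struct cmd_block *base, uint32_t offset)
{
	return &base[offset / sizeof(struct cmd_block)];
}
```

Starting both pointers from the same slot is what avoids the false "ITS wedged" detection mentioned in the v4 cover letter.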
[PATCH v4 5/5] DT/arm,gic-v3: add collections-reset-on-suspend property
This boolean property for the GIC-V3-ITS enables resending the MAP
COLLECTIONS commands when resuming for when the state is reset on
suspend.

Signed-off-by: Derek Basehore
---
 Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
index a470147d4f14..adb958e046d2 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
@@ -81,6 +81,10 @@ Optional:
 - reset-on-suspend: Boolean property. Indicates that the ITS state is
   reset on suspend. The state is then saved on suspend and restored on
   resume.
+- collections-reset-on-suspend : Boolean property. If the collections for the
+  ITS are stored internally instead of externally, the state will be lost if the
+  GIC loses power. Setting this enables the kernel to reset the collections
+  state on resume for this ITS node.
 
 The main GIC node must contain the appropriate #address-cells,
 #size-cells and ranges properties for the reg property of all ITS
-- 
2.16.0.rc1.238.g530d649a79-goog
[PATCH v4 3/5] DT/arm,gic-v3-its: add reset-on-suspend property
This adds documentation for the new reset-on-suspend property. This
property enables saving and restoring the ITS for when it loses state in
system suspend.

Signed-off-by: Derek Basehore
---
 Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
index 0a57f2f4167d..a470147d4f14 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
@@ -78,6 +78,9 @@ These nodes must have the following properties:
 Optional:
 - socionext,synquacer-pre-its: (u32, u32) tuple describing the untranslated
   address and size of the pre-ITS window.
+- reset-on-suspend: Boolean property. Indicates that the ITS state is
+  reset on suspend. The state is then saved on suspend and restored on
+  resume.
 
 The main GIC node must contain the appropriate #address-cells,
 #size-cells and ranges properties for the reg property of all ITS
-- 
2.16.0.rc1.238.g530d649a79-goog
[PATCH v4 1/5] cpu_pm: add syscore_suspend error handling
If cpu_cluster_pm_enter() fails, cpu_pm_exit() should be called. This
will put the CPU in the correct state to resume from the failure.

Signed-off-by: Derek Basehore
---
 kernel/cpu_pm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 67b02e138a47..03bcc0751a51 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -186,6 +186,9 @@ static int cpu_pm_suspend(void)
 		return ret;
 
 	ret = cpu_cluster_pm_enter();
+	if (ret)
+		cpu_pm_exit();
+
 	return ret;
 }
 
-- 
2.16.0.rc1.238.g530d649a79-goog
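The unwind added by this patch follows a common two-step pattern: if the second step fails, undo the first before returning the error. The sketch below shows that pattern in self-contained userspace C; struct pm_state and the cpu_enter(), cpu_exit(), cluster_enter() and pm_suspend() helpers are hypothetical stubs, not the kernel functions.

```c
#include <assert.h>

/* Hypothetical state used by the stubs below. */
struct pm_state {
	int cpu_entered;		/* 1 while the CPU-level step is in effect */
	int cluster_enter_result;	/* value the cluster-level stub returns */
};

static int cpu_enter(struct pm_state *s)
{
	s->cpu_entered = 1;
	return 0;
}

static void cpu_exit(struct pm_state *s)
{
	s->cpu_entered = 0;
}

static int cluster_enter(struct pm_state *s)
{
	return s->cluster_enter_result;
}

/* Enter both levels; if the second step fails, roll back the first. */
static int pm_suspend(struct pm_state *s)
{
	int ret = cpu_enter(s);

	if (ret)
		return ret;

	ret = cluster_enter(s);
	if (ret)
		cpu_exit(s);	/* leave the CPU in a state it can resume from */

	return ret;
}
```

Without the rollback, a failed cluster-level step would leave the CPU-level step in effect even though the caller sees an error.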
[PATCH v4 0/5] GICv3 Save and Restore
A lot of changes in v2. The distributor and redistributor saving and
restoring is left to the PSCI/firmware implementation after discussions
with ARM. This reduces the line changes by a lot and removes now unneeded
patches. Patches are verified on an RK3399 platform with pending patches
in the ARM-Trusted-Firmware project.

Just a couple minor changes in v3 to formatting.

Fixed a false ITS wedged detection due to the cmd_write and creadr
offsets not matching up on reset in v4. Also minor formatting changes.

Derek Basehore (5):
  cpu_pm: add syscore_suspend error handling
  irqchip/gic-v3-its: add ability to save/restore ITS state
  DT/arm,gic-v3-its: add reset-on-suspend property
  irqchip/gic-v3-its: add ability to resend MAPC on resume
  DT/arm,gic-v3: add collections-reset-on-suspend property

 .../bindings/interrupt-controller/arm,gic-v3.txt |   7 +
 arch/arm64/Kconfig                               |  10 +
 drivers/irqchip/irq-gic-v3-its.c                 | 202 +
 kernel/cpu_pm.c                                  |   3 +
 4 files changed, 184 insertions(+), 38 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79-goog
cris-linux-ld: cannot open linker script file ./arch/cris/kernel/vmlinux.lds: No such file or directory
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   b89e32ccd1be92a3643df3908d3026b09e271616
commit: 0fbc0b67a89d756ae3a839be01440e54348159a0 cris: remove arch specific early DT functions
date:   3 days ago
config: cris-defconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 0fbc0b67a89d756ae3a839be01440e54348159a0
        # save the attached .config to linux build tree
        make.cross ARCH=cris

All errors (new ones prefixed by >>):

>> cris-linux-ld: cannot open linker script file ./arch/cris/kernel/vmlinux.lds: No such file or directory

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
Re: [PATCH 2/2] MAINTAINERS: list file memory-barriers.txt within the LKMM entry
On Fri, Feb 02, 2018 at 03:51:02PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 02, 2018 at 10:13:42AM +0100, Andrea Parri wrote:
> > Now that a formal specification of the LKMM has become available to
> > the developer, some concern about how to track changes to the model
> > on the level of the "high-level documentation" was raised.
> >
> > A first "mitigation" to this issue, suggested by Will, is to assign
> > maintainership (and responsibility!!) of such documentation (here,
> > memory-barriers.txt) to the maintainers of the LKMM themselves.
> >
> > Suggested-by: Will Deacon
> > Signed-off-by: Andrea Parri
>
> Very good, thank you, queued! Please see below for the usual commit-log
> rework. BTW, in future submissions, could you please capitalize the
> first word after the colon (":") in the subject line? It is all too
> easy for me to forget to change this, as Ingo can attest. ;-)

Sorry, I'll do my best! ;-)

>
> If we are going to continue to use the LKMM acronym, should we make the
> first line of the MAINTAINERS block look something like this?

I've no strong opinion about whether we should, but it makes sense to me.
(The acronym is currently defined (and heavily used) in explanation.txt.)

Thanks,
  Andrea

>
> LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
>
> One alternative would be to start calling it LKMCM, though that does
> look a bit like a Roman numeral. ;-)
>
> 							Thanx, Paul
>
>
>
> commit 2f80571625dc2d1977acdef79267ba1645b07c53
> Author: Andrea Parri
> Date:   Fri Feb 2 10:13:42 2018 +0100
>
>     MAINTAINERS: List file memory-barriers.txt within the LKMM entry
>
>     We now have a shiny new Linux-kernel memory model (LKMM) and the old
>     tried-and-true Documentation/memory-barrier.txt. It would be good to
>     keep these automatically synchronized, but in the meantime we need at
>     least let people know that they are related. Will suggested adding the
>     Documentation/memory-barrier.txt file to the LKMM maintainership list,
>     thus making the LKMM maintainers responsible for both the old and the new.
>     This commit follows Will's excellent suggestion.
>
>     Suggested-by: Will Deacon
>     Signed-off-by: Andrea Parri
>     Signed-off-by: Paul E. McKenney
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ba4dc08fbe95..e6ad9b44e8fb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8101,6 +8101,7 @@ L:	linux-kernel@vger.kernel.org
>  S:	Supported
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>  F:	tools/memory-model/
> +F:	Documentation/memory-barriers.txt
>
>  LINUX SECURITY MODULE (LSM) FRAMEWORK
>  M:	Chris Wright
>
[GIT] Networking
1) The bnx2x can hang if you give it a GSO packet with a segment
   size which is too big for the hardware, detect and drop in this
   case. From Daniel Axtens.

2) Fix some overflows and pointer leaks in xtables, from Dmitry
   Vyukov.

3) Missing RCU locking in igmp, from Eric Dumazet.

4) Fix RX checksum handling on r8152, it can only checksum UDP and
   TCP packets. From Hayes Wang.

5) Minor pacing tweak to TCP BBR congestion control, from Neal
   Cardwell.

6) Missing RCU annotations in cls_u32, from Paolo Abeni.

Please pull, thanks a lot!

The following changes since commit 255442c93843f52b6891b21d0b485bf2c97f93c3:

  Merge tag 'docs-4.16' of git://git.lwn.net/linux (2018-01-31 19:25:25 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to edbe69ef2c90fc86998a74b08319a01c508bd497:

  Revert "defer call to mem_cgroup_sk_alloc()" (2018-02-02 19:49:31 -0500)

Alexander Monakov (1):
      net: pxa168_eth: add netconsole support

Arnd Bergmann (3):
      net: cxgb4: avoid memcpy beyond end of source buffer
      net: qed: use correct strncpy() size
      net: qlge: use memmove instead of skb_copy_to_linear_data

Christian Brauner (1):
      rtnetlink: remove check for IFLA_IF_NETNSID

Colin Ian King (4):
      be2net: remove redundant initialization of 'head' and pointer txq
      net: jme: remove unused initialization of 'rxdesc'
      lan78xx: remove redundant initialization of pointer 'phydev'
      vmxnet3: remove redundant initialization of pointer 'rq'

Daniel Axtens (2):
      net: create skb_gso_validate_mac_len()
      bnx2x: disable GSO where gso_size is too big for hardware

David S. Miller (3):
      Merge branch 'bnx2x-disable-GSO-on-too-large-packets'
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'r8152-fix-rx-issues'

Desnes Augusto Nunes do Rosario (1):
      ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server

Dmitry Vyukov (3):
      netfilter: x_tables: fix int overflow in xt_alloc_table_info()
      netfilter: x_tables: fix pointer leaks to userspace
      netfilter: ipt_CLUSTERIP: fix out-of-bounds accesses in clusterip_tg_check()

Ed Swierk (1):
      openvswitch: Remove padding from packet before L3+ conntrack processing

Edwin Peer (1):
      nfp: fix TLV offset calculation

Eric Dumazet (3):
      netfilter: x_tables: avoid out-of-bounds reads in xt_request_find_{match|target}
      net: igmp: add a missing rcu locking section
      soreuseport: fix mem leak in reuseport_add_sock()

Geert Uytterhoeven (2):
      net: bridge: Fix uninitialized error in br_fdb_sync_static()
      inet: Avoid unitialized variable warning in inet_unhash()

Hayes Wang (2):
      r8152: fix wrong checksum status for received IPv4 packets
      r8152: set rx mode early when linking on

Jiri Pirko (1):
      rocker: fix possible null pointer dereference in rocker_router_fib_event_work

Jozsef Kadlecsik (1):
      netfilter: ipset: Fix wraparound in hash:*net* types

Neal Cardwell (1):
      tcp_bbr: fix pacing_gain to always be unity when using lt_bw

Paolo Abeni (2):
      netfilter: on sockopt() acquire sock lock only in the required scope
      cls_u32: add missing RCU annotation.

Roman Gushchin (1):
      Revert "defer call to mem_cgroup_sk_alloc()"

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 18 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h    |  2 +-
 drivers/net/ethernet/emulex/benet/be_main.c       |  3 +--
 drivers/net/ethernet/ibm/ibmvnic.c                |  6 +-
 drivers/net/ethernet/jme.c                        |  2 +-
 drivers/net/ethernet/marvell/pxa168_eth.c         | 12
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_debug.c       |  6 ++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c      |  3 +--
 drivers/net/ethernet/rocker/rocker_main.c         | 18 +-
 drivers/net/usb/lan78xx.c                         |  2 +-
 drivers/net/usb/r8152.c                           | 13 ++---
 drivers/net/vmxnet3/vmxnet3_drv.c                 |  6 ++
 include/linux/skbuff.h                            | 16
 mm/memcontrol.c                                   | 14 ++
 net/bridge/br_fdb.c                               |  2 +-
 net/core/rtnetlink.c                              |  3 ---
 net/core/skbuff.c                                 | 63 ++-
 net/core/sock.c                                   |  5 +
 net/core/sock_reuseport.c                         | 35 ---
 net/ipv4/igmp.c                                   |  4
 net/ipv4/inet_connection_sock.c                   |  1 -
 net/ipv4/inet_hashtables.c                        |  6 ++
 n
Re: [GIT PULL] pin control bulk changes for v4.16
On Fri, Feb 2, 2018 at 4:44 PM, Linus Torvalds wrote:
>
> Stupid patch attached. I don't know how much this helps the insane
> dependency hell for , but it's bound to help
> _some_.

Testing it, that patch definitely cuts down on recompiles after

    touch include/linux/pinctrl/devinfo.h

a lot. It still ends up rebuilding a fair amount of odd drivers, but
now the files it rebuilds at least make _some_ sense.

It used to really rebuild just about everything (because pretty much
everything includes ). Now it rebuilds various snd/soc files, gpio
stuff and mmc/mfc stuff.

I'm sure it could be improved upon still, but I think this is already
a fairly noticeable improvement.

One odd header include down. Ten million to go.

Linus
[PATCH v7 5/5] iommu/vt-d: Add debugfs support for Interrupt remapping
Debugfs extension for Intel IOMMU to dump Interrupt remapping table entries for Interrupt remapping and Interrupt posting. The file /sys/kernel/debug/intel_iommu/ir_translation_struct provides detailed information, such as Index, Source Id, Destination Id, Vector and the IRTE values for entries with the present bit set, in the format shown. Remapped Interrupt supported on IOMMU: dmar7 IR table address:85e50 Index SrcID DstIDVct IRTE_highIRTE_low 1 f0f8 0100 30 0004f0f8 013d 7 f0f8 0400 22 0004f0f8 0422000d Posted Interrupt supported on IOMMU: dmar5 IR table address:85ec0 Index SrcID PDA_high PDA_low Vct IRTE_high IRTE_low 4 4300 000f ff765980 41 000f00044300ff76598000418001 5 4300 000f ff765980 51 000f00044300ff76598000518001 Cc: Jacob Pan Cc: Fenghua Yu Cc: Ashok Raj Co-Developed-by: Gayatri Kammela Signed-off-by: Gayatri Kammela Signed-off-by: Sohil Mehta --- v7: Print the IR table physical base address Simplify IR table formatting v6: Change a couple of seq_puts to seq_putc v5: Fix seq_puts formatting and remove leading '\n's v4: Remove the unused function parameter Fix checkpatch.pl warnings Remove error reporting for debugfs_create_file function Remove redundant IOMMU null check under for_each_active_iommu v3: Use a macro for seq file operations Change the intel_iommu_interrupt_remap file name to ir_translation_struct v2: Handle the case when IR is not enabled. 
Fix seq_printf formatting drivers/iommu/intel-iommu-debug.c | 94 +++ 1 file changed, 94 insertions(+) diff --git a/drivers/iommu/intel-iommu-debug.c b/drivers/iommu/intel-iommu-debug.c index a9a99aa..b66a073 100644 --- a/drivers/iommu/intel-iommu-debug.c +++ b/drivers/iommu/intel-iommu-debug.c @@ -229,6 +229,96 @@ static int iommu_regset_show(struct seq_file *m, void *unused) } DEFINE_SHOW_ATTRIBUTE(iommu_regset); +#ifdef CONFIG_IRQ_REMAP +static void ir_tbl_remap_entry_show(struct seq_file *m, + struct intel_iommu *iommu) +{ + struct irte *ri_entry; + int idx; + + seq_puts(m, " Index SrcID DstIDVct IRTE_high\t\tIRTE_low\n"); + + for (idx = 0; idx < INTR_REMAP_TABLE_ENTRIES; idx++) { + ri_entry = &iommu->ir_table->base[idx]; + if (!ri_entry->present || ri_entry->p_pst) + continue; + + seq_printf(m, " %d\t%04x %08x %02x %016llx\t%016llx\n", idx, + ri_entry->sid, ri_entry->dest_id, ri_entry->vector, + ri_entry->high, ri_entry->low); + } +} + +static void ir_tbl_posted_entry_show(struct seq_file *m, +struct intel_iommu *iommu) +{ + struct irte *pi_entry; + int idx; + + seq_puts(m, " Index SrcID PDA_high PDA_low Vct IRTE_high\t\tIRTE_low\n"); + + for (idx = 0; idx < INTR_REMAP_TABLE_ENTRIES; idx++) { + pi_entry = &iommu->ir_table->base[idx]; + if (!pi_entry->present || !pi_entry->p_pst) + continue; + + seq_printf(m, " %d\t%04x %08x %08x %02x %016llx\t%016llx\n", + idx, pi_entry->sid, pi_entry->pda_h, + pi_entry->pda_l << 6, pi_entry->vector, + pi_entry->high, pi_entry->low); + } +} + +/* + * For active IOMMUs go through the Interrupt remapping + * table and print valid entries in a table format for + * Remapped and Posted Interrupts. 
+ */ +static int ir_translation_struct_show(struct seq_file *m, void *unused) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + u64 irta; + + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!ecap_ir_support(iommu->ecap)) + continue; + + irta = dmar_readq(iommu->reg + DMAR_IRTA_REG) & VTD_PAGE_MASK; + seq_printf(m, "Remapped Interrupt supported on IOMMU: %s\n" + " IR table address:%llx\n", iommu->name, irta); + + if (iommu->ir_table && irta) + ir_tbl_remap_entry_show(m, iommu); + else + seq_puts(m, "Interrupt Remapping is not enabled\n"); + seq_putc(m, '\n'); + } + + seq_puts(m, "\n\n"); + + for_each_active_iommu(iommu, drhd) { + if (!cap_pi_support(iommu->cap)) + continue; + + irta = dmar_readq(iommu->reg + DMAR_IRTA_REG) & VTD_PAGE_MASK; + seq_printf(m, "Posted Interrupt supported on IOMMU: %s\n" + " IR table address:%llx\n", iommu->name, irta); + + if (iommu->ir_table && irta) + ir_tbl_posted_entry_show(m, iommu); +
[PATCH v7 3/5] iommu/vt-d: Add debugfs support to show register contents
From: Gayatri Kammela Debugfs extension to dump all the register contents for each IOMMU device to the user space via debugfs. Example: root@OTC-KBLH-01:~# cat /sys/kernel/debug/intel_iommu/iommu_regset DMAR: dmar0: Register Base Address fed9 NameOffset Contents VER 0x000x0010 CAP 0x080x01cc40660462 ECAP0x100x00f0101a GCMD0x180x GSTS0x1c0xc700 RTADDR 0x200x0004071d3800 CCMD0x280x0800 FSTS0x340x FECTL 0x380x FEDATA 0x3c0xfee010044021 Cc: Fenghua Yu Cc: Jacob Pan Cc: Ashok Raj Co-Developed-by: Sohil Mehta Signed-off-by: Sohil Mehta Signed-off-by: Gayatri Kammela --- v7: Use macro for register set definitions Fix compiler warning for readq with 32bit architecture Remove leading '\n' v6: No change v5: No change v4: Fix checkpatch.pl warnings Remove error reporting for debugfs_create_file function Remove redundant IOMMU null check under for_each_active_iommu v3: Use a macro for seq file operations Change the intel_iommu_regset file name to iommu_regset Add information for MTRR registers v2: Fix seq_printf formatting drivers/iommu/intel-iommu-debug.c | 84 +++ include/linux/intel-iommu.h | 2 + 2 files changed, 86 insertions(+) diff --git a/drivers/iommu/intel-iommu-debug.c b/drivers/iommu/intel-iommu-debug.c index 8253503..38651ad 100644 --- a/drivers/iommu/intel-iommu-debug.c +++ b/drivers/iommu/intel-iommu-debug.c @@ -38,6 +38,49 @@ static const struct file_operations __name ## _fops = \ .owner = THIS_MODULE, \ } +struct iommu_regset { + int offset; + const char *regs; +}; + +#define IOMMU_REGSET_ENTRY(_reg_) \ + { DMAR_##_reg_##_REG, __stringify(_reg_) } +static const struct iommu_regset iommu_regs[] = { + IOMMU_REGSET_ENTRY(VER), + IOMMU_REGSET_ENTRY(CAP), + IOMMU_REGSET_ENTRY(ECAP), + IOMMU_REGSET_ENTRY(GCMD), + IOMMU_REGSET_ENTRY(GSTS), + IOMMU_REGSET_ENTRY(RTADDR), + IOMMU_REGSET_ENTRY(CCMD), + IOMMU_REGSET_ENTRY(FSTS), + IOMMU_REGSET_ENTRY(FECTL), + IOMMU_REGSET_ENTRY(FEDATA), + IOMMU_REGSET_ENTRY(FEADDR), + IOMMU_REGSET_ENTRY(FEUADDR), + 
IOMMU_REGSET_ENTRY(AFLOG), + IOMMU_REGSET_ENTRY(PMEN), + IOMMU_REGSET_ENTRY(PLMBASE), + IOMMU_REGSET_ENTRY(PLMLIMIT), + IOMMU_REGSET_ENTRY(PHMBASE), + IOMMU_REGSET_ENTRY(PHMLIMIT), + IOMMU_REGSET_ENTRY(IQH), + IOMMU_REGSET_ENTRY(IQT), + IOMMU_REGSET_ENTRY(IQA), + IOMMU_REGSET_ENTRY(ICS), + IOMMU_REGSET_ENTRY(IRTA), + IOMMU_REGSET_ENTRY(PQH), + IOMMU_REGSET_ENTRY(PQT), + IOMMU_REGSET_ENTRY(PQA), + IOMMU_REGSET_ENTRY(PRS), + IOMMU_REGSET_ENTRY(PECTL), + IOMMU_REGSET_ENTRY(PEDATA), + IOMMU_REGSET_ENTRY(PEADDR), + IOMMU_REGSET_ENTRY(PEUADDR), + IOMMU_REGSET_ENTRY(MTRRCAP), + IOMMU_REGSET_ENTRY(MTRRDEF) +}; + static void ctx_tbl_entry_show(struct seq_file *m, struct intel_iommu *iommu, int bus, bool ext) { @@ -116,6 +159,45 @@ static int dmar_translation_struct_show(struct seq_file *m, void *unused) } DEFINE_SHOW_ATTRIBUTE(dmar_translation_struct); +static int iommu_regset_show(struct seq_file *m, void *unused) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + unsigned long long base; + int i, ret = 0; + u64 value; + + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!drhd->reg_base_addr) { + seq_puts(m, "IOMMU: Invalid base address\n"); + ret = -EINVAL; + goto out; + } + + base = drhd->reg_base_addr; + seq_printf(m, "DMAR: %s: Register Base Address %llx\n", + iommu->name, base); + seq_puts(m, "Name\t\t\tOffset\t\tContents\n"); + /* +* Publish the contents of the 64-bit hardware registers +* by adding the offset to the pointer (virtual address). +*/ + for (i = 0 ; i < ARRAY_SIZE(iommu_regs); i++) { + value = dmar_readq(iommu->reg + iommu_regs[i].offset); + seq_printf(m, "%-8s\t\t0x%02x\t\t0x%016llx\n", + iommu_regs[i].regs, iommu_regs[i].offset, + value); + } + seq_putc(m, '\n'); + } +out: + rcu_read_unlock(); + + retu
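The table-building macro in this patch pastes a register name into an offset constant and also turns the same name into a string. A minimal user-space sketch of that pattern follows. The `DEMO_*_REG` constants, `struct regset_sketch`, `REGSET_ENTRY` and `demo_reg_offset()` are hypothetical names introduced only for illustration; the three offsets are copied from the example dump in the commit message, and the sketch uses the plain `#` operator rather than the kernel's `__stringify()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical offsets, taken from the example dump (VER, CAP, ECAP). */
#define DEMO_VER_REG  0x00
#define DEMO_CAP_REG  0x08
#define DEMO_ECAP_REG 0x10

struct regset_sketch {
	int offset;
	const char *name;
};

/* Paste the name into DEMO_<name>_REG and stringify it for the label. */
#define REGSET_ENTRY(_reg_) { DEMO_##_reg_##_REG, #_reg_ }

static const struct regset_sketch demo_regs[] = {
	REGSET_ENTRY(VER),
	REGSET_ENTRY(CAP),
	REGSET_ENTRY(ECAP),
};

/* Look up an offset by register name; returns -1 if the name is unknown. */
static int demo_reg_offset(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(demo_regs) / sizeof(demo_regs[0]); i++)
		if (strcmp(demo_regs[i].name, name) == 0)
			return demo_regs[i].offset;
	return -1;
}
```

Keeping the name and the offset in one macro invocation means the label printed for a register cannot drift away from the constant used to read it.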
[PATCH v7 0/5] Add Intel IOMMU debugfs support
Hi All,

This series aims to add debugfs support for Intel IOMMU. It exposes IOMMU
registers, internal context and dumps individual table entries to help debug
Intel IOMMUs.

The first patch does the ground work for the following patches by reorganizing
some Intel IOMMU data structures. The following patches create a new Kconfig
option - INTEL_IOMMU_DEBUG and add debugfs support for IOMMU context internals,
register contents, PASID internals, and Interrupt remapping in that order. The
information can be accessed in debugfs at '/sys/kernel/debug/intel_iommu/'.

Regards,
Sohil

Changes since v6:
 - Split patch 1/5 and 2/5 differently
 - Simplify and improve code formatting
 - Use macro for register set definitions
 - Fix compiler warning for readq
 - Add Co-Developed-by tag to commit messages

Changes since v5:
 - Change the order of includes to an alphabetical order
 - Change seq_printf and seq_puts formatting

Changes since v4:
 - Change to a SPDX license tag
 - Fix seq_printf formatting and remove leading '\n's

Changes since v3:
 - Remove an unused function parameter from some of the functions
 - Fix checkpatch.pl warnings
 - Remove error reporting for debugfs_create_file functions
 - Fix unnecessary reprogramming of the context entries
 - Simplify and merge the show context and extended context patch into one
 - Remove redundant IOMMU null check under for_each_active_iommu
 - Update the commit title to be consistent

Changes since v2:
 - Added a macro for seq file operations based on recommendation by Andy
   Shevchenko. The macro can be moved to seq_file.h at a future point
 - Changed the debugfs file names to more relevant ones
 - Added information for MTRR registers in the regset file

Changes since v1:
 - Fixed seq_printf formatting
 - Handled the case when Interrupt remapping is not enabled

Gayatri Kammela (4):
  iommu/vt-d: Relocate struct/function declarations to its header files
  iommu/vt-d: Enable debugfs support to show context internals
  iommu/vt-d: Add debugfs support to show register contents
  iommu/vt-d: Add debugfs support to show Pasid table contents

Sohil Mehta (1):
  iommu/vt-d: Add debugfs support for Interrupt remapping

 drivers/iommu/Kconfig             |   8 +
 drivers/iommu/Makefile            |   1 +
 drivers/iommu/intel-iommu-debug.c | 338 ++
 drivers/iommu/intel-iommu.c       |  34 +---
 drivers/iommu/intel-svm.c         |   8 -
 include/linux/intel-iommu.h       |  39 +
 include/linux/intel-svm.h         |  10 +-
 7 files changed, 400 insertions(+), 38 deletions(-)
 create mode 100644 drivers/iommu/intel-iommu-debug.c

--
2.7.4
[PATCH v7 1/5] iommu/vt-d: Relocate struct/function declarations to its header files
From: Gayatri Kammela To reuse the static functions and the struct declarations, move them to corresponding header files and export the needed functions. Cc: Sohil Mehta Cc: Fenghua Yu Cc: Ashok Raj Signed-off-by: Jacob Pan Signed-off-by: Gayatri Kammela --- v7: Split patch 1/5 and 2/5 differently Update the commit message v6: No change v5: No change v4: No change v3: No change v2: No change drivers/iommu/intel-iommu.c | 33 - include/linux/intel-iommu.h | 31 +++ include/linux/intel-svm.h | 2 +- 3 files changed, 36 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 4a2de34..f6241f6 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -183,16 +183,6 @@ static int rwbf_quirk; static int force_on = 0; int intel_iommu_tboot_noforce; -/* - * 0: Present - * 1-11: Reserved - * 12-63: Context Ptr (12 - (haw-1)) - * 64-127: Reserved - */ -struct root_entry { - u64 lo; - u64 hi; -}; #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry)) /* @@ -218,21 +208,6 @@ static phys_addr_t root_entry_uctp(struct root_entry *re) return re->hi & VTD_PAGE_MASK; } -/* - * low 64 bits: - * 0: present - * 1: fault processing disable - * 2-3: translation type - * 12-63: address space root - * high 64 bits: - * 0-2: address width - * 3-6: aval - * 8-23: domain id - */ -struct context_entry { - u64 lo; - u64 hi; -}; static inline void context_clear_pasid_enable(struct context_entry *context) { @@ -259,7 +234,7 @@ static inline bool __context_present(struct context_entry *context) return (context->lo & 1); } -static inline bool context_present(struct context_entry *context) +bool context_present(struct context_entry *context) { return context_pasid_enabled(context) ? 
__context_present(context) : @@ -819,8 +794,8 @@ static void domain_update_iommu_cap(struct dmar_domain *domain) domain->iommu_superpage = domain_update_iommu_superpage(NULL); } -static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu, - u8 bus, u8 devfn, int alloc) +struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus, +u8 devfn, int alloc) { struct root_entry *root = &iommu->root_entry[bus]; struct context_entry *context; @@ -5208,7 +5183,7 @@ static void intel_iommu_put_resv_regions(struct device *dev, #ifdef CONFIG_INTEL_IOMMU_SVM #define MAX_NR_PASID_BITS (20) -static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu) +unsigned long intel_iommu_get_pts(struct intel_iommu *iommu) { /* * Convert ecap_pss to extend context entry pts encoding, also diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index f3274d9..78ec85a 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -383,6 +383,33 @@ struct pasid_entry; struct pasid_state_entry; struct page_req_dsc; +/* + * 0: Present + * 1-11: Reserved + * 12-63: Context Ptr (12 - (haw-1)) + * 64-127: Reserved + */ +struct root_entry { + u64 lo; + u64 hi; +}; + +/* + * low 64 bits: + * 0: present + * 1: fault processing disable + * 2-3: translation type + * 12-63: address space root + * high 64 bits: + * 0-2: address width + * 3-6: aval + * 8-23: domain id + */ +struct context_entry { + u64 lo; + u64 hi; +}; + struct intel_iommu { void __iomem*reg; /* Pointer to hardware regs, virtual addr */ u64 reg_phys; /* physical address of hw register set */ @@ -488,8 +515,12 @@ struct intel_svm { extern int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev); extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev); +extern unsigned long intel_iommu_get_pts(struct intel_iommu *iommu); #endif extern const struct attribute_group *intel_iommu_groups[]; +extern bool 
context_present(struct context_entry *context); +extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu, + u8 bus, u8 devfn, int alloc); #endif diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h index 99bc5b3..733eaf9 100644 --- a/include/linux/intel-svm.h +++ b/include/linux/intel-svm.h @@ -130,7 +130,7 @@ static inline int intel_svm_unbind_mm(struct device *dev, int pasid) BUG(); } -static int intel_svm_is_pasid_valid(struct device *dev, int pasid) +static inline int intel_svm_is_pasid_valid(struct device *dev, int pasid) { return -EINVAL; } -- 2.7.4
[PATCH v7 4/5] iommu/vt-d: Add debugfs support to show Pasid table contents
From: Gayatri Kammela Debugfs extension to dump the internals such as pasid table entries for each IOMMU to the userspace. Example of such dump in Kabylake: root@OTC-KBLH-01:~# cat /sys/kernel/debug/intel_iommu/dmar_translation_struct IOMMU dmar1: Extended Root Table Address:4071d3800 Extended Root Table Entries: Bus 0 L: 4071d7001 H: 0 Lower Context Table Entries for Bus: 0 [entry] Device B:D.FLow High [16]:00:02.04071d6005 102 Higher Context Table Entries for Bus: 0 [16]:00:02.00 0 Pasid Table Address: 746cb0af Pasid Table Entries for domain 0: [Entry] Contents [0] 12c409801 Cc: Fenghua Yu Cc: Jacob Pan Cc: Ashok Raj Co-Developed-by: Sohil Mehta Signed-off-by: Sohil Mehta Signed-off-by: Gayatri Kammela --- v7: Improve code indentation and formatting v6: No change v5: No change v4: Remove the unused function parameter Fix checkpatch.pl warnings v3: No change v2: Fix seq_printf formatting drivers/iommu/intel-iommu-debug.c | 31 +++ drivers/iommu/intel-svm.c | 8 include/linux/intel-svm.h | 8 3 files changed, 39 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel-iommu-debug.c b/drivers/iommu/intel-iommu-debug.c index 38651ad..a9a99aa 100644 --- a/drivers/iommu/intel-iommu-debug.c +++ b/drivers/iommu/intel-iommu-debug.c @@ -81,6 +81,36 @@ static const struct iommu_regset iommu_regs[] = { IOMMU_REGSET_ENTRY(MTRRDEF) }; +#ifdef CONFIG_INTEL_IOMMU_SVM +static void pasid_tbl_entry_show(struct seq_file *m, struct intel_iommu *iommu) +{ + int pasid_size = 0, i; + + if (!ecap_pasid(iommu->ecap)) + return; + + pasid_size = intel_iommu_get_pts(iommu); + seq_printf(m, "Pasid Table Address: %p\n", iommu->pasid_table); + + if (!iommu->pasid_table) + return; + + seq_printf(m, "Pasid Table Entries for domain %d:\n", iommu->segment); + seq_puts(m, "[Entry]\t\tContents\n"); + + /* Publish the pasid table entries here */ + for (i = 0; i < pasid_size; i++) { + if (!iommu->pasid_table[i].val) + continue; + + seq_printf(m, "[%d]\t\t%04llx\n", i, iommu->pasid_table[i].val); + } 
+} +#else /* CONFIG_INTEL_IOMMU_SVM */ +static inline void +pasid_tbl_entry_show(struct seq_file *m, struct intel_iommu *iommu) {} +#endif /* CONFIG_INTEL_IOMMU_SVM */ + static void ctx_tbl_entry_show(struct seq_file *m, struct intel_iommu *iommu, int bus, bool ext) { @@ -116,6 +146,7 @@ static void ctx_tbl_entry_show(struct seq_file *m, struct intel_iommu *iommu, iommu->segment, bus, PCI_SLOT(ctx), PCI_FUNC(ctx), context[1].lo, context[1].hi); } + pasid_tbl_entry_show(m, iommu); out: spin_unlock_irqrestore(&iommu->lock, flags); } diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index ed1cf7c..c646724 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -28,14 +28,6 @@ static irqreturn_t prq_event_thread(int irq, void *d); -struct pasid_entry { - u64 val; -}; - -struct pasid_state_entry { - u64 val; -}; - int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu) { struct page *pages; diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h index 733eaf9..a8abad6 100644 --- a/include/linux/intel-svm.h +++ b/include/linux/intel-svm.h @@ -18,6 +18,14 @@ struct device; +struct pasid_entry { + u64 val; +}; + +struct pasid_state_entry { + u64 val; +}; + struct svm_dev_ops { void (*fault_cb)(struct device *dev, int pasid, u64 address, u32 private, int rwxp, int response); -- 2.7.4
[PATCH v7 2/5] iommu/vt-d: Enable debugfs support to show context internals
From: Gayatri Kammela Add a new config option CONFIG_INTEL_IOMMU_DEBUG and export Intel IOMMU internals states, such as root and context in debugfs to the userspace. Example of such dump in Kabylake: root@OTC-KBLH-01:~# cat /sys/kernel/debug/intel_iommu/dmar_translation_struct IOMMU dmar1: Extended Root Table Address:4071d3800 Extended Root Table Entries: Bus 0 L: 4071d7001 H: 0 Lower Context Table Entries for Bus: 0 [entry] Device B:D.FLow High [16]:00:02.04071d6005 102 Higher Context Table Entries for Bus: 0 [16]:00:02.00 0 IOMMU dmar0: Extended Root Table Address:4071d4800 IOMMU dmar2: Root Table Address:4071d5000 Root Table Entries: Bus 0 L: 406d13001 H: 0 Context Table Entries for Bus: 0 [entry] Device B:D.FLow High [160] :00:14.0406d12001 102 [184] :00:17.0405756001 302 [248] :00:1f.0406d3b001 202 [251] :00:1f.3405497001 402 [254] :00:1f.640662e001 502 Root Table Entries: Bus 1 L: 401e03001 H: 0 Context Table Entries for Bus: 1 [entry] Device B:D.FLow High [0] :01:00.0401e04001 602 Cc: Fenghua Yu Cc: Ashok Raj Co-Developed-by: Sohil Mehta Signed-off-by: Jacob Pan Signed-off-by: Sohil Mehta Signed-off-by: Gayatri Kammela --- v7: Split patch 1/5 and 2/5 differently Update commit message and copyright year Fix typo in a comment Simplify code v6: Change the order of includes to an alphabetical order Change seq_printf formatting v5: Change to a SPDX license tag Fix seq_printf formatting v4: Remove the unused function parameter Fix checkpatch.pl warnings Remove error reporting for debugfs_create_file function Fix unnecessary reprogramming of the context entries Simplify and merge the show context and extended context patch into one Remove redundant IOMMU null check under for_each_active_iommu v3: Add a macro for seq file operations Change the intel_iommu_ctx file name to dmar_translation_struct v2: No change drivers/iommu/Kconfig | 8 +++ drivers/iommu/Makefile| 1 + drivers/iommu/intel-iommu-debug.c | 129 ++ drivers/iommu/intel-iommu.c | 1 + 
include/linux/intel-iommu.h | 6 ++ 5 files changed, 145 insertions(+) create mode 100644 drivers/iommu/intel-iommu-debug.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index f3a2134..332648f 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -152,6 +152,14 @@ config INTEL_IOMMU and include PCI device scope covered by these DMA remapping devices. +config INTEL_IOMMU_DEBUG + bool "Export Intel IOMMU internals in Debugfs" + depends on INTEL_IOMMU && DEBUG_FS + help + Debugfs support to export IOMMU context internals, register contents, + PASID internals and interrupt remapping. To access this information in + sysfs, say Y. + config INTEL_IOMMU_SVM bool "Support for Shared Virtual Memory with Intel IOMMU" depends on INTEL_IOMMU && X86 diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 1fb6958..fdbaf46 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,6 +15,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o +obj-$(CONFIG_INTEL_IOMMU_DEBUG) += intel-iommu-debug.o obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o diff --git a/drivers/iommu/intel-iommu-debug.c b/drivers/iommu/intel-iommu-debug.c new file mode 100644 index 000..8253503 --- /dev/null +++ b/drivers/iommu/intel-iommu-debug.c @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright © 2018 Intel Corporation. 
+ * + * Authors: Gayatri Kammela + * Jacob Pan + * Sohil Mehta + */ + +#define pr_fmt(fmt) "INTEL_IOMMU: " fmt +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "irq_remapping.h" + +#define TOTAL_BUS_NR 256 /* full bus range */ +#define DEFINE_SHOW_ATTRIBUTE(__name) \ +static int __name ## _open(struct inode *inode, struct file *file) \ +{ \ + return single_open(file, __name ## _show, inode->i_private);\ +} \ +static const struct file_operations __name ## _fops = \ +{ \ + .open = __name ## _open, \ + .read = seq_read, \
Re: [PATCH] net: qlge: use memmove instead of skb_copy_to_linear_data
From: Arnd Bergmann
Date: Fri, 2 Feb 2018 16:45:44 +0100

> gcc-8 points out that the skb_copy_to_linear_data() argument points to
> the skb itself, which makes it run into a problem with overlapping
> memcpy arguments:
>
> In file included from include/linux/ip.h:20,
>                  from drivers/net/ethernet/qlogic/qlge/qlge_main.c:26:
> drivers/net/ethernet/qlogic/qlge/qlge_main.c: In function 'ql_realign_skb':
> include/linux/skbuff.h:3378:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
>   memcpy(skb->data, from, len);
>
> It's unclear to me what the best solution is, maybe it ought to use a
> different helper that adjusts the skb data in a safe way. Simply using
> memmove() here seems like the easiest workaround.
>
> Signed-off-by: Arnd Bergmann

This looks fine, applied, thanks.
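For readers following the memcpy/memmove distinction discussed above, here is a small user-space sketch. `realign_buffer()` is a hypothetical helper written for illustration and is not the qlge driver's ql_realign_skb(); it only shows why memmove() is the safe call when source and destination share a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the qlge code: slide `len` bytes that start at
 * buf + offset down to the start of buf. Source and destination overlap
 * whenever offset < len, so memcpy() would be undefined behavior here;
 * memmove() is specified to handle overlapping regions.
 */
static void realign_buffer(unsigned char *buf, size_t offset, size_t len)
{
	assert(buf != NULL);
	memmove(buf, buf + offset, len);
}
```

Calling it on an 8-byte buffer with offset 2 and length 6 moves the last six bytes to the front; the final two bytes keep their old values.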
Re: [GIT PULL] pin control bulk changes for v4.16
On Fri, Feb 2, 2018 at 2:56 PM, Linus Torvalds wrote:
>
> so I would really prefer to speed up recompiles and just generally try
> to avoid horrible header file inclusion by doing the same thing in
> , adding just that
>
>     struct dev_pin_info;
>
> declaration, and removing the include.

It turns out that some pinctrl users seem to depend on this broken
situation, with at least

    drivers/pinctrl/core.c
    drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c
    drivers/pinctrl/pinctrl-ocelot.c
    drivers/pinctrl/bcm/pinctrl-iproc-gpio.c

expecting to magically get some of the pinctrl function declarations
not through some pinctrl header file, but just from .

Adding that include to would seem to make those happy and make
'allmodconfig' build for me. But I'm only testing x86-64. Can somebody
test at least arm too?

Stupid patch attached. I don't know how much this helps the insane
dependency hell for , but it's bound to help _some_.

Comments?

Linus

 include/linux/device.h          | 2 +-
 include/linux/pinctrl/pinctrl.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index f649fc0c2571..b093405ed525 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -20,7 +20,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -41,6 +40,7 @@ struct fwnode_handle;
 struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
+struct dev_pin_info;
 
 struct bus_attribute {
 	struct attribute	attr;
diff --git a/include/linux/pinctrl/pinctrl.h b/include/linux/pinctrl/pinctrl.h
index 5e45385c5bdc..8f5dbb84547a 100644
--- a/include/linux/pinctrl/pinctrl.h
+++ b/include/linux/pinctrl/pinctrl.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 struct device;
 struct pinctrl_dev;
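The forward-declaration approach in the patch above can be sketched in a few lines of standalone C. The names `pin_info_sketch`, `device_sketch` and `device_has_pins()` are hypothetical stand-ins, not kernel types; the point is only that a structure holding a pointer to another type does not need that type's full definition, and therefore does not need its header.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough to declare pointers to the type. */
struct pin_info_sketch;

/* Hypothetical stand-in for a structure that only stores a pointer. */
struct device_sketch {
	const char *name;
	struct pin_info_sketch *pins;	/* incomplete type is fine here */
};

/* Code that never dereferences `pins` needs no full definition. */
static int device_has_pins(const struct device_sketch *dev)
{
	assert(dev != NULL);
	return dev->pins != NULL;
}
```

Only the files that actually dereference the pointer have to include the header with the full definition, which is why dropping the include from a widely used header shrinks the rebuild set.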
Re: [PATCH] net: qed: use correct strncpy() size
From: Arnd Bergmann
Date: Fri, 2 Feb 2018 16:44:47 +0100

> passing the strlen() of the source string as the destination
> length is pointless, and gcc-8 now warns about it:
>
> drivers/net/ethernet/qlogic/qed/qed_debug.c: In function 'qed_grc_dump':
> include/linux/string.h:253: error: 'strncpy' specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
>
> This changes qed_grc_dump_big_ram() to instead use the length of
> the destination buffer, and use strscpy() to guarantee nul-termination.
>
> Signed-off-by: Arnd Bergmann

Applied.
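As an illustration of the point made in the quoted commit message, the sketch below bounds the copy by the destination size and always terminates the result. `copy_bounded()` is a hypothetical user-space helper written for this example; it is not the kernel's strscpy() and its return convention is an assumption chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper in the spirit of a bounded copy; it is
 * not the kernel's strscpy(). The bound is the size of the destination,
 * never strlen(src). Returns the number of bytes copied (excluding the
 * terminator), or -1 if src had to be truncated.
 */
static long copy_bounded(char *dst, const char *src, size_t dst_size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (dst_size == 0)
		return -1;

	len = strlen(src);
	if (len >= dst_size) {
		memcpy(dst, src, dst_size - 1);
		dst[dst_size - 1] = '\0';
		return -1;
	}
	memcpy(dst, src, len + 1);	/* includes the terminator */
	return (long)len;
}
```

Using strlen(src) as the bound, as the old code did, gives no protection at all: the limit grows with the very input it is supposed to constrain.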
Re: [PATCH] tools: libsubcmd: Drop the less hack that was inherited from Git.
Hi, it looks like Sergey has put in a patch to fix the aliasing, looking at the linux-next tree. Are we still looking to remove this hack altogether? Thanks On Thu, Jan 25, 2018 at 08:24:26AM -0500, Arvind Sankar wrote: > Thanks. > > This was found because gcc 8 appears to be enabling -Wrestrict in -Wall, > so there is a build failure with mainline gcc. > > On Thu, Jan 25, 2018 at 05:16:52AM -0300, Arnaldo Carvalho de Melo wrote: > > Em Wed, Jan 24, 2018 at 02:54:11PM -0600, Josh Poimboeuf escreveu: > > > On Tue, Jan 23, 2018 at 07:38:37PM -0500, Arvind Sankar wrote: > > > > We inherited this hack with the original code from the Git project. The > > > > select call is invalid as the two fd_set pointers should not be aliased. > > > > > > > > We could fix it, but the Git project removed this hack in 2012 in commit > > > > e8320f3 (pager: drop "wait for output to run less" hack). The bug it > > > > worked around was apparently fixed in less back in June 2007. > > > > > > > > So remove the hack from here as well. > > > > > > > > Signed-off-by: Arvind Sankar > > > > > > Looks good to me. > > > > > > Acked-by: Josh Poimboeuf > > > > > > Libsubcmd is used by perf and objtool, so adding the perf maintainers to > > > CC. Arnaldo, do you want to pick this one up? > > > > Sure, I'll put it in my perf/core branch. 
> > > > - Arnaldo > > > > > > --- > > > > tools/lib/subcmd/pager.c | 17 - > > > > tools/lib/subcmd/run-command.c | 2 -- > > > > tools/lib/subcmd/run-command.h | 1 - > > > > 3 files changed, 20 deletions(-) > > > > > > > > diff --git a/tools/lib/subcmd/pager.c b/tools/lib/subcmd/pager.c > > > > index 5ba754d17952..94d61d9b511f 100644 > > > > --- a/tools/lib/subcmd/pager.c > > > > +++ b/tools/lib/subcmd/pager.c > > > > @@ -1,5 +1,4 @@ > > > > // SPDX-License-Identifier: GPL-2.0 > > > > -#include > > > > #include > > > > #include > > > > #include > > > > @@ -23,21 +22,6 @@ void pager_init(const char *pager_env) > > > > subcmd_config.pager_env = pager_env; > > > > } > > > > > > > > -static void pager_preexec(void) > > > > -{ > > > > - /* > > > > -* Work around bug in "less" by not starting it until we > > > > -* have real input > > > > -*/ > > > > - fd_set in; > > > > - > > > > - FD_ZERO(&in); > > > > - FD_SET(0, &in); > > > > - select(1, &in, NULL, &in, NULL); > > > > - > > > > - setenv("LESS", "FRSX", 0); > > > > -} > > > > - > > > > static const char *pager_argv[] = { "sh", "-c", NULL, NULL }; > > > > static struct child_process pager_process; > > > > > > > > @@ -84,7 +68,6 @@ void setup_pager(void) > > > > pager_argv[2] = pager; > > > > pager_process.argv = pager_argv; > > > > pager_process.in = -1; > > > > - pager_process.preexec_cb = pager_preexec; > > > > > > > > if (start_command(&pager_process)) > > > > return; > > > > diff --git a/tools/lib/subcmd/run-command.c > > > > b/tools/lib/subcmd/run-command.c > > > > index 5cdac2162532..9e9dca717ed7 100644 > > > > --- a/tools/lib/subcmd/run-command.c > > > > +++ b/tools/lib/subcmd/run-command.c > > > > @@ -120,8 +120,6 @@ int start_command(struct child_process *cmd) > > > > unsetenv(*cmd->env); > > > > } > > > > } > > > > - if (cmd->preexec_cb) > > > > - cmd->preexec_cb(); > > > > if (cmd->exec_cmd) { > > > > execv_cmd(cmd->argv); > > > > } else { > > > > diff --git a/tools/lib/subcmd/run-command.h > > > > 
b/tools/lib/subcmd/run-command.h > > > > index 17d969c6add3..6256268802b5 100644 > > > > --- a/tools/lib/subcmd/run-command.h > > > > +++ b/tools/lib/subcmd/run-command.h > > > > @@ -46,7 +46,6 @@ struct child_process { > > > > unsigned no_stderr:1; > > > > unsigned exec_cmd:1; /* if this is to be external sub-command */ > > > > unsigned stdout_to_stderr:1; > > > > - void (*preexec_cb)(void); > > > > }; > > > > > > > > int start_command(struct child_process *); > > > > -- > > > > 2.13.6 > > > > > > > > > > -- > > > Josh
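The aliasing problem mentioned in this thread comes from passing one fd_set as two arguments of select(), whose fd_set parameters are restrict-qualified. A minimal sketch of a non-aliased call follows; `wait_readable()` and its timeout parameter are hypothetical and are not part of libsubcmd, which removes the call entirely in the patch above.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * The read and exception sets are separate objects rather than one set
 * passed twice, so the restrict-qualified parameters do not alias.
 * Returns select()'s result: >0 ready, 0 timeout, -1 error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	fd_set readfds, exceptfds;
	struct timeval tv;

	assert(fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&readfds);
	FD_ZERO(&exceptfds);
	FD_SET(fd, &readfds);
	FD_SET(fd, &exceptfds);

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	return select(fd + 1, &readfds, NULL, &exceptfds, &tv);
}
```

On the read end of an empty pipe the call times out; once a byte has been written to the pipe it reports the descriptor as ready.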
Re: [PATCH] net: cxgb4: avoid memcpy beyond end of source buffer
From: Arnd Bergmann
Date: Fri, 2 Feb 2018 16:18:37 +0100

> Building with link-time-optimizations revealed that the cxgb4 driver does
> a fixed-size memcpy() from a variable-length constant string into the
> network interface name:
>
> In function 'memcpy',
>     inlined from 'cfg_queues_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:335:2,
>     inlined from 'cxgb4_register_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:719:9:
> include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
>    __read_overflow2();
>    ^
>
> I can see two equally workable solutions: either we use a strncpy() instead
> of the memcpy() to stop at the end of the input, or we make the source buffer
> fixed length as well. This implements the latter.
>
> Signed-off-by: Arnd Bergmann

Not the most pleasant thing in the world, but I can't think of a
better solution.

> @@ -355,7 +355,7 @@ struct cxgb4_lld_info {
>  };
>
>  struct cxgb4_uld_info {
> -	const char *name;
> +	char name[IFNAMSIZ];
>  	void *handle;
>  	unsigned int nrxq;
>  	unsigned int rxq_size;

David Laight asked how this can be the sole part of the patch.

All of these structures are initialized like:

	static struct cxgb4_uld_info {
		.name = "foo",
		...
	};

So changing from "const char *" to "char []" just works.
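The remark that the initializers keep working can be checked with a small standalone sketch. `struct uld_info_sketch`, `NAME_LEN`, `demo_uld` and `copy_name()` are hypothetical names; `NAME_LEN` merely stands in for IFNAMSIZ and its value here is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16	/* stand-in for IFNAMSIZ; the value is an assumption */

/* Hypothetical miniature of the structure after the change. */
struct uld_info_sketch {
	char name[NAME_LEN];	/* was: const char *name */
	unsigned int nrxq;
};

/*
 * A designated initializer with a string literal works for either member
 * type; with the array, the remaining bytes are zero-filled.
 */
static const struct uld_info_sketch demo_uld = {
	.name = "foo",
	.nrxq = 4,
};

/* A fixed-size copy now stays inside the source object. */
static void copy_name(char dst[NAME_LEN], const struct uld_info_sketch *info)
{
	assert(info != NULL);
	memcpy(dst, info->name, NAME_LEN);
}
```

With the pointer member, the same fixed-size memcpy() would read past the end of a short string literal, which is the read overflow the compiler reported.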
Re: [PATCH net 0/2] r8152: fix rx issues
From: Hayes Wang
Date: Fri, 2 Feb 2018 16:43:34 +0800

> The two patches are used to fix rx issues.

Series applied.
Re: [PATCH v3 14/21] fpga: dfl: add fpga manager platform driver for FME
Hi Hao, Alan, On Fri, Feb 02, 2018 at 05:42:13PM +0800, Wu Hao wrote: > On Thu, Feb 01, 2018 at 04:00:36PM -0600, Alan Tull wrote: > > On Mon, Nov 27, 2017 at 12:42 AM, Wu Hao wrote: > > > > Hi Hao, > > > > A few comments below. Besides that, looks good. > > > > > This patch adds fpga manager driver for FPGA Management Engine (FME). It > > > implements fpga_manager_ops for FPGA Partial Reconfiguration function. > > > > > > Signed-off-by: Tim Whisonant > > > Signed-off-by: Enno Luebbers > > > Signed-off-by: Shiva Rao > > > Signed-off-by: Christopher Rauer > > > Signed-off-by: Kang Luwei > > > Signed-off-by: Xiao Guangrong > > > Signed-off-by: Wu Hao > > > > > > v3: rename driver to dfl-fpga-fme-mgr > > > implemented status callback for fpga manager > > > rebased due to fpga api changes > > > --- > > > .../ABI/testing/sysfs-platform-fpga-dfl-fme-mgr| 8 + > > > drivers/fpga/Kconfig | 6 + > > > drivers/fpga/Makefile | 1 + > > > drivers/fpga/fpga-dfl-fme-mgr.c| 318 > > > + > > > drivers/fpga/fpga-dfl.h| 39 ++- > > > 5 files changed, 371 insertions(+), 1 deletion(-) > > > create mode 100644 > > > Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr > > > create mode 100644 drivers/fpga/fpga-dfl-fme-mgr.c > > > > > > diff --git a/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr > > > b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr > > > new file mode 100644 > > > index 000..2d4f917 > > > --- /dev/null > > > +++ b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr > > > @@ -0,0 +1,8 @@ > > > +What: /sys/bus/platform/devices/fpga-dfl-fme-mgr.0/interface_id > > > +Date: November 2017 > > > +KernelVersion: 4.15 > > > +Contact: Wu Hao > > > +Description: Read-only. It returns interface id of partial > > > reconfiguration > > > + hardware. Userspace could use this information to check if > > > + current hardware is compatible with given image before > > > FPGA > > > + programming. > > > > I'm a little confused by this. 
I can understand that the PR bitstream > > has a dependency on the FPGA's static image, but I don't understand > > the dependency of the bistream on the hardware that is used to program > > the bitstream to the FPGA. > > Sorry for the confusion, the interface_id is used to indicate the version of > the hardware for partial reconfiguration (it's part of the static image of > the FPGA device). Will improve the description on this. > The interface_id expresses the compatibility of the static region with PR bitstreams generated for it. It changes every time a new static region is generated. Would it make more sense to have the interface_id exposed as part of the FME device (which represents the static region)? I'm not sure - it kind of also makes sense here, where you would have all the information in one place (if the interface_id matches, I can use this component to program a bitstream). Sorry for my limited understanding of the infrastructure - would this same "fpga-dfl-fme-mgr.0" be used for PR if we had multiple PR regions? In that case it would need to expose multiple interface_ids (or we'd have to track both interface IDs and an identifier for the target PR region). > > > > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig > > > index 57da904..0171ecb 100644 > > > --- a/drivers/fpga/Kconfig > > > +++ b/drivers/fpga/Kconfig > > > @@ -150,6 +150,12 @@ config FPGA_DFL_FME > > > FPGA platform level management features. There shall be 1 FME > > > per DFL based FPGA device. > > > > > > +config FPGA_DFL_FME_MGR > > > + tristate "FPGA DFL FME Manager Driver" > > > + depends on FPGA_DFL_FME > > > + help > > > + Say Y to enable FPGA Manager driver for FPGA Management Engine. 
> > > + > > > config INTEL_FPGA_DFL_PCI > > > tristate "Intel FPGA DFL PCIe Device Driver" > > > depends on PCI && FPGA_DFL > > > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile > > > index cc75bb3..6378580 100644 > > > --- a/drivers/fpga/Makefile > > > +++ b/drivers/fpga/Makefile > > > @@ -31,6 +31,7 @@ obj-$(CONFIG_OF_FPGA_REGION) += > > > of-fpga-region.o > > > # FPGA Device Feature List Support > > > obj-$(CONFIG_FPGA_DFL) += fpga-dfl.o > > > obj-$(CONFIG_FPGA_DFL_FME) += fpga-dfl-fme.o > > > +obj-$(CONFIG_FPGA_DFL_FME_MGR) += fpga-dfl-fme-mgr.o > > > > > > fpga-dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o > > > > > > diff --git a/drivers/fpga/fpga-dfl-fme-mgr.c > > > b/drivers/fpga/fpga-dfl-fme-mgr.c > > > new file mode 100644 > > > index 000..70356ce > > > --- /dev/null > > > +++ b/drivers/fpga/fpga-dfl-fme-mgr.c > > > @@ -0,0 +1,318 @@ > > > +/* > > > + * FPGA Manager Driver for FPGA Management Engine (FME) > >
Re: [PATCH v3 3/3] ARM: dts: imx: Add memory node unit name
On Wed, Jan 24, 2018 at 11:22 AM, Marco Franchi wrote:
> Fix the following warnings from dtc by adding the unit name to memory
> nodes:
>
> Warning (unit_address_vs_reg): Node /memory has a reg or ranges property, but
> no unit name
>
> Converted using the following command:
>
> perl -p0777i -e 's/memory \{\n\t\treg = \<0x+([0-9a-f])/memory\@$1$\000
> \{\n\t\treg = <0x$1/m' `find ./arch/arm/boot/dts -name "imx*"`
>
> The files below were manually fixed:
> -imx1-ads.dts
> -imx1-apf9328.dts
> -imx6q-pistachio.dts
>
> Signed-off-by: Marco Franchi

Reviewed-by: Fabio Estevam
Re: [PATCH v3 2/3] ARM: dts: imx: Remove empty memory size nodes
On Wed, Jan 24, 2018 at 11:22 AM, Marco Franchi wrote:
> Remove the empty reg property from the SoC dtsi files in order to avoid
> duplicate memory nodes when the correct size is passed in board dts files.
>
> Signed-off-by: Marco Franchi

Reviewed-by: Fabio Estevam
Re: [PATCH v3 1/3] ARM: dts: imx: Pass empty memory size on board dts
On Wed, Jan 24, 2018 at 11:22 AM, Marco Franchi wrote:
> In preparation for removing 'reg = <0 0>;' from the dtsi SoC files, pass
> 'reg = <0 0 >;' to the dts/dtsi board files that do not pass the memory
> size.
>
> Signed-off-by: Marco Franchi

Reviewed-by: Fabio Estevam
Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type
Hi Suravee, I love your patch! Perhaps something to improve: [auto build test WARNING on iommu/next] [also build test WARNING on v4.15 next-20180202] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Suravee-Suthikulpanit/iommu-Fix-iommu_unmap-and-iommu_unmap_fast-return-type/20180203-015316 base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> drivers/iommu/qcom_iommu.c:592:27: sparse: incorrect type in initializer >> (different signedness) @@ expected long ( )( ... ) @@ got unsigned long ( )( >> ... ) @@ drivers/iommu/qcom_iommu.c:592:27: expected long ( )( ... ) drivers/iommu/qcom_iommu.c:592:27: got unsigned long ( )( ... ) drivers/iommu/qcom_iommu.c:592:12: error: initialization from incompatible pointer type .unmap = qcom_iommu_unmap, ^~~~ drivers/iommu/qcom_iommu.c:592:12: note: (near initialization for 'qcom_iommu_ops.unmap') cc1: some warnings being treated as errors vim +592 drivers/iommu/qcom_iommu.c 0ae349a0f3 Rob Clark2017-08-09 584 0ae349a0f3 Rob Clark2017-08-09 585 static const struct iommu_ops qcom_iommu_ops = { 0ae349a0f3 Rob Clark2017-08-09 586 .capable= qcom_iommu_capable, 0ae349a0f3 Rob Clark2017-08-09 587 .domain_alloc = qcom_iommu_domain_alloc, 0ae349a0f3 Rob Clark2017-08-09 588 .domain_free= qcom_iommu_domain_free, 0ae349a0f3 Rob Clark2017-08-09 589 .attach_dev = qcom_iommu_attach_dev, 0ae349a0f3 Rob Clark2017-08-09 590 .detach_dev = qcom_iommu_detach_dev, 0ae349a0f3 Rob Clark2017-08-09 591 .map= qcom_iommu_map, 0ae349a0f3 Rob Clark2017-08-09 @592 .unmap = qcom_iommu_unmap, 0ae349a0f3 Rob Clark2017-08-09 593 .map_sg = default_iommu_map_sg, 4d689b6194 Robin Murphy 2017-09-28 594 .flush_iotlb_all = qcom_iommu_iotlb_sync, 4d689b6194 Robin Murphy 2017-09-28 595 .iotlb_sync = 
qcom_iommu_iotlb_sync, 0ae349a0f3 Rob Clark2017-08-09 596 .iova_to_phys = qcom_iommu_iova_to_phys, 0ae349a0f3 Rob Clark2017-08-09 597 .add_device = qcom_iommu_add_device, 0ae349a0f3 Rob Clark2017-08-09 598 .remove_device = qcom_iommu_remove_device, 0ae349a0f3 Rob Clark2017-08-09 599 .device_group = generic_device_group, 0ae349a0f3 Rob Clark2017-08-09 600 .of_xlate = qcom_iommu_of_xlate, 0ae349a0f3 Rob Clark2017-08-09 601 .pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M, 0ae349a0f3 Rob Clark2017-08-09 602 }; 0ae349a0f3 Rob Clark2017-08-09 603 :: The code at line 592 was first introduced by commit :: 0ae349a0f33fb040a2bc228fdc6d60111455feab iommu/qcom: Add qcom_iommu :: TO: Rob Clark :: CC: Joerg Roedel --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
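The warning above boils down to a callback whose return type no longer matches the ops table. A minimal sketch, assuming a simplified, hypothetical `demo_iommu_ops` with a signed `unmap` return (the report only shows "expected long, got unsigned long"; the zero-size error path and all names here are illustrative assumptions):

```c
#include <stddef.h>

/*
 * Hypothetical, simplified ops table: the callback returns a signed
 * value so that a negative error code can be reported.
 */
struct demo_iommu_ops {
	long (*unmap)(unsigned long iova, size_t size);
};

/*
 * Callback updated to the signed return type. A version still declared
 * as returning size_t would be an incompatible pointer type when
 * assigned to .unmap, which is what the build robot flagged.
 */
static long demo_unmap(unsigned long iova, size_t size)
{
	(void)iova;
	if (size == 0)
		return -22;	/* -EINVAL, written out to stay standalone */
	return (long)size;
}

static const struct demo_iommu_ops demo_ops = {
	.unmap = demo_unmap,
};
```

Every driver that fills in the ops table has to be converted in the same patch, otherwise builds such as the one reported here break.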
Re: KASAN: use-after-free Read in __list_add_valid (3)
On Wed, Jan 24, 2018 at 11:57:01PM -0800, syzbot wrote: > Hello, > > syzbot hit the following crash on upstream commit > 1f07476ec143bbed7bf0b641749783b1094b4c4f (Tue Jan 23 20:45:40 2018 +) > Merge tag 'pci-v4.15-fixes-3' of > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci > > So far this crash happened 6 times on upstream. > Unfortunately, I don't have any reproducer for this crash yet. > Raw console output is attached. > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+fbbedb95ed1d1e957...@syzkaller.appspotmail.com > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. > > == > BUG: KASAN: use-after-free in __list_add_valid+0xb1/0xd0 lib/list_debug.c:23 > Read of size 8 at addr 8801bb5893d0 by task syz-executor1/28911 > > CPU: 0 PID: 28911 Comm: syz-executor1 Not tainted 4.15.0-rc9+ #277 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > print_address_description+0x73/0x250 mm/kasan/report.c:252 > kasan_report_error mm/kasan/report.c:351 [inline] > kasan_report+0x25b/0x340 mm/kasan/report.c:409 > __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 > __list_add_valid+0xb1/0xd0 lib/list_debug.c:23 > __list_add include/linux/list.h:60 [inline] > list_add include/linux/list.h:79 [inline] > __add_wait_queue include/linux/wait.h:156 [inline] > add_wait_queue+0xcf/0x290 kernel/sched/wait.c:30 > vhost_poll_func+0x3d/0x50 drivers/vhost/vhost.c:165 > poll_wait include/linux/poll.h:46 [inline] > eventfd_poll+0xe8/0x1f0 fs/eventfd.c:123 > vhost_poll_start+0x97/0x1c0 drivers/vhost/vhost.c:212 > vhost_vring_ioctl+0xe28/0x19b0 drivers/vhost/vhost.c:1556 > vhost_net_ioctl+0x9df/0x1910 drivers/vhost/net.c:1320 > 
vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 > SYSC_ioctl fs/ioctl.c:701 [inline] > SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 > entry_SYSCALL_64_fastpath+0x29/0xa0 > RIP: 0033:0x452f19 > RSP: 002b:7f33da20dc58 EFLAGS: 0212 ORIG_RAX: 0010 > RAX: ffda RBX: 7f33da20e700 RCX: 00452f19 > RDX: 2088 RSI: 4008af20 RDI: 0017 > RBP: 00a2f850 R08: R09: > R10: R11: 0212 R12: > R13: 00a2f7cf R14: 7f33da20e9c0 R15: 0006 > > Allocated by task 28878: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 > __do_kmalloc_node mm/slab.c:3672 [inline] > __kmalloc_node+0x47/0x70 mm/slab.c:3679 > kmalloc_node include/linux/slab.h:541 [inline] > kvmalloc_node+0x64/0xd0 mm/util.c:397 > kvmalloc include/linux/mm.h:541 [inline] > vhost_net_open+0x27/0x670 drivers/vhost/net.c:902 > misc_open+0x382/0x500 drivers/char/misc.c:154 > chrdev_open+0x257/0x730 fs/char_dev.c:417 > do_dentry_open+0x667/0xd40 fs/open.c:752 > vfs_open+0x107/0x220 fs/open.c:866 > do_last fs/namei.c:3379 [inline] > path_openat+0x1151/0x3530 fs/namei.c:3519 > do_filp_open+0x25b/0x3b0 fs/namei.c:3554 > do_sys_open+0x502/0x6d0 fs/open.c:1059 > SYSC_openat fs/open.c:1086 [inline] > SyS_openat+0x30/0x40 fs/open.c:1080 > entry_SYSCALL_64_fastpath+0x29/0xa0 > > Freed by task 28878: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 > __cache_free mm/slab.c:3488 [inline] > kfree+0xd6/0x260 mm/slab.c:3803 > kvfree+0x36/0x60 mm/util.c:416 > vhost_net_release+0x159/0x190 drivers/vhost/net.c:1012 > __fput+0x327/0x7e0 fs/file_table.c:210 > fput+0x15/0x20 fs/file_table.c:244 > task_work_run+0x199/0x270 kernel/task_work.c:113 > tracehook_notify_resume include/linux/tracehook.h:191 [inline] > exit_to_usermode_loop+0x296/0x310 arch/x86/entry/common.c:162 > prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline] > syscall_return_slowpath+0x490/0x550 
arch/x86/entry/common.c:264 > entry_SYSCALL_64_fastpath+0x9e/0xa0 > > The buggy address belongs to the object at 8801bb589140 > which belongs to the cache kmalloc-65536 of size 65536 > The buggy address is located 656 bytes inside of > 65536-byte region [8801bb589140, 8801bb599140) > The buggy address belongs to the page: > page:ea0006ed6000 count:1 mapcount:0 mapping:8801bb589140 index:0x0 > compound_mapcount: 0 > flags: 0x2fffc008100(slab|head) > raw: 02fffc008100 8801bb589140 00010001 > raw: ea00068b0820 ea000693d020 8801dac02500 > page dumped because: kasan: bad access detecte
Re: [PATCH 2/2] MAINTAINERS: list file memory-barriers.txt within the LKMM entry
On Fri, Feb 02, 2018 at 10:13:42AM +0100, Andrea Parri wrote:
> Now that a formal specification of the LKMM has become available to
> the developer, some concern about how to track changes to the model
> on the level of the "high-level documentation" was raised.
>
> A first "mitigation" to this issue, suggested by Will, is to assign
> maintainership (and responsibility!!) of such documentation (here,
> memory-barriers.txt) to the maintainers of the LKMM themselves.
>
> Suggested-by: Will Deacon
> Signed-off-by: Andrea Parri

Very good, thank you, queued! Please see below for the usual commit-log rework.

BTW, in future submissions, could you please capitalize the first word after the colon (":") in the subject line? It is all too easy for me to forget to change this, as Ingo can attest. ;-)

If we are going to continue to use the LKMM acronym, should we make the first line of the MAINTAINERS block look something like this?

LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)

One alternative would be to start calling it LKMCM, though that does look a bit like a Roman numeral. ;-)

Thanx, Paul

commit 2f80571625dc2d1977acdef79267ba1645b07c53
Author: Andrea Parri
Date: Fri Feb 2 10:13:42 2018 +0100

    MAINTAINERS: List file memory-barriers.txt within the LKMM entry

    We now have a shiny new Linux-kernel memory model (LKMM) and the old
    tried-and-true Documentation/memory-barriers.txt. It would be good to
    keep these automatically synchronized, but in the meantime we need to
    at least let people know that they are related. Will suggested adding
    the Documentation/memory-barriers.txt file to the LKMM maintainership
    list, thus making the LKMM maintainers responsible for both the old
    and the new. This commit follows Will's excellent suggestion.

    Suggested-by: Will Deacon
    Signed-off-by: Andrea Parri
    Signed-off-by: Paul E. McKenney

diff --git a/MAINTAINERS b/MAINTAINERS
index ba4dc08fbe95..e6ad9b44e8fb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8101,6 +8101,7 @@ L:linux-kernel@vger.kernel.org
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
 F: tools/memory-model/
+F: Documentation/memory-barriers.txt

 LINUX SECURITY MODULE (LSM) FRAMEWORK
 M: Chris Wright
Re: [PATCH 1/2] tools/memory-model: clarify the origin/scope of the tool name
On Fri, Feb 02, 2018 at 03:19:22PM -0800, Paul E. McKenney wrote: > On Fri, Feb 02, 2018 at 11:44:21AM +0100, Andrea Parri wrote: > > On Thu, Feb 01, 2018 at 03:09:41PM -0800, Paul E. McKenney wrote: > > > On Thu, Feb 01, 2018 at 10:26:50AM -0500, Alan Stern wrote: > > > > On Thu, 1 Feb 2018, Andrea Parri wrote: > > > > > > > > > Ingo pointed out that: > > > > > > > > > > "The "memory model" name is overly generic, ambiguous and somewhat > > > > >misleading, as we usually mean the virtual memory layout/model > > > > >when we say "memory model". GCC too uses it in that sense [...]" > > > > > > > > > > Make it clearer that, in the context of tools/memory-model/, the term > > > > > "memory-model" is used as shorthand for "memory consistency model" by > > > > > calling out this convention in tools/memory-model/README. > > > > > > > > > > Stick to the full name in sources' headers and for the subsystem name. > > > > > > > > > > Suggested-by: Ingo Molnar > > > > > Signed-off-by: Andrea Parri > > > > > > > > For both patches: > > > > > > > > Acked-by: Alan Stern > > > > > > Thank you all -- I have queued this and pushed it to my RCU tree on > > > branch lkmm. I did reword the commit log a bit, please see below and > > > please let me know if any of my rewordings need halp. > > > > Seems to me that your message has a leftover "is used". > > Good catch, how about this instead? Looks good to me. The same for 12a62a1d07031. Thanks, Andrea > > Thanx, Paul > > --- > > commit 2b1b4ab5166209da849f306fbdc84114d9e611fd > Author: Andrea Parri > Date: Thu Feb 1 13:03:29 2018 +0100 > > tools/memory-model: Clarify the origin/scope of the tool name > > Ingo pointed out that: > > "The "memory model" name is overly generic, ambiguous and somewhat >misleading, as we usually mean the virtual memory layout/model >when we say "memory model". 
GCC too uses it in that sense [...]" > > Make it clear that tools/memory-model/ uses the term "memory model" as > shorthand for "memory consistency model" by calling out this convention > in tools/memory-model/README. > > Stick to the original "memory model" term in sources' headers and for > the subsystem name. > > Suggested-by: Ingo Molnar > Signed-off-by: Andrea Parri > Acked-by: Will Deacon > Acked-by: Alan Stern > Signed-off-by: Paul E. McKenney > > diff --git a/tools/memory-model/MAINTAINERS b/tools/memory-model/MAINTAINERS > index 711cbe72d606..db3bd3fc0435 100644 > --- a/tools/memory-model/MAINTAINERS > +++ b/tools/memory-model/MAINTAINERS > @@ -1,4 +1,4 @@ > -LINUX KERNEL MEMORY MODEL > +LINUX KERNEL MEMORY CONSISTENCY MODEL > M: Alan Stern > M: Andrea Parri > M: Will Deacon > diff --git a/tools/memory-model/README b/tools/memory-model/README > index 43ba49492111..91414a49fac5 100644 > --- a/tools/memory-model/README > +++ b/tools/memory-model/README > @@ -1,15 +1,15 @@ > - = > - LINUX KERNEL MEMORY MODEL > - = > + = > + LINUX KERNEL MEMORY CONSISTENCY MODEL > + = > > > INTRODUCTION > > > -This directory contains the memory model of the Linux kernel, written > -in the "cat" language and executable by the (externally provided) > -"herd7" simulator, which exhaustively explores the state space of > -small litmus tests. > +This directory contains the memory consistency model (memory model, for > +short) of the Linux kernel, written in the "cat" language and executable > +by the externally provided "herd7" simulator, which exhaustively explores > +the state space of small litmus tests. 
> > In addition, the "klitmus7" tool (also externally provided) may be used > to convert a litmus test to a Linux kernel module, which in turn allows > diff --git a/tools/memory-model/linux-kernel.bell > b/tools/memory-model/linux-kernel.bell > index 57112505f5e0..b984bbda01a5 100644 > --- a/tools/memory-model/linux-kernel.bell > +++ b/tools/memory-model/linux-kernel.bell > @@ -11,7 +11,7 @@ > * which is to appear in ASPLOS 2018. > *) > > -"Linux kernel memory model" > +"Linux-kernel memory consistency model" > > enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) || > 'release (*smp_store_release*) || > diff --git a/tools/memory-model/linux-kernel.cat > b/tools/memory-model/linux-kernel.cat > index 15b7a5dd8a9a..babe2b3b0bb3 100644 > --- a/tools/memory-model/linux-kernel.cat > +++ b/tools/memory-model/linux-kernel.cat > @@ -11,7 +11,7 @@ > * which is to appear in ASPLOS 2018. > *) > > -"Linux kernel memory model" > +"Linux-kernel memory consistency model" >
Re: Can RCU stall lead to hard lockups?
Quoting Paul E. McKenney (paul...@linux.vnet.ibm.com): > On Tue, Jan 09, 2018 at 06:11:14AM -0800, Tejun Heo wrote: > > Hello, Paul. > > > > On Mon, Jan 08, 2018 at 08:24:25PM -0800, Paul E. McKenney wrote: > > > > I don't know the RCU code at all but it *looks* like the first CPU is > > > > taking a sweet while flushing printk buffer while holding a lock (the > > > > console is IPMI serial console, which faithfully emulates 115200 baud > > > > rate), and everyone else seems stuck waiting for that spinlock in > > > > rcu_check_callbacks(). > > > > > > > > Does this sound possible? > > > > > > 115200 baud? Ouch!!! That -will- result in trouble from console > > > printing, and often also in RCU CPU stall warnings. > > > > It could even be slower than 115200, and we occasionally see RCU > > stall warnings caused by printk storms, for example, while the kernel > > is trying to dump a lot of info after an OOM. That's an issue we > > probably want to improve from printk side; however, they don't usually > > lead to NMI hard lockup detector kicking in and crashing the machine, > > which is the peculiarity here. > > > > Hmmm... show_state_filter(), the function which dumps all task > > backtraces, shares a similar problem and it avoids it by explicitly > > calling touch_nmi_watchdog(). Maybe we can do something like the > > following from RCU too? > > If this fixes things for you, I would welcome such a patch. Hi - would this also be relevant to 4.9-stable and 4.4-stable, or has something elsewhere changed after 4.9 that actually triggers this?
thanks, -serge > Thanx, Paul > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > > index db85ca3..3c4c4d3 100644 > > --- a/kernel/rcu/tree_plugin.h > > +++ b/kernel/rcu/tree_plugin.h > > @@ -561,8 +561,14 @@ static void rcu_print_detail_task_stall_rnp(struct > > rcu_node *rnp) > > } > > t = list_entry(rnp->gp_tasks->prev, > >struct task_struct, rcu_node_entry); > > - list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) > > + list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) { > > + touch_nmi_watchdog(); > > + /* > > +* We could be printing a lot of these messages while > > +* holding a spinlock. Avoid triggering hard lockup. > > +*/ > > sched_show_task(t); > > + } > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags); > > } > > > > @@ -1678,6 +1684,12 @@ static void print_cpu_stall_info(struct rcu_state > > *rsp, int cpu) > > char *ticks_title; > > unsigned long ticks_value; > > > > + /* > > +* We could be printing a lot of these messages while holding a > > +* spinlock. Avoid triggering hard lockup. > > +*/ > > + touch_nmi_watchdog(); > > + > > if (rsp->gpnum == rdp->gpnum) { > > ticks_title = "ticks this GP"; > > ticks_value = rdp->ticks_this_gp; > >
Re: [PATCH] Fix typo IBRS_ATT, which should be IBRS_ALL
On Fri, 2018-02-02 at 19:12 +, Darren Kenny wrote:
> Fixes a typo in commit 117cc7a908c83697b0b737d15ae1eb5943afe35b
> ("x86/retpoline: Fill return stack buffer on vmexit")
>
> Signed-off-by: Darren Kenny
> Reviewed-by: Konrad Rzeszutek Wilk

Not strictly a typo; that was the original name for it. "IBRS all the time". But yes, it should be IBRS_ALL now.

Acked-by: David Woodhouse
Re: INFO: task hung in bpf_exit_net
On Fri, Dec 22, 2017 at 05:04:37PM -0200, Marcelo Ricardo Leitner wrote: > On Fri, Dec 22, 2017 at 04:28:07PM -0200, Marcelo Ricardo Leitner wrote: > > On Fri, Dec 22, 2017 at 11:58:08AM +0100, Dmitry Vyukov wrote: > > ... > > > > Same with this one, perhaps related to / fixed by: > > > > http://patchwork.ozlabs.org/patch/850957/ > > > > > > > > > > > > > > > > Looking at the log, this one seems to be an infinite loop in SCTP code > > > with console output in it. Kernel is busy printing gazilion of: > > > > > > [ 176.491099] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too > > > low, using default minimum of 512 > > > ** 110 printk messages dropped ** > > > [ 176.503409] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too > > > low, using default minimum of 512 > > > ** 103 printk messages dropped ** > > > ... > > > [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too > > > low, using default minimum of 512 > > > [ 246.742484] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too > > > low, using default minimum of 512 > > > [ 246.742590] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too > > > low, using default minimum of 512 > > > > > > Looks like a different issue. > > > > > > > Oh. I guess this is caused by the interface having a MTU smaller than > > SCTP_DEFAULT_MINSEGMENT (512), as the icmp frag needed handler > > (sctp_icmp_frag_needed) will trigger an instant retransmission. > > But as the MTU is smaller, SCTP won't update it, but will issue the > > retransmission anyway. > > > > I will test this soon. Should be fairly easy to trigger it. > > Reproduced it. > > netns A veth0(1500) - veth1(1500) B veth2(508) - veth3(508) C > > When A sends a sctp packet bigger than 508, it triggers the issue as B > will reply a icmp frag needed with a size that sctp won't accept but > will retransmit anyway. > syzbot hasn't encountered this hang again (although, it just happened once in the first place). 
I assume it was fixed by commit b6c5734db070, so telling syzbot this:

#syz fix: sctp: fix the handling of ICMP Frag Needed for too small MTUs

- Eric
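The retransmit loop described in this thread (a reported PMTU of 508 that is rejected as too low, followed by a retransmission anyway) can be illustrated with a small userspace sketch. This is one possible way to break such a loop, not a claim about what commit b6c5734db070 actually contains; `demo_transport`, `demo_update_pmtu()` and `demo_frag_needed()` are hypothetical names, and 512 mirrors the SCTP_DEFAULT_MINSEGMENT value quoted above.

```c
#include <stdbool.h>

#define DEMO_MIN_SEGMENT 512u	/* mirrors SCTP_DEFAULT_MINSEGMENT from the thread */

/* Hypothetical per-path state: current path MTU and a retransmit counter. */
struct demo_transport {
	unsigned int pathmtu;
	unsigned int retransmits;
};

/*
 * Clamp the reported PMTU to the minimum and tell the caller whether
 * the stored value actually changed.
 */
static bool demo_update_pmtu(struct demo_transport *t, unsigned int pmtu)
{
	if (pmtu < DEMO_MIN_SEGMENT)
		pmtu = DEMO_MIN_SEGMENT;
	if (pmtu == t->pathmtu)
		return false;
	t->pathmtu = pmtu;
	return true;
}

/*
 * ICMP "frag needed" handler sketch: retransmit only when the update
 * changed the path MTU, so a repeated too-small report cannot trigger
 * an endless stream of retransmissions.
 */
static void demo_frag_needed(struct demo_transport *t, unsigned int pmtu)
{
	if (demo_update_pmtu(t, pmtu))
		t->retransmits++;
}
```

In the veth topology from the reproduction, the first report of 508 would lower the path MTU to 512 and retransmit once; further identical reports would then be ignored.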
Re: [PATCH] xen: hypercall: fix out-of-bounds memcpy
On 02/02/2018 10:32 AM, Arnd Bergmann wrote: > The legacy hypercall handlers were originally added with > a comment explaining that "copying the argument structures in > HYPERVISOR_event_channel_op() and HYPERVISOR_physdev_op() into the local > variable is sufficiently safe" and only made sure to not write > past the end of the argument structure, the checks in linux/string.h > disagree with that, when link-time optimizations are used: > > In function 'memcpy', > inlined from 'pirq_query_unmask' at drivers/xen/fallback.c:53:2, > inlined from '__startup_pirq' at drivers/xen/events/events_base.c:529:2, > inlined from 'restore_pirqs' at drivers/xen/events/events_base.c:1439:3, > inlined from 'xen_irq_resume' at drivers/xen/events/events_base.c:1581:2: > include/linux/string.h:350:3: error: call to '__read_overflow2' declared with > attribute error: detected read beyond size of object passed as 2nd parameter >__read_overflow2(); >^ > make[3]: *** [ccLujFNx.ltrans15.ltrans.o] Error 1 > make[3]: Target 'all' not remade because of errors. > lto-wrapper: fatal error: make returned 2 exit status > compilation terminated. > ld: error: lto-wrapper failed > > This changes the functions so that each argument is accessed with > exactly the correct length based on the command code. 
> > Fixes: cf47a83fb06e ("xen/hypercall: fix hypercall fallback code for very old > hypervisors") > Signed-off-by: Arnd Bergmann > --- > drivers/xen/fallback.c | 94 > -- > 1 file changed, 53 insertions(+), 41 deletions(-) > > diff --git a/drivers/xen/fallback.c b/drivers/xen/fallback.c > index b04fb64c5a91..eded8dd821ad 100644 > --- a/drivers/xen/fallback.c > +++ b/drivers/xen/fallback.c > @@ -7,75 +7,87 @@ > > int xen_event_channel_op_compat(int cmd, void *arg) > { > - struct evtchn_op op; > + struct evtchn_op op = { .cmd = cmd, }; > + size_t len; > int rc; > > - op.cmd = cmd; > - memcpy(&op.u, arg, sizeof(op.u)); > - rc = _hypercall1(int, event_channel_op_compat, &op); > - > switch (cmd) { > + case EVTCHNOP_bind_interdomain: > + len = sizeof(struct evtchn_bind_interdomain); > + break; > + case EVTCHNOP_bind_virq: > + len = sizeof(struct evtchn_bind_virq); > + break; > + case EVTCHNOP_bind_pirq: > + len = sizeof(struct evtchn_bind_pirq); > + break; > case EVTCHNOP_close: > + len = sizeof(struct evtchn_close); > + break; > case EVTCHNOP_send: > + len = sizeof(struct evtchn_send); > + break; > + case EVTCHNOP_alloc_unbound: > + len = sizeof(struct evtchn_alloc_unbound); > + break; > + case EVTCHNOP_bind_ipi: > + len = sizeof(struct evtchn_bind_ipi); > + break; > + case EVTCHNOP_status: > + len = sizeof(struct evtchn_status); > + break; > case EVTCHNOP_bind_vcpu: > + len = sizeof(struct evtchn_bind_vcpu); > + break; > case EVTCHNOP_unmask: > - /* no output */ > + len = sizeof(struct evtchn_unmask); > break; > - > -#define COPY_BACK(eop) \ > - case EVTCHNOP_##eop: \ > - memcpy(arg, &op.u.eop, sizeof(op.u.eop)); \ > - break > - > - COPY_BACK(bind_interdomain); > - COPY_BACK(bind_virq); > - COPY_BACK(bind_pirq); > - COPY_BACK(status); > - COPY_BACK(alloc_unbound); > - COPY_BACK(bind_ipi); > -#undef COPY_BACK > - > default: > - WARN_ON(rc != -ENOSYS); > - break; > + return -ENOSYS; > } > > + memcpy(&op.u, arg, len); > + rc = _hypercall1(int, event_channel_op_compat, 
&op); > + memcpy(arg, &op.u, len); We don't copy back for all commands, only those that are COPY_BACK. > + > return rc; > } > EXPORT_SYMBOL_GPL(xen_event_channel_op_compat); > > int xen_physdev_op_compat(int cmd, void *arg) > { > - struct physdev_op op; > + struct physdev_op op = { .cmd = cmd, }; > + size_t len; > int rc; > > - op.cmd = cmd; > - memcpy(&op.u, arg, sizeof(op.u)); > - rc = _hypercall1(int, physdev_op_compat, &op); > - > switch (cmd) { > case PHYSDEVOP_IRQ_UNMASK_NOTIFY: > + len = 0; > + break; > + case PHYSDEVOP_irq_status_query: > + len = sizeof(struct physdev_irq_status_query); > + break; > case PHYSDEVOP_set_iopl: > + len = sizeof(struct physdev_set_iopl); > + break; > case PHYSDEVOP_set_iobitmap: > + len = sizeof(struct physdev_set_iobitmap); > + break; > + case PHYSDEVOP_apic_read: > case PHYSDEVOP_apic_write: > - /* no output */ > + len = sizeof(struct physdev_apic); > break; > - > -#define COPY_BACK(pop, fld) \ > - case
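The review comment (copy back only for commands that produce output) combined with the patch's per-command length could look roughly like the userspace sketch below. All `demo_*` names, the two commands and the fake backend are assumptions made for illustration; this is not the Xen fallback code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_cmd { DEMO_CMD_SEND, DEMO_CMD_STATUS };

struct demo_send   { unsigned int port; };
struct demo_status { unsigned int port; unsigned int state; };

struct demo_op {
	int cmd;
	union {
		struct demo_send send;
		struct demo_status status;
	} u;
};

/* Per-command argument size, plus whether the command has output. */
static int demo_cmd_info(int cmd, size_t *len, bool *copy_back)
{
	switch (cmd) {
	case DEMO_CMD_SEND:
		*len = sizeof(struct demo_send);
		*copy_back = false;	/* no output */
		return 0;
	case DEMO_CMD_STATUS:
		*len = sizeof(struct demo_status);
		*copy_back = true;
		return 0;
	default:
		return -1;
	}
}

/* Stand-in for the hypercall: fills in output or scribbles on input. */
static int demo_backend(struct demo_op *op)
{
	if (op->cmd == DEMO_CMD_STATUS)
		op->u.status.state = 1;
	else
		op->u.send.port = 0;
	return 0;
}

static int demo_op_compat(int cmd, void *arg)
{
	struct demo_op op = { .cmd = cmd };
	size_t len;
	bool copy_back;
	int rc;

	if (demo_cmd_info(cmd, &len, &copy_back))
		return -1;

	/* Read exactly as many bytes as this command's argument holds. */
	memcpy(&op.u, arg, len);
	rc = demo_backend(&op);

	/* Only commands with output write back into the caller's buffer. */
	if (copy_back)
		memcpy(arg, &op.u, len);
	return rc;
}
```

Keeping the size and the copy-back flag in one lookup avoids the two switch statements drifting apart when a command is added.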
[RFC] x86/retpoline: Add clang support for 64-bit builds
clang has its own set of compiler options for retpoline support. Also, the thunks required by C code have their own function names. For 64-bit builds, there is only a single thunk, which is easy to support. Support for 32-bit builds is more complicated - in addition to various register thunks, there is also a thunk named __llvm_external_retpoline_push which is more challenging. Play it safe and only support 64-bit clang builds for now. Link: https://github.com/llvm-mirror/clang/commit/0d816739a82da29748caf88570affb9715e18b69 Cc: David Woodhouse Cc: Thomas Gleixner Cc: Ingo Molnar Cc: gno...@lxorguk.ukuu.org.uk Cc: Rik van Riel Cc: Andi Kleen Cc: Josh Poimboeuf Cc: thomas.lenda...@amd.com Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jiri Kosina Cc: Andy Lutomirski Cc: Dave Hansen Cc: Kees Cook Cc: Tim Chen Cc: Greg Kroah-Hartman Cc: Paul Turner Signed-off-by: Guenter Roeck --- Sent as RFC because I am not sure if the 64-bit only solution is acceptable. arch/x86/Makefile| 5 - arch/x86/lib/retpoline.S | 24 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/arch/x86/Makefile b/arch/x86/Makefile index fad55160dcb9..536dd6775988 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -232,7 +232,10 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables # Avoid indirect branches in kernel to deal with Spectre ifdef CONFIG_RETPOLINE -RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register) +RETPOLINE_CFLAGS = $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register) +ifeq ($(RETPOLINE_CFLAGS)$(CONFIG_X86_32),) + RETPOLINE_CFLAGS = $(call cc-option,-mretpoline -mretpoline-external-thunk) +endif ifneq ($(RETPOLINE_CFLAGS),) KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE endif diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S index 480edc3a5e03..f77738b13481 100644 --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -9,14 +9,22 @@ #include #include -.macro THUNK reg +.macro _THUNK 
prefix, reg .section .text.__x86.indirect_thunk -ENTRY(__x86_indirect_thunk_\reg) +ENTRY(\prefix\reg) CFI_STARTPROC JMP_NOSPEC %\reg CFI_ENDPROC -ENDPROC(__x86_indirect_thunk_\reg) +ENDPROC(\prefix\reg) +.endm + +.macro THUNK reg +_THUNK __x86_indirect_thunk_ \reg +.endm + +.macro CLANG_THUNK reg +_THUNK __llvm_external_retpoline_ \reg .endm /* @@ -27,8 +35,11 @@ ENDPROC(__x86_indirect_thunk_\reg) * the simple and nasty way... */ #define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym) -#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg) +#define _EXPORT_THUNK(thunk, reg) __EXPORT_THUNK(thunk ## reg) +#define EXPORT_THUNK(reg) _EXPORT_THUNK(__x86_indirect_thunk_, reg) #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) +#define EXPORT_CLANG_THUNK(reg) _EXPORT_THUNK(__llvm_external_retpoline_, reg) +#define GENERATE_CLANG_THUNK(reg) CLANG_THUNK reg ; EXPORT_CLANG_THUNK(reg) GENERATE_THUNK(_ASM_AX) GENERATE_THUNK(_ASM_BX) @@ -46,6 +57,11 @@ GENERATE_THUNK(r12) GENERATE_THUNK(r13) GENERATE_THUNK(r14) GENERATE_THUNK(r15) + +#ifdef __clang__ +GENERATE_CLANG_THUNK(r11) +#endif + #endif /* -- 2.7.4
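For readers less familiar with the two-level macro pattern the patch introduces in retpoline.S, here is a minimal userspace C sketch of the same idea: one generic macro pastes a prefix and a register name into a single symbol, and two thin wrappers fix the prefix. The names demo_x86_thunk_ and demo_llvm_thunk_, and the function bodies, are hypothetical and only illustrate the preprocessor mechanics; they are not the kernel's thunks.

```c
#include <string.h>

/*
 * Generic generator: pastes prefix and reg into one function name.
 * The generated function simply returns its own name as a string,
 * so the result of the pasting can be inspected at run time.
 */
#define DEMO_THUNK(prefix, reg) \
	static const char *prefix##reg(void) { return #prefix #reg; }

/* Thin wrappers that fix the prefix, mirroring THUNK and CLANG_THUNK. */
#define DEMO_GCC_THUNK(reg)   DEMO_THUNK(demo_x86_thunk_, reg)
#define DEMO_CLANG_THUNK(reg) DEMO_THUNK(demo_llvm_thunk_, reg)

/* One register, two differently named entry points. */
DEMO_GCC_THUNK(r11)
DEMO_CLANG_THUNK(r11)
```

Keeping the body in a single generator means a second compiler's naming scheme costs one extra wrapper macro rather than a second copy of the thunk body.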
Re: suspicious RCU usage at ./include/linux/rcupdate.h:LINE (4)
On Fri, Feb 02, 2018 at 06:58:01AM -0800, syzbot wrote: > Hello, > > syzbot hit the following crash on bpf-next commit > b2fe5fa68642860e7de76167c3111623aa0d5de1 (Wed Jan 31 22:31:10 2018 +) > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next > > So far this crash happened 1575 times on bpf-next. > C reproducer is attached. > syzkaller reproducer is attached. > Raw console output is attached. > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+7dbcd2d3b85f9b608...@syzkaller.appspotmail.com > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. > > audit: type=1400 audit(1517546098.866:9): avc: denied { prog_run } for > pid=4159 comm="syzkaller076311" > scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 > tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=bpf > permissive=1 > > = > WARNING: suspicious RCU usage > 4.15.0+ #10 Not tainted > - > ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side > critical section! 
> > other info that might help us debug this: > > > rcu_scheduler_active = 2, debug_locks = 1 > 3 locks held by syzkaller076311/4159: > #0: (&ctx->mutex){+.+.}, at: [<27c8872d>] > perf_event_ctx_lock_nested+0x21b/0x450 kernel/events/core.c:1253 > #1: (bpf_event_mutex){+.+.}, at: [<92294d8c>] > perf_event_query_prog_array+0x10e/0x280 kernel/trace/bpf_trace.c:876 > #2: (rcu_read_lock){}, at: [<2b518ca0>] > bpf_prog_array_copy_to_user+0x0/0x4d0 kernel/bpf/core.c:1568 > > stack backtrace: > CPU: 0 PID: 4159 Comm: syzkaller076311 Not tainted 4.15.0+ #10 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4592 > rcu_preempt_sleep_check include/linux/rcupdate.h:301 [inline] > ___might_sleep+0x385/0x470 kernel/sched/core.c:6079 > __might_sleep+0x95/0x190 kernel/sched/core.c:6067 > __might_fault+0xab/0x1d0 mm/memory.c:4532 > _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 > copy_to_user include/linux/uaccess.h:155 [inline] > bpf_prog_array_copy_to_user+0x217/0x4d0 kernel/bpf/core.c:1587 > bpf_prog_array_copy_info+0x17b/0x1c0 kernel/bpf/core.c:1685 > perf_event_query_prog_array+0x196/0x280 kernel/trace/bpf_trace.c:877 > _perf_ioctl kernel/events/core.c:4737 [inline] > perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4757 > vfs_ioctl fs/ioctl.c:46 [inline] fyi it was copy_to_user in rcu section bug. Submitted a fix here: https://patchwork.ozlabs.org/patch/868824/
Re: Compilation error report for: drivers/firmware/qcom_scm.c:469:47: error: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type
On Tue 30 Jan 05:25 PST 2018, Arnd Bergmann wrote:

> On Tue, Jan 30, 2018 at 11:11 AM, Benjamin GAIGNARD
> wrote:
> >
> > On 01/12/2018 05:11 PM, Arnaud Pouliquen wrote:
> >> Hello Andy,David,
> > + Arnd
> >
> > I have the same issue on drm-misc-next.
> > Does Arnaud's fix make sense or should we update/change the way of how
> > we compile the kernel ?
>
> We've hit a couple of bugs with qcom drivers confusing physical addresses
> and DMA addresses in the past, usually the drivers were buggy in
> some form, and tried to use dma_alloc_coherent() to get a buffer
> that gets passed into a firmware interface taking a physical address,
> which is of course completely wrong.
>

Thanks Arnd, for once again using the words "bug" and "completely wrong"
when referring to something that obviously works just fine...

The solution you introduced for venus and adreno relies on static
reservations of system ram, which isn't pretty, but more importantly
isn't viable for the qcom_scm driver.

So, how do I dynamically allocate a chunk of coherent memory? Preferably
with the possibility of unmapping it temporarily from Linux while passing
the buffer into the trusted environment (as any accesses during the
operation might cause access violations).

Regards,
Bjorn
Re: [RFC net 1/1] rtnetlink: require unique netns identifier
On 2/2/18 1:51 AM, Christian Brauner wrote:
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 56af8e41abfc..d0b7ab22eff4 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1951,6 +1951,18 @@ static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
>  	return net;
>  }
>
> +/* Verify that rtnetlink requests that support network namespace ids do not pass
> + * additional properties that allow to identify a network namespace as they
> + * might conflict.
> + */
> +static int rtnl_ensure_unique_netns_attr(struct nlattr *tb[])
> +{
> +	if (tb[IFLA_IF_NETNSID] && (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]))
> +		return -EINVAL;

The days of just returning EINVAL are over; please plumb extack arg to
this message and add a string describing the problem. There are plenty
of examples in rtnetlink.c.

Also, what if those NSIDs all point to the same namespace? That should
not fail, right?
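A rough userspace sketch of what the requested change could look like. The mock_ext_ack struct, the MOCK_SET_ERR_MSG macro and the MOCK_* attribute indices are stand-ins invented here so that the example compiles outside the kernel; in the kernel the corresponding pieces are struct netlink_ext_ack and NL_SET_ERR_MSG(). The error string is only a suggestion.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IFLA_* attribute indices. */
enum { MOCK_NET_NS_PID, MOCK_NET_NS_FD, MOCK_IF_NETNSID, MOCK_ATTR_MAX };

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct mock_ext_ack {
	const char *msg;
};

/* Hypothetical stand-in for NL_SET_ERR_MSG(); tolerates a NULL extack. */
#define MOCK_SET_ERR_MSG(extack, text)		\
	do {					\
		if (extack)			\
			(extack)->msg = (text);	\
	} while (0)

/*
 * Reject requests that identify the target netns in more than one way,
 * and tell the caller why instead of returning a bare -EINVAL.
 */
static int mock_ensure_unique_netns_attr(struct mock_ext_ack *extack,
					 const void *tb[])
{
	if (tb[MOCK_IF_NETNSID] &&
	    (tb[MOCK_NET_NS_PID] || tb[MOCK_NET_NS_FD])) {
		MOCK_SET_ERR_MSG(extack,
				 "netns id cannot be combined with netns pid or fd");
		return -EINVAL;
	}
	return 0;
}
```

This sketch keeps the strict check from the quoted patch; it does not try to answer the second question about attributes that happen to resolve to the same namespace.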
Re: [PATCH v4 3/3] x86/kvm: Expose AMD Core Perf Extension flag to guests
On 2/2/2018 2:03 PM, kbuild test robot wrote:
Hi Janakarajan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on v4.15]
[cannot apply to kvm/linux-next next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

This patch uses functions defined in commit
'd6321d493319bfd406c484e8359c6101cbda39d3 KVM: x86: generalize guest_cpuid_has_ helpers'.
https://lkml.org/lkml/2017/8/2/811

url: https://github.com/0day-ci/linux/commits/Janakarajan-Natarajan/Support-Perf-Extensions-on-AMD-KVM-guests/20180202-231344
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

arch/x86/kvm/cpuid.c:58:6: sparse: symbol 'perf_ext_supported' was not declared. Should it be

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: BUG: unable to handle kernel NULL pointer dereference in __crypto_register_alg
On Sat, Dec 23, 2017 at 11:54:01PM -0800, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 6084b576dca2e898f5c101baef151f7bfdbb606d > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > > Unfortunately, I don't have any reproducer for this bug yet. > > > netlink: 'syz-executor4': attribute type 29 has an invalid length. > BUG: unable to handle kernel NULL pointer dereference at 0020 > IP: __crypto_register_alg+0x7b/0x300 crypto/algapi.c:212 > PGD 0 P4D 0 > Oops: [#1] SMP > Dumping ftrace buffer: >(ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 19130 Comm: cryptomgr_probe Not tainted > 4.15.0-rc3-next-20171214+ #67 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > RIP: 0010:__crypto_register_alg+0x7b/0x300 crypto/algapi.c:212 > RSP: 0018:c9fefe00 EFLAGS: 00010293 > RAX: 88020df4c240 RBX: RCX: 8167622b > RDX: RSI: 8801fa6f8559 RDI: 8801fa6f8881 > RBP: c9fefe30 R08: 0001 R09: 0004 > R10: c9fefdb0 R11: 0004 R12: 8801fa6f84a0 > R13: R14: 080e R15: 8801fa6f8900 > FS: () GS:88021fc0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 0020 CR3: 0301e004 CR4: 001626f0 > DR0: 2000 DR1: 2000 DR2: > DR3: DR6: 0ff0 DR7: 0600 > Call Trace: > crypto_register_instance+0x83/0x140 crypto/algapi.c:544 > shash_register_instance+0x34/0x50 crypto/shash.c:532 > cbcmac_create+0x15c/0x190 crypto/ccm.c:988 > cryptomgr_probe+0x40/0x100 crypto/algboss.c:75 > kthread+0x149/0x170 kernel/kthread.c:238 > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524 > Code: 10 49 89 44 24 18 48 81 fb 60 21 0e 83 0f 84 bb 00 00 00 e8 28 41 c4 > ff 49 39 dc 74 43 49 8d 44 24 38 48 89 45 d0 e8 15 41 c4 ff <44> 8b 6b 20 41 > f6 c5 60 75 78 e8 06 41 c4 ff 41 83 e5 10 4c 8d > RIP: __crypto_register_alg+0x7b/0x300 crypto/algapi.c:212 RSP: > c9fefe00 > CR2: 0020 > ---[ end trace 598f24e6511387a3 ]--- > Kernel panic - not syncing: 
Fatal exception > Dumping ftrace buffer: >(ftrace buffer empty) > Kernel Offset: disabled > Rebooting in 86400 seconds.. > This is yet another one that was reported while KASAN was accidentally disabled, and it only happened once, so invalidating: #syz invalid - Eric
Re: RFC(V3): Audit Kernel Container IDs
On Fri, Feb 2, 2018 at 5:19 PM, Simo Sorce wrote: > On Fri, 2018-02-02 at 16:24 -0500, Paul Moore wrote: >> On Wed, Jan 10, 2018 at 2:00 AM, Richard Guy Briggs wrote: >> > On 2018-01-09 11:18, Simo Sorce wrote: >> > > On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote: ... >> > Paul, can you justify this somewhat larger inconvenience for some >> > relatively minor convenience on our part? >> >> Done in direct response to Simo. > > Sorry but your response sounds more like waving away then addressing > them, the excuse being: we can't please everyone, so we are going to > please no one. I obviously disagree with the take on my comments but you're free to your opinion. I believe saying we are pleasing no one isn't really fair now is it? Is there any type of audit container ID now? How would you go about associating audit events with containers now? (spoiler alert: it ain't pretty, and there are gaps I don't believe you can cover) This proposal provides a mechanism to do this in a way that isn't tied to any one particular concept of a container and is manageable inside the kernel. If you have a need to track audit events for containers, I find it extremely hard to believe that you are not at least partially pleased by the solutions presented here. It may not be everything on your wishlist, but when did you ever get *everything* on your wishlist? >> But to be clear Richard, we've talked about this a few times, it's not >> a "minor convenience" on our part, it's a pretty big convenience once >> we starting having to route audit events and make decisions based on >> the audit container ID information. Audit performance is less than >> awesome now, I'm working hard to not make it worse. > > Sounds like a security vs performance trade off to me. Welcome to software development. It's generally a pretty terrible hobby and/or occupation, but we make up for it with long hours and endless frustration. 
>> > u64 vs u128 is easy for us to >> > accomodate in terms of scalar comparisons. It doubles the information >> > in every container id field we print in audit records. >> >> ... and slows down audit container ID checks. > > Are you saying a cmp on a u128 is slower than a comparison on a u64 and > this is something that will be noticeable ? Do you have a 128 bit system? I don't. I've got a bunch of 64 bit systems, and a couple of 32 bit systems too. People that use audit have a tendency to really hammer on it, to the point that we get performance complaints on a not infrequent basis. I don't know the exact number of times we are going to need to check the audit container ID, but it's reasonable to think that we'll expose it as a filter-able field which adds a few checks, we'll use it for record routing so that's a few more, and if we're running multiple audit daemons we will probably want to include LSM checks which could result in a few more audit container ID checks. If it was one comparison I wouldn't be too worried about it, but the point I'm trying to make is that we don't know what the implementation is going to look like yet and I suspect this ID is going to be leveraged in several places in the audit subsystem and I would much rather start small to save headaches later. We can always expand the ID to a larger integer at a later date, but we can't make it smaller. >> > A c36 is a bigger step. >> >> Yeah, we're not doing that, no way. > > Ok, I can see your point though I do not agree with it. > > I can see why you do not want to have arbitrary length strings, but a > u128 sounded like a reasonable compromise to me as it has enough room > to be able to have unique cluster-wide IDs which a u64 definitely makes > a lot harder to provide w/o tight coordination. 
I originally wanted it to be a 32-bit integer, but Richard managed to talk me into 64-bits, that was my compromise :) As I said earlier, if you are doing container auditing you're going to need coordination with the orchestrator, regardless of the audit container ID size. -- paul moore www.paul-moore.com
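To make the comparison-cost point in this exchange concrete, here is a small userspace sketch. The id128 struct and both helper names are hypothetical; standard C has no 128-bit integer type, so a 128-bit ID is modeled as two 64-bit halves. On a 64-bit machine the wider ID generally needs two word comparisons where the 64-bit ID needs one. Whether that difference is measurable depends on how often the check runs, which the thread leaves open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit container ID, stored as two 64-bit words. */
struct id128 {
	uint64_t hi;
	uint64_t lo;
};

/* A 64-bit ID is compared with a single word comparison. */
static bool id64_equal(uint64_t a, uint64_t b)
{
	return a == b;
}

/* A 128-bit ID needs both halves compared. */
static bool id128_equal(struct id128 a, struct id128 b)
{
	return a.hi == b.hi && a.lo == b.lo;
}
```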
Re: KASAN: use-after-free Read in mon_bin_vma_fault
On Thu, Dec 28, 2017 at 12:15:01PM -0800, syzbot wrote: > Hello, > > syzkaller hit the following crash on > beacbc68ac3e23821a681adb30b45dc55b17488d > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > Unfortunately, I don't have any reproducer for this bug yet. > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+9cd6f6d80e1a5205f...@syzkaller.appspotmail.com > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. > > == > BUG: KASAN: use-after-free in mon_bin_vma_fault+0x378/0x400 > drivers/usb/mon/mon_bin.c:1238 > Read of size 8 at addr 8801cd040080 by task syz-executor1/5424 > > CPU: 1 PID: 5424 Comm: syz-executor1 Not tainted 4.15.0-rc5+ #238 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > print_address_description+0x73/0x250 mm/kasan/report.c:252 > kasan_report_error mm/kasan/report.c:351 [inline] > kasan_report+0x25b/0x340 mm/kasan/report.c:409 > __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 > mon_bin_vma_fault+0x378/0x400 drivers/usb/mon/mon_bin.c:1238 > __do_fault+0xeb/0x30f mm/memory.c:3196 > do_read_fault mm/memory.c:3606 [inline] > do_fault mm/memory.c:3706 [inline] > handle_pte_fault mm/memory.c:3937 [inline] > __handle_mm_fault+0x1d8f/0x3ce0 mm/memory.c:4061 > handle_mm_fault+0x334/0x8d0 mm/memory.c:4098 > faultin_page mm/gup.c:502 [inline] > __get_user_pages+0x50c/0x15f0 mm/gup.c:699 > populate_vma_page_range+0x20e/0x2f0 mm/gup.c:1216 > __mm_populate+0x23a/0x450 mm/gup.c:1266 > mm_populate include/linux/mm.h:2226 [inline] > vm_mmap_pgoff+0x241/0x280 mm/util.c:338 > SYSC_mmap_pgoff mm/mmap.c:1533 [inline] > SyS_mmap_pgoff+0x462/0x5f0 
mm/mmap.c:1491 > SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline] > SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91 > entry_SYSCALL_64_fastpath+0x1f/0x96 > RIP: 0033:0x452ac9 > RSP: 002b:7f7721b03c58 EFLAGS: 0212 ORIG_RAX: 0009 > RAX: ffda RBX: 0071bea0 RCX: 00452ac9 > RDX: 0104 RSI: 4000 RDI: 20ac6000 > RBP: 039b R08: 0014 R09: > R10: 8011 R11: 0212 R12: 006f2728 > R13: R14: 7f7721b046d4 R15: > > Allocated by task 5424: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 > kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3610 > kmalloc include/linux/slab.h:499 [inline] > kzalloc include/linux/slab.h:688 [inline] > mon_bin_open+0x1ae/0x4a0 drivers/usb/mon/mon_bin.c:703 > chrdev_open+0x257/0x730 fs/char_dev.c:417 > do_dentry_open+0x667/0xd40 fs/open.c:752 > vfs_open+0x107/0x220 fs/open.c:866 > do_last fs/namei.c:3379 [inline] > path_openat+0x1151/0x3530 fs/namei.c:3519 > do_filp_open+0x25b/0x3b0 fs/namei.c:3554 > do_sys_open+0x502/0x6d0 fs/open.c:1059 > SYSC_open fs/open.c:1077 [inline] > SyS_open+0x2d/0x40 fs/open.c:1072 > entry_SYSCALL_64_fastpath+0x1f/0x96 > > Freed by task 5433: > save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > set_track mm/kasan/kasan.c:459 [inline] > kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 > __cache_free mm/slab.c:3488 [inline] > kfree+0xd6/0x260 mm/slab.c:3803 > mon_bin_ioctl+0x68d/0xd40 drivers/usb/mon/mon_bin.c:1040 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 > SYSC_ioctl fs/ioctl.c:701 [inline] > SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 > entry_SYSCALL_64_fastpath+0x1f/0x96 > > The buggy address belongs to the object at 8801cd040080 > which belongs to the cache kmalloc-2048 of size 2048 > The buggy address is located 0 bytes inside of > 2048-byte region [8801cd040080, 8801cd040880) > The buggy address belongs to the page: > page:3d43a99d count:1 mapcount:0 mapping:83654cb9 index:0x0 > compound_mapcount: 0 > flags: 
0x2fffc008100(slab|head) > raw: 02fffc008100 8801cd040080 00010003 > raw: ea00072a2e20 ea00071e6920 8801db000c40 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > 8801cd03ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > 8801cd04: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > > 8801cd040080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >^ > 8801cd040100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > 8801cd040180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ==
Re: [PATCH 1/2] tools/memory-model: clarify the origin/scope of the tool name
On Fri, Feb 02, 2018 at 03:17:53PM -0800, Paul E. McKenney wrote: > On Fri, Feb 02, 2018 at 09:54:27AM +0100, Andrea Parri wrote: > > On Thu, Feb 01, 2018 at 03:09:41PM -0800, Paul E. McKenney wrote: > > > On Thu, Feb 01, 2018 at 10:26:50AM -0500, Alan Stern wrote: > > > > On Thu, 1 Feb 2018, Andrea Parri wrote: > > > > > > > > > Ingo pointed out that: > > > > > > > > > > "The "memory model" name is overly generic, ambiguous and somewhat > > > > >misleading, as we usually mean the virtual memory layout/model > > > > >when we say "memory model". GCC too uses it in that sense [...]" > > > > > > > > > > Make it clearer that, in the context of tools/memory-model/, the term > > > > > "memory-model" is used as shorthand for "memory consistency model" by > > > > > calling out this convention in tools/memory-model/README. > > > > > > > > > > Stick to the full name in sources' headers and for the subsystem name. > > > > > > > > > > Suggested-by: Ingo Molnar > > > > > Signed-off-by: Andrea Parri > > > > > > > > For both patches: > > > > > > > > Acked-by: Alan Stern > > > > > > Thank you all -- I have queued this and pushed it to my RCU tree on > > > branch lkmm. I did reword the commit log a bit, please see below and > > > please let me know if any of my rewordings need halp. > > > > > > Andrea, when you resend your second patch, could you please add Alan's > > > Acked-by? > > > > You mean in order to integrate Will's suggestion? I was planning to send > > that as a separate patch, as suggested by Will: the patch is on its way, > > IAC, please let me know if you'd prefer a V2 merging the two changes. > > Ah, apologies, I misread your reply. I have queued your second patch > with Will's and Alan's Acked-by's. And I did reword the commit log a bit, so please check it. 
Thanx, Paul commit 12a62a1d07031c0afa396b03334abbe30d9b9cf7 Author: Andrea Parri Date: Thu Feb 1 13:04:26 2018 +0100 MAINTAINERS: Add the Memory Consistency Model subsystem Move the contents of tools/memory-model/MAINTAINERS into the main MAINTAINERS file, removing tools/memory-model/MAINTAINERS. This allows get_maintainer.pl to correctly identify the maintainers of tools/memory-model/. Suggested-by: Ingo Molnar Signed-off-by: Andrea Parri Acked-by: Will Deacon Acked-by: Alan Stern Signed-off-by: Paul E. McKenney diff --git a/MAINTAINERS b/MAINTAINERS index e3581413420c..ba4dc08fbe95 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8086,6 +8086,22 @@ M: Kees Cook S: Maintained F: drivers/misc/lkdtm* +LINUX KERNEL MEMORY CONSISTENCY MODEL +M: Alan Stern +M: Andrea Parri +M: Will Deacon +M: Peter Zijlstra +M: Boqun Feng +M: Nicholas Piggin +M: David Howells +M: Jade Alglave +M: Luc Maranget +M: "Paul E. McKenney" +L: linux-kernel@vger.kernel.org +S: Supported +T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git +F: tools/memory-model/ + LINUX SECURITY MODULE (LSM) FRAMEWORK M: Chris Wright L: linux-security-mod...@vger.kernel.org diff --git a/tools/memory-model/MAINTAINERS b/tools/memory-model/MAINTAINERS deleted file mode 100644 index db3bd3fc0435.. --- a/tools/memory-model/MAINTAINERS +++ /dev/null @@ -1,15 +0,0 @@ -LINUX KERNEL MEMORY CONSISTENCY MODEL -M: Alan Stern -M: Andrea Parri -M: Will Deacon -M: Peter Zijlstra -M: Boqun Feng -M: Nicholas Piggin -M: David Howells -M: Jade Alglave -M: Luc Maranget -M: "Paul E. McKenney" -L: linux-kernel@vger.kernel.org -S: Supported -T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git -F: tools/memory-model/
Re: [PATCH 1/2] tools/memory-model: clarify the origin/scope of the tool name
On Fri, Feb 02, 2018 at 11:44:21AM +0100, Andrea Parri wrote: > On Thu, Feb 01, 2018 at 03:09:41PM -0800, Paul E. McKenney wrote: > > On Thu, Feb 01, 2018 at 10:26:50AM -0500, Alan Stern wrote: > > > On Thu, 1 Feb 2018, Andrea Parri wrote: > > > > > > > Ingo pointed out that: > > > > > > > > "The "memory model" name is overly generic, ambiguous and somewhat > > > >misleading, as we usually mean the virtual memory layout/model > > > >when we say "memory model". GCC too uses it in that sense [...]" > > > > > > > > Make it clearer that, in the context of tools/memory-model/, the term > > > > "memory-model" is used as shorthand for "memory consistency model" by > > > > calling out this convention in tools/memory-model/README. > > > > > > > > Stick to the full name in sources' headers and for the subsystem name. > > > > > > > > Suggested-by: Ingo Molnar > > > > Signed-off-by: Andrea Parri > > > > > > For both patches: > > > > > > Acked-by: Alan Stern > > > > Thank you all -- I have queued this and pushed it to my RCU tree on > > branch lkmm. I did reword the commit log a bit, please see below and > > please let me know if any of my rewordings need halp. > > Seems to me that your message has a leftover "is used". Good catch, how about this instead? Thanx, Paul --- commit 2b1b4ab5166209da849f306fbdc84114d9e611fd Author: Andrea Parri Date: Thu Feb 1 13:03:29 2018 +0100 tools/memory-model: Clarify the origin/scope of the tool name Ingo pointed out that: "The "memory model" name is overly generic, ambiguous and somewhat misleading, as we usually mean the virtual memory layout/model when we say "memory model". GCC too uses it in that sense [...]" Make it clear that tools/memory-model/ uses the term "memory model" as shorthand for "memory consistency model" by calling out this convention in tools/memory-model/README. Stick to the original "memory model" term in sources' headers and for the subsystem name. 
Suggested-by: Ingo Molnar Signed-off-by: Andrea Parri Acked-by: Will Deacon Acked-by: Alan Stern Signed-off-by: Paul E. McKenney diff --git a/tools/memory-model/MAINTAINERS b/tools/memory-model/MAINTAINERS index 711cbe72d606..db3bd3fc0435 100644 --- a/tools/memory-model/MAINTAINERS +++ b/tools/memory-model/MAINTAINERS @@ -1,4 +1,4 @@ -LINUX KERNEL MEMORY MODEL +LINUX KERNEL MEMORY CONSISTENCY MODEL M: Alan Stern M: Andrea Parri M: Will Deacon diff --git a/tools/memory-model/README b/tools/memory-model/README index 43ba49492111..91414a49fac5 100644 --- a/tools/memory-model/README +++ b/tools/memory-model/README @@ -1,15 +1,15 @@ - = - LINUX KERNEL MEMORY MODEL - = + = + LINUX KERNEL MEMORY CONSISTENCY MODEL + = INTRODUCTION -This directory contains the memory model of the Linux kernel, written -in the "cat" language and executable by the (externally provided) -"herd7" simulator, which exhaustively explores the state space of -small litmus tests. +This directory contains the memory consistency model (memory model, for +short) of the Linux kernel, written in the "cat" language and executable +by the externally provided "herd7" simulator, which exhaustively explores +the state space of small litmus tests. In addition, the "klitmus7" tool (also externally provided) may be used to convert a litmus test to a Linux kernel module, which in turn allows diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell index 57112505f5e0..b984bbda01a5 100644 --- a/tools/memory-model/linux-kernel.bell +++ b/tools/memory-model/linux-kernel.bell @@ -11,7 +11,7 @@ * which is to appear in ASPLOS 2018. 
*) -"Linux kernel memory model" +"Linux-kernel memory consistency model" enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) || 'release (*smp_store_release*) || diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat index 15b7a5dd8a9a..babe2b3b0bb3 100644 --- a/tools/memory-model/linux-kernel.cat +++ b/tools/memory-model/linux-kernel.cat @@ -11,7 +11,7 @@ * which is to appear in ASPLOS 2018. *) -"Linux kernel memory model" +"Linux-kernel memory consistency model" (* * File "lock.cat" handles locks and is experimental.
Re: [PATCH 1/2] tools/memory-model: clarify the origin/scope of the tool name
On Fri, Feb 02, 2018 at 09:54:27AM +0100, Andrea Parri wrote: > On Thu, Feb 01, 2018 at 03:09:41PM -0800, Paul E. McKenney wrote: > > On Thu, Feb 01, 2018 at 10:26:50AM -0500, Alan Stern wrote: > > > On Thu, 1 Feb 2018, Andrea Parri wrote: > > > > > > > Ingo pointed out that: > > > > > > > > "The "memory model" name is overly generic, ambiguous and somewhat > > > >misleading, as we usually mean the virtual memory layout/model > > > >when we say "memory model". GCC too uses it in that sense [...]" > > > > > > > > Make it clearer that, in the context of tools/memory-model/, the term > > > > "memory-model" is used as shorthand for "memory consistency model" by > > > > calling out this convention in tools/memory-model/README. > > > > > > > > Stick to the full name in sources' headers and for the subsystem name. > > > > > > > > Suggested-by: Ingo Molnar > > > > Signed-off-by: Andrea Parri > > > > > > For both patches: > > > > > > Acked-by: Alan Stern > > > > Thank you all -- I have queued this and pushed it to my RCU tree on > > branch lkmm. I did reword the commit log a bit, please see below and > > please let me know if any of my rewordings need halp. > > > > Andrea, when you resend your second patch, could you please add Alan's > > Acked-by? > > You mean in order to integrate Will's suggestion? I was planning to send > that as a separate patch, as suggested by Will: the patch is on its way, > IAC, please let me know if you'd prefer a V2 merging the two changes. Ah, apologies, I misread your reply. I have queued your second patch with Will's and Alan's Acked-by's. 
Thanx, Paul > Andrea > > > > > > Thanx, Paul > > > > > > > > commit de175b697f71b8e3e6d980b7186b909fee0c4378 > > Author: Andrea Parri > > Date: Thu Feb 1 13:03:29 2018 +0100 > > > > tools/memory-model: Clarify the origin/scope of the tool name > > > > Ingo pointed out that: > > > > "The "memory model" name is overly generic, ambiguous and somewhat > >misleading, as we usually mean the virtual memory layout/model > >when we say "memory model". GCC too uses it in that sense [...]" > > > > Make it clearer that tools/memory-model/ uses the term "memory model" > > is used as shorthand for "memory consistency model" by calling out this > > convention in tools/memory-model/README. > > > > Stick to the original "memory model" term in sources' headers and for > > the subsystem name. > > > > Suggested-by: Ingo Molnar > > Signed-off-by: Andrea Parri > > Acked-by: Will Deacon > > Acked-by: Alan Stern > > Signed-off-by: Paul E. McKenney > > > > diff --git a/tools/memory-model/MAINTAINERS b/tools/memory-model/MAINTAINERS > > index 711cbe72d606..db3bd3fc0435 100644 > > --- a/tools/memory-model/MAINTAINERS > > +++ b/tools/memory-model/MAINTAINERS > > @@ -1,4 +1,4 @@ > > -LINUX KERNEL MEMORY MODEL > > +LINUX KERNEL MEMORY CONSISTENCY MODEL > > M: Alan Stern > > M: Andrea Parri > > M: Will Deacon > > diff --git a/tools/memory-model/README b/tools/memory-model/README > > index 43ba49492111..91414a49fac5 100644 > > --- a/tools/memory-model/README > > +++ b/tools/memory-model/README > > @@ -1,15 +1,15 @@ > > - = > > - LINUX KERNEL MEMORY MODEL > > - = > > + = > > + LINUX KERNEL MEMORY CONSISTENCY MODEL > > + = > > > > > > INTRODUCTION > > > > > > -This directory contains the memory model of the Linux kernel, written > > -in the "cat" language and executable by the (externally provided) > > -"herd7" simulator, which exhaustively explores the state space of > > -small litmus tests. 
> > +This directory contains the memory consistency model (memory model, for > > +short) of the Linux kernel, written in the "cat" language and executable > > +by the externally provided "herd7" simulator, which exhaustively explores > > +the state space of small litmus tests. > > > > In addition, the "klitmus7" tool (also externally provided) may be used > > to convert a litmus test to a Linux kernel module, which in turn allows > > diff --git a/tools/memory-model/linux-kernel.bell > > b/tools/memory-model/linux-kernel.bell > > index 57112505f5e0..b984bbda01a5 100644 > > --- a/tools/memory-model/linux-kernel.bell > > +++ b/tools/memory-model/linux-kernel.bell > > @@ -11,7 +11,7 @@ > > * which is to appear in ASPLOS 2018. > > *) > > > > -"Linux kernel memory model" > > +"Linux-kernel memory consistency model" > > > > enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) || > > 'release (*smp_store_release*) ||
[PATCH 04/18] tracing/x86: Add arch_get_func_args() function
From: "Steven Rostedt (VMware)" Add function to get the function arguments from pt_regs. Signed-off-by: Steven Rostedt (VMware) --- arch/x86/kernel/ftrace.c | 28 1 file changed, 28 insertions(+) diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 01ebcb6f263e..5e845c8cf89d 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void) return 0; } +int arch_get_func_args(struct pt_regs *regs, + int start, int end, long *args) +{ +#ifdef CONFIG_X86_64 +# define MAX_ARGS 6 +# define INIT_REGS \ + { regs->di, regs->si, regs->dx, \ + regs->cx, regs->r8, regs->r9\ + } +#else +# define MAX_ARGS 3 +# define INIT_REGS \ + { regs->ax, regs->dx, regs->cx} +#endif + if (!regs) + return MAX_ARGS; + + { + long pt_args[] = INIT_REGS; + int i; + + for (i = start; i <= end && i < MAX_ARGS; i++) + args[i - start] = pt_args[i]; + + return i - start; + } +} + union ftrace_code_union { char code[MCOUNT_INSN_SIZE]; struct { -- 2.15.1
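A userspace sketch of the copy logic in this patch may help when reading the callers later in the series. struct mock_regs, mock_get_func_args() and MOCK_MAX_ARGS are hypothetical names used only here; the register order follows the x86_64 branch of the patch. Passing a NULL register pointer returns the maximum number of arguments; otherwise the function copies arguments start through end (inclusive, clamped to the maximum) and returns how many it copied.

```c
#include <stddef.h>

#define MOCK_MAX_ARGS 6

/* Hypothetical stand-in for the x86_64 fields of struct pt_regs. */
struct mock_regs {
	long di, si, dx, cx, r8, r9;
};

static int mock_get_func_args(const struct mock_regs *regs,
			      int start, int end, long *args)
{
	int i;

	/* With no registers, report how many arguments are available. */
	if (!regs)
		return MOCK_MAX_ARGS;

	{
		/* Argument registers in calling-convention order. */
		long pt_args[MOCK_MAX_ARGS] = {
			regs->di, regs->si, regs->dx,
			regs->cx, regs->r8, regs->r9
		};

		/* Copy args[start..end], never reading past the last register. */
		for (i = start; i <= end && i < MOCK_MAX_ARGS; i++)
			args[i - start] = pt_args[i];
	}

	/* Number of arguments actually copied. */
	return i - start;
}
```

Under these assumptions, asking for arguments 4 through 9 copies only r8 and r9 and returns 2, so a caller can request a generous range and rely on the return value.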
[PATCH 06/18] tracing: Add indirect offset to args of ftrace based events
From: "Steven Rostedt (VMware)"

Add '[' ']' syntax to allow getting values indirectly from the arguments.
For example:

  echo 'replenish_dl_entity(s64 dl_se[4])' > function_events

Will get the 4th long long word from the first parameter like an array.

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 32 +++-
 kernel/trace/trace_event_ftrace.c | 73 +--
 2 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index f27a0c4e829c..7d67229e8e88 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -100,11 +100,15 @@ as follows: 'x8' | 'x16' | 'x32' | 'x64' | 'char' | 'short' | 'int' | 'long' | 'size_t' - FIELD := + FIELD := | INDEX + + INDEX := '[' ']' Where is a unique string starting with an alphabetic character and consists only of letters and numbers and underscores. + Where is a number that can be read by kstrtol() (hex, decimal, etc). + Simple arguments @@ -128,3 +132,29 @@ If we are only interested in the first argument (skb): We use "x64" in order to make sure that the data is displayed in hex. This is on a x86_64 machine, and we know the pointer sizes are 8 bytes. + + +Indexing + + +The pointers of the skb and the dev isn't that interesting. But if we want the +length "len" field of skb, we could index it with an index operator '[' and ']'.
+ +Using gdb, we can find the offset of 'len' from the sk_buff type: + + $ gdb vmlinux + (gdb) printf "%d\n", &((struct sk_buff *)0)->len +128 + +As 128 / 4 (length of int) is 32, we can see the length of the skb with: + + # echo 'ip_rcv(int skb[32], x64 dev)' > function_events + + # echo 1 > events/functions/ip_rcv/enable + # cat trace +-0 [003] ..s3 280.167137: __netif_receive_skb_core->ip_rcv(skb=52, dev=8801092f9400) +-0 [003] ..s3 280.167152: __netif_receive_skb_core->ip_rcv(skb=52, dev=8801092f9400) +-0 [003] ..s3 280.806629: __netif_receive_skb_core->ip_rcv(skb=88, dev=8801092f9400) +-0 [003] ..s3 280.807023: __netif_receive_skb_core->ip_rcv(skb=52, dev=8801092f9400) + +Now we see the length of the sk_buff per event. diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index aa19c8af9d34..5d37498d1c6b 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -10,13 +10,15 @@ #include "trace.h" -#define FUNC_EVENT_SYSTEM "functions" -#define WRITE_BUFSIZE 4096 +#define FUNC_EVENT_SYSTEM "functions" +#define WRITE_BUFSIZE 4096 +#define INDIRECT_FLAG 0x1000 struct func_arg { struct list_headlist; char*type; char*name; + longindirect; short offset; short size; chararg; @@ -55,6 +57,9 @@ enum func_states { FUNC_STATE_INIT, FUNC_STATE_FUNC, FUNC_STATE_PARAM, + FUNC_STATE_BRACKET, + FUNC_STATE_BRACKET_END, + FUNC_STATE_INDIRECT, FUNC_STATE_TYPE, FUNC_STATE_VAR, FUNC_STATE_COMMA, @@ -171,6 +176,8 @@ static char *next_token(char **ptr, char *last) for (str = arg; *str; str++) { if (*str == '(' || + *str == '[' || + *str == ']' || *str == ',' || *str == ')') break; @@ -223,6 +230,7 @@ static int add_arg(struct func_event *fevent, int ftype) static enum func_states process_event(struct func_event *fevent, const char *token, enum func_states state) { + long val; int ret; int i; @@ -269,12 +277,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta break; return FUNC_STATE_VAR; 
+ case FUNC_STATE_BRACKET: + WARN_ON(!fevent->last_arg); + ret = kstrtol(token, 0, &val); + if (ret) + break; + val *= fevent->last_arg->size; + fevent->last_arg->indirect = val ^ INDIRECT_FLAG; + return FUNC_STATE_INDIRECT; + + case FUNC_STATE_INDIRECT: + if (token[0] != ']') + break; + return FUNC_STATE_BRACKET_END; + + case FUNC_STATE_BRACKET_END: + switch (token[0]) { + case ')': + return FUNC_STATE_END; + case ',': + return FUNC_STATE_COMMA; + } + break; + case FUNC_STATE_VAR: switch (token[0]) {
[PATCH 07/18] tracing: Add dereferencing multiple fields per arg
From: "Steven Rostedt (VMware)" As an argument may be a structure or an array, we may want to dereference more than one field per argument. Add a pipe '|' token to the parser that allows referencing multiple dereferenced fields per function argument. Change func_arg fields from char to s8 or u8 to allow them to be subscripts to arrays. Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 20 +- kernel/trace/trace_event_ftrace.c | 29 --- 2 files changed, 41 insertions(+), 8 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index 7d67229e8e88..2a002c8a500b 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -91,7 +91,7 @@ as follows: ARGS := ARG | ARG ',' ARGS | '' - ARG := TYPE FIELD + ARG := TYPE FIELD | ARG '|' ARG TYPE := ATOM @@ -158,3 +158,21 @@ As 128 / 4 (length of int) is 32, we can see the length of the skb with: <idle>-0 [003] ..s3 280.807023: __netif_receive_skb_core->ip_rcv(skb=52, dev=8801092f9400) Now we see the length of the sk_buff per event. 
+ + +Multiple fields per argument + + + +If we still want to see the skb pointer value along with the length of the +skb, then using the '|' option allows us to add more than one option to +an argument: + + # echo 'ip_rcv(x64 skb | int skb[32], x64 dev)' > function_events + + # echo 1 > events/functions/ip_rcv/enable + # cat trace +-0 [003] ..s3 904.075838: __netif_receive_skb_core->ip_rcv(skb=88011396e800, skb=52, dev=880115204000) +-0 [003] ..s3 904.075848: __netif_receive_skb_core->ip_rcv(skb=88011396e800, skb=52, dev=880115204000) +-0 [003] ..s3 904.725486: __netif_receive_skb_core->ip_rcv(skb=88011396e800, skb=194, dev=880115204000) +-0 [003] ..s3 905.152537: __netif_receive_skb_core->ip_rcv(skb=88011396f200, skb=88, dev=880115204000) diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index 5d37498d1c6b..8c9d4a92deab 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -21,8 +21,8 @@ struct func_arg { longindirect; short offset; short size; - chararg; - charsign; + s8 arg; + u8 sign; }; struct func_event { @@ -60,6 +60,7 @@ enum func_states { FUNC_STATE_BRACKET, FUNC_STATE_BRACKET_END, FUNC_STATE_INDIRECT, + FUNC_STATE_PIPE, FUNC_STATE_TYPE, FUNC_STATE_VAR, FUNC_STATE_COMMA, @@ -179,6 +180,7 @@ static char *next_token(char **ptr, char *last) *str == '[' || *str == ']' || *str == ',' || + *str == '|' || *str == ')') break; } @@ -251,11 +253,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta break; return FUNC_STATE_PARAM; + case FUNC_STATE_PIPE: + fevent->arg_cnt--; + goto comma; case FUNC_STATE_PARAM: if (token[0] == ')') return FUNC_STATE_END; /* Fall through */ case FUNC_STATE_COMMA: + comma: for (i = 0; func_types[i].size; i++) { if (strcmp(token, func_types[i].name) == 0) break; @@ -297,6 +303,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta return FUNC_STATE_END; case ',': return FUNC_STATE_COMMA; + case '|': + 
return FUNC_STATE_PIPE; } break; @@ -306,6 +314,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta return FUNC_STATE_END; case ',': return FUNC_STATE_COMMA; + case '|': + return FUNC_STATE_PIPE; case '[': return FUNC_STATE_BRACKET; } @@ -364,7 +374,6 @@ static void func_event_trace(struct trace_event_file *trace_file, int nr_args; int size; int pc; - int i = 0; if (trace_trigger_soft_disabled(trace_file)) return; @@ -386,8 +395,8 @@ static void func_event_trace(struct trace_event_file *trace_file, nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args); list_for_each_entry(arg, &func_event-
[PATCH 18/18] tracing/perf: Allow perf to use function based events
From: "Steven Rostedt (VMware)" Have perf use function based events. # echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' > /sys/kernel/tracing/function_events # perf record -e functions:SyS_openat grep task_forks /proc/kallsyms # perf script grep 913 [002] 5713.413239: functions:SyS_openat: entry_SYSCALL_64_fastpath->sys_openat(dfd=-100, buf=/proc/kallsyms, flags=100, mode=0) Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 3 +- kernel/trace/trace_event_ftrace.c | 134 -- 2 files changed, 104 insertions(+), 33 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index 3b341992b93d..6effde96d3d6 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -48,7 +48,8 @@ enable filter format hist id trigger Even though the above function based event does not record much more than the function tracer does, it does become a full fledge event. -This can be used by the histogram infrastructure, and triggers. +This can be used by the histogram infrastructure, triggers, and perf +where one can attach eBPF programs to. 
# cat events/functions/do_IRQ/format name: do_IRQ diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index b5b719680686..b145639eac45 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -747,46 +747,33 @@ static int get_string(unsigned long addr, unsigned int idx, return len; } -static void func_event_trace(struct trace_event_file *trace_file, -struct func_event *func_event, -unsigned long ip, unsigned long parent_ip, -struct pt_regs *pt_regs) +static int get_event_size(struct func_event *func_event, struct pt_regs *pt_regs, + long *args, int *nr_args) { - struct func_event_hdr *entry; - struct trace_event_call *call = &func_event->call; - struct ring_buffer_event *event; - struct ring_buffer *buffer; - struct func_arg *arg; - long args[func_event->arg_cnt]; - long long val = 1; - unsigned long irq_flags; - int str_offset; - int str_idx = 0; - int nr_args = 0; int size; - int pc; - - if (trace_trigger_soft_disabled(trace_file)) - return; - - local_save_flags(irq_flags); - pc = preempt_count(); - size = func_event->arg_offset + sizeof(*entry); + size = func_event->arg_offset + sizeof(struct func_event_hdr); if (func_event->arg_cnt) - nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args); + *nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args); + else + *nr_args = 0; if (func_event->has_strings) - size += calculate_strings(func_event, nr_args, args); + size += calculate_strings(func_event, *nr_args, args); - event = trace_event_buffer_lock_reserve(&buffer, trace_file, - call->event.type, - size, irq_flags, pc); - if (!event) - return; + return size; +} + +static void +record_entry(struct func_event_hdr *entry, struct func_event *func_event, +unsigned long ip, unsigned long parent_ip, int nr_args, long *args) +{ + struct func_arg *arg; + long long val; + int str_offset; + int str_idx = 0; - entry = ring_buffer_event_data(event); entry->ip = ip; entry->parent_ip = 
parent_ip; @@ -809,11 +796,80 @@ static void func_event_trace(struct trace_event_file *trace_file, } else memcpy(&entry->data[arg->offset], &val, arg->size); } +} + +static void func_event_trace(struct trace_event_file *trace_file, +struct func_event *func_event, +unsigned long ip, unsigned long parent_ip, +struct pt_regs *pt_regs) +{ + struct func_event_hdr *entry; + struct trace_event_call *call = &func_event->call; + struct ring_buffer_event *event; + struct ring_buffer *buffer; + long args[func_event->arg_cnt]; + unsigned long irq_flags; + int nr_args; + int size; + int pc; + + if (trace_trigger_soft_disabled(trace_file)) + return; + + local_save_flags(irq_flags); + pc = preempt_count(); + + size = get_event_size(func_event, pt_regs, args, &nr_args); + + event = trace_event_buffer_lock_reserve(&buffer, trace_file, + call->event.type, + size, irq_flags, pc); +
[PATCH 09/18] tracing: Add indexing of arguments for function based events
From: "Steven Rostedt (VMware)" Currently, 8 byte words can only be read at 8 byte aligned offsets from the argument. But there may be cases where they are only 4 byte aligned. To make the capturing of arguments more flexible, add a plus '+' operator that offsets the variable by an arbitrary number of bytes before indexing, so that any location can be reached. u64 arg+4[3] Will get an 8 byte word at byte offset 28 (3 * 8 + 4) Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 24 +++- kernel/trace/trace_event_ftrace.c | 18 ++ 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index 72e3e7730d63..bdb28f433bfb 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -100,10 +100,12 @@ as follows: 'x8' | 'x16' | 'x32' | 'x64' | 'char' | 'short' | 'int' | 'long' | 'size_t' - FIELD := <name> | <name> INDEX + FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX INDEX := '[' <number> ']' + OFFSET := '+' <number> + Where <name> is a unique string starting with an alphabetic character and consists only of letters and numbers and underscores. @@ -221,3 +223,23 @@ format: print fmt: "%pS->%pS(skb=%u)", REC->__ip, REC->__parent_ip, REC->skb It is now printed with a "%u". + + +Offsets +=== + +After the name of the variable, brackets '[' number ']' will index the value of +the argument by the number given times the size of the field. + + int field[5] will dereference the value of the argument 20 bytes away (4 * 5) + as sizeof(int) is 4. + +If there's a case where the type is of 8 bytes in size but is not 8 bytes +aligned in the structure, an offset may be required. + + For example: x64 param+4[2] + +The above will take the parameter value, add 4 to it, then index it by two +8 byte words. It's the same in C as: ((u64 *)((void *)param + 4))[2] + + Note: "int skb[32]" is the same as "int skb+4[31]". 
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index 9548b93eb8cd..4c23fa18453d 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -19,6 +19,7 @@ struct func_arg { char*type; char*name; longindirect; + longindex; short offset; short size; s8 arg; @@ -62,6 +63,7 @@ enum func_states { FUNC_STATE_INDIRECT, FUNC_STATE_UNSIGNED, FUNC_STATE_PIPE, + FUNC_STATE_PLUS, FUNC_STATE_TYPE, FUNC_STATE_VAR, FUNC_STATE_COMMA, @@ -182,6 +184,7 @@ static char *next_token(char **ptr, char *last) *str == ']' || *str == ',' || *str == '|' || + *str == '+' || *str == ')') break; } @@ -323,6 +326,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta } break; + case FUNC_STATE_PLUS: + if (WARN_ON(!fevent->last_arg)) + break; + ret = kstrtol(token, 0, &val); + if (ret) + break; + fevent->last_arg->index += val; + return FUNC_STATE_VAR; + case FUNC_STATE_VAR: switch (token[0]) { case ')': @@ -331,6 +343,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta return FUNC_STATE_COMMA; case '|': return FUNC_STATE_PIPE; + case '+': + return FUNC_STATE_PLUS; case '[': return FUNC_STATE_BRACKET; } @@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val) char buf[8]; int ret; + val += arg->index; + if (!arg->indirect) return val; @@ -779,6 +795,8 @@ static int func_event_seq_show(struct seq_file *m, void *v) last_arg = arg->arg; comma = true; seq_printf(m, "%s %s", arg->type, arg->name); + if (arg->index) + seq_printf(m, "+%ld", arg->index); if (arg->indirect && arg->size) seq_printf(m, "[%ld]", (arg->indirect ^ INDIRECT_FLAG) / arg->size); -- 2.15.1
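The way '+' and '[' ']' combine into a byte offset can be sketched in plain C. This is an illustration only: struct sketch_arg and the sketch_* helpers are hypothetical, and the assumption (taken from the commit message's "3 * 8 + 4" arithmetic) is that the final offset is the '+' value plus the decoded bracket value.

```c
#include <stddef.h>

/* Same value as INDIRECT_FLAG in the patch; lets "[0]" differ from
 * "no brackets given", since 0 ^ flag is non-zero. */
#define SKETCH_INDIRECT_FLAG 0x1000

/* Hypothetical subset of struct func_arg */
struct sketch_arg {
	long indirect;	/* (n * size) ^ flag, or 0 if no '[n]' was given */
	long index;	/* byte offset accumulated from '+n' */
	short size;	/* sizeof() of the field type */
};

/* Parsing "+n": accumulate a plain byte offset. */
static void sketch_set_plus(struct sketch_arg *arg, long bytes)
{
	arg->index += bytes;
}

/* Parsing "[n]": scale by the type size and tag with the flag. */
static void sketch_set_bracket(struct sketch_arg *arg, long n)
{
	arg->indirect = (n * arg->size) ^ SKETCH_INDIRECT_FLAG;
}

/* Byte distance from the argument value to the data that is read. */
static long sketch_byte_offset(const struct sketch_arg *arg)
{
	long off = arg->index;

	if (arg->indirect)
		off += arg->indirect ^ SKETCH_INDIRECT_FLAG;

	return off;
}
```

With size 8, sketch_set_plus(&arg, 4) followed by sketch_set_bracket(&arg, 3) models "u64 arg+4[3]".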
[PATCH 12/18] tracing: Add accessing direct address from function based events
From: "Steven Rostedt (VMware)" Allow referencing any address during the function based event. The syntax is to use <type> <name>=<address>. For example: # echo 'do_IRQ(long total_forks=0xa2a4b4c0)' > function_events # echo 1 > events/functions/enable # cat trace sshd-832 [000] d... 221639.210845: ret_from_intr->do_IRQ(total_forks=855) sshd-832 [000] d... 221639.24: ret_from_intr->do_IRQ(total_forks=855) <idle>-0 [000] d... 221639.211198: ret_from_intr->do_IRQ(total_forks=855) Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 40 +++- kernel/trace/trace_event_ftrace.c | 129 +- 2 files changed, 143 insertions(+), 26 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index f18c8f3ef330..b0e6725f3032 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -91,7 +91,7 @@ as follows: ARGS := ARG | ARG ',' ARGS | '' - ARG := TYPE FIELD | ARG '|' ARG + ARG := TYPE FIELD | TYPE '=' ADDR | TYPE ADDR | ARG '|' ARG TYPE := ATOM | 'unsigned' ATOM @@ -107,6 +107,8 @@ as follows: OFFSET := '+' <number> + ADDR := A hexadecimal address starting with '0x' + Where <name> is a unique string starting with an alphabetic character and consists only of letters and numbers and underscores. @@ -267,3 +269,39 @@ Again, using gdb to find the offset of the "func" field of struct work_struct <idle>-0 [000] dNs3 6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=88011a010800, func=vmstat_shepherd+0x0/0xb0) worker/0:2-1689 [000] d..2 6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=88011a11da00, func=vmstat_update+0x0/0x70) <idle>-0 [005] d.s3 6241.347996: queue_work_on->__queue_work(cpu=128, wq=88011a011200, func=fb_flashcursor+0x0/0x110 [fb]) + + +Direct memory access + + +Function arguments are not the only thing that can be recorded from a function +based event. Memory addresses can also be examined. 
If there's a global variable +that you want to monitor via an interrupt, you can put in the address directly. + + # grep total_forks /proc/kallsyms +82354c18 B total_forks + + # echo 'do_IRQ(int total_forks=0x82354c18)' > function_events + + # echo 1 > events/functions/do_IRQ/enable + # cat trace +<idle>-0 [003] d..3 337.076709: ret_from_intr->do_IRQ(total_forks=1419) +<idle>-0 [003] d..3 337.077046: ret_from_intr->do_IRQ(total_forks=1419) +<idle>-0 [003] d..3 337.077076: ret_from_intr->do_IRQ(total_forks=1420) + +Note, address notations do not affect the argument count. For instance, with + +__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs) + + # echo 'do_IRQ(int total_forks=0x82354c18, symbol regs[16])' > function_events + +Is the same as + + # echo 'do_IRQ(int total_forks=0x82354c18 | symbol regs[16])' > function_events + + # cat trace +<idle>-0 [003] d..3 653.839546: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330) +<idle>-0 [003] d..3 653.906011: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330) +<idle>-0 [003] d..3 655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50) +<idle>-0 [003] d..3 655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330) + diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index ba10177b9bd6..206114f192be 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -63,6 +63,8 @@ enum func_states { FUNC_STATE_BRACKET_END, FUNC_STATE_INDIRECT, FUNC_STATE_UNSIGNED, + FUNC_STATE_ADDR, + FUNC_STATE_EQUAL, FUNC_STATE_PIPE, FUNC_STATE_PLUS, FUNC_STATE_TYPE, @@ -199,6 +201,7 @@ static char *next_token(char **ptr, char *last) *str == ',' || *str == '|' || *str == '+' || + *str == '=' || *str == ')') break; } @@ -243,12 +246,39 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign) arg->sign = func_type->sign; arg->offset = ALIGN(fevent->arg_offset, arg->size); arg->func_type = 
ftype; - arg->arg = fevent->arg_cnt; fevent->arg_offset = arg->offset + arg->size; list_add_tail(&arg->list, &fevent->args); fevent->last_arg = arg; - fevent->arg_cnt++; + + return 0; +} + +static int update_arg_name(struct func_event *fevent, const char *name) +{ + struct func_arg *arg = fevent->last_arg; + + if (WARN_ON(!arg)) + return -EINVAL; + + arg->name = kstrdup(name, GFP_KERNEL); + if (!arg->name) + return -ENO
[PATCH 08/18] tracing: Add "unsigned" to function based events
From: "Steven Rostedt (VMware)" Add "unsigned" to the format processing to creating dynamic function based events. For example: "unsigned long" now works. Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 47 ++- kernel/trace/trace_event_ftrace.c | 23 ++--- 2 files changed, 65 insertions(+), 5 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index 2a002c8a500b..72e3e7730d63 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -93,7 +93,7 @@ as follows: ARG := TYPE FIELD | ARG '|' ARG - TYPE := ATOM + TYPE := ATOM | 'unsigned' ATOM ATOM := 'u8' | 'u16' | 'u32' | 'u64' | 's8' | 's16' | 's32' | 's64' | @@ -176,3 +176,48 @@ an argument: -0 [003] ..s3 904.075848: __netif_receive_skb_core->ip_rcv(skb=88011396e800, skb=52, dev=880115204000) -0 [003] ..s3 904.725486: __netif_receive_skb_core->ip_rcv(skb=88011396e800, skb=194, dev=880115204000) -0 [003] ..s3 905.152537: __netif_receive_skb_core->ip_rcv(skb=88011396f200, skb=88, dev=880115204000) + + +Unsigned usage +== + +One can also use "unsigned" to make some types unsigned. It works against +"long", "int", "short" and "char". It doesn't error against other types but +may not make any sense. + + # echo 'ip_rcv(int skb[32])' > function_events + # cat events/functions/ip_rcv/format +name: ip_rcv +ID: 1397 +format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + + field:unsigned long __parent_ip;offset:8; size:8; signed:0; + field:unsigned long __ip; offset:16; size:8; signed:0; + field:int skb; offset:24; size:4; signed:1; + +print fmt: "%pS->%pS(skb=%d)", REC->__ip, REC->__parent_ip, REC->skb + + +Notice that REC->skb is printed with "%d". 
By adding "unsigned" + + # echo 'ip_rcv(unsigned int skb[32])' > function_events + # cat events/functions/ip_rcv/format +name: ip_rcv +ID: 1398 +format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + + field:unsigned long __parent_ip;offset:8; size:8; signed:0; + field:unsigned long __ip; offset:16; size:8; signed:0; + field:unsigned int skb; offset:24; size:4; signed:0; + +print fmt: "%pS->%pS(skb=%u)", REC->__ip, REC->__parent_ip, REC->skb + +It is now printed with a "%u". diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index 8c9d4a92deab..9548b93eb8cd 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -60,6 +60,7 @@ enum func_states { FUNC_STATE_BRACKET, FUNC_STATE_BRACKET_END, FUNC_STATE_INDIRECT, + FUNC_STATE_UNSIGNED, FUNC_STATE_PIPE, FUNC_STATE_TYPE, FUNC_STATE_VAR, @@ -198,7 +199,7 @@ static char *next_token(char **ptr, char *last) return arg; } -static int add_arg(struct func_event *fevent, int ftype) +static int add_arg(struct func_event *fevent, int ftype, int unsign) { struct func_type *func_type = &func_types[ftype]; struct func_arg *arg; @@ -211,13 +212,18 @@ static int add_arg(struct func_event *fevent, int ftype) if (!arg) return -ENOMEM; - arg->type = kstrdup(func_type->name, GFP_KERNEL); + if (unsign) + arg->type = kasprintf(GFP_KERNEL, "unsigned %s", + func_type->name); + else + arg->type = kstrdup(func_type->name, GFP_KERNEL); if (!arg->type) { kfree(arg); return -ENOMEM; } arg->size = func_type->size; - arg->sign = func_type->sign; + if (!unsign) + arg->sign = func_type->sign; arg->offset = ALIGN(fevent->arg_offset, arg->size); arg->arg = fevent->arg_cnt; fevent->arg_offset = arg->offset + arg->size; @@ -232,12 +238,14 @@ static int add_arg(struct func_event *fevent, 
int ftype) static enum func_states process_event(struct func_event *fevent, const char *token, enum func_states state) { + static int unsign; long val; int ret; int i; switch (state) { case FUNC_STATE_INIT: + unsign = 0; if (!isalpha(token[0]))
[PATCH 11/18] tracing: Add symbol type to function based events
From: "Steven Rostedt (VMware)" Add a special type "symbol" that will use %pS to display the field of a function based event. Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 26 +- kernel/trace/trace_event_ftrace.c | 13 ++--- 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index bdb28f433bfb..f18c8f3ef330 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -98,7 +98,8 @@ as follows: ATOM := 'u8' | 'u16' | 'u32' | 'u64' | 's8' | 's16' | 's32' | 's64' | 'x8' | 'x16' | 'x32' | 'x64' | - 'char' | 'short' | 'int' | 'long' | 'size_t' + 'char' | 'short' | 'int' | 'long' | 'size_t' | +'symbol' FIELD := | INDEX | OFFSET | OFFSET INDEX @@ -243,3 +244,26 @@ The above will take the parameter value, add it by 4, then index it by two 8 byte words. It's the same in C as: (u64 *)((void *)param + 4)[2] Note: "int skb[32]" is the same as "int skb+4[31]". + + +Symbols (function names) + + +To display kallsyms "%pS" type of output, use the special type "symbol". + +Again, using gdb to find the offset of the "func" field of struct work_struct + +(gdb) printf "%d\n", &((struct work_struct *)0)->func +24 + + Both "symbol func[3]" and "symbol func+24[0]" will work. 
+ + # echo '__queue_work(int cpu, x64 wq, symbol func[3])' > function_events + + # echo 1 > events/functions/__queue_work/enable + # cat trace + bash-1641 [007] d..2 6241.171332: queue_work_on->__queue_work(cpu=128, wq=88011a010e00, func=flush_to_ldisc+0x0/0xa0) + bash-1641 [007] d..2 6241.171460: queue_work_on->__queue_work(cpu=128, wq=88011a010e00, func=flush_to_ldisc+0x0/0xa0) + -0 [000] dNs3 6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=88011a010800, func=vmstat_shepherd+0x0/0xb0) + worker/0:2-1689 [000] d..2 6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=88011a11da00, func=vmstat_update+0x0/0x70) + -0 [005] d.s3 6241.347996: queue_work_on->__queue_work(cpu=128, wq=88011a011200, func=fb_flashcursor+0x0/0x110 [fb]) diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index 0f2650e97e49..ba10177b9bd6 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -76,6 +76,7 @@ typedef u64 x64; typedef u32 x32; typedef u16 x16; typedef u8 x8; +typedef void * symbol; #define TYPE_TUPLE(type) \ { #type, sizeof(type), is_signed_type(type) } @@ -97,7 +98,8 @@ typedef u8 x8; TYPE_TUPLE(x16),\ TYPE_TUPLE(u8), \ TYPE_TUPLE(s8), \ - TYPE_TUPLE(x8) + TYPE_TUPLE(x8), \ + TYPE_TUPLE(symbol) static struct func_type { char*name; @@ -262,7 +264,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta switch (state) { case FUNC_STATE_INIT: unsign = 0; - if (!isalpha(token[0])) + if (!isalpha(token[0]) && token[0] != '_') break; /* Do not allow wild cards */ if (strstr(token, "*") || strstr(token, "?")) @@ -305,7 +307,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta return FUNC_STATE_TYPE; case FUNC_STATE_TYPE: - if (!isalpha(token[0])) + if (!isalpha(token[0]) || token[0] == '_') break; if (WARN_ON(!fevent->last_arg)) break; @@ -472,6 +474,11 @@ static void make_fmt(struct func_arg *arg, char *fmt) { int c = 0; + if (arg->func_type 
== FUNC_TYPE_symbol) { + strcpy(fmt, "%pS"); + return; + } + fmt[c++] = '%'; if (arg->size == 8) { -- 2.15.1
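Turning a byte offset reported by gdb into the bracket index used with "symbol" is a division by the pointer size, with any remainder going into a '+' offset. A minimal sketch, assuming a hypothetical struct sketch_work whose layout only resembles struct work_struct and a sketch_symbol typedef mirroring the patch's "typedef void * symbol":

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real struct work_struct layout depends
 * on the kernel configuration. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

struct sketch_work {
	unsigned long data;
	struct sketch_list entry;
	void (*func)(struct sketch_work *work);
};

typedef void *sketch_symbol;

/* Number to put between '[' and ']' for a pointer-sized field. */
static size_t sketch_symbol_index(size_t byte_offset)
{
	return byte_offset / sizeof(sketch_symbol);
}

/* Leftover bytes that would have to go after '+'. */
static size_t sketch_symbol_plus(size_t byte_offset)
{
	return byte_offset % sizeof(sketch_symbol);
}
```

offsetof(struct sketch_work, func) gives the same number as the gdb expression &((struct sketch_work *)0)->func. On a machine with 8 byte pointers, a byte offset of 24 yields index 3 with no remainder, matching "symbol func[3]".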
[PATCH 10/18] tracing: Make func_type enums for easier comparing of arg types
From: "Steven Rostedt (VMware)" For the function based event args, knowing quickly what type they are is advantageous, as decisions can be made quickly based on them. Having an enum for the types is useful for this purpose. Use macros to create both the func_type array as well as enums that match the type to the index into that array. Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_event_ftrace.c | 47 +-- 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c index 4c23fa18453d..0f2650e97e49 100644 --- a/kernel/trace/trace_event_ftrace.c +++ b/kernel/trace/trace_event_ftrace.c @@ -24,6 +24,7 @@ struct func_arg { short size; s8 arg; u8 sign; + u8 func_type; }; struct func_event { @@ -79,31 +80,42 @@ typedef u8 x8; #define TYPE_TUPLE(type) \ { #type, sizeof(type), is_signed_type(type) } +#define FUNC_TYPES \ + TYPE_TUPLE(long), \ + TYPE_TUPLE(int),\ + TYPE_TUPLE(short), \ + TYPE_TUPLE(char), \ + TYPE_TUPLE(size_t), \ + TYPE_TUPLE(u64),\ + TYPE_TUPLE(s64),\ + TYPE_TUPLE(x64),\ + TYPE_TUPLE(u32),\ + TYPE_TUPLE(s32),\ + TYPE_TUPLE(x32),\ + TYPE_TUPLE(u16),\ + TYPE_TUPLE(s16),\ + TYPE_TUPLE(x16),\ + TYPE_TUPLE(u8), \ + TYPE_TUPLE(s8), \ + TYPE_TUPLE(x8) + static struct func_type { char*name; int size; int sign; } func_types[] = { - TYPE_TUPLE(long), - TYPE_TUPLE(int), - TYPE_TUPLE(short), - TYPE_TUPLE(char), - TYPE_TUPLE(size_t), - TYPE_TUPLE(u64), - TYPE_TUPLE(s64), - TYPE_TUPLE(x64), - TYPE_TUPLE(u32), - TYPE_TUPLE(s32), - TYPE_TUPLE(x32), - TYPE_TUPLE(u16), - TYPE_TUPLE(s16), - TYPE_TUPLE(x16), - TYPE_TUPLE(u8), - TYPE_TUPLE(s8), - TYPE_TUPLE(x8), + FUNC_TYPES, { NULL, 0, 0 } }; +#undef TYPE_TUPLE +#define TYPE_TUPLE(type) FUNC_TYPE_##type + +enum { + FUNC_TYPES, + FUNC_TYPE_MAX +}; + /** * arch_get_func_args - retrieve function arguments via pt_regs * @regs: The registers at the moment the function is called @@ -228,6 +240,7 @@ static int add_arg(struct func_event *fevent, int 
ftype, int unsign) if (!unsign) arg->sign = func_type->sign; arg->offset = ALIGN(fevent->arg_offset, arg->size); + arg->func_type = ftype; arg->arg = fevent->arg_cnt; fevent->arg_offset = arg->offset + arg->size; -- 2.15.1
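The technique of expanding one list macro twice, once into a table and once into an enum, can be seen in isolation below. This is a shortened sketch: the SKETCH_* names are hypothetical, only four types are listed, and the sign column from the patch is left out.

```c
#include <stddef.h>
#include <string.h>

/* One list, written once. */
#define SKETCH_TYPES \
	SKETCH_TUPLE(long), \
	SKETCH_TUPLE(int), \
	SKETCH_TUPLE(short), \
	SKETCH_TUPLE(char)

/* First expansion: each entry becomes a { name, size } initializer. */
#define SKETCH_TUPLE(type) { #type, sizeof(type) }

static const struct sketch_type {
	const char *name;
	size_t size;
} sketch_types[] = {
	SKETCH_TYPES,
	{ NULL, 0 }	/* terminator, as in the patch */
};

/* Second expansion: each entry becomes an enum constant whose value
 * is the entry's index in sketch_types[]. */
#undef SKETCH_TUPLE
#define SKETCH_TUPLE(type) SKETCH_TYPE_##type

enum {
	SKETCH_TYPES,
	SKETCH_TYPE_MAX
};

#undef SKETCH_TUPLE

/* Look a type up by name; returns its index or -1 if unknown. */
static int sketch_find_type(const char *name)
{
	int i;

	for (i = 0; sketch_types[i].size; i++) {
		if (strcmp(name, sketch_types[i].name) == 0)
			return i;
	}
	return -1;
}
```

Since both expansions come from the same list, the table and the enum cannot drift apart when a type is added, which is what lets a later comparison such as "arg->func_type == FUNC_TYPE_symbol" replace a string compare.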
[PATCH 13/18] tracing: Add array type to function based events
From: "Steven Rostedt (VMware)" Add syntax to allow the user to create an array type. Brackets after the type field will denote that this is an array type. For example: # echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' > function_events Will make the first argument of the sys_open function call an array of 32 bytes. The array type can also be used in conjunction with the indirect offset brackets. For example, to get the interrupt stack of regs in do_IRQ() for x86_64: # echo 'do_IRQ(x64[5] regs[16])' > function_events Signed-off-by: Steven Rostedt (VMware) --- Documentation/trace/function-based-events.rst | 22 +++- kernel/trace/trace_event_ftrace.c | 157 +- 2 files changed, 151 insertions(+), 28 deletions(-) diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst index b0e6725f3032..4a8a6fb16a0a 100644 --- a/Documentation/trace/function-based-events.rst +++ b/Documentation/trace/function-based-events.rst @@ -93,7 +93,7 @@ as follows: ARG := TYPE FIELD | TYPE '=' ADDR | TYPE ADDR | ARG '|' ARG - TYPE := ATOM | 'unsigned' ATOM + TYPE := ATOM | ATOM '[' <number> ']' | 'unsigned' TYPE ATOM := 'u8' | 'u16' | 'u32' | 'u64' | 's8' | 's16' | 's32' | 's64' | @@ -305,3 +305,23 @@ Is the same as <idle>-0 [003] d..3 655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50) <idle>-0 [003] d..3 655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330) + +Array types +=== + +If there's a case where you want to see an array of a type, then you can +declare a type as an array by adding '[' number ']' after the type. + +To get the net_device perm_addr from the dev parameter: 
+
+ (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr
+558
+
+ # echo 'ip_rcv(x64 skb, x8[6] perm_addr+558)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+-0 [003] ..s3 219.813582: __netif_receive_skb_core->ip_rcv(skb=880118195e00, perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 219.813595: __netif_receive_skb_core->ip_rcv(skb=880118195e00, perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 220.115053: __netif_receive_skb_core->ip_rcv(skb=880118195c00, perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 220.115293: __netif_receive_skb_core->ip_rcv(skb=880118195c00, perm_addr=b4,b5,2f,ce,18,65)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 206114f192be..64e2d7dcfd18 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -20,6 +20,7 @@ struct func_arg {
 	char				*name;
 	long				indirect;
 	long				index;
+	short				array;
 	short				offset;
 	short				size;
 	s8				arg;
@@ -68,6 +69,9 @@ enum func_states {
 	FUNC_STATE_PIPE,
 	FUNC_STATE_PLUS,
 	FUNC_STATE_TYPE,
+	FUNC_STATE_ARRAY,
+	FUNC_STATE_ARRAY_SIZE,
+	FUNC_STATE_ARRAY_END,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
 	FUNC_STATE_END,
@@ -289,6 +293,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	static bool update_arg;
 	static int unsign;
 	unsigned long val;
+	char *type;
 	int ret;
 	int i;
 
@@ -339,6 +344,10 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		return FUNC_STATE_TYPE;
 
 	case FUNC_STATE_TYPE:
+		if (token[0] == '[')
+			return FUNC_STATE_ARRAY;
+		/* Fall through */
+	case FUNC_STATE_ARRAY_END:
 		if (WARN_ON(!fevent->last_arg))
 			break;
 		if (update_arg_name(fevent, token) < 0)
@@ -350,14 +359,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		update_arg = true;
 		return FUNC_STATE_VAR;
 
+	case FUNC_STATE_ARRAY:
 	case FUNC_STATE_BRACKET:
-		WARN_ON(!fevent->last_arg);
+		if (WARN_ON(!fevent->last_arg))
+			break;
 		ret = kstrtoul(token, 0, &val);
 		if (ret)
 			break;
-		val *= fevent->last_arg->size;
-		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
-		return FUNC_STATE_INDIRECT;
+		if (state == FUNC_STATE_BRACKET) {
+			val *= fevent->last_arg->size;
+			fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
+			return FUNC_STATE_INDIRECT;
+		}
+		if (val <= 0)
+			break;
+		fevent->last_a
[PATCH 15/18] tracing: Add string type for dynamic strings in function based events
From: "Steven Rostedt (VMware)"

Add a "string" type that will create a dynamic length string for the
event, this is the same as the __string() field in normal TRACE_EVENTS.

[ missing 'static' found by Fengguang Wu's kbuild test robot ]
Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 19 ++-
 kernel/trace/trace_event_ftrace.c | 183 +++---
 2 files changed, 181 insertions(+), 21 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 99ae77cd59e6..6c643ea749e7 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -99,7 +99,7 @@ as follows:
          's8' | 's16' | 's32' | 's64' |
          'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t' |
-         'symbol'
+         'symbol' | 'string'
 
  FIELD := | INDEX | OFFSET | OFFSET INDEX
 
@@ -342,3 +342,20 @@ the format "%s". If a nul is found, the output will stop. Use another type
  bash-1470 [003] ...2 980.678715: path_openat->link_path_walk(name=/lib64/ld-linux-x86-64.so.2)
  bash-1470 [003] ...2 980.678721: path_openat->link_path_walk(name=ld-2.24.so)
  bash-1470 [003] ...2 980.678978: path_lookupat->link_path_walk(name=/etc/ld.so.preload)
+
+
+Dynamic strings
+===============
+
+Static strings are fine, but they can waste a lot of memory in the ring buffer.
+The above allocated 64 bytes for a character array, but most of the output was
+less than 20 characters. Not wanting to truncate strings or waste space on
+the ring buffer, the dynamic string can help.
+
+Use the "string" type for strings that have a large range in size. The max
+size that will be recorded is 512 bytes. If a string is larger than that, then
+it will be truncated.
+
+ # echo 'link_path_walk(string name)' > function_events
+
+Gives the same result as above, but does not waste buffer space.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index dd24b840329d..273c5838a8e2 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -39,6 +39,7 @@ struct func_event {
 	struct func_arg		*last_arg;
 	int			arg_cnt;
 	int			arg_offset;
+	int			has_strings;
 };
 
 struct func_file {
@@ -83,6 +84,8 @@ typedef u32 x32;
 typedef u16 x16;
 typedef u8 x8;
 typedef void * symbol;
+/* 2 byte offset, 2 byte length */
+typedef u32 string;
 
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
@@ -105,7 +108,8 @@ typedef void * symbol;
 	TYPE_TUPLE(u8),				\
 	TYPE_TUPLE(s8),				\
 	TYPE_TUPLE(x8),				\
-	TYPE_TUPLE(symbol)
+	TYPE_TUPLE(symbol),			\
+	TYPE_TUPLE(string)
 
 static struct func_type {
 	char		*name;
@@ -124,6 +128,16 @@ enum {
 	FUNC_TYPE_MAX
 };
 
+#define MAX_STR		512
+
+/* Two contexts, normal and NMI, hence the " * 2" */
+struct func_string {
+	char		buf[MAX_STR * 2];
+};
+
+static struct func_string __percpu *str_buffer;
+static int nr_strings;
+
 /**
  * arch_get_func_args - retrieve function arguments via pt_regs
  * @regs: The registers at the moment the function is called
@@ -163,6 +177,23 @@ int __weak arch_get_func_args(struct pt_regs *regs,
 	return 0;
 }
 
+static void free_arg(struct func_arg *arg)
+{
+	list_del(&arg->list);
+	if (arg->func_type == FUNC_TYPE_string) {
+		nr_strings--;
+		if (WARN_ON(nr_strings < 0))
+			nr_strings = 0;
+		if (!nr_strings) {
+			free_percpu(str_buffer);
+			str_buffer = NULL;
+		}
+	}
+	kfree(arg->name);
+	kfree(arg->type);
+	kfree(arg);
+}
+
 static void free_func_event(struct func_event *func_event)
 {
 	struct func_arg *arg, *n;
@@ -171,10 +202,7 @@ static void free_func_event(struct func_event *func_event)
 		return;
 
 	list_for_each_entry_safe(arg, n, &func_event->args, list) {
-		list_del(&arg->list);
-		kfree(arg->name);
-		kfree(arg->type);
-		kfree(arg);
+		free_arg(arg);
 	}
 	ftrace_free_filter(&func_event->ops);
 	kfree(func_event->call.print_fmt);
@@ -255,6 +283,17 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 	list_add_tail(&arg->list, &fevent->args);
 	fevent->last_arg = arg;
 
+	if (ftype == FUNC_TYPE_string) {
+		fevent->has_strings++;
+		nr_strings++;
+		if (nr_strings == 1) {
+			str_buffer = alloc_
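
The patch describes the "string" field as a u32 holding a 2 byte offset and a 2 byte length, with at most 512 bytes recorded. A minimal user-space sketch of that kind of descriptor follows. The helper names are hypothetical, and the bit layout (offset in the low half, length in the high half) is an assumption, since the visible part of the patch does not show which half holds which value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cap taken from the patch text: at most 512 bytes are recorded. */
#define MAX_STR 512

/* Clamp a string length to the recordable maximum (longer strings truncate). */
static uint16_t str_clamp_len(size_t len)
{
	return len > MAX_STR ? MAX_STR : (uint16_t)len;
}

/*
 * Pack offset and length into one 32-bit descriptor.
 * ASSUMPTION: offset in bits 0-15, length in bits 16-31.
 */
static uint32_t str_desc_pack(uint16_t offset, uint16_t len)
{
	assert(len <= MAX_STR);
	return (uint32_t)offset | ((uint32_t)len << 16);
}

/* Extract the offset half of the descriptor. */
static uint16_t str_desc_offset(uint32_t desc)
{
	return (uint16_t)(desc & 0xffffu);
}

/* Extract the length half of the descriptor. */
static uint16_t str_desc_len(uint32_t desc)
{
	return (uint16_t)(desc >> 16);
}
```

With a fixed-size descriptor in the event record, the string bytes themselves can live at the end of the record, so a 12-character path costs roughly 16 bytes instead of the 64 reserved by char[64].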
[PATCH 14/18] tracing: Have char arrays be strings for function based events
From: "Steven Rostedt (VMware)"

If a field in a function based event is defined with type "char[##]" then
it will be considered a static string. If a user wants an actual byte array
they should use one of u8, s8, or x8.

Now we can get strings from events:

 # echo 'SyS_openat(int dfd, char[64] buf, x32 flags, x32 mode)' > function_events
 # grep xxx /etc/*
 # cat trace
 grep-1745 [001] 346135.431364: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/adjtime, flags=100, mode=0)
 grep-1745 [001] 346135.431734: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/aliases, flags=100, mode=0)
 grep-1745 [001] 346135.618765: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/alternatives, flags=100, mode=0)
 grep-1745 [001] 346135.619063: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/anacrontab, flags=100, mode=0)
 grep-1745 [001] 346135.619134: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/asciidoc, flags=100, mode=0)
 grep-1745 [001] 346135.619390: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/asound.conf, flags=100, mode=0)
 grep-1745 [001] 346135.624350: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/audisp, flags=100, mode=0)
 grep-1745 [001] 346135.624565: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/audit, flags=100, mode=0)

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 17 +
 kernel/trace/trace_event_ftrace.c | 21 +
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 4a8a6fb16a0a..99ae77cd59e6 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -325,3 +325,20 @@ To get the net_device perm_addr, from the dev parameter.
 -0 [003] ..s3 219.813595: __netif_receive_skb_core->ip_rcv(skb=880118195e00, perm_addr=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 220.115053: __netif_receive_skb_core->ip_rcv(skb=880118195c00, perm_addr=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 220.115293: __netif_receive_skb_core->ip_rcv(skb=880118195c00, perm_addr=b4,b5,2f,ce,18,65)
+
+
+Static strings
+==============
+
+An array of type 'char' or 'unsigned char' will be processed as a string using
+the format "%s". If a nul is found, the output will stop. Use another type
+(x8, u8, s8) if this is not desired.
+
+ # echo 'link_path_walk(char[64] name)' > function_events
+
+ # echo 1 > events/functions/link_path_walk/enable
+ # cat trace
+ bash-1470 [003] ...2 980.678664: path_openat->link_path_walk(name=/usr/bin/cat)
+ bash-1470 [003] ...2 980.678715: path_openat->link_path_walk(name=/lib64/ld-linux-x86-64.so.2)
+ bash-1470 [003] ...2 980.678721: path_openat->link_path_walk(name=ld-2.24.so)
+ bash-1470 [003] ...2 980.678978: path_lookupat->link_path_walk(name=/etc/ld.so.preload)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 64e2d7dcfd18..dd24b840329d 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -610,6 +610,14 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 
 	fmt[c++] = '%';
 
+	if (arg->func_type == FUNC_TYPE_char) {
+		if (arg->array)
+			fmt[c++] = 's';
+		else
+			fmt[c++] = 'c';
+		goto out;
+	}
+
 	if (arg->size == 8) {
 		fmt[c++] = 'l';
 		fmt[c++] = 'l';
@@ -622,6 +630,7 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 	else
 		fmt[c++] = 'u';
 
+ out:
 	fmt[c++] = '\0';
 }
 
@@ -639,7 +648,10 @@ static void write_data(struct trace_seq *s, const struct func_arg *arg, const ch
 		trace_seq_printf(s, fmt, *(unsigned short *)data);
 		break;
 	case 1:
-		trace_seq_printf(s, fmt, *(unsigned char *)data);
+		if (arg->array && arg->func_type == FUNC_TYPE_char)
+			trace_seq_printf(s, fmt, (char *)data);
+		else
+			trace_seq_printf(s, fmt, *(unsigned char *)data);
 		break;
 	}
 }
@@ -672,7 +684,7 @@ func_event_print(struct trace_iterator *iter, int flags,
 
 		make_fmt(arg, fmt);
 
-		if (arg->array) {
+		if (arg->array && arg->func_type != FUNC_TYPE_char) {
 			comma = false;
 			for (a = 0; a < arg->array; a++, data += arg->size) {
 				if (comma)
@@ -821,7 +833,7 @@ static int __set_print_fmt(struct func_event *func_event,
 
 		make_fmt(arg, fmt);
 
-		if (arg->array) {
+		if (arg->array && arg->func_type != FUNC_TYPE_char)
[PATCH 17/18] tracing: Add indirect to indirect access for function based events
From: "Steven Rostedt (VMware)"

Allow the function based events to retrieve not only the parameters offsets,
but also get data from a pointer within a parameter structure. Something
like:

 # echo 'ip_rcv(string skdev+16[0][0] | x8[6] skperm+16[0]+558)' > function_events

 # echo 1 > events/functions/ip_rcv/enable
 # cat trace
 -0 [003] ..s3 310.626391: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 310.626400: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.183775: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.184329: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.303895: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.304610: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.471980: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 312.472908: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
 -0 [003] ..s3 313.135804: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)

That is, we retrieved the net_device of the sk_buff and displayed its name
and perm_addr info.

  sk->dev->name, sk->dev->perm_addr

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 40 +-
 kernel/trace/trace_event_ftrace.c | 102 --
 2 files changed, 136 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index b90b52b7061d..3b341992b93d 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -101,12 +101,15 @@ as follows:
          'char' | 'short' | 'int' | 'long' | 'size_t' |
          'symbol' | 'string'
 
- FIELD := | INDEX | OFFSET | OFFSET INDEX
+ FIELD := | INDEX | OFFSET | OFFSET INDEX |
+          FIELD INDIRECT
 
  INDEX := '[' ']'
 
  OFFSET := '+'
 
+ INDIRECT := INDEX | OFFSET | INDIRECT INDIRECT | ''
+
  ADDR := A hexidecimal address starting with '0x'
 
  Where is a unique string starting with an alphabetic character
@@ -385,3 +388,38 @@ based event.
 NULL can appear in any argument, to have them ignored. Note, skipping arguments
 does not give you access to later arguments if they are not supported by the
 architecture. The architecture only supplies the first set of arguments.
+
+
+The chain of indirects
+======================
+
+When a parameter is a structure, and that structure points to another structure,
+the data of that structure can still be found.
+
+ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
+		   loff_t *pos)
+
+has the following code.
+
+	if (file->f_op->read)
+		return file->f_op->read(file, buf, count, pos);
+
+To trace all the functions that are called by f_op->read(), that information
+can be obtained from the file pointer.
+
+Using gdb again:
+
+ (gdb) printf "%d\n", &((struct file *)0)->f_op
+40
+ (gdb) printf "%d\n", &((struct file_operations *)0)->read
+16
+
+# echo '__vfs_read(symbol read+40[0]+16)' > function_events
+
+ # echo 1 > events/functions/__vfs_read/enable
+ # cat trace
+ sshd-1343 [005] ...2 199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+ bash-1344 [003] ...2 199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+ sshd-1343 [005] ...2 199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+ avahi-daemon-910 [003] ...2 200.136740: vfs_read->__vfs_read(read= (null))
+ avahi-daemon-910 [003] ...2 200.136750: vfs_read->__vfs_read(read= (null))
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 22bcb67ad184..b5b719680686 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -14,8 +14,15 @@
 #define WRITE_BUFSIZE  4096
 #define INDIRECT_FLAG	0x1000
 
+struct func_arg_redirect {
+	struct list_head		list;
+	long				index;
+	long				indirect;
+};
+
 struct func_arg {
 	struct list_head		list;
+	struct list_head		redirects;
 	char				*type;
 	char				*name;
 	long				indirect;
@@ -73,6 +80,8 @@ enum func_states {
 	FUNC_STATE_ARRAY,
 	FUNC_STATE_ARRAY_SIZE,
 	FUNC_STATE_ARRAY_END,
+	FUNC_STATE_REDIRECT_PLUS,
+	FUNC_STATE_REDIRECT_BRACKET,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
 	FUNC_STATE_NULL,
@@ -267,6 +276,8 @@ static int add_arg(struct func_event *fevent, int f
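
The `read+40[0]+16` expression above amounts to "add 40 to the argument, load the pointer found there, add 16, then read the field". A minimal user-space sketch of that walk follows. The `mock_file` and `mock_ops` structures and both helper functions are hypothetical stand-ins, not the kernel's types or the patch's implementation, and the final field is a data pointer rather than a function pointer to keep the example strictly portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct file_operations and struct file. */
struct mock_ops {
	void *owner;
	void *llseek;
	const void *read;		/* the field we want to reach */
};

struct mock_file {
	long pad[5];			/* filler in front of f_op */
	const struct mock_ops *f_op;
};

/* Load one pointer-sized value from base + offset. */
static const void *load_ptr(const void *base, size_t offset)
{
	const void *val;

	assert(base != NULL);
	memcpy(&val, (const char *)base + offset, sizeof(val));
	return val;
}

/*
 * Follow "arg+off1[0]+off2": dereference the pointer stored at
 * arg + off1, then read the pointer stored at that address + off2.
 */
static const void *follow_chain(const void *arg, size_t off1, size_t off2)
{
	const void *inner = load_ptr(arg, off1);

	return load_ptr(inner, off2);
}
```

A caller would pass `offsetof(struct mock_file, f_op)` and `offsetof(struct mock_ops, read)` as the two offsets, which is the same information the gdb `printf` commands extract for the real kernel structures.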
[PATCH 16/18] tracing: Add NULL to skip args for function based events
From: "Steven Rostedt (VMware)"

If args are to be skipped (only care about second, third or later
arguments) then add a NULL to ignore them. For example, if one only
wants to record the third argument of a function, they can perform:

 echo 'foo(NULL, NULL, u32 arg3)' > function_events

Then only the third argument is saved in the function based event.

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 28 +-
 kernel/trace/trace_event_ftrace.c | 34 ++-
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 6c643ea749e7..b90b52b7061d 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -91,7 +91,7 @@ as follows:
 
  ARGS := ARG | ARG ',' ARGS | ''
 
- ARG := TYPE FIELD | TYPE '=' ADDR | TYPE ADDR | ARG '|' ARG
+ ARG := TYPE FIELD | TYPE '=' ADDR | TYPE ADDR | ARG '|' ARG | 'NULL'
 
  TYPE := ATOM | ATOM '[' ']' | 'unsigned' TYPE
 
@@ -359,3 +359,29 @@ it will be truncated.
  # echo 'link_path_walk(string name)' > function_events
 
 Gives the same result as above, but does not waste buffer space.
+
+
+NULL arguments
+==============
+
+If you are only interested in the second, or later parameter of a function,
+you do not have to record the previous parameters. Just set them as NULL and
+they will not be recorded.
+
+If we only wanted the perm_addr of the net_device of ip_rcv() and not the
+sk_buff, we put a NULL into the first parameter when creating the function
+based event.
+
+ # echo 'ip_rcv(NULL, x8[6] perm_addr+558)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+-0 [003] ..s3 165.617114: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 165.617133: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 166.412277: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+-0 [003] ..s3 166.412797: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+
+
+NULL can appear in any argument, to have them ignored. Note, skipping arguments
+does not give you access to later arguments if they are not supported by the
+architecture. The architecture only supplies the first set of arguments.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 273c5838a8e2..22bcb67ad184 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -75,6 +75,7 @@ enum func_states {
 	FUNC_STATE_ARRAY_END,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
+	FUNC_STATE_NULL,
 	FUNC_STATE_END,
 	FUNC_STATE_ERROR,
 };
@@ -117,6 +118,7 @@ static struct func_type {
 	int		sign;
 } func_types[] = {
 	FUNC_TYPES,
+	{ "NULL",	0,	0 },
 	{ NULL,		0,	0 }
 };
 
@@ -125,6 +127,7 @@ static struct func_type {
 
 enum {
 	FUNC_TYPES,
+	FUNC_TYPE_NULL,
 	FUNC_TYPE_MAX
 };
 
@@ -364,6 +367,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		fevent->arg_cnt++;
 		update_arg = false;
 	case FUNC_STATE_PIPE:
+		if (strcmp(token, "NULL") == 0)
+			return FUNC_STATE_NULL;
 		if (strcmp(token, "unsigned") == 0) {
 			unsign = 2;
 			return FUNC_STATE_UNSIGNED;
@@ -513,6 +518,19 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		fevent->last_arg->indirect = INDIRECT_FLAG;
 		return FUNC_STATE_ADDR;
 
+	case FUNC_STATE_NULL:
+		ret = add_arg(fevent, FUNC_TYPE_NULL, 0);
+		if (ret < 0)
+			break;
+		switch (token[0]) {
+		case ')':
+			goto end;
+		case ',':
+			update_arg = true;
+			return FUNC_STATE_COMMA;
+		}
+		break;
+
 	default:
 		break;
 	}
@@ -689,6 +707,8 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	entry->parent_ip = parent_ip;
 
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
 		if (arg->arg < nr_args)
 			val = get_arg(arg, args);
 		else
@@ -811,6 +831,8 @@ func_event_print(struct trace_iterator *iter, int flags,
 	trace_seq_printf(s, "%ps->%ps(", (void *)entry->parent_ip, (void *)entry->ip);
 
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
[PATCH v2] x86: e820: Implement a range manipulation operator
Add a more versatile memmap= operator, which -- in addition to all the
things that were possible before -- allows you to:
- redeclare existing ranges -- before, you were limited to adding ranges;
- drop any range -- like a mem= for any location;
- use any e820 memory type -- not just some predefined ones.

The syntax is:

  memmap=<size>%<offset>-<oldtype>+<newtype>

Size and offset work as usual. The "-<oldtype>" and "+<newtype>" are
optional and their existence determines the behavior: The command
works on the specified range of memory limited to type <oldtype>
(if specified). This memory is then configured to show up as <newtype>.
If <newtype> is not specified, the memory is removed from the e820 map.

Signed-off-by: Jan H. Schönherr
---
v2: Small coding style and typography adjustments

 Documentation/admin-guide/kernel-parameters.txt | 9 +
 arch/x86/kernel/e820.c | 18 ++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b26bfee27b..60926ae3ec06 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2221,6 +2221,15 @@
 			The memory region may be marked as e820 type 12 (0xc)
 			and is NVDIMM or ADR memory.
 
+	memmap=<size>%<offset>-<oldtype>+<newtype>
+			[KNL,ACPI] Convert memory within the specified region
+			from <oldtype> to <newtype>. If "-<oldtype>" is left
+			out, the whole region will be marked as <newtype>,
+			even if previously unavailable. If "+<newtype>" is left
+			out, matching memory will be removed. Types are
+			specified as e820 types, e.g., 1 = RAM, 2 = reserved,
+			3 = ACPI, 12 = PRAM.
+
 	memory_corruption_check=0/1 [X86]
 			Some BIOSes seem to corrupt the first 64k of
 			memory when doing things like suspend/resume.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 71c11ad5643e..6a2cb1442e05 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -924,6 +924,24 @@ static int __init parse_memmap_one(char *p)
 	} else if (*p == '!') {
 		start_at = memparse(p+1, &p);
 		e820__range_add(start_at, mem_size, E820_TYPE_PRAM);
+	} else if (*p == '%') {
+		enum e820_type from = 0, to = 0;
+
+		start_at = memparse(p + 1, &p);
+		if (*p == '-')
+			from = simple_strtoull(p + 1, &p, 0);
+		if (*p == '+')
+			to = simple_strtoull(p + 1, &p, 0);
+		if (*p != '\0')
+			return -EINVAL;
+		if (from && to)
+			e820__range_update(start_at, mem_size, from, to);
+		else if (to)
+			e820__range_add(start_at, mem_size, to);
+		else if (from)
+			e820__range_remove(start_at, mem_size, from, 1);
+		else
+			e820__range_remove(start_at, mem_size, 0, 0);
 	} else {
 		e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1);
 	}
-- 
2.9.3.1.gcba166c.dirty
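
The four outcomes of the new '%' branch can be tried out in user space. The sketch below mirrors the branch's control flow only; `parse_size()` is a simplified stand-in for the kernel's memparse(), and the enum, struct, and function names are hypothetical and do not exist in e820.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical labels for the four e820 calls in the patch. */
enum memmap_action {
	MEMMAP_INVALID,		/* trailing garbage: the patch returns -EINVAL */
	MEMMAP_UPDATE,		/* from && to: e820__range_update()            */
	MEMMAP_ADD,		/* only to:    e820__range_add()               */
	MEMMAP_REMOVE_TYPE,	/* only from:  e820__range_remove(.., from, 1) */
	MEMMAP_REMOVE_ALL	/* neither:    e820__range_remove(.., 0, 0)    */
};

struct memmap_req {
	unsigned long long size, start;
	unsigned int from, to;
	enum memmap_action action;
};

/* Simplified stand-in for memparse(): a number plus optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse "size%start[-from][+to]" and classify it like the patch does. */
static struct memmap_req parse_memmap_percent(const char *arg)
{
	struct memmap_req r = { 0, 0, 0, 0, MEMMAP_INVALID };
	char *p;

	assert(arg != NULL);
	r.size = parse_size(arg, &p);
	if (*p != '%')
		return r;
	r.start = parse_size(p + 1, &p);
	if (*p == '-')
		r.from = (unsigned int)strtoull(p + 1, &p, 0);
	if (*p == '+')
		r.to = (unsigned int)strtoull(p + 1, &p, 0);
	if (*p != '\0')
		return r;

	if (r.from && r.to)
		r.action = MEMMAP_UPDATE;
	else if (r.to)
		r.action = MEMMAP_ADD;
	else if (r.from)
		r.action = MEMMAP_REMOVE_TYPE;
	else
		r.action = MEMMAP_REMOVE_ALL;
	return r;
}
```

One consequence of the zero-means-unset convention, visible in both the patch and this sketch, is that a type value of 0 cannot be requested explicitly: "-0" or "+0" behaves as if the operand had been left out.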
Re: [PATCH v2 1/2] ASoC: codecs: Add support for AK5558 ADC driver
On Fri, Feb 02, 2018 at 09:33:18PM +0200, Andy Shevchenko wrote:
> On Fri, Feb 2, 2018 at 6:20 PM, Daniel Baluta wrote:

> > +static int ak5558_set_dai_mute(struct snd_soc_dai *dai, int mute)
> > +{
> > +	struct snd_soc_codec *codec = dai->codec;
> > +	struct ak5558_priv *ak5558 = snd_soc_codec_get_drvdata(codec);
> >
> > +	int ndt = 0;
>
> It might be even
>
> int ndt = max(ak5558->fs ? 583000 / ak5558->fs : 5, 5);

Please don't encourage people to use the ternary operator like that, it
does nothing for legibility not to write out the conditionals.

> > +static const struct i2c_device_id ak5558_i2c_id[] = {
> > +	{ "ak5558", 0 },
> > +	{ }
> > +};
> > +MODULE_DEVICE_TABLE(i2c, ak5558_i2c_id);

> I dunno if it's really helpful to have. Though it's up to Mark and you.

I don't care either way.
[PATCH 00/18] [ANNOUNCE] Dynamically created function based events
At Kernel Summit back in October, we tried to bring up trace markers, which
would be nops within the kernel proper, that would allow modules to hook
arbitrary trace events to them. The reaction to this proposal was less than
favorable. We were told that we were trying to make a work around for a
problem, and not solving it. The problem in our minds is the notion of a
"stable trace event".

There are maintainers that do not want trace events, or more trace events in
their subsystems. This is due to the fact that trace events post an interface
to user space, and this interface could become required by some tool. This may
cause the trace event to become stable where it must not break the tool, and
thus prevent the code from changing.

Or, the trace event may just have to add padding for fields that tools may
require. The "success" field of the sched_wakeup trace event is one such
instance. There is no more "success" variable, but tools may fail if it were
to go away, so a "1" is simply added to the trace event wasting ring buffer
real estate.

I talked with Linus about this, and he told me that we already have these
markers in the kernel. They are from the mcount/__fentry__ used by function
tracing. Have the trace events be created by these, and see if this will
satisfy most areas that want trace events.

I decided to implement this idea, and here's the patch set.

Introducing "function based events". These are created dynamically by a
tracefs file called "function_events". By writing a pseudo prototype into this
file, you create an event.

 # mount -t tracefs nodev /sys/kernel/tracing
 # cd /sys/kernel/tracing
 # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
 # cat events/functions/do_IRQ/format
name: do_IRQ
ID: 1399
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
	field:symbol ip;	offset:24;	size:8;	signed:0;
	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;

print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip, REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4], REC->irq_stack[5]

 # echo 1 > events/functions/do_IRQ/enable
 # cat trace
 -0 [003] d..3 3647.049344: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.049433: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.049672: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.325709: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.325929: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.325993: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.387571: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.387791: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)
 -0 [003] d..3 3647.387874: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=81665db1,10,246,c96c3e80,18,88011eae9b40)

And this is much more powerful than just this. We can show strings, and
index off of structures into other structures.

 # echo '__vfs_read(symbol read+40[0]+16)' > function_events

 # echo 1 > events/functions/__vfs_read/enable
 # cat trace
 sshd-1343 [005] ...2 199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
 bash-1344 [003] ...2 199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
 sshd-1343 [005] ...2 199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
 avahi-daemon-910 [003] ...2 200.136740: vfs_read->__vfs_read(read= (null))
 avahi-daemon-910 [003] ...2 200.136750: vfs_read->__vfs_read(read= (null))

And even read user space:

 # echo 'SyS_openat(int dfd, str
[PATCH 05/18] tracing: Add hex print for dynamic ftrace based events
From: "Steven Rostedt (VMware)"

Add x64, x32, x16 and x8 to represent numbers of the same size in hex.
Similar to u64, u32, u16, and u8 but uses %x instead of %u.

Signed-off-by: Steven Rostedt (VMware)
---
 Documentation/trace/function-based-events.rst | 14 +-
 kernel/trace/trace_event_ftrace.c | 13 -
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 94c2c975295a..f27a0c4e829c 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -97,6 +97,7 @@ as follows:
 
  ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
          's8' | 's16' | 's32' | 's64' |
+         'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t'
 
  FIELD :=
@@ -116,11 +117,14 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 
 If we are only interested in the first argument (skb):
 
- # echo 'ip_rcv(u64 skb, u64 dev)' > function_events
+ # echo 'ip_rcv(x64 skb, x64 dev)' > function_events
 
  # echo 1 > events/functions/ip_rcv/enable
  # cat trace
- -0 [003] ..s3 2119.041935: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
- -0 [003] ..s3 2119.041944: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
- -0 [003] ..s3 2119.288337: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
- -0 [003] ..s3 2119.288960: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
+ -0 [003] ..s3 5543.133460: __netif_receive_skb_core->ip_rcv(skb=88007f960700, net=88011425)
+ -0 [003] ..s3 5543.133475: __netif_receive_skb_core->ip_rcv(skb=88007f960700, net=88011425)
+ -0 [003] ..s3 5543.312592: __netif_receive_skb_core->ip_rcv(skb=88007f960700, net=88011425)
+ -0 [003] ..s3 5543.313150: __netif_receive_skb_core->ip_rcv(skb=88007f960700, net=88011425)
+
+We use "x64" in order to make sure that the data is displayed in hex.
+This is on a x86_64 machine, and we know the pointer sizes are 8 bytes.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 66465be1e6d5..aa19c8af9d34 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -62,6 +62,11 @@ enum func_states {
 	FUNC_STATE_ERROR,
 };
 
+typedef u64 x64;
+typedef u32 x32;
+typedef u16 x16;
+typedef u8 x8;
+
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
 
@@ -77,12 +82,16 @@ static struct func_type {
 	TYPE_TUPLE(size_t),
 	TYPE_TUPLE(u64),
 	TYPE_TUPLE(s64),
+	TYPE_TUPLE(x64),
 	TYPE_TUPLE(u32),
 	TYPE_TUPLE(s32),
+	TYPE_TUPLE(x32),
 	TYPE_TUPLE(u16),
 	TYPE_TUPLE(s16),
+	TYPE_TUPLE(x16),
 	TYPE_TUPLE(u8),
 	TYPE_TUPLE(s8),
+	TYPE_TUPLE(x8),
 	{ NULL,	0,	0 }
 };
 
@@ -353,7 +362,9 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 		fmt[c++] = 'l';
 	}
 
-	if (arg->sign)
+	if (arg->type[0] == 'x')
+		fmt[c++] = 'x';
+	else if (arg->sign)
 		fmt[c++] = 'd';
 	else
 		fmt[c++] = 'u';
-- 
2.15.1
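
The conversion choice made in make_fmt() can be exercised outside the kernel. The sketch below copies only the logic visible in the hunks of this series (an "ll" length modifier for 8-byte values, then 'x', 'd' or 'u'); `struct fmt_arg` and `make_fmt_sketch()` are hypothetical names, and anything make_fmt() does outside those hunks is not reproduced.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down version of struct func_arg. */
struct fmt_arg {
	const char *type;	/* type name as written, e.g. "x64" */
	int size;		/* size in bytes */
	int sign;		/* non-zero for signed types */
};

/* Build a printf conversion for arg into fmt (needs at least 5 bytes). */
static void make_fmt_sketch(const struct fmt_arg *arg, char *fmt)
{
	int c = 0;

	assert(arg != NULL && arg->type != NULL && fmt != NULL);

	fmt[c++] = '%';

	/* 8-byte values need the "ll" length modifier. */
	if (arg->size == 8) {
		fmt[c++] = 'l';
		fmt[c++] = 'l';
	}

	/* Types whose name starts with 'x' print as hex. */
	if (arg->type[0] == 'x')
		fmt[c++] = 'x';
	else if (arg->sign)
		fmt[c++] = 'd';
	else
		fmt[c++] = 'u';

	fmt[c] = '\0';
}
```

Keying on the first character of the type name is what lets x64 share a size and signedness with u64 while still printing differently.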