Re: [PATCH] perf stat: fix csv output format
On Mon, Mar 05, 2018 at 10:43:53PM -0800, Cong Wang wrote:
> From: Ilya Pronin
>
> When printing stats in CSV mode, perf stat appends extra CSV
> separators when a counter is not supported:
>
> <not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00
>
> which causes a failure when parsing fields. The number of separators
> should be fixed for each line, whether or not the counter is supported.
>
> Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output")
> Cc: Andi Kleen
> Cc: Arnaldo Carvalho de Melo
> Cc: Jiri Olsa
> Signed-off-by: Ilya Pronin
> Signed-off-by: Cong Wang
> ---
>  tools/perf/builtin-stat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 98bf9d32f222..54a4c152edb3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -917,7 +917,7 @@ static void print_metric_csv(void *ctx,
>  	char buf[64], *vals, *ends;
>  
>  	if (unit == NULL || fmt == NULL) {
> -		fprintf(out, "%s%s%s%s", csv_sep, csv_sep, csv_sep, csv_sep);
> +		fprintf(out, "%s%s", csv_sep, csv_sep);
>  		return;
>  	}

Right, the non-else leg prints just 2 values:

	fprintf(out, "%s%s%s%s", csv_sep, vals, csv_sep, unit);

Acked-by: Jiri Olsa

thanks,
jirka
Re: [PATCH v4 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock
On 03/05/2018 12:41 PM, Aaron Lu wrote:
> On Fri, Mar 02, 2018 at 06:55:25PM +0100, Vlastimil Babka wrote:
>> On 03/01/2018 03:00 PM, Michal Hocko wrote:
>>>
>>> I am really surprised that this has such a big impact.
>>
>> It's even stranger to me. Struct page is 64 bytes these days, exactly
>> a cache line. Unless that changed, Intel CPUs prefetched a "buddy"
>> cache line (that forms an aligned 128 bytes block with the one we
>> touch). Which is exactly a order-0 buddy struct page! Maybe that
>> implicit prefetching stopped at L2 and explicit goes all the way to
>> L1, can't remember.
>
> The Intel Architecture Optimization Manual section 7.3.2 says:
>
> prefetchT0 - fetch data into all cache levels
> Intel Xeon Processors based on Nehalem, Westmere, Sandy Bridge and
> newer microarchitectures: 1st, 2nd and 3rd level cache.
>
> prefetchT2 - fetch data into 2nd and 3rd level caches (identical to
> prefetchT1)
> Intel Xeon Processors based on Nehalem, Westmere, Sandy Bridge and
> newer microarchitectures: 2nd and 3rd level cache.
>
> prefetchNTA - fetch data into non-temporal cache close to the
> processor, minimizing cache pollution
> Intel Xeon Processors based on Nehalem, Westmere, Sandy Bridge and
> newer microarchitectures: must fetch into 3rd level cache with fast
> replacement.
>
> I tried 'prefetcht0' and 'prefetcht2' instead of the default
> 'prefetchNTA' on a 2-socket Intel Skylake; the two ended up with about
> the same performance number as prefetchNTA. I had expected prefetchT0
> to deliver a better score if it was indeed due to L1D, since
> prefetchT2 will not place data into L1 while prefetchT0 will, but it
> looks like that is not the case here.
>
> It feels more like the buddy cacheline isn't in any level of the
> caches without prefetch for some reason.

So the adjacent line prefetch might be disabled? Could you check BIOS
or the MSR mentioned in
https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors

>> Would that make such a difference? It would be nice to do some
>> perf tests with cache counters to see what is really going on...
>
> Comparing prefetchT2 to no-prefetch, I saw these metrics change:
>
> no-prefetch    change     prefetchT2        metrics
> 0.18           +0.0       0.18              perf-stat.branch-miss-rate%
> 8.268e+09      +3.8%      8.585e+09         perf-stat.branch-misses
> 2.333e+10      +4.7%      2.443e+10         perf-stat.cache-misses
> 2.402e+11      +5.0%      2.522e+11         perf-stat.cache-references
> 3.52           -1.1%      3.48              perf-stat.cpi
> 0.02           -0.0       0.01 ±3%          perf-stat.dTLB-load-miss-rate%
> 8.677e+08      -7.3%      8.048e+08 ±3%     perf-stat.dTLB-load-misses
> 1.18           +0.0       1.19              perf-stat.dTLB-store-miss-rate%
> 2.359e+10      +6.0%      2.502e+10         perf-stat.dTLB-store-misses
> 1.979e+12      +5.0%      2.078e+12         perf-stat.dTLB-stores
> 6.126e+09      +10.1%     6.745e+09 ±3%     perf-stat.iTLB-load-misses
> 3464           -8.4%      3172 ±3%          perf-stat.instructions-per-iTLB-miss
> 0.28           +1.1%      0.29              perf-stat.ipc
> 2.929e+09      +5.1%      3.077e+09         perf-stat.minor-faults
> 9.244e+09      +4.7%      9.681e+09         perf-stat.node-loads
> 2.491e+08      +5.8%      2.634e+08         perf-stat.node-store-misses
> 6.472e+09      +6.1%      6.869e+09         perf-stat.node-stores
> 2.929e+09      +5.1%      3.077e+09         perf-stat.page-faults
> 2182469        -4.2%      2090977           perf-stat.path-length
>
> Not sure if this is useful though...

Looks like most stats increased in absolute values as the work done
increased, and this is a time-limited benchmark? Although the number of
instructions (calculated from iTLB misses and insns-per-iTLB-miss)
shows less than 1% increase, so dunno. And the improvement comes from
reduced dTLB-load-misses? That makes no sense for order-0 buddy struct
pages, which always share a page. And the memmap mapping should use
huge pages.

BTW what is path-length?
Re: [PATCH v2 0/2] perf sched map: re-annotate shortname if thread comm changed
On Tue, Mar 06, 2018 at 11:37:35AM +0800, changbin...@intel.com wrote:
> From: Changbin Du
>
> v2:
>   o add a patch to move thread::shortname to thread_runtime
>   o add function perf_sched__process_comm() to process PERF_RECORD_COMM event.
>
> Changbin Du (2):
>   perf sched: move thread::shortname to thread_runtime
>   perf sched map: re-annotate shortname if thread comm changed

Acked-by: Jiri Olsa

thanks,
jirka

>  tools/perf/builtin-sched.c | 132 ++---
>  tools/perf/util/thread.h   |   1 -
>  2 files changed, 90 insertions(+), 43 deletions(-)
>
> --
> 2.7.4
Re: [PATCH] perf report: Provide libtraceevent with a kernel symbol resolver
On Thu, Feb 08, 2018 at 01:20:31PM +0100, Jiri Olsa wrote:
> On Mon, Jan 15, 2018 at 12:47:32PM +0800, Wang YanQing wrote:
> > So that beautifiers wanting to resolve kernel function addresses to
> > names can do their work, and when we use "perf report" for output of
> > "perf kmem record", we will get kernel symbol output.
> >
> > Signed-off-by: Wang YanQing
>
> Acked-by: Jiri Olsa

Hi, Arnaldo Carvalho de Melo.

What is the status of this patch now? Has it sunk to the bottom of
your mailbox?

Thanks!
[PATCH v4 2/3] Input: gpio-keys - allow setting wakeup event action in DT
Allow specifying event actions to trigger wakeup when using the
gpio-keys input device as a wakeup source.

Reviewed-by: Rob Herring
Signed-off-by: Jeffy Chen
---
Changes in v4: None
Changes in v3: None
Changes in v2:
  Specify wakeup event action instead of irq trigger type as Brian
  suggested.

 Documentation/devicetree/bindings/input/gpio-keys.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/input/gpio-keys.txt b/Documentation/devicetree/bindings/input/gpio-keys.txt
index a94940481e55..996ce84352cb 100644
--- a/Documentation/devicetree/bindings/input/gpio-keys.txt
+++ b/Documentation/devicetree/bindings/input/gpio-keys.txt
@@ -26,6 +26,14 @@ Optional subnode-properties:
 	  If not specified defaults to 5.
 	- wakeup-source: Boolean, button can wake-up the system.
 			 (Legacy property supported: "gpio-key,wakeup")
+	- wakeup-event-action: Specifies whether the key should wake the
+	  system when asserted, when deasserted, or both. This property is
+	  only valid for keys that wake up the system (e.g., when the
+	  "wakeup-source" property is also provided).
+	  Supported values are defined in linux-event-codes.h:
+		EV_ACT_ASSERTED		- asserted
+		EV_ACT_DEASSERTED	- deasserted
+		EV_ACT_ANY		- both asserted and deasserted
 	- linux,can-disable: Boolean, indicates that button is connected
 	  to dedicated (not shared) interrupt which can be disabled to
 	  suppress events from the button.
--
2.11.0
[PATCH v4 3/3] arm64: dts: rockchip: kevin: Avoid wakeup when inserting the pen
Add a wakeup event action for the Pen Insert gpio key, to avoid waking
up when inserting the pen.

Signed-off-by: Jeffy Chen
Tested-by: Enric Balletbo i Serra
---
Changes in v4:
  Include dt-binding gpio-keys.h
Changes in v3: None
Changes in v2:
  Specify wakeup event action instead of irq trigger type as Brian
  suggested.

 arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
index 191a6bcb1704..89126dbe5d91 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts
@@ -44,6 +44,7 @@
 /dts-v1/;
 #include "rk3399-gru.dtsi"
+#include <dt-bindings/input/gpio-keys.h>
 #include
 
 /*
@@ -134,6 +135,8 @@
 		gpios = <&gpio0 13 GPIO_ACTIVE_LOW>;
 		linux,code = ;
 		linux,input-type = ;
+		/* Wakeup only when ejecting */
+		wakeup-event-action = <EV_ACT_DEASSERTED>;
 		wakeup-source;
 	};
 };
--
2.11.0
Re: [PATCH 1/3 RESEND] tpm: add longer timeouts for creation commands.
On Mon, Mar 05, 2018 at 01:09:09PM +, Winkler, Tomas wrote:
> Why do you need a cover letter? What are you missing in the patch
> description?

If you submit a *patch set* I *require* a cover letter, yes.

/Jarkko
[PATCH v4 1/3] Input: gpio-keys - add support for wakeup event action
Add support for specifying event actions to trigger wakeup when using
the gpio-keys input device as a wakeup source. This would allow the
device to configure when to wake up the system. For example, a
gpio-keys input device for pen insert may only want to wake up the
system when ejecting the pen.

Suggested-by: Brian Norris
Signed-off-by: Jeffy Chen
---
Changes in v4:
  Add dt-binding gpio-keys.h, stop saving irq trigger type, add
  enable/disable wakeup helpers as Dmitry suggested.
Changes in v3:
  Adding more comments as Brian suggested.
Changes in v2:
  Specify wakeup event action instead of irq trigger type as Brian
  suggested.

 drivers/input/keyboard/gpio_keys.c    | 67 +++++++++++++++++++++++++++--
 include/dt-bindings/input/gpio-keys.h | 13 +++++++
 include/linux/gpio_keys.h             |  2 ++
 3 files changed, 79 insertions(+), 3 deletions(-)
 create mode 100644 include/dt-bindings/input/gpio-keys.h

diff --git a/drivers/input/keyboard/gpio_keys.c b/drivers/input/keyboard/gpio_keys.c
index 87e613dc33b8..4bc23648b6a7 100644
--- a/drivers/input/keyboard/gpio_keys.c
+++ b/drivers/input/keyboard/gpio_keys.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 struct gpio_button_data {
 	const struct gpio_keys_button *button;
@@ -45,10 +46,12 @@ struct gpio_button_data {
 	unsigned int software_debounce;	/* in msecs, for GPIO-driven buttons */
 	unsigned int irq;
+	unsigned int wakeup_trigger_type;
 	spinlock_t lock;
 	bool disabled;
 	bool key_pressed;
 	bool suspended;
+	bool wakeup_enabled;
 };
 
 struct gpio_keys_drvdata {
@@ -540,6 +543,8 @@ static int gpio_keys_setup_key(struct platform_device *pdev,
 	}
 
 	if (bdata->gpiod) {
+		int active_low = gpiod_is_active_low(bdata->gpiod);
+
 		if (button->debounce_interval) {
 			error = gpiod_set_debounce(bdata->gpiod,
 					button->debounce_interval * 1000);
@@ -568,6 +573,24 @@ static int gpio_keys_setup_key(struct platform_device *pdev,
 
 		isr = gpio_keys_gpio_isr;
 		irqflags = IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING;
+
+		switch (button->wakeup_event_action) {
+		case EV_ACT_ASSERTED:
+			bdata->wakeup_trigger_type = active_low ?
+				IRQ_TYPE_EDGE_FALLING : IRQ_TYPE_EDGE_RISING;
+			break;
+		case EV_ACT_DEASSERTED:
+			bdata->wakeup_trigger_type = active_low ?
+				IRQ_TYPE_EDGE_RISING : IRQ_TYPE_EDGE_FALLING;
+			break;
+		case EV_ACT_ANY:
+			/* fall through */
+		default:
+			/*
+			 * For other cases, we are OK letting suspend/resume
+			 * not reconfigure the trigger type.
+			 */
+			break;
+		}
 	} else {
 		if (!button->irq) {
 			dev_err(dev, "Found button without gpio or irq\n");
@@ -586,6 +609,11 @@ static int gpio_keys_setup_key(struct platform_device *pdev,
 
 		isr = gpio_keys_irq_isr;
 		irqflags = 0;
+
+		/*
+		 * For IRQ buttons, there is no interrupt for release.
+		 * So we don't need to reconfigure the trigger type for wakeup.
+		 */
 	}
 
 	bdata->code = &ddata->keymap[idx];
@@ -718,6 +746,9 @@ gpio_keys_get_devtree_pdata(struct device *dev)
 			/* legacy name */
 			fwnode_property_read_bool(child, "gpio-key,wakeup");
 
+		fwnode_property_read_u32(child, "wakeup-event-action",
+					 &button->wakeup_event_action);
+
 		button->can_disable =
 			fwnode_property_read_bool(child, "linux,can-disable");
 
@@ -845,6 +876,31 @@ static int gpio_keys_probe(struct platform_device *pdev)
 	return 0;
 }
 
+static int gpio_keys_enable_wakeup(struct gpio_button_data *bdata)
+{
+	int ret;
+
+	ret = enable_irq_wake(bdata->irq);
+	if (ret)
+		return ret;
+
+	if (bdata->wakeup_trigger_type)
+		irq_set_irq_type(bdata->irq, bdata->wakeup_trigger_type);
+
+	return 0;
+}
+
+static void gpio_keys_disable_wakeup(struct gpio_button_data *bdata)
+{
+	/*
+	 * The trigger type is always both edges for gpio-based keys and we do
+	 * not support changing wakeup trigger for interrupt-based keys.
+	 */
+	if (bdata->wakeup_trigger_type)
+		irq_set_irq_type(bdata->irq, IRQ_TYPE_EDGE_BOTH);
+	disable_irq_wake(bdata->irq);
+}
+
 static int __maybe_unused gpio_keys_suspend(struct device *dev)
 {
 	struct gpio_keys_drvdata *ddata = dev_get_drvdata(dev);
@@ -854,8 +910,10 @@ static int
[PATCH v4 0/3] gpio-keys: Add support for specifying wakeup event action
On chromebook kevin, we are using gpio-keys for the pen insert event,
but we only want it to wake up the system when ejecting the pen. So we
may need to change the interrupt trigger type during suspend.

Changes in v4:
  Add dt-binding gpio-keys.h, stop saving irq trigger type, add
  enable/disable wakeup helpers as Dmitry suggested.
  Include dt-binding gpio-keys.h
Changes in v3:
  Adding more comments as Brian suggested.
Changes in v2:
  Specify wakeup event action instead of irq trigger type as Brian
  suggested.

Jeffy Chen (3):
  Input: gpio-keys - add support for wakeup event action
  Input: gpio-keys - allow setting wakeup event action in DT
  arm64: dts: rockchip: kevin: Avoid wakeup when inserting the pen

 .../devicetree/bindings/input/gpio-keys.txt       |  8 +++
 arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts |  3 +
 drivers/input/keyboard/gpio_keys.c                | 67 +++++++++++++++++++-
 include/dt-bindings/input/gpio-keys.h             | 13 +++++
 include/linux/gpio_keys.h                         |  2 +
 5 files changed, 90 insertions(+), 3 deletions(-)
 create mode 100644 include/dt-bindings/input/gpio-keys.h

--
2.11.0
Re: [PATCH v3 1/3] Input: gpio-keys - add support for wakeup event action
Hi Dmitry,

Thanks for your review.

On 03/06/2018 08:30 AM, Dmitry Torokhov wrote:
>> +		switch (button->wakeup_event_action) {
>> +		case EV_ACT_ASSERTED:
>> +			bdata->wakeup_trigger_type = active_low ?
>> +				IRQF_TRIGGER_FALLING : IRQF_TRIGGER_RISING;
>
> 				IRQ_TYPE_EDGE_FALLING : IRQ_TYPE_EDGE_RISING;

ok, will fix in next version

>> +			break;
>> +		case EV_ACT_DEASSERTED:
>> +			bdata->wakeup_trigger_type = active_low ?
>> +				IRQF_TRIGGER_RISING : IRQF_TRIGGER_FALLING;
>> +			break;
>
> 		case EV_ACT_ANY:

ok, will fix in next version

>> +		default:
>> +			/*
>> +			 * For other cases, we are OK letting suspend/resume
>> +			 * not reconfigure the trigger type.
>> +			 */
>> +			break;
>> +		}
>> 	} else {
>> 		if (!button->irq) {
>> 			dev_err(dev, "Found button without gpio or irq\n");
>> @@ -586,6 +606,12 @@ static int gpio_keys_setup_key(struct platform_device *pdev,
>>
>> 		isr = gpio_keys_irq_isr;
>> 		irqflags = 0;
>> +
>> +		/*
>> +		 * For IRQ buttons, the irq trigger type for press and release
>> +		 * are the same. So we don't need to reconfigure the trigger
>> +		 * type for wakeup.
>
> That is not entirely accurate. Interrupt triggers button press, which
> is followed by either immediate or delayed release. There is no
> interrupt for release.

ok, will fix the comment

>> +		 */
>> 	}
>>
>> 	bdata->code = &ddata->keymap[idx];
>> @@ -618,6 +644,8 @@ static int gpio_keys_setup_key(struct platform_device *pdev,
>> 		return error;
>> 	}
>>
>> +	bdata->irq_trigger_type = irq_get_trigger_type(bdata->irq);
>
> Why do we need to store the trigger type? It is always both edges for
> gpio-based keys and we do not support changing wakeup trigger for
> interrupt-based keys.

right, this is not needed.

>> +
>> 	return 0;
>>  }
>>
>> @@ -718,6 +746,9 @@ gpio_keys_get_devtree_pdata(struct device *dev)
>> 		/* legacy name */
>> 		fwnode_property_read_bool(child, "gpio-key,wakeup");
>>
>> +	fwnode_property_read_u32(child, "wakeup-event-action",
>> +				 &button->wakeup_event_action);
>> +
>> 	button->can_disable =
>> 		fwnode_property_read_bool(child, "linux,can-disable");
>>
>> @@ -854,6 +885,10 @@ static int __maybe_unused gpio_keys_suspend(struct device *dev)
>> 	if (device_may_wakeup(dev)) {
>> 		for (i = 0; i < ddata->pdata->nbuttons; i++) {
>> 			struct gpio_button_data *bdata = &ddata->data[i];
>> +
>> +			if (bdata->button->wakeup && bdata->wakeup_trigger_type)
>> +				irq_set_irq_type(bdata->irq,
>> +						 bdata->wakeup_trigger_type);
>
> 	if (bdata->button->wakeup) {
> 		if (bdata->wakeup_trigger_type) {
> 			error = ...;
> 		}
> 		enable_irq_wake(bdata->irq);
> 	}
>
> Might need to be split into a helper; if you add error handling to
> enable_irq_wake() that would be great too.

ok, will do that.

>> 			if (bdata->button->wakeup)
>> 				enable_irq_wake(bdata->irq);
>> 			bdata->suspended = true;
>> @@ -878,6 +913,10 @@ static int __maybe_unused gpio_keys_resume(struct device *dev)
>> 	if (device_may_wakeup(dev)) {
>> 		for (i = 0; i < ddata->pdata->nbuttons; i++) {
>> 			struct gpio_button_data *bdata = &ddata->data[i];
>> +
>> +			if (bdata->button->wakeup && bdata->wakeup_trigger_type)
>> +				irq_set_irq_type(bdata->irq,
>> +						 bdata->irq_trigger_type);
>
> Just use IRQ_TYPE_EDGE_BOTH.

>> 			if (bdata->button->wakeup)
>> 				disable_irq_wake(bdata->irq);
>> 			bdata->suspended = false;
>> diff --git a/include/linux/gpio_keys.h b/include/linux/gpio_keys.h
>> index d06bf77400f1..7160df54a6fe 100644
>> --- a/include/linux/gpio_keys.h
>> +++ b/include/linux/gpio_keys.h
>> @@ -13,6 +13,7 @@ struct device;
>>   * @desc:		label that will be attached to button's gpio
>>   * @type:		input event type (%EV_KEY, %EV_SW, %EV_ABS)
>>   * @wakeup:		configure the button as a wake-up source
>> + * @wakeup_event_action:	event action to trigger wakeup
>>   * @debounce_interval:	debounce ticks interval in msecs
>>   * @can_disable:	%true indicates that userspace is allowed to
>>   *			disable button via sysfs
>> @@ -26,6 +27,7 @@ struct gpio_keys_button {
>> 	const char *desc;
>> 	unsigned int type;
>> 	int wakeup;
>> +	int wakeup_even
Re: [PATCH 8/9] drm/xen-front: Implement GEM operations
On 03/06/2018 09:26 AM, Daniel Vetter wrote: On Mon, Mar 05, 2018 at 03:46:07PM +0200, Oleksandr Andrushchenko wrote: On 03/05/2018 11:32 AM, Daniel Vetter wrote: On Wed, Feb 21, 2018 at 10:03:41AM +0200, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Implement GEM handling depending on driver mode of operation: depending on the requirements for the para-virtualized environment, namely requirements dictated by the accompanying DRM/(v)GPU drivers running in both host and guest environments, number of operating modes of para-virtualized display driver are supported: - display buffers can be allocated by either frontend driver or backend - display buffers can be allocated to be contiguous in memory or not Note! Frontend driver itself has no dependency on contiguous memory for its operation. 1. Buffers allocated by the frontend driver. The below modes of operation are configured at compile-time via frontend driver's kernel configuration. 1.1. Front driver configured to use GEM CMA helpers This use-case is useful when used with accompanying DRM/vGPU driver in guest domain which was designed to only work with contiguous buffers, e.g. DRM driver based on GEM CMA helpers: such drivers can only import contiguous PRIME buffers, thus requiring frontend driver to provide such. In order to implement this mode of operation para-virtualized frontend driver can be configured to use GEM CMA helpers. 1.2. Front driver doesn't use GEM CMA If accompanying drivers can cope with non-contiguous memory then, to lower pressure on CMA subsystem of the kernel, driver can allocate buffers from system memory. Note! If used with accompanying DRM/(v)GPU drivers this mode of operation may require IOMMU support on the platform, so accompanying DRM/vGPU hardware can still reach display buffer memory while importing PRIME buffers from the frontend driver. 2. 
Buffers allocated by the backend This mode of operation is run-time configured via guest domain configuration through XenStore entries. For systems which do not provide IOMMU support, but having specific requirements for display buffers it is possible to allocate such buffers at backend side and share those with the frontend. For example, if host domain is 1:1 mapped and has DRM/GPU hardware expecting physically contiguous memory, this allows implementing zero-copying use-cases. Note! Configuration options 1.1 (contiguous display buffers) and 2 (backend allocated buffers) are not supported at the same time. Signed-off-by: Oleksandr Andrushchenko Some suggestions below for some larger cleanup work. -Daniel --- drivers/gpu/drm/xen/Kconfig | 13 + drivers/gpu/drm/xen/Makefile| 6 + drivers/gpu/drm/xen/xen_drm_front.h | 74 ++ drivers/gpu/drm/xen/xen_drm_front_drv.c | 80 ++- drivers/gpu/drm/xen/xen_drm_front_drv.h | 1 + drivers/gpu/drm/xen/xen_drm_front_gem.c | 360 drivers/gpu/drm/xen/xen_drm_front_gem.h | 46 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 93 +++ 8 files changed, 667 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem.c create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem.h create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig index 4cca160782ab..4f4abc91f3b6 100644 --- a/drivers/gpu/drm/xen/Kconfig +++ b/drivers/gpu/drm/xen/Kconfig @@ -15,3 +15,16 @@ config DRM_XEN_FRONTEND help Choose this option if you want to enable a para-virtualized frontend DRM/KMS driver for Xen guest OSes. + +config DRM_XEN_FRONTEND_CMA + bool "Use DRM CMA to allocate dumb buffers" + depends on DRM_XEN_FRONTEND + select DRM_KMS_CMA_HELPER + select DRM_GEM_CMA_HELPER + help + Use DRM CMA helpers to allocate display buffers. + This is useful for the use-cases when guest driver needs to + share or export buffers to other drivers which only expect + contiguous buffers. 
+ Note: in this mode driver cannot use buffers allocated + by the backend. diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile index 4fcb0da1a9c5..12376ec78fbc 100644 --- a/drivers/gpu/drm/xen/Makefile +++ b/drivers/gpu/drm/xen/Makefile @@ -8,4 +8,10 @@ drm_xen_front-objs := xen_drm_front.o \ xen_drm_front_shbuf.o \ xen_drm_front_cfg.o +ifeq ($(CONFIG_DRM_XEN_FRONTEND_CMA),y) + drm_xen_front-objs += xen_drm_front_gem_cma.o +else + drm_xen_front-objs += xen_drm_front_gem.o +endif + obj-$(CONFIG_DRM_XEN_FRONTEND) += drm_xen_front.o diff --git a/drivers/gpu/drm/xen/xen_drm_front.h b/drivers/gpu/drm/xen/xen_drm_front.h index 9ed5bfb248d0..c6f52c892434 100644 --- a/drivers/gpu/drm/xen/xen_drm_front
Re: [PATCH 1/4] drm/atomic: integrate modeset lock with private objects
On Tue, Mar 06, 2018 at 08:29:20AM +0100, Daniel Vetter wrote: > On Wed, Feb 21, 2018 at 04:19:40PM +0100, Maarten Lankhorst wrote: > > Hey, > > > > Op 21-02-18 om 15:37 schreef Rob Clark: > > > Follow the same pattern of locking as with other state objects. This > > > avoids boilerplate in the driver. > > I'm afraid this will prohibit any uses of this on i915, since it still uses > > legacy lock_all(). > > > > Oh well, afaict nothing in i915 uses private objects, so I don't think it's > > harmful. :) > > We do use private objects, as part of dp mst helpers. But I also thought > that the only users left of lock_all are in the debugfs code, where this > really doesn't matter all that much. Correction, we use it in other places than debugfs. But thanks to Ville's private state obj refactoring we now have drm_atomic_private_obj_init(), so it's easy to add all the private state objects to a new list in drm_dev->mode_config.private_states or so, and use that list in drm_modeset_lock_all_ctx to also take driver private locks. I think that would actually be useful in other places, just in case. -Daniel > > > Could you cc intel-gfx just in case? > > Yeah, best to double-check with CI. 
> > > > Signed-off-by: Rob Clark > > > --- > > > drivers/gpu/drm/drm_atomic.c | 9 - > > > include/drm/drm_atomic.h | 5 + > > > 2 files changed, 13 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c > > > index fc8c4da409ff..004e621ab307 100644 > > > --- a/drivers/gpu/drm/drm_atomic.c > > > +++ b/drivers/gpu/drm/drm_atomic.c > > > @@ -1078,6 +1078,8 @@ drm_atomic_private_obj_init(struct drm_private_obj > > > *obj, > > > { > > > memset(obj, 0, sizeof(*obj)); > > > > > > + drm_modeset_lock_init(&obj->lock); > > > + > > > obj->state = state; > > > obj->funcs = funcs; > > > } > > > @@ -1093,6 +1095,7 @@ void > > > drm_atomic_private_obj_fini(struct drm_private_obj *obj) > > > { > > > obj->funcs->atomic_destroy_state(obj, obj->state); > > > + drm_modeset_lock_fini(&obj->lock); > > > } > > > EXPORT_SYMBOL(drm_atomic_private_obj_fini); > > > > > > @@ -1113,7 +1116,7 @@ struct drm_private_state * > > > drm_atomic_get_private_obj_state(struct drm_atomic_state *state, > > >struct drm_private_obj *obj) > > > { > > > - int index, num_objs, i; > > > + int index, num_objs, i, ret; > > > size_t size; > > > struct __drm_private_objs_state *arr; > > > struct drm_private_state *obj_state; > > > @@ -1122,6 +1125,10 @@ drm_atomic_get_private_obj_state(struct > > > drm_atomic_state *state, > > > if (obj == state->private_objs[i].ptr) > > > return state->private_objs[i].state; > > > > > > + ret = drm_modeset_lock(&obj->lock, state->acquire_ctx); > > > + if (ret) > > > + return ERR_PTR(ret); > > > + > > > num_objs = state->num_private_objs + 1; > > > size = sizeof(*state->private_objs) * num_objs; > > > arr = krealloc(state->private_objs, size, GFP_KERNEL); > > > diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h > > > index 09076a625637..9ae53b73c9d2 100644 > > > --- a/include/drm/drm_atomic.h > > > +++ b/include/drm/drm_atomic.h > > > @@ -218,6 +218,11 @@ struct drm_private_state_funcs { > > > * 
&drm_modeset_lock is required to duplicate and update this object's > > > state. > > > */ > > > struct drm_private_obj { > > > + /** > > > + * @lock: Modeset lock to protect the state object. > > > + */ > > > + struct drm_modeset_lock lock; > > > + > > > /** > > >* @state: Current atomic state for this driver private object. > > >*/ > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 1/4] drm/atomic: integrate modeset lock with private objects
On Wed, Feb 21, 2018 at 06:36:54PM +0200, Ville Syrjälä wrote: > On Wed, Feb 21, 2018 at 11:17:21AM -0500, Rob Clark wrote: > > On Wed, Feb 21, 2018 at 10:54 AM, Ville Syrjälä > > wrote: > > > On Wed, Feb 21, 2018 at 10:36:06AM -0500, Rob Clark wrote: > > >> On Wed, Feb 21, 2018 at 10:27 AM, Ville Syrjälä > > >> wrote: > > >> > On Wed, Feb 21, 2018 at 10:20:03AM -0500, Rob Clark wrote: > > >> >> On Wed, Feb 21, 2018 at 10:07 AM, Ville Syrjälä > > >> >> wrote: > > >> >> > On Wed, Feb 21, 2018 at 09:54:49AM -0500, Rob Clark wrote: > > >> >> >> On Wed, Feb 21, 2018 at 9:49 AM, Ville Syrjälä > > >> >> >> wrote: > > >> >> >> > On Wed, Feb 21, 2018 at 09:37:21AM -0500, Rob Clark wrote: > > >> >> >> >> Follow the same pattern of locking as with other state objects. > > >> >> >> >> This > > >> >> >> >> avoids boilerplate in the driver. > > >> >> >> > > > >> >> >> > I'm not sure we really want to do this. What if the driver wants > > >> >> >> > a > > >> >> >> > custom locking scheme for this state? > > >> >> >> > > >> >> >> That seems like something we want to discourage, ie. all the more > > >> >> >> reason for this patch. > > >> >> >> > > >> >> >> There is no reason drivers could not split their global state into > > >> >> >> multiple private objs's, each with their own lock, for more fine > > >> >> >> grained locking. That is basically the only valid reason I can > > >> >> >> think > > >> >> >> of for "custom locking". > > >> >> > > > >> >> > In i915 we have at least one case that would want something close > > >> >> > to an > > >> >> > rwlock. Any crtc lock is enough for read, need all of them for > > >> >> > write. > > >> >> > Though if we wanted to use private objs for that we might need to > > >> >> > actually make the states refcounted as well, otherwise I can imagine > > >> >> > we might land in some use-after-free issues once again. 
> > >> >> > > > >> >> > Maybe we could duplicate the state into per-crtc and global copies, > > >> >> > but > > >> >> > then we have to keep all of those in sync somehow which doesn't > > >> >> > sound > > >> >> > particularly pleasant. > > >> >> > > >> >> Or just keep your own driver lock for read, and use that plus the core > > >> >> modeset lock for write? > > >> > > > >> > If we can't add the private obj to the state we can't really use it. > > >> > > > >> > > >> I'm not sure why that is strictly true (that you need to add it to the > > >> state if for read-only), since you'd be guarding it with your own > > >> driver read-lock you can just priv->foo_state->bar. > > >> > > >> Since it is read-only access, there is no roll-back to worry about for > > >> test-only or failed atomic_check()s.. > > > > > > That would be super ugly. We want to access the information the same > > > way whether it has been modified or not. > > > > Well, I mean the whole idea of what you want to do seems a bit super-ugly > > ;-) > > > > I mean, in mdp5 the assigned global resources go in plane/crtc state, > > and tracking of what is assigned to which plane/crtc is in global > > state, so it fits nicely in the current locking model. For i915, I'm > > not quite sure what is the global state you are concerned about, so it > > is a bit hard to talk about the best solution in the abstract. Maybe > > the better option is to teach modeset-lock how to be a rwlock instead? > > The thing I'm thinking is the core display clock (cdclk) frequency which > we need to consult whenever computing plane states and whatnot. We don't > want a modeset on one crtc to block a plane update on another crtc > unless we actually have to bump the cdclk (which would generally require > all crtcs to undergo a full modeset). Seems like a generally useful > pattern to me. The usual way to fix that is to have read-only copies of the state in the plane or crtc states. 
And for writing (or if the requirement changes) you have to lock all the objects. Essentially what Rob's doing for his plane/crtc assignment stuff. What we do in i915 is kinda not what I've been recommending to everyone else, because it is a rather tricky and complicated way to get things done. Sure there's a tradeoff between duplicating data and complicated locking schemes, but I think for the kms case having to explicitly type code that reflects the depencies in computation (instead of having that embedded implicitly in the locking scheme) is a feature, not a bug. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v4 15/38] drm/bridge: analogix_dp: Ensure edp is disabled when shutting down the panel
Hi All,

This is the patch which introduces the issue I've pointed out here:
https://lists.freedesktop.org/archives/dri-devel/2018-March/167794.html

On 2018-03-05 23:23, Enric Balletbo i Serra wrote:
> From: Lin Huang
>
> When the panel is shut down, we should make sure eDP can be disabled
> to avoid undefined behavior.
>
> Cc: Stéphane Marchesin
> Signed-off-by: Lin Huang
> Signed-off-by: zain wang
> Signed-off-by: Sean Paul
> Signed-off-by: Thierry Escande
> Reviewed-by: Andrzej Hajda
> Signed-off-by: Enric Balletbo i Serra
> ---
>  drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> index 92fb9a072cb6..9b7d530ad24c 100644
> --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> @@ -1160,6 +1160,12 @@ static int analogix_dp_set_bridge(struct analogix_dp_device *dp)
>  
>  	pm_runtime_get_sync(dp->dev);
>  
> +	ret = clk_prepare_enable(dp->clock);
> +	if (ret < 0) {
> +		DRM_ERROR("Failed to prepare_enable the clock clk [%d]\n", ret);
> +		goto out_dp_clk_pre;
> +	}
> +
>  	if (dp->plat_data->power_on)
>  		dp->plat_data->power_on(dp->plat_data);
>  
> @@ -1191,6 +1197,8 @@ static int analogix_dp_set_bridge(struct analogix_dp_device *dp)
>  	phy_power_off(dp->phy);
>  	if (dp->plat_data->power_off)
>  		dp->plat_data->power_off(dp->plat_data);
> +	clk_disable_unprepare(dp->clock);
> +out_dp_clk_pre:
>  	pm_runtime_put_sync(dp->dev);
>  
>  	return ret;
> @@ -1234,10 +1242,13 @@ static void analogix_dp_bridge_disable(struct drm_bridge *bridge)
>  	disable_irq(dp->irq);
>  	phy_power_off(dp->phy);
>  
> +	analogix_dp_set_analog_power_down(dp, POWER_ALL, 1);

In the case of Exynos DP, an external PHY is used to power the DP
block, so no register access should be performed after
phy_power_off(). Please move analogix_dp_set_analog_power_down()
before phy_power_off().

>  	if (dp->plat_data->power_off)
>  		dp->plat_data->power_off(dp->plat_data);
>  
> +	clk_disable_unprepare(dp->clock);
> +
>  	pm_runtime_put_sync(dp->dev);
>  
>  	ret = analogix_dp_prepare_panel(dp, false, true);

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH 1/4] drm/atomic: integrate modeset lock with private objects
On Wed, Feb 21, 2018 at 09:37:21AM -0500, Rob Clark wrote: > Follow the same pattern of locking as with other state objects. This > avoids boilerplate in the driver. > > Signed-off-by: Rob Clark Please also adjust the kernel doc, and I think we can remove the locking WARN_ON in drm_atomic_get_mst_topology_state after this patch (plus again adjust the kerneldoc for that please). Otherwise I think this makes sense, and encourages reasonable semantics for driver private state objects. -Daniel > --- > drivers/gpu/drm/drm_atomic.c | 9 - > include/drm/drm_atomic.h | 5 + > 2 files changed, 13 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c > index fc8c4da409ff..004e621ab307 100644 > --- a/drivers/gpu/drm/drm_atomic.c > +++ b/drivers/gpu/drm/drm_atomic.c > @@ -1078,6 +1078,8 @@ drm_atomic_private_obj_init(struct drm_private_obj *obj, > { > memset(obj, 0, sizeof(*obj)); > > + drm_modeset_lock_init(&obj->lock); > + > obj->state = state; > obj->funcs = funcs; > } > @@ -1093,6 +1095,7 @@ void > drm_atomic_private_obj_fini(struct drm_private_obj *obj) > { > obj->funcs->atomic_destroy_state(obj, obj->state); > + drm_modeset_lock_fini(&obj->lock); > } > EXPORT_SYMBOL(drm_atomic_private_obj_fini); > > @@ -1113,7 +1116,7 @@ struct drm_private_state * > drm_atomic_get_private_obj_state(struct drm_atomic_state *state, >struct drm_private_obj *obj) > { > - int index, num_objs, i; > + int index, num_objs, i, ret; > size_t size; > struct __drm_private_objs_state *arr; > struct drm_private_state *obj_state; > @@ -1122,6 +1125,10 @@ drm_atomic_get_private_obj_state(struct > drm_atomic_state *state, > if (obj == state->private_objs[i].ptr) > return state->private_objs[i].state; > > + ret = drm_modeset_lock(&obj->lock, state->acquire_ctx); > + if (ret) > + return ERR_PTR(ret); > + > num_objs = state->num_private_objs + 1; > size = sizeof(*state->private_objs) * num_objs; > arr = krealloc(state->private_objs, size, GFP_KERNEL); > diff 
--git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h > index 09076a625637..9ae53b73c9d2 100644 > --- a/include/drm/drm_atomic.h > +++ b/include/drm/drm_atomic.h > @@ -218,6 +218,11 @@ struct drm_private_state_funcs { > * &drm_modeset_lock is required to duplicate and update this object's state. > */ > struct drm_private_obj { > + /** > + * @lock: Modeset lock to protect the state object. > + */ > + struct drm_modeset_lock lock; > + > /** >* @state: Current atomic state for this driver private object. >*/ > -- > 2.14.3 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 7/9] drm/xen-front: Implement KMS/connector handling
On 03/06/2018 09:22 AM, Daniel Vetter wrote: On Mon, Mar 05, 2018 at 02:59:23PM +0200, Oleksandr Andrushchenko wrote: On 03/05/2018 11:23 AM, Daniel Vetter wrote: On Wed, Feb 21, 2018 at 10:03:40AM +0200, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Implement kernel modesetting/connector handling using DRM simple KMS helper pipeline: - implement KMS part of the driver with the help of DRM simple pipeline helper which is possible due to the fact that the para-virtualized driver only supports a single (primary) plane: - initialize connectors according to XenStore configuration - handle frame done events from the backend - generate vblank events - create and destroy frame buffers and propagate those to the backend - propagate set/reset mode configuration to the backend on display enable/disable callbacks - send page flip request to the backend and implement logic for reporting backend IO errors on prepare fb callback - implement virtual connector handling: - support only pixel formats suitable for single plane modes - make sure the connector is always connected - support a single video mode as per para-virtualized driver configuration Signed-off-by: Oleksandr Andrushchenko I think once you've removed the midlayer in the previous patch it would make sense to merge the 2 patches into 1. ok, will squash the two Bunch more comments below. 
-Daniel --- drivers/gpu/drm/xen/Makefile | 2 + drivers/gpu/drm/xen/xen_drm_front_conn.c | 125 + drivers/gpu/drm/xen/xen_drm_front_conn.h | 35 drivers/gpu/drm/xen/xen_drm_front_drv.c | 15 ++ drivers/gpu/drm/xen/xen_drm_front_drv.h | 12 ++ drivers/gpu/drm/xen/xen_drm_front_kms.c | 299 +++ drivers/gpu/drm/xen/xen_drm_front_kms.h | 30 7 files changed, 518 insertions(+) create mode 100644 drivers/gpu/drm/xen/xen_drm_front_conn.c create mode 100644 drivers/gpu/drm/xen/xen_drm_front_conn.h create mode 100644 drivers/gpu/drm/xen/xen_drm_front_kms.c create mode 100644 drivers/gpu/drm/xen/xen_drm_front_kms.h diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile index d3068202590f..4fcb0da1a9c5 100644 --- a/drivers/gpu/drm/xen/Makefile +++ b/drivers/gpu/drm/xen/Makefile @@ -2,6 +2,8 @@ drm_xen_front-objs := xen_drm_front.o \ xen_drm_front_drv.o \ + xen_drm_front_kms.o \ + xen_drm_front_conn.o \ xen_drm_front_evtchnl.o \ xen_drm_front_shbuf.o \ xen_drm_front_cfg.o diff --git a/drivers/gpu/drm/xen/xen_drm_front_conn.c b/drivers/gpu/drm/xen/xen_drm_front_conn.c new file mode 100644 index ..d9986a2e1a3b --- /dev/null +++ b/drivers/gpu/drm/xen/xen_drm_front_conn.c @@ -0,0 +1,125 @@ +/* + * Xen para-virtual DRM device + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Copyright (C) 2016-2018 EPAM Systems Inc. 
+ * + * Author: Oleksandr Andrushchenko + */ + +#include +#include + +#include + +#include "xen_drm_front_conn.h" +#include "xen_drm_front_drv.h" + +static struct xen_drm_front_drm_pipeline * +to_xen_drm_pipeline(struct drm_connector *connector) +{ + return container_of(connector, struct xen_drm_front_drm_pipeline, conn); +} + +static const uint32_t plane_formats[] = { + DRM_FORMAT_RGB565, + DRM_FORMAT_RGB888, + DRM_FORMAT_XRGB, + DRM_FORMAT_ARGB, + DRM_FORMAT_XRGB, + DRM_FORMAT_ARGB, + DRM_FORMAT_XRGB1555, + DRM_FORMAT_ARGB1555, +}; + +const uint32_t *xen_drm_front_conn_get_formats(int *format_count) +{ + *format_count = ARRAY_SIZE(plane_formats); + return plane_formats; +} + +static enum drm_connector_status connector_detect( + struct drm_connector *connector, bool force) +{ + if (drm_dev_is_unplugged(connector->dev)) + return connector_status_disconnected; + + return connector_status_connected; +} + +#define XEN_DRM_NUM_VIDEO_MODES1 +#define XEN_DRM_CRTC_VREFRESH_HZ 60 + +static int connector_get_modes(struct drm_connector *connector) +{ + struct xen_drm_front_drm_pipeline *pipeline = + to_xen_drm_pipeline(connector); + struct drm_display_mode *mode; + struct videomode videomode; + int width, height; + + mode = drm_mode_create(connector->dev); +
Re: [PATCH 1/4] drm/atomic: integrate modeset lock with private objects
On Wed, Feb 21, 2018 at 04:19:40PM +0100, Maarten Lankhorst wrote: > Hey, > > Op 21-02-18 om 15:37 schreef Rob Clark: > > Follow the same pattern of locking as with other state objects. This > > avoids boilerplate in the driver. > I'm afraid this will prohibit any uses of this on i915, since it still uses > legacy lock_all(). > > Oh well, afaict nothing in i915 uses private objects, so I don't think it's > harmful. :) We do use private objects, as part of dp mst helpers. But I also thought that the only users left of lock_all are in the debugfs code, where this really doesn't matter all that much. > Could you cc intel-gfx just in case? Yeah, best to double-check with CI. > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/drm_atomic.c | 9 - > > include/drm/drm_atomic.h | 5 + > > 2 files changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c > > index fc8c4da409ff..004e621ab307 100644 > > --- a/drivers/gpu/drm/drm_atomic.c > > +++ b/drivers/gpu/drm/drm_atomic.c > > @@ -1078,6 +1078,8 @@ drm_atomic_private_obj_init(struct drm_private_obj > > *obj, > > { > > memset(obj, 0, sizeof(*obj)); > > > > + drm_modeset_lock_init(&obj->lock); > > + > > obj->state = state; > > obj->funcs = funcs; > > } > > @@ -1093,6 +1095,7 @@ void > > drm_atomic_private_obj_fini(struct drm_private_obj *obj) > > { > > obj->funcs->atomic_destroy_state(obj, obj->state); > > + drm_modeset_lock_fini(&obj->lock); > > } > > EXPORT_SYMBOL(drm_atomic_private_obj_fini); > > > > @@ -1113,7 +1116,7 @@ struct drm_private_state * > > drm_atomic_get_private_obj_state(struct drm_atomic_state *state, > > struct drm_private_obj *obj) > > { > > - int index, num_objs, i; > > + int index, num_objs, i, ret; > > size_t size; > > struct __drm_private_objs_state *arr; > > struct drm_private_state *obj_state; > > @@ -1122,6 +1125,10 @@ drm_atomic_get_private_obj_state(struct > > drm_atomic_state *state, > > if (obj == state->private_objs[i].ptr) > > 
return state->private_objs[i].state; > > > > + ret = drm_modeset_lock(&obj->lock, state->acquire_ctx); > > + if (ret) > > + return ERR_PTR(ret); > > + > > num_objs = state->num_private_objs + 1; > > size = sizeof(*state->private_objs) * num_objs; > > arr = krealloc(state->private_objs, size, GFP_KERNEL); > > diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h > > index 09076a625637..9ae53b73c9d2 100644 > > --- a/include/drm/drm_atomic.h > > +++ b/include/drm/drm_atomic.h > > @@ -218,6 +218,11 @@ struct drm_private_state_funcs { > > * &drm_modeset_lock is required to duplicate and update this object's > > state. > > */ > > struct drm_private_obj { > > + /** > > +* @lock: Modeset lock to protect the state object. > > +*/ > > + struct drm_modeset_lock lock; > > + > > /** > > * @state: Current atomic state for this driver private object. > > */ > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 8/9] drm/xen-front: Implement GEM operations
On Mon, Mar 05, 2018 at 03:46:07PM +0200, Oleksandr Andrushchenko wrote: > On 03/05/2018 11:32 AM, Daniel Vetter wrote: > > On Wed, Feb 21, 2018 at 10:03:41AM +0200, Oleksandr Andrushchenko wrote: > > > From: Oleksandr Andrushchenko > > > > > > Implement GEM handling depending on driver mode of operation: > > > depending on the requirements for the para-virtualized environment, namely > > > requirements dictated by the accompanying DRM/(v)GPU drivers running in > > > both > > > host and guest environments, number of operating modes of para-virtualized > > > display driver are supported: > > > - display buffers can be allocated by either frontend driver or backend > > > - display buffers can be allocated to be contiguous in memory or not > > > > > > Note! Frontend driver itself has no dependency on contiguous memory for > > > its operation. > > > > > > 1. Buffers allocated by the frontend driver. > > > > > > The below modes of operation are configured at compile-time via > > > frontend driver's kernel configuration. > > > > > > 1.1. Front driver configured to use GEM CMA helpers > > > This use-case is useful when used with accompanying DRM/vGPU driver > > > in > > > guest domain which was designed to only work with contiguous > > > buffers, > > > e.g. DRM driver based on GEM CMA helpers: such drivers can only > > > import > > > contiguous PRIME buffers, thus requiring frontend driver to provide > > > such. In order to implement this mode of operation para-virtualized > > > frontend driver can be configured to use GEM CMA helpers. > > > > > > 1.2. Front driver doesn't use GEM CMA > > > If accompanying drivers can cope with non-contiguous memory then, to > > > lower pressure on CMA subsystem of the kernel, driver can allocate > > > buffers from system memory. > > > > > > Note! 
If used with accompanying DRM/(v)GPU drivers this mode of operation > > > may require IOMMU support on the platform, so accompanying DRM/vGPU > > > hardware can still reach display buffer memory while importing PRIME > > > buffers from the frontend driver. > > > > > > 2. Buffers allocated by the backend > > > > > > This mode of operation is run-time configured via guest domain > > > configuration > > > through XenStore entries. > > > > > > For systems which do not provide IOMMU support, but having specific > > > requirements for display buffers it is possible to allocate such buffers > > > at backend side and share those with the frontend. > > > For example, if host domain is 1:1 mapped and has DRM/GPU hardware > > > expecting > > > physically contiguous memory, this allows implementing zero-copying > > > use-cases. > > > > > > Note! Configuration options 1.1 (contiguous display buffers) and 2 > > > (backend > > > allocated buffers) are not supported at the same time. > > > > > > Signed-off-by: Oleksandr Andrushchenko > > Some suggestions below for some larger cleanup work. 
> > -Daniel > > > > > --- > > > drivers/gpu/drm/xen/Kconfig | 13 + > > > drivers/gpu/drm/xen/Makefile| 6 + > > > drivers/gpu/drm/xen/xen_drm_front.h | 74 ++ > > > drivers/gpu/drm/xen/xen_drm_front_drv.c | 80 ++- > > > drivers/gpu/drm/xen/xen_drm_front_drv.h | 1 + > > > drivers/gpu/drm/xen/xen_drm_front_gem.c | 360 > > > > > > drivers/gpu/drm/xen/xen_drm_front_gem.h | 46 > > > drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 93 +++ > > > 8 files changed, 667 insertions(+), 6 deletions(-) > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem.c > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem.h > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c > > > > > > diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig > > > index 4cca160782ab..4f4abc91f3b6 100644 > > > --- a/drivers/gpu/drm/xen/Kconfig > > > +++ b/drivers/gpu/drm/xen/Kconfig > > > @@ -15,3 +15,16 @@ config DRM_XEN_FRONTEND > > > help > > > Choose this option if you want to enable a para-virtualized > > > frontend DRM/KMS driver for Xen guest OSes. > > > + > > > +config DRM_XEN_FRONTEND_CMA > > > + bool "Use DRM CMA to allocate dumb buffers" > > > + depends on DRM_XEN_FRONTEND > > > + select DRM_KMS_CMA_HELPER > > > + select DRM_GEM_CMA_HELPER > > > + help > > > + Use DRM CMA helpers to allocate display buffers. > > > + This is useful for the use-cases when guest driver needs to > > > + share or export buffers to other drivers which only expect > > > + contiguous buffers. > > > + Note: in this mode driver cannot use buffers allocated > > > + by the backend. > > > diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile > > > index 4fcb0da1a9c5..12376ec78fbc 100644 > > > --- a/drivers/gpu/drm/xen/Makefile > > > +++ b/drivers/gpu/drm/xen/Makefile > > > @@ -8,4 +8,10 @@ drm_xen_front-objs
Re: [PATCH 7/9] drm/xen-front: Implement KMS/connector handling
On Mon, Mar 05, 2018 at 02:59:23PM +0200, Oleksandr Andrushchenko wrote: > On 03/05/2018 11:23 AM, Daniel Vetter wrote: > > On Wed, Feb 21, 2018 at 10:03:40AM +0200, Oleksandr Andrushchenko wrote: > > > From: Oleksandr Andrushchenko > > > > > > Implement kernel modesetting/connector handling using > > > DRM simple KMS helper pipeline: > > > > > > - implement KMS part of the driver with the help of DRM > > >simple pipeline helper which is possible due to the fact > > >that the para-virtualized driver only supports a single > > >(primary) plane: > > >- initialize connectors according to XenStore configuration > > >- handle frame done events from the backend > > >- generate vblank events > > >- create and destroy frame buffers and propagate those > > > to the backend > > >- propagate set/reset mode configuration to the backend on display > > > enable/disable callbacks > > >- send page flip request to the backend and implement logic for > > > reporting backend IO errors on prepare fb callback > > > > > > - implement virtual connector handling: > > >- support only pixel formats suitable for single plane modes > > >- make sure the connector is always connected > > >- support a single video mode as per para-virtualized driver > > > configuration > > > > > > Signed-off-by: Oleksandr Andrushchenko > > I think once you've removed the midlayer in the previous patch it would > > make sense to merge the 2 patches into 1. > ok, will squash the two > > > > Bunch more comments below. 
> > -Daniel > > > > > --- > > > drivers/gpu/drm/xen/Makefile | 2 + > > > drivers/gpu/drm/xen/xen_drm_front_conn.c | 125 + > > > drivers/gpu/drm/xen/xen_drm_front_conn.h | 35 > > > drivers/gpu/drm/xen/xen_drm_front_drv.c | 15 ++ > > > drivers/gpu/drm/xen/xen_drm_front_drv.h | 12 ++ > > > drivers/gpu/drm/xen/xen_drm_front_kms.c | 299 > > > +++ > > > drivers/gpu/drm/xen/xen_drm_front_kms.h | 30 > > > 7 files changed, 518 insertions(+) > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_conn.c > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_conn.h > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_kms.c > > > create mode 100644 drivers/gpu/drm/xen/xen_drm_front_kms.h > > > > > > diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile > > > index d3068202590f..4fcb0da1a9c5 100644 > > > --- a/drivers/gpu/drm/xen/Makefile > > > +++ b/drivers/gpu/drm/xen/Makefile > > > @@ -2,6 +2,8 @@ > > > drm_xen_front-objs := xen_drm_front.o \ > > > xen_drm_front_drv.o \ > > > + xen_drm_front_kms.o \ > > > + xen_drm_front_conn.o \ > > > xen_drm_front_evtchnl.o \ > > > xen_drm_front_shbuf.o \ > > > xen_drm_front_cfg.o > > > diff --git a/drivers/gpu/drm/xen/xen_drm_front_conn.c > > > b/drivers/gpu/drm/xen/xen_drm_front_conn.c > > > new file mode 100644 > > > index ..d9986a2e1a3b > > > --- /dev/null > > > +++ b/drivers/gpu/drm/xen/xen_drm_front_conn.c > > > @@ -0,0 +1,125 @@ > > > +/* > > > + * Xen para-virtual DRM device > > > + * > > > + * This program is free software; you can redistribute it and/or modify > > > + * it under the terms of the GNU General Public License as published by > > > + * the Free Software Foundation; either version 2 of the License, or > > > + * (at your option) any later version. > > > + * > > > + * This program is distributed in the hope that it will be useful, > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the > > > + * GNU General Public License for more details. > > > + * > > > + * Copyright (C) 2016-2018 EPAM Systems Inc. > > > + * > > > + * Author: Oleksandr Andrushchenko > > > + */ > > > + > > > +#include > > > +#include > > > + > > > +#include > > > + > > > +#include "xen_drm_front_conn.h" > > > +#include "xen_drm_front_drv.h" > > > + > > > +static struct xen_drm_front_drm_pipeline * > > > +to_xen_drm_pipeline(struct drm_connector *connector) > > > +{ > > > + return container_of(connector, struct xen_drm_front_drm_pipeline, conn); > > > +} > > > + > > > +static const uint32_t plane_formats[] = { > > > + DRM_FORMAT_RGB565, > > > + DRM_FORMAT_RGB888, > > > + DRM_FORMAT_XRGB, > > > + DRM_FORMAT_ARGB, > > > + DRM_FORMAT_XRGB, > > > + DRM_FORMAT_ARGB, > > > + DRM_FORMAT_XRGB1555, > > > + DRM_FORMAT_ARGB1555, > > > +}; > > > + > > > +const uint32_t *xen_drm_front_conn_get_formats(int *format_count) > > > +{ > > > + *format_count = ARRAY_SIZE(plane_formats); > > > + return plane_formats; > > > +} > > > + > > > +static enum drm_connector_status connector_detect( > > > + struct drm_connector *connector, bool force) > > > +{ > > > + if (drm_dev_is_unplugg
[GIT PULL] siginfo fix for v4.16-rc5
Linus, Please pull the siginfo-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git siginfo-linus HEAD: f6a015498dcaee72f80283cb7873d88deb07129c signal/x86: Include the field offsets in the build time checks The kbuild test robot found that I accidentally moved si_pkey when I was cleaning up siginfo_t. A short followed by an int with the int having 8 byte alignment. Sheesh siginfo_t is a weird structure. I have now corrected it and added build time checks that with a little luck will catch any similar future mistakes. The build time checks were sufficient for me to verify the bug and to verify my fix. So they are at least useful this once. Eric W. Biederman (2): signal: Correct the offset of si_pkey in struct siginfo signal/x86: Include the field offsets in the build time checks arch/x86/kernel/signal_compat.c| 65 ++ include/linux/compat.h | 4 +-- include/uapi/asm-generic/siginfo.h | 4 +-- 3 files changed, 69 insertions(+), 4 deletions(-)
[RFC PATCH] irqchip/gic-v3-its: handle wrapped case in its_wait_for_range_completion()
From: Yang Yingliang While cpus posting a bunch of ITS commands, the cmd_queue and rd_idx will be wrapped easily. And current way of handling wrapped case is not quite right. Such as, in direct case, rd_idx will wrap if other cpus post commands that make rd_idx increase. When rd_idx wrapped, the driver prints timeout messages but in fact the command is finished. This patch adds two variables to count wrapped times of ITS commands and read index. With these two variables, the driver can handle wrapped case correctly. Signed-off-by: Yang Yingliang --- drivers/irqchip/irq-gic-v3-its.c | 72 +--- 1 file changed, 60 insertions(+), 12 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 1d3056f..a03e18e 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -111,6 +111,9 @@ struct its_node { u32 pre_its_base; /* for Socionext Synquacer */ boolis_v4; int vlpi_redist_offset; + int last_rd; + u64 cmd_wrapped_cnt; + u64 rd_wrapped_cnt; }; #define ITS_ITT_ALIGN SZ_256 @@ -662,6 +665,7 @@ static int its_queue_full(struct its_node *its) static struct its_cmd_block *its_allocate_entry(struct its_node *its) { + u32 rd; struct its_cmd_block *cmd; u32 count = 100;/* 1s! */ @@ -675,11 +679,24 @@ static struct its_cmd_block *its_allocate_entry(struct its_node *its) udelay(1); } + /* +* Here is protected by its->lock and driver cannot allocate +* ITS commands, if ITS command queue is full, so the read +* won't wrap twice between this rd_idx and last rd_idx. +* Count rd wrapped times here is safe. 
+*/ + rd = readl_relaxed(its->base + GITS_CREADR); + if (rd < its->last_rd) + its->rd_wrapped_cnt++; + its->last_rd = rd; + cmd = its->cmd_write++; /* Handle queue wrapping */ - if (its->cmd_write == (its->cmd_base + ITS_CMD_QUEUE_NR_ENTRIES)) + if (its->cmd_write == (its->cmd_base + ITS_CMD_QUEUE_NR_ENTRIES)) { its->cmd_write = its->cmd_base; + its->cmd_wrapped_cnt++; + } /* Clear command */ cmd->raw_cmd[0] = 0; @@ -713,29 +730,57 @@ static void its_flush_cmd(struct its_node *its, struct its_cmd_block *cmd) static int its_wait_for_range_completion(struct its_node *its, struct its_cmd_block *from, -struct its_cmd_block *to) +struct its_cmd_block *to, +u64 last_cmd_wrapped_cnt) { - u64 rd_idx, from_idx, to_idx; + unsigned long flags; + u64 rd_idx, from_idx, to_idx, rd_wrapped_cnt; u32 count = 100;/* 1s! */ from_idx = its_cmd_ptr_to_offset(its, from); to_idx = its_cmd_ptr_to_offset(its, to); while (1) { + raw_spin_lock_irqsave(&its->lock, flags); rd_idx = readl_relaxed(its->base + GITS_CREADR); + if (rd_idx < its->last_rd) + its->rd_wrapped_cnt++; + its->last_rd = rd_idx; + rd_wrapped_cnt = its->rd_wrapped_cnt; + raw_spin_unlock_irqrestore(&its->lock, flags); - /* Direct case */ - if (from_idx < to_idx && rd_idx >= to_idx) - break; - - /* Wrapped case */ - if (from_idx >= to_idx && rd_idx >= to_idx && rd_idx < from_idx) + /* +* If rd_wrapped_cnt > last_cmd_wrapped_cnt: +* there are a lot of ITS commands posted by +* other cpus and ITS is fast. +* +* If rd_wrapped_cnt < last_cmd_wrapped_cnt: +* ITS is slow, there are some ITS commands +* not finished. +* +* If rd_wrapped_cnt == last_cmd_wrapped_cnt: +* it's common case. +*/ + if (rd_wrapped_cnt > last_cmd_wrapped_cnt) { + /* +* There is a lot of ITS commands posted by other cpus, +* it make rd_idx move foward fast and wrap. 
+*/ break; + } else if (rd_wrapped_cnt == last_cmd_wrapped_cnt) { + /* Direct case */ + if (from_idx < to_idx && rd_idx >= to_idx) + break; + + /* Wrapped case */ + if (from_idx >= to_idx && rd_idx >= to_idx && rd_idx < from_idx) + break; + }
[PATCHv2 2/2] zram: drop max_zpage_size and use zs_huge_class_size()
This patch removes ZRAM's enforced "huge object" value and uses the zsmalloc huge-class watermark instead, which makes more sense.

TEST - I used a 1G zram device, LZO compression back-end, original data set size was 444MB. Looking at the zsmalloc class stats, the test ended up being pretty fair.

BASE ZRAM/ZSMALLOC
==================
zram mm_stat
498978816 191482495 1998315520 199831552156340

zsmalloc classes
class  size  almost_full  almost_empty  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
...
  151  2448            0             0           1240      1240         744                 3         0
  168  2720            0             0           4200      4200        2800                 2         0
  190  3072            0             0          10100     10100        7575                 3         0
  202  3264            0             0            380       380         304                 4         0
  254  4096            0             0          10620     10620       10620                 1         0
Total                  7            46         106982    106187       48787                           0

PATCHED ZRAM/ZSMALLOC
=====================
zram mm_stat
498978816 182579184 1942487040 194248704156280

zsmalloc classes
class  size  almost_full  almost_empty  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
...
  151  2448            0             0           1240      1240         744                 3         0
  168  2720            0             0           4200      4200        2800                 2         0
  190  3072            0             0          10100     10100        7575                 3         0
  202  3264            0             0           7180      7180        5744                 4         0
  254  4096            0             0           3820      3820        3820                 1         0
Total                  8            45         106959    106193       47424                           0

As we can see, we reduced the number of objects stored in class-4096, because a huge number of objects which we previously forcibly stored in class-4096 are now stored in the non-huge class-3264. This results in lower memory consumption:
- zsmalloc now uses 47424 physical pages, which is less than the 48787 pages zsmalloc used before;
- objects that we store in class-3264 share zspages, which is why the overall number of pages consumed by class-4096 and class-3264 together went down from 10924 to 9564. 
Signed-off-by: Sergey Senozhatsky --- drivers/block/zram/zram_drv.c | 9 - drivers/block/zram/zram_drv.h | 16 2 files changed, 8 insertions(+), 17 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 85110e7931e5..1b8082e6d2f5 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -44,6 +44,11 @@ static const char *default_compressor = "lzo"; /* Module params (documentation at end) */ static unsigned int num_devices = 1; +/* + * Pages that compress to sizes equals or greater than this are stored + * uncompressed in memory. + */ +static size_t huge_class_size; static void zram_free_page(struct zram *zram, size_t index); @@ -786,6 +791,8 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize) return false; } + if (!huge_class_size) + huge_class_size = zs_huge_class_size(); return true; } @@ -965,7 +972,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, return ret; } - if (unlikely(comp_len > max_zpage_size)) { + if (unlikely(comp_len >= huge_class_size)) { if (zram_wb_enabled(zram) && allow_wb) { zcomp_stream_put(zram->comp); ret = write_to_bdev(zram, bvec, index, bio, &element); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 31762db861e3..d71c8000a964 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -21,22 +21,6 @@ #include "zcomp.h" -/*-- Configurable parameters */ - -/* - * Pages that compress to size greater than this are stored - * uncompressed in memory. - */ -static const size_t max_zpage_size = PAGE_SIZE / 4 * 3; - -/* - * NOTE: max_zpage_size must be less than or equal to: - * ZS_MAX_ALLOC_SIZE. Otherwise, zs_malloc() would - * always return failure. - */ - -/*-- End of configurable params */ - #define SECTOR_SHIFT 9 #define SECTORS_PER_PAGE_SHIFT (PAGE_SHIFT - SECTOR_SHIFT) #define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT) -- 2.16.2
[PATCHv2 0/2] zsmalloc/zram: drop zram's max_zpage_size
Hello, ZRAM's max_zpage_size is a bad thing. It forces zsmalloc to store normal objects as huge ones, which results in bigger zsmalloc memory usage. Drop it and use actual zsmalloc huge-class value when decide if the object is huge or not. Sergey Senozhatsky (2): zsmalloc: introduce zs_huge_class_size() function zram: drop max_zpage_size and use zs_huge_class_size() drivers/block/zram/zram_drv.c | 9 - drivers/block/zram/zram_drv.h | 16 include/linux/zsmalloc.h | 2 ++ mm/zsmalloc.c | 40 4 files changed, 50 insertions(+), 17 deletions(-) -- 2.16.2
[PATCHv2 1/2] zsmalloc: introduce zs_huge_class_size() function
Not every object can share its zspage with other objects, e.g. when the object is as big as a zspage or nearly as big as a zspage. For such objects zsmalloc has a so-called huge class: every object which belongs to the huge class consumes the entire zspage (which consists of a single physical page). On an x86_64, PAGE_SHIFT 12 box, the first non-huge class size is 3264, so starting down from size 3264, objects can share page(-s) and thus minimize memory wastage. ZRAM, however, has its own statically defined watermark for huge objects, "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every object larger than this watermark (3072) as a PAGE_SIZE object, in other words, in a huge class, while zsmalloc can keep some of those objects in non-huge classes. This results in increased memory consumption. zsmalloc knows better whether an object is huge or not. Introduce a zs_huge_class_size() function which tells whether the given object can be stored in one of the non-huge classes. This will let us drop ZRAM's huge-object watermark and fully rely on zsmalloc when we decide if the object is huge. 
Signed-off-by: Sergey Senozhatsky --- include/linux/zsmalloc.h | 2 ++ mm/zsmalloc.c| 40 2 files changed, 42 insertions(+) diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 57a8e98f2708..753c1af4d2cb 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -47,6 +47,8 @@ void zs_destroy_pool(struct zs_pool *pool); unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags); void zs_free(struct zs_pool *pool, unsigned long obj); +size_t zs_huge_class_size(void); + void *zs_map_object(struct zs_pool *pool, unsigned long handle, enum zs_mapmode mm); void zs_unmap_object(struct zs_pool *pool, unsigned long handle); diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index a583ab111a43..63422cf35b94 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -193,6 +193,7 @@ static struct vfsmount *zsmalloc_mnt; * (see: fix_fullness_group()) */ static const int fullness_threshold_frac = 4; +static size_t huge_class_size; struct size_class { spinlock_t lock; @@ -1407,6 +1408,24 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle) } EXPORT_SYMBOL_GPL(zs_unmap_object); +/** + * zs_huge_class_size() - Returns the size (in bytes) of the first huge + *zsmalloc &size_class. + * + * The function returns the size of the first huge class - any object of equal + * or bigger size will be stored in zspage consisting of a single physical + * page. + * + * Context: Any context. + * + * Return: the size (in bytes) of the first huge zsmalloc &size_class. 
+ */ +size_t zs_huge_class_size(void) +{ + return huge_class_size; +} +EXPORT_SYMBOL_GPL(zs_huge_class_size); + static unsigned long obj_malloc(struct size_class *class, struct zspage *zspage, unsigned long handle) { @@ -2363,6 +2382,27 @@ struct zs_pool *zs_create_pool(const char *name) pages_per_zspage = get_pages_per_zspage(size); objs_per_zspage = pages_per_zspage * PAGE_SIZE / size; + /* +* We iterate from biggest down to smallest classes, +* so huge_class_size holds the size of the first huge +* class. Any object bigger than or equal to that will +* endup in the huge class. +*/ + if (pages_per_zspage != 1 && objs_per_zspage != 1 && + !huge_class_size) { + huge_class_size = size; + /* +* The object uses ZS_HANDLE_SIZE bytes to store the +* handle. We need to subtract it, because zs_malloc() +* unconditionally adds handle size before it performs +* size class search - so object may be smaller than +* huge class size, yet it still can end up in the huge +* class because it grows by ZS_HANDLE_SIZE extra bytes +* right before class lookup. +*/ + huge_class_size -= (ZS_HANDLE_SIZE - 1); + } + /* * size_class is used for normal zsmalloc operation such * as alloc/free for that size. Although it is natural that we -- 2.16.2
Re: [PATCH 07/34] x86/entry/32: Restore segments before int registers
* H. Peter Anvin wrote: > On NX-enabled hardware NX works with PDE, but the PDPT in general doesn't > have permission bits (it's really more of a set of four CR3s than a page > table level.) The 4 PDPT entries are also shadowed in the CPU and are only refreshed on CR3 loads, not spontaneously reloaded from memory during a TLB walk like regular page table entries, right? This too strengthens the notion that the third page table level of PAE is more like a special in-memory CR3[4] array. Thanks, Ingo
Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
Hi Alex, I love your patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.16-rc4 next-20180306] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Alex-Williamson/vfio-pci-Pull-BAR-mapping-setup-from-read-write-path/20180303-015851 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> drivers/vfio/pci/vfio_pci_rdwr.c:290:1: sparse: incorrect type in argument 2 (different address spaces) @@ expected void [noderef] <asn:2> * @@ got void *opaque @@ drivers/vfio/pci/vfio_pci_rdwr.c:290:1: expected void [noderef] <asn:2> * drivers/vfio/pci/vfio_pci_rdwr.c:290:1: got void *opaque drivers/vfio/pci/vfio_pci_rdwr.c:291:1: sparse: incorrect type in argument 2 (different address spaces) @@ expected void [noderef] <asn:2> * @@ got void *opaque @@ drivers/vfio/pci/vfio_pci_rdwr.c:291:1: expected void [noderef] <asn:2> * drivers/vfio/pci/vfio_pci_rdwr.c:291:1: got void *opaque drivers/vfio/pci/vfio_pci_rdwr.c:292:1: sparse: incorrect type in argument 2 (different address spaces) @@ expected void [noderef] <asn:2> * @@ got void *opaque @@ drivers/vfio/pci/vfio_pci_rdwr.c:292:1: expected void [noderef] <asn:2> * drivers/vfio/pci/vfio_pci_rdwr.c:292:1: got void *opaque >> drivers/vfio/pci/vfio_pci_rdwr.c:378:52: sparse: incorrect type in argument 1 (different address spaces) @@ expected void *opaque @@ got void [noderef] <asn:2> * vim +290 drivers/vfio/pci/vfio_pci_rdwr.c 286 287 #ifdef iowrite64 288 VFIO_PCI_IOEVENTFD_HANDLER(64) 289 #endif > 290 VFIO_PCI_IOEVENTFD_HANDLER(32) 291 VFIO_PCI_IOEVENTFD_HANDLER(16) 292 VFIO_PCI_IOEVENTFD_HANDLER(8) 293 294 long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset, 295 uint64_t data, int count, int fd) 296 { 297 struct pci_dev *pdev = vdev->pdev; 298 loff_t pos = offset & VFIO_PCI_OFFSET_MASK; 299 int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset); 300 struct 
vfio_pci_ioeventfd *ioeventfd; 301 int (*handler)(void *addr, void *value); 302 303 /* Only support ioeventfds into BARs */ 304 if (bar > VFIO_PCI_BAR5_REGION_INDEX) 305 return -EINVAL; 306 307 if (pos + count > pci_resource_len(pdev, bar)) 308 return -EINVAL; 309 310 /* Disallow ioeventfds working around MSI-X table writes */ 311 if (bar == vdev->msix_bar && 312 !(pos + count <= vdev->msix_offset || 313pos >= vdev->msix_offset + vdev->msix_size)) 314 return -EINVAL; 315 316 switch (count) { 317 case 1: 318 handler = &vfio_pci_ioeventfd_handler8; 319 break; 320 case 2: 321 handler = &vfio_pci_ioeventfd_handler16; 322 break; 323 case 4: 324 handler = &vfio_pci_ioeventfd_handler32; 325 break; 326 #ifdef iowrite64 327 case 8: 328 handler = &vfio_pci_ioeventfd_handler64; 329 break; 330 #endif 331 default: 332 return -EINVAL; 333 } 334 335 ret = vfio_pci_setup_barmap(vdev, bar); 336 if (ret) 337 return ret; 338 339 mutex_lock(&vdev->ioeventfds_lock); 340 341 list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) { 342 if (ioeventfd->pos == pos && ioeventfd->bar == bar && 343 ioeventfd->data == data && ioeventfd->count == count) { 344 if (fd == -1) { 345 vfio_virqfd_disable(&ioeventfd->virqfd); 346 list_del(&ioeventfd->next); 347 vdev->ioeventfds_nr--; 348 kfree(ioeventfd); 349 ret = 0; 350 } else 351 ret = -EEXIST; 352 353 goto out_unlock; 354 } 355 } 356 357 if (fd < 0) { 358 ret = -ENODEV; 359 goto out_unlock; 360 } 361 362 if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) { 363 ret = -ENOSPC; 364 goto out_unlock; 365 } 366 367 ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL); 368 if (!ioeventfd) { 369 ret = -ENOMEM;
[tip:perf/core] perf mmap: Discard legacy interfaces for mmap read forward
Commit-ID: 6afad54d2f0ddebacfcf3b829147d7fed8dab298 Gitweb: https://git.kernel.org/tip/6afad54d2f0ddebacfcf3b829147d7fed8dab298 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:11 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:51:10 -0300 perf mmap: Discard legacy interfaces for mmap read forward Discards legacy interfaces perf_evlist__mmap_read_forward(), perf_evlist__mmap_read() and perf_evlist__mmap_consume(). No tools use them. Signed-off-by: Kan Liang Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-14-git-send-email-kan.li...@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/evlist.c | 25 + tools/perf/util/evlist.h | 4 tools/perf/util/mmap.c | 21 + 3 files changed, 2 insertions(+), 48 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 7b7d535396f7..41a4666f1519 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -702,29 +702,6 @@ static int perf_evlist__resume(struct perf_evlist *evlist) return perf_evlist__set_paused(evlist, false); } -union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist, int idx) -{ - struct perf_mmap *md = &evlist->mmap[idx]; - - /* -* Check messup is required for forward overwritable ring buffer: -* memory pointed by md->prev can be overwritten in this case. -* No need for read-write ring buffer: kernel stop outputting when -* it hit md->prev (perf_mmap__consume()). 
-*/ - return perf_mmap__read_forward(md); -} - -union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx) -{ - return perf_evlist__mmap_read_forward(evlist, idx); -} - -void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx) -{ - perf_mmap__consume(&evlist->mmap[idx], false); -} - static void perf_evlist__munmap_nofree(struct perf_evlist *evlist) { int i; @@ -761,7 +738,7 @@ static struct perf_mmap *perf_evlist__alloc_mmap(struct perf_evlist *evlist) map[i].fd = -1; /* * When the perf_mmap() call is made we grab one refcount, plus -* one extra to let perf_evlist__mmap_consume() get the last +* one extra to let perf_mmap__consume() get the last * events after all real references (perf_mmap__get()) are * dropped. * diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 336b838e6957..6c41b2f78713 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -129,10 +129,6 @@ struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id); void perf_evlist__toggle_bkw_mmap(struct perf_evlist *evlist, enum bkw_mmap_state state); -union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx); - -union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist, -int idx); void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx); int perf_evlist__open(struct perf_evlist *evlist); diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index 91531a7c8fbf..4f27c464ce0b 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -63,25 +63,6 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map, return event; } -/* - * legacy interface for mmap read. - * Don't use it. Use perf_mmap__read_event(). - */ -union perf_event *perf_mmap__read_forward(struct perf_mmap *map) -{ - u64 head; - - /* -* Check if event was unmapped due to a POLLHUP/POLLERR. 
-*/ - if (!refcount_read(&map->refcnt)) - return NULL; - - head = perf_mmap__read_head(map); - - return perf_mmap__read(map, &map->prev, head); -} - /* * Read event from ring buffer one by one. * Return one event for each call. @@ -191,7 +172,7 @@ void perf_mmap__munmap(struct perf_mmap *map) int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd) { /* -* The last one will be done at perf_evlist__mmap_consume(), so that we +* The last one will be done at perf_mmap__consume(), so that we * make sure we don't prevent tools from consuming every last event in * the ring buffer. *
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for task-exit
Commit-ID: 759487307625cd44ac4aa241ee547b52b72bc4ad Gitweb: https://git.kernel.org/tip/759487307625cd44ac4aa241ee547b52b72bc4ad Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:10 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:51:00 -0300 perf test: Switch to new perf_mmap__read_event() interface for task-exit The perf test 'task-exit' still use the legacy interface. No functional change. Committer notes: Testing it: # perf test exit 21: Number of exit events of a simple workload: Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-13-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/task-exit.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c index 01b62b81751b..02b0888b72a3 100644 --- a/tools/perf/tests/task-exit.c +++ b/tools/perf/tests/task-exit.c @@ -47,6 +47,8 @@ int test__task_exit(struct test *test __maybe_unused, int subtest __maybe_unused char sbuf[STRERR_BUFSIZE]; struct cpu_map *cpus; struct thread_map *threads; + struct perf_mmap *md; + u64 end, start; signal(SIGCHLD, sig_handler); @@ -110,13 +112,19 @@ int test__task_exit(struct test *test __maybe_unused, int subtest __maybe_unused perf_evlist__start_workload(evlist); retry: - while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) { + md = &evlist->mmap[0]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + goto out_init; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { if (event->header.type == PERF_RECORD_EXIT) nr_exit++; - perf_evlist__mmap_consume(evlist, 0); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); +out_init: if (!exited || !nr_exit) { perf_evlist__poll(evlist, -1); goto retry;
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for switch-tracking
Commit-ID: ee4024ff858211316c4824b16bea446f08765ae8 Gitweb: https://git.kernel.org/tip/ee4024ff858211316c4824b16bea446f08765ae8 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:09 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:50:50 -0300 perf test: Switch to new perf_mmap__read_event() interface for switch-tracking The perf test 'switch-tracking' still use the legacy interface. No functional change. Committer testing: # perf test switch 32: Track with sched_switch : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-12-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/switch-tracking.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c index 33e00295a972..10c4dcdc2324 100644 --- a/tools/perf/tests/switch-tracking.c +++ b/tools/perf/tests/switch-tracking.c @@ -258,16 +258,23 @@ static int process_events(struct perf_evlist *evlist, unsigned pos, cnt = 0; LIST_HEAD(events); struct event_node *events_array, *node; + struct perf_mmap *md; + u64 end, start; int i, ret; for (i = 0; i < evlist->nr_mmaps; i++) { - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { cnt += 1; ret = add_event(evlist, &events, event); - perf_evlist__mmap_consume(evlist, i); +perf_mmap__consume(md, false); if (ret < 0) goto out_free_nodes; } + perf_mmap__read_done(md); } events_array = calloc(cnt, sizeof(struct event_node));
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for sw-clock
Commit-ID: 5d0007cdfc6612788badceb276156d6ccb30b6de Gitweb: https://git.kernel.org/tip/5d0007cdfc6612788badceb276156d6ccb30b6de Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:08 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:50:37 -0300 perf test: Switch to new perf_mmap__read_event() interface for sw-clock The perf test 'sw-clock' still use the legacy interface. No functional change. Committer testing: # perf test clock 22: Software clock events period values : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-11-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/sw-clock.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/sw-clock.c b/tools/perf/tests/sw-clock.c index f6c72f915d48..e6320e267ba5 100644 --- a/tools/perf/tests/sw-clock.c +++ b/tools/perf/tests/sw-clock.c @@ -39,6 +39,8 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id) }; struct cpu_map *cpus; struct thread_map *threads; + struct perf_mmap *md; + u64 end, start; attr.sample_freq = 500; @@ -93,7 +95,11 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id) perf_evlist__disable(evlist); - while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) { + md = &evlist->mmap[0]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + goto out_init; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { struct perf_sample sample; if (event->header.type != PERF_RECORD_SAMPLE) @@ -108,9 +114,11 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id) total_periods += sample.period; nr_samples++; next_event: - perf_evlist__mmap_consume(evlist, 0); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); +out_init: if ((u64) nr_samples == 
total_periods) { pr_debug("All (%d) samples have period value of 1!\n", nr_samples);
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for time-to-tsc
Commit-ID: 9dfb85dfaffe6bc38f0c9f8a8622e2a7ca333e58 Gitweb: https://git.kernel.org/tip/9dfb85dfaffe6bc38f0c9f8a8622e2a7ca333e58 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:07 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:50:23 -0300 perf test: Switch to new perf_mmap__read_event() interface for time-to-tsc The perf test 'time-to-tsc' still use the legacy interface. No functional change. Commiter notes: Testing it: # perf test tsc 57: Convert perf time to TSC : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-10-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/arch/x86/tests/perf-time-to-tsc.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/arch/x86/tests/perf-time-to-tsc.c b/tools/perf/arch/x86/tests/perf-time-to-tsc.c index 06abe8108b33..7f82d91ef473 100644 --- a/tools/perf/arch/x86/tests/perf-time-to-tsc.c +++ b/tools/perf/arch/x86/tests/perf-time-to-tsc.c @@ -60,6 +60,8 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe union perf_event *event; u64 test_tsc, comm1_tsc, comm2_tsc; u64 test_time, comm1_time = 0, comm2_time = 0; + struct perf_mmap *md; + u64 end, start; threads = thread_map__new(-1, getpid(), UINT_MAX); CHECK_NOT_NULL__(threads); @@ -109,7 +111,11 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe perf_evlist__disable(evlist); for (i = 0; i < evlist->nr_mmaps; i++) { - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { struct perf_sample sample; if (event->header.type != PERF_RECORD_COMM || @@ -128,8 
+134,9 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe comm2_time = sample.time; } next_event: - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); } if (!comm1_time || !comm2_time)
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for perf-record
Commit-ID: 88e37a4bbe6e05fd5ad103738c542658b81e76ea Gitweb: https://git.kernel.org/tip/88e37a4bbe6e05fd5ad103738c542658b81e76ea Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:06 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:50:21 -0300 perf test: Switch to new perf_mmap__read_event() interface for perf-record The perf test 'perf-record' still use the legacy interface. No functional change. Committer notes: Testing it: # perf test PERF_RECORD 8: PERF_RECORD_* events & perf_sample fields : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-9-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/perf-record.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c index 0afafab85238..31f3f70adca6 100644 --- a/tools/perf/tests/perf-record.c +++ b/tools/perf/tests/perf-record.c @@ -164,8 +164,14 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus for (i = 0; i < evlist->nr_mmaps; i++) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { const u32 type = event->header.type; const char *name = perf_event__name(type); @@ -266,8 +272,9 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus ++errs; } - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); } /*
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for tp fields
Commit-ID: 1d1b5632ed0b797721a409bbed718d85384168a2 Gitweb: https://git.kernel.org/tip/1d1b5632ed0b797721a409bbed718d85384168a2 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:05 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:49:59 -0300 perf test: Switch to new perf_mmap__read_event() interface for tp fields The perf test 'syscalls:sys_enter_openat event fields' still use the legacy interface. No functional change. Committer notes: Testing it: # perf test sys_enter_openat 15: syscalls:sys_enter_openat event fields: Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-8-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/openat-syscall-tp-fields.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/openat-syscall-tp-fields.c b/tools/perf/tests/openat-syscall-tp-fields.c index 43519267b93b..620b21023f72 100644 --- a/tools/perf/tests/openat-syscall-tp-fields.c +++ b/tools/perf/tests/openat-syscall-tp-fields.c @@ -86,8 +86,14 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest for (i = 0; i < evlist->nr_mmaps; i++) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { const u32 type = event->header.type; int tp_flags; struct perf_sample sample; @@ -95,7 +101,7 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest ++nr_events; if (type != PERF_RECORD_SAMPLE) { - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); continue; } @@ -115,6 +121,7 @@ 
int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest goto out_ok; } + perf_mmap__read_done(md); } if (nr_events == before)
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for mmap-basic
Commit-ID: 334f823e2ab58b3c0e58fa71321680382c5f60ff Gitweb: https://git.kernel.org/tip/334f823e2ab58b3c0e58fa71321680382c5f60ff Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:04 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:49:37 -0300 perf test: Switch to new perf_mmap__read_event() interface for mmap-basic The perf test 'mmap-basic' still use the legacy interface. No functional change. Committer notes: Testing it: # perf test "mmap interface" 4: Read samples using the mmap interface : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-7-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/mmap-basic.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c index c0e971da965c..44c58d69cd87 100644 --- a/tools/perf/tests/mmap-basic.c +++ b/tools/perf/tests/mmap-basic.c @@ -38,6 +38,8 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse expected_nr_events[nsyscalls], i, j; struct perf_evsel *evsels[nsyscalls], *evsel; char sbuf[STRERR_BUFSIZE]; + struct perf_mmap *md; + u64 end, start; threads = thread_map__new(-1, getpid(), UINT_MAX); if (threads == NULL) { @@ -106,7 +108,11 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse ++foo; } - while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) { + md = &evlist->mmap[0]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + goto out_init; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { struct perf_sample sample; if (event->header.type != PERF_RECORD_SAMPLE) { @@ -129,9 +135,11 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse goto 
out_delete_evlist; } nr_events[evsel->idx]++; - perf_evlist__mmap_consume(evlist, 0); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); +out_init: err = 0; evlist__for_each_entry(evlist, evsel) { if (nr_events[evsel->idx] != expected_nr_events[evsel->idx]) {
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for "keep tracking" test
Commit-ID: 693d32aebf857ef1d1803b08ef1b631990ae3747 Gitweb: https://git.kernel.org/tip/693d32aebf857ef1d1803b08ef1b631990ae3747 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:03 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:49:01 -0300 perf test: Switch to new perf_mmap__read_event() interface for "keep tracking" test The perf test 'keep tracking' still use the legacy interface. No functional change. Committer testing: # perf test tracking 25: Use a dummy software event to keep tracking : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-6-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/keep-tracking.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c index c46530918938..4590d8fb91ab 100644 --- a/tools/perf/tests/keep-tracking.c +++ b/tools/perf/tests/keep-tracking.c @@ -27,18 +27,24 @@ static int find_comm(struct perf_evlist *evlist, const char *comm) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; int i, found; found = 0; for (i = 0; i < evlist->nr_mmaps; i++) { - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { if (event->header.type == PERF_RECORD_COMM && (pid_t)event->comm.pid == getpid() && (pid_t)event->comm.tid == getpid() && strcmp(event->comm.comm, comm) == 0) found += 1; - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); } + perf_mmap__read_done(md); } return found; }
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for 'code reading' test
Commit-ID: 00fc2460e735fa0f6add802c7426273e7dbc2b27 Gitweb: https://git.kernel.org/tip/00fc2460e735fa0f6add802c7426273e7dbc2b27 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:02 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:48:36 -0300 perf test: Switch to new perf_mmap__read_event() interface for 'code reading' test The perf test 'object code reading' still use the legacy interface. No functional change. Committer notes: Testing: # perf test reading 23: Object code reading: Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-5-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/code-reading.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c index c7115d369511..03ed8c77b1bb 100644 --- a/tools/perf/tests/code-reading.c +++ b/tools/perf/tests/code-reading.c @@ -409,15 +409,22 @@ static int process_events(struct machine *machine, struct perf_evlist *evlist, struct state *state) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; int i, ret; for (i = 0; i < evlist->nr_mmaps; i++) { - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { ret = process_event(machine, evlist, event, state); - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); if (ret < 0) return ret; } + perf_mmap__read_done(md); } return 0; }
[tip:perf/core] perf python: Switch to new perf_mmap__read_event() interface
Commit-ID: 35b7cdc6379ea8300161f0f80fe8aad083a1c5d0 Gitweb: https://git.kernel.org/tip/35b7cdc6379ea8300161f0f80fe8aad083a1c5d0 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:00 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:47:07 -0300 perf python: Switch to new perf_mmap__read_event() interface The perf python binding still use the legacy interface. No functional change. Committer notes: Tested before and after with: [root@jouet perf]# export PYTHONPATH=/tmp/build/perf/python [root@jouet perf]# tools/perf/python/twatch.py cpu: 0, pid: 1183, tid: 6293 { type: exit, pid: 1183, ppid: 1183, tid: 6293, ptid: 6293, time: 17886646588257} cpu: 2, pid: 13820, tid: 13820 { type: fork, pid: 13820, ppid: 13820, tid: 6306, ptid: 13820, time: 17886869099529} cpu: 1, pid: 13820, tid: 6306 { type: comm, pid: 13820, tid: 6306, comm: TaskSchedulerFo } ^CTraceback (most recent call last): File "tools/perf/python/twatch.py", line 68, in main() File "tools/perf/python/twatch.py", line 40, in main evlist.poll(timeout = -1) KeyboardInterrupt [root@jouet perf]# No problems found. 
Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-3-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/python.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 2918cac7a142..35fb5ef7d290 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -983,13 +983,19 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist, union perf_event *event; int sample_id_all = 1, cpu; static char *kwlist[] = { "cpu", "sample_id_all", NULL }; + struct perf_mmap *md; + u64 end, start; int err; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "i|i", kwlist, &cpu, &sample_id_all)) return NULL; - event = perf_evlist__mmap_read(evlist, cpu); + md = &evlist->mmap[cpu]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + goto end; + + event = perf_mmap__read_event(md, false, &start, end); if (event != NULL) { PyObject *pyevent = pyrf_event__new(event); struct pyrf_event *pevent = (struct pyrf_event *)pyevent; @@ -1007,14 +1013,14 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist, err = perf_evsel__parse_sample(evsel, event, &pevent->sample); /* Consume the even only after we parsed it out. */ - perf_evlist__mmap_consume(evlist, cpu); + perf_mmap__consume(md, false); if (err) return PyErr_Format(PyExc_OSError, "perf: can't parse sample, err=%d", err); return pyevent; } - +end: Py_INCREF(Py_None); return Py_None; }
[tip:perf/core] perf test: Switch to new perf_mmap__read_event() interface for bpf
Commit-ID: 2f54f3a4733c0cd857992d793af5e8321b281012 Gitweb: https://git.kernel.org/tip/2f54f3a4733c0cd857992d793af5e8321b281012 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:09:01 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:47:54 -0300 perf test: Switch to new perf_mmap__read_event() interface for bpf The perf test 'bpf' still use the legacy interface. No functional change. Committer notes: Tested with: # perf test bpf 39: BPF filter: 39.1: Basic BPF filtering : Ok 39.2: BPF pinning : Ok 39.3: BPF prologue generation : Ok 39.4: BPF relocation checker : Ok # Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-4-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/tests/bpf.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c index e8399beca62b..09c9c9f9e827 100644 --- a/tools/perf/tests/bpf.c +++ b/tools/perf/tests/bpf.c @@ -176,13 +176,20 @@ static int do_test(struct bpf_object *obj, int (*func)(void), for (i = 0; i < evlist->nr_mmaps; i++) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { const u32 type = event->header.type; if (type == PERF_RECORD_SAMPLE) count ++; } + perf_mmap__read_done(md); } if (count != expect) {
[tip:perf/core] perf trace: Switch to new perf_mmap__read_event() interface
Commit-ID: d7f55c62e63461c4071afe8730851e406935d960 Gitweb: https://git.kernel.org/tip/d7f55c62e63461c4071afe8730851e406935d960 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:08:59 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:41:59 -0300 perf trace: Switch to new perf_mmap__read_event() interface The 'perf trace' utility still uses the legacy interface. Switch to the new perf_mmap__read_event() interface. No functional change. Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-2-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-trace.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index e7f1b182fc15..1a93debc1e8d 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -2472,8 +2472,14 @@ again: for (i = 0; i < evlist->nr_mmaps; i++) { union perf_event *event; + struct perf_mmap *md; + u64 end, start; - while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) { + md = &evlist->mmap[i]; + if (perf_mmap__read_init(md, false, &start, &end) < 0) + continue; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { struct perf_sample sample; ++trace->nr_events; @@ -2486,7 +2492,7 @@ again: trace__handle_event(trace, event, &sample); next_event: - perf_evlist__mmap_consume(evlist, i); + perf_mmap__consume(md, false); if (interrupted) goto out_disable; @@ -2496,6 +2502,7 @@ next_event: draining = true; } } + perf_mmap__read_done(md); } if (trace->nr_events == before) {
[tip:perf/core] perf record: Fix crash in pipe mode
Commit-ID: ad46e48c65fa1f204fa29eaff1b91174d314a94b Gitweb: https://git.kernel.org/tip/ad46e48c65fa1f204fa29eaff1b91174d314a94b Author: Jiri Olsa AuthorDate: Fri, 2 Mar 2018 17:13:54 +0100 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:45 -0300 perf record: Fix crash in pipe mode Currently we can crash perf record when running in pipe mode, like: $ perf record ls | perf report # To display the perf.data header info, please use --header/--header-only options. # perf: Segmentation fault Error: The - file has no samples! The callstack of the crash is: 0x00515242 in perf_event__synthesize_event_update_name 3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]); (gdb) bt #0 0x00515242 in perf_event__synthesize_event_update_name #1 0x005158a4 in perf_event__synthesize_extra_attr #2 0x00443347 in record__synthesize #3 0x004438e3 in __cmd_record #4 0x0044514e in cmd_record #5 0x004cbc95 in run_builtin #6 0x004cbf02 in handle_internal_command #7 0x004cc054 in run_argv #8 0x004cc422 in main The reason for the crash is that the evsel does not have an ids array allocated and the pipe's synthesize code tries to access it. We don't force evsel ids allocation when we have a single event, because it's not needed. However, we need it in pipe mode even for a single event, as a key for the evsel update event. Fix this by forcing evsel ids allocation even for a single event when we are in pipe mode. 
Signed-off-by: Jiri Olsa Cc: Alexander Shishkin Cc: David Ahern Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180302161354.30192-1-jo...@kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-record.c | 9 + tools/perf/perf.h | 1 + tools/perf/util/record.c| 8 ++-- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 62387942a1d5..12230ddb6506 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -882,6 +882,15 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) } } + /* +* If we have just single event and are sending data +* through pipe, we need to force the ids allocation, +* because we synthesize event name through the pipe +* and need the id for that. +*/ + if (data->is_pipe && rec->evlist->nr_entries == 1) + rec->opts.sample_id = true; + if (record__open(rec) != 0) { err = -1; goto out_child; diff --git a/tools/perf/perf.h b/tools/perf/perf.h index 007e0dfd5ce3..8fec1abd0f1f 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -62,6 +62,7 @@ struct record_opts { bool overwrite; bool ignore_missing_thread; bool strict_freq; + bool sample_id; unsigned int freq; unsigned int mmap_pages; unsigned int auxtrace_mmap_pages; diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index 4f1a82e76d39..9cfc7bf16531 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -138,6 +138,7 @@ void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts, struct perf_evsel *evsel; bool use_sample_identifier = false; bool use_comm_exec; + bool sample_id = opts->sample_id; /* * Set the evsel leader links before we configure attributes, @@ -164,8 +165,7 @@ void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts, * match the id. 
*/ use_sample_identifier = perf_can_sample_identifier(); - evlist__for_each_entry(evlist, evsel) - perf_evsel__set_sample_id(evsel, use_sample_identifier); + sample_id = true; } else if (evlist->nr_entries > 1) { struct perf_evsel *first = perf_evlist__first(evlist); @@ -175,6 +175,10 @@ void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts, use_sample_identifier = perf_can_sample_identifier(); break; } + sample_id = true; + } + + if (sample_id) { evlist__for_each_entry(evlist, evsel) perf_evsel__set_sample_id(evsel, use_sample_identifier); }
[tip:perf/core] perf kvm: Switch to new perf_mmap__read_event() interface
Commit-ID: 53172f9057e92c9b27f0bbf2a46827d87f12b0d2 Gitweb: https://git.kernel.org/tip/53172f9057e92c9b27f0bbf2a46827d87f12b0d2 Author: Kan Liang AuthorDate: Thu, 1 Mar 2018 18:08:58 -0500 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 10:41:36 -0300 perf kvm: Switch to new perf_mmap__read_event() interface 'perf kvm' still uses the legacy interface. Switch to the new perf_mmap__read_event() interface for 'perf kvm'. No functional change. Committer notes: Tested before and after running: # perf kvm stat record On a machine with a kvm guest, then used: # perf kvm stat report Before/after results match and look like: # perf kvm stat record -a sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.132 MB perf.data.guest (1828 samples) ] # perf kvm stat report Analyze events for all VMs, all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time IO_INSTRUCTION 258 40.06% 0.08% 3.51us 122.54us 14.87us (+- 6.76%) MSR_WRITE 178 27.64% 0.01% 0.47us 6.34us 2.18us (+- 4.80%) EPT_MISCONFIG 148 22.98% 0.03% 3.76us 65.60us 11.22us (+- 8.14%) HLT 47 7.30% 99.88% 181.69us 249988.06us 102061.36us (+- 13.49%) PAUSE_INSTRUCTION 5 0.78% 0.00% 0.38us 0.79us 0.47us (+- 17.05%) MSR_READ 4 0.62% 0.00% 1.14us 3.33us 2.67us (+- 19.35%) EXTERNAL_INTERRUPT 2 0.31% 0.00% 2.15us 2.17us 2.16us (+- 0.30%) PENDING_INTERRUPT 1 0.16% 0.00% 2.56us 2.56us 2.56us (+- 0.00%) PREEMPTION_TIMER 1 0.16% 0.00% 3.21us 3.21us 3.21us (+- 0.00%) Total Samples: 644, Total events handled time: 4802790.72us. 
# Signed-off-by: Kan Liang Tested-by: Arnaldo Carvalho de Melo Cc: Andi Kleen Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/r/1519945751-37786-1-git-send-email-kan.li...@linux.intel.com [ Changed bool parameters from 0 to 'false', as per Jiri comment ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-kvm.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 55d919dc5bc6..d2703d3b8366 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -743,16 +743,24 @@ static bool verify_vcpu(int vcpu) static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, u64 *mmap_time) { + struct perf_evlist *evlist = kvm->evlist; union perf_event *event; + struct perf_mmap *md; + u64 end, start; u64 timestamp; s64 n = 0; int err; *mmap_time = ULLONG_MAX; - while ((event = perf_evlist__mmap_read(kvm->evlist, idx)) != NULL) { - err = perf_evlist__parse_sample_timestamp(kvm->evlist, event, ×tamp); + md = &evlist->mmap[idx]; + err = perf_mmap__read_init(md, false, &start, &end); + if (err < 0) + return (err == -EAGAIN) ? 0 : -1; + + while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) { + err = perf_evlist__parse_sample_timestamp(evlist, event, ×tamp); if (err) { - perf_evlist__mmap_consume(kvm->evlist, idx); + perf_mmap__consume(md, false); pr_err("Failed to parse sample\n"); return -1; } @@ -762,7 +770,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, * FIXME: Here we can't consume the event, as perf_session__queue_event will *point to it, and it'll get possibly overwritten by the kernel. */ - perf_evlist__mmap_consume(kvm->evlist, idx); + perf_mmap__consume(md, false); if (err) { pr_err("Failed to enqueue sample: %d\n", err); @@ -779,6 +787,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, break; } + perf_mmap__read_done(md); return n; }
[tip:perf/core] perf annotate: Find 'call' instruction target symbol at parsing time
Commit-ID: 696703af37a28892db89ff6a6d0cdfde6fb803ab Gitweb: https://git.kernel.org/tip/696703af37a28892db89ff6a6d0cdfde6fb803ab Author: Arnaldo Carvalho de Melo AuthorDate: Fri, 2 Mar 2018 11:59:36 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:45 -0300 perf annotate: Find 'call' instruction target symbol at parsing time So that we do it just once, not everytime we press enter or -> on a 'call' instruction line. Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-uysyojl1e6nm94amzzzs0...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/ui/browsers/annotate.c | 17 + tools/perf/util/annotate.c| 38 +- tools/perf/util/annotate.h| 1 + 3 files changed, 27 insertions(+), 29 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 6ff6839558b0..618edf96353c 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -568,35 +568,28 @@ static bool annotate_browser__callq(struct annotate_browser *browser, struct map_symbol *ms = browser->b.priv; struct disasm_line *dl = disasm_line(browser->selection); struct annotation *notes; - struct addr_map_symbol target = { - .map = ms->map, - .addr = map__objdump_2mem(ms->map, dl->ops.target.addr), - }; char title[SYM_TITLE_MAX_SIZE]; if (!ins__is_call(&dl->ins)) return false; - if (map_groups__find_ams(&target) || - map__rip_2objdump(target.map, target.map->map_ip(target.map, -target.addr)) != - dl->ops.target.addr) { + if (!dl->ops.target.sym) { ui_helpline__puts("The called function was not found."); return true; } - notes = symbol__annotation(target.sym); + notes = symbol__annotation(dl->ops.target.sym); pthread_mutex_lock(¬es->lock); - if (notes->src == NULL && symbol__alloc_hist(target.sym) < 0) { + if (notes->src == NULL && symbol__alloc_hist(dl->ops.target.sym) < 0) { pthread_mutex_unlock(¬es->lock); ui__warning("Not enough memory for 
annotating '%s' symbol!\n", - target.sym->name); + dl->ops.target.sym->name); return true; } pthread_mutex_unlock(¬es->lock); - symbol__tui_annotate(target.sym, target.map, evsel, hbt); + symbol__tui_annotate(dl->ops.target.sym, ms->map, evsel, hbt); sym_title(ms->sym, ms->map, title, sizeof(title)); ui_browser__show_title(&browser->b, title); return true; diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 28b233c3dcbe..49ff825f745c 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -187,6 +187,9 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2) static int call__parse(struct arch *arch, struct ins_operands *ops, struct map *map) { char *endptr, *tok, *name; + struct addr_map_symbol target = { + .map = map, + }; ops->target.addr = strtoull(ops->raw, &endptr, 16); @@ -208,28 +211,29 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map * ops->target.name = strdup(name); *tok = '>'; - return ops->target.name == NULL ? 
-1 : 0; + if (ops->target.name == NULL) + return -1; +find_target: + target.addr = map__objdump_2mem(map, ops->target.addr); -indirect_call: - tok = strchr(endptr, '*'); - if (tok == NULL) { - struct symbol *sym = map__find_symbol(map, map->map_ip(map, ops->target.addr)); - if (sym != NULL) - ops->target.name = strdup(sym->name); - else - ops->target.addr = 0; - return 0; - } + if (map_groups__find_ams(&target) == 0 && + map__rip_2objdump(target.map, map->map_ip(target.map, target.addr)) == ops->target.addr) + ops->target.sym = target.sym; - ops->target.addr = strtoull(tok + 1, NULL, 16); return 0; + +indirect_call: + tok = strchr(endptr, '*'); + if (tok != NULL) + ops->target.addr = strtoull(tok + 1, NULL, 16); + goto find_target; } static int call__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands *ops) { - if (ops->target.name) - return scnprintf(bf, size, "%-6s %s", ins->name, ops->target.name); + if (ops->target.sym) + return scnprintf(bf, size, "%-6s %s", ins->name, ops->target.sym->name); if (ops->target.addr == 0)
[tip:perf/core] perf record: Throttle user defined frequencies to the maximum allowed
Commit-ID: b09c2364a4dc2a67e640c2b839d936302815693f Gitweb: https://git.kernel.org/tip/b09c2364a4dc2a67e640c2b839d936302815693f Author: Arnaldo Carvalho de Melo AuthorDate: Thu, 1 Mar 2018 14:52:50 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:44 -0300 perf record: Throttle user defined frequencies to the maximum allowed # perf record -F 20 sleep 1 warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz. The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate. The kernel will lower it when perf's interrupts take too long. Use --strict-freq to disable this throttling, refusing to record. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ] # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 For those wanting that it fails if the desired frequency can't be used: # perf record --strict-freq -F 20 sleep 1 error: Maximum frequency rate (15,000 Hz) exceeded. Please use -F freq option with a lower value or consider tweaking /proc/sys/kernel/perf_event_max_sample_rate. 
# Suggested-by: Ingo Molnar Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-oyebruc44nlja499nqkr1...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-record.txt | 7 ++- tools/perf/builtin-record.c | 2 ++ tools/perf/perf.h| 1 + tools/perf/util/record.c | 20 +++- 4 files changed, 24 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 94f2faebc7f0..cc37b3a4be76 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -191,11 +191,16 @@ OPTIONS -i:: --no-inherit:: Child tasks do not inherit counters. + -F:: --freq=:: Profile at this frequency. Use 'max' to use the currently maximum allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate - sysctl. + sysctl. Will throttle down to the currently maximum allowed frequency. + See --strict-freq. + +--strict-freq:: + Fail if the specified frequency can't be used. 
-m:: --mmap-pages=:: diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index e1821eea14ef..62387942a1d5 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -1543,6 +1543,8 @@ static struct option __record_options[] = { OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize, "synthesize non-sample events at the end of output"), OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"), + OPT_BOOLEAN(0, "strict-freq", &record.opts.strict_freq, + "Fail if the specified frequency can't be used"), OPT_CALLBACK('F', "freq", &record.opts, "freq or 'max'", "profile at this frequency", record__parse_freq), diff --git a/tools/perf/perf.h b/tools/perf/perf.h index a5df8bf73a68..007e0dfd5ce3 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -61,6 +61,7 @@ struct record_opts { bool tail_synthesize; bool overwrite; bool ignore_missing_thread; + bool strict_freq; unsigned int freq; unsigned int mmap_pages; unsigned int auxtrace_mmap_pages; diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index acabf54ceccb..4f1a82e76d39 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -216,11 +216,21 @@ static int record_opts__config_freq(struct record_opts *opts) * User specified frequency is over current maximum. */ if (user_freq && (max_rate < opts->freq)) { - pr_err("Maximum frequency rate (%u) reached.\n" - "Please use -F freq option with lower value or consider\n" - "tweaking /proc/sys/kernel/perf_event_max_sample_rate.\n", - max_rate); - return -1; + if (opts->strict_freq) { + pr_err("error: Maximum frequency rate (%'u Hz) exceeded.\n" + " Please use -F freq option with a lower value or consider\n" + " tweaking /proc/sys/kernel/perf_event_max_sample_rate.\n", + max_rate); + return -1; + } else { + pr_warning("warning: Maximum f
[tip:perf/core] perf top browser: Show sample_freq in browser title line
Commit-ID: a9980a6dbb9efd154b032ad729c869784302f361 Gitweb: https://git.kernel.org/tip/a9980a6dbb9efd154b032ad729c869784302f361 Author: Arnaldo Carvalho de Melo AuthorDate: Thu, 1 Mar 2018 14:22:12 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:43 -0300 perf top browser: Show sample_freq in browser title line The '--stdio' 'perf top' UI shows it, so lets remove this UI difference and show it too in '--tui', will be useful for 'perf top --tui -F max'. Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-n3wd8n395uo4y9irst29p...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/ui/browsers/hists.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c index 6495ee55d9c3..de2bde232cb3 100644 --- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -2223,7 +2223,7 @@ static int perf_evsel_browser_title(struct hist_browser *browser, u64 nr_events = hists->stats.total_period; struct perf_evsel *evsel = hists_to_evsel(hists); const char *ev_name = perf_evsel__name(evsel); - char buf[512]; + char buf[512], sample_freq_str[64] = ""; size_t buflen = sizeof(buf); char ref[30] = " show reference callgraph, "; bool enable_ref = false; @@ -2255,10 +2255,14 @@ static int perf_evsel_browser_title(struct hist_browser *browser, if (symbol_conf.show_ref_callgraph && strstr(ev_name, "call-graph=no")) enable_ref = true; + + if (!is_report_browser(hbt)) + scnprintf(sample_freq_str, sizeof(sample_freq_str), " %d Hz,", evsel->attr.sample_freq); + nr_samples = convert_unit(nr_samples, &unit); printed = scnprintf(bf, size, - "Samples: %lu%c of event '%s',%sEvent count (approx.): %" PRIu64, - nr_samples, unit, ev_name, enable_ref ? ref : " ", nr_events); + "Samples: %lu%c of event '%s',%s%sEvent count (approx.): %" PRIu64, + nr_samples, unit, ev_name, sample_freq_str, enable_ref ? 
ref : " ", nr_events); if (hists->uid_filter_str)
[tip:perf/core] perf top: Allow asking for the maximum allowed sample rate
Commit-ID: 7831bf236505bcb2a0a1255e7f3e902a0cb732d6 Gitweb: https://git.kernel.org/tip/7831bf236505bcb2a0a1255e7f3e902a0cb732d6 Author: Arnaldo Carvalho de Melo AuthorDate: Thu, 1 Mar 2018 14:25:56 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:44 -0300 perf top: Allow asking for the maximum allowed sample rate Add the handy '-F max' shortcut, just introduced to 'perf record', to reading and using the kernel.perf_event_max_sample_rate value as the user supplied sampling frequency: Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-hz04f296zccknnb5at06a...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-top.txt | 4 +++- tools/perf/builtin-top.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 8a32cc77bead..a039407d63b8 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -55,7 +55,9 @@ Default is to monitor all CPUS. -F :: --freq=:: - Profile at this frequency. + Profile at this frequency. Use 'max' to use the currently maximum + allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate + sysctl. 
-i:: --inherit:: diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 35ac016fcb98..bb4f9fafd11d 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1307,7 +1307,9 @@ int cmd_top(int argc, const char **argv) OPT_STRING(0, "sym-annotate", &top.sym_filter, "symbol name", "symbol to annotate"), OPT_BOOLEAN('z', "zero", &top.zero, "zero history across updates"), - OPT_UINTEGER('F', "freq", &opts->user_freq, "profile at this frequency"), + OPT_CALLBACK('F', "freq", &top.record_opts, "freq or 'max'", +"profile at this frequency", + record__parse_freq), OPT_INTEGER('E', "entries", &top.print_entries, "display this many functions"), OPT_BOOLEAN('U', "hide_user_symbols", &top.hide_user_symbols,
[tip:perf/core] perf record: Allow asking for the maximum allowed sample rate
Commit-ID: 67230479b2304be99e9451ee171aa288a112ea16 Gitweb: https://git.kernel.org/tip/67230479b2304be99e9451ee171aa288a112ea16 Author: Arnaldo Carvalho de Melo AuthorDate: Thu, 1 Mar 2018 13:46:23 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:43 -0300 perf record: Allow asking for the maximum allowed sample rate Add the handy '-F max' shortcut to reading and using the kernel.perf_event_max_sample_rate value as the user supplied sampling frequency: # perf record -F max sleep 1 info: Using a maximum frequency rate of 15,000 Hz [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ] # sysctl kernel.perf_event_max_sample_rate kernel.perf_event_max_sample_rate = 15000 # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 # perf record -F 10 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ] # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 # Suggested-by: Ingo Molnar Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hm...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-record.txt | 4 +++- tools/perf/builtin-record.c | 7 ++- tools/perf/perf.h| 2 ++ tools/perf/util/record.c | 23 +++ 4 files changed, 34 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 76bc2181d214..94f2faebc7f0 100644 --- 
a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -193,7 +193,9 @@ OPTIONS Child tasks do not inherit counters. -F:: --freq=:: - Profile at this frequency. + Profile at this frequency. Use 'max' to use the currently maximum + allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate + sysctl. -m:: --mmap-pages=:: diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 907267206973..e1821eea14ef 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -45,6 +45,7 @@ #include #include +#include #include #include #include @@ -1542,7 +1543,9 @@ static struct option __record_options[] = { OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize, "synthesize non-sample events at the end of output"), OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"), - OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"), + OPT_CALLBACK('F', "freq", &record.opts, "freq or 'max'", +"profile at this frequency", + record__parse_freq), OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]", "number of mmap data pages and AUX area tracing mmap pages", record__parse_mmap_pages), @@ -1651,6 +1654,8 @@ int cmd_record(int argc, const char **argv) struct record *rec = &record; char errbuf[BUFSIZ]; + setlocale(LC_ALL, ""); + #ifndef HAVE_LIBBPF_SUPPORT # define set_nobuild(s, l, c) set_option_nobuild(record_options, s, l, "NO_LIBBPF=1", c) set_nobuild('\0', "clang-path", true); diff --git a/tools/perf/perf.h b/tools/perf/perf.h index cfe46236a5e5..a5df8bf73a68 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -82,4 +82,6 @@ struct record_opts { struct option; extern const char * const *record_usage; extern struct option *record_options; + +int record__parse_freq(const struct option *opt, const char *str, int unset); #endif diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index 1e97937b03a9..acabf54ceccb 100644 --- 
a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -5,6 +5,7 @@ #include "parse-events.h" #include #include +#include #include "util.h" #include "cloexec.h" @@ -287,3 +288,25 @@ out_delete: perf_evlist__delete(temp_evlist); return ret; } + +int record__parse_freq(const struct option *opt, const char *str, int unset __maybe_unused) +{ + unsigned int freq; + struct record_opts *opts = opt->value; + + if (!str) + return -EINVAL; + + if (strcasecmp(str, "max") == 0) { + if (get_max_rate(&freq)) { + pr_err("couldn
[PATCH] perf stat: fix cvs output format
From: Ilya Pronin When printing stats in CSV mode, perf stat appends extra CSV separators when a counter is not supported: <not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00 which breaks field parsing. With this fix the number of separators is the same for each line, whether the counter is supported or not. Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output") Cc: Andi Kleen Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Signed-off-by: Ilya Pronin Signed-off-by: Cong Wang --- tools/perf/builtin-stat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..54a4c152edb3 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -917,7 +917,7 @@ static void print_metric_csv(void *ctx, char buf[64], *vals, *ends; if (unit == NULL || fmt == NULL) { - fprintf(out, "%s%s%s%s", csv_sep, csv_sep, csv_sep, csv_sep); + fprintf(out, "%s%s", csv_sep, csv_sep); return; } snprintf(buf, sizeof(buf), fmt, val); -- 2.13.0
[tip:perf/core] perf tests: Switch trace+probe_libc_inet_pton to use record
Commit-ID: a18ee796f8af5569628c324700b9a34b4488 Gitweb: https://git.kernel.org/tip/a18ee796f8af5569628c324700b9a34b4488 Author: Jiri Olsa AuthorDate: Thu, 1 Mar 2018 17:52:14 +0100 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:42 -0300 perf tests: Switch trace+probe_libc_inet_pton to use record There's a problem with relying on backtrace data from 'perf trace' the way the trace+probe_libc_inet_pton does. This test inserts uprobe within ping binary and checks that it gets its sample using 'perf trace'. It also checks it gets proper backtrace from sample and that's where the issue is. The 'perf trace' does not sort events (by definition) so it can happen that it processes the event sample before the ping binary memory map event. This can (very rarely) happen as proved by this events dump output (from custom added debug output): ... 7680/7680: [0x7f4e29718000(0x204000) @ 0 fd:00 33611321 4230892504]: r-xp /usr/lib64/libdl-2.17.so 7680/7680: [0x7f4e29502000(0x216000) @ 0 fd:00 33617257 2606846872]: r-xp /usr/lib64/libz.so.1.2.7 (IP, 0x2): 7680/7680: 0x7f4e29c2ed60 period: 1 addr: 0 7680/7680: [0x564842ef(0x233000) @ 0 fd:00 83 1989280200]: r-xp /usr/bin/ping 7680/7680: [0x7f4e2aca2000(0x224000) @ 0 fd:00 33611308 1219144940]: r-xp /usr/lib64/ld-2.17.so ... In this case 'perf trace' fails to resolve the last callchain IP (within the ping binary) because it does not know about the ping binary memory map yet and the test fails like this: PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.037 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms 0.000 probe_libc:inet_pton:(7f4e29c2ed60)) __GI___inet_pton (/usr/lib64/libc-2.17.so) getaddrinfo (/usr/lib64/libc-2.17.so) [0] ([unknown]) FAIL: expected backtrace entry 8 ".*\(.*/bin/ping.*\)$" got "[0] ([unknown])" Switching the test to use 'perf record' and 'perf script' instead of 'perf trace'. 
Signed-off-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: David Ahern Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180301165215.6780-1-jo...@kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- .../perf/tests/shell/trace+probe_libc_inet_pton.sh | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh index 8c4ab0b390c0..52c3ee701a89 100755 --- a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh +++ b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh @@ -15,30 +15,28 @@ nm -g $libc 2>/dev/null | fgrep -q inet_pton || exit 254 trace_libc_inet_pton_backtrace() { idx=0 - expected[0]="PING.*bytes" - expected[1]="64 bytes from ::1.*" - expected[2]=".*ping statistics.*" - expected[3]=".*packets transmitted.*" - expected[4]="rtt min.*" - expected[5]="[0-9]+\.[0-9]+[[:space:]]+probe_libc:inet_pton:\([[:xdigit:]]+\)" - expected[6]=".*inet_pton[[:space:]]\($libc|inlined\)$" + expected[0]="ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)" + expected[1]=".*inet_pton[[:space:]]\($libc\)$" case "$(uname -m)" in s390x) eventattr='call-graph=dwarf' - expected[7]="gaih_inet.*[[:space:]]\($libc|inlined\)$" - expected[8]="__GI_getaddrinfo[[:space:]]\($libc|inlined\)$" - expected[9]="main[[:space:]]\(.*/bin/ping.*\)$" - expected[10]="__libc_start_main[[:space:]]\($libc\)$" - expected[11]="_start[[:space:]]\(.*/bin/ping.*\)$" + expected[2]="gaih_inet.*[[:space:]]\($libc|inlined\)$" + expected[3]="__GI_getaddrinfo[[:space:]]\($libc|inlined\)$" + expected[4]="main[[:space:]]\(.*/bin/ping.*\)$" + expected[5]="__libc_start_main[[:space:]]\($libc\)$" + expected[6]="_start[[:space:]]\(.*/bin/ping.*\)$" ;; *) eventattr='max-stack=3' - expected[7]="getaddrinfo[[:space:]]\($libc\)$" - expected[8]=".*\(.*/bin/ping.*\)$" + expected[2]="getaddrinfo[[:space:]]\($libc\)$" + expected[3]=".*\(.*/bin/ping.*\)$" 
;; esac - perf trace --no-syscalls -e probe_libc:inet_pton/$eventattr/ ping -6 -c 1 ::1 2>&1 | grep -v ^$ | while read line ; do + file=`mktemp -u /tmp/perf.data.XXX` + + perf record -e probe_libc:inet_pton/$eventattr/ -o $file ping -6 -c 1 ::1 > /dev/null 2>&1 + perf script -i $file | while read line ; do echo $line echo "$line" | egrep -q "${expected[$idx]}" if [ $? -ne 0 ] ; then @@ -48,6 +46,8 @@ trace_libc_inet_pton_backtrace() { let idx+=1
[tip:perf/core] perf tests: Rename trace+probe_libc_inet_pton to record+probe_libc_inet_pton
Commit-ID: 4f67336870f641daa485ea504777486e24a9aece Gitweb: https://git.kernel.org/tip/4f67336870f641daa485ea504777486e24a9aece Author: Jiri Olsa AuthorDate: Thu, 1 Mar 2018 17:52:15 +0100 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:58:42 -0300 perf tests: Rename trace+probe_libc_inet_pton to record+probe_libc_inet_pton Because the test is no longer using perf trace but perf record instead. Signed-off-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: David Ahern Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180301165215.6780-2-jo...@kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- .../{trace+probe_libc_inet_pton.sh => record+probe_libc_inet_pton.sh} | 0 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh similarity index 100% rename from tools/perf/tests/shell/trace+probe_libc_inet_pton.sh rename to tools/perf/tests/shell/record+probe_libc_inet_pton.sh
[tip:perf/core] perf stat: Ignore error thread when enabling system-wide --per-thread
Commit-ID: ab6c79b819f5a50cf41a11ebec17bef63b530333 Gitweb: https://git.kernel.org/tip/ab6c79b819f5a50cf41a11ebec17bef63b530333 Author: Jin Yao AuthorDate: Tue, 16 Jan 2018 23:43:08 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 27 Feb 2018 11:29:21 -0300 perf stat: Ignore error thread when enabling system-wide --per-thread If we execute 'perf stat --per-thread' with a non-root account (even with kernel.perf_event_paranoid set to -1), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow tracing some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch records the index of the error thread in perf_evsel__open() and removes this thread before retrying.
For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec) # 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec) # 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u # 1.21 insn per cycle perf-3670 3,536 instructions:u # 0.09 insn per cycle vmstat-3458 288,937 branches:u # 46.814 M/sec perf-3670 936 branches:u # 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed Signed-off-by: Jin Yao Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-stat.c | 14 +- tools/perf/util/evsel.c | 3 +++ tools/perf/util/thread_map.c | 1 + tools/perf/util/thread_map.h | 1 + 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index fadcff52cd09..6214d2b220b2 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -637,7 +637,19 @@ try_again: if (verbose > 0) ui__warning("%s\n", msg); goto try_again; -} + } else if (target__has_per_thread(&target) && + evsel_list->threads && + evsel_list->threads->err_thread != -1) { + /* +* For global --per-thread case, skip current +* error thread.
+*/ + if (!thread_map__remove(evsel_list->threads, + evsel_list->threads->err_thread)) { + evsel_list->threads->err_thread = -1; + goto try_again; + } + } perf_evsel__open_strerror(counter, &target, errno, msg, sizeof(msg)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index ef351688b797..b56e1c2ddaee 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1915,6 +1915,9 @@ try_fallback: goto fallback_missing_features; } out_close: + if (err) + threads->err_thread = thread; + do { while (--thread >= 0) { close(FD(evsel, cpu, thread)); diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 729dad8f412d..5d467d8ae9ab 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -32,6 +32,7 @@ static void
[tip:perf/core] perf annotate browser: Be more robust when drawing jump arrows
Commit-ID: 9c04409d7f5c325233961673356ea8aced6a4ef3 Gitweb: https://git.kernel.org/tip/9c04409d7f5c325233961673356ea8aced6a4ef3 Author: Arnaldo Carvalho de Melo AuthorDate: Thu, 1 Mar 2018 11:33:59 -0300 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 5 Mar 2018 09:57:57 -0300 perf annotate browser: Be more robust when drawing jump arrows This first happened with a gcc function, _cpp_lex_token, that has the usual jumps: │1159e6c: ↓ jne 115aa32 <_cpp_lex_token@@Base+0xf92> I.e. jumps to a label inside that function (_cpp_lex_token), and those work, but also this kind: │1159e8b: ↓ jne c469be I.e. jumps to another function, outside _cpp_lex_token, which are not being correctly handled, generating as a side effect references to ab->offset[] entries that are set to NULL, so to make this code more robust, check that here. A proper fix for this will be put in place, looking at the function name right after the '<' token and probably treating this like a 'call' instruction. For now just don't draw the arrow. Reported-by: Ingo Molnar Reported-by: Linus Torvalds Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Cc: Jin Yao Cc: Kan Liang Link: https://lkml.kernel.org/n/tip-5tzvb875ep2sel03aeefg...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/ui/browsers/annotate.c | 25 + 1 file changed, 25 insertions(+) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index e2f666391ac4..6ff6839558b0 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -328,7 +328,32 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) if (!disasm_line__is_valid_jump(cursor, sym)) return; + /* +* This first was seen with a gcc function, _cpp_lex_token, that +* has the usual jumps: +* +* │1159e6c: ↓ jne 115aa32 <_cpp_lex_token@@Base+0xf92> +* +* I.e.
jumps to a label inside that function (_cpp_lex_token), and +* those work, but also this kind: +* +* │1159e8b: ↓ jne c469be +* +* I.e. jumps to another function, outside _cpp_lex_token, which +* are not being correctly handled, generating as a side effect references +* to ab->offset[] entries that are set to NULL, so to make this code +* more robust, check that here. +* +* A proper fix for this will be put in place, looking at the function +* name right after the '<' token and probably treating this like a +* 'call' instruction. +*/ target = ab->offsets[cursor->ops.target.offset]; + if (target == NULL) { + ui_helpline__printf("WARN: jump target inconsistency, press 'o', ab->offsets[%#x] = NULL\n", + cursor->ops.target.offset); + return; + } bcursor = browser_line(&cursor->al); btarget = browser_line(target);
[tip:perf/core] perf top: Fix annoying fallback message on older kernels
Commit-ID: 853745f5e6d95faaae6381c9a01dbd43de992fb3 Gitweb: https://git.kernel.org/tip/853745f5e6d95faaae6381c9a01dbd43de992fb3 Author: Kan Liang AuthorDate: Mon, 26 Feb 2018 10:17:10 -0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 26 Feb 2018 16:04:08 -0300 perf top: Fix annoying fallback message on older kernels On older (e.g. v4.4) kernels, an annoying fallback message can be observed in 'perf top': ┌─Warning:──┐ │fall back to non-overwrite mode│ │ │ │ │ │Press any key... │ └───┘ The 'perf top' utility has been changed to overwrite mode since commit ebebbf082357 ("perf top: Switch default mode to overwrite mode"). For older kernels which don't have overwrite mode support, 'perf top' will fall back to non-overwrite mode and print out the fallback message using ui__warning(), which needs the user's input to close. The fallback message is not critical for end users, so turn it into a debug message that is only printed when running with -vv. Reported-by: Ingo Molnar Signed-off-by: Kan Liang Cc: Kan Liang Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode") Link: http://lkml.kernel.org/r/1519669030-176549-1-git-send-email-kan.li...@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-top.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index b7c823ba8374..35ac016fcb98 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -991,7 +991,7 @@ static int perf_top_overwrite_fallback(struct perf_top *top, evlist__for_each_entry(evlist, counter) counter->attr.write_backward = false; opts->overwrite = false; - ui__warning("fall back to non-overwrite mode\n"); + pr_debug2("fall back to non-overwrite mode\n"); return 1; }
[tip:perf/core] perf cgroup: Simplify arguments when tracking multiple events
Commit-ID: 25f72f9ed88d5be86c92432fc779e2725e3506cd Gitweb: https://git.kernel.org/tip/25f72f9ed88d5be86c92432fc779e2725e3506cd Author: weiping zhang AuthorDate: Mon, 29 Jan 2018 23:48:09 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Thu, 22 Feb 2018 10:02:27 -0300 perf cgroup: Simplify arguments when tracking multiple events When using -G with one cgroup and -e with multiple events, only the first event gets the correct cgroup setting; all events from the second onwards will track system-wide events. If the user wants to track multiple events for a specific cgroup, the user must give parameters like the following: $ perf stat -e e1 -e e2 -e e3 -G test,test,test This patch simplifies this case; just specify one cgroup: $ perf stat -e e1 -e e2 -e e3 -G test $ mkdir -p /sys/fs/cgroup/perf_event/empty_cgroup $ perf stat -e cycles -e cache-misses -a -I 1000 -G empty_cgroup Before: 1.001007226 cycles empty_cgroup 1.001007226 7,506 cache-misses After: 1.000834097 cycles empty_cgroup 1.000834097 cache-misses empty_cgroup Signed-off-by: weiping zhang Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180129154805.ga6...@localhost.didichuxing.com [ Improved the doc text a bit, providing an example for cgroup + system wide counting ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-record.txt | 6 +- tools/perf/Documentation/perf-stat.txt | 6 +- tools/perf/util/cgroup.c | 17 - 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 3eea6de35a38..76bc2181d214 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -308,7 +308,11 @@ can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup to first event, second cgroup to second event and so on.
It is possible to provide an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have corresponding events, i.e., they always refer to events defined earlier on the command -line. +line. If the user wants to track multiple events for a specific cgroup, the user can +use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'. + +If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this +command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'. -b:: --branch-any:: diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 2bbe79a50d3c..2b38e222016a 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -118,7 +118,11 @@ can be provided. Each cgroup is applied to the corresponding event, i.e., first to first event, second cgroup to second event and so on. It is possible to provide an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have corresponding events, i.e., they always refer to events defined earlier on the command -line. +line. If the user wants to track multiple events for a specific cgroup, the user can +use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'. + +If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this +command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'. 
-o file:: --output file:: diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c index 984f69144f87..5dd9b5ea314d 100644 --- a/tools/perf/util/cgroup.c +++ b/tools/perf/util/cgroup.c @@ -157,9 +157,11 @@ int parse_cgroups(const struct option *opt __maybe_unused, const char *str, int unset __maybe_unused) { struct perf_evlist *evlist = *(struct perf_evlist **)opt->value; + struct perf_evsel *counter; + struct cgroup_sel *cgrp = NULL; const char *p, *e, *eos = str + strlen(str); char *s; - int ret; + int ret, i; if (list_empty(&evlist->entries)) { fprintf(stderr, "must define events before cgroups\n"); @@ -188,5 +190,18 @@ int parse_cgroups(const struct option *opt __maybe_unused, const char *str, break; str = p+1; } + /* for the case one cgroup combine to multiple events */ + i = 0; + if (nr_cgroups == 1) { + evlist__for_each_entry(evlist, counter) { + if (i == 0) + cgrp = counter->cgrp; + else { + counter->cgrp = cgrp; + refcount_inc(&cgrp->refcnt); + } + i++; + } + } return 0; }
[tip:perf/core] perf stat: Use xyarray dimensions to iterate fds
Commit-ID: 42811d509d6e0b0118918ce6be346be54d8e8801 Gitweb: https://git.kernel.org/tip/42811d509d6e0b0118918ce6be346be54d8e8801 Author: Andi Kleen AuthorDate: Thu, 5 Oct 2017 19:00:28 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 21 Feb 2018 11:36:57 -0300 perf stat: Use xyarray dimensions to iterate fds Now that the xyarray stores the dimensions, we can use those to iterate over the FDs for an evsel. Signed-off-by: Andi Kleen Acked-by: Jiri Olsa Link: http://lkml.kernel.org/r/20171006020029.13339-1-a...@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-stat.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 2d49eccf98f2..fadcff52cd09 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -508,14 +508,13 @@ static int perf_stat_synthesize_config(bool is_pipe) #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y)) -static int __store_counter_ids(struct perf_evsel *counter, - struct cpu_map *cpus, - struct thread_map *threads) +static int __store_counter_ids(struct perf_evsel *counter) { int cpu, thread; - for (cpu = 0; cpu < cpus->nr; cpu++) { - for (thread = 0; thread < threads->nr; thread++) { + for (cpu = 0; cpu < xyarray__max_x(counter->fd); cpu++) { + for (thread = 0; thread < xyarray__max_y(counter->fd); +thread++) { int fd = FD(counter, cpu, thread); if (perf_evlist__id_add_fd(evsel_list, counter, @@ -535,7 +534,7 @@ static int store_counter_ids(struct perf_evsel *counter) if (perf_evsel__alloc_id(counter, cpus->nr, threads->nr)) return -ENOMEM; - return __store_counter_ids(counter, cpus, threads); + return __store_counter_ids(counter); } static bool perf_evsel__should_store_id(struct perf_evsel *counter)
[tip:perf/core] perf kallsyms: Fix the usage on the man page
Commit-ID: de7112868829b3286def38297848d5d2592b4a70 Gitweb: https://git.kernel.org/tip/de7112868829b3286def38297848d5d2592b4a70 Author: Sangwon Hong AuthorDate: Mon, 12 Feb 2018 04:37:44 +0900 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 21 Feb 2018 09:23:36 -0300 perf kallsyms: Fix the usage on the man page First, all man pages highlight only perf and subcommands except 'perf kallsyms', which includes the full usage. Fix it for commands to monopolize underlines. Second, options can be omitted when executing 'perf kallsyms', so add square brackets around <options>. Signed-off-by: Sangwon Hong Acked-by: Namhyung Kim Cc: Jiri Olsa Cc: Taeung Song Link: http://lkml.kernel.org/r/1518377864-20353-1-git-send-email-qpa...@gmail.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-kallsyms.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-kallsyms.txt b/tools/perf/Documentation/perf-kallsyms.txt index 954ea9e21236..cf9f4040ea5c 100644 --- a/tools/perf/Documentation/perf-kallsyms.txt +++ b/tools/perf/Documentation/perf-kallsyms.txt @@ -8,7 +8,7 @@ perf-kallsyms - Searches running kernel for symbols SYNOPSIS [verse] -'perf kallsyms symbol_name[,symbol_name...]' +'perf kallsyms' [<options>] symbol_name[,symbol_name...] DESCRIPTION ---
Re: [GIT PULL 00/28] perf/core improvements and fixes
* Arnaldo Carvalho de Melo wrote: > Hi Ingo, > > Please consider pulling, I'll cherry pick some into a separate > perf/urgent pull request, like the jump-to-another-function one, after > the usual round of tests, but since I've been working on them in my > perf/core branch, let's flush them now. > > - Arnaldo > > Test results at the end of this message, as usual. > > The following changes since commit ddc4becca1409541c2ebb7ecb99b5cef44cf17e4: > > Merge tag 'perf-core-for-mingo-4.17-20180220' of > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core > (2018-02-21 08:50:45 +0100) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git > tags/perf-core-for-mingo-4.17-20180305 > > for you to fetch changes up to 6afad54d2f0ddebacfcf3b829147d7fed8dab298: > > perf mmap: Discard legacy interfaces for mmap read forward (2018-03-05 > 10:51:10 -0300) > > > perf/core improvements and fixes: > > - Be more robust when drawing arrows in the annotation TUI, avoiding a > segfault when jump instructions have as a target addresses in functions > other than the one currently being annotated. The full fix will come in > the following days, when jumping to other functions will work as call > instructions (Arnaldo Carvalho de Melo) > > - Allow asking for the maximum allowed sample rate in 'top' and > 'record', i.e.
'perf record -F max' will read the > kernel.perf_event_max_sample_rate sysctl and use it (Arnaldo Carvalho de > Melo) > > - When the user specifies a freq above kernel.perf_event_max_sample_rate, > throttle it down to that max freq and warn the user about it; also add > --strict-freq so that the previous behaviour of not starting the > session when the desired freq can't be used can be selected (Arnaldo > Carvalho de Melo) > > - Find 'call' instruction target symbol at parsing time, used so far in > the TUI, part of the infrastructure changes that will end up allowing > for jumps to navigate to other functions, just like 'call' > instructions. (Arnaldo Carvalho de Melo) > > - Use xyarray dimensions to iterate fds in 'perf stat' (Andi Kleen) > > - Ignore threads for which the current user doesn't have permissions when > enabling system-wide --per-thread (Jin Yao) > > - Fix some backtrace perf test cases to use 'perf record' + 'perf script' > instead, till 'perf trace' starts using ordered_events or equivalent > to avoid symbol resolving artifacts due to reordering of > PERF_RECORD_MMAP events (Jiri Olsa) > > - Fix crash in 'perf record' pipe mode, it needs to allocate the ID > array even for a single event, unlike non-pipe mode (Jiri Olsa) > > - Demote to a debug message the annoying fallback warning shown on older > kernels when newer 'perf top' binaries try to use overwrite mode and > that is not present in the older kernels (Kan Liang) > > - Switch last users of old APIs to the newer perf_mmap__read_event() > one, then discard those old mmap read forward APIs (Kan Liang) > > - Fix the usage on the 'perf kallsyms' man page (Sangwon Hong) > > - Simplify cgroup arguments when tracking multiple events (weiping zhang) > > Signed-off-by: Arnaldo Carvalho de Melo > > > Andi Kleen (1): > perf stat: Use xyarray dimensions to iterate fds > > Arnaldo Carvalho de Melo (6): > perf annotate browser: Be more robust when drawing jump arrows > perf record: Allow asking for the maximum allowed sample rate >
perf top browser: Show sample_freq in browser title line > perf top: Allow asking for the maximum allowed sample rate > perf record: Throttle user defined frequencies to the maximum allowed > perf annotate: Find 'call' instruction target symbol at parsing time > > Jin Yao (1): > perf stat: Ignore error thread when enabling system-wide --per-thread > > Jiri Olsa (3): > perf tests: Switch trace+probe_libc_inet_pton to use record > perf tests: Rename trace+probe_libc_inet_pton to > record+probe_libc_inet_pton > perf record: Fix crash in pipe mode > > Kan Liang (15): > perf top: Fix annoying fallback message on older kernels > perf kvm: Switch to new perf_mmap__read_event() interface > perf trace: Switch to new perf_mmap__read_event() interface > perf python: Switch to new perf_mmap__read_event() interface > p
[PATCH] cxgb3: remove VLA
In preparation to enabling -Wvla, remove VLA and replace it with dynamic memory allocation. Signed-off-by: Gustavo A. R. Silva --- drivers/net/ethernet/chelsio/cxgb3/t3_hw.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb3/t3_hw.c b/drivers/net/ethernet/chelsio/cxgb3/t3_hw.c index a89721f..ad6a280 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/t3_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb3/t3_hw.c @@ -683,20 +683,37 @@ int t3_seeprom_wp(struct adapter *adapter, int enable) static int vpdstrtouint(char *s, int len, unsigned int base, unsigned int *val) { - char tok[len + 1]; + char *tok; + int ret; + + tok = kcalloc(len + 1, sizeof(*tok), GFP_KERNEL); + if (!tok) + return -ENOMEM; memcpy(tok, s, len); tok[len] = 0; - return kstrtouint(strim(tok), base, val); + ret = kstrtouint(strim(tok), base, val); + + kfree(tok); + return ret; } static int vpdstrtou16(char *s, int len, unsigned int base, u16 *val) { - char tok[len + 1]; + char *tok; + int ret; + + tok = kcalloc(len + 1, sizeof(*tok), GFP_KERNEL); + if (!tok) + return -ENOMEM; memcpy(tok, s, len); tok[len] = 0; - return kstrtou16(strim(tok), base, val); + + ret = kstrtou16(strim(tok), base, val); + + kfree(tok); + return ret; } /** -- 2.7.4
Re: [PATCH] RDMA/bnxt_re/qplib_sp: Use true and false for boolean values
On Tue, Mar 6, 2018 at 5:06 AM, Gustavo A. R. Silva wrote: > Assign true or false to boolean variables instead of an integer value. > > This issue was detected with the help of Coccinelle. > > Signed-off-by: Gustavo A. R. Silva Thanks. Acked-by: Selvin Xavier > --- > drivers/infiniband/hw/bnxt_re/qplib_sp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c > b/drivers/infiniband/hw/bnxt_re/qplib_sp.c > index ee98e5e..2f3f32ea 100644 > --- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c > +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c > @@ -154,7 +154,7 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw, > attr->tqm_alloc_reqs[i * 4 + 3] = *(++tqm_alloc); > } > > - attr->is_atomic = 0; > + attr->is_atomic = false; > bail: > bnxt_qplib_rcfw_free_sbuf(rcfw, sbuf); > return rc; > -- > 2.7.4 >
[PATCH] perf: correct ctx_event_type in ctx_resched()
In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is added. However, ctx_resched() calculates ctx_event_type before checking this condition. As a result, pinned events will NOT get higher priority than flexible events. The following shows this issue on an Intel CPU (where ref-cycles can only use one hardware counter). 1. First start: perf stat -C 0 -e ref-cycles -I 1000 2. Then, in the second console, run: perf stat -C 0 -e ref-cycles:D -I 1000 The second perf uses pinned events, which are expected to have higher priority. However, because it fails in ctx_resched(), it is never run. This patch fixes this by calculating ctx_event_type after re-evaluating event_type. Fixes: 487f05e18aa4 ("perf/core: Optimize event rescheduling on active contexts") Signed-off-by: Song Liu Reported-by: Ephraim Park --- kernel/events/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 5789810..cf52fc0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2246,7 +2246,7 @@ static void ctx_resched(struct perf_cpu_context *cpuctx, struct perf_event_context *task_ctx, enum event_type_t event_type) { - enum event_type_t ctx_event_type = event_type & EVENT_ALL; + enum event_type_t ctx_event_type; bool cpu_event = !!(event_type & EVENT_CPU); /* @@ -2256,6 +2256,8 @@ static void ctx_resched(struct perf_cpu_context *cpuctx, if (event_type & EVENT_PINNED) event_type |= EVENT_FLEXIBLE; + ctx_event_type = event_type & EVENT_ALL; + perf_pmu_disable(cpuctx->ctx.pmu); if (task_ctx) task_ctx_sched_out(cpuctx, task_ctx, event_type); -- 2.9.5
Re: [PATCH] thermal: of: Allow selection of thermal governor in DT
On Tue, Mar 6, 2018 at 2:41 AM, Daniel Lezcano wrote: > On 05/03/2018 19:36, Amit Kucheria wrote: >> From: Ram Chandrasekar >> >> There is currently no way for the governor to be selected for each thermal >> zone in devicetree. This results in the default governor being used for all >> thermal zones even though no such restriction exists in the core code. >> >> Add support for specifying the thermal governor to be used for a thermal >> zone in the devicetree. The devicetree config should specify the governor >> name as a string that matches any available governors. If not specified, we >> maintain the current behaviour of using the default governor. >> >> Signed-off-by: Ram Chandrasekar >> Signed-off-by: Amit Kucheria > > Why not create a kernel parameter (eg. thermal.governor=) ? So everyone > can gain benefit of this feature. And in order to specify that from the > DT, add the 'chosen' node and bootargs with the desired kernel parameter? > This is supposed to be a per-thermal zone property. So specifying it on the command-line, while possible, might be a little cumbersome. I'm not even sure if kernel parameters can have a variable number of arguments. IOW, thermal.tz0.governor=userspace, thermal.tz1.governor=step_wise, thermal.tz2.governor=userspace, and so on. I'm already seeing SoCs defining 8 or more thermal zones.
Re: [PATCH] spi: tegra20-slink: use true and false for boolean values
On Tuesday 06 March 2018 05:23 AM, Gustavo A. R. Silva wrote: Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva Acked-by: Laxman Dewangan
Re: [PATCH v3 03/10] drivers: qcom: rpmh-rsc: log RPMH requests in FTRACE
Hi Lina, Thank you for the patch! Yet something to improve: [auto build test ERROR on robh/for-next] [also build test ERROR on v4.16-rc4 next-20180306] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Lina-Iyer/drivers-qcom-add-RPMH-communication-support/20180305-225623 base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next config: arm64-allmodconfig (attached as .config) compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm64 All error/warnings (new ones prefixed by >>): In file included from include/trace/define_trace.h:96:0, from drivers/soc/qcom/trace-rpmh.h:89, from drivers/soc/qcom/rpmh-rsc.c:28: drivers/soc/qcom/./trace-rpmh.h: In function 'trace_event_raw_event_rpmh_notify': >> drivers/soc/qcom/./trace-rpmh.h:29:3: error: implicit declaration of >> function '__assign_string'; did you mean '__assign_str'? >> [-Werror=implicit-function-declaration] __assign_string(name, d->name); ^ include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:28:2: note: in expansion of macro >> 'TP_fast_assign' TP_fast_assign( ^~ >> drivers/soc/qcom/./trace-rpmh.h:29:19: error: 'name' undeclared (first use >> in this function); did you mean 'node'? 
__assign_string(name, d->name); ^ include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:28:2: note: in expansion of macro >> 'TP_fast_assign' TP_fast_assign( ^~ drivers/soc/qcom/./trace-rpmh.h:29:19: note: each undeclared identifier is reported only once for each function it appears in __assign_string(name, d->name); ^ include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:28:2: note: in expansion of macro >> 'TP_fast_assign' TP_fast_assign( ^~ drivers/soc/qcom/./trace-rpmh.h: In function 'trace_event_raw_event_rpmh_send_msg': drivers/soc/qcom/./trace-rpmh.h:67:19: error: 'name' undeclared (first use in this function); did you mean 'node'? __assign_string(name, d->name); ^ include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS' PARAMS(assign), \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:50:1: note: in expansion of macro >> 'TRACE_EVENT' TRACE_EVENT(rpmh_send_msg, ^~~ drivers/soc/qcom/./trace-rpmh.h:66:2: note: in expansion of macro 'TP_fast_assign' TP_fast_assign( ^~ In file included from include/trace/define_trace.h:97:0, from drivers/soc/qcom/trace-rpmh.h:89, from drivers/soc/qcom/rpmh-rsc.c:28: drivers/soc/qcom/./trace-rpmh.h: In function 'perf_trace_rpmh_notify': >> drivers/soc/qcom/./trace-rpmh.h:29:19: error: 'name' undeclared (first use >> in this function); did you mean 'node'? 
__assign_string(name, d->name); ^ include/trace/perf.h:66:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:28:2: note: in expansion of macro >> 'TP_fast_assign' TP_fast_assign( ^~ drivers/soc/qcom/./trace-rpmh.h: In function 'perf_trace_rpmh_send_msg': drivers/soc/qcom/./trace-rpmh.h:67:19: error: 'name' undeclared (first use in this function); did you mean 'node'? __assign_string(name, d->name); ^ include/trace/perf.h:66:4: note: in definition of macro 'DECLARE_EVENT_CLASS' { assign; } \ ^~ include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS' PARAMS(assign), \ ^~ >> drivers/soc/qcom/./trace-rpmh.h:50:1: note: in expansion of macro >> 'TRACE_EVENT'
Re: [PATCH] thermal: of: Allow selection of thermal governor in DT
On Tue, Mar 6, 2018 at 1:38 AM, Rob Herring wrote: > On Mon, Mar 5, 2018 at 12:36 PM, Amit Kucheria > wrote: >> From: Ram Chandrasekar >> >> There is currently no way for the governor to be selected for each thermal >> zone in devicetree. This results in the default governor being used for all >> thermal zones even though no such restriction exists in the core code. >> >> Add support for specifying the thermal governor to be used for a thermal >> zone in the devicetree. The devicetree config should specify the governor >> name as a string that matches any available governors. If not specified, we >> maintain the current behaviour of using the default governor. >> >> Signed-off-by: Ram Chandrasekar >> Signed-off-by: Amit Kucheria >> --- >> Documentation/devicetree/bindings/thermal/thermal.txt | 8 >> drivers/thermal/of-thermal.c | 6 ++ >> 2 files changed, 14 insertions(+) >> >> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt >> b/Documentation/devicetree/bindings/thermal/thermal.txt >> index 1719d47..fced9d3 100644 >> --- a/Documentation/devicetree/bindings/thermal/thermal.txt >> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt >> @@ -168,6 +168,14 @@ Optional property: >> by means of sensor ID. Additional coefficients are >> interpreted as constant offset. >> >> +- thermal-governor: Thermal governor to be used for this thermal zone. >> + Expected values are: >> + "step_wise": Use step wise governor. >> + "fair_share": Use fair share governor. >> + "user_space": Use user space governor. >> + "power_allocator": Use power allocator governor. > > This looks pretty Linux specific. Not that we can't have Linux > specific properties, but we try to avoid them. > > What determines the selection? I'd imagine only certain governors make > sense for certain devices. We should perhaps describe those > characteristics which can then infer the best governor. Not really > sure though... 
I'm not sure if it would be easy to assign preferred governors to device classes. It depends on what devices are present on the system, what throttling knobs they expose and how the system designer decided to integrate it all. e.g. A GPU driver might be controlled in the kernel or in userspace depending on whether it exposes a devfreq knob or some more esoteric statistics to userspace. The bang-bang governor seems to be designed for fans with a simple ON/OFF interface. The userspace governor is designed to move thermal policy to userspace (e.g. through thermald), so backlight brightness, battery charging, GPU scaling, even CPU frequency scaling can be offloaded to userspace. On embedded platforms, modem control typically happens in userspace. The power allocator governor is designed for a closed-loop system that keeps the total TDP of the platform under control while allowing various devices (CPU, GPU, modem, etc.) to dynamically increase or decrease their individual budget depending on the use case. Regards, Amit
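For what it's worth, the proposed property would sit directly in a thermal zone node; a hedged sketch based on the binding text above (the sensor phandle, cell value and zone name are made up for illustration):

```
thermal-zones {
        cpu-thermal: cpu-thermal {
                polling-delay-passive = <250>;
                polling-delay = <1000>;
                thermal-sensors = <&tsens 0>;
                /* proposed in this patch: must match an available
                 * governor name; the default governor is used when
                 * the property is absent */
                thermal-governor = "power_allocator";
        };
};
```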
Re: [PATCH] dump_stack: convert generic dump_stack into a weak symbol
2018-03-06 12:31 GMT+08:00 Sergey Senozhatsky : > On (03/06/18 10:50), Greentime Hu wrote: > [..] >> > Greentime Hu, you tested this on nds32. Could I use your Tested-by, >> > please? >> > >> >> Yes, please use it. :) > > Thanks. > > To be sure, is this > > Tested-by: Greentime Hu # nds32 > or > Acked-by: Greentime Hu # nds32 > Acked-by is preferred. Thanks.
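The mechanism this thread is about — a generic dump_stack() that an architecture can replace — relies on weak linkage. A minimal userspace sketch of the semantics (the kernel patch uses the sibling flavor, a __weak *definition* that a strong arch definition replaces at link time; 'arch_hook' here is a made-up name):

```c
#include <stddef.h>

/* A weak declaration with no definition anywhere resolves to NULL at
 * link time, so generic code can test whether an override was linked
 * in.  With a weak *definition* (the kernel's case), a strong
 * definition elsewhere simply wins at link time. */
__attribute__((weak)) void arch_hook(void);

static int have_arch_override(void)
{
        return arch_hook != NULL;       /* 0: fall back to generic code */
}
```

Since no override is linked here, have_arch_override() reports 0 and the caller would take the generic path.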
[RFC] rcu: Prevent expedite reporting within RCU read-side section
Hello Paul and RCU folks, I am not sure whether I correctly understand and fixed this. But I really wonder why sync_rcu_exp_handler() reports the quiescent state even in the case that the current task is within an RCU read-side section. Am I missing something? If I correctly understand it and you agree with it, I can add more logic which makes it more expedited by boosting current or making it urgent when we fail to report the quiescent state on the IPI. ->8- From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001 From: Byungchul Park Date: Tue, 6 Mar 2018 13:54:41 +0900 Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section We report the quiescent state for this cpu if it's out of an RCU read-side section at the moment the IPI was fired during the expedite process. However, the current code reports the quiescent state even in these cases: 1) the current task is still within an RCU read-side section 2) the current task has been blocked within the RCU read-side section Since we have not reached a quiescent state yet in these cases, we should not report it but check again later. Signed-off-by: Byungchul Park --- kernel/rcu/tree_exp.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 73e1d3d..cc69d14 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info) /* * We are either exiting an RCU read-side critical section (negative * values of t->rcu_read_lock_nesting) or are not in one at all -* (zero value of t->rcu_read_lock_nesting). Or we are in an RCU -* read-side critical section that blocked before this expedited -* grace period started. Either way, we can immediately report -* the quiescent state. +* (zero value of t->rcu_read_lock_nesting). We can immediately +* report the quiescent state. 
*/ - rdp = this_cpu_ptr(rsp->rda); - rcu_report_exp_rdp(rsp, rdp, true); + if (t->rcu_read_lock_nesting <= 0) { + rdp = this_cpu_ptr(rsp->rda); + rcu_report_exp_rdp(rsp, rdp, true); + } } /** -- 1.9.1
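The condition the patch adds hinges on the meaning of t->rcu_read_lock_nesting; a toy model of the counter semantics (simplified — positive while inside a read-side critical section, zero outside, with the real field's negative "exiting" encoding glossed over):

```c
#include <stdbool.h>

static int nesting;     /* models t->rcu_read_lock_nesting */

static void toy_rcu_read_lock(void)   { nesting++; }
static void toy_rcu_read_unlock(void) { nesting--; }

/* The added check: only report a quiescent state when the interrupted
 * task is not (or no longer) inside a read-side critical section. */
static bool can_report_exp_qs(void)
{
        return nesting <= 0;
}
```

With the check in place, an IPI landing in the middle of a nested read-side section would defer reporting instead of claiming a quiescent state too early.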
Re: [PATCH v8 15/15] dt-bindings: cpufreq: Document operating-points-v2-krait-cpu
On 3/6/2018 3:49 AM, Rob Herring wrote: > On Tue, Feb 27, 2018 at 07:37:02PM +0530, Sricharan R wrote: >> In Certain QCOM SoCs like ipq8064, apq8064, msm8960, msm8974 >> that has KRAIT processors the voltage/current value of each OPP >> varies based on the silicon variant in use. >> operating-points-v2-krait-cpu specifies the phandle to nvmem efuse cells >> and the operating-points-v2 table for each opp. The qcom-cpufreq driver >> reads the efuse value from the SoC to provide the required information >> that is used to determine the voltage and current value for each OPP of >> operating-points-v2 table when it is parsed by the OPP framework. >> >> Signed-off-by: Sricharan R >> --- >> .../devicetree/bindings/cpufreq/krait-cpufreq.txt | 363 >> + >> 1 file changed, 363 insertions(+) >> create mode 100644 >> Documentation/devicetree/bindings/cpufreq/krait-cpufreq.txt > > Reviewed-by: Rob Herring Thanks Rob !! Will post with all tags and the Makefile corrected. Regards, Sricharan -- "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 0/3] Improve and extend checkpatch.pl Kconfig help text checks
(+To: Andrew) 2018-03-06 13:52 GMT+09:00 Ulf Magnusson : > On Sat, Feb 24, 2018 at 2:53 PM, Masahiro Yamada > wrote: >> 2018-02-23 10:30 GMT+09:00 Ulf Magnusson : >>> On Fri, Feb 16, 2018 at 10:14 PM, Joe Perches wrote: On Fri, 2018-02-16 at 21:22 +0100, Ulf Magnusson wrote: > Hello, > > This patchset contains some improvements for the Kconfig help text check > in > scripts/checkconfig.pl: Seems sensible enough to me. Signed-off-by: Joe Perches >>> >>> Will you be taking this in yourself? >>> >>> (Adding Masahiro on CC -- forgot when I sent the patchset.) >>> >>> Cheers, >>> Ulf >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> I am not a perl expert, but I have no objection for this series. >> >> >> Thanks! >> >> >> >> >> -- >> Best Regards >> Masahiro Yamada > > *Bump* Who is addressed by "*Bump*" ? I think patches for checkpatch.pl are supposed to be taken care of by Andrew. He forwards patches to Linus. $ git log --no-merges --pretty=fuller scripts/checkpatch.pl | grep 'Commit:' | sort | uniq -c | sort -nr 555 Commit: Linus Torvalds 16 Commit: Linus Torvalds 4 Commit: Paul E. McKenney 4 Commit: Michael S. Tsirkin 2 Commit: Thomas Gleixner 2 Commit: Ingo Molnar 2 Commit: Greg Kroah-Hartman 1 Commit: Tobin C. Harding 1 Commit: Rob Herring 1 Commit: Petr Mladek 1 Commit: Michal Marek 1 Commit: Mauro Carvalho Chehab 1 Commit: Masahiro Yamada 1 Commit: Lucas De Marchi 1 Commit: Jiri Kosina 1 Commit: Dan Williams 1 Commit: Bjorn Helgaas -- Best Regards Masahiro Yamada
[linux-next:master 5332/5518] drivers/net/ethernet/marvell/mvpp2.c:4288:5: sparse: symbol 'mvpp2_check_hw_buf_num' was not declared. Should it be static?
tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master head: 9c142d8a6556f069be6278ccab701039da81ad6f commit: effbf5f58d64b1d1f93cb687d9797b42f291d5fd [5332/5518] net: mvpp2: update the BM buffer free/destroy logic reproduce: # apt-get install sparse git checkout effbf5f58d64b1d1f93cb687d9797b42f291d5fd make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> drivers/net/ethernet/marvell/mvpp2.c:4288:5: sparse: symbol >> 'mvpp2_check_hw_buf_num' was not declared. Should it be static? drivers/net/ethernet/marvell/mvpp2.c:6620:36: sparse: incorrect type in argument 2 (different base types) @@ expected int [signed] l3_proto @@ got restricted __be16 [usertype] protocol @@ drivers/net/ethernet/marvell/mvpp2.c:6620:36: expected int [signed] l3_proto drivers/net/ethernet/marvell/mvpp2.c:6620:36: got restricted __be16 [usertype] protocol Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructure   Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[RFC PATCH linux-next] net: mvpp2: mvpp2_check_hw_buf_num() can be static
Fixes: effbf5f58d64 ("net: mvpp2: update the BM buffer free/destroy logic") Signed-off-by: Fengguang Wu --- mvpp2.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c index c7b8093..c360430 100644 --- a/drivers/net/ethernet/marvell/mvpp2.c +++ b/drivers/net/ethernet/marvell/mvpp2.c @@ -4285,7 +4285,7 @@ static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv, } /* Check number of buffers in BM pool */ -int mvpp2_check_hw_buf_num(struct mvpp2 *priv, struct mvpp2_bm_pool *bm_pool) +static int mvpp2_check_hw_buf_num(struct mvpp2 *priv, struct mvpp2_bm_pool *bm_pool) { int buf_num = 0;
[PATCH] kernel/memremap: Remove stale devres_free() call
devm_memremap_pages() was re-worked in e8d513483300 to take a caller allocated struct dev_pagemap as a function parameter. A call to devres_free() was left in the error cleanup path which results in a kernel panic if the remap fails for some reason. Remove it to fix the panic and let devm_memremap_pages() fail gracefully. Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface to use struct dev_pagemap") Cc: Logan Gunthorpe Cc: Christoph Hellwig Cc: Dan Williams Signed-off-by: Oliver O'Halloran --- Both in-tree users of devm_memremap_pages() embed dev_pagemap into other structures so this shouldn't cause any leaks. Logan's p2p series does add one usage that assumes pgmap will be freed on error so that'll need fixing. --- kernel/memremap.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 4dd4274cabe2..895e6b76b25e 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -427,7 +427,6 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) err_pfn_remap: err_radix: pgmap_radix_release(res, pgoff); - devres_free(pgmap); return ERR_PTR(error); } EXPORT_SYMBOL(devm_memremap_pages); -- 2.9.5
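The bug pattern here is an ownership mismatch: after the interface change the callee no longer owns the structure, so freeing it in the error path frees the caller's memory. A minimal userspace sketch of the contract (names are made up; this only illustrates the ownership rule, not the real API):

```c
#include <errno.h>

struct toy_pagemap {
        int configured;
};

/* Callee operating on a CALLER-allocated object: on failure it must
 * back out its own work and return an error, but never free 'pm' —
 * the caller still owns it (often it is embedded in a larger struct,
 * as the changelog notes for the in-tree users). */
static int toy_remap_pages(struct toy_pagemap *pm, int simulate_failure)
{
        if (simulate_failure)
                return -ENOMEM;         /* no free(pm) here */
        pm->configured = 1;
        return 0;
}
```

A stale free in the failure branch would be exactly the stale devres_free() the patch removes: the caller (or devres teardown) frees the enclosing object later, and the double/foreign free shows up as a crash.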
Re: [PATCH 0/3] Improve and extend checkpatch.pl Kconfig help text checks
On Sat, Feb 24, 2018 at 2:53 PM, Masahiro Yamada wrote: > 2018-02-23 10:30 GMT+09:00 Ulf Magnusson : >> On Fri, Feb 16, 2018 at 10:14 PM, Joe Perches wrote: >>> On Fri, 2018-02-16 at 21:22 +0100, Ulf Magnusson wrote: Hello, This patchset contains some improvements for the Kconfig help text check in scripts/checkconfig.pl: >>> >>> Seems sensible enough to me. >>> Signed-off-by: Joe Perches >> >> Will you be taking this in yourself? >> >> (Adding Masahiro on CC -- forgot when I sent the patchset.) >> >> Cheers, >> Ulf >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > I am not a perl expert, but I have no objection for this series. > > > Thanks! > > > > > -- > Best Regards > Masahiro Yamada *Bump*
Re: [PATCH 2/3] vfio: Add support for unmanaged or userspace managed SR-IOV
Hi Alexander, Thank you for the patch! Yet something to improve: [auto build test ERROR on pci/next] [also build test ERROR on v4.16-rc4 next-20180305] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/pci-iov-Add-support-for-unmanaged-SR-IOV/20180306-063954 base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next config: s390-default_defconfig (attached as .config) compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=s390 All errors (new ones prefixed by >>): drivers/vfio/pci/vfio_pci.c: In function 'vfio_pci_sriov_configure': >> drivers/vfio/pci/vfio_pci.c:1291:8: error: implicit declaration of function >> 'pci_sriov_configure_unmanaged'; did you mean 'pci_write_config_dword'? >> [-Werror=implicit-function-declaration] err = pci_sriov_configure_unmanaged(pdev, nr_virtfn); ^ pci_write_config_dword At top level: drivers/vfio/pci/vfio_pci.c:1265:12: warning: 'vfio_pci_sriov_configure' defined but not used [-Wunused-function] static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) ^~~~ cc1: some warnings being treated as errors vim +1291 drivers/vfio/pci/vfio_pci.c 1264 1265 static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) 1266 { 1267 struct vfio_pci_device *vdev; 1268 struct vfio_device *device; 1269 int err; 1270 1271 device = vfio_device_get_from_dev(&pdev->dev); 1272 if (device == NULL) 1273 return -ENODEV; 1274 1275 vdev = vfio_device_data(device); 1276 if (vdev == NULL) { 1277 vfio_device_put(device); 1278 return -ENODEV; 1279 } 1280 1281 /* 1282 * If a userspace process is already using this device just return 1283 * busy and don't allow for any changes. 
1284 */ 1285 if (vdev->refcnt) { 1286 pci_warn(pdev, 1287 "PF is currently in use, blocked until released by user\n"); 1288 return -EBUSY; 1289 } 1290 > 1291 err = pci_sriov_configure_unmanaged(pdev, nr_virtfn); 1292 if (err <= 0) 1293 return err; 1294 1295 /* 1296 * We are now leaving VFs in the control of some unknown PF entity. 1297 * 1298 * Best case is a well behaved userspace PF is expected and any VMs 1299 * that the VFs will be assigned to are dependent on the userspace 1300 * entity anyway. An example being NFV where maybe the PF is acting 1301 * as an accelerated interface for a firewall or switch. 1302 * 1303 * Worst case is somebody really messed up and just enabled SR-IOV 1304 * on a device they were planning to assign to a VM somwhere. 1305 * 1306 * In either case it is probably best for us to set the taint flag 1307 * and warn the user since this could get really ugly really quick 1308 * if this wasn't what they were planning to do. 1309 */ 1310 add_taint(TAINT_USER, LOCKDEP_STILL_OK); 1311 pci_warn(pdev, 1312 "Adding kernel taint for vfio-pci now managing SR-IOV PF device\n"); 1313 1314 return nr_virtfn; 1315 } 1316 --- 0-DAY kernel test infrastructure   Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
RE: [PATCH 5/6] dma-mapping: support fsl-mc bus
> From: Robin Murphy [mailto:robin.mur...@arm.com] > Sent: Tuesday, March 06, 2018 0:22 > > On 05/03/18 18:39, Christoph Hellwig wrote: > > On Mon, Mar 05, 2018 at 03:48:32PM +, Robin Murphy wrote: > >> Unfortunately for us, fsl-mc is conceptually rather like PCI in that it's > >> software-discoverable and the only thing described in DT is the bus "host", > >> thus we need the same sort of thing as for PCI to map from the child > >> devices back to the bus root in order to find the appropriate firmware > >> node. Worse than PCI, though, we wouldn't even have the option of > >> describing child devices statically in firmware at all, since it's actually > >> one of these runtime-configurable "build your own network accelerator" > >> hardware pools where userspace gets to create and destroy "devices" as it > >> likes. > > > > I really hate the PCI special case just as much. Maybe we just > > need a dma_configure method on the bus, and move PCI as well as fsl-mc > > to it. > > Hmm, on reflection, 100% ack to that idea. It would neatly supersede > bus->force_dma *and* mean that we don't have to effectively pull pci.h > into everything, which I've never liked. In hindsight dma_configure() > does feel like it's grown into this odd choke point where we munge > everything in just for it to awkwardly unpick things again. > > Robin. +1 to the idea. Sorry for asking a trivial question - looking into dma_configure() I see that PCI is used in the start and the end of the API. In the end part pci_put_host_bridge_device() is called. So will two bus callbacks, something like 'dma_config_start' and 'dma_config_end', be required, where the former one returns "dma_dev"? Regards, Nipun
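The idea being discussed — replacing the PCI special case with a per-bus method — can be sketched as a callback on the bus type. This is a simplified userspace model with made-up names, not the real kernel interface:

```c
#include <stddef.h>

struct toy_device;

/* Each bus type supplies its own way of finding DMA configuration for
 * a child device, so common code needs no PCI/fsl-mc special cases. */
struct toy_bus_type {
        const char *name;
        int (*dma_configure)(struct toy_device *dev);
};

struct toy_device {
        const struct toy_bus_type *bus;
};

static int toy_dma_configure(struct toy_device *dev)
{
        if (dev->bus && dev->bus->dma_configure)
                return dev->bus->dma_configure(dev);
        return 0;                       /* generic default path */
}

static int toy_pci_dma_configure(struct toy_device *dev)
{
        (void)dev;      /* would walk up to the host bridge, query FW, ... */
        return 42;      /* pretend bus-specific config was found */
}

static const struct toy_bus_type toy_pci_bus = {
        .name = "toy-pci",
        .dma_configure = toy_pci_dma_configure,
};
```

The choke-point function stays bus-agnostic; each bus's quirks (finding the host bridge, mapping back to the firmware node) live behind its own callback.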
Re: [PATCH] dump_stack: convert generic dump_stack into a weak symbol
On (03/06/18 10:50), Greentime Hu wrote: [..] > > Greentime Hu, you tested this on nds32. Could I use your Tested-by, > > please? > > > > Yes, please use it. :) Thanks. To be sure, is this Tested-by: Greentime Hu # nds32 or Acked-by: Greentime Hu # nds32 ? -ss
Re: [PATCH] dump_stack: convert generic dump_stack into a weak symbol
On (03/05/18 15:48), Petr Mladek wrote: [..] > > I hope that I did not miss anything. I could not try this at > runtime. I think you can. The rules are universal, you can do on x86 something like this --- arch/x86/kernel/dumpstack.c | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index a2d8a3908670..5d45f406717e 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -375,3 +375,16 @@ static int __init code_bytes_setup(char *s) return 1; } __setup("code_bytes=", code_bytes_setup); + +void dump_stack(void) +{ + dump_stack_print_info(KERN_DEFAULT); + + pr_crit("\t\tLinux\n\n"); + + pr_crit("An error has occurred. To continue:\n" + "Press Enter to return to Linux, or\n" + "Press CTRL+ALT+DEL to restart your computer.\n"); + + pr_crit("\n\n\tPress any key to continue _"); +} --- Should be enough for testing. > Anyway, from my side: > > Reviewed-by: Petr Mladek Thanks. -ss
Re: [PATCH] acpi, nfit: remove redundant __func__ in dev_dbg
On Fri, Mar 02, 2018 at 01:20:49PM +0100, Johannes Thumshirn wrote: > Dynamic debug can be instructed to add the function name to the debug > output using the +f switch, so there is no need for the nfit module to > do it again. If a user decides to add the +f switch for nfit's dynamic > debug this results in double prints of the function name like the > following: > > [ 2391.935383] acpi_nfit_ctl: nfit ACPI0012:00: acpi_nfit_ctl:nmem8 cmd: 10: > func: 1 input length: 0 > > Thus remove the stray __func__ printing. > > Signed-off-by: Johannes Thumshirn Oh, Johannes I noticed that here is one stray one still in drivers/acpi/nfit/mce.c. Do you mind pulling it into your patch to keep the drivers/acpi/nfit/* changes together?
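For reference, the '+f' switch mentioned here is set through the dynamic debug control file; a sketch of the usage (requires CONFIG_DYNAMIC_DEBUG and a mounted debugfs):

```
# enable debug messages (p) with function-name prefixes (f)
# for everything in the nfit module
echo 'module nfit +pf' > /sys/kernel/debug/dynamic_debug/control
```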
Re: [PATCH 4/7] Protectable Memory
snip . . . + +config PROTECTABLE_MEMORY +bool +depends on MMU Curious, would you also want to depend on "SECURITY" as well, as this is being advertised as a complement to __read_only_after_init, per the file header comments, as I'm assuming ro_after_init would be disabled if the SECURITY Kconfig selection is *NOT* selected? +depends on ARCH_HAS_SET_MEMORY +select GENERIC_ALLOCATOR +default y diff --git a/mm/Makefile b/mm/Makefile index e669f02c5a54..959fdbdac118 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o obj-$(CONFIG_SLOB) += slob.o obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o +obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o obj-$(CONFIG_KSM) += ksm.o obj-$(CONFIG_PAGE_POISONING) += page_poison.o obj-$(CONFIG_SLAB) += slab.o diff --git a/mm/pmalloc.c b/mm/pmalloc.c new file mode 100644 index ..acdec0fbdde6 --- /dev/null +++ b/mm/pmalloc.c @@ -0,0 +1,468 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * pmalloc.c: Protectable Memory Allocator + * + * (C) Copyright 2017 Huawei Technologies Co. Ltd. + * Author: Igor Stoppa + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +/* + * pmalloc_data contains the data specific to a pmalloc pool, + * in a format compatible with the design of gen_alloc. + * Some of the fields are used for exposing the corresponding parameter + * to userspace, through sysfs. + */ +struct pmalloc_data { + struct gen_pool *pool; /* Link back to the associated pool. */ + bool protected; /* Status of the pool: RO or RW. */ nitpick, you could probably get a tad bit better byte packing alignment of this struct if "bool protected" was stuck as the last element in this data structure. + struct kobj_attribute attr_protected; /* Sysfs attribute. */ + struct kobj_attribute attr_avail; /* Sysfs attribute. 
*/ + struct kobj_attribute attr_size; /* Sysfs attribute. */ + struct kobj_attribute attr_chunks;/* Sysfs attribute. */ + struct kobject *pool_kobject; + struct list_head node; /* list of pools */ +}; + +static LIST_HEAD(pmalloc_final_list); +static LIST_HEAD(pmalloc_tmp_list); +static struct list_head *pmalloc_list = &pmalloc_tmp_list; +static DEFINE_MUTEX(pmalloc_mutex); +static struct kobject *pmalloc_kobject; + +static ssize_t pmalloc_pool_show_protected(struct kobject *dev, + struct kobj_attribute *attr, + char *buf) +{ + struct pmalloc_data *data; + + data = container_of(attr, struct pmalloc_data, attr_protected); + if (data->protected) + return sprintf(buf, "protected\n"); + else + return sprintf(buf, "unprotected\n"); +} + +static ssize_t pmalloc_pool_show_avail(struct kobject *dev, + struct kobj_attribute *attr, + char *buf) +{ + struct pmalloc_data *data; + + data = container_of(attr, struct pmalloc_data, attr_avail); + return sprintf(buf, "%lu\n", + (unsigned long)gen_pool_avail(data->pool)); +} + +static ssize_t pmalloc_pool_show_size(struct kobject *dev, + struct kobj_attribute *attr, + char *buf) +{ + struct pmalloc_data *data; + + data = container_of(attr, struct pmalloc_data, attr_size); + return sprintf(buf, "%lu\n", + (unsigned long)gen_pool_size(data->pool)); +} Curious, will this show the size in bytes? + +static void pool_chunk_number(struct gen_pool *pool, + struct gen_pool_chunk *chunk, void *data) +{ + unsigned long *counter = data; + + (*counter)++; +} + +static ssize_t pmalloc_pool_show_chunks(struct kobject *dev, + struct kobj_attribute *attr, + char *buf) +{ + struct pmalloc_data *data; + unsigned long chunks_num = 0; + + data = container_of(attr, struct pmalloc_data, attr_chunks); + gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num); + return sprintf(buf, "%lu\n", chunks_num); +} + +/* Exposes the pool and its attributes through sysfs. 
*/ +static struct kobject *pmalloc_connect(struct pmalloc_data *data) +{ + const struct attribute *attrs[] = { + &data->attr_protected.attr, + &data->attr_avail.attr, + &data->attr_size.attr, + &data->attr_chunks.attr, + NULL + }; + struct kobject *kobj; + + kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject); + if (unlikely(!kobj)) +
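The byte-packing nitpick above can be made concrete: with the same members, ordering a bool before pointer-sized fields forces alignment padding, while grouping small members at the end lets them share the tail. Exact sizes are ABI-dependent (on a typical LP64 target the layouts below are 24 vs 16 bytes):

```c
#include <stdbool.h>

struct bool_first {
        bool  is_protected;     /* 1 byte + padding up to pointer align */
        void *pool;
        int   flags;            /* plus trailing padding to align size */
};

struct bool_last {
        void *pool;
        int   flags;
        bool  is_protected;     /* small members packed together at end */
};
```

The reordered layout is never larger, and on 64-bit ABIs it is typically a full 8 bytes smaller per instance.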
Re: [PATCH] device-dax: remove redundant __func__ in dev_dbg
On Mon, Mar 05, 2018 at 05:09:32PM -0800, Dan Williams wrote: > Dynamic debug can be instructed to add the function name to the debug > output using the +f switch, so there is no need for the dax modules to > do it again. If a user decides to add the +f switch for the dax modules' > dynamic debug this results in double prints of the function name. > > Reported-by: Johannes Thumshirn > Reported-by: Ross Zwisler > Signed-off-by: Dan Williams Looks good to me. Reviewed-by: Ross Zwisler
Re: [PATCH] libnvdimm: remove redundant __func__ in dev_dbg
On Mon, Mar 05, 2018 at 05:09:21PM -0800, Dan Williams wrote: > Dynamic debug can be instructed to add the function name to the debug > output using the +f switch, so there is no need for the libnvdimm > modules to do it again. If a user decides to add the +f switch for > libnvdimm's dynamic debug this results in double prints of the function > name. > > Reported-by: Johannes Thumshirn > Reported-by: Ross Zwisler > Signed-off-by: Dan Williams > --- > drivers/nvdimm/badrange.c |3 +- > drivers/nvdimm/btt_devs.c | 21 > drivers/nvdimm/bus.c| 13 +- > drivers/nvdimm/claim.c |2 +- > drivers/nvdimm/core.c |4 ++- > drivers/nvdimm/dax_devs.c |5 ++-- > drivers/nvdimm/dimm_devs.c |7 ++--- > drivers/nvdimm/label.c | 51 > ++- > drivers/nvdimm/namespace_devs.c | 38 - > drivers/nvdimm/pfn_devs.c | 25 +-- > drivers/nvdimm/pmem.c |2 +- > 11 files changed, 77 insertions(+), 94 deletions(-) > > diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c > index e068d72b4357..df17f1cd696d 100644 > --- a/drivers/nvdimm/badrange.c > +++ b/drivers/nvdimm/badrange.c > @@ -176,8 +176,7 @@ static void set_badblock(struct badblocks *bb, sector_t > s, int num) > (u64) s * 512, (u64) num * 512); > /* this isn't an error as the hardware will still throw an exception */ > if (badblocks_set(bb, s, num, 1)) > - dev_info_once(bb->dev, "%s: failed for sector %llx\n", > - __func__, (u64) s); > + dev_info_once(bb->dev, "failed for sector %llx\n", (u64) s); I don't think you should remove this one. dev_info_once() is just a printk(), and doesn't inherit the +f flag from the dynamic debugging code. The __func__ here does add value. The rest of these look correct, though I think you missed one in each of nvdimm_map_release() and validate_dimm(). (I made these changes as well, but you sent out your patch first. :)
[PATCH v2 1/2] perf sched: move thread::shortname to thread_runtime
From: Changbin Du The thread::shortname is only used by the sched command, so move it to a sched private structure. Signed-off-by: Changbin Du --- tools/perf/builtin-sched.c | 95 +++--- tools/perf/util/thread.h | 1 - 2 files changed, 55 insertions(+), 41 deletions(-) diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index 83283fe..5bfc8d5 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -255,6 +255,8 @@ struct thread_runtime { int last_state; u64 migrations; + + char shortname[3]; }; /* per event run time data */ @@ -897,6 +899,37 @@ struct sort_dimension { struct list_headlist; }; +/* + * handle runtime stats saved per thread + */ +static struct thread_runtime *thread__init_runtime(struct thread *thread) +{ + struct thread_runtime *r; + + r = zalloc(sizeof(struct thread_runtime)); + if (!r) + return NULL; + + init_stats(&r->run_stats); + thread__set_priv(thread, r); + + return r; +} + +static struct thread_runtime *thread__get_runtime(struct thread *thread) +{ + struct thread_runtime *tr; + + tr = thread__priv(thread); + if (tr == NULL) { + tr = thread__init_runtime(thread); + if (tr == NULL) + pr_debug("Failed to malloc memory for runtime data.\n"); + } + + return tr; +} + static int thread_lat_cmp(struct list_head *list, struct work_atoms *l, struct work_atoms *r) { @@ -1480,6 +1513,7 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, { const u32 next_pid = perf_evsel__intval(evsel, sample, "next_pid"); struct thread *sched_in; + struct thread_runtime *tr; int new_shortname; u64 timestamp0, timestamp = sample->time; s64 delta; @@ -1519,22 +1553,28 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, if (sched_in == NULL) return -1; + tr = thread__get_runtime(sched_in); + if (tr == NULL) { + thread__put(sched_in); + return -1; + } + sched->curr_thread[this_cpu] = thread__get(sched_in); printf(" "); new_shortname = 0; - if (!sched_in->shortname[0]) { + if 
(!tr->shortname[0]) { if (!strcmp(thread__comm_str(sched_in), "swapper")) { /* * Don't allocate a letter-number for swapper:0 * as a shortname. Instead, we use '.' for it. */ - sched_in->shortname[0] = '.'; - sched_in->shortname[1] = ' '; + tr->shortname[0] = '.'; + tr->shortname[1] = ' '; } else { - sched_in->shortname[0] = sched->next_shortname1; - sched_in->shortname[1] = sched->next_shortname2; + tr->shortname[0] = sched->next_shortname1; + tr->shortname[1] = sched->next_shortname2; if (sched->next_shortname1 < 'Z') { sched->next_shortname1++; @@ -1552,6 +1592,7 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, for (i = 0; i < cpus_nr; i++) { int cpu = sched->map.comp ? sched->map.comp_cpus[i] : i; struct thread *curr_thread = sched->curr_thread[cpu]; + struct thread_runtime *curr_tr; const char *pid_color = color; const char *cpu_color = color; @@ -1569,9 +1610,14 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, else color_fprintf(stdout, cpu_color, "*"); - if (sched->curr_thread[cpu]) - color_fprintf(stdout, pid_color, "%2s ", sched->curr_thread[cpu]->shortname); - else + if (sched->curr_thread[cpu]) { + curr_tr = thread__get_runtime(sched->curr_thread[cpu]); + if (curr_tr == NULL) { + thread__put(sched_in); + return -1; + } + color_fprintf(stdout, pid_color, "%2s ", curr_tr->shortname); + } else color_fprintf(stdout, color, " "); } @@ -1587,7 +1633,7 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, pid_color = COLOR_PIDS; color_fprintf(stdout, pid_color, "%s => %s:%d", - sched_in->shortname, thread__comm_str(sched_in), sched_in->tid); + tr->shortname, thread__comm_str(sched_in), sched_in->tid); } if (sched->map.comp
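The two-character shortname scheme in the code above cycles a letter fastest and then a digit, handing out A0..Z0, A1..Z1, and so on before wrapping. A standalone sketch of that allocator:

```c
/* Mirrors the logic in map_switch_event(): next_shortname1 runs
 * 'A'..'Z', next_shortname2 runs '0'..'9'; after Z9 the sequence
 * wraps and names get reused. */
static char next_shortname1 = 'A';
static char next_shortname2 = '0';

static void next_shortname(char out[3])
{
        out[0] = next_shortname1;
        out[1] = next_shortname2;
        out[2] = '\0';

        if (next_shortname1 < 'Z') {
                next_shortname1++;
        } else {
                next_shortname1 = 'A';
                if (next_shortname2 < '9')
                        next_shortname2++;
                else
                        next_shortname2 = '0';  /* wrap around */
        }
}
```

This gives 260 distinct names, which is why moving the state into per-thread thread_runtime (rather than struct thread) costs nothing for non-sched users.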
[PATCH v2 2/2] perf sched map: re-annotate shortname if thread comm changed
From: Changbin Du This is to show the real name of a thread created via fork-exec. See below example for shortname *A0*. $ sudo ./perf sched map *A0 80393.050639 secs A0 => perf:22368 *. A0 80393.050748 secs . => swapper:0 . *.80393.050887 secs *B0 . .80393.052735 secs B0 => rcu_sched:8 *. . .80393.052743 secs . *C0 .80393.056264 secs C0 => kworker/2:1H:287 . *A0 .80393.056270 secs . *D0 .80393.056769 secs D0 => ksoftirqd/2:22 - . *A0 .80393.056804 secs + . *A0 .80393.056804 secs A0 => pi:22368 . *. .80393.056854 secs *B0 . .80393.060727 secs ... Cc: Namhyung Kim Cc: Jiri Olsa Signed-off-by: Changbin Du --- v2: add function perf_sched__process_comm() to process PERF_RECORD_COMM event. --- tools/perf/builtin-sched.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index 5bfc8d5..7aa0600 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -257,6 +257,7 @@ struct thread_runtime { u64 migrations; char shortname[3]; + bool comm_changed; }; /* per event run time data */ @@ -1626,7 +1627,7 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, timestamp__scnprintf_usec(timestamp, stimestamp, sizeof(stimestamp)); color_fprintf(stdout, color, " %12s secs ", stimestamp); - if (new_shortname || (verbose > 0 && sched_in->tid)) { + if (new_shortname || tr->comm_changed || (verbose > 0 && sched_in->tid)) { const char *pid_color = color; if (thread__has_color(sched_in)) @@ -1634,6 +1635,7 @@ static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel, color_fprintf(stdout, pid_color, "%s => %s:%d", tr->shortname, thread__comm_str(sched_in), sched_in->tid); + tr->comm_changed = false; } if (sched->map.comp && new_cpu) @@ -1737,6 +1739,37 @@ static int perf_sched__process_tracepoint_sample(struct perf_tool *tool __maybe_ return err; } +static int perf_sched__process_comm(struct perf_tool *tool __maybe_unused, + union perf_event 
*event, + struct perf_sample *sample, + struct machine *machine) +{ + struct thread *thread; + struct thread_runtime *tr; + int err; + + err = perf_event__process_comm(tool, event, sample, machine); + if (err) + return err; + + thread = machine__find_thread(machine, sample->pid, sample->tid); + if (!thread) { + pr_err("Internal error: can't find thread\n"); + return -1; + } + + tr = thread__get_runtime(thread); + if (tr == NULL) { + thread__put(thread); + return -1; + } + + tr->comm_changed = true; + thread__put(thread); + + return 0; +} + static int perf_sched__read_events(struct perf_sched *sched) { const struct perf_evsel_str_handler handlers[] = { @@ -3306,7 +3339,7 @@ int cmd_sched(int argc, const char **argv) struct perf_sched sched = { .tool = { .sample = perf_sched__process_tracepoint_sample, - .comm= perf_event__process_comm, + .comm= perf_sched__process_comm, .namespaces = perf_event__process_namespaces, .lost= perf_event__process_lost, .fork= perf_sched__process_fork_event, -- 2.7.4
[PATCH v2 0/2] perf sched map: re-annotate shortname if thread comm changed
From: Changbin Du v2: o add a patch to move thread::shortname to thread_runtime o add function perf_sched__process_comm() to process PERF_RECORD_COMM event. Changbin Du (2): perf sched: move thread::shortname to thread_runtime perf sched map: re-annotate shortname if thread comm changed tools/perf/builtin-sched.c | 132 ++--- tools/perf/util/thread.h | 1 - 2 files changed, 90 insertions(+), 43 deletions(-) -- 2.7.4
Re: [RESEND PATCH] perf sched map: re-annotate shortname if thread comm changed
I have just finished the final version, please check v2. Thanks for your comments! On Mon, Mar 05, 2018 at 11:37:54PM +0100, Jiri Olsa wrote: > On Mon, Mar 05, 2018 at 03:11:36PM +0800, Du, Changbin wrote: > > SNIP > > > > > on the other hand it's simple enough and looks > > > > like generic solution would be more tricky > > > > > > What about adding perf_sched__process_comm() to set it in the > > > thread::priv? > > > > > It can be done, then thread->comm_changed moves to > > thread_runtime->comm_changed. > > Draft code as below. It is also a little tricky. > > > > +int perf_sched__process_comm(struct perf_tool *tool __maybe_unused, > > +union perf_event *event, > > +struct perf_sample *sample, > > +struct machine *machine) > > +{ > > + struct thread *thread; > > + struct thread_runtime *r; > > + > > + perf_event__process_comm(tool, event, sample, machine); > > + > > + thread = machine__findnew_thread(machine, pid, tid); > > should you use machine__find_thread in here? > > > + if (thread) { > > + r = thread__priv(thread); > > + if (r) > > + r->comm_changed = true; > > + thread__put(thread); > > + } > > +} > > + > > static int perf_sched__read_events(struct perf_sched *sched) > > { > > const struct perf_evsel_str_handler handlers[] = { > > @@ -3291,7 +3311,7 @@ int cmd_sched(int argc, const char **argv) > > struct perf_sched sched = { > > .tool = { > > .sample = > > perf_sched__process_tracepoint_sample, > > - .comm= perf_event__process_comm, > > + .comm= perf_sched__process_comm, > > > > > > But I'd keep 'comm_changed' where 'shortname' is defined. I think they > > should appear > > together. And 'shortname' is only used by the sched command, too. > > they can both go to struct thread_runtime then > > > > > So I still prefer my previous simpler change. Thanks! > > I was wrong thinking that the amount of code > making it sched specific would be bigger > > we're trying to keep the core structs generic, > so this one fits better > > thanks, > jirka -- Thanks, Changbin Du
linux-next: Tree for Mar 6
Hi all, Changes since 20180305: The mali-dp tree gained a conflict against the drm-misc tree. Non-merge commits (relative to Linus' tree): 4880 (diffstat: 5418 files changed, 202951 insertions(+), 143721 deletions(-)). I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 260 trees (counting Linus' and 44 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to adding more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (661e50bc8532 Linux 4.16-rc4) Merging fixes/master (7928b2cbe55b Linux 4.16-rc1) Merging kbuild-current/fixes (638e69cf2230 fixdep: do not ignore kconfig.h) Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4) Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index) Merging arm64-fixes/for-next/fixes (b08e5fd90bfc arm_pmu: Use disable_irq_nosync when disabling SPI in CPU teardown hook) Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem entry) Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups) Merging powerpc-fixes/fixes (e7666d046ac0 ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL) Merging sparc/master (aebb48f5e465 sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (a7f0fb1bfb66 Merge branch 'hv_netvsc-minor-fixes') Merging bpf/master (d02f51cbcf12 bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs) Merging ipsec/master (b8b549eec818 xfrm: Fix ESN sequence number handling for IPsec GSO packets.) 
Merging netfilter/master (4e00f5d5f9fc Merge tag 'batadv-net-for-davem-20180302' of git://git.open-mesh.org/linux-merge) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (78dc897b7ee6 rtlwifi: rtl8723be: Fix loss of signal) Merging mac80211/master (a78872363614 cfg80211: add missing dependency to CFG80211 suboptions) Merging rdma-fixes/for-rc (4cd482c12be4 IB/core : Add null pointer check in addr_resolve) Merging sound-current/for-linus (d5078193e56b ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines) Merging pci-current/for-linus (c37406e05d1e PCI: Allow release of resources that were never assigned) Merging driver-core.current/driver-core-linus (4a3928c6f8a5 Linux 4.16-rc3) Merging tty.current/tty-linus (5d7f77ec72d1 serial: imx: fix bogus dev_err) Merging usb.current/usb-linus (4a3928c6f8a5 Linux 4.16-rc3) Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: add binging for r8a77965) Merging usb-serial-fixes/usb-linus (0a17f9fef994 USB: serial: ftdi_sio: add RT Systems VX-8 cable) Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup) Merging phy/fixes (7928b2cbe55b Linux 4.16-rc1) Merging staging.current/staging-linus (cb57469c9573 staging: android: ashmem: Fix lockdep issue during llseek) Merging char-misc.current/char-misc-linus (4a3928c6f8a5 Linux 4.16-rc3) Merging input-current/for-linus (ea4f7bd2aca9 Input: matrix_keypad - fix race when disabling interrupts) Merging crypto-current/master (c927b080c67e crypto: s5p-sss - Fix kernel Oops in AES-ECB mode) Merging ide/master (8e44e6600caa Merge branch 'KASAN-read_word_at_a_time') Mer
[PATCH v7 14/14] iommu/rockchip: Support sharing IOMMU between masters
Some masters may share the same IOMMU device. Put them in the same iommu group so that they share the same iommu domain. Signed-off-by: Jeffy Chen Reviewed-by: Robin Murphy --- Changes in v7: Use iommu_group_ref_get to avoid ref leak Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: Remove rk_iommudata->domain. Changes in v2: None drivers/iommu/rockchip-iommu.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index db08978203f7..6a1c7efa7c17 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -104,6 +104,7 @@ struct rk_iommu { struct iommu_device iommu; struct list_head node; /* entry in rk_iommu_domain.iommus */ struct iommu_domain *domain; /* domain to which iommu is attached */ + struct iommu_group *group; }; /** @@ -1091,6 +1092,15 @@ static void rk_iommu_remove_device(struct device *dev) iommu_group_remove_device(dev); } +static struct iommu_group *rk_iommu_device_group(struct device *dev) +{ + struct rk_iommu *iommu; + + iommu = rk_iommu_from_dev(dev); + + return iommu_group_ref_get(iommu->group); +} + static int rk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) { @@ -1122,7 +1132,7 @@ static const struct iommu_ops rk_iommu_ops = { .add_device = rk_iommu_add_device, .remove_device = rk_iommu_remove_device, .iova_to_phys = rk_iommu_iova_to_phys, - .device_group = generic_device_group, + .device_group = rk_iommu_device_group, .pgsize_bitmap = RK_IOMMU_PGSIZE_BITMAP, .of_xlate = rk_iommu_of_xlate, }; @@ -1191,9 +1201,15 @@ static int rk_iommu_probe(struct platform_device *pdev) if (err) return err; + iommu->group = iommu_group_alloc(); + if (IS_ERR(iommu->group)) { + err = PTR_ERR(iommu->group); + goto err_unprepare_clocks; + } + err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, dev_name(dev)); if (err) - goto err_unprepare_clocks; + goto err_put_group; iommu_device_set_ops(&iommu->iommu, 
&rk_iommu_ops); iommu_device_set_fwnode(&iommu->iommu, &dev->of_node->fwnode); @@ -1217,6 +1233,8 @@ static int rk_iommu_probe(struct platform_device *pdev) return 0; err_remove_sysfs: iommu_device_sysfs_remove(&iommu->iommu); +err_put_group: + iommu_group_put(iommu->group); err_unprepare_clocks: clk_bulk_unprepare(iommu->num_clocks, iommu->clocks); return err; -- 2.11.0
[PATCH v7 13/14] iommu/rockchip: Add runtime PM support
When the power domain is powered off, the IOMMU cannot be accessed and register programming must be deferred until the power domain becomes enabled. Add runtime PM support, and use runtime PM device link from IOMMU to master to startup and shutdown IOMMU. Signed-off-by: Jeffy Chen --- Changes in v7: Add WARN_ON in irq isr, and modify iommu archdata comment. Changes in v6: None Changes in v5: Avoid race about pm_runtime_get_if_in_use() and pm_runtime_enabled(). Changes in v4: None Changes in v3: Only call startup() and shutdown() when iommu attached. Remove pm_mutex. Check runtime PM disabled. Check pm_runtime in rk_iommu_irq(). Changes in v2: None drivers/iommu/rockchip-iommu.c | 189 - 1 file changed, 148 insertions(+), 41 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 2448a0528e39..db08978203f7 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -105,7 +106,14 @@ struct rk_iommu { struct iommu_domain *domain; /* domain to which iommu is attached */ }; +/** + * struct rk_iommudata - iommu archdata of master device. + * @link: device link with runtime PM integration from the master + * (consumer) to the IOMMU (supplier). + * @iommu: IOMMU of the master device. 
+ */ struct rk_iommudata { + struct device_link *link; struct rk_iommu *iommu; }; @@ -518,7 +526,13 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id) u32 int_status; dma_addr_t iova; irqreturn_t ret = IRQ_NONE; - int i; + bool need_runtime_put; + int i, err; + + err = pm_runtime_get_if_in_use(iommu->dev); + if (WARN_ON(err <= 0 && err != -EINVAL)) + return ret; + need_runtime_put = err > 0; WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); @@ -570,6 +584,9 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id) clk_bulk_disable(iommu->num_clocks, iommu->clocks); + if (need_runtime_put) + pm_runtime_put(iommu->dev); + return ret; } @@ -611,10 +628,20 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain *rk_domain, spin_lock_irqsave(&rk_domain->iommus_lock, flags); list_for_each(pos, &rk_domain->iommus) { struct rk_iommu *iommu; + int ret; + iommu = list_entry(pos, struct rk_iommu, node); - WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); - rk_iommu_zap_lines(iommu, iova, size); - clk_bulk_disable(iommu->num_clocks, iommu->clocks); + + /* Only zap TLBs of IOMMUs that are powered on. */ + ret = pm_runtime_get_if_in_use(iommu->dev); + if (ret > 0 || ret == -EINVAL) { + WARN_ON(clk_bulk_enable(iommu->num_clocks, + iommu->clocks)); + rk_iommu_zap_lines(iommu, iova, size); + clk_bulk_disable(iommu->num_clocks, iommu->clocks); + } + if (ret > 0) + pm_runtime_put(iommu->dev); } spin_unlock_irqrestore(&rk_domain->iommus_lock, flags); } @@ -817,22 +844,30 @@ static struct rk_iommu *rk_iommu_from_dev(struct device *dev) return data ? 
data->iommu : NULL; } -static int rk_iommu_attach_device(struct iommu_domain *domain, - struct device *dev) +/* Must be called with iommu powered on and attached */ +static void rk_iommu_shutdown(struct rk_iommu *iommu) { - struct rk_iommu *iommu; + int i; + + /* Ignore error while disabling, just keep going */ + WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); + rk_iommu_enable_stall(iommu); + rk_iommu_disable_paging(iommu); + for (i = 0; i < iommu->num_mmu; i++) { + rk_iommu_write(iommu->bases[i], RK_MMU_INT_MASK, 0); + rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 0); + } + rk_iommu_disable_stall(iommu); + clk_bulk_disable(iommu->num_clocks, iommu->clocks); +} + +/* Must be called with iommu powered on and attached */ +static int rk_iommu_startup(struct rk_iommu *iommu) +{ + struct iommu_domain *domain = iommu->domain; struct rk_iommu_domain *rk_domain = to_rk_domain(domain); - unsigned long flags; int ret, i; - /* -* Allow 'virtual devices' (e.g., drm) to attach to domain. -* Such a device does not belong to an iommu group. -*/ - iommu = rk_iommu_from_dev(dev); - if (!iommu) - return 0; - ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks); if (ret) return ret; @@ -845,8 +880,6 @@ static int rk_iommu_attach_device(struct io
[PATCH v7 12/14] iommu/rockchip: Fix error handling in init
It's hard to undo bus_set_iommu() in the error path, so move it to the end of rk_iommu_probe(). Signed-off-by: Jeffy Chen Reviewed-by: Tomasz Figa Reviewed-by: Robin Murphy --- Changes in v7: None Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: Move bus_set_iommu() to rk_iommu_probe(). drivers/iommu/rockchip-iommu.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 1346bbb8a3e7..2448a0528e39 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -1133,6 +1133,8 @@ static int rk_iommu_probe(struct platform_device *pdev) if (!dma_dev) dma_dev = &pdev->dev; + bus_set_iommu(&platform_bus_type, &rk_iommu_ops); + return 0; err_remove_sysfs: iommu_device_sysfs_remove(&iommu->iommu); @@ -1158,19 +1160,6 @@ static struct platform_driver rk_iommu_driver = { static int __init rk_iommu_init(void) { - struct device_node *np; - int ret; - - np = of_find_matching_node(NULL, rk_iommu_dt_ids); - if (!np) - return 0; - - of_node_put(np); - - ret = bus_set_iommu(&platform_bus_type, &rk_iommu_ops); - if (ret) - return ret; - return platform_driver_register(&rk_iommu_driver); } subsys_initcall(rk_iommu_init); -- 2.11.0
[PATCH v7 11/14] iommu/rockchip: Use OF_IOMMU to attach devices automatically
Converts the rockchip-iommu driver to use the OF_IOMMU infrastructure, which allows attaching master devices to their IOMMUs automatically according to DT properties. Signed-off-by: Jeffy Chen Reviewed-by: Robin Murphy --- Changes in v7: None Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: Add struct rk_iommudata. Squash iommu/rockchip: Use iommu_group_get_for_dev() for add_device Changes in v2: None drivers/iommu/rockchip-iommu.c | 135 - 1 file changed, 40 insertions(+), 95 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 6789e11b7087..1346bbb8a3e7 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -104,6 +105,10 @@ struct rk_iommu { struct iommu_domain *domain; /* domain to which iommu is attached */ }; +struct rk_iommudata { + struct rk_iommu *iommu; +}; + static struct device *dma_dev; static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma, @@ -807,18 +812,9 @@ static size_t rk_iommu_unmap(struct iommu_domain *domain, unsigned long _iova, static struct rk_iommu *rk_iommu_from_dev(struct device *dev) { - struct iommu_group *group; - struct device *iommu_dev; - struct rk_iommu *rk_iommu; + struct rk_iommudata *data = dev->archdata.iommu; - group = iommu_group_get(dev); - if (!group) - return NULL; - iommu_dev = iommu_group_get_iommudata(group); - rk_iommu = dev_get_drvdata(iommu_dev); - iommu_group_put(group); - - return rk_iommu; + return data ? 
data->iommu : NULL; } static int rk_iommu_attach_device(struct iommu_domain *domain, @@ -989,110 +985,53 @@ static void rk_iommu_domain_free(struct iommu_domain *domain) iommu_put_dma_cookie(&rk_domain->domain); } -static bool rk_iommu_is_dev_iommu_master(struct device *dev) -{ - struct device_node *np = dev->of_node; - int ret; - - /* -* An iommu master has an iommus property containing a list of phandles -* to iommu nodes, each with an #iommu-cells property with value 0. -*/ - ret = of_count_phandle_with_args(np, "iommus", "#iommu-cells"); - return (ret > 0); -} - -static int rk_iommu_group_set_iommudata(struct iommu_group *group, - struct device *dev) +static int rk_iommu_add_device(struct device *dev) { - struct device_node *np = dev->of_node; - struct platform_device *pd; - int ret; - struct of_phandle_args args; + struct iommu_group *group; + struct rk_iommu *iommu; - /* -* An iommu master has an iommus property containing a list of phandles -* to iommu nodes, each with an #iommu-cells property with value 0. 
-*/ - ret = of_parse_phandle_with_args(np, "iommus", "#iommu-cells", 0, -&args); - if (ret) { - dev_err(dev, "of_parse_phandle_with_args(%pOF) => %d\n", - np, ret); - return ret; - } - if (args.args_count != 0) { - dev_err(dev, "incorrect number of iommu params found for %pOF (found %d, expected 0)\n", - args.np, args.args_count); - return -EINVAL; - } + iommu = rk_iommu_from_dev(dev); + if (!iommu) + return -ENODEV; - pd = of_find_device_by_node(args.np); - of_node_put(args.np); - if (!pd) { - dev_err(dev, "iommu %pOF not found\n", args.np); - return -EPROBE_DEFER; - } + group = iommu_group_get_for_dev(dev); + if (IS_ERR(group)) + return PTR_ERR(group); + iommu_group_put(group); - /* TODO(djkurtz): handle multiple slave iommus for a single master */ - iommu_group_set_iommudata(group, &pd->dev, NULL); + iommu_device_link(&iommu->iommu, dev); return 0; } -static int rk_iommu_add_device(struct device *dev) +static void rk_iommu_remove_device(struct device *dev) { - struct iommu_group *group; struct rk_iommu *iommu; - int ret; - - if (!rk_iommu_is_dev_iommu_master(dev)) - return -ENODEV; - - group = iommu_group_get(dev); - if (!group) { - group = iommu_group_alloc(); - if (IS_ERR(group)) { - dev_err(dev, "Failed to allocate IOMMU group\n"); - return PTR_ERR(group); - } - } - - ret = iommu_group_add_device(group, dev); - if (ret) - goto err_put_group; - - ret = rk_iommu_group_set_iommudata(group, dev); - if (ret) - goto err_remove_device; iommu = rk_iommu_from_dev(dev); - if (iommu
[PATCH v7 10/14] iommu/rockchip: Use IOMMU device for dma mapping operations
Use the first registered IOMMU device for dma mapping operations, and drop the domain platform device. This is similar to exynos iommu driver. Signed-off-by: Jeffy Chen Reviewed-by: Tomasz Figa Reviewed-by: Robin Murphy --- Changes in v7: None Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: None drivers/iommu/rockchip-iommu.c | 85 -- 1 file changed, 24 insertions(+), 61 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 6c6275589bd5..6789e11b7087 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -79,7 +79,6 @@ struct rk_iommu_domain { struct list_head iommus; - struct platform_device *pdev; u32 *dt; /* page directory table */ dma_addr_t dt_dma; spinlock_t iommus_lock; /* lock for iommus list */ @@ -105,12 +104,14 @@ struct rk_iommu { struct iommu_domain *domain; /* domain to which iommu is attached */ }; +static struct device *dma_dev; + static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma, unsigned int count) { size_t size = count * sizeof(u32); /* count of u32 entry */ - dma_sync_single_for_device(&dom->pdev->dev, dma, size, DMA_TO_DEVICE); + dma_sync_single_for_device(dma_dev, dma, size, DMA_TO_DEVICE); } static struct rk_iommu_domain *to_rk_domain(struct iommu_domain *dom) @@ -625,7 +626,6 @@ static void rk_iommu_zap_iova_first_last(struct rk_iommu_domain *rk_domain, static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain, dma_addr_t iova) { - struct device *dev = &rk_domain->pdev->dev; u32 *page_table, *dte_addr; u32 dte_index, dte; phys_addr_t pt_phys; @@ -643,9 +643,9 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain, if (!page_table) return ERR_PTR(-ENOMEM); - pt_dma = dma_map_single(dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE); - if (dma_mapping_error(dev, pt_dma)) { - dev_err(dev, "DMA mapping error while allocating page table\n"); + pt_dma = dma_map_single(dma_dev, 
page_table, SPAGE_SIZE, DMA_TO_DEVICE); + if (dma_mapping_error(dma_dev, pt_dma)) { + dev_err(dma_dev, "DMA mapping error while allocating page table\n"); free_page((unsigned long)page_table); return ERR_PTR(-ENOMEM); } @@ -911,29 +911,20 @@ static void rk_iommu_detach_device(struct iommu_domain *domain, static struct iommu_domain *rk_iommu_domain_alloc(unsigned type) { struct rk_iommu_domain *rk_domain; - struct platform_device *pdev; - struct device *iommu_dev; if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) return NULL; - /* Register a pdev per domain, so DMA API can base on this *dev -* even some virtual master doesn't have an iommu slave -*/ - pdev = platform_device_register_simple("rk_iommu_domain", - PLATFORM_DEVID_AUTO, NULL, 0); - if (IS_ERR(pdev)) + if (!dma_dev) return NULL; - rk_domain = devm_kzalloc(&pdev->dev, sizeof(*rk_domain), GFP_KERNEL); + rk_domain = devm_kzalloc(dma_dev, sizeof(*rk_domain), GFP_KERNEL); if (!rk_domain) - goto err_unreg_pdev; - - rk_domain->pdev = pdev; + return NULL; if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&rk_domain->domain)) - goto err_unreg_pdev; + return NULL; /* * rk32xx iommus use a 2 level pagetable. 
@@ -944,11 +935,10 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type) if (!rk_domain->dt) goto err_put_cookie; - iommu_dev = &pdev->dev; - rk_domain->dt_dma = dma_map_single(iommu_dev, rk_domain->dt, + rk_domain->dt_dma = dma_map_single(dma_dev, rk_domain->dt, SPAGE_SIZE, DMA_TO_DEVICE); - if (dma_mapping_error(iommu_dev, rk_domain->dt_dma)) { - dev_err(iommu_dev, "DMA map error for DT\n"); + if (dma_mapping_error(dma_dev, rk_domain->dt_dma)) { + dev_err(dma_dev, "DMA map error for DT\n"); goto err_free_dt; } @@ -969,8 +959,6 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type) err_put_cookie: if (type == IOMMU_DOMAIN_DMA) iommu_put_dma_cookie(&rk_domain->domain); -err_unreg_pdev: - platform_device_unregister(pdev); return NULL; } @@ -987,20 +975,18 @@ static void rk_iommu_domain_free(struct iommu_domain *domain) if (rk_dte_is_pt_valid(dte)) { phys_addr_t pt_phys = rk_dte_pt_address(dte);
Re: [PATCH 4.4 000/193] 4.4.118-stable review
Patch #132 (which didn't reach the mailing list) was: > From: Arnd Bergmann > Date: Wed, 26 Oct 2016 15:55:02 -0700 > Subject: Input: tca8418_keypad - hide gcc-4.9 -Wmaybe-uninitialized warning > > commit ea4348c8462a20e8b1b6455a7145d2b86f8a49b6 upstream. This appears to introduce a regression, fixed upstream by: commit 9dd46c02532a6bed6240101ecf4bbc407f8c6adf Author: Dmitry Torokhov Date: Mon Feb 13 15:45:59 2017 -0800 Input: tca8418_keypad - remove double read of key event register Ben. -- Ben Hutchings Software Developer, Codethink Ltd.
[PATCH v7 09/14] dt-bindings: iommu/rockchip: Add clock property
Add clock property, since we are going to control clocks in rockchip iommu driver. Signed-off-by: Jeffy Chen Reviewed-by: Rob Herring --- Changes in v7: None Changes in v6: Fix dt-binding as Robin suggested. Use aclk and iface clk as Rob and Robin suggested, and split binding patch. Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: None Documentation/devicetree/bindings/iommu/rockchip,iommu.txt | 7 +++ 1 file changed, 7 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt index 2098f7732264..6ecefea1c6f9 100644 --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt @@ -14,6 +14,11 @@ Required properties: "single-master" device, and needs no additional information to associate with its master device. See: Documentation/devicetree/bindings/iommu/iommu.txt +- clocks : A list of clocks required for the IOMMU to be accessible by +the host CPU. +- clock-names : Should contain the following: + "iface" - Main peripheral bus clock (PCLK/HCL) (required) + "aclk" - AXI bus clock (required) Optional properties: - rockchip,disable-mmu-reset : Don't use the mmu reset operation. @@ -27,5 +32,7 @@ Example: reg = <0xff940300 0x100>; interrupts = ; interrupt-names = "vopl_mmu"; + clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>; + clock-names = "aclk", "iface"; #iommu-cells = <0>; }; -- 2.11.0
[PATCH v7 07/14] ARM: dts: rockchip: add clocks in iommu nodes
Add clocks in iommu nodes, since we are going to control clocks in rockchip iommu driver. Signed-off-by: Jeffy Chen --- Changes in v7: None Changes in v6: Add clk names, and modify all iommu nodes in all existing rockchip dts Changes in v5: Remove clk names. Changes in v4: None Changes in v3: None Changes in v2: None arch/arm/boot/dts/rk3036.dtsi| 2 ++ arch/arm/boot/dts/rk322x.dtsi| 8 arch/arm/boot/dts/rk3288.dtsi| 12 arch/arm64/boot/dts/rockchip/rk3328.dtsi | 10 ++ arch/arm64/boot/dts/rockchip/rk3368.dtsi | 10 ++ arch/arm64/boot/dts/rockchip/rk3399.dtsi | 14 -- 6 files changed, 54 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/rk3036.dtsi b/arch/arm/boot/dts/rk3036.dtsi index a97458112ff6..567a6a725f9c 100644 --- a/arch/arm/boot/dts/rk3036.dtsi +++ b/arch/arm/boot/dts/rk3036.dtsi @@ -197,6 +197,8 @@ reg = <0x10118300 0x100>; interrupts = ; interrupt-names = "vop_mmu"; + clocks = <&cru ACLK_LCDC>, <&cru HCLK_LCDC>; + clock-names = "aclk", "iface"; #iommu-cells = <0>; status = "disabled"; }; diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi index df1e47858675..be80e9a2c9af 100644 --- a/arch/arm/boot/dts/rk322x.dtsi +++ b/arch/arm/boot/dts/rk322x.dtsi @@ -584,6 +584,8 @@ reg = <0x20020800 0x100>; interrupts = ; interrupt-names = "vpu_mmu"; + clocks = <&cru ACLK_VPU>, <&cru HCLK_VPU>; + clock-names = "aclk", "iface"; iommu-cells = <0>; status = "disabled"; }; @@ -593,6 +595,8 @@ reg = <0x20030480 0x40>, <0x200304c0 0x40>; interrupts = ; interrupt-names = "vdec_mmu"; + clocks = <&cru ACLK_RKVDEC>, <&cru HCLK_RKVDEC>; + clock-names = "aclk", "iface"; iommu-cells = <0>; status = "disabled"; }; @@ -602,6 +606,8 @@ reg = <0x20053f00 0x100>; interrupts = ; interrupt-names = "vop_mmu"; + clocks = <&cru ACLK_VOP>, <&cru HCLK_VOP>; + clock-names = "aclk", "iface"; iommu-cells = <0>; status = "disabled"; }; @@ -611,6 +617,8 @@ reg = <0x20070800 0x100>; interrupts = ; interrupt-names = "iep_mmu"; + clocks = <&cru ACLK_IEP>, <&cru HCLK_IEP>; 
+ clock-names = "aclk", "iface"; iommu-cells = <0>; status = "disabled"; }; diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi index 6102e4e7f35c..ad77c8eb3c38 100644 --- a/arch/arm/boot/dts/rk3288.dtsi +++ b/arch/arm/boot/dts/rk3288.dtsi @@ -958,6 +958,8 @@ reg = <0x0 0xff900800 0x0 0x40>; interrupts = ; interrupt-names = "iep_mmu"; + clocks = <&cru ACLK_IEP>, <&cru HCLK_IEP>; + clock-names = "aclk", "iface"; #iommu-cells = <0>; status = "disabled"; }; @@ -967,6 +969,8 @@ reg = <0x0 0xff914000 0x0 0x100>, <0x0 0xff915000 0x0 0x100>; interrupts = ; interrupt-names = "isp_mmu"; + clocks = <&cru ACLK_ISP>, <&cru HCLK_ISP>; + clock-names = "aclk", "iface"; #iommu-cells = <0>; rockchip,disable-mmu-reset; status = "disabled"; @@ -1026,6 +1030,8 @@ reg = <0x0 0xff930300 0x0 0x100>; interrupts = ; interrupt-names = "vopb_mmu"; + clocks = <&cru ACLK_VOP0>, <&cru HCLK_VOP0>; + clock-names = "aclk", "iface"; power-domains = <&power RK3288_PD_VIO>; #iommu-cells = <0>; status = "disabled"; @@ -1074,6 +1080,8 @@ reg = <0x0 0xff940300 0x0 0x100>; interrupts = ; interrupt-names = "vopl_mmu"; + clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>; + clock-names = "aclk", "iface"; power-domains = <&power RK3288_PD_VIO>; #iommu-cells = <0>; status = "disabled"; @@ -1204,6 +1212,8 @@ reg = <0x0 0xff9a0800 0x0 0x100>; interrupts = ; interrupt-names = "vpu_mmu"; + clocks = <&cru ACLK_VCODEC>, <&cru HCLK_VCODEC>; + clock-names = "aclk", "iface"; #iommu-cells = <0>; status = "disabled"; }; @@ -1213,6 +1223,8 @@ reg = <0x0 0xff9c0440 0x0 0x40>, <0x0 0xff9c0480 0x0 0x40>; interrupts = ; interrupt-names = "hevc_mmu"; + clocks = <&cru
[PATCH v7 04/14] iommu/rockchip: Fix error handling in attach
From: Tomasz Figa Currently if the driver encounters an error while attaching device, it will leave the IOMMU in an inconsistent state. Even though it shouldn't really happen in reality, let's just add proper error path to keep things consistent. Signed-off-by: Tomasz Figa Signed-off-by: Jeffy Chen Reviewed-by: Robin Murphy --- Changes in v7: None Changes in v6: None Changes in v5: Use out labels to save the duplication between the error and success paths. Changes in v4: None Changes in v3: None Changes in v2: Move irq request to probe(in patch[0]) drivers/iommu/rockchip-iommu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index b743d82e6fe1..f7ff3a3645ea 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -824,7 +824,7 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, ret = rk_iommu_force_reset(iommu); if (ret) - return ret; + goto out_disable_stall; iommu->domain = domain; @@ -837,7 +837,7 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, ret = rk_iommu_enable_paging(iommu); if (ret) - return ret; + goto out_disable_stall; spin_lock_irqsave(&rk_domain->iommus_lock, flags); list_add_tail(&iommu->node, &rk_domain->iommus); @@ -845,9 +845,9 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, dev_dbg(dev, "Attached to iommu domain\n"); +out_disable_stall: rk_iommu_disable_stall(iommu); - - return 0; + return ret; } static void rk_iommu_detach_device(struct iommu_domain *domain, -- 2.11.0
[PATCH v7 08/14] iommu/rockchip: Control clocks needed to access the IOMMU
From: Tomasz Figa Current code relies on master driver enabling necessary clocks before IOMMU is accessed, however there are cases when the IOMMU should be accessed while the master is not running yet, for example allocating V4L2 videobuf2 buffers, which is done by the VB2 framework using DMA mapping API and doesn't engage the master driver at all. This patch fixes the problem by letting clocks needed for IOMMU operation to be listed in Device Tree and making the driver enable them for the time of accessing the hardware. Signed-off-by: Jeffy Chen Signed-off-by: Tomasz Figa Acked-by: Robin Murphy --- Changes in v7: None Changes in v6: Fix dt-binding as Robin suggested. Use aclk and iface clk as Rob and Robin suggested, and split binding patch. Changes in v5: Use clk_bulk APIs. Changes in v4: None Changes in v3: None Changes in v2: None drivers/iommu/rockchip-iommu.c | 54 +- 1 file changed, 48 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index c4131ca792e0..6c6275589bd5 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -4,6 +4,7 @@ * published by the Free Software Foundation. 
*/ +#include #include #include #include @@ -87,10 +88,17 @@ struct rk_iommu_domain { struct iommu_domain domain; }; +/* list of clocks required by IOMMU */ +static const char * const rk_iommu_clocks[] = { + "aclk", "iface", +}; + struct rk_iommu { struct device *dev; void __iomem **bases; int num_mmu; + struct clk_bulk_data *clocks; + int num_clocks; bool reset_disabled; struct iommu_device iommu; struct list_head node; /* entry in rk_iommu_domain.iommus */ @@ -506,6 +514,8 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id) irqreturn_t ret = IRQ_NONE; int i; + WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); + for (i = 0; i < iommu->num_mmu; i++) { int_status = rk_iommu_read(iommu->bases[i], RK_MMU_INT_STATUS); if (int_status == 0) @@ -552,6 +562,8 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id) rk_iommu_write(iommu->bases[i], RK_MMU_INT_CLEAR, int_status); } + clk_bulk_disable(iommu->num_clocks, iommu->clocks); + return ret; } @@ -594,7 +606,9 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain *rk_domain, list_for_each(pos, &rk_domain->iommus) { struct rk_iommu *iommu; iommu = list_entry(pos, struct rk_iommu, node); + WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); rk_iommu_zap_lines(iommu, iova, size); + clk_bulk_disable(iommu->num_clocks, iommu->clocks); } spin_unlock_irqrestore(&rk_domain->iommus_lock, flags); } @@ -823,10 +837,14 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, if (!iommu) return 0; - ret = rk_iommu_enable_stall(iommu); + ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks); if (ret) return ret; + ret = rk_iommu_enable_stall(iommu); + if (ret) + goto out_disable_clocks; + ret = rk_iommu_force_reset(iommu); if (ret) goto out_disable_stall; @@ -852,6 +870,8 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, out_disable_stall: rk_iommu_disable_stall(iommu); +out_disable_clocks: + clk_bulk_disable(iommu->num_clocks, iommu->clocks); return ret; } @@ -873,6 +893,7 @@ 
static void rk_iommu_detach_device(struct iommu_domain *domain, spin_unlock_irqrestore(&rk_domain->iommus_lock, flags); /* Ignore error while disabling, just keep going */ + WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)); rk_iommu_enable_stall(iommu); rk_iommu_disable_paging(iommu); for (i = 0; i < iommu->num_mmu; i++) { @@ -880,6 +901,7 @@ static void rk_iommu_detach_device(struct iommu_domain *domain, rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 0); } rk_iommu_disable_stall(iommu); + clk_bulk_disable(iommu->num_clocks, iommu->clocks); iommu->domain = NULL; @@ -1172,18 +1194,38 @@ static int rk_iommu_probe(struct platform_device *pdev) iommu->reset_disabled = device_property_read_bool(dev, "rockchip,disable-mmu-reset"); - err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, dev_name(dev)); + iommu->num_clocks = ARRAY_SIZE(rk_iommu_clocks); + iommu->clocks = devm_kcalloc(iommu->dev, iommu->num_clocks, +sizeof(*iommu->clocks), GFP_KERNEL); + if (!iommu->clocks) + return -ENOMEM; + + for (i = 0; i < iommu->num_clocks; ++i) + iommu->cl
[PATCH v7 05/14] iommu/rockchip: Use iopoll helpers to wait for hardware
From: Tomasz Figa This patch converts the rockchip-iommu driver to use the in-kernel iopoll helpers to wait for certain status bits to change in registers instead of an open-coded custom macro. Signed-off-by: Tomasz Figa Signed-off-by: Jeffy Chen Reviewed-by: Robin Murphy --- Changes in v7: None Changes in v6: None Changes in v5: Use RK_MMU_POLL_PERIOD_US instead of 100. Changes in v4: None Changes in v3: None Changes in v2: None drivers/iommu/rockchip-iommu.c | 75 ++ 1 file changed, 39 insertions(+), 36 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index f7ff3a3645ea..baba283ccdf9 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include @@ -36,7 +36,10 @@ #define RK_MMU_AUTO_GATING 0x24 #define DTE_ADDR_DUMMY 0xCAFEBABE -#define FORCE_RESET_TIMEOUT100 /* ms */ + +#define RK_MMU_POLL_PERIOD_US 100 +#define RK_MMU_FORCE_RESET_TIMEOUT_US 10 +#define RK_MMU_POLL_TIMEOUT_US 1000 /* RK_MMU_STATUS fields */ #define RK_MMU_STATUS_PAGING_ENABLED BIT(0) @@ -73,8 +76,6 @@ */ #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000 -#define IOMMU_REG_POLL_COUNT_FAST 1000 - struct rk_iommu_domain { struct list_head iommus; struct platform_device *pdev; @@ -109,27 +110,6 @@ static struct rk_iommu_domain *to_rk_domain(struct iommu_domain *dom) return container_of(dom, struct rk_iommu_domain, domain); } -/** - * Inspired by _wait_for in intel_drv.h - * This is NOT safe for use in interrupt context. - * - * Note that it's important that we check the condition again after having - * timed out, since the timeout could be due to preemption or similar and - * we've never had a chance to check the condition before the timeout. - */ -#define rk_wait_for(COND, MS) ({ \ - unsigned long timeout__ = jiffies + msecs_to_jiffies(MS) + 1; \ - int ret__ = 0; \ - while (!(COND)) { \ - if (time_after(jiffies, timeout__)) { \ - ret__ = (COND) ? 
0 : -ETIMEDOUT;\ - break; \ - } \ - usleep_range(50, 100); \ - } \ - ret__; \ -}) - /* * The Rockchip rk3288 iommu uses a 2-level page table. * The first level is the "Directory Table" (DT). @@ -333,9 +313,21 @@ static bool rk_iommu_is_paging_enabled(struct rk_iommu *iommu) return enable; } +static bool rk_iommu_is_reset_done(struct rk_iommu *iommu) +{ + bool done = true; + int i; + + for (i = 0; i < iommu->num_mmu; i++) + done &= rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR) == 0; + + return done; +} + static int rk_iommu_enable_stall(struct rk_iommu *iommu) { int ret, i; + bool val; if (rk_iommu_is_stall_active(iommu)) return 0; @@ -346,7 +338,9 @@ static int rk_iommu_enable_stall(struct rk_iommu *iommu) rk_iommu_command(iommu, RK_MMU_CMD_ENABLE_STALL); - ret = rk_wait_for(rk_iommu_is_stall_active(iommu), 1); + ret = readx_poll_timeout(rk_iommu_is_stall_active, iommu, val, +val, RK_MMU_POLL_PERIOD_US, +RK_MMU_POLL_TIMEOUT_US); if (ret) for (i = 0; i < iommu->num_mmu; i++) dev_err(iommu->dev, "Enable stall request timed out, status: %#08x\n", @@ -358,13 +352,16 @@ static int rk_iommu_enable_stall(struct rk_iommu *iommu) static int rk_iommu_disable_stall(struct rk_iommu *iommu) { int ret, i; + bool val; if (!rk_iommu_is_stall_active(iommu)) return 0; rk_iommu_command(iommu, RK_MMU_CMD_DISABLE_STALL); - ret = rk_wait_for(!rk_iommu_is_stall_active(iommu), 1); + ret = readx_poll_timeout(rk_iommu_is_stall_active, iommu, val, +!val, RK_MMU_POLL_PERIOD_US, +RK_MMU_POLL_TIMEOUT_US); if (ret) for (i = 0; i < iommu->num_mmu; i++) dev_err(iommu->dev, "Disable stall request timed out, status: %#08x\n", @@ -376,13 +373,16 @@ static int rk_iommu_disable_stall(struct rk_iommu *iommu) static int rk_iommu_enable_paging(struct rk_iommu *iommu) { int ret, i; + bool val; if (rk_iommu_is_paging_enabled(iommu)) return 0; rk_iommu_command(iommu, RK_MMU_CMD_ENABLE_PAGING); -
[PATCH v7 06/14] iommu/rockchip: Fix TLB flush of secondary IOMMUs
From: Tomasz Figa Due to the bug in current code, only first IOMMU has the TLB lines flushed in rk_iommu_zap_lines. This patch fixes the inner loop to execute for all IOMMUs and properly flush the TLB. Signed-off-by: Tomasz Figa Signed-off-by: Jeffy Chen --- Changes in v7: None Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: None drivers/iommu/rockchip-iommu.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index baba283ccdf9..c4131ca792e0 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -274,19 +274,21 @@ static void rk_iommu_base_command(void __iomem *base, u32 command) { writel(command, base + RK_MMU_COMMAND); } -static void rk_iommu_zap_lines(struct rk_iommu *iommu, dma_addr_t iova, +static void rk_iommu_zap_lines(struct rk_iommu *iommu, dma_addr_t iova_start, size_t size) { int i; - - dma_addr_t iova_end = iova + size; + dma_addr_t iova_end = iova_start + size; /* * TODO(djkurtz): Figure out when it is more efficient to shootdown the * entire iotlb rather than iterate over individual iovas. */ - for (i = 0; i < iommu->num_mmu; i++) - for (; iova < iova_end; iova += SPAGE_SIZE) + for (i = 0; i < iommu->num_mmu; i++) { + dma_addr_t iova; + + for (iova = iova_start; iova < iova_end; iova += SPAGE_SIZE) rk_iommu_write(iommu->bases[i], RK_MMU_ZAP_ONE_LINE, iova); + } } static bool rk_iommu_is_stall_active(struct rk_iommu *iommu) -- 2.11.0
Re: Would you help to tell why async printk solution was not taken to upstream kernel ?
On Tue, 6 Mar 2018 11:43:58 +0900 Sergey Senozhatsky wrote: > One more thing > > On (03/06/18 10:52), Sergey Senozhatsky wrote: > [..] > > > If you know the baud rate, logbuf size * console throughput is actually > > > trivial to calculate. > > It's trivial when your setup is trivial. In a less trivial case if you > set watchdog threshold based on "logbuf size * console throughput" then > things are still too bad. > > So this is what a typical printk over serial console looks like > > printk() > console_unlock() > for (;;) { >local_irq_save() > call_console_drivers() > foo_console_write() > spin_lock_irqsave(&port->lock, flags); > uart_console_write(port, s, count, foo_console_putchar); > spin_unlock_irqrestore(&port->lock, flags); >local_irq_restore() > } > > Notice that call_console_drivers->foo_console_write spins on > port->lock every time it wants to print out a logbuf line. > Why does it do this? > > In short, because of printf(). Yes, printk() may depend on printf(). > > printf() > n_tty_write() > uart_write() >uart_port_lock(state, flags) // > spin_lock_irqsave(&uport->lock, flags) > memcpy(circ->buf + circ->head, buf, c); >uart_port_unlock(port, flags) // > spin_unlock_irqrestore(&port->lock, flags); > > Now, printf() messages stored in uart circ buffer must be printed > to the console. And this is where console's IRQ handler jumps in. > > A typical IRQ handler does something like this > > static irqreturn_t foo_console_irq_handler(...) > { > spin_lock(&port->lock); > rx_chars(port, status); > tx_chars(port, status); > spin_unlock(&port->lock); > } > > Where tx_chars() usually does something like this > > while (...) { > write_char(port, xmit->buf[xmit->tail]); > xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1); > if (uart_circ_empty(xmit)) > break; > } > > Some drivers flush all pending chars, some drivers limit the number > of TX chars to some number, e.g. 512. 
> > But in any case, printk() -> call_console_drivers() -> foo_console_write() > must spin on port->lock as long as foo_console_irq_handler() has chars to > TX / RX. > > Thus, if you have O(logbuf) of kernel messages, and O(circ->buf) of user > space messages, then printk() will spend O(logbuf) + O(circ->buf) + O(RX). > > So the watchdog threshold value based purely on O(logbuf) (printing to > _all_ of the consoles) will not always work. > If you have a complex setup happening like above, you most likely have printks happening on multiple CPUs which means the work load will be spread out across those CPUs. -- Steve
Re: [PATCH 4.4 130/193] [media] tc358743: fix register i2c_rd/wr functions
On Fri, 2018-02-23 at 19:26 +0100, Greg Kroah-Hartman wrote: > 4.4-stable review patch. If anyone has any objections, please let me know. > > -- > > From: Arnd Bergmann > > commit 3538aa6ecfb2dd727a40f9ebbbf25a0c2afe6226 upstream. [...] This introduces a regression in i2c_wr8_and_or(), fixed upstream by: commit f2c61f98e0b5f8b53b8fb860e5dcdd661bde7d0b Author: Philipp Zabel Date: Thu May 4 12:20:17 2017 -0300 [media] tc358743: fix register i2c_rd/wr function fix Ben. -- Ben Hutchings Software Developer, Codethink Ltd.
Re: Would you help to tell why async printk solution was not taken to upstream kernel ?
On Tue, 6 Mar 2018 11:53:50 +0900 Sergey Senozhatsky wrote: > Yes. My point was that "CPU can print one full buffer max" is not > guaranteed and not exactly true. There are ways for CPUs to break > that O(logbuf) boundary. Yes, when printk or the consoles have a bug, it can be greater than O(logbuf). -- Steve