Re: [BUG] Bisected Problem with LSI PCI FC Adapter
Bjorn Helgaas bhelg...@google.com writes:
On Thu, Sep 11, 2014 at 3:24 PM, Dirk Gouders d...@gouders.net wrote:
Bjorn Helgaas bhelg...@google.com writes:
On Thu, Sep 11, 2014 at 2:33 PM, Dirk Gouders d...@gouders.net wrote:

What I am currently trying is to construct a test environment so that I do not need to do tests and diagnosis on a busy machine. I noticed that this problem seems to start with the narrow root bridge window (00-07), but every other machine that I had a look at starts with (00-ff), so those will not trigger my problem. I thought I could perhaps shrink the window in acpi_pci_root_add() to trigger the problem, and that kind of works: it triggers it, but not exactly the same way, because it basically ends at this code in pci_scan_bridge():

	if (max >= bus->busn_res.end) {
		dev_warn(&dev->dev,
			 "can't allocate child bus %02x from %pR (pass %d)\n",
			 max, &bus->busn_res, pass);
		goto out;
	}

If this could work but I am just missing a small detail, I would be glad to hear about it and do the first tests this way. If it is complete nonsense, I will just use the machine that triggers the problem for the tests.

I was about to suggest the same thing. If the problem is related to the bus number change, we should be able to force that to happen on a different machine. Your approach sounds good, so I'm guessing we just need a tweak. I would first double-check that the PCI adapters are identical, including the firmware on the card. Can you also include your patch and the resulting dmesg (with debug enabled as before)?

Currently I am at home doing tests for understanding that I can hopefully use when I am back in the office. I already noticed that the backup FC adapter on the test machine is not exactly the same: it is rev. 1 whereas the one on the failing machine is rev. 2. So, here at home my tests let a NIC disappear. Different from the original problem, but I was just trying to reconstruct the scenario of a misconfigured bridge causing a reconfiguration.
What I was trying is:

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e6ae603..fd146b3 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -556,6 +556,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	strcpy(acpi_device_name(device), ACPI_PCI_ROOT_DEVICE_NAME);
 	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
 	device->driver_data = root;
+	root->secondary.end = 0x02;
 
 	pr_info(PREFIX "%s [%s] (domain %04x %pR)\n",
 		acpi_device_name(device), acpi_device_bid(device),

The device that disappears is a NIC:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
00:1f.0 ISA bridge: Intel Corporation B75 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

This is the one that is missing with the above change:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

This situation is a little different, so I don't think you're reproducing the situation we want to test. On this box, you have:

	pci_bus 0000:00: root bus resource [bus 00-02]
	pci 0000:00:1c.0: PCI bridge to [bus 01]
	pci 0000:00:1c.4: PCI bridge to [bus 02]

so we find all the devices on bus 00 and bus 02 (there's nothing on bus 01). My guess is the 03:00.0 device is
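The failure mode under discussion, a root bridge whose bus-number window is too narrow for a child bus, can be modelled in userspace. The following is a minimal sketch of the check quoted from pci_scan_bridge(); the struct name and window values are illustrative, not taken from the kernel or the failing machine:

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative model of a root bridge bus-number window, e.g. [bus 00-02]. */
struct bus_window {
	int start;
	int end;	/* inclusive upper bus number, like busn_res.end */
};

/*
 * Mirror of the quoted check: a child bus numbered `max` can only be
 * allocated while it still fits inside the parent window.
 * Returns 1 on success, 0 when the window is exhausted.
 */
static int can_allocate_child_bus(const struct bus_window *w, int max)
{
	if (max >= w->end) {
		fprintf(stderr,
			"can't allocate child bus %02x from [bus %02x-%02x]\n",
			max, w->start, w->end);
		return 0;
	}
	return 1;
}
```

With a window shrunk to [bus 00-02], bus 01 and 02 still fit, but the device that would land on bus 03 disappears, matching the observed lspci output.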
Re: [PATCH RFC 1/4] xen, blkfront: add support for the multi-queue block layer API
On Fri, Aug 22, 2014 at 08:02:14AM -0700, Christoph Hellwig wrote:

Hi Arianna, thanks for doing this work!

Thank you for the comments, and sorry that it took so long for me to reply.

Keeping both the legacy and blk-mq paths is fine for testing, but before you submit the code for inclusion please make sure the blk-mq path is unconditionally better and remove the legacy one, similar to most drivers we converted (virtio, mtip, soon nvme).

Thank you for the suggestion. In v2 I have just replaced the legacy path. For testing I was just using the IOmeter script provided with fio that Konrad Wilk showed me. Is there any other test I should do?

+static int blkfront_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+
+	pr_debug("Entered blkfront_queue_rq\n");
+
+	spin_lock_irq(&info->io_lock);
+	if (RING_FULL(&info->ring))
+		goto wait;
+
+	if ((req->cmd_type != REQ_TYPE_FS) ||
+	    ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
+	     !info->flush_op)) {
+		req->errors = -EIO;
+		blk_mq_complete_request(req);
+		spin_unlock_irq(&info->io_lock);
+		return BLK_MQ_RQ_QUEUE_ERROR;
+
+	if (blkif_queue_request(req)) {
+wait:

Just a small style nitpick: goto labels inside conditionals are not very easy to understand. Just add another goto here and move the wait label and its code to the very end of the function.

Right, thanks!

+static int blkfront_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			      unsigned int index)
+{
+	return 0;
+}

There is no need to have an empty implementation of this function; the blk-mq code is fine with not having one.

+static void blkfront_complete(struct request *req)
+{
+	blk_mq_end_io(req, req->errors);
+}

No need to have this one either; blk_mq_end_io is the default I/O completion implementation if no other one is provided.

Right, I have removed the empty stub implementations.
+	memset(&info->tag_set, 0, sizeof(info->tag_set));
+	info->tag_set.ops = &blkfront_mq_ops;
+	info->tag_set.nr_hw_queues = hardware_queues;
+	info->tag_set.queue_depth = BLK_RING_SIZE;
+	info->tag_set.numa_node = NUMA_NO_NODE;
+	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;

You probably also want the recently added BLK_MQ_F_SG_MERGE flag, and maybe BLK_MQ_F_SHOULD_SORT depending on the speed of the device. Does Xenstore expose something like a rotational flag to key off whether we want to do guest-side merging/scheduling?

As far as I know, it doesn't. Do you think it would be useful to advertise that information? (By the way, I saw that the BLK_MQ_F_SHOULD_SORT flag has been removed; I suppose it has really taken me too much time to reply to your e-mail.)

+	info->tag_set.cmd_size = 0;
+	info->tag_set.driver_data = info;
+
+	if (blk_mq_alloc_tag_set(&info->tag_set))
+		return -1;
+	rq = blk_mq_init_queue(&info->tag_set);
+	if (!rq) {
+		blk_mq_free_tag_set(&info->tag_set);
+		return -1;

It seems like returning -1 is the existing style in this driver, but it's generally preferable to return a real errno.

Right, also the handling of the return value of blk_mq_init_queue() is wrong (it returns ERR_PTR()). I have tried to fix that in the upcoming v2.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
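The review point about blk_mq_init_queue() returning ERR_PTR() rather than NULL can be illustrated with a userspace re-implementation of the kernel's error-pointer helpers. This is a hedged sketch of the <linux/err.h> scheme, not the kernel's actual header:

```c
#include <assert.h>
#include <errno.h>

/* Userspace copies of the <linux/err.h> idea: encode a small negative
 * errno in the top of the pointer range, so a pointer return can carry
 * either a valid object or an error code. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

The point of the fix: an ERR_PTR() value is non-NULL, so a `if (!rq)` test silently misses the failure; `IS_ERR(rq)` catches it and `PTR_ERR(rq)` recovers the errno.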
Re: futex_wait_setup sleeping while atomic bug.
On Thu, 2014-09-11 at 23:52 +0200, Thomas Gleixner wrote:

From: Thomas Gleixner t...@linutronix.de
Date: Thu, 11 Sep 2014 23:44:35 +0200
Subject: futex: Unlock hb->lock in futex_wait_requeue_pi() error path

That's the second time we are bitten by bugs when requeueing, now pi. We need to reconsider some of our testing tools to stress these paths better, imo.

futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does:

	if (match_futex(&q.key, &key2)) {
		ret = -EINVAL;
		goto out_put_keys;
	}

which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefore preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave Jones da...@redhat.com
Signed-off-by: Thomas Gleixner t...@linutronix.de
Reviewed-by: Davidlohr Bueso d...@stgolabs.net
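The bug class here is generic: a helper returns with a lock held, and a later error path bails out without dropping it. A minimal userspace sketch of the fixed pattern, using a pthread mutex as a stand-in for hb->lock (all names are illustrative, not the futex code's):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for futex_wait_setup(): on success, returns with the lock held. */
static int wait_setup(void)
{
	pthread_mutex_lock(&hb_lock);
	return 0;
}

/*
 * Sketch of the fixed error path: when the sanity check fails, the lock
 * taken by wait_setup() must be dropped before taking the exit route,
 * instead of returning to the caller with it still held.
 */
static int wait_requeue(int keys_match)
{
	if (wait_setup() != 0)
		return -1;
	if (keys_match) {
		pthread_mutex_unlock(&hb_lock);	/* the unlock the bug was missing */
		return -22;			/* -EINVAL */
	}
	pthread_mutex_unlock(&hb_lock);
	return 0;
}
```

If the unlock in the error branch were missing, the caller would hold the mutex forever, which is the userspace analogue of returning to user space with preemption disabled.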
Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer
Marc,

On Thu, Sep 11, 2014 at 10:43 AM, Marc Zyngier marc.zyng...@arm.com wrote:
On 11/09/14 18:29, Doug Anderson wrote:
Marc,
On Thu, Sep 11, 2014 at 10:22 AM, Marc Zyngier marc.zyng...@arm.com wrote:

We would need to run this code potentially at processor bringup and after suspend/resume, but that seems possible too.

Note that this would be an ARMv7-only thing (you can't do that on ARMv8, at all).

Yes, of course. Is the transition to monitor mode and back simple? Where would you suggest putting this code? It would definitely need to be pretty early. We'd also need to be able to detect that we're in Secure SVC and not mess up anyone else who happened to boot in Non-secure SVC.

This would have to live in some very early platform-specific code. The ugly part is that you cannot find out what world you're in (accessing SCR is going to send you to UNDEF-land if accessed from NS).

Yup, so the question is: would such code be accepted upstream, or are we going to embark on a big job for someone to figure out how to do this only to get NAKed? If there was some indication that folks would take this, I think we might be able to get it coded up. If someone else wanted to volunteer to code it, that would make me even happier, but maybe that's pushing my luck. ;)

Writing the code is a 5-minute job. Getting it accepted is another story, and I'm not sure everyone would agree on that. If I was suicidal, I'd suggest you could pass a parameter on the command line, interpreted by the timer code... But since I'm not, let's pretend I haven't said anything...
;-)

I did this in the past (again, see Sonny's thread), but didn't consider myself knowledgeable enough to know if that was truly a good test:

	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x\n", val);
	val |= (1 << 2);
	asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
	val = 0xffffffff;
	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x\n", val);

The idea being that if you can make modifications to the SCR register (and see your changes take effect) then you must be in secure mode. In my case the first printout was 0x0 and the second was 0x4.

The main issue is when you're *not* in secure mode. It is likely that this will explode badly. This is why I suggested something that is set by the bootloader (after all, it knows which mode it was booted in), and that the timer driver can use when the CPU comes up. Still, very ugly...

Ah, got it. Well, unless someone can suggest a clean way to do this, then I guess we'll keep what we've got...

-Doug
Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer
On 09/11/14 10:43, Marc Zyngier wrote:

If I was suicidal, I'd suggest you could pass a parameter on the command line, interpreted by the timer code... But since I'm not, let's pretend I haven't said anything... ;-)

I did this in the past (again, see Sonny's thread), but didn't consider myself knowledgeable enough to know if that was truly a good test:

	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x\n", val);
	val |= (1 << 2);
	asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
	val = 0xffffffff;
	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x\n", val);

The idea being that if you can make modifications to the SCR register (and see your changes take effect) then you must be in secure mode. In my case the first printout was 0x0 and the second was 0x4.

The main issue is when you're *not* in secure mode. It is likely that this will explode badly. This is why I suggested something that is set by the bootloader (after all, it knows which mode it was booted in), and that the timer driver can use when the CPU comes up.

Where does this platform jump to when a CPU comes up? Is it rockchip_secondary_startup()? I wonder if that path could have this little bit of assembly to poke the cntvoff in monitor mode and then jump to secondary_startup()? Before we boot any secondary CPUs we could also read the cntvoff for CPU0 in the platform-specific layer (where we know we're running in secure mode) and then use that value as the reset value for the secondaries. Or does this platform boot up in secure mode some times and non-secure mode other times?

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
When a deferrable work item (INIT_DEFERRABLE_WORK, etc.) is queued via queue_delayed_work(), it's probably intended to run the work item on any CPU that isn't idle. However, we queue the work to run at a later time by starting a deferrable timer that binds to whatever CPU the work is queued on, which is effectively the same as queue_delayed_work_on(smp_processor_id()). As a result, WORK_CPU_UNBOUND work items aren't really CPU-unbound now.

This is perfectly fine on a UP kernel, and it also doesn't affect an SMP kernel without dynticks much, since every CPU runs timers periodically. But on SMP systems with dynticks, the current implementation makes deferrable timers not very scalable, because the timer base that queued the deferrable timer won't wake up until the next non-deferrable timer expires, even though there may be other non-idle CPUs running that could handle the expired deferrable timers.

Deferrable work is a good example of a victim of the current implementation:

	INIT_DEFERRABLE_WORK(&dwork, fn);

	CPU 0					CPU 1
	queue_delayed_work(wq, &dwork, HZ);
	  queue_delayed_work_on(WORK_CPU_UNBOUND);
	  ...
	  __mod_timer() -> queues timer to the
			   current cpu's timer base.
	  ...
	tick_nohz_idle_enter() -> cpu enters idle.
	A second later:
	cpu 0 is now in idle.			cpu 1 exits idle or wasn't idle,
						so it is active, but it won't
	cpu 0 won't wake up till the next	handle the cpu-unbound deferrable
	non-deferrable timer expires.		timer, as it sits in cpu 0's
						timer base.

To make all CPU-unbound deferrable timers really unbound and thus scalable, introduce a common timer base used only for CPU-unbound deferrable timers, so that they can be serviced by any non-idle CPU. This common timer base fixes the scalability issue of delayed work and of all other users of CPU-unbound deferrable timers.
cc: Thomas Gleixner t...@linutronix.de
CC: John Stultz john.stu...@linaro.org
CC: Tejun Heo t...@kernel.org
Signed-off-by: Joonwoo Park joonw...@codeaurora.org
---
 kernel/time/timer.c | 108 +++-
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index aca5dfe..655076b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -93,6 +93,9 @@ struct tvec_base {
 struct tvec_base boot_tvec_bases;
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
+#ifdef CONFIG_SMP
+static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
+#endif
 
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
@@ -655,7 +658,14 @@ static inline void debug_assert_init(struct timer_list *timer)
 static void do_init_timer(struct timer_list *timer, unsigned int flags,
			  const char *name, struct lock_class_key *key)
 {
-	struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
+	struct tvec_base *base;
+
+#ifdef CONFIG_SMP
+	if (flags & TIMER_DEFERRABLE)
+		base = tvec_base_deferral;
+	else
+#endif
+		base = __raw_get_cpu_var(tvec_bases);
 
 	timer->entry.next = NULL;
 	timer->base = (void *)((unsigned long)base | flags);
@@ -777,26 +787,32 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	debug_activate(timer, expires);
 
-	cpu = get_nohz_timer_target(pinned);
-	new_base = per_cpu(tvec_bases, cpu);
+#ifdef CONFIG_SMP
+	if (base != tvec_base_deferral) {
+#endif
+		cpu = get_nohz_timer_target(pinned);
+		new_base = per_cpu(tvec_bases, cpu);
 
-	if (base != new_base) {
-		/*
-		 * We are trying to schedule the timer on the local CPU.
-		 * However we can't change timer's base while it is running,
-		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
-		 */
-		if (likely(base->running_timer != timer)) {
-			/* See the comment in lock_timer_base() */
-			timer_set_base(timer, NULL);
-			spin_unlock(&base->lock);
-			base = new_base;
-			spin_lock(&base->lock);
-			timer_set_base(timer, base);
+		if (base != new_base) {
+			/*
+			 * We are trying to schedule the timer on the local CPU.
+			 * However we can't change timer's base while it is
+			 * running, otherwise del_timer_sync() can't detect that
+			 * the timer's handler yet has not finished.
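The idea of the patch, per-CPU bases for bound timers plus one shared base that any active CPU services, can be modelled in plain userspace C. This is a single-threaded sketch with illustrative names, not the patch's data structures:

```c
#include <assert.h>

#define NCPUS 2
#define MAXT  8

struct timer_queue {
	int expired[MAXT];
	int n;
};

static struct timer_queue percpu_base[NCPUS];	/* bound timers: one base per cpu */
static struct timer_queue deferral_base;	/* shared base for unbound deferrable timers */

static void queue_timer(int cpu, int id, int deferrable)
{
	struct timer_queue *q = deferrable ? &deferral_base : &percpu_base[cpu];
	q->expired[q->n++] = id;
}

/*
 * An active cpu runs the timers in its own base plus those in the shared
 * deferrable base; an idle cpu runs nothing (it stays asleep under dynticks).
 * Returns how many timers were serviced.
 */
static int run_timers(int cpu, const int cpu_idle[NCPUS])
{
	int ran = 0;

	if (cpu_idle[cpu])
		return 0;
	ran += percpu_base[cpu].n;
	percpu_base[cpu].n = 0;
	ran += deferral_base.n;
	deferral_base.n = 0;
	return ran;
}
```

With the pre-patch behaviour the deferrable timer would sit in the queueing CPU's own base and nobody would run it while that CPU idles; with the shared base, whichever CPU is active picks it up.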
[PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback
Hello,

this patchset adds support to the Xen PV block driver for exploiting the multi-queue block layer API by sharing and using multiple I/O rings in the frontend and backend. It is the result of my internship for GNOME's Outreach Program for Women ([1]), in which I was mentored by Konrad Rzeszutek Wilk.

The patchset implements in the backend driver the retrieval of information about the currently-in-use block layer API for a certain device, and about the number of available submission queues if the API turns out to be the multi-queue one. The information is then advertised to the frontend driver via XenStore. The frontend device can exploit this information to allocate and grant multiple I/O rings and advertise the final number to the backend so that it will be able to map them.

The patchset has been tested with fio's IOmeter emulation on a four-core machine with a null_blk device (some results are available here: [2]).

With respect to the first version of this RFC patchset ([3]), the patchset has undergone the following changes (as the structure of the patchset itself has changed, I'm summarizing them here).

. Now the use of the multi-queue API replaces that of the request queue API, as indicated by Christoph Hellwig.
. Patch 0003 from the previous patchset has been split into two patches, the first introducing in the frontend actual support for multiple block rings, the second adding support to negotiate the number of I/O rings with the backend, as suggested by David Vrabel.
. Patch 0004 from the previous patchset has been split into two patches, the first introducing in the backend support for multiple block rings, the second adding support to negotiate the number of I/O rings, as suggested by David Vrabel.
. Added the BLK_MQ_F_SG_MERGE and BLK_MQ_F_SHOULD_SORT flags to the frontend driver's initialization, as suggested by Christoph Hellwig.
. Removed empty/useless definitions of the init_hctx and complete hooks, as pointed out by Christoph Hellwig.
. Removed useless debug printk()s from code added in xen-blkfront, as indicated by David Vrabel.
. Added return of an actual error code in the blk_mq_init_queue() failure path of xlvbd_init_blk_queue(), as suggested by Christoph Hellwig.
. Fixed a coding style issue in blkfront_queue_rq(), as suggested by Christoph Hellwig.
. Added support for the migration of a multi-queue-capable domU to a host with non-multi-queue-capable devices.
. Fixed locking issues in the interrupt path, avoiding grabbing the io_lock twice when calling blk_mq_start_stopped_hw_queues().
. Fixed wrong use of the return value of blk_mq_init_queue().
. Dropped the use of the ternary operator in the macros that compute the number of per-ring requests and grants: now they use the max() macro.

Any comments or suggestions are more than welcome.
Thank you,
Arianna

[1] http://goo.gl/bcvHMh
[2] http://goo.gl/O8RlLL
[3] http://lkml.org/lkml/2014/8/22/158

Arianna Avanzini (5):
  xen, blkfront: port to the multi-queue block layer API
  xen, blkfront: introduce support for multiple block rings
  xen, blkfront: negotiate the number of block rings with the backend
  xen, blkback: introduce support for multiple block rings
  xen, blkback: negotiate of the number of block rings with the frontend

 drivers/block/xen-blkback/blkback.c | 377 ---
 drivers/block/xen-blkback/common.h  | 110 +++--
 drivers/block/xen-blkback/xenbus.c  | 472 +--
 drivers/block/xen-blkfront.c        | 894 +---
 4 files changed, 1122 insertions(+), 731 deletions(-)

-- 
2.1.0
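One of the listed changes, replacing a ternary with the max() macro when dividing a total across rings, can be sketched in userspace. The constants and the helper name are illustrative, not the driver's (the kernel's max() is additionally type-checked):

```c
#include <assert.h>

/* Userspace stand-in for the kernel's max() macro. */
#define max(a, b) ((a) > (b) ? (a) : (b))

#define TOTAL_REQS 64	/* illustrative total, not the driver's constant */

/*
 * Per-ring requests: divide the total across the rings, but keep a
 * minimum floor so a ring is never starved when there are many rings.
 */
static int reqs_per_ring(int nr_rings)
{
	return max(TOTAL_REQS / nr_rings, 8);
}
```

The same shape appears later in the backend patch as XEN_RING_MAX_PGRANTS(nr_rings), which applies max() against a floor of 16 persistent grants per ring.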
[PATCH RFC v2 5/5] xen, blkback: negotiate of the number of block rings with the frontend
This commit lets the backend driver advertise the number of available hardware queues; it also implements gathering from the frontend driver the number of rings actually available for mapping.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkback/xenbus.c | 44 +-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4f13cc..9ff6ced 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -477,6 +477,34 @@ static void xen_vbd_free(struct xen_vbd *vbd)
 	vbd->bdev = NULL;
 }
 
+static int xen_advertise_hw_queues(struct xen_blkif *blkif,
+				   struct request_queue *q)
+{
+	struct xen_vbd *vbd = &blkif->vbd;
+	struct xenbus_transaction xbt;
+	int err;
+
+	if (q && q->mq_ops)
+		vbd->nr_supported_hw_queues = q->nr_hw_queues;
+
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		BUG_ON(!blkif->be);
+		xenbus_dev_fatal(blkif->be->dev, err, "starting transaction (hw queues)");
+		return err;
+	}
+
+	err = xenbus_printf(xbt, blkif->be->dev->nodename, "nr_supported_hw_queues", "%u",
+			    blkif->vbd.nr_supported_hw_queues);
+	if (err)
+		xenbus_dev_error(blkif->be->dev, err, "writing %s/nr_supported_hw_queues",
+				 blkif->be->dev->nodename);
+
+	xenbus_transaction_end(xbt, 0);
+
+	return err;
+}
+
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
			  unsigned major, unsigned minor, int readonly,
			  int cdrom)
@@ -484,6 +512,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	struct xen_vbd *vbd;
 	struct block_device *bdev;
 	struct request_queue *q;
+	int err;
 
 	vbd = &blkif->vbd;
 	vbd->handle = handle;
@@ -522,6 +551,10 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (q && blk_queue_secdiscard(q))
 		vbd->discard_secure = true;
 
+	err = xen_advertise_hw_queues(blkif, q);
+	if (err)
+		return -ENOENT;
+
 	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
 		handle, blkif->domid);
 	return 0;
@@ -935,7 +968,16 @@ static int connect_ring(struct backend_info *be)
 
 	DPRINTK("%s", dev->otherend);
 
-	blkif->nr_rings = 1;
+	err = xenbus_gather(XBT_NIL, dev->otherend, "nr_blk_rings",
+			    "%u", &blkif->nr_rings, NULL);
+	if (err) {
+		/*
+		 * Frontend does not support multiqueue; force compatibility
+		 * mode of the driver.
+		 */
+		blkif->vbd.nr_supported_hw_queues = 0;
+		blkif->nr_rings = 1;
+	}
 
 	ring_ref = kzalloc(sizeof(unsigned long) * blkif->nr_rings, GFP_KERNEL);
 	if (!ring_ref)
-- 
2.1.0
[PATCH RFC v2 2/5] xen, blkfront: introduce support for multiple block rings
This commit introduces in xen-blkfront actual support for multiple block rings. The number of block rings to be used is still forced to one.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 710 +--
 1 file changed, 410 insertions(+), 300 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 109add6..9282df1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -102,30 +102,44 @@ MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default
 #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
 
 /*
+ * Data structure keeping per-ring info. A blkfront_info structure is always
+ * associated with one or more blkfront_ring_info.
+ */
+struct blkfront_ring_info
+{
+	spinlock_t io_lock;
+	int ring_ref;
+	struct blkif_front_ring ring;
+	unsigned int evtchn, irq;
+	struct blk_shadow shadow[BLK_RING_SIZE];
+	unsigned long shadow_free;
+
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct list_head grants;
+	struct list_head indirect_pages;
+	unsigned int persistent_gnts_c;
+
+	struct blkfront_info *info;
+	unsigned int hctx_index;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'. They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
  */
 struct blkfront_info
 {
-	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
 	struct gendisk *gd;
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	int ring_ref;
-	struct blkif_front_ring ring;
-	unsigned int evtchn, irq;
+	unsigned int nr_rings;
+	struct blkfront_ring_info *rinfo;
 	struct request_queue *rq;
-	struct work_struct work;
-	struct gnttab_free_callback callback;
-	struct blk_shadow shadow[BLK_RING_SIZE];
-	struct list_head grants;
-	struct list_head indirect_pages;
-	unsigned int persistent_gnts_c;
-	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int flush_op;
 	unsigned int feature_discard:1;
@@ -169,32 +183,35 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define INDIRECT_GREFS(_segs) \
	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_gather_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo,
+				   unsigned int segs);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
+	unsigned long free = rinfo->shadow_free;
	BUG_ON(free >= BLK_RING_SIZE);
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
			      unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
		return -EINVAL;
-	info->shadow[id].req.u.rw.id = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
+	struct blkfront_info *info = rinfo->info;
	struct page *granted_page;
	struct grant *gnt_list_entry, *n;
	int i = 0;
@@ -214,7 +231,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
	}
 
	gnt_list_entry->gref = GRANT_INVALID_REF;
-	list_add(&gnt_list_entry->node, &info->grants);
+	list_add(&gnt_list_entry->node, &rinfo->grants);
	i++;
	}
 
@@ -222,7 +239,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 out_of_memory:
	list_for_each_entry_safe(gnt_list_entry, n,
-				 &info->grants, node) {
+				 &rinfo->grants, node) {
		list_del(&gnt_list_entry->node);
		if (info->feature_persistent)
Re: /proc/pid/exe symlink behavior change in =3.15.
On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
On Sun, 7 Sep 2014 09:56:08 +0200 Mateusz Guzik mgu...@redhat.com wrote:
On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:

Hi, starting with kernel 3.15 the 'exe' symlink under /proc/pid/ acts differently than it did in all the pre-3.15 kernels.

The use case:

	run /root/testbin (app that just sleeps)
	cp /root/testbin /root/testbin.new
	mv /root/testbin.new /root/testbin
	ls -al /proc/`pidof testbin`/exe

	<=3.14: /root/testbin (deleted)
	>=3.15: /root/testbin.new (deleted)

Was the change intentional? It renders my system unusable and I failed to find any information about such a change in the ChangeLog.

It looks like this was already broken for long (> DNAME_INLINE_LEN) names. Short names share the problem since da1ce0670c14d8 "vfs: add cross-rename". The following change to switch_names() is the culprit:

-	memcpy(dentry->d_iname, target->d_name.name,
-			target->d_name.len + 1);
-	dentry->d_name.len = target->d_name.len;
-	return;
+	unsigned int i;
+	BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+	for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+		swap(((long *) dentry->d_iname)[i],
+		     ((long *) target->d_iname)[i]);
+	}

Dentries can have names in an embedded structure or in an external buffer. If you take a look around you will see the code just swaps pointers for the external case, but this results in the same behaviour you are seeing.

Looks like the real problem here is that __d_materialise_dentry() needs the old behavior of switch_names(). At least that's how it got fixed in grsecurity.

No. The regression in question is an effect of using swap instead of memcpy in switch_names(), as called by d_move(). The fix in grsecurity reverts to the previous behaviour when needed and imho should be applied for the time being. The real problem is that __d_move() always switches the parent dentry and calls switch_names(), which actually switches names in some cases.
Without the regression you get expected results only for short names, and only when you move stuff around within the same directory. For instance, with the current code:

	mv /foo/bar/baz /1/2/3

will replace the whole path. The previous behaviour would result in /foo/bar/3 as the new path, which is clearly still incorrect.

Leaving the old dentry under the same parent would mean that the tree associated with the now-moved dentry would possibly need to be freed. In addition to that, one has to deal with the need to give the renamed dentry its new name, which possibly came from an external buffer. An idea I came up with (atomic_t refcount; char name[0]; with ->name assigned to the dentry) may require adding an additional field to struct dentry, which would be bad.

I didn't have the time yet to look at this stuff properly.

-- 
Mateusz Guzik
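The behaviour difference being debated, swap() exchanging the two inline names versus the old memcpy() leaving the target's buffer untouched, can be demonstrated in userspace. The buffer size, struct, and names below are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>
#include <string.h>

#define DNAME_INLINE_LEN 32	/* illustrative; kept a multiple of sizeof(long) */

struct fake_dentry {
	char d_iname[DNAME_INLINE_LEN];
};

/* Old behaviour: the moved dentry takes the target's name; the target's
 * own buffer is left as it was. */
static void switch_names_copy(struct fake_dentry *dentry, struct fake_dentry *target)
{
	memcpy(dentry->d_iname, target->d_iname, DNAME_INLINE_LEN);
}

/* New behaviour (post cross-rename): the two inline name buffers are
 * exchanged word by word, so the target ends up with the old name. */
static void switch_names_swap(struct fake_dentry *dentry, struct fake_dentry *target)
{
	unsigned int i;

	for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
		long tmp = ((long *)dentry->d_iname)[i];

		((long *)dentry->d_iname)[i] = ((long *)target->d_iname)[i];
		((long *)target->d_iname)[i] = tmp;
	}
}
```

With the swap variant, the dentry left behind carries the replacement's name, which is why /proc/pid/exe starts reporting "testbin.new (deleted)" after the mv in the reproducer.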
[PATCH RFC v2 4/5] xen, blkback: introduce support for multiple block rings
This commit adds to xen-blkback the support to map and make use of a variable number of ringbuffers. The number of rings to be mapped is forcibly set to one for now.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkback/blkback.c | 377 ---
 drivers/block/xen-blkback/common.h  | 110 +
 drivers/block/xen-blkback/xenbus.c  | 432 +++-
 3 files changed, 548 insertions(+), 371 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 64c60ed..b31acfb 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -80,6 +80,9 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
                  "Maximum number of grants to map persistently");

+#define XEN_RING_MAX_PGRANTS(nr_rings) \
+       (max((int)(xen_blkif_max_pgrants / nr_rings), 16))
+
 /*
  * The LRU mechanism to clean the lists of persistent grants needs to
  * be executed periodically. The time interval between consecutive executions
@@ -103,71 +106,71 @@ module_param(log_stats, int, 0644);

 /* Number of free pages to remove on each call to free_xenballooned_pages */
 #define NUM_BATCH_FREE_PAGES 10

-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
        unsigned long flags;

-       spin_lock_irqsave(&blkif->free_pages_lock, flags);
-       if (list_empty(&blkif->free_pages)) {
-               BUG_ON(blkif->free_pages_num != 0);
-               spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+       spin_lock_irqsave(&ring->free_pages_lock, flags);
+       if (list_empty(&ring->free_pages)) {
+               BUG_ON(ring->free_pages_num != 0);
+               spin_unlock_irqrestore(&ring->free_pages_lock, flags);
                return alloc_xenballooned_pages(1, page, false);
        }
-       BUG_ON(blkif->free_pages_num == 0);
-       page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+       BUG_ON(ring->free_pages_num == 0);
+       page[0] = list_first_entry(&ring->free_pages, struct page, lru);
        list_del(&page[0]->lru);
-       blkif->free_pages_num--;
-       spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+       ring->free_pages_num--;
+       spin_unlock_irqrestore(&ring->free_pages_lock, flags);
        return 0;
 }

-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
-                                 int num)
+static inline void put_free_pages(struct xen_blkif_ring *ring,
+                                 struct page **page, int num)
 {
        unsigned long flags;
        int i;

-       spin_lock_irqsave(&blkif->free_pages_lock, flags);
+       spin_lock_irqsave(&ring->free_pages_lock, flags);
        for (i = 0; i < num; i++)
-               list_add(&page[i]->lru, &blkif->free_pages);
-       blkif->free_pages_num += num;
-       spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+               list_add(&page[i]->lru, &ring->free_pages);
+       ring->free_pages_num += num;
+       spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }

-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
        /* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
        struct page *page[NUM_BATCH_FREE_PAGES];
        unsigned int num_pages = 0;
        unsigned long flags;

-       spin_lock_irqsave(&blkif->free_pages_lock, flags);
-       while (blkif->free_pages_num > num) {
-               BUG_ON(list_empty(&blkif->free_pages));
-               page[num_pages] = list_first_entry(&blkif->free_pages,
+       spin_lock_irqsave(&ring->free_pages_lock, flags);
+       while (ring->free_pages_num > num) {
+               BUG_ON(list_empty(&ring->free_pages));
+               page[num_pages] = list_first_entry(&ring->free_pages,
                                                   struct page, lru);
                list_del(&page[num_pages]->lru);
-               blkif->free_pages_num--;
+               ring->free_pages_num--;
                if (++num_pages == NUM_BATCH_FREE_PAGES) {
-                       spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+                       spin_unlock_irqrestore(&ring->free_pages_lock, flags);
                        free_xenballooned_pages(num_pages, page);
-                       spin_lock_irqsave(&blkif->free_pages_lock, flags);
+                       spin_lock_irqsave(&ring->free_pages_lock, flags);
                        num_pages = 0;
                }
        }
-       spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+       spin_unlock_irqrestore(&ring->free_pages_lock, flags);
        if (num_pages != 0)
                free_xenballooned_pages(num_pages, page);
 }

 #define vaddr(page) ((unsigned
[PATCH RFC v2 3/5] xen, blkfront: negotiate the number of block rings with the backend
This commit implements the negotiation of the number of block rings to be used; as a default, the number of rings is decided by the frontend driver and is equal to the number of hardware queues that the backend makes available. In case of guest migration towards a host whose devices expose a different number of hardware queues, the number of I/O rings used by the frontend driver remains the same; XenStore keys may vary if the frontend needs to be compatible with a host not having multi-queue support.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 95 +++-
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9282df1..77e311d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -137,7 +137,7 @@ struct blkfront_info
        int vdevice;
        blkif_vdev_t handle;
        enum blkif_state connected;
-       unsigned int nr_rings;
+       unsigned int nr_rings, old_nr_rings;
        struct blkfront_ring_info *rinfo;
        struct request_queue *rq;
        unsigned int feature_flush;
@@ -147,6 +147,7 @@ struct blkfront_info
        unsigned int discard_granularity;
        unsigned int discard_alignment;
        unsigned int feature_persistent:1;
+       unsigned int hardware_queues;
        unsigned int max_indirect_segments;
        int is_ready;
        /* Block layer tags. */
@@ -669,7 +670,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,

        memset(&info->tag_set, 0, sizeof(info->tag_set));
        info->tag_set.ops = &blkfront_mq_ops;
-       info->tag_set.nr_hw_queues = 1;
+       info->tag_set.nr_hw_queues = info->hardware_queues ? : 1;
        info->tag_set.queue_depth = BLK_RING_SIZE;
        info->tag_set.numa_node = NUMA_NO_NODE;
        info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -938,6 +939,7 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
        info->gd = NULL;
 }

+/* Must be called with io_lock held */
 static void kick_pending_request_queues(struct blkfront_ring_info *rinfo,
                                        unsigned long *flags)
 {
@@ -1351,10 +1353,24 @@ again:
                goto destroy_blkring;
        }

+       /* Advertise the number of rings */
+       err = xenbus_printf(xbt, dev->nodename, "nr_blk_rings",
+                           "%u", info->nr_rings);
+       if (err) {
+               xenbus_dev_fatal(dev, err, "advertising number of rings");
+               goto abort_transaction;
+       }
+
        for (i = 0 ; i < info->nr_rings ; i++) {
-               BUG_ON(i > 0);
-               snprintf(ring_ref_s, 64, "ring-ref");
-               snprintf(evtchn_s, 64, "event-channel");
+               if (!info->hardware_queues) {
+                       BUG_ON(i > 0);
+                       /* Support old XenStore keys */
+                       snprintf(ring_ref_s, 64, "ring-ref");
+                       snprintf(evtchn_s, 64, "event-channel");
+               } else {
+                       snprintf(ring_ref_s, 64, "ring-ref-%d", i);
+                       snprintf(evtchn_s, 64, "event-channel-%d", i);
+               }
                err = xenbus_printf(xbt, dev->nodename,
                                    ring_ref_s, "%u", info->rinfo[i].ring_ref);
                if (err) {
@@ -1403,6 +1419,14 @@ again:
        return err;
 }

+static inline int blkfront_gather_hw_queues(struct blkfront_info *info,
+                                           unsigned int *nr_queues)
+{
+       return xenbus_gather(XBT_NIL, info->xbdev->otherend,
+                            "nr_supported_hw_queues", "%u", nr_queues,
+                            NULL);
+}
+
 /**
  * Entry point to this code when a new device is created. Allocate the basic
  * structures and the ring buffer for communication with the backend, and
@@ -1414,6 +1438,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 {
        int err, vdevice, i, r;
        struct blkfront_info *info;
+       unsigned int nr_queues;

        /* FIXME: Use dynamic device id if this is not set. */
        err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1472,10 +1497,19 @@ static int blkfront_probe(struct xenbus_device *dev,
        info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
        dev_set_drvdata(&dev->dev, info);

-       /* Allocate the correct number of rings. */
-       info->nr_rings = 1;
-       pr_info("blkfront: %s: %d rings\n",
-               info->gd->disk_name, info->nr_rings);
+       /* Gather the number of hardware queues as soon as possible */
+       err = blkfront_gather_hw_queues(info, &nr_queues);
+       if (err)
+               info->hardware_queues = 0;
+       else
+               info->hardware_queues = nr_queues;
+       /*
+        * The backend has told us the number of hw queues it wants.
+        * Allocate the correct number of rings.
+        */
+
[PATCH RFC v2 1/5] xen, blkfront: port to the multi-queue block layer API
This commit introduces support for the multi-queue block layer API, and at the same time removes the existing request_queue API support. The changes are only structural, and the number of supported hardware contexts is forcibly set to one.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 171 ---
 1 file changed, 80 insertions(+), 91 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5deb235..109add6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq.h>
 #include <linux/hdreg.h>
 #include <linux/cdrom.h>
 #include <linux/module.h>
@@ -134,6 +135,8 @@ struct blkfront_info
        unsigned int feature_persistent:1;
        unsigned int max_indirect_segments;
        int is_ready;
+       /* Block layer tags. */
+       struct blk_mq_tag_set tag_set;
 };

 static unsigned int nr_minors;
@@ -582,66 +585,69 @@ static inline void flush_requests(struct blkfront_info *info)
        notify_remote_via_irq(info->irq);
 }

-/*
- * do_blkif_request
- *  read a block; request is in a request queue
- */
-static void do_blkif_request(struct request_queue *rq)
+static int blkfront_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 {
-       struct blkfront_info *info = NULL;
-       struct request *req;
-       int queued;
-
-       pr_debug("Entered do_blkif_request\n");
-
-       queued = 0;
-
-       while ((req = blk_peek_request(rq)) != NULL) {
-               info = req->rq_disk->private_data;
+       struct blkfront_info *info = req->rq_disk->private_data;

-               if (RING_FULL(&info->ring))
-                       goto wait;
+       spin_lock_irq(&info->io_lock);
+       if (RING_FULL(&info->ring))
+               goto wait;

-               blk_start_request(req);
+       if ((req->cmd_type != REQ_TYPE_FS) ||
+           ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
+            !info->flush_op)) {
+               req->errors = -EIO;
+               blk_mq_complete_request(req);
+               spin_unlock_irq(&info->io_lock);
+               return BLK_MQ_RQ_QUEUE_ERROR;
+       }

-               if ((req->cmd_type != REQ_TYPE_FS) ||
-                   ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
-                    !info->flush_op)) {
-                       __blk_end_request_all(req, -EIO);
-                       continue;
-               }
+       if (blkif_queue_request(req)) {
+               blk_mq_requeue_request(req);
+               goto wait;
+       }

-               pr_debug("do_blk_req %p: cmd %p, sec %lx, (%u/%u) [%s]\n",
-                        req, req->cmd, (unsigned long)blk_rq_pos(req),
-                        blk_rq_cur_sectors(req), blk_rq_sectors(req),
-                        rq_data_dir(req) ? "write" : "read");
+       flush_requests(info);
+       spin_unlock_irq(&info->io_lock);
+       return BLK_MQ_RQ_QUEUE_OK;

-               if (blkif_queue_request(req)) {
-                       blk_requeue_request(rq, req);
 wait:
-                       /* Avoid pointless unplugs. */
-                       blk_stop_queue(rq);
-                       break;
-               }
-
-               queued++;
-       }
-
-       if (queued != 0)
-               flush_requests(info);
+       /* Avoid pointless unplugs. */
+       blk_mq_stop_hw_queue(hctx);
+       spin_unlock_irq(&info->io_lock);
+       return BLK_MQ_RQ_QUEUE_BUSY;
 }

+static struct blk_mq_ops blkfront_mq_ops = {
+       .queue_rq = blkfront_queue_rq,
+       .map_queue = blk_mq_map_queue,
+};
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
                                unsigned int physical_sector_size,
                                unsigned int segments)
 {
        struct request_queue *rq;
        struct blkfront_info *info = gd->private_data;
+       int ret;
+
+       memset(&info->tag_set, 0, sizeof(info->tag_set));
+       info->tag_set.ops = &blkfront_mq_ops;
+       info->tag_set.nr_hw_queues = 1;
+       info->tag_set.queue_depth = BLK_RING_SIZE;
+       info->tag_set.numa_node = NUMA_NO_NODE;
+       info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+       info->tag_set.cmd_size = 0;
+       info->tag_set.driver_data = info;

-       rq = blk_init_queue(do_blkif_request, &info->io_lock);
-       if (rq == NULL)
-               return -1;
+       if ((ret = blk_mq_alloc_tag_set(&info->tag_set)))
+               return ret;
+       rq = blk_mq_init_queue(&info->tag_set);
+       if (IS_ERR(rq)) {
+               blk_mq_free_tag_set(&info->tag_set);
+               return PTR_ERR(rq);
+       }
+       rq->queuedata = info;

        queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
@@ -871,7 +877,7 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
        spin_lock_irqsave(&info->io_lock, flags);
Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer
Stephen,

On Thu, Sep 11, 2014 at 4:56 PM, Stephen Boyd sb...@codeaurora.org wrote: On 09/11/14 10:43, Marc Zyngier wrote:

If I was suicidal, I'd suggest you could pass a parameter on the command line, interpreted by the timer code... But since I'm not, let's pretend I haven't said anything... ;-)

I did this in the past (again, see Sonny's thread), but didn't consider myself knowledgeable enough to know if that was truly a good test:

    asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
    pr_info("DOUG: val is %#010x", val);
    val |= (1 << 2);
    asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
    val = 0x;
    asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
    pr_info("DOUG: val is %#010x", val);

The idea being that if you can make modifications to the SCR register (and see your changes take effect) then you must be in secure mode. In my case the first printout was 0x0 and the second was 0x4.

The main issue is when you're *not* in secure mode. It is likely that this will explode badly. This is why I suggested something that is set by the bootloader (after all, it knows which mode it is booted in), and that the timer driver can use when the CPU comes up.

Where does this platform jump to when a CPU comes up? Is it rockchip_secondary_startup()? I wonder if that path could have this little bit of assembly to poke the cntvoff in monitor mode and then jump to secondary_startup()? Before we boot any secondary CPUs we could also read the cntvoff for CPU0 in the platform-specific layer (where we know we're running in secure mode) and then use that value as the reset value for the secondaries. Or does this platform boot up in secure mode some times and non-secure mode other times?

I guess it would depend a whole lot on the bootloader, wouldn't it? With our current "get out of the way" bootloader, Linux always sees Secure SVC.
...but if someone decided to put a new bootloader on the system that wanted to do something different (implement security and boot the kernel in nonsecure HYP, or implement a hypervisor and boot the kernel in nonsecure SVC) then everything would be different. If someone were to write a bootloader like that (or perhaps if we're running in a VM?) then I'd imagine that the whole world would be different. Somehow this secure bootloader and/or hypervisor would _have_ to be involved in processor bringup and suspend/resume. Since I've never looked at code implementing either of these, I'm just making assumptions, though.

-Doug
RE: [PATCH v4 07/12] usb: chipidea: add a usb2 driver for ci13xxx
On Thu, Sep 11, 2014 at 09:07:10AM +0800, Peter Chen wrote: On Wed, Sep 03, 2014 at 09:48:26AM +0200, Antoine Tenart wrote:

+static int ci_hdrc_usb2_dt_probe(struct device *dev,
+                                struct ci_hdrc_platform_data *ci_pdata)
+{
+       ci_pdata->phy = of_phy_get(dev->of_node, 0);
+       if (IS_ERR(ci_pdata->phy)) {
+               if (PTR_ERR(ci_pdata->phy) == -EPROBE_DEFER)
+                       return -EPROBE_DEFER;
+
+               /* PHY is optional */
+               ci_pdata->phy = NULL;
+       }
+
+       return 0;
+}

You may also need to consider the usb_phy case.

Don't we try using the generic PHY framework for new drivers? Since there is no need for supporting a usb_phy case, I don't think we have to consider it yet. And not doing so could encourage people to add PHY drivers to the common PHY framework.

If the common PHY framework is the only right way in future, you don't need to change it.

+       if (dev->of_node) {
+               ret = ci_hdrc_usb2_dt_probe(dev, ci_pdata);
+               if (ret)
+                       return ret;
+       } else {
+               ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+               if (ret)
+                       return ret;
+       }

You may need to do clk_disable_unprepare for the above error cases.

Sure, I'll fix that.

+       ci_pdata->name = dev_name(&pdev->dev);
+
+       priv->ci_pdev = ci_hdrc_add_device(dev, pdev->resource,
+                                          pdev->num_resources, ci_pdata);
+       if (IS_ERR(priv->ci_pdev)) {
+               ret = PTR_ERR(priv->ci_pdev);
+               if (ret != -EPROBE_DEFER)
+                       dev_err(dev,
+                               "failed to register ci_hdrc platform device: %d\n",
+                               ret);

Why don't you want the error message for deferred probe?

A driver can return an EPROBE_DEFER error and still probe successfully later. It would be confusing to have this kind of error message in that case. And when a driver returns -EPROBE_DEFER, there is an error message already.

OK, agree.

Peter
Re: [PATCH/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
On Thu, Sep 11, 2014 at 04:56:52PM -0700, Joonwoo Park wrote:

When a deferrable work item (INIT_DEFERRABLE_WORK, etc.) is queued via queue_delayed_work(), it's probably intended to run the work on any CPU that isn't idle. However, we queue the work to run at a later time by starting a deferrable timer that binds to whatever CPU the work is queued on, which is effectively the same as queue_delayed_work_on(smp_processor_id()). As a result, WORK_CPU_UNBOUND work items aren't really cpu-unbound now.

In fact this is perfectly fine on a UP kernel, and it won't affect an SMP system without dyntick much either, as every cpu runs timers periodically. But on SMP systems with dyntick the current implementation makes deferrable timers not very scalable, because the timer base which queued the deferrable timer won't wake up until the next non-deferrable timer expires, even though there may be other non-idle cpus running which could run the expired deferrable timers. Deferrable work is a good example of the current implementation's victims:

  INIT_DEFERRABLE_WORK(dwork, fn);

  CPU 0                                    CPU 1
  queue_delayed_work(wq, &dwork, HZ);
    queue_delayed_work_on(WORK_CPU_UNBOUND);
      ...
        __mod_timer() -> queues timer to the
                         current cpu's timer base.
          ...
  tick_nohz_idle_enter() -> cpu enters idle.
  A second later:
  cpu 0 is now in idle;                    cpu 1 exits idle (or wasn't idle),
  cpu 0 won't wake up till the next        so it is active but won't handle the
  non-deferrable timer expires.            cpu-unbound deferrable timer, as it
                                           sits in cpu 0's timer base.

To make all cpu-unbound deferrable timers scalable, introduce a common timer base which is only for cpu-unbound deferrable timers, making them truly cpu-unbound so that they can be serviced by any non-idle cpu. This common timer base fixes the scalability issue of delayed work and of all other implementations using cpu-unbound deferrable timers.

cc: Thomas Gleixner t...@linutronix.de
CC: John Stultz john.stu...@linaro.org
CC: Tejun Heo t...@kernel.org
Signed-off-by: Joonwoo Park joonw...@codeaurora.org
---
 kernel/time/timer.c | 108 +++-
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index aca5dfe..655076b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -93,6 +93,9 @@ struct tvec_base {
 struct tvec_base boot_tvec_bases;
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
+#ifdef CONFIG_SMP
+static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
+#endif

 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
@@ -655,7 +658,14 @@ static inline void debug_assert_init(struct timer_list *timer)
 static void do_init_timer(struct timer_list *timer, unsigned int flags,
                          const char *name, struct lock_class_key *key)
 {
-       struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
+       struct tvec_base *base;
+
+#ifdef CONFIG_SMP
+       if (flags & TIMER_DEFERRABLE)
+               base = tvec_base_deferral;
+       else
+#endif
+               base = __raw_get_cpu_var(tvec_bases);

        timer->entry.next = NULL;
        timer->base = (void *)((unsigned long)base | flags);
@@ -777,26 +787,32 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

        debug_activate(timer, expires);

-       cpu = get_nohz_timer_target(pinned);
-       new_base = per_cpu(tvec_bases, cpu);
+#ifdef CONFIG_SMP
+       if (base != tvec_base_deferral) {
+#endif
+               cpu = get_nohz_timer_target(pinned);
+               new_base = per_cpu(tvec_bases, cpu);

-       if (base != new_base) {
-               /*
-                * We are trying to schedule the timer on the local CPU.
-                * However we can't change timer's base while it is running,
-                * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
-                */
-               if (likely(base->running_timer != timer)) {
-                       /* See the comment in lock_timer_base() */
-                       timer_set_base(timer, NULL);
-                       spin_unlock(&base->lock);
-                       base = new_base;
-                       spin_lock(&base->lock);
-                       timer_set_base(timer, base);
+               if (base != new_base) {
+                       /*
+                        * We are trying to schedule the timer on the local CPU.
+                        * However we can't change timer's base while it is
+                        * running, otherwise
Re: futex_wait_setup sleeping while atomic bug.
On Thu, Sep 11, 2014 at 04:53:38PM -0700, Davidlohr Bueso wrote: On Thu, 2014-09-11 at 23:52 +0200, Thomas Gleixner wrote:

From: Thomas Gleixner t...@linutronix.de
Date: Thu, 11 Sep 2014 23:44:35 +0200
Subject: futex: Unlock hb->lock in futex_wait_requeue_pi() error path

That's the second time we've been bitten by bugs when requeueing, now pi. We need to reconsider some of our testing tools to stress these paths better, imo.

We do, yes. Per the kselftest discussion at kernel summit, I agreed to move the futextest testsuite into the kernel, function into kselftest and performance into perf; then futextest can go away. From there we can look at how to improve these tests. Sadly, the best testing we seem to have is trinity - which does a fantastic job at finding nasties.

If someone wanted to start having a look at migrating the futextest tests over... I certainly wouldn't object to the help! ;-)

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

-- Darren Hart Intel Open Source Technology Center
Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 09/11/2014 04:28 PM, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote:

This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() commands. These commands can be used to register and unregister MPX-related resources on the x86 platform.

I can't see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel. From the app's perspective, it says "I'm using MPX, and here is where I put the root data structure." Without this, the kernel would have to do an (expensive) xsave operation every time it wanted to see if MPX was in use. This also makes the user/kernel interaction more explicit. We would be in a world of hurt if userspace were allowed to move the bounds directory around. With this interface, it's a bit more obvious that userspace can't just move it around willy-nilly.

The base of the bounds directory is set into mm_struct during PR_MPX_REGISTER command execution. This member can be used to check whether an application is MPX-enabled.

This changelog is completely useless.

Yeah, it's pretty bare-bones. Let me know if the explanation above makes sense, and we'll get it updated.

+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.

Groan. Why can't you put that cpuid check into that function right away instead of adding a worthless comment?

Sounds reasonable to me. We should just move the cpuid check into task_get_bounds_dir().

+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+       struct xsave_struct *xsave_buf;
+
+       fpu_xsave(&tsk->thread.fpu);
+       xsave_buf = &(tsk->thread.fpu.state->xsave);
+       if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+               return NULL;

Now this might be understandable with a proper comment. Right now it's a magic check for something incomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that this is a check for "is the enable flag in a hardware register set?" Yes, the registers have names only a mother could love, but that is what they're really called. I guess we could add some comments about why we need to do the xsave.

+int mpx_register(struct task_struct *tsk)
+{
+       struct mm_struct *mm = tsk->mm;
+
+       if (!cpu_has_mpx)
+               return -EINVAL;
+
+       /*
+        * runtime in the userspace will be responsible for allocation of
+        * the bounds directory. Then, it will save the base of the bounds
+        * directory into XSAVE/XRSTOR Save Area and enable MPX through
+        * XRSTOR instruction.
+        *
+        * fpu_xsave() is expected to be very expensive. In order to do
+        * performance optimization, here we get the base of the bounds
+        * directory and then save it into mm_struct to be used in future.
+        */

Ah. Now we get some information about what this might do. But that does not make any sense at all. So all it does is:

    tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;

or:

    tsk->mm.bd_addr = NULL;

So we use that information to check whether we need to tear down a VM_MPX-flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down a VM_MPX-flagged area. There's no guarantee that there will be one.

+       /*
+        * Check whether this vma comes from MPX-enabled application.
+        * If so, release this vma related bound tables.
+        */
+       if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+               mpx_unmap(mm, start, end);

You really must be kidding. The application maps that table and never calls that prctl, so do_unmap() will happily ignore it?

Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say "this mmap() is for a bounds directory", or we have some other interface that says "the mmap() for the bounds directory is at $foo". We could also record the bounds directory the first time that we catch userspace using it.

I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it.

The design to support this feature makes no sense at all to me. We have a special mmap interface, some magic kernel-side mapping functionality and then on top of it a prctl telling the kernel to ignore/respect it.

That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added.

All I have seen so far is the hint to read some Intel feature documentation, but no coherent explanation of how this patch set makes use of that very feature. The last patch in the series does not count as a coherent explanation. It merely documents parts of the implementation details which are required to make use of it but
Re: [PATCH 1/2] leds: trigger: gpio: fix warning in gpio trigger for gpios whose accessor function may sleep
On Tue, Sep 9, 2014 at 12:40 AM, Lothar Waßmann l...@karo-electronics.de wrote:

When using a GPIO driver whose accessor functions may sleep (e.g. an I2C GPIO extender like PCA9554) the following warning is issued:

    WARNING: CPU: 0 PID: 665 at drivers/gpio/gpiolib.c:2274 gpiod_get_raw_value+0x3c/0x48()
    Modules linked in:
    CPU: 0 PID: 665 Comm: kworker/0:2 Not tainted 3.16.0-karo+ #115
    Workqueue: events gpio_trig_work
    [c00142cc] (unwind_backtrace) from [c00118f8] (show_stack+0x10/0x14)
    [c00118f8] (show_stack) from [c001bf10] (warn_slowpath_common+0x64/0x84)
    [c001bf10] (warn_slowpath_common) from [c001bf4c] (warn_slowpath_null+0x1c/0x24)
    [c001bf4c] (warn_slowpath_null) from [c020a1b8] (gpiod_get_raw_value+0x3c/0x48)
    [c020a1b8] (gpiod_get_raw_value) from [c02f68a0] (gpio_trig_work+0x1c/0xb0)
    [c02f68a0] (gpio_trig_work) from [c0030c1c] (process_one_work+0x144/0x38c)
    [c0030c1c] (process_one_work) from [c0030ef8] (worker_thread+0x60/0x5cc)
    [c0030ef8] (worker_thread) from [c0036dd4] (kthread+0xb4/0xd0)
    [c0036dd4] (kthread) from [c000f0f0] (ret_from_fork+0x14/0x24)
    ---[ end trace cd51a1dad8b86c9c ]---

Fix this by using the _cansleep() variant of gpio_get_value().

Good catch, I will merge this.

Thanks,
-Bryan

Signed-off-by: Lothar Waßmann l...@karo-electronics.de
---
 drivers/leds/trigger/ledtrig-gpio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/trigger/ledtrig-gpio.c b/drivers/leds/trigger/ledtrig-gpio.c
index 35812e3..c86c418 100644
--- a/drivers/leds/trigger/ledtrig-gpio.c
+++ b/drivers/leds/trigger/ledtrig-gpio.c
@@ -48,7 +48,7 @@ static void gpio_trig_work(struct work_struct *work)
        if (!gpio_data->gpio)
                return;

-       tmp = gpio_get_value(gpio_data->gpio);
+       tmp = gpio_get_value_cansleep(gpio_data->gpio);

        if (gpio_data->inverted)
                tmp = !tmp;
--
1.7.10.4
Re: repeated bugs in new rtl wifi drivers
On 09/11/2014 06:50 PM, Kees Cook wrote: On Thu, Sep 11, 2014 at 4:38 PM, Larry Finger larry.fin...@lwfinger.net wrote: On 09/11/2014 05:27 PM, Kees Cook wrote:

Hi, I keep fixing this same bug that keeps showing up in the rtl wifi drivers. CL_PRINTF keeps getting redefined (incorrectly) instead of using a correctly fixed global. Is there a way to stop this from happening again? Here are the past three (identical) fixes I've landed:

    a3355a62673e2c4bd8617d2f07c8edee92a89b8d
    037526f1ae7eeff5cf27ad790ebfe30303eeebe8
    6437f51ec36af8ef1e3e2659439b35c37e5498e2

And the buildbot report below seems to show there are more to be made. :)

Sorry that I missed your fix. I should have seen it come through Linville's list. I will push your fix through again.

Well, I should clarify: it's not getting unfixed/reverted, but rather it hasn't been consolidated to avoid needing the code again in the future. All three of the fixes above are on almost identical header files. If we could consolidate them, that would be great, and it would keep things much nicer.

The two in staging are there because there was a need to have those two drivers available as quickly as possible. Even though a lot of the code is the same as in the regular tree, there were enough differences that it has taken a lot of work to merge those two new drivers into the wireless tree. It is part of that effort that ended up effectively reverting your fix for the wireless tree. My plan is to ultimately eliminate all the CL_SNPRINTF and CL_PRINTF stuff. In every case I have seen, the two are paired, thus they can be replaced with a simple pr_info. Now that we are unifying the Realtek and kernel codes, I need to check with the Realtek guys.

Larry
Re: [PATCH 2/2] leds: trigger: gpio: make ledtrig-gpio useable with GPIO drivers requiring threaded irqs
On Tue, Sep 9, 2014 at 12:40 AM, Lothar Waßmann l...@karo-electronics.de wrote:

When trying to use the LED GPIO trigger with e.g. the PCA953x GPIO driver, request_irq() fails with -EINVAL, because the GPIO driver requires a nested interrupt handler. Use request_any_context_irq() to be able to use any GPIO driver as LED trigger.

Hmmm, what about using request_threaded_irq() and putting gpio_trig_work() in as the thread_fn? Felipe, can you take a look at this?

Also in the first patch: actually in gpio_trig_irq() it says:

    /* just schedule_work since gpio_get_value can sleep */
    schedule_work(&gpio_data->work);

Then that means we need to call gpio_get_value_cansleep() in gpio_trig_work() instead of gpio_get_value(), right?

Thanks,
-Bryan

Signed-off-by: Lothar Waßmann l...@karo-electronics.de
---
 drivers/leds/trigger/ledtrig-gpio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/leds/trigger/ledtrig-gpio.c b/drivers/leds/trigger/ledtrig-gpio.c
index c86c418..b4168a7 100644
--- a/drivers/leds/trigger/ledtrig-gpio.c
+++ b/drivers/leds/trigger/ledtrig-gpio.c
@@ -161,10 +161,10 @@ static ssize_t gpio_trig_gpio_store(struct device *dev,
                return n;
        }

-       ret = request_irq(gpio_to_irq(gpio), gpio_trig_irq,
+       ret = request_any_context_irq(gpio_to_irq(gpio), gpio_trig_irq,
                        IRQF_SHARED | IRQF_TRIGGER_RISING
                        | IRQF_TRIGGER_FALLING, "ledtrig-gpio", led);
-       if (ret) {
+       if (ret < 0) {
                dev_err(dev, "request_irq failed with error %d\n", ret);
        } else {
                if (gpio_data->gpio != 0)
@@ -172,7 +172,7 @@ static ssize_t gpio_trig_gpio_store(struct device *dev,
                gpio_data->gpio = gpio;
        }

-       return ret ? ret : n;
+       return ret < 0 ? ret : n;
 }

 static DEVICE_ATTR(gpio, 0644, gpio_trig_gpio_show, gpio_trig_gpio_store);
--
1.7.10.4
Re: [PATCH v2 4/5] toshiba_acpi: Support new keyboard backlight type
On Wed, Sep 10, 2014 at 09:01:56PM -0600, Azael Avalos wrote: Hi Azael, Newer Toshiba models now come with a new (and different) keyboard backlight implementation with three modes of operation: TIMER, ON and OFF, and the LED is controlled internally by the firmware. This patch adds support for that type of backlight, changing the existing code to accommodate the new implementation. The timeout value range is now 1-60 seconds, and the accepted modes are now: 0x1 (FN-Z), 0x2 (AUTO or TIMER), 0x8 (ON) and 0x10 (OFF). This adds two new entries, keyboard_type and available_kbd_modes; the first shows the keyboard type and the latter shows the supported modes depending on the type. Signed-off-by: Azael Avalos coproscef...@gmail.com --- drivers/platform/x86/toshiba_acpi.c | 117 +++- 1 file changed, 102 insertions(+), 15 deletions(-) diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c index 4c8fa7b..08147c5 100644 --- a/drivers/platform/x86/toshiba_acpi.c +++ b/drivers/platform/x86/toshiba_acpi.c @@ -140,6 +140,10 @@ MODULE_LICENSE("GPL"); #define HCI_WIRELESS_BT_POWER 0x80 #define SCI_KBD_MODE_FNZ 0x1 #define SCI_KBD_MODE_AUTO 0x2 +#define SCI_KBD_MODE_ON 0x8 +#define SCI_KBD_MODE_OFF 0x10 +#define SCI_KBD_MODE_MAX SCI_KBD_MODE_OFF +#define SCI_KBD_TIME_MAX 0x3c001a struct toshiba_acpi_dev { struct acpi_device *acpi_dev; @@ -155,6 +159,7 @@ struct toshiba_acpi_dev { int force_fan; int last_key_event; int key_event_valid; + int kbd_type; int kbd_mode; int kbd_time; @@ -495,6 +500,42 @@ static enum led_brightness toshiba_illumination_get(struct led_classdev *cdev) } /* KBD Illumination */ +static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev) +{ + u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 }; + u32 out[HCI_WORDS]; + acpi_status status; + + if (!sci_open(dev)) + return 0; + + status = hci_raw(dev, in, out); + sci_close(dev); + if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) { + pr_err("ACPI call to query kbd illumination support failed\n"); + return 0; + } else if (out[0] == HCI_NOT_SUPPORTED) { + pr_info("Keyboard illumination not available\n"); + return 0; + } + + /* Check for keyboard backlight timeout max value, + /* previous kbd backlight implementation set this to [Extra "/*" ^] + * 0x3c0003, and now the new implementation set this + * to 0x3c001a, use this to distinguish between them + */ + if (out[3] == SCI_KBD_TIME_MAX) + dev->kbd_type = 2; + else + dev->kbd_type = 1; + /* Get the current keyboard backlight mode */ + dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK; + /* Get the current time (1-60 seconds) */ + dev->kbd_time = out[2] >> HCI_MISC_SHIFT; + + return 1; +} + static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 time) { u32 result; @@ -1268,20 +1309,46 @@ static ssize_t toshiba_kbd_bl_mode_store(struct device *dev, ret = kstrtoint(buf, 0, &mode); if (ret) return ret; - if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO) + if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO && + mode != SCI_KBD_MODE_ON && mode != SCI_KBD_MODE_OFF) return -EINVAL; Since you have to check for a type::mode match anyway, this initial test is redundant. I suggest inverting the type::mode match below and making it exhaustive, something like: + /* Check for supported modes depending on keyboard backlight type */ + if (toshiba->kbd_type == 1) { + /* Type 1 supports SCI_KBD_MODE_FNZ and SCI_KBD_MODE_AUTO */ + if (mode == SCI_KBD_MODE_ON || mode == SCI_KBD_MODE_OFF) if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO) The net number of tests is ultimately smaller and it's fewer lines of code.
+ return -EINVAL; + } else if (toshiba->kbd_type == 2) { + /* Type 2 doesn't support SCI_KBD_MODE_FNZ */ + if (mode == SCI_KBD_MODE_FNZ) + return -EINVAL; + } + /* Set the Keyboard Backlight Mode where: - * Mode - Auto (2) | FN-Z (1) * Auto - KBD backlight turns off automatically in given time * FN-Z - KBD backlight toggles when hotkey pressed + * ON - KBD backlight is always on + * OFF - KBD backlight is always off */ + + /* Only make a change if the actual mode has changed */ if (toshiba->kbd_mode != mode) { + /* Shift the time to base time (0x3c == 60 seconds) */ time = toshiba->kbd_time
Re: [PATCH] mtd: nand: gpmi: add proper raw access support
On Thu, Sep 11, 2014 at 04:45:36PM +0200, Boris BREZILLON wrote: On Thu, 11 Sep 2014 22:29:32 +0800 Huang Shijie shij...@gmail.com wrote: On Wed, Sep 10, 2014 at 10:55:39AM +0200, Boris BREZILLON wrote: +static int gpmi_ecc_read_page_raw(struct mtd_info *mtd, + struct nand_chip *chip, uint8_t *buf, + int oob_required, int page) +{ + struct gpmi_nand_data *this = chip->priv; + struct bch_geometry *nfc_geo = &this->bch_geometry; + int eccsize = nfc_geo->ecc_chunk_size; + int eccbytes = DIV_ROUND_UP(nfc_geo->ecc_strength * nfc_geo->gf_len, + 8); Actually, the ECC can be _NOT_ byte-aligned; you should not round up to bytes. You mean, on the NAND storage? That would be weird, but I'll check. Yes, it is weird. Thanks, Huang Shijie
Re: [PATCH] mtd: nand: gpmi: add proper raw access support
On Thu, Sep 11, 2014 at 04:38:47PM +0200, Boris BREZILLON wrote: Hi Huang, On Thu, 11 Sep 2014 22:25:13 +0800 Huang Shijie shij...@gmail.com wrote: Hi Boris, On Thu, Sep 11, 2014 at 02:36:16PM +0200, Boris BREZILLON wrote: Hi Huang, On Thu, 11 Sep 2014 20:09:30 +0800 Huang Shijie shij...@gmail.com wrote: On Wed, Sep 10, 2014 at 10:55:39AM +0200, Boris BREZILLON wrote: Several MTD users (either in user or kernel space) expect valid raw access support to NAND chip devices. This is particularly true for testing tools, which often touch the data stored in a NAND chip in raw mode to artificially generate errors. The GPMI drivers do not implement raw access functions, and thus rely on the default HW_ECC scheme implementation. The default implementation considers the data and OOB area as properly separated in their respective NAND sections, which is not true for the GPMI controller. In this driver/controller some OOB data are stored at the beginning of the NAND data area (these data are called metadata in the driver), then ECC bytes are interleaved with data chunks (which is similar to the HW_ECC_SYNDROME scheme), and eventually the remaining bytes are used as OOB data. Signed-off-by: Boris BREZILLON boris.brezil...@free-electrons.com --- Hello, This patch provides raw access support to the GPMI driver, which is particularly useful to run some tests on the NAND (the one coming to mind is the mtd_nandbiterrs testsuite). I know this rework might break several user space tools which rely on the default raw access implementation (I already experienced an issue with the kobs-ng tool provided by Freescale), but many other tools will now work as expected. If kobs-ng cannot work, it means nothing that other tools work. So I do not think we need to implement these hooks. Well, I don't know about Freescale-specific tools, but at least I have an example with the mtd_nandbiterrs module. The gpmi uses the hardware ECC for the bitflips.
I really do not know why the mtd_nandbiterrs is needed. IMHO, the mtd_nandbiterrs is useless for the gpmi. Because some folks would like to test their NAND controller/chip on their system. Just because you don't need it doesn't mean others won't; actually the reason I worked on these raw functions is because I needed to validate the ECC capabilities of the GPMI ECC controller. The BCH algorithm is confidential to Freescale. How can you validate the ECC capabilities? You cannot emulate the BCH to create the ECC data, even if you can fake some bitflips in the data chunk. This module assumes it can write only the data part of a NAND page without modifying the OOB area (see [1]), which in the GPMI controller's case is impossible, because the raw write function stores the data as if there were no specific scheme, while there is one: (metadata + n x (data_chunk + ECC bytes) + remaining_bytes). Moreover, IMHO, the nanddump and nandwrite tools (which can use raw access mode when passing the appropriate option) should always return the same kind of data no matter what NAND controller is in use on the system => (DATA + OOB_DATA), and this is definitely not the case with the GPMI driver. See how raw access on the HW_ECC_SYNDROME scheme is implemented in The gpmi uses the NAND_ECC_HW, not the NAND_ECC_HW_SYNDROME. Yes, I know. I pointed out the NAND_ECC_HW_SYNDROME scheme as an example to show you that NAND-controller-specific layout should be hidden from the MTD user. Even if you really want to support nanddump, I do not agree to add the write hook; it may crash the system. We can't have an asymmetric behaviour here: either we move both read and write raw functions or none. Moving only one of them would make the MTD user's work even more complicated. I really don't get your point here. What's really bothering you (BTW, I fixed kobs-ng to handle this new behaviour)? See the comment above.
Thanks, Huang Shijie
Re: [PATCH v8 00/10] Intel MPX support
On 09/11/2014 01:46 AM, Qiaowei Ren wrote: MPX kernel code, namely this patchset, has mainly the 2 responsibilities: provide handlers for bounds faults (#BR), and manage bounds memory. Qiaowei, We probably need to mention here what bounds memory is, and why it has to be managed, and who is responsible for the different pieces. Who allocates the memory? Who fills the memory? When is it freed? Thomas, do you have any other suggestions for things you'd like to see clarified?
Re: [PATCH/RFC v5 2/4] leds: implement sysfs interface locking mechanism
On Wed, Aug 20, 2014 at 6:41 AM, Jacek Anaszewski j.anaszew...@samsung.com wrote: Add a mechanism for locking the LED subsystem sysfs interface. This patch prepares the ground for the addition of the LED Flash Class extension, whose API will be integrated with the V4L2 Flash API. Such a fusion enforces introducing a locking scheme, which will secure consistent access to the LED Flash Class device. The mechanism being introduced allows for disabling the LED subsystem sysfs interface by calling the led_sysfs_lock function and enabling it by calling led_sysfs_unlock. The functions alter the LED_SYSFS_LOCK flag state and must be called under mutex lock. The state of the lock is checked with the led_sysfs_is_locked function. Such a design allows for providing immediate feedback to user space on whether the LED Flash Class device is available or is under V4L2 Flash sub-device control. Signed-off-by: Jacek Anaszewski j.anaszew...@samsung.com Acked-by: Kyungmin Park kyungmin.p...@samsung.com Cc: Bryan Wu coolo...@gmail.com Cc: Richard Purdie rpur...@rpsys.net --- drivers/leds/led-class.c | 23 --- drivers/leds/led-core.c | 18 ++ drivers/leds/led-triggers.c | 15 --- include/linux/leds.h | 32 4 files changed, 82 insertions(+), 6 deletions(-) diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c index 6f82a76..0bc0ba9 100644 --- a/drivers/leds/led-class.c +++ b/drivers/leds/led-class.c @@ -39,17 +39,31 @@ static ssize_t brightness_store(struct device *dev, { struct led_classdev *led_cdev = dev_get_drvdata(dev); unsigned long state; - ssize_t ret = -EINVAL; + ssize_t ret; + +#ifdef CONFIG_V4L2_FLASH_LED_CLASS Can we remove this #ifdef? The following code looks good for the common LED class. + mutex_lock(&led_cdev->led_lock); Can we choose a more meaningful name instead of led_lock here?
Then use led_sysfs_enable() instead of led_sysfs_unlock(), led_sysfs_disable() instead of led_sysfs_lock(), and led_sysfs_is_disabled() instead of led_sysfs_is_locked(). And the flag LED_SYSFS_LOCK -> LED_SYSFS_DISABLE. I was just confused by the names lock and unlock versus the mutex lock. The idea looks good to me. Thanks, -Bryan + + if (led_sysfs_is_locked(led_cdev)) { + ret = -EBUSY; + goto unlock; + } +#endif ret = kstrtoul(buf, 10, &state); if (ret) - return ret; + goto unlock; if (state == LED_OFF) led_trigger_remove(led_cdev); __led_set_brightness(led_cdev, state); - return size; + ret = size; +unlock: +#ifdef CONFIG_V4L2_FLASH_LED_CLASS + mutex_unlock(&led_cdev->led_lock); +#endif + return ret; } static DEVICE_ATTR_RW(brightness); @@ -215,6 +229,7 @@ int led_classdev_register(struct device *parent, struct led_classdev *led_cdev) #ifdef CONFIG_LEDS_TRIGGERS init_rwsem(&led_cdev->trigger_lock); #endif + mutex_init(&led_cdev->led_lock); /* add to the list of leds */ down_write(&leds_list_lock); list_add_tail(&led_cdev->node, &leds_list); @@ -266,6 +281,8 @@ void led_classdev_unregister(struct led_classdev *led_cdev) down_write(&leds_list_lock); list_del(&led_cdev->node); up_write(&leds_list_lock); + + mutex_destroy(&led_cdev->led_lock); } EXPORT_SYMBOL_GPL(led_classdev_unregister); diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c index 466ce5a..4649ea5 100644 --- a/drivers/leds/led-core.c +++ b/drivers/leds/led-core.c @@ -143,3 +143,21 @@ int led_update_brightness(struct led_classdev *led_cdev) return ret; } EXPORT_SYMBOL(led_update_brightness); + +/* Caller must ensure led_cdev->led_lock held */ +void led_sysfs_lock(struct led_classdev *led_cdev) +{ + lockdep_assert_held(&led_cdev->led_lock); + + led_cdev->flags |= LED_SYSFS_LOCK; +} +EXPORT_SYMBOL_GPL(led_sysfs_lock); + +/* Caller must ensure led_cdev->led_lock held */ +void led_sysfs_unlock(struct led_classdev *led_cdev) +{ + lockdep_assert_held(&led_cdev->led_lock); + + led_cdev->flags &= ~LED_SYSFS_LOCK; +}
+EXPORT_SYMBOL_GPL(led_sysfs_unlock); diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c index c3734f1..d391a5d 100644 --- a/drivers/leds/led-triggers.c +++ b/drivers/leds/led-triggers.c @@ -37,6 +37,11 @@ ssize_t led_trigger_store(struct device *dev, struct device_attribute *attr, char trigger_name[TRIG_NAME_MAX]; struct led_trigger *trig; size_t len; + int ret = count; + +#ifdef CONFIG_V4L2_FLASH_LED_CLASS + mutex_lock(&led_cdev->led_lock); +#endif trigger_name[sizeof(trigger_name) - 1] = '\0'; strncpy(trigger_name, buf, sizeof(trigger_name) - 1); @@ -47,7 +52,7 @@ ssize_t led_trigger_store(struct device *dev, struct
RE: [PATCH v4 9/9] usb: chipidea: add support to the generic PHY framework in ChipIdea
On Thu, Sep 11, 2014 at 08:54:47AM +0800, Peter Chen wrote: On Wed, Sep 03, 2014 at 09:40:40AM +0200, Antoine Tenart wrote: @@ -595,23 +639,27 @@ static int ci_hdrc_probe(struct platform_device *pdev) return -ENODEV; } - if (ci->platdata->usb_phy) + if (ci->platdata->phy) + ci->phy = ci->platdata->phy; + else if (ci->platdata->usb_phy) ci->usb_phy = ci->platdata->usb_phy; else - ci->usb_phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2); + ci->phy = devm_phy_get(dev, "usb-phy"); - if (IS_ERR(ci->usb_phy)) { - ret = PTR_ERR(ci->usb_phy); + if (IS_ERR(ci->phy) || (ci->phy == NULL && ci->usb_phy == NULL)) { /* * if -ENXIO is returned, it means the PHY layer wasn't * enabled, so it makes no sense to return -EPROBE_DEFER * in that case, since no PHY driver will ever probe. */ - if (ret == -ENXIO) - return ret; + if (PTR_ERR(ci->phy) == -ENXIO) + return -ENXIO; - dev_err(dev, "no usb2 phy configured\n"); - return -EPROBE_DEFER; + ci->usb_phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2); + if (IS_ERR(ci->usb_phy)) { + dev_err(dev, "no usb2 phy configured\n"); + return -EPROBE_DEFER; + } } Sorry, I can't accept this change; why is devm_usb_get_phy(dev, USB_PHY_TYPE_USB2) put in the error path? Since the current get-PHY operation is a little complicated, we may want a dedicated function to do it; the dwc3 driver is a good example. It's not the error path, it's the case when there is no PHY from the generic PHY framework available. Getting a USB PHY is a fallback solution. I agree we can move this to a dedicated function. But even if doing so, we'll have to test ci->phy first. Or do you have something else in mind? I still want devm_usb_get_phy(dev, USB_PHY_TYPE_USB2) to be called at the same place as the generic PHY, not in a later error path; in the error path we only handle errors. Peter
Re: [PATCH RFC 1/2] memcg: use percpu_counter for statistics
(2014/09/12 0:41), Vladimir Davydov wrote: In the next patch I need a quick way to get the value of MEM_CGROUP_STAT_RSS. The current procedure (mem_cgroup_read_stat) is slow (iterates over all cpus) and may sleep (uses get/put_online_cpus), so it's a no-go. This patch converts memory cgroup statistics to use percpu_counter so that percpu_counter_read will do the trick. Signed-off-by: Vladimir Davydov vdavy...@parallels.com I have no strong objections, but you need a performance comparison to go with this. I thought percpu_counter was messy to use for arrays. I can't understand why you started by fixing a future performance problem before merging the new feature. Thanks, -Kame --- mm/memcontrol.c | 217 ++- 1 file changed, 69 insertions(+), 148 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 085dc6d2f876..7e8d65e0608a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -136,9 +136,7 @@ enum mem_cgroup_events_target { #define SOFTLIMIT_EVENTS_TARGET 1024 #define NUMAINFO_EVENTS_TARGET 1024 -struct mem_cgroup_stat_cpu { - long count[MEM_CGROUP_STAT_NSTATS]; - unsigned long events[MEM_CGROUP_EVENTS_NSTATS]; +struct mem_cgroup_ratelimit_state { unsigned long nr_page_events; unsigned long targets[MEM_CGROUP_NTARGETS]; }; @@ -341,16 +339,10 @@ struct mem_cgroup { atomic_t moving_account; /* taken only while moving_account > 0 */ spinlock_t move_lock; - /* - * percpu counter. - */ - struct mem_cgroup_stat_cpu __percpu *stat; - /* - * used when a cpu is offlined or other synchronizations - * See mem_cgroup_read_stat().
- */ - struct mem_cgroup_stat_cpu nocpu_base; - spinlock_t pcp_counter_lock; + + struct percpu_counter stat[MEM_CGROUP_STAT_NSTATS]; + struct percpu_counter events[MEM_CGROUP_EVENTS_NSTATS]; + struct mem_cgroup_ratelimit_state __percpu *ratelimit; atomic_t dead_count; #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET) @@ -849,59 +841,16 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz) return mz; } -/* - * Implementation Note: reading percpu statistics for memcg. - * - * Both of vmstat[] and percpu_counter has threshold and do periodic - * synchronization to implement quick read. There are trade-off between - * reading cost and precision of value. Then, we may have a chance to implement - * a periodic synchronizion of counter in memcg's counter. - * - * But this _read() function is used for user interface now. The user accounts - * memory usage by memory cgroup and he _always_ requires exact value because - * he accounts memory. Even if we provide quick-and-fuzzy read, we always - * have to visit all online cpus and make sum. So, for now, unnecessary - * synchronization is not implemented. (just implemented for cpu hotplug) - * - * If there are kernel internal actions which can make use of some not-exact - * value, and reading all cpu value can be performance bottleneck in some - * common workload, threashold and synchonization as vmstat[] should be - * implemented.
- */ static long mem_cgroup_read_stat(struct mem_cgroup *memcg, enum mem_cgroup_stat_index idx) { - long val = 0; - int cpu; - - get_online_cpus(); - for_each_online_cpu(cpu) - val += per_cpu(memcg->stat->count[idx], cpu); -#ifdef CONFIG_HOTPLUG_CPU - spin_lock(&memcg->pcp_counter_lock); - val += memcg->nocpu_base.count[idx]; - spin_unlock(&memcg->pcp_counter_lock); -#endif - put_online_cpus(); - return val; + return percpu_counter_read(&memcg->stat[idx]); } static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg, enum mem_cgroup_events_index idx) { - unsigned long val = 0; - int cpu; - - get_online_cpus(); - for_each_online_cpu(cpu) - val += per_cpu(memcg->stat->events[idx], cpu); -#ifdef CONFIG_HOTPLUG_CPU - spin_lock(&memcg->pcp_counter_lock); - val += memcg->nocpu_base.events[idx]; - spin_unlock(&memcg->pcp_counter_lock); -#endif - put_online_cpus(); - return val; + return percpu_counter_read(&memcg->events[idx]); } static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, @@ -913,25 +862,21 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, * counted as CACHE even if it's on ANON LRU. */ if (PageAnon(page)) - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS], + percpu_counter_add(&memcg->stat[MEM_CGROUP_STAT_RSS], nr_pages); else - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE], +
Re: [PATCH v11 net-next 00/12] eBPF syscall, verifier, testsuite
On Thu, Sep 11, 2014 at 3:29 PM, Alexei Starovoitov a...@plumgrid.com wrote: On Thu, Sep 11, 2014 at 2:54 PM, Andy Lutomirski l...@amacapital.net wrote: the verifier log contains full trace. Last unsafe instruction + error in many cases is useless. What we found empirically from using it over last 2 years is that developers have different learning curve to adjust to 'safe' style of C. Pretty much everyone couldn't figure out why program is rejected based on last error. Therefore verifier emits full log. From the 1st insn all the way till the last 'unsafe' instruction. So the log is multiline output. 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt provides few trivial examples of these multiline messages. Like for the program: BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), the verifier log_buf is: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (b7) r1 = 0 4: (85) call 1 5: (15) if r0 == 0x0 goto pc+1 R0=map_ptr R10=fp 6: (7a) *(u64 *)(r0 +4) = 0 misaligned access off 4 size 8 It will surely change over time as verifier becomes smarter, supports new types, optimizations and so on. So this log is not an ABI. It's for humans to read. The log explains _how_ verifier came to conclusion that the program is unsafe. Given that you've already arranged (I think) for the verifier to be compilable in the kernel and in userspace, would it make more sense to have the kernel version just say yes or no and to make it easy for user code to retry verification in userspace if they want a full explanation? Good memory :) Long ago I had a hack where I compiled verifier.o for kernel and linked it with userspace wrappers to have the same verifier for userspace. It was very fragile. 
and maps were not separate objects and there were no fds. It's not feasible anymore, since different subsystems will configure different bpf_context and helper functions and verifier output is dynamic based on maps that were created. For example, if user's samples/bpf/sock_example.c does bpf_create_map(HASH, sizeof(key) * 2, ...); instead of bpf_create_map(HASH, sizeof(key), ...); the same program will be rejected in the first case and will be accepted in the second, because map sizes and ebpf program expectations are mismatching. Hmm. This actually furthers my thought that the relocations should be a real relocation table. Then you could encode the types of the referenced objects in the table, and a program could be verified without looking up the fds. The only extra step would be to confirm that the actual types referenced match those in the table. --Andy
Re: [PATCH RFC 2/2] memcg: add threshold for anon rss
(2014/09/12 0:41), Vladimir Davydov wrote: Though hard memory limits suit perfectly for sand-boxing, they are not that efficient when it comes to partitioning a server's resources among multiple containers. The point is a container consuming a particular amount of memory most of time may have infrequent spikes in the load. Setting the hard limit to the maximal possible usage (spike) will lower server utilization while setting it to the normal usage will result in heavy lags during the spikes. To handle such scenarios soft limits were introduced. The idea is to allow a container to breach the limit freely when there's enough free memory, but shrink it back to the limit aggressively on global memory pressure. However, the concept of soft limits is intrinsically unsafe by itself: if a container eats too much anonymous memory, it will be very slow or even impossible (if there's no swap) to reclaim its resources back to the limit. As a result the whole system will be feeling bad until it finally realizes the culprit must die. Currently we have no way to react to anonymous memory + swap usage growth inside a container: the memsw counter accounts both anonymous memory and file caches and swap, so we have neither a limit for anon+swap nor a threshold notification. Actually, memsw is totally useless if one wants to make full use of soft limits: it should be set to a very large value or infinity then, otherwise it just makes no sense. That's one of the reasons why I think we should replace memsw with a kind of anonsw so that it'd account only anon+swap. This way we'd still be able to sand-box apps, but it'd also allow us to avoid nasty surprises like the one I described above. For more arguments for and against this idea, please see the following thread: http://www.spinics.net/lists/linux-mm/msg78180.html There's an alternative to this approach backed by Kamezawa. He thinks that OOM on anon+swap limit hit is a no-go and proposes to use memory thresholds for it. 
I still strongly disagree with the proposal, because it's unsafe (what if the userspace handler doesn't react in time?). Nevertheless, I implement his idea in this RFC. I hope this will fuel the debate, because sadly enough nobody seems to care about this problem. So this patch adds the memory.rss file that shows the amount of anonymous memory consumed by a cgroup, and the event to handle threshold notifications coming from it. The notification works exactly in the same fashion as the existing memory/memsw usage notifications. So, now, you know you can handle thresholds. If you want to implement an automatic-oom-kill-all-in-a-container threshold in the kernel, I don't have any objections. What you want is not a limit, you want a trigger for killing processes. Threshold + kill is enough; using res_counter for that is overspec. You don't need res_counter and don't need to break other people's use cases. Thanks, -Kame
Re: [PATCH v11 net-next 00/12] eBPF syscall, verifier, testsuite
On Thu, Sep 11, 2014 at 6:17 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Sep 11, 2014 at 3:29 PM, Alexei Starovoitov a...@plumgrid.com wrote: On Thu, Sep 11, 2014 at 2:54 PM, Andy Lutomirski l...@amacapital.net wrote: the verifier log contains full trace. Last unsafe instruction + error in many cases is useless. What we found empirically from using it over last 2 years is that developers have different learning curve to adjust to 'safe' style of C. Pretty much everyone couldn't figure out why program is rejected based on last error. Therefore verifier emits full log. From the 1st insn all the way till the last 'unsafe' instruction. So the log is multiline output. 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt provides few trivial examples of these multiline messages. Like for the program: BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), the verifier log_buf is: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (b7) r1 = 0 4: (85) call 1 5: (15) if r0 == 0x0 goto pc+1 R0=map_ptr R10=fp 6: (7a) *(u64 *)(r0 +4) = 0 misaligned access off 4 size 8 It will surely change over time as verifier becomes smarter, supports new types, optimizations and so on. So this log is not an ABI. It's for humans to read. The log explains _how_ verifier came to conclusion that the program is unsafe. Given that you've already arranged (I think) for the verifier to be compilable in the kernel and in userspace, would it make more sense to have the kernel version just say yes or no and to make it easy for user code to retry verification in userspace if they want a full explanation? 
Good memory :) Long ago I had a hack where I compiled verifier.o for the kernel and linked it with userspace wrappers to have the same verifier for userspace. It was very fragile. and maps were not separate objects and there were no fds. It's not feasible anymore, since different subsystems will configure different bpf_context and helper functions and verifier output is dynamic based on maps that were created. For example, if user's samples/bpf/sock_example.c does bpf_create_map(HASH, sizeof(key) * 2, ...); instead of bpf_create_map(HASH, sizeof(key), ...); the same program will be rejected in the first case and will be accepted in the second, because map sizes and ebpf program expectations are mismatching. Hmm. This actually furthers my thought that the relocations should be a real relocation table. Then you could encode the types of the referenced objects in the table, and a program could be verified without looking up the fds. The only extra step would be to confirm that the actual types referenced match those in the table. It's not the type that is being checked, but one particular map instance with user-specified key/value sizes. The type is not helpful; the type is not even used during verification. Only the key_size and value_size of elements are meaningful, and they're looked up dynamically by fd.
[PATCH v2 1/2] mfd: rtsx: fix PM suspend for 5227
From: Micky Ching micky_ch...@realsil.com.cn

Fix rts5227 failing to send buffer commands after suspend: PM_CTRL3 should be reset before sending any buffer command after suspend. Otherwise, buffer commands will fail, which leads to resume failure.

Signed-off-by: Micky Ching micky_ch...@realsil.com.cn
---
 drivers/mfd/rts5227.c        | 19 +++
 include/linux/mfd/rtsx_pci.h | 12
 2 files changed, 31 insertions(+)

diff --git a/drivers/mfd/rts5227.c b/drivers/mfd/rts5227.c
index 9c8eec8..197f5c1 100644
--- a/drivers/mfd/rts5227.c
+++ b/drivers/mfd/rts5227.c
@@ -128,8 +128,27 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
 	return rtsx_pci_send_cmd(pcr, 100);
 }
 
+static int rts5227_pm_reset(struct rtsx_pcr *pcr)
+{
+	int err;
+
+	/* init aspm */
+	err = rtsx_pci_update_cfg_byte(pcr, LCTLR, 0xFC, 0);
+	if (err < 0)
+		return err;
+
+	/* reset PM_CTRL3 before send buffer cmd */
+	return rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x00);
+}
+
 static int rts5227_optimize_phy(struct rtsx_pcr *pcr)
 {
+	int err;
+
+	err = rts5227_pm_reset(pcr);
+	if (err < 0)
+		return err;
+
 	/* Optimize RX sensitivity */
 	return rtsx_pci_write_phy_register(pcr, 0x00, 0xBA42);
 }
diff --git a/include/linux/mfd/rtsx_pci.h b/include/linux/mfd/rtsx_pci.h
index 74346d5..b34fec8 100644
--- a/include/linux/mfd/rtsx_pci.h
+++ b/include/linux/mfd/rtsx_pci.h
@@ -967,4 +967,16 @@ static inline u8 *rtsx_pci_get_cmd_data(struct rtsx_pcr *pcr)
 	return (u8 *)(pcr->host_cmds_ptr);
 }
 
+static inline int rtsx_pci_update_cfg_byte(struct rtsx_pcr *pcr, int addr,
+		u8 mask, u8 append)
+{
+	int err;
+	u8 val;
+
+	err = pci_read_config_byte(pcr->pci, addr, &val);
+	if (err < 0)
+		return err;
+	return pci_write_config_byte(pcr->pci, addr, (val & mask) | append);
+}
+
 #endif
--
1.7.9.5
Re: [PATCH v4 2/4] Input: misc: Add haptic driver on max77693
Hello Dmitry Torokhov,

On 09/12/2014 02:10, Dmitry Torokhov wrote:
> On Thu, Sep 11, 2014 at 09:54:20PM +0900, Jaewon Kim wrote:
>> This patch adds the max77693-haptic device driver to support the haptic controller on the MAX77693. The MAX77693 is a multifunction device with PMIC, charger, LED, MUIC and haptic blocks, and this patch adds the haptic device driver for it. The driver supports an external PWM and an LRA (Linear Resonant Actuator) motor. Users can control the haptic device through the force feedback framework.
>>
>> Signed-off-by: Jaewon Kim jaewon02@samsung.com
>> Acked-by: Chanwoo Choi cw00.c...@samsung.com
>
> Acked-by: Dmitry Torokhov dmitry.torok...@gmail.com
>
> How do we want to merge this?

Thanks for the review. Please merge only this input device patch; the other patches will be merged by Lee Jones.

Thanks,
Jaewon Kim
Re: [PATCH v2 4/5] toshiba_acpi: Support new keyboard backlight type
Hi Darren,

2014-09-11 18:36 GMT-06:00 Darren Hart dvh...@infradead.org:
> On Wed, Sep 10, 2014 at 09:01:56PM -0600, Azael Avalos wrote:
>
> Hi Azael,
>
>> Newer Toshiba models now come with a new (and different) keyboard backlight implementation with three modes of operation: TIMER, ON and OFF, and the LED is controlled internally by the firmware.
>>
>> This patch adds support for that type of backlight, changing the existing code to accommodate the new implementation. The timeout value range is now 1-60 seconds, and the accepted modes are now: 1 (FN-Z), 2 (AUTO or TIMER), 8 (ON) and 10 (OFF). It also adds two new entries, keyboard_type and available_kbd_modes; the first shows the keyboard type and the latter shows the supported modes depending on the type.
>>
>> Signed-off-by: Azael Avalos coproscef...@gmail.com
>> ---
>>  drivers/platform/x86/toshiba_acpi.c | 117 +++++++++++++++++++++++++++++++----
>>  1 file changed, 102 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
>> index 4c8fa7b..08147c5 100644
>> --- a/drivers/platform/x86/toshiba_acpi.c
>> +++ b/drivers/platform/x86/toshiba_acpi.c
>> @@ -140,6 +140,10 @@ MODULE_LICENSE("GPL");
>>  #define HCI_WIRELESS_BT_POWER	0x80
>>  #define SCI_KBD_MODE_FNZ	0x1
>>  #define SCI_KBD_MODE_AUTO	0x2
>> +#define SCI_KBD_MODE_ON		0x8
>> +#define SCI_KBD_MODE_OFF	0x10
>> +#define SCI_KBD_MODE_MAX	SCI_KBD_MODE_OFF
>> +#define SCI_KBD_TIME_MAX	0x3c001a
>>  
>>  struct toshiba_acpi_dev {
>>  	struct acpi_device *acpi_dev;
>> @@ -155,6 +159,7 @@ struct toshiba_acpi_dev {
>>  	int force_fan;
>>  	int last_key_event;
>>  	int key_event_valid;
>> +	int kbd_type;
>>  	int kbd_mode;
>>  	int kbd_time;
>> @@ -495,6 +500,42 @@ static enum led_brightness toshiba_illumination_get(struct led_classdev *cdev)
>>  }
>>  
>>  /* KBD Illumination */
>> +static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev)
>> +{
>> +	u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 };
>> +	u32 out[HCI_WORDS];
>> +	acpi_status status;
>> +
>> +	if (!sci_open(dev))
>> +		return 0;
>> +
>> +	status = hci_raw(dev, in, out);
>> +	sci_close(dev);
>> +	if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) {
>> +		pr_err("ACPI call to query kbd illumination support failed\n");
>> +		return 0;
>> +	} else if (out[0] == HCI_NOT_SUPPORTED) {
>> +		pr_info("Keyboard illumination not available\n");
>> +		return 0;
>> +	}
>> +
>> +	/* Check for keyboard backlight timeout max value,
>> +	/* previous kbd backlight implementation set this to

Extra '/'
     ^

>> +	 * 0x3c0003, and now the new implementation set this
>> +	 * to 0x3c001a, use this to distinguish between them
>> +	 */
>> +	if (out[3] == SCI_KBD_TIME_MAX)
>> +		dev->kbd_type = 2;
>> +	else
>> +		dev->kbd_type = 1;
>> +	/* Get the current keyboard backlight mode */
>> +	dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK;
>> +	/* Get the current time (1-60 seconds) */
>> +	dev->kbd_time = out[2] >> HCI_MISC_SHIFT;
>> +
>> +	return 1;
>> +}
>> +
>>  static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 time)
>>  {
>>  	u32 result;
>> @@ -1268,20 +1309,46 @@ static ssize_t toshiba_kbd_bl_mode_store(struct device *dev,
>>  	ret = kstrtoint(buf, 0, &mode);
>>  	if (ret)
>>  		return ret;
>> -	if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)
>> +	if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO &&
>> +	    mode != SCI_KBD_MODE_ON && mode != SCI_KBD_MODE_OFF)
>>  		return -EINVAL;
>
> Since you have to check for a type::mode match anyway, this initial test is redundant. I suggest inverting the type::mode match below and making it exhaustive, something like:
>
>> +	/* Check for supported modes depending on keyboard backlight type */
>> +	if (toshiba->kbd_type == 1) {
>> +		/* Type 1 supports SCI_KBD_MODE_FNZ and SCI_KBD_MODE_AUTO */
>> +		if (mode == SCI_KBD_MODE_ON || mode == SCI_KBD_MODE_OFF)
>
> 		if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)
>
> The net number of tests is ultimately smaller and it's fewer lines of code.

Ok

>> +			return -EINVAL;
>> +	} else if (toshiba->kbd_type == 2) {
>> +		/* Type 2 doesn't support SCI_KBD_MODE_FNZ */
>> +		if (mode == SCI_KBD_MODE_FNZ)
>> +			return -EINVAL;
>> +	}
>> +
>>  	/* Set the Keyboard Backlight Mode where:
>> -	 *	Mode - Auto (2) | FN-Z (1)
>>  	 *	Auto - KBD backlight turns off automatically in given time
>>  	 *	FN-Z - KBD backlight toggles when hotkey pressed
>> +	 *	ON   - KBD backlight is always on
>> +	 *	OFF  - KBD backlight is always off
>>  	 */
>> +
>> +	/* Only make a change if the actual mode has changed */
>>  	if (toshiba->kbd_mode != mode) {
>> +		/* Shift the time to base time
Re: [PATCH] fsnotify: don't put user context if it was never assigned
On 09/11/2014 04:43 PM, Andrew Morton wrote:
> On Tue, 29 Jul 2014 09:25:14 -0400 Sasha Levin sasha.le...@oracle.com wrote:
>> On some failure paths we may attempt to free the user context even if it was never assigned. This will cause a NULL pointer dereference and a kernel BUG.
>
> Are you able to identify some failure paths? I spent some time grepping, but it's a pain. Please try to include such info in changelogs, because reviewers (ie, me) might want to review those callers to decide whether the bug lies elsewhere.

Sorry about that. The path I was looking at is in inotify_new_group():

	oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
	if (unlikely(!oevent)) {
		fsnotify_destroy_group(group);
		return ERR_PTR(-ENOMEM);
	}

fsnotify_destroy_group() would get called here, but group->inotify_data.user is only getting assigned later:

	group->inotify_data.user = get_current_user();

Thanks,
Sasha
RE: [RESEND][PATCH] userns: use marco instead of magic number for max userns level
> -----Original Message-----
> From: Aristeu Rozanski [mailto:a...@redhat.com]
> On Thu, Sep 11, 2014 at 05:51:31PM +0800, Chen Hanxiao wrote:
>> Use marco instead of magic number for max user namespace level.
>
> patch is ok, but you might want to do s/marco/macro/

Sorry for that typo. Do I need to resend it?

Thanks,
- Chen
Re: [PATCH/RESEND] tty: serial: msm: Add DT based earlycon support
On Thu, Sep 11, 2014 at 5:14 PM, Stephen Boyd sb...@codeaurora.org wrote:
> Add support for DT based early console on platforms with the msm serial hardware.
>
> Cc: Rob Herring r...@kernel.org
> Signed-off-by: Stephen Boyd sb...@codeaurora.org

One comment, but looks good to me.

Acked-by: Rob Herring r...@kernel.org

> +static int __init
> +msm_serial_early_console_setup(struct earlycon_device *device, const char *opt)
> +{
> +	if (!device->port.membase)
> +		return -ENODEV;
> +
> +	device->con->write = msm_serial_early_write;
> +	return 0;
> +}
> +OF_EARLYCON_DECLARE(msm_serial, "qcom,msm-uart",
> +		    msm_serial_early_console_setup);

Don't you want to support the kernel command line as well? Then if you can't change the DT or the bootloader's command line, you can add it into the kernel build with the appended command line. Don't forget to document it in kernel-parameters.txt if you do.

Rob
Re: [PATCH V2] ASoC: fsl_ssi: refine ipg clock usage in this module
On Thu, Sep 11, 2014 at 03:57:37PM -0700, Nicolin Chen wrote:
> On Thu, Sep 11, 2014 at 01:38:29PM +0800, Shengjiu Wang wrote:
>> Move the ipg clock enable and disable operations to startup and shutdown, so the ipg clock is only enabled while the SSI is working and stays disabled while the SSI is idle. As a consequence, _fsl_ssi_set_dai_fmt() needs to be called in probe, so add ipg clock control for it.
>
> There seems to be no objection so far against my last suggestion to use regmap's mmio_clk() for the named ipg clk only. So you may still consider that.

I think mmio_clk() can be put in another patch; this patch is only for the clk_enable() and clk_disable() operations.

> Anyway, I'd like to do things in parallel. So I simply tested it on my side and it works fine, though it may still need to be tested by others.
>
> Nicolin

Hi Markus,

Could you please review it and share your comments?

Wang Shengjiu
[PATCH] mmc: rtsx: add card power off during probe
From: Roger Tseng rogera...@realtek.com

Some platforms have both a UEFI driver and the MFD/mmc driver. If we enter Linux with a card in the slot, the card power is already on, and the rtsx-mmc driver has no chance to power the card off. This causes UHS-I cards to fail to enter UHS-I mode. Since it is hard to control the state the UEFI driver leaves behind, power off the card during probe.

Signed-off-by: Roger Tseng rogera...@realtek.com
Signed-off-by: Micky Ching micky_ch...@realsil.com.cn
---
 drivers/mmc/host/rtsx_pci_sdmmc.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c b/drivers/mmc/host/rtsx_pci_sdmmc.c
index dfde4a2..57b0796 100644
--- a/drivers/mmc/host/rtsx_pci_sdmmc.c
+++ b/drivers/mmc/host/rtsx_pci_sdmmc.c
@@ -1341,8 +1341,13 @@ static int rtsx_pci_sdmmc_drv_probe(struct platform_device *pdev)
 	host->pcr = pcr;
 	host->mmc = mmc;
 	host->pdev = pdev;
-	host->power_state = SDMMC_POWER_OFF;
 	INIT_WORK(&host->work, sd_request);
+	sd_power_off(host);
+	/*
+	 * ref: SD spec 3.01: 6.4.1.2 Power On or Power Cycle
+	 */
+	usleep_range(1000, 2000);
+
 	platform_set_drvdata(pdev, host);
 	pcr->slots[RTSX_SD_CARD].p_dev = pdev;
 	pcr->slots[RTSX_SD_CARD].card_event = rtsx_pci_sdmmc_card_event;
-- 
1.7.9.5
[Patch Part3 V5 3/8] iommu/vt-d: Implement DMAR unit hotplug framework
On Intel platforms, an IO Hub (PCI/PCIe host bridge) may contain DMAR units, so we need to support DMAR hotplug when supporting PCI host bridge hotplug on Intel platforms. According to Section 8.8 "Remapping Hardware Unit Hot Plug" in the Intel Virtualization Technology for Directed I/O Architecture Specification Rev 2.2, the ACPI BIOS should implement an ACPI _DSM method under the ACPI object for the PCI host bridge to support DMAR hotplug.

This patch introduces interfaces to parse the ACPI _DSM method for DMAR unit hotplug. It also implements state machines for DMAR unit hot-addition and hot-removal. The PCI host bridge hotplug driver should call dmar_hotplug_hotplug() before scanning the connected PCI devices on hot-addition and after destroying all PCI devices on hot-removal.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c                | 268 ++++++++++++++++++++++++++++++++--
 drivers/iommu/intel-iommu.c         |  78 +++++++++-
 drivers/iommu/intel_irq_remapping.c |   5 +
 include/linux/dmar.h                |  33 +++++
 4 files changed, 370 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index b3405c50627f..e77b5d3f2f5c 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -75,7 +75,7 @@ static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
-static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
+static void dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
 	/*
 	 * add INCLUDE_ALL at the tail, so scan the list will find it at
@@ -336,24 +336,45 @@ static struct notifier_block dmar_pci_bus_nb = {
 	.priority = INT_MIN,
 };
 
+static struct dmar_drhd_unit *
+dmar_find_dmaru(struct acpi_dmar_hardware_unit *drhd)
+{
+	struct dmar_drhd_unit *dmaru;
+
+	list_for_each_entry_rcu(dmaru, &dmar_drhd_units, list)
+		if (dmaru->segment == drhd->segment &&
+		    dmaru->reg_base_addr == drhd->address)
+			return dmaru;
+
+	return NULL;
+}
+
 /**
  * dmar_parse_one_drhd - parses exactly one DMA remapping hardware definition
  * structure which uniquely represent one DMA remapping hardware unit
  * present in the platform
  */
-static int __init
-dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct dmar_drhd_unit *dmaru;
 	int ret = 0;
 
 	drhd = (struct acpi_dmar_hardware_unit *)header;
-	dmaru = kzalloc(sizeof(*dmaru), GFP_KERNEL);
+	dmaru = dmar_find_dmaru(drhd);
+	if (dmaru)
+		goto out;
+
+	dmaru = kzalloc(sizeof(*dmaru) + header->length, GFP_KERNEL);
 	if (!dmaru)
 		return -ENOMEM;
 
-	dmaru->hdr = header;
+	/*
+	 * If header is allocated from slab by ACPI _DSM method, we need to
+	 * copy the content because the memory buffer will be freed on return.
+	 */
+	dmaru->hdr = (void *)(dmaru + 1);
+	memcpy(dmaru->hdr, header, header->length);
 	dmaru->reg_base_addr = drhd->address;
 	dmaru->segment = drhd->segment;
 	dmaru->include_all = drhd->flags & 0x1; /* BIT0: INCLUDE_ALL */
@@ -374,6 +395,7 @@ dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
 	}
 
 	dmar_register_drhd_unit(dmaru);
+out:
 	if (arg)
 		(*(int *)arg)++;
 
@@ -411,8 +433,7 @@ static int __init dmar_parse_one_andd(struct acpi_dmar_header *header,
 }
 
 #ifdef CONFIG_ACPI_NUMA
-static int __init
-dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
 {
 	struct acpi_dmar_rhsa *rhsa;
 	struct dmar_drhd_unit *drhd;
@@ -805,14 +826,22 @@ dmar_validate_one_drhd(struct acpi_dmar_header *entry, void *arg)
 		return -EINVAL;
 	}
 
-	addr = early_ioremap(drhd->address, VTD_PAGE_SIZE);
+	if (arg)
+		addr = ioremap(drhd->address, VTD_PAGE_SIZE);
+	else
+		addr = early_ioremap(drhd->address, VTD_PAGE_SIZE);
 	if (!addr) {
 		pr_warn("IOMMU: can't validate: %llx\n", drhd->address);
 		return -EINVAL;
 	}
+
 	cap = dmar_readq(addr + DMAR_CAP_REG);
 	ecap = dmar_readq(addr + DMAR_ECAP_REG);
-	early_iounmap(addr, VTD_PAGE_SIZE);
+
+	if (arg)
+		iounmap(addr);
+	else
+		early_iounmap(addr, VTD_PAGE_SIZE);
 
 	if (cap == (uint64_t)-1 && ecap == (uint64_t)-1) {
 		warn_invalid_dmar(drhd->address, " returns all ones");
@@ -1686,12 +1715,17 @@ int __init dmar_ir_support(void)
 	return dmar->flags & 0x1;
 }
 
+/* Check whether DMAR units are in use */
+static inline bool dmar_in_use(void)
+{
+
[Patch Part3 V5 0/8] Enable support of Intel DMAR device hotplug
When hot-plugging a discrete IOH or a physical processor with an embedded IIO, we need to handle the DMAR (or IOMMU) unit in the PCIe host bridge if DMAR is in use. This patch set enhances the current DMAR/IOMMU/IR drivers to support hotplug and is based on the latest Linus master branch. All prerequisite patches to support DMAR device hotplug have been merged into the mainstream kernel, and this is the last patch set needed to enable DMAR device hotplug.

You may access the patch set at:
https://github.com/jiangliu/linux.git iommu/hotplug_v5

This patch set has been tested on an Intel development machine. Any comments and tests are appreciated.

Patches 1-4 enhance the DMAR framework to support hotplug
Patch 5 enhances the Intel interrupt remapping driver to support hotplug
Patch 6 enhances error handling in the Intel IR driver
Patch 7 enhances the Intel IOMMU driver to support hotplug
Patch 8 enhances the ACPI pci_root driver to handle DMAR units

Jiang Liu (8):
  iommu/vt-d: Introduce helper function dmar_walk_resources()
  iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
  iommu/vt-d: Implement DMAR unit hotplug framework
  iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
  iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit hotplug
  iommu/vt-d: Enhance error recovery in function intel_enable_irq_remapping()
  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
  pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug

 drivers/acpi/pci_root.c             |  16 +-
 drivers/iommu/dmar.c                | 532 ++++++++++++++++++++++++++++-------
 drivers/iommu/intel-iommu.c         | 297 ++++++++++++++++----
 drivers/iommu/intel_irq_remapping.c | 233 ++++++++++------
 include/linux/dmar.h                |  50 +++-
 5 files changed, 888 insertions(+), 240 deletions(-)

-- 
1.7.10.4
[Patch Part3 V5 4/8] iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
According to the Intel VT-d specification, the _DSM method to support DMAR hotplug should exist directly under the corresponding ACPI object representing the PCI host bridge. But some BIOSes don't conform to this, so also search for the _DSM method in the subtree starting from the ACPI object representing the PCI host bridge.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index e77b5d3f2f5c..df2c2591c1a6 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1926,21 +1926,48 @@ static int dmar_hotplug_remove(acpi_handle handle)
 	return ret;
 }
 
-static int dmar_device_hotplug(acpi_handle handle, bool insert)
+static acpi_status dmar_get_dsm_handle(acpi_handle handle, u32 lvl,
+				       void *context, void **retval)
+{
+	acpi_handle *phdl = retval;
+
+	if (dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD)) {
+		*phdl = handle;
+		return AE_CTRL_TERMINATE;
+	}
+
+	return AE_OK;
+}
+
+int dmar_device_hotplug(acpi_handle handle, bool insert)
 {
 	int ret;
+	acpi_handle tmp = NULL;
+	acpi_status status;
 
 	if (!dmar_in_use())
 		return 0;
 
-	if (!dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD))
+	if (dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD)) {
+		tmp = handle;
+	} else {
+		status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
+					     ACPI_UINT32_MAX,
+					     dmar_get_dsm_handle,
+					     NULL, NULL, &tmp);
+		if (ACPI_FAILURE(status)) {
+			pr_warn("Failed to locate _DSM method.\n");
+			return -ENXIO;
+		}
+	}
+	if (tmp == NULL)
 		return 0;
 
 	down_write(&dmar_global_lock);
 	if (insert)
-		ret = dmar_hotplug_insert(handle);
+		ret = dmar_hotplug_insert(tmp);
 	else
-		ret = dmar_hotplug_remove(handle);
+		ret = dmar_hotplug_remove(tmp);
 	up_write(&dmar_global_lock);
 
 	return ret;
-- 
1.7.10.4
[Patch Part3 V5 6/8] iommu/vt-d: Enhance error recovery in function intel_enable_irq_remapping()
Enhance error recovery in function intel_enable_irq_remapping() by tearing down all created data structures.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel_irq_remapping.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 7cf31a29f77a..81f110aae6df 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -701,9 +701,11 @@ static int __init intel_enable_irq_remapping(void)
 	return eim ? IRQ_REMAP_X2APIC_MODE : IRQ_REMAP_XAPIC_MODE;
 
 error:
-	/*
-	 * handle error condition gracefully here!
-	 */
+	for_each_iommu(iommu, drhd)
+		if (ecap_ir_support(iommu->ecap)) {
+			iommu_disable_irq_remapping(iommu);
+			intel_teardown_irq_remapping(iommu);
+		}
 
 	if (x2apic_present)
 		pr_warn("Failed to enable irq remapping. You are vulnerable to irq-injection attacks.\n");
-- 
1.7.10.4
[Patch Part3 V5 7/8] iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
Implement the required callback functions for the intel-iommu driver to support DMAR unit hotplug.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel-iommu.c | 206 +++++++++++++++++++++++++++-----------
 1 file changed, 151 insertions(+), 55 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 70d9d47eaeda..c2d369524960 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1125,8 +1125,11 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 	unsigned long flags;
 
 	root = (struct root_entry *)alloc_pgtable_page(iommu->node);
-	if (!root)
+	if (!root) {
+		pr_err("IOMMU: allocating root entry for %s failed\n",
+		       iommu->name);
 		return -ENOMEM;
+	}
 
 	__iommu_flush_cache(iommu, root, ROOT_SIZE);
@@ -1466,7 +1469,7 @@ static int iommu_init_domains(struct intel_iommu *iommu)
 	return 0;
 }
 
-static void free_dmar_iommu(struct intel_iommu *iommu)
+static void disable_dmar_iommu(struct intel_iommu *iommu)
 {
 	struct dmar_domain *domain;
 	int i;
@@ -1490,11 +1493,16 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 
 	if (iommu->gcmd & DMA_GCMD_TE)
 		iommu_disable_translation(iommu);
+}
 
-	kfree(iommu->domains);
-	kfree(iommu->domain_ids);
-	iommu->domains = NULL;
-	iommu->domain_ids = NULL;
+static void free_dmar_iommu(struct intel_iommu *iommu)
+{
+	if ((iommu->domains) && (iommu->domain_ids)) {
+		kfree(iommu->domains);
+		kfree(iommu->domain_ids);
+		iommu->domains = NULL;
+		iommu->domain_ids = NULL;
+	}
 
 	g_iommus[iommu->seq_id] = NULL;
@@ -2701,6 +2709,41 @@ static int __init iommu_prepare_static_identity_mapping(int hw)
 	return 0;
 }
 
+static void intel_iommu_init_qi(struct intel_iommu *iommu)
+{
+	/*
+	 * Start from the sane iommu hardware state.
+	 * If the queued invalidation is already initialized by us
+	 * (for example, while enabling interrupt-remapping) then
+	 * we got the things already rolling from a sane state.
+	 */
+	if (!iommu->qi) {
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(-1, iommu);
+		/*
+		 * Disable queued invalidation if supported and already enabled
+		 * before OS handover.
+		 */
+		dmar_disable_qi(iommu);
+	}
+
+	if (dmar_enable_qi(iommu)) {
+		/*
+		 * Queued Invalidate not enabled, use Register Based Invalidate
+		 */
+		iommu->flush.flush_context = __iommu_flush_context;
+		iommu->flush.flush_iotlb = __iommu_flush_iotlb;
+		pr_info("IOMMU: %s using Register based invalidation\n",
+			iommu->name);
+	} else {
+		iommu->flush.flush_context = qi_flush_context;
+		iommu->flush.flush_iotlb = qi_flush_iotlb;
+		pr_info("IOMMU: %s using Queued invalidation\n", iommu->name);
+	}
+}
+
 static int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -2729,6 +2772,10 @@ static int __init init_dmars(void)
 			DMAR_UNITS_SUPPORTED);
 	}
 
+	/* Preallocate enough resources for IOMMU hot-addition */
+	if (g_num_of_iommus < DMAR_UNITS_SUPPORTED)
+		g_num_of_iommus = DMAR_UNITS_SUPPORTED;
+
 	g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
 			GFP_KERNEL);
 	if (!g_iommus) {
@@ -2757,58 +2804,14 @@ static int __init init_dmars(void)
 		 * among all IOMMU's. Need to Split it later.
 		 */
 		ret = iommu_alloc_root_entry(iommu);
-		if (ret) {
-			printk(KERN_ERR "IOMMU: allocate root entry failed\n");
+		if (ret)
 			goto free_iommu;
-		}
 		if (!ecap_pass_through(iommu->ecap))
 			hw_pass_through = 0;
 	}
 
-	/*
-	 * Start from the sane iommu hardware state.
-	 */
-	for_each_active_iommu(iommu, drhd) {
-		/*
-		 * If the queued invalidation is already initialized by us
-		 * (for example, while enabling interrupt-remapping) then
-		 * we got the things already rolling from a sane state.
-		 */
-		if (iommu->qi)
-			continue;
-
-		/*
-		 * Clear any previous faults.
-		 */
-		dmar_fault(-1, iommu);
-		/*
-		 * Disable queued invalidation if supported and already enabled
-		 * before OS handover.
-		 */
-		dmar_disable_qi(iommu);
-	}
-
-
[Patch Part3 V5 8/8] pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug
Finally, enhance the pci_root driver to support DMAR device hotplug when hot-plugging PCI host bridges.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/acpi/pci_root.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e6ae603ed1a1..4e177daa18e3 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/pci-acpi.h>
 #include <linux/pci-aspm.h>
+#include <linux/dmar.h>
 #include <linux/acpi.h>
 #include <linux/slab.h>
 #include <acpi/apei.h>	/* for acpi_hest_init() */
@@ -511,6 +512,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	struct acpi_pci_root *root;
 	acpi_handle handle = device->handle;
 	int no_aspm = 0, clear_aspm = 0;
+	bool hotadd = system_state != SYSTEM_BOOTING;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -557,6 +559,11 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
 	device->driver_data = root;
 
+	if (hotadd && dmar_device_add(handle)) {
+		result = -ENXIO;
+		goto end;
+	}
+
 	pr_info(PREFIX "%s [%s] (domain %04x %pR)\n",
 	       acpi_device_name(device), acpi_device_bid(device),
 	       root->segment, &root->secondary);
@@ -583,7 +590,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 			root->segment, (unsigned int)root->secondary.start);
 		device->driver_data = NULL;
 		result = -ENODEV;
-		goto end;
+		goto remove_dmar;
 	}
 
 	if (clear_aspm) {
@@ -597,7 +604,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
 
-	if (system_state != SYSTEM_BOOTING) {
+	if (hotadd) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
 	}
@@ -607,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	pci_unlock_rescan_remove();
 	return 1;
 
+remove_dmar:
+	if (hotadd)
+		dmar_device_remove(handle);
 end:
 	kfree(root);
 	return result;
@@ -625,6 +635,8 @@ static void acpi_pci_root_remove(struct acpi_device *device)
 
 	pci_remove_root_bus(root->bus);
 
+	dmar_device_remove(device->handle);
+
 	pci_unlock_rescan_remove();
 
 	kfree(root);
-- 
1.7.10.4
[Patch Part3 V5 5/8] iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit hotplug
Implement the required callback functions for the intel_irq_remapping driver to support DMAR unit hotplug.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel_irq_remapping.c | 222 ++++++++++++++++++++++--------
 1 file changed, 169 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9b140ed854ec..7cf31a29f77a 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -36,7 +36,6 @@ struct hpet_scope {
 
 static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
 static struct hpet_scope ir_hpet[MAX_HPET_TBS];
-static int ir_ioapic_num, ir_hpet_num;
 
 /*
  * Lock ordering:
@@ -325,7 +324,7 @@ static int set_ioapic_sid(struct irte *irte, int apic)
 
 	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_IO_APICS; i++) {
-		if (ir_ioapic[i].id == apic) {
+		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
 			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
 			break;
 		}
@@ -352,7 +351,7 @@ static int set_hpet_sid(struct irte *irte, u8 id)
 
 	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_HPET_TBS; i++) {
-		if (ir_hpet[i].id == id) {
+		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
 			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
 			break;
 		}
@@ -474,17 +473,17 @@ static void iommu_set_irq_remapping(struct intel_iommu *iommu, int mode)
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-
-static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
+static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 {
 	struct ir_table *ir_table;
 	struct page *pages;
 	unsigned long *bitmap;
 
-	ir_table = iommu->ir_table = kzalloc(sizeof(struct ir_table),
-					     GFP_ATOMIC);
+	if (iommu->ir_table)
+		return 0;
 
-	if (!iommu->ir_table)
+	ir_table = kzalloc(sizeof(struct ir_table), GFP_ATOMIC);
+	if (!ir_table)
 		return -ENOMEM;
 
 	pages = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
@@ -493,7 +492,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
 	if (!pages) {
 		pr_err("IR%d: failed to allocate pages of order %d\n",
 		       iommu->seq_id, INTR_REMAP_PAGE_ORDER);
-		kfree(iommu->ir_table);
+		kfree(ir_table);
 		return -ENOMEM;
 	}
 
@@ -508,11 +507,22 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
+	iommu->ir_table = ir_table;
 
-	iommu_set_irq_remapping(iommu, mode);
 	return 0;
 }
 
+static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
+{
+	if (iommu && iommu->ir_table) {
+		free_pages((unsigned long)iommu->ir_table->base,
+			   INTR_REMAP_PAGE_ORDER);
+		kfree(iommu->ir_table->bitmap);
+		kfree(iommu->ir_table);
+		iommu->ir_table = NULL;
+	}
+}
+
 /*
  * Disable Interrupt Remapping.
  */
@@ -667,9 +677,10 @@ static int __init intel_enable_irq_remapping(void)
 		if (!ecap_ir_support(iommu->ecap))
 			continue;
 
-		if (intel_setup_irq_remapping(iommu, eim))
+		if (intel_setup_irq_remapping(iommu))
 			goto error;
 
+		iommu_set_irq_remapping(iommu, eim);
 		setup = 1;
 	}
 
@@ -700,12 +711,13 @@ error:
 	return -1;
 }
 
-static void ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
-				    struct intel_iommu *iommu)
+static int ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
+				   struct intel_iommu *iommu,
+				   struct acpi_dmar_hardware_unit *drhd)
 {
 	struct acpi_dmar_pci_path *path;
 	u8 bus;
-	int count;
+	int count, free = -1;
 
 	bus = scope->bus;
 	path = (struct acpi_dmar_pci_path *)(scope + 1);
@@ -721,19 +733,36 @@ static void ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
 					   PCI_SECONDARY_BUS);
 		path++;
 	}
-	ir_hpet[ir_hpet_num].bus = bus;
-	ir_hpet[ir_hpet_num].devfn = PCI_DEVFN(path->device, path->function);
-	ir_hpet[ir_hpet_num].iommu = iommu;
-	ir_hpet[ir_hpet_num].id = scope->enumeration_id;
-	ir_hpet_num++;
+
+	for (count = 0; count < MAX_HPET_TBS; count++) {
+		if (ir_hpet[count].iommu == iommu &&
+		    ir_hpet[count].id == scope->enumeration_id)
+			return 0;
+
[PATCH] perf tools: define _DEFAULT_SOURCE for glibc_2.20
_BSD_SOURCE has been deprecated in favour of _DEFAULT_SOURCE since glibc 2.20 [1]. To avoid a build warning on glibc 2.20, _DEFAULT_SOURCE should also be defined.

[1]: https://sourceware.org/glibc/wiki/Release/2.20

Signed-off-by: Chanho Park chanho61.p...@samsung.com
---
 tools/perf/util/util.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 6686436..333d9d1 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -39,6 +39,8 @@
 
 #define _ALL_SOURCE 1
 #define _BSD_SOURCE 1
+/* glibc 2.20 deprecates _BSD_SOURCE in favour of _DEFAULT_SOURCE */
+#define _DEFAULT_SOURCE 1
 #define HAS_BOOL
 
 #include <unistd.h>
-- 
1.9.1
[Patch Part3 V5 2/8] iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
Introduce functions to support dynamic IOMMU seq_id allocation and release, which will be used to support DMAR hotplug. Also rename IOMMU_UNITS_SUPPORTED to DMAR_UNITS_SUPPORTED.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c        | 40 ++++++++++++++++++++++++++++++----------
 drivers/iommu/intel-iommu.c | 13 +++----------
 include/linux/dmar.h        |  6 ++++++
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index afd46eb9a5de..b3405c50627f 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -70,6 +70,7 @@ LIST_HEAD(dmar_drhd_units);
 struct acpi_table_header * __initdata dmar_tbl;
 static acpi_size dmar_tbl_size;
 static int dmar_dev_scope_status = 1;
+static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
@@ -928,11 +929,32 @@ out:
 	return err;
 }
 
+static int dmar_alloc_seq_id(struct intel_iommu *iommu)
+{
+	iommu->seq_id = find_first_zero_bit(dmar_seq_ids,
+					    DMAR_UNITS_SUPPORTED);
+	if (iommu->seq_id >= DMAR_UNITS_SUPPORTED) {
+		iommu->seq_id = -1;
+	} else {
+		set_bit(iommu->seq_id, dmar_seq_ids);
+		sprintf(iommu->name, "dmar%d", iommu->seq_id);
+	}
+
+	return iommu->seq_id;
+}
+
+static void dmar_free_seq_id(struct intel_iommu *iommu)
+{
+	if (iommu->seq_id >= 0) {
+		clear_bit(iommu->seq_id, dmar_seq_ids);
+		iommu->seq_id = -1;
+	}
+}
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
 	struct intel_iommu *iommu;
 	u32 ver, sts;
-	static int iommu_allocated = 0;
 	int agaw = 0;
 	int msagaw = 0;
 	int err;
@@ -946,13 +968,16 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	if (!iommu)
 		return -ENOMEM;
 
-	iommu->seq_id = iommu_allocated++;
-	sprintf(iommu->name, "dmar%d", iommu->seq_id);
+	if (dmar_alloc_seq_id(iommu) < 0) {
+		pr_err("IOMMU: failed to allocate seq_id\n");
+		err = -ENOSPC;
+		goto error;
+	}
 
 	err = map_iommu(iommu, drhd->reg_base_addr);
 	if (err) {
 		pr_err("IOMMU: failed to map %s\n", iommu->name);
-		goto error;
+		goto error_free_seq_id;
 	}
 
 	err = -EINVAL;
@@ -1002,9 +1027,11 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
 	return 0;
 
- err_unmap:
+err_unmap:
 	unmap_iommu(iommu);
- error:
+error_free_seq_id:
+	dmar_free_seq_id(iommu);
+error:
 	kfree(iommu);
 	return err;
 }
@@ -1028,6 +1055,7 @@ static void free_iommu(struct intel_iommu *iommu)
 	if (iommu->reg)
 		unmap_iommu(iommu);
 
+	dmar_free_seq_id(iommu);
 	kfree(iommu);
 }
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4af2206e41bc..7daa74ed46d0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -328,17 +328,10 @@ static int hw_pass_through = 1;
 /* si_domain contains mulitple devices */
 #define DOMAIN_FLAG_STATIC_IDENTITY	(1 << 1)
 
-/* define the limit of IOMMUs supported in each domain */
-#ifdef CONFIG_X86
-# define IOMMU_UNITS_SUPPORTED	MAX_IO_APICS
-#else
-# define IOMMU_UNITS_SUPPORTED	64
-#endif
-
 struct dmar_domain {
 	int	id;			/* domain id */
 	int	nid;			/* node id */
-	DECLARE_BITMAP(iommu_bmp, IOMMU_UNITS_SUPPORTED);
+	DECLARE_BITMAP(iommu_bmp, DMAR_UNITS_SUPPORTED);
 					/* bitmap of iommus this domain uses*/
 
 	struct list_head devices;	/* all devices' list */
@@ -2728,12 +2721,12 @@ static int __init init_dmars(void)
 		 * threaded kernel __init code path all other access are read
 		 * only
 		 */
-		if (g_num_of_iommus < IOMMU_UNITS_SUPPORTED) {
+		if (g_num_of_iommus < DMAR_UNITS_SUPPORTED) {
 			g_num_of_iommus++;
 			continue;
 		}
 		printk_once(KERN_ERR "intel-iommu: exceeded %d IOMMUs\n",
-			    IOMMU_UNITS_SUPPORTED);
+			    DMAR_UNITS_SUPPORTED);
 	}
 
 	g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index fac8ca34f9a8..c8a576bc3a98 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -30,6 +30,12 @@
 
 struct acpi_dmar_header;
 
+#ifdef CONFIG_X86
+# define DMAR_UNITS_SUPPORTED	MAX_IO_APICS
+#else
+# define DMAR_UNITS_SUPPORTED	64
+#endif
+
 /* DMAR Flags */
 #define DMAR_INTR_REMAP		0x1
 #define DMAR_X2APIC_OPT_OUT	0x2
-- 
1.7.10.4
from this
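The allocator this patch introduces is a classic find-first-zero-bit ID allocator over a static bitmap, replacing a monotonically increasing counter so that IDs freed on hot-remove can be reused on hot-add. A minimal userspace sketch of the same idea (UNITS_SUPPORTED and the function names are illustrative, not the kernel's):

```c
#include <limits.h>

/* Hypothetical stand-in for DMAR_UNITS_SUPPORTED. */
#define UNITS_SUPPORTED 64
#define LBITS (sizeof(unsigned long) * CHAR_BIT)

static unsigned long seq_ids[(UNITS_SUPPORTED + LBITS - 1) / LBITS];

/* Find the first zero bit, set it, and return its index; -1 when exhausted. */
static int alloc_seq_id(void)
{
    for (int i = 0; i < UNITS_SUPPORTED; i++) {
        unsigned long *w = &seq_ids[i / LBITS];
        unsigned long m = 1UL << (i % LBITS);
        if (!(*w & m)) {
            *w |= m;
            return i;
        }
    }
    return -1;
}

/* Release an ID so a later hot-add can reuse it. */
static void free_seq_id(int id)
{
    if (id >= 0)
        seq_ids[id / LBITS] &= ~(1UL << (id % LBITS));
}
```

Unlike the old `iommu_allocated++` counter, freeing ID 0 here makes 0 the next ID handed out, which is what DMAR hotplug needs.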
[Patch Part3 V5 1/8] iommu/vt-d: Introduce helper function dmar_walk_resources()
Introduce helper function dmar_walk_resources to walk resource entries in DMAR table and ACPI buffer object returned by ACPI _DSM method for IOMMU hot-plug. Signed-off-by: Jiang Liu jiang@linux.intel.com --- drivers/iommu/dmar.c| 209 +++ drivers/iommu/intel-iommu.c |4 +- include/linux/dmar.h| 19 ++-- 3 files changed, 122 insertions(+), 110 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 60ab474bfff3..afd46eb9a5de 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -44,6 +44,14 @@ #include irq_remapping.h +typedef int (*dmar_res_handler_t)(struct acpi_dmar_header *, void *); +struct dmar_res_callback { + dmar_res_handler_t cb[ACPI_DMAR_TYPE_RESERVED]; + void*arg[ACPI_DMAR_TYPE_RESERVED]; + boolignore_unhandled; + boolprint_entry; +}; + /* * Assumptions: * 1) The hotplug framework guarentees that DMAR unit will be hot-added @@ -333,7 +341,7 @@ static struct notifier_block dmar_pci_bus_nb = { * present in the platform */ static int __init -dmar_parse_one_drhd(struct acpi_dmar_header *header) +dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg) { struct acpi_dmar_hardware_unit *drhd; struct dmar_drhd_unit *dmaru; @@ -364,6 +372,10 @@ dmar_parse_one_drhd(struct acpi_dmar_header *header) return ret; } dmar_register_drhd_unit(dmaru); + + if (arg) + (*(int *)arg)++; + return 0; } @@ -376,7 +388,8 @@ static void dmar_free_drhd(struct dmar_drhd_unit *dmaru) kfree(dmaru); } -static int __init dmar_parse_one_andd(struct acpi_dmar_header *header) +static int __init dmar_parse_one_andd(struct acpi_dmar_header *header, + void *arg) { struct acpi_dmar_andd *andd = (void *)header; @@ -398,7 +411,7 @@ static int __init dmar_parse_one_andd(struct acpi_dmar_header *header) #ifdef CONFIG_ACPI_NUMA static int __init -dmar_parse_one_rhsa(struct acpi_dmar_header *header) +dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg) { struct acpi_dmar_rhsa *rhsa; struct dmar_drhd_unit *drhd; @@ -425,6 +438,8 @@ 
dmar_parse_one_rhsa(struct acpi_dmar_header *header) return 0; } +#else +#definedmar_parse_one_rhsa dmar_res_noop #endif static void __init @@ -486,6 +501,52 @@ static int __init dmar_table_detect(void) return (ACPI_SUCCESS(status) ? 1 : 0); } +static int dmar_walk_resources(struct acpi_dmar_header *start, size_t len, + struct dmar_res_callback *cb) +{ + int ret = 0; + struct acpi_dmar_header *iter, *next; + struct acpi_dmar_header *end = ((void *)start) + len; + + for (iter = start; iter end ret == 0; iter = next) { + next = (void *)iter + iter-length; + if (iter-length == 0) { + /* Avoid looping forever on bad ACPI tables */ + pr_debug(FW_BUG Invalid 0-length structure\n); + break; + } else if (next end) { + /* Avoid passing table end */ + pr_warn(FW_BUG record passes table end\n); + ret = -EINVAL; + break; + } + + if (cb-print_entry) + dmar_table_print_dmar_entry(iter); + + if (iter-type = ACPI_DMAR_TYPE_RESERVED) { + /* continue for forward compatibility */ + pr_debug(Unknown DMAR structure type %d\n, +iter-type); + } else if (cb-cb[iter-type]) { + ret = cb-cb[iter-type](iter, cb-arg[iter-type]); + } else if (!cb-ignore_unhandled) { + pr_warn(No handler for DMAR structure type %d\n, + iter-type); + ret = -EINVAL; + } + } + + return ret; +} + +static inline int dmar_walk_dmar_table(struct acpi_table_dmar *dmar, + struct dmar_res_callback *cb) +{ + return dmar_walk_resources((struct acpi_dmar_header *)(dmar + 1), + dmar-header.length - sizeof(*dmar), cb); +} + /** * parse_dmar_table - parses the DMA reporting table */ @@ -493,9 +554,18 @@ static int __init parse_dmar_table(void) { struct acpi_table_dmar *dmar; - struct acpi_dmar_header *entry_header; int ret = 0; int drhd_count = 0; + struct dmar_res_callback cb = { + .print_entry = true, + .ignore_unhandled = true, + .arg[ACPI_DMAR_TYPE_HARDWARE_UNIT] = drhd_count, + .cb[ACPI_DMAR_TYPE_HARDWARE_UNIT] = dmar_parse_one_drhd, +
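At its core, dmar_walk_resources() is a bounds-checked walk over length-prefixed records, guarding against zero-length entries (infinite loop) and entries that overrun the table. A hedged userspace sketch of that pattern, with a hypothetical record header standing in for the ACPI DMAR subtable header:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header: type + total length, as in ACPI subtables. */
struct rec_hdr {
    uint16_t type;
    uint16_t length; /* includes the header itself */
};

typedef int (*rec_handler_t)(const struct rec_hdr *h, void *arg);

/* Walk length-prefixed records in [start, start+len). Returns the first
 * nonzero handler result, -1 on a malformed table, 0 on success. */
static int walk_records(const void *start, size_t len,
                        rec_handler_t cb, void *arg)
{
    const uint8_t *p = start, *end = p + len;

    while (p < end) {
        const struct rec_hdr *h = (const void *)p;
        if ((size_t)(end - p) < sizeof(*h) || h->length == 0)
            return -1;            /* would loop forever on a bad table */
        if (p + h->length > end)
            return -1;            /* record runs past the table end */
        int ret = cb(h, arg);
        if (ret)
            return ret;
        p += h->length;
    }
    return 0;
}

/* Example handler: count records, as the DRHD parser does via its arg. */
static int count_records(const struct rec_hdr *h, void *arg)
{
    (void)h;
    (*(int *)arg)++;
    return 0;
}
```

The kernel version adds a per-type callback table on top of this loop, but the two FW_BUG checks are exactly the guards shown here.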
[PATCH v2 0/2] mfd: rtsx: fix PM suspend for 5227 & 5249
From: Micky Ching micky_ch...@realsil.com.cn v2: use (err < 0) consistently to check whether a function failed, instead of mixing if (err) and if (err < 0). This patch fixes an rts5227 and rts5249 suspend issue: when the card reader resumes from the suspend state, its power state must be reset before sending any buffer command. The original code did not reset the PM state first, so resume failed and nothing further could be done. Micky Ching (2): mfd: rtsx: fix PM suspend for 5227 mfd: rtsx: fix PM suspend for 5249 drivers/mfd/rts5227.c| 19 +++ drivers/mfd/rts5249.c| 17 + include/linux/mfd/rtsx_pci.h | 12 3 files changed, 48 insertions(+) -- 1.7.9.5
[PATCH v2 2/2] mfd: rtsx: fix PM suspend for 5249
From: Micky Ching micky_ch...@realsil.com.cn Fix rts5249 failing to send buffer commands after suspend: PM_CTRL3 must be reset before sending any buffer command after suspend. Otherwise, the buffer command fails, which makes resume fail. Signed-off-by: Micky Ching micky_ch...@realsil.com.cn --- drivers/mfd/rts5249.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/mfd/rts5249.c b/drivers/mfd/rts5249.c index 573de7b..5dd7dc0 100644 --- a/drivers/mfd/rts5249.c +++ b/drivers/mfd/rts5249.c @@ -126,10 +126,27 @@ static int rts5249_extra_init_hw(struct rtsx_pcr *pcr) return rtsx_pci_send_cmd(pcr, 100); } +static int rts5249_pm_reset(struct rtsx_pcr *pcr) +{ + int err; + + /* init aspm */ + err = rtsx_pci_update_cfg_byte(pcr, LCTLR, 0xFC, 0); + if (err < 0) + return err; + + /* reset PM_CTRL3 before send buffer cmd */ + return rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x00); +} + static int rts5249_optimize_phy(struct rtsx_pcr *pcr) { int err; + err = rts5249_pm_reset(pcr); + if (err < 0) + return err; + err = rtsx_pci_write_phy_register(pcr, PHY_REG_REV, PHY_REG_REV_RESV | PHY_REG_REV_RXIDLE_LATCHED | PHY_REG_REV_P1_EN | PHY_REG_REV_RXIDLE_EN | -- 1.7.9.5
RE: [PATCH 4/4 v3] GPIO: gpio-dwapb: Suspend & Resume PM enabling
On Tue, 9 Sep 2014, Weike Chen wrote: struct dwapb_gpio; +struct dwapb_context; struct dwapb_gpio_port { struct bgpio_chip bgc; bool is_registered; struct dwapb_gpio *gpio; + struct dwapb_context *ctx; Alvin, Will this build if CONFIG_PM_SLEEP is not defined? Actually, PM_SLEEP is always set to 'y' in 'kernel/power/Kconfig'. But when I manually changed it to 'n', this module still compiled correctly. You may be concerned about 'ctx', and you can see that 'ctx' is only ever accessed under CONFIG_PM_SLEEP. Alan
Re: perf top -g -U --sort=symbol --children == lalalalala?
On Thu, 2014-09-11 at 18:30 -0300, Arnaldo Carvalho de Melo wrote: Also, looking at the changelog entries and at tools/perf/Documentation/ the only description for --children, the default, is: --children:: Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new Children column and will be sorted on the data. It requires callchains are recorded. grep of course found that, and git log found more, but nothing told me what the heck it's sweeping up that's so darn plentiful in my box that there's more than 100% of it laying about :) I think that a longer/clearer entry in the 'perf record' man page is required. Perhaps the description got lost in a --cover-letter for the patch series implementing it? If it ever existed, I can't find it. A little blurb would be helpful. -Mike
Re: [PATCH] slab: implement kmalloc guard
On Mon, 8 Sep 2014, Christoph Lameter wrote: On Mon, 8 Sep 2014, Mikulas Patocka wrote: I don't know what you mean. If someone allocates N objects with sizes from 1 to N, you can't have N slab caches - you can't have a slab cache for each used size. Also - you can't create a slab cache in interrupt context. Oh you can create them up front on bootup. And I think only the small sizes matter. Allocations >= 8K are pushed to the page allocator anyways. Only for SLUB. For SLAB, large allocations still use slab caches, up to 4M. But anyway - having 8K preallocated slab caches is too much. If you want to integrate this patch into the slab/slub subsystem, a better solution would be to store the exact size requested by kmalloc along with the slab/slub object itself (before the preceding redzone). But it would result in duplicating the work - you'd have to repeat the logic in this patch three times - once for slab, once for slub and once for kmalloc_large/kmalloc_large_node. I don't know if it would be better than this patch. We already have a redzone structure to check for writes over the end of the object. Lets use that. So, change all three slab subsystems to use that. SLOB has no debugging features and I think that was intentional. We are trying to unify the debug checks etc. Some work on that would be appreciated. I think the kmalloc creation is already in slab_common.c Mikulas
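The redzone idea discussed here - record the requested size next to the object and poison the slack up to the rounded-up class size, then verify the poison on free - can be sketched in userspace C as follows. This is a toy model, not the SLAB/SLUB implementation; CLASS_SIZE, the header layout, and the function names are assumptions for illustration:

```c
#include <stdlib.h>
#include <string.h>

#define CLASS_SIZE 64   /* pretend every object comes from a 64-byte class */
#define POISON 0x5a

struct guard_hdr { size_t requested; };

/* Allocate a CLASS_SIZE object, remember the requested size (must be
 * <= CLASS_SIZE here), and poison the slack past the requested end. */
static void *guarded_alloc(size_t size)
{
    unsigned char *p = malloc(sizeof(struct guard_hdr) + CLASS_SIZE);
    if (!p)
        return NULL;
    ((struct guard_hdr *)p)->requested = size;
    memset(p + sizeof(struct guard_hdr) + size, POISON, CLASS_SIZE - size);
    return p + sizeof(struct guard_hdr);
}

/* Returns 0 if the slack is intact, -1 if the caller wrote past the
 * *requested* end - the overflow a plain class-size redzone would miss. */
static int guarded_check_and_free(void *obj)
{
    unsigned char *p = (unsigned char *)obj - sizeof(struct guard_hdr);
    size_t req = ((struct guard_hdr *)p)->requested;
    int ok = 0;
    for (size_t i = req; i < CLASS_SIZE; i++)
        if (((unsigned char *)obj)[i] != POISON)
            ok = -1;
    free(p);
    return ok;
}
```

The point of the thread is where this header lives: in its own debug layer (the patch) or folded into the existing redzone machinery of each allocator (Christoph's preference).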
Re: Deadlock in vtime_account_user() vs itself across a page fault
On Thu, Sep 11, 2014 at 11:54:34PM +0100, David Howells wrote: Whilst trying to use docker, I'm occasionally seeing the attached deadlock in user time accounting, with a page fault in the middle. The relevant lines from the pre-fault bits of stack: [8106d954] ? cpuacct_account_field+0x65/0x9a (gdb) i li *0x8106d954 Line 272 of ../kernel/sched/cpuacct.c kcpustat-cpustat[index] += val; [81060d41] account_user_time+0x62/0x95 (gdb) i li *0x81060d41 Line 151 of ../kernel/sched/cputime.c acct_account_cputime(p); [81061254] vtime_account_user+0x62/0x8d (gdb) i li *0x81061254 Line 264 of ../include/linux/seqlock.h in write_seqcount_end(): seqcount_release(s-dep_map, 1, _RET_IP_); I can't see any particular reason there should be a page fault occurring, except that there's a duff kernel pointer, but I don't get to find out because the page fault handling doesn't get that far:-/ David --- = [ INFO: possible recursive locking detected ] 3.17.0-rc4-fsdevel+ #706 Tainted: GW - NetworkManager/2305 is trying to acquire lock: (((p-vtime_seqlock)-lock)-rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d but task is already holding lock: (((p-vtime_seqlock)-lock)-rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(((p-vtime_seqlock)-lock)-rlock); lock(((p-vtime_seqlock)-lock)-rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by NetworkManager/2305: #0: (((p-vtime_seqlock)-lock)-rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d #1: ((p-vtime_seqlock)-seqcount){-.}, at: [810df2f9] context_tracking_user_exit+0x54/0xb7 #2: (rcu_read_lock){..}, at: [8106d8ef] cpuacct_account_field+0x0/0x9a stack backtrace: CPU: 0 PID: 2305 Comm: NetworkManager Tainted: GW 3.17.0-rc4-fsdevel+ #706 Hardware name: /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006 8800389bfbe0 815063fd 8235c880 8800389bfcc0 810717f5 8800389bfcd0 81071a90 8106d85d 0001 
81061200 Call Trace: [815063fd] dump_stack+0x4d/0x66 [810717f5] __lock_acquire+0x7d7/0x1a2a [81071a90] ? __lock_acquire+0xa72/0x1a2a [8106d85d] ? cpuacct_css_alloc+0x93/0x93 [81061200] ? vtime_account_user+0xe/0x8d [81071a90] ? __lock_acquire+0xa72/0x1a2a [810730fc] lock_acquire+0x8b/0x101 [810730fc] ? lock_acquire+0x8b/0x101 [8106120d] ? vtime_account_user+0x1b/0x8d [8150bc4b] _raw_spin_lock+0x2b/0x3a [8106120d] ? vtime_account_user+0x1b/0x8d [8106120d] vtime_account_user+0x1b/0x8d [810df2f9] context_tracking_user_exit+0x54/0xb7 [81030682] do_page_fault+0x3a/0x54 [8150e462] page_fault+0x22/0x30 [8106d954] ? cpuacct_account_field+0x65/0x9a vmalloc'ed areas can fault due to lazy mapping. That would be an excellent candidate here because cpuacct_account_field() accesses per cpu stats that are allocated with alloc_percpu() which uses...vmalloc(). vmalloc() faults have always been a PITA. Especially with per cpu allocation, basically it means that the kernel can fault about anywhere. So the only solution I see right now is to move task_group_account_field() outside the lock. It doesn't need it, but that means I need to split up account_user_time() and have less common code between tickless and tick time accounting. In the hope that the other accounting code (acct, group accounting, ...) doesn't access more percpu allocated stuffs. Ah, I could also have a recursion detection in the vtime_account_*() functions. Yeah that would be much safer. The recursive call could simply ignore and let the first caller do the accounting. But that means we could account exception time into user time. We could also do both and let the recursive call warn. BTW I should check if I can turn the seqlock into a seqcount, not that it would fix anything here though. It looks like it's only ever updated locally. 
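The recursion-detection idea floated above - let the inner, re-entrant call bail out and leave the accounting to the outer caller - can be illustrated with a simple flag. This is a single-threaded toy model; the kernel would need a per-task flag, and as noted it trades correctness of attribution (exception time landing in user time) for deadlock safety:

```c
/* Set while the accounting path is running; a fault taken inside it
 * re-enters account_time() and must not take the (already held) lock. */
static int in_vtime_account;
static int accounted;

static void account_time(int fault_depth)
{
    if (in_vtime_account) {
        /* Recursive entry: skip, the outer call owns the update. */
        return;
    }
    in_vtime_account = 1;
    if (fault_depth > 0)
        account_time(fault_depth - 1); /* stands in for a page fault re-entering us */
    accounted++;
    in_vtime_account = 0;
}
```

With the guard, a fault in the middle of accounting becomes a no-op instead of a self-deadlock on the vtime seqlock.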
Re: [PATCH V2] ASoC: fsl_ssi: refine ipg clock usage in this module
Shengjiu Wang wrote: + ret = clk_prepare_enable(ssi_private->clk); + if (ret) + return ret; Will this work on PowerPC, where ssi_private->clk is always NULL?
[PATCH net-next v2] r8152: support VLAN
Support hw VLAN for tx and rx. And enable them by default. Signed-off-by: Hayes Wang hayesw...@realtek.com --- drivers/net/usb/r8152.c | 79 - 1 file changed, 65 insertions(+), 14 deletions(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 2130c75..9403219 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -506,6 +506,7 @@ struct rx_desc { #define IPF(1 23) /* IP checksum fail */ #define UDPF (1 22) /* UDP checksum fail */ #define TCPF (1 21) /* TCP checksum fail */ +#define RX_VLAN_TAG(1 16) __le32 opts4; __le32 opts5; @@ -531,6 +532,7 @@ struct tx_desc { #define MSS_MAX0x7ffU #define TCPHO_SHIFT17 #define TCPHO_MAX 0x7ffU +#define TX_VLAN_TAG(1 16) }; struct r8152; @@ -1423,6 +1425,25 @@ static int msdn_giant_send_check(struct sk_buff *skb) return ret; } +static inline void rtl_tx_vlan_tag(struct tx_desc *desc, struct sk_buff *skb) +{ + if (vlan_tx_tag_present(skb)) { + u32 opts2; + + opts2 = TX_VLAN_TAG | swab16(vlan_tx_tag_get(skb)); + desc-opts2 |= cpu_to_le32(opts2); + } +} + +static inline void rtl_rx_vlan_tag(struct rx_desc *desc, struct sk_buff *skb) +{ + u32 opts2 = le32_to_cpu(desc-opts2); + + if (opts2 RX_VLAN_TAG) + __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), + swab16(opts2 0x)); +} + static int r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb, u32 len, u32 transport_offset) { @@ -1550,6 +1571,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg) continue; } + rtl_tx_vlan_tag(tx_desc, skb); + tx_data += sizeof(*tx_desc); len = skb-len; @@ -1691,6 +1714,7 @@ static void rx_bottom(struct r8152 *tp) memcpy(skb-data, rx_data, pkt_len); skb_put(skb, pkt_len); skb-protocol = eth_type_trans(skb, netdev); + rtl_rx_vlan_tag(rx_desc, skb); netif_receive_skb(skb); stats-rx_packets++; stats-rx_bytes += pkt_len; @@ -2082,6 +2106,34 @@ static void r8152_power_cut_en(struct r8152 *tp, bool enable) ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data); } +static void 
rtl_rx_vlan_en(struct r8152 *tp, bool enable) +{ + u32 ocp_data; + + ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR); + if (enable) + ocp_data |= CPCR_RX_VLAN; + else + ocp_data &= ~CPCR_RX_VLAN; + ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data); +} + +static int rtl8152_set_features(struct net_device *dev, + netdev_features_t features) +{ + netdev_features_t changed = features ^ dev->features; + struct r8152 *tp = netdev_priv(dev); + + if (changed & NETIF_F_HW_VLAN_CTAG_RX) { + if (features & NETIF_F_HW_VLAN_CTAG_RX) + rtl_rx_vlan_en(tp, true); + else + rtl_rx_vlan_en(tp, false); + } + + return 0; +} + #define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST) static u32 __rtl_get_wol(struct r8152 *tp) @@ -2330,9 +2382,7 @@ static void r8152b_exit_oob(struct r8152 *tp) ocp_write_dword(tp, MCU_TYPE_USB, USB_TX_DMA, TEST_MODE_DISABLE | TX_SIZE_ADJUST1); - ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR); - ocp_data &= ~CPCR_RX_VLAN; - ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data); + rtl_rx_vlan_en(tp, tp->netdev->features & NETIF_F_HW_VLAN_CTAG_RX); ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8152_RMS); @@ -2376,9 +2426,7 @@ static void r8152b_enter_oob(struct r8152 *tp) ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8152_RMS); - ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR); - ocp_data |= CPCR_RX_VLAN; - ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data); + rtl_rx_vlan_en(tp, true); ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PAL_BDC_CR); ocp_data |= ALDPS_PROXY_MODE; @@ -2532,9 +2580,7 @@ static void r8153_first_init(struct r8152 *tp) usleep_range(1000, 2000); } - ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR); - ocp_data &= ~CPCR_RX_VLAN; - ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data); + rtl_rx_vlan_en(tp, tp->netdev->features & NETIF_F_HW_VLAN_CTAG_RX); ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8153_RMS); ocp_write_byte(tp, MCU_TYPE_PLA,
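The descriptor encoding the patch relies on - the 802.1Q tag byte-swapped into bits 15:0 of opts2, with bit 16 flagging its presence - can be modeled in userspace like this. The constants mirror the patch; the helpers are illustrative, not driver code:

```c
#include <stdint.h>

#define VLAN_TAG_PRESENT (1u << 16) /* mirrors TX_VLAN_TAG / RX_VLAN_TAG */

static uint16_t swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Encode a host-order VLAN TCI into the descriptor word. */
static uint32_t tx_vlan_encode(uint16_t tci)
{
    return VLAN_TAG_PRESENT | swab16(tci);
}

/* Decode: returns 1 and stores the host-order TCI if a tag is present,
 * 0 if the descriptor carries no tag. */
static int rx_vlan_decode(uint32_t opts2, uint16_t *tci)
{
    if (!(opts2 & VLAN_TAG_PRESENT))
        return 0;
    *tci = swab16(opts2 & 0xffff);
    return 1;
}
```

The swab16() on both sides is what lets the hardware's little-endian descriptor carry the network-order tag the stack hands over in vlan_tx_tag_get().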
[Bugfix] x86, NUMA, ACPI: Online node earlier when doing CPU hot-addition
With the typical CPU hot-addition flow on x86, PCI host bridges embedded in the physical processor are always associated with NUMA_NO_NODE, which may cause sub-optimal performance. 1) Handle CPU hot-addition notification acpi_processor_add() acpi_processor_get_info() acpi_processor_hotadd_init() acpi_map_lsapic() 1.a) acpi_map_cpu2node() 2) Handle PCI host bridge hot-addition notification acpi_pci_root_add() pci_acpi_scan_root() 2.a) if (node != NUMA_NO_NODE && !node_online(node)) node = NUMA_NO_NODE; 3) Handle memory hot-addition notification acpi_memory_device_add() acpi_memory_enable_device() add_memory() 3.a) node_set_online(); 4) Online CPUs through sysfs interfaces cpu_subsys_online() cpu_up() try_online_node() 4.a) node_set_online(); So the associated node is always in the offline state, because it is not onlined until step 3.a or 4.a. We could improve performance by onlining the node at step 1.a. This change also makes the code symmetric: nodes are created when handling CPU/memory hot-addition events instead of when handling user requests from sysfs interfaces, and are destroyed when handling CPU/memory hot-removal events. It also closes a race window caused by kmalloc_node(cpu_to_node(cpu)), which may cause a system panic as below.
[ 3663.324476] BUG: unable to handle kernel paging request at 1f08 [ 3663.332348] IP: [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0 [ 3663.344773] Oops: [#1] SMP [ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid [ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ #21 [ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014 [ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 880845fba000 [ 3663.422741] RIP: 0010:[81172219] [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.432857] RSP: 0018:880845fbbcd0 EFLAGS: 00010246 [ 3663.439265] RAX: 1f00 RBX: RCX: [ 3663.447291] RDX: RSI: 0a8d RDI: 81a8d950 [ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001 [ 3663.463345] R10: 0001 R11: R12: 002052d0 [ 3663.471363] R13: 880854c07600 R14: 0002 R15: [ 3663.479389] FS: 7f2e8b99e800() GS:88105a40() knlGS: [ 3663.488514] CS: 0010 DS: ES: CR0: 80050033 [ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0 [ 3663.503476] Stack: [ 3663.505757] 811bd74d 880854c01d98 880854c01df0 880854c01dd0 [ 3663.514167] 0003208ca420 00075a5d84d0 88082fe54b00 811bb35f [ 3663.522567] 880854c07600 0003 1f00 880845fbbd48 [ 3663.530976] Call Trace: [ 3663.533753] [811bd74d] ? deactivate_slab+0x41d/0x4f0 [ 3663.540421] [811bb35f] ? new_slab+0x3f/0x2d0 [ 3663.546307] [811bb3c5] new_slab+0xa5/0x2d0 [ 3663.552001] [81768c97] __slab_alloc+0x35d/0x54a [ 3663.558185] [810a4845] ? local_clock+0x25/0x30 [ 3663.564686] [8177a34c] ? 
__do_page_fault+0x4ec/0x5e0 [ 3663.571356] [810b0054] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.578609] [810c77f1] ? __raw_spin_lock_init+0x21/0x60 [ 3663.585570] [811be476] kmem_cache_alloc_node_trace+0xa6/0x1d0 [ 3663.593112] [810b0054] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.600363] [810b0054] alloc_fair_sched_group+0xc4/0x190 [ 3663.607423] [810a359f] sched_create_group+0x3f/0x80 [ 3663.613994] [810b611f] sched_autogroup_create_attach+0x3f/0x1b0 [ 3663.621732] [8108258a] sys_setsid+0xea/0x110 [ 3663.628020] [8177f42d] system_call_fastpath+0x1a/0x1f [ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 48 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89 [ 3663.657032] RIP [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.664491] RSP 880845fbbcd0 [ 3663.668429] CR2: 1f08 [ 3663.672659] ---[
RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
On 2014-09-12, Thomas Gleixner wrote: On Thu, 11 Sep 2014, Qiaowei Ren wrote: Due to new fields about bound violation added into struct siginfo, this patch syncs it with the general version to avoid a build issue. You completely fail to explain which build issue is addressed by this patch. The code you added to kernel/signal.c which accesses _addr_bnd is guarded by +#ifdef SEGV_BNDERR which is not defined by MIPS. Also why is this only affecting MIPS and not any other architecture which provides its own struct siginfo ? That patch makes no sense at all, at least not without a proper explanation. For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include the general siginfo.h, and only replaces the general struct siginfo with the MIPS-specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get an error like 'no _lower in struct siginfo' when arch=mips. In addition, only the MIPS arch defines its own struct siginfo, so this only affects MIPS. Thanks, Qiaowei Signed-off-by: Qiaowei Ren qiaowei@intel.com --- arch/mips/include/uapi/asm/siginfo.h | 4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h index e811744..d08f83f 100644 --- a/arch/mips/include/uapi/asm/siginfo.h +++ b/arch/mips/include/uapi/asm/siginfo.h @@ -92,6 +92,10 @@ typedef struct siginfo { int _trapno;/* TRAP # which caused the signal */ #endif short _addr_lsb; +struct { +void __user *_lower; +void __user *_upper; +} _addr_bnd; } _sigfault; /* SIGPOLL, SIGXFSZ (To do ...) */ -- 1.7.1
RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 2014-09-11, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + * This function will be called by do_munmap(), and the VMAs + covering + * the virtual address region start...end have already been split + if + * necessary and remvoed from the VMA list. remvoed - removed +void mpx_unmap(struct mm_struct *mm, +unsigned long start, unsigned long end) { +int ret; + +ret = mpx_try_unmap(mm, start, end); +if (ret == -EINVAL) +force_sig(SIGSEGV, current); +} In the case of a fault during an unmap, this just ignores the situation and returns silently. Where is the code to retry the freeing operation outside of mmap_sem? Dave, you mean delayed_work code? According to our discussion, it will be deferred to another mainline post. Thanks, Qiaowei
[PATCH v2] Hibernate: Do not assume the first e820 area to be RAM
In arch/x86/kernel/setup.c::trim_bios_range(), the code introduced by commit 1b5576e6 (based on d8a9e6a5) updates the first 4Kb of memory to be an E820_RESERVED region. That's because it's a BIOS-owned area but generally not listed in the E820 table: [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x00096fff] usable [0.00] BIOS-e820: [mem 0x00097000-0x00097fff] reserved ... [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable But the region of the first 4Kb is not registered as nosave memory: [0.00] PM: Registered nosave memory: [mem 0x00097000-0x00097fff] [0.00] PM: Registered nosave memory: [mem 0x000a-0x000f] The code in e820_mark_nosave_regions() assumes the first e820 area to be RAM, so it causes the first 4Kb E820_RESERVED region to be ignored when registering nosave regions. This patch removes the assumption about the first e820 area. v2: Avoid the extra check in the for loop. (coding suggestion from Yinghai Lu) Cc: Rafael J. Wysocki r...@rjwysocki.net Cc: Len Brown len.br...@intel.com Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Yinghai Lu ying...@kernel.org Acked-by: Pavel Machek pa...@ucw.cz Signed-off-by: Lee, Chun-Yi j...@suse.com --- arch/x86/kernel/e820.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) Index: linux-3.12-SLE12/arch/x86/kernel/e820.c === --- linux-3.12-SLE12.orig/arch/x86/kernel/e820.c +++ linux-3.12-SLE12/arch/x86/kernel/e820.c @@ -682,15 +682,14 @@ void __init parse_e820_ext(u64 phys_addr * hibernation (32 bit) or software suspend and suspend to RAM (64 bit). * * This function requires the e820 map to be sorted and without any - * overlapping entries and assumes the first e820 area to be RAM. + * overlapping entries.
*/ void __init e820_mark_nosave_regions(unsigned long limit_pfn) { int i; - unsigned long pfn; + unsigned long pfn = 0; - pfn = PFN_DOWN(e820.map[0].addr + e820.map[0].size); - for (i = 1; i < e820.nr_map; i++) { + for (i = 0; i < e820.nr_map; i++) { struct e820entry *ei = e820.map[i]; if (pfn < PFN_UP(ei->addr))
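The fixed loop walks a sorted, non-overlapping map with a running pfn cursor and registers both holes between entries and non-RAM entries as nosave. A userspace model of that logic, counting pages instead of registering them (the struct and helper are illustrative; starting pfn at 0 rather than skipping entry 0 is exactly the v2 change):

```c
#include <stddef.h>

struct range { unsigned long start, end; int is_ram; };

/* Count the pfns in [0, limit) that would be registered nosave:
 * holes between map entries plus entries that are not plain RAM. */
static unsigned long count_nosave(const struct range *map, size_t n,
                                  unsigned long limit)
{
    unsigned long pfn = 0, nosave = 0;

    for (size_t i = 0; i < n; i++) {
        if (pfn < map[i].start)
            nosave += map[i].start - pfn;        /* hole before this entry */
        if (!map[i].is_ram)
            nosave += map[i].end - map[i].start; /* reserved entry itself */
        pfn = map[i].end;
        if (pfn >= limit)
            break;
    }
    return nosave;
}
```

With the old `i = 1` start, a reserved entry 0 (the bug report's first 4Kb) would be silently skipped; starting the cursor at 0 covers it.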
RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
On 2014-09-11, Hansen, Dave wrote: On 09/11/2014 01:46 AM, Qiaowei Ren wrote: + +return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u & +MPX_BNDCFG_ADDR_MASK); +} I don't think casting a u64 to a ulong, then to a pointer is useful. Just take the '(unsigned long)' out. If so, this spits out a warning on 32-bit: arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir': arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Thanks, Qiaowei
Re: [PATCH RFC 4/4] xen, blkback: add support for multiple block rings
On 09/12/2014 07:45 AM, Arianna Avanzini wrote: On Fri, Aug 22, 2014 at 02:15:58PM +0100, David Vrabel wrote: On 22/08/14 12:20, Arianna Avanzini wrote: This commit adds to xen-blkback the support to retrieve the block layer API being used and the number of available hardware queues, in case the block layer is using the multi-queue API. This commit also lets the driver advertise the number of available hardware queues to the frontend via XenStore, therefore allowing for actual multiple I/O rings to be used. Does it make sense for the number of queues to depend on the number of queues available in the underlying block device? Thank you for raising that point. It probably is not the best solution. Bob Liu suggested to have the number of I/O rings depend on the number of vCPUs in the driver domain. Konrad Wilk suggested to compute the number of I/O rings according to the following formula, to preserve the possibility to explicitly define the number of hardware queues to be exposed to the frontend: what_backend_exposes = some_module_parameter ? : min(nr_online_cpus(), nr_hardware_queues()); io_rings = min(nr_online_cpus(), what_backend_exposes); (Please do correct me if I misunderstood your point) Since the xen-netfront/xen-netback drivers have already implemented multi-queue, I'd like us to negotiate the number of queues the same way the net drivers do. Thanks, -Bob
Re: [PATCH 00/10] implement zsmalloc shrinking
On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote: Now that zswap can use zsmalloc as a storage pool via zpool, it will try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent limit. These patches implement zsmalloc shrinking. The way the pool is shrunk is by finding a zspage and reclaiming it, by evicting each of its objects that is in use. Without these patches zswap, and any other future user of zpool/zsmalloc that attempts to shrink the zpool/zs_pool, will only get errors and will be unable to shrink its zpool/zs_pool. With the ability to shrink, zswap can keep the most recent compressed pages in memory. Note that the design of zsmalloc makes it impossible to actually find the LRU zspage, so each class and fullness group is searched in a round-robin method to find the next zspage to reclaim. Each fullness group orders its zspages in LRU order, so the oldest zspage is used for each fullness group. After a quick inspection, the code looks reasonable. Thanks! I do wonder if this actually works well in practice though. Have you run any tests that overflow the zsmalloc pool? What does performance look like at that point? I would expect it would be worse than allowing the overflow pages to go straight to swap, since, in almost every case, you would be writing back more than one page. In some cases, MANY more than one page (up to 255 for a full zspage in the minimum class size). There have always been two sticking points with shrinking in zsmalloc (one of which you have mentioned) 1) Low LRU locality among objects in a zspage. zsmalloc values density over reclaim ordering so it is hard to make good reclaim selection decisions. 2) Writeback storm. If you try to reclaim a zspage with lots of objects (i.e. small class size in fullness group ZS_FULL) you can create a ton of memory pressure by uncompressing objects and adding them to the swap cache. 
A few reclaim models: - Reclaim zspage with fewest objects: This reduces writeback storm but would likely reclaim more recently allocated zspages that contain more recently used (added) objects. - Reclaim zspage with largest class size: This also reduces writeback storm, as zspages with larger objects (poorly compressible) are written back first. This is not LRU though. This is the best of the options IMHO. I'm not saying that it is good. - Reclaim LRU round-robin through the fullness groups (approach used): The LRU here is limited since, as the number of objects in the zspage increases, it is LRU only wrt the most recently added object in the zspage. It also has a high risk of a writeback storm since it will eventually try to reclaim from the ZS_FULL group of the minimum class size. There is also the point that writing back objects might not be the best way to reclaim from zsmalloc at all. Maybe compaction is the way to go. This was recently discussed on the list. http://marc.info/?l=linux-mm&m=140917577412645&w=2 As mentioned in that thread, it would require zsmalloc to add a layer of indirection so that the objects could be relocated without notifying the user. The compaction mechanism would also be fun to design, I imagine. But, in my mind, compaction is really needed, regardless of whether or not zsmalloc is capable of writeback, and would be more beneficial. tl;dr version: I would really need to see some evidence (and try it myself) that this didn't run off a cliff when you overflow the zsmalloc pool. It seems like additional risk and complexity to avoid LRU inversion _after_ the pool overflows. And by avoid I mean maybe avoid, as the reclaim selection is just slightly more LRUish than random selection. Thanks, Seth --- This patch set applies to linux-next.
Dan Streetman (10): zsmalloc: fix init_zspage free obj linking zsmalloc: add fullness group list for ZS_FULL zspages zsmalloc: always update lru ordering of each zspage zsmalloc: move zspage obj freeing to separate function zsmalloc: add atomic index to find zspage to reclaim zsmalloc: add zs_ops to zs_pool zsmalloc: add obj_handle_is_free() zsmalloc: add reclaim_zspage() zsmalloc: add zs_shrink() zsmalloc: implement zs_zpool_shrink() with zs_shrink() drivers/block/zram/zram_drv.c | 2 +- include/linux/zsmalloc.h | 7 +- mm/zsmalloc.c | 314 +- 3 files changed, 290 insertions(+), 33 deletions(-) -- 1.8.3.1
Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote: When zsmalloc creates a new zspage, it initializes each object it contains with a link to the next object, so that the zspage has a singly-linked list of its free objects. However, the logic that sets up the links is wrong, and in the case of objects that are precisely aligned with the page boundaries (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the next page is skipped, due to incrementing the offset twice. The logic can be simplified, as it doesn't need to calculate how many objects can fit on the current page; simply checking the offset for each object is enough. Change zsmalloc init_zspage() logic to iterate through each object on each of its pages, checking the offset to verify the object is on the current page before linking it into the zspage. Signed-off-by: Dan Streetman ddstr...@ieee.org Cc: Minchan Kim minc...@kernel.org This one stands on its own as a bugfix. Reviewed-by: Seth Jennings sjenni...@variantweb.net --- mm/zsmalloc.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index c4a9157..03aa72f 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class) while (page) { struct page *next_page; struct link_free *link; - unsigned int i, objs_on_page; + unsigned int i = 1; /* * page->index stores offset of first object starting @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class) link = (struct link_free *)kmap_atomic(page) + off / sizeof(*link); - objs_on_page = (PAGE_SIZE - off) / class->size; - for (i = 1; i <= objs_on_page; i++) { - off += class->size; - if (off < PAGE_SIZE) { - link->next = obj_location_to_handle(page, i); - link += class->size / sizeof(*link); - } + while ((off += class->size) < PAGE_SIZE) { + link->next = obj_location_to_handle(page, i++); + link += class->size / sizeof(*link); } /*
@@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class) link->next = obj_location_to_handle(next_page, 0); kunmap_atomic(link); page = next_page; - off = (off + class->size) % PAGE_SIZE; + off %= PAGE_SIZE; } } -- 1.8.3.1
linux-next: build failure after merge of the slave-dma tree
Hi Vinod, After merging the slave-dma tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: drivers/spi/spi-pxa2xx-pci.c:70:3: error: unknown field 'max_clk_rate' specified in initializer .max_clk_rate = 5000, ^ Caused by commit bfe607a528ba ("spi/pxa2xx-pci: Add support for Intel Braswell"). I have used the slave-dma tree from next-20140911 for today. -- Cheers, Stephen Rothwell s...@canb.auug.org.au
[PATCH v2 0/2] Add irq_over_gpio DT support to STMPE
These patches add support for using a GPIO as an IRQ source for the STMPE module when configured using device tree. Changes since v1: - Split actual patch and Documentation into two parts Sean Cross (2): mfd: stmpe: support gpio over irq under device tree mfd: stmpe: Document DT binding for irq_over_gpio Documentation/devicetree/bindings/mfd/stmpe.txt | 1 + drivers/mfd/stmpe.c | 7 ++- 2 files changed, 7 insertions(+), 1 deletion(-) -- 2.1.0
[PATCH v2 1/2] mfd: stmpe: support gpio over irq under device tree
The stmpe_platform_data has an irq_over_gpio field, which allows the system to read STMPE events whenever an IRQ occurs on a GPIO pin. This patch adds the ability to configure this field and to use a GPIO as an IRQ source for boards configuring the STMPE in device tree. Signed-off-by: Sean Cross x...@kosagi.com --- drivers/mfd/stmpe.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c index 3b6bfa7..4c42b05 100644 --- a/drivers/mfd/stmpe.c +++ b/drivers/mfd/stmpe.c @@ -1122,7 +1122,12 @@ static void stmpe_of_probe(struct stmpe_platform_data *pdata, if (pdata->id < 0) pdata->id = -1; - pdata->irq_trigger = IRQF_TRIGGER_NONE; + pdata->irq_gpio = of_get_named_gpio_flags(np, "irq-gpio", 0, + &pdata->irq_trigger); + if (gpio_is_valid(pdata->irq_gpio)) + pdata->irq_over_gpio = 1; + else + pdata->irq_trigger = IRQF_TRIGGER_NONE; of_property_read_u32(np, "st,autosleep-timeout", &pdata->autosleep_timeout); -- 2.1.0
[PATCH v2 2/2] mfd: stmpe: Document DT binding for irq_over_gpio
STMPE now supports using a GPIO as an IRQ source. Document the device tree binding for this option. Signed-off-by: Sean Cross x...@kosagi.com --- Documentation/devicetree/bindings/mfd/stmpe.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt b/Documentation/devicetree/bindings/mfd/stmpe.txt index 56edb55..3fb68bf 100644 --- a/Documentation/devicetree/bindings/mfd/stmpe.txt +++ b/Documentation/devicetree/bindings/mfd/stmpe.txt @@ -13,6 +13,7 @@ Optional properties: - interrupt-parent : Specifies which IRQ controller we're connected to - wakeup-source: Marks the input device as wakable - st,autosleep-timeout : Valid entries (ms); 4, 16, 32, 64, 128, 256, 512 and 1024 + - irq-gpio : If present, which GPIO to use for event IRQ Example: -- 2.1.0
Re: [PATCH 03/10] zsmalloc: always update lru ordering of each zspage
On Thu, Sep 11, 2014 at 04:53:54PM -0400, Dan Streetman wrote: Update ordering of a changed zspage in its fullness group LRU list, even if it has not moved to a different fullness group. This is needed by zsmalloc shrinking, which partially relies on each class fullness group list to be kept in LRU order, so the oldest can be reclaimed first. Currently, LRU ordering is only updated when a zspage changes fullness groups. Just something I saw. fix_fullness_group() is called from zs_free(), which means that removing an object from a zspage moves it to the front of the LRU. Not sure if that is what we want. If anything that makes it a _better_ candidate for reclaim, as the zspage now contains fewer objects that we'll have to decompress and write back. Seth Signed-off-by: Dan Streetman ddstr...@ieee.org Cc: Minchan Kim minc...@kernel.org --- mm/zsmalloc.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index fedb70f..51db622 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -467,16 +467,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool, BUG_ON(!is_first_page(page)); get_zspage_mapping(page, &class_idx, &currfg); - newfg = get_fullness_group(page); - if (newfg == currfg) - goto out; - class = pool->size_class[class_idx]; + newfg = get_fullness_group(page); + /* Need to do this even if currfg == newfg, to update lru */ remove_zspage(page, class, currfg); insert_zspage(page, class, newfg); - set_zspage_mapping(page, class_idx, newfg); + if (currfg != newfg) + set_zspage_mapping(page, class_idx, newfg); -out: return newfg; }
[PATCH v3] toshiba_acpi: Support new keyboard backlight type
Newer Toshiba models now come with a new (and different) keyboard backlight implementation with three modes of operation: TIMER, ON and OFF, and the LED is controlled internally by the firmware. This patch adds support for that type of backlight, changing the existing code to accommodate the new implementation. The timeout value range is now 1-60 seconds, and the accepted modes are now: 1 (FN-Z), 2 (AUTO or TIMER), 8 (ON) and 10 (OFF). This adds two new entries, keyboard_type and available_kbd_modes; the first shows the keyboard type and the latter shows the supported modes depending on the type. Signed-off-by: Azael Avalos coproscef...@gmail.com --- drivers/platform/x86/toshiba_acpi.c | 193 +--- 1 file changed, 158 insertions(+), 35 deletions(-) diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c index 4c8fa7b..a5d7d83 100644 --- a/drivers/platform/x86/toshiba_acpi.c +++ b/drivers/platform/x86/toshiba_acpi.c @@ -138,8 +138,12 @@ MODULE_LICENSE("GPL"); #define HCI_WIRELESS_BT_PRESENT 0x0f #define HCI_WIRELESS_BT_ATTACH 0x40 #define HCI_WIRELESS_BT_POWER 0x80 +#define SCI_KBD_MODE_MASK 0x1f #define SCI_KBD_MODE_FNZ 0x1 #define SCI_KBD_MODE_AUTO 0x2 +#define SCI_KBD_MODE_ON 0x8 +#define SCI_KBD_MODE_OFF 0x10 +#define SCI_KBD_TIME_MAX 0x3c001a struct toshiba_acpi_dev { struct acpi_device *acpi_dev; @@ -155,6 +159,7 @@ struct toshiba_acpi_dev { int force_fan; int last_key_event; int key_event_valid; + int kbd_type; int kbd_mode; int kbd_time; @@ -495,6 +500,42 @@ static enum led_brightness toshiba_illumination_get(struct led_classdev *cdev) } /* KBD Illumination */ +static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev) +{ + u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 }; + u32 out[HCI_WORDS]; + acpi_status status; + + if (!sci_open(dev)) + return 0; + + status = hci_raw(dev, in, out); + sci_close(dev); + if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) { + pr_err("ACPI call to query kbd illumination support failed\n"); + return 0; + } else if (out[0] == HCI_NOT_SUPPORTED) { + pr_info("Keyboard illumination not available\n"); + return 0; + } + + /* Check for keyboard backlight timeout max value, +* previous kbd backlight implementation set this to +* 0x3c0003, and now the new implementation set this +* to 0x3c001a, use this to distinguish between them +*/ + if (out[3] == SCI_KBD_TIME_MAX) + dev->kbd_type = 2; + else + dev->kbd_type = 1; + /* Get the current keyboard backlight mode */ + dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK; + /* Get the current time (1-60 seconds) */ + dev->kbd_time = out[2] >> HCI_MISC_SHIFT; + + return 1; +} + static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 time) { u32 result; @@ -1254,6 +1295,62 @@ static const struct backlight_ops toshiba_backlight_data = { /* * Sysfs files */ +static ssize_t toshiba_kbd_bl_mode_store(struct device *dev, +struct device_attribute *attr, +const char *buf, size_t count); +static ssize_t toshiba_kbd_bl_mode_show(struct device *dev, + struct device_attribute *attr, + char *buf); +static ssize_t toshiba_kbd_type_show(struct device *dev, +struct device_attribute *attr, +char *buf); +static ssize_t toshiba_available_kbd_modes_show(struct device *dev, + struct device_attribute *attr, + char *buf); +static ssize_t toshiba_kbd_bl_timeout_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count); +static ssize_t toshiba_kbd_bl_timeout_show(struct device *dev, + struct device_attribute *attr, + char *buf); +static ssize_t toshiba_touchpad_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count); +static ssize_t toshiba_touchpad_show(struct device *dev, +struct device_attribute *attr, +char *buf); +static ssize_t toshiba_position_show(struct device *dev, +struct device_attribute *attr, +
Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer
On Thu, Sep 11, 2014 at 6:17 PM, Stephen Boyd sb...@codeaurora.org wrote: On 09/11/14 17:14, Sonny Rao wrote: On Thu, Sep 11, 2014 at 4:56 PM, Stephen Boyd sb...@codeaurora.org wrote: Where does this platform jump to when a CPU comes up? Is it rockchip_secondary_startup()? I wonder if that path could have this little bit of assembly to poke the cntvoff in monitor mode and then jump to secondary_startup()? Before we boot any secondary CPUs we could also read the cntvoff for CPU0 in the platform specific layer (where we know we're running in secure mode) and then use that value as the reset value for the secondaries. Or does this platform boot up in secure mode sometimes and non-secure mode other times? Yes, in our case, with our firmware, we will go through some internal ROM code and then jump to rockchip_secondary_startup, but I don't think it's correct to force all users of this SoC to do it that way. What's being forced? The way the internal ROM jumps to SRAM? Is there any other way that secondary CPUs come out of reset on this SoC? From looking at the code it seems like the only path is: internal ROM jumps to SRAM (where rockchip_secondary_trampoline lives), which jumps to rockchip_secondary_startup(), which then does an invalidate and jumps to secondary_startup(). Linux controls everything besides the internal ROM. Is something different in your case? There are other ways it can be done, and I don't know all of the possibilities, but there seems to be some protocol with the iROM that tells it where to go, which the current SMP patches are using by putting a magic number and an address in SRAM. I think it's true that in our case, it really is pretty simple and we have secure SVC mode and not much else runs (besides the iROM).
Since I don't know all of the possibilities, I didn't want to preclude the possibility that someone else handled things differently and entered the kernel in non-secure mode, and have some code there that broke in that instance; that's all I meant by forced. If there were a reasonable way to determine for sure that we are in secure mode, then yes we could do what you're suggesting, and I'd be happy to code that up. I think the problem is that there isn't a great way to determine whether we're in secure mode or not, and this is maybe by design? I don't particularly understand that design choice. It would be nice to hear some rationale from ARM folks. I'm thinking we would have a different boot-method for secure vs. non-secure and then we would know to configure cntvoff or not based on the boot method. Isn't that a reasonable way of knowing what should be done? It seems like we can at least modify the DT for this SoC. Putting something into the device-tree is in fact the point of this patch, so it is sort of doing what you're suggesting, although this patch is about being able to use physical counters and doesn't indicate anything about secure vs non-secure. What else do you think could be used to differentiate between the two cases, besides putting it into the DT? I still wonder if there is such a bootloader/hypervisor/rom that's putting this SoC into non-secure mode and not configuring cntvoff. Doug's comments seem to suggest that the whole world would be different if this were true. Maybe Heiko knows? As far as I'm aware, there's no bootloader/firmware that's ever putting the CPU into non-secure mode for our case. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v3 16/17] arcmsr: support new adapter ARC12x4 series
On Thu, 2014-09-11 at 16:21 +0200, Tomas Henzl wrote: On 09/11/2014 05:59 AM, Ching Huang wrote: On Wed, 2014-09-10 at 11:58 +0200, Tomas Henzl wrote: On 09/09/2014 06:30 PM, Christoph Hellwig wrote: Ching, do you have a chance to address Tomas' second concern below? As far as I can tell (Tomas, please correct me) that's the last outstanding concern, and I'd really like to merge the arcmsr updates for the Linux 3.18 merge window. Correct, still awaiting a response. Christoph, Tomas, sorry for the late reply. I think I misunderstood Tomas' meaning. The spin lock in arcmsr_hbaD_polling_ccbdone() is necessary to protect doneq_index and has to be modified as follows. OK, so you are going to repost 16/17? If so, please describe all changes you'll do, in that new post. By previous review, I will post two patches, one for 13/17 and another for 16/17. These patches are relative to http://git.infradead.org/users/hch/scsi-queue.git/tree/arcmsr-for-3.18:/drivers/scsi/arcmsr static int arcmsr_hbaD_polling_ccbdone(struct AdapterControlBlock *acb, struct CommandControlBlock *poll_ccb) { bool error; uint32_t poll_ccb_done = 0, poll_count = 0, flag_ccb, ccb_cdb_phy; int rtn, doneq_index, index_stripped, outbound_write_pointer, toggle; unsigned long flags; struct ARCMSR_CDB *arcmsr_cdb; struct CommandControlBlock *pCCB; struct MessageUnit_D *pmu = acb->pmuD; polling_hbaD_ccb_retry: poll_count++; while (1) { spin_lock_irqsave(&acb->doneq_lock, flags); outbound_write_pointer = pmu->done_qbuffer[0].addressLow + 1; doneq_index = pmu->doneq_index; if ((outbound_write_pointer & 0xFFF) == (doneq_index & 0xFFF)) { spin_unlock_irqrestore(&acb->doneq_lock, flags); if (poll_ccb_done) { rtn = SUCCESS; break; } else { msleep(25); if (poll_count > 40) { rtn = FAILED; break; } goto polling_hbaD_ccb_retry; } } toggle = doneq_index & 0x4000; index_stripped = (doneq_index & 0xFFF) + 1; index_stripped %= ARCMSR_MAX_ARC1214_DONEQUEUE; pmu->doneq_index = index_stripped ?
(index_stripped | toggle) : ((index_stripped + 1) | (toggle ^ 0x4000)); spin_unlock_irqrestore(&acb->doneq_lock, flags); doneq_index = pmu->doneq_index; flag_ccb = pmu->done_qbuffer[doneq_index & 0xFFF].addressLow; ccb_cdb_phy = (flag_ccb & 0xFFF0); arcmsr_cdb = (struct ARCMSR_CDB *)(acb->vir2phy_offset + ccb_cdb_phy); pCCB = container_of(arcmsr_cdb, struct CommandControlBlock, arcmsr_cdb); poll_ccb_done |= (pCCB == poll_ccb) ? 1 : 0; if ((pCCB->acb != acb) || (pCCB->startdone != ARCMSR_CCB_START)) { if (pCCB->startdone == ARCMSR_CCB_ABORTED) { pr_notice("arcmsr%d: scsi id = %d lun = %d ccb = '0x%p' poll command abort successfully\n", acb->host->host_no, pCCB->pcmd->device->id, (u32)pCCB->pcmd->device->lun, pCCB); pCCB->pcmd->result = DID_ABORT << 16; arcmsr_ccb_complete(pCCB); continue; } pr_notice("arcmsr%d: polling an illegal ccb command done ccb = '0x%p' ccboutstandingcount = %d\n", acb->host->host_no, pCCB, atomic_read(&acb->ccboutstandingcount)); continue; } error = (flag_ccb & ARCMSR_CCBREPLY_FLAG_ERROR_MODE1) ? true : false; arcmsr_report_ccb_state(acb, pCCB, error); } return rtn; }
Re: [PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
Hi Gleb, Paolo, On 09/11/2014 10:47 PM, Gleb Natapov wrote: On Thu, Sep 11, 2014 at 04:37:39PM +0200, Paolo Bonzini wrote: Il 11/09/2014 16:31, Gleb Natapov ha scritto: What if the page being swapped out is L1's APIC access page? We don't run prepare_vmcs12 in that case because it's an L2->L0->L2 entry, so we need to do something. We will do something on L2->L1 exit. We will call kvm_reload_apic_access_page(). That is what patch 5 of this series is doing. Sorry, I meant the APIC access page prepared by L1 for L2's execution. You wrote: if (!is_guest_mode() || !(vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) write(APIC_ACCESS_ADDR) In other words if L2 shares L1's apic access page then reload, otherwise do nothing. But in that case you have to redo nested_get_page, so do nothing doesn't work. Ah, 7/7 is new in this submission. Before that this page was still pinned. Looking at 7/7 now I do not see how it can work since it has no code for the mmu notifier to detect that it deals with such a page and call kvm_reload_apic_access_page(). Since L1 and L2 share one apic page, if the page is unmapped, mmu_notifier will be called, and: - if the vcpu is in L1, an L1->L0 exit is raised. The apic page's pa will be updated in the next L0->L1 entry by making a vcpu request. - if the vcpu is in L2 (is_guest_mode, right?), an L2->L0 exit is raised. nested_vmx_vmexit() will not be called since it is called in L2->L1 exit. It returns from vmx_vcpu_run() directly, right? So we should update the apic page in L0->L2 entry. This is also done by making a vcpu request, right? prepare_vmcs02() is called in L1->L2 entry, and nested_vmx_vmexit() is called in L2->L1 exit. So we also need to update L1's vmcs in nested_vmx_vmexit() in patch 5/7. IIUC, I think patches 1~6 have done such things. And yes, the is_guest_mode() check is not needed.
I said to Tang previously that nested kvm has a bunch of pinned pages that are hard to deal with and suggested to iron out the non-nested case first :( Yes, and maybe adding patch 7 is not a good idea for now. Thanks.
Re: [PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
Hi Paolo, On 09/11/2014 10:24 PM, Paolo Bonzini wrote: Il 11/09/2014 16:21, Gleb Natapov ha scritto: As far as I can tell the if that is needed there is: if (!is_guest_mode() || !(vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) write(APIC_ACCESS_ADDR) In other words if L2 shares L1's apic access page then reload, otherwise do nothing. What if the page being swapped out is L1's APIC access page? We don't run prepare_vmcs12 in that case because it's an L2->L0->L2 entry, so we need to do something. Are you talking about the case where L1 and L2 have different apic pages? I think I didn't deal with this situation in this patch set. Sorry I didn't say it clearly. Here, I assume L1 and L2 share the same apic page. If we are in L2, and the page is migrated, we update L2's vmcs by making a vcpu request. And of course, we should also update L1's vmcs. This is done by patch 5. We make the vcpu request again in nested_vmx_exit(). Thanks.
[PATCH net 1/2] r8169: fix the default setting of rx vlan
If the parameter features of __rtl8169_set_features() is equal to dev->features, the variable changed is always 0, and nothing would be changed. Signed-off-by: Hayes Wang hayesw...@realtek.com --- drivers/net/ethernet/realtek/r8169.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 91652e7..f3ce284 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -6707,7 +6707,12 @@ static int rtl_open(struct net_device *dev) rtl8169_init_phy(dev, tp); - __rtl8169_set_features(dev, dev->features); + if (dev->features & NETIF_F_HW_VLAN_CTAG_RX) + tp->cp_cmd |= RxVlan; + else + tp->cp_cmd &= ~RxVlan; + + RTL_W16(CPlusCmd, tp->cp_cmd); rtl_pll_power_up(tp); -- 1.9.3
[PATCH net 0/2] r8169: fix rx vlan
There are two issues for hw rx vlan. The patches are used to fix them. Hayes Wang (2): r8169: fix the default setting of rx vlan r8169: fix setting rx vlan drivers/net/ethernet/realtek/r8169.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) -- 1.9.3
[PATCH net 2/2] r8169: fix setting rx vlan
The setting should depend on the new features, not the current ones. Signed-off-by: Hayes Wang hayesw...@realtek.com --- drivers/net/ethernet/realtek/r8169.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f3ce284..7a7860a 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1796,7 +1796,7 @@ static void __rtl8169_set_features(struct net_device *dev, else tp->cp_cmd &= ~RxChkSum; - if (dev->features & NETIF_F_HW_VLAN_CTAG_RX) + if (features & NETIF_F_HW_VLAN_CTAG_RX) tp->cp_cmd |= RxVlan; else tp->cp_cmd &= ~RxVlan; -- 1.9.3
[PATCH 2/9] locktorture: Add documentation
Just like Documentation/RCU/torture.txt, begin a document for the locktorture module. This module is still pretty green, so I have just added some specific sections to the doc (general desc, params, usage, etc.). Further development should update the file. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- Documentation/locking/locktorture.txt | 128 ++ 1 file changed, 128 insertions(+) create mode 100644 Documentation/locking/locktorture.txt diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt new file mode 100644 index 000..c0ab969 --- /dev/null +++ b/Documentation/locking/locktorture.txt @@ -0,0 +1,128 @@ +Kernel Lock Torture Test Operation + +CONFIG_LOCK_TORTURE_TEST + +The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module +that runs torture tests on core kernel locking primitives. The kernel +module, 'locktorture', may be built after the fact on the running +kernel to be tested, if desired. The test periodically outputs status +messages via printk(), which can be examined via dmesg (perhaps +grepping for torture). The test is started when the module is loaded, +and stops when the module is unloaded. This program is based on how RCU +is tortured, via rcutorture. + +This torture test consists of creating a number of kernel threads which +acquire the lock and hold it for a specific amount of time, thus simulating +different critical region behaviors. The amount of contention on the lock +can be simulated by enlarging this critical region hold time and/or +creating more kthreads. + + +MODULE PARAMETERS + +This module has the following parameters: + + + ** Locktorture-specific ** + +nwriters_stress Number of kernel threads that will stress exclusive lock + ownership (writers). The default value is twice the number + of online CPUs. + +torture_type Type of lock to torture. By default, only spinlocks will + be tortured.
This module can torture the following locks, + with string values as follows: + +o lock_busted: Simulates a buggy lock implementation. + +o spin_lock: spin_lock() and spin_unlock() pairs. + +o spin_lock_irq: spin_lock_irq() and spin_unlock_irq() + pairs. + +torture_runnable Start locktorture at module init. By default it will begin + once the module is loaded. + + + ** Torture-framework (RCU + locking) ** + +shutdown_secs The number of seconds to run the test before terminating + the test and powering off the system. The default is + zero, which disables test termination and system shutdown. + This capability is useful for automated testing. + +onoff_interval The number of seconds between each attempt to execute a + randomly selected CPU-hotplug operation. Defaults to + zero, which disables CPU hotplugging. In HOTPLUG_CPU=n + kernels, locktorture will silently refuse to do any + CPU-hotplug operations regardless of what value is + specified for onoff_interval. + +onoff_holdoff The number of seconds to wait until starting CPU-hotplug + operations. This would normally only be used when + locktorture was built into the kernel and started + automatically at boot time, in which case it is useful + in order to avoid confusing boot-time code with CPUs + coming and going. This parameter is only useful if + CONFIG_HOTPLUG_CPU is enabled. + +stat_interval Number of seconds between statistics-related printk()s. + By default, locktorture will report stats every 60 seconds. + Setting the interval to zero causes the statistics to + be printed -only- when the module is unloaded. + +stutter The length of time to run the test before pausing for this + same period of time. Defaults to stutter=5, so as + to run and pause for (roughly) five-second intervals. + Specifying stutter=0 causes the test to run continuously + without pausing, which is the old default behavior.
+ +shuffle_interval The number of seconds to keep the test threads affinitied + to a particular subset of the CPUs, defaults to 3 seconds. + Used in conjunction with test_no_idle_hz. + +verbose Enable verbose debugging printing, via printk(). Enabled + by default. This extra information is mostly related to + high-level errors and reports from the main 'torture' + framework. + + +STATISTICS + +Statistics are printed in the following
[PATCH 6/9] torture: Address race in module cleanup
When performing module cleanups by calling torture_cleanup() the 'torture_type' string is nullified. However, callers are not necessarily done, and might still need to reference the variable. This impacts both rcutorture and locktorture, causing them to print things like: [ 94.226618] (null)-torture: Stopping lock_torture_writer task [ 94.226624] (null)-torture: Stopping lock_torture_stats task Thus delay this operation until the very end of the cleanup process. The consequence (which shouldn't matter for this kind of program) is, of course, that we widen the window between rmmod and a subsequent modprobe, for instance in module_torture_begin(). Signed-off-by: Davidlohr Bueso dbu...@suse.de --- include/linux/torture.h | 3 ++- kernel/locking/locktorture.c | 3 ++- kernel/rcu/rcutorture.c | 3 ++- kernel/torture.c | 16 +--- 4 files changed, 19 insertions(+), 6 deletions(-) diff --git a/include/linux/torture.h b/include/linux/torture.h index 5ca58fc..301b628 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -77,7 +77,8 @@ int torture_stutter_init(int s); /* Initialization and cleanup.
*/ bool torture_init_begin(char *ttype, bool v, int *runnable); void torture_init_end(void); -bool torture_cleanup(void); +bool torture_cleanup_begin(void); +void torture_cleanup_end(void); bool torture_must_stop(void); bool torture_must_stop_irq(void); void torture_kthread_stopping(char *title); diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index de703a7..988267c 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -361,7 +361,7 @@ static void lock_torture_cleanup(void) { int i; - if (torture_cleanup()) + if (torture_cleanup_begin()) return; if (writer_tasks) { @@ -384,6 +384,7 @@ static void lock_torture_cleanup(void) else lock_torture_print_module_parms(cur_ops, "End of test: SUCCESS"); + torture_cleanup_end(); } static int __init lock_torture_init(void) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 948a769..57a2792 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -1418,7 +1418,7 @@ rcu_torture_cleanup(void) int i; rcutorture_record_test_transition(); - if (torture_cleanup()) { + if (torture_cleanup_begin()) { if (cur_ops->cb_barrier != NULL) cur_ops->cb_barrier(); return; @@ -1468,6 +1468,7 @@ rcu_torture_cleanup(void) "End of test: RCU_HOTPLUG"); else rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS"); + torture_cleanup_end(); } #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD diff --git a/kernel/torture.c b/kernel/torture.c index d600af2..07a5c3d 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -635,8 +635,13 @@ EXPORT_SYMBOL_GPL(torture_init_end); * * This must be called before the caller starts shutting down its own * kthreads. + * + * Both torture_cleanup_begin() and torture_cleanup_end() must be paired, + * in order to correctly perform the cleanup. They are separated because + * threads can still need to reference the torture_type string, which is + * thus nullified only after completing all other relevant calls.
*/ -bool torture_cleanup(void) +bool torture_cleanup_begin(void) { mutex_lock(&fullstop_mutex); if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) { @@ -651,12 +656,17 @@ bool torture_cleanup(void) torture_shuffle_cleanup(); torture_stutter_cleanup(); torture_onoff_cleanup(); + return false; +} +EXPORT_SYMBOL_GPL(torture_cleanup_begin); + +void torture_cleanup_end(void) +{ mutex_lock(&fullstop_mutex); torture_type = NULL; mutex_unlock(&fullstop_mutex); - return false; } -EXPORT_SYMBOL_GPL(torture_cleanup); +EXPORT_SYMBOL_GPL(torture_cleanup_end); /* * Is it time for the current torture test to stop? -- 1.8.4.5
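The begin/end split described above can be modeled in a few lines of plain C. This is a hypothetical user-space sketch (invented names; the real code additionally takes fullstop_mutex around the nullification) showing why the identifier must stay valid between the two halves:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the module's torture_type string. */
static const char *torture_type = "lock";

/* Analogue of torture_cleanup_begin(): stops the framework's helpers
 * but deliberately leaves torture_type valid, so callers can still
 * print "%s-torture: Stopping ..." messages for their own kthreads. */
static int cleanup_begin(void)
{
    /* ... stop shuffle/stutter/onoff helpers here ... */
    return 0; /* 0: caller should proceed with its own cleanup */
}

/* Analogue of torture_cleanup_end(): only after the caller has stopped
 * all of its own kthreads is the identifier dropped. */
static void cleanup_end(void)
{
    torture_type = NULL;
}

/* What a late status message would print for the test name. */
static const char *stopping_name(void)
{
    return torture_type ? torture_type : "(null)";
}
```

Calling cleanup_end() where cleanup_begin() now sits reproduces the "(null)-torture" messages from the changelog; with the split, every message printed before cleanup_end() still carries the right name.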
[PATCH 5/9] locktorture: Make statistics generic
The statistics structure can serve well for both reader and writer locks, thus simply rename some fields that mention 'write' and keep the declaration of lwsa. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- kernel/locking/locktorture.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index a6049fa..de703a7 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -78,11 +78,11 @@ static struct task_struct **writer_tasks; static int nrealwriters_stress; static bool lock_is_write_held; -struct lock_writer_stress_stats { - long n_write_lock_fail; - long n_write_lock_acquired; +struct lock_stress_stats { + long n_lock_fail; + long n_lock_acquired; }; -static struct lock_writer_stress_stats *lwsa; +static struct lock_stress_stats *lwsa; /* writer statistics */ #if defined(MODULE) #define LOCKTORTURE_RUNNABLE_INIT 1 @@ -250,7 +250,7 @@ static struct lock_torture_ops mutex_lock_ops = { */ static int lock_torture_writer(void *arg) { - struct lock_writer_stress_stats *lwsp = arg; + struct lock_stress_stats *lwsp = arg; static DEFINE_TORTURE_RANDOM(rand); VERBOSE_TOROUT_STRING("lock_torture_writer task started"); @@ -261,9 +261,9 @@ static int lock_torture_writer(void *arg) schedule_timeout_uninterruptible(1); cur_ops->writelock(); if (WARN_ON_ONCE(lock_is_write_held)) - lwsp->n_write_lock_fail++; + lwsp->n_lock_fail++; lock_is_write_held = 1; - lwsp->n_write_lock_acquired++; + lwsp->n_lock_acquired++; cur_ops->write_delay(&rand); lock_is_write_held = 0; cur_ops->writeunlock(); @@ -281,17 +281,17 @@ static void lock_torture_printk(char *page) bool fail = 0; int i; long max = 0; - long min = lwsa[0].n_write_lock_acquired; + long min = lwsa[0].n_lock_acquired; long long sum = 0; for (i = 0; i < nrealwriters_stress; i++) { - if (lwsa[i].n_write_lock_fail) + if (lwsa[i].n_lock_fail) fail = true; - sum += lwsa[i].n_write_lock_acquired; - if (max < lwsa[i].n_write_lock_fail) - max = lwsa[i].n_write_lock_fail; - if (min > lwsa[i].n_write_lock_fail) - min = lwsa[i].n_write_lock_fail; + sum += lwsa[i].n_lock_acquired; + if (max < lwsa[i].n_lock_fail) + max = lwsa[i].n_lock_fail; + if (min > lwsa[i].n_lock_fail) + min = lwsa[i].n_lock_fail; } page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); page += sprintf(page, @@ -441,8 +441,8 @@ static int __init lock_torture_init(void) goto unwind; } for (i = 0; i < nrealwriters_stress; i++) { - lwsa[i].n_write_lock_fail = 0; - lwsa[i].n_write_lock_acquired = 0; + lwsa[i].n_lock_fail = 0; + lwsa[i].n_lock_acquired = 0; } /* Start up the kthreads. */ -- 1.8.4.5
[PATCH 1/9] locktorture: Rename locktorture_runnable parameter
... to just 'torture_runnable'. It follows other variable naming and is shorter. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- kernel/locking/locktorture.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 0955b88..8c770b2 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -87,9 +87,9 @@ static struct lock_writer_stress_stats *lwsa; #else #define LOCKTORTURE_RUNNABLE_INIT 0 #endif -int locktorture_runnable = LOCKTORTURE_RUNNABLE_INIT; -module_param(locktorture_runnable, int, 0444); -MODULE_PARM_DESC(locktorture_runnable, Start locktorture at module init); +int torture_runnable = LOCKTORTURE_RUNNABLE_INIT; +module_param(torture_runnable, int, 0444); +MODULE_PARM_DESC(torture_runnable, Start locktorture at module init); /* Forward reference. */ static void lock_torture_cleanup(void); @@ -355,7 +355,7 @@ static int __init lock_torture_init(void) lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, }; - if (!torture_init_begin(torture_type, verbose, locktorture_runnable)) + if (!torture_init_begin(torture_type, verbose, torture_runnable)) return -EBUSY; /* Process args and tell the world that the torturer is on the job. */ -- 1.8.4.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/9] locktorture: Support mutexes
Add a mutex_lock torture test. The main difference from the already existing spinlock tests is that the latency of the critical region is much larger. We randomly delay for (arbitrarily) either 500 ms or, otherwise, 25 ms. While this can considerably reduce the amount of writes compared to non-blocking locks, if run long enough it can have the same torturous effect. Furthermore it is more representative of mutex hold times and can better stress things like thrashing. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- Documentation/locking/locktorture.txt | 2 ++ kernel/locking/locktorture.c | 41 +-- 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index c0ab969..6b1e7ca 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -40,6 +40,8 @@ torture_type Type of lock to torture. By default, only spinlocks will o spin_lock_irq: spin_lock_irq() and spin_unlock_irq() pairs. +o mutex_lock: mutex_lock() and mutex_unlock() pairs. + torture_runnable Start locktorture at module init. By default it will begin once the module is loaded.
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 8c770b2..414ba45 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -27,6 +27,7 @@ #include <linux/kthread.h> #include <linux/err.h> #include <linux/spinlock.h> +#include <linux/mutex.h> #include <linux/smp.h> #include <linux/interrupt.h> #include <linux/sched.h> @@ -66,7 +67,7 @@ torture_param(bool, verbose, true, static char *torture_type = "spin_lock"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, -"Type of lock to torture (spin_lock, spin_lock_irq, ...)"); +"Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)"); static atomic_t n_lock_torture_errors; @@ -206,6 +207,42 @@ static struct lock_torture_ops spin_lock_irq_ops = { .name = "spin_lock_irq" }; +static DEFINE_MUTEX(torture_mutex); + +static int torture_mutex_lock(void) __acquires(torture_mutex) +{ + mutex_lock(&torture_mutex); + return 0; +} + +static void torture_mutex_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 5); + else + mdelay(longdelay_ms / 5); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealwriters_stress * 2))) + preempt_schedule(); /* Allow test to be preempted. */ +#endif +} + +static void torture_mutex_unlock(void) __releases(torture_mutex) +{ + mutex_unlock(&torture_mutex); +} + +static struct lock_torture_ops mutex_lock_ops = { + .writelock = torture_mutex_lock, + .write_delay = torture_mutex_delay, + .writeunlock = torture_mutex_unlock, + .name = "mutex_lock" }; + /* * Lock torture writer kthread. Repeatedly acquires and releases * the lock, checking for duplicate acquisitions.
@@ -352,7 +389,7 @@ static int __init lock_torture_init(void) int i; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { - lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, + lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, mutex_lock_ops, }; if (!torture_init_begin(torture_type, verbose, torture_runnable)) -- 1.8.4.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
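The modular-arithmetic trick in torture_mutex_delay() is worth spelling out: with N writers, the long delay fires roughly once per N * 2000 * longdelay_ms calls, so the aggregate time spent in long delays stays bounded as writers are added. A user-space sketch (hypothetical helper names; torture_random() replaced by an explicit argument so the policy is checkable):

```c
/* Constant copied from the patch. */
static const unsigned long longdelay_ms = 100;

/* True when the occasional long delay should fire: the random value is
 * a multiple of nrealwriters_stress * 2000 * longdelay_ms, which for
 * a uniform generator happens about once per that many calls. */
static int wants_long_delay(unsigned long rnd, int nrealwriters_stress)
{
    return !(rnd % (nrealwriters_stress * 2000 * longdelay_ms));
}

/* Delay to apply, in milliseconds (the mdelay() argument in the patch):
 * 500 ms to force massive contention, 20 ms in the common case. */
static unsigned long delay_ms(unsigned long rnd, int nrealwriters_stress)
{
    return wants_long_delay(rnd, nrealwriters_stress)
               ? longdelay_ms * 5
               : longdelay_ms / 5;
}
```

Note the computed short delay is 100/5 = 20 ms, slightly different from the 25 ms quoted in the changelog; the 500 ms long delay matches.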
[PATCH 4/9] locktorture: Teach about lock debugging
Regular locks are very different from locks with debugging. For instance, for mutexes, debugging forces taking only the slowpaths. As such, the locktorture module should take this into account when printing related information -- specifically when printing user-passed parameters, which seems the right place for such info. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- kernel/locking/locktorture.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 414ba45..a6049fa 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -64,6 +64,7 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable"); torture_param(bool, verbose, true, "Enable verbose debugging printk()s"); +static bool debug_lock = false; static char *torture_type = "spin_lock"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, @@ -349,8 +350,9 @@ lock_torture_print_module_parms(struct lock_torture_ops *cur_ops, const char *tag) { pr_alert("%s" TORTURE_FLAG -"--- %s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", -torture_type, tag, nrealwriters_stress, stat_interval, verbose, +"--- %s%s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", +torture_type, tag, debug_lock ? " [debug]": "", +nrealwriters_stress, stat_interval, verbose, shuffle_interval, stutter, shutdown_secs, onoff_interval, onoff_holdoff); } @@ -418,6 +420,15 @@ static int __init lock_torture_init(void) nrealwriters_stress = nwriters_stress; else nrealwriters_stress = 2 * num_online_cpus(); + +#ifdef CONFIG_DEBUG_MUTEXES + if (strncmp(torture_type, "mutex", 5) == 0) + debug_lock = true; +#endif +#ifdef CONFIG_DEBUG_SPINLOCK + if (strncmp(torture_type, "spin", 4) == 0) + debug_lock = true; +#endif lock_torture_print_module_parms(cur_ops, "Start of test"); /* Initialize the statistics so that each run gets its own numbers. */ -- 1.8.4.5
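The debug_lock detection relies on strncmp() with a prefix length, so one check covers every torture_type sharing that prefix ("spin_lock", "spin_lock_irq", and so on). A minimal sketch of the same test (here both debug config options are assumed enabled, whereas the patch compiles each check conditionally):

```c
#include <string.h>

/* Prefix test as in the patch: "mutex*" types are affected by
 * CONFIG_DEBUG_MUTEXES, "spin*" types by CONFIG_DEBUG_SPINLOCK. */
static int is_debug_lock(const char *torture_type)
{
    return strncmp(torture_type, "mutex", 5) == 0 ||
           strncmp(torture_type, "spin", 4) == 0;
}
```

This is why the printed header gains a " [debug]" suffix only for lock types whose behavior the debug options actually change.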
[PATCH -tip 0/9] locktorture: Improve and expand lock torturing
This set includes general updates throughout the locktorture code. In particular, support for reader locks is added, as well as torturing mutexes and rwsems. With the recent locking changes, it doesn't hurt to improve our testing infrastructure, and torturing is definitely part of that. For specific details about each change, please consult the actual patches. o patches 1, 4, 9: misc changes. o patch 2: new doc, based on rcutorture's. o patches 3, 8: torture new locking primitives. o patches 5, 7: add support for reader locks. o patch 6: fix a minor race in the torture cleanup path. In no particular order; please consider for v3.18. Davidlohr Bueso (9): locktorture: Rename locktorture_runnable parameter locktorture: Add documentation locktorture: Support mutexes locktorture: Teach about lock debugging locktorture: Make statistics generic torture: Address race in module cleanup locktorture: Add infrastructure for torturing read locks locktorture: Support rwsems locktorture: Introduce torture context Documentation/locking/locktorture.txt | 140 include/linux/torture.h | 3 +- kernel/locking/locktorture.c | 392 -- kernel/rcu/rcutorture.c | 3 +- kernel/torture.c | 16 +- 5 files changed, 480 insertions(+), 74 deletions(-) create mode 100644 Documentation/locking/locktorture.txt -- 1.8.4.5
RE: [PATCH] clk: samsung: exynos5260: fix typo in clock name
Hi Tomasz, On Friday, September 12, 2014, Tomasz Figa wrote, To: Pankaj Dubey; linux-arm-ker...@lists.infradead.org; linux-samsung- s...@vger.kernel.org; linux-kernel@vger.kernel.org Cc: kgene@samsung.com; s.nawro...@samsung.com; mturque...@linaro.org; Chander Kashyap; Abhilash Kesavan Subject: Re: [PATCH] clk: samsung: exynos5260: fix typo in clock name Hi Pankaj, On 10.09.2014 07:56, Pankaj Dubey wrote: From: Chander Kashyap k.chan...@samsung.com The parent name added in parent list as mout_phyclk_mipi_dphy_4l_m_txbyte_clkhs_p, is different than the defined parent due to typo. Signed-off-by: Abhilash Kesavan a.kesa...@samsung.com Signed-off-by: Chander Kashyap k.chan...@samsung.com Missed your sign-off? You can reply with it inline and I will add it when applying this weekend. OK. P.S. Please keep me on CC when sending patches for Samsung clock drivers. Sure. Thanks, Pankaj Dubey Best regards, Tomasz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
vga and 64-bit memcpy's
Got a bug report from someone using a Silicon Motion video card in VGA mode about corruption that they tracked down to 64-bit memory operations not being supported by the video card; it appears we probably shouldn't be using 64-bit copies on VGA memory. include/linux/vt_buffer.h defaults scr_memcpyw to plain memcpy, which on 64-bit x86 machines ends up as rep movsq. Is there any way to make this use a 32-bit memcpy instead? Dave.
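One possible shape for such an override is a copy loop that never issues accesses wider than 32 bits. This is a hedged user-space sketch, not a tested kernel patch (the kernel version would live behind the arch's scr_memcpyw definition and use proper MMIO accessors rather than raw pointer casts):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy 16-bit screen words using at most 32-bit memory operations,
 * avoiding the 64-bit rep movsq that a plain memcpy() may emit. */
static void scr_memcpyw_32(uint16_t *dst, const uint16_t *src,
                           size_t count_bytes)
{
    size_t nwords = count_bytes / 2;

    /* Pair up 16-bit words into 32-bit accesses where both pointers
     * are 4-byte aligned (alignment is preserved across iterations). */
    while (nwords >= 2 &&
           (uintptr_t)dst % 4 == 0 && (uintptr_t)src % 4 == 0) {
        *(uint32_t *)dst = *(const uint32_t *)src;
        dst += 2;
        src += 2;
        nwords -= 2;
    }
    /* Tail word, or entirely unaligned buffers: 16-bit stores. */
    while (nwords--)
        *dst++ = *src++;
}
```

The point is only the access width: correctness of the copied data is unchanged, but the adapter never sees a 64-bit cycle.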
Re: [PATCH v3 3/3] sched: BUG when stack end location is over written
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote: diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a285900..2a8280a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -824,6 +824,18 @@ config SCHEDSTATS application, you can say N to avoid the very slight overhead this adds. +config SCHED_STACK_END_CHECK + bool Detect stack corruption on calls to schedule() + depends on DEBUG_KERNEL + default y Did you really mean default y? Doing so means it will be turned on more or less everywhere, which defeats the purpose of having a config option in the first place. diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ec1a286..0b70b73 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct *prev) */ static inline void schedule_debug(struct task_struct *prev) { +#ifdef CONFIG_SCHED_STACK_END_CHECK + BUG_ON(unlikely(task_stack_end_corrupted(prev))) +#endif If this was my code I'd make you put that in a static inline. cheers -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] checkpatch: look for common misspellings
Talking about codespell: it detected 76 occurrences of 'informations' in 3.17-rc4, while grep -R informations * | wc -l found 120 typos. Testing with 'occured', codespell found 46, grep found 110. Testing with the 'reseting' case, codespell found 21, grep found 26. So I expect about half of the incoming typos will be detected by the tool, and be fixed. Masanari
Re: [PATCH v5] ASoC: tda998x: Add a codec to the HDMI transmitter
On 10 September 2014 19:29, Jean-Francois Moine moin...@free.fr wrote: This patch adds a CODEC function to the NXP TDA998x HDMI transmitter. The CODEC handles both I2S and S/PDIF inputs. It maintains the audio format and rate constraints according to the HDMI device parameters (EDID) and does dynamic input switch in the TDA998x I2C driver on start/stop audio streaming. You should indicate on subsystem spanning patches what tree you think should merge it etc. If other tda998x ppl are okay with it, you can have my ack for merging via someone else. Dave. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[git pull] drm fixes
Hi Linus, ast, i915, radeon and msm fixes, all over the place, all fixing build issues, regressions, oopses or failure to detect cards. Dave. The following changes since commit 7ec62d421bdf29cb31101ae2689f7f3a9906289a: Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2014-09-10 14:04:17 -0700) are available in the git repository at: git://people.freedesktop.org/~airlied/linux drm-fixes for you to fetch changes up to 83502a5d34386f7c6973bc70e1c423f55f5a2e3a: drm/ast: AST2000 cannot be detected correctly (2014-09-12 13:41:39 +1000) Alex Deucher (3): drm/radeon: only use me/pfp sync on evergreen+ drm/radeon: add connector quirk for fujitsu board drm/radeon/dpm: set the thermal type properly for special configs Andy Shevchenko (1): drm/radeon: reduce memory footprint for debugging Chris Wilson (2): drm/i915: Prevent recursive deadlock on releasing a busy userptr drm/i915: Evict CS TLBs between batches Christian König (1): drm/radeon: fix semaphore value init Daniel Vetter (2): drm/i915: Fix EIO/wedged handling in gem fault handler drm/i915: Fix irq enable tracking in driver load Dave Airlie (3): Merge branch 'drm-fixes-3.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes Merge tag 'drm-intel-fixes-2014-09-10' of git://anongit.freedesktop.org/drm-intel into drm-fixes Merge branch 'msm-fixes-3.17-rc4' of git://people.freedesktop.org/~robclark/linux into drm-fixes Mark Charlebois (1): drm/msm: Change nested function to static function Rob Clark (2): drm/msm/hdmi: fix build break on non-CCF platforms drm/msm: don't crash if no msm.vram param Ville Syrjälä (1): drm/i915: Wait for vblank before enabling the TV encoder Y.C. 
Chen (2): drm/ast: open key before detect chips drm/ast: AST2000 cannot be detected correctly drivers/gpu/drm/ast/ast_main.c| 3 +- drivers/gpu/drm/i915/i915_dma.c | 9 +- drivers/gpu/drm/i915/i915_drv.h | 10 +- drivers/gpu/drm/i915/i915_gem.c | 11 +- drivers/gpu/drm/i915/i915_gem_userptr.c | 409 +- drivers/gpu/drm/i915/i915_reg.h | 12 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 66 +++-- drivers/gpu/drm/i915/intel_tv.c | 4 + drivers/gpu/drm/msm/hdmi/hdmi.c | 46 ++-- drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c | 15 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/radeon/atombios_dp.c | 7 +- drivers/gpu/drm/radeon/r600.c | 4 +- drivers/gpu/drm/radeon/radeon_atombios.c | 33 ++- drivers/gpu/drm/radeon/radeon_semaphore.c | 2 +- 15 files changed, 371 insertions(+), 262 deletions(-)
[PATCH] mfd: intel_soc_pmic: Add CONFIG_PM_SLEEP check for suspend_fn/resume_fn
This patch fixes a warning seen with CONFIG_PM_SLEEP disabled. If CONFIG_PM_SLEEP is not enabled we receive the following warning: drivers/mfd/intel_soc_pmic_core.c:118:12: warning: 'intel_soc_pmic_suspend' defined but not used Signed-off-by: Jaewon Kim jaewon02@samsung.com --- drivers/mfd/intel_soc_pmic_core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mfd/intel_soc_pmic_core.c b/drivers/mfd/intel_soc_pmic_core.c index 2720922..df7b064 100644 --- a/drivers/mfd/intel_soc_pmic_core.c +++ b/drivers/mfd/intel_soc_pmic_core.c @@ -115,6 +115,7 @@ static void intel_soc_pmic_shutdown(struct i2c_client *i2c) return; } +#if defined(CONFIG_PM_SLEEP) static int intel_soc_pmic_suspend(struct device *dev) { struct intel_soc_pmic *pmic = dev_get_drvdata(dev); @@ -132,6 +133,7 @@ static int intel_soc_pmic_resume(struct device *dev) return 0; } +#endif static SIMPLE_DEV_PM_OPS(intel_soc_pmic_pm_ops, intel_soc_pmic_suspend, intel_soc_pmic_resume);
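A common alternative to the #ifdef pair is annotating the callbacks __maybe_unused, which keeps them compiling in every configuration while silencing the warning. A sketch with the attribute stubbed so it builds outside the kernel (in-kernel it comes from linux/compiler.h; function names here are illustrative, not the driver's):

```c
/* Stub of the kernel's __maybe_unused annotation. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

struct device; /* opaque, as callers only pass pointers */

/* With __maybe_unused there is no warning when CONFIG_PM_SLEEP=n and
 * SIMPLE_DEV_PM_OPS() compiles out the references -- and no #ifdef
 * pair that must be kept in sync with the macro's behavior. */
static int __maybe_unused example_suspend(struct device *dev)
{
    (void)dev; /* IRQ disabling / state save would go here */
    return 0;
}

static int __maybe_unused example_resume(struct device *dev)
{
    (void)dev; /* IRQ re-enable / state restore would go here */
    return 0;
}
```

Whether to prefer this over #if defined(CONFIG_PM_SLEEP) is a style call; the attribute avoids ifdef clutter, while the #ifdef version removes the dead code entirely.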
[PATCH 8/9] locktorture: Support rwsems
We can easily do so with our new reader lock support. Just an arbitrary design default: readers have higher (5x) critical region latencies than writers: 50 ms and 10 ms, respectively. Signed-off-by: Davidlohr Bueso dbu...@suse.de --- Documentation/locking/locktorture.txt | 2 ++ kernel/locking/locktorture.c | 68 ++- 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index 1bdeb71..f7d99e2 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -47,6 +47,8 @@ torture_typeType of lock to torture. By default, only spinlocks will o mutex_lock: mutex_lock() and mutex_unlock() pairs. +o rwsem_lock: read/write down() and up() semaphore pairs. + torture_runnable Start locktorture at module init. By default it will begin once the module is loaded. diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index c1073d7..8480118 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -265,6 +265,71 @@ static struct lock_torture_ops mutex_lock_ops = { .name = mutex_lock }; +static DECLARE_RWSEM(torture_rwsem); +static int torture_rwsem_down_write(void) __acquires(torture_rwsem) +{ + down_write(torture_rwsem); + return 0; +} + +static void torture_rwsem_write_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 10); + else + mdelay(longdelay_ms / 10); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealwriters_stress * 2))) + preempt_schedule(); /* Allow test to be preempted. 
*/ +#endif +} + +static void torture_rwsem_up_write(void) __releases(torture_rwsem) +{ + up_write(torture_rwsem); +} + +static int torture_rwsem_down_read(void) __acquires(torture_rwsem) +{ + down_read(torture_rwsem); + return 0; +} + +static void torture_rwsem_read_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 2); + else + mdelay(longdelay_ms / 2); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealreaders_stress * 2))) + preempt_schedule(); /* Allow test to be preempted. */ +#endif +} + +static void torture_rwsem_up_read(void) __releases(torture_rwsem) +{ + up_read(torture_rwsem); +} + +static struct lock_torture_ops rwsem_lock_ops = { + .writelock = torture_rwsem_down_write, + .write_delay= torture_rwsem_write_delay, + .writeunlock= torture_rwsem_up_write, + .readlock = torture_rwsem_down_read, + .read_delay = torture_rwsem_read_delay, + .readunlock = torture_rwsem_up_read, + .name = rwsem_lock +}; + /* * Lock torture writer kthread. Repeatedly acquires and releases * the lock, checking for duplicate acquisitions. @@ -467,7 +532,8 @@ static int __init lock_torture_init(void) int i, j; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { - lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, mutex_lock_ops, + lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, + mutex_lock_ops, rwsem_lock_ops, }; if (!torture_init_begin(torture_type, verbose, torture_runnable)) -- 1.8.4.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/9] locktorture: Add infrastructure for torturing read locks
Most of it is based on what we already have for writers. This allows readers
to be very independent (and thus configurable), enabling future module
parameters to control things such as rw distribution. Furthermore, readers
have their own delaying function, allowing us to test different rw critical
region latencies and stress locking internals. Similarly, statistics will,
for now, only track the number of lock acquisitions -- as opposed to
writers, readers have no failure detection.

In addition, introduce a new nreaders_stress module parameter. The default
number of readers is the same as the number of writer threads. Writer
threads are interleaved with readers. Documentation is updated accordingly.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 Documentation/locking/locktorture.txt |  16 +++-
 kernel/locking/locktorture.c          | 176 ++
 2 files changed, 168 insertions(+), 24 deletions(-)

diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt
index 6b1e7ca..1bdeb71 100644
--- a/Documentation/locking/locktorture.txt
+++ b/Documentation/locking/locktorture.txt
@@ -29,6 +29,11 @@ nwriters_stress   Number of kernel threads that will stress exclusive lock
		  ownership (writers). The default value is twice the amount
		  of online CPUs.

+nreaders_stress	  Number of kernel threads that will stress shared lock
+		  ownership (readers). The default is the same amount of writer
+		  locks. If the user did not specify nwriters_stress, then
+		  both readers and writers be the amount of online CPUs.
+
 torture_type	  Type of lock to torture. By default, only spinlocks will
		  be tortured. This module can torture the following locks,
		  with string values as follows:
@@ -95,15 +100,18 @@ STATISTICS

 Statistics are printed in the following format:

 spin_lock-torture: Writes:  Total: 93746064  Max/Min: 0/0   Fail: 0
-   (A)    (B)   (C)    (D)
+   (A)    (B)   (C)    (D)    (E)

 (A): Lock type that is being tortured -- torture_type parameter.

-(B): Number of times the lock was acquired.
+(B): Number of writer lock acquisitions. If dealing with a read/write primitive
+     a second "Reads" statistics line is printed.
+
+(C): Number of times the lock was acquired.

-(C): Min and max number of times threads failed to acquire the lock.
+(D): Min and max number of times threads failed to acquire the lock.

-(D): true/false values if there were errors acquiring the lock. This should
+(E): true/false values if there were errors acquiring the lock. This should
      -only- be positive if there is a bug in the locking primitive's
      implementation. Otherwise a lock should never fail (ie: spin_lock()).
      Of course, the same applies for (C), above. A dummy example of this is

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 988267c..c1073d7 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -52,6 +52,8 @@ MODULE_AUTHOR("Paul E. McKenney <paul...@us.ibm.com>");

 torture_param(int, nwriters_stress, -1,
	     "Number of write-locking stress-test threads");
+torture_param(int, nreaders_stress, -1,
+	     "Number of read-locking stress-test threads");
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
 torture_param(int, onoff_interval, 0,
	     "Time between CPU hotplugs (s), 0=disable");
@@ -74,15 +76,19 @@ static atomic_t n_lock_torture_errors;

 static struct task_struct *stats_task;
 static struct task_struct **writer_tasks;
+static struct task_struct **reader_tasks;

 static int nrealwriters_stress;
 static bool lock_is_write_held;
+static int nrealreaders_stress;
+static bool lock_is_read_held;

 struct lock_stress_stats {
	long n_lock_fail;
	long n_lock_acquired;
 };
 static struct lock_stress_stats *lwsa; /* writer statistics */
+static struct lock_stress_stats *lrsa; /* reader statistics */

 #if defined(MODULE)
 #define LOCKTORTURE_RUNNABLE_INIT 1
@@ -104,6 +110,9 @@ struct lock_torture_ops {
	int (*writelock)(void);
	void (*write_delay)(struct torture_random_state *trsp);
	void (*writeunlock)(void);
+	int (*readlock)(void);
+	void (*read_delay)(struct torture_random_state *trsp);
+	void (*readunlock)(void);
	unsigned long flags;
	const char *name;
 };
@@ -142,6 +151,9 @@ static struct lock_torture_ops lock_busted_ops = {
	.writelock	= torture_lock_busted_write_lock,
	.write_delay	= torture_lock_busted_write_delay,
	.writeunlock	= torture_lock_busted_write_unlock,
+	.readlock	= NULL,
+	.read_delay	= NULL,
+	.readunlock	= NULL,
	.name		= "lock_busted"
 };

@@ -182,6 +194,9 @@ static struct lock_torture_ops
Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
On 09/11/2014 04:37 PM, Thomas Gleixner wrote:
>> Specifically because marshaling the data in and out of the generic
>> decoder was more complex than a special-purpose decoder.
>
> I did not look at that detail and I trust your judgement here, but that
> is in no way explained in the changelog. This whole patchset is a pain
> to review due to half-baked changelogs and a complete lack of a proper
> design description.

I'm not wedded to that concept, by the way, but using the generic parser had
a whole bunch of its own problems, including the fact that you're getting
bytes from user space. It might be worthwhile to compare the older patchset
which did use the generic parser to make sure that it actually made sense.

	-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 9/9] locktorture: Introduce torture context
The number of global variables is getting pretty ugly. Group variables
related to the execution of the test (ie: not parameters) into a new
context structure.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 kernel/locking/locktorture.c | 161 ++-
 1 file changed, 82 insertions(+), 79 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 8480118..540d5df 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -66,29 +66,22 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(bool, verbose, true, "Enable verbose debugging printk()s");

-static bool debug_lock = false;
 static char *torture_type = "spin_lock";
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type,
		 "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)");

-static atomic_t n_lock_torture_errors;
-
 static struct task_struct *stats_task;
 static struct task_struct **writer_tasks;
 static struct task_struct **reader_tasks;

-static int nrealwriters_stress;
 static bool lock_is_write_held;
-static int nrealreaders_stress;
 static bool lock_is_read_held;

 struct lock_stress_stats {
	long n_lock_fail;
	long n_lock_acquired;
 };
-static struct lock_stress_stats *lwsa; /* writer statistics */
-static struct lock_stress_stats *lrsa; /* reader statistics */

 #if defined(MODULE)
 #define LOCKTORTURE_RUNNABLE_INIT 1
@@ -117,8 +110,18 @@ struct lock_torture_ops {
	const char *name;
 };

-static struct lock_torture_ops *cur_ops;
-
+struct lock_torture_cxt {
+	int nrealwriters_stress;
+	int nrealreaders_stress;
+	bool debug_lock;
+	atomic_t n_lock_torture_errors;
+	struct lock_torture_ops *cur_ops;
+	struct lock_stress_stats *lwsa; /* writer statistics */
+	struct lock_stress_stats *lrsa; /* reader statistics */
+};
+static struct lock_torture_cxt cxt = { 0, 0, false,
+				       ATOMIC_INIT(0),
+				       NULL, NULL};
 /*
  * Definitions for lock torture testing.
  */
@@ -134,10 +137,10 @@ static void torture_lock_busted_write_delay(struct torture_random_state *trsp)

	/* We want a long delay occasionally to force massive contention. */
	if (!(torture_random(trsp) %
-	      (nrealwriters_stress * 2000 * longdelay_us)))
+	      (cxt.nrealwriters_stress * 2000 * longdelay_us)))
		mdelay(longdelay_us);
 #ifdef CONFIG_PREEMPT
-	if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
		preempt_schedule(); /* Allow test to be preempted. */
 #endif
 }
@@ -174,13 +177,13 @@ static void torture_spin_lock_write_delay(struct torture_random_state *trsp)
	 * we want a long delay occasionally to force massive contention.
	 */
	if (!(torture_random(trsp) %
-	      (nrealwriters_stress * 2000 * longdelay_us)))
+	      (cxt.nrealwriters_stress * 2000 * longdelay_us)))
		mdelay(longdelay_us);
	if (!(torture_random(trsp) %
-	      (nrealwriters_stress * 2 * shortdelay_us)))
+	      (cxt.nrealwriters_stress * 2 * shortdelay_us)))
		udelay(shortdelay_us);
 #ifdef CONFIG_PREEMPT
-	if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
		preempt_schedule(); /* Allow test to be preempted. */
 #endif
 }
@@ -206,14 +209,14 @@ __acquires(torture_spinlock_irq)
	unsigned long flags;

	spin_lock_irqsave(&torture_spinlock, flags);
-	cur_ops->flags = flags;
+	cxt.cur_ops->flags = flags;
	return 0;
 }

 static void torture_lock_spin_write_unlock_irq(void)
 __releases(torture_spinlock)
 {
-	spin_unlock_irqrestore(&torture_spinlock, cur_ops->flags);
+	spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags);
 }

 static struct lock_torture_ops spin_lock_irq_ops = {
@@ -240,12 +243,12 @@ static void torture_mutex_delay(struct torture_random_state *trsp)

	/* We want a long delay occasionally to force massive contention. */
	if (!(torture_random(trsp) %
-	      (nrealwriters_stress * 2000 * longdelay_ms)))
+	      (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
		mdelay(longdelay_ms * 5);
	else
		mdelay(longdelay_ms / 5);
 #ifdef CONFIG_PREEMPT
-	if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+	if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
		preempt_schedule(); /* Allow test to be preempted. */
 #endif
 }
@@ -278,12 +281,12 @@ static void torture_rwsem_write_delay(struct torture_random_state *trsp)

	/* We want a long delay occasionally to force massive contention. */
	if
Re: [PATCH v2] checkpatch: look for common misspellings
On Fri, 2014-09-12 at 13:09 +0900, Masanari Iida wrote:
> Test with reseting case, codespell found 21, grep found 26.

Hello Masanari.

How did codespell find any uses of "reseting"? What version of codespell
are you using? (I tested with 1.7.)

Looking at the git tree for codespell,
https://github.com/lucasdemarchi/codespell.git
the dictionary there doesn't have "reseting". If I add
"reseting->resetting" to the dictionary, then codespell finds the same 31
uses that "git grep -i" does.
Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking
On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
> When zsmalloc creates a new zspage, it initializes each object it contains
> with a link to the next object, so that the zspage has a singly-linked
> list of its free objects. However, the logic that sets up the links is
> wrong, and in the case of objects that are precisely aligned with the page
> boundaries (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first
> object on the next page is skipped, due to incrementing the offset twice.
> The logic can be simplified, as it doesn't need to calculate how many
> objects can fit on the current page; simply checking the offset for each
> object is enough.

If objects are precisely aligned with the page boundary, pages_per_zspage
should be 1 so there is no next page.

> Change zsmalloc init_zspage() logic to iterate through each object on
> each of its pages, checking the offset to verify the object is on the
> current page before linking it into the zspage.
>
> Signed-off-by: Dan Streetman ddstr...@ieee.org
> Cc: Minchan Kim minc...@kernel.org
> ---
>  mm/zsmalloc.c | 14 +-
>  1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index c4a9157..03aa72f 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  	while (page) {
>  		struct page *next_page;
>  		struct link_free *link;
> -		unsigned int i, objs_on_page;
> +		unsigned int i = 1;
>
>  		/*
>  		 * page->index stores offset of first object starting
> @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  		link = (struct link_free *)kmap_atomic(page) +
>  						off / sizeof(*link);
>
> -		objs_on_page = (PAGE_SIZE - off) / class->size;
> -		for (i = 1; i <= objs_on_page; i++) {
> -			off += class->size;
> -			if (off < PAGE_SIZE) {
> -				link->next = obj_location_to_handle(page, i);
> -				link += class->size / sizeof(*link);
> -			}
> +		while ((off += class->size) < PAGE_SIZE) {
> +			link->next = obj_location_to_handle(page, i++);
> +			link += class->size / sizeof(*link);
>  		}
>
>  		/*
> @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
>  		link->next = obj_location_to_handle(next_page, 0);
>  		kunmap_atomic(link);
>  		page = next_page;
> -		off = (off + class->size) % PAGE_SIZE;
> +		off %= PAGE_SIZE;
>  	}
>  }
> --
> 1.8.3.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM, see:
> http://www.linux-mm.org/ . Don't email: em...@kvack.org

--
Kind regards,
Minchan Kim
Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
On 09/11/2014 08:02 PM, Ren, Qiaowei wrote:
> On 2014-09-11, Hansen, Dave wrote:
>> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>>> + * This function will be called by do_munmap(), and the VMAs covering
>>> + * the virtual address region start...end have already been split if
>>> + * necessary and remvoed from the VMA list.
>>
>> "remvoed" -> "removed"
>>
>>> +void mpx_unmap(struct mm_struct *mm,
>>> +		unsigned long start, unsigned long end)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = mpx_try_unmap(mm, start, end);
>>> +	if (ret == -EINVAL)
>>> +		force_sig(SIGSEGV, current);
>>> +}
>>
>> In the case of a fault during an unmap, this just ignores the situation
>> and returns silently. Where is the code to retry the freeing operation
>> outside of mmap_sem?
>
> Dave, you mean the delayed_work code? According to our discussion, it
> will be deferred to another mainline post.

OK, fine. Just please call that out in the description.