Re: [BUG] Bisected Problem with LSI PCI FC Adapter

2014-09-11 Thread Dirk Gouders
Bjorn Helgaas bhelg...@google.com writes:

 On Thu, Sep 11, 2014 at 3:24 PM, Dirk Gouders d...@gouders.net wrote:
 Bjorn Helgaas bhelg...@google.com writes:

 On Thu, Sep 11, 2014 at 2:33 PM, Dirk Gouders d...@gouders.net wrote:
 What I am currently trying is to construct a test environment so that
 I do not need to run tests and diagnostics on a busy machine.

 I noticed that this problem seems to start with the narrow Root
 Bridge window (00-07), but every other machine I had a look at
 starts with (00-ff), so those will not trigger my problem.

 I thought I could perhaps try to shrink the window in
 acpi_pci_root_add() to trigger the problem and that kind of works: it
 triggers it but not exactly the same way, because it basically ends at
 this code in pci_scan_bridge():

 if (max >= bus->busn_res.end) {
 	dev_warn(&dev->dev, "can't allocate child bus %02x from "
 		 "%pR (pass %d)\n",
 		 max, &bus->busn_res, pass);
 	goto out;
 }

 If this could work but I am just missing a small detail, I would be
 glad to hear about it and do the first tests this way.  If it is
 complete nonsense, I will just use the machine that triggers the problem
 for the tests.

 I was about to suggest the same thing.  If the problem is related to
 the bus number change, we should be able to force that to happen on a
 different machine.  Your approach sounds good, so I'm guessing we just
 need a tweak.

 I would first double-check that the PCI adapters are identical,
 including the firmware on the card.  Can you also include your patch
 and the resulting dmesg (with debug enabled as before)?

 Currently I am at home, just doing tests to improve my understanding,
 which I can hopefully use when I am back in the office.

 I already noticed that the backup FC Adapter on the test machine is not
 exactly the same: it is Rev. 1 whereas the one on the failing machine is
 Rev. 2.

 So, here at home my tests let a NIC disappear.  This is different from
 the original problem, but I was just trying to reconstruct the scenario
 of a misconfigured bridge causing a reconfiguration.

 What I was trying is:

 diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
 index e6ae603..fd146b3 100644
 --- a/drivers/acpi/pci_root.c
 +++ b/drivers/acpi/pci_root.c
 @@ -556,6 +556,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
  	strcpy(acpi_device_name(device), ACPI_PCI_ROOT_DEVICE_NAME);
  	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
  	device->driver_data = root;
 +	root->secondary.end = 0x02;
 
  	pr_info(PREFIX "%s [%s] (domain %04x %pR)\n",
 	       acpi_device_name(device), acpi_device_bid(device),

 The device that disappears is a NIC:

 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
 00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
 00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
 00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
 00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
 00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
 00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4)
 00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
 00:1f.0 ISA bridge: Intel Corporation B75 Express Chipset LPC Controller (rev 04)
 00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
 00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

 This is the one that is missing with the above change:
 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

 This situation is a little different, so I don't think you're
 reproducing the situation we want to test.  On this box, you have:

 pci_bus 0000:00: root bus resource [bus 00-02]
 pci 0000:00:1c.0: PCI bridge to [bus 01]
 pci 0000:00:1c.4: PCI bridge to [bus 02]

 so we find all the devices on bus 00 and bus 02 (there's nothing on
 bus 01).  My guess is the 03:00.0 device is 

Re: [PATCH RFC 1/4] xen, blkfront: add support for the multi-queue block layer API

2014-09-11 Thread Arianna Avanzini
On Fri, Aug 22, 2014 at 08:02:14AM -0700, Christoph Hellwig wrote:
 Hi Arianna,
 
 thanks for doing this work!

Thank you for the comments, and sorry that it took so long for me to reply.

 
 keeping both the legacy and blk-mq paths is fine for testing, but before
 you submit the code for inclusion please make sure the blk-mq path is
 unconditionally better, and remove the legacy one, similar to most
 drivers we converted (virtio, mtip, soon nvme)

Thank you for the suggestion. In v2 I have just replaced the legacy path. For
testing I was just using the IOmeter script provided with fio that Konrad
Wilk showed me. Is there any other test I should do?

 
  +static int blkfront_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
  +{
  +	struct blkfront_info *info = req->rq_disk->private_data;
  +
  +	pr_debug("Entered blkfront_queue_rq\n");
  +
  +	spin_lock_irq(&info->io_lock);
  +	if (RING_FULL(&info->ring))
  +		goto wait;
  +
  +	if ((req->cmd_type != REQ_TYPE_FS) ||
  +	    ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
  +	     !info->flush_op)) {
  +		req->errors = -EIO;
  +		blk_mq_complete_request(req);
  +		spin_unlock_irq(&info->io_lock);
  +		return BLK_MQ_RQ_QUEUE_ERROR;
 
 
  +	if (blkif_queue_request(req)) {
  +wait:
 
 Just a small style nitpick: goto labels inside conditionals are not
 very easy to understand.  Just add another goto here and move the wait
 label and its code to the very end of the function.

Right, thanks!
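
For reference, the restructured shape ends up roughly like this in v2 (a sketch of the suggested style only, not the final code):

	if (blkif_queue_request(req)) {
		blk_mq_requeue_request(req);
		goto wait;		/* second goto instead of a label inside the conditional */
	}
	...
	return BLK_MQ_RQ_QUEUE_OK;

wait:
	/* label and its code moved to the very end of the function */
	blk_mq_stop_hw_queue(hctx);
	spin_unlock_irq(&info->io_lock);
	return BLK_MQ_RQ_QUEUE_BUSY;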

 
  +static int blkfront_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
  + unsigned int index)
  +{
  +   return 0;
  +}
 
 There is no need to have an empty implementation of this function,
 the blk-mq code is fine with not having one.
 
  +static void blkfront_complete(struct request *req)
  +{
  +	blk_mq_end_io(req, req->errors);
  +}
 
 No need to have this one either, blk_mq_end_io is the default I/O
 completion implementation if no other one is provided.
 

Right, I have removed the empty stub implementation.

  +	memset(&info->tag_set, 0, sizeof(info->tag_set));
  +	info->tag_set.ops = &blkfront_mq_ops;
  +	info->tag_set.nr_hw_queues = hardware_queues;
  +	info->tag_set.queue_depth = BLK_RING_SIZE;
  +	info->tag_set.numa_node = NUMA_NO_NODE;
  +	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 
 You probably also want the recently added BLK_MQ_F_SG_MERGE flag,
 and maybe BLK_MQ_F_SHOULD_SORT depending on the speed of the device.
 
 Does Xenstore expose something like a rotational flag to key off whether
 we want to do guest side merging/scheduling?
 

As far as I know, it doesn't. Do you think that it would be useful to
advertise that information? (By the way, I saw that the BLK_MQ_F_SHOULD_SORT
flag has been removed, I suppose it has really taken me too much time to
reply to your e-mail).
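
If such a key were ever added, gathering it in the frontend could look roughly like the other feature lookups (a sketch only: "rotational" is a hypothetical key name, not part of the blkif protocol, and info/err are assumed from the surrounding probe code):

	unsigned int rotational = 0;

	/* hypothetical key; fall back to 0 (non-rotational) when absent */
	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
			    "rotational", "%u", &rotational, NULL);
	if (err)
		rotational = 0;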

  +	info->tag_set.cmd_size = 0;
  +	info->tag_set.driver_data = info;
  +
  +	if (blk_mq_alloc_tag_set(&info->tag_set))
  +		return -1;
  +	rq = blk_mq_init_queue(&info->tag_set);
  +	if (!rq) {
  +		blk_mq_free_tag_set(&info->tag_set);
  +		return -1;
 
 It seems like returning -1 is the existing style in this driver, but
 it's generally preferable to return a real errno.
 

Right, and the handling of the return value of blk_mq_init_queue() was also
wrong (it returns ERR_PTR() rather than NULL). I have fixed that in the
upcoming v2.
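
The corrected shape, matching the hunk in patch 1/5 quoted further below, is:

	rq = blk_mq_init_queue(&info->tag_set);
	if (IS_ERR(rq)) {	/* blk_mq_init_queue() returns ERR_PTR(), not NULL */
		blk_mq_free_tag_set(&info->tag_set);
		return PTR_ERR(rq);
	}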



Re: futex_wait_setup sleeping while atomic bug.

2014-09-11 Thread Davidlohr Bueso
On Thu, 2014-09-11 at 23:52 +0200, Thomas Gleixner wrote:
 From: Thomas Gleixner t...@linutronix.de
 Date: Thu, 11 Sep 2014 23:44:35 +0200
 Subject: futex: Unlock hb-lock in futex_wait_requeue_pi() error path

That's the second time we have been bitten by bugs when requeueing, now
with PI. We need to reconsider some of our testing tools to stress these
paths better, imo.

 futex_wait_requeue_pi() calls futex_wait_setup(). If
 futex_wait_setup() succeeds it returns with hb->lock held and
 preemption disabled. Now the sanity check after this does:
 
 	if (match_futex(&q.key, &key2)) {
 		ret = -EINVAL;
 		goto out_put_keys;
 	}
 
 which releases the keys but does not release hb->lock. So we happily
 return to user space with hb->lock held and therefore preemption
 disabled.
 
 Unlock hb->lock before taking the exit route.
 
 Reported-by: Dave "Trinity" Jones da...@redhat.com
 Signed-off-by: Thomas Gleixner t...@linutronix.de

Reviewed-by: Davidlohr Bueso d...@stgolabs.net
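
For context, the fix amounts to releasing the hash-bucket lock in that error path before bailing out, roughly (helper name as in futex.c of that era; treat this as a sketch):

	if (match_futex(&q.key, &key2)) {
		queue_unlock(hb);	/* was missing: releases hb->lock */
		ret = -EINVAL;
		goto out_put_keys;
	}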




Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer

2014-09-11 Thread Doug Anderson
Marc,

On Thu, Sep 11, 2014 at 10:43 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 11/09/14 18:29, Doug Anderson wrote:
 Marc,

 On Thu, Sep 11, 2014 at 10:22 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 We would need to run this code potentially at processor bringup and
 after suspend/resume, but that seems possible too.

 Note that this would be an ARMv7 only thing (you can't do that on ARMv8,
 at all).

 Yes, of course.


 Is the transition to monitor mode and back simple?  Where would you
 suggest putting this code?  It would definitely need to be pretty
 early.  We'd also need to be able to detect that we're in Secure SVC
 and not mess up anyone else who happened to boot in Non Secure SVC.

 This would have to live in some very early platform-specific code. The
 ugly part is that you cannot find out what world you're in (accessing
 SCR is going to send you to UNDEF-land if accessed from NS).

 Yup, so the question is: would such code be accepted upstream, or are
 we going to embark on a big job for someone to figure out how to do
 this only to get NAKed?

 If there was some indication that folks would take this, I think we
 might be able to get it coded up.  If someone else wanted to volunteer
 to code it that would make me even happier, but maybe that's pushing
 my luck.  ;)

 Writing the code is a 5 minute job. Getting it accepted is another
 story, and I'm not sure everyone would agree on that.

 If I was suicidal, I'd suggest you could pass a parameter on the command
 line, interpreted by the timer code... But since I'm not, let's
 pretend I haven't said anything... ;-)

 I did this in the past (again, see Sonny's thread), but didn't
 consider myself knowledgeable enough to know if that was truly a good test:

	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);
	val |= (1 << 2);
	asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
	val = 0x0;
	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);

 The idea being that if you can make modifications to the SCR register
 (and see your changes take effect) then you must be in secure mode.
 In my case the first printout was 0x0 and the second was 0x4.

 The main issue is when you're *not* in secure mode. It is likely that
 this will explode badly. This is why I suggested something that is set
 by the bootloader (after all, it knows which mode it was booted in), and
 that the timer driver can use when the CPU comes up.

 Still, very ugly...

Ah, got it.  Well, unless someone can suggest a clean way to do this,
then I guess we'll keep what we've got...

-Doug


Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer

2014-09-11 Thread Stephen Boyd
On 09/11/14 10:43, Marc Zyngier wrote:
 If I was suicidal, I'd suggest you could pass a parameter on the command
 line, interpreted by the timer code... But since I'm not, let's
 pretend I haven't said anything... ;-)
 I did this in the past (again, see Sonny's thread), but didn't
 consider myself knowledgeable enough to know if that was truly a good test:

	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);
	val |= (1 << 2);
	asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
	val = 0x0;
	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);

 The idea being that if you can make modifications to the SCR register
 (and see your changes take effect) then you must be in secure mode.
 In my case the first printout was 0x0 and the second was 0x4.
 The main issue is when you're *not* in secure mode. It is likely that
 this will explode badly. This is why I suggested something that is set
 by the bootloader (after all, it knows which mode it was booted in), and
 that the timer driver can use when the CPU comes up.

Where does this platform jump to when a CPU comes up? Is it
rockchip_secondary_startup()? I wonder if that path could have this
little bit of assembly to poke the cntvoff in monitor mode and then jump
to secondary_startup()? Before we boot any secondary CPUs we could also
read the cntvoff for CPU0 in the platform-specific layer (where we know
we're running in secure mode) and then use that value as the reset
value for the secondaries. Or does this platform boot up in secure mode
sometimes and non-secure mode other times?
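
A minimal sketch of that poke, assuming the CPU really is in secure SVC (CNTVOFF is writable only from HYP mode, or from monitor mode with SCR.NS=1, so the real code needs the full mode switch around the store; this is an illustration, not tested code):

	/* zero the virtual counter offset; must run where CNTVOFF is writable */
	asm volatile(
		"mov	r0, #0\n\t"
		"mov	r1, #0\n\t"
		"mcrr	p15, 4, r0, r1, c14\n\t"	/* CNTVOFF = 0 */
		: : : "r0", "r1");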

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation



[PATCH/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu

2014-09-11 Thread Joonwoo Park
When a deferrable work item (INIT_DEFERRABLE_WORK, etc.) is queued via
queue_delayed_work(), it's probably intended to run the work item on any
CPU that isn't idle. However, we queue the work to run at a later time
by starting a deferrable timer that binds to whatever CPU the work is
queued on, which is effectively the same as
queue_delayed_work_on(smp_processor_id()).

As a result, WORK_CPU_UNBOUND work items aren't really cpu unbound now.
In fact this is perfectly fine on a UP kernel, and it also won't affect
an SMP system without dyntick much, as every cpu runs timers
periodically.  But on SMP systems with dyntick, the current
implementation makes deferrable timers not very scalable, because the
timer base that queued the deferrable timer won't wake up till the next
non-deferrable timer expires, even though other non-idle cpus may be
running that could handle the expired deferrable timers.

Deferrable work is a good example of a victim of the current
implementation, as shown below.

INIT_DEFERRABLE_WORK(&dwork, fn);
CPU 0                                 CPU 1
queue_delayed_work(wq, &dwork, HZ);
  queue_delayed_work_on(WORK_CPU_UNBOUND);
  ...
  __mod_timer() -> queues timer to the
                   current cpu's timer base.
  ...
tick_nohz_idle_enter() -> cpu enters idle.
A second later
cpu 0 is now in idle.                 cpu 1 exits idle or wasn't in idle, so
                                      now it's active but won't handle the
cpu 0 won't wake up till the next     cpu unbound deferrable timer, as it's
non-deferrable timer expires.         in cpu 0's timer base.

To make all cpu unbound deferrable timers scalable, introduce a common
timer base used only for cpu unbound deferrable timers, to make them
indeed cpu unbound so that they can be serviced by any non-idle cpu.
This common timer base fixes the scalability issue for delayed work and
all other users of cpu unbound deferrable timers.
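
A minimal sketch of the pattern being fixed (fn and the one-second delay are arbitrary):

	/* deferrable delayed work queued as WORK_CPU_UNBOUND... */
	static void fn(struct work_struct *work) { /* ... */ }
	static DECLARE_DEFERRABLE_WORK(dwork, fn);

	/* ...still arms its deferrable timer on the queueing cpu; if that cpu
	 * then idles under dyntick, expiry waits for the next non-deferrable
	 * timer there even when other cpus are busy */
	queue_delayed_work(system_wq, &dwork, HZ);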

cc: Thomas Gleixner t...@linutronix.de
CC: John Stultz john.stu...@linaro.org
CC: Tejun Heo t...@kernel.org
Signed-off-by: Joonwoo Park joonw...@codeaurora.org
---
 kernel/time/timer.c | 108 +++-
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index aca5dfe..655076b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -93,6 +93,9 @@ struct tvec_base {
 struct tvec_base boot_tvec_bases;
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
+#ifdef CONFIG_SMP
+static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
+#endif
 
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
@@ -655,7 +658,14 @@ static inline void debug_assert_init(struct timer_list *timer)
 static void do_init_timer(struct timer_list *timer, unsigned int flags,
 			  const char *name, struct lock_class_key *key)
 {
-	struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
+	struct tvec_base *base;
+
+#ifdef CONFIG_SMP
+	if (flags & TIMER_DEFERRABLE)
+		base = tvec_base_deferral;
+	else
+#endif
+		base = __raw_get_cpu_var(tvec_bases);
 
 	timer->entry.next = NULL;
 	timer->base = (void *)((unsigned long)base | flags);
@@ -777,26 +787,32 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	debug_activate(timer, expires);
 
-	cpu = get_nohz_timer_target(pinned);
-	new_base = per_cpu(tvec_bases, cpu);
+#ifdef CONFIG_SMP
+	if (base != tvec_base_deferral) {
+#endif
+		cpu = get_nohz_timer_target(pinned);
+		new_base = per_cpu(tvec_bases, cpu);
 
-	if (base != new_base) {
-		/*
-		 * We are trying to schedule the timer on the local CPU.
-		 * However we can't change timer's base while it is running,
-		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
-		 */
-		if (likely(base->running_timer != timer)) {
-			/* See the comment in lock_timer_base() */
-			timer_set_base(timer, NULL);
-			spin_unlock(&base->lock);
-			base = new_base;
-			spin_lock(&base->lock);
-			timer_set_base(timer, base);
+		if (base != new_base) {
+			/*
+			 * We are trying to schedule the timer on the local CPU.
+			 * However we can't change timer's base while it is
+			 * running, otherwise del_timer_sync() can't detect that
+			 * the timer's handler yet has not finished. 

[PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback

2014-09-11 Thread Arianna Avanzini
Hello,

this patchset adds support to the Xen PV block driver for exploiting the
multi-queue block layer API by sharing and using multiple I/O rings in the
frontend and backend. It is the result of my internship for GNOME's Outreach
Program for Women ([1]), in which I was mentored by Konrad Rzeszutek Wilk.

The patchset implements, in the backend driver, the retrieval of information
about the block layer API currently in use for a certain device and about
the number of available submission queues, if the API turns out to be the
multi-queue one. The information is then advertised to the frontend driver
via XenStore.
The frontend device can exploit such information to allocate and grant
multiple I/O rings and advertise the final number to the backend so that
it will be able to map them.
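
Concretely, the negotiation boils down to a pair of XenStore keys plus per-ring entries (key names as introduced by patches 3 and 5 below; the path layout and values shown are only illustrative):

	backend/vbd/<domid>/<vdev>/nr_supported_hw_queues = "4"
	.../device/vbd/<vdev>/nr_blk_rings = "4"
	.../device/vbd/<vdev>/ring-ref-0 ... ring-ref-3
	.../device/vbd/<vdev>/event-channel-0 ... event-channel-3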
The patchset has been tested with fio's IOmeter emulation on a four-cores
machine with a null_blk device (some results are available here: [2]).

With respect to the first version of this RFC patchset ([3]), the patchset has
undergone the following changes (as the structure of the patchset itself
has changed, I'm summarizing them here).

. Now the use of the multi-queue API replaces that of the request queue API,
  as indicated by Christoph Hellwig.
. Patch 0003 from the previous patchset has been split into two patches, the
  first introducing in the frontend actual support for multiple block rings,
  the second adding support to negotiate the number of I/O rings with the
  backend, as suggested by David Vrabel.
. Patch 0004 from the previous patchset has been split into two patches, the
  first introducing in the backend support for multiple block rings, the second
  adding support to negotiate the number of I/O rings, as suggested by David
  Vrabel.
. Added the BLK_MQ_F_SG_MERGE and BLK_MQ_F_SHOULD_SORT flags to the frontend
  driver's initialization as suggested by Christoph Hellwig.
. Removed empty/useless definition of the init_hctx and complete hooks, as
  pointed out by Christoph Hellwig.
. Removed useless debug printk()s from code added in xen-blkfront, as indicated
  by David Vrabel.
. Added return of an actual error code in the blk_mq_init_queue() failure path
  in xlvbd_init_blk_queue(), as suggested by Christoph Hellwig.
. Fixed coding style issue in blkfront_queue_rq() as suggested by Christoph
  Hellwig.

. Added support for the migration of a multi-queue-capable domU to a host with
  non-multi-queue-capable devices.
. Fixed locking issues in the interrupt path, avoiding to grab the io_lock
  twice when calling blk_mq_start_stopped_hw_queues().
. Fixed wrong use of the return value of blk_mq_init_queue().
. Dropped the use of ternary operator in the macros that compute the number
  of per-ring requests and grants: now they use the max() macro.

Any comments or suggestions are more than welcome.
Thank you,
Arianna

[1] http://goo.gl/bcvHMh
[2] http://goo.gl/O8RlLL
[3] http://lkml.org/lkml/2014/8/22/158

Arianna Avanzini (5):
  xen, blkfront: port to the multi-queue block layer API
  xen, blkfront: introduce support for multiple block rings
  xen, blkfront: negotiate the number of block rings with the backend
  xen, blkback: introduce support for multiple block rings
  xen, blkback: negotiate of the number of block rings with the frontend

 drivers/block/xen-blkback/blkback.c | 377 ---
 drivers/block/xen-blkback/common.h  | 110 +++--
 drivers/block/xen-blkback/xenbus.c  | 472 +--
 drivers/block/xen-blkfront.c| 894 +---
 4 files changed, 1122 insertions(+), 731 deletions(-)

-- 
2.1.0



[PATCH RFC v2 5/5] xen, blkback: negotiate of the number of block rings with the frontend

2014-09-11 Thread Arianna Avanzini
This commit lets the backend driver advertise the number of available
hardware queues; it also implements gathering from the frontend driver
the number of rings actually available for mapping.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkback/xenbus.c | 44 +-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4f13cc..9ff6ced 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -477,6 +477,34 @@ static void xen_vbd_free(struct xen_vbd *vbd)
 	vbd->bdev = NULL;
 }
 
+static int xen_advertise_hw_queues(struct xen_blkif *blkif,
+				   struct request_queue *q)
+{
+	struct xen_vbd *vbd = &blkif->vbd;
+	struct xenbus_transaction xbt;
+	int err;
+
+	if (q && q->mq_ops)
+		vbd->nr_supported_hw_queues = q->nr_hw_queues;
+
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		BUG_ON(!blkif->be);
+		xenbus_dev_fatal(blkif->be->dev, err, "starting transaction (hw queues)");
+		return err;
+	}
+
+	err = xenbus_printf(xbt, blkif->be->dev->nodename,
+			    "nr_supported_hw_queues", "%u",
+			    blkif->vbd.nr_supported_hw_queues);
+	if (err)
+		xenbus_dev_error(blkif->be->dev, err,
+				 "writing %s/nr_supported_hw_queues",
+				 blkif->be->dev->nodename);
+
+	xenbus_transaction_end(xbt, 0);
+
+	return err;
+}
+
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 			  unsigned major, unsigned minor, int readonly,
 			  int cdrom)
@@ -484,6 +512,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	struct xen_vbd *vbd;
 	struct block_device *bdev;
 	struct request_queue *q;
+	int err;
 
 	vbd = &blkif->vbd;
 	vbd->handle   = handle;
@@ -522,6 +551,10 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (q && blk_queue_secdiscard(q))
 		vbd->discard_secure = true;
 
+	err = xen_advertise_hw_queues(blkif, q);
+	if (err)
+		return -ENOENT;
+
 	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
 		handle, blkif->domid);
return 0;
@@ -935,7 +968,16 @@ static int connect_ring(struct backend_info *be)
 
 	DPRINTK("%s", dev->otherend);
 
-	blkif->nr_rings = 1;
+	err = xenbus_gather(XBT_NIL, dev->otherend, "nr_blk_rings",
+			    "%u", &blkif->nr_rings, NULL);
+	if (err) {
+		/*
+		 * Frontend does not support multiqueue; force compatibility
+		 * mode of the driver.
+		 */
+		blkif->vbd.nr_supported_hw_queues = 0;
+		blkif->nr_rings = 1;
+	}
 
 	ring_ref = kzalloc(sizeof(unsigned long) * blkif->nr_rings, GFP_KERNEL);
 	if (!ring_ref)
-- 
2.1.0



[PATCH RFC v2 2/5] xen, blkfront: introduce support for multiple block rings

2014-09-11 Thread Arianna Avanzini
This commit introduces in xen-blkfront actual support for multiple
block rings. The number of block rings to be used is still forced
to one.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 710 +--
 1 file changed, 410 insertions(+), 300 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 109add6..9282df1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -102,30 +102,44 @@ MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default
 #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
 
 /*
+ * Data structure keeping per-ring info. A blkfront_info structure is always
+ * associated with one or more blkfront_ring_info.
+ */
+struct blkfront_ring_info
+{
+   spinlock_t io_lock;
+   int ring_ref;
+   struct blkif_front_ring ring;
+   unsigned int evtchn, irq;
+   struct blk_shadow shadow[BLK_RING_SIZE];
+   unsigned long shadow_free;
+
+   struct work_struct work;
+   struct gnttab_free_callback callback;
+   struct list_head grants;
+   struct list_head indirect_pages;
+   unsigned int persistent_gnts_c;
+
+   struct blkfront_info *info;
+   unsigned int hctx_index;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
  */
 struct blkfront_info
 {
-   spinlock_t io_lock;
struct mutex mutex;
struct xenbus_device *xbdev;
struct gendisk *gd;
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
-   int ring_ref;
-   struct blkif_front_ring ring;
-   unsigned int evtchn, irq;
+   unsigned int nr_rings;
+   struct blkfront_ring_info *rinfo;
struct request_queue *rq;
-   struct work_struct work;
-   struct gnttab_free_callback callback;
-   struct blk_shadow shadow[BLK_RING_SIZE];
-   struct list_head grants;
-   struct list_head indirect_pages;
-   unsigned int persistent_gnts_c;
-   unsigned long shadow_free;
unsigned int feature_flush;
unsigned int flush_op;
unsigned int feature_discard:1;
@@ -169,32 +183,35 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_gather_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo,
+  unsigned int segs);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
+	unsigned long free = rinfo->shadow_free;
 	BUG_ON(free >= BLK_RING_SIZE);
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			      unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
+	struct blkfront_info *info = rinfo->info;
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
 	int i = 0;
@@ -214,7 +231,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 	}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		list_add(&gnt_list_entry->node, &info->grants);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -222,7 +239,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-				 &info->grants, node) {
+				 &rinfo->grants, node) {
 		list_del(&gnt_list_entry->node);
 		if (info->feature_persistent)

Re: /proc/pid/exe symlink behavior change in =3.15.

2014-09-11 Thread Mateusz Guzik
On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
 On Sun, 7 Sep 2014 09:56:08 +0200
 Mateusz Guzik mgu...@redhat.com wrote:
 
  On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
   Hi,
   
   Starting with kernel 3.15 the 'exe' symlink under /proc/pid/ acts
   differently than it used to in all the pre-3.15 kernels.
   
   The usecase:
   
   run /root/testbin (app that just sleeps)
   cp /root/testbin /root/testbin.new
   mv /root/testbin.new /root/testbin
   ls -al /proc/`pidof testbin`/exe
   
   <=3.14: /root/testbin (deleted)
   >=3.15: /root/testbin.new (deleted)
   
   Was the change intentional? It does render my system unusable and I failed
   to find any information about such a change in the ChangeLog.
   
  
  It looks like this was already broken for long (> DNAME_INLINE_LEN)
  names.
  
  Short names share the problem since da1ce0670c14d8 vfs: add
  cross-rename.
  
  The following change to switch_names is the culprit:
  
  -	memcpy(dentry->d_iname, target->d_name.name,
  -			target->d_name.len + 1);
  -	dentry->d_name.len = target->d_name.len;
  -	return;
  +	unsigned int i;
  +	BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
  +	for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
  +		swap(((long *) dentry->d_iname)[i],
  +		     ((long *) target->d_iname)[i]);
  +	}
  
  
  Dentries can have names in an embedded structure or in an external buffer.
  
  If you take a look around you will see that the code just swaps pointers
  when both names are external. But this results in the same behaviour you
  are seeing.
  
 
 Looks like the real problem here is that __d_materialise_dentry() needs the
 old behavior of switch_names().  At least that's how it got fixed in
 grsecurity.

No.

The regression in question is an effect of using swap instead of memcpy in
switch_names, as called by d_move. The fix in grsecurity reverts to the
previous behaviour when needed and imho should be applied for the time being.

The real problem is that __d_move always switches parent dentry and
calls switch_names, which actually switches names in some cases.

Without the regression you get expected results only for short names
when you move stuff around within the same directory.

For instance, with the current code:
mv /foo/bar/baz /1/2/3

will replace the whole path.

The previous behaviour would result in /foo/bar/3 as the new path, which is
clearly still incorrect.

Leaving the old dentry under the same parent would mean that the tree
associated with the now-moved dentry will possibly need to be freed.

In addition to that, one has to deal with the need to give the renamed
dentry its new name, which possibly came from an external buffer. An idea
I came up with (atomic_t refcount; char name[0]; with ->name assigned to
the dentry) may require adding an additional field to struct dentry, which
would be bad.
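
A sketch of that idea, for illustration (the field layout is my guess, and as said it likely forces growing struct dentry):

	struct external_name {
		atomic_t refcount;	/* shared by dentries during rename */
		char name[0];		/* actual name bytes follow */
	};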

I didn't have the time yet to look at this stuff properly.

-- 
Mateusz Guzik


[PATCH RFC v2 4/5] xen, blkback: introduce support for multiple block rings

2014-09-11 Thread Arianna Avanzini
This commit adds to xen-blkback support to map and make use
of a variable number of ring buffers. The number of rings to be
mapped is forcibly set to one.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkback/blkback.c | 377 ---
 drivers/block/xen-blkback/common.h  | 110 +
 drivers/block/xen-blkback/xenbus.c  | 432 +++-
 3 files changed, 548 insertions(+), 371 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 64c60ed..b31acfb 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -80,6 +80,9 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
 		 "Maximum number of grants to map persistently");
 
+#define XEN_RING_MAX_PGRANTS(nr_rings) \
+   (max((int)(xen_blkif_max_pgrants / nr_rings), 16))
+
 /*
  * The LRU mechanism to clean the lists of persistent grants needs to
  * be executed periodically. The time interval between consecutive executions
@@ -103,71 +106,71 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to free_xenballooned_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	if (list_empty(&blkif->free_pages)) {
-		BUG_ON(blkif->free_pages_num != 0);
-		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	if (list_empty(&ring->free_pages)) {
+		BUG_ON(ring->free_pages_num != 0);
+		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 		return alloc_xenballooned_pages(1, page, false);
 	}
-	BUG_ON(blkif->free_pages_num == 0);
-	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+	BUG_ON(ring->free_pages_num == 0);
+	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
 	list_del(&page[0]->lru);
-	blkif->free_pages_num--;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	ring->free_pages_num--;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
 	return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
-				  int num)
+static inline void put_free_pages(struct xen_blkif_ring *ring,
+				  struct page **page, int num)
 {
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
-		list_add(&page[i]->lru, &blkif->free_pages);
-	blkif->free_pages_num += num;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+		list_add(&page[i]->lru, &ring->free_pages);
+	ring->free_pages_num += num;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
 	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
 	struct page *page[NUM_BATCH_FREE_PAGES];
 	unsigned int num_pages = 0;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	while (blkif->free_pages_num > num) {
-		BUG_ON(list_empty(&blkif->free_pages));
-		page[num_pages] = list_first_entry(&blkif->free_pages,
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	while (ring->free_pages_num > num) {
+		BUG_ON(list_empty(&ring->free_pages));
+		page[num_pages] = list_first_entry(&ring->free_pages,
 						   struct page, lru);
 		list_del(&page[num_pages]->lru);
-		blkif->free_pages_num--;
+		ring->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
-			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 			free_xenballooned_pages(num_pages, page);
-			spin_lock_irqsave(&blkif->free_pages_lock, flags);
+			spin_lock_irqsave(&ring->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 	if (num_pages != 0)
 		free_xenballooned_pages(num_pages, page);
 }
 
 #define vaddr(page) ((unsigned 

[PATCH RFC v2 3/5] xen, blkfront: negotiate the number of block rings with the backend

2014-09-11 Thread Arianna Avanzini
This commit implements the negotiation of the number of block rings
to be used; as a default, the number of rings is decided by the
frontend driver and is equal to the number of hardware queues that
the backend makes available. In case of guest migration towards a
host whose devices expose a different number of hardware queues, the
number of I/O rings used by the frontend driver remains the same;
XenStore keys may vary if the frontend needs to be compatible with
a host not having multi-queue support.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 95 +++-
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9282df1..77e311d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -137,7 +137,7 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	unsigned int nr_rings;
+	unsigned int nr_rings, old_nr_rings;
 	struct blkfront_ring_info *rinfo;
 	struct request_queue *rq;
 	unsigned int feature_flush;
@@ -147,6 +147,7 @@ struct blkfront_info
 	unsigned int discard_granularity;
 	unsigned int discard_alignment;
 	unsigned int feature_persistent:1;
+	unsigned int hardware_queues;
 	unsigned int max_indirect_segments;
 	int is_ready;
/* Block layer tags. */
@@ -669,7 +670,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
-	info->tag_set.nr_hw_queues = 1;
+	info->tag_set.nr_hw_queues = info->hardware_queues ? : 1;
 	info->tag_set.queue_depth = BLK_RING_SIZE;
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -938,6 +939,7 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	info->gd = NULL;
 }
 
+/* Must be called with io_lock held */
 static void kick_pending_request_queues(struct blkfront_ring_info *rinfo,
 					unsigned long *flags)
 {
@@ -1351,10 +1353,24 @@ again:
 		goto destroy_blkring;
 	}
 
+	/* Advertise the number of rings */
+	err = xenbus_printf(xbt, dev->nodename, "nr_blk_rings",
+			    "%u", info->nr_rings);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "advertising number of rings");
+		goto abort_transaction;
+	}
+
 	for (i = 0 ; i < info->nr_rings ; i++) {
-		BUG_ON(i > 0);
-		snprintf(ring_ref_s, 64, "ring-ref");
-		snprintf(evtchn_s, 64, "event-channel");
+		if (!info->hardware_queues) {
+			BUG_ON(i > 0);
+			/* Support old XenStore keys */
+			snprintf(ring_ref_s, 64, "ring-ref");
+			snprintf(evtchn_s, 64, "event-channel");
+		} else {
+			snprintf(ring_ref_s, 64, "ring-ref-%d", i);
+			snprintf(evtchn_s, 64, "event-channel-%d", i);
+		}
 		err = xenbus_printf(xbt, dev->nodename,
 				    ring_ref_s, "%u", info->rinfo[i].ring_ref);
 		if (err) {
@@ -1403,6 +1419,14 @@ again:
return err;
 }
 
+static inline int blkfront_gather_hw_queues(struct blkfront_info *info,
+					    unsigned int *nr_queues)
+{
+	return xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			     "nr_supported_hw_queues", "%u", nr_queues,
+			     NULL);
+}
+
 /**
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffer for communication with the backend, and
@@ -1414,6 +1438,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 {
 	int err, vdevice, i, r;
 	struct blkfront_info *info;
+	unsigned int nr_queues;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1472,10 +1497,19 @@ static int blkfront_probe(struct xenbus_device *dev,
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
 	dev_set_drvdata(&dev->dev, info);
 
-	/* Allocate the correct number of rings. */
-	info->nr_rings = 1;
-	pr_info("blkfront: %s: %d rings\n",
-		info->gd->disk_name, info->nr_rings);
+	/* Gather the number of hardware queues as soon as possible */
+	err = blkfront_gather_hw_queues(info, &nr_queues);
+	if (err)
+		info->hardware_queues = 0;
+	else
+		info->hardware_queues = nr_queues;
+	/*
+	 * The backend has told us the number of hw queues he wants.
+	 * Allocate the correct number of rings.
+	 */
+	

[PATCH RFC v2 1/5] xen, blkfront: port to the multi-queue block layer API

2014-09-11 Thread Arianna Avanzini
This commit introduces support for the multi-queue block layer API,
and at the same time removes the existing request_queue API support.
The changes are only structural, and the number of supported hardware
contexts is forcibly set to one.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
---
 drivers/block/xen-blkfront.c | 171 ---
 1 file changed, 80 insertions(+), 91 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5deb235..109add6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@
 
 #include linux/interrupt.h
 #include linux/blkdev.h
+#include linux/blk-mq.h
 #include linux/hdreg.h
 #include linux/cdrom.h
 #include linux/module.h
@@ -134,6 +135,8 @@ struct blkfront_info
unsigned int feature_persistent:1;
unsigned int max_indirect_segments;
int is_ready;
+   /* Block layer tags. */
+   struct blk_mq_tag_set tag_set;
 };
 
 static unsigned int nr_minors;
@@ -582,66 +585,69 @@ static inline void flush_requests(struct blkfront_info *info)
 	notify_remote_via_irq(info->irq);
 }
 
-/*
- * do_blkif_request
- *  read a block; request is in a request queue
- */
-static void do_blkif_request(struct request_queue *rq)
+static int blkfront_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 {
-	struct blkfront_info *info = NULL;
-	struct request *req;
-	int queued;
-
-	pr_debug("Entered do_blkif_request\n");
-
-	queued = 0;
-
-	while ((req = blk_peek_request(rq)) != NULL) {
-		info = req->rq_disk->private_data;
+	struct blkfront_info *info = req->rq_disk->private_data;
 
-		if (RING_FULL(&info->ring))
-			goto wait;
+	spin_lock_irq(&info->io_lock);
+	if (RING_FULL(&info->ring))
+		goto wait;
 
-		blk_start_request(req);
+	if ((req->cmd_type != REQ_TYPE_FS) ||
+	    ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
+	     !info->flush_op)) {
+		req->errors = -EIO;
+		blk_mq_complete_request(req);
+		spin_unlock_irq(&info->io_lock);
+		return BLK_MQ_RQ_QUEUE_ERROR;
+	}
 
-		if ((req->cmd_type != REQ_TYPE_FS) ||
-		    ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
-		     !info->flush_op)) {
-			__blk_end_request_all(req, -EIO);
-			continue;
-		}
+	if (blkif_queue_request(req)) {
+		blk_mq_requeue_request(req);
+		goto wait;
+	}
 
-		pr_debug("do_blk_req %p: cmd %p, sec %lx, "
-			 "(%u/%u) [%s]\n",
-			 req, req->cmd, (unsigned long)blk_rq_pos(req),
-			 blk_rq_cur_sectors(req), blk_rq_sectors(req),
-			 rq_data_dir(req) ? "write" : "read");
+	flush_requests(info);
+	spin_unlock_irq(&info->io_lock);
+	return BLK_MQ_RQ_QUEUE_OK;
 
-		if (blkif_queue_request(req)) {
-			blk_requeue_request(rq, req);
 wait:
-			/* Avoid pointless unplugs. */
-			blk_stop_queue(rq);
-			break;
-		}
-
-		queued++;
-	}
-
-	if (queued != 0)
-		flush_requests(info);
+	/* Avoid pointless unplugs. */
+	blk_mq_stop_hw_queue(hctx);
+	spin_unlock_irq(&info->io_lock);
+	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static struct blk_mq_ops blkfront_mq_ops = {
+   .queue_rq = blkfront_queue_rq,
+   .map_queue = blk_mq_map_queue,
+};
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 				unsigned int physical_sector_size,
 				unsigned int segments)
 {
 	struct request_queue *rq;
 	struct blkfront_info *info = gd->private_data;
+	int ret;
+
+	memset(&info->tag_set, 0, sizeof(info->tag_set));
+	info->tag_set.ops = &blkfront_mq_ops;
+	info->tag_set.nr_hw_queues = 1;
+	info->tag_set.queue_depth = BLK_RING_SIZE;
+	info->tag_set.numa_node = NUMA_NO_NODE;
+	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	info->tag_set.cmd_size = 0;
+	info->tag_set.driver_data = info;
 
-	rq = blk_init_queue(do_blkif_request, &info->io_lock);
-	if (rq == NULL)
-		return -1;
+	if ((ret = blk_mq_alloc_tag_set(&info->tag_set)))
+		return ret;
+	rq = blk_mq_init_queue(&info->tag_set);
+	if (IS_ERR(rq)) {
+		blk_mq_free_tag_set(&info->tag_set);
+		return PTR_ERR(rq);
+	}
+	rq->queuedata = info;
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
@@ -871,7 +877,7 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	spin_lock_irqsave(&info->io_lock, flags);
 
   

Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer

2014-09-11 Thread Doug Anderson
Stephen,

On Thu, Sep 11, 2014 at 4:56 PM, Stephen Boyd sb...@codeaurora.org wrote:
 On 09/11/14 10:43, Marc Zyngier wrote:
 If I was suicidal, I'd suggest you could pass a parameter on the command
 line, interpreted by the timer code... But since I'm not, let's
 pretend I haven't said anything... ;-)
 I did this in the past (again, see Sonny's thread), but didn't
 consider myself knowledgeable enough to know if that was truly a good test:

	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);
	val |= (1 << 2);
	asm volatile("mcr p15, 0, %0, c1, c1, 0" : : "r" (val));
	val = 0x0;
	asm volatile("mrc p15, 0, %0, c1, c1, 0" : "=r" (val));
	pr_info("DOUG: val is %#010x", val);

 The idea being that if you can make modifications to the SCR register
 (and see your changes take effect) then you must be in secure mode.
 In my case the first printout was 0x0 and the second was 0x4.
 The main issue is when you're *not* in secure mode. It is likely that
 this will explode badly. This is why I suggested something that is set
 by the bootloader (after all, it knows which mode it was booted in), and
 that the timer driver can use when the CPU comes up.

 Where does this platform jump to when a CPU comes up? Is it
 rockchip_secondary_startup()? I wonder if that path could have this
 little bit of assembly to poke the cntvoff in monitor mode and then jump
 to secondary_startup()? Before we boot any secondary CPUs we could also
 read the cntvoff for CPU0 in the platform-specific layer (where we know
 we're running in secure mode) and then use that value as the reset
 value for the secondaries. Or does this platform boot up in secure mode
 sometimes and non-secure mode other times?

I guess it would depend a whole lot on the bootloader, wouldn't it?

With our current "get out of the way" bootloader, Linux always sees
Secure SVC.  ...but if someone decided to put a new bootloader on
the system that wanted to do something different (implement security
and boot the kernel in nonsecure HYP, or implement a hypervisor and
boot the kernel in nonsecure SVC) then everything would be different.

If someone were to write a bootloader like that (or perhaps if we're
running in a VM?) then I'd imagine that the whole world would be
different.  Somehow this secure bootloader and/or hypervisor would
_have_ to be involved in processor bringup and suspend/resume.  Since
I've never looked at code implementing either of these I'm just making
assumptions, though.

-Doug


RE: [PATCH v4 07/12] usb: chipidea: add a usb2 driver for ci13xxx

2014-09-11 Thread Peter Chen

 
 On Thu, Sep 11, 2014 at 09:07:10AM +0800, Peter Chen wrote:
  On Wed, Sep 03, 2014 at 09:48:26AM +0200, Antoine Tenart wrote:
   +
   +static int ci_hdrc_usb2_dt_probe(struct device *dev,
   +				 struct ci_hdrc_platform_data *ci_pdata)
   +{
   +	ci_pdata->phy = of_phy_get(dev->of_node, 0);
   +	if (IS_ERR(ci_pdata->phy)) {
   +		if (PTR_ERR(ci_pdata->phy) == -EPROBE_DEFER)
   +			return -EPROBE_DEFER;
   +
   +		/* PHY is optional */
   +		ci_pdata->phy = NULL;
   +	}
   +
   +	return 0;
   +}
 
 You may also need to consider the usb_phy case.
 
 Don't we try using the generic PHY framework for new drivers?
 
 Since there is no need to support a usb_phy case I don't think we have to
 consider this case yet. And not doing so could encourage people to add PHY
 drivers to the common PHY framework.
 

If the common PHY framework is the only right way in future, you don't
need to change it.

   +
   +	if (dev->of_node) {
   +		ret = ci_hdrc_usb2_dt_probe(dev, ci_pdata);
   +		if (ret)
   +			return ret;
   +	} else {
   +		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
   +		if (ret)
   +			return ret;
   +	}
 
  You may need to do clk_disable_unprepare for above error cases.
 
 Sure, I'll fix that.
 
   +
   +	ci_pdata->name = dev_name(&pdev->dev);
   +
   +	priv->ci_pdev = ci_hdrc_add_device(dev, pdev->resource,
   +					   pdev->num_resources, ci_pdata);
   +	if (IS_ERR(priv->ci_pdev)) {
   +		ret = PTR_ERR(priv->ci_pdev);
   +		if (ret != -EPROBE_DEFER)
   +			dev_err(dev,
   +				"failed to register ci_hdrc platform device: %d\n",
   +				ret);
 
 Why don't you want the error message for deferred probing?
 
 A driver can return an EPROBE_DEFER error and still probe successfully later.
 It would be confusing to have this kind of error message in that case. And
 when a driver returns -EPROBE_DEFER, there is an error message already.
 

OK, agree.

Peter


Re: [PATCH/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu

2014-09-11 Thread Joonwoo Park
On Thu, Sep 11, 2014 at 04:56:52PM -0700, Joonwoo Park wrote:
 When a deferrable work item (INIT_DEFERRABLE_WORK, etc.) is queued via
 queue_delayed_work(), it's probably intended to run the work item on any
 CPU that isn't idle. However, we queue the work to run at a later time
 by starting a deferrable timer that binds to whatever CPU the work is
 queued on, which is effectively the same as
 queue_delayed_work_on(smp_processor_id()).
 
 As a result, WORK_CPU_UNBOUND work items aren't really cpu unbound now.
 In fact this is perfectly fine on a UP kernel, and it also won't affect
 an SMP system without dyntick much, as every cpu runs timers
 periodically.  But on SMP systems with dyntick, the current
 implementation makes deferrable timers not very scalable, because the
 timer base that queued the deferrable timer won't wake up till the next
 non-deferrable timer expires, even though other non-idle cpus may be
 running that could handle the expired deferrable timers.
 
 Deferrable work is a good example of a victim of the current
 implementation, as shown below.
 
 INIT_DEFERRABLE_WORK(&dwork, fn);
 CPU 0                                 CPU 1
 queue_delayed_work(wq, &dwork, HZ);
   queue_delayed_work_on(WORK_CPU_UNBOUND);
   ...
   __mod_timer() -> queues timer to the
                    current cpu's timer base.
   ...
 tick_nohz_idle_enter() -> cpu enters idle.
 A second later
 cpu 0 is now in idle.                 cpu 1 exits idle or wasn't in idle, so
                                       now it's active but won't handle the
 cpu 0 won't wake up till the next     cpu unbound deferrable timer, as it's
 non-deferrable timer expires.         in cpu 0's timer base.
 
 To make all cpu unbound deferrable timers scalable, introduce a common
 timer base used only for cpu unbound deferrable timers, to make them
 indeed cpu unbound so that they can be serviced by any non-idle cpu.
 This common timer base fixes the scalability issue for delayed work and
 all other users of cpu unbound deferrable timers.
 
 cc: Thomas Gleixner t...@linutronix.de
 CC: John Stultz john.stu...@linaro.org
 CC: Tejun Heo t...@kernel.org
 Signed-off-by: Joonwoo Park joonw...@codeaurora.org
 ---
  kernel/time/timer.c | 108 
 +++-
  1 file changed, 82 insertions(+), 26 deletions(-)
 
 diff --git a/kernel/time/timer.c b/kernel/time/timer.c
 index aca5dfe..655076b 100644
 --- a/kernel/time/timer.c
 +++ b/kernel/time/timer.c
 @@ -93,6 +93,9 @@ struct tvec_base {
  struct tvec_base boot_tvec_bases;
  EXPORT_SYMBOL(boot_tvec_bases);
  static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
 +#ifdef CONFIG_SMP
 +static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
 +#endif
  
  /* Functions below help us manage 'deferrable' flag */
  static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
 @@ -655,7 +658,14 @@ static inline void debug_assert_init(struct timer_list *timer)
  static void do_init_timer(struct timer_list *timer, unsigned int flags,
  			  const char *name, struct lock_class_key *key)
  {
 -	struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
 +	struct tvec_base *base;
 +
 +#ifdef CONFIG_SMP
 +	if (flags & TIMER_DEFERRABLE)
 +		base = tvec_base_deferral;
 +	else
 +#endif
 +		base = __raw_get_cpu_var(tvec_bases);
  
  	timer->entry.next = NULL;
  	timer->base = (void *)((unsigned long)base | flags);
 @@ -777,26 +787,32 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
  
  	debug_activate(timer, expires);
  
 -	cpu = get_nohz_timer_target(pinned);
 -	new_base = per_cpu(tvec_bases, cpu);
 +#ifdef CONFIG_SMP
 +	if (base != tvec_base_deferral) {
 +#endif
 +		cpu = get_nohz_timer_target(pinned);
 +		new_base = per_cpu(tvec_bases, cpu);
  
 -	if (base != new_base) {
 -		/*
 -		 * We are trying to schedule the timer on the local CPU.
 -		 * However we can't change timer's base while it is running,
 -		 * otherwise del_timer_sync() can't detect that the timer's
 -		 * handler yet has not finished. This also guarantees that
 -		 * the timer is serialized wrt itself.
 -		 */
 -		if (likely(base->running_timer != timer)) {
 -			/* See the comment in lock_timer_base() */
 -			timer_set_base(timer, NULL);
 -			spin_unlock(&base->lock);
 -			base = new_base;
 -			spin_lock(&base->lock);
 -			timer_set_base(timer, base);
 +		if (base != new_base) {
 +			/*
 +			 * We are trying to schedule the timer on the local CPU.
 +			 * However we can't change timer's base while it is
 +			 * running, otherwise 

Re: futex_wait_setup sleeping while atomic bug.

2014-09-11 Thread Darren Hart
On Thu, Sep 11, 2014 at 04:53:38PM -0700, Davidlohr Bueso wrote:
 On Thu, 2014-09-11 at 23:52 +0200, Thomas Gleixner wrote:
  From: Thomas Gleixner t...@linutronix.de
  Date: Thu, 11 Sep 2014 23:44:35 +0200
  Subject: futex: Unlock hb-lock in futex_wait_requeue_pi() error path
 
 That's the second time we have been bitten by bugs when requeueing, now
 with PI. We need to reconsider some of our testing tools to stress these
 paths better, imo.

We do, yes. Per the kselftest discussion at kernel summit, I agreed to move the
futextest testsuite into the kernel: functional tests into kselftest and
performance tests into perf, after which futextest can go away. From there we
can look at how to improve these tests.

Sadly, the best testing we seem to have is trinity - which does a fantastic job
at finding nasties.

If someone wanted to start having a look at migrating the futextest tests
over... I certainly wouldn't object to the help! ;-)

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

-- 
Darren Hart
Intel Open Source Technology Center


Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Dave Hansen
On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
 On Thu, 11 Sep 2014, Qiaowei Ren wrote:
 This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
 commands. These commands can be used to register and unregister MPX
 related resource on the x86 platform.
 
 I can't see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel.

From the app's perspective, it says "I'm using MPX, and here is where I
put the root data structure."

Without this, the kernel would have to do an (expensive) xsave operation
every time it wanted to see if MPX was in use.  This also makes the
user/kernel interaction more explicit.  We would be in a world of hurt
if userspace was allowed to move the bounds directory around.  With this
interface, it's a bit more obvious that userspace can't just move it
around willy-nilly.

 The base of the bounds directory is set into mm_struct during
 PR_MPX_REGISTER command execution. This member can be used to
 check whether one application is mpx enabled.
 
 This changelog is completely useless.

Yeah, it's pretty bare-bones.  Let me know if the explanation above
makes sense, and we'll get it updated.

 +/*
 + * This should only be called when cpuid has been checked
 + * and we are sure that MPX is available.
 
 Groan. Why can't you put that cpuid check into that function right
 away instead of adding a worthless comment?

Sounds reasonable to me.  We should just move the cpuid check in to
task_get_bounds_dir().

 + */
 +static __user void *task_get_bounds_dir(struct task_struct *tsk)
 +{
 +	struct xsave_struct *xsave_buf;
 +
 +	fpu_xsave(&tsk->thread.fpu);
 +	xsave_buf = &(tsk->thread.fpu.state->xsave);
 +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
 +		return NULL;
 
 Now this might be understandable with a proper comment. Right now it's
 a magic check for something uncomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that
this is a check for "Is the enable flag in a hardware register set?"

Yes, the registers have names only a mother could love.  But that is
what they're really called.

I guess we could add some comments about why we need to do the xsave.
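
For illustration, folding the cpuid check in might look like this (a
sketch only, reusing names from the quoted patch, not the posted code):

	static void __user *task_get_bounds_dir(struct task_struct *tsk)
	{
		struct xsave_struct *xsave_buf;

		/* The cpuid check, moved in from the caller. */
		if (!cpu_has_mpx)
			return NULL;

		/* Flush MPX state into the xsave area so BNDCSR is readable. */
		fpu_xsave(&tsk->thread.fpu);
		xsave_buf = &tsk->thread.fpu.state->xsave;

		/* MPX is in use only if the enable bit in BNDCFGU is set. */
		if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
			return NULL;

		return (void __user *)(unsigned long)
			(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK);
	}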

 +int mpx_register(struct task_struct *tsk)
 +{
 +	struct mm_struct *mm = tsk->mm;
 +
 +	if (!cpu_has_mpx)
 +		return -EINVAL;
 +
 +/*
 + * runtime in the userspace will be responsible for allocation of
 + * the bounds directory. Then, it will save the base of the bounds
 + * directory into XSAVE/XRSTOR Save Area and enable MPX through
 + * XRSTOR instruction.
 + *
 + * fpu_xsave() is expected to be very expensive. In order to do
 + * performance optimization, here we get the base of the bounds
 + * directory and then save it into mm_struct to be used in future.
 + */
 
 Ah. Now we get some information what this might do. But that does not
 make any sense at all.
 
 So all it does is:
 
 tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
 
 or:
 
 tsk->mm.bd_addr = NULL;
 
 So we use that information to check, whether we need to tear down a
 VM_MPX flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down
an VM_MPX-flagged area.  There's no guarantee that there will be one.

 + /*
 +  * Check whether this vma comes from MPX-enabled application.
 +  * If so, release this vma related bound tables.
 +  */
 + if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
 + mpx_unmap(mm, start, end);
 
 You really must be kidding. The application maps that table and never
 calls that prctl so do_unmap() will happily ignore it?

Yes.  The only other way the kernel can possibly know that it needs to
go tearing things down is with a potentially frequent and expensive xsave.

Either we change mmap to say "this mmap() is for a bounds directory", or
we have some other interface that says "the mmap() for the bounds
directory is at $foo".  We could also record the bounds directory the
first time that we catch userspace using it.  I'd rather have an
explicit interface than an implicit one like that, though I don't feel
that strongly about it.

 The design to support this feature makes no sense at all to me. We
 have a special mmap interface, some magic kernel side mapping
 functionality and then on top of it a prctl telling the kernel to
 ignore/respect it.

That's a good point.  We don't seem to have anything in the
allocate_bt() side of things to tell the kernel to refuse to create
things if the prctl() hasn't been called.  That needs to get added.

 All I have seen so far is the hint to read some intel feature
 documentation, but no coherent explanation how this patch set makes
 use of that very feature. The last patch in the series does not count
 as a coherent explanation. It merely documents parts of the
 implementation details which are required to make use of it but
 

Re: [PATCH 1/2] leds: trigger: gpio: fix warning in gpio trigger for gpios whose accessor function may sleep

2014-09-11 Thread Bryan Wu
On Tue, Sep 9, 2014 at 12:40 AM, Lothar Waßmann l...@karo-electronics.de 
wrote:
 When using a GPIO driver whose accessor functions may sleep (e.g. an
 I2C GPIO extender like PCA9554) the following warning is issued:
 WARNING: CPU: 0 PID: 665 at drivers/gpio/gpiolib.c:2274 
 gpiod_get_raw_value+0x3c/0x48()
 Modules linked in:
 CPU: 0 PID: 665 Comm: kworker/0:2 Not tainted 3.16.0-karo+ #115
 Workqueue: events gpio_trig_work
 [c00142cc] (unwind_backtrace) from [c00118f8] (show_stack+0x10/0x14)
 [c00118f8] (show_stack) from [c001bf10] (warn_slowpath_common+0x64/0x84)
 [c001bf10] (warn_slowpath_common) from [c001bf4c] 
 (warn_slowpath_null+0x1c/0x24)
 [c001bf4c] (warn_slowpath_null) from [c020a1b8] 
 (gpiod_get_raw_value+0x3c/0x48)
 [c020a1b8] (gpiod_get_raw_value) from [c02f68a0] 
 (gpio_trig_work+0x1c/0xb0)
 [c02f68a0] (gpio_trig_work) from [c0030c1c] (process_one_work+0x144/0x38c)
 [c0030c1c] (process_one_work) from [c0030ef8] (worker_thread+0x60/0x5cc)
 [c0030ef8] (worker_thread) from [c0036dd4] (kthread+0xb4/0xd0)
 [c0036dd4] (kthread) from [c000f0f0] (ret_from_fork+0x14/0x24)
 ---[ end trace cd51a1dad8b86c9c ]---

 Fix this by using the _cansleep() variant of gpio_get_value().


Good catch, I will merge this.

Thanks,
-Bryan

 Signed-off-by: Lothar Waßmann l...@karo-electronics.de
 ---
  drivers/leds/trigger/ledtrig-gpio.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/leds/trigger/ledtrig-gpio.c 
 b/drivers/leds/trigger/ledtrig-gpio.c
 index 35812e3..c86c418 100644
 --- a/drivers/leds/trigger/ledtrig-gpio.c
 +++ b/drivers/leds/trigger/ledtrig-gpio.c
 @@ -48,7 +48,7 @@ static void gpio_trig_work(struct work_struct *work)
 if (!gpio_data->gpio)
 return;

 -   tmp = gpio_get_value(gpio_data->gpio);
 +   tmp = gpio_get_value_cansleep(gpio_data->gpio);
 if (gpio_data-inverted)
 tmp = !tmp;

 --
 1.7.10.4



Re: repeated bugs in new rtl wifi drivers

2014-09-11 Thread Larry Finger

On 09/11/2014 06:50 PM, Kees Cook wrote:

On Thu, Sep 11, 2014 at 4:38 PM, Larry Finger larry.fin...@lwfinger.net wrote:

On 09/11/2014 05:27 PM, Kees Cook wrote:


Hi,

I keep fixing this same bug that keeps showing up in the rtl wifi
drivers. CL_PRINTF keeps getting redefined (incorrectly) instead of
using a correctly fixed global. Is there a way to stop this from
happening again?

Here are the past three (identical) fixes I've landed:
a3355a62673e2c4bd8617d2f07c8edee92a89b8d
037526f1ae7eeff5cf27ad790ebfe30303eeebe8
6437f51ec36af8ef1e3e2659439b35c37e5498e2

And the buildbot report below seems to show there are more to be made. :)



Sorry that I missed your fix. I should have seen it come through Linville's
list. I will push your fix through again.


Well, I should clarify: it's not getting unfixed/reverted, but rather,
it hasn't been consolidated to avoid needing the code again in the
future. All three of the fixes above are on almost identical header
files. If we could consolidate them, that would be great, and it would
keep things much nicer.


The two in staging are there because those two drivers needed to be available
as quickly as possible. Even though a lot of the code is the same as in the
regular tree, there were enough differences that it has taken a lot of work to
merge those two new drivers into the wireless tree. That effort is what ended
up effectively reverting your fix in the wireless tree.


My plan is to ultimately eliminate all the CL_SNPRINTF and CL_PRINTF stuff. In
every case I have seen, the two are paired, thus they can be replaced with a
simple pr_info(). Now that we are unifying the Realtek and kernel code bases,
I need to check with the Realtek guys.
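
To make that concrete, the paired pattern and its replacement look roughly
like this (hypothetical call site; the message text here is made up):

	/* Today: each driver copy formats into a buffer, then prints it. */
	CL_SNPRINTF(cli_buf, buf_size, "STA TP = %d Mbps\n", tp);
	CL_PRINTF(cli_buf);

	/* Plan: collapse the pair into a single kernel log call. */
	pr_info("STA TP = %d Mbps\n", tp);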


Larry



Re: [PATCH 2/2] leds: trigger: gpio: make ledtrig-gpio useable with GPIO drivers requiring threaded irqs

2014-09-11 Thread Bryan Wu
On Tue, Sep 9, 2014 at 12:40 AM, Lothar Waßmann l...@karo-electronics.de 
wrote:
 When trying to use the LED GPIO trigger with e.g. the PCA953x GPIO
 driver, request_irq() fails with -EINVAL, because the GPIO driver
 requires a nested interrupt handler.

 Use request_any_context_irq() to be able to use any GPIO driver as LED
 trigger.


Hmmm, what about use request_thread_irq() and put the gpio_trig_work()
in as the thread_func.

Felipe, can you take a look at this?

Also in the first patch:
Actually in gpio_trig_irq(), it said:
/* just schedule_work since gpio_get_value can sleep */
schedule_work(&gpio_data->work);

Then that means we need to call gpio_get_value_cansleep() in
gpio_trig_work() instead of gpio_get_value(), right?
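
For reference, the request_threaded_irq() suggestion would look roughly
like this (a sketch only; a NULL hard handler requires IRQF_ONESHOT, and
the thread function name is assumed):

	ret = request_threaded_irq(gpio_to_irq(gpio),
				   NULL,			/* no hard handler  */
				   gpio_trig_threaded_fn,	/* runs in a thread */
				   IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING |
				   IRQF_ONESHOT,
				   "ledtrig-gpio", led);

The threaded handler could then call gpio_get_value_cansleep() directly
instead of bouncing through schedule_work().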

Thanks,
-Bryan


 Signed-off-by: Lothar Waßmann l...@karo-electronics.de
 ---
  drivers/leds/trigger/ledtrig-gpio.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/drivers/leds/trigger/ledtrig-gpio.c 
 b/drivers/leds/trigger/ledtrig-gpio.c
 index c86c418..b4168a7 100644
 --- a/drivers/leds/trigger/ledtrig-gpio.c
 +++ b/drivers/leds/trigger/ledtrig-gpio.c
 @@ -161,10 +161,10 @@ static ssize_t gpio_trig_gpio_store(struct device *dev,
 return n;
 }

 -   ret = request_irq(gpio_to_irq(gpio), gpio_trig_irq,
 +   ret = request_any_context_irq(gpio_to_irq(gpio), gpio_trig_irq,
 IRQF_SHARED | IRQF_TRIGGER_RISING
 | IRQF_TRIGGER_FALLING, "ledtrig-gpio", led);
 -   if (ret) {
 +   if (ret < 0) {
 dev_err(dev, "request_irq failed with error %d\n", ret);
 } else {
 if (gpio_data->gpio != 0)
 @@ -172,7 +172,7 @@ static ssize_t gpio_trig_gpio_store(struct device *dev,
 gpio_data->gpio = gpio;
 }

 -   return ret ? ret : n;
 +   return ret < 0 ? ret : n;
  }
  static DEVICE_ATTR(gpio, 0644, gpio_trig_gpio_show, gpio_trig_gpio_store);

 --
 1.7.10.4



Re: [PATCH v2 4/5] toshiba_acpi: Support new keyboard backlight type

2014-09-11 Thread Darren Hart
On Wed, Sep 10, 2014 at 09:01:56PM -0600, Azael Avalos wrote:

Hi Azael,

 Newer Toshiba models now come with a new (and different) keyboard
 backlight implementation with three modes of operation: TIMER,
 ON and OFF, and the LED is controlled internally by the firmware.
 
 This patch adds support for that type of backlight, changing the
 existing code to accommodate the new implementation.
 
 The timeout value range is now 1-60 seconds, and the accepted
 modes are now 1 (FN-Z), 2 (AUTO or TIMER), 8 (ON) and 0x10 (OFF).
 This adds two new entries, keyboard_type and available_kbd_modes;
 the first shows the keyboard type and the latter shows the
 supported modes depending on the type.
 
 Signed-off-by: Azael Avalos coproscef...@gmail.com
 ---
  drivers/platform/x86/toshiba_acpi.c | 117 
 +++-
  1 file changed, 102 insertions(+), 15 deletions(-)
 
 diff --git a/drivers/platform/x86/toshiba_acpi.c 
 b/drivers/platform/x86/toshiba_acpi.c
 index 4c8fa7b..08147c5 100644
 --- a/drivers/platform/x86/toshiba_acpi.c
 +++ b/drivers/platform/x86/toshiba_acpi.c
 @@ -140,6 +140,10 @@ MODULE_LICENSE(GPL);
  #define HCI_WIRELESS_BT_POWER0x80
  #define SCI_KBD_MODE_FNZ 0x1
  #define SCI_KBD_MODE_AUTO0x2
 +#define SCI_KBD_MODE_ON  0x8
 +#define SCI_KBD_MODE_OFF 0x10
 +#define SCI_KBD_MODE_MAX SCI_KBD_MODE_OFF
 +#define SCI_KBD_TIME_MAX 0x3c001a
  
  struct toshiba_acpi_dev {
   struct acpi_device *acpi_dev;
 @@ -155,6 +159,7 @@ struct toshiba_acpi_dev {
   int force_fan;
   int last_key_event;
   int key_event_valid;
 + int kbd_type;
   int kbd_mode;
   int kbd_time;
  
 @@ -495,6 +500,42 @@ static enum led_brightness 
 toshiba_illumination_get(struct led_classdev *cdev)
  }
  
  /* KBD Illumination */
 +static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev)
 +{
 + u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 };
 + u32 out[HCI_WORDS];
 + acpi_status status;
 +
 + if (!sci_open(dev))
 + return 0;
 +
 + status = hci_raw(dev, in, out);
 + sci_close(dev);
 + if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) {
 + pr_err("ACPI call to query kbd illumination support failed\n");
 + return 0;
 + } else if (out[0] == HCI_NOT_SUPPORTED) {
 + pr_info("Keyboard illumination not available\n");
 + return 0;
 + }
 +
 + /* Check for keyboard backlight timeout max value,
 + /* previous kbd backlight implementation set this to

Extra / ^

 +  * 0x3c0003, and now the new implementation set this
 +  * to 0x3c001a, use this to distinguish between them
 +  */
 + if (out[3] == SCI_KBD_TIME_MAX)
 + dev->kbd_type = 2;
 + else
 + dev->kbd_type = 1;
 + /* Get the current keyboard backlight mode */
 + dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK;
 + /* Get the current time (1-60 seconds) */
 + dev->kbd_time = out[2] >> HCI_MISC_SHIFT;
 +
 + return 1;
 +}
 +
  static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 
 time)
  {
   u32 result;
 @@ -1268,20 +1309,46 @@ static ssize_t toshiba_kbd_bl_mode_store(struct 
 device *dev,
  ret = kstrtoint(buf, 0, &mode);
   if (ret)
   return ret;
 - if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)
 + if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO &&
 + mode != SCI_KBD_MODE_ON && mode != SCI_KBD_MODE_OFF)
   return -EINVAL;

Since you have to check for a type::mode match anyway, this initial test is
redundant. I suggest inverting the type::mode match below and make it
exhaustive, something like:

  
 + /* Check for supported modes depending on keyboard backlight type */
 + if (toshiba-kbd_type == 1) {
 + /* Type 1 supports SCI_KBD_MODE_FNZ and SCI_KBD_MODE_AUTO */
 + if (mode == SCI_KBD_MODE_ON || mode == SCI_KBD_MODE_OFF)


if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)


The net number of tests is ultimately smaller and it's fewer lines of code.
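
Spelled out, the exhaustive per-type check might read (a sketch of the
suggestion above, not tested):

	/* Reject any mode the detected keyboard type cannot do. */
	if (toshiba->kbd_type == 1) {
		/* Type 1 only does FN-Z and AUTO. */
		if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)
			return -EINVAL;
	} else if (toshiba->kbd_type == 2) {
		/* Type 2 does AUTO, ON and OFF, but not FN-Z. */
		if (mode != SCI_KBD_MODE_AUTO && mode != SCI_KBD_MODE_ON &&
		    mode != SCI_KBD_MODE_OFF)
			return -EINVAL;
	}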

 + return -EINVAL;
 + } else if (toshiba-kbd_type == 2) {
 + /* Type 2 doesn't support SCI_KBD_MODE_FNZ */
 + if (mode == SCI_KBD_MODE_FNZ)
 + return -EINVAL;
 + }
 +
   /* Set the Keyboard Backlight Mode where:
 -  * Mode - Auto (2) | FN-Z (1)
*  Auto - KBD backlight turns off automatically in given time
*  FN-Z - KBD backlight toggles when hotkey pressed
 +  *  ON   - KBD backlight is always on
 +  *  OFF  - KBD backlight is always off
*/
 +
 + /* Only make a change if the actual mode has changed */
   if (toshiba->kbd_mode != mode) {
 + /* Shift the time to base time (0x3c == 60 seconds) */
   time = toshiba->kbd_time 

Re: [PATCH] mtd: nand: gpmi: add proper raw access support

2014-09-11 Thread Huang Shijie
On Thu, Sep 11, 2014 at 04:45:36PM +0200, Boris BREZILLON wrote:
 On Thu, 11 Sep 2014 22:29:32 +0800
 Huang Shijie shij...@gmail.com wrote:
 
  On Wed, Sep 10, 2014 at 10:55:39AM +0200, Boris BREZILLON wrote:
   +static int gpmi_ecc_read_page_raw(struct mtd_info *mtd,
   +   struct nand_chip *chip, uint8_t *buf,
   +   int oob_required, int page)
   +{
   + struct gpmi_nand_data *this = chip->priv;
   + struct bch_geometry *nfc_geo = this->bch_geometry;
   + int eccsize = nfc_geo->ecc_chunk_size;
   + int eccbytes = DIV_ROUND_UP(nfc_geo->ecc_strength * nfc_geo->gf_len,
   + 8);
  
  Actually, the ECC can be _NOT_ byte-aligned.
  You should not round up to bytes.
 
 You mean, on the NAND storage ? That would be weird, but I'll check.

yes. it is weird. 

thanks
Huang Shijie
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mtd: nand: gpmi: add proper raw access support

2014-09-11 Thread Huang Shijie
On Thu, Sep 11, 2014 at 04:38:47PM +0200, Boris BREZILLON wrote:
 Hi Huang,
 
 On Thu, 11 Sep 2014 22:25:13 +0800
 Huang Shijie shij...@gmail.com wrote:
 
  Hi Boris,
  
  On Thu, Sep 11, 2014 at 02:36:16PM +0200, Boris BREZILLON wrote:
   Hi Huang,
   
   On Thu, 11 Sep 2014 20:09:30 +0800
   Huang Shijie shij...@gmail.com wrote:
   
On Wed, Sep 10, 2014 at 10:55:39AM +0200, Boris BREZILLON wrote:
 Several MTD users (either in user or kernel space) expect a valid raw
 access support to NAND chip devices.
 This is particularly true for testing tools which are often touching 
 the
 data stored in a NAND chip in raw mode to artificially generate 
 errors.
 
  The GPMI drivers do not implement raw access functions, and thus rely on
  the default HW_ECC scheme implementation.
  The default implementation considers the data and OOB areas as properly
  separated into their respective NAND sections, which is not true for the
  GPMI controller.
  In this driver/controller some OOB data are stored at the beginning of
  the NAND data area (these data are called metadata in the driver), then
  ECC bytes are interleaved with the data chunks (which is similar to the
  HW_ECC_SYNDROME scheme), and eventually the remaining bytes are used as
  OOB data.
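
 The layout being described, sketched as a comment (reconstructed from
 the paragraph above, not taken from the patch itself):

	/*
	 * Raw GPMI page as the controller lays it out:
	 *
	 *  | metadata | chunk 0 | ECC 0 | chunk 1 | ECC 1 | ... | leftover OOB |
	 *
	 * versus what the default HW_ECC raw accessors assume:
	 *
	 *  | data .............................. | OOB ...................... |
	 */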
 
 Signed-off-by: Boris BREZILLON boris.brezil...@free-electrons.com
 ---
 Hello,
 
 This patch is providing raw access support to the GPMI driver which is
 particularly useful to run some tests on the NAND (the one coming in
 mind is the mtd_nandbiterrs testsuite).
 
  I know this rework might break several user space tools which are relying
  on the default raw access implementation (I already experienced an issue
  with the kobs-ng tool provided by freescale), but many other tools will
  now work as expected.
 If kobs-ng cannot work, there is no point in other tools
 working.  So I do not think we need to implement these hooks.
   
   Well, I don't know about freescale specific tools, but at least I have
   an example with mtd_nandbiterrs module.
  
  The gpmi uses the hardware ECC for the bitflips.
  I really do not know why the mtd_nandbiterrs is needed.
  IMHO, the mtd_nandbiterrs is useless for the gpmi.
 
 Because some folks would like to test their NAND controller/chip on
 their system.
 
 Just because you don't need it, doesn't mean others won't, and actually
 the reason I worked on these raw function is becaused I needed to
 validate the ECC capabilities of the GPMI ECC controller.
 The BCH algorithm is confidential to Freescale.
 How can you validate the ECC capabilities?
 You cannot emulate the BCH to create the ECC data, even if you can fake
 some bitflips in the data chunk.

 
  
   This module assumes it can write only the data part of a NAND page
   without modifying the OOB area (see [1]), which in the GPMI controller's
   case is impossible, because the raw write function stores the data as if
   there were no specific scheme, while there is one:
   (metadata + n x (data_chunk + ECC bytes) + remaining_bytes).
   
   Moreover, IMHO, nanddump and nandwrite tools (which can use raw
   access mode when passing the appropriate option) should always return
   the same kind of data no matter what NAND controller is in use on the
   system = (DATA + OOB_DATA), and this is definitely not the case with
   the GPMI driver.
   
   See how raw access on HW_ECC_SYNDROME scheme is implemented in
  The gpmi uses the NAND_ECC_HW, not the NAND_ECC_HW_SYNDROME.
 
 Yes I know. I pointed out the NAND_ECC_HW_SYNDROME scheme as an example
 to show you that NAND controller specific layout should be hidden to
 the MTD user.
 
  Even if you really want to support nanddump, I do not agree with adding
  the write hook; it may crash the system.
 
 We can't have asymmetric behaviour here: either we move both the read and
 write raw functions or neither. Moving only one of them would make the MTD
 user's work even more complicated.
 
 I really don't get your point here. What's really bothering you (BTW, I
 fixed kobs-ng to handle this new behaviour) ?
see the comment above.

thanks
Huang Shijie


Re: [PATCH v8 00/10] Intel MPX support

2014-09-11 Thread Dave Hansen
On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
 provide handlers for bounds faults (#BR), and manage bounds memory.

Qiaowei, we probably need to mention here what bounds memory is, and
why it has to be managed, and who is responsible for the different pieces.

Who allocates the memory?
Who fills the memory?
When is it freed?

Thomas, do you have any other suggestions for things you'd like to see
clarified?


Re: [PATCH/RFC v5 2/4] leds: implement sysfs interface locking mechanism

2014-09-11 Thread Bryan Wu
On Wed, Aug 20, 2014 at 6:41 AM, Jacek Anaszewski
j.anaszew...@samsung.com wrote:
 Add a mechanism for locking LED subsystem sysfs interface.
 This patch prepares ground for addition of LED Flash Class
 extension, whose API will be integrated with V4L2 Flash API.
 Such a fusion enforces introducing a locking scheme, which
 will secure consistent access to the LED Flash Class device.

 The mechanism being introduced allows for disabling LED
 subsystem sysfs interface by calling led_sysfs_lock function
 and enabling it by calling led_sysfs_unlock. The functions
 alter the LED_SYSFS_LOCK flag state and must be called
 under mutex lock. The state of the lock is checked with use
 of led_sysfs_is_locked function. Such a design allows for
 providing immediate feedback to the user space on whether
 the LED Flash Class device is available or is under V4L2 Flash
 sub-device control.

 Signed-off-by: Jacek Anaszewski j.anaszew...@samsung.com
 Acked-by: Kyungmin Park kyungmin.p...@samsung.com
 Cc: Bryan Wu coolo...@gmail.com
 Cc: Richard Purdie rpur...@rpsys.net
 ---
  drivers/leds/led-class.c|   23 ---
  drivers/leds/led-core.c |   18 ++
  drivers/leds/led-triggers.c |   15 ---
  include/linux/leds.h|   32 
  4 files changed, 82 insertions(+), 6 deletions(-)

 diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
 index 6f82a76..0bc0ba9 100644
 --- a/drivers/leds/led-class.c
 +++ b/drivers/leds/led-class.c
 @@ -39,17 +39,31 @@ static ssize_t brightness_store(struct device *dev,
  {
 struct led_classdev *led_cdev = dev_get_drvdata(dev);
 unsigned long state;
 -   ssize_t ret = -EINVAL;
 +   ssize_t ret;
 +
 +#ifdef CONFIG_V4L2_FLASH_LED_CLASS

Can we remove this #ifdef? Following code looks good to the common LED class.

 +   mutex_lock(&led_cdev->led_lock);

Can we choose a more meaningful name than led_lock here?
Then use led_sysfs_enable() instead of led_sysfs_unlock(),
led_sysfs_disable() instead of led_sysfs_lock(), and
led_sysfs_is_disabled() instead of led_sysfs_is_locked().

And the flag LED_SYSFS_LOCK -> LED_SYSFS_DISABLE

I was just confused by the name lock and unlock and mutex lock.

The idea looks good to me.

Thanks,
-Bryan

 +
 +   if (led_sysfs_is_locked(led_cdev)) {
 +   ret = -EBUSY;
 +   goto unlock;
 +   }
 +#endif

 ret = kstrtoul(buf, 10, &state);
 if (ret)
 -   return ret;
 +   goto unlock;

 if (state == LED_OFF)
 led_trigger_remove(led_cdev);
 __led_set_brightness(led_cdev, state);

 -   return size;
 +   ret = size;
 +unlock:
 +#ifdef CONFIG_V4L2_FLASH_LED_CLASS
 +   mutex_unlock(&led_cdev->led_lock);
 +#endif
 +   return ret;
  }
  static DEVICE_ATTR_RW(brightness);

 @@ -215,6 +229,7 @@ int led_classdev_register(struct device *parent, struct 
 led_classdev *led_cdev)
  #ifdef CONFIG_LEDS_TRIGGERS
 init_rwsem(led_cdev-trigger_lock);
  #endif
 +   mutex_init(&led_cdev->led_lock);
 /* add to the list of leds */
 down_write(&leds_list_lock);
 list_add_tail(&led_cdev->node, &leds_list);
 @@ -266,6 +281,8 @@ void led_classdev_unregister(struct led_classdev 
 *led_cdev)
 down_write(&leds_list_lock);
 list_del(&led_cdev->node);
 up_write(&leds_list_lock);
 +
 +   mutex_destroy(&led_cdev->led_lock);
  }
  EXPORT_SYMBOL_GPL(led_classdev_unregister);

 diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
 index 466ce5a..4649ea5 100644
 --- a/drivers/leds/led-core.c
 +++ b/drivers/leds/led-core.c
 @@ -143,3 +143,21 @@ int led_update_brightness(struct led_classdev *led_cdev)
 return ret;
  }
  EXPORT_SYMBOL(led_update_brightness);
 +
 +/* Caller must ensure led_cdev->led_lock held */
 +void led_sysfs_lock(struct led_classdev *led_cdev)
 +{
 +   lockdep_assert_held(&led_cdev->led_lock);
 +
 +   led_cdev->flags |= LED_SYSFS_LOCK;
 +}
 +EXPORT_SYMBOL_GPL(led_sysfs_lock);
 +
 +/* Caller must ensure led_cdev->led_lock held */
 +void led_sysfs_unlock(struct led_classdev *led_cdev)
 +{
 +   lockdep_assert_held(&led_cdev->led_lock);
 +
 +   led_cdev->flags &= ~LED_SYSFS_LOCK;
 +}
 +EXPORT_SYMBOL_GPL(led_sysfs_unlock);
 diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
 index c3734f1..d391a5d 100644
 --- a/drivers/leds/led-triggers.c
 +++ b/drivers/leds/led-triggers.c
 @@ -37,6 +37,11 @@ ssize_t led_trigger_store(struct device *dev, struct 
 device_attribute *attr,
 char trigger_name[TRIG_NAME_MAX];
 struct led_trigger *trig;
 size_t len;
 +   int ret = count;
 +
 +#ifdef CONFIG_V4L2_FLASH_LED_CLASS
 +   mutex_lock(&led_cdev->led_lock);
 +#endif

 trigger_name[sizeof(trigger_name) - 1] = '\0';
 strncpy(trigger_name, buf, sizeof(trigger_name) - 1);
 @@ -47,7 +52,7 @@ ssize_t led_trigger_store(struct device *dev, struct 
 

RE: [PATCH v4 9/9] usb: chipidea: add support to the generic PHY framework in ChipIdea

2014-09-11 Thread Peter Chen

 
 On Thu, Sep 11, 2014 at 08:54:47AM +0800, Peter Chen wrote:
  On Wed, Sep 03, 2014 at 09:40:40AM +0200, Antoine Tenart wrote:
   @@ -595,23 +639,27 @@ static int ci_hdrc_probe(struct platform_device
 *pdev)
 return -ENODEV;
 }
  
   - if (ci->platdata->usb_phy)
   + if (ci->platdata->phy)
   + ci->phy = ci->platdata->phy;
   + else if (ci->platdata->usb_phy)
  ci->usb_phy = ci->platdata->usb_phy;
  else
   - ci->usb_phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2);
   + ci->phy = devm_phy_get(dev, "usb-phy");
  
   - if (IS_ERR(ci->usb_phy)) {
   - ret = PTR_ERR(ci->usb_phy);
   + if (IS_ERR(ci->phy) || (ci->phy == NULL && ci->usb_phy == NULL)) {
  /*
   * if -ENXIO is returned, it means PHY layer wasn't
   * enabled, so it makes no sense to return -EPROBE_DEFER
   * in that case, since no PHY driver will ever probe.
   */
   - if (ret == -ENXIO)
   - return ret;
   + if (PTR_ERR(ci->phy) == -ENXIO)
   + return -ENXIO;
  
   - dev_err(dev, "no usb2 phy configured\n");
   - return -EPROBE_DEFER;
   + ci->usb_phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2);
   + if (IS_ERR(ci->usb_phy)) {
   + dev_err(dev, "no usb2 phy configured\n");
   + return -EPROBE_DEFER;
   + }
 }
 
  Sorry, I can't accept this change. Why is devm_usb_get_phy(dev,
  USB_PHY_TYPE_USB2) put in the error path? Since the current get-PHY
  operation is a little complicated, we may want a dedicated function to
  do it; the dwc3 driver is a good example.
 
  It's not the error path, it's the case when there is no PHY from the
  generic PHY framework available. Getting a USB PHY is a fallback solution.
 
 I agree we can move this to a dedicated function. But even if doing so, we'll
 have to test ci-phy first.
 
 Or do you have something else in mind?
 
 
I still want devm_usb_get_phy(dev, USB_PHY_TYPE_USB2) to be called at the
same place as the generic PHY, not later in the error path; in the error
path we should only handle errors.
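
A dedicated helper along those lines might look like this (a sketch only;
both lookups happen up front, and the helper name is assumed):

	static int ci_usb_phy_get(struct ci_hdrc *ci, struct device *dev)
	{
		/* Try the generic PHY framework first. */
		ci->phy = devm_phy_get(dev, "usb-phy");
		if (!IS_ERR(ci->phy))
			return 0;

		/* -ENXIO: PHY layer disabled, deferring would never resolve. */
		if (PTR_ERR(ci->phy) == -ENXIO)
			return -ENXIO;

		/* Fall back to a legacy USB PHY, looked up in the same place. */
		ci->usb_phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2);
		if (IS_ERR(ci->usb_phy)) {
			dev_err(dev, "no usb2 phy configured\n");
			return -EPROBE_DEFER;
		}

		return 0;
	}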

Peter


Re: [PATCH RFC 1/2] memcg: use percpu_counter for statistics

2014-09-11 Thread Kamezawa Hiroyuki
(2014/09/12 0:41), Vladimir Davydov wrote:
 In the next patch I need a quick way to get a value of
 MEM_CGROUP_STAT_RSS. The current procedure (mem_cgroup_read_stat) is
 slow (iterates over all cpus) and may sleep (uses get/put_online_cpus),
 so it's a no-go.
 
 This patch converts memory cgroup statistics to use percpu_counter so
 that percpu_counter_read will do the trick.
 
 Signed-off-by: Vladimir Davydov vdavy...@parallels.com


I have no strong objections but you need performance comparison to go with this.

I thought percpu_counter was messy to use for an array.
I can't understand why you started by fixing a future performance problem
before merging the new feature.

Thanks,
-Kame
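
For readers unfamiliar with the API under discussion, the relevant calls
behave roughly like this (a sketch; error handling elided, and the init
signature shown is the pre-gfp one of this era):

	struct percpu_counter rss;

	percpu_counter_init(&rss, 0);		/* allocate per-cpu slots     */
	percpu_counter_add(&rss, nr_pages);	/* cheap, batched per-cpu add */
	approx = percpu_counter_read(&rss);	/* fast but possibly stale    */
	exact = percpu_counter_sum(&rss);	/* slow: folds in every cpu   */

That fast-but-fuzzy vs. slow-but-exact trade-off is exactly what the
comment deleted below describes.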


 ---
   mm/memcontrol.c |  217 
 ++-
   1 file changed, 69 insertions(+), 148 deletions(-)
 
 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index 085dc6d2f876..7e8d65e0608a 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -136,9 +136,7 @@ enum mem_cgroup_events_target {
   #define SOFTLIMIT_EVENTS_TARGET 1024
   #define NUMAINFO_EVENTS_TARGET  1024
   
 -struct mem_cgroup_stat_cpu {
 - long count[MEM_CGROUP_STAT_NSTATS];
 - unsigned long events[MEM_CGROUP_EVENTS_NSTATS];
 +struct mem_cgroup_ratelimit_state {
   unsigned long nr_page_events;
   unsigned long targets[MEM_CGROUP_NTARGETS];
   };
 @@ -341,16 +339,10 @@ struct mem_cgroup {
    atomic_t	moving_account;
    /* taken only while moving_account > 0 */
    spinlock_t	move_lock;
 - /*
 -  * percpu counter.
 -  */
 - struct mem_cgroup_stat_cpu __percpu *stat;
 - /*
 -  * used when a cpu is offlined or other synchronizations
 -  * See mem_cgroup_read_stat().
 -  */
 - struct mem_cgroup_stat_cpu nocpu_base;
 - spinlock_t pcp_counter_lock;
 +
 + struct percpu_counter stat[MEM_CGROUP_STAT_NSTATS];
 + struct percpu_counter events[MEM_CGROUP_EVENTS_NSTATS];
 + struct mem_cgroup_ratelimit_state __percpu *ratelimit;
   
    atomic_t	dead_count;
    #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
 @@ -849,59 +841,16 @@ mem_cgroup_largest_soft_limit_node(struct 
 mem_cgroup_tree_per_zone *mctz)
   return mz;
   }
   
 -/*
 - * Implementation Note: reading percpu statistics for memcg.
 - *
 - * Both of vmstat[] and percpu_counter has threshold and do periodic
 - * synchronization to implement quick read. There are trade-off between
 - * reading cost and precision of value. Then, we may have a chance to 
 implement
 - * a periodic synchronizion of counter in memcg's counter.
 - *
 - * But this _read() function is used for user interface now. The user 
 accounts
 - * memory usage by memory cgroup and he _always_ requires exact value because
 - * he accounts memory. Even if we provide quick-and-fuzzy read, we always
 - * have to visit all online cpus and make sum. So, for now, unnecessary
 - * synchronization is not implemented. (just implemented for cpu hotplug)
 - *
 - * If there are kernel internal actions which can make use of some not-exact
 - * value, and reading all cpu value can be performance bottleneck in some
 - * common workload, threashold and synchonization as vmstat[] should be
 - * implemented.
 - */
   static long mem_cgroup_read_stat(struct mem_cgroup *memcg,
enum mem_cgroup_stat_index idx)
   {
 - long val = 0;
 - int cpu;
 -
 - get_online_cpus();
 - for_each_online_cpu(cpu)
  - val += per_cpu(memcg->stat->count[idx], cpu);
  -#ifdef CONFIG_HOTPLUG_CPU
  - spin_lock(&memcg->pcp_counter_lock);
  - val += memcg->nocpu_base.count[idx];
  - spin_unlock(&memcg->pcp_counter_lock);
  -#endif
  - put_online_cpus();
  - return val;
  + return percpu_counter_read(&memcg->stat[idx]);
   }
   
   static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
   enum mem_cgroup_events_index idx)
   {
 - unsigned long val = 0;
 - int cpu;
 -
 - get_online_cpus();
 - for_each_online_cpu(cpu)
  - val += per_cpu(memcg->stat->events[idx], cpu);
  -#ifdef CONFIG_HOTPLUG_CPU
  - spin_lock(&memcg->pcp_counter_lock);
  - val += memcg->nocpu_base.events[idx];
  - spin_unlock(&memcg->pcp_counter_lock);
  -#endif
  - put_online_cpus();
  - return val;
  + return percpu_counter_read(&memcg->events[idx]);
   }
   
   static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 @@ -913,25 +862,21 @@ static void mem_cgroup_charge_statistics(struct 
 mem_cgroup *memcg,
* counted as CACHE even if it's on ANON LRU.
*/
   if (PageAnon(page))
  - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
  + percpu_counter_add(&memcg->stat[MEM_CGROUP_STAT_RSS],
    nr_pages);
    else
  - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
 + 

Re: [PATCH v11 net-next 00/12] eBPF syscall, verifier, testsuite

2014-09-11 Thread Andy Lutomirski
On Thu, Sep 11, 2014 at 3:29 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 On Thu, Sep 11, 2014 at 2:54 PM, Andy Lutomirski l...@amacapital.net wrote:

 the verifier log contains full trace. Last unsafe instruction + error
 in many cases is useless. What we found empirically from using
 it over last 2 years is that developers have different learning curve
 to adjust to 'safe' style of C. Pretty much everyone couldn't
 figure out why program is rejected based on last error. Therefore
 verifier emits full log. From the 1st insn all the way till the last
 'unsafe' instruction. So the log is multiline output.
 'Understanding eBPF verifier messages' section of
 Documentation/networking/filter.txt provides few trivial
 examples of these multiline messages.
 Like for the program:
   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
   BPF_LD_MAP_FD(BPF_REG_1, 0),
   BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
   BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
   BPF_EXIT_INSN(),
 the verifier log_buf is:
   0: (7a) *(u64 *)(r10 -8) = 0
   1: (bf) r2 = r10
   2: (07) r2 += -8
   3: (b7) r1 = 0
   4: (85) call 1
   5: (15) if r0 == 0x0 goto pc+1
R0=map_ptr R10=fp
   6: (7a) *(u64 *)(r0 +4) = 0
   misaligned access off 4 size 8

 It will surely change over time as verifier becomes smarter,
 supports new types, optimizations and so on.
 So this log is not an ABI. It's for humans to read.
 The log explains _how_ verifier came to conclusion
 that the program is unsafe.

 Given that you've already arranged (I think) for the verifier to be
 compilable in the kernel and in userspace, would it make more sense to
 have the kernel version just say yes or no and to make it easy for
 user code to retry verification in userspace if they want a full
 explanation?

 Good memory :) Long ago I had a hack where I compiled
 verifier.o for kernel and linked it with userspace wrappers to
 have the same verifier for userspace. It was very fragile.
 and maps were not separate objects and there were no fds.
 It's not feasible anymore, since different subsystems
 will configure different bpf_context and helper functions and
 verifier output is dynamic based on maps that were created.
 For example, if user's samples/bpf/sock_example.c does
 bpf_create_map(HASH, sizeof(key) * 2, ...);
 instead of
 bpf_create_map(HASH, sizeof(key), ...);
 the same program will be rejected in first case and will be
 accepted in the second, because map sizes and ebpf
 program expectations are mismatching.

Hmm.

This actually furthers my thought that the relocations should be a
real relocation table.  Then you could encode the types of the
referenced objects in the table, and a program could be verified
without looking up the fds.  The only extra step would be to confirm
that the actual types referenced match those in the table.
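
Concretely, a table entry might carry something like the following
(purely illustrative; this struct is not part of the posted ABI):

	struct bpf_map_reloc {
		__u32 insn_off;		/* insn whose imm holds the map fd */
		__u32 map_fd;		/* fd to patch in at load time     */
		__u32 key_size;		/* key size the program assumes    */
		__u32 value_size;	/* value size the program assumes  */
	};

The loader would then only have to confirm that each referenced map's
actual key/value sizes match the table.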

--Andy


Re: [PATCH RFC 2/2] memcg: add threshold for anon rss

2014-09-11 Thread Kamezawa Hiroyuki
(2014/09/12 0:41), Vladimir Davydov wrote:
 Though hard memory limits suit perfectly for sand-boxing, they are not
 that efficient when it comes to partitioning a server's resources among
 multiple containers. The point is a container consuming a particular
 amount of memory most of time may have infrequent spikes in the load.
 Setting the hard limit to the maximal possible usage (spike) will lower
 server utilization while setting it to the normal usage will result in
 heavy lags during the spikes.
 
 To handle such scenarios soft limits were introduced. The idea is to
 allow a container to breach the limit freely when there's enough free
 memory, but shrink it back to the limit aggressively on global memory
 pressure. However, the concept of soft limits is intrinsically unsafe
 by itself: if a container eats too much anonymous memory, it will be
 very slow or even impossible (if there's no swap) to reclaim its
 resources back to the limit. As a result the whole system will be
 feeling bad until it finally realizes the culprit must die.
 
 Currently we have no way to react to anonymous memory + swap usage
 growth inside a container: the memsw counter accounts both anonymous
 memory and file caches and swap, so we have neither a limit for
 anon+swap nor a threshold notification. Actually, memsw is totally
 useless if one wants to make full use of soft limits: it should be set
 to a very large value or infinity then, otherwise it just makes no
 sense.
 
 That's one of the reasons why I think we should replace memsw with a
 kind of anonsw so that it'd account only anon+swap. This way we'd still
 be able to sand-box apps, but it'd also allow us to avoid nasty
 surprises like the one I described above. For more arguments for and
 against this idea, please see the following thread:
 
 http://www.spinics.net/lists/linux-mm/msg78180.html
 
 There's an alternative to this approach backed by Kamezawa. He thinks
 that OOM on anon+swap limit hit is a no-go and proposes to use memory
 thresholds for it. I still strongly disagree with the proposal, because
 it's unsafe (what if the userspace handler won't react in time?).
 Nevertheless, I implement his idea in this RFC. I hope this will fuel
 the debate, because sadly enough nobody seems to care about this
 problem.
 
 So this patch adds the memory.rss file that shows the amount of
 anonymous memory consumed by a cgroup and the event to handle threshold
 notifications coming from it. The notification works exactly in the same
 fashion as the existing memory/memsw usage notifications.
 


So, now, you know you can handle the threshold.

If you want to implement
automatic-oom-killall-in-a-container-threshold-in-kernel,
I don't have any objections.

What you want is not a limit, you want a trigger for killing processes.
Threshold + kill is enough; using res_counter for that is over-spec.

You don't need res_counter, and you don't need to break other people's use
cases.

Thanks,
-Kame






Re: [PATCH v11 net-next 00/12] eBPF syscall, verifier, testsuite

2014-09-11 Thread Alexei Starovoitov
On Thu, Sep 11, 2014 at 6:17 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Sep 11, 2014 at 3:29 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 On Thu, Sep 11, 2014 at 2:54 PM, Andy Lutomirski l...@amacapital.net wrote:

 the verifier log contains full trace. Last unsafe instruction + error
 in many cases is useless. What we found empirically from using
 it over last 2 years is that developers have different learning curve
 to adjust to 'safe' style of C. Pretty much everyone couldn't
 figure out why program is rejected based on last error. Therefore
 verifier emits full log. From the 1st insn all the way till the last
 'unsafe' instruction. So the log is multiline output.
 'Understanding eBPF verifier messages' section of
 Documentation/networking/filter.txt provides few trivial
 examples of these multiline messages.
 Like for the program:
   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
   BPF_LD_MAP_FD(BPF_REG_1, 0),
   BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
   BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
   BPF_EXIT_INSN(),
 the verifier log_buf is:
   0: (7a) *(u64 *)(r10 -8) = 0
   1: (bf) r2 = r10
   2: (07) r2 += -8
   3: (b7) r1 = 0
   4: (85) call 1
   5: (15) if r0 == 0x0 goto pc+1
R0=map_ptr R10=fp
   6: (7a) *(u64 *)(r0 +4) = 0
   misaligned access off 4 size 8

 It will surely change over time as verifier becomes smarter,
 supports new types, optimizations and so on.
 So this log is not an ABI. It's for humans to read.
 The log explains _how_ verifier came to conclusion
 that the program is unsafe.

 Given that you've already arranged (I think) for the verifier to be
 compilable in the kernel and in userspace, would it make more sense to
 have the kernel version just say yes or no and to make it easy for
 user code to retry verification in userspace if they want a full
 explanation?

 Good memory :) Long ago I had a hack where I compiled
 verifier.o for kernel and linked it with userspace wrappers to
 have the same verifier for userspace. It was very fragile.
 and maps were not separate objects and there were no fds.
 It's not feasible anymore, since different subsystems
 will configure different bpf_context and helper functions and
 verifier output is dynamic based on maps that were created.
 For example, if user's samples/bpf/sock_example.c does
 bpf_create_map(HASH, sizeof(key) * 2, ...);
 instead of
 bpf_create_map(HASH, sizeof(key), ...);
 the same program will be rejected in first case and will be
 accepted in the second, because map sizes and ebpf
 program expectations are mismatching.

 Hmm.

 This actually furthers my thought that the relocations should be a
 real relocation table.  Then you could encode the types of the
 referenced objects in the table, and a program could be verified
 without looking up the fds.  The only extra step would be to confirm
 that the actual types referenced match those in the table.

It's not the type that is being checked, but one particular map instance
with user-specified key/value sizes. The type is not helpful; it is not
even used during verification. Only the key_size and value_size of the
elements are meaningful, and they're looked up dynamically by fd.
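
To make the mismatch concrete (a sketch using the bpf_create_map() wrapper
from the samples):

	/* Map created with a 16-byte key ... */
	map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key) * 2,
				sizeof(value), 1024);

	/*
	 * ... while the program only prepares sizeof(key) == 8 bytes of
	 * key on its stack before calling bpf_map_lookup_elem().  The
	 * verifier, looking the map up by fd, rejects the stack access
	 * as too small for this particular map instance.
	 */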


[PATCH v2 1/2] mfd: rtsx: fix PM suspend for 5227

2014-09-11 Thread micky_ching
From: Micky Ching micky_ch...@realsil.com.cn

Fix rts5227 failing to send buffer commands after suspend:
PM_CTRL3 should be reset before sending any buffer command after suspend.
Otherwise buffer commands will fail, which leads to resume failure.

Signed-off-by: Micky Ching micky_ch...@realsil.com.cn
---
 drivers/mfd/rts5227.c|   19 +++
 include/linux/mfd/rtsx_pci.h |   12 
 2 files changed, 31 insertions(+)

diff --git a/drivers/mfd/rts5227.c b/drivers/mfd/rts5227.c
index 9c8eec8..197f5c1 100644
--- a/drivers/mfd/rts5227.c
+++ b/drivers/mfd/rts5227.c
@@ -128,8 +128,27 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
return rtsx_pci_send_cmd(pcr, 100);
 }
 
+static int rts5227_pm_reset(struct rtsx_pcr *pcr)
+{
+   int err;
+
+   /* init aspm */
+   err = rtsx_pci_update_cfg_byte(pcr, LCTLR, 0xFC, 0);
+   if (err < 0)
+   return err;
+
+   /* reset PM_CTRL3 before send buffer cmd */
+   return rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x00);
+}
+
 static int rts5227_optimize_phy(struct rtsx_pcr *pcr)
 {
+   int err;
+
+   err = rts5227_pm_reset(pcr);
+   if (err < 0)
+   return err;
+
/* Optimize RX sensitivity */
return rtsx_pci_write_phy_register(pcr, 0x00, 0xBA42);
 }
diff --git a/include/linux/mfd/rtsx_pci.h b/include/linux/mfd/rtsx_pci.h
index 74346d5..b34fec8 100644
--- a/include/linux/mfd/rtsx_pci.h
+++ b/include/linux/mfd/rtsx_pci.h
@@ -967,4 +967,16 @@ static inline u8 *rtsx_pci_get_cmd_data(struct rtsx_pcr 
*pcr)
return (u8 *)(pcr-host_cmds_ptr);
 }
 
+static inline int rtsx_pci_update_cfg_byte(struct rtsx_pcr *pcr, int addr,
+   u8 mask, u8 append)
+{
+   int err;
+   u8 val;
+
+   err = pci_read_config_byte(pcr->pci, addr, &val);
+   if (err < 0)
+   return err;
+   return pci_write_config_byte(pcr->pci, addr, (val & mask) | append);
+}
+
 #endif
-- 
1.7.9.5



Re: [PATCH v4 2/4] Input: misc: Add haptic driver on max77693

2014-09-11 Thread Jaewon Kim

Hello Dmitry Torokhov.

On 2014-09-12 02:10, Dmitry Torokhov wrote:

On Thu, Sep 11, 2014 at 09:54:20PM +0900, Jaewon Kim wrote:

This patch adds the max77693-haptic device driver to support the haptic
controller on the MAX77693. The MAX77693 is a multifunction device with
PMIC, CHARGER, LED, MUIC and HAPTIC blocks, and this patch adds the haptic
device driver for it. The driver supports an external PWM and an LRA
(Linear Resonant Actuator) motor. Users can control the haptic driver
through the force feedback framework.

Signed-off-by: Jaewon Kim jaewon02@samsung.com
Acked-by: Chanwoo Choi cw00.c...@samsung.com

Acked-by: Dmitry Torokhov dmitry.torok...@gmail.com

How do we want to merge this?


Thanks for the review.

Please merge this input device patch only.
The other patches will be merged by Lee Jones.


thanks
Jaewon Kim


Re: [PATCH v2 4/5] toshiba_acpi: Support new keyboard backlight type

2014-09-11 Thread Azael Avalos
Hi Darren,

2014-09-11 18:36 GMT-06:00 Darren Hart dvh...@infradead.org:
 On Wed, Sep 10, 2014 at 09:01:56PM -0600, Azael Avalos wrote:

 Hi Azael,

 Newer Toshiba models now come with a new (and different) keyboard
 backlight implementation with three modes of operation: TIMER,
 ON and OFF, and the LED is controlled internally by the firmware.

 This patch adds support for that type of backlight, changing the
 existing code to accommodate the new implementation.

 The timeout value range is now 1-60 seconds, and the accepted
 modes are now 1 (FN-Z), 2 (AUTO or TIMER), 8 (ON) and 0x10 (OFF).
 This adds two new entries, keyboard_type and available_kbd_modes;
 the first shows the keyboard type and the latter shows the
 supported modes depending on the type.

 Signed-off-by: Azael Avalos coproscef...@gmail.com
 ---
  drivers/platform/x86/toshiba_acpi.c | 117 
 +++-
  1 file changed, 102 insertions(+), 15 deletions(-)

 diff --git a/drivers/platform/x86/toshiba_acpi.c 
 b/drivers/platform/x86/toshiba_acpi.c
 index 4c8fa7b..08147c5 100644
 --- a/drivers/platform/x86/toshiba_acpi.c
 +++ b/drivers/platform/x86/toshiba_acpi.c
 @@ -140,6 +140,10 @@ MODULE_LICENSE(GPL);
 #define HCI_WIRELESS_BT_POWER	0x80
  #define SCI_KBD_MODE_FNZ 0x1
  #define SCI_KBD_MODE_AUTO0x2
 +#define SCI_KBD_MODE_ON  0x8
 +#define SCI_KBD_MODE_OFF 0x10
 +#define SCI_KBD_MODE_MAX SCI_KBD_MODE_OFF
 +#define SCI_KBD_TIME_MAX 0x3c001a

  struct toshiba_acpi_dev {
   struct acpi_device *acpi_dev;
 @@ -155,6 +159,7 @@ struct toshiba_acpi_dev {
   int force_fan;
   int last_key_event;
   int key_event_valid;
 + int kbd_type;
   int kbd_mode;
   int kbd_time;

 @@ -495,6 +500,42 @@ static enum led_brightness 
 toshiba_illumination_get(struct led_classdev *cdev)
  }

  /* KBD Illumination */
 +static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev)
 +{
 + u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 };
 + u32 out[HCI_WORDS];
 + acpi_status status;
 +
 + if (!sci_open(dev))
 + return 0;
 +
 + status = hci_raw(dev, in, out);
 + sci_close(dev);
 + if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) {
 + pr_err("ACPI call to query kbd illumination support failed\n");
 + return 0;
 + } else if (out[0] == HCI_NOT_SUPPORTED) {
 + pr_info("Keyboard illumination not available\n");
 + return 0;
 + }
 +
 + /* Check for keyboard backlight timeout max value,
 + /* previous kbd backlight implementation set this to

 Extra / ^

 +  * 0x3c0003, and now the new implementation set this
 +  * to 0x3c001a, use this to distinguish between them
 +  */
 + if (out[3] == SCI_KBD_TIME_MAX)
 + dev->kbd_type = 2;
 + else
 + dev->kbd_type = 1;
 + /* Get the current keyboard backlight mode */
 + dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK;
 + /* Get the current time (1-60 seconds) */
 + dev->kbd_time = out[2] >> HCI_MISC_SHIFT;
 +
 + return 1;
 +}
 +
  static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 
 time)
  {
   u32 result;
 @@ -1268,20 +1309,46 @@ static ssize_t toshiba_kbd_bl_mode_store(struct 
 device *dev,
  ret = kstrtoint(buf, 0, &mode);
   if (ret)
   return ret;
 - if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)
 + if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO &&
 + mode != SCI_KBD_MODE_ON && mode != SCI_KBD_MODE_OFF)
   return -EINVAL;

 Since you have to check for a type::mode match anyway, this initial test is
 redundant. I suggest inverting the type::mode match below and make it
 exhaustive, something like:


 + /* Check for supported modes depending on keyboard backlight type */
 + if (toshiba-kbd_type == 1) {
 + /* Type 1 supports SCI_KBD_MODE_FNZ and SCI_KBD_MODE_AUTO */
 + if (mode == SCI_KBD_MODE_ON || mode == SCI_KBD_MODE_OFF)


 if (mode != SCI_KBD_MODE_FNZ && mode != SCI_KBD_MODE_AUTO)


 The net number of tests is ultimately smaller and it's fewer lines of code.

Ok


 + return -EINVAL;
 + } else if (toshiba-kbd_type == 2) {
 + /* Type 2 doesn't support SCI_KBD_MODE_FNZ */
 + if (mode == SCI_KBD_MODE_FNZ)
 + return -EINVAL;
 + }
 +
   /* Set the Keyboard Backlight Mode where:
 -  * Mode - Auto (2) | FN-Z (1)
*  Auto - KBD backlight turns off automatically in given time
*  FN-Z - KBD backlight toggles when hotkey pressed
 +  *  ON   - KBD backlight is always on
 +  *  OFF  - KBD backlight is always off
*/
 +
 + /* Only make a change if the actual mode has changed */
   if (toshiba->kbd_mode != mode) {
 + /* Shift the time to base time 

Re: [PATCH] fsnotify: don't put user context if it was never assigned

2014-09-11 Thread Sasha Levin
On 09/11/2014 04:43 PM, Andrew Morton wrote:
 On Tue, 29 Jul 2014 09:25:14 -0400 Sasha Levin sasha.le...@oracle.com wrote:
 
  On some failure paths we may attempt to free user context even
  if it wasn't assigned yet. This will cause a NULL ptr deref
  and a kernel BUG.
 Are you able to identify some failure paths?  I spent some time
 grepping, but it's a pain.
 
 Please try to include such info in changelogs because reviewers (ie,
 me) might want to review those callers to decide whether the bug lies
 elsewhere.
 

Sorry about that.

The path I was looking at is in inotify_new_group():

oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
if (unlikely(!oevent)) {
fsnotify_destroy_group(group);
return ERR_PTR(-ENOMEM);
}

fsnotify_destroy_group() would get called here, but group->inotify_data.user
is only getting assigned later:

group->inotify_data.user = get_current_user();
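
A fix along the lines of the changelog would guard the put in the teardown
path (a sketch; the actual patch may differ):

	/* Only drop the user reference if it was ever taken. */
	if (group->inotify_data.user)
		free_uid(group->inotify_data.user);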


Thanks,
Sasha


RE: [RESEND][PATCH] userns: use marco instead of magic number for max userns level

2014-09-11 Thread Chen, Hanxiao


 -Original Message-
 From: Aristeu Rozanski [mailto:a...@redhat.com]
 
 On Thu, Sep 11, 2014 at 05:51:31PM +0800, Chen Hanxiao wrote:
  Use marco instead of magic number
  for max user namespace level.
 
 patch is ok, but you might want to do s/marco/macro/
 
Sorry for that typo.
Do I need to resend it?

Thanks,
- Chen


Re: [PATCH/RESEND] tty: serial: msm: Add DT based earlycon support

2014-09-11 Thread Rob Herring
On Thu, Sep 11, 2014 at 5:14 PM, Stephen Boyd sb...@codeaurora.org wrote:
 Add support for DT based early console on platforms with the msm
 serial hardware.

 Cc: Rob Herring r...@kernel.org
 Signed-off-by: Stephen Boyd sb...@codeaurora.org

One comment, but looks good to me.

Acked-by: Rob Herring r...@kernel.org

 +static int __init
 +msm_serial_early_console_setup(struct earlycon_device *device, const char 
 *opt)
 +{
 +   if (!device->port.membase)
 +   return -ENODEV;
 +
 +   device->con->write = msm_serial_early_write;
 +   return 0;
 +}
 +OF_EARLYCON_DECLARE(msm_serial, "qcom,msm-uart",
 +   msm_serial_early_console_setup);

Don't you want to support kernel command line as well? Then if you
can't change the DT or bootloader's command line, you can add it into
the kernel build with the appended command line. Don't forget to
document it in kernel-parameters.txt if you do.
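
For reference, command-line support would add something along these lines
(a sketch; macro as provided by the earlycon core of this era):

	EARLYCON_DECLARE(msm_serial, msm_serial_early_console_setup);

so that earlycon=msm_serial,<mmio-address> on the command line reaches the
same setup function.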

Rob


Re: [PATCH V2] ASoC: fsl_ssi: refine ipg clock usage in this module

2014-09-11 Thread Shengjiu Wang
On Thu, Sep 11, 2014 at 03:57:37PM -0700, Nicolin Chen wrote:
 On Thu, Sep 11, 2014 at 01:38:29PM +0800, Shengjiu Wang wrote:
 Move the ipg clock enable and disable operations to startup and shutdown,
 that is, only enable the ipg clock while the SSI is working, and keep the
 clock disabled while the SSI is idle.
 Apart from that, the _fsl_ssi_set_dai_fmt function needs to be called in
 probe, so add ipg clock control for it as well.
 
 There seems to be no objection so far to my last suggestion to
 use regmap's mmio_clk() for the named ipg clk only. So you may still
 want to consider that.

I think mmio_clk() can be put into another patch, and this patch can cover
only the clk_enable() and clk_disable() operations.
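
For context, the regmap helper being referred to would be used along these
lines (a sketch; the register config name is the driver's own and assumed
here):

	/* let regmap gate the named ipg clock around MMIO accesses */
	regs = devm_regmap_init_mmio_clk(&pdev->dev, "ipg", iomem,
					 &fsl_ssi_regconfig);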
 
 Anyway, I'd like to do things in parallel. So I simply tested
 it on my side and it works fine; it may still need to be tested
 by others though.
 
 Nicolin

Hi Markus

could you please review it, and share your comments?

wang shengjiu


[PATCH] mmc: rtsx: add card power off during probe

2014-09-11 Thread micky_ching
From: Roger Tseng rogera...@realtek.com

Some platforms have both a UEFI driver and the MFD/mmc driver. If Linux is
entered while a card is in the slot, the card power is already on, and the
rtsx-mmc driver has no chance to power the card off. This leads UHS-I
cards to fail to enter UHS-I mode.

It is hard to control the state in which the UEFI driver leaves the card,
so we power the card off during probe.

Signed-off-by: Roger Tseng rogera...@realtek.com
Signed-off-by: Micky Ching micky_ch...@realsil.com.cn
---
 drivers/mmc/host/rtsx_pci_sdmmc.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c 
b/drivers/mmc/host/rtsx_pci_sdmmc.c
index dfde4a2..57b0796 100644
--- a/drivers/mmc/host/rtsx_pci_sdmmc.c
+++ b/drivers/mmc/host/rtsx_pci_sdmmc.c
@@ -1341,8 +1341,13 @@ static int rtsx_pci_sdmmc_drv_probe(struct platform_device *pdev)
 	host->pcr = pcr;
 	host->mmc = mmc;
 	host->pdev = pdev;
-	host->power_state = SDMMC_POWER_OFF;
 	INIT_WORK(&host->work, sd_request);
+	sd_power_off(host);
+	/*
+	 * ref: SD spec 3.01: 6.4.1.2 Power On or Power Cycle
+	 */
+	usleep_range(1000, 2000);
+
 	platform_set_drvdata(pdev, host);
 	pcr->slots[RTSX_SD_CARD].p_dev = pdev;
 	pcr->slots[RTSX_SD_CARD].card_event = rtsx_pci_sdmmc_card_event;
-- 
1.7.9.5



[Patch Part3 V5 3/8] iommu/vt-d: Implement DMAR unit hotplug framework

2014-09-11 Thread Jiang Liu
On Intel platforms, an IO Hub (PCI/PCIe host bridge) may contain DMAR
units, so we need to support DMAR hotplug when supporting PCI host
bridge hotplug on Intel platforms.

According to Section 8.8 "Remapping Hardware Unit Hot Plug" in the Intel
Virtualization Technology for Directed I/O Architecture Specification,
Rev 2.2, the ACPI BIOS should implement an ACPI _DSM method under the
ACPI object for the PCI host bridge to support DMAR hotplug.

This patch introduces interfaces to parse ACPI _DSM method for
DMAR unit hotplug. It also implements state machines for DMAR unit
hot-addition and hot-removal.

The PCI host bridge hotplug driver should call dmar_device_hotplug()
before scanning the connected PCI devices for hot-addition and after
destroying all PCI devices for hot-removal.
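
As a sketch of the intended call flow in a host-bridge hotplug handler
(wrapper names taken from the pci_root patch later in this series):

	/* hot-add: bring up DMAR units before scanning child devices */
	if (dmar_device_add(handle))
		return -ENXIO;

	/* hot-remove: tear down DMAR units after the PCI devices are gone */
	pci_remove_root_bus(root->bus);
	dmar_device_remove(handle);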

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c|  268 +--
 drivers/iommu/intel-iommu.c |   78 +-
 drivers/iommu/intel_irq_remapping.c |5 +
 include/linux/dmar.h|   33 +
 4 files changed, 370 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index b3405c50627f..e77b5d3f2f5c 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -75,7 +75,7 @@ static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
 
-static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
+static void dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
/*
 * add INCLUDE_ALL at the tail, so scan the list will find it at
@@ -336,24 +336,45 @@ static struct notifier_block dmar_pci_bus_nb = {
.priority = INT_MIN,
 };
 
+static struct dmar_drhd_unit *
+dmar_find_dmaru(struct acpi_dmar_hardware_unit *drhd)
+{
+	struct dmar_drhd_unit *dmaru;
+
+	list_for_each_entry_rcu(dmaru, &dmar_drhd_units, list)
+		if (dmaru->segment == drhd->segment &&
+		    dmaru->reg_base_addr == drhd->address)
+			return dmaru;
+
+	return NULL;
+}
+
 /**
  * dmar_parse_one_drhd - parses exactly one DMA remapping hardware definition
  * structure which uniquely represent one DMA remapping hardware unit
  * present in the platform
  */
-static int __init
-dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct dmar_drhd_unit *dmaru;
 	int ret = 0;
 
 	drhd = (struct acpi_dmar_hardware_unit *)header;
-	dmaru = kzalloc(sizeof(*dmaru), GFP_KERNEL);
+	dmaru = dmar_find_dmaru(drhd);
+	if (dmaru)
+		goto out;
+
+	dmaru = kzalloc(sizeof(*dmaru) + header->length, GFP_KERNEL);
 	if (!dmaru)
 		return -ENOMEM;
 
-	dmaru->hdr = header;
+	/*
+	 * If header is allocated from slab by ACPI _DSM method, we need to
+	 * copy the content because the memory buffer will be freed on return.
+	 */
+	dmaru->hdr = (void *)(dmaru + 1);
+	memcpy(dmaru->hdr, header, header->length);
 	dmaru->reg_base_addr = drhd->address;
 	dmaru->segment = drhd->segment;
 	dmaru->include_all = drhd->flags & 0x1; /* BIT0: INCLUDE_ALL */
@@ -374,6 +395,7 @@ dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
}
dmar_register_drhd_unit(dmaru);
 
+out:
if (arg)
(*(int *)arg)++;
 
@@ -411,8 +433,7 @@ static int __init dmar_parse_one_andd(struct acpi_dmar_header *header,
 }
 
 #ifdef CONFIG_ACPI_NUMA
-static int __init
-dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_rhsa *rhsa;
struct dmar_drhd_unit *drhd;
@@ -805,14 +826,22 @@ dmar_validate_one_drhd(struct acpi_dmar_header *entry, void *arg)
 		return -EINVAL;
 	}
 
-	addr = early_ioremap(drhd->address, VTD_PAGE_SIZE);
+	if (arg)
+		addr = ioremap(drhd->address, VTD_PAGE_SIZE);
+	else
+		addr = early_ioremap(drhd->address, VTD_PAGE_SIZE);
 	if (!addr) {
 		pr_warn("IOMMU: can't validate: %llx\n", drhd->address);
 		return -EINVAL;
 	}
+
 	cap = dmar_readq(addr + DMAR_CAP_REG);
 	ecap = dmar_readq(addr + DMAR_ECAP_REG);
-	early_iounmap(addr, VTD_PAGE_SIZE);
+
+	if (arg)
+		iounmap(addr);
+	else
+		early_iounmap(addr, VTD_PAGE_SIZE);
 
 	if (cap == (uint64_t)-1 && ecap == (uint64_t)-1) {
 		warn_invalid_dmar(drhd->address, " returns all ones");
@@ -1686,12 +1715,17 @@ int __init dmar_ir_support(void)
 	return dmar->flags & 0x1;
 }
 
+/* Check whether DMAR units are in use */
+static inline bool dmar_in_use(void)
+{
+   

[Patch Part3 V5 0/8] Enable support of Intel DMAR device hotplug

2014-09-11 Thread Jiang Liu
When hot-plugging a discrete IOH or a physical processor with an embedded
IIO, we need to handle the DMAR (or IOMMU) units in the PCIe host bridge
if DMAR is in use. This patch set enhances the current DMAR/IOMMU/IR
drivers to support hotplug and is based on the latest Linus master branch.

All prerequisite patches to support DMAR device hotplug have been merged
into the mainstream kernel, and this is the last patch set to enable
DMAR device hotplug.

You may access the patch set at:
https://github.com/jiangliu/linux.git iommu/hotplug_v5

This patch set has been tested on an Intel development machine.
Any comments and tests are appreciated.

Patches 1-4 enhance the DMAR framework to support hotplug
Patch 5 enhances the Intel interrupt remapping driver to support hotplug
Patch 6 enhances error handling in the Intel IR driver
Patch 7 enhances the Intel IOMMU driver to support hotplug
Patch 8 enhances the ACPI pci_root driver to handle DMAR units

Jiang Liu (8):
  iommu/vt-d: Introduce helper function dmar_walk_resources()
  iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
  iommu/vt-d: Implement DMAR unit hotplug framework
  iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
  iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit
hotplug
  iommu/vt-d: Enhance error recovery in function
intel_enable_irq_remapping()
  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
  pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug

 drivers/acpi/pci_root.c |   16 +-
 drivers/iommu/dmar.c|  532 ---
 drivers/iommu/intel-iommu.c |  297 ++-
 drivers/iommu/intel_irq_remapping.c |  233 +++
 include/linux/dmar.h|   50 +++-
 5 files changed, 888 insertions(+), 240 deletions(-)

-- 
1.7.10.4



[Patch Part3 V5 4/8] iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug

2014-09-11 Thread Jiang Liu
According to the Intel VT-d specification, the _DSM method to support
DMAR hotplug should exist directly under the corresponding ACPI object
representing the PCI host bridge. But some BIOSes don't conform to
this, so search for the _DSM method in the subtree starting from the
ACPI object representing the PCI host bridge.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c |   35 +++
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index e77b5d3f2f5c..df2c2591c1a6 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1926,21 +1926,48 @@ static int dmar_hotplug_remove(acpi_handle handle)
return ret;
 }
 
-static int dmar_device_hotplug(acpi_handle handle, bool insert)
+static acpi_status dmar_get_dsm_handle(acpi_handle handle, u32 lvl,
+  void *context, void **retval)
+{
+   acpi_handle *phdl = retval;
+
+   if (dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD)) {
+   *phdl = handle;
+   return AE_CTRL_TERMINATE;
+   }
+
+   return AE_OK;
+}
+
+int dmar_device_hotplug(acpi_handle handle, bool insert)
 {
int ret;
+   acpi_handle tmp = NULL;
+   acpi_status status;
 
if (!dmar_in_use())
return 0;
 
-	if (!dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD))
+	if (dmar_detect_dsm(handle, DMAR_DSM_FUNC_DRHD)) {
+		tmp = handle;
+	} else {
+		status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
+					     ACPI_UINT32_MAX,
+					     dmar_get_dsm_handle,
+					     NULL, NULL, &tmp);
+		if (ACPI_FAILURE(status)) {
+			pr_warn("Failed to locate _DSM method.\n");
+			return -ENXIO;
+		}
+	}
+	if (tmp == NULL)
 		return 0;
 
 	down_write(&dmar_global_lock);
 	if (insert)
-		ret = dmar_hotplug_insert(handle);
+		ret = dmar_hotplug_insert(tmp);
 	else
-		ret = dmar_hotplug_remove(handle);
+		ret = dmar_hotplug_remove(tmp);
 	up_write(&dmar_global_lock);
 
return ret;
-- 
1.7.10.4



[Patch Part3 V5 6/8] iommu/vt-d: Enhance error recovery in function intel_enable_irq_remapping()

2014-09-11 Thread Jiang Liu
Enhance error recovery in function intel_enable_irq_remapping()
by tearing down all created data structures.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel_irq_remapping.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 7cf31a29f77a..81f110aae6df 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -701,9 +701,11 @@ static int __init intel_enable_irq_remapping(void)
return eim ? IRQ_REMAP_X2APIC_MODE : IRQ_REMAP_XAPIC_MODE;
 
 error:
-   /*
-* handle error condition gracefully here!
-*/
+	for_each_iommu(iommu, drhd)
+		if (ecap_ir_support(iommu->ecap)) {
+			iommu_disable_irq_remapping(iommu);
+			intel_teardown_irq_remapping(iommu);
+		}
 
 	if (x2apic_present)
 		pr_warn("Failed to enable irq remapping.  You are vulnerable to irq-injection attacks.\n");
-- 
1.7.10.4



[Patch Part3 V5 7/8] iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug

2014-09-11 Thread Jiang Liu
Implement required callback functions for intel-iommu driver
to support DMAR unit hotplug.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel-iommu.c |  206 +++
 1 file changed, 151 insertions(+), 55 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 70d9d47eaeda..c2d369524960 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1125,8 +1125,11 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 	unsigned long flags;
 
 	root = (struct root_entry *)alloc_pgtable_page(iommu->node);
-	if (!root)
+	if (!root) {
+		pr_err("IOMMU: allocating root entry for %s failed\n",
+			iommu->name);
 		return -ENOMEM;
+	}
 
 	__iommu_flush_cache(iommu, root, ROOT_SIZE);
 
@@ -1466,7 +1469,7 @@ static int iommu_init_domains(struct intel_iommu *iommu)
 	return 0;
 }
 
-static void free_dmar_iommu(struct intel_iommu *iommu)
+static void disable_dmar_iommu(struct intel_iommu *iommu)
 {
 	struct dmar_domain *domain;
 	int i;
@@ -1490,11 +1493,16 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 
 	if (iommu->gcmd & DMA_GCMD_TE)
 		iommu_disable_translation(iommu);
+}
 
-	kfree(iommu->domains);
-	kfree(iommu->domain_ids);
-	iommu->domains = NULL;
-	iommu->domain_ids = NULL;
+static void free_dmar_iommu(struct intel_iommu *iommu)
+{
+	if ((iommu->domains) && (iommu->domain_ids)) {
+		kfree(iommu->domains);
+		kfree(iommu->domain_ids);
+		iommu->domains = NULL;
+		iommu->domain_ids = NULL;
+	}
 
 	g_iommus[iommu->seq_id] = NULL;
 
@@ -2701,6 +2709,41 @@ static int __init iommu_prepare_static_identity_mapping(int hw)
 	return 0;
 }
 
+static void intel_iommu_init_qi(struct intel_iommu *iommu)
+{
+	/*
+	 * Start from the sane iommu hardware state.
+	 * If the queued invalidation is already initialized by us
+	 * (for example, while enabling interrupt-remapping) then
+	 * we got the things already rolling from a sane state.
+	 */
+	if (!iommu->qi) {
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(-1, iommu);
+		/*
+		 * Disable queued invalidation if supported and already enabled
+		 * before OS handover.
+		 */
+		dmar_disable_qi(iommu);
+	}
+
+	if (dmar_enable_qi(iommu)) {
+		/*
+		 * Queued Invalidate not enabled, use Register Based Invalidate
+		 */
+		iommu->flush.flush_context = __iommu_flush_context;
+		iommu->flush.flush_iotlb = __iommu_flush_iotlb;
+		pr_info("IOMMU: %s using Register based invalidation\n",
+			iommu->name);
+	} else {
+		iommu->flush.flush_context = qi_flush_context;
+		iommu->flush.flush_iotlb = qi_flush_iotlb;
+		pr_info("IOMMU: %s using Queued invalidation\n", iommu->name);
+	}
+}
+
 static int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -2729,6 +2772,10 @@ static int __init init_dmars(void)
 			DMAR_UNITS_SUPPORTED);
 	}
 
+	/* Preallocate enough resources for IOMMU hot-addition */
+	if (g_num_of_iommus < DMAR_UNITS_SUPPORTED)
+		g_num_of_iommus = DMAR_UNITS_SUPPORTED;
+
 	g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
 			GFP_KERNEL);
 	if (!g_iommus) {
@@ -2757,58 +2804,14 @@ static int __init init_dmars(void)
 	 * among all IOMMU's. Need to Split it later.
 	 */
 	ret = iommu_alloc_root_entry(iommu);
-	if (ret) {
-		printk(KERN_ERR "IOMMU: allocate root entry failed\n");
+	if (ret)
 		goto free_iommu;
-	}
 	if (!ecap_pass_through(iommu->ecap))
 		hw_pass_through = 0;
 	}
 
-   /*
-* Start from the sane iommu hardware state.
-*/
-   for_each_active_iommu(iommu, drhd) {
-   /*
-* If the queued invalidation is already initialized by us
-* (for example, while enabling interrupt-remapping) then
-* we got the things already rolling from a sane state.
-*/
-		if (iommu->qi)
-   continue;
-
-   /*
-* Clear any previous faults.
-*/
-   dmar_fault(-1, iommu);
-   /*
-* Disable queued invalidation if supported and already enabled
-* before OS handover.
-*/
-   dmar_disable_qi(iommu);
-   }
-
-   

[Patch Part3 V5 8/8] pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug

2014-09-11 Thread Jiang Liu
Finally, enhance the pci_root driver to support DMAR device hotplug when
hot-plugging PCI host bridges.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/acpi/pci_root.c |   16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e6ae603ed1a1..4e177daa18e3 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/pci-acpi.h>
 #include <linux/pci-aspm.h>
+#include <linux/dmar.h>
 #include <linux/acpi.h>
 #include <linux/slab.h>
 #include <acpi/apei.h>	/* for acpi_hest_init() */
@@ -511,6 +512,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	struct acpi_pci_root *root;
 	acpi_handle handle = device->handle;
 	int no_aspm = 0, clear_aspm = 0;
+	bool hotadd = system_state != SYSTEM_BOOTING;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -557,6 +559,11 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
 	device->driver_data = root;
 
+	if (hotadd && dmar_device_add(handle)) {
+		result = -ENXIO;
+		goto end;
+	}
+
 	pr_info(PREFIX "%s [%s] (domain %04x %pR)\n",
 	       acpi_device_name(device), acpi_device_bid(device),
 	       root->segment, &root->secondary);
@@ -583,7 +590,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 			root->segment, (unsigned int)root->secondary.start);
 		device->driver_data = NULL;
 		result = -ENODEV;
-		goto end;
+		goto remove_dmar;
}
 
 	if (clear_aspm) {
@@ -597,7 +604,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
 
-	if (system_state != SYSTEM_BOOTING) {
+	if (hotadd) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
 	}
@@ -607,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
pci_unlock_rescan_remove();
return 1;
 
+remove_dmar:
+   if (hotadd)
+   dmar_device_remove(handle);
 end:
kfree(root);
return result;
@@ -625,6 +635,8 @@ static void acpi_pci_root_remove(struct acpi_device *device)
 
 	pci_remove_root_bus(root->bus);
 
+	dmar_device_remove(device->handle);
+
pci_unlock_rescan_remove();
 
kfree(root);
-- 
1.7.10.4



[Patch Part3 V5 5/8] iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit hotplug

2014-09-11 Thread Jiang Liu
Implement required callback functions for intel_irq_remapping driver
to support DMAR unit hotplug.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/intel_irq_remapping.c |  222 ++-
 1 file changed, 169 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 9b140ed854ec..7cf31a29f77a 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -36,7 +36,6 @@ struct hpet_scope {
 
 static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
 static struct hpet_scope ir_hpet[MAX_HPET_TBS];
-static int ir_ioapic_num, ir_hpet_num;
 
 /*
  * Lock ordering:
@@ -325,7 +324,7 @@ static int set_ioapic_sid(struct irte *irte, int apic)
 
 	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_IO_APICS; i++) {
-		if (ir_ioapic[i].id == apic) {
+		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
 			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
 			break;
 		}
@@ -352,7 +351,7 @@ static int set_hpet_sid(struct irte *irte, u8 id)
 
 	down_read(&dmar_global_lock);
 	for (i = 0; i < MAX_HPET_TBS; i++) {
-		if (ir_hpet[i].id == id) {
+		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
 			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
 			break;
 		}
@@ -474,17 +473,17 @@ static void iommu_set_irq_remapping(struct intel_iommu *iommu, int mode)
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-
-static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
+static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 {
 	struct ir_table *ir_table;
 	struct page *pages;
 	unsigned long *bitmap;
 
-	ir_table = iommu->ir_table = kzalloc(sizeof(struct ir_table),
-					     GFP_ATOMIC);
+	if (iommu->ir_table)
+		return 0;
 
-	if (!iommu->ir_table)
+	ir_table = kzalloc(sizeof(struct ir_table), GFP_ATOMIC);
+	if (!ir_table)
 		return -ENOMEM;
 
 	pages = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
@@ -493,7 +492,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
 	if (!pages) {
 		pr_err("IR%d: failed to allocate pages of order %d\n",
 		       iommu->seq_id, INTR_REMAP_PAGE_ORDER);
-		kfree(iommu->ir_table);
+		kfree(ir_table);
 		return -ENOMEM;
 	}
 
@@ -508,11 +507,22 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu,
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
+	iommu->ir_table = ir_table;
 
-	iommu_set_irq_remapping(iommu, mode);
 	return 0;
 }
 
+static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
+{
+	if (iommu && iommu->ir_table) {
+		free_pages((unsigned long)iommu->ir_table->base,
+			   INTR_REMAP_PAGE_ORDER);
+		kfree(iommu->ir_table->bitmap);
+		kfree(iommu->ir_table);
+		iommu->ir_table = NULL;
+	}
+}
+
 /*
  * Disable Interrupt Remapping.
  */
@@ -667,9 +677,10 @@ static int __init intel_enable_irq_remapping(void)
 		if (!ecap_ir_support(iommu->ecap))
 			continue;
 
-		if (intel_setup_irq_remapping(iommu, eim))
+		if (intel_setup_irq_remapping(iommu))
 			goto error;
 
+		iommu_set_irq_remapping(iommu, eim);
 		setup = 1;
 	}
 
@@ -700,12 +711,13 @@ error:
return -1;
 }
 
-static void ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
-				    struct intel_iommu *iommu)
+static int ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
+				   struct intel_iommu *iommu,
+				   struct acpi_dmar_hardware_unit *drhd)
 {
 	struct acpi_dmar_pci_path *path;
 	u8 bus;
-	int count;
+	int count, free = -1;
 
 	bus = scope->bus;
 	path = (struct acpi_dmar_pci_path *)(scope + 1);
@@ -721,19 +733,36 @@ static void ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
 			      PCI_SECONDARY_BUS);
 		path++;
 	}
-	ir_hpet[ir_hpet_num].bus   = bus;
-	ir_hpet[ir_hpet_num].devfn = PCI_DEVFN(path->device, path->function);
-	ir_hpet[ir_hpet_num].iommu = iommu;
-	ir_hpet[ir_hpet_num].id    = scope->enumeration_id;
-	ir_hpet_num++;
+
+	for (count = 0; count < MAX_HPET_TBS; count++) {
+		if (ir_hpet[count].iommu == iommu &&
+		    ir_hpet[count].id == scope->enumeration_id)
+			return 0;
+   

[PATCH] perf tools: define _DEFAULT_SOURCE for glibc_2.20

2014-09-11 Thread Chanho Park
_BSD_SOURCE was deprecated in favour of _DEFAULT_SOURCE since glibc
2.20 [1]. To avoid a build warning on glibc 2.20, _DEFAULT_SOURCE should
also be defined.

[1]: https://sourceware.org/glibc/wiki/Release/2.20

Signed-off-by: Chanho Park chanho61.p...@samsung.com
---
 tools/perf/util/util.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 6686436..333d9d1 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -39,6 +39,8 @@
 
 #define _ALL_SOURCE 1
 #define _BSD_SOURCE 1
+/* glibc 2.20 deprecates _BSD_SOURCE in favour of _DEFAULT_SOURCE */
+#define _DEFAULT_SOURCE 1
 #define HAS_BOOL
 
 #include <unistd.h>
-- 
1.9.1



[Patch Part3 V5 2/8] iommu/vt-d: Dynamically allocate and free seq_id for DMAR units

2014-09-11 Thread Jiang Liu
Introduce functions to support dynamic IOMMU seq_id allocating and
releasing, which will be used to support DMAR hotplug.

Also rename IOMMU_UNITS_SUPPORTED as DMAR_UNITS_SUPPORTED.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c|   40 ++--
 drivers/iommu/intel-iommu.c |   13 +++--
 include/linux/dmar.h|6 ++
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index afd46eb9a5de..b3405c50627f 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -70,6 +70,7 @@ LIST_HEAD(dmar_drhd_units);
 struct acpi_table_header * __initdata dmar_tbl;
 static acpi_size dmar_tbl_size;
 static int dmar_dev_scope_status = 1;
+static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
 
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
@@ -928,11 +929,32 @@ out:
return err;
 }
 
+static int dmar_alloc_seq_id(struct intel_iommu *iommu)
+{
+	iommu->seq_id = find_first_zero_bit(dmar_seq_ids,
+					    DMAR_UNITS_SUPPORTED);
+	if (iommu->seq_id >= DMAR_UNITS_SUPPORTED) {
+		iommu->seq_id = -1;
+	} else {
+		set_bit(iommu->seq_id, dmar_seq_ids);
+		sprintf(iommu->name, "dmar%d", iommu->seq_id);
+	}
+
+	return iommu->seq_id;
+}
+
+static void dmar_free_seq_id(struct intel_iommu *iommu)
+{
+	if (iommu->seq_id >= 0) {
+		clear_bit(iommu->seq_id, dmar_seq_ids);
+		iommu->seq_id = -1;
+	}
+}
+
 static int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
struct intel_iommu *iommu;
u32 ver, sts;
-   static int iommu_allocated = 0;
int agaw = 0;
int msagaw = 0;
int err;
@@ -946,13 +968,16 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
if (!iommu)
return -ENOMEM;
 
-	iommu->seq_id = iommu_allocated++;
-	sprintf (iommu->name, "dmar%d", iommu->seq_id);
+	if (dmar_alloc_seq_id(iommu) < 0) {
+		pr_err("IOMMU: failed to allocate seq_id\n");
+		err = -ENOSPC;
+		goto error;
+	}
 
 	err = map_iommu(iommu, drhd->reg_base_addr);
 	if (err) {
 		pr_err("IOMMU: failed to map %s\n", iommu->name);
-		goto error;
+		goto error_free_seq_id;
 	}
 
err = -EINVAL;
@@ -1002,9 +1027,11 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
return 0;
 
- err_unmap:
+err_unmap:
unmap_iommu(iommu);
- error:
+error_free_seq_id:
+   dmar_free_seq_id(iommu);
+error:
kfree(iommu);
return err;
 }
@@ -1028,6 +1055,7 @@ static void free_iommu(struct intel_iommu *iommu)
if (iommu-reg)
unmap_iommu(iommu);
 
+   dmar_free_seq_id(iommu);
kfree(iommu);
 }
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4af2206e41bc..7daa74ed46d0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -328,17 +328,10 @@ static int hw_pass_through = 1;
 /* si_domain contains mulitple devices */
 #define DOMAIN_FLAG_STATIC_IDENTITY	(1 << 1)
 
-/* define the limit of IOMMUs supported in each domain */
-#ifdef CONFIG_X86
-# define   IOMMU_UNITS_SUPPORTED   MAX_IO_APICS
-#else
-# define   IOMMU_UNITS_SUPPORTED   64
-#endif
-
 struct dmar_domain {
int id; /* domain id */
int nid;/* node id */
-   DECLARE_BITMAP(iommu_bmp, IOMMU_UNITS_SUPPORTED);
+   DECLARE_BITMAP(iommu_bmp, DMAR_UNITS_SUPPORTED);
/* bitmap of iommus this domain uses*/
 
struct list_head devices;   /* all devices' list */
@@ -2728,12 +2721,12 @@ static int __init init_dmars(void)
 * threaded kernel __init code path all other access are read
 * only
 */
-		if (g_num_of_iommus < IOMMU_UNITS_SUPPORTED) {
+		if (g_num_of_iommus < DMAR_UNITS_SUPPORTED) {
 			g_num_of_iommus++;
 			continue;
 		}
 		printk_once(KERN_ERR "intel-iommu: exceeded %d IOMMUs\n",
-			    IOMMU_UNITS_SUPPORTED);
+			    DMAR_UNITS_SUPPORTED);
}
 
g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index fac8ca34f9a8..c8a576bc3a98 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -30,6 +30,12 @@
 
 struct acpi_dmar_header;
 
+#ifdef CONFIG_X86
+# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
+#else
+# define	DMAR_UNITS_SUPPORTED	64
+#endif
+
 /* DMAR Flags */
 #define DMAR_INTR_REMAP0x1
 #define DMAR_X2APIC_OPT_OUT0x2
-- 
1.7.10.4


[Patch Part3 V5 1/8] iommu/vt-d: Introduce helper function dmar_walk_resources()

2014-09-11 Thread Jiang Liu
Introduce helper function dmar_walk_resources to walk resource entries
in DMAR table and ACPI buffer object returned by ACPI _DSM method
for IOMMU hot-plug.

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 drivers/iommu/dmar.c|  209 +++
 drivers/iommu/intel-iommu.c |4 +-
 include/linux/dmar.h|   19 ++--
 3 files changed, 122 insertions(+), 110 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 60ab474bfff3..afd46eb9a5de 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -44,6 +44,14 @@
 
 #include irq_remapping.h
 
+typedef int (*dmar_res_handler_t)(struct acpi_dmar_header *, void *);
+struct dmar_res_callback {
+	dmar_res_handler_t	cb[ACPI_DMAR_TYPE_RESERVED];
+	void			*arg[ACPI_DMAR_TYPE_RESERVED];
+	bool			ignore_unhandled;
+	bool			print_entry;
+};
+
 /*
  * Assumptions:
  * 1) The hotplug framework guarentees that DMAR unit will be hot-added
@@ -333,7 +341,7 @@ static struct notifier_block dmar_pci_bus_nb = {
  * present in the platform
  */
 static int __init
-dmar_parse_one_drhd(struct acpi_dmar_header *header)
+dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_hardware_unit *drhd;
struct dmar_drhd_unit *dmaru;
@@ -364,6 +372,10 @@ dmar_parse_one_drhd(struct acpi_dmar_header *header)
return ret;
}
dmar_register_drhd_unit(dmaru);
+
+   if (arg)
+   (*(int *)arg)++;
+
return 0;
 }
 
@@ -376,7 +388,8 @@ static void dmar_free_drhd(struct dmar_drhd_unit *dmaru)
kfree(dmaru);
 }
 
-static int __init dmar_parse_one_andd(struct acpi_dmar_header *header)
+static int __init dmar_parse_one_andd(struct acpi_dmar_header *header,
+ void *arg)
 {
struct acpi_dmar_andd *andd = (void *)header;
 
@@ -398,7 +411,7 @@ static int __init dmar_parse_one_andd(struct acpi_dmar_header *header)
 
 #ifdef CONFIG_ACPI_NUMA
 static int __init
-dmar_parse_one_rhsa(struct acpi_dmar_header *header)
+dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_rhsa *rhsa;
struct dmar_drhd_unit *drhd;
@@ -425,6 +438,8 @@ dmar_parse_one_rhsa(struct acpi_dmar_header *header)
 
return 0;
 }
+#else
+#define	dmar_parse_one_rhsa	dmar_res_noop
 #endif
 
 static void __init
@@ -486,6 +501,52 @@ static int __init dmar_table_detect(void)
return (ACPI_SUCCESS(status) ? 1 : 0);
 }
 
+static int dmar_walk_resources(struct acpi_dmar_header *start, size_t len,
+			       struct dmar_res_callback *cb)
+{
+	int ret = 0;
+	struct acpi_dmar_header *iter, *next;
+	struct acpi_dmar_header *end = ((void *)start) + len;
+
+	for (iter = start; iter < end && ret == 0; iter = next) {
+		next = (void *)iter + iter->length;
+		if (iter->length == 0) {
+			/* Avoid looping forever on bad ACPI tables */
+			pr_debug(FW_BUG "Invalid 0-length structure\n");
+			break;
+		} else if (next > end) {
+			/* Avoid passing table end */
+			pr_warn(FW_BUG "record passes table end\n");
+			ret = -EINVAL;
+			break;
+		}
+
+		if (cb->print_entry)
+			dmar_table_print_dmar_entry(iter);
+
+		if (iter->type >= ACPI_DMAR_TYPE_RESERVED) {
+			/* continue for forward compatibility */
+			pr_debug("Unknown DMAR structure type %d\n",
+				 iter->type);
+		} else if (cb->cb[iter->type]) {
+			ret = cb->cb[iter->type](iter, cb->arg[iter->type]);
+		} else if (!cb->ignore_unhandled) {
+			pr_warn("No handler for DMAR structure type %d\n",
+				iter->type);
+			ret = -EINVAL;
+		}
+	}
+
+	return ret;
+}
+
+static inline int dmar_walk_dmar_table(struct acpi_table_dmar *dmar,
+				       struct dmar_res_callback *cb)
+{
+	return dmar_walk_resources((struct acpi_dmar_header *)(dmar + 1),
+				   dmar->header.length - sizeof(*dmar), cb);
+}
+
 /**
  * parse_dmar_table - parses the DMA reporting table
  */
@@ -493,9 +554,18 @@ static int __init
 parse_dmar_table(void)
 {
 	struct acpi_table_dmar *dmar;
-	struct acpi_dmar_header *entry_header;
 	int ret = 0;
 	int drhd_count = 0;
+	struct dmar_res_callback cb = {
+		.print_entry = true,
+		.ignore_unhandled = true,
+		.arg[ACPI_DMAR_TYPE_HARDWARE_UNIT] = &drhd_count,
+		.cb[ACPI_DMAR_TYPE_HARDWARE_UNIT] = &dmar_parse_one_drhd,
+  

[PATCH v2 0/2] mfd: rtsx: fix PM suspend for 5227 5249

2014-09-11 Thread micky_ching
From: Micky Ching micky_ch...@realsil.com.cn

v2:
use (err < 0) to check whether a function failed, instead of mixing
if (err) and if (err < 0) checks.

This patch set fixes an rts5227/rts5249 suspend issue: when the card
reader resumes from the suspend state, the power state should be reset
before sending any buffer command. The original code did not reset the
PM state first, so resume failed and nothing more could be done.

Micky Ching (2):
  mfd: rtsx: fix PM suspend for 5227
  mfd: rtsx: fix PM suspend for 5249

 drivers/mfd/rts5227.c|   19 +++
 drivers/mfd/rts5249.c|   17 +
 include/linux/mfd/rtsx_pci.h |   12 
 3 files changed, 48 insertions(+)

--
1.7.9.5


[PATCH v2 2/2] mfd: rtsx: fix PM suspend for 5249

2014-09-11 Thread micky_ching
From: Micky Ching micky_ch...@realsil.com.cn

Fix rts5249 failing to send buffer commands after suspend:
PM_CTRL3 should be reset before sending any buffer command after suspend.
Otherwise, buffer commands will fail, which makes resume fail.

Signed-off-by: Micky Ching micky_ch...@realsil.com.cn
---
 drivers/mfd/rts5249.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/mfd/rts5249.c b/drivers/mfd/rts5249.c
index 573de7b..5dd7dc0 100644
--- a/drivers/mfd/rts5249.c
+++ b/drivers/mfd/rts5249.c
@@ -126,10 +126,27 @@ static int rts5249_extra_init_hw(struct rtsx_pcr *pcr)
return rtsx_pci_send_cmd(pcr, 100);
 }
 
+static int rts5249_pm_reset(struct rtsx_pcr *pcr)
+{
+	int err;
+
+	/* init aspm */
+	err = rtsx_pci_update_cfg_byte(pcr, LCTLR, 0xFC, 0);
+	if (err < 0)
+		return err;
+
+	/* reset PM_CTRL3 before send buffer cmd */
+	return rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x00);
+}
+
 static int rts5249_optimize_phy(struct rtsx_pcr *pcr)
 {
int err;
 
+	err = rts5249_pm_reset(pcr);
+	if (err < 0)
+		return err;
+
err = rtsx_pci_write_phy_register(pcr, PHY_REG_REV,
PHY_REG_REV_RESV | PHY_REG_REV_RXIDLE_LATCHED |
PHY_REG_REV_P1_EN | PHY_REG_REV_RXIDLE_EN |
-- 
1.7.9.5



RE: [PATCH 4/4 v3] GPIO: gpio-dwapb: Suspend Resume PM enabling

2014-09-11 Thread Chen, Alvin
 On Tue, 9 Sep 2014, Weike Chen wrote:
 
 
   struct dwapb_gpio;
  +struct dwapb_context;
 
   struct dwapb_gpio_port {
  	struct bgpio_chip	bgc;
  	bool			is_registered;
  	struct dwapb_gpio	*gpio;
  +	struct dwapb_context	*ctx;
 
 Alvin,
 
 Will this build if CONFIG_PM_SLEEP is not defined?
Actually, PM_SLEEP is always set to 'y' in 'kernel/power/Kconfig'. But I
manually changed it to 'n', and this module still compiled correctly.
You may be concerned about 'ctx'; as you can see, all 'ctx' accesses are
guarded by CONFIG_PM_SLEEP.
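
For reference, the usual shape of such guards (a generic sketch, not the
dwapb code itself; SIMPLE_DEV_PM_OPS compiles the callbacks away when
CONFIG_PM_SLEEP is off):

	#ifdef CONFIG_PM_SLEEP
	static int dwapb_gpio_suspend(struct device *dev)
	{
		/* save registers into the per-port context */
		return 0;
	}

	static int dwapb_gpio_resume(struct device *dev)
	{
		/* restore registers from the per-port context */
		return 0;
	}
	#endif

	static SIMPLE_DEV_PM_OPS(dwapb_gpio_pm_ops,
				 dwapb_gpio_suspend, dwapb_gpio_resume);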


 Alan


Re: perf top -g -U --sort=symbol --children == lalalalala?

2014-09-11 Thread Mike Galbraith
On Thu, 2014-09-11 at 18:30 -0300, Arnaldo Carvalho de Melo wrote:

 Also, looking at the changelog entries and at tools/perf/Documentation/
 the only description for --children, the default, is:
 
 
  --children::
  Accumulate callchain of children to parent entry so that they can
  show up in the output.  The output will have a new "Children" column
  and will be sorted on the data.  It requires callchains are recorded.
 

grep of course found that, and git log found more, but nothing told me
what the heck it's sweeping up that's so darn plentiful in my box that
there's more than 100% of it laying about :)

 I think that a longer/clearer entry in the 'perf record' man page is
 required.
 
 Perhaps the description got lost in a --cover-letter for the patch
 series implementing it?

If it ever existed, I can't find it.  A little blurb would be helpful.

-Mike



Re: [PATCH] slab: implement kmalloc guard

2014-09-11 Thread Mikulas Patocka


On Mon, 8 Sep 2014, Christoph Lameter wrote:

 On Mon, 8 Sep 2014, Mikulas Patocka wrote:
 
  I don't know what you mean. If someone allocates 1 objects with sizes
  from 1 to 1, you can't have 1 slab caches - you can't have a slab
  cache for each used size. Also - you can't create a slab cache in
  interrupt context.
 
 Oh you can create them up front on bootup. And I think only the small
 sizes matter. Allocations >=8K are pushed to the page allocator anyways.

Only for SLUB. For SLAB, large allocations still use SLAB caches up to
4M. But anyway - having 8K preallocated slab caches is too much.

If you want to integrate this patch into the slab/slub subsystem, a better 
solution would be to store the exact size requested with kmalloc along the 
slab/slub object itself (before the preceding redzone). But it would 
result in duplicating the work - you'd have to repeat the logic in this 
patch three times - once for slab, once for slub and once for 
kmalloc_large/kmalloc_large_node.

I don't know if it would be better than this patch.
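
To make the stored-size idea concrete, a rough sketch (illustrative only,
not the patch under discussion; a real version would live inside the slab
allocators rather than wrap them):

	/* stash the requested size in a header in front of the object */
	static void *kmalloc_guarded(size_t size, gfp_t flags)
	{
		size_t *p = kmalloc(sizeof(size_t) + size, flags);

		if (!p)
			return NULL;
		*p = size;	/* checked against the redzone on free */
		return p + 1;
	}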

   We already have a redzone structure to check for writes over the end of
   the object. Lets use that.
 
  So, change all three slab subsystems to use that.
 
 SLOB has no debugging features and I think that was intentional. We are
 trying to unify the debug checks etc. Some work on that would be
 appreciated. I think the kmalloc creation is already in slab_common.c

Mikulas


Re: Deadlock in vtime_account_user() vs itself across a page fault

2014-09-11 Thread Frederic Weisbecker
On Thu, Sep 11, 2014 at 11:54:34PM +0100, David Howells wrote:
 
 Whilst trying to use docker, I'm occasionally seeing the attached deadlock in
 user time accounting, with a page fault in the middle.  The relevant lines
 from the pre-fault bits of stack:
 
   [8106d954] ? cpuacct_account_field+0x65/0x9a
   (gdb) i li *0x8106d954
   Line 272 of ../kernel/sched/cpuacct.c
 
  kcpustat->cpustat[index] += val;
 
   [81060d41] account_user_time+0x62/0x95
   (gdb) i li *0x81060d41
   Line 151 of ../kernel/sched/cputime.c
 
   acct_account_cputime(p);
 
   [81061254] vtime_account_user+0x62/0x8d
   (gdb) i li *0x81061254
   Line 264 of ../include/linux/seqlock.h
 
   in write_seqcount_end():
  seqcount_release(&s->dep_map, 1, _RET_IP_);
 
 I can't see any particular reason there should be a page fault occurring,
 except that there's a duff kernel pointer, but I don't get to find out because
 the page fault handling doesn't get that far:-/
 
 David
 ---
 =
 [ INFO: possible recursive locking detected ]
 3.17.0-rc4-fsdevel+ #706 Tainted: GW 
 -
 NetworkManager/2305 is trying to acquire lock:
 ((&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d
 
 but task is already holding lock:
 ((&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d
 
 other info that might help us debug this:
  Possible unsafe locking scenario:
 
CPU0

   lock((&(&p->vtime_seqlock)->lock)->rlock);
   lock((&(&p->vtime_seqlock)->lock)->rlock);
 
  *** DEADLOCK ***
 
  May be due to missing lock nesting notation
 
 3 locks held by NetworkManager/2305:
 #0:  ((&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [8106120d] vtime_account_user+0x1b/0x8d
 #1:  (&(&p->vtime_seqlock)->seqcount){-.}, at: [810df2f9] context_tracking_user_exit+0x54/0xb7
 #2:  (rcu_read_lock){..}, at: [8106d8ef] cpuacct_account_field+0x0/0x9a
 
 stack backtrace:
 CPU: 0 PID: 2305 Comm: NetworkManager Tainted: GW  
 3.17.0-rc4-fsdevel+ #706
 Hardware name:  /DG965RY, BIOS 
 MQ96510J.86A.0816.2006.0716.2308 07/16/2006
   8800389bfbe0 815063fd 8235c880
  8800389bfcc0 810717f5 8800389bfcd0 81071a90
   8106d85d 0001 81061200
 Call Trace:
  [815063fd] dump_stack+0x4d/0x66
  [810717f5] __lock_acquire+0x7d7/0x1a2a
  [81071a90] ? __lock_acquire+0xa72/0x1a2a
  [8106d85d] ? cpuacct_css_alloc+0x93/0x93
  [81061200] ? vtime_account_user+0xe/0x8d
  [81071a90] ? __lock_acquire+0xa72/0x1a2a
  [810730fc] lock_acquire+0x8b/0x101
  [810730fc] ? lock_acquire+0x8b/0x101
  [8106120d] ? vtime_account_user+0x1b/0x8d
  [8150bc4b] _raw_spin_lock+0x2b/0x3a
  [8106120d] ? vtime_account_user+0x1b/0x8d
  [8106120d] vtime_account_user+0x1b/0x8d
  [810df2f9] context_tracking_user_exit+0x54/0xb7
  [81030682] do_page_fault+0x3a/0x54
  [8150e462] page_fault+0x22/0x30
  [8106d954] ? cpuacct_account_field+0x65/0x9a

vmalloc'ed areas can fault due to lazy mapping.
That would be an excellent candidate here because cpuacct_account_field()
accesses per cpu stats that are allocated with alloc_percpu() which
uses...vmalloc().

vmalloc() faults have always been a PITA. Especially with per cpu allocation,
basically it means that the kernel can fault about anywhere.

So the only solution I see right now is to move task_group_account_field()
outside the lock. It doesn't need it, but that means I need to split up
account_user_time() and have less common code between tickless and tick time 
accounting.

In the hope that the other accounting code (acct, group accounting, ...)
doesn't access more percpu-allocated stuff.

Ah, I could also add recursion detection to the vtime_account_*()
functions. Yeah, that would be much safer. The recursive call could simply
return and let the first caller do the accounting. But that means we could
account exception time as user time.

We could also do both and let the recursive call warn.
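
A sketch of that recursion guard (hypothetical; the per-cpu flag and its
name are invented here for illustration):

	static DEFINE_PER_CPU(bool, vtime_accounting);

	void vtime_account_user(struct task_struct *tsk)
	{
		/* a page fault in the middle re-enters us; let the outer
		 * call do the accounting, and warn so we notice */
		if (WARN_ON_ONCE(this_cpu_read(vtime_accounting)))
			return;
		this_cpu_write(vtime_accounting, true);
		/* ... existing accounting under the seqlock ... */
		this_cpu_write(vtime_accounting, false);
	}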

BTW, I should check if I can turn the seqlock into a seqcount; not that it
would fix anything here, though. It looks like it's only ever updated locally.


Re: [PATCH V2] ASoC: fsl_ssi: refine ipg clock usage in this module

2014-09-11 Thread Timur Tabi

Shengjiu Wang wrote:

+	ret = clk_prepare_enable(ssi_private->clk);
+	if (ret)
+		return ret;


Will this work on PowerPC, where ssi_private->clk is always NULL?
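
(With the common clock framework, clk_prepare_enable(NULL) is a no-op that
returns 0, so a NULL clk would be harmless there; for platforms without
that guarantee, a defensive sketch:)

	if (ssi_private->clk) {
		ret = clk_prepare_enable(ssi_private->clk);
		if (ret)
			return ret;
	}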


[PATCH net-next v2] r8152: support VLAN

2014-09-11 Thread Hayes Wang
Support hw VLAN for tx and rx. And enable them by default.

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/usb/r8152.c | 79 -
 1 file changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 2130c75..9403219 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -506,6 +506,7 @@ struct rx_desc {
 #define IPF			(1 << 23)	/* IP checksum fail */
 #define UDPF			(1 << 22)	/* UDP checksum fail */
 #define TCPF			(1 << 21)	/* TCP checksum fail */
+#define RX_VLAN_TAG		(1 << 16)
 
__le32 opts4;
__le32 opts5;
@@ -531,6 +532,7 @@ struct tx_desc {
 #define MSS_MAX			0x7ffU
 #define TCPHO_SHIFT		17
 #define TCPHO_MAX		0x7ffU
+#define TX_VLAN_TAG		(1 << 16)
 };
 
 struct r8152;
@@ -1423,6 +1425,25 @@ static int msdn_giant_send_check(struct sk_buff *skb)
return ret;
 }
 
+static inline void rtl_tx_vlan_tag(struct tx_desc *desc, struct sk_buff *skb)
+{
+	if (vlan_tx_tag_present(skb)) {
+		u32 opts2;
+
+		opts2 = TX_VLAN_TAG | swab16(vlan_tx_tag_get(skb));
+		desc->opts2 |= cpu_to_le32(opts2);
+	}
+}
+
+static inline void rtl_rx_vlan_tag(struct rx_desc *desc, struct sk_buff *skb)
+{
+	u32 opts2 = le32_to_cpu(desc->opts2);
+
+	if (opts2 & RX_VLAN_TAG)
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+				       swab16(opts2 & 0xffff));
+}
+
 static int r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc,
 struct sk_buff *skb, u32 len, u32 transport_offset)
 {
@@ -1550,6 +1571,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
continue;
}
 
+   rtl_tx_vlan_tag(tx_desc, skb);
+
tx_data += sizeof(*tx_desc);
 
 		len = skb->len;
@@ -1691,6 +1714,7 @@ static void rx_bottom(struct r8152 *tp)
 			memcpy(skb->data, rx_data, pkt_len);
 			skb_put(skb, pkt_len);
 			skb->protocol = eth_type_trans(skb, netdev);
+			rtl_rx_vlan_tag(rx_desc, skb);
 			netif_receive_skb(skb);
 			stats->rx_packets++;
 			stats->rx_bytes += pkt_len;
@@ -2082,6 +2106,34 @@ static void r8152_power_cut_en(struct r8152 *tp, bool enable)
 	ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data);
 }
 
+static void rtl_rx_vlan_en(struct r8152 *tp, bool enable)
+{
+	u32 ocp_data;
+
+	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR);
+	if (enable)
+		ocp_data |= CPCR_RX_VLAN;
+	else
+		ocp_data &= ~CPCR_RX_VLAN;
+	ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data);
+}
+
+static int rtl8152_set_features(struct net_device *dev,
+				netdev_features_t features)
+{
+	netdev_features_t changed = features ^ dev->features;
+	struct r8152 *tp = netdev_priv(dev);
+
+	if (changed & NETIF_F_HW_VLAN_CTAG_RX) {
+		if (features & NETIF_F_HW_VLAN_CTAG_RX)
+			rtl_rx_vlan_en(tp, true);
+		else
+			rtl_rx_vlan_en(tp, false);
+	}
+
+	return 0;
+}
+
 #define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
 
 static u32 __rtl_get_wol(struct r8152 *tp)
@@ -2330,9 +2382,7 @@ static void r8152b_exit_oob(struct r8152 *tp)
ocp_write_dword(tp, MCU_TYPE_USB, USB_TX_DMA,
TEST_MODE_DISABLE | TX_SIZE_ADJUST1);
 
-	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR);
-	ocp_data &= ~CPCR_RX_VLAN;
-	ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data);
+	rtl_rx_vlan_en(tp, tp->netdev->features & NETIF_F_HW_VLAN_CTAG_RX);
 
ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8152_RMS);
 
@@ -2376,9 +2426,7 @@ static void r8152b_enter_oob(struct r8152 *tp)
 
ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8152_RMS);
 
-	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR);
-	ocp_data |= CPCR_RX_VLAN;
-	ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data);
+	rtl_rx_vlan_en(tp, true);
 
ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PAL_BDC_CR);
ocp_data |= ALDPS_PROXY_MODE;
@@ -2532,9 +2580,7 @@ static void r8153_first_init(struct r8152 *tp)
usleep_range(1000, 2000);
}
 
-	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_CPCR);
-	ocp_data &= ~CPCR_RX_VLAN;
-	ocp_write_word(tp, MCU_TYPE_PLA, PLA_CPCR, ocp_data);
+	rtl_rx_vlan_en(tp, tp->netdev->features & NETIF_F_HW_VLAN_CTAG_RX);
 
ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, RTL8153_RMS);
ocp_write_byte(tp, MCU_TYPE_PLA, 

[Bugfix] x86, NUMA, ACPI: Online node earlier when doing CPU hot-addition

2014-09-11 Thread Jiang Liu
With the typical CPU hot-addition flow on x86, PCI host bridges embedded
in a physical processor are always associated with NUMA_NO_NODE, which
may cause sub-optimal performance.
1) Handle CPU hot-addition notification
acpi_processor_add()
acpi_processor_get_info()
acpi_processor_hotadd_init()
acpi_map_lsapic()
1.a)acpi_map_cpu2node()

2) Handle PCI host bridge hot-addition notification
acpi_pci_root_add()
pci_acpi_scan_root()
2.a)	if (node != NUMA_NO_NODE && !node_online(node)) node = NUMA_NO_NODE;

3) Handle memory hot-addition notification
acpi_memory_device_add()
acpi_memory_enable_device()
add_memory()
3.a)node_set_online();

4) Online CPUs through sysfs interfaces
cpu_subsys_online()
cpu_up()
try_online_node()
4.a)node_set_online();

So the associated node is always offline, because it is not onlined
until step 3.a or 4.a.

We could improve performance by onlining the node at step 1.a. This change
also makes the code symmetric: nodes are always created when handling
CPU/memory hot-addition events, rather than when handling user requests
from sysfs interfaces, and are destroyed when handling CPU/memory
hot-removal events.
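
A sketch of the idea at step 1.a (function names assumed from the flow
above; the exact fix may differ):

	/* in acpi_map_cpu2node(), while mapping the hot-added CPU */
	nid = acpi_get_node(handle);
	if (nid != NUMA_NO_NODE) {
		set_apicid_to_node(physid, nid);
		numa_set_node(cpu, nid);
		if (!node_online(nid))
			try_online_node(nid);
	}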

It also closes a race window caused by kmalloc_node(cpu_to_node(cpu)),
which may cause a system panic as below.
[ 3663.324476] BUG: unable to handle kernel paging request at 1f08
[ 3663.332348] IP: [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops:  [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis 
ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca 
ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic 
usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ 
#21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 
880845fba000
[ 3663.422741] RIP: 0010:[81172219]  [81172219] 
__alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:880845fbbcd0  EFLAGS: 00010246
[ 3663.439265] RAX: 1f00 RBX:  RCX: 
[ 3663.447291] RDX:  RSI: 0a8d RDI: 81a8d950
[ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001
[ 3663.463345] R10: 0001 R11:  R12: 002052d0
[ 3663.471363] R13: 880854c07600 R14: 0002 R15: 
[ 3663.479389] FS:  7f2e8b99e800() GS:88105a40() 
knlGS:
[ 3663.488514] CS:  0010 DS:  ES:  CR0: 80050033
[ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0
[ 3663.503476] Stack:
[ 3663.505757]  811bd74d 880854c01d98 880854c01df0 
880854c01dd0
[ 3663.514167]  0003208ca420 00075a5d84d0 88082fe54b00 
811bb35f
[ 3663.522567]  880854c07600 0003 1f00 
880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753]  [811bd74d] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421]  [811bb35f] ? new_slab+0x3f/0x2d0
[ 3663.546307]  [811bb3c5] new_slab+0xa5/0x2d0
[ 3663.552001]  [81768c97] __slab_alloc+0x35d/0x54a
[ 3663.558185]  [810a4845] ? local_clock+0x25/0x30
[ 3663.564686]  [8177a34c] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356]  [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609]  [810c77f1] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570]  [811be476] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112]  [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363]  [810b0054] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423]  [810a359f] sched_create_group+0x3f/0x80
[ 3663.613994]  [810b611f] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732]  [8108258a] sys_setsid+0xea/0x110
[ 3663.628020]  [8177f42d] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 
0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 48 83 
78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP  [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491]  RSP 880845fbbcd0
[ 3663.668429] CR2: 1f08
[ 3663.672659] ---[ 

RE: [PATCH v8 06/10] mips: sync struct siginfo with general version

2014-09-11 Thread Ren, Qiaowei


On 2014-09-12, Thomas Gleixner wrote:
 On Thu, 11 Sep 2014, Qiaowei Ren wrote:
 
 Due to new fields about bound violation added into struct siginfo,
 this patch syncs it with general version to avoid build issue.
 
 You completely fail to explain which build issue is addressed by this
 patch. The code you added to kernel/signal.c which accesses _addr_bnd
 is guarded by
 
 +#ifdef SEGV_BNDERR
 
 which is not defined my MIPS. Also why is this only affecting MIPS and
 not any other architecture which provides its own struct siginfo ?
 
 That patch makes no sense at all, at least not without a proper explanation.


For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) includes
the general siginfo.h and only replaces the general struct siginfo with the
MIPS-specific struct siginfo. So SEGV_BNDERR will be defined for all archs,
and we will get errors like "no _lower in struct siginfo" when arch=mips.

In addition, only the MIPS arch defines its own struct siginfo, so this
only affects MIPS.

Thanks,
Qiaowei

 
 Signed-off-by: Qiaowei Ren qiaowei@intel.com
 ---
  arch/mips/include/uapi/asm/siginfo.h |4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 diff --git a/arch/mips/include/uapi/asm/siginfo.h
 b/arch/mips/include/uapi/asm/siginfo.h
 index e811744..d08f83f 100644
 --- a/arch/mips/include/uapi/asm/siginfo.h
 +++ b/arch/mips/include/uapi/asm/siginfo.h
 @@ -92,6 +92,10 @@ typedef struct siginfo {
  int _trapno;/* TRAP # which caused the signal */
  #endif
  short _addr_lsb;
 +struct {
 +void __user *_lower;
 +void __user *_upper;
 +} _addr_bnd;
  } _sigfault;
  
  /* SIGPOLL, SIGXFSZ (To do ...)  */
 --
 1.7.1
 



RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables

2014-09-11 Thread Ren, Qiaowei


On 2014-09-11, Hansen, Dave wrote:
 On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 + * This function will be called by do_munmap(), and the VMAs
 + covering
 + * the virtual address region start...end have already been split
 + if
 + * necessary and remvoed from the VMA list.
 
 remvoed - removed
 
  +void mpx_unmap(struct mm_struct *mm,
  +		unsigned long start, unsigned long end)
  +{
  +	int ret;
  +
  +	ret = mpx_try_unmap(mm, start, end);
  +	if (ret == -EINVAL)
  +		force_sig(SIGSEGV, current);
  +}
 
 In the case of a fault during an unmap, this just ignores the
 situation and returns silently.  Where is the code to retry the
 freeing operation outside of mmap_sem?

Dave, do you mean the delayed_work code? According to our discussion, it
will be deferred to another mainline post.

Thanks,
Qiaowei



[PATCH v2] Hibernate: Do not assume the first e820 area to be RAM

2014-09-11 Thread Lee, Chun-Yi
In arch/x86/kernel/setup.c::trim_bios_range(), the code introduced
by 1b5576e6 (based on d8a9e6a5) updates the first 4KB of memory
to be an E820_RESERVED region. That's because it's a BIOS-owned area
that is generally not listed in the E820 table:

[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00096fff] usable
[0.00] BIOS-e820: [mem 0x00097000-0x00097fff] reserved
...
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable

But the first 4KB region was not registered as nosave memory:

[0.00] PM: Registered nosave memory: [mem 0x00097000-0x00097fff]
[0.00] PM: Registered nosave memory: [mem 0x000a-0x000f]

The code in e820_mark_nosave_regions() assumes the first e820 area to be
RAM, so the first 4KB E820_RESERVED region is ignored when registering
nosave regions. This patch removes that assumption about the first e820 area.

v2:
Avoid the i > 0 check in the for loop. (coding suggestion from Yinghai Lu)

Cc: Rafael J. Wysocki r...@rjwysocki.net
Cc: Len Brown len.br...@intel.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Yinghai Lu ying...@kernel.org
Acked-by: Pavel Machek pa...@ucw.cz
Signed-off-by: Lee, Chun-Yi j...@suse.com
---
 arch/x86/kernel/e820.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

Index: linux-3.12-SLE12/arch/x86/kernel/e820.c
===
--- linux-3.12-SLE12.orig/arch/x86/kernel/e820.c
+++ linux-3.12-SLE12/arch/x86/kernel/e820.c
@@ -682,15 +682,14 @@ void __init parse_e820_ext(u64 phys_addr
  * hibernation (32 bit) or software suspend and suspend to RAM (64 bit).
  *
  * This function requires the e820 map to be sorted and without any
- * overlapping entries and assumes the first e820 area to be RAM.
+ * overlapping entries.
  */
 void __init e820_mark_nosave_regions(unsigned long limit_pfn)
 {
 	int i;
-	unsigned long pfn;
+	unsigned long pfn = 0;
 
-	pfn = PFN_DOWN(e820.map[0].addr + e820.map[0].size);
-	for (i = 1; i < e820.nr_map; i++) {
+	for (i = 0; i < e820.nr_map; i++) {
 		struct e820entry *ei = &e820.map[i];
 
 		if (pfn < PFN_UP(ei->addr))
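The quoted hunk is cut short here; for reference, the complete function
after this change should look roughly as follows (an editorial
reconstruction, so treat it as a sketch rather than the authoritative diff):

void __init e820_mark_nosave_regions(unsigned long limit_pfn)
{
	int i;
	unsigned long pfn = 0;

	for (i = 0; i < e820.nr_map; i++) {
		struct e820entry *ei = &e820.map[i];

		/* With pfn starting at 0, a first entry that is not RAM
		 * is handled like any later gap or non-RAM entry. */
		if (pfn < PFN_UP(ei->addr))
			register_nosave_region(pfn, PFN_UP(ei->addr));

		pfn = PFN_DOWN(ei->addr + ei->size);
		if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
			register_nosave_region(PFN_UP(ei->addr), pfn);

		if (pfn >= limit_pfn)
			break;
	}
}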


RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

2014-09-11 Thread Ren, Qiaowei


On 2014-09-11, Hansen, Dave wrote:
 On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 +
 +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
 +		MPX_BNDCFG_ADDR_MASK);
 +}
 
 I don't think casting a u64 to a ulong, then to a pointer is useful.
 Just take the '(unsigned long)' out.

If so, this spits out a warning on 32-bit:

arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir':
arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]
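The warning comes from converting a 64-bit integer directly into a 32-bit
pointer; narrowing explicitly through unsigned long (which matches the
pointer width on both 32-bit and 64-bit x86) keeps the compiler quiet. A
standalone illustration (not kernel code, just a sketch):

#include <stdint.h>

void example(uint64_t cfg_reg_u)
{
	/*
	 * Direct cast: on 32-bit this triggers -Wint-to-pointer-cast,
	 * because the integer is wider than the pointer:
	 *
	 *	void *bad = (void *)cfg_reg_u;
	 */

	/* Narrowing through unsigned long first compiles cleanly on
	 * both 32-bit and 64-bit: */
	void *good = (void *)(unsigned long)cfg_reg_u;
	(void)good;
}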

Thanks,
Qiaowei



Re: [PATCH RFC 4/4] xen, blkback: add support for multiple block rings

2014-09-11 Thread Bob Liu

On 09/12/2014 07:45 AM, Arianna Avanzini wrote:
 On Fri, Aug 22, 2014 at 02:15:58PM +0100, David Vrabel wrote:
 On 22/08/14 12:20, Arianna Avanzini wrote:
 This commit adds to xen-blkback the support to retrieve the block
 layer API being used and the number of available hardware queues,
 in case the block layer is using the multi-queue API. This commit
 also lets the driver advertise the number of available hardware
 queues to the frontend via XenStore, therefore allowing for actual
 multiple I/O rings to be used.

 Does it make sense for the number of queues to depend on the
 number of queues available in the underlying block device?
 
 Thank you for raising that point. It probably is not the best solution.
 
 Bob Liu suggested having the number of I/O rings depend on the number
 of vCPUs in the driver domain. Konrad Wilk suggested computing the
 number of I/O rings according to the following formula, which preserves
 the possibility to explicitly define the number of hardware queues to be
 exposed to the frontend:
 what_backend_exposes = some_module_parameter ? :
min(nr_online_cpus(), nr_hardware_queues()).
 io_rings = min(nr_online_cpus(), what_backend_exposes);
 
 (Please do correct me if I misunderstood your point)
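Spelled out as code, the proposal reads roughly as follows (a sketch only;
the helper and parameter names are illustrative, not actual xen-blkback
code):

/* Sketch of the proposed ring-count negotiation. */
static unsigned int blkback_nr_rings(unsigned int max_queues,
				     unsigned int nr_hardware_queues)
{
	unsigned int what_backend_exposes;

	/* Honor an explicit module parameter if set, else bound by HW. */
	what_backend_exposes = max_queues ?:
		min(num_online_cpus(), nr_hardware_queues);

	/* Never advertise more rings than online CPUs. */
	return min(num_online_cpus(), what_backend_exposes);
}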

Since the xen-netfront/xen-netback drivers have already implemented
multi-queue, I'd like us to negotiate the number of queues the same
way the net drivers do.

Thanks,
-Bob


Re: [PATCH 00/10] implement zsmalloc shrinking

2014-09-11 Thread Seth Jennings
On Thu, Sep 11, 2014 at 04:53:51PM -0400, Dan Streetman wrote:
 Now that zswap can use zsmalloc as a storage pool via zpool, it will
 try to shrink its zsmalloc zs_pool once it reaches its max_pool_percent
 limit.  These patches implement zsmalloc shrinking.  The way the pool is
 shrunk is by finding a zspage and reclaiming it, by evicting each of its
 objects that is in use.
 
 Without these patches zswap, and any other future user of zpool/zsmalloc
 that attempts to shrink the zpool/zs_pool, will only get errors and will
 be unable to shrink its zpool/zs_pool.  With the ability to shrink, zswap
 can keep the most recent compressed pages in memory.
 
 Note that the design of zsmalloc makes it impossible to actually find the
 LRU zspage, so each class and fullness group is searched in a round-robin
 method to find the next zspage to reclaim.  Each fullness group orders its
 zspages in LRU order, so the oldest zspage is used for each fullness group.

After a quick inspection, the code looks reasonable.  Thanks!

I do wonder if this actually works well in practice though.

Have you run any tests that overflow the zsmalloc pool?  What does
performance look like at that point?  I would expect it would be worse
than allowing the overflow pages to go straight to swap, since, in
almost every case, you would be writing back more than one page.  In
some cases, MANY more than one page (up to 255 for a full zspage in the
minimum class size).

There have always been two sticking points with shrinking in zsmalloc
(one of which you have mentioned):

1) Low LRU locality among objects in a zspage.  zsmalloc values density
over reclaim ordering so it is hard to make good reclaim selection
decisions.

2) Writeback storm. If you try to reclaim a zspage with lots of objects
(i.e. small class size in fullness group ZS_FULL) you can create a ton
of memory pressure by uncompressing objects and adding them to the swap
cache.

A few reclaim models:

- Reclaim zspage with fewest objects: 

  This reduces writeback storm but would likely reclaim more recently
  allocated zspages that contain more recently used (added) objects.

- Reclaim zspage with largest class size:

  This also reduces writeback storm as zspages with larger objects
  (poorly compressible) are written back first.  This is not LRU though.
  This is the best of the options IMHO.  I'm not saying that is it good.

- Reclaim LRU round-robin through the fullness groups (approach used):

  The LRU here is limited since, as the number of objects in the zspage
  increases, it is LRU only wrt the most recently added object in the
  zspage.  It also has a high risk of a writeback storm, since it will
  eventually try to reclaim from the ZS_FULL group of the minimum class
  size.  (A sketch of this selection loop follows below.)
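A minimal sketch of that round-robin selection (the reclaim_idx counter
and the exact field names are hypothetical here, not Dan's actual patch
code):

/* Walk the (class, fullness group) slots round-robin and return the
 * first non-empty group's head zspage; each fullness group keeps its
 * zspages in LRU order. Field names are illustrative only. */
static struct page *find_next_reclaim_zspage(struct zs_pool *pool)
{
	unsigned int nr_slots = ZS_SIZE_CLASSES * _ZS_NR_FULLNESS_GROUPS;
	struct size_class *class;
	enum fullness_group fg;
	unsigned int i, idx;

	for (i = 0; i < nr_slots; i++) {
		idx = atomic_inc_return(&pool->reclaim_idx) % nr_slots;
		class = pool->size_class[idx / _ZS_NR_FULLNESS_GROUPS];
		fg = idx % _ZS_NR_FULLNESS_GROUPS;
		if (class->fullness_list[fg])
			return class->fullness_list[fg];
	}
	return NULL;	/* nothing to reclaim */
}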

There is also the point that writing back objects might not be the best
way to reclaim from zsmalloc at all.  Maybe compaction is the way to go.
This was recently discussed on the list.

http://marc.info/?l=linux-mmm=140917577412645w=2

As mentioned in that thread, it would require zsmalloc to add a
layer of indirection so that the objects could be relocated without
notifying the user.  The compaction mechanism would also be fun to
design I imagine.  But, in my mind, compaction is really needed,
regardless of whether or not zsmalloc is capable of writeback, and would
be more beneficial.

tl;dr version:

I would really need to see some evidence (and try it myself) that this
didn't run off a cliff when you overflow the zsmalloc pool.  It seems
like additional risk and complexity to avoid LRU inversion _after_ the
pool overflows.  And by avoid I mean maybe avoid as the reclaim
selection is just slightly more LRUish than random selection.

Thanks,
Seth

 
 ---
 
 This patch set applies to linux-next.
 
 Dan Streetman (10):
   zsmalloc: fix init_zspage free obj linking
   zsmalloc: add fullness group list for ZS_FULL zspages
   zsmalloc: always update lru ordering of each zspage
   zsmalloc: move zspage obj freeing to separate function
   zsmalloc: add atomic index to find zspage to reclaim
   zsmalloc: add zs_ops to zs_pool
   zsmalloc: add obj_handle_is_free()
   zsmalloc: add reclaim_zspage()
   zsmalloc: add zs_shrink()
   zsmalloc: implement zs_zpool_shrink() with zs_shrink()
 
  drivers/block/zram/zram_drv.c |   2 +-
  include/linux/zsmalloc.h  |   7 +-
  mm/zsmalloc.c | 314 
 +-
  3 files changed, 290 insertions(+), 33 deletions(-)
 
 -- 
 1.8.3.1
 


Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking

2014-09-11 Thread Seth Jennings
On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
 When zsmalloc creates a new zspage, it initializes each object it contains
 with a link to the next object, so that the zspage has a singly-linked list
 of its free objects.  However, the logic that sets up the links is wrong,
 and in the case of objects that are precisely aligned with the page boundaries
 (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
 next page is skipped, due to incrementing the offset twice.  The logic can be
 simplified, as it doesn't need to calculate how many objects can fit on the
 current page; simply checking the offset for each object is enough.
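To make the off-by-one concrete, a worked example (PAGE_SIZE = 4096,
class->size = 2048, i.e. objects exactly 1/2 PAGE_SIZE, off = 0 at the
start of a page):

/*
 * Old logic:
 *   objs_on_page = (4096 - 0) / 2048 = 2
 *   i = 1: off += 2048 -> 2048; 2048 < 4096, so object 1 is linked
 *   i = 2: off += 2048 -> 4096; not < 4096, so nothing is linked
 * After the loop: off = (4096 + 2048) % 4096 = 2048
 *
 * class->size is added both in the loop's last iteration and again in
 * the final modulo line, so the scan of the next page starts at offset
 * 2048 instead of 0 and its first object is never linked.
 */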
 
 Change zsmalloc init_zspage() logic to iterate through each object on
 each of its pages, checking the offset to verify the object is on the
 current page before linking it into the zspage.
 
 Signed-off-by: Dan Streetman ddstr...@ieee.org
 Cc: Minchan Kim minc...@kernel.org

This one stands on its own as a bugfix.

Reviewed-by: Seth Jennings sjenni...@variantweb.net

 ---
  mm/zsmalloc.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)
 
 diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
 index c4a9157..03aa72f 100644
 --- a/mm/zsmalloc.c
 +++ b/mm/zsmalloc.c
 @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
  	while (page) {
  		struct page *next_page;
  		struct link_free *link;
  -		unsigned int i, objs_on_page;
  +		unsigned int i = 1;
  
  		/*
  		 * page->index stores offset of first object starting
 @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct size_class *class)
  
  		link = (struct link_free *)kmap_atomic(page) +
  				off / sizeof(*link);
  -		objs_on_page = (PAGE_SIZE - off) / class->size;
  
  -		for (i = 1; i <= objs_on_page; i++) {
  -			off += class->size;
  -			if (off < PAGE_SIZE) {
  -				link->next = obj_location_to_handle(page, i);
  -				link += class->size / sizeof(*link);
  -			}
  +		while ((off += class->size) < PAGE_SIZE) {
  +			link->next = obj_location_to_handle(page, i++);
  +			link += class->size / sizeof(*link);
  		}
  
  		/*
 @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
  		link->next = obj_location_to_handle(next_page, 0);
  		kunmap_atomic(link);
  		page = next_page;
  -		off = (off + class->size) % PAGE_SIZE;
  +		off %= PAGE_SIZE;
   }
  }
  
 -- 
 1.8.3.1
 


linux-next: build failure after merge of the slave-dma tree

2014-09-11 Thread Stephen Rothwell
Hi Vinod,

After merging the slave-dma tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:


drivers/spi/spi-pxa2xx-pci.c:70:3: error: unknown field 'max_clk_rate' specified in initializer
   .max_clk_rate = 5000,
   ^

Caused by commit bfe607a528ba ("spi/pxa2xx-pci: Add support for Intel
Braswell").

I have used the slave-dma tree from next-20140911 for today.
-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




[PATCH v2 0/2] Add irq_over_gpio DT support to STMPE

2014-09-11 Thread Sean Cross
These patches add support for using a GPIO as an IRQ source for the
STMPE module when configured using device tree.

Changes since v1:
- Split actual patch and Documentation into two parts

Sean Cross (2):
  mfd: stmpe: support gpio over irq under device tree
  mfd: stmpe: Document DT binding for irq_over_gpio

 Documentation/devicetree/bindings/mfd/stmpe.txt | 1 +
 drivers/mfd/stmpe.c | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.1.0



[PATCH v2 1/2] mfd: stmpe: support gpio over irq under device tree

2014-09-11 Thread Sean Cross
The stmpe_platform_data has an irq_over_gpio field, which allows the
system to read STMPE events whenever an IRQ occurs on a GPIO pin.
This patch adds the ability to configure this field and to use a GPIO
as an IRQ source for boards configuring the STMPE in device tree.

Signed-off-by: Sean Cross x...@kosagi.com
---
 drivers/mfd/stmpe.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 3b6bfa7..4c42b05 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -1122,7 +1122,12 @@ static void stmpe_of_probe(struct stmpe_platform_data *pdata,
 	if (pdata->id < 0)
 		pdata->id = -1;
 
-	pdata->irq_trigger = IRQF_TRIGGER_NONE;
+	pdata->irq_gpio = of_get_named_gpio_flags(np, "irq-gpio", 0,
+						  &pdata->irq_trigger);
+	if (gpio_is_valid(pdata->irq_gpio))
+		pdata->irq_over_gpio = 1;
+	else
+		pdata->irq_trigger = IRQF_TRIGGER_NONE;
 
 	of_property_read_u32(np, "st,autosleep-timeout",
 			     &pdata->autosleep_timeout);
-- 
2.1.0



[PATCH v2 2/2] mfd: stmpe: Document DT binding for irq_over_gpio

2014-09-11 Thread Sean Cross
STMPE now supports using a GPIO as an IRQ source.  Document the device
tree binding for this option.

Signed-off-by: Sean Cross x...@kosagi.com
---
 Documentation/devicetree/bindings/mfd/stmpe.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt 
b/Documentation/devicetree/bindings/mfd/stmpe.txt
index 56edb55..3fb68bf 100644
--- a/Documentation/devicetree/bindings/mfd/stmpe.txt
+++ b/Documentation/devicetree/bindings/mfd/stmpe.txt
@@ -13,6 +13,7 @@ Optional properties:
  - interrupt-parent	: Specifies which IRQ controller we're connected to
  - wakeup-source	: Marks the input device as wakable
  - st,autosleep-timeout	: Valid entries (ms); 4, 16, 32, 64, 128, 256, 512 and 1024
 + - irq-gpio		: If present, which GPIO to use for event IRQ
 
 Example:
 
-- 
2.1.0



Re: [PATCH 03/10] zsmalloc: always update lru ordering of each zspage

2014-09-11 Thread Seth Jennings
On Thu, Sep 11, 2014 at 04:53:54PM -0400, Dan Streetman wrote:
 Update ordering of a changed zspage in its fullness group LRU list,
 even if it has not moved to a different fullness group.
 
 This is needed by zsmalloc shrinking, which partially relies on each
 class fullness group list to be kept in LRU order, so the oldest can
 be reclaimed first.  Currently, LRU ordering is only updated when
 a zspage changes fullness groups.

Just something I saw.

fix_fullness_group() is called from zs_free(), which means that removing
an object from a zspage moves it to the front of the LRU.  Not sure if
that is what we want.  If anything, that makes it a _better_ candidate
for reclaim, as the zspage now contains fewer objects that we'll have
to decompress and write back.

Seth

 
 Signed-off-by: Dan Streetman ddstr...@ieee.org
 Cc: Minchan Kim minc...@kernel.org
 ---
  mm/zsmalloc.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)
 
 diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
 index fedb70f..51db622 100644
 --- a/mm/zsmalloc.c
 +++ b/mm/zsmalloc.c
 @@ -467,16 +467,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
  	BUG_ON(!is_first_page(page));
  
  	get_zspage_mapping(page, &class_idx, &currfg);
  -	newfg = get_fullness_group(page);
  -	if (newfg == currfg)
  -		goto out;
  -
  	class = pool->size_class[class_idx];
 + newfg = get_fullness_group(page);
 + /* Need to do this even if currfg == newfg, to update lru */
   remove_zspage(page, class, currfg);
   insert_zspage(page, class, newfg);
 - set_zspage_mapping(page, class_idx, newfg);
 + if (currfg != newfg)
 + set_zspage_mapping(page, class_idx, newfg);
  
 -out:
   return newfg;
  }
  
 -- 
 1.8.3.1
 


[PATCH v3] toshiba_acpi: Support new keyboard backlight type

2014-09-11 Thread Azael Avalos
Newer Toshiba models now come with a new (and different) keyboard
backlight implementation with three modes of operation: TIMER,
ON and OFF, and the LED is controlled internally by the firmware.

This patch adds support for that type of backlight, changing the
existing code to accommodate the new implementation.

The timeout value range is now 1-60 seconds, and the accepted
modes are now: 1 (FN-Z), 2 (AUTO or TIMER), 8 (ON) and 10 (OFF).
This adds two new entries, keyboard_type and available_kbd_modes;
the first shows the keyboard type and the latter shows the
supported modes depending on the type.

Signed-off-by: Azael Avalos coproscef...@gmail.com
---
 drivers/platform/x86/toshiba_acpi.c | 193 +---
 1 file changed, 158 insertions(+), 35 deletions(-)

diff --git a/drivers/platform/x86/toshiba_acpi.c 
b/drivers/platform/x86/toshiba_acpi.c
index 4c8fa7b..a5d7d83 100644
--- a/drivers/platform/x86/toshiba_acpi.c
+++ b/drivers/platform/x86/toshiba_acpi.c
@@ -138,8 +138,12 @@ MODULE_LICENSE("GPL");
 #define HCI_WIRELESS_BT_PRESENT	0x0f
 #define HCI_WIRELESS_BT_ATTACH		0x40
 #define HCI_WIRELESS_BT_POWER		0x80
+#define SCI_KBD_MODE_MASK		0x1f
 #define SCI_KBD_MODE_FNZ		0x1
 #define SCI_KBD_MODE_AUTO		0x2
+#define SCI_KBD_MODE_ON			0x8
+#define SCI_KBD_MODE_OFF		0x10
+#define SCI_KBD_TIME_MAX		0x3c001a
 
 struct toshiba_acpi_dev {
struct acpi_device *acpi_dev;
@@ -155,6 +159,7 @@ struct toshiba_acpi_dev {
int force_fan;
int last_key_event;
int key_event_valid;
+   int kbd_type;
int kbd_mode;
int kbd_time;
 
@@ -495,6 +500,42 @@ static enum led_brightness toshiba_illumination_get(struct led_classdev *cdev)
 }
 
 /* KBD Illumination */
+static int toshiba_kbd_illum_available(struct toshiba_acpi_dev *dev)
+{
+	u32 in[HCI_WORDS] = { SCI_GET, SCI_KBD_ILLUM_STATUS, 0, 0, 0, 0 };
+	u32 out[HCI_WORDS];
+	acpi_status status;
+
+	if (!sci_open(dev))
+		return 0;
+
+	status = hci_raw(dev, in, out);
+	sci_close(dev);
+	if (ACPI_FAILURE(status) || out[0] == SCI_INPUT_DATA_ERROR) {
+		pr_err("ACPI call to query kbd illumination support failed\n");
+		return 0;
+	} else if (out[0] == HCI_NOT_SUPPORTED) {
+		pr_info("Keyboard illumination not available\n");
+		return 0;
+	}
+
+	/* Check for keyboard backlight timeout max value,
+	 * previous kbd backlight implementation set this to
+	 * 0x3c0003, and now the new implementation set this
+	 * to 0x3c001a, use this to distinguish between them
+	 */
+	if (out[3] == SCI_KBD_TIME_MAX)
+		dev->kbd_type = 2;
+	else
+		dev->kbd_type = 1;
+	/* Get the current keyboard backlight mode */
+	dev->kbd_mode = out[2] & SCI_KBD_MODE_MASK;
+	/* Get the current time (1-60 seconds) */
+	dev->kbd_time = out[2] >> HCI_MISC_SHIFT;
+
+	return 1;
+}
+
 static int toshiba_kbd_illum_status_set(struct toshiba_acpi_dev *dev, u32 time)
 {
u32 result;
@@ -1254,6 +1295,62 @@ static const struct backlight_ops toshiba_backlight_data = {
 /*
  * Sysfs files
  */
+static ssize_t toshiba_kbd_bl_mode_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count);
+static ssize_t toshiba_kbd_bl_mode_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf);
+static ssize_t toshiba_kbd_type_show(struct device *dev,
+struct device_attribute *attr,
+char *buf);
+static ssize_t toshiba_available_kbd_modes_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf);
+static ssize_t toshiba_kbd_bl_timeout_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count);
+static ssize_t toshiba_kbd_bl_timeout_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf);
+static ssize_t toshiba_touchpad_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count);
+static ssize_t toshiba_touchpad_show(struct device *dev,
+struct device_attribute *attr,
+char *buf);
+static ssize_t toshiba_position_show(struct device *dev,
+struct device_attribute *attr,
+

Re: [PATCH v2] clocksource: arch_timer: Allow the device tree to specify the physical timer

2014-09-11 Thread Sonny Rao
On Thu, Sep 11, 2014 at 6:17 PM, Stephen Boyd sb...@codeaurora.org wrote:
 On 09/11/14 17:14, Sonny Rao wrote:

 On Thu, Sep 11, 2014 at 4:56 PM, Stephen Boyd sb...@codeaurora.org wrote:


 Where does this platform jump to when a CPU comes up? Is it
 rockchip_secondary_startup()? I wonder if that path could have this
 little bit of assembly to poke the cntvoff in monitor mode and then jump
 to secondary_startup()? Before we boot any secondary CPUs we could also
 read the cntvoff for CPU0 in the platform specific layer (where we know
 we're running in secure mode) and then use that value as the reset
 value for the secondaries. Or does this platform boot up in secure mode
 some times and non-secure mode other times?


  Yes, in our case, with our firmware, we will go through some internal Rom
 code and then jump to rockchip_secondary_startup, but I don't think it's
 correct to force all users of this SoC to do it that way.


 What's being forced? The way internal rom jumps to sram? Is there any other
 way that secondary CPUs come out of reset on this SoC? From looking at the
 code it seems like the only path is internal rom jumps to sram (where
 rockchip_secondary_trampoline lives) which jumps to
 rockchip_secondary_startup() which then does an invalidate and jump to
 secondary_startup(). Linux controls everything besides the internal rom. Is
 something different in your case?


There are other ways it can be done, and I don't know all of the
possibilities, but there seems to be some protocol with the iROM that
tells it where to go, which the current SMP patches are using by
putting a magic number and an address in SRAM.  I think it's true that
in our case, it really is pretty simple and we have secure SVC mode
and not much else runs (besides the iROM).

Since I don't know all of the possibilities, I didn't want to preclude
the possibility that someone else handled things differently and
entered the kernel in non-secure mode, and have some code there that
broke in that instance; that's all I meant by "forced".


  If there were a reasonable way to determine for sure that we are in secure
 mode, then yes we could do what you're suggesting, and I'd be happy to code
 that up.


 I think the problem is that there isn't a great way to determine whether
 we're in secure mode or not, and this is maybe by design?  I don't
 particularly understand that design choice.  It would be nice to hear some
 rationale from ARM folks.


 I'm thinking we would have a different boot-method for secure vs. non-secure
 and then we would know to configure cntvoff or not based on the boot method.
 Isn't that a reasonable way of knowing what should be done? It seems like we
 can at least modify the DT for this SoC.

Putting something into the device tree is in fact the point of this
patch, so it is sort of doing what you're suggesting, although this
patch is about being able to use physical counters and doesn't
indicate anything about secure vs non-secure.  What else do you think
could be used to differentiate between the two cases, besides putting
it into the DT?


 I still wonder if there is such a bootloader/hypervisor/rom that's putting
 this SoC into non-secure mode and not configuring cntvoff. Doug's comments
 seem to suggest that the whole world would be different if this were true.
 Maybe Heiko knows?

As far as I'm aware, there's no bootloader/firmware that's ever
putting the CPU into non-secure mode for our case.

 --
 Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
 hosted by The Linux Foundation


Re: [PATCH v3 16/17] arcmsr: support new adapter ARC12x4 series

2014-09-11 Thread Ching Huang
On Thu, 2014-09-11 at 16:21 +0200, Tomas Henzl wrote:
 On 09/11/2014 05:59 AM, Ching Huang wrote:
  On Wed, 2014-09-10 at 11:58 +0200, Tomas Henzl wrote:
  On 09/09/2014 06:30 PM, Christoph Hellwig wrote:
  Ching,
 
  do you have a chance to address Thomas second concern below?  As
  far as I can tell (Thomas, please correct me) that's the last
  outstanding concern, and I'd really like to merge the arcmsr updates
  for the Linux 3.18 merge window.
  Correct, still awaiting a response.
  Christoph, Tomas,
 
  Sorry for the late reply.
 
  I think I misunderstood Tomas' meaning.
  The spin lock in arcmsr_hbaD_polling_ccbdone() is necessary to protect
  doneq_index, and it has to be modified as follows.
 
 OK, so you are going to repost 16/17? If so, please describe all the
 changes you'll make in that new post.
 
Per the previous review, I will post two patches:
one for 13/17 and another for 16/17.
These patches are relative to
http://git.infradead.org/users/hch/scsi-queue.git/tree/arcmsr-for-3.18:/drivers/scsi/arcmsr
 
 static int arcmsr_hbaD_polling_ccbdone(struct AdapterControlBlock *acb,
 	struct CommandControlBlock *poll_ccb)
 {
 	bool error;
 	uint32_t poll_ccb_done = 0, poll_count = 0, flag_ccb, ccb_cdb_phy;
 	int rtn, doneq_index, index_stripped, outbound_write_pointer, toggle;
 	unsigned long flags;
 	struct ARCMSR_CDB *arcmsr_cdb;
 	struct CommandControlBlock *pCCB;
 	struct MessageUnit_D *pmu = acb->pmuD;
 
 polling_hbaD_ccb_retry:
 	poll_count++;
 	while (1) {
 		spin_lock_irqsave(&acb->doneq_lock, flags);
 		outbound_write_pointer = pmu->done_qbuffer[0].addressLow + 1;
 		doneq_index = pmu->doneq_index;
 		if ((outbound_write_pointer & 0xFFF) == (doneq_index & 0xFFF)) {
 			spin_unlock_irqrestore(&acb->doneq_lock, flags);
 			if (poll_ccb_done) {
 				rtn = SUCCESS;
 				break;
 			} else {
 				msleep(25);
 				if (poll_count > 40) {
 					rtn = FAILED;
 					break;
 				}
 				goto polling_hbaD_ccb_retry;
 			}
 		}
 		toggle = doneq_index & 0x4000;
 		index_stripped = (doneq_index & 0xFFF) + 1;
 		index_stripped %= ARCMSR_MAX_ARC1214_DONEQUEUE;
 		pmu->doneq_index = index_stripped ? (index_stripped | toggle) :
 			((index_stripped + 1) | (toggle ^ 0x4000));
 		spin_unlock_irqrestore(&acb->doneq_lock, flags);
 		doneq_index = pmu->doneq_index;
 		flag_ccb = pmu->done_qbuffer[doneq_index & 0xFFF].addressLow;
 		ccb_cdb_phy = (flag_ccb & 0xFFF0);
 		arcmsr_cdb = (struct ARCMSR_CDB *)(acb->vir2phy_offset +
 			ccb_cdb_phy);
 		pCCB = container_of(arcmsr_cdb, struct CommandControlBlock,
 			arcmsr_cdb);
 		poll_ccb_done |= (pCCB == poll_ccb) ? 1 : 0;
 		if ((pCCB->acb != acb) ||
 			(pCCB->startdone != ARCMSR_CCB_START)) {
 			if (pCCB->startdone == ARCMSR_CCB_ABORTED) {
 				pr_notice("arcmsr%d: scsi id = %d "
 					"lun = %d ccb = '0x%p' poll command "
 					"abort successfully\n"
 					, acb->host->host_no
 					, pCCB->pcmd->device->id
 					, (u32)pCCB->pcmd->device->lun
 					, pCCB);
 				pCCB->pcmd->result = DID_ABORT << 16;
 				arcmsr_ccb_complete(pCCB);
 				continue;
 			}
 			pr_notice("arcmsr%d: polling an illegal "
 				"ccb command done ccb = '0x%p' "
 				"ccboutstandingcount = %d\n"
 				, acb->host->host_no
 				, pCCB
 				, atomic_read(&acb->ccboutstandingcount));
 			continue;
 		}
 		error = (flag_ccb & ARCMSR_CCBREPLY_FLAG_ERROR_MODE1)
 			? true : false;
 		arcmsr_report_ccb_state(acb, pCCB, error);
 	}
 	return rtn;
 }
 
 
 



Re: [PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-11 Thread tangchen

Hi Gleb, Paolo,

On 09/11/2014 10:47 PM, Gleb Natapov wrote:

On Thu, Sep 11, 2014 at 04:37:39PM +0200, Paolo Bonzini wrote:

Il 11/09/2014 16:31, Gleb Natapov ha scritto:

What if the page being swapped out is L1's APIC access page?  We don't
run prepare_vmcs12 in that case because it's an L2->L0->L2 entry, so we
need to do something.

We will do something on L2->L1 exit. We will call kvm_reload_apic_access_page().
That is what patch 5 of this series is doing.

Sorry, I meant the APIC access page prepared by L1 for L2's execution.

You wrote:


if (!is_guest_mode() || !(vmcs12->secondary_vm_exec_control &
    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
	write(APIC_ACCESS_ADDR)

In other words if L2 shares L1 apic access page then reload, otherwise do 
nothing.

but in that case you have to redo nested_get_page, so "do nothing"
doesn't work.


Ah, 7/7 is new in this submission. Before that this page was still
pinned.  Looking at 7/7 now I do not see how it can work since it has no
code for mmu notifier to detect that it deals with such page and call
kvm_reload_apic_access_page().


Since L1 and L2 share one apic page, if the page is unmapped,
mmu_notifier will be called, and:

 - if vcpu is in L1, an L1->L0 exit is raised. The apic page's pa will
   be updated in the next L0->L1 entry by making a vcpu request.

 - if vcpu is in L2 (is_guest_mode, right?), an L2->L0 exit is raised.
   nested_vmx_vmexit() will not be called, since it is called in L2->L1
   exit. It returns from vmx_vcpu_run() directly, right? So we should
   update the apic page in L0->L2 entry. This is also done by making a
   vcpu request, right?

   prepare_vmcs02() is called in L1->L2 entry, and nested_vmx_vmexit()
   is called in L2->L1 exit. So we also need to update L1's vmcs in
   nested_vmx_vmexit() in patch 5/7.

IIUC, patches 1~6 have done such things.

And yes, the is_guest_mode() check is not needed.


I said to Tang previously that nested
kvm has a bunch of pinned pages that are hard to deal with and suggested
ironing out the non-nested case first :(


Yes, and maybe adding patch 7 is not a good idea for now.

Thanks.


Re: [PATCH v5 4/7] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-11 Thread tangchen

Hi Paolo,

On 09/11/2014 10:24 PM, Paolo Bonzini wrote:

Il 11/09/2014 16:21, Gleb Natapov ha scritto:

As far as I can tell the if that is needed there is:

if (!is_guest_mode() || !(vmcs12->secondary_vm_exec_control &
    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
	write(APIC_ACCESS_ADDR)

In other words if L2 shares L1 apic access page then reload, otherwise do 
nothing.

What if the page being swapped out is L1's APIC access page?  We don't
run prepare_vmcs12 in that case because it's an L2->L0->L2 entry, so we
need to do something.


Are you talking about the case where L1 and L2 have different apic pages?
I think I didn't deal with this situation in this patch set.

Sorry I didn't say it clearly. Here, I assume L1 and L2 share the same
apic page. If we are in L2, and the page is migrated, we update L2's vmcs
by making a vcpu request. And of course, we should also update L1's vmcs.
This is done by patch 5.

We make the vcpu request again in nested_vmx_vmexit().

Thanks.


[PATCH net 1/2] r8169: fix the default setting of rx vlan

2014-09-11 Thread Hayes Wang
If the parameter features of __rtl8169_set_features() is equal to
dev->features, the variable changed is always 0, and nothing would
be changed.
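For context, __rtl8169_set_features() computes the delta roughly like
this (paraphrased from the driver of this era, so treat it as a sketch):

static void __rtl8169_set_features(struct net_device *dev,
				   netdev_features_t features)
{
	struct rtl8169_private *tp = netdev_priv(dev);
	/* XOR yields the changed bits: when features == dev->features,
	 * changed is 0 and nothing below takes effect -- which is why
	 * calling this from rtl_open() with dev->features configured
	 * nothing. */
	netdev_features_t changed = features ^ dev->features;

	if (!(changed & (NETIF_F_RXALL | NETIF_F_RXCSUM |
			 NETIF_F_HW_VLAN_CTAG_RX)))
		return;
	/* ... otherwise apply the changed bits to tp->cp_cmd ... */
}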

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/ethernet/realtek/r8169.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 91652e7..f3ce284 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6707,7 +6707,12 @@ static int rtl_open(struct net_device *dev)
 
 	rtl8169_init_phy(dev, tp);
 
-	__rtl8169_set_features(dev, dev->features);
+	if (dev->features & NETIF_F_HW_VLAN_CTAG_RX)
+		tp->cp_cmd |= RxVlan;
+	else
+		tp->cp_cmd &= ~RxVlan;
+
+	RTL_W16(CPlusCmd, tp->cp_cmd);
 
 	rtl_pll_power_up(tp);
 
-- 
1.9.3



[PATCH net 0/2] r8169: fix rx vlan

2014-09-11 Thread Hayes Wang
There are two issues with hw rx vlan handling. These patches
fix them.

Hayes Wang (2):
  r8169: fix the default setting of rx vlan
  r8169: fix setting rx vlan

 drivers/net/ethernet/realtek/r8169.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

-- 
1.9.3



[PATCH net 2/2] r8169: fix setting rx vlan

2014-09-11 Thread Hayes Wang
The setting should depend on the new features, not the current ones.

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/ethernet/realtek/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index f3ce284..7a7860a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1796,7 +1796,7 @@ static void __rtl8169_set_features(struct net_device *dev,
 	else
 		tp->cp_cmd &= ~RxChkSum;
 
-	if (dev->features & NETIF_F_HW_VLAN_CTAG_RX)
+	if (features & NETIF_F_HW_VLAN_CTAG_RX)
 		tp->cp_cmd |= RxVlan;
 	else
 		tp->cp_cmd &= ~RxVlan;
-- 
1.9.3



[PATCH 2/9] locktorture: Add documentation

2014-09-11 Thread Davidlohr Bueso
Just like Documentation/RCU/torture.txt, begin a document for the
locktorture module. This module is still pretty green, so I have
just added some specific sections to the doc (general desc, params,
usage, etc.). Further development should update the file.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 Documentation/locking/locktorture.txt | 128 ++
 1 file changed, 128 insertions(+)
 create mode 100644 Documentation/locking/locktorture.txt

diff --git a/Documentation/locking/locktorture.txt 
b/Documentation/locking/locktorture.txt
new file mode 100644
index 000..c0ab969
--- /dev/null
+++ b/Documentation/locking/locktorture.txt
@@ -0,0 +1,128 @@
+Kernel Lock Torture Test Operation
+
+CONFIG_LOCK_TORTURE_TEST
+
+The CONFIG LOCK_TORTURE_TEST config option provides a kernel module
+that runs torture tests on core kernel locking primitives. The kernel
+module, 'locktorture', may be built after the fact on the running
+kernel to be tested, if desired. The test periodically outputs status
+messages via printk(), which can be examined via dmesg (perhaps
+grepping for "torture").  The test is started when the module is loaded,
+and stops when the module is unloaded. This program is based on how RCU
+is tortured, via rcutorture.
+
+This torture test consists of creating a number of kernel threads which
+acquire the lock and hold it for a specific amount of time, thus simulating
+different critical region behaviors. The amount of contention on the lock
+can be simulated by either enlarging this critical region hold time and/or
+creating more kthreads.
+
+
+MODULE PARAMETERS
+
+This module has the following parameters:
+
+
+   ** Locktorture-specific **
+
+nwriters_stress   Number of kernel threads that will stress exclusive lock
+		  ownership (writers). The default value is twice the number
+		  of online CPUs.
+
+torture_type	  Type of lock to torture. By default, only spinlocks will
+ be tortured. This module can torture the following locks,
+ with string values as follows:
+
+o lock_busted: Simulates a buggy lock implementation.
+
+o spin_lock: spin_lock() and spin_unlock() pairs.
+
+o spin_lock_irq: spin_lock_irq() and spin_unlock_irq()
+   pairs.
+
+torture_runnable  Start locktorture at module init. By default it will begin
+ once the module is loaded.
+
+
+   ** Torture-framework (RCU + locking) **
+
+shutdown_secs	The number of seconds to run the test before terminating
+ the test and powering off the system.  The default is
+ zero, which disables test termination and system shutdown.
+ This capability is useful for automated testing.
+
+onoff_interval	The number of seconds between each attempt to execute a
+ randomly selected CPU-hotplug operation.  Defaults to
+ zero, which disables CPU hotplugging.  In HOTPLUG_CPU=n
+ kernels, locktorture will silently refuse to do any
+ CPU-hotplug operations regardless of what value is
+ specified for onoff_interval.
+
+onoff_holdoff	The number of seconds to wait until starting CPU-hotplug
+ operations.  This would normally only be used when
+ locktorture was built into the kernel and started
+ automatically at boot time, in which case it is useful
+ in order to avoid confusing boot-time code with CPUs
+ coming and going. This parameter is only useful if
+ CONFIG_HOTPLUG_CPU is enabled.
+
+stat_interval	Number of seconds between statistics-related printk()s.
+ By default, locktorture will report stats every 60 seconds.
+ Setting the interval to zero causes the statistics to
+ be printed -only- when the module is unloaded, and this
+ is the default.
+
+stutter  The length of time to run the test before pausing for 
this
+ same period of time.  Defaults to stutter=5, so as
+ to run and pause for (roughly) five-second intervals.
+ Specifying stutter=0 causes the test to run continuously
+ without pausing, which is the old default behavior.
+
+shuffle_interval  The number of seconds to keep the test threads affinitied
+ to a particular subset of the CPUs, defaults to 3 seconds.
+ Used in conjunction with test_no_idle_hz.
+
+verbose		Enable verbose debugging printing, via printk().  Enabled
Enabled
+ by default. This extra information is mostly related to
+ high-level errors and reports from the main 'torture'
+ framework.
+
+
+STATISTICS
+
+Statistics are printed in the following 

[PATCH 6/9] torture: Address race in module cleanup

2014-09-11 Thread Davidlohr Bueso
When performing module cleanup by calling torture_cleanup(), the
'torture_type' string is nullified. However, callers are not necessarily
done, and might still need to reference the variable. This impacts
both rcutorture and locktorture, causing printing of things like:

[   94.226618] (null)-torture: Stopping lock_torture_writer task
[   94.226624] (null)-torture: Stopping lock_torture_stats task

Thus delay this operation until the very end of the cleanup process.
The consequence (which shouldn't matter for this kind of program) is,
of course, that we lengthen the window between rmmod and modprobing,
for instance in module_torture_begin().
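The resulting caller pattern, which the hunks below implement for both
torture users, is:

	if (torture_cleanup_begin())	/* shutdown already in progress */
		return;

	/* ... stop kthreads and print final statistics; torture_type
	 * is still valid at this point ... */

	torture_cleanup_end();		/* only now nullify torture_type */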

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 include/linux/torture.h  |  3 ++-
 kernel/locking/locktorture.c |  3 ++-
 kernel/rcu/rcutorture.c  |  3 ++-
 kernel/torture.c | 16 +---
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/torture.h b/include/linux/torture.h
index 5ca58fc..301b628 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -77,7 +77,8 @@ int torture_stutter_init(int s);
 /* Initialization and cleanup. */
 bool torture_init_begin(char *ttype, bool v, int *runnable);
 void torture_init_end(void);
-bool torture_cleanup(void);
+bool torture_cleanup_begin(void);
+void torture_cleanup_end(void);
 bool torture_must_stop(void);
 bool torture_must_stop_irq(void);
 void torture_kthread_stopping(char *title);
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index de703a7..988267c 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -361,7 +361,7 @@ static void lock_torture_cleanup(void)
 {
int i;
 
-   if (torture_cleanup())
+   if (torture_cleanup_begin())
return;
 
if (writer_tasks) {
@@ -384,6 +384,7 @@ static void lock_torture_cleanup(void)
else
 		lock_torture_print_module_parms(cur_ops,
 						"End of test: SUCCESS");
+   torture_cleanup_end();
 }
 
 static int __init lock_torture_init(void)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 948a769..57a2792 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1418,7 +1418,7 @@ rcu_torture_cleanup(void)
int i;
 
rcutorture_record_test_transition();
-   if (torture_cleanup()) {
+   if (torture_cleanup_begin()) {
 		if (cur_ops->cb_barrier != NULL)
 			cur_ops->cb_barrier();
return;
@@ -1468,6 +1468,7 @@ rcu_torture_cleanup(void)
 			  "End of test: RCU_HOTPLUG");
 	else
 		rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS");
+   torture_cleanup_end();
 }
 
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
diff --git a/kernel/torture.c b/kernel/torture.c
index d600af2..07a5c3d 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -635,8 +635,13 @@ EXPORT_SYMBOL_GPL(torture_init_end);
  *
  * This must be called before the caller starts shutting down its own
  * kthreads.
+ *
+ * Both torture_cleanup_begin() and torture_cleanup_end() must be paired,
+ * in order to correctly perform the cleanup. They are separated because
+ * threads can still need to reference the torture_type type, thus nullify
+ * only after completing all other relevant calls.
  */
-bool torture_cleanup(void)
+bool torture_cleanup_begin(void)
 {
 	mutex_lock(&fullstop_mutex);
if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) {
@@ -651,12 +656,17 @@ bool torture_cleanup(void)
torture_shuffle_cleanup();
torture_stutter_cleanup();
torture_onoff_cleanup();
+   return false;
+}
+EXPORT_SYMBOL_GPL(torture_cleanup_begin);
+
+void torture_cleanup_end(void)
+{
 	mutex_lock(&fullstop_mutex);
 	torture_type = NULL;
 	mutex_unlock(&fullstop_mutex);
-   return false;
 }
-EXPORT_SYMBOL_GPL(torture_cleanup);
+EXPORT_SYMBOL_GPL(torture_cleanup_end);
 
 /*
  * Is it time for the current torture test to stop?
-- 
1.8.4.5



[PATCH 5/9] locktorture: Make statistics generic

2014-09-11 Thread Davidlohr Bueso
The statistics structure can serve well for both reader and writer
locks, thus simply rename some fields that mention 'write' and leave
the declaration of lwsa.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 kernel/locking/locktorture.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index a6049fa..de703a7 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -78,11 +78,11 @@ static struct task_struct **writer_tasks;
 static int nrealwriters_stress;
 static bool lock_is_write_held;
 
-struct lock_writer_stress_stats {
-   long n_write_lock_fail;
-   long n_write_lock_acquired;
+struct lock_stress_stats {
+   long n_lock_fail;
+   long n_lock_acquired;
 };
-static struct lock_writer_stress_stats *lwsa;
+static struct lock_stress_stats *lwsa; /* writer statistics */
 
 #if defined(MODULE)
 #define LOCKTORTURE_RUNNABLE_INIT 1
@@ -250,7 +250,7 @@ static struct lock_torture_ops mutex_lock_ops = {
  */
 static int lock_torture_writer(void *arg)
 {
-   struct lock_writer_stress_stats *lwsp = arg;
+   struct lock_stress_stats *lwsp = arg;
static DEFINE_TORTURE_RANDOM(rand);
 
 	VERBOSE_TOROUT_STRING("lock_torture_writer task started");
@@ -261,9 +261,9 @@ static int lock_torture_writer(void *arg)
schedule_timeout_uninterruptible(1);
 		cur_ops->writelock();
 		if (WARN_ON_ONCE(lock_is_write_held))
-			lwsp->n_write_lock_fail++;
+			lwsp->n_lock_fail++;
 		lock_is_write_held = 1;
-		lwsp->n_write_lock_acquired++;
+		lwsp->n_lock_acquired++;
 		cur_ops->write_delay(&rand);
 		lock_is_write_held = 0;
 		cur_ops->writeunlock();
@@ -281,17 +281,17 @@ static void lock_torture_printk(char *page)
bool fail = 0;
int i;
long max = 0;
-   long min = lwsa[0].n_write_lock_acquired;
+   long min = lwsa[0].n_lock_acquired;
long long sum = 0;
 
 	for (i = 0; i < nrealwriters_stress; i++) {
-		if (lwsa[i].n_write_lock_fail)
+		if (lwsa[i].n_lock_fail)
 			fail = true;
-		sum += lwsa[i].n_write_lock_acquired;
-		if (max < lwsa[i].n_write_lock_fail)
-			max = lwsa[i].n_write_lock_fail;
-		if (min > lwsa[i].n_write_lock_fail)
-			min = lwsa[i].n_write_lock_fail;
+		sum += lwsa[i].n_lock_acquired;
+		if (max < lwsa[i].n_lock_fail)
+			max = lwsa[i].n_lock_fail;
+		if (min > lwsa[i].n_lock_fail)
+			min = lwsa[i].n_lock_fail;
 	}
 	page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG);
page += sprintf(page,
@@ -441,8 +441,8 @@ static int __init lock_torture_init(void)
goto unwind;
}
 	for (i = 0; i < nrealwriters_stress; i++) {
-   lwsa[i].n_write_lock_fail = 0;
-   lwsa[i].n_write_lock_acquired = 0;
+   lwsa[i].n_lock_fail = 0;
+   lwsa[i].n_lock_acquired = 0;
}
 
/* Start up the kthreads. */
-- 
1.8.4.5



[PATCH 1/9] locktorture: Rename locktorture_runnable parameter

2014-09-11 Thread Davidlohr Bueso
... to just 'torture_runnable'. It follows other variable naming
and is shorter.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 kernel/locking/locktorture.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 0955b88..8c770b2 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -87,9 +87,9 @@ static struct lock_writer_stress_stats *lwsa;
 #else
 #define LOCKTORTURE_RUNNABLE_INIT 0
 #endif
-int locktorture_runnable = LOCKTORTURE_RUNNABLE_INIT;
-module_param(locktorture_runnable, int, 0444);
-MODULE_PARM_DESC(locktorture_runnable, "Start locktorture at module init");
+int torture_runnable = LOCKTORTURE_RUNNABLE_INIT;
+module_param(torture_runnable, int, 0444);
+MODULE_PARM_DESC(torture_runnable, "Start locktorture at module init");
 
 /* Forward reference. */
 static void lock_torture_cleanup(void);
@@ -355,7 +355,7 @@ static int __init lock_torture_init(void)
lock_busted_ops, spin_lock_ops, spin_lock_irq_ops,
};
 
-	if (!torture_init_begin(torture_type, verbose, &locktorture_runnable))
+	if (!torture_init_begin(torture_type, verbose, &torture_runnable))
return -EBUSY;
 
/* Process args and tell the world that the torturer is on the job. */
-- 
1.8.4.5



[PATCH 3/9] locktorture: Support mutexes

2014-09-11 Thread Davidlohr Bueso
Add a mutex_lock torture test. The main difference from the already
existing spinlock tests is that the latency of the critical region
is much larger. We randomly delay for (arbitrarily) either 500 ms or,
otherwise, 25 ms. While this can considerably reduce the amount of
writes compared to non-blocking locks, if run long enough it can have
the same torturous effect. Furthermore, it is more representative of
mutex hold times and can better stress things like thrashing.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 Documentation/locking/locktorture.txt |  2 ++
 kernel/locking/locktorture.c  | 41 +--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/locking/locktorture.txt 
b/Documentation/locking/locktorture.txt
index c0ab969..6b1e7ca 100644
--- a/Documentation/locking/locktorture.txt
+++ b/Documentation/locking/locktorture.txt
@@ -40,6 +40,8 @@ torture_type	Type of lock to torture. By default, only spinlocks will
 o spin_lock_irq: spin_lock_irq() and spin_unlock_irq()
pairs.
 
+o mutex_lock: mutex_lock() and mutex_unlock() pairs.
+
 torture_runnable  Start locktorture at module init. By default it will begin
  once the module is loaded.
 
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 8c770b2..414ba45 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -27,6 +27,7 @@
 #include <linux/kthread.h>
 #include <linux/err.h>
 #include <linux/spinlock.h>
+#include <linux/mutex.h>
 #include <linux/smp.h>
 #include <linux/interrupt.h>
 #include <linux/sched.h>
@@ -66,7 +67,7 @@ torture_param(bool, verbose, true,
 	     "Enable verbose debugging printk()s");
 static char *torture_type = "spin_lock";
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type,
-		 "Type of lock to torture (spin_lock, spin_lock_irq, ...)");
+		 "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)");
 
 static atomic_t n_lock_torture_errors;
 
@@ -206,6 +207,42 @@ static struct lock_torture_ops spin_lock_irq_ops = {
 	.name		= "spin_lock_irq"
 };
 
+static DEFINE_MUTEX(torture_mutex);
+
+static int torture_mutex_lock(void) __acquires(torture_mutex)
+{
+	mutex_lock(&torture_mutex);
+   return 0;
+}
+
+static void torture_mutex_delay(struct torture_random_state *trsp)
+{
+   const unsigned long longdelay_ms = 100;
+
+   /* We want a long delay occasionally to force massive contention.  */
+   if (!(torture_random(trsp) %
+ (nrealwriters_stress * 2000 * longdelay_ms)))
+   mdelay(longdelay_ms * 5);
+   else
+   mdelay(longdelay_ms / 5);
+#ifdef CONFIG_PREEMPT
+   if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+   preempt_schedule();  /* Allow test to be preempted. */
+#endif
+}
+
+static void torture_mutex_unlock(void) __releases(torture_mutex)
+{
+	mutex_unlock(&torture_mutex);
+}
+
+static struct lock_torture_ops mutex_lock_ops = {
+   .writelock  = torture_mutex_lock,
+   .write_delay= torture_mutex_delay,
+   .writeunlock= torture_mutex_unlock,
+	.name		= "mutex_lock"
+};
+
 /*
  * Lock torture writer kthread.  Repeatedly acquires and releases
  * the lock, checking for duplicate acquisitions.
@@ -352,7 +389,7 @@ static int __init lock_torture_init(void)
int i;
int firsterr = 0;
static struct lock_torture_ops *torture_ops[] = {
-   lock_busted_ops, spin_lock_ops, spin_lock_irq_ops,
+   lock_busted_ops, spin_lock_ops, spin_lock_irq_ops, 
mutex_lock_ops,
};
 
if (!torture_init_begin(torture_type, verbose, torture_runnable))
-- 
1.8.4.5



[PATCH 4/9] locktorture: Teach about lock debugging

2014-09-11 Thread Davidlohr Bueso
Regular locks are very different from locks with debugging. For instance,
for mutexes, debugging forces only the slowpaths to be taken. As such, the
locktorture module should take this into account when printing related
information -- specifically when printing user-passed parameters, which
seems the right place for such info.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 kernel/locking/locktorture.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 414ba45..a6049fa 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -64,6 +64,7 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(bool, verbose, true,
 	     "Enable verbose debugging printk()s");
 
+static bool debug_lock = false;
 static char *torture_type = "spin_lock";
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type,
@@ -349,8 +350,9 @@ lock_torture_print_module_parms(struct lock_torture_ops 
*cur_ops,
const char *tag)
 {
 	pr_alert("%s" TORTURE_FLAG
-		 "--- %s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n",
-		 torture_type, tag, nrealwriters_stress, stat_interval, verbose,
+		 "--- %s%s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n",
+		 torture_type, tag, debug_lock ? " [debug]" : "",
+		 nrealwriters_stress, stat_interval, verbose,
 		 shuffle_interval, stutter, shutdown_secs,
 		 onoff_interval, onoff_holdoff);
 }
@@ -418,6 +420,15 @@ static int __init lock_torture_init(void)
nrealwriters_stress = nwriters_stress;
else
nrealwriters_stress = 2 * num_online_cpus();
+
+#ifdef CONFIG_DEBUG_MUTEXES
+	if (strncmp(torture_type, "mutex", 5) == 0)
+		debug_lock = true;
+#endif
+#ifdef CONFIG_DEBUG_SPINLOCK
+	if (strncmp(torture_type, "spin", 4) == 0)
+		debug_lock = true;
+#endif
 	lock_torture_print_module_parms(cur_ops, "Start of test");
 
/* Initialize the statistics so that each run gets its own numbers. */
-- 
1.8.4.5



[PATCH -tip 0/9] locktorture: Improve and expand lock torturing

2014-09-11 Thread Davidlohr Bueso
This set includes general updates throughout the locktorture code.
In particular, support for reader locks is added, as well as torturing
mutexes and rwsems. With the recent locking changes, it doesn't hurt
to improve our testing infrastructure, and torturing is definitely
part of that. For specific details about each change, please consult
the actual patches.

o patches 1, 4, 9: misc changes.
o patch 2: new doc, based on rcutorture's.
o patches 3, 8: torture new locking primitives.
o patches 5, 7: add support for reader locks.
o patch 6: fix a minor race in the torture cleanup path.

Really no particular order, please consider for v3.18.

Davidlohr Bueso (9):
  locktorture: Rename locktorture_runnable parameter
  locktorture: Add documentation
  locktorture: Support mutexes
  locktorture: Teach about lock debugging
  locktorture: Make statistics generic
  torture: Address race in module cleanup
  locktorture: Add infrastructure for torturing read locks
  locktorture: Support rwsems
  locktorture: Introduce torture context

 Documentation/locking/locktorture.txt | 140 
 include/linux/torture.h   |   3 +-
 kernel/locking/locktorture.c  | 392 --
 kernel/rcu/rcutorture.c   |   3 +-
 kernel/torture.c  |  16 +-
 5 files changed, 480 insertions(+), 74 deletions(-)
 create mode 100644 Documentation/locking/locktorture.txt

-- 
1.8.4.5



RE: [PATCH] clk: samsung: exynos5260: fix typo in clock name

2014-09-11 Thread Pankaj Dubey
Hi Tomasz,

On Friday, September 12, 2014, Tomasz Figa wrote,
 To: Pankaj Dubey; linux-arm-ker...@lists.infradead.org; linux-samsung-
 s...@vger.kernel.org; linux-kernel@vger.kernel.org
 Cc: kgene@samsung.com; s.nawro...@samsung.com; mturque...@linaro.org;
 Chander Kashyap; Abhilash Kesavan
 Subject: Re: [PATCH] clk: samsung: exynos5260: fix typo in clock name
 
 Hi Pankaj,
 
 On 10.09.2014 07:56, Pankaj Dubey wrote:
  From: Chander Kashyap k.chan...@samsung.com
 
  The parent name added in the parent list as
  "mout_phyclk_mipi_dphy_4l_m_txbyte_clkhs_p" is different from the
  defined parent, due to a typo.
 
  Signed-off-by: Abhilash Kesavan a.kesa...@samsung.com
  Signed-off-by: Chander Kashyap k.chan...@samsung.com
 
 Missed your sign-off? You can reply with it inline and I will add it when
 applying this weekend.
 

OK.

 P.S. Please keep me on CC when sending patches for Samsung clock drivers.

Sure.

Thanks,
Pankaj Dubey
 
 Best regards,
 Tomasz



vga and 64-bit memcpy's

2014-09-11 Thread Dave Airlie
Got a bug report from someone using a Silicon Motion video card in
VGA mode about corruption that they tracked down to 64-bit memory
operations not being supported by the video card; it appears that we
probably shouldn't be using wider-than-32-bit copies on VGA memory.

include/linux/vt_buffer.h defaults scr_memcpyw to plain memcpy,
which on 64-bit x86 machines ends up as rep movsq. Is there
any way to make this use a 32-bit memcpy instead?
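
A minimal sketch of the kind of per-arch override being asked about --
the VT_BUF_HAVE_MEMCPYW opt-out already exists in vt_buffer.h, but the
32-bit loop and the assumption that the byte count is 4-aligned are
only illustrative:

#define VT_BUF_HAVE_MEMCPYW
static inline void scr_memcpyw(u16 *d, const u16 *s, unsigned int count)
{
	u32 *dst = (u32 *)d;
	const u32 *src = (const u32 *)s;

	/* force 32-bit stores; count is in bytes, assumed 4-aligned here */
	for (count /= 4; count; count--)
		*dst++ = *src++;
}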

Dave.


Re: [PATCH v3 3/3] sched: BUG when stack end location is over written

2014-09-11 Thread Michael Ellerman
On Thu, 2014-09-11 at 16:41 +0100, Aaron Tomlin wrote:
 diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
 index a285900..2a8280a 100644
 --- a/lib/Kconfig.debug
 +++ b/lib/Kconfig.debug
 @@ -824,6 +824,18 @@ config SCHEDSTATS
 application, you can say N to avoid the very slight overhead
 this adds.
  
 +config SCHED_STACK_END_CHECK
 + bool "Detect stack corruption on calls to schedule()"
 + depends on DEBUG_KERNEL
 + default y

Did you really mean default y?

Doing so means it will be turned on more or less everywhere, which defeats the
purpose of having a config option in the first place.

 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index ec1a286..0b70b73 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -2660,6 +2660,9 @@ static noinline void __schedule_bug(struct task_struct 
 *prev)
   */
  static inline void schedule_debug(struct task_struct *prev)
  {
 +#ifdef CONFIG_SCHED_STACK_END_CHECK
 + BUG_ON(unlikely(task_stack_end_corrupted(prev)));
 +#endif

If this was my code I'd make you put that in a static inline.
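
For instance, something along these lines (the helper name here is
illustrative, not from the patch):

static inline void sched_stack_end_check(struct task_struct *prev)
{
#ifdef CONFIG_SCHED_STACK_END_CHECK
	BUG_ON(unlikely(task_stack_end_corrupted(prev)));
#endif
}

so that schedule_debug() itself stays free of #ifdefs.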

cheers





Re: [PATCH v2] checkpatch: look for common misspellings

2014-09-11 Thread Masanari Iida
Talking about codespell: it detected 76 instances of "informations" in
3.17-rc4, while "grep -R informations * | wc -l" found 120 typos.
Testing with "occured", codespell found 46, grep found 110.
Testing with "reseting", codespell found 21, grep found 26.

So I expect about half of the incoming typos will be detected by the
tool, and be fixed.
Masanari


Re: [PATCH v5] ASoC: tda998x: Add a codec to the HDMI transmitter

2014-09-11 Thread Dave Airlie
On 10 September 2014 19:29, Jean-Francois Moine moin...@free.fr wrote:
 This patch adds a CODEC function to the NXP TDA998x HDMI transmitter.

 The CODEC handles both I2S and S/PDIF inputs.
 It maintains the audio format and rate constraints according
 to the HDMI device parameters (EDID) and dynamically switches
 the input in the TDA998x I2C driver on audio stream start/stop.


On subsystem-spanning patches you should indicate which tree you think
should merge them, etc.

If other tda998x ppl are okay with it, you can have my ack for merging
via someone else.

Dave.


[git pull] drm fixes

2014-09-11 Thread Dave Airlie

Hi Linus,

ast, i915, radeon and msm fixes, all over the place, all fixing build 
issues, regressions, oopses or failure to detect cards.

Dave.

The following changes since commit 7ec62d421bdf29cb31101ae2689f7f3a9906289a:

  Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2014-09-10 
14:04:17 -0700)

are available in the git repository at:


  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 83502a5d34386f7c6973bc70e1c423f55f5a2e3a:

  drm/ast: AST2000 cannot be detected correctly (2014-09-12 13:41:39 +1000)


Alex Deucher (3):
  drm/radeon: only use me/pfp sync on evergreen+
  drm/radeon: add connector quirk for fujitsu board
  drm/radeon/dpm: set the thermal type properly for special configs

Andy Shevchenko (1):
  drm/radeon: reduce memory footprint for debugging

Chris Wilson (2):
  drm/i915: Prevent recursive deadlock on releasing a busy userptr
  drm/i915: Evict CS TLBs between batches

Christian König (1):
  drm/radeon: fix semaphore value init

Daniel Vetter (2):
  drm/i915: Fix EIO/wedged handling in gem fault handler
  drm/i915: Fix irq enable tracking in driver load

Dave Airlie (3):
  Merge branch 'drm-fixes-3.17' of 
git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2014-09-10' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes
  Merge branch 'msm-fixes-3.17-rc4' of 
git://people.freedesktop.org/~robclark/linux into drm-fixes

Mark Charlebois (1):
  drm/msm: Change nested function to static function

Rob Clark (2):
  drm/msm/hdmi: fix build break on non-CCF platforms
  drm/msm: don't crash if no msm.vram param

Ville Syrjälä (1):
  drm/i915: Wait for vblank before enabling the TV encoder

Y.C. Chen (2):
  drm/ast: open key before detect chips
  drm/ast: AST2000 cannot be detected correctly

 drivers/gpu/drm/ast/ast_main.c|   3 +-
 drivers/gpu/drm/i915/i915_dma.c   |   9 +-
 drivers/gpu/drm/i915/i915_drv.h   |  10 +-
 drivers/gpu/drm/i915/i915_gem.c   |  11 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c   | 409 +-
 drivers/gpu/drm/i915/i915_reg.h   |  12 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c   |  66 +++--
 drivers/gpu/drm/i915/intel_tv.c   |   4 +
 drivers/gpu/drm/msm/hdmi/hdmi.c   |  46 ++--
 drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c  |  15 +-
 drivers/gpu/drm/msm/msm_drv.c |   2 +-
 drivers/gpu/drm/radeon/atombios_dp.c  |   7 +-
 drivers/gpu/drm/radeon/r600.c |   4 +-
 drivers/gpu/drm/radeon/radeon_atombios.c  |  33 ++-
 drivers/gpu/drm/radeon/radeon_semaphore.c |   2 +-
 15 files changed, 371 insertions(+), 262 deletions(-)

[PATCH] mfd: intel_soc_pmic: Add CONFIG_PM_SLEEP check for suspend_fn/resume_fn

2014-09-11 Thread Jaewon Kim
This patch fixes a warning seen with CONFIG_PM_SLEEP disabled.
If CONFIG_PM_SLEEP is not enabled, we receive the following warning:

drivers/mfd/intel_soc_pmic_core.c:118:12:
 warning: 'intel_soc_pmic_suspend' defined but not used

Signed-off-by: Jaewon Kim jaewon02@samsung.com
---
 drivers/mfd/intel_soc_pmic_core.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/intel_soc_pmic_core.c 
b/drivers/mfd/intel_soc_pmic_core.c
index 2720922..df7b064 100644
--- a/drivers/mfd/intel_soc_pmic_core.c
+++ b/drivers/mfd/intel_soc_pmic_core.c
@@ -115,6 +115,7 @@ static void intel_soc_pmic_shutdown(struct i2c_client *i2c)
return;
 }
 
+#if defined(CONFIG_PM_SLEEP)
 static int intel_soc_pmic_suspend(struct device *dev)
 {
struct intel_soc_pmic *pmic = dev_get_drvdata(dev);
@@ -132,6 +133,7 @@ static int intel_soc_pmic_resume(struct device *dev)
 
return 0;
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(intel_soc_pmic_pm_ops, intel_soc_pmic_suspend,
 intel_soc_pmic_resume);
-- 
1.7.9.5
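
An alternative pattern, also common in the tree, avoids the preprocessor
guards entirely by annotating the handlers (a sketch, not part of this
patch):

static int __maybe_unused intel_soc_pmic_suspend(struct device *dev)
{
	/* body unchanged from the patch above; __maybe_unused alone
	 * silences the defined-but-not-used warning */
	return 0;
}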



[PATCH 8/9] locktorture: Support rwsems

2014-09-11 Thread Davidlohr Bueso
We can easily do so with our new reader lock support. Just an arbitrary
design default: readers have higher (5x) critical region latencies than
writers: 50 ms and 10 ms, respectively.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 Documentation/locking/locktorture.txt |  2 ++
 kernel/locking/locktorture.c  | 68 ++-
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/Documentation/locking/locktorture.txt 
b/Documentation/locking/locktorture.txt
index 1bdeb71..f7d99e2 100644
--- a/Documentation/locking/locktorture.txt
+++ b/Documentation/locking/locktorture.txt
@@ -47,6 +47,8 @@ torture_type  Type of lock to torture. By default, only spinlocks will
 
 o mutex_lock: mutex_lock() and mutex_unlock() pairs.
 
+o rwsem_lock: read/write down() and up() semaphore pairs.
+
 torture_runnable  Start locktorture at module init. By default it will begin
  once the module is loaded.
 
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index c1073d7..8480118 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -265,6 +265,71 @@ static struct lock_torture_ops mutex_lock_ops = {
.name   = "mutex_lock"
 };
 
+static DECLARE_RWSEM(torture_rwsem);
+static int torture_rwsem_down_write(void) __acquires(torture_rwsem)
+{
+   down_write(&torture_rwsem);
+   return 0;
+}
+
+static void torture_rwsem_write_delay(struct torture_random_state *trsp)
+{
+   const unsigned long longdelay_ms = 100;
+
+   /* We want a long delay occasionally to force massive contention.  */
+   if (!(torture_random(trsp) %
+ (nrealwriters_stress * 2000 * longdelay_ms)))
+   mdelay(longdelay_ms * 10);
+   else
+   mdelay(longdelay_ms / 10);
+#ifdef CONFIG_PREEMPT
+   if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+   preempt_schedule();  /* Allow test to be preempted. */
+#endif
+}
+
+static void torture_rwsem_up_write(void) __releases(torture_rwsem)
+{
+   up_write(&torture_rwsem);
+}
+
+static int torture_rwsem_down_read(void) __acquires(torture_rwsem)
+{
+   down_read(&torture_rwsem);
+   return 0;
+}
+
+static void torture_rwsem_read_delay(struct torture_random_state *trsp)
+{
+   const unsigned long longdelay_ms = 100;
+
+   /* We want a long delay occasionally to force massive contention.  */
+   if (!(torture_random(trsp) %
+ (nrealwriters_stress * 2000 * longdelay_ms)))
+   mdelay(longdelay_ms * 2);
+   else
+   mdelay(longdelay_ms / 2);
+#ifdef CONFIG_PREEMPT
+   if (!(torture_random(trsp) % (nrealreaders_stress * 2)))
+   preempt_schedule();  /* Allow test to be preempted. */
+#endif
+}
+
+static void torture_rwsem_up_read(void) __releases(torture_rwsem)
+{
+   up_read(&torture_rwsem);
+}
+
+static struct lock_torture_ops rwsem_lock_ops = {
+   .writelock  = torture_rwsem_down_write,
+   .write_delay= torture_rwsem_write_delay,
+   .writeunlock= torture_rwsem_up_write,
+   .readlock   = torture_rwsem_down_read,
+   .read_delay = torture_rwsem_read_delay,
+   .readunlock = torture_rwsem_up_read,
+   .name   = "rwsem_lock"
+};
+
 /*
  * Lock torture writer kthread.  Repeatedly acquires and releases
  * the lock, checking for duplicate acquisitions.
@@ -467,7 +532,8 @@ static int __init lock_torture_init(void)
int i, j;
int firsterr = 0;
static struct lock_torture_ops *torture_ops[] = {
-   &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &mutex_lock_ops,
+   &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops,
+   &mutex_lock_ops, &rwsem_lock_ops,
};
 
if (!torture_init_begin(torture_type, verbose, &torture_runnable))
-- 
1.8.4.5





[PATCH 7/9] locktorture: Add infrastructure for torturing read locks

2014-09-11 Thread Davidlohr Bueso
Most of it is based on what we already have for writers. This allows
readers to be very independent (and thus configurable), enabling
future module parameters to control things such as rw distribution.
Furthermore, readers have their own delaying function, allowing us
to test different rw critical region latencies, and stress locking
internals. Similarly, statistics will, for now, only track the
number of lock acquisitions -- as opposed to writers, readers have
no failure detection.

In addition, introduce a new nreaders_stress module parameter. The
default number of readers will be the same as the number of writer
threads. Writer threads are interleaved with readers. Documentation
is updated accordingly.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 Documentation/locking/locktorture.txt |  16 +++-
 kernel/locking/locktorture.c  | 176 ++
 2 files changed, 168 insertions(+), 24 deletions(-)

diff --git a/Documentation/locking/locktorture.txt 
b/Documentation/locking/locktorture.txt
index 6b1e7ca..1bdeb71 100644
--- a/Documentation/locking/locktorture.txt
+++ b/Documentation/locking/locktorture.txt
@@ -29,6 +29,11 @@ nwriters_stress   Number of kernel threads that will stress 
exclusive lock
  ownership (writers). The default value is twice the amount
  of online CPUs.
 
+nreaders_stress   Number of kernel threads that will stress shared lock
+ ownership (readers). The default is the same as the number of
+ writer threads. If the user did not specify nwriters_stress,
+ then both readers and writers will be the number of online CPUs.
+
 torture_type Type of lock to torture. By default, only spinlocks will
  be tortured. This module can torture the following locks,
  with string values as follows:
@@ -95,15 +100,18 @@ STATISTICS
 Statistics are printed in the following format:
 
 spin_lock-torture: Writes:  Total: 93746064  Max/Min: 0/0   Fail: 0
-   (A)(B)(C)  (D)
+   (A) (B)(C)(D)  (E)
 
 (A): Lock type that is being tortured -- torture_type parameter.
 
-(B): Number of times the lock was acquired.
+(B): Number of writer lock acquisitions. If dealing with a read/write primitive
+ a second "Reads" statistics line is printed.
+
+(C): Number of times the lock was acquired.
 
-(C): Min and max number of times threads failed to acquire the lock.
+(D): Min and max number of times threads failed to acquire the lock.
 
-(D): true/false values if there were errors acquiring the lock. This should
+(E): true/false values if there were errors acquiring the lock. This should
  -only- be positive if there is a bug in the locking primitive's
  implementation. Otherwise a lock should never fail (ie: spin_lock()).
  Of course, the same applies for (C), above. A dummy example of this is
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 988267c..c1073d7 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -52,6 +52,8 @@ MODULE_AUTHOR("Paul E. McKenney <paul...@us.ibm.com>");
 
 torture_param(int, nwriters_stress, -1,
 "Number of write-locking stress-test threads");
+torture_param(int, nreaders_stress, -1,
+"Number of read-locking stress-test threads");
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
 torture_param(int, onoff_interval, 0,
 "Time between CPU hotplugs (s), 0=disable");
@@ -74,15 +76,19 @@ static atomic_t n_lock_torture_errors;
 
 static struct task_struct *stats_task;
 static struct task_struct **writer_tasks;
+static struct task_struct **reader_tasks;
 
 static int nrealwriters_stress;
 static bool lock_is_write_held;
+static int nrealreaders_stress;
+static bool lock_is_read_held;
 
 struct lock_stress_stats {
long n_lock_fail;
long n_lock_acquired;
 };
 static struct lock_stress_stats *lwsa; /* writer statistics */
+static struct lock_stress_stats *lrsa; /* reader statistics */
 
 #if defined(MODULE)
 #define LOCKTORTURE_RUNNABLE_INIT 1
@@ -104,6 +110,9 @@ struct lock_torture_ops {
int (*writelock)(void);
void (*write_delay)(struct torture_random_state *trsp);
void (*writeunlock)(void);
+   int (*readlock)(void);
+   void (*read_delay)(struct torture_random_state *trsp);
+   void (*readunlock)(void);
unsigned long flags;
const char *name;
 };
@@ -142,6 +151,9 @@ static struct lock_torture_ops lock_busted_ops = {
.writelock  = torture_lock_busted_write_lock,
.write_delay= torture_lock_busted_write_delay,
.writeunlock= torture_lock_busted_write_unlock,
+   .readlock   = NULL,
+   .read_delay = NULL,
+   .readunlock = NULL,
.name   = "lock_busted"
 };
 
@@ -182,6 +194,9 @@ static struct lock_torture_ops 

Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information

2014-09-11 Thread H. Peter Anvin

On 09/11/2014 04:37 PM, Thomas Gleixner wrote:


Specifically because marshaling the data in and out of the generic
decoder was more complex than a special-purpose decoder.


I did not look at that detail and I trust your judgement here, but
that is in no way explained in the changelog.

This whole patchset is a pain to review due to half-baked changelogs
and complete lack of a proper design description.



I'm not wedded to that concept, by the way, but using the generic parser 
had a whole bunch of its own problems, including the fact that you're 
getting bytes from user space.


It might be worthwhile to compare the older patchset which did use the 
generic parser to make sure that it actually made sense.


-hpa






[PATCH 9/9] locktorture: Introduce torture context

2014-09-11 Thread Davidlohr Bueso
The number of global variables is getting pretty ugly. Group variables
related to the execution (i.e., not parameters) into a new context structure.

Signed-off-by: Davidlohr Bueso dbu...@suse.de
---
 kernel/locking/locktorture.c | 161 ++-
 1 file changed, 82 insertions(+), 79 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 8480118..540d5df 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -66,29 +66,22 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(bool, verbose, true,
 "Enable verbose debugging printk()s");
 
-static bool debug_lock = false;
 static char *torture_type = "spin_lock";
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type,
 "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)");
 
-static atomic_t n_lock_torture_errors;
-
 static struct task_struct *stats_task;
 static struct task_struct **writer_tasks;
 static struct task_struct **reader_tasks;
 
-static int nrealwriters_stress;
 static bool lock_is_write_held;
-static int nrealreaders_stress;
 static bool lock_is_read_held;
 
 struct lock_stress_stats {
long n_lock_fail;
long n_lock_acquired;
 };
-static struct lock_stress_stats *lwsa; /* writer statistics */
-static struct lock_stress_stats *lrsa; /* reader statistics */
 
 #if defined(MODULE)
 #define LOCKTORTURE_RUNNABLE_INIT 1
@@ -117,8 +110,18 @@ struct lock_torture_ops {
const char *name;
 };
 
-static struct lock_torture_ops *cur_ops;
-
+struct lock_torture_cxt {
+   int nrealwriters_stress;
+   int nrealreaders_stress;
+   bool debug_lock;
+   atomic_t n_lock_torture_errors;
+   struct lock_torture_ops *cur_ops;
+   struct lock_stress_stats *lwsa; /* writer statistics */
+   struct lock_stress_stats *lrsa; /* reader statistics */
+};
+static struct lock_torture_cxt cxt = { 0, 0, false,
+  ATOMIC_INIT(0),
+  NULL, NULL};
 /*
  * Definitions for lock torture testing.
  */
@@ -134,10 +137,10 @@ static void torture_lock_busted_write_delay(struct 
torture_random_state *trsp)
 
/* We want a long delay occasionally to force massive contention.  */
if (!(torture_random(trsp) %
- (nrealwriters_stress * 2000 * longdelay_us)))
+ (cxt.nrealwriters_stress * 2000 * longdelay_us)))
mdelay(longdelay_us);
 #ifdef CONFIG_PREEMPT
-   if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+   if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
preempt_schedule();  /* Allow test to be preempted. */
 #endif
 }
@@ -174,13 +177,13 @@ static void torture_spin_lock_write_delay(struct 
torture_random_state *trsp)
 * we want a long delay occasionally to force massive contention.
 */
if (!(torture_random(trsp) %
- (nrealwriters_stress * 2000 * longdelay_us)))
+ (cxt.nrealwriters_stress * 2000 * longdelay_us)))
mdelay(longdelay_us);
if (!(torture_random(trsp) %
- (nrealwriters_stress * 2 * shortdelay_us)))
+ (cxt.nrealwriters_stress * 2 * shortdelay_us)))
udelay(shortdelay_us);
 #ifdef CONFIG_PREEMPT
-   if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+   if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
preempt_schedule();  /* Allow test to be preempted. */
 #endif
 }
@@ -206,14 +209,14 @@ __acquires(torture_spinlock_irq)
unsigned long flags;
 
spin_lock_irqsave(&torture_spinlock, flags);
-   cur_ops->flags = flags;
+   cxt.cur_ops->flags = flags;
return 0;
 }
 
 static void torture_lock_spin_write_unlock_irq(void)
 __releases(torture_spinlock)
 {
-   spin_unlock_irqrestore(&torture_spinlock, cur_ops->flags);
+   spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags);
 }
 
 static struct lock_torture_ops spin_lock_irq_ops = {
@@ -240,12 +243,12 @@ static void torture_mutex_delay(struct 
torture_random_state *trsp)
 
/* We want a long delay occasionally to force massive contention.  */
if (!(torture_random(trsp) %
- (nrealwriters_stress * 2000 * longdelay_ms)))
+ (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
mdelay(longdelay_ms * 5);
else
mdelay(longdelay_ms / 5);
 #ifdef CONFIG_PREEMPT
-   if (!(torture_random(trsp) % (nrealwriters_stress * 2)))
+   if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2)))
preempt_schedule();  /* Allow test to be preempted. */
 #endif
 }
@@ -278,12 +281,12 @@ static void torture_rwsem_write_delay(struct 
torture_random_state *trsp)
 
/* We want a long delay occasionally to force massive contention.  */
if 

Re: [PATCH v2] checkpatch: look for common misspellings

2014-09-11 Thread Joe Perches
On Fri, 2014-09-12 at 13:09 +0900, Masanari Iida wrote:
 Testing with "reseting", codespell found 21, grep found 26.

Hello Masanari.

How did codespell find any uses of "reseting"?
What version of codespell are you using?
(I tested with 1.7)

Looking at the git tree for codespell,
https://github.com/lucasdemarchi/codespell.git
the dictionary there doesn't have "reseting".

If I add reseting->resetting to the dictionary,
then codespell finds the same 31 uses that
git grep -i does.
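
For reference, dictionary.txt keeps one misspelling per line, with the
suggested correction after the arrow, so the entry added above is
literally the line:

  reseting->resetting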




Re: [PATCH 01/10] zsmalloc: fix init_zspage free obj linking

2014-09-11 Thread Minchan Kim
On Thu, Sep 11, 2014 at 04:53:52PM -0400, Dan Streetman wrote:
 When zsmalloc creates a new zspage, it initializes each object it contains
 with a link to the next object, so that the zspage has a singly-linked list
 of its free objects.  However, the logic that sets up the links is wrong,
 and in the case of objects that are precisely aligned with the page boundaries
 (e.g. a zspage with objects that are 1/2 PAGE_SIZE) the first object on the
 next page is skipped, due to incrementing the offset twice.  The logic can be
 simplified, as it doesn't need to calculate how many objects can fit on the
 current page; simply checking the offset for each object is enough.

If objects are precisely aligned with the page boundary, pages_per_zspage
should be 1 so there is no next page.

 
 Change zsmalloc init_zspage() logic to iterate through each object on
 each of its pages, checking the offset to verify the object is on the
 current page before linking it into the zspage.
 
 Signed-off-by: Dan Streetman ddstr...@ieee.org
 Cc: Minchan Kim minc...@kernel.org
 ---
  mm/zsmalloc.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)
 
 diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
 index c4a9157..03aa72f 100644
 --- a/mm/zsmalloc.c
 +++ b/mm/zsmalloc.c
 @@ -628,7 +628,7 @@ static void init_zspage(struct page *first_page, struct 
 size_class *class)
   while (page) {
   struct page *next_page;
   struct link_free *link;
 - unsigned int i, objs_on_page;
 + unsigned int i = 1;
  
   /*
   * page->index stores offset of first object starting
 @@ -641,14 +641,10 @@ static void init_zspage(struct page *first_page, struct 
 size_class *class)
  
   link = (struct link_free *)kmap_atomic(page) +
   off / sizeof(*link);
 - objs_on_page = (PAGE_SIZE - off) / class->size;
  
 - for (i = 1; i <= objs_on_page; i++) {
 - off += class->size;
 - if (off < PAGE_SIZE) {
 - link->next = obj_location_to_handle(page, i);
 - link += class->size / sizeof(*link);
 - }
 + while ((off += class->size) < PAGE_SIZE) {
 + link->next = obj_location_to_handle(page, i++);
 + link += class->size / sizeof(*link);
   }
  
   /*
 @@ -660,7 +656,7 @@ static void init_zspage(struct page *first_page, struct 
 size_class *class)
   link->next = obj_location_to_handle(next_page, 0);
   kunmap_atomic(link);
   page = next_page;
 - off = (off + class->size) % PAGE_SIZE;
 + off %= PAGE_SIZE;
   }
  }
  
 -- 
 1.8.3.1
 

-- 
Kind regards,
Minchan Kim


Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables

2014-09-11 Thread Dave Hansen
On 09/11/2014 08:02 PM, Ren, Qiaowei wrote:
 On 2014-09-11, Hansen, Dave wrote:
 On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
 + * This function will be called by do_munmap(), and the VMAs covering
 + * the virtual address region start...end have already been split if
 + * necessary and remvoed from the VMA list.

 remvoed -> removed

 +void mpx_unmap(struct mm_struct *mm,
 +   unsigned long start, unsigned long end) {
 +   int ret;
 +
 +   ret = mpx_try_unmap(mm, start, end);
 +   if (ret == -EINVAL)
 +   force_sig(SIGSEGV, current);
 +}
 
 In the case of a fault during an unmap, this just ignores the 
 situation and returns silently.  Where is the code to retry the 
 freeing operation outside of mmap_sem?
 
 Dave, you mean delayed_work code? According to our discussion, it
 will be deferred to another mainline post.

OK, fine.  Just please call that out in the description.


