[GIT PULL] ARC urgent fixes for 4.4-rc1

2015-11-13 Thread Vineet Gupta
Hi Linus,

Found a couple of brown paper bag bugs in the previous pull request (including
an SMP build breakage reported by Guenter). Since these are urgent, I also
decided to send over a bunch of other pending fixes which could otherwise have
waited an rc or two. Please pull.

Thx,
-Vineet
->
The following changes since commit 5a364c2a1762e8a78721fafc93144509c0b6cb84:

  ARC: mm: PAE40 support (2015-10-29 18:41:30 +0530)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git/ tags/arc-4.4-rc1-part2

for you to fetch changes up to 30b9dbee895ff0d5cbf155bd1ef3f0f5992bca6f:

  ARC: Fix silly typo in MAINTAINERS file (2015-11-14 13:12:31 +0530)


ARC fixes for 4.4-rc1
- A bunch of brown paper bag bugs (MAINTAINERS list email, SMP build failure)
- cpu_relax() now compiler barrier for UP as well
- Handling of userspace Bus Errors for ARCompact builds


Vineet Gupta (6):
  ARCv2: lib: memcpy: use local symbols
  ARC: remove extraneous header include
  ARC: [arcompact] Handle bus error from userspace as Interrupt not exception
  ARC: use ASL assembler mnemonic
  ARC: cpu_relax() to be compiler barrier even for UP
  ARC: Fix silly typo in MAINTAINERS file

 MAINTAINERS  |  2 +-
 arch/arc/include/asm/processor.h |  4 
 arch/arc/kernel/entry-arcv2.S| 19 +++
 arch/arc/kernel/entry-compact.S  | 29 ++
 arch/arc/kernel/entry.S  | 17 -
 arch/arc/lib/memcpy-archs.S  | 52 
 arch/arc/mm/tlbex.S  |  6 ++---
 arch/arc/plat-sim/platform.c |  1 -
 8 files changed, 74 insertions(+), 56 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: multi-codec support for arizona-ldo1 was Re: System with multiple arizona (wm5102) codecs

2015-11-13 Thread Pavel Machek

On Fri 2015-11-13 22:53:55, Mark Brown wrote:
> On Fri, Nov 13, 2015 at 10:58:12PM +0100, Pavel Machek wrote:
> > On Tue 2015-10-13 12:53:55, Mark Brown wrote:
> > > On Mon, Oct 12, 2015 at 10:11:38PM +0200, Pavel Machek wrote:
> 
> > > > > No, you definitely shouldn't be doing this - the regulator names
> > > > > should reflect the names the device has in the datasheet to aid
> > > > > people in going from software to the hardware and back again.  They
> > > > > shouldn't be dynamically generated at runtime.  If you need to
> > > > > namespace by device
> 
> > Ok. But I'd still like to get it working.
> 
> So as I've been saying use the existing interfaces, or extend them as
> needed.

Obviously I'll need to use the existing interfaces, or extend them as
needed. I'd expect the subsystem maintainer to know whether the existing
interfaces are ok or what needs to be fixed, but it seems you either
don't know how your subsystem works, or are not willing to tell me.

Is there someone else I should talk to about the regulators-ALSA
interface?

Thanks,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Build failure in -next: Building arc:vdk_hs38_smp_defconfig ... failed

2015-11-13 Thread Vineet Gupta
On Friday 13 November 2015 11:31 AM, Guenter Roeck wrote:
> Seen since next-20151105.
>
> Building arc:vdk_hs38_smp_defconfig ... failed
> --
> Error log:
> In file included from ./arch/arc/include/asm/irqflags-arcv2.h:12:0,
>   from ./arch/arc/include/asm/irqflags.h:16,
>   from include/linux/irqflags.h:15,
>   from include/linux/spinlock.h:53,
>   from include/linux/rcupdate.h:38,
>   from include/linux/idr.h:18,
>   from include/linux/kernfs.h:14,
>   from include/linux/sysfs.h:15,
>   from include/linux/kobject.h:21,
>   from include/linux/device.h:17,
>   from include/linux/of_platform.h:14,
>   from arch/arc/plat-axs10x/axs10x.c:17:
> arch/arc/plat-axs10x/axs10x.c: In function ‘axs103_early_init’:
> arch/arc/plat-axs10x/axs10x.c:401:41: error: ‘ARC_REG_MCIP_BCR’ undeclared (first use in this function)
>unsigned int num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
>   ^
> ./arch/arc/include/asm/arcregs.h:119:44: note: in definition of macro ‘read_aux_reg’
>   #define read_aux_reg(reg) __builtin_arc_lr(reg)
>  ^
> arch/arc/plat-axs10x/axs10x.c:401:41: note: each undeclared identifier is reported only once for each function it appears in
>unsigned int num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
>   ^
> ./arch/arc/include/asm/arcregs.h:119:44: note: in definition of macro ‘read_aux_reg’
>   #define read_aux_reg(reg) __builtin_arc_lr(reg)
>
> Seems to be caused by commit f78442cc68a1 ("ARC: remove extraneous header include")
> which afaics removes the include file providing the missing define.
>
> Guenter

Hi Guenter,

Thx for the report. Indeed that was the bugger.
Fix pushed to for-next - it will be on its way to Linus soon!

-Vineet


Re: [PATCH v2 1/3] usb: core: lpm: fix usb3_hardware_lpm sysfs node

2015-11-13 Thread Lu Baolu



On 11/13/2015 11:28 PM, Alan Stern wrote:

On Fri, 13 Nov 2015, Lu, Baolu wrote:


On 11/13/2015 12:20 AM, Alan Stern wrote:

On Thu, 12 Nov 2015, Lu Baolu wrote:


Commit 655fe4effe0f ("usbcore: add sysfs support to xHCI usb3
hardware LPM") introduced the usb3_hardware_lpm sysfs node. This
node doesn't show the correct status of USB3 U1 and U2 LPM.

This patch fixes this by replacing usb3_hardware_lpm with two
nodes, usb3_hardware_lpm_u1 (for U1) and usb3_hardware_lpm_u2
(for U2), and recording the U1/U2 LPM status in the right places.

This patch should be back-ported to kernels as old as 4.3,
which contains commit 655fe4effe0f ("usbcore: add sysfs support
to xHCI usb3 hardware LPM").

Cc: sta...@vger.kernel.org
Signed-off-by: Lu Baolu 

...


--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3875,17 +3875,23 @@ static void usb_enable_link_state(struct usb_hcd *hcd, struct usb_device *udev,
 		return;
 	}
 
-	if (usb_set_lpm_timeout(udev, state, timeout))
+	ret = usb_set_lpm_timeout(udev, state, timeout);
+	if (ret)
 		/* If we can't set the parent hub U1/U2 timeout,
 		 * device-initiated LPM won't be allowed either, so let the xHCI
 		 * host know that this link state won't be enabled.
 		 */
 		hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state);
-
 	/* Only a configured device will accept the Set Feature U1/U2_ENABLE */
 	else if (udev->actconfig)
 		usb_set_device_initiated_lpm(udev, state, true);
 
+	if (!ret) {
+		if (state == USB3_LPM_U1)
+			udev->usb3_lpm_u1_enabled = 1;
+		else if (state == USB3_LPM_U2)
+			udev->usb3_lpm_u2_enabled = 1;
+	}

This doesn't look right at all.  What happens if ret is 0 but the
device isn't configured?  You'll set the usb3_lpm_u*_enabled flag even
though LPM isn't really enabled.

Don't you want to set these flags inside the
usb_set_device_initiated_lpm() function, where you know whether the
action succeeded?  And leave this routine unchanged?

My understanding is that both the hub and the device can initiate LPM.
As soon as usb_set_lpm_timeout(valid_timeout_value)
returns 0, the hub-initiated LPM is enabled. Thus, LPM is
enabled no matter the result of usb_set_device_initiated_lpm().
The only difference is whether the device is able to initiate LPM.

On the disable side, as soon as usb_set_lpm_timeout(0) returns 0,
hub-initiated LPM is disabled. The hub will also disallow the link from
entering U1/U2, even if the device is initiating LPM. Hence LPM
is disabled as soon as the hub LPM timeout is set to 0, no matter
whether device-initiated LPM is disabled or not.

Then maybe you can add a comment explaining this.


Yes, I will add comments for this.



The patch still looks strange, though.  Your new code does this:

	ret = usb_set_lpm_timeout(...);
	if (ret)
		...
	else if (udev->actconfig)
		...
	if (!ret) {
		if (state == USB3_LPM_U1)
			...
	}

It would be better to do this:

	if (usb_set_lpm_timeout(...)) {
		...
	} else {
		if (udev->actconfig)
			...
		if (state == USB3_LPM_U1)
			...
	}


Yes, this looks better. I will refactor this part of code.
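
The restructuring discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct demo_udev, demo_set_lpm_timeout() and every demo_* flag are hypothetical stand-ins for the usbcore types and calls, not the real API. It shows the single if/else shape Alan suggests, with the "enabled" flags recorded whenever the hub timeout was set, whether or not the device is configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_lpm_state { DEMO_LPM_U1, DEMO_LPM_U2 };

/* Hypothetical stand-in for struct usb_device plus host state. */
struct demo_udev {
	bool timeout_fails;      /* makes the mock timeout setter fail */
	bool configured;         /* stands in for udev->actconfig */
	bool host_told_disabled; /* stands in for disable_usb3_lpm_timeout() */
	bool device_initiated;   /* stands in for usb_set_device_initiated_lpm() */
	bool u1_enabled;
	bool u2_enabled;
};

/* Mock of the hub timeout setter: 0 on success, negative on failure. */
static int demo_set_lpm_timeout(const struct demo_udev *udev)
{
	return udev->timeout_fails ? -1 : 0;
}

void demo_enable_link_state(struct demo_udev *udev, enum demo_lpm_state state)
{
	assert(udev != NULL);

	if (demo_set_lpm_timeout(udev)) {
		/* Hub timeout not set: tell the host this state stays off. */
		udev->host_told_disabled = true;
	} else {
		/* Only a configured device may initiate LPM itself. */
		if (udev->configured)
			udev->device_initiated = true;

		/* Hub-initiated LPM is on either way, so record it. */
		if (state == DEMO_LPM_U1)
			udev->u1_enabled = true;
		else if (state == DEMO_LPM_U2)
			udev->u2_enabled = true;
	}
}
```

With this shape no separate ret variable is needed, and the flag update cannot drift out of sync with the success branch.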



Alan Stern



Thank you.
-Baolu




Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-13 Thread Christoph Hellwig
On Fri, Nov 13, 2015 at 03:06:36PM -0700, Jason Gunthorpe wrote:
> Looking at that thread and then at the patch a bit more..
> 
> +void ib_process_cq_direct(struct ib_cq *cq)
> [..]
> + __ib_process_cq(cq, INT_MAX);
> 
> INT_MAX is not enough, it needs to loop.
> This is missing a ib_req_notify also.

No.  The direct case _never_ calls ib_req_notify.  It's for the case where
SRP polls the send CQ only from the same context it sends from,
without any interrupt notification at all.

> +static int __ib_process_cq(struct ib_cq *cq, int budget)
> + while ((n = ib_poll_cq(cq, IB_POLL_BATCH, cq->wc)) > 0) {
> 
> Does an unnecessary ib_poll_cq call in common cases. I'd suggest
> change the result to bool and do:
> 
> // true return means the caller should attempt ib_req_notify_cq
> while ((n = ib_poll_cq(cq, IB_POLL_BATCH, cq->wc)) > 0) {
>  for (...)
>  if (n != IB_POLL_BATCH)
>return true;
>  completed += n;
>  if (completed > budget)
> return false;
> }
> return true;
> 
> And then change call site like:
> 
> static void ib_cq_poll_work(struct work_struct *work)
> {
> if (__ib_process_cq(...))
> if (ib_req_notify_cq(cq, IB_POLL_FLAGS) == 0)
>   return;
> // Else we need to loop again.
> queue_work(ib_comp_wq, &cq->work);
> }
> 
> Which avoids the rearm.
> 
> void ib_process_cq_direct(struct ib_cq *cq)
> {
>while (1) {
>if (__ib_process_cq(..) &&
>ib_req_notify_cq(cq, IB_POLL_FLAGS) == 0)
>return;
>}
> }
> 
> Which adds the inf loop and rearm.
> 
> etc for softirq

For the workqueue and softirq cases this looks reasonable.  For the
direct case there is no rearming, though.

> Perhaps ib_req_notify_cq should be folded into __ib_process_cq, then
> it can trivially honour the budget on additional loops from
> IB_CQ_REPORT_MISSED_EVENTS.

Which also defeats this proposal.


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-13 Thread Christoph Hellwig
On Fri, Nov 13, 2015 at 11:25:13AM -0700, Jason Gunthorpe wrote:
> For instance, like this, not fully draining the cq and then doing:
> 
> > +   completed = __ib_process_cq(cq, budget);
> > +   if (completed < budget) {
> > +   irq_poll_complete(&cq->iop);
> > +   if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0) {
> 
> Doesn't seem entirely right? There is no point in calling
> ib_req_notify_cq if the code knows there is still stuff in the CQ and
> has already, independently, arranged for ib_poll_handler to be
> guaranteed called.

The code only calls ib_req_notify_cq if it knows we finished earlier than
our budget.

> > +   completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
> > +   if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
> > +   ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> > +   queue_work(ib_comp_wq, &cq->work);
> 
> Same comment here..


Same here - we only requeue the work item if either we processed all of
our budget, or ib_req_notify_cq with IB_CQ_REPORT_MISSED_EVENTS told
us that we need to poll again.
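
A userspace sketch of the requeue decision described above, under loud assumptions: struct fake_cq, fake_poll_cq() and fake_req_notify_cq() are hypothetical mocks, not the verbs API. The mock notify call returns a positive value when completions are already pending at arm time, which is how the IB_CQ_REPORT_MISSED_EVENTS behaviour is described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_POLL_BATCH 16

/* Hypothetical mock completion queue: just a counter and an armed flag. */
struct fake_cq {
	int pending;
	bool armed;
};

/* Reap up to max completions; returns how many were reaped. */
static int fake_poll_cq(struct fake_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/* Arm the CQ; a positive return means "poll again, events were missed". */
static int fake_req_notify_cq(struct fake_cq *cq)
{
	cq->armed = true;
	return cq->pending > 0 ? 1 : 0;
}

/* Poll in batches until the CQ is empty or the budget is used up. */
static int demo_process_cq(struct fake_cq *cq, int budget)
{
	int completed = 0, n;

	while ((n = fake_poll_cq(cq, DEMO_POLL_BATCH)) > 0) {
		completed += n;
		if (completed >= budget)
			break;
	}
	return completed;
}

/* Returns true when the work item has to be queued again. */
bool demo_cq_poll_work(struct fake_cq *cq, int budget)
{
	int completed;

	assert(cq != NULL && budget > 0);
	completed = demo_process_cq(cq, budget);

	/* Short-circuit: the CQ is only re-armed if we finished under budget. */
	return completed >= budget || fake_req_notify_cq(cq) > 0;
}
```

The point of the short-circuit is the one made in the reply: the notify call is skipped whenever the budget was exhausted, because the work item is requeued anyway.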

> I understand several drivers are not using a hard irq context for the
> comp_handler call back. Is there any way to exploit that in this new
> API so we don't have to do so many context switches? Ie if the driver
> already is using a softirq when calling comp_handler can we somehow
> just rig ib_poll_handler directly and avoid the overhead? (Future)

Let's say this API makes it possible.  I still don't think moving the
whole budget and rearm logic into the LLD is necessarily a good idea
if we can avoid it.


Re: [PATCH] staging: most: aim-cdev: Used "==" instead of assignment

2015-11-13 Thread Sudip Mukherjee
On Sat, Nov 14, 2015 at 09:57:10AM +0530, Anjali Menon wrote:
> Used double equal sign instead of equal to sign in the if condition
> to remove the error detected by checkpatch.pl.
> 
> ERROR: do not use assignment in if condition
> 
> Signed-off-by: Anjali Menon 
> ---
>  drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c | 2 +-

Something is wrong with your tree. This path does not exist.

But in any case, I think the patch is wrong. wait_event_interruptible()
is executed when most_get_mbo() returns NULL. So if you do
mbo == most_get_mbo(), it will come out of sleep immediately because the
condition is true. I think the intention here was to sleep as long as
most_get_mbo() returns NULL, and when it returns a valid pointer, to
save it in the mbo variable so that it can be used later in copy_from_user().
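
The difference can be reproduced outside the kernel. The sketch below is an illustration only: demo_get_buffer() is a hypothetical stand-in for most_get_mbo(), and the demo_wait_until() macro is a busy-poll substitute for wait_event_interruptible(), not its real implementation.

```c
#include <assert.h>
#include <stddef.h>

struct demo_buffer {
	int id;
};

static struct demo_buffer demo_slot = { 42 };
static int demo_calls_until_ready;

/* Hypothetical stand-in for most_get_mbo(): NULL until a buffer is free. */
static struct demo_buffer *demo_get_buffer(void)
{
	if (demo_calls_until_ready > 0) {
		demo_calls_until_ready--;
		return NULL;
	}
	return &demo_slot;
}

/* Busy-poll substitute for a wait: loop until the condition is true. */
#define demo_wait_until(cond) do { } while (!(cond))

/* Intended form: assign inside the condition, wait for a valid pointer. */
struct demo_buffer *demo_wait_with_assignment(int delay)
{
	struct demo_buffer *buf = NULL;

	demo_calls_until_ready = delay;
	demo_wait_until((buf = demo_get_buffer()) != NULL);
	return buf;
}

/*
 * "Fixed" form: buf is still NULL, so NULL == NULL ends the wait at once
 * and nothing is ever stored. With delay == 0 this would spin forever,
 * hence the assert.
 */
struct demo_buffer *demo_wait_with_comparison(int delay)
{
	struct demo_buffer *buf = NULL;

	assert(delay > 0);
	demo_calls_until_ready = delay;
	demo_wait_until(buf == demo_get_buffer());
	return buf;
}
```

In other words, the checkpatch complaint is better addressed by restructuring the code than by swapping `=` for `==`.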

regards
sudip


Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-13 Thread Christoph Hellwig
On Fri, Nov 13, 2015 at 10:16:04AM -0600, Steve Wise wrote:
> So how can we do this for iwarp?  It seems like all that might be needed is 
> to modify the QP state to idle, retrying until it succeeds:
>
>If the QP is transitioning to the Error state, or has not yet
>finished flushing the Work Queues, a Modify QP request to transition
>to the IDLE state MUST fail with an Immediate Error. If none of the
>prior conditions are true, a Modify QP to the Idle state MUST take
>the QP to the Idle state. No other state transitions out of Error
>are supported. Any attempt to transition the QP to a state other
>than Idle MUST result in an Immediate Error.

Can you try to write up some code for this?  We could then wire it up
in the common helper.


Re: [PATCH 2/2] arm: kvm: Fix STRICT_MM_TYPECHECK errors

2015-11-13 Thread Ard Biesheuvel
On 11 November 2015 at 03:03, Laura Abbott  wrote:
>
> PAGE_S2_DEVICE is a pgprot val and needs to be accessed using the proper
> accessors. Switch to these accessors to avoid errors with
> STRICT_MM_TYPECHECK.
>
> Signed-off-by: Laura Abbott 
> ---
> Found in the course of other work

Already fixed here:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/142953

Looks like we may need a mutex :-)

> ---
>  arch/arm/kvm/mmu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6984342..43f8162 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -213,7 +213,8 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device mappings */
> -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if ((pte_val(old_pte) & pgprot_val(PAGE_S2_DEVICE)) !=
> +pgprot_val(PAGE_S2_DEVICE))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -306,7 +307,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> pte = pte_offset_kernel(pmd, addr);
> do {
> if (!pte_none(*pte) &&
> -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   (pte_val(*pte) & pgprot_val(PAGE_S2_DEVICE)) !=
> +pgprot_val(PAGE_S2_DEVICE))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.5.0
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
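
For readers unfamiliar with this class of error, here is a userspace sketch of why the accessors are needed once page-table types are wrapped in structs. The typedefs mirror the general strict-typecheck idea; DEMO_S2_DEVICE and its bit value are hypothetical and are not the real PAGE_S2_DEVICE definition.

```c
#include <assert.h>
#include <stdint.h>

/* Strict variants: each value lives in its own struct type. */
typedef struct { uint64_t pte; } pte_t;
typedef struct { uint64_t pgprot; } pgprot_t;

#define pte_val(x)     ((x).pte)
#define pgprot_val(x)  ((x).pgprot)
#define __pte(x)       ((pte_t){ (x) })
#define __pgprot(x)    ((pgprot_t){ (x) })

/* Hypothetical attribute bits, for illustration only. */
#define DEMO_S2_DEVICE __pgprot(0x0cULL)

int demo_pte_is_device(pte_t pte)
{
	/* Sanity check: an empty mask would match every entry. */
	assert(pgprot_val(DEMO_S2_DEVICE) != 0);

	/*
	 * "pte_val(pte) & DEMO_S2_DEVICE" would not compile here, because
	 * DEMO_S2_DEVICE is a struct; unwrap it with pgprot_val() first.
	 */
	return (pte_val(pte) & pgprot_val(DEMO_S2_DEVICE)) ==
	       pgprot_val(DEMO_S2_DEVICE);
}
```

With plain integer typedefs the unwrapped comparison compiles silently, which is why such mistakes only show up in strict builds.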


Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-13 Thread Christoph Hellwig
On Fri, Nov 13, 2015 at 11:19:24AM -0800, Bart Van Assche wrote:
> On 11/13/2015 05:46 AM, Christoph Hellwig wrote:
>> The new name is irq_poll as iopoll is already taken.  Better suggestions
>> welcome.
>
> Hello Christoph,
>
> Would it be possible to provide more background information about this ? 
> Which other kernel subsystem is using the name iopoll ?

Take a look at include/linux/iopoll.h - I can't really make much sense
of it to be honest, but it's used in quite a few places.


Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-13 Thread Christoph Hellwig
On Fri, Nov 13, 2015 at 05:23:39PM +0200, Or Gerlitz wrote:
> On Fri, Nov 13, 2015 at 3:46 PM, Christoph Hellwig  wrote:
> > The new name is irq_poll as iopoll is already taken.  Better suggestions
> > welcome.
> 
> Sagi (or Christoph if you can address that),
> 
> At some point over the last 18 months there was a port done at
> Mellanox for iser to use blk-iopoll and AFAIR it didn't work well or
> didn't work at all. Can you tell us now what the problem was and how
> you addressed it in your generalization?

Hi Or,

Sagi mentioned that the last time he tried a similar approach in iSER he saw
some large latency spikes.  We've seen nothing worse than the original
approach.  The Flash Memory Summit slide set has some numbers:

http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150811_FA11_Bandic.pdf

they aren't quite up to date, but the latency distribution hasn't
really changed.


Re: [PATCH v2 4/5] Staging: dgnc: dgnc_neo.c Braces {} should be used on all arms of this statement

2015-11-13 Thread Sudip Mukherjee
On Fri, Nov 13, 2015 at 09:03:56PM +0530, Nizam Haider wrote:
> Fix checkpatch warning
> CHECK: braces {} should be used on all arms of this statement
> 
> Signed-off-by: Nizam Haider 
> ---
>  drivers/staging/dgnc/dgnc_neo.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/dgnc/dgnc_neo.c b/drivers/staging/dgnc/dgnc_neo.c
> index e980150..99b230f 100644
> --- a/drivers/staging/dgnc/dgnc_neo.c
> +++ b/drivers/staging/dgnc/dgnc_neo.c
> @@ -1108,9 +1108,10 @@ static void neo_copy_data_from_uart_to_queue(struct channel_t *ch)
>* On the other hand, if the UART IS in FIFO mode, then ask
>* the UART to give us an approximation of data it has RX'ed.
>*/
> - if (!(ch->ch_flags & CH_FIFO_ENABLED))
> + if (!(ch->ch_flags & CH_FIFO_ENABLED)) {
>   total = 0;
> - else {
> + }
> + else {

This should be:

diff --git a/drivers/staging/dgnc/dgnc_neo.c b/drivers/staging/dgnc/dgnc_neo.c
index e980150..67e2667 100644
--- a/drivers/staging/dgnc/dgnc_neo.c
+++ b/drivers/staging/dgnc/dgnc_neo.c
@@ -1108,9 +1108,9 @@ static void neo_copy_data_from_uart_to_queue(struct channel_t *ch)
 	 * On the other hand, if the UART IS in FIFO mode, then ask
 	 * the UART to give us an approximation of data it has RX'ed.
 	 */
-	if (!(ch->ch_flags & CH_FIFO_ENABLED))
+	if (!(ch->ch_flags & CH_FIFO_ENABLED)) {
 		total = 0;
-	else {
+	} else {
 		total = readb(&ch->ch_neo_uart->rfifo);
 
 		/*

regards
sudip


Re: [PATCH v4 00/10] Better compatible for the rockchip thermal and support RK3368 SoCs

2015-11-13 Thread Caesar Wang



On 2015-11-13 06:14, Heiko Stuebner wrote:

Hi Eduardo,

Am Donnerstag, 12. November 2015, 10:29:52 schrieb Eduardo Valentin:

On Mon, Nov 09, 2015 at 12:48:52PM +0800, Caesar Wang wrote:

Thank you all for providing inputs and comments on previous versions of
this patchset.
Special thanks to Eduardo, Dmitry, and Heiko.

This patch series works for the RK3368 on the Rockchip platform.

Do you have any results on existing support? Is the driver still in one
piece for rk3288?

I've tested this series on a rk3288-veyron-jerry and everything still
runs just fine, so

Tested-by: Heiko Stuebner 


Thanks Heiko for testing.:-)



___
Linux-rockchip mailing list
linux-rockc...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip



--
caesar wang | software engineer | w...@rock-chip.com




Re: [PATCH v4 00/10] Better compatible for the rockchip thermal and support RK3368 SoCs

2015-11-13 Thread Caesar Wang

Eduardo,

On 2015-11-13 02:29, Eduardo Valentin wrote:

On Mon, Nov 09, 2015 at 12:48:52PM +0800, Caesar Wang wrote:

Thank you all for providing inputs and comments on previous versions of
this patchset.
Special thanks to Eduardo, Dmitry, and Heiko.

This patch series works for the RK3368 on the Rockchip platform.

Do you have any results on existing support? Is the driver still in one
piece for rk3288?


Yep, it still works happily on rk3288 SoCs.

$ while true; do grep "" /sys/class/thermal/thermal_zone[1-2]/temp; sleep .5; done

...
/sys/class/thermal/thermal_zone1/temp:70833
/sys/class/thermal/thermal_zone2/temp:69615
/sys/class/thermal/thermal_zone1/temp:70416
/sys/class/thermal/thermal_zone2/temp:68846
/sys/class/thermal/thermal_zone1/temp:70416
/sys/class/thermal/thermal_zone2/temp:70833
/sys/class/thermal/thermal_zone1/temp:70833
/sys/class/thermal/thermal_zone2/temp:69615
/sys/class/thermal/thermal_zone1/temp:71666
/sys/class/thermal/thermal_zone2/temp:69615
/sys/class/thermal/thermal_zone1/temp:70416
/sys/class/thermal/thermal_zone2/temp:69615
/sys/class/thermal/thermal_zone1/temp:70833


I am planning to send your series in the next rc cycles. It won't appear in
linux-next until the merge window finishes.


Thanks!


BR,

Eduardo Valentin




--
caesar wang | software engineer | w...@rock-chip.com




[GIT PULL] platform/chrome: Changes for 4.4

2015-11-13 Thread Olof Johansson
Hi Linus,

Chrome hardware platform changes for 4.4. Please merge.


Thanks!

-Olof


The following changes since commit 049e6dde7e57f0054fdc49102e7ef4830c698b46:

  Linux 4.3-rc4 (2015-10-04 16:57:17 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform.git tags/chrome-platform-4.4

for you to fetch changes up to ebaf31c46cce0dc8a6ed690b5456b295aa7586a6:

  platform/chrome: Fix i2c-designware adapter name (2015-11-09 19:43:33 -0800)


platform/chrome: Branch for v4.4

Here's the branch of chrome platform changes for v4.4. Some have been queued
up for the full 4.3 release cycle since I forgot to send them in for that
round (rebased early on to deal with fixes conflicts).

Most of these enable EC communication stuff -- Pixel 2015 support, enabling
building for ARM64 platforms, and a few fixes for memory leaks.

There's also a patch in here to allow reading/writing the verified boot
context, which depends on a sysfs patch acked by Greg.


Christian Engelmayer (2):
  platform/chrome: cros_ec: Fix leak in sequence_store()
  platform/chrome: cros_ec: Fix possible leak in led_rgb_store()

Emilio López (2):
  sysfs: Support is_visible() on binary attributes
  platform/chrome: Support reading/writing the vboot context

Jarkko Nikula (1):
  platform/chrome: Fix i2c-designware adapter name

Javier Martinez Canillas (5):
  Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"
  platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO
  platform/chrome: cros_ec_lpc - Use existing function to check EC result
  platform/chrome: cros_ec_lpc - Add support for Google Pixel 2
  platform/chrome: cros_ec_dev - Add a platform device ID table

Thierry Reding (1):
  platform/chrome: Enable Chrome platforms on 64-bit ARM

 drivers/platform/chrome/Kconfig|   5 +-
 drivers/platform/chrome/Makefile   |   3 +-
 drivers/platform/chrome/chromeos_laptop.c  |   4 +-
 drivers/platform/chrome/cros_ec_dev.c  |   7 ++
 drivers/platform/chrome/cros_ec_lightbar.c |  31 ---
 drivers/platform/chrome/cros_ec_lpc.c  |  21 ++---
 drivers/platform/chrome/cros_ec_vbc.c  | 137 +
 fs/sysfs/group.c   |  17 +++-
 include/linux/mfd/cros_ec.h|   1 +
 include/linux/sysfs.h  |  18 +++-
 10 files changed, 207 insertions(+), 37 deletions(-)
 create mode 100644 drivers/platform/chrome/cros_ec_vbc.c


Re: [PATCH] platform/chrome: Fix i2c-designware adapter name

2015-11-13 Thread Olof Johansson
On Tue, Nov 03, 2015 at 10:49:59AM -0800, Jeremiah Mahler wrote:
> Jarkko,
> 
> On Tue, Nov 03, 2015 at 01:09:00PM +0200, Jarkko Nikula wrote:
> > Commit d80d134182ba ("i2c: designware: Move common probe code into
> > i2c_dw_probe()") caused the I2C adapter lookup code here to fail for PCI
> > enumerated i2c-designware because commit changed the adapter name but
> > didn't update it here.
> > 
> > Fix the I2C adapter lookup by using the "Synopsys DesignWare I2C adapter"
> > name.
> > 
> > Reported-by: Jeremiah Mahler 
> > > Fixes: d80d134182ba ("i2c: designware: Move common probe code into i2c_dw_probe()")
> > Signed-off-by: Jarkko Nikula 
> > ---
> > Hi Jeremiah. This is the same diff I had in a reply to your bug report.
> > Can you test does this fix work for you as I don't have the HW.
> > ---
> >  drivers/platform/chrome/chromeos_laptop.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > > diff --git a/drivers/platform/chrome/chromeos_laptop.c b/drivers/platform/chrome/chromeos_laptop.c
> > index 02072749fff3..2b441e9ae593 100644
> > --- a/drivers/platform/chrome/chromeos_laptop.c
> > +++ b/drivers/platform/chrome/chromeos_laptop.c
> > @@ -47,8 +47,8 @@ static const char *i2c_adapter_names[] = {
> > "SMBus I801 adapter",
> > "i915 gmbus vga",
> > "i915 gmbus panel",
> > -   "i2c-designware-pci",
> > -   "i2c-designware-pci",
> > +   "Synopsys DesignWare I2C adapter",
> > +   "Synopsys DesignWare I2C adapter",
> >  };
> >  
> >  /* Keep this enum consistent with i2c_adapter_names */
> > -- 
> > 2.6.2
> > 
> 
> Yes, this patch fixes the problem.
> 
> Tested-by: Jeremiah Mahler 

Thanks, applied!


-Olof


Re: [RT PATCH] sched: rt: fix two possible deadlocks in push_irq_work_func

2015-11-13 Thread yjin


On 2015-11-14 12:25, Steven Rostedt wrote:

On Sat, 14 Nov 2015 10:53:18 +0800
 wrote:


From: Yanjiang Jin 

This can only happen in RT kernel due to run_timer_softirq() calls
irq_work_tick() when CONFIG_PREEMPT_RT_FULL is enabled as below:

static void run_timer_softirq(struct softirq_action *h)
{

#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
	irq_work_tick();
#endif

}

Use raw_spin_{un,}lock_irq{save,restore} in push_irq_work_func() to
prevent the following potential deadlock scenario:

Ug. No, the real fix is that the irq work is to be run from hard
interrupt context.

But if so, we shouldn't call irq_work_tick() in run_timer_softirq(), right?

Thanks!
Yanjiang

Moving the scheduling of high priority real-time
tasks to ksoftirqd defeats the purpose. The question is, why is that
irq work being run from thread context when it has the
IRQ_WORK_HARD_IRQ flag set?

-- Steve





RE: Re: [PATCH perf/core ] [BUGFIX] perf probe: Fix memory leaking on failure by clearing all probe_trace_events

2015-11-13 Thread 平松雅巳 / HIRAMATU,MASAMI
From: Wangnan (F) [mailto:wangn...@huawei.com]
>
>Hi Masami,
>
>Today I remembered why I introduced patch [1]. Although your patch is
>correct, either [1] or [2] is still required, but both need to be fixed.
>
>Here is the bug:
>
>A segfault occurs if glob matching and an argument are used together and
>one of the probe points fails to get that argument:
>
># ./perf probe -v -n 'SyS_dup? oldfd'
>probe-definition(0): SyS_dup? oldfd
>symbol:SyS_dup? file:(null) line:0 offset:0 return:0 lazy:(null)
>parsing arg: oldfd into oldfd
>1 arguments
>Looking at the vmlinux_path (7 entries long)
>Using /lib/modules/4.3.0-rc4+/build/vmlinux for symbols
>Open Debuginfo file: /lib/modules/4.3.0-rc4+/build/vmlinux
>Try to find probe point from debuginfo.
>Matched function: SyS_dup3
>found inline addr: 0x812095c0
>Probe point found: SyS_dup3+0
>Searching 'oldfd' variable in context.
>Converting variable oldfd into trace event.
>oldfd type is long int.
>found inline addr: 0x812096d4
>Probe point found: SyS_dup2+36
>Searching 'oldfd' variable in context.
>Failed to find 'oldfd' in this function.
>Matched function: SyS_dup3
>Probe point found: SyS_dup3+0
>Searching 'oldfd' variable in context.
>Converting variable oldfd into trace event.
>oldfd type is long int.
>Matched function: SyS_dup2
>Probe point found: SyS_dup2+0
>Searching 'oldfd' variable in context.
>Converting variable oldfd into trace event.
>oldfd type is long int.
>Found 4 probe_trace_events.
>Opening /sys/kernel/debug/tracing//kprobe_events write=1
>Writing event: p:probe/SyS_dup3 _text+2135488 oldfd=%di:s64
>Segmentation fault (core dumped)
>
>
>Here is how the segfault happens:
>
>In the following call stack:
>
>add_probe_trace_event
>call_probe_finder
>probe_point_search_cb
>??
>dwarf_getfuncs
>find_probe_point_by_func
>debuginfo__find_probes
>debuginfo__find_trace_events
>try_to_find_probe_trace_events
>convert_to_probe_trace_events
>convert_perf_probe_events
>perf_add_probe_events
>
>add_probe_trace_event() gets called and fails because find_variable()
>fails. In this case tev->args is not freed and tev->nargs is positive
>(1 in this case). The error from find_variable() is passed down along the
>call stack through return values until it reaches probe_point_search_cb().
>The problem is that probe_point_search_cb() doesn't return the error if
>the function name is a glob:
>
> /* Inlined function: search instances */
> param->retval = die_walk_instances(sp_die,
> probe_point_inline_cb, (void *)pf);
> /* This could be a non-existed inline definition */
> if (param->retval == -ENOENT && strisglob(pp->function))
> param->retval = 0;
>
>
>So the error won't be transferred from debuginfo__find_probes() to
>debuginfo__find_trace_events(), and your patch won't take effect.
>
>With patch [1], we set the argument number to 0 and free tev->args. With
>this fix we can save that tev. Here's the final result:
>
># ./perf probe 'SyS_dup? oldfd'
>Failed to find 'oldfd' in this function.
>Added new events:
>   probe:SyS_dup3   (on SyS_dup? with oldfd)
>   probe:SyS_dup3_1 (on SyS_dup? with oldfd)
>   probe:SyS_dup3_2 (on SyS_dup? with oldfd)
>   probe:SyS_dup2   (on SyS_dup? with oldfd)
>
>You can now use it in all perf tools, such as:
>
> perf record -e probe:SyS_dup2 -aR sleep 1
>
># PAGER=cat ./perf probe -l
>   probe:SyS_dup2   (on SyS_dup2@linux-hydrogen/fs/file.c with oldfd)
>   probe:SyS_dup3   (on SyS_dup3@linux-hydrogen/fs/file.c with oldfd)
>   probe:SyS_dup3_1 (on SYSC_dup2:12@linux-hydrogen/fs/file.c)
>   probe:SyS_dup3_2 (on SyS_dup3@linux-hydrogen/fs/file.c with oldfd)
>
>Perf can't find oldfd in the context of SyS_dup3_1. With [1]'s fix it still
>probes there but doesn't create an argument fetcher.
>
>[2]'s fix clears this tev entirely. It is incorrect because it forgets to
>adjust tf->ntevs. With that adjustment added, the final result becomes:
>
># ./perf probe 'SyS_dup? oldfd'
>Failed to find 'oldfd' in this function.
>Added new events:
>   probe:SyS_dup3   (on SyS_dup? with oldfd)
>   probe:SyS_dup3_1 (on SyS_dup? with oldfd)
>   probe:SyS_dup2   (on SyS_dup? with oldfd)
>
>You can now use it in all perf tools, such as:
>
> perf record -e probe:SyS_dup2 -aR sleep 1
>
># PAGER=cat ./perf probe -l
>   probe:SyS_dup2   (on SyS_dup2@linux-hydrogen/fs/file.c with oldfd)
>   probe:SyS_dup3   (on SyS_dup3@linux-hydrogen/fs/file.c with oldfd)
>   probe:SyS_dup3_1 (on SyS_dup3@linux-hydrogen/fs/file.c with oldfd)
>
>Here only those probe points where we can get 'oldfd' are probed.
>
>I suggest [2]'s fix because I think a user who provides an argument is
>interested in it, and we have already notified the user that the argument
>could not be found at one probe point.
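
A minimal C model of the failure described above may help. It is only a sketch: demo_event, demo_finder, demo_add_event() and demo_search() are hypothetical stand-ins, not perf code, and the half-initialized state is simplified to "nargs set, args missing". It shows how resetting -ENOENT for a glob hides the failure from the caller while the claimed slot stays counted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_EVENTS 8

/* Hypothetical, heavily simplified stand-in for a probe_trace_event. */
struct demo_event {
	int nargs;
	char **args;
};

/* Hypothetical stand-in for the finder that collects events. */
struct demo_finder {
	struct demo_event evs[DEMO_MAX_EVENTS];
	int nevs;
};

/*
 * Claims a slot first, then fails if the variable is missing. On that
 * failure the slot stays counted and half-initialized.
 */
static int demo_add_event(struct demo_finder *f, int var_found)
{
	struct demo_event *ev;

	if (f->nevs == DEMO_MAX_EVENTS)
		return -ERANGE;

	ev = &f->evs[f->nevs++];
	ev->nargs = 1;
	ev->args = NULL;

	if (!var_found)
		return -ENOENT;

	ev->args = calloc(ev->nargs, sizeof(*ev->args));
	return ev->args ? 0 : -ENOMEM;
}

/* Glob-aware caller: a miss in one match is not fatal, so -ENOENT is reset. */
static int demo_search(struct demo_finder *f, int var_found, int is_glob)
{
	int ret = demo_add_event(f, var_found);

	if (ret == -ENOENT && is_glob)
		ret = 0;
	return ret;
}

/* Counts slots a later consumer would trip over: nargs > 0 but no args. */
static int demo_count_broken(const struct demo_finder *f)
{
	int i, broken = 0;

	for (i = 0; i < f->nevs; i++)
		if (f->evs[i].nargs > 0 && f->evs[i].args == NULL)
			broken++;
	return broken;
}
```

With is_glob set, demo_search() returns 0 whether or not the variable was found, so the caller cannot tell that one slot is unusable. In terms of the discussion above, fix [1] corresponds to making that slot valid again (no arguments), and fix [2] corresponds to dropping the slot and giving the count back.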

OK, this may allow users to probe only where the variables can be accessed.
I also think we'd better offer another o

[PATCH] staging: most: aim-cdev: Used "==" instead of assignment

2015-11-13 Thread Anjali Menon
Used double equal sign instead of equal to sign in the if condition
to remove the error detected by checkpatch.pl.

ERROR: do not use assignment in if condition

Signed-off-by: Anjali Menon 
---
 drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c b/drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c
index dc3fb25..da1f894 100644
--- a/drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c
+++ b/drivers/staging/staging/drivers/staging/most/aim-cdev/cdev.c
@@ -175,7 +175,7 @@ static ssize_t aim_write(struct file *filp, const char __user *buf,
return -EAGAIN;
if (wait_event_interruptible(
channel->wq,
-   (mbo = most_get_mbo(channel->iface,
+   (mbo == most_get_mbo(channel->iface,
channel->channel_id,
&cdev_aim)) ||
(!channel->dev)))
-- 
1.9.1
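
For reference, a short C sketch of how the two spellings of such a condition behave. The demo_* names are hypothetical stand-ins (demo_get_mbo() plays the role of most_get_mbo()); this is not the driver code. An assignment inside a condition stores the result and then tests it, while a comparison leaves the variable untouched, so the two are not interchangeable. In an ordinary if statement, hoisting the assignment out of the condition is one common way to satisfy checkpatch.pl without changing behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer object, standing in for a struct mbo. */
struct demo_mbo {
	int id;
};

static struct demo_mbo demo_buffer = { 42 };

/* Hypothetical stand-in for most_get_mbo(): returns a buffer or NULL. */
static struct demo_mbo *demo_get_mbo(int available)
{
	return available ? &demo_buffer : NULL;
}

/* Assignment inside the condition: stores the result, then tests it. */
static struct demo_mbo *demo_fetch_with_assignment(int available)
{
	struct demo_mbo *mbo = NULL;

	if ((mbo = demo_get_mbo(available)))
		return mbo;
	return NULL;
}

/* Comparison inside the condition: mbo is never updated. */
static struct demo_mbo *demo_fetch_with_comparison(int available)
{
	struct demo_mbo *mbo = NULL;

	if (mbo == demo_get_mbo(available))
		return mbo;	/* only reached when the helper returned NULL */
	return NULL;
}

/* Hoisted assignment: same behaviour as the first form. */
static struct demo_mbo *demo_fetch_hoisted(int available)
{
	struct demo_mbo *mbo = demo_get_mbo(available);

	if (mbo)
		return mbo;
	return NULL;
}
```

When a buffer is available, the assignment and hoisted forms return it, while the comparison form returns NULL because the local pointer still holds its initial value.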



Re: [RT PATCH] sched: rt: fix two possible deadlocks in push_irq_work_func

2015-11-13 Thread Steven Rostedt
On Sat, 14 Nov 2015 10:53:18 +0800
 wrote:

> From: Yanjiang Jin 
> 
> This can only happen in an RT kernel because run_timer_softirq() calls
> irq_work_tick() when CONFIG_PREEMPT_RT_FULL is enabled, as below:
> 
> static void run_timer_softirq(struct softirq_action *h)
> {
> 
> #if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
>         irq_work_tick();
> #endif
> 
> }
> 
> Use raw_spin_{un,}lock_irq{save,restore} in push_irq_work_func() to
> prevent the following potential deadlock scenario:

Ug. No, the real fix is that the irq work is to be run from hard
interrupt context. Moving the scheduling of high priority real-time
tasks to ksoftirqd defeats the purpose. The question is, why is that
irq work being run from thread context when it has the
IRQ_WORK_HARD_IRQ flag set?

-- Steve



RE: [PATCH] perf probe: Clear probe_trace_event when add_probe_trace_event() fails

2015-11-13 Thread 平松雅巳 / HIRAMATU,MASAMI
>From: Wang Nan [mailto:wangn...@huawei.com]
>
>When probing with a glob, an error in add_probe_trace_event() won't be passed
>to debuginfo__find_trace_events() because it would be modified by
>probe_point_search_cb(). This causes a segfault if perf fails to find the
>argument for one of the probe points matched by the glob. For example:
>
> # ./perf probe -v -n 'SyS_dup? oldfd'
> probe-definition(0): SyS_dup? oldfd
> symbol:SyS_dup? file:(null) line:0 offset:0 return:0 lazy:(null)
> parsing arg: oldfd into oldfd
> 1 arguments
> Looking at the vmlinux_path (7 entries long)
> Using /lib/modules/4.3.0-rc4+/build/vmlinux for symbols
> Open Debuginfo file: /lib/modules/4.3.0-rc4+/build/vmlinux
> Try to find probe point from debuginfo.
> Matched function: SyS_dup3
> found inline addr: 0x812095c0
> Probe point found: SyS_dup3+0
> Searching 'oldfd' variable in context.
> Converting variable oldfd into trace event.
> oldfd type is long int.
> found inline addr: 0x812096d4
> Probe point found: SyS_dup2+36
> Searching 'oldfd' variable in context.
> Failed to find 'oldfd' in this function.
> Matched function: SyS_dup3
> Probe point found: SyS_dup3+0
> Searching 'oldfd' variable in context.
> Converting variable oldfd into trace event.
> oldfd type is long int.
> Matched function: SyS_dup2
> Probe point found: SyS_dup2+0
> Searching 'oldfd' variable in context.
> Converting variable oldfd into trace event.
> oldfd type is long int.
> Found 4 probe_trace_events.
> Opening /sys/kernel/debug/tracing//kprobe_events write=1
> Writing event: p:probe/SyS_dup3 _text+2135488 oldfd=%di:s64
> Segmentation fault (core dumped)
>
>This patch ensures add_probe_trace_event() does not touch tf->ntevs and
>tf->tevs if it returns failure.
>
>Here is the test result:
>
> # perf probe  'SyS_dup? oldfd'
> Failed to find 'oldfd' in this function.
> Added new events:
>   probe:SyS_dup3   (on SyS_dup? with oldfd)
>   probe:SyS_dup3_1 (on SyS_dup? with oldfd)
>   probe:SyS_dup2   (on SyS_dup? with oldfd)
>
> You can now use it in all perf tools, such as:
>
>   perf record -e probe:SyS_dup2 -aR sleep 1

Good catch!

Acked-by: Masami Hiramatsu 

Thanks!

>
>Signed-off-by: Wang Nan 
>Cc: Arnaldo Carvalho de Melo 
>Cc: Masami Hiramatsu 
>Cc: Zefan Li 
>Cc: pi3or...@163.com
>---
> tools/perf/util/probe-finder.c | 20 ++--
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
>diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
>index 63993d7..05012bb 100644
>--- a/tools/perf/util/probe-finder.c
>+++ b/tools/perf/util/probe-finder.c
>@@ -1183,7 +1183,7 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)
>   container_of(pf, struct trace_event_finder, pf);
>   struct perf_probe_point *pp = &pf->pev->point;
>   struct probe_trace_event *tev;
>-  struct perf_probe_arg *args;
>+  struct perf_probe_arg *args = NULL;
>   int ret, i;
>
>   /* Check number of tevs */
>@@ -1198,19 +1198,23 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)
>   ret = convert_to_trace_point(&pf->sp_die, tf->mod, pf->addr,
>pp->retprobe, pp->function, &tev->point);
>   if (ret < 0)
>-  return ret;
>+  goto end;
>
>   tev->point.realname = strdup(dwarf_diename(sc_die));
>-  if (!tev->point.realname)
>-  return -ENOMEM;
>+  if (!tev->point.realname) {
>+  ret = -ENOMEM;
>+  goto end;
>+  }
>
>   pr_debug("Probe point found: %s+%lu\n", tev->point.symbol,
>tev->point.offset);
>
>   /* Expand special probe argument if exist */
>   args = zalloc(sizeof(struct perf_probe_arg) * MAX_PROBE_ARGS);
>-  if (args == NULL)
>-  return -ENOMEM;
>+  if (args == NULL) {
>+  ret = -ENOMEM;
>+  goto end;
>+  }
>
>   ret = expand_probe_args(sc_die, pf, args);
>   if (ret < 0)
>@@ -1234,6 +1238,10 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)
>   }
>
> end:
>+  if (ret) {
>+  clear_probe_trace_event(tev);
>+  tf->ntevs--;
>+  }
>   free(args);
>   return ret;
> }
>--
>1.8.3.4
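
The cleanup pattern in this patch can be sketched in plain C as follows. This is an illustrative model, not the perf source: all demo_* names are hypothetical, and a scratch buffer stands in for the temporary args array. Every failure after the slot is claimed goes through one exit label that clears the slot and gives the count back, so the caller never sees a counted but unusable entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_EVENTS 8

/* Hypothetical, simplified stand-in for a probe_trace_event. */
struct demo_event {
	char *name;
	int nargs;
};

/* Hypothetical stand-in for the trace_event_finder. */
struct demo_finder {
	struct demo_event evs[DEMO_MAX_EVENTS];
	int nevs;
};

/* Releases everything owned by the slot and zeroes it. */
static void demo_clear_event(struct demo_event *ev)
{
	free(ev->name);
	memset(ev, 0, sizeof(*ev));
}

static int demo_add_event(struct demo_finder *f, const char *name,
			  int var_found)
{
	struct demo_event *ev;
	char *scratch = NULL;
	size_t len = strlen(name) + 1;
	int ret = 0;

	if (f->nevs == DEMO_MAX_EVENTS)
		return -ERANGE;	/* nothing claimed yet, nothing to undo */

	ev = &f->evs[f->nevs++];	/* claim the slot */

	ev->name = malloc(len);
	if (!ev->name) {
		ret = -ENOMEM;
		goto end;
	}
	memcpy(ev->name, name, len);

	scratch = malloc(16);	/* temporary buffer, always freed at end */
	if (!scratch) {
		ret = -ENOMEM;
		goto end;
	}

	if (!var_found) {
		ret = -ENOENT;
		goto end;
	}
	ev->nargs = 1;

end:
	if (ret) {
		/* Roll back: clear the slot and un-claim it. */
		demo_clear_event(ev);
		f->nevs--;
	}
	free(scratch);
	return ret;
}
```

After a failed call the finder holds exactly the events that were added successfully, and the freed slot can be reused by the next call.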



[PATCH 1/2] KVM: kvm_is_visible_gfn can be boolean

2015-11-13 Thread Yaowei Bai
This patch makes kvm_is_visible_gfn return bool because this particular
function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai 
---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5706a21..4436539 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -623,7 +623,7 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
-int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
+bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 484079e..73cbb41 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1164,15 +1164,15 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
return __gfn_to_memslot(kvm_vcpu_memslots(vcpu), gfn);
 }
 
-int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
+bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);
 
if (!memslot || memslot->id >= KVM_USER_MEM_SLOTS ||
  memslot->flags & KVM_MEMSLOT_INVALID)
-   return 0;
+   return false;
 
-   return 1;
+   return true;
 }
 EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
 
-- 
1.9.1





[PATCH 2/2] KVM: kvm_para_has_feature can be boolean

2015-11-13 Thread Yaowei Bai
This patch makes kvm_para_has_feature return bool because this
particular function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai 
---
 include/linux/kvm_para.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 00a97bb..35e568f 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -4,10 +4,8 @@
 #include 
 
 
-static inline int kvm_para_has_feature(unsigned int feature)
+static inline bool kvm_para_has_feature(unsigned int feature)
 {
-   if (kvm_arch_para_features() & (1UL << feature))
-   return 1;
-   return 0;
+   return !!(kvm_arch_para_features() & (1UL << feature));
 }
 #endif /* __LINUX_KVM_PARA_H */
-- 
1.9.1
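
As a side note, the !! idiom used here can be shown in a small userspace sketch. The demo_* names are hypothetical, and uint64_t is used instead of unsigned long so the example does not depend on the platform's word size. It illustrates that returning a masked value through a narrower integer type can lose a high bit, whereas normalizing to bool cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the value of kvm_arch_para_features(). */
static uint64_t demo_features;

/* Normalized test: any set bit in the mask becomes exactly true. */
static bool demo_has_feature(unsigned int feature)
{
	return !!(demo_features & (UINT64_C(1) << feature));
}

/*
 * Un-normalized test squeezed through a 32-bit type: bits 32..63 of the
 * masked value are discarded, so a high feature bit reads back as 0.
 */
static uint32_t demo_has_feature_truncated(unsigned int feature)
{
	return (uint32_t)(demo_features & (UINT64_C(1) << feature));
}
```

With only bit 40 set in demo_features, demo_has_feature(40) is true while demo_has_feature_truncated(40) is 0. When the return type is already bool, the conversion to bool normalizes the value on its own; the explicit !! mainly documents the intent.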





[PATCH v5 00/10] xen-block: multi hardware-queues/rings support

2015-11-13 Thread Bob Liu
Note: These patches are based on Arianna's original work during her internship
for GNOME's Outreach Program for Women.

With the blk-mq API, a guest has more than one (nr_vcpus) software request
queue associated with each block front. These queues can be mapped over several
rings (hardware queues) to the backend, making it very easy for us to run
multiple threads on the backend for a single virtual disk.

By having different threads issuing requests at the same time, the performance
of the guest can be improved significantly.
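
The idea of spreading per-vCPU software queues over a smaller number of rings can be sketched as below. This is only an illustration under stated assumptions: demo_ring, demo_dev and demo_ring_for_cpu() are hypothetical names, and the simple modulo mapping is chosen for clarity; it is not claimed to be the mapping blk-mq actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-ring state; a real ring carries far more. */
struct demo_ring {
	unsigned int id;
};

/* Hypothetical per-device state: an array of rings and its length. */
struct demo_dev {
	struct demo_ring *rings;
	unsigned int nr_rings;
};

/*
 * Picks the ring that serves the software queue of a given CPU.
 * Round-robin by CPU number (an assumption made for this sketch).
 */
static struct demo_ring *demo_ring_for_cpu(struct demo_dev *dev,
					   unsigned int cpu)
{
	assert(dev->nr_rings > 0);
	return &dev->rings[cpu % dev->nr_rings];
}
```

With 16 CPUs and 4 rings, each ring ends up serving 4 software queues, so up to 4 backend threads can work on the same virtual disk in parallel.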

Test was done based on null_blk driver:
dom0: v4.3-rc7 16vcpus 10GB "modprobe null_blk"
domU: v4.3-rc7 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
iops1: After commit("xen/blkfront: make persistent grants per-queue").
iops2: After commit("xen/blkback: make persistent grants and free pages pool per-queue").

Queues:         1       4            8            16
Iops orig(k):   810     1064         780          700
Iops1(k):       810     1230(~20%)   1024(~20%)   850(~20%)
Iops2(k):       810     1410(~35%)   1354(~75%)   1440(~100%)

With 4 queues after this series we can get a ~75% increase in IOPS, and
performance won't drop when increasing the number of queues.

Please find the respective chart in this link:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

---
v5:
 * Rebase to xen/tip.git tags/for-linus-4.4-rc0-tag.
 * Comments from Konrad.

v4:
 * Rebase to v4.3-rc7.
 * Comments from Roger.

v3:
 * Rebased to v4.2-rc8.

Bob Liu (10):
  xen/blkif: document blkif multi-queue/ring extension
  xen/blkfront: separate per ring information out of device info
  xen/blkfront: pseudo support for multi hardware queues/rings
  xen/blkfront: split per device io_lock
  xen/blkfront: negotiate number of queues/rings to be used with backend
  xen/blkback: separate ring information out of struct xen_blkif
  xen/blkback: pseudo support for multi hardware queues/rings
  xen/blkback: get the number of hardware queues/rings from blkfront
  xen/blkfront: make persistent grants per-queue
  xen/blkback: make pool of persistent grants and free pages per-queue

 drivers/block/xen-blkback/blkback.c | 386 ++-
 drivers/block/xen-blkback/common.h  |  78 ++--
 drivers/block/xen-blkback/xenbus.c  | 359 --
 drivers/block/xen-blkfront.c| 718 ++--
 include/xen/interface/io/blkif.h|  48 +++
 5 files changed, 971 insertions(+), 618 deletions(-)

-- 
1.8.3.1



[PATCH v5 03/10] xen/blkfront: pseudo support for multi hardware queues/rings

2015-11-13 Thread Bob Liu
Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1; a larger number will be enabled in the next
patch ("xen/blkfront: negotiate number of queues/rings to be used with backend")
so as to keep every single patch small and readable.

Signed-off-by: Bob Liu 
---
v2:
 * Fix memleak.
 * Other comments from Konrad.
---
 drivers/block/xen-blkfront.c |  341 --
 1 file changed, 195 insertions(+), 146 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 0c3ad21..d73734f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -150,6 +150,7 @@ struct blkfront_info
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
+   /* Number of pages per ring buffer. */
unsigned int nr_ring_pages;
struct request_queue *rq;
struct list_head grants;
@@ -164,7 +165,8 @@ struct blkfront_info
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
-   struct blkfront_ring_info rinfo;
+   struct blkfront_ring_info *rinfo;
+   unsigned int nr_rings;
 };
 
 static unsigned int nr_minors;
@@ -209,7 +211,7 @@ static DEFINE_SPINLOCK(minor_lock);
 #define GREFS(_psegs)  ((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
-static int blkfront_gather_backend_features(struct blkfront_info *info);
+static void blkfront_gather_backend_features(struct blkfront_info *info);
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -338,8 +340,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
struct page *indirect_page;
 
/* Fetch a pre-allocated page to use for indirect grefs */
-   BUG_ON(list_empty(&info->rinfo.indirect_pages));
-   indirect_page = list_first_entry(&info->rinfo.indirect_pages,
+   BUG_ON(list_empty(&info->rinfo->indirect_pages));
+   indirect_page = list_first_entry(&info->rinfo->indirect_pages,
 struct page, lru);
list_del(&indirect_page->lru);
gnt_list_entry->page = indirect_page;
@@ -597,7 +599,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 * existing persistent grants, or if we have to get new grants,
 * as there are not sufficiently many free.
 */
-   bool new_persistent_gnts;
struct scatterlist *sg;
int num_sg, max_grefs, num_grant;
 
@@ -609,12 +610,12 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 */
max_grefs += INDIRECT_GREFS(max_grefs);
 
-   /* Check if we have enough grants to allocate a requests */
-   if (info->persistent_gnts_c < max_grefs) {
-   new_persistent_gnts = 1;
-   if (gnttab_alloc_grant_references(
-   max_grefs - info->persistent_gnts_c,
-   &setup.gref_head) < 0) {
+   /*
+* We have to reserve 'max_grefs' grants at first because persistent
+* grants are shared by all rings.
+*/
+   if (max_grefs > 0)
+   if (gnttab_alloc_grant_references(max_grefs, &setup.gref_head) 
< 0) {
gnttab_request_free_callback(
&rinfo->callback,
blkif_restart_queue_callback,
@@ -622,8 +623,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
max_grefs);
return 1;
}
-   } else
-   new_persistent_gnts = 0;
 
/* Fill out a communications ring structure. */
ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
@@ -712,7 +711,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
/* Keep a private copy so we can reissue requests when recovering. */
rinfo->shadow[id].req = *ring_req;
 
-   if (new_persistent_gnts)
+   if (max_grefs > 0)
gnttab_free_grant_references(setup.gref_head);
 
return 0;
@@ -791,7 +790,8 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 {
struct blkfront_info *info = (struct blkfront_info *)data;
 
-   hctx->driver_data = &info->rinfo;
+   BUG_ON(info->nr_rings <= index);
+   hctx->driver_data = &info->rinfo[index];
return 0;
 }
 
@@ -1050,8 +1050,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
-   unsigned int minor, nr_minors;
-   struct blkfront_ring_info *rinfo = &info->rinfo;
+   unsigned int minor, nr_minors, i;
 
if (info->rq == NULL)
return;
@@ -1059,11 

[PATCH v5 06/10] xen/blkback: separate ring information out of struct xen_blkif

2015-11-13 Thread Bob Liu
Split per-ring information into a new structure "xen_blkif_ring", so that one vbd
device can be associated with one or more rings/hardware queues.

Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we
may have multiple backend threads.

This patch is a preparation for supporting multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
v2:
 * Have a BUG_ON on the holding of the pers_gnts_lock.
---
 drivers/block/xen-blkback/blkback.c |  235 ---
 drivers/block/xen-blkback/common.h  |   54 
 drivers/block/xen-blkback/xenbus.c  |   96 +++---
 3 files changed, 214 insertions(+), 171 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index f909994..fb5bfd4 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -173,11 +173,11 @@ static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
struct blkif_request *req,
struct pending_req *pending_req);
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
  unsigned short op, int st);
 
 #define foreach_grant_safe(pos, n, rbtree, node) \
@@ -189,14 +189,8 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 
 
 /*
- * We don't need locking around the persistent grant helpers
- * because blkback uses a single-thread for each backed, so we
- * can be sure that this functions will never be called recursively.
- *
- * The only exception to that is put_persistent_grant, that can be called
- * from interrupt context (by xen_blkbk_unmap), so we have to use atomic
- * bit operations to modify the flags of a persistent grant and to count
- * the number of used grants.
+ * pers_gnts_lock must be used around all the persistent grant helpers
+ * because blkback may use multi-thread/queue for each backend.
  */
 static int add_persistent_gnt(struct xen_blkif *blkif,
   struct persistent_gnt *persistent_gnt)
@@ -204,6 +198,7 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
struct rb_node **new = NULL, *parent = NULL;
struct persistent_gnt *this;
 
+   BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
@@ -241,6 +236,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
struct persistent_gnt *data;
struct rb_node *node = NULL;
 
+   BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
node = blkif->persistent_gnts.rb_node;
while (node) {
data = container_of(node, struct persistent_gnt, node);
@@ -265,6 +261,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 static void put_persistent_gnt(struct xen_blkif *blkif,
struct persistent_gnt *persistent_gnt)
 {
+   BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
pr_alert_ratelimited("freeing a grant already unused\n");
set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
@@ -286,6 +283,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
unmap_data.unmap_ops = unmap;
unmap_data.kunmap_ops = NULL;
 
+   BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
foreach_grant_safe(persistent_gnt, n, root, node) {
BUG_ON(persistent_gnt->handle ==
BLKBACK_INVALID_HANDLE);
@@ -322,11 +320,13 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
int segs_to_unmap = 0;
struct xen_blkif *blkif = container_of(work, typeof(*blkif), 
persistent_purge_work);
struct gntab_unmap_queue_data unmap_data;
+   unsigned long flags;
 
unmap_data.pages = pages;
unmap_data.unmap_ops = unmap;
unmap_data.kunmap_ops = NULL;
 
+   spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
while(!list_empty(&blkif->persistent_purge_list)) {
persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
  struct persistent_gnt,
@@ -348,6 +348,7 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
}
kfree(persistent_gnt);
}
+   spin_unlock_irqrestore(&blkif->pers_gnts

[PATCH v5 02/10] xen/blkfront: separate per ring information out of device info

2015-11-13 Thread Bob Liu
Split per-ring information into a new structure "blkfront_ring_info".

A ring is the representation of a hardware queue; every vbd device can be
associated with one or more rings, depending on how many hardware queues/rings
are to be used.

This patch is a preparation for supporting real multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
v2: Fix build error.
---
 drivers/block/xen-blkfront.c |  359 +++---
 1 file changed, 197 insertions(+), 162 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2fee2ee..0c3ad21 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -120,6 +120,23 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues/rings to be used.
+ */
+struct blkfront_ring_info {
+   struct blkif_front_ring ring;
+   unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
+   unsigned int evtchn, irq;
+   struct work_struct work;
+   struct gnttab_free_callback callback;
+   struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+   struct list_head indirect_pages;
+   unsigned long shadow_free;
+   struct blkfront_info *dev_info;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
@@ -133,18 +150,10 @@ struct blkfront_info
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
-   int ring_ref[XENBUS_MAX_RING_GRANTS];
unsigned int nr_ring_pages;
-   struct blkif_front_ring ring;
-   unsigned int evtchn, irq;
struct request_queue *rq;
-   struct work_struct work;
-   struct gnttab_free_callback callback;
-   struct blk_shadow shadow[BLK_MAX_RING_SIZE];
struct list_head grants;
-   struct list_head indirect_pages;
unsigned int persistent_gnts_c;
-   unsigned long shadow_free;
unsigned int feature_flush;
unsigned int feature_discard:1;
unsigned int feature_secdiscard:1;
@@ -155,6 +164,7 @@ struct blkfront_info
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
+   struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -198,33 +208,35 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define GREFS(_psegs)  ((_psegs) * GRANTS_PER_PSEG)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-   unsigned long free = info->shadow_free;
-   BUG_ON(free >= BLK_RING_SIZE(info));
-   info->shadow_free = info->shadow[free].req.u.rw.id;
-   info->shadow[free].req.u.rw.id = 0x0fee; /* debug */
+   unsigned long free = rinfo->shadow_free;
+
+   BUG_ON(free >= BLK_RING_SIZE(rinfo->dev_info));
+   rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+   rinfo->shadow[free].req.u.rw.id = 0x0fee; /* debug */
return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
   unsigned long id)
 {
-   if (info->shadow[id].req.u.rw.id != id)
+   if (rinfo->shadow[id].req.u.rw.id != id)
return -EINVAL;
-   if (info->shadow[id].request == NULL)
+   if (rinfo->shadow[id].request == NULL)
return -EINVAL;
-   info->shadow[id].req.u.rw.id  = info->shadow_free;
-   info->shadow[id].request = NULL;
-   info->shadow_free = id;
+   rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+   rinfo->shadow[id].request = NULL;
+   rinfo->shadow_free = id;
return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
+   struct blkfront_info *info = rinfo->dev_info;
struct page *granted_page;
struct grant *gnt_list_entry, *n;
int i = 0;
@@ -326,8 +338,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
struct page *indirect_page;
 
/* Fetch a pre-allocated page to use for indirect grefs */
-   BUG_ON(list_empty(&info->indirect_pages));
-   indirect_page = list_first_entry(&info->indirect_pages,
+   BUG_ON(list_empty(&info->rinfo.indirect_pages));
+   indirect_page = list_first_entry(&info->rinfo.indirect_pages,
  

[PATCH v5 08/10] xen/blkback: get the number of hardware queues/rings from blkfront

2015-11-13 Thread Bob Liu
The backend advertises "multi-queue-max-queues" to the frontend, and also reads
the negotiated number from "multi-queue-num-queues" written by blkfront.

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/blkback.c |   12 
 drivers/block/xen-blkback/common.h  |1 +
 drivers/block/xen-blkback/xenbus.c  |   34 --
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fb5bfd4..acedc46 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -84,6 +84,15 @@ MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
 /*
+ * Maximum number of rings/queues blkback supports, allow as many queues as there
+ * are CPUs if user has not specified a value.
+ */
+unsigned int xenblk_max_queues;
+module_param_named(max_queues, xenblk_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+"Maximum number of hardware queues per virtual disk");
+
+/*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
  */
@@ -1483,6 +1492,9 @@ static int __init xen_blkif_init(void)
xen_blkif_max_ring_order = XENBUS_MAX_RING_GRANT_ORDER;
}
 
+   if (xenblk_max_queues == 0)
+   xenblk_max_queues = num_online_cpus();
+
rc = xen_blkif_interface_init();
if (rc)
goto failed_init;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f2386e3..0833dc6 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -46,6 +46,7 @@
 #include 
 
 extern unsigned int xen_blkif_max_ring_order;
+extern unsigned int xenblk_max_queues;
 /*
  * This is the maximum number of segments that would be allowed in indirect
  * requests. This value will also be passed to the frontend.
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6c6e048..d83b790 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -181,12 +181,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
blkif->st_print = jiffies;
INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-   blkif->nr_rings = 1;
-   if (xen_blkif_alloc_rings(blkif)) {
-   kmem_cache_free(xen_blkif_cachep, blkif);
-   return ERR_PTR(-ENOMEM);
-   }
-
return blkif;
 }
 
@@ -595,6 +589,12 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
goto fail;
}
 
+   /* Multi-queue: write how many queues are supported by the backend. */
+   err = xenbus_printf(XBT_NIL, dev->nodename,
+   "multi-queue-max-queues", "%u", xenblk_max_queues);
+   if (err)
+   pr_warn("Error writing multi-queue-max-queues\n");
+
/* setup back pointer */
be->blkif->be = be;
 
@@ -980,6 +980,7 @@ static int connect_ring(struct backend_info *be)
char *xspath;
size_t xspathsize;
const size_t xenstore_path_ext_size = 11; /* sufficient for 
"/queue-NNN" */
+   unsigned int requested_num_queues = 0;
 
pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -1007,6 +1008,27 @@ static int connect_ring(struct backend_info *be)
be->blkif->vbd.feature_gnt_persistent = pers_grants;
be->blkif->vbd.overflow_max_grants = 0;
 
+   /*
+* Read the number of hardware queues from frontend.
+*/
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues",
+  "%u", &requested_num_queues);
+   if (err < 0) {
+   requested_num_queues = 1;
+   } else {
+   if (requested_num_queues > xenblk_max_queues
+   || requested_num_queues == 0) {
+   /* buggy or malicious guest */
+   xenbus_dev_fatal(dev, err,
+   "guest requested %u queues, exceeding the maximum of %u.",
+   requested_num_queues, xenblk_max_queues);
+   return -1;
+   }
+   }
+   be->blkif->nr_rings = requested_num_queues;
+   if (xen_blkif_alloc_rings(be->blkif))
+   return -ENOMEM;
+
pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
 be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
 pers_grants ? "persistent grants" : "");
-- 
1.7.10.4



[PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue

2015-11-13 Thread Bob Liu
Make pool of persistent grants and free pages per-queue/ring instead of
per-device to get better scalability.

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
iops1: After commit("xen/blkfront: make persistent grants per-queue").
iops2: After this commit.

Queues:          1     4            8            16
Iops orig(k):    810   1064         780          700
Iops1(k):        810   1230(~20%)   1024(~20%)   850(~20%)
Iops2(k):        810   1410(~35%)   1354(~75%)   1440(~100%)

With 4 queues after this commit we can get ~75% increase in IOPS, and
performance won't drop when increasing the number of queues.

Please find the respective chart in this link:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/blkback.c |  202 ---
 drivers/block/xen-blkback/common.h  |   32 +++---
 drivers/block/xen-blkback/xenbus.c  |   21 ++--
 3 files changed, 118 insertions(+), 137 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index acedc46..0e8a04d 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -122,60 +122,60 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
unsigned long flags;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
-   if (list_empty(&blkif->free_pages)) {
-   BUG_ON(blkif->free_pages_num != 0);
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
+   if (list_empty(&ring->free_pages)) {
+   BUG_ON(ring->free_pages_num != 0);
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
return gnttab_alloc_pages(1, page);
}
-   BUG_ON(blkif->free_pages_num == 0);
-   page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+   BUG_ON(ring->free_pages_num == 0);
+   page[0] = list_first_entry(&ring->free_pages, struct page, lru);
list_del(&page[0]->lru);
-   blkif->free_pages_num--;
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   ring->free_pages_num--;
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
   int num)
 {
unsigned long flags;
int i;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
for (i = 0; i < num; i++)
-   list_add(&page[i]->lru, &blkif->free_pages);
-   blkif->free_pages_num += num;
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   list_add(&page[i]->lru, &ring->free_pages);
+   ring->free_pages_num += num;
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
struct page *page[NUM_BATCH_FREE_PAGES];
unsigned int num_pages = 0;
unsigned long flags;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
-   while (blkif->free_pages_num > num) {
-   BUG_ON(list_empty(&blkif->free_pages));
-   page[num_pages] = list_first_entry(&blkif->free_pages,
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
+   while (ring->free_pages_num > num) {
+   BUG_ON(list_empty(&ring->free_pages));
+   page[num_pages] = list_first_entry(&ring->free_pages,
   struct page, lru);
list_del(&page[num_pages]->lru);
-   blkif->free_pages_num--;
+   ring->free_pages_num--;
if (++num_pages == NUM_BATCH_FREE_PAGES) {
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
gnttab_free_pages(num_pages, page);
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lo

[PATCH v5 04/10] xen/blkfront: split per device io_lock

2015-11-13 Thread Bob Liu
After commit "xen/blkfront: separate per ring information out of device
info", per-ring data is protected by a per-device lock ('io_lock').

This hurts scalability, so introduce a per-ring lock ('ring_lock').

The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and
persistent_gnts_c shared by all rings.
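
The lock split described above can be illustrated with a minimal userspace
sketch. This is not the driver code: it assumes pthread mutexes in place of
kernel spinlocks, and struct ring_info, struct dev_info, dev_init() and
ring_queue_request() are hypothetical names invented for the illustration.
Each ring takes its own lock for per-ring state and holds the shared device
lock only for the short update of the pool shared by all rings.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_RINGS 4	/* arbitrary limit for this sketch */

struct dev_info;

/* Hypothetical stand-in for the per-ring structure. */
struct ring_info {
	pthread_mutex_t ring_lock;	/* protects only this ring's state */
	unsigned int in_flight;		/* per-ring data */
	struct dev_info *dev;
};

/* Hypothetical stand-in for the per-device structure. */
struct dev_info {
	pthread_mutex_t dev_lock;	/* protects state shared by all rings */
	unsigned int free_grants;	/* shared pool counter */
	unsigned int nr_rings;
	struct ring_info rings[MAX_RINGS];
};

static void dev_init(struct dev_info *dev, unsigned int nr_rings,
		     unsigned int grants)
{
	unsigned int i;

	if (nr_rings > MAX_RINGS)
		nr_rings = MAX_RINGS;

	pthread_mutex_init(&dev->dev_lock, NULL);
	dev->free_grants = grants;
	dev->nr_rings = nr_rings;

	for (i = 0; i < nr_rings; i++) {
		pthread_mutex_init(&dev->rings[i].ring_lock, NULL);
		dev->rings[i].in_flight = 0;
		dev->rings[i].dev = dev;
	}
}

/* Queue one request on a ring: 0 on success, -1 if no grant is free. */
static int ring_queue_request(struct ring_info *ring)
{
	struct dev_info *dev = ring->dev;
	int ret = -1;

	/* Rings do not contend with each other on this lock. */
	pthread_mutex_lock(&ring->ring_lock);

	/* The shared lock is held only around the shared-pool update. */
	pthread_mutex_lock(&dev->dev_lock);
	if (dev->free_grants > 0) {
		dev->free_grants--;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->dev_lock);

	if (ret == 0)
		ring->in_flight++;

	pthread_mutex_unlock(&ring->ring_lock);
	return ret;
}
```

The sketch always takes the ring lock before the device lock; keeping one
fixed order is what avoids lock-order inversions between rings.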

Signed-off-by: Bob Liu 
---
v2:
 * Introduce kick_pending_request_queues_locked().
 * Add comment for 'ring_lock'.
 * Move locks to more suitable place.
---
 drivers/block/xen-blkfront.c |   73 +++---
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index d73734f..56c9ec6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -125,6 +125,8 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  *  depending on how many hardware queues/rings to be used.
  */
 struct blkfront_ring_info {
+   /* Lock to protect data in every ring buffer. */
+   spinlock_t ring_lock;
struct blkif_front_ring ring;
unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
unsigned int evtchn, irq;
@@ -143,7 +145,6 @@ struct blkfront_ring_info {
  */
 struct blkfront_info
 {
-   spinlock_t io_lock;
struct mutex mutex;
struct xenbus_device *xbdev;
struct gendisk *gd;
@@ -153,6 +154,11 @@ struct blkfront_info
/* Number of pages per ring buffer. */
unsigned int nr_ring_pages;
struct request_queue *rq;
+   /*
+* Lock to protect info->grants list and persistent_gnts_c shared by all
+* rings.
+*/
+   spinlock_t dev_lock;
struct list_head grants;
unsigned int persistent_gnts_c;
unsigned int feature_flush;
@@ -258,7 +264,9 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
+   spin_lock_irq(&info->dev_lock);
list_add(&gnt_list_entry->node, &info->grants);
+   spin_unlock_irq(&info->dev_lock);
i++;
}
 
@@ -267,7 +275,9 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 out_of_memory:
list_for_each_entry_safe(gnt_list_entry, n,
 &info->grants, node) {
+   spin_lock_irq(&info->dev_lock);
list_del(&gnt_list_entry->node);
+   spin_unlock_irq(&info->dev_lock);
if (info->feature_persistent)
__free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
@@ -280,7 +290,9 @@ out_of_memory:
 static struct grant *get_free_grant(struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
+   unsigned long flags;
 
+   spin_lock_irqsave(&info->dev_lock, flags);
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
  node);
@@ -288,6 +300,7 @@ static struct grant *get_free_grant(struct blkfront_info *info)
 
if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+   spin_unlock_irqrestore(&info->dev_lock, flags);
 
return gnt_list_entry;
 }
@@ -757,11 +770,11 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
   const struct blk_mq_queue_data *qd)
 {
+   unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
-   struct blkfront_info *info = rinfo->dev_info;
 
blk_mq_start_request(qd->rq);
-   spin_lock_irq(&info->io_lock);
+   spin_lock_irqsave(&rinfo->ring_lock, flags);
if (RING_FULL(&rinfo->ring))
goto out_busy;
 
@@ -772,15 +785,15 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
goto out_busy;
 
flush_requests(rinfo);
-   spin_unlock_irq(&info->io_lock);
+   spin_unlock_irqrestore(&rinfo->ring_lock, flags);
return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-   spin_unlock_irq(&info->io_lock);
+   spin_unlock_irqrestore(&rinfo->ring_lock, flags);
return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-   spin_unlock_irq(&info->io_lock);
+   spin_unlock_irqrestore(&rinfo->ring_lock, flags);
blk_mq_stop_hw_queue(hctx);
return BLK_MQ_RQ_QUEUE_BUSY;
 }
@@ -1082,21 +1095,28 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
info->gd = NULL;
 }
 
-/* Must be called with io_lock holded */
-static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
+/* Already hold rinfo->ring_lock. */
+static inline void kick_pending_request_queues_locked(struct blkfront_ring_info *rinfo)
 {
if (!RING_FULL(&rinfo->ring))
   

[PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue

2015-11-13 Thread Bob Liu
Make persistent grants per-queue/ring instead of per-device, so that we can
drop the 'dev_lock' and get better scalability.

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Queues:            1     4            8            16
Iops orig(k):      810   1064         780          700
Iops patched(k):   810   1230(~20%)   1024(~20%)   850(~20%)

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkfront.c |  110 +-
 1 file changed, 43 insertions(+), 67 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 84496be..451f852 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -142,6 +142,8 @@ struct blkfront_ring_info {
struct gnttab_free_callback callback;
struct blk_shadow shadow[BLK_MAX_RING_SIZE];
struct list_head indirect_pages;
+   struct list_head grants;
+   unsigned int persistent_gnts_c;
unsigned long shadow_free;
struct blkfront_info *dev_info;
 };
@@ -162,13 +164,6 @@ struct blkfront_info
/* Number of pages per ring buffer. */
unsigned int nr_ring_pages;
struct request_queue *rq;
-   /*
-* Lock to protect info->grants list and persistent_gnts_c shared by all
-* rings.
-*/
-   spinlock_t dev_lock;
-   struct list_head grants;
-   unsigned int persistent_gnts_c;
unsigned int feature_flush;
unsigned int feature_discard:1;
unsigned int feature_secdiscard:1;
@@ -272,9 +267,7 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
-   spin_lock_irq(&info->dev_lock);
-   list_add(&gnt_list_entry->node, &info->grants);
-   spin_unlock_irq(&info->dev_lock);
+   list_add(&gnt_list_entry->node, &rinfo->grants);
i++;
}
 
@@ -282,10 +275,8 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 
 out_of_memory:
list_for_each_entry_safe(gnt_list_entry, n,
-&info->grants, node) {
-   spin_lock_irq(&info->dev_lock);
+&rinfo->grants, node) {
list_del(&gnt_list_entry->node);
-   spin_unlock_irq(&info->dev_lock);
if (info->feature_persistent)
__free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
@@ -295,20 +286,17 @@ out_of_memory:
return -ENOMEM;
 }
 
-static struct grant *get_free_grant(struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_ring_info *rinfo)
 {
struct grant *gnt_list_entry;
-   unsigned long flags;
 
-   spin_lock_irqsave(&info->dev_lock, flags);
-   BUG_ON(list_empty(&info->grants));
-   gnt_list_entry = list_first_entry(&info->grants, struct grant,
+   BUG_ON(list_empty(&rinfo->grants));
+   gnt_list_entry = list_first_entry(&rinfo->grants, struct grant,
  node);
list_del(&gnt_list_entry->node);
 
if (gnt_list_entry->gref != GRANT_INVALID_REF)
-   info->persistent_gnts_c--;
-   spin_unlock_irqrestore(&info->dev_lock, flags);
+   rinfo->persistent_gnts_c--;
 
return gnt_list_entry;
 }
@@ -324,9 +312,10 @@ static inline void grant_foreign_access(const struct grant *gnt_list_entry,
 
 static struct grant *get_grant(grant_ref_t *gref_head,
   unsigned long gfn,
-  struct blkfront_info *info)
+  struct blkfront_ring_info *rinfo)
 {
-   struct grant *gnt_list_entry = get_free_grant(info);
+   struct grant *gnt_list_entry = get_free_grant(rinfo);
+   struct blkfront_info *info = rinfo->dev_info;
 
if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
@@ -347,9 +336,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 }
 
 static struct grant *get_indirect_grant(grant_ref_t *gref_head,
-   struct blkfront_info *info)
+   struct blkfront_ring_info *rinfo)
 {
-   struct grant *gnt_list_entry = get_free_grant(info);
+   struct grant *gnt_list_entry = get_free_grant(rinfo);
+   struct blkfront_info *info = rinfo->dev_info;
 
if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
@@ -361,8 +351,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
struct page *indirect_page;
 
/* Fe

[PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend

2015-11-13 Thread Bob Liu
The max number of hardware queues for xen/blkfront is set by parameter
'max_queues' (default 4), and it is also capped by the max value that
xen/blkback exposes through the XenStore key 'multi-queue-max-queues'.

The negotiated number is the smaller of the two and is written back to
XenStore as "multi-queue-num-queues"; blkback needs to read this negotiated
number.
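
The negotiation described above can be sketched in plain C as follows. This
is only an illustration under stated assumptions, not the driver code:
negotiate_num_queues() and validate_requested_queues() are hypothetical
helper names, treating an absent backend key (represented here as 0) as "one
queue" is a convention of this sketch, and returning -EINVAL for a bad
request is a choice made for the example.

```c
#include <errno.h>

/*
 * Frontend side: use the smaller of the frontend's own limit and the
 * maximum advertised by the backend. A value of 0 stands for "not
 * advertised" or "not configured" and falls back to a single queue.
 */
static unsigned int negotiate_num_queues(unsigned int frontend_max,
					 unsigned int backend_max)
{
	if (frontend_max == 0 || backend_max == 0)
		return 1;

	return frontend_max < backend_max ? frontend_max : backend_max;
}

/*
 * Backend side: the value read back from the frontend must be greater
 * than zero and no larger than what the backend advertised.
 */
static int validate_requested_queues(unsigned int requested,
				     unsigned int backend_max)
{
	if (requested == 0 || requested > backend_max)
		return -EINVAL;

	return 0;
}
```

With the default frontend limit of 4 and a backend advertising 16, the sketch
settles on 4 queues; a frontend asking for more than the backend advertised
is rejected.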

Signed-off-by: Bob Liu 
---
v2:
 * Make 'i' be an unsigned int.
 * Other comments from Konrad.
---
 drivers/block/xen-blkfront.c |  160 +++---
 1 file changed, 119 insertions(+), 41 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 56c9ec6..84496be 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -99,6 +99,10 @@ static unsigned int xen_blkif_max_segments = 32;
 module_param_named(max, xen_blkif_max_segments, int, S_IRUGO);
 MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default is 32)");
 
+static unsigned int xen_blkif_max_queues = 4;
+module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings used per virtual disk");
+
 /*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
@@ -118,6 +122,10 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  * characters are enough. Define to 20 to keep consist with backend.
  */
 #define RINGREF_NAME_LEN (20)
+/*
+ * queue-%u would take 7 + 10(UINT_MAX) = 17 characters
+ */
+#define QUEUE_NAME_LEN (17)
 
 /*
  *  Per-ring info.
@@ -823,7 +831,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 
memset(&info->tag_set, 0, sizeof(info->tag_set));
info->tag_set.ops = &blkfront_mq_ops;
-   info->tag_set.nr_hw_queues = 1;
+   info->tag_set.nr_hw_queues = info->nr_rings;
info->tag_set.queue_depth =  BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -1520,6 +1528,53 @@ fail:
return err;
 }
 
+/*
+ * Write out per-ring/queue nodes including ring-ref and event-channel, and each
+ * ring buffer may have multi pages depending on ->nr_ring_pages.
+ */
+static int write_per_ring_nodes(struct xenbus_transaction xbt,
+   struct blkfront_ring_info *rinfo, const char *dir)
+{
+   int err;
+   unsigned int i;
+   const char *message = NULL;
+   struct blkfront_info *info = rinfo->dev_info;
+
+   if (info->nr_ring_pages == 1) {
+   err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
+   if (err) {
+   message = "writing ring-ref";
+   goto abort_transaction;
+   }
+   } else {
+   for (i = 0; i < info->nr_ring_pages; i++) {
+   char ring_ref_name[RINGREF_NAME_LEN];
+
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   err = xenbus_printf(xbt, dir, ring_ref_name,
+   "%u", rinfo->ring_ref[i]);
+   if (err) {
+   message = "writing ring-ref";
+   goto abort_transaction;
+   }
+   }
+   }
+
+   err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
+   if (err) {
+   message = "writing event-channel";
+   goto abort_transaction;
+   }
+
+   return 0;
+
+abort_transaction:
+   xenbus_transaction_end(xbt, 1);
+   if (message)
+   xenbus_dev_fatal(info->xbdev, err, "%s", message);
+
+   return err;
+}
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
@@ -1527,10 +1582,9 @@ static int talk_to_blkback(struct xenbus_device *dev,
 {
const char *message = NULL;
struct xenbus_transaction xbt;
-   int err, i;
-   unsigned int max_page_order = 0;
+   int err;
+   unsigned int i, max_page_order = 0;
unsigned int ring_page_order = 0;
-   struct blkfront_ring_info *rinfo;
 
err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
   "max-ring-page-order", "%u", &max_page_order);
@@ -1542,7 +1596,8 @@ static int talk_to_blkback(struct xenbus_device *dev,
}
 
for (i = 0; i < info->nr_rings; i++) {
-   rinfo = &info->rinfo[i];
+   struct blkfront_ring_info *rinfo = &info->rinfo[i];
+
/* Create shared ring, alloc event channel. */
err = setup_blkring(dev, rinfo);
if (err)
@@ -1556,44 +1611,49 @@ again:
goto destroy_blkring;
}
 
-   if (info->nr_rings == 1) {
-  

[PATCH v5 07/10] xen/blkback: pseudo support for multi hardware queues/rings

2015-11-13 Thread Bob Liu
Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1; larger numbers will be enabled in the next
patch ("xen/blkback: get the number of hardware queues/rings from blkfront") so
as to keep every single patch small and readable.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/common.h |3 +-
 drivers/block/xen-blkback/xenbus.c |  277 ++--
 2 files changed, 175 insertions(+), 105 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f4dfa5b..f2386e3 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -340,7 +340,8 @@ struct xen_blkif {
struct work_struct  free_work;
unsigned int nr_ring_pages;
/* All rings for this device. */
-   struct xen_blkif_ring ring;
+   struct xen_blkif_ring *rings;
+   unsigned int nr_rings;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e4bfc92..6c6e048 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -86,9 +86,11 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 {
int err;
char name[BLKBACK_NAME_LEN];
+   struct xen_blkif_ring *ring;
+   unsigned int i;
 
/* Not ready to connect? */
-   if (!blkif->ring.irq || !blkif->vbd.bdev)
+   if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
return;
 
/* Already connected? */
@@ -113,19 +115,55 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
}
invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-   blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
-   if (IS_ERR(blkif->ring.xenblkd)) {
-   err = PTR_ERR(blkif->ring.xenblkd);
-   blkif->ring.xenblkd = NULL;
-   xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
-   return;
+   for (i = 0; i < blkif->nr_rings; i++) {
+   ring = &blkif->rings[i];
+   ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s-%d", name, i);
+   if (IS_ERR(ring->xenblkd)) {
+   err = PTR_ERR(ring->xenblkd);
+   ring->xenblkd = NULL;
+   xenbus_dev_fatal(blkif->be->dev, err,
+   "start %s-%d xenblkd", name, i);
+   goto out;
+   }
+   }
+   return;
+
+out:
+   while (i--) {
+   ring = &blkif->rings[i];
+   kthread_stop(ring->xenblkd);
}
+   return;
+}
+
+static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
+{
+   unsigned int r;
+
+   blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
+   if (!blkif->rings)
+   return -ENOMEM;
+
+   for (r = 0; r < blkif->nr_rings; r++) {
+   struct xen_blkif_ring *ring = &blkif->rings[r];
+
+   spin_lock_init(&ring->blk_ring_lock);
+   init_waitqueue_head(&ring->wq);
+   INIT_LIST_HEAD(&ring->pending_free);
+
+   spin_lock_init(&ring->pending_free_lock);
+   init_waitqueue_head(&ring->pending_free_wq);
+   init_waitqueue_head(&ring->shutdown_wq);
+   ring->blkif = blkif;
+   xen_blkif_get(blkif);
+   }
+
+   return 0;
 }
 
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
-   struct xen_blkif_ring *ring;
 
BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -143,15 +181,11 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
blkif->st_print = jiffies;
INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-   ring = &blkif->ring;
-   ring->blkif = blkif;
-   spin_lock_init(&ring->blk_ring_lock);
-   init_waitqueue_head(&ring->wq);
-
-   INIT_LIST_HEAD(&ring->pending_free);
-   spin_lock_init(&ring->pending_free_lock);
-   init_waitqueue_head(&ring->pending_free_wq);
-   init_waitqueue_head(&ring->shutdown_wq);
+   blkif->nr_rings = 1;
+   if (xen_blkif_alloc_rings(blkif)) {
+   kmem_cache_free(xen_blkif_cachep, blkif);
+   return ERR_PTR(-ENOMEM);
+   }
 
return blkif;
 }
@@ -216,50 +250,54 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
struct pending_req *req, *n;
-   int i = 0, j;
-   struct xen_blkif_ring *ring = &blkif->ring;
+   unsigned int j, r;
 
-   if (ring->xenblkd) {
-   kthread_stop(ring->xenblkd);
-   wake_up(&ring->shutdown_wq);
-   ring->xe

[PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension

2015-11-13 Thread Bob Liu
Document the multi-queue/ring feature in terms of XenStore keys to be written by
the backend and by the frontend.
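
As a rough illustration of the key layout this patch documents, the sketch
below builds the relative XenStore key name for one ring page of one queue.
It is not taken from either driver: ring_ref_key() is a hypothetical helper,
and it assumes the naming shown in the comment added to blkif.h (toplevel
keys for a single queue, "queue-N/" sub-keys for two or more queues,
"ring-ref" for single-page rings and "ring-refN" for multi-page rings).

```c
#include <stdio.h>

/*
 * Write the key name for page `page` of queue `queue` into buf.
 * Returns the snprintf() result, or -1 if the indices are out of range.
 */
static int ring_ref_key(char *buf, size_t len,
			unsigned int nr_queues, unsigned int queue,
			unsigned int nr_ring_pages, unsigned int page)
{
	char leaf[32];

	if (queue >= nr_queues || page >= nr_ring_pages)
		return -1;

	/* Single-page rings keep the legacy key name. */
	if (nr_ring_pages == 1)
		snprintf(leaf, sizeof(leaf), "ring-ref");
	else
		snprintf(leaf, sizeof(leaf), "ring-ref%u", page);

	/* A single queue writes toplevel keys, as older frontends do. */
	if (nr_queues == 1)
		return snprintf(buf, len, "%s", leaf);

	return snprintf(buf, len, "queue-%u/%s", queue, leaf);
}
```

For two queues with two-page rings, the second page of queue 1 ends up under
"queue-1/ring-ref1", relative to the vbd directory of the frontend.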

Signed-off-by: Bob Liu 
---
v2:
Add descriptions together with multi-page ring buffer.
---
 include/xen/interface/io/blkif.h |   48 ++
 1 file changed, 48 insertions(+)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index c33e1c4..8b8cfad 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -28,6 +28,54 @@ typedef uint16_t blkif_vdev_t;
 typedef uint64_t blkif_sector_t;
 
 /*
+ * Multiple hardware queues/rings:
+ * If supported, the backend will write the key "multi-queue-max-queues" to
+ * the directory for that vbd, and set its value to the maximum supported
+ * number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues" with the number they wish to use, which must be
+ * greater than zero, and no more than the value reported by the backend in
+ * "multi-queue-max-queues".
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel and ring-ref keys, instead writing those keys under sub-keys
+ * having the name "queue-N" where N is the integer ID of the queue/ring for
+ * which those keys belong. Queues are indexed from zero.
+ * For example, a frontend with two queues must write the following set of
+ * queue-related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref = ""
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = ""
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref = ""
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = ""
+ *
+ * It is also possible to use multiple queues/rings together with
+ * feature multi-page ring buffer.
+ * For example, a frontend that requests two queues/rings, each with a ring
+ * buffer of two pages, must write the following set of related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/ring-page-order = "1"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref1 = ""
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = ""
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref0 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = ""
+ *
+ */
+
+/*
  * REQUEST CODES.
  */
 #define BLKIF_OP_READ  0
-- 
1.7.10.4



[PATCH v4 0/4] SysFS driver for QEMU fw_cfg device

2015-11-13 Thread Gabriel L. Somlo
From: "Gabriel Somlo" 

Allow access to QEMU firmware blobs, passed into the guest VM via
the fw_cfg device, through SysFS entries. Blob meta-data (e.g. name,
size, and fw_cfg key), as well as the raw binary blob data may be
accessed.

The SysFS access location is /sys/firmware/qemu_fw_cfg/... and was
selected based on overall similarity to the type of information
exposed under /sys/firmware/dmi/entries/...

New (since v3):

Patch 1/4: Device probing now works with either ACPI, DT, or
   optionally by manually specifying a base, size, and
   register offsets on the command line. This way, all
   architectures offering fw_cfg can be supported, although
   x86 and ARM get *automatic* support via ACPI and/or DT.

   HUGE thanks to Laszlo Ersek for
   pointing out drivers/virtio/virtio_mmio.c, as an example
   on how to pull this off !!!

   Stefan: I saw Marc's DMA patches to fw_cfg. Since only
   x86 and ARM will support it starting with QEMU 2.5, and
   since I expect to get lots of interesting (but
   otherwise orthogonal) feedback on this series, I'd like
   to stick with ioread8() across the board for now. We can
   always patch in DMA support in a backward compatible way
   later, once this series gets (hopefully) accepted :)

Patch 2/4: (was 3/4 in v3): unchanged. Exports kset_find_obj() so
   modules can call it.

Patch 3/4: (was 4/4 in v3): rebased, but otherwise the same.
   Essentially, creates a "human readable" directory
   hierarchy from "path-like" tokens making up fw_cfg
   blob names. I'm not really sure there's a way to make
   this happen via udev rules, but I have at least one
   potential use case for doing it *before* udev becomes
   available (cc: Andy Lutomirski),
   so I'd be happy to leave this functionality in the
   kernel module. See further below for an illustration
   of this.

Patch 4/4: Updates the existing ARM DT documentation for fw_cfg,
   mainly by pointing at the more comprehensive document
   introduced with Patch 1/4 for details on the fw_cfg
   device interface, leaving only the specific ARM/DT
   address/size node information in place.

Thanks much,
  --Gabriel

>  In addition to the "by_key" blob listing, e.g.:
>  
>  $ tree /sys/firmware/qemu_fw_cfg/
>  /sys/firmware/qemu_fw_cfg/
>  |-- by_key
>  |   |-- 32
>  |   |   |-- key
>  |   |   |-- name("etc/boot-fail-wait")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 33
>  |   |   |-- key
>  |   |   |-- name("etc/smbios/smbios-tables")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 34
>  |   |   |-- key
>  |   |   |-- name("etc/smbios/smbios-anchor")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 35
>  |   |   |-- key
>  |   |   |-- name("etc/e820")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 36
>  |   |   |-- key
>  |   |   |-- name("genroms/kvmvapic.bin")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 37
>  |   |   |-- key
>  |   |   |-- name("etc/system-states")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 38
>  |   |   |-- key
>  |   |   |-- name("etc/acpi/tables")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 39
>  |   |   |-- key
>  |   |   |-- name("etc/table-loader")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 40
>  |   |   |-- key
>  |   |   |-- name("etc/tpm/log")
>  |   |   |-- raw
>  |   |   `-- size
>  |   |-- 41
>  |   |   |-- key
>  |   |   |-- name("etc/acpi/rsdp")
>  |   |   |-- raw
>  |   |   `-- size
>  |   `-- 42
>  |       |-- key
>  |       |-- name("bootorder")
>  |       |-- raw
>  |       `-- size
>  |
>  ...
>  
>  Patch 3/4 also gets us a "human readable" "by_name" listing, like so:
>  
>  ...
>  |-- by_name
>  |   |-- bootorder -> ../by_key/42
>  |   |-- etc
>  |   |   |-- acpi
>  |   |   |   |-- rsdp -> ../../../by_key/41
>  |   |   |   `-- tables -> ../../../by_key/38
>  |   |   |-- boot-fail-wait -> ../../by_key/32
>  |   |   |-- e820 -> ../../by_key/35
>  |   |   |-- smbios
>  |   |   |   |-- smbios-anchor -> ../../../by_key/34
>  |   |   |   `-- smbios-tables -> ../../../by_key/33
>  |   |   |-- system-states -> ../../by_key/37
>  |   |   |-- table-loader -> ../../by_key/39
>  |   |   `-- tpm
>  |   |       `-- log -> ../../../by_key/40
>  |   `-- genroms
>  |       `-- kvmvapic.bin -> ../../by_key/36
>  `-- rev


[PATCH v4 3/4] firmware: create directory hierarchy for sysfs fw_cfg entries

2015-11-13 Thread Gabriel L. Somlo
From: Gabriel Somlo 

Each fw_cfg entry of type "file" has an associated 56-char,
nul-terminated ASCII string which represents its name. While
the fw_cfg device doesn't itself impose any specific naming
convention, QEMU developers have traditionally used path name
semantics (i.e. "etc/acpi/rsdp") to descriptively name the
various fw_cfg "blobs" passed into the guest.

This patch attempts, on a best effort basis, to create a
directory hierarchy representing the content of fw_cfg file
names, under /sys/firmware/qemu_fw_cfg/by_name.

Upon successful creation of all directories representing the
"dirname" portion of a fw_cfg file, a symlink will be created
to represent the "basename", pointing at the appropriate
/sys/firmware/qemu_fw_cfg/by_key entry. If a file name is not
suitable for this procedure (e.g., if its basename or dirname
components collide with an already existing dirname component
or basename, respectively) the corresponding fw_cfg blob is
skipped and will remain available in sysfs only by its selector
key value.
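
To make the relation between a blob name and its by_name symlink concrete,
here is a small userspace sketch that computes the relative link target for
a given name and selector key. It is an illustration only and not the code
in this patch: by_name_link_target() is a hypothetical helper, and it
assumes a well-formed name (no leading, trailing, or doubled '/').

```c
#include <stdio.h>
#include <string.h>

/*
 * Compute the relative symlink target for a by_name entry.
 * The link needs one "../" to leave by_name itself plus one for each
 * directory component of the blob name, followed by "by_key/<key>".
 * Returns 0 on success, -1 if buf is too small.
 */
static int by_name_link_target(const char *name, unsigned int key,
			       char *buf, size_t len)
{
	size_t depth = 1;	/* the by_name directory itself */
	size_t used = 0;
	const char *p;
	int n;

	if (len == 0)
		return -1;

	/* Each '/' in the name adds one directory level. */
	for (p = name; *p; p++)
		if (*p == '/')
			depth++;

	buf[0] = '\0';
	while (depth--) {
		if (used + 3 >= len)
			return -1;
		memcpy(buf + used, "../", 3);
		used += 3;
		buf[used] = '\0';
	}

	n = snprintf(buf + used, len - used, "by_key/%u", key);
	if (n < 0 || (size_t)n >= len - used)
		return -1;

	return 0;
}
```

For the name "etc/acpi/rsdp" with key 41, this yields "../../../by_key/41",
the same shape as the by_name links in the tree listing of the cover letter.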

Signed-off-by: Gabriel Somlo 
Cc: Andy Lutomirski 
---
 .../ABI/testing/sysfs-firmware-qemu_fw_cfg |  42 
 drivers/firmware/qemu_fw_cfg.c | 109 -
 2 files changed, 148 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
index b908b6d..a346ee0 100644
--- a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
+++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
@@ -196,3 +196,45 @@ Description:
  entry via the control register, and reading a number
  of bytes equal to the blob size from the data
  register.
+
+   --- Listing fw_cfg blobs by file name ---
+
+   While the fw_cfg device does not impose any specific naming
+   convention on the blobs registered in the file directory,
+   QEMU developers have traditionally used path name semantics
+   to give each blob a descriptive name. For example:
+
+   "bootorder"
+   "genroms/kvmvapic.bin"
+   "etc/e820"
+   "etc/boot-fail-wait"
+   "etc/system-states"
+   "etc/table-loader"
+   "etc/acpi/rsdp"
+   "etc/acpi/tables"
+   "etc/smbios/smbios-tables"
+   "etc/smbios/smbios-anchor"
+   ...
+
+   In addition to the listing by unique selector key described
+   above, the fw_cfg sysfs driver also attempts to build a tree
+   of directories matching the path name components of fw_cfg
+   blob names, ending in symlinks to the by_key entry for each
+   "basename", as illustrated below (assume current directory is
+   /sys/firmware):
+
+   qemu_fw_cfg/by_name/bootorder -> ../by_key/38
+   qemu_fw_cfg/by_name/etc/e820 -> ../../by_key/35
+   qemu_fw_cfg/by_name/etc/acpi/rsdp -> ../../../by_key/41
+   ...
+
+   Construction of the directory tree and symlinks is done on a
+   "best-effort" basis, as there is no guarantee that components
+   of fw_cfg blob names are always "well behaved". I.e., there is
+   the possibility that a symlink (basename) will conflict with
+   a dirname component of another fw_cfg blob, in which case the
+   creation of the offending /sys/firmware/qemu_fw_cfg/by_name
+   entry will be skipped.
+
+   The authoritative list of entries will continue to be found
+   under the /sys/firmware/qemu_fw_cfg/by_key directory.
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 618304a..9ac1ca7 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -318,9 +318,103 @@ static struct bin_attribute fw_cfg_sysfs_attr_raw = {
.read = fw_cfg_sysfs_read_raw,
 };
 
-/* kobjects representing top-level and by_key folders */
+/*
+ * Create a kset subdirectory matching each '/' delimited dirname token
+ * in 'name', starting with sysfs kset/folder 'dir'; At the end, create
+ * a symlink directed at the given 'target'.
+ * NOTE: We do this on a best-effort basis, since 'name' is not guaranteed
+ * to be a well-behaved path name. Whenever a symlink vs. kset directory
+ * name collision occurs, the kernel will issue big scary warnings while
+ * refusing to add the offending link or directory. We follow up with our
+ * own, slightly less scary error messages explaining the situation :)
+ */
+static int fw_cfg_build_symlink(struct kset *dir,
+   struct kobject *target, const char *name)
+{
+   int ret;
+   stru

[PATCH v4 4/4] devicetree: update documentation for fw_cfg ARM bindings

2015-11-13 Thread Gabriel L. Somlo
From: Gabriel Somlo 

Remove redundant details from
Documentation/devicetree/bindings/arm/fw-cfg.txt,
and replace them with a pointer to the more comprehensive
fw_cfg documentation provided by
Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg,
leaving the specific ARM DTB node description in place.

Signed-off-by: Gabriel Somlo 
Cc: Laszlo Ersek 
---
 Documentation/devicetree/bindings/arm/fw-cfg.txt | 37 ++--
 1 file changed, 2 insertions(+), 35 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/fw-cfg.txt 
b/Documentation/devicetree/bindings/arm/fw-cfg.txt
index 953fb64..7aeb48a 100644
--- a/Documentation/devicetree/bindings/arm/fw-cfg.txt
+++ b/Documentation/devicetree/bindings/arm/fw-cfg.txt
@@ -11,43 +11,10 @@ QEMU exposes the control and data register to ARM guests as 
memory mapped
 registers; their location is communicated to the guest's UEFI firmware in the
 DTB that QEMU places at the bottom of the guest's DRAM.
 
-The guest writes a selector value (a key) to the selector register, and then
-can read the corresponding data (produced by QEMU) via the data register. If
-the selected entry is writable, the guest can rewrite it through the data
-register.
 
-The selector register takes keys in big endian byte order.
+For a comprehensive description of the behavior of fw_cfg, please see
+Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg.
 
-The data register allows accesses with 8, 16, 32 and 64-bit width (only at
-offset 0 of the register). Accesses larger than a byte are interpreted as
-arrays, bundled together only for better performance. The bytes constituting
-such a word, in increasing address order, correspond to the bytes that would
-have been transferred by byte-wide accesses in chronological order.
-
-The interface allows guest firmware to download various parameters and blobs
-that affect how the firmware works and what tables it installs for the guest
-OS. For example, boot order of devices, ACPI tables, SMBIOS tables, kernel and
-initrd images for direct kernel booting, virtual machine UUID, SMP information,
-virtual NUMA topology, and so on.
-
-The authoritative registry of the valid selector values and their meanings is
-the QEMU source code; the structure of the data blobs corresponding to the
-individual key values is also defined in the QEMU source code.
-
-The presence of the registers can be verified by selecting the "signature" blob
-with key 0x, and reading four bytes from the data register. The returned
-signature is "QEMU".
-
-The outermost protocol (involving the write / read sequences of the control and
-data registers) is expected to be versioned, and/or described by feature bits.
-The interface revision / feature bitmap can be retrieved with key 0x0001. The
-blob to be read from the data register has size 4, and it is to be interpreted
-as a uint32_t value in little endian byte order. The current value
-(corresponding to the above outer protocol) is zero.
-
-The guest kernel is not expected to use these registers (although it is
-certainly allowed to); the device tree bindings are documented here because
-this is where device tree bindings reside in general.
 
 Required properties:
 
-- 
2.4.3



[PATCH v4 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device

2015-11-13 Thread Gabriel L. Somlo
From: Gabriel Somlo 

Make fw_cfg entries of type "file" available via sysfs. Entries
are listed under /sys/firmware/qemu_fw_cfg/by_key, in folders
named after each entry's selector key. Filename, selector value,
and size read-only attributes are included for each entry. Also,
a "raw" attribute allows retrieval of the full binary content of
each entry.

This patch also provides a documentation file outlining the
guest-side "hardware" interface exposed by the QEMU fw_cfg device.

The fw_cfg device can be instantiated automatically from ACPI or
the Device Tree, or manually by using a kernel module (or command
line) parameter, with a syntax outlined in the documentation file.
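
The documentation file added by this patch describes the data register as a
window onto the selected blob: a control register write resets an internal
offset to zero, each read advances it by the access width, and reads past
the blob's length return 0x00. The following is a small user-space model of
that behaviour only. struct fw_cfg_model, model_select() and model_read()
are hypothetical names for this sketch and do not appear in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of the device state; not a driver type. */
struct fw_cfg_model {
	const uint8_t *blob;   /* contents of the currently selected item */
	size_t len;            /* maximum data length of that item */
	size_t offset;         /* device-internal data offset */
};

/* Model of a control register write: select a blob, reset the offset. */
static void model_select(struct fw_cfg_model *dev,
			 const uint8_t *blob, size_t len)
{
	assert(dev != NULL);
	dev->blob = blob;
	dev->len = len;
	dev->offset = 0;
}

/*
 * Model of an N-byte wide data register read: copy the next n bytes in
 * increasing address order, substituting 0x00 once the end is reached.
 */
static void model_read(struct fw_cfg_model *dev, uint8_t *dst, size_t n)
{
	size_t i;

	assert(dev != NULL && dst != NULL);
	for (i = 0; i < n; i++) {
		dst[i] = dev->offset < dev->len ? dev->blob[dev->offset] : 0x00;
		dev->offset++;
	}
}
```

With a 6-byte blob, two 4-byte reads in this model return the first four
bytes, then the last two bytes followed by two zero bytes; selecting the
blob again restarts from the first byte.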

Signed-off-by: Gabriel Somlo 
---
 .../ABI/testing/sysfs-firmware-qemu_fw_cfg | 198 +++
 drivers/firmware/Kconfig   |  19 +
 drivers/firmware/Makefile  |   1 +
 drivers/firmware/qemu_fw_cfg.c | 611 +
 4 files changed, 829 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
 create mode 100644 drivers/firmware/qemu_fw_cfg.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg 
b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
new file mode 100644
index 000..b908b6d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
@@ -0,0 +1,198 @@
+What:  /sys/firmware/qemu_fw_cfg/
+Date:  August 2015
+Contact:   Gabriel Somlo 
+Description:
+   Several different architectures supported by QEMU (x86, arm,
+   sun4*, ppc/mac) are provisioned with a firmware configuration
+   (fw_cfg) device, originally intended as a way for the host to
+   provide configuration data to the guest firmware. Starting
+   with QEMU v2.4, arbitrary fw_cfg file entries may be specified
+   by the user on the command line, which makes fw_cfg additionally
+   useful as an out-of-band, asynchronous mechanism for providing
+   configuration data to the guest userspace.
+
+   === Guest-side Hardware Interface ===
+
+   The fw_cfg device is available to guest VMs as a register pair
+   (control and data), accessible either as IO ports or as MMIO
+   addresses, depending on the architecture.
+
+   --- Control Register ---
+
+   Width: 16-bit
+   Access: Write-Only
+   Endianness: LE (if IOport) or BE (if MMIO)
+
+   A write to the control register selects the index for one of
+   the firmware configuration items (or "blobs") available on the
+   fw_cfg device, which can subsequently be read from the data
+   register.
+
+   Each time the control register is written, a data offset
+   internal to the fw_cfg device will be set to zero. This data
+   offset impacts which portion of the selected fw_cfg blob is
+   accessed by reading the data register, as explained below.
+
+   --- Data Register ---
+
+   Width: 8-bit (if IOport), or 8/16/32/64-bit (if MMIO)
+   Access: Read-Only
+   Endianness: string preserving
+
+   The data register allows access to an array of bytes which
+   represent the fw_cfg blob last selected by a write to the
+   control register.
+
+   Immediately following a write to the control register, the data
+   offset will be set to zero. Each successful read access to the
+   data register will increment the data offset by the appropriate
+   access width.
+
+   Each fw_cfg blob has a maximum associated data length. Once the
+   data offset exceeds this maximum length, any subsequent reads
+   via the data register will return 0x00.
+
+   An N-byte wide read of the data register will return the next
+   available N bytes of the selected fw_cfg blob, as a substring,
+   in increasing address order, similar to memcpy(), zero-padded
+   if necessary should the maximum data length of the selected
+   item be reached, as described above.
+
+   --- Per-arch Register Details ---
+
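+   -----------------------------------------------------------------
+   arch     access  base     ctrl     ctrl    data     max.
+            mode    address  offset   endian  offset   data
+                             (bytes)          (bytes)
+   -----------------------------------------------------------------
+   x86      IOport  0x510    0        LE      1        1
+   arm      MMIO    0x902    8        BE      0        8
+   sun4u    IOport  0x510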

[PATCH v4 2/4] kobject: export kset_find_obj() for module use

2015-11-13 Thread Gabriel L. Somlo
From: Gabriel Somlo 

Signed-off-by: Gabriel Somlo 
---
 lib/kobject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/kobject.c b/lib/kobject.c
index 7cbccd2..90d1be6 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -861,6 +861,7 @@ struct kobject *kset_find_obj(struct kset *kset, const char 
*name)
spin_unlock(&kset->list_lock);
return ret;
 }
+EXPORT_SYMBOL(kset_find_obj);
 
 static void kset_release(struct kobject *kobj)
 {
-- 
2.4.3



[PATCH 1/2] fs/stat.c: drop the last new_valid_dev check

2015-11-13 Thread Yaowei Bai
new_valid_dev() always returns true, so it is unnecessary to perform
new_valid_dev() checks in filesystems. Most new_valid_dev() checks
have already been removed, so let's drop this last one; then we can
remove new_valid_dev() from the source code.

No functional change.
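
To show why the macro has to change shape (the old form selected a function
name and then called it, which no longer works once one arm is a constant),
here is a stand-alone sketch. Everything suffixed _SKETCH or _sketch is a
simplified stand-in written for this example, SKETCH_32BIT is a made-up
switch, and the 256 limits in old_valid_dev_sketch() are an assumption about
the kernel helper rather than something stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel definitions (assumptions). */
typedef uint32_t dev_t_sketch;
#define MINORBITS_SKETCH 20
#define MAJOR_SKETCH(dev) ((unsigned int)((dev) >> MINORBITS_SKETCH))
#define MINOR_SKETCH(dev) \
	((unsigned int)((dev) & ((1U << MINORBITS_SKETCH) - 1)))
#define MKDEV_SKETCH(ma, mi) \
	(((dev_t_sketch)(ma) << MINORBITS_SKETCH) | (mi))

/* A device number fits the old encoding only if both parts fit 8 bits. */
static bool old_valid_dev_sketch(dev_t_sketch dev)
{
	assert(MINORBITS_SKETCH < 32);
	return MAJOR_SKETCH(dev) < 256 && MINOR_SKETCH(dev) < 256;
}

/* Pick the first argument on "32-bit" builds, the second otherwise. */
#ifdef SKETCH_32BIT
#define choose_32_64(a, b) a
#else
#define choose_32_64(a, b) b
#endif

/* Shape of the macro after the patch: the call moves inside the first arm,
 * and the second arm is simply the constant true. */
#define valid_dev(x) choose_32_64(old_valid_dev_sketch(x), true)
```

Built without SKETCH_32BIT, valid_dev() expands to true for any argument,
which matches the "always returns true" behaviour the commit message
attributes to new_valid_dev().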

Signed-off-by: Yaowei Bai 
---
 fs/stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/stat.c b/fs/stat.c
index d4a61d8..bc045c7 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -219,7 +219,7 @@ SYSCALL_DEFINE2(fstat, unsigned int, fd, struct 
__old_kernel_stat __user *, stat
 #  define choose_32_64(a,b) b
 #endif
 
-#define valid_dev(x)  choose_32_64(old_valid_dev,new_valid_dev)(x)
+#define valid_dev(x)  choose_32_64(old_valid_dev(x),true)
 #define encode_dev(x) choose_32_64(old_encode_dev,new_encode_dev)(x)
 
 #ifndef INIT_STRUCT_STAT_PADDING
-- 
1.9.1





[PATCH 2/2] include/linux/kdev_t.h: remove new_valid_dev()

2015-11-13 Thread Yaowei Bai
As all new_valid_dev() checks have been removed it's time to drop
new_valid_dev() itself.

No functional change.

Signed-off-by: Yaowei Bai 
---
 include/linux/kdev_t.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/linux/kdev_t.h b/include/linux/kdev_t.h
index 052c7b3..8e9e288b 100644
--- a/include/linux/kdev_t.h
+++ b/include/linux/kdev_t.h
@@ -35,11 +35,6 @@ static inline dev_t old_decode_dev(u16 val)
return MKDEV((val >> 8) & 255, val & 255);
 }
 
-static inline bool new_valid_dev(dev_t dev)
-{
-   return 1;
-}
-
 static inline u32 new_encode_dev(dev_t dev)
 {
unsigned major = MAJOR(dev);
-- 
1.9.1





[RT PATCH] sched: rt: fix two possible deadlocks in push_irq_work_func

2015-11-13 Thread yanjiang.jin
From: Yanjiang Jin 

This can only happen in an RT kernel, because run_timer_softirq() calls
irq_work_tick() when CONFIG_PREEMPT_RT_FULL is enabled, as shown below:

static void run_timer_softirq(struct softirq_action *h)
{

#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
	irq_work_tick();
#endif

}

Use raw_spin_{un,}lock_irq{save,restore} in push_irq_work_func() to
prevent the following potential deadlock scenarios:

=
[ INFO: inconsistent lock state ]
4.1.12-rt8-WR8.0.0.0_preempt-rt #27 Not tainted
-
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ksoftirqd/3/30 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&rt_rq->push_lock) at: [] push_irq_work_func+0xb4/0x190
{IN-HARDIRQ-W} state was registered at:
  [] __lock_acquire+0xd9c/0x1a00
  [] lock_acquire+0x104/0x338
  [] _raw_spin_lock+0x4c/0x68
  [] push_irq_work_func+0xb4/0x190
  [] irq_work_run_list+0x90/0xe8
  [] irq_work_run+0x38/0x80
  [] smp_call_function_interrupt+0x24/0x30
  [] octeon_78xx_call_function_interrupt+0x1c/0x30
  [] handle_irq_event_percpu+0x110/0x620
  [] handle_percpu_irq+0xac/0xf0
  [] generic_handle_irq+0x44/0x58
  [] do_IRQ+0x24/0x30
  [] plat_irq_dispatch+0xdc/0x138
  [] ret_from_irq+0x0/0x4
  [] _raw_spin_unlock_irqrestore+0x94/0xc8
  [] try_to_wake_up+0x9c/0x3e8
  [] call_timer_fn+0xf4/0x570
  [] run_timer_softirq+0x21c/0x508
  [] do_current_softirqs+0x364/0x888
  [] run_ksoftirqd+0x38/0x68
  [] smpboot_thread_fn+0x2ac/0x3a0
  [] kthread+0xe0/0xf8
  [] ret_from_kernel_thread+0x14/0x1c
irq event stamp: 7091
hardirqs last  enabled at (7091): restore_partial+0x74/0x14c
hardirqs last disabled at (7090): handle_int+0x11c/0x13c
softirqs last  enabled at (0): copy_process.part.6+0x544/0x1a08
softirqs last disabled at (0): [<  (null)>]   (null)

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(&rt_rq->push_lock);
  
lock(&rt_rq->push_lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/3/30:
 #0:  (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at:
 [] do_current_softirqs+0x15c/0x888

stack backtrace:
CPU: 3 PID: 30 Comm: ksoftirqd/3 4.1.12-rt8-WR8.0.0.0_preempt-rt #27
Stack : 800026bf1500 80d8e158 0004 80d9
   80d9  
  0003 0004 80d9 801e4830
   81c6  809b5550
  000b0004  801e4d30 80fa5e08
  80bb3bd0 0003 001e 80fbce98
  0002   809b5550
  80040ccbf8c8 80040ccbf7b0 80ce9987 809b8b30
  80040ccb6320 801e5b54 0b1512ba6ee0 80ba73b8
  0003 80155f68  
  ...
Call Trace:
[] show_stack+0x98/0xb8
[] dump_stack+0x88/0xac
[] print_usage_bug+0x25c/0x328
[] mark_lock+0x7fc/0x888
[] __lock_acquire+0x91c/0x1a00
[] lock_acquire+0x104/0x338
[] _raw_spin_lock+0x4c/0x68
[] push_irq_work_func+0xb4/0x190
[] irq_work_run_list+0x90/0xe8
[] irq_work_tick+0x44/0x78
[] run_timer_softirq+0x74/0x508
[] do_current_softirqs+0x364/0x888
[] run_ksoftirqd+0x38/0x68
[] smpboot_thread_fn+0x2ac/0x3a0
[] kthread+0xe0/0xf8
[] ret_from_kernel_thread+0x14/0x1c

=
[ INFO: inconsistent lock state ]
4.1.12-rt8-WR8.0.0.0_preempt-rt #29 Not tainted
-
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ksoftirqd/3/29 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&rq->lock){?...-.}, at: [] push_irq_work_func+0x98/0x198
{IN-HARDIRQ-W} state was registered at:
  [] __lock_acquire+0xd9c/0x1a00
  [] lock_acquire+0x104/0x338
  [] _raw_spin_lock+0x4c/0x68
  [] scheduler_tick+0x58/0x100
  [] update_process_times+0x38/0x68
  [] tick_handle_periodic+0x40/0xb8
  [] c0_compare_interrupt+0x6c/0x98
  [] handle_irq_event_percpu+0x110/0x620
  [] handle_percpu_irq+0xac/0xf0
  [] generic_handle_irq+0x44/0x58
  [] do_IRQ+0x24/0x30
  [] plat_irq_dispatch+0xa4/0x138
  [] ret_from_irq+0x0/0x4
  [] prom_putchar+0x34/0x68
  [] early_console_write+0x54/0xa0
  [] call_console_drivers.constprop.13+0x174/0x3a0
  [] console_unlock+0x388/0x5a0
  [] con_init+0x29c/0x2e0
  [] console_init+0x3c/0x54
  [] start_kernel+0x36c/0x594
irq event stamp: 14847
hardirqs last  enabled at (14847): do_current_softirqs+0x284/0x888
hardirqs last disabled at (14846): do_current_softirqs+0x194/0x888
softirqs last  enabled at (0):  copy_process.part.6+0x544/0x1a08
softirqs last disabled at (0): [<  (null)>]   (null)

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(&rq->lock);
  
lock(&rq->lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/3/29:
 #0:  (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at:
 [] do

[PATCH] Staging: iio: iio_simple_dummy_buffer: Typo in comments area

2015-11-13 Thread Nizam Haider
Fix simple typo in comments

Signed-off-by: Nizam Haider 
---
 drivers/staging/iio/iio_simple_dummy_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/iio/iio_simple_dummy_buffer.c 
b/drivers/staging/iio/iio_simple_dummy_buffer.c
index cf44a6f..c8f889b 100644
--- a/drivers/staging/iio/iio_simple_dummy_buffer.c
+++ b/drivers/staging/iio/iio_simple_dummy_buffer.c
@@ -64,7 +64,7 @@ static irqreturn_t iio_simple_dummy_trigger_h(int irq, void 
*p)
 * software scans: can be considered to be random access
 *   so efficient reading is just a case of minimal bus
 *   transactions.
-* software culled hardware scans:
+* software called hardware scans:
 *   occasionally a driver may process the nearest hardware
 *   scan to avoid storing elements that are not desired. This
 *   is the fiddliest option by far.
-- 
1.8.1.4



Re: [PATCH V3 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS

2015-11-13 Thread Z Lim
Yang, I noticed another thing...

On Fri, Nov 13, 2015 at 10:09 AM, Yang Shi  wrote:
> Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
> in prologue in order to get the correct stack backtrace.
>
> However, ARM64 JIT used FP (x29) as eBPF fp register, FP is subjected to
> change during function call so it may cause the BPF prog stack base address
> change too.
>
> Use x25 to replace FP as BPF stack base register (fp). Since x25 is callee
> saved register, so it will keep intact during function call.

Can you please add save/restore for x25 also? :)

> It is initialized in BPF prog prologue when BPF prog is started to run
> everytime. When BPF prog exits, it could be just tossed.
>


Re: [PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling

2015-11-13 Thread Dan Williams
On Fri, Nov 13, 2015 at 4:43 PM, Andreas Dilger  wrote:
> On Nov 13, 2015, at 5:20 PM, Dan Williams  wrote:
>>
>> On Fri, Nov 13, 2015 at 4:06 PM, Ross Zwisler
>>  wrote:
>>> Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.  These
>>> are sent down via blkdev_issue_flush() in response to a fsync() or msync()
>>> and are used by filesystems to order their metadata, among other things.
>>>
>>> When we get an msync() or fsync() it is the responsibility of the DAX code
>>> to flush all dirty pages to media.  The PMEM driver then just has issue a
>>> wmb_pmem() in response to the REQ_FLUSH to ensure that before we return all
>>> the flushed data has been durably stored on the media.
>>>
>>> Signed-off-by: Ross Zwisler 
>>
>> Hmm, I'm not seeing why we need this patch.  If the actual flushing of
>> the cache is done by the core why does the driver need support
>> REQ_FLUSH?  Especially since it's just a couple instructions.  REQ_FUA
>> only makes sense if individual writes can bypass the "drive" cache,
>> but no I/O submitted to the driver proper is ever cached we always
>> flush it through to media.
>
> If the upper level filesystem gets an error when submitting a flush
> request, then it assumes the underlying hardware is broken and cannot
> be as aggressive in IO submission, but instead has to wait for in-flight
> IO to complete.

Upper level filesystems won't get errors when the driver does not
support flush.  Those requests are ended cleanly in
generic_make_request_checks().  Yes, the fs still needs to wait for
outstanding I/O to complete but in the case of pmem all I/O is
synchronous.  There's never anything to await when flushing at the
pmem driver level.

> Since FUA/FLUSH is basically a no-op for pmem devices,
> it doesn't make sense _not_ to support this functionality.

Seems to be a nop either way.  Given that DAX may lead to dirty data
pending to the device in the cpu cache that a REQ_FLUSH request will
not touch, it's better to leave it all to the mm core to handle.  I.e.
it doesn't make sense to call the driver just for two instructions
(sfence + pcommit) when the mm core is taking on the cache flushing.
Either handle it all in the mm or the driver, not a mixture.


Test. Please ignore

2015-11-13 Thread team
Test message. Please ignore.


Re: [PATCH v2] spi: mediatek: single device does not require cs_gpios

2015-11-13 Thread lei liu
On Mon, 2015-11-09 at 12:14 +0800, Nicolas Boichat wrote:
> When only one device is present, it is not necessary to specify
> cs_gpios, as the CS line can be controlled by the hardware
> module.
> 
> Without this patch, older device tree bindings used before
> 37457607 "spi: mediatek: mt8173 spi multiple devices support"
> would cause a panic on boot. This fixes the crash, and
> re-introduces backward compatibility.
> 
> Signed-off-by: Nicolas Boichat 

Acked-by: Leilk Liu 

> ---
> 
> v2: Use gpio_is_valid()
> 
> Applies on top of broonie/spi.git/for-next.
> 
>  drivers/spi/spi-mt65xx.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/spi/spi-mt65xx.c b/drivers/spi/spi-mt65xx.c
> index 563954a..7840067 100644
> --- a/drivers/spi/spi-mt65xx.c
> +++ b/drivers/spi/spi-mt65xx.c
> @@ -410,7 +410,7 @@ static int mtk_spi_setup(struct spi_device *spi)
>   if (!spi->controller_data)
>   spi->controller_data = (void *)&mtk_default_chip_info;
>  
> - if (mdata->dev_comp->need_pad_sel)
> + if (mdata->dev_comp->need_pad_sel && gpio_is_valid(spi->cs_gpio))
>   gpio_direction_output(spi->cs_gpio, !(spi->mode & SPI_CS_HIGH));
>  
>   return 0;
> @@ -632,13 +632,23 @@ static int mtk_spi_probe(struct platform_device *pdev)
>   goto err_put_master;
>   }
>  
> - for (i = 0; i < master->num_chipselect; i++) {
> - ret = devm_gpio_request(&pdev->dev, master->cs_gpios[i],
> - dev_name(&pdev->dev));
> - if (ret) {
> - dev_err(&pdev->dev,
> - "can't get CS GPIO %i\n", i);
> - goto err_put_master;
> + if (!master->cs_gpios && master->num_chipselect > 1) {
> + dev_err(&pdev->dev,
> + "cs_gpios not specified and num_chipselect > 
> 1\n");
> + ret = -EINVAL;
> + goto err_put_master;
> + }
> +
> + if (master->cs_gpios) {
> + for (i = 0; i < master->num_chipselect; i++) {
> + ret = devm_gpio_request(&pdev->dev,
> + master->cs_gpios[i],
> + dev_name(&pdev->dev));
> + if (ret) {
> + dev_err(&pdev->dev,
> + "can't get CS GPIO %i\n", i);
> + goto err_put_master;
> + }
>   }
>   }
>   }




Re: module: save load_info for livepatch modules

2015-11-13 Thread Jessica Yu

+++ Miroslav Benes [13/11/15 13:56 +0100]:
> On Fri, 13 Nov 2015, Miroslav Benes wrote:
>
>> I agree this seems like the best approach. So if we preserve
>> mod_arch_syminfo (in case of s390) we should free it not in
>> module_finalize, but somewhere in free_module... where
>> module_arch_cleanup() is called... and also module_arch_freeing_init() is
>> called there too. And what you find there for s390 is
>>
>> 	vfree(mod->arch.syminfo);
>> 	mod->arch.syminfo = NULL;
>>
>> Well, it does nothing here, because mod->arch.syminfo is already NULL. It
>> was freed in module_finalize. So we can even remove this code from
>> module_finalize and all should be fine. At least for s390.
>
> Which is not true because module_arch_freeing_init is also called from
> do_init_module, called from load_module. So we should move it to
> module_arch_cleanup.
>
> That code is like a maze without Ariadne's thread.

Heh, I agree with that sentiment.

I am slightly confused about the s390 code, and whether the authors
originally intended for that double vfree() to happen in both
module_finalize() and module_arch_freeing_init() (called from
do_init_module). Seems like a mistake. If module load succeeds,
do_init_module calls module_arch_freeing_init(). And if load_module
fails halfway through, both module_deallocate() and free_module() will
also call module_arch_freeing_init(). I feel like that vfree should
only happen once in module_arch_freeing_init() and not in
module_finalize(). If we can remove the double vfree() code from
module_finalize(), we can copy the mod_arch_specific safely before the
call to do_init_module().
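
For readers following along, the "free, then set to NULL" pattern quoted
earlier in this thread can be modelled in user space as below. This is only
a sketch: struct arch_info_sketch, release_syminfo() and the real_frees
counter are invented for the example, and free() stands in for vfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the arch-specific part of struct module. */
struct arch_info_sketch {
	void *syminfo;
};

static int real_frees;   /* counts calls that actually released memory */

/* Same shape as the quoted s390 snippet: free, then clear the pointer. */
static void release_syminfo(struct arch_info_sketch *arch)
{
	assert(arch != NULL);
	if (arch->syminfo != NULL)
		real_frees++;
	free(arch->syminfo);   /* free(NULL) is a no-op, as is vfree(NULL) */
	arch->syminfo = NULL;
}
```

Calling release_syminfo() twice on the same structure releases the memory
once; the second call does nothing. The sketch says nothing about which of
the two kernel call sites should remain, only that whichever runs first is
the one that discards the data.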

Jessica


Removal of wchan and top

2015-11-13 Thread Raymond Jennings
Hey, don't know if this is important enough, but could I request that 
the removal of wchan be reverted, or at least wrapped in an optional 
config setting?


I happen to enjoy monitoring this information with a secure top, and 
it's useful for understanding how my system works and I've used it a 
few times for debugging.






Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

2015-11-13 Thread Boris Ostrovsky



On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky wrote:
>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>> it's not pt_regs).
>>
>> We need to adjust it so that subsequent xen_iret can use it.
>
> I'm wondering if this should be more straightforward:
>
>         movq    %rsp, %rdi
>         call    do_fast_syscall_32
>         testl   %eax, %eax
>         jz      .Lsyscall_32_done
>
>         /* Opportunistic SYSRET */
> sysret32_from_system_call:
>         XEN_DO_SYSRET32
>
> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
> variant of Xen's iret path that knows that the fast path is okay.



This patch is for 32-bit kernel. I actually haven't looked at compat 
code (probably because our tests don't try that), I need to do that too.


As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for 
baremetal otherwise current paravirt op will use native_usergs_sysret32 
(for compat code). Which means a new pv_op, I think.


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Endless getdents() in vfat filesystem

2015-11-13 Thread Vegard Nossum

Hi,

Using the attached disk image I observe that getdents() never returns
the end of the directory, i.e. mounting the disk image on a loopback
device and running 'ls' under strace shows an endless stream of:

getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 2 entries */, 32768) = 48
...
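
A bounded directory walk along the following lines could be used to detect
this condition without hanging. It is a sketch under assumptions:
count_dir_entries() is a helper made up for this example, and it relies on
readdir(), which on Linux is normally implemented on top of the getdents()
family, rather than calling getdents() directly.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Count directory entries, giving up after max_entries.
 * Returns the count, -1 if the directory cannot be opened, or -2 if the
 * cap was exceeded (which a directory that never ends would trigger).
 */
static long count_dir_entries(const char *path, long max_entries)
{
	DIR *dir;
	long n = 0;

	assert(path != NULL && max_entries > 0);
	dir = opendir(path);
	if (dir == NULL)
		return -1;
	while (readdir(dir) != NULL) {
		if (++n > max_entries) {
			closedir(dir);
			return -2;
		}
	}
	closedir(dir);
	return n;
}
```

Pointing it at the mount point of the loopback-mounted image with a generous
cap would be expected to return -2 if the directory stream never terminates.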


Vegard



vfat.img.bz2
Description: application/bzip


Re: [PATCH v2 02/11] mm: add pmd_mkclean()

2015-11-13 Thread Dave Hansen
On 11/13/2015 04:06 PM, Ross Zwisler wrote:
> +static inline pmd_t pmd_mkclean(pmd_t pmd)
> +{
> + return pmd_clear_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
> +}

pte_mkclean() doesn't clear _PAGE_SOFT_DIRTY.  What's the thought behind
doing it here?


[git pull] Input updates for 4.4-rc0 (round 2)

2015-11-13 Thread Dmitry Torokhov
Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for the input subsystem. You will get an update to
tsc2005 driver that allows it to support tsc2004 (basically the same
controller, but uses i2c instead of spi bus), and a couple of bug fixes.

Note that there will be a minor conflict in
drivers/input/touchscreen/tsc2005.c due to the recently merged change
removing setting of owner in SPI drivers.

Changelog:
-

Geert Uytterhoeven (1):
  Input: parkbd - drop bogus __init from parkbd_allocate_serio()

Michael Welling (3):
  Input: tsc2005 - separate SPI and core functions
  Input: tsc200x-core - rename functions and variables
  Input: tsc2004 - add support for tsc2004

Takashi Iwai (1):
  Input: elantech - add Fujitsu Lifebook U745 to force crc_enabled


Diffstat:


 .../bindings/input/touchscreen/tsc2005.txt |  34 +-
 drivers/input/mouse/elantech.c |   7 +
 drivers/input/serio/parkbd.c   |   2 +-
 drivers/input/touchscreen/Kconfig  |  17 +
 drivers/input/touchscreen/Makefile |   2 +
 drivers/input/touchscreen/tsc2004.c|  83 +++
 drivers/input/touchscreen/tsc2005.c| 714 +
 drivers/input/touchscreen/tsc200x-core.c   | 665 +++
 drivers/input/touchscreen/tsc200x-core.h   |  78 +++
 9 files changed, 899 insertions(+), 703 deletions(-)
 create mode 100644 drivers/input/touchscreen/tsc2004.c
 create mode 100644 drivers/input/touchscreen/tsc200x-core.c
 create mode 100644 drivers/input/touchscreen/tsc200x-core.h

-- 
Dmitry



Re: [PATCH] tmpfs: avoid a little creat and stat slowdown

2015-11-13 Thread Hugh Dickins
On Fri, 13 Nov 2015, Huang, Ying wrote:
> 
> c435a390574d is the direct parent of afa2db2fb6f1 in its original git.
> 43819159da2b is your patch applied on top of v4.3-rc7.  The comparison
> of 43819159da2b with v4.3-rc7 is as follows:
...
> So your patch improved 11.9% over its base v4.3-rc7.  I think the other
> differences are caused by other changes.  Sorry for the confusion.

Thanks for getting back on this: that's rather what I was hoping to hear.

Of course, no user will care which commit is responsible for a slowdown,
and we may need to look further; but I couldn't make sense of it before,
so this was a relief.

Hugh


[PATCH] pinctrl: mediatek: Add get_direction support.

2015-11-13 Thread Hongzhou Yang
The Linux gpio framework expects get_direction to return 0 for output and
1 for input, but the hardware uses 0 for input and 1 for output.
So negate the register bit to correct it.

gpio_chip.get is used to read the input value; there is no need to read
the output value, so remove that path.

Signed-off-by: Hongzhou Yang 
---
 Add get direction support.
 Remove gpio_chip.get value for output direction.

 drivers/pinctrl/mediatek/pinctrl-mtk-common.c |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
index 7726c6c..bbf0230 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
@@ -757,7 +757,7 @@ static int mtk_gpio_get_direction(struct gpio_chip *chip, unsigned offset)
reg_addr =  mtk_get_port(pctl, offset) + pctl->devdata->dir_offset;
bit = BIT(offset & 0xf);
regmap_read(pctl->regmap1, reg_addr, &read_val);
-   return !!(read_val & bit);
+   return !(read_val & bit);
 }
 
 static int mtk_gpio_get(struct gpio_chip *chip, unsigned offset)
@@ -767,12 +767,8 @@ static int mtk_gpio_get(struct gpio_chip *chip, unsigned offset)
unsigned int read_val = 0;
struct mtk_pinctrl *pctl = dev_get_drvdata(chip->dev);
 
-   if (mtk_gpio_get_direction(chip, offset))
-   reg_addr = mtk_get_port(pctl, offset) +
-   pctl->devdata->dout_offset;
-   else
-   reg_addr = mtk_get_port(pctl, offset) +
-   pctl->devdata->din_offset;
+   reg_addr = mtk_get_port(pctl, offset) +
+   pctl->devdata->din_offset;
 
bit = BIT(offset & 0xf);
regmap_read(pctl->regmap1, reg_addr, &read_val);
@@ -1007,6 +1003,7 @@ static struct gpio_chip mtk_gpio_chip = {
.owner  = THIS_MODULE,
.request= mtk_gpio_request,
.free   = mtk_gpio_free,
+   .get_direction  = mtk_gpio_get_direction,
.direction_input= mtk_gpio_direction_input,
.direction_output   = mtk_gpio_direction_output,
.get= mtk_gpio_get,
-- 
1.7.9.5
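
The inversion described in the commit message can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: hw_dir_to_gpiolib() and the enum names are hypothetical, and BIT() is redefined locally. Only the `offset & 0xf` bit selection and the `!(read_val & bit)` expression are taken from the patch.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1U << (n))

/* gpiolib convention as stated in the commit message. */
enum { GPIOLIB_DIR_OUT = 0, GPIOLIB_DIR_IN = 1 };

/*
 * Hypothetical helper: map a direction-register value to the gpiolib
 * convention.  The hardware sets the pin's bit for output and clears
 * it for input, so the bit is logically negated.
 */
static int hw_dir_to_gpiolib(unsigned int read_val, unsigned int offset)
{
	unsigned int bit = BIT(offset & 0xf);
	int dir = !(read_val & bit);

	/* Logical negation always yields exactly 0 or 1. */
	assert(dir == GPIOLIB_DIR_OUT || dir == GPIOLIB_DIR_IN);
	return dir;
}
```

With the old `!!(read_val & bit)` the same inputs would give the opposite answers, which is the mismatch the patch corrects.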



Re: [PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling

2015-11-13 Thread Andreas Dilger
On Nov 13, 2015, at 5:20 PM, Dan Williams  wrote:
> 
> On Fri, Nov 13, 2015 at 4:06 PM, Ross Zwisler
>  wrote:
>> Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.  These
>> are sent down via blkdev_issue_flush() in response to a fsync() or msync()
>> and are used by filesystems to order their metadata, among other things.
>> 
>> When we get an msync() or fsync() it is the responsibility of the DAX code
> >> to flush all dirty pages to media.  The PMEM driver then just has to issue a
>> wmb_pmem() in response to the REQ_FLUSH to ensure that before we return all
>> the flushed data has been durably stored on the media.
>> 
>> Signed-off-by: Ross Zwisler 
> 
> Hmm, I'm not seeing why we need this patch.  If the actual flushing of
> the cache is done by the core, why does the driver need to support
> REQ_FLUSH?  Especially since it's just a couple of instructions.  REQ_FUA
> only makes sense if individual writes can bypass the "drive" cache,
> but no I/O submitted to the driver proper is ever cached; we always
> flush it through to media.

If the upper level filesystem gets an error when submitting a flush
request, then it assumes the underlying hardware is broken and cannot
be as aggressive in IO submission, but instead has to wait for in-flight
IO to complete.  Since FUA/FLUSH is basically a no-op for pmem devices,
it doesn't make sense _not_ to support this functionality.

Cheers, Andreas









Re: [PATCH v3] mm/hugetlbfs Fix bugs in fallocate hole punch of areas with holes

2015-11-13 Thread Hugh Dickins
On Tue, 10 Nov 2015, Mike Kravetz wrote:

> Hugh Dickins pointed out problems with the new hugetlbfs fallocate
> hole punch code.  These problems are in the routine remove_inode_hugepages
> and mostly occur in the case where there are holes in the range of
> pages to be removed.  These holes could be the result of a previous hole
> punch or simply sparse allocation.  The current code could access pages
> outside the specified range.
> 
> remove_inode_hugepages handles both hole punch and truncate operations.
> Page index handling was fixed/cleaned up so that the loop index always
> matches the page being processed.  The code now only makes a single pass
> through the range of pages as it was determined page faults could not
> race with truncate.  A cond_resched() was added after removing up to
> PAGEVEC_SIZE pages.
> 
> Some totally unnecessary code in hugetlbfs_fallocate() that remained from
> early development was also removed.
> 
> V3:
>   Add more descriptive comments and minor improvements as suggested by
>   Naoya Horiguchi
> v2:
>   Make remove_inode_hugepages simpler after verifying truncate can not
>   race with page faults here.
> 
> Tested with fallocate tests submitted here:
> http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/
> And, some ftruncate tests under development
> 
> Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
> Cc: sta...@vger.kernel.org [4.3]
> Signed-off-by: Mike Kravetz 

Acked-by: Hugh Dickins 

> ---
>  fs/hugetlbfs/inode.c | 65 ++--
>  1 file changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 316adb9..de4bdfa 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -332,12 +332,17 @@ static void remove_huge_page(struct page *page)
>   * truncation is indicated by end of range being LLONG_MAX
>   *   In this case, we first scan the range and release found pages.
>   *   After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
> - *   maps and global counts.
> + *   maps and global counts.  Page faults can not race with truncation
> + *   in this routine.  hugetlb_no_page() prevents page faults in the
> + *   truncated range.  It checks i_size before allocation, and again after
> + *   with the page table lock for the page held.  The same lock must be
> + *   acquired to unmap a page.
>   * hole punch is indicated if end is not LLONG_MAX
>   *   In the hole punch case we scan the range and release found pages.
>   *   Only when releasing a page is the associated region/reserv map
>   *   deleted.  The region/reserv map for ranges without associated
> - *   pages are not modified.
> + *   pages are not modified.  Page faults can race with hole punch.
> + *   This is indicated if we find a mapped page.
>   * Note: If the passed end of range value is beyond the end of file, but
>   * not LLONG_MAX this routine still performs a hole punch operation.
>   */
> @@ -361,46 +366,37 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>   next = start;
>   while (next < end) {
>   /*
> -  * Make sure to never grab more pages that we
> -  * might possibly need.
> +  * Don't grab more pages than the number left in the range.
>*/
>   if (end - next < lookup_nr)
>   lookup_nr = end - next;
>  
>   /*
> -  * This pagevec_lookup() may return pages past 'end',
> -  * so we must check for page->index > end.
> +  * When no more pages are found, we are done.
>*/
> - if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) {
> - if (next == start)
> - break;
> - next = start;
> - continue;
> - }
> + if (!pagevec_lookup(&pvec, mapping, next, lookup_nr))
> + break;
>  
>   for (i = 0; i < pagevec_count(&pvec); ++i) {
>   struct page *page = pvec.pages[i];
>   u32 hash;
>  
> + /*
> +  * The page (index) could be beyond end.  This is
> +  * only possible in the punch hole case as end is
> +  * max page offset in the truncate case.
> +  */
> + next = page->index;
> + if (next >= end)
> + break;
> +
>   hash = hugetlb_fault_mutex_hash(h, current->mm,
>   &pseudo_vma,
>   mapping, next, 0);
>   mutex_lock(&hugetlb_fault_mutex_table[hash]);
>  
>   lock_page(page);
> -  
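
The single-pass loop shape in the hunk above can be modelled in userspace. This is only a sketch under stated assumptions: a sorted array of populated indexes stands in for the page cache, lookup() is a hypothetical substitute for pagevec_lookup() (like it, it may return entries at or past 'end'), BATCH stands in for PAGEVEC_SIZE, and the step that advances 'next' past a handled page is assumed because the quoted hunk is cut off before it.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4	/* stand-in for PAGEVEC_SIZE; the value is arbitrary */

/* Hypothetical lookup: up to nr populated indexes >= start, no upper bound. */
static size_t lookup(const unsigned long *pages, size_t npages,
		     unsigned long start, unsigned long *out, size_t nr)
{
	size_t i, found = 0;

	for (i = 0; i < npages && found < nr; i++)
		if (pages[i] >= start)
			out[found++] = pages[i];
	return found;
}

/* Count the populated pages in [start, end) that one pass would release. */
static size_t remove_range(const unsigned long *pages, size_t npages,
			   unsigned long start, unsigned long end)
{
	unsigned long batch[BATCH], next = start;
	size_t removed = 0, lookup_nr = BATCH, n, i;

	assert(start <= end);
	while (next < end) {
		/* Don't grab more pages than the number left in the range. */
		if (end - next < lookup_nr)
			lookup_nr = end - next;

		/* When no more pages are found, we are done. */
		n = lookup(pages, npages, next, batch, lookup_nr);
		if (!n)
			break;

		for (i = 0; i < n; i++) {
			/* The index may lie beyond end in the hole punch case. */
			next = batch[i];
			if (next >= end)
				break;
			removed++;	/* real code locks and releases the page here */
			next++;		/* assumed step, not visible in the quoted hunk */
		}
	}
	return removed;
}
```

Because the loop index always tracks the page being processed, holes in the range are skipped without a second scan and nothing at or past 'end' is touched.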

Re: [PATCH] mm/hugetlb: Unmap pages if page fault raced with hole punch

2015-11-13 Thread Hugh Dickins
On Tue, 10 Nov 2015, Mike Kravetz wrote:
> On 11/09/2015 02:55 PM, Mike Kravetz wrote:
> > On 11/08/2015 11:42 PM, Hugh Dickins wrote:
> >> On Fri, 30 Oct 2015, Mike Kravetz wrote:
> >>>
> >>> The 'next = start' code is actually from the original truncate_hugepages
> >>> routine.  This functionality was combined with that needed for hole punch
> >>> to create remove_inode_hugepages().
> >>>
> >>> The following code was in truncate_hugepages:
> >>>
> >>>   next = start;
> >>>   while (1) {
> >>>   if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
> >>>   if (next == start)
> >>>   break;
> >>>   next = start;
> >>>   continue;
> >>>   }
> >>>
> >>>
> >>> So, in the truncate case pages starting at 'start' are deleted until
> >>> pagevec_lookup fails.  Then, we call pagevec_lookup() again.  If no
> >>> pages are found we are done.  Else, we repeat the whole process.
> >>>
> >>> Does anyone recall the reason for going back and looking for pages at
> >>> indexes already deleted?  Git doesn't help as that was part of the initial
> >>> commit.  My thought is that truncate can race with page faults.  The
> >>> truncate code sets inode offset before unmapping and deleting pages.
> >>> So, faults after the new offset is set should fail.  But, I suppose a
> >>> fault could race with setting offset and deleting of pages.  Does this
> >>> sound right?  Or, is there some other reason I am missing?
> >>
> >> I believe your thinking is correct.  But remember that
> >> truncate_inode_pages_range() is shared by almost all filesystems,
> >> and different filesystems have different internal locking conventions,
> >> and different propensities to such a race: it's trying to cover for
> >> all of them.
> >>
> >> Typically, writing is well serialized (by i_mutex) against truncation,
> >> but faulting (like reading) sails through without enough of a lock.
> >> We resort to i_size checks to avoid the worst of it, but there's often
> >> a corner or two in which those checks are not quite good enough -
> >> it's easy to check i_size at the beginning, but it needs to be checked
> >> again at the end too, and what's been done undone - can be awkward.
> > 
> > Well, it looks like the hugetlb_no_page() routine is checking i_size both
> > before and after.  It appears to be doing the right thing to handle the
> > race, but I need to stare at the code some more to make sure.
> > 
> > Because of the way the truncate code went back and did an extra lookup
> > when done with the range, I assumed it was covering some race.  However,
> > that may not be the case.
> > 
> >>
> >> I hope that in the case of hugetlbfs, since you already have the
> >> additional fault_mutex to handle races between faults and punching,
> >> it should be possible to get away without that "pincer" restarting.
> > 
> > Yes, it looks like this may work as a straight loop over the range of
> > pages.  I just need to study the code some more to make sure I am not
> > missing something.
> 
> I have convinced myself that hugetlb_no_page is coded such that page
> faults can not race with truncate.  hugetlb_no_page handles the case
> where there is no PTE for a faulted in address.  The general flow in
> hugetlb_no_page for the no page found case is:
> - check index against i_size, end if beyond
> - allocate huge page
> - take page table lock for huge page
> - check index against i_size again; if beyond, free page and return
> - add huge page to page table
> - unlock page table lock for huge page
> 
> The flow for the truncate operation in hugetlb_vmtruncate is:
> - set i_size
> - take inode/mapping write lock
> - hugetlb_vmdelete_list() which removes page table entries.  The page
>   table lock will be taken for each huge page in the range
> - release inode/mapping write lock
> - remove_inode_hugepages() to actually remove pages
> 
> The truncate/page fault race we are concerned with is if a page is faulted
> in after hugetlb_vmtruncate sets i_size and unmaps the page, but before
> actually removing the page.  Obviously, any entry into hugetlb_no_page
> after i_size is set will check the value and not allow the fault.  In
> addition, if the value of i_size is set before the second check in
> hugetlb_no_page, it will do the right thing.  Therefore, the only place to
> race is after the second i_size check in hugetlb_no_page.
> 
> Note that the second check for i_size is with the page table lock for
> the huge page held.  It is not possible for hugetlb_vmtruncate to unmap
> the huge page before the page fault completes, as it must acquire the page
> table lock.  This is the same as a fault happening before the truncate
> operation starts and is handled correctly by hugetlb_vmtruncate.
> 
> Another way to look at this is by asking the question, Is it possible to
> fault on a page in the truncate range after it is unmapped by
> hugetlb_vmtruncate/hugetlb_vmdelete_list?  To unmap a page,

Re: module: save load_info for livepatch modules

2015-11-13 Thread Jessica Yu

+++ Miroslav Benes [13/11/15 13:46 +0100]:
> On Fri, 13 Nov 2015, Miroslav Benes wrote:
>
> > As for load_info, I don't have a strong opinion whether to keep it for all
> > modules or for livepatch modules only.
>
> I have. We cannot keep it, even for livepatch modules...
>
> In info->hdr there is a temporary copy of the whole module (see
> init_module syscall and the first parts of load_module). In load_module
> a final struct module * is created with parts of info->hdr copied (I'll
> get to that later). So if we saved info->hdr for later purposes we would
> just have two copies of the same module in the memory. The original one
> with !SHF_ALLOC sections and everything in vmalloc area, and the new
> final copy with SHF_ALLOC sections only. This is not good.
>
> If this is correct (and I think it is after some staring into the code) we
> need to do something different. We should build the info we need for
> delayed relocations from the final copy (or refactor the existing
> module code).
>
> The second problem... dynrela sections need to be marked with SHF_ALLOC
> flag, right? Perhaps it would be better not to do it and copy also
> SHF_RELA_LIVEPATCH sections. It is equivalent but not hidden somewhere
> else (in userspace "kpatch-build" tool).


Hm, OK. I understand your concern about leaving a redundant copy of
the module in memory and I agree that we need to do better. I think I
have a solution.

I'm looking at exactly what components we need to make the calls to
apply_relocate_add() work. It's quite simple, I think we only need to
keep the following:

1. A copy of the module's elf section headers i.e. info->sechdrs.
This should be easy to copy. memcpy [info->hdr->e_shnum *
sizeof(Elf_Shdr)] bytes from info->sechdrs. We can maybe put
this in a new field called module->sechdrs.

2. A copy of each __klp_rela section.
If we don't keep info, the current code will discard/not copy the rela
sections over to module core memory since they are !SHF_ALLOC. In
kpatch-build, it is very easy to simply |= the SHF_ALLOC flag with
each __klp_rela section and they will automatically get copied over to
module core memory, and their sh_addr's automatically get reassigned
as well. Thus the klp rela sections will be accessible at
sechdrs[index_of_klpsec].sh_addr. I think this is the easiest solution.

3. A copy of the symbol table.
Notice that module already has a "symtab" field. In kernels configured
with CONFIG_KALLSYMS, it points to a trimmed down symtab (the
mod->core_symtab) in module core memory. This symtab is not normally
complete; only "core" symbols are kept in it. See add_kallsyms()
(called in post_relocations()) for how core symbols are copied into
this symtab. Then, after the symbols have been copied, module->symtab
is reassigned to point to this core_symtab in do_init_module(). Since
CONFIG_LIVEPATCH requires CONFIG_KALLSYMS, I think we can assume that
mod->symtab will be pointing to mod->core_symtab at the end of the
module load process, since mod->symtab gets assigned to core_symtab in
do_init_module() if CONFIG_KALLSYMS is set.

So for livepatch, what we can do is make sure every symbol in a
livepatch module gets copied into this core symtab. It is important we
keep every symbol since apply_relocate_add() will be using the
original symbol indices. We can implement this by adding a check in
add_kallsyms() to see if we're dealing with a livepatch module. If
yes, just copy all the symbols over.

Then, we will also update Elf_Shdr corresponding to the symbol table
section (sechdrs[symindex].sh_addr) to make sure its sh_addr points to
mod->symtab, so apply_relocate_add() will be able to use it.

4. A copy of mod_arch_specific
I think we discussed this in another email somewhere, but we need to
keep a copy of this somewhere as well.


So to summarize, keep a copy of sechdrs in module->sechdrs, keep a
copy of mod_arch_specific, mark klp rela sections with SHF_ALLOC,
re-use module->symtab by making sure every symbol gets considered a
"core" symbol and gets copied over. And of course any memory we
allocate (sechdrs, arch stuff) we will free somewhere, perhaps in
free_module().

I haven't implemented it yet but I think it will work, and we don't
need to keep load_info in this scheme. What do you think?

Thanks,
Jessica
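
Steps 1 and 2 of the list above can be sketched in userspace with the definitions from <elf.h>. This is an illustration, not module loader code: copy_sechdrs() and mark_klp_rela_alloc() are hypothetical names, Elf64_Shdr stands in for the kernel's Elf_Shdr, and matching sections by a "__klp_rela" name prefix is an assumption made only for the sketch.

```c
#include <assert.h>
#include <elf.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: keep a private copy of the section header table. */
static Elf64_Shdr *copy_sechdrs(const Elf64_Shdr *sechdrs, unsigned int shnum)
{
	Elf64_Shdr *copy;

	assert(sechdrs && shnum > 0);
	copy = malloc(shnum * sizeof(*copy));
	if (copy)
		memcpy(copy, sechdrs, shnum * sizeof(*copy));
	return copy;	/* caller frees; NULL on allocation failure */
}

/*
 * Step 2 (done by the build tool in the proposal): set SHF_ALLOC on
 * every section whose name starts with 'prefix', so that a loader
 * would copy it along with the other allocated sections.
 * Returns the number of sections marked.
 */
static unsigned int mark_klp_rela_alloc(Elf64_Shdr *sechdrs,
					unsigned int shnum,
					const char *secstrings,
					const char *prefix)
{
	unsigned int i, marked = 0;
	size_t len = strlen(prefix);

	for (i = 1; i < shnum; i++) {	/* index 0 is the null section */
		if (strncmp(secstrings + sechdrs[i].sh_name, prefix, len))
			continue;
		sechdrs[i].sh_flags |= SHF_ALLOC;
		marked++;
	}
	return marked;
}
```

The copy made in step 1 is what would later be freed, as the summary notes for memory allocated at load time.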


[GIT PULL] platform-drivers-x86 for 4.4-2

2015-11-13 Thread Darren Hart
Hi Linus,

The following changes since commit d2f20619942fe4618160a7fa3dbdcbac335cff59:

  toshiba_acpi: Initialize hotkey_event_type variable (2015-11-05 16:09:24 -0800)

are available in the git repository at:

  git://git.infradead.org/users/dvhart/linux-platform-drivers-x86.git tags/platform-drivers-x86-v4.4-2

for you to fetch changes up to b82983401684ba06fcf3fbafa63edf371c0d4775:

  asus-wmi: fix error handling in store_sys_wmi() (2015-11-10 22:22:15 -0800)


platform-drivers-x86 for 4.4-2

Support for the unfortunately rather unique ESC key on the Ideapad Yoga 3 and
two DMI matches for rfkill support. Solitary fix for potential missed errors for
asus-wmi. Downgrade a thinkpad_acpi message to info.

asus-wmi:
 - fix error handling in store_sys_wmi()

ideapad-laptop:
 - Add Lenovo Yoga 900 to no_hw_rfkill dmi list
 - include Yoga 3 1170 in add rfkill whitelist
 - add support for Yoga 3 ESC key

thinkpad_acpi:
 - Don't yell on unsupported brightness interfaces


Arnd Bergmann (2):
  ideapad-laptop: add support for Yoga 3 ESC key
  ideapad-laptop: include Yoga 3 1170 in add rfkill whitelist

Dan Carpenter (1):
  asus-wmi: fix error handling in store_sys_wmi()

David Herrmann (1):
  thinkpad_acpi: Don't yell on unsupported brightness interfaces

Hans de Goede (1):
  ideapad-laptop: Add Lenovo Yoga 900 to no_hw_rfkill dmi list

 drivers/platform/x86/Kconfig  |  1 +
 drivers/platform/x86/asus-wmi.c   |  2 +-
 drivers/platform/x86/ideapad-laptop.c | 49 +++
 drivers/platform/x86/thinkpad_acpi.c  |  3 +--
 4 files changed, 47 insertions(+), 8 deletions(-)

-- 
Darren Hart
Intel Open Source Technology Center


Re: [RFC] usb: dwc2: hcd: fix split schedule issue

2015-11-13 Thread Doug Anderson
John,

On Thu, Nov 12, 2015 at 9:05 PM, John Youn  wrote:
> It seems to be an issue with single TT hubs. I've tried several
> multi-TT hubs with no issues.

Agreed.


> With a single TT hub I do see a problem though not the exact one
> described. I see corrupted and dropped packets on the FS side of
> the hub. In a microframe with SSPLIT.begin, taking up the max
> bandwidth for the microframe, when another SSPLIT for a different
> device is issued, the data gets corrupted on the other side of
> the TT. Probably due to exceeding the bandwidth in the microframe
> since a single TT hub's ports all share the bandwidth.

Seems like different single TT hubs react differently.  I got one
where the mouse kept working but the audio was just static...


> With this fix, the next SSPLIT goes out in the same microframe as
> the SSPLIT.end and the data goes through fine.
>
> However I don't think this will work as a general fix. Since it
> is just skipping things without rescheduling. For example SSPLIT
> now happens a microframe later but the CSPLIT is not adjusted so
> it comes a microframe too early.
>
> I think the correct fix is to create a proper schedule based on
> all the active endpoints to make sure we don't go over the
> bandwidth for a single TT hub. Or to make the adjustments earlier
> like in dwc2_sched_periodic_split().

I've started working on this and just before I needed to leave my desk
I got something that seemed to work.  I'll keep at it on Monday.

At the moment I'm making the assumption that we never have a multi_tt
hub attached to us.  My code will always just schedule one split per
microframe.  Would that be OK for now until we make the scheduler
better?

To handle things smarter, I think I need to research how to deal with
hubs attached to hubs attached to hubs.  For instance:

dwc2
  -> multi_tt hub
     -> single_tt hub
        -> device 1
        -> device 2
     -> single_tt hub
        -> device 3
        -> device 4

vs.

dwc2
  -> single_tt hub
     -> multi_tt hub
        -> device 1
        -> device 2
     -> multi_tt hub
        -> device 3
        -> device 4

In the first case I presume I could schedule device 1 and device 3 at
the same time, but not device 2 and device 4.  In the 2nd case I
presume I could schedule all 4 devices independently.  ...but I
haven't dug through the spec to confirm that, yet.
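
The interim rule of one split per microframe can be pictured with a small bitmap. This is a hypothetical sketch, not the dwc2 scheduler: struct tt_schedule and tt_reserve_ssplit() are invented names, and it deliberately ignores CSPLIT timing, bandwidth accounting, and any restrictions the USB specification places on which microframes may carry a start-split.

```c
#include <assert.h>
#include <stdint.h>

#define UFRAMES_PER_FRAME 8	/* a full-speed frame spans 8 microframes */

/* Hypothetical per-TT bookkeeping: bit n set means microframe n is taken. */
struct tt_schedule {
	uint8_t ssplit_busy;
};

/*
 * Reserve the first free microframe at or after 'wanted' for a
 * start-split.  Returns the microframe number, or -1 if every slot
 * from 'wanted' to the end of the frame is already taken.
 */
static int tt_reserve_ssplit(struct tt_schedule *tt, unsigned int wanted)
{
	unsigned int uf;

	assert(wanted < UFRAMES_PER_FRAME);
	for (uf = wanted; uf < UFRAMES_PER_FRAME; uf++) {
		if (tt->ssplit_busy & (1U << uf))
			continue;
		tt->ssplit_busy |= (uint8_t)(1U << uf);
		return (int)uf;
	}
	return -1;
}
```

Handling the nested hub topologies above would presumably need one such structure per independent TT, which is the part still to be checked against the spec.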


Re: [PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling

2015-11-13 Thread Dan Williams
On Fri, Nov 13, 2015 at 4:06 PM, Ross Zwisler
 wrote:
> Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.  These
> are sent down via blkdev_issue_flush() in response to a fsync() or msync()
> and are used by filesystems to order their metadata, among other things.
>
> When we get an msync() or fsync() it is the responsibility of the DAX code
> to flush all dirty pages to media.  The PMEM driver then just has to issue a
> wmb_pmem() in response to the REQ_FLUSH to ensure that before we return all
> the flushed data has been durably stored on the media.
>
> Signed-off-by: Ross Zwisler 

Hmm, I'm not seeing why we need this patch.  If the actual flushing of
the cache is done by the core, why does the driver need to support
REQ_FLUSH?  Especially since it's just a couple of instructions.  REQ_FUA
only makes sense if individual writes can bypass the "drive" cache,
but no I/O submitted to the driver proper is ever cached; we always
flush it through to media.


[PATCH v2 04/11] dax: support dirty DAX entries in radix tree

2015-11-13 Thread Ross Zwisler
Add support for tracking dirty DAX entries in the struct address_space
radix tree.  This tree is already used for dirty page writeback, and it
already supports the use of exceptional (non struct page*) entries.

In order to properly track dirty DAX pages we will insert new exceptional
entries into the radix tree that represent dirty DAX PTE or PMD pages.
These exceptional entries will also contain the writeback addresses for the
PTE or PMD faults that we can use at fsync/msync time.

There are currently two types of exceptional entries (shmem and shadow)
that can be placed into the radix tree, and this adds a third.  There
shouldn't be any collisions between these various exceptional entries
because only one type of exceptional entry should be able to be found in a
radix tree at a time depending on how it is being used.

Signed-off-by: Ross Zwisler 
---
 fs/block_dev.c |  3 ++-
 fs/inode.c |  1 +
 include/linux/dax.h|  5 
 include/linux/fs.h |  1 +
 include/linux/radix-tree.h |  8 ++
 mm/filemap.c   | 10 +---
 mm/truncate.c  | 62 ++
 7 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 073bb57..afaaf44 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -66,7 +66,8 @@ void kill_bdev(struct block_device *bdev)
 {
struct address_space *mapping = bdev->bd_inode->i_mapping;
 
-   if (mapping->nrpages == 0 && mapping->nrshadows == 0)
+   if (mapping->nrpages == 0 && mapping->nrshadows == 0 &&
+   mapping->nrdax == 0)
return;
 
invalidate_bh_lrus();
diff --git a/fs/inode.c b/fs/inode.c
index 78a17b8..f7c87a6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -496,6 +496,7 @@ void clear_inode(struct inode *inode)
spin_lock_irq(&inode->i_data.tree_lock);
BUG_ON(inode->i_data.nrpages);
BUG_ON(inode->i_data.nrshadows);
+   BUG_ON(inode->i_data.nrdax);
spin_unlock_irq(&inode->i_data.tree_lock);
BUG_ON(!list_empty(&inode->i_data.private_list));
BUG_ON(!(inode->i_state & I_FREEING));
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b415e52..e9d57f68 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -36,4 +36,9 @@ static inline bool vma_is_dax(struct vm_area_struct *vma)
 {
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
 }
+
+static inline bool dax_mapping(struct address_space *mapping)
+{
+   return mapping->host && IS_DAX(mapping->host);
+}
 #endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a84..f791698 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,6 +433,7 @@ struct address_space {
/* Protected by tree_lock together with the radix tree */
unsigned long   nrpages;/* number of total pages */
unsigned long   nrshadows;  /* number of shadow entries */
+   unsigned long   nrdax;  /* number of DAX entries */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops;   /* methods */
unsigned long   flags;  /* error bits/gfp mask */
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 33170db..19a533a 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -51,6 +51,14 @@
 #define RADIX_TREE_EXCEPTIONAL_ENTRY   2
 #define RADIX_TREE_EXCEPTIONAL_SHIFT   2
 
+#define RADIX_DAX_MASK 0xf
+#define RADIX_DAX_PTE  (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY)
+#define RADIX_DAX_PMD  (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY)
+#define RADIX_DAX_TYPE(entry) ((__force u64)entry & RADIX_DAX_MASK)
+#define RADIX_DAX_ADDR(entry) ((void __pmem *)((u64)entry & ~RADIX_DAX_MASK))
+#define RADIX_DAX_PTE_ENTRY(addr) ((void *)((__force u64)addr | RADIX_DAX_PTE))
+#define RADIX_DAX_PMD_ENTRY(addr) ((void *)((__force u64)addr | RADIX_DAX_PMD))
+
 static inline int radix_tree_is_indirect_ptr(void *ptr)
 {
return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR);
diff --git a/mm/filemap.c b/mm/filemap.c
index 327910c..d5e94fd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -11,6 +11,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -538,6 +539,9 @@ static int page_cache_tree_insert(struct address_space *mapping,
p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
if (!radix_tree_exceptional_entry(p))
return -EEXIST;
+
+   BUG_ON(dax_mapping(mapping));
+
if (shadowp)
*shadowp = p;
mapping->nrshadows--;
@@ -1201,9 +1205,9 @@ repeat:
if (radix_tree_deref_retry(page))
goto restart;
/*
-* A shadow entry of a recently evicted p
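
The tagging scheme introduced by the RADIX_DAX_* macros can be exercised in userspace. In this sketch the constants are copied from the hunk to include/linux/radix-tree.h, while dax_make_entry(), dax_entry_type() and dax_entry_addr() are hypothetical function equivalents of the macros, using uintptr_t instead of the kernel's annotated pointer types. The requirement that the address has its low four bits clear is an inference from RADIX_DAX_MASK, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

#define RADIX_TREE_EXCEPTIONAL_ENTRY	2
#define RADIX_DAX_MASK	0xf
#define RADIX_DAX_PTE	(0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY)
#define RADIX_DAX_PMD	(0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY)

/* Pack an address and an entry type into one tagged word. */
static uintptr_t dax_make_entry(uintptr_t addr, unsigned int type)
{
	/* The low four bits carry the type, so the address must leave them clear. */
	assert((addr & RADIX_DAX_MASK) == 0);
	return addr | type;
}

/* Recover the type (RADIX_DAX_PTE or RADIX_DAX_PMD) from a tagged word. */
static unsigned int dax_entry_type(uintptr_t entry)
{
	return (unsigned int)(entry & RADIX_DAX_MASK);
}

/* Recover the address by masking off the type bits. */
static uintptr_t dax_entry_addr(uintptr_t entry)
{
	return entry & ~(uintptr_t)RADIX_DAX_MASK;
}
```

Both type values include the exceptional-entry bit, so a lookup that only tests for exceptional entries still recognises them.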

[PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling

2015-11-13 Thread Ross Zwisler
Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.  These
are sent down via blkdev_issue_flush() in response to a fsync() or msync()
and are used by filesystems to order their metadata, among other things.

When we get an msync() or fsync() it is the responsibility of the DAX code
to flush all dirty pages to media.  The PMEM driver then just has to issue a
wmb_pmem() in response to the REQ_FLUSH to ensure that before we return all
the flushed data has been durably stored on the media.

Signed-off-by: Ross Zwisler 
---
 drivers/nvdimm/pmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0ba6a97..b914d66 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -80,7 +80,7 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
if (do_acct)
nd_iostat_end(bio, start);
 
-   if (bio_data_dir(bio))
+   if (bio_data_dir(bio) || (bio->bi_rw & (REQ_FLUSH|REQ_FUA)))
wmb_pmem();
 
bio_endio(bio);
@@ -189,6 +189,7 @@ static int pmem_attach_disk(struct device *dev,
blk_queue_physical_block_size(pmem->pmem_queue, PAGE_SIZE);
blk_queue_max_hw_sectors(pmem->pmem_queue, UINT_MAX);
blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+   blk_queue_flush(pmem->pmem_queue, REQ_FLUSH|REQ_FUA);
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
 
disk = alloc_disk(0);
-- 
2.1.0
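
The new condition in pmem_make_request() can be looked at in isolation. This is a userspace sketch only: the SKETCH_REQ_* bit values are made up for illustration (the kernel's values differ), SKETCH_REQ_WRITE stands in for what bio_data_dir(bio) reports, and bio_needs_wmb() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this sketch. */
#define SKETCH_REQ_WRITE	(1UL << 0)
#define SKETCH_REQ_FUA		(1UL << 1)
#define SKETCH_REQ_FLUSH	(1UL << 2)
#define SKETCH_REQ_ALL \
	(SKETCH_REQ_WRITE | SKETCH_REQ_FUA | SKETCH_REQ_FLUSH)

/*
 * Mirror of the patched condition: issue the write barrier for any
 * write, and also for a bio carrying FLUSH or FUA.
 */
static bool bio_needs_wmb(unsigned long bi_rw)
{
	bool is_write = (bi_rw & SKETCH_REQ_WRITE) != 0;

	/* The sketch only models these three flags. */
	assert((bi_rw & ~SKETCH_REQ_ALL) == 0);
	return is_write || (bi_rw & (SKETCH_REQ_FLUSH | SKETCH_REQ_FUA)) != 0;
}
```

Before the patch only the first half of the condition existed, so a bio flagged FLUSH or FUA that was not a write would have skipped the barrier.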



[PATCH v2 05/11] mm: add follow_pte_pmd()

2015-11-13 Thread Ross Zwisler
Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
huge page PMD leaf to be found and returned.

Signed-off-by: Ross Zwisler 
Suggested-by: Dave Hansen 
---
 include/linux/mm.h |  2 ++
 mm/memory.c| 38 ++
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80001de..393441c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1166,6 +1166,8 @@ int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows);
+int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
+pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index deb679c..7f4090e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3512,8 +3512,8 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-static int __follow_pte(struct mm_struct *mm, unsigned long address,
-   pte_t **ptepp, spinlock_t **ptlp)
+static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
+   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp)
 {
pgd_t *pgd;
pud_t *pud;
@@ -3529,12 +3529,20 @@ static int __follow_pte(struct mm_struct *mm, unsigned long address,
goto out;
 
pmd = pmd_offset(pud, address);
-   VM_BUG_ON(pmd_trans_huge(*pmd));
-   if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
-   goto out;
 
-   /* We cannot handle huge page PFN maps. Luckily they don't exist. */
-   if (pmd_huge(*pmd))
+   if (pmd_huge(*pmd)) {
+   if (!pmdpp)
+   goto out;
+
+   *ptlp = pmd_lock(mm, pmd);
+   if (pmd_huge(*pmd)) {
+   *pmdpp = pmd;
+   return 0;
+   }
+   spin_unlock(*ptlp);
+   }
+
+   if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
goto out;
 
ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
@@ -3557,9 +3565,23 @@ static inline int follow_pte(struct mm_struct *mm, unsigned long address,
 
/* (void) is needed to make gcc happy */
(void) __cond_lock(*ptlp,
-  !(res = __follow_pte(mm, address, ptepp, ptlp)));
+  !(res = __follow_pte_pmd(mm, address, ptepp, NULL,
+  ptlp)));
+   return res;
+}
+
+int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
+pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp)
+{
+   int res;
+
+   /* (void) is needed to make gcc happy */
+   (void) __cond_lock(*ptlp,
+  !(res = __follow_pte_pmd(mm, address, ptepp, pmdpp,
+  ptlp)));
return res;
 }
+EXPORT_SYMBOL(follow_pte_pmd);
 
 /**
  * follow_pfn - look up PFN at a user virtual address
-- 
2.1.0



[PATCH v2 02/11] mm: add pmd_mkclean()

2015-11-13 Thread Ross Zwisler
Currently PMD pages can be dirtied via pmd_mkdirty(), but cannot be
cleaned.  For DAX mmap dirty page tracking we need to be able to clean PMD
pages when we flush them to media so that we get a new write fault the next
time they are written to.

Signed-off-by: Ross Zwisler 
---
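Note (not part of the patch): a small userspace sketch of the set/clear
symmetry described above.  mock_pmd_t and the MOCK_PAGE_* bit positions are
illustrative assumptions, not the kernel's definitions; the point is only
that the clean helper clears exactly the bits the dirty helper sets and
leaves every other bit alone.

```c
#include <stdint.h>

/* Illustrative flag bits; the real values live in the kernel headers. */
#define MOCK_PAGE_DIRTY		(1ull << 6)
#define MOCK_PAGE_SOFT_DIRTY	(1ull << 11)

typedef struct { uint64_t val; } mock_pmd_t;

/* Set both dirty bits, as the dirty helper in the hunk below does. */
static mock_pmd_t mock_pmd_mkdirty(mock_pmd_t pmd)
{
	pmd.val |= MOCK_PAGE_DIRTY | MOCK_PAGE_SOFT_DIRTY;
	return pmd;
}

/* Clear the same two bits and nothing else. */
static mock_pmd_t mock_pmd_mkclean(mock_pmd_t pmd)
{
	pmd.val &= ~(MOCK_PAGE_DIRTY | MOCK_PAGE_SOFT_DIRTY);
	return pmd;
}

/* Nonzero if the hardware dirty bit is set. */
static int mock_pmd_dirty(mock_pmd_t pmd)
{
	return (pmd.val & MOCK_PAGE_DIRTY) != 0;
}
```
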
 arch/x86/include/asm/pgtable.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..c548e4c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -277,6 +277,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+   return pmd_clear_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+}
+
 static inline pmd_t pmd_mkhuge(pmd_t pmd)
 {
return pmd_set_flags(pmd, _PAGE_PSE);
-- 
2.1.0



[PATCH v2 01/11] pmem: add wb_cache_pmem() to the PMEM API

2015-11-13 Thread Ross Zwisler
The function __arch_wb_cache_pmem() was already an internal implementation
detail of the x86 PMEM API, but this functionality needs to be exported as
part of the general PMEM API to handle the fsync/msync case for DAX mmaps.

One thing worth noting is that we really do want this to be part of the
PMEM API as opposed to a stand-alone function like clflush_cache_range()
because of ordering restrictions.  By having wb_cache_pmem() as part of the
PMEM API we can leave it unordered, call it multiple times to write back
large amounts of memory, and then order the multiple calls with a single
wmb_pmem().

Signed-off-by: Ross Zwisler 
---
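Note (not part of the patch): a minimal userspace sketch of the ordering
argument above.  mock_wb_cache_pmem() and mock_wmb_pmem() are hypothetical
stand-ins that only count cache lines and barriers, and the 64-byte line
size is an assumption.  The sketch shows several unordered write-backs
being ordered by one barrier at the end, rather than one barrier per call.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_CACHE_LINE 64u	/* assumed cache line size, illustration only */

static unsigned int lines_written_back;	/* cache lines "written back" so far */
static unsigned int barriers_issued;	/* ordering barriers issued so far */

/* Hypothetical stand-in for wb_cache_pmem(): unordered, just counts lines. */
static void mock_wb_cache_pmem(uintptr_t addr, size_t size)
{
	uintptr_t p = addr & ~(uintptr_t)(MOCK_CACHE_LINE - 1);
	uintptr_t end = addr + size;

	for (; p < end; p += MOCK_CACHE_LINE)
		lines_written_back++;
}

/* Hypothetical stand-in for wmb_pmem(): orders all earlier write-backs. */
static void mock_wmb_pmem(void)
{
	barriers_issued++;
}

/* Write back several ranges, then order them all with a single barrier. */
static void flush_ranges(const uintptr_t *addrs, const size_t *sizes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		mock_wb_cache_pmem(addrs[i], sizes[i]);
	mock_wmb_pmem();
}
```
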
 arch/x86/include/asm/pmem.h | 11 ++-
 include/linux/pmem.h| 22 +-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec..6c7ade0 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -67,18 +67,19 @@ static inline void arch_wmb_pmem(void)
 }
 
 /**
- * __arch_wb_cache_pmem - write back a cache range with CLWB
+ * arch_wb_cache_pmem - write back a cache range with CLWB
  * @vaddr: virtual start address
  * @size:  number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
  * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.  This API is internal to the x86 PMEM implementation.
+ * arch_wmb_pmem() call.
  */
-static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
+static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
 {
u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
unsigned long clflush_mask = x86_clflush_size - 1;
+   void *vaddr = (void __force *)addr;
void *vend = vaddr + size;
void *p;
 
@@ -115,7 +116,7 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
len = copy_from_iter_nocache(vaddr, bytes, i);
 
if (__iter_needs_pmem_wb(i))
-   __arch_wb_cache_pmem(vaddr, bytes);
+   arch_wb_cache_pmem(addr, bytes);
 
return len;
 }
@@ -138,7 +139,7 @@ static inline void arch_clear_pmem(void __pmem *addr, size_t size)
else
memset(vaddr, 0, size);
 
-   __arch_wb_cache_pmem(vaddr, size);
+   arch_wb_cache_pmem(addr, size);
 }
 
 static inline bool __arch_has_wmb_pmem(void)
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 85f810b3..2cd5003 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -53,12 +53,18 @@ static inline void arch_clear_pmem(void __pmem *addr, size_t size)
 {
BUG();
 }
+
+static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+{
+   BUG();
+}
 #endif
 
 /*
  * Architectures that define ARCH_HAS_PMEM_API must provide
  * implementations for arch_memcpy_to_pmem(), arch_wmb_pmem(),
- * arch_copy_from_iter_pmem(), arch_clear_pmem() and arch_has_wmb_pmem().
+ * arch_copy_from_iter_pmem(), arch_clear_pmem(), arch_wb_cache_pmem()
+ * and arch_has_wmb_pmem().
  */
 static inline void memcpy_from_pmem(void *dst, void __pmem const *src, size_t size)
 {
@@ -202,4 +208,18 @@ static inline void clear_pmem(void __pmem *addr, size_t size)
else
default_clear_pmem(addr, size);
 }
+
+/**
+ * wb_cache_pmem - write back processor cache for PMEM memory range
+ * @addr:  virtual start address
+ * @size:  number of bytes to write back
+ *
+ * Write back the processor cache range starting at 'addr' for 'size' bytes.
+ * This function requires explicit ordering with a wmb_pmem() call.
+ */
+static inline void wb_cache_pmem(void __pmem *addr, size_t size)
+{
+   if (arch_has_pmem_api())
+   arch_wb_cache_pmem(addr, size);
+}
 #endif /* __PMEM_H__ */
-- 
2.1.0



[PATCH v2 06/11] mm: add pgoff_mkclean()

2015-11-13 Thread Ross Zwisler
Introduce pgoff_mkclean() which conceptually is similar to page_mkclean()
except it works in the absence of struct page and it can also be used to
clean PMDs.  This is needed for DAX's dirty page handling.

pgoff_mkclean() doesn't return an error for a missing PTE/PMD when looping
through the VMAs because it's not a requirement that each of the
potentially many VMAs associated with a given struct address_space have a
mapping set up for our pgoff.

Signed-off-by: Ross Zwisler 
---
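Note (not part of the patch): a worked userspace example of the
offset-to-address arithmetic that the pgoff_address() helper in this patch
relies on.  struct mock_vma and the 4 KiB page size are assumptions, and
where the kernel helper asserts with VM_BUG_ON_VMA() this sketch simply
returns 0 for an offset outside the mapping.

```c
#define MOCK_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct mock_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_end;	/* first address past the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Return the user address backing pgoff, or 0 if pgoff is outside the VMA. */
static unsigned long mock_pgoff_address(unsigned long pgoff,
					const struct mock_vma *vma)
{
	unsigned long address;

	if (pgoff < vma->vm_pgoff)
		return 0;

	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << MOCK_PAGE_SHIFT);
	if (address >= vma->vm_end)
		return 0;

	return address;
}
```

For a four-page mapping at 0x40000000 that starts at file page 16, file
page 18 lands two pages in, at 0x40002000.
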
 include/linux/rmap.h |  5 +
 mm/rmap.c| 51 +++
 2 files changed, 56 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 29446ae..171a4ac 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -223,6 +223,11 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 int page_mkclean(struct page *);
 
 /*
+ * Cleans and write protects the PTEs of shared mappings.
+ */
+void pgoff_mkclean(pgoff_t, struct address_space *);
+
+/*
  * called in munlock()/munmap() path to check for other vmas holding
  * the page mlocked.
  */
diff --git a/mm/rmap.c b/mm/rmap.c
index f5b5c1f..8114862 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -586,6 +586,16 @@ vma_address(struct page *page, struct vm_area_struct *vma)
return address;
 }
 
+static inline unsigned long
+pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma)
+{
+   unsigned long address;
+
+   address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+   VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+   return address;
+}
+
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 static void percpu_flush_tlb_batch_pages(void *data)
 {
@@ -1040,6 +1050,47 @@ int page_mkclean(struct page *page)
 }
 EXPORT_SYMBOL_GPL(page_mkclean);
 
+void pgoff_mkclean(pgoff_t pgoff, struct address_space *mapping)
+{
+   struct vm_area_struct *vma;
+   int ret = 0;
+
+   i_mmap_lock_read(mapping);
+   vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+   struct mm_struct *mm = vma->vm_mm;
+   pmd_t pmd, *pmdp = NULL;
+   pte_t pte, *ptep = NULL;
+   unsigned long address;
+   spinlock_t *ptl;
+
+   address = pgoff_address(pgoff, vma);
+
+   /* when this returns successfully ptl is locked */
+   ret = follow_pte_pmd(mm, address, &ptep, &pmdp, &ptl);
+   if (ret)
+   continue;
+
+   if (pmdp) {
+   flush_cache_page(vma, address, pmd_pfn(*pmdp));
+   pmd = pmdp_huge_clear_flush(vma, address, pmdp);
+   pmd = pmd_wrprotect(pmd);
+   pmd = pmd_mkclean(pmd);
+   set_pmd_at(mm, address, pmdp, pmd);
+   spin_unlock(ptl);
+   } else {
+   BUG_ON(!ptep);
+   flush_cache_page(vma, address, pte_pfn(*ptep));
+   pte = ptep_clear_flush(vma, address, ptep);
+   pte = pte_wrprotect(pte);
+   pte = pte_mkclean(pte);
+   set_pte_at(mm, address, ptep, pte);
+   pte_unmap_unlock(ptep, ptl);
+   }
+   }
+   i_mmap_unlock_read(mapping);
+}
+EXPORT_SYMBOL_GPL(pgoff_mkclean);
+
 /**
  * page_move_anon_rmap - move a page to our anon_vma
  * @page:  the page to move to our anon_vma
-- 
2.1.0



[PATCH v2 08/11] dax: add support for fsync/sync

2015-11-13 Thread Ross Zwisler
To properly handle fsync/msync in an efficient way DAX needs to track dirty
pages so it is able to flush them durably to media on demand.

The tracking of dirty pages is done via the radix tree in struct
address_space.  This radix tree is already used by the page writeback
infrastructure for tracking dirty pages associated with an open file, and
it already has support for exceptional (non struct page*) entries.  We
build upon these features to add exceptional entries to the radix tree for
DAX dirty PMD or PTE pages at fault time.

When called as part of the msync/fsync flush path DAX queries the radix
tree for dirty entries, flushing them and then marking the PTE or PMD page
table entries as clean.  The step of cleaning the PTE or PMD entries is
necessary so that on subsequent writes to the same page we get a new write
fault allowing us to once again dirty the DAX tag in the radix tree.

Signed-off-by: Ross Zwisler 
---
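Note (not part of the patch): a toy userspace model of the dirty-tracking
cycle described above, under the assumption that one boolean per page can
stand in for the radix tree tag and one for the page table write
permission.  struct mock_dax_file, mock_write() and mock_fsync() are
hypothetical names.  It shows why the flush path has to write-protect the
entry again: without that step the second store would not fault and the
page would never be re-tagged.

```c
#include <stdbool.h>

#define MOCK_NR_PAGES 8

struct mock_dax_file {
	bool tree_dirty[MOCK_NR_PAGES];		/* dirty tag in the "radix tree" */
	bool pte_writable[MOCK_NR_PAGES];	/* page table entry allows writes */
	unsigned int write_faults;		/* write faults taken so far */
	unsigned int flushes;			/* pages flushed to media so far */
};

/* A store to a page: faults only if the entry is currently write-protected. */
static void mock_write(struct mock_dax_file *f, unsigned int pgoff)
{
	if (!f->pte_writable[pgoff]) {
		f->write_faults++;
		f->tree_dirty[pgoff] = true;	/* fault handler tags the page */
		f->pte_writable[pgoff] = true;
	}
}

/* fsync: flush every tagged page, clear its tag, write-protect its entry. */
static void mock_fsync(struct mock_dax_file *f)
{
	unsigned int i;

	for (i = 0; i < MOCK_NR_PAGES; i++) {
		if (!f->tree_dirty[i])
			continue;
		f->flushes++;
		f->tree_dirty[i] = false;
		f->pte_writable[i] = false;	/* next store faults again */
	}
}
```
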
 fs/dax.c| 140 +---
 include/linux/dax.h |   1 +
 mm/huge_memory.c|  14 +++---
 3 files changed, 141 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 131fd35a..9ce6d1b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -24,7 +24,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -287,6 +289,53 @@ static int copy_user_bh(struct page *to, struct buffer_head *bh,
return 0;
 }
 
+static int dax_dirty_pgoff(struct address_space *mapping, unsigned long pgoff,
+   void __pmem *addr, bool pmd_entry)
+{
+   struct radix_tree_root *page_tree = &mapping->page_tree;
+   int error = 0;
+   void *entry;
+
+   __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+   spin_lock_irq(&mapping->tree_lock);
+   entry = radix_tree_lookup(page_tree, pgoff);
+   if (addr == NULL) {
+   if (entry)
+   goto dirty;
+   else {
+   WARN(1, "DAX pfn_mkwrite failed to find an entry");
+   goto out;
+   }
+   }
+
+   if (entry) {
+   if (pmd_entry && RADIX_DAX_TYPE(entry) == RADIX_DAX_PTE) {
+   radix_tree_delete(&mapping->page_tree, pgoff);
+   mapping->nrdax--;
+   } else
+   goto dirty;
+   }
+
+   BUG_ON(RADIX_DAX_TYPE(addr));
+   if (pmd_entry)
+   error = radix_tree_insert(page_tree, pgoff,
+   RADIX_DAX_PMD_ENTRY(addr));
+   else
+   error = radix_tree_insert(page_tree, pgoff,
+   RADIX_DAX_PTE_ENTRY(addr));
+
+   if (error)
+   goto out;
+
+   mapping->nrdax++;
+ dirty:
+   radix_tree_tag_set(page_tree, pgoff, PAGECACHE_TAG_DIRTY);
+ out:
+   spin_unlock_irq(&mapping->tree_lock);
+   return error;
+}
+
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
struct vm_area_struct *vma, struct vm_fault *vmf)
 {
@@ -327,7 +376,10 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
}
 
error = vm_insert_mixed(vma, vaddr, pfn);
+   if (error)
+   goto out;
 
+   error = dax_dirty_pgoff(mapping, vmf->pgoff, addr, false);
  out:
i_mmap_unlock_read(mapping);
 
@@ -450,6 +502,7 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
delete_from_page_cache(page);
unlock_page(page);
page_cache_release(page);
+   page = NULL;
}
 
/*
@@ -537,7 +590,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
pgoff_t size, pgoff;
sector_t block, sector;
unsigned long pfn;
-   int result = 0;
+   int error, result = 0;
 
/* Fall back to PTEs if we're going to COW */
if (write && !(vma->vm_flags & VM_SHARED))
@@ -638,6 +691,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
}
 
result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
+
+   error = dax_dirty_pgoff(mapping, pgoff, kaddr, true);
+   if (error)
+   goto fallback;
}
 
  out:
@@ -689,15 +746,12 @@ EXPORT_SYMBOL_GPL(dax_pmd_fault);
  * dax_pfn_mkwrite - handle first write to DAX page
  * @vma: The virtual memory area where the fault occurred
  * @vmf: The description of the fault
- *
  */
 int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-   struct super_block *sb = file_inode(vma->vm_file)->i_sb;
+   struct file *file = vma->vm_file;
 
-   sb_start_pagefault(sb);
-   file_update_time(vma->vm_file);
-   sb_end_pagefault(sb);
+   dax_dirty_pgoff(file->f_mapping, vmf->pgoff, NULL, false);
return VM_FAULT_NOPAGE;
 }
 EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
@@ 

[PATCH v2 09/11] ext2: add support for DAX fsync/msync

2015-11-13 Thread Ross Zwisler
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can properly track when a user
write faults on a previously cleaned address.  They also need to call
dax_fsync() in the filesystem fsync() path.  This dax_fsync() call uses
addresses retrieved from get_block() so it needs to be ordered with
respect to truncate.  This is accomplished by using the same locking that
was set up for DAX page faults.

Signed-off-by: Ross Zwisler 
---
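Note (not part of the patch): the pfn_mkwrite handlers touched by this
series only call into DAX when the faulting offset is still inside the
file, which is what makes the ordering against truncate matter.  A worked
userspace sketch of that size check follows, assuming 4 KiB pages;
mock_size_in_pages() and mock_fault_past_eof() are hypothetical names.

```c
#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE (1ull << MOCK_PAGE_SHIFT)

/* Number of pages needed to cover i_size bytes, rounding up. */
static unsigned long long mock_size_in_pages(unsigned long long i_size)
{
	return (i_size + MOCK_PAGE_SIZE - 1) >> MOCK_PAGE_SHIFT;
}

/* Nonzero if a fault at pgoff lies beyond end of file (the SIGBUS case). */
static int mock_fault_past_eof(unsigned long long pgoff,
			       unsigned long long i_size)
{
	return pgoff >= mock_size_in_pages(i_size);
}
```

A 4097-byte file covers two pages, so a fault at page offset 1 is valid
and a fault at page offset 2 is past end of file.
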
 fs/ext2/file.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 11a42c5..6c30ea2 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -102,8 +102,8 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
 {
struct inode *inode = file_inode(vma->vm_file);
struct ext2_inode_info *ei = EXT2_I(inode);
-   int ret = VM_FAULT_NOPAGE;
loff_t size;
+   int ret;
 
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
@@ -113,6 +113,8 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
if (vmf->pgoff >= size)
ret = VM_FAULT_SIGBUS;
+   else
+   ret = dax_pfn_mkwrite(vma, vmf);
 
up_read(&ei->dax_sem);
sb_end_pagefault(inode->i_sb);
@@ -161,6 +163,16 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct super_block *sb = file->f_mapping->host->i_sb;
struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 
+#ifdef CONFIG_FS_DAX
+   if (dax_mapping(mapping)) {
+   struct ext2_inode_info *ei = EXT2_I(file_inode(file));
+
+   down_read(&ei->dax_sem);
+   dax_fsync(mapping, start, end);
+   up_read(&ei->dax_sem);
+   }
+#endif
+
ret = generic_file_fsync(file, start, end, datasync);
if (ret == -EIO || test_and_clear_bit(AS_EIO, &mapping->flags)) {
/* We don't really know where the IO error happened... */
-- 
2.1.0



[PATCH v2 11/11] xfs: add support for DAX fsync/msync

2015-11-13 Thread Ross Zwisler
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can properly track when a user
write faults on a previously cleaned address.  They also need to call
dax_fsync() in the filesystem fsync() path.  This dax_fsync() call uses
addresses retrieved from get_block() so it needs to be ordered with
respect to truncate.  This is accomplished by using the same locking that
was set up for DAX page faults.

Signed-off-by: Ross Zwisler 
---
 fs/xfs/xfs_file.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 39743ef..2b490a1 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -209,7 +209,8 @@ xfs_file_fsync(
loff_t  end,
int datasync)
 {
-   struct inode*inode = file->f_mapping->host;
+   struct address_space*mapping = file->f_mapping;
+   struct inode*inode = mapping->host;
struct xfs_inode*ip = XFS_I(inode);
struct xfs_mount*mp = ip->i_mount;
int error = 0;
@@ -218,7 +219,13 @@ xfs_file_fsync(
 
trace_xfs_file_fsync(ip);
 
-   error = filemap_write_and_wait_range(inode->i_mapping, start, end);
+   if (dax_mapping(mapping)) {
+   xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+   dax_fsync(mapping, start, end);
+   xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+   }
+
+   error = filemap_write_and_wait_range(mapping, start, end);
if (error)
return error;
 
@@ -1603,9 +1610,8 @@ xfs_filemap_pmd_fault(
 /*
  * pfn_mkwrite was originally inteneded to ensure we capture time stamp
  * updates on write faults. In reality, it's need to serialise against
- * truncate similar to page_mkwrite. Hence we open-code dax_pfn_mkwrite()
- * here and cycle the XFS_MMAPLOCK_SHARED to ensure we serialise the fault
- * barrier in place.
+ * truncate similar to page_mkwrite. Hence we cycle the XFS_MMAPLOCK_SHARED
+ * to ensure we serialise the fault barrier in place.
  */
 static int
 xfs_filemap_pfn_mkwrite(
@@ -1628,6 +1634,8 @@ xfs_filemap_pfn_mkwrite(
size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
if (vmf->pgoff >= size)
ret = VM_FAULT_SIGBUS;
+   else if (IS_DAX(inode))
+   ret = dax_pfn_mkwrite(vma, vmf);
xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
sb_end_pagefault(inode->i_sb);
return ret;
-- 
2.1.0



[PATCH v2 10/11] ext4: add support for DAX fsync/msync

2015-11-13 Thread Ross Zwisler
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can properly track when a user
write faults on a previously cleaned address.  They also need to call
dax_fsync() in the filesystem fsync() path.  This dax_fsync() call uses
addresses retrieved from get_block() so it needs to be ordered with
respect to truncate.  This is accomplished by using the same locking that
was set up for DAX page faults.

Signed-off-by: Ross Zwisler 
---
 fs/ext4/file.c  |  4 +++-
 fs/ext4/fsync.c | 12 ++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 749b222..8c8965c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -291,8 +291,8 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
 {
struct inode *inode = file_inode(vma->vm_file);
struct super_block *sb = inode->i_sb;
-   int ret = VM_FAULT_NOPAGE;
loff_t size;
+   int ret;
 
sb_start_pagefault(sb);
file_update_time(vma->vm_file);
@@ -300,6 +300,8 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
if (vmf->pgoff >= size)
ret = VM_FAULT_SIGBUS;
+   else
+   ret = dax_pfn_mkwrite(vma, vmf);
up_read(&EXT4_I(inode)->i_mmap_sem);
sb_end_pagefault(sb);
 
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 8850254..e87c29b 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ext4.h"
 #include "ext4_jbd2.h"
@@ -86,7 +87,8 @@ static int ext4_sync_parent(struct inode *inode)
 
 int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 {
-   struct inode *inode = file->f_mapping->host;
+   struct address_space *mapping = file->f_mapping;
+   struct inode *inode = mapping->host;
struct ext4_inode_info *ei = EXT4_I(inode);
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
int ret = 0, err;
@@ -112,7 +114,13 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
goto out;
}
 
-   ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+   if (dax_mapping(mapping)) {
+   down_read(&ei->i_mmap_sem);
+   dax_fsync(mapping, start, end);
+   up_read(&ei->i_mmap_sem);
+   }
+
+   ret = filemap_write_and_wait_range(mapping, start, end);
if (ret)
return ret;
/*
-- 
2.1.0



[PATCH v2 07/11] mm: add find_get_entries_tag()

2015-11-13 Thread Ross Zwisler
Add find_get_entries_tag() to the family of functions that include
find_get_entries(), find_get_pages() and find_get_pages_tag().  This is
needed for DAX dirty page handling because we need a list of both page
offsets and radix tree entries ('indices' and 'entries' in this function)
that are marked with the PAGECACHE_TAG_TOWRITE tag.

Signed-off-by: Ross Zwisler 
---
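Note (not part of the patch): a userspace sketch of the calling convention
described above, with a flat array standing in for the radix tree.  struct
mock_slot and mock_find_entries_tag() are hypothetical, and RCU, reference
counting and exceptional entries are all left out.  It shows the two
parallel output arrays, where entries[i] and indices[i] describe the same
slot, and the early exit once nr_entries results have been collected.

```c
#include <stddef.h>

struct mock_slot {
	void *entry;		/* NULL means the slot is empty */
	unsigned int tags;	/* bitmask of tags set on this slot */
};

/*
 * Collect up to nr_entries non-empty slots at index >= start that carry
 * tag.  Returns the number of results written to entries[] and indices[].
 */
static unsigned int mock_find_entries_tag(const struct mock_slot *slots,
					  size_t nr_slots, size_t start,
					  unsigned int tag,
					  unsigned int nr_entries,
					  void **entries, size_t *indices)
{
	unsigned int ret = 0;
	size_t i;

	if (!nr_entries)
		return 0;

	for (i = start; i < nr_slots; i++) {
		if (!slots[i].entry || !(slots[i].tags & (1u << tag)))
			continue;
		indices[ret] = i;
		entries[ret] = slots[i].entry;
		if (++ret == nr_entries)
			break;
	}
	return ret;
}
```
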
 include/linux/pagemap.h |  3 +++
 mm/filemap.c| 61 +
 2 files changed, 64 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a6c78e0..6fea3be 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -354,6 +354,9 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
   unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
int tag, unsigned int nr_pages, struct page **pages);
+unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
+   int tag, unsigned int nr_entries,
+   struct page **entries, pgoff_t *indices);
 
 struct page *grab_cache_page_write_begin(struct address_space *mapping,
pgoff_t index, unsigned flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index d5e94fd..89ab448 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1454,6 +1454,67 @@ repeat:
 }
 EXPORT_SYMBOL(find_get_pages_tag);
 
+/**
+ * find_get_entries_tag - find and return entries that match @tag
+ * @mapping:   the address_space to search
+ * @start: the starting page cache index
+ * @tag:   the tag index
+ * @nr_entries:the maximum number of entries
+ * @entries:   where the resulting entries are placed
+ * @indices:   the cache indices corresponding to the entries in @entries
+ *
+ * Like find_get_entries, except we only return entries which are tagged with
+ * @tag.
+ */
+unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
+   int tag, unsigned int nr_entries,
+   struct page **entries, pgoff_t *indices)
+{
+   void **slot;
+   unsigned int ret = 0;
+   struct radix_tree_iter iter;
+
+   if (!nr_entries)
+   return 0;
+
+   rcu_read_lock();
+restart:
+   radix_tree_for_each_tagged(slot, &mapping->page_tree,
+  &iter, start, tag) {
+   struct page *page;
+repeat:
+   page = radix_tree_deref_slot(slot);
+   if (unlikely(!page))
+   continue;
+   if (radix_tree_exception(page)) {
+   if (radix_tree_deref_retry(page))
+   goto restart;
+   /*
+* A shadow entry of a recently evicted page, a swap
+* entry from shmem/tmpfs or a DAX entry.  Return it
+* without attempting to raise page count.
+*/
+   goto export;
+   }
+   if (!page_cache_get_speculative(page))
+   goto repeat;
+
+   /* Has the page moved? */
+   if (unlikely(page != *slot)) {
+   page_cache_release(page);
+   goto repeat;
+   }
+export:
+   indices[ret] = iter.index;
+   entries[ret] = page;
+   if (++ret == nr_entries)
+   break;
+   }
+   rcu_read_unlock();
+   return ret;
+}
+EXPORT_SYMBOL(find_get_entries_tag);
+
 /*
  * CD/DVDs are error prone. When a medium error occurs, the driver may fail
  * a _large_ part of the i/o request. Imagine the worst scenario:
-- 
2.1.0



[PATCH v2 00/11] DAX fsync/msync support

2015-11-13 Thread Ross Zwisler
This patch series adds support for fsync/msync to DAX.

Patches 1 through 7 add various utilities that the DAX code will eventually
need, and the DAX code itself is added by patch 8.  Patches 9-11 update the
three filesystems that currently support DAX, ext2, ext4 and XFS, to use
the new DAX fsync/msync code.

These patches build on the recent DAX locking changes from Dave Chinner,
Jan Kara and myself.  Dave's changes for XFS and my changes for ext2 have
been merged in the v4.4 window, but Jan's are still unmerged.  You can grab
them here:

http://www.spinics.net/lists/linux-ext4/msg49951.html

Ross Zwisler (11):
  pmem: add wb_cache_pmem() to the PMEM API
  mm: add pmd_mkclean()
  pmem: enable REQ_FUA/REQ_FLUSH handling
  dax: support dirty DAX entries in radix tree
  mm: add follow_pte_pmd()
  mm: add pgoff_mkclean()
  mm: add find_get_entries_tag()
  dax: add support for fsync/sync
  ext2: add support for DAX fsync/msync
  ext4: add support for DAX fsync/msync
  xfs: add support for DAX fsync/msync

 arch/x86/include/asm/pgtable.h |   5 ++
 arch/x86/include/asm/pmem.h|  11 ++--
 drivers/nvdimm/pmem.c  |   3 +-
 fs/block_dev.c |   3 +-
 fs/dax.c   | 140 +++--
 fs/ext2/file.c |  14 -
 fs/ext4/file.c |   4 +-
 fs/ext4/fsync.c|  12 +++-
 fs/inode.c |   1 +
 fs/xfs/xfs_file.c  |  18 --
 include/linux/dax.h|   6 ++
 include/linux/fs.h |   1 +
 include/linux/mm.h |   2 +
 include/linux/pagemap.h|   3 +
 include/linux/pmem.h   |  22 ++-
 include/linux/radix-tree.h |   8 +++
 include/linux/rmap.h   |   5 ++
 mm/filemap.c   |  71 -
 mm/huge_memory.c   |  14 ++---
 mm/memory.c|  38 ---
 mm/rmap.c  |  51 +++
 mm/truncate.c  |  62 ++
 22 files changed, 425 insertions(+), 69 deletions(-)

-- 
2.1.0



[GIT PULL] final round of SCSI updates for the 4.3+ merge window

2015-11-13 Thread James Bottomley
Sorry for the delay in this pull request, which was mostly caused by
getting in the merger of the mpt2/mpt3sas drivers, seen as an essential
item of maintenance work to do before the drivers diverge too much.
Unfortunately, this caused a compile failure (detected by linux-next),
which then had to be fixed up and incubated.  In addition to the
mpt2/3sas rework, there are updates from pm80xx, lpfc, bnx2fc, hpsa,
ipr, aacraid, megaraid_sas, storvsc and ufs plus an assortment of
changes including some year 2038 issues, a fix for a remove before
detach issue in some drivers and a couple of other minor issues.

This tree also includes a subtree pull from Martin, who has been
wrangling the mpt2/3 merger plus sorting out applying several other
drivers.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

The Short Changelog is:

Alison Schofield (1):
  scsi: pmcraid: replace struct timeval with ktime_get_real_seconds()

Arnd Bergmann (1):
  bnx2fc: reduce stack usage in __bnx2fc_enable

Bart Van Assche (1):
  scsi: Fix a bdi reregistration race

Benjamin Rood (9):
  pm80xx: remove the SCSI host before detaching from SAS transport
  pm80xx: avoid a panic if MSI(X) interrupts are disabled
  pm80xx: wait a minimum of 500ms before issuing commands to SPCv
  pm80xx: do not examine registers for iButton feature if ATTO adapter
  pm80xx: set PHY profiles for ATTO 12Gb SAS controllers
  pm80xx: add support for ATTO devices during SAS address initiailization
  pm80xx: add ATTO PCI IDs to pm8001_pci_table
  pm80xx: add support for PMC Sierra 8070 and PMC Sierra 8072 SAS controllers
  pm80xx: configure PHY settings based on subsystem vendor ID

Brian King (1):
  SCSI: Increase REPORT_LUNS timeout

Calvin Owens (1):
  sg: Fix double-free when drives detach during SG_IO

Chad Dupuis (6):
  bnx2fc: Update version number to 2.9.6.
  bnx2fc: Remove explicit logouts.
  bnx2fc: Fix FCP RSP residual parsing.
  bnx2fc: Set ELS transfer length correctly for middle path commands.
  bnx2fc: Remove 'NetXtreme II' from source files.
  bnx2fc: Update copyright for 2015.

Christoph Hellwig (3):
  mpt2sas: Use mpi headers from mpt3sas
  scsi: use host wide tags by default
  scsi: restart list search after unlock in scsi_remove_target

Dan Carpenter (1):
  mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl()

Don Brace (15):
  hpsa: bump the driver version
  hpsa: enhance device messages
  hpsa: enhance hpsa_get_device_id
  hpsa: correct ioaccel2 sg chain len
  hpsa: correct check for non-disk devices
  hpsa: fix hpsa_adjust_hpsa_scsi_table
  hpsa: correct transfer length for 6 byte read/write commands
  hpsa: abandon rescans on memory alloaction failures.
  hpsa: allow driver requested rescans
  hpsa: fix null device issues
  hpsa: check for null arguments to dev_printk
  hpsa: change devtype to unsigned
  hpsa: remove unused hpsa_tag_discard_error_bits
  hpsa: stop zeroing reset_cmds_out and ioaccel_cmds_out during rescan
  hpsa: remove unused parameter hostno

Gabriel Krisman Bertazi (6):
  sd: Clear PS bit before Mode Select.
  ipr: Driver version 2.6.3.
  ipr: Issue Configure Cache Parameters command.
  ipr: Inquiry IOA page 0xC4 during initialization.
  ipr: Don't set NO_ULEN_CHK bit when resource is a vset.
  ipr: Add delay to ensure coherent dumps.

Jack Wang (3):
  mvsas: remove SCSI host before detaching from SAS transport
  aic94xx: remove SCSI host before detaching from SAS transport
  isci: remove SCSI host before detaching from SAS transport

Jiri Slaby (1):
  fcoe: use continue instead of goto+label

Johannes Thumshirn (1):
  scsi: Export SCSI Inquiry data to sysfs

John Soni Jose (2):
  be2iscsi: Bump the driver version
  be2iscsi: Fix updating the next pointer during WRB posting

K. Y. Srinivasan (2):
  scsi: storvsc: Fix a bug in the handling of SRB status flags
  storvsc: Don't set the SRB_FLAGS_QUEUE_ACTION_ENABLE flag

Kevin Barnett (6):
  hpsa: add in sas transport class
  hpsa: move scsi_add_device and scsi_remove_device calls to new function
  hpsa: refactor hpsa_figure_bus_target_lun
  hpsa: add function is_logical_device
  hpsa: simplify update scsi devices
  hpsa: simplify check for device exposure

Laurent Vivier (2):
  ibmvscsi: set max_lun to 32
  ibmvscsi: display default value for max_id, max_lun and max_channel.

Mahesh Rajashekhara (9):
  aacraid: Update driver version
  aacraid: Use pci_enable_msix_range()
  aacraid: IOCTL fix
  aacraid: Reset irq affinity hints
  aacraid: Tune response path if IsFastPath bit set
  aacraid: Enable 64bit write to controller register
  aacraid: Change interrupt mode to MSI for Series 6
  aacraid: Add Power Management support
  aacraid: Fix for 

Re: spi: OF module autoloading is still broken (was: Re: m25p80: Commit "allow arbitrary OF matching for "jedec,spi-nor"" breaks module autoloading)

2015-11-13 Thread Brian Norris
Hi,

On Fri, Nov 13, 2015 at 11:14:10PM +, Mark Brown wrote:
> On Fri, Nov 13, 2015 at 02:51:13PM -0800, Brian Norris wrote:
> 
> > General problem:
> > ================
> 
> > The SPI core doesn't use the OF compatible property for generating
> > uevent/modalias, and therefore can't autoload modules based on the full
> > compatible property of a device. It *only* can use the 'modalias', which
> > is a castrated version of the compatible property -- it only includes
> > part of the 1st entry in 'compatible'.
> 
> > This forces SPI device drivers to use spi_device_id tables even when
> > they might be better suited for of_match_tables.
> 
> Well, I don't actually see this as that bad a thing - it's good practice
> to include the Linux ID tables even if you also support DT since not all
> the world is DT.

I suppose so, but that's still not the whole story.

(I believe I avoided this in the first place for mostly-aesthetic
reasons; technically this allows people to put garbage in their DT, like
"garbage,spi-nor". It's unclear whether "garbage" becomes part of the
mythical DT ABI [1].)

> > Specifics for m25p80:
> > =====================
> 
> > We support many flash devices and have traditionally been doing so by
> > simply adding more entries to the spi_device_id table. Recently, we have
> > tried to get away from adding new entries and aliases for every single
> > variation by instead supporting a common OF match: "jedec,spi-nor". So
> > we might expect to see nodes like this:
> 
> > flash@xxx {
> > compatible = "vendor,shiny-new-device", "jedec,spi-nor";
> > ...
> > };
> 
> > We may or may not add "shiny-new-device" to the spi_device_id array. But
> > "jedec,spi-nor" should be sufficient to load the driver and check if the
> > READ ID string matches any known flash. If "shiny-new-device" isn't in
> > the spi_device_id array, then we don't get module autoloading.
> 
> OK, so you're trying to do dynamic enumeration?  Then you don't want
> specific things in any of the ID tables since you'll match it yourself
> at runtime (which is obviously good).

Well, we do have to support existing cases (e.g., existing device trees
without "jedec,spi-nor") so we have to keep some around. But otherwise,
mostly yes.

> > There's also the case of omitting "vendor,shiny-new-device" entirely,
> > which is probably a little more dangerous, but still legal (and also
> > won't autoload modules):
> 
> > flash@xxx {
> > compatible = "jedec,spi-nor";
> > ...
> > };
> 
> My immediate thought is that I'd expect to see spi-nor and (based on a
> quick scan of the m25p80 driver) nor-jedec to appear in the spi_device_id
> table since regardless of what happens with Javier's patch we want the
> autoprobing mechanism to work for board file based platforms too
> (there's a bunch of architectures that still use them).  That'd also
> have the side effect of solving your immediate problem I think?

No "nor-jedec" -- that was an intermediate name that got replaced
mid-release-cycle due to some late DT review comments.

But yes, I suppose adding "spi-nor" back to the spi_device_id table
fixes *one* of the immediate problems (i.e., 'compatible =
"jedec,spi-nor"'). That would cover Heiner's report. But it doesn't
solve:

  compatible = "vendor,shiny-new-device", "jedec,spi-nor"

I believe that the latter is sometimes the Right Way (TM) to do things
for device tree, so you have a fallback if auto-probing "jedec,spi-nor"
ever doesn't suffice.

(This came up in Heiner's original post: "In case of m25p80 this means
that "jedec,spi-nor" has to be the first "compatible" value. This
constraint might be too strict ..")
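
To make the limitation concrete, here is a minimal userspace sketch (not the kernel's code) of a modalias that is derived from only the first 'compatible' entry with its vendor prefix stripped, and then looked up in an ID table. The function names and the two-entry ID table are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a modalias from the FIRST compatible entry
 * only, dropping everything up to and including the vendor comma.
 */
void first_compatible_to_modalias(const char *const *compatible, size_t count,
				  char *out, size_t len)
{
	const char *name;
	const char *comma;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	if (count == 0)
		return;

	name = compatible[0];
	comma = strchr(name, ',');
	if (comma)
		name = comma + 1;	/* skip the "vendor," part */

	strncpy(out, name, len - 1);
	out[len - 1] = '\0';
}

/* Hypothetical stand-in for a lookup in a driver's spi_device_id table. */
int id_table_matches(const char *const *ids, size_t count, const char *modalias)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (strcmp(ids[i], modalias) == 0)
			return 1;
	return 0;
}
```

Under these assumptions, a node with only "jedec,spi-nor" yields "spi-nor", which an ID table containing "spi-nor" would match. A node listing "vendor,shiny-new-device" first yields "shiny-new-device", so the "jedec,spi-nor" fallback is never consulted.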

> [Snip example with three different prefixes for m25p80 in compatible
> strings]
> 
> > All three are supported by SPI's current modalias code, and so are part
> > of the ABI. Thus, m25p80.c will always contain both a spi_device_id
> > table and an of_match_table. But I think Javier's patch would break
> > these three cases.
> 
> Right, IIRC I think that sort of thing was what I was looking for in
> documentation for his patch.  Now you mention it I'm not sure we can do
> wildcarding with DT which is a bit unfortunate for cases like this.

Yeah, I expect wildcards are a no-go.

> Hrm.  Not sure and it's getting late on a Friday night...

:)

I suspect we'll have to fully support both spi_device_id tables (fully
supported already; if nothing else, to keep wildcard matching) and
of_match_tables (not fully supported for module loading), and in some
cases, the two will have to stay partially in sync.

Brian

[1] "Device Tree as a stable ABI: a fairy tale?"

http://free-electrons.com/pub/conferences/2015/elc/petazzoni-dt-as-stable-abi-fairy-tale/petazzoni-dt-as-stable-abi-fairy-tale.pdf

Re: [PATCH] pstore: add support for 64 Bit address space

2015-11-13 Thread Kees Cook
On Fri, Nov 13, 2015 at 4:10 AM, Wiebe, Wladislav (Nokia - DE/Ulm)
 wrote:
> Some architectures have their reserved RAM in 64 Bit address space.
> Therefore converting mem_address module parameter to ullong.
>
> Signed-off-by: Wladislav Wiebe 

If this works correctly, I have no objection. :)

Acked-by: Kees Cook 

Thanks!

-Kees

> ---
>  fs/pstore/ram.c|4 ++--
>  include/linux/pstore_ram.h |2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index 319c3a6..bd9812e 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -55,8 +55,8 @@ static ulong ramoops_pmsg_size = MIN_MEM_SIZE;
>  module_param_named(pmsg_size, ramoops_pmsg_size, ulong, 0400);
>  MODULE_PARM_DESC(pmsg_size, "size of user space message log");
>
> -static ulong mem_address;
> -module_param(mem_address, ulong, 0400);
> +static unsigned long long mem_address;
> +module_param(mem_address, ullong, 0400);
>  MODULE_PARM_DESC(mem_address,
> "start of reserved RAM used to store oops/panic logs");
>
> diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
> index 9c9d6c1..4660aaa 100644
> --- a/include/linux/pstore_ram.h
> +++ b/include/linux/pstore_ram.h
> @@ -76,7 +76,7 @@ ssize_t persistent_ram_ecc_string(struct persistent_ram_zone *prz,
>
>  struct ramoops_platform_data {
> unsigned long   mem_size;
> -   unsigned long   mem_address;
> +   phys_addr_t mem_address;
> unsigned intmem_type;
> unsigned long   record_size;
> unsigned long   console_size;
> --
> 1.7.1
>
> Regards,
> Wladislav Wiebe
>
>



-- 
Kees Cook
Chrome OS Security
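
As a rough illustration of why the patch widens mem_address: on a platform where unsigned long is 32 bits, a reserved-RAM address above 4 GiB cannot be represented. The sketch below uses uint32_t as a stand-in for such a 32-bit unsigned long; the function name and the example address are assumptions, not taken from the patch.

```c
#include <stdint.h>

/*
 * Stand-in for storing a physical address in a 32-bit unsigned long:
 * everything above bit 31 is silently dropped.
 */
uint32_t store_in_ulong32_sketch(uint64_t phys_addr)
{
	return (uint32_t)phys_addr;
}
```

With a hypothetical address of 0x880000000, only 0x80000000 survives the narrowing, so the driver would end up looking at the wrong region.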


[PATCH net-next] Driver: Vmxnet3: Fix use of mfTableLen for big endian architectures

2015-11-13 Thread Shrikrishna Khare
Signed-off-by: Shrikrishna Khare 
Reported-by: Masao Uebayashi 
Signed-off-by: Bhavesh Davda 
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 7 ---
 drivers/net/vmxnet3/vmxnet3_int.h | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 46f4cad..899ea42 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2157,12 +2157,13 @@ vmxnet3_set_mc(struct net_device *netdev)
if (!netdev_mc_empty(netdev)) {
new_table = vmxnet3_copy_mc(netdev);
if (new_table) {
-   rxConf->mfTableLen = cpu_to_le16(
-   netdev_mc_count(netdev) * ETH_ALEN);
+   size_t sz = netdev_mc_count(netdev) * ETH_ALEN;
+
+   rxConf->mfTableLen = cpu_to_le16(sz);
new_table_pa = dma_map_single(
&adapter->pdev->dev,
new_table,
-   rxConf->mfTableLen,
+   sz,
PCI_DMA_TODEVICE);
}
 
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 3f859a5..4c58c83 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.3.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.4.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x01040300
+#define VMXNET3_DRIVER_VERSION_NUM  0x01040400
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
1.8.5.6
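
The patch carries no description, so the following is a reading of the diff rather than the author's explanation: the old code stored the table length in little-endian form and then reused that stored field as the DMA mapping length. On a big-endian CPU the stored value is byte-swapped, so the mapped length would be wrong. A minimal sketch, with hypothetical function names and an explicit swap standing in for what cpu_to_le16() does on a big-endian host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unconditional 16-bit byte swap (models cpu_to_le16() on big-endian). */
uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Length a big-endian host would pass to the mapping call if it reused
 * the little-endian field instead of the CPU-order size.
 */
size_t len_from_le_field_on_be(size_t mc_count, size_t eth_alen)
{
	size_t sz = mc_count * eth_alen;

	assert(sz <= UINT16_MAX);
	return swab16_sketch((uint16_t)sz);
}
```

For three multicast addresses of six bytes each, the CPU-order size is 18 (0x0012), while the swapped field reads back as 0x1200. Keeping the size in a local variable, as the patch does, avoids reading it back from the device-order field.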



Re: [PATCH] PM / wakeirq: check that wake IRQ is valid before accepting it

2015-11-13 Thread Rafael J. Wysocki
On Thursday, November 12, 2015 10:52:11 AM Dmitry Torokhov wrote:
> On Thu, Nov 12, 2015 at 08:41:55PM +0200, Grygorii Strashko wrote:
> > On 11/12/2015 08:26 PM, Dmitry Torokhov wrote:
> > >Check that IRQ number passed to dev_pm_set_wake_irq and
> > >dev_pm_set_dedicated_wake_irq is valid (not negative) before accepting it.
> > >
> > >Signed-off-by: Dmitry Torokhov 
> > >---
> > >
> > >My recent change to i2c core introduced a code path that led to calling
> > >dev_pm_set_wake_irq(&client->dev, -ENOENT), which succeeded but
> > >obviously did the wrong thing. Checking the IRQ and bailing out early
> > >would have helped noticing this issue earlier.
> > >
> > >  drivers/base/power/wakeirq.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > >diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c
> > >index eb6e674..0d77cd6 100644
> > >--- a/drivers/base/power/wakeirq.c
> > >+++ b/drivers/base/power/wakeirq.c
> > >@@ -68,6 +68,9 @@ int dev_pm_set_wake_irq(struct device *dev, int irq)
> > >   struct wake_irq *wirq;
> > >   int err;
> > >
> > >+  if (irq < 0)
> > 
> > <= 0 ?
> 
> Maybe. I am still confused whether we treat 0 as invalid or not.

Well, it all boils down to whether or not IRQ 0 may be a valid wakeup IRQ
on any architectures.

In any case, though, we can add that check later, so I'll apply the patch
as is.

Thanks,
Rafael
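
For reference, the check under discussion can be sketched as below. The quoted diff is cut off before the return statement, so returning -EINVAL here is an assumption, and the function name is hypothetical.

```c
#include <errno.h>

/*
 * Sketch of the validity check: reject negative IRQ numbers (which are
 * usually error codes such as -ENOENT) and accept everything else,
 * including 0, since whether IRQ 0 is valid is left open in the thread.
 */
int check_wake_irq_sketch(int irq)
{
	if (irq < 0)
		return -EINVAL;	/* assumed error code */
	return 0;
}
```

With this form, the dev_pm_set_wake_irq(&client->dev, -ENOENT) call mentioned in the patch notes would fail early instead of silently succeeding.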



Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

2015-11-13 Thread Andy Lutomirski
On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
 wrote:
> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
> it's not pt_regs).
>
> We need to adjust it so that subsequent xen_iret can use it.

I'm wondering if this should be more straightforward:

	movq	%rsp, %rdi
	call	do_fast_syscall_32
	testl	%eax, %eax
	jz	.Lsyscall_32_done

	/* Opportunistic SYSRET */
sysret32_from_system_call:
	XEN_DO_SYSRET32

where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
variant of Xen's iret path that knows that the fast path is okay.

--Andy


[PATCH] xen/x86: Adjust stack pointer in xen_sysexit

2015-11-13 Thread Boris Ostrovsky
After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
it's not pt_regs).

We need to adjust it so that subsequent xen_iret can use it.

Signed-off-by: Boris Ostrovsky 
---

Alternatively, we could return 0 from do_fast_syscall_32() if paravirt_enabled()
is true since Xen PV guests will end up using xen_iret one way or the other. And
then we won't need xen_sysexit at all.

 arch/x86/xen/xen-asm_32.S |   23 ---
 1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index fd92a64..c70ec37 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -36,15 +36,24 @@ check_events:
 
 /*
  * We can't use sysexit directly, because we're not running in ring0.
- * But we can easily fake it up using iret.  Assuming xen_sysexit is
- * jumped to with a standard stack frame, we can just strip it back to
- * a standard iret frame and use iret.
+ * But we can easily fake it up using iret.
+ * We came here from the opportunistic SYSEXIT path in entry_SYSENTER_32
+ * which left the stack looking like this:
+ * $__USER_DS
+ * %ecx
+ * eflags
+ * $__USER_CS
+ * %eip
+ * %eax
+ * %gs
+ * %fs
+ * %es
+ * %ds <-- %esp
+ *
+ * so we need to adjust it to look like a standard iret frame
  */
 ENTRY(xen_sysexit)
-   movl PT_EAX(%esp), %eax /* Shouldn't be necessary? */
-   orl $X86_EFLAGS_IF, PT_EFLAGS(%esp)
-   lea PT_EIP(%esp), %esp
-
+   add $5*4, %esp
jmp xen_iret
 ENDPROC(xen_sysexit)
 
-- 
1.7.1
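
The `add $5*4, %esp` in the patch can be sanity-checked against the stack picture in its comment. The struct below is only an illustration: it lists the ten 32-bit slots from the comment in memory order, lowest address (where %esp points) first. The struct and field names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack slots as listed in the patch comment, lowest address first. */
struct sysexit_frame_sketch {
	uint32_t ds;		/* <-- %esp on entry to xen_sysexit */
	uint32_t es;
	uint32_t fs;
	uint32_t gs;
	uint32_t eax;
	uint32_t eip;		/* first slot of the iret-style frame */
	uint32_t user_cs;
	uint32_t eflags;
	uint32_t ecx;		/* slot labelled %ecx in the comment */
	uint32_t user_ds;
};

static_assert(sizeof(struct sysexit_frame_sketch) == 10 * 4,
	      "ten 32-bit slots, no padding expected");

/* Number of bytes to skip so the stack pointer lands on the %eip slot. */
size_t iret_frame_offset_sketch(void)
{
	return offsetof(struct sysexit_frame_sketch, eip);
}
```

Five slots (%ds, %es, %fs, %gs, %eax) sit below %eip, which matches the 5*4 bytes added to %esp before jumping to xen_iret.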



Re: spi: OF module autoloading is still broken (was: Re: m25p80: Commit "allow arbitrary OF matching for "jedec,spi-nor"" breaks module autoloading)

2015-11-13 Thread Mark Brown
On Fri, Nov 13, 2015 at 02:51:13PM -0800, Brian Norris wrote:
> On Fri, Nov 13, 2015 at 10:12:28PM +, Mark Brown wrote:
> > On Fri, Nov 13, 2015 at 11:40:31AM -0800, Brian Norris wrote:

> > > (Changing subject line, because apparently some people ignore mail if it
> > > doesn't have 'SPI' in the subject line)

> > Well, if you mean me I'm getting CCed on such a large number of large
> > threads about MTD patches that only have relevance to SPI in that
> > they're for a driver that uses SPI that I pretty much delete a very large
> > proportion of mail that looks like it's about MTD patches unread I'm
> > afraid.  It's almost all completely irrelevant and uninteresting to me.

> I understand, but I'm not sure how to fix that. In some cases, it's
> somewhat unavoidable, since there are series that need (or at least,
> think they need) upgrades to SPI infrastructure in order to support new
> things in MTD. But that's rare, and most of the time, people are just
> CC'ing anything and anyone that looks relevant.

Those I'm less worried about.  It's the series that have no SPI
content at all that get a bit much.

> Any suggestions are welcome. I'll try to discourage it when I notice.
> I'm not sure documentation helps, unless we can find something people
> actually read. And tooling doesn't exactly help, since
> scripts/get_maintainer.pl already doesn't suggest you or linux-spi@ for
> any of the drivers/mtd/spi-nor/ or drivers/mtd/devices/m25p80.c.

I get the impression a lot of it is "I once copied some vaguely related
patch set to these people, I'll add them to this one too" and that it's
mostly just about education.  I suppose I should write some boiler
plate to send to people, I've not 

> General problem:
> ================

> The SPI core doesn't use the OF compatible property for generating
> uevent/modalias, and therefore can't autoload modules based on the full
> compatible property of a device. It *only* can use the 'modalias', which
> is a castrated version of the compatible property -- it only includes
> part of the 1st entry in 'compatible'.

> This forces SPI device drivers to use spi_device_id tables even when
> they might be better suited for of_match_tables.

Well, I don't actually see this as that bad a thing - it's good practice
to include the Linux ID tables even if you also support DT since not all
the world is DT.

> Specifics for m25p80:
> =====================

> We support many flash devices and have traditionally been doing so by
> simply adding more entries to the spi_device_id table. Recently, we have
> tried to get away from adding new entries and aliases for every single
> variation by instead supporting a common OF match: "jedec,spi-nor". So
> we might expect to see nodes like this:

>   flash@xxx {
>   compatible = "vendor,shiny-new-device", "jedec,spi-nor";
>   ...
>   };

> We may or may not add "shiny-new-device" to the spi_device_id array. But
> "jedec,spi-nor" should be sufficient to load the driver and check if the
> READ ID string matches any known flash. If "shiny-new-device" isn't in
> the spi_device_id array, then we don't get module autoloading.

OK, so you're trying to do dynamic enumeration?  Then you don't want
specific things in any of the ID tables since you'll match it yourself
at runtime (which is obviously good).

> There's also the case of omitting "vendor,shiny-new-device" entirely,
> which is probably a little more dangerous, but still legal (and also
> won't autoload modules):

>   flash@xxx {
>   compatible = "jedec,spi-nor";
>   ...
>   };

My immediate thought is that I'd expect to see spi-nor and (based on a
quick scan of the m25p80 driver) nor-jedec to appear in the spi_device_id
table since regardless of what happens with Javier's patch we want the
autoprobing mechanism to work for board file based platforms too
(there's a bunch of architectures that still use them).  That'd also
have the side effect of solving your immediate problem I think?

[Snip example with three different prefixes for m25p80 in compatible
strings]

> All three are supported by SPI's current modalias code, and so are part
> of the ABI. Thus, m25p80.c will always contain both a spi_device_id
> table and an of_match_table. But I think Javier's patch would break
> these three cases.

Right, IIRC I think that sort of thing was what I was looking for in
documentation for his patch.  Now you mention it I'm not sure we can do
wildcarding with DT which is a bit unfortunate for cases like this.
Hrm.  Not sure and it's getting late on a Friday night...




Re: [PATCH] drivers: staging: vme: Fixed code style issues

2015-11-13 Thread Martyn Welch



On 13/11/15 20:01, Egor Uleyskiy wrote:

From: Egor Uleyskiy 

* Fixed indentation
* Deleted extra empty lines
* Constructions that look like
 card = kzalloc(sizeof(struct pio2_card), GFP_KERNEL);
   are changed to
 card = kzalloc(sizeof(*card), GFP_KERNEL);

Also:

 * Removing extra bracketing from uses of the address operator
 * Use preferred null return check style

Other than that:

Acked-by: Martyn Welch 

Martyn



Signed-off-by: Egor Uleyskiy 
---
  drivers/staging/vme/devices/vme_pio2_cntr.c |  2 +-
  drivers/staging/vme/devices/vme_pio2_core.c | 20 +-
  drivers/staging/vme/devices/vme_pio2_gpio.c | 32 ++---
  drivers/staging/vme/devices/vme_user.h  |  2 --
  4 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/staging/vme/devices/vme_pio2_cntr.c b/drivers/staging/vme/devices/vme_pio2_cntr.c
index 6335471..486c30c 100644
--- a/drivers/staging/vme/devices/vme_pio2_cntr.c
+++ b/drivers/staging/vme/devices/vme_pio2_cntr.c
@@ -61,7 +61,7 @@ int pio2_cntr_reset(struct pio2_card *card)
/* Ensure all counter interrupts are cleared */
do {
retval = vme_master_read(card->window, &reg, 1,
-   PIO2_REGS_INT_STAT_CNTR);
+PIO2_REGS_INT_STAT_CNTR);
if (retval < 0)
return retval;
} while (reg != 0);
diff --git a/drivers/staging/vme/devices/vme_pio2_core.c b/drivers/staging/vme/devices/vme_pio2_core.c
index 35c6ce5..28a6ab6 100644
--- a/drivers/staging/vme/devices/vme_pio2_core.c
+++ b/drivers/staging/vme/devices/vme_pio2_core.c
@@ -90,7 +90,7 @@ static void pio2_int(int level, int vector, void *ptr)
case 4:
/* Channels 0 to 7 */
retval = vme_master_read(card->window, &reg, 1,
-   PIO2_REGS_INT_STAT[vec - 1]);
+PIO2_REGS_INT_STAT[vec - 1]);
if (retval < 0) {
dev_err(&card->vdev->dev,
"Unable to read IRQ status register\n");
@@ -100,8 +100,8 @@ static void pio2_int(int level, int vector, void *ptr)
channel = ((vec - 1) * 8) + i;
if (reg & PIO2_CHANNEL_BIT[channel])
dev_info(&card->vdev->dev,
-   "Interrupt on I/O channel %d\n",
-   channel);
+"Interrupt on I/O channel %d\n",
+channel);
}
break;
case 5:
@@ -215,7 +215,7 @@ static int pio2_probe(struct vme_dev *vdev)
u8 reg;
int vec;
  
-	card = kzalloc(sizeof(struct pio2_card), GFP_KERNEL);

+   card = kzalloc(sizeof(*card), GFP_KERNEL);
if (!card) {
retval = -ENOMEM;
goto err_struct;
@@ -289,7 +289,7 @@ static int pio2_probe(struct vme_dev *vdev)
}
  
  	retval = vme_master_set(card->window, 1, card->base, 0x1, VME_A24,

-   (VME_SCT | VME_USER | VME_DATA), VME_D16);
+   (VME_SCT | VME_USER | VME_DATA), VME_D16);
if (retval) {
dev_err(&card->vdev->dev,
"Unable to configure VME master resource\n");
@@ -335,7 +335,7 @@ static int pio2_probe(struct vme_dev *vdev)
  
  	/* Set VME vector */

retval = vme_master_write(card->window, &card->irq_vector, 1,
-   PIO2_REGS_VME_VECTOR);
+ PIO2_REGS_VME_VECTOR);
if (retval < 0)
return retval;
  
@@ -343,7 +343,7 @@ static int pio2_probe(struct vme_dev *vdev)

vec = card->irq_vector | PIO2_VME_VECTOR_SPUR;
  
  	retval = vme_irq_request(vdev, card->irq_level, vec,

-   &pio2_int, (void *)card);
+&pio2_int, (void *)card);
if (retval < 0) {
dev_err(&card->vdev->dev,
"Unable to attach VME interrupt vector0x%x, level 
0x%x\n",
@@ -356,7 +356,7 @@ static int pio2_probe(struct vme_dev *vdev)
vec = card->irq_vector | PIO2_VECTOR_BANK[i];
  
  		retval = vme_irq_request(vdev, card->irq_level, vec,

-   &pio2_int, (void *)card);
+&pio2_int, (void *)card);
if (retval < 0) {
dev_err(&card->vdev->dev,
"Unable to attach VME interrupt vector0x%x, level 
0x%x\n",
@@ -370,7 +370,7 @@ static int pio2_probe(struct vme_dev *vdev)
vec = card->irq_vector | PIO2_VECTOR_CNTR[i];
  
  		retval = vme_irq_request(vdev, card->irq_level, vec,

-   &pio2_int, (void *)card);
+&pio2_int, (void *)card);
if (retval < 0) {
dev_err(&

Re: multi-codec support for arizona-ldo1 was Re: System with multiple arizona (wm5102) codecs

2015-11-13 Thread Mark Brown
On Fri, Nov 13, 2015 at 10:58:12PM +0100, Pavel Machek wrote:
> On Tue 2015-10-13 12:53:55, Mark Brown wrote:
> > On Mon, Oct 12, 2015 at 10:11:38PM +0200, Pavel Machek wrote:

> > > > No, you definitely shouldn't be doing this - the regulator names should
> > > > reflect the names the device has in the datasheet to aid people in going
> > > > from software to the hardware and back again.  They shouldn't be
> > > > dynamically generated at runtime.  If you need to namespace by
> > > device

> Ok. But I'd still like to get it working.

So as I've been saying use the existing interfaces, or extend them as
needed.

> Now... I got up-to v4.2 kernel, and it seems that it has support for
> multiple sources with same name (but on different chips):

> [1.125485] Adding alias for supply MICVDD,(null) -> MICVDD,spi32766.1
> [1.285470] Adding alias for supply MICVDD,(null) -> MICVDD,spi32766.2

> ...but it does not look like I can use those aliases from the ALSA side:

> [2.734198] wm5102-codec.1 supply MICVDD,spi32766.1 not found, using dummy regulator
> [3.170912] wm5102-codec.2 supply MICVDD,spi32766.2 not found, using dummy regulator

> I tried to do this:

> SND_SOC_DAPM_REGULATOR_SUPPLY("MICVDD,spi32766.1", 0, SND_SOC_DAPM_REGULATOR_BYPASS),

You're attempting to put a system specific string into a generic driver,
this will break all other users which is clearly not OK.

> Any idea what I did wrong, or what needs to be fixed?

Well, if we look at the code that prints the alias message you pasted
above:

pr_info("Adding alias for supply %s,%s -> %s,%s\n",
id, dev_name(dev), alias_id, dev_name(alias_dev));

we can see that it's not just rewriting a string here but is rather
mapping one supply, device tuple to another.  You shouldn't find any
places where the device and supply are concatenated into a single
string, including the interface used to request regulators, so
attempting to rewrite the name of the supply is not going to get
anywhere.
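
To illustrate the point about tuples, here is a small userspace sketch of an alias table keyed by a (device, supply) pair. It is not the regulator core's implementation: device names stand in for struct device pointers, and the struct, function and table contents are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alias: (src_dev, src_supply) is remapped to (alias_dev, alias_supply). */
struct supply_alias_sketch {
	const char *src_dev;
	const char *src_supply;
	const char *alias_dev;
	const char *alias_supply;
};

/* Look up an alias; device and supply are compared as two separate keys. */
const struct supply_alias_sketch *
find_alias_sketch(const struct supply_alias_sketch *tbl, size_t n,
		  const char *dev, const char *supply)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].src_dev, dev) == 0 &&
		    strcmp(tbl[i].src_supply, supply) == 0)
			return &tbl[i];
	return NULL;
}
```

In a table like this, asking for supply "MICVDD" on the right device finds the alias, while asking for a supply literally named "MICVDD,spi32766.1" finds nothing, because the device was never part of the supply name.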

> > > > provide an interface which explicitly namespaces by device rather than
> > > > hacking it into another interface, the usual thing is to use the struct
> > > > device as the context.

> > > I'll need some more help here. I need to use it from ALSA, so I don't
> > > think I can influence that interface easily.

> > Sorry?  If this is going into the userspace ABI there's something
> > seriously wrong...

> It is exposed to the ALSA. If ALSA exposes it to userspace, I'm not sure. 

So if it's not exposed to userspace (and it *really* shouldn't be) why
would it not be possible to influence whatever interface you're thinking
of here?  I'm really confused by what you're saying here.

> > > What is currently in tree _does not work_, as there are two arizona
> > > chips, and two "LDO1" regulators. (Doable) suggestions how to fix that
> > > are welcome.

> > To repeat what I said above, provide an interface which namespaces by
> > device (as we normally do when we need to distinguish between multiple
> > instances of the same device).  Given that everything is part of the
> > same device it's very easy to discover which device so it's clearly no
> > problem when mapping the supplies.

> I'm afraid I don't know how to do this. See above.

Look at how we resolve supplies when we do lookups, then look at how we
create aliases for the MFD cells to map supplies into the function
devices and figure out why those mappings aren't being found.  The NULL
you're seeing above seems like a bit of a warning sign here - where did
that come from?




Re: spi: OF module autoloading is still broken (was: Re: m25p80: Commit "allow arbitrary OF matching for "jedec,spi-nor"" breaks module autoloading)

2015-11-13 Thread Brian Norris
Hi Mark,

On Fri, Nov 13, 2015 at 10:12:28PM +, Mark Brown wrote:
> On Fri, Nov 13, 2015 at 11:40:31AM -0800, Brian Norris wrote:
> 
> > (Changing subject line, because apparently some people ignore mail if it
> > doesn't have 'SPI' in the subject line)
> 
> Well, if you mean me I'm getting CCed on such a large number of large
> threads about MTD patches that only have relevance to SPI in that
> they're for a driver that uses SPI that I pretty much delete a very large
> proportion of mail that looks like it's about MTD patches unread I'm
> afraid.  It's almost all completely irrelevant and uninteresting to me.

I understand, but I'm not sure how to fix that. In some cases, it's
somewhat unavoidable, since there are series that need (or at least,
think they need) upgrades to SPI infrastructure in order to support new
things in MTD. But that's rare, and most of the time, people are just
CC'ing anything and anyone that looks relevant.

Any suggestions are welcome. I'll try to discourage it when I notice.
I'm not sure documentation helps, unless we can find something people
actually read. And tooling doesn't exactly help, since
scripts/get_maintainer.pl already doesn't suggest you or linux-spi@ for
any of the drivers/mtd/spi-nor/ or drivers/mtd/devices/m25p80.c.

I feel bad for anyone on devicet...@vger.kernel.org for similar reasons,
BTW. But I guess that's a product of their own decisions. See #2 in
Documentation/devicetree/bindings/submitting-patches.txt.

> > > Is this [1] getting fixed in SPI any time soon? Looks like there was
> > > some progress [2], but AFAICT it's not completed.
> 
> Please include human readable descriptions of things like commits IDs
> and issues being discussed in e-mail in your mails, this makes them much
> easier for humans to read especially when they have no internet access.
> I do frequently catch up on my mail on flights or while otherwise
> travelling so this is even more pressing for me than just being about
> making things a bit easier to read.

Sorry, I suppose I could have summarized a bit. But I didn't want to
copy-and-paste the whole thing, and Javier's work pretty clearly
explains the problem.

> > > I'd just like to know what the way forward here should be for m25p80.
> > > Really, "jedec,spi-nor" never autoloaded modules very reliably because
> > > of the SPI core constraints. So I'm not sure I'd consider this a
> > > regression, and I might be OK waiting around if it'll be fixed in a
> > > reasonable time frame.
> 
> Someone will need to tell me what the actual problem is for m25p80
> before I can understand what the way forward might be.  From a brief
> scan through of the thread it looks like if Javier's series solves the
> problem it needs a bit more analysis and/or a clearer presentation and
> probably a resubmit.

General problem:
================

The SPI core doesn't use the OF compatible property for generating
uevent/modalias, and therefore can't autoload modules based on the full
compatible property of a device. It *only* can use the 'modalias', which
is a castrated version of the compatible property -- it only includes
part of the 1st entry in 'compatible'.

This forces SPI device drivers to use spi_device_id tables even when
they might be better suited for of_match_tables.


Specifics for m25p80:
=====================

We support many flash devices and have traditionally been doing so by
simply adding more entries to the spi_device_id table. Recently, we have
tried to get away from adding new entries and aliases for every single
variation by instead supporting a common OF match: "jedec,spi-nor". So
we might expect to see nodes like this:

flash@xxx {
compatible = "vendor,shiny-new-device", "jedec,spi-nor";
...
};

We may or may not add "shiny-new-device" to the spi_device_id array. But
"jedec,spi-nor" should be sufficient to load the driver and check if the
READ ID string matches any known flash. If "shiny-new-device" isn't in
the spi_device_id array, then we don't get module autoloading.

There's also the case of omitting "vendor,shiny-new-device" entirely,
which is probably a little more dangerous, but still legal (and also
won't autoload modules):

flash@xxx {
compatible = "jedec,spi-nor";
...
};


Addendum:
=========

(This isn't the core problem I'm worried about, but I believe it serves
as commentary on Javier's patch:)

Cases like this are possible and should be considered:

flash@xxx {
compatible = "m25p80";
...
};

flash@xxx {
compatible = "st,m25p80";
...
};

flash@xxx {
compatible = "something-nonsensical,m25p80";
...
};

All three are supported by SPI's current modalias code, and so are part
of the ABI. Thus, m25p80.c will always contain both a spi_device_id
table and an of_match_table. But I think Javier's pa

Re: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

2015-11-13 Thread David Rientjes
On Fri, 13 Nov 2015, Joonsoo Kim wrote:

> cma allocation should be guaranteed to succeed, but, sometimes,
> it can fail in the current implementation. To track down
> the problem, we need to know which page is problematic and
> this new tracepoint will report it.
> 
> Acked-by: Michal Nazarewicz 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 

Thanks for generalizing this!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] ktime: add a roundup function

2015-11-13 Thread Jacob Pan
On Fri, 13 Nov 2015 15:13:45 -0500 (EST)
Thomas Gleixner  wrote:

> > 
> > +static inline ktime_t ktime_roundup(ktime_t x, ktime_t y)  
> 
> Kerneldoc comment of this function would be appreciated.
will do. Plan to reuse John's comment.

Thanks,

Jacob


Re: [PATCH 1/4] ktime: add a roundup function

2015-11-13 Thread Jacob Pan
On Fri, 13 Nov 2015 12:11:01 -0800
John Stultz  wrote:

> Could you add a comment as to what the function does, and use some
> better variable names here to make it more immediately obvious what is
> being done here?
> 
> Something like:
> /**
>  * ktime_roundup - Rounds value up to interval chunk
>  * @value: Value to be rounded up
>  * @interval: interval size to round up to
>  *
>  * Rounds a value up to the next higher multiple of an interval size
>  */
> static inline ktime_t ktime_roundup(ktime_t value, ktime_t interval)
will do. thank you for taking the time.
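
A rough userspace sketch of the rounding described in the suggested comment, with plain nanosecond counts standing in for ktime_t. The name ns_roundup() is hypothetical, and leaving exact multiples unchanged is an assumption; the thread does not say how the real helper treats that case.

```c
#include <assert.h>
#include <stdint.h>

/* Plain nanosecond counts stand in for ktime_t in this sketch. */
static int64_t ns_roundup(int64_t value, int64_t interval)
{
    int64_t rem;

    assert(value >= 0 && interval > 0);
    rem = value % interval;

    /* Already a multiple: keep it (assumed); otherwise step up to the next one. */
    return rem ? value + (interval - rem) : value;
}
```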


Re: [PATCH V6] mm: fix kernel crash in khugepaged thread

2015-11-13 Thread David Rientjes
On Fri, 13 Nov 2015, yalin wang wrote:

> This crash is caused by a NULL pointer dereference in the page_to_pfn() macro
> when page == NULL:
> 
> [  182.639154 ] Unable to handle kernel NULL pointer dereference at virtual 
> address 
> [  182.639491 ] pgd = ffc00077a000
> [  182.639761 ] [] *pgd=b9422003, *pud=b9422003, 
> *pmd=b9423003, *pte=006008000707
> [  182.640749 ] Internal error: Oops: 9406 [#1] SMP
> [  182.641197 ] Modules linked in:
> [  182.641580 ] CPU: 1 PID: 26 Comm: khugepaged Tainted: GW   
> 4.3.0-rc6-next-20151022ajb-1-g32f3386-dirty #3
> [  182.642077 ] Hardware name: linux,dummy-virt (DT)
> [  182.642227 ] task: ffc07957c080 ti: ffc079638000 task.ti: 
> ffc079638000
> [  182.642598 ] PC is at khugepaged+0x378/0x1af8
> [  182.642826 ] LR is at khugepaged+0x418/0x1af8
> [  182.643047 ] pc : [] lr : [] pstate: 
> 6145
> [  182.643490 ] sp : ffc07963bca0
> [  182.643650 ] x29: ffc07963bca0 x28: ffc00075c000
> [  182.644024 ] x27: ffc00f275040 x26: ffc0006c7000
> [  182.644334 ] x25: 00e848800f51 x24: 0640
> [  182.644687 ] x23: 0002 x22: 
> [  182.644972 ] x21:  x20: 
> [  182.645446 ] x19:  x18: 007ff86d0990
> [  182.645931 ] x17: 007ef9c8 x16: ffc98390
> [  182.646236 ] x15:  x14: 
> [  182.646649 ] x13: 016a x12: 
> [  182.647046 ] x11: ffc07f025020 x10: 
> [  182.647395 ] x9 : 0048 x8 : ffc000721e28
> [  182.647872 ] x7 :  x6 : ffc07f02d000
> [  182.648261 ] x5 : fe00 x4 : ffc00f275040
> [  182.648611 ] x3 :  x2 : ffc00f2ad000
> [  182.648908 ] x1 :  x0 : ffc000727000
> [  182.649147 ]
> [  182.649252 ] Process khugepaged (pid: 26, stack limit = 0xffc079638020)
> [  182.649724 ] Stack: (0xffc07963bca0 to 0xffc07963c000)
> [  182.650141 ] bca0: ffc07963be30 ffcb5044 ffc07961fb80 
> ffc00072e630
> [  182.650587 ] bcc0: ffc0005d5090  ffc000197d34 
> 
> [  182.651009 ] bce0:    
> 
> [  182.651446 ] bd00: ffc07963bd90 ffc07f1cbf80 4f3be003 
> ffc00f2750a4
> [  182.651956 ] bd20: ffc00f3bf000 ffc1 0001 
> ffc07f085740
> [  182.652520 ] bd40: ffc00f2ad188 ffc0 0620 
> ffc00f275040
> [  182.652972 ] bd60: ffc0006b1a90 ffc079638000 ffc07963be20 
> ffc00f0144d0
> [  182.653357 ] bd80: ffc0 0640 ffc00f0144d0 
> 0a080001
> [  182.653793 ] bda0: 1001 ffc1 ffc07f025000 
> ffc00f2750a8
> [  182.654226 ] bdc0: 000105f8 ffc00075a000 06a0 
> ffc000727000
> [  182.654522 ] bde0: ffc0006e8478 ffc0 0001 
> ffc078fb9000
> [  182.654869 ] be00: ffc07963be30 ffc0 ffc07957c080 
> ffccfc4c
> [  182.655225 ] be20: ffc07963be20 ffc07963be20  
> ffc85c50
> [  182.655588 ] be40: ffcb4f64 ffc07961fb80  
> 
> [  182.656138 ] be60:  ffcbee2c ffcb4f64 
> 
> [  182.656609 ] be80:    
> 
> [  182.657145 ] bea0: ffc07963bea0 ffc07963bea0  
> ffc0
> [  182.657475 ] bec0: ffc07963bec0 ffc07963bec0  
> 
> [  182.657922 ] bee0:    
> 
> [  182.658558 ] bf00:    
> 
> [  182.658972 ] bf20:    
> 
> [  182.659291 ] bf40:    
> 
> [  182.659722 ] bf60:    
> 
> [  182.660122 ] bf80:    
> 
> [  182.660654 ] bfa0:    
> 
> [  182.661064 ] bfc0:    
> 0005
> [  182.661466 ] bfe0:    
> 
> [  182.661848 ] Call trace:
> [  182.662050 ] [] khugepaged+0x378/0x1af8
> [  182.662294 ] [] kthread+0xdc/0xf4
> [  182.662605 ] [] ret_from_fork+0xc/0x40
> [  182.663046 ] Code: 35001700 f0002c60 aa0703e3 f9009fa0 (f94000e0)
> [  182.663901 ] ---[ end trace 637503d8e28ae69e  ]---
> [  182.664160 ] Kernel panic - not syncing: Fatal exception
> [  182.664571 ] CPU2

Re: [PATCH v2 net-next] net/core: ensure features get disabled on new lower devs

2015-11-13 Thread Laura Abbott

On 11/13/2015 02:51 AM, Nikolay Aleksandrov wrote:

On 11/13/2015 11:29 AM, Jiri Pirko wrote:

Fri, Nov 13, 2015 at 01:26:18AM CET, f.faine...@gmail.com wrote:

On 04/11/15 18:56, David Miller wrote:

Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")

  ...

Reported-by: Nikolay Aleksandrov 
Signed-off-by: Jarod Wilson 
---
v2: Based on suggestions from Alex, and with not changing err to ret, this
patch actually becomes quite minimal and doesn't ugly up the code much.


Applied, thanks.


This causes some warnings to be displayed for DSA stacked devices:

[1.272297] brcm-sf2 f0b0.ethernet_switch: Starfighter 2 top:
4.00, core: 2.00 base: 0xf0c8, IRQs: 68, 69
[1.283181] libphy: dsa slave smi: probed
[1.344088] f0b403c0.mdio:05: Broadcom BCM7445 PHY revision: 0xd0,
patch: 3
[1.658917] brcm-sf2 f0b0.ethernet_switch gphy (uninitialized):
attached PHY at address 5 [Broadcom BCM7445]
[1.669414] brcm-sf2 f0b0.ethernet_switch gphy: set_features()
failed (-1); wanted 0x4020, left 0x4820
[1.734202] brcm-sf2 f0b0.ethernet_switch rgmii_1
(uninitialized): attached PHY at address 0 [Generic PHY]
[1.744486] brcm-sf2 f0b0.ethernet_switch rgmii_1: set_features()
failed (-1); wanted 0x4020, left 0x4820
[1.809091] brcm-sf2 f0b0.ethernet_switch rgmii_2
(uninitialized): attached PHY at address 1 [Generic PHY]
[1.819364] brcm-sf2 f0b0.ethernet_switch rgmii_2: set_features()
failed (-1); wanted 0x4020, left 0x4820
[1.884090] brcm-sf2 f0b0.ethernet_switch moca (uninitialized):
attached PHY at address 2 [Generic PHY]
[1.894109] brcm-sf2 f0b0.ethernet_switch moca: set_features()
failed (-1); wanted 0x4020, left 0x4820

DSA slave network devices are not associated with their master network
device using the typical lower/upper netdev helpers.

I do not have a good fix to come up with yet, but if you see something
obvious with net/dsa/slave.c, feel free to send patches for testing, I
can boot net-next on this platform.


I'm having similar issues with bridge, with Linus's git now:


[snip]

Hmm, I think it's because the bridge and dsa/slave don't have ndo_set_features()
so err is left as -1 and thus an error is reported which isn't actually true.
Before in this case the features would just get set, so could you please try
the following patch ?


diff --git a/net/core/dev.c b/net/core/dev.c
index ab9b8d0d115e..4a1d198dbbff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6426,6 +6426,8 @@ int __netdev_update_features(struct net_device *dev)

if (dev->netdev_ops->ndo_set_features)
err = dev->netdev_ops->ndo_set_features(dev, features);
+   else
+   err = 0;

if (unlikely(err < 0)) {
netdev_err(dev,



The patch seems to be working for at least one person who reported the
problem in Fedora rawhide https://bugzilla.redhat.com/show_bug.cgi?id=1281674
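
For anyone skimming, a stripped-down userspace sketch of the failure mode as described in the quoted explanation: err starts out as -1 and is only overwritten when the device provides the hook. struct fake_ops and both functions are hypothetical stand-ins, not the real net_device code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ops table; not the real net_device_ops. */
struct fake_ops {
    int (*set_features)(unsigned long features);
};

/* A hook that accepts whatever it is given. */
static int accept_all(unsigned long features)
{
    (void)features;
    return 0;
}

/* Before the fix: err keeps its initial -1 when the hook is missing. */
static int update_features_old(const struct fake_ops *ops,
                               unsigned long features)
{
    int err = -1;

    assert(ops != NULL);
    if (ops->set_features)
        err = ops->set_features(features);
    return err;
}

/* With the proposed else branch: a missing hook counts as success. */
static int update_features_new(const struct fake_ops *ops,
                               unsigned long features)
{
    int err = -1;

    assert(ops != NULL);
    if (ops->set_features)
        err = ops->set_features(features);
    else
        err = 0;
    return err;
}
```

With no hook installed, the old variant reports -1 (the spurious "set_features() failed (-1)" case), while the new one reports 0.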

Thanks,
Laura



Re: [PATCH] mm: get rid of __alloc_pages_high_priority

2015-11-13 Thread David Rientjes
On Fri, 13 Nov 2015, Michal Hocko wrote:

> > > Hi,
> > > I think that this is more a cleanup than any functional change. We
> > > are rarely screwed so much that __alloc_pages_high_priority would
> > > fail. Yet I think that __alloc_pages_high_priority is obscuring the
> > > overall intention more than it is helpful. Another motivation is to
> > > reduce wait_iff_congested call to a single one in the allocator. I plan
> > > to do other changes in that area and get rid of it altogether.
> > 
> > I think it's a combination of a cleanup (the inlining of 
> > __alloc_pages_high_priority) and a functional change (no longer looping 
> > infinitely around a get_page_from_freelist() call).  I'd suggest doing the 
> > inlining in one patch and then the reworking of __GFP_NOFAIL when 
> > ALLOC_NO_WATERMARKS fails just so we could easily revert the latter if 
> > necessary.
> 
> I can split it up if this is really preferable of course.

I think it's preferable.


Re: [PATCH v2 5/8] drivers:input:ads7846(+tsc2046): add new common binding names, pre-calibration and flipping

2015-11-13 Thread Sebastian Reichel
Hi,

On Fri, Nov 13, 2015 at 09:35:56PM +0100, H. Nikolaus Schaller wrote:
> commit b98abe52fa8e ("Input: add common DT binding for touchscreens")
> introduced common DT bindings for touchscreens [1] and a helper function to
> parse the DT.
> 
> This has been integrated and interpretation of the inversion (flipping)
> properties for the x and y axis has been added to accommodate any
> orientation of the touch in relation to the LCD.
> 
> By scaling the min/max ADC values to the screen size it is now possible to
> pre-calibrate the touch so that it (almost) exactly matches the LCD it is
> glued onto. This allows the touch to be operated well enough before a
> user-space calibration improves the precision.
> 
> [1]: Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt
> 
> Signed-off-by: H. Nikolaus Schaller 
> ---
>  .../devicetree/bindings/input/ads7846.txt  |  8 ++-
>  drivers/input/touchscreen/ads7846.c| 72 
> --
>  2 files changed, 74 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/input/ads7846.txt 
> b/Documentation/devicetree/bindings/input/ads7846.txt
> index df8b127..ae56355 100644
> --- a/Documentation/devicetree/bindings/input/ads7846.txt
> +++ b/Documentation/devicetree/bindings/input/ads7846.txt
> @@ -26,12 +26,17 @@ Additional required properties:
>  
>  Optional properties:
>  
> +You can optionally specify any of the touchscreen parameters described in
> +
> + Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt
> +
> +This allows to scale, invert or swap coordinates and define the fuzz factors.
> +
>   ti,vref-delay-usecs vref supply delay in usecs, 0 for
>   external vref (u16).
>   ti,vref-mv  The VREF voltage, in millivolts (u16).
>   ti,keep-vref-on set to keep vref on for differential
>   measurements as well
> - ti,swap-xy  swap x and y axis

I guess this should be:

ti,swap-xy: deprecated name for touchscreen-swapped-x-y

-- Sebastian




Re: [PATCH 2/4] timer: relax tick stop in idle entry

2015-11-13 Thread Jacob Pan
On Fri, 13 Nov 2015 15:22:16 -0500 (EST)
Thomas Gleixner  wrote:

> 
> 
> On Fri, 13 Nov 2015, Jacob Pan wrote:
> 
> > Upon entering idle, we can turn off tick if the next timeout
> > is exactly one tick away. Otherwise, we could enter inner idle loop
> > with tick still enabled, without resched set, the tick will continue
> > during idle therefore less optimal in terms of energy savings.
> 
> This does not make any sense at all. 
> 
> next_tick is the next required tick event. If it's exactly ONE tick
> away why should we go through the hassle of stopping it? Just to
> cancel the timer and then set it to the same value again? Oh well.
> 
I have been trying to understand this code, please help. Here is my theory
and the ftrace of an injection period where tick did not stop.
(sorry about the long lines). My comments are after [JP]

 cat-1993  [000]30.093405: sched_cfs_idle_inject: action:0 
throttled:1
[JP] injection timer expired, set forced idle flag, call scheduler

 cat-1993  [000]30.093406: hrtimer_expire_exit:  
hrtimer=0x88003de0cc20
 cat-1993  [000]30.093406: hrtimer_start:
hrtimer=0x88003de0cc20 function=idle_inject_timer_fn/0x0 
expires=29993055250 softexpires=29993055250
 cat-1993  [000]30.093407: hrtimer_cancel:   
hrtimer=0x88003dfce400
 cat-1993  [000]30.093407: hrtimer_expire_entry: 
hrtimer=0x88003dfce400 now=29988042960 function=tick_sched_timer/0x0
 cat-1993  [000]30.093407: function: 
tick_sched_timer
 cat-1993  [000]30.093422: function:
tick_sched_do_timer
 cat-1993  [000]30.093422: function:   
tick_do_update_jiffies64
 cat-1993  [000]30.093433: function:
tick_sched_handle.isra.15
 cat-1993  [000]30.093447: sched_stat_runtime:   comm=cat 
pid=1993 runtime=1058498 [ns] vruntime=6695549826 [ns]
 cat-1993  [000]30.093449: hrtimer_expire_exit:  
hrtimer=0x88003dfce400
 cat-1993  [000]30.093449: hrtimer_start:
hrtimer=0x88003dfce400 function=tick_sched_timer/0x0 expires=2998900 
softexpires=2998900
 cat-1993  [000]30.093450: function: 
tick_program_event
 cat-1993  [000]30.093460: sched_waking: 
comm=rcu_preempt pid=7 prio=120 target_cpu=002
 cat-1993  [000]30.093461: sched_wake_idle_without_ipi: cpu=2

 cat-1993  [000]30.093463: sched_cfs_idle_inject: action:1 
throttled:1
[JP] CFS pick_next_task_fair sees forced idle, pick no task to run.


 cat-1993  [000]30.093463: sched_stat_runtime:   comm=cat 
pid=1993 runtime=16122 [ns] vruntime=6695565948 [ns]
 cat-1993  [000]30.093464: sched_switch: cat:1993 [120] 
R ==> swapper/0:0 [120]
  -0 [000]30.093465: function: 
tick_nohz_idle_enter

  -0 [000]30.093473: bprint:   
__tick_nohz_idle_enter: JPAN: __tick_nohz_idle_enter 803
  -0 [000]30.093473: bprint:   
__tick_nohz_idle_enter: JPAN: can_stop_idle_tick 743
[JP] can_stop_idle_tick() checks ok to stop tick

  -0 [000]30.093474: bprint:   
__tick_nohz_idle_enter: JPAN: tick_nohz_stop_sched_tick 609 delta 100
[JP] but sees delta is exactly 1 tick away. didn't stop tick.

  -0 [000]30.093475: function: 
tick_check_broadcast_expired
  -0 [000]30.094366: function: tick_irq_enter
  -0 [000]30.094367: function:
tick_check_oneshot_broadcast_this_cpu
  -0 [000]30.094372: function:
tick_nohz_stop_idle
  -0 [000]30.094387: hrtimer_cancel:
  hrtimer=0x88003dfce400

[JP] enter repeated tick sched in inner idle loop since !need_resched()


> Thanks,
> 
>   tglx
> 
> 

[Jacob Pan]


Re: [PATCH 6/6] Documentation/x86: Update EFI memory region description

2015-11-13 Thread Matt Fleming
On Fri, 13 Nov, at 08:42:54AM, Linus Torvalds wrote:
> On Fri, Nov 13, 2015 at 1:29 AM, Matt Fleming  
> wrote:
> > On Fri, 13 Nov, at 10:22:10AM, Ingo Molnar wrote:
> >
> > You've snipped the patch hunk that gives the address range used,
> 
> I'm actually wondering if we should strive to make the UEFI stuff more
> like a user process, and just map the UEFI mappings in low memory in
> that magic UEFI address space.
 
We map things in the user address space now but only for the purposes
of having an identity mapping, for the reasons that I mentioned
previously: bust firmware accesses and for the SetVirtualAddressMap()
call [1]. Importantly, the kernel does not access the identity mapping
directly.

So if we were to repurpose the user address space it would make sense
to just have the identity mapping be the one and only mapping.

However, going through the identity addresses to invoke EFI runtime
services is known to break some Apple Macs. It's probably worth
revisiting this issue, because I don't have any further details.

Having a separate mapping in the user address space that isn't the
identity mapping is also possible of course.

> We won't be able to run those things *as* user space, since I assume
> the code will want to do a lot of kernely things, but shouldn't we aim
> to make it look as much like that as possible? Maybe some day we could
> even strive to run it in some controlled environment (ie user space
> with fixups, virtual machine, whatever), but even if we never get
> there it sounds like a potentially good idea to try to set up the
> mappings to move in that direction..

It would be interesting to see how far we could push this, say, using
SMAP/SMEP to further isolate what kernel pieces the firmware can
touch. It's not about security guarantees since most of the firmware
functionality is implemented in SMM today for x86, but it does go some
way towards providing protection from unintended accesses.

> No big hurry, and maybe there are good reasons not to go that way. The
> first step is indeed just to get rid of the WX mappings in the normal
> kernel page tables.

I think it's worth exploring.

[1] Oh, and also for the EFI mixed mode code (running 64-bit kernels
on 32-bit EFI), but fewer people tend to care about that ;-)


[PATCH 3.13.y-ckt 07/96] perf tools: Fix copying of /proc/kcore

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Adrian Hunter 

commit b5cabbcbd157a4bf5a92dfc85134999a3b55342d upstream.

A copy of /proc/kcore containing the kernel text can be made to the
buildid cache. e.g.

perf buildid-cache -v -k /proc/kcore

To work around objdump limitations, a copy is also made when annotating
against /proc/kcore.

The copying process stops working from libelf about v1.62 onwards (the
problem was found with v1.63).

The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
because additional validation has been added to gelf_getphdr().

The use of gelf_getphdr() is a misguided attempt to get default
initialization of the GElf_Phdr structure.  That should not be
necessary because every member of the GElf_Phdr structure is
subsequently assigned.  So just remove the call to gelf_getphdr().

Similarly, a call to gelf_getehdr() in kcore__init() can also be
removed.

Committer notes:

Note to sta...@kernel.org, from Adrian in the cover letter for this
patchkit:

The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
it is important enough for stable.

Signed-off-by: Adrian Hunter 
Cc: Jiri Olsa 
Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Kamal Mostafa 
---
 tools/perf/util/symbol-elf.c | 35 +--
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index eed0b96..9efb213 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1098,8 +1098,6 @@ out_close:
 static int kcore__init(struct kcore *kcore, char *filename, int elfclass,
   bool temp)
 {
-   GElf_Ehdr *ehdr;
-
kcore->elfclass = elfclass;
 
if (temp)
@@ -1116,9 +1114,7 @@ static int kcore__init(struct kcore *kcore, char 
*filename, int elfclass,
if (!gelf_newehdr(kcore->elf, elfclass))
goto out_end;
 
-   ehdr = gelf_getehdr(kcore->elf, &kcore->ehdr);
-   if (!ehdr)
-   goto out_end;
+   memset(&kcore->ehdr, 0, sizeof(GElf_Ehdr));
 
return 0;
 
@@ -1175,23 +1171,18 @@ static int kcore__copy_hdr(struct kcore *from, struct 
kcore *to, size_t count)
 static int kcore__add_phdr(struct kcore *kcore, int idx, off_t offset,
   u64 addr, u64 len)
 {
-   GElf_Phdr gphdr;
-   GElf_Phdr *phdr;
-
-   phdr = gelf_getphdr(kcore->elf, idx, &gphdr);
-   if (!phdr)
-   return -1;
-
-   phdr->p_type= PT_LOAD;
-   phdr->p_flags   = PF_R | PF_W | PF_X;
-   phdr->p_offset  = offset;
-   phdr->p_vaddr   = addr;
-   phdr->p_paddr   = 0;
-   phdr->p_filesz  = len;
-   phdr->p_memsz   = len;
-   phdr->p_align   = page_size;
-
-   if (!gelf_update_phdr(kcore->elf, idx, phdr))
+   GElf_Phdr phdr = {
+   .p_type = PT_LOAD,
+   .p_flags= PF_R | PF_W | PF_X,
+   .p_offset   = offset,
+   .p_vaddr= addr,
+   .p_paddr= 0,
+   .p_filesz   = len,
+   .p_memsz= len,
+   .p_align= page_size,
+   };
+
+   if (!gelf_update_phdr(kcore->elf, idx, &phdr))
return -1;
 
return 0;
-- 
1.9.1
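
As a side note on the initializer used in the patch above, here is a small userspace sketch of the C rule it relies on: members not named in a designated initializer are zero-initialized, so no prior "get" call is needed to obtain defaults. struct fake_phdr and make_load_phdr() are hypothetical stand-ins that only borrow the GElf_Phdr member names; the sketch deliberately leaves some members out to show the zeroing.

```c
#include <stdint.h>

/* Hypothetical stand-in that borrows the GElf_Phdr member names. */
struct fake_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

static struct fake_phdr make_load_phdr(uint64_t offset, uint64_t addr,
                                       uint64_t len)
{
    /*
     * p_flags, p_paddr and p_align are not named here, so the C
     * language guarantees they are initialized to zero.
     */
    struct fake_phdr phdr = {
        .p_type   = 1,      /* PT_LOAD is 1 in the ELF specification */
        .p_offset = offset,
        .p_vaddr  = addr,
        .p_filesz = len,
        .p_memsz  = len,
    };

    return phdr;
}
```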



[PATCH 3.13.y-ckt 08/96] ASoC: db1200: Fix DAI link format for db1300 and db1550

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Lars-Peter Clausen 

commit e74679b38c9417c1c524081121cdcdb36f82264d upstream.

Commit b4508d0f95fa ("ASoC: db1200: Use static DAI format setup") switched
the db1200 driver over to using static DAI format setup instead of a
callback function. But the commit only added the dai_fmt field to one of
the three DAI links in the driver. This breaks audio on db1300 and db1550.

Add the two missing dai_fmt settings to fix the issue.

Fixes: b4508d0f95fa ("ASoC: db1200: Use static DAI format setup")
Reported-by: Manuel Lauss 
Tested-by: Manuel Lauss 
Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Mark Brown 
Signed-off-by: Kamal Mostafa 
---
 sound/soc/au1x/db1200.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/au1x/db1200.c b/sound/soc/au1x/db1200.c
index decba87..4e53f59 100644
--- a/sound/soc/au1x/db1200.c
+++ b/sound/soc/au1x/db1200.c
@@ -142,6 +142,8 @@ static struct snd_soc_dai_link db1300_i2s_dai = {
.cpu_dai_name   = "au1xpsc_i2s.2",
.platform_name  = "au1xpsc-pcm.2",
.codec_name = "wm8731.0-001b",
+   .dai_fmt= SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
+ SND_SOC_DAIFMT_CBM_CFM,
.ops= &db1200_i2s_wm8731_ops,
 };
 
@@ -159,6 +161,8 @@ static struct snd_soc_dai_link db1550_i2s_dai = {
.cpu_dai_name   = "au1xpsc_i2s.3",
.platform_name  = "au1xpsc-pcm.3",
.codec_name = "wm8731.0-001b",
+   .dai_fmt= SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
+ SND_SOC_DAIFMT_CBM_CFM,
.ops= &db1200_i2s_wm8731_ops,
 };
 
-- 
1.9.1



[PATCH 3.13.y-ckt 05/96] regmap: debugfs: Don't bother actually printing when calculating max length

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Mark Brown 

commit 176fc2d5770a0990eebff903ba680d2edd32e718 upstream.

The in-kernel snprintf() will conveniently return the actual length of
the printed string even if not given an output buffer at all so just do
that rather than relying on the user to pass in a suitable buffer,
ensuring that we don't need to worry if the buffer was truncated due to
the size of the buffer passed in.

Reported-by: Rasmus Villemoes 
Signed-off-by: Mark Brown 
Signed-off-by: Kamal Mostafa 
---
 drivers/base/regmap/regmap-debugfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/regmap/regmap-debugfs.c 
b/drivers/base/regmap/regmap-debugfs.c
index b454be2..3b31805 100644
--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -32,8 +32,7 @@ static DEFINE_MUTEX(regmap_debugfs_early_lock);
 /* Calculate the length of a fixed format  */
 static size_t regmap_calc_reg_len(int max_val, char *buf, size_t buf_size)
 {
-   snprintf(buf, buf_size, "%x", max_val);
-   return strlen(buf);
+   return snprintf(NULL, 0, "%x", max_val);
 }
 
 static ssize_t regmap_name_read_file(struct file *file,
-- 
1.9.1
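
The same trick works with the standard C library, which makes it easy to try outside the kernel. A minimal sketch, assuming a C99 snprintf(); hex_len() is a hypothetical name and not the regmap function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of hex digits needed to print max_val, without any buffer. */
static size_t hex_len(unsigned int max_val)
{
    /* With a NULL buffer and size 0, snprintf() only reports the length. */
    int n = snprintf(NULL, 0, "%x", max_val);

    assert(n >= 0);
    return (size_t)n;
}
```

Because nothing is written, there is no buffer whose size could truncate the result.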



[PATCH 3.13.y-ckt 09/96] m68k: Define asmlinkage_protect

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Andreas Schwab 

commit 8474ba74193d302e83401e16c85cc4b98caf upstream.

Make sure the compiler does not modify arguments of syscall functions.
This can happen if the compiler generates a tailcall to another
function.  For example, without asmlinkage_protect sys_openat is compiled
into this function:

sys_openat:
clr.l %d0
move.w 18(%sp),%d0
move.l %d0,16(%sp)
jbra do_sys_open

Note how the fourth argument is modified in place, modifying the register
%d4 that gets restored from this stack slot when the function returns to
user-space.  The caller may expect the register to be unmodified across
system calls.

Signed-off-by: Andreas Schwab 
Signed-off-by: Geert Uytterhoeven 
Signed-off-by: Kamal Mostafa 
---
 arch/m68k/include/asm/linkage.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/m68k/include/asm/linkage.h b/arch/m68k/include/asm/linkage.h
index 5a822bb..066e74f 100644
--- a/arch/m68k/include/asm/linkage.h
+++ b/arch/m68k/include/asm/linkage.h
@@ -4,4 +4,34 @@
 #define __ALIGN .align 4
 #define __ALIGN_STR ".align 4"
 
+/*
+ * Make sure the compiler doesn't do anything stupid with the
+ * arguments on the stack - they are owned by the *caller*, not
+ * the callee. This just fools gcc into not spilling into them,
+ * and keeps it from doing tailcall recursion and/or using the
+ * stack slots for temporaries, since they are live and "used"
+ * all the way to the end of the function.
+ */
+#define asmlinkage_protect(n, ret, args...) \
+   __asmlinkage_protect##n(ret, ##args)
+#define __asmlinkage_protect_n(ret, args...) \
+   __asm__ __volatile__ ("" : "=r" (ret) : "0" (ret), ##args)
+#define __asmlinkage_protect0(ret) \
+   __asmlinkage_protect_n(ret)
+#define __asmlinkage_protect1(ret, arg1) \
+   __asmlinkage_protect_n(ret, "m" (arg1))
+#define __asmlinkage_protect2(ret, arg1, arg2) \
+   __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2))
+#define __asmlinkage_protect3(ret, arg1, arg2, arg3) \
+   __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3))
+#define __asmlinkage_protect4(ret, arg1, arg2, arg3, arg4) \
+   __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
+ "m" (arg4))
+#define __asmlinkage_protect5(ret, arg1, arg2, arg3, arg4, arg5) \
+   __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
+ "m" (arg4), "m" (arg5))
+#define __asmlinkage_protect6(ret, arg1, arg2, arg3, arg4, arg5, arg6) \
+   __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
+ "m" (arg4), "m" (arg5), "m" (arg6))
+
 #endif
-- 
1.9.1



[PATCH 3.13.y-ckt 02/96] ppp, slip: Validate VJ compression slot parameters completely

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Ben Hutchings 

commit 4ab42d78e37a294ac7bc56901d563c642e03c4ae upstream.

Currently slhc_init() treats out-of-range values of rslots and tslots
as equivalent to 0, except that if tslots is too large it will
dereference a null pointer (CVE-2015-7799).

Add a range-check at the top of the function and make it return an
ERR_PTR() on error instead of NULL.  Change the callers accordingly.

Compile-tested only.

Reported-by: 郭永刚 
References: http://article.gmane.org/gmane.comp.security.oss.general/17908
Signed-off-by: Ben Hutchings 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 drivers/isdn/i4l/isdn_ppp.c   | 10 --
 drivers/net/ppp/ppp_generic.c |  6 ++
 drivers/net/slip/slhc.c   | 12 
 drivers/net/slip/slip.c   |  2 +-
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c
index 12bcce1..0ed6731 100644
--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -322,9 +322,9 @@ isdn_ppp_open(int min, struct file *file)
 * VJ header compression init
 */
is->slcomp = slhc_init(16, 16); /* not necessary for 2. link in bundle 
*/
-   if (!is->slcomp) {
+   if (IS_ERR(is->slcomp)) {
isdn_ppp_ccp_reset_free(is);
-   return -ENOMEM;
+   return PTR_ERR(is->slcomp);
}
 #endif
 #ifdef CONFIG_IPPP_FILTER
@@ -574,10 +574,8 @@ isdn_ppp_ioctl(int min, struct file *file, unsigned int 
cmd, unsigned long arg)
is->maxcid = val;
 #ifdef CONFIG_ISDN_PPP_VJ
sltmp = slhc_init(16, val);
-   if (!sltmp) {
-   printk(KERN_ERR "ippp, can't realloc slhc 
struct\n");
-   return -ENOMEM;
-   }
+   if (IS_ERR(sltmp))
+   return PTR_ERR(sltmp);
if (is->slcomp)
slhc_free(is->slcomp);
is->slcomp = sltmp;
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 5a1897d..a2d7d5f 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -716,10 +716,8 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
val &= 0x;
}
vj = slhc_init(val2+1, val+1);
-   if (!vj) {
-   netdev_err(ppp->dev,
-  "PPP: no memory (VJ compressor)\n");
-   err = -ENOMEM;
+   if (IS_ERR(vj)) {
+   err = PTR_ERR(vj);
break;
}
ppp_lock(ppp);
diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index 1252d9c..b52eabc 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -84,8 +84,9 @@ static long decode(unsigned char **cpp);
 static unsigned char * put16(unsigned char *cp, unsigned short x);
 static unsigned short pull16(unsigned char **cpp);
 
-/* Initialize compression data structure
+/* Allocate compression data structure
  * slots must be in range 0 to 255 (zero meaning no compression)
+ * Returns pointer to structure or ERR_PTR() on error.
  */
 struct slcompress *
 slhc_init(int rslots, int tslots)
@@ -94,11 +95,14 @@ slhc_init(int rslots, int tslots)
register struct cstate *ts;
struct slcompress *comp;
 
+   if (rslots < 0 || rslots > 255 || tslots < 0 || tslots > 255)
+   return ERR_PTR(-EINVAL);
+
comp = kzalloc(sizeof(struct slcompress), GFP_KERNEL);
if (! comp)
goto out_fail;
 
-   if ( rslots > 0  &&  rslots < 256 ) {
+   if (rslots > 0) {
size_t rsize = rslots * sizeof(struct cstate);
comp->rstate = kzalloc(rsize, GFP_KERNEL);
if (! comp->rstate)
@@ -106,7 +110,7 @@ slhc_init(int rslots, int tslots)
comp->rslot_limit = rslots - 1;
}
 
-   if ( tslots > 0  &&  tslots < 256 ) {
+   if (tslots > 0) {
size_t tsize = tslots * sizeof(struct cstate);
comp->tstate = kzalloc(tsize, GFP_KERNEL);
if (! comp->tstate)
@@ -141,7 +145,7 @@ out_free2:
 out_free:
kfree(comp);
 out_fail:
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 }
 
 
diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index 8752644..0641fcc 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -164,7 +164,7 @@ static int sl_alloc_bufs(struct slip *sl, int mtu)
if (cbuff == NULL)
goto err_exit;
slcomp = slhc_init(16, 16);
-   if (slcomp == NULL)
+   if (IS_ERR(slcomp))
goto err_exit;
 #endif
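
For readers unfamiliar with the convention the callers switch to here, a userspace imitation of the error-pointer pattern, assuming the usual scheme of encoding a small negative errno in the top page of the address space. err_ptr(), ptr_err(), is_err() and slots_alloc() are hypothetical names written for this sketch; they only imitate the kernel helpers and slhc_init().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
    /* Error codes occupy the last MAX_ERRNO values of the address range. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that mirrors the new range check. */
static void *slots_alloc(int rslots, int tslots)
{
    void *p;

    if (rslots < 0 || rslots > 255 || tslots < 0 || tslots > 255)
        return err_ptr(-EINVAL);

    p = calloc(1, (size_t)(rslots + tslots + 1));
    if (!p)
        return err_ptr(-ENOMEM);
    return p;
}
```

The point for callers is that a plain NULL check no longer catches failures; they have to test with the is_err() equivalent and can then propagate the specific error code.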
  

[PATCH 3.13.y-ckt 04/96] regmap: debugfs: Ensure we don't underflow when printing access masks

2015-11-13 Thread Kamal Mostafa
3.13.11-ckt30 -stable review patch.  If anyone has any objections, please let me know.

--

From: Mark Brown 

commit b763ec17ac762470eec5be8ebcc43e4f8b2c2b82 upstream.

If a read is attempted which is smaller than the line length then we may
underflow the subtraction we're doing with the unsigned size_t type so
move some of the calculation to be additions on the right hand side
instead in order to avoid this.

Reported-by: Rasmus Villemoes 
Signed-off-by: Mark Brown 
Signed-off-by: Kamal Mostafa 
---
 drivers/base/regmap/regmap-debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/regmap/regmap-debugfs.c 
b/drivers/base/regmap/regmap-debugfs.c
index 004e132..b454be2 100644
--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -432,7 +432,7 @@ static ssize_t regmap_access_read_file(struct file *file,
/* If we're in the region the user is trying to read */
if (p >= *ppos) {
/* ...but not beyond it */
-   if (buf_pos >= count - 1 - tot_len)
+   if (buf_pos + tot_len + 1 >= count)
break;
 
/* Format the register */
-- 
1.9.1
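
To make the underflow concrete, a small userspace sketch comparing the two forms of the check. The function names are hypothetical; only the two expressions are taken from the diff above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old form: count - 1 - tot_len wraps to a huge value when count is small. */
static bool out_of_room_old(size_t buf_pos, size_t count, size_t tot_len)
{
    return buf_pos >= count - 1 - tot_len;
}

/* New form: only additions on the left, so nothing can go below zero. */
static bool out_of_room_new(size_t buf_pos, size_t count, size_t tot_len)
{
    return buf_pos + tot_len + 1 >= count;
}
```

For a read of 8 bytes with a 20-byte line, the old check wrongly concludes there is room, while the new one stops; for reads larger than a line the two agree.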

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

