Re: [RFC PATCH] kbuild: add -fno-PIE

2016-10-29 Thread Tomas Janousek
Hi Sven,

On Mon, Oct 24, 2016 at 07:32:30PM +0200, Sven Joachim wrote:
> The attached patch works for me with Debian's gcc-6 package.

I tried your patch when building 4.8.5 on an up-to-date Debian testing and
still got this:

  AS  arch/x86/entry/vdso/vdso32/note.o
arch/x86/entry/vdso/vdso32/note.S:1:0: sorry, unimplemented: -mfentry isn’t 
supported for 32-bit in combination with -fpic

Adding KBUILD_AFLAGS += $(call cc-option,-fno-pie,) helps.
(Maybe that should be as-option instead. Don't know. There are lots of
AFLAGS=$(call cc-option, ...) in the Makefiles, anyway.)

-- 
Tomáš Janoušek, a.k.a. Pivník, a.k.a. Liskni_si, http://work.lisk.in/


Re: 4.8.2 not booting in 32-bit VM without I/O-APIC

2016-10-29 Thread Borislav Petkov
On Fri, Oct 28, 2016 at 09:34:53PM +0200, Thomas Gleixner wrote:
> Right. That mapping setup is an utter trainwreck as we do it from multiple
> places, but there is no reason why we can't move it before the call to
> prefill_possible_map().

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bbfbca5fea0c..b59fdba3cbdf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1221,11 +1221,13 @@ void __init setup_arch(char **cmdline_p)
>*/
>   get_smp_config();
>  
> + /* Make sure apic is mapped before prefill_possible_map() */
> + init_apic_mappings();
> +
>   prefill_possible_map();
>  
>   init_cpu_to_node();
>  
> - init_apic_mappings();
>   io_apic_init_mappings();
>  
>   kvm_guest_init();

FWIW, I got another user's confirmation that this works with his
virtual box:

https://bugzilla.suse.com/show_bug.cgi?id=1006417#c32
https://bugzilla.suse.com/show_bug.cgi?id=1006417#c33

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Section mismatch in reference from the function generic_NCR5380_isa_match()

2016-10-29 Thread Borislav Petkov
Hi,

I'm seeing this during randconfig builds:

WARNING: vmlinux.o(.text+0x1588439): Section mismatch in reference from the 
function generic_NCR5380_isa_match() to the function .init.text:probe_intr()
The function generic_NCR5380_isa_match() references
the function __init probe_intr().
This is often because generic_NCR5380_isa_match lacks a __init 
annotation or the annotation of probe_intr is wrong.
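
For readers not familiar with this class of warning, a minimal, hypothetical
sketch (placeholder bodies, not the real g_NCR5380 code) of the two usual ways
it gets resolved: either the __init annotation comes off the callee, or the
caller becomes __init as well so the reference stays entirely in .init.text.

#include <linux/init.h>
#include <linux/interrupt.h>

/* Option A: drop __init from the callee so non-init code may reference it. */
static irqreturn_t probe_intr(int irq, void *dev_id)
{
	return IRQ_HANDLED;
}

/* Option B: mark the caller __init too, if it really only runs at boot. */
static int __init generic_NCR5380_isa_match(struct device *pdev,
					    unsigned int ndev)
{
	return request_irq(10, probe_intr, 0, "ncr5380-probe", NULL) ? 0 : 1;
}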

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH 2/2] MAINTAINERS: add ARM and arm64 EFI specific files to EFI subsystem

2016-10-29 Thread Ard Biesheuvel
On 21 September 2016 at 16:35, Ard Biesheuvel  wrote:
> Since I will be co-maintaining the EFI subsystem, it makes sense to
> mention the ARM and arm64 EFI bits in the EFI section in MAINTAINERS
> so that Matt, the list and I get cc'ed on proposed changes.
>
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Russell King 
> Signed-off-by: Ard Biesheuvel 
> ---
>  MAINTAINERS | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>

Russell,

do you have any objections to this change?

Thanks,
Ard.


> diff --git a/MAINTAINERS b/MAINTAINERS
> index 224518556a84..cc8b36699f94 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4562,12 +4562,14 @@ L:  linux-...@vger.kernel.org
>  T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
>  S: Maintained
>  F: Documentation/efi-stub.txt
> -F: arch/ia64/kernel/efi.c
> +F: arch/*/kernel/efi.c
>  F: arch/x86/boot/compressed/eboot.[ch]
> -F: arch/x86/include/asm/efi.h
> +F: arch/*/include/asm/efi.h
>  F: arch/x86/platform/efi/
>  F: drivers/firmware/efi/
>  F: include/linux/efi*.h
> +F: arch/arm/boot/compressed/efi-header.S
> +F: arch/arm64/kernel/efi-entry.S
>
>  EFI VARIABLE FILESYSTEM
>  M: Matthew Garrett 
> --
> 2.7.4
>


Re: [PATCH v10 01/19] vfio: Mediated device Core driver

2016-10-29 Thread Kirti Wankhede


On 10/29/2016 10:00 AM, Jike Song wrote:
> On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
>> +int mdev_register_device(struct device *dev, const struct parent_ops *ops)
>> +{
>> +int ret;
>> +struct parent_device *parent;
>> +
>> +/* check for mandatory ops */
>> +if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>> +return -EINVAL;
>> +
>> +dev = get_device(dev);
>> +if (!dev)
>> +return -EINVAL;
>> +
>> +mutex_lock(&parent_list_lock);
>> +
>> +/* Check for duplicate */
>> +parent = __find_parent_device(dev);
>> +if (parent) {
>> +ret = -EEXIST;
>> +goto add_dev_err;
>> +}
>> +
>> +parent = kzalloc(sizeof(*parent), GFP_KERNEL);
>> +if (!parent) {
>> +ret = -ENOMEM;
>> +goto add_dev_err;
>> +}
>> +
>> +kref_init(&parent->ref);
>> +mutex_init(&parent->lock);
>> +
>> +parent->dev = dev;
>> +parent->ops = ops;
>> +
>> +ret = parent_create_sysfs_files(parent);
>> +if (ret) {
>> +mutex_unlock(&parent_list_lock);
>> +mdev_put_parent(parent);
>> +return ret;
>> +}
>> +
>> +ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
>> +if (ret)
>> +dev_warn(dev, "Failed to create compatibility class link\n");
>> +
> 
> Hi Kirti,
> 
> Like I replied to previous version:
> 
>   http://www.spinics.net/lists/kvm/msg139331.html
> 

Hi Jike,

I saw your reply, but by that time the v10 version of the patch series was
already out for review.

> You can always check if mdev_bus_compat_class already registered
> here, and register it if not yet. Same logic should be adopted to
> mdev_init.
> 
> Current implementation will simply panic if configured as builtin,
> which is rare but far from impossible.
> 

Could you verify whether the attached patch, applied on top of the v10 patch
set, works for you?
I'll incorporate this change in my next version.

Thanks,
Kirti

From: Kirti Wankhede 
Date: Sat, 29 Oct 2016 15:12:01 +0530
Subject: [PATCH 1/1] Register mdev_bus class on first mdev_device_register

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
---
 drivers/vfio/mdev/mdev_core.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 9d8fa5c91c2e..54c59f325336 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -187,13 +187,18 @@ int mdev_register_device(struct device *dev, const struct 
parent_ops *ops)
parent->dev = dev;
parent->ops = ops;
 
-   ret = parent_create_sysfs_files(parent);
-   if (ret) {
-   mutex_unlock(&parent_list_lock);
-   mdev_put_parent(parent);
-   return ret;
+   if (!mdev_bus_compat_class) {
+   mdev_bus_compat_class = class_compat_register("mdev_bus");
+   if (!mdev_bus_compat_class) {
+   ret = -ENOMEM;
+   goto add_dev_err;
+   }
}
 
+   ret = parent_create_sysfs_files(parent);
+   if (ret)
+   goto add_dev_err;
+
ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
if (ret)
dev_warn(dev, "Failed to create compatibility class link\n");
@@ -206,7 +211,10 @@ int mdev_register_device(struct device *dev, const struct 
parent_ops *ops)
 
 add_dev_err:
mutex_unlock(&parent_list_lock);
-   put_device(dev);
+   if (parent)
+   mdev_put_parent(parent);
+   else
+   put_device(dev);
return ret;
 }
 EXPORT_SYMBOL(mdev_register_device);
@@ -354,12 +362,6 @@ static int __init mdev_init(void)
return ret;
}
 
-   mdev_bus_compat_class = class_compat_register("mdev_bus");
-   if (!mdev_bus_compat_class) {
-   mdev_bus_unregister();
-   return -ENOMEM;
-   }
-
/*
 * Attempt to load known vfio_mdev.  This gives us a working environment
 * without the user needing to explicitly load vfio_mdev driver.
@@ -371,7 +373,9 @@ static int __init mdev_init(void)
 
 static void __exit mdev_exit(void)
 {
-   class_compat_unregister(mdev_bus_compat_class);
+   if (mdev_bus_compat_class)
+   class_compat_unregister(mdev_bus_compat_class);
+
mdev_bus_unregister();
 }
 
-- 
2.7.0



Re: [PATCH v12 RESEND 0/4] generic TEE subsystem

2016-10-29 Thread Jens Wiklander
On Fri, Oct 28, 2016 at 10:43:24AM -0500, Andrew F. Davis wrote:
> On 10/28/2016 05:19 AM, Jens Wiklander wrote:
> > Hi,
> > 
> > This patch set introduces a generic TEE subsystem. The TEE subsystem will
> > contain drivers for various TEE implementations. A TEE (Trusted Execution
> > Environment) is a trusted OS running in some secure environment, for
> > example, TrustZone on ARM CPUs, or a separate secure co-processor etc.
> > 
> > Regarding use cases, TrustZone has traditionally been used for
> > offloading secure tasks to the secure world. Examples include: 
> > - Secure key handling where the OS may or may not have direct access to key
> >   material.
> > - E-commerce and payment technologies. Credentials, credit card numbers etc
> >   could be stored in a more secure environment.
> > - Trusted User Interface (TUI) to ensure that no-one can snoop PIN-codes
> >   etc.
> > - Secure boot to ensure that loaded binaries haven’t been tampered with.
> >   It’s not strictly needed for secure boot, but you could enhance security
> >   by leveraging a TEE during boot.
> > - Digital Rights Management (DRM), the studios provides content with
> >   different resolution depending on the security of the device. Higher
> >   security means higher resolution.
> > 
> > A TEE could also be used in existing and new technologies. For example IMA
> > (Integrity Measurement Architecture) which has been in the kernel for quite
> > a while. Today you can enhance security by using a TPM-chip to sign the IMA
> > measurement list. This is something that you also could do by leveraging a
> > TEE.
> > 
> > Another example could be in 2-factor authentication which is becoming
> > increasingly more important. FIDO (https://fidoalliance.org) for example
> > are using public key cryptography in their 2-factor authentication standard
> > (U2F). With FIDO, a private and public key pair will be generated for every
> > site you visit and the private key should never leave the local device.
> > This is an example where you could use secure storage in a TEE for the
> > private key.
> > 
> > Today you will find a quite a few different out of tree implementations of
> > TEE drivers which tends to fragment the TEE ecosystem and development. We
> > think it would be a good idea to have a generic TEE driver integrated in
> > the kernel which would serve as a base for several different TEE solutions,
> > no matter if they are on-chip like TrustZone or if they are on a separate
> > crypto co-processor.
> > 
> > To develop this TEE subsystem we have been using the open source TEE called
> > OP-TEE (https://github.com/OP-TEE/optee_os) and therefore this would be the
> > first TEE solution supported by this new subsystem. OP-TEE is a
> > GlobalPlatform compliant TEE, however this TEE subsystem is not limited to
> > only GlobalPlatform TEEs, instead we have tried to design it so that it
> > should work with other TEE solutions also.
> > 
> 
> The above is my biggest concern with this whole subsystem, to me it
> still feels very OPTEE specific. As much as I would love to believe
> OPTEE will be the end-all TEE, I'm sure we soon will start to see wider
> use of vendor TEEs (like TI's own legacy Trustzone thing we are hoping
> to deprecate with OPTEE moving forward), possibly Google's Trusty TEE,
> and whatever Intel/AMD are cooking up for x86.

I'd rather say that it's slightly GlobalPlatform specific, but a bit
more flexible.

> 
> As we all know when things are upstreamed we lose the ability to make
> radical changes easily, especially to full subsystems. What happens when
> this framework, built with only one existing TEE, built by the one
> existing TEE's devs, is not as flexible as we need when other TEEs start
> rolling out?

Initially the TEE subsystem was much more flexible and was criticized
for that.

> 
> Do we see this as a chicken and egg situation, or is there any harm
> beyond the pains of supporting an out-of-tree driver for a while, to
> wait until we have at least one other TEE to add to this subsystem
> before merging?

This proposal is the bare minimum to have something useful. On top of
this there are more things we'd like to add, for example an in-kernel API
for accessing the TEE and secure buffer handling. The way we're dealing
with shared memory needs to be improved to better support multiple guests
communicating with one TEE.

What we can do with the subsystem right now is somewhat limited by the
fact that we're trying to upstream it and want to do that in
manageable increments.

Thanks,
Jens

> 
> This may also help with the perceived lack of reviewers for this series.
> 
> Thanks,
> Andrew
> 
> > "tee: generic TEE subsystem" brings in the generic TEE subsystem which
> > helps when writing a driver for a specific TEE, for example, OP-TEE.
> > 
> > "tee: add OP-TEE driver" is an OP-TEE driver which uses the subsystem to do
> > its work.
> > 
> > This patch set has been prepared in cooperation with Javier González who
> > 

s2disk broken at a ThinkPad since 4.6.5

2016-10-29 Thread Toralf Förster
This is a hardened stable Gentoo Linux ThinkPad T440s.
After wakeup from s2disk the console stays at line "clocksource: tsc: mask:" 
forever.

FWIW (and maybe completely unrelated), I do wonder why since that version I
get this dmesg line:
amd_nb: Cannot enumerate AMD northbridges

The hardened kernel 4.5.7 was fine.

-- 
Toralf
PGP: C4EACDDE 0076E94E, OTR: 420E74C8 30246EE7


Re: [PATCH v3 1/6] pinctrl: sunxi: Deal with configless pins

2016-10-29 Thread Linus Walleij
On Thu, Oct 20, 2016 at 3:49 PM, Maxime Ripard
 wrote:

> Even though our binding had the assumption that the allwinner,pull and
> allwinner,drive properties were optional, the code never took that into
> account.
>
> Fix that.
>
> Signed-off-by: Maxime Ripard 
> Acked-by: Chen-Yu Tsai 

Patch applied.

Yours,
Linus Walleij


Re: [RFC PATCH 3/5] gpio-dmec: gpio support for dmec

2016-10-29 Thread Linus Walleij
On Thu, Oct 27, 2016 at 12:47 PM, Zahari Doychev
 wrote:

> This is support for the gpio functionality found on the Data Modul embedded
> controllers
>
> Signed-off-by: Zahari Doychev 
> ---
>  drivers/staging/dmec/Kconfig |  10 +-
>  drivers/staging/dmec/Makefile|   1 +-
>  drivers/staging/dmec/dmec.h  |   5 +-
>  drivers/staging/dmec/gpio-dmec.c | 390 -

I guess Greg has already asked, but why is this in staging?
The driver doesn't seem very complex or anything, it would not
be overly troublesome to get it into the proper kernel subsystems.

> +config GPIO_DMEC
> +   tristate "Data Modul GPIO"
> +   depends on MFD_DMEC && GPIOLIB

So the depends on GPIOLIB can go away if you just put it into
drivers/gpio with the rest.

> +struct dmec_gpio_platform_data {
> +   int gpio_base;

NAK, always use -1. No hardcoding the GPIO base other than on
legacy systems.

> +   int chip_num;

I suspect you may not need this either. struct platform_device
already contains a ->id field, just use that when instantiating
your driver if you need an instance number.

So I think you need zero platform data for this.
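
A minimal sketch of what that could look like in probe(), keeping the private
struct from this patch (hypothetical, for illustration only):

static int dmec_gpio_probe(struct platform_device *pdev)
{
	struct dmec_gpio_priv *priv;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	/* instance number comes from the platform device, not platform data */
	priv->chip_num = pdev->id;

	return 0;
}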

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

You should only need 

> +#ifdef CONFIG_PM
> +struct dmec_reg_ctx {
> +   u32 dat;
> +   u32 dir;
> +   u32 imask;
> +   u32 icfg[2];
> +   u32 emask[2];
> +};
> +#endif
> +
> +struct dmec_gpio_priv {
> +   struct regmap *regmap;
> +   struct gpio_chip gpio_chip;
> +   struct irq_chip irq_chip;
> +   unsigned int chip_num;
> +   unsigned int irq;
> +   u8 ver;
> +#ifdef CONFIG_PM
> +   struct dmec_reg_ctx regs;
> +#endif
> +};

The #ifdef for saving the state is a bit kludgy. Can't you just have it there
all the time? Or is this a footprint-sensitive system?

> +static int dmec_gpio_get(struct gpio_chip *gc, unsigned int offset)
> +{
> +   struct dmec_gpio_priv *priv = container_of(gc, struct dmec_gpio_priv,
> +  gpio_chip);

Use the new pattern with

struct dmec_gpio_priv *priv = gpiochip_get_data(gc);

This needs you to use devm_gpiochip_add_data() below.
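
Roughly, the pattern looks like this (a sketch against the names used in this
driver, not a drop-in replacement):

static int dmec_gpio_get(struct gpio_chip *gc, unsigned int offset)
{
	struct dmec_gpio_priv *priv = gpiochip_get_data(gc);

	/* ... read the data register via priv->regmap as before ... */
	return 0;
}

/* in probe(), after the gpio_chip fields are filled in: */
static int dmec_gpio_register(struct platform_device *pdev,
			      struct dmec_gpio_priv *priv)
{
	/* hand the private data to gpiolib when registering the chip */
	return devm_gpiochip_add_data(&pdev->dev, &priv->gpio_chip, priv);
}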

> +static int dmec_gpio_irq_set_type(struct irq_data *d, unsigned int type)
> +{
> +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> +   struct dmec_gpio_priv *priv = container_of(gc, struct dmec_gpio_priv,
> +  gpio_chip);
> +   struct regmap *regmap = priv->regmap;
> +   unsigned int offset, mask, val;
> +
> +   offset = DMEC_GPIO_IRQTYPE_OFFSET(priv) + (d->hwirq >> 2);
> +   mask = ((d->hwirq & 3) << 1);
> +
> +   regmap_read(regmap, offset, &val);
> +
> +   val &= ~(3 << mask);
> +   switch (type & IRQ_TYPE_SENSE_MASK) {
> +   case IRQ_TYPE_LEVEL_LOW:
> +   break;
> +   case IRQ_TYPE_EDGE_RISING:
> +   val |= (1 << mask);
> +   break;
> +   case IRQ_TYPE_EDGE_FALLING:
> +   val |= (2 << mask);
> +   break;
> +   case IRQ_TYPE_EDGE_BOTH:
> +   val |= (3 << mask);
> +   break;
> +   default:
> +   return -EINVAL;
> +   }
> +
> +   regmap_write(regmap, offset, val);
> +
> +   return 0;
> +}

This chip uses handle_simple_irq() which is fine if the chip really has no
edge detector ACK register.

What some chips have is a special register for clearing (ACKing) the edge
IRQ, which makes it possible for a new IRQ to be handled as soon as
that has happened. Those need to use handle_edge_irq() for edge IRQs
and handle_level_irq() for level IRQs, with the side effect that the edge
IRQ path will additionally call the .irq_ack() callback on the irqchip
when handle_edge_irq() is used. In this case we set handle_bad_irq()
as the default handler and set up the right handler in .set_type().

Look at drivers/gpio/gpio-pl061.c for an example.

If you DON'T have a special edge ACK register, keep it like this.
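
For illustration, a rough sketch of the alternative described above, along the
lines of gpio-pl061.c (only relevant if the hardware does have such an ACK
register; the register itself is hypothetical here):

static void dmec_gpio_irq_ack(struct irq_data *d)
{
	/* would clear the (hypothetical) edge-ACK register for d->hwirq here */
}

static int dmec_gpio_irq_set_type(struct irq_data *d, unsigned int type)
{
	/* ... program the trigger registers as in the posted driver ... */

	if (type & IRQ_TYPE_EDGE_BOTH)
		irq_set_handler_locked(d, handle_edge_irq);
	else
		irq_set_handler_locked(d, handle_level_irq);

	return 0;
}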

> +static irqreturn_t dmec_gpio_irq_handler(int irq, void *dev_id)
> +{
> +   struct dmec_gpio_priv *p = dev_id;
> +   struct irq_domain *d = p->gpio_chip.irqdomain;
> +   unsigned int irqs_handled = 0;
> +   unsigned int val = 0, stat = 0;
> +
> +   regmap_read(p->regmap, DMEC_GPIO_IRQSTA_OFFSET(p), &val);
> +   stat = val;
> +   while (stat) {
> +   int line = __ffs(stat);
> +   int child_irq = irq_find_mapping(d, line);
> +
> +   handle_nested_irq(child_irq);
> +   stat &= ~(BIT(line));
> +   irqs_handled++;
> +   }

I think you should re-read the status register in the loop. An IRQ may
appear while you are reading.
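
Something along these lines, as a sketch of re-reading the status register once
the pending bits have been handled (names taken from the posted driver):

static irqreturn_t dmec_gpio_irq_handler(int irq, void *dev_id)
{
	struct dmec_gpio_priv *p = dev_id;
	struct irq_domain *d = p->gpio_chip.irqdomain;
	unsigned int irqs_handled = 0;
	unsigned int stat;

	regmap_read(p->regmap, DMEC_GPIO_IRQSTA_OFFSET(p), &stat);
	while (stat) {
		unsigned int pending = stat;

		while (pending) {
			int line = __ffs(pending);

			handle_nested_irq(irq_find_mapping(d, line));
			pending &= ~BIT(line);
			irqs_handled++;
		}
		/* pick up anything that fired while we were handling */
		regmap_read(p->regmap, DMEC_GPIO_IRQSTA_OFFSET(p), &stat);
	}

	return irqs_handled ? IRQ_HANDLED : IRQ_NONE;
}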

> +static int dmec_gpio_probe(struct platform_device *pdev)
> +{
> +   struct device *dev = &pdev->dev;
> +   struct dmec_gpio_platform_data *pdata = 

Re: [PATCH] ubifs: Fix regression in ubifs_readdir()

2016-10-29 Thread Richard Weinberger
On 29.10.2016 00:23, Jörg Krause wrote:
>> Does reverting c83ed4c9dbb35 help?
>> And are you 100% sure you applied the fix?
> 
> I double double checked. The fix was applied on the git tree, but the
> compiler cache (I am using Buildroot with this option enabled) fooled
> me by using an old copy. After disabling the compiler cache I got a
> fixed build of the kernel. The panic is gone! Thanks!

Thanks for letting me know.
Let's get this fix into -rc3 then. :-)

Thanks,
//richard


Re: [RFC 1/3] regulator: core: Add over current changed event

2016-10-29 Thread Axel Haslam
Hi Mark,

On Fri, Oct 28, 2016 at 9:41 PM, Axel Haslam  wrote:
> Hi Mark,
>
> On Fri, Oct 28, 2016 at 8:22 PM, Mark Brown  wrote:
>> On Wed, Oct 26, 2016 at 09:00:52PM +0200, ahas...@baylibre.com wrote:
>>> From: Axel Haslam 
>>>
>>> Regulator consumers may be interested to know when the
>>> over current condition is over.
>>>
>>> Add an over current "changed" event. The registered users
>>> for this event can then check the over current flag to know
>>> the status of the over current condition.
>>
>> Would a more general event for error conditions work as well?  Thinking
>> about this I'm unclear how interested consumers are going to be in the
>> specific error condition as opposed to the fact that the regulator ran
>> into trouble, and I can imagine that some regulators will report the
>> same root cause differently - another regulator might detect an
>> excessive current draw by seeing the output voltage collapse and the
>> regulator go out of regulation for example.
>

After some more thought,

I can change the logic a bit, and send an event named something like:

REGULATOR_EVENT_ERRORS_CLEARED

would that make more sense?

-Axel.


> Sorry if i misunderstood, but if we make the name generic,
> i think it might change a bit the definition of the flags,
> The flags will not represent events, but states.
>
> i think today each time an event occurs a notification is sent with the
> corresponding flag(s) set.
>
> if we use a generic name, It means that each time the regulator driver
> sends an event, it should check which "other" error conditons tied to the
> generic flag are present and set the corresponding bits too.
>
> illustrative example:
> today over current and over temp are two different events
> we send one notification for each with only the bits tied to the
> event that is happening set.
>
> if we add a generic error flag, it would mean that if over current happens
> and we set the generic error flag, we would also have to check
> if over temp is present to set or not that flag. similarly, when the over
> temp event happens the regulator driver would have to check if over
> current is present too.
>


Re: [RFC][PATCH] arm64: Add support for CONFIG_DEBUG_VIRTUAL

2016-10-29 Thread Ard Biesheuvel
On 28 October 2016 at 23:07, Laura Abbott  wrote:
 diff --git a/arch/arm64/mm/physaddr.c b/arch/arm64/mm/physaddr.c
 new file mode 100644
 index 000..6c271e2
 --- /dev/null
 +++ b/arch/arm64/mm/physaddr.c
 @@ -0,0 +1,17 @@
 +#include 
 +
 +#include 
 +
 +unsigned long __virt_to_phys(unsigned long x)
 +{
 +phys_addr_t __x = (phys_addr_t)x;
 +
 +if (__x & BIT(VA_BITS - 1)) {
 +/* The bit check ensures this is the right range */
 +return (__x & ~PAGE_OFFSET) + PHYS_OFFSET;
 +} else {
 +VIRTUAL_BUG_ON(x < kimage_vaddr || x > (unsigned long)_end);
>>>
>>>
>>> IIUC, in (3) you were asking if the last check should be '>' or '>='?
>>>
>>> To match high_memory, I suspect the latter, as _end doesn't fall within
>>> the mapped virtual address space.
>>>
>>
>> I was actually concerned about if _end would be correct with KASLR.
>> Ard confirmed that it gets fixed up to be correct. I'll change the
>> check to check for >=.
>>
>
> While testing this, I found two places with __pa(_end) to get bounds,
> one in arm64 code and one in memblock code. x86 gets away with this
> because memblock is actually __pa_symbol and x86 does image placement
> different and can check against the maximum image size. I think
> including _end in __pa_symbol but excluding it from the generic
> __virt_to_phys makes sense. It's a bit nicer than doing _end - 1 +
> 1 everywhere.
>

Could we redefine __pa_symbol() under CONFIG_DEBUG_VIRTUAL to
something that checks (x >= kimage_vaddr + TEXT_OFFSET || x <=
(unsigned long)_end), i.e., reject linear virtual addresses? (Assuming
my understanding of the meaning of __pa_symbol() is correct)
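
For concreteness, a rough sketch of what such a check could look like, treating
the condition as a bounds check on the kernel image range (an illustration
only, not the posted patch; the function name is made up here, and
kimage_voffset is assumed to be the image virt-to-phys offset as on arm64):

unsigned long __phys_addr_symbol(unsigned long x)
{
	/* accept only kernel-image addresses; linear-map addresses trip this */
	VIRTUAL_BUG_ON(x < kimage_vaddr || x > (unsigned long)_end);

	return x - kimage_voffset;
}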


Re: [PATCH 5/6] Input: gpio_keys - add support for GPIO descriptors

2016-10-29 Thread Linus Walleij
On Sat, Oct 29, 2016 at 1:14 AM, Dmitry Torokhov
 wrote:

> From: Geert Uytterhoeven 
>
> GPIO descriptors are the preferred way over legacy GPIO numbers
> nowadays. Convert the driver to use GPIO descriptors internally but
> still allow passing legacy GPIO numbers from platform data to support
> existing platforms.
>
> Based on commits 633a21d80b4a2cd6 ("input: gpio_keys_polled: Add support
> for GPIO descriptors") and 1ae5ddb6f8837558 ("Input: gpio_keys_polled -
> request GPIO pin as input.").
>
> Signed-off-by: Geert Uytterhoeven 
> Signed-off-by: Dmitry Torokhov 

Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: pinctrl: mediatek: build failure if CONFIG_IRQ_DOMAIN is not set

2016-10-29 Thread Linus Walleij
On Fri, Oct 28, 2016 at 7:20 PM, Paul Bolle  wrote:

> 3) Would you like me to submit a proper (but lightly tested) patch or
> do you prefer to fix this yourself?

Please send a tested patch, I'll apply it.

Thanks for finding this!

Yours,
Linus Walleij


Re: [PATCH] pinctrl: generic: Parse pinmux init nodes if node status is okay

2016-10-29 Thread Linus Walleij
On Fri, Oct 28, 2016 at 1:24 PM, Laxman Dewangan  wrote:

> During pinmux registration, pinmux table is parsed from DT
> for making the pinmux table configuration of pins.
>
> Parse only those nodes whose status is not disabled.
> This will help in reusing the pin configuration table across
> platforms and disabling a node via the status property if that node
> is not needed on a given platform.
>
> Signed-off-by: Laxman Dewangan 

Makes perfect sense. Patch applied.

Yours,
Linus Walleij


[PATCH 04/60] block: floppy: use bio_add_page()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 drivers/block/floppy.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cdc916a95137..999099d9509d 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3807,11 +3807,6 @@ static int __floppy_read_block_0(struct block_device 
*bdev, int drive)
cbdata.drive = drive;
 
bio_init_with_vec_table(&bio, &bio_vec, 1);
-   bio_vec.bv_page = page;
-   bio_vec.bv_len = size;
-   bio_vec.bv_offset = 0;
-   bio.bi_vcnt = 1;
-   bio.bi_iter.bi_size = size;
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = 0;
bio.bi_flags |= (1 << BIO_QUIET);
@@ -3819,6 +3814,8 @@ static int __floppy_read_block_0(struct block_device 
*bdev, int drive)
bio.bi_end_io = floppy_rb0_cb;
bio_set_op_attrs(&bio, REQ_OP_READ, 0);
 
+   bio_add_page(&bio, page, size, 0);
+
submit_bio(&bio);
process_fd_request();
 
-- 
2.7.4



[PATCH 06/60] bcache: debug: avoid to access .bi_io_vec directly

2016-10-29 Thread Ming Lei
Instead we use standard iterator way to do that.

Signed-off-by: Ming Lei 
---
 drivers/md/bcache/debug.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 333a1e5f6ae6..430f3050663c 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -107,8 +107,8 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 {
char name[BDEVNAME_SIZE];
struct bio *check;
-   struct bio_vec bv;
-   struct bvec_iter iter;
+   struct bio_vec bv, cbv;
+   struct bvec_iter iter, citer = { 0 };
 
check = bio_clone(bio, GFP_NOIO);
if (!check)
@@ -120,9 +120,13 @@ void bch_data_verify(struct cached_dev *dc, struct bio 
*bio)
 
submit_bio_wait(check);
 
+   citer.bi_size = UINT_MAX;
bio_for_each_segment(bv, bio, iter) {
void *p1 = kmap_atomic(bv.bv_page);
-   void *p2 = page_address(check->bi_io_vec[iter.bi_idx].bv_page);
+   void *p2;
+
+   cbv = bio_iter_iovec(check, citer);
+   p2 = page_address(cbv.bv_page);
 
cache_set_err_on(memcmp(p1 + bv.bv_offset,
p2 + bv.bv_offset,
@@ -133,6 +137,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 (uint64_t) bio->bi_iter.bi_sector);
 
kunmap_atomic(p1);
+   bio_advance_iter(check, &citer, bv.bv_len);
}
 
bio_free_pages(check);
-- 
2.7.4



[PATCH 05/60] target: avoid to access .bi_vcnt directly

2016-10-29 Thread Ming Lei
When the bio is full, bio_add_pc_page() will return zero,
so use this way to handle full bio.

Also replace access to .bi_vcnt for pr_debug() with bio_segments().

Signed-off-by: Ming Lei 
---
 drivers/target/target_core_pscsi.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_pscsi.c 
b/drivers/target/target_core_pscsi.c
index 9125d9358dea..04d7aa7390d0 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -935,13 +935,9 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
u32 sgl_nents,
 
rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
bio, page, bytes, off);
-   if (rc != bytes)
-   goto fail;
-
pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
-   bio->bi_vcnt, nr_vecs);
-
-   if (bio->bi_vcnt > nr_vecs) {
+   bio_segments(bio), nr_vecs);
+   if (rc != bytes) {
pr_debug("PSCSI: Reached bio->bi_vcnt max:"
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);
-- 
2.7.4



[PATCH 03/60] block: drbd: remove impossible failure handling

2016-10-29 Thread Ming Lei
For a non-cloned bio, bio_add_page() only returns failure when
the io vec table is full, but in that case, bio->bi_vcnt can't
be zero at all.

So remove the impossible failure handling.

Acked-by: Lars Ellenberg 
Signed-off-by: Ming Lei 
---
 drivers/block/drbd/drbd_receiver.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index 942384f34e22..c537e3bd09eb 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1648,20 +1648,8 @@ int drbd_submit_peer_request(struct drbd_device *device,
 
page_chain_for_each(page) {
unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
-   if (!bio_add_page(bio, page, len, 0)) {
-   /* A single page must always be possible!
-* But in case it fails anyways,
-* we deal with it, and complain (below). */
-   if (bio->bi_vcnt == 0) {
-   drbd_err(device,
-   "bio_add_page failed for len=%u, "
-   "bi_vcnt=0 (bi_sector=%llu)\n",
-   len, (uint64_t)bio->bi_iter.bi_sector);
-   err = -ENOSPC;
-   goto fail;
-   }
+   if (!bio_add_page(bio, page, len, 0))
goto next_bio;
-   }
data_size -= len;
sector += len >> 9;
--nr_pages;
-- 
2.7.4



[PATCH 17/60] kernel/power/swap.c: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 kernel/power/swap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index a3b1e617bcdc..8bc13a4461bc 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -238,6 +238,8 @@ static void hib_init_batch(struct hib_bio_batch *hb)
 static void hib_end_io(struct bio *bio)
 {
struct hib_bio_batch *hb = bio->bi_private;
+
+   /* single page bio, safe for multipage bvec */
struct page *page = bio->bi_io_vec[0].bv_page;
 
if (bio->bi_error) {
-- 
2.7.4



[PATCH 27/60] block: introduce BIO_SP_MAX_SECTORS

2016-10-29 Thread Ming Lei
This macro is needed when a multipage-bvec based bio is converted to
a singlepage-bvec based bio; bio bounce, for example, requires
singlepage bvecs.

Signed-off-by: Ming Lei 
---
 include/linux/bio.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8634bd24984c..fa71f6a57f81 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -40,6 +40,9 @@
 
 #define BIO_MAX_PAGES  256
 
+/* Max sectors of bio with singlepage bvec */
+#define BIO_SP_MAX_SECTORS (BIO_MAX_PAGES * (PAGE_SIZE >> 9))
+
 #define bio_prio(bio)  (bio)->bi_ioprio
 #define bio_set_prio(bio, prio)((bio)->bi_ioprio = prio)
 
-- 
2.7.4
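
For a quick sense of scale (assuming 4 KiB pages; the value scales with
PAGE_SIZE), the new macro works out as follows:

/* BIO_MAX_PAGES = 256 and PAGE_SIZE >> 9 = 8 sectors per page, so
 *   BIO_SP_MAX_SECTORS = 256 * 8 = 2048 sectors = 2048 * 512 B = 1 MiB,
 * i.e. the most data a bio can carry once it is flattened back to one
 * page per bvec. */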



[PATCH 25/60] block: pktcdvd: set NO_MP for pktcdvd request queue

2016-10-29 Thread Ming Lei
At least pkt_start_write() operates on the bvec table directly, so
the driver isn't ready for multipage bvecs yet; mark the flag now.

Signed-off-by: Ming Lei 
---
 drivers/block/pktcdvd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 817d2cc17d01..403c93b46ea3 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2518,6 +2518,9 @@ static void pkt_init_queue(struct pktcdvd_device *pd)
blk_queue_logical_block_size(q, CD_FRAMESIZE);
blk_queue_max_hw_sectors(q, PACKET_MAX_SECTORS);
q->queuedata = pd;
+
+   /* not ready for multipage bvec yet */
+   set_bit(QUEUE_FLAG_NO_MP, &q->queue_flags);
 }
 
 static int pkt_seq_show(struct seq_file *m, void *p)
-- 
2.7.4



[PATCH 18/60] mm: page_io.c: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 mm/page_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index a2651f58c86a..b0c0069ec1f4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -43,6 +43,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 
 void end_swap_bio_write(struct bio *bio)
 {
+   /* single page bio, safe for multipage bvec */
struct page *page = bio->bi_io_vec[0].bv_page;
 
if (bio->bi_error) {
-- 
2.7.4



Re: [PATCH V2] pinctrl: qcom: Add msm8994 pinctrl driver

2016-10-29 Thread Linus Walleij
On Thu, Oct 27, 2016 at 1:32 AM, Michael Scott  wrote:

> Initial pinctrl driver for QCOM msm8994 platforms.
>
> In order to continue the initial board support for QCOM msm8994/msm8992
> presented in patches from Jeremy McNicoll , let's put
> a proper pinctrl driver in place.
>
> Currently, the DT for these platforms uses the msm8x74 pinctrl driver to 
> enable
> basic UART.  Beyond the first few pins the rest are different enough to 
> justify
> its own driver.
>
> Note: This driver is also used by QCOM's msm8992 platform as its TLM block
> is the same.
>
> - Initial formatting and style was taken from the msm8x74 pinctrl driver added
>   by Björn Andersson 
> - Data was then adjusted per QCOM MSM8994 documentation for Top Level 
> Multiplexing
> - Bindings documentation was based on qcom,msm8996-pinctrl.txt by
>   Joonwoo Park  and then modified for msm8994 content
>
> Signed-off-by: Michael Scott 
> ---
>
> V1 -> V2: fixed missing FUNCTION(nav_pps) and removed 3 odd newlines between 
> blsp_i2c4_groups and cci_timer0_groups

Looks fine to me, just like the other Qcom drivers.

I just want Björn Andersson's ACK before merging, Björn?

Yours,
Linus Walleij


[PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments

2016-10-29 Thread Ming Lei
Avoid accessing .bi_vcnt directly, because it may no longer be what
the driver expects once multipage bvecs are supported.

Signed-off-by: Ming Lei 
---
 drivers/md/dm-rq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1d0d2adc050a..8534cbf8ce35 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -819,7 +819,8 @@ static void dm_old_request_fn(struct request_queue *q)
pos = blk_rq_pos(rq);
 
if ((dm_old_request_peeked_before_merge_deadline(md) &&
-md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
+md_in_flight(md) && rq->bio &&
+!bio_multiple_segments(rq->bio) &&
 md->last_rq_pos == pos && md->last_rq_rw == 
rq_data_dir(rq)) ||
(ti->type->busy && ti->type->busy(ti))) {
blk_delay_queue(q, 10);
-- 
2.7.4
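
For context, the helper used above boils down to a size comparison rather
than a bvec count; a rough sketch (the name and exact expression here are
illustrative, not a quote of the in-tree helper):

/* A bio spans more than one segment iff the bytes still to transfer
 * exceed what its current (first) segment holds. Unlike a
 * "bi_vcnt == 1" test, this stays valid when one bvec covers many pages. */
static inline bool bio_spans_multiple_segments(struct bio *bio)
{
	return bio->bi_iter.bi_size != bio_iovec(bio).bv_len;
}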



[PATCH 19/60] fs/buffer: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/buffer.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b205a629001d..81c3793948b4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3018,8 +3018,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
 void guard_bio_eod(int op, struct bio *bio)
 {
sector_t maxsector;
-   struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned truncated_bytes;
+   /*
+* It is safe to truncate the last bvec in the following way
+* even though multipage bvec is supported, but we need to
+* fix the parameters passed to zero_user().
+*/
+   struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (!maxsector)
-- 
2.7.4
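
To make the new comment concrete, an illustration with invented numbers:

/* Illustration only. Say the device is 1000 sectors long and the last
 * bvec of a read bio covers 8 sectors, 4 of them past the end of device:
 *
 *   truncated_bytes = 4 << 9;                         // 2048
 *   bio->bi_iter.bi_size -= truncated_bytes;
 *   bvec->bv_len         -= truncated_bytes;
 *   zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
 *             truncated_bytes);
 *
 * Shrinking bi_size/bv_len works the same for a multipage bvec, but
 * zero_user() addresses a single page, so once bv_len can exceed
 * PAGE_SIZE the page/offset passed to it must be recomputed; that is
 * the "fix the parameters" caveat above. */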



[PATCH 29/60] dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT

2016-10-29 Thread Ming Lei
For BIO-based DM, some targets (such as the crypt and log-writes
targets) aren't ready to deal with incoming bios bigger than 1 Mbyte.

Signed-off-by: Ming Lei 
---
 drivers/md/dm.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ef7bf1dd6900..ce454c6c1a4e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, 
sector_t len)
return -EINVAL;
}
 
-   ti->max_io_len = (uint32_t) len;
+   /*
+* BIO based queue uses its own splitting. When multipage bvecs
+* is switched on, size of the incoming bio may be too big to
+* be handled in some targets, such as crypt and log write.
+*
+* When these targets are ready for the big bio, we can remove
+* the limit.
+*/
+   ti->max_io_len = min_t(uint32_t, len,
+  BIO_SP_MAX_SECTORS << SECTOR_SHIFT);
 
return 0;
 }
-- 
2.7.4



[PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS

2016-10-29 Thread Ming Lei
There are lots of direct accesses to the bio's .bi_vcnt & .bi_io_vec,
and BTRFS isn't ready to support multipage bvecs, so set NO_MP for
these request queues.

Signed-off-by: Ming Lei 
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 71a60cc01451..2e7237a3b84d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1011,6 +1011,9 @@ static int __btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
if (blk_queue_discard(q))
device->can_discard = 1;
 
+   /* BTRFS isn't ready to support multipage bvecs */
+   set_bit(QUEUE_FLAG_NO_MP, &q->queue_flags);
+
device->bdev = bdev;
device->in_fs_metadata = 0;
device->mode = flags;
-- 
2.7.4



Re: [PATCH] pinctrl: max77620: add OF dependency

2016-10-29 Thread Linus Walleij
On Fri, Oct 28, 2016 at 10:19 AM, Arnd Bergmann  wrote:

> Drivers using pinconf_generic_params tables cannot be built with
> CONFIG_OF disabled:
>
> drivers/pinctrl/pinctrl-max77620.c:53:44: error: array type has incomplete 
> element type ‘struct pinconf_generic_params’
> drivers/pinctrl/pinctrl-max77620.c:55:3: error: field name not in record or 
> union initializer
> drivers/pinctrl/pinctrl-max77620.c:55:3: note: (near initialization for 
> ‘max77620_cfg_params’)
> drivers/pinctrl/pinctrl-max77620.c:56:3: error: field name not in record or 
> union initializer
>
> This adds a dependency for max77620 to disallow that configuration.
>
> Alternatively, we could rework the pinctrl infrastructure to make the
> configuration valid for compile-testing.
>
> Cc: Krzysztof Kozlowski 
> Cc: Lee Jones 
> Fixes: 453943dc8f45 ("mfd: Enable compile testing for max77620 and max77686")
> Signed-off-by: Arnd Bergmann 

Patch applied.

Yours,
Linus Walleij


[PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP

2016-10-29 Thread Ming Lei
Some drivers (such as dm) should be capable of dealing with multipage
bvecs, but the incoming bio may still be too big: for example, a
singlepage-bvec bio of the same size can't be cloned from it, nor
allocated at all.

At least dm-crypt, log-writes and bcache have this kind of issue.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c  | 4 
 include/linux/blkdev.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2642e5fc8b69..266c94d1d82f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,6 +79,10 @@ static inline unsigned get_max_io_size(struct request_queue 
*q,
/* aligned to logical block size */
sectors &= ~(mask >> 9);
 
+   /* some queues can't handle bigger bio even it is ready for mp bvecs */
+   if (blk_queue_split_mp(q) && sectors > BIO_SP_MAX_SECTORS)
+   sectors = BIO_SP_MAX_SECTORS;
+
return sectors;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e4dd25361bd6..7cee0179c9e6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -506,6 +506,7 @@ struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25  /* flush not queueuable */
 #define QUEUE_FLAG_DAX         26  /* device supports DAX */
 #define QUEUE_FLAG_NO_MP       27  /* multipage bvecs isn't ready */
+#define QUEUE_FLAG_SPLIT_MP    28  /* split MP bvecs if too bigger */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_STACKABLE)|   \
@@ -597,6 +598,7 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)   test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
 #define blk_queue_no_mp(q) test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
+#define blk_queue_split_mp(q)  test_bit(QUEUE_FLAG_SPLIT_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4
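
A hypothetical driver-side sketch of how the flag is meant to be used (the
function name is invented; the set_bit style mirrors the NO_MP patches in
this series):

static void foo_init_queue(struct request_queue *q)
{
	/* Accept multipage bvecs, but have the core split bios down to
	 * BIO_SP_MAX_SECTORS (see get_max_io_size()) before dispatch. */
	set_bit(QUEUE_FLAG_SPLIT_MP, &q->queue_flags);
}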



[PATCH 11/60] fs: logfs: use bio_add_page() in __bdev_writeseg()

2016-10-29 Thread Ming Lei
This patch also simplifies the code a bit.

Signed-off-by: Ming Lei 
---
 fs/logfs/dev_bdev.c | 51 ---
 1 file changed, 20 insertions(+), 31 deletions(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 696dcdd65fdd..79be4cb0dfd8 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -72,56 +72,45 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, 
pgoff_t index,
 {
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
-   struct bio *bio;
+   struct bio *bio = NULL;
struct page *page;
unsigned int max_pages;
-   int i;
+   int i, ret;
 
max_pages = min_t(size_t, nr_pages, BIO_MAX_PAGES);
 
-   bio = bio_alloc(GFP_NOFS, max_pages);
-   BUG_ON(!bio);
-
for (i = 0; i < nr_pages; i++) {
-   if (i >= max_pages) {
-   /* Block layer cannot split bios :( */
-   bio->bi_vcnt = i;
-   bio->bi_iter.bi_size = i * PAGE_SIZE;
+   if (!bio) {
+   bio = bio_alloc(GFP_NOFS, max_pages);
+   BUG_ON(!bio);
+
bio->bi_bdev = super->s_bdev;
bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = writeseg_end_io;
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-   atomic_inc(&super->s_pending_writes);
-   submit_bio(bio);
-
-   ofs += i * PAGE_SIZE;
-   index += i;
-   nr_pages -= i;
-   i = 0;
-
-   bio = bio_alloc(GFP_NOFS, max_pages);
-   BUG_ON(!bio);
}
page = find_lock_page(mapping, index + i);
BUG_ON(!page);
-   bio->bi_io_vec[i].bv_page = page;
-   bio->bi_io_vec[i].bv_len = PAGE_SIZE;
-   bio->bi_io_vec[i].bv_offset = 0;
+   ret = bio_add_page(bio, page, PAGE_SIZE, 0);
 
BUG_ON(PageWriteback(page));
set_page_writeback(page);
unlock_page(page);
+
+   if (!ret) {
+   /* Block layer cannot split bios :( */
+   ofs += bio->bi_iter.bi_size;
+   atomic_inc(&super->s_pending_writes);
+   submit_bio(bio);
+   bio = NULL;
+   }
+   }
+
+   if (bio) {
+   atomic_inc(&super->s_pending_writes);
+   submit_bio(bio);
}
-   bio->bi_vcnt = nr_pages;
-   bio->bi_iter.bi_size = nr_pages * PAGE_SIZE;
-   bio->bi_bdev = super->s_bdev;
-   bio->bi_iter.bi_sector = ofs >> 9;
-   bio->bi_private = sb;
-   bio->bi_end_io = writeseg_end_io;
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-   atomic_inc(&super->s_pending_writes);
-   submit_bio(bio);
return 0;
 }
 
-- 
2.7.4
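
The underlying pattern, condensed into a self-contained sketch (not the
literal logfs code: end_io/bi_private setup and error handling are omitted,
and the page that does not fit is simply retried on the next bio):

static void write_pages(struct block_device *bdev, sector_t sector,
			struct page **pages, unsigned nr_pages)
{
	struct bio *bio = NULL;
	unsigned i = 0;

	while (i < nr_pages) {
		if (!bio) {
			bio = bio_alloc(GFP_NOFS, min_t(unsigned,
						nr_pages - i, BIO_MAX_PAGES));
			bio->bi_bdev = bdev;
			bio->bi_iter.bi_sector = sector;
			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
		}
		if (bio_add_page(bio, pages[i], PAGE_SIZE, 0)) {
			sector += PAGE_SIZE >> 9;	/* page accepted */
			i++;
		} else {
			submit_bio(bio);	/* bio full: flush it ... */
			bio = NULL;		/* ... and retry this page */
		}
	}
	if (bio)
		submit_bio(bio);
}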



[PATCH 35/60] bvec_iter: introduce BVEC_ITER_ALL_INIT

2016-10-29 Thread Ming Lei
Introduce BVEC_ITER_ALL_INIT for iterating one bio
from start to end.

Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 9df9e582bd3f..e12ce6bd63d7 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -183,4 +183,12 @@ static inline void bvec_iter_advance_mp(const struct 
bio_vec *bv,
((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
 bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
 
+#define BVEC_ITER_ALL_INIT (struct bvec_iter)  \
+{  \
+   .bi_sector  = 0,\
+   .bi_size= UINT_MAX, \
+   .bi_idx = 0,\
+   .bi_bvec_done   = 0,\
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.7.4
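
For orientation, what the initializer amounts to (a sketch of intent, not a
usage requirement stated by the patch):

/* BVEC_ITER_ALL_INIT describes "the whole bvec table, nothing consumed":
 * bi_idx = 0 and bi_bvec_done = 0 start at the first bvec, while
 * bi_size = UINT_MAX means the iterator itself never runs out of bytes,
 * so the caller (or the bvec table length) has to bound the walk. */
struct bvec_iter iter = BVEC_ITER_ALL_INIT;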



[PATCH 22/60] block: comment on bio_alloc_pages()

2016-10-29 Thread Ming Lei
This patch adds a comment on the usage of bio_alloc_pages(), and
also comments on one special case in bch_data_verify().

Signed-off-by: Ming Lei 
---
 block/bio.c   | 4 +++-
 drivers/md/bcache/debug.c | 6 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index db85c5753a76..a49d1d89a85c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -907,7 +907,9 @@ EXPORT_SYMBOL(bio_advance);
  * @bio: bio to allocate pages for
  * @gfp_mask: flags for allocation
  *
- * Allocates pages up to @bio->bi_vcnt.
+ * Allocates pages up to @bio->bi_vcnt, and this function should only
+ * be called on a new initialized bio, which means no page isn't added
+ * to the bio via bio_add_page() yet.
  *
  * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages 
are
  * freed.
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 430f3050663c..71a9f05918eb 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio 
*bio)
struct bio_vec bv, cbv;
struct bvec_iter iter, citer = { 0 };
 
+   /*
+* Once multipage bvec is supported, the bio_clone()
+* has to make sure page count in this bio can be held
+* in the new cloned bio because each single page need
+* to assign to each bvec of the new bio.
+*/
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
-- 
2.7.4
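
A sketch of the calling convention the new comment describes, modelled
loosely on how bcache pairs bch_bio_map() with bio_alloc_pages() (treat the
flags and helper name as illustrative):

/* Size the still-empty bvec table first, then attach a fresh page to
 * every bvec; only afterwards is the bio ready for use. */
static struct bio *alloc_filled_bio(unsigned nr_pages)
{
	struct bio *bio = bio_alloc(GFP_NOIO, nr_pages);

	bio->bi_iter.bi_size = nr_pages << PAGE_SHIFT;
	bch_bio_map(bio, NULL);	/* sets bi_vcnt/bv_len, leaves bv_page NULL */
	if (bio_alloc_pages(bio, __GFP_NOWARN | GFP_NOIO)) {
		bio_put(bio);
		return NULL;
	}
	return bio;
}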



[PATCH 21/60] bcache: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
All of these look safe once multipage bvecs are supported.

Signed-off-by: Ming Lei 
---
 drivers/md/bcache/btree.c | 1 +
 drivers/md/bcache/super.c | 6 ++
 drivers/md/bcache/util.c  | 7 +++
 3 files changed, 14 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd7b..b419bc91ba32 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
 
continue_at(cl, btree_node_write_done, NULL);
} else {
+   /* No harm for multipage bvec since the new is just allocated */
b->bio->bi_vcnt = 0;
bch_bio_map(b->bio, i);
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d8a6d807b498..52876fcf2b36 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -207,6 +207,7 @@ static void write_bdev_super_endio(struct bio *bio)
 
 static void __write_super(struct cache_sb *sb, struct bio *bio)
 {
+   /* single page bio, safe for multipage bvec */
struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
unsigned i;
 
@@ -1153,6 +1154,8 @@ static void register_bdev(struct cache_sb *sb, struct 
page *sb_page,
dc->bdev->bd_holder = dc;
 
bio_init_with_vec_table(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
+
+   /* single page bio, safe for multipage bvec */
dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);
 
@@ -1794,6 +1797,7 @@ void bch_cache_release(struct kobject *kobj)
for (i = 0; i < RESERVE_NR; i++)
free_fifo(&ca->free[i]);
 
+   /* single page bio, safe for multipage bvec */
if (ca->sb_bio.bi_inline_vecs[0].bv_page)
put_page(ca->sb_bio.bi_io_vec[0].bv_page);
 
@@ -1850,6 +1854,8 @@ static int register_cache(struct cache_sb *sb, struct 
page *sb_page,
ca->bdev->bd_holder = ca;
 
bio_init_with_vec_table(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
+
+   /* single page bio, safe for multipage bvec */
ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);
 
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index dde6172f3f10..5cc0b49a65fb 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t 
done)
: 0;
 }
 
+/*
+ * Generally it isn't good to access .bi_io_vec and .bi_vcnt
+ * directly, the preferred way is bio_add_page, but in
+ * this case, bch_bio_map() supposes that the bvec table
+ * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
+ * in this way even after multipage bvec is supported.
+ */
 void bch_bio_map(struct bio *bio, void *base)
 {
size_t size = bio->bi_iter.bi_size;
-- 
2.7.4



[PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP

2016-10-29 Thread Ming Lei
It is a bit difficult for MD (especially raid1 and raid10) to support
multipage bvecs, so introduce this flag to keep multipage bvecs
disabled; MD can then keep accepting singlepage bvecs only. Once the
direct accesses to the bvec table in MD and other fs/drivers are
cleaned up, the flag can be removed. BTRFS has a similar issue too.

Signed-off-by: Ming Lei 
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358ba052..e4dd25361bd6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
 #define QUEUE_FLAG_FUA         24  /* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25  /* flush not queueuable */
 #define QUEUE_FLAG_DAX 26  /* device supports DAX */
+#define QUEUE_FLAG_NO_MP   27  /* multipage bvecs isn't ready */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_STACKABLE)|   \
@@ -595,6 +596,7 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
 #define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)   test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_no_mp(q) test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4



[PATCH 33/60] block: introduce bio_for_each_segment_mp()

2016-10-29 Thread Ming Lei
This helper is used to iterate over multipage bvecs and is required
by bio_clone().

Signed-off-by: Ming Lei 
---
 include/linux/bio.h  | 38 +-
 include/linux/bvec.h | 37 -
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index fa71f6a57f81..17852ba0e40f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -65,6 +65,9 @@
 #define bio_sectors(bio)   ((bio)->bi_iter.bi_size >> 9)
 #define bio_end_sector(bio)((bio)->bi_iter.bi_sector + bio_sectors((bio)))
 
+#define bio_iter_iovec_mp(bio, iter)   \
+   bvec_iter_bvec_mp((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -167,15 +170,31 @@ static inline void *bio_data(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)  \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes, bool mp)
 {
iter->bi_sector += bytes >> 9;
 
-   if (bio_no_advance_iter(bio))
+   if (bio_no_advance_iter(bio)) {
iter->bi_size -= bytes;
-   else
-   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   } else {
+   if (!mp)
+   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   else
+   bvec_iter_advance_mp(bio->bi_io_vec, iter, bytes);
+   }
+}
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
+  unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, true);
 }
 
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
@@ -187,6 +206,15 @@ static inline void bio_advance_iter(struct bio *bio, 
struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)   \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_segment_mp(bvl, bio, iter, start)   \
+   for (iter = (start);\
+(iter).bi_size &&  \
+   ((bvl = bio_iter_iovec_mp((bio), (iter))), 1);  \
+bio_advance_iter_mp((bio), &(iter), (bvl).bv_len))
+
+#define bio_for_each_segment_mp(bvl, bio, iter)
\
+   __bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 12c53a0eee52..9df9e582bd3f 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -128,16 +128,29 @@ struct bvec_iter {
.bv_offset  = bvec_iter_offset((bvec), (iter)), \
 })
 
-static inline void bvec_iter_advance(const struct bio_vec *bv,
-struct bvec_iter *iter,
-unsigned bytes)
+#define bvec_iter_bvec_mp(bvec, iter)  \
+((struct bio_vec) {\
+   .bv_page= bvec_iter_page_mp((bvec), (iter)),\
+   .bv_len = bvec_iter_len_mp((bvec), (iter)), \
+   .bv_offset  = bvec_iter_offset_mp((bvec), (iter)),  \
+})
+
+static inline void __bvec_iter_advance(const struct bio_vec *bv,
+  struct bvec_iter *iter,
+  unsigned bytes, bool mp)
 {
WARN_ONCE(bytes > iter->bi_size,
  "Attempted to advance past end of bvec iter\n");
 
while (bytes) {
-   unsigned iter_len = bvec_iter_len(bv, *iter);
-   unsigned len = min(bytes, iter_len);
+   unsigned len;
+
+   if (mp)
+   len = bvec_iter_len_mp(bv, *iter);
+   else
+   len = bvec_iter_len_sp(bv, *iter);
+
+   len = min(bytes, len);
 
bytes -= len;
iter->bi_size -= len;
@@ -150,6 +163,20 @@ static inline void bvec_iter_advance(const struct bio_vec 
*bv,
}
 }
 
+static inline void bvec_iter_advance(const struct bio_vec *bv,
+struct bvec_iter *iter,
+unsigned bytes)
+{
+   __bvec_iter_advance(bv, iter, bytes, false);
+}
+
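
A hypothetical caller of the new iterator (function name and pr_debug format
are illustrative); note that each bvl it yields may describe several
physically contiguous pages:

static void dump_mp_segments(struct bio *bio)
{
	struct bio_vec bvl;
	struct bvec_iter iter;

	bio_for_each_segment_mp(bvl, bio, iter)
		pr_debug("mp seg: page=%p off=%u len=%u\n",
			 bvl.bv_page, bvl.bv_offset, bvl.bv_len);
}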

[PATCH 31/60] block: introduce multipage/single page bvec helpers

2016-10-29 Thread Ming Lei
This patch introduces helpers suffixed with _mp and _sp for the
multipage bvec/segment support.

The helpers with the _mp suffix are the interfaces for treating one
bvec/segment as a real multipage one; for example, .bv_len is the
total length of the multipage segment.

The helpers with the _sp suffix are interfaces supporting the current
bvec iterator, which is assumed to be singlepage-only by drivers,
filesystems, dm and so on. These _sp helpers are introduced to build
singlepage bvecs in flight, so users of the bio/bvec iterator keep
working and need not change even though multipage data is stored in
the bvec.

Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 57 +---
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b82d98f..da984fa171bc 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -24,6 +24,42 @@
 #include 
 
 /*
+ * What is multipage bvecs(segment)?
+ *
+ * - bvec stored in bio->bi_io_vec is always multipage style vector
+ *
+ * - bvec(struct bio_vec) represents one physically contiguous I/O
+ *   buffer; now the buffer may include more than one page since
+ *   multipage(mp) bvec is supported, and all the pages represented
+ *   by one bvec are physically contiguous. Before mp support, at most
+ *   one page could be included in one bvec, which we call a
+ *   singlepage(sp) bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp segment
+ *
+ * - .bv_offset of the bvec represents offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystem/dm/bcache/...:
+ *
+ * - almost everyone supposes that one bvec only includes one single
+ *   page, so we keep the sp interface not changed, for example,
+ *   bio_for_each_segment() still returns bvec with single page
+ *
+ * - bio_for_each_segment_all() will be changed to return singlepage
+ *   bvec too
+ *
+ * - during iterating, iterator variable(struct bvec_iter) is always
+ *   updated in multipage bvec style and that means bvec_iter_advance()
+ *   is kept not changed
+ *
+ * - returned(copied) singlepage bvec is generated in flight from bvec
+ *   helpers
+ *
+ * - In case that some components(such as iov_iter) need to support mp
+ *   segment, we introduce new helpers(suffixed with _mp) for them.
+ */
+
+/*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
 struct bio_vec {
@@ -49,16 +85,31 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)   (&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter) \
+#define bvec_iter_page_mp(bvec, iter)  \
(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)  \
+#define bvec_iter_len_mp(bvec, iter)   \
min((iter).bi_size, \
__bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)   \
+#define bvec_iter_offset_mp(bvec, iter)\
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+/*
+ * <page, offset, len> of sp segment.
+ *
+ * These helpers will be implemented for building sp bvecs in flight.
+ *
+ */
+#define bvec_iter_offset_sp(bvec, iter)        bvec_iter_offset_mp((bvec), (iter))
+#define bvec_iter_len_sp(bvec, iter)   bvec_iter_len_mp((bvec), (iter))
+#define bvec_iter_page_sp(bvec, iter)  bvec_iter_page_mp((bvec), (iter))
+
+/* current interfaces support sp style at default */
+#define bvec_iter_page(bvec, iter) bvec_iter_page_sp((bvec), (iter))
+#define bvec_iter_len(bvec, iter)  bvec_iter_len_sp((bvec), (iter))
+#define bvec_iter_offset(bvec, iter)   bvec_iter_offset_sp((bvec), (iter))
+
 #define bvec_iter_bvec(bvec, iter) \
 ((struct bio_vec) {\
.bv_page= bvec_iter_page((bvec), (iter)),   \
-- 
2.7.4
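
A small illustration of the mp/sp distinction, with invented numbers:

/* One multipage bvec describing 8 KiB of physically contiguous data:
 *
 *   bvec = { .bv_page = p0, .bv_offset = 0, .bv_len = 8192 }
 *
 * With iter.bi_bvec_done == 4096 (first 4 KiB consumed) and plenty of
 * bi_size remaining:
 *
 *   bvec_iter_len_mp()    -> 4096   rest of the whole mp segment
 *   bvec_iter_offset_mp() -> 4096   bv_offset + bi_bvec_done
 *   bvec_iter_page_mp()   -> p0     still the segment's first page
 *
 * The _sp variants are meant to clamp this view to one page at a time so
 * that existing bio_for_each_segment() users keep seeing singlepage bvecs;
 * at this point in the series they still alias the _mp versions. */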



[PATCH 40/60] blk-merge: compute bio->bi_seg_front_size efficiently

2016-10-29 Thread Ming Lei
It is enough to check and compute bio->bi_seg_front_size just after
the 1st segment is found, but the current code checks it for each
bvec, which is inefficient.

This patch follows the approach of __blk_recalc_rq_segments() for
computing bio->bi_seg_front_size; it is more efficient and the code
becomes more readable too.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 266c94d1d82f..465d9c65cb41 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -157,22 +157,21 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
continue;
}
 new_segment:
if (nsegs == queue_max_segments(q))
goto split;
 
+   if (nsegs == 1 && seg_size > front_seg_size)
+   front_seg_size = seg_size;
+
nsegs++;
bvprv = bv;
bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
}
 
do_split = false;
@@ -185,6 +184,8 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
bio = new;
}
 
+   if (nsegs == 1 && seg_size > front_seg_size)
+   front_seg_size = seg_size;
bio->bi_seg_front_size = front_seg_size;
if (seg_size > bio->bi_seg_back_size)
bio->bi_seg_back_size = seg_size;
-- 
2.7.4



[PATCH 39/60] bcache: debug: switch to bio_clone_sp()

2016-10-29 Thread Ming Lei
The cloned bio has to be singlepage-bvec based, so use
bio_clone_sp(); the allocated bvec table is big enough to hold the
bvecs because QUEUE_FLAG_SPLIT_MP is set for bcache.

Signed-off-by: Ming Lei 
---
 drivers/md/bcache/debug.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 71a9f05918eb..0735015b0842 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -111,12 +111,10 @@ void bch_data_verify(struct cached_dev *dc, struct bio 
*bio)
struct bvec_iter iter, citer = { 0 };
 
/*
-* Once multipage bvec is supported, the bio_clone()
-* has to make sure page count in this bio can be held
-* in the new cloned bio because each single page need
-* to assign to each bvec of the new bio.
+* QUEUE_FLAG_SPLIT_MP can make the cloned singlepage
+* bvecs to be held in the allocated bvec table.
 */
-   check = bio_clone(bio, GFP_NOIO);
+   check = bio_clone_sp(bio, GFP_NOIO);
if (!check)
return;
bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC);
-- 
2.7.4



[PATCH 21/60] bcache: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
Looks all are safe after multipage bvec is supported.

Signed-off-by: Ming Lei 
---
 drivers/md/bcache/btree.c | 1 +
 drivers/md/bcache/super.c | 6 ++
 drivers/md/bcache/util.c  | 7 +++
 3 files changed, 14 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd7b..b419bc91ba32 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
 
continue_at(cl, btree_node_write_done, NULL);
} else {
+   /* No harm for multipage bvec since the new is just allocated */
b->bio->bi_vcnt = 0;
bch_bio_map(b->bio, i);
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d8a6d807b498..52876fcf2b36 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -207,6 +207,7 @@ static void write_bdev_super_endio(struct bio *bio)
 
 static void __write_super(struct cache_sb *sb, struct bio *bio)
 {
+   /* single page bio, safe for multipage bvec */
struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
unsigned i;
 
@@ -1153,6 +1154,8 @@ static void register_bdev(struct cache_sb *sb, struct 
page *sb_page,
dc->bdev->bd_holder = dc;
 
bio_init_with_vec_table(>sb_bio, dc->sb_bio.bi_inline_vecs, 1);
+
+   /* single page bio, safe for multipage bvec */
dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);
 
@@ -1794,6 +1797,7 @@ void bch_cache_release(struct kobject *kobj)
for (i = 0; i < RESERVE_NR; i++)
free_fifo(>free[i]);
 
+   /* single page bio, safe for multipage bvec */
if (ca->sb_bio.bi_inline_vecs[0].bv_page)
put_page(ca->sb_bio.bi_io_vec[0].bv_page);
 
@@ -1850,6 +1854,8 @@ static int register_cache(struct cache_sb *sb, struct 
page *sb_page,
ca->bdev->bd_holder = ca;
 
bio_init_with_vec_table(>sb_bio, ca->sb_bio.bi_inline_vecs, 1);
+
+   /* single page bio, safe for multipage bvec */
ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);
 
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index dde6172f3f10..5cc0b49a65fb 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t 
done)
: 0;
 }
 
+/*
+ * Generally it isn't good to access .bi_io_vec and .bi_vcnt
+ * directly, the preferred way is bio_add_page, but in
+ * this case, bch_bio_map() supposes that the bvec table
+ * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
+ * in this way even after multipage bvec is supported.
+ */
 void bch_bio_map(struct bio *bio, void *base)
 {
size_t size = bio->bi_iter.bi_size;
-- 
2.7.4



[PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP

2016-10-29 Thread Ming Lei
MD(especially raid1 and raid10) is a bit difficult to support
multipage bvec, so introduce this flag for not enabling multipage
bvec, then MD can still accept singlepage bvec only, and once
direct access to bvec table in MD and other fs/drivers are cleanuped,
the flag can be removed. BTRFS has the similar issue too.

Signed-off-by: Ming Lei 
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358ba052..e4dd25361bd6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
 #define QUEUE_FLAG_FUA24   /* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ25  /* flush not queueuable */
 #define QUEUE_FLAG_DAX 26  /* device supports DAX */
+#define QUEUE_FLAG_NO_MP   27  /* multipage bvecs isn't ready */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_STACKABLE)|   \
@@ -595,6 +596,7 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
 #define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)   test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_no_mp(q) test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4



[PATCH 33/60] block: introduce bio_for_each_segment_mp()

2016-10-29 Thread Ming Lei
This helper is used to iterate multipage bvec and it is
required in bio_clone().

Signed-off-by: Ming Lei 
---
 include/linux/bio.h  | 38 +-
 include/linux/bvec.h | 37 -
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index fa71f6a57f81..17852ba0e40f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -65,6 +65,9 @@
 #define bio_sectors(bio)   ((bio)->bi_iter.bi_size >> 9)
 #define bio_end_sector(bio)((bio)->bi_iter.bi_sector + bio_sectors((bio)))
 
+#define bio_iter_iovec_mp(bio, iter)   \
+   bvec_iter_bvec_mp((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -167,15 +170,31 @@ static inline void *bio_data(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)  \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes, bool mp)
 {
iter->bi_sector += bytes >> 9;
 
-   if (bio_no_advance_iter(bio))
+   if (bio_no_advance_iter(bio)) {
iter->bi_size -= bytes;
-   else
-   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   } else {
+   if (!mp)
+   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   else
+   bvec_iter_advance_mp(bio->bi_io_vec, iter, bytes);
+   }
+}
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
+  unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, true);
 }
 
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
@@ -187,6 +206,15 @@ static inline void bio_advance_iter(struct bio *bio, 
struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)   \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_segment_mp(bvl, bio, iter, start)   \
+   for (iter = (start);\
+(iter).bi_size &&  \
+   ((bvl = bio_iter_iovec_mp((bio), (iter))), 1);  \
+bio_advance_iter_mp((bio), &(iter), (bvl).bv_len))
+
+#define bio_for_each_segment_mp(bvl, bio, iter)
\
+   __bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 12c53a0eee52..9df9e582bd3f 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -128,16 +128,29 @@ struct bvec_iter {
.bv_offset  = bvec_iter_offset((bvec), (iter)), \
 })
 
-static inline void bvec_iter_advance(const struct bio_vec *bv,
-struct bvec_iter *iter,
-unsigned bytes)
+#define bvec_iter_bvec_mp(bvec, iter)  \
+((struct bio_vec) {\
+   .bv_page= bvec_iter_page_mp((bvec), (iter)),\
+   .bv_len = bvec_iter_len_mp((bvec), (iter)), \
+   .bv_offset  = bvec_iter_offset_mp((bvec), (iter)),  \
+})
+
+static inline void __bvec_iter_advance(const struct bio_vec *bv,
+  struct bvec_iter *iter,
+  unsigned bytes, bool mp)
 {
WARN_ONCE(bytes > iter->bi_size,
  "Attempted to advance past end of bvec iter\n");
 
while (bytes) {
-   unsigned iter_len = bvec_iter_len(bv, *iter);
-   unsigned len = min(bytes, iter_len);
+   unsigned len;
+
+   if (mp)
+   len = bvec_iter_len_mp(bv, *iter);
+   else
+   len = bvec_iter_len_sp(bv, *iter);
+
+   len = min(bytes, len);
 
bytes -= len;
iter->bi_size -= len;
@@ -150,6 +163,20 @@ static inline void bvec_iter_advance(const struct bio_vec 
*bv,
}
 }
 
+static inline void bvec_iter_advance(const struct bio_vec *bv,
+struct bvec_iter *iter,
+unsigned bytes)
+{
+   __bvec_iter_advance(bv, iter, bytes, false);
+}
+
+static inline void 

[PATCH 31/60] block: introduce multipage/single page bvec helpers

2016-10-29 Thread Ming Lei
This patch introduces helpers which are suffixed with _mp
and _sp for the multipage bvec/segment support.

The helpers with _mp suffix are the interfaces for treating
one bvec/segment as real multipage one, for example, .bv_len
is the total length of the multipage segment.

The helpers with _sp suffix are interfaces for supporting
current bvec iterator which is thought as singlepage only
by drivers, fs, dm and etc. These _sp helpers are introduced
to build singlepage bvec in flight, so users of bio/bvec
iterator still can work well and needn't change even though
we store multipage into bvec.

Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 57 +---
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b82d98f..da984fa171bc 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -24,6 +24,42 @@
 #include 
 
 /*
+ * What is multipage bvecs(segment)?
+ *
+ * - bvec stored in bio->bi_io_vec is always multipage style vector
+ *
+ * - bvec(struct bio_vec) represents one physically contiguous I/O
+ *   buffer, now the buffer may include more than one pages since
+ *   multipage(mp) bvec is supported, and all these pages represented
+ *   by one bvec is physically contiguous. Before mp support, at most
+ *   one page can be included in one bvec, we call it singlepage(sp)
+ *   bvec.
+ *
+ * - .bv_page of th bvec represents the 1st page in the mp segment
+ *
+ * - .bv_offset of the bvec represents offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystem/dm/bcache/...:
+ *
+ * - almost everyone supposes that one bvec only includes one single
+ *   page, so we keep the sp interface not changed, for example,
+ *   bio_for_each_segment() still returns bvec with single page
+ *
+ * - bio_for_each_segment_all() will be changed to return singlepage
+ *   bvec too
+ *
+ * - during iterating, iterator variable(struct bvec_iter) is always
+ *   updated in multipage bvec style and that means bvec_iter_advance()
+ *   is kept not changed
+ *
+ * - returned(copied) singlepage bvec is generated in flight from bvec
+ *   helpers
+ *
+ * - In case that some components(such as iov_iter) need to support mp
+ *   segment, we introduce new helpers(suffixed with _mp) for them.
+ */
+
+/*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
 struct bio_vec {
@@ -49,16 +85,31 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)   (&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter) \
+#define bvec_iter_page_mp(bvec, iter)  \
(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)  \
+#define bvec_iter_len_mp(bvec, iter)   \
min((iter).bi_size, \
__bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)   \
+#define bvec_iter_offset_mp(bvec, iter)\
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+/*
+ *  of sp segment.
+ *
+ * These helpers will be implemented for building sp bvecs in flight.
+ *
+ */
+#define bvec_iter_offset_sp(bvec, iter)bvec_iter_offset_mp((bvec), 
(iter))
+#define bvec_iter_len_sp(bvec, iter)   bvec_iter_len_mp((bvec), (iter))
+#define bvec_iter_page_sp(bvec, iter)  bvec_iter_page_mp((bvec), (iter))
+
+/* current interfaces support sp style at default */
+#define bvec_iter_page(bvec, iter) bvec_iter_page_sp((bvec), (iter))
+#define bvec_iter_len(bvec, iter)  bvec_iter_len_sp((bvec), (iter))
+#define bvec_iter_offset(bvec, iter)   bvec_iter_offset_sp((bvec), (iter))
+
 #define bvec_iter_bvec(bvec, iter) \
 ((struct bio_vec) {\
.bv_page= bvec_iter_page((bvec), (iter)),   \
-- 
2.7.4
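
As a standalone illustration of the idea above (not part of the patch set;
the names mp_extent/emit_sp_chunks are made up and PAGE_SIZE is assumed to
be 4096), one stored multipage descriptor can be walked page by page,
emitting temporary singlepage views while the stored descriptor itself is
never modified:

  #include <stdio.h>

  #define PAGE_SIZE 4096u

  /* hypothetical stand-in for a multipage bvec: an offset and length
   * within a physically contiguous run of pages starting at 'first_page' */
  struct mp_extent {
          unsigned first_page;
          unsigned offset;        /* plays the role of .bv_offset */
          unsigned len;           /* plays the role of .bv_len, may span pages */
  };

  /* walk the extent and emit singlepage (page, offset, len) views in flight */
  static void emit_sp_chunks(const struct mp_extent *mp)
  {
          unsigned done = 0;      /* plays the role of bi_bvec_done */

          while (done < mp->len) {
                  unsigned off = mp->offset + done;
                  unsigned page = mp->first_page + off / PAGE_SIZE;
                  unsigned sp_off = off % PAGE_SIZE;
                  unsigned sp_len = mp->len - done;

                  if (sp_len > PAGE_SIZE - sp_off)
                          sp_len = PAGE_SIZE - sp_off;

                  printf("sp bvec: page %u, offset %u, len %u\n",
                         page, sp_off, sp_len);
                  done += sp_len; /* like advancing the iterator */
          }
  }

  int main(void)
  {
          /* one mp bvec covering 10000 bytes starting 3072 bytes into page 8 */
          struct mp_extent mp = { .first_page = 8, .offset = 3072, .len = 10000 };

          emit_sp_chunks(&mp);
          return 0;
  }

For this example the 10000-byte extent is handed out as four singlepage
chunks of 1024, 4096, 4096 and 784 bytes, while the stored mp descriptor
never changes.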



[PATCH 40/60] blk-merge: compute bio->bi_seg_front_size efficiently

2016-10-29 Thread Ming Lei
It is enough to check and compute bio->bi_seg_front_size just
after the 1st segment is found, but the current code checks it
for each bvec, which is inefficient.

This patch follows the approach of __blk_recalc_rq_segments()
for computing bio->bi_seg_front_size; it is more efficient and
the code becomes more readable too.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 266c94d1d82f..465d9c65cb41 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -157,22 +157,21 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
continue;
}
 new_segment:
if (nsegs == queue_max_segments(q))
goto split;
 
+   if (nsegs == 1 && seg_size > front_seg_size)
+   front_seg_size = seg_size;
+
nsegs++;
bvprv = bv;
bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
}
 
do_split = false;
@@ -185,6 +184,8 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
bio = new;
}
 
+   if (nsegs == 1 && seg_size > front_seg_size)
+   front_seg_size = seg_size;
bio->bi_seg_front_size = front_seg_size;
if (seg_size > bio->bi_seg_back_size)
bio->bi_seg_back_size = seg_size;
-- 
2.7.4



[PATCH 39/60] bcache: debug: switch to bio_clone_sp()

2016-10-29 Thread Ming Lei
The cloned bio has to be singlepage bvec based, so
use bio_clone_sp(); the allocated bvec table is big
enough to hold the bvecs because QUEUE_FLAG_SPLIT_MP
is set for bcache.

Signed-off-by: Ming Lei 
---
 drivers/md/bcache/debug.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 71a9f05918eb..0735015b0842 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -111,12 +111,10 @@ void bch_data_verify(struct cached_dev *dc, struct bio 
*bio)
struct bvec_iter iter, citer = { 0 };
 
/*
-* Once multipage bvec is supported, the bio_clone()
-* has to make sure page count in this bio can be held
-* in the new cloned bio because each single page need
-* to assign to each bvec of the new bio.
+* QUEUE_FLAG_SPLIT_MP can make the cloned singlepage
+* bvecs to be held in the allocated bvec table.
 */
-   check = bio_clone(bio, GFP_NOIO);
+   check = bio_clone_sp(bio, GFP_NOIO);
if (!check)
return;
bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC);
-- 
2.7.4



[PATCH 32/60] block: implement sp version of bvec iterator helpers

2016-10-29 Thread Ming Lei
This patch implements the singlepage version of the following
3 helpers:
- bvec_iter_offset_sp()
- bvec_iter_len_sp()
- bvec_iter_page_sp()

With these, one multipage bvec can be split into singlepage
bvecs, keeping users of the current bvec iterator happy.

Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index da984fa171bc..12c53a0eee52 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * What are multipage bvecs (segments)?
@@ -95,15 +96,25 @@ struct bvec_iter {
 #define bvec_iter_offset_mp(bvec, iter)\
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define bvec_iter_page_idx_mp(bvec, iter)  \
+   (bvec_iter_offset_mp((bvec), (iter)) / PAGE_SIZE)
+
 /*
  *  of sp segment.
  *
  * These helpers will be implemented for building sp bvecs in flight.
  *
  */
-#define bvec_iter_offset_sp(bvec, iter)bvec_iter_offset_mp((bvec), 
(iter))
-#define bvec_iter_len_sp(bvec, iter)   bvec_iter_len_mp((bvec), (iter))
-#define bvec_iter_page_sp(bvec, iter)  bvec_iter_page_mp((bvec), (iter))
+#define bvec_iter_offset_sp(bvec, iter)
\
+   (bvec_iter_offset_mp((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len_sp(bvec, iter)   \
+   min_t(unsigned, bvec_iter_len_mp((bvec), (iter)),   \
+   (PAGE_SIZE - (bvec_iter_offset_sp((bvec), (iter)))))
+
+#define bvec_iter_page_sp(bvec, iter)  \
+   nth_page(bvec_iter_page_mp((bvec), (iter)), \
+bvec_iter_page_idx_mp((bvec), (iter)))
 
 /* current interfaces support sp style at default */
 #define bvec_iter_page(bvec, iter) bvec_iter_page_sp((bvec), (iter))
-- 
2.7.4
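
For a concrete feel of the three formulas above, here is a small standalone
check in plain userspace C (illustrative only, not kernel code; the structs
below only mimic the fields the macros touch and PAGE_SIZE is assumed to be
4096):

  #include <assert.h>
  #include <stdio.h>

  #define PAGE_SIZE 4096u

  /* only the fields the _mp/_sp arithmetic actually touches */
  struct fake_bvec { unsigned bv_offset, bv_len; };
  struct fake_iter { unsigned bi_size, bi_bvec_done; };

  int main(void)
  {
          /* mp bvec: 12288 bytes starting 1024 bytes into its first page */
          struct fake_bvec bv = { .bv_offset = 1024, .bv_len = 12288 };
          /* iterator has already consumed 5120 bytes of this bvec */
          struct fake_iter it = { .bi_size = 12288 - 5120, .bi_bvec_done = 5120 };

          unsigned off_mp = bv.bv_offset + it.bi_bvec_done;        /* 6144 */
          unsigned len_mp = bv.bv_len - it.bi_bvec_done;           /* 7168 */

          if (len_mp > it.bi_size)
                  len_mp = it.bi_size;

          unsigned page_idx = off_mp / PAGE_SIZE;                  /* 1    */
          unsigned off_sp = off_mp % PAGE_SIZE;                    /* 2048 */
          unsigned len_sp = len_mp < PAGE_SIZE - off_sp ?
                                  len_mp : PAGE_SIZE - off_sp;     /* 2048 */

          assert(page_idx == 1 && off_sp == 2048 && len_sp == 2048);
          printf("sp view: page +%u, offset %u, len %u\n",
                 page_idx, off_sp, len_sp);
          return 0;
  }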



[PATCH 20/60] f2fs: f2fs_read_end_io: comment on direct access to bvec table

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/f2fs/data.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ae194fd2fdb..24f6f6977d37 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -35,6 +35,10 @@ static void f2fs_read_end_io(struct bio *bio)
int i;
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
+   /*
+* It is still safe to retrieve the 1st page of the bio
+* in this way after supporting multipage bvec.
+*/
if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO))
bio->bi_error = -EIO;
 #endif
-- 
2.7.4



[PATCH 24/60] md: set NO_MP for request queue of md

2016-10-29 Thread Ming Lei
MD isn't ready for multipage bvecs, so mark it as
NO_MP.

Signed-off-by: Ming Lei 
---
 drivers/md/md.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index eac84d8ff724..f8d98098dff8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5128,6 +5128,16 @@ static void md_safemode_timeout(unsigned long data)
 
 static int start_dirty_degraded;
 
+/*
+ * MD isn't ready for multipage bvecs yet, so set the flag
+ * so that MD still sees singlepage bvec bios
+ */
+static inline void md_set_no_mp(struct mddev *mddev)
+{
+   if (mddev->queue)
+   set_bit(QUEUE_FLAG_NO_MP, &mddev->queue->queue_flags);
+}
+
 int md_run(struct mddev *mddev)
 {
int err;
@@ -5353,6 +5363,8 @@ int md_run(struct mddev *mddev)
if (mddev->flags & MD_UPDATE_SB_FLAGS)
md_update_sb(mddev, 0);
 
+   md_set_no_mp(mddev);
+
md_new_event(mddev);
sysfs_notify_dirent_safe(mddev->sysfs_state);
sysfs_notify_dirent_safe(mddev->sysfs_action);
-- 
2.7.4



[PATCH 41/60] block: blk-merge: try to make front segments in full size

2016-10-29 Thread Ming Lei
When merging one bvec into a segment, if the bvec is too big
to merge, the current policy is to move the whole bvec into a
new segment.

This patch changes the policy to try to maximize the size of
the front segments: in the situation above, part of the bvec is
merged into the current segment and the remainder is put into
the next segment.

This prepares for multipage bvec support, because this case can
become quite common and we should try to fill the front segments
to full size.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 44 +++-
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 465d9c65cb41..a6457e70dafc 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -99,6 +99,7 @@ static struct bio *blk_bio_segment_split(struct request_queue 
*q,
struct bio *new = NULL;
const unsigned max_sectors = get_max_io_size(q, bio);
unsigned bvecs = 0;
+   unsigned advance;
 
bio_for_each_segment(bv, bio, iter) {
/*
@@ -129,6 +130,7 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
goto split;
 
+   advance = 0;
if (sectors + (bv.bv_len >> 9) > max_sectors) {
/*
 * Consider this a new segment if we're splitting in
@@ -145,12 +147,24 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
}
 
if (bvprvp && blk_queue_cluster(q)) {
-   if (seg_size + bv.bv_len > queue_max_segment_size(q))
-   goto new_segment;
if (!BIOVEC_PHYS_MERGEABLE(bvprvp, &bv))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, bvprvp, &bv))
goto new_segment;
+   if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
+   advance = queue_max_segment_size(q) - seg_size;
+
+   if (advance > 0) {
+   seg_size += advance;
+   sectors += advance >> 9;
+   bv.bv_len -= advance;
+   bv.bv_offset += advance;
+   } else {
+   advance = 0;
+   }
+
+   goto new_segment;
+   }
 
seg_size += bv.bv_len;
bvprv = bv;
@@ -172,6 +186,9 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
 
+   /* restore the bvec for iterator */
+   bv.bv_len += advance;
+   bv.bv_offset -= advance;
}
 
do_split = false;
@@ -371,16 +388,29 @@ __blk_segment_map_sg(struct request_queue *q, struct 
bio_vec *bvec,
 {
 
int nbytes = bvec->bv_len;
+   int advance = 0;
 
if (*sg && *cluster) {
-   if ((*sg)->length + nbytes > queue_max_segment_size(q))
-   goto new_segment;
-
if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;
 
+   /* try best to merge part of the bvec into previous seg */
+   if ((*sg)->length + nbytes > queue_max_segment_size(q)) {
+   advance = queue_max_segment_size(q) - (*sg)->length;
+   if (advance <= 0) {
+   advance = 0;
+   goto new_segment;
+   }
+
+   (*sg)->length += advance;
+
+   bvec->bv_offset += advance;
+   bvec->bv_len -= advance;
+   goto new_segment;
+   }
+
(*sg)->length += nbytes;
} else {
 new_segment:
@@ -403,6 +433,10 @@ __blk_segment_map_sg(struct request_queue *q, struct 
bio_vec *bvec,
 
sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
(*nsegs)++;
+
+   /* for making iterator happy */
+   bvec->bv_offset -= advance;
+   bvec->bv_len += advance;
}
*bvprv = *bvec;
 }
-- 
2.7.4
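
The effect of the changed policy can be sketched in a few lines of
standalone C (illustrative only; the names are made up and the
mergeability, max_sectors and max_segments checks of the real code are
ignored):

  #include <stdio.h>

  /*
   * New policy: when a chunk does not fit into the current segment,
   * fill the segment up to 'max_seg' first and carry only the
   * remainder into the next segment.  Sizes are in bytes.
   */
  static void split_front_full(const unsigned *chunks, int n, unsigned max_seg)
  {
          unsigned seg = 0;
          int i;

          for (i = 0; i < n; i++) {
                  unsigned left = chunks[i];

                  while (left) {
                          unsigned advance = max_seg - seg;

                          if (advance > left)
                                  advance = left;
                          seg += advance;
                          left -= advance;
                          if (seg == max_seg) {   /* front segment is full */
                                  printf("segment: %u\n", seg);
                                  seg = 0;
                          }
                  }
          }
          if (seg)
                  printf("segment: %u\n", seg);
  }

  int main(void)
  {
          /* e.g. three bvecs of 3000, 5000 and 2000 bytes, 4096-byte segments */
          unsigned chunks[] = { 3000, 5000, 2000 };

          split_front_full(chunks, 3, 4096);
          return 0;
  }

In this model the old behaviour would have closed the first segment at
3000 bytes, while the sketched policy fills it to 4096 and produces
segments of 4096, 4096 and 1808 bytes.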



[PATCH 50/60] ext4: convert to bio_for_each_segment_all_rd()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/ext4/page-io.c  | 3 ++-
 fs/ext4/readpage.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 0094923e5ebf..abde26af55e7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -63,8 +63,9 @@ static void ext4_finish_bio(struct bio *bio)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
 #ifdef CONFIG_EXT4_FS_ENCRYPTION
struct page *data_page = NULL;
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index a81b829d56de..b30444fd9333 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -71,6 +71,7 @@ static void mpage_end_io(struct bio *bio)
 {
struct bio_vec *bv;
int i;
+   struct bvec_iter_all bia;
 
if (ext4_bio_encrypted(bio)) {
if (bio->bi_error) {
@@ -80,7 +81,7 @@ static void mpage_end_io(struct bio *bio)
return;
}
}
-   bio_for_each_segment_all(bv, bio, i) {
+   bio_for_each_segment_all_rd(bv, bio, i, bia) {
struct page *page = bv->bv_page;
 
if (!bio->bi_error) {
-- 
2.7.4



Re: [PATCH v2 2/6] net: phy: broadcom: Add BCM54810 PHY entry

2016-10-29 Thread Andrew Lunn
On Fri, Oct 28, 2016 at 04:56:55PM -0400, Jon Mason wrote:
> The BCM54810 PHY requires some semi-unique configuration, which results
> in some additional configuration in addition to the standard config.
> Also, some users of the BCM54810 require the PHY lanes to be swapped.
> Since there is no way to detect this, add a device tree query to see if
> it is applicable.
> 
> Inspired-by: Vikas Soni 
> Signed-off-by: Jon Mason 
> ---
>  drivers/net/phy/Kconfig|  2 +-
>  drivers/net/phy/broadcom.c | 58 
> +-
>  include/linux/brcmphy.h| 10 

Hi Jon

The binding documentation is missing.

> + if (of_property_read_bool(np, "brcm,enet-phy-lane-swap")) {
> + /* Lane Swap - Undocumented register...magic! */
> + ret = bcm_phy_write_exp(phydev, MII_BCM54XX_EXP_SEL_ER + 0x9,
> + 0x11B);
> + if (ret < 0)
> + return ret;
> + }
> +

I wonder if this property could be made generic? What exactly are you
swapping? Rx and Tx lanes? Maybe we should add it to phy.txt?

  Andrew


[PATCH 49/60] fs/direct-io: convert to bio_for_each_segment_all_rd()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/direct-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index fb9aa16a7727..cfad1ac8fa53 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -487,7 +487,9 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
err = bio->bi_error;
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
-   bio_for_each_segment_all(bvec, bio, i) {
+   struct bvec_iter_all bia;
+
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
 
if (dio->op == REQ_OP_READ && !PageCompound(page) &&
-- 
2.7.4



[PATCH 43/60] block: use bio_for_each_segment_mp() to map sg

2016-10-29 Thread Ming Lei
It is more efficient to use bio_for_each_segment_mp()
for mapping the sg list; meanwhile we have to consider splitting
the multipage bvec as done in blk_bio_segment_split().

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 72 +++
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9142f1fc914b..e3b8cbc8b675 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -442,6 +442,56 @@ static int blk_phys_contig_segment(struct request_queue 
*q, struct bio *bio,
return 0;
 }
 
+static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
+   struct scatterlist *sglist)
+{
+   if (!*sg)
+   return sglist;
+   else {
+   /*
+* If the driver previously mapped a shorter
+* list, we could see a termination bit
+* prematurely unless it fully inits the sg
+* table on each mapping. We KNOW that there
+* must be more entries here or the driver
+* would be buggy, so force clear the
+* termination bit to avoid doing a full
+* sg_init_table() in drivers for each command.
+*/
+   sg_unmark_end(*sg);
+   return sg_next(*sg);
+   }
+}
+
+static inline unsigned
+blk_bvec_map_sg(struct request_queue *q, struct bio_vec *bvec,
+   struct scatterlist *sglist, struct scatterlist **sg)
+{
+   unsigned nbytes = bvec->bv_len;
+   unsigned nsegs = 0, total = 0;
+
+   while (nbytes > 0) {
+   unsigned seg_size;
+   struct page *pg;
+   unsigned offset, idx;
+
+   *sg = blk_next_sg(sg, sglist);
+
+   seg_size = min(nbytes, queue_max_segment_size(q));
+   offset = (total + bvec->bv_offset) % PAGE_SIZE;
+   idx = (total + bvec->bv_offset) / PAGE_SIZE;
+   pg = nth_page(bvec->bv_page, idx);
+
+   sg_set_page(*sg, pg, seg_size, offset);
+
+   total += seg_size;
+   nbytes -= seg_size;
+   nsegs++;
+   }
+
+   return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -475,25 +525,7 @@ __blk_segment_map_sg(struct request_queue *q, struct 
bio_vec *bvec,
(*sg)->length += nbytes;
} else {
 new_segment:
-   if (!*sg)
-   *sg = sglist;
-   else {
-   /*
-* If the driver previously mapped a shorter
-* list, we could see a termination bit
-* prematurely unless it fully inits the sg
-* table on each mapping. We KNOW that there
-* must be more entries here or the driver
-* would be buggy, so force clear the
-* termination bit to avoid doing a full
-* sg_init_table() in drivers for each command.
-*/
-   sg_unmark_end(*sg);
-   *sg = sg_next(*sg);
-   }
-
-   sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-   (*nsegs)++;
+   (*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
 
/* for making iterator happy */
bvec->bv_offset -= advance;
@@ -536,7 +568,7 @@ static int __blk_bios_map_sg(struct request_queue *q, 
struct bio *bio,
}
 
for_each_bio(bio)
-   bio_for_each_segment(bvec, bio, iter)
+   bio_for_each_segment_mp(bvec, bio, iter)
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 &nsegs, &cluster);
 
-- 
2.7.4
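
The arithmetic of blk_bvec_map_sg() can be modelled with a standalone
sketch (illustrative only; the function name and parameters below are made
up, PAGE_SIZE is assumed to be 4096, and real sg entries carry a struct
page rather than a page index):

  #include <stdio.h>

  #define PAGE_SIZE 4096u

  /*
   * One multipage bvec (offset + length relative to its first page) is
   * cut into entries of at most 'max_seg' bytes; each entry records the
   * page it starts in and the offset within that page, derived from the
   * running total, just like the kernel loop does.
   */
  static unsigned map_one_bvec(unsigned bv_offset, unsigned bv_len,
                               unsigned max_seg)
  {
          unsigned nbytes = bv_len, total = 0, nsegs = 0;

          while (nbytes > 0) {
                  unsigned seg_size = nbytes < max_seg ? nbytes : max_seg;
                  unsigned idx = (total + bv_offset) / PAGE_SIZE;
                  unsigned offset = (total + bv_offset) % PAGE_SIZE;

                  printf("sg entry: page +%u, offset %u, len %u\n",
                         idx, offset, seg_size);
                  total += seg_size;
                  nbytes -= seg_size;
                  nsegs++;
          }
          return nsegs;
  }

  int main(void)
  {
          /* 20000 bytes starting 512 bytes into the bvec's first page,
           * with an 8192-byte max segment size */
          printf("nsegs = %u\n", map_one_bvec(512, 20000, 8192));
          return 0;
  }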



[PATCH 52/60] logfs: convert to bio_for_each_segment_all_rd()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/logfs/dev_bdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index f05a02ff43e6..b81bd2154253 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -55,10 +55,11 @@ static void writeseg_end_io(struct bio *bio)
int i;
struct super_block *sb = bio->bi_private;
struct logfs_super *super = logfs_super(sb);
+   struct bvec_iter_all bia;
 
BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
end_page_writeback(bvec->bv_page);
put_page(bvec->bv_page);
}
-- 
2.7.4



[PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair

2016-10-29 Thread Ming Lei
This patch introduces bio_for_each_segment_all_rd() and
bio_for_each_segment_all_wt().

bio_for_each_segment_all_rd() replaces bio_for_each_segment_all()
in case the bvecs from bio->bi_io_vec are accessed read-only.

bio_for_each_segment_all_wt() replaces bio_for_each_segment_all()
in case the bvecs from bio->bi_io_vec need to be updated.

Signed-off-by: Ming Lei 
---
 include/linux/bio.h   | 15 +++
 include/linux/blk_types.h |  6 ++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index ec1c0f2aaa19..f8a025ffaa9c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -215,6 +215,21 @@ static inline void bio_advance_iter_mp(struct bio *bio, 
struct bvec_iter *iter,
 #define bio_for_each_segment_mp(bvl, bio, iter)
\
__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
 
+/* the bio has to be singlepage bvecs based */
+#define bio_for_each_segment_all_wt(bvl, bio, i)   \
+   bio_for_each_segment_all((bvl), (bio), (i))
+
+/*
+ * This helper returns singlepage bvec to caller for readonly
+ * purpose, and the caller can _not_ change the bvec stored in
+ * bio->bi_io_vec[] via this helper.
+ */
+#define bio_for_each_segment_all_rd(bvl, bio, i, bi)   \
+   for ((bi).iter = BVEC_ITER_ALL_INIT, i = 0, bvl = &(bi).bv; \
+(bi).iter.bi_idx < (bio)->bi_vcnt &&   \
+   (((bi).bv = bio_iter_iovec((bio), (bi).iter)), 1);  \
+bio_advance_iter((bio), &(bi).iter, (bi).bv.bv_len), i++)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned __bio_segments(struct bio *bio, bool mp)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ecec99d..b4a202e98016 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -108,6 +108,12 @@ struct bio {
 
 #define BIO_RESET_BYTESoffsetof(struct bio, bi_max_vecs)
 
+/* this iter is only for implementing bio_for_each_segment_all_rd() */
+struct bvec_iter_all {
+   struct bvec_iteriter;
+   struct bio_vec  bv;  /* in-flight singlepage bvec */
+};
+
 /*
  * bio flags
  */
-- 
2.7.4
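
The bvec_iter_all idea can be sketched in standalone C as an iterator
object carrying both the position and the in-flight singlepage copy handed
back to the caller (illustrative only; all names and types below are made
up and do not mirror the kernel structures exactly):

  #include <stdbool.h>
  #include <stdio.h>

  #define PAGE_SIZE 4096u

  /* hypothetical stand-ins; only what the iteration needs */
  struct xvec { unsigned first_page, offset, len; };      /* stored mp bvec */
  struct xvec_iter_all {
          unsigned idx;           /* which stored bvec */
          unsigned done;          /* bytes consumed of that bvec */
          struct xvec sp;         /* in-flight singlepage copy handed out */
  };

  /* produce the next read-only singlepage view and advance the iterator */
  static bool next_sp(const struct xvec *tbl, unsigned cnt,
                      struct xvec_iter_all *it)
  {
          const struct xvec *mp;
          unsigned off;

          if (it->idx >= cnt)
                  return false;
          mp = &tbl[it->idx];

          off = mp->offset + it->done;
          it->sp.first_page = mp->first_page + off / PAGE_SIZE;
          it->sp.offset = off % PAGE_SIZE;
          it->sp.len = mp->len - it->done;
          if (it->sp.len > PAGE_SIZE - it->sp.offset)
                  it->sp.len = PAGE_SIZE - it->sp.offset;

          it->done += it->sp.len;
          if (it->done == mp->len) {      /* move on to the next stored bvec */
                  it->idx++;
                  it->done = 0;
          }
          return true;
  }

  int main(void)
  {
          struct xvec tbl[] = { { 0, 1024, 6144 }, { 10, 0, 4096 } };
          struct xvec_iter_all it = { 0, 0, { 0, 0, 0 } };

          while (next_sp(tbl, 2, &it))
                  printf("page %u, offset %u, len %u\n",
                         it.sp.first_page, it.sp.offset, it.sp.len);
          return 0;
  }

The caller only ever sees the copy in it.sp, so the bvec table itself
stays untouched, which is the point of the _rd variant.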



[PATCH 47/60] block: convert to bio_for_each_segment_all_rd()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 block/bio.c| 17 +++--
 block/bounce.c |  6 --
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8e5af6e8bba3..c9cf0a81cca3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -934,7 +934,7 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   bio_for_each_segment_all_wt(bv, bio, i) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
@@ -1035,8 +1035,9 @@ static int bio_copy_from_iter(struct bio *bio, struct 
iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
ssize_t ret;
 
ret = copy_page_from_iter(bvec->bv_page,
@@ -1066,8 +1067,9 @@ static int bio_copy_to_iter(struct bio *bio, struct 
iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
ssize_t ret;
 
ret = copy_page_to_iter(bvec->bv_page,
@@ -1089,8 +1091,9 @@ void bio_free_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all_rd(bvec, bio, i, bia)
__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1390,11 +1393,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
/*
 * make sure we dirty pages we wrote to
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
 
@@ -1486,8 +1490,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
diff --git a/block/bounce.c b/block/bounce.c
index da240d1de809..5459127188c1 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -135,11 +135,12 @@ static void bounce_end_io(struct bio *bio, mempool_t 
*pool)
struct bio_vec *bvec, orig_vec;
int i;
struct bvec_iter orig_iter = bio_orig->bi_iter;
+   struct bvec_iter_all bia;
 
/*
 * free up bounce indirect pages used
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 
orig_vec = bio_iter_iovec(bio_orig, orig_iter);
if (bvec->bv_page == orig_vec.bv_page)
@@ -214,13 +215,14 @@ static void __blk_queue_bounce(struct request_queue *q, 
struct bio **bio_orig,
int rw = bio_data_dir(*bio_orig);
struct bio_vec *to;
unsigned i;
+   struct bvec_iter_all bia;
 
if (!need_bounce(q, *bio_orig))
return;
 
bio = bio_clone_bioset_sp(*bio_orig, GFP_NOIO, fs_bio_set);
 
-   bio_for_each_segment_all(to, bio, i) {
+   bio_for_each_segment_all_rd(to, bio, i, bia) {
struct page *page = to->bv_page;
 
if (page_to_pfn(page) <= queue_bounce_pfn(q))
-- 
2.7.4



[PATCH 48/60] fs/mpage: convert to bio_for_each_segment_all_rd()

2016-10-29 Thread Ming Lei
Signed-off-by: Ming Lei 
---
 fs/mpage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index d2413af0823a..2c906e82dd49 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -46,9 +46,10 @@
 static void mpage_end_io(struct bio *bio)
 {
struct bio_vec *bv;
+   struct bvec_iter_all bia;
int i;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   bio_for_each_segment_all_rd(bv, bio, i, bia) {
struct page *page = bv->bv_page;
page_endio(page, op_is_write(bio_op(bio)), bio->bi_error);
}
-- 
2.7.4


