Re: [PATCH v1 1/5] crypto: talitos - fix ECB and CBC algs ivsize

2019-06-11 Thread Christophe Leroy




On 11/06/2019 at 13:52, Horia Geanta wrote:

On 6/6/2019 2:31 PM, Christophe Leroy wrote:

commit d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")
wrongly modified CBC algs ivsize instead of ECB algs ivsize.

This restores the CBC algs' original ivsize and removes ECB's ones.

Signed-off-by: Christophe Leroy 
Fixes: d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")

Initial patch is correct:

$ git show -U10 d84cc9c9524e
[...]
@@ -2802,21 +2802,20 @@ static struct talitos_alg_template driver_algs[] = {
 {   .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
 .alg.crypto = {
 .cra_name = "ecb(aes)",
 .cra_driver_name = "ecb-aes-talitos",
 .cra_blocksize = AES_BLOCK_SIZE,
 .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
  CRYPTO_ALG_ASYNC,
 .cra_ablkcipher = {
 .min_keysize = AES_MIN_KEY_SIZE,
 .max_keysize = AES_MAX_KEY_SIZE,
-   .ivsize = AES_BLOCK_SIZE,
 .setkey = ablkcipher_aes_setkey,
 }
 },
[...]

and similar for ecb(des), ecb(des3_ede).

Current patch is incorrect: it adds ivsize for ecb and removes it from cbc.


Very strange. Looks like there has been some rebase weirdness which 
applied the patch to the wrong block at some point on my side, probably 
because the two lines preceding and following both blocks are 
identical.


I've now rebased my series on cryptodev/master and see the same 
behaviour as you. I'll resend the series without this patch.


Christophe




Horia



Re: [PATCH v1 2/5] crypto: talitos - move struct talitos_edesc into talitos.h

2019-06-11 Thread Horia Geanta
On 6/11/2019 3:38 PM, Christophe Leroy wrote:
> 
> 
> On 11/06/2019 at 13:57, Horia Geanta wrote:
>> On 6/6/2019 2:31 PM, Christophe Leroy wrote:
>>> Next patch will require struct talitos_edesc to be defined
>>> earlier in talitos.c
>>>
>>> This patch moves it into talitos.h so that it can be used
>>> from any place in talitos.c
>>>
>>> Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
>>> SEC1")
>> This isn't really a fix, so please drop the tag.
> 
> As the next patch requires this one and fixes 37b5e8897eb5, setting 
> Fixes: 37b5e8897eb5 here was my way of telling stable that this one 
> is required for the following one.
> 
> Otherwise, how can I ensure that this one will be taken when the next 
> one is taken?
> 
If you want these patches to be automatically sent to -stable (once they
are merged in the mainline tree), then add a Cc:  tag.
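
For illustration, a trailer block along these lines in the commit message is
what requests the automatic backport (the stable list address shown is the
standard documented one, and the Fixes line is simply the commit already
discussed in this thread):

Cc: <stable@vger.kernel.org>
Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1")
Signed-off-by: ...

Putting the Cc: tag on both this patch and the one depending on it addresses
your concern, since both will then be queued for -stable once they land in
mainline.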

Horia


Re: [PATCH v1 2/5] crypto: talitos - move struct talitos_edesc into talitos.h

2019-06-11 Thread Christophe Leroy




On 11/06/2019 at 13:57, Horia Geanta wrote:

On 6/6/2019 2:31 PM, Christophe Leroy wrote:

Next patch will require struct talitos_edesc to be defined
earlier in talitos.c

This patch moves it into talitos.h so that it can be used
from any place in talitos.c

Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
SEC1")

This isn't really a fix, so please drop the tag.


As the next patch requires this one and fixes 37b5e8897eb5, setting 
Fixes: 37b5e8897eb5 here was my way of telling stable that this one 
is required for the following one.


Otherwise, how can I ensure that this one will be taken when the next 
one is taken?


Christophe




Thanks,
Horia



Question - check in runtime which architecture am I running on

2019-06-11 Thread Oded Gabbay
Hello POWER developers,

I'm trying to find out whether there is an internal kernel API that a
PCI driver can call to check if its PCI device is running inside a
POWER9 machine. Alternatively, if that's not available, whether it is
running on a machine with the powerpc architecture.

I need this information because my device (Goya AI accelerator)
unfortunately needs a slightly different configuration of its PCIe
controller in the case of POWER9 (bit 59 needs to be set to 1 in all
outbound transactions).

Currently I'm reading the PCI vendor and device ID of the parent PCI
bus device and checking if it is PHB4 but that is an ugly hack. (see
this commit - 
https://github.com/HabanaAI/linux/commit/1efd75ad5c9779b99a9a38c899e4e25e227626bf)

I dug through the code but didn't find anything that can help me so I
thought of asking more experienced people.
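
To make the question concrete, the kind of check I would like to replace the
hack with might look like the sketch below (purely illustrative: the helper
name is made up, of_machine_is_compatible() is the existing device-tree API,
and the "ibm,powernv" compatible is an assumption that only covers bare-metal
POWER hosts, which is part of why I'm looking for a better interface):

#include <linux/kernel.h>
#include <linux/of.h>

/* Illustrative only: decide whether the POWER9 PCIe quirk is needed. */
static bool goya_needs_power9_quirk(void)
{
        /* Only powerpc64 kernels can be running on a POWER9 host. */
        if (!IS_ENABLED(CONFIG_PPC64))
                return false;

        /*
         * Assumption: match on the PowerNV root-node compatible. This
         * also matches other bare-metal POWER generations, so it is not
         * a precise POWER9 test, and it misses pseries guests entirely.
         */
        return of_machine_is_compatible("ibm,powernv") > 0;
}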

Thanks,
Oded


Re: [PATCH v1 0/5] Additional fixes on Talitos driver

2019-06-11 Thread Horia Geanta
On 6/6/2019 2:31 PM, Christophe Leroy wrote:
> This series is the last set of fixes for the Talitos driver.
> 
> We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and
> SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:
> 
I get failures, probably due to patch 1/5:

alg: skcipher: cbc-aes-talitos encryption test failed (wrong result) on test 
vector 0, cfg="in-place"
alg: skcipher: cbc-des-talitos encryption test failed (wrong result) on test 
vector 0, cfg="in-place"
alg: skcipher: cbc-3des-talitos encryption test failed (wrong result) on test 
vector 0, cfg="in-place"

Horia


Re: [PATCH v1 2/5] crypto: talitos - move struct talitos_edesc into talitos.h

2019-06-11 Thread Horia Geanta
On 6/6/2019 2:31 PM, Christophe Leroy wrote:
> Next patch will require struct talitos_edesc to be defined
> earlier in talitos.c
> 
> This patch moves it into talitos.h so that it can be used
> from any place in talitos.c
> 
> Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
> SEC1")
This isn't really a fix, so please drop the tag.

Thanks,
Horia


Re: [PATCH v1 1/5] crypto: talitos - fix ECB and CBC algs ivsize

2019-06-11 Thread Horia Geanta
On 6/6/2019 2:31 PM, Christophe Leroy wrote:
> commit d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")
> wrongly modified CBC algs ivsize instead of ECB algs ivsize.
> 
> This restores the CBC algs' original ivsize and removes ECB's ones.
> 
> Signed-off-by: Christophe Leroy 
> Fixes: d84cc9c9524e ("crypto: talitos - fix ECB algs ivsize")
Initial patch is correct:

$ git show -U10 d84cc9c9524e
[...]
@@ -2802,21 +2802,20 @@ static struct talitos_alg_template driver_algs[] = {
{   .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
.alg.crypto = {
.cra_name = "ecb(aes)",
.cra_driver_name = "ecb-aes-talitos",
.cra_blocksize = AES_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
 CRYPTO_ALG_ASYNC,
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
-   .ivsize = AES_BLOCK_SIZE,
.setkey = ablkcipher_aes_setkey,
}
},
[...]

and similar for ecb(des), ecb(des3_ede).

Current patch is incorrect: it adds ivsize for ecb and removes it from cbc.

Horia


Re: [PATCH v3 3/6] mm/nvdimm: Add page size and struct page size to pfn superblock

2019-06-11 Thread Jan Kara
On Tue 04-06-19 14:43:54, Aneesh Kumar K.V wrote:
> This is needed so that we don't wrongly initialize a namespace
> which doesn't have enough space reserved for holding struct pages
> with the current kernel.
> 
> We also increment PFN_MIN_VERSION to make sure that older kernel
> won't initialize namespace created with newer kernel.
> 
> Signed-off-by: Aneesh Kumar K.V 
...
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 00c57805cad3..e01eee9efafe 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -467,6 +467,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
> *sig)
>   if (__le16_to_cpu(pfn_sb->version_minor) < 2)
>   pfn_sb->align = 0;
>  
> + if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
> + /*
> +  * For a large part we use PAGE_SIZE. But we
> +  * do have some accounting code using SZ_4K.
> +  */
> + pfn_sb->page_struct_size = cpu_to_le16(64);
> + pfn_sb->page_size = cpu_to_le32(SZ_4K);
> + }
> +
>   switch (le32_to_cpu(pfn_sb->mode)) {
>   case PFN_MODE_RAM:
>   case PFN_MODE_PMEM:

As we discussed with Aneesh privately, this actually means that existing
NVDIMM namespaces on PPC64 will stop working due to these defaults for old
superblocks. I don't think that's a good thing as upgrading kernels is
going to be nightmare due to this on PPC64. So I believe we should make
defaults for old superblocks such that working setups keep working without
sysadmin having to touch anything.

Honza
-- 
Jan Kara 
SUSE Labs, CR


[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #9 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Christophe Leroy from comment #8)
> Argh !
> 
> CONFIG_SMP must (again) be the reason we missed it.
> 
> Can you please try the change below ?
Applied your change on top of 5.2-rc4. The G4 boots fine again, thanks!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-11 Thread Michael Neuling
On Tue, 2019-06-11 at 09:51 +0200, Christophe Leroy wrote:
> 
> On 11/06/2019 at 09:24, Michael Neuling wrote:
> > On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:
> > > On 11/06/2019 08:44, Michael Neuling wrote:
> > > > > >   2:
> > > > > > -BEGIN_FTR_SECTION
> > > > > > -   /* POWER9 with disabled DAWR */
> > > > > > +   LOAD_REG_ADDR(r11, dawr_force_enable)
> > > > > > +   lbz r11, 0(r11)
> > > > > > +   cmpdi   r11, 0
> > > > > > li  r3, H_HARDWARE
> > > > > > -   blr
> > > > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > > > > > +   beqlr
> > > > > 
> > > > > Why is this a 'beqlr' ? Shouldn't it be a blr ?
> > > > 
> > > > I believe it's right and should be a beqlr.  It's to replace the FTR
> > > > section to
> > > > make it dynamic based on the dawr_force_enable bit.
> > > 
> > > hmm, see the crash below on a L1 running a nested guest. r3 is set
> > > to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
> > > this ?
> > > 
> > > C.
> > > 
> > > 
> > > [   44.374746] BUG: Kernel NULL pointer dereference at 0x13bf
> > > [   44.374848] Faulting instruction address: 0xc010b044
> > > [   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> > > pSeries
> > > [   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM
> > > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack
> > > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
> > > xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter
> > > ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum
> > > crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4
> > > virtio_net net_failover virtio_scsi failover
> > > [   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not
> > > tainted 5.2.0-rc4+ #3
> > > [   44.375500] NIP:  c010b044 LR: c008089dacf4 CTR:
> > > c010aff4
> > > [   44.375604] REGS: c0179b397710 TRAP: 0300   Not tainted  (5.2.0-
> > > rc4+)
> > > [   44.375691] MSR:  8280b033
> > >   CR: 42244842  XER: 
> > > [   44.375815] CFAR: c010aff8 DAR: 13bf DSISR:
> > > 4200 IRQMASK: 0
> > > [   44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300
> > > 
> > > [   44.375815] GPR04:  0003 2444b05d
> > > c017f11c45d0
> > > [   44.375815] GPR08: 07803e018dfe 0028 0001
> > > 0075
> > > [   44.375815] GPR12: c010aff4 c7ff6300 
> > > 
> > > [   44.375815] GPR16:  c017f11d 
> > > c017f11ca7a8
> > > [   44.375815] GPR20: c017f11c42ec  
> > > 000a
> > > [   44.375815] GPR24: fffc  c017f11c
> > > c1a77ed8
> > > [   44.375815] GPR28: c0179af7 fffc c008089ff170
> > > c0179ae88540
> > > [   44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68
> > > [   44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0
> > > [kvm_hv]
> > > [   44.376849] Call Trace:
> > > [   44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c
> > > (unreliable)
> > > [   44.376982] [c0179b397a10] [c008089dd6bc]
> > > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
> > > [   44.377084] [c0179b397ae0] [c008093f8bcc]
> > > kvmppc_vcpu_run+0x34/0x48 [kvm]
> > > [   44.377185] [c0179b397b00] [c008093f522c]
> > > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
> > > [   44.377286] [c0179b397b90] [c008093e3618]
> > > kvm_vcpu_ioctl+0x460/0x850 [kvm]
> > > [   44.377384] [c0179b397d00] [c04ba6c4]
> > > do_vfs_ioctl+0xe4/0xb40
> > > [   44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110
> > > [   44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80
> > > [   44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70
> > > [   44.377712] Instruction dump:
> > > [   44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b
> > > 2c2b 3860
> > > [   44.377862] 4d820020 50852e74 508516f6 78840724  f8a313c8
> > > 7c942ba6 7cbc2ba6
> > 
> > Opps, it's because I corrupted r3 :-(
> > 
> > Does this fix it?
> > 
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index 139027c62d..f781ee1458 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> > LOAD_REG_ADDR(r11, dawr_force_enable)
> > lbz r11, 0(r11)
> > cmpdi   r11, 0
> > +   bne 3f
> > li  r3, H_HARDWARE
> > -   beqlr
> > +   blr
> > +3:
> 
> Or you could copy r3 into another unused volatile register and use it 
> instead of r3 below.

Re: [PATCH v3 1/6] nvdimm: Consider probe return -EOPNOTSUPP as success

2019-06-11 Thread Aneesh Kumar K.V


Hi Dan,

Any feedback on this?

A change I would like to get done on top of this series is

+   if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+   /*
+* For a large part we use PAGE_SIZE. But we
+* do have some accounting code using SZ_4K.
+*/
+   pfn_sb->page_struct_size = cpu_to_le16(64);
+   pfn_sb->page_size = cpu_to_le32(SZ_4K);
+   }
+

to

+   if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+   /*
+* For a large part we use PAGE_SIZE. But we
+* do have some accounting code using SZ_4K.
+*/
+   pfn_sb->page_struct_size = cpu_to_le16(64);
+   pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
+   }
+


That would make sure we will be able to access namespaces created on
powerpc with a newer kernel.

Kindly let me know if you want to see further changes to this series. Do
you think this is ready for the next merge window?

-aneesh

"Aneesh Kumar K.V"  writes:

> With the following patches we add EOPNOTSUPP as a return value from the probe
> callback to indicate we were not able to initialize a namespace due to a pfn
> superblock feature/version mismatch. We want to consider this a probe success
> so that we can create a new namespace seed and thereby avoid marking the
> failed namespace as the seed namespace.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/bus.c |  4 ++--
>  drivers/nvdimm/nd-core.h |  3 ++-
>  drivers/nvdimm/region_devs.c | 19 +++
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 2eb6a6cfe9e4..792b3e90453b 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -100,8 +100,8 @@ static int nvdimm_bus_probe(struct device *dev)
>  
>   nvdimm_bus_probe_start(nvdimm_bus);
>   rc = nd_drv->probe(dev);
> - if (rc == 0)
> - nd_region_probe_success(nvdimm_bus, dev);
> + if (rc == 0 || rc == -EOPNOTSUPP)
> + nd_region_probe_success(nvdimm_bus, dev, rc);
>   else
>   nd_region_disable(nvdimm_bus, dev);
>   nvdimm_bus_probe_end(nvdimm_bus);
> diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
> index e5ffd5733540..9e67a79fb6d5 100644
> --- a/drivers/nvdimm/nd-core.h
> +++ b/drivers/nvdimm/nd-core.h
> @@ -134,7 +134,8 @@ int __init nvdimm_bus_init(void);
>  void nvdimm_bus_exit(void);
>  void nvdimm_devs_exit(void);
>  void nd_region_devs_exit(void);
> -void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device 
> *dev);
> +void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus,
> +  struct device *dev, int ret);
>  struct nd_region;
>  void nd_region_create_ns_seed(struct nd_region *nd_region);
>  void nd_region_create_btt_seed(struct nd_region *nd_region);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index b4ef7d9ff22e..fcf3d8828540 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -723,7 +723,7 @@ void nd_mapping_free_labels(struct nd_mapping *nd_mapping)
>   * disable the region.
>   */
>  static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
> - struct device *dev, bool probe)
> +struct device *dev, bool probe, int 
> ret)
>  {
>   struct nd_region *nd_region;
>  
> @@ -753,6 +753,16 @@ static void nd_region_notify_driver_action(struct 
> nvdimm_bus *nvdimm_bus,
>   nd_region_create_ns_seed(nd_region);
>   nvdimm_bus_unlock(dev);
>   }
> +
> + if (dev->parent && is_nd_region(dev->parent) &&
> + !probe && (ret == -EOPNOTSUPP)) {
> + nd_region = to_nd_region(dev->parent);
> + nvdimm_bus_lock(dev);
> + if (nd_region->ns_seed == dev)
> + nd_region_create_ns_seed(nd_region);
> + nvdimm_bus_unlock(dev);
> + }
> +
>   if (is_nd_btt(dev) && probe) {
>   struct nd_btt *nd_btt = to_nd_btt(dev);
>  
> @@ -788,14 +798,15 @@ static void nd_region_notify_driver_action(struct 
> nvdimm_bus *nvdimm_bus,
>   }
>  }
>  
> -void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device 
> *dev)
> +void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus,
> +  struct device *dev, int ret)
>  {
> - nd_region_notify_driver_action(nvdimm_bus, dev, true);
> + nd_region_notify_driver_action(nvdimm_bus, dev, true, ret);
>  }
>  
>  void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev)
>  {
> - nd_region_notify_driver_action(nvdimm_bus, dev, false);
> + nd_region_notify_driver_action(nvdimm_bus, dev, false, 0);
>  }
>  
>  static ssize_t mappingN(struct device *dev, char *buf, int n)
> -- 
> 2.21.0



Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Benjamin Herrenschmidt
On Tue, 2019-06-11 at 09:54 +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 04:59:54PM +1000, Benjamin Herrenschmidt
> wrote:
> > Ah stupid me ... it's dma_set_mask that failed, since it has no
> > idea
> > that the calling driver is limited to lowmem.
> > 
> > That's also why the "wrong" patch worked.
> > 
> > So yes, a ZONE_DMA at 30-bits will work, though it's somewhat
> > overkill.
> 
> Well, according to Larry it doesn't actually work, which is odd.

Oh I assume that's just a glitch in the patch :-)

Cheers,
Ben.




Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-06-11 Thread Wanpeng Li
On Tue, 11 Jun 2019 at 07:51, Naoya Horiguchi  wrote:
>
> On Wed, May 29, 2019 at 04:31:01PM -0700, Mike Kravetz wrote:
> > On 5/28/19 2:49 AM, Wanpeng Li wrote:
> > > Cc Paolo,
> > > Hi all,
> > > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz  
> > > wrote:
> > >>
> > >> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> > >>> Andrew Morton  writes:
> > >>>
> >  On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal 
> >   wrote:
> > 
> > >>
> > >> So I don't think that the above test result means that errors are 
> > >> properly
> > >> handled, and the proposed patch should help for arm64.
> > >
> > > Although the deviation of pud_huge() avoids a kernel crash, the code
> > > would be easier to maintain and reason about if the arm64 helpers were
> > > consistent with the expectations of core code.
> > >
> > > I'll look to update the arm64 helpers once this patch gets merged. But
> > > it would be helpful if there was a clear expression of semantics for
> > > pud_huge() for various cases. Is there any version that can be used as
> > > reference?
> > 
> >  Is that an ack or tested-by?
> > 
> >  Mike keeps plaintively asking the powerpc developers to take a look,
> >  but they remain steadfastly in hiding.
> > >>>
> > >>> Cc'ing linuxppc-dev is always a good idea :)
> > >>>
> > >>
> > >> Thanks Michael,
> > >>
> > >> I was mostly concerned about use cases for soft/hard offline of huge 
> > >> pages
> > >> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
> > >> huge pages, and soft/hard offline support was specifically added for 
> > >> this.
> > >> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB 
> > >> pages
> > >> at PGD level"
> > >>
> > >> This patch will disable that functionality.  So, at a minimum this is a
> > >> 'heads up'.  If there are actual use cases that depend on this, then more
> > >> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
> > >> support, I can not tell if there is a real use case or this is just a
> > >> 'nice to have'.
> > >
> > > 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployment, we
> > > encounter gup_pud_range() panic several times in product environment.
> > > Is there any plan to reenable and fix arch codes?
> >
> > I too am aware of slightly more interest in 1G huge pages.  I suspect that as
> > Intel MMU capacity increases to handle more TLB entries there will be more
> > and more interest.
> >
> > Personally, I am not looking at this issue.  Perhaps Naoya will comment as
> > he knows most about this code.
>
> Thanks for forwarding this to me. I feel that memory error handling
> on 1GB hugepages is in demand as a real use case.
>
> >
> > > In addition, 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> > > The memory in the guest can be 1GB/2MB/4K, though the host-backing memory
> > > is 1GB hugetlbfs pages. After the above PUD panic is fixed,
> > > try_to_unmap(), which is called in the MCA recovery path, will mark the PUD
> > > hwpoison entry. The guest will then vmexit and retry endlessly when
> > > accessing any memory in the guest which is backed by this 1GB poisoned
> > > hugetlbfs page. We have a plan to split this 1GB hugetlbfs page into 2MB
> > > hugetlbfs pages/4KB pages, maybe remap the file to a virtual address range
> > > at 2MB/4KB page granularity, and also split the KVM MMU 1GB SPTE
> > > into 2MB/4KB SPTEs and mark the offending SPTE with a hwpoison flag; a
> > > SIGBUS will then be delivered to the VM at the next page fault on the
> > > offending SPTE. Is this proposal acceptable?
> >
> > I am not sure of the error handling design, but this does sound reasonable.
>
> I agree that that's better.
>
> > That block of code which potentially dissolves a huge page on memory error
> > is hard to understand and I'm not sure if that is even the 'normal'
> > functionality.  Certainly, we would hate to waste/poison an entire 1G page
> > for an error on a small subsection.
>
> Yes, that's not practical, so we first need to establish the code base for
> 2MB hugetlb splitting and then extend it to 1GB next.

I'm working on this, thanks for the inputs.

Regards,
Wanpeng Li


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Christoph Hellwig
On Tue, Jun 11, 2019 at 04:59:54PM +1000, Benjamin Herrenschmidt wrote:
> Ah stupid me ... it's dma_set_mask that failed, since it has no idea
> that the calling driver is limited to lowmem.
> 
> That's also why the "wrong" patch worked.
> 
> So yes, a ZONE_DMA at 30-bits will work, though it's somewhat overkill.

Well, according to Larry it doesn't actually work, which is odd.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Christoph Hellwig
On Tue, Jun 11, 2019 at 04:58:12PM +1000, Benjamin Herrenschmidt wrote:
>  ... which b43legacy doesn't set to the best of my knowledge ...
> 
> Which makes me wonder how come it didn't work even with your patches ?
> AFAIK, we have less than 1GB of lowmem unless the config has been
> tweaked

It needs to bounce to somewhere.  And the dma-direct code is pretty
strict in requiring a zone it can do allocations from when setting the
DMA mask.  So was the old ppc64 code, but not the ppc32 code, which
allowed setting any DMA mask.  And something about the stricter
validation seems to trip things up now.
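
To make that concrete, this is the shape of the call that trips the stricter
validation (an illustrative fragment only, not b43legacy code; the 30-bit mask
stands in for the device's lowmem-only limitation):

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Illustrative only: request a 30-bit DMA mask for a limited device. */
static int example_setup_dma(struct pci_dev *pdev)
{
        /*
         * With dma-direct this is rejected unless some zone (e.g. a
         * ZONE_DMA covering 30 bits) can back allocations for the mask,
         * whereas the old ppc32 code accepted any mask here.
         */
        if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(30)))
                return -EIO;

        return 0;
}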


Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-11 Thread Christophe Leroy




On 11/06/2019 at 09:24, Michael Neuling wrote:

On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:

On 11/06/2019 08:44, Michael Neuling wrote:

  2:
-BEGIN_FTR_SECTION
-   /* POWER9 with disabled DAWR */
+   LOAD_REG_ADDR(r11, dawr_force_enable)
+   lbz r11, 0(r11)
+   cmpdi   r11, 0
li  r3, H_HARDWARE
-   blr
-END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
+   beqlr


Why is this a 'beqlr' ? Shouldn't it be a blr ?


I believe it's right and should be a beqlr.  It's to replace the FTR section to
make it dynamic based on the dawr_force_enable bit.


hmm, see the crash below on a L1 running a nested guest. r3 is set
to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
this ?

C.


[   44.374746] BUG: Kernel NULL pointer dereference at 0x13bf
[   44.374848] Faulting instruction address: 0xc010b044
[   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
[   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM 
iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack 
nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp 
bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables 
iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm 
sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi 
failover
[   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 
5.2.0-rc4+ #3
[   44.375500] NIP:  c010b044 LR: c008089dacf4 CTR: c010aff4
[   44.375604] REGS: c0179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
[   44.375691] MSR:  8280b033   CR: 
42244842  XER: 
[   44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 
IRQMASK: 0
[   44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 

[   44.375815] GPR04:  0003 2444b05d 
c017f11c45d0
[   44.375815] GPR08: 07803e018dfe 0028 0001 
0075
[   44.375815] GPR12: c010aff4 c7ff6300  

[   44.375815] GPR16:  c017f11d  
c017f11ca7a8
[   44.375815] GPR20: c017f11c42ec   
000a
[   44.375815] GPR24: fffc  c017f11c 
c1a77ed8
[   44.375815] GPR28: c0179af7 fffc c008089ff170 
c0179ae88540
[   44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68
[   44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 
[kvm_hv]
[   44.376849] Call Trace:
[   44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c 
(unreliable)
[   44.376982] [c0179b397a10] [c008089dd6bc] 
kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
[   44.377084] [c0179b397ae0] [c008093f8bcc] kvmppc_vcpu_run+0x34/0x48 
[kvm]
[   44.377185] [c0179b397b00] [c008093f522c] 
kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[   44.377286] [c0179b397b90] [c008093e3618] kvm_vcpu_ioctl+0x460/0x850 
[kvm]
[   44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40
[   44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110
[   44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80
[   44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70
[   44.377712] Instruction dump:
[   44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 
3860
[   44.377862] 4d820020 50852e74 508516f6 78840724  f8a313c8 7c942ba6 
7cbc2ba6



Opps, it's because I corrupted r3 :-(

Does this fix it?


diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 139027c62d..f781ee1458 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
LOAD_REG_ADDR(r11, dawr_force_enable)
lbz r11, 0(r11)
cmpdi   r11, 0
+   bne 3f
li  r3, H_HARDWARE
-   beqlr
+   blr
+3:


Or you could copy r3 into another unused volatile register and use it 
instead of r3 below.


Christophe



/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
rlwimi  r5, r4, 2, DAWRX_WT



[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #8 from Christophe Leroy (christophe.le...@c-s.fr) ---
Argh !

CONFIG_SMP must (again) be the reason we missed it.

Can you please try the change below ?

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 1d5f1bd0dacd..f255e22184b4 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -752,6 +752,7 @@ __secondary_start:
stw r0,0(r3)

/* load up the MMU */
+   bl  load_segment_registers
bl  load_up_mmu

/* ptr to phys current thread */

Thanks
Christophe

On 06/11/2019 12:32 AM, bugzilla-dae...@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=203839
> 
> --- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
> Created attachment 283183
> --> https://bugzilla.kernel.org/attachment.cgi?id=283183&action=edit
> bisect.log
> 
> bisect took me a while due to quite some skips. Cherry-picking
> 397d2300b08cdee052053e362018cdb6dd65eea2 and
> 305d60012304684bd59ea1f67703e51662e4906a helped me complete it.
> 
> # git bisect good | tee -a /root/bisect02.log
> 215b823707ce4e8e52b106915f70357fa474c669 is the first bad commit
> commit 215b823707ce4e8e52b106915f70357fa474c669
> Author: Christophe Leroy 
> Date:   Fri Apr 26 16:23:36 2019 +
> 
>  powerpc/32s: set up an early static hash table for KASAN.
> 
>  KASAN requires early activation of hash table, before memblock()
>  functions are available.
> 
>  This patch implements an early hash_table statically defined in
>  __initdata.
> 
>  During early boot, a single page table is used.
> 
>  For hash32, when doing the final init, one page table is allocated
>  for each PGD entry because of the _PAGE_HASHPTE flag which can't be
>  common to several virt pages. This is done after memblock get
>  available but before switching to the final hash table, otherwise
>  there are issues with TLB flushing due to the shared entries.
> 
>  Signed-off-by: Christophe Leroy 
>  Signed-off-by: Michael Ellerman 
> 
> :04 04 abc24eb3c4ad3e4f2b1eb7b52c295c8b95d79a78
> c3b6114c26eb8e181abb3f1abc9b6ecc12292f4d M  arch
>

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-11 Thread Michael Neuling
On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote:
> On 11/06/2019 08:44, Michael Neuling wrote:
> > > >  2:
> > > > -BEGIN_FTR_SECTION
> > > > -   /* POWER9 with disabled DAWR */
> > > > +   LOAD_REG_ADDR(r11, dawr_force_enable)
> > > > +   lbz r11, 0(r11)
> > > > +   cmpdi   r11, 0
> > > > li  r3, H_HARDWARE
> > > > -   blr
> > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > > > +   beqlr
> > > 
> > > Why is this a 'beqlr' ? Shouldn't it be a blr ? 
> > 
> > I believe it's right and should be a beqlr.  It's to replace the FTR 
> > section to
> > make it dynamic based on the dawr_force_enable bit.
> 
> hmm, see the crash below on a L1 running a nested guest. r3 is set
> to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
> this ? 
> 
> C.
> 
> 
> [   44.374746] BUG: Kernel NULL pointer dereference at 0x13bf
> [   44.374848] Faulting instruction address: 0xc010b044
> [   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
> [   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA 
> pSeries
> [   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM 
> iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack 
> nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp 
> bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables 
> iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm 
> sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi 
> failover
> [   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not 
> tainted 5.2.0-rc4+ #3
> [   44.375500] NIP:  c010b044 LR: c008089dacf4 CTR: 
> c010aff4
> [   44.375604] REGS: c0179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
> [   44.375691] MSR:  8280b033   CR: 
> 42244842  XER: 
> [   44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 
> IRQMASK: 0 
> [   44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 
>  
> [   44.375815] GPR04:  0003 2444b05d 
> c017f11c45d0 
> [   44.375815] GPR08: 07803e018dfe 0028 0001 
> 0075 
> [   44.375815] GPR12: c010aff4 c7ff6300  
>  
> [   44.375815] GPR16:  c017f11d  
> c017f11ca7a8 
> [   44.375815] GPR20: c017f11c42ec   
> 000a 
> [   44.375815] GPR24: fffc  c017f11c 
> c1a77ed8 
> [   44.375815] GPR28: c0179af7 fffc c008089ff170 
> c0179ae88540 
> [   44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68
> [   44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 
> [kvm_hv]
> [   44.376849] Call Trace:
> [   44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c 
> (unreliable)
> [   44.376982] [c0179b397a10] [c008089dd6bc] 
> kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
> [   44.377084] [c0179b397ae0] [c008093f8bcc] 
> kvmppc_vcpu_run+0x34/0x48 [kvm]
> [   44.377185] [c0179b397b00] [c008093f522c] 
> kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
> [   44.377286] [c0179b397b90] [c008093e3618] 
> kvm_vcpu_ioctl+0x460/0x850 [kvm]
> [   44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40
> [   44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110
> [   44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80
> [   44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70
> [   44.377712] Instruction dump:
> [   44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 
> 3860 
> [   44.377862] 4d820020 50852e74 508516f6 78840724  f8a313c8 
> 7c942ba6 7cbc2ba6 


Opps, it's because I corrupted r3 :-(

Does this fix it?


diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 139027c62d..f781ee1458 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
LOAD_REG_ADDR(r11, dawr_force_enable)
lbz r11, 0(r11)
cmpdi   r11, 0
+   bne 3f
li  r3, H_HARDWARE
-   beqlr
+   blr
+3:
/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
rlwimi  r5, r4, 2, DAWRX_WT





Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Benjamin Herrenschmidt
On Tue, 2019-06-11 at 16:58 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 08:08 +0200, Christoph Hellwig wrote:
> > On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt
> > wrote:
> > > The reason I think it sort-of-mostly-worked is that to get more
> > > than
> > > 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> > > buffers aren't allocated in Highmem so you got lucky.
> > > 
> > > That said, there is such a thing as no-copy send on the network, so I
> > > wouldn't be surprised if some things would still have failed, just
> > > not
> > > frequent enough for you to notice.
> > 
> > Unless NETIF_F_HIGHDMA is set on a netdev, the core networking code
> > will bounce buffer highmem pages for the driver under all
> > circumstances.
> 
>  ... which b43legacy doesn't set to the best of my knowledge ...
> 
> Which makes me wonder how come it didn't work even with your patches ?
> AFAIK, we have less than 1GB of lowmem unless the config has been
> tweaked

Ah stupid me ... it's dma_set_mask that failed, since it has no idea
that the calling driver is limited to lowmem.

That's also why the "wrong" patch worked.

So yes, a ZONE_DMA at 30-bits will work, though it's somewhat overkill.

Cheers,
Ben.




Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-11 Thread Anshuman Khandual
On 06/11/2019 05:46 AM, Nicholas Piggin wrote:
> Anshuman Khandual's on June 10, 2019 6:53 pm:
>> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them.
>>
>> IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc. 
>>
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Sure will try this on arm64.
>>
>>>
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>  include/asm-generic/4level-fixup.h |   1 +
>>>  include/asm-generic/5level-fixup.h |   1 +
>>>  include/linux/vmalloc.h|   1 +
>>>  mm/vmalloc.c   | 132 +++--
>>>  4 files changed, 107 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/asm-generic/4level-fixup.h 
>>> b/include/asm-generic/4level-fixup.h
>>> index e3667c9a33a5..3cc65a4dd093 100644
>>> --- a/include/asm-generic/4level-fixup.h
>>> +++ b/include/asm-generic/4level-fixup.h
>>> @@ -20,6 +20,7 @@
>>>  #define pud_none(pud)  0
>>>  #define pud_bad(pud)   0
>>>  #define pud_present(pud)   1
>>> +#define pud_large(pud) 0
>>>  #define pud_ERROR(pud) do { } while (0)
>>>  #define pud_clear(pud) pgd_clear(pud)
>>>  #define pud_val(pud)   pgd_val(pud)
>>> diff --git a/include/asm-generic/5level-fixup.h 
>>> b/include/asm-generic/5level-fixup.h
>>> index bb6cb347018c..c4377db09a4f 100644
>>> --- a/include/asm-generic/5level-fixup.h
>>> +++ b/include/asm-generic/5level-fixup.h
>>> @@ -22,6 +22,7 @@
>>>  #define p4d_none(p4d)  0
>>>  #define p4d_bad(p4d)   0
>>>  #define p4d_present(p4d)   1
>>> +#define p4d_large(p4d) 0
>>>  #define p4d_ERROR(p4d) do { } while (0)
>>>  #define p4d_clear(p4d) pgd_clear(p4d)
>>>  #define p4d_val(p4d)   pgd_val(p4d)
>>
>> Both of these are required from vmalloc_to_page() which as per a later 
>> comment
>> should be part of a prerequisite patch before this series.
> 
> I'm not sure what you mean. This patch is where they get used.

I meant in case you move the vmalloc_to_page() changes out to a separate patch.

> 
> Possibly I could split this and the vmalloc_to_page change out. I'll
> consider it.
> 
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 812bea5866d6..4c92dc608928 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -42,6 +42,7 @@ struct vm_struct {
>>> unsigned long   size;
>>> unsigned long   flags;
>>> struct page **pages;
>>> +   unsigned intpage_shift;
>>
>> So the entire vm_struct will be mapped with a single page_shift. It cannot
>> have mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case
>> the allocation fails for the larger ones and falls back, or for whatever
>> other reasons.
> 
> For now, yes. I have a bit of follow up work to improve that and make
> it able to fall back, but it's a bit more churn and not a significant
> benefit just yet because there are not a lot of very large vmallocs
> (except the early hashes which can be satisfied with large allocs).

Right, but it would make this new feature as complete as ioremap, which
logically supports mappings up to the P4D level (though AFAICT that is not
used). If there are no actual vmalloc requests that large, it is fine.
Allocation attempts will start from the page table level matching the
requested size. It is better to have PUD/P4D considerations now rather than
trying to retrofit them later.
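
Something like the sketch below is what I mean by starting from the right
level for the requested size (a rough illustration only, helper name made up,
not taken from the posted patch):

#include <linux/kernel.h>
#include <linux/mm.h>

/* Pick the initial page shift for a vmalloc mapping of the given size. */
static unsigned int vmalloc_pick_page_shift(unsigned long size)
{
        if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
                return PAGE_SHIFT;

        /* Try the largest level the size can fill; smaller ones are the fallback. */
        if (size >= PUD_SIZE && IS_ALIGNED(size, PUD_SIZE))
                return PUD_SHIFT;
        if (size >= PMD_SIZE && IS_ALIGNED(size, PMD_SIZE))
                return PMD_SHIFT;

        return PAGE_SHIFT;
}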

> 
>>
>>> unsigned intnr_pages;
>>> phys_addr_t phys_addr;
>>> const void  *caller;
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index dd27cfb29b10..0cf8e861caeb 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -36,6 +36,7 @@
>>>  #include 
>>>  
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  
>>> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
>>> unsigned long end,
>>> return ret;
>>>  }
>>>  
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
>>
>> A small nit (if you agree) s/hpages/huge_pages/
> 
> Hmm. It's not actually a good function name because it can do small
> pages as well. vmap_pages_size_range or something may be better.

Right.

> 
>>
>>> +  pgprot_t prot, struct page **pages,
>>
>> Re-order (prot <---> pages) just to follow the standard like before.
> 
> Will do.
> 
>>> +  unsigned int page_shift)
>>> +{

Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Benjamin Herrenschmidt
On Tue, 2019-06-11 at 08:08 +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt
> wrote:
> > The reason I think it sort-of-mostly-worked is that to get more
> > than
> > 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> > buffers aren't allocated in Highmem so you got lucky.
> > 
> > That said, there is such a thing as no-copy send on the network, so I
> > wouldn't be surprised if some things would still have failed, just
> > not
> > frequent enough for you to notice.
> 
> Unless NETIF_F_HIGHDMA is set on a netdev, the core networking code
> will bounce buffer highmem pages for the driver under all
> circumstances.

 ... which b43legacy doesn't set to the best of my knowledge ...

Which makes me wonder how come it didn't work even with your patches ?
AFAIK, we have less than 1GB of lowmem unless the config has been
tweaked

Cheers,
Ben.




Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-11 Thread Cédric Le Goater
On 11/06/2019 08:44, Michael Neuling wrote:
> 
>>>  2:
>>> -BEGIN_FTR_SECTION
>>> -   /* POWER9 with disabled DAWR */
>>> +   LOAD_REG_ADDR(r11, dawr_force_enable)
>>> +   lbz r11, 0(r11)
>>> +   cmpdi   r11, 0
>>> li  r3, H_HARDWARE
>>> -   blr
>>> -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
>>> +   beqlr
>>
>> Why is this a 'beqlr' ? Shouldn't it be a blr ? 
> 
> I believe it's right and should be a beqlr.  It's to replace the FTR section 
> to
> make it dynamic based on the dawr_force_enable bit.

hmm, see the crash below on a L1 running a nested guest. r3 is set
to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix
this ? 

C.


[   44.374746] BUG: Kernel NULL pointer dereference at 0x13bf
[   44.374848] Faulting instruction address: 0xc010b044
[   44.374906] Oops: Kernel access of bad area, sig: 11 [#1]
[   44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM 
iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack 
nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp 
bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables 
iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm 
sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi 
failover
[   44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 
5.2.0-rc4+ #3
[   44.375500] NIP:  c010b044 LR: c008089dacf4 CTR: c010aff4
[   44.375604] REGS: c0179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
[   44.375691] MSR:  8280b033   CR: 
42244842  XER: 
[   44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 
IRQMASK: 0 
[   44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 
 
[   44.375815] GPR04:  0003 2444b05d 
c017f11c45d0 
[   44.375815] GPR08: 07803e018dfe 0028 0001 
0075 
[   44.375815] GPR12: c010aff4 c7ff6300  
 
[   44.375815] GPR16:  c017f11d  
c017f11ca7a8 
[   44.375815] GPR20: c017f11c42ec   
000a 
[   44.375815] GPR24: fffc  c017f11c 
c1a77ed8 
[   44.375815] GPR28: c0179af7 fffc c008089ff170 
c0179ae88540 
[   44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68
[   44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 
[kvm_hv]
[   44.376849] Call Trace:
[   44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c 
(unreliable)
[   44.376982] [c0179b397a10] [c008089dd6bc] 
kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
[   44.377084] [c0179b397ae0] [c008093f8bcc] kvmppc_vcpu_run+0x34/0x48 
[kvm]
[   44.377185] [c0179b397b00] [c008093f522c] 
kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[   44.377286] [c0179b397b90] [c008093e3618] kvm_vcpu_ioctl+0x460/0x850 
[kvm]
[   44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40
[   44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110
[   44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80
[   44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70
[   44.377712] Instruction dump:
[   44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 
3860 
[   44.377862] 4d820020 50852e74 508516f6 78840724  f8a313c8 7c942ba6 
7cbc2ba6 


Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range

2019-06-11 Thread Christophe Leroy




On 10/06/2019 at 05:08, Nicholas Piggin wrote:

Radix can use ioremap_page_range for ioremap, after slab is available.
This makes it possible to enable huge ioremap mapping support.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
  arch/powerpc/mm/book3s64/pgtable.c | 21 +
  arch/powerpc/mm/book3s64/radix_pgtable.c   | 21 +
  arch/powerpc/mm/pgtable_64.c   |  2 +-
  4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 574eca33f893..e04a839cb5b9 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -266,6 +266,9 @@ extern void radix__vmemmap_remove_mapping(unsigned long 
start,
  extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 pgprot_t flags, unsigned int psz);
  
+extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,

+   unsigned long size, pgprot_t prot, int nid);
+


'extern' is pointless here, and checkpatch will cry.


  static inline unsigned long radix__get_tree_size(void)
  {
unsigned long rts_field;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index ff98b663c83e..953850a602f7 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -450,3 +450,24 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
  
  	return true;

  }
+
+int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, 
pgprot_t prot, int nid)
+{
+   unsigned long i;
+
+   if (radix_enabled())
+   return radix__ioremap_range(ea, pa, size, prot, nid);


This function looks pretty similar to the one in the previous patch.
Since radix_enabled() is available and returns false for all other 
subarches, I think the above could go in the generic ioremap_range(); 
you'll only need to move the function declaration into a common file, for 
instance asm/io.h.
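
Something like this in the common header (the prototype is the one from the 
patch; the placement is only a suggestion):

int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
                  pgprot_t prot, int nid);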



+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+   if (err) {
+   if (slab_is_available())
+   unmap_kernel_range(ea, size);
+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c9bcf428dd2b..db993bc1aef3 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -11,6 +11,7 @@
  
  #define pr_fmt(fmt) "radix-mmu: " fmt
  
+#include 

  #include 
  #include 
  #include 
@@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
  
  	set_pte_at(mm, addr, ptep, pte);

  }
+
+int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
+   pgprot_t prot, int nid)
+{
+   if (likely(slab_is_available())) {
+   int err = ioremap_page_range(ea, ea + size, pa, prot);
+   if (err)
+   unmap_kernel_range(ea, size);
+   return err;
+   } else {
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+   if (WARN_ON_ONCE(err)) /* Should clean up */
+   return err;
+   }


Same loop again.

What about not doing a radix specific function and just putting 
something like below in the core ioremap_range() function ?


if (likely(slab_is_available()) && radix_enabled()) {
int err = ioremap_page_range(ea, ea + size, pa, prot);

if (err)
unmap_kernel_range(ea, size);
return err;
}

Because I'm pretty sure we will use ioremap_page_range() more and more.
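
Put together, the generic helper I have in mind would look roughly like this 
(a sketch assembled from the hunks above, untested):

int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
                  pgprot_t prot, int nid)
{
        unsigned long i;

        /* Once slab is up, radix can use the generic vmap path. */
        if (likely(slab_is_available()) && radix_enabled()) {
                int err = ioremap_page_range(ea, ea + size, pa, prot);

                if (err)
                        unmap_kernel_range(ea, size);
                return err;
        }

        /* Early boot, or hash MMU: map page by page. */
        for (i = 0; i < size; i += PAGE_SIZE) {
                int err = map_kernel_page(ea + i, pa + i, prot);

                if (err) {
                        if (slab_is_available())
                                unmap_kernel_range(ea, size);
                        else
                                WARN_ON_ONCE(1); /* Should clean up */
                        return err;
                }
        }

        return 0;
}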


+   return 0;
+   }
+}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 6bd3660388aa..63cd81130643 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -108,7 +108,7 @@ unsigned long ioremap_bot;
  unsigned long ioremap_bot = IOREMAP_BASE;
  #endif
  
-static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)

+int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, 
pgprot_t prot, int nid)


Hum. Weak functions remain in vmlinux even when unused, unless 
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected.


Also, they are somewhat dangerous because people might change them 
without seeing that they are overridden for some particular configuration.


Christophe


  {

Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-11 Thread Michael Neuling


> >  2:
> > -BEGIN_FTR_SECTION
> > -   /* POWER9 with disabled DAWR */
> > +   LOAD_REG_ADDR(r11, dawr_force_enable)
> > +   lbz r11, 0(r11)
> > +   cmpdi   r11, 0
> > li  r3, H_HARDWARE
> > -   blr
> > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > +   beqlr
> 
> Why is this a 'beqlr' ? Shouldn't it be a blr ? 

I believe it's right and should be a beqlr.  It's to replace the FTR section to
make it dynamic based on the dawr_force_enable bit.

Mikey



[PATCH v11 13/13] ima: Allow template= option for appraise rules as well

2019-06-11 Thread Thiago Jung Bauermann
It's useful to be able to specify a different IMA template on appraise
policy rules, so allow it.
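
For illustration only (a hypothetical rule, not part of this patch), this
allows policies such as:

appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig template=ima-modsig

so that the appraised files are recorded with the richer ima-modsig template
while other rules keep the default template.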

Signed-off-by: Thiago Jung Bauermann 
Suggested-by: Mimi Zohar 
---
 security/integrity/ima/ima_policy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 6463ab8921ea..1ac1ef458f2e 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -1110,7 +1110,8 @@ static int ima_parse_rule(char *rule, struct 
ima_rule_entry *entry)
break;
case Opt_template:
ima_log_string(ab, "template", args[0].from);
-   if (entry->action != MEASURE) {
+   if (entry->action != MEASURE &&
+   entry->action != APPRAISE) {
result = -EINVAL;
break;
}



[PATCH v11 11/13] ima: Define ima-modsig template

2019-06-11 Thread Thiago Jung Bauermann
Define new "d-modsig" template field which holds the digest that is
expected to match the one contained in the modsig, and also new "modsig"
template field which holds the appended file signature.

Add a new "ima-modsig" defined template descriptor with the new fields as
well as the ones from the "ima-sig" descriptor.

Change ima_store_measurement() to accept a struct modsig * argument so that
it can be passed along to the templates via struct ima_event_data.

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 Documentation/security/IMA-templates.rst  |  7 ++-
 security/integrity/ima/ima.h  | 20 +++-
 security/integrity/ima/ima_api.c  |  5 +-
 security/integrity/ima/ima_main.c |  2 +-
 security/integrity/ima/ima_modsig.c   | 19 +++
 security/integrity/ima/ima_policy.c   | 41 
 security/integrity/ima/ima_template.c |  7 ++-
 security/integrity/ima/ima_template_lib.c | 60 ++-
 security/integrity/ima/ima_template_lib.h |  4 ++
 9 files changed, 157 insertions(+), 8 deletions(-)

diff --git a/Documentation/security/IMA-templates.rst 
b/Documentation/security/IMA-templates.rst
index 2cd0e273cc9a..8da20b444be0 100644
--- a/Documentation/security/IMA-templates.rst
+++ b/Documentation/security/IMA-templates.rst
@@ -68,15 +68,18 @@ descriptors by adding their identifier to the format string
  - 'd-ng': the digest of the event, calculated with an arbitrary hash
algorithm (field format: [:]digest, where the digest
prefix is shown only if the hash algorithm is not SHA1 or MD5);
+ - 'd-modsig': the digest of the event without the appended modsig;
  - 'n-ng': the name of the event, without size limitations;
- - 'sig': the file signature.
+ - 'sig': the file signature;
+ - 'modsig' the appended file signature.
 
 
 Below, there is the list of defined template descriptors:
 
  - "ima": its format is ``d|n``;
  - "ima-ng" (default): its format is ``d-ng|n-ng``;
- - "ima-sig": its format is ``d-ng|n-ng|sig``.
+ - "ima-sig": its format is ``d-ng|n-ng|sig``;
+ - "ima-modsig": its format is ``d-ng|n-ng|sig|d-modsig|modsig``.
 
 
 
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 0acc8e56ec73..a2b2c13ceda8 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -64,6 +64,7 @@ struct ima_event_data {
const unsigned char *filename;
struct evm_ima_xattr_data *xattr_value;
int xattr_len;
+   const struct modsig *modsig;
const char *violation;
 };
 
@@ -207,7 +208,7 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file 
*file,
   const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
-  int xattr_len, int pcr,
+  int xattr_len, const struct modsig *modsig, int pcr,
   struct ima_template_desc *template_desc);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
@@ -308,6 +309,10 @@ bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
struct modsig **modsig);
 void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+ const u8 **digest, u32 *digest_size);
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+  u32 *data_len);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -326,6 +331,19 @@ static inline void ima_collect_modsig(struct modsig 
*modsig, const void *buf,
 {
 }
 
+static inline int ima_get_modsig_digest(const struct modsig *modsig,
+   enum hash_algo *algo, const u8 **digest,
+   u32 *digest_size)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int ima_get_raw_modsig(const struct modsig *modsig,
+const void **data, u32 *data_len)
+{
+   return -EOPNOTSUPP;
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c351b8c37278..32297d1e6164 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -291,7 +291,7 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
 void ima_store_measurement(struct integrity_iint_cache *iint,
   struct file *file, const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
-  int xattr_len, int pcr,
+ 

[PATCH v11 12/13] ima: Store the measurement again when appraising a modsig

2019-06-11 Thread Thiago Jung Bauermann
If the IMA template contains the "modsig" or "d-modsig" field, then the
modsig should be added to the measurement list when the file is appraised.

And that is what normally happens, but if a measurement rule caused a file
containing a modsig to be measured before a different rule causes it to be
appraised, the resulting measurement entry will not contain the modsig
because it is only fetched during appraisal. When the appraisal rule
triggers, it won't store a new measurement containing the modsig because
the file was already measured.

We need to detect that situation and store an additional measurement with
the modsig. This is done by adding an IMA_MEASURE action flag if we read a
modsig and the IMA template contains a modsig field.

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
---
 security/integrity/ima/ima.h  |  1 +
 security/integrity/ima/ima_api.c  | 19 +++
 security/integrity/ima/ima_main.c | 15 ---
 security/integrity/ima/ima_template.c | 19 +++
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index a2b2c13ceda8..44f5f60424c2 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -149,6 +149,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
 struct ima_template_desc *lookup_template_desc(const char *name);
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 32297d1e6164..bb887ed3d8a7 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -222,6 +222,14 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
char digest[IMA_MAX_DIGEST_SIZE];
} hash;
 
+   /*
+* Always collect the modsig, because IMA might have already collected
+* the file digest without collecting the modsig in a previous
+* measurement rule.
+*/
+   if (modsig)
+   ima_collect_modsig(modsig, buf, size);
+
if (iint->flags & IMA_COLLECTED)
goto out;
 
@@ -255,9 +263,6 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
memcpy(iint->ima_hash, &hash, length);
iint->version = i_version;
 
-   if (modsig)
-   ima_collect_modsig(modsig, buf, size);
-
/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
if (!result)
iint->flags |= IMA_COLLECTED;
@@ -307,7 +312,13 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint,
 .modsig = modsig };
int violation = 0;
 
-   if (iint->measured_pcrs & (0x1 << pcr))
+   /*
+* We still need to store the measurement in the case of MODSIG because
+* we only have its contents to put in the list at the time of
+* appraisal, but a file measurement from earlier might already exist in
+* the measurement list.
+*/
+   if (iint->measured_pcrs & (0x1 << pcr) && !modsig)
return;
 
result = ima_alloc_init_template(&event_data, &entry, template_desc);
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 85afb31fafe0..e0ca39f81a59 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -307,9 +307,18 @@ static int process_measurement(struct file *file, const 
struct cred *cred,
/* read 'security.ima' */
xattr_len = ima_read_xattr(file_dentry(file), &xattr_value);
 
-   /* Read the appended modsig if allowed by the policy. */
-   if (iint->flags & IMA_MODSIG_ALLOWED)
-   ima_read_modsig(func, buf, size, &modsig);
+   /*
+* Read the appended modsig if allowed by the policy, and allow
+* an additional measurement list entry, if needed, based on the
+* template format and whether the file was already measured.
+*/
+   if (iint->flags & IMA_MODSIG_ALLOWED) {
+   rc = ima_read_modsig(func, buf, size, &modsig);
+
+   if (!rc && ima_template_has_modsig(template_desc) &&
+   iint->flags & IMA_MEASURED)
+   action |= IMA_MEASURE;
+   }
}
 
hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index e25bef419c87..00d9a6cc8a60 100644
--- 

[PATCH v11 10/13] ima: Collect modsig

2019-06-11 Thread Thiago Jung Bauermann
Obtain the modsig and calculate its corresponding hash in
ima_collect_measurement().

Signed-off-by: Thiago Jung Bauermann 
---
 security/integrity/ima/ima.h  |  8 -
 security/integrity/ima/ima_api.c  |  5 ++-
 security/integrity/ima/ima_appraise.c |  2 +-
 security/integrity/ima/ima_main.c |  2 +-
 security/integrity/ima/ima_modsig.c   | 50 ++-
 5 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index ebbfae10f174..0acc8e56ec73 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -203,7 +203,7 @@ int ima_get_action(struct inode *inode, const struct cred 
*cred, u32 secid,
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
-   enum hash_algo algo);
+   enum hash_algo algo, struct modsig *modsig);
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file 
*file,
   const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
@@ -307,6 +307,7 @@ static inline int ima_read_xattr(struct dentry *dentry,
 bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
struct modsig **modsig);
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -320,6 +321,11 @@ static inline int ima_read_modsig(enum ima_hooks func, 
const void *buf,
return -EOPNOTSUPP;
 }
 
+static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
+ loff_t size)
+{
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c0cf4bcfc82f..c351b8c37278 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -208,7 +208,7 @@ int ima_get_action(struct inode *inode, const struct cred 
*cred, u32 secid,
  */
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
-   enum hash_algo algo)
+   enum hash_algo algo, struct modsig *modsig)
 {
const char *audit_cause = "failed";
struct inode *inode = file_inode(file);
@@ -255,6 +255,9 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
memcpy(iint->ima_hash, &hash, length);
iint->version = i_version;
 
+   if (modsig)
+   ima_collect_modsig(modsig, buf, size);
+
/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
if (!result)
iint->flags |= IMA_COLLECTED;
diff --git a/security/integrity/ima/ima_appraise.c 
b/security/integrity/ima/ima_appraise.c
index 70252ac3321d..aa14e3fe25d5 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -438,7 +438,7 @@ void ima_update_xattr(struct integrity_iint_cache *iint, 
struct file *file)
!(iint->flags & IMA_HASH))
return;
 
-   rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo);
+   rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo, NULL);
if (rc < 0)
return;
 
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 8ddf9faa8d02..2c9d3cf85726 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -314,7 +314,7 @@ static int process_measurement(struct file *file, const 
struct cred *cred,
 
hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
 
-   rc = ima_collect_measurement(iint, file, buf, size, hash_algo);
+   rc = ima_collect_measurement(iint, file, buf, size, hash_algo, modsig);
if (rc != 0 && rc != -EBADF && rc != -EINVAL)
goto out_locked;
 
diff --git a/security/integrity/ima/ima_modsig.c 
b/security/integrity/ima/ima_modsig.c
index f41ebe370fa0..d438b87dba89 100644
--- a/security/integrity/ima/ima_modsig.c
+++ b/security/integrity/ima/ima_modsig.c
@@ -17,6 +17,19 @@
 
 struct modsig {
struct pkcs7_message *pkcs7_msg;
+
+   enum hash_algo hash_algo;
+
+   /* This digest will go in the 'd-modsig' field of the IMA template. */
+   const u8 *digest;
+   u32 digest_size;
+
+   /*
+* This is what will go to the measurement list if the template requires
+* storing the signature.
+*/
+   int raw_pkcs7_len;
+   u8 raw_pkcs7[];
 };
 
 /**
@@ -71,7 +84,8 @@ int ima_read_modsig(enum ima_hooks func, const void 

[PATCH v11 09/13] ima: Implement support for module-style appended signatures

2019-06-11 Thread Thiago Jung Bauermann
Implement the appraise_type=imasig|modsig option, allowing IMA to read and
verify modsig signatures.

In case a file has both an xattr signature and an appended modsig, IMA will
only use the appended signature if the key used by the xattr signature
isn't present in the IMA or platform keyring.
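
As an illustration (key and file names are placeholders, not taken from this
series), a file can end up carrying both kinds of signature:

   # appended signature, same format as signed kernel modules
   scripts/sign-file sha256 signing_key.priv signing_key.x509 vmlinuz
   # detached signature stored in the security.ima xattr
   evmctl ima_sign --key signing_key.priv vmlinuz

With this patch the xattr signature is preferred, and the appended one is only
consulted when the xattr signature's key is not on the IMA or platform keyring.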

Because modsig verification needs to convert from an integrity keyring id
to the keyring itself, add an integrity_keyring_from_id() function in
digsig.c so that integrity_modsig_verify() can use it.

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 security/integrity/digsig.c   | 43 
 security/integrity/ima/Kconfig|  3 ++
 security/integrity/ima/ima.h  | 22 -
 security/integrity/ima/ima_appraise.c | 51 +--
 security/integrity/ima/ima_main.c | 11 -
 security/integrity/ima/ima_modsig.c   | 71 +++
 security/integrity/ima/ima_policy.c   | 12 ++---
 security/integrity/integrity.h| 19 +++
 8 files changed, 209 insertions(+), 23 deletions(-)

diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index e19c2eb72c51..3399a7e32830 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -43,11 +43,10 @@ static const char * const 
keyring_name[INTEGRITY_KEYRING_MAX] = {
 #define restrict_link_to_ima restrict_link_by_builtin_trusted
 #endif
 
-int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
-   const char *digest, int digestlen)
+static struct key *integrity_keyring_from_id(const unsigned int id)
 {
-   if (id >= INTEGRITY_KEYRING_MAX || siglen < 2)
-   return -EINVAL;
+   if (id >= INTEGRITY_KEYRING_MAX)
+   return ERR_PTR(-EINVAL);
 
if (!keyring[id]) {
keyring[id] =
@@ -56,23 +55,49 @@ int integrity_digsig_verify(const unsigned int id, const 
char *sig, int siglen,
int err = PTR_ERR(keyring[id]);
pr_err("no %s keyring: %d\n", keyring_name[id], err);
keyring[id] = NULL;
-   return err;
+   return ERR_PTR(err);
}
}
 
+   return keyring[id];
+}
+
+int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
+   const char *digest, int digestlen)
+{
+   struct key *keyring;
+
+   if (siglen < 2)
+   return -EINVAL;
+
+   keyring = integrity_keyring_from_id(id);
+   if (IS_ERR(keyring))
+   return PTR_ERR(keyring);
+
switch (sig[1]) {
case 1:
/* v1 API expect signature without xattr type */
-   return digsig_verify(keyring[id], sig + 1, siglen - 1,
-digest, digestlen);
+   return digsig_verify(keyring, sig + 1, siglen - 1, digest,
+digestlen);
case 2:
-   return asymmetric_verify(keyring[id], sig, siglen,
-digest, digestlen);
+   return asymmetric_verify(keyring, sig, siglen, digest,
+digestlen);
}
 
return -EOPNOTSUPP;
 }
 
+int integrity_modsig_verify(const unsigned int id, const struct modsig *modsig)
+{
+   struct key *keyring;
+
+   keyring = integrity_keyring_from_id(id);
+   if (IS_ERR(keyring))
+   return PTR_ERR(keyring);
+
+   return ima_modsig_verify(keyring, modsig);
+}
+
 static int __integrity_init_keyring(const unsigned int id, key_perm_t perm,
struct key_restriction *restriction)
 {
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index bba19f9ea184..0fb542455698 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -234,6 +234,9 @@ config IMA_APPRAISE_BOOTPARAM
 config IMA_APPRAISE_MODSIG
bool "Support module-style signatures for appraisal"
depends on IMA_APPRAISE
+   depends on INTEGRITY_ASYMMETRIC_KEYS
+   select PKCS7_MESSAGE_PARSER
+   select MODULE_SIG_FORMAT
default n
help
   Adds support for signatures appended to files. The format of the
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 9e2580164e97..ebbfae10f174 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -192,6 +192,10 @@ enum ima_hooks {
__ima_hooks(__ima_hook_enumify)
 };
 
+extern const char *const func_tokens[];
+
+struct modsig;
+
 /* LIM API function definitions */
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
@@ -245,7 +249,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 struct integrity_iint_cache *iint,
 

[PATCH v11 08/13] ima: Factor xattr_verify() out of ima_appraise_measurement()

2019-06-11 Thread Thiago Jung Bauermann
Verify xattr signature in a separate function so that the logic in
ima_appraise_measurement() remains clear when it gains the ability to also
verify an appended module signature.

The code in the switch statement is unchanged except for having to
dereference the status and cause variables (since they're now pointers),
and fixing the style of a block comment to appease checkpatch.

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/ima/ima_appraise.c | 141 +++---
 1 file changed, 81 insertions(+), 60 deletions(-)

diff --git a/security/integrity/ima/ima_appraise.c 
b/security/integrity/ima/ima_appraise.c
index 18bbe753421a..5d4772f39757 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -202,6 +202,83 @@ int ima_read_xattr(struct dentry *dentry,
return ret;
 }
 
+/*
+ * xattr_verify - verify xattr digest or signature
+ *
+ * Verify whether the hash or signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int xattr_verify(enum ima_hooks func, struct integrity_iint_cache *iint,
+   struct evm_ima_xattr_data *xattr_value, int xattr_len,
+   enum integrity_status *status, const char **cause)
+{
+   int rc = -EINVAL, hash_start = 0;
+
+   switch (xattr_value->type) {
+   case IMA_XATTR_DIGEST_NG:
+   /* first byte contains algorithm id */
+   hash_start = 1;
+   /* fall through */
+   case IMA_XATTR_DIGEST:
+   if (iint->flags & IMA_DIGSIG_REQUIRED) {
+   *cause = "IMA-signature-required";
+   *status = INTEGRITY_FAIL;
+   break;
+   }
+   clear_bit(IMA_DIGSIG, &iint->atomic_flags);
+   if (xattr_len - sizeof(xattr_value->type) - hash_start >=
+   iint->ima_hash->length)
+   /*
+* xattr length may be longer. md5 hash in previous
+* version occupied 20 bytes in xattr, instead of 16
+*/
+   rc = memcmp(&xattr_value->data[hash_start],
+   iint->ima_hash->digest,
+   iint->ima_hash->length);
+   else
+   rc = -EINVAL;
+   if (rc) {
+   *cause = "invalid-hash";
+   *status = INTEGRITY_FAIL;
+   break;
+   }
+   *status = INTEGRITY_PASS;
+   break;
+   case EVM_IMA_XATTR_DIGSIG:
+   set_bit(IMA_DIGSIG, &iint->atomic_flags);
+   rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
+(const char *)xattr_value,
+xattr_len,
+iint->ima_hash->digest,
+iint->ima_hash->length);
+   if (rc == -EOPNOTSUPP) {
+   *status = INTEGRITY_UNKNOWN;
+   break;
+   }
+   if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+   func == KEXEC_KERNEL_CHECK)
+   rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
+(const char *)xattr_value,
+xattr_len,
+iint->ima_hash->digest,
+iint->ima_hash->length);
+   if (rc) {
+   *cause = "invalid-signature";
+   *status = INTEGRITY_FAIL;
+   } else {
+   *status = INTEGRITY_PASS;
+   }
+   break;
+   default:
+   *status = INTEGRITY_UNKNOWN;
+   *cause = "unknown-ima-data";
+   break;
+   }
+
+   return rc;
+}
+
 /*
  * ima_appraise_measurement - appraise file measurement
  *
@@ -221,7 +298,7 @@ int ima_appraise_measurement(enum ima_hooks func,
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_backing_inode(dentry);
enum integrity_status status = INTEGRITY_UNKNOWN;
-   int rc = xattr_len, hash_start = 0;
+   int rc = xattr_len;
 
if (!(inode->i_opflags & IOP_XATTR))
return INTEGRITY_UNKNOWN;
@@ -259,65 +336,9 @@ int ima_appraise_measurement(enum ima_hooks func,
WARN_ONCE(true, "Unexpected integrity status %d\n", status);
}
 
-   switch (xattr_value->type) {
-   case IMA_XATTR_DIGEST_NG:
-   /* first byte contains algorithm id */
-   hash_start = 1;
-   /* fall through */
-   case 

[PATCH v11 07/13] ima: Add modsig appraise_type option for module-style appended signatures

2019-06-11 Thread Thiago Jung Bauermann
Introduce the modsig keyword to the IMA policy syntax to specify that
a given hook should expect the file to have the IMA signature appended
to it. Here is how it can be used in a rule:

appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig

With this rule, IMA will accept either a signature stored in the extended
attribute or an appended signature.

For now, the rule above will behave exactly the same as if
appraise_type=imasig was specified. The actual modsig implementation
will be introduced separately.

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
---
 Documentation/ABI/testing/ima_policy |  6 +-
 security/integrity/ima/Kconfig   | 10 +
 security/integrity/ima/Makefile  |  1 +
 security/integrity/ima/ima.h |  9 
 security/integrity/ima/ima_modsig.c  | 31 
 security/integrity/ima/ima_policy.c  | 12 +--
 security/integrity/integrity.h   |  1 +
 7 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index b383c1763610..e622cdafe0af 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -36,7 +36,7 @@ Description:
euid:= decimal value
fowner:= decimal value
lsm:are LSM specific
-   option: appraise_type:= [imasig]
+   option: appraise_type:= [imasig] [imasig|modsig]
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
@@ -104,3 +104,7 @@ Description:
 
measure func=KEXEC_KERNEL_CHECK pcr=4
measure func=KEXEC_INITRAMFS_CHECK pcr=5
+
+   Example of appraise rule allowing modsig appended signatures:
+
+   appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index a18f8c6d13b5..bba19f9ea184 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -231,6 +231,16 @@ config IMA_APPRAISE_BOOTPARAM
  This option enables the different "ima_appraise=" modes
  (eg. fix, log) from the boot command line.
 
+config IMA_APPRAISE_MODSIG
+   bool "Support module-style signatures for appraisal"
+   depends on IMA_APPRAISE
+   default n
+   help
+  Adds support for signatures appended to files. The format of the
+  appended signature is the same used for signed kernel modules.
+  The modsig keyword can be used in the IMA policy to allow a hook
+  to accept such signatures.
+
 config IMA_TRUSTED_KEYRING
bool "Require all keys on the .ima keyring be signed (deprecated)"
depends on IMA_APPRAISE && SYSTEM_TRUSTED_KEYRING
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index d921dc4f9eb0..31d57cdf2421 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
 ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 18b48a6d0b80..9e2580164e97 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -298,6 +298,15 @@ static inline int ima_read_xattr(struct dentry *dentry,
 
 #endif /* CONFIG_IMA_APPRAISE */
 
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+bool ima_hook_supports_modsig(enum ima_hooks func);
+#else
+static inline bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+   return false;
+}
+#endif /* CONFIG_IMA_APPRAISE_MODSIG */
+
 /* LSM based policy rules require audit */
 #ifdef CONFIG_IMA_LSM_RULES
 
diff --git a/security/integrity/ima/ima_modsig.c 
b/security/integrity/ima/ima_modsig.c
new file mode 100644
index ..87503bfe8c8b
--- /dev/null
+++ b/security/integrity/ima/ima_modsig.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * IMA support for appraising module-style appended signatures.
+ *
+ * Copyright (C) 2019  IBM Corporation
+ *
+ * Author:
+ * Thiago Jung Bauermann 
+ */
+
+#include "ima.h"
+
+/**
+ * ima_hook_supports_modsig - can the policy allow modsig for this hook?
+ *
+ * modsig is only supported by hooks using ima_post_read_file(), because only
+ * they preload the contents of the file in a buffer. FILE_CHECK does that in
+ * some cases, but not when reached from vfs_open(). POLICY_CHECK can support
+ * it, but it's not useful in practice because it's a text file so deny.
+ */
+bool 

[PATCH v11 05/13] integrity: Select CONFIG_KEYS instead of depending on it

2019-06-11 Thread Thiago Jung Bauermann
This avoids a dependency cycle in soon-to-be-introduced
CONFIG_IMA_APPRAISE_MODSIG: it will select CONFIG_MODULE_SIG_FORMAT
which in turn selects CONFIG_KEYS. Kconfig then complains that
CONFIG_INTEGRITY_SIGNATURE depends on CONFIG_KEYS.

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 security/integrity/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index 3ba1168b1756..93d73902c571 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -17,8 +17,8 @@ if INTEGRITY
 
 config INTEGRITY_SIGNATURE
bool "Digital signature verification using multiple keyrings"
-   depends on KEYS
default n
+   select KEYS
select SIGNATURE
help
  This option enables digital signature verification support


[PATCH v11 06/13] ima: Use designated initializers for struct ima_event_data

2019-06-11 Thread Thiago Jung Bauermann
Designated initializers allow specifying only the members of the struct
that need initialization. Non-mentioned members are initialized to zero.

This makes the code a bit clearer (particularly in ima_add_boot_aggregate)
and also allows adding a new member to the struct without having to update
all struct initializations.
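
As a generic C illustration of the difference (not code from the kernel tree):

   struct example { int a; int b; const char *name; };

   /* Positional: members must be listed in declaration order. */
   struct example x = { 1, 0, "foo" };

   /* Designated: name only what needs a value; the rest is zeroed. */
   struct example y = { .a = 1, .name = "foo" };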

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/ima/ima_api.c  | 13 +
 security/integrity/ima/ima_init.c |  4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 78eb11c7ac07..c0cf4bcfc82f 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -139,8 +139,10 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
 {
struct ima_template_entry *entry;
struct inode *inode = file_inode(file);
-   struct ima_event_data event_data = {iint, file, filename, NULL, 0,
-   cause};
+   struct ima_event_data event_data = { .iint = iint,
+.file = file,
+.filename = filename,
+.violation = cause };
int violation = 1;
int result;
 
@@ -294,8 +296,11 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint,
int result = -ENOMEM;
struct inode *inode = file_inode(file);
struct ima_template_entry *entry;
-   struct ima_event_data event_data = {iint, file, filename, xattr_value,
-   xattr_len, NULL};
+   struct ima_event_data event_data = { .iint = iint,
+.file = file,
+.filename = filename,
+.xattr_value = xattr_value,
+.xattr_len = xattr_len };
int violation = 0;
 
if (iint->measured_pcrs & (0x1 << pcr))
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 993d0f1915ff..368ef658a1cd 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -49,8 +49,8 @@ static int __init ima_add_boot_aggregate(void)
const char *audit_cause = "ENOMEM";
struct ima_template_entry *entry;
struct integrity_iint_cache tmp_iint, *iint = &tmp_iint;
-   struct ima_event_data event_data = {iint, NULL, boot_aggregate_name,
-   NULL, 0, NULL};
+   struct ima_event_data event_data = { .iint = iint,
+.filename = boot_aggregate_name };
int result = -ENOMEM;
int violation = 0;
struct {



[PATCH v11 04/13] integrity: Introduce struct evm_xattr

2019-06-11 Thread Thiago Jung Bauermann
Even though struct evm_ima_xattr_data includes a fixed-size array to hold a
SHA1 digest, most of the code ignores the array and uses the struct to mean
"type indicator followed by data of unspecified size" and tracks the real
size of what the struct represents in a separate length variable.

The only exception to that is the EVM code, which correctly uses the
definition of struct evm_ima_xattr_data.

So make this explicit in the code by removing the length specification from
the array in struct evm_ima_xattr_data. Also, change the name of the
element from digest to data since in most places the array doesn't hold a
digest.

A separate struct evm_xattr is introduced, with the original definition of
evm_ima_xattr_data to be used in the places that actually expect that
definition, specifically the EVM HMAC code.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/evm/evm_main.c | 8 
 security/integrity/ima/ima_appraise.c | 7 ---
 security/integrity/integrity.h| 6 ++
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/security/integrity/evm/evm_main.c 
b/security/integrity/evm/evm_main.c
index b6d9f14bc234..588f22f1b5bd 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -169,7 +169,7 @@ static enum integrity_status evm_verify_hmac(struct dentry 
*dentry,
/* check value type */
switch (xattr_data->type) {
case EVM_XATTR_HMAC:
-   if (xattr_len != sizeof(struct evm_ima_xattr_data)) {
+   if (xattr_len != sizeof(struct evm_xattr)) {
evm_status = INTEGRITY_FAIL;
goto out;
}
@@ -179,7 +179,7 @@ static enum integrity_status evm_verify_hmac(struct dentry 
*dentry,
   xattr_value_len, &digest);
if (rc)
break;
-   rc = crypto_memneq(xattr_data->digest, digest.digest,
+   rc = crypto_memneq(xattr_data->data, digest.digest,
   SHA1_DIGEST_SIZE);
if (rc)
rc = -EINVAL;
@@ -523,7 +523,7 @@ int evm_inode_init_security(struct inode *inode,
 const struct xattr *lsm_xattr,
 struct xattr *evm_xattr)
 {
-   struct evm_ima_xattr_data *xattr_data;
+   struct evm_xattr *xattr_data;
int rc;
 
if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
@@ -533,7 +533,7 @@ int evm_inode_init_security(struct inode *inode,
if (!xattr_data)
return -ENOMEM;
 
-   xattr_data->type = EVM_XATTR_HMAC;
+   xattr_data->data.type = EVM_XATTR_HMAC;
rc = evm_init_hmac(inode, lsm_xattr, xattr_data->digest);
if (rc < 0)
goto out;
diff --git a/security/integrity/ima/ima_appraise.c 
b/security/integrity/ima/ima_appraise.c
index 2f6536ab69e8..18bbe753421a 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -168,7 +168,8 @@ enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data 
*xattr_value,
return sig->hash_algo;
break;
case IMA_XATTR_DIGEST_NG:
-   ret = xattr_value->digest[0];
+   /* first byte contains algorithm id */
+   ret = xattr_value->data[0];
if (ret < HASH_ALGO__LAST)
return ret;
break;
@@ -176,7 +177,7 @@ enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data 
*xattr_value,
/* this is for backward compatibility */
if (xattr_len == 21) {
unsigned int zero = 0;
-   if (!memcmp(&xattr_value->digest[16], &zero, 4))
+   if (!memcmp(_value->data[16], , 4))
return HASH_ALGO_MD5;
else
return HASH_ALGO_SHA1;
@@ -275,7 +276,7 @@ int ima_appraise_measurement(enum ima_hooks func,
/* xattr length may be longer. md5 hash in previous
   version occupied 20 bytes in xattr, instead of 16
 */
-   rc = memcmp(&xattr_value->digest[hash_start],
+   rc = memcmp(&xattr_value->data[hash_start],
iint->ima_hash->digest,
iint->ima_hash->length);
else
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 7de59f44cba3..88a29f72a74f 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -79,6 +79,12 @@ enum evm_ima_xattr_type {
 
 struct evm_ima_xattr_data {
u8 type;
+   u8 data[];
+} __packed;
+
+/* Only used in the EVM HMAC code. */
+struct evm_xattr {
+   struct evm_ima_xattr_data data;
u8 digest[SHA1_DIGEST_SIZE];

[PATCH v11 03/13] PKCS#7: Introduce pkcs7_get_digest()

2019-06-11 Thread Thiago Jung Bauermann
IMA will need to access the digest of the PKCS7 message (as calculated by
the kernel) before the signature is verified, so introduce
pkcs7_get_digest() for that purpose.

Also, modify pkcs7_digest() to detect when the digest was already
calculated so that it doesn't have to do redundant work. Verifying that
sinfo->sig->digest isn't NULL is sufficient because both places which
allocate sinfo->sig (pkcs7_parse_message() and pkcs7_note_signed_info())
use kzalloc() so sig->digest is always initialized to zero.
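
A minimal sketch of the intended use, based on the prototype added below (the
surrounding variables are assumptions for illustration, not code from this
series):

   const u8 *digest;
   u32 digest_len;
   enum hash_algo algo;
   int err;

   err = pkcs7_get_digest(pkcs7, &digest, &digest_len, &algo);
   if (err)
           return err;
   /* digest/digest_len now describe the digest pkcs7_digest() computed,
    * and algo identifies the hash algorithm the signer used. */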

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: David Howells 
Cc: David Woodhouse 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 crypto/asymmetric_keys/pkcs7_verify.c | 33 +++
 include/crypto/pkcs7.h|  4 
 2 files changed, 37 insertions(+)

diff --git a/crypto/asymmetric_keys/pkcs7_verify.c 
b/crypto/asymmetric_keys/pkcs7_verify.c
index f7b0980bf02d..3243981152b5 100644
--- a/crypto/asymmetric_keys/pkcs7_verify.c
+++ b/crypto/asymmetric_keys/pkcs7_verify.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "pkcs7_parser.h"
 
@@ -33,6 +34,10 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
 
kenter(",%u,%s", sinfo->index, sinfo->sig->hash_algo);
 
+   /* The digest was calculated already. */
+   if (sig->digest)
+   return 0;
+
if (!sinfo->sig->hash_algo)
return -ENOPKG;
 
@@ -121,6 +126,34 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
return ret;
 }
 
+int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf, u32 *len,
+enum hash_algo *hash_algo)
+{
+   struct pkcs7_signed_info *sinfo = pkcs7->signed_infos;
+   int i, ret;
+
+   /*
+* This function doesn't support messages with more than one signature.
+*/
+   if (sinfo == NULL || sinfo->next != NULL)
+   return -EBADMSG;
+
+   ret = pkcs7_digest(pkcs7, sinfo);
+   if (ret)
+   return ret;
+
+   *buf = sinfo->sig->digest;
+   *len = sinfo->sig->digest_size;
+
+   for (i = 0; i < HASH_ALGO__LAST; i++)
+   if (!strcmp(hash_algo_name[i], sinfo->sig->hash_algo)) {
+   *hash_algo = i;
+   break;
+   }
+
+   return 0;
+}
+
 /*
  * Find the key (X.509 certificate) to use to verify a PKCS#7 message.  PKCS#7
  * uses the issuer's name and the issuing certificate serial number for
diff --git a/include/crypto/pkcs7.h b/include/crypto/pkcs7.h
index 583f199400a3..3bfe6829eaae 100644
--- a/include/crypto/pkcs7.h
+++ b/include/crypto/pkcs7.h
@@ -13,6 +13,7 @@
 #define _CRYPTO_PKCS7_H
 
 #include 
+#include 
 #include 
 
 struct key;
@@ -44,4 +45,7 @@ extern int pkcs7_verify(struct pkcs7_message *pkcs7,
 extern int pkcs7_supply_detached_data(struct pkcs7_message *pkcs7,
  const void *data, size_t datalen);
 
+extern int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf,
+   u32 *len, enum hash_algo *hash_algo);
+
 #endif /* _CRYPTO_PKCS7_H */



[PATCH v11 02/13] PKCS#7: Refactor verify_pkcs7_signature()

2019-06-11 Thread Thiago Jung Bauermann
IMA will need to verify a PKCS#7 signature which has already been parsed.
For this reason, factor out the code which does that from
verify_pkcs7_signature() into a new function which takes a struct
pkcs7_message instead of a data buffer.
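
A rough sketch of the resulting calling pattern (buffer, length and keyring
variables are placeholders):

   struct pkcs7_message *pkcs7;
   int ret;

   pkcs7 = pkcs7_parse_message(raw_sig, raw_sig_len);
   if (IS_ERR(pkcs7))
           return PTR_ERR(pkcs7);

   /* The parsed message can be inspected first (e.g. pkcs7_get_digest())
    * and then verified against a keyring. */
   ret = verify_pkcs7_message_sig(data, data_len, pkcs7, trusted_keys,
                                  VERIFYING_UNSPECIFIED_SIGNATURE,
                                  NULL, NULL);
   pkcs7_free_message(pkcs7);
   return ret;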

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: David Howells 
Cc: David Woodhouse 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 certs/system_keyring.c   | 61 ++--
 include/linux/verification.h | 10 ++
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index c05c29ae4d5d..4ba82e52e4b4 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -194,33 +194,27 @@ late_initcall(load_system_certificate_list);
 #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
 
 /**
- * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * verify_pkcs7_message_sig - Verify a PKCS#7-based signature on system data.
  * @data: The data to be verified (NULL if expecting internal data).
  * @len: Size of @data.
- * @raw_pkcs7: The PKCS#7 message that is the signature.
- * @pkcs7_len: The size of @raw_pkcs7.
+ * @pkcs7: The PKCS#7 message that is the signature.
  * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
  * (void *)1UL for all trusted keys).
  * @usage: The use to which the key is being put.
  * @view_content: Callback to gain access to content.
  * @ctx: Context for callback.
  */
-int verify_pkcs7_signature(const void *data, size_t len,
-  const void *raw_pkcs7, size_t pkcs7_len,
-  struct key *trusted_keys,
-  enum key_being_used_for usage,
-  int (*view_content)(void *ctx,
-  const void *data, size_t len,
-  size_t asn1hdrlen),
-  void *ctx)
+int verify_pkcs7_message_sig(const void *data, size_t len,
+struct pkcs7_message *pkcs7,
+struct key *trusted_keys,
+enum key_being_used_for usage,
+int (*view_content)(void *ctx,
+const void *data, size_t len,
+size_t asn1hdrlen),
+void *ctx)
 {
-   struct pkcs7_message *pkcs7;
int ret;
 
-   pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
-   if (IS_ERR(pkcs7))
-   return PTR_ERR(pkcs7);
-
/* The data should be detached - so we need to supply it. */
if (data && pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
pr_err("PKCS#7 signature with non-detached data\n");
@@ -273,6 +267,41 @@ int verify_pkcs7_signature(const void *data, size_t len,
}
 
 error:
+   pr_devel("<==%s() = %d\n", __func__, ret);
+   return ret;
+}
+
+/**
+ * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * @data: The data to be verified (NULL if expecting internal data).
+ * @len: Size of @data.
+ * @raw_pkcs7: The PKCS#7 message that is the signature.
+ * @pkcs7_len: The size of @raw_pkcs7.
+ * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
+ * (void *)1UL for all trusted keys).
+ * @usage: The use to which the key is being put.
+ * @view_content: Callback to gain access to content.
+ * @ctx: Context for callback.
+ */
+int verify_pkcs7_signature(const void *data, size_t len,
+  const void *raw_pkcs7, size_t pkcs7_len,
+  struct key *trusted_keys,
+  enum key_being_used_for usage,
+  int (*view_content)(void *ctx,
+  const void *data, size_t len,
+  size_t asn1hdrlen),
+  void *ctx)
+{
+   struct pkcs7_message *pkcs7;
+   int ret;
+
+   pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
+   if (IS_ERR(pkcs7))
+   return PTR_ERR(pkcs7);
+
+   ret = verify_pkcs7_message_sig(data, len, pkcs7, trusted_keys, usage,
+  view_content, ctx);
+
pkcs7_free_message(pkcs7);
pr_devel("<==%s() = %d\n", __func__, ret);
return ret;
diff --git a/include/linux/verification.h b/include/linux/verification.h
index 018fb5f13d44..5e1d41f2b336 100644
--- a/include/linux/verification.h
+++ b/include/linux/verification.h
@@ -36,6 +36,7 @@ extern const char *const 
key_being_used_for[NR__KEY_BEING_USED_FOR];
 #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
 
 struct key;
+struct pkcs7_message;
 
 extern int verify_pkcs7_signature(const void *data, size_t len,
  const void 

[PATCH v11 01/13] MODSIGN: Export module signature definitions

2019-06-11 Thread Thiago Jung Bauermann
IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.

Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.
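
For reference, scripts/sign-file appends the signature as a trailer, so a
consumer walks it from the end of the buffer. A simplified sketch (assuming
buf is a const u8 * covering the whole file and buf_len its length; this is
not the exact kernel code):

   /* [ payload | signature data | struct module_signature | magic string ] */
   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
   const struct module_signature *ms;
   unsigned long sig_len;

   if (buf_len < markerlen + sizeof(*ms))
           return -ENODATA;
   if (memcmp(buf + buf_len - markerlen, MODULE_SIG_STRING, markerlen))
           return -ENODATA;
   buf_len -= markerlen;

   ms = (const struct module_signature *)(buf + buf_len - sizeof(*ms));
   if (mod_check_sig(ms, buf_len, "example"))       /* sanity-check trailer */
           return -EBADMSG;
   sig_len = be32_to_cpu(ms->sig_len);
   /* signature data starts at buf + buf_len - sizeof(*ms) - sig_len */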

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: Jessica Yu 
---
 include/linux/module.h   |  3 --
 include/linux/module_signature.h | 44 +
 init/Kconfig |  6 +++-
 kernel/Makefile  |  1 +
 kernel/module.c  |  1 +
 kernel/module_signature.c| 46 ++
 kernel/module_signing.c  | 56 +---
 scripts/Makefile |  2 +-
 8 files changed, 106 insertions(+), 53 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 188998d3dca9..aa56f531cf1e 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -25,9 +25,6 @@
 #include 
 #include 
 
-/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
-#define MODULE_SIG_STRING "~Module signature appended~\n"
-
 /* Not Yet Implemented */
 #define MODULE_SUPPORTED_DEVICE(name)
 
diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
new file mode 100644
index ..523617fc5b6a
--- /dev/null
+++ b/include/linux/module_signature.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Module signature handling.
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#ifndef _LINUX_MODULE_SIGNATURE_H
+#define _LINUX_MODULE_SIGNATURE_H
+
+/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
+#define MODULE_SIG_STRING "~Module signature appended~\n"
+
+enum pkey_id_type {
+   PKEY_ID_PGP,/* OpenPGP generated key ID */
+   PKEY_ID_X509,   /* X.509 arbitrary subjectKeyIdentifier */
+   PKEY_ID_PKCS7,  /* Signature in PKCS#7 message */
+};
+
+/*
+ * Module signature information block.
+ *
+ * The constituents of the signature section are, in order:
+ *
+ * - Signer's name
+ * - Key identifier
+ * - Signature data
+ * - Information block
+ */
+struct module_signature {
+   u8  algo;   /* Public-key crypto algorithm [0] */
+   u8  hash;   /* Digest algorithm [0] */
+   u8  id_type;/* Key identifier type [PKEY_ID_PKCS7] */
+   u8  signer_len; /* Length of signer's name [0] */
+   u8  key_id_len; /* Length of key identifier [0] */
+   u8  __pad[3];
+   __be32  sig_len;/* Length of signature data */
+};
+
+int mod_check_sig(const struct module_signature *ms, size_t file_len,
+ const char *name);
+
+#endif /* _LINUX_MODULE_SIGNATURE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 8b9ffe236e4f..c2286a3c74c5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1852,6 +1852,10 @@ config BASE_SMALL
default 0 if BASE_FULL
default 1 if !BASE_FULL
 
+config MODULE_SIG_FORMAT
+   def_bool n
+   select SYSTEM_DATA_VERIFICATION
+
 menuconfig MODULES
bool "Enable loadable module support"
option modules
@@ -1929,7 +1933,7 @@ config MODULE_SRCVERSION_ALL
 config MODULE_SIG
bool "Module signature verification"
depends on MODULES
-   select SYSTEM_DATA_VERIFICATION
+   select MODULE_SIG_FORMAT
help
  Check modules for valid signatures upon load: the signature
  is simply appended to the module. For more information see
diff --git a/kernel/Makefile b/kernel/Makefile
index 33824f0385b3..f29ae2997a43 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ endif
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += module.o
 obj-$(CONFIG_MODULE_SIG) += module_signing.o
+obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signature.o
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_CRASH_CORE) += crash_core.o
diff --git a/kernel/module.c b/kernel/module.c
index 6e6712b3aaf5..2712f4d217f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/kernel/module_signature.c b/kernel/module_signature.c
new file mode 100644
index ..4224a1086b7d
--- /dev/null
+++ b/kernel/module_signature.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Module signature checker
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * mod_check_sig - check that the given signature is sane
+ *
+ * @ms:Signature to check.
+ * @file_len:  Size of the file to which @ms is 

[PATCH v11 00/13] Appended signatures support for IMA appraisal

2019-06-11 Thread Thiago Jung Bauermann
Hello,

Nothing big in this version. Noteworthy changes are:

1. Fixes for two bugs in ima_appraise_measurements() which were spotted and
resolved by Mimi Zohar. The changelog points them out.

2. One bugfix in process_measurement() which would cause all files
appraised with modsig to be measured as well, even if the policy didn't
request it.

3. Adapted to work with per policy rule template formats.

Plus small cosmetic changes in some places. The changelog has the details.

This has been tested with signed modules and with signed kernels loaded via
kexec_file_load().

Many thanks to Mimi Zohar for her help with the development of this patch
series.

The patches apply on today's linux-integrity/next-queued-testing.

Original cover letter:

On the OpenPOWER platform, secure boot and trusted boot are being
implemented using IMA for taking measurements and verifying signatures.
Since the kernel image on Power servers is an ELF binary, kernels are
signed using the scripts/sign-file tool and thus use the same signature
format as signed kernel modules.

This patch series adds support in IMA for verifying those signatures.
It adds flexibility to OpenPOWER secure boot, because it allows it to boot
kernels with the signature appended to them as well as kernels where the
signature is stored in the IMA extended attribute.

Changes since v10:

- Patch "MODSIGN: Export module signature definitions"
  - Moved config MODULE_SIG_FORMAT definition before its use. Suggested by
Mimi Zohar.
  - Added missing kerneldoc for @name parameter. Suggested by Mimi Zohar.

- Patch "ima: Implement support for module-style appended signatures"
  - Bugfix: don't check status variable when deciding whether to verify
modsig in ima_appraise_measurement(). Suggested by Mimi Zohar.
  - Bugfix: verify the modsig in ima_appraise_measurement() if the xattr
contains a digest. Suggested by Mimi Zohar.

- Patch "ima: Define ima-modsig template"
  - Renamed ima_modsig_serialize() to ima_get_raw_modsig().
  - Renamed check_current_template_modsig() to check_template_modsig().
  - Fixed outdated comment in ima_eventmodsig_init(). Suggested by Mimi
Zohar.
  - Check either the global or the per-rule template when an appraisal rule
allows modsig. Suggested by Mimi Zohar.

- Patch "ima: Store the measurement again when appraising a modsig"
  - Bugfix: Only re-measure file containing modsig if it was measured
before.
  - Check for modsig-related fields in the template_desc obtained in
process_measurement() which can be a per-rule template. Suggested by Mimi
Zohar.

- Patch "ima: Allow template= option for appraise rules as well"
  - New patch. Suggested by Mimi Zohar.

Changes since v9:

- Patch "MODSIGN: Export module signature definitions"
  - Moved mod_check_sig() to a new file so that CONFIG_IMA_APPRAISE_MODSIG
doesn't have to depend on CONFIG_MODULES.
  - Changed scripts/Makefile to build sign-file if CONFIG_MODULE_SIG_FORMAT
is set.
  - Removed Mimi's Reviewed-by because of the changes in this version.

- Patch "PKCS#7: Refactor verify_pkcs7_signature()"
  - Don't add function pkcs7_get_message_sig() anymore, since it's not
needed in the current version.

- Patch "PKCS#7: Introduce pkcs7_get_digest()"
  - Changed 'len' argument from 'u8 *' to 'u32 *'.
  - Added 'hash_algo' argument to obtain the algo used for the digest.
  - Don't check whether 'buf', 'len' and 'hash_algo' output arguments are NULL,
since the function's only caller always sets them.
  - Removed Mimi's Reviewed-by because of the changes in this version.

- Patch "integrity: Introduce asymmetric_sig_has_known_key()"
  - Dropped.

- Patch "integrity: Introduce integrity_keyring_from_id"
  - Squashed into "ima: Implement support for module-style appended signatures"
  - Changed integrity_keyring_from_id() to a static function (suggested by Mimi
Zohar).

- Patch "ima: Introduce is_signed()"
  - Dropped.

- Patch "ima: Export func_tokens"
  - Squashed into "ima: Implement support for module-style appended signatures"

- Patch "ima: Use designated initializers for struct ima_event_data"
  - New patch.

- Patch "ima: Factor xattr_verify() out of ima_appraise_measurement()"
  - New patch.

- Patch "ima: Implement support for module-style appended signatures"
  - Renamed 'struct modsig_hdr' to 'struct modsig'.
  - Added integrity_modsig_verify() to integrity/digsig.c so that it's not
necessary to export integrity_keyring_from_id() (Suggested by Mimi Zohar).
  - Don't add functions ima_xattr_sig_known_key() and
modsig_has_known_key() since they're not necessary anymore.
  - Added modsig argument to ima_appraise_measurement().
  - Verify modsig in a separate function called by ima_appraise_measurement().
  - Renamed ima_read_collect_modsig() to ima_read_modsig(), with a separate
collect function added in patch "ima: Collect modsig" (suggested by Mimi
Zohar).
  - In ima_read_modsig(), moved code saving of raw PKCS7 data to 

Re: [PATCH 1/3] powerpc/64: __ioremap_at clean up in the error case

2019-06-11 Thread Christophe Leroy




On 10/06/2019 at 05:08, Nicholas Piggin wrote:

__ioremap_at error handling is wonky: it requires the caller to clean up
after it. Implement a helper that does the map and error cleanup and
remove the requirement from the caller.

Signed-off-by: Nicholas Piggin 
---

This series is a different approach to the problem: it uses the generic
ioremap_page_range directly, which reduces the added code and moves
the radix-specific code into radix files. Thanks to Christophe for
pointing out various problems with the previous patch.

  arch/powerpc/mm/pgtable_64.c | 27 ---
  1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index d2d976ff8a0e..6bd3660388aa 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -108,14 +108,30 @@ unsigned long ioremap_bot;
  unsigned long ioremap_bot = IOREMAP_BASE;
  #endif
  
+static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)

+{
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);


Missing a blank line


+   if (err) {


I'd have done the following to reduce indentation depth

		if (!err)
			continue;


+   if (slab_is_available())
+   unmap_kernel_range(ea, size);


Shouldn't it be unmap_kernel_range(ea, i) ?

Christophe


+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
+
  /**
   * __ioremap_at - Low level function to establish the page tables
   *for an IO mapping
   */
  void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, 
pgprot_t prot)
  {
-   unsigned long i;
-
/* We don't support the 4K PFN hack with ioremap */
if (pgprot_val(prot) & H_PAGE_4K_PFN)
return NULL;
@@ -129,9 +145,8 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, 
unsigned long size, pgprot_
WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
WARN_ON(size & ~PAGE_MASK);
  
-	for (i = 0; i < size; i += PAGE_SIZE)

-   if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
-   return NULL;
+   if (ioremap_range((unsigned long)ea, pa, size, prot, NUMA_NO_NODE))
+   return NULL;
  
  	return (void __iomem *)ea;

  }
@@ -182,8 +197,6 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned 
long size,
  
  		area->phys_addr = paligned;

ret = __ioremap_at(paligned, area->addr, size, prot);
-   if (!ret)
-   vunmap(area->addr);
} else {
ret = __ioremap_at(paligned, (void *)ioremap_bot, size, prot);
if (ret)



Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-11 Thread Anshuman Khandual



On 06/10/2019 08:14 PM, Nicholas Piggin wrote:
> Mark Rutland's on June 11, 2019 12:10 am:
>> Hi,
>>
>> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Do you happen to know which vmalloc mappings these get used for in the
>> above case? Where do we see vmalloc mappings that large?
> 
> Large module vmalloc could be subject to huge mappings.
> 
>> I'm worried as to how this would interact with the set_memory_*()
>> functions, as on arm64 those can only operate on page-granular mappings.
>> Those may need fixing up to handle huge mappings; certainly if the above
>> is all for modules.
> 
> Good point, that looks like it would break on arm64 at least. I'll
> work on it. We may have to make this opt in beyond HUGE_VMAP.

This is another reason we might need to have an arch opt-in like the one
I mentioned before.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Christoph Hellwig
On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt wrote:
> The reason I think it sort-of-mostly-worked is that to get more than
> 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> buffers aren't allocated in Highmem so you got lucky.
> 
> That said, there is such as thing as no-copy send on network, so I
> wouldn't be surprised if some things would still have failed, just not
> frequent enough for you to notice.

Unless NETIF_F_HIGHDMA is set on a netdev, the core networking code
will bounce-buffer highmem pages for the driver under all circumstances.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-11 Thread Christoph Hellwig
On Mon, Jun 10, 2019 at 11:09:47AM -0500, Larry Finger wrote:
>>>  return -EIO;
>>>
>>> For b43legacy, dev->dma_mask is 0xc2656848.
>>>  dma_supported(dev, mask) is 0xc08b, mask is 0x3fff, and
>>> the routine returns -EIO.
>>>
>>> For b43,   dev->dma_mask is 0xc26568480001,
>>>  dma_supported(dev, mask) is 0xc08b, mask is 0x, and
>>> the routine returns 0.
>>
>> I don't fully understand what values the above map to.  Can you send
>> me your actual debugging patch as well?
>
> I do not understand why the if statement returns true as neither of the 
> values is zero. After seeing the x86 output shown below, I also do not 
> understand all the trailing zeros.
>
> My entire patch is attached. That output came from this section:

What might be confusing in your output is that dev->dma_mask is a pointer,
and we are setting it in dma_set_mask.  Before that we only check
whether the pointer is set, and later we override it.  Of course this doesn't
actually explain the failure.  But what is even stranger to me
is that you get a return value from dma_supported() that isn't 0 or 1,
as that function is supposed to return a boolean, and I really can't see
how mask >= __phys_to_dma(dev, min_mask) would return anything but 0
or 1.  Does the output change if you use the correct printk specifiers?

i.e. with a debug patch like this:


diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..9e5b30b12b10 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -378,6 +378,7 @@ EXPORT_SYMBOL(dma_direct_map_resource);
 int dma_direct_supported(struct device *dev, u64 mask)
 {
u64 min_mask;
+   bool ret;
 
if (IS_ENABLED(CONFIG_ZONE_DMA))
min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
@@ -391,7 +392,12 @@ int dma_direct_supported(struct device *dev, u64 mask)
 * use __phys_to_dma() here so that the SME encryption mask isn't
 * part of the check.
 */
-   return mask >= __phys_to_dma(dev, min_mask);
+   ret = (mask >= __phys_to_dma(dev, min_mask));
+   if (!ret)
+   dev_info(dev,
+   "%s: failed (mask = 0x%llx, min_mask = 0x%llx/0x%llx, dma bits = %d\n",
+   __func__, mask, min_mask, __phys_to_dma(dev, min_mask), ARCH_ZONE_DMA_BITS);
+   return ret;
 }
 
 size_t dma_direct_max_mapping_size(struct device *dev)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f7afdadb6770..6c57ccdee2ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -317,8 +317,14 @@ void arch_dma_set_mask(struct device *dev, u64 mask);
 
 int dma_set_mask(struct device *dev, u64 mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, mask))
+   if (!dev->dma_mask) {
+   dev_info(dev, "no DMA mask set!\n");
return -EIO;
+   }
+   if (!dma_supported(dev, mask)) {
+   printk("DMA not supported\n");
+   return -EIO;
+   }
 
arch_dma_set_mask(dev, mask);
dma_check_mask(dev, mask);

